Refactoring Practice¶
Let’s practice refactoring some code to be more Pythonic.
Below are a number of programs that we have inherited. Our job is to clean them up and continue development.
Get Odds¶
def get_all_odds(numbers):
odds=filter(lambda x:x%2, numbers)
odd_list=list(odds)
return odd_list
def has_odds(numbers):
odds = get_all_odds(numbers)
at_least_one_odd = len(odds) > 0
if at_least_one_odd:
return True
else:
return False
if __name__ == "__main__":
assert get_all_odds([1, 2, 3, 4, 5, -2, -1]) == [1, 3, 5, -1]
assert has_odds([0, 2, 5]) is True
assert has_odds([0, 2, 4]) is False
- No reason to do
liston separate line afterfilter: “Simple is better than complex.” - Space out our
lambdaexpression for readability: PEP 8 spacing and “Readability counts.” - Move our
lambdaexpression into a variable for improved clarity: “Beautiful is better than ugly.”
def get_all_odds(numbers):
is_odd = lambda x: x % 2
odd_list = list(filter(is_odd, numbers))
return odd_list
- Just return our
odd_listvalue instead of using a pointless variable: “Simple is better than complex.” - Use a
definstead of alambda(everyone understands plain old functions): PEP 8 and “Explicit is better than implicit.” and “Simple is better than complex.”
def get_all_odds(numbers):
def is_odd(num): return num % 2
return list(filter(is_odd, numbers))
- Replace this
filterwith a list comprehension that does the same thing: “Complex is better than complicated.” - Remove
is_oddfunction because it’s fairly simple: “Complex is better than complicated.” - Explicitly state that
n % 2 != 0is what we’re looking for: “Explicit is better than implicit.” - Add brief but descriptive docstring: PEP 8 “Explicit is better than implicit.” and “Now is better than never.” Because when else are we really going to write documentation?
def get_all_odds(numbers):
"""Return all odd numbers in the given list."""
return [n for n in numbers if n % 2 != 0]
Has Odds¶
The has_odds method we started with is:
def has_odds(numbers):
odds = get_all_odds(numbers)
at_least_one_odd = len(odds) > 0
if at_least_one_odd:
return True
else:
return False
- Replace
ifstatement with a simplereturnstatement: “Sparse is better than dense.” and “If the implementation is easy to explain, it may be a good idea.” and “Simple is better than complex.” - Remove unnecessary
at_least_one_oddvariable that is immediately returned: “Simple is better than complex.”
def has_odds(numbers):
odds = get_all_odds(numbers)
return len(odds) > 0
- Remove unnecessary variable which is immediately used: “Simple is better than complex.” and “Beautiful is better than ugly.”
- Rely on truthiness of list (non-empty list is
True): “Beautiful is better than ugly.” and “Simple is better than complex.”
def has_odds(numbers):
return bool(get_all_odds(numbers))
- Improve performance by using a generator expression (
anywill returnFalseif no values exist and returnTruewhen it finds the first item, so it is potentially more efficient): “Simple is better than complex.” and “Flat is better than nested.” - Refactor code to use
is_oddfunction (which can be used by both functions): “Beautiful is better than ugly.” - Add missing docstrings: “Beautiful is better than ugly.” and “Now is better than never.”
def get_all_odds(numbers):
"""Return all odd numbers in the given list."""
return [n for n in numbers if is_odd(n)]
def is_odd(n):
"""Return True if n is odd."""
return n % 2 != 0
def has_odds(numbers):
"""Return True if given iterable contains any odd numbers."""
return any(is_odd(n) for n in numbers)
Statistics¶
We have a module with some basic statistics helper functions.
def mean(x):
t = 0
for n in x:
t += n
meanavg = t / len(x)
return meanavg
def median(x):
if len(x)%2 != 0:
return sorted(x)[int(len(x)/2)]
else:
midavg = (sorted(x)[int(len(x)/2)] + sorted(x)[int(len(x)/2-1)])/2.0
return midavg
def mode(x):
o = {}
for n in x:
if n not in o:
o[n] = 0
o[n] += 1
m = max(o.values())
modeavg = [k for (k, v) in o.items() if v == m][0]
return modeavg
if __name__ == "__main__":
values = [5, 8, 2, 7, 2, 60]
assert mean(values) == 14
assert median(values) == 6
assert mode(values) == 2
print("Tests passed.")
Let’s start by refactoring mean:
- Use descriptive and meaningful names for
numbersandtotal: PEP 8 and “Readability counts.” - Remove unnecessary variable which is immediately returned: “Simple is better than complex.”
def mean(numbers):
total = 0
for n in numbers:
total += n
return total / len(numbers)
- Use
suminstead of a for loop to calculatetotal: “Simple is better than complex.” - Remove unnecessary
totalvariable
def mean(numbers):
return sum(numbers) / len(numbers)
Now let’s work on median:
- Use more descriptive variable name (
numbers): PEP 8 and “Readability counts.” - Reduce inefficiency and repetition by only sorting numbers once: “Simple is better than complex.”
- Reduce repetition by making
lengthvariable - Remove unnecessary variable before
return
def median(numbers):
numbers = sorted(numbers)
length = len(numbers)
if length%2 != 0:
return numbers[int(length/2)]
else:
return (numbers[int(length/2)] + numbers[int(length/2-1)])/2.0
- Add more space around
%operator for readability: PEP 8 - Add
mid_pointvariable to reduce repetition - Space out
/operator and simplify2.0to2: PEP 8
def median(numbers):
numbers = sorted(numbers)
length = len(numbers)
mid_point = int(length/2)
if length % 2 != 0:
return numbers[mid_point]
else:
return (numbers[mid_point] + numbers[mid_point - 1]) / 2
Let’s refactor mode:
- Rename
xtonumbers,otooccurrences, andmtomost - Remove unnecessary
modeavgvariable - Replace
ifstatement withsetdefault - Consider replacing
setdefaultwithget
def mode(numbers):
occurrences = {}
for n in numbers:
occurrences[n] = occurrences.get(n, 0) + 1
most = max(occurrences.values())
return [k for (k, v) in occurrences.items() if v == most][0]
- Use
defaultdictinstead - Switch from
defaultdictto single-lineCounterstatement
def mode(numbers):
occurrences = Counter(numbers)
most = max(occurrences.values())
return [k for (k, v) in occurrences.items() if v == most][0]
- Use
most_commonmethod to get the most commonly occurring number - Consider whether to squash two lines into one
return
def mode(numbers):
value, _ = Counter(numbers).most_common(1)[0]
return value
This actually isn’t the most Pythonic implementation of mean, median, and mode. It would be most Pythonic to just use the versions built-in to the standard library (since Python 3.4):
>>> from statistics import mean, median, mode
>>> numbers = [5, 8, 2, 7, 2, 60]
>>> mean(numbers)
14.0
>>> median(numbers)
6.0
>>> mode(numbers)
2
Color Ratios¶
def get_color_ratios(colors, ratios):
"""Return dictionary of color ratios from color and ratio lists."""
assert len(colors) == len(ratios)
color_ratios = {}
for i in range(len(colors)):
color_ratios[colors[i]] = ratios[i]
return color_ratios
if __name__ == "__main__":
test_colors = ["red", "green", "blue"]
test_ratios = [0.1, 0.6, 0.3]
combined_dict = {'red': 0.1, 'green': 0.6, 'blue': 0.3}
assert get_color_ratios(test_colors, test_ratios) == combined_dict
Use enumerate to get the i-th index of the list as we loop over it: “Explicit is better than implicit.” and “Simple is better than complex.”
def get_color_ratios(colors, ratios):
"""Return dictionary of color ratios from color and ratio lists."""
assert len(colors) == len(ratios)
color_ratios = {}
for i, color in enumerate(colors):
color_ratios[color] = ratios[i]
return color_ratios
Use zip to combine colors and ratios lists, looping over both at once: “Complex is better than complicated.”
def get_color_ratios(colors, ratios):
"""Return dictionary of color ratios from color and ratio lists."""
assert len(colors) == len(ratios)
color_ratios = {}
for items in zip(colors, ratios):
color_ratios[items[0]] = items[1]
return color_ratios
Unpack each zipped item into verbose variable names while looping: PEP 8 and “Explicit is better than implicit.”
def get_color_ratios(colors, ratios):
"""Return dictionary of color ratios from color and ratio lists."""
assert len(colors) == len(ratios)
color_ratios = {}
for color, ratio in zip(colors, ratios):
color_ratios[color] = ratio
return color_ratios
Move this for loop into a dictionary comprehension: “Complex is better than complicated.” and “Flat is better than nested.”
def get_color_ratios(colors, ratios):
"""Return dictionary of color ratios from color and ratio lists."""
assert len(colors) == len(ratios)
color_ratios = {c: r for (c, r) in zip(colors, ratios)}
return color_ratios
- Remove unnecessary variable which is returned immediately: “Simple is better than complex.”
- Use ability of
dictto take list of two-tuples to make dictionary: “Simple is better than complex.” and “Beautiful is better than ugly.”
def get_color_ratios(colors, ratios):
"""Return dictionary of color ratios from color and ratio lists."""
assert len(colors) == len(ratios)
return dict(zip(colors, ratios))
Numbers¶
"""
numbers - translate number words to digits
Example Usage:
.. code-block:: bash
$ numbers.py one two three
123
$ numbers.py zero zero five zero six zero
5060
$ numbers.py blah two blah four
2 4
Any leading "zero" arguments are ignored.
Invalid digit words are converted to spaces.
"""
import sys
def translate_numbers(*number_strings):
numbers = []
for item in zip(range(len(number_strings)), number_strings):
i,n = item
if n == "zero":
if i != 0:
numbers.append(0)
elif n == "one":
numbers.append(1)
elif n == "two":
numbers.append(2)
elif n == "three":
numbers.append(3)
elif n == "four":
numbers.append(4)
elif n == "five":
numbers.append(5)
elif n == "six":
numbers.append(6)
elif n == "seven":
numbers.append(7)
elif n == "eight":
numbers.append(8)
elif n == "nine":
numbers.append(9)
else:
numbers.append(' ')
return numbers
def main(numbers):
number_list = numbers
numbers = translate_numbers(*number_list)
numbers = map(str, numbers)
print("".join(numbers))
if __name__ == "__main__":
main(sys.argv[1:])
Let’s tackle the main function first:
- Remove unnecessary
number_listvariable by combining first two lines: “Simple is better than complex.” - Use more clear variable name instead of re-using old variable name: PEP 8 and “Readability counts.”
def main(numbers):
translation = map(str, translate_numbers(*numbers))
print("".join(translation))
Remove unnecessary translation variable: “Beautiful is better than ugly.”
def main(numbers):
print("".join(map(str, translate_numbers(*numbers))))
Now let’s work on translate_numbers:
- Use
enumerate: “Complex is better than complicated.” - Do item unpacking right in for loop: “Simple is better than complex.”
- Do append one-time only: “Simple is better than complex.” and “Readability counts.”
def translate_numbers(*number_strings):
numbers = []
for i, n in enumerate(number_strings):
x = None
if n == "zero":
if i != 0:
x = 0
elif n == "one":
x = 1
elif n == "two":
x = 2
elif n == "three":
x = 3
elif n == "four":
x = 4
elif n == "five":
x = 5
elif n == "six":
x = 6
elif n == "seven":
x = 7
elif n == "eight":
x = 8
elif n == "nine":
x = 9
else:
x = ' '
if x is not None:
numbers.append(x)
return numbers
- Put string to number translations in dictionary: “Explicit is better than implicit.”
- Collapse most of the logic to a dictionary item lookup, catching key errors: PEP 8 and “Errors should never pass silently.”
- Use double quotes more consistently: PEP 8 and “Readability counts.”
def translate_numbers(*number_strings):
translation = {
"one": 1,
"two": 2,
"three": 3,
"four": 4,
"five": 5,
"six": 6,
"seven": 7,
"eight": 8,
"nine": 9,
}
numbers = []
for i, n in enumerate(number_strings):
x = None
if n == "zero":
if i != 0:
x = 0
else:
try:
x = translation[n]
except KeyError:
x = " "
if x is not None:
numbers.append(x)
return numbers
- Add zero to
translationdictionary: “Explicit is better than implicit.” and “Flat is better than nested.” - Invert nested-if special case and move
x = Noneinside if statement: “Complex is better than complicated.” - Use dictionary’s
getmethod to use a default on key errors: “Explicit is better than implicit.” - Use single quotes for dictionary key strings (personal preference)
def translate_numbers(*number_strings):
translation = {
'zero': 0,
'one': 1,
'two': 2,
'three': 3,
'four': 4,
'five': 5,
'six': 6,
'seven': 7,
'eight': 8,
'nine': 9,
}
numbers = []
for i, n in enumerate(number_strings):
if n == "zero" and i == 0:
x = None
else:
x = translation.get(n, " ")
if x is not None:
numbers.append(x)
return numbers
- Invert zero special case logic entirely and collapse if and else: “Simple is better than complex.”
- Remove last
x is Nonecheck and just do append all at once: “Simple is better than complex.” - Remove unnecessary
xand just append immediately: “Beautiful is better than ugly.”
def translate_numbers(*number_strings):
translation = {
'zero': 0,
'one': 1,
'two': 2,
'three': 3,
'four': 4,
'five': 5,
'six': 6,
'seven': 7,
'eight': 8,
'nine': 9,
}
numbers = []
for i, n in enumerate(number_strings):
if not (n == "zero" and i == 0):
numbers.append(translation.get(n, " "))
return numbers
- Move
translationinto global variable (capitalized to denote constant): PEP 8 - Convert for loop into a list comprehension: “Complex is better than complicated.”
- Remove unnecessary
numbersvariable which is returned immediately: “Beautiful is better than ugly.” - Use DeMorgan’s law to flip boolean logic in condition: “Simple is better than complex.”
NUMBERS = {
'zero': 0,
'one': 1,
'two': 2,
'three': 3,
'four': 4,
'five': 5,
'six': 6,
'seven': 7,
'eight': 8,
'nine': 9,
}
def translate_numbers(*number_strings):
return [
NUMBERS.get(n, " ")
for i, n in enumerate(number_strings)
if i != 0 or n != "zero"
]
Person¶
This class uses a convenient decorator, total_ordering, from the functools module. It simplifies the effort involved in specifying all of the possible rich comparison operations. The class need only define an __eq__ method and one or more of __lt__, __le__, __gt__, or __ge__. This class decorator supplies the rest of the methods not already defined.
from functools import total_ordering
@total_ordering
class Person:
"""Person with first and last name."""
def __init__(self, first_name, last_name):
self.first_name = first_name
self.last_name = last_name
def __lt__(self, other):
last_equal = (self.last_name == other.last_name)
less_than = (last_equal and self.first_name < other.first_name) or self.last_name < other.last_name
return less_than
def __eq__(self, other):
same_names = (self.last_name == other.last_name) and (self.first_name == other.first_name)
if same_names:
return True
else:
return False
def __str__(self):
full_name = self.first_name + " " + self.last_name
return full_name
def __repr__(self):
representation = self.__class__.__name__ + "(" + repr(self.first_name) + ", " + repr(self.last_name) + ")"
return representation
if __name__ == "__main__":
james = Person("James", "Rankin")
zane = Person("Zane", "Shavers")
jackie = Person("Jackie", "Taylor")
ladonna = Person("Ladonna", "Taylor")
assert sorted([zane, jackie, james, ladonna]) == [james, zane, jackie, ladonna]
print("Tests passed")
- Rely on comparison operation on
tupleto compare values in order: “Simple is better than complex.” - Remove variables unnecessarily assigned and then returned immediately: “Beautiful is better than ugly.”
- Use string formatting to improve
__repr__, relying on line continuation for line wrapping: “Beautiful is better than ugly.” and “Complex is better than complicated.” and “Readability counts.” - Use the string
joinmethod to join together the first and last name with a space: PEP 8 and “Simple is better than complex.”
@total_ordering
class Person:
"""Person with first and last name."""
def __init__(self, first_name, last_name):
self.first_name = first_name
self.last_name = last_name
def __lt__(self, other):
return (self.last_name, self.first_name) < (other.last_name, other.first_name)
def __eq__(self, other):
return (self.last_name, self.first_name) == (other.last_name, other.first_name)
def __str__(self):
return " ".join((self.first_name, self.last_name))
def __repr__(self):
return "{class_name}({first_name}, {last_name})".format(
class_name=self.__class__.__name__,
first_name=repr(self.first_name),
last_name=repr(self.last_name),
)
Make a legal_names property returning a tuple which can be used to shorten code: “Simple is better than complex.” and “Readability counts.”
def __lt__(self, other):
return self.legal_names < other.legal_names
def __eq__(self, other):
return self.legal_names == other.legal_names
@property
def legal_names(self):
return (self.last_name, self.first_name)
Point¶
from collections import namedtuple
import math
BasePoint = namedtuple('BasePoint', 'x y z')
class Point(BasePoint):
"""Three-dimensional point."""
def distance(self, other):
"""Calculate the distance between two points."""
a1 = self[0]
b1 = self[1]
c1 = self[2]
a2 = other[0]
b2 = other[1]
c2 = other[2]
return math.sqrt((a1-a2)**2+(b1-b2)**2+(c1-c2)**2)
def shift(self, other):
"""Return new copy of our point, shifted by other."""
a = self[0] + other[0]
b = self[1] + other[1]
c = self[2] + other[2]
new_point = Point(a, b, c)
return new_point
def scale(self, value):
"""Scale Point by the given numeric value."""
new_point = Point(self[0] * value, self[1] * value, self[2] * value)
return new_point
if __name__ == "__main__":
p1 = Point(1, 2, 3)
p2 = Point(4, 5, 6)
assert p2.shift(p1) == p1.shift(p2) == Point(5, 7, 9)
assert p1.scale(2) == Point(2, 4, 6)
assert p1.distance(p2) == p2.distance(p1) == 5.196152422706632
print("Tests passed.")
- Use tuple unpacking to shorten code in
distanceandshift: “Readability counts.” and “Complex is better than complicated.” - Split logic-heavy
distancereturn line into two lines (using unpacking again): “Readability counts.” - Remove unnecessary variables that are immediately returned: “Simple is better than complex.” and “Beautiful is better than ugly.”
- Use ability of
namedtupleto reference named attributes to refactorscalefunction: “Explicit is better than implicit.” - Rewrite
scaleline, swapping order of arithmetic to be more readable: “Readability counts.” - Change
shiftandscaleinto more Pythonic+and*operators: “Complex is better than complicated.”
class Point(BasePoint):
"""Three-dimensional point."""
def distance(self, other):
"""Calculate the distance between two points."""
a1, b1, c1 = self
a2, b2, c2 = other
x, y, z = (a1 - a2), (b1 - b2), (c1 - c2)
return math.sqrt(x**2 + y**2 + z**2)
def __add__(self, other):
"""Return new copy of our point, shifted by other."""
a1, b1, c1 = self
a2, b2, c2 = other
return Point((a1+a2), (b1+b2), (c1+c2))
def __mul__(self, scalar):
"""Scale Point by the given numeric value."""
return Point(scalar*self.x, scalar*self.y, scalar*self.z)
def __rmul__(self, value):
"""Scale Point by the given numeric value."""
return self.__mul__(value)
Our new tests:
if __name__ == "__main__":
p1 = Point(1, 2, 3)
p2 = Point(4, 5, 6)
assert p2 + p1 == p1 + p2 == Point(5, 7, 9)
assert p1 * 2 == 2 * p1 == Point(2, 4, 6)
assert p1.distance(p2) == p2.distance(p1) == 5.196152422706632
print("Tests passed.")
Usage¶
def print_usage():
print("hello - print out a message\n")
print("Usage: hello.py [-h] [-o OUTPUT]\n")
print("Options:")
print("-o OUTPUT, --output=OUTPUT output the given string")
print("-h --help show this help message and exit")
if __name__ == "__main__":
print_usage()
- Convert multiple prints into a single
printstatement: “Simple is better than complex.” - Use
+to concatenate multiple strings, wrapping one long line with\: “Complex is better than complicated.” - Add missing
\nto break lines apart: “Readability counts.”
def print_usage():
usage_message = "hello - print out a message\n\n" + \
"Usage: hello.py [-h] [-o OUTPUT]\n\n" + \
"Options:\n" + \
"-o OUTPUT, --output=OUTPUT output the given string\n" + \
"-h --help show this help message and exit"
print(usage_message)
- Use implicit line wrapping instead of
\to wrap lines: PEP 8 and “Readability counts.” and “Simple is better than complex.” - Line up all strings: “Readability counts.”
- Remove
+signs, relying on implicit string concatenation (warning: this may be seen as un-Pythonic by some): “Simple is better than complex.”
def print_usage():
usage_message = (
"hello - print out a message\n\n"
"Usage: hello.py [-h] [-o OUTPUT]\n\n"
"Options:\n"
"-o OUTPUT, --output=OUTPUT output the given string\n"
"-h --help show this help message and exit"
)
print(usage_message)
- Use multi-line string: “Simple is better than complex.”
- Replace
\nwith actual newlines: “Simple is better than complex.” - Strip whitespace from ends of the string to compensate for extra newlines: “Complex is better than complicated.”
def print_usage():
usage_message = """
hello - print out a message
Usage: hello.py [-h] [-o OUTPUT]
Options:
-o OUTPUT, --output=OUTPUT output the given string
-h --help show this help message and exit"""
print(usage_message.strip())
- Make a list of strings, one for each line and join them with
\n - Remove now unnecessary
strip
def print_usage():
usage_message = "\n".join([
"hello - print out a message",
"",
"Usage: hello.py [-h] [-o OUTPUT]",
"",
"Options:",
"-o OUTPUT, --output=OUTPUT output the given string",
"-h --help show this help message and exit"
])
print(usage_message)
- Make a multi-line docstring stating the usage information for this program: “Explicit is better than implicit.”
- Print out the docstring as usage, stripping extra whitespace: “Readability counts.” and “Simple is better than complex.”
"""
hello - print out a message
Usage: hello.py [-h] [-o OUTPUT]
Options:
-o OUTPUT, --output=OUTPUT output the given string
-h --help show this help message and exit
"""
def print_usage():
print(__doc__.strip())