Refactoring Practice

Let’s practice refactoring some code to be more Pythonic.

Below are a number of programs that we have inherited. Our job is to clean them up and continue development.

Get Odds

def get_all_odds(numbers):
    odds=filter(lambda x:x%2, numbers)
    odd_list=list(odds)
    return odd_list

def has_odds(numbers):
    odds = get_all_odds(numbers)
    at_least_one_odd = len(odds) > 0
    if at_least_one_odd:
        return True
    else:
        return False

if __name__ == "__main__":
    assert get_all_odds([1, 2, 3, 4, 5, -2, -1]) == [1, 3, 5, -1]
    assert has_odds([0, 2, 5]) is True
    assert has_odds([0, 2, 4]) is False
  1. No reason to do list on separate line after filter: “Simple is better than complex.”
  2. Space out our lambda expression for readability: PEP 8 spacing and “Readability counts.”
  3. Move our lambda expression into a variable for improved clarity: “Beautiful is better than ugly.”
def get_all_odds(numbers):
    is_odd = lambda x: x % 2
    odd_list = list(filter(is_odd, numbers))
    return odd_list
  1. Just return our odd_list value instead of using a pointless variable: “Simple is better than complex.”
  2. Use a def instead of a lambda (everyone understands plain old functions): PEP 8 and “Explicit is better than implicit.” and “Simple is better than complex.”
def get_all_odds(numbers):
    def is_odd(num): return num % 2
    return list(filter(is_odd, numbers))
  1. Replace this filter with a list comprehension that does the same thing: “Complex is better than complicated.”
  2. Remove is_odd function because it’s fairly simple: “Complex is better than complicated.”
  3. Explicitly state that n % 2 != 0 is what we’re looking for: “Explicit is better than implicit.”
  4. Add brief but descriptive docstring: PEP 8 “Explicit is better than implicit.” and “Now is better than never.” Because when else are we really going to write documentation?
def get_all_odds(numbers):
    """Return all odd numbers in the given list."""
    return [n for n in numbers if n % 2 != 0]

Has Odds

The has_odds method we started with is:

def has_odds(numbers):
    odds = get_all_odds(numbers)
    at_least_one_odd = len(odds) > 0
    if at_least_one_odd:
        return True
    else:
        return False
  1. Replace if statement with a simple return statement: “Sparse is better than dense.” and “If the implementation is easy to explain, it may be a good idea.” and “Simple is better than complex.”
  2. Remove unnecessary at_least_one_odd variable that is immediately returned: “Simple is better than complex.”
def has_odds(numbers):
    odds = get_all_odds(numbers)
    return len(odds) > 0
  1. Remove unnecessary variable which is immediately used: “Simple is better than complex.” and “Beautiful is better than ugly.”
  2. Rely on truthiness of list (non-empty list is True): “Beautiful is better than ugly.” and “Simple is better than complex.”
def has_odds(numbers):
    return bool(get_all_odds(numbers))
  1. Improve performance by using a generator expression (any will return False if no values exist and return True when it finds the first item, so it is potentially more efficient): “Simple is better than complex.” and “Flat is better than nested.”
  2. Refactor code to use is_odd function (which can be used by both functions): “Beautiful is better than ugly.”
  3. Add missing docstrings: “Beautiful is better than ugly.” and “Now is better than never.”
def get_all_odds(numbers):
    """Return all odd numbers in the given list."""
    return [n for n in numbers if is_odd(n)]

def is_odd(n):
    """Return True if n is odd."""
    return n % 2 != 0

def has_odds(numbers):
    """Return True if given iterable contains any odd numbers."""
    return any(is_odd(n) for n in numbers)

Statistics

We have a module with some basic statistics helper functions.

def mean(x):
    t = 0
    for n in x:
        t += n
    meanavg = t / len(x)
    return meanavg


def median(x):
    if len(x)%2 != 0:
        return sorted(x)[int(len(x)/2)]
    else:
        midavg = (sorted(x)[int(len(x)/2)] + sorted(x)[int(len(x)/2-1)])/2.0
        return midavg


def mode(x):
    o = {}
    for n in x:
        if n not in o:
            o[n] = 0
        o[n] += 1
    m = max(o.values())
    modeavg = [k for (k, v) in o.items() if v == m][0]
    return modeavg


if __name__ == "__main__":
    values = [5, 8, 2, 7, 2, 60]
    assert mean(values) == 14
    assert median(values) == 6
    assert mode(values) == 2
    print("Tests passed.")

Let’s start by refactoring mean:

  1. Use descriptive and meaningful names for numbers and total: PEP 8 and “Readability counts.”
  2. Remove unnecessary variable which is immediately returned: “Simple is better than complex.”
def mean(numbers):
 total = 0
 for n in numbers:
     total += n
 return total / len(numbers)
  1. Use sum instead of a for loop to calculate total: “Simple is better than complex.”
  2. Remove unnecessary total variable
def mean(numbers):
    return sum(numbers) / len(numbers)

Now let’s work on median:

  1. Use more descriptive variable name (numbers): PEP 8 and “Readability counts.”
  2. Reduce inefficiency and repetition by only sorting numbers once: “Simple is better than complex.”
  3. Reduce repetition by making length variable
  4. Remove unnecessary variable before return
def median(numbers):
    numbers = sorted(numbers)
    length = len(numbers)
    if length%2 != 0:
        return numbers[int(length/2)]
    else:
        return (numbers[int(length/2)] + numbers[int(length/2-1)])/2.0
  1. Add more space around % operator for readability: PEP 8
  2. Add mid_point variable to reduce repetition
  3. Space out / operator and simplify 2.0 to 2: PEP 8
def median(numbers):
    numbers = sorted(numbers)
    length = len(numbers)
    mid_point = int(length/2)
    if length % 2 != 0:
        return numbers[mid_point]
    else:
        return (numbers[mid_point] + numbers[mid_point - 1]) / 2

Let’s refactor mode:

  1. Rename x to numbers, o to occurrences, and m to most
  2. Remove unnecessary modeavg variable
  3. Replace if statement with setdefault
  4. Consider replacing setdefault with get
def mode(numbers):
    occurrences = {}
    for n in numbers:
        occurrences[n] = occurrences.get(n, 0) + 1
    most = max(occurrences.values())
    return [k for (k, v) in occurrences.items() if v == most][0]
  1. Use defaultdict instead
  2. Switch from defaultdict to single-line Counter statement
def mode(numbers):
    occurrences = Counter(numbers)
    most = max(occurrences.values())
    return [k for (k, v) in occurrences.items() if v == most][0]
  1. Use most_common method to get the most commonly occurring number
  2. Consider whether to squash two lines into one return
def mode(numbers):
    value, _ = Counter(numbers).most_common(1)[0]
    return value

This actually isn’t the most Pythonic implementation of mean, median, and mode. It would be most Pythonic to just use the versions built-in to the standard library (since Python 3.4):

>>> from statistics import mean, median, mode
>>> numbers = [5, 8, 2, 7, 2, 60]
>>> mean(numbers)
14.0
>>> median(numbers)
6.0
>>> mode(numbers)
2

Color Ratios

def get_color_ratios(colors, ratios):
    """Return dictionary of color ratios from color and ratio lists."""
    assert len(colors) == len(ratios)
    color_ratios = {}
    for i in range(len(colors)):
        color_ratios[colors[i]] = ratios[i]
    return color_ratios


if __name__ == "__main__":
    test_colors = ["red", "green", "blue"]
    test_ratios = [0.1, 0.6, 0.3]
    combined_dict = {'red': 0.1, 'green': 0.6, 'blue': 0.3}
    assert get_color_ratios(test_colors, test_ratios) == combined_dict

Use enumerate to get the i-th index of the list as we loop over it: “Explicit is better than implicit.” and “Simple is better than complex.”

def get_color_ratios(colors, ratios):
    """Return dictionary of color ratios from color and ratio lists."""
    assert len(colors) == len(ratios)
    color_ratios = {}
    for i, color in enumerate(colors):
        color_ratios[color] = ratios[i]
    return color_ratios

Use zip to combine colors and ratios lists, looping over both at once: “Complex is better than complicated.”

def get_color_ratios(colors, ratios):
    """Return dictionary of color ratios from color and ratio lists."""
    assert len(colors) == len(ratios)
    color_ratios = {}
    for items in zip(colors, ratios):
        color_ratios[items[0]] = items[1]
    return color_ratios

Unpack each zipped item into verbose variable names while looping: PEP 8 and “Explicit is better than implicit.”

def get_color_ratios(colors, ratios):
    """Return dictionary of color ratios from color and ratio lists."""
    assert len(colors) == len(ratios)
    color_ratios = {}
    for color, ratio in zip(colors, ratios):
        color_ratios[color] = ratio
    return color_ratios

Move this for loop into a dictionary comprehension: “Complex is better than complicated.” and “Flat is better than nested.”

def get_color_ratios(colors, ratios):
    """Return dictionary of color ratios from color and ratio lists."""
    assert len(colors) == len(ratios)
    color_ratios = {c: r for (c, r) in zip(colors, ratios)}
    return color_ratios
  1. Remove unnecessary variable which is returned immediately: “Simple is better than complex.”
  2. Use ability of dict to take list of two-tuples to make dictionary: “Simple is better than complex.” and “Beautiful is better than ugly.”
def get_color_ratios(colors, ratios):
    """Return dictionary of color ratios from color and ratio lists."""
    assert len(colors) == len(ratios)
    return dict(zip(colors, ratios))

Numbers

"""
numbers - translate number words to digits

Example Usage:

.. code-block:: bash

    $ numbers.py one two three
    123
    $ numbers.py zero zero five zero six zero
    5060
    $ numbers.py blah two blah four
    2 4

Any leading "zero" arguments are ignored.
Invalid digit words are converted to spaces.

"""
import sys


def translate_numbers(*number_strings):
    numbers = []
    for item in zip(range(len(number_strings)), number_strings):
        i,n = item
        if n == "zero":
            if i != 0:
                numbers.append(0)
        elif n == "one":
            numbers.append(1)
        elif n == "two":
            numbers.append(2)
        elif n == "three":
            numbers.append(3)
        elif n == "four":
            numbers.append(4)
        elif n == "five":
            numbers.append(5)
        elif n == "six":
            numbers.append(6)
        elif n == "seven":
            numbers.append(7)
        elif n == "eight":
            numbers.append(8)
        elif n == "nine":
            numbers.append(9)
        else:
            numbers.append(' ')
    return numbers


def main(numbers):
    number_list = numbers
    numbers = translate_numbers(*number_list)
    numbers = map(str, numbers)
    print("".join(numbers))


if __name__ == "__main__":
    main(sys.argv[1:])

Let’s tackle the main function first:

  1. Remove unnecessary number_list variable by combining first two lines: “Simple is better than complex.”
  2. Use more clear variable name instead of re-using old variable name: PEP 8 and “Readability counts.”
def main(numbers):
    translation = map(str, translate_numbers(*numbers))
    print("".join(translation))

Remove unnecessary translation variable: “Beautiful is better than ugly.”

def main(numbers):
    print("".join(map(str, translate_numbers(*numbers))))

Now let’s work on translate_numbers:

  1. Use enumerate: “Complex is better than complicated.”
  2. Do item unpacking right in for loop: “Simple is better than complex.”
  3. Do append one-time only: “Simple is better than complex.” and “Readability counts.”
def translate_numbers(*number_strings):
    numbers = []
    for i, n in enumerate(number_strings):
        x = None
        if n == "zero":
            if i != 0:
                x = 0
        elif n == "one":
            x = 1
        elif n == "two":
            x = 2
        elif n == "three":
            x = 3
        elif n == "four":
            x = 4
        elif n == "five":
            x = 5
        elif n == "six":
            x = 6
        elif n == "seven":
            x = 7
        elif n == "eight":
            x = 8
        elif n == "nine":
            x = 9
        else:
            x = ' '
        if x is not None:
            numbers.append(x)
    return numbers
  1. Put string to number translations in dictionary: “Explicit is better than implicit.”
  2. Collapse most of the logic to a dictionary item lookup, catching key errors: PEP 8 and “Errors should never pass silently.”
  3. Use double quotes more consistently: PEP 8 and “Readability counts.”
def translate_numbers(*number_strings):
    translation = {
        "one": 1,
        "two": 2,
        "three": 3,
        "four": 4,
        "five": 5,
        "six": 6,
        "seven": 7,
        "eight": 8,
        "nine": 9,
    }
    numbers = []
    for i, n in enumerate(number_strings):
        x = None
        if n == "zero":
            if i != 0:
                x = 0
        else:
            try:
                x = translation[n]
            except KeyError:
                x = " "
        if x is not None:
            numbers.append(x)
    return numbers
  1. Add zero to translation dictionary: “Explicit is better than implicit.” and “Flat is better than nested.”
  2. Invert nested-if special case and move x = None inside if statement: “Complex is better than complicated.”
  3. Use dictionary’s get method to use a default on key errors: “Explicit is better than implicit.”
  4. Use single quotes for dictionary key strings (personal preference)
def translate_numbers(*number_strings):
    translation = {
        'zero': 0,
        'one': 1,
        'two': 2,
        'three': 3,
        'four': 4,
        'five': 5,
        'six': 6,
        'seven': 7,
        'eight': 8,
        'nine': 9,
    }
    numbers = []
    for i, n in enumerate(number_strings):
        if n == "zero" and i == 0:
            x = None
        else:
            x = translation.get(n, " ")
        if x is not None:
            numbers.append(x)
    return numbers
  1. Invert zero special case logic entirely and collapse if and else: “Simple is better than complex.”
  2. Remove last x is None check and just do append all at once: “Simple is better than complex.”
  3. Remove unnecessary x and just append immediately: “Beautiful is better than ugly.”
def translate_numbers(*number_strings):
    translation = {
        'zero': 0,
        'one': 1,
        'two': 2,
        'three': 3,
        'four': 4,
        'five': 5,
        'six': 6,
        'seven': 7,
        'eight': 8,
        'nine': 9,
    }
    numbers = []
    for i, n in enumerate(number_strings):
        if not (n == "zero" and i == 0):
            numbers.append(translation.get(n, " "))
    return numbers
  1. Move translation into global variable (capitalized to denote constant): PEP 8
  2. Convert for loop into a list comprehension: “Complex is better than complicated.”
  3. Remove unnecessary numbers variable which is returned immediately: “Beautiful is better than ugly.”
  4. Use DeMorgan’s law to flip boolean logic in condition: “Simple is better than complex.”
NUMBERS = {
    'zero': 0,
    'one': 1,
    'two': 2,
    'three': 3,
    'four': 4,
    'five': 5,
    'six': 6,
    'seven': 7,
    'eight': 8,
    'nine': 9,
}


def translate_numbers(*number_strings):
    return [
        NUMBERS.get(n, " ")
        for i, n in enumerate(number_strings)
        if i != 0 or n != "zero"
    ]

Person

This class uses a convenient decorator, total_ordering, from the functools module. It simplifies the effort involved in specifying all of the possible rich comparison operations. The class need only define an __eq__ method and one or more of __lt__, __le__, __gt__, or __ge__. This class decorator supplies the rest of the methods not already defined.

from functools import total_ordering


@total_ordering
class Person:

    """Person with first and last name."""

    def __init__(self, first_name, last_name):
        self.first_name = first_name
        self.last_name = last_name

    def __lt__(self, other):
        last_equal = (self.last_name == other.last_name)
        less_than = (last_equal and self.first_name < other.first_name) or self.last_name < other.last_name
        return less_than

    def __eq__(self, other):
        same_names = (self.last_name == other.last_name) and (self.first_name == other.first_name)
        if same_names:
            return True
        else:
            return False

    def __str__(self):
        full_name = self.first_name + " " + self.last_name
        return full_name

    def __repr__(self):
        representation = self.__class__.__name__ + "(" + repr(self.first_name) + ", " + repr(self.last_name) + ")"
        return representation


if __name__ == "__main__":
    james = Person("James", "Rankin")
    zane = Person("Zane", "Shavers")
    jackie = Person("Jackie", "Taylor")
    ladonna = Person("Ladonna", "Taylor")
    assert sorted([zane, jackie, james, ladonna]) == [james, zane, jackie, ladonna]
    print("Tests passed")
  1. Rely on comparison operation on tuple to compare values in order: “Simple is better than complex.”
  2. Remove variables unnecessarily assigned and then returned immediately: “Beautiful is better than ugly.”
  3. Use string formatting to improve __repr__, relying on line continuation for line wrapping: “Beautiful is better than ugly.” and “Complex is better than complicated.” and “Readability counts.”
  4. Use the string join method to join together the first and last name with a space: PEP 8 and “Simple is better than complex.”
@total_ordering
class Person:

    """Person with first and last name."""

    def __init__(self, first_name, last_name):
        self.first_name = first_name
        self.last_name = last_name

    def __lt__(self, other):
        return (self.last_name, self.first_name) < (other.last_name, other.first_name)

    def __eq__(self, other):
        return (self.last_name, self.first_name) == (other.last_name, other.first_name)

    def __str__(self):
        return " ".join((self.first_name, self.last_name))

    def __repr__(self):
        return "{class_name}({first_name}, {last_name})".format(
            class_name=self.__class__.__name__,
            first_name=repr(self.first_name),
            last_name=repr(self.last_name),
        )

Make a legal_names property returning a tuple which can be used to shorten code: “Simple is better than complex.” and “Readability counts.”

def __lt__(self, other):
    return self.legal_names < other.legal_names

def __eq__(self, other):
    return self.legal_names == other.legal_names

@property
def legal_names(self):
    return (self.last_name, self.first_name)

Point

from collections import namedtuple
import math


BasePoint = namedtuple('BasePoint', 'x y z')


class Point(BasePoint):

    """Three-dimensional point."""

    def distance(self, other):
        """Calculate the distance between two points."""
        a1 = self[0]
        b1 = self[1]
        c1 = self[2]
        a2 = other[0]
        b2 = other[1]
        c2 = other[2]
        return math.sqrt((a1-a2)**2+(b1-b2)**2+(c1-c2)**2)

    def shift(self, other):
        """Return new copy of our point, shifted by other."""
        a = self[0] + other[0]
        b = self[1] + other[1]
        c = self[2] + other[2]
        new_point = Point(a, b, c)
        return new_point

    def scale(self, value):
        """Scale Point by the given numeric value."""
        new_point = Point(self[0] * value, self[1] * value, self[2] * value)
        return new_point


if __name__ == "__main__":
    p1 = Point(1, 2, 3)
    p2 = Point(4, 5, 6)
    assert p2.shift(p1) == p1.shift(p2) == Point(5, 7, 9)
    assert p1.scale(2) == Point(2, 4, 6)
    assert p1.distance(p2) == p2.distance(p1) == 5.196152422706632
    print("Tests passed.")
  1. Use tuple unpacking to shorten code in distance and shift: “Readability counts.” and “Complex is better than complicated.”
  2. Split logic-heavy distance return line into two lines (using unpacking again): “Readability counts.”
  3. Remove unnecessary variables that are immediately returned: “Simple is better than complex.” and “Beautiful is better than ugly.”
  4. Use ability of namedtuple to reference named attributes to refactor scale function: “Explicit is better than implicit.”
  5. Rewrite scale line, swapping order of arithmetic to be more readable: “Readability counts.”
  6. Change shift and scale into more Pythonic + and * operators: “Complex is better than complicated.”
class Point(BasePoint):

    """Three-dimensional point."""

    def distance(self, other):
        """Calculate the distance between two points."""
        a1, b1, c1 = self
        a2, b2, c2 = other
        x, y, z = (a1 - a2), (b1 - b2), (c1 - c2)
        return math.sqrt(x**2 + y**2 + z**2)

    def __add__(self, other):
        """Return new copy of our point, shifted by other."""
        a1, b1, c1 = self
        a2, b2, c2 = other
        return Point((a1+a2), (b1+b2), (c1+c2))

    def __mul__(self, scalar):
        """Scale Point by the given numeric value."""
        return Point(scalar*self.x, scalar*self.y, scalar*self.z)

    def __rmul__(self, value):
        """Scale Point by the given numeric value."""
        return self.__mul__(value)

Our new tests:

if __name__ == "__main__":
    p1 = Point(1, 2, 3)
    p2 = Point(4, 5, 6)
    assert p2 + p1 == p1 + p2 == Point(5, 7, 9)
    assert p1 * 2 == 2 * p1 == Point(2, 4, 6)
    assert p1.distance(p2) == p2.distance(p1) == 5.196152422706632
    print("Tests passed.")

Usage

def print_usage():
    print("hello - print out a message\n")
    print("Usage: hello.py [-h] [-o OUTPUT]\n")
    print("Options:")
    print("-o OUTPUT, --output=OUTPUT       output the given string")
    print("-h --help                        show this help message and exit")


if __name__ == "__main__":
    print_usage()
  1. Convert multiple prints into a single print statement: “Simple is better than complex.”
  2. Use + to concatenate multiple strings, wrapping one long line with \: “Complex is better than complicated.”
  3. Add missing \n to break lines apart: “Readability counts.”
def print_usage():
    usage_message = "hello - print out a message\n\n" + \
        "Usage: hello.py [-h] [-o OUTPUT]\n\n" + \
        "Options:\n" + \
        "-o OUTPUT, --output=OUTPUT       output the given string\n" + \
        "-h --help                        show this help message and exit"
    print(usage_message)
  1. Use implicit line wrapping instead of \ to wrap lines: PEP 8 and “Readability counts.” and “Simple is better than complex.”
  2. Line up all strings: “Readability counts.”
  3. Remove + signs, relying on implicit string concatenation (warning: this may be seen as un-Pythonic by some): “Simple is better than complex.”
def print_usage():
    usage_message = (
        "hello - print out a message\n\n"
        "Usage: hello.py [-h] [-o OUTPUT]\n\n"
        "Options:\n"
        "-o OUTPUT, --output=OUTPUT       output the given string\n"
        "-h --help                        show this help message and exit"
    )
    print(usage_message)
  1. Use multi-line string: “Simple is better than complex.”
  2. Replace \n with actual newlines: “Simple is better than complex.”
  3. Strip whitespace from ends of the string to compensate for extra newlines: “Complex is better than complicated.”
def print_usage():
    usage_message = """
hello - print out a message

Usage: hello.py [-h] [-o OUTPUT]

Options:
-o OUTPUT, --output=OUTPUT       output the given string
-h --help                        show this help message and exit"""
    print(usage_message.strip())
  1. Make a list of strings, one for each line and join them with \n
  2. Remove now unnecessary strip
def print_usage():
    usage_message = "\n".join([
        "hello - print out a message",
        "",
        "Usage: hello.py [-h] [-o OUTPUT]",
        "",
        "Options:",
        "-o OUTPUT, --output=OUTPUT       output the given string",
        "-h --help                        show this help message and exit"
    ])
    print(usage_message)
  1. Make a multi-line docstring stating the usage information for this program: “Explicit is better than implicit.”
  2. Print out the docstring as usage, stripping extra whitespace: “Readability counts.” and “Simple is better than complex.”
"""
hello - print out a message

Usage: hello.py [-h] [-o OUTPUT]

Options:
-o OUTPUT, --output=OUTPUT       output the given string
-h --help                        show this help message and exit

"""


def print_usage():
    print(__doc__.strip())