Collections

namedtuple

Let’s make a function that splits a name at the last space character:

>>> def split_name(name):
...     first, last = name.rsplit(" ", 1)
...     return first, last
...

If someone calls our function, they’ll have a tuple of two values. They could probably guess what the values are, but what if they weren’t sure?

>>> split_name("Trey Hunner")
('Trey', 'Hunner')

We can use namedtuple to make the returned value more informative. We create the Name type once, outside the function, so that a new class isn't built on every call:

>>> from collections import namedtuple
>>> Name = namedtuple('Name', ['first', 'last'])
>>> def split_name(name):
...     first, last = name.rsplit(" ", 1)
...     return Name(first, last)
...
>>> split_name("Trey Hunner")
Name(first='Trey', last='Hunner')

We can use this new value just like a tuple, but we can also access its values as attributes:

>>> name = split_name("Trey Hunner")
>>> first_name, last_name = name
>>> first_name
'Trey'
>>> last_name
'Hunner'
>>> name.first
'Trey'
>>> name.last
'Hunner'
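Because split_name uses rsplit with a maxsplit of 1, it splits only at the last space, so everything before that space ends up in the first name. Here is a minimal script version, using a made-up three-word name to show this:

```python
from collections import namedtuple

# Define the namedtuple type once, at module level
Name = namedtuple('Name', ['first', 'last'])


def split_name(name):
    """Split a full name at its last space into (first, last)."""
    first, last = name.rsplit(" ", 1)
    return Name(first, last)


name = split_name("Mary Ann Smith")
print(name)       # Name(first='Mary Ann', last='Smith')
print(name.last)  # Smith
```

Note that a name with no space at all would raise ValueError, because rsplit returns a single item that can't be unpacked into two variables.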

Whenever you need to bundle a few related values together, think of namedtuple. The field names can also be given as a single space-separated string:

>>> LatLong = namedtuple('LatLong', 'lat long')
>>> location = LatLong(lat=32.733999, long=-117.147777)
>>> location
LatLong(lat=32.733999, long=-117.147777)
>>> location.lat
32.733999
>>> lat, long = location

We capitalize namedtuple names because namedtuple creates a new class (a subclass of tuple), just as if you had written a class statement yourself:

>>> type(LatLong)
<class 'type'>
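To confirm that namedtuple really produces a class, and specifically a subclass of tuple, here is a short sketch reusing the LatLong type from above. _fields is a documented namedtuple attribute that lists the field names:

```python
from collections import namedtuple

LatLong = namedtuple('LatLong', 'lat long')
location = LatLong(lat=32.733999, long=-117.147777)

# namedtuple returns a new class...
print(isinstance(LatLong, type))    # True
# ...which subclasses tuple, so indexing and unpacking still work
print(issubclass(LatLong, tuple))   # True
print(location[0] == location.lat)  # True
# The field names are stored on the class
print(LatLong._fields)              # ('lat', 'long')

# Like any tuple, instances are immutable
try:
    location.lat = 0
except AttributeError:
    print("namedtuple fields are read-only")
```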

defaultdict

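A defaultdict is a dictionary that calls a factory function (often a type such as int or list) to create a default value for any missing key. When you look up a key that isn't in the dictionary yet, the default value is created and stored for that key automatically.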

Let’s count how many animals of each type appear in a list of (name, type) pairs:

>>> from collections import defaultdict
>>> animals = [("agatha", "dog"), ("kurt", "cat"), ("margaret", "mouse"), ("cory", "cat"), ("mary", "mouse")]
>>> animal_counts = defaultdict(int)
>>> for name, animal_type in animals:
...     animal_counts[animal_type] += 1
...
>>> animal_counts
defaultdict(<class 'int'>, {'dog': 1, 'cat': 2, 'mouse': 2})
>>> animal_counts['dog']
1
>>> animal_counts['cat']
2

Just looking up a missing key adds it to the dictionary:

>>> 'bird' in animal_counts
False
>>> animal_counts['bird']
0
>>> animal_counts
defaultdict(<class 'int'>, {'dog': 1, 'cat': 2, 'mouse': 2, 'bird': 0})
>>> 'bird' in animal_counts
True

A defaultdict is a subclass of dict, so it has all the same methods as a standard dict:

>>> animal_counts.pop("bird")
0
>>> animal_counts
defaultdict(<class 'int'>, {'dog': 1, 'cat': 2, 'mouse': 2})
>>> animal_counts.keys()
dict_keys(['dog', 'cat', 'mouse'])
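Only square-bracket lookups create missing keys. This short sketch (with made-up counts) shows that .get() and the in operator leave a defaultdict unchanged, and that the factory passed to the constructor is available as the default_factory attribute:

```python
from collections import defaultdict

animal_counts = defaultdict(int)
animal_counts['dog'] += 1

# Square-bracket lookup of a missing key calls int() -> 0 and stores the result
print(animal_counts['bird'])    # 0
print('bird' in animal_counts)  # True

# .get() and `in` never call the factory, so no key is added
print(animal_counts.get('fish'))  # None
print('fish' in animal_counts)    # False

# The factory given to the constructor is kept on default_factory
print(animal_counts.default_factory)  # <class 'int'>
```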

Let’s see what happens when we use list as our constructor for a defaultdict:

>>> things = defaultdict(list)
>>> things['red']
[]
>>> things['red'].append("ball")
>>> things['red']
['ball']
>>> things['red'].append("ball")
>>> things['red']
['ball', 'ball']
>>> things
defaultdict(<class 'list'>, {'red': ['ball', 'ball']})
>>> things['blue'].append("shoe")
>>> things
defaultdict(<class 'list'>, {'red': ['ball', 'ball'], 'blue': ['shoe']})
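For comparison, here is the same grouping written with a plain dict and its setdefault method, next to the defaultdict(list) version. This is a sketch with a made-up list of (color, item) pairs:

```python
from collections import defaultdict

pairs = [("red", "ball"), ("red", "ball"), ("blue", "shoe")]

# Plain dict: setdefault inserts an empty list the first time a key is seen
plain = {}
for color, item in pairs:
    plain.setdefault(color, []).append(item)

# defaultdict(list): missing keys get a fresh empty list automatically
things = defaultdict(list)
for color, item in pairs:
    things[color].append(item)

print(plain)         # {'red': ['ball', 'ball'], 'blue': ['shoe']}
print(things == plain)  # True
```

One difference worth noting: setdefault builds a new empty list on every call, even when the key already exists, while defaultdict only calls list() for keys that are actually missing.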

What if we want to group the animals by type? For example, we could make a dictionary that has animal types as keys and collections of animal names as values.

We could make a defaultdict whose default value is a new, empty set. We’re using a set because the order of these animal names doesn’t matter and no two animals share a name.

>>> animals
[('agatha', 'dog'), ('kurt', 'cat'), ('margaret', 'mouse'), ('cory', 'cat'), ('mary', 'mouse')]
>>> animals_by_type = defaultdict(set)
>>> for name, animal_type in animals:
...     animals_by_type[animal_type].add(name)
...
>>> animals_by_type
defaultdict(<class 'set'>, {'dog': {'agatha'}, 'cat': {'kurt', 'cory'}, 'mouse': {'margaret', 'mary'}})
>>> animals_by_type['cat']
{'kurt', 'cory'}
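The grouping loop above can be wrapped in a small reusable function. The name group_by_type is hypothetical (it is not part of the collections module), and returning a plain dict is a design choice in this sketch so that later lookups of missing types raise KeyError instead of silently adding empty sets:

```python
from collections import defaultdict


def group_by_type(pairs):
    """Hypothetical helper: group (name, animal_type) pairs into {type: set_of_names}."""
    groups = defaultdict(set)
    for name, animal_type in pairs:
        groups[animal_type].add(name)
    # Convert to a regular dict so missing keys are no longer auto-created
    return dict(groups)


animals = [("agatha", "dog"), ("kurt", "cat"), ("margaret", "mouse"),
           ("cory", "cat"), ("mary", "mouse")]
animals_by_type = group_by_type(animals)
print(sorted(animals_by_type['cat']))  # ['cory', 'kurt']
```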

Whenever you see code that checks whether a key is missing from a dictionary and adds it if so, consider using a defaultdict instead.

For example, let’s count the number of occurrences of each word containing a “P” in the poem Peter Piper:

>>> poem = """Peter Piper picked a peck of pickled peppers.
... A peck of pickled peppers Peter Piper picked.
... If Peter Piper picked a peck of pickled peppers,
... Where's the peck of pickled peppers that Peter Piper picked?"""

We could solve this with an if statement that checks whether each word is already in the dictionary:

>>> p_words = {}
>>> for word in poem.split():
...     if "p" in word.lower():
...         if word not in p_words:
...             p_words[word] = 0
...         p_words[word] += 1
...
>>> p_words
{'Peter': 4, 'Piper': 4, 'picked': 2, 'peck': 4, 'pickled': 4, 'peppers.': 1, 'peppers': 2, 'picked.': 1, 'peppers,': 1, 'picked?': 1}

We could do this with a defaultdict instead, which removes the inner membership check:

>>> p_words = defaultdict(int)
>>> for word in poem.split():
...     if "p" in word.lower():
...         p_words[word] += 1
...
>>> p_words
defaultdict(<class 'int'>, {'Peter': 4, 'Piper': 4, 'picked': 2, 'peck': 4, 'pickled': 4, 'peppers.': 1, 'peppers': 2, 'picked.': 1, 'peppers,': 1, 'picked?': 1})

The default factory doesn’t have to return the same value every time. For example, it could pick a random value:

>>> import random
>>> from collections import defaultdict
>>> default_colors = ['red', 'green', 'blue', 'purple']
>>> random.choice(default_colors)
'red'
>>> random.choice(default_colors)
'blue'
>>> avatar_colors = defaultdict(lambda: random.choice(default_colors))
>>> avatar_colors["trey"] = "red"
>>> avatar_colors
defaultdict(<function <lambda> at 0x7f912cd816a8>, {'trey': 'red'})
>>> avatar_colors["peter"]
'red'
>>> avatar_colors
defaultdict(<function <lambda> at 0x7f912cd816a8>, {'trey': 'red', 'peter': 'red'})
>>> avatar_colors["diane"]
'green'
>>> avatar_colors
defaultdict(<function <lambda> at 0x7f912cd816a8>, {'trey': 'red', 'peter': 'red', 'diane': 'green'})
>>> avatar_colors["diane"] = "purple"
>>> avatar_colors
defaultdict(<function <lambda> at 0x7f912cd816a8>, {'trey': 'red', 'peter': 'red', 'diane': 'purple'})
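Keys that already exist, including ones assigned explicitly like "trey" and "diane" above, never trigger the factory. In this sketch, pick_color is a hypothetical factory that records each call so you can see how often it runs:

```python
import random
from collections import defaultdict

default_colors = ['red', 'green', 'blue', 'purple']
factory_calls = []


def pick_color():
    """Hypothetical default factory: record the call, then pick a random color."""
    factory_calls.append(1)
    return random.choice(default_colors)


avatar_colors = defaultdict(pick_color)
avatar_colors["trey"] = "red"    # explicit assignment: factory not called
first = avatar_colors["peter"]   # missing key: factory called once
second = avatar_colors["peter"]  # existing key: stored value reused

print(len(factory_calls))        # 1
print(first == second)           # True
```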

Counter

A Counter is a special-purpose dictionary for keeping track of how many times things occur.

>>> from collections import Counter
>>> coins = ["quarter", "dime", "quarter", "penny", "nickel", "dime", "penny", "quarter"]
>>> coin_counts = Counter(coins)
>>> coin_counts
Counter({'quarter': 3, 'dime': 2, 'penny': 2, 'nickel': 1})

You can count the items of any iterable, including a string:

>>> letters = Counter("hello world")
>>> letters
Counter({'l': 3, 'o': 2, 'h': 1, 'e': 1, ' ': 1, 'w': 1, 'r': 1, 'd': 1})

You can also ask for the most common items:

>>> letters.most_common(3)
[('l', 3), ('o', 2), ('h', 1)]
>>> coin_counts.most_common(1)
[('quarter', 3)]
>>> coin_counts.most_common()
[('quarter', 3), ('dime', 2), ('penny', 2), ('nickel', 1)]

The elements method returns an iterator that repeats each element as many times as its count:

>>> coin_counts.elements()
<itertools.chain object at 0x7f912cd759e8>
>>> list(coin_counts.elements())
['quarter', 'quarter', 'quarter', 'dime', 'dime', 'penny', 'penny', 'nickel']

Equal elements are grouped together, and in Python 3.7 and later the groups come out in the order each element was first encountered:

>>> "".join(letters.elements())
'hellloo wrd'

We can change the counts in a Counter. Note that update adds to the existing counts (unlike dict.update, which replaces values) and subtract reduces them:

>>> coin_counts
Counter({'quarter': 3, 'dime': 2, 'penny': 2, 'nickel': 1})
>>> coin_counts.update(["penny", "penny"])
>>> coin_counts
Counter({'penny': 4, 'quarter': 3, 'dime': 2, 'nickel': 1})
>>> coin_counts.subtract(["nickel", "penny", "penny"])
>>> coin_counts
Counter({'quarter': 3, 'dime': 2, 'penny': 2, 'nickel': 0})

We can also add or change counts directly:

>>> coin_counts
Counter({'quarter': 3, 'dime': 2, 'penny': 2, 'nickel': 0})
>>> coin_counts['nickel'] = 1
>>> coin_counts
Counter({'quarter': 3, 'dime': 2, 'penny': 2, 'nickel': 1})
>>> coin_counts["toonie"] = 2
>>> coin_counts
Counter({'quarter': 3, 'dime': 2, 'penny': 2, 'toonie': 2, 'nickel': 1})
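This small sketch (with made-up coin lists) contrasts Counter.update, which adds counts and accepts either an iterable or a mapping, with dict.update on a plain dictionary, which replaces values. It also shows that subtract keeps keys whose counts drop to zero or below:

```python
from collections import Counter

coin_counts = Counter(["quarter", "dime", "quarter"])

# Counter.update adds counts, whether given an iterable or a mapping
coin_counts.update(["dime"])
coin_counts.update({"quarter": 2})
print(coin_counts)  # Counter({'quarter': 4, 'dime': 2})

# subtract can leave zero or negative counts in place
coin_counts.subtract({"dime": 3})
print(coin_counts["dime"])  # -1

# dict.update on a plain dict replaces the old value instead of adding
plain = {"quarter": 2}
plain.update({"quarter": 2})
print(plain)  # {'quarter': 2}
```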

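Like a defaultdict, a Counter returns zero for missing keys. Unlike a defaultdict, though, just looking up a missing key doesn’t add it; assigning to it does: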

>>> letters['z']
0
>>> letters['z'] += 10
>>> letters.most_common(1)
[('z', 10)]
>>> letters['z']
10
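This sketch puts the two behaviors side by side: a Counter and a defaultdict(int) both return 0 for a missing key, but only the defaultdict stores that key on lookup:

```python
from collections import Counter, defaultdict

letters = Counter("hello world")
# A Counter returns 0 for a missing key but does not store it
print(letters['z'])    # 0
print('z' in letters)  # False

# A defaultdict(int) also returns 0, but stores the new key
counts = defaultdict(int)
print(counts['z'])     # 0
print('z' in counts)   # True

# Augmented assignment does store the key in a Counter
letters['z'] += 10
print(letters['z'])    # 10
```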

Remember the earlier example where we counted each word containing a “P” in the Peter Piper poem? That’s even easier with a Counter.

First let’s make the poem:

>>> poem = """Peter Piper picked a peck of pickled peppers.
... A peck of pickled peppers Peter Piper picked.
... If Peter Piper picked a peck of pickled peppers,
... Where's the peck of pickled peppers that Peter Piper picked?"""

Now we can find the counts of all P words with just one line of code:

>>> p_words = Counter(w for w in poem.split() if "p" in w.lower())
>>> p_words
Counter({'Peter': 4, 'Piper': 4, 'peck': 4, 'pickled': 4, 'picked': 2, 'peppers': 2, 'peppers.': 1, 'picked.': 1, 'peppers,': 1, 'picked?': 1})
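Notice that punctuation makes "peppers." and "peppers" count as different words. One option, which is our own choice here rather than part of the original example, is to strip leading and trailing punctuation with str.strip and string.punctuation before counting:

```python
import string
from collections import Counter

poem = """Peter Piper picked a peck of pickled peppers.
A peck of pickled peppers Peter Piper picked.
If Peter Piper picked a peck of pickled peppers,
Where's the peck of pickled peppers that Peter Piper picked?"""

# Strip punctuation from both ends of each word, then keep words containing "p"
words = (w.strip(string.punctuation) for w in poem.split())
p_words = Counter(w for w in words if "p" in w.lower())
print(p_words)
# Counter({'Peter': 4, 'Piper': 4, 'picked': 4, 'peck': 4, 'pickled': 4, 'peppers': 4})
```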

Other collections

A few other interesting collections:

Collection Exercises

Count Words

  1. Open a file containing the Declaration of Independence
  2. Create a dictionary recording the number of times each two-letter word occurs in the Declaration of Independence. The keys should be the words and the values should be their counts.

As a bonus exercise, remove words that are not alphanumeric.

Hint

Consider using a Counter or defaultdict from the collections module.

Most common

Create a function that accepts a list of iterables and returns a set of the items that occur most often across all of the given iterables (every item tied for the highest count).

>>> most_common([{1, 2}, {2, 3}, {3, 4}])
{2, 3}

Hint

Consider using a Counter or defaultdict from the collections module.

Flipping Dictionary of Lists

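Write a function that takes a dictionary of lists (or other iterables, such as the sets below) and returns a new dictionary that maps each item to a list of the original keys it appeared under.

Example (because the values are sets, the order of keys in the result may vary):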

>>> restaurants_by_people = {
...     'diane': {'Siam Nara', 'Punjabi Tandoor', 'Opera'},
...     'peter': {'Karl Strauss', 'Opera', 'Habaneros'},
...     'trey': {'Habaneros', 'Karl Strauss', 'Opera', 'Punjabi Tandoor'},
... }
>>> favorite_restaurants = flip_dict_of_lists(restaurants_by_people)
>>> favorite_restaurants
{'Siam Nara': ['diane'], 'Karl Strauss': ['peter', 'trey'], 'Opera': ['diane', 'peter', 'trey'], 'Punjabi Tandoor': ['diane', 'trey'], 'Habaneros': ['peter', 'trey']}

Hint

Consider using a defaultdict from the collections module.

Deal Cards

Create three functions:

  • get_cards: returns a list of namedtuples representing cards. Each card should have a rank and a suit.
  • shuffle_cards: accepts a list of cards as its argument and shuffles the list of cards in-place.
  • deal_cards: accepts a list of cards and an optional number of cards to deal (5 by default), removes that many cards from the end of the list, and returns them.

Examples:

>>> deck = get_cards()
>>> deck[:14]
[Card(rank='A', suit='spades'), Card(rank='2', suit='spades'), Card(rank='3', suit='spades'), Card(rank='4', suit='spades'), Card(rank='5', suit='spades'), Card(rank='6', suit='spades'), Card(rank='7', suit='spades'), Card(rank='8', suit='spades'), Card(rank='9', suit='spades'), Card(rank='10', suit='spades'), Card(rank='J', suit='spades'), Card(rank='Q', suit='spades'), Card(rank='K', suit='spades'), Card(rank='A', suit='hearts')]
>>> len(deck)
52
>>> shuffle_cards(deck)
>>> deck[-5:]
[Card(rank='9', suit='diamonds'), Card(rank='6', suit='hearts'), Card(rank='7', suit='diamonds'), Card(rank='K', suit='spades'), Card(rank='7', suit='clubs')]
>>> hand = deal_cards(deck)
>>> hand
[Card(rank='9', suit='diamonds'), Card(rank='6', suit='hearts'), Card(rank='7', suit='diamonds'), Card(rank='K', suit='spades'), Card(rank='7', suit='clubs')]
>>> len(deck)
47
>>> deck[-5:]
[Card(rank='5', suit='spades'), Card(rank='Q', suit='clubs'), Card(rank='Q', suit='spades'), Card(rank='2', suit='diamonds'), Card(rank='6', suit='clubs')]

Bonus: Memory-efficient CSV

Using DictReader to read CSV files is convenient because CSV columns can be referenced by name (instead of by position). However, DictReader has some downsides. Before Python 3.6, column ordering was lost because each row was a plain dict, and dicts were unordered then. The space required to store each row is also unnecessarily large, because dictionaries are not a very space-efficient data structure.

There is discussion of adding a NamedTupleReader to the csv module, but this hasn’t actually happened yet.

In the meantime, it’s not too difficult to use a csv.reader object to open a CSV file and then use a namedtuple to represent each row.

Create a function parse_csv that accepts a file object which contains a CSV file (including a header row) and returns a list of namedtuples representing each row.

Example with us-state-capitals.csv:

>>> with open('us-state-capitals.csv') as csv_file:
...     csv_rows = parse_csv(csv_file)
...
>>> csv_rows[:3]
[Row(state='Alabama', capital='Montgomery'), Row(state='Alaska', capital='Juneau'), Row(state='Arizona', capital='Phoenix')]