CSV Files¶
CSV Reader¶
Python’s standard library includes a csv module with utilities for helping us read and write CSV files. Let’s use this to parse us-state-capitals.csv.
>>> import csv
>>> capitals = {}
>>> with open('us-state-capitals.csv') as csv_file:
... csv_reader = csv.reader(csv_file)
... for state, capital in csv_reader:
... capitals[state] = capital
>>> capitals
{'Idaho': 'Boise', 'Kentucky': 'Frankfort', 'California': 'Sacramento', 'Rhode Island': 'Providence', 'Mississippi': 'Jackson', 'Florida': 'Tallahassee', 'New Mexico': 'Santa Fe', 'Oregon': 'Salem', 'North Carolina': 'Raleigh', 'Wisconsin': 'Madison', 'Nebraska': 'Lincoln', 'Alaska': 'Juneau', 'South Carolina': 'Columbia', 'Delaware': 'Dover', 'Massachusetts': 'Boston', 'Maryland': 'Annapolis', 'Georgia': 'Atlanta', 'Indiana': 'Indianapolis', 'Missouri': 'Jefferson City', 'Oklahoma': 'Oklahoma City', 'South Dakota': 'Pierre', 'Utah': 'Salt Lake City', 'Louisiana': 'Baton Rouge', 'Alabama': 'Montgomery', 'state': 'capital', 'Maine': 'Augusta', 'Connecticut': 'Hartford', 'Ohio': 'Columbus', 'Vermont': 'Montpelier', 'Montana': 'Helena', 'Washington': 'Olympia', 'West Virginia': 'Charleston', 'Hawaii': 'Honolulu', 'Wyoming': 'Cheyenne', 'Tennessee': 'Nashville', 'Arizona': 'Phoenix', 'Pennsylvania': 'Harrisburg', 'Texas': 'Austin', 'Illinois': 'Springfield', 'New Hampshire': 'Concord', 'Nevada': 'Carson City', 'New York': 'Albany', 'Kansas': 'Topeka', 'Arkansas': 'Little Rock', 'Virginia': 'Richmond', 'Iowa': 'Des Moines', 'Minnesota': 'St. Paul', 'Michigan': 'Lansing', 'Colorado': 'Denver', 'New Jersey': 'Trenton', 'North Dakota': 'Bismarck'}
>>> len(capitals)
51
>>> capitals['state']
'capital'
We ended up with 51 state-capital pairs because the first row of the file is the header row state,capital. When there is a header row in the CSV file, we want to grab it first before we read the data.
We can change our code to read the header first, like this:
>>> import csv
>>> capitals = {}
>>> with open('us-state-capitals.csv') as csv_file:
... csv_reader = csv.reader(csv_file)
... headers = next(csv_reader)
... for state, capital in csv_reader:
... capitals[state] = capital
>>> headers
['state', 'capital']
>>> len(capitals)
50
CSV Writer¶
>>> import csv
>>> colors = [("purple", "0.15"), ("indigo", "0.25"), ("red", "0.3"), ("blue", "0.05"), ("green", "0.25")]
>>> with open('colors.csv', mode='wt') as colors_file:
... csv_writer = csv.writer(colors_file)
... for color, ratio in colors:
... csv_writer.writerow((color, ratio))
If we read from colors.csv, we’ll see our colors were written to it.
>>> with open('colors.csv') as colors_file:
... print(colors_file.read())
...
purple,0.15
indigo,0.25
red,0.3
blue,0.05
green,0.25
Delimiters and Quote Characters¶
When we are writing or reading a CSV file, it is important to understand the delimiters and quote characters. Of course, the usual delimiter is , and the quote charater is ".
But it is possible to use different ones, if we want to. Here is a silly example:
>>> import csv
>>> with open('eggs.csv', 'w') as csvfile:
... spamwriter = csv.writer(csvfile, delimiter=' ', quotechar='|')
... spamwriter.writerow(['Spam'] * 5 + ['Baked Beans'])
... spamwriter.writerow(['Spam', 'Lovely Spam', 'Wonderful Spam'])
...
We use a space as a delimiter and | as the quote character. The csv module only uses the quote characters when they are required, so not every item is quoted, only the ones that have the delimiter character within the string. If we go to the command line and display the file, it looks like this:
$ cat eggs.csv
Spam Spam Spam Spam Spam |Baked Beans|
Spam |Lovely Spam| |Wonderful Spam|
CSV Headers¶
So CSV reader gives us a list of lists and CSV writer expects us to pass it something similar.
What if we want to refer to our CSV rows by their column names? We can use DictReader and DictWriter for that.
>>> import csv
>>> capitals = {}
>>> with open('us-state-capitals.csv') as csv_file:
... reader = csv.DictReader(csv_file)
... for row in reader:
... capitals[row['state']] = row['capital']
>>> len(capitals)
50
The DictReader knows to interpret the first row of the file as headers for the file, and you can reference the header names. So, for example, if you wanted to have your dictionary reversed, with the capitals as keys, you could change the code like this:
>>> import csv
>>> capitals = {}
>>> with open('us-state-capitals.csv') as csv_file:
... reader = csv.DictReader(csv_file)
... for row in reader:
... capitals[row['capital']] = row['state']
...
>>> capitals
{'Salem': 'Oregon', 'Nashville': 'Tennessee', 'Little Rock': 'Arkansas', 'Topeka': 'Kansas', 'Harrisburg': 'Pennsylvania', 'Albany': 'New York', 'Jefferson City': 'Missouri', 'Denver': 'Colorado', 'Boston': 'Massachusetts', 'Juneau': 'Alaska', 'Helena': 'Montana', 'Lansing': 'Michigan', 'Pierre': 'South Dakota', 'Springfield': 'Illinois', 'Boise': 'Idaho', 'Jackson': 'Mississippi', 'Madison': 'Wisconsin', 'Phoenix': 'Arizona', 'Tallahassee': 'Florida', 'Richmond': 'Virginia', 'Des Moines': 'Iowa', 'Montgomery': 'Alabama', 'Baton Rouge': 'Louisiana', 'Olympia': 'Washington', 'Providence': 'Rhode Island', 'Columbia': 'South Carolina', 'Oklahoma City': 'Oklahoma', 'Santa Fe': 'New Mexico', 'Trenton': 'New Jersey', 'Hartford': 'Connecticut', 'Carson City': 'Nevada', 'Frankfort': 'Kentucky', 'Dover': 'Delaware', 'Augusta': 'Maine', 'Sacramento': 'California', 'Salt Lake City': 'Utah', 'Cheyenne': 'Wyoming', 'St. Paul': 'Minnesota', 'Honolulu': 'Hawaii', 'Atlanta': 'Georgia', 'Indianapolis': 'Indiana', 'Montpelier': 'Vermont', 'Charleston': 'West Virginia', 'Concord': 'New Hampshire', 'Austin': 'Texas', 'Annapolis': 'Maryland', 'Raleigh': 'North Carolina', 'Lincoln': 'Nebraska', 'Bismarck': 'North Dakota', 'Columbus': 'Ohio'}
When using csv.DictWriter to write a CSV file, we have to specify the field names in order:
>>> colors = [("purple", "0.15"), ("indigo", "0.25"), ("red", "0.3"), ("blue", "0.05"), ("green", "0.25")]
>>> color_dicts = [{'color': c, 'ratio': r} for c, r in colors]
>>> color_dicts
[{'color': 'purple', 'ratio': '0.15'}, {'color': 'indigo', 'ratio': '0.25'}, {'color': 'red', 'ratio': '0.3'}, {'color': 'blue', 'ratio': '0.05'}, {'color': 'green', 'ratio': '0.25'}]
>>> with open('colors.csv', mode='wt') as color_file:
... writer = csv.DictWriter(color_file, fieldnames=['ratio', 'color'])
... writer.writeheader()
... for row in color_dicts:
... writer.writerow(row)
We can see that the ratio column was written first because that’s how we ordered the columns in fieldnames:
>>> with open('colors.csv') as color_file:
... print(color_file.read())
...
ratio,color
0.15,purple
0.25,indigo
0.3,red
0.05,blue
0.25,green
CSV Exercises¶
Re-order¶
Read a CSV file, swap the first and second columns, and write the CSV back out to a new file.
Re-sort¶
Read a CSV file, sort the file by the second column, and write the file back.
Tab to Comma¶
Write a program that reads a CSV file using tabs as delimiters and converts it to one using commas.
Capital Guessing¶
Write a guessing game program that takes a list of locations and their capitals and quizzes us on capitals.
You might want to test this program using us-state-capitals.csv.