Introduction to Files¶
Reading Files¶
Up to now, the only way we’ve used large amounts of data in our code is by putting it directly into the code.
Let’s learn how to read data from files.
Let’s read from the file declaration-of-independence.txt.
>>> declaration_file = open('declaration-of-independence.txt')
>>> print(declaration_file.read())
>>> declaration_file.close()
First we open the file, then we read the contents of the file and print them out, then we close the file.
Let’s make a program that reads a file and gives us statistics on its text.
import sys


def print_file_stats(filename):
    stat_file = open(filename)
    contents = stat_file.read()
    stat_file.close()
    word_count = len(contents.split())
    print("Number of Words: {}".format(word_count))


if __name__ == "__main__":
    filename = sys.argv[1]
    print_file_stats(filename)
Let’s try it out:
$ python file_stats.py declaration-of-independence.txt
Number of Words: 1342
It works!
Closing Files¶
We need to remember to always close our files. This isn’t as important when reading files, but it will be very important when writing files, because written data may sit in a buffer and not reach the disk until the file is closed.
We can make sure we always close our files by putting our file read and close in a try-finally block.
def print_file_stats(filename):
    stat_file = open(filename)
    try:
        contents = stat_file.read()
    finally:
        stat_file.close()
    word_count = len(contents.split())
    print("Number of Words: {}".format(word_count))
This will ensure that even if an exception is raised while reading the file, our file will still be closed.
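As a rough illustration of that guarantee, here is a minimal sketch that forces a read to fail and then checks that the file was closed anyway. The temporary demo file and the names demo_path, demo_file, and read_failed are made up for this example, and encoding='utf-8' is passed explicitly only so that the failure is predictable.

```python
import os
import tempfile

# Hypothetical demo file holding a byte (0xFF) that is never valid UTF-8.
file_descriptor, demo_path = tempfile.mkstemp()
os.write(file_descriptor, b'\xff')
os.close(file_descriptor)

demo_file = open(demo_path, encoding='utf-8')
read_failed = False
try:
    try:
        demo_file.read()      # raises UnicodeDecodeError for this file
    finally:
        demo_file.close()     # runs whether or not read() raised
except UnicodeDecodeError:
    read_failed = True

print("read failed:", read_failed)
print("file closed:", demo_file.closed)

os.remove(demo_path)          # clean up the demo file
```

The closed attribute of a file object tells us whether close has been called on it, so it is a handy way to check this kind of cleanup.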
This is such a common concern in Python that the file objects returned by the open function can be used with a special syntax, the with statement:
def print_file_stats(filename):
    with open(filename) as stat_file:
        contents = stat_file.read()
    word_count = len(contents.split())
    print("Number of Words: {}".format(word_count))
This with block uses a context manager (here, the file object itself). Context managers allow us to ensure that particular cleanup tasks occur whenever a block of code is exited. Basically, as soon as our with block is exited, even because of an exception, the stat_file file will be closed.
We’ll learn how to make our own context managers in a future workshop.
From now on we will always use the context manager syntax for opening files.
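To see the with block closing a file for us, here is a small sketch. The temporary demo file and the variable names are assumptions made up for this example, not part of file_stats.py.

```python
import os
import tempfile

# Hypothetical demo file with a little plain ASCII text in it.
file_descriptor, demo_path = tempfile.mkstemp()
os.write(file_descriptor, b'some sample text\n')
os.close(file_descriptor)

with open(demo_path) as demo_file:
    closed_inside = demo_file.closed    # still open inside the block
    contents = demo_file.read()
closed_after = demo_file.closed         # closed once the block is exited

print("closed inside the block:", closed_inside)
print("closed after the block:", closed_after)

# Using the file after the block raises ValueError, because it is closed.
try:
    demo_file.read()
except ValueError as error:
    print("reading again failed:", error)

os.remove(demo_path)                    # clean up the demo file
```

Note that the name demo_file still exists after the block; only the underlying file has been closed.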
The last things we’ll learn about are the mode and encoding arguments. Files are opened in read text mode ('rt') by default. The encoding defaults to a system-dependent value. This is “utf-8” on my machine, but it can be different on yours.
Let’s make our code a little more explicit about these values:
def print_file_stats(filename):
    with open(filename, mode='rt', encoding='utf-8') as stat_file:
        contents = stat_file.read()
    word_count = len(contents.split())
    print("Number of Words: {}".format(word_count))
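Here is a quick sketch of why being explicit about the encoding matters: the same bytes read back differently depending on the encoding we ask for. The demo file, the sample word, and the choice of 'latin-1' as the "wrong" encoding are all assumptions made up for this example.

```python
import os
import tempfile

# Hypothetical demo file: the word "café" plus a newline, stored as UTF-8.
file_descriptor, demo_path = tempfile.mkstemp()
os.write(file_descriptor, "caf\u00e9\n".encode('utf-8'))
os.close(file_descriptor)

# Reading with the encoding the file was written in gives the text back.
with open(demo_path, mode='rt', encoding='utf-8') as demo_file:
    as_utf8 = demo_file.read()

# Reading the same bytes as Latin-1 raises no error but garbles the text.
with open(demo_path, mode='rt', encoding='latin-1') as demo_file:
    as_latin1 = demo_file.read()

os.remove(demo_path)      # clean up the demo file

# ascii() escapes non-ASCII characters so the output is safe to print anywhere.
print(ascii(as_utf8))     # 'caf\xe9\n'
print(ascii(as_latin1))   # 'caf\xc3\xa9\n'
```

The second read turns the single character é into two characters, because UTF-8 stores it as two bytes and Latin-1 treats every byte as its own character.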
Writing Files¶
Let’s make a program that writes to a file now.
To write to a file, we need to open it with a w in the mode argument:
>>> with open('test.txt', mode='wt', encoding='utf-8') as test_file:
...     print("Hello world!", file=test_file)
...
So we can print straight to a file. This is pretty convenient.
We actually don’t see this very often, though.
What we see more often is this:
>>> with open('test.txt', mode='wt', encoding='utf-8') as test_file:
...     test_file.write("Hello world!\n")
...
13
The write method on our file object writes exactly the characters we give it to the file; unlike print, it does not add a newline for us. It returns the number of characters it wrote to the file.
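The sketch below puts write and print side by side, and also shows that opening an existing file in 'wt' mode replaces its previous contents. The temporary directory and the variable names are made up for this example.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as temp_dir:
    # Hypothetical demo path inside a throwaway directory.
    demo_path = os.path.join(temp_dir, 'test.txt')

    # First write: the file is created and receives one line.
    with open(demo_path, mode='wt', encoding='utf-8') as test_file:
        first_count = test_file.write("A fairly long first line\n")

    # Second write: opening with 'wt' again empties the file first.
    with open(demo_path, mode='wt', encoding='utf-8') as test_file:
        second_count = test_file.write("Hello")   # no newline is added
        print(" world!", file=test_file)          # print adds the newline

    with open(demo_path, mode='rt', encoding='utf-8') as test_file:
        contents = test_file.read()

print(first_count, second_count)
print(repr(contents))
```

Only the text from the second with block is left in the file, and the two return values are simply the lengths of the strings passed to write.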
File Exercises¶
Count¶
Write a program that accepts a file as an argument and outputs the number of lines, words, and characters in the file.
$ python count.py my_file.txt
Lines: 2
Words: 6
Characters: 28
Longest Line: 17
Bonus: also output the maximum line length (the Longest Line value in the example above).
Reverse¶
Write a program that reverses a file character-by-character and outputs the newly reversed text into a new file.
Example:
If my_file.txt contains:
This file
is two lines long
Running:
$ python reverse.py my_file.txt elif_ym.txt
Should make elif_ym.txt contain:
gnol senil owt si
elif sihT
Hint: review some of the interesting ways that slicing works.
Concatenate¶
Write a program concat.py that takes any number of files as command-line arguments and sticks the files together, printing them to standard output.
If an error occurs while reading a file, the file should be skipped and an error should be printed.
Bonus: print the error messages to standard error (not standard output).
Tip
You can print to standard error like this:
>>> import sys
>>> print("this is an error", file=sys.stderr)
this is an error
Example usage of concat.py:
$ python concat.py file1.txt file2.txt file3.txt
This is file 1
[Errno 2] No such file or directory: 'file2.txt'
This is file 3
Sort¶
Write a program sort.py which takes a file as input and sorts the lines of the file (ASCIIbetically). The original file should be overwritten.
Example:
$ python sort.py names.txt
If file names.txt started out as:
John Licea
Freddy Colella
James Stell
Mary Carr
Doris Romito
Janet Allen
Suzanne Blevins
Chris Moczygemba
Shawn McCarty
Jennette Holt
It should end up as:
Chris Moczygemba
Doris Romito
Freddy Colella
James Stell
Janet Allen
Jennette Holt
John Licea
Mary Carr
Shawn McCarty
Suzanne Blevins