File Objects
Fake Files
What if we want to read CSV data from a string?
>>> import csv
>>> csv_data = "1,2\n3,4"
>>> csv_reader = csv.reader(csv_data)
>>> for data in csv_reader:
...     print(data)
...
['1']
['', '']
['2']
[]
['3']
['', '']
['4']
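That’s not what we want. csv.reader accepts any iterable of strings and treats each string it gets as one line of the CSV file. Looping over a string gives us one character at a time, so each character was treated as its own line.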
So we could split our data by lines:
>>> csv_data = "purple,0.15\nindigo,0.25\nred,0.3\nblue,0.05\ngreen,0.25"
>>> csv_reader = csv.reader(csv_data.splitlines())
>>> colors = list(csv_reader)
>>> colors
[['purple', '0.15'], ['indigo', '0.25'], ['red', '0.3'], ['blue', '0.05'], ['green', '0.25']]
Neat!
What if we want to use the CSV writer to create CSV data in a string?
Unfortunately, the CSV writer needs a file object to write to. Fortunately, we can make a file-like object in Python.
>>> import csv
>>> from io import StringIO
>>> colors = [["purple", "0.15"], ["indigo", "0.25"], ["red", "0.3"], ["blue", "0.05"], ["green", "0.25"]]
>>> csv_file = StringIO()
>>> csv_writer = csv.writer(csv_file)
>>> for line in colors:
...     csv_writer.writerow(line)
...
13
13
9
11
12
>>> csv_data = csv_file.getvalue()
>>> print(csv_data)
purple,0.15
indigo,0.25
red,0.3
blue,0.05
green,0.25
>>>
Success! We have tricked Python into writing our CSV data into a file which isn’t really a file but is actually an in-memory file-like object.
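If we need CSV text as a string in more than one place, the steps above can be wrapped in a small function. This is only a sketch: rows_to_csv_string is a hypothetical name, not something the csv module provides. Note that csv.writer ends each row with \r\n by default (the "excel" dialect).

```python
import csv
from io import StringIO


def rows_to_csv_string(rows):
    """Hypothetical helper: return rows formatted as CSV text.

    The csv writer writes to an in-memory StringIO instead of a real file.
    """
    csv_file = StringIO()
    csv_writer = csv.writer(csv_file)
    csv_writer.writerows(rows)  # same as calling writerow for each row
    return csv_file.getvalue()


colors = [["purple", "0.15"], ["indigo", "0.25"], ["red", "0.3"]]
print(repr(rows_to_csv_string(colors)))
# 'purple,0.15\r\nindigo,0.25\r\nred,0.3\r\n'
```

Passing lineterminator="\n" to csv.writer would give plain \n line endings instead.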
StringIO objects support many of the same methods that file objects support:
>>> from io import StringIO
>>> fake_file = StringIO("hello")
>>> fake_file.read()
'hello'
>>> fake_file.tell()
5
>>> fake_file.write(' world')
6
>>> fake_file.seek(0)
0
>>> fake_file.read()
'hello world'
>>> fake_file.close()
>>> fake_file.read()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: I/O operation on closed file
>>> fake_file = StringIO("line 1\nline 2\nline 3")
>>> fake_file.readline()
'line 1\n'
>>> fake_file.readline()
'line 2\n'
StringIO objects are basically secret agent in-memory strings, pretending to be files.
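Because a StringIO object can be read from like a file, it also gives us another way to read CSV data from a string: hand the StringIO object straight to csv.reader instead of splitting the string ourselves. Unlike splitlines(), which strips out every line break, this keeps a line break that sits inside a quoted field as part of that field. A minimal sketch with made-up data:

```python
import csv
from io import StringIO

# The second field of the first row is quoted and contains a line break.
csv_data = 'purple,"deep\nviolet"\nred,0.3'

csv_reader = csv.reader(StringIO(csv_data))
rows = list(csv_reader)
print(rows)  # [['purple', 'deep\nviolet'], ['red', '0.3']]
```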
Standard Input/Output
We’ve learned that you can print to a file. We can also print to a StringIO object. It works just like a file this way.
>>> my_file = StringIO()
>>> print("hello world!", file=my_file)
>>> my_file.getvalue()
'hello world!\n'
What “file” does print write to by default? The answer is sys.stdout.
>>> print("hello world")
hello world
>>> import sys
>>> print("hello world", file=sys.stdout)
hello world
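Since print looks up sys.stdout each time it’s called, temporarily swapping sys.stdout for a StringIO object lets us capture printed output. The standard library’s contextlib.redirect_stdout does that swap for us and restores the original stream afterward. A minimal sketch:

```python
from contextlib import redirect_stdout
from io import StringIO

captured = StringIO()
with redirect_stdout(captured):
    # Inside this block sys.stdout is the StringIO object,
    # so print writes there instead of to the terminal.
    print("hello world")

output = captured.getvalue()
print(repr(output))  # 'hello world\n'
```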
There are three streams our program has to work with, two of them writable and one of them readable.
There’s sys.stdin which we can read from:
>>> sys.stdin.readline()
hello (we're typing this right now)
"hello (we're typing this right now)\n"
This is the standard input stream, which our program uses to read input typed by the user or data that’s piped or redirected into it from the command line.
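Because sys.stdin is just another readable file object, code that reads from it can be written as a function that accepts any readable file, which makes it easy to test with StringIO. This is a sketch; shout is a hypothetical function, not part of the standard library.

```python
import sys
from io import StringIO


def shout(input_file=None, output_file=None):
    """Hypothetical filter: copy lines from input_file to output_file, uppercased.

    The defaults are looked up at call time, so shout() with no arguments
    reads standard input and writes to standard output.
    """
    if input_file is None:
        input_file = sys.stdin
    if output_file is None:
        output_file = sys.stdout
    for line in input_file:  # file objects yield one line at a time
        output_file.write(line.upper())


# Testing with fake files instead of the real standard streams:
fake_input = StringIO("hello\nworld\n")
fake_output = StringIO()
shout(fake_input, fake_output)
print(repr(fake_output.getvalue()))  # 'HELLO\nWORLD\n'
```

In a script that calls shout() with no arguments, running something like echo hello | python shout.py (a hypothetical file name) would uppercase whatever is piped in.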
There’s sys.stdout which we can write to:
>>> sys.stdout.write('hello\n')
hello
6
This is the standard output stream which our program writes output to by default.
There’s also sys.stderr which we can write to:
>>> sys.stderr.write('hello\n')
hello
6
This is the standard error stream, which our program should write error messages to. Keeping errors out of standard output matters when we’re actually using that output for something, like piping it to another program or redirecting it to a file, since the error messages won’t get mixed in with the data.
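print accepts sys.stderr as its file argument, so sending error messages there is a small change. In this sketch (parse_prices and its input are made up for illustration), valid results go to standard output and problems go to standard error, so redirecting the output with > would still leave the error messages on the terminal.

```python
import sys


def parse_prices(lines):
    """Hypothetical example: return the prices that parse, reporting the rest."""
    prices = []
    for line in lines:
        try:
            prices.append(float(line))
        except ValueError:
            # Errors go to standard error so they don't mix with real output
            print(f"Skipping invalid price: {line!r}", file=sys.stderr)
    return prices


for price in parse_prices(["0.15", "oops", "0.3"]):
    print(price)  # normal output goes to standard output
```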
What would happen if we tried to write to stdin? Or read from stdout?
>>> import sys
>>> sys.stdin.write("hello world")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
io.UnsupportedOperation: not writable
>>> sys.stdout.read()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
io.UnsupportedOperation: not readable
We can’t write to standard input and we can’t read from standard output or standard error.
Files and file-like objects have readable and writable methods that we can use to check whether we can read from or write to them.
>>> sys.stdin.readable()
True
>>> sys.stdin.writable()
False
>>> sys.stdout.readable()
False
>>> sys.stdout.writable()
True
>>> sys.stderr.readable()
False
>>> sys.stderr.writable()
True
These methods work on other file objects too:
>>> from io import StringIO
>>> fake_file = StringIO()
>>> fake_file.readable()
True
>>> fake_file.writable()
True
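Real files report whatever their mode allows. A sketch using a throwaway file in a temporary directory (the name example.txt is arbitrary): a file opened with mode "w" is writable but not readable, and writing to a file opened for reading raises the same io.UnsupportedOperation error we saw with sys.stdin.

```python
import io
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "example.txt")

with open(path, mode="w") as my_file:
    write_mode = (my_file.readable(), my_file.writable())
    my_file.write("hello\n")

with open(path) as my_file:  # the default mode is "r" (read text)
    read_mode = (my_file.readable(), my_file.writable())
    try:
        my_file.write("world\n")
    except io.UnsupportedOperation as error:
        print("Can't write:", error)  # Can't write: not writable

print(write_mode)  # (False, True)
print(read_mode)   # (True, False)
```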
What makes a file?
>>> import sys
>>> import io
>>> fake_file = io.StringIO()
>>> my_file = open('my_file.txt')
>>> isinstance(my_file, io.TextIOBase)
True
>>> isinstance(fake_file, io.TextIOBase)
True
>>> isinstance(sys.stdout, io.TextIOBase)
True
Files we open from the file system in text mode, the standard input, output, and error streams, and StringIO objects all inherit from the io.TextIOBase class.
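That holds for text streams. Files opened in binary mode (with a "b" in the mode) and io.BytesIO objects, the binary counterpart of StringIO, are io.BufferedIOBase instances instead, and every one of these, text or binary, counts as an io.IOBase. A quick check, again using a throwaway temporary file whose name is arbitrary:

```python
import io
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "example.bin")

with open(path, mode="wb") as binary_file:
    binary_checks = (
        isinstance(binary_file, io.TextIOBase),      # not a text stream
        isinstance(binary_file, io.BufferedIOBase),  # a buffered binary stream
        isinstance(binary_file, io.IOBase),          # still a file object
    )

bytes_checks = (
    isinstance(io.BytesIO(), io.TextIOBase),
    isinstance(io.BytesIO(), io.BufferedIOBase),
)
print(binary_checks)  # (False, True, True)
print(bytes_checks)   # (False, True)
```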
Let’s see what’s in this class:
>>> help(io.TextIOBase)
So at this point you might assume that for an object to act like a file, it needs to inherit from io.TextIOBase. This is incorrect.
Let’s make a class that implements the bare minimum needed for Python’s print function to accept it as a file.
# fake_file.py
class FakeFile:
    """Don't actually use this... StringIO is better."""

    def __init__(self):
        self.contents = ""

    def write(self, data):
        # print() only needs a write method to treat this as a file
        self.contents += data
Now let’s import this and try it out:
>>> from fake_file import FakeFile
>>> fake = FakeFile()
>>> print("hello world", file=fake)
>>> fake.contents
'hello world\n'
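Python’s print function doesn’t check what type of object it’s writing to, and neither should you. In Python, we use duck typing: if an object has a write method, print can write to it as if it were a file.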
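To see duck typing in action, here is a sketch that repeats the FakeFile class so it runs on its own. FakeFile isn’t an io.TextIOBase at all, yet print accepts it; an object with no write method, such as a plain object(), is rejected with an AttributeError when print tries to look up write.

```python
import io


class FakeFile:
    """Same minimal file-like class as above: it only has a write method."""

    def __init__(self):
        self.contents = ""

    def write(self, data):
        self.contents += data


fake = FakeFile()
print("hello world", file=fake)
print(isinstance(fake, io.TextIOBase))  # False: no inheritance needed
print(repr(fake.contents))              # 'hello world\n'

try:
    print("hello world", file=object())  # object() has no write method
except AttributeError as error:
    print("Not file-like enough:", error)
```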
Our fake file object even works with csv.writer:
>>> import csv
>>> from fake_file import FakeFile
>>> fake = FakeFile()
>>> csv_writer = csv.writer(fake)
>>> colors = [("purple", "0.15"), ("indigo", "0.25"), ("red", "0.3"), ("blue", "0.05"), ("green", "0.25")]
>>> for line in colors:
...     csv_writer.writerow(line)
...
>>> print(fake.contents)
purple,0.15
indigo,0.25
red,0.3
blue,0.05
green,0.25
>>>
If you look through the help(io.TextIOBase) output from earlier, you’ll see some obviously file-related methods in there.
We’ve already learned about:
close: close a file
read: read contents from a file
write: write to a file
All files have these methods. All files can also be looped over.
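Looping over a file gives us one line at a time, including the trailing line break. StringIO objects support this too, so a function written to loop over a file can be tried out on a string. A sketch with a hypothetical count_nonblank_lines function:

```python
from io import StringIO


def count_nonblank_lines(file):
    """Hypothetical helper: count lines containing more than whitespace.

    Works with anything that yields lines when looped over: an open file,
    sys.stdin, or a StringIO object.
    """
    return sum(1 for line in file if line.strip())


fake_file = StringIO("line 1\n\nline 2\n   \nline 3")
print(count_nonblank_lines(fake_file))  # 3
```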
Files also have other methods for reading and writing:
readable: returns True if the file can be read
readline: reads and returns characters up to and including the next line break
writable: returns True if the file can be written to
Files also have methods for changing the current position that we’re reading from in the file:
seekable: returns True if the file position can be changed with seek
seek: changes the current position we’re reading from in the file
tell: returns the current position we’re reading from (as a number)
truncate: resizes the file stream to a given size
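Here is a sketch of those position methods on a StringIO object. For StringIO the positions are character offsets; for real text files, the numbers tell returns are opaque positions that are only meant to be passed back to seek.

```python
from io import StringIO

fake_file = StringIO("hello world")
print(fake_file.seekable())  # True

fake_file.seek(6)            # jump past "hello "
print(fake_file.read())      # world
print(fake_file.tell())      # 11: reading moved the position to the end

fake_file.truncate(5)        # keep only the first 5 characters
fake_file.seek(0)
print(fake_file.read())      # hello
```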
Other File-Like Objects
Here are a couple of other file-like things in the Python standard library.
HTTP responses:
>>> from urllib.request import urlopen
>>> with urlopen('http://th.mit-license.org/license.txt') as response:
...     license = response.read()
...
>>> license
b'The MIT License (MIT)\nCopyright \xc2\xa9 2015 Trey Hunner\n\nPermission is hereby granted, free of charge, to any person obtaining a copy\nof this software and associated documentation files (the \xe2\x80\x9cSoftware\xe2\x80\x9d), to deal\nin the Software without restriction, including without limitation the rights\nto use, copy, modify, merge, publish, distribute, sublicense, and/or sell\ncopies of the Software, and to permit persons to whom the Software is\nfurnished to do so, subject to the following conditions:\n\nThe above copyright notice and this permission notice shall be included in\nall copies or substantial portions of the Software.\n\nTHE SOFTWARE IS PROVIDED \xe2\x80\x9cAS IS\xe2\x80\x9d, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR\nIMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,\nFITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE\nAUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER\nLIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,\nOUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN\nTHE SOFTWARE.\n'
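The response gives us bytes, not a string. To work with it as text we can decode it (license.decode("utf-8")), or wrap the binary file-like object in io.TextIOWrapper, which decodes as we read. So this sketch doesn’t depend on the network, an io.BytesIO object stands in for the HTTP response, and UTF-8 is an assumption about the data. HTTP response objects are binary file-like objects too, so wrapping one the same way should also work.

```python
import io

# BytesIO stands in for a binary file-like object such as an HTTP response.
binary_file = io.BytesIO(
    "The MIT License (MIT)\nCopyright © 2015 Trey Hunner\n".encode("utf-8")
)

# TextIOWrapper decodes the bytes, giving us a text file we can loop over.
text_file = io.TextIOWrapper(binary_file, encoding="utf-8")
lines = [line.rstrip("\n") for line in text_file]
print(lines)  # ['The MIT License (MIT)', 'Copyright © 2015 Trey Hunner']
```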
Gzip files:
>>> import gzip
>>> with gzip.open('hello.txt.gz', mode='wt') as gzip_file:
...     gzip_file.write("hello world")
...
11
>>> with open('hello.txt.gz', mode='rb') as gzip_file:
...     print(gzip_file.read())
...
b'\x1f\x8b\x08\x08\xd2\xa2\x16V\x02\xffhello.txt\x00\xcaH\xcd\xc9\xc9W(\xcf/\xcaI\x01\x00\x00\x00\xff\xff\x03\x00\x85\x11J\r\x0b\x00\x00\x00'
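Reading the file in binary mode shows the compressed bytes; opening it with gzip.open in "rt" mode would give us "hello world" back. gzip.open also accepts an existing file object in place of a filename, so we can compress into an io.BytesIO object without touching the disk at all. A sketch:

```python
import gzip
import io

buffer = io.BytesIO()  # in-memory binary "file"

# Compress text into the buffer. Closing the gzip file writes the gzip trailer
# but leaves the BytesIO object open, because we passed it in ourselves.
with gzip.open(buffer, mode="wt", encoding="utf-8") as gzip_file:
    gzip_file.write("hello world")

compressed = buffer.getvalue()
print(compressed[:2])  # b'\x1f\x8b', the gzip magic number

buffer.seek(0)  # rewind before reading the compressed data back
with gzip.open(buffer, mode="rt", encoding="utf-8") as gzip_file:
    print(gzip_file.read())  # hello world
```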
File Exercises
Country Capitals CSV
Download this country capitals file.
Write a program that opens the file, extracts the country name, capital city, and population from each row, and writes a new file to disk in the following format:
country,capital,population
China,Beijing,1330044000
India,New Delhi,1173108018
United States,Washington,310232863
The country rows should be sorted by largest population first.
Echo
Write a program that downloads gzipped data from the Internet, extracts it, and saves it to disk, all without using a temporary file.
You can use this gzipped response: https://httpbin.org/gzip