This recipe presents a general-purpose class that acts as both a file object iterator and a file object proxy. It offers several iterator methods for reading a text file by characters, words, lines, paragraphs or blocks, and it forwards every other attribute access to the wrapped file object.

import re


class FileIterator(object):
    """ A general purpose file object iterator cum
    file object proxy """

    def __init__(self, fw):
        self._fw = fw

    # Attribute proxy for wrapped file object.
    # __getattr__ is called only when normal lookup fails, so
    # anything that reaches it is forwarded to the file object.
    def __getattr__(self, name):
        if name == '_fw':
            # _fw itself is missing; do not recurse
            raise AttributeError(name)
        # Raises AttributeError if the file object lacks the name
        return getattr(self._fw, name)

    def readlines(self):
        """ Line iterator """
        for line in self._fw:
            yield line

    def readwords(self):
        """ Word iterator. Newlines are omitted """
        # 'Words' are defined as those things
        # separated by whitespace.
        wspacere = re.compile(r'\s+')
        for line in self._fw:
            words = wspacere.split(line)
            for w in words:
                # Splitting leaves empty strings at line ends
                if w:
                    yield w

    def readchars(self):
        """ Character iterator """
        # Note: reads the whole file into memory first
        for c in self._fw.read():
            yield c

    def readblocks(self, block_size):
        """ Block iterator """
        while True:
            block = self._fw.read(block_size)
            # An empty read signals end of file
            if not block:
                break
            yield block

    def readparagraphs(self):
        """ Paragraph iterator """
        # This re-uses Alex Martelli's
        # paragraph reading recipe.
        # Python Cookbook 2nd edition 19.10, Page 713
        paragraph = []
        for line in self._fw:
            if line.isspace():
                if paragraph:
                    yield "".join(paragraph)
                    paragraph = []
            else:
                paragraph.append(line)
        if paragraph:
            yield "".join(paragraph)


if __name__ == "__main__":

    def dosomething(item):
        print(item, end=' ')

    try:
        fw = open("myfile.txt")
        fiter = FileIterator(fw)
        for item in fiter.readlines():
            dosomething(item)
        # Rewind - method will be
        # proxied to wrapped file object
        fiter.seek(0)
        for item in fiter.readblocks(100):
            dosomething(item)
        # Seek to a different position
        pos = 200
        fiter.seek(pos)
        for item in fiter.readwords():
            dosomething(item)
        fiter.close()
    except (OSError, IOError) as e:
        print(e)

The idea for this recipe came from Alex Martelli's recipe that supplies a generator to read a text file by paragraph (Recipe 19.10, Python Cookbook 2nd Edition, Page 713). I thought it would be nice to have a single class that provides different methods to read a file - by character, word, line, paragraph and custom-size blocks.
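To make the reading units concrete, here is a minimal standalone sketch of two of them, paragraphs and fixed-size blocks, run against an in-memory file. The function names `iter_paragraphs` and `iter_blocks` and the sample text are assumptions made for illustration; they are not part of the recipe, and `io.StringIO` merely stands in for a real text file.

```python
import io


def iter_paragraphs(fileobj):
    """Yield each run of non-blank lines joined into one string."""
    paragraph = []
    for line in fileobj:
        if line.isspace():
            # A blank line closes the current paragraph, if any
            if paragraph:
                yield "".join(paragraph)
                paragraph = []
        else:
            paragraph.append(line)
    # Flush the last paragraph when the file has no trailing blank line
    if paragraph:
        yield "".join(paragraph)


def iter_blocks(fileobj, block_size):
    """Yield successive chunks of at most block_size characters."""
    while True:
        block = fileobj.read(block_size)
        if not block:
            break
        yield block


# Hypothetical sample data, not taken from the recipe
sample = "first line\nsecond line\n\n\nthird line\n"
paragraphs = list(iter_paragraphs(io.StringIO(sample)))
blocks = list(iter_blocks(io.StringIO("abcdefgh"), 3))

print(paragraphs)  # ['first line\nsecond line\n', 'third line\n']
print(blocks)      # ['abc', 'def', 'gh']
```

Consecutive blank lines collapse into a single paragraph break, and the final block may be shorter than the requested size.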
One way to do this is to subclass the "file" type and implement the methods in the new type. This recipe instead uses an aggregator-cum-proxy design: it wraps an open file object and defines iterator methods on top of it. Any attribute the wrapper does not define itself is proxied to the wrapped file object, so you can use the iterator object to perform operations on the file directly, as the seek() and close() calls in the example show.
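The delegation idea can be sketched in isolation as follows. This is a minimal illustration under stated assumptions, not the recipe's class: the `Proxy` class and its `shout` method are hypothetical names, and `io.StringIO` stands in for an open file. In this sketch a name that neither the wrapper nor the wrapped object provides raises `AttributeError`.

```python
import io


class Proxy(object):
    """Hypothetical wrapper: one method of its own plus delegation."""

    def __init__(self, wrapped):
        self._wrapped = wrapped

    def shout(self):
        # Defined on the wrapper itself, so __getattr__ is not involved
        return self._wrapped.read().upper()

    def __getattr__(self, name):
        # Python calls __getattr__ only after normal lookup has failed
        if name == "_wrapped":
            # Guard against recursion if _wrapped was never set
            raise AttributeError(name)
        return getattr(self._wrapped, name)


p = Proxy(io.StringIO("hello"))
first = p.shout()    # resolved on the proxy
p.seek(0)            # not defined on the proxy, forwarded to StringIO
position = p.tell()  # forwarded as well
second = p.read(2)
p.close()
closed = p.closed    # plain attributes are forwarded too

print(first, position, second, closed)  # HELLO 0 he True
```

Compared with subclassing, the wrapper works with any file-like object that is already open, at the cost of one extra attribute lookup for each forwarded call.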