
This recipe presents a general-purpose class that acts as both a file object iterator and a file object proxy. It provides iterator methods for reading a text file by character, word, line, paragraph, or fixed-size block, and it forwards any other attribute access to the wrapped file object.

Python, 95 lines
import re

class FileIterator(object):
    """ A general purpose file object iterator cum
    file object proxy """
    
    def __init__(self, fw):
        self._fw = fw

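    # Attribute proxy for wrapped file object.
    # __getattr__ is called only after normal attribute lookup
    # fails, so anything not defined on this class is delegated
    # to the wrapped file object.
    def __getattr__(self, name):
        if name == '_fw':
            # Guard against infinite recursion if _fw is not set
            # yet (for example while copying or unpickling).
            raise AttributeError(name)
        # Raises AttributeError for unknown names instead of
        # silently returning None.
        return getattr(self._fw, name)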
        
    def readlines(self):
        """ Line iterator """

        for line in self._fw:
            yield line

                
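    def readwords(self):
        """ Word iterator. Newlines are omitted """
        
        # 'Words' are defined as runs of non-whitespace
        # characters. Matching the words directly (rather than
        # splitting on whitespace) avoids yielding empty strings
        # for blank lines and trailing newlines.
        wordre = re.compile(r'\S+')
        for line in self._fw:
            for w in wordre.findall(line):
                yield w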

    def readchars(self):
        """ Character iterator """
        
        # Iterate line by line so the whole file is never
        # held in memory at once.
        for line in self._fw:
            for c in line:
                yield c

    def readblocks(self, block_size):
        """ Block iterator """

        while True:
            block = self._fw.read(block_size)
            # An empty read ('' in text mode, b'' in binary mode)
            # signals end of file.
            if not block:
                break
            yield block
        
    def readparagraphs(self):
        """ Paragraph iterator """

        # This re-uses Alex Martelli's
        # paragraph reading recipe.
        # Python Cookbook 2nd edition 19.10, Page 713
        paragraph = []
        for line in self._fw:
            if line.isspace():
                if paragraph:
                    yield "".join(paragraph)
                    paragraph = []
            else:
                paragraph.append(line)
        if paragraph:
            yield "".join(paragraph)
        
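if __name__=="__main__":
    
    def dosomething(item):
        print(item, end=' ')
        
    try:
        # The with block closes the file even if an error occurs.
        with open("myfile.txt") as fw:
            # Named fiter to avoid shadowing the built-in iter()
            fiter = FileIterator(fw)
            for item in fiter.readlines():
                dosomething(item)
            
            # Rewind - method will be
            # proxied to wrapped file object
            fiter.seek(0)
            for item in fiter.readblocks(100):
                dosomething(item)

            # Seek to a different position. In text mode an
            # arbitrary offset may land inside a multi-byte
            # character, so this assumes a single-byte encoding.
            pos = 200
            fiter.seek(pos)
            for item in fiter.readwords():
                dosomething(item)
    except OSError as e:
        print(e)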

    

The idea for this recipe came from Alex Martelli's recipe that supplies a generator to read a text file by paragraph (Recipe 19.10, Python Cookbook 2nd Edition, Page 713). I thought it would be nice to have a single class that provides different ways to read a file: by character, word, line, paragraph, or custom-size block.
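
The paragraph-grouping idea can be tried on its own, without a real file. The following is a minimal sketch, not the Cookbook's original code: the function name `paragraphs` and the sample text are made up for illustration, and `io.StringIO` stands in for an open text file.

```python
import io

def paragraphs(lines):
    """Yield runs of consecutive non-blank lines as single strings."""
    current = []
    for line in lines:
        if line.isspace():
            # A whitespace-only line ends the paragraph in progress, if any.
            if current:
                yield "".join(current)
                current = []
        else:
            current.append(line)
    # Emit the last paragraph when the input does not end with a blank line.
    if current:
        yield "".join(current)

# io.StringIO stands in for an open text file (the sample text is invented).
sample = io.StringIO("first line\nsecond line\n\n\nnext paragraph\n")
result = list(paragraphs(sample))
```

Because the generator only needs an iterable of lines, it works equally well on a list of strings, which makes it easy to test in isolation.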

One way to do this is to subclass the built-in "file" type and implement the methods in the subclass. This recipe instead uses aggregation combined with a proxy: it wraps an open file object and defines iterator methods on top of it. Attributes that the wrapper does not define are delegated to the wrapped file object, so you can call file methods such as seek() and close() directly on the iterator object, as shown in the example code.
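
The delegation mechanism can be shown in isolation. This is a minimal sketch under stated assumptions, not the recipe's class: `FileProxy`, its `words` method, and the sample text are hypothetical names chosen for illustration, and `io.StringIO` stands in for a real file.

```python
import io

class FileProxy:
    """Hypothetical minimal wrapper: one extra iterator method,
    everything else delegated to the wrapped file object."""

    def __init__(self, fileobj):
        self._fileobj = fileobj

    def __getattr__(self, name):
        # Called only when normal attribute lookup fails.
        if name == "_fileobj":
            # Avoid infinite recursion if the wrapper is not initialised.
            raise AttributeError(name)
        # getattr raises AttributeError itself for unknown names.
        return getattr(self._fileobj, name)

    def words(self):
        # str.split() with no arguments drops all whitespace,
        # so no empty strings are produced.
        for line in self._fileobj:
            yield from line.split()

proxy = FileProxy(io.StringIO("alpha beta\ngamma\n"))
first_pass = list(proxy.words())   # method defined on the wrapper
proxy.seek(0)                      # delegated to the StringIO object
first_line = proxy.readline()      # delegated as well
```

Letting `getattr` raise `AttributeError` for unknown names keeps `hasattr()` checks on the wrapper truthful, which matters when other code probes the object for file-like methods.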

Created by Anand on Wed, 6 Apr 2005 (PSF)