Another way to read the lines of a file backwards, from the end to the beginning

Python, 36 lines
import sys
if sys.platform[:3] == 'win':
    # Use this under MS Windows, or if you're going to read
    # MS Windows-formatted text files on other OSes using
    # the "universal" option to open()
    def find_offsets(infile):
        offsets = []
        while True:
            offset = infile.tell()
            if not infile.readline():
                break
            offsets.append(offset)
        return offsets
else:
    # Assumes the file uses a newline convention which is
    # one byte long.
    def find_offsets(infile):
        offsets = []
        offset = 0
        for line in infile:
            offsets.append(offset)
            offset += len(line)
        return offsets

def iter_backwards(infile):
    # make sure it's seekable and at the start
    infile.seek(0)
    offsets = find_offsets(infile)
    for offset in offsets[::-1]:
        infile.seek(offset)
        yield infile.readline()

# An example of how to use the new iterator

for line in iter_backwards(open("spam.py")):
    print(repr(line))

There are already a couple of recipes for reading a file backwards, line by line. They read a block from the end of the file, split it into lines, and keep reading block by block toward the start.

The approach used here is different. It reads all the lines going forwards and stores the byte position at which each line starts. It then walks that list of positions in reverse, and for each one it seeks to the byte position and reads a line. The solution is definitely easier to understand than the other approaches, but it requires reading the full file at least once.
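As a rough illustration of the two passes described above, here is a minimal sketch written for Python 3 and a file opened in binary mode, so that len(line) really is a byte count. The name iter_backwards_binary and the in-memory sample are assumptions made for this example, not part of the recipe.

```python
import io

def iter_backwards_binary(infile):
    """Yield the lines of a seekable binary file, last line first."""
    # Pass 1: read forwards and remember where each line starts.
    infile.seek(0)
    offsets = []
    offset = 0
    for line in infile:
        offsets.append(offset)
        offset += len(line)  # bytes, because the file is binary
    # Pass 2: visit the start positions in reverse and re-read each line.
    for start in reversed(offsets):
        infile.seek(start)
        yield infile.readline()

# Hypothetical sample data standing in for a real file opened with "rb".
sample = io.BytesIO(b"first\nsecond\nthird\n")
reversed_lines = list(iter_backwards_binary(sample))
print(reversed_lines)
```

The list of offsets costs one integer per line, which is the price paid for not having to parse blocks from the end of the file.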

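I provide two ways to get a line's start position. A Windows-formatted file ends each line with '\r\n', which is converted to '\n' when the file is read under Windows in text mode (the default, unless you open with 'b'), or under any OS when it is opened with the new "universal" flag to the open() call. After that conversion, len(line) is one less than the number of bytes the line occupies on disk, so the first version asks tell() for each line's position instead of adding up line lengths.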
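The mismatch between translated line lengths and bytes on disk can be seen in a small experiment. This is only a sketch, assuming Python 3, where newline=None on a text stream plays the role of the "universal" option; the helper running_offsets and the sample bytes are made up for the illustration.

```python
import io

# Hypothetical Windows-formatted content: two lines ending in '\r\n'.
raw = b"a\r\nbb\r\n"

# Text stream with universal-newline translation: '\r\n' is read as '\n'.
text_lines = io.TextIOWrapper(
    io.BytesIO(raw), encoding="ascii", newline=None
).readlines()

# Binary stream: every byte is preserved.
byte_lines = io.BytesIO(raw).readlines()

def running_offsets(lines):
    """Start position of each line, computed by summing line lengths."""
    offsets, offset = [], 0
    for line in lines:
        offsets.append(offset)
        offset += len(line)
    return offsets

guessed_offsets = running_offsets(text_lines)  # from translated lines
true_offsets = running_offsets(byte_lines)     # from the raw bytes
print(guessed_offsets, true_offsets)
```

The offsets computed from the translated lines fall one byte short for every preceding line, so seeking to them would land in the middle of a line.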