
ilines is a generator that takes an iterable and produces lines of text. The input iterable should produce blocks of bytes (as type str), such as might be produced by reading a file in binary mode. The output lines are formed by the same rule as the "universal newlines" file mode [f = file(name, 'U')]: CR, LF, and CR LF each end a line, and every line is yielded with a single '\n' terminator. Lines are produced "on-line": each one is yielded as soon as it is discovered, without waiting for the rest of the input.

Python
def ilines(source_iterable):
    '''yield lines as in universal-newlines from a stream of data blocks'''
    tail = ''
    for block in source_iterable:
        if not block:
            continue
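        if tail.endswith('\015'):
            # The previous block ended in CR, which ends a line either way.
            yield tail[:-1] + '\012'
            tail = ''
            if block.startswith('\012'):
                pos = 1  # CR LF was split across blocks: skip the LF.
            else:
                pos = 0  # Lone CR: this block starts a fresh line.
        else:
            pos = 0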
        try:
            while True: # While we are finding LF.
                npos = block.index('\012', pos) + 1
                try:
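                    # rend indexes the character just before the LF.  Clamp
                    # it to pos so an LF at the start of the block cannot
                    # give a negative index that wraps around to the end.
                    rend = max(npos - 2, pos)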
                    rpos = block.index('\015', pos, rend)
                    if pos:
                        yield block[pos : rpos] + '\n'
                    else:
                        yield tail + block[:rpos] + '\n'
                    pos = rpos + 1
                    while True: # While CRs 'inside' the LF
                        rpos = block.index('\015', pos, rend)
                        yield block[pos : rpos] + '\n'
                        pos = rpos + 1
                except ValueError:
                    pass
                if '\015' == block[rend]:
                    if pos:
                        yield block[pos : rend] + '\n'
                    else:
                        yield tail + block[:rend] + '\n'
                elif pos:
                    yield block[pos : npos]
                else:
                    yield tail + block[:npos]
                pos = npos
        except ValueError:
            pass
        # No LFs left in block.  Do all but final CR (in case LF)
        try:
            while True:
                rpos = block.index('\015', pos, -1)
                if pos:
                    yield block[pos : rpos] + '\n'
                else:
                    yield tail + block[:rpos] + '\n'
                pos = rpos + 1
        except ValueError:
            pass

        if pos:
            tail = block[pos:]
        else:
            tail += block
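    if tail:
        # A CR at the very end of the data still terminates a line.
        if tail.endswith('\015'):
            tail = tail[:-1] + '\012'
        yield tail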
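
When testing a hand-tuned scanner like the one above, it can help to compare it against a slower but more obviously correct version. The following is a minimal sketch of such a cross-check, not part of the original recipe: the name simple_ilines is hypothetical, it is written for Python 3 text (str) blocks, and it rescans the unfinished tail on every block, so it trades speed for brevity.

```python
import re

# CR LF must come first so that it is matched as one terminator.
_EOL = re.compile(r'\r\n|\r|\n')

def simple_ilines(blocks):
    """Hypothetical, simpler variant of ilines, meant for cross-checking."""
    tail = ''
    for block in blocks:
        if not block:
            continue
        data = tail + block
        pos = 0
        for match in _EOL.finditer(data):
            # Hold back a CR at the very end of the data seen so far:
            # the next block may begin with the matching LF.
            if match.group() == '\r' and match.end() == len(data):
                break
            yield data[pos:match.start()] + '\n'
            pos = match.end()
        tail = data[pos:]
    if tail:
        # A trailing CR at end of input still terminates the last line.
        yield tail[:-1] + '\n' if tail.endswith('\r') else tail

# Blocks deliberately split a CR LF pair and end on a bare CR.
blocks = ['ab\r', '\ncd\re', 'f\n\n', 'g\r']
lines = list(simple_ilines(blocks))
```

Feeding the same blocks to both generators and comparing the resulting lists is a cheap way to exercise the awkward cases: a CR LF pair split across two blocks, an LF as the first character of a block, and a CR as the last character of the input.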

Many data sources produce their data in fits and starts: sockets, compression expansion, and (at its heart) most I/O. The data often doesn't arrive at convenient boundaries, but you usually want to consume it in logical units; for text, that usually means line by line. Python's "universal newline" file mode ('U') handles files written with any system's end-of-line convention. There are, however, other data sources (RSS feeds, compression expansion, timeout-controlled input) that produce raw bytes and would benefit from the same conversion.
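
As an illustration of such a source, here is a sketch of a block producer built on compression expansion. It assumes Python 3 and UTF-8 text, and the name inflate_blocks is hypothetical; it only produces the blocks, which could then be handed to a line generator such as ilines. The incremental decoder is there because a multi-byte character may be split across two decompressed chunks.

```python
import codecs
import zlib

def inflate_blocks(chunks, encoding='utf-8'):
    """Yield text blocks from an iterable of zlib-compressed byte chunks."""
    inflater = zlib.decompressobj()
    decoder = codecs.getincrementaldecoder(encoding)()
    for chunk in chunks:
        text = decoder.decode(inflater.decompress(chunk))
        if text:
            yield text
    # Flush whatever the decompressor and the decoder are still holding.
    text = decoder.decode(inflater.flush(), final=True)
    if text:
        yield text

# Simulate data arriving in arbitrary 5-byte pieces.
original = 'first\r\nsecond\rthird\n'
payload = zlib.compress(original.encode('utf-8'))
pieces = [payload[i:i + 5] for i in range(0, len(payload), 5)]
blocks = list(inflate_blocks(pieces))
```

The block boundaries here depend on how the decompressor happens to emit data, which is exactly the situation the recipe is meant to handle: line terminators can land anywhere, including across two blocks.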

Generators and iteration provide clear ways of expressing on-line operations. Often you don't need processes connected by pipes or threads connected by queues to get "buffering" behavior, and this recipe is an example of how generators can provide it. Connected to a data source this way, a program showing the first screenful of text can fill that screen while the data is still being retrieved.
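
The "first screenful" idea can be sketched with itertools.islice. Everything in this example is illustrative: slow_source is a hypothetical stand-in for a slow data source, and lf_lines is a deliberately tiny stand-in for ilines that only understands LF, kept short so that the lazy behavior is easy to see.

```python
from itertools import islice

pulled = []  # records which blocks the consumer actually requested

def slow_source():
    """Hypothetical source: imagine each block arriving after a delay."""
    for block in ['one\ntw', 'o\nthree\n', 'four\n', 'five\n']:
        pulled.append(block)
        yield block

def lf_lines(blocks):
    """Stand-in for ilines that splits on LF only."""
    tail = ''
    for block in blocks:
        parts = (tail + block).split('\n')
        tail = parts.pop()  # the incomplete last line, possibly empty
        for part in parts:
            yield part + '\n'
    if tail:
        yield tail

# Take a two-line "screen"; the source is only read as far as needed.
screen = list(islice(lf_lines(slow_source()), 2))
```

After this runs, pulled holds only the first two blocks: the remaining blocks were never requested, so no time was spent waiting for them.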

This recipe provides a useful tool for extracting text lines from arbitrary data sources. It also provides a relatively simple example of how to build on-line algorithms.

1 comment

kusno mudiarto 8 years, 10 months ago

This is very useful with a CSV reader, especially when I need to process a stream that cannot be opened in universal newline mode, e.g. BlobStore in Google App Engine.

Created by Scott David Daniels on Wed, 23 Jun 2004 (PSF)

Required Modules

  • (none specified)
