ilines -- universal newlines from any data source

ilines is a generator that takes an iterable of data blocks and produces lines of text. The input iterable should produce blocks of bytes (as type str), such as those obtained by reading a file in binary mode. The output lines follow the same rule as the "universal newlines" file mode [f = file(name, 'U')]: CR, LF, and CR LF each end a line, and every line ending is delivered as '\n'. Lines are produced "on-line": each line is yielded as soon as its ending has been recognized.
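
For comparison with the recipe, the universal-newlines rule itself can be sketched as a whole-input function. This is not part of the recipe: `universal_lines` is a hypothetical helper that needs all of the text at once, so it is not on-line. It may still be handy as a reference for checking a streaming implementation, for example by comparing against `''.join(blocks)`.

```python
def universal_lines(text):
    """Split text into lines; CR LF, lone CR and lone LF each end a line."""
    # Collapse CR LF first so the pair is not counted as two line endings.
    normalized = text.replace('\r\n', '\n').replace('\r', '\n')
    pieces = normalized.split('\n')
    last = pieces.pop()  # text after the final line ending, possibly empty
    lines = [piece + '\n' for piece in pieces]
    if last:
        lines.append(last)  # an unterminated final line is kept as-is
    return lines


sample = 'unix\ndos\r\nmac\rlast'
print(universal_lines(sample))  # ['unix\n', 'dos\n', 'mac\n', 'last']
```

Every line ending comes out as '\n', and only the last line may lack one.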

def ilines(source_iterable):
    '''yield lines as in universal-newlines from a stream of data blocks'''
    tail = ''
    for block in source_iterable:
        if not block:
            continue
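        if tail.endswith('\015'):
            # The previous block ended in CR: that line is complete either way.
            yield tail[:-1] + '\012'
            tail = ''
            if block.startswith('\012'):
                pos = 1  # Skip the LF of a CR LF pair split across blocks.
            else:
                pos = 0
        else:
            pos = 0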
        try:
            while True: # While we are finding LF.
                npos = block.index('\012', pos) + 1
                try:
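                    # rend is the index of the character just before the LF.
                    # Clamp it to pos: when the LF sits at pos, npos - 2 would
                    # fall left of pos (and be -1, the last character of the
                    # block, when pos is 0).
                    rend = max(npos - 2, pos)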
                    rpos = block.index('\015', pos, rend)
                    if pos:
                        yield block[pos : rpos] + '\n'
                    else:
                        yield tail + block[:rpos] + '\n'
                    pos = rpos + 1
                    while True: # While CRs 'inside' the LF
                        rpos = block.index('\015', pos, rend)
                        yield block[pos : rpos] + '\n'
                        pos = rpos + 1
                except ValueError:
                    pass
                if '\015' == block[rend]:
                    if pos:
                        yield block[pos : rend] + '\n'
                    else:
                        yield tail + block[:rend] + '\n'
                elif pos:
                    yield block[pos : npos]
                else:
                    yield tail + block[:npos]
                pos = npos
        except ValueError:
            pass
        # No LFs left in block.  Handle every CR except one in the final
        # position, which may pair with an LF at the start of the next block.
        try:
            while True:
                rpos = block.index('\015', pos, -1)
                if pos:
                    yield block[pos : rpos] + '\n'
                else:
                    yield tail + block[:rpos] + '\n'
                pos = rpos + 1
        except ValueError:
            pass

        if pos:
            tail = block[pos:]
        else:
            tail += block
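    if tail:
        # A CR at the very end of the data is a complete line ending.
        if tail.endswith('\015'):
            yield tail[:-1] + '\012'
        else:
            yield tail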

      

Many data sources deliver their data in fits and starts: sockets, decompression, and (at heart) most I/O. The data rarely arrives at convenient boundaries, yet you usually want to consume it in logical units, and for text that unit is typically the line. Python's file mode 'U' provides "universal newline" processing for reading files written with any system's end-of-line convention. Other data sources (RSS feeds, decompression, timeout-controlled input) produce raw bytes that would benefit from the same conversion but cannot be opened that way.
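
As an illustration of such a source, here is a minimal sketch of a block producer. The name `read_blocks` and its `size` parameter are hypothetical, and the sketch assumes a stream whose `read(size)` returns an empty value at end of data, as file objects do; sockets and timeout-controlled input would need their own loop.

```python
import io


def read_blocks(stream, size=4096):
    """Yield blocks from stream.read(size) until the stream is exhausted."""
    while True:
        block = stream.read(size)
        if not block:  # an empty read signals end of data
            return
        yield block


# A tiny block size makes the awkward boundaries easy to see: the CR LF
# pair is split across the first two blocks.
blocks = list(read_blocks(io.StringIO('ab\r\ncd\ref'), size=3))
print(blocks)  # ['ab\r', '\ncd', '\ref']
```

A generator like this can be passed directly as the source iterable of a line splitter, which then has to cope with line endings that straddle block boundaries.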

Generators and iteration provide clear ways of expressing on-line operations. Often you don't need processes connected by pipes or threads connected by queues to get "buffering" behavior; this recipe shows how a generator can provide it. Connected to a data source this way, a program that shows the first screenful of text can fill the screen while the data is still being retrieved.
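
The "first screenful" point can be sketched with `itertools.islice`. To keep the example self-contained it uses `lf_lines`, a deliberately simplified stand-in that splits on LF only (it is not the recipe's code), and `counted`, a hypothetical wrapper that records which blocks were actually pulled from the source.

```python
from itertools import islice


def counted(blocks, log):
    """Yield blocks unchanged, recording each one as it is pulled."""
    for block in blocks:
        log.append(block)
        yield block


def lf_lines(blocks):
    """Simplified stand-in for a line splitter: handles LF only."""
    tail = ''
    for block in blocks:
        pieces = (tail + block).split('\n')
        tail = pieces.pop()  # unfinished last piece waits for more data
        for piece in pieces:
            yield piece + '\n'
    if tail:
        yield tail


pulled = []
source = counted(['one\ntw', 'o\nthree\n', 'four\n', 'five\n'], pulled)
first_two = list(islice(lf_lines(source), 2))

print(first_two)  # ['one\n', 'two\n']
print(pulled)     # ['one\ntw', 'o\nthree\n']
```

Only the first two blocks are consumed to produce two lines; the rest of the source is never touched, which is what lets a display start filling before the data has finished arriving.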

This recipe provides a useful tool for extracting text lines from arbitrary data sources. It also provides a relatively simple example of how to build on-line algorithms.

1 comment

kusno mudiarto 11 years, 2 months ago

It is very useful with a csv reader, especially when I need to process a stream that cannot be opened in universal newline mode, e.g. BlobStore in Google AppEngine.


ilines -- universal newlines from any data source (Python recipe) by Scott David Daniels
ActiveState Code (http://code.activestate.com/recipes/286165/)
