
Parseline: break a text line into formatted regions (Python recipe) by Rick Muller
ActiveState Code (http://code.activestate.com/recipes/502260/)

Parseline breaks a line (actually a string) into Python objects such as strings, floats, and ints, based upon a short format string.

def parseline(line,format):
    """\
    Given a line (a string actually) and a short string telling
    how to format it, return the python objects that result: a list,
    or the bare object if only one word is kept, or None if none are.

    The format string maps words (as split by line.split()) into
    python code:
    x   ->    Nothing; skip this word
    s   ->    Return this word as a string
    i   ->    Return this word as an int
    d   ->    Return this word as an int
    f   ->    Return this word as a float

    Basic parsing of strings:
    >>> parseline('Hello, World','ss')
    ['Hello,', 'World']

    You can use 'x' to skip a record; you also don't have to parse
    every record:
    >>> parseline('1 2 3 4','xdd')
    [2, 3]

    >>> parseline('C1   0.0  0.0 0.0','sfff')
    ['C1', 0.0, 0.0, 0.0]
    """
    xlat = {'x':None,'s':str,'f':float,'d':int,'i':int}
    result = []
    words = line.split()
    for i in range(len(format)):
        f = format[i]
        trans = xlat.get(f)
        if trans: result.append(trans(words[i]))
    if len(result) == 0: return None
    if len(result) == 1: return result[0]
    return result

      

I have to parse the output of simulation codes all the time, and this little recipe gets imported or pasted into most of my code at one time or another. The idea is to grab a line and then interpret its records according to a little format string, which is applied to the words returned by line.split().
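As a rough illustration of that workflow, the sketch below runs the recipe over a few lines of made-up simulation output. The sample text and the names `output`, `coords`, and `energy` are assumptions for the example, not taken from any real code. The function is restated here only so the block runs on its own.

```python
def parseline(line, format):
    """Restatement of the recipe above so this example is self-contained."""
    xlat = {'x': None, 's': str, 'f': float, 'd': int, 'i': int}
    words = line.split()
    result = []
    for i, f in enumerate(format):
        trans = xlat.get(f)
        if trans:
            result.append(trans(words[i]))
    if len(result) == 0:
        return None
    if len(result) == 1:
        return result[0]
    return result

# Hypothetical simulation output: a header, two atom records, an energy line.
output = """\
Geometry (Angstrom)
C1   0.000  0.000  0.000
O1   0.000  0.000  1.128
Total energy = -112.7376 hartree
"""

lines = output.splitlines()

# Atom records: a label followed by three floats.
coords = [parseline(line, 'sfff') for line in lines[1:3]]

# Energy line: skip 'Total', 'energy' and '=', keep the number.
# Only one word is kept, so a bare float comes back rather than a list.
energy = parseline(lines[3], 'xxxf')

print(coords)
print(energy)
```

Note that a format string longer than the number of words on the line raises an IndexError, since each format character indexes directly into the split words.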

I posted this recipe to c.l.python, and Fredrik Lundh posted a really great little variation:

def parseline(line, *types): return [c(x) for (x, c) in zip(line.split(), types) if c] or [None]
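A few calls make the behavioral differences from the original recipe easier to see. This is only a sketch: the one-liner is copied from above, and the sample inputs are made up for illustration.

```python
def parseline(line, *types):
    return [c(x) for (x, c) in zip(line.split(), types) if c] or [None]

# Converters are passed directly; None plays the role of 'x'.
print(parseline('C1   0.0  0.0 0.0', str, float, float, float))
print(parseline('1 2 3 4', None, int, int))

# A single kept word still comes back wrapped in a list.
print(parseline('1 2 3 4', None, None, None, float))

# Nothing kept: the result is [None], not None.
print(parseline('1 2 3 4', None))

# zip() stops at the shorter sequence, so extra converters are ignored
# silently instead of raising IndexError.
print(parseline('1 2', int, int, int))
```

So this variation always returns a list, whereas the original returns None, a bare object, or a list depending on how many words are kept.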

IMHO it loses some of the terseness of mine in application (it's amazing how many text records a line can have, and typing parseline(line,None,None,None,None,None,None,float) is much more tiring than parseline(line,'xxxxxxf')). But he wins points for pythonicity.
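One possible way to get both is to keep the one-liner and put a thin format-string layer on top of it. The sketch below is an assumption, not part of either posted version: `parseline_fmt` and `XLAT` are hypothetical names introduced here.

```python
# Same character-to-converter table as the original recipe.
XLAT = {'x': None, 's': str, 'f': float, 'd': int, 'i': int}

def parseline(line, *types):
    return [c(x) for (x, c) in zip(line.split(), types) if c] or [None]

def parseline_fmt(line, format):
    """Hypothetical wrapper: expand a format string into converter arguments."""
    # Unknown characters map to None and are skipped, as in the original.
    return parseline(line, *(XLAT.get(f) for f in format))

print(parseline_fmt('a b c d e f 3.5', 'xxxxxxf'))
print(parseline_fmt('1 2 3 4', 'xdd'))
```

The wrapper inherits the one-liner's return convention, so it always gives back a list.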

Tags: text

Created by Rick Muller on Mon, 26 Feb 2007 (PSF)

Licensed under the PSF License