You have a file with long lines split over two or more lines, with backslashes to indicate that a continuation line follows. You want to rejoin those split lines.
| Python |
# Note: this is Python 2 code (xreadlines, print statement, cStringIO).
class LogicalLines:
    def __init__(self, fileobj, continued=None):
        # self.seq: the underlying line-sequence
        # self.phys_num: current index into self.seq (physical line number)
        # self.logi_num: current index into self (logical line number)
        import xreadlines
        try: self.seq = fileobj.xreadlines()
        except AttributeError: self.seq = xreadlines.xreadlines(fileobj)
        self.phys_num = 0
        self.logi_num = 0
        # allow for optional passing of continued-function
        if not callable(continued):
            def continued(line):
                if line.endswith('\\\n'): return 1,line[:-2]
                else: return 0, line
        self.continued = continued
    def __getitem__(self, index):
        if index != self.logi_num:
            raise TypeError, "Only sequential access supported"
        self.logi_num += 1
        result = []
        while 1:
            # Note: we must intercept IndexError, since we may not
            # be finished, even when the underlying sequence is --
            # we may have one or more lines in result to be returned
            try: line = self.seq[self.phys_num]
            except IndexError:
                if result: break
                else: raise
            self.phys_num += 1
            continues, line = self.continued(line)
            result.append(line)
            if not continues: break
        # return string result
        return ''.join(result)

# here's an example function, showing off usage:
def show_logicals(fileob,numlines=5):
    ll = LogicalLines(fileob)
    for l in ll:
        print "Log#%d, phys# %d: %s" % (
            ll.logi_num, ll.phys_num, repr(l))
        if ll.logi_num>numlines: break

if __name__=='__main__':
    from cStringIO import StringIO
    # raw string, so each backslash-newline pair survives as test data
    ff = StringIO(
r"""prima \
seconda \
terza
quarta \
quinta
sesta
settima \
ottava
""")
    show_logicals( ff )

# a simpler approach, if the need is of a 1-off kind, might be:
# logical_line = []
# for physical_line in fileobj.xreadlines():
#     if physical_line.endswith('\\\n'):
#         logical_line.append(physical_line[:-2])
#     else:
#         logical_line = ''.join(logical_line) + physical_line
#         process_full_record(logical_line)
#         logical_line = []
# if logical_line: process_full_record(''.join(logical_line))
|
Discussion
Inspired by Recipe 8.1 in O'Reilly's Perl Cookbook. See also http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/66063, recipe "Read a text file by-paragraph", since the structure is quite similar. We could have picked a more ad-hoc approach, closer to the logic of the Perl recipe, here shown in the ending comment of this recipe.
However, a class wrapper is a more natural and reusable approach in Python. It illustrates a kind of line-bunching that is similar to, but distinct from, the one in recipe 66063, and it is extensible in the same way: you can pass a "continued" function that takes a physical line and returns a pair. The first item is true if the line is to be continued and false if it finishes the logical line; the second item is the part (or all) of the physical line to use in composing the logical line. Wrapping a line sequence in a class that groups lines according to a pluggable rule is a general approach that applies well beyond this case.
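The "continued" protocol described above can be sketched in modern Python 3 as follows. This is an illustrative rewrite, not the recipe's own code: it uses a generator function instead of the class, and the names `logical_lines`, `default_continued` and `ampersand_continued` are assumptions made for the example. The trailing-`&` convention is hypothetical and only shows how a different continuation rule could be plugged in.

```python
import io


def default_continued(line):
    """Return (continues, text) for the backslash-newline convention."""
    if line.endswith('\\\n'):
        return True, line[:-2]
    return False, line


def ampersand_continued(line):
    """Hypothetical alternative rule: a trailing '&' marks a continued line."""
    stripped = line.rstrip('\n')
    if stripped.endswith('&'):
        return True, stripped[:-1]
    return False, line


def logical_lines(lines, continued=default_continued):
    """Yield logical lines assembled from an iterable of physical lines."""
    parts = []
    for line in lines:
        continues, text = continued(line)
        parts.append(text)
        if not continues:
            yield ''.join(parts)
            parts = []
    if parts:
        # The input ended in the middle of a continued line:
        # emit what has been collected so far.
        yield ''.join(parts)


source = io.StringIO("alpha &\nbeta\ngamma\n")
print(list(logical_lines(source, ampersand_continued)))
# prints ['alpha beta\n', 'gamma\n']
```

Because the rule is passed in as a plain function, the joining loop never needs to change when the continuation convention does.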
Here, the ending "if __name__=='__main__'" part performs a simple test, in this case with a simulated file object, just to show the basic functionality.


Comments
a generator version. I was going to use Alex's implementation, but since xreadlines has been deprecated, I wrote a generator version instead:
This has the downside of needing the whole file at once (or having to chunk it manually), but has the nice upside of using splitlines to handle DOS/Unix/Mac line ending conventions seamlessly.
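The commenter's code is not preserved here. As a rough idea of what a splitlines-based generator could look like, here is a minimal Python 3 sketch; it is a guess at the technique described, not the commenter's implementation, and the name `logical_lines_from_text` is an assumption.

```python
def logical_lines_from_text(text):
    """Yield logical lines from a string holding the whole file's contents.

    str.splitlines() drops the line terminators, so '\\n', '\\r\\n' and '\\r'
    are all handled alike and the yielded lines carry no trailing newline.
    """
    parts = []
    for line in text.splitlines():
        if line.endswith('\\'):
            # Continued line: keep the text, drop the trailing backslash.
            parts.append(line[:-1])
        else:
            parts.append(line)
            yield ''.join(parts)
            parts = []
    if parts:
        # The text ended on a continued line: emit what was collected.
        yield ''.join(parts)


# Mixed DOS, old Mac and Unix line endings in one string.
sample = "prima \\\r\nseconda \\\rterza\nquarta\n"
print(list(logical_lines_from_text(sample)))
# prints ['prima seconda terza', 'quarta']
```

One caveat: in Python 3, `str.splitlines()` also splits on a few less common boundaries such as form feed and the Unicode line separator, which may or may not be what a given file format wants.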