
Accepts one or more files and/or globs, interleaves the lines from each, and writes the result to stdout.

Python, 46 lines
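```python
#!/usr/bin/env python3
"""interleave.py <glob1> [<glob2> ...]

Accepts one or more files or globs, interleaving their lines and writing to stdout.

"""
import glob
import os
import sys


def iter_interleave(*iterables):
    """
    A generator that interleaves the items from one or more iterables
    until they are *all* exhausted.

    """
    iterators = [iter(it) for it in iterables]
    while iterators:
        # Build the list for the next round instead of removing exhausted
        # iterators from the list that is currently being looped over.
        still_active = []
        for it in iterators:
            try:
                item = next(it)
            except StopIteration:
                continue  # this input is exhausted; drop it
            still_active.append(it)
            yield item
        iterators = still_active


if __name__ == '__main__':
    if len(sys.argv) < 2:
        print(__doc__.split("\n")[0])
        sys.exit(1)

    if sys.argv[1].lower() in ('-h', '--help'):
        print(__doc__, end='')
        sys.exit(0)

    files = []
    for arg in sys.argv[1:]:
        for entry in glob.glob(arg):
            if os.path.isfile(entry):
                # Python 3 text mode applies universal newline handling by
                # default, replacing Python 2's 'U' mode flag.
                files.append(open(entry))

    for line in iter_interleave(*files):
        # The last line of a file may lack a newline; add one so it is
        # not glued to the next file's line.
        if not line.endswith('\n'):
            line += '\n'
        sys.stdout.write(line)

    for f in files:
        f.close()
```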

I had a need to interleave lines from multiple files into a single file and didn't know of a simple UNIX command to do this - I'm sure a solution will be forthcoming now that I've said it ;-)

Here is a handy cross-platform Python version that I created in about 10 minutes.

The key part is the generator that combines lines read from the file objects. Note that its behaviour differs from itertools.izip() (the built-in zip() in Python 3), which stops as soon as any one of its iterables runs out of data. With input files of different lengths, izip() would silently drop the remaining lines of the longer files.
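To make the difference concrete, here is a small Python 3 sketch that uses in-memory io.StringIO objects as stand-ins for open files. It compares zip() with an interleave built on itertools.zip_longest() (izip_longest() in Python 2.6+). The helper name interleave_longest and the _MISSING sentinel are illustrative assumptions, not part of the recipe; a fresh object() is used as the fill value instead of None so that a genuine None item is never dropped.

```python
import io
from itertools import chain, zip_longest

# Marker for "this input had no item in this round"; a fresh object()
# cannot be equal to (or identical with) any real item.
_MISSING = object()


def interleave_longest(*iterables):
    """Round-robin over iterables until all are exhausted (hypothetical helper)."""
    rounds = zip_longest(*iterables, fillvalue=_MISSING)
    for item in chain.from_iterable(rounds):
        if item is not _MISSING:
            yield item


shorter = io.StringIO("a1\na2\n")
longer = io.StringIO("b1\nb2\nb3\nb4\n")

# zip() stops once the shorter file is exhausted, so b3 and b4 are lost.
truncated = [line for pair in zip(shorter, longer) for line in pair]
print(truncated)  # ['a1\n', 'b1\n', 'a2\n', 'b2\n']

# Rewind both "files" and interleave them completely.
shorter.seek(0)
longer.seek(0)
print(list(interleave_longest(shorter, longer)))
# ['a1\n', 'b1\n', 'a2\n', 'b2\n', 'b3\n', 'b4\n']
```

Like the recipe's own generator, this reads one line per file per round, so it never holds whole files in memory.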

2 comments

Sylvain Fourmanoit 15 years, 2 months ago

On a GNU-based system, I can think of:

ls *.txt | xargs -l nl -n ln | sort -n | sed 's/^[0-9]*[ \t]*//'

It does not take ten minutes to write, but it's clearly not better than your solution in terms of readability or portability.
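The pipeline above is a decorate-sort-undecorate: xargs -l runs nl once per file, nl -n ln prefixes each line with its left-justified line number, sort -n groups all the line 1s, then all the line 2s, and so on, and sed strips the numbers off again. A rough Python equivalent is sketched below; interleave_by_sort is a hypothetical name. It sorts on (line number, file position) only, which keeps each round in file order. GNU sort -n, by contrast, falls back to comparing whole lines when the numeric keys tie (unless -s is given), so the shell version orders each round by line content.

```python
import io


def interleave_by_sort(*files):
    """Interleave lines by sorting on (line number, file index) (hypothetical helper)."""
    decorated = []
    for file_index, f in enumerate(files):
        for line_no, line in enumerate(f, start=1):
            decorated.append((line_no, file_index, line))
    # Sort only on the two numeric keys so the line text never decides the order.
    decorated.sort(key=lambda entry: (entry[0], entry[1]))
    return [line for _, _, line in decorated]


print(interleave_by_sort(io.StringIO("x\ny\nz\n"), io.StringIO("b\na\n")))
# ['x\n', 'b\n', 'y\n', 'a\n', 'z\n']
```

Unlike the generator in the recipe, this approach reads every line of every file into memory before producing any output.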

Gerrat Rickert 11 years, 3 months ago

You shouldn't remove items from a list when you're iterating over it.
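This applies to the original iter_interleave(): when an exhausted iterator is removed inside the for loop, every later element shifts left by one, and the loop's next step passes over the element that slid into the freed slot. That iterator is skipped for one round. Nothing is lost, but lines come out of round-robin order. A minimal Python 3 reproduction follows; buggy_interleave mirrors the original loop (with next() in place of .next() and without the debugging print), and fixed_interleave is one possible fix. Both names are hypothetical.

```python
def buggy_interleave(*iterables):
    """Mirrors the original loop: removes exhausted iterators while iterating."""
    iterators = [iter(x) for x in iterables]
    while iterators:
        result = []
        for it in iterators:
            try:
                result.append(next(it))
            except StopIteration:
                iterators.remove(it)  # shifts later items left; the loop skips one
        for item in result:
            yield item


def fixed_interleave(*iterables):
    """Builds a new list of live iterators each round instead of mutating the old one."""
    iterators = [iter(x) for x in iterables]
    while iterators:
        still_active = []
        for it in iterators:
            try:
                item = next(it)
            except StopIteration:
                continue  # exhausted; leave it out of the next round
            still_active.append(it)
            yield item
        iterators = still_active


inputs = (["a1"], ["b1", "b2"], ["c1", "c2"])
print(list(buggy_interleave(*inputs)))  # ['a1', 'b1', 'c1', 'c2', 'b2']
print(list(fixed_interleave(*inputs)))  # ['a1', 'b1', 'c1', 'b2', 'c2']
```

In the buggy run, the second round removes the first iterator, so the "b" iterator slides into slot 0 and is passed over; c2 is emitted before b2.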

Also, the Python docs for the itertools module include a roundrobin recipe that does this: http://docs.python.org/2/library/itertools.html (the recipe has only been in the docs since Python 2.5.4, Dec 2008, so it was still fairly new when this recipe was written).
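For comparison, here is a sketch along the lines of the Python 3 roundrobin() recipe from the itertools documentation. The recipe's wording has changed between Python versions (the Python 2 docs used cycle() over bound .next methods), so treat this as an adaptation rather than a verbatim copy. It relies only on itertools.cycle() and itertools.islice().

```python
from itertools import cycle, islice


def roundrobin(*iterables):
    """Visit input iterables in a cycle until each is exhausted."""
    iterators = map(iter, iterables)
    for num_active in range(len(iterables), 0, -1):
        # Cycle over the iterators that are still live...
        iterators = cycle(islice(iterators, num_active))
        # ...until one raises StopIteration, which ends this map() early.
        # The cycle has already moved past the exhausted iterator, so the
        # next islice() keeps only the remaining live ones.
        yield from map(next, iterators)


print(list(roundrobin("ABC", "D", "EF")))  # ['A', 'D', 'E', 'B', 'F', 'C']
```

With open file objects as arguments, roundrobin(*files) yields the same line order as a correct iter_interleave(*files).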