This module introduces an alternative syntax, à la shell pipes, for sequence-oriented functions such as filter and map, via classes that override the __ror__ method.

Python, 54 lines
from itertools import izip, imap, count, ifilter
import re

def cat(fname):
    return file(fname).xreadlines()

class grep:
    """keep only lines that match the regexp"""
    def __init__(self,pat,flags=0):
        self.fun = re.compile(pat,flags).match
    def __ror__(self,input):
        return ifilter(self.fun,input)

class tr:
    """apply arbitrary transform to each sequence element"""
    def __init__(self,transform):
        self.tr=transform
    def __ror__(self,input):
        return imap(self.tr,input)

class printlines_class:
    """print sequence elements one per line"""
    def __ror__(self,input):
        for l in input:
            print l

printlines=printlines_class()

class terminator:
    """to be used at the end of a pipe-sequence"""
    def __init__(self,method):
        self.process=method
    def __ror__(self,input):
        return self.process(input)

# those objects transform generator to list, tuple or dict
aslist  = terminator(list)
asdict  = terminator(dict)
astuple = terminator(tuple)

# this object transforms seq to tuple sequence
enum = terminator( lambda input: izip(count(),input) )

#######################
# example 1: equivalent to shell grep ".*/bin/bash" /etc/passwd
cat('/etc/passwd') | tr(str.rstrip) | grep('.*/bin/bash') | printlines

#######################
# example 2: get a list of int's methods beginning with '__r'
dir(int) | grep('__r') | aslist

#######################
# example 3: useless; returns a dict {0:'l',1:'a',2:'m',3:'b',4:'d',5:'a'} 
'lambda' | enum | asdict

Python has several functions that operate on sequential data, e.g. filter, map, zip, sum, etc. However, any complicated processing requires intermediate variables, complex nested function calls, or list comprehensions. This is not as elegant as, for example, the Unix shell command "cat foo.bar | grep smth | sort | uniq".
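To make the contrast concrete, here is a minimal sketch comparing the nested-call form with the piped form. It assumes Python 3, where the built-in filter and map are already lazy and stand in for ifilter and imap; the classes are re-declared from the recipe so the snippet is self-contained, and the sample passwd-style lines are invented for illustration.

```python
import re

class grep:
    """Keep only items that match the regexp (re.match, as in the recipe)."""
    def __init__(self, pat, flags=0):
        self.fun = re.compile(pat, flags).match
    def __ror__(self, input):
        return filter(self.fun, input)

class tr:
    """Apply an arbitrary transform to each element."""
    def __init__(self, transform):
        self.tr = transform
    def __ror__(self, input):
        return map(self.tr, input)

class terminator:
    """To be used at the end of a pipe sequence."""
    def __init__(self, method):
        self.process = method
    def __ror__(self, input):
        return self.process(input)

aslist = terminator(list)

# Invented sample data standing in for /etc/passwd.
lines = [
    "root:x:0:0:root:/root:/bin/bash\n",
    "daemon:x:1:1::/usr/sbin:/usr/sbin/nologin\n",
    "alice:x:1000:1000::/home/alice:/bin/bash\n",
]

# Nested form: must be read inside-out.
nested = list(filter(re.compile('.*/bin/bash').match, map(str.rstrip, lines)))

# Piped form: reads left to right, one operation per stage.
piped = lines | tr(str.rstrip) | grep('.*/bin/bash') | aslist
```

Both expressions should produce the same list; the piped one works because list, map and filter objects do not handle `|` with these classes, so Python falls back to the right operand's __ror__.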

Inspired by the "C++-like iostream" recipe by Erik Max Francis (no. 157034 in this cookbook), I made this quick-hack emulation of shell pipe syntax. The main advantage of this syntax is that the distinct operations in a sequence sit between the |'s, so there are no messy nested brackets and no extra variables.

This is also useful in interactive mode for viewing the contents of a generator: it is easier to append "| aslist" to an expression than to wrap the whole expression in a list(...) call.

Note also that every stage uses generators where possible, so no intermediate lists are built during processing.
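The laziness claim can be checked with a small sketch, again assuming Python 3. The `head` stage and the `square` helper are hypothetical additions for this illustration and are not part of the recipe; `tr` is re-declared so the snippet runs on its own.

```python
from itertools import count, islice

class tr:
    """Apply a transform to each element (Python 3 port of the recipe's tr)."""
    def __init__(self, transform):
        self.tr = transform
    def __ror__(self, input):
        return map(self.tr, input)  # map is lazy in Python 3

class head:
    """Hypothetical stage, not in the recipe: keep only the first n items."""
    def __init__(self, n):
        self.n = n
    def __ror__(self, input):
        return islice(input, self.n)

calls = []  # records which inputs were actually transformed

def square(x):
    calls.append(x)
    return x * x

# The input is infinite, so this would never finish if stages were eager.
pipeline = count() | tr(square)
calls_before = list(calls)  # snapshot: nothing has been computed yet

first_three = list(pipeline | head(3))
```

Only the items actually requested at the end of the pipe are pulled through the earlier stages, which is why an infinite source such as count() is usable here.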

6 comments

Garth Kidd 10 years, 6 months ago

Fantastic hack! I'd like to see a module along these lines as part of the standard library. Add cut, uniq, sort, and a few others, and you'd really have something.

A wee fussy mod or two:

class match:
    """keep only lines that match the regexp"""
    def __init__(self, pat, flags=0, method='match'):
        self.fun = getattr(re.compile(pat, flags), method)
    def __ror__(self, input):
        return ifilter(self.fun, input)

class search(match):
    def __init__(self, pat, flags=0):
        match.__init__(self, pat, flags, method='search')

grep = search

... and then later...

import sys

class writelines:
    "write each item to a file like object"
    def __init__(self, f):
        self.file = f
    def __ror__(self, input):
        for l in input:
            self.file.write(l)

printlines = writelines(sys.stdout)

The first makes grep behave more like the command-line tool (grep doesn't insist on matching at the beginning of the line by default). The second adds a more generic writelines class and re-implements printlines with it -- also eliminating an issue in which cat(file)|printlines would double the newlines.
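Both points can be illustrated with a short sketch, assuming Python 3 (so print is a function). The sample lines are invented, and io.StringIO stands in for sys.stdout so the output can be inspected.

```python
import io
import re

lines = [
    "alice:x:1000:1000::/home/alice:/bin/bash",
    "daemon:x:1:1::/usr/sbin:/usr/sbin/nologin",
]
pat = re.compile('/bin/bash')

# match() only succeeds if the pattern matches at the start of the string.
anchored = [l for l in lines if pat.match(l)]

# search() finds the pattern anywhere in the string, like command-line grep.
unanchored = [l for l in lines if pat.search(l)]

# Newline doubling: print appends its own newline to a line that already
# ends with one, whereas write() emits the line unchanged.
printed = io.StringIO()
print("a line\n", end="\n", file=printed)

written = io.StringIO()
written.write("a line\n")
```

With an unanchored pattern such as '/bin/bash', the match-based filter keeps nothing, which is why the original example had to spell the pattern as '.*/bin/bash'.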

Garth Kidd 10 years, 6 months ago

Hmmm. We could define

def cat(fname, mode='rtU'):
    return file(fname, mode).xreadlines()

but it's a little redundant, as one can just as easily use file() directly (in Python 2.3, at least), thanks to iter(file-like-object) being suitably equivalent to iter(file-like-object.xreadlines()) in behaviour.
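For readers on Python 3, where file() and xreadlines() no longer exist, a rough equivalent of cat could look like the sketch below. It relies only on the fact that an open file object is itself an iterator over its lines; the temporary file and its contents are made up for the demonstration.

```python
import os
import tempfile

def cat(fname):
    """Yield the lines of a file lazily; the file is closed once exhausted."""
    with open(fname) as f:
        for line in f:  # a file object iterates over its own lines
            yield line

# Create a small throwaway file to read back (illustration only).
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write("one\ntwo\n")
    path = tmp.name

lines = list(cat(path))
os.remove(path)
```

Wrapping the loop in a generator function keeps the lazy behaviour of the original cat while making sure the file handle is released.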

Maxim Krikun (author) 10 years, 6 months ago

Re: fantastic hack. Thank you for your feedback. In fact, there was a similar hierarchy in the original module, but I cut it before posting here to keep the recipe easier to read.

This recipe is, first of all, meant to demonstrate the new syntax, not to propose a complete module.

Of course one can add analogs of other Unix tools, and enhancements in recent Python versions, such as file() iterability, the enumerate() function, et cetera, should be taken into account. However, I prefer to avoid writing code before I really need it.
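As an illustration of what such analogs might look like, here is one possible sketch of the cut, sort and uniq stages suggested earlier in this thread. It assumes Python 3; the class names, parameters and sample data are hypothetical and are not taken from the original module.

```python
from itertools import groupby

class cut:
    """Hypothetical stage: keep one delimiter-separated field of each line."""
    def __init__(self, field, delim=':'):
        self.field = field
        self.delim = delim
    def __ror__(self, input):
        return (line.split(self.delim)[self.field] for line in input)

class sort:
    """Hypothetical stage: sort the items (must consume the whole input)."""
    def __init__(self, key=None, reverse=False):
        self.key = key
        self.reverse = reverse
    def __ror__(self, input):
        return iter(sorted(input, key=self.key, reverse=self.reverse))

class uniq_class:
    """Hypothetical stage: drop adjacent duplicates, like shell uniq."""
    def __ror__(self, input):
        return (key for key, _group in groupby(input))

uniq = uniq_class()

class terminator:
    """To be used at the end of a pipe sequence (as in the recipe)."""
    def __init__(self, method):
        self.process = method
    def __ror__(self, input):
        return self.process(input)

aslist = terminator(list)

shells = ["b:1", "a:2", "b:3"] | cut(0) | sort() | uniq | aslist
```

Unlike the other stages, sort cannot be lazy, since it has to see every item before it can emit the first one; cut and uniq remain generator-based.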

Maxim Krikun (author) 10 years, 6 months ago

Module URL. I have put the module text and related discussion on my personal wiki page: http://lbss.math.msu.su/~krikun/PipeSyntaxModule

Massimo Santini 5 years, 5 months ago

This is a really cool hack! I've attempted to extend/adapt it to generators in recipe 576756 and to external processes (via Popen) in recipe 576757.

Benoit Perdu 1 year, 8 months ago

Beautiful hack, it finds its place everywhere.

I agree that it would be at home in the standard library.
