Welcome, guest | Sign In | My Account | Store | Cart

Split a sequence or generator into two iterators, each iterating over the elements that either pass or fail a predicate function.

Python, 23 lines
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
from collections import deque

def splitby(pred, seq):

    trues = deque()
    falses = deque()
    iseq = iter(seq)
    
    def pull(source, pred, thisval, thisbuf, otherbuf):
        while 1:
            while thisbuf:
                yield thisbuf.popleft()
            newitem = next(source)
            # uncomment next line to show that source is processed only once
            # print "pulled", newitem
            if pred(newitem) == thisval:
                yield newitem
            else:
                otherbuf.append(newitem)

    true_iter = pull(iseq, pred, True, trues, falses)
    false_iter = pull(iseq, pred, False, falses, trues)
    return true_iter, false_iter

Sometimes a sequence of items must be split into 2 groups depending on the evaluation of a predicate function. Using groupby to do this requires that the sequence be sorted, then grouped. Using filter/filterfalse requires 2 passes over the input sequence, which fails if the input is actually a generator. Paul Moore suggests using tee to create a split iterator over the sequence, and then passing the tee'd iterators to filter and filterfalse - this is simpler code, but it does evaluate the predicate twice for each element in the sequence; this recipe does the evaluation once per element and pushes the value to the corresponding output match/no-match list/generator.

The splitby recipe only traverses the input sequence once, so it will work for a generator, and it will only pull from the generator as values are pulled from the two output iterators.

6 comments

Paul Moore 11 years, 3 months ago  # | flag

You can do this using filter (in Python 3 - in Python 2, you need itertools.ifilter) and itertools.tee:

>>> import itertools
>>> def myiter():
...     for i in range(10):
...         print(i) # To confirm we are only going through the values once
...         yield i
...
>>> def even(n): return n%2 == 0
...
>>> def odd(n): return n%2 != 0
...
>>> i1, i2 = itertools.tee(myiter())
>>> e = filter(even, i1)
>>> o = filter(odd, i2)
>>> list(e)
0
1
2
3
4
5
6
7
8
9
[0, 2, 4, 6, 8]
>>> list(o)
[1, 3, 5, 7, 9]
>>>
Paul McGuire (author) 11 years, 3 months ago  # | flag

Thanks, Paul - I guess I just reinvented itertools.tee in my internal pull method. Still, I think there is some value in packaging up splitby much as you show here using tee, ifilter, and ifilterfalse, perhaps as an addition to the recipes included with itertools. -- Paul McGuire

Paul McGuire (author) 11 years, 3 months ago  # | flag

One additional note - using tee as proposed by Paul Moore will evaluate pred for every element twice. In contrast, my original version will only evaluate pred once - if this is an expensive operation, my original splitby may be preferable over using two filters.

huangchong 11 years, 3 months ago  # | flag

Thanks Author,Although I consider ur code also need to pred for every element twice.

Paul McGuire (author) 11 years, 3 months ago  # | flag

@huangchong - please look at the code again, pred is called only once for each element, and then based on returning True or False, the element is either yielded or put into the buffer list for the other value. Once an element is in a buffer list, it is never passed to pred again. If I expand pred to be the method isOdd:

def isOdd(x):
    print "testing",x
    return x % 2

odds,evens = splitby(range(10), isOdd)
print list(odds)
print list(evens)

We get this output:

testing 0
testing 1
testing 2
testing 3
testing 4
testing 5
testing 6
testing 7
testing 8
testing 9
[1, 3, 5, 7, 9]
[0, 2, 4, 6, 8]

See that each element is tested only once.

Paddy McCarthy 10 years, 5 months ago  # | flag

HI Paul. you might be interested in my code to split into N parts calling the predicate function only once and done in a more function style. I blogged it here: http://paddy3118.blogspot.co.uk/2013/06/filtering-iterator-into-n-parts-lazily.html