Welcome, guest | Sign In | My Account | Store | Cart

What if you had a list like this: [1, -10, [1,2,[3,4]], xrange(200)], and you just wanted to go through each element in order (wanted it to return a simple list of [1,-10,1,2,3,4,1,2,3,4...199])

I've seen a couple of attempts to flatten arbitrarily deep lists. Many of them involve recursion, like this one: http://rightfootin.blogspot.com/2006/09/more-on-python-flatten.html

Recursion is generally considered non-pythonic (at least to my knowledge), so I have used one which just involves simple iterators instead. Also, recursion will fail if the list is too deep (so it wouldn't really be arbitrary, would it?).

Python, 39 lines
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
class iter2(object):
    '''Takes in an object that is iterable.  Allows for the following method
    calls (that should be built into iterators anyway...)
    calls:
        - append - appends another iterable onto the iterator.
        - insert - only accepts inserting at the 0 place, inserts an iterable
         before other iterables.
        - adding.  an iter2 object can be added to another object that is
         iterable.  i.e. iter2 + iter (not iter + iter2).  It's best to make
         all objects iter2 objects to avoid syntax errors.  :D
    '''
    def __init__(self, iterable):
        self._iter = iter(iterable)
    
    def append(self, iterable):
        self._iter = itertools.chain(self._iter, iter(iterable))
        
    def insert(self, place, iterable):
        if place != 0:
            raise ValueError('Can only insert at index of 0')
        self._iter = itertools.chain(iter(iterable), self._iter)
    
    def __add__(self, iterable):
        return itertools.chain(self._iter, iter(iterable))
        
    def next(self):
        return self._iter.next()
    
    def __iter__(self):
        return self

def flatten(iterable):
    '''flatten a list of any depth'''
    iterable = iter2(iterable)
    for e in iterable:
        if hasattr(e, '__iter__'):
            iterable.insert(0, e)
        else:
            yield e

It's slow (like 10 times slower than the solutions here: http://stackoverflow.com/questions/952914/making-a-flat-list-out-of-list-of-lists-in-python?answertab=active#tab-top) but unlike those it handles lists of any depth.

Note: that this will not split up non-iterables (it won't split up combinations, like strings).

Note: be careful, as this can almost be too general. Anything that has an iterator in it will be split up, so if you have dictionaries in your list their keys will be extracted (and no exception will be raised). On the other hand, it will also work if you have numpy arrays or other such iterables (such as xrange).

8 comments

Roberto De Almeida 12 years ago  # | flag

Here's my take on the problem. I use a queue:

def consume(iterable):
    input = list(iterable)
    while input:
        item = input.pop(0)
        try:
            data = iter(item)
            input = list(data) + input
        except:
            yield item

Not very efficient because it converts iterables to lists, but it could be changed to use itertools.chain, for example.

Roberto De Almeida 12 years ago  # | flag

And here's the solution with itertools:

import itertools

def consume(iterable):
    iterable = iter(iterable)

    while 1:
        try:
            item = iterable.next()
        except StopIteration:
            break

        try:
            data = iter(item)
            iterable = itertools.chain(data, iterable)
        except:
            yield item
Garrett (author) 12 years ago  # | flag

Yep! It's doing the same basic procedure as I am (inserting iterators into the beginning as they come). It is probably even a bit faster (but not much). I like it!

Garrett (author) 12 years ago  # | flag

I did some tests on that. I think mine is faster under most cases, but I think that is because you are using exceptions whereas I never encounter an exception (until the iterator is done obviously).

Garrett (author) 12 years ago  # | flag

I take that back... nesting it seems to give you a speedup over me. I don't know why mine is normally faster for my tests!

Also, on my box going to more than a depth of a 100 results in a memory error (but only with the timeit module, otherwise you can go to a depth of at least 20,000. I tried 200,000 and it crashed python without an exception! I've never done that!)

Matthew Wood 12 years ago  # | flag

It seems to me that the versions here don't necessarily work if strings are contained in the nested nest of nestings. :-) At least Roberto's doesn't as it gets stuck in an infinite loop. Here's my proposal:

import itertools

def consume(iterable, break_strings=False):
    iterable = iter(iterable)

    while 1:
        try:
            item = iterable.next()
            print 'item = ' + repr(item)
        except StopIteration:
            break

    if isinstance(item, basestring):
        if break_strings:
            for char in item:
                yield char
        else:
            yield item
        continue

    try:
        data = iter(item)
        iterable = itertools.chain(data, iterable)
    except:
        yield item

Obviously, the extra isinstance check will slow things down a little bit, but I think the flexibility is worth it in this case for me.

Matthew Wood 12 years ago  # | flag

Gah. Left in a debugging print. :-(

Garrett (author) 12 years ago  # | flag

yes, it only flattens iterables (not containers)

Created by Garrett on Tue, 3 Apr 2012 (MIT)
Python recipes (4591)
Garrett's recipes (7)

Required Modules

  • (none specified)

Other Information and Tasks