ActiveState Code

Recipe 303060: Group a list into sequential n-tuples


This function returns a list of n-tuples from a single "flat" list.

Python
def group(lst, n):
    """group([0,3,4,10,2,3], 2) => [(0,3), (4,10), (2,3)]
    
    Group a list into consecutive n-tuples. Incomplete tuples are
    discarded e.g.
    
    >>> group(range(10), 3)
    [(0, 1, 2), (3, 4, 5), (6, 7, 8)]
    """
    return zip(*[lst[i::n] for i in range(n)]) 

Discussion

There is probably an easier technique that accomplishes the same thing but I've found this function useful in XML data processing.
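
For readers on Python 3, where `zip` returns an iterator rather than a list, the recipe can be sketched as below. This is an adaptation and not the author's original code; the `list(...)` wrapper is an assumption added to keep the list result that the docstring promises.

```python
def group(lst, n):
    """Group a sequence into consecutive n-tuples, dropping any incomplete tail."""
    # lst[i::n] takes every n-th element starting at offset i, so the n slices
    # hold the first, second, ... members of each tuple. zip stitches them back
    # together and stops at the shortest slice, which discards the tail.
    return list(zip(*[lst[i::n] for i in range(n)]))


print(group([0, 3, 4, 10, 2, 3], 2))  # [(0, 3), (4, 10), (2, 3)]
print(group(range(10), 3))            # [(0, 1, 2), (3, 4, 5), (6, 7, 8)]
```

The input still has to support slicing, so lists, tuples, and `range` objects work but generators do not.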

Comments

  1. At 4:54 a.m. on 2 Sep 2004, Kent Johnson said:

    Solution using a generator. You can do this with a generator and avoid creating the intermediate lists:

    def group(lst, n):
      for i in range(0, len(lst), n):
        val = lst[i:i+n]
        if len(val) == n:
          yield tuple(val)
    
    >>> list(group([0,3,4,10,2,3], 2))
    [(0, 3), (4, 10), (2, 3)]
    >>> list(group(range(10), 3))
    [(0, 1, 2), (3, 4, 5), (6, 7, 8)]
    
  2. At 4:25 a.m. on 3 Sep 2004, Hamish Lawson said:

    Requires input to be a sequence rather than just an iterable. Because the function slices its input, it works only on sequences, not on arbitrary iterables. For a generator-based version that accepts any iterable (and which doesn't discard incomplete tail items), see my own recipe at http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/303279.
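
The behaviour described in this comment (any iterable accepted, incomplete tail kept) can be illustrated with a short Python 3 sketch. This is not the code from the linked recipe; the name `chunked` is hypothetical and chosen only for this example.

```python
from itertools import islice


def chunked(iterable, n):
    """Yield tuples of up to n items; the final tuple may be shorter."""
    it = iter(iterable)  # works for lists, generators, files, ...
    while True:
        chunk = tuple(islice(it, n))  # pull at most n items from the iterator
        if not chunk:                 # iterator exhausted
            return
        yield chunk


print(list(chunked((x for x in range(10)), 3)))
# [(0, 1, 2), (3, 4, 5), (6, 7, 8), (9,)]
```

Only one chunk is held in memory at a time, at the cost of a Python-level loop per tuple.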

  3. At 4:14 a.m. on 6 Sep 2004, Brian Quinlan (the author) said:

    Speed comparison. There is a speed vs memory vs flexibility trade-off that needs to be made here (along with the correct semantics regarding the final incomplete tuple). Here are my timings for the 4 functions that have been presented (group2 and batch2 are the alternate implementations suggested in comments). As you can see, I've skewed the test in favor of the iteration implementations by using a fairly large input list and starting with an iterable input.

    import timeit
    
    for fn in ['group1', 'group2', 'batch', 'batch2']:
        if fn.startswith('group'):
            call = '%s(list(x), 3)' % fn
        else:
            call = '%s(x, 3)' % fn
    
        timer = timeit.Timer(
            'for y in %s: pass\n' % call,
            'x = xrange(10000); from __main__ import %s' % fn)
        print timer.timeit(1000)
    
    Results:
    
    5.60028100014
    13.2801439762
    16.8175079823
    19.8542819023
    

    In the code that I designed this function for [len(lst) ~= 1000, list input], my algorithm is more than 10x faster than the next fastest alternative.
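
To repeat this kind of comparison on Python 3, a rough harness along the following lines could be used. The names `group_zip`, `group_gen`, and `time_grouping` are hypothetical, the two functions are Python 3 adaptations of the recipe and of the first comment's generator, and no timings are claimed here because they depend on the machine and interpreter.

```python
import timeit


def group_zip(seq, n):
    """Slicing-based recipe: needs a sequence, returns a list."""
    return list(zip(*[seq[i::n] for i in range(n)]))


def group_gen(seq, n):
    """Generator variant from the first comment: yields complete tuples only."""
    for i in range(0, len(seq), n):
        chunk = seq[i:i + n]
        if len(chunk) == n:
            yield tuple(chunk)


def time_grouping(fn, size=10000, repeats=20):
    """Return the total seconds taken to group a list of `size` items `repeats` times."""
    data = list(range(size))
    # list(...) forces the generator variant to run to completion.
    return timeit.timeit(lambda: list(fn(data, 3)), number=repeats)


if __name__ == "__main__":
    for fn in (group_zip, group_gen):
        print(fn.__name__, time_grouping(fn))
```

Passing a lambda to `timeit.timeit` avoids the `from __main__ import ...` setup string, at the price of one extra function call per repetition.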

  4. At 7:43 a.m. on 6 Sep 2004, Brian Quinlan (the author) said:

    Fastest iterable version.

    def group(lst, n):
        """group([0,3,4,10,2,3], 2) => iterator
    
        Group an iterable into an n-tuples iterable. Incomplete tuples
        are discarded e.g.
    
        >>> list(group(range(10), 3))
        [(0, 1, 2), (3, 4, 5), (6, 7, 8)]
        """
    
    Timings:
    
    6.01822805405
    14.5682420731
    2.44970393181 # this version
    18.8882629871
    21.9563779831
    
    I'm leaving the original recipe here because it is still appropriate if:
    
    o you are starting with a sequence
    o you need list output
    o the size of your list/iterable is small
    
  5. At 7:57 a.m. on 6 Sep 2004, Brian Quinlan (the author) said:

    Now with code!

    import itertools

    def group3(lst, n):
        """group3([0,3,4,10,2,3], 2) => iterator
    
        Group an iterable into an n-tuples iterable. Incomplete tuples
        are discarded e.g.
    
        >>> list(group3(range(10), 3))
        [(0, 1, 2), (3, 4, 5), (6, 7, 8)]
        """
        return itertools.izip(*[itertools.islice(lst, i, None, n) for i in range(n)])
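
One caveat worth illustrating: the `islice` version makes n separate passes over its input, so it suits lists and other re-iterable objects, but a one-shot iterator such as a generator would be shared between the slices and give wrong tuples. A different, widely used idiom that handles one-shot iterators is sketched below for Python 3. It is not part of the recipe, and the name `group_iter` is hypothetical.

```python
def group_iter(iterable, n):
    """Lazily group any iterable into n-tuples, dropping an incomplete tail."""
    # The list holds the same iterator object n times, so zip draws n
    # consecutive items from it to build each tuple and stops as soon as
    # the iterator runs out partway through a tuple.
    return zip(*[iter(iterable)] * n)


print(list(group_iter((x for x in range(10)), 3)))
# [(0, 1, 2), (3, 4, 5), (6, 7, 8)]
```

Like the recipe, this discards the final incomplete tuple; unlike it, the input is consumed exactly once.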
    
