
Break a list into roughly equal sized pieces.

Python, 6 lines
def split_seq(seq, size):
    newseq = []
    splitsize = 1.0/size*len(seq)
    for i in range(size):
        newseq.append(seq[int(round(i*splitsize)):int(round((i+1)*splitsize))])
    return newseq

Inspired by http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/425044

This requires knowing the length of the sequence beforehand, so it can't be used as is with an arbitrary iterable such as a generator. It's simple, but it's easy to introduce fencepost errors when implementing it.

>>> split_seq(range(10), 3)
[[0, 1, 2], [3, 4, 5, 6], [7, 8, 9]]
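
For input that has no len(), such as a generator, one option is to copy it into a list first. A minimal sketch of that idea, assuming Python 3; the name split_seq_any and the parameter num_pieces are made up for this illustration and are not part of the recipe:

```python
def split_seq_any(iterable, num_pieces):
    """Split any iterable into num_pieces roughly equal lists."""
    items = list(iterable)            # materialize so len() and slicing work
    step = len(items) / num_pieces    # ideal (fractional) piece length
    # Each boundary is computed once, so neighbouring pieces always share it.
    bounds = [int(round(i * step)) for i in range(num_pieces + 1)]
    return [items[bounds[i]:bounds[i + 1]] for i in range(num_pieces)]


print(split_seq_any(iter(range(10)), 3))  # [[0, 1, 2], [3, 4, 5, 6], [7, 8, 9]]
```

The copy costs memory proportional to the input, so this only suits iterables that fit in memory.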

7 comments

Jeremy Dunck 11 years, 10 months ago

Naming suggestion. The second parameter, "size", would be better named "numPieces", since split_seq([...], x) always returns a list with x pieces.

>>> split_seq(range(10), 1)
[[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]]
>>> split_seq(range(10), 2)
[[0, 1, 2, 3, 4], [5, 6, 7, 8, 9]]

Paul Watson 11 years, 10 months ago

All-integer approach. Here is an all-integer method that builds a list of range boundaries first, then slices the sequence.

def split_seq(seq, numpieces):
    seqlen = len(seq)
    d, m = divmod(seqlen, numpieces)
    # list(...) keeps this working where range() returns a lazy object rather than a list
    rlist = list(range(0, ((d + 1) * (m + 1)), (d + 1)))
    if d != 0: rlist += list(range(rlist[-1] + d, seqlen, d)) + [seqlen]

    newseq = []
    for i in range(len(rlist) - 1):
        newseq.append(seq[rlist[i]:rlist[i + 1]])

    # pad with separate empty lists ([[]] * k would repeat one shared list object)
    newseq += [[] for _ in range(max(0, numpieces - seqlen))]
    return newseq
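
For comparison, the size rule behind the boundary list (with d, m = divmod(seqlen, numpieces), the first m pieces get d + 1 items and the rest get d) can be written without building the boundaries up front. This is only a sketch, assuming Python 3; the name split_by_divmod is hypothetical and not from the comment above:

```python
def split_by_divmod(seq, num_pieces):
    """First `extra` pieces get base + 1 items, the rest get base items."""
    items = list(seq)
    base, extra = divmod(len(items), num_pieces)
    pieces = []
    start = 0
    for i in range(num_pieces):
        stop = start + base + (1 if i < extra else 0)
        pieces.append(items[start:stop])  # a fresh list each time, even when empty
        start = stop
    return pieces


print(split_by_divmod(range(10), 4))  # [[0, 1, 2], [3, 4, 5], [6, 7], [8, 9]]
print(split_by_divmod(range(2), 5))   # [[0], [1], [], [], []]
```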

Greg Jorgensen 11 years, 10 months ago

Alternate implementation using integers. This version uses integer math and hands out the remainder items one apiece to the first few splits.

def split_seq(seq, p):
    newseq = []
    n = len(seq) // p   # min items per subsequence (floor division keeps n an integer)
    r = len(seq) % p    # remaindered items
    b,e = 0, n + min(1, r)  # first split
    for i in range(p):
        newseq.append(seq[b:e])
        r = max(0, r-1)  # use up remainders
        b,e = e, e + n + min(1, r)  # min(1,r) is always 0 or 1

    return newseq


>>> split_seq(range(10), 3)
[[0, 1, 2, 3], [4, 5, 6], [7, 8, 9]]
>>> split_seq(range(11), 3)
[[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10]]
>>> split_seq(range(10), 4)
[[0, 1, 2], [3, 4, 5], [6, 7], [8, 9]]

Marc Keller 11 years, 10 months ago

Again, an integer approach.

def splitCeil(seq, m):
    """Distribute the seq elements in lists in m groups
       according to quasi-equitable distribution (decreasing order):
         splitCeil(range(13), 4) --> seq = range(13), m=4
         result : [[0, 1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]]
    """
    n,b,newseq=len(seq),0,[]
    for k in range(m):
        q,r=divmod(n-k,m)
        a, b = b, b + q + (r!=0)
        newseq.append(seq[a:b])
    return newseq

def splitFloor(seq, m):
    """Distribute the seq elements in lists in m groups
       according to quasi-equitable distribution (increasing order):
         seq = range(13), m=4
         result : [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9, 10, 11, 12]]
    """
    n,b,newseq=len(seq),0,[]
    for k in range(m):
        a, b = b, b + (n+k)//m
        newseq.append(seq[a:b])
    return newseq

Sebastian Hempel 9 years, 9 months ago

Another approach using slicing for the calculation of the list lengths.

def slice_it(li, cols=2):
    start = 0
    for i in range(cols):  # range instead of xrange so this also runs on Python 3
        stop = start + len(li[i::cols])
        yield li[start:stop]
        start = stop
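
The length of li[i::cols] can also be computed directly, which avoids building slices only to measure them. A sketch of that variation, assuming Python 3 (slice_it_arith is a made-up name): for 0 <= i < cols, len(li[i::cols]) equals the ceiling of (len(li) - i) / cols.

```python
def slice_it_arith(li, cols=2):
    """Yield cols contiguous pieces with the same lengths as li[i::cols]."""
    n = len(li)
    start = 0
    for i in range(cols):
        # ceil((n - i) / cols) written with integer arithmetic
        stop = start + (n - i + cols - 1) // cols
        yield li[start:stop]
        start = stop


# The function is a generator, so wrap it in list() to see all pieces at once.
print(list(slice_it_arith(list(range(10)), 3)))  # [[0, 1, 2, 3], [4, 5, 6], [7, 8, 9]]
```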

Gary Robinson 7 years, 7 months ago

Here is a one-liner that has the following characteristics:

1) It gives the exact number of smaller sequences that are desired.

2) The lengths of the smaller sequences are as similar as can be.

3) The smaller sequences are not contiguous subsequences; the original sequence is sliced and diced. But all the original elements are included; usually, having actual subsequences is less important (in my own experience) than having the right number of sequences and having them be of similar lengths.

Where seq is the original sequence and num is the desired number of subsequences:

[seq[i::num] for i in range(num)]

This issue is discussed in more detail on my blog.
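
To make the interleaving concrete, here is a small illustration, assuming Python 3 and a list as input; the values are made up for the demonstration:

```python
seq = list(range(10))
num = 3

# Piece i takes every num-th element, starting at index i.
pieces = [seq[i::num] for i in range(num)]
print(pieces)  # [[0, 3, 6, 9], [1, 4, 7], [2, 5, 8]]
```

Piece lengths differ by at most one, but neighbouring elements of the original land in different pieces.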

Amit Bhatkal 2 years ago

Implementation using the numpy.linspace method. Just specify the number of parts you want the array to be divided into. The array will be divided into parts of nearly equal size.

Example :

>>> import numpy as np   
>>> a=np.arange(10)
>>> a
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])



>>> parts=3
>>> i=np.linspace(0,len(a),parts+1) # boundaries over index positions, not over the values in a
>>> i
array([  0.        ,   3.33333333,   6.66666667,  10.        ])
>>> i=np.array(i,dtype='uint16') # Indices must be integers, so the floats are truncated
>>> i
array([ 0,  3,  6, 10], dtype=uint16)



>>> split_arr=[]
>>> for ind in range(i.size-1):
...     split_arr.append(a[i[ind]:i[ind+1]])



>>> split_arr
[array([0, 1, 2]), array([3, 4, 5]), array([6, 7, 8, 9])]
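
The same boundaries can be had without floats. A sketch in plain Python 3 (split_by_bounds is a hypothetical name, not a NumPy function) that uses integer division in place of linspace followed by truncation:

```python
def split_by_bounds(seq, parts):
    """Cut seq at the integer boundaries i * len(seq) // parts."""
    n = len(seq)
    # For n = 10 and parts = 3 the boundaries are [0, 3, 6, 10].
    bounds = [i * n // parts for i in range(parts + 1)]
    return [seq[bounds[i]:bounds[i + 1]] for i in range(parts)]


print(split_by_bounds(list(range(10)), 3))  # [[0, 1, 2], [3, 4, 5], [6, 7, 8, 9]]
```

NumPy also provides numpy.array_split, which splits an array into a requested number of nearly equal parts; it puts the larger parts first, so its grouping differs from the one shown above.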


Created by Nick Matsakis on Fri, 10 Jun 2005 (PSF)