We all know groupby(), which is available in the itertools standard module. It yields groups of consecutive elements from the input that share the same key; for non-consecutive elements it yields more than one group for the same key.
So effectively, groupby() only reformats a flat list into bunches of elements from that list, without reordering anything. In practice this means that for input sorted by key it works perfectly, but for unsorted input it may yield several groups for the same key (with groups for other keys in between). What is typically needed, though, is a grouping that reorders where necessary.
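A minimal example (mine, not from the original post) shows that behavior:

from itertools import groupby

data = [1, 1, 2, 1]
print([(k, list(g)) for k, g in groupby(data)])
# prints [(1, [1, 1]), (2, [2]), (1, [1])] -- key 1 appears in two groups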
I implemented a similarly lazy function (yielding generators) which also accepts unsorted input.
def groupbyUnsorted(input, key=lambda x: x):
    # Needs an indexable sequence; each yielded group is a lazy generator
    # that scans the rest of the sequence for matching keys on demand.
    yielded = set()
    keys = [key(element) for element in input]
    for i, wantedKey in enumerate(keys):
        if wantedKey not in yielded:
            yield (wantedKey,
                   (input[j] for j in range(i, len(input)) if keys[j] == wantedKey))
            yielded.add(wantedKey)
This was derived from a Stack Overflow question in which the list of lists
xs = [
[1,2,3,4],
[5,6,7,8],
[9,0,0,1],
[2,3],
[0],
[5,8,3,2,5,1],
[6,4],
[1,6,9,9,2,9]
]
was supposed to be grouped by the length of the inner lists.
A solution using my function looks like this:
{ key: list(value) for (key, value) in groupbyUnsorted(xs, len) }
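For the xs above this evaluates to (keys appear in first-occurrence order):

{4: [[1, 2, 3, 4], [5, 6, 7, 8], [9, 0, 0, 1]],
 2: [[2, 3], [6, 4]],
 1: [[0]],
 6: [[5, 8, 3, 2, 5, 1], [1, 6, 9, 9, 2, 9]]}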
The version above has O(mn) complexity, where m is the number of distinct keys and n is the length of the input. A much faster version with optimal O(n) complexity is possible by resorting to a dict of lists in which we store the indexes of the original sequence.
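A minimal sketch of that idea (the name groupby_dict and the details are mine, not taken from the original):

from collections import defaultdict

def groupby_dict(seq, key=lambda x: x):
    # One pass: record, for every key, the indexes at which it occurs.
    indexes = defaultdict(list)
    for i, element in enumerate(seq):
        indexes[key(element)].append(i)
    # Dicts keep insertion order (Python 3.7+), so keys come out in
    # first-occurrence order, just like groupbyUnsorted().
    for k, idxs in indexes.items():
        yield k, (seq[i] for i in idxs)

Using it the same way, { key: list(value) for (key, value) in groupby_dict(xs, len) } produces the same dict as above.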
Here's some quick benchmarking on my laptop:
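The original timing output isn't reproduced here, but a harness along these lines (with hypothetical test data) runs the comparison:

import timeit

big = [[0] * (i % 50) for i in range(10_000)]  # hypothetical test data

t_scan = timeit.timeit(
    lambda: {k: list(v) for k, v in groupbyUnsorted(big, len)}, number=10)
t_dict = timeit.timeit(
    lambda: {k: list(v) for k, v in groupby_dict(big, len)}, number=10)
print(f"scan: {t_scan:.3f}s  dict: {t_dict:.3f}s")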
More than 100 times faster in this case :)