This recipe provides a lightweight way to group arbitrary objects together into disjoint sets when a full-blown graph data structure would be overkill.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 | class Grouper(object):
"""This class provides a lightweight way to group arbitrary objects
together into disjoint sets when a full-blown graph data structure
would be overkill.
Objects can be joined using .join(), tested for connectedness
using .joined(), and all disjoint sets can be retreived using
.get().
The objects being joined must be hashable.
For example:
>>> g = grouper.Grouper()
>>> g.join('a', 'b')
>>> g.join('b', 'c')
>>> g.join('d', 'e')
>>> list(g.get())
[['a', 'b', 'c'], ['d', 'e']]
>>> g.joined('a', 'b')
True
>>> g.joined('a', 'c')
True
>>> g.joined('a', 'd')
False"""
def __init__(self, init=[]):
mapping = self._mapping = {}
for x in init:
mapping[x] = [x]
def join(self, a, *args):
"""Join given arguments into the same set.
Accepts one or more arguments."""
mapping = self._mapping
set_a = mapping.setdefault(a, [a])
for arg in args:
set_b = mapping.get(arg)
if set_b is None:
set_a.append(arg)
mapping[arg] = set_a
elif set_b is not set_a:
if len(set_b) > len(set_a):
set_a, set_b = set_b, set_a
set_a.extend(set_b)
for elem in set_b:
mapping[elem] = set_a
def joined(self, a, b):
"""Returns True if a and b are members of the same set."""
mapping = self._mapping
try:
return mapping[a] is mapping[b]
except KeyError:
return False
def __iter__(self):
"""Returns an iterator returning each of the disjoint sets as a list."""
seen = set()
for elem, group in self._mapping.iteritems():
if elem not in seen:
yield group
seen.update(group)
|
I've used this class to group vector objects on page images, and group sets of webpages by their links, but there are many other conceivable applications.
The objects being joined must be hashable, and may not be "None".
The original has been edited as per Raymond's first two suggestions.
Nicely done. Here are couple of minor suggestions.
The joined() method can return True, False, or None (when "a" is not known). It can fixed and simplified to:
The get() method should be renamed to __iter__(). Its logic can also be simplified:
Small Algorithmic Improvement. The most time consuming part of join() is merging two existing sets. That time is kept to a minimum by keeping the larger set and merging in the smaller:
Also, it is faster to test "a is not b" than "not a is b".
API suggestion for __init__(). The __init__() method should accept a series of join pairs: [('a', 'b'), ('c', 'd'), ...]. That makes more sense than accepting a list of elements which initially define distinct groups but can become paired later. Thinking in terms of use cases, it would be odd for this application to have a list of elements without any information to link them, and if it did have the links, it would be odd to not use that information during initialization.
Thanks. I really appreciate the feedback.
I agree wholeheartedly with the first two suggestions.
However, for this one, I disagree. In my own use case, I generally know all the elements first, and then determine their grouping later. If I knew their grouping already, I wouldn't have a need for this class ;)
While I can conceive of use cases where your suggestion would be more convenient, it makes my use case more inconvenient (I would have to pass a list of lists, rather than just a list.)
Perhaps a couple different factory functions with either case would be better?
Nice class. Wow, this is a great class. In the two uses I've had for groupings, it makes more sense for __init__ to take a list of elements than a list of pairs. I've stopped using my own (less Pythonic) grouping class, and now I use yours. Thanks!
Sorry that should be
Thanks for the class; saved me a bunch of work.
For my use it was helpful to access this Group using subscript; i.e. g['a'] -> ['a', 'b', 'c']
Code to implement this is:
def __getitem__(self, a):
# Return the set that contains an itemfor set in list(self):
for elem in set:if a in elem:
return set