Welcome, guest | Sign In | My Account | Store | Cart

This recipe provides a lightweight way to group arbitrary objects together into disjoint sets when a full-blown graph data structure would be overkill.

Python, 63 lines
 ``` 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63``` ```class Grouper(object): """This class provides a lightweight way to group arbitrary objects together into disjoint sets when a full-blown graph data structure would be overkill. Objects can be joined using .join(), tested for connectedness using .joined(), and all disjoint sets can be retreived using .get(). The objects being joined must be hashable. For example: >>> g = grouper.Grouper() >>> g.join('a', 'b') >>> g.join('b', 'c') >>> g.join('d', 'e') >>> list(g.get()) [['a', 'b', 'c'], ['d', 'e']] >>> g.joined('a', 'b') True >>> g.joined('a', 'c') True >>> g.joined('a', 'd') False""" def __init__(self, init=[]): mapping = self._mapping = {} for x in init: mapping[x] = [x] def join(self, a, *args): """Join given arguments into the same set. Accepts one or more arguments.""" mapping = self._mapping set_a = mapping.setdefault(a, [a]) for arg in args: set_b = mapping.get(arg) if set_b is None: set_a.append(arg) mapping[arg] = set_a elif set_b is not set_a: if len(set_b) > len(set_a): set_a, set_b = set_b, set_a set_a.extend(set_b) for elem in set_b: mapping[elem] = set_a def joined(self, a, b): """Returns True if a and b are members of the same set.""" mapping = self._mapping try: return mapping[a] is mapping[b] except KeyError: return False def __iter__(self): """Returns an iterator returning each of the disjoint sets as a list.""" seen = set() for elem, group in self._mapping.iteritems(): if elem not in seen: yield group seen.update(group) ```

I've used this class to group vector objects on page images, and group sets of webpages by their links, but there are many other conceivable applications.

The objects being joined must be hashable, and may not be "None".

The original has been edited as per Raymond's first two suggestions.

Raymond Hettinger 19 years ago

Nicely done. Here are couple of minor suggestions.

The joined() method can return True, False, or None (when "a" is not known). It can fixed and simplified to:

``````def joined(self, a, b):
"""Returns True if a and b are members of the same set."""
mapping = self._mapping
try:
return mapping[a] is mapping[b]
except KeyError:
return False
``````

The get() method should be renamed to __iter__(). Its logic can also be simplified:

``````def __iter__(self):
"""Returns an iterator returning each of the disjoint sets as a list."""
seen = set()
for elem, group in self._mapping.iteritems():
if elem not in seen:
yield group
seen.update(group)
``````
Raymond Hettinger 19 years ago

Small Algorithmic Improvement. The most time consuming part of join() is merging two existing sets. That time is kept to a minimum by keeping the larger set and merging in the smaller:

``````def join(self, a, *args):
"""Join given arguments into the same set."""
mapping = self._mapping
set_a = mapping.setdefault(a, [a])

for arg in args:
set_b = mapping.get(arg)
if set_b is None:
set_a.append(arg)
mapping[arg] = set_a
elif set_b is not set_a:
if len(set_b) &gt; len(set_a):
set_a, set_b = set_b, set_a
set_a.extend(set_b)
for elem in set_b:
mapping[elem] = set_a
``````

Also, it is faster to test "a is not b" than "not a is b".

Raymond Hettinger 19 years ago

API suggestion for __init__(). The __init__() method should accept a series of join pairs: [('a', 'b'), ('c', 'd'), ...]. That makes more sense than accepting a list of elements which initially define distinct groups but can become paired later. Thinking in terms of use cases, it would be odd for this application to have a list of elements without any information to link them, and if it did have the links, it would be odd to not use that information during initialization.

Michael Droettboom (author) 18 years, 9 months ago

Thanks. I really appreciate the feedback.

I agree wholeheartedly with the first two suggestions.

However, for this one, I disagree. In my own use case, I generally know all the elements first, and then determine their grouping later. If I knew their grouping already, I wouldn't have a need for this class ;)

While I can conceive of use cases where your suggestion would be more convenient, it makes my use case more inconvenient (I would have to pass a list of lists, rather than just a list.)

Perhaps a couple different factory functions with either case would be better?

Connelly Barnes 18 years, 5 months ago

Nice class. Wow, this is a great class. In the two uses I've had for groupings, it makes more sense for __init__ to take a list of elements than a list of pairs. I've stopped using my own (less Pythonic) grouping class, and now I use yours. Thanks!

brian russo 15 years, 5 months ago

Sorry that should be

``````def __getitem__(self, a):
# Return the set that contains an item
for set in list(self):
for elem in set:
if a in elem:
return set
``````
brian russo 15 years, 5 months ago

Thanks for the class; saved me a bunch of work.

For my use it was helpful to access this Group using subscript; i.e. g['a'] -> ['a', 'b', 'c']

Code to implement this is:

```def __getitem__(self, a): ``` # Return the set that contains an item ```for set in list(self): ``` for elem in set: ```if a in elem: ``` return set

 Created by Michael Droettboom on Thu, 17 Feb 2005 (PSF)

### Required Modules

• (none specified)