Welcome, guest | Sign In | My Account | Store | Cart

This recipe provides a lightweight way to group arbitrary objects together into disjoint sets when a full-blown graph data structure would be overkill.

Python, 63 lines
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
class Grouper(object):
   """This class provides a lightweight way to group arbitrary objects
together into disjoint sets when a full-blown graph data structure
would be overkill.

Objects can be joined using .join(), tested for connectedness
using .joined(), and all disjoint sets can be retreived using
.get().

The objects being joined must be hashable.

For example:

>>> g = grouper.Grouper()
>>> g.join('a', 'b')
>>> g.join('b', 'c')
>>> g.join('d', 'e')
>>> list(g.get())
[['a', 'b', 'c'], ['d', 'e']]
>>> g.joined('a', 'b')
True
>>> g.joined('a', 'c')
True
>>> g.joined('a', 'd')
False"""   
   def __init__(self, init=[]):
      mapping = self._mapping = {}
      for x in init:
         mapping[x] = [x]
        
   def join(self, a, *args):
      """Join given arguments into the same set.
Accepts one or more arguments."""
      mapping = self._mapping
      set_a = mapping.setdefault(a, [a])

      for arg in args:
         set_b = mapping.get(arg)
         if set_b is None:
            set_a.append(arg)
            mapping[arg] = set_a
         elif set_b is not set_a:
            if len(set_b) > len(set_a):
               set_a, set_b = set_b, set_a
            set_a.extend(set_b)
            for elem in set_b:
               mapping[elem] = set_a

   def joined(self, a, b):
      """Returns True if a and b are members of the same set."""
      mapping = self._mapping
      try:
          return mapping[a] is mapping[b]
      except KeyError:
          return False

   def __iter__(self):
      """Returns an iterator returning each of the disjoint sets as a list."""
      seen = set()
      for elem, group in self._mapping.iteritems():
          if elem not in seen:
              yield group
              seen.update(group)

I've used this class to group vector objects on page images, and group sets of webpages by their links, but there are many other conceivable applications.

The objects being joined must be hashable, and may not be "None".

The original has been edited as per Raymond's first two suggestions.

7 comments

Raymond Hettinger 16 years, 9 months ago  # | flag

Nicely done. Here are couple of minor suggestions.

The joined() method can return True, False, or None (when "a" is not known). It can fixed and simplified to:

def joined(self, a, b):
   """Returns True if a and b are members of the same set."""
   mapping = self._mapping
   try:
       return mapping[a] is mapping[b]
   except KeyError:
       return False

The get() method should be renamed to __iter__(). Its logic can also be simplified:

def __iter__(self):
   """Returns an iterator returning each of the disjoint sets as a list."""
   seen = set()
   for elem, group in self._mapping.iteritems():
       if elem not in seen:
           yield group
           seen.update(group)
Raymond Hettinger 16 years, 9 months ago  # | flag

Small Algorithmic Improvement. The most time consuming part of join() is merging two existing sets. That time is kept to a minimum by keeping the larger set and merging in the smaller:

def join(self, a, *args):
   """Join given arguments into the same set."""
   mapping = self._mapping
   set_a = mapping.setdefault(a, [a])

   for arg in args:
      set_b = mapping.get(arg)
      if set_b is None:
         set_a.append(arg)
         mapping[arg] = set_a
      elif set_b is not set_a:
         if len(set_b) > len(set_a):
             set_a, set_b = set_b, set_a
         set_a.extend(set_b)
         for elem in set_b:
            mapping[elem] = set_a

Also, it is faster to test "a is not b" than "not a is b".

Raymond Hettinger 16 years, 9 months ago  # | flag

API suggestion for __init__(). The __init__() method should accept a series of join pairs: [('a', 'b'), ('c', 'd'), ...]. That makes more sense than accepting a list of elements which initially define distinct groups but can become paired later. Thinking in terms of use cases, it would be odd for this application to have a list of elements without any information to link them, and if it did have the links, it would be odd to not use that information during initialization.

Michael Droettboom (author) 16 years, 7 months ago  # | flag

Thanks. I really appreciate the feedback.

I agree wholeheartedly with the first two suggestions.

However, for this one, I disagree. In my own use case, I generally know all the elements first, and then determine their grouping later. If I knew their grouping already, I wouldn't have a need for this class ;)

While I can conceive of use cases where your suggestion would be more convenient, it makes my use case more inconvenient (I would have to pass a list of lists, rather than just a list.)

Perhaps a couple different factory functions with either case would be better?

Connelly Barnes 16 years, 2 months ago  # | flag

Nice class. Wow, this is a great class. In the two uses I've had for groupings, it makes more sense for __init__ to take a list of elements than a list of pairs. I've stopped using my own (less Pythonic) grouping class, and now I use yours. Thanks!

brian russo 13 years, 2 months ago  # | flag

Sorry that should be

def __getitem__(self, a):
    # Return the set that contains an item
    for set in list(self):
        for elem in set:
            if a in elem:
                return set
brian russo 13 years, 2 months ago  # | flag

Thanks for the class; saved me a bunch of work.

For my use it was helpful to access this Group using subscript; i.e. g['a'] -> ['a', 'b', 'c']

Code to implement this is:

def __getitem__(self, a): # Return the set that contains an item for set in list(self): for elem in set: if a in elem: return set

Created by Michael Droettboom on Thu, 17 Feb 2005 (PSF)
Python recipes (4591)
Michael Droettboom's recipes (3)

Required Modules

  • (none specified)

Other Information and Tasks