ActiveState Code

Recipe 387776: Grouping objects into disjoint sets


This recipe provides a lightweight way to group arbitrary objects together into disjoint sets when a full-blown graph data structure would be overkill.

Python
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
class Grouper(object):
   """This class provides a lightweight way to group arbitrary objects
together into disjoint sets when a full-blown graph data structure
would be overkill.

Objects can be joined using .join(), tested for connectedness
using .joined(), and all disjoint sets can be retreived using
.get().

The objects being joined must be hashable.

For example:

>>> g = grouper.Grouper()
>>> g.join('a', 'b')
>>> g.join('b', 'c')
>>> g.join('d', 'e')
>>> list(g.get())
[['a', 'b', 'c'], ['d', 'e']]
>>> g.joined('a', 'b')
True
>>> g.joined('a', 'c')
True
>>> g.joined('a', 'd')
False"""   
   def __init__(self, init=[]):
      mapping = self._mapping = {}
      for x in init:
         mapping[x] = [x]
        
   def join(self, a, *args):
      """Join given arguments into the same set.
Accepts one or more arguments."""
      mapping = self._mapping
      set_a = mapping.setdefault(a, [a])

      for arg in args:
         set_b = mapping.get(arg)
         if set_b is None:
            set_a.append(arg)
            mapping[arg] = set_a
         elif set_b is not set_a:
            if len(set_b) > len(set_a):
               set_a, set_b = set_b, set_a
            set_a.extend(set_b)
            for elem in set_b:
               mapping[elem] = set_a

   def joined(self, a, b):
      """Returns True if a and b are members of the same set."""
      mapping = self._mapping
      try:
          return mapping[a] is mapping[b]
      except KeyError:
          return False

   def __iter__(self):
      """Returns an iterator returning each of the disjoint sets as a list."""
      seen = set()
      for elem, group in self._mapping.iteritems():
          if elem not in seen:
              yield group
              seen.update(group)

Discussion

I've used this class to group vector objects on page images, and group sets of webpages by their links, but there are many other conceivable applications.

The objects being joined must be hashable, and may not be "None".

The original has been edited as per Raymond's first two suggestions.

Comments

  1. 1. At 7:54 a.m. on 22 feb 2005, Raymond Hettinger said:

    Nicely done. Here are couple of minor suggestions.

    The joined() method can return True, False, or None (when "a" is not known). It can fixed and simplified to:

    def joined(self, a, b):
       """Returns True if a and b are members of the same set."""
       mapping = self._mapping
       try:
           return mapping[a] is mapping[b]
       except KeyError:
           return False
    

    The get() method should be renamed to __iter__(). Its logic can also be simplified:

    def __iter__(self):
       """Returns an iterator returning each of the disjoint sets as a list."""
       seen = set()
       for elem, group in self._mapping.iteritems():
           if elem not in seen:
               yield group
               seen.update(group)
    
  2. 2. At 11:13 a.m. on 23 feb 2005, Raymond Hettinger said:

    Small Algorithmic Improvement. The most time consuming part of join() is merging two existing sets. That time is kept to a minimum by keeping the larger set and merging in the smaller:

    def join(self, a, *args):
       """Join given arguments into the same set."""
       mapping = self._mapping
       set_a = mapping.setdefault(a, [a])
    
       for arg in args:
          set_b = mapping.get(arg)
          if set_b is None:
             set_a.append(arg)
             mapping[arg] = set_a
          elif set_b is not set_a:
             if len(set_b) > len(set_a):
                 set_a, set_b = set_b, set_a
             set_a.extend(set_b)
             for elem in set_b:
                mapping[elem] = set_a
    

    Also, it is faster to test "a is not b" than "not a is b".

  3. 3. At 10:45 p.m. on 23 feb 2005, Raymond Hettinger said:

    API suggestion for __init__(). The __init__() method should accept a series of join pairs: [('a', 'b'), ('c', 'd'), ...]. That makes more sense than accepting a list of elements which initially define distinct groups but can become paired later. Thinking in terms of use cases, it would be odd for this application to have a list of elements without any information to link them, and if it did have the links, it would be odd to not use that information during initialization.

  4. 4. At 1:02 p.m. on 9 may 2005, Michael Droettboom (the author) said:

    Thanks. I really appreciate the feedback.

    I agree wholeheartedly with the first two suggestions.

    However, for this one, I disagree. In my own use case, I generally know all the elements first, and then determine their grouping later. If I knew their grouping already, I wouldn't have a need for this class ;)

    While I can conceive of use cases where your suggestion would be more convenient, it makes my use case more inconvenient (I would have to pass a list of lists, rather than just a list.)

    Perhaps a couple different factory functions with either case would be better?

  5. 5. At 7:17 p.m. on 27 sep 2005, Connelly Barnes said:

    Nice class. Wow, this is a great class. In the two uses I've had for groupings, it makes more sense for __init__ to take a list of elements than a list of pairs. I've stopped using my own (less Pythonic) grouping class, and now I use yours. Thanks!

  6. 6. At 6:09 p.m. on 8 sep 2008, brian russo said:

    Sorry that should be

    def __getitem__(self, a):
        # Return the set that contains an item
        for set in list(self):
            for elem in set:
                if a in elem:
                    return set
    
  7. 7. At 1:07 a.m. on 9 sep 2008, brian russo said:

    Thanks for the class; saved me a bunch of work.

    For my use it was helpful to access this Group using subscript; i.e. g['a'] -> ['a', 'b', 'c']

    Code to implement this is:

    def __getitem__(self, a): # Return the set that contains an item for set in list(self): for elem in set: if a in elem: return set

Sign in to comment