ActiveState Code

Recipe 576694: OrderedSet


Set that remembers original insertion order.

Python
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
import collections

KEY, PREV, NEXT = range(3)

class OrderedSet(collections.MutableSet):

    def __init__(self, iterable=None):
        self.end = end = [] 
        end += [None, end, end]         # sentinel node for doubly linked list
        self.map = {}                   # key --> [key, prev, next]
        if iterable is not None:
            self |= iterable

    def __len__(self):
        return len(self.map)

    def __contains__(self, key):
        return key in self.map

    def add(self, key):
        if key not in self.map:
            end = self.end
            curr = end[PREV]
            curr[NEXT] = end[PREV] = self.map[key] = [key, curr, end]

    def discard(self, key):
        if key in self.map:        
            key, prev, next = self.map.pop(key)
            prev[NEXT] = next
            next[PREV] = prev

    def __iter__(self):
        end = self.end
        curr = end[NEXT]
        while curr is not end:
            yield curr[KEY]
            curr = curr[NEXT]

    def __reversed__(self):
        end = self.end
        curr = end[PREV]
        while curr is not end:
            yield curr[KEY]
            curr = curr[PREV]

    def pop(self, last=True):
        if not self:
            raise KeyError('set is empty')
        key = next(reversed(self)) if last else next(iter(self))
        self.discard(key)
        return key

    def __repr__(self):
        if not self:
            return '%s()' % (self.__class__.__name__,)
        return '%s(%r)' % (self.__class__.__name__, list(self))

    def __eq__(self, other):
        if isinstance(other, OrderedSet):
            return len(self) == len(other) and list(self) == list(other)
        return not self.isdisjoint(other)

    def __del__(self):
        self.clear()                    # remove circular references

            
if __name__ == '__main__':
    print(OrderedSet('abracadaba'))
    print(OrderedSet('simsalabim'))

Discussion

Runs on Py2.6 or later (and runs on 3.0 or later without any modifications).

Implementation based on a doubly linked link and an internal dictionary. This design gives OrderedSet the same big-Oh running times as regular sets including O(1) adds, removes, and lookups as well as O(n) iteration.

Comments

  1. 1. At 10:13 a.m. on 20 mar 2009, Andrew Henshaw said:

    In "__init__", iterable is checked to be "not None" before calling "update", which seems redundant because "update" does the same check before doing anything.

  2. 2. At 7:32 p.m. on 8 apr 2009, Adam Olsen said:

    Although this recipe is far superior in that it actually is a set, my older recipe doesn't require the cyclic GC to be collected. http://code.activestate.com/recipes/528878/ It'd be fairly easy to combine the two.

  3. 3. At 5:22 p.m. on 11 apr 2009, Raymond Hettinger (the author) said:

    Adam, this recipe doesn't use or rely on GC. Individual links get deleted immediately when discard() is called. Likewise, when the whole set is no longer needed, the __del__ method neatly deletes clears all of the links without waiting for GC.

    Also, FWIW, there is an alternate version of this recipe using weakrefs. See recipe 576696. Neither version needs GC.

  4. 4. At 12:53 p.m. on 23 sep 2009, Nagappan Alagappan said:

    Hello Raymond,

    I'm planing to use this module in LDTPv2 - http://ldtp.freedesktop.org under LGPLv2 license. If you have any objections, please write me to nagappan (at) gmail (dot) com. I have just mentioned your name in the file. I could not find your email id to specify in orderedset.py (http://cgit.freedesktop.org/ldtp/ldtp2/tree/ldtpd/orderedset.py). You can email me that as well, if you are interested with my proposal :)

    You have done a wonderful job.

    Thanks Nagappan

Sign in to comment