Welcome, guest | Sign In | My Account | Store | Cart

Set that remembers original insertion order.

Python, 71 lines
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
import collections

class OrderedSet(collections.MutableSet):
    
    KEY, PREV, NEXT = range(3)

    def __init__(self, iterable=None):
        self.end = end = [] 
        end += [None, end, end]         # sentinel node for doubly linked list
        self.map = {}                   # key --> [key, prev, next]
        if iterable is not None:
            self |= iterable

    def __len__(self):
        return len(self.map)

    def __contains__(self, key):
        return key in self.map

    def add(self, key):
        if key not in self.map:
            end = self.end
            curr = end[self.PREV]
            curr[self.NEXT] = end[self.PREV] = self.map[key] = [key, curr, end]

    def discard(self, key):
        if key in self.map:        
            key, prev, next = self.map.pop(key)
            prev[self.NEXT] = next
            next[self.PREV] = prev

    def __iter__(self):
        end = self.end
        curr = end[self.NEXT]
        while curr is not end:
            yield curr[self.KEY]
            curr = curr[self.NEXT]

    def __reversed__(self):
        end = self.end
        curr = end[self.PREV]
        while curr is not end:
            yield curr[self.KEY]
            curr = curr[self.PREV]

    def pop(self, last=True):
        if not self:
            raise KeyError('set is empty')
        key = next(reversed(self)) if last else next(iter(self))
        self.discard(key)
        return key

    def __repr__(self):
        if not self:
            return '%s()' % (self.__class__.__name__,)
        return '%s(%r)' % (self.__class__.__name__, list(self))

    def __eq__(self, other):
        if isinstance(other, OrderedSet):
            return len(self) == len(other) and list(self) == list(other)
        return set(self) == set(other)

    def __del__(self):
        self.clear()                    # remove circular references


if __name__ == '__main__':
    print(OrderedSet('abracadaba'))
    print(OrderedSet('simsalabim'))
    x = OrderedSet('abracadaba')
    # doesn't raise "Exception TypeError: TypeError('list indices must be integers, not NoneType',) in ignored"

Runs on Py2.6 or later (and runs on 3.0 or later without any modifications).

Implementation based on a doubly linked link and an internal dictionary. This design gives OrderedSet the same big-Oh running times as regular sets including O(1) adds, removes, and lookups as well as O(n) iteration.

Edit #1: moved constants inside class to avoid raising TypeError noticed by j vickroy

4 comments

j vickroy 12 years, 8 months ago  # | flag

Thanks for the code.

Adding an assignment statement to the main program:

x = OrderedSet('abracadaba')

seems to trigger an exception:

Exception TypeError: TypeError('list indices must be integers, not NoneType',) in ignored

Any idea what my mistake is?

I'm running Python 2.6.4 on a MS Windows XP machine.

Jonathan Blakes (author) 12 years, 8 months ago  # | flag

Hi j vickroy,

This recipe is a fork of recipe 576694 by Raymond Hettinger, he would know better than I. However, it seems that the error occurs when the module finishes rather than at the point you assign to x. Put a "print 'got here'" at the very end to see what I mean. I tracked the problem down to the "KEY, PREV, NEXT = range(3)" declared outside the OrderedSet class. Something is referencing one of these integers after the module has been destroyed and they have been set to None. Moving them inside the class and using "self.PREV", etc. fixes the problem. The reason you don't see this when importing and using OrderedSet is that the module gets destroyed after any of your modules that import it.

Jonathan Blakes (author) 12 years, 8 months ago  # | flag

I have updated the recipe according to the previous comment. See diff here

j vickroy 12 years, 8 months ago  # | flag

Thanks much Jonathan for your quick and detailed response and for taking the time to further diagnose this. I like your solution which seems cleaner and safer to me.

Created by Jonathan Blakes on Thu, 24 Mar 2011 (MIT)
Python recipes (4591)
Jonathan Blakes's recipes (2)

Required Modules

Other Information and Tasks