Welcome, guest | Sign In | My Account | Store | Cart

Alternate version of OrderedSet (recipe 576694) using weak references instead of __del__.

Python, 78 lines
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
import collections
from weakref import proxy

class Link(object):
    __slots__ = 'prev', 'next', 'key', '__weakref__'

class OrderedSet(collections.MutableSet):
    'Set the remembers the order elements were added'
    # Big-O running times for all methods are the same as for regular sets.
    # The internal self.__map dictionary maps keys to links in a doubly linked list.
    # The circular doubly linked list starts and ends with a sentinel element.
    # The sentinel element never gets deleted (this simplifies the algorithm).
    # The prev/next links are weakref proxies (to prevent circular references).
    # Individual links are kept alive by the hard reference in self.__map.
    # Those hard references disappear when a key is deleted from an OrderedSet.

    def __init__(self, iterable=None):
        self.__root = root = Link()         # sentinel node for doubly linked list
        root.prev = root.next = root
        self.__map = {}                     # key --> link
        if iterable is not None:
            self |= iterable

    def __len__(self):
        return len(self.__map)

    def __contains__(self, key):
        return key in self.__map

    def add(self, key):
        # Store new key in a new link at the end of the linked list
        if key not in self.__map:
            self.__map[key] = link = Link()            
            root = self.__root
            last = root.prev
            link.prev, link.next, link.key = last, root, key
            last.next = root.prev = proxy(link)

    def discard(self, key):
        # Remove an existing item using self.__map to find the link which is
        # then removed by updating the links in the predecessor and successors.        
        if key in self.__map:        
            link = self.__map.pop(key)
            link.prev.next = link.next
            link.next.prev = link.prev

    def __iter__(self):
        # Traverse the linked list in order.
        root = self.__root
        curr = root.next
        while curr is not root:
            yield curr.key
            curr = curr.next

    def __reversed__(self):
        # Traverse the linked list in reverse order.
        root = self.__root
        curr = root.prev
        while curr is not root:
            yield curr.key
            curr = curr.prev

    def pop(self, last=True):
        if not self:
            raise KeyError('set is empty')
        key = next(reversed(self)) if last else next(iter(self))
        self.discard(key)
        return key

    def __repr__(self):
        if not self:
            return '%s()' % (self.__class__.__name__,)
        return '%s(%r)' % (self.__class__.__name__, list(self))

    def __eq__(self, other):
        if isinstance(other, OrderedSet):
            return len(self) == len(other) and list(self) == list(other)
        return not self.isdisjoint(other)

Prevents circular reference issues by using weak references between links.

Works on Python 2.4 or later.

7 comments

xtian 14 years, 4 months ago  # | flag

There's a bug in the last line of the __eq__ method. It means that OrderedSet([1, 2, 3]) compares equal to set([1, 2, 3, 4]) and set([1, 2]) because they overlap. One way to fix it is to make the last line

return all(item in self for item in other) and all(item in other for item in self)

It's a bit long-winded - is there a simpler way to implement the comparison for this case? I was thinking of using

return not bool(self ^ other)

but I'm put off by the thought that it could create a large set as the result of the xor (if the sets are large and very different) before throwing it away. At least the first one only loops through the two sets, stopping as soon as there's a difference. I'm conscious of the fact that the 'in' check will be expensive with a long list, though.

Paul Bohm 14 years ago  # | flag

I think

return len(other) == len(self) and not self.isdisjoint(other)

should do the job.

Dietmar Schabus 13 years, 8 months ago  # | flag

No, not self.isdisjoint(other) will only be true if the lists do not share a single element. So with Paul's version, OrderedSet([1, 2, 3]) is not equal to set([1, 2]) any more, but still to set([9, 8, 1]). Certainly not what is intended.

Dietmar Schabus 13 years, 8 months ago  # | flag

xtian's version gets it right for sets, but what about lists? Shouldn't OrderedSet([3,2,1]) == [1,2,3] be false, because the ordering matters for both the ordered set and the list?

To be safe, I am now using

def __eq__(self, other):
    if isinstance(other, OrderedSet):
        return len(self) == len(other) and list(self) == list(other)
    return False
Emil Wall 11 years, 8 months ago  # | flag

Thank you Raymond for a great piece of code!

I rewrote the __eq__ method body to the following:

return isinstance(other, OrderedSet) and list(self) == list(other)

This ensures that the relation is symmetric (except for classes that inherits from this class and overrides the __eq__ method, to be protected from that you need a can_equal method). Also, we get rid of the redundant len(self) == len(other) check and it fits on one line while still being readable (IMO).

I also added some documentation that might be helpful. I'm sorry that the formatting got all messed up...

class OrderedSet
"""A set of unique items that preserves the insertion order.

Created by: Raymond Hettinger
Version: 2009-03-19
Licence: http://opensource.org/licenses/MIT
Source: http://code.activestate.com/recipes/576694/

An ordered set is implemented as a wrapper class for a dictionary
implementing a doubly linked list. It also has a pointer to the last item
in the set (self.end) which is used by the add and _iterate methods to add 
items to the end of the list and to know when an iteration has finished,
respectively."""

__init__
    """Creates an ordered set.
    iterable -- A mutable iterable, typically a list or a set, containing
                initial values of the set. If the default value (None) is
                used, the set is initialized as empty."""    

__len__
    """Returns the number of items in this set."""

__contains__
    """Returns true if item is in this set; false otherwise."""

add
    """Adds an item to the end of this set."""

discard
    """Finds and discards an item from this set, amortized O(1) time."""

_iterate
    """Internal generator method to iterate through this set.
    iter_index -- If 1, the set is iterated in reverse order. If 2, the set
                  is iterated in order of insertion. Else IndexError."""

__iter__
    """Returns a generator object for iterating the set in the same order
    as its items were added."""

__reversed__
    """Returns a generator object for iterating the set, beginning with the
    most recently added item and ending with the first/oldest item."""

pop
    """Discards the first or last item of this set.
    last -- If True the last item is discarded, otherwise the first."""

__repr__
    """Returns a string representing this set. If empty, the string
    returned is 'OrderedSet()', otherwise if the set contains items a, b
    and c: 'OrderedSet([a, b, c])'"""

__eq__
    """Returns True if other is an OrderedSet containing the same items as
    other, in the same order."""

__del__
    """Destructor, clears self to avoid circular reference which could
    otherwise occur due to the doubly linked list."""
Chris Smith 11 years, 6 months ago  # | flag

MutableSet is not part of collections in Python 2.4 (it was added in 2.7?) and updating a set with |= will fail if the rhs is a list. So __init__ can use .update(iterable) instead.

Chris Smith 11 years, 5 months ago  # | flag

For that __eq__ test how about,

if not isinstance(other, OrderedSet):
    return False
for a, b in zip(self, other):
    if a != b:
        return False
return True

This avoids creating the entire lists.