Welcome, guest | Sign In | My Account | Store | Cart

Alternate version of OrderedSet (recipe 576694) using weak references instead of __del__.

Python, 78 lines
 ``` 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78``` ```import collections from weakref import proxy class Link(object): __slots__ = 'prev', 'next', 'key', '__weakref__' class OrderedSet(collections.MutableSet): 'Set the remembers the order elements were added' # Big-O running times for all methods are the same as for regular sets. # The internal self.__map dictionary maps keys to links in a doubly linked list. # The circular doubly linked list starts and ends with a sentinel element. # The sentinel element never gets deleted (this simplifies the algorithm). # The prev/next links are weakref proxies (to prevent circular references). # Individual links are kept alive by the hard reference in self.__map. # Those hard references disappear when a key is deleted from an OrderedSet. def __init__(self, iterable=None): self.__root = root = Link() # sentinel node for doubly linked list root.prev = root.next = root self.__map = {} # key --> link if iterable is not None: self |= iterable def __len__(self): return len(self.__map) def __contains__(self, key): return key in self.__map def add(self, key): # Store new key in a new link at the end of the linked list if key not in self.__map: self.__map[key] = link = Link() root = self.__root last = root.prev link.prev, link.next, link.key = last, root, key last.next = root.prev = proxy(link) def discard(self, key): # Remove an existing item using self.__map to find the link which is # then removed by updating the links in the predecessor and successors. if key in self.__map: link = self.__map.pop(key) link.prev.next = link.next link.next.prev = link.prev def __iter__(self): # Traverse the linked list in order. root = self.__root curr = root.next while curr is not root: yield curr.key curr = curr.next def __reversed__(self): # Traverse the linked list in reverse order. root = self.__root curr = root.prev while curr is not root: yield curr.key curr = curr.prev def pop(self, last=True): if not self: raise KeyError('set is empty') key = next(reversed(self)) if last else next(iter(self)) self.discard(key) return key def __repr__(self): if not self: return '%s()' % (self.__class__.__name__,) return '%s(%r)' % (self.__class__.__name__, list(self)) def __eq__(self, other): if isinstance(other, OrderedSet): return len(self) == len(other) and list(self) == list(other) return not self.isdisjoint(other) ```

Prevents circular reference issues by using weak references between links.

Works on Python 2.4 or later.

xtian 11 years, 11 months ago

There's a bug in the last line of the __eq__ method. It means that OrderedSet([1, 2, 3]) compares equal to set([1, 2, 3, 4]) and set([1, 2]) because they overlap. One way to fix it is to make the last line

``````return all(item in self for item in other) and all(item in other for item in self)
``````

It's a bit long-winded - is there a simpler way to implement the comparison for this case? I was thinking of using

``````return not bool(self ^ other)
``````

but I'm put off by the thought that it could create a large set as the result of the xor (if the sets are large and very different) before throwing it away. At least the first one only loops through the two sets, stopping as soon as there's a difference. I'm conscious of the fact that the 'in' check will be expensive with a long list, though.

Paul Bohm 11 years, 8 months ago

I think

``````return len(other) == len(self) and not self.isdisjoint(other)
``````

should do the job.

Dietmar Schabus 11 years, 4 months ago

No, `not self.isdisjoint(other)` will only be true if the lists do not share a single element. So with Paul's version, OrderedSet([1, 2, 3]) is not equal to set([1, 2]) any more, but still to set([9, 8, 1]). Certainly not what is intended.

Dietmar Schabus 11 years, 4 months ago

xtian's version gets it right for sets, but what about lists? Shouldn't `OrderedSet([3,2,1]) == [1,2,3]` be false, because the ordering matters for both the ordered set and the list?

To be safe, I am now using

``````def __eq__(self, other):
if isinstance(other, OrderedSet):
return len(self) == len(other) and list(self) == list(other)
return False
``````
Emil Wall 9 years, 4 months ago

Thank you Raymond for a great piece of code!

I rewrote the __eq__ method body to the following:

`return isinstance(other, OrderedSet) and list(self) == list(other)`

This ensures that the relation is symmetric (except for classes that inherits from this class and overrides the __eq__ method, to be protected from that you need a can_equal method). Also, we get rid of the redundant `len(self) == len(other)` check and it fits on one line while still being readable (IMO).

I also added some documentation that might be helpful. I'm sorry that the formatting got all messed up...

``````class OrderedSet
"""A set of unique items that preserves the insertion order.

Created by: Raymond Hettinger
Version: 2009-03-19
Source: http://code.activestate.com/recipes/576694/

An ordered set is implemented as a wrapper class for a dictionary
implementing a doubly linked list. It also has a pointer to the last item
in the set (self.end) which is used by the add and _iterate methods to add
items to the end of the list and to know when an iteration has finished,
respectively."""

__init__
"""Creates an ordered set.
iterable -- A mutable iterable, typically a list or a set, containing
initial values of the set. If the default value (None) is
used, the set is initialized as empty."""

__len__
"""Returns the number of items in this set."""

__contains__
"""Returns true if item is in this set; false otherwise."""

"""Adds an item to the end of this set."""

"""Finds and discards an item from this set, amortized O(1) time."""

_iterate
"""Internal generator method to iterate through this set.
iter_index -- If 1, the set is iterated in reverse order. If 2, the set
is iterated in order of insertion. Else IndexError."""

__iter__
"""Returns a generator object for iterating the set in the same order

__reversed__
"""Returns a generator object for iterating the set, beginning with the
most recently added item and ending with the first/oldest item."""

pop
"""Discards the first or last item of this set.
last -- If True the last item is discarded, otherwise the first."""

__repr__
"""Returns a string representing this set. If empty, the string
returned is 'OrderedSet()', otherwise if the set contains items a, b
and c: 'OrderedSet([a, b, c])'"""

__eq__
"""Returns True if other is an OrderedSet containing the same items as
other, in the same order."""

__del__
"""Destructor, clears self to avoid circular reference which could
otherwise occur due to the doubly linked list."""
``````
Chris Smith 9 years, 2 months ago

MutableSet is not part of collections in Python 2.4 (it was added in 2.7?) and updating a set with `|=` will fail if the rhs is a list. So `__init__` can use `.update(iterable)` instead.

Chris Smith 9 years, 1 month ago

For that `__eq__` test how about,

``````if not isinstance(other, OrderedSet):
return False
for a, b in zip(self, other):
if a != b:
return False
return True
``````

This avoids creating the entire lists.

 Created by Raymond Hettinger on Fri, 20 Mar 2009 (MIT)