Welcome, guest | Sign In | My Account | Store | Cart

Alternate version of OrderedSet (recipe 576694) using weak references instead of __del__.

Python, 78 lines

Prevents circular reference issues by using weak references between links.

Works on Python 2.4 or later.

xtian 13 years, 5 months ago

There's a bug in the last line of the __eq__ method. It means that OrderedSet([1, 2, 3]) compares equal to set([1, 2, 3, 4]) and set([1, 2]) because they overlap. One way to fix it is to make the last line

return all(item in self for item in other) and all(item in other for item in self)

It's a bit long-winded - is there a simpler way to implement the comparison for this case? I was thinking of using

return not bool(self ^ other)

but I'm put off by the thought that it could create a large set as the result of the xor (if the sets are large and very different) before throwing it away. At least the first one only loops through the two sets, stopping as soon as there's a difference. I'm conscious of the fact that the 'in' check will be expensive with a long list, though.

Paul Bohm 13 years, 2 months ago

I think

return len(other) == len(self) and not self.isdisjoint(other)

should do the job.

Dietmar Schabus 12 years, 10 months ago

No, not self.isdisjoint(other) will only be true if the lists do not share a single element. So with Paul's version, OrderedSet([1, 2, 3]) is not equal to set([1, 2]) any more, but still to set([9, 8, 1]). Certainly not what is intended.

Dietmar Schabus 12 years, 10 months ago

xtian's version gets it right for sets, but what about lists? Shouldn't OrderedSet([3,2,1]) == [1,2,3] be false, because the ordering matters for both the ordered set and the list?

To be safe, I am now using

def __eq__(self, other):
if isinstance(other, OrderedSet):
return len(self) == len(other) and list(self) == list(other)
return False
Emil Wall 10 years, 10 months ago

Thank you Raymond for a great piece of code!

I rewrote the __eq__ method body to the following:

return isinstance(other, OrderedSet) and list(self) == list(other)

This ensures that the relation is symmetric (except for classes that inherits from this class and overrides the __eq__ method, to be protected from that you need a can_equal method). Also, we get rid of the redundant len(self) == len(other) check and it fits on one line while still being readable (IMO).

I also added some documentation that might be helpful. I'm sorry that the formatting got all messed up...

class OrderedSet
"""A set of unique items that preserves the insertion order.

Created by: Raymond Hettinger
Version: 2009-03-19
Source: http://code.activestate.com/recipes/576694/

An ordered set is implemented as a wrapper class for a dictionary
implementing a doubly linked list. It also has a pointer to the last item
in the set (self.end) which is used by the add and _iterate methods to add
items to the end of the list and to know when an iteration has finished,
respectively."""

__init__
"""Creates an ordered set.
iterable -- A mutable iterable, typically a list or a set, containing
initial values of the set. If the default value (None) is
used, the set is initialized as empty."""

__len__
"""Returns the number of items in this set."""

__contains__
"""Returns true if item is in this set; false otherwise."""

"""Adds an item to the end of this set."""

"""Finds and discards an item from this set, amortized O(1) time."""

_iterate
"""Internal generator method to iterate through this set.
iter_index -- If 1, the set is iterated in reverse order. If 2, the set
is iterated in order of insertion. Else IndexError."""

__iter__
"""Returns a generator object for iterating the set in the same order

__reversed__
"""Returns a generator object for iterating the set, beginning with the
most recently added item and ending with the first/oldest item."""

pop
"""Discards the first or last item of this set.
last -- If True the last item is discarded, otherwise the first."""

__repr__
"""Returns a string representing this set. If empty, the string
returned is 'OrderedSet()', otherwise if the set contains items a, b
and c: 'OrderedSet([a, b, c])'"""

__eq__
"""Returns True if other is an OrderedSet containing the same items as
other, in the same order."""

__del__
"""Destructor, clears self to avoid circular reference which could
otherwise occur due to the doubly linked list."""
Chris Smith 10 years, 8 months ago

MutableSet is not part of collections in Python 2.4 (it was added in 2.7?) and updating a set with |= will fail if the rhs is a list. So __init__ can use .update(iterable) instead.

Chris Smith 10 years, 7 months ago

For that __eq__ test how about,

if not isinstance(other, OrderedSet):
return False
for a, b in zip(self, other):
if a != b:
return False
return True

This avoids creating the entire lists.

 Created by Raymond Hettinger on Fri, 20 Mar 2009 (MIT)