Set that remembers original insertion order.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 | import collections
class OrderedSet(collections.MutableSet):
def __init__(self, iterable=None):
self.end = end = []
end += [None, end, end] # sentinel node for doubly linked list
self.map = {} # key --> [key, prev, next]
if iterable is not None:
self |= iterable
def __len__(self):
return len(self.map)
def __contains__(self, key):
return key in self.map
def add(self, key):
if key not in self.map:
end = self.end
curr = end[1]
curr[2] = end[1] = self.map[key] = [key, curr, end]
def discard(self, key):
if key in self.map:
key, prev, next = self.map.pop(key)
prev[2] = next
next[1] = prev
def __iter__(self):
end = self.end
curr = end[2]
while curr is not end:
yield curr[0]
curr = curr[2]
def __reversed__(self):
end = self.end
curr = end[1]
while curr is not end:
yield curr[0]
curr = curr[1]
def pop(self, last=True):
if not self:
raise KeyError('set is empty')
key = self.end[1][0] if last else self.end[2][0]
self.discard(key)
return key
def __repr__(self):
if not self:
return '%s()' % (self.__class__.__name__,)
return '%s(%r)' % (self.__class__.__name__, list(self))
def __eq__(self, other):
if isinstance(other, OrderedSet):
return len(self) == len(other) and list(self) == list(other)
return set(self) == set(other)
if __name__ == '__main__':
s = OrderedSet('abracadaba')
t = OrderedSet('simsalabim')
print(s | t)
print(s & t)
print(s - t)
|
Runs on Py2.6 or later (and runs on 3.0 or later without any modifications).
Implementation based on a doubly linked link and an internal dictionary. This design gives OrderedSet the same big-Oh running times as regular sets including O(1) adds, removes, and lookups as well as O(n) iteration.
In "__init__", iterable is checked to be "not None" before calling "update", which seems redundant because "update" does the same check before doing anything.
Although this recipe is far superior in that it actually is a set, my older recipe doesn't require the cyclic GC to be collected. http://code.activestate.com/recipes/528878/ It'd be fairly easy to combine the two.
Adam, this recipe doesn't use or rely on GC. Individual links get deleted immediately when discard() is called. Likewise, when the whole set is no longer needed, the __del__ method neatly deletes clears all of the links without waiting for GC.
Also, FWIW, there is an alternate version of this recipe using weakrefs. See recipe 576696. Neither version needs GC.
Hello Raymond,
I'm planing to use this module in LDTPv2 - http://ldtp.freedesktop.org under LGPLv2 license. If you have any objections, please write me to nagappan (at) gmail (dot) com. I have just mentioned your name in the file. I could not find your email id to specify in orderedset.py (http://cgit.freedesktop.org/ldtp/ldtp2/tree/ldtpd/orderedset.py). You can email me that as well, if you are interested with my proposal :)
You have done a wonderful job.
Thanks Nagappan
The __eq__ operator for comparison with a non-OrderedSet returns
but this returns False when comparing empty sets. I think the following should return True, not False:
I used this OrderedSet in a chain of functions processing lists. I soon found it was bogging down the calculations. I studied the code and find it is overloaded with Python features. For my purpose, the following simpler function runs 4 times faster.
def OrderedSet(alist):
I've create a python package for this recipe and upload to Pypi as 'oset'. Code on http://gitorious.org/sleipnir/python-oset. It also works on python 2.5.
@Raymond: If you are interested in maintain this, it's all yours :)
Hello Raymond,
I created an MIT Licensed project that is using your OrderedSet recipe. I've added the required MIT License comment at the beginning: http://pypsifas.hg.sourceforge.net/hgweb/pypsifas/pypsifas/file/7bfaf2ed308c/psifas/datatypes/ordered_set.py
If you have any problem with that, please let me know.
Oren Zomer oren.zomer@gmail.com
I experienced memory problems (objects were not garbage-collected) when using this recipe, but the one based on weakref worked better.
The __del__ method is not guaranteed to be called when the set is not needed, it will only be called during the garbage collection phase, if the set is ever garbage-collected. Moreover, reference cycles can actually be made worse by the addition of a __del__ method: http://docs.python.org/2/library/gc.html#gc.garbage
For the record, removing the __del__ method actually got rid of the memory leak I was experiencing, at least on Python 2.7.
You may also want to check out the SortedContainers package. It's a pure-Python and fast implementation of SortedList, SortedDict, and SortedSet data types. "Sorted" isn't the same as "ordered" but it's sometimes what you actually want. There's also extensive performance benchmarks with comparisons.