Set that remembers original insertion order.
| Python |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 | import collections
KEY, PREV, NEXT = range(3)
class OrderedSet(collections.MutableSet):
def __init__(self, iterable=None):
self.end = end = []
end += [None, end, end] # sentinel node for doubly linked list
self.map = {} # key --> [key, prev, next]
if iterable is not None:
self |= iterable
def __len__(self):
return len(self.map)
def __contains__(self, key):
return key in self.map
def add(self, key):
if key not in self.map:
end = self.end
curr = end[PREV]
curr[NEXT] = end[PREV] = self.map[key] = [key, curr, end]
def discard(self, key):
if key in self.map:
key, prev, next = self.map.pop(key)
prev[NEXT] = next
next[PREV] = prev
def __iter__(self):
end = self.end
curr = end[NEXT]
while curr is not end:
yield curr[KEY]
curr = curr[NEXT]
def __reversed__(self):
end = self.end
curr = end[PREV]
while curr is not end:
yield curr[KEY]
curr = curr[PREV]
def pop(self, last=True):
if not self:
raise KeyError('set is empty')
key = next(reversed(self)) if last else next(iter(self))
self.discard(key)
return key
def __repr__(self):
if not self:
return '%s()' % (self.__class__.__name__,)
return '%s(%r)' % (self.__class__.__name__, list(self))
def __eq__(self, other):
if isinstance(other, OrderedSet):
return len(self) == len(other) and list(self) == list(other)
return not self.isdisjoint(other)
def __del__(self):
self.clear() # remove circular references
if __name__ == '__main__':
print(OrderedSet('abracadaba'))
print(OrderedSet('simsalabim'))
|
Discussion
Runs on Py2.6 or later (and runs on 3.0 or later without any modifications).
Implementation based on a doubly linked link and an internal dictionary. This design gives OrderedSet the same big-Oh running times as regular sets including O(1) adds, removes, and lookups as well as O(n) iteration.


Comments
In "__init__", iterable is checked to be "not None" before calling "update", which seems redundant because "update" does the same check before doing anything.
Although this recipe is far superior in that it actually is a set, my older recipe doesn't require the cyclic GC to be collected. http://code.activestate.com/recipes/528878/ It'd be fairly easy to combine the two.
Adam, this recipe doesn't use or rely on GC. Individual links get deleted immediately when discard() is called. Likewise, when the whole set is no longer needed, the __del__ method neatly deletes clears all of the links without waiting for GC.
Also, FWIW, there is an alternate version of this recipe using weakrefs. See recipe 576696. Neither version needs GC.
Hello Raymond,
I'm planing to use this module in LDTPv2 - http://ldtp.freedesktop.org under LGPLv2 license. If you have any objections, please write me to nagappan (at) gmail (dot) com. I have just mentioned your name in the file. I could not find your email id to specify in orderedset.py (http://cgit.freedesktop.org/ldtp/ldtp2/tree/ldtpd/orderedset.py). You can email me that as well, if you are interested with my proposal :)
You have done a wonderful job.
Thanks Nagappan
Sign in to comment