Tracks the additions and deletions to a set during update operations. Useful for large sets with few modifications.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 | from itertools import chain, ifilterfalse
class TransactionSet(object):
def __init__(self, self.master):
self.master = set(master)
self.deleted = set()
self.added = set()
def check_invariants(self):
assert self.deleted <= self.master # deleted is a subset of master
assert not (self.added & self.master) # added is disjoint from master
def __len__(self):
return len(self.master) - len(self.deleted) + len(self.added)
def __iter__(self):
return chain(self.added, ifilterfalse(self.deleted.__contains__, self.master))
def __contains__(self, key):
s = frozenset([key])
return not not (s & self.master and not s & self.deleted or s & self.added)
def add(self, key):
s = frozenset([key])
self.deleted -= s
self.added |= s
def discard(self, key):
s = frozenset([key])
if s & self.master:
self.deleted |= s
else:
self.added -= s
def remove(self, key):
s = frozenset([key])
if s & self.master:
self.deleted |= s
elif s & self.added:
self.added -= s
else:
raise KeyError(key)
def pop(self):
if self.added:
return self.added.pop()
for elem in ifilterfalse(self.deleted.__contains__, self.master):
self.deleted.add(elem)
return elem
raise KeyError(key)
def intersection(self, other):
other = frozenset(other)
return ((self.master & other) - self.deleted) | (self.added & other)
def _Set(self):
s = self.master - self.deleted
s |= self.added
return s
def union(self, other):
s = _Set(self)
s.update(other)
return s
def difference(self, other):
s = _Set(self)
s.difference_update(other)
return s
def symmetric_difference(self, other):
s = _Set(self)
s.symmetric_diffence_update(other)
return s
def update(self, other):
other = frozenset(other)
self.deleted -= other
self.added += other - self.master
def intersection_update(self, other):
other = frozenset(other)
self.deleted |= self.master - other
self.added &= other
def difference_update(self, other):
other = frozenset(other)
self.deleted |= self.master & other
self.added -= other
def symmetric_difference_update(self, other):
master_and_other = self.master.intersection(other)
self.deleted |= master_and_other
self.added ^= other - master_and_other
def issubset(self, other):
return len(self)<=len(other) and not (other & self.deleted) and master.issubset(other - self.added)
def issuperset(self, other):
return len(self)>=len(other) and not (other & self.deleted) and master.issuperset(other - self.added)
|
This recipe arose from an idea suggested by Alex Martelli on comp.lang.python: http://groups.google.com/group/comp.lang.python/msg/5499a3fc790b5868
The code above is a first crack at an efficient implementation. It still needs more testing, better comments, and to have the rest of the API filled-out (expect an update before too long).
The performance is optimized for the case where the master set is large and the added/deleted sets are small. The implementation makes sure that all looping is done by underlying C-code and not with a slower pure-Python for-loop.
Some pains are also taken to make sure that a set element's underlying hash function doesn't get called more than once per method call. That is why the __contains__ method wraps the key in a frozenset before doing the searches (building a new set triggers a hash call but set-to-set operations do not).
"not not"? The code for the 'contains' method:
Is 'not not' a faster form of 'bool(X)'?
P.S. The comment formatting on the Python Cookbook is atrocious. I would personally be embarrassed, but I am probably not aware of the time/resource constraints the maintainer is under.
wrapping the key in a "frozenset", purpose? Does wrapping the key in a "frozenset" mean that:
1) the "hash" for the key is computed
2) as long as the frozenset is in the local scope, the hash for the key (already computed) is available as does not need to be computed again?
My main question is, is the critical thing that the frozenset remains in the local scope?
Doing the hash only once. Writing s=frozenset([k]) forces once call to k.__hash__(). Subsequent set operations on the frozenset will re-use the hash value and will avoid making more calls to the hash function (for example, computing master.intersection(s) will not call k.__hash__()). The ability of sets to remember hash values is intrinsic and has nothing to do with local scope.
set-to-set operations do not re-trigger a hash call.
Thank you for making this clear. I will try to keep this in mind in my own code.