Welcome, guest | Sign In | My Account | Store | Cart

Tracks the additions and deletions to a set during update operations. Useful for large sets with few modifications.

Python, 101 lines
  1
  2
  3
  4
  5
  6
  7
  8
  9
 10
 11
 12
 13
 14
 15
 16
 17
 18
 19
 20
 21
 22
 23
 24
 25
 26
 27
 28
 29
 30
 31
 32
 33
 34
 35
 36
 37
 38
 39
 40
 41
 42
 43
 44
 45
 46
 47
 48
 49
 50
 51
 52
 53
 54
 55
 56
 57
 58
 59
 60
 61
 62
 63
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
from itertools import chain, ifilterfalse

class TransactionSet(object):

    def __init__(self, self.master):
        self.master = set(master)
        self.deleted = set()
        self.added = set()

    def check_invariants(self):
        assert self.deleted <= self.master     # deleted is a subset of master       
        assert not (self.added & self.master)  # added is disjoint from master

    def __len__(self):
        return len(self.master) - len(self.deleted) + len(self.added)

    def __iter__(self):
        return chain(self.added, ifilterfalse(self.deleted.__contains__, self.master))

    def __contains__(self, key):
        s = frozenset([key])
        return not not (s & self.master and not s & self.deleted or s & self.added)

    def add(self, key):
        s = frozenset([key])
        self.deleted -= s
        self.added |= s

    def discard(self, key):
        s = frozenset([key])
        if s & self.master:
            self.deleted |= s
        else:
            self.added -= s

    def remove(self, key):
        s = frozenset([key])
        if s & self.master:
            self.deleted |= s
        elif s & self.added:
            self.added -= s
        else:
            raise KeyError(key)

    def pop(self):
        if self.added:
            return self.added.pop()
        for elem in ifilterfalse(self.deleted.__contains__, self.master):
            self.deleted.add(elem)
            return elem
        raise KeyError(key)

    def intersection(self, other):
        other = frozenset(other)
        return ((self.master & other) - self.deleted) | (self.added & other)

    def _Set(self):
        s = self.master - self.deleted
        s |= self.added
        return s

    def union(self, other):
        s = _Set(self)
        s.update(other)
        return s

    def difference(self, other):
        s = _Set(self)
        s.difference_update(other)
        return s

    def symmetric_difference(self, other):
        s = _Set(self)
        s.symmetric_diffence_update(other)
        return s

    def update(self, other):
        other = frozenset(other)
        self.deleted -= other
        self.added += other - self.master

    def intersection_update(self, other):
        other = frozenset(other)
        self.deleted |= self.master - other
        self.added &= other

    def difference_update(self, other):
        other = frozenset(other)
        self.deleted |= self.master & other
        self.added -= other

    def symmetric_difference_update(self, other):
        master_and_other = self.master.intersection(other)
        self.deleted |= master_and_other
        self.added ^= other - master_and_other

    def issubset(self, other):
        return len(self)<=len(other) and not (other & self.deleted) and master.issubset(other - self.added)

    def issuperset(self, other):
        return len(self)>=len(other) and not (other & self.deleted) and master.issuperset(other - self.added)

This recipe arose from an idea suggested by Alex Martelli on comp.lang.python: http://groups.google.com/group/comp.lang.python/msg/5499a3fc790b5868

The code above is a first crack at an efficient implementation. It still needs more testing, better comments, and to have the rest of the API filled-out (expect an update before too long).

The performance is optimized for the case where the master set is large and the added/deleted sets are small. The implementation makes sure that all looping is done by underlying C-code and not with a slower pure-Python for-loop.

Some pains are also taken to make sure that a set element's underlying hash function doesn't get called more than once per method call. That is why the __contains__ method wraps the key in a frozenset before doing the searches (building a new set triggers a hash call but set-to-set operations do not).

4 comments

Manuel Garcia 16 years, 12 months ago  # | flag

"not not"? The code for the 'contains' method:

def __contains__(self, key):
    s = frozenset([key])
    return not not (key AND self.master | key AND self.added)

Is 'not not' a faster form of 'bool(X)'?

P.S. The comment formatting on the Python Cookbook is atrocious. I would personally be embarrassed, but I am probably not aware of the time/resource constraints the maintainer is under.

Manuel Garcia 16 years, 12 months ago  # | flag

wrapping the key in a "frozenset", purpose? Does wrapping the key in a "frozenset" mean that:

1) the "hash" for the key is computed

2) as long as the frozenset is in the local scope, the hash for the key (already computed) is available as does not need to be computed again?

My main question is, is the critical thing that the frozenset remains in the local scope?

Raymond Hettinger (author) 16 years, 11 months ago  # | flag

Doing the hash only once. Writing s=frozenset([k]) forces once call to k.__hash__(). Subsequent set operations on the frozenset will re-use the hash value and will avoid making more calls to the hash function (for example, computing master.intersection(s) will not call k.__hash__()). The ability of sets to remember hash values is intrinsic and has nothing to do with local scope.

Manuel Garcia 16 years, 11 months ago  # | flag

set-to-set operations do not re-trigger a hash call.

> (building a new set triggers a hash call but set-to-set operations do not)

Thank you for making this clear. I will try to keep this in mind in my own code.

Created by Raymond Hettinger on Thu, 26 Apr 2007 (PSF)
Python recipes (4591)
Raymond Hettinger's recipes (97)

Required Modules

  • (none specified)

Other Information and Tasks