Welcome, guest | Sign In | My Account | Store | Cart

Some simple techniques for working with long integers as bit arrays. Python support for long integers allows fast bitwise operations on large bit arrays. Bitwise operations on 100 million bit arrays happen in the blink of an eye. They are also fast and compact when saving and loading to disk. Unfortunately bit set, get, append, pop etc are slow so another bit array technique may be preferred.

Python, 75 lines
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
''' Bit Array - Bit array computing with long integers.
Good for around 1,000,000 bit arrays but gets slow with a billion bits...
Mike Sweeney. June 2013.

Positives:
Works even with old versions of Python.
Only pure Python needed - no third party modules.
Binary operations such as & | ^ << and >> are fast.
Store and load are fast and compact.

Negatives:
Longs are passed by value so no in-place operations are possible.
Array changing operations are slow on large arrays.

Timing of one AND operation on bit arrays with Python 2.7 on AMD 3.8 GHz 4 core:
1,000,000 bit         15 us
10,000,000 bit       320 us
100,000,000 bit      6.2 ms
1,000,000,000 bit     86 ms    (125MB bit array objects)
'''

# Constructors for bit arrays:
bitarray = long
def frombin(s): return int(s, 2)

# Operations on bit arrays of the same size:
def issubset(a, b): return a & b == a
def subtract(a, b): return a & ~b
# use & | ^ and ~ for other bitarray operations

# Change a bit value in a bit array:
def setbit(a, n): return a | (1<<n)
def unsetbit(a, n): return a & ~(1<<n)
def toggle(a, n): return a ^ (1<<n)

# Append and pop will add or remove a bit from the end of the array:
def append(a, v): return (a<<1) | v
def pop(a):
    v = (a & 1)
    new_a = (a >> 1)
    return new_a, v

# Query a bitarray:
def getbit(a, n): return a & (1<<n)
def countbits(a): return bin(a).count('1')
def activebits(a):
    s=bin(a)[2:][::-1]
    return [i for i,d in enumerate(s) if d == '1']
def tostring(a, w=0): return bin(a)[2:].zfill(w)

# Convert bit arrays for storage in files or databases:
import cPickle
def store(a): return cPickle.dumps(a, -1)
def load(data): return cPickle.loads(data)


def _analyticsdemo():
    USERS = 1000000
    from random import randint

    def randarray(n,arr=0):
        for i in xrange(n):
            arr = setbit(arr, randint(0, USERS-1))
        return arr

    admin = randarray(1000)
    upload24hr = randarray(1000)
    badpwd7days = randarray(10000)
    member2012 = randarray(100000)

    maybehacker = badpwd7days & upload24hr & ~admin & ~member2012
    print 'Possible hacker users:', activebits(maybehacker)

if __name__ == '__main__':
    _analyticsdemo()

As Python long integers (and therefore these bit arrays) cannot be changed in place, many operations return a new value after a change. This also means that you cannot keep a reference to a bit array, as the reference will not be updated when the bit array is changed, so this technique will only be useful to a small subset of bit array applications.

The demonstration example code shows how these fast operations on large bit arrays can be used to support a web analytics requirement.

The output (with some omitted instrumentation) is:

Array building finished. Duration 2.743431 seconds.
Bit ops finished. Duration 0.000336 seconds.
Possible hacker users: [103422, 174673, 214018, 228011, 232142, 278846, 296268, 369008, 714406, 732689, 846327, 858973, 879821, 919424]
Bit number listing finished. Duration 0.049643 seconds.

Note than some operations (such as building the bit array) are slow, but the AND and INVERT operations are very fast.

This recipe may be of use if you need a pure Python solution to a bit array requirement. It can also be useful if you only need fast save/load and bitwise operations.

For a more complete bit array module, I recommend https://github.com/ilanschnell/bitarray .