Welcome, guest | Sign In | My Account | Store | Cart

Python generators are a great way of reducing memory usage due to the lazy (on demand) generation of values. However, when the process-time of generating such a value is relatively high, we can improve performance even more by obtaining the next n values of the generator in a separate thread in the background. Hence, the BackgroundGenerator.

Python, 98 lines
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
import Queue
import threading

class BackgroundGenerator(threading.Thread):
    def __init__(self, generator, lookahead=10):
        threading.Thread.__init__(self)
        self.queue = Queue.Queue(lookahead)
        self.generator = generator
        self.daemon = True
        self.start()

    def __iter__(self):
        return self

    def run(self):
        for item in self.generator:
            self.queue.put(item)
        self.queue.put(None)

    def next(self):
            next_item = self.queue.get()
            if next_item is None:
                 raise StopIteration
            return next_item



def test():
    import time

    def processing_releasing_GIL(process_time):
        # simulating processing that releases the Global Interpreter Lock
        # also see: https://wiki.python.org/moin/GlobalInterpreterLock
        time.sleep(process_time)

    def processing_no_release_GIL(n):
        i = 0
        while i < n:
           i += 1

    def generator_count_up_release_GIL(count, process_time):
        x = 1
        while x <= count:
            yield x
            processing_releasing_GIL(process_time)
            x += 1

    def generator_count_up_not_releasing_GIL(count, n):
        x = 1
        while x <= count:
            yield x
            processing_no_release_GIL(n)
            x += 1

    print "Testing runtime of a generator that releases the GIL."
    n = 10
    # test runtime without backgroundGenerator
    start = time.time()
    gen1 = generator_count_up_release_GIL(n, 0.1)
    counter = 0
    for i in gen1:
        counter = i
        processing_releasing_GIL(0.1)
    print "- no BackgroundGenerator:  ", counter, time.time() - start

    # test runtime WITH backgroundGenerator
    start = time.time()
    gen2 = BackgroundGenerator(generator_count_up_release_GIL(n, 0.1))
    counter = 0
    for i in gen2:
        counter = i
        processing_releasing_GIL(0.1)
    print "- with BackgroundGenerator:", counter, time.time() - start


    print "Testing runtime of a generator that does not release the GIL."
    iterations = 2000000
    # test runtime without backgroundGenerator
    start = time.time()
    gen1 = generator_count_up_not_releasing_GIL(n, iterations)
    counter = 0
    for i in gen1:
        counter = i
        processing_no_release_GIL(iterations)
    print "- no BackgroundGenerator:  ", counter, time.time() - start

    # test runtime WITH backgroundGenerator
    start = time.time()
    gen2 = BackgroundGenerator(generator_count_up_not_releasing_GIL(n, iterations))
    counter = 0
    for i in gen2:
        counter = i
        processing_no_release_GIL(iterations)
    print "- with BackgroundGenerator:", counter, time.time() - start


if __name__=="__main__":
    test()

The BackgroundGenerator produces a new generator from an existing generator by calling: newGenerator = BackgroundGenerator(oldGenerator)

In effect it prefetches the next 10 values (by default) produced by the generator in a separate thread. Note that this only improves performance if the generator itself releases the GIL. A typical example is a generator that opens a file, extracts some content, closes the file and yields the content. Since I/O releases the GIL, huge performance boost can be achieved.

When the GIL is not released in the wrapped generator, performance may actually decrease as demonstrated by http://www.dabeaz.com/python/UnderstandingGIL.pdf presentation.

Both scenario's are demonstrated in the test() method.