Python generators are a great way of reducing memory usage due to the lazy (on demand) generation of values. However, when the process-time of generating such a value is relatively high, we can improve performance even more by obtaining the next n values of the generator in a separate thread in the background. Hence, the BackgroundGenerator.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 | import Queue
import threading
class BackgroundGenerator(threading.Thread):
def __init__(self, generator, lookahead=10):
threading.Thread.__init__(self)
self.queue = Queue.Queue(lookahead)
self.generator = generator
self.daemon = True
self.start()
def __iter__(self):
return self
def run(self):
for item in self.generator:
self.queue.put(item)
self.queue.put(None)
def next(self):
next_item = self.queue.get()
if next_item is None:
raise StopIteration
return next_item
def test():
import time
def processing_releasing_GIL(process_time):
# simulating processing that releases the Global Interpreter Lock
# also see: https://wiki.python.org/moin/GlobalInterpreterLock
time.sleep(process_time)
def processing_no_release_GIL(n):
i = 0
while i < n:
i += 1
def generator_count_up_release_GIL(count, process_time):
x = 1
while x <= count:
yield x
processing_releasing_GIL(process_time)
x += 1
def generator_count_up_not_releasing_GIL(count, n):
x = 1
while x <= count:
yield x
processing_no_release_GIL(n)
x += 1
print "Testing runtime of a generator that releases the GIL."
n = 10
# test runtime without backgroundGenerator
start = time.time()
gen1 = generator_count_up_release_GIL(n, 0.1)
counter = 0
for i in gen1:
counter = i
processing_releasing_GIL(0.1)
print "- no BackgroundGenerator: ", counter, time.time() - start
# test runtime WITH backgroundGenerator
start = time.time()
gen2 = BackgroundGenerator(generator_count_up_release_GIL(n, 0.1))
counter = 0
for i in gen2:
counter = i
processing_releasing_GIL(0.1)
print "- with BackgroundGenerator:", counter, time.time() - start
print "Testing runtime of a generator that does not release the GIL."
iterations = 2000000
# test runtime without backgroundGenerator
start = time.time()
gen1 = generator_count_up_not_releasing_GIL(n, iterations)
counter = 0
for i in gen1:
counter = i
processing_no_release_GIL(iterations)
print "- no BackgroundGenerator: ", counter, time.time() - start
# test runtime WITH backgroundGenerator
start = time.time()
gen2 = BackgroundGenerator(generator_count_up_not_releasing_GIL(n, iterations))
counter = 0
for i in gen2:
counter = i
processing_no_release_GIL(iterations)
print "- with BackgroundGenerator:", counter, time.time() - start
if __name__=="__main__":
test()
|
The BackgroundGenerator produces a new generator from an existing generator by calling: newGenerator = BackgroundGenerator(oldGenerator)
In effect it prefetches the next 10 values (by default) produced by the generator in a separate thread. Note that this only improves performance if the generator itself releases the GIL. A typical example is a generator that opens a file, extracts some content, closes the file and yields the content. Since I/O releases the GIL, huge performance boost can be achieved.
When the GIL is not released in the wrapped generator, performance may actually decrease as demonstrated by http://www.dabeaz.com/python/UnderstandingGIL.pdf presentation.
Both scenario's are demonstrated in the test() method.