A thread pool is a class that maintains a pool of worker threads to perform time consuming operations in parallel. It assigns jobs to the threads by putting them in a work request queue, where they are picked up by the next available thread. This then performs the requested operation in the background and puts the results in a another queue.
The thread pool class can then collect the results from all threads from this queue as soon as they become available or after all threads have finished their work. It's also possible, to define callbacks to handle each result as it comes in.
Basic usage:
>>> main = TreadPool(poolsize)
>>> requests = makeRequests(some_callable, list_of_args, callback)
>>> [main.putRequests(req) for req in requests]
>>> main.wait()
See the below for a longer, annotated usage example.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 | __all__ = ['makeRequests', 'NoResultsPending', 'NoWorkersAvailable',
'ThreadPool', 'WorkRequest', 'WorkerThread']
__author__ = "Christopher Arndt"
__version__ = "1.1"
__date__ = "2005-07-19"
import threading, Queue
class NoResultsPending(Exception):
"""All work requests have been processed."""
pass
class NoWorkersAvailable(Exception):
"""No worker threads available to process remaining requests."""
pass
class WorkerThread(threading.Thread):
"""Background thread connected to the requests/results queues.
A worker thread sits in the background and picks up work requests from
one queue and puts the results in another until it is dismissed.
"""
def __init__(self, requestsQueue, resultsQueue, **kwds):
"""Set up thread in damonic mode and start it immediatedly.
requestsQueue and resultQueue are instances of Queue.Queue passed
by the ThreadPool class when it creates a new worker thread.
"""
threading.Thread.__init__(self, **kwds)
self.setDaemon(1)
self.workRequestQueue = requestsQueue
self.resultQueue = resultsQueue
self._dismissed = threading.Event()
self.start()
def run(self):
"""Repeatedly process the job queue until told to exit.
"""
while not self._dismissed.isSet():
# thread blocks here, if queue empty
request = self.workRequestQueue.get()
if self._dismissed.isSet():
# return the work request we just picked up
self.workRequestQueue.put(request)
break # and exit
# XXX catch exceptions here and stick them to request object
self.resultQueue.put(
(request, request.callable(*request.args, **request.kwds))
)
def dismiss(self):
"""Sets a flag to tell the thread to exit when done with current job.
"""
self._dismissed.set()
class WorkRequest:
"""A request to execute a callable for putting in the request queue later.
See the module function makeRequests() for the common case
where you want to build several work requests for the same callable
but different arguments for each call.
"""
def __init__(self, callable, args=None, kwds=None, requestID=None,
callback=None):
"""A work request consists of the a callable to be executed by a
worker thread, a list of positional arguments, a dictionary
of keyword arguments.
A callback function can be specified, that is called when the results
of the request are picked up from the result queue. It must accept
two arguments, the request object and it's results in that order.
If you want to pass additional information to the callback, just stick
it on the request object.
requestID, if given, must be hashable as it is used by the ThreadPool
class to store the results of that work request in a dictionary.
It defaults to the return value of id(self).
"""
if requestID is None:
self.requestID = id(self)
else:
self.requestID = requestID
self.callback = callback
self.callable = callable
self.args = args or []
self.kwds = kwds or {}
class ThreadPool:
"""A thread pool, distributing work requests and collecting results.
See the module doctring for more information.
"""
def __init__(self, num_workers, q_size=0):
"""Set up the thread pool and start num_workers worker threads.
num_workers is the number of worker threads to start initialy.
If q_size > 0 the size of the work request is limited and the
thread pool blocks when queue is full and it tries to put more
work requests in it.
"""
self.requestsQueue = Queue.Queue(q_size)
self.resultsQueue = Queue.Queue()
self.workers = []
self.workRequests = {}
self.createWorkers(num_workers)
def createWorkers(self, num_workers):
"""Add num_workers worker threads to the pool."""
for i in range(num_workers):
self.workers.append(WorkerThread(self.requestsQueue,
self.resultsQueue))
def dismissWorkers(self, num_workers):
"""Tell num_workers worker threads to to quit when they're done."""
for i in range(min(num_workers, len(self.workers))):
worker = self.workers.pop()
worker.dismiss()
def putRequest(self, request):
"""Put work request into work queue and save for later."""
self.requestsQueue.put(request)
self.workRequests[request.requestID] = request
def poll(self, block=False):
"""Process any new results in the queue."""
while 1:
try:
# still results pending?
if not self.workRequests:
raise NoResultsPending
# are there still workers to process remaining requests?
elif block and not self.workers:
raise NoWorkersAvailable
# get back next results
request, result = self.resultsQueue.get(block=block)
# and hand them to the callback, if any
if request.callback:
request.callback(request, result)
del self.workRequests[request.requestID]
except Queue.Empty:
break
def wait(self):
"""Wait for results, blocking until all have arrived."""
while 1:
try:
self.poll(True)
except NoResultsPending:
break
def makeRequests(callable, args_list, callback=None):
"""Convenience function for building several work requests for the same
callable with different arguments for each call.
args_list contains the parameters for each invocation of callable.
Each item in 'argslist' should be either a 2-item tuple of the list of
positional arguments and a dictionary of keyword arguments or a single,
non-tuple argument.
callback is called when the results arrive in the result queue.
"""
requests = []
for item in args_list:
if item == isinstance(item, tuple):
requests.append(
WorkRequest(callable, item[0], item[1], callback=callback))
else:
requests.append(
WorkRequest(callable, [item], None, callback=callback))
return requests
################
# USAGE EXAMPLE
################
if __name__ == '__main__':
import random
import time
# the work the threads will have to do (rather trivial in our example)
def do_something(data):
time.sleep(random.randint(1,5))
return round(random.random() * data, 5)
# this will be called each time a result is available
def print_result(request, result):
print "Result: %s from request #%s" % (result, request.requestID)
# assemble the arguments for each job to a list...
data = [random.randint(1,10) for i in range(20)]
# ... and build a WorkRequest object for each item in data
requests = makeRequests(do_something, data, print_result)
# we create a pool of 10 worker threads
main = ThreadPool(3)
# then we put the work requests in the queue...
for req in requests:
main.putRequest(req)
print "Work request #%s added." % req.requestID
# or shorter:
# [main.putRequest(req) for req in requests]
# ...and wait for the results to arrive in the result queue
# wait() will return when results for all work requests have arrived
# main.wait()
# alternatively poll for results while doing something else:
i = 0
while 1:
try:
main.poll()
print "Main thread working..."
time.sleep(0.5)
if i == 10:
print "Adding 3 more worker threads..."
main.createWorkers(3)
i += 1
except (KeyboardInterrupt, NoResultsPending):
break
|
The basic concept and some code was taken from the book "Python in a Nutshell" by Alex Martelli, copyright O'Reilly 2003, ISBN 0-596-00188-6, from section 14.5 "Threaded Program Architecture". I wrapped the main program logic in the ThreadPool class, added the WorkRequest class and the callback system and tweaked the code here and there.
There are some other recipes in this Cookbook, that serve a similar purpose. This one is distinguished by the following characteristics:
object-oriented, reusable design
provides callback mechanism to process results as they are returned from the worker threads
WorkRequest objects wrap the tasks assigned to the worker threads and allow for easy passing of arbitrary data to the callbacks
the use of the Queue class solves most locking issues
all worker threads are daemonic, so they exit when the main programm exits, no need for joining
threads start running as soon as you create them. No need to start or stop them. You can increase or decrease the pool size at any time, superfluous hreads will just exit when they finish their current task.
you don't need to keep a reference to a thread after you have assigned the last task to it. You just tell it: "don't come back looking for work, when you're done!"
threads don't eat up cycles while waiting to be assigned a task, they just block when the task queue is empty.
Due to the parallel nature of threads, you have to keep some things in mind:
do not use for tasks were threads compete for a single external resource (e.g. a harddisk or stdout)
if you call ThreadPool.wait() the main thread will block until _all_ results have arrived. If you only want to check for results that are available immediately, use ThreadPool.poll()
Also note:
- the results of the work requests are not stored anywhere. You should provide an appropriate callback if you want to do so.
References:
You can download the last version of this module and view the pydoc generated documentation here:
http://chrisarndt.de/en/software/python/threadpool.html
Other similar recipes:
A small bug. Thanks for this neat code!
I think I've found a small bug in it: in the dismissWorker() function, I think the max() call should be min()
Thanks! Silly me! Of course you're right. I've fixed the recipe and the version on my website.
I also made a slight change to the WorkerThread class and replaced the _dismissed flag with a threading.Event object.
New enhanced version on my website. I have uploaded a new version (1.2) that has exception handling to my website (see above for link). Check it out!
version 1.2.1, line 236. i believe the
if item == isinstance(item, tuple):
should be
if isinstance(item, tuple):
Thanks! Fixed in version 1.2.2 on my website
A bug in version 1.2.2 putRequest, line # 195. The assert is on an undefined variable 'req' Should be 'request'. The self test works because 'req' is a global from the test.