Does not require the multiprocessing module, is easy to hack, and is maybe not optimal, but it did the job for a make-like tool I wrote.
import sys, os, time
from subprocess import Popen, list2cmdline

def cpu_count():
    ''' Returns the number of CPUs in the system
    '''
    num = 1
    if sys.platform == 'win32':
        try:
            num = int(os.environ['NUMBER_OF_PROCESSORS'])
        except (ValueError, KeyError):
            pass
    elif sys.platform == 'darwin':
        try:
            num = int(os.popen('sysctl -n hw.ncpu').read())
        except ValueError:
            pass
    else:
        try:
            num = os.sysconf('SC_NPROCESSORS_ONLN')
        except (ValueError, OSError, AttributeError):
            pass
    return num

def exec_commands(cmds):
    ''' Exec commands in parallel in multiple processes
    (as many as we have CPUs)
    '''
    if not cmds: return # empty list

    def done(p):
        return p.poll() is not None
    def success(p):
        return p.returncode == 0
    def fail():
        sys.exit(1)

    max_task = cpu_count()
    processes = []
    while True:
        # start new tasks until every CPU has one
        while cmds and len(processes) < max_task:
            task = cmds.pop()
            print(list2cmdline(task))
            processes.append(Popen(task))

        # iterate over a copy, since finished processes are removed from the list
        for p in processes[:]:
            if done(p):
                if success(p):
                    processes.remove(p)
                else:
                    fail()

        if not processes and not cmds:
            break
        else:
            time.sleep(0.05)

commands = [
    ['curl', 'http://www.reddit.com/'],
    ['curl', 'http://en.wikipedia.org/'],
    ['curl', 'http://www.google.com/'],
    ['curl', 'http://www.yahoo.com/'],
    ['curl', 'http://news.ycombinator.com/']
]
exec_commands(commands)
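As a side note, on Python 3.4 or later the standard library already exposes the CPU count, so the platform checks in cpu_count() could probably be replaced by something like the sketch below. The name detect_cpu_count and its default parameter are made up for illustration and are not part of the recipe.

```python
import os


def detect_cpu_count(default=1):
    """Return the number of CPUs, or `default` when it cannot be determined."""
    # os.cpu_count() returns None if the count is unknown
    count = os.cpu_count()
    return count if count else default
```

Like the recipe, this still needs nothing from the multiprocessing module.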
Tags: multiprocessing, process
Add the following to the script:
Thanks, I just did that.
You already have some code to detect the OS

if sys.platform == 'win32':
    try:
        num = int(os.environ['NUMBER_OF_PROCESSORS'])
    except (ValueError, KeyError):
        pass

but
['ls', '/bin'], ['ls', '/usr'], ['ls', '/etc'], ['ls', '/var'], ['ls', '/tmp']
needs to be replaced with commands & directories that will work on a Win32 system.
These commands would work if you have Cygwin installed and it's in the PATH, but on a Win32 system without Cygwin, something like this is needed (raw strings keep sequences such as \t in the paths from being read as escapes):

commands = [
    ['dir', r'%systemroot%\system32'],
    ['dir', r'%systemroot%\Help'],
    ['dir', r'%systemroot%\servicepackfiles\i386'],
    ['dir', r'%systemdrive%\Program Files'],
    ['dir', r'%systemroot%\temp']
]
I didn't know that %systemroot% would expand to the system root directory (like C:\Windows) on Windows. About the recipe: it's hard, if not impossible, to find commands that exist on both Unix and Windows and that take some time to complete. I think Windows users will figure out, like you did, that they need to replace the list of commands with commands that work for them.
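One caveat with the Windows variant: dir is a cmd.exe built-in rather than an executable, and %VAR% references are expanded by cmd.exe, so passing them straight to Popen without a shell is unlikely to work. A possible workaround, sketched here under the assumption of Python 3, is to resolve the directory in Python and run dir through cmd /c. The helper name windows_dir_commands, its env parameter, and the C:\Windows fallback are hypothetical and only there for illustration.

```python
import ntpath
import os


def windows_dir_commands(subdirs, env=None):
    """Build argument lists that run `dir` on sub-directories of %SystemRoot%.

    `env` defaults to os.environ; a plain dict can be passed for testing
    (note that a plain dict, unlike os.environ on Windows, is case-sensitive).
    """
    env = os.environ if env is None else env
    # Fallback path is an assumption, used only when SystemRoot is not set
    root = env.get('SystemRoot', r'C:\Windows')
    # ntpath.join uses Windows path rules on every platform
    return [['cmd', '/c', 'dir', ntpath.join(root, sub)] for sub in subdirs]
```

On a Windows machine the result, for example windows_dir_commands(['system32', 'Help', 'temp']), could then be handed to exec_commands in place of the curl list.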
For me, it blocks if I have more commands to run than CPUs available. It runs only as many commands as I have CPUs and then blocks there. Is there any way to run more commands after the first batch of commands has completed? Thank you very much.
Hi Julio,
The goal of the recipe is not to oversubscribe the machine with too much work, which would be detrimental to overall performance. This is why I use the number of cores on the machine to dictate the maximum number of active jobs.
If your workload is I/O bound rather than CPU bound (like downloading files), you can change the line that sets the max_task variable (max_task = cpu_count()) to whatever you want. You could try doubling the CPU count. Be aware that with too many active processes the OS will spend its time context switching instead of doing real work.
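A rough sketch of that suggestion, assuming Python 3, is to make the limit a parameter instead of editing the recipe. This is not the recipe author's code: the name exec_commands_limited, the poll_interval parameter, the returned peak count, and raising RuntimeError instead of calling sys.exit are all illustrative choices.

```python
import os
import sys
import time
from subprocess import Popen


def exec_commands_limited(cmds, max_task=None, poll_interval=0.05):
    """Run argv lists from `cmds`, keeping at most `max_task` running at once.

    Returns the peak number of simultaneously running processes.
    Raises RuntimeError as soon as one command exits with a non-zero status.
    """
    if max_task is None:
        max_task = os.cpu_count() or 1
    pending = list(cmds)  # work on a copy so the caller's list is untouched
    running = []
    peak = 0
    while pending or running:
        # Start new processes until the limit is reached
        while pending and len(running) < max_task:
            running.append(Popen(pending.pop(0)))
        peak = max(peak, len(running))

        still_running = []
        for p in running:
            if p.poll() is None:
                still_running.append(p)
            elif p.returncode != 0:
                # Stop the remaining children before reporting the failure
                for other in running:
                    if other.poll() is None:
                        other.terminate()
                        other.wait()
                raise RuntimeError('command failed: %r' % (p.args,))
        running = still_running

        if running:
            time.sleep(poll_interval)
    return peak


if __name__ == '__main__':
    # Five short jobs; for I/O-bound work, allow twice as many as there are CPUs
    jobs = [[sys.executable, '-c', 'import time; time.sleep(0.2)'] for _ in range(5)]
    exec_commands_limited(jobs, max_task=2 * (os.cpu_count() or 1))
```

As in the recipe, queued commands are started as soon as a running one finishes, so having more commands than the limit only means the extra ones wait their turn.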