
Does not require the multiprocessing module, is easy to hack, and while maybe not optimal, it did the job for a make-like tool I wrote.

Python, 66 lines
import sys, os, time
from subprocess import Popen, list2cmdline

def cpu_count():
    ''' Returns the number of CPUs in the system
    '''
    num = 1
    if sys.platform == 'win32':
        try:
            num = int(os.environ['NUMBER_OF_PROCESSORS'])
        except (ValueError, KeyError):
            pass
    elif sys.platform == 'darwin':
        try:
            num = int(os.popen('sysctl -n hw.ncpu').read())
        except ValueError:
            pass
    else:
        try:
            num = os.sysconf('SC_NPROCESSORS_ONLN')
        except (ValueError, OSError, AttributeError):
            pass

    return num

def exec_commands(cmds):
    ''' Run commands in parallel, in multiple processes
    (at most as many as there are CPUs)
    '''
    if not cmds: return # empty list

    def done(p):
        return p.poll() is not None
    def success(p):
        return p.returncode == 0
    def fail():
        sys.exit(1)

    max_task = cpu_count()
    processes = []
    while True:
        while cmds and len(processes) < max_task:
            task = cmds.pop()
            print(list2cmdline(task))
            processes.append(Popen(task))

        for p in processes[:]:  # iterate over a copy so removal is safe
            if done(p):
                if success(p):
                    processes.remove(p)
                else:
                    fail()

        if not processes and not cmds:
            break
        else:
            time.sleep(0.05)

commands = [
    ['curl', 'http://www.reddit.com/'],
    ['curl', 'http://en.wikipedia.org/'],
    ['curl', 'http://www.google.com/'],
    ['curl', 'http://www.yahoo.com/'],
    ['curl', 'http://news.ycombinator.com/']
]
exec_commands(commands)
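A side note: on Python 3.4 and later, the standard library already exposes a portable CPU count, so the hand-rolled cpu_count above can be reduced to a sketch like this (not part of the original recipe):

```python
import os

def cpu_count():
    # os.cpu_count() handles Windows, macOS, and Linux internally;
    # it returns None when the count cannot be determined, hence the fallback.
    return os.cpu_count() or 1
```

The platform-specific version in the recipe remains useful on Python 2, where os.cpu_count() does not exist.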

6 comments

Todd Berry 11 years, 3 months ago

Add the following to the script:

import time
Benjamin Sergeant (author) 11 years, 3 months ago

Thanks, I just did that.

cappy2112 11 years, 3 months ago

You already have some code to detect the os

if sys.platform == 'win32':
    try:
        num = int(os.environ['NUMBER_OF_PROCESSORS'])
    except (ValueError, KeyError):
        pass

but

['ls', '/bin'], ['ls', '/usr'], ['ls', '/etc'], ['ls', '/var'], ['ls', '/tmp']

needs to be replaced with commands & directories that will work on a Win32 system.

These commands would work if you have Cygwin installed and in the path, but on a Win32 system without Cygwin, something like this is needed:

commands = [
    ['dir', r'%systemroot%\system32'],
    ['dir', r'%systemroot%\Help'],
    ['dir', r'%systemroot%\servicepackfiles\i386'],
    ['dir', r'%systemdrive%\Program Files'],
    ['dir', r'%systemroot%\temp']
]

Benjamin Sergeant (author) 11 years, 3 months ago

I didn't know that %systemroot% would expand to the system root (like C:) on Windows. About the recipe: it's hard, if not impossible, to find commands that exist on both Unix and Windows and take some time to complete. I think Windows users will figure out, as you did, that they need to replace the list of commands with ones that work for them.
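Worth noting: %VAR% forms are expanded by the cmd.exe shell, not by Popen when it is given an argument list. A sketch that does the expansion in Python instead (os.path.expandvars expands %VAR% on Windows and leaves it untouched on POSIX, where $VAR is the syntax); the dir commands here are hypothetical Windows jobs, not part of the recipe:

```python
import os
from subprocess import list2cmdline

# Hypothetical Windows job list using %VAR% environment references.
raw = [
    ['dir', r'%systemroot%\system32'],
    ['dir', r'%systemroot%\temp'],
]
# Expand environment variables in every argument before launching.
commands = [[os.path.expandvars(arg) for arg in cmd] for cmd in raw]

# list2cmdline is only used for display here; Popen takes the list directly.
for cmd in commands:
    print(list2cmdline(cmd))
```

Also note that dir is a cmd.exe builtin, so actually running it would need `['cmd', '/c', 'dir', ...]` or shell=True.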

Julio Cesar da Silva 6 years, 1 month ago

For me, it blocks if I have more commands to run than CPUs available. It runs only as many commands as I have CPUs and then blocks there. Is there any way to run more commands after the first batch of commands has already completed? Thank you very much.

Benjamin Sergeant (author) 6 years, 1 month ago

Hi Julio,

The goal of the recipe is not to oversubscribe the machine with too much work, which would be detrimental to overall performance. This is why I use the number of cores on the machine to dictate the maximum number of active jobs.

If your workload is IO bound rather than CPU bound (like downloading files), then you can set the max_task variable in exec_commands to whatever you want. You could try doubling the CPU count. Be aware that with too many active processes, the OS will spend its time context switching rather than doing real work.
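One way to act on that advice for IO-bound work (a sketch, not part of the original recipe): keep a pool of worker threads sized independently of the CPU count, and let each thread run one subprocess to completion. The placeholder jobs below just start a Python interpreter; replace them with real commands.

```python
import os
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor

def run(cmd):
    # Run one command to completion and return its exit code.
    return subprocess.call(cmd)

# Placeholder jobs; swap in real command lists.
commands = [
    [sys.executable, '-c', 'pass'],
    [sys.executable, '-c', 'pass'],
    [sys.executable, '-c', 'pass'],
]

# For IO-bound work, oversubscribing is fine; doubling the core count
# is an arbitrary starting point, not a recommendation from the recipe.
max_task = (os.cpu_count() or 1) * 2

with ThreadPoolExecutor(max_workers=max_task) as pool:
    codes = list(pool.map(run, commands))

if any(codes):
    sys.exit(1)
```

Unlike the polling loop in the recipe, the pool keeps launching queued commands as workers free up, which avoids the blocking Julio describes.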