
This is a simple function that runs another function in a different process: it forks a child process, runs the function there, and waits in the parent for the pickled result. This is useful for releasing resources, such as memory, that the function needs only temporarily.

Python
#!/usr/bin/env python
from __future__ import with_statement

import os, cPickle

def run_in_separate_process(func, *args, **kwds):
    # Pipe over which the child sends the pickled (status, result).
    pread, pwrite = os.pipe()
    pid = os.fork()
    if pid > 0:
        # Parent: read the child's result, then reap the child.
        os.close(pwrite)
        with os.fdopen(pread, 'rb') as f:
            status, result = cPickle.load(f)
        os.waitpid(pid, 0)
        if status == 0:
            return result
        else:
            raise result
    else:
        # Child: run the function and report back through the pipe.
        os.close(pread)
        try:
            result = func(*args, **kwds)
            status = 0
        except Exception, exc:
            result = exc
            status = 1
        # Pickle to a string first, so that a PicklingError cannot
        # leave a half-written pickle in the pipe.
        try:
            payload = cPickle.dumps((status, result), cPickle.HIGHEST_PROTOCOL)
        except cPickle.PicklingError, exc:
            payload = cPickle.dumps((2, exc), cPickle.HIGHEST_PROTOCOL)
        with os.fdopen(pwrite, 'wb') as f:
            f.write(payload)
        # Use os._exit so the child never runs the parent's cleanup code.
        os._exit(0)

# An example of use
def treble(x):
    return 3 * x

def main():
    # Calling directly
    print treble(4)
    # Calling in a separate process
    print run_in_separate_process(treble, 4)

if __name__ == '__main__':
    main()

Frequently, one might write code such as:

for x in alist: result = do_work(x)

where do_work consumes a lot of memory that is not needed for the rest of the program's lifetime. I actually wrote this function when I was doing large amounts of computation (using numpy) with a large number of temporary arrays. A good way of actually reclaiming that memory is to fork a child process, do the computation in the child, and return the results to the parent; the memory is returned to the operating system when the child exits. This pattern was mentioned in [1], and a sketch of it appears below.
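
For instance, the loop above might become something like the following sketch, where do_work and alist are hypothetical stand-ins for whatever allocates the large temporaries:

import numpy

alist = range(10)   # hypothetical inputs

def do_work(x):
    # Hypothetical stand-in: builds a large temporary array.
    tmp = numpy.ones((2000, 2000)) * x
    return float(tmp.sum())

results = []
for x in alist:
    # Each call runs in a short-lived child process; the child's
    # memory is returned to the OS when it exits.
    results.append(run_in_separate_process(do_work, x))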

The function run_in_separate_process encodes this pattern. It also handles exceptions, at least partially. The child process returns a status code along with a result. If the status is 0, the function returned successfully and its result is returned in the parent. If the status is 1, the function raised an exception, which is re-raised in the parent. If the status is 2, the function returned successfully but its result is not picklable, so an exception (the PicklingError) is raised in the parent instead. Exceptions such as SystemExit and KeyboardInterrupt in the child are not checked; as the function stands, they will result in an EOFError in the parent.
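
A quick illustration of the status-1 path (fail is just a throwaway example function):

def fail(msg):
    raise ValueError(msg)

try:
    run_in_separate_process(fail, 'raised in the child')
except ValueError, exc:
    # The child pickled (1, exc); the parent re-raised it here.
    print 'parent caught:', exc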

This function works on Linux and Mac OS X. It should work on any "reasonable" POSIX system that provides os.fork and os.pipe.

Thanks to Alex Martelli for confirming the basic approach and supplying the code for pipe communication. His message and the (short) thread [2] contain more explanation.

As it stands it works on Python 2.5, although there is nothing specific to 2.5 in it other than the use of the with statement (via from __future__ import with_statement). If that is rewritten to use ordinary exception handling, it should work on much earlier versions, but I didn't test that.
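
For example, the parent's read could be rewritten without the with statement along these lines (an untested sketch of the same logic):

# The parent's read without 'with', for pre-2.5 interpreters:
f = os.fdopen(pread, 'rb')
try:
    status, result = cPickle.load(f)
finally:
    f.close()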

[1] http://mail.python.org/pipermail/python-list/2007-March/431910.html
[2] http://groups.google.com/group/comp.lang.python/browse_thread/thread/369862d6f91f55f4/48c51e5e42406b6f#48c51e5e42406b6f

14 comments

Jean Brouwers 16 years, 9 months ago

Two suggestions. (1) Change the name of the first argument of run_in_separate_process() to something else like 'func'. The current name 'f' causes confusion with the variable 'f' used for files. (2) The f.close() line is superfluous.

/Jean Brouwers

Muhammad Alkarouri (author) 16 years, 2 months ago

Thanks for the suggestions, which I have incorporated in the code.

Gary Eakins 16 years, 2 months ago

os.waitpid(pid, 0) unnecessary? The "os.waitpid(pid, 0)" line seems unnecessary. When you have the output from the pipe, you are through with the child process. It seems to me your waitpid should be inside a loop that tests for a -1 return and raises an exception if it gets one. But maybe I've misinterpreted the man page for waitpid.
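
In Python, a failed os.waitpid raises OSError rather than returning -1 as in C, so a defensive variant along these lines might look like the following sketch (wait_for_child is a hypothetical helper, retrying if a signal interrupts the wait):

import errno, os

def wait_for_child(pid):
    # os.waitpid reports failure by raising OSError, not by
    # returning -1 as the C man page describes.
    while True:
        try:
            return os.waitpid(pid, 0)
        except OSError, exc:
            if exc.errno != errno.EINTR:
                raise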

Rodney Drenth 16 years, 2 months ago

Use the subprocess module. A better way might be just to use the subprocess module. It creates a subprocess that executes a given command, with interprocess pipes to transmit data to and receive results from the subprocess.
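
The rough shape of that alternative might be the following sketch, where worker.py is a hypothetical helper script that unpickles its input from stdin, computes, and pickles the result to stdout:

import subprocess, cPickle

p = subprocess.Popen(['python', 'worker.py'],   # hypothetical helper
                     stdin=subprocess.PIPE, stdout=subprocess.PIPE)
out, _ = p.communicate(cPickle.dumps(('treble', (4,)), cPickle.HIGHEST_PROTOCOL))
print cPickle.loads(out)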

Muhammad Alkarouri (author) 16 years, 2 months ago

Should be working all right. I would like to note first that this code is in production now, so it at least kind of works. I am not sure a loop is needed, as waitpid waits until the process finishes when the options are set to 0 rather than WNOHANG. I didn't check for failure because in this situation, if the pickle succeeds then all is well. I put in the waitpid mainly so as not to leave a possibly still-running process in the background. So it is not fully airtight, but I feel it is enough for the relatively controlled situation we have.

Muhammad Alkarouri (author) 16 years, 2 months ago

Probably not, at least not for my needs. The idea is to call a Python function with variables that are already set up in the current process. Starting a new independent process with a copy of the current Python file and going through all the setup, or separating the function into a different file to be run under the Python interpreter and writing more communication code, seems to me a bit more than what is needed. You also lose the ease of replacing an existing func(x, y) with run_in_separate_process(func, x, y).

The easiest way was to fork a copy of the running process, and the subprocess module does not replace os.fork as far as I know.

Tennis Smith 15 years, 12 months ago

Windoze? Has anyone tried this on Windows?

Bill Bose 15 years, 12 months ago

Python 2.5.1. This does not work with Python 2.5.1 on Solaris.

Getting the following error message:

./run_func_seperate_process.py:9: Warning: 'with' will become a reserved keyword in Python 2.6

File "./run_func_seperate_process.py", line 9

with os.fdopen(pread, 'rb') as f:

SyntaxError: invalid syntax

Bill Bose 15 years, 12 months ago

Python 2.5.1 - solution. The Python 2.5.1 build I had didn't support the 'with' statement out of the box. Please note that the recipe doesn't work on older versions of Python for this reason.

However, I had it covered by using "from __future__ import with_statement" after the shebang.

It would be nice to extend this function to do parallel processing.

Muhammad Alkarouri (author) 15 years, 11 months ago

Thanks. Of course you are right. It needs the with statement import, which I use myself in Python 2.5. I have modified the script accordingly.

Alan Brooks 15 years, 11 months ago

Windows doesn't have os.fork, but you might be able to rewrite this using os.spawnXX to accomplish something similar.
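
The shape of that suggestion might be the following sketch; worker.py is again a hypothetical helper script, since os.spawn* launches a program rather than forking the current one:

import os, sys

# P_WAIT blocks until the new interpreter exits.
rc = os.spawnl(os.P_WAIT, sys.executable, sys.executable, 'worker.py')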

ms4py 12 years ago

Platform independent solution with multiprocessing: https://gist.github.com/2311116
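
For reference, a minimal multiprocessing-based variant in the same spirit might look like the sketch below. It assumes Python 2.6+; note that on Windows, func and its arguments must be picklable and the call must sit under an if __name__ == '__main__' guard:

from multiprocessing import Process, Queue

def _child(queue, func, args, kwds):
    # Mirror the recipe's protocol: send (status, result) back.
    try:
        queue.put((0, func(*args, **kwds)))
    except Exception, exc:
        queue.put((1, exc))

def run_in_separate_process_mp(func, *args, **kwds):
    queue = Queue()
    proc = Process(target=_child, args=(queue, func, args, kwds))
    proc.start()
    # Unlike the pipe version, a child that dies without putting
    # anything on the queue would hang this get.
    status, result = queue.get()
    proc.join()
    if status == 0:
        return result
    raise result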

Mike O'Connor 7 years, 9 months ago

Hello all. I can report that this code still works on Python 2.7.12, and also that I have tried to use it for a purpose that Muhammad Alkarouri did not suggest: parallelism.

I had not contemplated ActiveState membership when I did the modifications, so I posted them on Stack Overflow. You can find the post by searching for the title: "Poor man's Python data parallelism doesn't scale with the # of processors? Why?"

Actually, since I may have made the code defective, that may have been the best place for it, as there you are encouraged to ask a question such as 'what's wrong with my code?'

In my case, what's wrong is that when I quadruple the number of processor cores in use on my machine, the modified code reduces the running time by only about 45%, not the 75% or so for which I had hoped.

Thanks much, Muhammad, for the post. I link back here from the Stack Overflow article, name you as the author, and describe the changes that I made (as well as show the code, of course).

Mike O'Connor 7 years, 8 months ago

I should report that the problems I was having with the code posted on Stack Overflow pretty much resolved themselves: with the actual, more complicated function I am computing, which does not make ridiculously heavy use of numpy concatenation, I am getting performance that scales properly with the number of cores.

I should add that there were pipe errors when I first dared to save everything that was returned with each iteration, not to mention utterly inefficient RAM usage. But I wrote a wrapper for the function that kept only the essential results, and that effectively cured the problem.

If you have a function that can't be pickled, and if each iteration takes approximately the same amount of time (think Monte Carlo), then my revisions could work for you. If your function won't pickle, then multiprocessing.Pool(), joblib, and the like won't work.
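
To make that last point concrete, here is a small sketch (make_adder is a hypothetical example): a closure cannot be pickled by cPickle, so Pool-style APIs reject it as a worker, but the fork-based recipe only has to pickle the result.

def make_adder(n):
    def add(x):
        # A closure over n: cPickle cannot pickle this function,
        # so Pool-style APIs would reject it as a worker.
        return x + n
    return add

add_five = make_adder(5)
# The fork inherits add_five directly; only the integer result
# has to cross the pipe, so this works fine.
print run_in_separate_process(add_five, 10)   # prints 15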