Welcome, guest | Sign In | My Account | Store | Cart

This recipe shows how to profile threads in Python by using custom profiler function.

Python, 96 lines
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
from time import time
import threading
import sys
from collections import deque
try:
    from resource import getrusage, RUSAGE_SELF
except ImportError:
    RUSAGE_SELF = 0
    def getrusage(who=0):
        return [0.0, 0.0] # on non-UNIX platforms cpu_time always 0.0

p_stats = None
p_start_time = None

def profiler(frame, event, arg):
    if event not in ('call','return'): return profiler
    #### gather stats ####
    rusage = getrusage(RUSAGE_SELF)
    t_cpu = rusage[0] + rusage[1] # user time + system time
    code = frame.f_code 
    fun = (code.co_name, code.co_filename, code.co_firstlineno)
    #### get stack with functions entry stats ####
    ct = threading.currentThread()
    try:
        p_stack = ct.p_stack
    except AttributeError:
        ct.p_stack = deque()
        p_stack = ct.p_stack
    #### handle call and return ####
    if event == 'call':
        p_stack.append((time(), t_cpu, fun))
    elif event == 'return':
        try:
            t,t_cpu_prev,f = p_stack.pop()
            assert f == fun
        except IndexError: # TODO investigate
            t,t_cpu_prev,f = p_start_time, 0.0, None
        call_cnt, t_sum, t_cpu_sum = p_stats.get(fun, (0, 0.0, 0.0))
        p_stats[fun] = (call_cnt+1, t_sum+time()-t, t_cpu_sum+t_cpu-t_cpu_prev)
    return profiler


def profile_on():
    global p_stats, p_start_time
    p_stats = {}
    p_start_time = time()
    threading.setprofile(profiler)
    sys.setprofile(profiler)


def profile_off():
    threading.setprofile(None)
    sys.setprofile(None)

def get_profile_stats():
    """
    returns dict[function_tuple] -> stats_tuple
    where
      function_tuple = (function_name, filename, lineno)
      stats_tuple = (call_cnt, real_time, cpu_time)
    """
    return p_stats

#### EXAMPLE ##################################################################

from time import sleep
from threading import Thread
import random

def test_function():
    pass

class T(Thread):
    def __init__(self):
        Thread.__init__(self)
    def run(self):                  # takes about 5 seconds
        for i in xrange(100):
            self.test_method()
            test_function()
    def test_method(self):
        sleep(random.random() / 10)

profile_on()
#######################
threads = [T() for i in xrange(3)]
for t in threads:
    t.start()
for i in xrange(100):
    test_function()
for t in threads:
    t.join()
#######################
profile_off()

from pprint import pprint
pprint(get_profile_stats())

UPDATE: critical flaw fixed, see CHANGES below.

Googling for "profiling python threads" returns almost no practical advices. I think it's time to change that ;-)

Recipe given above is easy to use. Just call profile_on() and after some time profile_off(). Now calling get_profile_stats() will return dictionary with profiling stats (number of calls, cumulative "wall" time, cumulative cpu time) for every function.

Example key,value pairs from stats dictionary: ('test_function', 'xprofile.py', 70): (400, 0.011523, 0.003999) ('test_method', 'xprofile.py', 80): (300, 15.507113, 0.011998)

CHANGES 1.6: * fixed: the same p_stack in all threads

CHANGES 1.3: * fixed: concurency issue with p_stack, now it is local for each thread * fixed support for non-Unix-like platforms (but cpu_time is always reported as 0.0)

1 comment

Jeff Silver 15 years, 4 months ago  # | flag

Double accounting? That's a good try at what would be very useful functionality. Unfortunately, I don't think the numbers for CPU usage will be correct. If I read the code correctly, what you are recording is the amount of CPU used by the process while a function was being run in a thread. That CPU usage figure will include any CPU time used by other threads that received some CPU resource during the (real) time in which the function was running.

I'm not aware of any system call that would give CPU usage for a thread. The only way I can think of doing this accurately is somehow to catch context switches and record CPU usage against thread at that point. Again, I don't know how to do this (I'm sure it would have to be at the C level rather than in Python).

:-(