
I needed to evaluate the cumulative normal distribution and normal probability density functions repeatedly for some data analysis. I found that I could speed things up drastically by using a lookup table and the built-in interpolation function that pylab provides.

Python, 46 lines
import pylab
import scipy.stats as ss

nrm = ss.norm
nx = nrm.pdf

# Lookup table based implementations ------------------------------------------
def init_nx_table(xlim=5, N=1001):
    """Go from -xlim to +xlim and make N entries; return the x grid, the
    table, the center index idx0 and the spacing dx. N should be odd so
    that x = 0 lands exactly on a table entry."""
    idx0 = int(N/2)
    x = pylab.linspace(-xlim, xlim, N)
    dx = x[1] - x[0]
    tbl = nx(x)
    return x, tbl, idx0, dx

def nx_lookup(x, mu, tbl, idx0, dx):
    """x needs to be an array."""
    sz = tbl.size
    ret = pylab.zeros(x.size)  # Our results
    idx = (x - mu)/dx + idx0 + .5  # indexes into our table; need +.5 because of rounding
    idxidx = pylab.find((idx >= 0) & (idx < sz))  # keep only indexes that fall inside the table
    ret[idxidx] = tbl[idx[idxidx].astype(int)]
    return ret
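The recipe's fastest variant uses the interp function (pylab.interp, which is the same function as numpy.interp). A minimal numpy-only sketch of that approach, with the standard normal pdf written out so it runs without scipy; the names init_table and nx_interp are illustrative, not from the original recipe:

```python
import numpy as np

def init_table(xlim=5, N=1001):
    """Tabulate the standard normal pdf on a uniform grid over [-xlim, xlim]."""
    x = np.linspace(-xlim, xlim, N)
    tbl = np.exp(-x ** 2 / 2) / np.sqrt(2 * np.pi)
    return x, tbl

def nx_interp(x, mu, xtbl, tbl):
    """Evaluate the pdf at x (mean mu) by linear interpolation into the table.
    Outside the grid np.interp clamps to the endpoint values (~1.5e-6 for
    xlim=5), which is effectively zero for most purposes."""
    return np.interp(x - mu, xtbl, tbl)
```

For example, `nx_interp(np.array([0.0]), 0.0, *init_table())` returns roughly 0.3989, matching ss.norm.pdf(0).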

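The accuracy difference between the two table-based approaches can be quantified directly: nearest-entry lookup versus linear interpolation, both measured against the exact pdf. A self-contained numpy sketch (variable names are illustrative, not from the recipe):

```python
import numpy as np

# Tabulate the standard normal pdf on a uniform grid, as in the recipe.
xlim, N = 5, 1001
xtbl = np.linspace(-xlim, xlim, N)
tbl = np.exp(-xtbl ** 2 / 2) / np.sqrt(2 * np.pi)
dx = xtbl[1] - xtbl[0]
idx0 = N // 2

x = np.linspace(-4, 4, 9999)  # query points that mostly fall between grid entries
exact = np.exp(-x ** 2 / 2) / np.sqrt(2 * np.pi)

near = tbl[(x / dx + idx0 + 0.5).astype(int)]  # homemade nearest-entry lookup
lin = np.interp(x, xtbl, tbl)                  # linear interpolation

print(np.abs(near - exact).max())  # ~1e-3: limited by half the grid spacing
print(np.abs(lin - exact).max())   # ~5e-6: hundreds of times more accurate
```

The nearest-entry error is bounded by half the grid step times the pdf's maximum slope, while linear interpolation's error shrinks with the square of the step, which is why the interp version tracks the true function so much more closely.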
The direct implementation, using the built-in pdf function, is the slowest: 61 function calls in 0.358 CPU seconds. Next comes the homemade lookup table: 12 function calls in 0.157 CPU seconds. The fastest is the interp function: 6 function calls in 0.050 CPU seconds, about a 7x speed-up over the direct version. The interpolated version is also much closer to the original function than the homemade lookup table.

Created by Kaushik Ghose on Sat, 2 Jul 2011 (MIT)