ActiveState Code

Recipe 413086: Gibbs Sampler


The gibbs sampler is an iterative conditional sampler from multidimensional probability density functions (PDFs). The resulting sample is plotted as a scatter plot with the Matplotlib module.

Python
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
#Author: Flavio C. Coelho

from math import *
from RandomArray import *
from matplotlib.pylab import * 


n=10000
rho=.99 #correlation
#Means
m1 = 10
m2 = 20
#Standard deviations
s1 = 1
s2 = 1
#Initialize vectors
x=zeros(n, Float)
y=zeros(n, Float)
sd=sqrt(1-rho**2)
# the core of the method: sample recursively from two normal distributions
# Tthe mean for the current sample, is updated at each step.
for i in range(1,n):
  x[i] = normal(m1+rho*(y[i-1]-m2)/s2,s1*sd)
  y[i] = normal(m2+rho*(x[i-1]-m1)/s1,s2*sd)

scatter(x,y,marker='d',c='r')
title('Amostrador de Gibbs')
xlabel('x')
ylabel('y')
grid()

show()

Discussion

This little algorithm is essencial for stochastic simulations where you need to make inferences from complex multidimensional probability density function which cannot be integrated algebraically.

Since its an iterative method it might be speeded up with Pyrex or inlining the loop in C with weave.

If anyone would like to try it please post the implementation.

Comments

  1. 1. At 8:40 p.m. on 3 may 2005, Douglas Bagnall said:

    speed. Before jumping to pyrex, you could move some unvarying calculations out of the loop, like so:

    sx = s1 * sd
    sy = s2 * sd
    rx = rho / s2
    ry = rho / s1
    
    for i in range(1,n):
      x[i] = normal(m1 + rx * (y[i-1] - m2), sx)
      y[i] = normal(m2 + ry * (x[i-1] - m1), sy)
    

    In any case, most of the work probably happens in normal(), whatever that is, and nothing in this script can change that.

  2. 2. At 6:18 a.m. on 5 may 2005, Flávio Codeço Coelho said:

    Thanks for the suggestion. Yes,

    Moving calculations out of the loop is basic. But answering to your question, Normal is a function from the Numeric module that draws a sample from a Normal distribution. It's probably C or Fortran code. So, apart from the function call overhead it must be pretty optimized already.

    In this first version of the recipe, I kept the calculations inside the loop to keep the specifications of the mean and standard deviations in one place (breaking up formulas makes it harder to figure out whats going on). But I'll run some tests to see what the speed improvement of moving these calculations out of the loop is and post back the results.

  3. 3. At 7:11 a.m. on 5 may 2005, Flávio Codeço Coelho said:

    Timings. Ok, I did the timings: Original code:

    IPython CPU timings (estimated):
    Total runs performed: 1000
      Times :      Total       Per run
      User  : 292.444541 s, 0.292444541 s.
      System:        0.0 s,        0.0 s.
    

    Unchanging calculations moved out of the loop:

    IPython CPU timings (estimated):
    Total runs performed: 1000
      Times :      Total       Per run
      User  : 268.004257 s, 0.268004257 s.
      System:        0.0 s,        0.0 s.
    

    By doing this we get an improvement of almost 10% at the expense of a few more lines.

Sign in to comment