Welcome, guest | Sign In | My Account | Store | Cart

The gibbs sampler is an iterative conditional sampler from multidimensional probability density functions (PDFs). The resulting sample is plotted as a scatter plot with the Matplotlib module.

Python, 32 lines
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
#Author: Flavio C. Coelho

from math import *
from RandomArray import *
from matplotlib.pylab import * 


n=10000
rho=.99 #correlation
#Means
m1 = 10
m2 = 20
#Standard deviations
s1 = 1
s2 = 1
#Initialize vectors
x=zeros(n, Float)
y=zeros(n, Float)
sd=sqrt(1-rho**2)
# the core of the method: sample recursively from two normal distributions
# Tthe mean for the current sample, is updated at each step.
for i in range(1,n):
  x[i] = normal(m1+rho*(y[i-1]-m2)/s2,s1*sd)
  y[i] = normal(m2+rho*(x[i-1]-m1)/s1,s2*sd)

scatter(x,y,marker='d',c='r')
title('Amostrador de Gibbs')
xlabel('x')
ylabel('y')
grid()

show()

This little algorithm is essencial for stochastic simulations where you need to make inferences from complex multidimensional probability density function which cannot be integrated algebraically.

Since its an iterative method it might be speeded up with Pyrex or inlining the loop in C with weave.

If anyone would like to try it please post the implementation.

6 comments

Douglas Bagnall 16 years, 7 months ago  # | flag

speed. Before jumping to pyrex, you could move some unvarying calculations out of the loop, like so:

sx = s1 * sd
sy = s2 * sd
rx = rho / s2
ry = rho / s1

for i in range(1,n):
  x[i] = normal(m1 + rx * (y[i-1] - m2), sx)
  y[i] = normal(m2 + ry * (x[i-1] - m1), sy)

In any case, most of the work probably happens in normal(), whatever that is, and nothing in this script can change that.

Flávio Codeço Coelho 16 years, 7 months ago  # | flag

Thanks for the suggestion. Yes,

Moving calculations out of the loop is basic. But answering to your question, Normal is a function from the Numeric module that draws a sample from a Normal distribution. It's probably C or Fortran code. So, apart from the function call overhead it must be pretty optimized already.

In this first version of the recipe, I kept the calculations inside the loop to keep the specifications of the mean and standard deviations in one place (breaking up formulas makes it harder to figure out whats going on). But I'll run some tests to see what the speed improvement of moving these calculations out of the loop is and post back the results.

Flávio Codeço Coelho 16 years, 7 months ago  # | flag

Timings. Ok, I did the timings: Original code:

IPython CPU timings (estimated):
Total runs performed: 1000
  Times :      Total       Per run
  User  : 292.444541 s, 0.292444541 s.
  System:        0.0 s,        0.0 s.

Unchanging calculations moved out of the loop:

IPython CPU timings (estimated):
Total runs performed: 1000
  Times :      Total       Per run
  User  : 268.004257 s, 0.268004257 s.
  System:        0.0 s,        0.0 s.

By doing this we get an improvement of almost 10% at the expense of a few more lines.

max likely 9 years, 1 month ago  # | flag

I believe you might have a bug in your code; y sampled at time t should be conditioned on x at time t. Cool snippet, tho.

You might want to change the inside of the loop to this:

x[i] = normal(m1+rho(y[i-1]-m2)/s2,s1sd) y[i] = normal(m2+rho(x[i]-m1)/s1,s2sd)

max likely 9 years, 1 month ago  # | flag

Sorry for the double post, here's my post properly formatted.

I believe you might have a bug in your code; y sampled at time t should be conditioned on x at t. Cool snippet, tho. You might want to change the inside of the loop to this:

x[i] = normal(m1+rho(y[i-1]-m2)/s2,s1sd)
y[i] = normal(m2+rho(x[i]-m1)/s1,s2sd)
whille 8 years, 6 months ago  # | flag

3d ploting can be added for iteration view:

# 3d plot with iteration as z axis
from mpl_toolkits.mplot3d import Axes3D

fig = figure()
ax = Axes3D(fig)
ax.plot(x, y, range(len(x)))
ax.set_xlabel('x')
ax.set_ylabel('y')
ax.set_zlabel('n')
show()