
This is a slightly modified version of http://arctrix.com/nas/python/bpnn.py

Python, 162 lines
```
import math
import random
import string


class NN:
    def __init__(self, NI, NH, NO):
        # number of nodes in layers
        self.ni = NI + 1  # +1 for bias
        self.nh = NH
        self.no = NO

        # initialize node activations
        self.ai = [1.0] * self.ni
        self.ah = [1.0] * self.nh
        self.ao = [1.0] * self.no

        # create node weight matrices
        self.wi = makeMatrix(self.ni, self.nh)
        self.wo = makeMatrix(self.nh, self.no)
        # initialize node weights to random values
        randomizeMatrix(self.wi, -0.2, 0.2)
        randomizeMatrix(self.wo, -2.0, 2.0)
        # create last-change-in-weights matrices for momentum
        self.ci = makeMatrix(self.ni, self.nh)
        self.co = makeMatrix(self.nh, self.no)

    def runNN(self, inputs):
        if len(inputs) != self.ni - 1:
            print 'incorrect number of inputs'

        # input activations (the last input node is the bias and stays at 1.0)
        for i in range(self.ni - 1):
            self.ai[i] = inputs[i]

        # hidden activations
        for j in range(self.nh):
            total = 0.0
            for i in range(self.ni):
                total += self.ai[i] * self.wi[i][j]
            self.ah[j] = sigmoid(total)

        # output activations
        for k in range(self.no):
            total = 0.0
            for j in range(self.nh):
                total += self.ah[j] * self.wo[j][k]
            self.ao[k] = sigmoid(total)

        return self.ao

    def backPropagate(self, targets, N, M):
        # http://www.youtube.com/watch?v=aVId8KMsdUU&feature=BFa&list=LLldMCkmXl4j9_v0HeKdNcRA

        # calc output deltas
        # We want the instantaneous rate of change of the error with respect
        # to the weight from node j to node k.
        # output_delta is defined as an attribute of each output node. It is
        # not the final rate we need. To get the final rate we must multiply
        # the delta by the activation of the hidden-layer node in question.
        # This multiplication follows from the chain rule, as we are taking
        # the derivative of the activation function of the output node.
        # dE/dw[j][k] = (t[k] - ao[k]) * s'( SUM( w[j][k]*ah[j] ) ) * ah[j]
        output_deltas = [0.0] * self.no
        for k in range(self.no):
            error = targets[k] - self.ao[k]
            output_deltas[k] = error * dsigmoid(self.ao[k])

        # update output weights
        for j in range(self.nh):
            for k in range(self.no):
                # output_deltas[k] * self.ah[j] is the full derivative
                # dError/dweight[j][k]
                change = output_deltas[k] * self.ah[j]
                self.wo[j][k] += N * change + M * self.co[j][k]
                self.co[j][k] = change

        # calc hidden deltas
        hidden_deltas = [0.0] * self.nh
        for j in range(self.nh):
            error = 0.0
            for k in range(self.no):
                error += output_deltas[k] * self.wo[j][k]
            hidden_deltas[j] = error * dsigmoid(self.ah[j])

        # update input weights
        for i in range(self.ni):
            for j in range(self.nh):
                change = hidden_deltas[j] * self.ai[i]
                # print 'activation', self.ai[i], 'synapse', i, j, 'change', change
                self.wi[i][j] += N * change + M * self.ci[i][j]
                self.ci[i][j] = change

        # calc combined error
        # 1/2 for differential convenience & **2 for modulus
        error = 0.0
        for k in range(len(targets)):
            error = 0.5 * (targets[k] - self.ao[k]) ** 2
        return error

    def weights(self):
        print 'Input weights:'
        for i in range(self.ni):
            print self.wi[i]
        print
        print 'Output weights:'
        for j in range(self.nh):
            print self.wo[j]
        print ''

    def test(self, patterns):
        # each pattern is [inputs, targets]
        for p in patterns:
            inputs = p[0]
            print 'Inputs:', p[0], '-->', self.runNN(inputs), '\tTarget', p[1]

    def train(self, patterns, max_iterations=1000, N=0.5, M=0.1):
        # N is the learning rate, M the momentum factor
        for i in range(max_iterations):
            for p in patterns:
                inputs = p[0]
                targets = p[1]
                self.runNN(inputs)
                error = self.backPropagate(targets, N, M)
            if i % 50 == 0:
                print 'Combined error', error
        self.test(patterns)


def sigmoid(x):
    return math.tanh(x)


# the derivative of the sigmoid function in terms of output
# proof here:
# http://www.math10.com/en/algebra/hyperbolic-functions/hyperbolic-functions.html
def dsigmoid(y):
    return 1 - y ** 2


def makeMatrix(I, J, fill=0.0):
    m = []
    for i in range(I):
        m.append([fill] * J)
    return m


def randomizeMatrix(matrix, a, b):
    for i in range(len(matrix)):
        for j in range(len(matrix[0])):
            matrix[i][j] = random.uniform(a, b)


def main():
    # NOTE: the target values are missing from the patterns as posted.
    # XOR targets are ASSUMED here, as in the linked bpnn.py.
    pat = [
        [[0, 0], [0]],
        [[0, 1], [1]],
        [[1, 0], [1]],
        [[1, 1], [0]]
    ]
    myNN = NN(2, 2, 1)
    myNN.train(pat)


if __name__ == "__main__":
    main()
```

Zico 9 years, 10 months ago

Hey David,

This is cool code, I must say. I do have one question though: how can I train the net with this?

Prokop Hapala 9 years, 7 months ago

Hi, it's great to have a back-propagation MLP as simple as this for learning. I'm just surprised that I can't get this network to learn a checkerboard function.

Why? Do you know what the problem might be? The universal approximation theorem ( http://en.wikipedia.org/wiki/Universal_approximation_theorem ) says that it should be possible with one hidden layer.

```
def demo():
    # Teach network checkerboard function
    pat = [
        [ [0.0,0.0], [0.0] ],
        [ [0.0,0.5], [1.0] ],
        [ [0.0,1.0], [0.0] ],

        [ [0.5,0.0], [1.0] ],
        [ [0.5,0.5], [0.0] ],
        [ [0.5,1.0], [1.0] ],

        [ [1.0,0.0], [0.0] ],
        [ [1.0,0.5], [1.0] ],
        [ [1.0,1.0], [0.0] ]
    ]

    # create a network with two input, 10 hidden, and one output nodes
    n = NN(2, 10, 1)
    print " train it with some patterns "
    n.train(pat)
    print " test it "
    n.test(pat)
```

It does not converge to any reasonable approximation:

```
train it with some patterns
error 3.14902
error 1.37104
error 1.35305
error 1.30453
error 1.28329
error 1.27599
error 1.27275
error 1.27108
error 1.27014
error 1.26957
test it
([0.0, 0.0], '->', [0.019645293674000152])
([0.0, 0.5], '->', [0.5981006916165954])
([0.0, 1.0], '->', [0.5673621981298169])
([0.5, 0.0], '->', [0.5801274708105488])
([0.5, 0.5], '->', [0.5475774428347904])
([0.5, 1.0], '->', [0.5054692523873793])
([1.0, 0.0], '->', [0.5269586801603834])
([1.0, 0.5], '->', [0.48368767897171666])
([1.0, 1.0], '->', [0.43916379836698244])
```

Muhammad Rizman 9 years, 6 months ago

If I'm going to use this code with 3 input, 3 hidden, and 1 output nodes, which part of the code do I actually have to adjust?

sabastr 9 years, 6 months ago

Hello!

Thank you for sharing your code! I am trying to write my own neural network code, but it keeps failing to converge, so I started looking for working examples that could help me figure out what the problem might be. One thing in your code confuses me. You use tanh as your activation function, which has limits at -1 and 1, yet your inputs and outputs use values of 0 and 1 rather than the -1 and 1 usually suggested. Could you explain how that works? I have seen it elsewhere already, but it seems somewhat unconventional, and I am trying to work out whether I am missing something that might help me fix my own code.
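One way to look at the question above, as a minimal sketch rather than an answer from the recipe's author: tanh maps into the open interval (-1, 1), so a target of 0 sits in the middle of that range and a target of 1 is approached only as the weighted input grows. The snippet prints a few tanh values and shows how 0/1 patterns could be rescaled to -1/+1 if preferred. `to_bipolar`, `xor_01`, and `xor_bipolar` are hypothetical names introduced for illustration and do not appear in the recipe.

```python
import math


def to_bipolar(v):
    """Map a 0/1 value onto -1/+1 (hypothetical helper, not in the recipe)."""
    return 2.0 * v - 1.0


# tanh is 0 at the origin and approaches -1/+1 for large negative/positive inputs
for x in (-3.0, 0.0, 3.0):
    print(x, math.tanh(x))

# Illustrative XOR patterns in 0/1 form (an assumption made for this sketch)
xor_01 = [([0, 0], [0]), ([0, 1], [1]), ([1, 0], [1]), ([1, 1], [0])]

# The same patterns rescaled so inputs and targets span the tanh range
xor_bipolar = [
    ([to_bipolar(v) for v in inputs], [to_bipolar(t) for t in targets])
    for inputs, targets in xor_01
]
print(xor_bipolar)
```

Either encoding lies within the range tanh can produce. Whether rescaling helps convergence in a given network is something to test rather than assume.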

Thank you!

virt 9 years, 4 months ago

Great to see you sharing this code. I found this through Google and have some comments in case others run into problems:

• Line 99 (the combined-error loop at the end of `backPropagate`) does: `error = 0.5 * (targets[k]-self.ao[k])**2`. This should be `+=`, so that the error is summed over all output nodes instead of keeping only the last one.

• The non-linear function is confusingly called sigmoid, but uses a tanh. In a lot of people's minds the sigmoid function is just the logistic function `1/(1+e^-x)`, which is very different from tanh! The derivative of tanh is indeed `(1 - y**2)`, but the derivative of the logistic function is `s*(1-s)`. The link does not help very much with this.

Sudhanshu Patel 8 years, 3 months ago

Great work Bro __/__

Created by David Adler on Wed, 30 May 2012 (MIT)
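The two points virt raises in the comments above can be checked numerically. The following is a minimal Python 3 sketch, not part of the recipe: `logistic`, `numeric_derivative`, `max_abs_gap`, and `combined_error` are hypothetical helper names. It compares central finite differences with the two closed-form derivatives written in terms of the function's output, and sums the squared error over all outputs with `+=`.

```python
import math


def logistic(x):
    """Logistic function 1 / (1 + e^-x)."""
    return 1.0 / (1.0 + math.exp(-x))


def dtanh_from_output(y):
    """Derivative of tanh expressed through its output y = tanh(x)."""
    return 1.0 - y ** 2


def dlogistic_from_output(s):
    """Derivative of the logistic function through its output s."""
    return s * (1.0 - s)


def numeric_derivative(f, x, h=1e-6):
    """Central finite-difference estimate of f'(x)."""
    return (f(x + h) - f(x - h)) / (2.0 * h)


def max_abs_gap(f, df_from_output, xs):
    """Largest gap between the numeric and the closed-form derivative."""
    return max(abs(numeric_derivative(f, x) - df_from_output(f(x))) for x in xs)


def combined_error(targets, outputs):
    """Half the squared error, accumulated over every output node."""
    error = 0.0
    for t, o in zip(targets, outputs):
        error += 0.5 * (t - o) ** 2  # += sums all nodes; = would keep only the last
    return error


xs = [-2.0, -0.5, 0.0, 0.5, 2.0]
tanh_gap = max_abs_gap(math.tanh, dtanh_from_output, xs)
logistic_gap = max_abs_gap(logistic, dlogistic_from_output, xs)
# Pairing tanh with the logistic derivative formula gives a large gap
mismatch_gap = max_abs_gap(math.tanh, dlogistic_from_output, xs)

print(tanh_gap, logistic_gap, mismatch_gap)
print(combined_error([1.0, 0.0], [0.5, 0.5]))
```

The first two gaps should be tiny, while the mismatched pairing is off by a large margin, which is why the activation function and its derivative have to be swapped together.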