Welcome, guest | Sign In | My Account | Store | Cart

This is a slightly different version of this http://arctrix.com/nas/python/bpnn.py

Python, 162 lines
  1
  2
  3
  4
  5
  6
  7
  8
  9
 10
 11
 12
 13
 14
 15
 16
 17
 18
 19
 20
 21
 22
 23
 24
 25
 26
 27
 28
 29
 30
 31
 32
 33
 34
 35
 36
 37
 38
 39
 40
 41
 42
 43
 44
 45
 46
 47
 48
 49
 50
 51
 52
 53
 54
 55
 56
 57
 58
 59
 60
 61
 62
 63
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
import math
import random
import string

class NN:
  def __init__(self, NI, NH, NO):
    # number of nodes in layers
    self.ni = NI + 1 # +1 for bias
    self.nh = NH
    self.no = NO
    
    # initialize node-activations
    self.ai, self.ah, self.ao = [],[], []
    self.ai = [1.0]*self.ni
    self.ah = [1.0]*self.nh
    self.ao = [1.0]*self.no

    # create node weight matrices
    self.wi = makeMatrix (self.ni, self.nh)
    self.wo = makeMatrix (self.nh, self.no)
    # initialize node weights to random vals
    randomizeMatrix ( self.wi, -0.2, 0.2 )
    randomizeMatrix ( self.wo, -2.0, 2.0 )
    # create last change in weights matrices for momentum
    self.ci = makeMatrix (self.ni, self.nh)
    self.co = makeMatrix (self.nh, self.no)
    
  def runNN (self, inputs):
    if len(inputs) != self.ni-1:
      print 'incorrect number of inputs'
    
    for i in range(self.ni-1):
      self.ai[i] = inputs[i]
      
    for j in range(self.nh):
      sum = 0.0
      for i in range(self.ni):
        sum +=( self.ai[i] * self.wi[i][j] )
      self.ah[j] = sigmoid (sum)
    
    for k in range(self.no):
      sum = 0.0
      for j in range(self.nh):        
        sum +=( self.ah[j] * self.wo[j][k] )
      self.ao[k] = sigmoid (sum)
      
    return self.ao
      
      
  
  def backPropagate (self, targets, N, M):
    # http://www.youtube.com/watch?v=aVId8KMsdUU&feature=BFa&list=LLldMCkmXl4j9_v0HeKdNcRA
    
    # calc output deltas
    # we want to find the instantaneous rate of change of ( error with respect to weight from node j to node k)
    # output_delta is defined as an attribute of each ouput node. It is not the final rate we need.
    # To get the final rate we must multiply the delta by the activation of the hidden layer node in question.
    # This multiplication is done according to the chain rule as we are taking the derivative of the activation function
    # of the ouput node.
    # dE/dw[j][k] = (t[k] - ao[k]) * s'( SUM( w[j][k]*ah[j] ) ) * ah[j]
    output_deltas = [0.0] * self.no
    for k in range(self.no):
      error = targets[k] - self.ao[k]
      output_deltas[k] =  error * dsigmoid(self.ao[k]) 
   
    # update output weights
    for j in range(self.nh):
      for k in range(self.no):
        # output_deltas[k] * self.ah[j] is the full derivative of dError/dweight[j][k]
        change = output_deltas[k] * self.ah[j]
        self.wo[j][k] += N*change + M*self.co[j][k]
        self.co[j][k] = change

    # calc hidden deltas
    hidden_deltas = [0.0] * self.nh
    for j in range(self.nh):
      error = 0.0
      for k in range(self.no):
        error += output_deltas[k] * self.wo[j][k]
      hidden_deltas[j] = error * dsigmoid(self.ah[j])
    
    #update input weights
    for i in range (self.ni):
      for j in range (self.nh):
        change = hidden_deltas[j] * self.ai[i]
        #print 'activation',self.ai[i],'synapse',i,j,'change',change
        self.wi[i][j] += N*change + M*self.ci[i][j]
        self.ci[i][j] = change
        
    # calc combined error
    # 1/2 for differential convenience & **2 for modulus
    error = 0.0
    for k in range(len(targets)):
      error = 0.5 * (targets[k]-self.ao[k])**2
    return error
        
        
  def weights(self):
    print 'Input weights:'
    for i in range(self.ni):
      print self.wi[i]
    print
    print 'Output weights:'
    for j in range(self.nh):
      print self.wo[j]
    print ''
  
  def test(self, patterns):
    for p in patterns:
      inputs = p[0]
      print 'Inputs:', p[0], '-->', self.runNN(inputs), '\tTarget', p[1]
  
  def train (self, patterns, max_iterations = 1000, N=0.5, M=0.1):
    for i in range(max_iterations):
      for p in patterns:
        inputs = p[0]
        targets = p[1]
        self.runNN(inputs)
        error = self.backPropagate(targets, N, M)
      if i % 50 == 0:
        print 'Combined error', error
    self.test(patterns)
    

def sigmoid (x):
  return math.tanh(x)
  
# the derivative of the sigmoid function in terms of output
# proof here: 
# http://www.math10.com/en/algebra/hyperbolic-functions/hyperbolic-functions.html
def dsigmoid (y):
  return 1 - y**2

def makeMatrix ( I, J, fill=0.0):
  m = []
  for i in range(I):
    m.append([fill]*J)
  return m
  
def randomizeMatrix ( matrix, a, b):
  for i in range ( len (matrix) ):
    for j in range ( len (matrix[0]) ):
      matrix[i][j] = random.uniform(a,b)

def main ():
  pat = [
      [[0,0], [1]],
      [[0,1], [1]],
      [[1,0], [1]],
      [[1,1], [0]]
  ]
  myNN = NN ( 2, 2, 1)
  myNN.train(pat)
  
  





if __name__ == "__main__":
    main()

6 comments

Zico 7 years, 10 months ago  # | flag

Hey David,

This is a cool code I must say. I do have one question though... how can I train the net with this?

Prokop Hapala 7 years, 8 months ago  # | flag

Hi, It's great to have simplest back-propagation MLP like this for learning. I'm just surprissed that I'm unable to learn this network a checkerboard function

Why? Do you know what can be the problem? Universal approximation theorem ( http://en.wikipedia.org/wiki/Universal_approximation_theorem ) says that it should be possible to do with 1 hidden layer.

def demo():
    # Teach network checkerboard function
    pat = [
    [ [0.0,0.0], [0.0] ],
    [ [0.0,0.5], [1.0] ],
    [ [0.0,1.0], [0.0] ],

    [ [0.5,0.0], [1.0] ],
    [ [0.5,0.5], [0.0] ],
    [ [0.5,1.0], [1.0] ],

    [ [1.0,0.0], [0.0] ],
    [ [1.0,0.5], [1.0] ],
    [ [1.0,1.0], [0.0] ]
    ]

    # create a network with two input, 10 hidden, and one output nodes
    n = NN(2, 10, 1)
    print " train it with some patterns "
    n.train(pat)
    print " test it "
    n.test(pat)

it will not coverge to any reasonable approximation

train it with some patterns 
error 3.14902
error 1.37104
error 1.35305
error 1.30453
error 1.28329
error 1.27599
error 1.27275
error 1.27108
error 1.27014
error 1.26957
test it 
([0.0, 0.0], '->', [0.019645293674000152])
([0.0, 0.5], '->', [0.5981006916165954])
([0.0, 1.0], '->', [0.5673621981298169])
([0.5, 0.0], '->', [0.5801274708105488])
([0.5, 0.5], '->', [0.5475774428347904])
([0.5, 1.0], '->', [0.5054692523873793])
([1.0, 0.0], '->', [0.5269586801603834])
([1.0, 0.5], '->', [0.48368767897171666])
([1.0, 1.0], '->', [0.43916379836698244])
Muhammad Rizman 7 years, 6 months ago  # | flag

if i'm going to use this code with 3 inputs, 3 hidden, 1 output nodes. which part of the code do I really have to adjust

sabastr 7 years, 6 months ago  # | flag

Hello!

Thank you for sharing your code! I am in the process of trying to write my own code for a neural network but it keeps not converging so I started looking for working examples that could help me figure out what the problem might be. I have one question about your code which confuses me. You use tanh as your activation function which has limits at -1 and 1 and yet for your inputs and outputs you use values of 0 and 1 rather than the -1 and 1 as is usually suggested. Could you explain to me how is that possible? I have seen it elsewhere already but it seems somewhat untraditional and I am trying to understand whether I am not understanding something that might help me figure out my own code.

Thank you!

virt 7 years, 4 months ago  # | flag

Great to see you sharing this code. I found this through Google and have some comments in case others run into problems:

  • Line 99 does: error = 0.5 * (targets[k]-self.ao[k])**2 This should be +=

  • The non-linear function is confusingly called sigmoid, but uses a tanh. In a lot of people's minds the sigmoid function is just the logistic function 1/1+e^-x, which is very different from tanh! The derivative of tanh is indeed (1 - y**2), but the derivative of the logistic function is s*(1-s). The link does not help very much with this.

Sudhanshu Patel 6 years, 3 months ago  # | flag

Great work Bro __/__