NN Backpropagation

The case we are handling: a two-layer network.

[Figure: 2layer_nn_bpp — the two-layer network used in this post]

The diagram above shows the network to be used: the input $x$ goes through a hidden layer, $z_1 = W_1 x + b_1$, $a_1 = f(z_1)$, and an output layer, $z_2 = W_2 a_1 + b_2$, $\hat{y} = \mathrm{softmax}(z_2)$. From the last blog we get the loss function:

$$L = -\sum_k y_k \log \hat{y}_k$$

In order to use the gradient descent algorithm to train $W_1$ and $W_2$, we need to compute the derivatives of $L$ with respect to $W_1$ and $W_2$, which are:

$$\frac{\partial L}{\partial W_1} \quad \text{and} \quad \frac{\partial L}{\partial W_2}$$

If we have $\frac{\partial L}{\partial W_2}$ at hand, then $W_2$ can be trained using gradient descent (and likewise for $W_1$):

$$W_2 := W_2 - \eta \frac{\partial L}{\partial W_2}$$

where $\eta$ is the learning rate.
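
As a concrete illustration, here is a minimal sketch of that update step in NumPy; the names `W2`, `dW2` and the learning rate value `lr` are placeholders I am assuming, not taken from the original post.

```python
import numpy as np

rng = np.random.default_rng(0)
lr = 0.1                            # learning rate (eta), assumed value
W2 = rng.standard_normal((3, 4))    # a parameter matrix
dW2 = rng.standard_normal((3, 4))   # its gradient dL/dW2 (placeholder values)

# One gradient-descent step: move W2 a small step against the gradient.
W2 -= lr * dW2
```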

Compute $\frac{\partial L}{\partial W_2}$

How do we compute $\frac{\partial L}{\partial W_2}$? We use the chain rule to “propagate back” the gradient. For $\frac{\partial L}{\partial W_2}$:

$$\frac{\partial L}{\partial W_2} = \frac{\partial L}{\partial z_2} \cdot \frac{\partial z_2}{\partial W_2}$$

As the last blog described, $\frac{\partial L}{\partial z_2}$ of a sample can be expressed as:

$$\frac{\partial L}{\partial z_2} = \hat{y} - y$$

Considering $z_2 = W_2 a_1 + b_2$:

$$\frac{\partial z_2}{\partial W_2} = a_1$$

then:

$$\frac{\partial L}{\partial W_2} = \frac{\partial L}{\partial z_2} \, a_1^{T} = (\hat{y} - y)\, a_1^{T}$$
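
A small NumPy sketch of this step, assuming column-vector shapes and a softmax output layer; the names `a1`, `y_hat`, `y` and the example values are placeholders, not code from the original post.

```python
import numpy as np

# Shapes: hidden activation a1 is (h, 1); softmax output y_hat and the
# one-hot target y are (k, 1); the gradient dW2 then has shape (k, h).
a1 = np.array([[0.2], [0.7], [0.1]])    # h = 3
y_hat = np.array([[0.6], [0.4]])        # k = 2, softmax output
y = np.array([[1.0], [0.0]])            # one-hot target

dz2 = y_hat - y        # dL/dz2 for softmax + cross-entropy
dW2 = dz2 @ a1.T       # dL/dW2 = (y_hat - y) a1^T
print(dW2.shape)       # (2, 3)
```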

Compute $\frac{\partial L}{\partial W_1}$

Now we will compute $\frac{\partial L}{\partial W_1}$ to train $W_1$. The gradient will be propagated back to $W_1$ like this:

$$\frac{\partial L}{\partial W_1} = \frac{\partial L}{\partial z_2} \cdot \frac{\partial z_2}{\partial a_1} \cdot \frac{\partial a_1}{\partial z_1} \cdot \frac{\partial z_1}{\partial W_1}$$

The same as before, we know that $\frac{\partial L}{\partial z_2} = \hat{y} - y$ and $\frac{\partial z_2}{\partial a_1} = W_2$, so the above equation results in:

$$\frac{\partial L}{\partial W_1} = \left( W_2^{T} (\hat{y} - y) \odot f'(z_1) \right) x^{T}$$

$\frac{\partial L}{\partial z_2} = \hat{y} - y$ is already known, and $\frac{\partial a_1}{\partial z_1} = f'(z_1)$ depends on the activation function of the hidden layer.
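
For example, with a sigmoid hidden layer (the choice of sigmoid here is my assumption; the post only says this term depends on the hidden activation), $f'(z_1) = a_1 \odot (1 - a_1)$, and the step looks like this in NumPy:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Shapes: input x is (d, 1), W1 is (h, d), W2 is (k, h); values are illustrative.
x = np.array([[0.5], [-1.0]])                          # d = 2
W1 = np.array([[0.1, 0.3], [-0.2, 0.4], [0.0, 0.2]])   # h = 3
W2 = np.array([[0.2, -0.1, 0.5], [0.3, 0.8, -0.4]])    # k = 2

z1 = W1 @ x                        # hidden pre-activation
a1 = sigmoid(z1)                   # hidden activation
dz2 = np.array([[-0.4], [0.4]])    # y_hat - y, as in the previous section

da1 = W2.T @ dz2                   # propagate the gradient back through W2
dz1 = da1 * a1 * (1 - a1)          # multiply by sigmoid'(z1) = a1 * (1 - a1)
dW1 = dz1 @ x.T                    # dL/dW1, shape (h, d)
print(dW1.shape)                   # (3, 2)
```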

Using this method, we can easily propagate the gradient back to the input layer through the whole network and update all the layers, no matter how many layers there are in between.
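
Putting the pieces together, here is a minimal end-to-end sketch of one training step for the two-layer network. The sigmoid hidden layer, the bias terms, the layer sizes, and all variable names are assumptions for illustration, not code from the original post.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

rng = np.random.default_rng(0)
d, h, k, lr = 4, 5, 3, 0.1          # input size, hidden size, classes, learning rate
W1, b1 = 0.1 * rng.standard_normal((h, d)), np.zeros((h, 1))
W2, b2 = 0.1 * rng.standard_normal((k, h)), np.zeros((k, 1))

x = rng.standard_normal((d, 1))     # one training sample
y = np.zeros((k, 1)); y[1] = 1.0    # one-hot label

# Forward pass
z1 = W1 @ x + b1
a1 = sigmoid(z1)
z2 = W2 @ a1 + b2
y_hat = softmax(z2)
loss = -np.sum(y * np.log(y_hat))   # cross-entropy loss

# Backward pass: apply the chain rule layer by layer
dz2 = y_hat - y                     # dL/dz2
dW2, db2 = dz2 @ a1.T, dz2          # output-layer gradients
dz1 = (W2.T @ dz2) * a1 * (1 - a1)  # propagate through W2 and the sigmoid
dW1, db1 = dz1 @ x.T, dz1           # hidden-layer gradients

# Gradient-descent updates
W2 -= lr * dW2; b2 -= lr * db2
W1 -= lr * dW1; b1 -= lr * db1
print(f"loss = {loss:.4f}")
```

The same backward pattern repeats for every additional layer: multiply by the transposed weight matrix, multiply by the derivative of that layer's activation, then take the outer product with that layer's input.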

