NN Architecture

[Figure: a 3-layer neural network with a 3071x1 input, 5 neurons in the 1st layer, 4 in the 2nd, and a single output neuron]

Input: the input feature vector. If we are dealing with image patches, it will be long; here it is a 3071x1 vector.

Layers: one layer is made up of many neurons, and a neuron is the combination of a linear classifier and an activation function:

$$f(x) = Act(\omega\cdot x+b)$$

\(x\) is a 3071x1 vector, the \(\omega\) of one neuron is a 1x3071 weight vector, and \(b\) is a real value.
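A minimal numpy sketch of one such neuron (the random values, the sigmoid activation matching the snippets later in this post, and the arbitrary bias of 0.5 are just for illustration):

import numpy as np

Act = lambda x: 1.0/(1.0 + np.exp(-x))  # activation function (sigmoid)
x = np.random.randn(3071, 1)            # one input vector (3071x1)
w = np.random.randn(1, 3071)            # the neuron's weight vector (1x3071)
b = 0.5                                 # the neuron's bias (a real value, chosen arbitrarily)
out = Act(np.dot(w, x) + b)             # f(x) = Act(w.x + b), a single 1x1 value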

So one neuron in the 1st layer outputs a single value (marked by one color in the figure), and this value is fed to every neuron of the 2nd layer.

The 1st layer, consisting of 5 neurons, will output a 5x1 vector to the 2nd layer.

Then every neuron in the 2nd layer receives the same 5x1 vector from the 1st layer as its input.

We can see that the original input, a 3071x1 vector, has been compressed into a 5x1 vector by the 1st layer.

Or, described more accurately: each neuron of the 1st layer extracts some key piece of information from the 3071x1 input vector, so the 5 neurons extract 5 different kinds of information and store them in a vector.

The 2nd layer further extracts 4 kinds of information out of the 5x1 information vector from the 1st layer and outputs a 4x1 vector to the 3rd layer. The 3rd layer then outputs a single value as the final result.

Mathematically

We can represent each layer as a weight matrix plus a bias vector:

input \(x\): 3071x1

1st layer \(W_1\) : 5x3071, \(b_1\) : 5x1

2nd layer \(W_2\) : 4x5, \(b_2\) : 4x1

3rd layer \(W_3\) : 1x4, \(b_3\) : 1x1
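For example, \(W_1\) is simply the 1x3071 weight vectors of the five 1st-layer neurons stacked as rows, and \(b_1\) collects their biases (the superscript notation \(\omega^{(i)}\), \(b^{(i)}\) for the i-th neuron is introduced here only for illustration):

\[W_1=\begin{bmatrix}\omega^{(1)}\\ \omega^{(2)}\\ \omega^{(3)}\\ \omega^{(4)}\\ \omega^{(5)}\end{bmatrix},\qquad b_1=\begin{bmatrix}b^{(1)}\\ b^{(2)}\\ b^{(3)}\\ b^{(4)}\\ b^{(5)}\end{bmatrix}\]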

The equation after the 1st layer will be:

\[F(x)=Act(W_1\cdot x+b_1)\]

after the 2nd layer:

\[F(x)=Act(W_2\cdot Act(W_1\cdot x+b_1)+b_2)\]

after the 3rd layer:

\[F(x)=Act(W_3\cdot Act(W_2\cdot Act(W_1\cdot x+b_1)+b_2)+b_3)\]

The above equation describes the whole NN.

Practically

import numpy as np
f = lambda x: 1.0/(1.0 + np.exp(-x))  # activation function (sigmoid)
x = np.random.randn(3071, 1)          # random input vector (3071x1)
W1, b1 = np.random.randn(5, 3071), np.random.randn(5, 1)  # 1st layer: 5x3071 weights, 5x1 biases (random init just for the demo)
W2, b2 = np.random.randn(4, 5), np.random.randn(4, 1)     # 2nd layer: 4x5 weights, 4x1 biases
W3, b3 = np.random.randn(1, 4), np.random.randn(1, 1)     # 3rd layer: 1x4 weights, 1x1 bias
h1 = f(np.dot(W1, x) + b1)            # first hidden layer activations (5x1)
h2 = f(np.dot(W2, h1) + b2)           # second hidden layer activations (4x1)
out = np.dot(W3, h2) + b3             # output neuron (1x1)

Or we can put it all together:

f = lambda x: 1.0/(1.0 + np.exp(-x))  # activation function (sigmoid)
x = np.random.randn(3071, 1)          # random input vector (3071x1)
# W1, b1, W2, b2, W3, b3 as defined in the previous snippet
out = np.dot(W3, f(np.dot(W2, f(np.dot(W1, x) + b1)) + b2)) + b3  # output (1x1)
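As a quick sanity check (assuming the weights from the first snippet and the x just defined are still in scope; out_steps is an illustrative name), the nested one-liner gives exactly the same result as the step-by-step version:

h1 = f(np.dot(W1, x) + b1)            # step-by-step forward pass on the same x
h2 = f(np.dot(W2, h1) + b2)
out_steps = np.dot(W3, h2) + b3
print(np.allclose(out, out_steps))    # True: both compute the same forward pass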
