How a Neural Network Learns.



Postby hbyte » Sat May 18, 2024 8:53 am

Neural networks learn through training.

In training, a neural network is given an input and produces an output. It learns from the mistakes it makes - the difference between its outputs and a desired output.

Here is what happens initially during the Feedforward stage of learning:

The input feeds forward through each layer, carrying its value through a set of Weights and Activations unique to each layer.

During this process each neuron calculates its value based on the summation of

Weights * Inputs

This summation is then fed through the unit's Activation function, usually

Act = Sigmoid(Weights * Inputs)

Sigmoid(Sum) = 1 / (1 + exp(-Sum)) - This produces a smooth step which switches the neuron on or off by providing a value between 0 and 1.

This process is called feedforward and is the first step in learning. Remember the Sigmoid as this is important for Training.
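As a rough sketch in Python (a single neuron, with made-up example weights and inputs), the feedforward step above might look like:

```python
import math

def sigmoid(s):
    # Smooth step: maps any summation to a value between 0 and 1
    return 1.0 / (1.0 + math.exp(-s))

def feedforward(inputs, weights):
    # One neuron: sum of Weights * Inputs, then squash with Sigmoid
    total = sum(w * x for w, x in zip(weights, inputs))
    return sigmoid(total)

# Hypothetical inputs and weights, just to show the calculation
act = feedforward([0.5, -0.2], [0.8, 0.4])
```

A full layer just repeats this for every neuron, each with its own weights.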

Then the Error made by the neural network is calculated:

Error = Target (Desired Output) - Output of the net

This error is then used to create a measure of how the Network performed called

Loss

An example of how we can do this is the following Loss function:

Loss = Sum of squared(Error) / Number of Outputs

This makes the Network Error easier to use in training. Squaring makes it positive only.

Summing and dividing by the number of Outputs is like taking the average.

This method is called Mean Square Error or MSE.
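A minimal sketch of the MSE calculation, using hypothetical target and output values:

```python
def mse_loss(targets, outputs):
    # Error = Target - Output for each output neuron
    errors = [t - o for t, o in zip(targets, outputs)]
    # Square (positive only), sum, and divide by the number of outputs
    return sum(e * e for e in errors) / len(errors)

# Hypothetical targets and network outputs
loss = mse_loss([1.0, 0.0], [0.8, 0.2])  # ((0.2)^2 + (-0.2)^2) / 2 = 0.04
```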

Now that we have the Loss calculated we need to feedback the Loss from the output Layer to all the hidden layers of the Neural Network.

This is called Backpropagation.

In mathematics there is a method called the chain rule:

If changes in z depend on y, and

changes in y depend on x, then it follows that changes in z must depend on x too.

dz/dx = dz/dy * dy/dx
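We can check the chain rule numerically with a made-up pair of functions, say y = 3x and z = y^2, so dz/dx = dz/dy * dy/dx = 2y * 3:

```python
# Hypothetical functions chosen only to illustrate the chain rule
def y_of_x(x):
    return 3.0 * x

def z_of_y(y):
    return y * y

x = 2.0
h = 1e-6

# Chain rule: dz/dy = 2y, dy/dx = 3, so dz/dx = 2y * 3 = 18x
analytic = 2.0 * y_of_x(x) * 3.0

# Central finite difference on the composed function z(y(x))
numeric = (z_of_y(y_of_x(x + h)) - z_of_y(y_of_x(x - h))) / (2 * h)
```

The two values agree, which is exactly what lets us pass loss backwards through composed layers.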

Using the chain rule we can Feedback the Loss from the Output layer to all other layers.

If we think of the change in Loss with respect to the weights that connect each layer:

l = loss at lower layer
ul = loss at upper layer
a = activity of lower layer neuron
w = weight between upper and lower layer

dl/dw = dl/dul * dul/dw

upper loss depends on wgt

lower loss depends on upper loss

then it follows that...

lower loss depends on wgt


this breaks down to

y = sum(ul * w)

Here y is the upper-layer loss carried back through the weights. We then differentiate the Sigmoid activation function for each neuron. The derivative of the Sigmoid at an activation a is a*(1-a), so the loss at the lower neuron is:

l = a*(1-a) * y (the Sigmoid derivative at the lower neuron's activation, times the fed-back loss)

The chain rule allows us to connect a change in loss at one layer with that of another ie the output loss with the input loss.

So now we have used the chain rule to feedback or backprop error from an upper layer to a lower layer. In this way we have a value for loss at each layer for each neuron.
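A sketch of feeding the loss back one layer for a single sigmoid neuron (the upper-layer losses, weights, and activation here are hypothetical values):

```python
def sigmoid_derivative(a):
    # Derivative of the Sigmoid expressed via its own output a: a * (1 - a)
    return a * (1.0 - a)

def lower_loss(upper_losses, weights, a):
    # y = sum(ul * w): upper-layer loss carried back through the weights
    y = sum(ul * w for ul, w in zip(upper_losses, weights))
    # Chain rule: multiply by the Sigmoid derivative at this neuron
    return sigmoid_derivative(a) * y

# Hypothetical: two upper-layer losses, their connecting weights,
# and this neuron's activation from the forward pass
l = lower_loss([0.1, -0.05], [0.4, 0.6], 0.7)
```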

Hurrah.

Now that we have loss for each layer and each neuron we can then change the weights that connect between each layer in such a way to reduce the loss and thereby train the neural network to provide an output that better resembles the desired one - the target.

We will change the Weights at each layer using differentiation. We will change them according to the change in loss at each layer giving us:

dw = lrt * loss * a (the weight change is the learning rate times the neuron's loss times the activation feeding the weight)

lrt is our learning rate - how fast we want it to learn.

This is called Gradient Descent because we are following the gradient of the loss. As the loss gradient lessens, so do the updates to each weight. We follow the path of least resistance until we reach our goal: the least amount of loss.
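One weight update in this scheme can be sketched like so (the learning rate and all values are hypothetical):

```python
def update_weight(w, loss, a, lrt=0.1):
    # dw = lrt * loss * a: step the weight along the loss gradient
    return w + lrt * loss * a

# Hypothetical weight 0.5, neuron loss 0.2, feeding activation 0.8
w = update_weight(0.5, 0.2, 0.8)  # 0.5 + 0.1 * 0.2 * 0.8 = 0.516
```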


Too fast and the network can overshoot good solutions or overfit the data. Overfitting is when the neural network fails to generalise to new, unseen data. We can compensate for overfitting by adding noise to our neural network. This helps the Gradient Descent avoid falling into suboptimal solutions that overfit the data. The noise causes the Descent to move a little randomly, jump up out of these suboptimal spots, and explore other solutions that are better.

Another way to look at it is that Backprop is the reverse process of Feedforward.

In Backprop we use derivatives of the neuron's activation function (Sigmoid) to produce error from the summation of upper error and weight. Reverse.

We Feedforward activations using the same activation function on the summation of weights and lower activations in the forward direction. Forward.
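Putting the forward and reverse passes together, here is a toy end-to-end loop for a single sigmoid neuron, trained by the steps above (the data and learning rate are made up):

```python
import math

def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))

# Hypothetical training case: one input pattern, one target
inputs, target, lrt = [1.0, 0.5], 1.0, 0.5
weights = [0.1, -0.1]

for _ in range(500):
    # Forward: weighted sum through the Sigmoid
    out = sigmoid(sum(w * x for w, x in zip(weights, inputs)))
    # Error, then the loss signal via the Sigmoid derivative out*(1-out)
    error = target - out
    delta = error * out * (1.0 - out)
    # Gradient Descent: w += lrt * loss * activation
    weights = [w + lrt * delta * x for w, x in zip(weights, inputs)]

final_out = sigmoid(sum(w * x for w, x in zip(weights, inputs)))
```

After training, the output sits much closer to the target than the untrained net's did.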

And that's how the neural network learns.
hbyte
Site Admin
 
Posts: 141
Joined: Thu Aug 13, 2020 6:11 pm

Re: How a Neural Network Learns.

Postby hbyte » Thu May 23, 2024 6:11 pm

Note to self. If we differentiate dl/dw to feedback error and update loss at each layer with this value this will be very different to setting loss at each layer.

loss = dl/dw = a*(1-a)*y

or

loss += dl/dw = a*(1-a)*y

We update weight using dw/dl gradients so why not loss with dl/dw gradients?

Answer: This is because we are using the gradient of the loss to change the weights, so we feed back dl/dw to each layer to use when we calculate dw/dl.

I haven't tested this. I normally just feed back error using the first one. There is probably a good reason for this.