Backprop :-
i = upper (next-layer) neuron
j = lower (current-layer) neuron
Calculate the weight change from the gradient of loss wrt weight:
dl/dw[i,j] = Err[i] * Act[j], so delta w[i,j] = Lrate * Err[i] * Act[j]
(Option here to include Momentum and Decay)
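A minimal sketch of that update rule in NumPy, with Momentum and Decay bolted on. The names (update_weights, velocity, lrate, momentum, decay) and the default values are my own choices for illustration, not from these notes:

    import numpy as np

    def update_weights(W, err_upper, act_lower, velocity,
                       lrate=0.1, momentum=0.9, decay=1e-4):
        # Outer product gives dl/dw[i,j] = Err[i] * Act[j] for every weight at once
        grad = np.outer(err_upper, act_lower)
        # Momentum: carry a fraction of the previous update forward
        velocity = momentum * velocity + lrate * grad
        # Weight decay: shrink each weight slightly toward zero
        return W + velocity - decay * W, velocity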
Propagate the error back to the lower layer (gradient of loss wrt activation):
Err[j] = dAct[j]/dNet[j] * sum[Err[i]*Wgt[i,j]]  <-- Chain rule
e.g. for sigmoid the derivative is Act*(1-Act), so:
Err[j] = Act[j]*(1-Act[j]) * sum[Err[i]*Wgt[i,j]]
At the output layer: Err = Target - Output
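The same two formulas as a sketch (W has one row per upper neuron i, one column per lower neuron j; function names are assumed):

    import numpy as np

    def output_error(target, output):
        # Err at the output layer = Target - Output
        return target - output

    def backprop_error(err_upper, W, act_lower):
        # W.T @ err_upper computes sum[Err[i]*Wgt[i,j]] for every j at once;
        # Act*(1-Act) is the sigmoid derivative from the formula above
        return act_lower * (1.0 - act_lower) * (W.T @ err_upper)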
Calculate Loss :-
Types of Loss Function:
MSE = 1/n * Sum[(Target - Output)^2]
MAE = 1/n * Sum[|Target - Output|]
Cross Entropy: for targets Y in [0,1] (probabilities, or binary 0/1 labels)
CE Loss = -1/n * Sum[Y*log(P) + (1-Y)*log(1-P)]
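All three losses as short NumPy functions (a sketch; P is the predicted probability, and the eps clipping is my own addition to dodge log(0)):

    import numpy as np

    def mse(target, output):
        return np.mean((target - output) ** 2)

    def mae(target, output):
        return np.mean(np.abs(target - output))

    def cross_entropy(y, p, eps=1e-12):
        p = np.clip(p, eps, 1.0 - eps)   # avoid log(0)
        return -np.mean(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))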
Follow these 3 steps :-
1. Calculate Loss (using the loss function)
2. Feed the error back through the network (chain rule, dl/dAct)
3. Change the weights using the gradient (dl/dw)
(the sketch after these notes ties all 3 steps together)
Done!
(It is not as hard as they are making it out to be.)
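Putting the 3 steps together: a tiny two-layer sigmoid net learning XOR. A minimal sketch only; the layer sizes, learning rate, epoch count, and every variable name are my own choices, not from these notes:

    import numpy as np

    rng = np.random.default_rng(0)
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
    T = np.array([[0], [1], [1], [0]], dtype=float)   # XOR targets

    W1 = rng.normal(0, 1, (4, 2))   # hidden layer: 4 neurons, 2 inputs
    W2 = rng.normal(0, 1, (1, 4))   # output layer: 1 neuron, 4 hidden
    lrate = 0.5

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    for epoch in range(5000):
        for x, t in zip(X, T):
            h = sigmoid(W1 @ x)                # lower-layer activations Act[j]
            y = sigmoid(W2 @ h)                # output activations
            loss = np.mean((t - y) ** 2)       # step 1: calculate loss (MSE)
            err_out = t - y                    # output Err = Target - Output
            err_hid = h * (1 - h) * (W2.T @ err_out)   # step 2: feed error back
            W2 += lrate * np.outer(err_out, h)         # step 3: change weights
            W1 += lrate * np.outer(err_hid, x)         #   dw = Lrate*Err[i]*Act[j]

    print(sigmoid(W2 @ sigmoid(W1 @ X.T)))    # should approach 0, 1, 1, 0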