How do we train a feed-forward neural network with Stochastic Gradient Descent?
Pseudo-code:
- The choice of weight initialization in lines 2 and 3 is explained here
- The actual computation of the gradient values is not spelled out in this code, so that the overall structure of the computation stays clear.
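
As a rough illustration of that structure, here is a minimal Python sketch of an SGD training loop. It is not the pseudo-code referred to above, and its line numbers do not correspond to it. Everything specific in it is an assumption made for the example: one hidden layer, sigmoid activations, a squared-error loss, uniform random initialization scaled by fan-in, and the hypothetical function names `init_network`, `forward`, `gradients`, `loss` and `sgd_train`. The gradient computation is kept in its own function so that the training loop itself stays short.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def init_network(n_in, n_hidden, n_out, seed=0):
    """Random initial weights; the last column of each matrix is the bias.

    Assumption: uniform values in [-1/sqrt(fan_in), 1/sqrt(fan_in)].
    """
    rng = random.Random(seed)

    def layer(fan_in, fan_out):
        bound = 1.0 / math.sqrt(fan_in)
        return [[rng.uniform(-bound, bound) for _ in range(fan_in + 1)]
                for _ in range(fan_out)]

    return layer(n_in, n_hidden), layer(n_hidden, n_out)


def forward(W1, W2, x):
    """Return the bias-extended input, bias-extended hidden layer and output."""
    xb = list(x) + [1.0]
    h = [sigmoid(sum(w * v for w, v in zip(row, xb))) for row in W1]
    hb = h + [1.0]
    y = [sigmoid(sum(w * v for w, v in zip(row, hb))) for row in W2]
    return xb, hb, y


def gradients(W1, W2, x, t):
    """Gradients of E = 0.5 * sum((y - t)^2) for one example (backpropagation)."""
    xb, hb, y = forward(W1, W2, x)
    # Error signal at the output units.
    d2 = [(yk - tk) * yk * (1.0 - yk) for yk, tk in zip(y, t)]
    # Error signal at the hidden units (the bias unit has no incoming weights).
    d1 = [hb[j] * (1.0 - hb[j]) * sum(d2[k] * W2[k][j] for k in range(len(W2)))
          for j in range(len(W1))]
    g1 = [[dj * xi for xi in xb] for dj in d1]
    g2 = [[dk * hj for hj in hb] for dk in d2]
    return g1, g2


def loss(W1, W2, data):
    """Total squared error over a list of (input, target) pairs."""
    return sum(0.5 * sum((yk - tk) ** 2
                         for yk, tk in zip(forward(W1, W2, x)[2], t))
               for x, t in data)


def sgd_train(W1, W2, data, lr=0.5, epochs=500, seed=0):
    """Update W1 and W2 in place, one training example at a time."""
    rng = random.Random(seed)
    for _ in range(epochs):
        order = list(data)
        rng.shuffle(order)  # visit the examples in a new random order each epoch
        for x, t in order:
            g1, g2 = gradients(W1, W2, x, t)
            for W, g in ((W1, g1), (W2, g2)):
                for row, grow in zip(W, g):
                    for i, gi in enumerate(grow):
                        row[i] -= lr * gi  # step against the gradient
    return W1, W2


if __name__ == "__main__":
    # Toy data: logical OR, chosen only to exercise the loop.
    or_data = [([0.0, 0.0], [0.0]), ([0.0, 1.0], [1.0]),
               ([1.0, 0.0], [1.0]), ([1.0, 1.0], [1.0])]
    W1, W2 = init_network(2, 3, 1, seed=1)
    print("loss before training:", loss(W1, W2, or_data))
    sgd_train(W1, W2, or_data)
    print("loss after training: ", loss(W1, W2, or_data))
```

The learning rate, epoch count and hidden-layer size are arbitrary values picked for the toy data, not recommendations.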