How do we train with Stochastic Gradient Descent on a feed-forward neural network?

Pseudo-code:

  • The choice of weight initialization in lines 2 and 3 is explained here.
  • The actual computation of the gradient values is not spelled out in this code, because we want to keep the structure of the computation clear.
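
As a rough illustration of the training loop and of the gradient step that the pseudo-code leaves abstract, here is a minimal sketch in plain Python. It is not the pseudo-code referred to above, and its line numbers do not correspond to it. Everything specific is an assumption made for the example: a single hidden layer, sigmoid activations, squared-error loss, small uniform random initial weights with zero biases, and the helper names `init_network`, `forward`, `gradients`, `mean_loss` and `sgd_train`.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def init_network(n_in, n_hidden, rng):
    # Assumption: small uniform random weights and zero biases.
    W1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
    b1 = [0.0] * n_hidden
    W2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
    b2 = 0.0
    return W1, b1, W2, b2


def forward(params, x):
    # Forward pass: input -> sigmoid hidden layer -> single sigmoid output.
    W1, b1, W2, b2 = params
    h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(W1, b1)]
    y = sigmoid(sum(w * hj for w, hj in zip(W2, h)) + b2)
    return h, y


def gradients(params, x, t):
    # Backpropagation for the loss 0.5 * (y - t)^2 on one training example.
    W1, b1, W2, b2 = params
    h, y = forward(params, x)
    delta_out = (y - t) * y * (1.0 - y)  # error signal at the output unit
    delta_hid = [delta_out * w * hj * (1.0 - hj) for w, hj in zip(W2, h)]
    gW1 = [[d * xi for xi in x] for d in delta_hid]
    gW2 = [delta_out * hj for hj in h]
    return gW1, delta_hid, gW2, delta_out  # bias gradients equal the deltas


def mean_loss(params, data):
    return sum(0.5 * (forward(params, x)[1] - t) ** 2 for x, t in data) / len(data)


def sgd_train(data, n_hidden=3, lr=0.5, epochs=300, seed=0):
    rng = random.Random(seed)
    W1, b1, W2, b2 = init_network(len(data[0][0]), n_hidden, rng)
    samples = list(data)  # copy so the caller's list is not reordered
    history = [mean_loss((W1, b1, W2, b2), samples)]  # loss before training
    for _ in range(epochs):
        rng.shuffle(samples)  # visit the examples in a new random order
        for x, t in samples:
            # One SGD step: gradient of the loss on a single example.
            gW1, gb1, gW2, gb2 = gradients((W1, b1, W2, b2), x, t)
            W1 = [[w - lr * g for w, g in zip(wr, gr)] for wr, gr in zip(W1, gW1)]
            b1 = [b - lr * g for b, g in zip(b1, gb1)]
            W2 = [w - lr * g for w, g in zip(W2, gW2)]
            b2 = b2 - lr * gb2
        history.append(mean_loss((W1, b1, W2, b2), samples))
    return (W1, b1, W2, b2), history


# Usage: learn the logical OR of two inputs.
or_data = [([0.0, 0.0], 0.0), ([0.0, 1.0], 1.0), ([1.0, 0.0], 1.0), ([1.0, 1.0], 1.0)]
trained_params, loss_history = sgd_train(or_data)
```

The structure mirrors the usual description of SGD: initialize the weights once, then repeatedly shuffle the training set and update the weights after every single example. Keeping `gradients` as a separate function reflects the point made above: the loop does not need to know how the gradient is obtained, only that it gets one value per weight.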