This is a common problem and a side effect of backpropagation; it can occur due to poor parameter initialization.

The gradient is the slope of the loss function with respect to the weights along the error curve; backpropagation multiplies these slopes layer by layer when updating the weights.

  • Vanishing gradients occur when the gradient is too small: as it is propagated back through the layers it keeps shrinking, so the weight updates become insignificant and the network effectively stops learning (see the sketch after this list).
  • Exploding gradients occur when the gradient is too large: the weight updates blow up, producing an unstable model (e.g., NaN losses).
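
The shrinking effect can be observed directly by measuring per-layer gradient norms in a deep network. Below is a minimal sketch, assuming PyTorch; the 20-layer sigmoid MLP, the layer width of 64, and the dummy squared-output loss are all illustrative choices, not part of the original text:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# 20 stacked Linear+Sigmoid layers; the sigmoid derivative is at most 0.25,
# so repeated multiplication during backprop shrinks the gradient.
layers = []
for _ in range(20):
    layers += [nn.Linear(64, 64), nn.Sigmoid()]
model = nn.Sequential(*layers)

x = torch.randn(8, 64)
loss = model(x).pow(2).mean()   # dummy loss, just to trigger backprop
loss.backward()

# Print the gradient norm of each Linear layer's weights, first to last;
# the early layers typically show norms orders of magnitude smaller.
for i, layer in enumerate(model):
    if isinstance(layer, nn.Linear):
        print(f"layer {i:2d}: grad norm = {layer.weight.grad.norm():.2e}")
```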

Ways to stop vanishing gradients:

  • Residual/skip connections (a sketch follows below)
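
A residual (skip) connection adds a layer's input back onto its output, so gradients have a shortcut path around the nonlinear layers during backprop. A minimal sketch, assuming PyTorch; the block structure, dimensions, and depth here are illustrative:

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(dim, dim),
            nn.ReLU(),
            nn.Linear(dim, dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Skip connection: output = transformation(x) + x
        return self.body(x) + x

# Stacking residual blocks keeps gradients flowing even in deep stacks.
model = nn.Sequential(*[ResidualBlock(64) for _ in range(20)])
out = model(torch.randn(8, 64))
```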

Ways to stop exploding gradients:

  • Gradient clipping (a sketch follows below)
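
Gradient clipping rescales the gradients before each optimizer step so that a single large gradient cannot blow up the weights. A minimal sketch, assuming PyTorch; the linear model, random data, learning rate, and max_norm=1.0 are illustrative choices:

```python
import torch
import torch.nn as nn

model = nn.Linear(64, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

x, y = torch.randn(8, 64), torch.randn(8, 1)

optimizer.zero_grad()
loss = nn.functional.mse_loss(model(x), y)
loss.backward()

# Rescale all gradients so their combined norm is at most 1.0,
# then apply the (clipped) update.
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
optimizer.step()
```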