Vanishing and exploding gradients are common problems that arise as a side effect of backpropagation.
The gradient is the slope of the loss function: it measures how much the error changes as each weight changes.
- Vanishing gradients occur when the gradient is too small: as it is propagated backward through the layers it shrinks further, so the weight updates become insignificant and the network effectively stops learning (see the sketch after this list).
- Exploding gradients occur when the gradient grows too large, producing huge weight updates that destabilize the model (e.g., NaN losses).
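
As a minimal sketch of the vanishing case (assuming PyTorch; the depth, width, and dummy loss are arbitrary choices for illustration), a deep stack of sigmoid layers makes the per-layer gradient norms shrink toward the input end:

```python
import torch
import torch.nn as nn

# A deep stack of small linear layers with sigmoid activations --
# a classic setup in which gradients vanish as depth grows,
# because the sigmoid's derivative is at most 0.25.
depth, width = 20, 32
layers = []
for _ in range(depth):
    layers += [nn.Linear(width, width), nn.Sigmoid()]
model = nn.Sequential(*layers)

x = torch.randn(8, width)        # dummy input batch
loss = model(x).pow(2).mean()    # arbitrary scalar loss for illustration
loss.backward()

# Print each Linear layer's weight-gradient norm; earlier layers
# (smaller indices) show much smaller norms than later ones.
for i, layer in enumerate(model):
    if isinstance(layer, nn.Linear):
        print(f"layer {i:2d}  grad norm = {layer.weight.grad.norm():.3e}")
```

Replacing the sigmoids with an activation that does not squash gradients, or initializing the weights with a large gain, flips the problem: the same loop then shows norms growing layer by layer, the exploding case.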