The loss function for linear regression always has a single, well-defined global minimum. It is convex: every chord (line segment between two points on the surface) lies above the function and does not intersect it. Convexity implies that wherever we initialize the parameters, we are bound to reach the minimum if we keep walking downhill, so the training procedure cannot fail.
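As a minimal sketch of this idea (the toy data, learning rate, and starting points below are illustrative assumptions, not taken from the text), the following runs plain gradient descent on the convex least-squares loss of a one-dimensional linear regression. Every initialization converges to the same minimum.

```python
import numpy as np

# Hypothetical toy data: y is roughly 2x + 1 plus noise
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 50)
y = 2.0 * x + 1.0 + 0.1 * rng.standard_normal(50)

def loss(phi0, phi1):
    """Mean squared error of the line y = phi0 + phi1 * x."""
    return np.mean((phi0 + phi1 * x - y) ** 2)

def grad(phi0, phi1):
    """Gradient of the loss with respect to (phi0, phi1)."""
    residual = phi0 + phi1 * x - y
    return 2 * np.mean(residual), 2 * np.mean(residual * x)

for init in [(-5.0, 5.0), (4.0, -3.0), (0.0, 0.0)]:
    phi0, phi1 = init
    for _ in range(2000):                      # plain gradient descent
        g0, g1 = grad(phi0, phi1)
        phi0, phi1 = phi0 - 0.1 * g0, phi1 - 0.1 * g1
    # Every starting point reaches (approximately) the same global minimum
    print(f"start={init} -> phi=({phi0:.3f}, {phi1:.3f}), loss={loss(phi0, phi1):.4f}")
```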
In practice, loss functions for most nonlinear models, including both shallow and deep neural networks, are non-convex. Visualizing neural network loss functions directly is difficult because of the sheer number of parameters, so we first explore a simpler nonlinear model with just two parameters to gain insight into the properties of non-convex loss functions.
This Gabor model maps a scalar input to a scalar output and consists of a sinusoidal component multiplied by a negative exponential (Gaussian) envelope, so the oscillations decay in amplitude away from the center.
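As a rough illustration (the particular constants, training data, and parameter grid below are assumptions for the sketch, not the model specified in the text), the following defines a Gabor-style function of two parameters and scans its least-squares loss over a grid. The scan typically turns up more than one strict local minimum, which could not happen if the loss surface were convex.

```python
import numpy as np

def gabor(x, phi0, phi1):
    """Sinusoid multiplied by a Gaussian envelope (illustrative constants)."""
    z = phi0 + 0.06 * phi1 * x
    return np.sin(z) * np.exp(-z ** 2 / 32.0)

# Hypothetical training data generated from one particular parameter setting
rng = np.random.default_rng(1)
x_train = rng.uniform(-15, 15, 30)
y_train = gabor(x_train, 0.0, 16.6) + 0.1 * rng.standard_normal(30)

def loss(phi0, phi1):
    """Least-squares loss of the Gabor model on the toy data."""
    return np.mean((gabor(x_train, phi0, phi1) - y_train) ** 2)

# Evaluate the loss over a grid of (phi0, phi1) and flag grid points that are
# strictly lower than all eight neighbours -- i.e., grid-level local minima.
p0 = np.linspace(-10, 10, 81)
p1 = np.linspace(-10, 25, 81)
L = np.array([[loss(a, b) for b in p1] for a in p0])

minima = []
for i in range(1, 80):
    for j in range(1, 80):
        neighbourhood = L[i - 1:i + 2, j - 1:j + 2]
        if (neighbourhood > L[i, j]).sum() == 8:   # all 8 neighbours are larger
            minima.append((p0[i], p1[j], L[i, j]))

print(f"found {len(minima)} strict local minima on the grid")
```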