This is an example of Gradient Descent in one dimension. It is expanded to multiple dimensions in Multiple Dimension Gradient Descent.

Let’s say we have some arbitrary function $f(\Theta)$ that we want to minimize. We specify an initial value for the parameter, $\Theta_{\text{init}}$, a step-size parameter $\eta$, and an accuracy parameter $\epsilon$. Then, the 1D gradient descent algorithm starts at $\Theta^{(0)} = \Theta_{\text{init}}$ and repeatedly applies the update $\Theta^{(t)} = \Theta^{(t-1)} - \eta\, f'(\Theta^{(t-1)})$.

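A minimal Python sketch of this procedure, using the termination test described next (the quadratic in the usage example is just an assumed illustration):

```python
def gradient_descent_1d(f, df, theta_init, eta, epsilon, max_iter=10_000):
    """Basic 1D gradient descent: step against the derivative until the
    change in f between iterations is smaller than epsilon."""
    theta = theta_init
    for _ in range(max_iter):
        theta_new = theta - eta * df(theta)          # Theta^(t) = Theta^(t-1) - eta * f'(Theta^(t-1))
        if abs(f(theta_new) - f(theta)) < epsilon:   # stop when the change in f is < epsilon
            return theta_new
        theta = theta_new
    return theta

# Usage example (assumed): minimize f(theta) = (theta - 3)^2, whose minimum is at theta = 3.
f  = lambda t: (t - 3) ** 2
df = lambda t: 2 * (t - 3)
print(gradient_descent_1d(f, df, theta_init=0.0, eta=0.1, epsilon=1e-10))  # ~3.0
```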
This algorithm terminates when the change in the function value is sufficiently small (less than $\epsilon$). This is similar to the convergence criterion used in Numerical Methods. There are also other options for when to terminate the algorithm (sketched in code after the list), such as:

  • Stop after a fixed number of iterations,
  • Stop when the change in the value of the parameter is sufficiently small, i.e. when $|\Theta^{(t)} - \Theta^{(t-1)}| < \epsilon$,
  • Stop when the derivative at the latest value of $\Theta$ is sufficiently small, i.e. when $|f'(\Theta^{(t)})| < \epsilon$.

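A rough sketch of these alternative checks, reusing the names from the sketch above (the iteration cap `T` is an assumed extra parameter):

```python
def should_stop(t, T, theta, theta_prev, df, epsilon):
    """Possible termination tests for the loop above (any one can be used)."""
    return (
        t >= T                                # fixed number of iterations
        or abs(theta - theta_prev) < epsilon  # change in the parameter is small
        or abs(df(theta)) < epsilon           # derivative at the latest theta is small
    )
```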
Convergence

Step Size

We have to choose the step size $\eta$ carefully; too small a value means convergence is slow, while too big a value may cause oscillation around the minimum.
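A small experiment, assuming $f(\Theta) = \Theta^2$ (so each update multiplies $\Theta$ by $1 - 2\eta$), that shows both failure modes:

```python
def run(eta, steps=25, theta=10.0):
    """Take a few gradient descent steps on f(theta) = theta^2 (minimum at 0)."""
    df = lambda t: 2 * t
    for _ in range(steps):
        theta = theta - eta * df(theta)
    return theta

print(run(eta=0.001))  # still ~9.5: step too small, convergence is slow
print(run(eta=0.45))   # ~0: a workable step size, converges quickly
print(run(eta=0.95))   # ~-0.7: too big, iterates overshoot and oscillate around the minimum
print(run(eta=1.10))   # ~-950: far too big, the oscillation grows and the iterates diverge
```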

Local vs. Global Minimums

Another thing we have to think about is absolute/global vs. local minima; the algorithm may find a local minimum instead of the absolute one.

Theorem

If $f$ is convex, then for any desired accuracy $\epsilon$, there is some step size $\eta$ such that gradient descent will converge to within $\epsilon$ of the optimal $\Theta$.
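As a concrete (assumed) instance, take the convex function $f(\Theta) = \Theta^2$, whose optimum is $\Theta^* = 0$. Each step of the update gives

$$\Theta^{(t)} = \Theta^{(t-1)} - \eta \cdot 2\,\Theta^{(t-1)} = (1 - 2\eta)\,\Theta^{(t-1)} = (1 - 2\eta)^{t}\,\Theta_{\text{init}},$$

so for any step size $0 < \eta < 1$ the iterates shrink geometrically toward $0$, and running long enough brings $\Theta^{(t)}$ within any desired $\epsilon$ of the optimum.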

For non-convex functions, the point of convergence depends on $\Theta_{\text{init}}$. When we reach a $\Theta$ where $f'(\Theta) = 0$, it is a local minimum. This is shown in the function below.
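A minimal sketch of this behaviour, assuming the non-convex example $f(\Theta) = \Theta^4 - 3\Theta^2 + \Theta$ (this particular function is my own illustration; it has a global minimum near $\Theta \approx -1.30$ and a local minimum near $\Theta \approx 1.13$):

```python
# Assumed non-convex example: f(theta) = theta**4 - 3*theta**2 + theta,
# with a global minimum near theta = -1.30 and a local minimum near theta = 1.13.
df = lambda t: 4 * t**3 - 6 * t + 1

def descend(theta, eta=0.01, steps=2000):
    for _ in range(steps):
        theta = theta - eta * df(theta)
    return theta

print(descend(theta=-2.0))  # ends near -1.30, the global minimum
print(descend(theta=2.0))   # ends near  1.13, only a local minimum
```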

This method reminds me a lot of the Newton-Raphson Method.