Instead of trying to intuit learning algorithms like the Perceptron, we can introduce a general framework for solving machine learning problems that lets us derive algorithms for arbitrarily complicated problems. This is done by framing machine learning as an optimization problem: using computational methods to find the minimum (or maximum) of a given function.
Fundamentally, we define an objective function $J(\theta)$, where $\theta$ are the parameters of our model. For a Linear Classifier, we would have $\theta = (\mathbf{w}, b)$. We also write $J(\theta; \mathcal{D})$ to indicate dependence on the data $\mathcal{D}$. Generally, the optimization problem is that we want to find $\theta^*$ such that:

$$\theta^* = \operatorname*{arg\,min}_{\theta} J(\theta)$$
which is to say that we want to find the parameter setting $\theta$ that minimizes $J(\theta)$.
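As a minimal sketch of this idea, the snippet below minimizes a simple one-dimensional objective with gradient descent; the quadratic objective and the step size are illustrative choices, not part of the original text:

```python
import numpy as np

# Hypothetical objective: J(theta) = (theta - 3)^2, minimized at theta* = 3.
def J(theta):
    return (theta - 3.0) ** 2

def grad_J(theta):
    # Derivative of J with respect to theta.
    return 2.0 * (theta - 3.0)

# Simple gradient descent to approximate theta* = argmin_theta J(theta).
theta = 0.0
lr = 0.1
for _ in range(100):
    theta -= lr * grad_J(theta)

print(theta)  # converges toward 3.0
```

The same loop structure carries over to real models: only the objective and its gradient change.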
A common objective function for machine learning is:

$$J(\theta) = \frac{1}{n} \sum_{i=1}^{n} \ell\big(f_\theta(x_i),\, y_i\big) + \lambda R(\theta)$$
where:
- $\ell$ is a Loss Function, which makes the first term equivalent to the training error.
- $R(\theta)$ is a regularizer, and $\lambda$ is a constant hyperparameter governing the trade-off between fitting the training data and keeping the model simple.
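To make this concrete, here is a sketch of evaluating such an objective for a linear classifier; the squared hinge loss and L2 regularizer are illustrative choices (any differentiable loss $\ell$ and regularizer $R$ would fit the same template):

```python
import numpy as np

def objective(w, b, X, y, lam):
    """J(theta) = average loss over the data + lam * regularizer.

    Illustrative choices: squared hinge loss, L2 regularizer.
    Labels y are assumed to be in {-1, +1}.
    """
    margins = y * (X @ w + b)                              # y_i * f_theta(x_i)
    loss = np.mean(np.maximum(0.0, 1.0 - margins) ** 2)    # squared hinge loss
    reg = np.dot(w, w)                                     # L2 regularizer R(w)
    return loss + lam * reg

# Tiny example: two linearly separated points in 1-D.
X = np.array([[2.0], [-2.0]])
y = np.array([1.0, -1.0])
w = np.array([1.0])
print(objective(w, 0.0, X, y, lam=0.1))  # loss term is 0, so J = 0.1 * ||w||^2 = 0.1
```

With the objective in this form, deriving a learning algorithm reduces to choosing a method for minimizing `objective` over `w` and `b`.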