Building on Gradient Descent and the idea of Machine Learning as Optimization, we can apply gradient descent to Logistic Regression.

We are considering a linear separator defined by $\theta, \theta_0$, where our hypothesis, or guess, for example $i$ is $g^{(i)} = \sigma(\theta^T x^{(i)} + \theta_0)$, with $\sigma(z) = \frac{1}{1 + e^{-z}}$ the sigmoid function. So, our objective function is:

$$J_{\text{lr}}(\theta, \theta_0) = \frac{1}{n}\sum_{i=1}^{n} \mathcal{L}_{\text{nll}}\left(g^{(i)}, y^{(i)}\right) + \frac{\lambda}{2}\lVert\theta\rVert^2,$$

where $\mathcal{L}_{\text{nll}}(g, y) = -\left(y \log g + (1 - y)\log(1 - g)\right)$ is the negative log-likelihood loss.

We use $\frac{1}{2}$ as a constant for convenience, to make the differentiation nicer. Using $\frac{\lambda}{2}\lVert\theta\rVert^2$ as a regularizer (L2 Regularization) forces the magnitude of the separator to stay small, so that it doesn't overfit to the data.
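To make the objective concrete, here is a minimal NumPy sketch of $J_{\text{lr}}$ (the names `sigmoid` and `objective`, and the $\{0, 1\}$ label convention, are assumptions for illustration):

```python
import numpy as np

def sigmoid(z):
    # Logistic function: maps any real number into (0, 1).
    return 1 / (1 + np.exp(-z))

def objective(theta, theta_0, X, y, lam):
    # J_lr: average NLL loss plus the L2 penalty (lam/2)||theta||^2.
    # X: (n, d) data, y: (n,) labels in {0, 1}, theta: (d,), theta_0: scalar.
    g = sigmoid(X @ theta + theta_0)                          # guesses g^(i)
    nll = -np.mean(y * np.log(g) + (1 - y) * np.log(1 - g))  # average NLL loss
    return nll + (lam / 2) * np.sum(theta ** 2)              # theta_0 is not regularized
```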

Finding the gradient of $J_{\text{lr}}$, we have:

$$\nabla_\theta J_{\text{lr}} = \frac{1}{n}\sum_{i=1}^{n}\left(g^{(i)} - y^{(i)}\right)x^{(i)} + \lambda\theta$$

$$\frac{\partial J_{\text{lr}}}{\partial \theta_0} = \frac{1}{n}\sum_{i=1}^{n}\left(g^{(i)} - y^{(i)}\right)$$

Note that $\nabla_\theta J_{\text{lr}}$ will be of shape $d \times 1$ and $\frac{\partial J_{\text{lr}}}{\partial \theta_0}$ will be a scalar (we have separated $\theta$ and $\theta_0$ in this case).
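Continuing the sketch above, the two gradients might be computed as follows (here $\theta$ is stored as a flat length-$d$ array rather than a $d \times 1$ column, which is an implementation choice):

```python
def gradients(theta, theta_0, X, y, lam):
    # Returns (grad wrt theta, grad wrt theta_0): a length-d vector and a scalar.
    g = sigmoid(X @ theta + theta_0)  # guesses g^(i), shape (n,)
    err = g - y                       # per-example errors g^(i) - y^(i)
    grad_theta = X.T @ err / len(y) + lam * theta  # (1/n) sum of err * x^(i), plus lam * theta
    grad_theta_0 = np.mean(err)       # (1/n) sum of err; the offset is not regularized
    return grad_theta, grad_theta_0
```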

Putting everything together, our gradient descent algorithm for logistic regression initializes $\theta$ and $\theta_0$ and then repeatedly steps each against its gradient with step size $\eta$:

$$\theta \leftarrow \theta - \eta\left(\frac{1}{n}\sum_{i=1}^{n}\left(g^{(i)} - y^{(i)}\right)x^{(i)} + \lambda\theta\right), \qquad \theta_0 \leftarrow \theta_0 - \eta \cdot \frac{1}{n}\sum_{i=1}^{n}\left(g^{(i)} - y^{(i)}\right)$$
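A minimal sketch of the full loop, reusing the helpers above (the zero initialization, fixed step size `eta`, and fixed iteration budget are assumptions; a real implementation might instead stop once the objective changes by less than some tolerance):

```python
def logistic_regression(X, y, lam=0.01, eta=0.1, num_steps=1000):
    # Batch gradient descent: step theta and theta_0 against their gradients.
    n, d = X.shape
    theta, theta_0 = np.zeros(d), 0.0
    for _ in range(num_steps):
        grad_theta, grad_theta_0 = gradients(theta, theta_0, X, y, lam)
        theta = theta - eta * grad_theta
        theta_0 = theta_0 - eta * grad_theta_0
    return theta, theta_0

# Hypothetical usage on linearly separable toy data:
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(float)
theta, theta_0 = logistic_regression(X, y)
```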