Building on Gradient Descent and the idea of Machine Learning as Optimization, we can apply gradient descent to Logistic Regression.

We are considering a linear separator defined by $\theta, \theta_0$, where our hypothesis, or guess, for example $i$ is $g^{(i)} = \sigma(\theta^T x^{(i)} + \theta_0)$, with $\sigma(z) = \frac{1}{1 + e^{-z}}$ the sigmoid function. So, our objective function is:

$$J_{\text{lr}}(\theta, \theta_0) = \frac{1}{n}\sum_{i=1}^{n} \mathcal{L}_{\text{nll}}\left(g^{(i)}, y^{(i)}\right) + \frac{\lambda}{2}\lVert\theta\rVert^2,$$

where $\mathcal{L}_{\text{nll}}(g, y) = -\left(y \log g + (1 - y)\log(1 - g)\right)$ is the negative log-likelihood loss.

We use $\frac{1}{2}$ as a constant for convenience, to make the differentiation nicer. Using $\frac{\lambda}{2}\lVert\theta\rVert^2$ as a regularizer (L2 Regularization) forces the magnitude of the separator to stay small, so that it doesn't overfit to the data.
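To make the objective concrete, here is a minimal NumPy sketch of $J_{\text{lr}}$ (the names `sigmoid` and `objective`, and the $\{0, 1\}$ label convention, are assumptions for illustration):

```python
import numpy as np

def sigmoid(z):
    # Logistic function: maps any real number into (0, 1).
    return 1 / (1 + np.exp(-z))

def objective(theta, theta_0, X, y, lam):
    # J_lr: average NLL loss plus the L2 penalty (lam/2)||theta||^2.
    # X: (n, d) data, y: (n,) labels in {0, 1}, theta: (d,), theta_0: scalar.
    g = sigmoid(X @ theta + theta_0)                          # guesses g^(i)
    nll = -np.mean(y * np.log(g) + (1 - y) * np.log(1 - g))  # average NLL loss
    return nll + (lam / 2) * np.sum(theta ** 2)              # theta_0 is not regularized
```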

Finding the gradient of $J_{\text{lr}}$, we have:

$$\nabla_\theta J_{\text{lr}} = \frac{1}{n}\sum_{i=1}^{n}\left(g^{(i)} - y^{(i)}\right)x^{(i)} + \lambda\theta$$

$$\frac{\partial J_{\text{lr}}}{\partial \theta_0} = \frac{1}{n}\sum_{i=1}^{n}\left(g^{(i)} - y^{(i)}\right)$$

Note that $\nabla_\theta J_{\text{lr}}$ will be of shape $d \times 1$ and $\frac{\partial J_{\text{lr}}}{\partial \theta_0}$ will be a scalar (we have separated $\theta$ and $\theta_0$ in this case).
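Continuing the sketch above, the two gradients might be computed as follows (here $\theta$ is stored as a flat length-$d$ array rather than a $d \times 1$ column, which is an implementation choice):

```python
def gradients(theta, theta_0, X, y, lam):
    # Returns (grad wrt theta, grad wrt theta_0): a length-d vector and a scalar.
    g = sigmoid(X @ theta + theta_0)  # guesses g^(i), shape (n,)
    err = g - y                       # per-example errors g^(i) - y^(i)
    grad_theta = X.T @ err / len(y) + lam * theta  # (1/n) sum of err * x^(i), plus lam * theta
    grad_theta_0 = np.mean(err)       # (1/n) sum of err; the offset is not regularized
    return grad_theta, grad_theta_0
```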

Putting everything together, our gradient descent algorithm for logistic regression initializes $\theta$ and $\theta_0$ and then repeatedly steps each against its gradient with step size $\eta$:

$$\theta \leftarrow \theta - \eta\left(\frac{1}{n}\sum_{i=1}^{n}\left(g^{(i)} - y^{(i)}\right)x^{(i)} + \lambda\theta\right), \qquad \theta_0 \leftarrow \theta_0 - \eta \cdot \frac{1}{n}\sum_{i=1}^{n}\left(g^{(i)} - y^{(i)}\right)$$
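A minimal sketch of the full loop, reusing the helpers above (the zero initialization, fixed step size `eta`, and fixed iteration budget are assumptions; a real implementation might instead stop once the objective changes by less than some tolerance):

```python
def logistic_regression(X, y, lam=0.01, eta=0.1, num_steps=1000):
    # Batch gradient descent: step theta and theta_0 against their gradients.
    n, d = X.shape
    theta, theta_0 = np.zeros(d), 0.0
    for _ in range(num_steps):
        grad_theta, grad_theta_0 = gradients(theta, theta_0, X, y, lam)
        theta = theta - eta * grad_theta
        theta_0 = theta_0 - eta * grad_theta_0
    return theta, theta_0

# Hypothetical usage on linearly separable toy data:
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(float)
theta, theta_0 = logistic_regression(X, y)
```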