### The Negative Gradient Does Not Point Towards the Minimum

In this post, we will explain how Gradient Descent (GD) works and why it can converge very slowly. The simplest first-order optimization algorithm is Gradient Descent. It is used to minimize a convex differentiable function over $latex {{\mathbb R}^d}&fg=000000$. The update rule is simply $latex {{\boldsymbol x}_{t+1} = {\boldsymbol x}_t - \eta_t \nabla f({\boldsymbol x}_t)}&fg=000000$, …

Subgradient descent (SD) is a very simple algorithm for minimizing a non-differentiable convex function. It proceeds iteratively updating the vector $latex {{\boldsymbol x}_t}&fg=000000$ moving in the direction of a negative subgradient $latex {{\boldsymbol g}_t}&fg=000000$, but a positive factor $latex {\eta_t}&fg=000000$ called the stepsize. Also, we project onto the feasible set $latex {V}&fg=000000$. The pseudocode is …