### The Negative Gradient Does Not Point Towards the Minimum

In this post, we will explain how Gradient Descent (GD) works and why it can converge very slowly. The simplest first-order optimization algorithm is Gradient Descent. It is used to minimize a convex differentiable function over $latex {{\mathbb R}^d}&fg=000000$. The update rule is simply $latex {{\boldsymbol x}_{t+1} = {\boldsymbol x}_t - \eta_t \nabla f({\boldsymbol x}_t)}&fg=000000$, …

Advertisements

### Subgradient Descent Does Not Descend

Subgradient descent (SD) is a very simple algorithm for minimizing a non-differentiable convex function. It proceeds iteratively updating the vector $latex {{\boldsymbol x}_t}&fg=000000$ moving in the direction of a negative subgradient $latex {{\boldsymbol g}_t}&fg=000000$, but a positive factor $latex {\eta_t}&fg=000000$ called the stepsize. Also, we project onto the feasible set $latex {V}&fg=000000$. The pseudocode is …

### First Post!

Hi Everybody! This is the first post of my blog. I thought many times about opening a blog, to discuss the research in machine learning, but I never had enough motivation to start one. This time I just did it, before allowing my conscious mind to realize how much effort it will be needed... The …