Subgradients and Online-to-Batch conversion

Last time we introduced Projected Online Gradient Descent, and we proved the following regret guarantee: Theorem 1 Let $V\subseteq {\mathbb R}^d$ be a closed non-empty convex set with diameter $D$, i.e. $\max_{{\boldsymbol x},{\boldsymbol y}\in V} \|{\boldsymbol x}-{\boldsymbol y}\|_2 \leq D$. Let $\ell_1, \cdots, \ell_T$ be an arbitrary sequence of convex functions $\ell_t:{\mathbb R}^d \rightarrow {\mathbb R}$ …
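Since the excerpt is cut off, here is only a rough Python sketch of the kind of online-to-batch conversion the post's title refers to: run projected online gradient descent on i.i.d. samples and return the average of the played iterates. The names `grad_sample`, `project`, `data`, and the constant stepsize $D/(L\sqrt{T})$ are illustrative assumptions, not the post's code.

```python
import numpy as np

def online_to_batch_ogd(grad_sample, project, data, d, T, D, L):
    """Sketch: projected OGD over T i.i.d. samples, returning the averaged iterate."""
    eta = D / (L * np.sqrt(T))          # constant stepsize suggested by the regret bound
    x = project(np.zeros(d))            # start from (the projection of) the origin
    avg = np.zeros(d)
    for t in range(1, T + 1):
        avg += (x - avg) / t            # running average of the points actually played
        g = grad_sample(x, next(data))  # stochastic (sub)gradient of the loss at x
        x = project(x - eta * g)        # projected online gradient descent step
    return avg                          # the batch solution: the averaged iterate
```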


Online Gradient Descent

To summarize what we said in the previous note, let's define online learning as the following general game: for $t=1,\dots,T$, the algorithm outputs ${\boldsymbol x}_t \in V\subseteq {\mathbb R}^d$, receives $\ell_t:V \rightarrow {\mathbb R}$, and pays $\ell_t({\boldsymbol x}_t)$. The aim of this game is to minimize the regret with respect to any …
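As a rough illustration of this protocol (not code from the post), the loop below plays the game against a fixed comparator point and returns the regret; the `learner` interface with `predict` and `update` methods is an assumption made here for concreteness.

```python
def play_online_game(learner, losses, comparator):
    """Sketch of the protocol: the learner plays x_t, l_t is revealed, it pays l_t(x_t)."""
    learner_loss, comparator_loss = 0.0, 0.0
    for loss in losses:                     # the sequence l_1, ..., l_T
        x_t = learner.predict()             # learner outputs x_t in V
        learner_loss += loss(x_t)           # learner pays l_t(x_t)
        comparator_loss += loss(comparator)
        learner.update(loss)                # only now is l_t revealed to the learner
    return learner_loss - comparator_loss   # regret w.r.t. the fixed comparator
```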

Introduction to Online Learning

Imagine the following repeated game: in each round $t=1,\dots,T$, an adversary chooses a real number $y_t \in [0,1]$ and keeps it secret; you try to guess this number, choosing $x_t \in [0,1]$; then the adversary's number is revealed and you pay the squared difference $(x_t-y_t)^2$. Basically, we want to …
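To make the game concrete, here is a small simulation sketch (my own, not from the post) in which the guesser plays the running mean of the numbers revealed so far, which is Follow-the-Leader for the squared loss; the random adversary and the horizon are just placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 100
y = rng.uniform(0.0, 1.0, size=T)          # the adversary's numbers (random here, unseen by the guesser)

guess, total_loss = 0.5, 0.0
for t in range(T):
    total_loss += (guess - y[t]) ** 2      # pay the squared difference
    guess = y[: t + 1].mean()              # next guess: mean of the numbers revealed so far

best_fixed = ((y - y.mean()) ** 2).sum()   # cumulative loss of the best fixed guess in hindsight
print("regret:", total_loss - best_fixed)
```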

The Negative Gradient Does Not Point Towards the Minimum

In this post, we will explain how Gradient Descent (GD) works and why it can converge very slowly. GD is the simplest first-order optimization algorithm; it is used to minimize a convex differentiable function over ${\mathbb R}^d$. The update rule is simply ${\boldsymbol x}_{t+1} = {\boldsymbol x}_t - \eta_t \nabla f({\boldsymbol x}_t)$, …
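As a quick illustration of both phenomena (a toy example of mine, not the post's experiment), the snippet below looks at an ill-conditioned quadratic: at the starting point the negative gradient is far from pointing at the minimizer, and GD with stepsize $1/L$ makes slow progress along the flat direction.

```python
import numpy as np

# f(x) = 0.5 * x^T A x, minimized at the origin; A is ill-conditioned.
A = np.diag([1.0, 100.0])
x0 = np.array([1.0, 1.0])

grad = A @ x0
to_min = -x0                                        # direction from x0 to the minimizer
cos = (-grad) @ to_min / (np.linalg.norm(grad) * np.linalg.norm(to_min))
print("cosine between -grad and the direction to the minimum:", cos)  # about 0.71, far from 1

# GD with stepsize 1/L (L = largest eigenvalue of A) shrinks the error along the
# flat direction only by a factor (1 - 1/100) per step, hence the slow convergence.
x, eta = x0.copy(), 1.0 / 100.0
for _ in range(200):
    x = x - eta * (A @ x)                           # plain gradient descent update
print("after 200 GD steps:", x)                     # the first coordinate is still far from 0
```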

Subgradient Descent Does Not Descend

Subgradient descent (SD) is a very simple algorithm for minimizing a non-differentiable convex function. It proceeds by iteratively updating the vector ${\boldsymbol x}_t$, moving in the direction of a negative subgradient ${\boldsymbol g}_t$ scaled by a positive factor $\eta_t$ called the stepsize. We also project onto the feasible set $V$. The pseudocode is …
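Below is a minimal sketch of projected subgradient descent on a toy one-dimensional problem (my own example, not the post's pseudocode); note that the function value need not decrease at every iteration, which is exactly the point of the title.

```python
import numpy as np

def subgrad(x):
    return np.sign(x) if x != 0 else 0.0    # a subgradient of f(x) = |x| at x

def project(x):
    return np.clip(x, -1.0, 1.0)            # Euclidean projection onto V = [-1, 1]

x, T = 1.0, 50
for t in range(1, T + 1):
    eta = 0.5 / np.sqrt(t)                  # decreasing stepsize
    x = project(x - eta * subgrad(x))       # move along a negative subgradient, then project
    print(t, x, abs(x))                     # |x| can go up and down across iterations
```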

First Post!

Hi Everybody! This is the first post of my blog. I thought many times about opening a blog to discuss research in machine learning, but I never had enough motivation to start one. This time I just did it, before allowing my conscious mind to realize how much effort would be needed... The …