Paper Summary 1: Online Convex Programming

Author: 廿怎么念 | Published 2021-01-23 23:44

    0 Paper: Zinkevich, Martin. "Online Convex Programming and Generalized Infinitesimal Gradient Ascent." In Proceedings of the 20th International Conference on Machine Learning (ICML-03), pp. 928–936. 2003.

    1 Motivation: A general setting is the following. Imagine a farmer who decides what to plant each year. She faces restrictions on her resources, both land and labor, as well as on the output she is allowed to produce. How can she select which crops to grow without knowing in advance what the prices will be?

     2 Problem statement

    Given a known feasible set F\subseteq R^n and an infinite sequence of convex functions \left\{ c^1, c^2, ... \right\} with c^t:F\rightarrow R at step t. After a decision x^t\in F is made, the cost c^t(x^t) associated with that decision is incurred. How should decisions be made so as to minimize the total cost? And how should an algorithm that makes such decisions be evaluated?

    3 Regret of an algorithm:

    Regret is a metric to assess the performance of a decision-making algorithm. Given an algorithm A and its selected decisions \{ x^1, x^2, ...\}, the regret of A can be defined as follows.

    The cost of A: C_A(T)=\sum\limits_{t=1}^{T} c^t(x^t)

    The cost of a static solution x \in F: C_x(T)=\sum\limits_{t=1}^{T} c^t(x)

    The regret of A: R_A(T)=C_A(T)-\min\limits_{x \in F} C_x(T)

    \min\limits_{x \in F} \ C_x(T) means selecting a fixed x \in F in hindsight, i.e., with the whole cost sequence known, such that the total cost of playing that single decision every round is minimized.
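As a concrete illustration (the function and variable names here are my own), the regret definition can be computed directly when the best static decision is searched over a finite candidate grid:

```python
import numpy as np

# Illustrative sketch (names are my own): compute the regret of a decision
# sequence against the best fixed decision from a finite candidate grid.
def regret(costs, decisions, candidates):
    """costs: the functions c^1..c^T; decisions: x^1..x^T chosen online."""
    # C_A(T): total cost actually incurred by the algorithm
    cost_of_A = sum(c(x) for c, x in zip(costs, decisions))
    # min_x C_x(T): best total cost of any single fixed decision in hindsight
    best_static = min(sum(c(x) for c in costs) for x in candidates)
    return cost_of_A - best_static

# Example: quadratic costs c^t(x) = (x - 0.1 t)^2 on F = [0, 1]
costs = [lambda x, t=t: (x - 0.1 * t) ** 2 for t in range(5)]
candidates = np.linspace(0.0, 1.0, 101)
print(regret(costs, [0.0] * 5, candidates))  # always playing 0 incurs regret
```

Here the best static decision is the mean of the cost minimizers, so even a reasonable-looking fixed choice accumulates regret against hindsight.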

    4 Important Notations

    1) \| x \| = \sqrt{x \cdot x}

    2) d(x,y)=\| x-y\|

    3) projection P(y)=\mathop{\arg\min}_{x \in F} d(x,y)

    4) \| F \| = \max_{x,y\in F} d(x,y)

    5)  \| \nabla  c  \| = \sup_{x\in F, t \in\{1,2,...\}} \| \nabla  c^t(x)  \|
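To make these notations concrete, here is a minimal sketch using the Euclidean unit ball as the feasible set F (my own choice); for this F the projection P(y) has a closed form, whereas a general convex F would require solving a small optimization problem:

```python
import numpy as np

# Concrete instance of the notations above (my own choice of F): the
# Euclidean unit ball F = {x : ||x|| <= 1}, where P(y) has a closed form.
def norm(x):
    return np.sqrt(np.dot(x, x))            # 1) ||x|| = sqrt(x . x)

def dist(x, y):
    return norm(x - y)                      # 2) d(x, y) = ||x - y||

def project_to_unit_ball(y):
    """3) P(y) = argmin_{x in F} d(x, y): scale y back if it is outside F."""
    n = norm(y)
    return y if n <= 1.0 else y / n

# 4) ||F||: for the unit ball the diameter is 2 (attained by antipodal points)
y = np.array([3.0, 4.0])                    # ||y|| = 5, outside the ball
print(project_to_unit_ball(y))              # scaled onto the boundary
```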

    5 Algorithm 1: Greedy Projection 

    1) Definition 

    For an arbitrary start x^1 \in F, update x^{t+1}=P\left( x^t-\eta_t \nabla c^t(x^t) \right)

    where \eta_t is the learning rate. What P(\cdot) does is map the gradient-descent update back into the feasible set by projection.

    2) Upper bound of regret

    For \eta_t=t^{-1/2}, the regret of Greedy Projection is bounded by

    R_G(T)\leq \frac{\| F\| ^2\sqrt{T}}{2}+\left( \sqrt{T}-\frac{1}{2} \right) \| \nabla c\| ^2
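A minimal sketch of Greedy Projection under assumptions of my own: F is the unit ball and the costs are linear, c^t(x)=g_t \cdot x, so the gradient at every point is just g_t. The learning-rate schedule \eta_t=t^{-1/2} matches the bound above:

```python
import numpy as np

# Minimal sketch of Greedy Projection (assumptions are mine: F is the unit
# ball, and the costs are linear, c^t(x) = g_t . x, so grad c^t(x) = g_t).
def greedy_projection(grads, eta):
    """x^{t+1} = P(x^t - eta(t) * grad c^t(x^t))."""
    x = np.zeros_like(grads[0])            # arbitrary start x^1 in F
    history = [x.copy()]
    for t, g in enumerate(grads, start=1):
        y = x - eta(t) * g                 # gradient-descent step
        n = np.linalg.norm(y)
        x = y if n <= 1.0 else y / n       # projection P back onto F
        history.append(x.copy())
    return history

grads = [np.array([1.0, 0.0])] * 3         # the same gradient every round
xs = greedy_projection(grads, eta=lambda t: t ** -0.5)  # eta_t = t^{-1/2}
```

With a constant gradient the iterates quickly hit the boundary of F and stay there, which is exactly the role of the projection step.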

    6 Algorithm 2: Lazy Projection 

    1) Definition 

    For an arbitrary start y^1=x^1 \in F,

    x^t=P(y^t), \qquad y^{t+1}=y^t-\eta_t \nabla c^t(x^t)

    2) Upper bound of regret

    For a fixed learning rate \eta_t=\eta, the regret of Lazy Projection is bounded by

    R_L(T)\leq \frac{\| F\| ^2}{2\eta}+\frac{\eta \| \nabla c\| ^2 T}{2}
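The same toy setup (unit ball, gradients given directly) can be used to sketch Lazy Projection with a fixed learning rate. Unlike Greedy Projection, the unprojected point y^t accumulates every gradient step and may drift outside F; only the played decision x^t = P(y^t) is projected:

```python
import numpy as np

# Minimal sketch of Lazy Projection with a fixed learning rate (same toy
# setup as above: F is the unit ball and the gradients are given directly).
# The unprojected point y^t accumulates every gradient step; the decision
# actually played each round is its projection x^t = P(y^t).
def lazy_projection(grads, eta):
    y = np.zeros(2)                        # arbitrary start y^1 = x^1 in F
    played = []
    for g in grads:
        n = np.linalg.norm(y)
        x = y if n <= 1.0 else y / n       # x^t = P(y^t)
        played.append(x)
        y = y - eta * g                    # y^{t+1} = y^t - eta * grad c^t(x^t)
    return played

xs = lazy_projection([np.array([1.0, 0.0])] * 4, eta=0.5)
```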

    7 Algorithm 3: Generalized Infinitesimal Gradient Ascent

    0) Background: Repeated Games

    Algorithm 3 is formulated from the concept of repeated games. Consider a matching game in which a player needs to match two sets of actions A=\{a_1, a_2, a_3\} and Y=\{y_1, y_2, y_3\}, where a utility function u:A\times Y \rightarrow R can be defined as u(a_1, y_1)=u(a_2, y_2)=u(a_3, y_3)=1 and u(a_i, y_j)=0 otherwise.

    (1) history: a sequence of joint actions.

    (1)' H^t=(A\times Y)^t is the set of all histories of length t

    (2)' |h| is the length of history h

    (3)' e.g. of h with length |h|=3

    h=\{(a_3,y_1), (a_1,y_2), (a_2,y_3)\}

    (4)' utility of a history 

    To access the history, define h_i to be the i-th joint action. E.g., h_3=(a_2, y_3) and h_{3,1}=a_2

    u(h)=\sum\limits_{t=1}^{|h|} u\left( h_{t,1}, h_{t,2} \right)

    (5)' history change

    h^{*\rightarrow a}_{t}=(a, h_{t,2}) \quad \text{for } t=1,\ldots,|h|

    Namely, replace the player's action in each joint action of h with the fixed action a, keeping the environment's actions unchanged.

    (6)' regret of not playing action a

    R_a(h)=u\left( h^{*\rightarrow a} \right) -u(h)

    (7)' the maximum regret,or just simply regret

    R(h)=\max\limits_{a \in A} R_a(h)=\max\limits_{a \in A} \left[ u\left( h^{*\rightarrow a} \right) -u(h) \right]
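A short sketch of these definitions for the matching game (representation choices are mine: actions are strings, and matching is checked on the index digit). It computes the history utility, the hindsight replacement h^{*\rightarrow a}, and the resulting regret R(h):

```python
# Sketch of the history definitions for the matching game (action names are
# strings here for readability; u(a_i, y_j) = 1 iff i == j, else 0).
A = ['a1', 'a2', 'a3']

def u(a, y):
    return 1 if a[1] == y[1] else 0        # matching utility on index digits

def u_hist(h):
    return sum(u(a, y) for a, y in h)      # utility of a history

def regret_of_history(h):
    # R(h) = max_a [u(h^{*->a}) - u(h)]: replace the player's action by a
    # fixed a everywhere, keeping the environment's actions unchanged.
    return max(sum(u(a, y) for _, y in h) for a in A) - u_hist(h)

h = [('a3', 'y1'), ('a1', 'y2'), ('a2', 'y3')]   # a short length-3 history
print(regret_of_history(h))                      # any fixed action scores 1 here
```

Note that, as stated below, the regret depends only on the realized history, not on how the players generated it.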

    The most important aspect of this definition of regret is that regret is a function of the resulting history, independent of the strategies that generated that history.

    (8)' behavior \sigma

    Let \Delta (S) be the set of all probabilities over S

    Let \Pr_{x \sim D}[P(x)] be the probability that the boolean predicate P(x) is true when x is drawn from a distribution D

    Then \sigma : H \rightarrow \Delta(A) is a function from histories of past actions to distributions over the next action of the player.

    (9)' environment \rho

    \rho: H \rightarrow \Delta (Y) is a function from the history of past actions to distributions over the next action of the environment.

    (10)' universally consistent

    a behavior \sigma  is universally consistent if for any \epsilon >0 there exists a T such that for all \rho :

    \Pr\left[ \exists\, t>T:\ \frac{1}{t}\, R\left( h^{(t)} \right) >\epsilon \right] <\epsilon, \quad \text{where } h^{(t)} \text{ denotes the first } t \text{ joint actions of the resulting play}

    In other words, after some time, with high probability the average regret never again exceeds \epsilon

    1) Definition 

    For an arbitrary start y^1=x^1 \in F, do the following:

    (1) Play according to x^t: play action i with probability x^t_{i}.

    (2) Observe the action h_{t,2} \in Y of the other player and calculate 

    y^{t+1}_{i}=x^t_{i}+\eta_t\, u(a_i, h_{t,2}) \ \text{for each } i, \qquad x^{t+1}=P(y^{t+1})

    where the feasible set is the probability simplex F=\Delta(A) and P projects onto it.

    2) upper bound of regret

    Regarding the special example presented in 0) Background, we have \| F\|\leq \sqrt{2} and \|\nabla c \| \leq \sqrt{|A|}\,|u|, where 

    |u|=\max\limits_{a\in A,\, y\in Y} |u(a,y)|

    Then for \eta_t=t^{-1/2}, the expected regret of Generalized Infinitesimal Gradient Ascent, for all oblivious deterministic environments \rho, for all a \in A, and for all T, is bounded by 

    u\left( h^{*\rightarrow a} \right) -E\left[ u(h) \right] \leq \sqrt{T}+\left( \sqrt{T}-\frac{1}{2} \right) |A|\,|u|^2

    Note: An environment is an oblivious deterministic environment if it plays the same sequence of actions regardless of the actions of the player.
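A sketch of GIGA on the 3-action matching game against an oblivious deterministic (cyclic) environment. The implementation choices are mine: the expected utility is tracked in closed form instead of sampling actions, and the projection onto the simplex uses the standard sort-based algorithm:

```python
import numpy as np

# Sketch of GIGA on the 3-action matching game (my own implementation
# choices: the opponent is an oblivious deterministic cyclic environment,
# the expectation is tracked in closed form, and the simplex projection is
# the standard sort-based algorithm).
def project_to_simplex(y):
    """Euclidean projection of y onto {x : x >= 0, sum(x) = 1}."""
    u = np.sort(y)[::-1]
    css = np.cumsum(u)
    j = np.arange(1, len(y) + 1)
    rho = np.nonzero(u + (1.0 - css) / j > 0)[0][-1]
    theta = (1.0 - css[rho]) / (rho + 1)
    return np.maximum(y + theta, 0.0)

def giga(opponent_actions, n_actions=3):
    x = np.full(n_actions, 1.0 / n_actions)    # start at the uniform strategy
    expected_utility = 0.0
    for t, j in enumerate(opponent_actions, start=1):
        grad = np.zeros(n_actions)
        grad[j] = 1.0                          # gradient of u(x, y_j) in x
        expected_utility += x[j]               # E[u] under the current mix
        x = project_to_simplex(x + t ** -0.5 * grad)  # ascent + projection
    return expected_utility

v = giga([0, 1, 2] * 10)                       # 30 rounds, cyclic opponent
```

Because the environment is oblivious, its action sequence can simply be passed in as a list; a best fixed action would score at most 10 of the 30 rounds here.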

    8 Summary

    The paper formulated online convex programming and introduced several algorithms, including the three presented in this post. An index called regret was used to evaluate the performance of the algorithms. Finally, upper bounds on the regret of the discussed algorithms were derived. 
