0 Paper
Abernethy, J. D., Hazan, E., & Rakhlin, A. (2009). Competing in the dark: An efficient algorithm for bandit linear optimization.
1 Contributions
(1) Proposing an efficient algorithm for online linear bandit optimization, which achieves an O*(√T) expected regret bound (O(√(T log T)) up to factors polynomial in the dimension), the first efficient algorithm to attain this rate for the problem.
(2) Linking the notion of Bregman divergences with self-concordant barriers, showing that these divergence functions, widely used as a regularization tool in online learning, provide the right perspective for managing uncertainty given limited feedback.
2 Main results
[Screenshots omitted: Algorithm 1 (the bandit algorithm), the regret bound of Algorithm 1, and Algorithm 2.]
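To make the structure of Algorithm 1 concrete, below is a minimal runnable sketch under illustrative assumptions: the decision set K is the unit Euclidean ball, the regularizer is the self-concordant barrier R(x) = -log(1 - ||x||²), the loss vector is fixed across rounds, and all names (play_round, ftrl_step, eta) are mine, not the paper's. It follows the paper's round structure: sample a point on the Dikin ellipsoid around the current iterate, observe only the scalar loss, form a one-point estimate of the loss vector, and take a follow-the-regularized-leader step.

```python
# Minimal sketch of the structure of Algorithm 1 (illustrative assumptions:
# K = unit ball, barrier R(x) = -log(1 - ||x||^2), fixed loss vector f).
import numpy as np

def barrier_hessian(x):
    # Hessian of R(x) = -log(1 - ||x||^2): (2/s) I + (4/s^2) x x^T, s = 1 - ||x||^2
    s = 1.0 - x @ x
    return (2.0 / s) * np.eye(len(x)) + (4.0 / s ** 2) * np.outer(x, x)

def play_round(x, f, rng):
    # Sample a point on the Dikin ellipsoid around x_t; since R is a
    # self-concordant barrier, this point is guaranteed to lie inside K.
    n = len(x)
    lam, E = np.linalg.eigh(barrier_hessian(x))
    i = rng.integers(n)
    eps = rng.choice([-1.0, 1.0])
    y = x + eps * lam[i] ** -0.5 * E[:, i]
    loss = f @ y                                         # only this scalar is observed
    f_tilde = n * loss * eps * lam[i] ** 0.5 * E[:, i]   # one-point estimate of f
    return loss, f_tilde

def ftrl_step(eta, g):
    # x_{t+1} = argmin_x eta*<g, x> + R(x). For the ball barrier this has a
    # closed form: x = -c g, where c solves eta*||g||^2*c^2 + 2c - eta = 0.
    G2 = g @ g
    if G2 == 0.0:
        return np.zeros_like(g)
    c = (np.sqrt(1.0 + eta ** 2 * G2) - 1.0) / (eta * G2)
    return -c * g

rng = np.random.default_rng(0)
n, T, eta = 3, 5000, 0.02
f = np.array([0.3, -0.2, 0.1])                           # unknown fixed loss vector
x, g_sum, total = np.zeros(n), np.zeros(n), 0.0
for t in range(T):
    loss, f_tilde = play_round(x, f, rng)
    total += loss
    g_sum += f_tilde
    x = ftrl_step(eta, g_sum)
print("average loss:", total / T)                        # drifts toward the minimum over K
```

For the ball barrier the FTRL step has a closed form, which avoids a numerical solver; for a general convex set one would instead minimize eta*<g, x> + R(x) with Newton's method.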
3 Key ideas
(1) The exploration/exploitation trade-off is the primary source of difficulty in obtaining guarantees on the regret. Roughly two categories of approaches have been suggested to perform both exploration and exploitation: alternating explore/exploit, and simultaneous explore/exploit. The former suffers regret of at least Ω(T^{2/3}), whereas the second category is promising for obtaining an O(√T) bound, because even a single simultaneous observation per round can yield an unbiased estimate of the loss vector (see the Monte Carlo check after this list).
(2) The magnitudes of the gradient estimates are inversely proportional to the distance of the played point to the boundary of the decision set, which implies high variance of the estimated loss functions near the boundary. Therefore, a regularization scheme is employed: the follow-the-regularized-leader update of Algorithm 1 uses a self-concordant barrier of the decision set as the regularizer, controlling this variance in exactly the local geometry that matters (a numeric illustration of the blow-up follows this list).
(3) The key building blocks used to prove Theorem 1 are Bregman divergences, self-concordant barriers, and the Dikin ellipsoid (standard definitions are recalled below).
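A Monte Carlo check of the simultaneous-explore/exploit mechanism from key idea (1), using the same illustrative unit-ball barrier as the sketch above: although only the scalar loss <f, y_t> is ever observed, the one-point estimate is unbiased, E[f̃_t] = f_t.

```python
# Monte Carlo check (illustrative unit-ball barrier): the one-point estimate
# is unbiased, E[f_tilde] = f, despite observing only a scalar per round.
import numpy as np

def barrier_hessian(x):
    s = 1.0 - x @ x
    return (2.0 / s) * np.eye(len(x)) + (4.0 / s ** 2) * np.outer(x, x)

rng = np.random.default_rng(1)
x = np.array([0.2, -0.1, 0.3])            # current iterate, strictly inside the ball
f = np.array([0.3, -0.2, 0.1])            # loss vector the estimator should recover
lam, E = np.linalg.eigh(barrier_hessian(x))
n, trials = len(x), 200_000
est = np.zeros(n)
for _ in range(trials):
    i = rng.integers(n)
    eps = rng.choice([-1.0, 1.0])
    y = x + eps * lam[i] ** -0.5 * E[:, i]
    est += n * (f @ y) * eps * lam[i] ** 0.5 * E[:, i]
print("empirical mean:", est / trials)    # approaches f = [0.3, -0.2, 0.1]
```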
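And a numeric illustration of key idea (2), under the same assumed barrier: the largest eigenvalue of the barrier Hessian grows like 1/dist², so the norm of the one-point estimate grows like 1/dist as the played point approaches the boundary.

```python
# Illustration (assumed unit-ball barrier): as the distance of x_t to the
# boundary shrinks, sqrt(lambda_max) of the Hessian -- and hence the norm of
# the one-point estimate -- blows up like 1/dist.
import numpy as np

def barrier_hessian(x):
    s = 1.0 - x @ x
    return (2.0 / s) * np.eye(len(x)) + (4.0 / s ** 2) * np.outer(x, x)

for dist in [0.5, 0.1, 0.01, 0.001]:
    x = np.array([1.0 - dist, 0.0, 0.0])   # point at distance `dist` from the boundary
    lam_max = np.linalg.eigvalsh(barrier_hessian(x)).max()
    print(f"dist={dist:6.3f}   sqrt(lam_max)={np.sqrt(lam_max):10.1f}")
```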
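Finally, for key idea (3), the standard definitions of these building blocks (stated in generic notation; the paper's own symbols may differ):

```latex
% Bregman divergence induced by a regularizer R
D_R(x, y) = R(x) - R(y) - \nabla R(y)^{\top} (x - y)

% Local norm at x, and the Dikin ellipsoid of radius 1
\|h\|_x = \sqrt{h^{\top} \nabla^2 R(x) \, h}, \qquad
W_1(x) = \{\, y : \|y - x\|_x \le 1 \,\}

% Key property: if R is a self-concordant barrier for the closed convex set K,
% then W_1(x) \subseteq K for every x in the interior of K, so the sampling
% step of Algorithm 1 always plays a feasible point.
```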