Paper Summary 3: Competing in the Dark

Author: 廿怎么念 | Published 2021-02-28 11:33

0 Paper

Abernethy, J. D., Hazan, E., & Rakhlin, A. (2009). Competing in the dark: An efficient algorithm for bandit linear optimization.

1 Contributions

(1) Proposing an efficient algorithm for online bandit linear optimization that achieves a regret bound of \mathcal{O}(\mathrm{poly}(n)\sqrt{T}).
(2) Linking the notion of Bregman divergences with self-concordant barriers, showing that divergence functions, which are widely used as a regularization tool in online learning, provide the right perspective on the problem of managing uncertainty given limited feedback.
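For context, the regret quantity being bounded is the standard one for bandit linear optimization (standard notation, not copied verbatim from the paper):

```latex
% Regret after T rounds against the best fixed point in the convex set K.
% The loss at round t is linear, f_t(x) = f_t^T x, and the learner only
% observes the scalar value f_t^T x_t of the point it actually played.
\mathrm{Regret}_T \;=\; \sum_{t=1}^{T} f_t^{\top} x_t \;-\; \min_{x \in K} \sum_{t=1}^{T} f_t^{\top} x
```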

2 Main results

Algorithm 1 (figure)
Regret bound of Algorithm 1 (figure)
Algorithm 2 (figure)

3 Key ideas

(1) The exploration/exploitation trade-off is the primary source of difficulty in obtaining O(√T) guarantees on the regret. Roughly two categories of approaches have been suggested to perform both exploration and exploitation: alternating explore/exploit, and simultaneous explore/exploit. The former suffers regret of at least Ω(T^{2/3}), whereas the second category is promising for obtaining an O(√T) bound.
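The simultaneous explore/exploit idea can be illustrated with a one-dimensional one-point estimator (a minimal sketch of my own, not the paper's full algorithm): play a randomly perturbed point, observe only the scalar loss there, and use the random sign to recover an unbiased estimate of the hidden linear loss.

```python
import random

def one_point_estimate(f, x, delta):
    """Play a perturbed point and build a slope estimate from the single
    scalar loss value observed (bandit feedback). For a linear loss
    f(x) = c*x + b, the estimate is unbiased: E[est] = c."""
    eps = random.choice([-1.0, 1.0])   # random exploration direction
    y = x + eps * delta                # the point actually played
    loss = f(y)                        # only this scalar is revealed
    return loss * eps / delta          # unbiased estimate of the slope

random.seed(0)
f = lambda x: 3.0 * x + 1.0            # hidden linear loss; true slope is 3
est = sum(one_point_estimate(f, 0.5, 0.1) for _ in range(100_000)) / 100_000
```

Note that the single-sample variance scales like 1/delta: exploring with a small perturbation gives a noisy estimate, which is exactly the variance issue the next point addresses.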

(2) The loss estimates are inversely proportional to the distance of x_t to the boundary, which implies high variance of the estimated functions near the boundary. Therefore, a regularization scheme can be employed as follows.

Regularization scheme (figure)
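The update behind such a scheme can be sketched in follow-the-regularized-leader form (a sketch with standard symbols, not the paper's exact constants): \tilde f_s are the estimated loss vectors, \eta is the learning rate, and R is a self-concordant barrier for K, which blows up near the boundary and thus keeps iterates away from the high-variance region.

```latex
% FTRL with a self-concordant barrier R as the regularizer:
x_{t+1} \;=\; \operatorname*{arg\,min}_{x \in K}
  \left\{ \eta \sum_{s=1}^{t} \tilde f_s^{\top} x \;+\; R(x) \right\}
```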

(3) Key building blocks for proving Theorem 1 include Bregman divergences, self-concordant barriers, and the Dikin ellipsoid.
