0 Paper
Abernethy, J. D., Hazan, E., & Rakhlin, A. (2009). Competing in the dark: An efficient algorithm for bandit linear optimization.
1 Contributions
(1) Proposing an efficient algorithm for online linear bandit optimization, which achieves an O*(√T) expected regret bound (O(√(T log T)) up to factors polynomial in the dimension), the first efficient algorithm to attain this rate for the problem.
(2) Linking the notion of Bregman divergences with self-concordant barriers, showing that these divergence functions, widely used as a regularization tool in online learning, provide the right perspective for managing uncertainty given limited feedback.
2 Main results
[Screenshots omitted: Algorithm 1 (the bandit algorithm), the regret bound of Algorithm 1, and Algorithm 2.]
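To make the structure of Algorithm 1 concrete, below is a minimal runnable sketch under illustrative assumptions: the decision set K is the unit Euclidean ball, the regularizer is the self-concordant barrier R(x) = -log(1 - ||x||²), the loss vector is fixed across rounds, and all names (play_round, ftrl_step, eta) are mine, not the paper's. It follows the paper's round structure: sample a point on the Dikin ellipsoid around the current iterate, observe only the scalar loss, form a one-point estimate of the loss vector, and take a follow-the-regularized-leader step.

```python
# Minimal sketch of the structure of Algorithm 1 (illustrative assumptions:
# K = unit ball, barrier R(x) = -log(1 - ||x||^2), fixed loss vector f).
import numpy as np

def barrier_hessian(x):
    # Hessian of R(x) = -log(1 - ||x||^2): (2/s) I + (4/s^2) x x^T, s = 1 - ||x||^2
    s = 1.0 - x @ x
    return (2.0 / s) * np.eye(len(x)) + (4.0 / s ** 2) * np.outer(x, x)

def play_round(x, f, rng):
    # Sample a point on the Dikin ellipsoid around x_t; since R is a
    # self-concordant barrier, this point is guaranteed to lie inside K.
    n = len(x)
    lam, E = np.linalg.eigh(barrier_hessian(x))
    i = rng.integers(n)
    eps = rng.choice([-1.0, 1.0])
    y = x + eps * lam[i] ** -0.5 * E[:, i]
    loss = f @ y                                         # only this scalar is observed
    f_tilde = n * loss * eps * lam[i] ** 0.5 * E[:, i]   # one-point estimate of f
    return loss, f_tilde

def ftrl_step(eta, g):
    # x_{t+1} = argmin_x eta*<g, x> + R(x). For the ball barrier this has a
    # closed form: x = -c g, where c solves eta*||g||^2*c^2 + 2c - eta = 0.
    G2 = g @ g
    if G2 == 0.0:
        return np.zeros_like(g)
    c = (np.sqrt(1.0 + eta ** 2 * G2) - 1.0) / (eta * G2)
    return -c * g

rng = np.random.default_rng(0)
n, T, eta = 3, 5000, 0.02
f = np.array([0.3, -0.2, 0.1])                           # unknown fixed loss vector
x, g_sum, total = np.zeros(n), np.zeros(n), 0.0
for t in range(T):
    loss, f_tilde = play_round(x, f, rng)
    total += loss
    g_sum += f_tilde
    x = ftrl_step(eta, g_sum)
print("average loss:", total / T)                        # drifts toward the minimum over K
```

For the ball barrier the FTRL step has a closed form, which avoids a numerical solver; for a general convex set one would instead minimize eta*<g, x> + R(x) with Newton's method.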
3 Key ideas
(1) The exploration/exploitation trade-off is the primary source of difficulty in obtaining guarantees on the regret. Roughly two categories of approaches have been suggested to perform both exploration and exploitation: alternating explore/exploit, and simultaneous explore/exploit. The former suffers regret of at least Ω(T^{2/3}), whereas the second category is promising for obtaining an O(√T) bound, because even a single simultaneous observation per round can yield an unbiased estimate of the loss vector (see the Monte Carlo check after this list).
(2) The magnitudes of the gradient estimates are inversely proportional to the distance of the played point to the boundary of the decision set, which implies high variance of the estimated loss functions near the boundary. Therefore, a regularization scheme is employed: the follow-the-regularized-leader update of Algorithm 1 uses a self-concordant barrier of the decision set as the regularizer, controlling this variance in exactly the local geometry that matters (a numeric illustration of the blow-up follows this list).
(3) The key building blocks used to prove Theorem 1 are Bregman divergences, self-concordant barriers, and the Dikin ellipsoid (standard definitions are recalled below).
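A Monte Carlo check of the simultaneous-explore/exploit mechanism from key idea (1), using the same illustrative unit-ball barrier as the sketch above: although only the scalar loss <f, y_t> is ever observed, the one-point estimate is unbiased, E[f̃_t] = f_t.

```python
# Monte Carlo check (illustrative unit-ball barrier): the one-point estimate
# is unbiased, E[f_tilde] = f, despite observing only a scalar per round.
import numpy as np

def barrier_hessian(x):
    s = 1.0 - x @ x
    return (2.0 / s) * np.eye(len(x)) + (4.0 / s ** 2) * np.outer(x, x)

rng = np.random.default_rng(1)
x = np.array([0.2, -0.1, 0.3])            # current iterate, strictly inside the ball
f = np.array([0.3, -0.2, 0.1])            # loss vector the estimator should recover
lam, E = np.linalg.eigh(barrier_hessian(x))
n, trials = len(x), 200_000
est = np.zeros(n)
for _ in range(trials):
    i = rng.integers(n)
    eps = rng.choice([-1.0, 1.0])
    y = x + eps * lam[i] ** -0.5 * E[:, i]
    est += n * (f @ y) * eps * lam[i] ** 0.5 * E[:, i]
print("empirical mean:", est / trials)    # approaches f = [0.3, -0.2, 0.1]
```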
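And a numeric illustration of key idea (2), under the same assumed barrier: the largest eigenvalue of the barrier Hessian grows like 1/dist², so the norm of the one-point estimate grows like 1/dist as the played point approaches the boundary.

```python
# Illustration (assumed unit-ball barrier): as the distance of x_t to the
# boundary shrinks, sqrt(lambda_max) of the Hessian -- and hence the norm of
# the one-point estimate -- blows up like 1/dist.
import numpy as np

def barrier_hessian(x):
    s = 1.0 - x @ x
    return (2.0 / s) * np.eye(len(x)) + (4.0 / s ** 2) * np.outer(x, x)

for dist in [0.5, 0.1, 0.01, 0.001]:
    x = np.array([1.0 - dist, 0.0, 0.0])   # point at distance `dist` from the boundary
    lam_max = np.linalg.eigvalsh(barrier_hessian(x)).max()
    print(f"dist={dist:6.3f}   sqrt(lam_max)={np.sqrt(lam_max):10.1f}")
```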
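Finally, for key idea (3), the standard definitions of these building blocks (stated in generic notation; the paper's own symbols may differ):

```latex
% Bregman divergence induced by a regularizer R
D_R(x, y) = R(x) - R(y) - \nabla R(y)^{\top} (x - y)

% Local norm at x, and the Dikin ellipsoid of radius 1
\|h\|_x = \sqrt{h^{\top} \nabla^2 R(x) \, h}, \qquad
W_1(x) = \{\, y : \|y - x\|_x \le 1 \,\}

% Key property: if R is a self-concordant barrier for the closed convex set K,
% then W_1(x) \subseteq K for every x in the interior of K, so the sampling
% step of Algorithm 1 always plays a feasible point.
```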