Paper Summary 3: Competing in the Dark

Author: 廿怎么念 | Published 2021-02-28 11:33

    0 Paper

    Abernethy, J. D., Hazan, E., & Rakhlin, A. (2008). Competing in the dark: An efficient algorithm for bandit linear optimization. In Proceedings of the 21st Annual Conference on Learning Theory (COLT).

    1 Contributions

    (1) Proposing an efficient algorithm for online bandit linear optimization, which achieves a regret bound of \mathcal{O}(\mathrm{poly}(n)\sqrt{T}) (the regret notion is spelled out below this list).
    (2) Linking the notion of Bregman divergences with self-concordant barriers, which shows that divergence functions, widely used as a regularization tool in online learning, provide the right perspective for managing uncertainty given limited feedback.
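
    For concreteness, here is the protocol and regret notion being bounded (a standard formulation of bandit linear optimization; the notation below is mine, not copied from the paper):

```latex
% Bandit linear optimization: on each round t = 1, ..., T the learner
% plays y_t in a convex compact set K in R^n, the adversary chooses a
% loss vector f_t, and the learner observes only the scalar f_t^T y_t.
\[
  \mathrm{Regret}_T
    = \mathbb{E}\!\left[ \sum_{t=1}^{T} f_t^\top y_t \right]
    - \min_{x \in \mathcal{K}} \sum_{t=1}^{T} f_t^\top x .
\]
```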

    2 Main results

    [Figure: Algorithm 1]
    [Figure: Regret bound of Algorithm 1]
    [Figure: Algorithm 2]
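
    Since the algorithm figures did not survive extraction, here is a minimal numpy sketch of how Algorithm 1 is structured (follow-the-regularized-leader with a self-concordant barrier, sampling the loss on the Dikin ellipsoid), specialized to the unit ball with barrier R(x) = -log(1 - ||x||^2). All function names, the step count, and the learning rate eta are illustrative assumptions, not values from the paper:

```python
# Minimal sketch of barrier-regularized FTRL for bandit linear optimization,
# specialized to K = {x : ||x|| < 1} with barrier R(x) = -log(1 - ||x||^2).
import numpy as np

def barrier_grad_hess(x):
    """Gradient and Hessian of R(x) = -log(1 - ||x||^2) on the open unit ball."""
    s = 1.0 - x @ x
    g = 2.0 * x / s
    H = 2.0 * np.eye(len(x)) / s + 4.0 * np.outer(x, x) / s**2
    return g, H

def ftrl_argmin(f_sum, eta, x0, steps=50):
    """Damped Newton steps for min_x eta * <f_sum, x> + R(x); the barrier keeps x in K."""
    x = x0.copy()
    for _ in range(steps):
        g, H = barrier_grad_hess(x)
        grad = eta * f_sum + g
        step = np.linalg.solve(H, grad)
        lam = np.sqrt(grad @ step)      # Newton decrement
        x = x - step / (1.0 + lam)      # damped step stays in the interior
    return x

def bandit_linear_opt(losses, eta=0.05, rng=np.random.default_rng(0)):
    """Run the sketch on a sequence of loss vectors f_t; return the total loss."""
    n = losses.shape[1]
    x = np.zeros(n)                     # analytic center of the ball
    f_sum = np.zeros(n)
    total = 0.0
    for f in losses:
        _, H = barrier_grad_hess(x)
        lams, es = np.linalg.eigh(H)    # axes of the Dikin ellipsoid at x_t
        i = rng.integers(n)
        eps = rng.choice([-1.0, 1.0])
        y = x + eps * es[:, i] / np.sqrt(lams[i])   # play on the Dikin ellipsoid
        loss = f @ y                    # only this scalar is observed
        total += loss
        f_tilde = n * loss * eps * np.sqrt(lams[i]) * es[:, i]  # unbiased estimate
        f_sum += f_tilde
        x = ftrl_argmin(f_sum, eta, x)
    return total

# usage: 200 rounds against a fixed loss vector in R^3
T, n = 200, 3
losses = np.tile(np.array([0.3, -0.1, 0.2]), (T, 1))
print(bandit_linear_opt(losses))
```

    Note how the barrier replaces an explicit projection: because R(x) blows up at the boundary of K, the unconstrained damped-Newton minimization keeps every iterate strictly inside the set.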

    3 Key ideas

    (1) The exploration/exploitation trade-off is the primary source of difficulty in obtaining O(√T) guarantees on the regret. Roughly two categories of approaches have been suggested for performing both exploration and exploitation: alternating explore/exploit, and simultaneous explore/exploit. The former suffers regret of at least Ω(T^{2/3}), whereas the second category is promising for obtaining an O(√T) bound.

    (2) The magnitude of the loss estimates is inversely proportional to the distance of x_t from the boundary of the decision set, which implies high variance of the estimated functions near the boundary. Therefore, a regularization scheme can be employed, as follows (the estimator is written out after the figure).

    [Figure: Regularization scheme]
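
    Concretely, with the eigendecomposition \nabla^2 R(x_t) = \sum_i \lambda_i e_i e_i^\top of the barrier's Hessian, the sampled point and the one-point loss estimate take the following form (my rendering of the scheme, with i_t uniform on {1, ..., n} and \varepsilon_t = \pm 1 uniform):

```latex
\[
  y_t = x_t + \varepsilon_t \, \lambda_{i_t}^{-1/2} e_{i_t},
  \qquad
  \tilde f_t = n \, \big(f_t^\top y_t\big) \, \varepsilon_t \, \lambda_{i_t}^{1/2} e_{i_t},
  \qquad
  \mathbb{E}\big[\tilde f_t\big] = f_t .
\]
```

    Near the boundary some eigenvalue \lambda_i grows like one over the squared distance to the boundary, so \|\tilde f_t\| scales like the inverse distance; this is exactly the variance blow-up that the barrier regularizer is chosen to control.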

    (3) Key building blocks used to prove Theorem 1 include Bregman divergences, self-concordant barriers, and the Dikin ellipsoid; standard definitions are recalled below.
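
    For reference, the standard definitions of these objects (general convex-optimization facts, not specific to this paper):

```latex
% Bregman divergence of a convex function R:
\[
  D_R(x, y) = R(x) - R(y) - \nabla R(y)^\top (x - y).
\]
% A \vartheta-self-concordant barrier R on int(K) satisfies, for all h:
\[
  \big| \nabla^3 R(x)[h, h, h] \big| \le 2 \, \big( h^\top \nabla^2 R(x) \, h \big)^{3/2},
  \qquad
  \big| \nabla R(x)^\top h \big| \le \vartheta^{1/2} \big( h^\top \nabla^2 R(x) \, h \big)^{1/2}.
\]
% The Dikin ellipsoid at x, which is always contained in K:
\[
  W_1(x) = \{\, y : \|y - x\|_x \le 1 \,\},
  \qquad
  \|h\|_x = \sqrt{h^\top \nabla^2 R(x) \, h}.
\]
```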
