美文网首页
[Notes]Lecture14 Stochastic Mult

[Notes]Lecture14 Stochastic Mult

作者: 半山来客 | 来源:发表于2018-11-10 08:33 被阅读0次

文章链接:http://www-bcf.usc.edu/~haipengl/courses/CSCI699/lecture14.pdf

  • 这篇讲义主要介绍了Stochastic MAB 的一些基本概念,有很多数学公式及证明,如果要从数学角度理解细节和推敲,可以参考。
  • 第一部分将Stochastic MAB的基本概念,讲解了pesudo-regret。
  • 第二部分 2 First Attempt: Explore-then-exploit 一种基本思想,引出了bound。
  • 第三部分 3 The UCB Algorithm, 讲义中实际中讨论的Lower Bound,思想与UCB对称。

内容与读过的几篇高度重叠,只作部分摘录:

Stochastic Multi-armed Bandit

Pseudo-regret

Pseudo-regret is the expected regret against the fixed action a^*(instead of the empirically best actiontion, where the expectation is over the randomness of both the environment and the algorithm.

  • Pseudo-regret can be simplified as:


    Simpified pseudo-regret
  • pseudo-regret of UCB is bounded as:


    pseudo-regret bound

Symbols


  • a:each action
  • D_a:Distribution
  • l_1(a),\dots,l_T(a): Independent samples of D_a
  • a^*=argmin_a \mu(a): action argmin_a \hat{\mu}(a) : Optimal action on terms of the expected lossEmpirically best
  • \Delta_a = \mu(a)-\mu(a^*): the suboptiomal gap of action a
  • The number of times action a has been pulled up to round t

相关文章

网友评论

      本文标题:[Notes]Lecture14 Stochastic Mult

      本文链接:https://www.haomeiwen.com/subject/fdoaxqtx.html