[Notes]Lecture14 Stochastic Mult

Author: 半山来客 | Published: 2018-11-10 08:33

Link to the lecture notes: http://www-bcf.usc.edu/~haipengl/courses/CSCI699/lecture14.pdf

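This lecture introduces the basic concepts of the stochastic multi-armed bandit (MAB) problem. It contains many formulas and proofs, so it is a good reference for working through the mathematical details.
Part 1 covers the basic concepts of stochastic MAB and explains pseudo-regret.
Part 2, "2 First Attempt: Explore-then-exploit", presents a basic idea and uses it to derive a first regret bound.
Part 3, "3 The UCB Algorithm": what the lecture actually works with is a lower bound (the setting is stated in terms of losses), and the idea mirrors UCB.

The content overlaps heavily with several papers I have already read, so only a partial excerpt follows: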

Stochastic Multi-armed Bandit

Pseudo-regret

Pseudo-regret is the expected regret against the fixed action $a^*$ (instead of the empirically best action), where the expectation is over the randomness of both the environment and the algorithm.

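Pseudo-regret can be simplified as:

$\bar{R}_T = \sum_{a} \Delta_a \, \mathbb{E}[n_T(a)]$

where $\bar{R}_T$ is the pseudo-regret after $T$ rounds, $\Delta_a$ is the suboptimality gap of action $a$, and $n_T(a)$ is the number of times action $a$ has been pulled up to round $T$.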
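As a rough illustration of the simplification, pseudo-regret can be estimated by weighting each action's pull count by its gap $\mu(a)-\mu(a^*)$ and averaging over runs. The sketch below is not from the lecture: the mean losses, the uniformly random policy, and the function names are all assumptions chosen only to make the formula concrete.

```python
import random

def gap_weighted_pulls(means, pull_counts):
    """Return sum_a Delta_a * n_T(a) for one run (losses: lower mean is better)."""
    best = min(means)
    return sum((mu - best) * n for mu, n in zip(means, pull_counts))

def run_uniform_policy(num_arms, horizon, rng):
    """Hypothetical baseline policy: pull an arm uniformly at random each round."""
    counts = [0] * num_arms
    for _ in range(horizon):
        counts[rng.randrange(num_arms)] += 1
    return counts

# Assumed mean losses mu(a) for three arms; arm 0 is optimal.
means = [0.2, 0.5, 0.7]
rng = random.Random(0)
runs = 200

# Averaging over runs approximates E[n_T(a)], hence the pseudo-regret.
avg = sum(
    gap_weighted_pulls(means, run_uniform_policy(len(means), 300, rng))
    for _ in range(runs)
) / runs
print(avg)
```

For the uniform policy every arm is pulled about $T/K$ times in expectation, so the estimate grows linearly in $T$, which is the behaviour a good bandit algorithm is meant to avoid.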
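The pseudo-regret of UCB is bounded, in the standard gap-dependent form and up to constant factors, as:

$\bar{R}_T = O\left(\sum_{a:\Delta_a>0} \frac{\ln T}{\Delta_a}\right)$

where $\bar{R}_T$ is the pseudo-regret after $T$ rounds and $\Delta_a$ is the suboptimality gap of action $a$.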
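A minimal sketch of a UCB-style policy for losses may help make the bound concrete. Because the notes work with losses, the optimistic index subtracts the confidence width from the empirical mean. The Bernoulli losses, the width $\sqrt{2\ln T / n}$, and all names below are assumptions for illustration, not necessarily the constants used in the lecture.

```python
import math
import random

def ucb_losses(means, horizon, rng):
    """Run a UCB-style index policy on Bernoulli losses; return pull counts per arm."""
    k = len(means)
    counts = [0] * k        # number of pulls of each arm so far
    loss_sums = [0.0] * k   # cumulative observed loss of each arm
    for t in range(horizon):
        if t < k:
            arm = t  # pull every arm once so each empirical mean is defined
        else:
            # Optimism for losses: empirical mean minus an assumed confidence width.
            arm = min(
                range(k),
                key=lambda a: loss_sums[a] / counts[a]
                - math.sqrt(2 * math.log(horizon) / counts[a]),
            )
        loss = 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
        loss_sums[arm] += loss
    return counts

means = [0.2, 0.5, 0.7]   # assumed mean losses; arm 0 is optimal
horizon = 5000
counts = ucb_losses(means, horizon, random.Random(0))

gaps = [mu - min(means) for mu in means]
regret_estimate = sum(g * n for g, n in zip(gaps, counts))
# Scale of the gap-dependent bound, ignoring constant factors.
bound_scale = sum(math.log(horizon) / g for g in gaps if g > 0)
print(counts, regret_estimate, bound_scale)
```

The printed `bound_scale` only gives the order of magnitude of the bound, since constants are dropped; the point of the comparison is that suboptimal arms are pulled a number of times that grows with $\ln T$ rather than with $T$.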

Symbols

$a$ : an action (arm)
$D_a$ : the loss distribution of action $a$, with mean $\mu(a)$
$l_1(a),\dots,l_T(a)$ : independent samples from $D_a$
$a^* = \arg\min_a \mu(a)$ : the optimal action in terms of the expected loss
$\arg\min_a \hat{\mu}(a)$ : the empirically best action
$\Delta_a = \mu(a)-\mu(a^*)$ : the suboptimality gap of action $a$
$n_t(a)$ : the number of times action $a$ has been pulled up to round $t$
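With this notation, the explore-then-exploit idea from Part 2 of the lecture can be sketched roughly as follows: pull every action a fixed number of times, then commit to the empirically best one for the remaining rounds. The Bernoulli losses, the parameter `pulls_per_arm`, and the function name are assumptions made for this sketch; the lecture's choice of exploration length is not reproduced here.

```python
import random

def explore_then_exploit(means, horizon, pulls_per_arm, rng):
    """Explore each arm equally, then commit to the arm with lowest empirical loss."""
    k = len(means)
    if k * pulls_per_arm > horizon:
        raise ValueError("exploration phase is longer than the horizon")
    counts = [0] * k
    loss_sums = [0.0] * k

    def pull(arm):
        # Bernoulli loss with the arm's (assumed) mean.
        loss = 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
        loss_sums[arm] += loss

    # Exploration phase: sample every arm the same number of times.
    for arm in range(k):
        for _ in range(pulls_per_arm):
            pull(arm)

    # Exploitation phase: commit to the empirically best arm.
    committed = min(range(k), key=lambda a: loss_sums[a] / counts[a])
    for _ in range(horizon - k * pulls_per_arm):
        pull(committed)
    return committed, counts

means = [0.2, 0.5, 0.7]   # assumed mean losses; arm 0 is optimal
committed, counts = explore_then_exploit(means, 2000, 200, random.Random(0))
print(committed, counts)
```

The trade-off is visible in `pulls_per_arm`: too few exploration pulls risk committing to a suboptimal arm for the rest of the horizon, while too many spend a fixed number of rounds on every suboptimal arm regardless of how clearly it is worse.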
