Full paper: https://arxiv.org/pdf/1204.5721.pdf
Introduction
- Applications: ad placement, website optimization, and packet routing.
- Goal: maximize the total payoff obtained in a sequence of allocations.
- trade-off between exploration and exploitation in sequential experiments
- Three fundamental formalizations, depending on the nature of the reward process: stochastic, adversarial, and Markovian.
- Three playing strategies:
- UCB algorithm (stochastic case)
- Exp3 randomized algorithm (adversarial case)
- Gittins indices (Markovian case)
Regret:
$$R_n = \max_{i=1,\dots,K} \sum_{t=1}^{n} X_{i,t} - \sum_{t=1}^{n} X_{I_t,t}$$
where:
- $K$: number of arms
- $i$: arm index, $i = 1, 2, \dots, K$
- $t$: time step of selecting arms, $t = 1, 2, \dots, n$
- $n$: total number of plays
- $I_t$: the arm selected at step $t$ (choice)
- $X_{i,t}$: unknown reward associated with arm $i$ at step $t$
- $\max_{i} \sum_{t} X_{i,t}$: total reward of the optimal fixed arm; $\sum_{t} X_{I_t,t}$: total reward received by the forecaster (reward)
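To make the definition concrete, here is a minimal NumPy sketch (the reward table, the forecaster's choices, and the random filler data are illustrative stand-ins, not anything from the paper) that evaluates $R_n$ for one run:

```python
import numpy as np

rng = np.random.default_rng(0)
K, n = 4, 1000
X = rng.random((K, n))                    # X[i, t]: reward of arm i at step t
choices = rng.integers(0, K, size=n)      # choices[t]: the arm I_t played at step t

forecaster_reward = X[choices, np.arange(n)].sum()  # sum_t X_{I_t, t}
best_fixed_arm = X.sum(axis=1).max()                # max_i sum_t X_{i, t}
regret = best_fixed_arm - forecaster_reward
print(regret)
```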
In general, both the rewards $X_{i,t}$ and the forecaster's choices $I_t$ might be stochastic. This motivates distinguishing between the following two notions of averaged regret:
Expected regret
$$\mathbb{E}[R_n] = \mathbb{E}\left[\max_{i=1,\dots,K} \sum_{t=1}^{n} X_{i,t} - \sum_{t=1}^{n} X_{I_t,t}\right]$$
The expectation of the regret with respect to the action which is optimal on the realized sequence of rewards.
Pseudo-regret
$$\bar{R}_n = \max_{i=1,\dots,K} \mathbb{E}\left[\sum_{t=1}^{n} X_{i,t} - \sum_{t=1}^{n} X_{I_t,t}\right]$$
A weaker notion of regret, since one competes against the action which is optimal only in expectation.
- Difference from the expected regret:
  - Expected regret: for each realization of the rewards, take the total reward of the single best fixed arm over all n steps, subtract the forecaster's total reward, and then take the expectation (the max sits inside the expectation).
  - Pseudo-regret: first fix an arm i, take the expectation of its total reward over n plays minus the forecaster's total reward, and finally take the max of these expectations over the K arms (the max sits outside the expectation).
  - In both definitions the comparator is a single fixed arm held for all n steps; it is not re-chosen at each step t.
Pseudo-regret ≤ Expected regret
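The inequality follows by moving the max inside the expectation: for each fixed arm $i$, the bracketed difference is pointwise at most its maximum over arms, and expectation preserves this:
$$\bar{R}_n = \max_{i=1,\dots,K} \mathbb{E}\left[\sum_{t=1}^{n} X_{i,t} - \sum_{t=1}^{n} X_{I_t,t}\right] \le \mathbb{E}\left[\max_{i=1,\dots,K}\left(\sum_{t=1}^{n} X_{i,t} - \sum_{t=1}^{n} X_{I_t,t}\right)\right] = \mathbb{E}[R_n]$$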
Stochastic bandits (UCB algorithm)
In the stochastic setting, each arm $i$ corresponds to an unknown probability distribution $\nu_i$, and playing arm $i$ at step $t$ yields a reward $X_{i,t}$ drawn i.i.d. from $\nu_i$.
- $\mu_i$: the mean of $\nu_i$ (mean reward of arm $i$).
- $\mu^* = \max_{i=1,\dots,K} \mu_i$: the mean reward of the optimal arm.
Lai and Robbins [125] introduced the technique of upper confidence bounds for the asymptotic analysis of regret:
[125] T. L. Lai and H. Robbins, “Asymptotically efficient adaptive allocation rules,” Advances in Applied Mathematics, vol. 6, pp. 4–22, 1985.
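To show what an upper-confidence-bound strategy looks like in code, here is a minimal Python sketch of the classic UCB1 rule of Auer, Cesa-Bianchi, and Fischer (one member of the UCB family the survey analyzes); the Bernoulli test arms at the bottom are illustrative:

```python
import math
import random

def ucb1(n_rounds, arms):
    """UCB1: after playing each arm once, pick the arm maximizing
    empirical mean + sqrt(2 ln t / n_i). `arms` is a list of functions
    that each sample a reward in [0, 1]."""
    K = len(arms)
    counts = [0] * K   # n_i: number of times arm i was played
    sums = [0.0] * K   # cumulative reward of arm i
    total = 0.0
    for t in range(1, n_rounds + 1):
        if t <= K:
            i = t - 1  # initialization: play each arm once
        else:
            i = max(range(K), key=lambda j: sums[j] / counts[j]
                    + math.sqrt(2 * math.log(t) / counts[j]))
        x = arms[i]()  # observe the reward of the chosen arm
        counts[i] += 1
        sums[i] += x
        total += x
    return total

# Illustrative example: two Bernoulli arms with means 0.4 and 0.6.
arms = [lambda: float(random.random() < 0.4),
        lambda: float(random.random() < 0.6)]
print(ucb1(10000, arms))
```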
Adversarial bandits (Exp3 randomized algorithm)
- Adversary/opponent: the mechanism that sets the sequence of gains for each arm.
- Oblivious adversary: the mechanism is independent of the forecaster's actions.
- Nonoblivious adversary: the adversary adapts to the forecaster's past behavior.
  - E.g., in the rigged casino the owner observes the way a gambler plays in order to design even more evil sequences of gains.
In this adversarial setting, the goal is to obtain regret bounds, in high probability or in expectation, with respect to any possible randomization in the strategies used by the forecaster or the opponent, and irrespective of the opponent. In the case of a nonoblivious adversary this is not an easy task, and for this reason we usually start by bounding the pseudo-regret:
$$\bar{R}_n = \max_{i=1,\dots,K} \mathbb{E}\left[\sum_{t=1}^{n} X_{i,t} - \sum_{t=1}^{n} X_{I_t,t}\right]$$
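A minimal Python sketch of the loss-based Exp3 update (exponential weights on importance-weighted loss estimates; the variable names, the random loss table, and the fixed learning rate $\eta = \sqrt{2\ln K/(nK)}$ are standard but illustrative choices for a known horizon $n$):

```python
import math
import random

def exp3(n_rounds, losses, eta):
    """Exp3: draw I_t from the normalized weights, observe only the
    chosen arm's loss, and update that arm's weight with an
    importance-weighted (unbiased) loss estimate."""
    K = len(losses[0])
    weights = [1.0] * K
    total_loss = 0.0
    for t in range(n_rounds):
        s = sum(weights)
        probs = [w / s for w in weights]
        i = random.choices(range(K), weights=probs)[0]  # I_t ~ p_t
        loss = losses[t][i]            # only the chosen arm's loss is seen
        total_loss += loss
        # estimate: (loss / p_i) for the chosen arm, 0 for all others
        weights[i] *= math.exp(-eta * loss / probs[i])
    return total_loss

# Illustrative example: an oblivious adversary fixes all losses in advance.
random.seed(0)
n, K = 5000, 3
losses = [[random.random() for _ in range(K)] for _ in range(n)]
eta = math.sqrt(2 * math.log(K) / (n * K))
print(exp3(n, losses, eta))
```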
Markovian bandits (Gittins indices)
- Not much will be discussed in this monograph
- A special case of Markovian bandits: Bayesian bandits.
Terminology
- forecaster = player (i.e., the agent implementing a bandit strategy)