美文网首页
[Notes]Regret Analysis of Stocha

[Notes]Regret Analysis of Stocha

作者: 半山来客 | 来源:发表于2018-10-22 13:43 被阅读0次

完整paper地址:https://arxiv.org/pdf/1204.5721.pdf

Introduction

  • Application: ad placement, website optimization, and packeting route.
  • Goal: maximize the total payoff obtained in a sequence of allocations.
  • trade-off between exploration and exploitation in sequential experiments
  • 3 fundamental formalization depending on reward process: stochastic, adversarial, Markovian.
  • 3 playing strategy:
    • UCB algorithm (stochastic case)
    • Exp3 randomized algorithm(adversarial case)
    • Gittins indices(Markovian case)

Regret R_n:

Regret Definition

WHERE:

  • K: number of arms ,
  • i: arm i ,i=1,2,3,4...,K
  • t:steps of selecting arms. t=1,2,3...
  • n: n times plays
  • I_t: ther arm selected in step t (Choice)
  • X_{i,t}: unkown rewards associated with arm i
    ( Max \sum X_{i,t} -- Optimal )
  • X_{I_t,t}:reward received by select arm I_t (Reward)

In general, both rewards Xi,t and forecaster’s choices It might be stochastic. Distinguish between the two following notions of averaged regret:

Expected regret

expected regret
The expectation of the regret with respect to the action which is optimal on the sequence of reward realizations.
Psudo-regret

A weaker notion of regret, since one competes against the action which is optimal only in expectation.

  • 与expected regret的区别在于:
    • expected regret是K个摇臂中各自的总收益中选出最大的(相当于选reward最大的n次),减去forecaster的总reward,求期望。
    • Pseudo-regret是先选定某个arm i,选n次求和计算reward,减去forecaster的总reward,再求期望。最后再从K个arms的各自的这个期望中选择最大的。

?感觉这个解释不太对啊?再看看,每个t中选i的时候可以变吗?

Pseudo-regret ≤ Expected regret

Stochastic bandits (UCB algorithm)

Stochastic bandits

Probability distributions ν_i,\dots,v_k~ on ~[0,1] ~~ X_{I_t,t} \sim v_{I_t}
\mu_i: the mean of ν_i (mean reward of arm i).

Psudo-regret

\mu_i: the mean of ν_i (Mean reward of arm i).

Lai and Robbins [125], who introduced the technique of upper confidence bounds for the asymptotic analysis of regret:
[125] T. L. Lai and H. Robbins, “Asymptotically efficient adaptive allocation rules,” Advances in Applied Mathematics, vol. 6, pp. 4–22, 1985.

Adversarial bandits(Exp3 randomized algorithm)

Adversarial bandits
  • Adversary/opponent: the mechanism that sets the sequence of gains for each arm.
  • Obvious adversary: The mechanism is independent of the forecaster's actions
  • Nonobvious adversary: the adversary adapt to the forecaster's past behavior
    eg: in the rigged casino the owner observes the way a gambler plays in order to design even more evil sequences of gains.

In this adversarial setting, the goal is to obtain regret bounds in high probability or in expectation with respect to any possible randomization in the strategies used by the forecaster or the opponent and irrespective of the opponent. In the case of a nonobvious adversary this is not an easy task, and for this reason, we usually start by bounding
the pseudo-regret:


image.png

Markovian bandits(Gittins indices)

  • Not much will be discussed in this monograph
  • A special case of Markovian bandits: Bayesian bandits.

Terminology

  • forecaster = player(i.e.,the agent implementing a bandit strategy)

相关文章

  • [Notes]Regret Analysis of Stocha

    完整paper地址:https://arxiv.org/pdf/1204.5721.pdf Introductio...

  • Privacy Policy

    The "Weight notes-Instant analysis" app respects and prot...

  • Algorithms Notes (1) - Analysis

    转载请注明出处:http://www.jianshu.com/p/e535ce1d30ab The note fo...

  • WSE Option B Day15

    Key word: Regret "We usually regret that the chances we m...

  • regret

    其实也不知道该从何说起,太高大上的开端,往往意味着文章不够贴近于我所想。学习,确实非一件易事,坚持犹难,可,纵如我...

  • regret

    你轻轻对我说 从来没喜欢我(在意) 轻轻对我说 从来不曾爱我 轻轻地对我说 从来,不曾,爱我 从来不曾爱我 轻轻地...

  • Regret 👣

    写信告诉我, 海是什么颜色。 夜夜陪着你的海, 你的心情又如何。 海面上白帆点点, 天上白云相映成辉。 阳光依然很...

  • regret

    我想握紧她的手 可是她的手总是会脱落 不知道什么时候开始对她小心翼翼的说话了 我记得我以前还挺嚣张的 像是我们之前...

  • Regret

    让自己不后悔的方法之一,就是告诉自己一切都是天注定。

  • regret

    不在一个挣扎存在的世界,也许你没有那么多的烦恼。 没有长了一颗不想从众的心,也许你没有那么多的烦恼。 不是一个习惯...

网友评论

      本文标题:[Notes]Regret Analysis of Stocha

      本文链接:https://www.haomeiwen.com/subject/fjiwzftx.html