Lecture 3 Part A: Follow up

Author: Ysgc | Published: 2020-01-20 09:19

1. When DAgger works fully: p_{train}(s) = p_{\theta}(s)

  • best case: E[\sum_t c(s_t, a_t)] \le \epsilon T
  • (once the agent moves into the red region it can move back, since DAgger has collected training data there)

2. Distribution mismatch: p_{train}(s) \ne p_{\theta}(s)

  • probability of no mistake up to step t: (1-\epsilon)^t
  • probability of at least one mistake: the rest, 1-(1-\epsilon)^t; the state distribution is then p_{mistake}(s_t)
  • total variation divergence: measured between p_{train}(s_t) and p_{\theta}(s_t)
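
The three bullets above can be written as a mixture. This is a sketch of the standard argument, assuming the policy errs with probability at most \epsilon on states drawn from p_{train}; the step index t is notation added here, not something the notes state explicitly:

```latex
% State distribution under the learned policy at step t:
% no mistake so far (stays on p_train) vs. at least one mistake (p_mistake)
p_{\theta}(s_t) = (1-\epsilon)^t \, p_{train}(s_t)
                + \bigl(1 - (1-\epsilon)^t\bigr) \, p_{mistake}(s_t)

% Subtracting p_train(s_t) from both sides isolates the mismatch term
\bigl| p_{\theta}(s_t) - p_{train}(s_t) \bigr|
  = \bigl(1 - (1-\epsilon)^t\bigr) \,
    \bigl| p_{mistake}(s_t) - p_{train}(s_t) \bigr|
```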

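What is the worst case of this divergence?

  • a factor of 2: \sum_{s_t} |p_{mistake}(s_t) - p_{train}(s_t)| \le 2, reached when the two distributions have disjoint support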


c_{max} = 1 (the maximum per-step cost)
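
To see how the pieces combine, here is a minimal numerical sketch. It assumes a per-step cost in [0, 1] (so c_{max} = 1), steps indexed t = 1..T, and the inequality 1-(1-\epsilon)^t \le \epsilon t. The function name `cost_bounds` and its arguments are hypothetical, chosen only for illustration. It compares the DAgger-style bound \epsilon T with the mismatch bound \sum_t (\epsilon + 2(1-(1-\epsilon)^t)) and its looser closed form \epsilon T + 2\epsilon T^2.

```python
C_MAX = 1.0  # maximum per-step cost, as assumed in the notes


def cost_bounds(eps: float, horizon: int) -> tuple[float, float, float]:
    """Return (no_mismatch, tight, loose) upper bounds on expected total cost.

    no_mismatch: eps * T, the bound when p_train equals p_theta.
    tight:       sum over t of eps + 2 * (1 - (1 - eps)**t) * C_MAX.
    loose:       eps * T + 2 * eps * T**2, using 1 - (1 - eps)**t <= eps * t.
    """
    no_mismatch = eps * horizon

    tight = 0.0
    for t in range(1, horizon + 1):
        # Probability of at least one mistake within the first t steps.
        p_off = 1.0 - (1.0 - eps) ** t
        # Worst-case sum over states of |p_theta - p_train| is 2 * p_off.
        tight += eps + 2.0 * p_off * C_MAX

    loose = eps * horizon + 2.0 * eps * horizon**2 * C_MAX
    return no_mismatch, tight, loose


if __name__ == "__main__":
    for T in (10, 100, 1000):
        no_mismatch, tight, loose = cost_bounds(eps=0.01, horizon=T)
        print(f"T={T}: eps*T={no_mismatch:.2f}, "
              f"tight={tight:.2f}, loose={loose:.2f}")
```

The loose form grows quadratically in T, while the no-mismatch bound grows only linearly, which is the gap the notes contrast between items 1 and 2.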


thoughts:
