

1. with DAgger fully works:
- the best of E[∑c] = epsilon · T
- (once move to the red region, agent could move back, since DAgger)
2. Distribution mismatch:
- no mistake Prob:
- mistake Prob: the rest -> state distribution is
- total variational divergence: between
and
-
what's the worse case of this divergence?
-
factor of "2"



thoughts:
网友评论