![](https://img.haomeiwen.com/i11683600/de6707335fcc4482.png)
![](https://img.haomeiwen.com/i11683600/6aec323699bcf3f3.png)
1. When DAgger works fully:
- best case: $E\left[\sum_t c(s_t, a_t)\right] \le \epsilon T$
- (once the agent moves into the red region, it can still move back, thanks to DAgger)
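2. Distribution mismatch:
- probability of no mistake in the first $t$ steps: $(1-\epsilon)^t$
- probability of at least one mistake: the rest, $1-(1-\epsilon)^t$, so the state distribution is $p_\theta(s_t) = (1-\epsilon)^t\, p_{\text{train}}(s_t) + \left(1-(1-\epsilon)^t\right) p_{\text{mistake}}(s_t)$
- total variation divergence between $p_\theta(s_t)$ and $p_{\text{train}}(s_t)$, where $|\cdot|$ is the sum of absolute differences over states: $|p_\theta(s_t) - p_{\text{train}}(s_t)| = \left(1-(1-\epsilon)^t\right) |p_{\text{mistake}}(s_t) - p_{\text{train}}(s_t)|$
- what's the worst case of this divergence? $|p_{\text{mistake}}(s_t) - p_{\text{train}}(s_t)| \le 2$, so the divergence is at most $2\left(1-(1-\epsilon)^t\right) \le 2\epsilon t$
- the factor of "2": summed over all states, two distributions differ by at most 2, which happens when they share no support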
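
The bounds above can be checked numerically. The sketch below is only an illustration under stated assumptions: the function names are hypothetical, the per-step cost is assumed to be at most 1, and the total-cost bound assumed here is the sum over $t$ of $\epsilon + 2\epsilon t$, which grows like $\epsilon T^2$ (compared with $\epsilon T$ when DAgger works fully).

```python
def mismatch_bound(eps: float, t: int) -> float:
    """Worst-case divergence term 2 * (1 - (1 - eps)^t) at step t."""
    return 2.0 * (1.0 - (1.0 - eps) ** t)


def linear_bound(eps: float, t: int) -> float:
    """Looser bound 2 * eps * t, from (1 - eps)^t >= 1 - eps * t."""
    return 2.0 * eps * t


def cost_bound(eps: float, horizon: int) -> float:
    """Assumed total-cost bound: sum over t = 1..T of (eps + 2 * eps * t)."""
    return sum(eps + linear_bound(eps, t) for t in range(1, horizon + 1))


if __name__ == "__main__":
    eps = 0.01  # assumed per-step mistake probability
    for horizon in (10, 100, 1000):
        with_dagger = eps * horizon  # linear in T
        without_dagger = cost_bound(eps, horizon)  # quadratic in T
        print(horizon, with_dagger, without_dagger)
```

The tighter term `mismatch_bound` saturates at 2 as `t` grows, while `linear_bound` keeps increasing, so the linear form is only informative while `eps * t` is small.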
![](https://img.haomeiwen.com/i11683600/71826d38a5e9f1b9.png)
![](https://img.haomeiwen.com/i11683600/e9b99c4c3e52a287.png)
![](https://img.haomeiwen.com/i11683600/2a6ecf6e0d692ce7.png)
thoughts: