2018-12-20 PPO debug experience

Author: 云雨惊袭明月夜 | Published: 2018-12-20 23:16

    PPO Debug Experience

    Recently, I needed to run PPO in a complex env. I looked at some implementations on GitHub, but I couldn't follow what they were doing...

    After reading the PPO paper, I decided to write the code myself.

    I already had some experience writing RL code, so after several minutes I had finished a first version on Gym's CartPole-v0. However, it didn't work...

    Then I checked the core algorithm again and again... Sadly, the code still did not work.

    So I began to suspect that the agent's interaction with the env was wrong...
    Then I started to debug the interaction between the agent and the env.
    Luckily, I found that the reward (or Gt/advantage) calculation was wrong. So I went through some papers that deal with advantage estimation, such as GAE and TRPO...

    Then I changed the way the reward is calculated, and the code worked.
    You can click here to see my code.
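
    The post does not say which return or advantage calculation ended up in the final code, so the following is only a minimal sketch of the two quantities mentioned above: the discounted return Gt and the GAE advantage. The function names, the `dones` flag list and the default `gamma`/`lam` values are assumptions for illustration, not taken from the author's code. A common bug in this step is letting the return leak across episode boundaries, which is why both loops reset at `done`.

```python
from typing import List, Sequence, Tuple


def discounted_returns(rewards: Sequence[float],
                       dones: Sequence[bool],
                       gamma: float = 0.99) -> List[float]:
    """Gt = r_t + gamma * G_{t+1}, restarted at every episode end.

    Assumes the value after the last step of the rollout is 0,
    i.e. no bootstrapping for a rollout that is cut off mid-episode.
    """
    returns = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        if dones[t]:
            # Step t ended an episode: nothing from later steps belongs to it.
            running = 0.0
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def gae_advantages(rewards: Sequence[float],
                   values: Sequence[float],
                   dones: Sequence[bool],
                   last_value: float,
                   gamma: float = 0.99,
                   lam: float = 0.95) -> Tuple[List[float], List[float]]:
    """Generalized Advantage Estimation over one rollout.

    values[t] is the critic's estimate V(s_t); last_value is V of the
    state reached after the final step (used only if that step did not
    end an episode). Returns (advantages, value targets).
    """
    advantages = [0.0] * len(rewards)
    gae = 0.0
    next_value = last_value
    for t in reversed(range(len(rewards))):
        nonterminal = 0.0 if dones[t] else 1.0
        # One-step TD error; the bootstrap term is dropped at episode end.
        delta = rewards[t] + gamma * next_value * nonterminal - values[t]
        gae = delta + gamma * lam * nonterminal * gae
        advantages[t] = gae
        next_value = values[t]
    targets = [a + v for a, v in zip(advantages, values)]
    return advantages, targets
```

    With `lam=1.0` and an all-zero critic, the GAE advantages reduce to the plain discounted returns, which makes a handy sanity check when debugging this part of a PPO implementation.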

Article link: https://www.haomeiwen.com/subject/xecpkqtx.html