Asynchronous Methods for Deep Reinforcement Learning

Author: 初七123 | Published 2019-01-10 21:14

    Introduction

    Deep RL algorithms based on experience replay have achieved unprecedented success in challenging domains such as Atari 2600. However, experience replay has several drawbacks: it uses more memory and computation per real interaction, and it requires off-policy learning algorithms that can update from data generated by an older policy.
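    To make the memory cost and the off-policy requirement concrete, here is a minimal replay-buffer sketch (illustrative only; the capacity and batch size are arbitrary choices, not values from the paper):

    import random
    from collections import deque

    class ReplayBuffer:
        # Every real interaction is stored, so memory and compute grow with
        # the buffer capacity instead of each transition being used once.
        def __init__(self, capacity=1_000_000):
            self.buffer = deque(maxlen=capacity)

        def add(self, state, action, reward, next_state, done):
            self.buffer.append((state, action, reward, next_state, done))

        def sample(self, batch_size=32):
            # Uniform sampling mixes transitions generated by older policies
            # with recent ones, which is why an off-policy learner such as
            # Q-learning is needed to update from these batches.
            return random.sample(self.buffer, batch_size)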

    Related Work

    In Gorila, each process contains an actor that acts in its own copy of the environment, a separate replay memory, and a learner that samples data from the replay memory and computes gradients of the DQN loss (Mnih et al., 2015) with respect to the policy parameters. The gradients are asynchronously sent to a central parameter server which updates a central copy of the model.
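    A rough sketch of that data flow, assuming a hypothetical ParameterServer class and placeholder env, replay_memory and q_network objects (the paper does not specify an API):

    import threading

    class ParameterServer:
        # Central copy of the model parameters; applies gradients that arrive
        # asynchronously from many learner processes (a sketch, not Gorila's API).
        def __init__(self, params, lr=1e-4):
            self.params = list(params)
            self.lr = lr
            self.lock = threading.Lock()

        def apply_gradients(self, grads):
            with self.lock:
                self.params = [p - self.lr * g for p, g in zip(self.params, grads)]

        def get_params(self):
            with self.lock:
                return list(self.params)

    def gorila_worker(server, env, replay_memory, q_network):
        # Actor side: act in a private copy of the environment and store transitions.
        # Learner side: sample the local replay memory, compute DQN-loss gradients,
        # and ship them to the central parameter server. env, replay_memory and
        # q_network (and their methods) are hypothetical placeholders.
        while True:
            q_network.set_params(server.get_params())
            replay_memory.add(q_network.act_one_step(env))
            batch = replay_memory.sample(32)
            server.apply_gradients(q_network.dqn_loss_gradients(batch))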

    (Tsitsiklis, 1994) studied convergence properties of Q-learning in the asynchronous optimization setting. These results show that Q-learning is still guaranteed to converge when some of the information is outdated, as long as outdated information is always eventually discarded and several other technical assumptions are satisfied.
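    For concreteness, the update covered by these results can be written in the standard tabular Q-learning form (a standard formulation, not an equation quoted from the paper), where the bootstrap term may use values that are \tau steps out of date:

    Q_{t+1}(s,a) = (1 - \alpha_t)\, Q_t(s,a) + \alpha_t \left( r + \gamma \max_{a'} Q_{t-\tau}(s', a') \right)

    Convergence still holds provided the delays \tau are finite, so outdated values are eventually replaced, and the step sizes \alpha_t satisfy the usual step-size conditions.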

    Asynchronous RL Framework

    We now present multi-threaded asynchronous variants of one-step Sarsa, one-step Q-learning, n-step Q-learning, and advantage actor-critic.
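    Below is a minimal sketch of one such thread in the spirit of the asynchronous one-step Q-learning variant, not the paper's exact algorithm; shared_net, target_net and their methods are hypothetical placeholders, and the update frequencies are illustrative:

    def async_one_step_q_worker(shared_net, target_net, make_env,
                                gamma=0.99, update_every=5, target_sync=10000):
        # Each thread owns a private environment copy, accumulates gradients
        # for a few steps, then applies them asynchronously to the shared
        # parameters; no replay memory is used.
        env = make_env()
        state = env.reset()
        grads = shared_net.zero_grads()
        t = 0
        while True:
            action = shared_net.epsilon_greedy(state)           # behaviour policy
            next_state, reward, done = env.step(action)
            # One-step Q-learning target: r + gamma * max_a' Q(s', a'; theta^-)
            target = reward if done else reward + gamma * target_net.max_q(next_state)
            grads += shared_net.q_loss_gradients(state, action, target)
            state = env.reset() if done else next_state
            t += 1
            if done or t % update_every == 0:
                shared_net.apply_gradients(grads)   # asynchronous update of shared theta
                grads = shared_net.zero_grads()
            if t % target_sync == 0:
                target_net.copy_from(shared_net)    # periodic target-network refresh

    Several such workers run in parallel against the same shared network; in the paper the target-network refresh is driven by a global step counter shared across threads, whereas a per-thread counter is used here only to keep the sketch self-contained.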
