Asynchronous Methods for Deep Reinforcement Learning

Author: 初七123 | Published 2019-01-10 21:14

    Introduction

    Deep RL algorithms based on experience replay have achieved unprecedented success in challenging domains such as Atari 2600. However, experience replay has several drawbacks: it uses more memory and computation per real interaction, and it requires off-policy learning algorithms that can update from data generated by an older policy.
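    To make the memory cost and the off-policy requirement concrete, here is a minimal replay-buffer sketch (illustrative only; the capacity and batch size are arbitrary choices, not values from the paper):

    import random
    from collections import deque

    class ReplayBuffer:
        # Every real interaction is stored, so memory and compute grow with
        # the buffer capacity instead of each transition being used once.
        def __init__(self, capacity=1_000_000):
            self.buffer = deque(maxlen=capacity)

        def add(self, state, action, reward, next_state, done):
            self.buffer.append((state, action, reward, next_state, done))

        def sample(self, batch_size=32):
            # Uniform sampling mixes transitions generated by older policies
            # with recent ones, which is why an off-policy learner such as
            # Q-learning is needed to update from these batches.
            return random.sample(self.buffer, batch_size)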

    Related Work

    In Gorila, each process contains an actor that acts in its own copy of the environment, a separate replay memory, and a learner that samples data from the replay memory and computes gradients of the DQN loss (Mnih et al., 2015) with respect to the policy parameters. The gradients are asynchronously sent to a central parameter server which updates a central copy of the model.
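    A rough sketch of that data flow, assuming a hypothetical ParameterServer class and placeholder env, replay_memory and q_network objects (the paper does not specify an API):

    import threading

    class ParameterServer:
        # Central copy of the model parameters; applies gradients that arrive
        # asynchronously from many learner processes (a sketch, not Gorila's API).
        def __init__(self, params, lr=1e-4):
            self.params = list(params)
            self.lr = lr
            self.lock = threading.Lock()

        def apply_gradients(self, grads):
            with self.lock:
                self.params = [p - self.lr * g for p, g in zip(self.params, grads)]

        def get_params(self):
            with self.lock:
                return list(self.params)

    def gorila_worker(server, env, replay_memory, q_network):
        # Actor side: act in a private copy of the environment and store transitions.
        # Learner side: sample the local replay memory, compute DQN-loss gradients,
        # and ship them to the central parameter server. env, replay_memory and
        # q_network (and their methods) are hypothetical placeholders.
        while True:
            q_network.set_params(server.get_params())
            replay_memory.add(q_network.act_one_step(env))
            batch = replay_memory.sample(32)
            server.apply_gradients(q_network.dqn_loss_gradients(batch))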

    (Tsitsiklis, 1994) studied convergence properties of Q-learning in the asynchronous optimization setting. These results show that Q-learning is still guaranteed to converge when some of the information is outdated, as long as outdated information is always eventually discarded and several other technical assumptions are satisfied.
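    For concreteness, the update covered by these results can be written in the standard tabular Q-learning form (a standard formulation, not an equation quoted from the paper), where the bootstrap term may use values that are \tau steps out of date:

    Q_{t+1}(s,a) = (1 - \alpha_t)\, Q_t(s,a) + \alpha_t \left( r + \gamma \max_{a'} Q_{t-\tau}(s', a') \right)

    Convergence still holds provided the delays \tau are finite, so outdated values are eventually replaced, and the step sizes \alpha_t satisfy the usual step-size conditions.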

    Asynchronous RL Framework

    We now present multi-threaded asynchronous variants of one-step Sarsa, one-step Q-learning, n-step Q-learning, and advantage actor-critic.
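    Below is a minimal sketch of one such thread in the spirit of the asynchronous one-step Q-learning variant, not the paper's exact algorithm; shared_net, target_net and their methods are hypothetical placeholders, and the update frequencies are illustrative:

    def async_one_step_q_worker(shared_net, target_net, make_env,
                                gamma=0.99, update_every=5, target_sync=10000):
        # Each thread owns a private environment copy, accumulates gradients
        # for a few steps, then applies them asynchronously to the shared
        # parameters; no replay memory is used.
        env = make_env()
        state = env.reset()
        grads = shared_net.zero_grads()
        t = 0
        while True:
            action = shared_net.epsilon_greedy(state)           # behaviour policy
            next_state, reward, done = env.step(action)
            # One-step Q-learning target: r + gamma * max_a' Q(s', a'; theta^-)
            target = reward if done else reward + gamma * target_net.max_q(next_state)
            grads += shared_net.q_loss_gradients(state, action, target)
            state = env.reset() if done else next_state
            t += 1
            if done or t % update_every == 0:
                shared_net.apply_gradients(grads)   # asynchronous update of shared theta
                grads = shared_net.zero_grads()
            if t % target_sync == 0:
                target_net.copy_from(shared_net)    # periodic target-network refresh

    Several such workers run in parallel against the same shared network; in the paper the target-network refresh is driven by a global step counter shared across threads, whereas a per-thread counter is used here only to keep the sketch self-contained.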
