Paper reproduction:
Paper overview
Multiagent cooperation and competition with deep reinforcement learning
- Base model: the Pong game, with two agents
- Algorithm: DQN
- Reward scheme: scoring reward ρ between -1 and 1, conceding penalty -1
  - An agent that misses the ball receives -1 (conceding); the agent that scores the point receives ρ, which ranges between -1 and 1
  - As long as both agents keep returning the ball, no reward is given and the game continues (see the reward sketch below)
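A minimal sketch of this rewarding scheme, assuming a simple two-player interface; the function name, the `missed_by` argument, and the left/right labels are my own, while the -1 conceding penalty and the scoring reward ρ come from the paper:

```python
def pong_rewards(rho, missed_by):
    """Return (reward_left, reward_right) for one step of two-player Pong.

    rho       -- scoring reward in [-1, 1] given to the player who scores
    missed_by -- 'left', 'right', or None (ball still in play)
    Conceding (missing the ball) is always penalized with -1.
    """
    if missed_by is None:          # rally continues, no reward yet
        return 0.0, 0.0
    if missed_by == "left":        # right player scores, left concedes
        return -1.0, rho
    if missed_by == "right":       # left player scores, right concedes
        return rho, -1.0
    raise ValueError("missed_by must be 'left', 'right', or None")

# Fully competitive (rho = +1) vs. fully cooperative (rho = -1)
print(pong_rewards(+1.0, "left"))   # (-1.0, 1.0)
print(pong_rewards(-1.0, "left"))   # (-1.0, -1.0)
```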
- Training parameters
  - 50 epochs of 250,000 time steps each
  - Exploration rate: annealed from 1.0 to 0.05 over the first 1,000,000 time steps, then fixed at 0.05 (see the schedule sketch below)
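The exploration schedule can be written as a simple linear anneal. The sketch below only encodes the step counts quoted above; the constant names are mine:

```python
TOTAL_STEPS = 50 * 250_000     # 50 epochs of 250,000 time steps each
ANNEAL_STEPS = 1_000_000       # steps over which epsilon is annealed
EPS_START, EPS_END = 1.0, 0.05

def epsilon(step):
    """Linearly anneal epsilon from 1.0 to 0.05, then keep it fixed."""
    if step >= ANNEAL_STEPS:
        return EPS_END
    return EPS_START + (step / ANNEAL_STEPS) * (EPS_END - EPS_START)

print(epsilon(0))           # 1.0
print(epsilon(500_000))     # 0.525
print(epsilon(2_000_000))   # 0.05
```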
- Result analysis
  - Convergence check: monitor the average maximal Q-values of 500 randomly selected game situations, set aside before training begins (see the sketch below)
    [Figure: Q values]
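A sketch of this convergence check: fix a held-out set of states before training, then periodically average the per-state maximal Q-value over it. The `q_network` callable and the toy usage are assumptions for illustration, not the paper's code:

```python
import random

def sample_holdout_states(states, n=500, seed=0):
    """Set aside n random game situations before training begins."""
    return random.Random(seed).sample(states, n)

def average_max_q(q_network, holdout_states):
    """Average, over the held-out states, of the maximal Q-value per state.

    q_network(state) is assumed to return one Q-value per action.
    A curve that rises and then plateaus is the convergence signal.
    """
    return sum(max(q_network(s)) for s in holdout_states) / len(holdout_states)

# Toy usage with dummy states and a dummy 3-action Q-network
holdout = sample_holdout_states(list(range(10_000)))
dummy_q = lambda s: [0.1, 0.5, 0.3]
print(average_max_q(dummy_q, holdout))   # 0.5
```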
  - Training performance feedback (behavioral statistics; see the computation sketch after this list):
    - Average paddle-bounces per point: how many times the ball travels back and forth between the players before one side scores
    - Average wall-bounces per paddle-bounce: how many times the ball bounces off the top and bottom walls before reaching the other player
    - Average serving time per point: how long the players take to restart the game after the ball is lost (under some rewarding schemes the players prefer not to restart the game, so the serving time becomes very long, e.g. ρ = -1)
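These three statistics can be computed from a per-point event log; the log format below is a hypothetical one, chosen only to illustrate the definitions:

```python
def behavioral_stats(points):
    """Compute the three behavioral metrics from a list of points.

    Each point is assumed to be a dict with:
      'paddle_bounces' -- times the ball travelled between the players
      'wall_bounces'   -- times the ball hit the top/bottom walls
      'serving_time'   -- frames between losing the ball and re-serving
    """
    n = len(points)
    paddle = sum(p["paddle_bounces"] for p in points)
    wall = sum(p["wall_bounces"] for p in points)
    serving = sum(p["serving_time"] for p in points)
    return {
        "avg_paddle_bounces_per_point": paddle / n,
        # wall-bounces are normalized per paddle-bounce, not per point
        "avg_wall_bounces_per_paddle_bounce": wall / max(paddle, 1),
        "avg_serving_time_per_point": serving / n,
    }

# Toy example with two points
log = [
    {"paddle_bounces": 6, "wall_bounces": 3, "serving_time": 40},
    {"paddle_bounces": 2, "wall_bounces": 1, "serving_time": 200},
]
print(behavioral_stats(log))
```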
- Behavior under different rewarding schemes
  - When the scoring reward ρ = -1, the two agents cooperate (neither wants the ball to be dropped)
    Eventually both paddles move to the top of the screen and pass the ball horizontally back and forth
    Cooperative-mode video (YouTube)
    [Figure: 1.png]
  - When ρ = 1, the two agents compete (each wants to score more points itself)
    Competitive-mode video (YouTube)
    [Figure: 2.png]
  - ρ varied over the range from -1 to 1: behavior shifts gradually from cooperation to competition
  - Multiplayer DQN vs. single-player DQN (see the evaluation sketch below)
    (the reported score is agent a's margin of victory over agent b)
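A sketch of such a head-to-head evaluation under an assumed environment hook `play_point` (not from the paper), which plays one point and reports who won it:

```python
import random

def head_to_head_score(agent_a, agent_b, play_point, n_points=100):
    """Return a's score minus b's over n_points of Pong.

    play_point(agent_a, agent_b) is assumed to return 'a' or 'b',
    the identity of the player who won that point.
    """
    score = 0
    for _ in range(n_points):
        score += 1 if play_point(agent_a, agent_b) == "a" else -1
    return score

# Toy usage with a random stub instead of a real Pong environment
stub = lambda a, b: random.choice(["a", "b"])
print(head_to_head_score(None, None, stub))
```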
This article is licensed under Creative Commons BY-NC-SA (Attribution-NonCommercial-ShareAlike) and the Jianshu agreement.
When reposting, please credit the author 空空格格; first published on Jianshu.com.