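This paper introduces the DGN algorithm, which adds a graph network on top of DQN to fuse the states (observations) of neighboring agents, and applies it to multi-agent environments. The relation kernel is implemented with self-attention.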
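A minimal sketch of a self-attention relation kernel in the spirit described above, in plain Python. For each agent i it computes dot-product attention weights over the agents j in its neighborhood (given by an adjacency matrix) and returns the attention-weighted sum of their value vectors. Assumptions, not taken from the paper's code: a single attention head, the Transformer-style 1/sqrt(d_k) scaling, and no output nonlinearity. The names `relation_kernel`, `w_q`, `w_k`, `w_v` are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]  # each row maps an input vector to one output dimension


def matvec(w: Matrix, x: Sequence[float]) -> Vector:
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def softmax(scores: Sequence[float]) -> Vector:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def relation_kernel(
    features: Sequence[Vector],
    adjacency: Sequence[Sequence[int]],
    w_q: Matrix,
    w_k: Matrix,
    w_v: Matrix,
) -> Tuple[List[Vector], List[Dict[int, float]]]:
    """Single-head dot-product attention restricted to each agent's neighborhood.

    adjacency[i][j] == 1 means agent j is in agent i's neighborhood; every row
    must include at least agent i itself. Returns the new feature of each agent
    and, per agent, its attention weight for each neighbor.
    """
    queries = [matvec(w_q, h) for h in features]
    keys = [matvec(w_k, h) for h in features]
    values = [matvec(w_v, h) for h in features]
    scale = math.sqrt(len(keys[0]))
    outputs: List[Vector] = []
    weights: List[Dict[int, float]] = []
    for i, row in enumerate(adjacency):
        nbrs = [j for j, linked in enumerate(row) if linked]
        scores = [sum(q * k for q, k in zip(queries[i], keys[j])) / scale for j in nbrs]
        alpha = softmax(scores)
        # Weighted sum of neighbor value vectors, one output dimension at a time.
        out = [sum(a * values[j][d] for a, j in zip(alpha, nbrs)) for d in range(len(values[0]))]
        outputs.append(out)
        weights.append(dict(zip(nbrs, alpha)))
    return outputs, weights


identity = [[1.0, 0.0], [0.0, 1.0]]
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
adjacency = [[1, 1, 0], [1, 1, 1], [0, 0, 1]]  # agent 2 only sees itself
new_features, attn = relation_kernel(features, adjacency, identity, identity, identity)
```

Restricting the softmax to `nbrs` is what makes this a graph convolution rather than attention over all agents: agent 2 above has no neighbors besides itself, so its output is simply its own value vector.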
Algorithm framework of the paper
Several points raised in the paper:
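- Relations between agents change very quickly, so the graph is highly dynamic, which hinders convergence. The paper therefore keeps the graph unchanged across two consecutive timesteps when computing the Q-loss.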
- "Unlike other methods with parameter-sharing, e.g., DQN, that sample experiences from individual agents, DGN samples experiences based on the graph of agents, not individual agents, and thus takes into consideration the interactions between agents." (I did not quite understand this: how exactly do you sample based on the graph?)
- Temporal Relation Regularization.
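A sketch of what keeping the graph fixed over two timesteps (the first point above) means when computing the Q-learning target: the next-step Q-values are evaluated on the next observations O' but with the adjacency matrix C from time t, not with a graph rebuilt at t+1. `q_target` is a hypothetical stand-in for the target network, with an assumed signature (observations, adjacency) -> per-agent Q-values; `toy_q` is a made-up example function.

```python
from typing import Callable, List, Sequence

Adjacency = List[List[int]]
# Hypothetical target-network signature: (observations, adjacency) -> Q[agent][action]
QFunction = Callable[[Sequence[Sequence[float]], Adjacency], List[List[float]]]


def td_targets(
    q_target: QFunction,
    next_obs: Sequence[Sequence[float]],
    rewards: Sequence[float],
    adjacency: Adjacency,
    gamma: float = 0.99,
    done: bool = False,
) -> List[float]:
    """Per-agent targets y_i = r_i + gamma * max_a Q'(O', C)_i[a].

    `adjacency` is the graph C observed at time t; it is reused for the
    next observations O', so the graph stays fixed across t and t+1.
    """
    next_q = q_target(next_obs, adjacency)
    bootstrap = 0.0 if done else gamma
    return [r + bootstrap * max(q_i) for r, q_i in zip(rewards, next_q)]


def toy_q(obs, adj):
    # Toy stand-in: action 0 is worth the neighborhood size, action 1 is worth 0.
    return [[float(sum(row)), 0.0] for row in adj]


C_t = [[1, 1], [1, 1]]      # graph at time t
C_next = [[1, 0], [0, 1]]   # graph that would be rebuilt at t+1
next_obs = [[0.0], [0.0]]
fixed = td_targets(toy_q, next_obs, rewards=[1.0, 0.0], adjacency=C_t, gamma=0.5)
print(fixed)  # [2.0, 1.0]
```

Passing `C_next` instead would change the targets (to `[1.5, 0.5]` here), which is exactly the moving-target effect the fixed graph is meant to avoid.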
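One plausible reading of "sampling based on the graph" (the second point above), sketched here as an assumption rather than the paper's code: instead of storing one (o_i, a_i, r_i, o'_i) tuple per agent, the replay buffer stores one joint transition per timestep holding every agent's observation, action, next observation and reward together with the adjacency matrix C. Each sampled item is then a whole graph, so the loss can account for the agents' interactions. As I understand it this corresponds to the paper storing (O, A, O', R, C) tuples; the class names are hypothetical.

```python
import random
from collections import deque
from dataclasses import dataclass
from typing import Deque, List


@dataclass
class GraphTransition:
    """One timestep for all agents, plus the graph that linked them."""
    obs: List[List[float]]       # O: observation of each agent
    actions: List[int]           # A: action of each agent
    next_obs: List[List[float]]  # O': next observation of each agent
    rewards: List[float]         # R: reward of each agent
    adjacency: List[List[int]]   # C: adjacency matrix at this timestep


class GraphReplayBuffer:
    """Samples whole graphs (timesteps), never isolated agents."""

    def __init__(self, capacity: int, seed: int = 0) -> None:
        self.buffer: Deque[GraphTransition] = deque(maxlen=capacity)
        self.rng = random.Random(seed)

    def push(self, transition: GraphTransition) -> None:
        self.buffer.append(transition)

    def sample(self, batch_size: int) -> List[GraphTransition]:
        # Each sampled element carries every agent and the matrix C.
        return self.rng.sample(list(self.buffer), batch_size)

    def __len__(self) -> int:
        return len(self.buffer)


buf = GraphReplayBuffer(capacity=100)
for t in range(5):
    buf.push(GraphTransition(
        obs=[[float(t)] for _ in range(3)],
        actions=[0, 1, 0],
        next_obs=[[t + 1.0] for _ in range(3)],
        rewards=[0.0, 1.0, 0.0],
        adjacency=[[1, 1, 0], [1, 1, 1], [0, 1, 1]],
    ))
batch = buf.sample(3)
```

A per-agent buffer, as in parameter-shared DQN, would drop C and break each timestep into unrelated single-agent samples.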
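The third point only names temporal relation regularization. As I understand the paper, it adds a KL-divergence term that pulls an agent's attention distribution over its neighbors at time t toward the distribution computed at t+1 (with the same graph C, treated as a fixed target), so that relations stay consistent over time. A minimal sketch of that combined loss, assuming the attention rows are already-normalized distributions over the same neighbor set; the weight `lam` and the function names are hypothetical.

```python
import math
from typing import Sequence


def kl_divergence(p: Sequence[float], q: Sequence[float], eps: float = 1e-12) -> float:
    """KL(p || q) for two discrete distributions over the same neighbors."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)


def regularized_loss(
    td_errors: Sequence[float],
    attn_now: Sequence[Sequence[float]],
    attn_next: Sequence[Sequence[float]],
    lam: float,
) -> float:
    """Mean over agents of squared TD error + lam * KL(attn_now_i || attn_next_i).

    attn_next plays the role of a fixed target: in a real framework no
    gradient would flow through it.
    """
    n = len(td_errors)
    q_loss = sum(e * e for e in td_errors) / n
    reg = sum(kl_divergence(p, z) for p, z in zip(attn_now, attn_next)) / n
    return q_loss + lam * reg


td = [1.0, -1.0]
same = regularized_loss(td, [[0.9, 0.1], [0.5, 0.5]], [[0.9, 0.1], [0.5, 0.5]], lam=0.5)
drifted = regularized_loss(td, [[0.9, 0.1], [0.5, 0.5]], [[0.5, 0.5], [0.5, 0.5]], lam=0.5)
```

When the attention does not drift between t and t+1, the regularizer vanishes and only the Q-loss remains; any drift adds a positive penalty.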
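Both this paper and "Deep Reinforcement Learning with Relational Inductive Biases" combine graph networks with reinforcement learning, and both mention the concept of relational reinforcement learning, which is worth looking into when there is a chance.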