We live in a world full of interaction with others, involving both cooperation and competition, so it is natural and attractive to apply reinforcement learning to multi-agent systems.
Multi-agent System Framework
Because of problems with the math formula editor, I give the definition as a picture, viewed from the perspective of a Markov decision process.
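In case the picture does not load, here is the same definition restated in raw LaTeX as a fallback. It is the standard Markov game (stochastic game) generalization of an MDP; the notation is my own and not necessarily that of the original picture.

```latex
% An N-agent Markov game (stochastic game), the usual multi-agent generalization of an MDP.
% Symbols: \mathcal{N} = \{1,\dots,N\} the agents, \mathcal{S} the shared state space,
% \mathcal{A}^i the action space of agent i (joint space \mathcal{A} = \mathcal{A}^1 \times \cdots \times \mathcal{A}^N),
% P : \mathcal{S} \times \mathcal{A} \to \Delta(\mathcal{S}) the transition kernel driven by the joint action,
% R^i : \mathcal{S} \times \mathcal{A} \to \mathbb{R} the reward of agent i, and \gamma \in [0,1) the discount factor.
\[
  \mathcal{M} \;=\; \bigl(\mathcal{N},\; \mathcal{S},\; \{\mathcal{A}^i\}_{i=1}^{N},\; P,\; \{R^i\}_{i=1}^{N},\; \gamma\bigr)
\]
% Each agent i seeks a policy \pi^i that maximizes its own expected discounted return
\[
  J^i(\pi^1,\dots,\pi^N) \;=\; \mathbb{E}\Bigl[\,\sum_{t \ge 0} \gamma^{t}\, R^i\bigl(s_t,\, a_t^{1},\dots,a_t^{N}\bigr)\Bigr],
\]
% which depends on every other agent's policy as well; this coupling is what makes the problem multi-agent.
```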
Multi-agent Reinforcement Learning Advantages
There are many advantages to having multiple agents act in a system.
- **Efficient Exploration.** There is a trade-off between exploration and exploitation in single-agent reinforcement learning. If multiple agents explore together and communicate with each other, sample efficiency can improve dramatically; a minimal sketch of shared exploration follows this list. For a recent research result, see [1].
- **Robustness.** It is not rare for a machine to break down suddenly in practice, bringing the whole system to a halt, so we need spare capacity to absorb such failures. A team of agents provides exactly this kind of redundancy.
- **Transfer and Lifelong Learning.** Through teaching and imitation, new agents can learn much faster than they would from scratch.
- **Cooperation and Competition.** Some tasks can only be accomplished through cooperation, such as playing soccer or team combat games; through teamwork, agents can tackle complicated environments. When self-interests conflict, we have to think about how each agent can secure the best reward it can, and interesting phenomena such as Nash equilibria arise; a worked example of checking a 2x2 game for pure-strategy Nash equilibria also follows this list.
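To make the exploration point concrete, here is a minimal, purely illustrative Python sketch (it is not the algorithm of [1]; the environment, the count-based rule, and every name in it are made up). A small team walks a toy chain and prefers less-visited neighbouring states; when the agents share one visit-count table, a stand-in for communication, the team spreads out and covers far more states than the same agents exploring with private counts.

```python
from collections import defaultdict

def explore(num_agents: int, steps: int, n: int = 60, share_counts: bool = True) -> int:
    """Toy chain walk: each agent moves to the neighbouring state with the fewer recorded visits.

    With share_counts=True the whole team reads and writes one visit-count table
    (a stand-in for communication); otherwise every agent keeps a private table.
    Returns how many distinct states the team reached. Hypothetical illustration only.
    """
    start = n // 2
    if share_counts:
        shared = defaultdict(int)
        tables = [shared] * num_agents              # every agent uses the same table
    else:
        tables = [defaultdict(int) for _ in range(num_agents)]
    positions = [start] * num_agents
    visited = {start}
    for counts in tables:
        counts[start] += 1
    for _ in range(steps):
        for i in range(num_agents):
            counts = tables[i]
            left, right = max(0, positions[i] - 1), min(n - 1, positions[i] + 1)
            positions[i] = right if counts[right] <= counts[left] else left
            counts[positions[i]] += 1
            visited.add(positions[i])
    return len(visited)

# Same team size and step budget; only the sharing of exploration data differs.
print("shared visit counts     :", explore(4, 20, share_counts=True), "distinct states")
print("independent visit counts:", explore(4, 20, share_counts=False), "distinct states")
```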
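And as a small worked example of the competition side, the snippet below enumerates the pure-strategy Nash equilibria of a 2x2 matrix game by checking best responses. The payoffs are a standard prisoner's dilemma (my choice for illustration), for which mutual defection is the only pure equilibrium even though mutual cooperation pays both players more.

```python
from itertools import product

# Prisoner's dilemma payoffs (illustrative choice). Entry [a1][a2] is the payoff when the
# row player picks a1 and the column player picks a2; action 0 = cooperate, 1 = defect.
ROW = [[-1, -3],
       [ 0, -2]]
COL = [[-1,  0],
       [-3, -2]]

def pure_nash(row, col):
    """Return all joint actions from which neither player gains by deviating unilaterally."""
    equilibria = []
    for a1, a2 in product(range(2), repeat=2):
        row_best = all(row[a1][a2] >= row[b][a2] for b in range(2))   # row cannot improve
        col_best = all(col[a1][a2] >= col[a1][b] for b in range(2))   # column cannot improve
        if row_best and col_best:
            equilibria.append((a1, a2))
    return equilibria

print(pure_nash(ROW, COL))   # [(1, 1)]: both defect, despite (0, 0) being better for both
```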
Problems
We have discussed many advantages of multi-agent reinforcement learning. What, then, are its disadvantages and open problems?
- **Huge State and Action Spaces.** The joint discrete state and action spaces grow exponentially with the number of agents (see the short calculation after this list), and state abstraction and representation become even harder.
- **Partial Observability.** Because each agent perceives only a small part of the whole system, the state is partially observable. Agents may need to communicate in order to agree on the complete state information, and designing the communication channel among agents is itself a hard problem; a toy sketch of such a channel follows this list. For recent research results, see [2] [3].
- **Instability in Learning.** Because the transition model is determined by all agents jointly, the quality of the policy a single agent has learned depends on the other agents' policies. When an agent repeats the same action but observes a different next state and reward, it becomes confused about what to learn, and the learning process may get stuck in oscillation; a small sketch after this list demonstrates this non-stationarity.
- **Coordination and Cooperation.** Consider agents that must coordinate to avoid an obstacle while keeping formation: agent 1 needs to know which action agent 2 will choose in order to achieve the best payoff, and vice versa. Such a task cannot be completed by each agent choosing its own action while ignoring the others, and it becomes even harder when the agents must coordinate over a whole sequence of actions.
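The size explosion in the first problem is easy to quantify: with n agents that each have |A| individual actions, the joint action space has |A|^n elements. A throwaway calculation (the numbers are arbitrary, chosen only for illustration):

```python
# Joint action space size |A|^n when every agent has 5 individual actions.
individual_actions = 5
for n_agents in (1, 2, 4, 8, 16):
    joint = individual_actions ** n_agents
    print(f"{n_agents:2d} agents -> {joint:,} joint actions")
```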
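For the partial observability problem, here is the simplest communication mechanism I can sketch (entirely hypothetical, with hand-written names; the learned protocols of [2][3] replace this hard-coded broadcast with differentiable messages): every agent observes only a slice of the global state, broadcasts that slice, and then fuses the received messages into an estimate of the full state.

```python
from typing import Dict, List

def local_observation(global_state: List[int], agent_id: int, width: int = 3) -> Dict[int, int]:
    """Agent `agent_id` only sees `width` consecutive cells of the global state."""
    start = agent_id * width
    return {i: global_state[i] for i in range(start, min(start + width, len(global_state)))}

def communicate(messages: List[Dict[int, int]]) -> Dict[int, int]:
    """Naive protocol: every agent broadcasts its local view and each receiver merges them."""
    merged: Dict[int, int] = {}
    for message in messages:
        merged.update(message)
    return merged

global_state = [4, 8, 15, 16, 23, 42]                            # hidden true state, 6 cells
views = [local_observation(global_state, i) for i in range(2)]   # two agents, 3 cells each
print(views[0])              # agent 0 alone sees only cells 0-2
print(communicate(views))    # after the broadcast, every agent can reconstruct all 6 cells
```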
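Finally, the instability problem can be reproduced in a few lines. The sketch below (hypothetical and heavily simplified) runs two independent Q-learners on repeated matching pennies, a game with no pure equilibrium: each learner updates as if it faced a stationary bandit, but the value of its actions actually depends on the other learner's current policy, so the greedy joint play tends to keep cycling instead of settling.

```python
import random

def rewards(a0: int, a1: int):
    """Matching pennies: agent 0 wins by matching, agent 1 wins by mismatching (zero-sum)."""
    r0 = 1.0 if a0 == a1 else -1.0
    return r0, -r0

def independent_q_learning(steps: int = 6000, alpha: float = 0.1, eps: float = 0.1, seed: int = 0):
    random.seed(seed)
    q = [[0.0, 0.0], [0.0, 0.0]]        # q[agent][action]; the repeated game has no state
    for t in range(steps):
        acts = []
        for i in range(2):
            greedy = 0 if q[i][0] >= q[i][1] else 1
            acts.append(random.randrange(2) if random.random() < eps else greedy)
        r = rewards(acts[0], acts[1])
        for i in range(2):
            # Each agent updates as if rewards were stationary, but the expected reward of its
            # action shifts whenever the other agent's policy shifts: the learning target moves.
            q[i][acts[i]] += alpha * (r[i] - q[i][acts[i]])
        if t % 1000 == 0:
            print(f"t={t:4d}  agent0 Q=({q[0][0]:+.2f}, {q[0][1]:+.2f})"
                  f"  agent1 Q=({q[1][0]:+.2f}, {q[1][1]:+.2f})")

independent_q_learning()
```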
References
[1] Maria Dimakopoulou and Benjamin Van Roy. Coordinated Exploration in Concurrent Reinforcement Learning. ICML 2018.
[2] Jakob Foerster, Yannis Assael, Nando de Freitas, and Shimon Whiteson. Learning to Communicate with Deep Multi-Agent Reinforcement Learning. NIPS 2016.
[3] Sainbayar Sukhbaatar, Arthur Szlam, and Rob Fergus. Learning Multiagent Communication with Backpropagation. NIPS 2016.