Notes on Reinforcement Learning

Author: 海街diary | Published on 2018-08-05 12:21

    ICML-2018 Seminar Notes

    The LAMDA Group runs a productive and meaningful seminar. I am honored to attend and benefit from it, so it is worth recording notes on some of the papers.

    2018-08-04

    Network Model of Abstraction and Reconstruction Attack Results
    • This paper attacks neural networks with adversarial examples. It defeats 7 of the 9 defense networks published at ICLR 2017 and, moreover, classifies these defenses by their defense mechanisms. A generic sketch of such an attack is given below.
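    The paper's actual attacks against specific defenses are more involved; the following is only a minimal PGD-style sketch of crafting adversarial examples, assuming a differentiable PyTorch classifier `model` (the function name and hyperparameters are illustrative).

```python
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=8 / 255, alpha=2 / 255, steps=10):
    # Perturb x within an epsilon-ball to maximize the classification loss on labels y.
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad, = torch.autograd.grad(loss, x_adv)
        # Step up the loss, then project back into the epsilon-ball and [0, 1].
        x_adv = x_adv.detach() + alpha * grad.sign()
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0.0, 1.0)
    return x_adv
```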

    2018-08-03

    • 《Universal Planning Networks》
      • The authors want to learn a state abstraction by imitation. The abstraction is optimized through the difference between the actions produced by the learned policy and those in the expert demonstrations (see the sketch after this entry).
    Universal Planning Network(UPN)
    • However, it may be unnecessary to learn such an abstraction by imitation, since the environment itself can generate enough experience to learn from directly.
    • By the way, the supplement contains plenty of experimental details; I am grateful that the authors are so thoughtful.
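    A rough sketch of my reading of the objective (not the authors' code; module sizes and names are hypothetical): plan actions by gradient descent in a learned latent space, then train end-to-end with an imitation loss against expert actions.

```python
import torch
import torch.nn as nn

class UPNSketch(nn.Module):
    def __init__(self, obs_dim, act_dim, latent_dim=32, plan_steps=5):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(obs_dim, latent_dim), nn.ReLU())
        self.dynamics = nn.Linear(latent_dim + act_dim, latent_dim)  # latent forward model
        self.plan_steps, self.act_dim = plan_steps, act_dim

    def plan(self, obs, goal_obs, inner_steps=10, lr=0.1):
        # Inner loop: gradient-descent planning toward the goal in latent space.
        z, z_goal = self.encoder(obs), self.encoder(goal_obs)
        actions = torch.zeros(self.plan_steps, self.act_dim, requires_grad=True)
        for _ in range(inner_steps):
            z_t = z
            for a in actions:
                z_t = self.dynamics(torch.cat([z_t, a], dim=-1))
            plan_loss = ((z_t - z_goal) ** 2).sum()
            grad, = torch.autograd.grad(plan_loss, actions, create_graph=True)
            actions = actions - lr * grad
        return actions

def imitation_loss(model, obs, goal_obs, expert_actions):
    # Outer loss: planned actions should match the expert demonstration,
    # which shapes the latent abstraction through the differentiable planner.
    return ((model.plan(obs, goal_obs) - expert_actions) ** 2).mean()
```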
    Comparison between Semantic Images and Non-Semantic Images
    • There is little theory in this paper, but there are plenty of experiments demonstrating how human prior knowledge helps us explore efficiently. Once the semantic information is removed, we learn as slowly as the agents do.

    2018-08-02

    • 《Clipped Action Policy Gradient》
      • It is common to clip continuous actions into a valid range. However, the author points out that the usual treatment of the gradients of these out-of-range actions is mistaken, and proposes a new way of computing them that reduces the variance. A rough sketch of the idea follows the figure below.
    Clipped Distribution
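    A sketch of the idea as I understand it (not the authors' code): when the sampled pre-clip action falls outside [low, high], score the whole clipped tail, i.e. the log of its probability mass, instead of the log density of the raw sample.

```python
import torch
from torch.distributions import Normal

def clipped_action_log_prob(mean, std, raw_action, low, high):
    dist = Normal(mean, std)
    log_density = dist.log_prob(raw_action)
    # Log probability mass of each clipped tail of the Gaussian.
    log_mass_low = torch.log(dist.cdf(torch.as_tensor(low)))
    log_mass_high = torch.log(1.0 - dist.cdf(torch.as_tensor(high)))
    log_prob = torch.where(raw_action <= low, log_mass_low, log_density)
    log_prob = torch.where(raw_action >= high, log_mass_high, log_prob)
    # Use in the usual surrogate loss, e.g. -(log_prob * advantage).mean().
    return log_prob
```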

    2018-08-01

    Seed Sampling Latent Policy Control

    2018-07-31

    Illustration of "Natural" vs. "Adversarial" decision boundaries
    • 《Time Limits in Reinforcement Learning》
      • This paper concerns the effect of truncating episodes at a fixed time limit. The authors claim that most implementations of RL algorithms ignore this issue, which may lower the efficiency of learning. A sketch of the fix is given after the figure below.
    Correct Solution to Time-limited State; Illustration of Loss Region
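    A sketch of the fix as I understand it: when an episode ends only because of the time limit, keep bootstrapping from the value of the next state in the TD target instead of treating the state as terminal (the flag names below are illustrative).

```python
def td_target(reward, next_value, done, timed_out, gamma=0.99):
    # `done` marks a true environment termination; `timed_out` marks an episode
    # cut off only by the time limit (e.g. a TimeLimit wrapper).
    terminal = done and not timed_out
    return reward + (0.0 if terminal else gamma * next_value)
```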

    2018-07-30

    Algorithm of Networked Actor-Critic

    2018-07-27

    《Mix & Match - Agent Curricula for Reinforcement Learning》

    • The authors provide a method, similar in spirit to boosting, for training a series of different policies in hierarchical reinforcement learning. During training, they also use population based training to choose hyperparameters. A rough sketch of the policy-mixing idea follows the figure below.
    Scheme of Mix & Match
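    A rough sketch of the policy-mixing idea as I read it (names and the annealing schedule are illustrative): act with a mixture of a simple policy and the final, more complex policy, and move the mixing weight alpha from 0 to 1 over training; in the paper alpha is tuned with population based training rather than a fixed schedule.

```python
import numpy as np

def mixed_action_probs(simple_probs, final_probs, alpha):
    # alpha = 0 -> behave like the simple policy; alpha = 1 -> the final policy.
    return (1.0 - alpha) * simple_probs + alpha * final_probs

def alpha_schedule(step, total_steps):
    # Linear stand-in for the population-based-training-tuned mixing weight.
    return min(1.0, step / total_steps)

probs = mixed_action_probs(np.array([0.7, 0.3]), np.array([0.2, 0.8]),
                           alpha_schedule(step=5_000, total_steps=20_000))
```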

    2018-07-24

    《Self-Imitation Learning》

    • The authors distinguish good samples from bad samples in the experience buffer. By re-learning from the good samples, they hope to ease the exploration dilemma in reinforcement learning. From my point of view, this cannot work well, because it merely memorizes the paths to good results, which has little to do with exploration. A rough sketch of the loss is given after the figure below.
    Algorithm of Self-Imitation Learning
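    A sketch of the self-imitation loss as I understand it: only imitate past transitions whose return exceeded the current value estimate (a clipped advantage), so the agent re-learns its own good trajectories; the coefficient below is illustrative.

```python
import torch

def sil_loss(log_prob, value, returns, value_coef=0.01):
    # Only transitions with returns above the value estimate contribute.
    advantage = (returns - value).clamp(min=0.0).detach()
    policy_loss = -(log_prob * advantage).mean()
    value_loss = 0.5 * ((returns - value).clamp(min=0.0) ** 2).mean()
    return policy_loss + value_coef * value_loss
```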

    《Ray-A Distributed Framework for Emerging AI Applications》

    • The main contribution of this paper is a distributed framework for training reinforcement learning agents, built by researchers at UC Berkeley. A minimal usage sketch is given below.
    Ray
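    A minimal sketch of Ray's task API. ray.init, @ray.remote, .remote() and ray.get are the library's real entry points; the rollout function and its body are hypothetical placeholders.

```python
import ray

ray.init()

@ray.remote
def rollout(seed):
    # Hypothetical worker: run one episode and return its total reward.
    return float(seed)  # placeholder for an actual environment rollout

futures = [rollout.remote(seed) for seed in range(8)]  # launched in parallel
returns = ray.get(futures)                             # gather the results
```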

    《Katyusha X: Practical Momentum Method for Stochastic Sum-of-Nonconvex Optimization》

    • The author combines the ideas of full-batch gradient descent and SGD, striking a balance between the two, and the resulting optimizer works for training neural networks. A generic sketch of such a combined step is given below.
    KatyushaX
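    The following is a generic SVRG-style variance-reduced step, not the exact KatyushaX update; it only illustrates the "combine a full gradient with stochastic gradients" idea the note refers to. grad_full and grad_i are hypothetical gradient oracles.

```python
import numpy as np

def svrg_epoch(w, grad_full, grad_i, n, inner_steps=100, lr=0.01, rng=np.random):
    w_snapshot = w.copy()
    mu = grad_full(w_snapshot)        # full gradient at the snapshot (the "GD" part)
    for _ in range(inner_steps):
        i = rng.randint(n)            # one random component (the "SGD" part)
        # Stochastic gradient corrected by the snapshot; its variance shrinks
        # while w stays close to w_snapshot.
        g = grad_i(w, i) - grad_i(w_snapshot, i) + mu
        w = w - lr * g
    return w
```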

    2018-07-23

    《QMIX: Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning》

    • This work builds on value function decomposition, which is designed to reduce the complexity of the control problem in multi-agent Q-learning. The authors use a more expressive mixing network to represent the joint Q value while keeping it monotonic in each agent's Q value. A rough sketch of such a mixing network is given after the figure below.
    Architecture of QMIX
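    A rough sketch of a monotonic mixing network in the spirit of QMIX (sizes and module names are illustrative, not the authors' code): the mixing weights are produced by hypernetworks from the global state, and taking their absolute value keeps the joint Q monotonic in every per-agent Q.

```python
import torch
import torch.nn as nn

class MonotonicMixer(nn.Module):
    def __init__(self, n_agents, state_dim, embed_dim=32):
        super().__init__()
        self.n_agents, self.embed_dim = n_agents, embed_dim
        self.hyper_w1 = nn.Linear(state_dim, n_agents * embed_dim)
        self.hyper_b1 = nn.Linear(state_dim, embed_dim)
        self.hyper_w2 = nn.Linear(state_dim, embed_dim)
        self.hyper_b2 = nn.Linear(state_dim, 1)

    def forward(self, agent_qs, state):
        # agent_qs: (batch, n_agents), state: (batch, state_dim)
        w1 = self.hyper_w1(state).abs().view(-1, self.n_agents, self.embed_dim)
        b1 = self.hyper_b1(state).view(-1, 1, self.embed_dim)
        hidden = torch.relu(torch.bmm(agent_qs.unsqueeze(1), w1) + b1)
        w2 = self.hyper_w2(state).abs().view(-1, self.embed_dim, 1)
        b2 = self.hyper_b2(state).view(-1, 1, 1)
        return (torch.bmm(hidden, w2) + b2).view(-1)  # joint Q value, shape (batch,)
```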

    《Competitive Multi-agent Inverse Reinforcement Learning with Sub-optimal Demonstrations》

    • This work considers the problem of recovering a reward from expert demonstrations that may be sub-optimal, rather than optimal as previously assumed. However, finding the equilibrium of the zero-sum game requires so much computation that the algorithm is impractical in the real world.
    Optimization Objective of IRL
