Notes on Reinforcement Learning

Author: 海街diary | Published on 2018-08-05 12:21

    ICML-2018 Seminar Notes

    The LAMDA Group runs a productive and meaningful seminar. I am honored to attend and benefit from it, so it is worth recording notes on some of the papers.

    2018-08-04

    Network Model of Abstraction and Reconstruction Attack Results
    • This paper attacks neural networks with adversarial examples. It defeats 7 of the 9 defense networks published at ICLR 2017 and, moreover, classifies these defenses by their defense mechanisms. A generic sketch of such an attack is given below.
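    The paper's actual attacks against specific defenses are more involved; the following is only a minimal PGD-style sketch of crafting adversarial examples, assuming a differentiable PyTorch classifier `model` (the function name and hyperparameters are illustrative).

```python
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=8 / 255, alpha=2 / 255, steps=10):
    # Perturb x within an epsilon-ball to maximize the classification loss on labels y.
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad, = torch.autograd.grad(loss, x_adv)
        # Step up the loss, then project back into the epsilon-ball and [0, 1].
        x_adv = x_adv.detach() + alpha * grad.sign()
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0.0, 1.0)
    return x_adv
```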

    2018-08-03

    • 《Universal Planning Networks》
      • The authors want to learn a state abstraction by imitation. The abstraction is optimized through the difference between the actions produced by the learned policy and those in the expert demonstrations (see the sketch after this entry).
    Universal Planning Network(UPN)
    • However, it may be unnecessary to learn such an abstraction by imitation, since the environment itself can generate enough experience to learn from directly.
    • By the way, the supplement contains plenty of experimental details; I am grateful that the authors are so thoughtful.
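    A rough sketch of my reading of the objective (not the authors' code; module sizes and names are hypothetical): plan actions by gradient descent in a learned latent space, then train end-to-end with an imitation loss against expert actions.

```python
import torch
import torch.nn as nn

class UPNSketch(nn.Module):
    def __init__(self, obs_dim, act_dim, latent_dim=32, plan_steps=5):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(obs_dim, latent_dim), nn.ReLU())
        self.dynamics = nn.Linear(latent_dim + act_dim, latent_dim)  # latent forward model
        self.plan_steps, self.act_dim = plan_steps, act_dim

    def plan(self, obs, goal_obs, inner_steps=10, lr=0.1):
        # Inner loop: gradient-descent planning toward the goal in latent space.
        z, z_goal = self.encoder(obs), self.encoder(goal_obs)
        actions = torch.zeros(self.plan_steps, self.act_dim, requires_grad=True)
        for _ in range(inner_steps):
            z_t = z
            for a in actions:
                z_t = self.dynamics(torch.cat([z_t, a], dim=-1))
            plan_loss = ((z_t - z_goal) ** 2).sum()
            grad, = torch.autograd.grad(plan_loss, actions, create_graph=True)
            actions = actions - lr * grad
        return actions

def imitation_loss(model, obs, goal_obs, expert_actions):
    # Outer loss: planned actions should match the expert demonstration,
    # which shapes the latent abstraction through the differentiable planner.
    return ((model.plan(obs, goal_obs) - expert_actions) ** 2).mean()
```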
    Comparison between Semantic Images and Non-Semantic Images
    • There is little theory in this paper, but there are plenty of experiments demonstrating how human prior knowledge helps us explore efficiently. Once the semantic information is removed, we learn as slowly as the agents do.

    2018-08-02

    • 《Clipped Action Policy Gradient》
      • It is common to clip continuous actions into a valid range. However, the author points out that the usual treatment of the gradients of these out-of-range actions is mistaken, and proposes a new way of computing them that reduces the variance. A rough sketch of the idea follows the figure below.
    Clipped Distribution
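    A sketch of the idea as I understand it (not the authors' code): when the sampled pre-clip action falls outside [low, high], score the whole clipped tail, i.e. the log of its probability mass, instead of the log density of the raw sample.

```python
import torch
from torch.distributions import Normal

def clipped_action_log_prob(mean, std, raw_action, low, high):
    dist = Normal(mean, std)
    log_density = dist.log_prob(raw_action)
    # Log probability mass of each clipped tail of the Gaussian.
    log_mass_low = torch.log(dist.cdf(torch.as_tensor(low)))
    log_mass_high = torch.log(1.0 - dist.cdf(torch.as_tensor(high)))
    log_prob = torch.where(raw_action <= low, log_mass_low, log_density)
    log_prob = torch.where(raw_action >= high, log_mass_high, log_prob)
    # Use in the usual surrogate loss, e.g. -(log_prob * advantage).mean().
    return log_prob
```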

    2018-08-01

    Seed Sampling Latent Policy Control

    2018-07-31

    Illustration of "Natural" vs. "Adversarial" decision boundaries
    • 《Time Limits in Reinforcement Learning》
      • This paper concerns the effect of truncating episodes at a fixed time limit. The authors claim that most implementations of RL algorithms ignore this issue, which may lower the efficiency of learning. A sketch of the fix is given after the figure below.
    Correct Solution to Time-limited State; Illustration of Loss Region
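    A sketch of the fix as I understand it: when an episode ends only because of the time limit, keep bootstrapping from the value of the next state in the TD target instead of treating the state as terminal (the flag names below are illustrative).

```python
def td_target(reward, next_value, done, timed_out, gamma=0.99):
    # `done` marks a true environment termination; `timed_out` marks an episode
    # cut off only by the time limit (e.g. a TimeLimit wrapper).
    terminal = done and not timed_out
    return reward + (0.0 if terminal else gamma * next_value)
```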

    2018-07-30

    Algorithm of Networked Actor-Critic

    2018-07-27

    《Mix & Match - Agent Curricula for Reinforcement Learning》

    • The authors provide a method, similar in spirit to boosting, for training a series of different policies in hierarchical reinforcement learning. During training, they also use population based training to choose hyperparameters. A rough sketch of the policy-mixing idea follows the figure below.
    Scheme of Mix & Match
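    A rough sketch of the policy-mixing idea as I read it (names and the annealing schedule are illustrative): act with a mixture of a simple policy and the final, more complex policy, and move the mixing weight alpha from 0 to 1 over training; in the paper alpha is tuned with population based training rather than a fixed schedule.

```python
import numpy as np

def mixed_action_probs(simple_probs, final_probs, alpha):
    # alpha = 0 -> behave like the simple policy; alpha = 1 -> the final policy.
    return (1.0 - alpha) * simple_probs + alpha * final_probs

def alpha_schedule(step, total_steps):
    # Linear stand-in for the population-based-training-tuned mixing weight.
    return min(1.0, step / total_steps)

probs = mixed_action_probs(np.array([0.7, 0.3]), np.array([0.2, 0.8]),
                           alpha_schedule(step=5_000, total_steps=20_000))
```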

    2018-07-24

    《Self-Imitation Learning》

    • The authors distinguish good samples from bad samples in the experience buffer. By re-learning from the good samples, they hope to ease the exploration dilemma in reinforcement learning. From my point of view, this cannot work well, because it merely memorizes the paths to good results, which has little to do with exploration. A rough sketch of the loss is given after the figure below.
    Algorithm of Self-Imitation Learning
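    A sketch of the self-imitation loss as I understand it: only imitate past transitions whose return exceeded the current value estimate (a clipped advantage), so the agent re-learns its own good trajectories; the coefficient below is illustrative.

```python
import torch

def sil_loss(log_prob, value, returns, value_coef=0.01):
    # Only transitions with returns above the value estimate contribute.
    advantage = (returns - value).clamp(min=0.0).detach()
    policy_loss = -(log_prob * advantage).mean()
    value_loss = 0.5 * ((returns - value).clamp(min=0.0) ** 2).mean()
    return policy_loss + value_coef * value_loss
```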

    《Ray-A Distributed Framework for Emerging AI Applications》

    • The main contribution of this paper is a distributed framework for training reinforcement learning agents, built by researchers at UC Berkeley. A minimal usage sketch is given below.
    Ray
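    A minimal sketch of Ray's task API. ray.init, @ray.remote, .remote() and ray.get are the library's real entry points; the rollout function and its body are hypothetical placeholders.

```python
import ray

ray.init()

@ray.remote
def rollout(seed):
    # Hypothetical worker: run one episode and return its total reward.
    return float(seed)  # placeholder for an actual environment rollout

futures = [rollout.remote(seed) for seed in range(8)]  # launched in parallel
returns = ray.get(futures)                             # gather the results
```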

    《Katyusha X: Practical Momentum Method for Stochastic Sum-of-Nonconvex Optimization》

    • The author combines the ideas of full-batch gradient descent and SGD, striking a balance between the two, and the resulting optimizer works for training neural networks. A generic sketch of such a combined step is given below.
    KatyushaX
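    The following is a generic SVRG-style variance-reduced step, not the exact KatyushaX update; it only illustrates the "combine a full gradient with stochastic gradients" idea the note refers to. grad_full and grad_i are hypothetical gradient oracles.

```python
import numpy as np

def svrg_epoch(w, grad_full, grad_i, n, inner_steps=100, lr=0.01, rng=np.random):
    w_snapshot = w.copy()
    mu = grad_full(w_snapshot)        # full gradient at the snapshot (the "GD" part)
    for _ in range(inner_steps):
        i = rng.randint(n)            # one random component (the "SGD" part)
        # Stochastic gradient corrected by the snapshot; its variance shrinks
        # while w stays close to w_snapshot.
        g = grad_i(w, i) - grad_i(w_snapshot, i) + mu
        w = w - lr * g
    return w
```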

    2018-07-23

    《QMIX: Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning》

    • This work builds on value function decomposition, which is designed to reduce the complexity of the control problem in multi-agent Q-learning. The authors use a more expressive mixing network to represent the joint Q value while keeping it monotonic in each agent's Q value. A rough sketch of such a mixing network is given after the figure below.
    Architecture of QMIX
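    A rough sketch of a monotonic mixing network in the spirit of QMIX (sizes and module names are illustrative, not the authors' code): the mixing weights are produced by hypernetworks from the global state, and taking their absolute value keeps the joint Q monotonic in every per-agent Q.

```python
import torch
import torch.nn as nn

class MonotonicMixer(nn.Module):
    def __init__(self, n_agents, state_dim, embed_dim=32):
        super().__init__()
        self.n_agents, self.embed_dim = n_agents, embed_dim
        self.hyper_w1 = nn.Linear(state_dim, n_agents * embed_dim)
        self.hyper_b1 = nn.Linear(state_dim, embed_dim)
        self.hyper_w2 = nn.Linear(state_dim, embed_dim)
        self.hyper_b2 = nn.Linear(state_dim, 1)

    def forward(self, agent_qs, state):
        # agent_qs: (batch, n_agents), state: (batch, state_dim)
        w1 = self.hyper_w1(state).abs().view(-1, self.n_agents, self.embed_dim)
        b1 = self.hyper_b1(state).view(-1, 1, self.embed_dim)
        hidden = torch.relu(torch.bmm(agent_qs.unsqueeze(1), w1) + b1)
        w2 = self.hyper_w2(state).abs().view(-1, self.embed_dim, 1)
        b2 = self.hyper_b2(state).view(-1, 1, 1)
        return (torch.bmm(hidden, w2) + b2).view(-1)  # joint Q value, shape (batch,)
```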

    《Competitive Multi-agent Inverse Reinforcement Learning with Sub-optimal Demonstrations》

    • This work considers the problem of recovering a reward from expert demonstrations that may be sub-optimal, rather than optimal as previously assumed. However, finding the equilibrium of the zero-sum game requires so much computation that the algorithm is impractical in the real world.
    Optimization Objective of IRL
