An Actor-Critic Algorithm for Sequence Prediction

By hzyido | Published 2016-08-03 11:04 · Read 602 times

    Recurrent neural networks


    RNNs for sequence prediction

    In our models, the sequence of vectors is produced by either a bidirectional RNN (Schuster and Paliwal, 1997) or a convolutional encoder (Rush et al., 2015).
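    As a concrete illustration, here is a minimal sketch of a bidirectional encoder that produces one context vector per source position (assuming PyTorch; the vocabulary and layer sizes are illustrative placeholders, not values from the paper):

```python
# A minimal sketch, assuming PyTorch; sizes are illustrative placeholders.
import torch
import torch.nn as nn

class BiRNNEncoder(nn.Module):
    def __init__(self, vocab_size, emb_dim=256, hidden_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        # bidirectional=True runs a forward and a backward pass and
        # concatenates them, so each position yields a 2*hidden_dim vector.
        self.rnn = nn.LSTM(emb_dim, hidden_dim,
                           bidirectional=True, batch_first=True)

    def forward(self, tokens):          # tokens: (batch, src_len)
        emb = self.embed(tokens)        # (batch, src_len, emb_dim)
        states, _ = self.rnn(emb)       # (batch, src_len, 2*hidden_dim)
        return states                   # one vector per source position

encoder = BiRNNEncoder(vocab_size=30000)
h = encoder(torch.randint(0, 30000, (8, 20)))  # h.shape == (8, 20, 512)
```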

    3 Actor-Critic for Sequence Prediction

    We note that this way of re-writing the gradient of the expected reward is known in RL under the names policy gradient theorem (Sutton et al., 1999) and stochastic actor-critic (Sutton, 1984).
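    Concretely, writing $V$ for the expected return, $p$ for the actor's output distribution, and $\hat{y}_{1\ldots t-1}$ for the generated prefix, the decomposition takes the following form (a sketch following the paper's setup, with conditioning on the source sequence left implicit):

$$
\frac{dV}{d\theta}
  = \mathbb{E}_{\hat{Y} \sim p(\hat{Y})}
    \sum_{t=1}^{T} \sum_{a \in \mathcal{A}}
    \frac{dp(a \mid \hat{y}_{1\ldots t-1})}{d\theta}\,
    Q(a;\, \hat{y}_{1\ldots t-1}),
$$

    where $Q(a; \hat{y}_{1\ldots t-1})$ is the expected future return after emitting token $a$; the critic's role is to approximate this quantity with $\hat{Q}$.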

    Training the critic


    Applying deep RL techniques

    Attempts to remove the target network by propagating the gradient through $q_t$ resulted in a lower squared error $(\hat{Q}(\hat{y}_t; \hat{Y}_{1\ldots T}) - q_t)^2$, but the resulting $\hat{Q}$ values proved very unreliable as training signals for the actor.
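    To make the point concrete, here is a minimal sketch of the critic's regression target (assuming PyTorch; the tensor names are illustrative): the target $q_t$ mixes the task reward with the *target* critic's values and is detached, so no gradient is propagated through it:

```python
import torch

def critic_loss(critic_values, rewards, next_probs, target_next_values):
    """critic_values:      (batch, T)        Q-hat for the sampled tokens
    rewards:            (batch, T)        per-step task rewards r_t
    next_probs:         (batch, T, vocab) actor probabilities p(a | y_1..t)
    target_next_values: (batch, T, vocab) target critic Q-bar(a; y_1..t)
    """
    # TD target: reward plus the actor-weighted value of the next step,
    # estimated by the slowly-updated TARGET critic.
    q_targets = rewards + (next_probs * target_next_values).sum(dim=-1)
    # detach() treats q_t as a fixed regression label; propagating the
    # gradient through q_t instead (i.e. dropping the target network)
    # lowers the squared error but destabilizes Q-hat, as noted above.
    return ((critic_values - q_targets.detach()) ** 2).mean()
```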

    Sampling (paper, page 5)
    To compensate for this, we sample predictions from a delayed actor, whose weights are slowly updated to follow the actor that is actually trained. This is inspired by (Lillicrap et al., 2015), where a delayed actor is used for a similar purpose.
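    A minimal sketch of that delayed actor (assuming PyTorch; `tau` is an illustrative soft-update rate in the spirit of Lillicrap et al., 2015): sampling is done from a trailing copy of the actor, whose weights take a small step toward the trained weights after each update:

```python
import copy
import torch

def soft_update(delayed, trained, tau=0.001):
    # Move each delayed-actor parameter a small step toward the trained actor.
    with torch.no_grad():
        for p_d, p_t in zip(delayed.parameters(), trained.parameters()):
            p_d.mul_(1.0 - tau).add_(tau * p_t)

# Usage sketch (`actor.sample` is a hypothetical generation method):
# delayed_actor = copy.deepcopy(actor)
# at each training step:
#     predictions = delayed_actor.sample(inputs)
#     ... update actor and critic on `predictions` ...
#     soft_update(delayed_actor, actor)
```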

    For an explanation of the target critic network, see:

    Continuous Control with Deep Reinforcement Learning (Lillicrap et al., 2015), arXiv:1509.02971
