Lecture 4: Introduction to Reinf

Author: Ysgc | Published: 2020-01-21 12:56

    1. Definition

    re-organize: treat each (state, action) pair as one state of an augmented Markov chain -> number of entries = number of states * number of actions
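A minimal sketch of this re-organization, assuming a small tabular problem. The dynamics `p(s' | s, a)` and the policy `pi(a | s)` are random, hypothetical tables (not from the lecture); the point is only that the augmented chain has `n_states * n_actions` entries per axis and is still a valid Markov chain.

```python
import numpy as np

n_states, n_actions = 3, 2
rng = np.random.default_rng(0)

# Hypothetical dynamics p(s' | s, a), shape (S, A, S); the last axis sums to 1.
dynamics = rng.random((n_states, n_actions, n_states))
dynamics /= dynamics.sum(axis=-1, keepdims=True)

# Hypothetical policy pi(a | s), shape (S, A); each row sums to 1.
policy = rng.random((n_states, n_actions))
policy /= policy.sum(axis=-1, keepdims=True)

# Augmented transition: p(s', a' | s, a) = p(s' | s, a) * pi(a' | s').
joint = dynamics[:, :, :, None] * policy[None, None, :, :]  # shape (S, A, S, A)

# Flatten (s, a) into a single index s * n_actions + a.
chain = joint.reshape(n_states * n_actions, n_states * n_actions)

print(chain.shape)
print(np.allclose(chain.sum(axis=1), 1.0))
```

Each row of `chain` is a distribution over the next (state, action) pair, so the usual Markov chain tools apply to it directly.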

    ergodicity
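Ergodicity presumably matters here because it lets the chain forget where it started: repeated application of the transition operator converges to a single stationary distribution. The sketch below illustrates that with a hypothetical 3-state transition matrix (all entries positive, so the chain is ergodic); the matrix and the helper name `stationary` are assumptions for illustration only.

```python
import numpy as np

# Hypothetical transition matrix; entry [i, j] = p(next = j | current = i).
transition = np.array([
    [0.5, 0.3, 0.2],
    [0.1, 0.6, 0.3],
    [0.2, 0.2, 0.6],
])

def stationary(transition, start, n_steps=200):
    """Push a start distribution through the chain n_steps times."""
    dist = np.asarray(start, dtype=float)
    for _ in range(n_steps):
        dist = dist @ transition
    return dist

# Two very different start distributions.
mu_a = stationary(transition, [1.0, 0.0, 0.0])
mu_b = stationary(transition, [0.0, 0.0, 1.0])

print(mu_a)
print(mu_b)
```

Both runs should end at the same distribution `mu`, which satisfies `mu @ transition == mu` up to floating-point error.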

    the reward function itself can be non-smooth (even discontinuous)
    but the expectation of the reward under the policy's distribution is smooth in the policy parameters
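A small worked example of this point, under assumed numbers: the reward is a discontinuous step (+1 for staying on the road, -1 for falling off), and the policy falls with probability `theta`. The names `reward`, `expected_reward`, and `theta` are hypothetical.

```python
import numpy as np

def reward(fell):
    # Discontinuous in the outcome: there is no gradient to follow here.
    return -1.0 if fell else 1.0

def expected_reward(theta):
    # E[r] = P(fell) * r(fell) + P(not fell) * r(not fell)
    return theta * reward(True) + (1.0 - theta) * reward(False)

thetas = np.linspace(0.0, 1.0, 5)
values = np.array([expected_reward(t) for t in thetas])

for t, v in zip(thetas, values):
    print(f"theta={t:.2f}  expected reward={v:+.2f}")
```

The expectation works out to `1 - 2 * theta`, a smooth (here linear) function of the policy parameter even though `reward` itself is a step function.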

    2. Algorithm

    3. Tradeoffs

    sample efficiency may not be the only thing we care about -> wall-clock time matters too
    if algorithms are ranked by wall-clock time instead of sample count -> the ordering can flip

    Q-learning -> the update looks like a gradient step
    -> but it is actually a fixed-point iteration -> with function approximation, convergence is not guaranteed
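To make the fixed-point view concrete, here is a minimal tabular sketch, assuming a random hypothetical MDP (`dynamics`, `rewards`, and `gamma` are illustrative, not from the lecture). It repeatedly applies the Bellman optimality backup; no gradient of any objective is taken.

```python
import numpy as np

n_states, n_actions, gamma = 4, 2, 0.9
rng = np.random.default_rng(0)

# Hypothetical dynamics p(s' | s, a), shape (S, A, S), and rewards r(s, a).
dynamics = rng.random((n_states, n_actions, n_states))
dynamics /= dynamics.sum(axis=-1, keepdims=True)
rewards = rng.random((n_states, n_actions))

def bellman_backup(q):
    # (B Q)(s, a) = r(s, a) + gamma * E_{s'}[max_{a'} Q(s', a')]
    return rewards + gamma * dynamics @ q.max(axis=1)

# Fixed-point iteration: Q <- B Q.
q = np.zeros((n_states, n_actions))
for _ in range(500):
    q = bellman_backup(q)

# At a fixed point, applying the backup again changes nothing.
residual = np.abs(bellman_backup(q) - q).max()
print(residual)
```

In this tabular setting the backup is a contraction, so the iteration converges. Once each backup is followed by fitting a function approximator, the combined operator need not be a contraction, which is why the convergence argument no longer goes through.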

    Model-based RL -> a more accurate model -> does not necessarily mean a better policy (more reward)

    Policy gradient -> does gradient ascent directly on the true objective -> better stability
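For contrast with value-based updates, a minimal sketch of policy gradient on a hypothetical 3-armed bandit with a softmax policy. The gradient is computed exactly rather than from samples, which is a simplification for illustration; `rewards`, the step size, and the function names are all assumptions.

```python
import numpy as np

rewards = np.array([1.0, 0.0, 0.5])  # hypothetical reward per action

def softmax(logits):
    z = np.exp(logits - logits.max())
    return z / z.sum()

def objective(logits):
    # J(theta) = sum_a pi_theta(a) * r(a)
    return softmax(logits) @ rewards

def policy_gradient(logits):
    probs = softmax(logits)
    # sum_a pi(a) * grad log pi(a) * r(a), using grad log pi(a) = onehot(a) - pi
    return probs * (rewards - probs @ rewards)

logits = np.zeros(3)
history = [objective(logits)]
for _ in range(200):
    logits = logits + 0.5 * policy_gradient(logits)  # gradient ascent on J
    history.append(objective(logits))

print(history[0], history[-1])
```

Every update here follows the gradient of the quantity we actually care about, `objective`, so progress can be read directly off `history`.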

    vision-based grasping
