Markov Decision Processes II


Author: Ysgc | Published 2020-01-12 15:02 | Read 0 times

waste of computation

policy evaluation is a fixed-policy version of value iteration
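a minimal sketch of that idea, assuming a made-up two-state MDP (the states "cool"/"warm", the actions "slow"/"fast", the rewards, and the discount 0.9 are all hypothetical and not from the lecture). the update is the value iteration update with the max over actions removed, since the policy already fixes the action:

```python
# Hypothetical toy MDP (an assumption for illustration only):
# T[state][action] -> list of (probability, next_state, reward)
T = {
    "cool": {
        "slow": [(1.0, "cool", 1.0)],
        "fast": [(0.5, "cool", 2.0), (0.5, "warm", 2.0)],
    },
    "warm": {
        "slow": [(0.5, "cool", 1.0), (0.5, "warm", 1.0)],
        "fast": [(1.0, "warm", -10.0)],
    },
}
GAMMA = 0.9  # assumed discount factor


def evaluate_policy(policy, T, gamma=GAMMA, tol=1e-10):
    """Iterate V(s) <- sum_s' T(s, pi(s), s') * (R + gamma * V(s')) until it settles."""
    V = {s: 0.0 for s in T}
    while True:
        # No max over actions here: only the action chosen by the policy is used.
        new_V = {
            s: sum(p * (r + gamma * V[s2]) for p, s2, r in T[s][policy[s]])
            for s in T
        }
        if max(abs(new_V[s] - V[s]) for s in T) < tol:
            return new_V
        V = new_V


always_slow = {"cool": "slow", "warm": "slow"}
values = evaluate_policy(always_slow, T)
print(values)
```

each sweep costs one backup per state instead of one per (state, action) pair, which is where the saving over value iteration comes from.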

two ways to solve the full MDP problem:
-> value iteration: repeatedly apply the Bellman optimality update (consider every action for each state)
-> policy iteration: policy evaluation + policy improvement (evaluation takes only the policy's one action for each state)
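the two routes can be compared side by side. this is only a sketch on a made-up two-state MDP (states, actions, rewards, the discount 0.9, and all function names are assumptions for illustration); on this toy problem both routes should end at the same greedy policy:

```python
# Hypothetical toy MDP (an assumption for illustration only):
# T[state][action] -> list of (probability, next_state, reward)
T = {
    "cool": {
        "slow": [(1.0, "cool", 1.0)],
        "fast": [(0.5, "cool", 2.0), (0.5, "warm", 2.0)],
    },
    "warm": {
        "slow": [(0.5, "cool", 1.0), (0.5, "warm", 1.0)],
        "fast": [(1.0, "warm", -10.0)],
    },
}
GAMMA = 0.9  # assumed discount factor


def q_value(V, s, a):
    """Expected return of taking action a in state s, then following values V."""
    return sum(p * (r + GAMMA * V[s2]) for p, s2, r in T[s][a])


def greedy_policy(V):
    """Policy improvement: pick the best action under a one-step look-ahead."""
    return {s: max(T[s], key=lambda a: q_value(V, s, a)) for s in T}


def value_iteration(tol=1e-10):
    """Bellman optimality update: max over every action in every state."""
    V = {s: 0.0 for s in T}
    while True:
        new_V = {s: max(q_value(V, s, a) for a in T[s]) for s in T}
        if max(abs(new_V[s] - V[s]) for s in T) < tol:
            return new_V
        V = new_V


def policy_iteration(tol=1e-10):
    """Alternate policy evaluation (one action per state) and improvement."""
    policy = {s: next(iter(T[s])) for s in T}  # arbitrary starting policy
    while True:
        # Policy evaluation: fixed-policy backups until the values settle.
        V = {s: 0.0 for s in T}
        while True:
            new_V = {s: q_value(V, s, policy[s]) for s in T}
            delta = max(abs(new_V[s] - V[s]) for s in T)
            V = new_V
            if delta < tol:
                break
        # Policy improvement: stop once the greedy policy no longer changes.
        improved = greedy_policy(V)
        if improved == policy:
            return policy, V
        policy = improved


vi_values = value_iteration()
pi_policy, pi_values = policy_iteration()
print(greedy_policy(vi_values), pi_policy)
```

in the policy iteration loop the outer loop runs only a couple of times here, because the policy stops changing well before the values have fully converged.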

we aren't given the MDP
(meaning that the transition matrix is not given, and neither are the rewards; they have to be learned from experience)
