












waste of computation
policy evaluation is a fixed-policy version of value iteration
full (MDP) problem solved in one step
-> value iteration: solve via the Bellman optimality equation (consider every action for each state)
-> policy iteration: policy evaluation + policy improvement (evaluation follows only the one action the policy picks in each state)
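
To make the contrast concrete, here is a minimal sketch on a made-up 2-state, 2-action MDP. The arrays P and R, the discount GAMMA, and all function names are assumptions for illustration, not something from these notes. The two backups differ in one line: value iteration takes a max over every action, while policy evaluation looks only at the action the fixed policy picks.

```python
import numpy as np

# Hypothetical MDP (illustrative numbers only).
# P[a, s, t] = probability of moving from state s to state t under action a.
# R[s, a]    = immediate reward for taking action a in state s.
P = np.array([
    [[0.9, 0.1], [0.2, 0.8]],   # action 0
    [[0.1, 0.9], [0.7, 0.3]],   # action 1
])
R = np.array([[1.0, 0.0],
              [0.0, 2.0]])
GAMMA = 0.9


def q_values(V):
    """Q[s, a] = R[s, a] + GAMMA * sum_t P[a, s, t] * V[t]."""
    return R + GAMMA * np.einsum("ast,t->sa", P, V)


def value_iteration(tol=1e-10):
    """Bellman optimality backup: considers every action in every state."""
    V = np.zeros(P.shape[1])
    while True:
        V_new = q_values(V).max(axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new
        V = V_new


def policy_evaluation(policy, tol=1e-10):
    """Same backup with the policy fixed: one action per state, no max."""
    V = np.zeros(P.shape[1])
    states = np.arange(len(V))
    while True:
        V_new = q_values(V)[states, policy]
        if np.max(np.abs(V_new - V)) < tol:
            return V_new
        V = V_new


def policy_iteration():
    """Alternate evaluation and greedy improvement until the policy is stable."""
    policy = np.zeros(P.shape[1], dtype=int)
    while True:
        V = policy_evaluation(policy)
        improved = q_values(V).argmax(axis=1)
        if np.array_equal(improved, policy):
            return policy, V
        policy = improved


V_vi = value_iteration()
pi_star, V_pi = policy_iteration()
```

On this toy problem both routes should arrive at the same values, and the greedy policy with respect to V_vi should match pi_star.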





we aren't given the MDP
(meaning that the transition matrix and the reward function are NOT given; the agent only sees sampled transitions)
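
A small sketch of what "not given the MDP" looks like in practice, under made-up assumptions. The array TRUE_P and the helpers step and estimate_transitions are hypothetical names. The learner never reads the transition matrix directly. It can only call the environment and observe sampled next states, from which it can form an estimate by counting.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hidden "true" dynamics (hypothetical numbers). TRUE_P[a, s, t] is the
# probability of s -> t under action a. The learner does not get to see this.
TRUE_P = np.array([
    [[0.9, 0.1], [0.2, 0.8]],   # action 0
    [[0.1, 0.9], [0.7, 0.3]],   # action 1
])


def step(state, action):
    """Environment call: the learner's only access is a sampled next state."""
    return rng.choice(2, p=TRUE_P[action, state])


def estimate_transitions(samples_per_pair=5000):
    """Estimate the transition matrix from experience by counting outcomes."""
    counts = np.zeros_like(TRUE_P)
    for a in range(2):
        for s in range(2):
            for _ in range(samples_per_pair):
                counts[a, s, step(s, a)] += 1
    # Normalise each (action, state) row of counts into probabilities.
    return counts / counts.sum(axis=2, keepdims=True)


P_hat = estimate_transitions()
```

With enough samples P_hat should land close to the hidden matrix, but it is only an estimate built from experience, which is the situation these notes describe.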