Reinforcement learning has four elements: a policy, a reward signal, a value function, and a model of the environment.
1. Policy
A policy defines the learning agent's way of behaving at a given time: it determines what action the agent takes in response to its current situation in the environment.
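A minimal sketch of a policy, assuming a small tabular setting where the policy stores, for each state, a probability for each action. The states, actions, and the `POLICY` table and `act` function are hypothetical, chosen only to show that a policy is a mapping from situations to (possibly random) actions.

```python
import random

# Hypothetical tabular stochastic policy: POLICY[state][action] is the
# probability of choosing `action` in `state`. Labels are made up.
POLICY = {
    "hungry": {"eat": 0.9, "sleep": 0.1},
    "tired": {"eat": 0.2, "sleep": 0.8},
    "fine": {"work": 1.0},
}

def act(policy, state, rng=random):
    """Sample an action for `state` according to the policy's probabilities."""
    actions = list(policy[state].keys())
    weights = list(policy[state].values())
    return rng.choices(actions, weights=weights, k=1)[0]

rng = random.Random(0)
print(act(POLICY, "fine", rng))    # work (the only action, probability 1.0)
print(act(POLICY, "hungry", rng))  # usually "eat", sometimes "sleep"
```

A deterministic policy is the special case where each state puts probability 1.0 on a single action, as in the `"fine"` entry.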
2. Reward Signal
A reward signal defines the goal in a reinforcement learning problem. The learning objective is to maximize the total reward the agent receives.
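A minimal sketch of how a reward signal encodes a goal, assuming a made-up task where the agent should reach a state labelled `"goal"`. The `reward` function, its values (+10 at the goal, -1 for every other step), and the two paths are all hypothetical. Comparing paths by their total reward shows why maximizing total reward here means reaching the goal quickly.

```python
# Hypothetical reward signal: +10 for reaching "goal", -1 for any other step,
# so the fewer steps taken before the goal, the higher the total reward.
def reward(next_state):
    return 10 if next_state == "goal" else -1

def total_reward(trajectory):
    """Sum the rewards received along a sequence of visited states."""
    return sum(reward(s) for s in trajectory)

short_path = ["a", "goal"]
long_path = ["a", "b", "c", "goal"]
print(total_reward(short_path))  # 9
print(total_reward(long_path))   # 7
```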
3. Value Function
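A value function specifies what is good in the long run. For example, an agent may choose an action that yields a low immediate reward because the total reward it collects over the following period is larger. The value function can be viewed as an estimate of future reward, and it is a core component of reinforcement learning algorithms.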
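The example above (accepting a low immediate reward for a larger long-run total) can be sketched as follows. This assumes the reward sequence following each first action is already known, and it sums future rewards with a discount factor `gamma = 0.9`; both the action names and the discounting are assumptions for illustration. In practice a value function is an estimate learned from experience, not computed from known sequences.

```python
def discounted_return(rewards, gamma=0.9):
    """Sum of future rewards, each weighted by gamma**t (t = steps from now)."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g

# Hypothetical reward sequences that follow from two different first actions.
outcomes = {
    "grab_small_prize": [1, 0, 0, 0],  # higher immediate reward, nothing after
    "invest": [0, 0, 2, 3],            # lower immediate reward, more later
}

# Greedy choice looks only at the immediate reward; the value-based choice
# looks at the long-run (discounted) total.
greedy_choice = max(outcomes, key=lambda a: outcomes[a][0])
value_choice = max(outcomes, key=lambda a: discounted_return(outcomes[a]))
print(greedy_choice)  # grab_small_prize
print(value_choice)   # invest
```

With `gamma = 0.9`, `"invest"` is worth about 3.81 against 1.0 for `"grab_small_prize"`, so the value-based choice prefers it despite its zero immediate reward.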
4. Model of the Environment
The model of the environment describes how the environment changes in response to the agent's actions. It is something that mimics the behavior of the environment or, more generally, that allows inferences to be made about how the environment will behave.
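A minimal sketch of a model and one way to use it, assuming a deterministic tabular model that maps `(state, action)` to a predicted `(next_state, reward)`. The `MODEL` and `VALUES` tables, the state names, and `gamma = 0.9` are hypothetical. The `plan_one_step` function uses the model to look one step ahead and pick an action without acting in the real environment.

```python
# Hypothetical deterministic tabular model: (state, action) -> (next_state, reward).
MODEL = {
    ("start", "left"): ("pit", -10),
    ("start", "right"): ("hallway", 0),
    ("hallway", "right"): ("goal", 10),
}

# Hypothetical state-value estimates used to score predicted next states.
VALUES = {"pit": 0.0, "hallway": 9.0, "goal": 0.0}

def predict(model, state, action):
    """Ask the model what the environment would do, without acting in it."""
    return model[(state, action)]

def plan_one_step(model, values, state, actions, gamma=0.9):
    """Pick the action with the largest predicted reward + discounted next-state value."""
    def score(action):
        next_state, reward = predict(model, state, action)
        return reward + gamma * values[next_state]
    return max(actions, key=score)

print(predict(MODEL, "start", "right"))                          # ('hallway', 0)
print(plan_one_step(MODEL, VALUES, "start", ["left", "right"]))  # right
```

Here `"left"` scores -10 and `"right"` scores 8.1, so the planner chooses `"right"` purely from the model's predictions.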