Author: David Silver
He was awarded the 2019 ACM Prize in Computing for breakthrough advances in computer game-playing.
Outline
- Introduction
- Incremental Methods
- Batch Methods
Large-Scale Reinforcement Learning
Reinforcement learning can be used to solve large problems, e.g.
- Backgammon: 10^20 states
- Computer Go: 10^170 states
- Helicopter: continuous state space
How can we scale up the model-free methods for prediction and control from the last two lectures?
Value Function Approximation
![](https://img.haomeiwen.com/i4905462/dc1877b1bfab1479.png)
Types of Value Function Approximation
![](https://img.haomeiwen.com/i4905462/b269597de481d282.png)
Which Function Approximator?
![](https://img.haomeiwen.com/i4905462/cda05bef558a4afc.png)
Gradient Descent
![](https://img.haomeiwen.com/i4905462/de2c717d15ae2d2a.png)
Value Function Approx. By Stochastic Gradient Descent
![](https://img.haomeiwen.com/i4905462/9332b7ec84529a2c.png)
Feature Vectors
![](https://img.haomeiwen.com/i4905462/e01199fd06478354.png)
Linear Value Function Approximation
![](https://img.haomeiwen.com/i4905462/0a56826eef165063.png)
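The linear case on this slide can be sketched in a few lines. This is an illustrative sketch (function names and the one-hot encoding are my own, not from the slides): with v̂(s, w) = x(s)·w, the gradient with respect to w is just the feature vector x(s), so the stochastic-gradient update is Δw = α (target − v̂(s, w)) x(s). Table lookup is the special case where x(s) is a one-hot indicator.

```python
def v_hat(w, x):
    """Linear value estimate: v̂(s, w) = x(s)·w."""
    return sum(wi * xi for wi, xi in zip(w, x))

def sgd_update(w, x, target, alpha):
    """One stochastic-gradient step toward `target`.
    For linear v̂, the gradient w.r.t. w is just the feature vector x(s)."""
    error = target - v_hat(w, x)
    return [wi + alpha * error * xi for wi, xi in zip(w, x)]

# Table-lookup features: x(s) is a one-hot indicator, so the update
# reduces to adjusting a single table entry.
w = [0.0, 0.0, 0.0]
x = [0.0, 1.0, 0.0]          # one-hot feature vector for state s=1
w = sgd_update(w, x, target=4.0, alpha=0.5)
```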
Table Lookup Features
![](https://img.haomeiwen.com/i4905462/4b7ade57c803435f.png)
Incremental Prediction Algorithms
![](https://img.haomeiwen.com/i4905462/a761ddc18ea5dc3b.png)
Monte-Carlo with Value Function Approximation
![](https://img.haomeiwen.com/i4905462/dc2dc8a740559964.png)
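Monte-Carlo prediction with function approximation treats each return G_t as a supervised target for v̂(S_t, w). A minimal sketch (the episode encoding as a list of (features, reward) pairs is an assumption of mine, not the slides' notation):

```python
def mc_update(w, episode, alpha, gamma=1.0):
    """Monte-Carlo value-function update: use each return G_t as the
    regression target for v̂(S_t, w).
    `episode` is a list of (feature_vector, reward) pairs in time order."""
    # Compute returns by sweeping the episode backwards.
    G = 0.0
    targets = []
    for x, r in reversed(episode):
        G = r + gamma * G
        targets.append((x, G))
    # Apply the SGD updates in the original time order.
    for x, G in reversed(targets):
        error = G - sum(wi * xi for wi, xi in zip(w, x))
        w = [wi + alpha * error * xi for wi, xi in zip(w, x)]
    return w
```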
TD Learning with Value Function Approximation
![](https://img.haomeiwen.com/i4905462/7bd539aef4681c78.png)
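TD learning replaces the return with the bootstrapped target R + γ·v̂(S′, w). A sketch of one semi-gradient TD(0) step under the same linear-feature assumption as above (names are illustrative); note the target is treated as a constant, i.e. no gradient flows through v̂(S′, w):

```python
def td0_update(w, x, r, x_next, alpha, gamma, done=False):
    """Semi-gradient TD(0) with linear features:
    w <- w + alpha * (R + gamma*v̂(S',w) - v̂(S,w)) * x(S)."""
    v = lambda feats: sum(wi * fi for wi, fi in zip(w, feats))
    target = r if done else r + gamma * v(x_next)
    error = target - v(x)           # TD error; target is held fixed
    return [wi + alpha * error * xi for wi, xi in zip(w, x)]
```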
TD(λ) with Value Function Approximation
![](https://img.haomeiwen.com/i4905462/82dbf915de1bf5f8.png)
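Backward-view TD(λ) adds an eligibility trace. A sketch assuming linear features and an accumulating trace (my encoding, not the slides'): the trace decays by γλ and accumulates the current features, and every weight is nudged by α·δ along its trace.

```python
def td_lambda_step(w, z, x, r, x_next, alpha, gamma, lam, done=False):
    """One backward-view TD(λ) step with linear features and an
    accumulating eligibility trace:
      z <- gamma*lam*z + x(S);   w <- w + alpha * delta * z."""
    v = lambda feats: sum(wi * fi for wi, fi in zip(w, feats))
    delta = (r if done else r + gamma * v(x_next)) - v(x)
    z = [gamma * lam * zi + xi for zi, xi in zip(z, x)]
    w = [wi + alpha * delta * zi for wi, zi in zip(w, z)]
    return w, z
```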
Control with Value Function Approximation
![](https://img.haomeiwen.com/i4905462/4353b85b356caa8e.png)
Action-Value Function Approximation
![](https://img.haomeiwen.com/i4905462/638010bcf5be00e0.png)
Linear Action-Value Function Approximation
![](https://img.haomeiwen.com/i4905462/ff7da9375ef9d7ee.png)
Incremental Control Algorithms
![](https://img.haomeiwen.com/i4905462/6727d15ee4de14ae.png)
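For control, the same updates are applied to an action-value approximator q̂(s, a, w), with actions chosen ε-greedily. A sketch of semi-gradient Sarsa assuming a joint state-action feature vector x(s, a) (the helper names here are illustrative, not the lecture's):

```python
import random

def sarsa_update(w, x_sa, r, x_next_sa, alpha, gamma, done=False):
    """Semi-gradient Sarsa: q̂(s, a, w) = x(s, a)·w, updated toward
    R + gamma*q̂(S', A', w), where A' is the action actually taken next."""
    q = lambda feats: sum(wi * fi for wi, fi in zip(w, feats))
    target = r if done else r + gamma * q(x_next_sa)
    error = target - q(x_sa)
    return [wi + alpha * error * xi for wi, xi in zip(w, x_sa)]

def epsilon_greedy(w, candidate_features, eps):
    """Return the index of the greedy action with prob. 1-eps,
    a uniformly random action otherwise."""
    if random.random() < eps:
        return random.randrange(len(candidate_features))
    qs = [sum(wi * fi for wi, fi in zip(w, x)) for x in candidate_features]
    return max(range(len(qs)), key=qs.__getitem__)
```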
Linear Sarsa with Coarse Coding in Mountain Car
![](https://img.haomeiwen.com/i4905462/4bd74b2a51ae5eaa.png)
Linear Sarsa with Radial Basis Functions in Mountain Car
![](https://img.haomeiwen.com/i4905462/88b652ce9ebaa487.png)
Study of λ: Should We Bootstrap?
![](https://img.haomeiwen.com/i4905462/8787dca8130d83fc.png)
Baird’s Counterexample
![](https://img.haomeiwen.com/i4905462/f003af71f2398dd3.png)
Parameter Divergence in Baird’s Counterexample
![](https://img.haomeiwen.com/i4905462/f431aab6a8138a1d.png)
Convergence of Prediction Algorithms
![](https://img.haomeiwen.com/i4905462/4e3f66027bf59297.png)
Gradient Temporal-Difference Learning
![](https://img.haomeiwen.com/i4905462/14278683288aa5ad.png)
Convergence of Control Algorithms
![](https://img.haomeiwen.com/i4905462/44f94ec6eabfe0d3.png)
Batch Reinforcement Learning
- Gradient descent is simple and appealing
- But it is not sample efficient
- Batch methods seek to find the best-fitting value function
- Given the agent's experience ("training data")
Least Squares Prediction
![](https://img.haomeiwen.com/i4905462/5009210a7ada96a0.png)
Stochastic Gradient Descent with Experience Replay
![](https://img.haomeiwen.com/i4905462/13c350e72130b7d1.png)
Stochastic Gradient Descent with Experience Replay (2)
![](https://img.haomeiwen.com/i4905462/9dfa7c53b57adc43.png)
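The replay mechanism itself is simple: store transitions in a fixed-size buffer and do SGD on minibatches sampled uniformly from it, which de-correlates the updates. A minimal sketch of such a buffer (the class and field names are my own, not DQN's actual implementation):

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-size store of (s, a, r, s', done) transitions.
    Uniform sampling breaks the temporal correlation of online updates;
    the deque drops the oldest transition once capacity is reached."""
    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)

    def add(self, transition):
        self.buffer.append(transition)

    def sample(self, batch_size):
        return random.sample(self.buffer, min(batch_size, len(self.buffer)))
```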
Experience Replay in Deep Q-Networks (DQN)
![](https://img.haomeiwen.com/i4905462/584ca4c9791db297.png)
DQN in Atari
![](https://img.haomeiwen.com/i4905462/e171bd7e22bd1068.png)
DQN Results in Atari
![](https://img.haomeiwen.com/i4905462/bda668cb140cee78.png)
How much does DQN help?
![](https://img.haomeiwen.com/i4905462/261475530cc27e01.png)
Linear Least Squares Prediction
- Experience replay finds the least squares solution
- But it may take many iterations
- Using linear value function approximation
- We can solve for the least squares solution directly
Linear Least Squares Prediction (2)
![](https://img.haomeiwen.com/i4905462/d44cc0b2351c9fab.png)
Linear Least Squares Prediction Algorithms
![](https://img.haomeiwen.com/i4905462/81057ac15cc97eee.png)
Linear Least Squares Prediction Algorithms (2)
![](https://img.haomeiwen.com/i4905462/40131dc82a357899.png)
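With linear features the TD fixed point can be computed in closed form (LSTD) instead of by iteration: solve A w = b with A = Σ_t x_t (x_t − γ x_{t+1})ᵀ and b = Σ_t x_t R_{t+1}. A sketch using NumPy (the small ridge term is my addition to keep A invertible on short trajectories; it is not part of the lecture's formulation):

```python
import numpy as np

def lstd(transitions, gamma, ridge=1e-6):
    """Least-Squares TD: solve A w = b in closed form, where
      A = sum_t x_t (x_t - gamma * x_{t+1})^T,   b = sum_t x_t * R_{t+1}.
    `transitions` is a list of (x, r, x_next) tuples of feature vectors."""
    d = len(transitions[0][0])
    A = ridge * np.eye(d)        # small ridge for numerical invertibility
    b = np.zeros(d)
    for x, r, x_next in transitions:
        x = np.asarray(x, dtype=float)
        x_next = np.asarray(x_next, dtype=float)
        A += np.outer(x, x - gamma * x_next)
        b += r * x
    return np.linalg.solve(A, b)
```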
Convergence of Linear Least Squares Prediction Algorithms
![](https://img.haomeiwen.com/i4905462/37cb5ef0daf801ce.png)
Least Squares Policy Iteration
![](https://img.haomeiwen.com/i4905462/de00a3d4d0d05cd7.png)
Least Squares Action-Value Function Approximation
![](https://img.haomeiwen.com/i4905462/6235ae3eb52d873d.png)
Least Squares Control
![](https://img.haomeiwen.com/i4905462/43b083ddcde264df.png)
Least Squares Q-Learning
![](https://img.haomeiwen.com/i4905462/48c11d8761f297cd.png)
Least Squares Policy Iteration Algorithm
![](https://img.haomeiwen.com/i4905462/d3626a9c44f9b03c.png)
Convergence of Control Algorithms
![](https://img.haomeiwen.com/i4905462/21e674cf11528c02.png)
Chain Walk Example
![](https://img.haomeiwen.com/i4905462/28e6c2da0fcbb400.png)
LSPI in Chain Walk: Action-Value Function
![](https://img.haomeiwen.com/i4905462/4eb418d37634a6a9.png)
LSPI in Chain Walk: Policy
![](https://img.haomeiwen.com/i4905462/af7ff5896e7efb91.png)
Questions?
![](https://img.haomeiwen.com/i4905462/5a04f726f55f2859.png)
Reference: *UCL Course on RL*