03-07 Dyna

Author: woodwood2000 | Published 2017-12-21 16:03

https://classroom.udacity.com/courses/ud501/lessons/5326212698/concepts/54629888620923

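hallucinate: to imagine experiences. Here, the learner generates simulated experience from its own model instead of acting in the real world.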

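Dyna-Q: a hybrid of model-free and model-based learning

After each interaction with the real world, Dyna-Q performs about 100 additional updates on experiences simulated from its own model.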

T'[s,a,s']: the probability of ending in state s' after taking action a in state s
R'[s,a]: the expected reward for taking action a in state s
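
A minimal sketch of how these two tables might be stored and queried, assuming a small discrete problem. The sizes, the uniform starting guess for T, and the helper name simulate_step are illustrative assumptions, not part of the course material. The arrays are named T and R here, without the primes.

```python
import numpy as np

num_states, num_actions = 4, 2  # hypothetical sizes, for illustration only

# T[s, a, s']: probability of landing in s' after taking action a in state s.
# Start from a uniform guess so every T[s, a] is a valid distribution.
T = np.full((num_states, num_actions, num_states), 1.0 / num_states)

# R[s, a]: expected immediate reward for taking action a in state s.
R = np.zeros((num_states, num_actions))


def simulate_step(T, R, s, a, rng):
    """Query the model: sample s' from T[s, a] and look up R[s, a]."""
    s_prime = int(rng.choice(T.shape[2], p=T[s, a]))
    return s_prime, float(R[s, a])


rng = np.random.default_rng(0)
s_prime, r = simulate_step(T, R, s=0, a=1, rng=rng)
```

Sampling s' from T[s, a] is what lets the learner "hallucinate" a transition without touching the real environment.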

Update T according to how many times each transition was actually observed in the real world.

Exercise: How To Evaluate T?

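Correction: The expression should be:

$$T[s,a,s'] = \frac{T_c[s,a,s']}{\sum_{i} T_c[s,a,i]}$$

where T_c[s,a,s'] is the number of times the transition from s to s' under action a has been observed.
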
Computing transition probabilities using counts
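
The count-based estimate can be sketched in a few lines of NumPy. The array name Tc, the tiny starting count of 1e-5 (used only to avoid dividing by zero for pairs never tried), and the experience tuples are assumptions made for this example.

```python
import numpy as np


def transition_probs(Tc):
    """Normalize counts Tc[s, a, s'] over s' to get probabilities T[s, a, s']."""
    totals = Tc.sum(axis=2, keepdims=True)
    return Tc / totals


num_states, num_actions = 3, 2  # hypothetical sizes

# Start every count at a tiny positive value so no row sums to zero.
Tc = np.full((num_states, num_actions, num_states), 1e-5)

# Made-up real-world experience tuples (s, a, s').
experiences = [(0, 1, 2), (0, 1, 2), (0, 1, 1), (2, 0, 0)]
for s, a, s_prime in experiences:
    Tc[s, a, s_prime] += 1  # count each observed transition

T = transition_probs(Tc)
```

With these tuples, T[0, 1, 2] comes out close to 2/3, because two of the three observed transitions from state 0 under action 1 ended in state 2.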

R: the reward stored in the model
r: the actual immediate reward
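
One common way to keep the modeled R in step with the observed r is an exponentially weighted average. The sketch below assumes that update rule and a learning rate alpha of 0.2; both are assumptions for illustration, since the notes only name the two quantities.

```python
def update_reward_model(R, s, a, r, alpha=0.2):
    """Blend the observed reward r into the modeled reward R[s][a].

    alpha is a hypothetical learning rate: 0 ignores new data,
    1 overwrites the model with the latest reward.
    """
    R[s][a] = (1 - alpha) * R[s][a] + alpha * r
    return R[s][a]


# One state, two actions, with no reward information yet.
R = [[0.0, 0.0]]
first = update_reward_model(R, s=0, a=1, r=1.0)   # moves 20% of the way to 1.0
second = update_reward_model(R, s=0, a=1, r=1.0)  # moves 20% of the remaining gap
```

Repeated observations of the same reward pull R[s][a] steadily toward it, while a single noisy reward only shifts the model slightly.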

Summary

The Dyna architecture consists of a combination of:

direct reinforcement learning from real experience tuples gathered by acting in an environment,
updating an internal model of the environment, and,
using the model to simulate experiences.

Sutton and Barto. Reinforcement Learning: An Introduction. MIT Press, Cambridge, MA, 1998.
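
The three components in the summary can be put together in a short tabular sketch. Everything specific here is an assumption for illustration: the 5-state corridor environment chain_step, the hyperparameters, and the choice to sample planning steps only from state-action pairs already visited (the lecture's variant may pick s and a at random instead).

```python
import numpy as np


def chain_step(s, a, goal=4):
    """Hypothetical 5-state corridor: action 0 moves left, action 1 moves right."""
    s_prime = max(0, s - 1) if a == 0 else min(goal, s + 1)
    reward = 1.0 if s_prime == goal else 0.0
    return s_prime, reward


def q_update(Q, s, a, s_prime, r, alpha, gamma):
    """Standard Q-learning update for one experience tuple."""
    Q[s, a] = (1 - alpha) * Q[s, a] + alpha * (r + gamma * Q[s_prime].max())


def dyna_q(num_states=5, num_actions=2, goal=4, episodes=50,
           planning_steps=100, alpha=0.2, gamma=0.9, epsilon=0.1, seed=0):
    rng = np.random.default_rng(seed)
    Q = np.zeros((num_states, num_actions))
    Tc = np.full((num_states, num_actions, num_states), 1e-5)  # transition counts
    R = np.zeros((num_states, num_actions))                    # modeled rewards
    seen = []                                                  # visited (s, a) pairs
    for _ in range(episodes):
        s = 0
        for _ in range(1000):  # safety cap on episode length
            if s == goal:
                break
            # Epsilon-greedy action, breaking ties at random.
            if rng.random() < epsilon:
                a = int(rng.integers(num_actions))
            else:
                best = np.flatnonzero(Q[s] == Q[s].max())
                a = int(rng.choice(best))
            s_prime, r = chain_step(s, a, goal)

            # 1. Direct reinforcement learning from the real experience tuple.
            q_update(Q, s, a, s_prime, r, alpha, gamma)

            # 2. Update the internal model of the environment.
            Tc[s, a, s_prime] += 1
            R[s, a] = (1 - alpha) * R[s, a] + alpha * r
            if (s, a) not in seen:
                seen.append((s, a))

            # 3. Use the model to simulate experiences and learn from them.
            for _ in range(planning_steps):
                ps, pa = seen[rng.integers(len(seen))]
                probs = Tc[ps, pa] / Tc[ps, pa].sum()
                ps_prime = int(rng.choice(num_states, p=probs))
                q_update(Q, ps, pa, ps_prime, R[ps, pa], alpha, gamma)

            s = s_prime
    return Q


Q = dyna_q()
policy = Q[:4].argmax(axis=1)  # greedy action in each non-goal state
```

On this corridor the greedy policy read from Q should choose action 1 (right) in every non-goal state. The planning loop does most of the learning: each real step is followed by planning_steps simulated updates, so far fewer real interactions are needed than with plain Q-learning.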

Resources

Richard S. Sutton. Integrated architectures for learning, planning, and reacting based on approximating dynamic programming. In Proceedings of the Seventh International Conference on Machine Learning, Austin, TX, 1990.
Sutton and Barto. Reinforcement Learning: An Introduction. MIT Press, Cambridge, MA, 1998.
RL course by David Silver (videos, slides), Lecture 8: Integrating Learning and Planning.
