2019-01-17 Paperman #1

Author: 朱小虎XiaohuZhu | Published 2019-01-17 00:34

Two important papers from DeepMind, one on model-free planning and one on generalized credit assignment. Both are well worth a careful read; interested readers can message me directly, and we will share detailed analyses soon. Minimal code sketches of each paper's core idea follow the paper list below.

  1. arXiv:1901.03559

    An investigation of model-free planning

    Authors: Arthur Guez, Mehdi Mirza, Karol Gregor, Rishabh Kabra, Sébastien Racanière, Théophane Weber, David Raposo, Adam Santoro, Laurent Orseau, Tom Eccles, Greg Wayne, David Silver, Timothy Lillicrap

    Abstract: The field of reinforcement learning (RL) is facing increasingly challenging domains with combinatorial complexity. For an RL agent to address these challenges, it is essential that it can plan effectively. Prior work has typically utilized an explicit model of the environment, combined with a specific planning algorithm (such as tree search). More recently, a new family of methods have been proposed that learn how to plan, by providing the structure for planning via an inductive bias in the function approximator (such as a tree structured neural network), trained end-to-end by a model-free RL algorithm. In this paper, we go even further, and demonstrate empirically that an entirely model-free approach, without special structure beyond standard neural network components such as convolutional networks and LSTMs, can learn to exhibit many of the characteristics typically associated with a model-based planner. We measure our agent's effectiveness at planning in terms of its ability to generalize across a combinatorial and irreversible state space, its data efficiency, and its ability to utilize additional thinking time. We find that our agent has many of the characteristics that one might expect to find in a planning algorithm. Furthermore, it exceeds the state-of-the-art in challenging combinatorial domains such as Sokoban and outperforms other model-free approaches that utilize strong inductive biases toward planning.

  2. arXiv:1901.01761

    Credit Assignment Techniques in Stochastic Computation Graphs

    Authors: Théophane Weber, Nicolas Heess, Lars Buesing, David Silver

    Abstract: Stochastic computation graphs (SCGs) provide a formalism to represent structured optimization problems arising in artificial intelligence, including supervised, unsupervised, and reinforcement learning. Previous work has shown that an unbiased estimator of the gradient of the expected loss of SCGs can be derived from a single principle. However, this estimator often has high variance and requires a full model evaluation per data point, making this algorithm costly in large graphs. In this work, we address these problems by generalizing concepts from the reinforcement learning literature. We introduce the concepts of value functions, baselines and critics for arbitrary SCGs, and show how to use them to derive lower-variance gradient estimates from partial model evaluations, paving the way towards general and efficient credit assignment for gradient-based optimization. In doing so, we demonstrate how our results unify recent advances in the probabilistic inference and reinforcement learning literature.
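
To make "standard neural network components" concrete, here is a minimal sketch of the kind of agent the first paper studies: a convolutional encoder feeding a stack of convolutional-LSTM cells that is ticked several times on the same observation, so extra "thinking time" is simply extra internal recurrence. This is not the authors' code; the use of PyTorch and all channel counts, depths, and tick counts here are illustrative assumptions.

```python
# Minimal sketch (assumptions, not the paper's architecture verbatim):
# a conv encoder plus stacked ConvLSTM cells, ticked n_ticks times per step.
import torch
import torch.nn as nn


class ConvLSTMCell(nn.Module):
    """Convolutional LSTM cell: all four gates from one 3x3 convolution."""

    def __init__(self, in_ch: int, hid_ch: int):
        super().__init__()
        self.gates = nn.Conv2d(in_ch + hid_ch, 4 * hid_ch, kernel_size=3, padding=1)

    def forward(self, x, state):
        h, c = state
        i, f, o, g = self.gates(torch.cat([x, h], dim=1)).chunk(4, dim=1)
        c = torch.sigmoid(f) * c + torch.sigmoid(i) * torch.tanh(g)
        h = torch.sigmoid(o) * torch.tanh(c)
        return h, (h, c)


class RecurrentPlanner(nn.Module):
    """Encoder + stacked ConvLSTM core, ticked n_ticks times per env step."""

    def __init__(self, obs_ch=3, hid_ch=32, depth=2, n_ticks=3, n_actions=4):
        super().__init__()
        self.encoder = nn.Conv2d(obs_ch, hid_ch, kernel_size=3, padding=1)
        self.core = nn.ModuleList(ConvLSTMCell(hid_ch, hid_ch) for _ in range(depth))
        self.policy = nn.Linear(hid_ch, n_actions)
        self.n_ticks = n_ticks

    def forward(self, obs, states):
        x = torch.relu(self.encoder(obs))
        # Extra internal ticks on the *same* observation stand in for
        # "thinking time": more ticks, more iterative computation.
        for _ in range(self.n_ticks):
            inp = x
            new_states = []
            for cell, s in zip(self.core, states):
                inp, s = cell(inp, s)
                new_states.append(s)
            states = new_states
        logits = self.policy(inp.mean(dim=(2, 3)))  # spatial average pooling
        return logits, states


# Usage: one forward pass on a dummy 8x8 Sokoban-like observation.
net = RecurrentPlanner()
obs = torch.zeros(1, 3, 8, 8)
states = [(torch.zeros(1, 32, 8, 8), torch.zeros(1, 32, 8, 8)) for _ in net.core]
logits, states = net(obs, states)
print(logits.shape)  # torch.Size([1, 4])
```

Nothing in this sketch is an explicit model of the environment; any planning-like behavior has to emerge inside the recurrent state, which is exactly the question the paper probes on domains like Sokoban.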
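
The second abstract builds on the score-function (likelihood-ratio) gradient estimator, and its central theme is cutting that estimator's variance with baselines and critics. Below is a minimal numeric sketch of the base case: the gradient of an expected loss through a single stochastic categorical node, estimated with and without a constant baseline. The specific logits, losses, and sample counts are illustrative assumptions.

```python
# Minimal sketch of the simplest stochastic computation graph:
# theta -> softmax -> categorical sample x -> loss f(x).
import numpy as np

rng = np.random.default_rng(0)

theta = np.array([0.2, -0.1, 0.0])   # logits of one categorical node
losses = np.array([1.0, 3.0, 0.5])   # per-outcome loss f(x)


def softmax(z):
    z = z - z.max()                  # stabilize before exponentiating
    p = np.exp(z)
    return p / p.sum()


def score_function_grad(theta, n_samples, baseline=0.0):
    """Monte Carlo estimate of d E[f(x)] / d theta.

    Uses grad = E[(f(x) - b) * d log p(x) / d theta]; any baseline b
    that does not depend on the sampled x leaves the estimate unbiased.
    """
    p = softmax(theta)
    grads = np.zeros((n_samples, theta.size))
    for s in range(n_samples):
        x = rng.choice(theta.size, p=p)
        dlogp = -p.copy()            # d log p(x)/d theta = onehot(x) - p
        dlogp[x] += 1.0
        grads[s] = (losses[x] - baseline) * dlogp
    return grads.mean(axis=0), grads.std(axis=0)


# Exact gradient for reference: d/dtheta_i of sum_x p(x) f(x)
# works out to p_i * (f_i - E[f]) for a softmax parameterization.
p = softmax(theta)
exact = p * (losses - p @ losses)

mean_nb, std_nb = score_function_grad(theta, 10_000, baseline=0.0)
mean_b, std_b = score_function_grad(theta, 10_000, baseline=p @ losses)
print("exact gradient:", exact)
print("no baseline:   ", mean_nb, "per-sample std:", std_nb)
print("with baseline: ", mean_b, "per-sample std:", std_b)
```

Subtracting any baseline that does not depend on the sample leaves the estimate unbiased but can shrink its variance; the paper's contribution is generalizing this idea to value functions and critics over arbitrary SCGs, computed from partial model evaluations.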

