2019-01-17 Paperman #1

Author: 朱小虎XiaohuZhu | Published 2019-01-17 00:34

Two important papers from DeepMind, one on model-free planning and one on generalized credit assignment. Both are well worth a careful read; interested readers can message me directly, and we will share detailed analyses soon. Minimal code sketches of each paper's core idea follow the paper list below.

  1. arXiv:1901.03559

    An investigation of model-free planning

    Authors: Arthur Guez, Mehdi Mirza, Karol Gregor, Rishabh Kabra, Sébastien Racanière, Théophane Weber, David Raposo, Adam Santoro, Laurent Orseau, Tom Eccles, Greg Wayne, David Silver, Timothy Lillicrap

    Abstract: The field of reinforcement learning (RL) is facing increasingly challenging domains with combinatorial complexity. For an RL agent to address these challenges, it is essential that it can plan effectively. Prior work has typically utilized an explicit model of the environment, combined with a specific planning algorithm (such as tree search). More recently, a new family of methods have been proposed that learn how to plan, by providing the structure for planning via an inductive bias in the function approximator (such as a tree structured neural network), trained end-to-end by a model-free RL algorithm. In this paper, we go even further, and demonstrate empirically that an entirely model-free approach, without special structure beyond standard neural network components such as convolutional networks and LSTMs, can learn to exhibit many of the characteristics typically associated with a model-based planner. We measure our agent's effectiveness at planning in terms of its ability to generalize across a combinatorial and irreversible state space, its data efficiency, and its ability to utilize additional thinking time. We find that our agent has many of the characteristics that one might expect to find in a planning algorithm. Furthermore, it exceeds the state-of-the-art in challenging combinatorial domains such as Sokoban and outperforms other model-free approaches that utilize strong inductive biases toward planning.

  2. arXiv:1901.01761

    Credit Assignment Techniques in Stochastic Computation Graphs

    Authors: Théophane Weber, Nicolas Heess, Lars Buesing, David Silver

    Abstract: Stochastic computation graphs (SCGs) provide a formalism to represent structured optimization problems arising in artificial intelligence, including supervised, unsupervised, and reinforcement learning. Previous work has shown that an unbiased estimator of the gradient of the expected loss of SCGs can be derived from a single principle. However, this estimator often has high variance and requires a full model evaluation per data point, making this algorithm costly in large graphs. In this work, we address these problems by generalizing concepts from the reinforcement learning literature. We introduce the concepts of value functions, baselines and critics for arbitrary SCGs, and show how to use them to derive lower-variance gradient estimates from partial model evaluations, paving the way towards general and efficient credit assignment for gradient-based optimization. In doing so, we demonstrate how our results unify recent advances in the probabilistic inference and reinforcement learning literature.
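
To make "standard neural network components" concrete, here is a minimal sketch of the kind of agent the first paper studies: a convolutional encoder feeding a stack of convolutional-LSTM cells that is ticked several times on the same observation, so extra "thinking time" is simply extra internal recurrence. This is not the authors' code; the use of PyTorch and all channel counts, depths, and tick counts here are illustrative assumptions.

```python
# Minimal sketch (assumptions, not the paper's architecture verbatim):
# a conv encoder plus stacked ConvLSTM cells, ticked n_ticks times per step.
import torch
import torch.nn as nn


class ConvLSTMCell(nn.Module):
    """Convolutional LSTM cell: all four gates from one 3x3 convolution."""

    def __init__(self, in_ch: int, hid_ch: int):
        super().__init__()
        self.gates = nn.Conv2d(in_ch + hid_ch, 4 * hid_ch, kernel_size=3, padding=1)

    def forward(self, x, state):
        h, c = state
        i, f, o, g = self.gates(torch.cat([x, h], dim=1)).chunk(4, dim=1)
        c = torch.sigmoid(f) * c + torch.sigmoid(i) * torch.tanh(g)
        h = torch.sigmoid(o) * torch.tanh(c)
        return h, (h, c)


class RecurrentPlanner(nn.Module):
    """Encoder + stacked ConvLSTM core, ticked n_ticks times per env step."""

    def __init__(self, obs_ch=3, hid_ch=32, depth=2, n_ticks=3, n_actions=4):
        super().__init__()
        self.encoder = nn.Conv2d(obs_ch, hid_ch, kernel_size=3, padding=1)
        self.core = nn.ModuleList(ConvLSTMCell(hid_ch, hid_ch) for _ in range(depth))
        self.policy = nn.Linear(hid_ch, n_actions)
        self.n_ticks = n_ticks

    def forward(self, obs, states):
        x = torch.relu(self.encoder(obs))
        # Extra internal ticks on the *same* observation stand in for
        # "thinking time": more ticks, more iterative computation.
        for _ in range(self.n_ticks):
            inp = x
            new_states = []
            for cell, s in zip(self.core, states):
                inp, s = cell(inp, s)
                new_states.append(s)
            states = new_states
        logits = self.policy(inp.mean(dim=(2, 3)))  # spatial average pooling
        return logits, states


# Usage: one forward pass on a dummy 8x8 Sokoban-like observation.
net = RecurrentPlanner()
obs = torch.zeros(1, 3, 8, 8)
states = [(torch.zeros(1, 32, 8, 8), torch.zeros(1, 32, 8, 8)) for _ in net.core]
logits, states = net(obs, states)
print(logits.shape)  # torch.Size([1, 4])
```

Nothing in this sketch is an explicit model of the environment; any planning-like behavior has to emerge inside the recurrent state, which is exactly the question the paper probes on domains like Sokoban.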
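
The second abstract builds on the score-function (likelihood-ratio) gradient estimator, and its central theme is cutting that estimator's variance with baselines and critics. Below is a minimal numeric sketch of the base case: the gradient of an expected loss through a single stochastic categorical node, estimated with and without a constant baseline. The specific logits, losses, and sample counts are illustrative assumptions.

```python
# Minimal sketch of the simplest stochastic computation graph:
# theta -> softmax -> categorical sample x -> loss f(x).
import numpy as np

rng = np.random.default_rng(0)

theta = np.array([0.2, -0.1, 0.0])   # logits of one categorical node
losses = np.array([1.0, 3.0, 0.5])   # per-outcome loss f(x)


def softmax(z):
    z = z - z.max()                  # stabilize before exponentiating
    p = np.exp(z)
    return p / p.sum()


def score_function_grad(theta, n_samples, baseline=0.0):
    """Monte Carlo estimate of d E[f(x)] / d theta.

    Uses grad = E[(f(x) - b) * d log p(x) / d theta]; any baseline b
    that does not depend on the sampled x leaves the estimate unbiased.
    """
    p = softmax(theta)
    grads = np.zeros((n_samples, theta.size))
    for s in range(n_samples):
        x = rng.choice(theta.size, p=p)
        dlogp = -p.copy()            # d log p(x)/d theta = onehot(x) - p
        dlogp[x] += 1.0
        grads[s] = (losses[x] - baseline) * dlogp
    return grads.mean(axis=0), grads.std(axis=0)


# Exact gradient for reference: d/dtheta_i of sum_x p(x) f(x)
# works out to p_i * (f_i - E[f]) for a softmax parameterization.
p = softmax(theta)
exact = p * (losses - p @ losses)

mean_nb, std_nb = score_function_grad(theta, 10_000, baseline=0.0)
mean_b, std_b = score_function_grad(theta, 10_000, baseline=p @ losses)
print("exact gradient:", exact)
print("no baseline:   ", mean_nb, "per-sample std:", std_nb)
print("with baseline: ", mean_b, "per-sample std:", std_b)
```

Subtracting any baseline that does not depend on the sample leaves the estimate unbiased but can shrink its variance; the paper's contribution is generalizing this idea to value functions and critics over arbitrary SCGs, computed from partial model evaluations.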

