Abstract
idea
- an RNN generates model descriptions
- train this RNN with reinforcement learning
achievements
- CIFAR-10 state-of-the-art
- composes a novel recurrent cell, and this cell is transferable
Introduction
idea
- The paper rests on the observation that a network's structure and connectivity can be encoded as a variable-length string, which an RNN can therefore generate
- In a reinforcement learning setup (a sketch of this loop follows the list):
- the RNN acts as the controller that generates the string, i.e., the model description
- the generated model is trained, and its accuracy on the validation set serves as the reward
- the controller RNN is updated to obtain a higher reward, i.e., higher validation accuracy
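A toy Python sketch of this loop. The helper names (`sample_architecture`, `train_and_evaluate`, `update_controller`) are illustrative placeholders, not from the paper, and child training is stubbed with a random reward so the control flow runs:

```python
# Hypothetical sketch of the NAS reinforcement-learning loop described above.
import random

def sample_architecture(controller_state):
    """Stub: the controller RNN would emit one hyperparameter per step."""
    return [random.choice([1, 3, 5, 7]) for _ in range(4)]  # e.g. filter sizes

def train_and_evaluate(arch):
    """Stub: train the child network, return validation accuracy as reward."""
    return random.random()

def update_controller(controller_state, arch, reward):
    """Stub: policy-gradient update (see the REINFORCE section below)."""
    pass

controller_state = {}
for step in range(10):
    arch = sample_architecture(controller_state)       # RNN generates the "string"
    reward = train_and_evaluate(arch)                  # validation accuracy = R
    update_controller(controller_state, arch, reward)  # push up expected R
```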
achievements
- CV: CIFAR-10 SOTA (3.65% test set error) and 1.05x faster
- NLP: Penn Treebank dataset SOTA
- the search discovers a novel recurrent cell that is better than the standard RNN and LSTM cells.
Related Work
Limitations of prior work
- can only search over fixed-length models
- Bayesian optimization methods can search variable-length architectures, but are not general or flexible enough
- modern neuro-evolution algorithms are much more flexible for architecture search, but impractical at large scale, mainly because they are search-based methods and therefore slow, or they require many heuristics to work well
Similar work
- Program synthesis, i.e., automatically generating programs:
- program synthesizers typically perform some form of search over the space of programs to generate a program that is consistent with a variety of constraints (e.g. input-output examples, demonstrations, natural language, partial programs, and assertions)
- End-to-end sequence-to-sequence learning. The shared idea is auto-regressive generation:
- auto-regressive: instead of predicting y from x, predict the next x from the previous x (i.e., from itself)
- Meta-learning. The shared ideas are:
- using a neural network to learn the gradient descent updates for another network
- using reinforcement learning to find update policies for another network
Methods
Using an RNN controller to generate model descriptions
A simple RNN controller example
- When does the controller stop generating?
- generation stops once the number of layers exceeds a certain value; this value is increased as training progresses
- How are the RNN controller's parameters updated?
- the expected validation accuracy is treated as the reward, and the parameters are updated with a policy gradient method (a sampling sketch follows)
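A hedged sketch of how the controller could emit a CNN description token by token, using a minimal tanh RNN cell. The hidden size is illustrative; the candidate value lists match the paper's search space, but the feedback of each sampled choice into the controller's next input is omitted for brevity:

```python
# Illustrative controller: one softmax sample per hyperparameter per layer.
import numpy as np

rng = np.random.default_rng(0)
HIDDEN = 32
CHOICES = {  # per-layer hyperparameters and their candidate values
    "filter_height": [1, 3, 5, 7],
    "filter_width":  [1, 3, 5, 7],
    "num_filters":   [24, 36, 48, 64],
}
W_h = rng.normal(scale=0.1, size=(HIDDEN, HIDDEN))
W_out = {k: rng.normal(scale=0.1, size=(HIDDEN, len(v))) for k, v in CHOICES.items()}

def sample_model_description(max_layers):
    """Sample one architecture; generation stops at max_layers,
    the cap that the paper grows as training progresses."""
    h = np.zeros(HIDDEN)
    description = []
    for _ in range(max_layers):
        layer = {}
        for name, values in CHOICES.items():
            h = np.tanh(W_h @ h)  # advance the RNN (sampled-token feedback omitted)
            logits = h @ W_out[name]
            probs = np.exp(logits) / np.exp(logits).sum()  # softmax over choices
            layer[name] = values[int(rng.choice(len(values), p=probs))]
        description.append(layer)
    return description

print(sample_model_description(max_layers=6))
```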
Training with REINFORCE
How the RNN controller's parameters are updated.
Derivation
Explanation of (1.1): even with fixed controller parameters $\theta_c$, the action taken at each time step t is stochastic (sampled from a softmax), so each rollout produces a different architecture and hence a different accuracy R; we therefore maximize the expected reward

$$J(\theta_c) = E_{P(a_{1:T};\,\theta_c)}[R] \tag{1.1}$$

Writing (1.1) as a sum over action sequences $a_{1:T}$ (the rewriting referred to as (1.2)) and differentiating with respect to $\theta_c$:

$$\nabla_{\theta_c} J(\theta_c) = \sum_{a_{1:T}} \nabla_{\theta_c} P(a_{1:T};\theta_c)\, R$$

Because

$$\nabla_{\theta_c} P(a_{1:T};\theta_c) = P(a_{1:T};\theta_c)\, \nabla_{\theta_c} \log P(a_{1:T};\theta_c),$$

substituting back gives the expression in the paper:

$$\nabla_{\theta_c} J(\theta_c) = \sum_{t=1}^{T} E_{P(a_{1:T};\,\theta_c)}\big[\nabla_{\theta_c} \log P(a_t \mid a_{(t-1):1};\theta_c)\, R\big]$$

In practice this is approximated with m sampled architectures, and a baseline b (an exponential moving average of previous rewards) is subtracted to reduce variance:

$$\frac{1}{m}\sum_{k=1}^{m}\sum_{t=1}^{T} \nabla_{\theta_c} \log P(a_t \mid a_{(t-1):1};\theta_c)\,(R_k - b)$$
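A minimal NumPy rendering of the final estimator above; the per-step log-probability gradients are random placeholders, so only the weighting and averaging arithmetic is the point:

```python
# REINFORCE estimate: average per-step log-prob gradients weighted by (R_k - b).
import numpy as np

rng = np.random.default_rng(0)
m, T, dim = 4, 5, 8                        # samples, steps, parameter size
grad_log_p = rng.normal(size=(m, T, dim))  # placeholder for grad log P(a_t | ...)
R = rng.random(m)                          # validation accuracies of m children
b = R.mean()                               # baseline; the paper uses an EMA of past R

policy_gradient = ((R - b)[:, None] * grad_log_p.sum(axis=1)).mean(axis=0)
print(policy_gradient)  # ascend this direction to increase E[R]
```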
Asynchronous parallel updates
Used to speed up training of the RNN controller (a parameter-server scheme with multiple controller replicas). I haven't fully understood this part yet, so I'm setting it aside for now [1]
Adding Skip Connections and Other Layer Types
Adding skip connections and branching layers
- Method (a code sketch follows the "Issues" list below):
- each layer gets an anchor point; at the anchor point of layer i, the RNN controller has a hidden state $h_i$
- whether layer j feeds into layer i is sampled with probability

$$P(\text{layer } j \text{ is an input to layer } i) = \mathrm{sigmoid}\big(v^\top \tanh(W_{prev}\, h_j + W_{curr}\, h_i)\big)$$

- where $W_{prev}$, $W_{curr}$, and $v$ are learnable parameters
- Issues:
- a layer may have no input: the original image is used as its input
- a layer's output may not be consumed by any other layer: all such outputs are sent to the classifier
- a layer may have multiple inputs of different sizes that cannot be stacked: the smaller ones are padded with zeros
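A minimal NumPy sketch of the anchor-point sampling, assuming the controller hidden states h_j, h_i are already available; the hidden size and layer count are illustrative:

```python
# Sample skip connections into layer i via the sigmoid formula above.
import numpy as np

rng = np.random.default_rng(0)
HIDDEN = 32
W_prev = rng.normal(scale=0.1, size=(HIDDEN, HIDDEN))
W_curr = rng.normal(scale=0.1, size=(HIDDEN, HIDDEN))
v = rng.normal(scale=0.1, size=HIDDEN)

def connect_prob(h_j, h_i):
    """P(layer j is an input to layer i) = sigmoid(v^T tanh(W_prev h_j + W_curr h_i))"""
    s = v @ np.tanh(W_prev @ h_j + W_curr @ h_i)
    return 1.0 / (1.0 + np.exp(-s))

h = [rng.normal(size=HIDDEN) for _ in range(5)]  # anchor-point states of 5 layers
i = 4
inputs_to_i = [j for j in range(i) if rng.random() < connect_prob(h[j], h[i])]
print(inputs_to_i)  # sampled skip connections into layer 4
# If empty, the image is the input; unconsumed outputs go to the classifier;
# mismatched input sizes are zero-padded, per the "Issues" list above.
```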
Adding other layer types
pooling, batchnorm, and even the learning rate
- the RNN controller first predicts the layer type, then predicts its associated hyperparameters
Generating RNN Cell Architectures
This section shows how the same method generates an RNN cell, i.e., the search works not only for CNN architectures but for RNNs as well
- take a "base 2" tree as an example for generating an RNN cell [2] (sketched below)
- "base 2": the tree has two leaf nodes
- in practice a "base 8" tree is used
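An illustrative base-2 cell under assumed combiner/activation choices (in the paper the controller predicts these per node; the memory-cell injection step from c_{t-1} to c_t is omitted here for brevity):

```python
# Toy "base 2" tree cell: two leaf nodes, one combine node at the root.
import numpy as np

rng = np.random.default_rng(0)
D = 16
W1, W2, W3, W4 = (rng.normal(scale=0.1, size=(D, D)) for _ in range(4))

def relu(x):
    return np.maximum(x, 0.0)

def tree_cell(x_t, h_prev):
    # Leaf 0: assumed combiner "add" and activation "tanh".
    a0 = np.tanh(W1 @ x_t + W2 @ h_prev)
    # Leaf 1: assumed combiner "elem_mult" and activation "relu".
    a1 = relu((W3 @ x_t) * (W4 @ h_prev))
    # Root merges the two subtrees to produce h_t (assumed "elem_mult"+"tanh").
    return np.tanh(a0 * a1)

h_t = tree_cell(rng.normal(size=D), np.zeros(D))
print(h_t.shape)
```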
Experiments and Results
CNN for CIFAR-10
- Dataset:
- whitening
- upsampling then random 32x32 crops
- random horizontal flips
- Search space:
- layer types: ReLU, batch normalization, skip connections
- filter height in [1,3,5,7]
- filter width in [1,3,5,7]
- number of filters in [24,36,48,64]
- stride: two sets of experiments, one fixing stride to 1, the other searching in [1,2,3]
- Training details (see the paper for more; a reward snippet follows this list):
- child models:
- trained for 50 epochs
- the reward is the maximum validation accuracy over the last 5 epochs, cubed
- RNN controller:
- samples 800 child models at a time, trained in parallel in a distributed setup
- every 1600 sampled child models, the depth of the child models is increased by 2 (starting at 6)
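The reward computation described above, with made-up accuracy numbers:

```python
# Reward = max validation accuracy over the last 5 of the 50 epochs, cubed.
val_acc_per_epoch = [0.70, 0.74, 0.78, 0.80, 0.81, 0.83, 0.82, 0.84]  # fake data
reward = max(val_acc_per_epoch[-5:]) ** 3
print(reward)
```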
- Results: run a small grid search [3] over learning rate, weight decay, batchnorm epsilon, and the epoch at which to decay the learning rate. The best model from this grid search is then run until convergence, and the test accuracy of that model is computed.
- v1: stride = 1 and no pooling
- v2: stride searched in [1,2,3]; accuracy drops slightly because the search space is larger
- v3: max pooling at layers 13 and 24
- v4:
- to limit the search-space complexity, the model predicts 13 layers where each layer prediction is a fully connected block of 3 layers [4]
- the number-of-filters search space is changed from [24, 36, 48, 64] to [6, 12, 24, 36]
- 40 filters are then added to each layer
RNN for Penn Treebank
Skipping the RNN part for now