Do Deep Nets Really Need to be Deep?

Author: snowhou | Published 2018-11-08 10:27

一、Main Idea

The paper uses a model compression [2] approach to train a shallow network to mimic a deep network, yielding a shallow net with only a single hidden layer. Shallow nets can be trained to perform similarly to complex, well-engineered, deeper convolutional architectures; the paper verifies this conclusion experimentally and infers that there probably exist better algorithms for training shallow feed-forward nets than those currently available, which at present reach such accuracy only by going through a deep model.


• When a complex model can be mimicked by a shallow one, the function the complex model has learned is not truly complex. The complexity of a model and the complexity of the function it expresses are two different things.

二、Training Shallow Nets to Mimic Deep Nets

1. training a state-of-the-art deep model;

2. training a shallow model to mimic the deep model.
    

    1. Model Compression

Unlabeled data is fed through the teacher model, and the scores it produces are used to train the student model: the student is trained to learn the function that was learned by the larger model. The main questions are how complex that function is to learn and what size of representation is needed to reach the best accuracy.
A shallow network trained directly on the original data overfits much more easily than a deep one, so model compression acts as a form of regularization that shrinks the gap between shallow and deep networks, as shown in the figure below.

[Figure: gap]
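As a concrete illustration of this compression step, here is a minimal PyTorch sketch; `teacher` and `unlabeled_loader` are hypothetical names, and the teacher's output scores, not ground-truth labels, become the student's training targets.

```python
import torch

# Stage 2 of model compression: run unlabeled data through the trained
# teacher and collect its scores as regression targets for the student.
@torch.no_grad()
def label_with_teacher(teacher, unlabeled_loader):
    teacher.eval()
    inputs, targets = [], []
    for x in unlabeled_loader:
        inputs.append(x)
        targets.append(teacher(x))  # teacher scores; no labels needed
    return torch.cat(inputs), torch.cat(targets)
```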

2. Mimic Learning via Regressing Logits with L2 Loss

The shallow mimic models are trained on the logits produced before the softmax layer: the logit values provide richer information, which lets the student mimic the exact behaviour of the teacher model.
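A minimal sketch of this objective, assuming a `student` network and teacher logits precomputed as above; up to a constant factor it matches the paper's squared-error regression on logits.

```python
import torch.nn.functional as F

def mimic_loss(student, x, teacher_logits):
    """L2 regression on pre-softmax logits (cf. the paper's
    1/(2T) * sum_t ||g(x_t; W, beta) - z_t||^2 objective)."""
    student_logits = student(x)  # shape: (batch, num_classes)
    return 0.5 * F.mse_loss(student_logits, teacher_logits)
```

Regressing logits rather than softmax probabilities avoids squashing the differences between unlikely classes, which is the "richer information" referred to above.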

3. Speeding-up Mimic Learning by Introducing a Linear Layer

The mimic model has few layers but many units per layer, which makes it very slow to run and slow to converge. A linear layer with k units is therefore inserted between the input and the nonlinear hidden layer. Because a linear layer can be absorbed back into the adjacent weight matrix, the new model has exactly the same expressive power as the original.
Re-parameterizing the weight matrix this way not only speeds up convergence but also greatly reduces memory use, which in turn makes it feasible to train much larger shallow networks.
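A sketch of the factorization with illustrative names; `d_in`, `k`, and `h` are the input, bottleneck, and hidden widths, and the ReLU is an assumption rather than the paper's stated choice of nonlinearity.

```python
import torch.nn as nn

def bottleneck_mlp(d_in, k, h, num_classes):
    # Factor the d_in x h weight matrix W as W = U @ V: two stacked linear
    # maps with a k-unit linear layer in between. Parameters drop from
    # d_in*h to k*(d_in + h), and since there is no nonlinearity between
    # the two factors, U @ V can be folded back into a single matrix, so
    # expressive power is unchanged.
    return nn.Sequential(
        nn.Linear(d_in, k, bias=False),  # linear factor V: d_in -> k
        nn.Linear(k, h),                 # linear factor U: k -> h
        nn.ReLU(),                       # the single nonlinear hidden layer
        nn.Linear(h, num_classes),       # output logits for mimic training
    )
```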

4. Discussion

1. To mimic the harder deep models, a convolutional layer and a max-pooling layer were added (a sketch follows after this list).
  The SNN-MIMIC models for CIFAR-10 thus consist of a convolution and max-pooling layer followed by 1200 fully connected linear units and 30k non-linear units.
2. Shallow models with a number of parameters comparable to deep models would likely be capable of learning even more accurate functions if a more accurate teacher and/or more unlabeled data became available.


3. Shallow networks are a better fit for today's parallel hardware: they run faster, need fewer compute cycles, and suit real-time applications better.
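For item 1 above, here is an illustrative PyTorch rendering of the SNN-MIMIC shape; the channel count, kernel size, and choice of nonlinearity are assumptions, and only the overall layout (one convolution + max-pooling stage, 1200 linear units, 30k nonlinear units, CIFAR-10 logits) comes from the text.

```python
import torch.nn as nn

# Hypothetical SNN-MIMIC for CIFAR-10 (32x32 RGB inputs).
snn_mimic = nn.Sequential(
    nn.Conv2d(3, 128, kernel_size=5, padding=2),  # one conv layer (assumed sizes)
    nn.ReLU(),
    nn.MaxPool2d(2),                              # 32x32 -> 16x16
    nn.Flatten(),
    nn.Linear(128 * 16 * 16, 1200, bias=False),   # 1200 linear (bottleneck) units
    nn.Linear(1200, 30000),                       # 30k hidden units...
    nn.ReLU(),                                    # ...made non-linear here
    nn.Linear(30000, 10),                         # logits for the 10 classes
)
```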

三、Summary

1. The model-compression algorithm in this paper makes the trade-off between accuracy and computational cost easier to navigate: it allows one to adjust that trade-off flexibly.
    2. Developing algorithms to train shallow models of high accuracy directly from the original data without going through the intermediate teacher model would, if possible, be a significant contribution.
3. The advantage of deep learning may come from a good match between deep architectures and current training procedures.

For a given number of parameters, depth may make learning easier, but may not always be essential.

References:
[1] Lei Jimmy Ba, Rich Caruana. Do Deep Nets Really Need to be Deep? NIPS 2014.
[2] Cristian Buciluǎ, Rich Caruana, Alexandru Niculescu-Mizil. Model Compression. ACM SIGKDD, 2006.

Note: [2] focuses on compressing complex ensemble models into a single neural network; the result is that the mimic neural nets are 1000 times smaller and 1000 times faster, and the authors argue that any model can be approximated by a simple neural network via model compression.
