EfficientNet Paper Reading Notes

Author: FantDing | Source: published 2019-10-17 20:20, viewed 0 times

Original paper

Abstract

Introduction

scale up:
- conventional: increase depth, width, or resolution
- new method: compound scaling method
baseline network:
- architecture search: EfficientNet-B0

Related Work

ConvNet Accuracy

State-of-the-art models in terms of accuracy:

GoogleNet
SENet
GPipe

ConvNet Efficiency

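Several approaches to efficiency:

model compression
hand-designed mobile-size ConvNets
mobile-size ConvNets found by architecture search

It is unclear how to apply these techniques to large models, which have a much larger design space and a much more expensive tuning cost.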

Model scaling

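depth: ResNets
width: WideResNet
- width refers to the number of channels
input image size (resolution)

Although each of these three model scaling methods can improve accuracy, prior work does not explain how to scale a ConvNet effectively to trade off efficiency and accuracy.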

Compound Model Scaling

Problem Formulation

However I look at it, this subsection does not seem to add much.

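A convolutional layer can be formalized as $Y_i=F_i(X_i)$, where $Y_i$ is the output tensor and $X_i$ is the input tensor with shape $\langle H_i, W_i, C_i \rangle$.
A model can then be written as $N=F_k \odot \cdots \odot F_2 \odot F_1(X_1)=\bigodot_{j=1 \ldots k}F_j(X_1)$.
In practice, a model is split into multiple stages, and all layers within a stage share the same convolution type, so the network can also be defined as $N=\bigodot_{i=1 \ldots s}F_i^{L_i}(X_{\langle H_i, W_i, C_i \rangle})$, where $F_i^{L_i}$ means that layer $F_i$ is repeated $L_i$ times in stage $i$.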
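The stage notation can be made concrete with a toy sketch. It treats each layer $F_i$ as a plain Python function and builds the network by repeating it $L_i$ times per stage. The helper names `repeat` and `compose` and the two arithmetic "layers" are hypothetical stand-ins for illustration; nothing here comes from the paper's code.

```python
from functools import reduce
from typing import Callable, List, Tuple

Layer = Callable[[float], float]


def repeat(layer: Layer, times: int) -> Layer:
    """Return F^L: the same layer applied `times` times in sequence."""
    def stage(x: float) -> float:
        for _ in range(times):
            x = layer(x)
        return x
    return stage


def compose(stages: List[Tuple[Layer, int]]) -> Layer:
    """Build the network from (layer, repeat count) pairs; stage 1 runs first."""
    built = [repeat(layer, times) for layer, times in stages]
    return lambda x: reduce(lambda acc, stage: stage(acc), built, x)


# Two toy "layers" standing in for convolutions (assumption for illustration).
def add_one(x: float) -> float:
    return x + 1.0


def double(x: float) -> float:
    return x * 2.0


# Stage 1 repeats add_one 3 times (L_1 = 3), stage 2 repeats double twice (L_2 = 2).
network = compose([(add_one, 3), (double, 2)])
result = network(0.0)  # ((0 + 1 + 1 + 1) * 2) * 2
```

Scaling depth in this picture only changes the repeat counts, which is why the later sections can keep $F_i$ fixed and search over the counts and shapes instead.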
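Simplifying the problem:
- step 1: rather than searching for the best layer architecture $F_i$, take a predefined baseline network and search only over $L_i, C_i, H_i, W_i$
- step 2: the search space over $L_i, C_i, H_i, W_i$ is still large, so all layers are constrained to scale uniformly by a constant ratio (the ratio differs across dimensions but is the same for every layer)
- this yields the following optimization problem: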

[Figure: the optimization problem]
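For reference, the optimization problem can be written out as below. This is my reconstruction of Eq. (2) of the paper, using the same $N$, $F_i$, $L_i$, $H_i$, $W_i$, $C_i$ as above; hatted symbols are the predefined baseline values, and $d, w, r$ are the depth, width, and resolution coefficients. Treat it as a sketch and check it against the original paper.

```latex
\begin{aligned}
\max_{d,\,w,\,r} \quad & \mathrm{Accuracy}\big(N(d, w, r)\big) \\
\text{s.t.} \quad & N(d, w, r) = \bigodot_{i=1 \ldots s} \hat{F}_i^{\,d \cdot \hat{L}_i}
    \big(X_{\langle r \cdot \hat{H}_i,\; r \cdot \hat{W}_i,\; w \cdot \hat{C}_i \rangle}\big) \\
& \mathrm{Memory}(N) \le \text{target\_memory} \\
& \mathrm{FLOPS}(N) \le \text{target\_flops}
\end{aligned}
```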

Scaling Dimensions

Scaling a single dimension

Depth(d):

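Pros:
- a deeper net can capture richer and more complex features
- generalizes well
Cons:
- the deeper the network, the harder it is to train to convergence
- beyond a certain depth, the accuracy gain is limited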

Width(w):

Pros:
- captures more fine-grained features
- easier to train
Cons:
- wide but shallow nets have difficulty capturing high-level features

Resolution(r):

Pros:
- does improve accuracy
Cons:
- the accuracy gain is small at very high resolutions

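Takeaway from the figure:
Scaling up any single dimension improves accuracy, but the accuracy gain diminishes for bigger models.

[Figure: accuracy when scaling up a single dimension]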

Compound Scaling

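Scale up all three dimensions together.

Empirically, the scaling dimensions are not independent. For example, with higher-resolution input images, the network needs more depth to enlarge the receptive field and more width to capture fine-grained patterns.

The comparison in the figure uses the same baseline network throughout:

blue: keep depth and resolution fixed while increasing w
red: double the depth, scale the resolution by 1.3, then increase w

[Figure: accuracy when scaling width under different depth and resolution settings]

Takeaway from the figure:
At the same FLOPS, to pursue better accuracy and efficiency, it is critical to balance all dimensions.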

compound scaling method

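The authors propose the following scaling rule (Eq. 3 in the paper):
depth $d=\alpha^\phi$, width $w=\beta^\phi$, resolution $r=\gamma^\phi$, subject to $\alpha \cdot \beta^2 \cdot \gamma^2 \approx 2$ and $\alpha \ge 1, \beta \ge 1, \gamma \ge 1$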

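$\alpha, \beta, \gamma$: constants found by grid search
$\phi$: the compound coefficient, chosen by the user according to how many resources are available
Once these two pieces are fixed, the scaling factors for depth, width, and resolution are determined.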

FLOPS

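In the paper, FLOPS denotes the number of floating-point operations (an amount of computation), not operations per second.

For a convolution, FLOPS is proportional to $d$, $w^2$, and $r^2$.^[1] For example, $d=2$ (doubling the depth) doubles FLOPS, while doubling $w$ or $r$ quadruples FLOPS. The FLOPS multiplier is therefore $d \cdot w^2 \cdot r^2$; written in terms of $\phi$, it is $\alpha^\phi \cdot (\beta^\phi)^2 \cdot (\gamma^\phi)^2=(\alpha \cdot \beta^2 \cdot \gamma^2)^\phi \approx 2^\phi$.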
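The proportionality can be checked numerically on a toy model. The sketch below counts multiply-accumulates for a stack of identical stride-1 convolutions on square inputs. The helpers `conv_flops` and `network_flops` are hypothetical, and the model ignores strides, the fixed 3-channel input layer, and non-convolution operations, so it only illustrates the scaling behaviour.

```python
def conv_flops(height: int, width: int, c_in: int, c_out: int, kernel: int = 3) -> int:
    """Multiply-accumulate count of one stride-1, same-padded convolution."""
    return height * width * c_in * c_out * kernel * kernel


def network_flops(depth: int, channels: int, resolution: int) -> int:
    """FLOPS of a toy network: `depth` identical conv layers on square inputs."""
    return depth * conv_flops(resolution, resolution, channels, channels)


base = network_flops(depth=4, channels=16, resolution=32)

# Double one dimension at a time.
depth_ratio = network_flops(8, 16, 32) / base       # d = 2
width_ratio = network_flops(4, 32, 32) / base       # w = 2
resolution_ratio = network_flops(4, 16, 64) / base  # r = 2

# Scaling all three together multiplies FLOPS by d * w^2 * r^2.
compound_ratio = network_flops(8, 32, 64) / base

print(depth_ratio, width_ratio, resolution_ratio, compound_ratio)
```

Width enters squared because both the input and output channel counts grow; resolution enters squared because both spatial sides grow.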

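Some of these relations are only approximate:

the total FLOPS of the network is approximately the total FLOPS of its convolution operations
the equality constraint in Eq. (3) holds only approximately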

EfficientNet architecture

EfficientNet-B0

B0 is the baseline network:

found via neural architecture search^[2]
mobile-sized

[Figure: EfficientNet-B0 network architecture]

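How to scale up

step 1: fix $\phi=1$ and run a small grid search for the best $\alpha, \beta, \gamma$. "Best" means the values that maximize $ACC(model)$; the search gives $\alpha=1.2, \beta=1.1, \gamma=1.15$.
step 2: fix $\alpha, \beta, \gamma$ and increase $\phi$ to obtain new $d, w, r$, which yields EfficientNet-B1 through B7.

Strictly speaking, the proper approach would be to set $\phi=1$ and search for $\alpha, \beta, \gamma$, then set $\phi=2$ and search for $\alpha, \beta, \gamma$ again, and so on. To reduce the search cost, the authors use the shortcut above instead.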
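The two steps can be sketched in a few lines. The constants are the values quoted above; the function names are hypothetical. How the resulting multipliers are rounded into concrete layer counts, channel widths, and image sizes for B1 through B7 is not covered here, so the printed numbers are illustrative only.

```python
from typing import Tuple

# Step 1 result quoted in the notes (grid search at phi = 1).
ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15


def compound_scale(phi: float) -> Tuple[float, float, float]:
    """Return the (depth, width, resolution) multipliers for coefficient phi."""
    return ALPHA ** phi, BETA ** phi, GAMMA ** phi


def flops_multiplier(phi: float) -> float:
    """Approximate FLOPS growth: (alpha * beta^2 * gamma^2) ** phi."""
    return (ALPHA * BETA ** 2 * GAMMA ** 2) ** phi


# Step 2: keep alpha, beta, gamma fixed and sweep phi.
for phi in range(0, 4):
    d, w, r = compound_scale(phi)
    print(f"phi={phi}: d={d:.2f} w={w:.2f} r={r:.2f} FLOPS x{flops_multiplier(phi):.2f}")
```

With these constants $\alpha \cdot \beta^2 \cdot \gamma^2$ is about 1.92 rather than exactly 2, which is one place where the constraint holds only approximately.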

Experiments

Scaling up MobileNets and ResNets

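The two kinds of scaling are compared on MobileNets and ResNets, showing that compound scaling beats single-dimension scaling^[3]

[Image: comparison of scaling methods on MobileNets and ResNets]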

ImageNet Results for EfficientNet

Training details

Bigger models need more regularization, so dropout is increased for larger models.
Items to look up (what are these?):
- batch norm momentum
- swish activation
- fixed AutoAugment policy
- stochastic depth
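Two of the items in the list are easy to sketch on scalars. Swish is $x \cdot \mathrm{sigmoid}(x)$. Stochastic depth randomly skips the residual branch of a block during training. The version below rescales the surviving branch by `1 / survival_prob` at training time, which is one common variant and an assumption here (the original formulation scales at inference time instead); the function names and signature are hypothetical.

```python
import math
import random
from typing import Callable, Optional


def swish(x: float) -> float:
    """Swish (SiLU) activation: x * sigmoid(x), computed in an overflow-safe way."""
    if x >= 0:
        return x / (1.0 + math.exp(-x))
    e = math.exp(x)
    return x * e / (1.0 + e)


def stochastic_depth(
    x: float,
    branch: Callable[[float], float],
    survival_prob: float,
    training: bool,
    rng: Optional[random.Random] = None,
) -> float:
    """Residual block whose branch is randomly skipped during training."""
    if not training:
        return x + branch(x)
    rng = rng or random.Random()
    if rng.random() >= survival_prob:
        return x  # branch dropped: the block acts as the identity
    # Rescale so the expected output matches the inference-time output.
    return x + branch(x) / survival_prob


# At inference time the branch is always used.
out = stochastic_depth(1.0, lambda v: 2.0 * v, survival_prob=0.8, training=False)
```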

Performance comparison

TOP-1 ACC
TOP-5 ACC
parameters
FLOPS
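For readers unfamiliar with the first two metrics, top-k accuracy is the fraction of samples whose true label is among the k highest-scoring classes. A minimal sketch with made-up scores follows; `top_k_accuracy` is a hypothetical helper, not the paper's evaluation code.

```python
from typing import Sequence


def top_k_accuracy(
    scores: Sequence[Sequence[float]], labels: Sequence[int], k: int
) -> float:
    """Fraction of samples whose true label is among the k highest scores."""
    hits = 0
    for row, label in zip(scores, labels):
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        if label in ranked[:k]:
            hits += 1
    return hits / len(labels)


# Three samples, three classes (made-up numbers).
scores = [
    [0.1, 0.7, 0.2],
    [0.5, 0.3, 0.2],
    [0.2, 0.3, 0.5],
]
labels = [1, 1, 0]

top1 = top_k_accuracy(scores, labels, k=1)
top2 = top_k_accuracy(scores, labels, k=2)
```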

Latency

To show that the gains hold on real hardware, the authors also compare inference latency.
[Figure: inference latency comparison]

Transfer Learning Results for EfficientNet

Compared on 8 other datasets, EfficientNet reaches state-of-the-art accuracy on 5 of them, with an order of magnitude fewer parameters.

Discussion

To show that compound scaling beats single-dimension scaling, the authors also compare different scaling methods on the same B0 baseline. Compound scaling improves accuracy by up to 2.5%.

[Figure: scaling up EfficientNet-B0]

why better

Visualizing activation maps^[4] shows that the compound scaling method lets the model focus on more relevant regions with more object details.

[Figure: different scaling methods on the same baseline model]
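As background on how such maps are produced: in the class activation mapping (CAM) method, for a network that ends in global average pooling followed by a linear classifier, the map for a class is the weighted sum of the final convolutional feature maps, using that class's classifier weights. A minimal pure-Python sketch with made-up numbers follows; `class_activation_map` is a hypothetical helper.

```python
from typing import List

FeatureMap = List[List[float]]  # one H x W activation map


def class_activation_map(
    feature_maps: List[FeatureMap], class_weights: List[float]
) -> FeatureMap:
    """CAM(y, x) = sum_k w_k * f_k(y, x) over the final conv feature maps."""
    height, width = len(feature_maps[0]), len(feature_maps[0][0])
    cam = [[0.0] * width for _ in range(height)]
    for fmap, weight in zip(feature_maps, class_weights):
        for y in range(height):
            for x in range(width):
                cam[y][x] += weight * fmap[y][x]
    return cam


# Two toy 2x2 feature maps and the classifier weights for one class (made-up numbers).
maps = [
    [[1.0, 0.0], [0.0, 2.0]],
    [[0.0, 3.0], [1.0, 1.0]],
]
weights = [0.5, -1.0]
cam = class_activation_map(maps, weights)
```

In practice the resulting map is upsampled to the input image size and overlaid as a heatmap.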

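References

Official Google blog post

Footnotes

[1] Why FLOPS is quadratic in width and resolution: doubling the width doubles both the input and output channels of a convolution, and doubling the resolution doubles both the height and width of its feature maps.
- for the FLOPS calculation, see this article
[2] Neural architecture search
- Tan, M., Chen, B., Pang, R., Vasudevan, V., Sandler, M., Howard, A., and Le, Q. V. MnasNet: Platform-aware neural architecture search for mobile. CVPR, 2019.
- MnasNet
- MBConv
- squeeze-and-excitation optimization
[3] Why not search for the best w or d to compare against, instead of arbitrarily picking values such as 2 and 4?
[4] Activation map visualization: "Learning Deep Features for Discriminative Localization"
- how CAMs are generated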
