GPU-Efficient GhostNets: Without Involving Too Many GPU-Inefficient Operations (e.g., Depth-wise Convolution)

Author: Valar_Morghulis | Published 2022-07-12 13:49

GhostNets on Heterogeneous Devices via Cheap Operations

Note: this is the extended version of GhostNets, with structures designed separately for CPU and GPU.

Editor's note: the editor has not yet read the paper closely, but the experimental conclusions for the GPU-efficient version are somewhat puzzling. G-Ghost's baselines are mostly older models (e.g., ResNet) or RegNet, and the speedups are modest, while some newer models (e.g., RepVGG, EfficientNetV2) are missing from the comparison. Moreover, the last figure (Fig. 15) compares G-Ghost only against models optimized for CPU efficiency (including MobileNetV2 and MobileNetV3) on a GPU platform — is that sufficient?

10 Jan 2022

IJCV 2022

Paper: https://arxiv.org/abs/2201.03297

Authors: Kai Han, Yunhe Wang, Chang Xu, Jianyuan Guo, Chunjing Xu, Enhua Wu, Qi Tian

Affiliation: Huawei

https://github.com/huawei-noah/Efficient-AI-Backbones

Deploying convolutional neural networks (CNNs) on mobile devices is difficult due to the limited memory and computation resources. We aim to design efficient neural networks for heterogeneous devices including CPU and GPU, by exploiting the redundancy in feature maps, which has rarely been investigated in neural architecture design. For CPU-like devices, we propose a novel CPU-efficient Ghost (C-Ghost) module to generate more feature maps from cheap operations. Based on a set of intrinsic feature maps, we apply a series of linear transformations with cheap cost to generate many ghost feature maps that could fully reveal information underlying intrinsic features. The proposed C-Ghost module can be taken as a plug-and-play component to upgrade existing convolutional neural networks. C-Ghost bottlenecks are designed to stack C-Ghost modules, and then the lightweight C-GhostNet can be easily established. We further consider the efficient networks for GPU devices. Without involving too many GPU-inefficient operations (e.g., depth-wise convolution) in a building stage, we propose to utilize the stage-wise feature redundancy to formulate the GPU-efficient Ghost (G-Ghost) stage structure. The features in a stage are split into two parts, where the first part is processed using the original block with fewer output channels for generating intrinsic features, and the others are generated using cheap operations by exploiting stage-wise redundancy. Experiments conducted on benchmarks demonstrate the effectiveness of the proposed C-Ghost module and the G-Ghost stage. C-GhostNet and G-GhostNet can achieve the optimal trade-off of accuracy and latency for CPU and GPU, respectively. Code is available at this https URL.
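The saving from the cheap operations in the C-Ghost module can be sketched with a back-of-the-envelope FLOPs count, following the cost analysis in the original GhostNet paper: a standard convolution producing n output maps is replaced by a primary convolution producing n/s intrinsic maps plus (s−1) cheap d×d depth-wise transforms. The layer sizes below (64→128 channels, 56×56 maps) and the choices k=3, d=3, s=2 are illustrative assumptions, not values taken from this article:

```python
def conv_flops(c_in, c_out, h, w, k):
    """Multiply-adds of a standard k x k convolution on an h x w output."""
    return c_out * h * w * c_in * k * k

def ghost_module_flops(c_in, c_out, h, w, k=3, s=2, d=3):
    """C-Ghost cost: a primary conv makes c_out/s intrinsic maps,
    then (s-1) cheap d x d depth-wise transforms make the ghost maps."""
    m = c_out // s                         # intrinsic feature maps
    primary = conv_flops(c_in, m, h, w, k)  # ordinary convolution
    cheap = (s - 1) * m * h * w * d * d     # depth-wise "cheap" ops
    return primary + cheap

# Hypothetical layer: 64 -> 128 channels, 56 x 56 feature maps
std = conv_flops(64, 128, 56, 56, 3)
ghost = ghost_module_flops(64, 128, 56, 56, k=3, s=2, d=3)
print(std / ghost)  # ≈ 1.97, close to s = 2 since c_in * k*k >> d*d
```

The ratio approaches the compression factor s because the primary convolution dominates the cost; the cheap depth-wise transforms contribute only a d²/(c·k²) fraction of it.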



Permalink: https://www.haomeiwen.com/subject/ftcabrtx.html