Reading Note: Pelee: A Real-Time

Author: joshua_1988 | Published 2018-04-27 22:05

    title: "Reading Note: Pelee: A Real-Time Object Detection System on Mobile Devices"
    category: ["Computer Vision"]
    tag: ["Reading Note", "CNN", "Mobile Models", "Object Detection"]


    TITLE: Pelee: A Real-Time Object Detection System on Mobile Devices

    AUTHOR: Robert J. Wang, Xiang Li, Shuang Ao, Charles X. Ling

    ASSOCIATION: University of Western Ontario

    FROM: arXiv:1804.06882

    CONTRIBUTION

    1. A variant of DenseNet architecture called PeleeNet for mobile devices is proposed.
    2. The network architecture of Single Shot MultiBox Detector (SSD) is optimized for speed and then combined with PeleeNet.

    METHOD

    BUILDING BLOCKS

    Two-Way Dense Layer. A 2-way dense layer is used to get different scales of receptive fields. One branch uses a small kernel size (3x3) to capture small-size objects. The other branch stacks two 3x3 convolution layers for larger objects. The structure is shown in the following figure.

    Two-Way Dense Layer
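
    As a rough illustration of why the two branches see different scales, the sketch below computes the receptive field of each stack of 3x3 convolutions. The function name and the (kernel, stride) layer lists are illustrative assumptions, not taken from the paper.

```python
def receptive_field(layers):
    """Receptive field of a stack of conv layers given as (kernel, stride) pairs."""
    rf, jump = 1, 1
    for kernel, stride in layers:
        rf += (kernel - 1) * jump   # each layer widens the field by (k - 1) input steps
        jump *= stride              # stride scales the step between neighbouring outputs
    return rf

# Stride-1 3x3 convolutions, as described for the two branches (lists are illustrative).
small_branch = [(3, 1)]            # single 3x3 convolution
large_branch = [(3, 1), (3, 1)]    # two stacked 3x3 convolutions

print(receptive_field(small_branch))  # 3
print(receptive_field(large_branch))  # 5
```

    Concatenating the two outputs therefore mixes features with 3x3 and 5x5 receptive fields in the same layer.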

    Stem Block. This block is placed before the first dense layer for the sake of cost efficiency. It improves the feature expression ability without adding much computational cost. The structure is shown as follows.

    Stem Block

    Dynamic Number of Channels in Bottleneck Layer. The number of channels in the bottleneck layer varies according to the input shape to make sure the number of output channels does not exceed the number of its input channels.
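
    A minimal sketch of the idea, assuming the DenseNet convention of a bottleneck that is 4 times the growth rate and a plain min() cap on top of it. The exact rule used by PeleeNet may differ; the function name and the growth rate value are hypothetical.

```python
def bottleneck_channels(in_channels, growth_rate, multiplier=4):
    """Cap the 1x1 bottleneck width at the number of input channels.

    Assumption: a plain min() cap; the exact rule used by PeleeNet may differ.
    """
    fixed = multiplier * growth_rate   # DenseNet-style fixed bottleneck width
    return min(fixed, in_channels)     # never wider than the layer's input

growth_rate = 32  # hypothetical value for illustration
for in_channels in (32, 64, 128, 256):
    print(in_channels, bottleneck_channels(in_channels, growth_rate))
```

    With a fixed width of 128, the early layers (32 or 64 input channels) would expand their input before the 3x3 convolution; the cap avoids that extra cost.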

    Transition Layer without Compression. Experiments show that the compression factor proposed by DenseNet hurts the feature expression, so the number of output channels is kept the same as the number of input channels in transition layers.
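
    For reference, DenseNet's transition layer outputs floor(theta * m) channels for m input channels, with theta = 0.5 in its compressed variants. The sketch below contrasts that with keeping theta at 1; the function name and the input width are illustrative assumptions.

```python
import math

def transition_out_channels(in_channels, compression=1.0):
    """Output channels of a transition layer: floor(compression * in_channels)."""
    return math.floor(compression * in_channels)

in_channels = 256  # hypothetical input width
print(transition_out_channels(in_channels, 0.5))  # 128, DenseNet-style compression
print(transition_out_channels(in_channels, 1.0))  # 256, no compression
```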

    Composite Function. Post-activation (Convolution - Batch Normalization - ReLU) is used for speed. With post-activation, every batch normalization layer can be merged into the preceding convolution layer at the inference stage. To compensate for the negative impact on accuracy caused by this change, a shallow and wide network structure is designed. In addition, a 1x1 convolution layer is added after the last dense block to get a stronger representational ability.
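
    The merge works because convolution followed by batch normalization is a single affine map at inference time. The sketch below folds the batch normalization statistics into the weights and bias of one output channel of a 1x1 convolution and checks that the result is unchanged. It uses plain Python rather than a deep learning framework, and all names and numbers are illustrative.

```python
import math

def fold_bn(weight, bias, gamma, beta, mean, var, eps=1e-5):
    """Fold BatchNorm statistics into the weight and bias of one output channel."""
    scale = gamma / math.sqrt(var + eps)
    folded_weight = [w * scale for w in weight]
    folded_bias = (bias - mean) * scale + beta
    return folded_weight, folded_bias

def conv1x1(x, weight, bias):
    """One output channel of a 1x1 convolution at a single pixel: a dot product."""
    return sum(w * xi for w, xi in zip(weight, x)) + bias

def batch_norm(y, gamma, beta, mean, var, eps=1e-5):
    """Inference-time batch normalization using fixed running statistics."""
    return gamma * (y - mean) / math.sqrt(var + eps) + beta

# Illustrative numbers, not taken from the paper.
x = [0.5, -1.0, 2.0]
weight, bias = [0.2, 0.4, -0.1], 0.3
gamma, beta, mean, var = 1.5, 0.1, 0.05, 0.8

reference = batch_norm(conv1x1(x, weight, bias), gamma, beta, mean, var)
folded_weight, folded_bias = fold_bn(weight, bias, gamma, beta, mean, var)
merged = conv1x1(x, folded_weight, folded_bias)

print(abs(reference - merged) < 1e-9)
```

    In the pre-activation order (Batch Normalization - ReLU - Convolution) the ReLU sits between the normalization and the next convolution, so this folding does not apply directly.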

    ARCHITECTURE

    The overall architecture is shown in the following table.

    PeleeNet Architecture

    OPTIMIZATION FOR SSD

    Feature Map Selection. Feature maps at five scales (19x19, 10x10, 5x5, 3x3, and 1x1) are selected for prediction. Higher-resolution feature maps are discarded to speed up inference.
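
    To get a feel for the saving, the sketch below counts prediction locations for the five selected maps and compares them with a set that also includes a 38x38 map, the largest map used by the original SSD300. The number of default boxes per location is a hypothetical value, since this note does not state it.

```python
def num_locations(sizes):
    """Total number of spatial positions over square feature maps."""
    return sum(s * s for s in sizes)

pelee_maps = [19, 10, 5, 3, 1]   # the five selected scales
with_38 = [38] + pelee_maps      # adding SSD300's 38x38 map for comparison
boxes_per_location = 6           # hypothetical; not stated in this note

print(num_locations(pelee_maps))                       # 496
print(num_locations(with_38))                          # 1940
print(num_locations(pelee_maps) * boxes_per_location)  # 2976
```

    The 38x38 map alone contributes 1444 locations, almost three times as many as the five selected maps together.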

    Residual Prediction Block. For each feature map used for detection, a residual block (ResBlock) is added before prediction, as shown in the following figure.

    PeleeNet SSD

    PERFORMANCE

    The classification performance on ILSVRC2012 is shown in the following table.

    ILSVRC2012

    The detection performance on VOC2007 is shown in the following table.

    VOC2007

    The detection performance on COCO2015 is shown in the following table.

    COCO

    SOME IDEAS

    From my own experience, depthwise (DW) convolution is not pruning-friendly, so recent pruning methods such as ThiNet and Net-Trim work poorly on it. This work uses conventional convolutional layers, so those pruning methods may be effective here.

