Optimizing OpenCV on Mobile

Author: 9b9e2461db01 | Published 2016-06-30 14:10

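First, some history of OpenCV on Android. Official Android support began with OpenCV 2.4 in 2012, and OpenCV4Android is the official name for the Android port of the library. It offers two integration paths: the OpenCV Java API and the Android NDK.
The Java API is the easier path to integrate, but it exposes only part of OpenCV's functions, and because it wraps the C++ code it costs a little performance. The two figures below illustrate this. Suppose each video frame needs three OpenCV calls. With the Java API, each call is a round trip through JNI, so the application crosses the JNI boundary six times per frame. With native C++, the OpenCV part is written entirely in C++ and the OpenCV calls bypass JNI altogether, so the count drops from six crossings to two per frame, which improves performance. If the code makes only a single OpenCV call, of course, the gain is barely noticeable.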

[Figure: bdti figure 2 500.jpg] Using the Java API

[Figure: bdti figure 3 500.jpg] Using native C++

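The crossing counts above can be checked with a toy model. This is only a sketch: `Boundary`, `step_a`, `step_b`, and `step_c` are hypothetical stand-ins, not real JNI or OpenCV calls, and the model assumes each call through the boundary costs one crossing on entry and one on return.

```cpp
#include <functional>

// Toy stand-in for the Java/native boundary (not real JNI): every call
// through it counts one crossing going in and one coming back.
struct Boundary {
    int crossings = 0;
    void call(const std::function<void()>& native_work) {
        ++crossings;   // Java -> native
        native_work();
        ++crossings;   // native -> Java
    }
};

// Hypothetical placeholders for three per-frame OpenCV operations.
void step_a() {}
void step_b() {}
void step_c() {}

// Java API style: each OpenCV call is its own round trip.
int crossings_java_api() {
    Boundary jni;
    jni.call(step_a);
    jni.call(step_b);
    jni.call(step_c);
    return jni.crossings;
}

// Native style: a single round trip wraps all three calls.
int crossings_native() {
    Boundary jni;
    jni.call([] { step_a(); step_b(); step_c(); });
    return jni.crossings;
}
```

In this model `crossings_java_api()` counts six crossings per frame and `crossings_native()` counts two, matching the figures.
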
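OpenCV has Android-specific optimizations for NVIDIA's Tegra 3 and later platforms, distributed through TADP (Tegra Android Development Pack). On Tegra, OpenCV is typically reported to run several times faster than a generic Android implementation on other platforms. Tegra 3 is also where OpenCV began supporting NEON, ARM's SIMD extension. A demo on Google Play, OpenCV for Tegra Demo, showcases performance on the Tegra platform. All of this can serve as a reference for future Tegra-specific optimization work.
NEON is designed for vector operations, which makes it a good fit for image processing, and it can speed up many vision algorithms significantly. As an example, take converting RGB to grayscale. A typical color conversion is written like this, processing one pixel at a time: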

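```
#include <stdint.h>

// Scalar reference. The signature was missing from the listing; it is
// assumed here to mirror rgb_to_gray_neon.
void rgb_to_gray(const uint8_t* rgb, uint8_t* gray, int num_pixels)
{
    for(int i=0; i<num_pixels; ++i, rgb+=3)
    {
        int v = (77*rgb[0] + 150*rgb[1] + 29*rgb[2]);
        gray[i] = v>>8;
    }
}
```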
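The integer weights in the loop can be read as a fixed-point form of a weighted sum. The article does not say where 77, 150, and 29 come from; the derivation below assumes they are the common luma weights 0.299, 0.587, and 0.114 scaled by 256 and rounded.

```latex
\text{gray} = 0.299\,R + 0.587\,G + 0.114\,B
\approx \frac{77\,R + 150\,G + 29\,B}{256}
= \left\lfloor \frac{77\,R + 150\,G + 29\,B}{2^{8}} \right\rfloor
\quad\text{(the } v \gg 8 \text{ in the code)}

77 + 150 + 29 = 256
\;\Longrightarrow\;
\max(77\,R + 150\,G + 29\,B) = 255 \cdot 256 = 65280 < 2^{16}
```

Because the weights sum to exactly 256, pure white (255, 255, 255) maps to 255, and the weighted sum always fits in an unsigned 16-bit value.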
Using NEON intrinsics:
```
#include <arm_neon.h>
#include <stdint.h>

void rgb_to_gray_neon(const uint8_t* rgb, uint8_t* gray, int num_pixels)
{
    // We'll use 64-bit NEON registers to process 8 pixels in parallel.
    int num_blocks = num_pixels / 8;
    // Duplicate the weight 8 times.
    uint8x8_t w_r = vdup_n_u8(77);
    uint8x8_t w_g = vdup_n_u8(150);
    uint8x8_t w_b = vdup_n_u8(29);
    // For intermediate results. 16-bit/pixel to avoid overflow.
    uint16x8_t temp;
    // For the converted grayscale values.
    uint8x8_t result;
    for(int i=0; i<num_blocks; ++i, rgb+=8*3, gray+=8)
    {
        // Load 8 pixels into 3 64-bit registers, split by channel.
        uint8x8x3_t src = vld3_u8(rgb);
        // Multiply all eight red pixels by the corresponding weights.
        temp = vmull_u8(src.val[0], w_r);
        // Combined multiply and addition.
        temp = vmlal_u8(temp, src.val[1], w_g);
        temp = vmlal_u8(temp, src.val[2], w_b);
        // Shift right by 8, "narrow" to 8-bits (recall temp is 16-bit).
        result = vshrn_n_u16(temp, 8);
        // Store converted pixels in the output grayscale image.
        vst1_u8(gray, result);
    }
    // Scalar tail: the original listing skipped the last num_pixels % 8 pixels.
    for(int i=0; i<num_pixels%8; ++i, rgb+=3)
    {
        gray[i] = (77*rgb[0] + 150*rgb[1] + 29*rgb[2]) >> 8;
    }
}
```
The results of the optimization (on an iPhone 5S) are shown below:
![optimization results.png](https://img.haomeiwen.com/i1712633/9cbce1473bd1d0f2.png?imageMogr2/auto-orient/strip%7CimageView2/2/w/1240)
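With NEON, processing goes from one pixel at a time to eight pixels in parallel, and throughput improves by roughly 6x. It falls short of the theoretical 8x, presumably because of the cost of loading and storing register data.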
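A speedup figure like this comes from timing each kernel over the same input. The harness below is a minimal sketch of one way to do that, not the setup used for the measurements above: `GrayKernel`, `rgb_to_gray_scalar`, and `time_kernel_ms` are illustrative names, and the buffer contents and iteration count are arbitrary.

```cpp
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <vector>

// Any kernel with the article's signature can be timed through this type.
using GrayKernel = void (*)(const uint8_t* rgb, uint8_t* gray, int num_pixels);

// Scalar kernel with the same arithmetic as the one-pixel-at-a-time loop.
void rgb_to_gray_scalar(const uint8_t* rgb, uint8_t* gray, int num_pixels) {
    for (int i = 0; i < num_pixels; ++i, rgb += 3) {
        int v = 77 * rgb[0] + 150 * rgb[1] + 29 * rgb[2];
        gray[i] = static_cast<uint8_t>(v >> 8);
    }
}

// Returns the average wall-clock milliseconds per call over `iterations` runs.
double time_kernel_ms(GrayKernel kernel, int num_pixels, int iterations) {
    std::vector<uint8_t> rgb(static_cast<std::size_t>(num_pixels) * 3, 128);
    std::vector<uint8_t> gray(static_cast<std::size_t>(num_pixels));

    kernel(rgb.data(), gray.data(), num_pixels);  // warm-up run, not timed

    auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; ++i) {
        kernel(rgb.data(), gray.data(), num_pixels);
    }
    auto stop = std::chrono::steady_clock::now();

    // Read one result so the work is observably used.
    volatile uint8_t sink = gray[0];
    (void)sink;

    std::chrono::duration<double, std::milli> elapsed = stop - start;
    return elapsed.count() / iterations;
}
```

Calling `time_kernel_ms` once with the scalar kernel and once with a NEON kernel, on the same device and image size, gives the two numbers whose ratio is the speedup. Results vary with compiler flags, since an optimizing compiler may auto-vectorize the scalar loop.
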
According to the OpenCV Q&A forum, only a small part of the code is accelerated with ARM NEON, namely:
    •   cvCanny - modules\imgproc\src\canny.cpp
    •   cvDilate - modules\imgproc\src\morph.cpp
    •   cvResize - modules\imgproc\src\imgwarp.cpp
    •   cvtColor - modules\imgproc\src\color.cpp

The platforms for which acceleration is available are:
    •   Intel Performance Primitives (IPP)
    •   NVIDIA CUDA
    •   NVIDIA Tegra
    •   x86 SIMD (SSE2 and up)

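Summary
NEON optimization in OpenCV on Android is fairly limited in scope: it targets NVIDIA Tegra 3 and later platforms. For details on what was optimized for Tegra 3 and how to use it, see the references below. When adapting an app to different Android devices, check whether NEON is supported before relying on it.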
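One way to make that NEON check is sketched below. It rests on several assumptions not taken from the article: on 32-bit ARM Linux/Android it queries `getauxval(AT_HWCAP)` for `HWCAP_NEON`, on 64-bit ARM it treats NEON as always present, and on anything else it reports no NEON. The NDK's cpufeatures library (`android_getCpuFeatures`) is another option. The names `cpu_has_neon` and `select_gray_kernel` are illustrative.

```cpp
#include <cstdint>

#if defined(__arm__) && defined(__linux__)
#include <asm/hwcap.h>  // HWCAP_NEON
#include <sys/auxv.h>   // getauxval, AT_HWCAP
#endif
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#define HAVE_NEON_KERNEL 1
#endif

using GrayKernel = void (*)(const uint8_t* rgb, uint8_t* gray, int num_pixels);

// Portable fallback: one pixel per iteration.
void rgb_to_gray_scalar(const uint8_t* rgb, uint8_t* gray, int num_pixels) {
    for (int i = 0; i < num_pixels; ++i, rgb += 3) {
        int v = 77 * rgb[0] + 150 * rgb[1] + 29 * rgb[2];
        gray[i] = static_cast<uint8_t>(v >> 8);
    }
}

#ifdef HAVE_NEON_KERNEL
// NEON kernel: 8 pixels per iteration, scalar code for the leftover pixels.
void rgb_to_gray_neon(const uint8_t* rgb, uint8_t* gray, int num_pixels) {
    const uint8x8_t w_r = vdup_n_u8(77);
    const uint8x8_t w_g = vdup_n_u8(150);
    const uint8x8_t w_b = vdup_n_u8(29);
    int i = 0;
    for (; i + 8 <= num_pixels; i += 8) {
        uint8x8x3_t src = vld3_u8(rgb + 3 * i);
        uint16x8_t acc = vmull_u8(src.val[0], w_r);
        acc = vmlal_u8(acc, src.val[1], w_g);
        acc = vmlal_u8(acc, src.val[2], w_b);
        vst1_u8(gray + i, vshrn_n_u16(acc, 8));
    }
    rgb_to_gray_scalar(rgb + 3 * i, gray + i, num_pixels - i);
}
#endif

// Runtime check (assumed detection strategy, see the paragraph above).
bool cpu_has_neon() {
#if defined(__aarch64__)
    return true;   // assumption: Advanced SIMD is present on 64-bit ARM
#elif defined(__arm__) && defined(__linux__)
    return (getauxval(AT_HWCAP) & HWCAP_NEON) != 0;
#else
    return false;  // not an ARM target
#endif
}

// Picks the NEON kernel only when it was compiled in and the CPU reports NEON.
GrayKernel select_gray_kernel() {
#ifdef HAVE_NEON_KERNEL
    if (cpu_has_neon()) return rgb_to_gray_neon;
#endif
    return rgb_to_gray_scalar;
}
```

For brevity the sketch keeps both kernels in one file. On 32-bit ARM, a build that must also run on devices without NEON would normally compile the NEON kernel in a separate source file with NEON enabled only there, so the compiler cannot emit NEON instructions in the fallback path.
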

Reference:
http://www.embedded-vision.com/platinum-members/bdti/embedded-vision-training/documents/pages/developing-opencv-computer-vision-app
https://en.wikipedia.org/wiki/Tegra
http://web.stanford.edu/class/cs231m/lectures/lecture-4-opencv.pdf
http://answers.opencv.org/question/33940/are-these-functions-accelerated-by-arm-neon/
http://queue.acm.org/detail.cfm?id=2206309


Article link: https://www.haomeiwen.com/subject/lstfjttx.html
