Running Faster R-CNN on Ubuntu 18.04: Installation and Configuration

Author: 乘瓠散人 | Published 2019-03-04 22:42


My setup: Ubuntu 18.04 + NVIDIA driver 410.78 + CUDA 10.0 + cuDNN 7.4.2
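Because the cuDNN version matters so much later on, it is worth confirming which one the build will actually see. A minimal sketch, assuming cuDNN 7.x, which defines CUDNN_MAJOR, CUDNN_MINOR and CUDNN_PATCHLEVEL in cudnn.h, and assuming the common install location /usr/local/cuda/include/cudnn.h; `parse_cudnn_version` is a hypothetical helper:

```python
import re

def parse_cudnn_version(header_text):
    """Return (major, minor, patch) from the #define lines of cudnn.h."""
    version = {}
    for name in ("CUDNN_MAJOR", "CUDNN_MINOR", "CUDNN_PATCHLEVEL"):
        # Matches lines such as "#define CUDNN_MAJOR 7"
        match = re.search(r"#define\s+%s\s+(\d+)" % name, header_text)
        if match is None:
            raise ValueError("%s not found in header" % name)
        version[name] = int(match.group(1))
    return (version["CUDNN_MAJOR"],
            version["CUDNN_MINOR"],
            version["CUDNN_PATCHLEVEL"])
```

Usage: `parse_cudnn_version(open("/usr/local/cuda/include/cudnn.h").read())` should give (7, 4, 2) for the setup above.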

Download py-faster-rcnn
git clone --recursive https://github.com/rbgirshick/py-faster-rcnn.git
It uses the Caffe framework, so install Caffe's dependencies first:

sudo apt-get install python-pip
sudo pip install cython  
sudo pip install easydict 
sudo apt-get install python-opencv

You also need to install:

boost
sudo apt-get install libboost-all-dev
protobuf
sudo apt-get install libprotobuf-dev protobuf-c-compiler protobuf-compiler
glog
sudo apt-get install libgoogle-glog-dev
gflags
sudo apt-get install libgflags-dev
lmdb
sudo apt-get install liblmdb-dev
leveldb
sudo apt-get install libleveldb-dev
snappy
sudo apt-get install libsnappy-dev
opencv
sudo apt-get install libopencv-dev
BLAS
sudo apt-get install libatlas-base-dev
hdf5.h header file
sudo apt-get install libhdf5-\*
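Before building, a quick check that the Python packages installed above can actually be imported saves a failed run later. A small sketch; Cython, easydict and cv2 are the import names of the cython, easydict and python-opencv packages, and `missing_modules` is a hypothetical helper:

```python
import importlib

def missing_modules(names):
    """Return the subset of module names that fail to import."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing

# Import names for the Python packages installed above.
REQUIRED = ["Cython", "easydict", "cv2"]
```

Usage: `print(missing_modules(REQUIRED))` prints an empty list once everything is in place.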

Build caffe-fast-rcnn

Build the Cython modules
cd py-faster-rcnn/lib
make
Build Caffe and pycaffe
First change into the caffe-fast-rcnn directory:
cd py-faster-rcnn/caffe-fast-rcnn
Copy Makefile.config.example to Makefile.config:
cp Makefile.config.example Makefile.config
Edit Makefile.config and change the corresponding lines to the following:

USE_CUDNN := 1
WITH_PYTHON_LAYER := 1
INCLUDE_DIRS := $(PYTHON_INCLUDE) /usr/local/include /usr/include/hdf5/serial 
LIBRARY_DIRS := $(PYTHON_LIB) /usr/local/lib /usr/lib /usr/lib/x86_64-linux-gnu /usr/lib/x86_64-linux-gnu/hdf5/serial
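If you would rather script this edit, the sketch below sets `KEY := value` lines in Makefile.config and uncomments them when they appear as `# KEY := ...` (as USE_CUDNN and WITH_PYTHON_LAYER do in the example file). It is a hypothetical helper that only handles single-line assignments, which I assume is how INCLUDE_DIRS and LIBRARY_DIRS appear in Makefile.config.example:

```python
import re

def set_make_vars(config_text, settings):
    """Set (uncommenting if needed) `KEY := value` lines in Makefile.config.

    Only the first matching line per key is rewritten; keys that are not
    found are appended at the end.
    """
    lines = config_text.splitlines()
    remaining = dict(settings)
    for i, line in enumerate(lines):
        # Optional leading "#", then an upper-case KEY, then ":="
        match = re.match(r"\s*#?\s*([A-Z_]+)\s*:=", line)
        if match and match.group(1) in remaining:
            key = match.group(1)
            lines[i] = "%s := %s" % (key, remaining.pop(key))
    for key, value in remaining.items():
        lines.append("%s := %s" % (key, value))
    return "\n".join(lines) + "\n"

SETTINGS = {
    "USE_CUDNN": "1",
    "WITH_PYTHON_LAYER": "1",
    "INCLUDE_DIRS": "$(PYTHON_INCLUDE) /usr/local/include /usr/include/hdf5/serial",
    "LIBRARY_DIRS": "$(PYTHON_LIB) /usr/local/lib /usr/lib "
                    "/usr/lib/x86_64-linux-gnu /usr/lib/x86_64-linux-gnu/hdf5/serial",
}
```

Usage: read Makefile.config, pass its text and SETTINGS to `set_make_vars`, and write the result back.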

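Building at this point still fails. The Caffe bundled with faster-rcnn only supports cuDNN v4, so compiling it against a newer cuDNN produces errors caused by mismatched function arguments. Following the blog post https://blog.csdn.net/flygeda/article/details/78638824, download the latest Caffe source from https://github.com/BVLC/caffe.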

Replace the following files in caffe-fast-rcnn with the corresponding files from the latest Caffe source:
include/caffe/layers/cudnn_relu_layer.hpp
src/caffe/layers/cudnn_relu_layer.cpp
src/caffe/layers/cudnn_relu_layer.cu
include/caffe/layers/cudnn_sigmoid_layer.hpp
src/caffe/layers/cudnn_sigmoid_layer.cpp
src/caffe/layers/cudnn_sigmoid_layer.cu
include/caffe/layers/cudnn_tanh_layer.hpp
src/caffe/layers/cudnn_tanh_layer.cpp
src/caffe/layers/cudnn_tanh_layer.cu

include/caffe/util/cudnn.hpp
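The file replacement can also be scripted. A minimal sketch, assuming the latest Caffe source was cloned into a directory you pass as `src_root` and that `dst_root` is py-faster-rcnn/caffe-fast-rcnn; the file list is the one above, and `replace_files` is a hypothetical helper:

```python
import os
import shutil

# Files taken from the latest BVLC Caffe source (the list above).
CUDNN_FILES = [
    "include/caffe/layers/cudnn_relu_layer.hpp",
    "src/caffe/layers/cudnn_relu_layer.cpp",
    "src/caffe/layers/cudnn_relu_layer.cu",
    "include/caffe/layers/cudnn_sigmoid_layer.hpp",
    "src/caffe/layers/cudnn_sigmoid_layer.cpp",
    "src/caffe/layers/cudnn_sigmoid_layer.cu",
    "include/caffe/layers/cudnn_tanh_layer.hpp",
    "src/caffe/layers/cudnn_tanh_layer.cpp",
    "src/caffe/layers/cudnn_tanh_layer.cu",
    "include/caffe/util/cudnn.hpp",
]

def replace_files(src_root, dst_root, rel_paths=CUDNN_FILES):
    """Copy each relative path from src_root over the same path in dst_root."""
    copied = []
    for rel in rel_paths:
        src = os.path.join(src_root, rel)
        dst = os.path.join(dst_root, rel)
        if not os.path.isfile(src):
            raise IOError("missing source file: %s" % src)
        shutil.copy2(src, dst)  # overwrites the old caffe-fast-rcnn version
        copied.append(rel)
    return copied
```

The target directories already exist in caffe-fast-rcnn, so the sketch does not create any.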

In caffe-fast-rcnn's src/caffe/layers/cudnn_conv_layer.cu, rename every occurrence of:
cudnnConvolutionBackwardData_v3 to cudnnConvolutionBackwardData
cudnnConvolutionBackwardFilter_v3 to cudnnConvolutionBackwardFilter
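The renaming is a plain text substitution, so a few lines of Python can apply it. A sketch with hypothetical helper names; pass the path of cudnn_conv_layer.cu to `patch_file`:

```python
# Old *_v3 names used by caffe-fast-rcnn -> names used by newer cuDNN.
RENAMES = {
    "cudnnConvolutionBackwardData_v3": "cudnnConvolutionBackwardData",
    "cudnnConvolutionBackwardFilter_v3": "cudnnConvolutionBackwardFilter",
}

def rename_cudnn_calls(source):
    """Return source with every *_v3 function name replaced."""
    for old, new in RENAMES.items():
        source = source.replace(old, new)
    return source

def patch_file(path):
    """Rewrite the file in place; return True if anything changed."""
    with open(path) as f:
        original = f.read()
    patched = rename_cudnn_calls(original)
    if patched != original:
        with open(path, "w") as f:
            f.write(patched)
    return patched != original
```

Running it twice is harmless: the second pass finds nothing to rename.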

Then build:

cd py-faster-rcnn/caffe-fast-rcnn
make -j8 && make pycaffe

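The build then failed again with nvcc fatal : Unsupported gpu architecture 'compute_20'. CUDA 9.0 and later no longer support compute capability 2.x (Fermi), so remove these two entries from CUDA_ARCH in Makefile.config:
-gencode arch=compute_20,code=sm_20 \
-gencode arch=compute_20,code=sm_21 \
After that, the build completes.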
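The same edit as a script, for anyone who prefers not to touch CUDA_ARCH by hand. A sketch assuming the CUDA_ARCH block looks like the one in Makefile.config.example, where the first compute_20 entry sits on the `CUDA_ARCH :=` line itself. That is why the code strips only the entry and keeps the trailing backslash, rather than deleting whole lines. `drop_compute_20` is a hypothetical helper:

```python
import re

# One "-gencode arch=compute_20,code=sm_20" (or sm_21) entry plus trailing spaces.
COMPUTE_20 = re.compile(r"-gencode\s+arch=compute_20,code=sm_2[01]\s*")

def drop_compute_20(config_text):
    """Remove compute_20 entries from CUDA_ARCH, keeping line continuations."""
    out = []
    for line in config_text.splitlines():
        new = COMPUTE_20.sub("", line)
        # A line that held only a compute_20 entry is now just "\": drop it.
        if new != line and new.strip() == "\\":
            continue
        out.append(new)
    return "\n".join(out) + "\n"
```

The line continuation after `CUDA_ARCH :=` is preserved, so the remaining `-gencode` lines still belong to the variable.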

Fetch the Faster R-CNN models
cd py-faster-rcnn
./data/scripts/fetch_faster_rcnn_models.sh
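The server cannot reach sites that are blocked in China, so I downloaded the archive on my local machine and uploaded it to py-faster-rcnn/data on the server. The download URL is given in fetch_faster_rcnn_models.sh.
Then extract it: tar -xvf faster_rcnn_models.tgz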
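To get the URL out of fetch_faster_rcnn_models.sh without running it, a regular expression is enough. A sketch; `find_download_urls` is a hypothetical helper, and the script text in the usage check below is made up rather than the real script:

```python
import re

def find_download_urls(script_text):
    """Return the http(s) URLs that appear in a shell script, in order."""
    # A URL ends at whitespace or at a shell quote character.
    return re.findall(r"https?://[^\s'\"]+", script_text)
```

Usage: `find_download_urls(open("data/scripts/fetch_faster_rcnn_models.sh").read())`, then download the URL on a machine with access and copy the archive to the server.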
Run the demo
cd py-faster-rcnn
sudo ./tools/demo.py

Error: ImportError: No module named skimage.io
Fix: sudo apt-get install python-skimage
Error: ImportError: No module named google.protobuf.internal
Fix: pip install protobuf
Error:

Cannot create Cublas handle. Cublas won't be available
... (several dozen lines omitted)
Check failed: status == CUDNN_STATUS_SUCCESS (1 VS. 0) CUDNN_STATUS_NOT_INITIALIZED

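The machine also had CUDA 10.1 installed from earlier. That version is not compatible with my display driver 410.78 (CUDA 10.1 requires driver 418.39 or newer). I had never removed it, so I uninstalled it here and kept only CUDA 10.0, by running ./cuda_uninstaller in /usr/local/cuda-10.1/bin.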
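A quick way to spot leftover toolkits like this is to list the versioned CUDA directories. A sketch assuming the usual installer layout of /usr/local/cuda-X.Y directories plus a /usr/local/cuda symlink; the helper names are hypothetical:

```python
import os
import re

def installed_cuda_versions(prefix="/usr/local"):
    """List (major, minor) for versioned toolkit dirs such as cuda-10.0."""
    versions = []
    for name in os.listdir(prefix):
        match = re.match(r"cuda-(\d+)\.(\d+)$", name)
        if match and os.path.isdir(os.path.join(prefix, name)):
            versions.append((int(match.group(1)), int(match.group(2))))
    return sorted(versions)

def cuda_symlink_target(prefix="/usr/local"):
    """Return where prefix/cuda points, or None if it is not a symlink."""
    link = os.path.join(prefix, "cuda")
    return os.readlink(link) if os.path.islink(link) else None
```

More than one entry from `installed_cuda_versions()` means several toolkits are installed side by side, which is what caused the problem here.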

Error:

...
in <module>
    from nms.gpu_nms import gpu_nms
ImportError: libcudart.so.10.1: cannot open shared object file: No such file or directory

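This happened because I had built with CUDA 10.1 earlier. After switching to CUDA 10.0 and rebuilding, some files were not recompiled and still depended on CUDA 10.1. The fix is to delete every *.so file in the subdirectories of py-faster-rcnn/lib/ and run make again.
With that, the demo finally runs :)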
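Deleting the stale extension modules can be done with `find py-faster-rcnn/lib -name '*.so' -delete`, or with the Python sketch below, which lists the files first by default. `remove_shared_objects` is a hypothetical helper; run make in py-faster-rcnn/lib afterwards:

```python
import os

def remove_shared_objects(root, dry_run=True):
    """Find (and, unless dry_run, delete) every *.so file below root."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(".so"):
                path = os.path.join(dirpath, name)
                found.append(path)
                if not dry_run:
                    os.remove(path)
    return sorted(found)
```

Call it once with the default dry_run=True to review the list, then again with dry_run=False.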

Download the ImageNet pre-trained model weights (used to initialize the network parameters)
cd py-faster-rcnn
./data/scripts/fetch_imagenet_models.sh
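If the download fails, use the same workaround as for the Faster R-CNN models: download the archive locally and upload it to py-faster-rcnn/data.
Create a symbolic link to the PASCAL VOC dataset so that one copy can be shared by several projects. $VOCdevkit is the directory where you downloaded the dataset: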
cd py-faster-rcnn/data
ln -s $VOCdevkit VOCdevkit2007
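Before starting a long training run, it is worth checking that the link points at a complete devkit. A sketch assuming the standard VOC 2007 layout (VOCdevkit/VOC2007 containing Annotations, ImageSets/Main and JPEGImages); `missing_voc_dirs` is a hypothetical helper:

```python
import os

# Sub-directories a PASCAL VOC 2007 devkit normally contains.
VOC_SUBDIRS = ["Annotations", os.path.join("ImageSets", "Main"), "JPEGImages"]

def missing_voc_dirs(devkit_root, year="2007"):
    """Return the expected VOC<year> sub-directories missing under devkit_root."""
    base = os.path.join(devkit_root, "VOC" + year)
    return [d for d in VOC_SUBDIRS if not os.path.isdir(os.path.join(base, d))]
```

Usage: `missing_voc_dirs("py-faster-rcnn/data/VOCdevkit2007")` should return an empty list. `os.path.isdir` follows symlinks, so the check works through the link created above.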
Train on the VOC dataset
cd py-faster-rcnn
./experiments/scripts/faster_rcnn_alt_opt.sh [GPU_ID] [NET] [--set...]
./experiments/scripts/faster_rcnn_alt_opt.sh 1 ZF pascal_voc
This fails with:

File "/home/zd/py-faster-rcnn/tools/../lib/fast_rcnn/train.py", line 51, in __init__
     pb2.text_format.Merge(f.read(), self.solver_param)
AttributeError: 'module' object has no attribute 'text_format'

The fix is to add this line to py-faster-rcnn/lib/fast_rcnn/train.py:
import google.protobuf.text_format
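Why the extra import helps: the traceback shows train.py reaching text_format as an attribute of a module bound to the name pb2. In Python, a submodule only becomes an attribute of its package after something imports it, and presumably the installed protobuf no longer does that implicitly. The self-contained sketch below reproduces the effect with a throwaway package. The names demo_pkg and text_format here are made up; only the mechanism is the same:

```python
import importlib
import os
import sys
import tempfile

def make_demo_package(root):
    """Create root/demo_pkg/ with an empty __init__.py and a text_format.py."""
    pkg_dir = os.path.join(root, "demo_pkg")
    os.mkdir(pkg_dir)
    # Empty __init__.py: importing the package does NOT import its submodules.
    open(os.path.join(pkg_dir, "__init__.py"), "w").close()
    with open(os.path.join(pkg_dir, "text_format.py"), "w") as f:
        f.write("def Merge(text, message):\n    return message\n")

def submodule_attribute_demo():
    """Return (has_attr_before, has_attr_after) around the explicit import."""
    root = tempfile.mkdtemp()
    make_demo_package(root)
    for name in ("demo_pkg.text_format", "demo_pkg"):
        sys.modules.pop(name, None)  # start from a clean import state
    sys.path.insert(0, root)
    importlib.invalidate_caches()
    try:
        pb2 = importlib.import_module("demo_pkg")  # like: import a package as pb2
        before = hasattr(pb2, "text_format")  # False: submodule not loaded yet
        importlib.import_module("demo_pkg.text_format")  # like the added line
        after = hasattr(pb2, "text_format")  # True: import binds it on the package
        return before, after
    finally:
        sys.path.remove(root)
```

`submodule_attribute_demo()` returns (False, True), mirroring the AttributeError before the fix and the working attribute access after it.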
Training then starts...
but after running for a while, it failed with another error:

  File "/home/zd/py-faster-rcnn/tools/../lib/roi_data_layer/minibatch.py", line 110, in _sample_rois
    fg_inds, size=fg_rois_per_this_image, replace=False
  File "mtrand.pyx", line 1176, in mtrand.RandomState.choice (numpy/random/mtrand/mtrand.c:18822)
TypeError: 'numpy.float64' object cannot be interpreted as an index

So I reinstalled numpy 1.11.0: sudo pip install -U numpy==1.11.0
but that produced a new error: ImportError: numpy.core.multiarray failed to import
So, following https://github.com/rbgirshick/py-faster-rcnn/issues/626, I modified lines 55, 98, 110, 124 and 175 of py-faster-rcnn/lib/roi_data_layer/minibatch.py and upgraded numpy to 1.13.1: sudo pip install -U numpy==1.13.1
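The TypeError comes from a numpy.float64 being used where NumPy requires an integer, here the size argument of np.random.choice, which newer NumPy versions reject. As far as I understand the fix in issue #626, the edited lines cast such values to int. The sketch below illustrates that pattern rather than reproducing the exact patch. `sample_foreground` and its arguments are hypothetical, loosely modeled on the foreground sampling in _sample_rois:

```python
import numpy as np

def sample_foreground(fg_inds, rois_per_image, fg_fraction, rng=np.random):
    """Pick foreground RoI indices without replacement.

    np.round(fraction * n) yields a numpy.float64; NumPy refuses floats as
    sizes and indices, so the counts are cast to int explicitly.
    """
    fg_rois_per_image = int(np.round(fg_fraction * rois_per_image))
    fg_rois_per_this_image = int(min(fg_rois_per_image, fg_inds.size))
    if fg_inds.size > 0:
        fg_inds = rng.choice(fg_inds, size=fg_rois_per_this_image, replace=False)
    return fg_inds
```

The built-in int() is used instead of np.int, because np.int was removed in NumPy 1.24.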
