My setup: Ubuntu 18.04 + NVIDIA driver 410.78 + CUDA 10.0 + cuDNN 7.4.2
- Download py-faster-rcnn
git clone --recursive https://github.com/rbgirshick/py-faster-rcnn.git
- py-faster-rcnn builds on the Caffe framework, so install Caffe's dependencies first
sudo apt-get install python-pip
sudo pip install cython
sudo pip install easydict
sudo apt-get install python-opencv
The following also need to be installed:
- boost
sudo apt-get install libboost-all-dev
- protobuf
sudo apt-get install libprotobuf-dev protobuf-c-compiler protobuf-compiler
- glog
sudo apt-get install libgoogle-glog-dev
- gflags
sudo apt-get install libgflags-dev
- lmdb
sudo apt-get install liblmdb-dev
- leveldb
sudo apt-get install libleveldb-dev
- snappy
sudo apt-get install libsnappy-dev
- opencv
sudo apt-get install libopencv-dev
- BLAS
sudo apt-get install libatlas-base-dev
- the hdf5.h header file
sudo apt-get install libhdf5-\*
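Before building, it can help to confirm that the Python-side packages installed above are importable by the interpreter that will run py-faster-rcnn. The following is a minimal sketch and not part of py-faster-rcnn: `missing_modules` is a hypothetical helper, and `Cython`, `easydict`, and `cv2` are assumed to be the import names these packages provide.

```python
import importlib


def missing_modules(names):
    """Return the module names from `names` that cannot be imported."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            # The package is not installed for this interpreter.
            missing.append(name)
    return missing


# Assumed import names for cython, easydict and python-opencv.
REQUIRED = ["Cython", "easydict", "cv2"]
```

Running `missing_modules(REQUIRED)` with the same Python that will run the demo should return an empty list. Any name it reports points at a package to install before continuing.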
- Build caffe-fast-rcnn
- Build the Cython modules
cd py-faster-rcnn/lib
make
- Build Caffe and pycaffe
First change into the caffe-fast-rcnn directory:
cd py-faster-rcnn/caffe-fast-rcnn
Copy Makefile.config.example to Makefile.config:
cp Makefile.config.example Makefile.config
Edit Makefile.config and change the corresponding entries to:
USE_CUDNN := 1
WITH_PYTHON_LAYER := 1
INCLUDE_DIRS := $(PYTHON_INCLUDE) /usr/local/include /usr/include/hdf5/serial
LIBRARY_DIRS := $(PYTHON_LIB) /usr/local/lib /usr/lib /usr/lib/x86_64-linux-gnu /usr/lib/x86_64-linux-gnu/hdf5/serial
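If you prefer to script these edits rather than make them by hand, the sketch below rewrites the four variables in the text of Makefile.config. It is an illustration only: `set_make_variable` and `apply_settings` are hypothetical helpers, and the sketch assumes each variable sits on a single line (possibly commented out with a leading `#`), which you should check against your copy of the file.

```python
import re

# Values taken from the settings listed above.
SETTINGS = [
    ("USE_CUDNN", "1"),
    ("WITH_PYTHON_LAYER", "1"),
    ("INCLUDE_DIRS",
     "$(PYTHON_INCLUDE) /usr/local/include /usr/include/hdf5/serial"),
    ("LIBRARY_DIRS",
     "$(PYTHON_LIB) /usr/local/lib /usr/lib /usr/lib/x86_64-linux-gnu "
     "/usr/lib/x86_64-linux-gnu/hdf5/serial"),
]


def set_make_variable(text, name, value):
    """Set `name := value`, uncommenting an existing single-line assignment."""
    pattern = re.compile(
        r"^#?[ \t]*" + re.escape(name) + r"[ \t]*:=.*$", re.MULTILINE)
    new_line = "{0} := {1}".format(name, value)
    if pattern.search(text):
        # A function replacement keeps `$(...)` in the value literal.
        return pattern.sub(lambda match: new_line, text, count=1)
    # The variable does not appear in the file: append it.
    return text.rstrip("\n") + "\n" + new_line + "\n"


def apply_settings(text, settings=SETTINGS):
    """Apply every (name, value) pair to the Makefile text."""
    for name, value in settings:
        text = set_make_variable(text, name, value)
    return text
```

To use it, read Makefile.config into a string, pass it through `apply_settings`, and write the result back. Keeping a backup copy of the file first is a sensible precaution.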
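Compiling at this point still fails. The Caffe bundled with py-faster-rcnn targets cuDNN v4 by default, so building it against a newer cuDNN fails with errors about mismatched function arguments. Following the blog post https://blog.csdn.net/flygeda/article/details/78638824, download the latest Caffe source from https://github.com/BVLC/caffe
Replace the corresponding files in caffe-fast-rcnn with the following files from the latest Caffe source: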
include/caffe/layers/cudnn_relu_layer.hpp
src/caffe/layers/cudnn_relu_layer.cpp
src/caffe/layers/cudnn_relu_layer.cu
include/caffe/layers/cudnn_sigmoid_layer.hpp
src/caffe/layers/cudnn_sigmoid_layer.cpp
src/caffe/layers/cudnn_sigmoid_layer.cu
include/caffe/layers/cudnn_tanh_layer.hpp
src/caffe/layers/cudnn_tanh_layer.cpp
src/caffe/layers/cudnn_tanh_layer.cu
include/caffe/util/cudnn.hpp
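Copying ten files by hand is easy to get wrong, so here is a minimal sketch that copies them in one pass. `replace_cudnn_files` is a hypothetical helper, and both directory arguments are placeholders for wherever you cloned the latest Caffe and where caffe-fast-rcnn lives.

```python
import os
import shutil

# Relative paths of the files listed above.
FILES_TO_REPLACE = [
    "include/caffe/layers/cudnn_relu_layer.hpp",
    "src/caffe/layers/cudnn_relu_layer.cpp",
    "src/caffe/layers/cudnn_relu_layer.cu",
    "include/caffe/layers/cudnn_sigmoid_layer.hpp",
    "src/caffe/layers/cudnn_sigmoid_layer.cpp",
    "src/caffe/layers/cudnn_sigmoid_layer.cu",
    "include/caffe/layers/cudnn_tanh_layer.hpp",
    "src/caffe/layers/cudnn_tanh_layer.cpp",
    "src/caffe/layers/cudnn_tanh_layer.cu",
    "include/caffe/util/cudnn.hpp",
]


def replace_cudnn_files(new_caffe_root, old_caffe_root,
                        relative_paths=FILES_TO_REPLACE):
    """Copy each listed file from the new Caffe tree over the old tree."""
    # Check every source first so a missing file cannot leave a partial copy.
    for rel in relative_paths:
        src = os.path.join(new_caffe_root, rel)
        if not os.path.isfile(src):
            raise IOError("missing in new Caffe tree: " + src)
    for rel in relative_paths:
        shutil.copy2(os.path.join(new_caffe_root, rel),
                     os.path.join(old_caffe_root, rel))
    return list(relative_paths)
```

The function overwrites files in the old tree, so run it on a clone you can reset with git if something goes wrong.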
In caffe-fast-rcnn's src/caffe/layers/cudnn_conv_layer.cu, rename every occurrence of:
cudnnConvolutionBackwardData_v3 to cudnnConvolutionBackwardData
cudnnConvolutionBackwardFilter_v3 to cudnnConvolutionBackwardFilter
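The two renames can also be applied programmatically. This is a small sketch that works on the file's text. `drop_v3_suffix` is a hypothetical helper name, and reading and writing cudnn_conv_layer.cu is left to the caller.

```python
# Old and new cuDNN function names, as given above.
RENAMES = [
    ("cudnnConvolutionBackwardData_v3", "cudnnConvolutionBackwardData"),
    ("cudnnConvolutionBackwardFilter_v3", "cudnnConvolutionBackwardFilter"),
]


def drop_v3_suffix(source_text):
    """Return source_text with both _v3 function names renamed."""
    for old_name, new_name in RENAMES:
        source_text = source_text.replace(old_name, new_name)
    return source_text
```

Running it a second time changes nothing, because no `_v3` names remain after the first pass.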
Then build:
cd py-faster-rcnn/caffe-fast-rcnn
make -j8 && make pycaffe
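The build then hit another error: nvcc fatal : Unsupported gpu architecture 'compute_20'. CUDA 9 and later no longer support the compute_20 architecture, so remove the following entries from the CUDA_ARCH setting in Makefile.config: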
-gencode arch=compute_20,code=sm_20 \
-gencode arch=compute_20,code=sm_21 \
After that, the build completes.
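The CUDA_ARCH edit can be sketched in code as well. One detail to watch, stated here as an assumption about the stock Makefile.config: the first compute_20 entry shares its line with `CUDA_ARCH :=`, so that line has to be trimmed rather than deleted. `drop_compute_20` is a hypothetical helper that handles both cases.

```python
import re

# Matches one "-gencode arch=compute_20,code=sm_2x" flag and trailing spaces.
GENCODE_20 = re.compile(r"-gencode\s+arch=compute_20,code=sm_2\d\s*")


def drop_compute_20(text):
    """Remove compute_20 gencode flags from Makefile text."""
    kept = []
    for line in text.splitlines():
        stripped = GENCODE_20.sub("", line)
        # A line that held only the removed flag is now a bare continuation
        # backslash (or empty), so drop it entirely.
        if stripped != line and stripped.strip() in ("", "\\"):
            continue
        kept.append(stripped)
    result = "\n".join(kept)
    return result + "\n" if text.endswith("\n") else result
```

Lines for other architectures pass through unchanged, and a line such as `CUDA_ARCH := \` is kept so the variable still continues onto the remaining entries.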
- Fetch the Faster R-CNN models
cd py-faster-rcnn
./data/scripts/fetch_faster_rcnn_models.sh
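My server cannot get past the firewall, so I downloaded the archive on my local machine and uploaded it to the py-faster-rcnn/data directory on the server. The download URL is in fetch_faster_rcnn_models.sh.
Then extract it: tar -xvf faster_rcnn_models.tgz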
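When an archive travels through an extra machine, it is worth checking that it arrived intact before extracting it. The sketch below computes the MD5 digest of a file. `md5_of_file` is a hypothetical helper. Whether fetch_faster_rcnn_models.sh records an expected checksum to compare against is an assumption to verify by reading the script.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of the file at `path`, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large archives do not fill memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Compute the digest on both machines, for example `md5_of_file("faster_rcnn_models.tgz")`. If the two values differ, the upload was corrupted and should be repeated.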
- Run the demo
cd py-faster-rcnn
sudo ./tools/demo.py
- Error:
ImportError: No module named skimage.io
Fix: sudo apt-get install python-skimage
- Error:
ImportError: No module named google.protobuf.internal
Fix: pip install protobuf
- Error:
Cannot create Cublas handle. Cublas won't be available
... (a few dozen lines omitted)
Check failed: status == CUDNN_STATUS_SUCCESS (1 VS. 0) CUDNN_STATUS_NOT_INITIALIZED
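I had installed CUDA 10.1 on this machine earlier and never removed it, but that version is not compatible with my 410.78 driver. I uninstalled it and kept only CUDA 10.0 by running ./cuda_uninstaller in the /usr/local/cuda-10.1/bin directory.
- Error: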
...
in <module>
from nms.gpu_nms import gpu_nms
ImportError: libcudart.so.10.1: cannot open shared object file: No such file or directory
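This happens because I had previously built against CUDA 10.1. After switching to CUDA 10.0, some files were not rebuilt and still depend on CUDA 10.1. Delete all *.so files in the subdirectories of py-faster-rcnn/lib/ and then run make again.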
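Finding the stale extension modules can be sketched as follows. `remove_shared_objects` is a hypothetical helper. It defaults to a dry run so you can review the list before anything is deleted.

```python
import fnmatch
import os


def remove_shared_objects(lib_dir, dry_run=True):
    """List *.so files under lib_dir and delete them unless dry_run is True."""
    found = []
    for root, _dirs, files in os.walk(lib_dir):
        for name in fnmatch.filter(files, "*.so"):
            path = os.path.join(root, name)
            found.append(path)
            if not dry_run:
                os.remove(path)
    return sorted(found)
```

Call `remove_shared_objects("py-faster-rcnn/lib")` first to see what would go, then again with `dry_run=False`, and rerun make in the lib directory so the modules are rebuilt against CUDA 10.0.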
With that, the demo runs successfully :)
- Download the model weights pre-trained on ImageNet (used to initialize the network parameters)
cd py-faster-rcnn
./data/scripts/fetch_imagenet_models.sh
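If the download fails, use the same workaround as for the Faster R-CNN models above: download the archive locally and upload it to the server.
- Create a symbolic link to the PASCAL VOC dataset so that several projects can share it. $VOCdevkit is the directory where you downloaded the dataset: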
cd py-faster-rcnn/data
ln -s $VOCdevkit VOCdevkit2007
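A symbolic link stores its target path as written, so a relative $VOCdevkit would be resolved relative to the data directory and could leave a dangling link. The sketch below creates the same link with an absolute target. `link_vocdevkit` is a hypothetical helper, and its arguments are placeholders for your own paths.

```python
import os


def link_vocdevkit(devkit_dir, data_dir, link_name="VOCdevkit2007"):
    """Create data_dir/link_name as a symlink to devkit_dir; return its path."""
    link_path = os.path.join(data_dir, link_name)
    if os.path.islink(link_path):
        if os.path.realpath(link_path) == os.path.realpath(devkit_dir):
            return link_path  # Already points at the right place.
        os.remove(link_path)  # Replace a link that points elsewhere.
    elif os.path.exists(link_path):
        raise OSError("refusing to replace a real file or directory: "
                      + link_path)
    # An absolute target keeps the link valid wherever it is created from.
    os.symlink(os.path.abspath(devkit_dir), link_path)
    return link_path
```

Calling it twice with the same arguments is harmless, and it refuses to overwrite a real directory named VOCdevkit2007.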
- Train on the VOC dataset
cd py-faster-rcnn
Usage: ./experiments/scripts/faster_rcnn_alt_opt.sh [GPU_ID] [NET] [--set...]
For example:
./experiments/scripts/faster_rcnn_alt_opt.sh 1 ZF pascal_voc
This fails with:
File "/home/zd/py-faster-rcnn/tools/../lib/fast_rcnn/train.py", line 51, in __init__
pb2.text_format.Merge(f.read(), self.solver_param)
AttributeError: 'module' object has no attribute 'text_format'
The fix is to add one line to py-faster-rcnn/lib/fast_rcnn/train.py:
import google.protobuf.text_format
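One plausible reading of this AttributeError, offered as an assumption rather than a confirmed diagnosis: the traceback accesses text_format as an attribute of a package object (pb2), and Python only binds a submodule as an attribute of its package once something has imported that submodule. The sketch below reproduces the behaviour with a throwaway package. `demo_pb2_pkg` and `submodule_visibility` are made-up names for illustration.

```python
import importlib
import os
import shutil
import sys
import tempfile


def submodule_visibility():
    """Return (visible_before, visible_after) for a throwaway package."""
    tmp = tempfile.mkdtemp()
    pkg_dir = os.path.join(tmp, "demo_pb2_pkg")
    os.mkdir(pkg_dir)
    # An empty __init__.py: the package imports none of its submodules.
    open(os.path.join(pkg_dir, "__init__.py"), "w").close()
    with open(os.path.join(pkg_dir, "text_format.py"), "w") as handle:
        handle.write("def Merge(text, message):\n    return message\n")
    sys.path.insert(0, tmp)
    try:
        package = importlib.import_module("demo_pb2_pkg")
        before = hasattr(package, "text_format")
        # Importing the submodule binds it as an attribute of the package.
        importlib.import_module("demo_pb2_pkg.text_format")
        after = hasattr(package, "text_format")
    finally:
        sys.path.remove(tmp)
        sys.modules.pop("demo_pb2_pkg.text_format", None)
        sys.modules.pop("demo_pb2_pkg", None)
        shutil.rmtree(tmp)
    return before, after
```

The function returns `(False, True)`: the attribute is missing until the submodule is imported, which is the effect the extra import line in train.py has on text_format.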
Training then starts...
But after running for a while, it hit another error:
File "/home/zd/py-faster-rcnn/tools/../lib/roi_data_layer/minibatch.py", line 110, in _sample_rois
fg_inds, size=fg_rois_per_this_image, replace=False
File "mtrand.pyx", line 1176, in mtrand.RandomState.choice (numpy/random/mtrand/mtrand.c:18822)
TypeError: 'numpy.float64' object cannot be interpreted as an index
So I reinstalled numpy at version 1.11.0: sudo pip install -U numpy==1.11.0
But that produced a new error: ImportError: numpy.core.multiarray failed to import
So, following https://github.com/rbgirshick/py-faster-rcnn/issues/626, I modified lines 55, 98, 110, 124 and 175 of py-faster-rcnn/lib/roi_data_layer/minibatch.py and upgraded numpy to version 1.13.1: sudo pip install -U numpy==1.13.1
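The exact edits from the linked issue are not reproduced here. The sketch below only illustrates the general pattern behind this TypeError, under the assumption that the fix amounts to casting a float count to int before it is used as a size or index. It uses the standard library in place of numpy, and `sample_foreground` and `float_size_is_rejected` are hypothetical names. The variable names inside mirror the traceback.

```python
import random


def float_size_is_rejected(size):
    """Return True if `size` cannot be used where an integer is required."""
    try:
        list(range(size))
    except TypeError:
        return True
    return False


def sample_foreground(fg_inds, fg_fraction, rois_per_image, seed=None):
    """Sample foreground indices without replacement, using an int count."""
    # Multiplying by a fraction yields a float such as 32.0, which plays
    # the role of the numpy.float64 count in the traceback.
    fg_rois_per_image = fg_fraction * rois_per_image
    fg_rois_per_this_image = min(fg_rois_per_image, len(fg_inds))
    # Cast to int before using the value as a sample size.
    count = int(fg_rois_per_this_image)
    return random.Random(seed).sample(fg_inds, count)
```

`float_size_is_rejected(32.0)` is True while `float_size_is_rejected(32)` is False, which is the distinction that newer numpy versions enforce for sizes and indices.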