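Contents
Deep Learning Framework Caffe (1): Build and Installation
Deep Learning Framework Caffe (2): Training and Using Models
Deep Learning Framework Caffe (3): Defining Custom Networks with NetSpec
Deep Learning Framework Caffe (4): Visualization and Parameter Extraction
Deep Learning Framework Caffe (5): Converting Models to Other Frameworks
Updated before 6.23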
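Training
The CAFFE_ROOT/tools directory contains the source code (.cpp files with self-explanatory names) for common operations such as training and testing. These files are compiled during the build, which produces the corresponding executables under build/tools, as shown in the figure below:
[Figure: image.png]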
Data preparation before training
See here.
Training procedure
See here.
Notes on several files
xxx_train_test_full.prototxt
xxx_solver.prototxt
xxx_iter_xxx.caffemodel
xxx_mean.binaryproto
xxx_mean.npy
xxx_classes.txt (note: the table mapping class names to index numbers, usually needed when classifying from Python or C++), for example:
aeroplane
bicycle
bird
boat
bottle
bus
car
cat
chair
cow
diningtable
dog
horse
motorbike
person
pottedplant
sheep
sofa
train
tvmonitor
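The class table above can be read into a list so that the line position serves as the class index. This is a minimal sketch under assumptions: the file name voc_classes.txt and the helper load_class_names are hypothetical, and the file is assumed to hold one class name per line, in index order starting from 0.

```python
import io
import os
import tempfile


def load_class_names(path):
    """Read one class name per line; the 0-based line position is the class index."""
    with io.open(path, "r", encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]


# Demo with a temporary file standing in for xxx_classes.txt (hypothetical name).
demo_dir = tempfile.mkdtemp()
demo_path = os.path.join(demo_dir, "voc_classes.txt")
with io.open(demo_path, "w", encoding="utf-8") as f:
    f.write(u"aeroplane\nbicycle\nbird\nboat\n")

class_names = load_class_names(demo_path)
predicted_index = 2  # e.g. the argmax of the network's output vector
print(class_names[predicted_index])
```

The network only outputs an index, so the order of lines in this file has to match the label indices used when the training list files were written.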
- Caffe directory layout
Source code home page
./src
./include
./docs
./python (Python interface library)
./matlab
./models
./examples
./scripts
./tools
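Notes:
a. For the roles of the three files train.txt, test.txt and val.txt required when running convert_imageset, see here.
b. The scripts in the linked posts simply wrap the individual steps of the training workflow in sh scripts. The usual flow is:
convert to lmdb (convert_imageset) -> train (caffe train) -> test (caffe test). The sh scripts simplify setting the arguments of these commands. They are not ideal, however: when you retrain, you have to manually delete the two directories created during the lmdb conversion before the scripts will run again. Merging the sh scripts into a single one that deletes and creates the relevant directories automatically would be more convenient.
c. There are many ways to script Caffe training. Some open-source algorithms, such as Faster R-CNN and DeepID, also ship their own Python training scripts. This article covers only the most basic, native approach. To train Faster R-CNN you can use the training interface provided by its authors directly; the underlying principle is the same (running the programs under tools in sequence).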
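The single merged script suggested in note b could look roughly like the sketch below. It is an illustration, not the scripts from the linked posts: the directory names train_lmdb and val_lmdb, the function names, and all paths are assumptions. The command-line flags used (--shuffle for convert_imageset; --solver, --model, --weights and --iterations for caffe) are the standard ones, but check them against your own build.

```python
import os
import shutil
import subprocess


def reset_dir(path):
    """Delete a stale directory (for example an old LMDB) if it exists."""
    if os.path.isdir(path):
        shutil.rmtree(path)


def build_commands(tools_dir, image_root, work_dir, solver, model, weights,
                   test_iterations=100):
    """Return the argv lists for: convert to lmdb -> train -> test."""
    convert = os.path.join(tools_dir, "convert_imageset")
    caffe_bin = os.path.join(tools_dir, "caffe")
    train_lmdb = os.path.join(work_dir, "train_lmdb")  # assumed name
    val_lmdb = os.path.join(work_dir, "val_lmdb")      # assumed name
    return [
        # image_root should end with "/" because the tool concatenates it
        # with the relative paths listed in train.txt / val.txt.
        [convert, "--shuffle", image_root,
         os.path.join(work_dir, "train.txt"), train_lmdb],
        [convert, image_root, os.path.join(work_dir, "val.txt"), val_lmdb],
        [caffe_bin, "train", "--solver=" + solver],
        [caffe_bin, "test", "--model=" + model, "--weights=" + weights,
         "--iterations=" + str(test_iterations)],
    ]


def run_pipeline(tools_dir, image_root, work_dir, solver, model, weights,
                 dry_run=True):
    """Remove old LMDB directories, then run (or just print) each step."""
    commands = build_commands(tools_dir, image_root, work_dir,
                              solver, model, weights)
    for lmdb_dir in (commands[0][-1], commands[1][-1]):
        reset_dir(lmdb_dir)
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.check_call(cmd)  # stops at the first failing step
    return commands


if __name__ == "__main__":
    # Hypothetical paths; dry_run=True only prints the commands.
    run_pipeline("build/tools", "data/images/", "data",
                 "xxx_solver.prototxt", "xxx_train_test_full.prototxt",
                 "xxx_iter_xxx.caffemodel", dry_run=True)
```

Keeping the command construction separate from execution makes it possible to inspect the exact command lines with dry_run=True before starting a long training run.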
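- Using Caffe from Python
When Python imports a third-party library, it searches three kinds of locations: the system's default third-party library directory (/usr/lib/python2.7/dist-packages), the directories listed in the $PYTHONPATH environment variable, and the directory of the script being run (the current directory in an interactive session). When a script runs, these directories are collected into sys.path. So first make sure that Caffe's Python interface (the CAFFE_ROOT/python directory) is on Python's search path. A simple way to do this is to add the following line to the Python script that uses Caffe, before the import caffe statement: sys.path.insert(0, "CAFFE_ROOT/python")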
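The search-path step can be wrapped in a small helper, sketched below. The helper name add_caffe_to_path, the CAFFE_ROOT environment variable and the /opt/caffe fallback are assumptions made for illustration; substitute your own Caffe checkout.

```python
import os
import sys


def add_caffe_to_path(caffe_root):
    """Put CAFFE_ROOT/python at the front of sys.path (once) and return it."""
    pycaffe_dir = os.path.join(caffe_root, "python")
    if pycaffe_dir not in sys.path:
        sys.path.insert(0, pycaffe_dir)
    return pycaffe_dir


# Hypothetical location: read from an environment variable, with a fallback.
pycaffe_dir = add_caffe_to_path(os.environ.get("CAFFE_ROOT", "/opt/caffe"))
print(sys.path[0])

# After this point "import caffe" resolves, provided pycaffe was actually
# built under that directory (for example with "make pycaffe").
```

Exporting PYTHONPATH in the shell achieves the same result without touching the script; the in-script version is handy when several Caffe builds live on one machine.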
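- Using Caffe from C++
After building from source, create a new project and configure its include paths and library directories according to the locations of Caffe's headers and libraries.
Include paths:
CAFFE_ROOT/include
CAFFE_ROOT/src
CUDA_ROOT/include
/usr/include (headers of other dependencies: boost, protobuf, etc.)
Library paths:
CAFFE_ROOT/build/lib
CUDA_ROOT/lib64
/usr/lib (libraries of other dependencies: boost, protobuf, etc.)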
Usage
For Python
import os
from functools import partial

import caffe
import cv2
import numpy as np

from synset_words import WordCode  # the author's own helper module, not part of Caffe


class CnnClassify(object):
    def __init__(self, path='/trainedCaffeData/', **kwargs):
        """
        :param path: directory that contains the model files
        :param kwargs: model_file, params_file, mean_file, synset_file
                       (file names relative to path), plus the optional
                       use_gpu (bool) and img_size ((height, width))
        """
        print(os.path.abspath(path))
        if kwargs.get("use_gpu", False):
            caffe.set_mode_gpu()  # gpu or cpu
            caffe.set_device(0)
        else:
            caffe.set_mode_cpu()
        self.img_size = kwargs.get("img_size", (48, 48))
        join_func = partial(os.path.join, path)
        for k in ["model_file", "params_file", "mean_file", "synset_file"]:
            kwargs[k] = join_func(kwargs[k])
        self.net = caffe.Net(kwargs["model_file"],   # defines the structure of the model
                             kwargs["params_file"],  # contains the trained weights
                             caffe.TEST)             # use test mode (e.g., don't perform dropout)
        self.__setReadFormat(kwargs["mean_file"])
        self.synset_words = WordCode(filename=kwargs["synset_file"])

    def __setReadFormat(self, model_mean):
        '''
        :param model_mean: path to the .npy mean of the training set
        '''
        print(self.net.blobs['data'].data.shape)
        self.transformer = caffe.io.Transformer({'data': self.net.blobs['data'].data.shape})
        # Load the mean file and average over height and width to get one mean per BGR channel
        mu = np.load(model_mean).mean(1).mean(1)
        # Move the channel axis first: HWC -> CHW
        self.transformer.set_transpose('data', (2, 0, 1))
        self.transformer.set_mean('data', mu)
        self.transformer.set_raw_scale('data', 255)  # rescale pixel values from [0, 1] to [0, 255]
        # swap channels from RGB to BGR
        self.transformer.set_channel_swap('data', (2, 1, 0))

    # NOTE: caffe.io.load_image_arr, used below, is not in stock pycaffe
    # (caffe.io ships load_image, which takes a file path). It is presumably
    # a helper the author added to convert an in-memory image array.

    def predict_batch(self, img_arr):
        self.net.blobs['data'].reshape(len(img_arr), 3, self.img_size[0], self.img_size[1])  # image size is 48x48
        img_inputs = np.zeros((len(img_arr), 3, self.img_size[0], self.img_size[1]))
        for ind, img_data in enumerate(img_arr):
            img_inputs[ind, :, :, :] = self.transformer.preprocess('data', caffe.io.load_image_arr(img_data))
        self.net.blobs['data'].data[...] = img_inputs
        out = self.net.forward()
        predictions = []
        for i in range(0, len(img_arr)):
            output_prob = out['prob'][i]  # the output probability vector for the i-th image in the batch
            pred_label = output_prob.argmax()
            word = self.synset_words.getUnicode(pred_label)
            predictions.append({"Label": word, "Prob": output_prob[pred_label]})
        return predictions

    def predict(self, img_arr):
        self.net.blobs['data'].reshape(1, 3, self.img_size[0], self.img_size[1])  # image size is 48x48
        img_input = self.transformer.preprocess('data', caffe.io.load_image_arr(img_arr))
        self.net.blobs['data'].data[...] = img_input
        output_prob = self.net.forward()['prob'][0]
        pred_label = output_prob.argmax()
        word = self.synset_words.getUnicode(pred_label)
        return word, output_prob[pred_label]
def testCaffeCnn():
    import glob
    test = CnnClassify(path='E:/TibetOCR/Models/tibet_0323/',
                       model_file='tibet_full_train_test.prototxt',
                       params_file='tibet_full_iter_2000.caffemodel',
                       mean_file='ocr_mean.npy',
                       synset_file='synsetWords_79.pkl',
                       use_gpu=True,
                       img_size=(48, 48)  # must match the "img_size" key read in __init__
                       )
    imageBasePath = 'E:/TibetOCR/Data/samples/*.jpg'
    imageList = glob.glob(imageBasePath)
    predict_labels = []
    for imagefile in imageList:
        # imagefile_abs = os.path.join(imageBasePath, imagefile)
        im = cv2.imread(imagefile)
        label = test.predict(im)
        print("Prediction: {}, confidence: {}".format(label[0], label[1]))
        cv2.imshow('im', im)
        cv2.waitKey(0)
        predict_labels.append(label)
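To make the Transformer settings in the class above more concrete, here is a NumPy-only sketch of the equivalent arithmetic for an image that already has the network's input size: transpose, channel swap, raw scale, then mean subtraction. The function name preprocess_like_transformer is hypothetical and the code is an approximation written for illustration; it is not pycaffe's own implementation and it skips resizing.

```python
import numpy as np


def preprocess_like_transformer(img_hwc_rgb01, mean_bgr, raw_scale=255.0):
    """Mimic transpose -> channel swap -> raw scale -> mean subtraction.

    img_hwc_rgb01: H x W x 3 array, RGB order, values in [0, 1]
    mean_bgr:      three per-channel means, BGR order, on the [0, 255] scale
    """
    x = np.asarray(img_hwc_rgb01, dtype=np.float32)
    x = x.transpose(2, 0, 1)        # HWC -> CHW, as set_transpose((2, 0, 1))
    x = x[[2, 1, 0], :, :]          # RGB -> BGR, as set_channel_swap((2, 1, 0))
    x = x * raw_scale               # [0, 1] -> [0, 255], as set_raw_scale(255)
    mean = np.asarray(mean_bgr, dtype=np.float32)
    return x - mean[:, np.newaxis, np.newaxis]  # per-channel mean, as set_mean(mu)


# A stand-in for the 3 x H x W mean image stored in xxx_mean.npy.
mean_image = np.stack([np.full((4, 4), v, dtype=np.float32) for v in (1.0, 2.0, 3.0)])
mu = mean_image.mean(1).mean(1)     # same reduction as in __setReadFormat

# A pure-red 2 x 2 test image in RGB, values in [0, 1].
img = np.zeros((2, 2, 3), dtype=np.float32)
img[..., 0] = 1.0

out = preprocess_like_transformer(img, mu)
print(out.shape)
print(out[:, 0, 0])
```

Because the mean is subtracted after the channel swap and the rescaling, the mean values must be in BGR order and on the [0, 255] scale, which is how a mean computed from Caffe's training data is normally stored.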
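For C++
- The C++ classification example that ships with Caffe is CAFFE_ROOT/examples/cpp_classification/classification.cpp
- Below are the declaration and implementation of a C++ Classifier class that I wrapped up myself with reference to existing posts: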
// classifier.h
#pragma once
#include <algorithm>
#include <string>
#include <utility>
#include <vector>
#include "caffe/caffe.hpp"
#include "caffe/util/io.hpp"
#include "caffe/blob.hpp"
#include "opencv2/opencv.hpp"
#include "boost/smart_ptr/shared_ptr.hpp"
// Caffe's required library
//#pragma comment(lib, "caffe.lib")
using namespace boost;
using namespace caffe;

/* Pair (label, confidence) representing a prediction. */
typedef std::pair<std::string, float> Prediction;

//#define CPU_ONLY  // run on the CPU only

class Classifier
{
public:
    Classifier();
    Classifier(const std::string& model_file,
               const std::string& trained_file,
               const std::string& mean_file,
               const std::string& label_file);
    ~Classifier();
    //string classFaces(Rect face, Mat frame, int *w, string name);
    int LoadModelFile(std::string caffePath);
    Prediction Classify(const cv::Mat& img);
    std::vector<Prediction> ClassifyBatch(std::vector<cv::Mat>& img_batch);

private:
    void SetMean(const std::string& mean_file);
    int InitCaffeNet();
    std::vector<float> Predict(const cv::Mat& img);
    void WrapInputLayer(std::vector<cv::Mat>* input_channels);
    void Preprocess(const cv::Mat& img,
                    std::vector<cv::Mat>* input_channels);

    std::string model_file_;
    std::string trained_file_;
    std::string mean_file_;
    std::string label_file_;
    boost::shared_ptr<Net<float> > net_;
    cv::Size input_geometry_;
    int num_channels_;
    cv::Mat mean_;
    std::vector<std::string> labels_;
};
// classifier.cpp
#include "include/Classifier.h"  // match the header's actual path and case in your project
#include <iomanip>
#include <algorithm>
#include <fstream>
#include <string>
#include <time.h>
using namespace caffe;

/* Return the index of the largest value in vector v. */
int Argmax(std::vector<float>& v) {
    std::vector<float>::iterator biggest = std::max_element(v.begin(), v.end());
    return static_cast<int>(std::distance(v.begin(), biggest));
}

/* Pad src with white to a square, plus an extra margin of half the
 * shorter edge on every side. */
void imagePadding(cv::Mat src, cv::Mat &dst)
{
    int maxEdge = MAX(src.cols, src.rows);
    int paddingWidth = abs(src.cols - src.rows);
    int extraPaddingWidth = MIN(src.cols, src.rows) / 2;
    int xPaddingWidth = abs(src.cols - maxEdge) / 2 + extraPaddingWidth;
    int yPaddingWidth = abs(src.rows - maxEdge) / 2 + extraPaddingWidth;
    copyMakeBorder(src.clone(), dst, yPaddingWidth, yPaddingWidth, xPaddingWidth, xPaddingWidth, cv::BORDER_CONSTANT, cv::Scalar(255, 255, 255));
    //imshow("src", src);
    //imshow("dst", dst);
    //waitKey(0);
}
Classifier::~Classifier() { }

Classifier::Classifier() { }

int Classifier::LoadModelFile(std::string caffePath)
{
    // caffePath must end with a path separator
    model_file_ = caffePath + "tibet_full_train_test.prototxt";
    trained_file_ = caffePath + "tibet_full.caffemodel";
    mean_file_ = caffePath + "Tibet_mean.binaryproto";
    label_file_ = caffePath + "synsetWords.txt";
    // Return 1 if initialization succeeds, 0 otherwise
    return InitCaffeNet() ? 1 : 0;
}
int Classifier::InitCaffeNet()
{
#ifdef CPU_ONLY
    Caffe::set_mode(Caffe::CPU);
#else
    Caffe::set_mode(Caffe::GPU);
#endif
    /* Load the network. */
    net_.reset(new Net<float>(model_file_, TEST));
    net_->CopyTrainedLayersFrom(trained_file_);
    CHECK_EQ(net_->num_inputs(), 1) << "Network should have exactly one input.";
    CHECK_EQ(net_->num_outputs(), 1) << "Network should have exactly one output.";
    Blob<float>* input_layer = net_->input_blobs()[0];
    int num_inputs = net_->num_inputs();
    int num_outputs = net_->num_outputs();
    num_channels_ = input_layer->channels();
    CHECK(num_channels_ == 3 || num_channels_ == 1) << "Input layer should have 1 or 3 channels.";
    input_geometry_ = cv::Size(input_layer->width(), input_layer->height());
    /* Load the binaryproto mean file. */
    SetMean(mean_file_);
    /* Load labels. */
    std::ifstream labels(label_file_.c_str());
    CHECK(labels) << "Unable to open labels file " << label_file_;
    string line;
    while (std::getline(labels, line))
        labels_.push_back(string(line));
    Blob<float>* output_layer = net_->output_blobs()[0];
    CHECK_EQ(labels_.size(), output_layer->channels())
        << "Number of labels is different from the output layer dimension.";
    return 1;
}
Classifier::Classifier(const std::string& model_file,
                       const std::string& trained_file,
                       const std::string& mean_file,
                       const std::string& label_file)
{
    model_file_ = model_file;
    trained_file_ = trained_file;
    mean_file_ = mean_file;
    label_file_ = label_file;
    InitCaffeNet();
}

static bool PairCompare(const std::pair<float, int>& lhs,
                        const std::pair<float, int>& rhs)
{
    return lhs.first > rhs.first;
}

/* Return the top-1 prediction as a (label, confidence) pair. */
Prediction Classifier::Classify(const cv::Mat& img) {
    std::vector<float> output = Predict(img);
    int maxIdx = Argmax(output);
    //std::cout << labels_[maxIdx] << "prob:" << output[maxIdx] << std::endl;
    return std::make_pair(labels_[maxIdx], output[maxIdx]);
    //stringstream stream;
    //stream << maxIdx;
    //return std::make_pair(stream.str(), output[maxIdx]);
}
/* Load the mean file in binaryproto format. */
void Classifier::SetMean(const std::string& mean_file) {
    Blob<float> mean_blob;
    BlobProto blob_proto;
    float *mean_ptr;
    unsigned int num_pixel;
    bool succeed = ReadProtoFromBinaryFile(mean_file, &blob_proto);
    if (succeed)
    {
        mean_blob.FromProto(blob_proto);
        CHECK_EQ(mean_blob.channels(), num_channels_)
            << "Number of channels of mean file doesn't match input layer.";
        num_pixel = mean_blob.count(); /* NCHW=1x3x256x256=196608 */
        //mean_ptr = (float *)mean_blob.cpu_data();
        mean_ptr = mean_blob.mutable_cpu_data();
        /* The format of the mean file is planar 32-bit float BGR or grayscale. */
        std::vector<cv::Mat> channels;
        for (int i = 0; i < num_channels_; ++i)
        {
            /* Extract an individual channel. */
            cv::Mat channel(mean_blob.height(), mean_blob.width(), CV_32FC1, mean_ptr);
            //cv::Mat channel(mean_blob.height(), mean_blob.width(), CV_32FC1);
            //memcpy(channel.data, data, mean_blob.width()*mean_blob.height()*sizeof(float));
            channels.push_back(channel);
            //imshow("img", channel);
            //waitKey(0);
            mean_ptr += mean_blob.height() * mean_blob.width();
        }
        /* Merge the separate channels into a single image. */
        cv::Mat mean;
        cv::merge(channels, mean);
        /* Compute the global mean pixel value and create a mean image
         * filled with this value. */
        cv::Scalar channel_mean = cv::mean(mean);
        mean_ = cv::Mat(input_geometry_, mean.type(), channel_mean);
        //imshow("img1", mean_);
        //waitKey(0);
    }
    else
    {
        /* Without a mean image, Preprocess() cannot subtract the mean. */
        LOG(ERROR) << "Unable to read mean file " << mean_file;
    }
}
std::vector<float> Classifier::Predict(const cv::Mat& img)
{
    Blob<float>* input_layer = net_->input_blobs()[0];
    input_layer->Reshape(1, num_channels_, input_geometry_.height, input_geometry_.width);
    /* Forward dimension change to all layers. */
    net_->Reshape();
    std::vector<cv::Mat> input_channels;
    WrapInputLayer(&input_channels);
    Preprocess(img, &input_channels);
    net_->Forward(0);
    Blob<float>* output_layer = net_->output_blobs()[0];
    const float* begin = output_layer->cpu_data();
    const float* end = begin + output_layer->channels();
    return std::vector<float>(begin, end);
}
std::vector<Prediction> Classifier::ClassifyBatch(std::vector<cv::Mat>& img_batch)
{
    Blob<float>* input_layer = net_->input_blobs()[0];
    input_layer->Reshape(img_batch.size(), num_channels_, input_geometry_.height, input_geometry_.width);
    /* Forward dimension change to all layers. */
    net_->Reshape();
    std::vector<cv::Mat> input_data;
    WrapInputLayer(&input_data);
    //clock_t st_tm = clock();
    std::vector<cv::Mat>::iterator it = input_data.begin();
    for (int i = 0; i < img_batch.size(); i++)
    {
        /* Take the channel wrappers that belong to the i-th image. */
        std::vector<cv::Mat> tmp_channls(3);
        tmp_channls.assign(input_data.begin() + i*num_channels_, input_data.begin() + (i + 1)*num_channels_);
        Preprocess(img_batch[i], &tmp_channls);
    }
    //std::cout << "do imgPreprocess cost time : " << (double)(clock() - st_tm) / CLOCKS_PER_SEC << std::endl;
    net_->Forward(0);
    Blob<float>* output_layer = net_->output_blobs()[0];
    std::vector<Prediction> predictions;
    /* Copy the output layer to a std::vector */
    for (int i = 0; i < img_batch.size(); i++)
    {
        const float* begin = output_layer->cpu_data() + i*output_layer->channels();
        const float* end = begin + output_layer->channels();
        std::vector<float> output = std::vector<float>(begin, end);
        int maxIdx = Argmax(output);
        //std::cout << labels_[maxIdx] << "prob:" << output[maxIdx] << std::endl;
        predictions.push_back(std::make_pair(labels_[maxIdx], output[maxIdx]));
    }
    return predictions;
}
/* Wrap the input layer of the network in separate cv::Mat objects
 * (one per channel). This way we save one memcpy operation and we
 * don't need to rely on cudaMemcpy2D. The last preprocessing
 * operation will write the separate channels directly to the input
 * layer. */
void Classifier::WrapInputLayer(std::vector<cv::Mat>* input_channels) {
    Blob<float>* input_layer = net_->input_blobs()[0];
    int width = input_layer->width();
    int height = input_layer->height();
    float* input_data = input_layer->mutable_cpu_data();
    for (int j = 0; j < input_layer->num(); j++)
    {
        for (int i = 0; i < input_layer->channels(); ++i) {
            cv::Mat channel(height, width, CV_32FC1, input_data);
            input_channels->push_back(channel);
            input_data += width * height;
        }
    }
}
void Classifier::Preprocess(const cv::Mat& img,
                            std::vector<cv::Mat>* input_channels) {
    /* Convert the input image to the input image format of the network. */
    cv::Mat img_padded = img;
    //imagePadding(img, img_padded);
    cv::Mat sample;
    if (img_padded.channels() == 3 && num_channels_ == 1)
        cv::cvtColor(img_padded, sample, cv::COLOR_BGR2GRAY);
    else if (img_padded.channels() == 4 && num_channels_ == 1)
        cv::cvtColor(img_padded, sample, cv::COLOR_BGRA2GRAY);
    else if (img_padded.channels() == 4 && num_channels_ == 3)
        cv::cvtColor(img_padded, sample, cv::COLOR_BGRA2BGR);
    else if (img_padded.channels() == 1 && num_channels_ == 3)
        cv::cvtColor(img_padded, sample, cv::COLOR_GRAY2BGR);
    else
        sample = img_padded;

    cv::Mat sample_resized;
    if (sample.size() != input_geometry_)
        cv::resize(sample, sample_resized, input_geometry_);
    else
        sample_resized = sample;

    cv::Mat sample_float;
    if (num_channels_ == 3)
        sample_resized.convertTo(sample_float, CV_32FC3);
    else
        sample_resized.convertTo(sample_float, CV_32FC1);

    cv::Mat sample_normalized;
    cv::subtract(sample_float, mean_, sample_normalized);
    /* This operation will write the separate BGR planes directly to the
     * input layer of the network because it is wrapped by the cv::Mat
     * objects in input_channels. */
    cv::split(sample_normalized, *input_channels);
    //CHECK(reinterpret_cast<float*>(input_channels->at(0).data)
    //      == net_->input_blobs()[0]->cpu_data())
    //    << "Input channels are not wrapping the input layer of the network.";
}
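To use it, include the header classifier.h in your own project, instantiate a Classifier object at the call site, and call its Classify method.
A problem you may run into in your own project (quite likely on Windows):
F0519 14:54:12.494139 14504 layer_factory.hpp:77] Check failed: registry.count(type) == 1 (0 vs. 1) Unknown layer type: Convolution (known types: MemoryData)
One workaround is to create an additional header file (cafferegister.h) that declares or registers the unknown layer types. The code is as follows: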
#ifndef CAFFEREGISTER_H
#define CAFFEREGISTER_H
#include "caffe/common.hpp"
#include "caffe/layers/data_layer.hpp"
#include "caffe/layers/input_layer.hpp"
#include "caffe/layers/inner_product_layer.hpp"
#include "caffe/layers/conv_layer.hpp"
#include "caffe/layers/relu_layer.hpp"
#include "caffe/layers/pooling_layer.hpp"
#include "caffe/layers/softmax_layer.hpp"
#include "caffe/layers/lrn_layer.hpp"
#include "caffe/layers/dropout_layer.hpp"
namespace caffe
{
    extern INSTANTIATE_CLASS(DataLayer);
    //REGISTER_LAYER_CLASS(Data);
    extern INSTANTIATE_CLASS(InputLayer);
    //REGISTER_LAYER_CLASS(Input);
    extern INSTANTIATE_CLASS(InnerProductLayer);
    extern INSTANTIATE_CLASS(DropoutLayer);
    //REGISTER_LAYER_CLASS(Dropout);
    extern INSTANTIATE_CLASS(ConvolutionLayer);
    extern INSTANTIATE_CLASS(ReLULayer);
    extern INSTANTIATE_CLASS(PoolingLayer);
    extern INSTANTIATE_CLASS(LRNLayer);
    extern INSTANTIATE_CLASS(SoftmaxLayer);
#ifdef WINDOWS
    REGISTER_LAYER_CLASS(Convolution);
    REGISTER_LAYER_CLASS(ReLU);
    REGISTER_LAYER_CLASS(Pooling);
    REGISTER_LAYER_CLASS(Softmax);
    REGISTER_LAYER_CLASS(LRN);
#endif
}
#endif  // CAFFEREGISTER_H
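Why registering the layers helps can be illustrated with a toy registry in Python. This is only an analogy under assumptions, not Caffe's code: the LayerRegistry class and its methods are invented for the example. As far as I understand, Caffe's layer factory looks up a creator by the type string from the prototxt, and a type whose registration code never ran (for example because the linker dropped it) is simply missing from the table.

```python
class LayerRegistry(object):
    """Toy stand-in for a layer factory: maps a type string to a class."""

    def __init__(self):
        self._creators = {}

    def register(self, type_name):
        """Decorator that records a class under type_name when it is defined."""
        def decorator(cls):
            self._creators[type_name] = cls
            return cls
        return decorator

    def create(self, type_name):
        if type_name not in self._creators:
            known = ", ".join(sorted(self._creators))
            raise ValueError("Unknown layer type: %s (known types: %s)"
                             % (type_name, known))
        return self._creators[type_name]()


registry = LayerRegistry()


@registry.register("MemoryData")
class MemoryDataLayer(object):
    pass


# "Convolution" was never registered, so the lookup fails.
try:
    registry.create("Convolution")
    message = ""
except ValueError as err:
    message = str(err)
print(message)


# Once the registration code has run, the same lookup succeeds.
@registry.register("Convolution")
class ConvolutionLayer(object):
    pass


layer = registry.create("Convolution")
print(type(layer).__name__)
```

In the toy version the fix is to make sure the decorated class definition is executed; the header above plays the same role by forcing the registration macros into the program that uses the classifier.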