Deep Learning Framework Caffe (2): Model Training and Usage

Author: 踩坑第某人 | Published: 2018-06-13 22:55

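Contents
Deep Learning Framework Caffe (1): Build and Installation
Deep Learning Framework Caffe (2): Model Training and Usage
Deep Learning Framework Caffe (3): Defining Custom Networks with NetSpec
Deep Learning Framework Caffe (4): Visualization and Parameter Extraction
Deep Learning Framework Caffe (5): Converting Models to Other Frameworks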

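Update: before 6.23

Training

The CAFFE_ROOT/tools directory contains the source code (.cpp files, whose names are self-explanatory) for common operations such as training and testing. These files are compiled as part of the Caffe build, which places the corresponding executables in build/tools, as shown in the figure below:

[Figure: image.png, listing of the build/tools directory]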

Data preparation before training
See here
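As a rough illustration of the data-preparation step, the sketch below writes the kind of list file that convert_imageset reads: one line per image, holding a path relative to the image root folder, a space, and an integer class index. The one-subfolder-per-class layout and the helper name write_list_file are assumptions made for this example; they do not come from the linked post.

```python
import os


def write_list_file(image_root, class_names, list_path, exts=(".jpg", ".png")):
    """Write '<relative/path> <label index>' lines for every image found.

    Assumes (hypothetically) that image_root contains one subfolder per class,
    named exactly as in class_names; the label is the class's list position.
    """
    lines = []
    for label, name in enumerate(class_names):
        class_dir = os.path.join(image_root, name)
        if not os.path.isdir(class_dir):
            continue  # a class without an image folder is skipped
        for fname in sorted(os.listdir(class_dir)):
            if fname.lower().endswith(exts):
                # forward slashes keep the list usable on Linux and Windows
                lines.append("{}/{} {}".format(name, fname, label))
    with open(list_path, "w") as f:
        f.write("\n".join(lines) + ("\n" if lines else ""))
    return lines
```

Calling it once per split (for example with separate image folders for training and validation) gives the train.txt and val.txt inputs for the LMDB conversion.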
Training procedure
See here
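Files involved
xxx_train_test_full.prototxt (network definition)
xxx_solver.prototxt (solver settings used for training)
xxx_iter_xxx.caffemodel (trained weights, snapshotted at the given iteration)
xxx_mean.binaryproto (mean of the training set)
xxx_mean.npy (training-set mean in NumPy format, for use from Python)
xxx_classes.txt (note: a table mapping class names to index numbers, generally needed when classifying from Python or C++), for example: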

aeroplane
bicycle
bird
boat
bottle
bus
car
cat
chair
cow
diningtable
dog
horse
motorbike
person
pottedplant
sheep
sofa
train
tvmonitor
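A minimal sketch of how such a class file can be used: it assumes, as the C++ code later in this post does with labels_[maxIdx], that the class index is simply the zero-based line number. The function names load_class_names and label_for are made up for this example.

```python
import io


def load_class_names(path):
    """Return class names in file order; the zero-based line number is the index."""
    with io.open(path, encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]


def label_for(class_names, index):
    """Map a predicted class index back to its class name."""
    if not 0 <= index < len(class_names):
        raise IndexError("class index {} out of range".format(index))
    return class_names[index]
```

With the list above, index 0 maps to aeroplane and index 19 to tvmonitor.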

Caffe directory layout
Source code home page
./src
./include
./docs
./python    Python interface library
./matlab
./models
./examples
./scripts
./tools

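Notes:
a. For the purpose of the three files train.txt, test.txt, and val.txt required by the convert_imageset command, see here.
b. The scripts in the linked posts simply wrap the steps of the training procedure in sh scripts. The usual flow is:
convert to lmdb (convert_imageset) -> train (caffe train) -> test (caffe test), and the sh scripts simplify setting the arguments of each command. These scripts are not ideal, though: when you retrain, you have to delete the two directories created by the lmdb conversion by hand before the scripts will run again. Merging the sh scripts into one that deletes and creates the relevant directories automatically would be more convenient.
c. There are many ways to script Caffe training; some open-source algorithms, such as Faster R-CNN and DeepID, also ship Python training scripts. This post covers only the most basic, native way of training. To train Faster R-CNN you can use the training interface provided by its authors directly; in essence the approaches are the same (run the relevant programs under tools in order).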
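The single-script idea from note b could look roughly like the sketch below: it deletes stale LMDB directories and then runs the conversion, training, and test steps in order. The directory names train_lmdb and val_lmdb, the list file names, and all function names are assumptions for this example; the command-line flags (--shuffle, --solver, --model, --weights, --iterations) are the ones the stock convert_imageset and caffe tools accept.

```python
import os
import shutil
import subprocess


def reset_dirs(paths):
    """Delete each directory if it exists, so convert_imageset can recreate it."""
    for p in paths:
        if os.path.isdir(p):
            shutil.rmtree(p)


def build_commands(tools_dir, image_root, work_dir, solver, model, weights, test_iters=100):
    """Return (lmdb_dirs, commands); commands are argument lists in run order."""
    # convert_imageset concatenates the root folder and each listed path,
    # so the root folder must end with a slash
    if not image_root.endswith("/"):
        image_root += "/"
    train_db = os.path.join(work_dir, "train_lmdb")  # hypothetical directory names
    val_db = os.path.join(work_dir, "val_lmdb")
    convert = os.path.join(tools_dir, "convert_imageset")
    caffe = os.path.join(tools_dir, "caffe")
    commands = [
        [convert, "--shuffle", image_root, os.path.join(work_dir, "train.txt"), train_db],
        [convert, "--shuffle", image_root, os.path.join(work_dir, "val.txt"), val_db],
        [caffe, "train", "--solver=" + solver],
        [caffe, "test", "--model=" + model, "--weights=" + weights,
         "--iterations=" + str(test_iters)],
    ]
    return [train_db, val_db], commands


def run_pipeline(lmdb_dirs, commands):
    """Remove stale LMDB directories, then run every step, stopping on failure."""
    reset_dirs(lmdb_dirs)
    for cmd in commands:
        subprocess.check_call(cmd)
```

Keeping the command construction separate from the execution makes it easy to print and inspect the commands before running them.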

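Using Caffe from Python
When Python imports a third-party library, it searches several directories: the system's default third-party library directory (/usr/lib/python2.7/dist-packages), the directories listed in the $PYTHONPATH environment variable, and the directory of the script being run (or the current directory in an interactive session). When a script runs, these directories are collected into sys.path. So first make sure that the Caffe Python interface (the CAFFE_ROOT/python directory) is on Python's search path. A simple way to do that is to add the following to the Python script (the .py file that uses caffe) before the import of caffe: sys.path.insert(0, "CAFFE_ROOT/python")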
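A minimal sketch of that path setup, written as a reusable helper. Reading the location from a CAFFE_ROOT environment variable is a convention assumed for this example, and add_caffe_to_path is a made-up name.

```python
import os
import sys


def add_caffe_to_path(caffe_root=None):
    """Put <caffe_root>/python at the front of sys.path and return that path.

    caffe_root falls back to the CAFFE_ROOT environment variable, which is an
    assumption of this example rather than something Caffe requires.
    """
    caffe_root = caffe_root or os.environ.get("CAFFE_ROOT")
    if not caffe_root:
        raise ValueError("pass caffe_root or set the CAFFE_ROOT environment variable")
    python_dir = os.path.join(os.path.abspath(caffe_root), "python")
    if python_dir in sys.path:
        sys.path.remove(python_dir)  # avoid duplicate entries on repeated calls
    sys.path.insert(0, python_dir)
    return python_dir
```

Call it before the import, for example add_caffe_to_path("/path/to/caffe") followed by import caffe.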
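Using Caffe from C++
After building Caffe from source, create a new project and configure its header search paths and library directories to point at the Caffe headers and libraries.
Header search paths:

CAFFE_ROOT/include
CAFFE_ROOT/src
CUDA_ROOT/include
/usr/include (headers of other dependencies such as boost and protobuf)

Library search paths:

CAFFE_ROOT/build/lib
CUDA_ROOT/lib64
/usr/lib (libraries of other dependencies such as boost and protobuf)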
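To make the mapping from these directories to compiler options concrete, here is a hedged sketch that assembles a g++ command line from them. The list of libraries to link (caffe, glog, boost_system, protobuf, and three OpenCV modules) is an assumption that depends on how Caffe was built, and compile_command is a made-up helper name. /usr/include and /usr/lib are left out because g++ searches them by default.

```python
def compile_command(caffe_root, cuda_root, sources, output, cpu_only=False):
    """Build a g++ argument list from the header and library paths above."""
    include_dirs = [caffe_root + "/include", caffe_root + "/src"]
    lib_dirs = [caffe_root + "/build/lib"]
    # assumed library set; adjust it to match your own Caffe build
    libs = ["caffe", "glog", "boost_system", "protobuf",
            "opencv_core", "opencv_imgproc", "opencv_highgui"]
    defines = []
    if cpu_only:
        defines.append("-DCPU_ONLY")  # matches a Caffe build without CUDA
    else:
        include_dirs.append(cuda_root + "/include")
        lib_dirs.append(cuda_root + "/lib64")
    cmd = ["g++", "-std=c++11"] + defines + list(sources)
    cmd += ["-I" + d for d in include_dirs]   # header search paths
    cmd += ["-L" + d for d in lib_dirs]       # library search paths
    cmd += ["-l" + name for name in libs]     # libraries to link
    cmd += ["-o", output]
    return cmd
```

Printing " ".join(compile_command(...)) gives a command that can be adapted by hand or turned into IDE project settings.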

Usage

for python

See reference post 1 and reference post 2.
I wrapped the interface in a class of my own; the code is as follows:

import os
from functools import partial

import caffe
import cv2
import numpy as np

from synset_words import WordCode


class CnnClassify(object):
    def __init__(self, path='/trainedCaffeData/',
                 **kwargs):
        """
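        :param path: directory that contains the model files
        :param kwargs: model_file, params_file, mean_file, synset_file (file names
                       relative to path), use_gpu (bool), img_size ((height, width))
        :return: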
        """
        print(os.path.abspath(path))
        if kwargs.get("use_gpu", False):
            caffe.set_mode_gpu()  # gpu or cpu
            caffe.set_device(0)
        else:
            caffe.set_mode_cpu()

        self.img_size = kwargs.get("img_size", (48, 48))
        join_func = partial(os.path.join, path)
        for k in ["model_file", "params_file", "mean_file", "synset_file"]:
            kwargs[k] = join_func(kwargs[k])

        self.net = caffe.Net(kwargs["model_file"],  # defines the structure of the model
                             kwargs["params_file"],  # contains the trained weights
                             caffe.TEST)  # use test mode (e.g., don't perform dropout)

        self.__setReadFormat(kwargs["mean_file"])
        self.synset_words = WordCode(filename=kwargs["synset_file"])

    def __setReadFormat(self, model_mean):
        '''
            :param model_mean: path to the training-set mean file (.npy)
            '''
        print(self.net.blobs['data'].data.shape)
        self.transformer = caffe.io.Transformer({'data': self.net.blobs['data'].data.shape})
        # load the mean file and compute the per-channel (BGR) mean
        mu = np.load(model_mean).mean(1).mean(1)

        # move channels first: HxWxC -> CxHxW
        self.transformer.set_transpose('data', (2, 0, 1))
        self.transformer.set_mean('data', mu)
        self.transformer.set_raw_scale('data', 255)  # rescale from [0, 1] to [0, 255]

        # swap channels from RGB to BGR
        self.transformer.set_channel_swap('data', (2, 1, 0))

    def predict_batch(self, img_arr):  # , tableList

        self.net.blobs['data'].reshape(len(img_arr), 3, self.img_size[0], self.img_size[1])  # image size is 48x48
        img_inputs = np.zeros((len(img_arr), 3, self.img_size[0], self.img_size[1]))
        for ind, img_data in enumerate(img_arr):
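            # NOTE: load_image_arr (also used in predict below) is not part of stock pycaffe, which
            # provides caffe.io.load_image(filename); it is presumably a local helper that turns an
            # in-memory image array into the float RGB [0, 1] format the transformer expects.
            img_inputs[ind, :, :, :] = self.transformer.preprocess('data', caffe.io.load_image_arr(img_data))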

        self.net.blobs['data'].data[...] = img_inputs  # self.transformer.preprocess('data', img_input)  # read image
        out = self.net.forward()

        predictions = []
        for i in range(0, len(img_arr)):
            output_prob = out['prob'][i]  # the output probability vector for the i-th image in the batch
            pred_label = output_prob.argmax()
            word = self.synset_words.getUnicode(pred_label)
            predictions.append({"Label": word, "Prob": output_prob[pred_label]})
            # print("predicted", pred_label)
        return predictions  # word, output_prob[pred_label]

    def predict(self, img_arr):
        self.net.blobs['data'].reshape(1, 3, self.img_size[0], self.img_size[1])  # image size is 48x48
        img_input = self.transformer.preprocess('data', caffe.io.load_image_arr(img_arr))

        self.net.blobs['data'].data[...] = img_input  # self.transformer.preprocess('data', img_input)  # read image
        output_prob = self.net.forward()['prob'][0]
        pred_label = output_prob.argmax()
        word = self.synset_words.getUnicode(pred_label)
        return word, output_prob[pred_label]


def testCaffeCnn():
    import glob
    test = CnnClassify(path='E:/TibetOCR/Models/tibet_0323/',
                       model_file='tibet_full_train_test.prototxt',
                       params_file='tibet_full_iter_2000.caffemodel',
                       mean_file='ocr_mean.npy',
                       synset_file='synsetWords_79.pkl',
                       use_gpu=True,
                       img_size=(48, 48)  # must match the "img_size" key read in __init__
                       )

    imageBasePath = 'E:/TibetOCR/Data/samples/*.jpg'
    imageList = glob.glob(imageBasePath)

    predict_labels = []
    for imagefile in imageList:
        # imagefile_abs = os.path.join(imageBasePath, imagefile)
        im = cv2.imread(imagefile)

        label = test.predict(im)
        print("Prediction: {}, confidence: {}".format(label[0], label[1]))
        cv2.imshow('im', im)
        cv2.waitKey(0)
        predict_labels.append(label)
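The predict methods above keep only the single most probable class. When the runner-up classes matter as well, the probability vector can be ranked instead; the following is a small sketch in plain Python, where top_k is a made-up helper and not part of the class above.

```python
def top_k(probs, labels, k=3):
    """Return the k most probable (label, probability) pairs, best first."""
    if len(probs) != len(labels):
        raise ValueError("probs and labels must have the same length")
    # indices of probs sorted by descending probability
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [(labels[i], float(probs[i])) for i in order[:k]]
```

It only relies on len() and indexing, so it should also accept a one-dimensional NumPy array such as the 'prob' output for one image.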

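for C++

Caffe's C++ classification example is CAFFE_ROOT/examples/cpp_classification/classification.cpp.
Following existing posts, I wrapped it in a C++ Classifier class; the declaration and implementation are as follows: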

//classifier.h
#pragma once

#include <algorithm>
#include <string>
#include <utility>
#include <vector>

#include "caffe/caffe.hpp"
#include "caffe/util/io.hpp"
#include "caffe/blob.hpp"
#include "opencv2/opencv.hpp"
#include "boost/smart_ptr/shared_ptr.hpp"


// Caffe's required library
//#pragma comment(lib, "caffe.lib")


using namespace boost;
using namespace caffe;


/* Pair (label, confidence) representing a prediction. */
typedef std::pair<std::string, float> Prediction;
//#define CPU_ONLY      // run on the CPU only

class Classifier   
{
public:
    Classifier();
    Classifier(const std::string& model_file,
        const std::string& trained_file,
        const std::string& mean_file,
        const std::string& label_file);
    
    ~Classifier();

    //string classFaces(Rect face, Mat frame, int *w, string name);
    int LoadModelFile(std::string caffePath);
    
    Prediction Classify(const cv::Mat& img);
    std::vector<Prediction> ClassifyBatch(std::vector< cv::Mat>& img_batch);
private:
    void SetMean(const std::string& mean_file);

    int InitCaffeNet();

    std::vector<float> Predict(const cv::Mat& img);
    

    void WrapInputLayer(std::vector<cv::Mat>* input_channels);
    
    void Preprocess(const cv::Mat& img,
        std::vector<cv::Mat>* input_channels);

    std::string model_file_;
    std::string trained_file_;
    std::string mean_file_;
    std::string label_file_;
    boost::shared_ptr<Net<float> > net_;
    cv::Size input_geometry_;
    int num_channels_;
    cv::Mat mean_;
    std::vector<string> labels_;    
};

//classifier.cpp
#include "include/Classifier.h"
#include <iomanip>
#include <algorithm>
#include <time.h>
using namespace caffe;
/* Return the index of the largest value in vector v. */
int Argmax(std::vector<float>& v) {
    
    std::vector<float>::iterator biggest = std::max_element(v.begin(), v.end());
    return std::distance(v.begin(), biggest);
}
void imagePadding(cv::Mat src, cv::Mat &dst)
{
    int maxEdge = MAX(src.cols, src.rows);
    int paddingWidth = abs(src.cols - src.rows);
    int extraPaddingWidth = MIN(src.cols, src.rows) / 2;
    int xPaddingWidth = abs(src.cols - maxEdge) / 2 + extraPaddingWidth;
    int yPaddingWidth = abs(src.rows - maxEdge) / 2 + extraPaddingWidth;
    copyMakeBorder(src.clone(), dst, yPaddingWidth, yPaddingWidth, xPaddingWidth, xPaddingWidth, cv::BORDER_CONSTANT, cv::Scalar(255, 255, 255));

    //imshow("src", src);
    //imshow("dst", dst);
    //waitKey(0);
}
Classifier::~Classifier(){  }
Classifier::Classifier(){ }
int Classifier::LoadModelFile(std::string caffePath)
{
    model_file_ = caffePath + "tibet_full_train_test.prototxt";
    trained_file_ = caffePath + "tibet_full.caffemodel";
    mean_file_ = caffePath + "Tibet_mean.binaryproto";
    label_file_ = caffePath + "synsetWords.txt";

    /* Returns 1 on success. The CHECK macros inside InitCaffeNet() abort the
     * process on failure instead of returning 0. */
    return InitCaffeNet();
}

int Classifier::InitCaffeNet()
{

#ifdef CPU_ONLY
    Caffe::set_mode(Caffe::CPU);
#else
    Caffe::set_mode(Caffe::GPU);
#endif

    /* Load the network. */
    net_.reset(new Net<float>(model_file_, TEST));
    net_->CopyTrainedLayersFrom(trained_file_);

    CHECK_EQ(net_->num_inputs(), 1) << "Network should have exactly one input.";
    CHECK_EQ(net_->num_outputs(), 1) << "Network should have exactly one output.";

    Blob<float>* input_layer = net_->input_blobs()[0];
    int num_inputs = net_->num_inputs();
    int num_outputs = net_->num_outputs();


    num_channels_ = input_layer->channels();


    CHECK(num_channels_ == 3 || num_channels_ == 1) << "Input layer should have 1 or 3 channels.";
    input_geometry_ = cv::Size(input_layer->width(), input_layer->height());

    /* Load the binaryproto mean file. */
    SetMean(mean_file_);

    /* Load labels. */
    std::ifstream labels(label_file_.c_str());
    CHECK(labels) << "Unable to open labels file " << label_file_;
    string line;
    while (std::getline(labels, line))
        labels_.push_back(string(line));

    Blob<float>* output_layer = net_->output_blobs()[0];

    CHECK_EQ(labels_.size(), output_layer->channels())
        << "Number of labels is different from the output layer dimension.";
    return 1;
}

Classifier::Classifier(const std::string& model_file,
                        const std::string& trained_file,
                        const std::string& mean_file,
                        const std::string& label_file)
{

    model_file_ = model_file;
    trained_file_ = trained_file;
    mean_file_ = mean_file;
    label_file_ = label_file;
    InitCaffeNet();
}



static bool PairCompare(const std::pair<float, int>& lhs,
    const std::pair<float, int>& rhs) 
{
    return lhs.first > rhs.first;
}

/* Return the top-1 prediction as a (label, confidence) pair. */
Prediction Classifier::Classify(const cv::Mat& img) {

    std::vector<float> output = Predict(img);
    int maxIdx = Argmax(output);
    //std::cout << labels_[maxIdx] << "prob:" << output[maxIdx] << std::endl;
    return std::make_pair(labels_[maxIdx],output[maxIdx]);
    //stringstream stream;
    //stream << maxIdx;
    //return std::make_pair(stream.str(), output[maxIdx]);
}


/* Load the mean file in binaryproto format. */
void Classifier::SetMean(const std::string& mean_file) {

    Blob<float> mean_blob;
    BlobProto blob_proto;
    float *mean_ptr;
    unsigned int num_pixel;

    bool succeed = ReadProtoFromBinaryFile(mean_file, &blob_proto);
    if (succeed)
    {
        mean_blob.FromProto(blob_proto);
        CHECK_EQ(mean_blob.channels(), num_channels_)
            << "Number of channels of mean file doesn't match input layer.";


        num_pixel = mean_blob.count(); /* NCHW=1x3x256x256=196608 */
        //mean_ptr = (float *)mean_blob.cpu_data();
        mean_ptr = mean_blob.mutable_cpu_data();
        
        /* The format of the mean file is planar 32-bit float BGR or grayscale. */
        std::vector<cv::Mat> channels;
        for (int i = 0; i < num_channels_; ++i) 
        {
            /* Extract an individual channel. */
            cv::Mat channel(mean_blob.height(), mean_blob.width(), CV_32FC1, mean_ptr);
            //cv::Mat channel(mean_blob.height(), mean_blob.width(), CV_32FC1);
            //memcpy(channel.data, data, mean_blob.width()*mean_blob.height()*sizeof(float));
            channels.push_back(channel);

            //imshow("img", channel);
            //waitKey(0);

            mean_ptr += mean_blob.height() * mean_blob.width();
        }
        
        /* Merge the separate channels into a single image. */
        //cv::Mat mean(mean_blob.height(), mean_blob.width(), CV_32FC1);//;//
        cv::Mat mean;
        cv::merge(channels, mean);
        
        /* Compute the global mean pixel value and create a mean image
        * filled with this value. */
        cv::Scalar channel_mean = cv::mean(mean);//mean);//channels[0]
        mean_ = cv::Mat(input_geometry_, mean.type(), channel_mean);
        
        //imshow("img1", mean_);
        //waitKey(0);
    }


}

std::vector<float> Classifier::Predict(const cv::Mat& img) 
{
    Blob<float>* input_layer = net_->input_blobs()[0];
    input_layer->Reshape(1, num_channels_,input_geometry_.height, input_geometry_.width);
    /* Forward dimension change to all layers. */
    net_->Reshape();

    std::vector<cv::Mat> input_channels;
    WrapInputLayer(&input_channels);

    Preprocess(img, &input_channels);
    net_->Forward(0);

    Blob<float>* output_layer = net_->output_blobs()[0];
    const float* begin = output_layer->cpu_data();
    const float* end = begin + output_layer->channels();
    return std::vector<float>(begin, end);
}


std::vector<Prediction> Classifier::ClassifyBatch(std::vector< cv::Mat>& img_batch)
{
    Blob<float>* input_layer = net_->input_blobs()[0];
    input_layer->Reshape(img_batch.size(), num_channels_, input_geometry_.height, input_geometry_.width);
    /* Forward dimension change to all layers. */
    net_->Reshape();

    std::vector<cv::Mat> input_data;
    
    WrapInputLayer(&input_data);
    //clock_t st_tm = clock();
    std::vector<cv::Mat>::iterator it = input_data.begin();
    for (int i = 0; i < img_batch.size(); i++)
    {
        std::vector<cv::Mat>tmp_channls(3);
        tmp_channls.assign(input_data.begin() + i*num_channels_, input_data.begin() + (i + 1)*num_channels_);
        Preprocess(img_batch[i], &tmp_channls);
    }
    //std::cout << "do imgPreprocess cost time : " << (double)(clock() - st_tm) / CLOCKS_PER_SEC << std::endl;
    net_->Forward(0);
    Blob<float>* output_layer = net_->output_blobs()[0];

    std::vector<Prediction>predictions;

    /* Copy the output layer to a std::vector */
    for (int i = 0; i < img_batch.size(); i++)
    {
        const float* begin = output_layer->cpu_data()+i*output_layer->channels();
        const float* end = begin + output_layer->channels();
        std::vector<float> output = std::vector<float>(begin, end);
        int maxIdx = Argmax(output);
        //std::cout << labels_[maxIdx] << "prob:" << output[maxIdx] << std::endl;
        predictions.push_back(std::make_pair(labels_[maxIdx], output[maxIdx]));
    }

    return predictions;
}
/* Wrap the input layer of the network in separate cv::Mat objects
* (one per channel). This way we save one memcpy operation and we
* don't need to rely on cudaMemcpy2D. The last preprocessing
* operation will write the separate channels directly to the input
* layer. */
void Classifier::WrapInputLayer(std::vector<cv::Mat>* input_channels) {
    Blob<float>* input_layer = net_->input_blobs()[0];

    int width = input_layer->width();
    int height = input_layer->height();
    float* input_data = input_layer->mutable_cpu_data();
    for (int j = 0; j < input_layer->num(); j++)
    {
        for (int i = 0; i < input_layer->channels(); ++i) {
            cv::Mat channel(height, width, CV_32FC1, input_data);
            input_channels->push_back(channel);
            input_data += width * height;
        }
    }

}

void Classifier::Preprocess(const cv::Mat& img,
    std::vector<cv::Mat>* input_channels) {
    /* Convert the input image to the input image format of the network. */
    cv::Mat img_padded=img;
    //imagePadding(img, img_padded);

    cv::Mat sample;
    if (img_padded.channels() == 3 && num_channels_ == 1)
        cv::cvtColor(img_padded, sample, cv::COLOR_BGR2GRAY);
    else if (img_padded.channels() == 4 && num_channels_ == 1)
        cv::cvtColor(img_padded, sample, cv::COLOR_BGRA2GRAY);
    else if (img_padded.channels() == 4 && num_channels_ == 3)
        cv::cvtColor(img_padded, sample, cv::COLOR_BGRA2BGR);
    else if (img_padded.channels() == 1 && num_channels_ == 3)
        cv::cvtColor(img_padded, sample, cv::COLOR_GRAY2BGR);
    else
        sample = img_padded;

    cv::Mat sample_resized;
    if (sample.size() != input_geometry_)
        cv::resize(sample, sample_resized, input_geometry_);
    else
        sample_resized = sample;

    cv::Mat sample_float;
    if (num_channels_ == 3)
        sample_resized.convertTo(sample_float, CV_32FC3);
    else
        sample_resized.convertTo(sample_float, CV_32FC1);

    cv::Mat sample_normalized;
    cv::subtract(sample_float, mean_, sample_normalized);

    /* This operation will write the separate BGR planes directly to the
    * input layer of the network because it is wrapped by the cv::Mat
    * objects in input_channels. */
    cv::split(sample_normalized, *input_channels);

    //CHECK(reinterpret_cast<float*>(input_channels->at(0).data)
    //  == net_->input_blobs()[0]->cpu_data())
    //  << "Input channels are not wrapping the input layer of the network.";
}

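To use it, include the header classifier.h in your own project, instantiate a Classifier object where needed, and call its Classify method.
A problem you may run into in your own project (quite likely on Windows):

F0519 14:54:12.494139 14504 layer_factory.hpp:77] Check failed: registry.count(type) == 1 (0 vs. 1) Unknown layer type: Convolution (known types: MemoryData)

One workaround is to create another header file (cafferegister.h) that declares or registers the unknown layer types, as follows: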

#ifndef CAFFEREGISTER_H
#define CAFFEREGISTER_H
#include "caffe/common.hpp"
#include "caffe/layers/data_layer.hpp"
#include "caffe/layers/input_layer.hpp"
#include "caffe/layers/inner_product_layer.hpp"
#include "caffe/layers/conv_layer.hpp"
#include "caffe/layers/relu_layer.hpp"
#include "caffe/layers/pooling_layer.hpp"
#include "caffe/layers/softmax_layer.hpp"
#include "caffe/layers/lrn_layer.hpp"
#include "caffe/layers/dropout_layer.hpp"

namespace caffe
{
    extern INSTANTIATE_CLASS(DataLayer);
    //REGISTER_LAYER_CLASS(Data);
    extern INSTANTIATE_CLASS(InputLayer);
    //REGISTER_LAYER_CLASS(Input);
    extern INSTANTIATE_CLASS(InnerProductLayer);
    extern INSTANTIATE_CLASS(DropoutLayer);
    //REGISTER_LAYER_CLASS(Dropout);
    extern INSTANTIATE_CLASS(ConvolutionLayer);

    extern INSTANTIATE_CLASS(ReLULayer);

    extern INSTANTIATE_CLASS(PoolingLayer);

    extern INSTANTIATE_CLASS(LRNLayer);

    extern INSTANTIATE_CLASS(SoftmaxLayer);
#ifdef WINDOWS
    REGISTER_LAYER_CLASS(Convolution);
    REGISTER_LAYER_CLASS(ReLU);
    REGISTER_LAYER_CLASS(Pooling);
    REGISTER_LAYER_CLASS(Softmax);
    REGISTER_LAYER_CLASS(LRN);
#endif
}

#endif
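The error above comes from Caffe's layer factory, which looks up each layer type in a registry that is filled in when a layer's registration code runs. A commonly cited cause on Windows is that, with a statically linked Caffe library, the linker drops object files that nothing references, so their registration code never runs. The Python sketch below is only an analogy for that mechanism, not Caffe code; all names in it are made up for illustration.

```python
LAYER_REGISTRY = {}


def register_layer(type_name):
    """Class decorator that records a layer class under type_name."""
    def decorator(cls):
        LAYER_REGISTRY[type_name] = cls
        return cls
    return decorator


def create_layer(type_name):
    """Instantiate a registered layer, or fail with a message like the log above."""
    if type_name not in LAYER_REGISTRY:
        known = ", ".join(sorted(LAYER_REGISTRY))
        raise KeyError("Unknown layer type: {} (known types: {})".format(type_name, known))
    return LAYER_REGISTRY[type_name]()


# Only this layer's registration code has run so far,
# so it is the only type the factory knows about.
@register_layer("MemoryData")
class MemoryDataLayer(object):
    pass
```

In this analogy, create_layer("Convolution") fails until a class decorated with register_layer("Convolution") has actually been executed, which is the role the REGISTER_LAYER_CLASS lines in the header play.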
