Deep Learning Framework Caffe (2): Model Training and Usage

Author: 踩坑第某人 | Published: 2018-06-13 22:55

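    Contents
    Deep Learning Framework Caffe (1): Build and Installation
    Deep Learning Framework Caffe (2): Model Training and Usage
    Deep Learning Framework Caffe (3): Defining Custom Networks with NetSpec
    Deep Learning Framework Caffe (4): Visualization and Parameter Extraction
    Deep Learning Framework Caffe (5): Converting Models to Other Frameworks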

    Update: before 6.23

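    Training

    The CAFFE_ROOT/tools directory holds the source code (.cpp files with self-explanatory names) for the common operations needed for training and testing. The build compiles these files and places the resulting executables in build/tools, as shown in the figure below: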

    image.png
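Since the figure is not available, the relationship between the tool sources and the built executables can be checked with a few lines of Python. This is only a sketch: it assumes the default layout described above (sources in CAFFE_ROOT/tools, binaries in CAFFE_ROOT/build/tools, one binary named after each .cpp file), and the `CAFFE_ROOT` environment variable and the function name `list_tools` are introduced here for illustration.

```python
import os


def list_tools(caffe_root):
    """Map each tool source in CAFFE_ROOT/tools to whether its binary exists."""
    src_dir = os.path.join(caffe_root, "tools")
    bin_dir = os.path.join(caffe_root, "build", "tools")
    status = {}
    for name in sorted(os.listdir(src_dir)):
        stem, ext = os.path.splitext(name)
        if ext != ".cpp":
            continue  # skip CMakeLists.txt, the extra/ directory, etc.
        status[stem] = os.path.exists(os.path.join(bin_dir, stem))
    return status


if __name__ == "__main__":
    # CAFFE_ROOT as an environment variable is an assumption of this sketch.
    root = os.environ.get("CAFFE_ROOT", ".")
    if os.path.isdir(os.path.join(root, "tools")):
        for tool, built in list_tools(root).items():
            print("{:<28} {}".format(tool, "built" if built else "missing"))
```

Tools such as caffe, convert_imageset and compute_image_mean should show up as built once compilation has finished.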
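    1. Data preparation before training
      See here

    2. Training procedure
      See here

    3. Notes on the files involved
      xxx_train_test_full.prototxt (network definition)
      xxx_solver.prototxt (solver settings)
      xxx_iter_xxx.caffemodel (trained weights saved at a given iteration)
      xxx_mean.binaryproto (training-set mean, read by the C++ code)
      xxx_mean.npy (the same mean as a NumPy array, read by the Python code)
      xxx_classes.txt (note: a table mapping class names to index numbers, usually needed when classifying from Python or C++), for example: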

    aeroplane
    bicycle
    bird
    boat
    bottle
    bus
    car
    cat
    chair
    cow
    diningtable
    dog
    horse
    motorbike
    person
    pottedplant
    sheep
    sofa
    train
    tvmonitor
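A class file like the one above can be read into a list so that the index predicted by the network maps straight to a name. The following is a minimal sketch, assuming one class name per line and 0-based indices in file order; the function names are made up for this example.

```python
def parse_class_names(lines):
    """Turn an iterable of text lines into a list of class names.

    Blank lines are skipped; the position in the returned list is the class index.
    """
    return [line.strip() for line in lines if line.strip()]


def load_class_names(path):
    """Read a xxx_classes.txt style file (one class name per line)."""
    with open(path, encoding="utf-8") as f:
        return parse_class_names(f)


if __name__ == "__main__":
    sample = ["aeroplane\n", "bicycle\n", "bird\n", "boat\n"]
    names = parse_class_names(sample)
    predicted_index = 2  # e.g. the argmax of the network's 'prob' output
    print(predicted_index, "->", names[predicted_index])
```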
    
    1. Caffe directory overview
      Source home page
      ./src
      ./include
      ./docs
      ./python    Python interface library
      ./matlab
      ./models
      ./examples
      ./scripts
      ./tools

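    Notes:
    a. For the role of the three files train.txt, test.txt and val.txt required when running convert_imageset, see here
    b. The scripts in the linked posts simply wrap the individual steps of the training procedure in sh scripts. The usual flow is:
    convert to lmdb (convert_imageset) -> train (caffe train) -> test (caffe test), and the sh scripts make it easier to set the arguments of each command. They are not ideal, though: when you retrain, you must manually delete the two directories created by the lmdb conversion before the scripts run cleanly. Merging them into a single script that deletes and creates the necessary directories automatically would be more convenient.
    c. There are many ways to script Caffe training. Some open-source algorithms, such as Faster R-CNN and DeepID, also ship Python training scripts. This article only covers the most basic, native way of training. For Faster R-CNN you can use the training interface provided by its authors directly; the essence is the same (run the relevant programs under tools in order).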
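The single merged script suggested in note b could look roughly like the sketch below. It is not the script from the linked posts: the directory names `train_lmdb` and `val_lmdb`, the list file names and the function names are assumptions made for illustration. The command-line forms used (`convert_imageset [FLAGS] ROOTFOLDER/ LISTFILE DB_NAME`, `caffe train --solver=...`, `caffe test --model=... --weights=... --iterations=...`) are the ones the Caffe tools accept.

```python
import os
import shutil
import subprocess


def clean_lmdb_dirs(work_dir, splits=("train", "val")):
    """Delete LMDB directories left over from a previous run; return what was removed."""
    removed = []
    for split in splits:
        path = os.path.join(work_dir, split + "_lmdb")
        if os.path.isdir(path):
            shutil.rmtree(path)
            removed.append(path)
    return removed


def build_commands(tools_dir, image_root, work_dir, solver, model, weights,
                   test_iterations=100, splits=("train", "val")):
    """Return the convert -> train -> test steps as argument lists."""
    convert = os.path.join(tools_dir, "convert_imageset")
    caffe = os.path.join(tools_dir, "caffe")
    commands = []
    for split in splits:
        commands.append([
            convert, "--shuffle",
            image_root.rstrip("/") + "/",             # root folder, with a trailing slash
            os.path.join(work_dir, split + ".txt"),   # list file: "<relative path> <label>" per line
            os.path.join(work_dir, split + "_lmdb"),  # output database directory
        ])
    commands.append([caffe, "train", "--solver=" + solver])
    commands.append([caffe, "test", "--model=" + model, "--weights=" + weights,
                     "--iterations=" + str(test_iterations)])
    return commands


def run_pipeline(commands, dry_run=True):
    """Print each command and, unless dry_run is set, execute it and stop on failure."""
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            subprocess.check_call(cmd)


if __name__ == "__main__":
    # All paths below are placeholders.
    clean_lmdb_dirs("work")
    run_pipeline(build_commands("build/tools", "data/images", "work",
                                "xxx_solver.prototxt",
                                "xxx_train_test_full.prototxt",
                                "xxx_iter_xxx.caffemodel"))
```

With `dry_run=True` the script only prints the commands, which is a cheap way to check the arguments before starting a long training run.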

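    1. Using Caffe from Python
      When Python imports a third-party library, it searches three kinds of locations: the system's default third-party library directory (/usr/lib/python2.7/dist-packages), the directories listed in the $PYTHONPATH environment variable, and the directory of the script being run. When a script is executed, Python collects these directories and assigns them to sys.path. So first make sure that Caffe's Python interface (in CAFFE_ROOT/python) is on the search path. A simple way to add a third-party library to the search path is to put this statement in the script that calls Caffe (the .py file), before importing caffe: sys.path.insert(0, "CAFFE_ROOT/python")
    2. Using Caffe from C++
      After building from source, create a new project and configure its include and library paths to match Caffe's header and library directories.
      Include paths:
    CAFFE_ROOT/include
    CAFFE_ROOT/src
    CUDA_ROOT/include
    /usr/include (headers of other dependencies: boost, protobuf, etc.)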
    

    Library paths:

    CAFFE_ROOT/build/lib
    CUDA_ROOT/lib64
    /usr/lib  (libraries of other dependencies: boost, protobuf, etc.)
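For the Python side, the sys.path step described in item 1 can be sketched as follows. The helper name `add_caffe_to_path` and the use of a `CAFFE_ROOT` environment variable are assumptions of this example; only `sys.path.insert(0, ...)` comes from the text above.

```python
import os
import sys


def add_caffe_to_path(caffe_root):
    """Put CAFFE_ROOT/python at the front of sys.path (does nothing if already there)."""
    pycaffe_dir = os.path.join(os.path.abspath(caffe_root), "python")
    if pycaffe_dir not in sys.path:
        sys.path.insert(0, pycaffe_dir)
    return pycaffe_dir


if __name__ == "__main__":
    # "/path/to/caffe" is a placeholder for your own checkout.
    path = add_caffe_to_path(os.environ.get("CAFFE_ROOT", "/path/to/caffe"))
    print("pycaffe directory on sys.path:", path)
    # After this, `import caffe` should work, provided the Python
    # interface was built (for example with `make pycaffe`).
```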
    

    Usage

    For Python

    1. See reference post 1 and reference post 2

    2. I wrote my own wrapper; the code is as follows:

    import os
    from functools import partial
    
    import caffe
    import cv2
    import numpy as np
    
    from synset_words import WordCode
    
    
    class CnnClassify(object):
        def __init__(self, path='/trainedCaffeData/',
                     **kwargs):
            """
            :param path: directory containing the model files
            :param kwargs: model_file, params_file, mean_file, synset_file
                (file names relative to path), plus the optional
                use_gpu (bool) and img_size ((height, width), default (48, 48))
            """
            print(os.path.abspath(path))
            if kwargs.get("use_gpu", False):
                caffe.set_mode_gpu()  # gpu or cpu
                caffe.set_device(0)
            else:
                caffe.set_mode_cpu()
    
            self.img_size = kwargs.get("img_size", (48, 48))
            join_func = partial(os.path.join, path)
            for k in ["model_file", "params_file", "mean_file", "synset_file"]:
                kwargs[k] = join_func(kwargs[k])
    
            self.net = caffe.Net(kwargs["model_file"],  # defines the structure of the model
                                 kwargs["params_file"],  # contains the trained weights
                                 caffe.TEST)  # use test mode (e.g., don't perform dropout)
    
            self.__setReadFormat(kwargs["mean_file"])
            self.synset_words = WordCode(filename=kwargs["synset_file"])
    
        def __setReadFormat(self, model_mean):
            """
            :param model_mean: path to the .npy file holding the training-set mean
            """
            print(self.net.blobs['data'].data.shape)
            self.transformer = caffe.io.Transformer({'data': self.net.blobs['data'].data.shape})
            # load the mean file and average over height and width: one value per BGR channel
            mu = np.load(model_mean).mean(1).mean(1)
    
            # move the channel axis to the front: (H, W, C) -> (C, H, W)
            self.transformer.set_transpose('data', (2, 0, 1))
            # subtract the per-channel mean
            self.transformer.set_mean('data', mu)
            self.transformer.set_raw_scale('data', 255)  # rescale pixel values from [0, 1] to [0, 255]
    
            # swap channels from RGB to BGR
            self.transformer.set_channel_swap('data', (2, 1, 0))
    
        @staticmethod
        def _load_image_arr(img_bgr):
            # The original code called caffe.io.load_image_arr, which upstream pycaffe
            # does not provide (presumably a local patch). This stand-in assumes a
            # 3-channel BGR uint8 array from cv2.imread and returns what
            # caffe.io.load_image would: RGB, float32, values in [0, 1].
            return cv2.cvtColor(img_bgr, cv2.COLOR_BGR2RGB).astype(np.float32) / 255.0
    
        def predict_batch(self, img_arr):  # , tableList
    
            self.net.blobs['data'].reshape(len(img_arr), 3, self.img_size[0], self.img_size[1])  # image size is 48x48
            img_inputs = np.zeros((len(img_arr), 3, self.img_size[0], self.img_size[1]))
            for ind, img_data in enumerate(img_arr):
                img_inputs[ind, :, :, :] = self.transformer.preprocess('data', self._load_image_arr(img_data))
    
            self.net.blobs['data'].data[...] = img_inputs  # copy the whole batch into the input blob
            out = self.net.forward()
    
            predictions = []
            for i in range(0, len(img_arr)):
                output_prob = out['prob'][i]  # the output probability vector for the i-th image in the batch
                pred_label = output_prob.argmax()
                word = self.synset_words.getUnicode(pred_label)
                predictions.append({"Label": word, "Prob": output_prob[pred_label]})
                # print("predicted", pred_label)
            return predictions  # word, output_prob[pred_label]
    
        def predict(self, img_arr):
            self.net.blobs['data'].reshape(1, 3, self.img_size[0], self.img_size[1])  # image size is 48x48
            img_input = self.transformer.preprocess('data', self._load_image_arr(img_arr))
    
            self.net.blobs['data'].data[...] = img_input  # copy the image into the input blob
            output_prob = self.net.forward()['prob'][0]
            pred_label = output_prob.argmax()
            word = self.synset_words.getUnicode(pred_label)
            return word, output_prob[pred_label]
    
    
    def testCaffeCnn():
        import glob
        test = CnnClassify(path='E:/TibetOCR/Models/tibet_0323/',
                           model_file='tibet_full_train_test.prototxt',
                           params_file='tibet_full_iter_2000.caffemodel',
                           mean_file='ocr_mean.npy',
                           synset_file='synsetWords_79.pkl',
                           use_gpu=True,
                           img_size=(48, 48)  # must match the key read in __init__
                           )
    
        imageBasePath = 'E:/TibetOCR/Data/samples/*.jpg'
        imageList = glob.glob(imageBasePath)
    
        predict_labels = []
        for imagefile in imageList:
            # imagefile_abs = os.path.join(imageBasePath, imagefile)
            im = cv2.imread(imagefile)
    
            label = test.predict(im)
            print("Prediction: {}, confidence: {}".format(label[0], label[1]))
            cv2.imshow('im', im)
            cv2.waitKey(0)
            predict_labels.append(label)
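The wrapper above returns only the single most probable class. When the runners-up matter as well, the probability vector (for example `self.net.forward()['prob'][0]`) can be ranked instead. The sketch below is plain Python and independent of Caffe; the function name `top_k` and the sample numbers are made up for illustration.

```python
def top_k(probabilities, labels, k=3):
    """Return the k most probable (label, probability) pairs, best first."""
    if len(probabilities) != len(labels):
        raise ValueError("need exactly one label per probability")
    # Sort class indices by descending probability, then keep the first k.
    ranked = sorted(range(len(probabilities)),
                    key=lambda i: probabilities[i], reverse=True)
    return [(labels[i], float(probabilities[i])) for i in ranked[:k]]


if __name__ == "__main__":
    probs = [0.05, 0.70, 0.25]  # made-up softmax output for three classes
    names = ["aeroplane", "bicycle", "bird"]
    for name, p in top_k(probs, names, k=2):
        print("{}: {:.2f}".format(name, p))
```

A NumPy array works as the `probabilities` argument too, since the function only uses `len()` and indexing.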
    
    

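    For C++

    1. The C++ classification example shipped with Caffe is CAFFE_ROOT/examples/cpp_classification/classification.cpp

    2. Based on existing posts, I wrapped it in a C++ Classifier class. The declaration and implementation are as follows: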

    //classifier.h
    #pragma once
    
    #include <algorithm>
    #include <string>
    #include <utility>
    #include <vector>
    
    #include "caffe/caffe.hpp"
    #include "caffe/util/io.hpp"
    #include "caffe/blob.hpp"
    #include "opencv2/opencv.hpp"
    #include "boost/smart_ptr/shared_ptr.hpp"
    
    
    // Caffe's required library
    //#pragma comment(lib, "caffe.lib")
    
    
    using namespace boost;
    using namespace caffe;
    
    
    /* Pair (label, confidence) representing a prediction. */
    typedef std::pair<std::string, float> Prediction;
    //#define CPU_ONLY      // run on the CPU only
    
    class Classifier   
    {
    public:
        Classifier();
        Classifier(const std::string& model_file,
            const std::string& trained_file,
            const std::string& mean_file,
            const std::string& label_file);
        
        ~Classifier();
    
        //string classFaces(Rect face, Mat frame, int *w, string name);
        int LoadModelFile(std::string caffePath);
        
        Prediction Classify(const cv::Mat& img);
        std::vector<Prediction> ClassifyBatch(std::vector< cv::Mat>& img_batch);
    private:
        void SetMean(const std::string& mean_file);
    
        int InitCaffeNet();
    
        std::vector<float> Predict(const cv::Mat& img);
        
    
        void WrapInputLayer(std::vector<cv::Mat>* input_channels);
        
        void Preprocess(const cv::Mat& img,
            std::vector<cv::Mat>* input_channels);
    
        std::string model_file_;
        std::string trained_file_;
        std::string mean_file_;
        std::string label_file_;
        boost::shared_ptr<Net<float> > net_;
        cv::Size input_geometry_;
        int num_channels_;
        cv::Mat mean_;
        std::vector<string> labels_;    
    };
    
    
    //classifier.cpp
    #include "include/Classifier.h"
    #include <iomanip>
    #include <algorithm>
    #include <fstream>
    #include <time.h>
    using namespace caffe;
    /* Return the index of the largest value in vector v. */
    int Argmax(std::vector<float>& v) {
        
        std::vector<float>::iterator biggest = std::max_element(v.begin(), v.end());
        return std::distance(v.begin(), biggest);
    }
    void imagePadding(cv::Mat src, cv::Mat &dst)
    {
        int maxEdge = MAX(src.cols, src.rows);
        int paddingWidth = abs(src.cols - src.rows);
        int extraPaddingWidth = MIN(src.cols, src.rows) / 2;
        int xPaddingWidth = abs(src.cols - maxEdge) / 2 + extraPaddingWidth;
        int yPaddingWidth = abs(src.rows - maxEdge) / 2 + extraPaddingWidth;
        copyMakeBorder(src.clone(), dst, yPaddingWidth, yPaddingWidth, xPaddingWidth, xPaddingWidth, cv::BORDER_CONSTANT, cv::Scalar(255, 255, 255));
    
        //imshow("src", src);
        //imshow("dst", dst);
        //waitKey(0);
    }
    Classifier::~Classifier(){  }
    Classifier::Classifier(){ }
    int Classifier::LoadModelFile(std::string caffePath)
    {
        model_file_ = caffePath + "tibet_full_train_test.prototxt";
        trained_file_ = caffePath + "tibet_full.caffemodel";
        mean_file_ = caffePath + "Tibet_mean.binaryproto";
        label_file_ = caffePath + "synsetWords.txt";
    
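        // InitCaffeNet() returns 1 on success; a failed CHECK aborts the program
        // rather than returning 0, so every path here now returns a value.
        return InitCaffeNet();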
    }
    
    int Classifier::InitCaffeNet()
    {
    
    #ifdef CPU_ONLY
        Caffe::set_mode(Caffe::CPU);
    #else
        Caffe::set_mode(Caffe::GPU);
    #endif
    
        /* Load the network. */
        net_.reset(new Net<float>(model_file_, TEST));
        net_->CopyTrainedLayersFrom(trained_file_);
    
        CHECK_EQ(net_->num_inputs(), 1) << "Network should have exactly one input.";
        CHECK_EQ(net_->num_outputs(), 1) << "Network should have exactly one output.";
    
        Blob<float>* input_layer = net_->input_blobs()[0];
        int num_inputs = net_->num_inputs();
        int num_outputs = net_->num_outputs();
    
    
        num_channels_ = input_layer->channels();
    
    
        CHECK(num_channels_ == 3 || num_channels_ == 1) << "Input layer should have 1 or 3 channels.";
        input_geometry_ = cv::Size(input_layer->width(), input_layer->height());
    
        /* Load the binaryproto mean file. */
        SetMean(mean_file_);
    
        /* Load labels. */
        std::ifstream labels(label_file_.c_str());
        CHECK(labels) << "Unable to open labels file " << label_file_;
        string line;
        while (std::getline(labels, line))
            labels_.push_back(string(line));
    
        Blob<float>* output_layer = net_->output_blobs()[0];
    
        CHECK_EQ(labels_.size(), output_layer->channels())
            << "Number of labels is different from the output layer dimension.";
        return 1;
    }
    
    Classifier::Classifier(const std::string& model_file,
                            const std::string& trained_file,
                            const std::string& mean_file,
                            const std::string& label_file)
    {
    
        model_file_ = model_file;
        trained_file_ = trained_file;
        mean_file_ = mean_file;
        label_file_ = label_file;
        InitCaffeNet();
    }
    
    
    
    static bool PairCompare(const std::pair<float, int>& lhs,
        const std::pair<float, int>& rhs) 
    {
        return lhs.first > rhs.first;
    }
    
    /* Return the top-1 prediction as a (label, confidence) pair. */
    Prediction Classifier::Classify(const cv::Mat& img) {
    
        std::vector<float> output = Predict(img);
        int maxIdx = Argmax(output);
        //std::cout << labels_[maxIdx] << "prob:" << output[maxIdx] << std::endl;
        return std::make_pair(labels_[maxIdx],output[maxIdx]);
        //stringstream stream;
        //stream << maxIdx;
        //return std::make_pair(stream.str(), output[maxIdx]);
    }
    
    
    /* Load the mean file in binaryproto format. */
    void Classifier::SetMean(const std::string& mean_file) {
    
        Blob<float> mean_blob;
        BlobProto blob_proto;
        float *mean_ptr;
        unsigned int num_pixel;
    
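        bool succeed = ReadProtoFromBinaryFile(mean_file, &blob_proto);
        /* Fail early: without a mean image, Preprocess() cannot subtract it later. */
        CHECK(succeed) << "Unable to read mean file " << mean_file;
        if (succeed)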
        {
            mean_blob.FromProto(blob_proto);
            CHECK_EQ(mean_blob.channels(), num_channels_)
                << "Number of channels of mean file doesn't match input layer.";
    
    
            num_pixel = mean_blob.count(); /* NCHW=1x3x256x256=196608 */
            //mean_ptr = (float *)mean_blob.cpu_data();
            mean_ptr = mean_blob.mutable_cpu_data();
            
            /* The format of the mean file is planar 32-bit float BGR or grayscale. */
            std::vector<cv::Mat> channels;
            for (int i = 0; i < num_channels_; ++i) 
            {
                /* Extract an individual channel. */
                cv::Mat channel(mean_blob.height(), mean_blob.width(), CV_32FC1, mean_ptr);
                //cv::Mat channel(mean_blob.height(), mean_blob.width(), CV_32FC1);
                //memcpy(channel.data, data, mean_blob.width()*mean_blob.height()*sizeof(float));
                channels.push_back(channel);
    
                //imshow("img", channel);
                //waitKey(0);
    
                mean_ptr += mean_blob.height() * mean_blob.width();
            }
            
            /* Merge the separate channels into a single image. */
            //cv::Mat mean(mean_blob.height(), mean_blob.width(), CV_32FC1);//;//
            cv::Mat mean;
            cv::merge(channels, mean);
            
            /* Compute the global mean pixel value and create a mean image
            * filled with this value. */
            cv::Scalar channel_mean = cv::mean(mean);//mean);//channels[0]
            mean_ = cv::Mat(input_geometry_, mean.type(), channel_mean);
            
            //imshow("img1", mean_);
            //waitKey(0);
        }
    
    
    }
    
    std::vector<float> Classifier::Predict(const cv::Mat& img) 
    {
        Blob<float>* input_layer = net_->input_blobs()[0];
        input_layer->Reshape(1, num_channels_,input_geometry_.height, input_geometry_.width);
        /* Forward dimension change to all layers. */
        net_->Reshape();
    
        std::vector<cv::Mat> input_channels;
        WrapInputLayer(&input_channels);
    
        Preprocess(img, &input_channels);
        net_->Forward(0);
    
        Blob<float>* output_layer = net_->output_blobs()[0];
        const float* begin = output_layer->cpu_data();
        const float* end = begin + output_layer->channels();
        return std::vector<float>(begin, end);
    }
    
    
    std::vector<Prediction> Classifier::ClassifyBatch(std::vector< cv::Mat>& img_batch)
    {
        Blob<float>* input_layer = net_->input_blobs()[0];
        input_layer->Reshape(img_batch.size(), num_channels_, input_geometry_.height, input_geometry_.width);
        /* Forward dimension change to all layers. */
        net_->Reshape();
    
        std::vector<cv::Mat> input_data;
        
        WrapInputLayer(&input_data);
        //clock_t st_tm = clock();
        std::vector<cv::Mat>::iterator it = input_data.begin();
        for (int i = 0; i < img_batch.size(); i++)
        {
            std::vector<cv::Mat>tmp_channls(3);
            tmp_channls.assign(input_data.begin() + i*num_channels_, input_data.begin() + (i + 1)*num_channels_);
            Preprocess(img_batch[i], &tmp_channls);
        }
        //std::cout << "do imgPreprocess cost time : " << (double)(clock() - st_tm) / CLOCKS_PER_SEC << std::endl;
        net_->Forward(0);
        Blob<float>* output_layer = net_->output_blobs()[0];
    
        std::vector<Prediction>predictions;
    
        /* Copy the output layer to a std::vector */
        for (int i = 0; i < img_batch.size(); i++)
        {
            const float* begin = output_layer->cpu_data()+i*output_layer->channels();
            const float* end = begin + output_layer->channels();
            std::vector<float> output = std::vector<float>(begin, end);
            int maxIdx = Argmax(output);
            //std::cout << labels_[maxIdx] << "prob:" << output[maxIdx] << std::endl;
            predictions.push_back(std::make_pair(labels_[maxIdx], output[maxIdx]));
        }
    
        return predictions;
    }
    /* Wrap the input layer of the network in separate cv::Mat objects
    * (one per channel). This way we save one memcpy operation and we
    * don't need to rely on cudaMemcpy2D. The last preprocessing
    * operation will write the separate channels directly to the input
    * layer. */
    void Classifier::WrapInputLayer(std::vector<cv::Mat>* input_channels) {
        Blob<float>* input_layer = net_->input_blobs()[0];
    
        int width = input_layer->width();
        int height = input_layer->height();
        float* input_data = input_layer->mutable_cpu_data();
        for (int j = 0; j < input_layer->num(); j++)
        {
            for (int i = 0; i < input_layer->channels(); ++i) {
                cv::Mat channel(height, width, CV_32FC1, input_data);
                input_channels->push_back(channel);
                input_data += width * height;
            }
        }
    
    }
    
    void Classifier::Preprocess(const cv::Mat& img,
        std::vector<cv::Mat>* input_channels) {
        /* Convert the input image to the input image format of the network. */
        cv::Mat img_padded=img;
        //imagePadding(img, img_padded);
    
        cv::Mat sample;
        if (img_padded.channels() == 3 && num_channels_ == 1)
            cv::cvtColor(img_padded, sample, cv::COLOR_BGR2GRAY);
        else if (img_padded.channels() == 4 && num_channels_ == 1)
            cv::cvtColor(img_padded, sample, cv::COLOR_BGRA2GRAY);
        else if (img_padded.channels() == 4 && num_channels_ == 3)
            cv::cvtColor(img_padded, sample, cv::COLOR_BGRA2BGR);
        else if (img_padded.channels() == 1 && num_channels_ == 3)
            cv::cvtColor(img_padded, sample, cv::COLOR_GRAY2BGR);
        else
            sample = img_padded;
    
        cv::Mat sample_resized;
        if (sample.size() != input_geometry_)
            cv::resize(sample, sample_resized, input_geometry_);
        else
            sample_resized = sample;
    
        cv::Mat sample_float;
        if (num_channels_ == 3)
            sample_resized.convertTo(sample_float, CV_32FC3);
        else
            sample_resized.convertTo(sample_float, CV_32FC1);
    
        cv::Mat sample_normalized;
        cv::subtract(sample_float, mean_, sample_normalized);
    
        /* This operation will write the separate BGR planes directly to the
        * input layer of the network because it is wrapped by the cv::Mat
        * objects in input_channels. */
        cv::split(sample_normalized, *input_channels);
    
        //CHECK(reinterpret_cast<float*>(input_channels->at(0).data)
        //  == net_->input_blobs()[0]->cpu_data())
        //  << "Input channels are not wrapping the input layer of the network.";
    }
    

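    To use it, include the header classifier.h in your own project, instantiate the class where you need it, and call its Classify method.
    A problem you may run into in your own project (quite likely on Windows):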

    F0519 14:54:12.494139 14504 layer_factory.hpp:77] Check failed: registry.count(type) == 1 (0 vs. 1) Unknown layer type: Convolution (known types: MemoryData)
    

    One workaround is to create an extra header (cafferegister.h) that declares or registers the unknown layer types. The code is as follows:

    #ifndef CAFFEREGISTER_H
    #define CAFFEREGISTER_H
    #include "caffe/common.hpp"
    #include "caffe/layers/data_layer.hpp"
    #include "caffe/layers/input_layer.hpp"
    #include "caffe/layers/inner_product_layer.hpp"
    #include "caffe/layers/conv_layer.hpp"
    #include "caffe/layers/relu_layer.hpp"
    #include "caffe/layers/pooling_layer.hpp"
    #include "caffe/layers/softmax_layer.hpp"
    #include "caffe/layers/lrn_layer.hpp"
    #include "caffe/layers/dropout_layer.hpp"
    
    namespace caffe
    {
        extern INSTANTIATE_CLASS(DataLayer);
        //REGISTER_LAYER_CLASS(Data);
        extern INSTANTIATE_CLASS(InputLayer);
        //REGISTER_LAYER_CLASS(Input);
        extern INSTANTIATE_CLASS(InnerProductLayer);
        extern INSTANTIATE_CLASS(DropoutLayer);
        //REGISTER_LAYER_CLASS(Dropout);
        extern INSTANTIATE_CLASS(ConvolutionLayer);
    
        extern INSTANTIATE_CLASS(ReLULayer);
    
        extern INSTANTIATE_CLASS(PoolingLayer);
    
        extern INSTANTIATE_CLASS(LRNLayer);
    
        extern INSTANTIATE_CLASS(SoftmaxLayer);
    #ifdef WINDOWS
        REGISTER_LAYER_CLASS(Convolution);
        REGISTER_LAYER_CLASS(ReLU);
        REGISTER_LAYER_CLASS(Pooling);
        REGISTER_LAYER_CLASS(Softmax);
        REGISTER_LAYER_CLASS(LRN);
    #endif
    }
    
    #endif
    

          Article link: https://www.haomeiwen.com/subject/jwwseftx.html