Implementing Softmax in Objective-C to Recognize Handwritten Digits

Author: Jiao123 | Published 2016-09-27 18:08

    Introduction


    Softmax regression generalizes the logistic regression model to multi-class problems, where the class label y can take more than two values. Softmax is used widely; many sophisticated models compute their final class probabilities with a softmax layer.

    Formulas


    The main cost function for softmax regression is:

    J(\theta) = -\frac{1}{m} \left[ \sum_{i=1}^{m} \sum_{j=1}^{k} 1\{ y^{(i)} = j \} \log \frac{e^{\theta_j^T x^{(i)}}}{\sum_{l=1}^{k} e^{\theta_l^T x^{(i)}}} \right]

    where 1{·} is the indicator function: 1{a true statement} = 1 and 1{a false statement} = 0. For example, 1{2 = 2} = 1, while 1{1 = 2} = 0.
    Differentiating the cost function with respect to the parameters of each of the k classes gives:

    \nabla_{\theta_j} J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \left[ x^{(i)} \left( 1\{ y^{(i)} = j \} - p(y^{(i)} = j \mid x^{(i)}; \theta) \right) \right]

    where p(y^{(i)} = j | x^{(i)}; θ) is the probability that sample x^{(i)} belongs to class j:

    p(y^{(i)} = j \mid x^{(i)}; \theta) = \frac{e^{\theta_j^T x^{(i)}}}{\sum_{l=1}^{k} e^{\theta_l^T x^{(i)}}}
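    As a concrete example (with numbers of my own choosing): for k = 3 classes with scores \theta_j^T x^{(i)} = (2, 1, 0), the probabilities come out to

    \left( \frac{e^2}{e^2 + e^1 + e^0},\; \frac{e^1}{e^2 + e^1 + e^0},\; \frac{e^0}{e^2 + e^1 + e^0} \right) \approx (0.665,\; 0.245,\; 0.090)

    so the first class would be predicted with roughly 66.5% probability.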

    With the partial derivatives above, we can update the theta values by gradient descent; each iteration performs the update:

    \theta_j := \theta_j - \alpha \nabla_{\theta_j} J(\theta), \quad j = 1, \ldots, k

    The formulas above do not include a bias term. With a bias added, the model takes the vectorized form:

    h(x) = \mathrm{softmax}(W x + b)

    The bias is updated in each iteration just like theta; we only need the partial derivative with respect to the bias. My code includes the bias-gradient update.
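    For reference, that derivative follows directly from the gradient above by dropping the x^{(i)} factor:

    \frac{\partial J(\theta)}{\partial b_j} = -\frac{1}{m} \sum_{i=1}^{m} \left( 1\{ y^{(i)} = j \} - p(y^{(i)} = j \mid x^{(i)}; \theta) \right), \qquad b_j := b_j - \alpha \frac{\partial J(\theta)}{\partial b_j}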

    Data


    The training data set is MNIST; its official home is Yann LeCun's website. The data set contains 60000 training images and 10000 test images.

    File                        Content
    train-images-idx3-ubyte.gz  Training set images (60000 images)
    train-labels-idx1-ubyte.gz  Digit labels for the training images
    t10k-images-idx3-ubyte.gz   Test set images (10000 images)
    t10k-labels-idx1-ubyte.gz   Digit labels for the test images

    Each image is 28×28 pixels; rendered as a matrix it looks like this:

    (figure: MNIST-Matrix.png — a digit drawn as a 28×28 grid of pixel intensities)

    In the data set each image is stored flattened, as a vector of length 28 × 28 = 784 (pixel (r, c) of an image sits at index r*28 + c). After parsing, train-images-idx3-ubyte is therefore a tensor of shape [60000, 784].

    (figure: mnist-train-xs.png)

    train-labels-idx1-ubyte is a vector of length 60000; each entry is the digit (0–9) shown in the corresponding row of train-images-idx3-ubyte.
    The test data has the same structure as the training data, just with only 10000 images.
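    For reference, this is the layout the parsing code below expects (taken from the format description on the MNIST page; all header fields are big-endian 32-bit integers, which is why the code byte-swaps them):

    // Image file (train-images-idx3-ubyte / t10k-images-idx3-ubyte):
    //   offset 0:  int32 magic number (0x00000803)
    //   offset 4:  int32 number of images
    //   offset 8:  int32 number of rows (28)
    //   offset 12: int32 number of columns (28)
    //   offset 16: unsigned bytes, one per pixel, row-major
    //
    // Label file (train-labels-idx1-ubyte / t10k-labels-idx1-ubyte):
    //   offset 0:  int32 magic number (0x00000801)
    //   offset 4:  int32 number of labels
    //   offset 8:  unsigned bytes, one per label (0-9)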

    The code that reads and parses the downloaded data:

    //
    //  MLLoadMNIST.m
    //  MNIST
    //
    //  Created by Jiao Liu on 9/23/16.
    //  Copyright © 2016 ChangHong. All rights reserved.
    //
    
    #import "MLLoadMNIST.h"
    
    @implementation MLLoadMNIST
    
    // The IDX files are big-endian; swap a 32-bit int's bytes to host order.
    int reverseInt(int input)
    {
        unsigned char ch1, ch2, ch3, ch4;
        ch1=input&255;
        ch2=(input>>8)&255;
        ch3=(input>>16)&255;
        ch4=(input>>24)&255;
        return((int)ch1<<24)+((int)ch2<<16)+((int)ch3<<8)+ch4;
    }
    
    // Parse an IDX image file: four big-endian int32 header fields, then raw pixel bytes.
    double **readImageData(const char *filePath)
    {
        FILE *file = fopen(filePath, "rb");
        double **output = NULL;
        if (file) {
            int magic_number=0;
            int number_of_images=0;
            int n_rows=0;
            int n_cols=0;
            fread((char*)&magic_number, sizeof(magic_number), 1, file);
            magic_number= reverseInt(magic_number);
            fread((char*)&number_of_images, sizeof(number_of_images), 1, file);
            number_of_images= reverseInt(number_of_images);
            fread((char*)&n_rows, sizeof(n_rows), 1, file);
            n_rows= reverseInt(n_rows);
            fread((char*)&n_cols, sizeof(n_cols), 1, file);
            n_cols= reverseInt(n_cols);
        output = (double **)malloc(sizeof(double *) * number_of_images);
            for(int i=0;i<number_of_images;++i)
            {
                output[i] = (double *)malloc(sizeof(double) * n_rows * n_cols);
                for(int r=0;r<n_rows;++r)
                {
                    for(int c=0;c<n_cols;++c)
                    {
                        unsigned char temp=0;
                        fread((char*)&temp, sizeof(temp), 1, file);
                    output[i][(n_cols*r)+c]= (double)temp; // row-major pixel index
                    }
                }
            }
        fclose(file);
    }
        return output;
    }
    
    // Parse an IDX label file: two big-endian int32 header fields, then one byte per label.
    int *readLabelData(const char *filePath)
    {
        FILE *file = fopen(filePath, "rb");
        int *output = NULL;
        if (file) {
            int magic_number=0;
            int number_of_items=0;
            fread((char*)&magic_number, sizeof(magic_number), 1, file);
            magic_number= reverseInt(magic_number);
            fread((char*)&number_of_items, sizeof(number_of_items), 1, file);
            number_of_items= reverseInt(number_of_items);
            output = (int *)malloc(sizeof(int) * number_of_items);
            for(int i=0;i<number_of_items;++i)
            {
                unsigned char temp=0;
                fread((char*)&temp, sizeof(temp), 1, file);
                output[i]= (int)temp;
            }
        fclose(file);
    }
        return output;
    }
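
    A minimal usage sketch (the file paths here are hypothetical — point them at wherever the unzipped MNIST files live; the [0, 1] scaling is my own addition, not in the original project, but keeps exp() in the softmax well-behaved):

    double **trainImages = readImageData("/tmp/MNIST/train-images-idx3-ubyte");
    int *trainLabels = readLabelData("/tmp/MNIST/train-labels-idx1-ubyte");
    double **testImages = readImageData("/tmp/MNIST/t10k-images-idx3-ubyte");
    int *testLabels = readLabelData("/tmp/MNIST/t10k-labels-idx1-ubyte");
    
    // Optional: scale raw pixels from 0-255 into [0, 1]; if you do this,
    // apply the same scaling to the test images as well.
    for (int i = 0; i < 60000; i++) {
        for (int j = 0; j < 28 * 28; j++) {
            trainImages[i][j] /= 255.0;
        }
    }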
    

    Softmax Implementation


    • I use gradient descent to approach the optimum, so an iteration count is needed; the code defaults to 500 iterations. There are 60000 training images, and using all of them in every iteration would take a long time, so by default each iteration randomly picks 100 images to train on.
    • The default descent (learning) rate is 0.01.
    • The code implements two gradient-descent variants. In the first, each iteration walks through the sampled images one by one, and each image updates the parameters of all classes at once. In the second, each iteration updates the classes one by one, using all of the randomly sampled images for each class. The two cost about the same to run, but in testing the second variant was slightly more accurate.

    The code:

    //
    //  MLSoftMax.m
    //  MNIST
    //
    //  Created by Jiao Liu on 9/26/16.
    //  Copyright © 2016 ChangHong. All rights reserved.
    //
    
    #import "MLSoftMax.h"
    
    @implementation MLSoftMax
    
    // Defaults: 500 iterations, mini-batch of 100 images, learning rate 0.01;
    // theta and bias start at zero.
    - (id)initWithLoopNum:(int)loopNum dim:(int)dim type:(int)type size:(int)size descentRate:(double)rate
    {
        self = [super init];
        if (self) {
            _iterNum = loopNum == 0 ? 500 : loopNum;
            _dim = dim;
            _kType = type;
            _randSize = size == 0 ? 100 : size;
            _bias = malloc(sizeof(double) * type);
            _theta = malloc(sizeof(double) * type * dim);
            for (int i = 0; i < type; i++) {
                _bias[i] = 0;
                for (int j = 0; j < dim; j++) {
                    _theta[i * dim +j] = 0.0f;
                }
            }
            
            _descentRate = rate == 0 ? 0.01 : rate;
        }
        return self;
    }
    
    - (void)dealloc
    {
        if (_bias != NULL) {
            free(_bias);
            _bias = NULL;
        }
        
        if (_theta != NULL) {
            free(_theta);
            _theta = NULL;
        }
        
        if (_randomX != NULL) {
            free(_randomX);
            _randomX = NULL;
        }
        
        if (_randomY != NULL) {
            free(_randomY);
            _randomY = NULL;
        }
    }
    
    #pragma mark - SoftMax Main
    
    // Pick _randSize consecutive samples starting at a random offset (wrapping
    // around), as a cheap approximation of random mini-batch sampling.
    - (void)randomPick:(int)maxSize
    {
        long rNum = random();
        for (int i = 0; i < _randSize; i++) {
            _randomX[i] = _image[(rNum+i) % maxSize];
            _randomY[i] = _label[(rNum+i) % maxSize];
        }
    }
    /*
    // Variant one (disabled): each sampled image in turn updates the
    // parameters of all k classes; MaxPro computes a numerically stable
    // softmax by subtracting the maximum score before exponentiating.
    - (double *)MaxPro:(double *)index
    {
        long double maxNum = index[0];
        for (int i = 1; i < _kType; i++) {
            maxNum = MAX(maxNum, index[i]);
        }
        
        long double sum = 0;
        for (int i = 0; i < _kType; i++) {
            index[i] -= maxNum;
            index[i] = expl(index[i]);
            sum += index[i];
        }
        
        for (int i = 0; i < _kType; i++) {
            index[i] /= sum;
        }
        return index;
    }
    
    - (void)updateModel:(double *)index currentPos:(int)pos
    {
        double *delta = malloc(sizeof(double) * _kType);
        for (int i = 0; i < _kType; i++) {
            if (i != _randomY[pos]) {
                delta[i] = 0.0 - index[i];
            }
            else
            {
                delta[i] = 1.0 - index[i];
            }
            
            _bias[i] += _descentRate * delta[i]; // descend: delta = indicator - probability
            
            for (int j = 0; j < _dim; j++) {
                _theta[i * _dim +j] += _descentRate * delta[i] * _randomX[pos][j] / _randSize;
            }
        }
        
        if (delta != NULL) {
            free(delta);
            delta = NULL;
        }
    }
    
    - (void)train
    {
        _randomX = malloc(sizeof(double *) * _randSize);
        _randomY = malloc(sizeof(int) * _randSize);
        double *index = malloc(sizeof(double) * _kType);
        
        for (int i = 0; i < _iterNum; i++) {
            [self randomPick:_trainNum];
            for (int j = 0; j < _randSize; j++) {
                // calculate wx+b
                vDSP_mmulD(_theta, 1, _randomX[j], 1, index, 1, _kType, 1, _dim);
                vDSP_vaddD(index, 1, _bias, 1, index, 1, _kType);
                
                index = [self MaxPro:index];
                [self updateModel:index currentPos:j];
            }
        }
        if (index != NULL) {
            free(index);
            index = NULL;
        }
    }
    */
    
    // 1{label == x}: the indicator function from the cost formula.
    - (int)indicator:(int)label var:(int)x
    {
        if (label == x) {
            return 1;
        }
        return 0;
    }
    
    // Softmax probability of class `type` for mini-batch sample `index`:
    // exp(theta_type . x + b_type) / sum over all classes, computed with the
    // maximum score subtracted for numerical stability.
    - (double)sigmod:(int)type index:(int)index
    {
        double up = 0;
        for (int i = 0; i < _dim; i++) {
            up += _theta[type * _dim + i] * _randomX[index][i];
        }
        up += _bias[type];
        
        // Scores for all k classes: down = theta * x + bias.
        double *down = malloc(sizeof(double) * _kType);
        double maxNum = -DBL_MAX;
        vDSP_mmulD(_theta, 1, _randomX[index], 1, down, 1, _kType, 1, _dim);
        vDSP_vaddD(down, 1, _bias, 1, down, 1, _kType);
        
        for (int i = 0; i < _kType; i++) {
            maxNum = MAX(maxNum, down[i]);
        }
        
        double sum = 0;
        for (int i = 0; i < _kType; i++) {
            down[i] -= maxNum;
            sum += exp(down[i]);
        }
        
        if (down != NULL) {
            free(down);
            down = NULL;
        }
        
        return exp(up - maxNum) / sum;
    }
    
    // Mini-batch gradient of the cost with respect to theta_type:
    // -(1/m) * sum_i x_i * (1{y_i == type} - p(type | x_i)).
    - (double *)fderivative:(int)type
    {
        double *outP = malloc(sizeof(double) * _dim);
        for (int i = 0; i < _dim; i++) {
            outP[i] = 0;
        }
        
        double *inner = malloc(sizeof(double) * _dim);
        for (int i = 0; i < _randSize; i++) {
            double sig = [self sigmod:type index:i];
            int ind = [self indicator:_randomY[i] var:type];
            double grad = -(ind - sig) / _randSize;
            // The bias derivative is the same expression without the x_i
            // factor; descend on it in place here.
            _bias[type] -= _descentRate * grad;
            vDSP_vsmulD(_randomX[i], 1, &grad, inner, 1, _dim);
            vDSP_vaddD(outP, 1, inner, 1, outP, 1, _dim);
        }
        if (inner != NULL) {
            free(inner);
            inner = NULL;
        }
        
        return outP;
    }
    
    - (void)train
    {
        _randomX = malloc(sizeof(double *) * _randSize);
        _randomY = malloc(sizeof(int) * _randSize);
        for (int i = 0; i < _iterNum; i++) {
            [self randomPick:_trainNum];
            // Variant two: update the parameters class by class, using the
            // whole mini-batch for each class.
            for (int j = 0; j < _kType; j++) {
                double *newTheta = [self fderivative:j];
                for (int m = 0; m < _dim; m++) {
                    _theta[j * _dim + m] -= _descentRate * newTheta[m];
                }
                }
                if (newTheta != NULL) {
                    free(newTheta);
                    newTheta = NULL;
                }
            }
        }
    }
    
    - (void)saveTrainDataToDisk
    {
        // Cache the learned parameters so they can be reloaded later and fed
        // to predict:withOldTheta:andBias: below.
        NSFileManager *fileManager = [NSFileManager defaultManager];
        NSString *thetaPath = [[NSSearchPathForDirectoriesInDomains(NSCachesDirectory, NSUserDomainMask, YES) objectAtIndex:0] stringByAppendingString:@"/Theta.txt"];
    //    NSLog(@"%@",thetaPath);
        NSData *data = [NSData dataWithBytes:_theta length:sizeof(double) *  _dim * _kType];
        [fileManager createFileAtPath:thetaPath contents:data attributes:nil];
        
        NSString *biasPath = [[NSSearchPathForDirectoriesInDomains(NSCachesDirectory, NSUserDomainMask, YES) objectAtIndex:0] stringByAppendingString:@"/bias.txt"];
        data = [NSData dataWithBytes:_bias length:sizeof(double) * _kType];
        [fileManager createFileAtPath:biasPath contents:data attributes:nil];
    }
    
    - (int)predict:(double *)image
    {
        // Scores = theta * image + bias; the predicted digit is the arg max.
        double maxNum = -DBL_MAX;
        int label = -1;
        double *index = malloc(sizeof(double) * _kType);
        vDSP_mmulD(_theta, 1, image, 1, index, 1, _kType, 1, _dim);
        vDSP_vaddD(index, 1, _bias, 1, index, 1, _kType);
        for (int i = 0; i < _kType; i++) {
            if (index[i] > maxNum) {
                maxNum = index[i];
                label = i;
            }
        }
        free(index);
        return label;
    }
    
    - (int)predict:(double *)image withOldTheta:(double *)theta andBias:(double *)bias
    {
        // Same as predict:, but with externally supplied parameters
        // (for example, ones reloaded from the files saved above).
        double maxNum = -DBL_MAX;
        int label = -1;
        double *index = malloc(sizeof(double) * _kType);
        vDSP_mmulD(theta, 1, image, 1, index, 1, _kType, 1, _dim);
        vDSP_vaddD(index, 1, bias, 1, index, 1, _kType);
        for (int i = 0; i < _kType; i++) {
            if (index[i] > maxNum) {
                maxNum = index[i];
                label = i;
            }
        }
        free(index);
        return label;
    }
    
    @end
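
    Putting the pieces together, a minimal end-to-end sketch (this driver is mine, not from the original project: it assumes MLSoftMax.h exposes image, label, and trainNum properties backing the _image, _label, and _trainNum ivars used above, and it reuses the trainImages/testImages arrays from the loading sketch earlier):

    srandom((unsigned)time(NULL)); // seed random(), used by randomPick:
    
    MLSoftMax *softMax = [[MLSoftMax alloc] initWithLoopNum:500
                                                        dim:28 * 28
                                                       type:10
                                                       size:100
                                                descentRate:0.01];
    softMax.image = trainImages;   // assumed accessors; see note above
    softMax.label = trainLabels;
    softMax.trainNum = 60000;
    [softMax train];
    [softMax saveTrainDataToDisk];
    
    // Accuracy on the 10000 test images.
    int hit = 0;
    for (int i = 0; i < 10000; i++) {
        if ([softMax predict:testImages[i]] == testLabels[i]) {
            hit++;
        }
    }
    NSLog(@"accuracy: %.2f%%", 100.0 * hit / 10000);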
    

    The final training results:

    (figure: simulator screenshot of the training run)

    Varying the iteration count, descent rate, and other parameters all changes the accuracy. In my tests the default parameters came close to the best recognition accuracy, around 90% 👏.

    Conclusion


    90% accuracy is not optimal, since this is only a simple model; stacking a convolutional neural network on top would improve the results.
    I have implemented a CNN version as well, but its efficiency and accuracy are not yet good enough, so I will share it after further optimization 😊.
