Machine Learning CS229: Naive Bayes & exercise6

Author: 小太阳花儿 | Published: 2017-09-25 21:43

Implementing a spam classifier with Naive Bayes. My solution code is below:

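    % Naive Bayes spam classifier (multinomial event model, Laplace smoothing)
    numTrainDocs = 700;
    numTokens = 2500;
    % Each row of train-features.txt is: document index, token index, count
    M = dlmread('F:\machine\ex6DataPrepared\train-features.txt', ' ');
    spmatrix = sparse(M(:,1), M(:,2), M(:,3), numTrainDocs, numTokens);
    train_matrix = full(spmatrix);
    y = dlmread('F:\machine\ex6DataPrepared\train-labels.txt', ' ');
    spam = find(y == 1);
    nonspam = find(y == 0);
    % Class prior p(y = 1)
    p_y = length(spam) / numTrainDocs;
    % Total count of each token over all spam / non-spam training emails
    xofspam = zeros(numTokens, 1);
    xofnonspam = zeros(numTokens, 1);
    for i = 1:numTokens
        xofspam(i, 1) = sum(train_matrix(spam, i));
        xofnonspam(i, 1) = sum(train_matrix(nonspam, i));
    end
    % Length (total token count) of each training email
    word = sum(train_matrix, 2);
    % Laplace-smoothed p(token | y): (count + 1) / (tokens in class + vocabulary size)
    fi_y1 = (xofspam + 1) ./ (sum(word(spam)) + numTokens);
    fi_y0 = (xofnonspam + 1) ./ (sum(word(nonspam)) + numTokens);
    % Above: training
    % Below: testing
    numTestDocs = 260;
    M = dlmread('F:\machine\ex6DataPrepared\test-features.txt', ' ');
    test_spmatrix = sparse(M(:,1), M(:,2), M(:,3), numTestDocs, numTokens);
    test_matrix = full(test_spmatrix);
    % Log joint probability of each test email under each class; include the log prior
    a = test_matrix * log(fi_y1) + log(p_y);
    b = test_matrix * log(fi_y0) + log(1 - p_y);
    test_result = a > b;
    test_labels = dlmread('F:\machine\ex6DataPrepared\test-labels.txt', ' ');
    % Number of misclassified test emails and the test error rate (no semicolon, so they are printed)
    numErrors = sum(test_result ~= test_labels)
    errorRate = numErrors / numTestDocs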
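For reference, the quantities estimated above can be written out as follows. This is a sketch of the multinomial event model with Laplace smoothing in the notation commonly used in the CS229 notes; the mapping to the variables `p_y`, `fi_y1` and `fi_y0` is my reading of the code. Here $m$ is the number of training emails, $n_i$ the length of email $i$, $x_j^{(i)}$ its $j$-th token, and $|V|$ the vocabulary size (`numTokens`). The numerator count corresponds to `xofspam(k)` and the class token total to `sum(word(spam))`:

$$
\phi_{k|y=1} = \frac{\sum_{i=1}^{m}\sum_{j=1}^{n_i} 1\{x_j^{(i)}=k \wedge y^{(i)}=1\} + 1}{\sum_{i=1}^{m} 1\{y^{(i)}=1\}\, n_i + |V|},
\qquad
\phi_{k|y=0} = \frac{\sum_{i=1}^{m}\sum_{j=1}^{n_i} 1\{x_j^{(i)}=k \wedge y^{(i)}=0\} + 1}{\sum_{i=1}^{m} 1\{y^{(i)}=0\}\, n_i + |V|}
$$

$$
\phi_y = \frac{1}{m}\sum_{i=1}^{m} 1\{y^{(i)}=1\}
$$

A test email with token counts $c_k$ (one row of `test_matrix`) is then classified as spam when

$$
\sum_{k=1}^{|V|} c_k \log \phi_{k|y=1} + \log \phi_y \;>\; \sum_{k=1}^{|V|} c_k \log \phi_{k|y=0} + \log (1-\phi_y)
$$

Comparing sums of logarithms instead of products of probabilities avoids floating-point underflow, since a product of thousands of small probabilities rounds to zero.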

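Two mistakes in my understanding of the formulas cost me a whole evening of debugging. On top of that, I am not yet fluent in MATLAB, so the code is redundant: where a single matrix operation or a single built-in function would do, I naively wrote a for loop.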
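To illustrate the remark above, here is a small sketch (my own, not part of the original solution) of replacing the per-token `for` loop with a column sum or with a single matrix product. `X` and `labels` are hypothetical toy data standing in for `train_matrix` and `y`; the snippet is meant to run as a plain script in MATLAB or GNU Octave and computes both the loop and vectorized versions so they can be compared.

```matlab
% Hypothetical toy data: 4 emails x 5 tokens (rows = emails, entries = token counts)
X = [2 0 1 0 3;
     0 1 0 4 0;
     1 1 1 0 0;
     0 0 2 1 1];
labels = [1; 0; 1; 0];          % 1 = spam, 0 = non-spam
numTokens = size(X, 2);
spam = find(labels == 1);
nonspam = find(labels == 0);

% Loop version, as in the original solution
xofspam_loop = zeros(numTokens, 1);
for i = 1:numTokens
    xofspam_loop(i, 1) = sum(X(spam, i));
end

% Vectorized: sum each column over the spam rows, then transpose to a column vector
xofspam = sum(X(spam, :), 1)';
xofnonspam = sum(X(nonspam, :), 1)';

% Same counts as one matrix product: the 0/1 labels act as row weights
xofspam_mat = X' * double(labels == 1);

% Laplace-smoothed token probabilities; sum(xofspam) equals the total number
% of tokens in the spam emails, so the per-email length vector is not needed
fi_y1 = (xofspam + 1) ./ (sum(xofspam) + numTokens);
fi_y0 = (xofnonspam + 1) ./ (sum(xofnonspam) + numTokens);
```

`sum(X(spam, :), 1)'` keeps the row-subset indexing of the original code, while `X' * double(labels == 1)` uses the labels directly as weights; both produce the same column vector.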


Article link: https://www.haomeiwen.com/subject/taxjextx.html