Implementing a spam classifier with naive Bayes. The solution code is as follows:
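% Training-set dimensions: 700 emails, 2500 distinct tokens
numTrainDocs = 700;
numTokens = 2500;
% Each row of train-features.txt is (document id, token id, count)
M = dlmread('F:\machine\ex6DataPrepared\train-features.txt', ' ');
spmatrix = sparse(M(:,1), M(:,2), M(:,3), numTrainDocs, numTokens);
train_matrix = full(spmatrix);   % train_matrix(i,j) = count of token j in email i
% Labels: 1 = spam, 0 = non-spam
y = dlmread('F:\machine\ex6DataPrepared\train-labels.txt', ' ');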
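spam = find(y == 1);         % row indices of spam emails
nonspam = find(y == 0);      % row indices of non-spam emails
p_y = length(spam) / numTrainDocs;   % prior P(y = 1)
% Count how often each token occurs in spam and in non-spam emails
xofspam = zeros(numTokens, 1);
xofnonspam = zeros(numTokens, 1);
for i = 1:numTokens
    xofspam(i,1) = sum(train_matrix(spam,i));
    xofnonspam(i,1) = sum(train_matrix(nonspam,i));
end
word = sum(train_matrix, 2);   % total number of words in each email
% Laplace-smoothed estimates of P(token | y = 1) and P(token | y = 0)
fi_y1 = (xofspam + 1) ./ (sum(word(spam)) + numTokens);
fi_y0 = (xofnonspam + 1) ./ (sum(word(nonspam)) + numTokens);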
% The code above is the training step
% The code below is the test step
numTestDocs = 260;
M =dlmread('F:\machine\ex6DataPrepared\test-features.txt', ' ');
test_spmatrix = sparse(M(:,1), M(:,2), M(:,3), numTestDocs, numTokens);
test_matrix = full(test_spmatrix);
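% Log-posterior score of each test email under each class (up to a shared constant).
% The log prior is included; it cancels only when both classes are equally frequent.
a = test_matrix * log(fi_y1) + log(p_y);
b = test_matrix * log(fi_y0) + log(1 - p_y);
test_result = a > b;   % 1 = predicted spam
test_labels = dlmread('F:\machine\ex6DataPrepared\test-labels.txt', ' ');
% Number of misclassified test emails (no trailing semicolon, so the value is displayed)
numErrors = nnz(test_result ~= test_labels)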
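For reference, here is a sketch of the formulas behind fi_y1, fi_y0, and the comparison of a and b, assuming the multinomial event model with Laplace smoothing. The symbols are notation introduced here, not taken from the code: c_k^{(i)} is the count of token k in training email i (train_matrix(i,k)), n_i is the total word count of email i (word(i)), |V| is numTokens, m is numTrainDocs, and c_k is the count of token k in the test email being classified.

```latex
\begin{align*}
\phi_{k \mid y=1} &= \frac{\sum_{i=1}^{m} 1\{y^{(i)}=1\}\, c_k^{(i)} + 1}{\sum_{i=1}^{m} 1\{y^{(i)}=1\}\, n_i + |V|}, &
\phi_{k \mid y=0} &= \frac{\sum_{i=1}^{m} 1\{y^{(i)}=0\}\, c_k^{(i)} + 1}{\sum_{i=1}^{m} 1\{y^{(i)}=0\}\, n_i + |V|}
\end{align*}

\begin{equation*}
\text{predict spam} \iff
\log p(y=1) + \sum_{k=1}^{|V|} c_k \log \phi_{k \mid y=1}
\;>\;
\log p(y=0) + \sum_{k=1}^{|V|} c_k \log \phi_{k \mid y=0}
\end{equation*}
```

The two log-prior terms drop out of the comparison when spam and non-spam emails are equally frequent in the training set.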
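Two misunderstandings of the formulas cost me a whole evening of debugging, and my unfamiliarity with MATLAB made the code redundant: I naively wrote a for loop for something that a single matrix operation or a single function call could have handled.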
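As an illustration of that last point, the per-token loop can be collapsed into column sums. This is only a minimal sketch: it reuses the variable names from the code above, but the small train_matrix and y here are made-up toy data so that the snippet runs on its own.

```matlab
% Toy stand-ins for the variables in the post (made-up data, for illustration only)
train_matrix = [2 0 1; 0 3 0; 1 1 0; 0 0 4];   % 4 emails x 3 tokens
y = [1; 0; 1; 0];                              % 1 = spam, 0 = non-spam
numTokens = size(train_matrix, 2);

spam = (y == 1);       % logical mask, used here instead of find()
nonspam = (y == 0);

% Column sums replace the for loop; transpose to get numTokens x 1 vectors
xofspam = sum(train_matrix(spam, :), 1)';
xofnonspam = sum(train_matrix(nonspam, :), 1)';

% Laplace-smoothed token probabilities. sum(xofspam) is the total word count
% over all spam emails, i.e. the same number as sum(word(spam)) in the post.
fi_y1 = (xofspam + 1) ./ (sum(xofspam) + numTokens);
fi_y0 = (xofnonspam + 1) ./ (sum(xofnonspam) + numTokens);
```

Passing the dimension argument to sum explicitly (the 1 in sum(..., 1)) keeps the result a per-token vector even if a class happens to contain only one email.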