20180503
Experiment Task
You are given the Yale face dataset, which contains 15 people with eleven images each. Write a program in MATLAB to classify the faces and test your method. You decide the number of training and test samples yourself. Report the recognition accuracy.
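Theoretical Analysis
Face recognition can be approached in many ways; I use a DNN for feature extraction and KNN for classifying the samples.
Deep learning has shown great advantages in tasks such as image classification, object detection, and image segmentation.
However, the gains in accuracy come with large costs in computation, storage, and energy consumption, which are hard to accept for mobile or in-vehicle applications.
Earlier work on shrinking models focused on model size.
Common techniques for making models smaller therefore include:
(1) Kernel factorization: replace an N×N convolution kernel with 1×N and N×1 kernels
(2) Bottleneck structures, as in SqueezeNet
(3) Storing weights in low-precision floating point, as in Deep Compression
(4) Pruning redundant convolution kernels and Huffman coding
MobileNet was designed after a closer study of how to use depthwise separable convolutions, which are in essence a sparser representation with less redundant information. On top of that, it offers two knobs for designing efficient models: the width multiplier and the resolution multiplier. Trading off size, latency, and accuracy with these knobs yields smaller and faster MobileNets. The Google team also demonstrated the effectiveness of MobileNet as an efficient base network through a variety of experiments.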
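To make the saving from depthwise separable convolutions concrete, here is a minimal sketch that counts multiply-adds for one layer. It assumes stride 1 and an output feature map of the same spatial size as the input; the layer dimensions are hypothetical and chosen only for illustration.

```python
def standard_conv_cost(k, c_in, c_out, f):
    """Multiply-adds of a k x k standard convolution on an f x f feature map."""
    return k * k * c_in * c_out * f * f


def depthwise_separable_cost(k, c_in, c_out, f):
    """Multiply-adds of a k x k depthwise conv followed by a 1 x 1 pointwise conv."""
    depthwise = k * k * c_in * f * f    # one k x k filter per input channel
    pointwise = c_in * c_out * f * f    # 1 x 1 conv that mixes the channels
    return depthwise + pointwise


# Hypothetical layer: 3x3 kernel, 512 -> 512 channels, 14x14 feature map.
k, c_in, c_out, f = 3, 512, 512, 14
ratio = depthwise_separable_cost(k, c_in, c_out, f) / standard_conv_cost(k, c_in, c_out, f)
print("cost ratio: %.4f" % ratio)
print("1/c_out + 1/k^2 = %.4f" % (1 / c_out + 1 / k ** 2))
```

The two printed values agree because the ratio simplifies to 1/c_out + 1/k², so a 3×3 layer with many output channels costs roughly one ninth of the standard convolution.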
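The simplest classifier records the class of every training sample and classifies a test object only when its attributes exactly match those of some training object. But test objects will rarely find an exact match, and a test object may also match several training objects at once, so that it ends up assigned to several classes. KNN arose to address these problems.
KNN classifies by measuring the distance between feature vectors. The idea: if most of the k most similar samples (the nearest neighbors in feature space) of a sample belong to one class, the sample is assigned to that class; k is usually an integer no larger than 20. The neighbors that KNN selects are all objects that have already been correctly labeled. The decision depends only on the classes of the nearest one or few samples.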
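The voting rule described above can be sketched in a few lines of NumPy. This is an illustrative sketch with made-up toy data, not the classifier used in the experiment (which relies on scikit-learn's `KNeighborsClassifier`); `knn_predict` is a hypothetical helper name.

```python
from collections import Counter

import numpy as np


def knn_predict(train_x, train_y, query, k=5):
    """Label `query` by majority vote among its k nearest training samples."""
    dists = np.linalg.norm(train_x - query, axis=1)  # Euclidean distance to every sample
    nearest = np.argsort(dists)[:k]                  # indices of the k closest samples
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]


# Toy data (made up for illustration): two clusters in 2-D.
train_x = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
                    [5.0, 5.0], [5.1, 4.9], [4.9, 5.2]])
train_y = np.array([0, 0, 0, 1, 1, 1])
print(knn_predict(train_x, train_y, np.array([0.3, 0.3]), k=3))
print(knn_predict(train_x, train_y, np.array([4.8, 5.1]), k=3))
```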
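Experimental Results
Software: Python 3.6, macOS 10.13.3
Hardware: 2.9 GHz Intel Core i5, 16 GB 1867 MHz DDR3
Since the dataset is small, one image per class is held out for prediction and the rest are used for training.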
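The hold-out scheme can be sketched as follows. It assumes the images are stored in order, with the 11 images of each person in consecutive positions (the same assumption main.py makes); `leave_one_per_class_split` is a hypothetical helper name.

```python
import numpy as np

IMAGES_PER_PERSON = 11
NUM_PERSONS = 15


def leave_one_per_class_split(seed):
    """Pick one random image index per person; the rest form the training set."""
    rng = np.random.RandomState(seed)
    offsets = rng.randint(0, IMAGES_PER_PERSON, size=NUM_PERSONS)
    test_idx = np.arange(NUM_PERSONS) * IMAGES_PER_PERSON + offsets
    all_idx = np.arange(NUM_PERSONS * IMAGES_PER_PERSON)
    train_idx = np.setdiff1d(all_idx, test_idx)
    return train_idx, test_idx


train_idx, test_idx = leave_one_per_class_split(seed=0)
print(len(train_idx), len(test_idx))
```

Each split leaves 150 training images and 15 test images, which keeps the classes balanced in both sets.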
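The network outputs a feature map that flattens to a 50176-dimensional vector, for example:
[ 0. 0. 0. ..., 1.0892477 0. 0. ]
However, most dimensions carry no energy (they are 0), so PCA is applied to reduce the dimensionality; several settings were tested.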
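As a rough illustration of what the PCA step does, the sketch below computes the projection and the explained variance ratio with a plain SVD. The feature matrix here is synthetic (random and mostly zero), standing in for the real features, so the printed ratios will not match the logs; `pca_fit_transform` is a hypothetical helper, not scikit-learn's API.

```python
import numpy as np


def pca_fit_transform(x, n_components):
    """Project x onto its top principal components; return (projection, variance ratio kept)."""
    centered = x - x.mean(axis=0)
    # Rows of vt are the principal directions; s holds the singular values.
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    explained = s ** 2 / np.sum(s ** 2)
    projected = centered @ vt[:n_components].T
    return projected, explained[:n_components].sum()


# Synthetic stand-in for the feature matrix (not the real data).
rng = np.random.RandomState(0)
features = rng.rand(165, 2000) * (rng.rand(165, 2000) > 0.9)
print("fraction of zeros: %.2f" % np.mean(features == 0))
for n in (10, 60, 164):
    reduced, kept = pca_fit_transform(features, n)
    print(n, reduced.shape, "variance kept: %.3f" % kept)
```

With 165 samples the centered data has rank at most 164, so keeping 164 components already retains essentially all of the variance.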
All features
incorrect classificationn NO.[127] image and [4] class
0 0.933333333333
incorrect classificationn NO.[31 67] image and [3 7] class
1 0.866666666667
incorrect classificationn NO.[157] image and [2] class
2 0.933333333333
incorrect classificationn NO.[31 91] image and [3 1] class
3 0.866666666667
incorrect classificationn NO.[34] image and [13] class
4 0.933333333333
mean correct ratio=0.906667
spend time=0.3180 s when knn fit 150 images and predict 15 images
pca(n=10) explained variance ratio: 0.368
incorrect classificationn NO.[ 69 164] image and [1 2] class
0 0.866666666667
incorrect classificationn NO.[ 31 67 108] image and [11 12 0] class
1 0.8
incorrect classificationn NO.[ 28 35 157] image and [ 0 13 2] class
2 0.8
incorrect classificationn NO.[31 91] image and [1 2] class
3 0.866666666667
incorrect classificationn NO.[34] image and [13] class
4 0.933333333333
mean correct ratio=0.853333
spend time=0.1152 s when knn fit 150 images and predict 15 images
pca(n=30) explained variance ratio: 0.589
incorrect classificationn NO.[127] image and [0] class
0 0.933333333333
incorrect classificationn NO.[31 67 94] image and [ 3 12 1] class
1 0.8
incorrect classificationn NO.[157] image and [2] class
2 0.933333333333
incorrect classificationn NO.[31 91] image and [3 1] class
3 0.866666666667
incorrect classificationn NO.[34] image and [13] class
4 0.933333333333
mean correct ratio=0.893333
spend time=0.1487 s when knn fit 150 images and predict 15 images
pca(n=60) explained variance ratio: 0.744
incorrect classificationn NO.[] image and [] class
0 1.0
incorrect classificationn NO.[31 67 94] image and [3 7 1] class
1 0.8
incorrect classificationn NO.[157] image and [2] class
2 0.933333333333
incorrect classificationn NO.[31 91] image and [3 2] class
3 0.866666666667
incorrect classificationn NO.[34] image and [7] class
4 0.933333333333
mean correct ratio=0.906667
spend time=0.2617 s when knn fit 150 images and predict 15 images
pca(n=100) explained variance ratio: 0.882
incorrect classificationn NO.[127] image and [4] class
0 0.933333333333
incorrect classificationn NO.[31 94] image and [3 1] class
1 0.866666666667
incorrect classificationn NO.[157] image and [2] class
2 0.933333333333
incorrect classificationn NO.[31 91] image and [ 3 14] class
3 0.866666666667
incorrect classificationn NO.[34] image and [7] class
4 0.933333333333
mean correct ratio=0.906667
spend time=0.3598 s when knn fit 150 images and predict 15 images
pca(n=150) explained variance ratio: 0.993
incorrect classificationn NO.[127] image and [4] class
0 0.933333333333
incorrect classificationn NO.[31 67] image and [3 7] class
1 0.866666666667
incorrect classificationn NO.[157] image and [2] class
2 0.933333333333
incorrect classificationn NO.[31 91] image and [3 1] class
3 0.866666666667
incorrect classificationn NO.[34] image and [13] class
4 0.933333333333
mean correct ratio=0.906667
spend time=0.1643 s when knn fit 150 images and predict 15 images
Second run (n = 10, 30, 60, 120, 180; the PCA fit time is printed as well):
mean correct ratio=0.906667
spend time=0.3133 s when knn fit 150 images and predict 15 images
pca(n=10) explained variance ratio: 0.368 and time=0.495
mean correct ratio=0.853333
spend time=0.1075 s when knn fit 150 images and predict 15 images
pca(n=30) explained variance ratio: 0.589 and time=0.481
mean correct ratio=0.893333
spend time=0.1044 s when knn fit 150 images and predict 15 images
pca(n=60) explained variance ratio: 0.744 and time=0.962
mean correct ratio=0.906667
spend time=0.2009 s when knn fit 150 images and predict 15 images
pca(n=120) explained variance ratio: 0.936 and time=1.536
mean correct ratio=0.906667
spend time=0.3198 s when knn fit 150 images and predict 15 images
pca(n=180) explained variance ratio: 1.000 and time=0.696
mean correct ratio=0.906667
spend time=0.1525 s when knn fit 150 images and predict 15 images
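As the logs show, using all the features does give the best accuracy, but it slows KNN down. PCA is therefore used for dimensionality reduction: reducing to 180 dimensions roughly doubles the speed (0.15 s vs 0.31 s per round) with no loss in accuracy.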
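The speedup can be checked directly from the second log. The sketch below copies those numbers and also subtracts the PCA fit time, assuming the log was produced by the main.py listed under Code, where the reported "spend time" is the total elapsed time (PCA fit included) divided by the 5 rounds. The KNN-only figure is therefore a rough estimate.

```python
# Numbers copied from the second log: (setting, accuracy, "spend time" in s, PCA fit time in s).
runs = [
    ("all features", 0.906667, 0.3133, 0.0),
    ("pca n=10", 0.853333, 0.1075, 0.495),
    ("pca n=30", 0.893333, 0.1044, 0.481),
    ("pca n=60", 0.906667, 0.2009, 0.962),
    ("pca n=120", 0.906667, 0.3198, 1.536),
    ("pca n=180", 0.906667, 0.1525, 0.696),
]
ROUNDS = 5  # main() divides the total elapsed time by 5
baseline = runs[0][2]
for name, acc, spend, pca_time in runs:
    knn_only = spend - pca_time / ROUNDS  # rough estimate; ignores pca.transform
    print("%-12s acc=%.3f reported speedup=%.2fx knn-only~%.4fs"
          % (name, acc, baseline / spend, knn_only))
```

Under that assumption, most of the reported time in the PCA runs is the PCA fit itself, and the KNN step alone is considerably faster than the reported factor of two suggests.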
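Misclassifications
The misclassified images mostly involve glasses, facial expressions, or rotation, scale, and distortion changes that differ greatly from the rest of the training set; feature learning alone may not be enough to infer the correct class.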
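Summary
CNNs usually require a fixed input image size; for example, well-performing pre-trained models expect 224×224 or 227×227 inputs. This requirement comes from the CNN structure itself: a CNN typically includes several fully connected layers, and the number of neurons in those layers is fixed, e.g. 4096, 4096, 1000. This restriction means that the features extracted by the CNN are single-scale, because the input image has a single scale.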
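A small calculation shows why a fully connected layer pins the input size. The sketch assumes a MobileNet-style backbone with a total stride of 32 and 1024 output channels, and input sizes that are multiples of 32; `flattened_feature_length` is a hypothetical helper.

```python
def flattened_feature_length(input_size, total_stride=32, channels=1024):
    """Length of the flattened final conv feature map for a square input.

    total_stride=32 and channels=1024 are assumed to match a MobileNet-style
    backbone; input_size is assumed to be a multiple of total_stride.
    """
    side = input_size // total_stride
    return side * side * channels


for size in (224, 256, 320):
    n = flattened_feature_length(size)
    # A dense layer with 4096 units would need an (n x 4096) weight matrix,
    # so its shape changes whenever the input size changes.
    print(size, n, n * 4096)
```

For a 224×224 input this gives 7 × 7 × 1024 = 50176, the feature length reported in the results.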
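Multi-scale features can effectively improve results in tasks such as image retrieval, image classification, and object detection. Following the existing literature, the common ways of extracting multi-scale features with a CNN are summarized here.
Multi-scale features can be divided into three kinds: Fc features (extracted from fully connected layers), Conv features (extracted from convolutional layers), and combinations of Fc and Conv features.
This experiment uses 224×224 images and a network pre-trained on ImageNet, whose weights extract features well, which is why the classification accuracy reaches about 93% on most splits (90.7% averaged over the five random splits). A next step could be to try a probabilistic graphical model (A generative vision model that trains with high data efficiency and breaks text-based CAPTCHAs) to handle accessories worn by the subject and deformations.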
This work uses MobileNet, which should be cited specifically:
MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications
Code
cluster.py
from matplotlib import pyplot as plt
from matplotlib import markers
import glob
import numpy as np
from keras.models import load_model
from keras.applications import MobileNet,mobilenet
from keras.applications.mobilenet import preprocess_input
from keras.preprocessing import image
from keras import backend as K
from sklearn.decomposition import PCA
from sklearn.externals import joblib
from sklearn.cluster import k_means
# Get the image list
import cv2
import os
import time
import json
workdir='/home/momo/Project/jurkis_ws/src/jurvis/scripts/Program/Outline/parameters/'
def restore_model(workdir=workdir):
    global top_model
    top_model = load_model(workdir+'mobilenet.h5', custom_objects={
        'relu6': mobilenet.relu6,
        'DepthwiseConv2D': mobilenet.DepthwiseConv2D})
    print('load cluster model')
    return top_model

def load_data():
    imlist = glob.glob('/home/momo/Project/jurkis_ws/src/jurvis/scripts/Program/Outline/data/*.png')[:]
    imgs=np.array(list(image.img_to_array(image.load_img(i,target_size=(224,224))) for i in imlist))
    # The loaded images are already float32 (not sure why)
    imnbr = len(imlist)
    print("The number of images is %d" % imnbr)
    return imgs
def draw_into_a_pic(K,labels,imgs,**kwargs):
    # Draw every image in one figure, grouped by cluster label
    # imgs=list(map(cvtRGB,imgs))
    L=len(imgs)
    imgs=imgs.astype(np.uint8)
    # plt.figure(figsize=(8,8))
    # fig, axes=plt.subplots(2,K//2 if K%2==0 else K//2+1)
    i_sum=0
    for k in range(K):
        ind = np.where(labels == k)[0]
        for i in range(len(ind)):
            plt.figure(0,figsize=(3,8))
            plt.subplot(L+K,1,i+1+i_sum)
            plt.imshow(imgs[ind[i]])
            plt.axis('off')
        # Leave one empty row between clusters
        i_sum+=len(ind)+1
def downDimension(data,train_model=False,n_components = 20):
    '''
    Principal component analysis for dimensionality reduction.
    :param data: feature matrix, shape (n_samples, n_features)
    :return: data projected onto n_components principal components
    '''
    file_name=workdir+'pca_'+str(n_components)+'.m'
    if train_model or not os.path.isfile(file_name):
        pca = PCA(n_components=n_components)
        pca.fit(data)
        joblib.dump(pca,file_name)
        # print('pca(n=%d) explained variance ratio: %.4f' % (n_components,np.sum(pca.explained_variance_ratio_)))
        #('pca18 ratio:', 0.9724457)
    else:
        pca=joblib.load(file_name)
    results = pca.transform(data)
    print('pca(n=%d) explained variance ratio: %.4f' % (n_components, np.sum(pca.explained_variance_ratio_)))
    return results
def get_x(ims):
    # Extract flattened MobileNet features for a batch of images
    imgs=ims.copy()
    # print(imgs.shape)#(21, 224, 224, 3)
    imgs=preprocess_input(imgs)  # normalize
    # print(imgs.shape)#(21, 224, 224, 3)
    # t1=time.time()
    features=top_model.predict(imgs)
    # t2=time.time()
    # results.shape=(-1,7,7,1024)
    # print('per picture speed time(ms):',(t2-t1)/len(features))#0.2 CPU #0.0469 GTX 960M
    features=np.reshape(features,(len(features),-1))
    return features
def save_centroid_npz(centroid,K,filename=workdir+'*means.npz',update_npz=False):
    # Save the centers and K; if they were saved before, restore them instead.
    # Fixed: glob.glob(filename)[0] raised IndexError when no file existed yet.
    matches=glob.glob(filename)
    if matches:
        filename=matches[0]
    if update_npz or not os.path.isfile(filename):
        np.savez(filename,centroid,K)
    else:
        npdata = np.load(filename)
        centroid, K = list(npdata[na] for na in npdata.files)
        centroid = np.array(centroid,dtype=np.float32)
    return centroid,K
def get_distance_score(x,cent):
    # Score in (0, 1]; larger means closer
    return 1/(1+np.linalg.norm(x-cent,axis=1))

def nearest_neighbor(x,cents,th=None):
    '''
    :param x: (1,D)
    :param cents: list(np.ndarray) (K,D)
    :param th: per-class score thresholds for deciding whether x belongs to a known class
    :return: index of the nearest centroid, or 5 if x is rejected as unknown
    '''
    dists=get_distance_score(x,cents)
    max_Score_idx=np.argmax(dists)
    max_Score=dists[max_Score_idx]
    if th is not None:
        if max_Score<th[max_Score_idx]*0.8:  # manually relaxed margin
            max_Score_idx=5
    return max_Score_idx
def two_D_visualization(datas,labels,cents):
    # Scatter plot of the first two feature dimensions, one color per cluster
    data2d = datas[:, :2]
    print(data2d.shape)
    for k in range(max(labels+1)):
        idx=np.where(labels == k)[0]
        plt.scatter(data2d[idx, 1], data2d[idx, 0],label=str(k),s=40)
        plt.scatter(cents[k][1],cents[k][0],marker='+',s=50)
def get_nearest_neighbor_threshold(datas=None,cents=None,labels=None):
    '''
    Compute the per-class thresholds. These are score thresholds in (0,1):
    smaller means farther, so the scores are sorted in descending order.
    :param datas: feature matrix
    :param cents: cluster centers
    :param labels: cluster label of each sample
    :return: array with one threshold per cluster
    '''
    file_name=workdir+'Dist_ThresholdK='+str(len(cents))+'.npy'
    if os.path.isfile(file_name):
        dist_threshold=np.load(file_name)
    else:
        dist_threshold=[]
        for i,c in enumerate(cents):
            idx=np.where(labels==i)
            dists=get_distance_score(datas[idx],c)
            dists=sorted(dists,reverse=True)
            # 2 sigma
            id=len(dists)-1
            # id= int(0.8*id if id >= 12 else id)
            dist_threshold.append(dists[id])
        # print(dist_threshold)
        dist_threshold=np.array(dist_threshold)
        np.save(file_name,dist_threshold)
    return dist_threshold

def loadz(filename):
    data=np.load(filename)
    return list(data[n].astype(np.float32) for n in data.files)
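class Who_am_I(object):
    '''
    Wrapper used by ROS; it combines the functions above.
    '''
    def __init__(self):
        restore_model()
        self.__cents,self.__K=save_centroid_npz(None,None)
        # print(self.__K)
        self.__get_x=get_x
        self.__down_Dimension=downDimension
        # Fixed: the original passed K where the centers are expected, so len(cents) failed
        self.__th=get_nearest_neighbor_threshold(cents=self.__cents)

    def get_kind(self,img):
        # Fixed: np.resize only reshapes the raw buffer; cv2.resize rescales the image
        img=cv2.resize(img,(224,224)).astype(np.float32)
        x=self.__get_x(img[np.newaxis,:,:,:])
        x=self.__down_Dimension(x)
        label=nearest_neighbor(x, self.__cents, self.__th)
        return label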
if __name__ == '__main__':
    K = 4
    # hists=get_Hist(imgs)
    filename=workdir+'data.npy'
    imgs = load_data()
    if not os.path.isfile(filename) :
        restore_model()
        datas = get_x(imgs)
        datas = downDimension(datas)
        np.save(filename, datas)
    else:
        datas = np.load(filename)
    print(datas.shape)
    # exit()
    centroid, labels, inertia = k_means(np.float32(datas), K,copy_x=False)
    # Run it a few times; one of them is bound to start from a good initialization
    # Fixed: the original passed False positionally, which collided with the filename keyword
    centroid, K=save_centroid_npz(centroid,K,filename=workdir+str(K)+'means.npz',update_npz=False)
    neighbor_threshold=get_nearest_neighbor_threshold(datas,centroid,labels)
    new_labels=[]
    for i in range(len(datas)):
        new_labels.append(nearest_neighbor(datas[i],centroid,neighbor_threshold))
    two_D_visualization(datas[:,:2],labels,centroid[:,:2])
    draw_into_a_pic(K, labels, imgs)
    print(np.array(new_labels))
    print(labels)
    plt.show()
main.py
import os
import numpy as np
from sklearn.decomposition import PCA
from sklearn.neighbors import KNeighborsClassifier
from sklearn.externals import joblib
import json
from time import time,sleep
DownDimension=True
def plot_learning_curve(estimator, title, X, y, ylim=None, cv=None,
                        n_jobs=1, train_sizes=np.linspace(.1, 1.0, 5)):
    """
    Generate a simple plot of the test and training learning curve.

    Parameters
    ----------
    estimator : object type that implements the "fit" and "predict" methods
        An object of that type which is cloned for each validation.

    title : string
        Title for the chart.

    X : array-like, shape (n_samples, n_features)
        Training vector, where n_samples is the number of samples and
        n_features is the number of features.

    y : array-like, shape (n_samples) or (n_samples, n_features), optional
        Target relative to X for classification or regression;
        None for unsupervised learning.

    ylim : tuple, shape (ymin, ymax), optional
        Defines minimum and maximum yvalues plotted.

    cv : int, cross-validation generator or an iterable, optional
        Determines the cross-validation splitting strategy.
        Possible inputs for cv are:
          - None, to use the default 3-fold cross-validation,
          - integer, to specify the number of folds.
          - An object to be used as a cross-validation generator.
          - An iterable yielding train/test splits.

        For integer/None inputs, if ``y`` is binary or multiclass,
        :class:`StratifiedKFold` used. If the estimator is not a classifier
        or if ``y`` is neither binary nor multiclass, :class:`KFold` is used.

        Refer :ref:`User Guide <cross_validation>` for the various
        cross-validators that can be used here.

    n_jobs : integer, optional
        Number of jobs to run in parallel (default 1).
    """
    plt.figure()
    plt.title(title)
    if ylim is not None:
        plt.ylim(*ylim)
    plt.xlabel("Training examples")
    plt.ylabel("Score")
    train_sizes, train_scores, test_scores = learning_curve(
        estimator, X, y, cv=cv, n_jobs=n_jobs, train_sizes=train_sizes)
    train_scores_mean = np.mean(train_scores, axis=1)
    train_scores_std = np.std(train_scores, axis=1)
    test_scores_mean = np.mean(test_scores, axis=1)
    test_scores_std = np.std(test_scores, axis=1)
    plt.grid()
    plt.fill_between(train_sizes, train_scores_mean - train_scores_std,
                     train_scores_mean + train_scores_std, alpha=0.1,
                     color="r")
    plt.fill_between(train_sizes, test_scores_mean - test_scores_std,
                     test_scores_mean + test_scores_std, alpha=0.1, color="g")
    plt.plot(train_sizes, train_scores_mean, 'o-', color="r",
             label="Training score")
    plt.plot(train_sizes, test_scores_mean, 'o-', color="g",
             label="Cross-validation score")
    plt.legend(loc="best")
    return plt
def downDimension(data,train_model=False,n_components = 10):
    '''
    Principal component analysis for dimensionality reduction.
    :param data: feature matrix, shape (n_samples, n_features)
    :return: data projected onto n_components principal components
    '''
    T1=time()
    file_name='./pca_'+str(n_components)+'.m'
    if train_model or not os.path.isfile(file_name):
        pca = PCA(n_components=n_components)
        pca.fit(data)
        # joblib.dump(pca,file_name)
        # print('pca(n=%d) explained variance ratio: %.3f' % (n_components,np.sum(pca.explained_variance_ratio_)))
        #('pca18 ratio:', 0.9724457)
    else:
        pca=joblib.load(file_name)
    print('pca(n=%d) explained variance ratio: %.3f and time=%.3f' %
          (n_components, np.sum(pca.explained_variance_ratio_),(time()-T1)))
    results = pca.transform(data)
    return results
# Load the data and extract features
if not os.path.isfile('feature.npy'):
    from cluster import image
    from cluster import restore_model, get_x
    file_dir = './yale_face/'
    img_list = list(file_dir + 's' + str(i) + '.bmp' for i in range(1, len(os.listdir(file_dir)) + 1))
    imgs = np.array(
        list(image.img_to_array(image.load_img(i, target_size=(224, 224)).convert('RGB')) for i in img_list))
    restore_model('./')
    feature = get_x(imgs.copy())
    np.save('feature.npy', feature)
else:
    print('load feature.npy')
    feature = np.load('feature.npy')
# Load the labels
with open('label.json', 'r') as f:
    label = json.load(f)
assert len(feature) == len(label)
kind_num = 15
knn = KNeighborsClassifier(n_neighbors=5)
# Parameter testing
from sklearn.model_selection import validation_curve,learning_curve,ShuffleSplit
import matplotlib.pyplot as plt
title = "Learning Curves 60 Components Of Features"
feature=downDimension(feature,n_components=60)
cv = ShuffleSplit(n_splits=10, test_size=0.2, random_state=0)
plot_learning_curve(knn, title, feature, label, cv=cv, n_jobs=4)
# plt.show()
plt.savefig(title+'.tiff')
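# Note: the script stops here once the learning curve is saved; remove this exit() to run main() below
exit()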
# param_range=np.arange(1,7)
# train_scores, test_scores =validation_curve(
# KNeighborsClassifier(), feature, label, cv=3, param_name='n_neighbors',param_range=np.arange(1,7),scoring="accuracy", n_jobs=4)
# train_scores_mean = np.mean(train_scores, axis=1)
# train_scores_std = np.std(train_scores, axis=1)
# test_scores_mean = np.mean(test_scores, axis=1)
# test_scores_std = np.std(test_scores, axis=1)
# plt.title("Validation Curve")
# plt.xlabel("$\gamma$")
# plt.ylabel("Score")
# plt.ylim(0.0, 1.1)
# lw = 2
# plt.plot(param_range, train_scores_mean, label="Training score",
# color="darkorange", lw=lw)
# plt.fill_between(param_range, train_scores_mean - train_scores_std,
# train_scores_mean + train_scores_std, alpha=0.2,
# color="darkorange", lw=lw)
# plt.plot(param_range, test_scores_mean, label="Cross-validation score",
# color="navy", lw=lw)
# plt.fill_between(param_range, test_scores_mean - test_scores_std,
# test_scores_mean + test_scores_std, alpha=0.2,
# color="navy", lw=lw)
# plt.legend(loc="best")
# # plt.show()
# plt.savefig('Validation Curve.tiff')
# exit()
def main(DownDimension,n,feature):
    # n: number of components to keep after dimensionality reduction
    # Cross-validation
    # Every 11 consecutive images form one class; hold out one per class for prediction so the training set stays balanced
    # print(len(label))
    scores=[]
    T0=time()
    # PCA dimensionality reduction
    if DownDimension:
        feature=downDimension(feature,n_components=n)
    for j in range(5):  # 5 random rounds
        np.random.seed(j)
        test_index=list(i*11+t for i,t in enumerate(list(np.random.randint(0,11,size=kind_num))))
        test_feature,test_label=[],[]
        for t in test_index:
            # print(i,t,i*11+t)
            test_feature.append(feature[t])
            test_label.append(label[t])
        train_feature=np.delete(feature,test_index,axis=0)
        train_label=np.delete(label,test_index,axis=0)
        knn.fit(train_feature, train_label)
        pred=knn.predict(test_feature)
        # print("test_index",test_index)
        # print("pred label",pred)
        correct=(np.array(pred)==np.array(test_label))
        # incorrect_index =np.where(~ correct)[0]
        # print("incorrect classificationn NO.%s image and %s class" %
        #       (str(np.array(test_index)[incorrect_index]),str(np.array(pred)[incorrect_index])))
        score=np.mean(correct)
        # print(j,score)
        scores.append(score)
    print("mean correct ratio=%f" % (np.mean(scores)))
    print("spend time=%.4f s when knn fit %d images and predict %d images" %
          ((time()-T0)/5,150,15))
if __name__ == '__main__':
    main(False,0,feature)
    print()
    for n in [10,30,60,120,180]:
        main(True,n,feature)
        print()
        sleep(1)
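'''
Using all the features does give the best result,
but it slows KNN down,
so PCA is used to reduce the dimensionality.
Reducing to 180 dimensions doubles the speed with no loss in accuracy.
Analysis of the misclassifications:
facial expressions...
I see now: with so few images it is easier to just flip through them by hand; writing a program to check them takes longer.
'''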