Using the forge dataset, compare the decision boundaries produced by K-nearest neighbors with K = 1, 3, and 9.
import matplotlib.pyplot as plt
import mglearn
from sklearn.neighbors import KNeighborsClassifier

# forge: a small synthetic two-class dataset with two features
X, y = mglearn.datasets.make_forge()

fig, axes = plt.subplots(1, 3, figsize=(10, 3))
for n_neighbors, ax in zip([1, 3, 9], axes):
    # fit a KNN classifier with the given number of neighbors
    clf = KNeighborsClassifier(n_neighbors=n_neighbors).fit(X, y)
    # shade the decision regions, then overlay the training points
    mglearn.plots.plot_2d_separator(clf, X, fill=True, eps=0.5, ax=ax, alpha=.4)
    mglearn.discrete_scatter(X[:, 0], X[:, 1], y, ax=ax)
    ax.set_title("{} neighbors".format(n_neighbors))
    ax.set_xlabel("feature 0")
    ax.set_ylabel("feature 1")
axes[0].legend(loc=3)
plt.show()  # display the figure (needed when running as a script)
Running this code produces a figure with three panels, one per value of K.
We can see that with K = 1 the decision boundary follows the training data closely and is quite complex; as K increases, the boundary becomes smoother. In other words, increasing K decreases the model's complexity.
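To make the complexity trend concrete, here is a minimal sketch (an addition, not from the original post) that prints the training accuracy for each K. With K = 1 each training point is its own nearest neighbor, so training accuracy is perfect; larger K averages over more neighbors, and training accuracy typically drops as the model gets simpler.

import mglearn
from sklearn.neighbors import KNeighborsClassifier

# same forge data as above
X, y = mglearn.datasets.make_forge()
for n_neighbors in [1, 3, 9]:
    clf = KNeighborsClassifier(n_neighbors=n_neighbors).fit(X, y)
    # score on the training data itself: with K=1 this is perfect
    # (unless two identical points carry different labels); it
    # usually falls as K grows and the boundary smooths out
    print("K={}: training accuracy = {:.2f}".format(n_neighbors, clf.score(X, y)))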