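Once feature selection is done, we may still be left with so many features that computation becomes too expensive or training takes too long. Reducing the dimensionality of the feature data is therefore a practical approach, and the most commonly used method is Principal Component Analysis (PCA). I'll cover PCA itself in a separate article, since it's quite important.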
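To give a rough idea of what "reducing dimensions" means in practice before the full write-up, here is a minimal sketch of PCA done by hand with NumPy's SVD. The function name `pca_svd` and the random demo data are my own assumptions for illustration; this is not how scikit-learn is necessarily implemented internally, just one standard way to compute the same kind of projection.

```python
import numpy as np

def pca_svd(X, n_components):
    """Project X (n_samples, n_features) onto its top principal components.

    Hypothetical helper for illustration only.
    """
    # PCA operates on mean-centered data
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by decreasing singular value
    U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:n_components]
    # Variance captured along each principal direction
    explained_variance = (S ** 2) / (X.shape[0] - 1)
    return X_centered @ components.T, explained_variance[:n_components]

# Synthetic data (an assumption, not the article's dataset): 100 samples, 5 features
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(100, 5))
X_reduced, variances = pca_svd(X_demo, n_components=2)
print(X_reduced.shape)  # (100, 2)
```

The point is only that 5 feature columns go in and 2 columns come out, with the kept directions being the ones that carry the most variance.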
Python implementation
import pandas as pd
from sklearn import datasets
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA
from mpl_toolkits.mplot3d import Axes3D
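# Load the iris dataset: 150 samples, 4 features, 3 classes
iris = datasets.load_iris()
data = iris.data
target = iris.target
# Reduce the 4 features to 3 principal components
pca_3 = PCA(n_components=3)
data_pca_3 = pca_3.fit_transform(data)
data_pca_3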
colors = {
    0: 'r',
    1: 'b',
    2: 'k'
}
markers = {
    0: 'x',
    1: 'D',
    2: 'o'
}
fig = plt.figure(1, figsize=(8, 6))
# Create a 3D axes on the figure with a fixed viewing angle
ax = fig.add_subplot(111, projection='3d', elev=-150, azim=110)
# Group the projected points by class label and plot each class with its own color and marker
data_pca_gb = pd.DataFrame(data_pca_3).groupby(target)
for g in data_pca_gb.groups:
    group = data_pca_gb.get_group(g)
    ax.scatter(group[0], group[1], group[2],
               c=colors[g], marker=markers[g])
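# Repeat with 2 principal components and draw a 2D scatter plot
pca_2 = PCA(n_components=2)
data_pca_2 = pca_2.fit_transform(data)
data_pca_gb = pd.DataFrame(data_pca_2).groupby(target)
plt.figure(2, figsize=(8, 6))  # new figure so the 2D plot does not land on the 3D axes
for g in data_pca_gb.groups:
    group = data_pca_gb.get_group(g)
    plt.scatter(group[0], group[1], c=colors[g], marker=markers[g])
plt.show()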
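The code above fixes the number of components at 3 and then 2 so the result can be plotted. When the goal is to cut down training cost rather than to draw a picture, one way to decide how many components to keep is to look at how much variance each one explains. The sketch below uses scikit-learn's `explained_variance_ratio_` attribute and the option of passing a float to `n_components`; the 0.95 threshold and the variable names are my own assumptions, not something the original code uses.

```python
import numpy as np
from sklearn import datasets
from sklearn.decomposition import PCA

iris_data = datasets.load_iris().data  # 150 samples, 4 features

# Fit PCA with all components to see how the variance is distributed
pca_full = PCA()
pca_full.fit(iris_data)
ratios = pca_full.explained_variance_ratio_
cumulative = np.cumsum(ratios)
for i, (r, c) in enumerate(zip(ratios, cumulative), start=1):
    print(f"PC{i}: {r:.3f} of variance, {c:.3f} cumulative")

# A float in (0, 1) asks PCA to keep just enough components
# to explain at least that fraction of the variance (0.95 is an assumed threshold)
pca_95 = PCA(n_components=0.95)
reduced = pca_95.fit_transform(iris_data)
print(reduced.shape, pca_95.n_components_)
```

If the first few components already cover most of the variance, the remaining ones can usually be dropped with little loss, which is exactly the saving in computation this section is after.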