楔子:
我们在R的使用中会出现,包无法导入成功,并提示你需要安装的依赖包,这是因为我们是本地安装的,如果联网会自动安装依赖包。有时还会出现R的版本与包不匹配,此时也只需要安装更换R的版本就可以了,但是注意一点,更换R的版本后,所有包都要重新安装。一般包的现有版本对应R5.0,以前的版本对应R4.0。
本地安装包的同学一定要有耐心,如果没有,就多试几次在线安装。
1、数据预处理
这里与Python的区别在于不用把自变量和因变量分开,我们用特征名表示因变量,导入数据,划分训练集和测试集,然后对除了因变量以外的其它特征进行特征缩放。
代码:
# Importing the dataset
dataset = read.csv('Wine.csv')
# Splitting the dataset into the Training set and Test set
# install.packages('caTools')
library(caTools)
set.seed(123)
split = sample.split(dataset$Customer_Segment, SplitRatio = 0.8)
training_set = subset(dataset, split == TRUE)
test_set = subset(dataset, split == FALSE)
# Feature Scaling
training_set[-14] = scale(training_set[-14])
test_set[-14] = scale(test_set[-14])
2、主成分分析降维
代码:
# Applying PCA
# install.packages('caret')
library(caret)
# install.packages('e1071')
library(e1071)
pca = preProcess(x = training_set[-14], method = 'pca', pcaComp = 2)
training_set = predict(pca, training_set)
training_set = training_set[c(2, 3, 1)]
test_set = predict(pca, test_set)
test_set = test_set[c(2, 3, 1)]
主成分分析,比较重要的参数就是我们要得到的特征的个数,或者说是特征所占的总的方差比是多少。
3、运用SVM算法拟合分类模型
代码:
library(e1071)
classifier = svm(formula = Customer_Segment ~ .,
data = training_set,
type = 'C-classification',
kernel = 'linear')
# Predicting the Test set results
y_pred = predict(classifier, newdata = test_set[-3])
# Making the Confusion Matrix
cm = table(test_set[, 3], y_pred)
支持向量机前面提到过,与其它分类算法不同的是考虑到异常值,并且根据异常值建模。
4、将分类模型可视化显示
代码:
# Visualising the Training set results
library(ElemStatLearn)
set = training_set
X1 = seq(min(set[, 1]) - 1, max(set[, 1]) + 1, by = 0.01)
X2 = seq(min(set[, 2]) - 1, max(set[, 2]) + 1, by = 0.01)
grid_set = expand.grid(X1, X2)
colnames(grid_set) = c('PC1', 'PC2')
y_grid = predict(classifier, newdata = grid_set)
plot(set[, -3],
main = 'SVM (Training set)',
xlab = 'PC1', ylab = 'PC2',
xlim = range(X1), ylim = range(X2))
contour(X1, X2, matrix(as.numeric(y_grid), length(X1), length(X2)), add = TRUE)
points(grid_set, pch = '.', col = ifelse(y_grid == 2, 'deepskyblue', ifelse(y_grid == 1, 'springgreen3', 'tomato')))
points(set, pch = 21, bg = ifelse(set[, 3] == 2, 'blue3', ifelse(set[, 3] == 1, 'green4', 'red3')))
# Visualising the Test set results
library(ElemStatLearn)
set = test_set
X1 = seq(min(set[, 1]) - 1, max(set[, 1]) + 1, by = 0.01)
X2 = seq(min(set[, 2]) - 1, max(set[, 2]) + 1, by = 0.01)
grid_set = expand.grid(X1, X2)
colnames(grid_set) = c('PC1', 'PC2')
y_grid = predict(classifier, newdata = grid_set)
plot(set[, -3], main = 'SVM (Test set)',
xlab = 'PC1', ylab = 'PC2',
xlim = range(X1), ylim = range(X2))
contour(X1, X2, matrix(as.numeric(y_grid), length(X1), length(X2)), add = TRUE)
points(grid_set, pch = '.', col = ifelse(y_grid == 2, 'deepskyblue', ifelse(y_grid == 1, 'springgreen3', 'tomato')))
points(set, pch = 21, bg = ifelse(set[, 3] == 2, 'blue3', ifelse(set[, 3] == 1, 'green4', 'red3')))
可视化显示,由于我们之前显示的分类结果都是两类,这次是三类,所以有些地方是需要修改的,比如我们根据分类的类型,决定显示的颜色,另外就是X轴和Y轴的标签是需要修改的。我们可以看出这个效果是非常好的。
网友评论