R Language (1): SVM & LDA

Yangkai Hong


    The R package “e1071” implements SVMs with a choice of kernels. Classify the Ionosphere dataset from the “mlbench” package with:


    (1) Linear kernel, polynomial kernel with different degrees, and radial basis kernel.

    library(e1071)    # svm()
    library(mlbench)  # Ionosphere dataset
    data(Ionosphere)
    # Column 35 is the class label; the remaining 34 columns are the predictors.
    linear.model <- svm(x=Ionosphere[,-35],y=Ionosphere[,35],kernel='linear',type='C-classification',scale=FALSE)
    poly3.model <- svm(x=Ionosphere[,-35],y=Ionosphere[,35],kernel='polynomial',degree=3,type='C-classification',scale=FALSE)
    poly6.model <- svm(x=Ionosphere[,-35],y=Ionosphere[,35],kernel='polynomial',degree=6,type='C-classification',scale=FALSE)
    radial.model <- svm(x=Ionosphere[,-35],y=Ionosphere[,35],kernel='radial',type='C-classification',scale=FALSE)
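For reference, the three kernels requested above can be written out as follows, using the parameterisation given in the e1071 `svm()` documentation. The symbols are shorthand introduced here: $d$ stands for `degree`, $c_0$ for `coef0`, and $\gamma$ for `gamma`. Since the calls leave `gamma` and `coef0` at their defaults, $\gamma = 1/34$ (one over the number of predictor columns) and $c_0 = 0$.

```latex
\begin{aligned}
K_{\text{linear}}(u, v) &= u^\top v \\
K_{\text{poly}}(u, v)   &= \left(\gamma\, u^\top v + c_0\right)^{d} \\
K_{\text{radial}}(u, v) &= \exp\!\left(-\gamma\, \lVert u - v \rVert^{2}\right)
\end{aligned}
```

With $c_0 = 0$ the polynomial kernel is homogeneous, so it contains only terms of exactly degree $d$ and no lower-order terms.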

    (2) Benchmark your classification accuracy using 10-fold cross-validation.

    library(caret)  # createFolds()
    # Note: the models above were fit on all rows, so every fold scored here was also seen during training.
    fold <- createFolds(Ionosphere$Class,k=10)
    # Number of correct predictions per fold, one vector per kernel
    linearTrue <- c()
    poly3True <- c()
    poly6True <- c()
    radialTrue <- c()
    for(i in 1:length(fold)){
      truth <- Ionosphere$Class[fold[[i]]]
      linearPreds <- predict(linear.model,newdata = Ionosphere[fold[[i]],-35])
      poly3Preds <- predict(poly3.model,newdata = Ionosphere[fold[[i]],-35])
      poly6Preds <- predict(poly6.model,newdata = Ionosphere[fold[[i]],-35])
      radialPreds <- predict(radial.model,newdata = Ionosphere[fold[[i]],-35])
      linearTrue <- c(linearTrue,sum(linearPreds==truth))
      poly3True <- c(poly3True,sum(poly3Preds==truth))
      poly6True <- c(poly6True,sum(poly6Preds==truth))
      radialTrue <- c(radialTrue,sum(radialPreds==truth))
    }
    cat(c("Linear kernel accuracy:",sum(linearTrue)/nrow(Ionosphere),"\n"))
    ## Linear kernel accuracy: 0.923076923076923
    cat(c("Polynomial kernel with degree 3 accuracy:",sum(poly3True)/nrow(Ionosphere),"\n"))
    ## Polynomial kernel with degree 3 accuracy: 0.689458689458689
    cat(c("Polynomial kernel with degree 6 accuracy:",sum(poly6True)/nrow(Ionosphere),"\n"))
    ## Polynomial kernel with degree 6 accuracy: 0.641025641025641
    cat(c("Radial kernel accuracy:",sum(radialTrue)/nrow(Ionosphere),"\n"))
    ## Radial kernel accuracy: 0.945868945868946
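The accuracies above come from models that were trained on the full dataset before the folds were scored. A minimal sketch of a variant that refits each SVM on the nine training folds and scores it on the held-out fold might look like the following. The helper `cv_accuracy()`, the seed, and the preprocessing (dropping the constant column V2 and converting the factor V1 to a number) are assumptions made for this illustration, not part of the original analysis, and the resulting accuracies will generally differ from the figures printed above.

```r
library(e1071)    # svm()
library(mlbench)  # Ionosphere dataset
library(caret)    # createFolds()

data(Ionosphere)

# Assumption: use a purely numeric predictor matrix. V2 is constant and is
# dropped; V1 is a 0/1 factor and is converted to a number.
x <- Ionosphere[, -c(2, 35)]
x$V1 <- as.numeric(as.character(x$V1))
x <- as.matrix(x)
y <- Ionosphere$Class

set.seed(1)  # arbitrary seed, only to make the fold assignment repeatable
folds <- createFolds(y, k = 10)  # list of held-out row indices

# Hypothetical helper: refit the SVM without the held-out fold, then score
# it on that fold. Extra arguments (kernel, degree, ...) are passed to svm().
cv_accuracy <- function(...) {
  correct <- 0
  for (idx in folds) {
    fit <- svm(x = x[-idx, ], y = y[-idx], type = "C-classification",
               scale = FALSE, ...)
    preds <- predict(fit, newdata = x[idx, ])
    correct <- correct + sum(preds == y[idx])
  }
  correct / length(y)
}

acc <- c(
  linear = cv_accuracy(kernel = "linear"),
  poly3  = cv_accuracy(kernel = "polynomial", degree = 3),
  poly6  = cv_accuracy(kernel = "polynomial", degree = 6),
  radial = cv_accuracy(kernel = "radial")
)
print(round(acc, 3))
```

Because every row is held out exactly once, dividing the total number of correct predictions by the number of rows gives the pooled cross-validated accuracy.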

    (3) Repeat the above classification using LDA with 10-fold cross-validation.

    library(MASS)  # lda()
    ionosphere <- Ionosphere[,-2] # delete the constant column V2
    lda.model <- lda(Class~.,ionosphere)
    ldaTrue <- c()
    for(i in 1:length(fold)){
      truth <- ionosphere$Class[fold[[i]]]
      # Column 34 is now the class label; keep the posterior probability of 'good'
      ldaPreds <- predict(lda.model,ionosphere[fold[[i]],-34])$posterior[,'good']
      lda.decision <- ifelse(ldaPreds > 0.5,'good','bad')
      ldaTrue <- c(ldaTrue,sum(lda.decision==truth))
    }
    cat(c("LDA accuracy:",sum(ldaTrue)/nrow(ionosphere),"\n"))
    ## LDA accuracy: 0.9002849002849
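As with the SVMs, the LDA model above is fit once on all rows and then scored fold by fold. A sketch of the same evaluation with the model refit inside each fold could look like this. The seed and the variable names `folds`, `correct`, and `lda_acc` are choices made for this illustration, and the accuracy it prints is not expected to match the figure above.

```r
library(MASS)     # lda()
library(mlbench)  # Ionosphere dataset
library(caret)    # createFolds()

data(Ionosphere)
ionosphere <- Ionosphere[, -2]  # drop the constant column V2, as in the text

set.seed(1)  # arbitrary seed, only to make the fold assignment repeatable
folds <- createFolds(ionosphere$Class, k = 10)  # held-out row indices

correct <- 0
for (idx in folds) {
  # Fit on the nine training folds only
  fit <- lda(Class ~ ., data = ionosphere[-idx, ])
  # predict() on an lda fit returns the predicted class in $class
  preds <- predict(fit, newdata = ionosphere[idx, ])$class
  correct <- correct +
    sum(as.character(preds) == as.character(ionosphere$Class[idx]))
}

lda_acc <- correct / nrow(ionosphere)
cat("LDA cross-validated accuracy:", round(lda_acc, 3), "\n")
```

`lda()` may warn about collinear variables on some folds; the sketch leaves such warnings unhandled.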

    (4) Comment on the linear separability of the data based on the classification result using SVM with different kernels and LDA.

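    The data appear to be largely linearly separable: the linear-kernel SVM (about 92%) and LDA (about 90%) both reach high accuracy, whereas the polynomial-kernel SVMs of degree 3 and 6 reach only about 69% and 64%. The radial basis kernel (about 95%) is only slightly better than the linear one, which suggests that a nonlinear boundary adds little here.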



