Differences between sklearn's SVM parameters and R's

Author: leengsmile | Source: published 2017-04-03 15:31, read 410 times

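    sklearn's SVM function does not scale the input data, whereas the corresponding function in R's e1071 package scales it by default. To get results in R that are comparable to sklearn's, you therefore need to set scale=FALSE.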
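
    As a rough illustration of what that scaling step does, the pure-Python sketch below standardizes each column to zero mean and unit standard deviation, which is how the e1071 documentation describes its default. The helper name standardize_columns and the toy rows are made up for this example, and it uses the sample standard deviation (n - 1 denominator), as R's sd() does.

```python
import statistics


def standardize_columns(rows):
    """Center each column to mean 0 and scale it to sample standard deviation 1."""
    columns = list(zip(*rows))
    means = [statistics.mean(col) for col in columns]
    # statistics.stdev uses the n - 1 denominator, like R's sd()
    stdevs = [statistics.stdev(col) for col in columns]
    return [
        [(value - m) / s for value, m, s in zip(row, means, stdevs)]
        for row in rows
    ]


# Hypothetical toy data: the second column has a much larger range than the first
rows = [[1.0, 100.0], [2.0, 300.0], [3.0, 500.0]]
scaled = standardize_columns(rows)
print(scaled)  # [[-1.0, -1.0], [0.0, 0.0], [1.0, 1.0]]
```

    After standardization both columns are on the same scale, even though their raw ranges differ by a factor of 200.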

    This post uses the letter recognition example from the book Machine Learning with R. The dataset is also available in the UCI repository (UCI letter recognition); for reproducibility, the UCI copy is used here.

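    First, in Python, read the data with pandas, put the first 16,000 rows in the training set and the last 4,000 rows in the test set, which is used to evaluate the SVM's predictive performance.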

    import pandas as pd

    # UCI letter recognition data: no header row, class label in the first column
    letter_reco_path = "https://archive.ics.uci.edu/ml/machine-learning-databases/letter-recognition/letter-recognition.data"
    colnames = [
        "letter", "xbox", "ybox", "width", "height", "onpix", "xbar", "ybar", "x2bar", "y2bar",
        "xybar", "x2ybar", "xy2bar", "xedge", "xedgey", "yedge", "yedgex"
    ]
    letter_data = pd.read_csv(letter_reco_path, header=None, names=colnames)
    # First 16,000 rows for training, remaining 4,000 rows for testing
    training = letter_data.iloc[0:16000]
    testing = letter_data.iloc[16000:]
    # Select by position with .iloc (.ix has been removed from pandas)
    X_train, y_train = training.iloc[:, 1:].values, training.iloc[:, 0].values
    X_test, y_test = testing.iloc[:, 1:].values, testing.iloc[:, 0].values
    

    Next, fit an SVM classifier with sklearn's SVC, using the Gaussian (RBF) kernel.

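    from sklearn.svm import SVC
    # gamma="auto" (1 / n_features) was the SVC default when this post was written;
    # newer sklearn releases default to gamma="scale", so it is set explicitly here
    svm_model = SVC(kernel="rbf", gamma="auto", random_state=1071).fit(X_train, y_train)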
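
    To see why scaling can matter for this kernel: the RBF kernel is K(x, z) = exp(-gamma * ||x - z||^2), so a feature with a much larger range dominates the squared distance. The following is a minimal sketch with made-up points and an arbitrary gamma; none of these values come from the letter data.

```python
import math


def rbf_kernel(x, z, gamma):
    """RBF kernel value exp(-gamma * squared Euclidean distance)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * squared_distance)


gamma = 0.5  # arbitrary value chosen for illustration

# Both features on a similar scale: squared distance is 2, kernel is exp(-1)
same_scale = rbf_kernel([1.0, 2.0], [2.0, 3.0], gamma)

# Second feature on a much larger scale: squared distance is about 1e6,
# so the kernel value underflows to 0.0 and the first feature has no influence
wide_range = rbf_kernel([1.0, 2000.0], [2.0, 3000.0], gamma)

print(same_scale, wide_range)
```

    When every feature already shares a similar range, skipping the scaling step changes the kernel values far less.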
    

    Then predict on the test set; the resulting accuracy is 0.9722.

    from sklearn.metrics import accuracy_score

    svm_pred = svm_model.predict(X_test)
    accuracy_score(y_test, svm_pred)
    # 0.97224999999999995
    

    Likewise, in R, read the same UCI data, put the first 16,000 rows in the training set and the rest in the test set.

    letter_reco_path <- "https://archive.ics.uci.edu/ml/machine-learning-databases/letter-recognition/letter-recognition.data"
    colnames <- c("letter", "xbox", "ybox", "width", "height", "onpix", "xbar", "ybar", "x2bar", "y2bar", "xybar", "x2ybar", "xy2bar", "xedge", "xedgey", "yedge", "yedgex")

    # stringsAsFactors = TRUE keeps letter as a factor (not the default since R 4.0)
    letter_data <- read.csv(letter_reco_path, header = FALSE, col.names = colnames, stringsAsFactors = TRUE)
    training_index <- seq.int(1, 16000)

    # First 16,000 rows for training, the rest for testing
    training <- letter_data[training_index, ]
    testing <- letter_data[-training_index, ]
    

    Train the corresponding model with e1071's svm function, using the Gaussian kernel and without scaling the data, i.e. scale=FALSE.

    library(e1071)

    svm_model2 <- svm(
        letter ~ .,
        data = training,
        kernel = "radial",           # Gaussian (RBF) kernel
        type = "C-classification",
        scale = FALSE                # skip e1071's default scaling
    )
    

    Then use predict on the test set. The resulting accuracy is 0.9725, close to sklearn's.

    svm_pred2 <- predict(svm_model2, newdata = testing)
    # Share of wrong (FALSE) and correct (TRUE) predictions
    prop.table(table(svm_pred2 == testing$letter))

     FALSE   TRUE
    0.0275 0.9725
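
    The TRUE share in this table is the accuracy. For readers following along in Python, here is a small sketch of the same tabulation; the helper name match_proportions and the toy labels are illustrative only.

```python
from collections import Counter


def match_proportions(predicted, actual):
    """Return the share of mismatches (False) and matches (True)."""
    counts = Counter(p == a for p, a in zip(predicted, actual))
    total = sum(counts.values())
    return {flag: counts[flag] / total for flag in (False, True)}


# Hypothetical labels: three of four predictions are correct
predicted = ["A", "B", "C", "D"]
actual = ["A", "B", "C", "E"]
proportions = match_proportions(predicted, actual)
print(proportions)  # {False: 0.25, True: 0.75}
```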
    

    According to the documentation, sklearn's SVC does not scale the data, whereas e1071's svm function scales it by default. Keep this difference in mind in practice.
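
    Going the other way, one possible way to approximate e1071's default behaviour in sklearn is to put a StandardScaler in front of SVC. The sketch below is only that: the synthetic data from make_classification and the inflated first feature are assumptions standing in for a real dataset, and StandardScaler divides by the population standard deviation while R's scale() uses the n - 1 denominator, so the two are close but not identical.

```python
from sklearn.datasets import make_classification
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in data (an assumption; this is not the letter recognition set)
X, y = make_classification(
    n_samples=400, n_features=16, n_informative=8, random_state=0
)
X[:, 0] *= 1000.0  # inflate one feature's range so that scaling has an effect

# Positional split, mirroring the first-rows/last-rows split used in the post
X_train, X_test = X[:300], X[300:]
y_train, y_test = y[:300], y[300:]

# Plain SVC: no scaling, comparable to e1071's svm(..., scale = FALSE)
unscaled_svm = SVC(kernel="rbf", gamma="auto").fit(X_train, y_train)

# Scaler + SVC: roughly comparable to e1071's default scale = TRUE
scaled_svm = make_pipeline(
    StandardScaler(), SVC(kernel="rbf", gamma="auto")
).fit(X_train, y_train)

unscaled_accuracy = unscaled_svm.score(X_test, y_test)
scaled_accuracy = scaled_svm.score(X_test, y_test)
print(unscaled_accuracy, scaled_accuracy)
```

    Because the scaler sits inside the pipeline, the centering and scaling values are learned from the training rows only and then reused on the test rows.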


    Article link: https://www.haomeiwen.com/subject/dbqoottx.html