(15) The Bagging Algorithm

Author: 羽天驿 | Published: 2020-04-07 08:35

    I. Bagging

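    Bagging is the representative algorithm of one of the two main families of ensemble learning; the other family is boosting, whose well-known implementations include XGBoost. The key difference is that in bagging the base learners do not depend on one another and can be trained in parallel, whereas in boosting each learner depends strongly on the ones before it, so the learners must be trained sequentially.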
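The contrast between independent and sequential learners can be seen in scikit-learn. The following is a minimal sketch, not the author's code: it uses `GradientBoostingClassifier` as a stand-in for the boosting family (it is not the XGBoost library the text names), and the values `n_estimators=20`, `n_jobs=2`, and `random_state=0` are arbitrary choices for illustration.

```python
from sklearn.datasets import load_wine
from sklearn.ensemble import BaggingClassifier, GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Bagging: every tree is fit on its own bootstrap sample, independently of the
# other trees, so the fits can be spread over several workers with n_jobs.
bagging = BaggingClassifier(DecisionTreeClassifier(), n_estimators=20,
                            n_jobs=2, random_state=0)
bagging.fit(X_train, y_train)

# Boosting: stage k is fit to the errors left by stages 1..k-1, so the stages
# have to be built one after another.
boosting = GradientBoostingClassifier(n_estimators=20, random_state=0)
boosting.fit(X_train, y_train)

# staged_predict yields the prediction of the partial ensemble after each stage.
staged_acc = [float((pred == y_test).mean())
              for pred in boosting.staged_predict(X_test)]

print("bagging accuracy:", bagging.score(X_test, y_test))
print("boosting accuracy after stages 1, 10, 20:",
      staged_acc[0], staged_acc[9], staged_acc[19])
```

The bagged trees could be fit in any order, which is why `n_jobs` exists on `BaggingClassifier`; the boosting stages only make sense in the order they were built.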
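    The bagging algorithm: (1) Bagging can be applied to both regression and classification problems. (2) It draws n samples at random, with replacement, from the original data and repeats this s times, producing s training sets. Each training set is used to train one weak learner, giving s weak learners in total. For classification, the final prediction is decided by a vote among these learners (the class that receives the most votes wins); for regression, the predictions are averaged. Random forest is a representative example.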
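The procedure described above can be sketched by hand in a few lines. This is an illustrative sketch under stated assumptions, not scikit-learn's implementation: the helper names `fit_bagging` and `predict_majority` are hypothetical, decision trees are used as the weak learner, and each bootstrap sample has the same size as the training set.

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier


def fit_bagging(X, y, n_estimators=25, seed=0):
    """Train one tree per bootstrap sample (rows drawn with replacement)."""
    rng = np.random.default_rng(seed)
    n = len(X)
    models = []
    for _ in range(n_estimators):
        idx = rng.integers(0, n, size=n)  # n draws with replacement
        models.append(DecisionTreeClassifier(random_state=0).fit(X[idx], y[idx]))
    return models


def predict_majority(models, X):
    """Return, for each row, the class predicted by the most models."""
    votes = np.stack([m.predict(X) for m in models])  # (n_models, n_rows)
    classes = np.unique(votes)
    # counts[c, i] = number of models that voted for classes[c] on row i
    counts = np.stack([(votes == c).sum(axis=0) for c in classes])
    return classes[counts.argmax(axis=0)]


X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

models = fit_bagging(X_train, y_train)
y_pred = predict_majority(models, X_test)
accuracy = float((y_pred == y_test).mean())
print(f"bagged trees accuracy: {accuracy:.3f}")
```

For a regression problem the vote in `predict_majority` would be replaced by the mean of the individual predictions.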

    II. Code Implementation

    import numpy as np
    from sklearn import datasets
    from sklearn.ensemble import BaggingClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.tree import DecisionTreeClassifier

    # Load the wine data set as (features, labels); recent scikit-learn
    # versions require return_X_y to be passed by keyword
    X, y = datasets.load_wine(return_X_y=True)
    # Default split: 75% training rows, 25% test rows
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=10244021)
    
    # A single KNN model: accuracy of about 71%
    knn = KNeighborsClassifier()
    knn.fit(X_train,y_train)
    knn.score(X_test,y_test)
    
    0.7111111111111111
    
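    # An ensemble of 100 KNN models: accuracy rises to about 77.8%
    knn = KNeighborsClassifier()
    # The base estimator is passed positionally because the keyword was renamed
    # from base_estimator to estimator in scikit-learn 1.2
    bag = BaggingClassifier(knn, n_estimators=100, max_samples=0.8, max_features=0.7)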
    bag.fit(X_train,y_train)
    bag.score(X_test,y_test)
    
    0.7777777777777778
    
    import warnings
    warnings.filterwarnings('ignore')
    
    lr = LogisticRegression()
    lr.fit(X_train,y_train)
    lr.score(X_test,y_test)
    
    0.9555555555555556
    
    # Base estimator passed positionally so the call works across scikit-learn versions
    bag = BaggingClassifier(LogisticRegression(), n_estimators=100,
                            max_samples=0.7, max_features=0.5)
    bag.fit(X_train,y_train)
    bag.score(X_test,y_test)
    
    0.9555555555555556
    
    clf = DecisionTreeClassifier()
    clf.fit(X_train,y_train)
    clf.score(X_test,y_test)
    
    0.9333333333333333
    
    # Base estimator passed positionally so the call works across scikit-learn versions
    bag = BaggingClassifier(DecisionTreeClassifier(), n_estimators=100,
                            max_samples=1.0, max_features=0.5)
    bag.fit(X_train,y_train)
    bag.score(X_test,y_test)
    
    0.9777777777777777
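The scores above come from a single split with 45 test rows and from unseeded ensembles, so they can shift from run to run. As a hedged complement, the sketch below compares single and bagged models with 5-fold cross-validation. The fold count, `n_estimators=50`, and `random_state=0` are assumptions chosen for illustration, and no particular ranking is claimed.

```python
from sklearn.datasets import load_wine
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_wine(return_X_y=True)

candidates = {
    "knn": KNeighborsClassifier(),
    "tree": DecisionTreeClassifier(random_state=0),
}

results = {}
for name, base in candidates.items():
    # Same sampling settings as the KNN ensemble in the text, with a fixed seed
    bagged = BaggingClassifier(base, n_estimators=50, max_samples=0.8,
                               max_features=0.7, random_state=0)
    results[name] = (
        float(cross_val_score(base, X, y, cv=5).mean()),    # single model
        float(cross_val_score(bagged, X, y, cv=5).mean()),  # bagged ensemble
    )

for name, (single, bagged_score) in results.items():
    print(f"{name}: single={single:.3f}  bagged={bagged_score:.3f}")
```

Averaging over folds gives a steadier estimate than one `score` call on a small test set, which makes the single-versus-bagged comparison easier to trust.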
    
    import matplotlib.pyplot as plt
    from sklearn import tree
    
    plt.figure(figsize=(9,9))
    _ = tree.plot_tree(bag[0],filled=True)
    
    [Figure output_10_0.png: plot of the first tree in the ensemble, bag[0]]

    plt.figure(figsize=(9,9))
    _ = tree.plot_tree(bag[1],filled=True)
    
    [Figure output_11_0.png: plot of the second tree in the ensemble, bag[1]]

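    # Size of the training set
    X_train.shape

    (133, 13)

    # Number of rows each estimator draws when max_samples=0.7
    133*0.7

    93.1

    # Presumably the class counts shown at the root of a plotted tree;
    # they add up to the 93 rows drawn
    32+36+25

    93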
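The sample-size arithmetic above can also be checked programmatically instead of reading numbers off a plot. This is a minimal sketch assuming a fresh ensemble with `max_samples=0.7` and `max_features=0.5`; `n_estimators=10` and `random_state=0` are arbitrary. It relies on the fitted attributes `estimators_samples_` and `estimators_features_` of `BaggingClassifier`.

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

bag = BaggingClassifier(DecisionTreeClassifier(), n_estimators=10,
                        max_samples=0.7, max_features=0.5, random_state=0)
bag.fit(X_train, y_train)

# Row indices drawn (with replacement) for each estimator
n_drawn = [len(idx) for idx in bag.estimators_samples_]
# Class counts inside the first estimator's bootstrap sample
class_counts = np.bincount(y_train[bag.estimators_samples_[0]], minlength=3)
# Number of feature columns each estimator was given
n_features_used = [len(f) for f in bag.estimators_features_]

print("training rows:", X_train.shape[0])
print("rows drawn per estimator:", n_drawn)
print("class counts in first sample:", class_counts, "sum:", class_counts.sum())
print("features per estimator:", n_features_used)
```

With 133 training rows, one would expect each estimator to draw int(0.7 * 133) = 93 rows and the class counts of any one sample to sum to that number, which matches the hand calculation in the text.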
    
    # Plot the third tree. plt and tree must already be imported in the current
    # session, otherwise this cell fails with "NameError: name 'plt' is not defined"
    plt.figure(figsize=(9,9))
    _ = tree.plot_tree(bag[2],filled=True)
    
    import pandas as pd

    # Summary statistics of the training features
    pd.DataFrame(X_train).describe()
    

          Article link: https://www.haomeiwen.com/subject/obbvphtx.html