I. Bagging
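Bagging is a representative of one of the two main families of ensemble learning algorithms. The other family is boosting, whose well-known implementations include XGBoost. The main difference is that in bagging the base learners do not depend on one another and can be trained in parallel, whereas in boosting each learner depends strongly on the ones before it, so the learners have to be trained serially.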
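The bagging algorithm: (1) Bagging can be used for both regression and classification problems. (2) It randomly draws n samples with replacement from the original data and repeats this s times, which gives s training sets. Each training set is used to train one weak learner, so s weak learners are produced in total. For classification, the final prediction is decided by a vote among these learners (the class that receives the most votes is the prediction); for regression, the predictions are averaged. Random forest is a representative example.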
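The two steps above (bootstrap sampling, then majority voting) can be sketched by hand. This is only a minimal illustration, not how scikit-learn implements bagging; the helper names `fit_bagging` and `predict_bagging`, the choice of 25 trees, and the seeds are assumptions made for the example.

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier


def fit_bagging(X, y, n_estimators=25, seed=0):
    """Train one decision tree per bootstrap sample (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    n = len(X)
    models = []
    for _ in range(n_estimators):
        # Bootstrap: draw n row indices with replacement.
        idx = rng.integers(0, n, size=n)
        models.append(DecisionTreeClassifier(random_state=0).fit(X[idx], y[idx]))
    return models


def predict_bagging(models, X):
    """Majority vote over the predictions of all models (hypothetical helper)."""
    # votes has shape (n_estimators, n_samples); labels are small non-negative ints.
    votes = np.stack([m.predict(X) for m in models]).astype(int)
    # For each sample (one column of votes), keep the most frequent label.
    return np.array([np.bincount(col).argmax() for col in votes.T])


X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

models = fit_bagging(X_train, y_train)
y_pred = predict_bagging(models, X_test)
accuracy = (y_pred == y_test).mean()
print(accuracy)
```

For a regression problem the voting step would be replaced by averaging the predictions of the individual models.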
II. Code implementation
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import BaggingClassifier
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
# return_X_y must be passed by keyword in recent scikit-learn versions
X, y = datasets.load_wine(return_X_y=True)
X_train,X_test,y_train,y_test = train_test_split(X,y,random_state = 10244021)
# A single model: accuracy is about 71%
knn = KNeighborsClassifier()
knn.fit(X_train,y_train)
knn.score(X_test,y_test)
0.7111111111111111
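# An ensemble of 100 models: accuracy rises to about 77.8%
knn = KNeighborsClassifier()
# The bag holds 100 KNN models. The base estimator is passed positionally because
# the keyword was renamed from base_estimator to estimator in scikit-learn 1.2.
bag = BaggingClassifier(knn, n_estimators=100, max_samples=0.8, max_features=0.7)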
bag.fit(X_train,y_train)
bag.score(X_test,y_test)
0.7777777777777778
import warnings
warnings.filterwarnings('ignore')
lr = LogisticRegression()
lr.fit(X_train,y_train)
lr.score(X_test,y_test)
0.9555555555555556
# Bagging with 100 logistic regression models (base estimator passed positionally)
bag = BaggingClassifier(LogisticRegression(), n_estimators=100,
                        max_samples=0.7, max_features=0.5)
bag.fit(X_train,y_train)
bag.score(X_test,y_test)
0.9555555555555556
clf = DecisionTreeClassifier()
clf.fit(X_train,y_train)
clf.score(X_test,y_test)
0.9333333333333333
# Bagging with 100 decision trees (base estimator passed positionally)
bag = BaggingClassifier(DecisionTreeClassifier(), n_estimators=100,
                        max_samples=1.0, max_features=0.5)
bag.fit(X_train,y_train)
bag.score(X_test,y_test)
0.9777777777777777
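Section I names random forest as a representative bagging method. As a rough point of comparison with the bagged decision trees above, the following sketch trains scikit-learn's `RandomForestClassifier` on the same dataset. The `random_state=0` values are assumptions chosen for reproducibility, so the split differs from the one used above and the score is not directly comparable to the numbers shown.

```python
from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# 100 trees, each trained on a bootstrap sample; in addition, every split
# considers only a random subset of the features.
rf = RandomForestClassifier(n_estimators=100, random_state=0)
rf.fit(X_train, y_train)
rf_score = rf.score(X_test, y_test)
print(rf_score)
```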
import matplotlib.pyplot as plt
from sklearn import tree
plt.figure(figsize=(9,9))
_ = tree.plot_tree(bag[0],filled=True)
[Figure output_10_0.png: plot of the first decision tree in the ensemble, bag[0]]
plt.figure(figsize=(9,9))
_ = tree.plot_tree(bag[1],filled=True)
[Figure output_11_0.png: plot of the second decision tree in the ensemble, bag[1]]
X_train.shape
(133, 13)
# 70% of the 133 training rows
133*0.7
93.1
32+36+25
93
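The arithmetic above (133 × 0.7 ≈ 93) can also be checked against a fitted ensemble instead of by hand. The sketch below assumes a bag trained with `max_samples=0.7` and `max_features=0.5`; the 10 estimators and the `random_state=0` values are choices made for this example only. It reads the fitted attributes `estimators_samples_` and `estimators_features_` to count how many rows and columns each estimator received.

```python
from sklearn.datasets import load_wine
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

bag_check = BaggingClassifier(DecisionTreeClassifier(), n_estimators=10,
                              max_samples=0.7, max_features=0.5, random_state=0)
bag_check.fit(X_train, y_train)

# Number of row indices drawn (with replacement) for each estimator.
n_drawn = [len(idx) for idx in bag_check.estimators_samples_]
# Number of feature columns assigned to each estimator.
n_feats = [len(cols) for cols in bag_check.estimators_features_]
print(X_train.shape, n_drawn, n_feats)
```

Because the rows are drawn with replacement, the number of distinct rows an estimator sees is usually smaller than the number of indices drawn.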
plt.figure(figsize=(9,9))
_ = tree.plot_tree(bag[2],filled=True)
import pandas as pd
# Summary statistics of the 13 features
pd.DataFrame(X).describe()