Introduction
The underlying principle of ensemble methods is that there is a strength to be found in unity. By combining multiple methods, each with its own pros and cons, more powerful models can be created.
The main reason for writing this article is not to explain how stacking works, but to demonstrate how you can use scikit-learn v0.22 to simplify stacking pipelines and create interesting models.
- Stacking
Although there are many great resources that introduce stacking, let me quickly get you up to speed.
Stacking is a technique that takes several regression or classification models and uses their predictions as the input for a meta-classifier/regressor.
In essence, stacking is an ensemble learning technique, much like Random Forests, that improves prediction quality by combining typically weaker models.
The image above gives a basic overview of the principle of stacking. A stack typically consists of many weak base learners, or a few stronger ones. The meta learner then learns from the prediction outputs of each base learner.
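To make the principle concrete, here is a minimal sketch of stacking done by hand: each base learner produces out-of-fold predictions via cross-validation, and the meta learner is trained on those predictions rather than on the raw features. (The dataset and model choices here are illustrative, not part of the original article.)

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Out-of-fold class-probability predictions from each base learner,
# so the meta learner never sees predictions made on training folds
rf_preds = cross_val_predict(RandomForestClassifier(random_state=42),
                             X, y, cv=5, method="predict_proba")
knn_preds = cross_val_predict(KNeighborsClassifier(),
                              X, y, cv=5, method="predict_proba")

# Stack the base-learner outputs side by side as meta features
meta_features = np.hstack([rf_preds, knn_preds])

# The meta learner is trained on predictions, not on the raw features
meta_learner = LogisticRegression(max_iter=1000).fit(meta_features, y)
```

This is exactly the bookkeeping (folds, out-of-fold predictions, feature stacking) that the new scikit-learn classes handle for you.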
- Sklearn Stacking
Although there are many packages that can be used for stacking, such as mlxtend and vecstack, this article covers the newly added stacking regressors and classifiers in the new release of scikit-learn.
First, we need to make sure to upgrade Scikit-learn to version 0.22:
pip install --upgrade scikit-learn
The first model that we are going to make is a classifier that can predict the species of flowers. The model is relatively simple: we use a Random Forest and k-Nearest Neighbors as our base learners and a Logistic Regression as our meta learner.
Coding stacking models can be quite tricky, as you have to account for the folds you want to generate and for cross-validation at different steps. Fortunately, the new scikit-learn version makes it possible to create the model shown above in just a few lines of code:
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
X, y = load_iris(return_X_y=True)
# Create Base Learners
base_learners = [
    ('rf', RandomForestClassifier(n_estimators=10, random_state=42)),
    ('knn', KNeighborsClassifier(n_neighbors=5))
]
# Initialize Stacking Classifier with the Meta Learner
clf = StackingClassifier(estimators=base_learners, final_estimator=LogisticRegression())
# Split the data, fit the stack, and score on the held-out test set
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=42)
clf.fit(X_train, y_train)
print(clf.score(X_test, y_test))
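The regression counterpart, `StackingRegressor`, works the same way. Here is a minimal sketch on the diabetes dataset; the dataset and the choice of base learners and meta learner are illustrative assumptions, not taken from the article:

```python
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor, StackingRegressor
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor

X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Base learners mirror the classification example; Ridge is the meta learner
base_learners = [
    ('rf', RandomForestRegressor(n_estimators=10, random_state=42)),
    ('knn', KNeighborsRegressor(n_neighbors=5))
]
reg = StackingRegressor(estimators=base_learners, final_estimator=Ridge())
reg.fit(X_train, y_train)
print(reg.score(X_test, y_test))  # R^2 on the held-out test set
```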