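This example trains an XGBoost model on Kaggle's Titanic getting-started competition.
With NNI, hyperparameter tuning takes only three steps: NNI automatically searches the parameter space you define, finds the best parameter combination, and presents the results clearly.
Step 1: Prepare the hyperparameter search space file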
search_space.json
{
  "max_depth": {"_type": "choice", "_value": [3, 4, 5, 6, 7, 8, 9]},
  "min_child_weight": {"_type": "choice", "_value": [1, 2, 3, 4, 5]},
  "gamma": {"_type": "choice", "_value": [0, 0.1, 0.2, 0.3, 0.4]},
  "subsample": {"_type": "choice", "_value": [0.6, 0.7, 0.8, 0.9, 1]},
  "colsample_bytree": {"_type": "choice", "_value": [0.6, 0.7, 0.8, 0.9, 1]},
  "reg_alpha": {"_type": "choice", "_value": [0, 1e-2, 0.1, 1]},
  "learning_rate": {"_type": "choice", "_value": [0.1, 0.001]}
}
For more details, see the documentation: How to define search space?
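To get a feel for how large this search space is, the following sketch counts the possible combinations and draws one random configuration. It is only an illustration: the JSON is inlined so the snippet runs on its own, and `grid_size` and `sample_once` are hypothetical helper names, not part of NNI.

```python
import json
import random
from math import prod

# Same content as search_space.json, inlined so the sketch is self-contained.
SEARCH_SPACE_JSON = """
{
  "max_depth": {"_type": "choice", "_value": [3, 4, 5, 6, 7, 8, 9]},
  "min_child_weight": {"_type": "choice", "_value": [1, 2, 3, 4, 5]},
  "gamma": {"_type": "choice", "_value": [0, 0.1, 0.2, 0.3, 0.4]},
  "subsample": {"_type": "choice", "_value": [0.6, 0.7, 0.8, 0.9, 1]},
  "colsample_bytree": {"_type": "choice", "_value": [0.6, 0.7, 0.8, 0.9, 1]},
  "reg_alpha": {"_type": "choice", "_value": [0, 1e-2, 0.1, 1]},
  "learning_rate": {"_type": "choice", "_value": [0.1, 0.001]}
}
"""


def grid_size(space):
    """Number of distinct combinations when every entry is a `choice`."""
    return prod(len(spec["_value"]) for spec in space.values())


def sample_once(space, rng):
    """Pick one value per hyperparameter (hypothetical helper, not an NNI API)."""
    return {name: rng.choice(spec["_value"]) for name, spec in space.items()}


space = json.loads(SEARCH_SPACE_JSON)
total = grid_size(space)
example = sample_once(space, random.Random(0))

print("combinations:", total)
print("one random configuration:", example)
```

An exhaustive grid search would have to evaluate every one of these combinations, which is why a tuner that proposes promising configurations is useful here.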
Step 2: Prepare the trial file
main.py
import logging

import nni
# Data analysis library
import pandas as pd
# Scientific computing library
import numpy as np
from xgboost import XGBClassifier
from sklearn import model_selection

LOG = logging.getLogger('xgboost_titanic')

def load_data():
    '''Load the Titanic training set and encode the features as numbers.'''
    data_train = pd.read_csv("titanic/train.csv")
    # Fill missing Age values with the median age
    data_train["Age"] = data_train["Age"].fillna(data_train["Age"].median())
    # Encode Sex: male -> 0, female -> 1. map() yields an integer column;
    # assigning through .loc would leave the column with object dtype.
    data_train["Sex"] = data_train["Sex"].map({"male": 0, "female": 1})
    # Fill missing Embarked values with 'S', the most frequent port
    data_train["Embarked"] = data_train["Embarked"].fillna("S")
    # Encode Embarked: S -> 0, C -> 1, Q -> 2
    data_train["Embarked"] = data_train["Embarked"].map({"S": 0, "C": 1, "Q": 2})
    features = ["Pclass", "Sex", "Age", "SibSp", "Parch", "Fare", "Embarked"]
    # Select the label as a 1-D Series, the shape scikit-learn expects for y
    X_train, y_train = data_train[features], data_train["Survived"]
    return X_train, y_train

def get_default_parameters():
    '''Return the default hyperparameters.'''
    params = {
        'max_depth': 3,
        'min_child_weight': 1,
        'gamma': 0,
        'subsample': 0.8,
        'colsample_bytree': 0.8,
        'reg_alpha': 0,
        'learning_rate': 0.1
    }
    return params

def get_model(PARAMS):
    '''Build an XGBClassifier from the given hyperparameters.'''
    # verbosity=0 replaces silent=True, and nthread=None is dropped: both
    # arguments are deprecated in current XGBoost releases.
    model = XGBClassifier(booster='gbtree', verbosity=0, random_state=42, base_score=0.5,
                          colsample_bylevel=1, n_estimators=100, reg_lambda=1,
                          objective='binary:logistic')
    # Apply the tuned hyperparameters on top of the fixed settings above
    model.set_params(
        max_depth=PARAMS.get("max_depth"),
        min_child_weight=PARAMS.get("min_child_weight"),
        gamma=PARAMS.get("gamma"),
        subsample=PARAMS.get("subsample"),
        colsample_bytree=PARAMS.get("colsample_bytree"),
        reg_alpha=PARAMS.get("reg_alpha"),
        learning_rate=PARAMS.get("learning_rate"),
    )
    return model

def run(X_train, y_train, model):
    '''Cross-validate the model and report the mean score to NNI.'''
    # 10-fold cross-validation. random_state is omitted: it has no effect
    # without shuffling, and recent scikit-learn versions reject the combination.
    kf = model_selection.KFold(n_splits=10, shuffle=False)
    scores = model_selection.cross_val_score(model, X_train, y_train, cv=kf)
    print(scores)
    score = scores.mean()
    LOG.debug('score: %s', score)
    nni.report_final_result(score)

if __name__ == '__main__':
    X_train, y_train = load_data()
    try:
        # Get one set of hyperparameters from the tuner
        RECEIVED_PARAMS = nni.get_next_parameter()
        LOG.debug(RECEIVED_PARAMS)
        # Start from the defaults and override them with the tuner's values
        PARAMS = get_default_parameters()
        PARAMS.update(RECEIVED_PARAMS)
        LOG.debug(PARAMS)
        model = get_model(PARAMS)
        run(X_train, y_train, model)
    except Exception as exception:
        LOG.exception(exception)
        raise
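The way the `__main__` block combines defaults with the tuner's values can be tried without NNI. The sketch below is a minimal illustration: `merge_parameters` is a hypothetical helper name, and the `received` dict is a made-up stand-in for whatever `nni.get_next_parameter()` would return in a real trial.

```python
def get_default_parameters():
    """Same defaults as in main.py."""
    return {
        'max_depth': 3,
        'min_child_weight': 1,
        'gamma': 0,
        'subsample': 0.8,
        'colsample_bytree': 0.8,
        'reg_alpha': 0,
        'learning_rate': 0.1,
    }


def merge_parameters(received):
    """Overlay tuner-provided values on the defaults (hypothetical helper)."""
    params = get_default_parameters()
    params.update(received)
    return params


# Made-up stand-in for a tuner result; a real trial receives all seven keys.
received = {"max_depth": 7, "learning_rate": 0.001}
params = merge_parameters(received)
print(params)
```

Keys supplied by the tuner win, and any key it does not supply keeps its default, so the script still has a complete parameter set either way.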
To use NNI, the code needs changes in three places:
1. Import the NNI package
import nni
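2. Get the parameters
The following line
RECEIVED_PARAMS = nni.get_next_parameter()
fetches one set of hyperparameters from the tuner. RECEIVED_PARAMS is a dict that maps each hyperparameter name to its chosen value. With the search space above it might look like:
{"max_depth": 5, "min_child_weight": 2, "gamma": 0.1, "subsample": 0.8, "colsample_bytree": 0.9, "reg_alpha": 0.01, "learning_rate": 0.1}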
3. Report results to NNI
The call
nni.report_intermediate_result(accuracy)
sends accuracy to the assessor. The main.py above makes only the final-result call.
The call
nni.report_final_result(accuracy)
sends accuracy to the tuner.
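Notes:
- accuracy: the metric used to evaluate model performance; it can be a single number.
- assessor: decides when a trial should stop early, based on how that trial's performance changes (the intermediate results of one trial).
- tuner: generates the next set of parameters or the next architecture, based on the performance of past trials (the final results of all trials).
For more details, see the documentation: Write a Trial Run on NNI.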
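The difference between the two reporting calls is easiest to see in a training loop. The sketch below uses local stand-in functions that only record what they receive, so it runs without NNI; in a real trial they would be `nni.report_intermediate_result` and `nni.report_final_result`. The accuracy values are made-up placeholders, not measured results.

```python
reported = []  # records (kind, value) pairs in call order


def report_intermediate_result(value):
    """Stand-in for nni.report_intermediate_result; only records the value."""
    reported.append(("intermediate", value))


def report_final_result(value):
    """Stand-in for nni.report_final_result; only records the value."""
    reported.append(("final", value))


# Placeholder per-epoch accuracies standing in for a real training loop.
PLACEHOLDER_ACCURACIES = [0.60, 0.70, 0.75]


def run_trial():
    accuracy = None
    for accuracy in PLACEHOLDER_ACCURACIES:
        # Once per epoch: this is what the assessor would look at
        report_intermediate_result(accuracy)
    # Once per trial: this is what the tuner would look at
    report_final_result(accuracy)


run_trial()
for kind, value in reported:
    print(kind, value)
```

Intermediate results are reported repeatedly during one trial, while the final result is reported exactly once when the trial ends.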
Step 3: Prepare the config file
config.yml
authorName: default
experimentName: xgb-classification
trialConcurrency: 1
maxExecDuration: 2h
maxTrialNum: 10000
#choice: local, remote
trainingServicePlatform: local
searchSpacePath: search_space.json
#choice: true, false
useAnnotation: false
tuner:
  #choice: TPE, Random, Anneal, Evolution
  builtinTunerName: TPE
  classArgs:
    #choice: maximize, minimize
    optimize_mode: maximize
trial:
  command: python3 main.py
  codeDir: .
  gpuNum: 0
For more details, see the documentation: Experiment config reference.
Results

