Tuning an xgboost model with the AutoML tool NNI

Author: cuizixin | Published: 2019-01-25 11:25

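    This post trains an xgboost model on the Kaggle Titanic getting-started competition.
    With NNI, tuning takes only three steps: NNI automatically searches the parameter space you define, finds the best parameter combination, and presents the results clearly.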

    Step 1: Prepare the hyperparameter search space file

    search_space.json

    {
      "max_depth": {"_type": "choice", "_value": [3,4,5,6,7,8,9]},
      "min_child_weight": {"_type":"choice", "_value": [1,2,3,4,5]},
      "gamma": {"_type": "choice", "_value": [0,0.1,0.2,0.3,0.4]},
      "subsample": {"_type": "choice", "_value": [0.6,0.7,0.8,0.9,1]},
      "colsample_bytree": {"_type": "choice", "_value": [0.6,0.7,0.8,0.9,1]},
      "reg_alpha": {"_type": "choice", "_value": [0, 1e-2, 0.1, 1]},
      "learning_rate": {"_type": "choice", "_value": [0.1,0.001]}
    }
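
    To get a feel for how large this space is, the sketch below (standard library only) embeds the same JSON, multiplies the number of options of each `choice` parameter, and draws one random configuration, roughly what a random tuner proposes for a single trial. It is an illustration, not NNI code; `grid_size` and `sample_config` are hypothetical helper names.

```python
import json
import math
import random

# Same content as search_space.json above
SEARCH_SPACE_JSON = """
{
  "max_depth": {"_type": "choice", "_value": [3,4,5,6,7,8,9]},
  "min_child_weight": {"_type":"choice", "_value": [1,2,3,4,5]},
  "gamma": {"_type": "choice", "_value": [0,0.1,0.2,0.3,0.4]},
  "subsample": {"_type": "choice", "_value": [0.6,0.7,0.8,0.9,1]},
  "colsample_bytree": {"_type": "choice", "_value": [0.6,0.7,0.8,0.9,1]},
  "reg_alpha": {"_type": "choice", "_value": [0, 1e-2, 0.1, 1]},
  "learning_rate": {"_type": "choice", "_value": [0.1,0.001]}
}
"""


def grid_size(space):
    """Number of distinct configurations when every parameter is a 'choice'."""
    return math.prod(len(spec["_value"]) for spec in space.values())


def sample_config(space, rng):
    """Pick one option per parameter uniformly at random (random-search style)."""
    return {name: rng.choice(spec["_value"]) for name, spec in space.items()}


space = json.loads(SEARCH_SPACE_JSON)
print(grid_size(space))  # 35000
print(sample_config(space, random.Random(0)))
```

    7 × 5 × 5 × 5 × 5 × 4 × 2 = 35,000 combinations is more than the 10,000 `maxTrialNum` set in the config file, and the 2-hour `maxExecDuration` cuts the budget further, so trying every combination is impractical; a tuner that learns from earlier trials, such as TPE, is meant to find good regions with far fewer trials.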
    

    For more details, see the documentation: How to define search space?

    Step 2: Prepare the trial file

    main.py

    import nni
    import logging
    # data analysis library
    import pandas as pd
    # scientific computing library
    import numpy as np
    from xgboost import XGBClassifier
    from sklearn import model_selection
    
    LOG = logging.getLogger('xgboost_classification')
    
    def load_data():
        '''Load dataset'''
        data_train = pd.read_csv("titanic/train.csv")
    
        # Fill missing Age values with the median age
        data_train["Age"] = data_train["Age"].fillna(data_train["Age"].median())
        # Encode Sex: male -> 0, female -> 1 (map() yields a numeric column;
        # xgboost rejects object-dtype columns)
        data_train["Sex"] = data_train["Sex"].map({"male": 0, "female": 1})
        # Fill missing Embarked values with the most frequent port, S
        data_train["Embarked"] = data_train["Embarked"].fillna("S")
        # Encode the ports of embarkation as 0, 1, 2
        data_train["Embarked"] = data_train["Embarked"].map({"S": 0, "C": 1, "Q": 2})
    
        features = ["Pclass", "Sex", "Age", "SibSp", "Parch", "Fare", "Embarked"]
        # A single column name gives a 1-D label Series, as scikit-learn expects
        label = "Survived"
    
        X_train, y_train = data_train[features], data_train[label]
        return X_train, y_train
    
    
    def get_default_parameters():
        '''get default parameters'''
        params = {
            'max_depth': 3,
            'min_child_weight': 1,
            'gamma': 0,
            'subsample': 0.8,
            'colsample_bytree': 0.8,
            'reg_alpha': 0,
            'learning_rate': 0.1
        }
        return params
    
    
    def get_model(PARAMS):
        '''Build an XGBClassifier from fixed settings plus the tuned PARAMS'''
        # Settings that are not tuned stay fixed here
        model = XGBClassifier(booster='gbtree', random_state=42, base_score=0.5,
                              colsample_bylevel=1, n_estimators=100, reg_lambda=1,
                              objective='binary:logistic')
        # Apply the tuned hyperparameters (max_depth, min_child_weight, gamma,
        # subsample, colsample_bytree, reg_alpha, learning_rate)
        model.set_params(**PARAMS)
    
        return model
    
    
    def run(X_train, y_train, model):
        '''Train model and predict result'''
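        # 10-fold cross-validation on contiguous folds (no shuffling).
        # random_state only applies when shuffle=True; recent scikit-learn
        # versions raise an error if it is set together with shuffle=False.
        kf = model_selection.KFold(n_splits=10, shuffle=False)
        scores = model_selection.cross_val_score(model, X_train, y_train, cv=kf)
        print(scores)
        score = scores.mean()
        LOG.debug('score: %s', score)
        # Report the mean accuracy to the tuner as this trial's final result
        nni.report_final_result(score)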
    
    
    if __name__ == '__main__':
        X_train, y_train = load_data()
    
        try:
            # get parameters from tuner
            RECEIVED_PARAMS = nni.get_next_parameter()
            LOG.debug(RECEIVED_PARAMS)
            PARAMS = get_default_parameters()
            PARAMS.update(RECEIVED_PARAMS)
            LOG.debug(PARAMS)
            model = get_model(PARAMS)
            run(X_train, y_train, model)
        except Exception as exception:
            LOG.exception(exception)
            raise
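
    With `shuffle=False`, `KFold` cuts the rows into 10 contiguous folds, and `cross_val_score` trains on 9 of them and scores on the remaining one, 10 times over. The standard-library sketch below mimics that splitting rule (every fold gets `n // k` rows and the first `n % k` folds get one extra, which is how scikit-learn sizes `KFold` folds) for the 891 rows of Kaggle's Titanic `train.csv`. `contiguous_kfold` is a hypothetical helper, not part of scikit-learn.

```python
def contiguous_kfold(n_samples, n_splits):
    """Yield (train_idx, test_idx) pairs like an unshuffled K-fold split."""
    # Every fold gets n_samples // n_splits rows; the first
    # n_samples % n_splits folds get one extra row.
    fold_sizes = [n_samples // n_splits] * n_splits
    for i in range(n_samples % n_splits):
        fold_sizes[i] += 1
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        # Training rows are everything before and after the test fold
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size


folds = list(contiguous_kfold(891, 10))
print([len(test) for _, test in folds])  # [90, 89, 89, 89, 89, 89, 89, 89, 89, 89]
```

    Each fold yields one accuracy value; `run()` averages the 10 values with `scores.mean()` and reports that single number to NNI.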
    

    To use NNI, the trial code needs three changes:

    1. Import the NNI package
    import nni
    
    2. Get the parameters

    The following line of code

    RECEIVED_PARAMS = nni.get_next_parameter()
    

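    retrieves one set of hyperparameters from the tuner. RECEIVED_PARAMS is a dict mapping parameter names to values, for example (these keys come from a different model; in this experiment they are the keys of search_space.json):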

    {"conv_size": 2, "hidden_size": 124, "learning_rate": 0.0307, "dropout_rate": 0.2029}
    
    3. Report results to NNI

    Use the following API

    nni.report_intermediate_result(accuracy)
    

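    to send accuracy to the assessor as an intermediate result. This example does not call it in main.py and configures no assessor in config.yml.
    Use the following API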

    nni.report_final_result(accuracy)
    

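    to send accuracy to the tuner as the trial's final result.
    Notes:

    • accuracy: a metric of model performance, reported as a number. In main.py it is the mean 10-fold cross-validation accuracy.
    • assessor: decides, from how a single trial's performance evolves (its intermediate results), when to stop that trial early.
    • tuner: generates the next set of parameters or architecture from the performance of past trials (the final results of all trials).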
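
    To make the division of labor concrete, here is a toy, NNI-free simulation of the loop: a stand-in tuner proposes parameters, the trial merges them into its defaults the same way main.py does with `PARAMS.update(RECEIVED_PARAMS)`, and reports one final number that the tuner uses to keep the best configuration (as with `optimize_mode: maximize`). `FakeTrialAPI`, `toy_score` and the random proposal strategy are all hypothetical; real NNI tuners such as TPE are more sophisticated and launch `python3 main.py` once per trial.

```python
import random


class FakeTrialAPI:
    """Hypothetical stand-in for the two nni calls used in main.py (not NNI code)."""

    def __init__(self, proposal):
        self._proposal = proposal
        self.final_result = None

    def get_next_parameter(self):
        return dict(self._proposal)

    def report_final_result(self, metric):
        self.final_result = metric


def get_default_parameters():
    # Same defaults as main.py
    return {'max_depth': 3, 'min_child_weight': 1, 'gamma': 0, 'subsample': 0.8,
            'colsample_bytree': 0.8, 'reg_alpha': 0, 'learning_rate': 0.1}


def toy_score(params):
    # Hypothetical objective standing in for the cross-validated accuracy
    return 1.0 - 0.01 * abs(params['max_depth'] - 5) - 0.1 * params['gamma']


def trial(api):
    params = get_default_parameters()
    params.update(api.get_next_parameter())  # tuner values override defaults
    api.report_final_result(toy_score(params))
    return params


def random_tuner(space, n_trials, seed=0):
    """Propose random configurations; keep the one with the highest final result."""
    rng = random.Random(seed)
    best_score, best_params = None, None
    for _ in range(n_trials):
        api = FakeTrialAPI({k: rng.choice(v) for k, v in space.items()})
        params = trial(api)
        if best_score is None or api.final_result > best_score:
            best_score, best_params = api.final_result, params
    return best_score, best_params


space = {'max_depth': [3, 4, 5, 6, 7, 8, 9], 'gamma': [0, 0.1, 0.2, 0.3, 0.4]}
print(random_tuner(space, n_trials=20))
```

    Only `max_depth` and `gamma` are tuned here, so every other key keeps its default value, which is exactly why main.py starts from `get_default_parameters()` before applying what the tuner sends.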

    For more details, see the documentation: Write a Trial Run on NNI

    Step 3: Prepare the config file

    config.yml

    authorName: default
    experimentName: xgb-classification
    trialConcurrency: 1
    maxExecDuration: 2h
    maxTrialNum: 10000
    #choice: local, remote
    trainingServicePlatform: local
    searchSpacePath: search_space.json
    #choice: true, false
    useAnnotation: false
    tuner:
      #choice: TPE, Random, Anneal, Evolution
      builtinTunerName: TPE
      classArgs:
        #choice: maximize, minimize
        optimize_mode: maximize
    trial:
      command: python3 main.py
      codeDir: .
      gpuNum: 0
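
    The experiment stops launching trials once it reaches either budget: `maxTrialNum` trials or `maxExecDuration` of wall-clock time, written as a number followed by a unit (`s`, `m`, `h` or `d`). With `trialConcurrency: 1`, trials run one after another, so the 2-hour limit is the binding one whenever a trial takes longer than 7200 / 10000 = 0.72 s. The sketch below models only that arithmetic; `parse_duration` and `max_trials_in_budget` are hypothetical helpers, not NNI functions, and the per-trial time is an assumed input.

```python
import re

UNIT_SECONDS = {'s': 1, 'm': 60, 'h': 3600, 'd': 86400}


def parse_duration(text):
    """Convert strings like '2h' or '30m' to seconds (hypothetical helper)."""
    match = re.fullmatch(r'(\d+)([smhd])', text.strip())
    if match is None:
        raise ValueError(f'unsupported duration: {text!r}')
    return int(match.group(1)) * UNIT_SECONDS[match.group(2)]


def max_trials_in_budget(max_trial_num, max_exec_duration, seconds_per_trial,
                         concurrency=1):
    """Upper bound on completed trials under both budgets (simplified model)."""
    # Each concurrent slot runs its trials back to back until time runs out
    by_time = int(parse_duration(max_exec_duration) // seconds_per_trial) * concurrency
    return min(max_trial_num, by_time)


print(parse_duration('2h'))                  # 7200
print(max_trials_in_budget(10000, '2h', 5))  # 1440, assuming 5 s per trial
```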
    

    For more details, see the documentation: Experiment config reference

    Results

    [Figure: Results 1]
    [Figure: Results 2]

          Article link: https://www.haomeiwen.com/subject/xfbujqtx.html