How to Use MLflow

Author: 北邮郭大宝 | Published: 2019-11-04 18:32


1. Introduction to MLflow

Quoting the introduction from the official website:

MLflow is an open source platform for managing the end-to-end machine learning lifecycle. It tackles three primary functions:

Tracking experiments to record and compare parameters and results (MLflow Tracking).

Packaging ML code in a reusable, reproducible form in order to share with other data scientists or transfer to production (MLflow Projects).

Managing and deploying models from a variety of ML libraries to a variety of model serving and inference platforms (MLflow Models).

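In short, MLflow is built around three concepts, each addressing a pain point in ML development.

Tracking: recording model parameters and metrics by hand is tedious. Tracking records a model's configuration and results and presents them visually.
Projects: model results are hard to reproduce. Projects uses conda to recreate the environment and dependencies a model needs, so its results can be reproduced.
Models: deploying a finished model is hard. Models packages and wraps the model and provides deployment tooling.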


2. Installing MLflow

According to the official website, a plain pip install is enough:

pip install mlflow

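However, this did not work for me on a Mac. The official website recommends an alternative approach for this case (shown as a screenshot in the original post, not reproduced here).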

I simply used virtualenv, which worked in my own testing. The steps are:

mkdir mlflow
cd mlflow

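Create a clean venv environment and activate it (older virtualenv releases needed --no-site-packages for isolation; it is now the default behavior and the flag has been removed)

virtualenv venv
source venv/bin/activate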

Install mlflow, scikit-learn, and any other packages you need

pip3 install mlflow
pip3 install scikit-learn
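
As a quick sanity check after the steps above, the following sketch reports whether the interpreter is running inside a virtual environment and whether the packages can be imported. The helper names in_virtualenv and is_installed are made up for this illustration, and the script uses only the standard library.

```python
import importlib.util
import sys


def in_virtualenv():
    """Return True when the interpreter runs inside a venv/virtualenv."""
    # Inside a virtual environment sys.prefix points at the environment,
    # while sys.base_prefix still points at the base installation.
    # (Very old virtualenv releases set sys.real_prefix instead.)
    base = getattr(sys, "base_prefix", sys.prefix)
    return hasattr(sys, "real_prefix") or sys.prefix != base


def is_installed(package):
    """Return True when `package` can be found on the import path."""
    return importlib.util.find_spec(package) is not None


if __name__ == "__main__":
    print("virtual environment active:", in_virtualenv())
    for name in ("mlflow", "sklearn"):
        print(name, "installed:", is_installed(name))
```

Run it with the same interpreter that pip3 installed into; if mlflow shows up as not installed, the environment was probably not activated.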

3. Tracking

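This example is adapted from the QuickStart on the official website, using the familiar Iris dataset as a demo of the relevant features.

3.1 Code

Create a folder named iris, put the Iris.csv data file in it, and create train.py there: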
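
If Iris.csv is not at hand, one way to produce a compatible file is to export the copy of the dataset bundled with scikit-learn. This is only a sketch: the feature column names below (SepalLengthCm and so on) are an assumption, since train.py only requires an Id column and a Species column holding Iris-setosa, Iris-versicolor, or Iris-virginica, and build_iris_frame is a hypothetical helper.

```python
import pandas as pd
from sklearn.datasets import load_iris


def build_iris_frame():
    """Build a DataFrame with the Id / features / Species layout train.py expects."""
    iris = load_iris()
    # Assumed feature column names; train.py never refers to them by name.
    columns = ["SepalLengthCm", "SepalWidthCm", "PetalLengthCm", "PetalWidthCm"]
    frame = pd.DataFrame(iris.data, columns=columns)
    # train.py drops an "Id" column, so add a 1-based row identifier.
    frame.insert(0, "Id", list(range(1, len(frame) + 1)))
    # target_names holds "setosa", "versicolor", "virginica"; add the prefix
    # that train.py matches on.
    frame["Species"] = ["Iris-" + str(iris.target_names[t]) for t in iris.target]
    return frame


if __name__ == "__main__":
    build_iris_frame().to_csv("Iris.csv", index=False)
```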

import sys

import pandas as pd
import numpy as np
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.linear_model import ElasticNet

import mlflow
import mlflow.sklearn


def eval_metrics(actual, pred):
    rmse = np.sqrt(mean_squared_error(actual, pred))
    mae = mean_absolute_error(actual, pred)
    r2 = r2_score(actual, pred)
    return rmse, mae, r2



if __name__ == "__main__":
    np.random.seed(40)

    # Read the Iris data from a local CSV file (it must contain an "Id"
    # column and a "Species" column next to the feature columns).
    csv_path = 'Iris.csv'
    try:
        data = pd.read_csv(csv_path, sep=',')
    except Exception as e:
        # Without the data there is nothing to train on, so stop here.
        sys.exit("Unable to read %s, Error: %s" % (csv_path, e))

    # Encode the three species names as the integers 0, 1, 2.
    species_codes = {'Iris-setosa': 0, 'Iris-versicolor': 1, 'Iris-virginica': 2}
    data['Species'] = data['Species'].map(species_codes).astype('int')

    # Split the data into training and test sets. (0.75, 0.25) split.
    train, test = train_test_split(data)

    # The predicted column is "Species", encoded as the integers 0, 1, 2
    train_x = train.drop(["Species", "Id"], axis=1)
    test_x = test.drop(["Species", "Id"], axis=1)
    train_y = train[["Species"]]
    test_y = test[["Species"]]

    alpha = float(sys.argv[1]) if len(sys.argv) > 1 else 0.5
    l1_ratio = float(sys.argv[2]) if len(sys.argv) > 2 else 0.5

    with mlflow.start_run():
        lr = ElasticNet(alpha=alpha, l1_ratio=l1_ratio, random_state=42)
        lr.fit(train_x, train_y)

        predicted_species = lr.predict(test_x)

        (rmse, mae, r2) = eval_metrics(test_y, predicted_species)

        print("Elasticnet model (alpha=%f, l1_ratio=%f):" % (alpha, l1_ratio))
        print("  RMSE: %s" % rmse)
        print("  MAE: %s" % mae)
        print("  R2: %s" % r2)

        mlflow.log_param("alpha", alpha)
        mlflow.log_param("l1_ratio", l1_ratio)
        mlflow.log_metric("rmse", rmse)
        mlflow.log_metric("r2", r2)
        mlflow.log_metric("mae", mae)

This code shows how to use the Tracking feature. As you can see, it takes very little code.

mlflow.log_param("key", value) logs a model parameter
mlflow.log_metric("key", value) logs a model metric
...

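Run the code:

python train.py 0.5 0.1  # 0.5 and 0.1 are alpha and l1_ratio

The Tracking results are recorded in an mlruns directory that is created under the current directory.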

3.2 Results

Run the following command in the iris directory, then open http://localhost:5000/#/ to see the results.

mlflow ui


4. Projects

In the iris directory, create an MLproject file and a conda.yaml file.

MLproject:


name: iris_model

conda_env: conda.yaml

entry_points:
  main:
    parameters:
      alpha: {type: float, default: 0.5}
      l1_ratio: {type: float, default: 0.1}
    command: "python train.py {alpha} {l1_ratio}"
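
To make the role of the parameters block concrete, here is a simplified illustration of how the defaults and any -P overrides end up in the command string. It is not MLflow's actual implementation, which does more (for example validating and quoting values); render_command and the two constants are hypothetical stand-ins for the MLproject entries above.

```python
# Stand-ins for the MLproject entries; not parsed from the real file.
DEFAULTS = {"alpha": 0.5, "l1_ratio": 0.1}
COMMAND_TEMPLATE = "python train.py {alpha} {l1_ratio}"


def render_command(overrides=None):
    """Fill the command template with defaults, letting overrides win."""
    params = dict(DEFAULTS)
    for name, value in (overrides or {}).items():
        # Both parameters are declared as float in the MLproject file.
        params[name] = float(value)
    return COMMAND_TEMPLATE.format(**params)


if __name__ == "__main__":
    print(render_command())                  # uses both defaults
    print(render_command({"alpha": "0.9"}))  # overrides alpha only
```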

conda.yaml:


name: iris_model
channels:
  - defaults
dependencies:
  - python=3.6
  - scikit-learn=0.19.1
  - pip:
    - mlflow>=1.0

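This records the environment the model needs, so the results can be reproduced by running the command below from inside the iris directory. If you do not want to use conda, make sure the required dependencies are already installed in the environment you run in, and add --no-conda to the command (newer MLflow releases use --env-manager=local instead).

mlflow run . -P alpha=0.5 -P l1_ratio=0.1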

5. Models

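I did not look at this part closely. It does not feel very general to me, because deployment is tightly coupled to the ML platform in use. Model developers usually do not care much about packaging and deployment. Platform developers, on the other hand, would have to give up some of their own custom requirements to adopt MLflow, and it does not fit well with things like integration with in-house business systems or distributed deployment.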

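6. Summary

The Tracking component of MLflow is genuinely useful, and I hope people who develop models will consider trying it. Projects and Models still need a closer look.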

7. References

https://www.mlflow.org/
