Fateflow

Author: Jayce_xi | Published: 2020-09-10 18:22

1. Introduction

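FATE-Flow is an end-to-end Pipeline system for federated learning. It consists of a set of highly flexible components designed for high-performance federated learning tasks, covering data processing, modeling, training, validation, publishing, and online inference.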


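Currently supported features:

  • Define a Pipeline as a DAG;
  • Describe the DAG with FATE-DSL (a domain-specific language) in JSON format;
  • FATE ships with a large set of default federated learning components, such as Hetero LR, Homo LR, and Secure Boosting Tree;
  • Developers can implement custom components with a few basic APIs and build their own Pipeline through the DSL;
  • A lifecycle manager for federated modeling jobs: start/stop, status synchronization, and so on;
  • Powerful federated scheduling management, supporting multiple scheduling strategies for DAG jobs and component tasks;
  • Real-time tracking of data, parameters, models, and metrics at runtime;
  • A federated model manager: model binding, version control, and deployment tools;
  • An HTTP API and a command-line interface;
  • Visualization support: modeling can be visualized on FATE-Board.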
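
To make the "DAG described in JSON" idea concrete, here is a minimal Python sketch. The DSL fragment is an assumption: it reuses the component names dataio_0 and hetero_lr_0 that appear in this article's job configuration, and the key layout (components, module, input, output) reflects my understanding of the v1 DSL and may differ between FATE versions. The helpers upstream_components and execution_order are hypothetical and not part of FATE-Flow; they only show how a scheduler could derive a run order from such a file.

```python
import json
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical DSL fragment; the key layout is an assumption about the v1 DSL.
DSL_TEXT = """
{
  "components": {
    "dataio_0": {
      "module": "DataIO",
      "input": {"data": {"data": ["args.train_data"]}},
      "output": {"data": ["train"], "model": ["dataio"]}
    },
    "hetero_lr_0": {
      "module": "HeteroLR",
      "input": {"data": {"train_data": ["dataio_0.train"]}},
      "output": {"data": ["train"], "model": ["hetero_lr"]}
    }
  }
}
"""


def upstream_components(component: dict) -> set:
    """Collect the names of components whose outputs this component consumes."""
    deps = set()
    for group in component.get("input", {}).values():
        # An input group is either a dict of lists or a plain list of references.
        ref_lists = group.values() if isinstance(group, dict) else [group]
        for ref_list in ref_lists:
            for ref in ref_list:
                source = ref.split(".", 1)[0]
                if source != "args":  # "args.*" refers to job input, not a component
                    deps.add(source)
    return deps


def execution_order(dsl: dict) -> list:
    """Return component names so that each one comes after its upstream components."""
    graph = {name: upstream_components(spec) for name, spec in dsl["components"].items()}
    return list(TopologicalSorter(graph).static_order())


order = execution_order(json.loads(DSL_TEXT))
print(order)  # → ['dataio_0', 'hetero_lr_0']
```

A topological sort is used here only because it is the simplest way to turn a dependency graph into a valid run order; the actual scheduling strategies in FATE-Flow are more involved.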

2. Architecture diagram

3. Usage

FATE-Flow provides a REST API and a command-line interface. Let's start by using the client to run a federated learning Pipeline job (standalone version).
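
The same operations can in principle be driven over the REST API instead of the CLI. The sketch below only builds the HTTP request and does not send it. The host, the port 9380, the path /v1/job/submit, and the payload keys job_dsl and job_runtime_conf are all assumptions based on my reading of FATE 1.x, so check them against doc/fate_flow_rest_api.rst in your version. The function build_submit_request is a hypothetical helper.

```python
import json
import urllib.request

# Assumption: FATE-Flow listens on 127.0.0.1:9380 and serves its API under /v1.
FATE_FLOW_URL = "http://127.0.0.1:9380/v1"


def build_submit_request(dsl: dict, runtime_conf: dict,
                         base_url: str = FATE_FLOW_URL) -> urllib.request.Request:
    """Build (but do not send) a POST request that submits a job."""
    # Assumption: the server expects the DSL and the runtime conf under these two keys.
    payload = {"job_dsl": dsl, "job_runtime_conf": runtime_conf}
    return urllib.request.Request(
        url=f"{base_url}/job/submit",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_submit_request(
    {"components": {}},
    {"initiator": {"role": "guest", "party_id": 9999}},
)
print(request.get_method(), request.full_url)

# To send it against a running server:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```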

Offline modeling

  1. Upload data:
    Guest side:
$ pwd 
/XXX/FATE
$ cat fate_flow/examples/upload_guest.json
{
  "file": "examples/data/breast_hetero_guest.csv",
  "head": 1,
  "partition": 10,
  "work_mode": 0,
  "namespace": "fate_flow_test_breast_hetero",
  "table_name": "breast_hetero_guest"
}
$ cat examples/data/breast_hetero_guest.csv
id,y,x0,x1,x2,x3,x4,x5,x6,x7,x8,x9
133,1,0.254879,-1.046633,0.209656,0.074214,-0.441366,-0.377645,-0.485934,0.347072,-0.287570,-0.733474
273,1,-1.142928,-0.781198,-1.166747,-0.923578,0.628230,-1.021418,-1.111867,-0.959523,-0.096672,-0.121683
175,1,-1.451067,-1.406518,-1.456564,-1.092337,-0.708765,-1.168557,-1.305831,-1.745063,-0.499499,-0.302893
551,1,-0.879933,0.420589,-0.877527,-0.780484,-1.037534,-0.483880,-0.555498,-0.768581,0.433960,-0.200928
199,0,0.426758,0.723479,0.316885,0.287273,1.000835,0.962702,1.077099,1.053586,2.996525,0.961696
274,0,0.963102,1.467675,0.829202,0.772457,-0.038076,-0.468613,-0.307946,-0.015321,-0.641864,-0.247477
420,1,-0.662496,0.212149,-0.620475,-0.632995,-0.327392,-0.385278,-0.077665,-0.730362,0.217178,-0.061280
76,1,-0.453343,-2.147457,-0.473631,-0.483572,0.558093,-0.740244,-0.896170,-0.617229,-0.308601,-0.666975
315,1,-0.606584,-0.971725,-0.678558,-0.591332,-0.963013,-1.302401,-1.212855,-1.321154,-1.591501,-1.230554
399,1,-0.583805,-0.193332,-0.633283,-0.560041,-0.349310,-0.519504,-0.610669,-0.929526,-0.196974,-0.151608
238,1,-0.107515,2.420311,-0.141817,-0.204943,-1.063835,-0.074206,0.164131,-0.493589,-1.635181,-0.331709
246,1,-0.482335,0.348938,-0.565371,-0.489725,-0.976164,-0.658182,-0.203360,-0.988301,-0.216387,-0.663096
253,0,0.741523,-0.095626,0.704101,0.600181,0.404667,-0.087565,0.314773,1.082516,0.383809,-0.156041
550,1,-0.954483,-0.147736,-0.988330,-0.823201,-1.414523,-1.150045,-1.305831,-1.745063,-0.716282,-0.998915
208,1,-0.356014,0.567149,-0.231770,-0.424155,0.110966,1.182806,0.211146,-0.030548,1.985412,1.310816
185,1,-0.910995,-0.732345,-0.949311,-0.779780,0.864944,-0.969255,-1.272632,-1.586402,0.052164,-0.386571
156,0,0.869914,-0.092369,0.763673,0.740814,0.413434,0.607736,0.413122,0.561767,-0.708193,-0.363850
0,0,1.886690,-1.359293,2.303601,2.001237,1.307686,2.616665,2.109526,2.296076,2.750622,1.937015
.....
$ python fate_flow_client.py -f upload -c examples/upload_guest.json

Host side:

$  cat fate_flow/examples/upload_host.json
{
  "file": "examples/data/breast_hetero_host.csv",
  "head": 1,
  "partition": 10,
  "work_mode": 0,
  "namespace": "fate_flow_test_breast_hetero",
  "table_name": "breast_hetero_host"
}

$ cat examples/data/breast_hetero_host.csv
id,x0,x1,x2,x3,x4,x5,x6,x7,x8,x9,x10,x11,x12,x13,x14,x15,x16,x17,x18,x19
133,0.449512,-1.247226,0.413178,0.303781,-0.123848,-0.184227,-0.219076,0.268537,0.015996,-0.789267,-0.337360,-0.728193,-0.442587,-0.272757,-0.608018,-0.577235,-0.501126,0.143371,-0.466431,-0.554102
273,-1.245485,-0.842317,-1.255026,-1.038066,-0.426301,-1.088781,-0.976392,-0.898898,0.983496,0.045702,-0.493639,0.348620,-0.552483,-0.526877,2.253098,-0.827620,-0.780739,-0.376997,-0.310239,0.176301
175,-1.549664,-1.126219,-1.546652,-1.216392,-0.354424,-1.167051,-1.114873,-1.261820,-0.327193,0.629755,-0.666881,-0.779358,-0.708418,-0.637545,0.710369,-0.976454,-1.057501,-1.913447,0.795207,-0.149751
551,-0.851273,0.733108,-0.843535,-0.786363,-0.049836,-0.424532,-0.509221,-0.679649,0.797298,0.385927,-0.451772,0.453852,-0.431696,-0.494754,-1.182041,0.281228,0.084759,-0.252420,1.038575,0.351054
199,0.091654,0.216499,0.103839,-0.034667,0.167930,0.308132,0.366614,0.280661,0.505223,0.264013,-0.707304,-1.026834,-0.702973,-0.460212,-0.999033,-0.531406,-0.394360,-0.728830,-0.644416,-0.688003
274,1.080023,1.207830,0.956888,0.978402,-0.555822,-0.645696,-0.399365,-0.038153,-0.998966,-1.091216,0.057848,0.392164,-0.050027,0.120414,-0.532348,-0.770613,-0.519694,-0.531097,-0.769127,-0.394858
420,-0.726307,-0.058095,-0.731910,-0.697343,-0.775723,-0.513983,-0.426233,-0.893482,0.800949,-0.018090,-0.428673,0.404865,-0.326750,-0.440850,0.079010,-0.279903,0.416992,-0.486165,-0.225484,-0.172446
76,-0.169639,-1.943019,-0.167192,-0.272150,2.329937,0.006804,-0.251467,0.429234,2.159100,0.512094,0.017786,-0.368046,-0.105966,-0.169129,2.119760,0.162743,-0.672216,-0.577002,0.626908,0.896114
315,-0.465014,-0.567723,-0.526371,-0.492852,-0.800631,-1.250816,-1.058714,-1.096145,-2.178221,-0.860147,-0.843011,-0.910353,-0.900490,-0.608283,-0.704355,-1.255622,-0.970629,-1.363557,-0.800607,-0.927058
399,-0.660984,-0.472313,-0.688248,-0.634204,-0.390718,-0.796360,-0.756680,-0.839314,0.129175,-0.369656,-0.221505,-0.139439,-0.317344,-0.336122,-0.526014,-0.326291,-0.368166,-1.037840,-0.698901,-0.273818
238,0.026330,1.992051,0.023930,-0.088136,-1.005588,-0.008357,0.269940,-0.124821,-1.714551,-0.213719,-0.251822,2.008745,-0.376748,-0.228313,-0.244670,0.166096,0.219045,-0.273508,-1.052451,-0.077883
...
$ python fate_flow_client.py -f upload -c examples/upload_host.json
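
The two upload configurations above differ only in the file and table name, so they can be generated rather than written by hand. The following is a minimal sketch: build_upload_conf and check_header are hypothetical helpers, and the header checks encode conventions observed in the sample CSV files (an id column first, a y label column on the guest side only), not rules that FATE is known to enforce.

```python
import csv
import io
import json


def build_upload_conf(file, table_name, namespace, head=1, partition=10, work_mode=0):
    """Return a dict with the same keys as the upload_*.json files above."""
    return {
        "file": file,
        "head": head,
        "partition": partition,
        "work_mode": work_mode,
        "namespace": namespace,
        "table_name": table_name,
    }


def check_header(csv_text, with_label):
    """Check that `id` is the first column and that `y` appears only when expected."""
    header = next(csv.reader(io.StringIO(csv_text)))
    if header[0] != "id":
        raise ValueError("first column must be the sample id")
    if ("y" in header) != with_label:
        raise ValueError("presence of label column `y` does not match with_label")
    return header


guest_conf = build_upload_conf(
    "examples/data/breast_hetero_guest.csv",
    "breast_hetero_guest",
    "fate_flow_test_breast_hetero",
)
# Shortened rows taken from the sample data above.
guest_header = check_header("id,y,x0,x1\n133,1,0.254879,-1.046633\n", with_label=True)
host_header = check_header("id,x0,x1\n133,0.449512,-1.247226\n", with_label=False)
print(json.dumps(guest_conf, indent=2))
```

Writing guest_conf to a file with json.dump yields a configuration that can be passed to the upload command through -c.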
  2. Submit the job
$  cat fate_flow/examples/test_hetero_lr_job_conf.json

{
    "initiator": {
        "role": "guest",
        "party_id": 9999
    },
    "job_parameters": {
        "work_mode": 1,
        "processors_per_node": 1,
        "align_task_input_data_partition": true
    },
    "role": {
        "guest": [9999],
        "host": [10000],
        "arbiter": [10000]
    },
    "role_parameters": {
        "guest": {
            "args": {
                "data": {
                    "train_data": [{"name": "breast_hetero_guest", "namespace": "fate_flow_test_breast_hetero"}]
                }
            },
            "dataio_0":{
                "with_label": [true],
                "label_name": ["y"],
                "label_type": ["int"],
                "output_format": ["dense"]
            }
        },
        "host": {
            "args": {
                "data": {
                    "train_data": [{"name": "breast_hetero_host", "namespace": "fate_flow_test_breast_hetero"}]
                }
            },
            "dataio_0":{
                "with_label": [false],
                "output_format": ["dense"]
            }
        }
    },
    "algorithm_parameters": {
        "hetero_lr_0": {
            "penalty": "L2",
            "optimizer": "rmsprop",
            "eps": 1e-5,
            "alpha": 0.01,
            "max_iter": 3,
            "converge_func": "diff",
            "batch_size": 320,
            "learning_rate": 0.15,
            "init_param": {
                "init_method": "random_uniform"
            }
        }
    }
}


$ python fate_flow_client.py -f submit_job -d examples/test_hetero_lr_job_dsl.json -c examples/test_hetero_lr_job_conf.json

{
    "data": {
        "board_url": "http://localhost:8080/index.html#/dashboard?job_id=2019121910313566330118&role=guest&party_id=9999",
        "job_dsl_path": "xxx/jobs/2019121910313566330118/job_dsl.json",
        "job_runtime_conf_path": "xxx/jobs/2019121910313566330118/job_runtime_conf.json",
        "logs_directory": "xxx/logs/2019121910313566330118",
        "model_info": {
            "model_id": "arbiter-10000#guest-9999#host-10000#model",
            "model_version": "2019121910313566330118"
        }
    },
    "jobId": "2019121910313566330118",
    "retcode": 0,
    "retmsg": "success"
}
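
The response above is plain JSON, so a script can extract the job ID and model information and prepare the follow-up query. A minimal sketch, assuming the response shape shown above (shortened here); summarize_submission and query_job_argv are hypothetical helpers, and the command-line flags are the ones used in this article. The role and party ID are taken from the initiator in the job configuration (guest, 9999).

```python
import json

# Shortened copy of the submit_job response shown above.
SUBMIT_RESPONSE = """
{
    "data": {
        "model_info": {
            "model_id": "arbiter-10000#guest-9999#host-10000#model",
            "model_version": "2019121910313566330118"
        }
    },
    "jobId": "2019121910313566330118",
    "retcode": 0,
    "retmsg": "success"
}
"""


def summarize_submission(response_text):
    """Extract the job ID and model info, failing loudly on a non-zero retcode."""
    response = json.loads(response_text)
    if response.get("retcode") != 0:
        raise RuntimeError(f"submit_job failed: {response.get('retmsg')}")
    model_info = response["data"]["model_info"]
    return {
        "job_id": response["jobId"],
        "model_id": model_info["model_id"],
        "model_version": model_info["model_version"],
    }


def query_job_argv(job_id, role, party_id):
    """Build the argument list for a query_job call (suitable for subprocess.run)."""
    return ["python", "fate_flow_client.py", "-f", "query_job",
            "-r", role, "-p", str(party_id), "-j", job_id]


summary = summarize_submission(SUBMIT_RESPONSE)
argv = query_job_argv(summary["job_id"], role="guest", party_id=9999)
print(" ".join(argv))
```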
  3. Query the job
$ python fate_flow_client.py -f query_job -r guest -p 9999 -j $job_id

The job above:


4. fate_flow project structure

├── README.rst
├── README_zh.rst
├── __init__.py
├── apps  # REST API endpoints, built with Flask
│   ├── __init__.py
│   ├── data_access_app.py
│   ├── forward_app.py
│   ├── job_app.py
│   ├── model_app.py
│   ├── permission_app.py
│   ├── pipeline_app.py
│   ├── schedule_app.py
│   ├── table_app.py
│   ├── tracking_app.py
│   └── version_app.py
├── components  # base classes used when writing custom components
│   ├── component_base.py
│   └── model_operation_components.py
├── db  # table models
│   ├── __init__.py
│   └── db_models.py
├── doc  # documentation
│   ├── fate_flow_cli.rst
│   └── fate_flow_rest_api.rst
├── driver  # drivers; fairly core logic
│   ├── __init__.py
│   ├── dag_scheduler.py
│   ├── dsl_parser.py
│   ├── job_controller.py
│   ├── job_detector.py
│   ├── task_executor.py
│   └── task_scheduler.py
├── entity  # constant definitions
│   ├── __init__.py
│   ├── constant_config.py
│   ├── metric.py
│   └── runtime_config.py
├── examples  # examples
│   ├── __init__.py
│   ├── bind_model_service.json
│   ├── download_file.json
│   ├── download_guest.json
│   ├── download_host.json
│   ├── export_model.json
│   ├── import_model.json
│   ├── inference_request.py
│   ├── publish_load_model.json
│   ├── restore_model.json
│   ├── store_model.json
│   ├── test_hetero_lr_job_conf.json
│   ├── test_hetero_lr_job_dsl.json
│   ├── test_inference.py
│   ├── test_predict_conf.json
│   ├── test_rsa_job_conf.json
│   ├── test_rsa_job_dsl.json
│   ├── toy_example_conf.json
│   ├── toy_example_dsl.json
│   ├── upload_guest.json
│   ├── upload_homo_guest.json
│   ├── upload_homo_host.json
│   └── upload_host.json
├── fate_flow_client.py  # client script, which is also what the fate_flow CLI uses
├── fate_flow_server.py  # startup script for the fate_flow server
├── images
│   ├── fate_flow_arch.png
│   ├── fate_flow_component_dsl.png
│   ├── fate_flow_dag.png
│   ├── fate_flow_dsl.png
│   └── federated_learning_pipeline.png
├── manager  # managers; fairly core logic
│   ├── __init__.py
│   ├── data_manager.py
│   ├── model_manager
│   │   ├── __init__.py
│   │   ├── model_storage_base.py
│   │   ├── mysql_model_storage.py
│   │   ├── pipelined_model.py
│   │   ├── pipelined_model_structure.yaml
│   │   ├── publish_model.py
│   │   └── redis_model_storage.py
│   ├── pipeline_manager.py
│   ├── queue_manager.py
│   └── tracking_manager.py
├── monkey_patch
│   └── __init__.py
├── param  # parameter definition classes
│   ├── __init__.py
│   ├── download_param.py
│   ├── model_operation_param.py
│   └── upload_param.py
├── service.sh  # shell script for starting the service
├── settings.py  # configuration
├── tests  # tests
│   ├── __init__.py
│   ├── api_tests
│   │   ├── data_access_test.py
│   │   ├── job_operation_test.py
│   │   ├── table_test.py
│   │   └── tracking_test.py
│   ├── check_all_api.py
│   ├── check_fate_python_requirement.py
│   ├── python3_7_modules.csv
│   └── unit_test
│       └── job_queue_test.py
├── upgrade  # database upgrade records
│   ├── 1_4_0-1_4_1
│   │   └── upgrade_fate_flow_db.sql
│   └── 1_4_1-1_4_2
│       └── upgrade_fate_flow_db.sql
└── utils  # utilities
    ├── __init__.py
    ├── api_utils.py
    ├── authentication_utils.py
    ├── cron.py
    ├── data_utils.py
    ├── detect_utils.py
    ├── download.py
    ├── grpc_utils.py
    ├── job_controller_utils.py
    ├── job_utils.py
    ├── model_utils.py
    ├── node_check_utils.py
    ├── parameter_util.py
    ├── proto_compatibility.py
    ├── service_utils.py
    ├── session_utils.py
    └── upload.py

5. Table schema analysis

See the DrawIO diagram directly.

6. Source code analysis

  1. About fate_flow_server.py
    Let's look at the code directly in the IDE.
  2. About fate_flow_client.py

7. driver analysis

