01 - DataX installation and basic usage

Author: 张不二01 | Published: 2019-04-30 10:01

    References:

    Official DataX GitHub repository: https://github.com/alibaba/DataX

    1, Installation and usage

    1.1, Download URL

    http://datax-opensource.oss-cn-hangzhou.aliyuncs.com/datax.tar.gz
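For scripted setups, the download and extraction can be sketched in Python using only the standard library. This is a minimal sketch, not part of DataX: the helper names `download` and `extract` and the local file name `datax.tar.gz` are assumptions made for illustration, and the layout of the extracted files depends on the archive itself.

```python
import tarfile
import urllib.request
from pathlib import Path

# Download URL taken from section 1.1.
DATAX_URL = "http://datax-opensource.oss-cn-hangzhou.aliyuncs.com/datax.tar.gz"


def download(url, archive):
    """Fetch url into archive, skipping the download if the file already exists."""
    archive = Path(archive)
    if not archive.exists():
        urllib.request.urlretrieve(url, str(archive))
    return archive


def extract(archive, dest):
    """Unpack a .tar.gz archive into dest and return dest as a Path."""
    dest = Path(dest)
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(str(archive), "r:gz") as tar:
        tar.extractall(str(dest))
    return dest
```

A typical call would be `extract(download(DATAX_URL, "datax.tar.gz"), ".")`, after which the directory that contains `bin/datax.py` plays the role of {YOUR_DATAX_HOME} in the commands that follow.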

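    1.2, Usage (prebuilt DataX toolkit, not built from source)
    • After downloading, extract the archive to a local directory, enter the bin directory, and run a sync job: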

      $ cd {YOUR_DATAX_HOME}/bin
      $ python datax.py {YOUR_JOB.json}
      
    • Self-check script:

      $ python {YOUR_DATAX_HOME}/bin/datax.py {YOUR_DATAX_HOME}/job/job.json
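
When a job has to be launched from another program, the same invocation can be wrapped with `subprocess`. The sketch below is an illustration only: `build_datax_command` and `run_job` are hypothetical helper names, and `python_exe` defaults to `python` on the assumption that this is the interpreter the launcher script expects on your machine.

```python
import subprocess
from pathlib import Path


def build_datax_command(datax_home, job_file, python_exe="python"):
    """Build the argument list for: python {DATAX_HOME}/bin/datax.py {JOB.json}."""
    launcher = Path(datax_home) / "bin" / "datax.py"
    return [python_exe, str(launcher), str(job_file)]


def run_job(datax_home, job_file, python_exe="python"):
    """Run the job and return the launcher's exit code."""
    command = build_datax_command(datax_home, job_file, python_exe)
    return subprocess.run(command).returncode
```

The self-check then corresponds to `run_job(home, Path(home) / "job" / "job.json")`, where `home` is your DataX directory. Passing the arguments as a list avoids shell quoting problems when paths contain spaces.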
      
    1.3, Configuration example: read data from a stream and print it to the console
    1.3.1, Step 1: find a configuration file template (JSON format)

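    You can print a configuration template with: python datax.py -r {YOUR_READER} -w {YOUR_WRITER}

    Templates can also be downloaded from the official GitHub repository, which additionally explains each field in detail.

    For example, python datax.py -r streamreader -w streamwriter prints the following: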

    DataX (DATAX-OPENSOURCE-3.0), From Alibaba !
    Copyright (C) 2010-2017, Alibaba Group. All Rights Reserved.
    
    
    Please refer to the streamreader document:
         https://github.com/alibaba/DataX/blob/master/streamreader/doc/streamreader.md 
    
    Please refer to the streamwriter document:
         https://github.com/alibaba/DataX/blob/master/streamwriter/doc/streamwriter.md 
     
    Please save the following configuration as a json file and  use
         python {DATAX_HOME}/bin/datax.py {JSON_FILE_NAME}.json 
    to run the job.
    
    {
        "job": {
            "content": [
                {
                    "reader": {
                        "name": "streamreader", 
                        "parameter": {
                            "column": [], 
                            "sliceRecordCount": ""
                        }
                    }, 
                    "writer": {
                        "name": "streamwriter", 
                        "parameter": {
                            "encoding": "", 
                            "print": true
                        }
                    }
                }
            ], 
            "setting": {
                "speed": {
                    "channel": ""
                }
            }
        }
    }
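
The printed template leaves several values empty, and these are the fields to fill in before the job is usable. A small Python sketch can list them; the function name `find_blanks` and the dotted path notation are assumptions chosen for this illustration, not something DataX provides.

```python
import json

# The template printed above, reproduced as a JSON string.
TEMPLATE = """
{"job": {"content": [{"reader": {"name": "streamreader",
                                  "parameter": {"column": [], "sliceRecordCount": ""}},
                      "writer": {"name": "streamwriter",
                                 "parameter": {"encoding": "", "print": true}}}],
         "setting": {"speed": {"channel": ""}}}}
"""


def find_blanks(node, path=""):
    """Return the paths of values that are still "" or [] in a parsed template."""
    if node == "" or node == []:
        return [path]
    blanks = []
    if isinstance(node, dict):
        for key, value in node.items():
            blanks += find_blanks(value, f"{path}.{key}" if path else key)
    elif isinstance(node, list):
        for index, value in enumerate(node):
            blanks += find_blanks(value, f"{path}[{index}]")
    return blanks


for blank in find_blanks(json.loads(TEMPLATE)):
    print(blank)
# job.content[0].reader.parameter.column
# job.content[0].reader.parameter.sliceRecordCount
# job.content[0].writer.parameter.encoding
# job.setting.speed.channel
```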
    
    1.3.2, Step 2: write your own configuration file based on the template
    Save the following as stream2stream.json:
    {
      "job": {
        "content": [
          {
            "reader": {
              "name": "streamreader",
              "parameter": {
                "sliceRecordCount": 10,
                "column": [
                  {
                    "type": "long",
                    "value": "10"
                  },
                  {
                    "type": "string",
                    "value": "hello,你好,世界-DataX"
                  }
                ]
              }
            },
            "writer": {
              "name": "streamwriter",
              "parameter": {
                "encoding": "UTF-8",
                "print": true
              }
            }
          }
        ],
        "setting": {
          "speed": {
            "channel": 5
          }
        }
      }
    }
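
If several such job files are needed, the configuration above can also be generated rather than typed by hand. The following is a minimal sketch that assumes only the JSON structure shown in this section; `make_stream_job` and `write_job` are hypothetical helper names.

```python
import json
from pathlib import Path


def make_stream_job(columns, slice_record_count, channel, encoding="UTF-8"):
    """Build a streamreader -> streamwriter job with the structure shown above."""
    return {
        "job": {
            "content": [
                {
                    "reader": {
                        "name": "streamreader",
                        "parameter": {
                            "sliceRecordCount": slice_record_count,
                            "column": columns,
                        },
                    },
                    "writer": {
                        "name": "streamwriter",
                        "parameter": {"encoding": encoding, "print": True},
                    },
                }
            ],
            "setting": {"speed": {"channel": channel}},
        }
    }


def write_job(job, path):
    """Write the job as UTF-8 JSON, keeping non-ASCII text readable."""
    path = Path(path)
    path.write_text(json.dumps(job, ensure_ascii=False, indent=2), encoding="utf-8")
    return path
```

Calling `write_job(make_stream_job([{"type": "long", "value": "10"}, {"type": "string", "value": "hello,你好,世界-DataX"}], 10, 5), "stream2stream.json")` reproduces the file from this step. `ensure_ascii=False` keeps the Chinese characters literal instead of escaping them as `\uXXXX`.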
    
    1.3.3, Step 3: start DataX and run it with the JSON configuration file
    • Running the command below makes streamwriter print to the console the data that streamreader produces in memory:
    $ python ../bin/datax.py stream2stream.json
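
To get a feel for what this job should print, its record stream can be approximated in plain Python. This is a rough model under two assumptions that are worth checking against the streamreader and streamwriter documents: each channel emits sliceRecordCount records built from the constant column values, and streamwriter separates fields with a tab. The function `simulate_stream_job` is hypothetical and does not reproduce DataX logging or its job statistics.

```python
def simulate_stream_job(job):
    """Approximate the rows that a streamreader -> streamwriter job would print."""
    content = job["job"]["content"][0]
    reader = content["reader"]["parameter"]
    channels = int(job["job"]["setting"]["speed"]["channel"])
    per_channel = int(reader["sliceRecordCount"])
    # Every record carries the same constant values from the "column" list.
    row = "\t".join(str(column["value"]) for column in reader["column"])
    # Assumption: the total is channel count times sliceRecordCount.
    return [row] * (channels * per_channel)


JOB = {
    "job": {
        "content": [
            {
                "reader": {
                    "name": "streamreader",
                    "parameter": {
                        "sliceRecordCount": 10,
                        "column": [
                            {"type": "long", "value": "10"},
                            {"type": "string", "value": "hello,你好,世界-DataX"},
                        ],
                    },
                },
                "writer": {
                    "name": "streamwriter",
                    "parameter": {"encoding": "UTF-8", "print": True},
                },
            }
        ],
        "setting": {"speed": {"channel": 5}},
    }
}

rows = simulate_stream_job(JOB)
print(len(rows))  # 50 under the assumptions above
print(rows[0])
```

If the record count in the real job summary differs from this estimate, the per-channel assumption is the first thing to revisit.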
    


    Article link: https://www.haomeiwen.com/subject/kblmnqtx.html