Druid Quickstart

Author: 主君_05c4 | Published 2018-11-21 11:38

    Hardware and software requirements:

    • Java 8 or higher
    • Linux, Mac OS X, or other Unix-like OS (Windows is not supported)
    • 8G of RAM
    • 2 vCPUs
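
    To confirm the Java requirement before continuing, check the version on your PATH; any JDK 8 or later build should work:

    java -version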

    1. Download and extract the package

    curl -O http://static.druid.io/artifacts/releases/druid-0.12.3-bin.tar.gz
    tar -xzf druid-0.12.3-bin.tar.gz
    cd druid-0.12.3
    

    The archive contains the following directories:

    • LICENSE - the license file.
    • bin/ - scripts useful for the quickstart.
    • conf/* - template configurations for a clustered setup.
    • conf-quickstart/* - configurations for the quickstart.
    • extensions/* - all Druid extensions.
    • hadoop-dependencies/* - Druid Hadoop dependencies.
    • lib/* - all core Druid libraries.
    • quickstart/* - files related to the quickstart.

    2. Download the tutorial examples

    curl -O http://druid.io/docs/0.12.3/tutorials/tutorial-examples.tar.gz
    tar zxvf tutorial-examples.tar.gz
    

    3. Start ZooKeeper

    curl http://mirror.bit.edu.cn/apache/zookeeper/stable/zookeeper-3.4.12.tar.gz -o zookeeper-3.4.12.tar.gz
    tar -xzf zookeeper-3.4.12.tar.gz
    cd zookeeper-3.4.12
    cp conf/zoo_sample.cfg conf/zoo.cfg
    ./bin/zkServer.sh start
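
    To verify that ZooKeeper came up, you can check its status; the four-letter ruok probe assumes nc is installed, and a healthy server answers imok:

    ./bin/zkServer.sh status
    echo ruok | nc localhost 2181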
    

    4. Start the Druid services

    From the druid-0.12.3 directory, run:

    bin/init
    

    bin/init performs some one-time initialization; the script reads:

    #!/bin/bash -eu
    
    gzip -c -d quickstart/wikiticker-2015-09-12-sampled.json.gz > "quickstart/wikiticker-2015-09-12-sampled.json"
    
    LOG_DIR=var
    
    mkdir log
    mkdir -p $LOG_DIR/tmp;
    mkdir -p $LOG_DIR/druid/indexing-logs;
    mkdir -p $LOG_DIR/druid/segments;
    mkdir -p $LOG_DIR/druid/segment-cache;
    mkdir -p $LOG_DIR/druid/task;
    mkdir -p $LOG_DIR/druid/hadoop-tmp;
    mkdir -p $LOG_DIR/druid/pids;
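
    Once bin/init finishes, the directory tree it created should exist under var; listing it is a quick sanity check:

    find var -type d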
    

    Start each Druid process in a separate terminal window. This tutorial runs all of the Druid processes on the same machine; in a large distributed production cluster, some Druid processes can still be colocated.

    java `cat examples/conf/druid/coordinator/jvm.config | xargs` -cp "examples/conf/druid/_common:examples/conf/druid/_common/hadoop-xml:examples/conf/druid/coordinator:lib/*" io.druid.cli.Main server coordinator
    java `cat examples/conf/druid/overlord/jvm.config | xargs` -cp "examples/conf/druid/_common:examples/conf/druid/_common/hadoop-xml:examples/conf/druid/overlord:lib/*" io.druid.cli.Main server overlord
    java `cat examples/conf/druid/historical/jvm.config | xargs` -cp "examples/conf/druid/_common:examples/conf/druid/_common/hadoop-xml:examples/conf/druid/historical:lib/*" io.druid.cli.Main server historical
    java `cat examples/conf/druid/middleManager/jvm.config | xargs` -cp "examples/conf/druid/_common:examples/conf/druid/_common/hadoop-xml:examples/conf/druid/middleManager:lib/*" io.druid.cli.Main server middleManager
    java `cat examples/conf/druid/broker/jvm.config | xargs` -cp "examples/conf/druid/_common:examples/conf/druid/_common/hadoop-xml:examples/conf/druid/broker:lib/*" io.druid.cli.Main server broker
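
    If you would rather not keep five terminals open, a small wrapper along these lines (a convenience sketch, not part of the Druid distribution) starts each process in the background and sends its output to a log file under log/:

    for svc in coordinator overlord historical middleManager broker; do
      java `cat examples/conf/druid/$svc/jvm.config | xargs` \
        -cp "examples/conf/druid/_common:examples/conf/druid/_common/hadoop-xml:examples/conf/druid/$svc:lib/*" \
        io.druid.cli.Main server $svc > log/$svc.log 2>&1 &
    done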
    

    Each jvm.config file holds the JVM flags for its process; cat examples/conf/druid/coordinator/jvm.config prints:

    -server
    -Xms256m
    -Xmx256m
    -Duser.timezone=UTC
    -Dfile.encoding=UTF-8
    -Djava.io.tmpdir=var/tmp
    -Djava.util.logging.manager=org.apache.logging.log4j.jul.LogManager
    -Dderby.stream.error.file=var/druid/derby.log
    

    The commands above, run in separate terminal windows, start the coordinator, overlord, historical, middleManager, and broker processes respectively.
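
    To confirm that all five processes are up, jps (bundled with the JDK) lists the running Java main classes; each Druid service appears as io.druid.cli.Main followed by its server type:

    jps -lm | grep io.druid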

    5. Resetting Druid

    All persistent state, such as the cluster metadata store and the services' segments, is kept under the druid-0.12.3/var directory.
    To stop the services, CTRL-C the running Java processes. If you then want to start the services from their initial state, delete the log and var directories and run the init script again, then shut down ZooKeeper and delete ZooKeeper's data directory /tmp/zookeeper.

    From the druid-0.12.3 directory:

    rm -rf log
    rm -rf var
    bin/init
    

    If you went through the Loading stream data from Kafka tutorial, you should shut down Kafka before shutting down ZooKeeper, and delete the Kafka log directory /tmp/kafka-logs.

    CTRL-C to shut down the Kafka broker, then delete its log directory:

    rm -rf /tmp/kafka-logs
    

    Now shut down ZooKeeper and clear its state. From the zookeeper-3.4.12 directory:

    ./bin/zkServer.sh stop
    rm -rf /tmp/zookeeper
    

    Once the Druid and ZooKeeper state has been cleared, restart ZooKeeper and then the Druid services.

    6. The dataset

    The data loading tutorials below use a data file shipped under the Druid package root at quickstart/wikiticker-2015-09-12-sampled.json.gz. It contains Wikipedia page edit events for the day 2015-09-12, stored as one JSON object per line of a text file.

    The data contains the following columns:

    • added
    • channel
    • cityName
    • comment
    • countryIsoCode
    • countryName
    • deleted
    • delta
    • isAnonymous
    • isMinor
    • isNew
    • isRobot
    • isUnpatrolled
    • metroCode
    • namespace
    • page
    • regionIsoCode
    • regionName
    • user

    An example event looks like this:

    {
      "time":"2015-09-12T20:03:45.018Z",
      "channel":"#en.wikipedia",
      "namespace":"Main",
      "page":"Spider-Man's powers and equipment",
      "user":"foobar",
      "comment":"/* Artificial web-shooters */",
      "cityName":"New York",
      "regionName":"New York",
      "regionIsoCode":"NY",
      "countryName":"United States",
      "countryIsoCode":"US",
      "isAnonymous":false,
      "isNew":false,
      "isMinor":false,
      "isRobot":false,
      "isUnpatrolled":false,
      "added":99,
      "delta":99,
      "deleted":0
    }
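
    You can inspect the raw file yourself; this prints the first event pretty-printed (assuming Python is available, as it is on most systems that run Druid):

    gunzip -c quickstart/wikiticker-2015-09-12-sampled.json.gz | head -n 1 | python -m json.tool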
    

    7. Loading data from a file

    1) Prepare the data and define the ingestion task

    A data load is initiated by submitting an ingestion task spec to the Druid overlord. For this tutorial, we'll be loading the sample Wikipedia page edits data.
    examples/wikipedia-index.json defines an ingestion task that reads the data in quickstart/wikiticker-2015-09-12-sampled.json.gz:

    {
      "type" : "index",
      "spec" : {
        "dataSchema" : {
          "dataSource" : "wikipedia",
          "parser" : {
            "type" : "string",
            "parseSpec" : {
              "format" : "json",
              "dimensionsSpec" : {
                "dimensions" : [
                  "channel",
                  "cityName",
                  "comment",
                  "countryIsoCode",
                  "countryName",
                  "isAnonymous",
                  "isMinor",
                  "isNew",
                  "isRobot",
                  "isUnpatrolled",
                  "metroCode",
                  "namespace",
                  "page",
                  "regionIsoCode",
                  "regionName",
                  "user",
                  { "name": "added", "type": "long" },
                  { "name": "deleted", "type": "long" },
                  { "name": "delta", "type": "long" }
                ]
              },
              "timestampSpec": {
                "column": "time",
                "format": "iso"
              }
            }
          },
          "metricsSpec" : [],
          "granularitySpec" : {
            "type" : "uniform",
            "segmentGranularity" : "day",
            "queryGranularity" : "none",
            "intervals" : ["2015-09-12/2015-09-13"],
            "rollup" : false
          }
        },
        "ioConfig" : {
          "type" : "index",
          "firehose" : {
            "type" : "local",
            "baseDir" : "quickstart/",
            "filter" : "wikiticker-2015-09-12-sampled.json.gz"
          },
          "appendToExisting" : false
        },
        "tuningConfig" : {
          "type" : "index",
          "targetPartitionSize" : 5000000,
          "maxRowsInMemory" : 25000,
          "forceExtendableShardSpecs" : true
        }
      }
    }
    

    The spec above creates a datasource named "wikipedia".
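
    If you edit the spec, it can be worth confirming it is still valid JSON before submitting it (again assuming Python is available):

    python -m json.tool examples/wikipedia-index.json > /dev/null && echo valid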

    2) Load the batch data

    From the druid-0.12.3 directory, submit the ingestion task to the overlord via POST:

    curl -X 'POST' -H 'Content-Type:application/json' -d @examples/wikipedia-index.json http://localhost:8090/druid/indexer/v1/task
    

    If the task submits successfully, the console prints the task ID:

    {"task":"index_wikipedia_2018-06-09T21:30:32.802Z"}
    

    You can check the status of your submitted ingestion task in the overlord console at http://localhost:8090/console.html. Refresh the console periodically; when the task succeeds, you will see its status change to "SUCCESS".
    Once the ingestion task finishes, the data is loaded by historical nodes and becomes queryable within a minute or two. You can monitor loading progress in the coordinator console at http://localhost:8081/#/; the "wikipedia" datasource is "fully available" once it is shown with a blue circle.
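
    The overlord also exposes task status over HTTP, so you can poll it from the command line instead of the console; substitute the task ID that was returned when you submitted:

    curl http://localhost:8090/druid/indexer/v1/task/index_wikipedia_2018-06-09T21:30:32.802Z/status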


