Hardware and software requirements:
- Java 8 or higher
- Linux, Mac OS X, or other Unix-like OS (Windows is not supported)
- 8 GB of RAM
- 2 vCPUs
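Before proceeding, it's worth confirming that your Java runtime meets the requirement (this assumes java is on your PATH):
# Should report version 1.8 or higher
java -version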
1. Download and extract the package
curl -O http://static.druid.io/artifacts/releases/druid-0.12.3-bin.tar.gz
tar -xzf druid-0.12.3-bin.tar.gz
cd druid-0.12.3
The archive contains the following directories:
- LICENSE - the license file.
- bin/ - scripts related to the quickstart.
- conf/* - configuration templates for a clustered setup.
- conf-quickstart/* - configuration files for the quickstart.
- extensions/* - all Druid extension files.
- hadoop-dependencies/* - Druid's Hadoop dependencies.
- lib/* - all core Druid libraries.
- quickstart/* - files related to the quickstart.
2. Download the tutorial examples
curl -O http://druid.io/docs/0.12.3/tutorials/tutorial-examples.tar.gz
tar zxvf tutorial-examples.tar.gz
3. Start ZooKeeper
curl http://mirror.bit.edu.cn/apache/zookeeper/stable/zookeeper-3.4.12.tar.gz -o zookeeper-3.4.12.tar.gz
tar -xzf zookeeper-3.4.12.tar.gz
cd zookeeper-3.4.12
cp conf/zoo_sample.cfg conf/zoo.cfg
./bin/zkServer.sh start
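To confirm that ZooKeeper started correctly (an optional sanity check, still from the zookeeper-3.4.12 directory), ask for its status:
# For this single-node setup the output should include "Mode: standalone"
./bin/zkServer.sh status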
4. Start the Druid services
From the druid-0.12.3 directory, run:
bin/init
The init script performs some initial setup; its contents are as follows:
#!/bin/bash -eu
gzip -c -d quickstart/wikiticker-2015-09-12-sampled.json.gz > "quickstart/wikiticker-2015-09-12-sampled.json"
LOG_DIR=var
mkdir -p log
mkdir -p $LOG_DIR/tmp;
mkdir -p $LOG_DIR/druid/indexing-logs;
mkdir -p $LOG_DIR/druid/segments;
mkdir -p $LOG_DIR/druid/segment-cache;
mkdir -p $LOG_DIR/druid/task;
mkdir -p $LOG_DIR/druid/hadoop-tmp;
mkdir -p $LOG_DIR/druid/pids;
Next, start the Druid processes in separate terminal windows. This tutorial runs all of the Druid processes on a single machine; in a large, distributed production cluster, some of these processes can still be co-located.
java `cat examples/conf/druid/coordinator/jvm.config | xargs` -cp "examples/conf/druid/_common:examples/conf/druid/_common/hadoop-xml:examples/conf/druid/coordinator:lib/*" io.druid.cli.Main server coordinator
java `cat examples/conf/druid/overlord/jvm.config | xargs` -cp "examples/conf/druid/_common:examples/conf/druid/_common/hadoop-xml:examples/conf/druid/overlord:lib/*" io.druid.cli.Main server overlord
java `cat examples/conf/druid/historical/jvm.config | xargs` -cp "examples/conf/druid/_common:examples/conf/druid/_common/hadoop-xml:examples/conf/druid/historical:lib/*" io.druid.cli.Main server historical
java `cat examples/conf/druid/middleManager/jvm.config | xargs` -cp "examples/conf/druid/_common:examples/conf/druid/_common/hadoop-xml:examples/conf/druid/middleManager:lib/*" io.druid.cli.Main server middleManager
java `cat examples/conf/druid/broker/jvm.config | xargs` -cp "examples/conf/druid/_common:examples/conf/druid/_common/hadoop-xml:examples/conf/druid/broker:lib/*" io.druid.cli.Main server broker
Each jvm.config file holds the JVM startup options for its process, which the cat ... | xargs in the commands above splices onto the java command line. The output of cat examples/conf/druid/coordinator/jvm.config is as follows:
-server
-Xms256m
-Xmx256m
-Duser.timezone=UTC
-Dfile.encoding=UTF-8
-Djava.io.tmpdir=var/tmp
-Djava.util.logging.manager=org.apache.logging.log4j.jul.LogManager
-Dderby.stream.error.file=var/druid/derby.log
The commands above, each run in its own terminal window, start the coordinator, overlord, historical, middleManager, and broker processes respectively.
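If keeping five terminal windows open is inconvenient, the same five commands can be wrapped in a small loop that runs each process in the background. This is a convenience sketch, not part of the official tutorial; it assumes bin/init has already created the log directory:
# Sketch: launch all five Druid processes in the background,
# sending each one's output to its own file under log/
for svc in coordinator overlord historical middleManager broker; do
  nohup java `cat examples/conf/druid/$svc/jvm.config | xargs` \
    -cp "examples/conf/druid/_common:examples/conf/druid/_common/hadoop-xml:examples/conf/druid/$svc:lib/*" \
    io.druid.cli.Main server $svc > log/$svc.log 2>&1 &
done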
5. Resetting Druid
All persistent state, such as the cluster metadata store and the services' segments, is kept under the druid-0.12.3/var directory.
To stop the services, exit the running Java processes with CTRL-C. If you then want to bring the services back up in a freshly initialized state, delete the log and var directories, run the init script again, then shut down ZooKeeper and delete its data directory /tmp/zookeeper.
From the druid-0.12.3 directory:
rm -rf log
rm -rf var
bin/init
If you followed the Loading stream data from Kafka tutorial, you should shut down Kafka and delete its log directory /tmp/kafka-logs before shutting down ZooKeeper.
Stop the Kafka broker with Ctrl-C, then delete its log directory:
rm -rf /tmp/kafka-logs
Now shut down ZooKeeper and clean up its state. From the zookeeper-3.4.12 directory:
./bin/zkServer.sh stop
rm -rf /tmp/zookeeper
With the Druid and ZooKeeper state cleared, restart ZooKeeper and then the Druid services.
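The whole reset sequence can also be scripted; the sketch below simply collects the commands above in order, assuming the druid-0.12.3 and zookeeper-3.4.12 directories sit side by side and all Java processes have already been stopped with CTRL-C:
# Reset Druid state
cd druid-0.12.3
rm -rf log var
bin/init
# Reset ZooKeeper state
cd ../zookeeper-3.4.12
./bin/zkServer.sh stop
rm -rf /tmp/zookeeper
cd ..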
6. The dataset
The data loading tutorials below use a sample data file shipped at quickstart/wikiticker-2015-09-12-sampled.json.gz under the Druid package root. It contains Wikipedia page edit events for the day 2015-09-12, stored as JSON objects in a text file.
The data contains the following columns:
- added
- channel
- cityName
- comment
- countryIsoCode
- countryName
- deleted
- delta
- isAnonymous
- isMinor
- isNew
- isRobot
- isUnpatrolled
- metroCode
- namespace
- page
- regionIsoCode
- regionName
- user
Here is one sample event:
{
  "time":"2015-09-12T20:03:45.018Z",
  "channel":"#en.wikipedia",
  "namespace":"Main",
  "page":"Spider-Man's powers and equipment",
  "user":"foobar",
  "comment":"/* Artificial web-shooters */",
  "cityName":"New York",
  "regionName":"New York",
  "regionIsoCode":"NY",
  "countryName":"United States",
  "countryIsoCode":"US",
  "isAnonymous":false,
  "isNew":false,
  "isMinor":false,
  "isRobot":false,
  "isUnpatrolled":false,
  "added":99,
  "delta":99,
  "deleted":0
}
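You can peek at the raw file yourself to confirm the format (assuming gzip is available):
# Print the first event from the compressed sample file
gzip -cd quickstart/wikiticker-2015-09-12-sampled.json.gz | head -n 1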
7. Loading data from a file
1) Prepare the data and define the ingestion task
A data load is initiated by submitting an ingestion task spec to the Druid overlord. For this tutorial, we'll be loading the sample Wikipedia page edits data.
examples/wikipedia-index.json defines an ingestion task that reads the data in quickstart/wikiticker-2015-09-12-sampled.json.gz:
{
  "type" : "index",
  "spec" : {
    "dataSchema" : {
      "dataSource" : "wikipedia",
      "parser" : {
        "type" : "string",
        "parseSpec" : {
          "format" : "json",
          "dimensionsSpec" : {
            "dimensions" : [
              "channel",
              "cityName",
              "comment",
              "countryIsoCode",
              "countryName",
              "isAnonymous",
              "isMinor",
              "isNew",
              "isRobot",
              "isUnpatrolled",
              "metroCode",
              "namespace",
              "page",
              "regionIsoCode",
              "regionName",
              "user",
              { "name": "added", "type": "long" },
              { "name": "deleted", "type": "long" },
              { "name": "delta", "type": "long" }
            ]
          },
          "timestampSpec": {
            "column": "time",
            "format": "iso"
          }
        }
      },
      "metricsSpec" : [],
      "granularitySpec" : {
        "type" : "uniform",
        "segmentGranularity" : "day",
        "queryGranularity" : "none",
        "intervals" : ["2015-09-12/2015-09-13"],
        "rollup" : false
      }
    },
    "ioConfig" : {
      "type" : "index",
      "firehose" : {
        "type" : "local",
        "baseDir" : "quickstart/",
        "filter" : "wikiticker-2015-09-12-sampled.json.gz"
      },
      "appendToExisting" : false
    },
    "tuningConfig" : {
      "type" : "index",
      "targetPartitionSize" : 5000000,
      "maxRowsInMemory" : 25000,
      "forceExtendableShardSpecs" : true
    }
  }
}
The spec above creates a datasource named "wikipedia".
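Before submitting the task, it can be useful to verify that the spec file is well-formed JSON; one way, assuming Python is installed, is its built-in json.tool module:
# Prints an error and exits non-zero if the spec is not valid JSON
python -m json.tool < examples/wikipedia-index.json > /dev/null && echo "spec OK"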
2) Load the batch data
From the druid-0.12.3 directory, submit the ingestion task with a POST request:
curl -X 'POST' -H 'Content-Type:application/json' -d @examples/wikipedia-index.json http://localhost:8090/druid/indexer/v1/task
If the task is submitted successfully, the console prints its task ID:
{"task":"index_wikipedia_2018-06-09T21:30:32.802Z"}
You can check the status of submitted ingestion tasks in the overlord console at http://localhost:8090/console.html. Refresh the console periodically; when the task succeeds, you will see its status change to "SUCCESS".
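The same status is available from the command line through the overlord's task API; substitute the task ID returned when you submitted the task (the ID below is just the example from above):
# Query the status of a single ingestion task
curl http://localhost:8090/druid/indexer/v1/task/index_wikipedia_2018-06-09T21:30:32.802Z/status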
Once the ingestion task finishes, the data will be loaded by the historical node and become queryable within a minute or two. You can monitor the loading progress in the coordinator console at http://localhost:8081/#/; the "wikipedia" datasource is "fully available" when it is shown with a blue circle.
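The coordinator also reports load progress over HTTP, which can be handier than refreshing the console; the loadstatus endpoint returns the percentage of each datasource's segments that are loaded:
# Returns e.g. {"wikipedia":100.0} once the datasource is fully available
curl http://localhost:8081/druid/coordinator/v1/loadstatus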