Setting Up a Druid Cluster

Author: MrSocean | Published 2019-01-21 18:17

    Environment Setup

    Java 1.8 | MySQL 5.6.42 | Hadoop 3.1.1 | Druid 0.12.3. This article assumes your environment already has Java, MySQL, and Hadoop; setting up these Druid dependencies is not covered here.

    Prerequisites

    1: MySQL (serves as Druid's metadata storage)
    1): Create a database named druid for Druid:
        CREATE DATABASE druid DEFAULT CHARACTER SET utf8 COLLATE utf8_general_ci;
    2): Create a user druid with password druid1234 and grant it privileges on the database:
        GRANT ALL ON druid.* TO 'druid'@'%' IDENTIFIED BY 'druid1234' WITH GRANT OPTION;
        FLUSH PRIVILEGES;
    
    2: A Hadoop cluster up and serving normally (ZooKeeper, HDFS)
    HDFS serves as Druid's deep storage.
    ZooKeeper manages Druid's cluster state and handles coordination.
    

    Cluster Node Layout

    The cluster is deployed across three nodes (192.168.0.180, 192.168.0.181, 192.168.0.182). For hardware requirements, see the official docs: http://druid.io/docs/0.12.3/tutorials/cluster.html

    Node 192.168.0.180 - Master Server: Coordinator / Overlord
    
    Node 192.168.0.181 - Query Server: Broker
    
    Node 192.168.0.182 - Data Server: Historical / MiddleManager
    

    Cluster Configuration

    Download & Extract

    1) On node 192.168.0.180, download Druid into the /opt/app directory:
    curl -O http://static.druid.io/artifacts/releases/druid-0.12.3-bin.tar.gz
    2) Extract it: tar -zxvf druid-0.12.3-bin.tar.gz, which produces the directory druid-0.12.3

    Edit the Configuration

    1) In druid-0.12.3/conf/druid/_common, edit the configuration file [common.runtime.properties]
    [common.runtime.properties]

    #Extensions
    druid.extensions.loadList=["druid-datasketches", "druid-lookups-cached-global", "mysql-metadata-storage", "druid-hdfs-storage", "druid-histogram", "druid-kafka-indexing-service"]
    
    # Logging
    druid.startup.logging.logProperties=true
        
    # Zookeeper
    druid.zk.service.host=192.168.0.180:2181,192.168.0.181:2181,192.168.0.182:2181
    druid.zk.paths.base=/druid
    
    # For MySQL
    druid.metadata.storage.type=mysql
    druid.metadata.storage.connector.connectURI=jdbc:mysql://192.168.0.182:3306/druid?characterEncoding=UTF-8
    druid.metadata.storage.connector.user=druid
    druid.metadata.storage.connector.password=druid1234
    
    # Deep storage - HDFS
    druid.storage.type=hdfs
    druid.storage.storageDirectory=/druid/segments
        
    # Indexing service logs
    druid.indexer.logs.type=hdfs
    druid.indexer.logs.directory=/druid/indexing-logs
    
    # Monitoring
    druid.monitoring.monitors=["io.druid.java.util.metrics.JvmMonitor"]
    druid.emitter=logging
    druid.emitter.logging.logLevel=info
        
    # Storage type of double columns
    druid.indexing.doubleStorage=double
    

    2) In druid-0.12.3/conf/druid/coordinator, edit the configuration files [jvm.config] and [runtime.properties]:
    [jvm.config]

    -server
    -Xms256m
    -Xmx256m
    -Duser.timezone=UTC+8
    -Dfile.encoding=UTF-8
    -Djava.io.tmpdir=var/tmp
    -Djava.util.logging.manager=org.apache.logging.log4j.jul.LogManager
    -Dderby.stream.error.file=var/druid/derby.log
    

    [runtime.properties]

    druid.service=druid/coordinator
    druid.host=192.168.0.180
    druid.port=8081
    druid.coordinator.startDelay=PT30S
    druid.coordinator.period=PT30S
    

    3) In druid-0.12.3/conf/druid/overlord, edit the configuration files [jvm.config] and [runtime.properties]:
    [jvm.config]

    -server
    -Xms512m
    -Xmx512m
    -XX:NewSize=256m
    -XX:MaxNewSize=256m
    -XX:+UseConcMarkSweepGC
    -XX:+PrintGCDetails
    -XX:+PrintGCTimeStamps
    -Duser.timezone=UTC+8
    -Dfile.encoding=UTF-8
    -Djava.io.tmpdir=var/tmp
    -Djava.util.logging.manager=org.apache.logging.log4j.jul.LogManager
    

    [runtime.properties]

    druid.service=druid/overlord
    druid.host=192.168.0.180
    druid.port=8090
        
    druid.indexer.queue.startDelay=PT30S
        
    druid.indexer.runner.type=remote
    druid.indexer.storage.type=metadata
    

    4) Copy the druid-0.12.3 directory from node 192.168.0.180 to the other two nodes:

    scp -r /opt/app/druid-0.12.3/ bigdata@192.168.0.181:/opt/app/.
    scp -r /opt/app/druid-0.12.3/ bigdata@192.168.0.182:/opt/app/.
    

    Note
    When editing the broker and historical configuration files, the parameter values must satisfy the following constraints (see the official docs for details):

    MaxDirectMemorySize >= druid.processing.buffer.sizeBytes * (druid.processing.numMergeBuffers + druid.processing.numThreads + 1)
    
    druid.processing.numMergeBuffers = max(2, druid.processing.numThreads / 4)
    
    druid.processing.numThreads =  Number of cores - 1 (or 1)
    
    druid.server.http.numThreads = max(10, (Number of cores * 17) / 16 + 2) + 30
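As a worked check of the first constraint, the broker settings used in step 5 below (buffer.sizeBytes=536870912, numMergeBuffers=2, numThreads=6) imply a minimum direct-memory size, and a quick shell calculation confirms it matches the broker's -XX:MaxDirectMemorySize=4608m:

```shell
# Minimum direct memory for the broker configuration in step 5:
# sizeBytes * (numMergeBuffers + numThreads + 1)
size_bytes=536870912        # druid.processing.buffer.sizeBytes (512 MiB)
num_merge_buffers=2         # druid.processing.numMergeBuffers
num_threads=6               # druid.processing.numThreads

min_direct=$(( size_bytes * (num_merge_buffers + num_threads + 1) ))
echo "MaxDirectMemorySize must be at least $(( min_direct / 1024 / 1024 ))m"
# prints: MaxDirectMemorySize must be at least 4608m
```

The historical node's settings below satisfy the same inequality with room to spare.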
    

    5) On node 192.168.0.181, go to druid-0.12.3/conf/druid/broker and edit [jvm.config] and [runtime.properties]
    [jvm.config]

    -server
    -Xms1g
    -Xmx1g
    -XX:NewSize=256m
    -XX:MaxNewSize=256m
    -XX:MaxDirectMemorySize=4608m
    -XX:+UseConcMarkSweepGC
    -XX:+PrintGCDetails
    -XX:+PrintGCTimeStamps
    -Duser.timezone=UTC+8
    -Dfile.encoding=UTF-8
    -Djava.io.tmpdir=var/tmp
    -Djava.util.logging.manager=org.apache.logging.log4j.jul.LogManager
    

    [runtime.properties]

    druid.service=druid/broker
    druid.host=192.168.0.181
    druid.port=8082
        
    # HTTP server threads
    druid.broker.http.numConnections=5
    druid.server.http.numThreads=25
        
    # Processing threads and buffers
    druid.processing.buffer.sizeBytes=536870912
    druid.processing.numMergeBuffers=2
    druid.processing.numThreads=6
        
        
    # Query cache
    druid.broker.cache.useCache=false
    druid.broker.cache.populateCache=false
    druid.cache.type=local
    druid.cache.sizeInBytes=2000000000
        
    # SQL
    druid.sql.enable=true
    

    6) On node 192.168.0.182, go to druid-0.12.3/conf/druid/historical and edit [jvm.config] and [runtime.properties]
    [jvm.config]

    -server
    -Xms1g
    -Xmx1g
    -XX:NewSize=512m
    -XX:MaxNewSize=512m
    -XX:MaxDirectMemorySize=3072m
    -XX:+UseConcMarkSweepGC
    -XX:+PrintGCDetails
    -XX:+PrintGCTimeStamps
    -Duser.timezone=UTC+8
    -Dfile.encoding=UTF-8
    -Djava.io.tmpdir=var/tmp
    -Djava.util.logging.manager=org.apache.logging.log4j.jul.LogManager
    

    [runtime.properties]

    druid.service=druid/historical
    druid.host=192.168.0.182
    druid.port=8083
        
    # HTTP server threads
    druid.server.http.numThreads=25
        
    # Processing threads and buffers
    druid.processing.buffer.sizeBytes=25600000
    druid.processing.numMergeBuffers=2
    druid.processing.numThreads=6
        
    # Segment storage
    druid.segmentCache.locations=[{"path":"var/druid/segment-cache","maxSize":130000000000}]
    druid.server.maxSize=130000000000
        
    # Query cache
    druid.historical.cache.useCache=true
    druid.historical.cache.populateCache=true
    druid.cache.type=caffeine
    druid.cache.sizeInBytes=2000000000
    

    7) On node 192.168.0.182, go to druid-0.12.3/conf/druid/middleManager and edit [jvm.config] and [runtime.properties]
    [jvm.config]

    -server
    -Xms1024m
    -Xmx1024m
    -XX:+UseConcMarkSweepGC
    -XX:+PrintGCDetails
    -XX:+PrintGCTimeStamps
    -Duser.timezone=UTC+8
    -Dfile.encoding=UTF-8
    -Djava.io.tmpdir=var/tmp
    -Djava.util.logging.manager=org.apache.logging.log4j.jul.LogManager
    

    [runtime.properties]

    druid.service=druid/middleManager
    druid.host=192.168.0.182
    druid.port=8091
        
    # Number of tasks per middleManager
    druid.worker.capacity=3
        
    # Task launch parameters
    druid.indexer.runner.javaOpts=-server -Xmx2g -Duser.timezone=UTC+8 -Dfile.encoding=UTF-8 -Djava.util.logging.manager=org.apache.logging.log4j.jul.LogManager
    druid.indexer.task.baseTaskDir=var/druid/task
    druid.indexer.task.restoreTasksOnRestart=true
        
    # HTTP server threads
    druid.server.http.numThreads=25
        
    # Processing threads and buffers on Peons
    druid.indexer.fork.property.druid.processing.buffer.sizeBytes=25600000
    druid.indexer.fork.property.druid.processing.numThreads=2
        
    # Hadoop indexing
    druid.indexer.task.hadoopWorkingPath=/druid/hadoop-tmp
    druid.indexer.task.defaultHadoopCoordinates=["org.apache.hadoop:hadoop-client:3.1.1"]
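A rough heap budget for this data node follows from the settings above: druid.worker.capacity=3 peons, each launched with -Xmx2g via druid.indexer.runner.javaOpts, on top of the middleManager's own 1g heap. A quick sanity check:

```shell
# Peak heap demand from indexing tasks on this node
# (values taken from the middleManager configuration above):
worker_capacity=3      # druid.worker.capacity
peon_heap_gb=2         # -Xmx2g in druid.indexer.runner.javaOpts
mm_heap_gb=1           # -Xmx1024m in jvm.config

echo "peak task heap: $(( worker_capacity * peon_heap_gb + mm_heap_gb ))g"
# prints: peak task heap: 7g
```

Note this excludes direct memory and the historical process co-located on 192.168.0.182 (another 1g heap plus 3072m direct memory), so size the machine's RAM accordingly.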
    

    8) Perform the following steps on each of the three nodes:
    ① From the Druid root directory, pull the external hadoop-client 3.1.1 dependencies:

    java -classpath "lib/*" io.druid.cli.Main tools pull-deps -h org.apache.hadoop:hadoop-client:3.1.1
    

    ② This guide uses MySQL for metadata storage, so download the MySQL metadata-storage extension:

    curl -O http://static.druid.io/artifacts/releases/mysql-metadata-storage-0.12.3.tar.gz
    Extract it: tar -zxvf mysql-metadata-storage-0.12.3.tar.gz, which produces mysql-metadata-storage; copy the extracted directory into druid-0.12.3/extensions

    ③ This guide uses HDFS as deep storage, so the Hadoop configuration XML files (core-site.xml, hdfs-site.xml, yarn-site.xml, mapred-site.xml) must be on the classpath of every Druid node. You can do this by copying them into conf/druid/_common/.
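The copy step can be sketched as a small helper; $HADOOP_HOME and the destination path below are assumptions, so adjust them to your installation:

```shell
# Copy the four Hadoop client config files onto Druid's classpath.
# Source and destination paths are assumptions -- adjust to your layout.
copy_hadoop_confs() {
  local src="$1" dst="$2"
  mkdir -p "$dst"
  for f in core-site.xml hdfs-site.xml yarn-site.xml mapred-site.xml; do
    cp "$src/$f" "$dst/"
  done
}

# Example invocation (paths assumed):
# copy_hadoop_confs "$HADOOP_HOME/etc/hadoop" /opt/app/druid-0.12.3/conf/druid/_common
```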


    ④ From the Druid root directory, run bin/init; this creates two directories under the root: var and log

    Starting the Cluster

    1) On node 192.168.0.180, start the coordinator and overlord

    Option 1: java `cat conf/druid/coordinator/jvm.config | xargs` -cp "conf/druid/_common:conf/druid/coordinator:lib/*" io.druid.cli.Main server coordinator
              java `cat conf/druid/overlord/jvm.config | xargs` -cp "conf/druid/_common:conf/druid/overlord:lib/*" io.druid.cli.Main server overlord
    
    Option 2: ./bin/coordinator.sh start
              ./bin/overlord.sh start
    

    2) On node 192.168.0.181, start the broker

    Option 1: java `cat conf/druid/broker/jvm.config | xargs` -cp "conf/druid/_common:conf/druid/broker:lib/*" io.druid.cli.Main server broker
    Option 2: ./bin/broker.sh start
    

    3) On node 192.168.0.182, start the historical and middleManager

    Option 1: java `cat conf/druid/historical/jvm.config | xargs` -cp "conf/druid/_common:conf/druid/historical:lib/*" io.druid.cli.Main server historical
              java `cat conf/druid/middleManager/jvm.config | xargs` -cp "conf/druid/_common:conf/druid/middleManager:lib/*" io.druid.cli.Main server middleManager
    
    Option 2: ./bin/historical.sh start
              ./bin/middleManager.sh start
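Every Druid process serves a /status HTTP endpoint once it is up, so a small polling helper (the hosts and ports below match this cluster's layout) can confirm that each daemon started:

```shell
# Poll a Druid process's /status endpoint until it answers, or give up.
wait_for_druid() {
  local url="$1" tries="${2:-30}" i
  for i in $(seq 1 "$tries"); do
    if curl -sf "$url/status" > /dev/null 2>&1; then
      echo "UP $url"
      return 0
    fi
    sleep 2
  done
  echo "DOWN $url"
  return 1
}

# Check each process in this cluster:
# wait_for_druid http://192.168.0.180:8081   # coordinator
# wait_for_druid http://192.168.0.180:8090   # overlord
# wait_for_druid http://192.168.0.181:8082   # broker
# wait_for_druid http://192.168.0.182:8083   # historical
# wait_for_druid http://192.168.0.182:8091   # middleManager
```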
    

    Checking Cluster Status

    Coordinator console: http://192.168.0.180:8081
    Overlord console: http://192.168.0.180:8090

    A Small Example

    Description: this demo loads data from HDFS into Druid, then queries the ingested data from the broker node.
    1: First, upload a locally created JSON file, hdfs-data.json, to the HDFS directory /tmp/druid. Its contents:
    {"timestamp":"2018-01-01T01:01:35Z","srcIP":"1.1.1.1", "dstIP":"2.2.2.2","packets":20,"bytes":9024}
    {"timestamp":"2018-01-01T01:01:51Z","srcIP":"1.1.1.1", "dstIP":"2.2.2.2","packets":255,"bytes":21133}
    {"timestamp":"2018-01-01T01:01:59Z","srcIP":"1.1.1.1", "dstIP":"2.2.2.2","packets":11,"bytes":5780}
    {"timestamp":"2018-01-01T01:02:14Z","srcIP":"1.1.1.1", "dstIP":"2.2.2.2","packets":38,"bytes":6289}
    {"timestamp":"2018-01-01T01:02:29Z","srcIP":"1.1.1.1", "dstIP":"2.2.2.2","packets":377,"bytes":359971}
    {"timestamp":"2018-01-01T01:03:29Z","srcIP":"1.1.1.1", "dstIP":"2.2.2.2","packets":49,"bytes":10204}
    {"timestamp":"2018-01-02T21:33:14Z","srcIP":"7.7.7.7", "dstIP":"8.8.8.8","packets":38,"bytes":6289}
    {"timestamp":"2018-01-02T21:33:45Z","srcIP":"7.7.7.7", "dstIP":"8.8.8.8","packets":123,"bytes":93999}
    {"timestamp":"2018-01-02T21:35:45Z","srcIP":"7.7.7.7", "dstIP":"8.8.8.8","packets":12,"bytes":2818}
    

    hdfs dfs -put /opt/app/druid-0.12.3/quickstart/hdfs-book/hdfs-data.json /tmp/druid/.

    2: Create an indexing spec file, hdfs-index.json, with the following contents:
      {
        "type" : "index_hadoop",
        "spec" : {
                "dataSchema" : {
                "dataSource" : "rollup-tutorial",
                "parser" : {
                        "type" : "string",
                        "parseSpec" : {
                        "format" : "json",
                        "dimensionsSpec" : {
                            "dimensions" : [
                                    "srcIP",
                                    "dstIP",
                                    "packets",
                                    "bytes"
                                 ]
                         },
                        "timestampSpec": {
                            "column": "timestamp",
                            "format": "iso"
                          }
                      }
                   },
                  "metricsSpec" : [],
                  "granularitySpec" : {
                        "type" : "uniform",
                        "segmentGranularity" : "day",
                        "queryGranularity" : "none",
                        "intervals" : ["2018-01-01/2018-01-03"],
                        "rollup" : false
                    }
                },
             "ioConfig" : {
                    "type" : "hadoop",
                    "inputSpec" : {
                          "type" : "static",
                          "paths" : "/tmp/druid/hdfs-data.json"
                      }
               },
            "tuningConfig" : {
                "type" : "hadoop",
                "targetPartitionSize" : 5000000,
                "maxRowsInMemory" : 25000,
                "forceExtendableShardSpecs" : true,
                "jobProperties" : {
                        "mapreduce.job.classloader" : "true"
                   }
              }
            },
            "hadoopDependencyCoordinates" : [
                  "org.apache.hadoop:hadoop-client:3.1.1"
              ]
      }
    
    3: Submit the ingestion task with the following command (the overlord runs at 192.168.0.180:8090):
     curl -X POST -H 'Content-Type:application/json' -d @hdfs-index.json http://192.168.0.180:8090/druid/indexer/v1/task
    
    If the overlord responds with a task id, the task was submitted successfully.
    4: View the submitted task in the overlord web console.
    5: View the newly created datasource in the coordinator web console.
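Since druid.sql.enable=true was set in the broker configuration, the ingested data can also be queried over HTTP through the broker's /druid/v2/sql endpoint. A sketch (the SQL itself is illustrative; the datasource name matches the spec above):

```shell
# Query the new datasource through the broker's SQL endpoint.
# rollup-tutorial is the dataSource defined in hdfs-index.json.
cat > query.json <<'EOF'
{"query": "SELECT srcIP, dstIP, COUNT(*) AS cnt FROM \"rollup-tutorial\" GROUP BY srcIP, dstIP"}
EOF

curl -X POST -H 'Content-Type: application/json' \
     -d @query.json http://192.168.0.181:8082/druid/v2/sql \
  || echo "query failed -- is the broker up and the datasource loaded?"
```

The response is a JSON array of result rows, one object per srcIP/dstIP pair.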
