Crawler Log Collection (Flume + Kafka + ELK)

Author: 财务自由_lang | Published 2017-08-09 13:37

    (1) Flume 1.6

    1.1 Flume configuration (ship logs to HDFS for offline analysis and to Kafka for real-time analysis)

    a1.sources = r1
    a1.sinks = k2 k1
    a1.channels = c2 c1

    # Describe/configure the source
    a1.sources.r1.type = exec
    a1.sources.r1.command = tail -n +0 -f /usr/lang/log.log
    # "channels" takes a space-separated list; setting the property twice
    # would overwrite the first value and bind the source to one channel only.
    a1.sources.r1.channels = c1 c2

    # Describe the sinks
    # k1: HDFS sink (offline analysis)
    a1.sinks.k1.type = hdfs
    a1.sinks.k1.channel = c1
    a1.sinks.k1.hdfs.path = hdfs://lang:8020/user/flume
    a1.sinks.k1.hdfs.filePrefix = events-
    a1.sinks.k1.hdfs.round = true
    a1.sinks.k1.hdfs.roundValue = 10
    a1.sinks.k1.hdfs.roundUnit = minute

    # k2: Kafka sink (real-time analysis)
    a1.sinks.k2.channel = c2
    a1.sinks.k2.type = org.apache.flume.sink.kafka.KafkaSink
    a1.sinks.k2.topic = lang
    a1.sinks.k2.brokerList = node1:9092
    a1.sinks.k2.requiredAcks = 1
    a1.sinks.k2.batchSize = 20

    # Use channels that buffer events in memory
    a1.channels.c1.type = memory
    a1.channels.c1.capacity = 1000
    a1.channels.c1.transactionCapacity = 100

    a1.channels.c2.type = memory
    a1.channels.c2.capacity = 1000
    a1.channels.c2.transactionCapacity = 100

    1.2 Starting Flume

    bin/flume-ng agent -c conf -f conf/flume-conf -n a1 -Dflume.root.logger=DEBUG,console
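    Once the agent is up, the pipeline can be smoke-tested end to end. A minimal sketch, assuming the Kafka cluster from section (2) is already running and the lang topic exists:

    echo "hello flume" >> /usr/lang/log.log
    bin/kafka-console-consumer.sh --bootstrap-server node1:9092 --topic lang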

    (2) Kafka 0.11 Cluster

    2.1 Key configuration files

    server.properties:

            broker.id=0  (assign 0, 1, 2 according to the actual host)
            listeners=PLAINTEXT://:9092
            zookeeper.connect=192.168.205.11:2181,192.168.205.12:2181,192.168.205.13:2181

    producer.properties:

            bootstrap.servers=192.168.205.11:9092,192.168.205.12:9092,192.168.205.13:9092

    consumer.properties:

            zookeeper.connect=192.168.205.11:2181,192.168.205.12:2181,192.168.205.13:2181

    2.2 Sync the configuration files to the other brokers (one way to do this is sketched below)
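    A minimal sketch, assuming Kafka is installed at the same path on brokers node2 and node3 (the hostnames and the /usr/kafka path are assumptions for illustration):

    # Copy server.properties to the other brokers, then edit broker.id
    # on each host so every broker has a unique id (0, 1, 2).
    scp config/server.properties node2:/usr/kafka/config/
    scp config/server.properties node3:/usr/kafka/config/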

    2.3 Common commands

    Start ZooKeeper first.
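    If you use the ZooKeeper bundled with Kafka, it can be started from the Kafka directory (a standalone ZooKeeper install would use its own zkServer.sh start instead):

    bin/zookeeper-server-start.sh config/zookeeper.properties &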

    Start Kafka:       bin/kafka-server-start.sh config/server.properties &
    Stop Kafka:        bin/kafka-server-stop.sh
    Create a topic:    bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic lang
    List topics:       bin/kafka-topics.sh --list --zookeeper localhost:2181
    Describe a topic:  bin/kafka-topics.sh --describe --zookeeper localhost:2181 --topic lang
    Console producer:  bin/kafka-console-producer.sh --broker-list node1:9092 --topic lang
    Console consumer:  bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic lang --from-beginning
    Delete a topic:    bin/kafka-topics.sh --delete --zookeeper 130.51.23.95:2181 --topic topicname
                       (only takes effect when delete.topic.enable=true in server.properties)

    (3) Logstash 5.5.1

    3.1 Configuration (file input, Elasticsearch output)

    input {
        file {
            path => ["/usr/lang/log.log"]
            start_position => "beginning"
        }
    }

    filter {
        date {
            # assumes the event already carries a "timestamp" field,
            # typically extracted by a grok filter beforehand
            match => [ "timestamp" , "YYYY-MM-dd HH:mm:ss" ]
        }
    }

    output {
        elasticsearch {
            hosts => ["192.168.205.14:9200"]
        }
        stdout {
            codec => rubydebug
        }
    }
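    To run this pipeline, pass the file to Logstash (the config file name here is just an assumed example):

    bin/logstash -f file-to-es.conf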

    3.2 Configuration (Kafka input, Elasticsearch output)

    input {
        kafka {
            #workers => 2
            bootstrap_servers => "node1:9092,node2:9092,node3:9092"    # Kafka broker list (not the ZooKeeper address)
            topics => ["lang"]    # topic name in Kafka; remember to create this topic first
            #group_id => "logstash"    # defaults to "logstash"
            #consumer_threads => 2    # number of consumer threads
            #auto_offset_reset => "earliest"    # where to start when there is no committed offset
            #decorate_events => true    # attach metadata about each consumed message: its size, source topic, and consumer group
            #type => "nginx-access-log"
        }
    }

    filter {
        date {
            # assumes a "timestamp" field, as in 3.1
            match => [ "timestamp" , "YYYY-MM-dd HH:mm:ss" ]
        }
    }

    output {
        elasticsearch {
            hosts => ["192.168.205.14:9200"]
            #index => "kafkaindex-%{+YYYY.MM.dd}"    # custom index name pattern
        }
        stdout {
            codec => rubydebug
        }
    }
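    This pipeline is started the same way, e.g. bin/logstash -f kafka-to-es.conf (again, an assumed file name).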

    (4) Elasticsearch

    4.1 Memory settings: config/jvm.options
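    The heap is set there with JVM flags; 2g below is only an illustrative value (set -Xms and -Xmx to the same size, and no more than half of the machine's RAM):

    -Xms2g
    -Xmx2g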

    4.2 Configuration file: config/elasticsearch.yml

    cluster.name: my-application
    node.name: node-1    # must be unique for each node in the cluster
    network.host: 192.168.205.14
    http.port: 9200
    bootstrap.system_call_filter: false
    http.cors.enabled: true
    http.cors.allow-origin: "*"
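    A quick way to verify the node is up (using the host and port configured above):

    curl http://192.168.205.14:9200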

    4.3 Notes: get the Java heap settings right, and mind the whitespace in elasticsearch.yml (YAML requires a space after each colon, e.g. "http.port: 9200", otherwise the node fails to start).

    4.4 elasticsearch-head (a web UI for managing indices); one way to install and run it is sketched below.
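    A minimal sketch of running elasticsearch-head from source (requires Node.js; it serves on port 9100 by default, and the CORS settings in elasticsearch.yml above are what allow it to talk to the cluster):

    git clone https://github.com/mobz/elasticsearch-head.git
    cd elasticsearch-head
    npm install
    npm run start
    # then open http://localhost:9100 and connect to http://192.168.205.14:9200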

    (5) Kibana

    Nothing special here; just point it at Elasticsearch and start it (a minimal sketch follows).
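    A minimal sketch, assuming a Kibana version matching this stack (5.x) with default paths; set the Elasticsearch address in config/kibana.yml, then start:

    # config/kibana.yml
    server.host: "0.0.0.0"
    elasticsearch.url: "http://192.168.205.14:9200"

    bin/kibana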

    If you run into problems, feel free to contact me on QQ: 1146941596
