
好程序员 Big Data Training: Comparing Logstash and Flume

Author: ab6973df9221 | Published 2019-08-13 15:33

    This 好程序员 big data training post compares Logstash and Flume. Neither tool has a cluster concept of its own; both Logstash and Flume run as standalone agents (components).

    Logstash is developed in JRuby.

    Component comparison:

      Logstash: input → filter → output

      Flume:    source → channel → sink

    Strengths and weaknesses:

    Logstash:

     Simple to install, small footprint

     Has a filter stage, giving it built-in data filtering and field-splitting capabilities

     Integrates seamlessly with Elasticsearch (ES)

     Fault-tolerant during collection: if the process dies or the connection drops, it resumes where it left off, because it records the read offset (see the sketch below)

      In short, its main use case is collecting log data
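
    The resume-from-offset behavior comes from the file input's sincedb mechanism, which persists how far each file has been read. A minimal sketch, with hypothetical paths:

    input {
      file {
        path => "/var/log/app/*.log"
        start_position => "beginning"
        sincedb_path => "/var/lib/logstash/sincedb-app"   # read offsets persisted here
      }
    }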

    Flume:

     Stronger than Logstash in high availability

     Emphasizes data safety: transfers are controlled by transactions

     Can be applied to many kinds of data-transfer scenarios

    Data integration

    Upload the Logstash .gz archive and extract it.

    A conf directory can be created under the Logstash directory to hold the configuration files.
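
    For example (the archive name is hypothetical; use the version actually downloaded):

      tar -zxvf logstash-2.3.4.tar.gz
      cd logstash-2.3.4
      mkdir conf    # the pipeline .conf files from the sections below go here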

    I. Starting from the command line

    1. bin/logstash -e 'input { stdin {} } output { stdout{} }'

      stdin/stdout (standard input and output streams)

    hello xixi

    2018-09-12T21:58:58.649Z hadoop01 hello xixi

    hello haha

    2018-09-12T21:59:19.487Z hadoop01 hello haha

    2. bin/logstash -e 'input { stdin {} } output { stdout{ codec => rubydebug } }'

    hello xixi

    {

           "message" => "hello xixi",

          "@version" => "1",

        "@timestamp" => "2018-09-12T22:00:49.612Z",

              "host" => "hadoop01"

    }

    3. Writing to an ES cluster (the ES cluster must be started first)

      bin/logstash -e 'input { stdin {} } output { elasticsearch {hosts => ["192.168.88.81:9200"]} stdout{} }'

    After entering events, ES automatically creates the index and the mapping.

    hello haha

    2018-09-12T22:13:05.361Z hadoop01 hello haha

      bin/logstash -e 'input { stdin {} } output { elasticsearch {hosts => ["192.168.88.81:9200", "192.168.88.82:9200"]} stdout{} }'
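
    By default the elasticsearch output writes to an index named logstash-YYYY.MM.dd; the auto-created index and mapping can be checked with standard ES REST calls:

      curl '192.168.88.81:9200/_cat/indices?v'
      curl '192.168.88.81:9200/logstash-*/_mapping?pretty'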

    4. Writing to a Kafka cluster (start the Kafka cluster first); the Kafka output plugin takes the same options as the config files below:

      bin/logstash -e 'input { stdin {} } output { kafka { topic_id => "test1" bootstrap_servers => "node01:9092,node02:9092,node03:9092" } stdout{} }'

    II. Starting from a configuration file

    The ZooKeeper, Kafka, and ES clusters must all be running.

    1. Integrating with Kafka

    vi logstash-kafka.conf

      Start it:

      bin/logstash -f logstash-kafka.conf  (-f: specify the config file)

      Then start a Kafka console consumer on another node (see the example after the config below).

    input {
      file {
        path => "/root/data/test.log"
        discover_interval => 5          # check for new files every 5 seconds
        start_position => "beginning"   # read existing file content from the start
      }
    }

    output {
      kafka {
        topic_id => "test1"
        codec => plain {
          format => "%{message}"        # forward just the raw log line
          charset => "UTF-8"
        }
        bootstrap_servers => "node01:9092,node02:9092,node03:9092"
      }
    }
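
    To verify, start a console consumer on another node; the ZooKeeper-based flag below matches Kafka versions of this era (newer Kafka uses --bootstrap-server instead):

      bin/kafka-console-consumer.sh --zookeeper node01:2181,node02:2181,node03:2181 --topic test1 --from-beginning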

    2. Kafka/ES integration

    vi logstash-es.conf

    # start Logstash

    bin/logstash -f logstash-es.conf

      Start the Kafka console consumer on another node, as in step 1.

    input {
      file {
        type => "gamelog"
        path => "/log/*/*.log"
        discover_interval => 10
        start_position => "beginning"
      }
    }

    output {
      elasticsearch {
        index => "gamelog-%{+YYYY.MM.dd}"   # one index per day
        hosts => ["node01:9200", "node02:9200", "node03:9200"]
      }
    }
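
    Once events are flowing, the daily index can be spot-checked with a standard ES query:

      curl 'node01:9200/gamelog-*/_search?pretty&size=1'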

    The integration pipeline

    Logstash placement: run it on whichever node has the most free resources (placement is flexible).

    1. Start one Logstash instance to watch the log server directory and collect the data into Kafka.

    2. Start a second Logstash instance to consume a Kafka topic and ship the data to Elasticsearch.

    Integration example

    Two Logstash instances are needed, each started with its own config file.

    1. Collect data into Kafka

      cd conf

      Create the config file: vi gs-kafka.conf

    input {
      file {
        codec => plain {
          charset => "GB2312"           # the source files are GB2312-encoded
        }
        path => "/root/basedir/*/*.txt"
        discover_interval => 5
        start_position => "beginning"
      }
    }

    output {
      kafka {
        topic_id => "gamelogs"
        codec => plain {
          format => "%{message}"
          charset => "GB2312"
        }
        bootstrap_servers => "node01:9092,node02:9092,node03:9092"
      }
    }

      Create the corresponding Kafka topic:

    bin/kafka-topics.sh --create --zookeeper hadoop01:2181 --replication-factor 1 --partitions 1 --topic gamelogs
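
    Confirm the topic exists (same ZooKeeper-era flag):

      bin/kafka-topics.sh --describe --zookeeper hadoop01:2181 --topic gamelogs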

    2. Start Logstash on hadoop01

      bin/logstash -f conf/gs-kafka.conf

    3. Start a second Logstash instance on hadoop02

      cd logstash/conf

      vi kafka-es.conf

    input {
      kafka {
        type => "accesslogs"
        codec => "plain"
        auto_offset_reset => "smallest"   # start from the earliest offset
        group_id => "elas1"
        topic_id => "accesslogs"
        zk_connect => "node01:2181,node02:2181,node03:2181"   # old ZooKeeper-based consumer syntax
      }

      kafka {
        type => "gamelogs"
        auto_offset_reset => "smallest"
        codec => "plain"
        group_id => "elas2"
        topic_id => "gamelogs"
        zk_connect => "node01:2181,node02:2181,node03:2181"
      }
    }

    filter {
      if [type] == "accesslogs" {
        json {
          source => "message"
          remove_field => [ "message" ]
          target => "access"
        }
      }

      if [type] == "gamelogs" {
        mutate {
          split => { "message" => " " }   # break the line into an array on spaces
          add_field => {
            "event_type" => "%{[message][3]}"
            "current_map" => "%{[message][4]}"
            "current_X" => "%{[message][5]}"
            "current_y" => "%{[message][6]}"
            "user" => "%{[message][7]}"
            "item" => "%{[message][8]}"
            "item_id" => "%{[message][9]}"
            "current_time" => "%{[message][12]}"
          }
          remove_field => [ "message" ]   # drop the raw array once fields are extracted
        }
      }
    }
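
    To see how the array indices line up, take a hypothetical game-log line whose space-separated tokens are tok0 through tok12; after the split, the sprintf references resolve as follows:

      # tok0 tok1 tok2 tok3 tok4 ... tok12      (hypothetical tokens)
      # event_type   = %{[message][3]}  -> tok3
      # current_map  = %{[message][4]}  -> tok4
      # current_time = %{[message][12]} -> tok12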

    output {

      if [type] == "accesslogs" {

        elasticsearch {

          index => "accesslogs"

          codec => "json"

          hosts => ["node01:9200", "node02:9200", "node03:9200"]

        }

      }

      if [type] == "gamelogs" {

        elasticsearch {

          index => "gamelogs1"

          codec => plain {

            charset => "UTF-16BE"

          }

          hosts => ["node01:9200", "node02:9200", "node03:9200"]

        }

      }

    }

       bin/logstash -f conf/kafka-es.conf

    4. Modifying any data in the files under basedir will cause the ES index to be created.
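
    For example, appending a line to any file matching the watched pattern triggers ingestion (the file name is hypothetical):

      echo "2018-09-12 test event" >> /root/basedir/game/sample.txt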

    5. The index data is stored under the configured /data/esdata path.

    6. Searching a specified field from the web UI

      With the default analyzer, Chinese text is tokenized into single characters, so a term query can only match a single Chinese character, while a query_string query can match whole Chinese phrases.
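
    For example, a query_string search over the user field extracted above (the value is hypothetical):

      curl 'node01:9200/gamelogs1/_search?pretty' -d '{
        "query": { "query_string": { "query": "user:张三" } }
      }'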

    好程序员 Big Data Training official site: http://www.goodprogrammer.org/bigdata.shtml
