Flume in Practice

Author: 学术界末流打工人 | Published: 2020-02-03 14:53

    Overview

    Reference: the configuration documentation on the official Flume website.

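    The key to using Flume is writing the configuration file:
    A) Configure the Source
    B) Configure the Channel
    C) Configure the Sink
    D) Wire the three components together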

    Anatomy of a configuration file

    a1: name of the agent
    r1: name of the source
    k1: name of the sink
    c1: name of the channel

    # example.conf: A single-node Flume configuration
    
    # Name the components on this agent
    a1.sources = r1
    a1.sinks = k1
    a1.channels = c1
    
    # Describe/configure the source
    a1.sources.r1.type = netcat
    a1.sources.r1.bind = localhost
    a1.sources.r1.port = 44444
    
    # Describe the sink
    a1.sinks.k1.type = logger
    
    # Use a channel which buffers events in memory
    a1.channels.c1.type = memory
    a1.channels.c1.capacity = 1000
    a1.channels.c1.transactionCapacity = 100
    
    # Bind the source and sink to the channel
    a1.sources.r1.channels = c1
    a1.sinks.k1.channel = c1
    

    Start the agent

    flume-ng agent \
    --name a1  \
    --conf $FLUME_HOME/conf  \
    --conf-file $FLUME_HOME/conf/example.conf \
    -Dflume.root.logger=INFO,console
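
The value passed to --name has to match the prefix used in the configuration file (a1 here); a mismatch typically leaves the agent running with no components configured. The sketch below is one way to check this before starting. The helper name conf_declares_agent is hypothetical and not part of Flume; it only looks for an "<agent>.sources" key.

```shell
# conf_declares_agent FILE AGENT
# Succeeds if FILE contains a non-comment "AGENT.sources = ..." line.
# Hypothetical helper for illustration; not a Flume command.
conf_declares_agent() {
    conf_file=$1
    agent=$2
    awk -v prefix="$agent.sources" '
        /^[ \t]*#/ { next }                     # skip comment lines
        {
            pos = index($0, "=")
            if (pos == 0) next                  # not a key = value line
            key = substr($0, 1, pos - 1)
            gsub(/^[ \t]+|[ \t]+$/, "", key)    # trim whitespace around the key
            if (key == prefix) found = 1
        }
        END { exit (found ? 0 : 1) }
    ' "$conf_file"
}
```

For example: conf_declares_agent "$FLUME_HOME/conf/example.conf" a1 || echo "agent a1 is not declared" >&2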
    

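    Exercise 1

    Requirement: collect data from a given network port and print it to the console.

    Configuration file

    example.conf, placed in the conf directory: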

    # Name the components on this agent
    a1.sources = r1
    a1.sinks = k1
    a1.channels = c1
    
    # Describe/configure the source
    a1.sources.r1.type = netcat
    a1.sources.r1.bind = hadoop000
    a1.sources.r1.port = 44444
    
    # Describe the sink
    a1.sinks.k1.type = logger
    
    # Use a channel which buffers events in memory
    a1.channels.c1.type = memory
    
    # Bind the source and sink to the channel
    a1.sources.r1.channels = c1
    a1.sinks.k1.channel = c1
    

    Start the agent

    flume-ng agent \
    --name a1  \
    --conf $FLUME_HOME/conf  \
    --conf-file $FLUME_HOME/conf/example.conf \
    -Dflume.root.logger=INFO,console
    

    Test

    telnet hadoop000 44444
    
    hello
    hadoop
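
If telnet is not installed, the same test can be sketched with nc, assuming an nc binary is on the PATH. The function name send_lines and the NC override variable are hypothetical conveniences. With its default settings the netcat source should answer each line with OK. Some nc variants keep the connection open after the input ends, in which case Ctrl-C closes it.

```shell
# send_lines HOST PORT LINE...
# Sends each LINE, newline-terminated, to HOST:PORT over TCP.
# NC is a hypothetical override for the netcat binary (default: nc).
send_lines() {
    host=$1
    port=$2
    shift 2
    printf '%s\n' "$@" | "${NC:-nc}" "$host" "$port"
}
```

For example: send_lines hadoop000 44444 hello hadoop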
    

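    Exercise 2

    Requirement: monitor a file and print newly appended data to the console in real time.
    Agent design: Source (exec) + Channel (memory) + Sink (logger)

    Configuration file (exec-memory-logger.conf, the file name used by the start command below)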

    # Name the components on this agent
    a1.sources = r1
    a1.sinks = k1
    a1.channels = c1
    
    # Describe/configure the source
    a1.sources.r1.type = exec
    a1.sources.r1.command = tail -F /home/hadoop/data/data.log
    a1.sources.r1.shell = /bin/sh -c
    
    # Describe the sink
    a1.sinks.k1.type = logger
    
    # Use a channel which buffers events in memory
    a1.channels.c1.type = memory
    
    # Bind the source and sink to the channel
    a1.sources.r1.channels = c1
    a1.sinks.k1.channel = c1
    

    Start the agent

    flume-ng agent \
    --name a1  \
    --conf $FLUME_HOME/conf  \
    --conf-file $FLUME_HOME/conf/exec-memory-logger.conf \
    -Dflume.root.logger=INFO,console
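
To test this agent, append lines to the monitored file from a second terminal and watch them appear on the agent's console. The helper below is a minimal sketch; the name append_test_lines and the text of the generated lines are made up for illustration.

```shell
# append_test_lines FILE COUNT [DELAY_SECONDS]
# Appends COUNT numbered lines to FILE, optionally pausing between lines.
# Hypothetical helper for generating test input.
append_test_lines() {
    log_file=$1
    count=$2
    delay=${3:-0}
    i=1
    while [ "$i" -le "$count" ]; do
        echo "test line $i" >> "$log_file"
        [ "$delay" = 0 ] || sleep "$delay"
        i=$((i + 1))
    done
}
```

For example: append_test_lines /home/hadoop/data/data.log 5 1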
    

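    Exercise 3

    Requirement: collect logs from server A onto server B in real time. In the configurations below, both agents use hadoop000, so a single host stands in for both servers.


    Architecture

    Agent design:

    1. Server A: Source (exec) + Channel (memory) + Sink (avro)
    2. Server B: Source (avro) + Channel (memory) + Sink (logger)

    Configuration files

    Configuration file 1: exec-memory-avro.conf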

    exec-memory-avro.sources = exec-source
    exec-memory-avro.sinks = avro-sink
    exec-memory-avro.channels = memory-channel
    
    exec-memory-avro.sources.exec-source.type = exec
    exec-memory-avro.sources.exec-source.command = tail -F /home/hadoop/data/data.log
    exec-memory-avro.sources.exec-source.shell = /bin/sh -c
    
    exec-memory-avro.sinks.avro-sink.type = avro
    exec-memory-avro.sinks.avro-sink.hostname = hadoop000
    exec-memory-avro.sinks.avro-sink.port = 44444
    
    exec-memory-avro.channels.memory-channel.type = memory
    
    exec-memory-avro.sources.exec-source.channels = memory-channel
    exec-memory-avro.sinks.avro-sink.channel = memory-channel
    

    Configuration file 2: avro-memory-logger.conf

    avro-memory-logger.sources = avro-source
    avro-memory-logger.sinks = logger-sink
    avro-memory-logger.channels = memory-channel
    
    avro-memory-logger.sources.avro-source.type = avro
    avro-memory-logger.sources.avro-source.bind = hadoop000
    avro-memory-logger.sources.avro-source.port = 44444
    
    avro-memory-logger.sinks.logger-sink.type = logger
    
    avro-memory-logger.channels.memory-channel.type = memory
    
    avro-memory-logger.sources.avro-source.channels = memory-channel
    avro-memory-logger.sinks.logger-sink.channel = memory-channel
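
The two agents only connect if the avro sink in the first file points at the port that the avro source in the second file listens on. The sketch below compares the two port values. get_prop and avro_ports_match are hypothetical helper names, the parsing is a simplified reading of key = value lines, and host names are not compared because a source may bind to a different address than the one the sink dials.

```shell
# get_prop FILE KEY
# Prints the value of KEY from a "key = value" file (last occurrence wins).
get_prop() {
    awk -v key="$2" '
        /^[ \t]*#/ { next }                     # skip comment lines
        {
            pos = index($0, "=")
            if (pos == 0) next
            k = substr($0, 1, pos - 1)
            v = substr($0, pos + 1)
            gsub(/^[ \t]+|[ \t]+$/, "", k)      # trim key
            gsub(/^[ \t]+|[ \t]+$/, "", v)      # trim value
            if (k == key) { value = v; found = 1 }
        }
        END { if (found) print value }
    ' "$1"
}

# avro_ports_match SINK_CONF SOURCE_CONF
# Succeeds if the avro sink port equals the avro source port.
# Key names are the ones used in this exercise's two files.
avro_ports_match() {
    sink_port=$(get_prop "$1" exec-memory-avro.sinks.avro-sink.port)
    source_port=$(get_prop "$2" avro-memory-logger.sources.avro-source.port)
    [ -n "$sink_port" ] && [ "$sink_port" = "$source_port" ]
}
```

For example: avro_ports_match "$FLUME_HOME/conf/exec-memory-avro.conf" "$FLUME_HOME/conf/avro-memory-logger.conf" || echo "port mismatch" >&2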
    

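    Start the agents

    Start avro-memory-logger first, so that its avro source is already listening when the avro sink connects, then start exec-memory-avro: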

    flume-ng agent \
    --name avro-memory-logger  \
    --conf $FLUME_HOME/conf  \
    --conf-file $FLUME_HOME/conf/avro-memory-logger.conf \
    -Dflume.root.logger=INFO,console
    
    flume-ng agent \
    --name exec-memory-avro  \
    --conf $FLUME_HOME/conf  \
    --conf-file $FLUME_HOME/conf/exec-memory-avro.conf \
    -Dflume.root.logger=INFO,console
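
All the start commands in this article differ only in the agent name and the configuration file, so they can be wrapped in a small function. This is a sketch, not a Flume feature: start_agent is a hypothetical name, and FLUME_NG is a hypothetical override that allows a dry run (for example FLUME_NG=echo prints the command instead of running it).

```shell
# start_agent AGENT_NAME CONF_FILE_NAME
# Runs flume-ng with the same options used throughout this article.
# CONF_FILE_NAME is looked up under $FLUME_HOME/conf.
start_agent() {
    agent=$1
    conf_name=$2
    conf_dir="${FLUME_HOME:?FLUME_HOME must be set}/conf"
    "${FLUME_NG:-flume-ng}" agent \
        --name "$agent" \
        --conf "$conf_dir" \
        --conf-file "$conf_dir/$conf_name" \
        -Dflume.root.logger=INFO,console
}

# Usage for this exercise, one command per terminal:
#   start_agent avro-memory-logger avro-memory-logger.conf   # start this one first
#   start_agent exec-memory-avro   exec-memory-avro.conf
```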
    

    References

    1. Official Flume website
    2. Spark Streaming实时流处理项目实战 (Spark Streaming Real-Time Stream Processing Project in Practice)
