Hadoop-Flume Basics in Practice (2)

Author: GuangHui | Published 2018-04-21 11:15

    1. Flume Installation and Configuration

    1. Flume website: http://flume.apache.org
    2. JDK 1.7 or later is required.
    3. This walkthrough uses the Flume release apache-flume-1.6.0-bin.tar.gz:
      <1> Extract it: tar -zxvf apache-flume-1.6.0-bin.tar.gz
      <2> Installation directory: /usr/local/src/apache-flume-1.6.0-bin
      <3> Configure the environment variables (vi ~/.bashrc) as follows:
    # new add FLUME_HOME
    export FLUME_HOME=/usr/local/src/apache-flume-1.6.0-bin
    
    # new add FLUME_HOME into PATH
    export PATH=$FLUME_HOME/bin:$PATH
    

    <4> The complete ~/.bashrc then looks like this:

    # .bashrc
    
    # User specific aliases and functions
    
    alias rm='rm -i'
    alias cp='cp -i'
    alias mv='mv -i'
    
    # Source global definitions
    if [ -f /etc/bashrc ]; then
            . /etc/bashrc
    fi
    
    iptables -F
    setenforce 0
    hostname master
    
    
    export JAVA_HOME=/usr/local/src/jdk1.7.0_80
    export HADOOP_HOME=/usr/local/src/hadoop-2.6.1
    # new add FLUME_HOME
    export FLUME_HOME=/usr/local/src/apache-flume-1.6.0-bin
    
    # added by Anaconda3
    #export PATH =/root/anaconda3/bin:$PATH
    export CLASSPATH=.:$CLASSPATH:$JAVA_HOME/lib
    export PATH=$HADOOP_HOME/bin:$JAVA_HOME/bin:$PATH
    # new add FLUME_HOME into PATH
    export PATH=$FLUME_HOME/bin:$PATH
    

    <5> Reload the environment variables: source ~/.bashrc
    <6> Check that $FLUME_HOME took effect by running echo $FLUME_HOME and inspecting the output:

    [root@master ~]# echo $FLUME_HOME
    /usr/local/src/apache-flume-1.6.0-bin
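
    As an extra sanity check (a suggestion, not from the original steps), the flume-ng launcher should now resolve from $PATH:

    flume-ng version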
    

    2. Hands-On Flume Mini Projects

    Flume configuration files are stored in /usr/local/src/apache-flume-1.6.0-bin/conf.
    Every agent configuration follows the same four steps (a skeleton is sketched after this list):
    a) configure the source
    b) configure the channel
    c) configure the sink
    d) wire the three components together
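
    As a minimal sketch of that wiring (the agent and component names here are placeholders, not from the original text):

    # declare the component names on the agent
    agent.sources = src
    agent.sinks = snk
    agent.channels = ch

    # a)-c): each component gets a type plus type-specific settings
    agent.sources.src.type = ...
    agent.channels.ch.type = ...
    agent.sinks.snk.type = ...

    # d) wiring: a source lists its channels; a sink names exactly one channel
    agent.sources.src.channels = ch
    agent.sinks.snk.channel = ch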

    2.1 NetCat Source

    Requirement: listen on an IP address and port, and print whatever is received to the console.

    <1> Create a new configuration file netcat_console.conf under conf/ with the following contents:

    # example.conf: A single-node Flume configuration
    
    # Name the components on this agent
    ## the agent is named a1
    ## a1's source is named r1
    ## a1's sink is named k1
    ## a1's channel is named c1
    ## the plural property names mean several components may be listed
    a1.sources = r1
    a1.sinks = k1
    a1.channels = c1
    
    # Describe/configure the source
    # configure source r1 of agent a1
    a1.sources.r1.type = netcat
    a1.sources.r1.bind = localhost
    a1.sources.r1.port = 44444
    
    # Describe the sink
    # configure sink k1 of agent a1
    a1.sinks.k1.type = logger
    
    # Use a channel which buffers events in memory
    # configure channel c1 of agent a1
    a1.channels.c1.type = memory
    a1.channels.c1.capacity = 1000
    a1.channels.c1.transactionCapacity = 100
    
    # Bind the source and sink to the channel
    # a source may feed several channels, but a sink reads from exactly one channel
    a1.sources.r1.channels = c1
    a1.sinks.k1.channel = c1
    

    <2> Run flume-ng:

     flume-ng agent --conf $FLUME_HOME/conf --conf-file $FLUME_HOME/conf/netcat_console.conf --name a1 -Dflume.root.logger=INFO,console
    

    Explanation of the options (the inline comments are for reading only; run the one-line form above):

     flume-ng agent   \
    --conf $FLUME_HOME/conf   \   # directory containing the configuration files
    --conf-file $FLUME_HOME/conf/netcat_console.conf  \    # the configuration file to load
    --name a1   \   # the agent name
    -Dflume.root.logger=INFO,console
    

    <3> Telnet to the configured host and port:

    [root@master badou]# telnet localhost 44444
    Trying ::1...
    telnet: connect to address ::1: Connection refused
    Trying 127.0.0.1...
    Connected to localhost.
    Escape character is '^]'.
    111
    OK
    222
    OK
    333
    OK
    

    Watch the Flume logger output in the agent's console.
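
    If telnet is not installed, nc (netcat) works just as well. A quick sketch, assuming nc is available on the box:

    # send a test event to the netcat source with nc instead of telnet
    echo "hello flume" | nc localhost 44444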

    2.2 Exec Source

    Requirement: watch a log file for changes and print newly appended content to the console in real time.

    <1> Create a new configuration file exec_console.conf under conf/ with the following contents:

    # example.conf: A single-node Flume configuration
    
    # Name the components on this agent
    a1.sources = r1
    a1.sinks = k1
    a1.channels = c1
    
    # Describe/configure the source
    a1.sources.r1.type = exec
    # tail -F keeps following the file even if it is rotated or recreated
    a1.sources.r1.command = tail -F /usr/local/src/flume_test.txt
    
    # Describe the sink
    a1.sinks.k1.type = logger
    
    # Use a channel which buffers events in memory
    a1.channels.c1.type = memory
    a1.channels.c1.capacity = 1000
    a1.channels.c1.transactionCapacity = 100
    
    # Bind the source and sink to the channel
    a1.sources.r1.channels = c1
    a1.sinks.k1.channel = c1
    

    <2> Run flume-ng:

    flume-ng agent --conf $FLUME_HOME/conf --conf-file $FLUME_HOME/conf/exec_console.conf --name a1 -Dflume.root.logger=INFO,console
    

    <3> Append content to the end of the watched file:

    echo 111 >> /usr/local/src/flume_test.txt
    

    Watch the Flume logger output.
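
    To watch a continuous stream of events instead of a single line, a simple loop can keep appending (a sketch, reusing the file path from the config above):

    # append a timestamped line every second; stop with Ctrl-C
    while true; do
        echo "test $(date +%s)" >> /usr/local/src/flume_test.txt
        sleep 1
    done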

    2.3 HDFS Sink

    Requirement: use Flume to upload a given file to HDFS, controlling the target location and the file naming scheme.
    <1> Create a new configuration file avro_hdfs.conf under conf/ with the following contents:

    # example.conf: A single-node Flume configuration
    
    # Name the components on this agent
    a1.sources = r1
    a1.sinks = k1
    a1.channels = c1
    
    # Describe/configure the source
    a1.sources.r1.type = avro
    a1.sources.r1.bind = 0.0.0.0
    a1.sources.r1.port = 41414
    
    # Describe the sink
    a1.sinks.k1.type = hdfs
    a1.sinks.k1.hdfs.path = hdfs://master:9000/flume_data_pool
    a1.sinks.k1.hdfs.filePrefix = events-
    a1.sinks.k1.hdfs.fileType=DataStream
    a1.sinks.k1.hdfs.writeFormat=Text
    # roll settings: rollSize = 0 disables size-based rolling; a new file is
    # started after 600000 events or 600 seconds, whichever comes first
    a1.sinks.k1.hdfs.rollSize = 0
    a1.sinks.k1.hdfs.rollCount = 600000
    a1.sinks.k1.hdfs.rollInterval = 600
    
    # Use a channel which buffers events in memory
    a1.channels.c1.type = memory
    a1.channels.c1.capacity = 1000
    a1.channels.c1.transactionCapacity = 100
    
    # Bind the source and sink to the channel
    a1.sources.r1.channels = c1
    a1.sinks.k1.channel = c1
    

    <2> Run flume-ng:

    flume-ng agent --conf $FLUME_HOME/conf --conf-file $FLUME_HOME/conf/avro_hdfs.conf --name a1 -Dflume.root.logger=INFO,console
    

    <3> Verify:

    flume-ng avro-client --conf conf -H master -p 41414 -F /usr/local/src/flume_test.txt -Dflume.root.logger=DEBUG,console
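
    When -F is omitted, avro-client reads events from standard input (per the avro-client usage in this Flume release), which is handy for one-off tests:

    echo "one-off test event" | flume-ng avro-client --conf $FLUME_HOME/conf -H master -p 41414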
    

    Run HDFS commands to check that the file exists:

    hadoop fs -ls /
    # check that the file contents match the source file:
    hadoop fs -text /flume_data_pool/events-.1524279392273
    
    2.4 Monitoring Log File Changes with Flume and Writing the Increments to HDFS

    <1> Create a new configuration file exec_hdfs.conf under conf/ with the following contents:

    # example.conf: A single-node Flume configuration
    
    # Name the components on this agent
    a1.sources = r1
    a1.sinks = k1
    a1.channels = c1
    
    # Describe/configure the source
    a1.sources.r1.type = exec
    a1.sources.r1.command = tail -F /usr/local/src/flume_test/monitor_source/1.log
    
    # Describe the sink
    a1.sinks.k1.type = hdfs
    a1.sinks.k1.hdfs.path = hdfs://master:9000/flume/tailout/%y-%m-%d/%H%M/
    a1.sinks.k1.hdfs.filePrefix = events-
    a1.sinks.k1.hdfs.fileType=DataStream
    # bucket events into one-minute directories (used by the %H%M escapes in hdfs.path)
    a1.sinks.k1.hdfs.round=true
    a1.sinks.k1.hdfs.roundValue=1
    a1.sinks.k1.hdfs.roundUnit=minute
    a1.sinks.k1.hdfs.writeFormat=Text
    # roll a new file after 20 bytes, 5 events, or 3 seconds, whichever comes first
    a1.sinks.k1.hdfs.rollSize = 20
    a1.sinks.k1.hdfs.rollCount = 5
    a1.sinks.k1.hdfs.rollInterval = 3
    a1.sinks.k1.hdfs.batchSize=10
    a1.sinks.k1.hdfs.useLocalTimeStamp=true
    
    # Use a channel which buffers events in memory
    a1.channels.c1.type = memory
    a1.channels.c1.capacity = 1000
    a1.channels.c1.transactionCapacity = 100
    
    # Bind the source and sink to the channel
    a1.sources.r1.channels = c1
    a1.sinks.k1.channel = c1
    

    <2> Run flume-ng:

    flume-ng agent --conf $FLUME_HOME/conf --conf-file $FLUME_HOME/conf/exec_hdfs.conf --name a1 -Dflume.root.logger=INFO,console
    

    <3> Verify:

    echo 111 >> /usr/local/src/flume_test/monitor_source/1.log
    

    Compare the HDFS output against the appended log content:

    hadoop fs -text /flume/tailout/18-04-21/1104/events-.1524279852216
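
    Because hdfs.path contains time escapes, output lands in per-minute directories; listing the tree recursively shows the layout (path taken from the config above):

    hadoop fs -ls -R /flume/tailout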
    
