Flume: Agent Application - Real-Time Log Collection into HDFS

Author: chengruru | Published 2018-08-15 20:36
    1. Task description
    1. Collect Hive's runtime log (source):
          /opt/cloudera/hive/logs/hive.log
          read new file content continuously with: tail -f
    2. Buffer the events in memory (channel)
    3. Store the events on HDFS (sink)
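    The three-stage flow above (source produces events, channel buffers them, sink drains them in batches) can be sketched as a minimal producer/consumer in Python. This is an illustration of the data flow only, not Flume code; the queue stands in for the memory channel and the sizes mirror the configuration used below:

```python
from queue import Queue

# Stand-in for the memory channel (like a1.channels.c1.capacity = 1000).
channel = Queue(maxsize=1000)

def source(lines):
    """Exec source: every new log line becomes one event on the channel."""
    for line in lines:
        channel.put(line)

def sink(batch_size=10):
    """HDFS sink: drain events in batches (like hdfs.batchSize = 10)."""
    batch = []
    while not channel.empty() and len(batch) < batch_size:
        batch.append(channel.get())
    return batch

source(["log line %d" % i for i in range(25)])
first_batch = sink()
print(len(first_batch))  # the sink takes at most batch_size events per pass
```

    In the real agent, the exec source and the HDFS sink run on separate threads and the channel provides the transactional hand-off between them.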
    
    2. Create the agent configuration file

    Create an agent configuration file named agent.conf:

    $ cd flume/conf
    $ sudo cp flume-conf.properties.template agent.conf
    $ sudo vim agent.conf
    

    Edit agent.conf so that it contains the following:

    # Define an agent and name its components
    a1.sources = r1
    a1.sinks = k1
    a1.channels = c1
    
    # Describe/configure the source
    a1.sources.r1.type = exec
    a1.sources.r1.command = tail -f /opt/cloudera/hive/logs/hive.log
    a1.sources.r1.shell = /bin/sh -c
    
    # Describe the sink 
    a1.sinks.k1.type = hdfs
    a1.sinks.k1.hdfs.path = hdfs://Master:9000/user/hadoop/flume/hive-log
    a1.sinks.k1.hdfs.fileType = DataStream
    a1.sinks.k1.hdfs.writeFormat = Text
    a1.sinks.k1.hdfs.batchSize = 10
    
    # Use a channel which buffers events in memory
    a1.channels.c1.type = memory
    a1.channels.c1.capacity = 1000
    a1.channels.c1.transactionCapacity = 100
    
    # Bind the source and the sink to the channel
    a1.sources.r1.channels = c1
    a1.sinks.k1.channel = c1
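    A common mistake in these property files is declaring a channel under one name and wiring the source or sink to another. As a sanity check, a short script (a hypothetical helper, not part of Flume) can verify that every source and sink points at a declared channel:

```python
# Minimal consistency check for a Flume-style properties fragment.
conf = """
a1.sources = r1
a1.sinks = k1
a1.channels = c1
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
"""

props = {}
for line in conf.splitlines():
    line = line.strip()
    if line and not line.startswith("#"):
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()

declared = set(props["a1.channels"].split())
used = {props["a1.sources.r1.channels"], props["a1.sinks.k1.channel"]}
print(used <= declared)  # True when r1 and k1 both point at a declared channel
```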
    
    3. Run the agent

    Start the agent; note that --name must match the agent name used in agent.conf (a1 in this case), and the path to the configuration file is relative to the flume directory:

    $ cd flume
    $ ./bin/flume-ng agent \
    > --conf conf \
    > --name a1 \
    > --conf-file conf/agent.conf
    

    The agent's output shows files being created, rolled, and renamed on HDFS:

    2018-08-15 05:13:20,988 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.hdfs.BucketWriter.open(BucketWriter.java:251)] Creating hdfs://Master:9000/user/hadoop/flume/hive-log/FlumeData.1534335148147.tmp
    2018-08-15 05:13:21,002 (hdfs-k1-call-runner-7) [DEBUG - org.apache.flume.sink.hdfs.AbstractHDFSWriter.reflectGetNumCurrentReplicas(AbstractHDFSWriter.java:200)] Using getNumCurrentReplicas--HDFS-826
    2018-08-15 05:13:21,002 (hdfs-k1-call-runner-7) [DEBUG - org.apache.flume.sink.hdfs.AbstractHDFSWriter.reflectGetDefaultReplication(AbstractHDFSWriter.java:228)] Using FileSystem.getDefaultReplication(Path) from HADOOP-8014
    2018-08-15 05:13:21,006 (SinkRunner-PollingRunner-DefaultSinkProcessor) [DEBUG - org.apache.flume.sink.hdfs.BucketWriter.shouldRotate(BucketWriter.java:618)] rolling: rollCount: 10, events: 10
    2018-08-15 05:13:21,007 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.hdfs.BucketWriter.close(BucketWriter.java:393)] Closing hdfs://Master:9000/user/hadoop/flume/hive-log/FlumeData.1534335148147.tmp
    2018-08-15 05:13:21,010 (hdfs-k1-call-runner-2) [INFO - org.apache.flume.sink.hdfs.BucketWriter$8.call(BucketWriter.java:655)] Renaming hdfs://Master:9000/user/hadoop/flume/hive-log/FlumeData.1534335148147.tmp to hdfs://Master:9000/user/hadoop/flume/hive-log/FlumeData.1534335148147
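    The numeric suffix in a name such as FlumeData.1534335148147 is the epoch timestamp, in milliseconds, at which the HDFS sink opened the bucket; the ".tmp" suffix is dropped on rename once the file is closed. The timestamp can be decoded like this:

```python
from datetime import datetime, timezone

# Decode the millisecond epoch suffix of a Flume HDFS sink file name.
name = "FlumeData.1534335148147"
millis = int(name.split(".")[1])
opened = datetime.fromtimestamp(millis / 1000, tz=timezone.utc)
print(opened.isoformat())
```

    The "rolling: rollCount: 10, events: 10" line in the log shows why the files are so small: the sink's default rollCount closes a file after 10 events.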
    

    Now, open the Hive CLI and run a command; the log output it produces is picked up by Flume.

    $ cd hive/
    $ ./bin/hive
    hive (default)> show tables;
    

    Check the HDFS directory configured in agent.conf: /user/hadoop/flume/hive-log

    $ cd hadoop/
    $ ./bin/hdfs dfs -ls /user/hadoop/flume/hive-log
    The output shows:
    Found 5 items
    -rw-r--r--   1 hadoop supergroup       1158 2018-08-15 05:06 /user/hadoop/flume/hive-log/FlumeData.1534334788517
    -rw-r--r--   1 hadoop supergroup       1036 2018-08-15 05:06 /user/hadoop/flume/hive-log/FlumeData.1534334788518
    -rw-r--r--   1 hadoop supergroup       1036 2018-08-15 05:07 /user/hadoop/flume/hive-log/FlumeData.1534334788519
    -rw-r--r--   1 hadoop supergroup       1225 2018-08-15 05:07 /user/hadoop/flume/hive-log/FlumeData.1534334788520
    -rw-r--r--   1 hadoop supergroup        847 2018-08-15 05:07 /user/hadoop/flume/hive-log/FlumeData.1534334788521.tmp
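    Note that the last entry still carries the .tmp suffix: it is the bucket the sink is currently writing. A small script can separate finished files from in-progress ones in `hdfs dfs -ls` output (shown here against a pasted sample of the listing above):

```python
# Separate finished files from in-progress .tmp files in `hdfs dfs -ls` output.
listing = """\
-rw-r--r--   1 hadoop supergroup       1158 2018-08-15 05:06 /user/hadoop/flume/hive-log/FlumeData.1534334788517
-rw-r--r--   1 hadoop supergroup       1036 2018-08-15 05:06 /user/hadoop/flume/hive-log/FlumeData.1534334788518
-rw-r--r--   1 hadoop supergroup        847 2018-08-15 05:07 /user/hadoop/flume/hive-log/FlumeData.1534334788521.tmp
"""

paths = [line.split()[-1] for line in listing.splitlines() if line.strip()]
open_files = [p for p in paths if p.endswith(".tmp")]
closed_files = [p for p in paths if not p.endswith(".tmp")]
print(len(closed_files), len(open_files))
```

    Downstream jobs should generally read only the closed files; the .tmp file may still change or be renamed.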
    

    Inspect the contents of one of the collected files:

    $ ./bin/hdfs dfs -cat /user/hadoop/flume/hive-log/FlumeData.1534334788517
    The output shows the Hive runtime log entries:
    2018-08-15 05:03:31,525 INFO  [main]: ql.Driver (Driver.java:compile(570)) - Semantic Analysis Completed
    2018-08-15 05:03:31,526 INFO  [main]: ql.Driver (Driver.java:getSchema(303)) - Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:tab_name, type:string, comment:from deserializer)], properties:null)
    2018-08-15 05:03:31,530 INFO  [main]: ql.Driver (Driver.java:compile(690)) - Completed compiling command(queryId=hadoop_20180815050303_474cafaf-4338-4a93-b861-11e0d1297a42); Time taken: 0.01 seconds
    2018-08-15 05:03:31,531 INFO  [main]: ql.Driver (Driver.java:checkConcurrency(223)) - Concurrency mode is disabled, not creating a lock manager
    2018-08-15 05:03:31,531 INFO  [main]: ql.Driver (Driver.java:execute(1656)) - Executing command(queryId=hadoop_20180815050303_474cafaf-4338-4a93-b861-11e0d1297a42): show tables
    2018-08-15 05:03:31,531 INFO  [main]: ql.Driver (Driver.java:launchTask(2050)) - Starting task [Stage-0:DDL] in serial mode
    2018-08-15 05:03:31,561 INFO  [main]: ql.Driver (Driver.java:execute(1958)) - Completed executing command(queryId=hadoop_20180815050303_474cafaf-4338-4a93-b861-11e0d1297a42); Time taken: 0.03 seconds
    

    This completes the setup.


          Original article: https://www.haomeiwen.com/subject/yfbwbftx.html