Collecting logs with Flume and writing them to HDFS

Author: 圈半球 | Published 2021-05-01 06:53

Installing Flume is simple: just unpack the archive.

Notes:
1. Flume must have the Hadoop-related JARs on its classpath before it can write to HDFS; upload the following JARs to flume/lib (see the copy sketch after this list).
Using hadoop-2.9.2 as the example, the required JARs are:
    commons-configuration-1.6.jar
    commons-io-2.4.jar
    hadoop-auth-2.9.2.jar
    hadoop-common-2.9.2.jar
    hadoop-hdfs-2.9.2.jar
    hadoop-hdfs-client-2.9.2.jar
    htrace-core4-4.1.0-incubating.jar
    stax2-api-3.1.4.jar
    woodstox-core-5.0.3.jar
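
A minimal copy sketch, assuming Hadoop is unpacked at /opt/hadoop-2.9.2 and Flume at /opt/flume (both paths are assumptions; adjust them to your layout). The find-based loop avoids hard-coding where each JAR sits inside the Hadoop distribution:

cd /opt/hadoop-2.9.2
for j in commons-configuration-1.6 commons-io-2.4 hadoop-auth-2.9.2 \
         hadoop-common-2.9.2 hadoop-hdfs-2.9.2 hadoop-hdfs-client-2.9.2 \
         htrace-core4-4.1.0-incubating stax2-api-3.1.4 woodstox-core-5.0.3; do
    # copy each required JAR from wherever it lives under share/hadoop
    find share/hadoop -name "${j}.jar" -exec cp {} /opt/flume/lib/ \;
done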
2. Edit /etc/hosts and add the address of the Hadoop node, for example:
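
Assuming the NameNode runs on a host named hadoop1 at 192.168.1.101 (the IP is a placeholder; the hostname must match the one used in hdfs.path below):

192.168.1.101   hadoop1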

3. The snappy compression format is not supported in this setup. Per the official documentation: File format: currently SequenceFile, DataStream or CompressedStream (1) DataStream will not compress output file and please don't set codeC (2) CompressedStream requires set hdfs.codeC with an available codeC
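
If compressed output is needed, the documented alternative is to switch the sink to CompressedStream and point hdfs.codeC at a codec that your Hadoop installation actually provides; a sketch using gzip (the codec choice is an assumption):

a1.sinks.k1.hdfs.fileType = CompressedStream
a1.sinks.k1.hdfs.codeC = gzip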

Configuration file:
a1.sources = r1
a1.sinks = k1
a1.channels = c1

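# TAILDIR source: tail random_log.log, put the source file path in the
# "filepath" header, and add a timestamp header for the HDFS path escapes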
a1.sources.r1.channels = c1
a1.sources.r1.type = TAILDIR
a1.sources.r1.filegroups = g1
a1.sources.r1.filegroups.g1 = /script/flume/logdata/random_log.log
a1.sources.r1.headers.g1.x = y
a1.sources.r1.fileHeader = true
a1.sources.r1.fileHeaderKey = filepath
a1.sources.r1.interceptors = i1
a1.sources.r1.interceptors.i1.type = timestamp
a1.sources.r1.interceptors.i1.headerName = timestamp

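# Memory channel: buffer up to 1000 events in RAM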
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 1000

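# HDFS sink: roll a new file at 256 MB (268435456 bytes) or after 120 s,
# whichever comes first; rollCount = 0 disables count-based rolling.
# useLocalTimeStamp = false, so the %Y-%m-%d/%H escapes read the timestamp
# header set by the interceptor above.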
a1.sinks.k1.channel = c1
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = hdfs://hadoop1:9000/flumedata/%Y-%m-%d/%H
a1.sinks.k1.hdfs.filePrefix = cc-log-
a1.sinks.k1.hdfs.fileSuffix = .log
a1.sinks.k1.hdfs.rollSize = 268435456
a1.sinks.k1.hdfs.rollInterval = 120
a1.sinks.k1.hdfs.rollCount = 0
a1.sinks.k1.hdfs.batchSize = 1000
a1.sinks.k1.hdfs.fileType = DataStream

# Do not set hdfs.codeC when fileType is DataStream (see note 3 above)
# a1.sinks.k1.hdfs.codeC = snappy

a1.sinks.k1.hdfs.useLocalTimeStamp = false

Start the agent:
bin/flume-ng agent -c conf -f /script/flume/conf/titan_flumn.conf -n a1 -Dflume.root.logger=DEBUG,console
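
Once events are flowing, a quick sanity check is to list the output directory on HDFS (assuming the agent has run long enough to write a file):

hdfs dfs -ls -R /flumedata/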

Configuration for collecting log files from multiple directories:
a1.sources = r1 r2
a1.sinks = k1 k2
a1.channels = c1 c2

Two sources (r1 and r2) are configured, each collecting files from a different path:

a1.sources.r1.channels = c1
a1.sources.r1.type = TAILDIR
a1.sources.r1.filegroups = g1
a1.sources.r1.filegroups.g1 = /script/flume/logdata/random_log.log
a1.sources.r1.headers.g1.x = y
a1.sources.r1.fileHeader = true
a1.sources.r1.fileHeaderKey = filepath
a1.sources.r1.interceptors = i1
a1.sources.r1.interceptors.i1.type = timestamp
a1.sources.r1.interceptors.i1.headerName = timestamp

a1.sources.r2.channels = c2
a1.sources.r2.type = TAILDIR
a1.sources.r2.filegroups = g2
a1.sources.r2.filegroups.g2 = /script/flume/logdata/random_log_b.log
a1.sources.r2.headers.g2.x = y
a1.sources.r2.fileHeader = true
a1.sources.r2.fileHeaderKey = filepath
a1.sources.r2.interceptors = i1
a1.sources.r2.interceptors.i1.type = timestamp
a1.sources.r2.interceptors.i1.headerName = timestamp

The channels (c1 and c2) follow the same pattern:

a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 1000

a1.channels.c2.type = memory
a1.channels.c2.capacity = 1000
a1.channels.c2.transactionCapacity = 1000

Likewise the sinks (k1 and k2):

a1.sinks.k1.channel = c1
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = hdfs://hadoop1:9000/flumedata/%Y-%m-%d/%H
a1.sinks.k1.hdfs.filePrefix = cc-log-
a1.sinks.k1.hdfs.fileSuffix = .log
a1.sinks.k1.hdfs.rollSize = 268435456
a1.sinks.k1.hdfs.rollInterval = 120
a1.sinks.k1.hdfs.rollCount = 0
a1.sinks.k1.hdfs.batchSize = 1000
a1.sinks.k1.hdfs.fileType = DataStream
a1.sinks.k1.hdfs.useLocalTimeStamp = false

a1.sinks.k2.channel = c2
a1.sinks.k2.type = hdfs
a1.sinks.k2.hdfs.path = hdfs://hadoop1:9000/flumedata_b/%Y-%m-%d/%H
a1.sinks.k2.hdfs.filePrefix = cc-log-
a1.sinks.k2.hdfs.fileSuffix = .log
a1.sinks.k2.hdfs.rollSize = 268435456
a1.sinks.k2.hdfs.rollInterval = 120
a1.sinks.k2.hdfs.rollCount = 0
a1.sinks.k2.hdfs.batchSize = 1000
a1.sinks.k2.hdfs.fileType = DataStream
a1.sinks.k2.hdfs.useLocalTimeStamp = false
