Flume: Dumping Kafka Data into HDFS

By XIAO_WS | Published 2018-12-19 10:22

    Flume 1.8: Kafka channel + HDFS sink (without sources)

    Dumping data from Kafka into HDFS for offline computation is something Flume already implements for us: add a configuration file and start flume-ng directly.

    The Kafka channel can be used for multiple scenarios:

    1. With Flume source and sink - it provides a reliable and highly available channel for events
    2. With Flume source and interceptor but no sink - it allows writing Flume events into a Kafka topic, for use by other apps
    3. With Flume sink, but no source - it is a low-latency, fault tolerant way to send events from Kafka to Flume sinks such as HDFS, HBase or Solr
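    This post uses the third scenario: a Kafka channel feeding an HDFS sink directly, with no source. The topic the channel reads from must already exist before the agent starts; a minimal sketch of creating it (Kafka 2.0's kafka-topics.sh still takes a ZooKeeper address, and kafka-1:2181 is an assumption, so adjust to your cluster):

    # create the `user` topic that the channel below consumes
    kafka-topics.sh --create --zookeeper kafka-1:2181 --replication-factor 3 --partitions 3 --topic user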
    • $FLUME_HOME/conf/kafka-hdfs.conf
    # Kafka channel + HDFS sink (without sources)
    a1.channels = c1
    a1.sinks = k1

    # Kafka channel definition
    a1.channels.c1.type = org.apache.flume.channel.kafka.KafkaChannel
    # records in the topic are plain payloads, not serialized Flume events
    a1.channels.c1.parseAsFlumeEvent = false
    a1.channels.c1.kafka.bootstrap.servers = kafka-1:9092,kafka-2:9092,kafka-3:9092
    a1.channels.c1.kafka.topic = user
    a1.channels.c1.kafka.consumer.group.id = g1

    # HDFS sink definition
    a1.sinks.k1.channel = c1
    a1.sinks.k1.type = hdfs
    a1.sinks.k1.hdfs.path = hdfs://hadoop-1:9000/flume/%Y%m%d/%H
    a1.sinks.k1.hdfs.useLocalTimeStamp = true
    a1.sinks.k1.hdfs.filePrefix = log
    a1.sinks.k1.hdfs.fileType = DataStream
    # never roll files by event count
    a1.sinks.k1.hdfs.rollCount = 0
    # roll a new file once the current one reaches 128 MB
    a1.sinks.k1.hdfs.rollSize = 134217728
    # roll a new file every 10 minutes
    a1.sinks.k1.hdfs.rollInterval = 600
    

    Remember to add the hostnames above (kafka-1, kafka-2, kafka-3, hadoop-1) to /etc/hosts on the Flume host.
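    For example (the IP addresses here are placeholders; substitute your own):

    # /etc/hosts: map the hostnames used above to real IPs
    192.168.1.11  kafka-1
    192.168.1.12  kafka-2
    192.168.1.13  kafka-3
    192.168.1.21  hadoop-1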

    • Add the HDFS-related jars and configuration files (a copy sketch follows the list):
    commons-configuration-1.6.jar
    commons-io-2.4.jar
    hadoop-auth-2.8.3.jar
    hadoop-common-2.8.3.jar
    hadoop-hdfs-2.8.3.jar
    hadoop-hdfs-client-2.8.3.jar
    htrace-core4-4.0.1-incubating.jar
    core-site.xml
    hdfs-site.xml
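    A sketch of pulling these out of an existing Hadoop 2.8.3 install (assumes $HADOOP_HOME is set; `find` locates each jar, since layouts differ between distributions):

    # copy the jars listed above into Flume's classpath
    for j in commons-configuration-1.6 commons-io-2.4 hadoop-auth-2.8.3 \
             hadoop-common-2.8.3 hadoop-hdfs-2.8.3 hadoop-hdfs-client-2.8.3 \
             htrace-core4-4.0.1-incubating; do
      find $HADOOP_HOME/share/hadoop -name "$j.jar" -exec cp {} $FLUME_HOME/lib/ \;
    done
    # the HDFS client config goes next to kafka-hdfs.conf
    cp $HADOOP_HOME/etc/hadoop/core-site.xml $HADOOP_HOME/etc/hadoop/hdfs-site.xml $FLUME_HOME/conf/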
    
    • Flume 1.8 bundles the 0.9 Kafka client by default. It is nominally forward-compatible, but don't rely on that; it is a huge pitfall. Replace it with (swap sketch below):
      kafka-clients-2.0.0.jar kafka_2.11-2.0.0.jar
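    A sketch of the swap (the exact 0.9 jar names in $FLUME_HOME/lib vary by build, hence the wildcards; the 2.0.0 jars are assumed to come from a Kafka 2.0.0 install at $KAFKA_HOME):

    # drop the bundled 0.9 client, then copy in the 2.0.0 jars
    rm $FLUME_HOME/lib/kafka-clients-0.9*.jar $FLUME_HOME/lib/kafka_*-0.9*.jar
    cp $KAFKA_HOME/libs/kafka-clients-2.0.0.jar $KAFKA_HOME/libs/kafka_2.11-2.0.0.jar $FLUME_HOME/lib/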

    • Start ZooKeeper, Kafka, and HDFS first (otherwise Flume throws all kinds of errors); the usual order is sketched below.
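    A sketch of the start-up order (script names assume stock ZooKeeper, Kafka, and Hadoop installs; run each on the appropriate host):

    # on each ZooKeeper node
    zkServer.sh start
    # on each Kafka broker
    kafka-server-start.sh -daemon $KAFKA_HOME/config/server.properties
    # on the HDFS NameNode
    start-dfs.sh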

    • Enter $FLUME_HOME and start Flume:
      root@common:/usr/local/flume# ./bin/flume-ng agent -c conf/ -f conf/kafka-hdfs.conf -n a1 -Dflume.root.logger=INFO,console
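    To verify the pipeline end to end, push a few test messages into the topic and check that files appear under the sink path (hosts and paths match the config above):

    # type a few test events into the `user` topic
    kafka-console-producer.sh --broker-list kafka-1:9092 --topic user
    # confirm the data landed in HDFS (one subdirectory per hour)
    hdfs dfs -ls /flume/$(date +%Y%m%d)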
