07-Flume01

Author: CrUelAnGElPG | Published 2018-08-01 14:39

1) array, map, struct

2) meta

3) join

4) compression

    Flume

    RDBMS ==> Sqoop ==> Hadoop

Logs: scattered across many servers  ??? ===> Hadoop

Flume is a distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of log data.

    It has a simple and flexible architecture based on streaming data flows.

collecting  -> source

aggregating -> channel (somewhere to stage the collected data temporarily)

moving      -> sink

Flume: write a configuration file that wires up the relationships among source, channel, and sink

Agent: made up of a source, a channel, and a sink

Writing a Flume configuration file is really just describing how an agent is composed

Flume is a framework for collecting and aggregating log data: it ships logs from place A to place B

Flume deployment

1) Download

2) Unpack to ~/app

3) Add it to the system environment variables in ~/.bash_profile

    export FLUME_HOME=/home/hadoop/app/apache-flume-1.6.0-cdh5.7.0-bin

    export PATH=$FLUME_HOME/bin:$PATH

4) Set JAVA_HOME in $FLUME_HOME/conf/flume-env.sh
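A minimal sketch of that step (the JDK path here is an assumption; point it at your own installation):

# $FLUME_HOME/conf/flume-env.sh
export JAVA_HOME=/usr/java/jdk1.8.0_144   # assumed path to your JDK

# sanity check once ~/.bash_profile has been sourced
flume-ng version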

flume-og: the legacy pre-1.0 codebase (original generation)

flume-ng: the rewritten 1.x line (next generation), which is what we use; its entry point is the flume-ng script:

    ./flume-ng agent \

    --name a1 \

    --conf $FLUME_HOME/conf \

    --conf-file /home/hadoop/script/flume/simple-flume.conf \

    -Dflume.root.logger=INFO,console \

    -Dflume.monitoring.type=http \

    -Dflume.monitoring.port=34343
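The --name flag must match the agent name used inside the conf file (a1 below). With the two -Dflume.monitoring flags set, the agent also serves its counters as JSON over HTTP, for example:

curl http://localhost:34343/metrics   # host is an assumption; use the agent's host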

agent_name: the name of the agent being configured. Below, a1 is the agent name, and r1, k1, c1 name its source, sink, and channel. The general template:

# Name the components on this agent
<agent_name>.sources = <source_name>
<agent_name>.sinks = <sink_name>
<agent_name>.channels = <channel_name>
<agent_name>.sources.<source_name>.type = xx
<agent_name>.sinks.<sink_name>.type = yyy
<agent_name>.channels.<channel_name>.type = zzz
<agent_name>.sources.<source_name>.channels = <channel_name>
<agent_name>.sinks.<sink_name>.channel = <channel_name>

Requirement: collect data from a specified network port and log it to the console

# Name the components on this agent

    a1.sources = r1

    a1.sinks = k1

    a1.channels = c1

    # Describe/configure the source

    a1.sources.r1.type = netcat

    a1.sources.r1.bind = 0.0.0.0

    a1.sources.r1.port = 44444

    # Describe the sink

    a1.sinks.k1.type = logger

    # Use a channel which buffers events in memory

    a1.channels.c1.type = memory

    # Bind the source and sink to the channel

    a1.sources.r1.channels = c1

    a1.sinks.k1.channel = c1
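To exercise this agent, connect to the port and type a line (nc works equally well):

telnet localhost 44444
ruozedata

The agent prints each received line on its console as an Event, as shown next.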

Event: a single record of data

Event: { headers:{} body: 72 75 6F 7A 65 64 61 74 61 0D                  ruozedata. }

Event = headers + body (a byte array)

Which sources, channels, and sinks does Flume support?

    source

    avro

exec: tail -F xx.log

Spooling Directory

    Taildir

    netcat

    sink

    HDFS

    logger

avro: used together with an avro source

    kafka

    channel

    memory

    file
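A rule of thumb: the memory channel is fast but loses buffered events if the agent dies, while the file channel checkpoints events to disk and survives restarts at some throughput cost.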

Agent: any combination of source, channel, and sink, for example:

Collect content newly appended to a file into HDFS:

exec - memory - hdfs

A whole directory:

spooling - memory - hdfs

Write file data into Kafka:

exec - memory - kafka

exec - memory - hdfs ==> Spark/Hive/MR ETL ==> HDFS <== analysis

Requirement: collect the content of a specified file into HDFS

Choice of components: exec - memory - hdfs
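The conf file referenced by the command below is not reproduced in the original notes; a minimal sketch, assuming the file being tailed and the HDFS URL (adjust both to your machine):

# /home/hadoop/script/flume/exec-memory-hdfs.conf
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# tail the file and treat each new line as an event
a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /home/hadoop/data/data.log

# write events to HDFS as plain text
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = hdfs://localhost:8020/flume/tail
a1.sinks.k1.hdfs.fileType = DataStream
a1.sinks.k1.hdfs.writeFormat = Text

a1.channels.c1.type = memory

a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1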

    ./flume-ng agent \

    --name a1 \

    --conf $FLUME_HOME/conf \

    --conf-file /home/hadoop/script/flume/exec-memory-hdfs.conf \

    -Dflume.root.logger=INFO,console \

    -Dflume.monitoring.type=http \

    -Dflume.monitoring.port=34343

Requirement: collect the content of a specified directory to the console

Choice of components: spooling - memory - logger
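Again the conf file is not shown in the original; a minimal sketch, with the spool directory as an assumption (the spooldir source renames each file it has fully ingested with a .COMPLETED suffix):

# /home/hadoop/script/flume/spooling-memory-logger.conf
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# watch a directory for new, fully-written files
a1.sources.r1.type = spooldir
a1.sources.r1.spoolDir = /home/hadoop/data/spool
a1.sources.r1.fileHeader = true

a1.sinks.k1.type = logger

a1.channels.c1.type = memory

a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1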

    ./flume-ng agent \

    --name a1 \

    --conf $FLUME_HOME/conf \

    --conf-file /home/hadoop/script/flume/spooling-memory-logger.conf \

    -Dflume.root.logger=INFO,console \

    -Dflume.monitoring.type=http \

    -Dflume.monitoring.port=34343

Choice of components: taildir - memory - logger
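A minimal sketch of the conf file, with the paths as assumptions; the Taildir source records its read offsets in positionFile, so after a restart it resumes where it left off instead of re-reading files:

# /home/hadoop/script/flume/taildir-memory-logger.conf
a1.sources = r1
a1.sinks = k1
a1.channels = c1

a1.sources.r1.type = TAILDIR
a1.sources.r1.positionFile = /home/hadoop/data/taildir_position.json
a1.sources.r1.filegroups = f1
a1.sources.r1.filegroups.f1 = /home/hadoop/data/logs/.*log

a1.sinks.k1.type = logger

a1.channels.c1.type = memory

a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1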

    ./flume-ng agent \

    --name a1 \

    --conf $FLUME_HOME/conf \

    --conf-file /home/hadoop/script/flume/taildir-memory-logger.conf \

    -Dflume.root.logger=INFO,console \

    -Dflume.monitoring.type=http \

    -Dflume.monitoring.port=34343
