【2019-05-29】Flume

Author: BigBigFlower | Published: 2019-05-29 20:40

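    Flume is designed for high-volume ingestion of event-based data into Hadoop. A Flume deployment consists of a set of agents connected in a distributed topology. Agents at the edge of the system collect data and forward it to agents responsible for aggregation, which then store the data at its final destination. Each agent is configured to run a particular set of sources (where data comes from) and sinks (where data goes). A Flume agent is a long-lived Java process that runs sources, sinks, and the channels that connect them.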

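    Transactions and reliability
    Flume uses two separate transactions to deliver events: one from the source to the channel, and one from the channel to the sink. If a sink cannot deliver an event, its transaction is rolled back and the event stays in the channel for a later attempt.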

    # Flume configuration using a spooling directory source and a logger sink
    agent1.sources = source1
    agent1.sinks = sink1
    agent1.channels = channel1
    
    agent1.sources.source1.channels = channel1
    agent1.sinks.sink1.channel = channel1
    
    agent1.sources.source1.type = spooldir
    agent1.sources.source1.spoolDir = /tmp/spooldir
    
    agent1.sinks.sink1.type = logger
    
    agent1.channels.channel1.type = file
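
    Each of the two transactions described above usually carries a batch of events rather than a single one. The fragment below is a minimal sketch of the settings that bound those batches for the agent just shown. The values are the defaults as I understand them from the Flume user guide and are given only for illustration, so check them against the Flume version you run.

    ```properties
    # Assumed values for illustration: events the spooling directory source
    # puts into the channel per transaction
    agent1.sources.source1.batchSize = 100

    # Largest number of events the file channel accepts in a single transaction
    agent1.channels.channel1.transactionCapacity = 10000
    ```

    A source's batch size should not exceed the channel's transactionCapacity, otherwise the source-to-channel transaction cannot commit.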
    
    

    HDFS sink

    # Flume configuration using a spooling directory source and an HDFS sink
    agent1.sources = source1
    agent1.sinks = sink1
    agent1.channels = channel1
    
    agent1.sources.source1.channels = channel1
    agent1.sinks.sink1.channel = channel1
    
    agent1.sources.source1.type = spooldir
    agent1.sources.source1.spoolDir = /tmp/spooldir
    
    agent1.sinks.sink1.type = hdfs
    agent1.sinks.sink1.hdfs.path = /tmp/flume
    agent1.sinks.sink1.hdfs.filePrefix = events
    agent1.sinks.sink1.hdfs.fileSuffix = .log
    agent1.sinks.sink1.hdfs.inUsePrefix = _
    agent1.sinks.sink1.hdfs.fileType = DataStream
    
    agent1.channels.channel1.type = file
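
    The HDFS sink keeps a file open (marked with the in-use prefix configured above) until a roll condition is met, then closes it and starts a new one. The sketch below lists the three roll properties with what I believe are their default values. It is an illustrative addition to the configuration above, not part of the original listing.

    ```properties
    # Roll the current file after this many seconds (0 disables time-based rolling)
    agent1.sinks.sink1.hdfs.rollInterval = 30
    # Roll after this many bytes have been written (0 disables size-based rolling)
    agent1.sinks.sink1.hdfs.rollSize = 1024
    # Roll after this many events have been written (0 disables count-based rolling)
    agent1.sinks.sink1.hdfs.rollCount = 10
    ```

    With these small values the sink produces many tiny files, so production setups typically raise rollSize and set the conditions they do not need to 0.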
    
    

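    Fan-out means delivering events from one source to multiple channels, and therefore to multiple sinks.

    Figure: a Flume agent with a spooling directory source, fanning out to an HDFS sink and a logger sink
    # Flume configuration using a spooling directory source, fanning out to an HDFS sink and a logger sink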
    agent1.sources = source1
    agent1.sinks = sink1a sink1b
    agent1.channels = channel1a channel1b
    
    agent1.sources.source1.channels = channel1a channel1b
    agent1.sources.source1.selector.type = replicating
    agent1.sources.source1.selector.optional = channel1b
    agent1.sinks.sink1a.channel = channel1a
    agent1.sinks.sink1b.channel = channel1b
    
    agent1.sources.source1.type = spooldir
    agent1.sources.source1.spoolDir = /tmp/spooldir
    
    agent1.sinks.sink1a.type = hdfs
    agent1.sinks.sink1a.hdfs.path = /tmp/flume
    agent1.sinks.sink1a.hdfs.filePrefix = events
    agent1.sinks.sink1a.hdfs.fileSuffix = .log
    agent1.sinks.sink1a.hdfs.fileType = DataStream
    
    agent1.sinks.sink1b.type = logger
    
    agent1.channels.channel1a.type = file
    agent1.channels.channel1b.type = memory
    
    

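    Distribution through agent tiers


    Figure: using a second agent tier to aggregate Flume events from the first tier

    First-tier agents collect events from the original sources and send them to the second tier. The second tier has fewer agents than the first; these agents aggregate the events from the first tier and then write them to HDFS.

    # Configuration for a two-tier Flume setup using a spooling directory source and an HDFS sink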
    # First tier agent
    
    agent1.sources = source1
    agent1.sinks = sink1
    agent1.channels = channel1
    
    agent1.sources.source1.channels = channel1
    agent1.sinks.sink1.channel = channel1
    
    agent1.sources.source1.type = spooldir
    agent1.sources.source1.spoolDir = /tmp/spooldir
    
    agent1.sinks.sink1.type = avro
    agent1.sinks.sink1.hostname = localhost
    agent1.sinks.sink1.port = 10000
    
    agent1.channels.channel1.type = file
    agent1.channels.channel1.checkpointDir=/tmp/agent1/file-channel/checkpoint
    agent1.channels.channel1.dataDirs=/tmp/agent1/file-channel/data
    
    # Second tier agent
    
    agent2.sources = source2
    agent2.sinks = sink2
    agent2.channels = channel2
    
    agent2.sources.source2.channels = channel2
    agent2.sinks.sink2.channel = channel2
    
    agent2.sources.source2.type = avro
    agent2.sources.source2.bind = localhost
    agent2.sources.source2.port = 10000
    
    agent2.sinks.sink2.type = hdfs
    agent2.sinks.sink2.hdfs.path = /tmp/flume
    agent2.sinks.sink2.hdfs.filePrefix = events
    agent2.sinks.sink2.hdfs.fileSuffix = .log
    agent2.sinks.sink2.hdfs.fileType = DataStream
    
    agent2.channels.channel2.type = file
    agent2.channels.channel2.checkpointDir=/tmp/agent2/file-channel/checkpoint
    agent2.channels.channel2.dataDirs=/tmp/agent2/file-channel/data
    
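    Figure: two Flume agents connected by an Avro sink-source pair

    Sink groups
    A sink group lets multiple sinks be treated as one, for failover or load balancing. If a second-tier agent is unavailable, events are delivered to another second-tier agent so that they continue to reach HDFS without interruption.


    Figure: using multiple sinks for load balancing or failover
    Load balancing between two agents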
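
    The original listing for this setup is not included in the post, so the following is a sketch of what a first-tier agent with a load-balancing sink group might look like. It assumes two second-tier agents whose Avro sources listen on localhost ports 10000 and 10001; the names sink1a, sink1b and sinkgroup1 and the second port are illustrative choices, not taken from the original.

    ```properties
    # First tier agent with two Avro sinks placed in one sink group
    agent1.sources = source1
    agent1.sinks = sink1a sink1b
    agent1.sinkgroups = sinkgroup1
    agent1.channels = channel1

    agent1.sources.source1.channels = channel1
    agent1.sinks.sink1a.channel = channel1
    agent1.sinks.sink1b.channel = channel1

    # Spread events across both sinks; back off from a sink that fails
    agent1.sinkgroups.sinkgroup1.sinks = sink1a sink1b
    agent1.sinkgroups.sinkgroup1.processor.type = load_balance
    agent1.sinkgroups.sinkgroup1.processor.backoff = true

    agent1.sources.source1.type = spooldir
    agent1.sources.source1.spoolDir = /tmp/spooldir

    # Assumed endpoints of the two second-tier agents
    agent1.sinks.sink1a.type = avro
    agent1.sinks.sink1a.hostname = localhost
    agent1.sinks.sink1a.port = 10000

    agent1.sinks.sink1b.type = avro
    agent1.sinks.sink1b.hostname = localhost
    agent1.sinks.sink1b.port = 10001

    agent1.channels.channel1.type = file
    ```

    For failover instead of load balancing, the processor type would be set to failover and each sink given a priority through processor.priority.<sink name>, so that the highest-priority available sink receives the events.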
