a1.sources = r1
a1.sinks = k1 k2
a1.channels = c1 c2
# Describe and configure the source
# Step 1: configure the data source
a1.sources.r1.type = exec
a1.sources.r1.channels = c1 c2
# Path of the log output to monitor
a1.sources.r1.command = tail -F /var/log/data
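# Note: the exec source with tail -F offers no delivery guarantee, so events can
# be lost if the agent restarts. One possible alternative, assuming Flume 1.7 or
# later, is the TAILDIR source sketched below. The positionFile path and the
# group name f1 are illustrative assumptions, not part of the original setup.
# a1.sources.r1.type = TAILDIR
# a1.sources.r1.positionFile = /var/flume/taildir_position.json
# a1.sources.r1.filegroups = f1
# a1.sources.r1.filegroups.f1 = /var/log/data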
# Describe the sinks
# Step 2: configure the data output
a1.sinks.k1.type=hdfs
a1.sinks.k1.channel=c1
a1.sinks.k1.hdfs.useLocalTimeStamp=true
a1.sinks.k1.hdfs.path=hdfs://hadoop01:9000/flume/events/%Y/%m/%d/%H/%M
a1.sinks.k1.hdfs.filePrefix=cmcc-%H
a1.sinks.k1.hdfs.minBlockReplicas=1
a1.sinks.k1.hdfs.fileType=DataStream
a1.sinks.k1.hdfs.rollInterval=3600
a1.sinks.k1.hdfs.rollSize=0
a1.sinks.k1.hdfs.rollCount=0
a1.sinks.k1.hdfs.idleTimeout=0
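# How the roll settings above interact: rollInterval=3600 closes each file after
# one hour, rollSize=0 and rollCount=0 turn off size-based and count-based
# rolling, and idleTimeout=0 never closes an idle file early. Because hdfs.path
# changes every minute (%M), each per-minute file may stay open for the full hour.
# A hedged sketch of one alternative (both values are assumptions to tune;
# idleTimeout is in seconds, rollSize in bytes):
# a1.sinks.k1.hdfs.idleTimeout=120
# a1.sinks.k1.hdfs.rollSize=134217728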
# Define the sink k2 (Kafka output)
a1.sinks.k2.channel=c2
a1.sinks.k2.type=com.cmcc.chiwei.Kafka.CmccKafkaSink
a1.sinks.k2.metadata.broker.list=hadoop01:9092,hadoop02:9092,hadoop03:9092
a1.sinks.k2.partition.key=0
a1.sinks.k2.partitioner.class=com.cmcc.chiwei.Kafka.CmccPartition
a1.sinks.k2.serializer.class=kafka.serializer.StringEncoder
a1.sinks.k2.request.required.acks=0
a1.sinks.k2.cmcc.encoding=UTF-8
a1.sinks.k2.cmcc.topic.name=cmcc
a1.sinks.k2.producer.type=async
a1.sinks.k2.batchSize=100
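# The sink above is a custom class. For comparison, a minimal sketch of the Kafka
# sink bundled with Flume, assuming Flume 1.7 or later. The topic and brokers are
# copied from the k2 settings above; the acks value is an assumption.
# a1.sinks.k2.type=org.apache.flume.sink.kafka.KafkaSink
# a1.sinks.k2.channel=c2
# a1.sinks.k2.kafka.bootstrap.servers=hadoop01:9092,hadoop02:9092,hadoop03:9092
# a1.sinks.k2.kafka.topic=cmcc
# a1.sinks.k2.kafka.flumeBatchSize=100
# a1.sinks.k2.kafka.producer.acks=1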
# Use channels to buffer events: c1 on disk (file), c2 in memory
# Step 3: configure the channels
# Define the channel c1
a1.channels.c1.type = file
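# Note: these paths reach the JVM as written, so "~" is not expanded to the
# home directory; absolute paths are safer.
a1.channels.c1.checkpointDir=~/flume/flumeCheckpoint
a1.channels.c1.dataDirs=~/flume/flumeData,~/flume/flumeDataExt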
a1.channels.c1.capacity = 2000000
a1.channels.c1.transactionCapacity = 100
# Define the channel c2
a1.channels.c2.type=memory
a1.channels.c2.capacity=2000000
a1.channels.c2.transactionCapacity=100
# Bind the source and sinks to the channels
# Step 4: wire the three together
a1.sources.r1.channels = c1 c2
a1.sinks.k1.channel = c1
a1.sources.r1.selector.type=replicating
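# To start the agent with this configuration, assuming it is saved as
# conf/cmcc.conf under the Flume installation directory (the file name is
# illustrative), a command along these lines should work:
# bin/flume-ng agent --conf conf --conf-file conf/cmcc.conf --name a1 -Dflume.root.logger=INFO,console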