
Flink Learning, Part 4: Using Kafka as a Data Source

Author: AlanKim | Published 2019-03-13 20:01

    The previous article built a DB-backed data source with Spring, Druid, and MySQL; this article uses Kafka as the data source instead.

    FlinkKafkaConsumer010

    Flink already ships with a Kafka source implementation, FlinkKafkaConsumer010. Let's first look at how it is defined:

    @PublicEvolving
    public class FlinkKafkaConsumer010<T> extends FlinkKafkaConsumer09<T> {
        private static final long serialVersionUID = 2324564345203409112L;
    
        public FlinkKafkaConsumer010(String topic, DeserializationSchema<T> valueDeserializer, Properties props) {
            this(Collections.singletonList(topic), valueDeserializer, props);
        }
    
        public FlinkKafkaConsumer010(String topic, KeyedDeserializationSchema<T> deserializer, Properties props) {
            this(Collections.singletonList(topic), deserializer, props);
        }
    
        public FlinkKafkaConsumer010(List<String> topics, DeserializationSchema<T> deserializer, Properties props) {
            this((List)topics, (KeyedDeserializationSchema)(new KeyedDeserializationSchemaWrapper(deserializer)), props);
        }
    
        public FlinkKafkaConsumer010(List<String> topics, KeyedDeserializationSchema<T> deserializer, Properties props) {
            super(topics, deserializer, props);
        }
    
        @PublicEvolving
        public FlinkKafkaConsumer010(Pattern subscriptionPattern, DeserializationSchema<T> valueDeserializer, Properties props) {
            this((Pattern)subscriptionPattern, (KeyedDeserializationSchema)(new KeyedDeserializationSchemaWrapper(valueDeserializer)), props);
        }
    
        @PublicEvolving
        public FlinkKafkaConsumer010(Pattern subscriptionPattern, KeyedDeserializationSchema<T> deserializer, Properties props) {
            super(subscriptionPattern, deserializer, props);
        }
        ......
           
    }
    
    

    Flink's Kafka consumers come in quite a few versions, but they all ultimately extend FlinkKafkaConsumerBase, and that abstract class in turn extends RichParallelSourceFunction. Look familiar? It is very similar to RichSourceFunction, the abstract class our custom MySQL data source extended in the previous article.

    public abstract class FlinkKafkaConsumerBase<T> extends RichParallelSourceFunction<T> implements CheckpointListener, ResultTypeQueryable<T>, CheckpointedFunction 
    

    As you can see, there are quite a few constructors; we can simply pick the one that fits and use it directly.
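
    For example, besides the single-topic constructor used in the example below, the List and Pattern constructors let one source subscribe to several topics at once. A minimal sketch (not from the original article; the topic names and the pattern are made up for illustration, and SimpleStringSchema / Properties are the same ones used below):

    Properties props = new Properties();
    props.put("bootstrap.servers", "localhost:9092");
    props.put("group.id", "metric-group");

    // subscribe to an explicit list of topics (hypothetical topic names)
    FlinkKafkaConsumer010<String> listConsumer = new FlinkKafkaConsumer010<>(
            java.util.Arrays.asList("topic-a", "topic-b"),
            new SimpleStringSchema(),
            props);

    // subscribe to every topic whose name matches a regular expression (hypothetical pattern)
    FlinkKafkaConsumer010<String> patternConsumer = new FlinkKafkaConsumer010<>(
            java.util.regex.Pattern.compile("metric-.*"),
            new SimpleStringSchema(),
            props);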

    Using it in code

    package myflink.job;
    
    import org.apache.flink.api.common.serialization.SimpleStringSchema;
    import org.apache.flink.streaming.api.datastream.DataStreamSource;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    import org.apache.flink.streaming.api.functions.sink.PrintSinkFunction;
    import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer010;
    
    import java.util.Properties;
    
    /**
     * Consume messages from Kafka, using Kafka as the data source.
     * Tutorial reference:
     * @see http://www.54tianzhisheng.cn/tags/Flink/
     */
    public class KafkaDatasouceForFlinkJob {
    
        public static void main(String[] args) throws Exception {
            final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
    
            Properties properties = new Properties();
            properties.put("bootstrap.servers","localhost:9092");
            properties.put("zookeeper.connect","localhost:2181");
            properties.put("group.id","metric-group");
            properties.put("auto.offset.reset","latest");
            properties.put("key.deserializer","org.apache.kafka.common.serialization.StringDeserializer");
            properties.put("value.deserializer","org.apache.kafka.common.serialization.StringDeserializer");
    
            DataStreamSource<String> dataStreamSource = env.addSource(
                    new FlinkKafkaConsumer010<String>(
                            "testjin" ,// topic
                            new SimpleStringSchema(),
                            properties
                    )
            ).setParallelism(1);
    
            // dataStreamSource.print();
            // same effect as the commented-out print() above
            dataStreamSource.addSink(new PrintSinkFunction<>());
    
            env.execute("Flink add kafka data source");
        }
    }
    
    

    Notes:

    a. Kafka-related configuration (brokers, ZooKeeper, group id, serializers/deserializers, and so on) is set directly through a Properties object.

    b. The FlinkKafkaConsumer010 constructor is given the topic and the Properties configuration.

    c. SimpleStringSchema only serializes and deserializes String data; if the messages in Kafka are not Strings it will fail (see the sketch after these notes for a non-String alternative). Here is SimpleStringSchema's declaration:

    public class SimpleStringSchema implements DeserializationSchema<String>, SerializationSchema<String>
    

    d. The received messages are simply printed out.
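
    If the Kafka payload is JSON rather than a plain String, you can plug in your own DeserializationSchema in place of SimpleStringSchema. A minimal sketch (not from the original article; it assumes fastjson and the UrlInfo model that the producer below serializes, and the package name is made up):

    package myflink.schema;

    import com.alibaba.fastjson.JSON;
    import myflink.model.UrlInfo;
    import org.apache.flink.api.common.serialization.DeserializationSchema;
    import org.apache.flink.api.common.typeinfo.TypeInformation;

    import java.io.IOException;
    import java.nio.charset.StandardCharsets;

    public class UrlInfoDeserializationSchema implements DeserializationSchema<UrlInfo> {

        @Override
        public UrlInfo deserialize(byte[] message) throws IOException {
            // parse the JSON bytes written by the producer back into a UrlInfo object
            return JSON.parseObject(new String(message, StandardCharsets.UTF_8), UrlInfo.class);
        }

        @Override
        public boolean isEndOfStream(UrlInfo nextElement) {
            // a Kafka topic is unbounded, so never signal end-of-stream
            return false;
        }

        @Override
        public TypeInformation<UrlInfo> getProducedType() {
            return TypeInformation.of(UrlInfo.class);
        }
    }

    The consumer would then be created as new FlinkKafkaConsumer010<>("testjin", new UrlInfoDeserializationSchema(), properties), and the resulting stream would carry UrlInfo objects instead of raw Strings.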

    For installing and configuring Kafka itself, see the previous article.

    Sending messages to Kafka:

    package myflink;
    
    import com.alibaba.fastjson.JSON;
    import lombok.extern.slf4j.Slf4j;
    import myflink.model.UrlInfo;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;
    
    import java.util.Properties;
    
    @Slf4j
    public class KafkaSender {
    
        private static final String kafkaTopic = "testjin";
    
        private static final String brokerAddress = "localhost:9092";
    
        private static Properties properties;
    
        private static void init() {
            properties = new Properties();
            properties.put("bootstrap.servers", brokerAddress);
            properties.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
            properties.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
    
        }
    
        public static void main(String[] args) throws InterruptedException {
            init();
            while (true) {
                Thread.sleep(3000); // send once every three seconds
                sendUrlToKafka(); // send a message to Kafka
            }
        }
    
        private static void sendUrlToKafka() {
            // try-with-resources so each producer instance is closed after sending
            try (KafkaProducer<String, String> producer = new KafkaProducer<>(properties)) {

                UrlInfo urlInfo = new UrlInfo();
                long currentMills = System.currentTimeMillis();
                if (currentMills % 100 > 30) {
                    urlInfo.setUrl("http://so.com/" + currentMills);
                } else {
                    urlInfo.setUrl("http://baidu.com/" + currentMills);
                }

                String msgContent = JSON.toJSONString(urlInfo); // make sure the message sent is a String
                ProducerRecord<String, String> record = new ProducerRecord<>(kafkaTopic, msgContent);
                producer.send(record);

                log.info("send msg:" + msgContent);

                producer.flush();
            }
        }
    }
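
    For completeness, the UrlInfo model isn't listed in this article; below is a minimal sketch consistent with how it is used above (the real class comes from the earlier articles and may carry more fields):

    package myflink.model;

    import lombok.Data;

    /**
     * Minimal URL model; the producer above only sets the url field.
     */
    @Data
    public class UrlInfo {
        private String url;
    }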
    
