Integrating Flink with Kafka: Consuming and Producing

Author: maskwang520 | Published: 2019-11-15 10:30
1. Flink is commonly integrated with Kafka for consuming and producing, largely because Kafka is well suited to stream processing

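In everyday business scenarios, merely reading, writing, and storing data streams is not enough; the real goal is to process those streams in real time. In Kafka, a stream processor is anything that takes a continuous stream of data from input topics, performs some processing on it, and produces a continuous stream of data to output topics. For example, a retail application might take input streams of sales and shipments and output a stream of reorders and price adjustments computed from that data.
Simple processing can be done directly with the producer and consumer APIs. For more complex transformations, Kafka provides a fully integrated Streams API, which lets you build applications that do non-trivial processing, such as computing aggregations over streams or joining streams together. Integrating Flink with Kafka is likewise a very good choice for processing streaming data.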
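
To make the idea of a stream processor concrete, here is a minimal sketch of the retail example in plain Java. It is a toy model only: ordinary lists stand in for the input and output topics, and the class name `ReorderProcessor`, the `item:quantity` event format, and the reorder threshold are all assumptions made for illustration, not part of Kafka or Flink.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model only: plain lists stand in for Kafka topics.
class ReorderProcessor {
    private final Map<String, Integer> stock = new HashMap<>();
    private final int threshold;

    ReorderProcessor(Map<String, Integer> initialStock, int threshold) {
        this.stock.putAll(initialStock);
        this.threshold = threshold;
    }

    // Consumes sale events of the (hypothetical) form "item:quantity" and
    // returns the reorder events they trigger, in input order.
    List<String> process(List<String> sales) {
        List<String> reorders = new ArrayList<>();
        for (String sale : sales) {
            String[] parts = sale.split(":");
            String item = parts[0];
            int quantity = Integer.parseInt(parts[1]);
            // Subtract the sold quantity from the current stock level
            int remaining = stock.merge(item, -quantity, Integer::sum);
            if (remaining < threshold) {
                reorders.add("reorder:" + item);
            }
        }
        return reorders;
    }

    public static void main(String[] args) {
        Map<String, Integer> initial = new HashMap<>();
        initial.put("apple", 10);
        initial.put("pear", 3);
        ReorderProcessor processor = new ReorderProcessor(initial, 2);
        // Input "topic": three sale events
        List<String> sales = Arrays.asList("apple:4", "pear:2", "apple:5");
        // Output "topic": the reorder events derived from them
        System.out.println(processor.process(sales));
    }
}
```

In a real deployment the input list would be an unbounded topic and the processor would run continuously, which is the part a framework such as Flink takes care of.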

2. Implementing a consumer with the Flink API

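Messages are published continuously to the Kafka topic "flinktest". Flink consumes them and prints each string together with the time it was processed. This example uses Flink 1.4.2 and Kafka 1.0.0.
The pom file is as follows: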

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>com.maskwang.flink</groupId>
    <artifactId>flink_kafka</artifactId>
    <version>0.0.1-SNAPSHOT</version>
    <packaging>jar</packaging>

    <name>flink_kafka</name>
    <url>http://maven.apache.org</url>

    <properties>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
        <flink.version>1.4.2</flink.version>
    </properties>

    <dependencies>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-java</artifactId>
            <version>${flink.version}</version>
        </dependency>

        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-streaming-java_2.11</artifactId>
            <version>${flink.version}</version>
        </dependency>

        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-clients_2.11</artifactId>
            <version>${flink.version}</version>
        </dependency>

        <!-- https://mvnrepository.com/artifact/org.apache.flink/flink-connector-kafka-0.10 -->
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-connector-kafka-0.10_2.11</artifactId>
            <version>${flink.version}</version>
        </dependency>



    </dependencies>

    <build>
    <plugins>
        <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-shade-plugin</artifactId>
            <version>3.0.0</version>
            <executions>
                <execution>
                    <phase>package</phase>
                    <goals>
                        <goal>shade</goal>
                    </goals>
                    <configuration>
                        <artifactSet>
                            <excludes>
                                <exclude>com.google.code.findbugs:jsr305</exclude>
                                <exclude>org.slf4j:*</exclude>
                                <exclude>log4j:*</exclude>
                            </excludes>
                        </artifactSet>
                        <filters>
                            <filter>
                                <!-- Do not copy the signatures in the META-INF folder.
                                Otherwise, this might cause SecurityExceptions when using the JAR. -->
                                <artifact>*:*</artifact>
                                <excludes>
                                    <exclude>META-INF/*.SF</exclude>
                                    <exclude>META-INF/*.DSA</exclude>
                                    <exclude>META-INF/*.RSA</exclude>
                                </excludes>
                            </filter>
                        </filters>
                        <transformers>
                            <transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
                                <mainClass>com.maskwang.flink.ReadFromKafka</mainClass>
                            </transformer>
                        </transformers>
                    </configuration>
                </execution>
            </executions>
        </plugin>
    </plugins>
</build>


</project>

The business logic is as follows:

package com.maskwang.flink;

import java.util.Date;
import java.util.Properties;

import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer010;
import org.apache.flink.streaming.util.serialization.SimpleStringSchema;

public class ReadFromKafka {

    public static void main(String[] args) throws Exception {
        // Set up the streaming execution environment
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        Properties properties = new Properties();
        // A single Kafka broker is used here
        properties.setProperty("bootstrap.servers", "localhost:9092");
        properties.setProperty("group.id", "flink_consumer");
        // The first argument is the name of the topic
        DataStream<String> stream = env.addSource(new FlinkKafkaConsumer010<>("flinktest", new SimpleStringSchema(), properties));
        
        stream.map(new MapFunction<String, String>() {

            @Override
            public String map(String value) throws Exception {
                return new Date().toString()+":  "+value;
            }
            
        }).print();
        env.execute();
        
        
    }

}
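
The consumer above points `bootstrap.servers` at a single local broker. For a multi-broker cluster, Kafka accepts a comma-separated list of `host:port` pairs in that same property. A minimal sketch of building such a `Properties` object with only the JDK follows; the helper class `KafkaClusterProps` and the host names `broker1` to `broker3` are placeholders for illustration and do not come from the article.

```java
import java.util.Properties;

class KafkaClusterProps {
    // Builds consumer properties for one or more brokers.
    // Each broker is given as "host:port"; the list is joined with commas,
    // which is the format the bootstrap.servers property expects.
    static Properties consumerProps(String groupId, String... brokers) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", String.join(",", brokers));
        props.setProperty("group.id", groupId);
        return props;
    }

    public static void main(String[] args) {
        // Placeholder host names; replace with the addresses of your brokers
        Properties props = consumerProps("flink_consumer",
                "broker1:9092", "broker2:9092", "broker3:9092");
        System.out.println(props.getProperty("bootstrap.servers"));
    }
}
```

The resulting object can be passed to the `FlinkKafkaConsumer010` constructor in place of the hand-built `properties` used above.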

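  • No Kafka cluster is used here, so a single broker address is enough. For a cluster, add the addresses of the other brokers here as well.
  • I am using Flink 1.4.2 with the Kafka 0.10 connector, so the consumer class is FlinkKafkaConsumer010. With other versions the class may differ; see the official documentation for details.
    Watch out for one pitfall: before I packaged the job this way (with the shade plugin from the pom above), submitting it failed with the following exception, which indicates that the connector class was not in the submitted JAR: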
java.lang.NoClassDefFoundError: org/apache/flink/streaming/connectors/kafka/FlinkKafkaConsumer010
    at com.maskwang.flink.ReadFromKafka.main(ReadFromKafka.java:22)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:497)
    at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:525)
    at org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:417)
    at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:396)
    at org.apache.flink.client.CliFrontend.executeProgram(CliFrontend.java:802)
    at org.apache.flink.client.CliFrontend.run(CliFrontend.java:282)
    at org.apache.flink.client.CliFrontend.parseParameters(CliFrontend.java:1054)
    at org.apache.flink.client.CliFrontend$1.call(CliFrontend.java:1101)
    at org.apache.flink.client.CliFrontend$1.call(CliFrontend.java:1098)
    at org.apache.flink.runtime.security.HadoopSecurityContext$$Lambda$8/989447607.run(Unknown Source)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1807)
    at org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41)
    at org.apache.flink.client.CliFrontend.main(CliFrontend.java:1098)
Caused by: java.lang.ClassNotFoundException: org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer010
    at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    ... 19 more
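
The `ClassNotFoundException` above says that `FlinkKafkaConsumer010` could not be loaded when the job was submitted. One way to confirm whether a built JAR actually bundles the connector is to look for the class entry inside it. The following is a minimal sketch using only the JDK; the class name `JarClassCheck` and its method are illustrative, and the JAR path is whatever your own build produces.

```java
import java.io.IOException;
import java.util.jar.JarFile;

class JarClassCheck {
    // Returns true if the JAR contains the .class entry for the given
    // fully qualified class name.
    static boolean containsClass(String jarPath, String className) throws IOException {
        // Classes are stored in a JAR as path entries, e.g. a/b/C.class
        String entryName = className.replace('.', '/') + ".class";
        try (JarFile jar = new JarFile(jarPath)) {
            return jar.getEntry(entryName) != null;
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java JarClassCheck <path-to-job-jar>");
            return;
        }
        String className = "org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer010";
        System.out.println(containsClass(args[0], className));
    }
}
```

If this prints `false` for the JAR you submit, the connector was not bundled, which is consistent with the exception shown above.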

Runtime behavior:

  • Pushing data to the Kafka topic

    image.png
  • Flink, as the client subscribed to the topic, consumes the messages and prints them

    image.png
3. Flink as a producer pushing data to a topic
package com.maskwang.flink;

import java.util.Properties;

import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.source.SourceFunction;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer010;
import org.apache.flink.streaming.util.serialization.SimpleStringSchema;

public class WriteToKafka {

    public static void main(String[] args) throws Exception {

        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        Properties properties = new Properties();
        properties.setProperty("bootstrap.servers", "localhost:9092");

        DataStream<String> stream = env.addSource(new SimpleStringGenerator());
        stream.addSink(new FlinkKafkaProducer010<>("flinkwrite", new SimpleStringSchema(), properties));  // "flinkwrite" is the target topic

        env.execute();

    }

    public static class SimpleStringGenerator implements SourceFunction<String> {

        // Set to false by cancel() so that run() stops emitting
        private volatile boolean running = true;

        @Override
        public void run(SourceContext<String> ctx) throws Exception {
            // Emit five messages, "flink:0" through "flink:4"
            for (int k = 0; k < 5 && running; k++) {
                ctx.collect("flink:" + k);
                // Thread.sleep(5);  // adding this caused problems
            }
        }

        @Override
        public void cancel() {
            running = false;
        }

    }

}

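On the Kafka topic flinkwrite (a different topic from the one consumed earlier; the name can be anything you choose), you can see that the data has indeed been produced.
Result: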


image.png

References
https://www.zhihu.com/question/28925721
https://github.com/tgrall/kafka-flink-101
