美文网首页
Flink入门案例-WordCount批处理

Flink入门案例-WordCount批处理

作者: Anthons | 来源:发表于2021-03-29 11:29 被阅读0次

一、maven项目pom.xml依赖

<dependencies>
    <dependency>
        <groupId>org.apache.flink</groupId>
        <artifactId>flink-java</artifactId>
        <version>1.9.1</version>
    </dependency>
    <!--引入scala版本-->
    <dependency>
        <groupId>org.apache.flink</groupId>
        <artifactId>flink-streaming-java_2.12</artifactId>
        <version>1.9.1</version>
    </dependency>
</dependencies>

二、测试数据

input file path:./hello.txt

hello world
hello flink
hello spark
hello scala
how are you
fine thank you
and you

三、Flink WordCount Java版

package com.cn.wc;

import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.operators.DataSource;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.util.Collector;


// 批处理word count
public class WordCount {
    public static void main(String[] args) throws Exception {
        // 创建执行环境
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
        // 设置2个并行度
        env.setParallelism(2);

        // 从文件中读取数据
        String inputPath = "C:\\workstation\\maven_project\\flink_wordcount\\src\\main\\resources\\hello.txt";
        DataSource<String> stringDataSource = env.readTextFile(inputPath);
        // flink collector
        // flink java 提供元组类型, Tuple2 2元组
        DataSet<Tuple2<String, Integer>> result = stringDataSource.flatMap(new FlatMapFunction<String, Tuple2<String, Integer>>() {
            public void flatMap(String s, Collector<Tuple2<String, Integer>> collector) throws Exception {
                // 按空格分词
                String[] words = s.split(" ");
                // 遍历所有word变成(word, 1)
                for (String word : words) {
                    collector.collect(new Tuple2<String, Integer>(word, 1));
                }
            }
        })
        .groupBy(0) // 针对相同word进行合并,按照第一个位置分组。
        .sum(1); // 将第二个位置上的数据求和;
        result.print();
    }
}

四、运行结果

(and,1)
(fine,1)
(flink,1)
(world,1)
(are,1)
(hello,4)
(how,1)
(scala,1)
(spark,1)
(thank,1)
(you,3)

相关文章

网友评论

      本文标题:Flink入门案例-WordCount批处理

      本文链接:https://www.haomeiwen.com/subject/aclyhltx.html