flink基础——安装和demo

作者: lifesmily | 来源:发表于2018-12-05 15:31 被阅读177次

flink基础——安装和demo
Flink1.7从安装到体验
flink demo 可以通过IDE工具直接运行
Flink集群部署
flink学习之二-入门版概念
flink demo坏境搭建
[flink]flink入门：安装和使用样例demo
04 kafka作为Flink的数据源完成词频统计
2021-01-12
flink join基础demo代码

清华开源软件镜像
https://mirrors.tuna.tsinghua.edu.cn/apache/

1. 安装

1） Mac上安装

参考：
https://segmentfault.com/a/1190000016901469
https://www.jianshu.com/p/17676d34dd35
启动位置：
/usr/local/Cellar/apache-flink/1.6.2/libexec/bin
启动与停止：

$ ./start-cluster.sh
$ ./stop-cluster.sh

2）Windows上安装

参考
https://ci.apache.org/projects/flink/flink-docs-stable/start/flink_on_windows.html
启动位置
D:\flink-1.6.2-bin-scala_2\flink-1.6.2\bin
启动

$ start-cluster.bat 
# Starting a local cluster with one JobManager process and one TaskManager process. 
# You can terminate the processes via CTRL-C in the spawned shell windows.
# Web interface by default on [http://localhost:8081/.]

启动一个job

进入目录，运行

flink.bat run -c wikiedits.WikipediaAnalysis D:\Flink_Project\target\original-wiki-edits-1.0-SNAPSHOT.jar 127.0.0.1 9000

2. 运行第一个job

1） Java程序

maven 模版

$ mvn archetype:generate \  
-DarchetypeGroupId=org.apache.flink \  
-DarchetypeArtifactId=flink-quickstart-java \  
-DarchetypeVersion=1.6.2

mvn archetype:generate \
    -DarchetypeGroupId=org.apache.flink \
    -DarchetypeArtifactId=flink-quickstart-java \
    -DarchetypeVersion=1.6.1 \
    -DgroupId=wiki-edits \
    -DartifactId=wiki-edits \
    -Dversion=0.1 \
    -Dpackage=wikiedits \
    -DinteractiveMode=false

自定义任务

package FlinkTest;

import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStreamSource;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.util.Collector;

public class SocketTextStreamWordCount {
    public static void main(String[] args) throws Exception {
        //参数检查
        if (args.length != 2) {
            System.err.println("USAGE:\nSocketTextStreamWordCount <hostname> <port>");
            return;
        }

        String hostname = args[0];
        Integer port = Integer.parseInt(args[1]);


        // set up the streaming execution environment
        final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        //获取数据
        DataStreamSource<String> stream = env.socketTextStream(hostname, port);

        //计数
        SingleOutputStreamOperator<Tuple2<String, Integer>> sum = stream.flatMap(new LineSplitter())
                .keyBy(0)
                .sum(1);

        sum.print();

        env.execute("Java WordCount from SocketTextStream Example");
    }

    public static final class LineSplitter implements FlatMapFunction<String, Tuple2<String, Integer>> {
        @Override
        public void flatMap(String s, Collector<Tuple2<String, Integer>> collector) {
            String[] tokens = s.toLowerCase().split("\\W+");

            for (String token: tokens) {
                if (token.length() > 0) {
                    collector.collect(new Tuple2<String, Integer>(token, 1));
                }
            }
        }
    }
}

打包
进入工程目录（pom.xml所在目录），使用以下命令打包。
$ maven clean package -Dmaven.test.skip=true
执行

flink run -c FlinkTest.SocketTextStreamWordCount \
/Users/dodoyuan/IdeaProjects/flink-quickstart/target/\
original-flink-quickstart-1.8-SNAPSHOT.jar 127.0.0.1 9000

完整参考来源：
https://www.jianshu.com/p/17676d34dd35

2） python程序

自定义任务

from flink.plan.Environment import get_environment
from flink.functions.GroupReduceFunction import GroupReduceFunction

class Adder(GroupReduceFunction):
  def reduce(self, iterator, collector):
    count, word = iterator.next()
    count += sum([x[0] for x in iterator])
    collector.collect((count, word))

env = get_environment()
data = env.from_elements("Who's there?",
 "I think I hear them. Stand, ho! Who's there?")

data \
  .flat_map(lambda x, c: [(1, word) for word in x.lower().split()]) \
  .group_by(1) \
  .reduce_group(Adder(), combinable=True) \
  .output()

env.execute(local=True)