Flink Official Docs Translation - 01: Quickstart with the Java API


Author: zachary_1db5 | Published 2018-04-19 20:48

    https://ci.apache.org/projects/flink/flink-docs-release-1.3/quickstart/java_api_quickstart.html

Create a Project

    Use one of the following commands to create a project:

1. Use Maven

mvn archetype:generate \
    -DarchetypeGroupId=org.apache.flink \
    -DarchetypeArtifactId=flink-quickstart-java \
    -DarchetypeVersion=1.3.2

2. Use the quickstart script

    $ curl https://flink.apache.org/q/quickstart.sh | bash

Inspect the Project

    There will be a new directory in your working directory. If you’ve used the curl approach, the directory is called quickstart. Otherwise, it has the name of your artifactId:

$ tree quickstart/
quickstart/
├── pom.xml
└── src
    └── main
        ├── java
        │   └── org
        │       └── myorg
        │           └── quickstart
        │               ├── BatchJob.java
        │               ├── SocketTextStreamWordCount.java
        │               ├── StreamingJob.java
        │               └── WordCount.java
        └── resources
            └── log4j.properties

This sample project is a Maven project containing four classes. StreamingJob and BatchJob are basic skeleton programs, SocketTextStreamWordCount is a working streaming example, and WordCount is a batch example. All of them can be run directly in a local Flink environment.

We recommend you import this project into your IDE to develop and test it. If you use Eclipse, the m2e plugin allows you to import Maven projects. Some Eclipse bundles include that plugin by default, others require you to install it manually. The IntelliJ IDE supports Maven projects out of the box.

A note to Mac OS X users: The default JVM heap size for Java is too small for Flink. You have to manually increase it. In Eclipse, choose Run Configurations -> Arguments and write into the VM Arguments box: -Xmx800m.

Build the Project

Run mvn clean install -Pbuild-jar to build the project. This produces a jar at target/original-your-artifact-id-your-version.jar, which is a thin jar without dependencies. If you need a fat jar (one that bundles all dependencies), use target/your-artifact-id-your-version.jar instead.

Next Steps

Write Your Application

    The quickstart project contains a WordCount implementation, the “Hello World” of Big Data processing systems. The goal of WordCount is to determine the frequencies of words in a text, e.g., how often do the terms “the” or “house” occur in all Wikipedia texts.


For example:

    Sample Input:

    big data is big

    Sample Output:

big 2
data 1
is 1
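Before looking at the Flink version, the counting logic itself can be sketched in plain Java. This is only an illustration of what WordCount computes on the sample input above; the class name PlainWordCount is made up for this sketch and is not part of the quickstart project:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class PlainWordCount {

    // Count word frequencies in a single line, using the same
    // normalize-and-split rule as the Flink job below.
    public static Map<String, Integer> count(String line) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String token : line.toLowerCase().split("\\W+")) {
            if (token.length() > 0) {
                counts.merge(token, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // prints {big=2, data=1, is=1}
        System.out.println(count("big data is big"));
    }
}
```

The Flink program produces the same word/count pairs, but expresses the grouping and summing as distributed dataset operations instead of a local hash map.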

The following code shows the WordCount implementation. It processes each line of text with two operations (a FlatMap and a Reduce operation that aggregates a sum), and then outputs the resulting word/count pairs.

import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.tuple.Tuple2;

public class WordCount {

    public static void main(String[] args) throws Exception {

        // set up the execution environment
        final ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        // get input data
        DataSet<String> text = env.fromElements(
            "To be, or not to be,--that is the question:--",
            "Whether 'tis nobler in the mind to suffer",
            "The slings and arrows of outrageous fortune",
            "Or to take arms against a sea of troubles,"
        );

        DataSet<Tuple2<String, Integer>> counts =
            // split up the lines in pairs (2-tuples) containing: (word,1)
            text.flatMap(new LineSplitter())
                // group by the tuple field "0" and sum up tuple field "1"
                .groupBy(0)
                .sum(1);

        // execute and print result
        counts.print();
    }
}

    The operations are defined by specialized classes, here the LineSplitter class.

import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.util.Collector;

public static final class LineSplitter implements FlatMapFunction<String, Tuple2<String, Integer>> {

    @Override
    public void flatMap(String value, Collector<Tuple2<String, Integer>> out) {
        // normalize and split the line (note the escaped regex: \W+ is written "\\W+" in Java source)
        String[] tokens = value.toLowerCase().split("\\W+");

        // emit the pairs
        for (String token : tokens) {
            if (token.length() > 0) {
                out.collect(new Tuple2<String, Integer>(token, 1));
            }
        }
    }
}
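The tokenization rule used by LineSplitter is worth a quick standalone check: split("\\W+") breaks the lowercased line on runs of non-word characters, so punctuation like ",--" disappears entirely. This small sketch (plain Java, no Flink; the class name SplitDemo is made up for illustration) shows what the tokenizer produces for the first input line:

```java
import java.util.Arrays;

public class SplitDemo {
    public static void main(String[] args) {
        String line = "To be, or not to be,--that is the question:--";
        // split on runs of non-word characters after lowercasing,
        // exactly as LineSplitter does
        String[] tokens = line.toLowerCase().split("\\W+");
        System.out.println(Arrays.toString(tokens));
        // prints [to, be, or, not, to, be, that, is, the, question]
    }
}
```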
