Windows are at the heart of processing infinite streams. Windows split the stream into “buckets” of finite size, over which we can apply computations. This document focuses on how windowing is performed in Flink and how the programmer can benefit to the maximum from its offered functionality.
windows 是计算无限流的核心。windows 分割流到有限的bucket(桶)中,通过它我们可以执行计算。
Flink window 计算的结构如下
Keyed Windows
stream
.keyBy(...) <- keyed versus non-keyed windows
.window(...) <- required: "assigner"
[.trigger(...)] <- optional: "trigger" (else default trigger)
[.evictor(...)] <- optional: "evictor" (else no evictor)
[.allowedLateness(...)] <- optional: "lateness" (else zero)
[.sideOutputLateData(...)] <- optional: "output tag" (else no side output for late data)
.reduce/aggregate/fold/apply() <- required: "function"
[.getSideOutput(...)] <- optional: "output tag"
Non-Keyed Windows
stream
.windowAll(...) <- required: "assigner"
[.trigger(...)] <- optional: "trigger" (else default trigger)
[.evictor(...)] <- optional: "evictor" (else no evictor)
[.allowedLateness(...)] <- optional: "lateness" (else zero)
[.sideOutputLateData(...)] <- optional: "output tag" (else no side output for late data)
.reduce/aggregate/fold/apply() <- required: "function"
[.getSideOutput(...)] <- optional: "output tag"
Window Lifecycle
In a nutshell, a window is created as soon as the first element that should belong to this window arrives, and the window is completely removed when the time (event or processing time) passes its end timestamp plus the user-specified allowed lateness
(see Allowed Lateness). Flink guarantees removal only for time-based windows and not for other types, e.g. global windows (see Window Assigners). For example, with an event-time-based windowing strategy that creates non-overlapping (or tumbling) windows every 5 minutes and has an allowed lateness of 1 min, Flink will create a new window for the interval between 12:00
and 12:05
when the first element with a timestamp that falls into this interval arrives, and it will remove it when the watermark passes the 12:06
timestamp.
简单的来说,当属于此window的元素来到时候,window就会被创建。当window 的结束时间 或者 运行延时的时间 到来时,就 remove 此window。
Keyed vs Non-Keyed Windows
keyBy 会切分流为逻辑上的 keyed 流。
keyed 流 可以按照key 并行计算。
而non-keyed streams 只能由一个任务来执行。
参考地址
https://ci.apache.org/projects/flink/flink-docs-release-1.8/dev/stream/operators/windows.html
网友评论