将数据处理的概念拆分成dataflow model中的四个问题:what, where,when,how
What results are calculated? - transformations
Where in event-time are results calculated? - windowing
When in processing time are results materialized? - watermarks and triggers(allowed lateness)
- 介绍了Perfect watermarks和Heuristic watermarks
- 由于两种watermarks带来了too slow和too fast的问题,所以使用triggers来解决太慢和太快的问题。perfect watermark的太慢问题,使用trigger来每隔一段小时间,如两分钟就materialized一次。heuristic watermark太快的问题,watermark过来之后还继续接收事件,并修正结果,允许继续接收修正一段时间,时间过后就不再接收事件了
How do refinements of results relate? - accumulation
When triggers are used to produce multiple panes for a single window over time, we find ourselves confronted with the last question: “How do refinements of results relate?” In the examples we’ve seen so far, each successive pane built upon the one immediately preceding it. However, there are actually three different modes of accumulation[6]:
有三种类型的累计方法:
- Discarding,不管之前的,只要现在的
- Accumulating,累计之前的和现在的
- Accumulating & retracting,最新的结果需要累计之前的结果,但是之前的结果已经发射出去,所以就需要回撤之前的结果了,可以用在两个方面:regrouping和sessions
When consumers downstream are re-grouping data by a different dimension, it’s entirely possible the new value may end up keyed differently from the previous value, and thus end up in a different group. In that case, the new value can’t just overwrite the old value; you instead need the retraction to remove the old value from the old group, while the new value is incorporated the newgroup.
When dynamic windows (e.g., sessions, which we’ll look at more closely below) are in use, the new value may be replacing more than one previous window, due to window merging. In this case, it can be difficult to determine from the new window alone which old windows are being replaced. Having explicit retractions for the old windows makes the task straightforward.
模拟processing window
dataflow原生只支持event window,processing window需要使用现有的方法去模拟出来,有两种方法:使用triggers来按照processing time来做snapshot或者修改event time为ingress time
介绍session window
随着processing time的推移,不断得合并之前的window来创建新的session window,同时回撤之前已经发射过的session window所materialized的值
网友评论