美文网首页
flink状态容错

flink状态容错

作者: 泽林呗 | 来源:发表于2018-03-14 21:57 被阅读0次

什么是State(状态)?

  • 某task/operator在某时刻的一个中间结果
  • 快照(shapshot)
  • 在flink中状态可以理解为一种数据结构
  • 举例
    对输入源为<key,value>的数据,计算其中某key的最大值,如果使用HashMap,也可以进行计算,但是每次都需要重新遍历,使用状态的话,可以获取最近的一次计算结果,减少了系统的计算次数
  • 程序一旦crash,恢复
  • 程序扩容

State类型

Operator State(算子状态)


With Operator State (or non-keyed state), each operator state is bound to one parallel operator instance. The Kafka Connector is a good motivating example for the use of Operator State in Flink. Each parallel instance of the Kafka consumer maintains a map of topic partitions and offsets as its Operator State.
The Operator State interfaces support redistributing state among parallel operator instances when the parallelism is changed. There can be different schemes for doing this redistribution.

kafka示例
flink官方文档用kafka的消费者举例,认为kafka消费者的partitionId和offset类似flink的operator state

提供的数据结构:ListState<T>
每一个Operator都存在自己的状态
key State


Keyed State is always relative to keys and can only be used in functions and operators on a KeyedStream.
You can think of Keyed State as Operator State that has been partitioned, or sharded, with exactly one state-partition per key. Each keyed-state is logically bound to a unique composite of <parallel-operator-instance, key>, and since each key “belongs” to exactly one parallel instance of a keyed operator, we can think of this simply as <operator, key>.
Keyed State is further organized into so-called Key Groups. Key Groups are the atomic unit by which Flink can redistribute Keyed State; there are exactly as many Key Groups as the defined maximum parallelism. During execution each parallel instance of a keyed operator works with the keys for one or more Key Groups.

基于KeyStream之上的状态
可理解为dataStream.keyBy()之后的Operator State,Operator State是对每一个Operator的状态进行记录,而key State则是在dataSteam进行keyBy()后,记录相同keyId的keyStream上的状态
key State提供的数据类型:ValueState<T>、ListState<T>、ReducingState<T>、MapState<T>

状态容错

  • Introduction
    Apache Flink offers a fault tolerance mechanism to consistently recover the state of data streaming applications. The mechanism ensures that even in the presence of failures, the program’s state will eventually reflect every record from the data stream exactly once. Note that there is a switch to downgrade the guarantees to at least once (described below).
    The fault tolerance mechanism continuously draws snapshots of the distributed streaming data flow. For streaming applications with small state, these snapshots are very light-weight and can be drawn frequently without impacting the performance much. The state of the streaming applications is stored at a configurable place (such as the master node, or HDFS).
    In case of a program failure (due to machine-, network-, or software failure), Flink stops the distributed streaming dataflow. The system then restarts the operators and resets them to the latest successful checkpoint. The input streams are reset to the point of the state snapshot. Any records that are processed as part of the restarted parallel dataflow are guaranteed to not have been part of the checkpointed state before.
    Note: For this mechanism to realize its full guarantees, the data stream source (such as message queue or broker) needs to be able to rewind the stream to a defined recent point. Apache Kafka has this ability and Flink’s connector to Kafka exploits this ability.
    Note: Because Flink’s checkpoints are realized through distributed snapshots, we use the words snapshot and checkpointinterchangeably.
依靠checkPoint

checkPoint概念:进行全局快照,持久化保存所有的task/operator的State

  • 特点:
    异步:轻量级,不影响系统处理数据
    Barrier机制
    失败情况下可回滚致最近一次成功的checkpoint
    周期性
  • 保证exactly-once


    chcekPoint
    Restore
shapshot(快照)
  • Barriers(屏障)
    Barriers是flink分布式快照中的重要元素
    单并行度Barriers
    多并行度Barriers
    Barrier被注入数据流中,并随着数据流和记录一起流动,每一个Barrier携带者快照ID,并且十分轻量级,不会打断数据的流动,不同时期的快照的barrier可以同时存在数据流中,所以各种快照可以同时发生。
    相对于单并行度,多并行度的快照需要不同数据流中携带相同快照ID的Barrier经过operator之后,才能进行checkpoint
image.png

个人理解:感觉对于Flink的状态迁移和容错来说,主要依赖checkpoint机制,而其中最重要的元素就是Barrier,通过Barrier保证流入Operator的数据都进行了checkpoint

相关文章

  • flink状态容错

    什么是State(状态)? 某task/operator在某时刻的一个中间结果 快照(shapshot) 在fli...

  • Flink检查点机制与状态管理

    1 检查点机制 1.1 CheckPoints 为了使 Flink 的状态具有良好的容错性,Flink 提供了检查...

  • Flink状态与容错

    一致性检查点 1.什么是一致性检查点 Flink故障恢复机制的核心,就是应用的一致性检查点。有状态应用的一致性检查...

  • flink状态和容错

    flink是有状态的计算,可以存储一些中间过程和结果在内部存储里。 状态有三种存储方案MemoryStateBac...

  • Flink 2.2CheckPoint

    CheckPointCheckPoint用于flink的故障恢复/容错机制,保存任务的状态(所有任务(source...

  • Flink Checkpoint机制解析-代码走读

    Flink的Checkpoint机制是Flink容错能力的基本保证,能够对流处理运行时的状态进行保存,当故障发生时...

  • flink的状态与容错

    状态 状态性的函数和操作通过处理单个(元素/事件)存储数据,使任何类型的state可以构建更复杂的操作。flink...

  • flink状态管理和容错

    一、有状态计算 在flink的结构体系当中,有状态的计算可以说是flink非常重要的特性之一了。有状态的计算是指在...

  • flink状态管理和容错

    有状态计算 无状态计算实现的复杂度相对较低,实现起来较容易,但是无法完成提交的比较复杂的业务场景。 用户想要实现c...

  • 译:Flink ---状态和容错

    Flink 1.7 Google翻译 有状态函数和运算符在各个元素/事件的处理中存储数据,使状态成为任何类型的更精...

网友评论

      本文标题:flink状态容错

      本文链接:https://www.haomeiwen.com/subject/wieqqftx.html