Apache Flink Source Code Analysis (6): Execution Plan Generation, Strea

Author: 铛铛铛clark | Published: 2018-09-02 17:19

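Overview

The previous article on the DataStream API showed which Transformation each method generates and how it affects the Tasks at runtime. Between a Transformation and the Task that actually runs, there are three more conversions: StreamGraph, JobGraph, and ExecutionGraph.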

Prerequisites

DataStream API
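- The figure below shows the conversions between the different types of DataStream. To make these conversions clear, I picked one representative method for each conversion between two streams; the red lines mark conversions that are hidden from API users. For API details, see the previous article.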
- DataStream

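Execution plan generation
- The figure below is the official illustration of how the execution plan is generated.
- Generating the execution plan goes through four stages: StreamGraph, JobGraph, ExecutionGraph, and the physical execution graph (Execution). The StreamGraph and JobGraph are generated on the client and the ExecutionGraph is generated in the JobManager. The physical execution graph that finally runs on the individual nodes is in fact an abstract graph, not a concrete data structure.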
- flink job graphs.jpg

StreamGraph

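As the figure above shows, the StreamGraph is the first graph produced during execution plan generation. It has two important elements: StreamNode and StreamEdge.
The StreamGraph is generated after the user calls env.execute(). Before that point, the user's program has already been converted into a collection of Transformations, which are introduced next. StreamNodes are then generated from this collection and connected by StreamEdges to form a graph.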
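
To make the "describe first, build the graph on execute()" idea concrete, here is a minimal sketch in plain Java. Every class and method name in it (LazyPlanSketch, Env, Stream, apply, and so on) is hypothetical and is not Flink's API. It also assumes, for simplicity, that every call records exactly one transformation, which may differ from Flink's own bookkeeping.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: these classes are hypothetical and are not Flink's API.
final class LazyPlanSketch {
    static final class Transformation {
        final int id;
        final String name;
        final Transformation input; // null for a source

        Transformation(int id, String name, Transformation input) {
            this.id = id;
            this.name = name;
            this.input = input;
        }
    }

    static final class Env {
        // Every API call appends one record here; nothing runs until execute().
        final List<Transformation> transformations = new ArrayList<>();
        private int nextId = 1;

        Stream source(String name) {
            return new Stream(this, record(name, null));
        }

        Transformation record(String name, Transformation input) {
            Transformation t = new Transformation(nextId++, name, input);
            transformations.add(t);
            return t;
        }

        // Stand-in for graph generation: one line per recorded transformation.
        List<String> execute() {
            List<String> plan = new ArrayList<>();
            for (Transformation t : transformations) {
                String from = t.input == null ? "none" : String.valueOf(t.input.id);
                plan.add(t.id + ":" + t.name + " from " + from);
            }
            return plan;
        }
    }

    static final class Stream {
        private final Env env;
        private final Transformation last;

        Stream(Env env, Transformation last) {
            this.env = env;
            this.last = last;
        }

        // An operator call only describes work and returns a new stream handle.
        Stream apply(String operatorName) {
            return new Stream(env, env.record(operatorName, last));
        }
    }

    public static void main(String[] args) {
        Env env = new Env();
        env.source("source").apply("map").apply("sink");
        env.execute().forEach(System.out::println);
    }
}
```

The point of the sketch is only the ordering: the list is complete before execute() is called, so graph generation can work from a full description of the program.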
Transformation
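- Every StreamTransformation carries identifying information (id, name, and so on), the output type information, and parallelism and resource settings.
- I personally group Transformations into four categories: Transformations that contain an operator, Transformations that contain a Partitioner, connecting Transformations, and iteration Transformations.
- Transformations that contain an operator
  - OneInputTransformation, TwoInputTransformation
    - As the names suggest, these contain a OneInputStreamOperator and a TwoInputStreamOperator respectively (covered in detail in the StreamOperator article), and correspond to one input stream and two input streams.
  - SourceTransformation, SinkTransformation
    - As the names suggest, these contain the StreamSource and StreamSink operators. The difference is that a SourceTransformation has no input.
- Transformations that contain a Partitioner
  - PartitionTransformation
    - The Partitioner decides how records are sent downstream (for example round-robin or hash). Without a PartitionTransformation, the edge defaults to ForwardPartitioner when upstream and downstream parallelism match (similar to a narrow dependency in Spark) and to RebalancePartitioner otherwise, as the addEdgeInternal code later in this article shows.
- Connecting Transformations
  - SplitTransformation, SelectTransformation
    - These two are always used as a pair. A SplitTransformation holds the user-supplied OutputSelector, which decides which named logical streams a record is sent to. A SelectTransformation uses the user-supplied selectedNames to connect to the matching upstream.
  - SideOutputTransformation
    - Holds the user-specified OutputTag and uses it to connect to the designated downstream.
  - UnionTransformation
    - Holds a collection of input streams and connects all of them to the designated downstream.
- Iteration Transformations are not discussed here.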

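Attention: the previous article has detailed examples of how the DataStream API is converted into Transformations. From here on, the topic is how Transformations are converted into a StreamGraph: first the elements that make up a StreamGraph, then how it is generated.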

StreamNode
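- As the comment on the StreamNode class says, it represents an operator and its properties.
- The figure below lists all fields of StreamNode. The most important one is of course operator, which also shows that StreamNodes end up corresponding one-to-one with the Transformations that contain an operator.
- Beyond that, a few important fields have no counterpart in those Transformations: statePartitioner, outputSelectors, inEdges, and outEdges. How they are injected is a key part of the generation process explained below.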
- StreamNode fields

StreamEdge
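- StreamEdge is much simpler by comparison: it connects two StreamNodes.
- The figure below lists all fields of StreamEdge. The more important ones are sourceVertex (start), targetVertex (end), selectedNames (the set of names the user supplied in a SelectTransformation), outputTag (the OutputTag the user specified in a SideOutputTransformation), and outputPartitioner (specified by a PartitionTransformation; if none is given, ForwardPartitioner when the parallelism matches, RebalancePartitioner otherwise).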
- StreamEdge fields
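
As a rough picture of how the two elements relate, the sketch below models a node with inEdges and outEdges and an edge with sourceVertex, targetVertex, and outputPartitioner. The field names follow the ones listed above, but the classes (GraphSketch, Node, Edge) and the connect helper are hypothetical simplifications, and the partitioner is reduced to a plain string.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified stand-ins for StreamNode and StreamEdge; the classes are
// hypothetical and carry only a few of the fields the real ones have.
final class GraphSketch {
    static final class Node {
        final int id;
        final String operatorName;
        final List<Edge> inEdges = new ArrayList<>();
        final List<Edge> outEdges = new ArrayList<>();

        Node(int id, String operatorName) {
            this.id = id;
            this.operatorName = operatorName;
        }
    }

    static final class Edge {
        final Node sourceVertex;
        final Node targetVertex;
        final String outputPartitioner; // e.g. "FORWARD" or "REBALANCE"

        Edge(Node sourceVertex, Node targetVertex, String outputPartitioner) {
            this.sourceVertex = sourceVertex;
            this.targetVertex = targetVertex;
            this.outputPartitioner = outputPartitioner;
        }
    }

    // Registers the same edge object on both endpoints.
    static Edge connect(Node upstream, Node downstream, String partitioner) {
        Edge edge = new Edge(upstream, downstream, partitioner);
        upstream.outEdges.add(edge);
        downstream.inEdges.add(edge);
        return edge;
    }

    public static void main(String[] args) {
        Node source = new Node(1, "Source");
        Node map = new Node(2, "Map");
        Edge e = connect(source, map, "FORWARD");
        System.out.println(e.sourceVertex.operatorName + " -> " + e.targetVertex.operatorName);
    }
}
```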

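Generation process

After the user calls env.execute(), the getStreamGraph method of StreamExecutionEnvironment is called. The execute override quoted here submits the resulting graph through executeRemotely.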

  @Override
  public JobExecutionResult execute(String jobName) throws ProgramInvocationException {
      StreamGraph streamGraph = getStreamGraph();
      streamGraph.setJobName(jobName);
      transformations.clear();
      return executeRemotely(streamGraph, jarFiles);
  }

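getStreamGraph rejects an empty Transformation collection with an IllegalStateException; otherwise it passes the collection built from the user program as the argument for generating the StreamGraph.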

  public StreamGraph getStreamGraph() {
      if (transformations.size() <= 0) {
          throw new IllegalStateException("No operators defined in streaming topology. Cannot execute.");
      }
      return StreamGraphGenerator.generate(this, transformations);
  }

StreamGraphGenerator iterates over the Transformation collection and calls transform on each element, converting the Transformations into a StreamGraph.

  private StreamGraph generateInternal(List<StreamTransformation<?>> transformations) {
      for (StreamTransformation<?> transformation: transformations) {
          transform(transformation);
      }
      return streamGraph;
  }

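The transform method first checks whether the Transformation has already been processed, so that nothing is handled twice, and then dispatches on the Transformation type to the matching sub-method. The sub-methods are shown in the figure below. (Iterations are not covered here.)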
- transform*
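
The traversal pattern described above (skip what was already processed, handle inputs before the current element) can be sketched as follows. This is a hypothetical miniature, not Flink's StreamGraphGenerator: the TransformSketch class, its Transformation type, and the creationOrder list exist only for illustration, and type-based dispatch is left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical miniature of the traversal; not Flink's StreamGraphGenerator.
final class TransformSketch {
    static final class Transformation {
        final int id;
        final List<Transformation> inputs;

        Transformation(int id, Transformation... inputs) {
            this.id = id;
            this.inputs = Arrays.asList(inputs);
        }
    }

    // Results of earlier calls, so a shared input is processed only once.
    private final Map<Transformation, Collection<Integer>> alreadyTransformed = new HashMap<>();
    // Records the order in which "nodes" were created, for inspection.
    final List<Integer> creationOrder = new ArrayList<>();

    Collection<Integer> transform(Transformation t) {
        Collection<Integer> cached = alreadyTransformed.get(t);
        if (cached != null) {
            return cached; // second visit: reuse the earlier result
        }
        // Inputs first, so every upstream node exists before this one.
        for (Transformation input : t.inputs) {
            transform(input);
        }
        creationOrder.add(t.id); // stand-in for creating the node
        Collection<Integer> result = Collections.singletonList(t.id);
        alreadyTransformed.put(t, result);
        return result;
    }

    public static void main(String[] args) {
        Transformation source = new Transformation(1);
        Transformation mapA = new Transformation(2, source);
        Transformation mapB = new Transformation(3, source);
        Transformation join = new Transformation(4, mapA, mapB);
        TransformSketch generator = new TransformSketch();
        for (Transformation t : Arrays.asList(join, mapA, mapB, source)) {
            generator.transform(t);
        }
        System.out.println(generator.creationOrder);
    }
}
```

In this diamond-shaped example the source is reached through both branches but is created once, and every node is created after its inputs regardless of the order of the top-level list.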

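Following the same order as the Transformation overview, start with the transform of Transformations that contain an operator. It first recursively calls transform on the input (except for SourceTransformation) and then adds the operator to the StreamGraph. The core methods are addOperator (addCoOperator), addNode, and addEdge.

addOperator calls addNode with a task class chosen by the StreamOperator type to generate the StreamNode, then sets the input and output serializers (fields of StreamNode shown earlier) and the input and output types.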

public <IN, OUT> void addOperator(
        Integer vertexID,
        String slotSharingGroup,
        @Nullable String coLocationGroup,
        StreamOperator<OUT> operatorObject,
        TypeInformation<IN> inTypeInfo,
        TypeInformation<OUT> outTypeInfo,
        String operatorName) {

    if (operatorObject instanceof StoppableStreamSource) {
        addNode(vertexID, slotSharingGroup, coLocationGroup, StoppableSourceStreamTask.class, operatorObject, operatorName);
    } else if (operatorObject instanceof StreamSource) {
        addNode(vertexID, slotSharingGroup, coLocationGroup, SourceStreamTask.class, operatorObject, operatorName);
    } else {
        addNode(vertexID, slotSharingGroup, coLocationGroup, OneInputStreamTask.class, operatorObject, operatorName);
    }

    TypeSerializer<IN> inSerializer = inTypeInfo != null && !(inTypeInfo instanceof MissingTypeInfo) ? inTypeInfo.createSerializer(executionConfig) : null;

    TypeSerializer<OUT> outSerializer = outTypeInfo != null && !(outTypeInfo instanceof MissingTypeInfo) ? outTypeInfo.createSerializer(executionConfig) : null;

    setSerializers(vertexID, inSerializer, null, outSerializer);

    if (operatorObject instanceof OutputTypeConfigurable && outTypeInfo != null) {
        @SuppressWarnings("unchecked")
        OutputTypeConfigurable<OUT> outputTypeConfigurable = (OutputTypeConfigurable<OUT>) operatorObject;
        // sets the output type which must be know at StreamGraph creation time
        outputTypeConfigurable.setOutputType(outTypeInfo, executionConfig);
    }

    if (operatorObject instanceof InputTypeConfigurable) {
        InputTypeConfigurable inputTypeConfigurable = (InputTypeConfigurable) operatorObject;
        inputTypeConfigurable.setInputType(inTypeInfo, executionConfig);
    }

    if (LOG.isDebugEnabled()) {
        LOG.debug("Vertex: {}", vertexID);
    }
}

The addNode method does the actual work of creating the StreamNode.

protected StreamNode addNode(Integer vertexID,
    String slotSharingGroup,
    @Nullable String coLocationGroup,
    Class<? extends AbstractInvokable> vertexClass,
    StreamOperator<?> operatorObject,
    String operatorName) {

    if (streamNodes.containsKey(vertexID)) {
        throw new RuntimeException("Duplicate vertexID " + vertexID);
    }

    StreamNode vertex = new StreamNode(environment,
        vertexID,
        slotSharingGroup,
        coLocationGroup,
        operatorObject,
        operatorName,
        new ArrayList<OutputSelector<?>>(),
        vertexClass);

    streamNodes.put(vertexID, vertex);

    return vertex;
}

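Beyond that, the information carried by the Transformation (such as parallelism and resources) is injected into the newly created StreamNode, and a StreamEdge is generated for each input through addEdge (StreamEdge generation is explained in detail after the remaining Transformations).

The transform of a Transformation that contains a Partitioner first obtains all inputs (calling transform on the input ultimately returns only the ids of the parent Transformations that contain a StreamOperator). For each of them it then creates a virtual node and adds the mapping from this virtual node to (input, partitioner) to a Map named virtualPartitionNodes.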

  private <T> Collection<Integer> transformPartition(PartitionTransformation<T> partition) {
      StreamTransformation<T> input = partition.getInput();
      List<Integer> resultIds = new ArrayList<>();

      Collection<Integer> transformedIds = transform(input);
      for (Integer transformedId: transformedIds) {
          int virtualId = StreamTransformation.getNewNodeId();
          streamGraph.addVirtualPartitionNode(transformedId, virtualId, partition.getPartitioner());
          resultIds.add(virtualId);
      }

      return resultIds;
  }
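
The effect of the method above is that a partitioning step adds no real node; it only records a note under a fresh id. A minimal sketch of that bookkeeping, with hypothetical names (VirtualPartitionSketch, Virtual, an id counter that starts at 100) and the partitioner reduced to a string:

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical miniature of the virtual-node bookkeeping; not Flink's classes.
final class VirtualPartitionSketch {
    static final class Virtual {
        final int originalId;     // the real upstream node
        final String partitioner; // e.g. "HASH"

        Virtual(int originalId, String partitioner) {
            this.originalId = originalId;
            this.partitioner = partitioner;
        }
    }

    final Set<Integer> realNodes = new HashSet<>();
    final Map<Integer, Virtual> virtualPartitionNodes = new HashMap<>();
    private int nextId = 100; // assumed id counter, for illustration only

    // One virtual id per upstream id; the set of real nodes is left untouched.
    List<Integer> transformPartition(Collection<Integer> upstreamIds, String partitioner) {
        List<Integer> resultIds = new ArrayList<>();
        for (int upstreamId : upstreamIds) {
            int virtualId = nextId++;
            virtualPartitionNodes.put(virtualId, new Virtual(upstreamId, partitioner));
            resultIds.add(virtualId);
        }
        return resultIds;
    }
}
```

A downstream operator that receives these virtual ids as its inputs has to translate them back to real node ids when its edges are created.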

For connecting Transformations:
- The transform of a SplitTransformation obtains all inputs (the ids of all parent Transformations that contain a StreamOperator) and injects the OutputSelector into each input's StreamNode.
```
    for (int inputId : resultIds) {
        streamGraph.addOutputSelector(inputId, split.getOutputSelector());
    }
```
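- SelectTransformation is handled like PartitionTransformation, except that the mapping from the new virtual node to (Input, SelectedNames) goes into a Map named virtualSelectNodes.
- SideOutputTransformation is handled like PartitionTransformation, except that the mapping from the new virtual node to (Input, OutputTag) goes into a Map named virtualSideOutputNodes.
- UnionTransformation simply returns the collection of all its inputs' ids, so that every input is ready for the downstream node.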
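
The union case is the simplest of these, and a short sketch may help: no node is created for the union, and the downstream operator ends up with one edge per collected id. The class and method names below (UnionSketch, transformUnion, edgesInto) are hypothetical, and each input is assumed to have been resolved to its ids already.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical miniature; not Flink's StreamGraphGenerator.
final class UnionSketch {
    // Each inner list holds the ids that one input already resolved to.
    static List<Integer> transformUnion(List<List<Integer>> resolvedInputs) {
        List<Integer> resultIds = new ArrayList<>();
        for (List<Integer> ids : resolvedInputs) {
            resultIds.addAll(ids);
        }
        return resultIds; // no node is created for the union itself
    }

    // The downstream operator then adds one edge per returned id.
    static List<String> edgesInto(int downstreamId, List<Integer> upstreamIds) {
        List<String> edges = new ArrayList<>();
        for (int upstreamId : upstreamIds) {
            edges.add(upstreamId + "->" + downstreamId);
        }
        return edges;
    }
}
```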

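addEdge. Before a StreamNode is generated, transform has already been called on all upstream Transformations, so the corresponding Partitioner, SelectedNames, and OutputTag are already in the three Maps above.

addEdgeInternal recursively resolves OutputTag, SelectedNames, and Partitioner (if no Partitioner was given, it creates a ForwardPartitioner when upstream and downstream parallelism match and a RebalancePartitioner otherwise), finally creates the StreamEdge, and adds it to the upstream node's outEdges and the downstream node's inEdges.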

private void addEdgeInternal(Integer upStreamVertexID,
        Integer downStreamVertexID,
        int typeNumber,
        StreamPartitioner<?> partitioner,
        List<String> outputNames,
        OutputTag outputTag) {

    if (virtualSideOutputNodes.containsKey(upStreamVertexID)) {
        int virtualId = upStreamVertexID;
        upStreamVertexID = virtualSideOutputNodes.get(virtualId).f0;
        if (outputTag == null) {
            outputTag = virtualSideOutputNodes.get(virtualId).f1;
        }
        addEdgeInternal(upStreamVertexID, downStreamVertexID, typeNumber, partitioner, null, outputTag);
    } else if (virtualSelectNodes.containsKey(upStreamVertexID)) {
        int virtualId = upStreamVertexID;
        upStreamVertexID = virtualSelectNodes.get(virtualId).f0;
        if (outputNames.isEmpty()) {
            // selections that happen downstream override earlier selections
            outputNames = virtualSelectNodes.get(virtualId).f1;
        }
        addEdgeInternal(upStreamVertexID, downStreamVertexID, typeNumber, partitioner, outputNames, outputTag);
    } else if (virtualPartitionNodes.containsKey(upStreamVertexID)) {
        int virtualId = upStreamVertexID;
        upStreamVertexID = virtualPartitionNodes.get(virtualId).f0;
        if (partitioner == null) {
            partitioner = virtualPartitionNodes.get(virtualId).f1;
        }
        addEdgeInternal(upStreamVertexID, downStreamVertexID, typeNumber, partitioner, outputNames, outputTag);
    } else {
        StreamNode upstreamNode = getStreamNode(upStreamVertexID);
        StreamNode downstreamNode = getStreamNode(downStreamVertexID);

        // If no partitioner was specified and the parallelism of upstream and downstream
        // operator matches use forward partitioning, use rebalance otherwise.
        if (partitioner == null && upstreamNode.getParallelism() == downstreamNode.getParallelism()) {
            partitioner = new ForwardPartitioner<Object>();
        } else if (partitioner == null) {
            partitioner = new RebalancePartitioner<Object>();
        }

        if (partitioner instanceof ForwardPartitioner) {
            if (upstreamNode.getParallelism() != downstreamNode.getParallelism()) {
                throw new UnsupportedOperationException("Forward partitioning does not allow " +
                        "change of parallelism. Upstream operation: " + upstreamNode + " parallelism: " + upstreamNode.getParallelism() +
                        ", downstream operation: " + downstreamNode + " parallelism: " + downstreamNode.getParallelism() +
                        " You must use another partitioning strategy, such as broadcast, rebalance, shuffle or global.");
            }
        }

        StreamEdge edge = new StreamEdge(upstreamNode, downstreamNode, typeNumber, outputNames, partitioner, outputTag);

        getStreamNode(edge.getSourceId()).addOutEdge(edge);
        getStreamNode(edge.getTargetId()).addInEdge(edge);
    }
}
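
Reduced to the partitioner case only, the resolution logic above can be sketched like this. The sketch is a simplified model under stated assumptions: the class EdgeResolutionSketch and its members are hypothetical, partitioners are plain strings, edges are recorded as text, and the side-output and select branches are omitted.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical miniature of virtual-node resolution; not Flink's StreamGraph.
final class EdgeResolutionSketch {
    static final class Virtual {
        final int originalId;
        final String partitioner;

        Virtual(int originalId, String partitioner) {
            this.originalId = originalId;
            this.partitioner = partitioner;
        }
    }

    final Map<Integer, Integer> parallelism = new HashMap<>(); // real node id -> parallelism
    final Map<Integer, Virtual> virtualPartitionNodes = new HashMap<>();
    final List<String> edges = new ArrayList<>();

    // Both real node ids must be registered in the parallelism map beforehand.
    void addEdge(int upstreamId, int downstreamId, String partitioner) {
        Virtual virtual = virtualPartitionNodes.get(upstreamId);
        if (virtual != null) {
            // Step back to the node behind the virtual id; a partitioner that
            // was already picked up on the way is kept.
            String resolved = partitioner != null ? partitioner : virtual.partitioner;
            addEdge(virtual.originalId, downstreamId, resolved);
            return;
        }
        boolean sameParallelism = parallelism.get(upstreamId).equals(parallelism.get(downstreamId));
        if (partitioner == null) {
            partitioner = sameParallelism ? "FORWARD" : "REBALANCE";
        }
        if (partitioner.equals("FORWARD") && !sameParallelism) {
            throw new UnsupportedOperationException(
                    "Forward partitioning does not allow change of parallelism");
        }
        edges.add(upstreamId + "->" + downstreamId + ":" + partitioner);
    }

    public static void main(String[] args) {
        EdgeResolutionSketch graph = new EdgeResolutionSketch();
        graph.parallelism.put(1, 2);
        graph.parallelism.put(2, 2);
        graph.parallelism.put(3, 4);
        graph.virtualPartitionNodes.put(10, new Virtual(1, "HASH"));
        graph.addEdge(10, 2, null); // virtual id 10 resolves to real node 1
        graph.addEdge(2, 3, null);  // parallelism differs, so rebalance
        System.out.println(graph.edges);
    }
}
```

The edge always ends up between two real nodes; the virtual ids only exist long enough to carry the partitioner from the place it was declared to the place the edge is created.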

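Once all Transformations have been traversed, the complete StreamGraph has been generated.

Summary

This article walked through the construction of the StreamGraph in detail. Because there are many recursive calls, the logic is relatively complex.
It is closely related to the previous two articles and also covers Transformations in detail. There are many concepts and they are not easy to absorb, so reading it a couple of times is recommended.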
