Spark Source Code: Submitting Tasks


Author: Jorvi | Published 2019-12-17 15:39

    Source code index


    1 Program Entry

        var conf: SparkConf = new SparkConf().setAppName("SparkJob_Demo").setMaster("local[*]")
        val sparkContext: SparkContext = new SparkContext(conf)
    
        sparkContext.parallelize(List("aaa", "bbb", "ccc", "ddd"), 2)
          .repartition(4)
          .collect()
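
    Note that even this tiny job exercises submitTasks twice: repartition(4) introduces a shuffle, so collect() triggers a ShuffleMapStage with 2 tasks followed by a ResultStage with 4 tasks, and the DAGScheduler submits each stage's tasks to the TaskScheduler as a separate TaskSet.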
    
    

    2 Into the Source Code

    • Enter org.apache.spark.scheduler.TaskSchedulerImpl.scala
      override def submitTasks(taskSet: TaskSet) {
        val tasks = taskSet.tasks
        logInfo("Adding task set " + taskSet.id + " with " + tasks.length + " tasks")
        this.synchronized {
          val manager = createTaskSetManager(taskSet, maxTaskFailures)
          val stage = taskSet.stageId
          val stageTaskSets =
            taskSetsByStageIdAndAttempt.getOrElseUpdate(stage, new HashMap[Int, TaskSetManager])
    
          // Mark all the existing TaskSetManagers of this stage as zombie, as we are adding a new one.
          // This is necessary to handle a corner case. Let's say a stage has 10 partitions and has 2
          // TaskSetManagers: TSM1(zombie) and TSM2(active). TSM1 has a running task for partition 10
          // and it completes. TSM2 finishes tasks for partition 1-9, and thinks he is still active
          // because partition 10 is not completed yet. However, DAGScheduler gets task completion
          // events for all the 10 partitions and thinks the stage is finished. If it's a shuffle stage
          // and somehow it has missing map outputs, then DAGScheduler will resubmit it and create a
          // TSM3 for it. As a stage can't have more than one active task set managers, we must mark
          // TSM2 as zombie (it actually is).
          stageTaskSets.foreach { case (_, ts) =>
            ts.isZombie = true
          }
          stageTaskSets(taskSet.stageAttemptId) = manager
          schedulableBuilder.addTaskSetManager(manager, manager.taskSet.properties)
    
          if (!isLocal && !hasReceivedTask) {
            starvationTimer.scheduleAtFixedRate(new TimerTask() {
              override def run() {
                if (!hasLaunchedTask) {
                  logWarning("Initial job has not accepted any resources; " +
                    "check your cluster UI to ensure that workers are registered " +
                    "and have sufficient resources")
                } else {
                  this.cancel()
                }
              }
            }, STARVATION_TIMEOUT_MS, STARVATION_TIMEOUT_MS)
          }
          hasReceivedTask = true
        }
        backend.reviveOffers()
      }
    
    1. Create a TaskSetManager for each submitted TaskSet (marking any existing TaskSetManagers of the same stage as zombie);
    2. Put the TaskSetManager into the SchedulableBuilder's queue;
    3. In non-local mode, start a timer that periodically warns if the job has not yet accepted any resources (a standalone sketch of this pattern follows this list);
    4. Call backend.reviveOffers() to ask the SchedulerBackend to make resource offers.
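
    A standalone sketch of the starvation check from step 3 (not Spark code): a daemon java.util.Timer keeps logging a warning until the first task launches, then cancels itself. The hasLaunchedTask flag, the timer name and the 15-second period below are stand-ins for TaskSchedulerImpl.hasLaunchedTask and STARVATION_TIMEOUT_MS.

      import java.util.{Timer, TimerTask}
      import java.util.concurrent.atomic.AtomicBoolean

      object StarvationCheckSketch {
        // Stand-in for TaskSchedulerImpl.hasLaunchedTask, flipped once resourceOffers assigns a task.
        private val hasLaunchedTask = new AtomicBoolean(false)
        // Daemon timer, playing the role of starvationTimer.
        private val starvationTimer = new Timer("starvation-check", true)

        def startStarvationCheck(periodMs: Long): Unit = {
          starvationTimer.scheduleAtFixedRate(new TimerTask {
            override def run(): Unit = {
              if (!hasLaunchedTask.get()) {
                println("Initial job has not accepted any resources; check that workers are registered")
              } else {
                this.cancel() // the first task has launched, stop checking
              }
            }
          }, periodMs, periodMs)
        }

        def main(args: Array[String]): Unit = {
          startStarvationCheck(periodMs = 15000L)
          // Later, when a task is actually scheduled:
          hasLaunchedTask.set(true)
        }
      }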

    2.1 Putting the TaskSet into the Scheduler Queue

    We take the default FIFO scheduling mode as the example.

    • Enter org.apache.spark.scheduler.FIFOSchedulableBuilder.scala
      override def addTaskSetManager(manager: Schedulable, properties: Properties) {
        rootPool.addSchedulable(manager)
      }
    
    • Enter org.apache.spark.scheduler.Pool.scala
      override def addSchedulable(schedulable: Schedulable) {
        require(schedulable != null)
        schedulableQueue.add(schedulable)
        schedulableNameToSchedulable.put(schedulable.name, schedulable)
        schedulable.parent = this
      }
    

    The TaskSetManager is added to Pool.schedulableQueue.

    2.2 Allocating Resources

    • Enter org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.scala
      override def reviveOffers() {
        driverEndpoint.send(ReviveOffers)
      }
    

    This sends a ReviveOffers message to the driver's RPC endpoint (DriverEndpoint).

    • Enter org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.DriverEndpoint.scala
        override def receive: PartialFunction[Any, Unit] = {
          case ReviveOffers =>
            makeOffers()
        }
    
    
        // Make fake resource offers on all executors
        private def makeOffers() {
          // Make sure no executor is killed while some task is launching on it
          val taskDescs = withLock {
            // Filter out executors under killing
            val activeExecutors = executorDataMap.filterKeys(executorIsAlive)
            val workOffers = activeExecutors.map {
              case (id, executorData) =>
                new WorkerOffer(id, executorData.executorHost, executorData.freeCores,
                  Some(executorData.executorAddress.hostPort))
            }.toIndexedSeq
            scheduler.resourceOffers(workOffers)
          }
          if (!taskDescs.isEmpty) {
            launchTasks(taskDescs)
          }
        }
    
    1. Select the executors whose state is Alive from CoarseGrainedSchedulerBackend.executorDataMap (as described in Spark Source Code: Launching Executors, an executor is added to this map when it finishes starting and registers with the driver);
    2. Wrap each of these executors into a WorkerOffer (a simplified sketch follows this list);
    3. Call scheduler.resourceOffers(workOffers) to assign resources to tasks;
    4. Launch the tasks.
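
    A simplified standalone sketch of this filter-and-wrap step (not Spark code; ExecutorInfo and Offer below are stand-ins for Spark's ExecutorData and WorkerOffer):

      object MakeOffersSketch {
        // Simplified stand-ins for ExecutorData and WorkerOffer.
        case class ExecutorInfo(host: String, freeCores: Int, alive: Boolean)
        case class Offer(executorId: String, host: String, cores: Int)

        // Keep only executors that are still alive and expose each one's free cores as an offer.
        def makeOffers(executors: Map[String, ExecutorInfo]): IndexedSeq[Offer] =
          executors
            .filter { case (_, info) => info.alive }
            .map { case (id, info) => Offer(id, info.host, info.freeCores) }
            .toIndexedSeq

        def main(args: Array[String]): Unit = {
          val executors = Map(
            "exec-1" -> ExecutorInfo("node-a", freeCores = 4, alive = true),
            "exec-2" -> ExecutorInfo("node-b", freeCores = 2, alive = false))
          println(makeOffers(executors)) // only exec-1 is offered
        }
      }
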
    • Enter org.apache.spark.scheduler.TaskSchedulerImpl.scala
      /**
       * Called by cluster manager to offer resources on slaves. We respond by asking our active task
       * sets for tasks in order of priority. We fill each node with tasks in a round-robin manner so
       * that tasks are balanced across the cluster.
       */
      def resourceOffers(offers: IndexedSeq[WorkerOffer]): Seq[Seq[TaskDescription]] = synchronized {
        // Mark each slave as alive and remember its hostname
        // Also track if new executor is added
        var newExecAvail = false
        for (o <- offers) {
          if (!hostToExecutors.contains(o.host)) {
            hostToExecutors(o.host) = new HashSet[String]()
          }
          if (!executorIdToRunningTaskIds.contains(o.executorId)) {
            hostToExecutors(o.host) += o.executorId
            executorAdded(o.executorId, o.host)
            executorIdToHost(o.executorId) = o.host
            executorIdToRunningTaskIds(o.executorId) = HashSet[Long]()
            newExecAvail = true
          }
          for (rack <- getRackForHost(o.host)) {
            hostsByRack.getOrElseUpdate(rack, new HashSet[String]()) += o.host
          }
        }
    
        // Before making any offers, remove any nodes from the blacklist whose blacklist has expired. Do
        // this here to avoid a separate thread and added synchronization overhead, and also because
        // updating the blacklist is only relevant when task offers are being made.
        blacklistTrackerOpt.foreach(_.applyBlacklistTimeout())
    
        val filteredOffers = blacklistTrackerOpt.map { blacklistTracker =>
          offers.filter { offer =>
            !blacklistTracker.isNodeBlacklisted(offer.host) &&
              !blacklistTracker.isExecutorBlacklisted(offer.executorId)
          }
        }.getOrElse(offers)
    
        val shuffledOffers = shuffleOffers(filteredOffers)
        // Build a list of tasks to assign to each worker.
        val tasks = shuffledOffers.map(o => new ArrayBuffer[TaskDescription](o.cores / CPUS_PER_TASK))
        val availableCpus = shuffledOffers.map(o => o.cores).toArray
        val availableSlots = shuffledOffers.map(o => o.cores / CPUS_PER_TASK).sum
        val sortedTaskSets = rootPool.getSortedTaskSetQueue
        for (taskSet <- sortedTaskSets) {
          logDebug("parentName: %s, name: %s, runningTasks: %s".format(
            taskSet.parent.name, taskSet.name, taskSet.runningTasks))
          if (newExecAvail) {
            taskSet.executorAdded()
          }
        }
    
        // Take each TaskSet in our scheduling order, and then offer it each node in increasing order
        // of locality levels so that it gets a chance to launch local tasks on all of them.
        // NOTE: the preferredLocality order: PROCESS_LOCAL, NODE_LOCAL, NO_PREF, RACK_LOCAL, ANY
        for (taskSet <- sortedTaskSets) {
          // Skip the barrier taskSet if the available slots are less than the number of pending tasks.
          if (taskSet.isBarrier && availableSlots < taskSet.numTasks) {
            // Skip the launch process.
            // TODO SPARK-24819 If the job requires more slots than available (both busy and free
            // slots), fail the job on submit.
            logInfo(s"Skip current round of resource offers for barrier stage ${taskSet.stageId} " +
              s"because the barrier taskSet requires ${taskSet.numTasks} slots, while the total " +
              s"number of available slots is $availableSlots.")
          } else {
            var launchedAnyTask = false
            // Record all the executor IDs assigned barrier tasks on.
            val addressesWithDescs = ArrayBuffer[(String, TaskDescription)]()
            for (currentMaxLocality <- taskSet.myLocalityLevels) {
              var launchedTaskAtCurrentMaxLocality = false
              do {
                launchedTaskAtCurrentMaxLocality = resourceOfferSingleTaskSet(taskSet,
                  currentMaxLocality, shuffledOffers, availableCpus, tasks, addressesWithDescs)
                launchedAnyTask |= launchedTaskAtCurrentMaxLocality
              } while (launchedTaskAtCurrentMaxLocality)
            }
    
            if (!launchedAnyTask) {
              taskSet.getCompletelyBlacklistedTaskIfAny(hostToExecutors).foreach { taskIndex =>
                  // If the taskSet is unschedulable we try to find an existing idle blacklisted
                  // executor. If we cannot find one, we abort immediately. Else we kill the idle
                  // executor and kick off an abortTimer which if it doesn't schedule a task within the
                  // the timeout will abort the taskSet if we were unable to schedule any task from the
                  // taskSet.
                  // Note 1: We keep track of schedulability on a per taskSet basis rather than on a per
                  // task basis.
                  // Note 2: The taskSet can still be aborted when there are more than one idle
                  // blacklisted executors and dynamic allocation is on. This can happen when a killed
                  // idle executor isn't replaced in time by ExecutorAllocationManager as it relies on
                  // pending tasks and doesn't kill executors on idle timeouts, resulting in the abort
                  // timer to expire and abort the taskSet.
                  executorIdToRunningTaskIds.find(x => !isExecutorBusy(x._1)) match {
                    case Some ((executorId, _)) =>
                      if (!unschedulableTaskSetToExpiryTime.contains(taskSet)) {
                        blacklistTrackerOpt.foreach(blt => blt.killBlacklistedIdleExecutor(executorId))
    
                        val timeout = conf.get(config.UNSCHEDULABLE_TASKSET_TIMEOUT) * 1000
                        unschedulableTaskSetToExpiryTime(taskSet) = clock.getTimeMillis() + timeout
                        logInfo(s"Waiting for $timeout ms for completely "
                          + s"blacklisted task to be schedulable again before aborting $taskSet.")
                        abortTimer.schedule(
                          createUnschedulableTaskSetAbortTimer(taskSet, taskIndex), timeout)
                      }
                    case None => // Abort Immediately
                      logInfo("Cannot schedule any task because of complete blacklisting. No idle" +
                        s" executors can be found to kill. Aborting $taskSet." )
                      taskSet.abortSinceCompletelyBlacklisted(taskIndex)
                  }
              }
            } else {
              // We want to defer killing any taskSets as long as we have a non blacklisted executor
              // which can be used to schedule a task from any active taskSets. This ensures that the
              // job can make progress.
              // Note: It is theoretically possible that a taskSet never gets scheduled on a
              // non-blacklisted executor and the abort timer doesn't kick in because of a constant
              // submission of new TaskSets. See the PR for more details.
              if (unschedulableTaskSetToExpiryTime.nonEmpty) {
                logInfo("Clearing the expiry times for all unschedulable taskSets as a task was " +
                  "recently scheduled.")
                unschedulableTaskSetToExpiryTime.clear()
              }
            }
    
            if (launchedAnyTask && taskSet.isBarrier) {
              // Check whether the barrier tasks are partially launched.
              // TODO SPARK-24818 handle the assert failure case (that can happen when some locality
              // requirements are not fulfilled, and we should revert the launched tasks).
              require(addressesWithDescs.size == taskSet.numTasks,
                s"Skip current round of resource offers for barrier stage ${taskSet.stageId} " +
                  s"because only ${addressesWithDescs.size} out of a total number of " +
                  s"${taskSet.numTasks} tasks got resource offers. The resource offers may have " +
                  "been blacklisted or cannot fulfill task locality requirements.")
    
              // materialize the barrier coordinator.
              maybeInitBarrierCoordinator()
    
              // Update the taskInfos into all the barrier task properties.
              val addressesStr = addressesWithDescs
                // Addresses ordered by partitionId
                .sortBy(_._2.partitionId)
                .map(_._1)
                .mkString(",")
              addressesWithDescs.foreach(_._2.properties.setProperty("addresses", addressesStr))
    
              logInfo(s"Successfully scheduled all the ${addressesWithDescs.size} tasks for barrier " +
                s"stage ${taskSet.stageId}.")
            }
          }
        }
    
        // TODO SPARK-24823 Cancel a job that contains barrier stage(s) if the barrier tasks don't get
        // launched within a configured time.
        if (tasks.size > 0) {
          hasLaunchedTask = true
        }
        return tasks
      }
    
    1. Randomly shuffle all of the available WorkerOffers;
    2. For each WorkerOffer, compute WorkerOffer.cores / CPUS_PER_TASK to get the number of tasks that WorkerOffer (executor) can run, and give each WorkerOffer its own ArrayBuffer[TaskDescription] to collect assigned tasks (a small bookkeeping sketch follows this list);
    3. Call rootPool.getSortedTaskSetQueue to get the TaskSets sorted by the scheduling algorithm;
    4. Iterate over the sorted TaskSets; each TaskSet exposes its valid locality levels (PROCESS_LOCAL, NODE_LOCAL, NO_PREF, RACK_LOCAL, ANY), and these levels are passed in order to resourceOfferSingleTaskSet to assign resources to that TaskSet;
    5. Once every TaskSet has been offered resources at each locality level via resourceOfferSingleTaskSet, all the tasks that obtained resources are known, and they are returned for launching.
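
    The slot arithmetic in step 2 deserves a small worked example. The sketch below (not Spark code) shuffles a couple of offers, sizes a per-offer task buffer by cores / CPUS_PER_TASK, and sums the available slots; CPUS_PER_TASK stands in for the spark.task.cpus setting.

      import scala.collection.mutable.ArrayBuffer
      import scala.util.Random

      object OfferBookkeepingSketch {
        case class Offer(executorId: String, cores: Int)
        val CPUS_PER_TASK = 2 // stand-in for spark.task.cpus

        def main(args: Array[String]): Unit = {
          val offers = IndexedSeq(Offer("exec-1", 4), Offer("exec-2", 3))
          val shuffledOffers = Random.shuffle(offers)
          // One task buffer per offer, sized by how many tasks that executor can hold at once.
          val tasks = shuffledOffers.map(o => new ArrayBuffer[String](o.cores / CPUS_PER_TASK))
          val availableCpus = shuffledOffers.map(_.cores).toArray
          val availableSlots = shuffledOffers.map(_.cores / CPUS_PER_TASK).sum
          // With 2 CPUs per task: exec-1 can run 2 tasks, exec-2 only 1, so 3 slots in total.
          println(s"slots = $availableSlots, free cpus = ${availableCpus.mkString(",")}")
        }
      }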

    Note:

    • PROCESS_LOCAL: the data the task needs lives in the same executor, i.e. the same JVM process
    • NODE_LOCAL: the data is on the same node; slightly slower than PROCESS_LOCAL because the data has to cross process boundaries or be read from a local file
    • NO_PREF: the task has no locality preference
    • RACK_LOCAL: the data is on a different node in the same rack, so it must travel over the network (plus file I/O); slower than NODE_LOCAL
    • ANY: the data is on a different rack; the slowest level
    • Enter org.apache.spark.scheduler.Pool.scala
      override def getSortedTaskSetQueue: ArrayBuffer[TaskSetManager] = {
        val sortedTaskSetQueue = new ArrayBuffer[TaskSetManager]
        val sortedSchedulableQueue =
          schedulableQueue.asScala.toSeq.sortWith(taskSetSchedulingAlgorithm.comparator)
        for (schedulable <- sortedSchedulableQueue) {
          sortedTaskSetQueue ++= schedulable.getSortedTaskSetQueue
        }
        sortedTaskSetQueue
      }
    

    When the TaskSet was put into the scheduler queue (section 2.1), the wrapped TaskSetManager was added to Pool.schedulableQueue.

    Here the TaskSetManagers in Pool.schedulableQueue are sorted with the scheduling algorithm's comparator (the default FIFO algorithm in this article) and returned, which gives us the sorted TaskSets.
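
    To make the FIFO ordering concrete, here is a simplified comparator in the spirit of FIFOSchedulingAlgorithm (a sketch, not the actual class): the lower priority value wins, where FIFO uses the job id as the priority, and ties are broken by stage id.

      object FifoOrderingSketch {
        // Simplified stand-in for a Schedulable: under FIFO, priority is the job id.
        case class SimpleTaskSet(name: String, priority: Int, stageId: Int)

        // Earlier job first; within the same job, earlier stage first.
        def fifoLessThan(s1: SimpleTaskSet, s2: SimpleTaskSet): Boolean =
          if (s1.priority != s2.priority) s1.priority < s2.priority
          else s1.stageId < s2.stageId

        def main(args: Array[String]): Unit = {
          val queue = Seq(
            SimpleTaskSet("job1-stage3", priority = 1, stageId = 3),
            SimpleTaskSet("job0-stage1", priority = 0, stageId = 1),
            SimpleTaskSet("job0-stage0", priority = 0, stageId = 0))
          println(queue.sortWith(fifoLessThan).map(_.name))
          // List(job0-stage0, job0-stage1, job1-stage3)
        }
      }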

    • Enter org.apache.spark.scheduler.TaskSchedulerImpl.scala
      private def resourceOfferSingleTaskSet(
          taskSet: TaskSetManager,
          maxLocality: TaskLocality,
          shuffledOffers: Seq[WorkerOffer],
          availableCpus: Array[Int],
          tasks: IndexedSeq[ArrayBuffer[TaskDescription]],
          addressesWithDescs: ArrayBuffer[(String, TaskDescription)]) : Boolean = {
        var launchedTask = false
        // nodes and executors that are blacklisted for the entire application have already been
        // filtered out by this point
        for (i <- 0 until shuffledOffers.size) {
          val execId = shuffledOffers(i).executorId
          val host = shuffledOffers(i).host
          if (availableCpus(i) >= CPUS_PER_TASK) {
            try {
              for (task <- taskSet.resourceOffer(execId, host, maxLocality)) {
                tasks(i) += task
                val tid = task.taskId
                taskIdToTaskSetManager.put(tid, taskSet)
                taskIdToExecutorId(tid) = execId
                executorIdToRunningTaskIds(execId).add(tid)
                availableCpus(i) -= CPUS_PER_TASK
                assert(availableCpus(i) >= 0)
                // Only update hosts for a barrier task.
                if (taskSet.isBarrier) {
                  // The executor address is expected to be non empty.
                  addressesWithDescs += (shuffledOffers(i).address.get -> task)
                }
                launchedTask = true
              }
            } catch {
              case e: TaskNotSerializableException =>
                logError(s"Resource offer failed, task set ${taskSet.name} was not serializable")
                // Do not offer resources for this task, but don't throw an error to allow other
                // task sets to be submitted.
                return launchedTask
            }
          }
        }
        return launchedTask
      }
    

    This method assigns resources to a single TaskSet.

    It iterates over the WorkerOffers; for each one it takes the executorId and host, and if that offer still has at least CPUS_PER_TASK available cores it calls TaskSetManager.resourceOffer to try to place a task from the current TaskSet on that executor at the given locality level (a simplified sketch follows).
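
    Stripped of the blacklist and barrier handling, the core of this loop is CPU accounting around an Option-returning offer. Below is a minimal standalone sketch (not Spark code; tryOffer stands in for TaskSetManager.resourceOffer):

      import scala.collection.mutable.ArrayBuffer

      object SingleTaskSetOfferSketch {
        case class Offer(executorId: String, host: String)
        val CPUS_PER_TASK = 1

        // Stand-in for TaskSetManager.resourceOffer: hand out pending task names while any remain.
        def tryOffer(pending: ArrayBuffer[String], execId: String, host: String): Option[String] =
          if (pending.nonEmpty) Some(s"${pending.remove(0)} on $execId/$host") else None

        def offerSingleTaskSet(
            pending: ArrayBuffer[String],
            offers: Seq[Offer],
            availableCpus: Array[Int],
            tasks: IndexedSeq[ArrayBuffer[String]]): Boolean = {
          var launchedTask = false
          for (i <- offers.indices if availableCpus(i) >= CPUS_PER_TASK) {
            for (task <- tryOffer(pending, offers(i).executorId, offers(i).host)) {
              tasks(i) += task
              availableCpus(i) -= CPUS_PER_TASK // charge the cores this task uses
              launchedTask = true
            }
          }
          launchedTask
        }

        def main(args: Array[String]): Unit = {
          val offers = Seq(Offer("exec-1", "node-a"), Offer("exec-2", "node-b"))
          val tasks = IndexedSeq.fill(offers.size)(ArrayBuffer.empty[String])
          offerSingleTaskSet(ArrayBuffer("task-0", "task-1", "task-2"), offers, Array(2, 1), tasks)
          println(tasks) // one task landed on each executor in this single round
        }
      }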

    • Enter org.apache.spark.scheduler.TaskSetManager.scala
      @throws[TaskNotSerializableException]
      def resourceOffer(
          execId: String,
          host: String,
          maxLocality: TaskLocality.TaskLocality)
        : Option[TaskDescription] =
      {
        val offerBlacklisted = taskSetBlacklistHelperOpt.exists { blacklist =>
          blacklist.isNodeBlacklistedForTaskSet(host) ||
            blacklist.isExecutorBlacklistedForTaskSet(execId)
        }
        if (!isZombie && !offerBlacklisted) {
          val curTime = clock.getTimeMillis()
    
          var allowedLocality = maxLocality
    
          if (maxLocality != TaskLocality.NO_PREF) {
            allowedLocality = getAllowedLocalityLevel(curTime)
            if (allowedLocality > maxLocality) {
              // We're not allowed to search for farther-away tasks
              allowedLocality = maxLocality
            }
          }
    
          dequeueTask(execId, host, allowedLocality).map { case ((index, taskLocality, speculative)) =>
            // Found a task; do some bookkeeping and return a task description
            val task = tasks(index)
            val taskId = sched.newTaskId()
            // Do various bookkeeping
            copiesRunning(index) += 1
            val attemptNum = taskAttempts(index).size
            val info = new TaskInfo(taskId, index, attemptNum, curTime,
              execId, host, taskLocality, speculative)
            taskInfos(taskId) = info
            taskAttempts(index) = info :: taskAttempts(index)
            // Update our locality level for delay scheduling
            // NO_PREF will not affect the variables related to delay scheduling
            if (maxLocality != TaskLocality.NO_PREF) {
              currentLocalityIndex = getLocalityIndex(taskLocality)
              lastLaunchTime = curTime
            }
            // Serialize and return the task
            val serializedTask: ByteBuffer = try {
              ser.serialize(task)
            } catch {
              // If the task cannot be serialized, then there's no point to re-attempt the task,
              // as it will always fail. So just abort the whole task-set.
              case NonFatal(e) =>
                val msg = s"Failed to serialize task $taskId, not attempting to retry it."
                logError(msg, e)
                abort(s"$msg Exception during serialization: $e")
                throw new TaskNotSerializableException(e)
            }
            if (serializedTask.limit() > TaskSetManager.TASK_SIZE_TO_WARN_KB * 1024 &&
              !emittedTaskSizeWarning) {
              emittedTaskSizeWarning = true
              logWarning(s"Stage ${task.stageId} contains a task of very large size " +
                s"(${serializedTask.limit() / 1024} KB). The maximum recommended task size is " +
                s"${TaskSetManager.TASK_SIZE_TO_WARN_KB} KB.")
            }
            addRunningTask(taskId)
    
            // We used to log the time it takes to serialize the task, but task size is already
            // a good proxy to task serialization time.
            // val timeTaken = clock.getTime() - startTime
            val taskName = s"task ${info.id} in stage ${taskSet.id}"
            logInfo(s"Starting $taskName (TID $taskId, $host, executor ${info.executorId}, " +
              s"partition ${task.partitionId}, $taskLocality, ${serializedTask.limit()} bytes)")
    
            sched.dagScheduler.taskStarted(task, info)
            new TaskDescription(
              taskId,
              attemptNum,
              execId,
              taskName,
              index,
              task.partitionId,
              addedFiles,
              addedJars,
              task.localProperties,
              serializedTask)
          }
        } else {
          None
        }
      }
    
    1. Determine the locality level the TaskSet is currently allowed to use (delay scheduling; a configuration example follows this list);
    2. At that level, call dequeueTask to pick one pending task, obtaining its index, its TaskLocality, and whether it is a speculative copy;
    3. For the dequeued task, generate a new task id via sched.newTaskId() and record the bookkeeping (copiesRunning, TaskInfo, taskAttempts);
    4. Serialize the task with ser.serialize(task); if serialization fails, the whole TaskSet is aborted;
    5. Wrap everything into a TaskDescription(taskId, attemptNumber, executorId, taskName, index, partitionId, addedFiles, addedJars, properties, serializedTask) and return it.
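
    The allowed locality in step 1 comes from delay scheduling: the TaskSetManager waits a configurable time at each level before falling back to a less local one. A hedged example of the relevant settings (these are real configuration keys; the values shown match the defaults, since the per-level keys fall back to spark.locality.wait):

      import org.apache.spark.SparkConf

      val conf = new SparkConf()
        .set("spark.locality.wait", "3s")         // base wait before relaxing locality
        .set("spark.locality.wait.process", "3s") // wait at PROCESS_LOCAL before trying NODE_LOCAL
        .set("spark.locality.wait.node", "3s")    // wait at NODE_LOCAL before trying RACK_LOCAL
        .set("spark.locality.wait.rack", "3s")    // wait at RACK_LOCAL before trying ANY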

    2.3 Launching Tasks

    • Enter org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.DriverEndpoint.scala
        // Launch tasks returned by a set of resource offers
        private def launchTasks(tasks: Seq[Seq[TaskDescription]]) {
          for (task <- tasks.flatten) {
            val serializedTask = TaskDescription.encode(task)
            if (serializedTask.limit() >= maxRpcMessageSize) {
              Option(scheduler.taskIdToTaskSetManager.get(task.taskId)).foreach { taskSetMgr =>
                try {
                  var msg = "Serialized task %s:%d was %d bytes, which exceeds max allowed: " +
                    "spark.rpc.message.maxSize (%d bytes). Consider increasing " +
                    "spark.rpc.message.maxSize or using broadcast variables for large values."
                  msg = msg.format(task.taskId, task.index, serializedTask.limit(), maxRpcMessageSize)
                  taskSetMgr.abort(msg)
                } catch {
                  case e: Exception => logError("Exception in error callback", e)
                }
              }
            }
            else {
              val executorData = executorDataMap(task.executorId)
              executorData.freeCores -= scheduler.CPUS_PER_TASK
    
              logDebug(s"Launching task ${task.taskId} on executor id: ${task.executorId} hostname: " +
                s"${executorData.executorHost}.")
    
              executorData.executorEndpoint.send(LaunchTask(new SerializableBuffer(serializedTask)))
            }
          }
        }
    
    1. Iterate over all the TaskDescriptions; for each one, call TaskDescription.encode to serialize it into a ByteBuffer;
    2. If the serialized ByteBuffer is smaller than spark.rpc.message.maxSize, look up the ExecutorData for TaskDescription.executorId in CoarseGrainedSchedulerBackend.executorDataMap (the ExecutorData was put there when the executor registered with the driver, see Spark Source Code: Launching Executors) and decrement its freeCores by CPUS_PER_TASK; otherwise the TaskSetManager is aborted (a configuration example follows this list);
    3. Send a LaunchTask message through the executor's RPC endpoint reference, i.e. deliver the LaunchTask message to the executor so it starts the task;
    4. Once all TaskDescriptions have been processed, the TaskSet has been submitted.
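
    If a serialized TaskDescription hits the limit in step 2, the TaskSetManager is aborted with the message shown in the code above. The limit is spark.rpc.message.maxSize, in MiB, default 128. A hedged example of raising it is below, although the usual fix is to broadcast large values instead of capturing them in the task closure:

      import org.apache.spark.SparkConf

      // Raise the RPC message cap from the default 128 MiB to 256 MiB (illustrative value).
      // Prefer sparkContext.broadcast(...) for large lookup data rather than a bigger limit.
      val conf = new SparkConf()
        .set("spark.rpc.message.maxSize", "256")
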
    • Enter org.apache.spark.executor.CoarseGrainedExecutorBackend.scala
      override def receive: PartialFunction[Any, Unit] = {
        case LaunchTask(data) =>
          if (executor == null) {
            exitExecutor(1, "Received LaunchTask command but executor was null")
          } else {
            val taskDesc = TaskDescription.decode(data.value)
            logInfo("Got assigned task " + taskDesc.taskId)
            executor.launchTask(this, taskDesc)
          }
      }
    

    Executor.launchTask is then called to run the task (analyzed in a later article).

    3 Summary

    1. Put the submitted TaskSet, wrapped in a TaskSetManager, into the scheduler's queue (with the default FIFOSchedulableBuilder this is the root Pool.schedulableQueue);
    2. Select the executors whose state is Alive from CoarseGrainedSchedulerBackend.executorDataMap (executors are added to this map when they finish starting and register with the driver, see Spark Source Code: Launching Executors);
    3. Wrap each executor into a WorkerOffer;
    4. Randomly shuffle all the WorkerOffers and compute how many tasks each one can run;
    5. Take the TaskSets out of the scheduler queue in order and use resourceOfferSingleTaskSet to assign WorkerOffer resources to their tasks, trying each locality level in turn (PROCESS_LOCAL, NODE_LOCAL, NO_PREF, RACK_LOCAL, ANY), and return the tasks that obtained resources;
    6. Call launchTasks to send a LaunchTask message to each chosen executor, which launches the tasks.
