[Chapter 8] A Deep Dive into Worker Internals

Author: hoose | Published 2016-02-17 17:29 · 164 reads

    In the previous section we walked through Spark's resource-scheduling algorithm in the source code, where the Master sends a Driver or an Executor to a Worker for launch via the LaunchDriver and LaunchExecutor messages. This section digs into the internals of those two launch paths.

    1: The Master asks the Worker to launch a Driver or an Executor, via the LaunchDriver and LaunchExecutor messages respectively.

    case LaunchDriver(driverId, driverDesc) => {
      logInfo(s"Asked to launch driver $driverId")
      val driver = new DriverRunner(
        conf,
        driverId,
        workDir,
        sparkHome,
        driverDesc.copy(command = Worker.maybeUpdateSSLSettings(driverDesc.command, conf)),
        self,
        akkaUrl)
      drivers(driverId) = driver
      driver.start()

      coresUsed += driverDesc.cores
      memoryUsed += driverDesc.mem
    }
    

    In the code above, a DriverRunner object is created and driver.start() is called. As the code below shows, start() simply spawns a thread:

      /** Starts a thread to run and manage the driver. */
      def start() = {
        // spawn a dedicated thread for this driver
        new Thread("DriverRunner for " + driverId) {
          override def run() {
            try {
              // create the driver's working directory
              val driverDir = createWorkingDirectory()
              // download the user-uploaded jar (the application we wrote)
              val localJarFilename = downloadUserJar(driverDir)

              def substituteVariables(argument: String): String = argument match {
                case "{{WORKER_URL}}" => workerUrl
                case "{{USER_JAR}}" => localJarFilename
                case other => other
              }

    It is easy to see that this is still an ordinary Java thread; the Spark source actually uses a great deal of Java code, which we will keep running into later. So in our own development, learning Scala does not mean an Application must be written entirely in Scala.
    In the code above, the working directory is first created via createWorkingDirectory(), where driverDir = new File(...) is likewise a java.io.File:

    private def createWorkingDirectory(): File = {
       val driverDir = new File(workDir, driverId)
       if (!driverDir.exists() && !driverDir.mkdirs()) {
         throw new IOException("Failed to create directory " + driverDir)
       }
       driverDir
     }
    

    Next, look at the following code: a ProcessBuilder is created, and this object is used to launch the driver process:

    val builder = CommandUtils.buildProcessBuilder(driverDesc.command, driverDesc.mem,
      sparkHome.getAbsolutePath, substituteVariables)
    launchDriver(builder, driverDir, driverDesc.supervise)
  }

    // ...
    val processStart = clock.getTimeMillis()
    val exitCode = process.get.waitFor()

    Next: when the driver process exits, or is killed, DriverRunner sends the Worker a DriverStateChanged message, and the Worker in turn notifies the Master of the driver's new state

      finalState = Some(state)
      worker ! DriverStateChanged(driverId, state, finalException)
    

    Below is the Worker's DriverStateChanged handler:

    case DriverStateChanged(driverId, state, exception) => {
      state match {
        case DriverState.ERROR =>
          logWarning(s"Driver $driverId failed with unrecoverable exception: ${exception.get}")
        case DriverState.FAILED =>
          logWarning(s"Driver $driverId exited with failure")
        case DriverState.FINISHED =>
          logInfo(s"Driver $driverId exited successfully")
        case DriverState.KILLED =>
          logInfo(s"Driver $driverId was killed by user")
        case _ =>
          logDebug(s"Driver $driverId changed state to $state")
      }
      // notify the Master so it can update the driver's state
      master ! DriverStateChanged(driverId, state, exception)
      val driver = drivers.remove(driverId).get
      finishedDrivers(driverId) = driver
      memoryUsed -= driver.driverDesc.mem

    At this point the analysis ties together with the previous sections: when the Master receives the state change from the Worker, it updates the driver's information in its own in-memory structures. That is how the Driver runs on the Worker.
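The bookkeeping at the end of that handler — move the driver from the running map into the finished map and release its memory — can be sketched in plain Java with two HashMaps. The class and method names here are hypothetical, not Spark's:

```java
import java.util.HashMap;
import java.util.Map;

public class DriverBookkeeping {
    // Mirrors the Worker's drivers / finishedDrivers maps and memoryUsed counter.
    static Map<String, Integer> drivers = new HashMap<>();          // driverId -> mem (MB)
    static Map<String, Integer> finishedDrivers = new HashMap<>();
    static int memoryUsed = 0;

    // Mirrors the LaunchDriver case: register the driver and reserve memory.
    static void launch(String driverId, int mem) {
        drivers.put(driverId, mem);
        memoryUsed += mem;
    }

    // Mirrors the cleanup in the DriverStateChanged handler.
    static void driverStateChanged(String driverId) {
        Integer mem = drivers.remove(driverId);
        finishedDrivers.put(driverId, mem);
        memoryUsed -= mem;
    }

    public static void main(String[] args) {
        launch("driver-001", 512);
        launch("driver-002", 1024);
        driverStateChanged("driver-001");
        System.out.println("running=" + drivers.size()
            + " finished=" + finishedDrivers.size()
            + " memoryUsed=" + memoryUsed);
    }
}
```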

    2: How the Executor is launched on the Worker:

    Original link: https://www.haomeiwen.com/subject/crrmkttx.html