[Spark Series 2] Merging a GitHub Pull Request into Spark

Author: 鸿乃江边鸟 | Published 2020-10-28 17:39

    I have recently been working on an internal Spark version upgrade, which involves merging pull requests from GitHub. Concretely, upgrading from Spark 2.4.0 to Spark 3.0.1 while staying compatible with HDFS cdh-2.6.0-5.13.1 fails with compile errors:

    [INFO] Compiling 25 Scala sources to /Users/libinsong/Documents/codes/tongdun/spark-3.0/resource-managers/yarn/target/scala-2.12/classes ...
    [ERROR] [Error] resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala:298: value setRolledLogsIncludePattern is not a member of org.apache.hadoop.yarn.api.records.LogAggregationContext
    [ERROR] [Error] resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala:300: value setRolledLogsExcludePattern is not a member of org.apache.hadoop.yarn.api.records.LogAggregationContext
    [ERROR] [Error] resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala:551: not found: value isLocalUri
    [ERROR] [Error] resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala:1367: not found: value isLocalUri
    [ERROR] four errors found
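
    For context, these errors appear when compiling Spark against the CDH Hadoop artifacts. A typical invocation for such a build might look like the following sketch; the profile and version flag are assumptions based on Spark's standard Maven build, not something stated in the original log:

        ./build/mvn -Pyarn -Phadoop-2.7 -Dhadoop.version=2.6.0-cdh5.13.1 -DskipTests clean package

    Resolving the cdh-suffixed artifacts usually also requires adding Cloudera's Maven repository to the build.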
    

    A Spark PR on GitHub already provides the fix, so we could simply make the same edits by hand. But rather than patching the code manually, there is a more elegant option: git cherry-pick. Here is a quick walkthrough.

    Jump straight to the line containing setRolledLogsIncludePattern:

        sparkConf.get(ROLLED_LOG_INCLUDE_PATTERN).foreach { includePattern =>
          try {
            // Master builds a LogAggregationContext and calls the pattern
            // setters directly; these setters are missing from older YARN APIs.
            val logAggregationContext = Records.newRecord(classOf[LogAggregationContext])
            logAggregationContext.setRolledLogsIncludePattern(includePattern)
            sparkConf.get(ROLLED_LOG_EXCLUDE_PATTERN).foreach { excludePattern =>
              logAggregationContext.setRolledLogsExcludePattern(excludePattern)
            }
            appContext.setLogAggregationContext(logAggregationContext)
          } catch {
            case NonFatal(e) =>
              logWarning(s"Ignoring ${ROLLED_LOG_INCLUDE_PATTERN.key} because the version of YARN " +
                "does not support it", e)
          }
        }
        appContext.setUnmanagedAM(isClientUnmanagedAMEnabled)

        sparkConf.get(APPLICATION_PRIORITY).foreach { appPriority =>
          appContext.setPriority(Priority.newInstance(appPriority))
        }
        appContext
      }
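
    Note that master calls setRolledLogsIncludePattern and setRolledLogsExcludePattern directly, which is exactly what fails to compile against Hadoop 2.6.0-cdh5.13.1. The fix we are about to locate avoided the direct calls by invoking the setters through reflection. A simplified sketch of that idea (not the verbatim patch; the method name is taken from the compile errors above):

        // Sketch: resolve and invoke the setter reflectively, so this file still
        // compiles when the linked YARN version lacks setRolledLogsIncludePattern.
        val logAggregationContext = Records.newRecord(classOf[LogAggregationContext])
        val setIncludePattern = logAggregationContext.getClass
          .getMethod("setRolledLogsIncludePattern", classOf[String])
        setIncludePattern.invoke(logAggregationContext, includePattern)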
    

    The code on master turns out not to be what we want, so we reach for git blame. On GitHub, the blame view looks like this:

    (screenshot: git-blame.png — GitHub blame view of Client.scala)
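
    If you prefer the command line to GitHub's blame view, the same history is available locally; the line range below is an assumption based on the compile errors (lines 298 and 300):

        git blame -L 290,310 resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala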

    From the blame view we can see that these lines have been modified several times. Find the commit [SPARK-19545][YARN] Fix compile issue for Spark on Yarn when building… and click through to it:

    (screenshot: git-blame-result.png — commit history from the blame view)

    There, note the corresponding commit id:

    (screenshot: git-cherry-pick.png — the commit page showing the commit id)
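
    With a local clone, the commit can also be located without the browser by searching the history for the JIRA ticket:

        git log --all --oneline --grep="SPARK-19545"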

    Running git cherry-pick 8e8afb3a3468aa743d13e23e10e77e94b772b2ed applies that commit on top of your own branch. This way we avoid editing the code by hand, and we keep the original commit metadata intact, which makes the change easy to trace later.
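
    If the commit does not apply cleanly, git pauses the cherry-pick so the conflicts can be resolved; this is standard cherry-pick behavior, not anything specific to this commit:

        git cherry-pick 8e8afb3a3468aa743d13e23e10e77e94b772b2ed
        # on conflict: fix the files listed by `git status`, then
        git add <resolved-files>
        git cherry-pick --continue
        # or give up and restore the previous state:
        git cherry-pick --abort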
