Integrating the Official Apache Spark Distribution with HDP's Hadoop


By WFitz | Published 2019-04-01 16:58

    Environment

    • Spark version: spark-2.4.0-bin-hadoop2.7
    • HDP version: 2.6.3.0-235

    Integrating Spark with Hadoop

    1. cp spark-env.sh.template spark-env.sh (run in the Spark conf directory)
    2. vim spark-env.sh and point both configuration directories at the HDP client config:
    HADOOP_CONF_DIR=/usr/hdp/current/hadoop-client/etc/hadoop
    YARN_CONF_DIR=/usr/hdp/current/hadoop-client/etc/hadoop
    
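The two-line edit above can also be scripted so that rerunning the setup does not duplicate the settings. A minimal sketch, where `add_hadoop_conf` is a hypothetical helper (not a Spark or HDP tool) and the paths follow this article:

```shell
# Append HADOOP_CONF_DIR / YARN_CONF_DIR to spark-env.sh, but only once.
# add_hadoop_conf is a hypothetical helper; paths are the article's.
add_hadoop_conf() {
  env_file="$1"   # e.g. $SPARK_HOME/conf/spark-env.sh
  conf_dir="$2"   # e.g. /usr/hdp/current/hadoop-client/etc/hadoop
  grep -q '^HADOOP_CONF_DIR=' "$env_file" 2>/dev/null || {
    echo "HADOOP_CONF_DIR=$conf_dir" >> "$env_file"
    echo "YARN_CONF_DIR=$conf_dir" >> "$env_file"
  }
}

# Example (on the cluster):
#   add_hadoop_conf "$SPARK_HOME/conf/spark-env.sh" \
#                   /usr/hdp/current/hadoop-client/etc/hadoop
```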

    Resolving exceptions when submitting jobs

    The exceptions

    1. Caused by: java.lang.ClassNotFoundException: org.glassfish.jersey.server.spi.Container
    2. Caused by: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.yarn.client.RequestHedgingRMFailoverProxyProvider not found
    3. YARN LOG: java.lang.NoSuchMethodError: javax.ws.rs.core.Application.getProperties()Ljava/util/Map;
    4. Command-line error: Container exited with a non-zero exit code 1 && YARN LOG: ... bad substitution

    Fix for exception 1

    • Cause: Spark and Hadoop bundle different versions of jersey-client and jersey-core. Unify them by copying HDP's jars into Spark's jars directory:
    cd /home/wcm/apps/spark-2.4.0-bin-hadoop2.7/jars
    cp /usr/hdp/current/hadoop-yarn-client/lib/jersey-client-1.9.jar .
    cp /usr/hdp/current/hadoop-yarn-client/lib/jersey-core-1.9.jar .
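Version conflicts like this one can be spotted up front by comparing artifact versions between Spark's jars directory and the Hadoop client's lib directory. A sketch, where `find_version_conflicts` is a hypothetical helper rather than a Spark/HDP tool:

```shell
# List artifacts present in both directories under different versions.
# find_version_conflicts is a hypothetical helper, not an official tool.
find_version_conflicts() {
  spark_dir="$1"   # e.g. $SPARK_HOME/jars
  hadoop_dir="$2"  # e.g. /usr/hdp/current/hadoop-yarn-client/lib
  for jar in "$spark_dir"/*.jar; do
    [ -e "$jar" ] || continue
    base=$(basename "$jar" .jar)
    # Strip the trailing version to get the artifact name,
    # e.g. jersey-client-1.19 -> jersey-client
    artifact=$(printf '%s\n' "$base" | sed -E 's/-[0-9][0-9a-zA-Z.-]*$//')
    for hjar in "$hadoop_dir/$artifact"-*.jar; do
      [ -e "$hjar" ] || continue
      hbase=$(basename "$hjar" .jar)
      [ "$hbase" = "$base" ] || echo "conflict: $base vs $hbase"
    done
  done
}
```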
    

    Fix for exception 2

    • Cause: mismatched hadoop-yarn dependencies. Replace Spark's bundled hadoop-yarn jars with the HDP versions:
    cd /home/wcm/apps/spark-2.4.0-bin-hadoop2.7/jars
    rm -f ./hadoop-yarn*
    cp /usr/hdp/current/hadoop-yarn-client/hadoop-yarn-api-2.7.3.2.6.3.0-235.jar .
    cp /usr/hdp/current/hadoop-yarn-client/hadoop-yarn-client-2.7.3.2.6.3.0-235.jar .
    cp /usr/hdp/current/hadoop-yarn-client/hadoop-yarn-common-2.7.3.2.6.3.0-235.jar .
    cp /usr/hdp/current/hadoop-yarn-client/hadoop-yarn-server-common-2.7.3.2.6.3.0-235.jar .
    cp /usr/hdp/current/hadoop-yarn-client/hadoop-yarn-server-web-proxy-2.7.3.2.6.3.0-235.jar .
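The same replacement can be written as a small helper so the versioned jar names never have to be typed out. A sketch, where `replace_yarn_jars` is assumed for this article and copies every hadoop-yarn-*.jar it finds, a slight generalization of the five cp commands above:

```shell
# Swap Spark's bundled hadoop-yarn jars for the ones shipped with HDP.
# replace_yarn_jars is a hypothetical helper; it copies all
# hadoop-yarn-*.jar files in the HDP directory, not only the five
# listed in the article.
replace_yarn_jars() {
  spark_jars="$1"  # e.g. $SPARK_HOME/jars
  hdp_yarn="$2"    # e.g. /usr/hdp/current/hadoop-yarn-client
  rm -f "$spark_jars"/hadoop-yarn-*.jar
  for jar in "$hdp_yarn"/hadoop-yarn-*.jar; do
    [ -e "$jar" ] || continue
    cp "$jar" "$spark_jars"/
  done
}
```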
    

    Fix for exception 3

    • Cause: mismatched jsr305 versions. Replace Spark's jsr305 jar with HDP's:
    cd /home/wcm/apps/spark-2.4.0-bin-hadoop2.7/jars
    rm -f ./jsr305-1.3.9.jar
    cp /usr/hdp/current/hadoop-client/lib/jsr305-3.0.0.jar .
    

    Fix for exception 4

    • Cause: the hdp.version variable is unset, so HDP configuration entries that reference ${hdp.version} cannot be substituted.
    1. Check the HDP version:
      hdp-select status hadoop-client
      
    (screenshot of the hdp-select output)
    2. Add the variables:

      cd /home/wcm/apps/spark-2.4.0-bin-hadoop2.7/conf
      cp spark-defaults.conf.template spark-defaults.conf
      
      • vim spark-defaults.conf

        spark.driver.extraJavaOptions -Dhdp.version=2.6.3.0-235
        spark.executor.extraJavaOptions  -Dhdp.version=2.6.3.0-235
        spark.yarn.am.extraJavaOptions -Dhdp.version=2.6.3.0-235
        
      • vim java-opts (a new file in the same conf directory):

        -Dhdp.version=2.6.3.0-235
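The three spark-defaults.conf lines can also be generated from the hdp-select output, so the version string is never typed by hand. A sketch, where `write_hdp_version` is a hypothetical helper:

```shell
# Append the three -Dhdp.version settings to spark-defaults.conf.
# write_hdp_version is a hypothetical helper; the keys are the ones
# used in this article.
write_hdp_version() {
  conf_file="$1"    # e.g. $SPARK_HOME/conf/spark-defaults.conf
  hdp_version="$2"  # e.g. 2.6.3.0-235
  for key in spark.driver.extraJavaOptions \
             spark.executor.extraJavaOptions \
             spark.yarn.am.extraJavaOptions; do
    echo "$key -Dhdp.version=$hdp_version" >> "$conf_file"
  done
}

# On the cluster, the version can come straight from hdp-select, e.g.:
#   write_hdp_version "$SPARK_HOME/conf/spark-defaults.conf" \
#     "$(hdp-select status hadoop-client | awk '{print $3}')"
```

With the version wired in, a fresh job submission to YARN should get past the bad substitution failure.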
        
