spark setting up dynamic allocation

Author: oo_思维天空 | Published 2019-06-05 20:13

    What is Spark dynamic allocation?

    • It lets Spark increase or decrease the number of executors at runtime according to the current workload.

    • Figure: resources actually used by Spark vs. resources allocated, under static allocation (image not preserved)

    • Figure: resources actually used by Spark vs. resources allocated, under dynamic allocation (image not preserved)
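    As a sketch, the same behaviour can also be requested per job on the spark-submit command line; the conf keys are the ones added to spark-defaults.conf later in this article, the executor counts are illustrative, and `your_job.py` is a placeholder:

      $ spark-submit \
          --conf spark.dynamicAllocation.enabled=true \
          --conf spark.shuffle.service.enabled=true \
          --conf spark.dynamicAllocation.minExecutors=1 \
          --conf spark.dynamicAllocation.maxExecutors=4 \
          your_job.py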


    When to use Spark dynamic allocation

    • Any workload with large shuffle fetches; the external shuffle service (configured below) keeps shuffle files available even after idle executors are released.


    Steps to enable Spark dynamic allocation

    1. Set environment variables

    $ export HADOOP_CONF_DIR=/hadoop/hadoop-2.7XX/etc/hadoop
    $ export HADOOP_HOME=/hadoop/hadoop-2.7XX
    $ export SPARK_HOME=/hadoop/spark-2.4.0-bin-hadoop2.7
    $ hds=(`cat ${HADOOP_CONF_DIR}/slaves` 'namenode1' 'namenode2')
    # hds holds the hostnames of every machine in the hadoop cluster
    # remember to unset hds at the end
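    A minimal, self-contained sketch of the `hds` construction above, using a throwaway slaves file so it runs anywhere (the mktemp directory stands in for the real `${HADOOP_CONF_DIR}`):

```shell
# Build the host array from a sample slaves file plus the two namenodes,
# exactly as done above against the real ${HADOOP_CONF_DIR}/slaves.
HADOOP_CONF_DIR=$(mktemp -d)            # stand-in for the real conf dir
printf 'datanode1\ndatanode2\n' > "${HADOOP_CONF_DIR}/slaves"

hds=(`cat ${HADOOP_CONF_DIR}/slaves` 'namenode1' 'namenode2')

echo "cluster hosts: ${hds[@]} (${#hds[@]} total)"
```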
    

    2. Configure YARN

    • back up yarn-site.xml

      $ for i in ${hds[@]}  ; do echo $i ; ssh $i "cp ${HADOOP_CONF_DIR}/yarn-site.xml ${HADOOP_CONF_DIR}/yarn-site.xml.pre_spark_shuffle.bak"  ; done;
      $ for i in ${hds[@]}  ; do echo $i ; ssh $i "ls ${HADOOP_CONF_DIR} | grep pre_spark_shuffle.bak"  ; done;
      
    • modify yarn-site.xml (add the properties shown in the output below), then verify:

      $ more ${HADOOP_CONF_DIR}/yarn-site.xml | grep -B 1 -A 2 "aux-services"
      
    • output

      <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle,spark_shuffle</value>
      </property>
      <property>
        <name>yarn.nodemanager.aux-services.spark_shuffle.class</name>
        <value>org.apache.spark.network.yarn.YarnShuffleService</value>
      </property>
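    As a quick self-check, the two properties above can be verified with grep; the sketch below runs against a throwaway copy of the fragment rather than the live `${HADOOP_CONF_DIR}/yarn-site.xml`:

```shell
# Write the expected aux-services fragment to a temp file and grep it --
# the same check applied to the real yarn-site.xml above.
tmpconf=$(mktemp)
cat > "$tmpconf" <<'EOF'
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle,spark_shuffle</value>
</property>
EOF
grep -A 1 '<name>yarn.nodemanager.aux-services</name>' "$tmpconf" \
  | grep -q 'spark_shuffle' && echo "spark_shuffle is configured"
```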
      
    • broadcast yarn-site.xml

       $ for i in ${hds[@]} ; do echo $i ; scp ${HADOOP_CONF_DIR}/yarn-site.xml ${i}:${HADOOP_CONF_DIR}/ ; done ;
       $ for i in ${hds[@]} ; do echo $i ; ssh $i  "cat ${HADOOP_CONF_DIR}/yarn-site.xml | grep -B 1 -A 2 'aux-services' " ; done ;
      
    • Check heapsize

      $ more ${HADOOP_CONF_DIR}/yarn-env.sh | grep "YARN_HEAPSIZE"
      
      • output

      YARN_HEAPSIZE=2000 # adjust to your environment
      
    • check yarn class path

      $ yarn classpath | sed -r 's/:/\n/g'
      $ more ${HADOOP_CONF_DIR}/yarn-site.xml | grep "yarn.application.classpath"  
      # if nothing is found, the default path $HADOOP_HOME/share/hadoop/yarn/ is used
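    The sed invocation above just turns the colon-separated classpath into one entry per line; a minimal sketch with a sample string (illustrative only, not real `yarn classpath` output):

```shell
# Split a colon-separated classpath into lines, as `yarn classpath | sed`
# does above. The sample value is illustrative only.
cp_sample='/hadoop/hadoop-2.7XX/etc/hadoop:/hadoop/hadoop-2.7XX/share/hadoop/yarn/*'
echo "$cp_sample" | sed -r 's/:/\n/g'
```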
      
    • check yarn shuffle jar

      $ find ${SPARK_HOME} -iname "*yarn-shuffle.jar" 
      # expected result: spark-2.4.0-yarn-shuffle.jar
      
    • copy yarn shuffle jar

      $ for i in ${hds[@]} ; do echo $i ; scp `find ${SPARK_HOME} -iname "*yarn-shuffle.jar"` ${i}:$HADOOP_HOME/share/hadoop/yarn/ ; done ;
      $ for i in ${hds[@]}  ; do echo $i ; ssh $i "ls -ltr $HADOOP_HOME/share/hadoop/yarn/ | grep shuffle"  ; done ;
      
    • Restart yarn

        $ bash $HADOOP_HOME/sbin/stop-yarn.sh
        $ bash $HADOOP_HOME/sbin/start-yarn.sh
      
    • check application

      $ for i in ${hds[@]}  ; do echo $i ; ssh $i ". /etc/profile ; jps" | grep -i manager  ; done;
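    The grep above is looking for the ResourceManager and NodeManager daemons on each node; a sketch against canned jps-style output (the PIDs are made up) shows what it matches:

```shell
# Sample jps output; on a healthy node the YARN manager daemons appear.
jps_sample='12345 ResourceManager
23456 NodeManager
34567 DataNode'
echo "$jps_sample" | grep -i manager
```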
      

    3. Configure Spark

    • spark-defaults.conf

      $ more ${SPARK_HOME}/conf/spark-defaults.conf
      
    • and add the following entries:

       spark.dynamicAllocation.enabled true
       spark.shuffle.service.enabled  true
       spark.dynamicAllocation.minExecutors 1
       spark.dynamicAllocation.maxExecutors 4
       spark.dynamicAllocation.executorIdleTimeout 60
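    A sketch of adding and verifying these entries non-interactively; it writes to a temp file so the real `${SPARK_HOME}/conf/spark-defaults.conf` is untouched:

```shell
# Append the dynamic-allocation settings and confirm all five landed.
conf=$(mktemp)   # use ${SPARK_HOME}/conf/spark-defaults.conf in practice
cat >> "$conf" <<'EOF'
spark.dynamicAllocation.enabled true
spark.shuffle.service.enabled true
spark.dynamicAllocation.minExecutors 1
spark.dynamicAllocation.maxExecutors 4
spark.dynamicAllocation.executorIdleTimeout 60
EOF
grep -c '^spark\.' "$conf"
```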
      

