134. Spark Core Programming, Advanced: spark-submit Basics and Examples


Author: ZFH__ZJ | Published 2019-01-18 14:01

Basic parameters

    wordcount.sh

    /usr/local/spark/bin/spark-submit \
    --class com.zj.spark.core.WordCountCluster \
    --master spark://spark-project-1:7077 \
    --deploy-mode client \
    --conf <key>=<value> \
    /opt/spark-study/mysparkstudy-1.0-SNAPSHOT-jar-with-dependencies.jar \
    ${1}
    

    The parameters in the spark-submit command above:
    --class: the main class of the Spark application, i.e. its entry point; usually a Java or Scala class containing a main method, given with its fully qualified package name, e.g. org.leo.spark.study.WordCount
    --master: the master URL of the cluster manager; in standalone mode this is ip:port, e.g. spark://192.168.0.101:7077 (7077 is the default standalone port)
    --deploy-mode: the deploy mode, which decides whether the driver process starts on a worker node or on the local machine; the default is client, which starts the driver on the local machine; with cluster, the driver starts on a worker
    --conf: any Spark configuration property, in key=value format; if the value contains spaces, wrap the whole key=value in double quotes
    application-jar: the full path, on the current machine, of the packaged Spark application jar
    application-arguments: arguments passed to the main method of the main class; in the shell script they are forwarded with ${1} and friends, and in Java, for example, they arrive as args[0] and so on
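The quoting rule for --conf can be seen with plain shell word splitting, no Spark needed (the spark.executor.extraJavaOptions value below is just a hypothetical example):

```shell
#!/bin/sh
# show_args just reports how many arguments it received,
# standing in for spark-submit's argument parsing.
show_args() { echo "$#"; }

# Unquoted: the space splits the value, so three arguments arrive
# and the property value would be cut off at the space.
show_args --conf spark.executor.extraJavaOptions=-XX:+UseG1GC -verbose:gc

# Quoted: the whole key=value travels as one argument (two in total).
show_args --conf "spark.executor.extraJavaOptions=-XX:+UseG1GC -verbose:gc"
```

Running it prints 3, then 2: only the quoted form delivers the full value to spark-submit.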

Passing arguments to the main class

    /opt/module/spark/bin/spark-submit \
    --class com.zj.spark.core.WordCountCluster \
    --master spark://spark-project-1:7077 \
    --deploy-mode client \
    --num-executors 1 \
    --driver-memory 100m \
    --executor-memory 450m \
    --executor-cores 1 \
    /opt/spark-study/mysparkstudy-1.0-SNAPSHOT-jar-with-dependencies.jar \
    hello \
    haha
    
    standalone-client.sh

    /opt/module/spark/bin/spark-submit \
    --class com.zj.spark.core.WordCountCluster \
    --master spark://spark-project-1:7077 \
    --deploy-mode client \
    --num-executors 1 \
    --driver-memory 100m \
    --executor-memory 450m \
    --executor-cores 1 \
    /opt/spark-study/mysparkstudy-1.0-SNAPSHOT-jar-with-dependencies.jar \
    ${1} \
    ${2}

    Invoked as: ./standalone-client.sh hello haha
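The ${1}/${2} forwarding can be exercised without a cluster; submit below is a toy stand-in for the spark-submit call in standalone-client.sh:

```shell
#!/bin/sh
# Toy stand-in: the trailing arguments of a spark-submit command play
# the role of application-arguments, exactly like ${1} and ${2} above.
submit() {
  echo "app args: $1 $2"
}

# Forward this script's own positional parameters, as standalone-client.sh does.
submit "${1}" "${2}"
```

Saved as toy-submit.sh and run as `sh toy-submit.sh hello haha`, it prints `app args: hello haha`.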
    

Examples

    1. Run in local mode with 8 threads
      --class specifies the main class to execute
      --master specifies the cluster mode; local runs in the local process with a single thread, while local[8] simulates the cluster with 8 threads in that process
    /opt/module/spark/bin/spark-submit \
    --class com.zj.spark.core.WordCountCluster \
    --master local[8] \
    /opt/spark-study/mysparkstudy-1.0-SNAPSHOT-jar-with-dependencies.jar
    
    2. Run in standalone client mode
      --executor-memory sets the memory per executor, here 2G each
      --total-executor-cores sets the total number of CPU cores across all executors, here 100
    /opt/module/spark/bin/spark-submit \
    --class com.zj.spark.core.WordCountCluster \
    --master spark://192.168.0.101:7077 \
    --executor-memory 2G \
    --total-executor-cores 100 \
    /opt/spark-study/mysparkstudy-1.0-SNAPSHOT-jar-with-dependencies.jar
    
    3. Run in standalone cluster mode
      --supervise tells Spark to monitor the driver and restart it automatically if it crashes
    /opt/module/spark/bin/spark-submit \
    --class com.zj.spark.core.WordCountCluster \
    --master spark://192.168.0.101:7077 \
    --deploy-mode cluster \
    --supervise \
    --executor-memory 2G \
    --total-executor-cores 100 \
    /opt/spark-study/mysparkstudy-1.0-SNAPSHOT-jar-with-dependencies.jar
    
    4. Run in yarn-cluster mode
      --num-executors sets how many executors run the application in total
      (note: since Spark 2.0, --master yarn-cluster is deprecated in favor of --master yarn --deploy-mode cluster)
    /opt/module/spark/bin/spark-submit \
    --class com.zj.spark.core.WordCountCluster \
    --master yarn-cluster \
    --executor-memory 20G \
    --num-executors 50 \
    /opt/spark-study/mysparkstudy-1.0-SNAPSHOT-jar-with-dependencies.jar
    
    5. Run a Python application in standalone client mode
    /opt/module/spark/bin/spark-submit \
    --master spark://192.168.0.101:7077 \
    /usr/local/python-spark-wordcount.py
    

Commonly used configuration

    /opt/module/spark/bin/spark-submit \
    --class com.zj.spark.core.WordCountCluster \
    --master yarn-cluster \
    --num-executors 100 \
    --executor-cores 2 \
    --executor-memory 6G \
    --driver-memory  1G \
    /opt/spark-study/mysparkstudy-1.0-SNAPSHOT-jar-with-dependencies.jar
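A quick sanity check of what this configuration asks from the cluster (the numbers are just the flag values above):

```shell
#!/bin/sh
# Total resource demand implied by the spark-submit flags above.
num_executors=100
executor_cores=2
executor_memory_gb=6
echo "total executor cores:  $(( num_executors * executor_cores ))"
echo "total executor memory: $(( num_executors * executor_memory_gb ))G (plus 1G for the driver)"
```

So this job asks YARN for 200 cores and roughly 600G of executor memory in total; scale the flags down to what your queue actually offers.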
    

Original link: https://www.haomeiwen.com/subject/tyipdqtx.html