Spark startup flow
sbin/start-all.sh -> start-master.sh -> start-slaves.sh
sbin/start-master.sh -> first loads the configuration variables, then runs: sbin/spark-daemon.sh start org.apache.spark.deploy.master.Master 1 --ip $SPARK_MASTER_IP --port $SPARK_MASTER_PORT --webui-port $SPARK_MASTER_WEBUI_PORT
sbin/spark-daemon.sh -> bin/spark-class $command "$@"
/bin/spark-class -> exec "$RUNNER" -cp "$CLASSPATH" $JAVA_OPTS "$@"
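So the script chain bottoms out in a plain JVM launch of org.apache.spark.deploy.master.Master with the flags shown above. Below is a hedged, toy Scala sketch of how that launched JVM might consume those flags; the parsing logic is illustrative only, not Spark's actual Master.main:

```scala
// Toy sketch only: not Spark's real Master.main. It shows how the flags
// passed down from start-master.sh could be consumed by the launched JVM.
object MasterBootSketch {
  def main(args: Array[String]): Unit = {
    // Pair up "--flag value" tokens from the command line into a map.
    val opts = args.sliding(2, 2).collect {
      case Array(k, v) if k.startsWith("--") => k -> v
    }.toMap
    val ip        = opts.getOrElse("--ip", "127.0.0.1")
    val port      = opts.getOrElse("--port", "7077").toInt
    val webuiPort = opts.getOrElse("--webui-port", "8080").toInt
    println(s"Master would bind RPC at $ip:$port, web UI on port $webuiPort")
  }
}
```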
Spark job submission flow
bin/spark-submit --class cn.itcast.spark.WordCount --master spark://node-1.itcast.cn:7077 --executor-memory 2g --total-executor-cores 4
exec "$SPARK_HOME"/bin/spark-class org.apache.spark.deploy.SparkSubmit -> exec "$RUNNER" -cp "$CLASSPATH" $JAVA_OPTS "$@"
The key part is spark-class org.apache.spark.deploy.SparkSubmit -> submit -> doRunMain (args -> class cn.itcast.spark.WordCount …)
Class.forName performs the reflection inside the SparkSubmit process itself
--> Class.forName invokes the user class's main method via reflection (still a single process at this point)
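A minimal, self-contained Scala sketch of this reflective call is below; the class name and argument handling are simplified (the real doRunMain does considerably more), but the Class.forName / getMethod / invoke sequence is the core idea:

```scala
// Illustrative only: load the user's main class in the current JVM and call
// its static main via reflection, which is what SparkSubmit conceptually does.
object ReflectiveMainRunner {
  def main(args: Array[String]): Unit = {
    val childMainClass = "cn.itcast.spark.WordCount"  // value of --class, assumed on the classpath
    val mainClass  = Class.forName(childMainClass)    // same process: no new JVM is forked here
    val mainMethod = mainClass.getMethod("main", classOf[Array[String]])
    mainMethod.invoke(null, args)                     // null receiver because main is static
  }
}
```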
The SparkContext runs inside the SparkSubmit (driver) process; it then establishes a connection to the Master, over which all later RPC communication takes place
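For reference, a plausible driver matching the cn.itcast.spark.WordCount class from the submit command above is sketched below (the input/output paths taken from args are assumptions). The `new SparkContext` here is exactly what runs inside the SparkSubmit process and registers the application with the Master:

```scala
import org.apache.spark.{SparkConf, SparkContext}

// A hypothetical cn.itcast.spark.WordCount; args(0)/args(1) as input and
// output paths are assumptions, not from the original command line.
object WordCount {
  def main(args: Array[String]): Unit = {
    // --master, --executor-memory, --total-executor-cores come from spark-submit
    val conf = new SparkConf().setAppName("WordCount")
    val sc   = new SparkContext(conf)
    sc.textFile(args(0))
      .flatMap(_.split(" "))
      .map((_, 1))
      .reduceByKey(_ + _)
      .saveAsTextFile(args(1))
    sc.stop()
  }
}
```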
It creates the DAGScheduler and TaskScheduler (order detailed on the next line)
new SparkContext -> invokes the primary constructor -> 1. creates SparkEnv (which creates the ActorSystem) 2. creates the TaskScheduler -> creates the DAGScheduler -> starts the TaskScheduler
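That ordering can be mimicked with a self-contained toy. Every class below is a stub invented for illustration (Spark's real constructor is far more involved); the point is that the body of a Scala class IS its primary constructor, so these steps run top to bottom when the instance is created:

```scala
// Stubs standing in for Spark internals, to show construction order only.
class SparkEnvStub {
  println("1. SparkEnv created (ActorSystem lives in here)")
}
class TaskSchedulerStub {
  println("2. TaskScheduler created")
  def start(): Unit =
    println("4. TaskScheduler started -> registers the app with the Master")
}
class DAGSchedulerStub(taskScheduler: TaskSchedulerStub) {
  println("3. DAGScheduler created (holds a reference to the TaskScheduler)")
}

class SparkContextStub {
  // Class body = primary constructor: runs top to bottom on `new`.
  val env           = new SparkEnvStub
  val taskScheduler = new TaskSchedulerStub
  val dagScheduler  = new DAGSchedulerStub(taskScheduler)
  taskScheduler.start()
}

object ConstructorOrderDemo extends App {
  new SparkContextStub
}
```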