Spark on YARN
Submit Spark jobs to YARN for execution.
Spark acts only as a client.
./spark-submit \
--class org.apache.spark.examples.SparkPi \
--master yarn \
/home/hadoop/app/spark-2.1.0-bin-2.6.0-cdh5.7.0/examples/jars/spark-examples_2.11-2.3.0.jar \
3
deploy-mode: client / cluster
--master yarn (client is the default deploy mode) = yarn-client
yarn-cluster = --master yarn --deploy-mode cluster
--queue
--num-executors
--executor-cores
--executor-memory
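As a rough sketch of what these flags add up to, the arithmetic below estimates the total cores and memory requested from YARN for executors. It assumes the Spark 2.x default overhead rule of max(384 MiB, 10% of executor memory) per executor. The function name and the sample values are hypothetical, and the driver/AM container and YARN's allocation rounding are ignored.

```python
def estimate_yarn_request(num_executors, executor_cores, executor_memory_mb):
    """Estimate total cores and memory (MiB) requested for executors only."""
    # Assumed default overhead per executor: max(384 MiB, 10% of executor memory).
    overhead_mb = max(384, int(executor_memory_mb * 0.10))
    total_cores = num_executors * executor_cores
    total_memory_mb = num_executors * (executor_memory_mb + overhead_mb)
    return total_cores, total_memory_mb


# Hypothetical values: --num-executors 4 --executor-cores 2 --executor-memory 2g
cores, memory_mb = estimate_yarn_request(4, 2, 2048)
print(cores, memory_mb)
```

If the queue passed with --queue cannot supply this much, executors may wait for containers, so it is worth checking the estimate against the queue's capacity.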
40-50s ==> 10-15s
client vs cluster
Where does the driver run?
client: in the client process that submits the job
cluster: inside the ApplicationMaster (AM)
SPARK_HISTORY_OPTS="-Dspark.history.fs.logDirectory=hdfs://hadoop000:8020/directory -Dspark.history.ui.port=7777"
coalesce vs repartition
200 200 1 record 200 200
rdd1 -map-> rdd2 -filter--coalesce-> rddc --> save...
xxxx.coalesce(1)
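To make the difference concrete, here is a toy model in plain Python (not Spark's implementation). Partitions are modeled as lists. coalesce_partitions merges whole neighbouring partitions without moving individual records, which is the idea behind coalesce; repartition_partitions redistributes every record, standing in for the full shuffle that repartition performs. Both function names are hypothetical.

```python
def coalesce_partitions(partitions, n):
    """Merge existing partitions into at most n groups, keeping each partition whole."""
    n = max(1, min(n, len(partitions)))  # this model never increases the partition count
    merged = [[] for _ in range(n)]
    for i, part in enumerate(partitions):
        # A whole partition is appended to its target; no record-level redistribution.
        merged[i * n // len(partitions)].extend(part)
    return merged


def repartition_partitions(partitions, n):
    """Redistribute every record round-robin into n partitions (models a shuffle)."""
    result = [[] for _ in range(n)]
    records = [r for part in partitions for r in part]
    for i, record in enumerate(records):
        result[i % n].append(record)
    return result


# 200 partitions left almost empty by a selective filter: one record every 50 partitions.
sparse = [[i] if i % 50 == 0 else [] for i in range(200)]
print(len(coalesce_partitions(sparse, 1)))     # partition count after coalesce(1)
print(coalesce_partitions(sparse, 1)[0])       # surviving records in the single partition
print(len(repartition_partitions(sparse, 4)))  # partition count after repartition(4)
```

Saving the sparse data as-is would write one output file per partition, most of them empty, which is why a coalesce before save is useful after a selective filter.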
map vs mapPartitions
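A minimal plain-Python model of the difference (not Spark code): map calls the function once per record, while mapPartitions calls it once per partition and hands it an iterator over that partition's records. The helper names and the setup counter are hypothetical and only illustrate how often any expensive setup would run.

```python
def map_records(partitions, func):
    """Model of map: func is called once per record."""
    return [[func(x) for x in part] for part in partitions]


def map_partitions(partitions, func):
    """Model of mapPartitions: func is called once per partition with an iterator."""
    return [list(func(iter(part))) for part in partitions]


setup_calls = {"map": 0, "mapPartitions": 0}


def per_record(x):
    setup_calls["map"] += 1  # expensive setup would repeat for every record
    return x * 2


def per_partition(records):
    setup_calls["mapPartitions"] += 1  # expensive setup happens once per partition
    for x in records:
        yield x * 2


data = [[1, 2, 3], [4, 5], [6]]
doubled_a = map_records(data, per_record)
doubled_b = map_partitions(data, per_partition)
print(doubled_a == doubled_b, setup_calls)
```

Both produce the same result; the difference is only in how many times the per-call setup runs.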
foreach vs foreachPartition
foreachPartition
Use this whenever output is involved.
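The reason can be sketched with a plain-Python model (not Spark code): with foreach, any connection opened inside the function is opened once per record, while foreachPartition lets one connection be shared by a whole partition. FakeConnection and the two save functions are hypothetical stand-ins for a real sink such as a database.

```python
class FakeConnection:
    """Hypothetical stand-in for a database connection; counts how often it is opened."""
    opened = 0

    def __init__(self):
        FakeConnection.opened += 1
        self.rows = []

    def write(self, row):
        self.rows.append(row)

    def close(self):
        pass


def save_with_foreach(partitions):
    """Model of foreach: one connection per record."""
    for part in partitions:
        for row in part:
            conn = FakeConnection()
            conn.write(row)
            conn.close()


def save_with_foreach_partition(partitions):
    """Model of foreachPartition: one connection per partition."""
    for part in partitions:
        conn = FakeConnection()
        for row in part:
            conn.write(row)
        conn.close()


data = [["a", "b", "c"], ["d", "e"]]

FakeConnection.opened = 0
save_with_foreach(data)
opened_per_record = FakeConnection.opened

FakeConnection.opened = 0
save_with_foreach_partition(data)
opened_per_partition = FakeConnection.opened

print(opened_per_record, opened_per_partition)
```

The number of connections drops from the record count to the partition count, which is usually far smaller.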