16-SparkCore03

Author: CrUelAnGElPG | Published: 2018-09-01 04:30

Spark on YARN

Submit Spark jobs to YARN for execution.

In this setup Spark acts only as a client.

./spark-submit \
--class org.apache.spark.examples.SparkPi \
--master yarn \
/home/hadoop/app/spark-2.1.0-bin-2.6.0-cdh5.7.0/examples/jars/spark-examples_2.11-2.3.0.jar \
3

--deploy-mode: client / cluster

--master yarn = yarn-client (client is the default deploy mode)

yarn-cluster = --master yarn --deploy-mode cluster

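--queue: the YARN queue to submit to

--num-executors: number of executors to launch

--executor-cores: cores per executor

--executor-memory: memory per executor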
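
As a rough sketch of how the options above fit into one submission, the Python snippet below only assembles the argument list; it does not launch anything. The helper name build_spark_submit and the values for the queue, executor count, cores, and memory are assumptions chosen for illustration and do not come from the original notes.

```python
import shlex


def build_spark_submit(app_jar, main_class, app_args=(), deploy_mode="client",
                       queue=None, num_executors=None,
                       executor_cores=None, executor_memory=None):
    """Assemble a spark-submit argument list for a YARN submission (hypothetical helper)."""
    cmd = ["./spark-submit",
           "--class", main_class,
           "--master", "yarn",
           "--deploy-mode", deploy_mode]
    optional = [("--queue", queue),
                ("--num-executors", num_executors),
                ("--executor-cores", executor_cores),
                ("--executor-memory", executor_memory)]
    for flag, value in optional:
        if value is not None:          # only emit the flags that were set
            cmd += [flag, str(value)]
    cmd.append(app_jar)                # the application jar comes after all options
    cmd += [str(a) for a in app_args]  # application arguments come last
    return cmd


cmd = build_spark_submit(
    app_jar="/home/hadoop/app/spark-2.1.0-bin-2.6.0-cdh5.7.0/examples/jars/spark-examples_2.11-2.3.0.jar",
    main_class="org.apache.spark.examples.SparkPi",
    app_args=[3],
    deploy_mode="cluster",
    queue="default",        # assumed queue name
    num_executors=2,        # assumed value
    executor_cores=1,       # assumed value
    executor_memory="1g",   # assumed value
)
print(" ".join(shlex.quote(part) for part in cmd))
```

The list can be handed to subprocess.run(cmd) on a machine where spark-submit is in the current directory; keeping it as a list avoids shell quoting problems.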

40-50s ==> 10-15s

client vs cluster

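The difference is where the driver runs:

client mode: in the client process that submitted the job

cluster mode: inside the YARN ApplicationMaster (AM)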

SPARK_HISTORY_OPTS="-Dspark.history.fs.logDirectory=hdfs://hadoop000:8020/directory -Dspark.history.ui.port=7777"

coalesce vs repartition

200 200 1 record 200 200

rdd1 -map-> rdd2 -filter--coalesce-> rddc --> save...

xxxx.coalesce(1)
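
One plausible reading of the pipeline above is the small-files case: many partitions survive a filter with very few records each, so saving directly would write one tiny file per partition. The sketch below is a plain-Python model of that idea, not the Spark API; partitions are represented as lists of lists, and the 200 x 200 sizes are an assumption about what the numbers in the notes mean. In Spark itself, repartition(n) is coalesce(n, shuffle = true).

```python
def filter_partitions(partitions, pred):
    """Apply a filter inside each partition; the partition count is unchanged."""
    return [[x for x in part if pred(x)] for part in partitions]


def coalesce(partitions, n):
    """Merge whole partitions into at most n groups (models coalesce without a shuffle)."""
    n = min(n, len(partitions))   # without a shuffle the count can only go down
    merged = [[] for _ in range(n)]
    for i, part in enumerate(partitions):
        merged[i * n // len(partitions)].extend(part)
    return merged


def repartition(partitions, n):
    """Redistribute every record round-robin over n partitions (models a full shuffle)."""
    out = [[] for _ in range(n)]
    i = 0
    for part in partitions:
        for x in part:
            out[i % n].append(x)
            i += 1
    return out


# 200 partitions holding 200 records each (assumed sizes)
partitions = [[p * 200 + i for i in range(200)] for p in range(200)]

# A selective filter leaves one record in each partition
filtered = filter_partitions(partitions, lambda x: x % 200 == 0)

single = coalesce(filtered, 1)    # one partition, so one output file
spread = repartition(filtered, 4) # four evenly sized partitions

print(len(filtered), len(single), [len(p) for p in spread])
```

In this model, saving filtered would produce 200 one-record outputs, while saving single produces one output with all 200 records.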

map vs mapPartitions

foreach vs foreachPartition

foreachPartition

Use this whenever the operation writes output.
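
The usual reason for this advice is setup cost: with foreach, anything created inside the function (such as a connection) is created once per record, while foreachPartition lets it be created once per partition. The sketch below is a plain-Python model of that difference, not Spark code; FakeConnection, save_record, and save_partition are hypothetical names. The same reasoning applies to map vs mapPartitions.

```python
class FakeConnection:
    """Hypothetical stand-in for a database or socket connection."""
    opened = 0  # counts how many connections were created

    def __init__(self):
        FakeConnection.opened += 1
        self.rows = []

    def write(self, row):
        self.rows.append(row)

    def close(self):
        pass


def foreach(partitions, fn):
    """Model of foreach: fn is called once per record."""
    for part in partitions:
        for record in part:
            fn(record)


def foreach_partition(partitions, fn):
    """Model of foreachPartition: fn is called once per partition with an iterator."""
    for part in partitions:
        fn(iter(part))


def save_record(record):
    conn = FakeConnection()   # a new connection for every single record
    conn.write(record)
    conn.close()


def save_partition(records):
    conn = FakeConnection()   # one connection shared by the whole partition
    for record in records:
        conn.write(record)
    conn.close()


partitions = [[1, 2, 3], [4, 5], [6, 7, 8, 9]]

FakeConnection.opened = 0
foreach(partitions, save_record)
per_record = FakeConnection.opened

FakeConnection.opened = 0
foreach_partition(partitions, save_partition)
per_partition = FakeConnection.opened

print(per_record, per_partition)
```

With nine records in three partitions, the per-record version opens nine connections and the per-partition version opens three.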


Article link: https://www.haomeiwen.com/subject/nrdywftx.html
