Spark on YARN Resource Allocation Experiments: Fixing Jobs Stuck in ACCEPTED

Author: Caucher | Published 2020-11-03 16:22

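As described in the previous post, we wanted to permanently fix the problem of jobs submitted to YARN staying in the ACCEPTED state, so we ran several groups of experiments on YARN resource allocation. The process and results follow.

Scheduler: FairScheduler
Node configuration: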

image.png

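Spark on YARN experiments:
Client mode and cluster mode behave much the same here, so every run uses client mode to execute a resource-heavy Spark SQL join query and print the result. Note that in client mode the "Driver" figures below describe the YARN ApplicationMaster container, which is what spark.yarn.am.memory and spark.yarn.am.cores size.

Experiment 1
Driver: 1 core, 2 GB
Executor: 1 core, 2 GB
Executor/container count: not specified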

spark-submit --master yarn --conf spark.yarn.am.memory=2g --conf spark.yarn.am.cores=1 --conf spark.executor.cores=1 --executor-memory 2g ./src/main/pybin/ttemp.py

Result:
Ran successfully
Driver: 1 core, 3 GB
Executor: 1 core, 3 GB
Executor count: 6

image.png

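A brief explanation of the 3 GB: on top of the requested memory, Spark adds an overhead of max(384 MB, 10% of the requested memory), and YARN then rounds the total up to the next multiple of 1024 MB. For a 2 GB request that is 2048 + 384 = 2432 MB, which rounds up to 3072 MB, i.e. 3 GB.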
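The arithmetic above can be sketched in a few lines of Python. This is only an illustration of the rule as described in this post: the function name and its parameters are hypothetical, and the 10% factor, the 384 MB floor, and the 1024 MB rounding step are taken from the explanation above rather than read from a live cluster.

```python
import math


def yarn_container_mb(requested_mb, overhead_factor=0.10,
                      min_overhead_mb=384, increment_mb=1024):
    """Estimate the container size YARN grants for a memory request (in MB).

    Hypothetical helper: the defaults mirror the rule described in the text,
    not values queried from the cluster.
    """
    # Overhead is the larger of the fixed floor and 10% of the request.
    overhead_mb = max(min_overhead_mb, int(requested_mb * overhead_factor))
    total_mb = requested_mb + overhead_mb
    # Round the total up to the next multiple of the allocation increment.
    return math.ceil(total_mb / increment_mb) * increment_mb


for requested in (2048, 4096, 5120):
    print(requested, "MB requested ->", yarn_container_mb(requested), "MB granted")
```

Under these assumptions a 2 GB request becomes 3 GB, a 4 GB request becomes 5 GB, and a 5 GB request becomes 6 GB, which is consistent with the executor sizes observed in the experiments in this post.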

Experiment 2:
Driver: 1 core, 2 GB
Executor: 1 core, 2 GB
Executor/container count: 2

spark-submit --master yarn --deploy-mode client --conf spark.yarn.am.memory=2g --conf spark.yarn.am.cores=1 --conf spark.executor.cores=1 --executor-memory 2g --num-executors 2 ./src/main/pybin/ttemp.py

Result:
Ran successfully:
Driver: 1 core, 3 GB
Executor: 1 core, 3 GB
Executor count: 2

image.png

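Conclusion 1: Under the Fair scheduler, if the executor count is not specified, as many executors as possible are allocated and the number varies from run to run; if a count is specified, that count is used.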

Experiment 3:
Driver: 1 core, 4 GB
Executor: 2 cores, 4 GB
Executor/container count: not specified

spark-submit --master yarn --deploy-mode client --conf spark.yarn.am.memory=4g --conf spark.yarn.am.cores=1 --conf spark.executor.cores=2 --executor-memory 4g ./src/main/pybin/ttemp.py

Result:
Ran successfully:
Driver: 1 core, 5 GB
Executor: 2 cores, 5 GB
Executor count: 2

image.png

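Experiment 4:
In the fourth experiment we deliberately ask for more executors than the cluster can hold, to see whether the job gets stuck in ACCEPTED:
Driver: 1 core, 4 GB
Executor: 2 cores, 4 GB
Executor/container count: 5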

spark-submit --master yarn --deploy-mode client --conf spark.yarn.am.memory=4g --conf spark.yarn.am.cores=1 --conf spark.executor.cores=2 --executor-memory 4g --num-executors 5 ./src/main/pybin/ttemp.py

Result:
Ran successfully
Driver: 1 core, 5 GB
Executor: 2 cores, 5 GB
Executor count: 2

image.png
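Conclusion 2: Under the Fair scheduler, an over-specified executor count is effectively ignored: the job still runs with as many executors as fit (2 here) instead of stalling in ACCEPTED.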

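Experiment 5:
In the fifth experiment we ask for more resources per container than allowed and see what happens.

Driver: 2 cores, 4 GB
Executor: 3 cores, 8 GB
Executor/container count: not specified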

spark-submit --master yarn --deploy-mode client --conf spark.yarn.am.memory=4g --conf spark.yarn.am.cores=2 --conf spark.executor.cores=3 --executor-memory 8g  ./src/main/pybin/ttemp.py

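Result:
The job failed on the Spark side before it reached YARN:
java.lang.IllegalArgumentException: Required executor memory (8192), overhead (819 MB), and PySpark memory (0 MB) is above the max threshold (6144 MB) of this cluster! Please check the values of 'yarn.scheduler.maximum-allocation-mb' and/or 'yarn.nodemanager.resource.memory-mb'.
This happens because we capped the largest single allocation at 3 cores and 6 GB; any request above that cap (here 8192 + 819 = 9011 MB against 6144 MB) is rejected outright.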
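The check behind that error message can be reproduced with a small calculation. The sketch below is an assumption-laden illustration, not Spark's actual code: the function name is hypothetical, the overhead rule (10% with a 384 MB floor) follows the explanation earlier in this post, and the 6144 MB cap is the value reported in the error above.

```python
def check_executor_request(executor_memory_mb, max_allocation_mb,
                           overhead_factor=0.10, min_overhead_mb=384,
                           pyspark_memory_mb=0):
    """Return (required_mb, fits) for one executor container request.

    Hypothetical helper that mirrors the three terms named in the error
    message: executor memory, overhead, and PySpark memory.
    """
    overhead_mb = max(min_overhead_mb, int(executor_memory_mb * overhead_factor))
    required_mb = executor_memory_mb + overhead_mb + pyspark_memory_mb
    return required_mb, required_mb <= max_allocation_mb


# 8 GB executors against a 6144 MB cap (the failing request above).
print(check_executor_request(8192, 6144))
# 5 GB executors against the same cap.
print(check_executor_request(5120, 6144))
```

With these assumptions the 8 GB request needs 9011 MB and does not fit, while a 5 GB request needs 5632 MB and stays under the cap.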

image.png

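Experiment 6:
In the sixth experiment we again ask for a lot of resources, but stay within the cap, and see what happens.
Driver: 2 cores, 4 GB
Executor: 3 cores, 5 GB
Executor/container count: not specified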

spark-submit --master yarn --deploy-mode client --conf spark.yarn.am.memory=4g --conf spark.yarn.am.cores=2 --conf spark.executor.cores=3 --executor-memory 5g  ./src/main/pybin/ttemp.py

Result:
Driver: 1 core, 5 GB (it is unclear why the CPU cores were reduced)
Executor: 3 cores, 6 GB
Executor/container count: 2

image.png

Experiment 7
In the seventh experiment we request the full load.
Driver: 3 cores, 5 GB
Executor: 3 cores, 5 GB
Executor/container count: not specified

spark-submit --master yarn --deploy-mode client --conf spark.yarn.am.memory=4g --conf spark.yarn.am.cores=2 --conf spark.executor.cores=3 --executor-memory 5g  ./src/main/pybin/ttemp.py

Result:
Ran successfully
Driver: 1 core, 5 GB
Executor: 3 cores, 6 GB
Executor/container count: not specified

spark-submit --master yarn --deploy-mode client --conf spark.yarn.am.memory=5g --conf spark.yarn.am.cores=3 --conf spark.executor.cores=3 --executor-memory 5g  ./src/main/pybin/ttemp.py

image.png
Conclusion 3: Under the Fair scheduler, the Driver's CPU cores are set to 1, and its memory is not increased by 10%.

Taken together, these results show that setting resource limits for every job is very important!
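One way to make sure every job carries explicit limits is to build the spark-submit command from a small helper instead of typing it by hand. The sketch below is only an illustration: the helper name and its parameters are hypothetical, and it emits nothing beyond the flags already used in the commands in this post.

```python
def build_spark_submit(script, am_memory, am_cores,
                       executor_memory, executor_cores, num_executors=None):
    """Build a client-mode spark-submit argument list with explicit limits.

    Hypothetical helper: it only assembles the flags used in the experiments
    in this post and does not launch anything.
    """
    args = [
        "spark-submit", "--master", "yarn", "--deploy-mode", "client",
        "--conf", f"spark.yarn.am.memory={am_memory}",
        "--conf", f"spark.yarn.am.cores={am_cores}",
        "--conf", f"spark.executor.cores={executor_cores}",
        "--executor-memory", executor_memory,
    ]
    # Only pin the executor count when the caller asks for one.
    if num_executors is not None:
        args += ["--num-executors", str(num_executors)]
    args.append(script)
    return args


# Rebuilds the command from Experiment 2.
print(" ".join(build_spark_submit("./src/main/pybin/ttemp.py",
                                  "2g", 1, "2g", 1, num_executors=2)))
```

The returned list can be passed directly to subprocess.run, which avoids shell-quoting issues.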
Next we test how the Fair Scheduler handles several jobs at once.
All jobs are submitted to a single queue.

image.png

Experiment 8
Job 1:
Driver: 1 core, 2 GB
Executor: 2 cores, 5 GB
Executor/container count: not specified

spark-submit --master yarn --deploy-mode client --conf spark.yarn.am.memory=2g --conf spark.executor.cores=2 --executor-memory 5g  ./src/main/pybin/ttemp.py

Job 2 requests the same resources as job 1.

Result:
Job 1 ran successfully while job 2 stayed in ACCEPTED.
Once job 1 finished, job 2 moved to RUNNING.

Experiment 9
Job 1:
Driver: 1 core, 2 GB
Executor: 1 core, 2 GB
Executor/container count: not specified

spark-submit --master yarn --deploy-mode client --conf spark.yarn.am.memory=2g --conf spark.executor.cores=1 --executor-memory 2g  ./src/main/pybin/ttemp.py

Job 2 requests the same resources as job 1.

Result:
Job 1 ran successfully and was allocated 4 executors.

image.png

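Job 2 went into ACCEPTED.
Even after job 1 finished, job 2 stayed in ACCEPTED for a long time.
It turned out that a colleague had changed a hostname. Only the logs revealed it: YARN did not recognize the host and kept retrying its requests. We fixed that and reran the experiment.

This time both jobs were RUNNING at the same time.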

image.png

Lesson learned: read the logs!

Article link: https://www.haomeiwen.com/subject/qnwbvktx.html
