Spark Error Troubleshooting

Author: 井底蛙蛙呱呱呱 | Source: published 2022-04-01 18:09

shuffle.FetchFailedException error

org.apache.spark.shuffle.FetchFailedException:
    Failed to connect to hostname/192.168.xx.xxx:50268

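Solutions:

1. Increase the number of shuffle partitions via spark.sql.shuffle.partitions (used for DataFrame/SQL shuffles).
2. Increase spark.default.parallelism, which controls the number of partitions used for shuffle reads and reduce-side processing in RDD operations. It defaults to the total number of cores running the job; the official recommendation is to set it to 2-3 times that number.
3. Increase executor memory by raising spark.executor.memory as appropriate.
4. Some other settings: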

--conf spark.blacklist.enabled=true # exclude bad nodes (renamed to spark.excludeOnFailure.enabled in Spark 3.1+)
--conf spark.reducer.maxReqsInFlight=10 # limit concurrent fetch requests from each reducer
--conf spark.shuffle.io.retryWait=10s # increase the wait between fetch retries
--conf spark.shuffle.io.maxRetries=10 # increase the number of fetch retries
--conf spark.shuffle.io.backLog=4096 # increase the TCP connection wait queue length
--conf spark.network.timeout=360s # increase the network timeout

5. Check for data skew.
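
As a rough illustration of points 1 to 4, here is a minimal Python sketch that derives a partition count from the cluster's core count (using the 2-3x rule of thumb) and assembles the matching spark-submit flags. The helper names (`suggested_parallelism`, `build_submit_args`), the `8g` memory default, the example cluster size, and the `your_job.py` script name are all assumptions for illustration, not values from any real job; tune them to your own cluster.

```python
import shlex


def suggested_parallelism(executors: int, cores_per_executor: int, factor: int = 3) -> int:
    """Return factor x total cores (rule of thumb: 2-3 tasks per core)."""
    if executors < 1 or cores_per_executor < 1 or factor < 1:
        raise ValueError("executors, cores_per_executor and factor must be >= 1")
    return executors * cores_per_executor * factor


def build_submit_args(executors: int, cores_per_executor: int,
                      executor_memory: str = "8g", factor: int = 3) -> list:
    """Assemble --conf flags for spark-submit (hypothetical helper, example values)."""
    partitions = suggested_parallelism(executors, cores_per_executor, factor)
    conf = {
        "spark.sql.shuffle.partitions": partitions,   # DataFrame/SQL shuffles
        "spark.default.parallelism": partitions,      # RDD shuffles
        "spark.executor.memory": executor_memory,     # assumed value, adjust as needed
        "spark.shuffle.io.maxRetries": 10,
        "spark.shuffle.io.retryWait": "10s",
        "spark.network.timeout": "360s",
    }
    args = []
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    return args


if __name__ == "__main__":
    # Example: 10 executors with 4 cores each (assumed cluster size).
    flags = build_submit_args(executors=10, cores_per_executor=4)
    print("spark-submit " + " ".join(shlex.quote(f) for f in flags) + " your_job.py")
```

Keeping the settings in one dictionary makes it easy to adjust a single value and rerun when the FetchFailedException persists, rather than editing a long command line by hand.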

java.lang.RuntimeException: Executor is not registered

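Occasionally, in the middle of a job, the error java.lang.RuntimeException: Executor is not registered appears. On investigation, the main cause is that when a NodeManager crashes and restarts while the job is running, the Executors it was managing are unable to re-register. Oddly, though, someone reported this as a bug in the Spark community and it was marked as fixed in version 1.6.0, which is puzzling.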

Reference solution: https://www.ibm.com/support/pages/spark-jobs-fail-due-executor-not-registered
