SparkSQL throws an error when reading a Parquet table

Author: liuzx32 | Published 2020-06-19 15:56

Cluster memory: 1024 GB (data size: 400 GB)

Error message:

Job aborted due to stage failure: Serialized task 2231:2304 was 637417604 bytes, which exceeds max allowed: spark.rpc.message.maxSize (134217728 bytes). Consider increasing spark.rpc.message.maxSize or using broadcast variables for large values.

Cause:

The serialized tasks the driver sends to the executors exceed Spark's default RPC message size limit (spark.rpc.message.maxSize, 128 MiB = 134217728 bytes). This typically happens when tasks carry a lot of partition metadata, for example many Parquet file splits, or when a large object is captured in a task closure.
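
The error message itself points at a second remedy: broadcast variables. If the oversized payload comes from a large driver-side object captured in a task closure, broadcasting it ships the data to each executor once instead of serializing it into every task. A minimal Scala sketch, with a hypothetical lookup map and input path standing in for real data:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("broadcast-demo").getOrCreate()
val sc = spark.sparkContext

// Hypothetical driver-side lookup table; imagine hundreds of MB here.
val bigLookup: Map[String, Int] = Map("a" -> 1, "b" -> 2)

// Anti-pattern: referencing bigLookup directly in the lambda serializes it
// into every task, which is exactly what inflates the serialized task size.
// rdd.map(k => bigLookup.getOrElse(k, 0))

// Broadcast it instead: one copy per executor, and the tasks stay small.
val bcLookup = sc.broadcast(bigLookup)
val rdd = sc.textFile("/path/to/keys")  // hypothetical input path
val resolved = rdd.map(k => bcLookup.value.getOrElse(k.trim, 0))
resolved.take(5).foreach(println)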

Solution:

Add the configuration spark.rpc.message.maxSize=1024. The value is in MiB, so this raises the limit from the 128 MiB default to 1 GiB (Spark accepts values up to 2047):

spark2-submit \
--class com.lhx.test \
--master yarn \
--deploy-mode cluster \
--conf spark.rpc.message.maxSize=1024 \
--driver-memory 30g \
--executor-memory 12g \
--num-executors 12 \
--executor-cores 3 \
--conf spark.yarn.driver.memoryOverhead=4096m \
--conf spark.yarn.executor.memoryOverhead=4096m \
./test.jar
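
If the job is launched from code rather than with spark2-submit, the same setting can be applied on the SparkSession builder before the session is created; since spark.rpc.message.maxSize is read when the SparkContext starts, setting it this way should take effect. A sketch, with a placeholder app name and table path:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("read-parquet")
  // Value is in MiB: raise the RPC message cap from the 128 MiB default to 1 GiB.
  .config("spark.rpc.message.maxSize", "1024")
  .getOrCreate()

// Hypothetical Parquet table path.
val df = spark.read.parquet("/path/to/parquet_table")
df.show(10)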
