Recently I ran into the following exception, seemingly out of nowhere:
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 271.0 failed 1 times, most recent failure: Lost task 0.0 in stage 271.0 (TID 544, localhost): java.io.IOException: Failed to create local dir in /tmp/blockmgr-4223dca8-7355-4ab2-98b9-87e763c7becd/1d.
at org.apache.spark.storage.DiskBlockManager.getFile(DiskBlockManager.scala:87)
at org.apache.spark.storage.DiskBlockManager.getFile(DiskBlockManager.scala:97)
at org.apache.spark.shuffle.IndexShuffleBlockResolver.getIndexFile(IndexShuffleBlockResolver.scala:58)
at org.apache.spark.shuffle.IndexShuffleBlockResolver.writeIndexFileAndCommit(IndexShuffleBlockResolver.scala:140)
at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:127)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:87)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:52)
at org.apache.spark.scheduler.Task.run(Task.scala:107)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:277)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Cause analysis:
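1 Failed to create local dir: when does Spark create temporary files?
During a shuffle, the map side writes its output to local disk through DiskBlockManager. Records are buffered in memory first and spilled to temporary files when memory runs short, and the final shuffle data and index files are also written to disk. All of these files live under a two-level directory, such as blockmgr-4223dca8-7355-4ab2-98b9-87e763c7becd/1d in the exception above. If that directory cannot be created, the task fails with this error.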
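To see where a path such as blockmgr-.../1d comes from, the sketch below reproduces, in plain Java, my reading of how DiskBlockManager.getFile places a file: the file name's non-negative hash selects one of the configured local directories and then one of 64 subdirectories named with two hex digits. The value 64 is assumed to be the default of spark.diskStore.subDirectories, in which case "1d" is subdirectory number 29. BlockDirSketch and its methods are hypothetical names, not Spark API.

```java
import java.io.File;
import java.io.IOException;

// Illustrative re-implementation (not Spark code) of the two-level layout:
// <localDir>/<2-digit hex subdir>/<file>, where localDir is a blockmgr-<uuid> dir
class BlockDirSketch {
    // Assumed default of spark.diskStore.subDirectories
    static final int SUB_DIRS_PER_LOCAL_DIR = 64;

    // Non-negative hash in the style of Spark's Utils.nonNegativeHash
    static int nonNegativeHash(String s) {
        int h = s.hashCode();
        return h != Integer.MIN_VALUE ? Math.abs(h) : 0;
    }

    // Name of the subdirectory a file would land in, e.g. "1d"
    static String subDirName(int numLocalDirs, String filename) {
        int hash = nonNegativeHash(filename);
        int subDirId = (hash / numLocalDirs) % SUB_DIRS_PER_LOCAL_DIR;
        return String.format("%02x", subDirId);
    }

    // Resolve (and lazily create) the file's directory; a failed mkdir
    // yields the same kind of message as the exception above
    static File getFile(File[] localDirs, String filename) throws IOException {
        int hash = nonNegativeHash(filename);
        File root = localDirs[hash % localDirs.length];
        File subDir = new File(root, subDirName(localDirs.length, filename));
        // mkdir (not mkdirs): only the last path component is created
        if (!subDir.exists() && !subDir.mkdir()) {
            throw new IOException("Failed to create local dir in " + subDir + ".");
        }
        return new File(subDir, filename);
    }
}
```

For example, BlockDirSketch.getFile(new File[]{new File("/tmp/blockmgr-demo")}, "shuffle_0_0_0.index") resolves to a file inside a two-hex-digit subdirectory of /tmp/blockmgr-demo; the block file name here is hypothetical. Because the sketch creates the subdirectory with mkdir rather than mkdirs (which, as far as I can tell, Spark does as well), it also fails when the blockmgr-... root itself has disappeared, for example after a periodic /tmp cleanup, and not only when the disk is full or read-only.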
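2 What exactly is a shuffle?
Spark is a parallel computing framework: one job is split into many tasks that run on multiple nodes, so the input of a single reduce task may be spread across several nodes. A shuffle gathers all of the input belonging to each reduce task into one place.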
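The idea can be shown with a toy, single-process Java model (not Spark code; ShuffleSketch is a hypothetical name): each map task splits its records into one bucket per reduce partition using a non-negative hashCode modulo the partition count, the same idea as Spark's HashPartitioner, and each reduce task then pulls its bucket from every map output. A word count done this way sends all occurrences of a word to the same reducer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy model of a shuffle: map tasks bucket records by reduce partition,
// reduce tasks fetch their bucket from every map task's output.
class ShuffleSketch {
    // Non-negative key.hashCode() mod numPartitions (HashPartitioner-style)
    static int partition(String key, int numPartitions) {
        int mod = key.hashCode() % numPartitions;
        return mod < 0 ? mod + numPartitions : mod;
    }

    // Map side: split one task's words into per-partition buckets
    static List<List<String>> mapSide(List<String> words, int numPartitions) {
        List<List<String>> buckets = new ArrayList<>();
        for (int i = 0; i < numPartitions; i++) buckets.add(new ArrayList<>());
        for (String w : words) buckets.get(partition(w, numPartitions)).add(w);
        return buckets;
    }

    // Reduce side: gather bucket `reduceId` from all map outputs and count words
    static Map<String, Integer> reduceSide(List<List<List<String>>> mapOutputs, int reduceId) {
        Map<String, Integer> counts = new TreeMap<>();
        for (List<List<String>> out : mapOutputs) {
            for (String w : out.get(reduceId)) counts.merge(w, 1, Integer::sum);
        }
        return counts;
    }
}
```

With map outputs built from ["a", "b", "a"] and ["b", "c"] and two partitions, reducer 0 receives {b=2} and reducer 1 receives {a=2, c=1}. In a real cluster the per-partition buckets are the shuffle files written to the local directories discussed in point 1, and reducers fetch them over the network.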
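3 How large is the memory store, and when does data overflow into the disk store?
The memory store size depends on spark.executor.memory; by default it is roughly spark.executor.memory × 0.6 (the exact formula differs between Spark versions). Once the data no longer fits in that space, it is written to disk instead.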
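A rough way to put numbers on this, assuming the unified memory manager of Spark 2.x: the region shared by execution (shuffle, joins, sorts) and storage (cached blocks) is (executor heap − 300 MB reserved) × spark.memory.fraction (default 0.6), and spark.memory.storageFraction (default 0.5) of that region is protected from eviction for storage. The sketch below only does this arithmetic; MemorySketch is a hypothetical name, and the real usable heap is somewhat smaller than spark.executor.memory, so treat the result as an upper-bound estimate.

```java
// Back-of-the-envelope estimate of Spark 2.x unified memory (not Spark code).
// All sizes are in MB.
class MemorySketch {
    static final double RESERVED_MB = 300.0;     // reserved system memory
    static final double MEMORY_FRACTION = 0.6;   // spark.memory.fraction default (2.x)
    static final double STORAGE_FRACTION = 0.5;  // spark.memory.storageFraction default

    // Memory shared by execution and storage
    static double unifiedMb(double executorMemoryMb) {
        return (executorMemoryMb - RESERVED_MB) * MEMORY_FRACTION;
    }

    // Part of the unified region that cached blocks keep under execution pressure
    static double protectedStorageMb(double executorMemoryMb) {
        return unifiedMb(executorMemoryMb) * STORAGE_FRACTION;
    }
}
```

With spark.executor.memory=4g (4096 MB) this gives about (4096 − 300) × 0.6 = 2277.6 MB of unified memory, of which about 1138.8 MB is protected for storage. The "× 0.6" rule of thumb roughly matches the older legacy setting spark.storage.memoryFraction (default 0.6); Spark 1.6 used 0.75 as the default for spark.memory.fraction.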
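4 Temporary files are created under /tmp by default. How can this be changed?
Set SPARK_LOCAL_DIRS in conf/spark-env.sh, or set spark.local.dir in the application (SPARK_LOCAL_DIRS takes precedence over spark.local.dir). Several paths can be given as a comma-separated list; placing them on different disks improves I/O throughput.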
SPARK_LOCAL_DIRS:
Directory to use for "scratch" space in Spark, including map output files and RDDs that get stored on disk. This should be on a fast, local disk in your system. It can also be a comma-separated list of multiple directories on different disks.
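The selection rule can be sketched as follows, in a simplified form and under stated assumptions: SPARK_LOCAL_DIRS wins over spark.local.dir, which in turn falls back to the JVM's java.io.tmpdir (usually /tmp), and the chosen value is split on commas. Cluster-manager variables such as YARN's LOCAL_DIRS, which override both, are deliberately ignored. LocalDirsSketch is a hypothetical name, and trimming blanks and dropping empty entries is this sketch's own choice.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Simplified model (not Spark code) of how scratch directories are chosen:
// SPARK_LOCAL_DIRS > spark.local.dir > java.io.tmpdir, comma-separated.
class LocalDirsSketch {
    static List<File> resolve(String sparkLocalDirsEnv, String sparkLocalDirConf) {
        String raw = sparkLocalDirsEnv != null ? sparkLocalDirsEnv
                : sparkLocalDirConf != null ? sparkLocalDirConf
                : System.getProperty("java.io.tmpdir");
        List<File> dirs = new ArrayList<>();
        for (String part : raw.split(",")) {
            String p = part.trim();          // tolerate "a, b" style lists
            if (!p.isEmpty()) dirs.add(new File(p));
        }
        return dirs;
    }
}
```

For instance, LocalDirsSketch.resolve(System.getenv("SPARK_LOCAL_DIRS"), null) with SPARK_LOCAL_DIRS=/data1/spark-tmp,/data2/spark-tmp (hypothetical paths) yields two directories. Each one then gets its own blockmgr-... directory and files are spread across them by hash, which is where the I/O benefit of multiple disks comes from.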
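5 Make sure the local directories have enough free disk space and that the user running Spark has read/write permission on them. Size the disk space according to your workload.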
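These checks can be scripted. The following is a hypothetical pre-flight helper (LocalDirCheck is not part of Spark) that one could run on each node against every configured local directory; the free-space threshold is an assumption you would choose for your workload. It also tries to create a subdirectory, mimicking what the block manager has to do, which catches cases where free bytes look fine but mkdir still fails, for example when the file system has run out of inodes.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Hypothetical pre-flight check (not part of Spark) for one local directory.
class LocalDirCheck {
    // Returns a list of problems; an empty list means the directory looks usable
    static List<String> check(File dir, long minFreeBytes) {
        List<String> problems = new ArrayList<>();
        if (!dir.isDirectory() && !dir.mkdirs()) {
            problems.add(dir + ": does not exist and cannot be created");
            return problems;
        }
        if (!dir.canWrite()) {
            problems.add(dir + ": not writable by " + System.getProperty("user.name"));
        }
        // Mimic the block manager: it must be able to create subdirectories here
        File probe = new File(dir, "probe-" + System.nanoTime());
        if (!probe.mkdir()) {
            problems.add(dir + ": cannot create subdirectories");
        } else {
            probe.delete();
        }
        long free = dir.getUsableSpace();
        if (free < minFreeBytes) {
            problems.add(dir + ": only " + free + " bytes free");
        }
        return problems;
    }
}
```

A design note: the helper reports all problems instead of failing on the first one, so a single run shows whether a node has a permission issue, a space issue, or both.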