Recently a colleague mentioned that after he submitted a Spark job, the client process kept running and holding on to memory. Since many similar jobs were being submitted, the machine used for submission eventually ran out of memory.
A quick look showed that the job's deploy mode was cluster, meaning the client process keeps running after submission until the job ends. But this particular job was a Spark Streaming job, so it never stops on its own. In the end the colleague adopted a rather crude but effective workaround: after submitting the job, check the job's running state, and once it is running, kill the client process.
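The workaround above can be sketched roughly as follows. This is a hypothetical script, not the colleague's actual one: the application name, jar, and main class are placeholders, and it assumes the job can be located in the output of `yarn application -list`:

```shell
#!/usr/bin/env bash
# Submit the streaming job in the background and remember the client PID.
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --class com.example.MyStreamingJob \
  my-streaming-job.jar &
CLIENT_PID=$!

# Poll YARN until the application shows up as RUNNING, then kill the
# client process so it stops holding memory on the submission machine.
APP_NAME="my-streaming-job"
until yarn application -list -appStates RUNNING 2>/dev/null | grep -q "$APP_NAME"; do
  sleep 5
done
kill "$CLIENT_PID"
```

It works, but it is fragile (name collisions, polling) compared to just telling the client not to wait, which is what the parameter below does.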
Today I re-read the yarn Client code and found a parameter:
spark.yarn.submit.waitAppCompletion. If it is set to true, the client will keep running and report the application's status until the application exits (for whatever reason). If it is set to false, the client process will exit right after the application is submitted.
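So the cleaner fix is to set this parameter to false at submit time. A minimal sketch (the jar name and main class are placeholders):

```shell
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --conf spark.yarn.submit.waitAppCompletion=false \
  --class com.example.MyStreamingJob \
  my-streaming-job.jar
```

With this setting the client exits on its own once the application is submitted to the ResourceManager, so there is no lingering process to kill.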
/**
 * Submit an application to the ResourceManager.
 * If set spark.yarn.submit.waitAppCompletion to true, it will stay alive
 * reporting the application's status until the application has exited for any reason.
 * Otherwise, the client process will exit after submission.
 * If the application finishes with a failed, killed, or undefined status,
 * throw an appropriate SparkException.
 */
def run(): Unit = {
  this.appId = submitApplication()
  if (!launcherBackend.isConnected() && fireAndForget) {
    val report = getApplicationReport(appId)
    val state = report.getYarnApplicationState
    logInfo(s"Application report for $appId (state: $state)")
    logInfo(formatReportDetails(report))
    if (state == YarnApplicationState.FAILED || state == YarnApplicationState.KILLED) {
      throw new SparkException(s"Application $appId finished with status: $state")
    }
  } else {
    val (yarnApplicationState, finalApplicationStatus) = monitorApplication(appId)
    if (yarnApplicationState == YarnApplicationState.FAILED ||
        finalApplicationStatus == FinalApplicationStatus.FAILED) {
      throw new SparkException(s"Application $appId finished with failed status")
    }
    if (yarnApplicationState == YarnApplicationState.KILLED ||
        finalApplicationStatus == FinalApplicationStatus.KILLED) {
      throw new SparkException(s"Application $appId is killed")
    }
    if (finalApplicationStatus == FinalApplicationStatus.UNDEFINED) {
      throw new SparkException(s"The final status of application $appId is undefined")
    }
  }
}