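Before starting a container, the ContainerLaunch class generates three files in a temporary directory: default_container_executor.sh, default_container_executor_session.sh, and launch_container.sh. The following walks through the startup of one container to show how its process is launched.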
The first script to run is:
/tmp/hadoop-hadoop/nm-local-dir/usercache/hadoop/appcache/application_1555117719646_0008/container_1555117719646_0008_01_000001/default_container_executor.sh
Contents of default_container_executor.sh:
/bin/bash "/tmp/hadoop-hadoop/nm-local-dir/usercache/hadoop/appcache/application_1555117719646_0008/container_1555117719646_0008_01_000001/default_container_executor_session.sh"
rc=$?
echo $rc > "/tmp/hadoop-hadoop/nm-local-dir/nmPrivate/application_1555117719646_0008/container_1555117719646_0008_01_000001/container_1555117719646_0008_01_000001.pid.exitcode.tmp"
/bin/mv -f "/tmp/hadoop-hadoop/nm-local-dir/nmPrivate/application_1555117719646_0008/container_1555117719646_0008_01_000001/container_1555117719646_0008_01_000001.pid.exitcode.tmp" "/tmp/hadoop-hadoop/nm-local-dir/nmPrivate/application_1555117719646_0008/container_1555117719646_0008_01_000001/container_1555117719646_0008_01_000001.pid.exitcode"
exit $rc
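The wrapper above runs the session script, captures its exit status, writes it to a .tmp file, and then renames that file. A minimal sketch of this write-then-rename pattern follows. The temporary directory, the file name demo.pid.exitcode, and the child that exits with status 3 are illustrative assumptions and not part of Hadoop.

```shell
#!/bin/bash
# Illustrative stand-in for the wrapper logic; all paths and names are hypothetical.
workdir=$(mktemp -d)
exitcode_file="$workdir/demo.pid.exitcode"

# Stand-in for default_container_executor_session.sh: a child that exits with 3.
# The generated script reads $? on the next line; this form also works under `set -e`.
rc=0
/bin/bash -c 'exit 3' || rc=$?

# Write under a temporary name first, then rename. A rename inside one directory
# is atomic, so a reader polling for "$exitcode_file" never sees a half-written file.
echo "$rc" > "$exitcode_file.tmp"
/bin/mv -f "$exitcode_file.tmp" "$exitcode_file"

recorded=$(cat "$exitcode_file")
echo "recorded exit code: $recorded"
rm -rf "$workdir"
```

The same pattern appears again in the session script, where the .pid file is written as .pid.tmp and then renamed.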
Contents of default_container_executor_session.sh:
echo $$ > /tmp/hadoop-hadoop/nm-local-dir/nmPrivate/application_1555117719646_0008/container_1555117719646_0008_01_000001/container_1555117719646_0008_01_000001.pid.tmp
/bin/mv -f /tmp/hadoop-hadoop/nm-local-dir/nmPrivate/application_1555117719646_0008/container_1555117719646_0008_01_000001/container_1555117719646_0008_01_000001.pid.tmp /tmp/hadoop-hadoop/nm-local-dir/nmPrivate/application_1555117719646_0008/container_1555117719646_0008_01_000001/container_1555117719646_0008_01_000001.pid
exec setsid /bin/bash "/tmp/hadoop-hadoop/nm-local-dir/usercache/hadoop/appcache/application_1555117719646_0008/container_1555117719646_0008_01_000001/launch_container.sh"
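default_container_executor_session.sh first writes the shell's own PID ($$) to the .pid.tmp file, then renames the file to drop the .tmp suffix, and finally invokes launch_container.sh to start the container process. Note that launch_container.sh is started with exec setsid: exec replaces the default_container_executor_session.sh process instead of forking a child, and setsid places it in a new session. The PID recorded in the .pid file therefore belongs to the leader of the new session. launch_container.sh in turn prefixes the container command with exec (see the launch_container.sh listing further below), so the container process ends up with the same PID as that session leader. The purpose of this design is that a container can be killed with kill -15 or kill -9 using that single PID: the session leader is also a process group leader, so signaling its process group (the PID passed to kill as a negative number) also terminates every child process it has spawned.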
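The claim that exec keeps the PID can be checked with a small sketch. The scripts outer.sh and inner.sh are hypothetical stand-ins for default_container_executor_session.sh and launch_container.sh; setsid is deliberately left out here so that only the effect of exec is visible.

```shell
#!/bin/bash
# Illustrative only: outer.sh and inner.sh are hypothetical stand-ins.
workdir=$(mktemp -d)

# inner.sh plays the role of launch_container.sh: it records its own PID.
cat > "$workdir/inner.sh" <<'EOF'
echo $$ > "$1/inner.pid"
EOF

# outer.sh plays the role of the session script: it records $$ and then
# replaces itself with inner.sh via exec instead of forking a child.
cat > "$workdir/outer.sh" <<'EOF'
echo $$ > "$1/outer.pid"
exec /bin/bash "$1/inner.sh" "$1"
EOF

/bin/bash "$workdir/outer.sh" "$workdir"

outer_pid=$(cat "$workdir/outer.pid")
inner_pid=$(cat "$workdir/inner.pid")
echo "outer=$outer_pid inner=$inner_pid"
rm -rf "$workdir"
```

Both files should hold the same number, because the process that ran outer.sh is the very process that goes on to run inner.sh. Removing exec from outer.sh would make inner.sh run in a forked child with a different PID.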
[hadoop@node1 testshell]$ cat /tmp/hadoop-hadoop/nm-local-dir/usercache/hadoop/appcache/application_1555117719646_0011/container_1555117719646_0011_01_000001/launch_container.sh
export HADOOP_CONF_DIR="/home/hadoop/hadoop-2.6.5/etc/hadoop"
export MAX_APP_ATTEMPTS="2"
export JAVA_HOME="/usr/local/jdk1.8.0_121"
export LEVER_APPLICATION_ID="application_1555117719646_0011"
export LEVER_APPLICATION_QUEUE="default"
export APP_SUBMIT_TIME_ENV="1555164926233"
export NM_HOST="node1"
export HADOOP_HDFS_HOME="/home/hadoop/hadoop-2.6.5"
export LOGNAME="hadoop"
export JVM_PID="$$"
export PWD="/tmp/hadoop-hadoop/nm-local-dir/usercache/hadoop/appcache/application_1555117719646_0011/container_1555117719646_0011_01_000001"
export HADOOP_COMMON_HOME="/home/hadoop/hadoop-2.6.5"
export LOCAL_DIRS="/tmp/hadoop-hadoop/nm-local-dir/usercache/hadoop/appcache/application_1555117719646_0011"
export APPLICATION_WEB_PROXY_BASE="/proxy/application_1555117719646_0011"
export NM_HTTP_PORT="8042"
export LOG_DIRS="/home/hadoop/hadoop-2.6.5/logs/userlogs/application_1555117719646_0011/container_1555117719646_0011_01_000001"
export NM_AUX_SERVICE_mapreduce_shuffle="AAA0+gAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA=
"
export NM_PORT="45301"
export USER="hadoop"
export HADOOP_YARN_HOME="/home/hadoop/hadoop-2.6.5"
export CLASSPATH="$CLASSPATH:./*:$HADOOP_CONF_DIR:$HADOOP_COMMON_HOME/share/hadoop/common/*:$HADOOP_COMMON_HOME/share/hadoop/common/lib/*:$HADOOP_HDFS_HOME/share/hadoop/hdfs/*:$HADOOP_HDFS_HOME/share/hadoop/hdfs/lib/*:$HADOOP_YARN_HOME/share/hadoop/yarn/*:$HADOOP_YARN_HOME/share/hadoop/yarn/lib/*"
export HADOOP_TOKEN_FILE_LOCATION="/tmp/hadoop-hadoop/nm-local-dir/usercache/hadoop/appcache/application_1555117719646_0011/container_1555117719646_0011_01_000001/container_tokens"
export NM_AUX_SERVICE_spark_shuffle=""
export HOME="/home/"
export CONTAINER_ID="container_1555117719646_0011_01_000001"
export MALLOC_ARENA_MAX="4"
ln -sf "/tmp/hadoop-hadoop/nm-local-dir/usercache/hadoop/appcache/application_1555117719646_0011/filecache/10/action.conf.4master" "action.conf.4master"
hadoop_shell_errorcode=$?
if [ $hadoop_shell_errorcode -ne 0 ]
then
exit $hadoop_shell_errorcode
fi
ln -sf "/tmp/hadoop-hadoop/nm-local-dir/usercache/hadoop/appcache/application_1555117719646_0011/filecache/11/lever-master-0.1.0-jar-with-dependencies.jar" "lever-master-0.1.0-jar-with-dependencies.jar"
hadoop_shell_errorcode=$?
if [ $hadoop_shell_errorcode -ne 0 ]
then
exit $hadoop_shell_errorcode
fi
exec /bin/bash -c "$JAVA_HOME/bin/java -Xmx512m com.lucky.lever.master.LeverMaster --container_memory 128 --container_vcores 1 1>/home/hadoop/hadoop-2.6.5/logs/userlogs/application_1555117719646_0011/container_1555117719646_0011_01_000001/application_master.stdout 2>/home/hadoop/hadoop-2.6.5/logs/userlogs/application_1555117719646_0011/container_1555117719646_0011_01_000001/application_master.stderr "
hadoop_shell_errorcode=$?
if [ $hadoop_shell_errorcode -ne 0 ]
then
exit $hadoop_shell_errorcode
fi
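To illustrate why starting the container under setsid makes cleanup simple, here is a Linux-only sketch, assuming setsid from util-linux is installed and /proc is mounted. The script leader.sh and its sleep child are hypothetical stand-ins for a container process and whatever it spawns. This shows the general mechanism, not the NodeManager's actual kill code.

```shell
#!/bin/bash
# Linux-only sketch: needs setsid (util-linux) and /proc. Names are hypothetical.
workdir=$(mktemp -d)

# Stand-in for a container process that spawns one child and waits for it.
cat > "$workdir/leader.sh" <<'EOF'
sleep 300 &
echo $! > "$1/child.pid.tmp"
mv -f "$1/child.pid.tmp" "$1/child.pid"
wait
EOF

# Start the stand-in in its own session, as the session script does with setsid.
setsid /bin/bash "$workdir/leader.sh" "$workdir" &
leader_pid=$!

# Wait (up to about 5 seconds) for the child's PID file to appear.
i=0
while [ ! -s "$workdir/child.pid" ] && [ "$i" -lt 50 ]; do
  sleep 0.1
  i=$((i + 1))
done
child_pid=$(cat "$workdir/child.pid")

# Fields 5 and 6 of /proc/<pid>/stat are the process group ID and session ID.
read -r _ _ _ _ child_pgid child_sid _ < "/proc/$child_pid/stat"
echo "leader=$leader_pid child_pgid=$child_pgid child_sid=$child_sid"

# A negative PID addresses the whole process group: leader and child together.
kill -15 -- "-$leader_pid"
leader_status=0
wait "$leader_pid" || leader_status=$?
echo "leader exit status: $leader_status"
rm -rf "$workdir"
```

The child's process group ID and session ID should both equal the leader's PID, which is why one signal aimed at the group is enough to stop everything the container started.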