SystemServer Startup and Restart Flow


Author: 付凯强 | Published 2020-06-16 08:42

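    Preface

    This post records how SystemServer starts and how it is restarted after a crash.

    Walkthrough

    SystemServer is forked by the Zygote process. The fork is triggered from the main method of ZygoteInit.java: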

                if (startSystemServer) {
                    Runnable r = forkSystemServer(abiList, zygoteSocketName, zygoteServer);
    
                    // {@code r == null} in the parent (zygote) process, and {@code r != null} in the
                    // child (system_server) process.
                    if (r != null) {
                        r.run();
                        return;
                    }
                }
    

    Next, let's look at the forkSystemServer method:

            /* Hardcoded command line to start the system server */
            String args[] = { // 1
                    "--setuid=1000",
                    "--setgid=1000",
                    "--setgroups=1001,1002,1003,1004,1005,1006,1007,1008,1009,1010,1018,1021,1023,"
                            + "1024,1032,1065,3001,3002,3003,3006,3007,3009,3010",
                    "--capabilities=" + capabilities + "," + capabilities,
                    "--nice-name=system_server",
                    "--runtime-args",
                    "--target-sdk-version=" + VMRuntime.SDK_VERSION_CUR_DEVELOPMENT,
                    "com.android.server.SystemServer",
            };
            ZygoteArguments parsedArgs = null;
    
            int pid;
    
            try {
                ...
                /* Request to fork the system server process */
                pid = Zygote.forkSystemServer( // 2
                        parsedArgs.mUid, parsedArgs.mGid,
                        parsedArgs.mGids,
                        parsedArgs.mRuntimeFlags,
                        null,
                        parsedArgs.mPermittedCapabilities,
                        parsedArgs.mEffectiveCapabilities);
            } catch (IllegalArgumentException ex) {
                throw new RuntimeException(ex);
            }
    
            /* For child process */
            if (pid == 0) { 
                if (hasSecondZygote(abiList)) {
                    waitForSecondaryZygote(socketName);
                }
    
                zygoteServer.closeServerSocket(); // 3
                return handleSystemServerProcess(parsedArgs); // 4
            }
    

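    At marker 1 the code sets the uid, gid, and supplementary groups of the system_server process (the ids are defined in Process.java), along with the process name "system_server". At marker 2 it calls the seven-argument Zygote.forkSystemServer to fork a process. A fork returns twice, once in each process, because the child starts as a copy of the parent. If pid is the pid of the newly forked child, the code is running in the Zygote process. If pid == 0, the code is running in the system_server process. In that case SystemServer has inherited everything Zygote had open, including the Zygote server socket, which SystemServer has no use for, so it closes the socket at marker 3.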
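
    The parent/child branching described above can be modelled in plain Java. This is only a sketch of the control flow: Java cannot fork, so the pid is passed in as a parameter, and the class and method names (ForkBranchSketch, afterFork) are hypothetical, not part of AOSP.

```java
/** Models how the caller of a fork-like call tells parent and child apart. No real process is created. */
final class ForkBranchSketch {
    /**
     * Mirrors the shape of ZygoteInit.forkSystemServer: a Runnable comes back
     * in the "child" and null comes back in the "parent".
     *
     * @param pid what the fork-like call returned: 0 in the child, the child's pid in the parent
     * @param childWork work that only the child should perform
     */
    static Runnable afterFork(int pid, Runnable childWork) {
        if (pid < 0) {
            throw new IllegalStateException("fork failed");
        }
        if (pid == 0) {
            // Child: hand the work back to the caller instead of running it here.
            return childWork;
        }
        // Parent: nothing to run, keep serving requests.
        return null;
    }

    public static void main(String[] args) {
        Runnable inParent = afterFork(4321, () -> System.out.println("never printed"));
        Runnable inChild = afterFork(0, () -> System.out.println("child work runs"));
        System.out.println("parent received: " + inParent);
        if (inChild != null) {
            inChild.run();
        }
    }
}
```

    The caller only needs a null check to know which side of the fork it is on, which is exactly what ZygoteInit.main does with r.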

    Next we follow the logic behind markers 2 and 4, starting with the source at marker 2:

        public static int forkSystemServer(int uid, int gid, int[] gids, int runtimeFlags,
                int[][] rlimits, long permittedCapabilities, long effectiveCapabilities) {
            ...
            int pid = nativeForkSystemServer(
                    uid, gid, gids, runtimeFlags, rlimits,
                    permittedCapabilities, effectiveCapabilities);
            ...
        }
    

    forkSystemServer in turn calls nativeForkSystemServer, which, as the name suggests, is a native method:

        private static native int nativeForkSystemServer(int uid, int gid, int[] gids, int runtimeFlags,
                int[][] rlimits, long permittedCapabilities, long effectiveCapabilities);
    

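    Now for the corresponding JNI function. nativeForkSystemServer is declared in Zygote.java, which lives under
    frameworks/base/core/java/com/android/internal/os/, so the native side lives under frameworks/base/core/jni/. The JNI source file is named after the package plus the class, that is, com_android_internal_os_Zygote.cpp. By the same convention each JNI function is named package + class + method, so the function for nativeForkSystemServer is com_android_internal_os_Zygote_nativeForkSystemServer. The listing below actually shows its sibling, nativeForkAndSpecialize, which is used for ordinary app processes and reaches ForkCommon the same way: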

    static jint com_android_internal_os_Zygote_nativeForkAndSpecialize(
            JNIEnv* env, jclass, jint uid, jint gid, jintArray gids,
            jint runtime_flags, jobjectArray rlimits,
            jint mount_external, jstring se_info, jstring nice_name,
            jintArray managed_fds_to_close, jintArray managed_fds_to_ignore, jboolean is_child_zygote,
            jstring instruction_set, jstring app_data_dir) {
        ...
        pid_t pid = ForkCommon(env, false, fds_to_close, fds_to_ignore);
        ...
    }
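
    The naming convention above is mechanical enough to express in a few lines. This is a minimal sketch under the assumption that the name is simply the fully qualified class name with dots turned into underscores, followed by the method name. It ignores the escaping rules and the Java_ prefix that the JNI specification defines for automatically resolved names, and JniNameSketch and aospNativeName are hypothetical names.

```java
/** Builds the AOSP-style C++ function name for a native method. Illustration only. */
final class JniNameSketch {
    /**
     * @param className fully qualified Java class name, for example "com.android.internal.os.Zygote"
     * @param methodName name of the native method declared in that class
     * @return package + class + method joined by underscores
     */
    static String aospNativeName(String className, String methodName) {
        return className.replace('.', '_') + "_" + methodName;
    }

    public static void main(String[] args) {
        System.out.println(
                aospNativeName("com.android.internal.os.Zygote", "nativeForkSystemServer"));
    }
}
```

    Knowing the rule makes it quick to grep frameworks/base/core/jni/ for the C++ side of any native method in the framework.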
    

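    Like nativeForkAndSpecialize above, nativeForkSystemServer calls ForkCommon (passing true for is_system_server). ForkCommon is implemented as follows: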

    // Utility routine to fork a process from the zygote.
    static pid_t ForkCommon(JNIEnv* env, bool is_system_server,
                            const std::vector<int>& fds_to_close,
                            const std::vector<int>& fds_to_ignore) {
      SetSignalHandlers();
      ...
      pid_t pid = fork();
      ...
    }
    

    ForkCommon calls two important functions. One is fork, which creates a new child process (here, the SystemServer process). The other is SetSignalHandlers:

    static void SetSignalHandlers() {
      struct sigaction sig_chld = {};
      sig_chld.sa_handler = SigChldHandler;
    
      if (sigaction(SIGCHLD, &sig_chld, nullptr) < 0) {
        ALOGW("Error setting SIGCHLD handler: %s", strerror(errno));
      }
    
      struct sigaction sig_hup = {};
      sig_hup.sa_handler = SIG_IGN;
      if (sigaction(SIGHUP, &sig_hup, nullptr) < 0) {
        ALOGW("Error setting SIGHUP handler: %s", strerror(errno));
      }
    }
    

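    SetSignalHandlers registers SigChldHandler as the handler for the SIGCHLD signal. SIGCHLD is a Linux signal that is sent to the parent when a child process terminates or stops. By default the signal is ignored, so a parent that wants to be told about such changes in its children must catch it. The handler usually calls one of the wait functions to obtain the child's pid and its termination status. Here is the implementation: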

    // This signal handler is for zygote mode, since the zygote must reap its children
    static void SigChldHandler(int /*signal_number*/) {
      ...
      while ((pid = waitpid(-1, &status, WNOHANG)) > 0) {
        ...
        // If the just-crashed process is the system_server, bring down zygote
        // so that it is restarted by init and system server will be restarted
        // from there.
        if (pid == gSystemServerPid) {
          async_safe_format_log(ANDROID_LOG_ERROR, LOG_TAG,
                                "Exit zygote because system server (pid %d) has terminated", pid);
          kill(getpid(), SIGKILL);
        }
    

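    As described for SIGCHLD above, SigChldHandler calls waitpid in a loop, which keeps going as long as there is another exited child to reap, to obtain each child's pid and termination status. If the pid of a terminated child is that of the SystemServer process, Zygote gets its own pid with getpid and kills itself. The two processes are meant to live and die together: once Zygote is dead, its parent, the init process, notices and restarts Zygote, and the new Zygote brings SystemServer up again.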
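
    The reaping loop can be sketched as an in-memory model. Nothing here touches real signals or processes: the queue stands in for the kernel's list of exited children, waitpidNoHang stands in for waitpid(-1, &status, WNOHANG), and every name in ReapLoopSketch is hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** In-memory model of the zygote's SIGCHLD handling. No real signals or processes are involved. */
final class ReapLoopSketch {
    /** Stand-in for the kernel's list of exited children waiting to be reaped. */
    private final Deque<Integer> exitedChildren = new ArrayDeque<>();
    private final int systemServerPid;
    private boolean zygoteAlive = true;

    ReapLoopSketch(int systemServerPid) {
        this.systemServerPid = systemServerPid;
    }

    /** Records that a child has exited, as the kernel would. */
    void childExited(int pid) {
        exitedChildren.add(pid);
    }

    /** Stand-in for waitpid with WNOHANG: the next exited pid, or 0 when none is left. */
    private int waitpidNoHang() {
        Integer pid = exitedChildren.poll();
        return pid == null ? 0 : pid;
    }

    /** Model of SigChldHandler: reap every exited child and stop if system_server is among them. */
    int onSigChld() {
        int reaped = 0;
        int pid;
        while ((pid = waitpidNoHang()) > 0) {
            reaped++;
            if (pid == systemServerPid) {
                // The real handler calls kill(getpid(), SIGKILL) at this point.
                zygoteAlive = false;
                break; // a killed process runs no further code
            }
        }
        return reaped;
    }

    boolean isZygoteAlive() {
        return zygoteAlive;
    }

    public static void main(String[] args) {
        ReapLoopSketch zygote = new ReapLoopSketch(1000);
        zygote.childExited(2001); // an ordinary app process
        zygote.childExited(1000); // system_server
        System.out.println("reaped " + zygote.onSigChld() + ", alive=" + zygote.isZygoteAlive());
    }
}
```

    The loop matters because standard signals are not queued: several children can exit before the handler runs once, so a single waitpid call could leave some of them unreaped.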

    That covers forkSystemServer at marker 2. Now for handleSystemServerProcess at marker 4, which is implemented as follows:

        /**
         * Finish remaining work for the newly forked system server process.
         */
        private static Runnable handleSystemServerProcess(ZygoteArguments parsedArgs) {
            ...
                /*
                 * Pass the remaining arguments to SystemServer.
                 */
                return ZygoteInit.zygoteInit(parsedArgs.mTargetSdkVersion,
                        parsedArgs.mRemainingArgs, cl);
        }
    

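    handleSystemServerProcess in turn calls ZygoteInit.zygoteInit. As the comments say, handleSystemServerProcess finishes the remaining work after the fork, and the zygoteInit call passes the remaining arguments (the mRemainingArgs field of ZygoteArguments) on to SystemServer. Here is the implementation of zygoteInit: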

        public static final Runnable zygoteInit(int targetSdkVersion, String[] argv,
                ClassLoader classLoader) {
            if (RuntimeInit.DEBUG) {
                Slog.d(RuntimeInit.TAG, "RuntimeInit: Starting application from zygote");
            }
    
            Trace.traceBegin(Trace.TRACE_TAG_ACTIVITY_MANAGER, "ZygoteInit");
            RuntimeInit.redirectLogStreams();
    
            RuntimeInit.commonInit();
            ZygoteInit.nativeZygoteInit(); // 5
            return RuntimeInit.applicationInit(targetSdkVersion, argv, classLoader); // 6
        }
    

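    zygoteInit takes three parameters: targetSdkVersion, the remaining arguments, and a ClassLoader (see the previous method, handleSystemServerProcess, if you are curious where it comes from). It ends by executing markers 5 and 6. Marker 5 is a native method: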

        private static final native void nativeZygoteInit();
    

    It is implemented in AndroidRuntime.cpp:

    static void com_android_internal_os_ZygoteInit_nativeZygoteInit(JNIEnv* env, jobject clazz)
    {
        gCurRuntime->onZygoteInit();
    }
    

    This calls onZygoteInit on gCurRuntime, the process's AndroidRuntime instance:

        virtual void onZygoteInit()
        {
            sp<ProcessState> proc = ProcessState::self();
            ALOGV("App process: starting thread pool.\n");
            proc->startThreadPool();
        }
    

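    This override is defined in app_main.cpp, in AppRuntime, a subclass of AndroidRuntime. proc is a ProcessState object, and startThreadPool starts the thread pool that the process uses for Binder inter-process communication. We won't go into the details here.

    The part to focus on is the logic at marker 6: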

        protected static Runnable applicationInit(int targetSdkVersion, String[] argv,
                ClassLoader classLoader) {
           ...
            // Remaining arguments are passed to the start class's static main
            return findStaticMain(args.startClass, args.startArgs, classLoader);
        }
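
    The step elided above turns argv into args.startClass and args.startArgs. A minimal sketch of that split, assuming a simplified rule: leading tokens that start with "--" are options, the first other token is the class to start, and everything after it is passed to that class's main. The real parser in RuntimeInit also interprets individual options, and StartArgsSketch is a hypothetical name.

```java
import java.util.Arrays;

/** Simplified model of splitting argv into a start class and the arguments for its main method. */
final class StartArgsSketch {
    final String startClass;
    final String[] startArgs;

    StartArgsSketch(String[] argv) {
        int cur = 0;
        // Skip leading options. A bare "--" ends the options explicitly.
        while (cur < argv.length && argv[cur].startsWith("--")) {
            boolean endOfOptions = argv[cur].equals("--");
            cur++;
            if (endOfOptions) {
                break;
            }
        }
        if (cur == argv.length) {
            throw new IllegalArgumentException("Missing class name argument");
        }
        startClass = argv[cur++];
        // Whatever is left belongs to the start class's static main.
        startArgs = Arrays.copyOfRange(argv, cur, argv.length);
    }

    public static void main(String[] args) {
        StartArgsSketch parsed =
                new StartArgsSketch(new String[] {"com.android.server.SystemServer"});
        System.out.println(parsed.startClass + " " + Arrays.toString(parsed.startArgs));
    }
}
```

    For the hardcoded command line shown earlier, the only token left by the time it reaches this step should be the class name, so SystemServer's main receives an empty argument array.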
    

    applicationInit in turn calls findStaticMain. As the comment says, the remaining arguments are handed to the static main method of the start class, which here is SystemServer.

        protected static Runnable findStaticMain(String className, String[] argv,
                ClassLoader classLoader) {
            Class<?> cl;
    
            try {
                cl = Class.forName(className, true, classLoader);
            } catch (ClassNotFoundException ex) {
                throw new RuntimeException(
                        "Missing class when invoking static main " + className,
                        ex);
            }
    
            Method m;
            try {
                m = cl.getMethod("main", new Class[] { String[].class }); // 7
            } catch (NoSuchMethodException ex) {
                throw new RuntimeException(
                        "Missing static main on " + className, ex);
            } catch (SecurityException ex) {
                throw new RuntimeException(
                        "Problem getting static main on " + className, ex);
            }
    
            int modifiers = m.getModifiers();
            if (! (Modifier.isStatic(modifiers) && Modifier.isPublic(modifiers))) { // 8
                throw new RuntimeException(
                        "Main method is not public and static on " + className);
            }
    
            /*
             * This throw gets caught in ZygoteInit.main(), which responds
             * by invoking the exception's run() method. This arrangement
             * clears up all the stack frames that were required in setting
             * up the process.
             */
            return new MethodAndArgsCaller(m, argv); // 9
        }
    

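    Marker 7 uses reflection to obtain the main method of the SystemServer class, marker 8 checks that main is public and static, and marker 9 returns a MethodAndArgsCaller, a Runnable that stores the method and its arguments and exposes a run method: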

        static class MethodAndArgsCaller implements Runnable {
            /** method to call */
            private final Method mMethod;
    
            /** argument array */
            private final String[] mArgs;
    
            public MethodAndArgsCaller(Method method, String[] args) {
                mMethod = method;
                mArgs = args;
            }
    
            public void run() {
                try {
                    mMethod.invoke(null, new Object[] { mArgs }); // 10
                } catch (IllegalAccessException ex) {
                    throw new RuntimeException(ex);
                } catch (InvocationTargetException ex) {
                    Throwable cause = ex.getCause();
                    if (cause instanceof RuntimeException) {
                        throw (RuntimeException) cause;
                    } else if (cause instanceof Error) {
                        throw (Error) cause;
                    }
                    throw new RuntimeException(ex);
                }
            }
        }
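
    The same lookup-then-defer pattern can be tried outside Android. This is a minimal sketch using only java.lang.reflect. The nested Target class is a hypothetical stand-in for com.android.server.SystemServer, and this findStaticMain is a reduced version that uses the caller's class loader instead of taking one as a parameter.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

/** Finds a public static main by reflection and wraps the call in a Runnable. */
final class StaticMainSketch {
    /** Hypothetical stand-in for com.android.server.SystemServer. */
    public static final class Target {
        public static String lastArg;

        public static void main(String[] args) {
            lastArg = args.length > 0 ? args[0] : null;
        }
    }

    static Runnable findStaticMain(String className, String[] argv) {
        final Method m;
        try {
            Class<?> cl = Class.forName(className);
            m = cl.getMethod("main", String[].class);
        } catch (ReflectiveOperationException ex) {
            throw new RuntimeException("Cannot resolve static main on " + className, ex);
        }
        int modifiers = m.getModifiers();
        if (!(Modifier.isStatic(modifiers) && Modifier.isPublic(modifiers))) {
            throw new RuntimeException("Main method is not public and static on " + className);
        }
        // Nothing is invoked yet. The caller decides when, and from which stack depth, to run it.
        return () -> {
            try {
                // The cast passes the array as one String[] argument rather than as varargs.
                m.invoke(null, (Object) argv);
            } catch (IllegalAccessException | InvocationTargetException ex) {
                throw new RuntimeException(ex);
            }
        };
    }

    public static void main(String[] args) {
        Runnable r = findStaticMain(Target.class.getName(), new String[] {"hello"});
        r.run();
        System.out.println("Target.main received: " + Target.lastArg);
    }
}
```

    The (Object) cast plays the same role as new Object[] { mArgs } in the listing above: both keep Method.invoke from spreading the String array across several parameters.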
    

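    This object is received in the main method of ZygoteInit.java, which then executes its run method. In other words, when marker 10 runs, it is really the main method of the SystemServer class that is being called. So why return an object instead of calling main directly? The comment in findStaticMain explains it (its wording still says "throw", apparently left over from when MethodAndArgsCaller was an exception): the arrangement clears the stack frames that were needed to set up the process, so main starts from a nearly empty stack even though a lot of work has already been done before it is called. Here is the main method of ZygoteInit.java again: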

                if (startSystemServer) {
                    Runnable r = forkSystemServer(abiList, zygoteSocketName, zygoteServer);
    
                    // {@code r == null} in the parent (zygote) process, and {@code r != null} in the
                    // child (system_server) process.
                    if (r != null) {
                        r.run();
                        return;
                    }
                }
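
    The effect on the stack can be observed directly. The sketch below compares the stack depth seen by an entry point when it is called from inside a chain of setup methods with the depth seen when the setup chain only returns a Runnable that is run after the chain has unwound. All names are hypothetical, and three levels of recursion stand in for the real setup calls.

```java
/** Compares stack depth for a direct call versus a deferred call through a returned Runnable. */
final class StackDepthSketch {
    private static int depthSeen;

    /** Stand-in for SystemServer.main: records how deep the stack is when it starts. */
    static void entryPoint() {
        depthSeen = new Throwable().getStackTrace().length;
    }

    /** Setup chain that calls the entry point at its deepest level. */
    private static void setupDirect(int levels) {
        if (levels == 0) {
            entryPoint();
            return;
        }
        setupDirect(levels - 1);
    }

    /** Setup chain that only returns the work and lets every setup frame unwind first. */
    private static Runnable setupDeferred(int levels) {
        if (levels == 0) {
            return StackDepthSketch::entryPoint;
        }
        return setupDeferred(levels - 1);
    }

    static int measureDirect() {
        setupDirect(3);
        return depthSeen;
    }

    static int measureDeferred() {
        Runnable r = setupDeferred(3);
        r.run();
        return depthSeen;
    }

    public static void main(String[] args) {
        System.out.println("direct depth:   " + measureDirect());
        System.out.println("deferred depth: " + measureDeferred());
    }
}
```

    The deferred depth comes out smaller because the setup frames are gone by the time the Runnable runs, which is the same reason ZygoteInit.main calls r.run() itself.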
    

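    At this point SystemServer startup is complete. The whole flow does five things: it forks the SystemServer process, closes the Zygote server socket that SystemServer inherited, starts the Binder thread pool, calls the main method of the SystemServer class, and sets up the handling that restarts SystemServer after it dies.

    Afterword

    If you enjoyed this article, a like is welcome!
    If you would like to read more articles about the framework, feel free to follow!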


        Article title: SystemServer Startup and Restart Flow

        Article link: https://www.haomeiwen.com/subject/tbatxktx.html