Android Watchdog Study Notes


Author: 浪里_个郎 | Published 2021-09-25 11:11

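    The Watchdog checks whether system services have hung. If one has, it restarts the system services by killing the system_server process that hosts them.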

    1. Starting the Watchdog

    SystemServer creates the Watchdog and starts its run() loop:

                final Watchdog watchdog = Watchdog.getInstance();
                watchdog.init(context, mActivityManagerService);
                ...
                Watchdog.getInstance().start();
    

    2. Monitoring services

    2.1 Registering a service for monitoring

    A system service first registers itself with the Watchdog:

            Watchdog.getInstance().addMonitor(this);
            Watchdog.getInstance().addThread(mHandler);
    

    What gets registered is stored inside the Watchdog:

    public void addMonitor(Monitor monitor) {
        // Add the monitor to the Monitor Checker. As the Watchdog's initialization
        // shows, the Monitor Checker is itself a HandlerChecker, and this mMonitors
        // list belongs to it.
        mMonitors.add(monitor);
    }

    // addThread(Handler) delegates here with DEFAULT_TIMEOUT
    public void addThread(Handler thread, long timeoutMillis) {
        synchronized (this) {
            if (isAlive()) {
                throw new RuntimeException("Threads can't be added once the Watchdog is running");
            }
            final String name = thread.getLooper().getThread().getName();
            // Build a HandlerChecker for the Handler; in effect this is a Looper Checker
            mHandlerCheckers.add(new HandlerChecker(thread, name, timeoutMillis));
        }
    }
    

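    Each HandlerChecker checks its thread (and, for the Monitor Checker, calls every registered Monitor); the Watchdog then decides what to do next based on the result.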
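The registration rules above can be modelled without Android. In this hypothetical sketch, MiniWatchdog and its nested Checker and Monitor types are stand-ins for Watchdog, HandlerChecker and Watchdog.Monitor, a thread is identified only by its name, and a `running` flag replaces Thread.isAlive(). It only illustrates the bookkeeping: all monitors go to one dedicated checker, while each added thread gets its own checker and cannot be added once the Watchdog has started.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical plain-Java model of the registration bookkeeping; not the AOSP classes.
class MiniWatchdog {
    // Stand-in for Watchdog.Monitor: a service exposes a cheap lock probe.
    interface Monitor {
        void monitor();
    }

    // Stand-in for HandlerChecker: one per watched thread.
    static final class Checker {
        final String threadName;
        final long timeoutMillis;
        final List<Monitor> monitors = new ArrayList<>();

        Checker(String threadName, long timeoutMillis) {
            this.threadName = threadName;
            this.timeoutMillis = timeoutMillis;
        }
    }

    private final List<Checker> checkers = new ArrayList<>();
    // Plays the Monitor Checker: an ordinary checker that also runs every monitor.
    private final Checker monitorChecker;
    private boolean running; // replaces Thread.isAlive() in this model

    MiniWatchdog(long defaultTimeoutMillis) {
        monitorChecker = new Checker("monitor checker", defaultTimeoutMillis);
        checkers.add(monitorChecker);
    }

    synchronized void addMonitor(Monitor monitor) {
        monitorChecker.monitors.add(monitor);
    }

    synchronized void addThread(String threadName, long timeoutMillis) {
        if (running) {
            throw new IllegalStateException("Threads can't be added once the Watchdog is running");
        }
        checkers.add(new Checker(threadName, timeoutMillis));
    }

    synchronized void start() {
        running = true;
    }

    synchronized int checkerCount() {
        return checkers.size();
    }

    synchronized int monitorCount() {
        return monitorChecker.monitors.size();
    }

    public static void main(String[] args) {
        MiniWatchdog watchdog = new MiniWatchdog(60_000);
        watchdog.addMonitor(() -> { });                 // a service registering its lock probe
        watchdog.addThread("service-thread", 60_000);   // a service registering its Handler thread
        watchdog.start();
        System.out.println(watchdog.checkerCount() + " checkers, " + watchdog.monitorCount() + " monitor");
        try {
            watchdog.addThread("too-late", 60_000);
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Keeping the monitors on a single shared checker means one thread runs all the lock probes, while every registered Handler thread still gets its own liveness check.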

    2.2 How the Watchdog monitors

    The Watchdog monitors continuously in its own run() loop:

    public void run() {
        while (true) {
            final List<HandlerChecker> blockedCheckers;
            final String subject;
            final boolean allowRestart;
            synchronized (this) {
                long timeout = CHECK_INTERVAL;
                // 1. Start a round of checks
                // Make sure we (re)spin the checkers that have become idle within
                // this wait-and-check interval
                for (int i = 0; i < mHandlerCheckers.size(); i++) {
                    HandlerChecker hc = mHandlerCheckers.get(i);
                    hc.scheduleCheckLocked();
                }

                // 2. Give the monitored threads time to respond (CHECK_INTERVAL, 30 s)
                long start = SystemClock.uptimeMillis();
                while (timeout > 0) {
                    ...
                    try {
                        wait(timeout);
                    } catch (InterruptedException e) {
                        Log.wtf(TAG, e);
                    }
                    ...
                    // wait() may return early, so recompute the time still left
                    timeout = CHECK_INTERVAL - (SystemClock.uptimeMillis() - start);
                }

                // 3. Check the completion state of the HandlerCheckers
                final int waitState = evaluateCheckerCompletionLocked();
                if (waitState == COMPLETED) {
                    ...
                    continue;
                } else if (waitState == WAITING) {
                    ...
                    continue;
                } else if (waitState == WAITED_HALF) {
                    ...
                    continue;
                }

                // 4. At least one HandlerChecker is overdue
                blockedCheckers = getBlockedCheckersLocked();
                subject = describeCheckersLocked(blockedCheckers);
                allowRestart = mAllowRestart;
            }
            ...
            // 5. Save logs and decide whether to kill the system process
            Slog.w(TAG, "*** GOODBYE!");
            Process.killProcess(Process.myPid());
        }
    }
    
    
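Step 2 waits in a loop instead of calling wait(timeout) once, because wait() can return before the timeout expires (a notify or a spurious wakeup). The minimal sketch below reproduces that pattern in plain Java; IntervalWaiter and poke() are hypothetical names, and System.nanoTime() stands in for SystemClock.uptimeMillis() since both are monotonic clocks.

```java
// Hypothetical sketch of the step-2 wait: a fixed interval that survives early wakeups.
class IntervalWaiter {
    private final Object lock = new Object();

    // Wakes the waiter early, like a notifyAll() on the object being waited on.
    void poke() {
        synchronized (lock) {
            lock.notifyAll();
        }
    }

    // Blocks for at least intervalMillis and returns the elapsed milliseconds.
    long waitFullInterval(long intervalMillis) {
        long start = System.nanoTime();
        long timeout = intervalMillis;
        synchronized (lock) {
            while (timeout > 0) { // timeout > 0 also avoids wait(0), which waits forever
                try {
                    lock.wait(timeout); // may return early: notify or spurious wakeup
                } catch (InterruptedException e) {
                    // The loop above only logs here (Log.wtf) and keeps waiting; so does this sketch.
                    System.err.println("interrupted: " + e);
                }
                long elapsed = (System.nanoTime() - start) / 1_000_000;
                timeout = intervalMillis - elapsed; // same recomputation as in run()
            }
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) throws InterruptedException {
        IntervalWaiter waiter = new IntervalWaiter();
        Thread poker = new Thread(() -> {
            try {
                Thread.sleep(50);
            } catch (InterruptedException ignored) {
            }
            waiter.poke(); // early wakeup after roughly 50 ms
        });
        poker.start();
        long elapsed = waiter.waitFullInterval(200);
        poker.join();
        System.out.println("waited " + elapsed + " ms"); // at least 200 despite the early poke
    }
}
```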

    2.3 What a check actually does

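    HandlerChecker.scheduleCheckLocked() records the start time and posts the HandlerChecker itself (its run()) as a task on the monitored thread's Handler. Because it uses postAtFrontOfQueue(), the check runs as soon as the thread finishes the message it is currently handling; if that thread is stuck, the check never runs and the checker never completes: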

            public void scheduleCheckLocked() {
                ...
                mStartTime = SystemClock.uptimeMillis();
                mHandler.postAtFrontOfQueue(this);
            }
    
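The effect of posting at the front of the queue can be seen with a toy message queue. MiniLooper below is a hypothetical single-threaded model, not the Android Looper or Handler API: a check posted at the front overtakes messages that are already waiting, but it still runs only after the message currently being dispatched returns.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical single-threaded stand-in for a Looper's message queue; not the Android API.
class MiniLooper {
    private final Deque<Runnable> queue = new ArrayDeque<>();

    // Appends at the tail, like an ordinary Handler.post().
    void post(Runnable r) {
        queue.addLast(r);
    }

    // Inserts at the head, like Handler.postAtFrontOfQueue().
    void postAtFrontOfQueue(Runnable r) {
        queue.addFirst(r);
    }

    // Dispatches one message at a time until the queue is empty. A message whose
    // run() never returned would block everything behind it, including the check.
    void loopUntilEmpty() {
        Runnable r;
        while ((r = queue.pollFirst()) != null) {
            r.run();
        }
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        MiniLooper looper = new MiniLooper();
        looper.post(() -> log.add("message 1"));
        looper.post(() -> log.add("message 2"));
        looper.postAtFrontOfQueue(() -> log.add("watchdog check"));
        looper.loopUntilEmpty();
        System.out.println(log); // [watchdog check, message 1, message 2]
    }
}
```

Jumping the queue keeps a busy but healthy thread from being reported just because it has a long backlog; only a thread that is actually stuck in one message delays the check.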

    When that task runs, HandlerChecker.run() calls monitor() on each monitored service:

            public void run() {
                final int size = mMonitors.size();
                for (int i = 0 ; i < size ; i++) {
                    synchronized (Watchdog.this) {
                        mCurrentMonitor = mMonitors.get(i);
                    }
                    mCurrentMonitor.monitor();
                }
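                // mCompleted is set to true here. If a deadlock or a long wait keeps execution
                // from reaching this point in time, the check treats the service as faulty.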
                synchronized (Watchdog.this) {
                    mCompleted = true;
                    mCurrentMonitor = null;
                }
            }
    

    A service's monitor() simply tries to acquire its lock:

        public void monitor() {
            synchronized (this) { }
        }
    

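    Why does acquiring a lock reveal the state of a service? A system service is called by many clients and has to cope with multiple threads, so it inevitably uses locks. If a deadlock occurs, or some call hangs while holding the lock, the Watchdog's call to monitor() cannot acquire the lock and blocks, so mCompleted is never set and the service is treated as abnormal.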
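The lock-probe idea can be tried in plain Java. In this minimal sketch, LockProbe and FakeService are hypothetical names: a CountDownLatch awaited with a timeout plays the role of mCompleted plus the Watchdog's wait, and a background thread that blocks inside `synchronized (this)` plays a call that hangs while holding the service lock.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

class LockProbe {
    // Runs monitor() on a separate checker thread and reports whether it returned
    // within timeoutMillis; the latch plays the role of mCompleted.
    static boolean probe(FakeService service, long timeoutMillis) {
        CountDownLatch completed = new CountDownLatch(1);
        Thread checker = new Thread(() -> {
            service.monitor();
            completed.countDown();
        }, "checker");
        checker.setDaemon(true); // a checker stuck on the lock must not keep the JVM alive
        checker.start();
        try {
            return completed.await(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        FakeService service = new FakeService();
        System.out.println(probe(service, 1000)); // true: nobody holds the lock
        service.startStuckCall();
        System.out.println(probe(service, 200));  // false: the stuck call holds the lock
        service.releaseStuckCall();
    }
}

// Hypothetical stand-in for a system service whose state is guarded by "this".
class FakeService {
    private final CountDownLatch release = new CountDownLatch(1);

    // Same shape as the service monitor() above: it only tries to take the lock.
    void monitor() {
        synchronized (this) { }
    }

    // Simulates a call that hangs while holding the lock; returns once the
    // background thread actually owns the lock.
    void startStuckCall() {
        CountDownLatch entered = new CountDownLatch(1);
        Thread t = new Thread(() -> {
            synchronized (this) {         // "this" is the FakeService inside the lambda
                entered.countDown();
                awaitQuietly(release);    // blocks WITHOUT giving up the monitor of "this"
            }
        }, "stuck-call");
        t.setDaemon(true);
        t.start();
        awaitQuietly(entered);
    }

    void releaseStuckCall() {
        release.countDown();
    }

    private static void awaitQuietly(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

Note that the stuck thread waits on a latch rather than calling this.wait(), because Object.wait() would release the monitor and the probe would then succeed.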

    2.4 Checking the state

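    The state is judged through the HandlerCheckers built from the Handlers that services registered (stored in the Watchdog's mHandlerCheckers). The key question is whether each check, including its monitor() calls, has finished; if not, the time elapsed since the check was posted is compared with the checker's timeout (mWaitMax).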

        private int evaluateCheckerCompletionLocked() {
            int state = COMPLETED;
            for (int i=0; i<mHandlerCheckers.size(); i++) {
                HandlerChecker hc = mHandlerCheckers.get(i);
                state = Math.max(state, hc.getCompletionStateLocked());
            }
            return state;
        }
    
            public int getCompletionStateLocked() {
                if (mCompleted) {
                    return COMPLETED;
                } else {
                    long latency = SystemClock.uptimeMillis() - mStartTime;
                    if (latency < mWaitMax/2) {
                        return WAITING;
                    } else if (latency < mWaitMax) {
                        return WAITED_HALF;
                    }
                }
                return OVERDUE;
            }
    
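The state logic above is a pure function of a checker's fields, so it can be reproduced outside Android. This hypothetical CompletionState class copies the comparisons from getCompletionStateLocked() and the Math.max reduction from evaluateCheckerCompletionLocked(); the numeric values 0 to 3 are an assumption chosen so that a worse state compares larger, which is what the Math.max reduction relies on.

```java
// Hypothetical plain-Java restatement of the state logic; not the AOSP class.
class CompletionState {
    // Assumed ordering: a worse state has a larger value, so Math.max keeps the worst.
    static final int COMPLETED = 0;
    static final int WAITING = 1;
    static final int WAITED_HALF = 2;
    static final int OVERDUE = 3;

    // Same comparisons as getCompletionStateLocked() above.
    static int stateOf(boolean completed, long startTime, long now, long waitMax) {
        if (completed) {
            return COMPLETED;
        }
        long latency = now - startTime;
        if (latency < waitMax / 2) {
            return WAITING;
        } else if (latency < waitMax) {
            return WAITED_HALF;
        }
        return OVERDUE;
    }

    // Same reduction as evaluateCheckerCompletionLocked(): the worst checker wins.
    static int evaluate(int... checkerStates) {
        int state = COMPLETED;
        for (int s : checkerStates) {
            state = Math.max(state, s);
        }
        return state;
    }

    public static void main(String[] args) {
        long waitMax = 60_000; // assumed 60 s timeout
        // One healthy checker and one whose check was posted 30 s ago and has not finished.
        int healthy = stateOf(true, 0, 30_000, waitMax);
        int stuck = stateOf(false, 0, 30_000, waitMax);
        System.out.println(evaluate(healthy, stuck)); // prints 2 (WAITED_HALF)
    }
}
```

With an assumed 60 s timeout and the 30 s interval from section 2.2, a checker that never completes would report WAITED_HALF after the first round and OVERDUE after the second, provided mStartTime is not reset while its check is still in flight.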

    2.5 Handling failures

    A detected hang is handled in one of two ways:
    1. Kill the hung process (system_server itself), which logs:

    Watchdog: *** WATCHDOG KILLING SYSTEM PROCESS: XXX
    Watchdog: XXX
    Watchdog: *** GOODBYE!
    

    2. Reboot, which logs:

    Rebooting system because:xxx
    


          Article link: https://www.haomeiwen.com/subject/xyfzgltx.html