Android WatchDog: How It Works


Author: 锄禾豆 | Published: 2020-01-08 07:57

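    Overview

    Understanding how WatchDog works makes it easier to understand how the system services inside system_server run.

    Analysis

    1. Watchdog extends Thread

    Watchdog is a thread.

    2. It is started in SystemServer.java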

    private void startOtherServices() {
        ······
        traceBeginAndSlog("InitWatchdog");
        final Watchdog watchdog = Watchdog.getInstance();
        watchdog.init(context, mActivityManagerService);
        traceEnd();
        ······
        traceBeginAndSlog("StartWatchdog");
        Watchdog.getInstance().start();
        traceEnd();
    }
    Because Watchdog is a thread, calling start() is all that is needed to get it running.
    

    3. The Watchdog constructor

    private Watchdog() {
            super("watchdog");
            // Initialize handler checkers for each common thread we want to check.  Note
            // that we are not currently checking the background thread, since it can
            // potentially hold longer running operations with no guarantees about the timeliness
            // of operations there.
    
            // The shared foreground thread is the main checker.  It is where we
            // will also dispatch monitor checks and do other work.
            mMonitorChecker = new HandlerChecker(FgThread.getHandler(),
                    "foreground thread", DEFAULT_TIMEOUT);
            mHandlerCheckers.add(mMonitorChecker);
            // Add checker for main thread.  We only do a quick check since there
            // can be UI running on the thread.
            mHandlerCheckers.add(new HandlerChecker(new Handler(Looper.getMainLooper()),
                    "main thread", DEFAULT_TIMEOUT));
            // Add checker for shared UI thread.
            mHandlerCheckers.add(new HandlerChecker(UiThread.getHandler(),
                    "ui thread", DEFAULT_TIMEOUT));
            // And also check IO thread.
            mHandlerCheckers.add(new HandlerChecker(IoThread.getHandler(),
                    "i/o thread", DEFAULT_TIMEOUT));
            // And the display thread.
            mHandlerCheckers.add(new HandlerChecker(DisplayThread.getHandler(),
                    "display thread", DEFAULT_TIMEOUT));
    
            // Initialize monitor for Binder threads.
            addMonitor(new BinderThreadMonitor());
    
            mOpenFdMonitor = OpenFdMonitor.create();
    
            // See the notes on DEFAULT_TIMEOUT.
            assert DB ||
                    DEFAULT_TIMEOUT > ZygoteConnectionConstants.WRAPPED_PID_TIMEOUT_MILLIS;
    
            // mtk enhance
            exceptionHWT = new ExceptionLog();
        }
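    1. Two fields deserve attention: mMonitorChecker and mHandlerCheckers.
    
    2. Where the entries of mHandlerCheckers come from:
    1) Added in the constructor: FgThread, the main thread, UiThread, IoThread and DisplayThread
    2) Added from outside: Watchdog.getInstance().addThread(handler);
    
    3. Where the monitors held by mMonitorChecker come from:
    Added from outside: Watchdog.getInstance().addMonitor(monitor);
    Special case: the constructor itself calls addMonitor(new BinderThreadMonitor());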
    

    4. The Watchdog run() method

    public void run() {
            boolean waitedHalf = false;
            boolean mSFHang = false;
            while (true) {
                ······
                synchronized (this) {
                    ······
                    for (int i=0; i<mHandlerCheckers.size(); i++) {
                        HandlerChecker hc = mHandlerCheckers.get(i);
                        hc.scheduleCheckLocked();
                    }
                    ······
                }
                ······
            }
    }
    On every pass of the loop, run() schedules a check on each entry of mHandlerCheckers.
    

    5. HandlerChecker.scheduleCheckLocked()

    public void scheduleCheckLocked() {
            if (mMonitors.size() == 0 && mHandler.getLooper().getQueue().isPolling()) {
                    // If the target looper has recently been polling, then
                    // there is no reason to enqueue our checker on it since that
                    // is as good as it not being deadlocked.  This avoid having
                    // to do a context switch to check the thread.  Note that we
                    // only do this if mCheckReboot is false and we have no
                    // monitors, since those would need to be executed at this point.
                    mCompleted = true;
                    return;
            }
    
            if (!mCompleted) {
                    // we already have a check in flight, so no need
                    return;
            }
            
            mCompleted = false;
            mCurrentMonitor = null;
            mStartTime = SystemClock.uptimeMillis();
            mHandler.postAtFrontOfQueue(this);
    }
    
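    1. When mMonitors.size() == 0,
    the checker only needs to know whether the thread's message queue is still moving. If mHandler.getLooper().getQueue().isPolling() returns true, the looper is idle and waiting for messages, so it cannot be stuck and the check is marked complete immediately. Otherwise the checker posts itself to the front of the queue and records mStartTime; if its run() has not set mCompleted back to true before the timeout, the thread is considered blocked.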
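    The shape of scheduleCheckLocked() can be tried outside Android with a plain-Java sketch. Everything below is hypothetical scaffolding rather than framework code: WorkerLoop stands in for a Looper thread, its polling flag for isPolling(), and addFirst for postAtFrontOfQueue. Monitors are left out.

```java
import java.util.concurrent.BlockingDeque;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingDeque;

/** Plain-Java stand-in for a Looper thread (hypothetical, not an Android class). */
class WorkerLoop extends Thread {
    final BlockingDeque<Runnable> queue = new LinkedBlockingDeque<>();
    volatile boolean polling; // true while the loop is idle, waiting for work

    WorkerLoop() { setDaemon(true); }

    @Override public void run() {
        try {
            while (true) {
                polling = queue.isEmpty();
                Runnable task = queue.takeFirst(); // blocks while idle
                polling = false;
                task.run();
            }
        } catch (InterruptedException e) {
            // interrupted: let the thread end
        }
    }
}

/** Mirrors the shape of HandlerChecker.scheduleCheckLocked(), without monitors. */
class SimpleChecker implements Runnable {
    private final WorkerLoop loop;
    private boolean completed = true;
    private long startNanos;

    SimpleChecker(WorkerLoop loop) { this.loop = loop; }

    synchronized void scheduleCheck() {
        if (loop.polling) { completed = true; return; } // an idle loop cannot be stuck
        if (!completed) return;                          // a probe is already in flight
        completed = false;
        startNanos = System.nanoTime();
        loop.queue.addFirst(this);                       // analogous to postAtFrontOfQueue
    }

    @Override public synchronized void run() { completed = true; }

    synchronized boolean isCompleted() { return completed; }

    synchronized boolean isOverdue(long timeoutMs) {
        return !completed && (System.nanoTime() - startNanos) / 1_000_000 > timeoutMs;
    }
}

class CheckerDemo {
    static void sleepQuietly(long ms) {
        try { Thread.sleep(ms); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
    }

    /** Returns {overdue while the loop is blocked, completed after it is released}. */
    static boolean[] demo() {
        WorkerLoop loop = new WorkerLoop();
        loop.start();
        CountDownLatch started = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);
        loop.queue.add(() -> {                 // a task that blocks the loop
            started.countDown();
            try { release.await(); } catch (InterruptedException e) { }
        });
        try { started.await(); } catch (InterruptedException e) { return new boolean[] {false, false}; }

        SimpleChecker checker = new SimpleChecker(loop);
        checker.scheduleCheck();               // probe is queued behind the blocked task
        sleepQuietly(50);
        boolean overdue = checker.isOverdue(20);

        release.countDown();                   // unblock; the probe now runs
        for (int i = 0; i < 200 && !checker.isCompleted(); i++) sleepQuietly(10);
        return new boolean[] {overdue, checker.isCompleted()};
    }
}
```

    Calling CheckerDemo.demo() blocks the loop, schedules a probe, and reports whether the probe was overdue while the loop was blocked and whether it completed after the loop was released.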
    
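    2. mMonitorChecker always holds at least one monitor, so it never takes the isPolling() shortcut. The point of interest is mHandler.postAtFrontOfQueue(this), which makes the following run() execute on the checked thread: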
    public void run() {
           final int size = mMonitors.size();
           for (int i = 0 ; i < size ; i++) {
                synchronized (Watchdog.this) {
                    mCurrentMonitor = mMonitors.get(i);
                }
                mCurrentMonitor.monitor();
           }
    
           synchronized (Watchdog.this) {
                mCompleted = true;
                mCurrentMonitor = null;
           }
    }
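    The technique here is to call each monitor's monitor() method.
    1) run() iterates over mMonitors, and only mMonitorChecker has a non-empty list. System services register themselves through addMonitor, for example: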
    ActivityManagerService.java
        Watchdog.getInstance().addMonitor(this); 
    
    InputManagerService.java
        Watchdog.getInstance().addMonitor(this); 
    
    PowerManagerService.java
        Watchdog.getInstance().addMonitor(this); 
    
    WindowManagerService.java
        Watchdog.getInstance().addMonitor(this); 
    The monitor() method that gets executed is very simple. For example, in ActivityManagerService:
    public void monitor() {
         synchronized (this) { }
    }
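    This only checks whether the service's lock is held: monitor() cannot return until the lock is free, so a service that holds its lock for too long, or is deadlocked, makes mMonitorChecker time out.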
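    Why an empty synchronized block works as a probe can be seen in a small plain-Java sketch. DemoService and LockProbeDemo are hypothetical names, and the helper thread plus latch are only an assumption used here to put a time limit on the probe; the real Watchdog runs monitor() on the foreground thread instead.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

/** Hypothetical service whose monitor() only probes its own lock. */
class DemoService {
    void monitor() {
        synchronized (this) { } // returns only once the lock can be acquired
    }
}

class LockProbeDemo {
    /** Runs service.monitor() on a helper thread; true if it returns within timeoutMs. */
    static boolean probe(DemoService service, long timeoutMs) {
        CountDownLatch done = new CountDownLatch(1);
        Thread t = new Thread(() -> { service.monitor(); done.countDown(); }, "probe");
        t.setDaemon(true);
        t.start();
        try {
            return done.await(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    /** Returns {probe result with the lock free, probe result with the lock held elsewhere}. */
    static boolean[] demo() {
        DemoService service = new DemoService();
        boolean whileFree = probe(service, 1000);

        CountDownLatch held = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);
        Thread holder = new Thread(() -> {
            synchronized (service) {           // simulate a service stuck holding its lock
                held.countDown();
                try { release.await(); } catch (InterruptedException e) { }
            }
        }, "holder");
        holder.setDaemon(true);
        holder.start();

        boolean whileHeld;
        try {
            held.await();                      // make sure the lock is really taken
            whileHeld = probe(service, 100);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            whileHeld = false;
        }
        release.countDown();                   // let the holder and the probe finish
        return new boolean[] {whileFree, whileHeld};
    }
}
```

    With the lock free the probe returns almost immediately; while another thread holds the lock the probe cannot return within its time limit, which is the condition Watchdog treats as a hang.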
    
    2) Special case: how is BinderThreadMonitor checked?
    It is an inner class of Watchdog:
    private static final class BinderThreadMonitor implements Watchdog.Monitor {
            @Override
            public void monitor() {
                Binder.blockUntilThreadAvailable();
            }
    }
    
    android.os.Binder.java
    public static final native void blockUntilThreadAvailable();
    
    android_util_Binder.cpp
    static void android_os_Binder_blockUntilThreadAvailable(JNIEnv* env, jobject clazz)
    {
        return IPCThreadState::self()->blockUntilThreadAvailable();
    }
    
    IPCThreadState.cpp
    void IPCThreadState::blockUntilThreadAvailable()
    {
        pthread_mutex_lock(&mProcess->mThreadCountLock);
        while (mProcess->mExecutingThreadsCount >= mProcess->mMaxThreads) {
            ALOGW("Waiting for thread to be free. mExecutingThreadsCount=%lu mMaxThreads=%lu\n",
                    static_cast<unsigned long>(mProcess->mExecutingThreadsCount),
                    static_cast<unsigned long>(mProcess->mMaxThreads));
            pthread_cond_wait(&mProcess->mThreadCountDecrement, &mProcess->mThreadCountLock);
        }
        pthread_mutex_unlock(&mProcess->mThreadCountLock);
    }
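    This simply blocks while the number of Binder threads currently executing in the process has reached mMaxThreads (31 for system_server); monitor() returns only once a thread becomes free.
    Where the value 31 comes from: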
    ProcessState.cpp
    #define DEFAULT_MAX_BINDER_THREADS 15
    However, SystemServer.java raises the limit:
    // maximum number of binder threads used for system_server
    // will be higher than the system default
    private static final int sMaxBinderThreads = 31;
    private void run() {
        ······
        BinderInternal.setMaxThreads(sMaxBinderThreads);
        ······
    }
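    The wait loop in blockUntilThreadAvailable() can be mirrored in Java with a lock and a condition. This is a sketch under assumptions: ThreadPoolGauge and its method names are invented for illustration, and the timeout parameter does not exist in the native code; it is added only so the behaviour can be observed without hanging.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical counter of busy worker threads, modelled on the native loop above. */
class ThreadPoolGauge {
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition threadFreed = lock.newCondition();
    private final int maxThreads;
    private int executing;

    ThreadPoolGauge(int maxThreads) { this.maxThreads = maxThreads; }

    /** A worker thread starts handling a transaction. */
    void onTransactionStart() {
        lock.lock();
        try { executing++; } finally { lock.unlock(); }
    }

    /** A worker thread finishes; wake up anyone waiting for a free thread. */
    void onTransactionEnd() {
        lock.lock();
        try {
            executing--;
            threadFreed.signalAll();
        } finally {
            lock.unlock();
        }
    }

    /** Waits while executing >= maxThreads; returns false if timeoutMs elapses first. */
    boolean blockUntilThreadAvailable(long timeoutMs) {
        long nanos = TimeUnit.MILLISECONDS.toNanos(timeoutMs);
        lock.lock();
        try {
            while (executing >= maxThreads) {   // same condition as the native loop
                if (nanos <= 0) return false;
                nanos = threadFreed.awaitNanos(nanos);
            }
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            lock.unlock();
        }
    }
}
```

    With a limit of 2 and two transactions in flight the call times out; after one transaction ends it returns true at once. In Watchdog terms, a monitor() that never returns is what eventually turns a saturated Binder pool into a watchdog timeout.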
    

    6. What does Watchdog do after a timeout?

    public void run() {
        ······
        Process.killProcess(Process.myPid());
        System.exit(10);
        ······
    }
    It kills its own process (system_server) and exits.
    

    7. Questions

    1) Which logs does Watchdog produce?

    (1) process stack traces

    The output path is controlled by dalvik.vm.stack-trace-file or dalvik.vm.stack-trace-dir and is normally /data/anr/.
    ActivityManagerService.dumpStackTraces(true, pids, null, null, getInterestingNativePids());
    Note: process stack traces are also dumped once half of the timeout has elapsed (the WAITED_HALF state).
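    How a half-way state such as WAITED_HALF can be derived from the elapsed time is sketched below. The class, the constant values and the 60-second figure in the usage note are assumptions for illustration only; the excerpts in this article do not show the real state computation.

```java
/** Hypothetical state classifier for a pending check (illustration only). */
class CompletionState {
    static final int COMPLETED = 0;
    static final int WAITING = 1;
    static final int WAITED_HALF = 2;
    static final int OVERDUE = 3;

    /**
     * @param completed whether the probe has already run on the checked thread
     * @param elapsedMs time since the probe was posted
     * @param timeoutMs full timeout for the checker
     */
    static int evaluate(boolean completed, long elapsedMs, long timeoutMs) {
        if (completed) return COMPLETED;
        if (elapsedMs < timeoutMs / 2) return WAITING;     // still within the first half
        if (elapsedMs < timeoutMs) return WAITED_HALF;     // time to dump stacks early
        return OVERDUE;                                    // full timeout reached
    }
}
```

    For example, with an illustrative timeout of 60000 ms, a probe pending for 30000 ms would be classified as WAITED_HALF and one pending for 60000 ms as OVERDUE.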
    

    (2)slog

    System log, written through android.util.Slog (a hidden class):
    
    Slog.e(TAG, "**SWT happen **" + subject); 
    
    Slog.v(TAG, "** save all info before killnig system server **"); 
    
    Slog.w(TAG, "*** WATCHDOG KILLING SYSTEM PROCESS: " + subject); 
    
    Slog.w(TAG, "*** GOODBYE!");
    

    (3)event log

    EventLog.writeEvent(EventLogTags.WATCHDOG, name.isEmpty() ? subject : name);
    

    (4)kernel stack traces

    The output path is controlled by dalvik.vm.stack-trace-file and is normally /data/anr/.
    if (RECORD_KERNEL_THREADS) {
       dumpKernelStackTraces();
    }
    private File dumpKernelStackTraces() {
            String tracesPath = SystemProperties.get("dalvik.vm.stack-trace-file", null);
            if (tracesPath == null || tracesPath.length() == 0) {
                return null;
            }
    
            native_dumpKernelStacks(tracesPath);
            return new File(tracesPath);
    }
    

    (5)dropbox

    Thread dropboxThread = new Thread("watchdogWriteToDropbox") {
         public void run() {
                Slog.v(TAG, "** start addErrorToDropBox **");
                mActivity.addErrorToDropBox(
                                    "watchdog", null, "system_server", null, null,
                                    name.isEmpty() ? subject : name, null, stack, null);
                }
    };
    dropboxThread.start();
    Note:
    DropBox entries are normally stored under /data/system/dropbox, as the following constructor shows:
    DropBoxManagerService.java
    public DropBoxManagerService(final Context context) {
            this(context, new File("/data/system/dropbox"), FgThread.get().getLooper());
    }
    

    2) Why monitor UiThread, IoThread, DisplayThread and FgThread?

    First, all four classes extend ServiceThread and are singletons. Take UiThread.java as an example:

    /**
     * Shared singleton thread for showing UI.  This is a foreground thread, and in
     * additional should not have operations that can take more than a few ms scheduled
     * on it to avoid UI jank.
     */
    public final class UiThread extends ServiceThread {
        private static final long SLOW_DISPATCH_THRESHOLD_MS = 100;
        private static UiThread sInstance;
        private static Handler sHandler;
    
        private UiThread() {
            super("android.ui", Process.THREAD_PRIORITY_FOREGROUND, false /*allowIo*/);
        }
    
        @Override
        public void run() {
            // Make sure UiThread is in the fg stune boost group
            Process.setThreadGroup(Process.myTid(), Process.THREAD_GROUP_TOP_APP);
            super.run();
        }
    
        private static void ensureThreadLocked() {
            if (sInstance == null) {
                sInstance = new UiThread();
                sInstance.start();
                final Looper looper = sInstance.getLooper();
                looper.setTraceTag(Trace.TRACE_TAG_ACTIVITY_MANAGER);
                looper.setSlowDispatchThresholdMs(SLOW_DISPATCH_THRESHOLD_MS);
                sHandler = new Handler(sInstance.getLooper());
            }
        }
    
        public static UiThread get() {
            synchronized (UiThread.class) {
                ensureThreadLocked();
                return sInstance;
            }
        }
    
        public static Handler getHandler() {
            synchronized (UiThread.class) {
                ensureThreadLocked();
                return sHandler;
            }
        }
    }
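    1. get() returns the singleton thread object.
    2. getHandler() returns the Handler bound to that thread.
    3. Note that ensureThreadLocked() calls start() right after creating the instance, so the thread is already running by the time the object is handed out.
    

    Second, all four classes extend ServiceThread, which in turn extends HandlerThread. The Handler on each thread is what matters, because system services such as ActivityManagerService, WMS and PMS post work to them. For example, in ActivityManagerService: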

    final class UiHandler extends Handler {
            public UiHandler() {
                super(com.android.server.UiThread.get().getLooper(), null, true);
            }
    
            @Override
            public void handleMessage(Message msg) {
                switch (msg.what) {
                case SHOW_ERROR_UI_MSG: {
                    mAppErrors.handleShowAppErrorUi(msg);
                    ensureBootCompleted();
                } break;
                ······
            }
    }
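    1. UiHandler takes its Looper directly from UiThread. A thread has exactly one Looper and one MessageQueue, but any number of Handlers can be attached to it.
    2. The handleMessage implementation shows that UI work in system_server does not necessarily run on the main thread.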
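    The "one thread, one queue, many handlers" relationship can be illustrated without Android by letting two objects post to the same single-threaded executor. MiniHandler, SharedLoopDemo and the thread name demo.ui are hypothetical; the executor merely plays the role of UiThread's Looper.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Hypothetical stand-in for a Handler: it only knows which loop to post to. */
class MiniHandler {
    private final ExecutorService loop;

    MiniHandler(ExecutorService loop) { this.loop = loop; }

    /** Runs a task on the loop's thread and returns that thread's name. */
    String postAndGetThreadName() {
        Callable<String> task = () -> Thread.currentThread().getName();
        try {
            return loop.submit(task).get();
        } catch (Exception e) {
            return "error: " + e;
        }
    }
}

class SharedLoopDemo {
    /** Returns the thread names observed by two handlers that share one loop. */
    static String[] demo() {
        // One thread with one task queue, playing the role of UiThread's Looper.
        ExecutorService loop = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "demo.ui");
            t.setDaemon(true);
            return t;
        });
        MiniHandler first = new MiniHandler(loop);
        MiniHandler second = new MiniHandler(loop);
        String[] names = { first.postAndGetThreadName(), second.postAndGetThreadName() };
        loop.shutdown();
        return names;
    }
}
```

    Both handlers report the same thread name, which is also why a single stuck message on a shared thread delays every service that posts to it, and why Watchdog watches these shared threads.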
    

    Finally, how do UiThread, IoThread, DisplayThread and FgThread differ?

    a. Different thread names:
    android.ui, android.io, android.display and android.fg respectively
    
    b. Different thread priorities:
    UiThread --> Process.THREAD_PRIORITY_FOREGROUND
    IoThread, FgThread --> android.os.Process.THREAD_PRIORITY_DEFAULT
    DisplayThread --> Process.THREAD_PRIORITY_DISPLAY + 1
    
    c. Slightly different users:
    UiThread --> ActivityManagerService
    DisplayThread --> WindowManagerService, InputManagerService, DisplayManagerService
    IoThread --> PackageInstallerService, StorageManagerService, BluetoothManagerService
    

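    8. Summary

    1. The core objects of Watchdog are mHandlerCheckers and mMonitorChecker.

    mHandlerCheckers: detects whether a thread's message queue is blocked.

    mMonitorChecker: detects whether a core system service has been holding its lock for too long.

    2. Each entry of mHandlerCheckers uses mHandler.getLooper().getQueue().isPolling() as a fast path and otherwise posts itself to the front of the queue and waits for that message to run. mMonitorChecker additionally calls monitor() on every registered service, which typically just executes synchronized (this) { }. BinderThreadMonitor is the special case: it blocks while the number of executing Binder threads has reached the configured maximum.

    3. After a timeout the system writes a series of logs, which together make effective analysis possible.

    4. After a timeout Watchdog kills its own process, so the process ID of system_server changes.

    5. Extension: could the same approach be used to detect similar stalls in our own apps?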
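    One possible answer to the last question is a heartbeat watchdog: post a counter increment to the watched thread, sleep, and report a stall if the counter did not move. The sketch below is plain Java and hypothetical throughout (AppWatchdog, AppWatchdogDemo, the thread names); in a real app the Executor would presumably be backed by a Handler on the main Looper, which is not shown here.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

/** Heartbeat-style watchdog for any Executor-backed thread (hypothetical class). */
class AppWatchdog extends Thread {
    private final Executor target;
    private final long intervalMs;
    private final Runnable onStall;
    private final AtomicInteger tick = new AtomicInteger();

    AppWatchdog(Executor target, long intervalMs, Runnable onStall) {
        super("app-watchdog");
        setDaemon(true);
        this.target = target;
        this.intervalMs = intervalMs;
        this.onStall = onStall;
    }

    @Override public void run() {
        while (true) {
            int before = tick.get();
            target.execute(tick::incrementAndGet); // heartbeat runs on the watched thread
            try {
                Thread.sleep(intervalMs);
            } catch (InterruptedException e) {
                return;                            // stopped by the owner
            }
            if (tick.get() == before) {            // heartbeat never ran: thread is stalled
                onStall.run();
                return;
            }
        }
    }
}

class AppWatchdogDemo {
    private static ExecutorService newLoop() {
        return Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "demo.main");
            t.setDaemon(true);
            return t;
        });
    }

    /** Watches a loop for up to waitMs; returns whether a stall was reported. */
    static boolean stallReported(boolean blockLoop, long intervalMs, long waitMs) {
        ExecutorService loop = newLoop();
        CountDownLatch release = new CountDownLatch(1);
        if (blockLoop) {
            loop.execute(() -> {                   // simulate a long-running task
                try { release.await(); } catch (InterruptedException e) { }
            });
        }
        AtomicBoolean stalled = new AtomicBoolean();
        AppWatchdog watchdog = new AppWatchdog(loop, intervalMs, () -> stalled.set(true));
        watchdog.start();
        try {
            watchdog.join(waitMs);                 // returns early once a stall is reported
            watchdog.interrupt();
            watchdog.join();                       // make sure it has stopped posting
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        release.countDown();
        loop.shutdown();
        return stalled.get();
    }
}
```

    Unlike the system Watchdog, this sketch only reports the stall through a callback; what to do next (dump stacks, log, or terminate) is left to the app.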

    9. References

    https://blog.csdn.net/xiaosayidao/article/details/75453195
