How Android ANRs Are Triggered

Author: CyanStone | Published: 2018-11-27 21:29

    Overview

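    ANR stands for Application Not Responding. In Android, ActivityManagerService (AMS) and WindowManagerService (WMS) monitor how long an app takes to respond. Before the app's main thread starts handling certain events, a Handler on the system side (for example, the one used by AMS or BroadcastQueue) posts a delayed message to a Looper in the system process. If the event finishes within the delay, the delayed message is removed from the MessageQueue. If it does not, the message stays in the queue, the Looper eventually dequeues and handles it, and that handling triggers the ANR.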
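
    The add-then-remove pattern can be illustrated outside Android. The following is a minimal plain-Java sketch, not framework code: AnrWatchdogSketch and runWithTimeout are hypothetical names, and a ScheduledExecutorService stands in for the system-side Handler and Looper. One simplification to keep in mind: the sketch only inspects the flag after the work returns, whereas the real timeout message is handled while the app is still busy.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical stand-in for the system-side timer; not an Android class.
final class AnrWatchdogSketch {
    private final ScheduledExecutorService timer =
            Executors.newSingleThreadScheduledExecutor();

    /** Runs the work and returns true if the timeout fired before it finished. */
    boolean runWithTimeout(Runnable work, long timeoutMillis) {
        AtomicBoolean timedOut = new AtomicBoolean(false);
        // Step 1: arm the timeout before the work starts (analogous to sendMessageDelayed).
        ScheduledFuture<?> pending =
                timer.schedule(() -> timedOut.set(true), timeoutMillis, TimeUnit.MILLISECONDS);
        work.run();
        // Step 2: the work is done, so disarm the timeout (analogous to removeMessages).
        pending.cancel(false);
        // Step 3: if the timer already fired, this is where an ANR would be reported.
        return timedOut.get();
    }

    void shutdown() {
        timer.shutdownNow();
    }

    static void sleep(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        AnrWatchdogSketch watchdog = new AnrWatchdogSketch();
        boolean fast = watchdog.runWithTimeout(() -> { }, 200);
        boolean slow = watchdog.runWithTimeout(() -> sleep(600), 200);
        System.out.println("fast timed out: " + fast + ", slow timed out: " + slow);
        watchdog.shutdown();
    }
}
```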


    Conditions That Trigger an ANR

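    • InputDispatching Timeout: a touch or key event is not handled within 5 seconds;
    • BroadcastQueue Timeout: a BroadcastReceiver's onReceive() does not finish within 10 seconds for a foreground broadcast, or within 60 seconds for a background broadcast;
    • Service Timeout: a foreground service does not finish executing within 20 seconds, or a background service within 200 seconds;
    • ContentProvider Timeout: a ContentProvider's publish step does not finish within 10 seconds.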
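
    For quick reference, the limits above can be written down as constants. This is an illustrative sketch only: the enum AnrTimeout and its method exceededBy are made up for this article and do not exist in the framework, and the values are simply those listed above for Android 8.0.

```java
// Timeout values as listed above (Android 8.0); the enum itself is hypothetical.
enum AnrTimeout {
    INPUT_DISPATCHING(5_000),
    BROADCAST_FOREGROUND(10_000),
    BROADCAST_BACKGROUND(60_000),
    SERVICE_FOREGROUND(20_000),
    SERVICE_BACKGROUND(200_000),
    CONTENT_PROVIDER_PUBLISH(10_000);

    final long millis;

    AnrTimeout(long millis) {
        this.millis = millis;
    }

    /** True if an operation that ran for elapsedMillis exceeded this limit. */
    boolean exceededBy(long elapsedMillis) {
        return elapsedMillis > millis;
    }

    public static void main(String[] args) {
        for (AnrTimeout t : values()) {
            System.out.println(t + " = " + t.millis + " ms");
        }
    }
}
```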

    Source Code Analysis (Based on Android 8.0)

    1. Service

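    ActiveServices is an object managed by AMS that handles starting, stopping, and binding Services. The full Service startup flow is out of scope here; we focus on realStartServiceLocked, the method that actually starts the Service after ContextImpl.startService() is called: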

    private final void realStartServiceLocked(ServiceRecord r, ProcessRecord app,
        boolean execInFg) throws RemoteException {
        ...
        // Posts the delayed timeout message
        bumpServiceExecutingLocked(r, execInFg, "create");
        ...
        // Creates the Service and runs its onCreate(); not analyzed further here
        app.thread.scheduleCreateService(r, r.serviceInfo, 
            mAm.compatibilityInfoForPackageLocked(r.serviceInfo.applicationInfo),
            app.repProcState);
       ...
    }
    

    Next, look at bumpServiceExecutingLocked:

    private final void bumpServiceExecutingLocked(ServiceRecord r, boolean fg, String why) {
      ...
      scheduleServiceTimeoutLocked(r.app);
      ...
    }
    
    void scheduleServiceTimeoutLocked(ProcessRecord proc) {
        if (proc.executingServices.size() == 0 || proc.thread == null) {
            return;
        }
        Message msg = mAm.mHandler.obtainMessage(
                ActivityManagerService.SERVICE_TIMEOUT_MSG);
        msg.obj = proc;
        // execServicesFg indicates whether the Service is executing in the foreground
        mAm.mHandler.sendMessageDelayed(msg,
                proc.execServicesFg ? SERVICE_TIMEOUT : SERVICE_BACKGROUND_TIMEOUT);
    }
    

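    As shown, mAm.mHandler posts a delayed message (whose what value is ActivityManagerService.SERVICE_TIMEOUT_MSG) to the MessageQueue of the thread that the AMS handler runs on. The delay depends on whether the Service is executing in the foreground: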

    // Defined in ActivityManagerService: the what value of the Service timeout message
    static final int SERVICE_TIMEOUT_MSG = 12;
    
    // ActiveServices.java
    // How long we wait for a service to finish executing.
    // Foreground Service timeout: 20 seconds
    static final int SERVICE_TIMEOUT = 20*1000;
    // How long we wait for a service to finish executing.
    // Background Service timeout: 10 times the foreground value, i.e. 200 seconds
    static final int SERVICE_BACKGROUND_TIMEOUT = SERVICE_TIMEOUT * 10;
    

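    At this point the delayed message is in the MessageQueue of the AMS handler thread and the Service is being started. If the Service finishes before the delay elapses, the queued message has to be removed. Where does that happen? Following the call chain leads to handleCreateService in ActivityThread: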

    private void handleCreateService(CreateServiceData data) {
        ...
        Service service = null;
        java.lang.ClassLoader cl = loadedApk.getClassLoader();
        // Instantiate the Service via reflection
        service = (Service) cl.loadClass(data.info.name).newInstance();
        ...
        // Run the Service's onCreate()
        service.onCreate();
        ...
        // Tell AMS that the Service has finished executing
        ActivityManager.getService().serviceDoneExecuting(data.token, SERVICE_DONE_EXECUTING_ANON, 0, 0);
       ...
    }
    

    After the Service's onCreate() returns, a Binder call to serviceDoneExecuting in AMS reports that the Service has finished starting. Here is serviceDoneExecuting in AMS:

    public void serviceDoneExecuting(IBinder token, int type, int startId, int res) {
        synchronized(this) {
            ...
            mServices.serviceDoneExecutingLocked((ServiceRecord) token, type, startId, res);
        }
    }
    

    serviceDoneExecuting in AMS delegates directly to serviceDoneExecutingLocked in ActiveServices:

     void serviceDoneExecutingLocked(ServiceRecord r, int type, int startId, int res) { 
        ...
        serviceDoneExecutingLocked(r, inDestroying, inDestroying);
       ...
    }
    
    private void serviceDoneExecutingLocked(ServiceRecord r, boolean inDestroying,
          boolean finishing) {
        ...
        // The delayed timeout message is removed here
        mAm.mHandler.removeMessages(ActivityManagerService.SERVICE_TIMEOUT_MSG, r.app);
       ...
    }
    

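    That covers how the delayed message is added and removed. What happens if the Service does not finish within the delay? Anyone familiar with Android's asynchronous message mechanism will know to look at how mAm.mHandler handles SERVICE_TIMEOUT_MSG. mAm.mHandler is an instance of MainHandler, an inner class defined in AMS: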

    //ActivityManagerService.java
    final class MainHandler extends Handler {
        public MainHandler(Looper looper) {
            super(looper, null, true);
        }
        @Override 
        public void handleMessage(Message msg) {
            switch (msg.what) {
                case SERVICE_TIMEOUT_MSG: {
                    mServices.serviceTimeout((ProcessRecord)msg.obj);
                } break;
            }
        }
    }
    

    The timeout is then handed back to the ActiveServices object:

     void serviceTimeout(ProcessRecord proc) {
        ...
        if (anrMessage != null) {
             mAm.mAppErrors.appNotResponding(proc, null, null, false, anrMessage);
        }
    }
    

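    Finally, the AppErrors object notifies the user of the ANR; the method that carries this out is not analyzed here.
    That completes the source walkthrough of the Service ANR. The flow is: 1. before the event runs, a delayed message is posted; 2. when the event finishes, the delayed message is removed; 3. if the event does not finish within the delay, the delayed message is handled and an ANR occurs.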
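
    The three-step flow, including the way removeMessages(SERVICE_TIMEOUT_MSG, r.app) matches a message by its what value and its obj, can be modeled with a toy queue driven by a simulated clock. This is a hedged sketch in plain Java: TimeoutQueueSketch and its methods are hypothetical and only imitate the shape of Handler and MessageQueue, they are not the Android implementation.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Toy model of a delayed message queue with a simulated clock; not Android code.
final class TimeoutQueueSketch {
    static final int SERVICE_TIMEOUT_MSG = 12;

    private static final class Msg {
        final int what;
        final Object obj;
        final long when;

        Msg(int what, Object obj, long when) {
            this.what = what;
            this.obj = obj;
            this.when = when;
        }
    }

    // Messages are ordered by the time at which they become due.
    private final PriorityQueue<Msg> queue =
            new PriorityQueue<>(Comparator.comparingLong((Msg m) -> m.when));
    private long now = 0;

    void sendMessageDelayed(int what, Object obj, long delayMillis) {
        queue.add(new Msg(what, obj, now + delayMillis));
    }

    // Removes pending messages whose what and obj both match (obj by identity).
    void removeMessages(int what, Object obj) {
        queue.removeIf(m -> m.what == what && m.obj == obj);
    }

    /** Advances the simulated clock and returns the objs whose timeout became due. */
    List<Object> advance(long millis) {
        now += millis;
        List<Object> fired = new ArrayList<>();
        while (!queue.isEmpty() && queue.peek().when <= now) {
            fired.add(queue.poll().obj);
        }
        return fired;
    }

    public static void main(String[] args) {
        TimeoutQueueSketch q = new TimeoutQueueSketch();
        String procA = "procA";
        String procB = "procB";
        q.sendMessageDelayed(SERVICE_TIMEOUT_MSG, procA, 20_000); // step 1 for procA
        q.sendMessageDelayed(SERVICE_TIMEOUT_MSG, procB, 20_000); // step 1 for procB
        q.advance(5_000);
        q.removeMessages(SERVICE_TIMEOUT_MSG, procA);             // step 2: procA finished
        System.out.println("timed out: " + q.advance(15_000));    // step 3: procB never did
    }
}
```

    Keying the removal on the process object is what lets one process finish in time without cancelling the pending timeout of another.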

    2. BroadcastReceiver

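    The full registration and delivery flow of broadcasts is not covered here. What matters is that receivers are registered with AMS, and when AMS receives a broadcast, the method that ultimately processes it is processNextBroadcast in BroadcastQueue.java: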

    final void processNextBroadcast(boolean fromMsg) {
       ...
            do {
                r = mOrderedBroadcasts.get(0);
                // Number of receivers for this broadcast
                int numReceivers = (r.receivers != null) ? r.receivers.size() : 0;
                if (mService.mProcessesReady && r.dispatchTime > 0) {
                    long now = SystemClock.uptimeMillis();
                    if ((numReceivers > 0) &&
                            (now > r.dispatchTime + (2*mTimeoutPeriod*numReceivers))) {
                        // The broadcast has exceeded its overall time limit, so force-finish it
                        broadcastTimeoutLocked(false);
                        ...
                    }
                }
                if (r.receivers == null || r.nextReceiver >= numReceivers
                        || r.resultAbort || forceReceive) {
                    if (r.resultTo != null) {
                        // Deliver the broadcast result to resultTo
                        performReceiveLocked(r.callerApp, r.resultTo,
                            new Intent(r.intent), r.resultCode,
                            r.resultData, r.resultExtras, false, false, r.userId);
                        r.resultTo = null;
                    }
                    // Finished: cancel the pending timeout
                    cancelBroadcastTimeoutLocked();
                    ...
                    mOrderedBroadcasts.remove(0);
                   ...
                }
            } while (r == null);
            ...
    
            // Move on to the next ordered broadcast and record when its receiver started
            r.receiverTime = SystemClock.uptimeMillis();
            if (!mPendingBroadcastTimeoutMessage) {
                long timeoutTime = r.receiverTime + mTimeoutPeriod;
                // Post the delayed message; it is due mTimeoutPeriod after receiverTime
                setBroadcastTimeoutLocked(timeoutTime);
            }
            ...
    }
    

    The code above calls setBroadcastTimeoutLocked to post the delayed message, and removes it once all registered receivers have finished. Here are setBroadcastTimeoutLocked and cancelBroadcastTimeoutLocked:

    final void setBroadcastTimeoutLocked(long timeoutTime) {
        if (!mPendingBroadcastTimeoutMessage) {
            Message msg = mHandler.obtainMessage(BROADCAST_TIMEOUT_MSG, this);
            mHandler.sendMessageAtTime(msg, timeoutTime);
            mPendingBroadcastTimeoutMessage = true;
        }
    }
    
    final void cancelBroadcastTimeoutLocked() {
        if (mPendingBroadcastTimeoutMessage) {
            mHandler.removeMessages(BROADCAST_TIMEOUT_MSG, this);
            mPendingBroadcastTimeoutMessage = false;
        }
    }
    

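    These two methods simply add and remove the message. timeoutTime is r.receiverTime + mTimeoutPeriod, where receiverTime is the value of SystemClock.uptimeMillis() at that moment and mTimeoutPeriod is passed in when the BroadcastQueue is constructed. The BroadcastQueue instances are created in AMS: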

    // ActivityManagerService.java
    // Foreground broadcast timeout
    static final int BROADCAST_FG_TIMEOUT = 10*1000;
    // Background broadcast timeout
    static final int BROADCAST_BG_TIMEOUT = 60*1000;
    
    // Foreground broadcast queue
    BroadcastQueue mFgBroadcastQueue;
    // Background broadcast queue
    BroadcastQueue mBgBroadcastQueue;
    
    public ActivityManagerService(Context systemContext) {
        mFgBroadcastQueue = new BroadcastQueue(this, mHandler,
                    "foreground", BROADCAST_FG_TIMEOUT, false);
        mBgBroadcastQueue = new BroadcastQueue(this, mHandler,
                    "background", BROADCAST_BG_TIMEOUT, true);
    }
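
    As a worked example of the two limits seen in processNextBroadcast: each receiver gets mTimeoutPeriod counted from its own receiverTime, while the broadcast as a whole is force-finished once the current uptime exceeds dispatchTime + 2*mTimeoutPeriod*numReceivers. The sketch below only reproduces that arithmetic; the class BroadcastDeadlineSketch and its method names are hypothetical.

```java
// Worked arithmetic for the two broadcast limits; names are illustrative only.
final class BroadcastDeadlineSketch {
    static final long BROADCAST_FG_TIMEOUT = 10 * 1000;
    static final long BROADCAST_BG_TIMEOUT = 60 * 1000;

    /** Absolute uptime at which the current receiver's timeout message is due. */
    static long receiverDeadline(long receiverTime, long timeoutPeriod) {
        return receiverTime + timeoutPeriod;
    }

    /** Absolute uptime after which the whole ordered broadcast is force-finished. */
    static long broadcastDeadline(long dispatchTime, long timeoutPeriod, int numReceivers) {
        return dispatchTime + 2 * timeoutPeriod * numReceivers;
    }

    public static void main(String[] args) {
        long start = 1_000; // pretend uptime in milliseconds
        System.out.println("receiver deadline: "
                + receiverDeadline(start, BROADCAST_FG_TIMEOUT));
        System.out.println("broadcast deadline: "
                + broadcastDeadline(start, BROADCAST_FG_TIMEOUT, 3));
    }
}
```

    For a foreground broadcast with 3 receivers dispatched at uptime 1000 ms, the first receiver's timeout is due at 11000 ms, and the broadcast as a whole is abandoned once the uptime passes 61000 ms.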
    

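    So AMS maintains a foreground broadcast queue and a background broadcast queue, with timeouts of 10 seconds and 60 seconds respectively. Now for the handling of the timeout message. The mHandler that sends it is an instance of BroadcastHandler, an inner class of BroadcastQueue: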

     final BroadcastHandler mHandler;
     private final class BroadcastHandler extends Handler {
        public BroadcastHandler(Looper looper) {
            super(looper, null, true);
        }
        @Override
        public void handleMessage(Message msg) {
            switch (msg.what) {
                case BROADCAST_INTENT_MSG: {
                    if (DEBUG_BROADCAST) Slog.v(
                            TAG_BROADCAST, "Received BROADCAST_INTENT_MSG");
                    processNextBroadcast(true);
                } break;
                case BROADCAST_TIMEOUT_MSG: {
                    synchronized (mService) {
                        broadcastTimeoutLocked(true);
                    }
                } break;
            }
        }
    }
    

    On timeout, broadcastTimeoutLocked runs and triggers the ANR:

    final void broadcastTimeoutLocked(boolean fromMsg) {
        ...
           if (anrMessage != null) {
           // Post the ANR to the handler since we do not want to process ANRs while
           // potentially holding our lock.
            mHandler.post(new AppNotResponding(app, anrMessage));
        }
    }
    

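    That completes the source walkthrough of how a broadcast triggers an ANR. The flow is the same as for Service: 1. before the event runs, a delayed message is posted; 2. when the event finishes, the delayed message is removed; 3. if the event does not finish within the delay, the delayed message is handled and an ANR occurs.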

    3. ContentProvider

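    The ContentProvider timeout is triggered when AMS.MainHandler, running on the ActivityManager thread, receives a CONTENT_PROVIDER_PUBLISH_TIMEOUT_MSG message. The logic mirrors that of Service and BroadcastReceiver, so the source is not analyzed here; interested readers can look it up themselves.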

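    Similarly, when AMS starts an Activity, the start and pause of the Activities involved are guarded by comparable timeouts. The pause timeout is 500 milliseconds, so avoid time-consuming work in onPause and defer it to onStop instead, since the timeouts for onStop and onDestroy are both 10 seconds: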

        // How long we wait until giving up on the last activity to pause.  This
        // is short because it directly impacts the responsiveness of starting the
        // next activity.
        private static final int PAUSE_TIMEOUT = 500;
    
        // How long we wait for the activity to tell us it has stopped before
        // giving up.  This is a good amount of time because we really need this
        // from the application in order to get its saved state.
        private static final int STOP_TIMEOUT = 10 * 1000;
    
        // How long we wait until giving up on an activity telling us it has
        // finished destroying itself.
        private static final int DESTROY_TIMEOUT = 10 * 1000;
    

    How to Avoid ANRs

    At its core, the ANR mechanism monitors whether the main thread is blocked. To avoid ANRs, keep the following in mind:

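    • Avoid time-consuming operations on the main thread.
    • If a Service, BroadcastReceiver, or ContentProvider needs to do time-consuming work, run it asynchronously using a suitable threading mechanism.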
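
    The idea of moving blocking work off the calling thread can be sketched in plain Java. This is an illustration under assumptions, not Android code: OffloadSketch and slowComputation are hypothetical, the 300 ms sleep stands in for disk or network I/O, and in a real app the calling thread would be the main thread and the result would be posted back to it rather than joined on.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Plain-Java illustration of offloading blocking work to a worker thread.
final class OffloadSketch {
    /** Simulates a slow operation such as disk or network I/O. */
    static int slowComputation() {
        try {
            Thread.sleep(300);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return 42;
    }

    public static void main(String[] args) {
        ExecutorService worker = Executors.newSingleThreadExecutor();
        long start = System.nanoTime();
        // Submitting returns immediately; the sleep happens on the worker thread.
        CompletableFuture<Integer> result =
                CompletableFuture.supplyAsync(OffloadSketch::slowComputation, worker);
        long submitMillis = (System.nanoTime() - start) / 1_000_000;
        System.out.println("caller was blocked for about " + submitMillis + " ms");
        // join() is used here only so the demo can print the value before exiting.
        System.out.println("result: " + result.join());
        worker.shutdown();
    }
}
```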
