Android ANR Analysis

Author: 嘎嘣脆糖 | Published: 2020-01-30 23:46


    ANR Analysis

    ANR Trigger Scenarios

    In Android, certain operations must complete within a fixed time limit; exceeding that limit triggers an ANR (Application Not Responding).

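    Component           Timeout
    Service             Foreground service: 20s; background service: 200s
    ContentProvider     Publish must complete within 10s
    Broadcast           Foreground broadcast: 10s; background broadcast: 60s
    Input dispatching   Input event dispatch (key and touch events): 5s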
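The limits in the table can be written down as a small lookup. This is only a sketch: the class and enum names are hypothetical and are not part of the Android framework; only the durations come from the table above.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical lookup of the timeout values listed in the table above. */
final class AnrTimeoutTable {
    enum Component { SERVICE, CONTENT_PROVIDER, BROADCAST, INPUT_DISPATCHING }

    /** Returns the limit in milliseconds; 'foreground' only matters for services and broadcasts. */
    static long timeoutMillis(Component c, boolean foreground) {
        switch (c) {
            case SERVICE:
                return TimeUnit.SECONDS.toMillis(foreground ? 20 : 200);
            case BROADCAST:
                return TimeUnit.SECONDS.toMillis(foreground ? 10 : 60);
            case CONTENT_PROVIDER:
                return TimeUnit.SECONDS.toMillis(10);
            case INPUT_DISPATCHING:
                return TimeUnit.SECONDS.toMillis(5);
            default:
                throw new IllegalArgumentException("unknown component: " + c);
        }
    }

    public static void main(String[] args) {
        for (Component c : Component.values()) {
            System.out.println(c + " fg=" + timeoutMillis(c, true)
                    + "ms bg=" + timeoutMillis(c, false) + "ms");
        }
    }
}
```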
    Service

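    A service timeout is triggered when AMS.MainHandler, running on the ActivityManager thread, receives SERVICE_TIMEOUT_MSG. The timeout durations are listed in the table above; foreground and background are distinguished by the field ProcessRecord.execServicesFg.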

    Registering the ANR timeout

    During service startup, while the service is being attached in the system_server process, realStartServiceLocked() is called, and this method registers the ANR timeout.

    • ActiveServices.java realStartServiceLocked
    private final void realStartServiceLocked(ServiceRecord r, ProcessRecord app, boolean execInFg) throws RemoteException {
        ...
        // Post the delayed message (SERVICE_TIMEOUT_MSG)
        bumpServiceExecutingLocked(r, execInFg, "create");
        try {
            ...
            // Eventually runs the service's onCreate() method
            app.thread.scheduleCreateService(r, r.serviceInfo,
                    mAm.compatibilityInfoForPackageLocked(r.serviceInfo.applicationInfo),
                    app.repProcState);
        } catch (DeadObjectException e) {
            mAm.appDiedLocked(app);
            throw e;
        } finally {
            ...
        }
    }
    
    private final void bumpServiceExecutingLocked(ServiceRecord r, boolean fg, String why) {
        ... 
        scheduleServiceTimeoutLocked(r.app);
    }
    
    
    void scheduleServiceTimeoutLocked(ProcessRecord proc) {
        if (proc.executingServices.size() == 0 || proc.thread == null) {
            return;
        }
        long now = SystemClock.uptimeMillis();
        Message msg = mAm.mHandler.obtainMessage(
                ActivityManagerService.SERVICE_TIMEOUT_MSG);
        msg.obj = proc;
        
        // If SERVICE_TIMEOUT_MSG has not been removed by the deadline, the service timeout flow runs
        mAm.mHandler.sendMessageAtTime(msg,
            proc.execServicesFg ? (now+SERVICE_TIMEOUT) : (now+ SERVICE_BACKGROUND_TIMEOUT));
    }
    

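    As shown, the ANR is registered by posting a delayed message. If that message is still pending when the limit expires (20s for foreground, 200s for background), an ANR is triggered.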

    Cancelling the timeout

    A thread in the target process is responsible for cancelling the SERVICE_TIMEOUT_MSG described above.

    • ActivityThread.java
    private void handleCreateService(CreateServiceData data) {
            ...
            java.lang.ClassLoader cl = packageInfo.getClassLoader();
            Service service = (Service) cl.loadClass(data.info.name).newInstance();
            ...
    
            try {
                // Create the ContextImpl object
                ContextImpl context = ContextImpl.createAppContext(this, packageInfo);
                context.setOuterContext(service);
                // Create the Application object
                Application app = packageInfo.makeApplication(false, mInstrumentation);
                service.attach(context, this, data.info.name, data.token, app,
                        ActivityManagerNative.getDefault());
                // Call the service's onCreate() method
                service.onCreate();
                
                // Cancel SERVICE_TIMEOUT_MSG
                ActivityManagerNative.getDefault().serviceDoneExecuting(
                        data.token, SERVICE_DONE_EXECUTING_ANON, 0, 0);
            } catch (Exception e) {
                ...
            }
        }
    

    This step creates the service object and calls its onCreate() method. After several further calls, control returns to system_server, which runs serviceDoneExecuting.

    • AS.serviceDoneExecutingLocked
    private void serviceDoneExecutingLocked(ServiceRecord r, boolean inDestroying, boolean finishing) {
        ...
        if (r.executeNesting <= 0) {
            if (r.app != null) {
                r.app.execServicesFg = false;
                r.app.executingServices.remove(r);
                if (r.app.executingServices.size() == 0) {
                    // The service's process has no other service currently executing
                    mAm.mHandler.removeMessages(ActivityManagerService.SERVICE_TIMEOUT_MSG, r.app);
            ...
        }
        ...
    }
    

    Once the service has finished starting, the delayed SERVICE_TIMEOUT_MSG is removed.
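The register-then-cancel pattern described above can be modeled without Android. The sketch below is a toy queue driven by a virtual clock; it is not Android's Handler, and every class and method name in it is hypothetical. It only illustrates that a timeout fires for exactly those targets whose message was not removed before the deadline.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Toy stand-in for a Handler message queue driven by a virtual clock (not Android's Handler). */
final class TimeoutQueueSketch {
    private static final class Pending {
        final String what;
        final Object obj;
        final long when;
        Pending(String what, Object obj, long when) {
            this.what = what;
            this.obj = obj;
            this.when = when;
        }
    }

    private final List<Pending> pending = new ArrayList<>();

    /** Analogous to sendMessageAtTime: arm a timeout that fires at 'when'. */
    void sendAtTime(String what, Object obj, long when) {
        pending.add(new Pending(what, obj, when));
    }

    /** Analogous to removeMessages(what, obj): disarm matching timeouts. */
    void remove(String what, Object obj) {
        pending.removeIf(p -> p.what.equals(what) && p.obj.equals(obj));
    }

    /** Advances the virtual clock and returns the targets whose timeout fired. */
    List<Object> advanceTo(long now) {
        List<Object> fired = new ArrayList<>();
        for (Iterator<Pending> it = pending.iterator(); it.hasNext(); ) {
            Pending p = it.next();
            if (p.when <= now) {
                fired.add(p.obj);
                it.remove();
            }
        }
        return fired;
    }

    public static void main(String[] args) {
        TimeoutQueueSketch q = new TimeoutQueueSketch();
        q.sendAtTime("SERVICE_TIMEOUT_MSG", "fastProc", 20_000);
        q.sendAtTime("SERVICE_TIMEOUT_MSG", "slowProc", 20_000);
        q.remove("SERVICE_TIMEOUT_MSG", "fastProc"); // this service finished in time
        System.out.println(q.advanceTo(20_000));     // prints [slowProc]
    }
}
```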

    Triggering the ANR

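    If the delayed message is not removed within the time limit, an ANR is triggered.
    The system_server process has a Handler thread named "ActivityManager"; when the countdown ends, SERVICE_TIMEOUT_MSG is delivered to that Handler thread.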

    • ActivityManagerService.java ::MainHandler
    final class MainHandler extends Handler {
        public void handleMessage(Message msg) {
            switch (msg.what) {
                case SERVICE_TIMEOUT_MSG: {
                    ...
                    mServices.serviceTimeout((ProcessRecord)msg.obj);
                } break;
                ...
            }
            ...
        }
    }
    
    void serviceTimeout(ProcessRecord proc) {
        String anrMessage = null;
    
        synchronized(mAm) {
            if (proc.executingServices.size() == 0 || proc.thread == null) {
                return;
            }
            final long now = SystemClock.uptimeMillis();
            final long maxTime =  now -
                    (proc.execServicesFg ? SERVICE_TIMEOUT : SERVICE_BACKGROUND_TIMEOUT);
            ServiceRecord timeout = null;
            long nextTime = 0;
            for (int i=proc.executingServices.size()-1; i>=0; i--) {
                ServiceRecord sr = proc.executingServices.valueAt(i);
                if (sr.executingStart < maxTime) {
                    timeout = sr;
                    break;
                }
                if (sr.executingStart > nextTime) {
                    nextTime = sr.executingStart;
                }
            }
            if (timeout != null && mAm.mLruProcesses.contains(proc)) {
                Slog.w(TAG, "Timeout executing service: " + timeout);
                StringWriter sw = new StringWriter();
                PrintWriter pw = new FastPrintWriter(sw, false, 1024);
                pw.println(timeout);
                timeout.dump(pw, " ");
                pw.close();
                mLastAnrDump = sw.toString();
                mAm.mHandler.removeCallbacks(mLastAnrDumpClearer);
                mAm.mHandler.postDelayed(mLastAnrDumpClearer, LAST_ANR_LIFETIME_DURATION_MSECS);
                anrMessage = "executing service " + timeout.shortName;
            }
        }
    
        if (anrMessage != null) {
            // A timed-out service exists, so run appNotResponding
            mAm.appNotResponding(proc, null, null, false, anrMessage);
        }
    }
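The core of serviceTimeout() is the scan for a service whose execution started before `now - timeout`. A minimal sketch of that scan follows, using a plain array of start times instead of ServiceRecord objects; the class and method names are hypothetical.

```java
/** Simplified version of the scan in serviceTimeout(): plain arrays instead of ServiceRecord. */
final class ServiceTimeoutScanSketch {
    /**
     * Returns the index of a service that started executing before (now - timeout),
     * scanning from the end like the loop above, or -1 if none has timed out.
     */
    static int findTimedOut(long[] executingStart, long now, long timeoutMillis) {
        long maxTime = now - timeoutMillis;
        for (int i = executingStart.length - 1; i >= 0; i--) {
            if (executingStart[i] < maxTime) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        long[] starts = {1_000, 15_000, 30_000};
        // maxTime = 40000 - 20000 = 20000; scanning from the end, index 1 is the first hit.
        System.out.println(findTimedOut(starts, 40_000, 20_000)); // prints 1
    }
}
```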
    
    BroadcastReceiver

    An ANR is triggered when BroadcastQueue.BroadcastHandler, running on the ActivityManager thread, receives BROADCAST_TIMEOUT_MSG.

    Handling the ANR message during broadcast dispatch

    During broadcast dispatch, processNextBroadcast is called to handle broadcasts, in this order:

    • Parallel broadcasts
    • The current ordered broadcast
    • The next ordered broadcast
    final void processNextBroadcast(boolean fromMsg) {
        synchronized(mService) {
            ...
            // part 2: handle the current ordered broadcast
            do {
                r = mOrderedBroadcasts.get(0);
                // Get all receivers of this broadcast
                int numReceivers = (r.receivers != null) ? r.receivers.size() : 0;
                if (mService.mProcessesReady && r.dispatchTime > 0) {
                    long now = SystemClock.uptimeMillis();
                    if ((numReceivers > 0) &&
                            (now > r.dispatchTime + (2*mTimeoutPeriod*numReceivers))) {
                        // The broadcast has taken too long, so force it to finish
                        broadcastTimeoutLocked(false);
                        ...
                    }
                }
                if (r.receivers == null || r.nextReceiver >= numReceivers
                        || r.resultAbort || forceReceive) {
                    if (r.resultTo != null) {
                        // Deliver the broadcast result
                        performReceiveLocked(r.callerApp, r.resultTo,
                            new Intent(r.intent), r.resultCode,
                            r.resultData, r.resultExtras, false, false, r.userId);
                        r.resultTo = null;
                    }
                    // Cancel the broadcast timeout ANR
                    cancelBroadcastTimeoutLocked();
                }
            } while (r == null);
            ...
    
            // part 3: fetch the next ordered broadcast
            r.receiverTime = SystemClock.uptimeMillis();
            if (!mPendingBroadcastTimeoutMessage) {
                long timeoutTime = r.receiverTime + mTimeoutPeriod;
                // Arm the broadcast timeout ANR
                setBroadcastTimeoutLocked(timeoutTime);
            }
            ...
        }
    }
    

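    When the broadcast timeout is handled

    • First, in part 3, setBroadcastTimeoutLocked(timeoutTime) arms the broadcast timeout message.

    • Then, in part 2, it is handled according to how the broadcast progresses:

      • If a receiver has been waiting too long, broadcastTimeoutLocked(false) is called.

      • Once the broadcast has finished executing, cancelBroadcastTimeoutLocked is called.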
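The two time checks in processNextBroadcast can be restated as plain arithmetic. The sketch below is not the framework code; the class and method names are hypothetical, and the 10s value is the foreground broadcast limit quoted earlier in this article.

```java
/** Plain-Java restatement of the two broadcast timeout checks; not the framework code itself. */
final class BroadcastTimeoutSketch {
    /** Whole-broadcast check (part 2): now > dispatchTime + 2 * timeoutPeriod * numReceivers. */
    static boolean wholeBroadcastTimedOut(long now, long dispatchTime,
                                          long timeoutPeriod, int numReceivers) {
        return numReceivers > 0 && now > dispatchTime + 2 * timeoutPeriod * numReceivers;
    }

    /** Per-receiver deadline (part 3): receiverTime + timeoutPeriod. */
    static long receiverDeadline(long receiverTime, long timeoutPeriod) {
        return receiverTime + timeoutPeriod;
    }

    public static void main(String[] args) {
        long timeout = 10_000; // foreground broadcast limit in ms
        // Three receivers dispatched at t=0: the whole broadcast may take up to 60s.
        System.out.println(wholeBroadcastTimedOut(60_001, 0, timeout, 3)); // prints true
        System.out.println(wholeBroadcastTimedOut(60_000, 0, timeout, 3)); // prints false
        System.out.println(receiverDeadline(5_000, timeout));              // prints 15000
    }
}
```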

    • setBroadcastTimeoutLocked

    final void setBroadcastTimeoutLocked(long timeoutTime) {
        if (! mPendingBroadcastTimeoutMessage) {
            Message msg = mHandler.obtainMessage(BROADCAST_TIMEOUT_MSG, this);
            mHandler.sendMessageAtTime(msg, timeoutTime);
            mPendingBroadcastTimeoutMessage = true;
        }
    }
    

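    Cancelling the broadcast timeout works much like the service case, except that the timeout of statically registered receivers is affected by SharedPreferences (SP).

    The relevant code is shown below. Only receivers registered statically in XML take unfinished SP work into account during timeout detection; dynamically registered receivers are not affected.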

    public final void finish() {
        if (mType == TYPE_COMPONENT) {
            final IActivityManager mgr = ActivityManager.getService();
            if (QueuedWork.hasPendingWork()) {
                // If SP still has work not yet synced to disk, wait for it to finish before telling the system the broadcast is done
                QueuedWork.queue(new Runnable() {
                    public void run() {
                        sendFinished(mgr);
                    }
                }, false);
            } else {
                sendFinished(mgr);
            }
        } else if (mOrderedHint && mType != TYPE_UNREGISTERED) {
            final IActivityManager mgr = ActivityManager.getService();
            sendFinished(mgr);
        }
    }
    
    final void cancelBroadcastTimeoutLocked() {
        if (mPendingBroadcastTimeoutMessage) {
            mHandler.removeMessages(BROADCAST_TIMEOUT_MSG, this);
            mPendingBroadcastTimeoutMessage = false;
        }
    }
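The effect of finish() on statically registered receivers can be illustrated with a toy queue: the completion report is placed behind any pending writes, so slow writes delay it. This is a simplified model, not Android's QueuedWork; all names are hypothetical, and the dynamic branch ignores the ordered-broadcast condition shown in the code above.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Toy model of "finish after pending writes"; the names here are hypothetical, not Android APIs. */
final class PendingWorkSketch {
    private final Deque<Runnable> queue = new ArrayDeque<>();
    final List<String> log = new ArrayList<>();

    /** Queues a pretend disk write. */
    void enqueueWrite(String name) {
        queue.add(() -> log.add("write:" + name));
    }

    /** Mirrors the shape of finish(): report at once, or queue the report behind pending writes. */
    void finishReceiver(boolean staticallyRegistered) {
        Runnable sendFinished = () -> log.add("sendFinished");
        if (staticallyRegistered && !queue.isEmpty()) {
            queue.add(sendFinished);
        } else {
            sendFinished.run();
        }
    }

    /** Drains the queue in FIFO order, as a worker thread would. */
    void drain() {
        Runnable r;
        while ((r = queue.poll()) != null) {
            r.run();
        }
    }

    public static void main(String[] args) {
        PendingWorkSketch s = new PendingWorkSketch();
        s.enqueueWrite("prefs");
        s.finishReceiver(true);
        s.drain();
        System.out.println(s.log); // prints [write:prefs, sendFinished]
    }
}
```

In this model the timeout clock keeps running while the write is pending, which is why a slow SP write can push a static receiver past its limit.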
    
    Triggering the ANR
    • BroadcastQueue.java ::BroadcastHandler
    private final class BroadcastHandler extends Handler {
        public void handleMessage(Message msg) {
            switch (msg.what) {
                case BROADCAST_TIMEOUT_MSG: {
                    synchronized (mService) {
                   
                        broadcastTimeoutLocked(true);
                    }
                } break;
                ...
            }
            ...
        }
    }
    
    //fromMsg = true
    final void broadcastTimeoutLocked(boolean fromMsg) {
        if (fromMsg) {
            mPendingBroadcastTimeoutMessage = false;
        }
    
        if (mOrderedBroadcasts.size() == 0) {
            return;
        }
    
        long now = SystemClock.uptimeMillis();
        BroadcastRecord r = mOrderedBroadcasts.get(0);
        if (fromMsg) {
            if (mService.mDidDexOpt) {
                mService.mDidDexOpt = false;
                long timeoutTime = SystemClock.uptimeMillis() + mTimeoutPeriod;
                setBroadcastTimeoutLocked(timeoutTime);
                return;
            }
            
            if (!mService.mProcessesReady) {
                return; // While the system is not yet ready, broadcast handling has no timeout
            }
    
            long timeoutTime = r.receiverTime + mTimeoutPeriod;
            if (timeoutTime > now) {
                // The receiver currently executing has not timed out, so rearm the broadcast timeout
                setBroadcastTimeoutLocked(timeoutTime);
                return;
            }
        }
    
        BroadcastRecord br = mOrderedBroadcasts.get(0);
        if (br.state == BroadcastRecord.WAITING_SERVICES) {
            // The broadcast is done but is waiting for started services to finish. After waiting long enough, move on to the next broadcast.
            br.curComponent = null;
            br.state = BroadcastRecord.IDLE;
            processNextBroadcast(false);
            return;
        }
    
        r.receiverTime = now;
        // Increment the ANR count of the current BroadcastRecord
        r.anrCount++;
    
        if (r.nextReceiver <= 0) {
            return;
        }
        ...
        
        Object curReceiver = r.receivers.get(r.nextReceiver-1);
        // Look up the app process
        if (curReceiver instanceof BroadcastFilter) {
            BroadcastFilter bf = (BroadcastFilter)curReceiver;
            if (bf.receiverList.pid != 0
                    && bf.receiverList.pid != ActivityManagerService.MY_PID) {
                synchronized (mService.mPidsSelfLocked) {
                    app = mService.mPidsSelfLocked.get(
                            bf.receiverList.pid);
                }
            }
        } else {
            app = r.curApp;
        }
    
        if (app != null) {
            anrMessage = "Broadcast of " + r.intent.toString();
        }
    
        if (mPendingBroadcast == r) {
            mPendingBroadcast = null;
        }
    
        // Move on to the next broadcast receiver
        finishReceiverLocked(r, r.resultCode, r.resultData,
                r.resultExtras, r.resultAbort, false);
        scheduleBroadcastsLocked();
    
        if (anrMessage != null) {
            // BroadcastQueue.java AppNotResponding
            mHandler.post(new AppNotResponding(app, anrMessage));
        }
    }
    
    
    
    private final class AppNotResponding implements Runnable {
        ...
        public void run() {
            // Enter the ANR handling flow
            mService.appNotResponding(mApp, null, null, false, mAnnotation);
        }
    }
    
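    • If mOrderedBroadcasts has already been fully processed, no ANR occurs;
    • If dexopt was performed (mDidDexOpt is set), the timeout is rearmed and no ANR occurs;
    • If the system is not yet ready (mProcessesReady=false), no ANR occurs;
    • If the receiver currently executing has not timed out, the broadcast timeout is rearmed and no ANR occurs.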
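The four early exits listed above can be condensed into one decision function. This sketch covers only the message-driven path (fromMsg = true) and leaves out the WAITING_SERVICES and nextReceiver checks from the code; the enum and method names are hypothetical.

```java
/** Condensed decision logic for the message-driven path of broadcastTimeoutLocked (a sketch). */
final class BroadcastAnrDecisionSketch {
    enum Outcome {
        NO_PENDING_BROADCAST, RESCHEDULED_AFTER_DEXOPT, SYSTEM_NOT_READY,
        RESCHEDULED_NOT_EXPIRED, ANR
    }

    static Outcome decide(int orderedCount, boolean didDexOpt, boolean processesReady,
                          long receiverTime, long timeoutPeriod, long now) {
        if (orderedCount == 0) {
            return Outcome.NO_PENDING_BROADCAST;     // nothing left to time out
        }
        if (didDexOpt) {
            return Outcome.RESCHEDULED_AFTER_DEXOPT; // give the receiver a fresh timeout
        }
        if (!processesReady) {
            return Outcome.SYSTEM_NOT_READY;         // no broadcast timeouts before the system is ready
        }
        if (receiverTime + timeoutPeriod > now) {
            return Outcome.RESCHEDULED_NOT_EXPIRED;  // current receiver still has time left
        }
        return Outcome.ANR;
    }

    public static void main(String[] args) {
        System.out.println(decide(1, false, true, 15_000, 10_000, 20_000)); // prints RESCHEDULED_NOT_EXPIRED
        System.out.println(decide(1, false, true, 5_000, 10_000, 20_000));  // prints ANR
    }
}
```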
    ContentProvider

    A ContentProvider timeout is triggered when AMS.MainHandler, running on the "ActivityManager" thread, receives CONTENT_PROVIDER_PUBLISH_TIMEOUT_MSG.

    Setting the ANR timeout

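    The ContentProvider timeout is CONTENT_PROVIDER_PUBLISH_TIMEOUT = 10s. Unlike the Service and BroadcastQueue cases above, it is tied to the startup of the provider's process. After the process is created, attachApplicationLocked() is called to enter the system_server process.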

    private final boolean attachApplicationLocked(IApplicationThread thread, int pid) {
        ProcessRecord app;
        if (pid != MY_PID && pid >= 0) {
            synchronized (mPidsSelfLocked) {
                app = mPidsSelfLocked.get(pid); // Look up the ProcessRecord by pid
            }
        } 
        ...
        
        // True if the system is ready or the app is a FLAG_PERSISTENT process
        boolean normalMode = mProcessesReady || isAllowedWhileBooting(app.info);
        List<ProviderInfo> providers = normalMode ? generateApplicationProvidersLocked(app) : null;
    
        // The app process has providers still launching, so send CONTENT_PROVIDER_PUBLISH_TIMEOUT_MSG after a 10s delay
        if (providers != null && checkAppInLaunchingProvidersLocked(app)) {
            Message msg = mHandler.obtainMessage(CONTENT_PROVIDER_PUBLISH_TIMEOUT_MSG);
            msg.obj = app;
            mHandler.sendMessageDelayed(msg, CONTENT_PROVIDER_PUBLISH_TIMEOUT);
        }
        
        thread.bindApplication(...);
        ...
    }
    
    Removing the timeout message

    After the provider has been published successfully, the delayed timeout message is removed.

    • AMS.publishContentProviders
    public final void publishContentProviders(IApplicationThread caller, List<ContentProviderHolder> providers) {
       ...
       
       synchronized (this) {
           final ProcessRecord r = getRecordForAppLocked(caller);
           
           final int N = providers.size();
           for (int i = 0; i < N; i++) {
               ContentProviderHolder src = providers.get(i);
               ...
               ContentProviderRecord dst = r.pubProviders.get(src.info.name);
               if (dst != null) {
                   ComponentName comp = new ComponentName(dst.info.packageName, dst.info.name);
                   
                   mProviderMap.putProviderByClass(comp, dst); // Add this provider to mProviderMap
                   String names[] = dst.info.authority.split(";");
                   for (int j = 0; j < names.length; j++) {
                       mProviderMap.putProviderByName(names[j], dst);
                   }
    
                   int launchingCount = mLaunchingProviders.size();
                   int j;
                   boolean wasInLaunchingProviders = false;
                   for (j = 0; j < launchingCount; j++) {
                       if (mLaunchingProviders.get(j) == dst) {
                           // Remove this provider from the mLaunchingProviders queue
                           mLaunchingProviders.remove(j);
                           wasInLaunchingProviders = true;
                           j--;
                           launchingCount--;
                       }
                   }
                   // Published successfully, so remove the timeout message
                   if (wasInLaunchingProviders) {
                       mHandler.removeMessages(CONTENT_PROVIDER_PUBLISH_TIMEOUT_MSG, r);
                   }
                   synchronized (dst) {
                       dst.provider = src.provider;
                       dst.proc = r;
                       // Wake up clients blocked in wait()
                       dst.notifyAll();
                   }
                   ...
               }
           }
       }    
    }
    
    Triggering the timeout

    system_server has a handler thread named "ActivityManager". When the countdown ends, a CONTENT_PROVIDER_PUBLISH_TIMEOUT_MSG message is delivered to that handler thread.

    • ActivityManagerService.java -MainHandler.handleMessage
    final class MainHandler extends Handler {
        public void handleMessage(Message msg) {
            switch (msg.what) {
                case CONTENT_PROVIDER_PUBLISH_TIMEOUT_MSG: {
                    ...
                    ProcessRecord app = (ProcessRecord)msg.obj;
                    synchronized (ActivityManagerService.this) {
                        
                        processContentProviderPublishTimedOutLocked(app);
                    }
                } break;
                ...
            }
            ...
        }
    }
    
    private final void processContentProviderPublishTimedOutLocked(ProcessRecord app) {
        
        cleanupAppInLaunchingProvidersLocked(app, true); 
        
        removeProcessLocked(app, false, true, "timeout publishing content providers");
    }
    
    boolean cleanupAppInLaunchingProvidersLocked(ProcessRecord app, boolean alwaysBad) {
        boolean restart = false;
        for (int i = mLaunchingProviders.size() - 1; i >= 0; i--) {
            ContentProviderRecord cpr = mLaunchingProviders.get(i);
            if (cpr.launchingApp == app) {
                if (!alwaysBad && !app.bad && cpr.hasConnectionOrHandle()) {
                    restart = true;
                } else {
                    // Remove the dying provider
                    removeDyingProviderLocked(app, cpr, true);
                }
            }
        }
        return restart;
    }
    
    • For a stable provider (conn.stableCount > 0), every non-persistent process that holds a stable connection to the provider is killed.
    • For an unstable provider (conn.unstableCount > 0), client processes are not killed in cascade.
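The cascade rule in the two bullets above amounts to a simple filter over the provider's client connections. The sketch below uses a hypothetical Client type with the same field names as the text; it does not reproduce removeDyingProviderLocked.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch of the cascade rule: kill non-persistent clients that hold a stable connection. */
final class ProviderClientKillSketch {
    /** Hypothetical stand-in for a client connection to the dying provider. */
    static final class Client {
        final int pid;
        final int stableCount;
        final int unstableCount;
        final boolean persistent;
        Client(int pid, int stableCount, int unstableCount, boolean persistent) {
            this.pid = pid;
            this.stableCount = stableCount;
            this.unstableCount = unstableCount;
            this.persistent = persistent;
        }
    }

    /** Returns the pids that would be killed together with the provider's process. */
    static List<Integer> pidsToKill(List<Client> clients) {
        List<Integer> out = new ArrayList<>();
        for (Client c : clients) {
            if (c.stableCount > 0 && !c.persistent) {
                out.add(c.pid);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Client> clients = new ArrayList<>();
        clients.add(new Client(101, 1, 0, false)); // stable, non-persistent: killed
        clients.add(new Client(102, 0, 1, false)); // unstable only: survives
        clients.add(new Client(103, 1, 0, true));  // stable but persistent: survives
        System.out.println(pidsToKill(clients));   // prints [101]
    }
}
```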

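    Summary

    Timeout detection

    Service timeout detection:

    • If the operation does not finish, and so does not remove the delayed message, within the time limit, an ANR is triggered.

    BroadcastReceiver timeout detection:

    • If the total execution time of an ordered broadcast exceeds 2 × (number of receivers) × (timeout), an ANR is triggered.

    • If any single receiver of an ordered broadcast runs longer than the timeout, an ANR is triggered.

    In addition: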

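    • For Service, Broadcast, and Input, an ANR always ends up calling AMS.appNotResponding.

    • For a provider, an ANR may occur during publish while its process is starting. In that case the process is killed directly and its state is cleaned up, and no ANR dialog is shown.

    • For input ANRs, adb shell dumpsys input shows the device's current input state. The output has three parts: EventHub.dump(), InputReader.dump(), and InputDispatcher.dump(). If an input ANR has occurred, the state of the last ANR is printed as well.

    When an ANR occurs, whether in one of the four major components or in a process, AMS.appNotResponding() is called; the provider case is the exception.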

    • AMS.appNotResponding
    final void appNotResponding(ProcessRecord app, ActivityRecord activity, ActivityRecord parent, boolean aboveSystem, final String annotation) {
        ...
        updateCpuStatsNow(); // First update of the CPU statistics
        synchronized (this) {
          // PowerManager.reboot() can block for a long time, so ANRs during shutdown are ignored
          if (mShuttingDown) {
              return;
          } else if (app.notResponding) {
              return;
          } else if (app.crashing) {
              return;
          }
          // Record the ANR in the EventLog
          EventLog.writeEvent(EventLogTags.AM_ANR, app.userId, app.pid,
                  app.processName, app.info.flags, annotation);
                  
          // Add the current process to firstPids
          firstPids.add(app.pid);
          int parentPid = app.pid;
          
          // Add the system_server process to firstPids
          if (MY_PID != app.pid && MY_PID != parentPid) firstPids.add(MY_PID);
          
          for (int i = mLruProcesses.size() - 1; i >= 0; i--) {
              ProcessRecord r = mLruProcesses.get(i);
              if (r != null && r.thread != null) {
                  int pid = r.pid;
                  if (pid > 0 && pid != app.pid && pid != parentPid && pid != MY_PID) {
                      if (r.persistent) {
                          firstPids.add(pid); // Add persistent processes to firstPids
                      } else {
                          lastPids.put(pid, Boolean.TRUE); // Add all other processes to lastPids
                      }
                  }
              }
          }
        }
        
        // Build the ANR record that goes to the main log
        StringBuilder info = new StringBuilder();
        info.setLength(0);
        info.append("ANR in ").append(app.processName);
        if (activity != null && activity.shortComponentName != null) {
            info.append(" (").append(activity.shortComponentName).append(")");
        }
        info.append("\n");
        info.append("PID: ").append(app.pid).append("\n");
        if (annotation != null) {
            info.append("Reason: ").append(annotation).append("\n");
        }
        if (parent != null && parent != activity) {
            info.append("Parent: ").append(parent.shortComponentName).append("\n");
        }
        
        // Create the CPU tracker object
        final ProcessCpuTracker processCpuTracker = new ProcessCpuTracker(true);
        // Write out the traces
        File tracesFile = dumpStackTraces(true, firstPids, processCpuTracker, 
                lastPids, NATIVE_STACKS_OF_INTEREST);
                
        updateCpuStatsNow(); // Second update of the CPU statistics
        // Record the current CPU usage of each process
        synchronized (mProcessCpuTracker) {
            cpuInfo = mProcessCpuTracker.printCurrentState(anrTime);
        }
        // Record the current CPU load
        info.append(processCpuTracker.printCurrentLoad());
        info.append(cpuInfo);
        // Record CPU usage since the ANR time
        info.append(processCpuTracker.printCurrentState(anrTime));
        // Log the ANR reason together with CPU usage and load
        Slog.e(TAG, info.toString()); 
        
        // Save the traces file and CPU usage to the dropbox, i.e. the data/system/dropbox directory
        addErrorToDropBox("anr", app, app.processName, activity, parent, annotation,
                cpuInfo, tracesFile, null);
    
        synchronized (this) {
            ...
            // A background ANR: kill the process directly
            if (!showBackground && !app.isInterestingToUserLocked() && app.pid != MY_PID) {
                app.kill("bg anr", true);
                return;
            }
    
            // Mark the app as not responding and look up the error report receiver
            makeAppNotRespondingLocked(app,
                    activity != null ? activity.shortComponentName : null,
                    annotation != null ? "ANR " + annotation : "ANR",
                    info.toString());
    
            // Rename the trace file
            String tracesPath = SystemProperties.get("dalvik.vm.stack-trace-file", null);
            if (tracesPath != null && tracesPath.length() != 0) {
                //traceRenameFile = "/data/anr/traces.txt"
                File traceRenameFile = new File(tracesPath);
                String newTracesPath;
                int lpos = tracesPath.lastIndexOf (".");
                if (-1 != lpos)
                    // New traces file = /data/anr/traces_<process name>_<current date>.txt
                    newTracesPath = tracesPath.substring (0, lpos) + "_" + app.processName + "_" + mTraceDateFormat.format(new Date()) + tracesPath.substring (lpos);
                else
                    newTracesPath = tracesPath + "_" + app.processName;
    
                traceRenameFile.renameTo(new File(newTracesPath));
            }
                    
            // Show the ANR dialog
            Message msg = Message.obtain();
            HashMap<String, Object> map = new HashMap<String, Object>();
            msg.what = SHOW_NOT_RESPONDING_MSG;
            msg.obj = map;
            msg.arg1 = aboveSystem ? 1 : 0;
            map.put("app", app);
            if (activity != null) {
                map.put("activity", activity);
            }
            
            // Send a SHOW_NOT_RESPONDING_MSG message to the UI thread
            mUiHandler.sendMessage(msg);
        }
        
    }
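The trace-file renaming step inside appNotResponding is pure string manipulation and can be tried in isolation. In the sketch below the date stamp is passed in as a plain string, because the pattern used by mTraceDateFormat is not shown in the code above; the class and method names are hypothetical.

```java
/** Standalone version of the trace-file renaming logic; the date stamp format is an assumption. */
final class TraceRenameSketch {
    static String renamedPath(String tracesPath, String processName, String dateStamp) {
        int lpos = tracesPath.lastIndexOf('.');
        if (lpos != -1) {
            // Insert "_<process>_<date>" in front of the extension.
            return tracesPath.substring(0, lpos) + "_" + processName + "_" + dateStamp
                    + tracesPath.substring(lpos);
        }
        // No extension: only the process name is appended, as in the code above.
        return tracesPath + "_" + processName;
    }

    public static void main(String[] args) {
        System.out.println(renamedPath("/data/anr/traces.txt", "com.example.app", "20200130"));
        // prints /data/anr/traces_com.example.app_20200130.txt
    }
}
```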
    

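    Logs and log output after an ANR

    When an ANR occurs, the following steps run in order:

    • Write the ANR reason to the EventLog. The am_anr entry in the EventLog is therefore the record closest to the moment the ANR was triggered;
    • Collect and write the traces of every thread in the important processes; this step is relatively slow;
    • Write out the current CPU usage of each process and the CPU load;
    • Save the traces file and the CPU usage information to the dropbox, i.e. the data/system/dropbox directory;
    • Depending on the process type, either kill the process in the background or show a dialog to the user.

    The important processes whose traces are written on an ANR are:

    • The firstPids queue: first the ANR process, second system_server, then all persistent processes;

    • The native queue: the mediaserver, sdcard, and surfaceflinger processes under /system/bin/;

    • The lastPids queue: all processes in mLruProcesses that are not in firstPids.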
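The way processes are sorted into firstPids and lastPids can be sketched with plain collections. The Proc type and the class name are hypothetical; the loop walks the list from the end, as the loop in appNotResponding does.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Sketch of how pids are split into firstPids and lastPids; types are simplified stand-ins. */
final class PidBucketsSketch {
    /** Hypothetical stand-in for the fields of ProcessRecord used here. */
    static final class Proc {
        final int pid;
        final boolean persistent;
        Proc(int pid, boolean persistent) {
            this.pid = pid;
            this.persistent = persistent;
        }
    }

    final List<Integer> firstPids = new ArrayList<>();
    final Set<Integer> lastPids = new LinkedHashSet<>();

    static PidBucketsSketch classify(int anrPid, int systemServerPid, List<Proc> lruProcesses) {
        PidBucketsSketch b = new PidBucketsSketch();
        b.firstPids.add(anrPid);                 // 1st: the process that hit the ANR
        if (systemServerPid != anrPid) {
            b.firstPids.add(systemServerPid);    // 2nd: system_server
        }
        for (int i = lruProcesses.size() - 1; i >= 0; i--) {
            Proc p = lruProcesses.get(i);
            if (p.pid <= 0 || p.pid == anrPid || p.pid == systemServerPid) {
                continue;                        // already covered above
            }
            if (p.persistent) {
                b.firstPids.add(p.pid);          // persistent processes join firstPids
            } else {
                b.lastPids.add(p.pid);           // everything else goes to lastPids
            }
        }
        return b;
    }

    public static void main(String[] args) {
        List<Proc> lru = new ArrayList<>();
        lru.add(new Proc(1000, true));
        lru.add(new Proc(1500, true));
        lru.add(new Proc(2000, false));
        lru.add(new Proc(3000, false));
        PidBucketsSketch b = classify(2000, 1000, lru);
        System.out.println(b.firstPids + " " + b.lastPids); // prints [2000, 1000, 1500] [3000]
    }
}
```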

    • AMS.dumpStackTraces

    public static File dumpStackTraces(boolean clearTraces, ArrayList<Integer> firstPids, ProcessCpuTracker processCpuTracker, SparseArray<Boolean> lastPids, String[] nativeProcs) {
        // Defaults to data/anr/traces.txt
        String tracesPath = SystemProperties.get("dalvik.vm.stack-trace-file", null);
        if (tracesPath == null || tracesPath.length() == 0) {
            return null;
        }
    
        File tracesFile = new File(tracesPath);
        try {
            // If clearTraces is set, delete the existing traces file
            if (clearTraces && tracesFile.exists()) tracesFile.delete();
            // Create the traces file
            tracesFile.createNewFile();
            FileUtils.setPermissions(tracesFile.getPath(), 0666, -1, -1);
        } catch (IOException e) {
            return null;
        }
        // Write the trace contents
        dumpStackTraces(tracesPath, firstPids, processCpuTracker, lastPids, nativeProcs);
        return tracesFile;
    }
    // This guarantees that data/anr/traces.txt is written fresh rather than appended to.

    private static void dumpStackTraces(String tracesPath, ArrayList<Integer> firstPids, ProcessCpuTracker processCpuTracker, SparseArray<Boolean> lastPids, String[] nativeProcs) {
        FileObserver observer = new FileObserver(tracesPath, FileObserver.CLOSE_WRITE) {
            @Override
            public synchronized void onEvent(int event, String path) { notify(); }
        };
    
        try {
            observer.startWatching();
    
            // First, collect the stacks of the most important processes
            if (firstPids != null) {
                try {
                    int num = firstPids.size();
                    for (int i = 0; i < num; i++) {
                        synchronized (observer) {
                            // Send a signal to the target process to make it dump its traces
                            Process.sendSignal(firstPids.get(i), Process.SIGNAL_QUIT);
                            observer.wait(200);  // Wait until the write is closed, or time out after 200ms
                        }
                    }
                } catch (InterruptedException e) {
                    Slog.wtf(TAG, e);
                }
            }
    
            // Next, collect the stacks of the native processes
            if (nativeProcs != null) {
                int[] pids = Process.getPidsForCommands(nativeProcs);
                if (pids != null) {
                    for (int pid : pids) {
                        // Dump the native process's backtrace
                        Debug.dumpNativeBacktraceToFile(pid, tracesPath);
                    }
                }
            }
    
            if (processCpuTracker != null) {
                processCpuTracker.init();
                System.gc();
                processCpuTracker.update();
                synchronized (processCpuTracker) {
                    processCpuTracker.wait(500); // Wait 500ms
                }
                // Measure CPU usage
                processCpuTracker.update();
    
                // Pick the top 5 processes by CPU usage from lastPids and dump their stacks
                final int N = processCpuTracker.countWorkingStats();
                int numProcs = 0;
                for (int i=0; i<N && numProcs<5; i++) {
                    ProcessCpuTracker.Stats stats = processCpuTracker.getWorkingStats(i);
                    if (lastPids.indexOfKey(stats.pid) >= 0) {
                        numProcs++;
                        synchronized (observer) {
                            Process.sendSignal(stats.pid, Process.SIGNAL_QUIT);
                            observer.wait(200); 
                        }
                    }
                }
            }
        } finally {
            observer.stopWatching();
        }
    }
    

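    dumpStackTraces mainly does the following:

    • Collects the stacks of the firstPids processes:
      first the process that hit the ANR;
      second system_server;
      then all persistent processes in mLruProcesses.
    • Collects the stacks of the native processes (dumpNativeBacktraceToFile):
      mediaserver, sdcard, and surfaceflinger, in that order.
    • Collects the stacks of the lastPids processes:
      the top 5 processes by CPU usage, in order.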
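The last step, picking at most five lastPids processes by CPU usage, can be sketched as follows. The input list is assumed to be already sorted by CPU usage with the highest first, standing in for what getWorkingStats(i) iterates over; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Sketch of the "top N by CPU among lastPids" selection; the input ordering is an assumption. */
final class TopCpuPidsSketch {
    /** 'pidsByCpuDesc' is assumed to be sorted by CPU usage, highest first. */
    static List<Integer> pick(List<Integer> pidsByCpuDesc, Set<Integer> lastPids, int limit) {
        List<Integer> picked = new ArrayList<>();
        for (int i = 0; i < pidsByCpuDesc.size() && picked.size() < limit; i++) {
            int pid = pidsByCpuDesc.get(i);
            if (lastPids.contains(pid)) {
                picked.add(pid); // in the framework, this is where SIGNAL_QUIT is sent
            }
        }
        return picked;
    }

    public static void main(String[] args) {
        List<Integer> byCpu = Arrays.asList(10, 20, 30, 40, 50, 60, 70);
        Set<Integer> lastPids = new HashSet<>(Arrays.asList(20, 30, 40, 50, 60, 70));
        System.out.println(pick(byCpu, lastPids, 5)); // prints [20, 30, 40, 50, 60]
    }
}
```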

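    When an ANR is triggered, the system writes out key information, printing each process's details and CPU usage in turn, which takes a noticeable amount of time.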

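    1. The am_anr entry written to the EventLog is closest to the time the ANR occurred, so ANR analysis should start from the EventLog.
    2. The traces of the important processes are collected and saved to /data/anr/traces.txt (the old file is deleted first; in some cases the trace file is also renamed):
    • traces of Java processes;
    • traces of native processes.
    3. The ANR reason and CPU usage are written to the main log.
    4. The CPU usage and the process trace file are then saved under /data/system/dropbox.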

    A Simple ANR Case Study

    First, create a simple ANR in a page:

    anr_code.jpg

    Tapping this button triggers an ANR, and logcat then shows the following log:


    anr_log.png anr_logcat.png

    In the traces file (under data/anr), the main thread's state is shown as SLEEPING.


    main_thread_anr.png

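    This is a very simple case; real problems are considerably harder to analyze. Still, by going through the log and trace files and searching for keywords such as the package name of the app process under investigation, the main thread name "main", and the CPU state, you can locate the relevant entries and extract useful information.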
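Searching a traces file for the main thread's state can be automated with a small parser. The sketch below assumes thread header lines shaped like `"main" prio=5 tid=1 Sleeping`; this shape and the state names are assumptions that may differ between Android versions, and the class name is hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Sketch: extract a thread's state from trace text, assuming one header line per thread. */
final class MainThreadStateSketch {
    // Assumed header shape: "main" prio=5 tid=1 Sleeping
    private static final Pattern HEADER =
            Pattern.compile("^\"([^\"]+)\"\\s+(?:daemon\\s+)?prio=\\d+\\s+tid=\\d+\\s+(\\S+)");

    /** Returns the state of the named thread, or null if no matching header line is found. */
    static String stateOf(String traces, String threadName) {
        for (String line : traces.split("\n")) {
            Matcher m = HEADER.matcher(line.trim());
            if (m.find() && m.group(1).equals(threadName)) {
                return m.group(2);
            }
        }
        return null;
    }

    public static void main(String[] args) {
        String sample = "\"Signal Catcher\" daemon prio=5 tid=3 Runnable\n"
                + "\"main\" prio=5 tid=1 Sleeping\n"
                + "  | group=\"main\" sCount=1";
        System.out.println(stateOf(sample, "main")); // prints Sleeping
    }
}
```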

    Article link: https://www.haomeiwen.com/subject/ajiwthtx.html