[toc]
ANR分析
ANR触发场景
在android系统中,特定的操作需要在一定时间内完成,超过限定的时间就会触发ANR。
组件 | 时长 |
---|---|
Service | 前台服务在20s内,后台服务200s内 未执行完成 |
Content Provider | 内容提供者,在publish过超时10s; |
Broadcast | 前台广播在10s,后台广播60s内未执行完成 |
Input Dispatching | 输入事件分发超时5s,包括按键和触摸事件。 |
Service
service timeout 是位于 ActivityManager 线程中的 AMS.MainHandler收到SERVICE_TIMEOUT_MSG触发的,触发时长上表中可以查看,区分前后台是通过变量ProcessRecord.
execServicesFg
注册ANR
在Service启动流程中,当service attach到system_server进程的过程中会调用realStartServiceLock(),方法进行ANR的注册
- ActivityService.java realStartServiceLocked
private final void realStartServiceLocked(ServiceRecord r, ProcessRecord app, boolean execInFg) throws RemoteException {
...
//发送delay消息(SERVICE_TIMEOUT_MSG)
bumpServiceExecutingLocked(r, execInFg, "create");
try {
...
//最终执行服务的onCreate()方法
app.thread.scheduleCreateService(r, r.serviceInfo,
mAm.compatibilityInfoForPackageLocked(r.serviceInfo.applicationInfo),
app.repProcState);
} catch (DeadObjectException e) {
mAm.appDiedLocked(app);
throw e;
} finally {
...
}
}
private final void bumpServiceExecutingLocked(ServiceRecord r, boolean fg, String why) {
...
scheduleServiceTimeoutLocked(r.app);
}
void scheduleServiceTimeoutLocked(ProcessRecord proc) {
if (proc.executingServices.size() == 0 || proc.thread == null) {
return;
}
long now = SystemClock.uptimeMillis();
Message msg = mAm.mHandler.obtainMessage(
ActivityManagerService.SERVICE_TIMEOUT_MSG);
msg.obj = proc;
//当超时后仍没有remove该SERVICE_TIMEOUT_MSG消息,则执行service Timeout流程
mAm.mHandler.sendMessageAtTime(msg,
proc.execServicesFg ? (now+SERVICE_TIMEOUT) : (now+ SERVICE_BACKGROUND_TIMEOUT));
}
由此可见,是通过发送delay的消息进行ANR的注册,当超过时长(前台20s,后台200s就会触发ANR)
解除注册
在目标进程的线程中会进行撤销发送上文中提到的SERVICE_TIMEOUT_MSG
- ActivityThread.java
private void handleCreateService(CreateServiceData data) {
...
java.lang.ClassLoader cl = packageInfo.getClassLoader();
Service service = (Service) cl.loadClass(data.info.name).newInstance();
...
try {
//创建ContextImpl对象
ContextImpl context = ContextImpl.createAppContext(this, packageInfo);
context.setOuterContext(service);
//创建Application对象
Application app = packageInfo.makeApplication(false, mInstrumentation);
service.attach(context, this, data.info.name, data.token, app,
ActivityManagerNative.getDefault());
//调用服务onCreate()方法
service.onCreate();
//解除SERVICE_TIMEOUT_MSG
ActivityManagerNative.getDefault().serviceDoneExecuting(
data.token, SERVICE_DONE_EXECUTING_ANON, 0, 0);
} catch (Exception e) {
...
}
}
该过程会常见服务对象,并调用服务的onCreate方法,然后会通过多次调用回到system_server来执行serviceDoneExecuting
- AS.serviceDoneExecutingLocked
private void serviceDoneExecutingLocked(ServiceRecord r, boolean inDestroying, boolean finishing) {
...
if (r.executeNesting <= 0) {
if (r.app != null) {
r.app.execServicesFg = false;
r.app.executingServices.remove(r);
if (r.app.executingServices.size() == 0) {
//当前服务所在进程中没有正在执行的service
mAm.mHandler.removeMessages(ActivityManagerService.SERVICE_TIMEOUT_MSG, r.app);
...
}
...
}
当service启动完成,则移除延时消息SERVICE_TIMEOUT_MSG
触发ANR
如果没有在规定的时间内解除延时消息,那么则就会触发ANR。
在system_server进程中有一个Handler线程,叫做“ActivityManager”.当倒计时结束便会向该Handler线程发送一条SERVICE_TIMEOUT_MSG.
- ActivityManagerService.java ::MainHandler
final class MainHandler extends Handler {
public void handleMessage(Message msg) {
switch (msg.what) {
case SERVICE_TIMEOUT_MSG: {
...
mServices.serviceTimeout((ProcessRecord)msg.obj);
} break;
...
}
...
}
}
void serviceTimeout(ProcessRecord proc) {
String anrMessage = null;
synchronized(mAm) {
if (proc.executingServices.size() == 0 || proc.thread == null) {
return;
}
final long now = SystemClock.uptimeMillis();
final long maxTime = now -
(proc.execServicesFg ? SERVICE_TIMEOUT : SERVICE_BACKGROUND_TIMEOUT);
ServiceRecord timeout = null;
long nextTime = 0;
for (int i=proc.executingServices.size()-1; i>=0; i--) {
ServiceRecord sr = proc.executingServices.valueAt(i);
if (sr.executingStart < maxTime) {
timeout = sr;
break;
}
if (sr.executingStart > nextTime) {
nextTime = sr.executingStart;
}
}
if (timeout != null && mAm.mLruProcesses.contains(proc)) {
Slog.w(TAG, "Timeout executing service: " + timeout);
StringWriter sw = new StringWriter();
PrintWriter pw = new FastPrintWriter(sw, false, 1024);
pw.println(timeout);
timeout.dump(pw, " ");
pw.close();
mLastAnrDump = sw.toString();
mAm.mHandler.removeCallbacks(mLastAnrDumpClearer);
mAm.mHandler.postDelayed(mLastAnrDumpClearer, LAST_ANR_LIFETIME_DURATION_MSECS);
anrMessage = "executing service " + timeout.shortName;
}
}
if (anrMessage != null) {
//当存在timeout的service,则执行appNotResponding
mAm.appNotResponding(proc, null, null, false, anrMessage);
}
}
BroadcastReceiver
当ActivityManager线程中的BroadcastQueue.BroadcastHandler收到BROADCAST_TIMEOUT_MSG消息时就会触发ANR
处理广播中的anr消息
在广播的启动流程中,通过调用processNextBroadcast来处理广播,其流程为
- 并行广播
- 当前有序广播
- 有序广播
final void processNextBroadcast(boolean fromMsg) {
synchronized(mService) {
...
//part 2: 处理当前有序广播
do {
r = mOrderedBroadcasts.get(0);
//获取所有该广播所有的接收者
int numReceivers = (r.receivers != null) ? r.receivers.size() : 0;
if (mService.mProcessesReady && r.dispatchTime > 0) {
long now = SystemClock.uptimeMillis();
if ((numReceivers > 0) &&
(now > r.dispatchTime + (2*mTimeoutPeriod*numReceivers))) {
//当广播处理时间超时,则强制结束这条广播
broadcastTimeoutLocked(false);
...
}
}
if (r.receivers == null || r.nextReceiver >= numReceivers
|| r.resultAbort || forceReceive) {
if (r.resultTo != null) {
//处理广播消息消息
performReceiveLocked(r.callerApp, r.resultTo,
new Intent(r.intent), r.resultCode,
r.resultData, r.resultExtras, false, false, r.userId);
r.resultTo = null;
}
//取消广播超时ANR
cancelBroadcastTimeoutLocked();
}
} while (r == null);
...
//part 3: 获取下条有序广播
r.receiverTime = SystemClock.uptimeMillis();
if (!mPendingBroadcastTimeoutMessage) {
long timeoutTime = r.receiverTime + mTimeoutPeriod;
//设置广播超时anr
setBroadcastTimeoutLocked(timeoutTime);
}
...
}
}
对于广播超时处理时机
-
首先在part3过程中setBroadcastTimeoutLocked(timeoutTime) 设置超时广播消息;
-
然后在part2根据广播处理情况来处理:
-
当广播接收者等待时间过长,则调用 broadcastTimeoutLocked(false);
-
当执行完广播,则调用cancelBroadcastTimeoutLocked;
-
-
setBroadcastTimeoutLocked
final void setBroadcastTimeoutLocked(long timeoutTime) {
if (! mPendingBroadcastTimeoutMessage) {
Message msg = mHandler.obtainMessage(BROADCAST_TIMEOUT_MSG, this);
mHandler.sendMessageAtTime(msg, timeoutTime);
mPendingBroadcastTimeoutMessage = true;
}
}
取消设置广播的超时anr和service类似,但是通过静态注册的广播超时受SharePreference(SP)影响
相关代码如下,只有xml静态注册的光爆超时检测过程会考虑是否有SP尚未完成,动态广播并不受其影响
public final void finish() {
if (mType == TYPE_COMPONENT) {
final IActivityManager mgr = ActivityManager.getService();
if (QueuedWork.hasPendingWork()) {
//当SP有未同步到磁盘的工作,则需等待其完成,才告知系统已完成该广播
QueuedWork.queue(new Runnable() {
public void run() {
sendFinished(mgr);
}
}, false);
} else {
sendFinished(mgr);
}
} else if (mOrderedHint && mType != TYPE_UNREGISTERED) {
final IActivityManager mgr = ActivityManager.getService();
sendFinished(mgr);
}
}
final void cancelBroadcastTimeoutLocked() {
if (mPendingBroadcastTimeoutMessage) {
mHandler.removeMessages(BROADCAST_TIMEOUT_MSG, this);
mPendingBroadcastTimeoutMessage = false;
}
}
触发anr
- BroadcastQueue.java ::BroadcastHandler
private final class BroadcastHandler extends Handler {
public void handleMessage(Message msg) {
switch (msg.what) {
case BROADCAST_TIMEOUT_MSG: {
synchronized (mService) {
broadcastTimeoutLocked(true);
}
} break;
...
}
...
}
}
//fromMsg = true
final void broadcastTimeoutLocked(boolean fromMsg) {
if (fromMsg) {
mPendingBroadcastTimeoutMessage = false;
}
if (mOrderedBroadcasts.size() == 0) {
return;
}
long now = SystemClock.uptimeMillis();
BroadcastRecord r = mOrderedBroadcasts.get(0);
if (fromMsg) {
if (mService.mDidDexOpt) {
mService.mDidDexOpt = false;
long timeoutTime = SystemClock.uptimeMillis() + mTimeoutPeriod;
setBroadcastTimeoutLocked(timeoutTime);
return;
}
if (!mService.mProcessesReady) {
return; //当系统还没有准备就绪时,广播处理流程中不存在广播超时
}
long timeoutTime = r.receiverTime + mTimeoutPeriod;
if (timeoutTime > now) {
//如果当前正在执行的receiver没有超时,则重新设置广播超时
setBroadcastTimeoutLocked(timeoutTime);
return;
}
}
BroadcastRecord br = mOrderedBroadcasts.get(0);
if (br.state == BroadcastRecord.WAITING_SERVICES) {
//广播已经处理完成,但需要等待已启动service执行完成。当等待足够时间,则处理下一条广播。
br.curComponent = null;
br.state = BroadcastRecord.IDLE;
processNextBroadcast(false);
return;
}
r.receiverTime = now;
//当前BroadcastRecord的anr次数执行加1操作
r.anrCount++;
if (r.nextReceiver <= 0) {
return;
}
...
Object curReceiver = r.receivers.get(r.nextReceiver-1);
//查询App进程
if (curReceiver instanceof BroadcastFilter) {
BroadcastFilter bf = (BroadcastFilter)curReceiver;
if (bf.receiverList.pid != 0
&& bf.receiverList.pid != ActivityManagerService.MY_PID) {
synchronized (mService.mPidsSelfLocked) {
app = mService.mPidsSelfLocked.get(
bf.receiverList.pid);
}
}
} else {
app = r.curApp;
}
if (app != null) {
anrMessage = "Broadcast of " + r.intent.toString();
}
if (mPendingBroadcast == r) {
mPendingBroadcast = null;
}
//继续移动到下一个广播接收者
finishReceiverLocked(r, r.resultCode, r.resultData,
r.resultExtras, r.resultAbort, false);
scheduleBroadcastsLocked();
if (anrMessage != null) {
// BroadcastQueue.java AppNotResponding
mHandler.post(new AppNotResponding(app, anrMessage));
}
}
private final class AppNotResponding implements Runnable {
...
public void run() {
// 进入ANR处理流程
mService.appNotResponding(mApp, null, null, false, mAnnotation);
}
}
- mOrderedBroadcasts已处理完成,则不会anr;
- 在执行dexopt,则不会anr;
- 系统还没有进入ready状态(mProcessesReady=false),则不会anr;
- 如果当前正在执行的receiver没有超时,则重新设置广播超时,不会anr;
ContnentProvider
ContentProvider Timeout是位于”ActivityManager”线程中的AMS.MainHandler收到CONTENT_PROVIDER_PUBLISH_TIMEOUT_MSG消息时触发。
设置ANR超时
ContentProvider 超时为CONTENT_PROVIDER_PUBLISH_TIMEOUT = 10s. 这个跟前面的Service和BroadcastQueue完全不同, 由Provider进程启动过程相关.当进程创建后悔调用attachApplicationLocked()进入system_server进程.
private final boolean attachApplicationLocked(IApplicationThread thread, int pid) {
ProcessRecord app;
if (pid != MY_PID && pid >= 0) {
synchronized (mPidsSelfLocked) {
app = mPidsSelfLocked.get(pid); // 根据pid获取ProcessRecord
}
}
...
//系统处于ready状态或者该app为FLAG_PERSISTENT进程则为true
boolean normalMode = mProcessesReady || isAllowedWhileBooting(app.info);
List<ProviderInfo> providers = normalMode ? generateApplicationProvidersLocked(app) : null;
//app进程存在正在启动中的provider,则超时10s后发送CONTENT_PROVIDER_PUBLISH_TIMEOUT_MSG消息
if (providers != null && checkAppInLaunchingProvidersLocked(app)) {
Message msg = mHandler.obtainMessage(CONTENT_PROVIDER_PUBLISH_TIMEOUT_MSG);
msg.obj = app;
mHandler.sendMessageDelayed(msg, CONTENT_PROVIDER_PUBLISH_TIMEOUT);
}
thread.bindApplication(...);
...
}
移除超时消息
当provider成功publish之后,移除延时的timeout消息
- AMS.publishContentProviders
public final void publishContentProviders(IApplicationThread caller, List<ContentProviderHolder> providers) {
...
synchronized (this) {
final ProcessRecord r = getRecordForAppLocked(caller);
final int N = providers.size();
for (int i = 0; i < N; i++) {
ContentProviderHolder src = providers.get(i);
...
ContentProviderRecord dst = r.pubProviders.get(src.info.name);
if (dst != null) {
ComponentName comp = new ComponentName(dst.info.packageName, dst.info.name);
mProviderMap.putProviderByClass(comp, dst); //将该provider添加到mProviderMap
String names[] = dst.info.authority.split(";");
for (int j = 0; j < names.length; j++) {
mProviderMap.putProviderByName(names[j], dst);
}
int launchingCount = mLaunchingProviders.size();
int j;
boolean wasInLaunchingProviders = false;
for (j = 0; j < launchingCount; j++) {
if (mLaunchingProviders.get(j) == dst) {
//将该provider移除mLaunchingProviders队列
mLaunchingProviders.remove(j);
wasInLaunchingProviders = true;
j--;
launchingCount--;
}
}
//成功pubish则移除该消息
if (wasInLaunchingProviders) {
mHandler.removeMessages(CONTENT_PROVIDER_PUBLISH_TIMEOUT_MSG, r);
}
synchronized (dst) {
dst.provider = src.provider;
dst.proc = r;
//唤醒客户端的wait等待方法
dst.notifyAll();
}
...
}
}
}
}
触发超时
在system_server中有一个handler线程叫做“ActivityManager”。当倒计时结束便会向该handler线程发送一条信息 CONTENT_PROVIDER_PUBLISH_TIMNEOUT_MSG
- ActivityManagerService.java -MainHandler.handleMessage
final class MainHandler extends Handler {
public void handleMessage(Message msg) {
switch (msg.what) {
case CONTENT_PROVIDER_PUBLISH_TIMEOUT_MSG: {
...
ProcessRecord app = (ProcessRecord)msg.obj;
synchronized (ActivityManagerService.this) {
processContentProviderPublishTimedOutLocked(app);
}
} break;
...
}
...
}
}
private final void processContentProviderPublishTimedOutLocked(ProcessRecord app) {
cleanupAppInLaunchingProvidersLocked(app, true);
removeProcessLocked(app, false, true, "timeout publishing content providers");
}
boolean cleanupAppInLaunchingProvidersLocked(ProcessRecord app, boolean alwaysBad) {
boolean restart = false;
for (int i = mLaunchingProviders.size() - 1; i >= 0; i--) {
ContentProviderRecord cpr = mLaunchingProviders.get(i);
if (cpr.launchingApp == app) {
if (!alwaysBad && !app.bad && cpr.hasConnectionOrHandle()) {
restart = true;
} else {
//移除死亡的provider
removeDyingProviderLocked(app, cpr, true);
}
}
}
return restart;
}
- 对于stable类型的provider(即conn.stableCount > 0),则会杀掉所有跟该provider建立stable连接的非persistent进程.
- 对于unstable类的provider(即conn.unstableCount > 0),并不会导致client进程被级联所杀.
总结
超时检测
Service超时检测机制:
-
超过一定时间没有执行完相应操作来触发移除延时消息,则会触发anr;
BroadcastReceiver超时检测机制: -
有序广播的总执行时间超过 2* receiver个数 * timeout时长,则会触发anr;
-
有序广播的某一个receiver执行过程超过 timeout时长,则会触发anr;
另外: -
对于Service, Broadcast, Input发生ANR之后,最终都会调用AMS.appNotResponding;
-
对于provider,在其进程启动时publish过程可能会出现ANR, 则会直接杀进程以及清理相应信息,而不会弹出ANR的对话框
-
对于input anr 可通过adb shell dumpsys input来查看手机当前的input状态, 输出内容分别为EventHub.dump(), InputReader.dump(),InputDispatcher.dump()这3类,另外如果发生过input ANR,那么也会输出上一个ANR的状态.
当ANR出现时,无论是四大组件还是进程等,都是调用到AMS.appNotResponding()方法,provider除外
- AMS.appNotResponding
final void appNotResponding(ProcessRecord app, ActivityRecord activity, ActivityRecord parent, boolean aboveSystem, final String annotation) {
...
updateCpuStatsNow(); //第一次 更新cpu统计信息
synchronized (this) {
//PowerManager.reboot() 会阻塞很长时间,因此忽略关机时的ANR
if (mShuttingDown) {
return;
} else if (app.notResponding) {
return;
} else if (app.crashing) {
return;
}
//记录ANR到EventLog
EventLog.writeEvent(EventLogTags.AM_ANR, app.userId, app.pid,
app.processName, app.info.flags, annotation);
// 将当前进程添加到firstPids
firstPids.add(app.pid);
int parentPid = app.pid;
//将system_server进程添加到firstPids
if (MY_PID != app.pid && MY_PID != parentPid) firstPids.add(MY_PID);
for (int i = mLruProcesses.size() - 1; i >= 0; i--) {
ProcessRecord r = mLruProcesses.get(i);
if (r != null && r.thread != null) {
int pid = r.pid;
if (pid > 0 && pid != app.pid && pid != parentPid && pid != MY_PID) {
if (r.persistent) {
firstPids.add(pid); //将persistent进程添加到firstPids
} else {
lastPids.put(pid, Boolean.TRUE); //其他进程添加到lastPids
}
}
}
}
}
// 记录ANR输出到main log
StringBuilder info = new StringBuilder();
info.setLength(0);
info.append("ANR in ").append(app.processName);
if (activity != null && activity.shortComponentName != null) {
info.append(" (").append(activity.shortComponentName).append(")");
}
info.append("\n");
info.append("PID: ").append(app.pid).append("\n");
if (annotation != null) {
info.append("Reason: ").append(annotation).append("\n");
}
if (parent != null && parent != activity) {
info.append("Parent: ").append(parent.shortComponentName).append("\n");
}
//创建CPU tracker对象
final ProcessCpuTracker processCpuTracker = new ProcessCpuTracker(true);
//输出traces信息
File tracesFile = dumpStackTraces(true, firstPids, processCpuTracker,
lastPids, NATIVE_STACKS_OF_INTEREST);
updateCpuStatsNow(); //第二次更新cpu统计信息
//记录当前各个进程的CPU使用情况
synchronized (mProcessCpuTracker) {
cpuInfo = mProcessCpuTracker.printCurrentState(anrTime);
}
//记录当前CPU负载情况
info.append(processCpuTracker.printCurrentLoad());
info.append(cpuInfo);
//记录从anr时间开始的Cpu使用情况
info.append(processCpuTracker.printCurrentState(anrTime));
//输出当前ANR的reason,以及CPU使用率、负载信息
Slog.e(TAG, info.toString());
//将traces文件 和 CPU使用率信息保存到dropbox,即data/system/dropbox目录
addErrorToDropBox("anr", app, app.processName, activity, parent, annotation,
cpuInfo, tracesFile, null);
synchronized (this) {
...
//后台ANR的情况, 则直接杀掉
if (!showBackground && !app.isInterestingToUserLocked() && app.pid != MY_PID) {
app.kill("bg anr", true);
return;
}
//设置app的ANR状态,病查询错误报告receiver
makeAppNotRespondingLocked(app,
activity != null ? activity.shortComponentName : null,
annotation != null ? "ANR " + annotation : "ANR",
info.toString());
//重命名trace文件
String tracesPath = SystemProperties.get("dalvik.vm.stack-trace-file", null);
if (tracesPath != null && tracesPath.length() != 0) {
//traceRenameFile = "/data/anr/traces.txt"
File traceRenameFile = new File(tracesPath);
String newTracesPath;
int lpos = tracesPath.lastIndexOf (".");
if (-1 != lpos)
// 新的traces文件= /data/anr/traces_进程名_当前日期.txt
newTracesPath = tracesPath.substring (0, lpos) + "_" + app.processName + "_" + mTraceDateFormat.format(new Date()) + tracesPath.substring (lpos);
else
newTracesPath = tracesPath + "_" + app.processName;
traceRenameFile.renameTo(new File(newTracesPath));
}
//弹出ANR对话框
Message msg = Message.obtain();
HashMap<String, Object> map = new HashMap<String, Object>();
msg.what = SHOW_NOT_RESPONDING_MSG;
msg.obj = map;
msg.arg1 = aboveSystem ? 1 : 0;
map.put("app", app);
if (activity != null) {
map.put("activity", activity);
}
//向ui线程发送,内容为SHOW_NOT_RESPONDING_MSG的消息
mUiHandler.sendMessage(msg);
}
}
ANR发生后日志和log信息
当ANR时,会按顺序依次执行:
- 输出ANR Reason信息到EventLog. 也就是说ANR触发的时间点最接近的就是EventLog中输出的am_anr信息;
- 收集并输出重要进程列表中的各个线程的traces信息,该方法较耗时;
- 输出当前各个进程的CPU使用情况以及CPU负载情况;
- 将traces文件和 CPU使用情况信息保存到dropbox,即data/system/dropbox目录
- 根据进程类型,来决定直接后台杀掉,还是弹框告知用户.
ANR输出重要进程的traces信息,这些重要进程包括:
-
firstPids队列:第一个是ANR进程,第二个是system_server,剩余是所有persistent进程;
-
Native队列:是指/system/bin/目录的mediaserver,sdcard 以及surfaceflinger进程;
-
lastPids队列: 是指mLruProcesses中的不属于firstPids的所有进程。
-
AMS.dumpStackTraces
public static File dumpStackTraces(boolean clearTraces, ArrayList<Integer> firstPids, ProcessCpuTracker processCpuTracker, SparseArray<Boolean> lastPids, String[] nativeProcs) {
//默认为 data/anr/traces.txt
String tracesPath = SystemProperties.get("dalvik.vm.stack-trace-file", null);
if (tracesPath == null || tracesPath.length() == 0) {
return null;
}
File tracesFile = new File(tracesPath);
try {
//当clearTraces,则删除已存在的traces文件
if (clearTraces && tracesFile.exists()) tracesFile.delete();
//创建traces文件
tracesFile.createNewFile();
FileUtils.setPermissions(tracesFile.getPath(), 0666, -1, -1);
} catch (IOException e) {
return null;
}
//输出trace内容
dumpStackTraces(tracesPath, firstPids, processCpuTracker, lastPids, nativeProcs);
return tracesFile;
}
//这里会保证data/anr/traces.txt文件内容是全新的方式,而非追加。
private static void dumpStackTraces(String tracesPath, ArrayList<Integer> firstPids, ProcessCpuTracker processCpuTracker, SparseArray<Boolean> lastPids, String[] nativeProcs) {
FileObserver observer = new FileObserver(tracesPath, FileObserver.CLOSE_WRITE) {
@Override
public synchronized void onEvent(int event, String path) { notify(); }
};
try {
observer.startWatching();
//首先,获取最重要进程的stacks
if (firstPids != null) {
try {
int num = firstPids.size();
for (int i = 0; i < num; i++) {
synchronized (observer) {
//向目标进程发送signal来输出traces
Process.sendSignal(firstPids.get(i), Process.SIGNAL_QUIT);
observer.wait(200); //等待直到写关闭,或者200ms超时
}
}
} catch (InterruptedException e) {
Slog.wtf(TAG, e);
}
}
//下一步,获取native进程的stacks
if (nativeProcs != null) {
int[] pids = Process.getPidsForCommands(nativeProcs);
if (pids != null) {
for (int pid : pids) {
//输出native进程的trace【见小节4】
Debug.dumpNativeBacktraceToFile(pid, tracesPath);
}
}
}
if (processCpuTracker != null) {
processCpuTracker.init();
System.gc();
processCpuTracker.update();
synchronized (processCpuTracker) {
processCpuTracker.wait(500); //等待500ms
}
//测量CPU使用情况
processCpuTracker.update();
//从lastPids中选取CPU使用率 top 5的进程,输出这些进程的stacks
final int N = processCpuTracker.countWorkingStats();
int numProcs = 0;
for (int i=0; i<N && numProcs<5; i++) {
ProcessCpuTracker.Stats stats = processCpuTracker.getWorkingStats(i);
if (lastPids.indexOfKey(stats.pid) >= 0) {
numProcs++;
synchronized (observer) {
Process.sendSignal(stats.pid, Process.SIGNAL_QUIT);
observer.wait(200);
}
}
}
}
} finally {
observer.stopWatching();
}
}
dumpStackTraces方法主要输出
- 收集firstPids进程的stacks;
第一个是发生ANR进程;
第二个是system_server;
mLruProcesses中所有的persistent进程; - 收集Native进程的stacks;(dumpNativeBacktraceToFile)
依次是mediaserver,sdcard,surfaceflinger进程; - 收集lastPids进程的stacks;;
依次输出CPU使用率top 5的进程;
触发ANR时系统会输出关键信息,依次打印各个进程信息和cpu的使用情况,将会比较耗时
- 最接近ANR发生时间的是am_anr信息,输出到EventLog,所以查看ANR的起点应该看EventLog信息
- 获取重要进程trace信息,保存到/data/anr/traces.txt;(会先删除老的文件、有的会重新命名trave文件)
- Java进程的traces;
- Native进程的traces;
- ANR reason以及cpu使用情况信息输出到main log
- 再讲cpu使用情况和进程trace文件信息保存到/data/system/dropbox;路径下
ANR简单案例分析
首先在页面中制造一个简单的ANR
anr_code.jpg当我们点击这个按钮的时候,就会触发ANR,然后在logcat中我们可以看到如下日志
anr_log.png anr_logcat.png
在traces文件中(位于data/anr)可以看到主线程的线程状态为SLEEPING
main_thread_anr.png
这只是一个非常简单的案例,实际问题的分析会相对复杂的多,但是通过分析log日志和trace日志,搜索一些关键字,来定位日志位置比如要排查的app进程包名,主线程名字“main”,和cpu状态等等可以获取一些有效的信息
网友评论