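Anyone who uses Android apps has probably run into this annoying situation: in the middle of using an app, a dialog pops up and asks you to choose between "Wait" and "OK". Choosing OK force-closes the app you were using, while choosing Wait often just returns you to an app that still won't respond. This is the well-known ANR (Application Not Responding), and whenever it happens it seriously hurts the user experience.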
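So how does an ANR come about? Since the app has already stopped responding, the dialog cannot be shown by the app itself. It must be the system that notices the app has been unresponsive for some time, shows this dialog to inform the user, and waits for the user's choice. So let's search the Android source for the keywords ANR and Dialog:

```shell
grep -ri ANR ./ | grep -i dialog
```

Among the results is this line:

```
./base/services/core/java/com/android/server/am/AppNotRespondingDialog.java: app.anrDialog = null;
```

There are too many results to list here, but this is enough to establish that the class AppNotRespondingDialog.java is the ANR dialog we see.

Now we only need to find where this dialog is shown to reach the system logic that catches ANRs. Searching for the places where AppNotRespondingDialog is created turns up this method: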
```java
// services/core/java/com/android/server/am/ProcessRecord.java
void showAnrDialogs(AppNotRespondingDialog.Data data) {
    List<Context> contexts = getDisplayContexts(isSilentAnr() /* lastUsedOnly */);
    mAnrDialogs = new ArrayList<>();
    for (int i = contexts.size() - 1; i >= 0; i--) {
        final Context c = contexts.get(i);
        mAnrDialogs.add(new AppNotRespondingDialog(mService, c, data));
    }
    mService.mUiHandler.post(() -> {
        List<AppNotRespondingDialog> dialogs;
        synchronized (mService) {
            dialogs = mAnrDialogs;
        }
        if (dialogs != null) {
            forAllDialogs(dialogs, Dialog::show);
        }
    });
}
```
This is the code we were looking for. Next, we search for its callers by method name:
```java
// services/core/java/com/android/server/am/AppErrors.java
void handleShowAnrUi(Message msg) {
    ...
    if (mService.mAtmInternal.canShowErrorDialogs() || showBackground) {
        proc.getDialogController().showAnrDialogs(data);
    } else {
        MetricsLogger.action(mContext, MetricsProto.MetricsEvent.ACTION_APP_ANR,
                AppNotRespondingDialog.CANT_SHOW);
        // Just kill the app if there is no dialog to be shown.
        mService.killAppAtUsersRequest(proc);
    }
    ...
}
```
Following the chain further, we find that the dialog is triggered by a Handler message sent to ActivityManagerService (AMS):
```java
// services/core/java/com/android/server/am/ActivityManagerService.java
final class UiHandler extends Handler {
    ...
    @Override
    public void handleMessage(Message msg) {
        switch (msg.what) {
            ...
            case SHOW_NOT_RESPONDING_UI_MSG: {
                mAppErrors.handleShowAnrUi(msg);
                ensureBootCompleted();
            } break;
        }
        ...
    }
}
```
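The hop from whichever thread detects the ANR to the UI thread is the ordinary Looper/Handler pattern: producers enqueue a message, and a single thread takes messages off the queue and switches on `msg.what`. The sketch below models that in plain Java so it can run off-device. `Msg` and `MiniUiLoop` are hypothetical stand-ins for `android.os.Message` and a Looper-backed Handler, and the numeric value given to `SHOW_NOT_RESPONDING_UI_MSG` is only illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical stand-in for android.os.Message: only the fields this sketch needs.
final class Msg {
    final int what;
    final Object obj;

    Msg(int what, Object obj) {
        this.what = what;
        this.obj = obj;
    }
}

// Hypothetical single-threaded message loop modelling Looper + UiHandler.
final class MiniUiLoop {
    static final int SHOW_NOT_RESPONDING_UI_MSG = 2; // illustrative value
    private static final int QUIT = -1;

    private final BlockingQueue<Msg> queue = new LinkedBlockingQueue<>();
    final List<String> handled = new ArrayList<>(); // written only by the loop thread
    private final Thread thread = new Thread(this::loop, "mini-ui");

    void start() {
        thread.start();
    }

    // Callable from any thread, like Handler.sendMessage().
    void sendMessage(Msg msg) {
        queue.add(msg);
    }

    // Asks the loop to stop after draining earlier messages, then waits for it.
    void quitAndJoin() {
        queue.add(new Msg(QUIT, null));
        try {
            thread.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    private void loop() {
        try {
            while (true) {
                Msg msg = queue.take();
                if (msg.what == QUIT) {
                    return;
                }
                handleMessage(msg);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Mirrors UiHandler.handleMessage(): dispatch on msg.what.
    private void handleMessage(Msg msg) {
        switch (msg.what) {
            case SHOW_NOT_RESPONDING_UI_MSG:
                handled.add("show ANR UI for " + msg.obj);
                break;
            default:
                handled.add("ignored what=" + msg.what);
        }
    }
}
```

Routing everything through one thread means the dialog code always runs on the UI thread, in order of arrival; `showAnrDialogs()` relies on the same mechanism when it posts `Dialog::show` to `mUiHandler`.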
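Next we need to find who sends SHOW_NOT_RESPONDING_UI_MSG; it feels like we are getting close to the ANR-detection logic. The message is sent in the `appNotResponding(...)` method of ProcessRecord.java. This method is long: it dumps stack traces and other diagnostics and finally sends the dialog message. Right before sending it, the method checks `isSilentAnr()`, i.e. whether this is a "silent" ANR; for a silent ANR (unless the app is being debugged) the app process is killed directly and no dialog is shown. An excerpt of `appNotResponding(...)`: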
```java
void appNotResponding(String activityShortComponentName, ApplicationInfo aInfo,
        String parentShortComponentName, WindowProcessController parentProcess,
        boolean aboveSystem, String annotation, boolean onlyDumpSelf) {
    ...
    synchronized (mService) {
        // mBatteryStatsService can be null if the AMS is constructed with injector only. This
        // will only happen in tests.
        if (mService.mBatteryStatsService != null) {
            mService.mBatteryStatsService.noteProcessAnr(processName, uid);
        }

        if (isSilentAnr() && !isDebugging()) {
            kill("bg anr", ApplicationExitInfo.REASON_ANR, true);
            return;
        }

        // Set the app's notResponding state, and look up the errorReportReceiver
        makeAppNotRespondingLocked(activityShortComponentName,
                annotation != null ? "ANR " + annotation : "ANR", info.toString());

        // mUiHandler can be null if the AMS is constructed with injector only. This will only
        // happen in tests.
        if (mService.mUiHandler != null) {
            // Bring up the infamous App Not Responding dialog
            Message msg = Message.obtain();
            msg.what = ActivityManagerService.SHOW_NOT_RESPONDING_UI_MSG;
            msg.obj = new AppNotRespondingDialog.Data(this, aInfo, aboveSystem);

            mService.mUiHandler.sendMessage(msg);
        }
    }
}
```
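Putting the excerpts together, the path from `appNotResponding()` down to `handleShowAnrUi()` boils down to a small decision. The sketch below is a simplified summary, not AOSP code: `AnrFlowSketch` is a hypothetical name, its boolean parameters stand in for `isSilentAnr()`, `isDebugging()`, `mAtmInternal.canShowErrorDialogs()` and the `showBackground` flag whose computation is elided in the excerpt, and any other early returns hidden behind the `...` are ignored.

```java
// Hypothetical summary of: appNotResponding() -> UiHandler -> handleShowAnrUi().
final class AnrFlowSketch {
    enum Outcome { KILLED_SILENTLY, DIALOG_SHOWN, KILLED_NO_DIALOG }

    static Outcome decide(boolean silentAnr, boolean debugging,
                          boolean canShowErrorDialogs, boolean showBackground) {
        // appNotResponding(): a silent ANR of a process that is not being debugged
        // is killed on the spot ("bg anr"), before any message is posted.
        if (silentAnr && !debugging) {
            return Outcome.KILLED_SILENTLY;
        }
        // Otherwise SHOW_NOT_RESPONDING_UI_MSG reaches handleShowAnrUi().
        if (canShowErrorDialogs || showBackground) {
            return Outcome.DIALOG_SHOWN;
        }
        // "Just kill the app if there is no dialog to be shown."
        return Outcome.KILLED_NO_DIALOG;
    }
}
```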
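Even at this point there is still no sign of the conditions under which an ANR is triggered, so we look at who calls `appNotResponding(...)`. This time there are many call sites, which tells us we have reached the trigger logic and need to examine each caller. Setting aside indirect callers, the triggers fall into the following cases: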
- ContentProvider
ContentResolver exposes a method, `appNotRespondingViaProvider`. I could not locate its implementation, but judging by the name it raises an ANR. It is called from ContentProviderClient.java:
```java
private class NotRespondingRunnable implements Runnable {
    @Override
    public void run() {
        Log.w(TAG, "Detected provider not responding: " + mContentProvider);
        mContentResolver.appNotRespondingViaProvider(mContentProvider);
    }
}

private void beforeRemote() {
    if (mAnrRunnable != null) {
        sAnrHandler.postDelayed(mAnrRunnable, mAnrTimeout);
    }
}

/** See {@link ContentProvider#query ContentProvider.query} */
@Override
public @Nullable Cursor query(@NonNull Uri uri, @Nullable String[] projection,
        Bundle queryArgs, @Nullable CancellationSignal cancellationSignal)
        throws RemoteException {
    Objects.requireNonNull(uri, "url");
    beforeRemote();
    ...
}

/** See {@link ContentProvider#getType ContentProvider.getType} */
@Override
public @Nullable String getType(@NonNull Uri url) throws RemoteException {
    Objects.requireNonNull(url, "url");
    beforeRemote();
    ...
}
...
```
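So before calling `query()`, `getType()` and similar methods of the ContentProvider, the client first posts a delayed task (`mAnrRunnable`, delayed by `mAnrTimeout`); if that task gets to run, it reports the provider as not responding, which triggers the ANR.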
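This is a watchdog-timer pattern: arm a delayed report before a remote call and disarm it when the call returns. The excerpt only shows the arming side (`beforeRemote()`); as far as I know, ContentProviderClient pairs it with an `afterRemote()` that removes the pending runnable, and `mAnrRunnable` is only non-null once the caller has opted in via `setDetectNotResponding()`. The sketch below models the pattern with a `ScheduledExecutorService` instead of a Handler; `RemoteCallWatchdog` and its methods are hypothetical names.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

// Hypothetical model of the beforeRemote()/afterRemote() watchdog around provider calls.
final class RemoteCallWatchdog implements AutoCloseable {
    // Plays the role of sAnrHandler: one background thread that runs delayed tasks.
    private final ScheduledExecutorService anrHandler =
            Executors.newSingleThreadScheduledExecutor();
    private final long timeoutMillis;       // plays the role of mAnrTimeout
    private final Runnable onNotResponding; // plays the role of mAnrRunnable

    RemoteCallWatchdog(long timeoutMillis, Runnable onNotResponding) {
        this.timeoutMillis = timeoutMillis;
        this.onNotResponding = onNotResponding;
    }

    <T> T call(Supplier<T> remoteCall) {
        // beforeRemote(): arm the report before the possibly slow call.
        ScheduledFuture<?> pending =
                anrHandler.schedule(onNotResponding, timeoutMillis, TimeUnit.MILLISECONDS);
        try {
            return remoteCall.get();
        } finally {
            // afterRemote(): disarm it, so a call that returns in time never reports.
            pending.cancel(false);
        }
    }

    @Override
    public void close() {
        anrHandler.shutdownNow();
    }
}
```

Note that the watchdog does not interrupt the slow call; it only reports it, and the ANR itself is then handled by the system.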
- Input Dispatching Timed Out
AMS contains the following method:
```java
/**
 * Handle input dispatching timeouts.
 * @return whether input dispatching should be aborted or not.
 */
boolean inputDispatchingTimedOut(ProcessRecord proc, String activityShortComponentName,
        ApplicationInfo aInfo, String parentShortComponentName,
        WindowProcessController parentProcess, boolean aboveSystem, String reason) {
    ...
    if (proc != null) {
        synchronized (this) {
            if (proc.isDebugging()) {
                return false;
            }

            if (proc.getActiveInstrumentation() != null) {
                Bundle info = new Bundle();
                info.putString("shortMsg", "keyDispatchingTimedOut");
                info.putString("longMsg", annotation);
                finishInstrumentationLocked(proc, Activity.RESULT_CANCELED, info);
                return true;
            }
        }
        mAnrHelper.appNotResponding(proc, activityShortComponentName, aInfo,
                parentShortComponentName, parentProcess, aboveSystem, annotation);
    }

    return true;
}
```
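Following the callers all the way down, we end up at the code below in the native InputDispatcher. The `onAnrLocked(mAwaitedFocusedApplication);` call in it ultimately leads back to the AMS method we just found: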
```cpp
// Default input dispatching timeout if there is no focused application or paused window
// from which to determine an appropriate dispatching timeout.
constexpr std::chrono::nanoseconds DEFAULT_INPUT_DISPATCHING_TIMEOUT = 5s;

/**
 * Check if any of the connections' wait queues have events that are too old.
 * If we waited for events to be ack'ed for more than the window timeout, raise an ANR.
 * Return the time at which we should wake up next.
 */
nsecs_t InputDispatcher::processAnrsLocked() {
    const nsecs_t currentTime = now();
    nsecs_t nextAnrCheck = LONG_LONG_MAX;
    // Check if we are waiting for a focused window to appear. Raise ANR if waited too long
    if (mNoFocusedWindowTimeoutTime.has_value() && mAwaitedFocusedApplication != nullptr) {
        if (currentTime >= *mNoFocusedWindowTimeoutTime) {
            onAnrLocked(mAwaitedFocusedApplication);
            mAwaitedFocusedApplication.clear();
            return LONG_LONG_MIN;
        } else {
            // Keep waiting
            const nsecs_t millisRemaining = ns2ms(*mNoFocusedWindowTimeoutTime - currentTime);
            ALOGW("Still no focused window. Will drop the event in %" PRId64 "ms", millisRemaining);
            nextAnrCheck = *mNoFocusedWindowTimeoutTime;
        }
    }

    // Check if any connection ANRs are due
    nextAnrCheck = std::min(nextAnrCheck, mAnrTracker.firstTimeout());
    if (currentTime < nextAnrCheck) { // most likely scenario
        return nextAnrCheck;          // everything is normal. Let's check again at nextAnrCheck
    }

    // If we reached here, we have an unresponsive connection.
    sp<Connection> connection = getConnectionLocked(mAnrTracker.firstToken());
    if (connection == nullptr) {
        ALOGE("Could not find connection for entry %" PRId64, mAnrTracker.firstTimeout());
        return nextAnrCheck;
    }
    connection->responsive = false;
    // Stop waking up for this unresponsive connection
    mAnrTracker.eraseToken(connection->inputChannel->getConnectionToken());
    onAnrLocked(connection);
    return LONG_LONG_MIN;
}
```
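In other words, once an input event (such as a key press) has been dispatched, if the target window does not acknowledge it within the timeout (5 s by default), an ANR is raised against that app's process. The excerpt covers two cases: waiting too long for a focused window to appear, and a connection whose events have gone unacknowledged past their deadline.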
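The connection branch of `processAnrsLocked()` is a deadline tracker: each event sent to a window is recorded with its send time plus the timeout, the earliest deadline decides when the dispatcher next has to wake up, and once that deadline passes the owning connection is reported exactly once. The Java sketch below models only that branch, not the no-focused-window one. `AnrTrackerSketch` is hypothetical, a `TreeSet` ordered by deadline stands in for the native AnrTracker, a fixed 5 s timeout is assumed for every window, and `Long.MIN_VALUE`/`Long.MAX_VALUE` play the roles of `LONG_LONG_MIN`/`LONG_LONG_MAX`.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.TreeSet;

// Hypothetical Java model of AnrTracker plus the connection branch of processAnrsLocked().
final class AnrTrackerSketch {
    static final long TIMEOUT_NANOS = 5_000_000_000L; // assumed fixed 5 s per window

    private static final class Entry {
        final long deadline;
        final String token;

        Entry(long deadline, String token) {
            this.deadline = deadline;
            this.token = token;
        }
    }

    // Ordered by deadline, then token, like a sorted set of (timeout, token) pairs.
    private final TreeSet<Entry> pending = new TreeSet<>(
            Comparator.comparingLong((Entry e) -> e.deadline).thenComparing(e -> e.token));
    final List<String> reported = new ArrayList<>();

    // An event was sent to the window identified by token and now awaits an ack.
    void insert(long sentAtNanos, String token) {
        pending.add(new Entry(sentAtNanos + TIMEOUT_NANOS, token));
    }

    // The window acknowledged that event.
    void erase(long sentAtNanos, String token) {
        pending.remove(new Entry(sentAtNanos + TIMEOUT_NANOS, token));
    }

    /** Returns when to check next; Long.MIN_VALUE means "re-check immediately". */
    long processAnrs(long nowNanos) {
        if (pending.isEmpty()) {
            return Long.MAX_VALUE; // nothing is waiting for an ack
        }
        long nextAnrCheck = pending.first().deadline;
        if (nowNanos < nextAnrCheck) {
            return nextAnrCheck; // most likely case: everything is healthy
        }
        // The oldest outstanding event is overdue: report its connection once and drop
        // all of its entries (like eraseToken()) so we stop waking up for it.
        String token = pending.first().token;
        pending.removeIf(e -> e.token.equals(token));
        reported.add(token);
        return Long.MIN_VALUE;
    }
}
```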
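- Broadcast Timed Out
The class BroadcastQueue.java maintains the broadcast queues, dispatches broadcasts, and tracks how each one is being processed. The following method is invoked repeatedly to check the state of the broadcast currently being processed: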
```java
final void broadcastTimeoutLocked(boolean fromMsg) {
    ...
    BroadcastRecord r = mDispatcher.getActiveBroadcastLocked();
    if (fromMsg) {
        if (!mService.mProcessesReady) {
            // Only process broadcast timeouts if the system is ready; some early
            // broadcasts do heavy work setting up system facilities
            return;
        }

        // If the broadcast is generally exempt from timeout tracking, we're done
        if (r.timeoutExempt) {
            if (DEBUG_BROADCAST) {
                Slog.i(TAG_BROADCAST, "Broadcast timeout but it's exempt: "
                        + r.intent.getAction());
            }
            return;
        }

        long timeoutTime = r.receiverTime + mConstants.TIMEOUT;
        if (timeoutTime > now) {
            // We can observe premature timeouts because we do not cancel and reset the
            // broadcast timeout message after each receiver finishes. Instead, we set up
            // an initial timeout then kick it down the road a little further as needed
            // when it expires.
            if (DEBUG_BROADCAST) Slog.v(TAG_BROADCAST,
                    "Premature timeout ["
                    + mQueueName + "] @ " + now + ": resetting BROADCAST_TIMEOUT_MSG for "
                    + timeoutTime);
            setBroadcastTimeoutLocked(timeoutTime);
            return;
        }
    }
    ...
    if (!debugging && anrMessage != null) {
        mService.mAnrHelper.appNotResponding(app, anrMessage);
    }
}
```
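When a broadcast is handed to a receiver, a timeout is scheduled; when it fires, this method checks whether the receiver has taken too long, and if so an ANR is raised against the receiving app's process. The timeout (`mConstants.TIMEOUT`) is 10 seconds for the foreground broadcast queue and 60 seconds for the background queue.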
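The comment in the excerpt describes a lazy-timer strategy: instead of cancelling and re-arming the timeout for every receiver, BroadcastQueue keeps one pending timeout, and when it fires early it simply moves it to `receiverTime + TIMEOUT`. The sketch below reproduces that bookkeeping with explicit millisecond timestamps instead of a Handler; `BroadcastTimeoutSketch` and its fields are hypothetical, and `TIMEOUT_MS` uses the foreground value of 10 s.

```java
// Hypothetical model of BroadcastQueue's "kick the timeout down the road" strategy.
final class BroadcastTimeoutSketch {
    static final long TIMEOUT_MS = 10_000; // foreground queue; background would be 60_000

    long receiverTime;       // when the current receiver was handed the broadcast
    long scheduledTimeoutAt; // when the single pending timeout "message" fires
    boolean anrRaised;

    // Arms the one and only timeout when dispatch starts.
    void armInitialTimeout(long now) {
        scheduledTimeoutAt = now + TIMEOUT_MS;
    }

    // Handing the broadcast to a receiver only records the start time;
    // the pending timeout is deliberately NOT cancelled or re-armed here.
    void deliverToReceiver(long now) {
        receiverTime = now;
    }

    // Corresponds to broadcastTimeoutLocked(fromMsg = true).
    void onTimeoutMessage(long now) {
        long timeoutTime = receiverTime + TIMEOUT_MS;
        if (timeoutTime > now) {
            // Premature: the current receiver started later, so push the deadline out.
            scheduledTimeoutAt = timeoutTime;
            return;
        }
        anrRaised = true; // the real code ends in mAnrHelper.appNotResponding(app, anrMessage)
    }
}
```

The cost is an occasional early wake-up; the benefit is that handing the broadcast to each receiver never has to cancel and repost the timeout message.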
- Service Time Out
```java
// How long we wait for a service to finish executing.
static final int SERVICE_TIMEOUT = 20*1000;

// How long we wait for a service to finish executing.
static final int SERVICE_BACKGROUND_TIMEOUT = SERVICE_TIMEOUT * 10;
...
void serviceTimeout(ProcessRecord proc) {
    String anrMessage = null;
    synchronized(mAm) {
        if (proc.isDebugging()) {
            // The app's being debugged, ignore timeout.
            return;
        }
        if (proc.executingServices.size() == 0 || proc.thread == null) {
            return;
        }
        final long now = SystemClock.uptimeMillis();
        final long maxTime = now -
                (proc.execServicesFg ? SERVICE_TIMEOUT : SERVICE_BACKGROUND_TIMEOUT);
        ServiceRecord timeout = null;
        long nextTime = 0;
        for (int i=proc.executingServices.size()-1; i>=0; i--) {
            ServiceRecord sr = proc.executingServices.valueAt(i);
            if (sr.executingStart < maxTime) {
                timeout = sr;
                break;
            }
            if (sr.executingStart > nextTime) {
                nextTime = sr.executingStart;
            }
        }
        if (timeout != null && mAm.mProcessList.mLruProcesses.contains(proc)) {
            Slog.w(TAG, "Timeout executing service: " + timeout);
            StringWriter sw = new StringWriter();
            PrintWriter pw = new FastPrintWriter(sw, false, 1024);
            pw.println(timeout);
            timeout.dump(pw, "    ");
            pw.close();
            mLastAnrDump = sw.toString();
            mAm.mHandler.removeCallbacks(mLastAnrDumpClearer);
            mAm.mHandler.postDelayed(mLastAnrDumpClearer, LAST_ANR_LIFETIME_DURATION_MSECS);
            anrMessage = "executing service " + timeout.shortInstanceName;
        } else {
            Message msg = mAm.mHandler.obtainMessage(
                    ActivityManagerService.SERVICE_TIMEOUT_MSG);
            msg.obj = proc;
            mAm.mHandler.sendMessageAtTime(msg, proc.execServicesFg
                    ? (nextTime+SERVICE_TIMEOUT) : (nextTime + SERVICE_BACKGROUND_TIMEOUT));
        }
    }

    if (anrMessage != null) {
        mAm.mAnrHelper.appNotResponding(proc, anrMessage);
    }
}
```
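Roughly, this method compares the time at which each Service lifecycle callback started executing with the current time. The limit is 20 s when the process is executing services in the foreground (`proc.execServicesFg`) and ten times that, 200 s, otherwise; if a callback exceeds the limit, an ANR is raised. If none has, `SERVICE_TIMEOUT_MSG` is re-posted to check again later.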
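The timing arithmetic in `serviceTimeout()` is easy to check by hand. The sketch below extracts just that part: given the start times of the service callbacks still executing, it either reports a timeout or returns the time at which `SERVICE_TIMEOUT_MSG` would be re-posted. `ServiceTimeoutSketch` and its `-1` return value are hypothetical; the real method additionally skips debugged processes, returns early when nothing is executing, and requires the process to be in the LRU list before raising the ANR.

```java
import java.util.List;

// Hypothetical extraction of the timing check in ActiveServices.serviceTimeout().
final class ServiceTimeoutSketch {
    static final long SERVICE_TIMEOUT = 20 * 1000;                       // foreground
    static final long SERVICE_BACKGROUND_TIMEOUT = SERVICE_TIMEOUT * 10; // background

    /**
     * @param executingStarts uptime (ms) at which each still-running callback started;
     *                        assumed non-empty, as the real method returns early otherwise
     * @return -1 if some callback has timed out, else the time of the next check
     */
    static long check(List<Long> executingStarts, boolean foreground, long now) {
        long timeout = foreground ? SERVICE_TIMEOUT : SERVICE_BACKGROUND_TIMEOUT;
        long maxTime = now - timeout;
        long nextTime = 0;
        for (int i = executingStarts.size() - 1; i >= 0; i--) {
            long start = executingStarts.get(i);
            if (start < maxTime) {
                return -1; // started more than `timeout` ago: ANR
            }
            // As in the excerpt, the next check is based on the latest start seen.
            nextTime = Math.max(nextTime, start);
        }
        return nextTime + timeout;
    }
}
```

For example, a foreground callback that started at t = 1 s is still within its limit at t = 15 s but has timed out at t = 25 s, while the same timings in the background only schedule the next check for t = 201 s.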
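Based on this walk through the source, ANRs are triggered in four situations:
- ContentProvider
- Input Dispatching Timed Out
- Broadcast Timed Out
- Service Time Out
So when writing code, never perform time-consuming operations (such as disk or network I/O or heavy computation) in any of these four execution paths.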