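Watchdog monitors whether system services have become stuck. If one has, it restarts the system services.
1. Starting Watchdog
SystemServer creates the Watchdog and starts its thread so that its run() method begins executing: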
final Watchdog watchdog = Watchdog.getInstance();
watchdog.init(context, mActivityManagerService);
...
Watchdog.getInstance().start();
2. Monitoring services
2.1 Registering a service for monitoring
A system service first registers itself with Watchdog:
Watchdog.getInstance().addMonitor(this);
Watchdog.getInstance().addThread(mHandler);
What is registered is stored inside Watchdog:
public void addMonitor(Monitor monitor) {
    // Add the monitor object to the Monitor Checker.
    // Watchdog's initialization shows that the Monitor Checker is itself a HandlerChecker object.
    mMonitors.add(monitor);
}
public void addThread(Handler thread, long timeoutMillis) {
    synchronized (this) {
        if (isAlive()) {
            throw new RuntimeException("Threads can't be added once the Watchdog is running");
        }
        final String name = thread.getLooper().getThread().getName();
        // Build a HandlerChecker object for the Handler; this is what is called a Looper Checker
        mHandlerCheckers.add(new HandlerChecker(thread, name, timeoutMillis));
    }
}
A HandlerChecker calls each Monitor to check the service's state, and decides the next step based on the result.
2.2 How Watchdog monitors
Watchdog keeps monitoring in a loop inside its own run() method:
public void run() {
    while (true) {
        synchronized (this) {
            long timeout = CHECK_INTERVAL;
            // 1. Start monitoring
            // Make sure we (re)spin the checkers that have become idle within
            // this wait-and-check interval
            for (int i=0; i<mHandlerCheckers.size(); i++) {
                HandlerChecker hc = mHandlerCheckers.get(i);
                hc.scheduleCheckLocked();
            }
            // 2. Give the monitored threads some time (30s)
            long start = SystemClock.uptimeMillis();
            while (timeout > 0) {
                ...
                try {
                    wait(timeout);
                } catch (InterruptedException e) {
                    Log.wtf(TAG, e);
                }
                ...
                timeout = CHECK_INTERVAL - (SystemClock.uptimeMillis() - start);
            }
            // 3. Check the completion state of the HandlerCheckers
            final int waitState = evaluateCheckerCompletionLocked();
            if (waitState == COMPLETED) {
                ...
                continue;
            } else if (waitState == WAITING) {
                ...
                continue;
            } else if (waitState == WAITED_HALF) {
                ...
                continue;
            }
            // 4. At least one HandlerChecker has timed out
            blockedCheckers = getBlockedCheckersLocked();
            subject = describeCheckersLocked(blockedCheckers);
            allowRestart = mAllowRestart;
        }
        ...
        // 5. Save logs and decide whether the system process needs to be killed
        Slog.w(TAG, "*** GOODBYE!");
        Process.killProcess(Process.myPid());
    }
}
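Step 2 of the loop recomputes the remaining timeout after every wake-up, so an early notify or an interrupt does not shorten the 30s interval. The following is a minimal plain-Java sketch of that pattern only, not the framework code. The class IntervalWaitDemo, its method names, and the 200 ms / 50 ms values are made up for illustration, and System.nanoTime() stands in for Android's SystemClock.uptimeMillis().

```java
final class IntervalWaitDemo {
    /**
     * Waits on 'lock' until at least intervalMillis have elapsed, even if a
     * notify or an interrupt wakes the thread early. Returns the elapsed
     * time in milliseconds.
     */
    static long waitFullInterval(Object lock, long intervalMillis) {
        final long startNanos = System.nanoTime();
        long elapsed = 0;
        boolean interrupted = false;
        synchronized (lock) {
            long remaining = intervalMillis;
            while (remaining > 0) {
                try {
                    lock.wait(remaining);
                } catch (InterruptedException e) {
                    interrupted = true; // remember the interrupt and keep waiting
                }
                // Recompute what is left of the interval after every wake-up.
                elapsed = (System.nanoTime() - startNanos) / 1_000_000;
                remaining = intervalMillis - elapsed;
            }
        }
        if (interrupted) {
            Thread.currentThread().interrupt(); // restore the flag for the caller
        }
        return elapsed;
    }

    /** Wakes the waiter early once, then reports how long it really waited. */
    static long runDemo() {
        final Object lock = new Object();
        Thread notifier = new Thread(() -> {
            try {
                Thread.sleep(50);
            } catch (InterruptedException ignored) {
                return;
            }
            synchronized (lock) {
                lock.notifyAll(); // early wake-up, well before 200 ms
            }
        }, "early-notifier");
        notifier.setDaemon(true);
        notifier.start();
        return waitFullInterval(lock, 200);
    }

    public static void main(String[] args) {
        System.out.println("waited ms: " + runDemo());
    }
}
```

The early notifyAll() wakes the waiter, but the loop sees that time remains and goes back to waiting, so the reported value should never be below the requested interval.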
2.3 What happens during a check
HandlerChecker's scheduleCheckLocked posts a task that executes the HandlerChecker's own run():
public void scheduleCheckLocked() {
    ...
    mStartTime = SystemClock.uptimeMillis();
    mHandler.postAtFrontOfQueue(this);
}
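Because the check is posted with postAtFrontOfQueue, it is placed ahead of messages that are already waiting in the monitored thread's queue. The toy model below only illustrates that ordering. It uses a plain ArrayDeque of strings rather than Android's MessageQueue, and the class name and message labels are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

final class FrontOfQueueDemo {
    /** Returns the order in which a simple queue would hand out its entries. */
    static List<String> drainOrder() {
        Deque<String> queue = new ArrayDeque<>();
        queue.addLast("message-1");       // ordinary posts go to the back
        queue.addLast("message-2");
        queue.addFirst("watchdog-check"); // front-of-queue post jumps ahead
        List<String> order = new ArrayList<>();
        while (!queue.isEmpty()) {
            order.add(queue.pollFirst());
        }
        return order;
    }

    public static void main(String[] args) {
        System.out.println(drainOrder()); // prints [watchdog-check, message-1, message-2]
    }
}
```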
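When that task runs, it calls monitor() on each monitored service:
public void run() {
    final int size = mMonitors.size();
    for (int i = 0 ; i < size ; i++) {
        synchronized (Watchdog.this) {
            mCurrentMonitor = mMonitors.get(i);
        }
        mCurrentMonitor.monitor();
    }
    // mCompleted is set to true here. If a deadlock or a long wait keeps this point
    // from being reached in time, the check will conclude that the service has a problem.
    synchronized (Watchdog.this) {
        mCompleted = true;
        mCurrentMonitor = null;
    }
}
A service's monitor() simply tries to acquire the service's lock:
public void monitor() {
    synchronized (this) { }
}
Why is acquiring a lock enough to monitor a service's state? A system service is called by many clients and has to handle multiple threads, so it inevitably relies on locks. If a deadlock occurs, or a call gets stuck while holding the lock, Watchdog cannot acquire the lock and can conclude that the service is in an abnormal state.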
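The lock-probe idea can be tried outside Android. The sketch below is a simplified stand-in, not the framework implementation: FakeService, probe, and the timeout values are hypothetical, and a CountDownLatch with a timeout plays the role of the mCompleted flag plus the Watchdog's waiting.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

final class LockProbeDemo {
    /** Hypothetical service that guards its state with a single lock. */
    static final class FakeService {
        private final Object lock = new Object();

        /** Same shape as the monitor() above: acquire the lock, then release it. */
        void monitor() {
            synchronized (lock) { }
        }

        /** Simulates a stuck call: holds the lock until 'release' is counted down. */
        void holdLockUntil(CountDownLatch acquired, CountDownLatch release) {
            synchronized (lock) {
                acquired.countDown();
                try {
                    release.await();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        }
    }

    /** Returns true if monitor() completed on a checker thread within the timeout. */
    static boolean probe(FakeService service, long timeoutMillis) {
        CountDownLatch done = new CountDownLatch(1);
        Thread checker = new Thread(() -> {
            service.monitor();
            done.countDown(); // plays the role of mCompleted = true
        }, "checker");
        checker.setDaemon(true);
        checker.start();
        try {
            return done.await(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    /** Probes the service once while idle and once while its lock is held. */
    static boolean[] runDemo() {
        FakeService service = new FakeService();
        boolean idleResponded = probe(service, 1000);

        CountDownLatch acquired = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);
        Thread holder = new Thread(() -> service.holdLockUntil(acquired, release), "holder");
        holder.setDaemon(true);
        holder.start();
        try {
            acquired.await(); // make sure the lock is really held before probing
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        boolean blockedResponded = probe(service, 200);
        release.countDown(); // let the stuck call finish
        return new boolean[] { idleResponded, blockedResponded };
    }

    public static void main(String[] args) {
        boolean[] result = runDemo();
        System.out.println("idle service responded: " + result[0]);
        System.out.println("blocked service responded: " + result[1]);
    }
}
```

The probe itself never touches the service's data. It only learns whether the lock could be taken in time, which is why an empty synchronized block is sufficient.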
2.4 Checking the state
The Handlers registered by the services (stored in Watchdog's mHandlerCheckers) are used to judge each service's state. The main question is whether the monitor methods completed; if they did not, the time spent so far is calculated.
private int evaluateCheckerCompletionLocked() {
    int state = COMPLETED;
    for (int i=0; i<mHandlerCheckers.size(); i++) {
        HandlerChecker hc = mHandlerCheckers.get(i);
        state = Math.max(state, hc.getCompletionStateLocked());
    }
    return state;
}
public int getCompletionStateLocked() {
    if (mCompleted) {
        return COMPLETED;
    } else {
        long latency = SystemClock.uptimeMillis() - mStartTime;
        if (latency < mWaitMax/2) {
            return WAITING;
        } else if (latency < mWaitMax) {
            return WAITED_HALF;
        }
    }
    return OVERDUE;
}
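The classification above can be modelled as two pure functions, which makes the thresholds easy to experiment with. This is a sketch under stated assumptions: the class CompletionStates is hypothetical, the constant values 0 to 3 are chosen only so that a larger number means a worse state (which the Math.max aggregation relies on), and the 60 000 ms maximum wait is an assumed example value.

```java
final class CompletionStates {
    // Assumed values: ordered so that Math.max picks the worst state.
    static final int COMPLETED = 0;
    static final int WAITING = 1;
    static final int WAITED_HALF = 2;
    static final int OVERDUE = 3;

    /** Classifies one checker from its completion flag and elapsed time. */
    static int stateOf(boolean completed, long latencyMillis, long waitMaxMillis) {
        if (completed) {
            return COMPLETED;
        }
        if (latencyMillis < waitMaxMillis / 2) {
            return WAITING;
        }
        if (latencyMillis < waitMaxMillis) {
            return WAITED_HALF;
        }
        return OVERDUE;
    }

    /** Aggregates several checker states into the worst one. */
    static int worst(int... states) {
        int state = COMPLETED;
        for (int s : states) {
            state = Math.max(state, s);
        }
        return state;
    }

    public static void main(String[] args) {
        long waitMax = 60_000; // assumed maximum wait, in milliseconds
        System.out.println(stateOf(true, 90_000, waitMax));  // 0: completed, latency ignored
        System.out.println(stateOf(false, 10_000, waitMax)); // 1: less than half the wait
        System.out.println(stateOf(false, 45_000, waitMax)); // 2: more than half the wait
        System.out.println(stateOf(false, 60_000, waitMax)); // 3: the full wait has elapsed
        System.out.println(worst(COMPLETED, WAITED_HALF, WAITING)); // 2
    }
}
```

A single slow checker is enough to raise the overall result, so one OVERDUE HandlerChecker makes the whole evaluation OVERDUE regardless of how many others completed.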
2.5 Handling a failure
A detected failure is handled in one of two ways:
(1) Kill the abnormal process
Watchdog: *** WATCHDOG KILLING SYSTEM PROCESS: XXX
Watchdog: XXX
Watchdog: *** GOODBYE!
(2) Reboot
Rebooting system because:xxx