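Matrix is an APM (application performance monitoring) framework that Tencent open-sourced around 2018. It covers Android, iOS and macOS and is fairly feature-complete. This post is a brief study of it.
Project: https://github.com/Tencent/matrix
Docs: https://github.com/Tencent/matrix/wiki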
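1. Overall Matrix framework
The core of Matrix is made up of modules such as I/O monitoring, memory-leak monitoring and smoothness (jank) monitoring; they are not all listed here. Each module is a plugin class registered with and managed by Matrix, an architecture that makes plugins easy to hot-plug.
Next, let's pick a few of the more interesting Plugins and study them in detail.
2. TracePlugin analysis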
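To make the plugin idea concrete, here is a minimal sketch of a plugin registry in plain Java. ApmPlugin, BasePlugin, PluginManager and the two sample plugins are hypothetical and heavily simplified; Matrix's real Plugin class has a richer lifecycle. The by-class lookup imitates the Matrix.with().getPluginByClass(ResourcePlugin.class) call that appears later in the ResourcePlugin code.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical plugin contract, much simpler than Matrix's real Plugin class.
interface ApmPlugin {
    String tag();
    void start();
    void stop();
    boolean isStarted();
}

// Shared started/stopped bookkeeping for concrete plugins.
abstract class BasePlugin implements ApmPlugin {
    private volatile boolean started;

    @Override public void start() { started = true; }
    @Override public void stop() { started = false; }
    @Override public boolean isStarted() { return started; }
}

final class TraceLikePlugin extends BasePlugin {
    @Override public String tag() { return "trace"; }
}

final class ResourceLikePlugin extends BasePlugin {
    @Override public String tag() { return "memory"; }
}

// Central registry: each plugin is registered and looked up by its class,
// so plugins can be added, started or stopped independently of one another.
final class PluginManager {
    private final Map<Class<? extends ApmPlugin>, ApmPlugin> plugins = new LinkedHashMap<>();

    PluginManager add(ApmPlugin plugin) {
        plugins.put(plugin.getClass(), plugin);
        return this;
    }

    <T extends ApmPlugin> T get(Class<T> cls) {
        return cls.cast(plugins.get(cls));
    }

    void startAll() { plugins.values().forEach(ApmPlugin::start); }

    void stopAll() { plugins.values().forEach(ApmPlugin::stop); }
}
```

Stopping one plugin leaves the others running, which is the practical meaning of "hot-pluggable" here.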
Official wiki: https://github.com/Tencent/matrix/wiki/Matrix-Android-TraceCanary
2.1 Overall framework
Class responsibilities:
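LooperMonitor: listens to Looper messages.
UIThreadMonitor: main-thread monitor; its main job is to dispatch the callbacks coming from LooperMonitor.
AppMethodBeat: app-side utility class that records method execution (enter/exit timestamps of instrumented methods).
Tracer
- AnrTracer: ANR monitoring.
- FrameTracer: frame monitoring.
- StartupTracer: startup monitoring (cold and warm start).
- EvilMethodTracer: slow-method monitoring.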
2.2 Implementation details
2.2.1 TracePlugin's main-thread monitoring scheme
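The direct cause of dropped frames and jank is usually that the main thread is busy with time-consuming work such as heavy UI rendering, heavy computation or IO. There are two common main-thread monitoring schemes:
- Set a Printer on the main thread's Looper and measure how long each dispatchMessage takes. (Open-source project using this scheme: BlockCanary)
- Register a FrameCallback with Choreographer and measure the interval between two consecutive Vsync notifications. (Open-source project using this scheme: ArgusAPM)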
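The second scheme can be sketched without Android dependencies. On a device the timestamps would come from the frameTimeNanos argument of Choreographer.FrameCallback.doFrame; here they are passed in as an array. VsyncGapAnalyzer is a hypothetical helper, and the usage assumes a 60 Hz display (about 16.67 ms per frame).

```java
import java.util.ArrayList;
import java.util.List;

// Estimates skipped frames from consecutive vsync timestamps, as a
// Choreographer.FrameCallback-based monitor would receive them.
final class VsyncGapAnalyzer {
    private final long frameIntervalNanos;

    VsyncGapAnalyzer(long frameIntervalNanos) {
        this.frameIntervalNanos = frameIntervalNanos;
    }

    // For each pair of adjacent frames, returns how many frames were skipped:
    // a gap of one interval is normal, a gap of n intervals means n - 1 skipped.
    List<Long> skippedFrames(long[] frameTimesNanos) {
        List<Long> skipped = new ArrayList<>();
        for (int i = 1; i < frameTimesNanos.length; i++) {
            long gap = frameTimesNanos[i] - frameTimesNanos[i - 1];
            // Round rather than truncate so timer jitter does not hide a frame.
            long intervals = Math.round((double) gap / frameIntervalNanos);
            skipped.add(Math.max(0L, intervals - 1));
        }
        return skipped;
    }
}
```

Unlike the Printer scheme, this measures what the user actually sees (missed vsyncs), but it cannot tell which message caused the gap.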
First, the source of Looper.loop():
for (;;) {
Message msg = queue.next(); // might block
if (msg == null) {
// No message indicates that the message queue is quitting.
return;
}
// This must be in a local variable, in case a UI event sets the logger
final Printer logging = me.mLogging;
if (logging != null) {
logging.println(">>>>> Dispatching to " + msg.target + " " +
msg.callback + ": " + msg.what);
}
...
try {
msg.target.dispatchMessage(msg);
} finally {
if (traceTag != 0) {
Trace.traceEnd(traceTag);
}
}
if (logging != null) {
logging.println("<<<<< Finished to " + msg.target + " " + msg.callback);
}
...
}
queue.next() returns a message, and the Printer logs one line before and one line after msg.target.dispatchMessage(msg).
Core code:
LooperMonitor.java
private synchronized void resetPrinter() {
...
looper.setMessageLogging(printer = new LooperPrinter(originPrinter));
...
}
class LooperPrinter implements Printer {
public Printer origin;
boolean isHasChecked = false;
boolean isValid = false;
LooperPrinter(Printer printer) {
this.origin = printer;
}
@Override
public void println(String x) {
...
if (isValid) {
// x.charAt(0) == '>' matches the ">>>>> Dispatching to" line: if true, onDispatchBegin is called, otherwise onDispatchEnd
dispatch(x.charAt(0) == '>', x);
}
}
}
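All subsequent tracers use the onDispatchBegin and onDispatchEnd callbacks from LooperPrinter to observe the start and end of every main-thread message dispatch.
2.2.2 FrameTracer: dropped-frame monitoring
LooperMonitor installs a LooperPrinter on the main thread's Looper and reports the before/after-dispatch events to UIThreadMonitor through a listener; UIThreadMonitor in turn calls FrameTracer's doFrame.
Core code: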
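A plain-Java sketch of the Printer trick follows. Since android.util.Printer is not available off-device, LinePrinter stands in for it; DispatchTimingPrinter and DispatchListener are hypothetical names. Only the '>'/'<' first-character check is taken from LooperPrinter above, and the clock is injected so the timing can be tested.

```java
import java.util.function.LongSupplier;

// Stand-in for android.util.Printer so the sketch runs on a plain JVM.
interface LinePrinter {
    void println(String x);
}

// Callbacks around each main-thread message dispatch.
interface DispatchListener {
    void onDispatchBegin(String log);
    void onDispatchEnd(String log, long costMs);
}

// Lines starting with '>' mark the start of dispatchMessage(),
// lines starting with '<' mark its end.
final class DispatchTimingPrinter implements LinePrinter {
    private final LinePrinter origin;       // previously installed printer, may be null
    private final DispatchListener listener;
    private final LongSupplier clockMs;     // injectable clock
    private long beginMs = -1;

    DispatchTimingPrinter(LinePrinter origin, DispatchListener listener, LongSupplier clockMs) {
        this.origin = origin;
        this.listener = listener;
        this.clockMs = clockMs;
    }

    @Override
    public void println(String x) {
        if (origin != null) {
            origin.println(x); // keep any existing logger working
        }
        if (x == null || x.isEmpty()) {
            return;
        }
        char c = x.charAt(0);
        if (c == '>') {
            beginMs = clockMs.getAsLong();
            listener.onDispatchBegin(x);
        } else if (c == '<' && beginMs >= 0) {
            listener.onDispatchEnd(x, clockMs.getAsLong() - beginMs);
            beginMs = -1;
        }
    }
}
```

On a device, an adapter implementing android.util.Printer would forward to this class and be installed with Looper.getMainLooper().setMessageLogging(...). Forwarding to the original printer is presumably also why LooperPrinter keeps its origin field.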
/**
* @param focusedActivityName the activity that currently has focus
* @param start dispatch start
* @param end dispatch end
* @param frameCostMs 0
* @param inputCostNs
* @param animationCostNs
* @param traversalCostNs
*/
public void doFrame(String focusedActivityName, long start, long end, long frameCostMs, long inputCostNs, long animationCostNs, long traversalCostNs) {
if (isForeground()) {
notifyListener(focusedActivityName, end - start, frameCostMs, frameCostMs >= 0);
}
}
private void notifyListener(final String visibleScene, final long taskCostMs, final long frameCostMs, final boolean isContainsFrame) {
...
// convert the dispatch time of one message into a number of frames
final int dropFrame = (int) (taskCostMs / frameIntervalMs);
...
}
}
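dropFrame converts the dispatch time of a single mainLooper message into a number of frames. I have some doubts about this approach: it seems the statistic should only count messages during which the UI is actually being drawn.
Reported data: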
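The frame arithmetic and the dropLevel/dropSum buckets in the report below can be sketched as follows. The threshold values are illustrative assumptions, not necessarily Matrix's defaults, and reading dropLevel as "number of messages per level" and dropSum as "frames dropped per level" is my interpretation of the report fields. DropFrameStats is a hypothetical class.

```java
import java.util.EnumMap;
import java.util.Map;

// Bucket labels taken from the report's "dropLevel" keys.
enum DropLevel { DROPPED_BEST, DROPPED_NORMAL, DROPPED_MIDDLE, DROPPED_HIGH, DROPPED_FROZEN }

final class DropFrameStats {
    // Illustrative thresholds in dropped frames (assumed values).
    static final int NORMAL = 3, MIDDLE = 9, HIGH = 24, FROZEN = 42;

    private final long frameIntervalMs;
    final Map<DropLevel, Integer> dropLevel = new EnumMap<>(DropLevel.class); // messages per level
    final Map<DropLevel, Integer> dropSum = new EnumMap<>(DropLevel.class);   // frames per level

    DropFrameStats(long frameIntervalMs) {
        this.frameIntervalMs = frameIntervalMs;
        for (DropLevel l : DropLevel.values()) {
            dropLevel.put(l, 0);
            dropSum.put(l, 0);
        }
    }

    // Same integer division as notifyListener: task duration / frame interval.
    static int dropFrames(long taskCostMs, long frameIntervalMs) {
        return (int) (taskCostMs / frameIntervalMs);
    }

    static DropLevel levelOf(int dropFrame) {
        if (dropFrame >= FROZEN) return DropLevel.DROPPED_FROZEN;
        if (dropFrame >= HIGH) return DropLevel.DROPPED_HIGH;
        if (dropFrame >= MIDDLE) return DropLevel.DROPPED_MIDDLE;
        if (dropFrame >= NORMAL) return DropLevel.DROPPED_NORMAL;
        return DropLevel.DROPPED_BEST;
    }

    // Record one dispatched message.
    void record(long taskCostMs) {
        int dropped = dropFrames(taskCostMs, frameIntervalMs);
        DropLevel level = levelOf(dropped);
        dropLevel.merge(level, 1, Integer::sum);
        dropSum.merge(level, dropped, Integer::sum);
    }
}
```

With a 16 ms interval, an 800 ms message counts as 50 dropped frames, even if no frame was due during that time; this is exactly the concern raised above.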
{
"machine":"BEST",
"cpu_app":0,
"mem":7870578688,
"mem_free":4052752,
"scene":"com.stan.matrixdemo.MainActivity",
"dropLevel":{
"DROPPED_FROZEN":1,
"DROPPED_HIGH":0,
"DROPPED_MIDDLE":0,
"DROPPED_NORMAL":0,
"DROPPED_BEST":0
},
"dropSum":{
"DROPPED_FROZEN":18,
"DROPPED_HIGH":0,
"DROPPED_MIDDLE":0,
"DROPPED_NORMAL":0,
"DROPPED_BEST":0
},
"fps":3.1645569801330566,
"dropTaskFrameSum":0
}
2.2.3 AnrTracer: ANR monitoring
Core code:
AnrTracer.java
public void dispatchBegin(long beginMs, long cpuBeginMs, long token) {
super.dispatchBegin(beginMs, cpuBeginMs, token);
anrTask = new AnrHandleTask(AppMethodBeat.getInstance().maskIndex("AnrTracer#dispatchBegin"), token);
// run anrTask once DEFAULT_ANR (5 s) has elapsed since token
anrHandler.postDelayed(anrTask, Constants.DEFAULT_ANR - (SystemClock.uptimeMillis() - token));
}
public void dispatchEnd(long beginMs, long cpuBeginMs, long endMs, long cpuEndMs, long token, boolean isBelongFrame) {
super.dispatchEnd(beginMs, cpuBeginMs, endMs, cpuEndMs, token, isBelongFrame);
if (null != anrTask) {
// cancel anrTask
anrTask.getBeginRecord().release();
anrHandler.removeCallbacks(anrTask);
}
}
This puts a 5 s watchdog on every main-thread message.
AnrHandleTask.run():
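The arm/disarm pattern itself does not depend on Android. Below is a JVM sketch that uses a ScheduledExecutorService in place of anrHandler's postDelayed/removeCallbacks; DispatchWatchdog is a hypothetical name. Like AnrTracer, it subtracts the time already elapsed since the token from the timeout.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

// Arm a delayed task when a message starts dispatching, disarm it when dispatch
// ends. If the task ever runs, the message has been running past the timeout.
final class DispatchWatchdog {
    private final ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
    private final long timeoutMs;
    private final Runnable onTimeout;
    private ScheduledFuture<?> pending;

    DispatchWatchdog(long timeoutMs, Runnable onTimeout) {
        this.timeoutMs = timeoutMs;
        this.onTimeout = onTimeout;
    }

    // tokenMs: when the message started; time already spent is deducted,
    // mirroring DEFAULT_ANR - (uptimeMillis() - token).
    synchronized void dispatchBegin(long tokenMs, long nowMs) {
        dispatchEnd(); // drop any stale task from a previous message
        long delay = Math.max(0L, timeoutMs - (nowMs - tokenMs));
        pending = scheduler.schedule(onTimeout, delay, TimeUnit.MILLISECONDS);
    }

    synchronized void dispatchEnd() {
        if (pending != null) {
            pending.cancel(false);
            pending = null;
        }
    }

    void shutdown() {
        scheduler.shutdownNow();
    }
}
```

In a real monitor, onTimeout would collect the main thread's stack, memory and CPU state, as AnrHandleTask does.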
@Override
public void run() {
long curTime = SystemClock.uptimeMillis();
boolean isForeground = isForeground();
// process
int[] processStat = Utils.getProcessPriority(Process.myPid());
long[] data = AppMethodBeat.getInstance().copyData(beginRecord);
beginRecord.release();
String scene = AppMethodBeat.getVisibleScene();
// memory
long[] memoryInfo = dumpMemory();
// Thread state
Thread.State status = Looper.getMainLooper().getThread().getState();
StackTraceElement[] stackTrace = Looper.getMainLooper().getThread().getStackTrace();
String dumpStack = Utils.getStack(stackTrace, "|*\t\t", 12);
// frame
UIThreadMonitor monitor = UIThreadMonitor.getMonitor();
long inputCost = monitor.getQueueCost(UIThreadMonitor.CALLBACK_INPUT, token);
long animationCost = monitor.getQueueCost(UIThreadMonitor.CALLBACK_ANIMATION, token);
long traversalCost = monitor.getQueueCost(UIThreadMonitor.CALLBACK_TRAVERSAL, token);
// trace
LinkedList<MethodItem> stack = new LinkedList();
if (data.length > 0) {
TraceDataUtils.structuredDataToStack(data, stack, true, curTime);
TraceDataUtils.trimStack(stack, Constants.TARGET_EVIL_METHOD_STACK, new TraceDataUtils.IStructuredDataFilter() {
@Override
public boolean isFilter(long during, int filterCount) {
return during < filterCount * Constants.TIME_UPDATE_CYCLE_MS;
}
@Override
public int getFilterMaxCount() {
return Constants.FILTER_STACK_MAX_COUNT;
}
@Override
public void fallback(List<MethodItem> stack, int size) {
MatrixLog.w(TAG, "[fallback] size:%s targetSize:%s stack:%s", size, Constants.TARGET_EVIL_METHOD_STACK, stack);
Iterator iterator = stack.listIterator(Math.min(size, Constants.TARGET_EVIL_METHOD_STACK));
while (iterator.hasNext()) {
iterator.next();
iterator.remove();
}
}
});
}
StringBuilder reportBuilder = new StringBuilder();
StringBuilder logcatBuilder = new StringBuilder();
long stackCost = Math.max(Constants.DEFAULT_ANR, TraceDataUtils.stackToString(stack, reportBuilder, logcatBuilder));
// stackKey
...
// report
...
}
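Running this task is, in effect, the ANR dump. A caveat, though: an input ANR only happens when a second event arrives and stays blocked behind the first one in the wait queue for 5 s, and vendors often customize the ANR timeout, so it is not necessarily 5 s. It is therefore more accurate to say that AnrTracer detects main-thread messages whose dispatch takes longer than 5 s.
Test case: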
public class IssueActivity extends AppCompatActivity {
@Override
protected void onPause() {
super.onPause();
evilMethod();
}
public void evilMethod() {
try {
Thread.sleep(3000);
} catch (InterruptedException e) {
e.printStackTrace();
}
}
}
Reported data:
{
"machine":"BEST",
"cpu_app":0,
"mem":7870578688,
"mem_free":3806516,
"detail":"ANR",
"cost":5000,
"stackKey":"",
"scene":"com.stan.matrixdemo.IssueActivity",
"stack":"",
"threadStack":" at com.stan.matrixdemo.IssueActivity:evilMethod(29) at com.stan.matrixdemo.IssueActivity:onPause(23) at android.app.Activity:performPause(8097) at android.app.Instrumentation:callActivityOnPause(1508) at android.app.ActivityThread:performPauseActivityIfNeeded(4544) at android.app.ActivityThread:performPauseActivity(4505) at android.app.ActivityThread:handlePauseActivity(4454) at android.app.servertransaction.PauseActivityItem:execute(46) at android.app.servertransaction.TransactionExecutor:executeLifecycleState(176) at android.app.servertransaction.TransactionExecutor:execute(97) at android.app.ActivityThread$H:handleMessage(2047) at android.os.Handler:dispatchMessage(107) at android.os.Looper:loop(221) at android.app.ActivityThread:main(7540) ",
"processPriority":10,
"processNice":-10,
"isProcessForeground":true,
"memory":{
"dalvik_heap":11261,
"native_heap":20621,
"vm_size":5875032
}
}
2.2.4 EvilMethodTracer: slow-method tracing
Core code:
EvilMethodTracer.java
public void dispatchEnd(long beginMs, long cpuBeginMs, long endMs, long cpuEndMs, long token, boolean isBelongFrame) {
super.dispatchEnd(beginMs, cpuBeginMs, endMs, cpuEndMs, token, isBelongFrame);
long start = config.isDevEnv() ? System.currentTimeMillis() : 0;
try {
// time between dispatchBegin and dispatchEnd
long dispatchCost = endMs - beginMs;
if (dispatchCost >= evilThresholdMs) { // evilThresholdMs defaults to 700 ms
long[] data = AppMethodBeat.getInstance().copyData(indexRecord);
long[] queueCosts = new long[3];
System.arraycopy(queueTypeCosts, 0, queueCosts, 0, 3);
String scene = AppMethodBeat.getVisibleScene();
MatrixHandlerThread.getDefaultHandler().post(new AnalyseTask(isForeground(), scene, data, queueCosts, cpuEndMs - cpuBeginMs, endMs - beginMs, endMs));
}
}
...
}
If the time between dispatchBegin and dispatchEnd exceeds the threshold (700 ms by default), an AnalyseTask is posted to dump the information.
Test case:
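To illustrate what an AnalyseTask-style step might do with recorded method data, here is a simplified sketch. It assumes enter/exit events with millisecond timestamps (MethodEvent, MethodCall and EvilMethodAnalyzer are hypothetical; AppMethodBeat's real buffer uses its own compact encoding). It pairs events with a stack and ranks calls by self time, so that a long parent is not blamed for its children.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// One instrumented method event (hypothetical format).
final class MethodEvent {
    final int methodId;
    final boolean isEnter;
    final long timeMs;

    MethodEvent(int methodId, boolean isEnter, long timeMs) {
        this.methodId = methodId;
        this.isEnter = isEnter;
        this.timeMs = timeMs;
    }
}

// One completed call with its nesting depth, total time and self time.
final class MethodCall {
    final int depth;
    final int methodId;
    final long totalMs;
    final long selfMs;

    MethodCall(int depth, int methodId, long totalMs, long selfMs) {
        this.depth = depth;
        this.methodId = methodId;
        this.totalMs = totalMs;
        this.selfMs = selfMs;
    }
}

final class EvilMethodAnalyzer {
    static final long EVIL_THRESHOLD_MS = 700; // default threshold quoted above

    private static final class Frame {
        final MethodEvent enter;
        long childMs; // time spent in callees
        Frame(MethodEvent enter) { this.enter = enter; }
    }

    // Only a dispatch that crossed the threshold is analysed.
    static boolean isEvil(long beginMs, long endMs) {
        return endMs - beginMs >= EVIL_THRESHOLD_MS;
    }

    // Pairs enter/exit events; unmatched exits (buffer started mid-call) are skipped.
    static List<MethodCall> toCalls(List<MethodEvent> events) {
        Deque<Frame> stack = new ArrayDeque<>();
        List<MethodCall> calls = new ArrayList<>();
        for (MethodEvent e : events) {
            if (e.isEnter) {
                stack.push(new Frame(e));
            } else if (!stack.isEmpty() && stack.peek().enter.methodId == e.methodId) {
                Frame f = stack.pop();
                long total = e.timeMs - f.enter.timeMs;
                if (!stack.isEmpty()) {
                    stack.peek().childMs += total; // charge this call to its caller
                }
                calls.add(new MethodCall(stack.size(), e.methodId, total, total - f.childMs));
            }
        }
        return calls;
    }

    // The call with the largest self time is the first suspect.
    static MethodCall slowestBySelfTime(List<MethodCall> calls) {
        MethodCall worst = null;
        for (MethodCall c : calls) {
            if (worst == null || c.selfMs > worst.selfMs) {
                worst = c;
            }
        }
        return worst;
    }
}
```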
public class MainActivity extends AppCompatActivity {
public void evilMethod() {
try {
Thread.sleep(3000);
} catch (InterruptedException e) {
e.printStackTrace();
}
}
@Override
protected void onCreate(Bundle savedInstanceState) {
super.onCreate(savedInstanceState);
setContentView(R.layout.activity_main);
evilMethod();
}
}
Reported data:
{
"machine":"BEST",
"cpu_app":0,
"mem":7870578688,
"mem_free":3796656,
"detail":"NORMAL",
"cost":3296,
"usage":"8.56%",
"scene":"com.stan.matrixdemo.MainActivity",
"stack":"0,1048574,1,3293 ",
"stackKey":"1048574|"
}
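Judging from this report, only the page (scene) can be located, not the specific slow method. The stack holds a single entry with method ID 1048574 (0xFFFFE); as far as I can tell, this is the ID AppMethodBeat reserves for the message dispatch itself, so the app's own methods presumably only show up once they are instrumented at build time.
2.2.5 StartupTracer: cold-start duration monitoring
Cold-start measurement requires inserting a marker call at the appropriate place. For example: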
public class MainActivity extends AppCompatActivity {
@Override
protected void onCreate(Bundle savedInstanceState) {
super.onCreate(savedInstanceState);
setContentView(R.layout.activity_main);
AppMethodBeat.at(this, true);
}
}
This call marks where startup ends.
StartupTracer.java
public void onActivityFocused(String activity) {
if (isColdStartup()) {
if (firstScreenCost == 0) {
this.firstScreenCost = uptimeMillis() - ActivityThreadHacker.getEggBrokenTime();
}
if (hasShowSplashActivity) {
coldCost = uptimeMillis() - ActivityThreadHacker.getEggBrokenTime();
} else {
if (splashActivities.contains(activity)) {
hasShowSplashActivity = true;
} else if (splashActivities.isEmpty()) {
MatrixLog.i(TAG, "default splash activity[%s]", activity);
coldCost = firstScreenCost;
} else {
MatrixLog.w(TAG, "pass this activity[%s] at duration of start up! splashActivities=%s", activity, splashActivities);
}
}
if (coldCost > 0) {
analyse(ActivityThreadHacker.getApplicationCost(), firstScreenCost, coldCost, false);
}
} else if (isWarmStartUp()) {
isWarmStartUp = false;
long warmCost = uptimeMillis() - ActivityThreadHacker.getLastLaunchActivityTime();
if (warmCost > 0) {
analyse(ActivityThreadHacker.getApplicationCost(), firstScreenCost, warmCost, true);
}
}
}
There are two cases here, cold start and warm start. Both then go into the analyse method for the dump:
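The cold-start branch of onActivityFocused boils down to a small state machine. The sketch below restates it in plain Java with the clock passed in; ColdStartCalculator is hypothetical, and eggBrokenMs stands for the value returned by ActivityThreadHacker.getEggBrokenTime(). The warm-start branch and the threshold check in analyse are left out.

```java
import java.util.Set;

// Restates the cold-start logic: firstScreenCost is measured at the first focused
// activity; if splash activities are configured, the cold-start cost is taken at
// the first focus after a splash activity has been shown.
final class ColdStartCalculator {
    private final long eggBrokenMs;            // process start marker
    private final Set<String> splashActivities;
    private long firstScreenCost;
    private boolean hasShownSplash;

    ColdStartCalculator(long eggBrokenMs, Set<String> splashActivities) {
        this.eggBrokenMs = eggBrokenMs;
        this.splashActivities = splashActivities;
    }

    // Returns the cold-start cost once known, or 0 while still waiting.
    long onActivityFocused(String activity, long nowMs) {
        if (firstScreenCost == 0) {
            firstScreenCost = nowMs - eggBrokenMs;
        }
        if (hasShownSplash) {
            return nowMs - eggBrokenMs;        // first page after the splash
        }
        if (splashActivities.contains(activity)) {
            hasShownSplash = true;             // keep waiting for the real page
            return 0;
        }
        if (splashActivities.isEmpty()) {
            return firstScreenCost;            // no splash configured
        }
        return 0;                              // unrelated activity, ignored
    }

    long firstScreenCost() {
        return firstScreenCost;
    }
}
```

With a splash page, firstScreenCost and the cold-start cost differ: the former stops at the splash, the latter at the first real page.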
private void analyse(long applicationCost, long firstScreenCost, long allCost, boolean isWarmStartUp) {
MatrixLog.i(TAG, "[report] applicationCost:%s firstScreenCost:%s allCost:%s isWarmStartUp:%s", applicationCost, firstScreenCost, allCost, isWarmStartUp);
long[] data = new long[0];
if (!isWarmStartUp && allCost >= coldStartupThresholdMs) { // for cold startup
data = AppMethodBeat.getInstance().copyData(ActivityThreadHacker.sApplicationCreateBeginMethodIndex);
ActivityThreadHacker.sApplicationCreateBeginMethodIndex.release();
} else if (isWarmStartUp && allCost >= warmStartupThresholdMs) {
data = AppMethodBeat.getInstance().copyData(ActivityThreadHacker.sLastLaunchActivityMethodIndex);
ActivityThreadHacker.sLastLaunchActivityMethodIndex.release();
}
MatrixHandlerThread.getDefaultHandler().post(new AnalyseTask(data, applicationCost, firstScreenCost, allCost, isWarmStartUp, ActivityThreadHacker.sApplicationCreateScene));
}
The AnalyseTask is where the actual dump happens.
Reported data:
{
"machine":"BEST",
"cpu_app":0,
"mem":7870578688,
"mem_free":3801676,
"application_create":7,
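"application_create_scene":159, // launch scene: 100 (launched by an activity), 114 (by a service), 113 (by a receiver), -100 (unknown, e.g. a ContentProvider); 159 is ActivityThread's EXECUTE_TRANSACTION message, which launches activities on Android 9+
"first_activity_create":10168,
"startup_duration":10168, // total startup time in ms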
"is_warm_start_up":false
}
3. ResourcePlugin analysis
Official wiki: https://github.com/Tencent/matrix/wiki/Matrix-Android-ResourceCanary
3.1 Overall framework
Class responsibilities:
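ActivityRefWatcher: watches Activity onDestroy; it starts as soon as ResourcePlugin is initialized.
RetryableTask: uses WeakReference + ReferenceQueue to decide whether a memory leak may exist.
AndroidHeapDumper: dumps the hprof file on the main thread.
CanaryWorkerService: runs in a separate process to process the dumped hprof file; here it only shrinks it.
CanaryResultService: a service in the app's main process that performs the reporting.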
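The WeakReference + ReferenceQueue check that RetryableTask relies on can be sketched on a plain JVM. LeakWatcher and its methods are hypothetical. System.gc() is only a hint, which is one reason a real implementation retries over time rather than deciding after a single GC.

```java
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

// Watch an object after its owner reports it destroyed; if the weak reference
// is still not cleared after GC attempts, the object is suspected to be leaked.
final class LeakWatcher {
    private static final class KeyedRef extends WeakReference<Object> {
        final String key;

        KeyedRef(Object referent, String key, ReferenceQueue<Object> queue) {
            super(referent, queue);
            this.key = key;
        }
    }

    private final ReferenceQueue<Object> queue = new ReferenceQueue<>();
    private final Map<String, KeyedRef> watched = new ConcurrentHashMap<>();

    // Call from the equivalent of onActivityDestroyed.
    String watch(Object destroyed) {
        String key = UUID.randomUUID().toString();
        watched.put(key, new KeyedRef(destroyed, key, queue));
        return key;
    }

    // References enqueued by the GC belong to objects that were collected.
    private void drainQueue() {
        Reference<?> ref;
        while ((ref = queue.poll()) != null) {
            watched.remove(((KeyedRef) ref).key);
        }
    }

    // True if the object is still reachable after up to gcAttempts GC requests.
    boolean isSuspectedLeak(String key, int gcAttempts) {
        drainQueue();
        KeyedRef ref = watched.get(key);
        if (ref == null) {
            return false;
        }
        for (int i = 0; i < gcAttempts; i++) {
            System.gc(); // a hint only; the JVM may ignore it
            if (ref.get() == null) {
                watched.remove(key);
                return false;
            }
        }
        return true;
    }
}
```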
3.2 Core implementation
The ResourcePlugin constructor initializes an ActivityRefWatcher, and ResourcePlugin's start method calls ActivityRefWatcher.start to begin monitoring.
ActivityRefWatcher.java
private final Application.ActivityLifecycleCallbacks mRemovedActivityMonitor = new ActivityLifeCycleCallbacksAdapter() {
@Override
public void onActivityDestroyed(Activity activity) {
// wrap the activity in a DestroyedActivityInfo and add it to a ConcurrentLinkedQueue<DestroyedActivityInfo>
pushDestroyedActivityInfo(activity);
/* synchronized (mDestroyedActivityInfos) {
mDestroyedActivityInfos.notifyAll();
}*/
}
};
@Override
public void start() {
stopDetect();
final Application app = mResourcePlugin.getApplication();
if (app != null) {
// register activity lifecycle callbacks on the Application
app.registerActivityLifecycleCallbacks(mRemovedActivityMonitor);
AppActiveMatrixDelegate.INSTANCE.addListener(this);
// schedule the RetryableTask that checks whether activities have leaked
scheduleDetectProcedure();
MatrixLog.i(TAG, "watcher is started.");
}
}
private final RetryableTask mScanDestroyedActivitiesTask = new RetryableTask() {
@Override
public Status execute() {
...
} else if (mDumpHprofMode == ResourceConfig.DumpMode.AUTO_DUMP) {
// dump hprof
final File hprofFile = mHeapDumper.dumpHeap(true);
if (hprofFile != null) {
markPublished(destroyedActivityInfo.mActivityName);
// wrap the result in a HeapDump
final HeapDump heapDump = new HeapDump(hprofFile, destroyedActivityInfo.mKey, destroyedActivityInfo.mActivityName);
// hand the hprof over for processing
mHeapDumpHandler.process(heapDump);
infoIt.remove();
} else {
MatrixLog.i(TAG, "heap dump for further analyzing activity with key [%s] was failed, just ignore.",
destroyedActivityInfo.mKey);
infoIt.remove();
}
}
...
return Status.RETRY;
}
};
There are several dump modes, including DumpMode.SILENCE_DUMP, DumpMode.AUTO_DUMP and DumpMode.MANUAL_DUMP. Rather than analyzing each one, let's take DumpMode.AUTO_DUMP as the example.
First, mHeapDumper.dumpHeap(true):
AndroidHeapDumper.java
public File dumpHeap(boolean isShowToast) {
final File hprofFile = mDumpStorageManager.newHprofFile();
...
if (isShowToast) {
...
try {
Debug.dumpHprofData(hprofFile.getAbsolutePath());
cancelToast(waitingForToast.get());
return hprofFile;
} catch (Exception e) {
MatrixLog.printErrStackTrace(TAG, e, "failed to dump heap into file: %s.", hprofFile.getAbsolutePath());
return null;
}
} else {
try {
Debug.dumpHprofData(hprofFile.getAbsolutePath());
return hprofFile;
} catch (Exception e) {
MatrixLog.printErrStackTrace(TAG, e, "failed to dump heap into file: %s.", hprofFile.getAbsolutePath());
return null;
}
}
}
Debug.dumpHprofData writes the hprof file to local storage, and this clearly happens inside the app's own process.
Next, mHeapDumpHandler.process(heapDump):
It is initialized in the ActivityRefWatcher constructor: mHeapDumpHandler = componentFactory.createHeapDumpHandler(context, config);
public static class ComponentFactory {
...
protected AndroidHeapDumper.HeapDumpHandler createHeapDumpHandler(final Context context, ResourceConfig resourceConfig) {
return new AndroidHeapDumper.HeapDumpHandler() {
@Override
public void process(HeapDump result) {
CanaryWorkerService.shrinkHprofAndReport(context, result);
}
};
}
}
This hands the dump to CanaryWorkerService for shrinking, and that service runs in a separate process:
<service
android:name=".CanaryWorkerService"
android:process=":res_can_worker"
android:permission="android.permission.BIND_JOB_SERVICE"
android:exported="false">
</service>
CanaryWorkerService.java
private void doShrinkHprofAndReport(HeapDump heapDump) {
final File hprofDir = heapDump.getHprofFile().getParentFile();
final File shrinkedHProfFile = new File(hprofDir, getShrinkHprofName(heapDump.getHprofFile()));
final File zipResFile = new File(hprofDir, getResultZipName("dump_result_" + android.os.Process.myPid()));
final File hprofFile = heapDump.getHprofFile();
ZipOutputStream zos = null;
try {
long startTime = System.currentTimeMillis();
new HprofBufferShrinker().shrink(hprofFile, shrinkedHProfFile);
MatrixLog.i(TAG, "shrink hprof file %s, size: %dk to %s, size: %dk, use time:%d",
hprofFile.getPath(), hprofFile.length() / 1024, shrinkedHProfFile.getPath(), shrinkedHProfFile.length() / 1024, (System.currentTimeMillis() - startTime));
zos = new ZipOutputStream(new BufferedOutputStream(new FileOutputStream(zipResFile)));
final ZipEntry resultInfoEntry = new ZipEntry("result.info");
final ZipEntry shrinkedHProfEntry = new ZipEntry(shrinkedHProfFile.getName());
zos.putNextEntry(resultInfoEntry);
final PrintWriter pw = new PrintWriter(new OutputStreamWriter(zos, Charset.forName("UTF-8")));
pw.println("# Resource Canary Result Infomation. THIS FILE IS IMPORTANT FOR THE ANALYZER !!");
pw.println("sdkVersion=" + Build.VERSION.SDK_INT);
pw.println("manufacturer=" + Build.MANUFACTURER);
pw.println("hprofEntry=" + shrinkedHProfEntry.getName());
pw.println("leakedActivityKey=" + heapDump.getReferenceKey());
pw.flush();
zos.closeEntry();
zos.putNextEntry(shrinkedHProfEntry);
copyFileToStream(shrinkedHProfFile, zos);
zos.closeEntry();
shrinkedHProfFile.delete();
hprofFile.delete();
MatrixLog.i(TAG, "process hprof file use total time:%d", (System.currentTimeMillis() - startTime));
CanaryResultService.reportHprofResult(this, zipResFile.getAbsolutePath(), heapDump.getActivityName());
} catch (IOException e) {
MatrixLog.printErrStackTrace(TAG, e, "");
} finally {
closeQuietly(zos);
}
}
This shrinks the hprof file, zips it together with a result.info file, and hands the zip to CanaryResultService for reporting.
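The packaging step can be reproduced with the JDK's zip API alone. The sketch assumes the hprof has already been shrunk (HprofBufferShrinker is Matrix-specific and is not reproduced); ResultZipper is a hypothetical name, and its result.info contains only a subset of the fields written above.

```java
import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

final class ResultZipper {
    // Writes a zip with a small "result.info" text entry followed by the hprof entry.
    static void pack(File shrunkHprof, String referenceKey, File zipOut) throws IOException {
        try (ZipOutputStream zos = new ZipOutputStream(
                new BufferedOutputStream(new FileOutputStream(zipOut)))) {
            zos.putNextEntry(new ZipEntry("result.info"));
            // Flush but do not close: closing the writer would close the zip stream.
            PrintWriter pw = new PrintWriter(new OutputStreamWriter(zos, StandardCharsets.UTF_8));
            pw.println("hprofEntry=" + shrunkHprof.getName());
            pw.println("leakedActivityKey=" + referenceKey);
            pw.flush();
            zos.closeEntry();

            zos.putNextEntry(new ZipEntry(shrunkHprof.getName()));
            Files.copy(shrunkHprof.toPath(), zos);
            zos.closeEntry();
        }
    }
}
```

Keeping the metadata in a separate text entry lets a server-side analyzer find the hprof entry and the leaked activity key without guessing file names.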
CanaryResultService.java
private void doReportHprofResult(String resultPath, String activityName) {
try {
final JSONObject resultJson = new JSONObject();
// resultJson = DeviceUtil.getDeviceInfo(resultJson, getApplication());
resultJson.put(SharePluginInfo.ISSUE_RESULT_PATH, resultPath);
resultJson.put(SharePluginInfo.ISSUE_ACTIVITY_NAME, activityName);
Plugin plugin = Matrix.with().getPluginByClass(ResourcePlugin.class);
if (plugin != null) {
plugin.onDetectIssue(new Issue(resultJson));
}
} catch (Throwable thr) {
MatrixLog.printErrStackTrace(TAG, thr, "unexpected exception, skip reporting.");
}
}
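Finally, the issue is passed out through the onReportIssue method of the DefaultPluginListener implementation registered when Matrix is initialized.
Sample data from the official documentation: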
{
"resultZipPath":"/storage/emulated/0/Android/data/com.tencent.mm/cache/matrix_resource/dump_result_17400_20170713183615.zip",
"activity":"com.tencent.mm.plugin.setting.ui.setting.SettingsUI",
"tag":"memory",
"process":"com.tencent.mm"
}
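In summary, ResourceCanary detects leaks the same way LeakCanary does, but it does not analyze the dumped hprof file on the device: it shrinks and compresses the file and uploads it, and the server performs the memory-leak analysis.
This post briefly analyzed the overall Matrix framework, TracePlugin and ResourcePlugin, with the aim of learning from a mature framework how APM features can be implemented.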