美文网首页
ANR问题分析

ANR问题分析

作者: CircleLee | 来源:发表于2019-03-06 17:16 被阅读0次

    一、什么是ANR

    ANR:Application Not Responding,即应用无响应

    ANR一般有三种类型
    1. KeyDispatchTimeout(5 seconds) --主要类型按键或触摸事件在特定时间内无响应
    2. BroadcastTimeout(10 seconds) --BroadcastReceiver在特定时间内无法处理完成
    3. ServiceTimeout(20 seconds) --小概率类型 Service在特定的时间内无法处理完成

    二、如何分析

    1. 首先查找 main.log,MTK平台通过SYS_ANDROID_EVENT_LOG来进行分析或者直接看_exp_main.txt
     04-02 11:52:15.355   862  1517 I watchdog: Blocked in handler on main thread (main)
    

    (从上面的log看,是wachtdog出现问题,是什么原因导致watchdog出现问题,要从其他的方面进行分析)

    1. 然后就是查看trace文件,看主进程文件异常。
    kernel: (couldn't read /proc/self/task/862/stack)
    native: #00 pc 0005fc88  /system/lib64/libc.so (__ioctl+4)
    native: #01 pc 00085a98  /system/lib64/libc.so (ioctl+100)
    native: #02 pc 0002a8b4  /system/lib64/libbinder.so (android::IPCThreadState::talkWithDriver(bool)+164)
    native: #03 pc 0002b3cc  /system/lib64/libbinder.so (android::IPCThreadState::waitForResponse(android::Parcel*, int*)+112)
    native: #04 pc 0002b640  /system/lib64/libbinder.so (android::IPCThreadState::transact(int, unsigned int, android::Parcel const&, android::Parcel*, unsigned int)+176)
    native: #05 pc 000235bc  /system/lib64/libbinder.so (android::BpBinder::transact(unsigned int, android::Parcel const&, android::Parcel*, unsigned int)+108)
    native: #06 pc 000e7228  /system/lib64/libandroid_runtime.so (???)
    native: #07 pc 00f0d2a0  /data/dalvik-cache/arm64/system@framework@boot.oat (Java_android_os_BinderProxy_transactNative__ILandroid_os_Parcel_2Landroid_os_Parcel_2I+212)
    at android.os.BinderProxy.transactNative(Native method)
    at android.os.BinderProxy.transact(Binder.java:504)
    at com.android.internal.telephony.ISub$Stub$Proxy.getDefaultDataSubId(ISub.java:1078)
    at android.telephony.SubscriptionManager.getDefaultDataSubId(SubscriptionManager.java:1119)
    at android.telephony.TelephonyManager.getSimOperatorNumeric(TelephonyManager.java:1745)
    at android.telephony.TelephonyManager.getSimOperator(TelephonyManager.java:1717)
    at com.android.server.location.GpsLocationProvider.subscriptionOrSimChanged(GpsLocationProvider.java:567)
    at com.android.server.location.GpsLocationProvider.access$800(GpsLocationProvider.java:124)
    at com.android.server.location.GpsLocationProvider$3.onSubscriptionsChanged(GpsLocationProvider.java:559)
    

    从上面卡住的CallStack可以看出是SystemServer binder到com.android.phone 进程卡住
    然后查看binder调用信息,MTK可以从文件 SYS_BINDER_INFO中寻找,重点搜索关键字“outgoing transaction“

    outgoing transaction 1755759: ffffffc048accb00 from 862:862 to 1762:0 code 14 flags 10 pri -2 r1 node 7969 size 80:0 data ffffff802b998878”
    

    主要是看后面有一些长的log地方,找到 ”to 1762:0"后面的 1762 这个就是pid
    从binder调用情况看,应该是对端进程的binder线程被耗尽,需要查看对端进程的trace

    3.在经过上面的分析后,我们需要从trace文件找到 1762 这个PID所对应的相关进程,然后列出 Block以及Waiting相关的log,从而定位问题

    outgoing transaction 1605208: ffffffc09a83ae00 from 1206:1237 to 9957:0 code 39 flags 10 pri 0 r1 node 661390 size 68:0 data ffffff80088001c8
    

    意思是从1206进程掉9957进程

     incoming transaction 1605143: ffffffc06b783980 from 2736:2801 to 1206:1237 code f flags 10 pri 0 r1 node 5688 size 88:0 data ffffff800a582d50 
    

    意思是从2736进程掉1206进程

     pending async transaction 1602114: ffffffc0073af580 from 0:0 to 1206:0 code 1b flags 11 pri -4 r0 node 9259 size 88:0 data ffffff800a580b28
    

    意思是等待调 1206 进程,未进入线程

    三、案例分析

    问题描述:
    同步账号数据时接收短信手机卡死,phone(和systemui )进程无响应(出现一次)
    【操作步骤】:
    1、登录 XXX 账户,同步数据
    2、同步数据的同时本机接收一条短信

    分析步骤:
    1) 通过event_log 找到 ANR time, PID, ANR type

     06-15 15:15:23.774827  1025  1092 I am_anr  : [0,1877,com.android.phone,952647245,Broadcast of Intent { act=android.intent.action.SCREEN_OFF flg=0x50000010 }]
    

    从上面的log可以得到如下信息:
    ANR发送的时间,ANR time:15:15:23.774827
    进程ID, PID:1877
    ANR类型,ANR type:broadcast timeout
    2) 查看main.log, MTK平台查看sys_log, 这一步主要是查看cpu的使用率

    06-15 15:14:57.667196  1025  1092 I ANRManager: getProcessState
    06-15 15:14:57.668209  1025  1092 I ActivityManager: Android time :[2016-06-15 15:14:57.66] [2538.692]
    06-15 15:14:57.668209  1025  1092 I ActivityManager: CPU usage from 3177ms to 33ms ago:
    06-15 15:14:57.668209  1025  1092 I ActivityManager:   133% 12180/com.android.mms: 110% user + 23% kernel / faults: 23045 minor
    06-15 15:14:57.668209  1025  1092 I ActivityManager:   81% 2061/android.process.acore: 80% user + 0.9% kernel / faults: 16 minor
    06-15 15:14:57.668209  1025  1092 I ActivityManager:   20% 1025/system_server: 11% user + 8.9% kernel / faults: 1483 minor 22 major
    06-15 15:14:57.668209  1025  1092 I ActivityManager:   13% 1877/com.android.phone: 13% user + 0% kernel / faults: 553 minor
    06-15 15:14:57.668209  1025  1092 I ActivityManager:   10% 223/hps_main: 0% user + 10% kernel
    06-15 15:14:57.668209  1025  1092 I ActivityManager:   3.1% 195/mmcqd/0: 0% user + 3.1% kernel
    06-15 15:14:57.668209  1025  1092 I ActivityManager:   3.1% 12039/com.meizu.safe:Remote: 1.9% user + 1.2% kernel / faults: 15 minor
    06-15 15:14:57.668209  1025  1092 I ActivityManager:   2.8% 194/exe_cq: 0% user + 2.8% kernel
    06-15 15:14:57.668209  1025  1092 I ActivityManager:   1.9% 10/migration/0: 0% user + 1.9% kernel
    06-15 15:14:57.668209  1025  1092 I ActivityManager:   1.9% 11/migration/1: 0% user + 1.9% kernel
    06-15 15:14:57.668209  1025  1092 I ActivityManager:   1.5% 15/migration/2: 0% user + 1.5% kernel
    06-15 15:14:57.668209  1025  1092 I ActivityManager:   1.5% 253/migration/4: 0% user + 1.5% kernel
    06-15 15:14:57.668209  1025  1092 I ActivityManager:   1.5% 382/mobile_log_d: 0.3% user + 1.2% kernel / faults: 65 minor
    06-15 15:14:57.668209  1025  1092 I ActivityManager:   1.2% 61/cfinteractive: 0% user + 1.2% kernel
    06-15 15:14:57.668209  1025  1092 I ActivityManager:   1.2% 291/logd: 1.2% user + 0% kernel
    06-15 15:14:57.668209  1025  1092 I ActivityManager:   0.9% 176/krtatm: 0% user + 0.9% kernel
    06-15 15:14:57.668209  1025  1092 I ActivityManager:   0.6% 19/migration/3: 0% user + 0.6% kernel
    06-15 15:14:57.668209  1025  1092 I ActivityManager:   0.6% 315/migration/5: 0% user + 0.6% kernel
    06-15 15:14:57.668209  1025  1092 I ActivityManager:   0.6% 377/mediaserver: 0.3% user + 0.3% kernel
    06-15 15:14:57.668209  1025  1092 I ActivityManager:   0% 9996/com.tencent.mm: 0% user + 0% kernel / faults: 9 minor
    06-15 15:14:57.668209  1025  1092 I ActivityManager: 77% TOTAL: 61% user + 16% kernel + 0.1% iowait + 0% softirq
    

    从log可以看出mms进程user使用率非常高
    从CPU的使用情况,如果CPU使用量接近100%,说明当前设备很忙,有可能是CPU饥饿导致了ANR
    如果CPU使用量很少,说明主线程被BLOCK了
    如果IOwait很高,说明ANR有可能是主线程在进行I/O操作造成的
    3)checke trace.txt(MTK平台打卡AEE DB log, 找到SWT_JBT_TRACES), 找到发生anr的进程ID,以及时间点
    找到时间在15:15:23左右,PID为1877的主线程执行情况

    ----- pid 1877 at 2016-06-15 15:15:24 -----
    Cmd line: com.android.phone
    "main" prio=5 tid=1 Native
      | group="main" sCount=1 dsCount=0 obj=0x76297f90 self=0x7f9137c400
      | sysTid=1877 nice=0 cgrp=default sched=0/0 handle=0x7f951e92c0
      | state=S schedstat=( 8028180262 2904544285 18378 ) utm=498 stm=304 core=6 HZ=100
      | stack=0x7fcf303000-0x7fcf305000 stackSize=8MB
      | held mutexes=
      at android.os.BinderProxy.transactNative(Native method)
      at android.os.BinderProxy.transact(Binder.java:514) 
      at com.android.internal.telephony.ITelephonyRegistry$Stub$Proxy.notifySignalStrengthForSubscriber(ITelephonyRegistry.java:718)
      at com.android.internal.telephony.DefaultPhoneNotifier.notifySignalStrength(DefaultPhoneNotifier.java:131)
      at com.android.internal.telephony.PhoneBase.notifySignalStrength(PhoneBase.java:1846)
      at com.android.internal.telephony.ServiceStateTracker.notifySignalStrength(ServiceStateTracker.java:446)
      - locked <0x04779908> (a android.telephony.CellInfoGsm)
      at com.android.internal.telephony.ServiceStateTracker.onSignalStrengthResult(ServiceStateTracker.java:980)
      at com.android.internal.telephony.gsm.GsmServiceStateTracker.handleMessage(GsmServiceStateTracker.java:869)
      at android.os.Handler.dispatchMessage(Handler.java:111)
      at android.os.Looper.loop(Looper.java:207)
      at android.app.ActivityThread.main(ActivityThread.java:5940)
      at java.lang.reflect.Method.invoke!(Native method)
      at com.android.internal.os.ZygoteInit$MethodAndArgsCaller.run(ZygoteInit.java:929)
      at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:790)
    

    从上面卡住的CallStack可以看出是SystemServer binder到com.android.phone 进程卡住

    看Binder对端:查看对应时间点的log:

    ----- pid 1025 at 2016-06-15 15:15:24 -----
    Cmd line: system_server
    "Binder_C" prio=5 tid=96 Blocked
      | group="main" sCount=1 dsCount=0 obj=0x135b10a0 self=0x7f7079dc00
      | sysTid=1961 nice=0 cgrp=default sched=0/0 handle=0x7f6ef45440
      | state=S schedstat=( 18431909483 6382895667 48227 ) utm=1317 stm=526 core=0 HZ=100
      | stack=0x7f6ee49000-0x7f6ee4b000 stackSize=1013KB
      | held mutexes=
      at com.android.server.TelephonyRegistry.notifySignalStrengthForSubscriber(TelephonyRegistry.java:908)
      - waiting to lock <0x0aec94d6> (a java.util.ArrayList) held by thread 80
      at com.android.internal.telephony.ITelephonyRegistry$Stub.onTransact(ITelephonyRegistry.java:184)
      at android.os.Binder.execTransact(Binder.java:461)
    

    被tid=80锁住

    "Binder_4" prio=5 tid=80 Native
      | group="main" sCount=1 dsCount=0 obj=0x12fb44c0 self=0x7f72404800
      | sysTid=1365 nice=0 cgrp=default sched=0/0 handle=0x7f70a7f440
      | state=S schedstat=( 18852901213 6143285031 48849 ) utm=1386 stm=499 core=7 HZ=100
      | stack=0x7f70983000-0x7f70985000 stackSize=1013KB
      | held mutexes=
      at android.os.BinderProxy.transactNative(Native method)
      at android.os.BinderProxy.transact(Binder.java:514)
      at com.android.internal.telephony.ISub$Stub$Proxy.getPhoneId(ISub.java:915)
      at android.telephony.SubscriptionManager.getPhoneId(SubscriptionManager.java:963)
      at com.android.server.TelephonyRegistry.listen(TelephonyRegistry.java:582)
      - locked <0x0aec94d6> (a java.util.ArrayList)
      at com.android.server.TelephonyRegistry.listenForSubscriber(TelephonyRegistry.java:521)
      at com.android.internal.telephony.ITelephonyRegistry$Stub.onTransact(ITelephonyRegistry.java:95)
      at android.os.Binder.execTransact(Binder.java:461)
    

    查看binder info信息(MTK平台查看SYS_BINDER_INFO文件),重点搜索关键字“outgoing transaction“

      outgoing transaction 1465421: ffffffc06a140500 from 12180:12197 to 1877:3042 code 1 flags 10 pri 19 r1 node 18222 size 316:8 data ffffff800f145638
    

    主要是看后面有一些长的log地方,找到 ”to 1877:"后面的1877这个就是pid
    调用者是12180这个进程
    应该是对端phone进程的binder线程被耗尽,需要查看对端进程com.android.phone 的trace
    4) 在经过上面的分析后,我们需要从trace 文件中找到1877 这个PID所对应的相关进程,然后列出 Block以及Waiting相关的log,从而可以定位问题。

    "Binder_1" prio=5 tid=7 TimedWaiting
      | group="main" sCount=1 dsCount=0 obj=0x12c500a0 self=0x7f9137ce00
      | sysTid=1889 nice=0 cgrp=default sched=0/0 handle=0x7f8fff8440
      | state=S schedstat=( 26096548648 3199693554 22007 ) utm=2417 stm=192 core=0 HZ=100
      | stack=0x7f8fefc000-0x7f8fefe000 stackSize=1013KB
      | held mutexes=
      at java.lang.Object.wait!(Native method)
       - waiting on <0x06b95eb5> (a java.lang.Object)
      at java.lang.Thread.parkFor$(Thread.java:1220)
      - locked <0x06b95eb5> (a java.lang.Object)
      at sun.misc.Unsafe.park(Unsafe.java:299)
      at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:198)
      at android.database.sqlite.SQLiteConnectionPool.waitForConnection(SQLiteConnectionPool.java:670)
      at android.database.sqlite.SQLiteConnectionPool.acquireConnection(SQLiteConnectionPool.java:348)
      at android.database.sqlite.SQLiteSession.acquireConnection(SQLiteSession.java:894)
      at android.database.sqlite.SQLiteSession.prepare(SQLiteSession.java:586)
      at android.database.sqlite.SQLiteProgram.<init>(SQLiteProgram.java:58)
      at android.database.sqlite.SQLiteQuery.<init>(SQLiteQuery.java:37)
      at android.database.sqlite.SQLiteDirectCursorDriver.query(SQLiteDirectCursorDriver.java:44)
      at android.database.sqlite.SQLiteDatabase.rawQueryWithFactory(SQLiteDatabase.java:1350)
      at android.database.sqlite.SQLiteQueryBuilder.query(SQLiteQueryBuilder.java:400)
      at android.database.sqlite.SQLiteQueryBuilder.query(SQLiteQueryBuilder.java:294)
      at com.android.providers.telephony.MmsProvider.query(MmsProvider.java:244)
      at android.content.ContentProvider.query(ContentProvider.java:1140)
      at android.content.ContentProvider$Transport.query(ContentProvider.java:262)
      at android.content.ContentProviderNative.onTransact(ContentProviderNative.java:112)
      at android.os.Binder.execTransact(Binder.java:461)
    

    问题原因就是xxx同步MmsProvider查询超时导致。

    相关文章

      网友评论

          本文标题:ANR问题分析

          本文链接:https://www.haomeiwen.com/subject/yerxpqtx.html