Summary notes on the key points of camera performance optimization work at handset vendors.
I. Camera performance scenarios
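Category | Scenario | Where the time goes | Suggestions |
---|---|---|---|
Launch | Hot launch | Trim the app code and what it draws, pre-allocate memory, CPU perflock boost | |
Launch | Cold launch | Essentially turn it into a hot launch: the camera is usually restarted automatically a few minutes after being killed in the background; since it is a system platform app, this is not hard to arrange | |
Switching | Mode switch | Has to go through a Camera close and reopen | CPU perflock boost, drop unneeded initialization, consider parallelizing slow initialization |
Switching | Front/rear switch | Has to go through a Camera close and reopen | Phones of the same model may ship different lens motors, ball-bearing or spring type. With a spring-type motor, retracting all at once on power-down makes the lens knock audibly, so power-down is usually staged, which costs time. With a ball-bearing motor this is much less necessary, so part of the motor power-down time can be optimized away |
Preview | Photo preview | The algorithms hooked into the preview | |
Preview | Video preview | | |
Capture | Normal capture | Move algorithm processing up into the app layer; using HAL ZSL or App ZSL improves the normal capture experience | |
Capture | High-resolution capture | High-resolution images are large, so CPU processing takes time; if processing algorithms are applied as well, performance inevitably suffers | Check bypass, check whether the exposure time is reasonable (tuning has the final say here), and for multi-frame processing check whether the frame count is reasonable |
Capture | Algorithm capture | The main cost is the algorithm processing itself | 1. Move algorithm processing up: the app can build a queue so that shooting never stops while images wait in line for processing, which cuts shot2shot time; third-party camera apps can do the same. 2. Bypass: during development on the qcom camx architecture I found that pipelines are reused, so some scenarios share another scenario's pipeline, which contains nodes the scenario does not need. Without bypass, passing through such a node still costs time, and with many such nodes and large buffers the cost is noticeable. 3. CPU boost: tune CPU scheduling while the algorithm runs to speed up processing |
Video | High-frame-rate recording | Recording frame rates are typically 30, 60, 120 and 960 fps. Rates above 60 fps are usually achieved by packing several frames into each request: at 30 requests per second, each request returning 4 buffers yields 120 fps; likewise 960 fps can be reached with 8 buffers per request (which requires 120 requests per second, since 8 × 120 = 960) | High frame rates lean heavily on the CPU, so CPU boost tuning is essential; within the power budget, keep both big and little cores busy rather than concentrating work on one core |
Video | Stabilized recording | Whether EIS or Vidhance, enabling stabilization at high resolution inevitably raises the amount of data the CPU processes, and when many buffers are cached for stabilization cropping, memory gets tight too | Set a reasonable number of stabilization cache buffers; tune CPU perflock so load is spread evenly and densely across big and little cores |
Video | High-resolution stabilized recording | As above; a memory problem | Keep the number of frames cached by the stabilization algorithm reasonable; do not chase stabilization quality while ignoring what memory can bear |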
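The buffers-per-request arithmetic in the high-frame-rate row can be written out as a small shell sketch. The function name `batch_size` is hypothetical, and the request rates passed to it are assumptions for illustration, not values read from any camera HAL.

```shell
# Hypothetical helper: how many buffers each request must return so that
# request_fps requests per second add up to capture_fps frames per second.
batch_size() {
    capture_fps=$1
    request_fps=$2
    if [ "$request_fps" -le 0 ] || [ $((capture_fps % request_fps)) -ne 0 ]; then
        echo "capture fps must be a positive multiple of request fps" >&2
        return 1
    fi
    echo $((capture_fps / request_fps))
}
```

For example, `batch_size 120 30` prints 4 and `batch_size 960 120` prints 8, while `batch_size 960 30` prints 32, which is why 8 buffers per request only reaches 960 fps at a 120 Hz request rate.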
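In essence, camera performance work comes down to three things:
1. A sound code architecture
For example: parallelize slow operations where possible; design the flow around the user (after a shot the user can enter the gallery right away and sees a blurry placeholder that sharpens once processing finishes, which feels far better than being blocked until the algorithm is done); create resources dynamically. Some algorithm libraries are simply slow, and when the time they take is abnormal, meaning the library's processing logic is at fault, the fix has to come from the library side.
2. CPU scheduling tuning
Running the CPU under heavy load inevitably raises its temperature, and the sensor also heats up under sustained processing. That triggers thermal throttling or other controls that stop the phone from heating further, and the CPU gets clocked down. Once the big cores are throttled, capture takes longer, preview drops frames, and recorded video falls short of its frame rate. A good scheduling policy keeps load spread evenly and densely across big and little cores, with no preemption by other processes and no core pinning. To some extent this both improves throughput and keeps temperature stable, that is, the CPU avoids being throttled and scheduling stays close to optimal.
3. Efficient memory management
The camera uses a lot of memory. On a device with little RAM that is already under memory pressure, problems such as dropped video frames and preview stutter appear. Burst capture and high-frame-rate stabilized recording, for instance, depend heavily on memory. So memory needs work: pre-allocate larger buffers to absorb the sudden consumption at launch, reduce some algorithms' buffer counts to a reasonable level, or call a system interface to clean up background memory when the camera starts or enters a memory-hungry mode.
When memory gets tight enough, stability suffers as well as performance: lmk may kill the foreground process so the camera crashes, or other user-visible problems appear. For example, a third-party app invokes the camera, and on return the app restarts from scratch because it was killed for lack of memory. Whether to kill a background process really has to be a smart decision by the system; here, where the user is about to return to the app, killing it makes for a terrible experience.
GPU memory matters as much as the memory the CPU uses. If GPU memory runs short and GPU allocations slow down, preview stutters too, so GPU memory has to be kept under control as well.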
II. Camera performance debug tools and suggested approaches
1.dumpsys media.camera
adb shell dumpsys media.camera > camera_dumpsys.txt
Some of the output is very useful:
1.1 Qcom devices
== Service global info: ==
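------>// Camera IDs and their count. This involves logical IDs and physical IDs, and may include virtual cameras added by the vendor; exactly what is counted depends on how the framework code that prints this obtains it, which I have not looked into closely.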
Number of camera devices: 8
Number of normal camera devices: 6
Device 0 maps to "0"
Device 1 maps to "1"
Device 2 maps to "21"
Device 3 maps to "22"
Device 4 maps to "100"
Device 5 maps to "101"
Active Camera Clients:
[
(Camera ID: 0, Cost: 99, PID: 20418, Score: -900, State: 0User Id: 0, Client Package Name: com.android.camera, Conflicting Client Devices: {1, })
]
Allowed user IDs: 0
------->// recent event history
== Camera service events log (most recent at top): ==
04-22 06:36:07 : CONNECT device 0 client for package com.android.camera (PID 20418)
04-22 06:25:41 : DISCONNECT device 0 client for package com.android.camera (PID 20418)
04-22 06:23:38 : CONNECT device 0 client for package com.android.camera (PID 20418)
04-22 06:23:38 : DISCONNECT device 1 client for package com.android.camera (PID 20418)
04-22 06:03:53 : CONNECT device 1 client for package com.android.camera (PID 20418)
------->// the rear camera was opened in default photo mode, so device 0 is active here.
== Camera device 0 dynamic info: ==
Device 0 is open. Client instance dump:
Client priority score: -900 state: 0
Client PID: 20418
Client package: com.android.camera
CameraDeviceClient[0] (0xe879d280) dump:
Current client UID 10072
State:
Request ID counter: 3
No input stream configured.
---------->// output stream info; there are three streams
Current output stream/surface IDs:
Stream 0 Surface 0
Stream 1 Surface 0
Stream 2 Surface 0
-------->// this shows what the three streams are; the first is a SurfaceTexture used to display the preview
Device dump:
Device status: ACTIVE
Stream configuration:
Operation mode: CUSTOM (36869)
No input stream.
Stream[0]: Output
Consumer name: SurfaceTexture-10-20418-3
State: 4
Dims: 1440 x 1080, format 0x22, dataspace 0x0
Max size: 0
Combined usage: 131328, max HAL buffers: 8
Frames produced: 201, last timestamp: 984089393217873 ns
Total buffers: 10, currently dequeued: 6
DequeueBuffer latency histogram: (207) samples
5 10 15 20 25 30 35 40 45 inf (max ms)
98.07 1.45 0.00 0.48 0.00 0.00 0.00 0.00 0.00 0.00 (%)
------>// ImageReader used to save the picture
Stream[1]: Output
Consumer name: ImageReader-4000x3000f23m10-20418-15
State: 4
Dims: 4000 x 3000, format 0x23, dataspace 0x8c20000
Max size: 0
Combined usage: 131075, max HAL buffers: 8
Frames produced: 0, last timestamp: 0 ns
Total buffers: 18, currently dequeued: 0
------->// there is a second ImageReader; judging by its size it is meant either for thumbnails or for something else, perhaps QR code or scene recognition (just my guess)
Stream[2]: Output
Consumer name: ImageReader-1440x1080f23m1-20418-16
State: 4
Dims: 1440 x 1080, format 0x23, dataspace 0x8c20000
Max size: 0
--------> // very detailed: it even shows the max HAL buffers count; if that is too low, expect stutter or a hang
Combined usage: 131075, max HAL buffers: 8
Frames produced: 202, last timestamp: 984089426352300 ns
Total buffers: 9, currently dequeued: 5
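The stream section of this dump is regular enough to summarize with awk. The sketch below is a hypothetical helper (`stream_buffer_usage` is not part of any Android tool) and assumes the line layout shown in the dump above, which may differ between Android versions. It prints one line per output stream with the dequeued count, the total buffer count and the max HAL buffers value, so a stream that keeps most of its buffers dequeued stands out.

```shell
# Hypothetical helper: summarize per-stream buffer usage from
# `dumpsys media.camera` output (read from files or stdin).
stream_buffer_usage() {
    awk '
        /Stream\[[0-9]+\]:/ { stream = $1; sub(/:$/, "", stream) }
        /Consumer name:/    { consumer = $NF }
        /max HAL buffers:/  { max_hal = $NF }
        /Total buffers:/ {
            # Line layout assumed: "Total buffers: 10, currently dequeued: 6"
            total = $3; sub(/,/, "", total)
            dequeued = $NF
            printf "%s %s dequeued=%s total=%s max_hal=%s\n",
                   stream, consumer, dequeued, total, max_hal
        }
    ' "$@"
}
```

A typical call would be `stream_buffer_usage camera_dumpsys.txt`, using the file saved by the dumpsys command above.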
There is a lot of output; picking out the useful parts:
// dump of the HAL info
== Camera HAL device device@3.5/legacy/0 (v3.5) dumpState: ==
############### CameraId: 0 Dump Start ###############
+------------------------------------------------------------------+
+ HAL Dump +
+------------------------------------------------------------------+
+------------------------------------------------------------------+
+ Chi statistics +
+------------------------------------------------------------------+
---->// camx chi session info:
+ Number of open sessions: 5
+ Number of open pipeline descriptors: 7
+------------------------------------------------------------------+
+ Chi Session +
+------------------------------------------------------------------+
+ Session: 0x6f8b344700
+-----------------------------------------------------------------------------------------+
Session 0x6f8b344700 state:
reason = "ChiContextDump"
numPipelines = 1
numRealtimePipelines = 0
partialMetadataEnabled = false
....
------>// pipelines in this session; the default rear photo mode uses MFNR, so there are several pipelines
Taking one as an example:
Pipelines:
Pipeline 0:
name = "MfnrPrefilter_0"
address = 0x6f80930800
...
lastInOrderCompletedRequestId = 0
currentRequest = 0
+------------------------------------------------------------------+
Pending Nodes:
Requests Range - lastInOrderCompletedRequestId: 0 currentRequest: 0
.....
Graph:
Nodes + ports:
Node: MfnrPrefilter_IPE0
InputPort 0, format: 0, portDisabled: false, bufferDelta: 0
InputPort 1, format: 0, portDisabled: false, bufferDelta: 0
InputPort 2, format: 0, portDisabled: false, bufferDelta: 0
InputPort 3, format: 0, portDisabled: false, bufferDelta: 0
OutputPort 10, format: 12 maxNumBuffers: 40, memFlags: 0x00000001, heap: 1,
flags: 0x00000010, streamEnableMask: 0x00000001, numBatchedFrames: 1
deviceIndex[0]: 17
OutputPort 11, format: 18 maxNumBuffers: 40, memFlags: 0x00000001, heap: 1,
flags: 0x00000010, streamEnableMask: 0x00000002, numBatchedFrames: 1
deviceIndex[0]: 17
Node: MfnrPrefilter_BPS0
InputPort 0, format: 0, portDisabled: false, bufferDelta: 0
OutputPort 7, format: 3 maxNumBuffers: 40, memFlags: 0x00000001, heap: 1,
flags: 0x00000010, streamEnableMask: 0x00000004, numBatchedFrames: 1
deviceIndex[0]: 17
....
----->// node link info within the pipeline; the connection order can be read straight from the log.
Links:
Node::MfnrPrefilter_BPS0::OutputPort_1 -> Node::MfnrPrefilter_IPE0::InputPort_0 -- bufferDelta = 0, maxImageBuffer = 0
Node::MfnrPrefilter_BPS0::OutputPort_2 -> Node::MfnrPrefilter_IPE0::InputPort_1 -- bufferDelta = 0, maxImageBuffer = 0
Node::MfnrPrefilter_BPS0::OutputPort_3 -> Node::MfnrPrefilter_IPE0::InputPort_2 -- bufferDelta = 0, maxImageBuffer = 0
Node::MfnrPrefilter_BPS0::OutputPort_4 -> Node::MfnrPrefilter_IPE0::InputPort_3 -- bufferDelta = 0, maxImageBuffer = 0
Node::MfnrPrefilter_IPE0::OutputPort_10 -> SinkBuffer_0 -- BufferQueueDepth = 0, maxImageBuffer = 0
Node::MfnrPrefilter_IPE0::OutputPort_11 -> SinkBuffer_1 -- BufferQueueDepth = 0, maxImageBuffer = 0
SourceBuffer_0 -> Node::MfnrPrefilter_BPS0::InputPort_0 -- bufferDelta = 0, maxImageBuffer = 40
Node::MfnrPrefilter_BPS0::OutputPort_7 -> SinkBuffer_2 -- BufferQueueDepth = 0, maxImageBuffer = 0
--------> the other MFNR pipelines
+------------------------------------------------------------------+
Pipeline 1:
name = "MfnrBlend_0"
....
+------------------------------------------------------------------+
Pipeline 2:
name = "MfnrPostFilter_0"
--------> and of course there are other chx sessions
followed by a pile of tuning parameters that I cannot interpret
------>// some vendor tags
== Vendor tags: ==
------>// process traces captured when the camera failed earlier; if no error has occurred, there are no traces
== Camera error traces (2): ==
----- pid 1035 at 2020-04-10 23:29:00 -----
Cmd line: /system/bin/cameraserver
"cameraserver" sysTid=1035
#00 pc 0003b29f /system/lib/libbinde
1.2 MTK devices
The first part is the same; the HAL-level section that follows dumps MTK-specific information:
== Camera HAL device device@3.5/internal/0 (v3.5) dumpState: ==
== error state (most recent at bottom): App Stream Manager ==
[no events yet]
== CommandHandler (tid:3099) isRunning:1 exitPending:0 ==
No pending command
MTK prints less here and I have worked with it less; I will add more when I come across something useful.
2.systrace
References:
手把手教你使用Systrace(一) (Systrace step by step, part 1)
Android Systrace 基础知识 (Android Systrace basics)
To see pipeline details in a camera systrace, certain properties must be set first. On Qcom, for example:
1. Set the property that enables tracing:
adb root && adb shell setprop persist.vendor.camera.traceGroupsEnable 65566 && adb shell getprop persist.vendor.camera.traceGroupsEnable
adb reboot //takes effect once the provider process has restarted
adb root && adb shell getprop persist.vendor.camera.traceGroupsEnable
adb shell setenforce 0 && adb shell getenforce //without disabling SELinux, the systrace of provider nodes may be incomplete
traceGroupsEnable takes the same values as the Camx log groups; 65566, as above, is generally enough.
UMD group:
static const CamxLogGroup CamxLogGroupNone = (1 << 0); ///< Generic group
static const CamxLogGroup CamxLogGroupSensor = (1 << 1); ///< Sensor
static const CamxLogGroup CamxLogGroupIFace = (1 << 2); ///< IFace
static const CamxLogGroup CamxLogGroupISP = (1 << 3); ///< ISP
static const CamxLogGroup CamxLogGroupPProc = (1 << 4); ///< Post Processor
static const CamxLogGroup CamxLogGroupImgLib = (1 << 5); ///< Image Lib
static const CamxLogGroup CamxLogGroupCPP = (1 << 6); ///< CPP
static const CamxLogGroup CamxLogGroupHAL = (1 << 7); ///< HAL
static const CamxLogGroup CamxLogGroupJPEG = (1 << 8); ///< JPEG
static const CamxLogGroup CamxLogGroupStats = (1 << 9); ///< Stats
static const CamxLogGroup CamxLogGroupCSL = (1 << 10); ///< CSL
static const CamxLogGroup CamxLogGroupApp = (1 << 11); ///< Application
static const CamxLogGroup CamxLogGroupUtils = (1 << 12); ///< Utilities
static const CamxLogGroup CamxLogGroupSync = (1 << 13); ///< Sync
static const CamxLogGroup CamxLogGroupMemSpy = (1 << 14); ///< MemSpy
static const CamxLogGroup CamxLogGroupAssert = (1 << 15); ///< Asserts
static const CamxLogGroup CamxLogGroupCore = (1 << 16); ///< Core
static const CamxLogGroup CamxLogGroupHWL = (1 << 17); ///< HWL
static const CamxLogGroup CamxLogGroupChi = (1 << 18); ///< CHI
static const CamxLogGroup CamxLogGroupDRQ = (1 << 19); ///< DRQ
static const CamxLogGroup CamxLogGroupFD = (1 << 20); ///< FD
static const CamxLogGroup CamxLogGroupIQMod = (1 << 21); ///< IQ module
static const CamxLogGroup CamxLogGroupLRME = (1 << 22); ///< LRME
static const CamxLogGroup CamxLogGroupNCS = (1 << 23); ///< NCS
static const CamxLogGroup CamxLogGroupMeta = (1 << 24); ///< Metadata
static const CamxLogGroup CamxLogGroupAEC = (1 << 25); ///< AEC
static const CamxLogGroup CamxLogGroupAWB = (1 << 26); ///< AWB
static const CamxLogGroup CamxLogGroupAF = (1 << 27); ///< AF
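Since traceGroupsEnable is a bitmask over the groups listed above, it helps to decode a value before setting it. The sketch below does that in plain shell; `decode_trace_groups` is a hypothetical name, and the bit order is copied from the UMD group list above rather than from any Camx header on a given build.

```shell
# Hypothetical helper: list the Camx log groups enabled in a mask.
# Names are in bit order (bit 0 first), as in the UMD group list.
decode_trace_groups() {
    mask=$1
    names="None Sensor IFace ISP PProc ImgLib CPP HAL JPEG Stats CSL App Utils Sync MemSpy Assert Core HWL Chi DRQ FD IQMod LRME NCS Meta AEC AWB AF"
    bit=0
    out=""
    for name in $names; do
        if [ $(( (mask >> bit) & 1 )) -eq 1 ]; then
            out="$out${out:+ }$name"   # space-separate after the first name
        fi
        bit=$((bit + 1))
    done
    echo "$out"
}
```

`decode_trace_groups 65566` prints `Sensor IFace ISP PProc Core`, because 65566 is 65536 + 16 + 8 + 4 + 2, that is bits 16, 4, 3, 2 and 1.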
2. Capture
python ~/Android/Sdk/platform-tools/systrace/systrace.py -o camera_qidong.html sched freq camera hal
If the captured trace is too short, enlarge the buffer with -b:
python ~/Android/Sdk/platform-tools/systrace/systrace.py -b 30726 -o camera_qidong7.html sched freq camera hal
or use -t; the following captures about 4 s:
python ~/Android/Sdk/platform-tools/systrace/systrace.py -t 4 -o camera_qidong7.html sched freq camera hal gfx input view wm am app dalvik sched freq idle load -a com.android.camera
When debugging you may need to add your own systrace tags; just follow the pattern of the existing trace tags.
Reference: https://blog.csdn.net/oujunli/article/details/16888897
3.perfetto
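A systrace of even 10 seconds becomes too large to open, and problems such as dropped video frames tend to show up only occasionally after a long while, so catching them in the act with systrace is practically impossible. perfetto solves this: it can capture for as long as an hour, and the online viewer opens the result without trouble.
References:
Perfetto工具使用简介 (a short introduction to using Perfetto)
perfetto -Android Develop介绍 (the perfetto page on Android Developers)
Perfetto official site
perfetto is Android's next-generation unified framework for collecting and analyzing traces. It can capture both platform and app traces and is meant to replace systrace, although systrace will remain for historical reasons. A trace captured by Perfetto can also be converted to the systrace view: if you are used to systrace, use Open with legacy UI in the Perfetto UI.
1. First check that the perfetto background processes are running on the phone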
adb logcat -s perfetto
perfetto: service.cc:45 Started traced, listening on /dev/socket/traced_producer /dev/socket/traced_consumer
perfetto: probes.cc:25 Starting /system/bin/traced_probes service
perfetto: probes_producer.cc:32 Connected to the service
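2. Go to the Perfetto UI (https://ui.perfetto.dev/#!/record) and build the command you need
The Perfetto UI page lets you set how long to capture, in what mode, and so on
Under Probes you choose what to capture, such as CPU and GPU information
Finally, the Instructions section gives the command for the settings just made; copy it straight into a Linux terminal and run it.
3. Once the trace reaches the size or duration set in step 2, capture exits automatically. Then adb pull the trace file from /data/misc/perfetto-traces/ on the phone to the PC
4. On the same site, https://ui.perfetto.dev/#!/record, use the Open trace file option to open the local trace file and inspect it. A trace about a minute and a half long opens without trouble, which makes this well suited to reproducing occasional video frame drops.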
4. SimplePerf and flame graphs
References: https://zhuanlan.zhihu.com/p/25277481
https://android.googlesource.com/platform/prebuilts/simpleperf/
https://developer.android.com/ndk/guides/simpleperf?hl=zh-cn
http://www.brendangregg.com/flamegraphs.html
https://github.com/brendangregg/FlameGraph
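Source path: /system/extras/simpleperf/
Scripts path: /system/extras/simpleperf/scripts
Default path on the device: /system/bin/simpleperf
Simpleperf is the profiling tool Google ported from perf to Android. Its command line supports roughly the same options as linux-tools perf, plus many Android-specific improvements. It has three main functions: stat, record and report.
Usage:
Capturing
On the PC:
Quick form: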
python /home/sun/work/code/j6a1-curtana/system/extras/simpleperf/scripts/run_simpleperf_on_device.py kmem record --call-graph fp -f 4000 --duration 10 -o /data/local/tmp/perf.data
Full form:
python /home/sun/work/code/j6a1-curtana/system/extras/simpleperf/scripts/run_simpleperf_on_device.py kmem record --call-graph fp -f 4000 --duration 10 --symfs ~/work/image/curtana_in_global_symbols_20.3.13.root_10.0_96fc05f913_gdcdaf60/out/target/product/curtana/symbols/ -o /data/local/tmp/perf.data
python /home/sun/work/code/j6a1-curtana/system/extras/simpleperf/scripts/run_simpleperf_on_device.py kmem record -p pid -e kmem:kmalloc,kmem:kmem_cache_alloc --call-graph fp -f 4000 --duration 10 -o /data/local/tmp/perf.data
adb pull /data/local/tmp/perf.data ./
Or on the phone:
Quick form:
simpleperf kmem record --call-graph fp -f 4000 --duration 10 -o /data/local/tmp/perf_2.data
Full form:
simpleperf kmem record -p pid -e kmem:kmalloc,kmem:kmem_cache_alloc --call-graph fp -f 4000 --duration 10 -o /data/local/tmp/perf_2.data
Generating the flame graph:
python /home/sun/work/code/j6a1-curtana/system/extras/simpleperf/scripts/report_html.py -i ./perf.data -o ./perf.html
or
python /home/sun/work/code/j6a1-curtana/system/extras/simpleperf/scripts/report_sample.py --symfs ~/work/image/curtana_in_global_symbols_20.3.13.root_10.0_96fc05f913_gdcdaf60/out/target/product/curtana/symbols/ -i perf.data > out.perf
git clone https://github.com/brendangregg/FlameGraph.git
FlameGraph/stackcollapse-perf.pl out.perf > out.folded
FlameGraph/flamegraph.pl out.folded > graph.svg
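Open the resulting perf.html or graph.svg directly in Chrome.
The perf.html generated by report_html.py is recommended, since it includes more intuitive statistics.
The y axis is the call stack; each layer is a function. The deeper the stack, the taller the flame. The top is the function currently executing, and everything beneath it is its ancestors.
The x axis is the sample count. The wider a function is on the x axis, the more often it was sampled, that is, the longer it ran. Note that the x axis does not represent the passage of time: all call stacks are merged and then ordered alphabetically.
Reading a flame graph means looking for which top-level function is widest. Any plateau suggests that the function may have a performance problem.
Colors carry no special meaning; because a flame graph shows how busy the CPU is, warm colors are the usual choice.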
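The "widest top-level function" reading can also be done numerically on the folded stacks, without opening the SVG. The sketch below assumes the usual folded layout written by stackcollapse-perf.pl, one line per stack with frames joined by `;` and the sample count as the last field; `top_frames` is a hypothetical helper name.

```shell
# Hypothetical helper: sum samples per leaf frame (the function that was
# on-CPU) from folded stacks and list the heaviest first.
top_frames() {
    awk '
        {
            count = $NF
            stack = $0
            sub(/ [0-9]+$/, "", stack)     # drop the trailing sample count
            n = split(stack, frames, ";")
            self[frames[n]] += count       # last frame is the top of the flame
        }
        END { for (f in self) printf "%d %s\n", self[f], f }
    ' "$@" | sort -rn
}
```

Running `top_frames out.folded | head` on the file produced above lists the functions that would show up as the widest plateaus.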
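5. Android Studio Profiler
The profile feature in Android Studio tracks how an app process's memory changes, in detail, as well as its CPU usage. It is good enough for watching the camera app's memory trend; dump what you need at the peak and continue debugging from there.
References:
Android studio中android profile(性能分析器)的使用 (using the Android profiler in Android Studio)
Android Studio使用profile简单优雅的查看内存变化 (watching memory changes with the Android Studio profiler)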
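6. Qcom Perfdump tool
Qualcomm's tool can be installed on the phone to display the frame rate, which is handy for checking whether the camera preview reaches its target, and it can capture a systrace along the way.
Usage is described in a slide deck on the Qualcomm website; search for Perfdump Tool Overview.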
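7. CPU dump tool
CPU load and frequency strongly affect camera performance, and sometimes you need to watch them in real time. Here is a shell script that dumps CPU info live:
This script is for MTK CPUs. MTK and Qcom differ slightly in how temperature is read; a Qcom version can be written along the same lines.
Link: https://pan.baidu.com/s/1C9FYIQeZHFRHdDhh-tc_1g access code: fw4v
The CPU info is read mainly from these file nodes: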
atom:/sys/devices/system/cpu/cpu0/cpufreq # ls -l
total 0
-r--r--r-- 1 root root 4096 2020-04-07 14:41 affected_cpus
-r-------- 1 root root 4096 2020-04-07 14:43 cpuinfo_cur_freq
-r--r--r-- 1 root root 4096 2020-04-07 14:57 cpuinfo_max_freq
-r--r--r-- 1 root root 4096 2020-04-07 14:41 cpuinfo_min_freq
-r--r--r-- 1 root root 4096 2020-04-07 14:57 cpuinfo_transition_latency
-r--r--r-- 1 root root 4096 2020-04-07 14:57 related_cpus
-r--r--r-- 1 root root 4096 2020-04-07 14:41 scaling_available_frequencies
-r--r--r-- 1 root root 4096 2020-04-07 14:57 scaling_available_governors
-r--r--r-- 1 root root 4096 2020-04-07 14:57 scaling_cur_freq
-r--r--r-- 1 root root 4096 2020-04-07 14:57 scaling_driver
-rw-rw---- 1 system system 4096 2020-03-31 16:58 scaling_governor
-rw-rw-r-- 1 system system 4096 2020-03-31 16:57 scaling_max_freq
-rw-rw-r-- 1 system system 4096 2020-03-31 16:57 scaling_min_freq
-rw-r--r-- 1 root root 4096 2020-04-07 14:57 scaling_setspeed
drwxr-xr-x 2 root root 0 2020-04-07 14:42 stats
atom:/sys/devices/system/cpu/cpu0/cpufreq #
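Meaning:
cpuinfo_cur_freq: the frequency the CPU is currently running at
cpuinfo_max_freq: the highest frequency the processor can run at (in kHz)
cpuinfo_min_freq: the lowest frequency the processor can run at (in kHz)
cpuinfo_transition_latency: the time the processor needs to switch between two frequencies (in nanoseconds)
scaling_available_frequencies: list of all supported frequencies (in kHz)
scaling_available_governors: all cpufreq governor types supported by the current kernel
scaling_cur_freq: the current CPU frequency as decided by the governor and the cpufreq core, that is, the frequency the kernel believes the CPU is running at
scaling_driver: which cpufreq driver this CPU is using
scaling_governor: the current governor; echo a name into it to change the governor type
scaling_min_freq, scaling_max_freq: the lower and upper limits of the current policy (in kHz). Note that when changing the CPU policy, scaling_max_freq must be set first and scaling_min_freq second
scaling_setspeed: if the "userspace" governor is selected, the CPU can be set to a specific frequency through this file.
The value only needs to lie between scaling_min_freq and scaling_max_freq.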
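Because the author's script is only available through the download link, here is a minimal sketch of the frequency part, reading the scaling_cur_freq node described above for every core. The function name `dump_cpu_freq` and the optional root-directory argument are assumptions added for illustration; temperature is left out because the thermal nodes differ per platform.

```shell
# Hypothetical helper: print the current frequency of every core.
# The optional argument overrides the sysfs root (useful for testing).
dump_cpu_freq() {
    cpu_root=${1:-/sys/devices/system/cpu}
    for dir in "$cpu_root"/cpu[0-9]*; do
        node="$dir/cpufreq/scaling_cur_freq"
        [ -r "$node" ] || continue      # skip cores without a readable node
        echo "${dir##*/} $(cat "$node") kHz"
    done
}
```

Placed in a script on the device and called in a loop such as `while true; do dump_cpu_freq; sleep 1; done`, it gives a rolling view of whether the big cores are being clocked down while the camera runs.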
8. CpuFloat & PerfMon+ Apk
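cpufloat and PerfMon+ are both very handy notification-bar apps that monitor the phone's CPU and GPU frequency and temperature.
Search for them on Baidu or download from here:
Link: https://pan.baidu.com/s/1WxScYtAPOY9Gkjv2JkyrtA access code: 9zgn
cpufloat:
Granting the app sufficient permissions lets it read more information.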
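9. Tool for checking frame loss in a video file
A script that uses ffmpeg to parse the video file and count dropped frames from the PTS timestamps:
It requires an ffmpeg environment. The source follows; save it as an sh file and run it: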
#!/bin/bash
#######################################################################################
# 2019.10.24
#
# Five ways to use it
# 1. Analyze the newest video file in the phone's camera album
#    ./as_frameloss.sh
# 2. Analyze a video file at a given path on the PC
#    ./as_frameloss.sh pc_video_file_path
#    example:
#    ./as_frameloss.sh /home/chengang/Documents/g7b/zhenlv/VID_20191022_194835_HSR_240.mp4
#
# 3. Pull the named (newly recorded) video file into a given directory and analyze it
#    ./as_frameloss.sh pc_tmp_dir video_file_name
#    example:
#    ./as_frameloss.sh /home/chengang/Documents/g7b/zhenlv/ VID_20191022_194835_HSR_240.mp4
#
# 4. Pull the named (newly recorded) video file into the default directory and analyze it
#    ./as_frameloss.sh VID_20191022_194835_HSR_240.mp4
#
# 5. Use the default pull directory and delete the video file on the PC after analysis
#    ./as_frameloss.sh -d video_file_name
#    ./as_frameloss.sh -d VID_20191022_194835_HSR_240.mp4
########################################################################################
TMP_PTS_FILE="pkt_pts_time_tmp.txt"
VIDEO_REAL_FPS=0 # actual frame rate of the video file
FPS_DEMANDED=0 # frame rate the video file is required to reach
VIDEO_FILE_TEMP_DIR="/home/${USER}/as_frameloss_video_files"
analysis_out_fps_from_single_file(){
    path_2_file_name=$1
    # Pick the avg_frame_rate line out of ffprobe's JSON output
    avg_frame_rate=$(ffprobe -select_streams v -v quiet -print_format json -show_format -show_streams "$path_2_file_name" | awk '/avg_frame_rate/{print $0}')
    str=$(echo $avg_frame_rate | awk -F "[:]" '/avg_frame_rate/{print$2}')
    # At this point str looks like '"39060000/1362709",'
    # Strip the leading double quote
    str1=$(echo ${str#*\"})
    # Then strip the trailing double quote and the comma after it
    final_str=$(echo ${str1%\"*})
    # Split on "/" and take the first value
    first_value=`echo $final_str|awk -F "[/]" '{print$1}' `
    # Take the second value
    second_value=`echo $final_str|awk -F "[/]" '{print$2}' `
    #echo $first_value
    #echo $second_value
    # Print the frame rate rounded to two decimals, matching what Windows shows
    VIDEO_REAL_FPS=`awk 'BEGIN{printf "%.2f\n",'$first_value'/'$second_value'}'`
}
analysis_frameloss(){
    pc_tmp_video_file_full_path=$1
    # Directory that holds the video file (dirname also handles a bare file name)
    path_of_video_dir=$(dirname "$pc_tmp_video_file_full_path")
    path_of_tmp_file=$path_of_video_dir/$TMP_PTS_FILE
    str_result=""
    time1=$(date)
    str_result=`ffprobe -show_frames -select_streams v "$pc_tmp_video_file_full_path" | grep pkt_pts_time`
    echo "$str_result" > "$path_of_tmp_file"
    time2=$(date)
    frame_count=0
    string_final=""
    last_frame=0
    # Frames expected in each 0.1 s window
    frame_count_demanded_per_0_1s=`expr $FPS_DEMANDED / 10`
    echo "******************************************************************"
    echo " "
    echo "Frameloss Analysis "
    echo " "
    echo "File path     : $pc_tmp_video_file_full_path"
    echo "Required fps  : $FPS_DEMANDED"
    echo "Actual fps    : $VIDEO_REAL_FPS"
    echo "Frame loss    : "
    while read line
    do
        # Drop the 13-character "pkt_pts_time=" prefix
        time_str=${line:13}
        tmp_str=`echo ${time_str::-5}` # drop the last 5 characters, leaving tenths of a second
        let time_str_final=`echo 10#$tmp_str | sed 's/\.\+//g'` # force base 10 so leading zeros are not read as octal
        if [ $time_str_final == $last_frame ];then
            let "frame_count++"
        else
            loss_frame_no=`expr $frame_count_demanded_per_0_1s - $frame_count`
            denominator=10
            #res=`echo "scale=1; $last_frame/$denominator" | bc`
            res=$(printf "%.1f" `echo "scale=1;$last_frame/$denominator"|bc`)
            if [ $loss_frame_no != 0 ];then
                echo "at ${res}s ---> $loss_frame_no"
            #else
            #    echo "at ${res}s ---> "
            fi
            # Start counting a new 0.1 s window
            last_frame=$time_str_final
            frame_count=1
        fi
        if [ "$string_final" == "" ];then
            string_final="$time_str_final"
        else
            string_final="$string_final;$time_str_final"
        fi
        #echo $string_final
    done < "$path_of_tmp_file"
    # Note: the final window, usually a partial one, is not reported
    time3=$(date)
}
figout_fps_demanded(){
local_fps=$1
real_int_fps=`echo $local_fps | awk -F '\.' '{print $1}'`
echo "figout_fps_demanded-local_fps:real_int_fps[$local_fps:$real_int_fps]"
if [[ $real_int_fps -gt 150 ]];then
FPS_DEMANDED=240
elif [[ $real_int_fps -gt 80 ]];then
FPS_DEMANDED=120
elif [[ $real_int_fps -gt 40 ]];then
FPS_DEMANDED=60
else
FPS_DEMANDED=30
fi
}
file_value="$1"
pc_video_file_path=""
pc_tmp_video_dir=""
device_video_file_path=""
device_video_file_dir="/storage/emulated/0/DCIM/Camera"
param_count=$#
need_to_delete=0
need_to_adb_pull=0
if [ $param_count == 2 ]; then
    if [ "$1" == "-d" ]; then
        need_to_delete=1
        need_to_adb_pull=1
        device_video_file_path=$device_video_file_dir/$2
        pc_tmp_video_dir=$VIDEO_FILE_TEMP_DIR
        pc_video_file_path=$pc_tmp_video_dir/$2
    else
        need_to_adb_pull=1
        pc_tmp_video_dir=$1
        device_video_file_path=$device_video_file_dir/$2
        pc_video_file_path=$pc_tmp_video_dir/$2
    fi
elif [ $param_count == 1 ]; then
    if [ -f "$1" ]; then
        pc_video_file_path=$1
    else
        need_to_adb_pull=1
        device_video_file_path=$device_video_file_dir/$1
        pc_tmp_video_dir=$VIDEO_FILE_TEMP_DIR
        pc_video_file_path=$pc_tmp_video_dir/$1
    fi
elif [ $param_count == 0 ]; then
    need_to_adb_pull=1
    # Find the newest video file in the camera album
    device_video_newest_file_name=`adb shell ls -lt /sdcard/DCIM/Camera | grep mp4 | head -n 1 |awk '{print $8}'`
    device_video_file_path=$device_video_file_dir/$device_video_newest_file_name
    pc_tmp_video_dir=$VIDEO_FILE_TEMP_DIR
    pc_video_file_path=$pc_tmp_video_dir/$device_video_newest_file_name
fi
# Copy the file from the phone to the PC if needed
if [ $need_to_adb_pull == 1 ]; then
    echo "device_video_file_path:$device_video_file_path"
    echo "pc_tmp_video_dir:$pc_tmp_video_dir"
    if [ ! -d "$pc_tmp_video_dir" ];then
        mkdir -p "$pc_tmp_video_dir"
    fi
    adb pull "$device_video_file_path" "$pc_tmp_video_dir"
fi
analysis_out_fps_from_single_file "$pc_video_file_path"
figout_fps_demanded $VIDEO_REAL_FPS
analysis_frameloss "$pc_video_file_path"
# Delete the video file on the PC if requested
if [ $need_to_delete == 1 ]; then
    rm -fr "$pc_video_file_path"
fi
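Result:
For each 0.1 s window that falls short, the script prints the time and the number of missing frames.
In practice this output shows whether consecutive frame loss is severe, and that can serve as a metric.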
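To look at consecutive loss directly rather than per 0.1 s window, the gaps between neighbouring timestamps can be measured instead. This is a sketch of a different technique from the script above, not a replacement for it: `pts_gaps` is a hypothetical helper, and it assumes input lines of the form `pkt_pts_time=0.033333` (the pattern also accepts `pts_time=`, on the assumption that some ffprobe versions print the field under that name).

```shell
# Hypothetical helper: report gaps between consecutive frame timestamps.
# Usage: pts_gaps <expected_fps> [file...]   (reads stdin without files)
pts_gaps() {
    fps=${1:?usage: pts_gaps <expected_fps> [file...]}
    shift
    awk -F= -v fps="$fps" '
        /^(pkt_)?pts_time=/ {
            t = $2 + 0
            if (seen) {
                gap = t - prev
                # A gap of k frame intervals means k-1 frames are missing
                lost = int(gap * fps + 0.5) - 1
                if (lost > 0) printf "%.3f lost=%d\n", prev, lost
            }
            prev = t; seen = 1
        }
    ' "$@"
}
```

Each output line gives the timestamp of the last frame before a gap and the estimated number of frames missing after it; the temporary pkt_pts_time_tmp.txt file written by the script above can be fed to it as is, for example `pts_gaps 30 pkt_pts_time_tmp.txt`.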
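10. perflock and thermal debug
Sometimes you need to rule out perflock or thermal limits as the cause. Try turning the perflock or thermal mechanism off to verify; how to do that may depend on the platform.
On Qualcomm, stop the perflock-related services: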
adb shell stop perfservice
adb shell stop vendor.perfservice
//These may not exist. If you do not know which services there are, run ps -ef | grep perf to find the perflock-related service processes and stop them.
adb shell stop perf-hal-1-0
adb shell stop perf-hal-2-0
To disable thermal, likewise use ps to find the thermal-related processes and stop them; the thermal mechanism then no longer takes effect.
adb shell stop thermalservice
adb shell stop thermal-engine
adb shell stop vendor.thermal-engine
adb shell stop vendor.thermal-hal-1-0
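Sometimes you need to set the CPU maximum frequency by hand to see the effect. Every platform vendor provides a way to do this; just ask, since it may differ between platforms.
For example, the qcom document Common Performance Issues Debugging Guide gives: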
Issue verification by putting CPU in performance mode
For 8939/8952:
adb wait-for-device root
adb wait-for-device
adb shell setenforce 0
adb shell stop thermal-engine
adb shell "echo 1 > /sys/devices/system/cpu/cpu0/online"
adb shell "echo 1 > /sys/devices/system/cpu/cpu1/online"
adb shell "echo 1 > /sys/devices/system/cpu/cpu2/online"
adb shell "echo 1 > /sys/devices/system/cpu/cpu3/online"
adb shell "echo 1 > /sys/devices/system/cpu/cpu4/online"
adb shell "echo 1 > /sys/devices/system/cpu/cpu5/online"
adb shell "echo 1 > /sys/devices/system/cpu/cpu6/online"
adb shell "echo 1 > /sys/devices/system/cpu/cpu7/online"
adb shell "echo performance > /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor"
adb shell "echo performance > /sys/devices/system/cpu/cpu4/cpufreq/scaling_governor"
For 8909:
adb wait-for-device root
adb wait-for-device
adb shell setenforce 0
adb shell stop thermal-engine
adb shell "echo 1 > /sys/devices/system/cpu/cpu0/online"
adb shell "echo 1 > /sys/devices/system/cpu/cpu1/online"
adb shell "echo 1 > /sys/devices/system/cpu/cpu2/online"
adb shell "echo 1 > /sys/devices/system/cpu/cpu3/online"
adb shell "echo performance > /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor"
Similarly, Qcom GPUs have a comparable performance mode:
3.6 Issue verification by when GPU is in performance mode
If the below steps, resolve the issue, check from GPU bandwidth voting info from dtsi files.
Check if there any heavy GL calls during the use case using Adreno profiler.
Check if there any Un-resolves during the use case by enabling resolve logs.
Check if GPU is not getting data in time and GPU is going to sleep from systrace.
For 8939/8952:
adb shell "echo 1 > /sys/class/kgsl/kgsl-3d0/force_rail_on"
adb shell "echo 1 > /sys/class/kgsl/kgsl-3d0/force_clk_on"
adb shell "echo 1 > /sys/class/kgsl/kgsl-3d0/force_bus_on"
adb shell "echo 10000000 > /sys/class/kgsl/kgsl-3d0/idle_timer"
adb shell "echo performance > /sys/class/kgsl/kgsl-3d0/devfreq/governor"
adb shell "echo 550000000 > /sys/class/kgsl/kgsl-3d0/gpuclk"
For 8909:
adb shell "echo 1 > /sys/class/kgsl/kgsl-3d0/force_rail_on"
adb shell "echo 1 > /sys/class/kgsl/kgsl-3d0/force_clk_on"
adb shell "echo 1 > /sys/class/kgsl/kgsl-3d0/force_bus_on"
adb shell "echo 10000000 > /sys/class/kgsl/kgsl-3d0/idle_timer"
adb shell "echo performance > /sys/class/kgsl/kgsl-3d0/devfreq/governor"
adb shell "echo 409600000 > /sys/class/kgsl/kgsl-3d0/gpuclk"
3.7 Issue verification by putting GPU and CPU in performance mode
Please follow steps 3.5 and 3.6 to put the device in Performance mode.
If this step solves the issue, check the CPU and GPU BW governor, scaling and scheduler parameters
Check if there are fence waits in the application rendering thread from systrace.
Check by increasing the app buffers to 4 in build.prop (hw.sf.app_buff_count=4).
If increasing the app buffers solves the issue, then issue has to be checked from DISPLAY side for panel settings and other mdp parameters.
11. Memory monitoring
You can write your own script to monitor memory changes. The memory information can come from the following sources:
11.1 procrank
11.2 top
Reference: [Linux-Android][Log] Top命令打印含义 (what the output of the top command means)
11.3 ION内存信息
Parse whichever of these exists on the device:
/sys/kernel/debug/ion/heaps/system
or
/sys/kernel/debug/dma_buf/ [ bufinfo | dmaprocs ]
11.4 Gpu内存信息
Parse whichever of these exists on the device:
/sys/class/kgsl/kgsl/page_alloc
/sys/class/kgsl/kgsl/pagetables/$pid/mapped
/sys/kernel/debug/kgsl/proc/$pid/mem
I am not yet familiar with ION and GPU memory; I will add more once I have sorted it out. This part is quite important.
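As a starting point for such a monitoring script, the sketch below samples system-wide available memory. It reads /proc/meminfo, which is not one of the sources listed above and is used here as an assumption because its format is stable; the ION and kgsl nodes can be added in the same style once their layout on the target device is known. `mem_available_kb` is a hypothetical helper name.

```shell
# Hypothetical helper: print MemAvailable in kB.
# The optional argument is a meminfo-format file; "-" reads stdin.
mem_available_kb() {
    awk '$1 == "MemAvailable:" { print $2 }' "${1:-/proc/meminfo}"
}
```

From the PC, `adb shell cat /proc/meminfo | mem_available_kb -` takes one sample; wrapping that in a loop with `sleep 1` and a `date` prefix gives a simple log of memory pressure while the camera runs.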
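A GPU memory leak makes the UI stutter. The GPU drives the display, so slow GPU memory allocation slows rendering and the user perceives jank.
For a GPU memory leak, first look at the type of memory being leaked:
Qualcomm devices:
Then add callback traces to locate the leak, or find which large image is leaking GPU memory. To inspect textures, use the tool below.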
12.GAPID
It lets you see what each texture looks like, which makes it easy to spot an abnormal one.
Reference: http://www.gcsjj.cn/articles/2019/06/04/1559658578252.html
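III. Background knowledge to build up
Camera performance debugging rests on a lot of background knowledge. It is listed here as a reminder to keep filling the gaps:
Topic | Notes | References |
---|---|---|
1.Qcom Cpu Perflock | Sound CPU scheduling, boosting at critical moments, avoiding core contention and core pinning, and using big and little cores evenly and densely can noticeably improve camera performance in specific scenarios | qcom documents: 80-nr256-2_d_mpctl_feature.pdf 80-nt384-2_c_perflock_in_android_o.pdf Baidu netdisk: Perflock框架原理与应用分析-v90-20200424_200315.pdf |
2.Linux process scheduling | Important | Linux books |
3. Linux memory management | Understanding how dma_buf, ION and GPU memory are managed and allocated is important | |
4. FFmpeg development | Important | |
5.Camera App Framework HAL source architecture | Qcom Camx has many rough edges; you need to know the code architecture well to modify and improve it with confidence | |
6.GPU | The GPU is central to display; understanding how it works, or its code framework, helps a great deal in improving camera preview or app frame rate | |
7.Graphic Debug | ||
.... | To be added later |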