CTS/GTS Problem Analysis 9
A failure ultimately caused by a kernel-layer driver bug. The road to the root cause was fairly winding, so it is worth writing down.
First Look at the Problem
Test command: run cts -m CtsHardwareTestCases -t android.hardware.input.cts.tests.GamepadTestCase#testButtonA
The failure report:
<Module name="CtsHardwareTestCases" abi="arm64-v8a" runtime="241587" done="true" pass="0">
<TestCase name="android.hardware.input.cts.tests.GamepadTestCase">
<Test result="fail" name="testButtonA">
<Failure message="java.lang.AssertionError: expected:<1> but was:<0> ">
<StackTrace>java.lang.AssertionError: expected:<1> but was:<0>
at org.junit.Assert.fail(Assert.java:88)
at org.junit.Assert.failNotEquals(Assert.java:834)
at org.junit.Assert.assertEquals(Assert.java:645)
at org.junit.Assert.assertEquals(Assert.java:631)
at android.hardware.input.cts.tests.InputTestCase.assertReceivedKeyEvent(InputTestCase.java:145)
at android.hardware.input.cts.tests.GamepadTestCase.testButtonA(GamepadTestCase.java:41)
The test case code:
32 @Test
33 public void testButtonA() throws Exception {
34 registerInputDevice(R.raw.gamepad_register_device);
35
36 sendHidCommands(R.raw.gamepad_button_a_down);
37 sendHidCommands(R.raw.gamepad_delay);
38 assertReceivedKeyEvent(KeyEvent.ACTION_DOWN, KeyEvent.KEYCODE_BUTTON_A);
39
40 sendHidCommands(R.raw.gamepad_button_a_up);
41 assertReceivedKeyEvent(KeyEvent.ACTION_UP, KeyEvent.KEYCODE_BUTTON_A);
42
43 assertNoMoreEvents();
44 }
From this the flow is roughly: register a HID device named Odie (Test); send a down event, delay 10 ms, and check that the test activity received the down event; then send an up event and check that the test activity received the up event.
The assertion failed because when the up event (ACTION_UP = 1) should have arrived, the activity was still seeing the down event (ACTION_DOWN = 0), hence expected:<1> but was:<0>.
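For reference, the R.raw resources above (gamepad_register_device, gamepad_button_a_down, gamepad_delay, gamepad_button_a_up) are JSON command streams fed to the hid shell command. A rough sketch of what such a stream plausibly looks like, with field names inferred from the Event parser and process() code quoted later in this analysis; every byte value here is a placeholder, not the real gamepad descriptor or report:

```json
{"id": 1, "command": "register", "name": "Odie (Test)", "vid": 1234, "pid": 5678, "descriptor": [5, 1, 9, 5]}
{"id": 1, "command": "report", "report": [1, 127, 127, 127, 127]}
{"id": 1, "command": "delay", "duration": 10}
```

Each line is one Event: "register" creates the uhid device, "report" injects a HID input report, and "delay" postpones the time at which the next report is sent.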
问题分析
(a) Analysis step 1:
The first instinct was to suspect the test's timing: could the down event be treated by the test activity as a long press (key repeat)? That is easy to verify with a small tweak to the case:
206 private class InputListener implements InputCallback {
207 @Override
208 public void onKeyEvent(KeyEvent ev) {
209 boolean done = false;
210 do {
211 try {
212 //mKeys.put(new KeyEvent(ev));
// changed to:
if(ev.getRepeatCount()==0){
mKeys.put(new KeyEvent(ev));
}
213 done = true;
214 } catch (InterruptedException ignore) { }
215 } while (!done);
216 }
The case still failed, but the error changed to never receiving the up event at all. This proves that the CTS test activity InputCtsActivity never received an up event. To confirm from another angle, before running the test we watched /dev/input with getevent -ltr and saw:
Failing run (F1):
add device 8: /dev/input/event7
name: "Odie (Test)"
[ 24251.158735] /dev/input/event7: EV_LED LED_NUML 00000001
[ 24251.158735] /dev/input/event7: EV_MSC MSC_SCAN 00090001
[ 24251.158735] /dev/input/event7: EV_KEY BTN_GAMEPAD DOWN
[ 24251.158735] /dev/input/event7: EV_ABS ABS_X 0000007f
[ 24251.158735] /dev/input/event7: EV_ABS ABS_Y 0000007f
[ 24251.158735] /dev/input/event7: EV_ABS ABS_Z 0000007f
[ 24251.158735] /dev/input/event7: EV_ABS ABS_RZ 0000007f
[ 24251.158735] /dev/input/event7: EV_SYN SYN_REPORT 00000000 rate 0
(case hangs here; the case is forcibly terminated)
[ 24416.424598] /dev/input/event7: EV_KEY BTN_GAMEPAD UP
[ 24416.424612] /dev/input/event7: EV_SYN SYN_REPORT 00000001 rate 0
remove device 8: /dev/input/event7
Passing run (F5):
add device 8: /dev/input/event7
name: "Odie (Test)"
[ 9500.572331] /dev/input/event7: EV_LED LED_NUML 00000001
[ 9500.572331] /dev/input/event7: EV_MSC MSC_SCAN 00090001 // HID usage code
[ 9500.572331] /dev/input/event7: EV_KEY BTN_GAMEPAD DOWN
[ 9500.572331] /dev/input/event7: EV_ABS ABS_X 0000007f
[ 9500.572331] /dev/input/event7: EV_ABS ABS_Y 0000007f
[ 9500.572331] /dev/input/event7: EV_ABS ABS_Z 0000007f
[ 9500.572331] /dev/input/event7: EV_ABS ABS_RZ 0000007f
[ 9500.572331] /dev/input/event7: EV_SYN SYN_REPORT 00000000 (report) // marks the end of one key event batch
[ 9500.582252] /dev/input/event7: EV_MSC MSC_SCAN 00090001
[ 9500.582252] /dev/input/event7: EV_KEY BTN_GAMEPAD UP
[ 9500.582252] /dev/input/event7: EV_SYN SYN_REPORT 00000000 rate 100
remove device 8: /dev/input/event7
So the up event was never received while the case was running; only once the case failed and the test activity InputCtsActivity paused was the up event finally emitted:
11-07 03:03:58.819 30370 30370 D LifecycleMonitor: Lifecycle status change: android.hardware.input.cts.InputCtsActivity@f112acb in: PAUSED
11-07 03:03:58.820 4792 17838 D ActivityTrigger: activityResumeTrigger: The activity in ApplicationInfo{9d04809 com.android.launcher3} is now in focus and seems to be in full-screen mode
11-07 03:03:58.820 4792 17838 E ActivityTrigger: activityResumeTrigger: not whiteListedcom.android.launcher3/com.android.launcher3.Launcher/28
11-07 03:03:58.824 30370 30370 W ViewRootImpl[InputCtsActivity]: Cancelling event due to no window focus: KeyEvent { action=ACTION_UP, keyCode=KEYCODE_BUTTON_A, scanCode=304, metaState=0, flags=0xa8, repeatCount=0, eventTime=12083079, downTime=12082572, deviceId=14, source=0x501 }
(b) Analysis step 2:
Since the up event was never received, first check whether the underlying driver delivered it at all, by adding some logs in input_event(), which every input event coming up from the kernel must pass through:
http://guard.pt.miui.com/opengrok/xref/v10-p-cepheus-dev/kernel/msm-4.14/drivers/input/input.c#510
void input_event(struct input_dev *dev,
unsigned int type, unsigned int code, int value)
{
unsigned long flags;
printk("weijuncheng input_event, dev = %s\n",dev->name);
printk("weijuncheng input_event, type = %x\n",type);
printk("weijuncheng input_event, code = %x\n",code);
printk("weijuncheng input_event, value = %x\n",value);
printk("---------------------------------------------------->a3");
dump_stack();
printk("---------------------------------------------------->a4");
if (is_event_supported(type, dev->evbit, EV_MAX)) {
spin_lock_irqsave(&dev->event_lock, flags);
printk("---------------------------------------------------->a5");
printk("weijuncheng input_event, dev = %s\n",dev->name);
printk("weijuncheng input_event, type = %x\n",type);
printk("weijuncheng input_event, code = %x\n",code);
printk("weijuncheng input_event, value = %x\n",value);
input_handle_event(dev, type, code, value);
printk("---------------------------------------------------->a6");
spin_unlock_irqrestore(&dev->event_lock, flags);
#ifdef CONFIG_LAST_TOUCH_EVENTS
touch_press_release_events_collect(dev, type, code, value);
#endif
}
}
EXPORT_SYMBOL(input_event);
Relevant log:
11-08 03:16:25.894 root 0 0 I : ---------------------------------------------------->a3
11-08 03:16:25.894 root 0 0 I : CPU: 1 PID: 4803 Comm: HidDeviceHandle Tainted: G S 4.14.76-perf-g92d7574-dirty #3
11-08 03:16:25.894 root 0 0 I Hardware name: Qualcomm Technologies, Inc. SM8150 V2 PM8150 CEPHEUS (DT)
11-08 03:16:25.894 root 0 0 I Call trace:
11-08 03:16:25.894 root 0 0 I : dump_backtrace+0x0/0x188
11-08 03:16:25.894 root 0 0 I : show_stack+0x18/0x20
11-08 03:16:25.894 root 0 0 I : dump_stack+0xc4/0x100
11-08 03:16:25.894 root 0 0 I : input_event+0x78/0x268
11-08 03:16:25.894 root 0 0 I : hidinput_hid_event+0x498/0x530 // turns HID data into events the input subsystem understands
11-08 03:16:25.894 root 0 0 I : hid_process_event+0x12c/0x168
11-08 03:16:25.894 root 0 0 I : hid_report_raw_event+0x540/0x5d8
11-08 03:16:25.894 root 0 0 I : hid_input_report+0x178/0x1a8
11-08 03:16:25.894 root 0 0 I : uhid_char_write+0x390/0x5b8
11-08 03:16:25.894 root 0 0 I : __vfs_write+0x44/0x138 // input data written down from user space
11-08 03:16:25.894 root 0 0 I : vfs_write+0xc8/0x188
11-08 03:16:25.894 root 0 0 I : SyS_write+0x54/0xb8
11-08 03:16:25.894 root 0 0 I : el0_svc_naked+0x34/0x38
11-08 03:16:25.894 root 0 0 I : ---------------------------------------------------->a4
11-08 03:16:25.894 root 0 0 I : ---------------------------------------------------->a5
11-08 03:16:25.894 root 0 0 I : weijuncheng input_event, dev = Odie (Test)
11-08 03:16:25.894 root 0 0 I : weijuncheng input_event, type = 4
11-08 03:16:25.894 root 0 0 I : weijuncheng input_event, code = 4
11-08 03:16:25.894 root 0 0 I : weijuncheng input_event, value = 90001
11-08 03:16:25.894 root 0 0 I : ---------------------------------------------------->a6
In the end there was no trace of an up input event, while the down event was delivered normally. A fault inside the kernel event-delivery path is therefore unlikely; the first suspicion is that the up event never left user space at all. So look at the transmission path.
The case injects events through the hid command; the relevant code is at http://guard.pt.miui.com/opengrok/xref/v10-p-cepheus-dev/frameworks/base/cmds/hid/src/com/android/commands/hid/Hid.java#46
The key code, condensed:
93 private void process(Event e) {
94 final int index = mDevices.indexOfKey(e.getId());
95 if (index >= 0) {
96 Device d = mDevices.valueAt(index);
97 if (Event.COMMAND_DELAY.equals(e.getCommand())) {
98 d.addDelay(e.getDuration());
99 } else if (Event.COMMAND_REPORT.equals(e.getCommand())) {
100 d.sendReport(e.getReport());
101 } else {
102 if (Event.COMMAND_REGISTER.equals(e.getCommand())) {
103 error("Device id=" + e.getId() + " is already registered. Ignoring event.");
104 } else {
105 error("Unknown command \"" + e.getCommand() + "\". Ignoring event.");
106 }
107 }
108 } else if (Event.COMMAND_REGISTER.equals(e.getCommand())) {
109 registerDevice(e);
110 } else {
111 Log.e(TAG, "Unknown device id specified. Ignoring event.");
112 }
113 }
frameworks/base/cmds/hid/src/com/android/commands/hid/Device.java
72 public void sendReport(byte[] report) {
73 Message msg = mHandler.obtainMessage(MSG_SEND_REPORT, report);
74 // if two messages are sent at identical time, they will be processed in order received
75 mHandler.sendMessageAtTime(msg, mTimeToSend);
76 }
77
78 public void addDelay(int delay) {
79 mTimeToSend = Math.max(SystemClock.uptimeMillis(), mTimeToSend) + delay;
80 }
100 @Override
101 public void handleMessage(Message msg) {
102 switch (msg.what) {
103 case MSG_OPEN_DEVICE:
104 SomeArgs args = (SomeArgs) msg.obj;
105 mPtr = nativeOpenDevice((String) args.arg1, args.argi1, args.argi2, args.argi3,
106 (byte[]) args.arg2, new DeviceCallback());
107 pauseEvents();
108 break;
109 case MSG_SEND_REPORT:
110 if (mPtr != 0) {
111 nativeSendReport(mPtr, (byte[]) msg.obj);
112 } else {
113 Log.e(TAG, "Tried to send report to closed device.");
114 }
115 break;
116 case MSG_CLOSE_DEVICE:
...
128 default:
129 throw new IllegalArgumentException("Unknown device message");
130 }
131 }
frameworks/base/cmds/hid/jni/com_android_commands_hid_Device.cpp
240static JNINativeMethod sMethods[] = {
241 { "nativeOpenDevice",
242 "(Ljava/lang/String;III[B"
243 "Lcom/android/commands/hid/Device$DeviceCallback;)J",
244 reinterpret_cast<void*>(openDevice) },
245 { "nativeSendReport", "(J[B)V", reinterpret_cast<void*>(sendReport) },
246 { "nativeCloseDevice", "(J)V", reinterpret_cast<void*>(closeDevice) },
247};
222static void sendReport(JNIEnv* env, jclass /* clazz */, jlong ptr, jbyteArray rawReport) {
223 size_t size;
224 std::unique_ptr<uint8_t[]> report = getData(env, rawReport, size);
225 uhid::Device* d = reinterpret_cast<uhid::Device*>(ptr);
226 if (d) {
227 d->sendReport(report.get(), size);
228 } else {
229 LOGE("Could not send report, Device* is null!");
230 }
231}
159void Device::sendReport(uint8_t* report, size_t reportSize) {
160 struct uhid_event ev;
161 memset(&ev, 0, sizeof(ev));
162 ev.type = UHID_INPUT2;
163 ev.u.input2.size = reportSize;
164 memcpy(&ev.u.input2.data, report, reportSize);
165	ssize_t ret = TEMP_FAILURE_RETRY(::write(mFd, &ev, sizeof(ev))); // the event is written into the kernel here
166 if (ret < 0 || ret != sizeof(ev)) {
167 LOGE("Failed to send hid event: %s", strerror(errno));
168 }
169}
Logging showed that handleMessage was never invoked for the up event, so the up event was never passed downward. The analysis moves back up to the framework layer: why was the message not handled?
(c) Analysis step 3:
To go further we need a rough understanding of Handler. In short: one thread posts messages to another thread that owns a Looper, and the Looper thread picks Messages off the queue in its loop().
That leaves only two possibilities: 1. the message was never enqueued; 2. it was enqueued but never processed.
Consider the first. Here the message is posted with sendMessageAtTime; the rough flow:
frameworks/base/core/java/android/os/Handler.java
689 public boolean sendMessageAtTime(Message msg, long uptimeMillis) {
690 MessageQueue queue = mQueue;
691 if (queue == null) {
692 RuntimeException e = new RuntimeException(
693 this + " sendMessageAtTime() called with no mQueue");
694 Log.w("Looper", e.getMessage(), e);
695 return false;
696 }
697 return enqueueMessage(queue, msg, uptimeMillis);
698 }
740 private boolean enqueueMessage(MessageQueue queue, Message msg, long uptimeMillis) {
741 msg.target = this;
742 if (mAsynchronous) {
743 msg.setAsynchronous(true);
744 }
745 return queue.enqueueMessage(msg, uptimeMillis);
746 }
frameworks/base/core/java/android/os/MessageQueue.java
boolean enqueueMessage(Message msg, long when) {
537 if (msg.target == null) {
538 throw new IllegalArgumentException("Message must have a target.");
539 }
540 if (msg.isInUse()) {
541 throw new IllegalStateException(msg + " This message is already in use.");
542 }
543
544 synchronized (this) {
545 if (mQuitting) {
546 IllegalStateException e = new IllegalStateException(
547 msg.target + " sending message to a Handler on a dead thread");
548 Log.w(TAG, e.getMessage(), e);
549 msg.recycle();
550 return false;
551 }
552
553 msg.markInUse();
554 msg.when = when;
555 Message p = mMessages;
556 boolean needWake;
557 if (p == null || when == 0 || when < p.when) {
558 // New head, wake up the event queue if blocked.
559 msg.next = p;
560 mMessages = msg;
561                needWake = mBlocked; // the new message becomes the head (queue empty or earliest deadline): wake the queue if it is blocked
562 } else {
563 // Inserted within the middle of the queue. Usually we don't have to wake
564 // up the event queue unless there is a barrier at the head of the queue
565 // and the message is the earliest asynchronous message in the queue.
566                needWake = mBlocked && p.target == null && msg.isAsynchronous(); // other messages already queued: wake only in the special case of an asynchronous message behind a sync barrier
567 Message prev;
568 for (;;) {
569 prev = p;
570 p = p.next;
571 if (p == null || when < p.when) {
572 break;
573 }
574 if (needWake && p.isAsynchronous()) {
575 needWake = false;
576 }
577 }
578 msg.next = p; // invariant: p == prev.next
579                prev.next = msg; // insert at the proper position
580 }
581
582 // We can assume mPtr != 0 because mQuitting is false.
583 if (needWake) {
584                nativeWake(mPtr); // wake the queue
585 }
586 }
587 return true;
588 }
Note that enqueueMessage returns whether enqueueing succeeded, so the check is simple: log the return value in the case. It turned out enqueueing succeeded. That leaves the second possibility: the message is in the queue but never taken out and executed. That flow is in http://guard.pt.miui.com/opengrok/xref/v10-p-cepheus-dev/frameworks/base/core/java/android/os/Looper.java#137:
137 public static void loop() {
138 final Looper me = myLooper();
...
142 final MessageQueue queue = me.mQueue;
143
...
158
159 for (;;) {
160 Message msg = queue.next(); // might block
...
192 try {
193 msg.target.dispatchMessage(msg);
194 dispatchEnd = needEndTime ? SystemClock.uptimeMillis() : 0;
195 } finally {
196 if (traceTag != 0) {
197 Trace.traceEnd(traceTag);
198 }
199 }
...
233 msg.recycleUnchecked();
234 }
235 }
So it must be stuck at Message msg = queue.next(); // might block. That also explains why the test hangs and stops producing logs after the failure, until the CTS framework's overall timeout fires. Now look at MessageQueue.next():
Message next() {
311 // Return here if the message loop has already quit and been disposed.
312 // This can happen if the application tries to restart a looper after quit
313 // which is not supported.
314 final long ptr = mPtr;
315 if (ptr == 0) {
316 return null;
317 }
318
319 int pendingIdleHandlerCount = -1; // -1 only during first iteration
320 int nextPollTimeoutMillis = 0;
321 for (;;) {
322 if (nextPollTimeoutMillis != 0) {
323 Binder.flushPendingCommands();
324 }
325
326            nativePollOnce(ptr, nextPollTimeoutMillis); // with nextPollTimeoutMillis == -1 (empty queue) this waits until woken by the next enqueue
327
328 synchronized (this) {
329 // Try to retrieve the next message. Return if found.
330 final long now = SystemClock.uptimeMillis();
331 Message prevMsg = null;
332 Message msg = mMessages;
333 if (msg != null && msg.target == null) {
334 // Stalled by a barrier. Find the next asynchronous message in the queue.
335 do {
336 prevMsg = msg;
337 msg = msg.next;
338 } while (msg != null && !msg.isAsynchronous());
339 }
340 if (msg != null) {
341 if (now < msg.when) {
342 // Next message is not ready. Set a timeout to wake up when it is ready.
343                        nextPollTimeoutMillis = (int) Math.min(msg.when - now, Integer.MAX_VALUE); // a message exists, but the earliest one is not due yet
344 } else {
345 // Got a message.
346 mBlocked = false;
347 if (prevMsg != null) {
348 prevMsg.next = msg.next;
349 } else {
350 mMessages = msg.next;
351 }
352 msg.next = null;
353 if (DEBUG) Log.v(TAG, "Returning message: " + msg);
354 msg.markInUse();
355                        return msg; // once due, the message is dequeued and the queue marked unblocked (mBlocked = false)
356 }
357 } else {
358 // No more messages.
359                    nextPollTimeoutMillis = -1; // empty queue: block indefinitely until woken
360 }
361
362 // Process the quit message now that all pending messages have been handled.
363 if (mQuitting) {
364 dispose();
365 return null;
366 }
367
368 // If first time idle, then get the number of idlers to run.
369 // Idle handles only run if the queue is empty or if the first message
370 // in the queue (possibly a barrier) is due to be handled in the future.
371 if (pendingIdleHandlerCount < 0
372 && (mMessages == null || now < mMessages.when)) {
373 pendingIdleHandlerCount = mIdleHandlers.size();
374 }
375 if (pendingIdleHandlerCount <= 0) {
376 // No idle handlers to run. Loop and wait some more.
377                    mBlocked = true; // unless a message was returned, nextPollTimeoutMillis is > 0 or -1 (first message not due, or queue empty); mark the queue blocked and loop into the next wait
378 continue;
379 }
380
381 if (mPendingIdleHandlers == null) {
382 mPendingIdleHandlers = new IdleHandler[Math.max(pendingIdleHandlerCount, 4)];
383 }
384 mPendingIdleHandlers = mIdleHandlers.toArray(mPendingIdleHandlers);
385 }
386
387 // Run the idle handlers.
388 // We only ever reach this code block during the first iteration.
389 for (int i = 0; i < pendingIdleHandlerCount; i++) {
390 final IdleHandler idler = mPendingIdleHandlers[i];
391 mPendingIdleHandlers[i] = null; // release the reference to the handler
392
393 boolean keep = false;
394 try {
395 keep = idler.queueIdle();
396 } catch (Throwable t) {
397 Log.wtf(TAG, "IdleHandler threw exception", t);
398 }
399
400 if (!keep) {
401 synchronized (this) {
402 mIdleHandlers.remove(idler);
403 }
404 }
405 }
406
407 // Reset the idle handler count to 0 so we do not run them again.
408 pendingIdleHandlerCount = 0;
409
410 // While calling an idle handler, a new message could have been delivered
411 // so go back and look again for a pending message without waiting.
412 nextPollTimeoutMillis = 0;
413 }
414 }
Since the case hangs, it is easy to grab a trace to see where. Sure enough, it is stuck in nativePollOnce:
"main" prio=10 tid=1 Waiting
| group="main" sCount=1 dsCount=0 flags=1 obj=0x12c01ee8 self=0x7758e14c00
| sysTid=11568 nice=-8 cgrp=default sched=0/0 handle=0x77dec83548
| state=S schedstat=( 171782606 895523 56 ) utm=11 stm=5 core=7 HZ=100
| stack=0x7fde185000-0x7fde187000 stackSize=8MB
| held mutexes=
at java.lang.Object.wait(Native method)
- waiting on <0x0d54e4d2> (a java.lang.Object)
at com.android.commands.hid.Device.close(Device.java:107)
- locked <0x0d54e4d2> (a java.lang.Object)
at com.android.commands.hid.Hid.run(Hid.java:90)
at com.android.commands.hid.Hid.main(Hid.java:60)
at com.android.internal.os.RuntimeInit.nativeFinishInit(Native method)
at com.android.internal.os.RuntimeInit.main(RuntimeInit.java:388)
"HidDeviceHandler" prio=10 tid=10 Native
| group="main" sCount=1 dsCount=0 flags=1 obj=0x12c31a98 self=0x7758f8f000
| sysTid=11583 nice=0 cgrp=default sched=0/0 handle=0x7751a484f0
| state=S schedstat=( 8710574 6102134 7 ) utm=0 stm=0 core=4 HZ=100
| stack=0x7751945000-0x7751947000 stackSize=1041KB
| held mutexes=
kernel: (couldn't read /proc/self/task/11583/stack)
native: #00 pc 000000000007aacc /system/lib64/libc.so (__epoll_pwait+8)
native: #01 pc 0000000000014d48 /system/lib64/libutils.so (android::Looper::pollInner(int)+148)
native: #02 pc 0000000000014c18 /system/lib64/libutils.so (android::Looper::pollOnce(int, int*, int*, void**)+60)
native: #03 pc 00000000001271b8 /system/lib64/libandroid_runtime.so (android::android_os_MessageQueue_nativePollOnce(_JNIEnv*, _jobject*, long, int)+44)
at android.os.MessageQueue.nativePollOnce(Native method)
at android.os.MessageQueue.next(MessageQueue.java:326)
at android.os.Looper.loop(Looper.java:162)
at android.os.HandlerThread.run(HandlerThread.java:65)
With an empty queue the timeout is -1 and the wait is indefinite, but the thread should be woken as soon as the next Message is enqueued, and we already verified that the up Message was enqueued. That is a contradiction, so something has not been accounted for.
First look at how nativePollOnce waits:
frameworks/base/core/jni/android_os_MessageQueue.cpp
212static const JNINativeMethod gMessageQueueMethods[] = {
213 /* name, signature, funcPtr */
...
216 { "nativePollOnce", "(JI)V", (void*)android_os_MessageQueue_nativePollOnce },
217 { "nativeWake", "(J)V", (void*)android_os_MessageQueue_nativeWake },
...
221};
188static void android_os_MessageQueue_nativePollOnce(JNIEnv* env, jobject obj,
189 jlong ptr, jint timeoutMillis) {
190 NativeMessageQueue* nativeMessageQueue = reinterpret_cast<NativeMessageQueue*>(ptr);
191 nativeMessageQueue->pollOnce(env, obj, timeoutMillis);
192}
107void NativeMessageQueue::pollOnce(JNIEnv* env, jobject pollObj, int timeoutMillis) {
108 mPollEnv = env;
109 mPollObj = pollObj;
110 mLooper->pollOnce(timeoutMillis);
111 mPollObj = NULL;
112 mPollEnv = NULL;
...
119}
/system/core/libutils/Looper.cpp
178int Looper::pollOnce(int timeoutMillis, int* outFd, int* outEvents, void** outData) {
179 int result = 0;
180 for (;;) {
181 while (mResponseIndex < mResponses.size()) {
182 const Response& response = mResponses.itemAt(mResponseIndex++);
183 int ident = response.request.ident;
184 if (ident >= 0) {
185 int fd = response.request.fd;
186 int events = response.events;
187 void* data = response.request.data;
188#if DEBUG_POLL_AND_WAKE
189 ALOGD("%p ~ pollOnce - returning signalled identifier %d: "
190 "fd=%d, events=0x%x, data=%p",
191 this, ident, fd, events, data);
192#endif
193 if (outFd != NULL) *outFd = fd;
194 if (outEvents != NULL) *outEvents = events;
195 if (outData != NULL) *outData = data;
196                return ident; // for requests registered without a callback, ident is always >= 0 and is returned to the caller
197 }
198 }
199
200 if (result != 0) {
201#if DEBUG_POLL_AND_WAKE
202 ALOGD("%p ~ pollOnce - returning result %d", this, result);
203#endif
204 if (outFd != NULL) *outFd = 0;
205 if (outEvents != NULL) *outEvents = 0;
206 if (outData != NULL) *outData = NULL;
207            return result; // result from pollInner, for the case where a callback was set: the callback already handled the event, so no event details need to be returned to the caller
208 }
209
210 result = pollInner(timeoutMillis);
211 }
212}
214int Looper::pollInner(int timeoutMillis) {
215#if DEBUG_POLL_AND_WAKE
216 ALOGD("%p ~ pollOnce - waiting: timeoutMillis=%d", this, timeoutMillis);
217#endif
218
219 // Adjust the timeout based on when the next message is due.
220 if (timeoutMillis != 0 && mNextMessageUptime != LLONG_MAX) {
221 nsecs_t now = systemTime(SYSTEM_TIME_MONOTONIC);
222 int messageTimeoutMillis = toMillisecondTimeoutDelay(now, mNextMessageUptime);
223 if (messageTimeoutMillis >= 0
224 && (timeoutMillis < 0 || messageTimeoutMillis < timeoutMillis)) {
225            timeoutMillis = messageTimeoutMillis; // recompute the timeout
226 }
227#if DEBUG_POLL_AND_WAKE
228 ALOGD("%p ~ pollOnce - next message in %" PRId64 "ns, adjusted timeout: timeoutMillis=%d",
229 this, mNextMessageUptime - now, timeoutMillis);
230#endif
231 }
232
233 // Poll.
234 int result = POLL_WAKE;
235 mResponses.clear();
236 mResponseIndex = 0;
237
238 // We are about to idle.
239 mPolling = true;
240
241 struct epoll_event eventItems[EPOLL_MAX_EVENTS];
242 int eventCount = epoll_wait(mEpollFd, eventItems, EPOLL_MAX_EVENTS, timeoutMillis);
243
244 // No longer idling.
245 mPolling = false;
246
247 // Acquire lock.
248 mLock.lock();
249
250 // Rebuild epoll set if needed.
251 if (mEpollRebuildRequired) {
252 mEpollRebuildRequired = false;
253 rebuildEpollLocked();
254 goto Done;
255 }
256
257 // Check for poll error.
258    if (eventCount < 0) { // negative return value: an error occurred
259 if (errno == EINTR) {
260 goto Done;
261 }
262 ALOGW("Poll failed with an unexpected error: %s", strerror(errno));
263 result = POLL_ERROR;
264 goto Done;
265 }
266
267 // Check for poll timeout.
268    if (eventCount == 0) { // zero events: timeout, so jump straight to Done
269#if DEBUG_POLL_AND_WAKE
270 ALOGD("%p ~ pollOnce - timeout", this);
271#endif
272 result = POLL_TIMEOUT;
273 goto Done;
274 }
275
276 // Handle all events.
277#if DEBUG_POLL_AND_WAKE
278 ALOGD("%p ~ pollOnce - handling events from %d fds", this, eventCount);
279#endif
280
281 for (int i = 0; i < eventCount; i++) {
282 int fd = eventItems[i].data.fd;
283 uint32_t epollEvents = eventItems[i].events;
284 if (fd == mWakeEventFd) {
285 if (epollEvents & EPOLLIN) {
// consume the eventfd write and wake the queue
286 awoken();
287 } else {
288 ALOGW("Ignoring unexpected epoll events 0x%x on wake event fd.", epollEvents);
289 }
290 } else {
291 ssize_t requestIndex = mRequests.indexOfKey(fd);
292 if (requestIndex >= 0) {
293 int events = 0;
294 if (epollEvents & EPOLLIN) events |= EVENT_INPUT;
295 if (epollEvents & EPOLLOUT) events |= EVENT_OUTPUT;
296 if (epollEvents & EPOLLERR) events |= EVENT_ERROR;
297 if (epollEvents & EPOLLHUP) events |= EVENT_HANGUP;
298                pushResponse(events, mRequests.valueAt(requestIndex)); // another fd fired: build a Response from its Request and push it onto the response array
// when an event occurs on a monitored fd, its Request is later taken out and its callback invoked
299 } else {
300 ALOGW("Ignoring unexpected epoll events 0x%x on fd %d that is "
301 "no longer registered.", epollEvents, fd);
302 }
303 }
304 }
305Done: ;
306
307 // Invoke pending message callbacks.
308 mNextMessageUptime = LLONG_MAX;
309 while (mMessageEnvelopes.size() != 0) {
310 nsecs_t now = systemTime(SYSTEM_TIME_MONOTONIC);
311 const MessageEnvelope& messageEnvelope = mMessageEnvelopes.itemAt(0);
312 if (messageEnvelope.uptime <= now) {
313 // Remove the envelope from the list.
314 // We keep a strong reference to the handler until the call to handleMessage
315 // finishes. Then we drop it so that the handler can be deleted *before*
316 // we reacquire our lock.
317 { // obtain handler
318 sp<MessageHandler> handler = messageEnvelope.handler;
319 Message message = messageEnvelope.message;
320 mMessageEnvelopes.removeAt(0);
321 mSendingMessage = true;
322 mLock.unlock();
323
324#if DEBUG_POLL_AND_WAKE || DEBUG_CALLBACKS
325 ALOGD("%p ~ pollOnce - sending message: handler=%p, what=%d",
326 this, handler.get(), message.what);
327#endif
328 handler->handleMessage(message);
329 } // release handler
330
331 mLock.lock();
332 mSendingMessage = false;
333 result = POLL_CALLBACK;
334 } else {
335 // The last message left at the head of the queue determines the next wakeup time.
336 mNextMessageUptime = messageEnvelope.uptime;
337 break;
338 }
339 }
340
341 // Release lock.
342 mLock.unlock();
343
344 // Invoke all response callbacks.
345 for (size_t i = 0; i < mResponses.size(); i++) {
346 Response& response = mResponses.editItemAt(i);
347 if (response.request.ident == POLL_CALLBACK) {
348 int fd = response.request.fd;
349 int events = response.events;
350 void* data = response.request.data;
351#if DEBUG_POLL_AND_WAKE || DEBUG_CALLBACKS
352 ALOGD("%p ~ pollOnce - invoking fd event callback %p: fd=%d, events=0x%x, data=%p",
353 this, response.request.callback.get(), fd, events, data);
354#endif
355 // Invoke the callback. Note that the file descriptor may be closed by
356 // the callback (and potentially even reused) before the function returns so
357 // we need to be a little careful when removing the file descriptor afterwards.
358            int callbackResult = response.request.callback->handleEvent(fd, events, data); // invoke the callback registered for this fd
359 if (callbackResult == 0) {
360 removeFd(fd, response.request.seq);
361 }
362
363 // Clear the callback reference in the response structure promptly because we
364 // will not clear the response vector itself until the next poll
365 response.request.callback.clear();
366 result = POLL_CALLBACK;
367 }
368 }
369 return result;
370}
When the Looper is initialized it creates a wake fd: mWakeEventFd = eventfd(0, EFD_NONBLOCK | EFD_CLOEXEC);
As described above, a thread whose queue holds no Message waits indefinitely; when a new message is enqueued, the MessageQueue is woken again, in enqueueMessage:
583 if (needWake) {
584 nativeWake(mPtr);
585 }
194static void android_os_MessageQueue_nativeWake(JNIEnv* env, jclass clazz, jlong ptr) {
195 NativeMessageQueue* nativeMessageQueue = reinterpret_cast<NativeMessageQueue*>(ptr);
196 nativeMessageQueue->wake();
197}
121void NativeMessageQueue::wake() {
122 mLooper->wake();
123}
398void Looper::wake() {
399#if DEBUG_POLL_AND_WAKE
400 ALOGD("%p ~ wake", this);
401#endif
402
403 uint64_t inc = 1;
404 ssize_t nWrite = TEMP_FAILURE_RETRY(write(mWakeEventFd, &inc, sizeof(uint64_t)));
405 if (nWrite != sizeof(uint64_t)) {
406 if (errno != EAGAIN) {
407 LOG_ALWAYS_FATAL("Could not write wake signal to fd %d: %s",
408 mWakeEventFd, strerror(errno));
409 }
410 }
411}
The wake reaches the polling thread automatically through the epoll set via mWakeEventFd. So if the thread was never woken, needWake must have been false. If the MessageQueue were empty, needWake would depend only on mBlocked and the blocked thread would necessarily be woken; therefore the queue cannot have been empty. That points strongly at a sync-barrier Message keeping the queue from being woken:
needWake = mBlocked && p.target == null && msg.isAsynchronous(); // other messages already queued: wake only in the special case of an asynchronous message behind a sync barrier
Back to the case:
100 @Override
101 public void handleMessage(Message msg) {
102 switch (msg.what) {
103 case MSG_OPEN_DEVICE:
104 SomeArgs args = (SomeArgs) msg.obj;
105 mPtr = nativeOpenDevice((String) args.arg1, args.argi1, args.argi2, args.argi3,
106 (byte[]) args.arg2, new DeviceCallback());
107 pauseEvents();
143 private class DeviceCallback {
144 public void onDeviceOpen() {
145 mHandler.resumeEvents();
146 }
Handling MSG_OPEN_DEVICE posts a sync barrier (pauseEvents), which is only removed when DeviceCallback.onDeviceOpen calls resumeEvents. If that callback never fires, the up message sits in the queue but can never be executed, which is exactly the observed symptom. Adding a log in the case confirmed that onDeviceOpen was indeed never called. So the problem moves back down the stack: we need to find out why the callback was never invoked.
(d) Analysis step 4:
First, where this callback is invoked:
171int Device::handleEvents(int events) {
172 if (events & (ALOOPER_EVENT_ERROR | ALOOPER_EVENT_HANGUP)) {
173 LOGE("uhid node was closed or an error occurred. events=0x%x", events);
174 mDeviceCallback->onDeviceError();
175 return 0;
176 }
177 struct uhid_event ev;
178 ssize_t ret = TEMP_FAILURE_RETRY(::read(mFd, &ev, sizeof(ev)));
179 if (ret < 0) {
180 LOGE("Failed to read from uhid node: %s", strerror(errno));
181 mDeviceCallback->onDeviceError();
182 return 0;
183 }
184
185 if (ev.type == UHID_OPEN) {
186 mDeviceCallback->onDeviceOpen();
187 }
188
189 return 1;
190}
52static int handleLooperEvents(int /* fd */, int events, void* data) {
53 Device* d = reinterpret_cast<Device*>(data);
54 return d->handleEvents(events);
55}
133Device::Device(int32_t id, int fd, std::unique_ptr<DeviceCallback> callback) :
134 mId(id), mFd(fd), mDeviceCallback(std::move(callback)) {
135 ALooper* aLooper = ALooper_forThread();
136 if (aLooper == NULL) {
137 LOGE("Could not get ALooper, ALooper_forThread returned NULL");
138 aLooper = ALooper_prepare(ALOOPER_PREPARE_ALLOW_NON_CALLBACKS);
139 }
140 ALooper_addFd(aLooper, fd, 0, ALOOPER_EVENT_INPUT, handleLooperEvents,
141 reinterpret_cast<void*>(this));
142}
The fd is registered here with ALooper_addFd, and events on it are then handled in pollInner:
345 for (size_t i = 0; i < mResponses.size(); i++) {
346 Response& response = mResponses.editItemAt(i);
347 if (response.request.ident == POLL_CALLBACK) {
348 int fd = response.request.fd;
349 int events = response.events;
350 void* data = response.request.data;
351#if DEBUG_POLL_AND_WAKE || DEBUG_CALLBACKS
352 ALOGD("%p ~ pollOnce - invoking fd event callback %p: fd=%d, events=0x%x, data=%p",
353 this, response.request.callback.get(), fd, events, data);
354#endif
355 // Invoke the callback. Note that the file descriptor may be closed by
356 // the callback (and potentially even reused) before the function returns so
357 // we need to be a little careful when removing the file descriptor afterwards.
// the callback is invoked here
358            int callbackResult = response.request.callback->handleEvent(fd, events, data);
359 if (callbackResult == 0) {
360 removeFd(fd, response.request.seq);
361 }
362
363 // Clear the callback reference in the response structure promptly because we
364 // will not clear the response vector itself until the next poll.
365 response.request.callback.clear();
366 result = POLL_CALLBACK;
367 }
368 }
369 return result;
370}
So invoking the onDeviceOpen callback hinges entirely on the UHID_OPEN event, and evidently UHID_OPEN was never delivered up from the kernel. The kernel documentation uhid.txt explains:
47The first thing you should do is sending an UHID_CREATE2 event. This will
48register the device. UHID will respond with an UHID_START event. You can now
49start sending data to and reading data from UHID. However, unless UHID sends the
50UHID_OPEN event, the internally attached HID Device Driver has no user attached.
51That is, you might put your device asleep unless you receive the UHID_OPEN
52event. If you receive the UHID_OPEN event, you should start I/O. If the last
53user closes the HID device, you will receive an UHID_CLOSE event. This may be
54followed by an UHID_OPEN event again and so on. There is no need to perform
55reference-counting in user-space. That is, you will never receive multiple
56UHID_OPEN events without an UHID_CLOSE event. The HID subsystem performs
57ref-counting for you.
58You may decide to ignore UHID_OPEN/UHID_CLOSE, though. I/O is allowed even
59though the device may have no users.
In short: user space sends UHID_CREATE2 down to the kernel (UHID), and the kernel (UHID) answers with UHID_START and, once a user attaches, UHID_OPEN.
By adding logs, the flow was then roughly traced out.
First, the work item is initialized in uhid_char_open (http://guard.pt.miui.com/opengrok/xref/v10-p-dipper-dev/kernel/msm-4.9/drivers/hid/uhid.c#642):
623static int uhid_char_open(struct inode *inode, struct file *file)
624{
625 struct uhid_device *uhid;
626
627 uhid = kzalloc(sizeof(*uhid), GFP_KERNEL);
628 if (!uhid)
629 return -ENOMEM;
630
631 mutex_init(&uhid->devlock);
632 mutex_init(&uhid->report_lock);
633 spin_lock_init(&uhid->qlock);
634 init_waitqueue_head(&uhid->waitq);
635 init_waitqueue_head(&uhid->report_wait);
636 uhid->running = false;
637 INIT_WORK(&uhid->worker, uhid_device_add_worker);
638
639 file->private_data = uhid;
640 nonseekable_open(inode, file);
641
642 return 0;
643}
11-15 01:51:18.455 root 0 0 I : uhid_char_open+0xb4/0x110
11-15 01:51:18.455 root 0 0 I : misc_open+0x12c/0x168
11-15 01:51:18.455 root 0 0 I : chrdev_open+0x12c/0x200
11-15 01:51:18.455 root 0 0 I : do_dentry_open+0x12c/0x2e8
11-15 01:51:18.455 root 0 0 I : vfs_open+0x54/0x88
11-15 01:51:18.455 root 0 0 I : path_openat+0x91c/0x14d0
11-15 01:51:18.455 root 0 0 I : do_filp_open+0x74/0x120
11-15 01:51:18.455 root 0 0 I : do_sys_open+0x134/0x248
11-15 01:51:18.455 root 0 0 I : SyS_openat+0x10/0x18
11-15 01:51:18.455 root 0 0 I : el0_svc_naked+0x34/0x38
Next, user space sends the UHID_CREATE2 event to the kernel:
92Device* Device::open(int32_t id, const char* name, int32_t vid, int32_t pid,
93 std::unique_ptr<uint8_t[]> descriptor, size_t descriptorSize,
94 std::unique_ptr<DeviceCallback> callback) {
95
96 int fd = ::open(UHID_PATH, O_RDWR | O_CLOEXEC);
97 if (fd < 0) {
98 LOGE("Failed to open uhid: %s", strerror(errno));
99 return nullptr;
100 }
101
102 struct uhid_event ev;
103 memset(&ev, 0, sizeof(ev));
104 ev.type = UHID_CREATE2;
105 strncpy((char*)ev.u.create2.name, name, UHID_MAX_NAME_LENGTH);
106 memcpy(&ev.u.create2.rd_data, descriptor.get(),
107 descriptorSize * sizeof(ev.u.create2.rd_data[0]));
108 ev.u.create2.rd_size = descriptorSize;
109 ev.u.create2.bus = BUS_BLUETOOTH;
110 ev.u.create2.vendor = vid;
111 ev.u.create2.product = pid;
112 ev.u.create2.version = 0;
113 ev.u.create2.country = 0;
114
115 errno = 0;
116 ssize_t ret = TEMP_FAILURE_RETRY(::write(fd, &ev, sizeof(ev))); // write the event to the kernel
117 if (ret < 0 || ret != sizeof(ev)) {
118 ::close(fd);
119 LOGE("Failed to create uhid node: %s", strerror(errno));
120 return nullptr;
121 }
122
123 // Wait for the device to actually be created.
124 ret = TEMP_FAILURE_RETRY(::read(fd, &ev, sizeof(ev))); // wait for the kernel's response
125 if (ret < 0 || ev.type != UHID_START) {
126 ::close(fd);
127 LOGE("uhid node failed to start: %s", strerror(errno));
128 return nullptr;
129 }
130 return new Device(id, fd, std::move(callback));
131}
When the kernel receives this event, it creates the device via uhid_dev_create2:
734 case UHID_CREATE2:
735 ret = uhid_dev_create2(uhid, &uhid->input_buf);
736 break;
478static int uhid_dev_create2(struct uhid_device *uhid,
479 const struct uhid_event *ev)
480{
481 struct hid_device *hid;
482 size_t rd_size, len;
483 void *rd_data;
484 int ret;
485
486 if (uhid->running)
487 return -EALREADY;
488
489 rd_size = ev->u.create2.rd_size;
490 if (rd_size <= 0 || rd_size > HID_MAX_DESCRIPTOR_SIZE)
491 return -EINVAL;
492
493 rd_data = kmemdup(ev->u.create2.rd_data, rd_size, GFP_KERNEL);
494 if (!rd_data)
495 return -ENOMEM;
496
497 uhid->rd_size = rd_size;
498 uhid->rd_data = rd_data;
499
500 hid = hid_allocate_device();
501 if (IS_ERR(hid)) {
502 ret = PTR_ERR(hid);
503 goto err_free;
504 }
505
506 len = min(sizeof(hid->name), sizeof(ev->u.create2.name)) - 1;
507 strncpy(hid->name, ev->u.create2.name, len);
508 len = min(sizeof(hid->phys), sizeof(ev->u.create2.phys)) - 1;
509 strncpy(hid->phys, ev->u.create2.phys, len);
510 len = min(sizeof(hid->uniq), sizeof(ev->u.create2.uniq)) - 1;
511 strncpy(hid->uniq, ev->u.create2.uniq, len);
512
513 hid->ll_driver = &uhid_hid_driver;
514 hid->bus = ev->u.create2.bus;
515 hid->vendor = ev->u.create2.vendor;
516 hid->product = ev->u.create2.product;
517 hid->version = ev->u.create2.version;
518 hid->country = ev->u.create2.country;
519 hid->driver_data = uhid;
520 hid->dev.parent = uhid_misc.this_device;
521
522 uhid->hid = hid;
523 uhid->running = true;
524
525 /* Adding of a HID device is done through a worker, to allow HID drivers
526 * which use feature requests during .probe to work, without they would
527 * be blocked on devlock, which is held by uhid_char_write.
528 */
529 schedule_work(&uhid->worker);
530
531 return 0;
532
533err_free:
534 kfree(uhid->rd_data);
535 uhid->rd_data = NULL;
536 uhid->rd_size = 0;
537 return ret;
538}
Note schedule_work(&uhid->worker): it adds the work item to the system shared workqueue for later execution. Its handler, uhid_device_add_worker, is where UHID_START and UHID_OPEN are sent up. The overall flow is shown by these traces:
<4>[ 643.029505] weijuncheng ---------------------------------------------------->uhid_queue UHID_START start
<4>[ 643.029510] CPU: 4 PID: 7339 Comm: kworker/4:0 Tainted: G W O 4.9.112-perf-g3dd9240-dirty #2
<4>[ 643.029511] Hardware name: Xiaomi Technologies, Inc. Perseus P3 v2.1 (DT)
<4>[ 643.029519] Workqueue: events uhid_device_add_worker
<4>[ 643.029522] Call trace:
<4>[ 643.029527] [<ffffff96a8c8a808>] dump_backtrace+0x0/0x2ac
<4>[ 643.029529] [<ffffff96a8c8ad34>] show_stack+0x14/0x1c
<4>[ 643.029533] [<ffffff96a905c02c>] dump_stack+0x94/0xb4
<4>[ 643.029534] [<ffffff96a982a198>] uhid_hid_start+0xb8/0xe4
<4>[ 643.029536] [<ffffff96a98234a8>] hid_device_probe+0xfc/0x140
<4>[ 643.029539] [<ffffff96a93891e4>] driver_probe_device+0x24c/0x430
<4>[ 643.029541] [<ffffff96a93895b0>] __device_attach_driver+0xd4/0x148
<4>[ 643.029542] [<ffffff96a9386ad8>] bus_for_each_drv+0x60/0xb0
<4>[ 643.029543] [<ffffff96a9388d84>] __device_attach+0x100/0x17c
<4>[ 643.029544] [<ffffff96a93897bc>] device_initial_probe+0x10/0x18
<4>[ 643.029545] [<ffffff96a93880e0>] bus_probe_device+0x90/0x98
<4>[ 643.029547] [<ffffff96a93858d0>] device_add+0x420/0x58c
<4>[ 643.029548] [<ffffff96a9823104>] hid_add_device+0xe0/0x2a0
<4>[ 643.029550] [<ffffff96a98297f0>] uhid_device_add_worker+0x1c/0x58
<4>[ 643.029552] [<ffffff96a8cc8d8c>] process_one_work+0x168/0x45c
<4>[ 643.029553] [<ffffff96a8cc90d4>] worker_thread+0x54/0x46c
<4>[ 643.029555] [<ffffff96a8ccf838>] kthread+0xe0/0xf4
<4>[ 643.029556] [<ffffff96a8c82ee0>] ret_from_fork+0x10/0x30
<4>[ 643.029558] weijuncheng ---------------------------------------------------->uhid_queue UHID_START end
<4>[ 643.029762] weijuncheng ---------------------------------------------------->uhid_queue UHID_OPEN start
<4>[ 643.029764] CPU: 4 PID: 7339 Comm: kworker/4:0 Tainted: G W O 4.9.112-perf-g3dd9240-dirty #2
<4>[ 643.029765] Hardware name: Xiaomi Technologies, Inc. Perseus P3 v2.1 (DT)
<4>[ 643.029767] Workqueue: events uhid_device_add_worker
<4>[ 643.029768] Call trace:
<4>[ 643.029770] [<ffffff96a8c8a808>] dump_backtrace+0x0/0x2ac
<4>[ 643.029772] [<ffffff96a8c8ad34>] show_stack+0x14/0x1c
<4>[ 643.029774] [<ffffff96a905c02c>] dump_stack+0x94/0xb4
<4>[ 643.029775] [<ffffff96a982a9fc>] uhid_hid_open+0x6c/0x9c
<4>[ 643.029776] [<ffffff96a9823854>] hidinput_open+0x1c/0x24
<4>[ 643.029779] [<ffffff96a958c000>] input_open_device+0x80/0xac
<4>[ 643.029781] [<ffffff96a959163c>] input_leds_connect+0xd0/0x2ac
<4>[ 643.029782] [<ffffff96a958ca8c>] input_attach_handler+0x19c/0x1fc
<4>[ 643.029783] [<ffffff96a958cf24>] input_register_device+0x36c/0x4b4
<4>[ 643.029784] [<ffffff96a9823f50>] hidinput_connect+0x43c/0x3d18
<4>[ 643.029785] [<ffffff96a9822c98>] hid_connect+0x1f4/0x334
<4>[ 643.029786] [<ffffff96a98234bc>] hid_device_probe+0x110/0x140
<4>[ 643.029787] [<ffffff96a93891e4>] driver_probe_device+0x24c/0x430
<4>[ 643.029788] [<ffffff96a93895b0>] __device_attach_driver+0xd4/0x148
<4>[ 643.029789] [<ffffff96a9386ad8>] bus_for_each_drv+0x60/0xb0
<4>[ 643.029791] [<ffffff96a9388d84>] __device_attach+0x100/0x17c
<4>[ 643.029792] [<ffffff96a93897bc>] device_initial_probe+0x10/0x18
<4>[ 643.029793] [<ffffff96a93880e0>] bus_probe_device+0x90/0x98
<4>[ 643.029794] [<ffffff96a93858d0>] device_add+0x420/0x58c
<4>[ 643.029795] [<ffffff96a9823104>] hid_add_device+0xe0/0x2a0
<4>[ 643.029796] [<ffffff96a98297f0>] uhid_device_add_worker+0x1c/0x58
<4>[ 643.029798] [<ffffff96a8cc8d8c>] process_one_work+0x168/0x45c
<4>[ 643.029799] [<ffffff96a8cc90d4>] worker_thread+0x54/0x46c
<4>[ 643.029800] [<ffffff96a8ccf838>] kthread+0xe0/0xf4
<4>[ 643.029801] [<ffffff96a8c82ee0>] ret_from_fork+0x10/0x30
<4>[ 643.029802] weijuncheng ---------------------------------------------------->uhid_queue UHID_OPEN end
Then examine this part of the call chain:
<4>[ 643.029775] [<ffffff96a982a9fc>] uhid_hid_open+0x6c/0x9c
<4>[ 643.029776] [<ffffff96a9823854>] hidinput_open+0x1c/0x24
<4>[ 643.029779] [<ffffff96a958c000>] input_open_device+0x80/0xac
<4>[ 643.029781] [<ffffff96a959163c>] input_leds_connect+0xd0/0x2ac
<4>[ 643.029782] [<ffffff96a958ca8c>] input_attach_handler+0x19c/0x1fc
<4>[ 643.029783] [<ffffff96a958cf24>] input_register_device+0x36c/0x4b4
1804int hid_hw_open(struct hid_device *hdev)
1805{
1806 int ret;
1807
1808 ret = mutex_lock_killable(&hdev->ll_open_lock);
1809 if (ret)
1810 return ret;
1811
1812 if (!hdev->ll_open_count++) {
1813 ret = hdev->ll_driver->open(hdev);
1814 if (ret)
1815 hdev->ll_open_count--;
1816 }
1817
1818 mutex_unlock(&hdev->ll_open_lock);
1819 return ret;
1820}
1821EXPORT_SYMBOL_GPL(hid_hw_open);
Note:
1812 if (!hdev->ll_open_count++) {
1813 ret = hdev->ll_driver->open(hdev);
while inside the driver's open callback:
142static int uhid_hid_open(struct hid_device *hid)
143{
144 struct uhid_device *uhid = hid->driver_data;
145 int retval = 0;
146
147 if (!hid->ll_open_count++) {
148 retval = uhid_queue_event(uhid, UHID_OPEN);
149 if (retval)
150 hid->ll_open_count--;
151 }
152 return retval;
153}
147 if (!hid->ll_open_count++) {
148 retval = uhid_queue_event(uhid, UHID_OPEN);
hid->ll_open_count is incremented twice (once in hid_hw_open, and again in uhid_hid_open), so the driver's guard `!hid->ll_open_count++` is false and `uhid_queue_event(uhid, UHID_OPEN)` is never reached. The UHID_OPEN event therefore never propagates up, the registered callback is never invoked, the MessageQueue stays blocked, the up event cannot be delivered, and the case fails. As a local fix, hid_hw_open was changed to:
int hid_hw_open(struct hid_device *hdev)
{
/*
int ret;
ret = mutex_lock_killable(&hdev->ll_open_lock);
if (ret)
return ret;
if (!hdev->ll_open_count++) {
ret = hdev->ll_driver->open(hdev);
if (ret)
hdev->ll_open_count--;
}
mutex_unlock(&hdev->ll_open_lock);
return ret;
*/
return hdev->ll_driver->open(hdev);
}
After rebuilding and flashing the boot image, the test passes.
Problem summary
This problem was caused by faulty logic in the hid driver after a kernel update. Along the way, the investigation revisited the Handler implementation and touched on Linux's epoll and workqueue mechanisms (of which I have only a basic understanding); both deserve a more detailed study later.