美文网首页
CTS问题分析9

CTS问题分析9

作者: weiinter105 | 来源:发表于2018-11-17 18:04 被阅读0次

    CTS/GTS问题分析9

    一个由kernel层驱动错误导致的错误,查明的过程比较曲折,因此记录一下

    问题初探

    测试命令: run cts -m CtsHardwareTestCases -t android.hardware.input.cts.tests.GamepadTestCase#testButtonA

    报错如下:

    <Module name="CtsHardwareTestCases" abi="arm64-v8a" runtime="241587" done="true" pass="0">
      <TestCase name="android.hardware.input.cts.tests.GamepadTestCase">
        <Test result="fail" name="testButtonA">
          <Failure message="java.lang.AssertionError: expected:<1> but was:<0> ">
            <StackTrace>java.lang.AssertionError: expected:<1> but was:<0>
      at org.junit.Assert.fail(Assert.java:88)
      at org.junit.Assert.failNotEquals(Assert.java:834)
      at org.junit.Assert.assertEquals(Assert.java:645)
      at org.junit.Assert.assertEquals(Assert.java:631)
      at android.hardware.input.cts.tests.InputTestCase.assertReceivedKeyEvent(InputTestCase.java:145)
      at android.hardware.input.cts.tests.GamepadTestCase.testButtonA(GamepadTestCase.java:41)
    

    测试case code如下:

    32    @Test
    33    public void testButtonA() throws Exception {
    34        registerInputDevice(R.raw.gamepad_register_device);
    35
    36        sendHidCommands(R.raw.gamepad_button_a_down);
    37        sendHidCommands(R.raw.gamepad_delay);
    38        assertReceivedKeyEvent(KeyEvent.ACTION_DOWN, KeyEvent.KEYCODE_BUTTON_A);
    39
    40        sendHidCommands(R.raw.gamepad_button_a_up);
    41        assertReceivedKeyEvent(KeyEvent.ACTION_UP, KeyEvent.KEYCODE_BUTTON_A);
    42
    43        assertNoMoreEvents();
    44    }
    

    那么看出流程大致如下: 首先注册一个名为Odie (Test)的hid device,首先发送一个down事件,延迟10ms,检查测试界面是否接收到了down事件;然后再传递一个up事件,然后检查测试界面是否接收到了up事件

    case栈报错的原因是应该接收到up事件时接收到的还是down事件,因此case报错

    问题分析

    (a)分析过程1:
    首先直觉怀疑是不是case的时间设置问题导致发送的down事件被测试页面当作了长按,这个验证很简单,简单修改下case

    206    private class InputListener implements InputCallback {
    207        @Override
    208        public void onKeyEvent(KeyEvent ev) {
    209            boolean done = false;
    210            do {
    211                try {
    212                    //mKeys.put(new KeyEvent(ev));
                           //改为:
                           if(ev.getRepeatCount()==0){
                              mKeys.put(new KeyEvent(ev));
                           }
    213                    done = true;
    214                } catch (InterruptedException ignore) { }
    215            } while (!done);
    216        }
    
       结果发现case还是fail,报错变为不能得到up事件,因此这里就可以证明,cts的测试页面InputCtsActivity并没有接收到up事件,侧面证明下,我们在测试过之前打开/dev/input,执行getevent -ltr,发现:
    

    异常F1:

    add device 8: /dev/input/event7
      name:     "Odie (Test)"
    [   24251.158735] /dev/input/event7: EV_LED       LED_NUML             00000001           
    [   24251.158735] /dev/input/event7: EV_MSC       MSC_SCAN             00090001           
    [   24251.158735] /dev/input/event7: EV_KEY       BTN_GAMEPAD          DOWN               
    [   24251.158735] /dev/input/event7: EV_ABS       ABS_X                0000007f           
    [   24251.158735] /dev/input/event7: EV_ABS       ABS_Y                0000007f           
    [   24251.158735] /dev/input/event7: EV_ABS       ABS_Z                0000007f           
    [   24251.158735] /dev/input/event7: EV_ABS       ABS_RZ               0000007f           
    [   24251.158735] /dev/input/event7: EV_SYN       SYN_REPORT           00000000             rate 0
    case 卡住,强制结束case
    [   24416.424598] /dev/input/event7: EV_KEY       BTN_GAMEPAD          UP                 
    [   24416.424612] /dev/input/event7: EV_SYN       SYN_REPORT           00000001             rate 0
    remove device 8: /dev/input/event7
    

    正常F5:

    add device 8: /dev/input/event7
      name:     "Odie (Test)"
    [    9500.572331] /dev/input/event7: EV_LED       LED_NUML             00000001           
    [    9500.572331] /dev/input/event7: EV_MSC       MSC_SCAN             00090001         //hid码   
    [    9500.572331] /dev/input/event7: EV_KEY       BTN_GAMEPAD          DOWN               
    [    9500.572331] /dev/input/event7: EV_ABS       ABS_X                0000007f           
    [    9500.572331] /dev/input/event7: EV_ABS       ABS_Y                0000007f           
    [    9500.572331] /dev/input/event7: EV_ABS       ABS_Z                0000007f           
    [    9500.572331] /dev/input/event7: EV_ABS       ABS_RZ               0000007f           
    [    9500.572331] /dev/input/event7: EV_SYN       SYN_REPORT           00000000 (上报) //相当于一个按键结束      
    [    9500.582252] /dev/input/event7: EV_MSC       MSC_SCAN             00090001           
    [    9500.582252] /dev/input/event7: EV_KEY       BTN_GAMEPAD          UP                 
    [    9500.582252] /dev/input/event7: EV_SYN       SYN_REPORT           00000000              rate 100
    remove device 8: /dev/input/event7
    

    可见,up事件在case运行过程中并没有被正常接收到,直到case fail,测试页面InputCtsActivity pause才发出了up事件

      11-07 03:03:58.819 30370 30370 D LifecycleMonitor: Lifecycle status change: android.hardware.input.cts.InputCtsActivity@f112acb in: PAUSED
      11-07 03:03:58.820  4792 17838 D ActivityTrigger: activityResumeTrigger: The activity in ApplicationInfo{9d04809 com.android.launcher3} is now in focus and seems to be in full-screen mode
      11-07 03:03:58.820  4792 17838 E ActivityTrigger: activityResumeTrigger: not whiteListedcom.android.launcher3/com.android.launcher3.Launcher/28
      11-07 03:03:58.824 30370 30370 W ViewRootImpl[InputCtsActivity]: Cancelling event due to no window focus: KeyEvent { action=ACTION_UP, keyCode=KEYCODE_BUTTON_A, scanCode=304, metaState=0, flags=0xa8, repeatCount=0, eventTime=12083079, downTime=12082572, deviceId=14, source=0x501 }
    

    (b)分析过程2:
    上面说到up事件没有被正常接收到,首先看一下底层驱动是否有正常上传up事件,首先在input event从内核上传的必经之路input_event加一些log

    http://guard.pt.miui.com/opengrok/xref/v10-p-cepheus-dev/kernel/msm-4.14/drivers/input/input.c#510
    
    void input_event(struct input_dev *dev,
             unsigned int type, unsigned int code, int value)
    {
        unsigned long flags;
        printk("weijuncheng input_event, dev = %s\n",dev->name);
        printk("weijuncheng input_event, type = %x\n",type);
        printk("weijuncheng input_event, code = %x\n",code);
        printk("weijuncheng input_event, value = %x\n",value);
        printk("---------------------------------------------------->a3");
        dump_stack();
        printk("---------------------------------------------------->a4");
        if (is_event_supported(type, dev->evbit, EV_MAX)) {
             
            spin_lock_irqsave(&dev->event_lock, flags);
            printk("---------------------------------------------------->a5");
            printk("weijuncheng input_event, dev = %s\n",dev->name);
            printk("weijuncheng input_event, type = %x\n",type);
            printk("weijuncheng input_event, code = %x\n",code);
            printk("weijuncheng input_event, value = %x\n",value);
            input_handle_event(dev, type, code, value);
            printk("---------------------------------------------------->a6");
            spin_unlock_irqrestore(&dev->event_lock, flags);
    #ifdef CONFIG_LAST_TOUCH_EVENTS
            touch_press_release_events_collect(dev, type, code, value);
    #endif
        }
    }
    EXPORT_SYMBOL(input_event);
    

    相关log:

    11-08 03:16:25.894  root     0     0 I         : ---------------------------------------------------->a3
    11-08 03:16:25.894  root     0     0 I         : CPU: 1 PID: 4803 Comm: HidDeviceHandle Tainted: G S              4.14.76-perf-g92d7574-dirty #3
    11-08 03:16:25.894  root     0     0 I Hardware name: Qualcomm Technologies, Inc. SM8150 V2 PM8150 CEPHEUS (DT)
    11-08 03:16:25.894  root     0     0 I Call trace: 
    11-08 03:16:25.894  root     0     0 I         : dump_backtrace+0x0/0x188
    11-08 03:16:25.894  root     0     0 I         : show_stack+0x18/0x20
    11-08 03:16:25.894  root     0     0 I         : dump_stack+0xc4/0x100
    11-08 03:16:25.894  root     0     0 I         : input_event+0x78/0x268
    11-08 03:16:25.894  root     0     0 I         : hidinput_hid_event+0x498/0x530  //加工为input系统可识别的event
    11-08 03:16:25.894  root     0     0 I         : hid_process_event+0x12c/0x168
    11-08 03:16:25.894  root     0     0 I         : hid_report_raw_event+0x540/0x5d8
    11-08 03:16:25.894  root     0     0 I         : hid_input_report+0x178/0x1a8
    11-08 03:16:25.894  root     0     0 I         : uhid_char_write+0x390/0x5b8
    11-08 03:16:25.894  root     0     0 I         : __vfs_write+0x44/0x138   //从用户空间读取输入数据
    11-08 03:16:25.894  root     0     0 I         : vfs_write+0xc8/0x188
    11-08 03:16:25.894  root     0     0 I         : SyS_write+0x54/0xb8
    11-08 03:16:25.894  root     0     0 I         : el0_svc_naked+0x34/0x38
    11-08 03:16:25.894  root     0     0 I         : ---------------------------------------------------->a4
    11-08 03:16:25.894  root     0     0 I         : ---------------------------------------------------->a5
    11-08 03:16:25.894  root     0     0 I         : weijuncheng input_event, dev = Odie (Test)
    11-08 03:16:25.894  root     0     0 I         : weijuncheng input_event, type = 4
    11-08 03:16:25.894  root     0     0 I         : weijuncheng input_event, code = 4
    11-08 03:16:25.894  root     0     0 I         : weijuncheng input_event, value = 90001
    11-08 03:16:25.894  root     0     0 I         : ---------------------------------------------------->a6
    

    最后发现,并没有up input event上传的迹象,而down事件是正常上传的,那么传递event事件流程中出问题的概率就很小了,那么首先怀疑up事件是否就根本没有从用户空间传下去,那么先看传输流程

    case是通过hid command命令进行事件传输的,我们找到相关位置http://guard.pt.miui.com/opengrok/xref/v10-p-cepheus-dev/frameworks/base/cmds/hid/src/com/android/commands/hid/Hid.java#46

    整理关键代码如下:

    93    private void process(Event e) {
    94        final int index = mDevices.indexOfKey(e.getId());
    95        if (index >= 0) {
    96            Device d = mDevices.valueAt(index);
    97            if (Event.COMMAND_DELAY.equals(e.getCommand())) {
    98                d.addDelay(e.getDuration());
    99            } else if (Event.COMMAND_REPORT.equals(e.getCommand())) {
    100                d.sendReport(e.getReport());
    101            } else {
    102                if (Event.COMMAND_REGISTER.equals(e.getCommand())) {
    103                    error("Device id=" + e.getId() + " is already registered. Ignoring event.");
    104                } else {
    105                    error("Unknown command \"" + e.getCommand() + "\". Ignoring event.");
    106                }
    107            }
    108        } else if (Event.COMMAND_REGISTER.equals(e.getCommand())) {
    109            registerDevice(e);
    110        } else {
    111            Log.e(TAG, "Unknown device id specified. Ignoring event.");
    112        }
    113    }
    

    frameworks/base/cmds/hid/src/com/android/commands/hid/Device.java

    72    public void sendReport(byte[] report) {
    73        Message msg = mHandler.obtainMessage(MSG_SEND_REPORT, report);
    74        // if two messages are sent at identical time, they will be processed in order received
    75        mHandler.sendMessageAtTime(msg, mTimeToSend);
    76    }
    77
    78    public void addDelay(int delay) {
    79        mTimeToSend = Math.max(SystemClock.uptimeMillis(), mTimeToSend) + delay;
    80    }
     
     
    100        @Override
    101        public void handleMessage(Message msg) {
    102            switch (msg.what) {
    103                case MSG_OPEN_DEVICE:
    104                    SomeArgs args = (SomeArgs) msg.obj;
    105                    mPtr = nativeOpenDevice((String) args.arg1, args.argi1, args.argi2, args.argi3,
    106                            (byte[]) args.arg2, new DeviceCallback());
    107                    pauseEvents();
    108                    break;
    109                case MSG_SEND_REPORT:
    110                    if (mPtr != 0) {
    111                        nativeSendReport(mPtr, (byte[]) msg.obj);
    112                    } else {
    113                        Log.e(TAG, "Tried to send report to closed device.");
    114                    }
    115                    break;
    116                case MSG_CLOSE_DEVICE:
                       ...
    128                default:
    129                    throw new IllegalArgumentException("Unknown device message");
    130            }
    131        }
    

    frameworks/base/cmds/hid/jni/com_android_commands_hid_Device.cpp

    240static JNINativeMethod sMethods[] = {
    241    { "nativeOpenDevice",
    242            "(Ljava/lang/String;III[B"
    243            "Lcom/android/commands/hid/Device$DeviceCallback;)J",
    244            reinterpret_cast<void*>(openDevice) },
    245    { "nativeSendReport", "(J[B)V", reinterpret_cast<void*>(sendReport) },
    246    { "nativeCloseDevice", "(J)V", reinterpret_cast<void*>(closeDevice) },
    247};
     
     
    222static void sendReport(JNIEnv* env, jclass /* clazz */, jlong ptr, jbyteArray rawReport) {
    223    size_t size;
    224    std::unique_ptr<uint8_t[]> report = getData(env, rawReport, size);
    225    uhid::Device* d = reinterpret_cast<uhid::Device*>(ptr);
    226    if (d) {
    227        d->sendReport(report.get(), size);
    228    } else {
    229        LOGE("Could not send report, Device* is null!");
    230    }
    231}
     
     
    159void Device::sendReport(uint8_t* report, size_t reportSize) {
    160    struct uhid_event ev;
    161    memset(&ev, 0, sizeof(ev));
    162    ev.type = UHID_INPUT2;
    163    ev.u.input2.size = reportSize;
    164    memcpy(&ev.u.input2.data, report, reportSize);
    165    ssize_t ret = TEMP_FAILURE_RETRY(::write(mFd, &ev, sizeof(ev))); //在这里将相应event写到内核中
    166    if (ret < 0 || ret != sizeof(ev)) {
    167        LOGE("Failed to send hid event: %s", strerror(errno));
    168    }
    169}
    
    通过加log,发现up事件的handleMessage没有如期处理,那么up事件就没有被向下传递,问题的分析又被转回到上层,下面就要分析handleMessage没有被处理的原因
    

    (c)分析过程3:
    首先如果要继续分析,需要对Handler有个大致的了解,简单来说就是一个线程向另一个持有looper的线程发送消息,持有looper的线程通过loop()循环选取Message进行处理

    那么只有两种可能: 1.消息没入队 2.入队没处理

    首先看消息没入队的情况,本例中是通过sendMessageAtTime来向队列中投递消息的,大体流程

    frameworks/base/core/java/android/os/Handler.java

    689    public boolean sendMessageAtTime(Message msg, long uptimeMillis) {
    690        MessageQueue queue = mQueue;
    691        if (queue == null) {
    692            RuntimeException e = new RuntimeException(
    693                    this + " sendMessageAtTime() called with no mQueue");
    694            Log.w("Looper", e.getMessage(), e);
    695            return false;
    696        }
    697        return enqueueMessage(queue, msg, uptimeMillis);
    698    }
     
     
    740    private boolean enqueueMessage(MessageQueue queue, Message msg, long uptimeMillis) {
    741        msg.target = this;
    742        if (mAsynchronous) {
    743            msg.setAsynchronous(true);
    744        }
    745        return queue.enqueueMessage(msg, uptimeMillis);
    746    }
    

    frameworks/base/core/java/android/os/MessageQueue.java

        boolean enqueueMessage(Message msg, long when) {
    537        if (msg.target == null) {
    538            throw new IllegalArgumentException("Message must have a target.");
    539        }
    540        if (msg.isInUse()) {
    541            throw new IllegalStateException(msg + " This message is already in use.");
    542        }
    543
    544        synchronized (this) {
    545            if (mQuitting) {
    546                IllegalStateException e = new IllegalStateException(
    547                        msg.target + " sending message to a Handler on a dead thread");
    548                Log.w(TAG, e.getMessage(), e);
    549                msg.recycle();
    550                return false;
    551            }
    552
    553            msg.markInUse();
    554            msg.when = when;
    555            Message p = mMessages;
    556            boolean needWake;
    557            if (p == null || when == 0 || when < p.when) {
    558                // New head, wake up the event queue if blocked.
    559                msg.next = p;
    560                mMessages = msg;
    561                needWake = mBlocked; //如果入队前队列为空,或者入队Message为对头,需要唤醒queue
    562            } else {
    563                // Inserted within the middle of the queue.  Usually we don't have to wake
    564                // up the event queue unless there is a barrier at the head of the queue
    565                // and the message is the earliest asynchronous message in the queue.
    566                needWake = mBlocked && p.target == null && msg.isAsynchronous(); //如果当前队列中已经有其他消息,只有特殊情况下才需要wake queue (异步message)
    567                Message prev;
    568                for (;;) {
    569                    prev = p;
    570                    p = p.next;
    571                    if (p == null || when < p.when) {
    572                        break;
    573                    }
    574                    if (needWake && p.isAsynchronous()) {
    575                        needWake = false;
    576                    }
    577                }
    578                msg.next = p; // invariant: p == prev.next
    579                prev.next = msg; //插入到适当位置
    580            }
    581
    582            // We can assume mPtr != 0 because mQuitting is false.
    583            if (needWake) {
    584                nativeWake(mPtr); //唤醒队列
    585            }
    586        }
    587        return true;
    588    }
    

    我们注意到enqueueMessage是会返回是否入队成功的,那么问题就很简单了,在case里把是否入队成功打出来就行了,结果发现入队是成功的;那么就剩下第二种可能,消息虽然在队中,但没有被取出来执行,这里的流程在http://guard.pt.miui.com/opengrok/xref/v10-p-cepheus-dev/frameworks/base/core/java/android/os/Looper.java#137

    137    public static void loop() {
    138        final Looper me = myLooper();
              ...
    142        final MessageQueue queue = me.mQueue;
    143
               ...
    158
    159        for (;;) {
    160            Message msg = queue.next(); // might block
                   ...
    192            try {
    193                msg.target.dispatchMessage(msg);
    194                dispatchEnd = needEndTime ? SystemClock.uptimeMillis() : 0;
    195            } finally {
    196                if (traceTag != 0) {
    197                    Trace.traceEnd(traceTag);
    198                }
    199            }
                   ...
    233            msg.recycleUnchecked();
    234        }
    235    }
    

    那么肯定是卡在Message msg = queue.next(); // might block ,这也解释了为什么测试报错之后会卡住不生成log的原因,直到触发了CTS框架的整体超时机制,那么下面看一下MessageQueue.next()的操作

    Message next() {
    311        // Return here if the message loop has already quit and been disposed.
    312        // This can happen if the application tries to restart a looper after quit
    313        // which is not supported.
    314        final long ptr = mPtr;
    315        if (ptr == 0) {
    316            return null;
    317        }
    318
    319        int pendingIdleHandlerCount = -1; // -1 only during first iteration
    320        int nextPollTimeoutMillis = 0;
    321        for (;;) {
    322            if (nextPollTimeoutMillis != 0) {
    323                Binder.flushPendingCommands();
    324            }
    325
    326            nativePollOnce(ptr, nextPollTimeoutMillis); //nextPollTimeoutMillis 队列为空时一直等待,直到被唤醒(下一个Message入队)
    327
    328            synchronized (this) {
    329                // Try to retrieve the next message.  Return if found.
    330                final long now = SystemClock.uptimeMillis();
    331                Message prevMsg = null;
    332                Message msg = mMessages;
    333                if (msg != null && msg.target == null) {
    334                    // Stalled by a barrier.  Find the next asynchronous message in the queue.
    335                    do {
    336                        prevMsg = msg;
    337                        msg = msg.next;
    338                    } while (msg != null && !msg.isAsynchronous());
    339                }
    340                if (msg != null) {
    341                    if (now < msg.when) {
    342                        // Next message is not ready.  Set a timeout to wake up when it is ready.
    343                        nextPollTimeoutMillis = (int) Math.min(msg.when - now, Integer.MAX_VALUE); //队列中有Message了,但最快需要执行的Message还没到时间
    344                    } else {
    345                        // Got a message.
    346                        mBlocked = false;
    347                        if (prevMsg != null) {
    348                            prevMsg.next = msg.next;
    349                        } else {
    350                            mMessages = msg.next;
    351                        }
    352                        msg.next = null;
    353                        if (DEBUG) Log.v(TAG, "Returning message: " + msg);
    354                        msg.markInUse();
    355                        return msg; //到了时间后,取出Message,并将queue的状态改为blocked = false;
    356                    }
    357                } else {
    358                    // No more messages.
    359                    nextPollTimeoutMillis = -1; //队列为空时一直阻塞,直到被唤醒
    360                }
    361
    362                // Process the quit message now that all pending messages have been handled.
    363                if (mQuitting) {
    364                    dispose();
    365                    return null;
    366                }
    367
    368                // If first time idle, then get the number of idlers to run.
    369                // Idle handles only run if the queue is empty or if the first message
    370                // in the queue (possibly a barrier) is due to be handled in the future.
    371                if (pendingIdleHandlerCount < 0
    372                        && (mMessages == null || now < mMessages.when)) {
    373                    pendingIdleHandlerCount = mIdleHandlers.size();
    374                }
    375                if (pendingIdleHandlerCount <= 0) {
    376                    // No idle handlers to run.  Loop and wait some more.
    377                    mBlocked = true; //除非顺利返回了Message,否则 nextPollTimeoutMillis >0或-1,分别代表首个Message没到时间或队列为空,此时队列阻塞,并进入下一个等待                             循环
    378                    continue;
    379                }
    380
    381                if (mPendingIdleHandlers == null) {
    382                    mPendingIdleHandlers = new IdleHandler[Math.max(pendingIdleHandlerCount, 4)];
    383                }
    384                mPendingIdleHandlers = mIdleHandlers.toArray(mPendingIdleHandlers);
    385            }
    386
    387            // Run the idle handlers.
    388            // We only ever reach this code block during the first iteration.
    389            for (int i = 0; i < pendingIdleHandlerCount; i++) {
    390                final IdleHandler idler = mPendingIdleHandlers[i];
    391                mPendingIdleHandlers[i] = null; // release the reference to the handler
    392
    393                boolean keep = false;
    394                try {
    395                    keep = idler.queueIdle();
    396                } catch (Throwable t) {
    397                    Log.wtf(TAG, "IdleHandler threw exception", t);
    398                }
    399
    400                if (!keep) {
    401                    synchronized (this) {
    402                        mIdleHandlers.remove(idler);
    403                    }
    404                }
    405            }
    406
    407            // Reset the idle handler count to 0 so we do not run them again.
    408            pendingIdleHandlerCount = 0;
    409
    410            // While calling an idle handler, a new message could have been delivered
    411            // so go back and look again for a pending message without waiting.
    412            nextPollTimeoutMillis = 0;
    413        }
    414    }
    

    因为case卡住了,那么很方便我们抓trace,先判断卡在哪里,发现果然卡在nativePollOnce

    "main" prio=10 tid=1 Waiting
      | group="main" sCount=1 dsCount=0 flags=1 obj=0x12c01ee8 self=0x7758e14c00
      | sysTid=11568 nice=-8 cgrp=default sched=0/0 handle=0x77dec83548
      | state=S schedstat=( 171782606 895523 56 ) utm=11 stm=5 core=7 HZ=100
      | stack=0x7fde185000-0x7fde187000 stackSize=8MB
      | held mutexes=
      at java.lang.Object.wait(Native method)
      - waiting on <0x0d54e4d2> (a java.lang.Object)
      at com.android.commands.hid.Device.close(Device.java:107)
      - locked <0x0d54e4d2> (a java.lang.Object)
      at com.android.commands.hid.Hid.run(Hid.java:90)
      at com.android.commands.hid.Hid.main(Hid.java:60)
      at com.android.internal.os.RuntimeInit.nativeFinishInit(Native method)
      at com.android.internal.os.RuntimeInit.main(RuntimeInit.java:388)
     
     
     
     
    "HidDeviceHandler" prio=10 tid=10 Native
      | group="main" sCount=1 dsCount=0 flags=1 obj=0x12c31a98 self=0x7758f8f000
      | sysTid=11583 nice=0 cgrp=default sched=0/0 handle=0x7751a484f0
      | state=S schedstat=( 8710574 6102134 7 ) utm=0 stm=0 core=4 HZ=100
      | stack=0x7751945000-0x7751947000 stackSize=1041KB
      | held mutexes=
      kernel: (couldn't read /proc/self/task/11583/stack)
      native: #00 pc 000000000007aacc  /system/lib64/libc.so (__epoll_pwait+8)
      native: #01 pc 0000000000014d48  /system/lib64/libutils.so (android::Looper::pollInner(int)+148)
      native: #02 pc 0000000000014c18  /system/lib64/libutils.so (android::Looper::pollOnce(int, int*, int*, void**)+60)
      native: #03 pc 00000000001271b8  /system/lib64/libandroid_runtime.so (android::android_os_MessageQueue_nativePollOnce(_JNIEnv*, _jobject*, long, int)+44)
      at android.os.MessageQueue.nativePollOnce(Native method)
      at android.os.MessageQueue.next(MessageQueue.java:326)
      at android.os.Looper.loop(Looper.java:162)
      at android.os.HandlerThread.run(HandlerThread.java:65)
    

    队列为空时timeout为-1,会一直等待,但是等到下一个Message入队时就应该被激活啊,而我们前面已经验证了up Message是入队的,因此这里就矛盾了,所以肯定还有什么我们没考虑的地方

    先看下nativePollOnce的等待流程

    frameworks/base/core/jni/android_os_MessageQueue.cpp

    212static const JNINativeMethod gMessageQueueMethods[] = {
    213    /* name, signature, funcPtr */
           ...
    216    { "nativePollOnce", "(JI)V", (void*)android_os_MessageQueue_nativePollOnce },
    217    { "nativeWake", "(J)V", (void*)android_os_MessageQueue_nativeWake },
           ...
    221};
     
     
    188static void android_os_MessageQueue_nativePollOnce(JNIEnv* env, jobject obj,
    189        jlong ptr, jint timeoutMillis) {
    190    NativeMessageQueue* nativeMessageQueue = reinterpret_cast<NativeMessageQueue*>(ptr);
    191    nativeMessageQueue->pollOnce(env, obj, timeoutMillis);
    192}
     
     
    107void NativeMessageQueue::pollOnce(JNIEnv* env, jobject pollObj, int timeoutMillis) {
    108    mPollEnv = env;
    109    mPollObj = pollObj;
    110    mLooper->pollOnce(timeoutMillis);
    111    mPollObj = NULL;
    112    mPollEnv = NULL;
        ...
    119}
    

    /system/core/libutils/Looper.cpp

    178int Looper::pollOnce(int timeoutMillis, int* outFd, int* outEvents, void** outData) {
    179    int result = 0;
    180    for (;;) {
    181        while (mResponseIndex < mResponses.size()) {
    182            const Response& response = mResponses.itemAt(mResponseIndex++);
    183            int ident = response.request.ident;
    184            if (ident >= 0) {
    185                int fd = response.request.fd;
    186                int events = response.events;
    187                void* data = response.request.data;
    188#if DEBUG_POLL_AND_WAKE
    189                ALOGD("%p ~ pollOnce - returning signalled identifier %d: "
    190                        "fd=%d, events=0x%x, data=%p",
    191                        this, ident, fd, events, data);
    192#endif
    193                if (outFd != NULL) *outFd = fd;
    194                if (outEvents != NULL) *outEvents = events;
    195                if (outData != NULL) *outData = data;
    196                return ident;  //针对回调函数为空的情况,ident值必为一个大于等于0的值
    197            }
    198        }
    199
    200        if (result != 0) {
    201#if DEBUG_POLL_AND_WAKE
    202            ALOGD("%p ~ pollOnce - returning result %d", this, result);
    203#endif
    204            if (outFd != NULL) *outFd = 0;
    205            if (outEvents != NULL) *outEvents = 0;
    206            if (outData != NULL) *outData = NULL;
    207            return result; //根据pollInner的结果进行处理,实际上是针对设置了回调函数的情况,因为设置了回调函数,所以已经对发生的事件做了处理了,所以,不需要将发生事件的相关信息再返回                给调用者了
    208        }
    209
    210        result = pollInner(timeoutMillis);
    211    }
    212}
    
    214int Looper::pollInner(int timeoutMillis) {
    215#if DEBUG_POLL_AND_WAKE
    216    ALOGD("%p ~ pollOnce - waiting: timeoutMillis=%d", this, timeoutMillis);
    217#endif
    218
    219    // Adjust the timeout based on when the next message is due.
    220    if (timeoutMillis != 0 && mNextMessageUptime != LLONG_MAX) {
    221        nsecs_t now = systemTime(SYSTEM_TIME_MONOTONIC);
    222        int messageTimeoutMillis = toMillisecondTimeoutDelay(now, mNextMessageUptime);
    223        if (messageTimeoutMillis >= 0
    224                && (timeoutMillis < 0 || messageTimeoutMillis < timeoutMillis)) {
    225            timeoutMillis = messageTimeoutMillis; //重新计算超时时间
    226        }
    227#if DEBUG_POLL_AND_WAKE
    228        ALOGD("%p ~ pollOnce - next message in %" PRId64 "ns, adjusted timeout: timeoutMillis=%d",
    229                this, mNextMessageUptime - now, timeoutMillis);
    230#endif
    231    }
    232
    233    // Poll.
    234    int result = POLL_WAKE;
    235    mResponses.clear();
    236    mResponseIndex = 0;
    237
    238    // We are about to idle.
    239    mPolling = true;
    240
    241    struct epoll_event eventItems[EPOLL_MAX_EVENTS];
    242    int eventCount = epoll_wait(mEpollFd, eventItems, EPOLL_MAX_EVENTS, timeoutMillis);
    243
    244    // No longer idling.
    245    mPolling = false;
    246
    247    // Acquire lock.
    248    mLock.lock();
    249
    250    // Rebuild epoll set if needed.
    251    if (mEpollRebuildRequired) {
    252        mEpollRebuildRequired = false;
    253        rebuildEpollLocked();
    254        goto Done;
    255    }
    256
    257    // Check for poll error.
    258    if (eventCount < 0) { //返回值小于零,表示发生错误
    259        if (errno == EINTR) {
    260            goto Done;
    261        }
    262        ALOGW("Poll failed with an unexpected error: %s", strerror(errno));
    263        result = POLL_ERROR;
    264        goto Done;
    265    }
    266
    267    // Check for poll timeout.
    268    if (eventCount == 0) { //eventCount为零,表示发生超时,因此直接跳转到Done
    269#if DEBUG_POLL_AND_WAKE
    270        ALOGD("%p ~ pollOnce - timeout", this);
    271#endif
    272        result = POLL_TIMEOUT;
    273        goto Done;
    274    }
    275
    276    // Handle all events.
    277#if DEBUG_POLL_AND_WAKE
    278    ALOGD("%p ~ pollOnce - handling events from %d fds", this, eventCount);
    279#endif
    280
    281    for (int i = 0; i < eventCount; i++) {
    282        int fd = eventItems[i].data.fd;
    283        uint32_t epollEvents = eventItems[i].events;
    284        if (fd == mWakeEventFd) {
    285            if (epollEvents & EPOLLIN) {
                      //通过wake唤醒队列
    286                awoken();
    287            } else {
    288                ALOGW("Ignoring unexpected epoll events 0x%x on wake event fd.", epollEvents);
    289            }
    290        } else {
    291            ssize_t requestIndex = mRequests.indexOfKey(fd);
    292            if (requestIndex >= 0) {
    293                int events = 0;
    294                if (epollEvents & EPOLLIN) events |= EVENT_INPUT;
    295                if (epollEvents & EPOLLOUT) events |= EVENT_OUTPUT;
    296                if (epollEvents & EPOLLERR) events |= EVENT_ERROR;
    297                if (epollEvents & EPOLLHUP) events |= EVENT_HANGUP;
    298                pushResponse(events, mRequests.valueAt(requestIndex)); //如果是其他FD发生事件,则根据Request构造Response,并push到Response数组
                                  //当某个监控fd上发生事件后,就会把对应的Request取出来调用。
    299            } else {
    300                ALOGW("Ignoring unexpected epoll events 0x%x on fd %d that is "
    301                        "no longer registered.", epollEvents, fd);
    302            }
    303        }
    304    }
    305Done: ;
    306
    307    // Invoke pending message callbacks.
    308    mNextMessageUptime = LLONG_MAX;
    309    while (mMessageEnvelopes.size() != 0) {
    310        nsecs_t now = systemTime(SYSTEM_TIME_MONOTONIC);
    311        const MessageEnvelope& messageEnvelope = mMessageEnvelopes.itemAt(0);
    312        if (messageEnvelope.uptime <= now) {
    313            // Remove the envelope from the list.
    314            // We keep a strong reference to the handler until the call to handleMessage
    315            // finishes.  Then we drop it so that the handler can be deleted *before*
    316            // we reacquire our lock.
    317            { // obtain handler
    318                sp<MessageHandler> handler = messageEnvelope.handler;
    319                Message message = messageEnvelope.message;
    320                mMessageEnvelopes.removeAt(0);
    321                mSendingMessage = true;
    322                mLock.unlock();
    323
    324#if DEBUG_POLL_AND_WAKE || DEBUG_CALLBACKS
    325                ALOGD("%p ~ pollOnce - sending message: handler=%p, what=%d",
    326                        this, handler.get(), message.what);
    327#endif
    328                handler->handleMessage(message);
    329            } // release handler
    330
    331            mLock.lock();
    332            mSendingMessage = false;
    333            result = POLL_CALLBACK;
    334        } else {
    335            // The last message left at the head of the queue determines the next wakeup time.
    336            mNextMessageUptime = messageEnvelope.uptime;
    337            break;
    338        }
    339    }
    340
    341    // Release lock.
    342    mLock.unlock();
    343
    344    // Invoke all response callbacks.
    345    for (size_t i = 0; i < mResponses.size(); i++) {
    346        Response& response = mResponses.editItemAt(i);
    347        if (response.request.ident == POLL_CALLBACK) {
    348            int fd = response.request.fd;
    349            int events = response.events;
    350            void* data = response.request.data;
    351#if DEBUG_POLL_AND_WAKE || DEBUG_CALLBACKS
    352            ALOGD("%p ~ pollOnce - invoking fd event callback %p: fd=%d, events=0x%x, data=%p",
    353                    this, response.request.callback.get(), fd, events, data);
    354#endif
    355            // Invoke the callback.  Note that the file descriptor may be closed by
    356            // the callback (and potentially even reused) before the function returns so
    357            // we need to be a little careful when removing the file descriptor afterwards.
    358            int callbackResult = response.request.callback->handleEvent(fd, events, data);//回调其他fd定义的callback
    359            if (callbackResult == 0) {
    360                removeFd(fd, response.request.seq);
    361            }
    362
    363            // Clear the callback reference in the response structure promptly because we
    364            // will not clear the response vector itself until the next poll
    365            response.request.callback.clear();
    366            result = POLL_CALLBACK;
    367        }
    368    }
    369    return result;
    370}
    

    Looper初始化时会初始化一个fd,**mWakeEventFd = eventfd(0, EFD_NONBLOCK | EFD_CLOEXEC); **

    前面说道,当一个队列中没有Message时,会陷入无限等待,当新消息入队时,我们将MessageQueue重新唤醒:

    enqueueMessage中

    583            if (needWake) {
    584                nativeWake(mPtr);
    585            }
    
    194static void android_os_MessageQueue_nativeWake(JNIEnv* env, jclass clazz, jlong ptr) {
    195    NativeMessageQueue* nativeMessageQueue = reinterpret_cast<NativeMessageQueue*>(ptr);
    196    nativeMessageQueue->wake();
    197}
     
     
    121void NativeMessageQueue::wake() {
    122    mLooper->wake();
    123}
    
    398void Looper::wake() {
    399#if DEBUG_POLL_AND_WAKE
    400    ALOGD("%p ~ wake", this);
    401#endif
    402
    403    uint64_t inc = 1;
    404    ssize_t nWrite = TEMP_FAILURE_RETRY(write(mWakeEventFd, &inc, sizeof(uint64_t)));
    405    if (nWrite != sizeof(uint64_t)) {
    406        if (errno != EAGAIN) {
    407            LOG_ALWAYS_FATAL("Could not write wake signal to fd %d: %s",
    408                    mWakeEventFd, strerror(errno));
    409        }
    410    }
    411}
    

    通过epoll机制自动唤醒mWakeEventFd,那么没有被唤醒说明needWake为false,如果MessageQueue为空,这个值只能根据当前是否block来决定,那么必然会wake,只能怀疑Messagequeue不为空

    在根据这里,强烈怀疑设置了一个屏障Message导致queue没有被wakeup

    needWake = mBlocked && p.target == null && msg.isAsynchronous(); //如果当前队列中已经有其他消息,只有特殊情况下才需要wake queue (异步message)
    

    再看case:

    100        @Override
    101        public void handleMessage(Message msg) {
    102            switch (msg.what) {
    103                case MSG_OPEN_DEVICE:
    104                    SomeArgs args = (SomeArgs) msg.obj;
    105                    mPtr = nativeOpenDevice((String) args.arg1, args.argi1, args.argi2, args.argi3,
    106                            (byte[]) args.arg2, new DeviceCallback());
    107                    pauseEvents();
     
     
    143    private class DeviceCallback {
    144        public void onDeviceOpen() {
    145            mHandler.resumeEvents();
    146        }
    

    处理Message (MSG_OPEN_DEVICE)设置了屏障,在DeviceCallback中的onDeviceOpen被消除,如果这个没有回调,就会发生这个up消息入队却无法执行的现象,case中加log发现果然没有被回调。那么问题又回到底层,需要找出Callback没有被回调的原因。

    (d)分析过程4:
    首先看这个callback被调用的位置

    171int Device::handleEvents(int events) {
    172    if (events & (ALOOPER_EVENT_ERROR | ALOOPER_EVENT_HANGUP)) {
    173        LOGE("uhid node was closed or an error occurred. events=0x%x", events);
    174        mDeviceCallback->onDeviceError();
    175        return 0;
    176    }
    177    struct uhid_event ev;
    178    ssize_t ret = TEMP_FAILURE_RETRY(::read(mFd, &ev, sizeof(ev)));
    179    if (ret < 0) {
    180        LOGE("Failed to read from uhid node: %s", strerror(errno));
    181        mDeviceCallback->onDeviceError();
    182        return 0;
    183    }
    184
    185    if (ev.type == UHID_OPEN) {
    186        mDeviceCallback->onDeviceOpen();
    187    }
    188
    189    return 1;
    190}
    52static int handleLooperEvents(int /* fd */, int events, void* data) {
    53    Device* d = reinterpret_cast<Device*>(data);
    54    return d->handleEvents(events);
    55}
    133Device::Device(int32_t id, int fd, std::unique_ptr<DeviceCallback> callback) :
    134            mId(id), mFd(fd), mDeviceCallback(std::move(callback)) {
    135    ALooper* aLooper = ALooper_forThread();
    136    if (aLooper == NULL) {
    137        LOGE("Could not get ALooper, ALooper_forThread returned NULL");
    138        aLooper = ALooper_prepare(ALOOPER_PREPARE_ALLOW_NON_CALLBACKS);
    139    }
    140    ALooper_addFd(aLooper, fd, 0, ALOOPER_EVENT_INPUT, handleLooperEvents,
    141                  reinterpret_cast<void*>(this));
    142}
    

    在这里add_Fd,然后在pollInner中进行处理

    345    for (size_t i = 0; i < mResponses.size(); i++) {
    346        Response& response = mResponses.editItemAt(i);
    347        if (response.request.ident == POLL_CALLBACK) {
    348            int fd = response.request.fd;
    349            int events = response.events;
    350            void* data = response.request.data;
    351#if DEBUG_POLL_AND_WAKE || DEBUG_CALLBACKS
    352            ALOGD("%p ~ pollOnce - invoking fd event callback %p: fd=%d, events=0x%x, data=%p",
    353                    this, response.request.callback.get(), fd, events, data);
    354#endif
    355            // Invoke the callback.  Note that the file descriptor may be closed by
    356            // the callback (and potentially even reused) before the function returns so
    357            // we need to be a little careful when removing the file descriptor afterwards.
                   //在这里进行callback的回调
    358            int callbackResult = response.request.callback->handleEvent(fd, events, data);
    359            if (callbackResult == 0) {
    360                removeFd(fd, response.request.seq);
    361            }
    362
    363            // Clear the callback reference in the response structure promptly because we
    364            // will not clear the response vector itself until the next poll.
    365            response.request.callback.clear();
    366            result = POLL_CALLBACK;
    367        }
    368    }
    369    return result;
    370}
    

    可见onDeviceOpen这个callback的回调与这个event UHID_OPEN息息相关,可见是没有从内核上传UHID_OPEN上来,我们看一下uhid.txt中的说明:

    47The first thing you should do is sending an UHID_CREATE2 event. This will
    48register the device. UHID will respond with an UHID_START event. You can now
    49start sending data to and reading data from UHID. However, unless UHID sends the
    50UHID_OPEN event, the internally attached HID Device Driver has no user attached.
    51That is, you might put your device asleep unless you receive the UHID_OPEN
    52event. If you receive the UHID_OPEN event, you should start I/O. If the last
    53user closes the HID device, you will receive an UHID_CLOSE event. This may be
    54followed by an UHID_OPEN event again and so on. There is no need to perform
    55reference-counting in user-space. That is, you will never receive multiple
    56UHID_OPEN events without an UHID_CLOSE event. The HID subsystem performs
    57ref-counting for you.
    58You may decide to ignore UHID_OPEN/UHID_CLOSE, though. I/O is allowed even
    59though the device may have no users.
    

    The first thing you should do is sending an UHID_CREATE2 event. This will register the device. UHID will respond with an UHID_START event.

    从用户态发送UHID_CREATE2到内核态(UHID),然后内核态(UHID)返回UHID_START和UHID_OPEN到用户态

    然后通过添加log,大致理清楚了流程

    首先在http://guard.pt.miui.com/opengrok/xref/v10-p-dipper-dev/kernel/msm-4.9/drivers/hid/uhid.c#642中初始化工作队列:

    623static int uhid_char_open(struct inode *inode, struct file *file)
    624{ 625 struct uhid_device *uhid;
    626
    627 uhid = kzalloc(sizeof(*uhid), GFP_KERNEL);
    628 if (!uhid)
    629 return -ENOMEM;
    630
    631 mutex_init(&uhid->devlock);
    632 mutex_init(&uhid->report_lock);
    633 spin_lock_init(&uhid->qlock);
    634 init_waitqueue_head(&uhid->waitq);
    635 init_waitqueue_head(&uhid->report_wait); 636 uhid->running = false;
    637 INIT_WORK(&uhid->worker, uhid_device_add_worker);
    638
    639 file->private_data = uhid;
    640 nonseekable_open(inode, file);
    641
    642 return 0;
    643}
     
     
    11-15 01:51:18.455 root 0 0 I : uhid_char_open+0xb4/0x110
    11-15 01:51:18.455 root 0 0 I : misc_open+0x12c/0x168
    11-15 01:51:18.455 root 0 0 I : chrdev_open+0x12c/0x200
    11-15 01:51:18.455 root 0 0 I : do_dentry_open+0x12c/0x2e8
    11-15 01:51:18.455 root 0 0 I : vfs_open+0x54/0x88
    11-15 01:51:18.455 root 0 0 I : path_openat+0x91c/0x14d0
    11-15 01:51:18.455 root 0 0 I : do_filp_open+0x74/0x120
    11-15 01:51:18.455 root 0 0 I : do_sys_open+0x134/0x248
    11-15 01:51:18.455 root 0 0 I : SyS_openat+0x10/0x18
    11-15 01:51:18.455 root 0 0 I : el0_svc_naked+0x34/0x38
    

    然后从用户态向内核态发送event UHID_CREATE2

    92Device* Device::open(int32_t id, const char* name, int32_t vid, int32_t pid,
    93        std::unique_ptr<uint8_t[]> descriptor, size_t descriptorSize,
    94        std::unique_ptr<DeviceCallback> callback) {
    95
    96    int fd = ::open(UHID_PATH, O_RDWR | O_CLOEXEC);
    97    if (fd < 0) {
    98        LOGE("Failed to open uhid: %s", strerror(errno));
    99        return nullptr;
    100    }
    101
    102    struct uhid_event ev;
    103    memset(&ev, 0, sizeof(ev));
    104    ev.type = UHID_CREATE2;
    105    strncpy((char*)ev.u.create2.name, name, UHID_MAX_NAME_LENGTH);
    106    memcpy(&ev.u.create2.rd_data, descriptor.get(),
    107            descriptorSize * sizeof(ev.u.create2.rd_data[0]));
    108    ev.u.create2.rd_size = descriptorSize;
    109    ev.u.create2.bus = BUS_BLUETOOTH;
    110    ev.u.create2.vendor = vid;
    111    ev.u.create2.product = pid;
    112    ev.u.create2.version = 0;
    113    ev.u.create2.country = 0;
    114
    115    errno = 0;
    116    ssize_t ret = TEMP_FAILURE_RETRY(::write(fd, &ev, sizeof(ev))); //向内核态写
    117    if (ret < 0 || ret != sizeof(ev)) {
    118        ::close(fd);
    119        LOGE("Failed to create uhid node: %s", strerror(errno));
    120        return nullptr;
    121    }
    122
    123    // Wait for the device to actually be created.
    124    ret = TEMP_FAILURE_RETRY(::read(fd, &ev, sizeof(ev))); //等待内核态的返回结果
    125    if (ret < 0 || ev.type != UHID_START) {
    126        ::close(fd);
    127        LOGE("uhid node failed to start: %s", strerror(errno));
    128        return nullptr;
    129    }
    130    return new Device(id, fd, std::move(callback));
    131}
    

    然后内核态监听到这个event后,开始uhid_dev_create2创建操作

    734 case UHID_CREATE2:
    735     ret = uhid_dev_create2(uhid, &uhid->input_buf);
    736     break;
    478static int uhid_dev_create2(struct uhid_device *uhid,
    479             const struct uhid_event *ev)
    480{
    481 struct hid_device *hid;
    482 size_t rd_size, len;
    483 void *rd_data;
    484 int ret;
    485
    486 if (uhid->running)
    487     return -EALREADY;
    488
    489 rd_size = ev->u.create2.rd_size;
    490 if (rd_size <= 0 || rd_size > HID_MAX_DESCRIPTOR_SIZE)
    491     return -EINVAL;
    492
    493 rd_data = kmemdup(ev->u.create2.rd_data, rd_size, GFP_KERNEL);
    494 if (!rd_data)
    495     return -ENOMEM;
    496
    497 uhid->rd_size = rd_size;
    498 uhid->rd_data = rd_data;
    499
    500 hid = hid_allocate_device();
    501 if (IS_ERR(hid)) {
    502     ret = PTR_ERR(hid);
    503     goto err_free;
    504 }
    505
    506 len = min(sizeof(hid->name), sizeof(ev->u.create2.name)) - 1;
    507 strncpy(hid->name, ev->u.create2.name, len);
    508 len = min(sizeof(hid->phys), sizeof(ev->u.create2.phys)) - 1;
    509 strncpy(hid->phys, ev->u.create2.phys, len);
    510 len = min(sizeof(hid->uniq), sizeof(ev->u.create2.uniq)) - 1;
    511 strncpy(hid->uniq, ev->u.create2.uniq, len);
    512
    513 hid->ll_driver = &uhid_hid_driver;
    514 hid->bus = ev->u.create2.bus;
    515 hid->vendor = ev->u.create2.vendor;
    516 hid->product = ev->u.create2.product;
    517 hid->version = ev->u.create2.version;
    518 hid->country = ev->u.create2.country;
    519 hid->driver_data = uhid;
    520 hid->dev.parent = uhid_misc.this_device;
    521
    522 uhid->hid = hid;
    523 uhid->running = true;
    524
    525 /* Adding of a HID device is done through a worker, to allow HID drivers
    526  * which use feature requests during .probe to work, without they would
    527  * be blocked on devlock, which is held by uhid_char_write.
    528  */
    529 schedule_work(&uhid->worker);
    530
    531 return 0;
    532
    533err_free:
    534 kfree(uhid->rd_data);
    535 uhid->rd_data = NULL;
    536 uhid->rd_size = 0;
    537 return ret;
    538}
    

    注意这里schedule_work(&uhid->worker); 将工作结构变量添加到系统共享队列,等待执行,其工作处理函数为uhid_device_add_worker,在其中发送UHID_START和UHID_OPEN,大体流程如下:

    <4>[  643.029505] weijuncheng ---------------------------------------------------->uhid_queue UHID_START start
    <4>[  643.029510] CPU: 4 PID: 7339 Comm: kworker/4:0 Tainted: G        W  O    4.9.112-perf-g3dd9240-dirty #2
    <4>[  643.029511] Hardware name: Xiaomi Technologies, Inc. Perseus P3 v2.1 (DT)
    <4>[  643.029519] Workqueue: events uhid_device_add_worker
    <4>[  643.029522] Call trace:
    <4>[  643.029527] [<ffffff96a8c8a808>] dump_backtrace+0x0/0x2ac
    <4>[  643.029529] [<ffffff96a8c8ad34>] show_stack+0x14/0x1c
    <4>[  643.029533] [<ffffff96a905c02c>] dump_stack+0x94/0xb4
    <4>[  643.029534] [<ffffff96a982a198>] uhid_hid_start+0xb8/0xe4
    <4>[  643.029536] [<ffffff96a98234a8>] hid_device_probe+0xfc/0x140
    <4>[  643.029539] [<ffffff96a93891e4>] driver_probe_device+0x24c/0x430
    <4>[  643.029541] [<ffffff96a93895b0>] __device_attach_driver+0xd4/0x148
    <4>[  643.029542] [<ffffff96a9386ad8>] bus_for_each_drv+0x60/0xb0
    <4>[  643.029543] [<ffffff96a9388d84>] __device_attach+0x100/0x17c
    <4>[  643.029544] [<ffffff96a93897bc>] device_initial_probe+0x10/0x18
    <4>[  643.029545] [<ffffff96a93880e0>] bus_probe_device+0x90/0x98
    <4>[  643.029547] [<ffffff96a93858d0>] device_add+0x420/0x58c
    <4>[  643.029548] [<ffffff96a9823104>] hid_add_device+0xe0/0x2a0
    <4>[  643.029550] [<ffffff96a98297f0>] uhid_device_add_worker+0x1c/0x58
    <4>[  643.029552] [<ffffff96a8cc8d8c>] process_one_work+0x168/0x45c
    <4>[  643.029553] [<ffffff96a8cc90d4>] worker_thread+0x54/0x46c
    <4>[  643.029555] [<ffffff96a8ccf838>] kthread+0xe0/0xf4
    <4>[  643.029556] [<ffffff96a8c82ee0>] ret_from_fork+0x10/0x30
    <4>[  643.029558] weijuncheng ---------------------------------------------------->uhid_queue UHID_START end
    
    <4>[  643.029762] weijuncheng ---------------------------------------------------->uhid_queue UHID_OPEN start
    <4>[  643.029764] CPU: 4 PID: 7339 Comm: kworker/4:0 Tainted: G        W  O    4.9.112-perf-g3dd9240-dirty #2
    <4>[  643.029765] Hardware name: Xiaomi Technologies, Inc. Perseus P3 v2.1 (DT)
    <4>[  643.029767] Workqueue: events uhid_device_add_worker
    <4>[  643.029768] Call trace:
    <4>[  643.029770] [<ffffff96a8c8a808>] dump_backtrace+0x0/0x2ac
    <4>[  643.029772] [<ffffff96a8c8ad34>] show_stack+0x14/0x1c
    <4>[  643.029774] [<ffffff96a905c02c>] dump_stack+0x94/0xb4
    <4>[  643.029775] [<ffffff96a982a9fc>] uhid_hid_open+0x6c/0x9c
    <4>[  643.029776] [<ffffff96a9823854>] hidinput_open+0x1c/0x24
    <4>[  643.029779] [<ffffff96a958c000>] input_open_device+0x80/0xac
    <4>[  643.029781] [<ffffff96a959163c>] input_leds_connect+0xd0/0x2ac
    <4>[  643.029782] [<ffffff96a958ca8c>] input_attach_handler+0x19c/0x1fc
    <4>[  643.029783] [<ffffff96a958cf24>] input_register_device+0x36c/0x4b4
    <4>[  643.029784] [<ffffff96a9823f50>] hidinput_connect+0x43c/0x3d18
    <4>[  643.029785] [<ffffff96a9822c98>] hid_connect+0x1f4/0x334
    <4>[  643.029786] [<ffffff96a98234bc>] hid_device_probe+0x110/0x140
    <4>[  643.029787] [<ffffff96a93891e4>] driver_probe_device+0x24c/0x430
    <4>[  643.029788] [<ffffff96a93895b0>] __device_attach_driver+0xd4/0x148
    <4>[  643.029789] [<ffffff96a9386ad8>] bus_for_each_drv+0x60/0xb0
    <4>[  643.029791] [<ffffff96a9388d84>] __device_attach+0x100/0x17c
    <4>[  643.029792] [<ffffff96a93897bc>] device_initial_probe+0x10/0x18
    <4>[  643.029793] [<ffffff96a93880e0>] bus_probe_device+0x90/0x98
    <4>[  643.029794] [<ffffff96a93858d0>] device_add+0x420/0x58c
    <4>[  643.029795] [<ffffff96a9823104>] hid_add_device+0xe0/0x2a0
    <4>[  643.029796] [<ffffff96a98297f0>] uhid_device_add_worker+0x1c/0x58
    <4>[  643.029798] [<ffffff96a8cc8d8c>] process_one_work+0x168/0x45c
    <4>[  643.029799] [<ffffff96a8cc90d4>] worker_thread+0x54/0x46c
    <4>[  643.029800] [<ffffff96a8ccf838>] kthread+0xe0/0xf4
    <4>[  643.029801] [<ffffff96a8c82ee0>] ret_from_fork+0x10/0x30
    <4>[  643.029802] weijuncheng ---------------------------------------------------->uhid_queue UHID_OPEN end
    

    然后排查这一段逻辑:

     <4>[  643.029775] [<ffffff96a982a9fc>] uhid_hid_open+0x6c/0x9c
     <4>[  643.029776] [<ffffff96a9823854>] hidinput_open+0x1c/0x24
     <4>[  643.029779] [<ffffff96a958c000>] input_open_device+0x80/0xac
     <4>[  643.029781] [<ffffff96a959163c>] input_leds_connect+0xd0/0x2ac
     <4>[  643.029782] [<ffffff96a958ca8c>] input_attach_handler+0x19c/0x1fc
     <4>[  643.029783] [<ffffff96a958cf24>] input_register_device+0x36c/0x4b4
    

    最终发现错误在http://guard.pt.miui.com/opengrok/xref/v10-p-cepheus-dev/kernel/msm-4.14/drivers/hid/hid-core.c?r=aaac082d#1804

    1804int hid_hw_open(struct hid_device *hdev)
    1805{
    1806    int ret;
    1807
    1808    ret = mutex_lock_killable(&hdev->ll_open_lock);
    1809    if (ret)
    1810        return ret;
    1811
    1812    if (!hdev->ll_open_count++) {
    1813        ret = hdev->ll_driver->open(hdev);
    1814        if (ret)
    1815            hdev->ll_open_count--;
    1816    }
    1817
    1818    mutex_unlock(&hdev->ll_open_lock);
    1819    return ret;
    1820}
    1821EXPORT_SYMBOL_GPL(hid_hw_open);
    

    注意

        1812    if (!hdev->ll_open_count++) {
        1813        ret = hdev->ll_driver->open(hdev);
    

    而调用的driver->open里面

    142static int uhid_hid_open(struct hid_device *hid)
    143{
    144 struct uhid_device *uhid = hid->driver_data;
    145 int retval = 0;
    146
    147 if (!hid->ll_open_count++) {
    148     retval = uhid_queue_event(uhid, UHID_OPEN);
    149     if (retval)
    150         hid->ll_open_count--;
    151 }
    152 return retval;
    153}
    
        147 if (!hid->ll_open_count++) {
        148     retval = uhid_queue_event(uhid, UHID_OPEN);
    

    hid->ll_open_count这个值加了两次,导致无法进入retval = uhid_queue_event(uhid, UHID_OPEN),UHID_OPEN这个event无法向上传递,最终没有调用到定义的callback,导致messagequeue被阻塞,up消息无法发送,case fail,本地修改hid_hw_open为:

    int hid_hw_open(struct hid_device *hdev)
    {
        /*
        int ret;
     
        ret = mutex_lock_killable(&hdev->ll_open_lock);
        if (ret)
            return ret;
     
        if (!hdev->ll_open_count++) {
            ret = hdev->ll_driver->open(hdev);
            if (ret)
                hdev->ll_open_count--;
        }
     
        mutex_unlock(&hdev->ll_open_lock);
        return ret;
        */
        return hdev->ll_driver->open(hdev);
    }
    

    再编bootimage刷一下,就可以通过测试了

    问题总结

    这个问题是kernel更新之后里面的hid driver实现逻辑有问题造成的,中间回顾了Handler实现逻辑,用到了linux的epoll,workqueue机制(只有基本了解),有待后续进一步详细了解。

    相关文章

      网友评论

          本文标题:CTS问题分析9

          本文链接:https://www.haomeiwen.com/subject/qmdifqtx.html