美文网首页
死锁问题排查

死锁问题排查

作者: 我是光芒万丈 | 来源:发表于2022-07-21 12:00 被阅读0次

    最近在处理现场问题中偶然发现从日志中发现事务间死锁,与以往认知不大相符,故而进行记录。
    1.死锁
    死锁一般指的是两个事务或者线程间发生资源资源交叉互斥,导致双方都盯着对方持有的资源无法更进一步。
    2.常见事务型死锁例子:


    事务死锁示例

    3.本次死锁排查:
    首先是服务日志中mybatis执行sql过程中出现死锁:

    ### The error occurred while setting parameters
    ### SQL: UPDATE UM_METADATA     SET VERSION_NUMBER = CASE id              WHEN ? THEN ?      end,      UPDATE_TIME= CASE id              WHEN ? THEN ?      end      WHERE ID IN      (         ?      )
    ### Cause: com.mysql.jdbc.exceptions.jdbc4.MySQLTransactionRollbackException: Deadlock found when trying to get lock; try restarting transaction
    ; Deadlock found when trying to get lock; try restarting transaction; nested exception is 
    

    先使用命令对mysql死锁进行分析:

    show engine innodb status;
    

    接下来简要分析下对应mysql死锁日志
    事务1:等待获取行锁RECORD LOCKS space id 179871 page no 39 n bits 96 index PRIMARY of table meta_test.XXX trx id 48125214 lock_mode X waiting
    X表示是排他锁 S 表示是共享锁。
    lock_mode X 说明是 next-key锁 ,next-key锁是行锁(record lock)和间隙锁(Gap Locks)的组合,会锁住当前行及索引的间隙。
    具体可参考:https://dev.mysql.com/doc/refman/5.7/en/innodb-locking.html

    *** (1) TRANSACTION:
    TRANSACTION 48125214, ACTIVE 0 sec fetching rows
    mysql tables in use 1, locked 1
    LOCK WAIT 67 lock struct(s), heap size 8400, 1183 row lock(s)
    MySQL thread id 22591, OS thread handle 139935406667520, query id 27170986 192.168.0.211 root updating
    delete from
         XXX
        where MD_PATH like concat('/52b7d582b11c42c182a7ce5ffc2c3873/7d818c22d37c4854954f4f1a88ba3644/dd7c85705a494928b6d07c9b9fe3fae1/00dad50cc56840c0a1641e1b0accb3c7/16776e52b5ab438e9695736e8881265c/964246d87594441ebc35fa86b8abbb39/eab0e37c65294969b7f80d10cfba41e4/3b2d63069fdc4a47b7ecdf6fea27cdc4/fe6bfc61eb144b5ba9b0a5097e3546ce','/%')
         
          and TENANT_CODE = 'GTJA2022'
    *** (1) WAITING FOR THIS LOCK TO BE GRANTED:
    RECORD LOCKS space id 179871 page no 39 n bits 96 index PRIMARY of table `meta_test`.`UM_METADATA` trx id 48125214 lock_mode X waiting
    

    事务2:

    *** (2) TRANSACTION:
    TRANSACTION 48125210, ACTIVE 0 sec starting index read
    mysql tables in use 1, locked 1
    42 lock struct(s), heap size 3520, 1054 row lock(s), undo log entries 3
    MySQL thread id 22595, OS thread handle 139935405045504, query id 27170991 192.168.0.211 root updating
    UPDATE XXX
        SET VERSION_NUMBER = CASE id
          
          WHEN '3b2d63069fdc4a47b7ecdf6fea27cdc4' THEN '1.1'
         end, 
        UPDATE_TIME= CASE id
          
          WHEN '3b2d63069fdc4a47b7ecdf6fea27cdc4' THEN '2022-07-20 09:35:11.663'
         end 
        WHERE ID IN
         (  
          '3b2d63069fdc4a47b7ecdf6fea27cdc4'
         )
    

    事务2持有的锁有:space id 179871 page no 39 n bits 96 仔细观察可以发现正是事务1在等待的锁对象上的x锁 。
    page no 每个记录在堆中的相对位置.
    lock_mode X locks rec but not gap 说明是行锁。 锁定id为:fe6bfc61eb144b5ba9b0a5097e3546

    *** (2) HOLDS THE LOCK(S):
    RECORD LOCKS space id 179871 page no 39 n bits 96 index PRIMARY of table `meta_test`.`UM_METADATA` trx id 48125210 lock_mode X locks rec but not gap
    -- Record lock 标识锁住的记录  PHYSICAL RECORD 锁住的是heap no(存储叶内的偏移值)为15的记录 info bits 非0表示被删除  
    -- 0表示没删除
    Record lock, heap no 15 PHYSICAL RECORD: n_fields 20; compact format; info bits 32
     -- 锁住主键长度为30  id为fe6bfc61eb144b5ba9b0a5097e3546的的id hex是id的16进制表示方式
     0: len 30; hex 666536626663363165623134346235626139623061353039376533353436; asc fe6bfc61eb144b5ba9b0a5097e3546; (total 32 bytes);
     -- 固定6个字节的事务id
     1: len 6; hex 000002de551a; asc     U ;;
     -- 固定7个字节的回滚指针
     2: len 7; hex 6a000001da23df; asc j    # ;;
     3: SQL NULL;
     -- 后面是此列其他行数据
    

    事务2请求的锁对象:再次尝试获取RECORD LOCKS space id 179871 page no 77 n bits 88 index PRIMARY 上的x锁
    主键id为3b2d63069fdc4a47b7ecdf6fea27cd

    *** (2) WAITING FOR THIS LOCK TO BE GRANTED:
    RECORD LOCKS space id 179871 page no 77 n bits 88 index PRIMARY of table `meta_test`.`UM_METADATA` trx id 48125210 lock_mode X locks rec but not gap waiting
    Record lock, heap no 15 PHYSICAL RECORD: n_fields 20; compact format; info bits 0
     0: len 30; hex 336232643633303639666463346134376237656364663666656132376364; asc 3b2d63069fdc4a47b7ecdf6fea27cd; (total 32 bytes);
     1: len 6; hex 000002de54e9; asc     T ;;
     2: len 7; hex 4200001a76154b; asc B   v K;;
     3: len 3; hex 312e30; asc 1.0;;
     4: len 8; hex 47544a4132303232; asc GTJA2022;;
    

    结果:事务2被回滚掉。

    *** WE ROLL BACK TRANSACTION (2)
    

    经排查相关冲突事务示意如下:


    死锁事务示意

    具体原因,我暂时也无法解释,解决:其实只需要避免这种事务冲突就好。等事务2业务全部执行完,再触发事务1异步任务线程即可.

    相关文章

      网友评论

          本文标题:死锁问题排查

          本文链接:https://www.haomeiwen.com/subject/edweirtx.html