美文网首页
生产环境长事务造成的mysql死锁分析及解决

生产环境长事务造成的mysql死锁分析及解决

作者: 生不悔改 | 来源:发表于2023-06-18 15:39 被阅读0次

    死锁日志

    *************************** 1. row ***************************
      Type: InnoDB
      Name: 
    Status: 
    =====================================
    2023-06-13 14:11:44 0x7f1febad8700 INNODB MONITOR OUTPUT
    =====================================
    Per second averages calculated from the last 25 seconds
    -----------------
    BACKGROUND THREAD
    -----------------
    srv_master_thread loops: 27984349 srv_active, 0 srv_shutdown, 27539402 srv_idle
    srv_master_thread log flush and writes: 55523751
    ----------
    SEMAPHORES
    ----------
    OS WAIT ARRAY INFO: reservation count 23061421
    OS WAIT ARRAY INFO: signal count 67625116
    RW-shared spins 0, rounds 90130459, OS waits 21606013
    RW-excl spins 0, rounds 168864564, OS waits 391828
    RW-sx spins 10886910, rounds 129682783, OS waits 864374
    Spin rounds per wait: 90130459.00 RW-shared, 168864564.00 RW-excl, 11.91 RW-sx
    ------------------------
    LATEST DETECTED DEADLOCK
    ------------------------
    2023-06-03 18:20:06 0x7f1fc46e5700
    *** (1) TRANSACTION:
    TRANSACTION 225089659, ACTIVE 1 sec starting index read
    mysql tables in use 1, locked 1
    LOCK WAIT 320 lock struct(s), heap size 41168, 4 row lock(s)
    MySQL thread id 5619472, OS thread handle 139774417090304, query id 1142417448 172.16.10.121 application updating
    UPDATE `saas_application`.`tenant_account_statistics`
        SET settlement   = settlement - 198.00,
            to_account   = to_account + 198.00,
            withdrawable = withdrawable + 198.00
        WHERE shop_id = 1531550363716599810
          AND belong_id = 1501384359112056834
    *** (1) WAITING FOR THIS LOCK TO BE GRANTED:
    RECORD LOCKS space id 494 page no 5 n bits 136 index index_shopid_belongid of table `saas_application`.`tenant_account_statistics` trx id 225089659 lock_mode X locks rec but not gap waiting
    Record lock, heap no 40 PHYSICAL RECORD: n_fields 3; compact format; info bits 0
     0: len 8; hex 154128f0f8f2c002; asc  A(     ;;
     1: len 8; hex 14d5fd1eb433a002; asc      3  ;;
     2: len 4; hex 00000031; asc    1;;
    
    *** (2) TRANSACTION:
    TRANSACTION 225089658, ACTIVE 1 sec fetching rows, thread declared inside InnoDB 676
    mysql tables in use 1, locked 1
    323 lock struct(s), heap size 41168, 5 row lock(s), undo log entries 2
    MySQL thread id 5619462, OS thread handle 139774416279296, query id 1142417475 172.16.10.121 application updating
    UPDATE `saas_application`.`tenant_account_statistics_details`
        SET status                 = 1,
            actual_settlement_time = NOW()
        WHERE order_no = 'O2305101612286510'
          and `type` = 0
    *** (2) HOLDS THE LOCK(S):
    RECORD LOCKS space id 494 page no 5 n bits 136 index index_shopid_belongid of table `saas_application`.`tenant_account_statistics` trx id 225089658 lock_mode X locks rec but not gap
    Record lock, heap no 40 PHYSICAL RECORD: n_fields 3; compact format; info bits 0
     0: len 8; hex 154128f0f8f2c002; asc  A(     ;;
     1: len 8; hex 14d5fd1eb433a002; asc      3  ;;
     2: len 4; hex 00000031; asc    1;;
    
    *** (2) WAITING FOR THIS LOCK TO BE GRANTED:
    RECORD LOCKS space id 727 page no 473 n bits 152 index PRIMARY of table `saas_application`.`tenant_account_statistics_details` trx id 225089658 lock_mode X locks rec but not gap waiting
    Record lock, heap no 33 PHYSICAL RECORD: n_fields 24; compact format; info bits 0
     0: len 4; hex 80004bbc; asc   K ;;
     1: len 6; hex 00000cd4ab0b; asc       ;;
     2: len 7; hex f6000001ca0110; asc        ;;
     3: len 4; hex 8000aa75; asc    u;;
     4: len 17; hex 4f32333035313031363131343938333437; asc O2305101611498347;;
     5: len 4; hex 80000000; asc     ;;
     6: SQL NULL;
     7: len 8; hex 94d5fd1eb433a002; asc      3  ;;
     8: len 8; hex 9520fd639933b002; asc    c 3  ;;
     9: len 18; hex e5a4a9e9a290e59586e59f8ee5b9b3e58fb0; asc                   ;;
     10: len 8; hex 954128f0f8f2c002; asc  A(     ;;
     11: len 30; hex e5a4a9e9a290e88cb6e59381e799bde88dafe99b86e59ba2e58685e8b4ad; asc                               ; (total 33 bytes);
     12: len 8; hex 94d5fd1eb433a002; asc      3  ;;
     13: len 6; hex e680bbe5ba97; asc       ;;
     14: len 1; hex 80; asc  ;;
     15: len 4; hex 80000000; asc     ;;
     16: len 1; hex 80; asc  ;;
     17: len 5; hex 800000c600; asc      ;;
     18: len 5; hex 99b0150304; asc      ;;
     19: SQL NULL;
     20: SQL NULL;
     21: SQL NULL;
     22: len 20; hex 4f32333035313030383131353139363031313233; asc O2305100811519601123;;
     23: len 21; hex 423233303531303136313135323038333032313030; asc B23051016115208302100;;
    

    根据sql找到代码

    image.png

    分析对应的死锁日志

    事务1死锁

    image.png

    从这段分析出,这是一个事务,事务id = 225089659 ,
    死锁的语句是:涉及的表是 tenant_account_statistics 这张表

    UPDATE `saas_application`.`tenant_account_statistics`
        SET settlement   = settlement - 198.00,
            to_account   = to_account + 198.00,
            withdrawable = withdrawable + 198.00
        WHERE shop_id = 1531550363716599810
          AND belong_id = 1501384359112056834
    

    涉及到 lock_mode X locks 是排他锁,行锁的一种。对应锁的索引是 index index_shopid_belongid 这是一个联合索引,shop_id 和 belong_id 的联合索引,

    分析

    1.此时该事务, 在尝试获取表 tenant_account_statistics 的 index index_shopid_belongid 对应非聚合索引的行锁。

    事务2死锁

    image.png

    从这段分析出,这是一个事务,事务id = 225089658 ,
    死锁的语句是:涉及的表是 tenant_account_statistics_details 这张表

    UPDATE `saas_application`.`tenant_account_statistics_details`
        SET status                 = 1,
            actual_settlement_time = NOW()
        WHERE order_no = 'O2305101612286510'
          and `type` = 0
    

    涉及到 lock_mode X locks 是排他锁,行锁的一种。对应锁的索引是 index index_shopid_belongid 这是一个联合索引,shop_id 和 belong_id 的联合索引。

    分析

    1.这个事务操作获取到了表 tenant_account_statistics 的 index index_shopid_belongid 索引对应的行锁。
    2.这个事务操作此时等待表 tenant_account_statistics_details 的 primary 主键索引

    分析代码

    image.png

    settlement(jsons)代码如下


    image.png

    死锁分析

    1.首先查看系统的数据库,mysql的隔离级别是read-commited ,读已提交,所以所有的查询都不会加锁。直接排除代码里面大事务里面的查询,造成的行锁问题。
    2.那么只有update造成的行锁互相竞争,我们将事务中update操作的sql都查看一遍。发现就是更新两张表,一张是 tenant_account_statistics_details ,一张表是 tenant_account_statistics 。
    3.分析死锁产生的流程:先看下面的执行sql

    ## SQL的执行顺序,也就是加锁的顺序
    ## ===============================================111111
    UPDATE tenant_account_statistics_details
        SET order_time      = #{orderTime},
            settlement_time = #{settlementTime}
        WHERE order_no = #{orderNo}
          AND type = 0
          AND status = 0
    ## ================================================222222
    UPDATE `saas_application`.`tenant_account_statistics`
        SET settlement   = settlement - #{amount},
            to_account   = to_account + #{amount},
            withdrawable = withdrawable + #{amount}
        WHERE shop_id = #{shopId}
          AND belong_id = #{belongId}
    ## =================================================333333
    UPDATE `saas_application`.`tenant_account_statistics_details`
    SET status                 = 1,
        actual_settlement_time = NOW()
    WHERE order_no = #{orderNo}
      and `type` = #{type}
    
    image.png

    事务在执行的时候,会同时去抢夺 tenant_account_statistics_details 和 tenant_account_statistics 两张表对应记录的行锁,并且持有的锁只有在事务提交或回滚后才会释放。
    事务1和事务2同时执行,并且操作两张表的同一条记录的时候,极端的情况是:
    事务1获得了 tenant_account_statistics_details 表的对应记录的行锁,等待 tenant_account_statistics 表的行锁。
    事务2获得了 tenant_account_statistics 表的对应记录的行锁,等待 tenant_account_statistics_details 表对应的行锁。
    此时形成了 两个事务相互等待对方的锁释放,无法回滚,无法提交,造成了死锁,数据库在超时后,会回滚undolog中操作影响小的事务。

    死锁解决

    死锁的原因找到了,然后就是解决思路:
    1.首先避免同一时间操作对应表的对应的同一条记录。业务的逻辑无法避免这块,除非重新设计。
    2.其次如何让事物操作所消耗的时间尽可能的短,这样就能减少两个事务操作同一条记录的冲突行为。排查一下sql,以及对应表的索引,发现

    UPDATE tenant_account_statistics_details
        SET order_time      = #{orderTime},
            settlement_time = #{settlementTime}
        WHERE order_no = #{orderNo}
          AND type = 0
          AND status = 0
    

    这个sql语句中根据 order_no去做更新操作,而order_no不是索引,所以会引起全表扫描,从而增加了这行sql的执行时间,从而也加大了整个事务的执行时间,从而提高了两个事务同时操作两张表对应的同一条数据概率,造成死锁的几率也大大提高了。
    于是我给 tenant_account_statistics_details 表添加了 order_no 索引,再本地做了并发是500的压测,发现不再出现死锁现象。从而解决了业务上死锁造成的数据错误问题。
    3.最后,我们还可以在这个大事务上加一个分布式锁,让所有的事务串行化执行,这样肯定也不会造成死锁,但是会减少接口的并发吞吐量。

    相关文章

      网友评论

          本文标题:生产环境长事务造成的mysql死锁分析及解决

          本文链接:https://www.haomeiwen.com/subject/wtcsydtx.html