最近在处理现场问题中偶然发现从日志中发现事务间死锁,与以往认知不大相符,故而进行记录。
1.死锁
死锁一般指的是两个事务或者线程间发生资源资源交叉互斥,导致双方都盯着对方持有的资源无法更进一步。
2.常见事务型死锁例子:
事务死锁示例
3.本次死锁排查:
首先是服务日志中mybatis执行sql过程中出现死锁:
### The error occurred while setting parameters
### SQL: UPDATE UM_METADATA SET VERSION_NUMBER = CASE id WHEN ? THEN ? end, UPDATE_TIME= CASE id WHEN ? THEN ? end WHERE ID IN ( ? )
### Cause: com.mysql.jdbc.exceptions.jdbc4.MySQLTransactionRollbackException: Deadlock found when trying to get lock; try restarting transaction
; Deadlock found when trying to get lock; try restarting transaction; nested exception is
先使用命令对mysql死锁进行分析:
show engine innodb status;
接下来简要分析下对应mysql死锁日志
事务1:等待获取行锁RECORD LOCKS space id 179871 page no 39 n bits 96 index PRIMARY of table meta_test
.XXX
trx id 48125214 lock_mode X waiting
X表示是排他锁 S 表示是共享锁。
lock_mode X 说明是 next-key锁 ,next-key锁是行锁(record lock)和间隙锁(Gap Locks)的组合,会锁住当前行及索引的间隙。
具体可参考:https://dev.mysql.com/doc/refman/5.7/en/innodb-locking.html
*** (1) TRANSACTION:
TRANSACTION 48125214, ACTIVE 0 sec fetching rows
mysql tables in use 1, locked 1
LOCK WAIT 67 lock struct(s), heap size 8400, 1183 row lock(s)
MySQL thread id 22591, OS thread handle 139935406667520, query id 27170986 192.168.0.211 root updating
delete from
XXX
where MD_PATH like concat('/52b7d582b11c42c182a7ce5ffc2c3873/7d818c22d37c4854954f4f1a88ba3644/dd7c85705a494928b6d07c9b9fe3fae1/00dad50cc56840c0a1641e1b0accb3c7/16776e52b5ab438e9695736e8881265c/964246d87594441ebc35fa86b8abbb39/eab0e37c65294969b7f80d10cfba41e4/3b2d63069fdc4a47b7ecdf6fea27cdc4/fe6bfc61eb144b5ba9b0a5097e3546ce','/%')
and TENANT_CODE = 'GTJA2022'
*** (1) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 179871 page no 39 n bits 96 index PRIMARY of table `meta_test`.`UM_METADATA` trx id 48125214 lock_mode X waiting
事务2:
*** (2) TRANSACTION:
TRANSACTION 48125210, ACTIVE 0 sec starting index read
mysql tables in use 1, locked 1
42 lock struct(s), heap size 3520, 1054 row lock(s), undo log entries 3
MySQL thread id 22595, OS thread handle 139935405045504, query id 27170991 192.168.0.211 root updating
UPDATE XXX
SET VERSION_NUMBER = CASE id
WHEN '3b2d63069fdc4a47b7ecdf6fea27cdc4' THEN '1.1'
end,
UPDATE_TIME= CASE id
WHEN '3b2d63069fdc4a47b7ecdf6fea27cdc4' THEN '2022-07-20 09:35:11.663'
end
WHERE ID IN
(
'3b2d63069fdc4a47b7ecdf6fea27cdc4'
)
事务2持有的锁有:space id 179871 page no 39 n bits 96 仔细观察可以发现正是事务1在等待的锁对象上的x锁 。
page no 每个记录在堆中的相对位置.
lock_mode X locks rec but not gap 说明是行锁。 锁定id为:fe6bfc61eb144b5ba9b0a5097e3546
*** (2) HOLDS THE LOCK(S):
RECORD LOCKS space id 179871 page no 39 n bits 96 index PRIMARY of table `meta_test`.`UM_METADATA` trx id 48125210 lock_mode X locks rec but not gap
-- Record lock 标识锁住的记录 PHYSICAL RECORD 锁住的是heap no(存储叶内的偏移值)为15的记录 info bits 非0表示被删除
-- 0表示没删除
Record lock, heap no 15 PHYSICAL RECORD: n_fields 20; compact format; info bits 32
-- 锁住主键长度为30 id为fe6bfc61eb144b5ba9b0a5097e3546的的id hex是id的16进制表示方式
0: len 30; hex 666536626663363165623134346235626139623061353039376533353436; asc fe6bfc61eb144b5ba9b0a5097e3546; (total 32 bytes);
-- 固定6个字节的事务id
1: len 6; hex 000002de551a; asc U ;;
-- 固定7个字节的回滚指针
2: len 7; hex 6a000001da23df; asc j # ;;
3: SQL NULL;
-- 后面是此列其他行数据
事务2请求的锁对象:再次尝试获取RECORD LOCKS space id 179871 page no 77 n bits 88 index PRIMARY 上的x锁
主键id为3b2d63069fdc4a47b7ecdf6fea27cd
*** (2) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 179871 page no 77 n bits 88 index PRIMARY of table `meta_test`.`UM_METADATA` trx id 48125210 lock_mode X locks rec but not gap waiting
Record lock, heap no 15 PHYSICAL RECORD: n_fields 20; compact format; info bits 0
0: len 30; hex 336232643633303639666463346134376237656364663666656132376364; asc 3b2d63069fdc4a47b7ecdf6fea27cd; (total 32 bytes);
1: len 6; hex 000002de54e9; asc T ;;
2: len 7; hex 4200001a76154b; asc B v K;;
3: len 3; hex 312e30; asc 1.0;;
4: len 8; hex 47544a4132303232; asc GTJA2022;;
结果:事务2被回滚掉。
*** WE ROLL BACK TRANSACTION (2)
经排查相关冲突事务示意如下:
死锁事务示意
具体原因,我暂时也无法解释,解决:其实只需要避免这种事务冲突就好。等事务2业务全部执行完,再触发事务1异步任务线程即可.
网友评论