美文网首页
一次主从延迟排查

一次主从延迟排查

作者: 左轮Lee | 来源:发表于2020-08-19 09:03 被阅读0次

MySQL: 5.7.25-log
隔壁部门的同学发现主从发生了延迟,让我协助处理下。
以下测试是本地环境模拟。

目录
一、问题发现
二、初步排查
三、解析binlog,查看具体SQL
四、解决方案
五、善后
六、教训

一、问题发现

钉钉收到复制延迟报警,于是连上备库查看复制状态。
反复执行 show slave status;
Relay_Master_Log_FileExec_Master_Log_Pos两个参数保持不变。
Seconds_Behind_Master一直在增加。

(sit)root@localhost [(none)]> show slave status \G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 192.168.211.129
                  Master_User: repl
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: mysql-bin.000006
          Read_Master_Log_Pos: 269158126
               Relay_Log_File: mysql-relay-bin.000007
                Relay_Log_Pos: 105085733
        Relay_Master_Log_File: mysql-bin.000006
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
              Replicate_Do_DB: 
          Replicate_Ignore_DB: 
           Replicate_Do_Table: 
       Replicate_Ignore_Table: 
      Replicate_Wild_Do_Table: 
  Replicate_Wild_Ignore_Table: 
                   Last_Errno: 0
                   Last_Error: 
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 222934194
              Relay_Log_Space: 361484149
              Until_Condition: None
               Until_Log_File: 
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File: 
           Master_SSL_CA_Path: 
              Master_SSL_Cert: 
            Master_SSL_Cipher: 
               Master_SSL_Key: 
        Seconds_Behind_Master: 168
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error: 
               Last_SQL_Errno: 0
               Last_SQL_Error: 
  Replicate_Ignore_Server_Ids: 
             Master_Server_Id: 1293306
                  Master_UUID: 37ca06ac-5c51-11e9-9d3b-000c296776be
             Master_Info_File: mysql.slave_master_info
                    SQL_Delay: 0
          SQL_Remaining_Delay: NULL
      Slave_SQL_Running_State: Waiting for Slave Workers to free pending events
           Master_Retry_Count: 86400
                  Master_Bind: 
      Last_IO_Error_Timestamp: 
     Last_SQL_Error_Timestamp: 
               Master_SSL_Crl: 
           Master_SSL_Crlpath: 
           Retrieved_Gtid_Set: 37ca06ac-5c51-11e9-9d3b-000c296776be:5768-5771
            Executed_Gtid_Set: 173d287c-a642-11ea-ae79-000c295d8ce8:1-39256,
37ca06ac-5c51-11e9-9d3b-000c296776be:1-5770,
425c9d7e-3ff6-11e9-baed-e86a6406e22a:1-1170
                Auto_Position: 1
         Replicate_Rewrite_DB: 
                 Channel_Name: 
           Master_TLS_Version: 
1 row in set (0.00 sec)
二、初步排查

拿到上面的Relay_Master_Log_FileExec_Master_Log_Pos查看主库在此pos之后的执行语句。
主库执行:

(sit)root@localhost [(none)]> show binlog events in 'mysql-bin.000006' from 222934194 ;
+------------------+-----------+-------------+-----------+-------------+----------------------------------------------------------------------+
| Log_name         | Pos       | Event_type  | Server_id | End_log_pos | Info                                                                 |
+------------------+-----------+-------------+-----------+-------------+----------------------------------------------------------------------+
| mysql-bin.000006 | 222934194 | Gtid        |   1293306 |   222934259 | SET @@SESSION.GTID_NEXT= '37ca06ac-5c51-11e9-9d3b-000c296776be:5771' |
| mysql-bin.000006 | 222934259 | Query       |   1293306 |   222934332 | BEGIN                                                                |
| mysql-bin.000006 | 222934332 | Table_map   |   1293306 |   222934411 | table_id: 325 (test1.t_core_task_log)                                |
| mysql-bin.000006 | 222934411 | Delete_rows |   1293306 |   222942517 | table_id: 325                                                        |
| mysql-bin.000006 | 222942517 | Delete_rows |   1293306 |   222950603 | table_id: 325                                                        |
| mysql-bin.000006 | 222950603 | Delete_rows |   1293306 |   222958736 | table_id: 325                                                        |
| mysql-bin.000006 | 222958736 | Delete_rows |   1293306 |   222966918 | table_id: 325                                                        |
| mysql-bin.000006 | 222966918 | Delete_rows |   1293306 |   222975014 | table_id: 325                                                        |
| mysql-bin.000006 | 222975014 | Delete_rows |   1293306 |   222983142 | table_id: 325                                                        |
| mysql-bin.000006 | 222983142 | Delete_rows |   1293306 |   222991343 | table_id: 325                                                        |
...
## 省略
...

发现此pos后对表test1.t_core_task_log做了大量的delete操作(上面只截取部分查询结果)。
这种基本上就是主库执行了一个大的删除事务,几十上百万,导致备库回放时延迟。但是正常情况下不至于延迟几个小时。
于是检查表结构:

(sit)root@localhost [(none)]> show create table test1.t_core_task_log \G
*************************** 1. row ***************************
       Table: t_core_task_log
Create Table: CREATE TABLE `t_core_task_log` (
  `task_id` bigint(20) DEFAULT NULL,
  `exe_id` bigint(20) DEFAULT NULL,
  `task_type` int(11) DEFAULT NULL,
  `state` mediumint(6) DEFAULT NULL,
  `cur_run_date` varchar(64) DEFAULT NULL,
  `next_run_date` varchar(64) DEFAULT NULL,
  `log_desc` longtext,
  `runtime_broker` varchar(64) DEFAULT NULL,
  `log_time` varchar(64) DEFAULT NULL,
  `log_time_ms` int(11) DEFAULT NULL,
  `tries` int(11) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4
1 row in set (0.00 sec)

果然,寡表一张,连个索引都没有,别说主键了。
为了更深一步确认执行的SQL语句,下面解析下binlog。

三、解析binlog,查看具体SQL
# at 222934163
#200818 14:20:58 server id 1293306  end_log_pos 222934194 CRC32 0xa8e0f554      Xid = 5635
COMMIT/*!*/;
# at 222934194
#200818 14:58:17 server id 1293306  end_log_pos 222934259 CRC32 0x9792c084      GTID    last_committed=515      sequence_number=516     rbr_only=yes
/*!50718 SET TRANSACTION ISOLATION LEVEL READ COMMITTED*//*!*/;
SET @@SESSION.GTID_NEXT= '37ca06ac-5c51-11e9-9d3b-000c296776be:5771'/*!*/;
# at 222934259
#200818 14:58:17 server id 1293306  end_log_pos 222934332 CRC32 0xba4022f8      Query   thread_id=31    exec_time=0     error_code=0
SET TIMESTAMP=1597733897/*!*/;
BEGIN
/*!*/;
# at 222934332
#200818 14:58:17 server id 1293306  end_log_pos 222934411 CRC32 0x54e65c06      Table_map: `test1`.`t_core_task_log` mapped to number 325
# at 222934411
#200818 14:58:17 server id 1293306  end_log_pos 222942517 CRC32 0x5661e9f7      Delete_rows: table id 325
# at 222942517
#200818 14:58:17 server id 1293306  end_log_pos 222950603 CRC32 0xa6817b87      Delete_rows: table id 325
# at 222950603
#200818 14:58:17 server id 1293306  end_log_pos 222958736 CRC32 0x12ae1bcb      Delete_rows: table id 325
...
## 省略很多
...
# at 269137195
#200818 14:58:17 server id 1293306  end_log_pos 269145386 CRC32 0xbdc1c971      Delete_rows: table id 325
# at 269145386
#200818 14:58:17 server id 1293306  end_log_pos 269153543 CRC32 0xd065b822      Delete_rows: table id 325
# at 269153543
#200818 14:58:17 server id 1293306  end_log_pos 269158095 CRC32 0x95f1eb27      Delete_rows: table id 325 flags: STMT_END_F
### DELETE FROM `test1`.`t_core_task_log`
### WHERE
###   @1=10122 /* LONGINT meta=0 nullable=1 is_null=0 */
###   @2=810 /* LONGINT meta=0 nullable=1 is_null=0 */
###   @3=5 /* INT meta=0 nullable=1 is_null=0 */
###   @4=1 /* MEDIUMINT meta=0 nullable=1 is_null=0 */
###   @5='2020-06-25 00:00:00' /* VARSTRING(256) meta=256 nullable=1 is_null=0 */
###   @6='2020-06-26 00:00:00' /* VARSTRING(256) meta=256 nullable=1 is_null=0 */
###   @7='50075log' /* LONGBLOB/LONGTEXT meta=4 nullable=1 is_null=0 */
###   @8='172.31.51.0' /* VARSTRING(256) meta=256 nullable=1 is_null=0 */
###   @9='2020-07-23 16:30:02' /* VARSTRING(256) meta=256 nullable=1 is_null=0 */
###   @10=28 /* INT meta=0 nullable=1 is_null=0 */
###   @11=1 /* INT meta=0 nullable=1 is_null=0 */
### DELETE FROM `test1`.`t_core_task_log`
### WHERE
###   @1=10122 /* LONGINT meta=0 nullable=1 is_null=0 */
###   @2=810 /* LONGINT meta=0 nullable=1 is_null=0 */
###   @3=5 /* INT meta=0 nullable=1 is_null=0 */
###   @4=1 /* MEDIUMINT meta=0 nullable=1 is_null=0 */
###   @5='2020-06-25 00:00:00' /* VARSTRING(256) meta=256 nullable=1 is_null=0 */
###   @6='2020-06-26 00:00:00' /* VARSTRING(256) meta=256 nullable=1 is_null=0 */
###   @7='50072log' /* LONGBLOB/LONGTEXT meta=4 nullable=1 is_null=0 */
###   @8='172.31.51.0' /* VARSTRING(256) meta=256 nullable=1 is_null=0 */
###   @9='2020-07-23 16:30:02' /* VARSTRING(256) meta=256 nullable=1 is_null=0 */
###   @10=78 /* INT meta=0 nullable=1 is_null=0 */
###   @11=1 /* INT meta=0 nullable=1 is_null=0 */
...
## 省略很多
...

可知备库回放的SQL为:

DELETE FROM `test1`.`t_core_task_log`
WHERE
  @1=10122 /* LONGINT meta=0 nullable=1 is_null=0 */
  @2=810 /* LONGINT meta=0 nullable=1 is_null=0 */
  @3=5 /* INT meta=0 nullable=1 is_null=0 */
  @4=1 /* MEDIUMINT meta=0 nullable=1 is_null=0 */
  @5='2020-06-25 00:00:00' /* VARSTRING(256) meta=256 nullable=1 is_null=0 */
  @6='2020-06-26 00:00:00' /* VARSTRING(256) meta=256 nullable=1 is_null=0 */
  @7='50075log' /* LONGBLOB/LONGTEXT meta=4 nullable=1 is_null=0 */
  @8='172.31.51.0' /* VARSTRING(256) meta=256 nullable=1 is_null=0 */
  @9='2020-07-23 16:30:02' /* VARSTRING(256) meta=256 nullable=1 is_null=0 */
  @10=28 /* INT meta=0 nullable=1 is_null=0 */
  @11=1 /* INT meta=0 nullable=1 is_null=0 */

因为缺少索引,导致备库每次执行均需要全表扫描一次test1.t_core_task_log,我们可以单独执行下delete对应的select语句,估算下备库回放需要的时间。

#### 备库执行
(sit)root@localhost [(none)]> select * from `test1`.`t_core_task_log` where task_id = 10122  and  log_time = '2020-07-23 16:30:02'  and log_time_ms=110 ;
+---------+--------+-----------+-------+---------------------+---------------------+---------------------------+----------------+---------------------+-------------+-------+
| task_id | exe_id | task_type | state | cur_run_date        | next_run_date       | log_desc                  | runtime_broker | log_time            | log_time_ms | tries |
+---------+--------+-----------+-------+---------------------+---------------------+---------------------------+----------------+---------------------+-------------+-------+
|   10122 |    810 |         5 |     1 | 2020-06-25 00:00:00 | 2020-06-26 00:00:00 | 50070log||30767||50071log | 172.31.51.0    | 2020-07-23 16:30:02 |         110 |     1 |
|   10122 |    810 |         5 |     1 | 2020-06-25 00:00:00 | 2020-06-26 00:00:00 | 50070log||30767||50071log | 172.31.51.0    | 2020-07-23 16:30:02 |         110 |     1 |
|   10122 |    810 |         5 |     1 | 2020-06-25 00:00:00 | 2020-06-26 00:00:00 | 50070log||30767||50071log | 172.31.51.0    | 2020-07-23 16:30:02 |         110 |     1 |
|   10122 |    810 |         5 |     1 | 2020-06-25 00:00:00 | 2020-06-26 00:00:00 | 50070log||30767||50071log | 172.31.51.0    | 2020-07-23 16:30:02 |         110 |     1 |
|   10122 |    810 |         5 |     1 | 2020-06-25 00:00:00 | 2020-06-26 00:00:00 | 50070log||30767||50071log | 172.31.51.0    | 2020-07-23 16:30:02 |         110 |     1 |
|   10122 |    810 |         5 |     1 | 2020-06-25 00:00:00 | 2020-06-26 00:00:00 | 50070log||30767||50071log | 172.31.51.0    | 2020-07-23 16:30:02 |         110 |     1 |
|   10122 |    810 |         5 |     1 | 2020-06-25 00:00:00 | 2020-06-26 00:00:00 | 50070log||30767||50071log | 172.31.51.0    | 2020-07-23 16:30:02 |         110 |     1 |
|   10122 |    810 |         5 |     1 | 2020-06-25 00:00:00 | 2020-06-26 00:00:00 | 50070log||30767||50071log | 172.31.51.0    | 2020-07-23 16:30:02 |         110 |     1 |
+---------+--------+-----------+-------+---------------------+---------------------+---------------------------+----------------+---------------------+-------------+-------+
8 rows in set (0.29 sec)

单次全表扫描一次需约0.29s
因为是删除操作,查看备库和主库当前的数据量,估算总耗时。

## 主库执行
(sit)root@localhost [(none)]> SELECT count(*) FROM test1.`t_core_task_log` ;
+----------+
| count(*) |
+----------+
|    88856 |
+----------+
1 row in set (0.16 sec)

## 备库执行
(sit)root@localhost [(none)]> SELECT count(*) FROM test1.`t_core_task_log` ;
+----------+
| count(*) |
+----------+
|   432312 |
+----------+
1 row in set (0.13 sec)

## 计算耗时,约27小时
(sit)root@localhost [(none)]> select 0.29*(432312-88856)/3600;
+--------------------------+
| 0.29*(432312-88856)/3600 |
+--------------------------+
|                27.667289 |
+--------------------------+
1 row in set (0.00 sec)

需要27小时才能执行完成,GG

四、解决方案

1.如果上面估算时间还可以接受,且备库不是着急使用的话,建议等待;
2.强制stop slave(正常stop slave会卡住),关闭MySQL实例,修改my.cnf(禁止自动启动复制),启动MySQL实例,备库创建索引,打开复制start slave。
步骤如下:

## 关闭IO线程(备库执行)
(sit)root@localhost [(none)]> stop slave io_thread;

## 关闭SQL线程(备库执行)
(sit)root@localhost [(none)]> SELECT * from information_schema.`PROCESSLIST`  where user = 'system user';
+----+-------------+------+------+---------+------+--------------------------------------------------+------+
| ID | USER        | HOST | DB   | COMMAND | TIME | STATE                                            | INFO |
+----+-------------+------+------+---------+------+--------------------------------------------------+------+
| 10 | system user |      | NULL | Connect | 9656 | Waiting for an event from Coordinator            | NULL |
|  9 | system user |      | NULL | Connect | 9656 | Waiting for an event from Coordinator            | NULL |
| 11 | system user |      | NULL | Connect | 9656 | Waiting for an event from Coordinator            | NULL |
|  7 | system user |      | NULL | Connect | 9656 | Waiting for an event from Coordinator            | NULL |
| 13 | system user |      | NULL | Connect | 9656 | Waiting for an event from Coordinator            | NULL |
|  8 | system user |      | NULL | Connect | 9656 | Waiting for an event from Coordinator            | NULL |
| 12 | system user |      | NULL | Connect | 9656 | Waiting for an event from Coordinator            | NULL |
|  5 | system user |      | NULL | Connect |    0 | Waiting for Slave Workers to free pending events | NULL |
|  6 | system user |      | NULL | Connect | 1453 | Executing event                                  | NULL |
+----+-------------+------+------+---------+------+--------------------------------------------------+------+

## 线程6是正在执行回放的,强制kill(备库执行)
(sit)root@localhost [(none)]> kill 6 ;

## 检查线程,已经全部关闭(备库执行)
(sit)root@localhost [(none)]> SELECT * from information_schema.`PROCESSLIST`  where user = 'system user';
Empty set (0.00 sec)

## 检查复制状态,有报错,重启后会自动恢复(备库执行)
(sit)root@localhost [(none)]> show slave status \G
*************************** 1. row ***************************
               Slave_IO_State: 
                  Master_Host: 192.168.211.129
                  Master_User: repl
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: mysql-bin.000006
          Read_Master_Log_Pos: 269158126
               Relay_Log_File: mysql-relay-bin.000007
                Relay_Log_Pos: 105085733
        Relay_Master_Log_File: mysql-bin.000006
             Slave_IO_Running: No
            Slave_SQL_Running: No
              Replicate_Do_DB: 
          Replicate_Ignore_DB: 
           Replicate_Do_Table: 
       Replicate_Ignore_Table: 
      Replicate_Wild_Do_Table: 
  Replicate_Wild_Ignore_Table: 
                   Last_Errno: 1317
                   Last_Error: Coordinator stopped because there were error(s) in the worker(s). The most recent failure being: Worker 1 failed executing transaction '37ca06ac-5c51-11e9-9d3b-000c296776be:5771' 
at master log mysql-bin.000006, end_log_pos 235240872. See error log and/or performance_schema.replication_applier_status_by_worker table for more details about this failure or others, if any.
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 222934194
              Relay_Log_Space: 361484149
              Until_Condition: None
               Until_Log_File: 
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File: 
           Master_SSL_CA_Path: 
              Master_SSL_Cert: 
            Master_SSL_Cipher: 
               Master_SSL_Key: 
        Seconds_Behind_Master: NULL
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error: 
               Last_SQL_Errno: 1317
               Last_SQL_Error: Coordinator stopped because there were error(s) in the worker(s). The most recent failure being: Worker 1 failed executing transaction '37ca06ac-5c51-11e9-9d3b-000c296776be:5771' 
at master log mysql-bin.000006, end_log_pos 235240872. See error log and/or performance_schema.replication_applier_status_by_worker table for more details about this failure or others, if any.
  Replicate_Ignore_Server_Ids: 
             Master_Server_Id: 1293306
                  Master_UUID: 37ca06ac-5c51-11e9-9d3b-000c296776be
             Master_Info_File: mysql.slave_master_info
                    SQL_Delay: 0
          SQL_Remaining_Delay: NULL
      Slave_SQL_Running_State: 
           Master_Retry_Count: 86400
                  Master_Bind: 
      Last_IO_Error_Timestamp: 
     Last_SQL_Error_Timestamp: 200818 15:22:37
               Master_SSL_Crl: 
           Master_SSL_Crlpath: 
           Retrieved_Gtid_Set: 37ca06ac-5c51-11e9-9d3b-000c296776be:5768-5771
            Executed_Gtid_Set: 173d287c-a642-11ea-ae79-000c295d8ce8:1-39256,
37ca06ac-5c51-11e9-9d3b-000c296776be:1-5770,
425c9d7e-3ff6-11e9-baed-e86a6406e22a:1-1170
                Auto_Position: 1
         Replicate_Rewrite_DB: 
                 Channel_Name: 
           Master_TLS_Version: 
1 row in set (0.00 sec)

编辑参数,重启备库:

## 编辑参数
vi /etc/my.cnf 增加下面的参数,让备库启动时不启动复制。
[mysqld]
skip-slave-start=1

## 启动备库
service mysqld start

启动后检查复制状态(确保复制没有启动)

(sit)root@localhost [(none)]> show slave status \G
*************************** 1. row ***************************
               Slave_IO_State: 
                  Master_Host: 192.168.211.129
                  Master_User: repl
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: mysql-bin.000006
          Read_Master_Log_Pos: 222934194
               Relay_Log_File: mysql-relay-bin.000008
                Relay_Log_Pos: 4
        Relay_Master_Log_File: mysql-bin.000006
             Slave_IO_Running: No
            Slave_SQL_Running: No
              Replicate_Do_DB: 
          Replicate_Ignore_DB: 
           Replicate_Do_Table: 
       Replicate_Ignore_Table: 
      Replicate_Wild_Do_Table: 
  Replicate_Wild_Ignore_Table: 
                   Last_Errno: 0
                   Last_Error: 
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 222934194
              Relay_Log_Space: 361484326
              Until_Condition: None
               Until_Log_File: 
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File: 
           Master_SSL_CA_Path: 
              Master_SSL_Cert: 
            Master_SSL_Cipher: 
               Master_SSL_Key: 
        Seconds_Behind_Master: NULL
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error: 
               Last_SQL_Errno: 0
               Last_SQL_Error: 
  Replicate_Ignore_Server_Ids: 
             Master_Server_Id: 0
                  Master_UUID: 37ca06ac-5c51-11e9-9d3b-000c296776be
             Master_Info_File: mysql.slave_master_info
                    SQL_Delay: 0
          SQL_Remaining_Delay: NULL
      Slave_SQL_Running_State: 
           Master_Retry_Count: 86400
                  Master_Bind: 
      Last_IO_Error_Timestamp: 
     Last_SQL_Error_Timestamp: 
               Master_SSL_Crl: 
           Master_SSL_Crlpath: 
           Retrieved_Gtid_Set: 
            Executed_Gtid_Set: 173d287c-a642-11ea-ae79-000c295d8ce8:1-39256,
37ca06ac-5c51-11e9-9d3b-000c296776be:1-5770,
425c9d7e-3ff6-11e9-baed-e86a6406e22a:1-1170
                Auto_Position: 1
         Replicate_Rewrite_DB: 
                 Channel_Name: 
           Master_TLS_Version: 
1 row in set (0.00 sec)

备库创建索引,记得把 log_bin 关闭,再执行索引创建。否则后期主从切换可能会导致因索引名冲突而复制报错。

(sit)root@localhost [(none)]> set sql_log_bin =0;
Query OK, 0 rows affected (0.00 sec)

(sit)root@localhost [(none)]> create index idx_task_id_log_time on test1.t_core_task_log(task_id,log_time);
Query OK, 0 rows affected (2.04 sec)
Records: 0  Duplicates: 0  Warnings: 0

打开复制,观察复制状态,约20秒后备库追上。

(sit)root@localhost [(none)]> start slave;
(sit)root@localhost [(none)]> show slave status \G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 192.168.211.129
                  Master_User: repl
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: mysql-bin.000006
          Read_Master_Log_Pos: 269158126
               Relay_Log_File: mysql-relay-bin.000009
                Relay_Log_Pos: 46224346
        Relay_Master_Log_File: mysql-bin.000006
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
              Replicate_Do_DB: 
          Replicate_Ignore_DB: 
           Replicate_Do_Table: 
       Replicate_Ignore_Table: 
      Replicate_Wild_Do_Table: 
  Replicate_Wild_Ignore_Table: 
                   Last_Errno: 0
                   Last_Error: 
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 269158126
              Relay_Log_Space: 407708725
              Until_Condition: None
               Until_Log_File: 
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File: 
           Master_SSL_CA_Path: 
              Master_SSL_Cert: 
            Master_SSL_Cipher: 
               Master_SSL_Key: 
        Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error: 
               Last_SQL_Errno: 0
               Last_SQL_Error: 
  Replicate_Ignore_Server_Ids: 
             Master_Server_Id: 1293306
                  Master_UUID: 37ca06ac-5c51-11e9-9d3b-000c296776be
             Master_Info_File: mysql.slave_master_info
                    SQL_Delay: 0
          SQL_Remaining_Delay: NULL
      Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates
           Master_Retry_Count: 86400
                  Master_Bind: 
      Last_IO_Error_Timestamp: 
     Last_SQL_Error_Timestamp: 
               Master_SSL_Crl: 
           Master_SSL_Crlpath: 
           Retrieved_Gtid_Set: 37ca06ac-5c51-11e9-9d3b-000c296776be:5771
            Executed_Gtid_Set: 173d287c-a642-11ea-ae79-000c295d8ce8:1-39256,
37ca06ac-5c51-11e9-9d3b-000c296776be:1-5771,
425c9d7e-3ff6-11e9-baed-e86a6406e22a:1-1170
                Auto_Position: 1
         Replicate_Rewrite_DB: 
                 Channel_Name: 
           Master_TLS_Version: 
1 row in set (0.00 sec)

五、善后

检查主备库数据,并添加表主键,删除备库之前创建的索引。

## 主库执行
(sit)root@localhost [(none)]> SELECT count(*) FROM test1.`t_core_task_log` ;
+----------+
| count(*) |
+----------+
|    88856 |
+----------+

## 备库执行
(sit)root@localhost [(none)]> SELECT count(*) FROM test1.`t_core_task_log` ;
+----------+
| count(*) |
+----------+
|    88856 |
+----------+

## 主备数据一致,且复制延迟已解决。
## 主库新增主键,主库执行
alter table t_core_task_log add id int auto_increment PRIMARY key ;

## 备库删除刚刚添加的索引,备库执行
(sit)root@localhost [(none)]> set sql_log_bin =0;
(sit)root@localhost [(none)]> alter table test1.t_core_task_log drop index idx_task_id_log_time;
六、教训

老生常谈的MySQL表必须包含主键,万万不可少,不仅复制中使用,很多第三方工具也是需要的。前期的上线SQL审核要做到位!切记切记!!

相关文章

  • 一次主从延迟排查

    MySQL: 5.7.25-log隔壁部门的同学发现主从发生了延迟,让我协助处理下。以下测试是本地环境模拟。 目录...

  • 主从延迟问题及解决方案

    主从延迟情况 主从延迟监控

  • MySQL复制中常见的故障及处理

    1.复制中断怎么办?2.大数据量变更造成的复制延迟如何定位及解决3.主从复制延迟排查及开发中处理技巧有哪些? 1....

  • 32主从复制

    职责搭建主从复制,主从原理熟悉,主从的故障处理,主从延迟,主从的特殊架构的配置使用,主从架构的演变 主从复制介绍主...

  • MySQL 实现准实时的表级别DML计数

    一、先上效果图,再做阐述 主从延迟,快速定位哪个表引起的 二、背景 当生产业务出现了主从延迟,DBA无法立刻知道是...

  • Mysql主从延迟

    如何判断主从延迟的办法? 通过show slave status;进行查看,比如看看Seconds_Behin...

  • redis延迟排查

    http://mp.weixin.qq.com/s?__biz=MzAwODg3MDk0OQ==&mid=2247...

  • MySQL 主从延迟问题思路

    背景: 一个历史比较悠久的服务出现了主从延迟。即写库写入n秒后,从库才能读到正确的数据。 主从延迟产生的原因: 当...

  • mysql主从邮件监控

    概述 包括mysql宕机报警,mysql主从io,sql状态异常报警以及主从延迟过长报警 通过外部STMP发邮件 ...

  • 数据库一主多从带来的线上问题

    主从延迟有可能几分钟。 正常的情况下,主从延迟肯定是1s内的,但当dba在主库进行DDL操作,比如改表结构等等,如...

网友评论

      本文标题:一次主从延迟排查

      本文链接:https://www.haomeiwen.com/subject/wsvhjktx.html