美文网首页
一次主从延迟排查

一次主从延迟排查

作者: 左轮Lee | 来源:发表于2020-08-19 09:03 被阅读0次

    MySQL: 5.7.25-log
    隔壁部门的同学发现主从发生了延迟,让我协助处理下。
    以下测试是本地环境模拟。

    目录
    一、问题发现
    二、初步排查
    三、解析binlog,查看具体SQL
    四、解决方案
    五、善后
    六、教训

    一、问题发现

    钉钉收到复制延迟报警,于是连上备库查看复制状态。
    反复执行 show slave status;
    Relay_Master_Log_FileExec_Master_Log_Pos两个参数保持不变。
    Seconds_Behind_Master一直在增加。

    (sit)root@localhost [(none)]> show slave status \G
    *************************** 1. row ***************************
                   Slave_IO_State: Waiting for master to send event
                      Master_Host: 192.168.211.129
                      Master_User: repl
                      Master_Port: 3306
                    Connect_Retry: 60
                  Master_Log_File: mysql-bin.000006
              Read_Master_Log_Pos: 269158126
                   Relay_Log_File: mysql-relay-bin.000007
                    Relay_Log_Pos: 105085733
            Relay_Master_Log_File: mysql-bin.000006
                 Slave_IO_Running: Yes
                Slave_SQL_Running: Yes
                  Replicate_Do_DB: 
              Replicate_Ignore_DB: 
               Replicate_Do_Table: 
           Replicate_Ignore_Table: 
          Replicate_Wild_Do_Table: 
      Replicate_Wild_Ignore_Table: 
                       Last_Errno: 0
                       Last_Error: 
                     Skip_Counter: 0
              Exec_Master_Log_Pos: 222934194
                  Relay_Log_Space: 361484149
                  Until_Condition: None
                   Until_Log_File: 
                    Until_Log_Pos: 0
               Master_SSL_Allowed: No
               Master_SSL_CA_File: 
               Master_SSL_CA_Path: 
                  Master_SSL_Cert: 
                Master_SSL_Cipher: 
                   Master_SSL_Key: 
            Seconds_Behind_Master: 168
    Master_SSL_Verify_Server_Cert: No
                    Last_IO_Errno: 0
                    Last_IO_Error: 
                   Last_SQL_Errno: 0
                   Last_SQL_Error: 
      Replicate_Ignore_Server_Ids: 
                 Master_Server_Id: 1293306
                      Master_UUID: 37ca06ac-5c51-11e9-9d3b-000c296776be
                 Master_Info_File: mysql.slave_master_info
                        SQL_Delay: 0
              SQL_Remaining_Delay: NULL
          Slave_SQL_Running_State: Waiting for Slave Workers to free pending events
               Master_Retry_Count: 86400
                      Master_Bind: 
          Last_IO_Error_Timestamp: 
         Last_SQL_Error_Timestamp: 
                   Master_SSL_Crl: 
               Master_SSL_Crlpath: 
               Retrieved_Gtid_Set: 37ca06ac-5c51-11e9-9d3b-000c296776be:5768-5771
                Executed_Gtid_Set: 173d287c-a642-11ea-ae79-000c295d8ce8:1-39256,
    37ca06ac-5c51-11e9-9d3b-000c296776be:1-5770,
    425c9d7e-3ff6-11e9-baed-e86a6406e22a:1-1170
                    Auto_Position: 1
             Replicate_Rewrite_DB: 
                     Channel_Name: 
               Master_TLS_Version: 
    1 row in set (0.00 sec)
    
    二、初步排查

    拿到上面的Relay_Master_Log_FileExec_Master_Log_Pos查看主库在此pos之后的执行语句。
    主库执行:

    (sit)root@localhost [(none)]> show binlog events in 'mysql-bin.000006' from 222934194 ;
    +------------------+-----------+-------------+-----------+-------------+----------------------------------------------------------------------+
    | Log_name         | Pos       | Event_type  | Server_id | End_log_pos | Info                                                                 |
    +------------------+-----------+-------------+-----------+-------------+----------------------------------------------------------------------+
    | mysql-bin.000006 | 222934194 | Gtid        |   1293306 |   222934259 | SET @@SESSION.GTID_NEXT= '37ca06ac-5c51-11e9-9d3b-000c296776be:5771' |
    | mysql-bin.000006 | 222934259 | Query       |   1293306 |   222934332 | BEGIN                                                                |
    | mysql-bin.000006 | 222934332 | Table_map   |   1293306 |   222934411 | table_id: 325 (test1.t_core_task_log)                                |
    | mysql-bin.000006 | 222934411 | Delete_rows |   1293306 |   222942517 | table_id: 325                                                        |
    | mysql-bin.000006 | 222942517 | Delete_rows |   1293306 |   222950603 | table_id: 325                                                        |
    | mysql-bin.000006 | 222950603 | Delete_rows |   1293306 |   222958736 | table_id: 325                                                        |
    | mysql-bin.000006 | 222958736 | Delete_rows |   1293306 |   222966918 | table_id: 325                                                        |
    | mysql-bin.000006 | 222966918 | Delete_rows |   1293306 |   222975014 | table_id: 325                                                        |
    | mysql-bin.000006 | 222975014 | Delete_rows |   1293306 |   222983142 | table_id: 325                                                        |
    | mysql-bin.000006 | 222983142 | Delete_rows |   1293306 |   222991343 | table_id: 325                                                        |
    ...
    ## 省略
    ...
    

    发现此pos后对表test1.t_core_task_log做了大量的delete操作(上面只截取部分查询结果)。
    这种基本上就是主库执行了一个大的删除事务,几十上百万,导致备库回放时延迟。但是正常情况下不至于延迟几个小时。
    于是检查表结构:

    (sit)root@localhost [(none)]> show create table test1.t_core_task_log \G
    *************************** 1. row ***************************
           Table: t_core_task_log
    Create Table: CREATE TABLE `t_core_task_log` (
      `task_id` bigint(20) DEFAULT NULL,
      `exe_id` bigint(20) DEFAULT NULL,
      `task_type` int(11) DEFAULT NULL,
      `state` mediumint(6) DEFAULT NULL,
      `cur_run_date` varchar(64) DEFAULT NULL,
      `next_run_date` varchar(64) DEFAULT NULL,
      `log_desc` longtext,
      `runtime_broker` varchar(64) DEFAULT NULL,
      `log_time` varchar(64) DEFAULT NULL,
      `log_time_ms` int(11) DEFAULT NULL,
      `tries` int(11) DEFAULT NULL
    ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4
    1 row in set (0.00 sec)
    
    

    果然,寡表一张,连个索引都没有,别说主键了。
    为了更深一步确认执行的SQL语句,下面解析下binlog。

    三、解析binlog,查看具体SQL
    # at 222934163
    #200818 14:20:58 server id 1293306  end_log_pos 222934194 CRC32 0xa8e0f554      Xid = 5635
    COMMIT/*!*/;
    # at 222934194
    #200818 14:58:17 server id 1293306  end_log_pos 222934259 CRC32 0x9792c084      GTID    last_committed=515      sequence_number=516     rbr_only=yes
    /*!50718 SET TRANSACTION ISOLATION LEVEL READ COMMITTED*//*!*/;
    SET @@SESSION.GTID_NEXT= '37ca06ac-5c51-11e9-9d3b-000c296776be:5771'/*!*/;
    # at 222934259
    #200818 14:58:17 server id 1293306  end_log_pos 222934332 CRC32 0xba4022f8      Query   thread_id=31    exec_time=0     error_code=0
    SET TIMESTAMP=1597733897/*!*/;
    BEGIN
    /*!*/;
    # at 222934332
    #200818 14:58:17 server id 1293306  end_log_pos 222934411 CRC32 0x54e65c06      Table_map: `test1`.`t_core_task_log` mapped to number 325
    # at 222934411
    #200818 14:58:17 server id 1293306  end_log_pos 222942517 CRC32 0x5661e9f7      Delete_rows: table id 325
    # at 222942517
    #200818 14:58:17 server id 1293306  end_log_pos 222950603 CRC32 0xa6817b87      Delete_rows: table id 325
    # at 222950603
    #200818 14:58:17 server id 1293306  end_log_pos 222958736 CRC32 0x12ae1bcb      Delete_rows: table id 325
    ...
    ## 省略很多
    ...
    # at 269137195
    #200818 14:58:17 server id 1293306  end_log_pos 269145386 CRC32 0xbdc1c971      Delete_rows: table id 325
    # at 269145386
    #200818 14:58:17 server id 1293306  end_log_pos 269153543 CRC32 0xd065b822      Delete_rows: table id 325
    # at 269153543
    #200818 14:58:17 server id 1293306  end_log_pos 269158095 CRC32 0x95f1eb27      Delete_rows: table id 325 flags: STMT_END_F
    ### DELETE FROM `test1`.`t_core_task_log`
    ### WHERE
    ###   @1=10122 /* LONGINT meta=0 nullable=1 is_null=0 */
    ###   @2=810 /* LONGINT meta=0 nullable=1 is_null=0 */
    ###   @3=5 /* INT meta=0 nullable=1 is_null=0 */
    ###   @4=1 /* MEDIUMINT meta=0 nullable=1 is_null=0 */
    ###   @5='2020-06-25 00:00:00' /* VARSTRING(256) meta=256 nullable=1 is_null=0 */
    ###   @6='2020-06-26 00:00:00' /* VARSTRING(256) meta=256 nullable=1 is_null=0 */
    ###   @7='50075log' /* LONGBLOB/LONGTEXT meta=4 nullable=1 is_null=0 */
    ###   @8='172.31.51.0' /* VARSTRING(256) meta=256 nullable=1 is_null=0 */
    ###   @9='2020-07-23 16:30:02' /* VARSTRING(256) meta=256 nullable=1 is_null=0 */
    ###   @10=28 /* INT meta=0 nullable=1 is_null=0 */
    ###   @11=1 /* INT meta=0 nullable=1 is_null=0 */
    ### DELETE FROM `test1`.`t_core_task_log`
    ### WHERE
    ###   @1=10122 /* LONGINT meta=0 nullable=1 is_null=0 */
    ###   @2=810 /* LONGINT meta=0 nullable=1 is_null=0 */
    ###   @3=5 /* INT meta=0 nullable=1 is_null=0 */
    ###   @4=1 /* MEDIUMINT meta=0 nullable=1 is_null=0 */
    ###   @5='2020-06-25 00:00:00' /* VARSTRING(256) meta=256 nullable=1 is_null=0 */
    ###   @6='2020-06-26 00:00:00' /* VARSTRING(256) meta=256 nullable=1 is_null=0 */
    ###   @7='50072log' /* LONGBLOB/LONGTEXT meta=4 nullable=1 is_null=0 */
    ###   @8='172.31.51.0' /* VARSTRING(256) meta=256 nullable=1 is_null=0 */
    ###   @9='2020-07-23 16:30:02' /* VARSTRING(256) meta=256 nullable=1 is_null=0 */
    ###   @10=78 /* INT meta=0 nullable=1 is_null=0 */
    ###   @11=1 /* INT meta=0 nullable=1 is_null=0 */
    ...
    ## 省略很多
    ...
    

    可知备库回放的SQL为:

    DELETE FROM `test1`.`t_core_task_log`
    WHERE
      @1=10122 /* LONGINT meta=0 nullable=1 is_null=0 */
      @2=810 /* LONGINT meta=0 nullable=1 is_null=0 */
      @3=5 /* INT meta=0 nullable=1 is_null=0 */
      @4=1 /* MEDIUMINT meta=0 nullable=1 is_null=0 */
      @5='2020-06-25 00:00:00' /* VARSTRING(256) meta=256 nullable=1 is_null=0 */
      @6='2020-06-26 00:00:00' /* VARSTRING(256) meta=256 nullable=1 is_null=0 */
      @7='50075log' /* LONGBLOB/LONGTEXT meta=4 nullable=1 is_null=0 */
      @8='172.31.51.0' /* VARSTRING(256) meta=256 nullable=1 is_null=0 */
      @9='2020-07-23 16:30:02' /* VARSTRING(256) meta=256 nullable=1 is_null=0 */
      @10=28 /* INT meta=0 nullable=1 is_null=0 */
      @11=1 /* INT meta=0 nullable=1 is_null=0 */
    

    因为缺少索引,导致备库每次执行均需要全表扫描一次test1.t_core_task_log,我们可以单独执行下delete对应的select语句,估算下备库回放需要的时间。

    #### 备库执行
    (sit)root@localhost [(none)]> select * from `test1`.`t_core_task_log` where task_id = 10122  and  log_time = '2020-07-23 16:30:02'  and log_time_ms=110 ;
    +---------+--------+-----------+-------+---------------------+---------------------+---------------------------+----------------+---------------------+-------------+-------+
    | task_id | exe_id | task_type | state | cur_run_date        | next_run_date       | log_desc                  | runtime_broker | log_time            | log_time_ms | tries |
    +---------+--------+-----------+-------+---------------------+---------------------+---------------------------+----------------+---------------------+-------------+-------+
    |   10122 |    810 |         5 |     1 | 2020-06-25 00:00:00 | 2020-06-26 00:00:00 | 50070log||30767||50071log | 172.31.51.0    | 2020-07-23 16:30:02 |         110 |     1 |
    |   10122 |    810 |         5 |     1 | 2020-06-25 00:00:00 | 2020-06-26 00:00:00 | 50070log||30767||50071log | 172.31.51.0    | 2020-07-23 16:30:02 |         110 |     1 |
    |   10122 |    810 |         5 |     1 | 2020-06-25 00:00:00 | 2020-06-26 00:00:00 | 50070log||30767||50071log | 172.31.51.0    | 2020-07-23 16:30:02 |         110 |     1 |
    |   10122 |    810 |         5 |     1 | 2020-06-25 00:00:00 | 2020-06-26 00:00:00 | 50070log||30767||50071log | 172.31.51.0    | 2020-07-23 16:30:02 |         110 |     1 |
    |   10122 |    810 |         5 |     1 | 2020-06-25 00:00:00 | 2020-06-26 00:00:00 | 50070log||30767||50071log | 172.31.51.0    | 2020-07-23 16:30:02 |         110 |     1 |
    |   10122 |    810 |         5 |     1 | 2020-06-25 00:00:00 | 2020-06-26 00:00:00 | 50070log||30767||50071log | 172.31.51.0    | 2020-07-23 16:30:02 |         110 |     1 |
    |   10122 |    810 |         5 |     1 | 2020-06-25 00:00:00 | 2020-06-26 00:00:00 | 50070log||30767||50071log | 172.31.51.0    | 2020-07-23 16:30:02 |         110 |     1 |
    |   10122 |    810 |         5 |     1 | 2020-06-25 00:00:00 | 2020-06-26 00:00:00 | 50070log||30767||50071log | 172.31.51.0    | 2020-07-23 16:30:02 |         110 |     1 |
    +---------+--------+-----------+-------+---------------------+---------------------+---------------------------+----------------+---------------------+-------------+-------+
    8 rows in set (0.29 sec)
    

    单次全表扫描一次需约0.29s
    因为是删除操作,查看备库和主库当前的数据量,估算总耗时。

    ## 主库执行
    (sit)root@localhost [(none)]> SELECT count(*) FROM test1.`t_core_task_log` ;
    +----------+
    | count(*) |
    +----------+
    |    88856 |
    +----------+
    1 row in set (0.16 sec)
    
    ## 备库执行
    (sit)root@localhost [(none)]> SELECT count(*) FROM test1.`t_core_task_log` ;
    +----------+
    | count(*) |
    +----------+
    |   432312 |
    +----------+
    1 row in set (0.13 sec)
    
    ## 计算耗时,约27小时
    (sit)root@localhost [(none)]> select 0.29*(432312-88856)/3600;
    +--------------------------+
    | 0.29*(432312-88856)/3600 |
    +--------------------------+
    |                27.667289 |
    +--------------------------+
    1 row in set (0.00 sec)
    

    需要27小时才能执行完成,GG

    四、解决方案

    1.如果上面估算时间还可以接受,且备库不是着急使用的话,建议等待;
    2.强制stop slave(正常stop slave会卡住),关闭MySQL实例,修改my.cnf(禁止自动启动复制),启动MySQL实例,备库创建索引,打开复制start slave。
    步骤如下:

    ## 关闭IO线程(备库执行)
    (sit)root@localhost [(none)]> stop slave io_thread;
    
    ## 关闭SQL线程(备库执行)
    (sit)root@localhost [(none)]> SELECT * from information_schema.`PROCESSLIST`  where user = 'system user';
    +----+-------------+------+------+---------+------+--------------------------------------------------+------+
    | ID | USER        | HOST | DB   | COMMAND | TIME | STATE                                            | INFO |
    +----+-------------+------+------+---------+------+--------------------------------------------------+------+
    | 10 | system user |      | NULL | Connect | 9656 | Waiting for an event from Coordinator            | NULL |
    |  9 | system user |      | NULL | Connect | 9656 | Waiting for an event from Coordinator            | NULL |
    | 11 | system user |      | NULL | Connect | 9656 | Waiting for an event from Coordinator            | NULL |
    |  7 | system user |      | NULL | Connect | 9656 | Waiting for an event from Coordinator            | NULL |
    | 13 | system user |      | NULL | Connect | 9656 | Waiting for an event from Coordinator            | NULL |
    |  8 | system user |      | NULL | Connect | 9656 | Waiting for an event from Coordinator            | NULL |
    | 12 | system user |      | NULL | Connect | 9656 | Waiting for an event from Coordinator            | NULL |
    |  5 | system user |      | NULL | Connect |    0 | Waiting for Slave Workers to free pending events | NULL |
    |  6 | system user |      | NULL | Connect | 1453 | Executing event                                  | NULL |
    +----+-------------+------+------+---------+------+--------------------------------------------------+------+
    
    ## 线程6是正在执行回放的,强制kill(备库执行)
    (sit)root@localhost [(none)]> kill 6 ;
    
    ## 检查线程,已经全部关闭(备库执行)
    (sit)root@localhost [(none)]> SELECT * from information_schema.`PROCESSLIST`  where user = 'system user';
    Empty set (0.00 sec)
    
    ## 检查复制状态,有报错,重启后会自动恢复(备库执行)
    (sit)root@localhost [(none)]> show slave status \G
    *************************** 1. row ***************************
                   Slave_IO_State: 
                      Master_Host: 192.168.211.129
                      Master_User: repl
                      Master_Port: 3306
                    Connect_Retry: 60
                  Master_Log_File: mysql-bin.000006
              Read_Master_Log_Pos: 269158126
                   Relay_Log_File: mysql-relay-bin.000007
                    Relay_Log_Pos: 105085733
            Relay_Master_Log_File: mysql-bin.000006
                 Slave_IO_Running: No
                Slave_SQL_Running: No
                  Replicate_Do_DB: 
              Replicate_Ignore_DB: 
               Replicate_Do_Table: 
           Replicate_Ignore_Table: 
          Replicate_Wild_Do_Table: 
      Replicate_Wild_Ignore_Table: 
                       Last_Errno: 1317
                       Last_Error: Coordinator stopped because there were error(s) in the worker(s). The most recent failure being: Worker 1 failed executing transaction '37ca06ac-5c51-11e9-9d3b-000c296776be:5771' 
    at master log mysql-bin.000006, end_log_pos 235240872. See error log and/or performance_schema.replication_applier_status_by_worker table for more details about this failure or others, if any.
                     Skip_Counter: 0
              Exec_Master_Log_Pos: 222934194
                  Relay_Log_Space: 361484149
                  Until_Condition: None
                   Until_Log_File: 
                    Until_Log_Pos: 0
               Master_SSL_Allowed: No
               Master_SSL_CA_File: 
               Master_SSL_CA_Path: 
                  Master_SSL_Cert: 
                Master_SSL_Cipher: 
                   Master_SSL_Key: 
            Seconds_Behind_Master: NULL
    Master_SSL_Verify_Server_Cert: No
                    Last_IO_Errno: 0
                    Last_IO_Error: 
                   Last_SQL_Errno: 1317
                   Last_SQL_Error: Coordinator stopped because there were error(s) in the worker(s). The most recent failure being: Worker 1 failed executing transaction '37ca06ac-5c51-11e9-9d3b-000c296776be:5771' 
    at master log mysql-bin.000006, end_log_pos 235240872. See error log and/or performance_schema.replication_applier_status_by_worker table for more details about this failure or others, if any.
      Replicate_Ignore_Server_Ids: 
                 Master_Server_Id: 1293306
                      Master_UUID: 37ca06ac-5c51-11e9-9d3b-000c296776be
                 Master_Info_File: mysql.slave_master_info
                        SQL_Delay: 0
              SQL_Remaining_Delay: NULL
          Slave_SQL_Running_State: 
               Master_Retry_Count: 86400
                      Master_Bind: 
          Last_IO_Error_Timestamp: 
         Last_SQL_Error_Timestamp: 200818 15:22:37
                   Master_SSL_Crl: 
               Master_SSL_Crlpath: 
               Retrieved_Gtid_Set: 37ca06ac-5c51-11e9-9d3b-000c296776be:5768-5771
                Executed_Gtid_Set: 173d287c-a642-11ea-ae79-000c295d8ce8:1-39256,
    37ca06ac-5c51-11e9-9d3b-000c296776be:1-5770,
    425c9d7e-3ff6-11e9-baed-e86a6406e22a:1-1170
                    Auto_Position: 1
             Replicate_Rewrite_DB: 
                     Channel_Name: 
               Master_TLS_Version: 
    1 row in set (0.00 sec)
    

    编辑参数,重启备库:

    ## 编辑参数
    vi /etc/my.cnf 增加下面的参数,让备库启动时不启动复制。
    [mysqld]
    skip-slave-start=1
    
    ## 启动备库
    service mysqld start
    

    启动后检查复制状态(确保复制没有启动)

    (sit)root@localhost [(none)]> show slave status \G
    *************************** 1. row ***************************
                   Slave_IO_State: 
                      Master_Host: 192.168.211.129
                      Master_User: repl
                      Master_Port: 3306
                    Connect_Retry: 60
                  Master_Log_File: mysql-bin.000006
              Read_Master_Log_Pos: 222934194
                   Relay_Log_File: mysql-relay-bin.000008
                    Relay_Log_Pos: 4
            Relay_Master_Log_File: mysql-bin.000006
                 Slave_IO_Running: No
                Slave_SQL_Running: No
                  Replicate_Do_DB: 
              Replicate_Ignore_DB: 
               Replicate_Do_Table: 
           Replicate_Ignore_Table: 
          Replicate_Wild_Do_Table: 
      Replicate_Wild_Ignore_Table: 
                       Last_Errno: 0
                       Last_Error: 
                     Skip_Counter: 0
              Exec_Master_Log_Pos: 222934194
                  Relay_Log_Space: 361484326
                  Until_Condition: None
                   Until_Log_File: 
                    Until_Log_Pos: 0
               Master_SSL_Allowed: No
               Master_SSL_CA_File: 
               Master_SSL_CA_Path: 
                  Master_SSL_Cert: 
                Master_SSL_Cipher: 
                   Master_SSL_Key: 
            Seconds_Behind_Master: NULL
    Master_SSL_Verify_Server_Cert: No
                    Last_IO_Errno: 0
                    Last_IO_Error: 
                   Last_SQL_Errno: 0
                   Last_SQL_Error: 
      Replicate_Ignore_Server_Ids: 
                 Master_Server_Id: 0
                      Master_UUID: 37ca06ac-5c51-11e9-9d3b-000c296776be
                 Master_Info_File: mysql.slave_master_info
                        SQL_Delay: 0
              SQL_Remaining_Delay: NULL
          Slave_SQL_Running_State: 
               Master_Retry_Count: 86400
                      Master_Bind: 
          Last_IO_Error_Timestamp: 
         Last_SQL_Error_Timestamp: 
                   Master_SSL_Crl: 
               Master_SSL_Crlpath: 
               Retrieved_Gtid_Set: 
                Executed_Gtid_Set: 173d287c-a642-11ea-ae79-000c295d8ce8:1-39256,
    37ca06ac-5c51-11e9-9d3b-000c296776be:1-5770,
    425c9d7e-3ff6-11e9-baed-e86a6406e22a:1-1170
                    Auto_Position: 1
             Replicate_Rewrite_DB: 
                     Channel_Name: 
               Master_TLS_Version: 
    1 row in set (0.00 sec)
    
    

    备库创建索引,记得把 log_bin 关闭,再执行索引创建。否则后期主从切换可能会导致因索引名冲突而复制报错。

    (sit)root@localhost [(none)]> set sql_log_bin =0;
    Query OK, 0 rows affected (0.00 sec)
    
    (sit)root@localhost [(none)]> create index idx_task_id_log_time on test1.t_core_task_log(task_id,log_time);
    Query OK, 0 rows affected (2.04 sec)
    Records: 0  Duplicates: 0  Warnings: 0
    

    打开复制,观察复制状态,约20秒后备库追上。

    (sit)root@localhost [(none)]> start slave;
    (sit)root@localhost [(none)]> show slave status \G
    *************************** 1. row ***************************
                   Slave_IO_State: Waiting for master to send event
                      Master_Host: 192.168.211.129
                      Master_User: repl
                      Master_Port: 3306
                    Connect_Retry: 60
                  Master_Log_File: mysql-bin.000006
              Read_Master_Log_Pos: 269158126
                   Relay_Log_File: mysql-relay-bin.000009
                    Relay_Log_Pos: 46224346
            Relay_Master_Log_File: mysql-bin.000006
                 Slave_IO_Running: Yes
                Slave_SQL_Running: Yes
                  Replicate_Do_DB: 
              Replicate_Ignore_DB: 
               Replicate_Do_Table: 
           Replicate_Ignore_Table: 
          Replicate_Wild_Do_Table: 
      Replicate_Wild_Ignore_Table: 
                       Last_Errno: 0
                       Last_Error: 
                     Skip_Counter: 0
              Exec_Master_Log_Pos: 269158126
                  Relay_Log_Space: 407708725
                  Until_Condition: None
                   Until_Log_File: 
                    Until_Log_Pos: 0
               Master_SSL_Allowed: No
               Master_SSL_CA_File: 
               Master_SSL_CA_Path: 
                  Master_SSL_Cert: 
                Master_SSL_Cipher: 
                   Master_SSL_Key: 
            Seconds_Behind_Master: 0
    Master_SSL_Verify_Server_Cert: No
                    Last_IO_Errno: 0
                    Last_IO_Error: 
                   Last_SQL_Errno: 0
                   Last_SQL_Error: 
      Replicate_Ignore_Server_Ids: 
                 Master_Server_Id: 1293306
                      Master_UUID: 37ca06ac-5c51-11e9-9d3b-000c296776be
                 Master_Info_File: mysql.slave_master_info
                        SQL_Delay: 0
              SQL_Remaining_Delay: NULL
          Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates
               Master_Retry_Count: 86400
                      Master_Bind: 
          Last_IO_Error_Timestamp: 
         Last_SQL_Error_Timestamp: 
                   Master_SSL_Crl: 
               Master_SSL_Crlpath: 
               Retrieved_Gtid_Set: 37ca06ac-5c51-11e9-9d3b-000c296776be:5771
                Executed_Gtid_Set: 173d287c-a642-11ea-ae79-000c295d8ce8:1-39256,
    37ca06ac-5c51-11e9-9d3b-000c296776be:1-5771,
    425c9d7e-3ff6-11e9-baed-e86a6406e22a:1-1170
                    Auto_Position: 1
             Replicate_Rewrite_DB: 
                     Channel_Name: 
               Master_TLS_Version: 
    1 row in set (0.00 sec)
    
    
    五、善后

    检查主备库数据,并添加表主键,删除备库之前创建的索引。

    ## 主库执行
    (sit)root@localhost [(none)]> SELECT count(*) FROM test1.`t_core_task_log` ;
    +----------+
    | count(*) |
    +----------+
    |    88856 |
    +----------+
    
    ## 备库执行
    (sit)root@localhost [(none)]> SELECT count(*) FROM test1.`t_core_task_log` ;
    +----------+
    | count(*) |
    +----------+
    |    88856 |
    +----------+
    
    ## 主备数据一致,且复制延迟已解决。
    ## 主库新增主键,主库执行
    alter table t_core_task_log add id int auto_increment PRIMARY key ;
    
    ## 备库删除刚刚添加的索引,备库执行
    (sit)root@localhost [(none)]> set sql_log_bin =0;
    (sit)root@localhost [(none)]> alter table test1.t_core_task_log drop index idx_task_id_log_time;
    
    六、教训

    老生常谈的MySQL表必须包含主键,万万不可少,不仅复制中使用,很多第三方工具也是需要的。前期的上线SQL审核要做到位!切记切记!!

    相关文章

      网友评论

          本文标题:一次主从延迟排查

          本文链接:https://www.haomeiwen.com/subject/wsvhjktx.html