美文网首页
主从复制(上)

主从复制(上)

作者: 肥四_F4 | 来源:发表于2020-07-21 23:20 被阅读0次
    1. 简介
      1.1 什么是复制?
      复制:是将一台MySQL实例(master),发生的DML、DDL等修改操作记录到binlog中,源源不断传输副本库,副本库应用日志,达到一个和主库数据接近一致的状态。
      1.2 应用场景
      a. 备份。
      b. 高可用。
      c. 读写分离
      d. 分布式架构
    2. 主从复制前提(搭建过程)
      2.1 2台以上数据库实例,server_id ,server_uuid
    #启动多实例
    [root@db01 oldguo]# systemctl start mysqld3307
    [root@db01 oldguo]# systemctl start mysqld3308
    [root@db01 oldguo]# systemctl start mysqld3309
    [root@db01 oldguo]# netstat -tulnp
    [root@db01 oldguo]# mysql -S /tmp/mysql3307.sock -e "select @@server_id;select @@server_uuid"
    +-------------+
    | @@server_id |
    +-------------+
    |           7 |
    +-------------+
    +--------------------------------------+
    | @@server_uuid                        |
    +--------------------------------------+
    | d639b892-ba7b-11ea-9d00-000c295bb94f |
    +--------------------------------------+
    [root@db01 oldguo]# mysql -S /tmp/mysql3308.sock -e "select @@server_id;select @@server_uuid"
    +-------------+
    | @@server_id |
    +-------------+
    |           8 |
    +-------------+
    +--------------------------------------+
    | @@server_uuid                        |
    +--------------------------------------+
    | d8e965c5-ba7b-11ea-9d1e-000c295bb94f |
    +--------------------------------------+
    [root@db01 oldguo]# mysql -S /tmp/mysql3309.sock -e "select @@server_id;select @@server_uuid"
    +-------------+
    | @@server_id |
    +-------------+
    |           9 |
    +-------------+
    +--------------------------------------+
    | @@server_uuid                        |
    +--------------------------------------+
    | dc20d2d8-ba7b-11ea-9f57-000c295bb94f |
    +--------------------------------------+
    [root@db01 oldguo]# 
    

    2.2 主库开启binlog

    [root@db01 oldguo]# mysql -S /tmp/mysql3307.sock -e "select @@log_bin;"
    +-----------+
    | @@log_bin |
    +-----------+
    |         1 |
    +-----------+
    

    2.3 主库开启专门的复制用户

    [root@db01 oldguo]# mysql -S /tmp/mysql3307.sock -e "grant replication slave on *.* to repl@'10.0.0.%' identified by '123'"
    

    2.4 “补课”: 备份主库恢复至从库(主库7从库89)

    #全备主库数据
    root@db01 oldguo]# mysqldump -S /tmp/mysql3307.sock -A --master-data=2 >/tmp/full.sql
    #在从库上恢复
    [root@db01 oldguo]# mysql -S /tmp/mysql3308.sock -e "source /tmp/full.sql"
    [root@db01 oldguo]# mysql -S /tmp/mysql3309.sock -e "source /tmp/full.sql"
    

    2.5 开启主从
    a.

    mysql> help change master to #获取配置信息
    
    CHANGE MASTER TO
      MASTER_HOST='master2.example.com',
      MASTER_USER='replication',
      MASTER_PASSWORD='password',
      MASTER_PORT=3306,
      MASTER_LOG_FILE='master2-bin.001',
      MASTER_LOG_POS=4,
      MASTER_CONNECT_RETRY=10;
    
    [root@db01 data]# grep '\-- \CHANGE' /tmp/full.sql 
    -- CHANGE MASTER TO MASTER_LOG_FILE='mysql-bin.000002', MASTER_LOG_POS=21103277;
    #分别在从库中执行
    [root@db01 /data 15:45:36]# mysql -S /tmp/mysql3308.sock 
    CHANGE MASTER TO
      MASTER_HOST='10.0.0.51',
      MASTER_USER='repl',
      MASTER_PASSWORD='123',
      MASTER_PORT=3307,
      MASTER_LOG_FILE='mysql-bin.000002',
      MASTER_LOG_POS=21103277,
      MASTER_CONNECT_RETRY=10;
    

    b. 开启复制线程

    mysql> start slave;
    

    c. 检测状态

    [root@db01 data]# mysql -S /tmp/mysql3308.sock -e "show slave status\G"|grep "Running:"
                 Slave_IO_Running: Yes
                Slave_SQL_Running: Yes
    [root@db01 data]# 
    
    [root@db01 data]# mysql -S /tmp/mysql3309.sock -e "show slave status\G"|grep "Running:"
                 Slave_IO_Running: Yes
                 Slave_SQL_Running: Yes
    

    3. 传统主从复制工作原理

    3.1 涉及到的文件

    主库:

    binlog文件: mysql-bin.000001

    mysql> select @@log_bin_basename;
    +---------------------------+
    | @@log_bin_basename        |
    +---------------------------+
    | /data/3307/logs/mysql-bin |
    +---------------------------+
    

    从库:

    a. relaylog文件: 存储接收到的binlog日志
    存储位置: db01-relay-bin.000001

    mysql> select @@relay_log_basename;
    +--------------------------------+
    | @@relay_log_basename           |
    +--------------------------------+
    | /data/3307/data/db01-relay-bin |
    +--------------------------------+
    

    b. master_info 文件:用来存储主库相关的信息:
    存储位置:/data/3308/data/master.info

    mysql> select @@master_info_repository;
    +--------------------------+
    | @@master_info_repository |
    +--------------------------+
    | FILE                     |
    +--------------------------+
    server_id,server_uuid,user,password,host,port,binlog位置点。
    

    b. relay_info 文件: 记录relaylog回放到的位置点。
    存储位置:/data/3308/data/relay-log.info

    mysql> select @@relay_log_info_repository;
    +-----------------------------+
    | @@relay_log_info_repository |
    +-----------------------------+
    | FILE                        |
    +-----------------------------+
    1 row in set (0.00 sec)
    
    

    3.2 涉及到的线程

    主库:

    binlog_dump/binlog_dump_gtid
    作用: 监控binlog状态,投递binlog给从库。

    mysql> show processlist;
    +----+------+------------+------+-------------+------+---------------------------------------------------------------+------------------+
    | Id | User | Host       | db   | Command     | Time | State                                                         | Info             |
    +----+------+------------+------+-------------+------+---------------------------------------------------------------+------------------+
    |  6 | root | localhost  | NULL | Sleep       | 9041 |                                                               | NULL             |
    |  7 | repl | db01:34702 | NULL | Binlog Dump | 8856 | Master has sent all binlog to slave; waiting for more updates | NULL             |
    |  8 | repl | db01:34704 | NULL | Binlog Dump | 8844 | Master has sent all binlog to slave; waiting for more updates | NULL             |
    | 10 | root | localhost  | NULL | Query       |    0 | starting                                                      | show processlist |
    +----+------+------------+------+-------------+------+---------------------------------------------------------------+------------------+
    4 rows in set (0.00 sec)
    
    

    从库:

    [root@db01 data]# mysql -S /tmp/mysql3309.sock -e "show slave status\G"|grep "Running:"
                 Slave_IO_Running: Yes
                 Slave_SQL_Running: Yes
    

    a. IO线程 :
    作用:连接主库、与binlog_dump交互、接收日志、存储日志等。

    b. SQL线程
    作用:回放relaylog
    1.原理:
    从库:
    a.从库执行 change master to 所用信息被保存到master_info
    b.从库执行 start slave,启动IO和SQL线程
    c.从库IO线程工作,获取master_info信息,生成指针(MI)
    d.从库IO线程,连接主库
    f.IO线程和binlog_dump交互,验证server_id、server_uuid、clock
    g.IO通过最新MI指针中的binlog位置点,向bing_dump请求更新日志
    i.从库IO线程接收主库binlog_dump发送的新的日志,MI指针自动更新,并写入到master_info中
    j.IO线程最终坏将接收到的binlog,写入到rely-bin中日志
    k.SQL线程,获取relay-log.info信息(上次回放到的位置点),生成一个RI指针,与relay-bin中继日志中的pos进行对比
    L.如果有新的中继日志生成,就进行回放,回放完成更新RI指针,并更新relay-log.info

    主库:
    e.主库连接层,接收请求,验证用户,权限,并生成binlog_dump线程
    h.binlog_dump线程一直监控着binglog状态,有新的日志就返回新的binlog给从库IO线程

    主从复制原理图

    2.主从复制监控
    2.1监控方法
    a.主库做个修改操作,看看从库有没有做操作
    b.通过相关命令监控
    c.通过第三方工具监控

    2.2通过相关命令监控
    从库

    [root@db01 /data 12:49:33]# vim 3308/my.cnf 
    report_host=10.0.0.51:3308  #让主库能发现自己的IP地址和端口
    [root@db01 /data 12:52:59]# systemctl restart mysqld3308.service 
    

    a.主库:

    mysql>show processlist;  #查看所有从库的连接
    mysql> show slave hosts;  #
    +-----------+----------------+------+-----------+--------------------------------------+
    | Server_id | Host           | Port | Master_id | Slave_UUID                           |
    +-----------+----------------+------+-----------+--------------------------------------+
    |         9 | 10.0.0.51:3309 | 3309 |         7 | e54df4b7-c5a2-11ea-8530-000c29b9d34b |
    |         8 | 10.0.0.51:3308 | 3308 |         7 | c0f75dfe-c5a1-11ea-b643-000c29b9d34b |
    +-----------+----------------+------+-----------+--------------------------------------+
    2 rows in set (0.00 sec)
    
    

    b.从库监控:

    mysql> show slave status \G
    
    # 1. 主库相关信息(来自于master_info)
    Master_Host:10.0.0.51   #主库地址
    Master_User: repl       #主库的复制用户
    Master_Port: 3307       #主库的端口号
    Connect_Retry: 10       #重连间隔秒数
    Master_Log_File: mysql-bin.000008
    Read_Master_Log_Pos: 444  #这两条是从库请求到主库的位置点
    
    # 2. 从库relay-log相关信息(relay_info)
    Relay_Log_File: db01-relay-bin.000004 
    Relay_Log_Pos: 320      #这两条是上次回放到的位置点
    
    #3. relaylog和binlog的对应关系
    Relay_Master_Log_File: mysql-bin.000008
    Exec_Master_Log_Pos: 444   #从库的bin.00004 320 对应的主库的000008 444位置点
    
    # 4. 线程状态有关的信息    
    Slave_IO_Running: Yes
    Slave_SQL_Running: Yes
    Last_IO_Errno: 0  #我一般不会选择这些,我的解决思路是从last_IO_Error出发找打问题所在
    Last_IO_Error: 
    Last_SQL_Errno: 0
    Last_SQL_Error: 
    
    #5. 过滤复制相关信息
    Replicate_Do_DB: 
    Replicate_Ignore_DB: 
    Replicate_Do_Table: 
    Replicate_Ignore_Table: 
    Replicate_Wild_Do_Table: 
    Replicate_Wild_Ignore_Table: 
    
    #6. 主从延时的时间
    Seconds_Behind_Master: 0
    
    #7. 延时从库状态信息
    SQL_Delay: 0
    SQL_Remaining_Delay: NULL
    
    # 8. GTID复制相关   
    Retrieved_Gtid_Set: 
    Executed_Gtid_Set:
    

    3.主从常见故障分析及处理思路
    3.1如何监控
    Slave_IO_Running: Yes
    Slave_SQL_Running: Yes
    Last_IO_Errno: 0
    Last_IO_Error:
    Last_SQL_Errno: 0
    Last_SQL_Error:

    a. IO线程故障

    1.建立连接(connecting)

    外部因素: 网络不通、防火墙未关闭
    内部因素: 
            用户、密码错误
            port、IP错误
            主库连接数满了,资源耗尽
    

    故障重现:

    1. 主库修改repl的密码
    mysql> alter user repl@'10.0.0.%' identified by '123456';
    Query OK, 0 rows affected (0.00 sec)
    
    1. 从库重启线程
    stop slave;
    start slave;
    
    mysql > show mysql status \G;
    Slave_IO_Running: Connecting
    Last_IO_Errno: 1045
    Last_IO_Error: error connecting to master 'repl@10.0.0.51:3307' - retry-time: 10  retries: 1
    
    1. 通用排查方法
    [root@db01 data]# mysql -urepl -p123 -h 10.0.0.51 -P3307
    mysql: [Warning] Using a password on the command line interface can be insecure.
    ERROR 1045 (28000): Access denied for user 'repl'@'db01' (using password: YES)
    
    [root@db01 data]# mysql -urepl1 -p123456 -h 10.0.0.51 -P3307
    mysql: [Warning] Using a password on the command line interface can be insecure.
    ERROR 1045 (28000): Access denied for user 'repl1'@'db01' (using password: YES)
    
    [root@db01 data]# mysql -urepl1 -p123 -h 10.0.0.51 -P3307
    mysql: [Warning] Using a password on the command line interface can be insecure.
    ERROR 1045 (28000): Access denied for user 'repl1'@'db01' (using password: YES)
    
    
    1045  用户名或密码错误
    2003  地址或端口错误  113--地址错误 111---端口错误
    
    [root@db01 data]# mysql -urepl -p123456  -h 10.0.0.52 -P3307
    mysql: [Warning] Using a password on the command line interface can be insecure.
    ERROR 2003 (HY000): Can't connect to MySQL server on '10.0.0.52' (113)
    
    [root@db01 data]# mysql -urepl -p123456  -h 10.0.0.51 -P3300
    mysql: [Warning] Using a password on the command line interface can be insecure.
    ERROR 2003 (HY000): Can't connect to MySQL server on '10.0.0.51' (111)
    
    1. 解决
    a.停从库 
    mysql> stop slave; (SQL线程和IO线程都停了要想停一个后面要加参数 io_thread;)
    b. 重新 change master to 
    mysql>reset slave all;
    mysql> CHANGE MASTER TO
      MASTER_HOST='10.0.0.51',
      MASTER_USER='repl',
      MASTER_PASSWORD='123456',
      MASTER_PORT=3307,
      MASTER_LOG_FILE='mysql-bin.000008',
      MASTER_LOG_POS=687,
      MASTER_CONNECT_RETRY=10;
    
    c.  启动线程
    mysql> start slave ;
    
    

    2. 注册从库到主库 (NO状态)

    原因: 主从之间的server_id和server_uuid 重复

    故障重现:

    1. 修改主库server_id与从库一致。
    [root@db01 data]# mysql -S /tmp/mysql3307.sock -e "set global server_id=8"
    [root@db01 data]# mysql -S /tmp/mysql3307.sock -e "select @@server_id"
    +-------------+
    | @@server_id |
    +-------------+
    |           8 |
    +-------------+
    [root@db01 data]# mysql -S /tmp/mysql3308.sock -e "select @@server_id"
    +-------------+
    | @@server_id |
    +-------------+
    |           8 |
    +-------------+
    
    1. 重启从库线程
    mysql> stop slave; 
    mysql> start slave;
    
    Slave_IO_Running: No
    Last_IO_Errno: 1593
    Last_IO_Error: Fatal error: The slave I/O thread stops because master and slave have equal MySQL server ids; these ids must be different for replication to work (or the --replicate-same-server-id option must be used on slave but this does not always make sense; please check the manual before using it).
    

    解决方法:

    [root@db01 data]# mysql -S /tmp/mysql3307.sock -e "set global server_id=7"
    [root@db01 data]# mysql -S /tmp/mysql3308.sock -e "stop slave ; start slave;"
    

    3. 请求二进制日志(NO 状态)

    原因:
    a. 搭建时位置点写错了。
    b. 主库的日志损坏。

    故障重现:

    1. 搭建时位置点写错了?
    mysql -S /tmp/mysql3308.sock 
    mysql> stop slave;
    mysql> reset slave all; 
    CHANGE MASTER TO
      MASTER_HOST='10.0.0.51',
      MASTER_USER='repl',
      MASTER_PASSWORD='123456',
      MASTER_PORT=3307,
      MASTER_LOG_FILE='mysql-bin.000008',
      MASTER_LOG_POS=1212,
      MASTER_CONNECT_RETRY=10;
    start slave;
    

    报错信息:

    mysql> show slave status;
    Slave_IO_Running: No
    Last_IO_Errno: 1236
    Last_IO_Error: Got fatal error 1236 from master when reading data from binary log: 'Client requested master to start replication from position > file size'
    

    处理方法:

    mysql -S /tmp/mysql3308.sock 
    mysql> stop slave;
    mysql> reset slave all; 
    mysql> CHANGE MASTER TO
      MASTER_HOST='10.0.0.51',
      MASTER_USER='repl',
      MASTER_PASSWORD='123456',
      MASTER_PORT=3307,
      MASTER_LOG_FILE='mysql-bin.000008',
      MASTER_LOG_POS=687,#修改位置点
      MASTER_CONNECT_RETRY=10;
    mysql> start slave;
    
    1. 主库日志被误删除
    mysql> show binary logs;
    +------------------+-----------+
    | Log_name         | File_size |
    +------------------+-----------+
    | mysql-bin.000001 |       177 |
    | mysql-bin.000002 |       464 |
    | mysql-bin.000003 |       177 |
    | mysql-bin.000004 |       177 |
    | mysql-bin.000005 |       177 |
    | mysql-bin.000006 |       154 |
    | mysql-bin.000007 |      1111 |
    | mysql-bin.000008 |       687 |
    +------------------+-----------+
    8 rows in set (0.00 sec)
    
    mysql> reset master;
    Query OK, 0 rows affected (0.01 sec)
    
    Last_IO_Errno: 1236
    Last_IO_Error: Got fatal error 1236 from master when reading data from binary log: 'could not find next log; the first event 'mysql-bin.000008' at 687, the last event read from '/data/3307/logs/mysql-bin.000008' at 123, the last byte read from '/data/3307/logs/mysql-bin.000008' at 687.'
    

    处理方法:

    mysql -S /tmp/mysql3308.sock 
    stop slave;
    reset slave all; 
    CHANGE MASTER TO
      MASTER_HOST='10.0.0.51',
      MASTER_USER='repl',
      MASTER_PASSWORD='123456',
      MASTER_PORT=3307,
      MASTER_LOG_FILE='mysql-bin.000001', #000001
      MASTER_LOG_POS=154,#154是起点,每一个日志都有,不用复制
      MASTER_CONNECT_RETRY=10;
    start slave;
    

    b SQL线程故障

    原因:回放日志出现问题,就相当于执行SQL 语句出现问题

    3.relay-log 损坏
    1.无法回放日志,就相当于执行SQL语句出现问题
    1)配置、版本、参数、SQL_MODE
    解决方案:硬件配置一致、版本一致、参数一致、SQL_MODE
    2)约束冲突(PK、唯一键)、对象的存在性
    原因在于:从库发生写入了,或者经历过宕机导致数据不一致
    防范方案:
        1>从库禁止写入 read_only=1  innodb_readd_only=1 或者使用读写分离中间建
        2>高可用结构,半同步,MGR等
    如果没有防范出现此类问题:
    解决思路:
       ①PT工具校验主从一致性
       ②通过校验信息进行同步数据
       ③跳过错误
    

    方法一:

    stop slave; 
    set global sql_slave_skip_counter = 1;
    #将同步指针向下移动一个,如果多次不同步,可以重复操作。
    start slave;
    

    方法二:不推荐

    /etc/my.cnf
    slave-skip-errors = 1032,1062,1007
    常见错误代码:
    1007:对象已存在
    1032:无法执行DML
    1062:主键冲突,或约束冲突
    
    1. 主从复制延时
      5.1 什么是延时?
      主库做的事,从库好久才做。
      5.2 如何监控?
      5.2.1 传输过程监控
      主库:
        mysql> show master status ;
    

    从库:

        mysql> show slave status \G 
    

    对比的是Read_Master_Log_Pos的值,如果主库的远大于从库的值就是延时了
    例子:

    [root@db01 ~]# mysql -S /tmp/mysql3307.sock -e "show master status;"
    +------------------+----------+--------------+------------------+-------------------+
    | File             | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set |
    +------------------+----------+--------------+------------------+-------------------+
    | mysql-bin.000001 |      154 |              |                  |                   |
    +------------------+----------+--------------+------------------+-------------------+
    [root@db01 ~]# mysql -S /tmp/mysql3308.sock -e "show slave status\G"|grep "Read_Master_Log_Pos"
    Read_Master_Log_Pos: 154
    [root@db01 ~]# 
    1000以上告警,10000以上紧急
    

    5.2.2 回放是否及时

    [root@db01 ~]# mysql -S /tmp/mysql3307.sock -e "show master status;"
    +------------------+----------+--------------+------------------+-------------------+
    | File             | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set |
    +------------------+----------+--------------+------------------+-------------------+
    | mysql-bin.000001 |      154 |              |                  |                   |
    +------------------+----------+--------------+------------------+-------------------+
    [root@db01 ~]# mysql -S /tmp/mysql3308.sock -e "show slave status\G"|grep " Exec_Master_Log_Pos"
    Exec_Master_Log_Pos: 154
    对比这两个值
    10000字节=100K以上告警,100000以上紧急
    

    5.3 主从延时的原因 (从库一般都不会超过五个)
    5.3.1 外部因素

    网络 
    主从配置(cpu\mem\io)
    参数配置
    等。
    从库太多
    

    5.3.2 主库方面

    # dump线程是串行工作的模式。
    5.6以前的版本,只能一个一个事务投递binlog。
    5.6+版本以后,出现了group commit(按组提交).0
    再不是一个一个提交了;
    # binlog日志落地不及时。
    采用ssd专门存储binlog
    

    5.3.3 从库问题

    # IO落地relaylog 
    一般建议采用SSD
    # SQL线程
    默认情况只有一个SQL线程,只能串行工作。
    主库可以并发事务。
    高并发场景下,会造成较高延时。
    出现大事务的时候,都会造成较高延时。
    

    解决方案:

        1. 5.6版本,多SQL线程回放功能。但是只能根据不同库(database)进行回放。
        2. 5.7+版本中,加入MTS机制。能够按照group commit 的逻辑时钟,进行并行回放。
    

    人的问题
    高并发场景下,会造成较高延时。
    出现大事务的时候,都会造成较高延时。
    锁问题严重。

    相关文章

      网友评论

          本文标题:主从复制(上)

          本文链接:https://www.haomeiwen.com/subject/rugacktx.html