Why MHA Does Not Fail Over When the Slave Is Lagging

Author: 爱钓鱼的码农 | Published: 2021-06-22 16:14

Background

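While testing MHA failover on a production MySQL cluster, we found that MHA does not fail over when the candidate master is lagging behind. Not every amount of delay blocks the failover, however, so what exactly is the threshold? The MHA log contained the following warning: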

[warning]  Slave XXXXXX SQL Thread delays too much. Latest log file:mysql-bin.000001:348490983, Current log file:mysql-bin.000001:137978694. This server is not selected as a new master because recovery will take long time.

MySQL Architecture

[server1]
# master
hostname=lf-mha-mysql1-online
[server2]
# candidate master
hostname=dx-mha-mysql1-online
[server3]
# instance that must never be promoted
hostname=lf-mha-mysql2-online
no_master=1

MHA Code Analysis

The code path involved in selecting the new master:
wait_until_master_is_unreachable()->get_bad_candidate_masters()->check_slave_delay()

wait_until_master_is_unreachable()

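Reads all slaves in the MySQL cluster and screens out the problematic nodes.
    # In this excerpt a true return value from validate_slaves() is treated
    # as "slave configuration is invalid" and aborts.
    if (
      $_server_manager->validate_slaves(
        $servers_config[0]->{check_repl_filter},
        $current_master
      )
      )
    {
      $log->error("Slave configurations is not valid.");
      croak;
    }
    # Collect the slaves that are not allowed to become the new master.
    my @bad = $_server_manager->get_bad_candidate_masters();
    # $#x is the last index of @x, so this is true when there are no more
    # alive slaves than bad candidates.
    if ( $#alive_slaves <= $#bad ) {
      $log->error( "None of slaves can be master. Check failover "
          . "configuration file or log-bin settings in my.cnf" );
      croak;
    }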
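In Perl, `$#array` is the last index of the array (its length minus one), so `$#alive_slaves <= $#bad` compares the two list lengths. Assuming the bad candidates are drawn from the alive slaves, the test is true exactly when every alive slave is disqualified. A minimal Python sketch of that check follows; the function name and the list-of-hostnames representation are assumptions made for illustration and are not part of MHA.

```python
def has_promotable_slave(alive_slaves, bad_candidates):
    """Inverse of the Perl test `$#alive_slaves <= $#bad`.

    Perl's $#x equals len(x) - 1, so comparing last indexes is the
    same as comparing lengths.
    """
    return len(alive_slaves) > len(bad_candidates)


# Hostnames taken from the configuration shown earlier.
alive = ["dx-mha-mysql1-online", "lf-mha-mysql2-online"]

# Only the no_master server is bad: one candidate remains.
print(has_promotable_slave(alive, ["lf-mha-mysql2-online"]))  # True

# The candidate master is bad as well (for example, too much delay).
print(has_promotable_slave(alive, alive))  # False
```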

get_bad_candidate_masters()

Criteria that disqualify a server from becoming the new master:
  # The following servers can not be master:
  # - dead servers
  # - Set no_master in conf files (i.e. DR servers)
  # - log_bin is disabled
  # - Major version is not the oldest
  # - too much replication delay
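It iterates over all slave nodes and checks each one:
  foreach (@servers) {
    if (
         $_->{no_master} >= 1                    # excluded in the config file
      || $_->{log_bin} eq '0'                    # binary logging disabled
      || $_->{oldest_major_version} eq '0'       # not the oldest major version
      || (
        $latest_slave
        && ( $check_replication_delay
          && $self->check_slave_delay( $_, $latest_slave ) >= 1 )   # too much delay
      )
      )
    {
      push( @ret_servers, $_ );
    }
  }
  return @ret_servers;
}    # closes the sub; its opening line is not part of this excerpt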
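The filter above can be modelled in Python to make the short-circuit logic easier to follow. This is only a sketch under stated assumptions: servers are plain dictionaries, the status values are hypothetical, and `is_delayed` is a stand-in callable for `check_slave_delay()` rather than MHA's real implementation.

```python
def get_bad_candidate_masters(servers, latest_slave, check_replication_delay, is_delayed):
    """Return the servers that must not be promoted.

    `is_delayed(server, latest_slave)` stands in for check_slave_delay()
    and is expected to return 1 (delayed) or 0 (ok).
    """
    bad = []
    for server in servers:
        if (
            server["no_master"] >= 1
            or server["log_bin"] == "0"
            or server["oldest_major_version"] == "0"
            or (
                latest_slave is not None
                and check_replication_delay
                and is_delayed(server, latest_slave) >= 1
            )
        ):
            bad.append(server)
    return bad


# Hypothetical status values for the two slaves in the configuration above.
candidate = {"hostname": "dx-mha-mysql1-online", "no_master": 0,
             "log_bin": "1", "oldest_major_version": "1"}
dr_server = {"hostname": "lf-mha-mysql2-online", "no_master": 1,
             "log_bin": "1", "oldest_major_version": "1"}


def never_delayed(server, latest):
    """Stub delay check: pretend no slave is delayed."""
    return 0


def always_delayed(server, latest):
    """Stub delay check: pretend every slave is delayed."""
    return 1


slaves = [candidate, dr_server]
bad_when_healthy = get_bad_candidate_masters(slaves, candidate, True, never_delayed)
bad_when_delayed = get_bad_candidate_masters(slaves, candidate, True, always_delayed)

print([s["hostname"] for s in bad_when_healthy])  # ['lf-mha-mysql2-online']
print([s["hostname"] for s in bad_when_delayed])  # both hostnames
```

With the delay stub returning 1, the bad list is as long as the list of slaves, which is the situation in which the caller reports "None of slaves can be master".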

check_slave_delay()

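  # $target is the slave being checked; $latest is the $latest_slave passed in
  # by get_bad_candidate_masters().
  # The condition is true when the target is still executing an older binlog
  # file than the one $latest has read, or when the position $latest has read
  # is more than 100000000 ahead of the position the target has executed.
  if (
    ( $latest->{Master_Log_File} gt $target->{Relay_Master_Log_File} )
    || ( $latest->{Read_Master_Log_Pos} >
      $target->{Exec_Master_Log_Pos} + 100000000 )
    )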
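The condition above can be tried out with a small Python model. It is a sketch, not MHA code: the dictionaries only carry the four fields used in the excerpt, the positions in the examples are made up, and returning 1 or 0 is an assumption based on how the caller compares the result with `>= 1`. Python's `>` on strings is lexicographic, like Perl's `gt`.

```python
MAX_POS_GAP = 100_000_000  # the literal used in the excerpt above


def check_slave_delay(target, latest):
    """Return 1 if `target` lags too far behind `latest`, else 0."""
    # The target is still executing an older binlog file.
    older_file = latest["Master_Log_File"] > target["Relay_Master_Log_File"]
    # The executed position trails the read position by more than the gap.
    far_behind = latest["Read_Master_Log_Pos"] > target["Exec_Master_Log_Pos"] + MAX_POS_GAP
    return 1 if (older_file or far_behind) else 0


# Hypothetical state of the most up-to-date slave.
latest = {"Master_Log_File": "mysql-bin.000002", "Read_Master_Log_Pos": 150_000_000}

# Same file, 20,000,000 behind: acceptable.
small_lag = {"Relay_Master_Log_File": "mysql-bin.000002", "Exec_Master_Log_Pos": 130_000_000}
# Same file, 140,000,000 behind: too far.
large_lag = {"Relay_Master_Log_File": "mysql-bin.000002", "Exec_Master_Log_Pos": 10_000_000}
# Previous file: rejected regardless of the position.
old_file = {"Relay_Master_Log_File": "mysql-bin.000001", "Exec_Master_Log_Pos": 900_000_000}

print(check_slave_delay(small_lag, latest))  # 0
print(check_slave_delay(large_lag, latest))  # 1
print(check_slave_delay(old_file, latest))   # 1
```

The third case shows that the file comparison alone is enough to reject a slave, however little binlog is actually left to apply.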

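Conclusion

A slave is not selected as the new master if either of the following conditions holds (the code joins them with ||). Here the "latest slave" is the slave that has read the most binlog from the master.
1. The binlog file the slave is currently executing (Relay_Master_Log_File) is older than the binlog file the latest slave has read (Master_Log_File).
2. The position the slave has executed (Exec_Master_Log_Pos) plus 100000000 is less than the position the latest slave has read (Read_Master_Log_Pos).
The threshold asked about in the background is therefore 100000000 bytes of binlog position, written as a literal in check_slave_delay().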
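To check the warning from the background section against this threshold, the two file:position pairs can be pulled out of the log line and compared. The sketch below assumes that "Latest log file" is the latest slave's Master_Log_File:Read_Master_Log_Pos and that "Current log file" is the checked slave's Relay_Master_Log_File:Exec_Master_Log_Pos; the regular expression and the function name are illustrative, not part of MHA.

```python
import re

THRESHOLD = 100_000_000  # literal from check_slave_delay()

PATTERN = re.compile(
    r"Latest log file:(?P<latest_file>[^:]+):(?P<latest_pos>\d+), "
    r"Current log file:(?P<current_file>[^:]+):(?P<current_pos>\d+)"
)


def explain_warning(line):
    """Extract both positions from the warning and evaluate the two checks."""
    match = PATTERN.search(line)
    if match is None:
        return None
    latest_pos = int(match["latest_pos"])
    current_pos = int(match["current_pos"])
    return {
        "older_file": match["latest_file"] > match["current_file"],
        "gap": latest_pos - current_pos,
        "over_threshold": latest_pos > current_pos + THRESHOLD,
    }


line = (
    "[warning]  Slave XXXXXX SQL Thread delays too much. "
    "Latest log file:mysql-bin.000001:348490983, "
    "Current log file:mysql-bin.000001:137978694. "
    "This server is not selected as a new master because recovery will take long time."
)

result = explain_warning(line)
print(result)
# {'older_file': False, 'gap': 210512289, 'over_threshold': True}
```

Both slaves are on the same binlog file here, so the rejection comes from the position check alone: the gap of 210512289 exceeds 100000000.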

Article link: https://www.haomeiwen.com/subject/klmxlltx.html