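Background
While testing MHA failover on a production MySQL cluster, I found that no failover takes place when the candidate master is lagging behind. Not every amount of lag blocks the switch, though, so what exactly is the threshold? The MHA log shows the following warning: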
[warning] Slave XXXXXX SQL Thread delays too much. Latest log file:mysql-bin.000001:348490983, Current log file:mysql-bin.000001:137978694. This server is not selected as a new master because recovery will take long time.
MySQL architecture
[server1]
hostname=lf-mha-mysql1-online (master)
[server2]
hostname=dx-mha-mysql1-online (candidate master)
[server3]
hostname=lf-mha-mysql2-online (instance configured never to be promoted)
no_master=1
MHA code analysis
Code path for selecting the new master:
wait_until_master_is_unreachable()->get_bad_candidate_masters()->check_slave_delay()
wait_until_master_is_unreachable()
Reads all slaves in the MySQL cluster and filters out the problematic nodes.
    if (
      $_server_manager->validate_slaves(
        $servers_config[0]->{check_repl_filter},
        $current_master
      )
      )
    {
      $log->error("Slave configurations is not valid.");
      croak;
    }
    my @bad = $_server_manager->get_bad_candidate_masters();
    if ( $#alive_slaves <= $#bad ) {
      $log->error( "None of slaves can be master. Check failover "
          . "configuration file or log-bin settings in my.cnf" );
      croak;
    }
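A note on the `$#alive_slaves <= $#bad` test: in Perl, `$#array` is the index of the last element (length minus one, or -1 for an empty array), so the comparison is equivalent to comparing the two lengths. The check aborts when every alive slave is also a bad candidate. The sketch below restates that logic in Python purely as an illustration; the function name is hypothetical and is not part of MHA, and the host names are taken from the architecture above.

```python
def no_promotable_slave(alive_slaves, bad_candidates):
    """Python restatement of the Perl test `$#alive_slaves <= $#bad`.

    Perl's $#array equals len(array) - 1, so the test reduces to a
    comparison of lengths. Illustrative helper, not MHA code.
    """
    return len(alive_slaves) - 1 <= len(bad_candidates) - 1


alive = ["dx-mha-mysql1-online", "lf-mha-mysql2-online"]

# Only the no_master=1 instance is a bad candidate: one slave can still be promoted.
print(no_promotable_slave(alive, ["lf-mha-mysql2-online"]))  # False

# Every alive slave is a bad candidate: this is the case where MHA logs the error and croaks.
print(no_promotable_slave(alive, alive))  # True
```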
get_bad_candidate_masters()
Defines the conditions under which a server cannot become the new master:
# The following servers can not be master:
# - dead servers
# - Set no_master in conf files (i.e. DR servers)
# - log_bin is disabled
# - Major version is not the oldest
# - too much replication delay
Iterates over all slave nodes and checks each one:
  foreach (@servers) {
    if (
         $_->{no_master} >= 1
      || $_->{log_bin} eq '0'
      || $_->{oldest_major_version} eq '0'
      || (
        $latest_slave
        && ( $check_replication_delay
          && $self->check_slave_delay( $_, $latest_slave ) >= 1 )
      )
      )
    {
      push( @ret_servers, $_ );
    }
  }
  return @ret_servers;
}
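To make the filter easier to follow, here is a minimal Python sketch of the same condition applied to the two slaves from the architecture above. It is an illustration under assumptions, not MHA's implementation: the dict keys mirror the Perl hash fields, the `delayed` flag stands in for the result of `check_slave_delay()`, and the `check_delay` argument folds together the `$latest_slave` and `$check_replication_delay` guards.

```python
def bad_candidate_masters(servers, check_delay=True):
    """Return the servers that must not be promoted.

    Follows the structure of the Perl condition. `delayed` is a
    hypothetical stand-in for check_slave_delay() returning >= 1.
    """
    bad = []
    for s in servers:
        if (
            s.get("no_master", 0) >= 1
            or s.get("log_bin", "1") == "0"
            or s.get("oldest_major_version", "1") == "0"
            or (check_delay and s.get("delayed", False))
        ):
            bad.append(s)
    return bad


slaves = [
    {"hostname": "dx-mha-mysql1-online", "delayed": True},  # lagging candidate
    {"hostname": "lf-mha-mysql2-online", "no_master": 1},   # never promoted
]

# With the delay check active, both slaves are rejected.
print([s["hostname"] for s in bad_candidate_masters(slaves)])

# With the delay check skipped, only the no_master instance is rejected.
print([s["hostname"] for s in bad_candidate_masters(slaves, check_delay=False)])
```

With the delay check active, the bad list is as long as the list of alive slaves, which is the situation in which no new master can be chosen.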
check_slave_delay()
  if (
    ( $latest->{Master_Log_File} gt $target->{Relay_Master_Log_File} )
    || ( $latest->{Read_Master_Log_Pos} >
      $target->{Exec_Master_Log_Pos} + 100000000 )
    )
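The condition can be checked against the numbers in the warning from the Background section. The Python sketch below restates the Perl test and is only an illustration. It assumes that "Latest log file" in the warning corresponds to `Master_Log_File:Read_Master_Log_Pos` of the latest slave and that "Current log file" corresponds to `Relay_Master_Log_File:Exec_Master_Log_Pos` of the target slave. The names `is_delayed` and `DELAY_THRESHOLD_BYTES` are hypothetical.

```python
DELAY_THRESHOLD_BYTES = 100_000_000  # the constant that appears in the Perl condition


def is_delayed(latest, target):
    """Python restatement of the check_slave_delay() condition.

    `latest` and `target` are dicts keyed by SHOW SLAVE STATUS field names.
    File names are compared as strings, like Perl's `gt`.
    """
    return (
        latest["Master_Log_File"] > target["Relay_Master_Log_File"]
        or latest["Read_Master_Log_Pos"]
        > target["Exec_Master_Log_Pos"] + DELAY_THRESHOLD_BYTES
    )


# Values taken from the warning message (mapping to fields is an assumption).
latest = {"Master_Log_File": "mysql-bin.000001", "Read_Master_Log_Pos": 348490983}
target = {"Relay_Master_Log_File": "mysql-bin.000001", "Exec_Master_Log_Pos": 137978694}

gap = latest["Read_Master_Log_Pos"] - target["Exec_Master_Log_Pos"]
print(gap)                         # 210512289 bytes behind
print(is_delayed(latest, target))  # True: 348490983 > 137978694 + 100000000

# A hypothetical slave that is only about 48 MB behind would still qualify.
caught_up = {"Relay_Master_Log_File": "mysql-bin.000001", "Exec_Master_Log_Pos": 300000000}
print(is_delayed(latest, caught_up))  # False
```

Both servers are on the same binlog file here, so the decision comes entirely from the position gap, which exceeds the 100000000-byte margin.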
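Conclusion
1. The binlog file that the slave's SQL thread is currently executing (Relay_Master_Log_File) is older than the binlog file already read by the latest slave (Master_Log_File).
2. The slave's executed position (Exec_Master_Log_Pos) plus 100000000 is less than the binlog position already read by the latest slave (Read_Master_Log_Pos). Binlog positions are byte offsets, so the margin is about 100 MB.
When either of these conditions holds, the slave will not be selected as the new master.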