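Problems with master-replica replication
- If the master fails, someone must manually promote a replica to master, update the master address used by the application, and instruct the remaining replicas to replicate from the new master. The whole process requires human intervention.
- The master's write capacity is limited to what a single machine can handle.
- The master's storage capacity is limited to what a single machine can hold.
Sentinel mode was introduced to solve the first of these problems by automating failover.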
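Sentinel mode
Taking one master and two replicas as an example:
- The master fails. Both replicas lose their connection to it, and replication stops.
- The Sentinel nodes discover through periodic monitoring that the master has failed.
- Multiple Sentinel nodes agree that the master has failed and elect one Sentinel node as the leader responsible for the failover. The leader then promotes a replica to master and reconfigures the other replica to replicate from it.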
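The agreement and election in the last step can be sketched as follows. This is a minimal illustration, not Sentinel's actual code: the class and method names are hypothetical, and it assumes the commonly documented rule that a master counts as objectively down once at least quorum Sentinels report it unreachable, while the failover leader needs votes from a majority of all Sentinels and no fewer than quorum.

```java
final class SentinelQuorumSketch {
    /** Objective down: at least `quorum` Sentinels report the master as unreachable. */
    static boolean isObjectivelyDown(int sentinelsReportingDown, int quorum) {
        return sentinelsReportingDown >= quorum;
    }

    /** Votes a Sentinel needs to lead the failover: a majority, and never fewer than quorum. */
    static int votesNeededToLead(int totalSentinels, int quorum) {
        int majority = totalSentinels / 2 + 1;
        return Math.max(majority, quorum);
    }

    public static void main(String[] args) {
        int totalSentinels = 3; // assumption: one Sentinel per server, as in this walkthrough
        int quorum = 2;         // last argument of "sentinel monitor"
        System.out.println(isObjectivelyDown(1, quorum));              // false
        System.out.println(isObjectivelyDown(2, quorum));              // true
        System.out.println(votesNeededToLead(totalSentinels, quorum)); // 2
    }
}
```

With three Sentinels and a quorum of 2, two agreeing Sentinels are enough both to declare the master down and to authorize a leader, which is why three Sentinels tolerate the loss of one.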
Setting up Sentinel mode
- Set up Redis master-replica replication first. In this walkthrough:
- 192.168.6.130 is the master; its redis.conf is as follows
port 6379
daemonize yes
logfile "/usr/local/redis-4.0.14/log/redis.log"
- 192.168.6.128 and 192.168.6.129 are the replicas; their redis.conf is as follows
port 6379
daemonize yes
logfile "/usr/local/redis-4.0.14/log/redis.log"
slaveof 192.168.6.130 6379
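All other settings keep their default values. Then start the Redis service on each server.
- Set up Sentinel
- In the Redis directory on each of the three servers above, create a configuration file named sentinel-26379.conf with the following content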
port 26379
daemonize yes
logfile "/usr/local/redis-4.0.14/log/sentinel.log"
sentinel monitor mymaster 192.168.6.130 6379 2
sentinel down-after-milliseconds mymaster 30000
sentinel parallel-syncs mymaster 1
sentinel failover-timeout mymaster 180000
- Start Sentinel (either of the two commands below works)
- Using the redis-sentinel command
src/redis-sentinel sentinel-26379.conf
- Using the redis-server command
src/redis-server sentinel-26379.conf --sentinel
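Adjust the command path and configuration file path to match your own environment. In production, the Sentinel nodes should be deployed on separate servers. Once startup completes, check the startup log. Taking server 128 as an example: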
[root@simon log]# tail -f sentinel.log
7145:X 15 Nov 11:34:49.997 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
7145:X 15 Nov 11:34:49.997 # Redis version=4.0.14, bits=64, commit=00000000, modified=0, pid=7145, just started
7145:X 15 Nov 11:34:49.997 # Configuration loaded
7146:X 15 Nov 11:34:50.048 * Increased maximum number of open files to 10032 (it was originally set to 1024).
7146:X 15 Nov 11:34:50.077 * Running mode=sentinel, port=26379.
7146:X 15 Nov 11:34:50.077 # WARNING: The TCP backlog setting of 511 cannot be enforced because /proc/sys/net/core/somaxconn is set to the lower value of 128.
7146:X 15 Nov 11:34:50.077 # Sentinel ID is 8e328e16d867ba5ef31e869b8d54c9c39b889777
7146:X 15 Nov 11:34:50.077 # +monitor master mymaster 192.168.6.130 6379 quorum 2
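Configuration explained
- sentinel monitor <master-name> <ip> <redis-port> <quorum>
A Sentinel node periodically monitors the master, and this directive tells it which one: a master named <master-name> at address <ip> and port <redis-port>. <quorum> is the number of Sentinel votes required to finally judge the master unreachable.
- sentinel auth-pass <master-name> <password>
If the master is configured with a password, set this directive so that the Sentinel can authenticate to it.
- sentinel down-after-milliseconds <master-name> <milliseconds>
Each Sentinel node periodically sends ping commands to the Redis data nodes and the other Sentinel nodes to check whether they are reachable. If no valid reply arrives within the time configured by down-after-milliseconds, the node is judged unreachable. <milliseconds> is that timeout, in milliseconds. This setting is the main basis for deciding that a node has failed.
- sentinel parallel-syncs <master-name> <numslaves>
After a failover, parallel-syncs limits how many replicas start replicating from the new master at the same time. For example, with parallel-syncs=3 three replicas start replication concurrently, while with parallel-syncs=1 the replicas start replication one at a time.
- sentinel failover-timeout <master-name> <milliseconds>
The failover timeout.
- sentinel notification-script <master-name> <script-path>
During a failover, when a warning-level Sentinel event occurs (an important event such as -sdown or -odown, where sdown means subjectively down and odown means objectively down), the script at the given path is triggered and receives the event parameters.
- sentinel client-reconfig-script <master-name> <script-path>
After a failover ends, the script at the given path is triggered and receives parameters describing the failover result.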
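To make the effect of parallel-syncs concrete, here is a small sketch that groups replicas into the rounds in which they would be pointed at the new master. It is a simplification under stated assumptions: the class name, method name, and replica labels are hypothetical, and real Sentinel tracks replicas in flight rather than forming strict rounds.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class ParallelSyncsSketch {
    /** Splits replicas into rounds of at most `parallelSyncs` replicas each. */
    static List<List<String>> reconfigurationRounds(List<String> replicas, int parallelSyncs) {
        if (parallelSyncs < 1) {
            throw new IllegalArgumentException("parallelSyncs must be >= 1");
        }
        List<List<String>> rounds = new ArrayList<>();
        for (int i = 0; i < replicas.size(); i += parallelSyncs) {
            int end = Math.min(i + parallelSyncs, replicas.size());
            rounds.add(new ArrayList<>(replicas.subList(i, end)));
        }
        return rounds;
    }

    public static void main(String[] args) {
        // Hypothetical replica labels, for illustration only.
        List<String> replicas = Arrays.asList("replica-a", "replica-b", "replica-c");
        System.out.println(reconfigurationRounds(replicas, 1)); // three rounds of one replica
        System.out.println(reconfigurationRounds(replicas, 3)); // one round of three replicas
    }
}
```

A lower value spreads the replication load on the new master over a longer period; a higher value finishes reconfiguration sooner but makes more replicas resynchronize at once.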
Simulating a master failure
- Kill the master process to simulate a crash
kill -9 7056
- Observe the Sentinel logs
- Log on 130
7245:X 15 Nov 11:49:00.926 # +sdown master mymaster 192.168.6.130 6379
7245:X 15 Nov 11:49:00.989 # +odown master mymaster 192.168.6.130 6379 #quorum 2/2
7245:X 15 Nov 11:49:00.989 # +new-epoch 2
7245:X 15 Nov 11:49:00.989 # +try-failover master mymaster 192.168.6.130 6379
7245:X 15 Nov 11:49:00.990 # +vote-for-leader ea797207e975696b1e03ca6c62dd1ac54b80e99a 2
7245:X 15 Nov 11:49:00.992 # 8e328e16d867ba5ef31e869b8d54c9c39b889777 voted for ea797207e975696b1e03ca6c62dd1ac54b80e99a 2
7245:X 15 Nov 11:49:01.063 # +elected-leader master mymaster 192.168.6.130 6379
7245:X 15 Nov 11:49:01.063 # +failover-state-select-slave master mymaster 192.168.6.130 6379
7245:X 15 Nov 11:49:01.163 # +selected-slave slave 192.168.6.128:6379 192.168.6.128 6379 @ mymaster 192.168.6.130 6379
7245:X 15 Nov 11:49:01.164 * +failover-state-send-slaveof-noone slave 192.168.6.128:6379 192.168.6.128 6379 @ mymaster 192.168.6.130 6379
7245:X 15 Nov 11:49:01.265 * +failover-state-wait-promotion slave 192.168.6.128:6379 192.168.6.128 6379 @ mymaster 192.168.6.130 6379
7245:X 15 Nov 11:49:01.340 # 48fcc2767a272993c7bfda8f4d5b02bf467b11c2 voted for ea797207e975696b1e03ca6c62dd1ac54b80e99a 2
7245:X 15 Nov 11:49:01.346 # +promoted-slave slave 192.168.6.128:6379 192.168.6.128 6379 @ mymaster 192.168.6.130 6379
7245:X 15 Nov 11:49:01.347 # +failover-state-reconf-slaves master mymaster 192.168.6.130 6379
7245:X 15 Nov 11:49:01.390 * +slave-reconf-sent slave 192.168.6.129:6379 192.168.6.129 6379 @ mymaster 192.168.6.130 6379
7245:X 15 Nov 11:49:02.410 * +slave-reconf-inprog slave 192.168.6.129:6379 192.168.6.129 6379 @ mymaster 192.168.6.130 6379
7245:X 15 Nov 11:49:02.410 * +slave-reconf-done slave 192.168.6.129:6379 192.168.6.129 6379 @ mymaster 192.168.6.130 6379
7245:X 15 Nov 11:49:02.465 # -odown master mymaster 192.168.6.130 6379
7245:X 15 Nov 11:49:02.466 # +failover-end master mymaster 192.168.6.130 6379
7245:X 15 Nov 11:49:02.466 # +switch-master mymaster 192.168.6.130 6379 192.168.6.128 6379
7245:X 15 Nov 11:49:02.466 * +slave slave 192.168.6.129:6379 192.168.6.129 6379 @ mymaster 192.168.6.128 6379
7245:X 15 Nov 11:49:02.466 * +slave slave 192.168.6.130:6379 192.168.6.130 6379 @ mymaster 192.168.6.128 6379
7245:X 15 Nov 11:49:32.481 # +sdown slave 192.168.6.130:6379 192.168.6.130 6379 @ mymaster 192.168.6.128 6379
- Log on 128
7146:X 15 Nov 11:49:01.291 # +sdown master mymaster 192.168.6.130 6379
7146:X 15 Nov 11:49:01.396 # +new-epoch 2
7146:X 15 Nov 11:49:01.396 # +vote-for-leader ea797207e975696b1e03ca6c62dd1ac54b80e99a 2
7146:X 15 Nov 11:49:01.798 # +config-update-from sentinel ea797207e975696b1e03ca6c62dd1ac54b80e99a 192.168.6.130 26379 @ mymaster 192.168.6.130 6379
7146:X 15 Nov 11:49:01.798 # +switch-master mymaster 192.168.6.130 6379 192.168.6.128 6379
7146:X 15 Nov 11:49:01.799 * +slave slave 192.168.6.129:6379 192.168.6.129 6379 @ mymaster 192.168.6.128 6379
7146:X 15 Nov 11:49:01.799 * +slave slave 192.168.6.130:6379 192.168.6.130 6379 @ mymaster 192.168.6.128 6379
7146:X 15 Nov 11:49:31.837 # +sdown slave 192.168.6.130:6379 192.168.6.130 6379 @ mymaster 192.168.6.128 6379
- Log on 129
7091:X 15 Nov 11:49:01.082 # +sdown master mymaster 192.168.6.130 6379
7091:X 15 Nov 11:49:01.452 # +new-epoch 2
7091:X 15 Nov 11:49:01.455 # +vote-for-leader ea797207e975696b1e03ca6c62dd1ac54b80e99a 2
7091:X 15 Nov 11:49:01.455 # +odown master mymaster 192.168.6.130 6379 #quorum 3/2
7091:X 15 Nov 11:49:01.455 # Next failover delay: I will not start a failover before Fri Nov 15 11:55:01 2019
7091:X 15 Nov 11:49:01.518 # +config-update-from sentinel ea797207e975696b1e03ca6c62dd1ac54b80e99a 192.168.6.130 26379 @ mymaster 192.168.6.130 6379
7091:X 15 Nov 11:49:01.518 # +switch-master mymaster 192.168.6.130 6379 192.168.6.128 6379
7091:X 15 Nov 11:49:01.519 * +slave slave 192.168.6.129:6379 192.168.6.129 6379 @ mymaster 192.168.6.128 6379
7091:X 15 Nov 11:49:01.519 * +slave slave 192.168.6.130:6379 192.168.6.130 6379 @ mymaster 192.168.6.128 6379
7091:X 15 Nov 11:49:31.523 # +sdown slave 192.168.6.130:6379 192.168.6.130 6379 @ mymaster 192.168.6.128 6379
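In the end, 128 was successfully promoted to master.
- Restart node 130 (the master that was just killed)
The node rejoins as a replica (slave) and synchronizes data from the current master.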
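The +switch-master lines in the Sentinel logs carry the master name followed by the old and new master addresses, so a monitoring script can read the new master from them. The following is a minimal sketch of that parsing; the class and method names are hypothetical, and the field layout is assumed from the log lines shown in this walkthrough.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class SwitchMasterParser {
    // +switch-master <master-name> <old-ip> <old-port> <new-ip> <new-port>
    private static final Pattern SWITCH_MASTER =
            Pattern.compile("\\+switch-master (\\S+) (\\S+) (\\d+) (\\S+) (\\d+)");

    /** Returns "host:port" of the new master, or null if the line is not a +switch-master event. */
    static String newMasterAddress(String logLine) {
        Matcher m = SWITCH_MASTER.matcher(logLine);
        return m.find() ? m.group(4) + ":" + m.group(5) : null;
    }

    public static void main(String[] args) {
        String line = "7245:X 15 Nov 11:49:02.466 # +switch-master mymaster "
                + "192.168.6.130 6379 192.168.6.128 6379";
        System.out.println(newMasterAddress(line)); // 192.168.6.128:6379
    }
}
```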
Java client
Sample code:
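import java.util.HashSet;
import java.util.Set;

import redis.clients.jedis.Jedis;
import redis.clients.jedis.JedisPoolConfig;
import redis.clients.jedis.JedisSentinelPool;

public class SentinelTest {
    public static void main(String[] args) {
        JedisPoolConfig config = new JedisPoolConfig();
        JedisSentinelPool jedisSentinelPool = null;
        Jedis jedis = null;
        try {
            // Addresses of Sentinel nodes (not Redis data nodes); the pool asks them for the current master
            Set<String> sentinels = new HashSet<>();
            sentinels.add("192.168.6.129:26379");
            sentinels.add("192.168.6.130:26379");
            jedisSentinelPool = new JedisSentinelPool("mymaster", sentinels, config);
            jedis = jedisSentinelPool.getResource();
            for (int i = 1; i < 6; i++) {
                jedis.del("master:" + i);
            }
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            // Return the connection to the pool exactly once, then shut the pool down
            if (jedis != null) {
                jedis.close();
            }
            if (jedisSentinelPool != null) {
                jedisSentinelPool.close();
            }
        }
    }
}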
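How Sentinel mode works
- Every 10 seconds, each Sentinel node sends the info command to the master and the replicas to obtain the latest topology.
- Every 2 seconds, each Sentinel node publishes its judgment of the master, along with its own information, to the __sentinel__:hello channel on the Redis data nodes. Every Sentinel node also subscribes to this channel to learn about the other Sentinel nodes and their judgments of the master.
- Every second, each Sentinel node sends a ping command to the master, the replicas, and the other Sentinel nodes as a heartbeat check to confirm that they are currently reachable.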
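The three intervals above, together with the down-after-milliseconds check, can be sketched as below. This is an illustration only: the class and method names are hypothetical, the task labels are shorthand for the three activities, and a real Sentinel uses its own event loop rather than whole-second ticks.

```java
import java.util.ArrayList;
import java.util.List;

final class SentinelTimersSketch {
    /** Lists which periodic tasks fire at a given whole second since startup. */
    static List<String> tasksDueAt(long second) {
        List<String> due = new ArrayList<>();
        due.add("PING");                          // heartbeat, every 1 second
        if (second % 2 == 0) due.add("HELLO");    // publish to the hello channel, every 2 seconds
        if (second % 10 == 0) due.add("INFO");    // refresh topology, every 10 seconds
        return due;
    }

    /** Subjective down: no valid reply for longer than down-after-milliseconds. */
    static boolean isSubjectivelyDown(long nowMs, long lastValidReplyMs, long downAfterMs) {
        return nowMs - lastValidReplyMs > downAfterMs;
    }

    public static void main(String[] args) {
        System.out.println(tasksDueAt(1));  // [PING]
        System.out.println(tasksDueAt(10)); // [PING, HELLO, INFO]
        // 30000 matches "sentinel down-after-milliseconds mymaster 30000"
        System.out.println(isSubjectivelyDown(30001, 0, 30000)); // true
    }
}
```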
Source: 《Redis开发与运维》 (Redis Development and Operations)