This article describes the stonith module in a Linux HA cluster.
Stonith, short for Shoot The Other Node In The Head, is used to prevent split-brain in a cluster. Simply put, once the nodes in a cluster lose communication with one another and can no longer know each other's state, every node will try to fence (isolate, or "shoot") the nodes it has lost contact with, making sure they can no longer compete for resources, and only then will it start service resources and serve clients.
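For reference, fencing in Pacemaker is governed by the stonith-enabled cluster property, which defaults to true; it can be set explicitly with pcs:
# pcs property set stonith-enabled=true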
1. Installing Stonith and an Overview of Its Agents
Install the fence-agents package on the three cluster hosts.
# yum -y install fence-agents
After installation you can list the stonith device types supported by the system:
[root@ha-host1 ~]# pcs stonith list
fence_apc - Fence agent for APC over telnet/ssh
fence_apc_snmp - Fence agent for APC, Tripplite PDU over SNMP
fence_bladecenter - Fence agent for IBM BladeCenter
fence_brocade - Fence agent for HP Brocade over telnet/ssh
fence_cisco_mds - Fence agent for Cisco MDS
fence_cisco_ucs - Fence agent for Cisco UCS
fence_compute - Fence agent for the automatic resurrection of OpenStack compute instances
fence_drac5 - Fence agent for Dell DRAC CMC/5
fence_eaton_snmp - Fence agent for Eaton over SNMP
fence_emerson - Fence agent for Emerson over SNMP
fence_eps - Fence agent for ePowerSwitch
fence_evacuate - Fence agent for the automatic resurrection of OpenStack compute instances
fence_hpblade - Fence agent for HP BladeSystem
fence_ibmblade - Fence agent for IBM BladeCenter over SNMP
fence_idrac - Fence agent for IPMI
fence_ifmib - Fence agent for IF MIB
fence_ilo - Fence agent for HP iLO
fence_ilo2 - Fence agent for HP iLO
fence_ilo3 - Fence agent for IPMI
fence_ilo3_ssh - Fence agent for HP iLO over SSH
fence_ilo4 - Fence agent for IPMI
fence_ilo4_ssh - Fence agent for HP iLO over SSH
fence_ilo_moonshot - Fence agent for HP Moonshot iLO
fence_ilo_mp - Fence agent for HP iLO MP
fence_ilo_ssh - Fence agent for HP iLO over SSH
fence_imm - Fence agent for IPMI
fence_intelmodular - Fence agent for Intel Modular
fence_ipdu - Fence agent for iPDU over SNMP
fence_ipmilan - Fence agent for IPMI
fence_kdump - Fence agent for use with kdump
fence_mpath - Fence agent for multipath persistent reservation
fence_rhevm - Fence agent for RHEV-M REST API
fence_rsa - Fence agent for IBM RSA
fence_rsb - I/O Fencing agent for Fujitsu-Siemens RSB
fence_sbd - Fence agent for sbd
fence_scsi - Fence agent for SCSI persistent reservation
fence_virt - Fence agent for virtual machines
fence_vmware_soap - Fence agent for VMWare over SOAP API
fence_wti - Fence agent for WTI
fence_xvm - Fence agent for virtual machines
Each fence agent in the output above represents a type of stonith device. As the name suffixes suggest, these agents fall into the following categories:
- Agents that power off the fenced node through the server's management interface, such as ilo, ipmi, and drac. The vast majority of agents belong to this category; they are used to control physical server nodes.
- Agents that shut down the fenced node through the hypervisor layer or a cloud platform, such as virt, vmware, xvm, and compute; these are used to control virtual machine nodes.
- Agents that stop the fenced node by blocking its access to a particular resource, such as scsi, mpath, and brocade.
The first two categories are power-based stonith devices, while the third has nothing to do with power. The distinction matters for the following reasons:
- With a non-power stonith device, the fenced node is not powered off; its services simply cannot run. Before the node can be brought back it must be unfenced, and only then can it restart normally. Therefore, when creating this type of stonith device, the parameter meta provides=unfencing must be specified.
- With a power-based stonith device no such parameter is needed, because the fenced node has been powered off, and powering it back on is itself the unfencing operation.
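The parameters accepted by each agent, including whether it requires unfencing, can be inspected with pcs stonith describe; for example, for the fence_scsi agent used below (output omitted here):
[root@ha-host1 ~]# pcs stonith describe fence_scsi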
2. Creating a Stonith Device
The experiment below uses fence_scsi as an example.
2.1 Creating Shared Storage
Following the method described in "Configuring iSCSI on CentOS 7", a dedicated storage node, ha-disks, provides shared storage to the three cluster hosts (that is, iSCSI disks are created on ha-disks and then mapped to the three cluster hosts).
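For reference, a minimal sketch of the client-side steps on each cluster host, assuming the target has already been exported on ha-disks as described in that article (the portal name is illustrative):
# yum -y install iscsi-initiator-utils
# iscsiadm -m discovery -t sendtargets -p ha-disks
# iscsiadm -m node -l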
Create three 100 MB disks, fen1, fen2, and fen3, on ha-disks; once attached to the hosts, their device names are sdb, sdc, and sdd:
[root@ha-host1 ~]# fdisk -l | grep dev
Disk /dev/sda: 42.9 GB, 42949672960 bytes, 83886080 sectors
/dev/sda1 2048 4095 1024 83 Linux
/dev/sda2 * 4096 2101247 1048576 83 Linux
/dev/sda3 2101248 83886079 40892416 8e Linux LVM
Disk /dev/mapper/VolGroup00-LogVol00: 40.2 GB, 40231763968 bytes, 78577664 sectors
Disk /dev/mapper/VolGroup00-LogVol01: 1610 MB, 1610612736 bytes, 3145728 sectors
Disk /dev/sdb: 104 MB, 104857600 bytes, 204800 sectors
Disk /dev/sdc: 104 MB, 104857600 bytes, 204800 sectors
Disk /dev/sdd: 104 MB, 104857600 bytes, 204800 sectors
[root@ha-host2 ~]# fdisk -l | grep dev
Disk /dev/sda: 42.9 GB, 42949672960 bytes, 83886080 sectors
/dev/sda1 2048 4095 1024 83 Linux
/dev/sda2 * 4096 2101247 1048576 83 Linux
/dev/sda3 2101248 83886079 40892416 8e Linux LVM
Disk /dev/mapper/VolGroup00-LogVol00: 40.2 GB, 40231763968 bytes, 78577664 sectors
Disk /dev/mapper/VolGroup00-LogVol01: 1610 MB, 1610612736 bytes, 3145728 sectors
Disk /dev/sdb: 104 MB, 104857600 bytes, 204800 sectors
Disk /dev/sdc: 104 MB, 104857600 bytes, 204800 sectors
Disk /dev/sdd: 104 MB, 104857600 bytes, 204800 sectors
[root@ha-host3 ~]# fdisk -l | grep dev
Disk /dev/sda: 42.9 GB, 42949672960 bytes, 83886080 sectors
/dev/sda1 2048 4095 1024 83 Linux
/dev/sda2 * 4096 2101247 1048576 83 Linux
/dev/sda3 2101248 83886079 40892416 8e Linux LVM
Disk /dev/mapper/VolGroup00-LogVol00: 40.2 GB, 40231763968 bytes, 78577664 sectors
Disk /dev/mapper/VolGroup00-LogVol01: 1610 MB, 1610612736 bytes, 3145728 sectors
Disk /dev/sdb: 104 MB, 104857600 bytes, 204800 sectors
Disk /dev/sdc: 104 MB, 104857600 bytes, 204800 sectors
Disk /dev/sdd: 104 MB, 104857600 bytes, 204800 sectors
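The sg_persist utility used below is provided by the sg3_utils package; install it first if it is missing:
# yum -y install sg3_utils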
Test whether these disks support PR (Persistent Reservation) keys:
[root@ha-host1 ~]# sg_persist /dev/sdc
>> No service action given; assume Persistent Reserve In command
>> with Read Keys service action
LIO-ORG fen2 4.0
Peripheral device type: disk
PR generation=0x5, there are NO registered reservation keys
2.2 Creating the Stonith Device
Start by experimenting with a single fence disk, /dev/sdb:
[root@ha-host1 ~]# pcs stonith create scsi-shooter fence_scsi pcmk_host_list="ha-host1 ha-host2 ha-host3" devices=/dev/sdb meta provides=unfencing
[root@ha-host1 ~]# pcs status
Cluster name: linuxha
Stack: corosync
Current DC: ha-host2 (version 1.1.16-12.el7_4.8-94ff4df) - partition with quorum
Last updated: Fri May 4 07:10:33 2018
Last change: Fri May 4 07:07:14 2018 by root via cibadmin on ha-host1
3 nodes configured
2 resources configured
Online: [ ha-host1 ha-host2 ha-host3 ]
Full list of resources:
vip (ocf::heartbeat:IPaddr2): Started ha-host1
scsi-shooter (stonith:fence_scsi): Started ha-host2
Daemon Status:
corosync: active/enabled
pacemaker: active/enabled
pcsd: active/enabled
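The device's configuration, including the pcmk_host_list and devices options set above, can be reviewed at any time (output omitted here):
[root@ha-host1 ~]# pcs stonith show scsi-shooter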
Use the -s option of sg_persist to dump the full persistent reservation state of /dev/sdb:
[root@ha-host1 ~]# sg_persist -s /dev/sdb
LIO-ORG fen1 4.0
Peripheral device type: disk
PR generation=0xa
Key=0x35fc0000
All target ports bit clear
Relative port address: 0x1
<< Reservation holder >>
scope: LU_SCOPE, type: Write Exclusive, registrants only
Transport Id of initiator:
iSCSI name and session id: iqn.2016-06.com.ha-host1:iscsi-host1
Key=0x35fc0001
All target ports bit clear
Relative port address: 0x1
not reservation holder
Transport Id of initiator:
iSCSI name and session id: iqn.2016-06.com.ha-host2:iscsi-host2
Key=0x35fc0002
All target ports bit clear
Relative port address: 0x1
not reservation holder
Transport Id of initiator:
iSCSI name and session id: iqn.2016-06.com.ha-host3:iscsi-host3
As shown above, the three nodes have each registered on this disk with a different PR key, and ha-host1 holds the reservation, whose type is "Write Exclusive, registrants only". This means that only initiators registered on the disk may write to it; fencing a node therefore amounts to removing its registration.
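To query just the current reservation holder rather than the full state, sg_persist also provides a read-reservation action:
[root@ha-host1 ~]# sg_persist -r /dev/sdb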
Now suppose the link between two of the nodes is broken, say between ha-host1 and ha-host3:
[root@ha-host1 ~]# pcs status
Cluster name: linuxha
Stack: corosync
Current DC: ha-host2 (version 1.1.16-12.el7_4.8-94ff4df) - partition with quorum
Last updated: Fri May 4 07:30:53 2018
Last change: Fri May 4 07:07:13 2018 by root via cibadmin on ha-host1
3 nodes configured
2 resources configured
Node ha-host3: UNCLEAN (offline)
Online: [ ha-host1 ha-host2 ]
Full list of resources:
vip (ocf::heartbeat:IPaddr2): Started ha-host1
scsi-shooter (stonith:fence_scsi): Started ha-host2
Daemon Status:
corosync: active/enabled
pacemaker: active/enabled
pcsd: active/enabled
[root@ha-host1 ~]# sg_persist -s /dev/sdb
LIO-ORG fen1 4.0
Peripheral device type: disk
PR generation=0xb
Key=0x35fc0000
All target ports bit clear
Relative port address: 0x1
<< Reservation holder >>
scope: LU_SCOPE, type: Write Exclusive, registrants only
Transport Id of initiator:
iSCSI name and session id: iqn.2016-06.com.ha-host1:iscsi-host1
Key=0x35fc0001
All target ports bit clear
Relative port address: 0x1
not reservation holder
Transport Id of initiator:
iSCSI name and session id: iqn.2016-06.com.ha-host2:iscsi-host2
As shown, after renegotiation ha-host3 has left the cluster, and its registration on the fencing disk has been deleted as well. Since the stonith resource runs on ha-host2, the process of fencing ha-host3 can be seen in ha-host2's log:
[root@ha-host2 ~]# tail -1000 /var/log/cluster/corosync.log | grep ha-host3
May 04 07:30:51 [1437] ha-host2 pengine: notice: LogNodeActions: * Fence (reboot) ha-host3 'peer is no longer part of the cluster'
May 04 07:30:51 [1438] ha-host2 crmd: notice: te_fence_node: Requesting fencing (reboot) of node ha-host3 | action=1 timeout=60000
May 04 07:30:51 [1434] ha-host2 stonith-ng: notice: handle_request: Client crmd.1438.0cea319b wants to fence (reboot) 'ha-host3' with device '(any)'
May 04 07:30:51 [1434] ha-host2 stonith-ng: notice: initiate_remote_stonith_op: Requesting peer fencing (reboot) of ha-host3 | id=0cf426c7-666f-4299-8285-fa500fa5ac09 state=0
May 04 07:30:52 [1434] ha-host2 stonith-ng: notice: can_fence_host_with_device: scsi-shooter can fence (reboot) ha-host3: static-list
May 04 07:30:52 [1434] ha-host2 stonith-ng: info: process_remote_stonith_query: Query result 1 of 2 from ha-host2 for ha-host3/reboot (1 devices) 0cf426c7-666f-4299-8285-fa500fa5ac09
May 04 07:30:52 [1434] ha-host2 stonith-ng: info: call_remote_stonith: Total timeout set to 60 for peer's fencing of ha-host3 for crmd.1438|id=0cf426c7-666f-4299-8285-fa500fa5ac09
May 04 07:30:52 [1434] ha-host2 stonith-ng: info: call_remote_stonith: Requesting that 'ha-host2' perform op 'ha-host3 reboot' for crmd.1438 (72s, 0s)
May 04 07:30:52 [1434] ha-host2 stonith-ng: info: process_remote_stonith_query: Query result 2 of 2 from ha-host1 for ha-host3/reboot (1 devices) 0cf426c7-666f-4299-8285-fa500fa5ac09
May 04 07:30:52 [1434] ha-host2 stonith-ng: notice: can_fence_host_with_device: scsi-shooter can fence (reboot) ha-host3: static-list
May 04 07:30:52 [1434] ha-host2 stonith-ng: info: stonith_fence_get_devices_cb: Found 1 matching devices for 'ha-host3'
May 04 07:30:53 [1434] ha-host2 stonith-ng: warning: log_action: fence_scsi[2603] stderr: [ WARNING:root:Parse error: Ignoring unknown option 'port=ha-host3' ]
May 04 07:30:53 [1434] ha-host2 stonith-ng: notice: log_operation: Operation 'reboot' [2603] (call 6 from crmd.1438) for host 'ha-host3' with device 'scsi-shooter' returned: 0 (OK)
May 04 07:30:53 [1434] ha-host2 stonith-ng: notice: remote_op_done: Operation reboot of ha-host3 by ha-host2 for crmd.1438@ha-host2.0cf426c7: OK
May 04 07:30:53 [1438] ha-host2 crmd: info: tengine_stonith_callback: Stonith operation 6 for ha-host3 passed
May 04 07:30:53 [1438] ha-host2 crmd: info: crm_update_peer_expected: crmd_peer_down: Node ha-host3[3] - expected state is now down (was member)
After ha-host3 has been fenced, it must be rebooted before its PR key can be registered again; otherwise, even if the network recovers, it cannot run any resource that requires stonith support.
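After the reboot, the registrations can be verified with the read-keys action of sg_persist; all three keys should be listed again:
[root@ha-host1 ~]# sg_persist -k /dev/sdb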
Question: the quorum mechanism already guarantees that resources can only be started in a partition holding more than half of the nodes, so why is a stonith device still needed?
- When the cluster has only two nodes, a partition must be allowed to start resources even when it contains just one node, and in that case a stonith device is mandatory (see the corosync.conf fragment after this list).
- The quorum mechanism is based on communication between the cluster processes on the hosts. It is possible for those processes to have died while the resource processes are still running (the resources are then unmanaged); such a node has already left the cluster but will still compete for resources.
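For reference, the two-node case is usually handled by corosync's two_node mode, which allows a single surviving node to retain quorum; a sketch of the relevant corosync.conf fragment (values illustrative):
quorum {
    provider: corosync_votequorum
    two_node: 1
}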