Linux HA Cluster Principles and Configuration - 03

Author: 简暴老王 | Published 2018-09-04 14:11

    This article describes the STONITH module in a Linux HA cluster.

    STONITH, short for Shoot The Other Node In The Head, is used to prevent split-brain in the cluster. Put simply, once the nodes in the cluster lose communication with one another and can no longer determine each other's state, every node will attempt to fence (isolate, or "shoot") the nodes it can no longer reach, making sure those nodes cannot contend for resources, and only then will it start service resources and continue serving clients.
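
    Fencing is only honored when Pacemaker's cluster-wide stonith-enabled property is turned on (it is on by default). A quick way to check and set it on any of the cluster hosts, assuming pcs is already managing the cluster, is shown below.

    # Show the current value of the cluster-wide fencing switch (defaults to true)
    pcs property list --all | grep stonith-enabled

    # Make sure fencing is enabled before relying on it
    pcs property set stonith-enabled=true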

    1. Installing STONITH and an Overview of Its Agents

    Install the fence-agents package on all three cluster hosts.

    # yum -y install fence-agents
    

    After installation, the STONITH device types supported by the system can be listed:

    [root@ha-host1 ~]# pcs stonith list
    fence_apc - Fence agent for APC over telnet/ssh
    fence_apc_snmp - Fence agent for APC, Tripplite PDU over SNMP
    fence_bladecenter - Fence agent for IBM BladeCenter
    fence_brocade - Fence agent for HP Brocade over telnet/ssh
    fence_cisco_mds - Fence agent for Cisco MDS
    fence_cisco_ucs - Fence agent for Cisco UCS
    fence_compute - Fence agent for the automatic resurrection of OpenStack compute
                    instances
    fence_drac5 - Fence agent for Dell DRAC CMC/5
    fence_eaton_snmp - Fence agent for Eaton over SNMP
    fence_emerson - Fence agent for Emerson over SNMP
    fence_eps - Fence agent for ePowerSwitch
    fence_evacuate - Fence agent for the automatic resurrection of OpenStack compute
                     instances
    fence_hpblade - Fence agent for HP BladeSystem
    fence_ibmblade - Fence agent for IBM BladeCenter over SNMP
    fence_idrac - Fence agent for IPMI
    fence_ifmib - Fence agent for IF MIB
    fence_ilo - Fence agent for HP iLO
    fence_ilo2 - Fence agent for HP iLO
    fence_ilo3 - Fence agent for IPMI
    fence_ilo3_ssh - Fence agent for HP iLO over SSH
    fence_ilo4 - Fence agent for IPMI
    fence_ilo4_ssh - Fence agent for HP iLO over SSH
    fence_ilo_moonshot - Fence agent for HP Moonshot iLO
    fence_ilo_mp - Fence agent for HP iLO MP
    fence_ilo_ssh - Fence agent for HP iLO over SSH
    fence_imm - Fence agent for IPMI
    fence_intelmodular - Fence agent for Intel Modular
    fence_ipdu - Fence agent for iPDU over SNMP
    fence_ipmilan - Fence agent for IPMI
    fence_kdump - Fence agent for use with kdump
    fence_mpath - Fence agent for multipath persistent reservation
    fence_rhevm - Fence agent for RHEV-M REST API
    fence_rsa - Fence agent for IBM RSA
    fence_rsb - I/O Fencing agent for Fujitsu-Siemens RSB
    fence_sbd - Fence agent for sbd
    fence_scsi - Fence agent for SCSI persistent reservation
    fence_virt - Fence agent for virtual machines
    fence_vmware_soap - Fence agent for VMWare over SOAP API
    fence_wti - Fence agent for WTI
    fence_xvm - Fence agent for virtual machines
    

    Each fence agent in the output above is a kind of STONITH device. As their name suffixes suggest, the agents fall into the following categories:

    1. Agents that power off the fenced node through the server's management interface, such as ilo, ipmi and drac. The vast majority of agents belong to this category; they are used to control physical server nodes.
    2. Agents that shut the fenced node down through the hypervisor layer or a cloud platform, such as virt, vmware, xvm and compute; these are used to control virtual machine nodes.
    3. Agents that block the fenced node from accessing a specific resource, preventing its services from starting, such as scsi, mpath and brocade.

    The first two categories are power-based STONITH devices, whereas the third has nothing to do with power. The distinction matters because:

    • With a non-power STONITH device, the fenced node is not powered off; its services simply cannot start. Before it is restarted it must be unfenced so that it can come back up normally. When creating this kind of STONITH device you must therefore specify the meta attribute meta provides=unfencing.
    • With a power-based STONITH device this is unnecessary, because the fenced node has been powered off, and powering it back on is itself the unfencing step.
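
    Before creating a device of either type, you can check which parameters a particular agent accepts, for example:

    # List the required and optional parameters of the fence_scsi agent
    pcs stonith describe fence_scsi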

    2. Creating a STONITH Device

    The following uses fence_scsi as an example.

    2.1 Creating Shared Storage

    Following the method described in 《在CentOS7上配置iSCSI》 (Configuring iSCSI on CentOS 7), a dedicated storage node, ha-disks, provides shared storage for the three cluster hosts (that is, iSCSI disks are created on ha-disks and then mapped to the three cluster hosts).

    Create three 100 MB disks, fen1, fen2 and fen3, on iscsi-disks; once they are mapped to the cluster hosts the device names are sdb, sdc and sdd.
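
    A minimal sketch of attaching the LUNs on each cluster host is shown below; the portal IP 192.168.10.14 is only an example and must be replaced with the address of the storage node, and the target names come from the discovery output:

    # Discover the targets exported by the storage node (example portal IP)
    iscsiadm -m discovery -t sendtargets -p 192.168.10.14

    # Log in to the discovered targets; the LUNs then appear as /dev/sdb, /dev/sdc, /dev/sdd
    iscsiadm -m node -l

    # Re-attach the targets automatically after a reboot
    iscsiadm -m node -o update -n node.startup -v automatic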

    [root@ha-host1 ~]# fdisk -l | grep dev
    Disk /dev/sda: 42.9 GB, 42949672960 bytes, 83886080 sectors
    /dev/sda1            2048        4095        1024   83  Linux
    /dev/sda2   *        4096     2101247     1048576   83  Linux
    /dev/sda3         2101248    83886079    40892416   8e  Linux LVM
    Disk /dev/mapper/VolGroup00-LogVol00: 40.2 GB, 40231763968 bytes, 78577664 sectors
    Disk /dev/mapper/VolGroup00-LogVol01: 1610 MB, 1610612736 bytes, 3145728 sectors
    Disk /dev/sdb: 104 MB, 104857600 bytes, 204800 sectors
    Disk /dev/sdc: 104 MB, 104857600 bytes, 204800 sectors
    Disk /dev/sdd: 104 MB, 104857600 bytes, 204800 sectors
    
    [root@ha-host2 ~]# fdisk -l | grep dev
    Disk /dev/sda: 42.9 GB, 42949672960 bytes, 83886080 sectors
    /dev/sda1            2048        4095        1024   83  Linux
    /dev/sda2   *        4096     2101247     1048576   83  Linux
    /dev/sda3         2101248    83886079    40892416   8e  Linux LVM
    Disk /dev/mapper/VolGroup00-LogVol00: 40.2 GB, 40231763968 bytes, 78577664 sectors
    Disk /dev/mapper/VolGroup00-LogVol01: 1610 MB, 1610612736 bytes, 3145728 sectors
    Disk /dev/sdb: 104 MB, 104857600 bytes, 204800 sectors
    Disk /dev/sdc: 104 MB, 104857600 bytes, 204800 sectors
    Disk /dev/sdd: 104 MB, 104857600 bytes, 204800 sectors
    
    [root@ha-host3 ~]# fdisk -l | grep dev
    Disk /dev/sda: 42.9 GB, 42949672960 bytes, 83886080 sectors
    /dev/sda1            2048        4095        1024   83  Linux
    /dev/sda2   *        4096     2101247     1048576   83  Linux
    /dev/sda3         2101248    83886079    40892416   8e  Linux LVM
    Disk /dev/mapper/VolGroup00-LogVol00: 40.2 GB, 40231763968 bytes, 78577664 sectors
    Disk /dev/mapper/VolGroup00-LogVol01: 1610 MB, 1610612736 bytes, 3145728 sectors
    Disk /dev/sdb: 104 MB, 104857600 bytes, 204800 sectors
    Disk /dev/sdc: 104 MB, 104857600 bytes, 204800 sectors
    Disk /dev/sdd: 104 MB, 104857600 bytes, 204800 sectors
    

    Check whether these disks support PR (persistent reservation) keys:

    [root@ha-host1 ~]# sg_persist /dev/sdc
    >> No service action given; assume Persistent Reserve In command
    >> with Read Keys service action
      LIO-ORG   fen2              4.0
      Peripheral device type: disk
      PR generation=0x5, there are NO registered reservation keys
    
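    The same check can be run against all three mapped disks, for example:

    # Read the registered PR keys on each candidate fence disk (none are registered yet)
    for d in /dev/sdb /dev/sdc /dev/sdd; do
        echo "== $d =="
        sg_persist --in --read-keys $d
    done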

    2.2 Creating the STONITH Device

    Start by experimenting with a single fence disk, /dev/sdb:

    [root@ha-host1 ~]# pcs stonith create scsi-shooter fence_scsi pcmk_host_list="ha-host1 ha-host2 ha-host3" devices=/dev/sdb meta provides=unfencing
    [root@ha-host1 ~]# pcs status            
    Cluster name: linuxha
    Stack: corosync
    Current DC: ha-host2 (version 1.1.16-12.el7_4.8-94ff4df) - partition with quorum
    Last updated: Fri May  4 07:10:33 2018
    Last change: Fri May  4 07:07:14 2018 by root via cibadmin on ha-host1
    
    3 nodes configured
    2 resources configured
    
    Online: [ ha-host1 ha-host2 ha-host3 ]
    
    Full list of resources:
    
     vip    (ocf::heartbeat:IPaddr2):       Started ha-host1
     scsi-shooter   (stonith:fence_scsi):   Started ha-host2
    
    Daemon Status:
      corosync: active/enabled
      pacemaker: active/enabled
      pcsd: active/enabled
    
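    The device's options, including the unfencing meta attribute, can be verified with a command along these lines:

    # Display the configuration of the stonith resource created above
    pcs stonith show scsi-shooter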

    Use sg_persist with the -s option to dump all of the PR information on /dev/sdb:

    [root@ha-host1 ~]# sg_persist -s /dev/sdb
      LIO-ORG   fen1              4.0
      Peripheral device type: disk
      PR generation=0xa
        Key=0x35fc0000
          All target ports bit clear
          Relative port address: 0x1
          << Reservation holder >>
          scope: LU_SCOPE,  type: Write Exclusive, registrants only
          Transport Id of initiator:
            iSCSI name and session id: iqn.2016-06.com.ha-host1:iscsi-host1
        Key=0x35fc0001
          All target ports bit clear
          Relative port address: 0x1
          not reservation holder
          Transport Id of initiator:
            iSCSI name and session id: iqn.2016-06.com.ha-host2:iscsi-host2
        Key=0x35fc0002
          All target ports bit clear
          Relative port address: 0x1
          not reservation holder
          Transport Id of initiator:
            iSCSI name and session id: iqn.2016-06.com.ha-host3:iscsi-host3
    

    As you can see, the three nodes have each registered on the disk with a different PR key, and ha-host1 holds the reservation, whose type is "Write Exclusive, registrants only", meaning that only initiators registered on the disk are allowed to write to it.
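
    Conceptually, fencing a node with fence_scsi amounts to preempting (removing) that node's key from the device. Purely as an illustration of the underlying SCSI-3 PR operation (not literally what the agent executes), a manual equivalent with sg_persist, using the keys shown above, might look like this:

    # Illustration only: from ha-host1 (key 0x35fc0000), preempt and abort the
    # registration of ha-host3 (key 0x35fc0002), keeping the type-5 reservation
    # ("Write Exclusive, registrants only") on the fence disk.
    sg_persist --out --preempt-abort \
               --param-rk=0x35fc0000 --param-sark=0x35fc0002 \
               --prout-type=5 /dev/sdb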

    Now, if the link between two of the nodes is broken, for example between ha-host1 and ha-host3:

    [root@ha-host1 ~]# pcs status
    Cluster name: linuxha
    Stack: corosync
    Current DC: ha-host2 (version 1.1.16-12.el7_4.8-94ff4df) - partition with quorum
    Last updated: Fri May  4 07:30:53 2018
    Last change: Fri May  4 07:07:13 2018 by root via cibadmin on ha-host1
    
    3 nodes configured
    2 resources configured
    
    Node ha-host3: UNCLEAN (offline)
    Online: [ ha-host1 ha-host2 ]
    
    Full list of resources:
    
     vip    (ocf::heartbeat:IPaddr2):       Started ha-host1
     scsi-shooter   (stonith:fence_scsi):   Started ha-host2
    
    Daemon Status:
      corosync: active/enabled
      pacemaker: active/enabled
      pcsd: active/enabled
    [root@ha-host1 ~]# sg_persist -s /dev/sdb
      LIO-ORG   fen1              4.0
      Peripheral device type: disk
      PR generation=0xb
        Key=0x35fc0000
          All target ports bit clear
          Relative port address: 0x1
          << Reservation holder >>
          scope: LU_SCOPE,  type: Write Exclusive, registrants only
          Transport Id of initiator:
            iSCSI name and session id: iqn.2016-06.com.ha-host1:iscsi-host1
        Key=0x35fc0001
          All target ports bit clear
          Relative port address: 0x1
          not reservation holder
          Transport Id of initiator:
            iSCSI name and session id: iqn.2016-06.com.ha-host2:iscsi-host2
    

    As shown above, after the cluster re-negotiated membership, ha-host3 left the cluster and its registration was also removed from the fencing disk. Since the stonith resource runs on ha-host2, the fencing of ha-host3 can be traced in ha-host2's log:

    [root@ha-host2 ~]# tail -1000 /var/log/cluster/corosync.log | grep ha-host3
    May 04 07:30:51 [1437] ha-host2    pengine:   notice: LogNodeActions:    * Fence (reboot) ha-host3 'peer is no longer part of the cluster'
    May 04 07:30:51 [1438] ha-host2       crmd:   notice: te_fence_node:    Requesting fencing (reboot) of node ha-host3 | action=1 timeout=60000
    May 04 07:30:51 [1434] ha-host2 stonith-ng:   notice: handle_request:   Client crmd.1438.0cea319b wants to fence (reboot) 'ha-host3' with device '(any)'
    May 04 07:30:51 [1434] ha-host2 stonith-ng:   notice: initiate_remote_stonith_op:       Requesting peer fencing (reboot) of ha-host3 | id=0cf426c7-666f-4299-8285-fa500fa5ac09 state=0
    May 04 07:30:52 [1434] ha-host2 stonith-ng:   notice: can_fence_host_with_device:       scsi-shooter can fence (reboot) ha-host3: static-list
    May 04 07:30:52 [1434] ha-host2 stonith-ng:     info: process_remote_stonith_query:     Query result 1 of 2 from ha-host2 for ha-host3/reboot (1 devices) 0cf426c7-666f-4299-8285-fa500fa5ac09
    May 04 07:30:52 [1434] ha-host2 stonith-ng:     info: call_remote_stonith:     Total timeout set to 60 for peer's fencing of ha-host3 for crmd.1438|id=0cf426c7-666f-4299-8285-fa500fa5ac09
    May 04 07:30:52 [1434] ha-host2 stonith-ng:     info: call_remote_stonith:     Requesting that 'ha-host2' perform op 'ha-host3 reboot' for crmd.1438 (72s, 0s)
    May 04 07:30:52 [1434] ha-host2 stonith-ng:     info: process_remote_stonith_query:     Query result 2 of 2 from ha-host1 for ha-host3/reboot (1 devices) 0cf426c7-666f-4299-8285-fa500fa5ac09
    May 04 07:30:52 [1434] ha-host2 stonith-ng:   notice: can_fence_host_with_device:       scsi-shooter can fence (reboot) ha-host3: static-list
    May 04 07:30:52 [1434] ha-host2 stonith-ng:     info: stonith_fence_get_devices_cb:     Found 1 matching devices for 'ha-host3'
    May 04 07:30:53 [1434] ha-host2 stonith-ng:  warning: log_action:       fence_scsi[2603] stderr: [ WARNING:root:Parse error: Ignoring unknown option 'port=ha-host3' ]
    May 04 07:30:53 [1434] ha-host2 stonith-ng:   notice: log_operation:    Operation 'reboot' [2603] (call 6 from crmd.1438) for host 'ha-host3' with device 'scsi-shooter' returned: 0 (OK)
    May 04 07:30:53 [1434] ha-host2 stonith-ng:   notice: remote_op_done:   Operation reboot of ha-host3 by ha-host2 for crmd.1438@ha-host2.0cf426c7: OK
    May 04 07:30:53 [1438] ha-host2       crmd:     info: tengine_stonith_callback:Stonith operation 6 for ha-host3 passed
    May 04 07:30:53 [1438] ha-host2       crmd:     info: crm_update_peer_expected:crmd_peer_down: Node ha-host3[3] - expected state is now down (was member)
    

    After ha-host3 has been fenced, it must be rebooted before it can re-register its PR key; otherwise, even if the network recovers, the node cannot run any resource that requires STONITH support.
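
    A rough recovery sequence on ha-host3, assuming the network problem has been fixed, is shown below.

    # On ha-host3: reboot so the node comes back in a clean, unfenced state
    reboot

    # After the reboot, rejoin the cluster; unfencing re-registers the node's PR key
    pcs cluster start

    # Confirm on any node that ha-host3's key is registered on the fence disk again
    sg_persist --in --read-keys /dev/sdb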

    Question: the quorum mechanism already guarantees that only a partition containing more than half of the nodes can start resources, so why is a STONITH device still needed?

    1. When the cluster has only two nodes, a partition consisting of a single node must be allowed to start resources, and in that case a STONITH device is mandatory.
    2. Quorum relies on communication between the cluster-stack processes (corosync/pacemaker) on the hosts. It is possible for those processes to have died while the resource processes they were managing are still running (i.e. the resources are left unmanaged); such a node has already left the cluster, yet it may still contend for resources.
