Small-Scale Architecture in Practice (Preparation): corosync Configuration and Caveats

Author: 飞翔的Tallgeese | Published: 2018-02-06 11:22

##################################

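Apart from the supplementary experiment on RA errors that comes first, the rest of this article is reposted from the article below.

Original article: corosync配置详解 (corosync configuration explained)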

##################################

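Important supplement: an experiment on RA resource errors

Test machine 1: 192.168.40.135 (LNMP01)

Test machine 2: 192.168.40.100 (Master01)

When configuring the nginx resource, the syntax found online is primitive server_name lsb:nginx. However, pressing Tab after lsb:ng completes nothing, which shows that the resource does not exist. (If you force the configuration anyway, verify reports an error and commit fails.)

nginx was in fact already installed on LNMP01. Comparing further with Master01 showed that LNMP01's RA list also lacked the lsb:mysqld resource.

So I installed mysql on LNMP01. Once the installation finished, crm → ra → list lsb showed the mysqld resource, which confirms that a service must be installed before it appears in the RA list.

But nginx was already deployed on LNMP01, so why did it not appear?

It turned out that the service must also have an init script under /etc/init.d.

After adding the script, I confirmed that nginx could be controlled with service nginx start/stop.

After that, nginx appeared in crm's list lsb output.

The primitive definition could now tab-complete nginx, and commit succeeded.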
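The check described above can be approximated before opening crm at all. The following is a minimal sketch, not how crmsh itself discovers agents: it assumes that a service is visible as an lsb: resource when an executable regular file of the same name exists in the init directory. The function name lsb_script_present and its optional second argument are hypothetical helpers introduced for illustration.

```shell
#!/bin/sh
# Sketch: report whether a service has an init script that could back an
# lsb:<name> resource.
# Assumption: "visible" means an executable regular file in the init
# directory (default /etc/init.d). lsb_script_present is a hypothetical helper.
lsb_script_present() {
    service_name=$1
    init_dir=${2:-/etc/init.d}
    [ -f "$init_dir/$service_name" ] && [ -x "$init_dir/$service_name" ]
}

# Example: check nginx on the local machine.
if lsb_script_present nginx; then
    echo "nginx: init script found"
else
    echo "nginx: no init script in /etc/init.d"
fi
```

Running the same check on every node before committing the primitive would presumably have caught the missing script on the second node as well.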

The test results are as follows:

e.g.

[root@LNMP01 init.d]# service nginx status

nginx is stopped

[root@LNMP01 init.d]# ps -ef|grep nginx

root 31266 20217 0 16:03 pts/0 00:00:00 grep nginx

[root@LNMP01 init.d]# crm

crm(live)# ra

crm(live)ra# list lsb

.....

nginx

......

crm(live)ra# cd

crm(live)# configure

crm(live)configure# primitive nginx_server lsb:nginx

crm(live)configure# verify

crm(live)configure# commit

crm(live)configure#

[root@LNMP01 init.d]# crm_mon

Attempting connection to the cluster...

Stack: classic openais (with plugin)

Current DC: LNMP02 (version 1.1.15-5.el6-e174ec8) - partition with quorum

Last updated: Tue Feb 6 16:07:15 2018 Last change: Tue Feb 6 16:06:08 2018 by root via cibadmin on LNMP01

, 2 expected votes

2 nodes and 3 resources configured

Online: [ LNMP01 LNMP02 ]

Active resources:

webvip (ocf::heartbeat:IPaddr): Started LNMP01

webstore (ocf::heartbeat:Filesystem): Started LNMP02

nginx_server (lsb:nginx): Started LNMP01

Failed Actions:

* nginx_server_monitor_0 on LNMP02 'not installed' (5): call=29, status=Not installed, exitreason='none',

last-rc-change='Tue Feb 6 16:06:10 2018', queued=0ms, exec=1ms

(The failure is reported because the nginx init.d script has not been set up on LNMP02 yet.)
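The number in parentheses in the Failed Actions entry is the return code of the probe. As a reading aid, here is a small sketch that maps the commonly seen return codes to the labels crm_mon typically prints. The helper name ocf_rc_name is hypothetical, and only codes 0 to 7 are covered.

```shell
#!/bin/sh
# Sketch: translate a resource-agent return code into the label usually
# shown by crm_mon. ocf_rc_name is a hypothetical helper; codes above 7
# are deliberately not covered here.
ocf_rc_name() {
    case "$1" in
        0) echo "ok" ;;
        1) echo "unknown error" ;;
        2) echo "invalid parameter" ;;
        3) echo "unimplemented feature" ;;
        4) echo "insufficient privileges" ;;
        5) echo "not installed" ;;
        6) echo "not configured" ;;
        7) echo "not running" ;;
        *) echo "other ($1)" ;;
    esac
}

echo "rc=5 means: $(ocf_rc_name 5)"
```

In the output above, 'not installed' (5) on LNMP02 therefore points at a missing script on that node rather than at a crashed nginx.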

##################################

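Only the part about CRM configuration is excerpted here.

1. What are corosync and pacemaker?

Introduction to Corosync:

Corosync lets you define how cluster messages are delivered, including the transport method and protocol, through a simple configuration file. It was released in 2008 and is often described as new software, but it is not really new: the OpenAIS project, started in 2002, grew too large and was split into two subprojects, and the part that carries HA heartbeat messages became Corosync. About 60% of its code comes from OpenAIS. Corosync can provide complete HA functionality on its own, but more advanced and complex features still require OpenAIS. Corosync is the direction things are heading, and new projects generally adopt it. hb_gui offers solid graphical HA management; another graphical option is luci + ricci from the RHCS suite.

pacemaker is an open-source high-availability cluster resource manager (CRM). In the HA cluster architecture it sits at the resource management and resource agent (RA) layer. It cannot transmit heartbeat messages itself, so to communicate with peer nodes it relies on an underlying heartbeat messaging service such as corosync.

pacemaker enables STONITH by default (stonith-enabled=true), but the current cluster has no STONITH device, so this default configuration cannot be used as-is.

Note: STONITH stands for "Shoot The Other Node In The Head" and is part of the Heartbeat package. It lets the cluster automatically reset a failed server through a remote power device attached to a healthy server. Put simply, a STONITH device accepts a signal from one host and cuts the power of the node that can no longer send heartbeats, which prevents the two nodes from contending for the same resources.

If node2 is stopped at this point, it can no longer send heartbeats, so node3 concludes that node2 has failed and immediately becomes the DC. In this two-node cluster neither node holds quorum on its own (partition WITHOUT quorum). Once node2 is started again, the partition regains quorum (partition with quorum).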
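The quorum behaviour described here follows from simple majority arithmetic. The sketch below assumes the usual rule that a partition has quorum only when it holds strictly more than half of the expected votes. has_quorum is a hypothetical helper, not a pacemaker command.

```shell
#!/bin/sh
# Sketch: majority-based quorum check.
# Assumption: quorum requires strictly more than half of the expected votes.
has_quorum() {
    votes_present=$1
    expected_votes=$2
    [ $((votes_present * 2)) -gt "$expected_votes" ]
}

# Two-node cluster: one surviving node holds 1 of 2 expected votes.
if has_quorum 1 2; then
    echo "1 of 2 votes: partition with quorum"
else
    echo "1 of 2 votes: partition WITHOUT quorum"
fi
```

With two expected votes, a single node can never hold a majority, which is why the surviving node reports partition WITHOUT quorum until its peer returns.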

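What is crmsh?

pacemaker itself is only a resource manager; to define and manage resources on pacemaker we need an interface, and crmsh is that configuration interface. Starting with pacemaker 1.1.8, crmsh became an independent project and is no longer shipped with pacemaker. crmsh provides an interactive command-line interface for managing a Pacemaker cluster. It is powerful yet easy to use and is widely deployed; pcs is a comparable tool.

Note: configuration made through the crm interface is synchronized to every node.

Features of crm:

1. Configuration changes take effect only after they are submitted with commit.

2. A resource must be stopped before it can be deleted.

3. help COMMAND shows the help for that command.

4. Like the Linux command line, it supports Tab completion.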

[iyunv@essun corosync]# crm

crm(live)# help # list the currently available commands

# Top-level subcommands

This is crm shell, a Pacemaker command line interface.

Available commands:

cib manage shadow CIBs # a sandbox copy of the CIB

resource resources management # manage resources (start, stop, migrate, ...)

configure CRM cluster configuration # edit the cluster configuration

node nodes management

options user preferences

history CRM cluster history

site Geo-cluster support

ra resource agents information center # everything related to resource agents lives under this subcommand

status show cluster status

help,? show help (help topics for list of topics) # show the commands available at the current level

end,cd,up go back one level

quit,bye,exit exit the program # leave the interactive crm(live) shell

crm(live)resource# help

Available commands:

status show status of resources

start start a resource

stop stop a resource

restart restart a resource

promote promote a master-slave resource

demote demote a master-slave resource

manage put a resource into managed mode

unmanage put a resource into unmanaged mode

migrate migrate a resource to another node

unmigrate unmigrate a resource to another node

param manage a parameter of a resource

secret manage sensitive parameters

meta manage a meta attribute

utilization manage a utilization attribute

failcount manage failcounts # manage the failure counters

cleanup cleanup resource status

refresh refresh CIB from the LRM status # LRM = Local Resource Manager, CIB = Cluster Information Base

reprobe probe for resources not started by the CRM

trace start RA tracing # enable resource agent (RA) tracing

untrace stop RA tracing # disable resource agent (RA) tracing

help show help (help topics for list of topics)

end go back one level # back to crm(live)#

quit exit the program # leave the interactive shell

crm(live)configure# help

Available commands:

node define a cluster node

primitive define a resource

monitor add monitor operation to a primitive # add a monitor operation to a resource (e.g. interval, timeout, action on failure)

group define a group # combine several resources into one group

clone define a clone # you can set the total number of clones and how many may run on each node

ms define a master-slave resource # only one node in the cluster runs the master; the others act as standby slaves

rsc_template define a resource template

location a location preference # which node a resource prefers to run on; the node with the highest score wins

colocation colocate resources # how strongly several resources should run together

order order resources # the order in which resources start

rsc_ticket resources ticket dependency

property set a cluster property

rsc_defaults set resource defaults # e.g. stickiness

fencing_topology node fencing order

role define role access rights

user define user access rights

op_defaults set resource operations defaults

schema set or display current CIB RNG schema

show display CIB objects

edit edit CIB objects # opens the objects in an editor such as vim

filter filter CIB objects

delete delete CIB objects

default-timeouts set timeouts for operations to minimums from the meta-data

rename rename a CIB object

modgroup modify group

refresh refresh from CIB # re-read the CIB

erase erase the CIB

ptest show cluster actions if changes were committed

rsctest test resources as currently configured

cib CIB shadow management

cibstatus CIB status management and editing

template edit and import a configuration from a template

commit commit the changes to the CIB # write the pending changes to the CIB

verify verify the CIB with crm_verify # CIB syntax check

upgrade upgrade the CIB to version 1.0

save save the CIB to a file # the file is written to the directory you were in before starting crm

load import the CIB from a file

graph generate a directed graph

xml raw xml

help show help (help topics for list of topics)

end go back one level

node subcommand # node management and status

crm(live)# node

crm(live)node# help

Node management and status commands.

Available commands:

status show nodes status as XML

show show node # show node status in plain-text form

standby put node into standby # simulates the node going offline (the name after standby must be the node name the cluster knows, here the FQDN)

online set node online # bring the node back online

maintenance put node into maintenance mode

ready put node into ready mode

fence fence node

clearstate Clear node state

delete delete node

attribute manage attributes

utilization manage utilization attributes

status-attr manage status attributes

help show help (help topics for list of topics)

end go back one level

quit exit the program

ra subcommand # all resource agent classes are found here

crm(live)# ra

crm(live)ra# help

Available commands:

classes list classes and providers

list list RA for a class (and provider) # list the resource agents available in a class

meta show meta data for a RA # show the available parameters of a resource agent (e.g. meta ocf:heartbeat:IPaddr2)

providers show providers for a RA and a class

help show help (help topics for list of topics)

end go back one level

quit exit the program

show xml displays the complete configuration in XML format

crm(live)configure# show

node node2.test.com

node node3.test.com # the cluster currently has two nodes

property cib-bootstrap-options: \

dc-version=1.1.11-97629de \ # pacemaker version on the DC

cluster-infrastructure="classic openais (with plugin)" \ # underlying stack (classic openais, running pacemaker as a plugin)

expected-quorum-votes=2 \ # the cluster expects two votes in total

stonith-enabled=false # STONITH has been disabled

Disable STONITH:

crm(live)# configure

crm(live)configure# property stonith-enabled=false

crm(live)configure# commit

crm_verify -L -V # after this, verification no longer complains about the missing STONITH device

Try configuring a VIP:

crm(live)# configure

crm(live)configure# primitive webip ocf:heartbeat:IPaddr params ip=172.16.4.88 nic='eth0' cidr_netmask='16' broadcast='172.16.255.255'

# ocf:heartbeat can be omitted as long as IPaddr does not exist under more than one resource agent class

crm(live)configure# verify

crm(live)configure# commit

crm(live)configure# cd

crm(live)# status

Last updated: Sat Jan 3 18:40:23 2015

Last change: Sat Jan 3 18:40:19 2015

Stack: classic openais (with plugin)

Current DC: node3.zhangjian.com - partition with quorum

Version: 1.1.11-97629de

2 Nodes configured, 2 expected votes

1 Resources configured

Online: [ node2.test.com node3.zhangjian.com ]

webip(ocf::heartbeat:IPaddr):Started node2.test.com # the VIP has been configured successfully

Delete a resource:

crm(live)# resource

crm(live)resource# stop webip # a resource must be stopped before it is deleted

crm(live)resource# cd ..

crm(live)# configure

crm(live)configure# delete webip # delete a CIB object

crm(live)configure# commit # the change takes effect only after commit

crm(live)configure# cd ..

crm(live)# status

Last updated: Sat Jan 3 21:01:44 2015

Last change: Sat Jan 3 21:01:39 2015

Stack: classic openais (with plugin)

Current DC: node2.zhangjian.com - partition with quorum

Version: 1.1.11-97629de

2 Nodes configured, 2 expected votes

0 Resources configured

Online: [ node2.zhangjian.com node3.zhangjian.com ]

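monitor: add monitoring to a resource. Usage: monitor RSC[:ROLE] INTERVAL[:TIMEOUT]

That is: which resource to monitor, in which role, how often to check it, and how long to wait before the check times out.

Example:

monitor apcfence 60m:60s # monitor the apcfence resource every 60 minutes with a 60s timeout

Note: every resource agent publishes default (advisory minimum) values for its operations, and the durations we define should not be shorter than those defaults.

For example (fetching the default monitor values of the IPaddr resource agent):

crm(live)# ra

crm(live)ra# info IPaddr

Operations' defaults (advisory minimum):

start timeout=20s # timeout for starting

stop timeout=20s # timeout for stopping

status timeout=20s interval=10s # status check: runs every 10s

monitor timeout=20s interval=10s # monitor: runs every 10s with a 20s timeout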
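The rule that configured durations should not fall below the advisory minimum can be checked mechanically. The sketch below is illustrative only: to_seconds and meets_minimum are hypothetical helpers, and only the suffixes s, m and h (plus bare numbers treated as seconds) are handled.

```shell
#!/bin/sh
# Sketch: compare a configured duration with an advisory minimum.
# Assumption: durations use the suffixes s, m or h; a bare number is seconds.
to_seconds() {
    value=$1
    case "$value" in
        *s) echo "${value%s}" ;;
        *m) echo $(( ${value%m} * 60 )) ;;
        *h) echo $(( ${value%h} * 3600 )) ;;
        *)  echo "$value" ;;
    esac
}

# Succeeds when the configured value is at least the advisory minimum.
meets_minimum() {
    configured=$1
    minimum=$2
    [ "$(to_seconds "$configured")" -ge "$(to_seconds "$minimum")" ]
}

# Example: a 15s timeout against the 20s advisory minimum listed for IPaddr.
if meets_minimum 15s 20s; then
    echo "timeout is acceptable"
else
    echo "timeout is below the advisory minimum"
fi
```

For the 60m:60s example above, to_seconds turns the interval into 3600 seconds and the timeout into 60 seconds.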

Define the IP:

crm(live)configure# primitive webip IPaddr params ip=172.16.4.88 op monitor interval=10s timeout=20s

crm(live)configure# verify

crm(live)configure# commit

Define the httpd resource:

[iyunv@node2 html]# crm

crm(live)# configure

crm(live)configure# primitive webserver lsb:httpd op monitor interval=30s timeout=15s

crm(live)configure# verify

crm(live)configure# commit

crm(live)configure# cd

crm(live)# status

Last updated: Sat Jan 3 21:25:49 2015

Last change: Sat Jan 3 21:25:45 2015

Stack: classic openais (with plugin)

Current DC: node2.zhangjian.com - partition with quorum

Version: 1.1.11-97629de

2 Nodes configured, 2 expected votes

2 Resources configured

Online: [ node2.zhangjian.com node3.zhangjian.com ]

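webip(ocf::heartbeat:IPaddr):Started node2.zhangjian.com # webip runs on node2

webserver(lsb:httpd):Started node3.zhangjian.com # webserver runs on node3

Note: webip and webserver are now running on different nodes. By default the cluster spreads resources across the nodes as evenly as possible.

There are two ways to fix this:

group: define both resources as one group so that they run together as a unit;

colocation: define a colocation constraint, which requires the two resources to run on the same node.

crm(live)# configure

crm(live)configure# colocation webserver_with_webip inf: webserver webip # keep the two together

crm(live)configure# show # check that the new constraint is present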

crm(live)configure# commit

crm(live)configure# cd ..

crm(live)# status

Last updated: Sat Jan 3 21:30:14 2015

Last change: Sat Jan 3 21:30:08 2015

Stack: classic openais (with plugin)

Current DC: node2.zhangjian.com - partition with quorum

Version: 1.1.11-97629de

2 Nodes configured, 2 expected votes

2 Resources configured

Online: [ node2.zhangjian.com node3.zhangjian.com ]

webip(ocf::heartbeat:IPaddr):Started node2.zhangjian.com

webserver(lsb:httpd):Started node2.zhangjian.com # both resources now run on node2
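The group alternative mentioned earlier is not shown in the original. As a rough sketch, the line below is what one might type at the configure prompt instead of the colocation constraint. The group name webservice and the helper emit_group are hypothetical. A group keeps its members on the same node and starts them in the order listed, so webip is listed first.

```shell
#!/bin/sh
# Sketch: print a crmsh group definition for a list of resources.
# emit_group and the group name "webservice" are hypothetical; the printed
# line is meant to be entered at the crm(live)configure# prompt.
emit_group() {
    group_name=$1
    shift
    echo "group $group_name $*"
}

emit_group webservice webip webserver
# prints: group webservice webip webserver
```

When a group is used, the separate colocation and order constraints between the same two resources would normally be left out, since the group already implies both.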

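Define an order constraint:

crm(live)configure# order webip_before_webserver mandatory: webip webserver

crm(live)configure# commit

Note: mandatory means the order is enforced; webip and webserver must start in exactly the order given.

You can now test from a client machine: a request to 172.16.4.88 reaches the web page served by node2.

crm node standby # put node2 into standby

Test again in the browser; the page is now served by node3.

crm node online # bring node2 back online; the resources do not move back

Define a node preference:

crm(live)# configure

crm(live)configure# help location # show the usage help for location

crm(live)configure# location webip_prefer_node1 webip rule 100: #uname eq node2.zhangjian.com # constraint name, resource name, score 100, node name node2

crm(live)configure# commit

Accessing the web page in the browser now shows the page on node2, because the resource prefers node2 more strongly. Even if node2 is put into standby and the resource moves away, it moves back as soon as node2 comes online again: we gave webip a preference of 100 for node2, while the default preference for every node is 0, so whenever node2 is available the resource runs there.

Stickiness: how strongly each resource prefers to stay on the node it is currently running on.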
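How the location score and stickiness interact can be pictured with a deliberately simplified model. It is an assumption made for illustration, not pacemaker's actual placement algorithm: the resource is running on a node with location score 0, its stickiness is added to that node, and it moves only if the preferred node's score is strictly higher. decide_placement is a hypothetical helper, and treating a tie as "stay" is also an assumption.

```shell
#!/bin/sh
# Sketch: simplified two-node placement decision.
# Assumptions: the current node has location score 0, stickiness is added
# to the current node, and a tie keeps the resource where it is.
decide_placement() {
    preferred_node_score=$1
    stickiness=$2
    if [ "$preferred_node_score" -gt "$stickiness" ]; then
        echo "move back"
    else
        echo "stay"
    fi
}

decide_placement 100 0     # location score 100, no stickiness
decide_placement 100 200   # stickiness outweighs the location score
```

With the score of 100 defined above and no stickiness, the model predicts the move back to node2 that the article observes. A stickiness above 100, set for example through rsc_defaults resource-stickiness, would be expected to keep the resource where it is.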