1.1 Deploy DRBD 9
1. Install drbd-utils
yum install -y gcc gcc-c++ make autoconf automake flex kernel-devel libxslt libxslt-devel asciidoc
tar -zxvf drbd-utils-9.10.0.tar.gz
cd drbd-utils-9.10.0
./autogen.sh
./configure --prefix=/usr/local/drbd-utils-9.10.0 \
--localstatedir=/var \
--sysconfdir=/etc \
--without-83support \
--without-84support \
--without-manual
make
make install
cp scripts/drbd-overview.pl /usr/bin/drbd-overview
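To confirm the userspace tools built and installed correctly, a quick check (the path follows from the --prefix above; adjust if your layout differs):
/usr/local/drbd-utils-9.10.0/sbin/drbdadm --version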
2. Install the drbd kernel module
tar -xf drbd-9.0.19-1.tar.gz
cd drbd-9.0.19-1
make KDIR=/usr/src/kernels/3.10.0-693.el7.x86_64
make install
3. Load the module
depmod
modprobe drbd
lsmod | grep drbd
drbd 555120 0
libcrc32c 12644 2 xfs,drbd
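To have the module loaded automatically at boot, a minimal sketch using systemd-modules-load (standard on CentOS 7):
echo drbd > /etc/modules-load.d/drbd.conf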
4. Configure DRBD
vim /etc/drbd.d/global_common.conf
global {
    usage-count yes;
    udev-always-use-vnr;
}
common {
    handlers {
    }
    startup {
        wfc-timeout 100;
        degr-wfc-timeout 120;
    }
    options {
        auto-promote yes;
    }
    disk {
    }
    net {
        protocol C;
        # transport tcp;
    }
}
vim /etc/drbd.d/data.res
resource data {
    on node1 {
        node-id 0;
        device /dev/drbd0 minor 0;
        disk /dev/sdb;
        meta-disk internal;
        address ipv4 192.168.111.129:7788;
    }
    on node2 {
        node-id 1;
        device /dev/drbd0 minor 0;
        disk /dev/sdb;
        meta-disk internal;
        address ipv4 192.168.111.130:7788;
    }
    connection-mesh {
        hosts node1 node2;
    }
    disk {
        resync-rate 100M;
    }
}
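Both nodes need identical DRBD configuration files; one way to keep them in sync (assuming SSH access to node2):
scp /etc/drbd.d/global_common.conf /etc/drbd.d/data.res node2:/etc/drbd.d/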
5. Start DRBD
systemctl start drbd
systemctl enable drbd
6. Initialize the DRBD device
1) Create the metadata
Zero the head of the disk first to wipe any old filesystem signature, then create the metadata and force this node to Primary so it becomes the initial sync source:
dd if=/dev/zero of=/dev/sdb bs=1M count=100
drbdadm create-md data
drbdadm primary --force data
2) Check the synchronization progress
drbdadm status data
While the resource is syncing (output on node1):
data role:Primary
  disk:UpToDate
  node2 role:Secondary
    replication:SyncSource peer-disk:Inconsistent done:20.30
After synchronization completes (output on node2):
data role:Secondary
  disk:UpToDate
  node1 role:Primary
    peer-disk:UpToDate
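To follow the synchronization continuously, one simple option is the standard watch utility:
watch -n2 drbdadm status data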
3) Format the DRBD device
mkfs.xfs /dev/drbd0
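A quick mount test on the Primary node (a sketch; /mydata is the mount point the Filesystem resource will use later, and the device is unmounted again so Pacemaker can manage the mount):
mkdir -p /mydata
mount /dev/drbd0 /mydata
umount /mydata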
1.2 Deploy corosync
1. Install corosync
yum install -y corosync
2. Configure corosync (on CentOS 7 the pcs service generates this file automatically, so this step can be skipped)
1) Edit the corosync configuration file
cd /etc/corosync
cp corosync.conf.example.udpu corosync.conf
Notes:
corosync.conf.example        sample configuration for multicast communication
corosync.conf.example.udpu   sample configuration for unicast (UDPU) communication
vim corosync.conf
totem {
    version: 2
    crypto_cipher: aes256
    crypto_hash: sha1
    token: 10000
    interface {
        ringnumber: 0
        bindnetaddr: 192.168.111.0
        mcastport: 5405
        ttl: 1
    }
    transport: udpu
}
logging {
    fileline: off
    to_logfile: no
    to_syslog: yes
    logfile: /var/log/cluster/corosync.log
    debug: off
    timestamp: on
    logger_subsys {
        subsys: QUORUM
        debug: off
    }
}
nodelist {
    node {
        ring0_addr: 192.168.111.129
        nodeid: 1
    }
    node {
        ring0_addr: 192.168.111.130
        nodeid: 2
    }
}
quorum {
    provider: corosync_votequorum
}
2) Generate the corosync authentication key and copy it, together with the configuration, to node2 (the -l flag draws entropy from /dev/urandom, so the key generates quickly):
corosync-keygen -l
scp corosync.conf authkey node2:/etc/corosync/
3. Start corosync
systemctl start corosync
systemctl enable corosync
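To verify that both nodes have joined the ring, one quick check with corosync-cmapctl:
corosync-cmapctl | grep members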
1.3 Deploy pacemaker
1. Install pacemaker
yum install -y pacemaker
2. Start pacemaker
systemctl start pacemaker
systemctl enable pacemaker
1.4 Deploy pcs
1. Note
To manage corosync+pacemaker with pcs, turn off the manual start and autostart of corosync and pacemaker; the pcsd service takes over starting and stopping them (see the sketch below).
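A minimal sketch of that handover, run on both nodes:
systemctl stop corosync pacemaker
systemctl disable corosync pacemaker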
2. Install pcs
yum install -y pcs
3. Start pcsd
systemctl start pcsd
systemctl enable pcsd
4. Configure pcsd
1) Set the password for the hacluster account
passwd hacluster
Changing password for user hacluster.
New password: Password@123
Retype new password: Password@123
passwd: all authentication tokens updated successfully.
2) Authenticate the cluster nodes
pcs cluster auth node1 node2
Username: hacluster
Password: Password@123
node1: Authorized
node2: Authorized
3) Create the cluster and synchronize the corosync configuration
Because the nodes already carry the corosync configuration from section 1.2, the first attempt fails and --force is needed to overwrite it:
pcs cluster setup --name hacluster node1 node2
Error: node1: node is already in a cluster
Error: node2: node is already in a cluster
Error: nodes availability check failed, use --force to override. WARNING: This will destroy existing cluster on the nodes.
pcs cluster setup --name hacluster node1 node2 --force
Destroying cluster on nodes: node1, node2...
node1: Stopping Cluster (pacemaker)...
node2: Stopping Cluster (pacemaker)...
node2: Successfully destroyed cluster
node1: Successfully destroyed cluster
Sending 'pacemaker_remote authkey' to 'node1', 'node2'
node1: successful distribution of the file 'pacemaker_remote authkey'
node2: successful distribution of the file 'pacemaker_remote authkey'
Sending cluster config files to the nodes...
node1: Succeeded
node2: Succeeded
Synchronizing pcsd certificates on nodes node1, node2...
node1: Success
node2: Success
Restarting pcsd on the nodes in order to reload the certificates...
node1: Success
node2: Success
4) Start the cluster
pcs cluster start --all
node1: Starting Cluster...
node2: Starting Cluster...
5) Enable the cluster at boot
pcs cluster enable --all
node1: Cluster Enabled
node2: Cluster Enabled
6) Check the corosync ring status
Run corosync-cfgtool on each node; the first block of output below is from node1, the second from node2:
corosync-cfgtool -s
Printing ring status.
Local node ID 1
RING ID 0
id = 192.168.111.129
status = ring 0 active with no faults
Printing ring status.
Local node ID 2
RING ID 0
id = 192.168.111.130
status = ring 0 active with no faults
7) Disable STONITH
Without a fencing device, STONITH should be disabled:
pcs property set stonith-enabled=false
8) Ignore loss of quorum
A cluster normally needs more than half of the votes to keep quorum; on a two-node cluster, set the policy to ignore quorum loss:
pcs property set no-quorum-policy=ignore
9) Verify the configuration; it should report no errors
crm_verify -L -V
10) Show the cluster properties
pcs property show
Cluster Properties:
cluster-infrastructure: corosync
cluster-name: hacluster
dc-version: 1.1.16-12.el7-94ff4df
have-watchdog: false
no-quorum-policy: ignore
stonith-enabled: false
5. Add cluster resources
1) Add the VIP
pcs resource create vip ocf:heartbeat:IPaddr2 ip=192.168.111.131 cidr_netmask=24 op monitor interval=30s --group group
The --group <name> option places the resource into a resource group; resources in a group run on the same node and start in the order they were added (a sketch of equivalent explicit constraints follows below).
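For reference, explicit constraints roughly equivalent to the group would look like this sketch (standard pcs constraint commands, using the resource names created in this section):
pcs constraint colocation add drbdstone with vip INFINITY
pcs constraint order vip then drbdstone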
2) Add the DRBD filesystem mount
pcs resource create drbdstone ocf:heartbeat:Filesystem device=/dev/drbd0 directory=/mydata fstype=xfs op monitor interval=30s --group group
3) Add the MySQL resource
The MySQL installation itself is not covered here (a sketch of the relevant MySQL-side settings follows these commands); the pgsql line shows the equivalent resource for PostgreSQL:
pcs resource create mysqld service:mysqld op monitor interval=30s --group group
pcs resource create pgsql service:postgresql op monitor interval=30s --group group
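A minimal sketch of the MySQL side, under stated assumptions: the data directory is placed on the DRBD mount so it moves with the group, and the mysqld systemd unit stays disabled so Pacemaker alone controls it (assumes a MySQL build that ships a mysqld unit; paths are illustrative):
systemctl disable mysqld
mkdir -p /mydata/mysql
chown mysql:mysql /mydata/mysql
vim /etc/my.cnf
[mysqld]
datadir=/mydata/mysql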
6. Operational notes
1) Check the cluster status
pcs status
Cluster name: hacluster
Stack: corosync
Current DC: node1 (version 1.1.16-12.el7-94ff4df) - partition with quorum
Last updated: Thu Jul 18 01:48:50 2019
Last change: Thu Jul 18 01:34:24 2019 by hacluster via crmd on node1
2 nodes configured
3 resources configured
Online: [ node1 node2 ]
Full list of resources:
Resource Group: group
vip (ocf::heartbeat:IPaddr2): Started node2
drbdstone (ocf::heartbeat:Filesystem): Started node2
mysqld (service:mysqld): Started node2
Daemon Status:
corosync: active/disabled
pacemaker: active/disabled
pcsd: active/enabled
2) A resource fails with 'unknown error'
Symptom:
pcs status
Cluster name: hacluster
Stack: corosync
Current DC: node1 (version 1.1.16-12.el7-94ff4df) - partition with quorum
Last updated: Thu Jul 18 01:32:22 2019
Last change: Thu Jul 18 01:31:57 2019 by root via cibadmin on node1
2 nodes configured
3 resources configured
Node node1: standby
Online: [ node2 ]
Full list of resources:
Resource Group: group
vip (ocf::heartbeat:IPaddr2): Started node2
drbdstone (ocf::heartbeat:Filesystem): Started node2
mysqld (service:mysqld): Stopped
Failed Actions:
* mysqld_start_0 on node2 'unknown error' (1): call=56, status=complete, exitreason='none',
last-rc-change='Thu Jul 18 01:32:05 2019', queued=0ms, exec=1036ms
Daemon Status:
corosync: active/disabled
pacemaker: active/disabled
pcsd: active/enabled
Solution:
(1) Find out why the resource failed to start and fix it;
(2) Run pcs resource cleanup mysqld and the resource recovers automatically:
Cleaning up vip on node1, removing fail-count-vip
Cleaning up vip on node2, removing fail-count-vip
Cleaning up drbdstone on node1, removing fail-count-drbdstone
Cleaning up drbdstone on node2, removing fail-count-drbdstone
Cleaning up mysqld on node1, removing fail-count-mysqld
Cleaning up mysqld on node2, removing fail-count-mysqld
Waiting for 6 replies from the CRMd...... OK
3) Simulate failover and recovery
Put a cluster node into standby (its resources move to the other node), then bring it back online:
pcs node standby node1
pcs node unstandby node1
4) Delete a misconfigured resource
pcs resource delete mysqld
1.5 The pcsd web UI
1. Change the pcsd web UI bind address
By default pcsd listens on port 2224 over tcp6 only; to reach it via IPv4, set the bind address manually:
netstat -antp | grep 2224 | grep LIST
tcp6 0 0 :::2224 :::* LISTEN 3831/ruby
vim /usr/lib/pcsd/ssl.rb
webrick_options = {
    :Port        => 2224,
    #:BindAddress => primary_addr,
    #:Host        => primary_addr,
    :BindAddress => '0.0.0.0',
    :Host        => '0.0.0.0',
    # ... rest of the options unchanged
systemctl restart pcsd
netstat -antp | grep 2224 | grep LIST
tcp 0 0 0.0.0.0:2224 0.0.0.0:* LISTEN 96985/ruby
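If firewalld is running, the port also needs to be opened, for example:
firewall-cmd --permanent --add-port=2224/tcp
firewall-cmd --reload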
2. Browse to https://IP:2224
[Screenshots: pcs1.png, pcs2.png, pcs3.png, pcs4.png]
Log in as hacluster with the password set earlier; the cluster configured from the command line appears in the web UI.