美文网首页Zabbix
Zabbix集群升级

Zabbix集群升级

作者: 苏水的北 | 来源:发表于2021-06-18 17:58 被阅读0次

一、###Zabbix集群构建:

Mysql的MHA架构清单:

主机名 IP 主/从 MHA高可用软件 读写分离软件
db01 192.168.98.5 绑定VIP:192.168.98.92
db02 192.168.98.6
db03 192.168.98.13 MHA软件部署位置 读写分离软件 部署位置

Zabbix的web端清单:

主机名 IP 主/从 功能
zabbix01 192.168.98.5 zabbix服务端
zabbix02 192.168.98.6 zabbix服务端
负载均衡 192.168.98.6 提供负载均衡功能

二、给db01、db02、db03服务器部署Mysql数据库:

注意:下面只给出db01上搭建mysql的步骤,db02和db03用相同的方法搭建。
2.1 创建相关目录:

[root@db01 ~]# mkdir /application           ----> 软件的存放目录
[root@db01 /]# mkdir /data/3306/data -p     ----> 数据的存放位置

2.2 创建用户:

[root@db01 /]# useradd -s /sbin/nologin -M mysql

2.3 上传软件并解压处理:

[root@db01 application]# ll
total 1079692
-rw-r--r--  1 mysql mysql 644869837 Aug  5 09:31 mysql-5.7.26-linux-glibc2.12-x86_64.tar.gz
[root@db01 /application]# tar xf mysql-5.7.26-linux-glibc2.12-x86_64
[root@db01 /application]# mv mysql-5.7.26-linux-glibc2.12-x86_64 mysql
[root@db01 application]# ll
total 1079692
drwxr-xr-x  9 mysql mysql       129 Aug  5 12:22 mysql
-rw-r--r--  1 mysql mysql 644869837 Aug  5 09:31 mysql-5.7.26-linux-glibc2.12-x86_64.tar.gz
注:软件包可以去mysql.com官网下载

2.4 数据初始化(创建系统数据):

[root@db01 ~]# rpm -qa|grep maria
mariadb-libs-5.5.60-1.el7_5.x86_64
[root@db01 ~]# yum  remove mariadb-libs-5.5.60-1.el7_5.x86_64 -y
"注:由于centos7系统自带mariadb数据库,所以必须要将其删除。"
[root@db01 ~]# vim /etc/profile
export PATH=/application/mysql/bin:$PATH
[root@db01 ~]# mysql -V   //次命令用于检查mysql版本信息
mysql  Ver 14.14 Distrib 5.7.26, for linux-glibc2.12 (x86_64) using  EditLine wrapper
[root@db01 data]# mysqld --initialize-insecure --user=mysql --basedir=/application/mysql --datadir=/data/3306/data  //数据库初始化命令
2019-08-05T08:41:34.391208Z 0 [Warning] TIMESTAMP with implicit DEFAULT value is deprecated. Please use --explicit_defaults_for_timestamp server option (see documentation for more details).
2019-08-05T08:41:36.705525Z 0 [Warning] InnoDB: New log files created, LSN=45790
2019-08-05T08:41:36.910614Z 0 [Warning] InnoDB: Creating foreign key constraint system tables.
2019-08-05T08:41:37.021365Z 0 [Warning] No existing UUID has been found, so we assume that this is the first time that this server has been started. Generating a new UUID: d63578ab-b75c-11e9-a331-000c29a71104.
2019-08-05T08:41:37.022771Z 0 [Warning] Gtid table is not ready to be used. Table 'mysql.gtid_executed' cannot be opened.
2019-08-05T08:41:37.023736Z 1 [Warning] root@localhost is created with an empty password ! Please consider switching off the --initialize-insecure option.

2.5 书写配置文件:

cat >/etc/my.cnf <<EOF
[mysqld]
user=mysql
basedir=/application/mysql
datadir=/data/3306/data
socket=/tmp/mysql.sock
[mysql]
socket=/tmp/mysql.sock
prompt=wuxin[\\d]>   //wuxin变量可以自己定义
EOF

2.6 准备启动脚本:

[root@db01 ~]#chown -R mysql.mysql /data/* /application/mysql/*
[root@db01 ~]#cp /application/mysql/support-files/mysql.server /etc/init.d/mysqld 
[root@db01 ~]# /etc/init.d/mysqld start   //启动mysql方法
Starting MySQL.Logging to '/data/3306/data/db01.err'.
SUCCESS! 

2.7 使用systemd管理mysql:

[root@db01 ~]# /etc/init.d/mysqld start   //因为要用systemctl去管理,所以先按上一步的方法关闭mysql
[root@db01 ~]# cat > /etc/systemd/system/mysqld.service <<EOF
[Unit]
Description=MySQL Server
Documentation=man:mysqld(8)
Documentation=http://dev.mysql.com/doc/refman/en/using-systemd.html
After=network.target
After=syslog.target
[Install]
WantedBy=multi-user.target
[Service]
User=mysql
Group=mysql
ExecStart=/application/mysql/bin/mysqld --defaults-file=/etc/my.cnf
LimitNOFILE = 5000
EOF
注意:将原来模式启动mysqld先关闭,然后再用systemd管理。

2.8 用systemctl管理启动数据库:

[root@db01 ~]#systemctl  start   mysqld

三、给db01、db02、db03三台数据库构建主从关系,开启GTID复制功能:

3.1 清理环境,三台数据库都操作:

pkill mysqld
 \rm -rf /data/3306/data/*
 \rm -rf /data/binlog/*
mkdir -p /data/3306/data 
mkdir -p /data/binlog/
chown -R mysql.mysql  /data/*

3.2 准备配置文件,每台数据库填写各自的配置文件:

# 主库db01:
cat > /etc/my.cnf <<EOF
[mysqld]
basedir=/application/mysql/
datadir=/data/3306/data
socket=/tmp/mysql.sock
server_id=51
port=3306
secure-file-priv=/tmp
autocommit=0
log_bin=/data/binlog/mysql-bin
binlog_format=row
gtid-mode=on
enforce-gtid-consistency=true
log-slave-updates=1
[mysql]
prompt=db01 [\\d]>
EOF


# slave1(db02):

cat > /etc/my.cnf <<EOF
[mysqld]
basedir=/application/mysql
datadir=/data/3306/data
socket=/tmp/mysql.sock
server_id=52
port=3306
secure-file-priv=/tmp
autocommit=0
log_bin=/data/binlog/mysql-bin
binlog_format=row
gtid-mode=on
enforce-gtid-consistency=true
log-slave-updates=1
[mysql]
prompt=db02 [\\d]>
EOF

# slave2(db03):

cat > /etc/my.cnf <<EOF
[mysqld]
basedir=/application/mysql
datadir=/data/3306/data
socket=/tmp/mysql.sock
server_id=53
port=3306
secure-file-priv=/tmp
autocommit=0
log_bin=/data/binlog/mysql-bin
binlog_format=row
gtid-mode=on
enforce-gtid-consistency=true
log-slave-updates=1
[mysql]
prompt=db03 [\\d]>
EOF

3.3 初始化数据,三台数据库都执行:

mysqld --initialize-insecure --user=mysql --basedir=/application/mysql  --datadir=/data/3306/data 

3.4 启动数据库,三台数据库都启动:

systemctl start  mysqld

3.5 构建主从:
db01主库执行:

mysql -e "grant replication slave  on *.* to repl@'192.168.%.%' identified by '123456';"

db02从库和db03从库分别执行:

mysql -e "change master to master_host='192.168.98.5',master_user='repl',master_password='123456' ,MASTER_AUTO_POSITION=1;"
mysql -e  "start slave;"
[root@db02 ~]# mysql -e  "show slave status \G"|grep Running:
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
[root@db03 ~]# mysql -e  "show slave status \G"|grep Running:
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes

总结:做完上述步骤,就完成了一主两从的搭建。

四、MHA环境搭建

规划:
主库:
192.168.98.5 主库
VIP:192.168.98.92
从库:
192.168.98.6 从库
192.168.98.13 从库、MHA软件的manager管理端
4.1、配置关键程序软连接(三台数据库都执行):

ln -s /application/mysql/bin/mysqlbinlog    /usr/bin/mysqlbinlog
ln -s /application/mysql/bin/mysql          /usr/bin/mysql

4.2、配置各节点互信(在db01上执行):

rm -rf /root/.ssh 
ssh-keygen
cd /root/.ssh 
mv id_rsa.pub authorized_keys
scp  -r  /root/.ssh  192.168.98.6:/root 
scp  -r  /root/.ssh  192.168.98.13:/root 

各节点验证
db01:

ssh 192.168.98.5 date
ssh 192.168.98.6 date
ssh 192.168.98.13 date

db02:

ssh 192.168.98.5 date
ssh 192.168.98.6 date
ssh 192.168.98.13 date

db03:

ssh 192.168.98.5 date
ssh 192.168.98.6 date
ssh 192.168.98.13 date

4.3、安装软件
4.3.1、下载mha软件
mha官网:https://code.google.com/archive/p/mysql-master-ha/
github下载地址:https://github.com/yoshinorim/mha4mysql-manager/wiki/Downloads

4.3.2、所有节点安装Node软件依赖包(三台数据库都执行):

yum install perl-DBD-MySQL -y
rpm -ivh mha4mysql-node-0.56-0.el6.noarch.rpm

4.3.3、在db01主库中创建mha需要的用户(因为主从已经搭建完成,所以在db01上创建mha用户,db02和db03上面也会存在mha用户):

grant all privileges on *.* to mha@'192.168.%.%' identified by 'mha';

4.3.4、Manager软件安装(因为db03作为MHA软件的manager,所以只在它上面安装)

yum install -y perl-Config-Tiny epel-release perl-Log-Dispatch perl-Parallel-ForkManager perl-Time-HiRes
rpm -ivh mha4mysql-manager-0.56-0.el6.noarch.rpm

4.3.5、配置文件准备(db03上配置mha的配置文件)
创建配置文件目录

 mkdir -p /etc/mha

创建日志目录

 mkdir -p /var/log/mha/app1

编辑mha配置文件:

vim /etc/mha/app1.cnf
[server default]
manager_log=/var/log/mha/app1/manager   ##MHA架构manager软件的日志(工作流程日志)
manager_workdir=/var/log/mha/app1      ##工作路径      
master_binlog_dir=/data/binlog       ##主库的二进制日志路径
user=mha          ##连接数据库用户                         
password=mha    ##连接数据库密码                           
ping_interval=2    ##ping时间间隔
repl_password=123456  ##主从复制用户密码
repl_user=repl     ##主从复制用户
ssh_user=root        ##互信用户                       
[server1]                                   
hostname=192.168.98.5
port=3306                                  
[server2]            
hostname=192.168.98.6
port=3306
[server3]
hostname=192.168.98.13
port=3306

4.3.6、 状态检查
4.3.6.1、互信检查(db03里面执行)

[root@db03 ~]# masterha_check_ssh  --conf=/etc/mha/app1.cnf

Fri Apr 19 16:39:34 2019 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Fri Apr 19 16:39:34 2019 - [info] Reading application default configuration from /etc/mha/app1.cnf..
Fri Apr 19 16:39:34 2019 - [info] Reading server configuration from /etc/mha/app1.cnf..
Fri Apr 19 16:39:34 2019 - [info] Starting SSH connection tests..
Fri Apr 19 16:39:35 2019 - [debug] 
Fri Apr 19 16:39:34 2019 - [debug]  Connecting via SSH from root@10.0.0.51(10.0.0.51:22) to root@10.0.0.52(10.0.0.52:22)..
Fri Apr 19 16:39:34 2019 - [debug]   ok.
Fri Apr 19 16:39:34 2019 - [debug]  Connecting via SSH from root@10.0.0.51(10.0.0.51:22) to root@10.0.0.53(10.0.0.53:22)..
Fri Apr 19 16:39:35 2019 - [debug]   ok.
Fri Apr 19 16:39:36 2019 - [debug] 
Fri Apr 19 16:39:35 2019 - [debug]  Connecting via SSH from root@10.0.0.52(10.0.0.52:22) to root@10.0.0.51(10.0.0.51:22)..
Fri Apr 19 16:39:35 2019 - [debug]   ok.
Fri Apr 19 16:39:35 2019 - [debug]  Connecting via SSH from root@10.0.0.52(10.0.0.52:22) to root@10.0.0.53(10.0.0.53:22)..
Fri Apr 19 16:39:35 2019 - [debug]   ok.
Fri Apr 19 16:39:37 2019 - [debug] 
Fri Apr 19 16:39:35 2019 - [debug]  Connecting via SSH from root@10.0.0.53(10.0.0.53:22) to root@10.0.0.51(10.0.0.51:22)..
Fri Apr 19 16:39:35 2019 - [debug]   ok.
Fri Apr 19 16:39:35 2019 - [debug]  Connecting via SSH from root@10.0.0.53(10.0.0.53:22) to root@10.0.0.52(10.0.0.52:22)..
Fri Apr 19 16:39:36 2019 - [debug]   ok.
Fri Apr 19 16:39:37 2019 - [info] All SSH connection tests passed successfully.

4.3.6.2、主从状态检查(db03里面)

[root@db03 ~]# masterha_check_repl  --conf=/etc/mha/app1.cnf 
Fri Jun 18 05:40:31 2021 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Fri Jun 18 05:40:31 2021 - [info] Reading application default configuration from /etc/mha/app1.cnf..
Fri Jun 18 05:40:31 2021 - [info] Reading server configuration from /etc/mha/app1.cnf..
Fri Jun 18 05:40:31 2021 - [info] MHA::MasterMonitor version 0.56.
Fri Jun 18 05:40:32 2021 - [info] GTID failover mode = 1
Fri Jun 18 05:40:32 2021 - [info] Dead Servers:
Fri Jun 18 05:40:32 2021 - [info] Alive Servers:
Fri Jun 18 05:40:32 2021 - [info]   192.168.98.5(192.168.98.5:3306)
Fri Jun 18 05:40:32 2021 - [info]   192.168.98.6(192.168.98.6:3306)
Fri Jun 18 05:40:32 2021 - [info]   192.168.98.13(192.168.98.13:3306)
Fri Jun 18 05:40:32 2021 - [info] Alive Slaves:
Fri Jun 18 05:40:32 2021 - [info]   192.168.98.6(192.168.98.6:3306)  Version=5.7.26-log (oldest major version between slaves) log-bin:enabled
Fri Jun 18 05:40:32 2021 - [info]     GTID ON
Fri Jun 18 05:40:32 2021 - [info]     Replicating from 192.168.98.5(192.168.98.5:3306)
Fri Jun 18 05:40:32 2021 - [info]     Primary candidate for the new Master (candidate_master is set)
Fri Jun 18 05:40:32 2021 - [info]   192.168.98.13(192.168.98.13:3306)  Version=5.7.26-log (oldest major version between slaves) log-bin:enabled
Fri Jun 18 05:40:32 2021 - [info]     GTID ON
Fri Jun 18 05:40:32 2021 - [info]     Replicating from 192.168.98.5(192.168.98.5:3306)
Fri Jun 18 05:40:32 2021 - [info] Current Alive Master: 192.168.98.5(192.168.98.5:3306)
Fri Jun 18 05:40:32 2021 - [info] Checking slave configurations..
Fri Jun 18 05:40:32 2021 - [info]  read_only=1 is not set on slave 192.168.98.6(192.168.98.6:3306).
Fri Jun 18 05:40:32 2021 - [info]  read_only=1 is not set on slave 192.168.98.13(192.168.98.13:3306).
Fri Jun 18 05:40:32 2021 - [info] Checking replication filtering settings..
Fri Jun 18 05:40:32 2021 - [info]  binlog_do_db= , binlog_ignore_db= 
Fri Jun 18 05:40:32 2021 - [info]  Replication filtering check ok.
Fri Jun 18 05:40:32 2021 - [info] GTID (with auto-pos) is supported. Skipping all SSH and Node package checking.
Fri Jun 18 05:40:32 2021 - [info] Checking SSH publickey authentication settings on the current master..
Fri Jun 18 05:40:32 2021 - [info] HealthCheck: SSH to 192.168.98.5 is reachable.
Fri Jun 18 05:40:32 2021 - [info] 
192.168.98.5(192.168.98.5:3306) (current master)
 +--192.168.98.6(192.168.98.6:3306)
 +--192.168.98.13(192.168.98.13:3306)

Fri Jun 18 05:40:32 2021 - [info] Checking replication health on 192.168.98.6..
Fri Jun 18 05:40:32 2021 - [info]  ok.
Fri Jun 18 05:40:32 2021 - [info] Checking replication health on 192.168.98.13..
Fri Jun 18 05:40:32 2021 - [info]  ok.
Fri Jun 18 05:40:32 2021 - [info] Checking master_ip_failover_script status:
Fri Jun 18 05:40:32 2021 - [info]   /usr/local/bin/master_ip_failover --command=status --ssh_user=root --orig_master_host=192.168.98.5 --orig_master_ip=192.168.98.5 --orig_master_port=3306 


IN SCRIPT TEST====/sbin/ifconfig ens160:1 down==/sbin/ifconfig ens160:1 192.168.98.92/24===

Checking the Status of the script.. OK 
Fri Jun 18 05:40:32 2021 - [info]  OK.
Fri Jun 18 05:40:32 2021 - [warning] shutdown_script is not defined.
Fri Jun 18 05:40:32 2021 - [info] Got exit code 0 (Not master dead).

MySQL Replication Health is OK.

4.3.7、 开启MHA(db03):

[root@db03 ~]#nohup masterha_manager --conf=/etc/mha/app1.cnf --remove_dead_master_conf --ignore_last_failover  < /dev/null> /var/log/mha/app1/manager.log 2>&1 &

停止MHA命令:

[root@db03 ~]#masterha_stop --conf=/etc/mha/app1.cnf

4.3.8、 查看MHA状态(db03):

[root@db03 ~]# masterha_check_status --conf=/etc/mha/app1.cnf
app1 (pid:1101) is running(0:PING_OK), master:192.168.98.5

4.4 故障模拟及处理

(1) 停主库db01:

systemctl stop mysqld

备注:观察manager 日志 tail -f /var/log/mha/app1/manager
末尾必须显示successfully,才算正常切换成功。
(2) 主机宕机,修复主库

[root@db01 ~]# systemctl start  mysqld

备注:启动完成主库后,集群并不认识,所以需要后续操作。

(3) 修复主从(把它重新加入MHA集群,当做从库添加进去)

[root@db01 ~]#mysql -e "change master to master_host(新主库ip)='192.168.98.6',master_user='repl',master_password='123456' ,MASTER_AUTO_POSITION=1;"
[root@db01 ~]#mysql -e  "start slave;"
[root@db01 ~]# mysql -e  "show slave status \G"|grep Running:
Slave_IO_Running: Yes
Slave_SQL_Running: Yes

(4) 恢复配置文件(在db03中做)

[root@db03 /etc/mha]# cat app1.cnf 
[server default]
manager_log=/var/log/mha/app1/manager
manager_workdir=/var/log/mha/app1
master_binlog_dir=/data/binlog
master_ip_failover_script=/usr/local/bin/master_ip_failover
password=mha
ping_interval=2
repl_password=123456
repl_user=repl
report_script=/usr/local/bin/send
ssh_user=root
user=mha

[server1]   ##server1节点重新加入,因为之前他宕机了,系统把它自动清理掉。
candidate_master=1
hostname=192.168.98.5
port=3306

[server2]
candidate_master=1
hostname=192.168.98.6
port=3306

[server3]
hostname=192.168.98.13
port=3306

(5) 启动MHA

nohup masterha_manager --conf=/etc/mha/app1.cnf --remove_dead_master_conf --ignore_last_failover  < /dev/null> /var/log/mha/app1/manager.log 2>&1 &

检查状态:

masterha_check_status --conf=/etc/mha/app1.cnf

备注:故障事遇到问题,如何修复:
主库由db2切换至db1,但是在db2上指定主库db1时候报错:

[root@db02 ~]# mysql -e "change master to master_host='192.168.98.5',master_user='repl',master_password='123456' ,MASTER_AUTO_POSITION=1;"
ERROR 3081 (HY000) at line 1: This operation cannot be performed with running replication threads; run STOP SLAVE FOR CHANNEL '' first

解决办法:把主从复制停止,然后切换到主库,再把主从复制开启解决故障:

[root@db02 ~]# mysql -e  "stop slave;"
[root@db02 ~]# mysql -e "change master to master_host='192.168.98.5',master_user='repl',master_password='123456' ,MASTER_AUTO_POSITION=1;"
[root@db02 ~]# mysql -e  "start slave;"
[root@db02 ~]# mysql -e  "show slave status \G"|grep Running:
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
MHA上检查信息:
[root@db03 ~]# masterha_check_repl  --conf=/etc/mha/app1.cnf 
MySQL Replication Health is OK.

后面就可以正常开启MHA,再检查主库信息,修复完成。
启动MHA:

nohup masterha_manager --conf=/etc/mha/app1.cnf --remove_dead_master_conf --ignore_last_failover  < /dev/null> /var/log/mha/app1/manager.log 2>&1 &
检查状态:
masterha_check_status --conf=/etc/mha/app1.cnf

相关文章

网友评论

    本文标题:Zabbix集群升级

    本文链接:https://www.haomeiwen.com/subject/xcxlyltx.html