美文网首页Zabbix
Zabbix集群升级

Zabbix集群升级

作者: 苏水的北 | 来源:发表于2021-06-18 17:58 被阅读0次

    一、###Zabbix集群构建:

    Mysql的MHA架构清单:

    主机名 IP 主/从 MHA高可用软件 读写分离软件
    db01 192.168.98.5 绑定VIP:192.168.98.92
    db02 192.168.98.6
    db03 192.168.98.13 MHA软件部署位置 读写分离软件 部署位置

    Zabbix的web端清单:

    主机名 IP 主/从 功能
    zabbix01 192.168.98.5 zabbix服务端
    zabbix02 192.168.98.6 zabbix服务端
    负载均衡 192.168.98.6 提供负载均衡功能

    二、给db01、db02、db03服务器部署Mysql数据库:

    注意:下面只给出db01上搭建mysql的步骤,db02和db03用相同的方法搭建。
    2.1 创建相关目录:

    [root@db01 ~]# mkdir /application           ----> 软件的存放目录
    [root@db01 /]# mkdir /data/3306/data -p     ----> 数据的存放位置
    

    2.2 创建用户:

    [root@db01 /]# useradd -s /sbin/nologin -M mysql
    

    2.3 上传软件并解压处理:

    [root@db01 application]# ll
    total 1079692
    -rw-r--r--  1 mysql mysql 644869837 Aug  5 09:31 mysql-5.7.26-linux-glibc2.12-x86_64.tar.gz
    [root@db01 /application]# tar xf mysql-5.7.26-linux-glibc2.12-x86_64
    [root@db01 /application]# mv mysql-5.7.26-linux-glibc2.12-x86_64 mysql
    [root@db01 application]# ll
    total 1079692
    drwxr-xr-x  9 mysql mysql       129 Aug  5 12:22 mysql
    -rw-r--r--  1 mysql mysql 644869837 Aug  5 09:31 mysql-5.7.26-linux-glibc2.12-x86_64.tar.gz
    注:软件包可以去mysql.com官网下载
    

    2.4 数据初始化(创建系统数据):

    [root@db01 ~]# rpm -qa|grep maria
    mariadb-libs-5.5.60-1.el7_5.x86_64
    [root@db01 ~]# yum  remove mariadb-libs-5.5.60-1.el7_5.x86_64 -y
    "注:由于centos7系统自带mariadb数据库,所以必须要将其删除。"
    [root@db01 ~]# vim /etc/profile
    export PATH=/application/mysql/bin:$PATH
    [root@db01 ~]# mysql -V   //次命令用于检查mysql版本信息
    mysql  Ver 14.14 Distrib 5.7.26, for linux-glibc2.12 (x86_64) using  EditLine wrapper
    [root@db01 data]# mysqld --initialize-insecure --user=mysql --basedir=/application/mysql --datadir=/data/3306/data  //数据库初始化命令
    2019-08-05T08:41:34.391208Z 0 [Warning] TIMESTAMP with implicit DEFAULT value is deprecated. Please use --explicit_defaults_for_timestamp server option (see documentation for more details).
    2019-08-05T08:41:36.705525Z 0 [Warning] InnoDB: New log files created, LSN=45790
    2019-08-05T08:41:36.910614Z 0 [Warning] InnoDB: Creating foreign key constraint system tables.
    2019-08-05T08:41:37.021365Z 0 [Warning] No existing UUID has been found, so we assume that this is the first time that this server has been started. Generating a new UUID: d63578ab-b75c-11e9-a331-000c29a71104.
    2019-08-05T08:41:37.022771Z 0 [Warning] Gtid table is not ready to be used. Table 'mysql.gtid_executed' cannot be opened.
    2019-08-05T08:41:37.023736Z 1 [Warning] root@localhost is created with an empty password ! Please consider switching off the --initialize-insecure option.
    

    2.5 书写配置文件:

    cat >/etc/my.cnf <<EOF
    [mysqld]
    user=mysql
    basedir=/application/mysql
    datadir=/data/3306/data
    socket=/tmp/mysql.sock
    [mysql]
    socket=/tmp/mysql.sock
    prompt=wuxin[\\d]>   //wuxin变量可以自己定义
    EOF
    

    2.6 准备启动脚本:

    [root@db01 ~]#chown -R mysql.mysql /data/* /application/mysql/*
    [root@db01 ~]#cp /application/mysql/support-files/mysql.server /etc/init.d/mysqld 
    [root@db01 ~]# /etc/init.d/mysqld start   //启动mysql方法
    Starting MySQL.Logging to '/data/3306/data/db01.err'.
    SUCCESS! 
    

    2.7 使用systemd管理mysql:

    [root@db01 ~]# /etc/init.d/mysqld start   //因为要用systemctl去管理,所以先按上一步的方法关闭mysql
    [root@db01 ~]# cat > /etc/systemd/system/mysqld.service <<EOF
    [Unit]
    Description=MySQL Server
    Documentation=man:mysqld(8)
    Documentation=http://dev.mysql.com/doc/refman/en/using-systemd.html
    After=network.target
    After=syslog.target
    [Install]
    WantedBy=multi-user.target
    [Service]
    User=mysql
    Group=mysql
    ExecStart=/application/mysql/bin/mysqld --defaults-file=/etc/my.cnf
    LimitNOFILE = 5000
    EOF
    注意:将原来模式启动mysqld先关闭,然后再用systemd管理。
    

    2.8 用systemctl管理启动数据库:

    [root@db01 ~]#systemctl  start   mysqld
    

    三、给db01、db02、db03三台数据库构建主从关系,开启GTID复制功能:

    3.1 清理环境,三台数据库都操作:

    pkill mysqld
     \rm -rf /data/3306/data/*
     \rm -rf /data/binlog/*
    mkdir -p /data/3306/data 
    mkdir -p /data/binlog/
    chown -R mysql.mysql  /data/*
    

    3.2 准备配置文件,每台数据库填写各自的配置文件:

    # 主库db01:
    cat > /etc/my.cnf <<EOF
    [mysqld]
    basedir=/application/mysql/
    datadir=/data/3306/data
    socket=/tmp/mysql.sock
    server_id=51
    port=3306
    secure-file-priv=/tmp
    autocommit=0
    log_bin=/data/binlog/mysql-bin
    binlog_format=row
    gtid-mode=on
    enforce-gtid-consistency=true
    log-slave-updates=1
    [mysql]
    prompt=db01 [\\d]>
    EOF
    
    
    # slave1(db02):
    
    cat > /etc/my.cnf <<EOF
    [mysqld]
    basedir=/application/mysql
    datadir=/data/3306/data
    socket=/tmp/mysql.sock
    server_id=52
    port=3306
    secure-file-priv=/tmp
    autocommit=0
    log_bin=/data/binlog/mysql-bin
    binlog_format=row
    gtid-mode=on
    enforce-gtid-consistency=true
    log-slave-updates=1
    [mysql]
    prompt=db02 [\\d]>
    EOF
    
    # slave2(db03):
    
    cat > /etc/my.cnf <<EOF
    [mysqld]
    basedir=/application/mysql
    datadir=/data/3306/data
    socket=/tmp/mysql.sock
    server_id=53
    port=3306
    secure-file-priv=/tmp
    autocommit=0
    log_bin=/data/binlog/mysql-bin
    binlog_format=row
    gtid-mode=on
    enforce-gtid-consistency=true
    log-slave-updates=1
    [mysql]
    prompt=db03 [\\d]>
    EOF
    

    3.3 初始化数据,三台数据库都执行:

    mysqld --initialize-insecure --user=mysql --basedir=/application/mysql  --datadir=/data/3306/data 
    

    3.4 启动数据库,三台数据库都启动:

    systemctl start  mysqld
    

    3.5 构建主从:
    db01主库执行:

    mysql -e "grant replication slave  on *.* to repl@'192.168.%.%' identified by '123456';"
    

    db02从库和db03从库分别执行:

    mysql -e "change master to master_host='192.168.98.5',master_user='repl',master_password='123456' ,MASTER_AUTO_POSITION=1;"
    mysql -e  "start slave;"
    [root@db02 ~]# mysql -e  "show slave status \G"|grep Running:
                 Slave_IO_Running: Yes
                Slave_SQL_Running: Yes
    [root@db03 ~]# mysql -e  "show slave status \G"|grep Running:
                 Slave_IO_Running: Yes
                Slave_SQL_Running: Yes
    

    总结:做完上述步骤,就完成了一主两从的搭建。

    四、MHA环境搭建

    规划:
    主库:
    192.168.98.5 主库
    VIP:192.168.98.92
    从库:
    192.168.98.6 从库
    192.168.98.13 从库、MHA软件的manager管理端
    4.1、配置关键程序软连接(三台数据库都执行):

    ln -s /application/mysql/bin/mysqlbinlog    /usr/bin/mysqlbinlog
    ln -s /application/mysql/bin/mysql          /usr/bin/mysql
    

    4.2、配置各节点互信(在db01上执行):

    rm -rf /root/.ssh 
    ssh-keygen
    cd /root/.ssh 
    mv id_rsa.pub authorized_keys
    scp  -r  /root/.ssh  192.168.98.6:/root 
    scp  -r  /root/.ssh  192.168.98.13:/root 
    

    各节点验证
    db01:

    ssh 192.168.98.5 date
    ssh 192.168.98.6 date
    ssh 192.168.98.13 date
    

    db02:

    ssh 192.168.98.5 date
    ssh 192.168.98.6 date
    ssh 192.168.98.13 date
    

    db03:

    ssh 192.168.98.5 date
    ssh 192.168.98.6 date
    ssh 192.168.98.13 date
    

    4.3、安装软件
    4.3.1、下载mha软件
    mha官网:https://code.google.com/archive/p/mysql-master-ha/
    github下载地址:https://github.com/yoshinorim/mha4mysql-manager/wiki/Downloads

    4.3.2、所有节点安装Node软件依赖包(三台数据库都执行):

    yum install perl-DBD-MySQL -y
    rpm -ivh mha4mysql-node-0.56-0.el6.noarch.rpm
    

    4.3.3、在db01主库中创建mha需要的用户(因为主从已经搭建完成,所以在db01上创建mha用户,db02和db03上面也会存在mha用户):

    grant all privileges on *.* to mha@'192.168.%.%' identified by 'mha';
    

    4.3.4、Manager软件安装(因为db03作为MHA软件的manager,所以只在它上面安装)

    yum install -y perl-Config-Tiny epel-release perl-Log-Dispatch perl-Parallel-ForkManager perl-Time-HiRes
    rpm -ivh mha4mysql-manager-0.56-0.el6.noarch.rpm
    

    4.3.5、配置文件准备(db03上配置mha的配置文件)
    创建配置文件目录

     mkdir -p /etc/mha
    

    创建日志目录

     mkdir -p /var/log/mha/app1
    

    编辑mha配置文件:

    vim /etc/mha/app1.cnf
    [server default]
    manager_log=/var/log/mha/app1/manager   ##MHA架构manager软件的日志(工作流程日志)
    manager_workdir=/var/log/mha/app1      ##工作路径      
    master_binlog_dir=/data/binlog       ##主库的二进制日志路径
    user=mha          ##连接数据库用户                         
    password=mha    ##连接数据库密码                           
    ping_interval=2    ##ping时间间隔
    repl_password=123456  ##主从复制用户密码
    repl_user=repl     ##主从复制用户
    ssh_user=root        ##互信用户                       
    [server1]                                   
    hostname=192.168.98.5
    port=3306                                  
    [server2]            
    hostname=192.168.98.6
    port=3306
    [server3]
    hostname=192.168.98.13
    port=3306
    

    4.3.6、 状态检查
    4.3.6.1、互信检查(db03里面执行)

    [root@db03 ~]# masterha_check_ssh  --conf=/etc/mha/app1.cnf
    
    Fri Apr 19 16:39:34 2019 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
    Fri Apr 19 16:39:34 2019 - [info] Reading application default configuration from /etc/mha/app1.cnf..
    Fri Apr 19 16:39:34 2019 - [info] Reading server configuration from /etc/mha/app1.cnf..
    Fri Apr 19 16:39:34 2019 - [info] Starting SSH connection tests..
    Fri Apr 19 16:39:35 2019 - [debug] 
    Fri Apr 19 16:39:34 2019 - [debug]  Connecting via SSH from root@10.0.0.51(10.0.0.51:22) to root@10.0.0.52(10.0.0.52:22)..
    Fri Apr 19 16:39:34 2019 - [debug]   ok.
    Fri Apr 19 16:39:34 2019 - [debug]  Connecting via SSH from root@10.0.0.51(10.0.0.51:22) to root@10.0.0.53(10.0.0.53:22)..
    Fri Apr 19 16:39:35 2019 - [debug]   ok.
    Fri Apr 19 16:39:36 2019 - [debug] 
    Fri Apr 19 16:39:35 2019 - [debug]  Connecting via SSH from root@10.0.0.52(10.0.0.52:22) to root@10.0.0.51(10.0.0.51:22)..
    Fri Apr 19 16:39:35 2019 - [debug]   ok.
    Fri Apr 19 16:39:35 2019 - [debug]  Connecting via SSH from root@10.0.0.52(10.0.0.52:22) to root@10.0.0.53(10.0.0.53:22)..
    Fri Apr 19 16:39:35 2019 - [debug]   ok.
    Fri Apr 19 16:39:37 2019 - [debug] 
    Fri Apr 19 16:39:35 2019 - [debug]  Connecting via SSH from root@10.0.0.53(10.0.0.53:22) to root@10.0.0.51(10.0.0.51:22)..
    Fri Apr 19 16:39:35 2019 - [debug]   ok.
    Fri Apr 19 16:39:35 2019 - [debug]  Connecting via SSH from root@10.0.0.53(10.0.0.53:22) to root@10.0.0.52(10.0.0.52:22)..
    Fri Apr 19 16:39:36 2019 - [debug]   ok.
    Fri Apr 19 16:39:37 2019 - [info] All SSH connection tests passed successfully.
    

    4.3.6.2、主从状态检查(db03里面)

    [root@db03 ~]# masterha_check_repl  --conf=/etc/mha/app1.cnf 
    Fri Jun 18 05:40:31 2021 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
    Fri Jun 18 05:40:31 2021 - [info] Reading application default configuration from /etc/mha/app1.cnf..
    Fri Jun 18 05:40:31 2021 - [info] Reading server configuration from /etc/mha/app1.cnf..
    Fri Jun 18 05:40:31 2021 - [info] MHA::MasterMonitor version 0.56.
    Fri Jun 18 05:40:32 2021 - [info] GTID failover mode = 1
    Fri Jun 18 05:40:32 2021 - [info] Dead Servers:
    Fri Jun 18 05:40:32 2021 - [info] Alive Servers:
    Fri Jun 18 05:40:32 2021 - [info]   192.168.98.5(192.168.98.5:3306)
    Fri Jun 18 05:40:32 2021 - [info]   192.168.98.6(192.168.98.6:3306)
    Fri Jun 18 05:40:32 2021 - [info]   192.168.98.13(192.168.98.13:3306)
    Fri Jun 18 05:40:32 2021 - [info] Alive Slaves:
    Fri Jun 18 05:40:32 2021 - [info]   192.168.98.6(192.168.98.6:3306)  Version=5.7.26-log (oldest major version between slaves) log-bin:enabled
    Fri Jun 18 05:40:32 2021 - [info]     GTID ON
    Fri Jun 18 05:40:32 2021 - [info]     Replicating from 192.168.98.5(192.168.98.5:3306)
    Fri Jun 18 05:40:32 2021 - [info]     Primary candidate for the new Master (candidate_master is set)
    Fri Jun 18 05:40:32 2021 - [info]   192.168.98.13(192.168.98.13:3306)  Version=5.7.26-log (oldest major version between slaves) log-bin:enabled
    Fri Jun 18 05:40:32 2021 - [info]     GTID ON
    Fri Jun 18 05:40:32 2021 - [info]     Replicating from 192.168.98.5(192.168.98.5:3306)
    Fri Jun 18 05:40:32 2021 - [info] Current Alive Master: 192.168.98.5(192.168.98.5:3306)
    Fri Jun 18 05:40:32 2021 - [info] Checking slave configurations..
    Fri Jun 18 05:40:32 2021 - [info]  read_only=1 is not set on slave 192.168.98.6(192.168.98.6:3306).
    Fri Jun 18 05:40:32 2021 - [info]  read_only=1 is not set on slave 192.168.98.13(192.168.98.13:3306).
    Fri Jun 18 05:40:32 2021 - [info] Checking replication filtering settings..
    Fri Jun 18 05:40:32 2021 - [info]  binlog_do_db= , binlog_ignore_db= 
    Fri Jun 18 05:40:32 2021 - [info]  Replication filtering check ok.
    Fri Jun 18 05:40:32 2021 - [info] GTID (with auto-pos) is supported. Skipping all SSH and Node package checking.
    Fri Jun 18 05:40:32 2021 - [info] Checking SSH publickey authentication settings on the current master..
    Fri Jun 18 05:40:32 2021 - [info] HealthCheck: SSH to 192.168.98.5 is reachable.
    Fri Jun 18 05:40:32 2021 - [info] 
    192.168.98.5(192.168.98.5:3306) (current master)
     +--192.168.98.6(192.168.98.6:3306)
     +--192.168.98.13(192.168.98.13:3306)
    
    Fri Jun 18 05:40:32 2021 - [info] Checking replication health on 192.168.98.6..
    Fri Jun 18 05:40:32 2021 - [info]  ok.
    Fri Jun 18 05:40:32 2021 - [info] Checking replication health on 192.168.98.13..
    Fri Jun 18 05:40:32 2021 - [info]  ok.
    Fri Jun 18 05:40:32 2021 - [info] Checking master_ip_failover_script status:
    Fri Jun 18 05:40:32 2021 - [info]   /usr/local/bin/master_ip_failover --command=status --ssh_user=root --orig_master_host=192.168.98.5 --orig_master_ip=192.168.98.5 --orig_master_port=3306 
    
    
    IN SCRIPT TEST====/sbin/ifconfig ens160:1 down==/sbin/ifconfig ens160:1 192.168.98.92/24===
    
    Checking the Status of the script.. OK 
    Fri Jun 18 05:40:32 2021 - [info]  OK.
    Fri Jun 18 05:40:32 2021 - [warning] shutdown_script is not defined.
    Fri Jun 18 05:40:32 2021 - [info] Got exit code 0 (Not master dead).
    
    MySQL Replication Health is OK.
    

    4.3.7、 开启MHA(db03):

    [root@db03 ~]#nohup masterha_manager --conf=/etc/mha/app1.cnf --remove_dead_master_conf --ignore_last_failover  < /dev/null> /var/log/mha/app1/manager.log 2>&1 &
    

    停止MHA命令:

    [root@db03 ~]#masterha_stop --conf=/etc/mha/app1.cnf
    

    4.3.8、 查看MHA状态(db03):

    [root@db03 ~]# masterha_check_status --conf=/etc/mha/app1.cnf
    app1 (pid:1101) is running(0:PING_OK), master:192.168.98.5
    

    4.4 故障模拟及处理

    (1) 停主库db01:

    systemctl stop mysqld
    

    备注:观察manager 日志 tail -f /var/log/mha/app1/manager
    末尾必须显示successfully,才算正常切换成功。
    (2) 主机宕机,修复主库

    [root@db01 ~]# systemctl start  mysqld
    

    备注:启动完成主库后,集群并不认识,所以需要后续操作。

    (3) 修复主从(把它重新加入MHA集群,当做从库添加进去)

    [root@db01 ~]#mysql -e "change master to master_host(新主库ip)='192.168.98.6',master_user='repl',master_password='123456' ,MASTER_AUTO_POSITION=1;"
    [root@db01 ~]#mysql -e  "start slave;"
    [root@db01 ~]# mysql -e  "show slave status \G"|grep Running:
    Slave_IO_Running: Yes
    Slave_SQL_Running: Yes
    

    (4) 恢复配置文件(在db03中做)

    [root@db03 /etc/mha]# cat app1.cnf 
    [server default]
    manager_log=/var/log/mha/app1/manager
    manager_workdir=/var/log/mha/app1
    master_binlog_dir=/data/binlog
    master_ip_failover_script=/usr/local/bin/master_ip_failover
    password=mha
    ping_interval=2
    repl_password=123456
    repl_user=repl
    report_script=/usr/local/bin/send
    ssh_user=root
    user=mha
    
    [server1]   ##server1节点重新加入,因为之前他宕机了,系统把它自动清理掉。
    candidate_master=1
    hostname=192.168.98.5
    port=3306
    
    [server2]
    candidate_master=1
    hostname=192.168.98.6
    port=3306
    
    [server3]
    hostname=192.168.98.13
    port=3306
    

    (5) 启动MHA

    nohup masterha_manager --conf=/etc/mha/app1.cnf --remove_dead_master_conf --ignore_last_failover  < /dev/null> /var/log/mha/app1/manager.log 2>&1 &
    

    检查状态:

    masterha_check_status --conf=/etc/mha/app1.cnf
    

    备注:故障事遇到问题,如何修复:
    主库由db2切换至db1,但是在db2上指定主库db1时候报错:

    [root@db02 ~]# mysql -e "change master to master_host='192.168.98.5',master_user='repl',master_password='123456' ,MASTER_AUTO_POSITION=1;"
    ERROR 3081 (HY000) at line 1: This operation cannot be performed with running replication threads; run STOP SLAVE FOR CHANNEL '' first
    

    解决办法:把主从复制停止,然后切换到主库,再把主从复制开启解决故障:

    [root@db02 ~]# mysql -e  "stop slave;"
    [root@db02 ~]# mysql -e "change master to master_host='192.168.98.5',master_user='repl',master_password='123456' ,MASTER_AUTO_POSITION=1;"
    [root@db02 ~]# mysql -e  "start slave;"
    [root@db02 ~]# mysql -e  "show slave status \G"|grep Running:
                 Slave_IO_Running: Yes
                Slave_SQL_Running: Yes
    MHA上检查信息:
    [root@db03 ~]# masterha_check_repl  --conf=/etc/mha/app1.cnf 
    MySQL Replication Health is OK.
    

    后面就可以正常开启MHA,再检查主库信息,修复完成。
    启动MHA:

    nohup masterha_manager --conf=/etc/mha/app1.cnf --remove_dead_master_conf --ignore_last_failover  < /dev/null> /var/log/mha/app1/manager.log 2>&1 &
    检查状态:
    masterha_check_status --conf=/etc/mha/app1.cnf
    

    相关文章

      网友评论

        本文标题:Zabbix集群升级

        本文链接:https://www.haomeiwen.com/subject/xcxlyltx.html