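I. Background
MongoDB primary/secondary data synchronization
When the primary holds less data than a secondary, the secondary rolls back until it matches the primary's state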
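It follows that the startup order matters. Suppose replication between primary and secondary had already broken down, so that the primary held far more data than the secondary, and you had not noticed. Then the primary, secondary, and arbiter processes all went down at once.
If you start the former secondary and the arbiter first, the former secondary is elected as the new PRIMARY. When the former primary is started afterwards, it joins as a secondary, and its data is rolled back.
That is clearly not what you want, so always start the former primary first, so that it becomes the primary again and this problem is avoided.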
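II. MongoDB restart playbook
To restart the MongoDB instances of a primary-secondary-secondary replica set:
1. Determine the IP of the previous primary from the logs of the three nodes
2. Start the mongo process on the primary node
3. Start the mongo processes on the remaining secondary nodes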
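Step 1 can be tried out on its own before running it through Ansible. The following is a minimal sketch, not the playbook's exact pipeline: it assumes plain-text mongod log lines of the form `... Member <ip>:<port> is now in state PRIMARY` (the format implied by the grep/awk pipeline in the playbook), and the function name `last_primary_ip` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the IP of the last member that this node's log
# reports as having entered state PRIMARY.
# Assumption: log lines look like
#   <timestamp> ... Member 10.0.0.2:27017 is now in state PRIMARY
# Prints nothing if the log contains no such line.
last_primary_ip() {
    log_file="$1"
    grep -w "now in state PRIMARY" "$log_file" \
        | tail -n 1 \
        | sed -n 's/.*Member \([^: ]*\):.*/\1/p'
}
```

Calling `last_primary_ip /path/to/mongod.log` on each node gives the three values that are compared in the next step. Newer MongoDB releases write structured JSON logs, for which this pattern would probably need adjusting.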
$ cat hosts
[mongo]
xx.xx.xx.xx ansible_ssh_host=xx.xx.xx.xx ansible_ssh_pass=XXX
xx.xx.xx.xx ansible_ssh_host=xx.xx.xx.xx ansible_ssh_pass=XXX
xx.xx.xx.xx ansible_ssh_host=xx.xx.xx.xx ansible_ssh_pass=XXX
[all:vars]
ansible_ssh_extra_args='-o StrictHostKeyChecking=no'
ansible_hosts_dir='/path/to/'
instance_name='XXX'
instance_port='XXX'
$ cat start_primary-secondary-secondary.yml
# Find the IP of the primary before the power outage from each instance's log
- hosts: mongo
  gather_facts: false
  tasks:
    - name: "mongo"
      block:
        - name: "print mongo instance name"
          debug:
            msg: "#################### mongo ####################"
        - name: "select primary ip from log"
          # Take the last "now in state PRIMARY" line, keep the text between
          # "Member" and ":", and strip the leading space left by the split.
          shell: |
            primary_ip=`grep -w "now in state PRIMARY" /path/to/mongod.log | awk 'END {print}' | awk -F "Member" '{print $2}' | awk -F ":" '{print $1}' | tr -d ' '`
            echo "${primary_ip}"
          register: mongo_primary_ip
        - name: "print mongo primary ip"
          debug:
            msg: "{{ mongo_primary_ip.stdout }}"
# Check that the primary IP is consistent across the logs of the 3 nodes
- hosts: localhost
  gather_facts: false
  tasks:
    - name: "localhost"
      block:
        - name: "check the consistency of primary ip from log"
          # Use stdout (trimmed) rather than stdout_lines[0], which fails
          # when a node's log yielded no IP and stdout_lines is empty.
          shell: |
            ip0="{{ hostvars[groups['mongo'][0]].mongo_primary_ip.stdout | trim }}"
            ip1="{{ hostvars[groups['mongo'][1]].mongo_primary_ip.stdout | trim }}"
            ip2="{{ hostvars[groups['mongo'][2]].mongo_primary_ip.stdout | trim }}"
            if [ "${ip0}${ip1}" = "" ] || [ "${ip1}${ip2}" = "" ] || [ "${ip0}${ip2}" = "" ]; then
              echo "At least two primary IPs from the instance logs are empty; cannot decide, please check!"
              exit 1
            fi
            if [ "${ip0}" = "${ip1}" ]; then
              echo "${ip0}"
            elif [ "${ip1}" = "${ip2}" ]; then
              echo "${ip1}"
            elif [ "${ip0}" = "${ip2}" ]; then
              echo "${ip2}"
            else
              echo "The primary IPs from the instance logs all differ; cannot decide, please check!"
              exit 1
            fi
          register: consistent_primary_ip
        - name: "print consistent primary ip"
          debug:
            msg: "{{ consistent_primary_ip.stdout }}"
        # If at least 2 nodes agree on the primary IP, start the primary; if that fails, abort the startup procedure
        - name: "start primary"
          shell: ansible -i {{ ansible_hosts_dir }}/hosts "{{ consistent_primary_ip.stdout }}" -m shell -a "sh /path/to/start_mongo.sh"
        - name: "check if primary is started"
          shell: |
            if ansible -i {{ ansible_hosts_dir }}/hosts "{{ consistent_primary_ip.stdout }}" -m shell -a 'ps aux | grep -w "mongod" | grep -w "{{ instance_name }}" | grep -w "{{ instance_port }}" | grep -v "grep"'; then
              echo "Primary started successfully!"
            else
              echo "Primary failed to start; aborting the startup procedure, please check!"
              exit 1
            fi
          register: primary_start_result
        - name: "print primary start result"
          debug:
            msg: "{{ primary_start_result.stdout_lines }}"
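# Start the secondaries (nodes where the instance is already running are skipped)
- hosts: mongo
  gather_facts: false
  tasks:
    - name: "mongo"
      block:
        - name: "start secondary"
          # Literal block scalar (|) so the if/fi line breaks reach the shell intact
          shell: |
            if ! ps aux | grep -w "mongod" | grep -w "{{ instance_name }}" | grep -w "{{ instance_port }}" | grep -v "grep" > /dev/null 2>&1; then
              sh /path/to/start_mongo.sh
            fi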
$ ansible-playbook -i hosts start_primary-secondary-secondary.yml
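The consistency check in the second play is a two-out-of-three vote over the IPs reported by the nodes. A minimal standalone sketch of that vote, useful for checking the logic outside Ansible, might look as follows; the function name `majority_ip` is hypothetical and not part of the playbook.

```shell
#!/bin/sh
# Hypothetical helper: given the primary IP reported by each of three nodes
# (an empty string when a node's log had none), print the IP that at least
# two nodes agree on. Returns 1 and prints nothing when no non-empty value
# is shared by two nodes.
majority_ip() {
    if [ -n "$1" ] && { [ "$1" = "$2" ] || [ "$1" = "$3" ]; }; then
        echo "$1"
    elif [ -n "$2" ] && [ "$2" = "$3" ]; then
        echo "$2"
    else
        return 1
    fi
}
```

Requiring a non-empty value before comparing means that two nodes with empty results never count as agreement, which matches the playbook's first guard.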
III. References
MongoDB replica set stateStr state descriptions
https://www.jianshu.com/p/7f196c22af43