Backing Up and Restoring the Etcd Database of a K8s Cluster

Author: 前浪浪奔浪流 | Published 2022-01-07 16:27

1. Etcd Data Backup and Recovery

By default, etcd stores its data under the working directory of the etcd process. Looking inside that data directory, we find it split into two subdirectories:

  • snap: holds snapshot data. etcd takes these snapshots to keep the WAL files from growing without bound; they capture the state of the etcd data.
  • wal: holds the write-ahead log, whose main job is to record the full history of every data change. In etcd, every modification must be written to the WAL before it is committed.

Preparation:

Create a directory to hold the backups
mkdir -p /backup_$(date +%Y%m%d)
Back up the /etc/kubernetes directory
cp -r /etc/kubernetes/ /backup_$(date +%Y%m%d)/
Back up the /var/lib/etcd directory
cp -r /var/lib/etcd/ /backup_$(date +%Y%m%d)/
Back up the /var/lib/kubelet directory
cp -r /var/lib/kubelet/ /backup_$(date +%Y%m%d)/
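The three copy steps above can be wrapped in one small script so they are easy to run from cron. The sketch below is hypothetical (the function name and the parameterized destination/source list are our additions, not part of the original steps); parameterizing the paths also makes it possible to dry-run the script away from a real master node.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the mkdir/cp backup steps above.
set -eu

backup_k8s_dirs() {
  local dest_base="$1"; shift                  # base dir, e.g. "/" for /backup_YYYYMMDD
  local dest="${dest_base%/}/backup_$(date +%Y%m%d)"
  mkdir -p "$dest"
  local d
  for d in "$@"; do                            # e.g. /etc/kubernetes /var/lib/etcd /var/lib/kubelet
    if [ -d "$d" ]; then
      cp -r "$d" "$dest/"
    fi
  done
  echo "$dest"                                 # print where the backup landed
}
```

On a real master it would be invoked as `backup_k8s_dirs / /etc/kubernetes /var/lib/etcd /var/lib/kubelet`.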

In a k8s cluster created with kubeadm, etcd runs as a container, so backing up and restoring the database requires copying the etcdctl command out of the container into /usr/bin/ on the node you operate from:
docker cp $(docker ps  |  grep -v etcd-mirror | grep -w etcd | awk '{print $1}'):/usr/local/bin/etcdctl /usr/bin/

A. Single-node etcd data backup and recovery

For this scenario a file-based backup is enough. A default kubeadm installation persists etcd's data on the host under /var/lib/etcd/. Back up the files in this directory on a regular schedule; if the etcd data is ever damaged, simply restore the files into this directory and the single-node etcd data is recovered.

(Tip: if the etcd container is running, the files cannot simply be overwritten. In that case, change the etcd version number in etcd's manifest file [/etc/kubernetes/manifests/etcd.yaml], then stop the etcd container with docker stop; it will no longer be restarted automatically. After the data has been restored, change the etcd version back to the correct one and the kubelet service will bring the etcd container back up on its own.)
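The version-number trick in the tip can be sketched as a one-line sed. This is a hypothetical helper (the function name and the `invalid` tag are ours); it breaks the image tag in the etcd static-pod manifest so kubelet cannot pull or recreate the container, after which `docker stop` keeps it down.

```shell
# Hypothetical sketch of the tip above: corrupt the image tag in the etcd
# static-pod manifest (normally /etc/kubernetes/manifests/etcd.yaml) so
# kubelet cannot restart the container; a .bak copy is kept for reverting.
pin_bad_etcd_tag() {
  local manifest="$1"
  # Replace whatever follows "etcd:" on the image line with a bogus tag.
  sed -i.bak 's|\(image: .*etcd:\).*|\1invalid|' "$manifest"
}
```

Reverting is just `mv etcd.yaml.bak etcd.yaml` once the data has been restored.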

Single-master cluster installed with kubeadm

V3 API:

Back up the etcd data (with ETCDCTL_API=3) into the backup directory created earlier.

ETCDCTL_API=3 etcdctl --endpoints="https://127.0.0.1:2379"  --cert="/etc/kubernetes/pki/etcd/server.crt"  --key="/etc/kubernetes/pki/etcd/server.key"  --cacert="/etc/kubernetes/pki/etcd/ca.crt"   snapshot save /backup_$(date +%Y%m%d)/snap-$(date +%Y%m%d%H%M).db
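Before trusting a backup, it is worth a quick sanity check. The helper below is a hypothetical sketch (the function name is ours): it fails on a missing or zero-byte file, and when etcdctl is on the PATH it additionally prints the snapshot's hash, revision, key count, and size via `etcdctl snapshot status`.

```shell
# Hypothetical sanity check for a freshly written snapshot file.
check_snapshot() {
  local snap="$1"
  # A zero-byte or missing snapshot means the backup failed.
  [ -s "$snap" ] || { echo "snapshot missing or empty: $snap" >&2; return 1; }
  # If etcdctl is available, also print hash/revision/keys/size as a table.
  if command -v etcdctl >/dev/null 2>&1; then
    ETCDCTL_API=3 etcdctl snapshot status "$snap" --write-out=table
  fi
}
```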

Restore

Restore steps:
First stop the kube-apiserver and etcd containers on the Master node, and make sure kube-apiserver has actually stopped.
Note: /etc/kubernetes/manifests is the directory of static pods that the master runs automatically; move it away or rename it and those containers will stop on their own.

cd /etc/kubernetes/
ll

Stop kube-apiserver and etcd on the Master machine

mv manifests  manifests.bak

Check whether etcd and the apiserver are still up, and wait until both have stopped

docker ps|grep k8s_ 

Before stopping (screenshot omitted)

After stopping: check that the pods for the apiserver and etcd have completely stopped (screenshot omitted)

Once the manifests directory has been renamed, pods can no longer be listed.
This shows how important the manifests directory is, so it is a good idea to back it up regularly as well.

# kubectl get pods -A
The connection to the server 192.168.100.201:6443 was refused - did you specify the right host or port?

Move /var/lib/etcd aside

mv /var/lib/etcd /var/lib/etcd.bak

Restore the etcd data. Note the explicit --data-dir flag: without it, snapshot restore writes into ./default.etcd under the current directory instead of the /var/lib/etcd directory that was just moved aside.

ETCDCTL_API=3 etcdctl --endpoints="https://127.0.0.1:2379"  --cert="/etc/kubernetes/pki/etcd/server.crt"  --key="/etc/kubernetes/pki/etcd/server.key"  --cacert="/etc/kubernetes/pki/etcd/ca.crt"   snapshot restore /backup_20220108/snap-202201081337.db --data-dir=/var/lib/etcd

Notes:
1) ETCDCTL_API=3 selects the v3 version of the Etcd API;
2) The endpoints can be found with the command below; there are usually two IPs, 127.0.0.1 and the machine's LAN IP, e.g. 192.168.100.201

#kubectl describe pod etcd-master -n kube-system| grep listen-client-urls
      --listen-client-urls=https://127.0.0.1:2379,https://192.168.100.201:2379

Restore the manifests

mv manifests.bak  manifests

Check whether the pods have come back to normal

kubectl get pod -n kube-system

V2 API:

Back up the etcd data (with ETCDCTL_API=2) into the backup directory created earlier. (Not verified)

# etcdctl backup --data-dir /home/etcd/ --backup-dir /home/etcd_backup

Restore (in the v2 scheme the restore is done by starting the etcd server itself on the backup directory, forcing a new single-member cluster)

# etcd -data-dir=/home/etcd_backup/ -force-new-cluster

Backing up the Etcd database of a binary-deployed cluster

First install the etcdctl command-line tool

yum install -y etcd

V3 API:

# ETCDCTL_API=3  etcdctl snapshot save snap.20220107.db --cacert=/etc/etcd/ssl/ca.pem --cert=/etc/etcd/ssl/etcd.pem --key=/etc/etcd/ssl/etcd-key.pem --endpoints="https://192.168.119.72:2379"

{"level":"info","ts":1630499882.9289303,"caller":"snapshot/v3_snapshot.go:119","msg":"created temporary db file","path":"snap.db.part"}
{"level":"info","ts":"2022-01-07T20:38:02.933+0800","caller":"clientv3/maintenance.go:200","msg":"opened snapshot stream; downloading"}
{"level":"info","ts":1630499882.933808,"caller":"snapshot/v3_snapshot.go:127","msg":"fetching snapshot","endpoint":"https://192.168.119.72:2379"}
{"level":"info","ts":"2022-01-07T20:38:03.040+0800","caller":"clientv3/maintenance.go:208","msg":"completed snapshot read; closing"}
{"level":"info","ts":1630499883.0697453,"caller":"snapshot/v3_snapshot.go:142","msg":"fetched snapshot","endpoint":"https://192.168.119.72:2379","size":"13 MB","took":0.140736973}
{"level":"info","ts":1630499883.0698237,"caller":"snapshot/v3_snapshot.go:152","msg":"saved","path":"snap.db"}
Snapshot saved at snap.20220107.db
# ls -ltr
-rw-------  1 root root 12906528 1月   7 20:38 snap.20220107.db

Restoring the Etcd database of a binary-deployed cluster

The binary-cluster restore steps below have not been verified on an actual binary-deployed cluster; they are theoretical only. Do not run them directly in production!

systemctl stop kube-apiserver
systemctl stop etcd
mv /var/lib/etcd/default.etcd /var/lib/etcd/default.etcd.bak

---> If you do not know where the binary cluster keeps its etcd database, check with:

# systemctl cat etcd.service
 
# ETCDCTL_API=3 etcdctl snapshot restore /data/backup/etcd-snapshot-previous.db --data-dir=/var/lib/etcd/default.etcd
 
# chown -R etcd:etcd /var/lib/etcd

# systemctl start kube-apiserver
# systemctl start etcd.service

B. Backup and recovery of etcd cluster data

Multi-master cluster installed with kubeadm

V3 API:

Back up the etcd data (with ETCDCTL_API=3) into the backup directory created earlier.
The backup operation can be performed on each of the master nodes.

ETCDCTL_API=3 etcdctl --endpoints="https://127.0.0.1:2379"  --cert="/etc/kubernetes/pki/etcd/server.crt"  --key="/etc/kubernetes/pki/etcd/server.key"  --cacert="/etc/kubernetes/pki/etcd/ca.crt"   snapshot save /backup_$(date +%Y%m%d)/snap-$(date +%Y%m%d%H%M).db

Restore

Restore steps:
First stop kube-apiserver and etcd on all Master nodes, and make sure kube-apiserver has actually stopped.

Perform the same operations on master1, master2, and master3.

cd /etc/kubernetes/
ll

Stop kube-apiserver and etcd on the Master machines

mv manifests  manifests.bak

Check whether etcd and the apiserver are still up, and wait until both have stopped

docker ps|grep k8s_

Before stopping (screenshot omitted)

After stopping: check that the pods for the apiserver and etcd have completely stopped (screenshot omitted)

Move /var/lib/etcd aside

mv /var/lib/etcd /var/lib/etcd.bak

Restore the etcd data
All cluster members are restored from the same snapshot, so copy it to the other masters first:

scp /backup_20220108/snap-202201081337.db root@192.168.100.172:/backup_20220108/
scp /backup_20220108/snap-202201081337.db root@192.168.100.173:/backup_20220108/

Run on master1

ETCDCTL_API=3 etcdctl snapshot restore /backup_20220108/snap-202201081337.db \
    --endpoints=192.168.100.171:2379 \
    --name=master1 \
    --cert=/etc/kubernetes/pki/etcd/server.crt \
    --key=/etc/kubernetes/pki/etcd/server.key \
    --cacert=/etc/kubernetes/pki/etcd/ca.crt \
    --initial-advertise-peer-urls=https://192.168.100.171:2380 \
    --initial-cluster-token=etcd-cluster-0 \
    --initial-cluster=master1=https://192.168.100.171:2380,master2=https://192.168.100.172:2380,master3=https://192.168.100.173:2380 \
    --data-dir=/var/lib/etcd

Run on master2

ETCDCTL_API=3 etcdctl snapshot restore /backup_20220108/snap-202201081337.db \
    --endpoints=192.168.100.172:2379 \
    --name=master2 \
    --cert=/etc/kubernetes/pki/etcd/server.crt \
    --key=/etc/kubernetes/pki/etcd/server.key \
    --cacert=/etc/kubernetes/pki/etcd/ca.crt \
    --initial-advertise-peer-urls=https://192.168.100.172:2380 \
    --initial-cluster-token=etcd-cluster-0 \
    --initial-cluster=master1=https://192.168.100.171:2380,master2=https://192.168.100.172:2380,master3=https://192.168.100.173:2380 \
    --data-dir=/var/lib/etcd

Run on master3

ETCDCTL_API=3 etcdctl snapshot restore /backup_20220108/snap-202201081337.db \
    --endpoints=192.168.100.173:2379 \
    --name=master3 \
    --cert=/etc/kubernetes/pki/etcd/server.crt \
    --key=/etc/kubernetes/pki/etcd/server.key \
    --cacert=/etc/kubernetes/pki/etcd/ca.crt \
    --initial-advertise-peer-urls=https://192.168.100.173:2380 \
    --initial-cluster-token=etcd-cluster-0 \
    --initial-cluster=master1=https://192.168.100.171:2380,master2=https://192.168.100.172:2380,master3=https://192.168.100.173:2380 \
    --data-dir=/var/lib/etcd
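Since the three restore invocations differ only in the member name and the advertise IP, they can be generated instead of hand-edited on each node. This is a hypothetical helper (the function name and argument order are ours); it reuses the same cluster string and flags as the commands above.

```shell
# Hypothetical generator for the per-node `snapshot restore` commands above:
# only --name and the advertised peer IP differ between the three masters.
gen_restore_cmd() {
  local name="$1" ip="$2" snap="$3"
  local cluster="master1=https://192.168.100.171:2380,master2=https://192.168.100.172:2380,master3=https://192.168.100.173:2380"
  printf 'ETCDCTL_API=3 etcdctl snapshot restore %s --name=%s --initial-advertise-peer-urls=https://%s:2380 --initial-cluster-token=etcd-cluster-0 --initial-cluster=%s --data-dir=/var/lib/etcd\n' \
    "$snap" "$name" "$ip" "$cluster"
}
```

For example, `gen_restore_cmd master2 192.168.100.172 /backup_20220108/snap-202201081337.db` prints the exact command to paste on master2.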

Notes:
1) ETCDCTL_API=3 selects the v3 version of the Etcd API;
2) If you do not know the value for --name=, look it up with the command below

List the cluster members
ETCDCTL_API=3 etcdctl --endpoints 192.168.100.171:2379,192.168.100.172:2379,192.168.100.173:2379 --cert="/etc/kubernetes/pki/etcd/server.crt"  --key="/etc/kubernetes/pki/etcd/server.key"  --cacert="/etc/kubernetes/pki/etcd/ca.crt" member list --write-out=table
Result:

+------------------+---------+---------+------------------------------+------------------------------+------------+
|        ID        | STATUS  |  NAME   |          PEER ADDRS          |         CLIENT ADDRS         | IS LEARNER |
+------------------+---------+---------+------------------------------+------------------------------+------------+
| 442ee8f1d97e7dcd | started | master3 | https://192.168.100.173:2380 | https://192.168.100.173:2379 |      false |
| 4972579f39eb9468 | started | master1 | https://192.168.100.171:2380 | https://192.168.100.171:2379 |      false |
| 4bff6a42b677cc19 | started | master2 | https://192.168.100.172:2380 | https://192.168.100.172:2379 |      false |
+------------------+---------+---------+------------------------------+------------------------------+------------+

Restore the manifests on all three master nodes

mv manifests.bak  manifests

Check whether the pods have come back to normal

kubectl get pod -n kube-system

Result right after the restore (screenshot omitted)

Result about one minute after the restore (screenshot omitted)

Backup of a multi-etcd-node cluster installed by binary deployment:

First install the etcdctl command-line tool

yum install -y etcd

Backup

ETCDCTL_API=3 etcdctl \
snapshot save snap.db \
--endpoints=https://192.168.10.160:2379 \
--cacert=/opt/etcd/ssl/ca.pem \
--cert=/opt/etcd/ssl/server.pem \
--key=/opt/etcd/ssl/server-key.pem

Restore
First stop kube-apiserver and etcd

systemctl stop kube-apiserver
systemctl stop etcd
mv /var/lib/etcd/default.etcd /var/lib/etcd/default.etcd.bak
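The `mv ... .bak` step above collides if a restore has to be attempted more than once (the .bak name is already taken). A timestamped variant avoids that; this is a hypothetical sketch, not part of the original procedure.

```shell
# Hypothetical variant of the mv step above: stash the old data directory
# under a timestamped name so repeated restore attempts never collide.
stash_etcd_data() {
  local data_dir="${1:-/var/lib/etcd/default.etcd}"
  if [ -d "$data_dir" ]; then
    mv "$data_dir" "${data_dir}.bak.$(date +%Y%m%d%H%M%S)"
  fi
}
```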

Restore on each node (each of the commands below is run on its own node)

ETCDCTL_API=3 etcdctl snapshot restore snap.db \
--name etcd-1 \
--initial-cluster="etcd-1=https://192.168.10.160:2380,etcd-2=https://192.168.10.161:2380,etcd-3=https://192.168.10.162:2380" \
--initial-advertise-peer-urls=https://192.168.10.160:2380 \
--data-dir=/var/lib/etcd/default.etcd
ETCDCTL_API=3 etcdctl snapshot restore snap.db \
--name etcd-2 \
--initial-cluster="etcd-1=https://192.168.10.160:2380,etcd-2=https://192.168.10.161:2380,etcd-3=https://192.168.10.162:2380" \
--initial-advertise-peer-urls=https://192.168.10.161:2380 \
--data-dir=/var/lib/etcd/default.etcd
ETCDCTL_API=3 etcdctl snapshot restore snap.db \
--name etcd-3 \
--initial-cluster="etcd-1=https://192.168.10.160:2380,etcd-2=https://192.168.10.161:2380,etcd-3=https://192.168.10.162:2380" \
--initial-advertise-peer-urls=https://192.168.10.162:2380 \
--data-dir=/var/lib/etcd/default.etcd

Start kube-apiserver and etcd. (The restore above has already recreated /var/lib/etcd/default.etcd, so do not move the .bak directory back over it; keep the .bak aside until the cluster is confirmed healthy.)

systemctl start kube-apiserver
systemctl start etcd.service

Reference: https://blog.csdn.net/cnskylee/article/details/120048464

Reference: etcd disaster recovery documentation

Reference: Kubernetes documentation (Chinese)
https://kubernetes.io/zh/docs/tasks/administer-cluster/configure-upgrade-etcd/

Reference: https://blog.csdn.net/qq_27234433/article/details/113731407

Backup and recovery of the Kubernetes ETCD cluster
https://blog.csdn.net/heian_99/article/details/123398209
