Cleaning Up the Embedded etcd Data in Single-Node Rancher 2.1.7

Author: InGramViking | Published: 2021-11-04 12:39

The Rancher 2.1.7 container has no etcdctl command inside it, so the cleanup is done by starting a separate etcd container.

Error reported in the logs:

Failed to update lock: etcdserver: mvcc: database space exceeded

References:
Rancher Etcd inner db cannot clean
single install etcd troubleshooting

Replace <NAME_OF_RANCHER_CONTAINER> with the CONTAINER ID of the rancher-server container.
If there is no k8s configuration, the certificate parts of the commands can be omitted.

v2.1.8 only (for 2.2.x, see https://gist.github.com/superseb/f223b15949c031983da2cb850f56a897)

When the etcd database size exceeds its space quota, etcd raises a NOSPACE alarm and rejects writes with the error mvcc: database space exceeded.
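
To make the quota condition concrete, the sketch below compares a database size in bytes with a quota. It assumes etcd's default backend quota of 2 GiB, which this post does not confirm for Rancher's embedded etcd. The names quota_state and DEFAULT_QUOTA_BYTES are made up for this example, and the comparison is a simplification of etcd's internal check.

```shell
#!/usr/bin/env bash
# Assumption: default etcd backend quota of 2 GiB; Rancher's embedded etcd
# may be configured differently.
DEFAULT_QUOTA_BYTES=$((2 * 1024 * 1024 * 1024))

# quota_state DB_SIZE_BYTES [QUOTA_BYTES]
# Prints "exceeded" when the size is above the quota, otherwise "ok".
quota_state() {
  local size=$1
  local quota=${2:-$DEFAULT_QUOTA_BYTES}
  if [ "$size" -gt "$quota" ]; then
    echo exceeded
  else
    echo ok
  fi
}

quota_state 7400000      # roughly 7.4 MB, prints: ok
quota_state 2200000000   # roughly 2.2 GB, prints: exceeded
```

Under that assumption, a database reported as about 2.1 GB sits right at the quota, which matches the NOSPACE alarm shown in the outputs below.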

Rancher container is running

You can get the current status of etcd by running:

# Copy needed etcd certificates
$ docker cp <NAME_OF_RANCHER_CONTAINER>:/etc/kubernetes/ssl etcdssl
$ docker run --rm --net=container:<NAME_OF_RANCHER_CONTAINER> -v $PWD/etcdssl:/etc/kubernetes/ssl -e ETCDCTL_API=3 -e ETCDCTL_CACERT=/etc/kubernetes/ssl/kube-ca.pem -e ETCDCTL_CERT=/etc/kubernetes/ssl/kube-etcd-127-0-0-1.pem -e ETCDCTL_KEY=/etc/kubernetes/ssl/kube-etcd-127-0-0-1-key.pem rancher/rke-tools:v0.1.27 bash -c "etcdctl endpoint status --write-out=table"                                                                                                                                                                                                  
+----------------+------------------+---------+---------+-----------+-----------+------------+
|    ENDPOINT    |        ID        | VERSION | DB SIZE | IS LEADER | RAFT TERM | RAFT INDEX |
+----------------+------------------+---------+---------+-----------+-----------+------------+
| 127.0.0.1:2379 | e92d66acd89ecf29 | 3.2.13  | 2.1 GB  | true      |         3 |       5852 |
+----------------+------------------+---------+---------+-----------+-----------+------------+
$ docker run --rm --net=container:<NAME_OF_RANCHER_CONTAINER> -v $PWD/etcdssl:/etc/kubernetes/ssl -e ETCDCTL_API=3 -e ETCDCTL_CACERT=/etc/kubernetes/ssl/kube-ca.pem -e ETCDCTL_CERT=/etc/kubernetes/ssl/kube-etcd-127-0-0-1.pem -e ETCDCTL_KEY=/etc/kubernetes/ssl/kube-etcd-127-0-0-1-key.pem rancher/rke-tools:v0.1.27 bash -c "etcdctl alarm list"
memberID:16802198677343883049 alarm:NOSPACE
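
Every command in this section repeats the same long docker run prefix. A small wrapper can shorten it. This is only a sketch: the function name rancher_etcdctl and the variable RANCHER_CONTAINER are made up here, and it assumes the certificates were already copied to ./etcdssl with the docker cp command above. Because the arguments are joined into one string for bash -c, arguments that contain spaces are not preserved.

```shell
#!/usr/bin/env bash
# Hypothetical helper: runs etcdctl from the rancher/rke-tools image inside
# the network namespace of the Rancher container.
# RANCHER_CONTAINER must hold the Rancher container name or ID.
rancher_etcdctl() {
  local ssl=/etc/kubernetes/ssl
  docker run --rm \
    --net="container:${RANCHER_CONTAINER:?set RANCHER_CONTAINER first}" \
    -v "$PWD/etcdssl:$ssl" \
    -e ETCDCTL_API=3 \
    -e ETCDCTL_CACERT="$ssl/kube-ca.pem" \
    -e ETCDCTL_CERT="$ssl/kube-etcd-127-0-0-1.pem" \
    -e ETCDCTL_KEY="$ssl/kube-etcd-127-0-0-1-key.pem" \
    rancher/rke-tools:v0.1.27 bash -c "etcdctl $*"
}

# Usage (not executed here):
#   RANCHER_CONTAINER=<NAME_OF_RANCHER_CONTAINER>
#   rancher_etcdctl endpoint status --write-out=table
#   rancher_etcdctl alarm list
```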

Compact and defragment:

$ rev=$(docker run --rm --net=container:<NAME_OF_RANCHER_CONTAINER> -v $PWD/etcdssl:/etc/kubernetes/ssl -e ETCDCTL_API=3 -e ETCDCTL_CACERT=/etc/kubernetes/ssl/kube-ca.pem -e ETCDCTL_CERT=/etc/kubernetes/ssl/kube-etcd-127-0-0-1.pem -e ETCDCTL_KEY=/etc/kubernetes/ssl/kube-etcd-127-0-0-1-key.pem rancher/rke-tools:v0.1.27 bash -c "etcdctl endpoint status --write-out json | egrep -o '\"revision\":[0-9]*' | egrep -o '[0-9]*'")                                                                                                                                       
$ echo $rev
5456
$ docker run --rm --net=container:<NAME_OF_RANCHER_CONTAINER> -v $PWD/etcdssl:/etc/kubernetes/ssl -e ETCDCTL_API=3 -e ETCDCTL_CACERT=/etc/kubernetes/ssl/kube-ca.pem -e ETCDCTL_CERT=/etc/kubernetes/ssl/kube-etcd-127-0-0-1.pem -e ETCDCTL_KEY=/etc/kubernetes/ssl/kube-etcd-127-0-0-1-key.pem rancher/rke-tools:v0.1.27 bash -c "etcdctl compact $rev"                                                                                                                                                                                                                    
compacted revision 5456
$ docker run --rm --net=container:<NAME_OF_RANCHER_CONTAINER> -v $PWD/etcdssl:/etc/kubernetes/ssl -e ETCDCTL_API=3 -e ETCDCTL_CACERT=/etc/kubernetes/ssl/kube-ca.pem -e ETCDCTL_CERT=/etc/kubernetes/ssl/kube-etcd-127-0-0-1.pem -e ETCDCTL_KEY=/etc/kubernetes/ssl/kube-etcd-127-0-0-1-key.pem rancher/rke-tools:v0.1.27 bash -c "etcdctl defrag"
Finished defragmenting etcd member[127.0.0.1:2379]
$ docker run --rm --net=container:<NAME_OF_RANCHER_CONTAINER> -v $PWD/etcdssl:/etc/kubernetes/ssl -e ETCDCTL_API=3 -e ETCDCTL_CACERT=/etc/kubernetes/ssl/kube-ca.pem -e ETCDCTL_CERT=/etc/kubernetes/ssl/kube-etcd-127-0-0-1.pem -e ETCDCTL_KEY=/etc/kubernetes/ssl/kube-etcd-127-0-0-1-key.pem rancher/rke-tools:v0.1.27 bash -c "etcdctl alarm disarm"                                                                                                                                                                                                                       
memberID:16802198677343883049 alarm:NOSPACE
$ docker run --rm --net=container:<NAME_OF_RANCHER_CONTAINER> -v $PWD/etcdssl:/etc/kubernetes/ssl -e ETCDCTL_API=3 -e ETCDCTL_CACERT=/etc/kubernetes/ssl/kube-ca.pem -e ETCDCTL_CERT=/etc/kubernetes/ssl/kube-etcd-127-0-0-1.pem -e ETCDCTL_KEY=/etc/kubernetes/ssl/kube-etcd-127-0-0-1-key.pem rancher/rke-tools:v0.1.27 bash -c "etcdctl alarm list"
<empty>
$ docker run --rm --net=container:<NAME_OF_RANCHER_CONTAINER> -v $PWD/etcdssl:/etc/kubernetes/ssl -e ETCDCTL_API=3 -e ETCDCTL_CACERT=/etc/kubernetes/ssl/kube-ca.pem -e ETCDCTL_CERT=/etc/kubernetes/ssl/kube-etcd-127-0-0-1.pem -e ETCDCTL_KEY=/etc/kubernetes/ssl/kube-etcd-127-0-0-1-key.pem rancher/rke-tools:v0.1.27 bash -c "etcdctl endpoint status --write-out=table"                                                                                                                                                                                                  
+----------------+------------------+---------+---------+-----------+-----------+------------+
|    ENDPOINT    |        ID        | VERSION | DB SIZE | IS LEADER | RAFT TERM | RAFT INDEX |
+----------------+------------------+---------+---------+-----------+-----------+------------+
| 127.0.0.1:2379 | e92d66acd89ecf29 | 3.2.13  | 7.4 MB  | true      |         3 |       6114 |
+----------------+------------------+---------+---------+-----------+-----------+------------+

At this point, the rancher/rancher container should stop logging mvcc: database space exceeded.
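
The egrep pipeline that fills rev in the compaction step can be tried without a running etcd. The sketch below wraps the same two-stage match in a function (the name extract_revision is made up here) and feeds it an abbreviated, illustrative JSON payload. The real endpoint status output contains more fields.

```shell
#!/usr/bin/env bash
# extract_revision: read etcdctl JSON on stdin and print the first
# "revision" value found.
extract_revision() {
  grep -Eo '"revision":[0-9]+' | grep -Eo '[0-9]+' | head -n 1
}

# Illustrative payload only: abbreviated, not captured from a real cluster.
sample='[{"Endpoint":"127.0.0.1:2379","Status":{"header":{"revision":5456,"raft_term":3},"version":"3.2.13"}}]'

rev=$(printf '%s' "$sample" | extract_revision)
echo "$rev"   # prints: 5456
```

The first grep isolates the "revision":N pair so that other numbers in the payload, such as raft_term, are not picked up by the second grep.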

Rancher container keeps crashing/restarting

Rancher keeps restarting (this is the method that solved it for me: stop Rancher first)

If the rancher/rancher container won't stay running, etcd has to be maintained externally, because the rancher/rancher container cannot be used to perform the maintenance.

# Copy needed etcd certificates
$ docker cp <NAME_OF_RANCHER_CONTAINER>:/etc/kubernetes/ssl etcdssl

# Stop the Rancher container (and block restarting)
$ docker stop <NAME_OF_RANCHER_CONTAINER>

# Optionally back up the etcd data first, for example with cp -ravp
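
The backup step mentioned above can be sketched as a small helper. It assumes the etcd data directory is reachable on the host, for example through a bind mount. The function name and both path arguments are placeholders and are not taken from the original procedure.

```shell
#!/usr/bin/env bash
# backup_etcd_dir SRC DEST_PARENT
# Copies SRC into DEST_PARENT under a timestamped name and prints the new path.
# Both arguments are placeholders; the real host path depends on how the
# Rancher container's data directory is mounted.
backup_etcd_dir() {
  local src=$1
  local dest_parent=$2
  local target
  target="$dest_parent/etcd-backup-$(date +%Y%m%d-%H%M%S)"
  mkdir -p "$dest_parent" || return 1
  # -a copies recursively and keeps permissions, ownership and timestamps.
  cp -a "$src" "$target" || return 1
  echo "$target"
}

# Usage (not executed here; both paths are hypothetical):
#   backup_etcd_dir /path/to/rancher/etcd /path/to/backups
```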

# Run etcd container with data dir from Rancher's embedded etcd
$ docker run -d -e ETCDCTL_API=3 --name etcd-maintenance --volumes-from=<NAME_OF_RANCHER_CONTAINER> quay.io/coreos/etcd:v3.2.13 /usr/local/bin/etcd --data-dir=/var/lib/rancher/etcd

# Check etcd status
$ docker exec etcd-maintenance etcdctl endpoint status --write-out=table
+----------------+------------------+---------+---------+-----------+-----------+------------+
|    ENDPOINT    |        ID        | VERSION | DB SIZE | IS LEADER | RAFT TERM | RAFT INDEX |
+----------------+------------------+---------+---------+-----------+-----------+------------+
| 127.0.0.1:2379 | e92d66acd89ecf29 |  3.2.13 |  2.1 GB |      true |         7 |       8773 |
+----------------+------------------+---------+---------+-----------+-----------+------------+
$ docker exec etcd-maintenance etcdctl alarm list
memberID:16802198677343883049 alarm:NOSPACE

# Run compact/defrag
$ rev=$(docker exec etcd-maintenance etcdctl endpoint status --write-out json | egrep -o '"revision":[0-9]*' | egrep -o '[0-9]*')
$ echo $rev
7921
$ docker exec etcd-maintenance etcdctl compact "$rev"
compacted revision 7921
$ docker exec etcd-maintenance etcdctl defrag
Finished defragmenting etcd member[127.0.0.1:2379]
$ docker exec etcd-maintenance etcdctl alarm disarm
memberID:16802198677343883049 alarm:NOSPACE
$ docker exec etcd-maintenance etcdctl alarm list
<empty>
$ docker exec etcd-maintenance etcdctl endpoint status --write-out=table
+----------------+------------------+---------+---------+-----------+-----------+------------+
|    ENDPOINT    |        ID        | VERSION | DB SIZE | IS LEADER | RAFT TERM | RAFT INDEX |
+----------------+------------------+---------+---------+-----------+-----------+------------+
| 127.0.0.1:2379 | e92d66acd89ecf29 |  3.2.13 |  6.3 MB |      true |         7 |       8775 |
+----------------+------------------+---------+---------+-----------+-----------+------------+

# Stop etcd-maintenance container
$ docker stop etcd-maintenance

# Start Rancher
$ docker start <NAME_OF_RANCHER_CONTAINER>
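
The compact, defrag and disarm steps of this second procedure can be collected into one function, to be run after the etcd-maintenance container is up and before it is stopped. This is a sketch under that assumption. The function name etcd_maintenance_cleanup is made up here, and the commands are the same docker exec calls shown above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: repeats the compact/defrag/disarm sequence against the
# etcd-maintenance container. ETCDCTL_API=3 is inherited from the container
# environment set at docker run time.
etcd_maintenance_cleanup() {
  local ctl=(docker exec etcd-maintenance etcdctl)
  local rev
  # Read the current revision from the JSON status output.
  rev=$("${ctl[@]}" endpoint status --write-out json \
        | grep -Eo '"revision":[0-9]+' | grep -Eo '[0-9]+' | head -n 1)
  if [ -z "$rev" ]; then
    echo "could not read the current revision" >&2
    return 1
  fi
  # Stop at the first failing step so a later step never runs on bad state.
  "${ctl[@]}" compact "$rev" &&
    "${ctl[@]}" defrag &&
    "${ctl[@]}" alarm disarm &&
    "${ctl[@]}" alarm list
}

# Usage (not executed here):
#   etcd_maintenance_cleanup
```

Chaining the steps with && keeps the order used in the transcript and avoids disarming the alarm if compaction or defragmentation failed.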
