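The cluster at a customer site lost power unexpectedly, and around noon we began recovering it remotely. When we started the etcd service, it failed with the following error: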
member c77b7b06d2075637 has already been bootstrapped
The references we found explain it as follows:
One of the member was bootstrapped via discovery service. You must remove the previous data-dir to clean up the member information. Or the member will ignore the new configuration and start with the old configuration. That is why you see the mismatch.
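In other words, once a member has been bootstrapped, what is recorded in its data-dir takes precedence over the startup configuration, and the bootstrap-only --initial-* flags are ignored.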
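With that, the problem is clear: etcd fails to start because the information recorded in the data-dir (/var/lib/etcd/default.etcd) does not match the information given by etcd's startup options.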
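Solution: remove this node's etcd member from the cluster, delete its local data (the data is re-synced from the other members afterwards), and then add the member back to the etcd cluster.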
1. List the current etcd members (run this on a healthy node, since 127.0.0.1:2379 must reach a running etcd)
export ETCDCTL_API=3
etcdctl --endpoints=https://127.0.0.1:2379 --cacert=/etc/kubernetes/ssl/ca.pem --cert=/etc/etcd/ssl/etcd.pem --key=/etc/etcd/ssl/etcd-key.pem member list
c666144c29031acd, started, etcd-host0, https://20.140.249.65:2380, https://20.140.249.65:2379
c77b7b06d2075637, started, etcd-host1, https://20.140.249.66:2380, https://20.140.249.66:2379
f11a3a48abfa96dd, started, etcd-host2, https://20.140.249.67:2380, https://20.140.249.67:2379
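Rather than copying the member ID by hand, it can be looked up from this output. A minimal sketch, assuming the default v3 `member list` output shown above (fields separated by a comma and a space, name in the third column); `member_id_by_name` is a hypothetical helper, not part of etcdctl:

```shell
# Hypothetical helper: print the ID of the member whose name is $1.
# Reads v3 `etcdctl member list` output (default "simple" format) on stdin;
# each line looks like: ID, STATUS, NAME, PEER_ADDRS, CLIENT_ADDRS[, IS_LEARNER]
member_id_by_name() {
  local name="$1"
  # Split on ", " and print column 1 (ID) where column 3 (NAME) matches.
  awk -F', ' -v n="$name" '$3 == n { print $1 }'
}
```

For example, piping the `member list` command above (same flags) into `member_id_by_name etcd-host1` should print c77b7b06d2075637 for the list shown, and the result can then be passed to `member remove`.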
2. Remove the failing member
export ETCDCTL_API=3
etcdctl --endpoints=https://127.0.0.1:2379 --cacert=/etc/kubernetes/ssl/ca.pem --cert=/etc/etcd/ssl/etcd.pem --key=/etc/etcd/ssl/etcd-key.pem member remove c77b7b06d2075637
Member c77b7b06d2075637 removed from cluster 7ab1847bce8f7723
3. On the failing node (etcd-host1, 20.140.249.66), edit /usr/lib/systemd/system/etcd.service:
[Unit]
Description=Etcd Server
After=network.target
After=network-online.target
Wants=network-online.target
Documentation=https://github.com/coreos
[Service]
Type=notify
WorkingDirectory=/app/etcd/
ExecStart=/usr/local/bin/etcd \
--name=etcd-host1 \
--data-dir=/app/etcd \
--cert-file=/etc/etcd/ssl/etcd.pem \
--key-file=/etc/etcd/ssl/etcd-key.pem \
--trusted-ca-file=/etc/kubernetes/ssl/ca.pem \
--peer-cert-file=/etc/etcd/ssl/etcd.pem \
--peer-key-file=/etc/etcd/ssl/etcd-key.pem \
--peer-trusted-ca-file=/etc/kubernetes/ssl/ca.pem \
--peer-client-cert-auth \
--client-cert-auth \
--initial-advertise-peer-urls=https://20.140.249.66:2380 \
--listen-peer-urls=https://20.140.249.66:2380 \
--listen-client-urls=https://20.140.249.66:2379,https://127.0.0.1:2379 \
--advertise-client-urls=https://20.140.249.66:2379 \
--initial-cluster-token=etcd-cluster-0 \
--initial-cluster=etcd-host0=https://20.140.249.65:2380,etcd-host1=https://20.140.249.66:2380,etcd-host2=https://20.140.249.67:2380 \
--initial-cluster-state=existing
# --initial-cluster-state changed from "new" to "existing": this member joins a cluster that already exists.
Restart=on-failure
RestartSec=5
LimitNOFILE=65536
[Install]
WantedBy=multi-user.target
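etcd looks up --name in --initial-cluster and expects the peer URL listed there to match this node, so the three settings have to agree; a mismatch such as --name=etcd-host0 combined with etcd-host1's address 20.140.249.66 stops the member from starting. A small pre-flight check, sketched under the assumption that each --initial-cluster entry has the form NAME=PEER_URL with one URL per member; `name_in_initial_cluster` is a hypothetical helper:

```shell
# Hypothetical pre-flight check: succeed only if "NAME=PEER_URL" is one of the
# comma-separated entries of the --initial-cluster value.
name_in_initial_cluster() {
  local name="$1" peer_url="$2" cluster="$3"
  local entry
  local IFS=','            # split the cluster string on commas only
  for entry in $cluster; do
    [ "$entry" = "$name=$peer_url" ] && return 0
  done
  return 1
}
```

Run it with the values from the unit file before `systemctl start etcd`, e.g. `name_in_initial_cluster etcd-host1 https://20.140.249.66:2380 "$CLUSTER" || echo "name/peer URL mismatch"`, where CLUSTER holds the --initial-cluster string.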
4. Delete the data
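systemctl stop etcd    # stop the restart loop (Restart=on-failure) before touching the data
rm -rf /var/lib/etcd/
rm -rf /app/etcd/      # --data-dir and WorkingDirectory in etcd.service
mkdir -p /app/etcd     # systemd refuses to start the unit if WorkingDirectory does not exist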
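Since the member re-syncs its data from the cluster, deleting is fine in principle, but moving the directory aside keeps a copy in case the rejoin goes wrong. A sketch with a hypothetical `reset_data_dir` helper; it assumes etcd has already been stopped, and it recreates the directory (mode 0700) because /app/etcd/ is also the unit's WorkingDirectory, which systemd requires to exist:

```shell
# Hypothetical helper: move an etcd data directory aside instead of deleting it,
# then recreate it empty with owner-only permissions.
# Assumes etcd is already stopped (systemctl stop etcd).
reset_data_dir() {
  local dir="${1%/}"                  # /app/etcd/ -> /app/etcd
  [ -n "$dir" ] || return 1           # refuse to operate on "/" or an empty path
  local backup
  backup="${dir}.bak.$(date +%Y%m%d%H%M%S)"
  if [ -e "$dir" ]; then
    mv "$dir" "$backup" || return 1
    echo "moved $dir to $backup"
  fi
  mkdir -p "$dir" && chmod 700 "$dir" # empty dir so etcd starts as a fresh member
}
```

Usage: `systemctl stop etcd && reset_data_dir /app/etcd`. The backup directory can be removed once the member shows as started and healthy.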
5. Add the member back to the cluster (again on a healthy node)
export ETCDCTL_API=2
etcdctl --endpoints=https://127.0.0.1:2379 --ca-file=/etc/kubernetes/ssl/ca.pem --cert-file=/etc/etcd/ssl/etcd.pem --key-file=/etc/etcd/ssl/etcd-key.pem member add etcd-host1 https://20.140.249.66:2380
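The command above uses the v2 API (ETCDCTL_API=2, with --ca-file/--cert-file/--key-file and the peer URL as a positional argument). With ETCDCTL_API=3, the same step uses the TLS flags from steps 1 and 2 and passes the peer URL through --peer-urls. A sketch wrapped in a hypothetical function `add_member_v3`, using the certificate paths from this article:

```shell
# Sketch: the same "member add" via the v3 API.
# $1 = member name, $2 = peer URL of the node being re-added.
add_member_v3() {
  local name="$1" peer_url="$2"
  ETCDCTL_API=3 etcdctl \
    --endpoints=https://127.0.0.1:2379 \
    --cacert=/etc/kubernetes/ssl/ca.pem \
    --cert=/etc/etcd/ssl/etcd.pem \
    --key=/etc/etcd/ssl/etcd-key.pem \
    member add "$name" --peer-urls="$peer_url"
}
```

Usage, on a healthy member: `add_member_v3 etcd-host1 https://20.140.249.66:2380`.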
6. Start etcd on the failing node; the re-added member re-syncs its data from the other two members
systemctl daemon-reload && systemctl start etcd
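To confirm the rejoin, check the member's status in `member list`. The re-added member receives a new ID and, as far as we know, is listed as `unstarted` with an empty name until it actually starts, so this hypothetical helper `member_status_by_peer` matches on the peer URL instead (assuming the same v3 output format as in step 1):

```shell
# Hypothetical helper: print the STATUS column ("started" or "unstarted") of the
# member whose peer URL is $1, reading v3 `etcdctl member list` output on stdin.
# Matches on the peer URL because an added-but-unstarted member has an empty name.
member_status_by_peer() {
  local peer_url="$1"
  awk -F', ' -v p="$peer_url" '$4 == p { print $2 }'
}
```

Piping the step 1 `member list` command into `member_status_by_peer https://20.140.249.66:2380` should eventually print `started`.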