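A Kubernetes and KubeSphere cluster deployed in All-in-One mode on a virtual machine went through frequent VM reboots and shutdowns. After one of those boots, the following happened (k in the commands below is presumably a shell alias for kubectl).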
root@shawn-virtual-machine:~# k get node
E0902 13:17:50.626394 13014 memcache.go:265] couldn't get current server API group list: Get "https://lb.kubesphere.local:6443/api?timeout=32s": dial tcp 192.168.17.18:6443: connect: connection refused
E0902 13:17:50.626662 13014 memcache.go:265] couldn't get current server API group list: Get "https://lb.kubesphere.local:6443/api?timeout=32s": dial tcp 192.168.17.18:6443: connect: connection refused
E0902 13:17:50.628043 13014 memcache.go:265] couldn't get current server API group list: Get "https://lb.kubesphere.local:6443/api?timeout=32s": dial tcp 192.168.17.18:6443: connect: connection refused
E0902 13:17:50.628178 13014 memcache.go:265] couldn't get current server API group list: Get "https://lb.kubesphere.local:6443/api?timeout=32s": dial tcp 192.168.17.18:6443: connect: connection refused
E0902 13:17:50.629708 13014 memcache.go:265] couldn't get current server API group list: Get "https://lb.kubesphere.local:6443/api?timeout=32s": dial tcp 192.168.17.18:6443: connect: connection refused
The connection to the server lb.kubesphere.local:6443 was refused - did you specify the right host or port?
root@shawn-virtual-machine:~# k get pod -A
E0902 13:17:54.921584 13084 memcache.go:265] couldn't get current server API group list: Get "https://lb.kubesphere.local:6443/api?timeout=32s": dial tcp 192.168.17.18:6443: connect: connection refused
E0902 13:17:54.922120 13084 memcache.go:265] couldn't get current server API group list: Get "https://lb.kubesphere.local:6443/api?timeout=32s": dial tcp 192.168.17.18:6443: connect: connection refused
E0902 13:17:54.923808 13084 memcache.go:265] couldn't get current server API group list: Get "https://lb.kubesphere.local:6443/api?timeout=32s": dial tcp 192.168.17.18:6443: connect: connection refused
E0902 13:17:54.923964 13084 memcache.go:265] couldn't get current server API group list: Get "https://lb.kubesphere.local:6443/api?timeout=32s": dial tcp 192.168.17.18:6443: connect: connection refused
E0902 13:17:54.925327 13084 memcache.go:265] couldn't get current server API group list: Get "https://lb.kubesphere.local:6443/api?timeout=32s": dial tcp 192.168.17.18:6443: connect: connection refused
The connection to the server lb.kubesphere.local:6443 was refused - did you specify the right host or port?
1. Check the docker status
systemctl status docker
2. Check the kubelet status
systemctl status kubelet
3. Check the state of port 6443
netstat -pnlt | grep 6443
Nothing is listening on port 6443.
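The port check in step 3 can be wrapped in a small helper, which is convenient when the check is repeated after each attempted fix. This is only an illustrative sketch: the function name port_is_listening is made up, and it assumes input in the column layout printed by ss -ltn (from iproute2) rather than netstat.

```shell
# Return 0 if a LISTEN socket on the given TCP port appears in the input,
# 1 otherwise. Reads `ss -ltn`-style output on stdin, so the parsing can be
# exercised without touching the live system.
# port_is_listening is a hypothetical helper name, not an existing tool.
port_is_listening() {
  awk -v p="$1" '
    # Column 4 is the local address, e.g. 0.0.0.0:6443, *:6443 or [::]:6443.
    # The port is whatever follows the last colon.
    $1 == "LISTEN" {
      n = split($4, parts, ":")
      if (parts[n] == p) found = 1
    }
    END { exit !found }
  '
}
```

A typical call would be: ss -ltn | port_is_listening 6443 || echo "nothing is listening on 6443"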
4. View the kubelet logs
journalctl -xeu kubelet
5. Suspecting that etcd may be the problem, check the etcd status
systemctl status etcd
ETCDCTL_API=3 etcdctl --endpoints 192.168.17.18:2379 \
--cert=/etc/ssl/etcd/ssl/node-shawn-virtual-machine.pem \
--key=/etc/ssl/etcd/ssl/node-shawn-virtual-machine-key.pem \
--cacert=/etc/ssl/etcd/ssl/ca.pem \
member list
ETCDCTL_API=3 etcdctl --endpoints 192.168.17.18:2379 \
--cert=/etc/ssl/etcd/ssl/node-shawn-virtual-machine.pem \
--key=/etc/ssl/etcd/ssl/node-shawn-virtual-machine-key.pem \
--cacert=/etc/ssl/etcd/ssl/ca.pem \
endpoint health
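Steps 1, 2, and 5 each inspect one systemd unit. As a rough sketch, the three checks can be collapsed into a single pass with systemctl is-active, which prints a one-word state for a unit. The function name report_units is hypothetical, and the unit names docker, kubelet, and etcd are simply the ones used above.

```shell
# Print "<unit>: <state>" for each unit name given as an argument.
# report_units is a hypothetical helper name.
report_units() {
  for unit in "$@"; do
    # `systemctl is-active` prints a state such as active, inactive or failed
    # and exits non-zero unless the unit is active, hence the `|| true`.
    state=$(systemctl is-active "$unit" 2>/dev/null) || true
    printf '%s: %s\n' "$unit" "${state:-unknown}"
  done
}
```

For example, report_units docker kubelet etcd prints one line per service, which makes it easy to spot the unit that is not active before digging into its logs with journalctl.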
6. Start etcd manually
etcd --data-dir=/var/lib/etcd --listen-client-urls=http://192.168.17.18:2379 \
--advertise-client-urls=http://192.168.17.18:2379
The startup fails with an error:
panic: freepages: failed to get all reachable pages (key[0]=(hex)616c61726d on leaf page(437) needs to be < than key of the next element in ancestor (hex)000000000000b3f85f0000000000000000. Pages stack: [3100 437])
7. Restore from the backup (/root/tmp/snapshot.db)
rm -rf /var/lib/etcd
etcdutl --data-dir=/var/lib/etcd snapshot restore /root/tmp/snapshot.db
root@shawn-virtual-machine:/var/lib# etcdutl --data-dir=/var/lib/etcd snapshot restore /root/tmp/snapshot.db
2024-09-02T14:28:13+08:00 info snapshot/v3_snapshot.go:260 restoring snapshot {"path": "/root/tmp/snapshot.db", "wal-dir": "/var/lib/etcd/member/wal", "data-dir": "/var/lib/etcd", "snap-dir": "/var/lib/etcd/member/snap"}
2024-09-02T14:28:13+08:00 info membership/store.go:141 Trimming membership information from the backend...
2024-09-02T14:28:13+08:00 info membership/cluster.go:421 added member {"cluster-id": "cdf818194e3a8c32", "local-member-id": "0", "added-peer-id": "8e9e05c52164694d", "added-peer-peer-urls": ["http://localhost:2380"]}
2024-09-02T14:28:13+08:00 info snapshot/v3_snapshot.go:287 restored snapshot {"path": "/root/tmp/snapshot.db", "wal-dir": "/var/lib/etcd/member/wal", "data-dir": "/var/lib/etcd", "snap-dir": "/var/lib/etcd/member/snap"}
root@shawn-virtual-machine:/var/lib#
root@shawn-virtual-machine:/var/lib# ETCDCTL_API=3 etcdctl --endpoints 192.168.17.18:2379 --cert=/etc/ssl/etcd/ssl/node-shawn-virtual-machine.pem --key=/etc/ssl/etcd/ssl/node-shawn-virtual-machine-key.pem --cacert=/etc/ssl/etcd/ssl/ca.pem member list
8e9e05c52164694d, started, etcd-shawn-virtual-machine, http://localhost:2380, https://192.168.17.18:2379, false
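Step 7 deletes the damaged data directory before restoring. A more cautious variant, sketched below, renames the directory instead, so the corrupted files remain available for later inspection. The helper names move_aside and restore_etcd are hypothetical; the etcdutl invocation is the same one shown above, and etcd is assumed to be stopped while this runs.

```shell
# Rename a directory to <dir>.bak.<timestamp> and print the new path.
# Does nothing and prints nothing if the directory does not exist.
# move_aside is a hypothetical helper name.
move_aside() {
  dir="${1%/}"
  [ -d "$dir" ] || return 0
  backup="${dir}.bak.$(date +%Y%m%d%H%M%S)"
  mv -- "$dir" "$backup" && printf '%s\n' "$backup"
}

# Restore a snapshot into data_dir, keeping the old directory as a backup.
# restore_etcd is a hypothetical wrapper; it assumes etcd is not running.
restore_etcd() {
  snapshot="$1"
  data_dir="$2"
  # Refuse to touch the data directory if the snapshot file is missing.
  [ -f "$snapshot" ] || { echo "snapshot not found: $snapshot" >&2; return 1; }
  move_aside "$data_dir" >/dev/null || return 1
  etcdutl --data-dir="$data_dir" snapshot restore "$snapshot"
}
```

With the paths from this post, the call would be restore_etcd /root/tmp/snapshot.db /var/lib/etcd. Checking for the snapshot first matters because removing /var/lib/etcd without a usable backup leaves nothing to restore from.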
8. After etcd starts successfully, running k get node reports an error
Error from server (Forbidden): nodes is forbidden: User "kubernetes-admin" cannot list resource "nodes" in API group "" at the cluster scope
# kubeadm certs certificate-key
63881154cf600a52f90fc673e5dfaf529d0a91eca548bcb3afc6465407dd344b
# kubeadm init phase upload-certs --upload-certs --certificate-key 63881154cf600a52f90fc673e5dfaf529d0a91eca548bcb3afc6465407dd344b