Replacing a k8s control-plane node (including etcd)


Author: cloudFans | Published 2022-09-27 08:43

    If the first node you replace is the first control-plane node, make sure to follow the procedure below.
    A kubespray deployment assumes that the node in position one performs cluster initialization, for both etcd and kubeadm. So when replacing the first node, you must first move another healthy node into the first position of the inventory.

    Reference: https://github.com/kubernetes-sigs/kubespray/blob/master/docs/nodes.md#3-edit-cluster-info-configmap-in-kube-public-namespace

    1) Change control plane nodes order in inventory

    from

    [kube_control_plane]
    node-1
    node-2
    node-3

    to

    [kube_control_plane]
    node-2
    node-3
    node-1

    2) Remove old first control plane node from cluster

    With the old node still in the inventory, run remove-node.yml. You need to pass -e node=node-1 to the playbook to limit the execution to the node being removed. If the node you want to remove is not online, you should add reset_nodes=false and allow_ungraceful_removal=true to your extra-vars.
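
    For reference, a typical invocation might look like the following sketch; the inventory path is illustrative, and the last two extra vars are only needed when the node is offline:

    ansible-playbook -i inventory/mycluster/hosts.yaml remove-node.yml -b \
        -e node=node-1 \
        -e reset_nodes=false -e allow_ungraceful_removal=true   # only if node-1 is down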

    3) Edit cluster-info configmap in kube-public namespace

    kubectl edit cm -n kube-public cluster-info

    Replace the IP of the old kube_control_plane node with the IP of a live kube_control_plane node (the server field). Also update the certificate-authority-data field if you changed certs.

    4) Add new control plane node

    Update inventory (if needed)

    Run cluster.yml with --limit=kube_control_plane
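
    For example, a run limited to the control plane might look like this (inventory path illustrative):

    ansible-playbook -i inventory/mycluster/hosts.yaml cluster.yml -b --limit=kube_control_plane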

    
    Pay particular attention to step 3: cluster-info must be changed to point at master2 instead of master1.
    
    
    <         certificate-authority-data: LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSUM1ekNDQWMrZ0F3SUJBZ0lCQURBTkJna3Foa2lHOXcwQkFRc0ZBREFWTVJNd0VRWURWUVFERXdwcmRXSmwKY201bGRHVnpNQjRYRFRJeU1Ea3lOREV6TURVMU1sb1hEVE15TURreU1URXpNRFUxTWxvd0ZURVRNQkVHQTFVRQpBeE1LYTNWaVpYSnVaWFJsY3pDQ0FTSXdEUVlKS29aSWh2Y05BUUVCQlFBRGdnRVBBRENDQVFvQ2dnRUJBSjk1Cmw4NXhaUU56akI3VE9sY3FmUSs5OHU3cWZWYkFCMmx1c2NXVGR6NEF4N3hBME1id2U4REJTd1BXUnpEbURUczkKYWJha056VGpEeDUralJlUldaU29EdmZHNDhTaTVUeFdybDRROVdYNGpjMXhUQjJCTDNWTklqUFFBTUxuK0hOaAozVkQ3VjJaYkJLaDRySUpIaEZlVERDV3U1S3kweUtGYnFqS2gvUXZDbUJ1QlJkZlVaQkdha1pFbVZYOWlKd3YvClZlazRkb2pyb3Q4emNRajhGazVQd0RUeE0zREc5My8zS3MySnd3RTBJOWhkZTlBdDlPZTRzdmtuUmgyOTdlb0QKMXltZjRmc1YzU2IxbGFSbG82MnpTblRkWjJXWmhXVHFGK0ZsQ1pNTGs2d3M4dkE4VWdwTFl4U2w5N0tEV0srVgpteVEzeHB3ZTR2TjUxdWpYZU84Q0F3RUFBYU5DTUVBd0RnWURWUjBQQVFIL0JBUURBZ0trTUE4R0ExVWRFd0VCCi93UUZNQU1CQWY4d0hRWURWUjBPQkJZRUZNNzExekJHNy9MNGFoVk5hZjBqNVVUYVRpdlBNQTBHQ1NxR1NJYjMKRFFFQkN3VUFBNElCQVFDV3VsMUo5aHVLUldTSmJoVWU3STdiNVBzMXJRWnNucFZmalRrYkdUdXJ4bTZScWZ2QQpuc0x5NTNiM0swSnRYeFJTK0pTeWFtR000Zzcxck9MRGx1SkJJcFVoVzR4VU9SU0duNDM1cmI1TjNRekZ5RnJsCkwxVCt1YytEY0pFUFg0T092SUlvSzhMbTlNaW1FNXJBcW9JWFpLcVZDZ25UWGN0QlpTOUFZOW1NWUVoaHphOCsKWXJjWTZjMjJZSWIxb0oxMVlDbndiVUZkNm9VVW1YYXUxV3Y0MnFJNHRxYlNMQ0VxRTlZNERlREdBdkpGNFZqawp1cG9sYjQzVzVIWE1wb0gyOGxhejVadE5aMzRqbS83RTM0ZkhJZHAzOVZoWE1LVk9pTlhXKysvaWN3ckVEUit2CjRMelZGTzYyR1dnUDNiRWVreW5hN1F1Rmh5OWs5b0dRVlJWSQotLS0tLUVORCBDRVJUSUZJQ0FURS0tLS0tCg==
    <         server: https://10.120.33.146:6443
    ---
    >         certificate-authority-data: LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSUM1ekNDQWMrZ0F3SUJBZ0lCQURBTkJna3Foa2lHOXcwQkFRc0ZBREFWTVJNd0VRWURWUVFERXdwcmRXSmwKY201bGRHVnpNQjRYRFRJeE1EZ3hNakEyTVRreU0xb1hEVE14TURneE1EQTJNVGt5TTFvd0ZURVRNQkVHQTFVRQpBeE1LYTNWaVpYSnVaWFJsY3pDQ0FTSXdEUVlKS29aSWh2Y05BUUVCQlFBRGdnRVBBRENDQVFvQ2dnRUJBTUVOCmxrNll5ZWtwMSs0RHQzS1BtajVhSHZZZ2J4Q0krSWpVSmV0TVp2UmNzRXR3TmU1YU9yRWZFZXdOK3NyeFRJUVkKSEJHSkxIbUMzZ0VzQjNjdmdIWDZZaGlKWVZMYVZBVi9aN1puK0J3cFllbVVKaWFkNXBxMTZvUDl4dXlZZHpaZgpybCtYajdMai9HdGFYQXNrNmZSS2hzTXVyMmlBMmpBTkwzRG8yZEdHRUtleVNIQVFBaEZqTEErSk9SdElKZEYzCjZyWndsdDNOM2MweFRHU0Y5OGJqMFl5MDR4cG1qVXp1cHVQUWovOEVTT1JaUkhxS0FoRHJLeU5vWDBHbzhSY1AKKzNoa3dvTnVZbit0dTg3Mzk5dG1lUU5DMXpKQkpZemFVQTAxMkRiSWk5bzltcnNTZklpSEtQTjlqeGlFU1BhTAphWTJUV0FWeGt2UzQxY1V1M2hNQ0F3RUFBYU5DTUVBd0RnWURWUjBQQVFIL0JBUURBZ0trTUE4R0ExVWRFd0VCCi93UUZNQU1CQWY4d0hRWURWUjBPQkJZRUZFaGRIc3BySThhUVJhV3J0NWhaZFBocTJtazRNQTBHQ1NxR1NJYjMKRFFFQkN3VUFBNElCQVFBbVAyaStUdGQ3czVKdEIvWkFoMVBFWkV2V0NLTXpWcHdYaW1NUzlhOWxldjdlRFp0VApzSEdYMzhSeUhDNGdKb2N3S1VXamZ5YlBFUnJkTTY1cUN2SXVONW9nQmphZU1iYjRNTUpnM0d4cE45a3RvaU9PCktsa1hKblVHZm83MkpCNTBTSnpJdGthbHFPelhENkgzbzUxQTNYbHp6MUZENTdhRERFZEkxMUZJY2ozTk4vVkoKaVRzSHZyaVd4MGtDK0V1eXhYWE9ma1p3VEkrSjFnMWx2NkZPYW9ZcWZhYVpVQ3cyTmFLc1dMTG9FT2FiNG15TgptV25pQ1M2Q2h6K2xBa2Q5N0w5ck12WmRKZWxlMEJNWmZXSGZTbzJRSlRvc0dMdDdWY2YrVlRmSE9vQlRBNGlXCmpwLzVINVVZdmJrQUV1SmpVV1hCYTZLNTR5N3JJdEhBeUVidwotLS0tLUVORCBDRVJUSUZJQ0FURS0tLS0tCg==
    >         server: https://10.120.34.53:6443
    
    # Copy this directly from cat /etc/kubernetes/admin.conf; just change the management IP
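
    If you prefer a non-interactive edit, a sed-based sketch such as the following applies the same change; the two IPs are the ones from this cluster, so substitute your own:

    kubectl -n kube-public get cm cluster-info -o yaml \
        | sed 's#https://10.120.33.146:6443#https://10.120.34.53:6443#' \
        | kubectl replace -f -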
    

    If this entry is not changed, then after re-running cluster.yml, master-1 cannot rejoin the cluster and fails with the following error:

    [root@pro-k8s-master-1 ~]# cat /etc/kubernetes/kubeadm-controlplane.yaml
    apiVersion: kubeadm.k8s.io/v1beta2
    kind: JoinConfiguration
    discovery:
      bootstrapToken:
        apiServerEndpoint: 10.120.34.53:6443
        token: m49jmj.fv3zqgm57tnwgtor
        unsafeSkipCAVerification: true
      timeout: 5m0s
      tlsBootstrapToken: m49jmj.fv3zqgm57tnwgtor
    controlPlane:
      localAPIEndpoint:
        advertiseAddress: 10.120.33.146
        bindPort: 6443
      certificateKey: c8d4ef0b01aa6e54caed3d6fd2a1da2a7ada69b3833aeadaf3bac8a81cd01cfa
    nodeRegistration:
      name: pro-k8s-master-1
      criSocket: /var/run/dockershim.sock
    [root@pro-k8s-master-1 ~]# /usr/local/bin/kubeadm join --config /etc/kubernetes/kubeadm-controlplane.yaml --ignore-preflight-errors=all
    [preflight] Running pre-flight checks
    [preflight] Reading configuration from the cluster...
    [preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
    error execution phase preflight: unable to fetch the kubeadm-config ConfigMap: failed to get config map: Get "https://10.120.33.146:6443/api/v1/namespaces/kube-system/configmaps/kubeadm-config?timeout=10s": dial tcp 10.120.33.146:6443: connect: connection refused
    To see the stack trace of this error execute with --v=5 or higher
    

    This is because, at join time, the node reads the cluster's CA verification data and the kube-apiserver endpoint from the cluster-info ConfigMap in kube-public.

    There are two node-join scenarios, and both depend on two configuration items worth noting (see the sketch after this list):

    1. The cluster-info ConfigMap in kube-public, which stores the cluster's CA verification data and the kube-apiserver endpoint. If the endpoint is an LB VIP, nothing needs to change; if it is a node IP, it must be updated here, or new nodes cannot join.

    2. kubeadm init on the first control-plane node must be run with --upload-certs.

    When kubeadm init runs, it generates the CA cert, key, and other certificate files; with this flag set, the crt files are automatically synced to /etc/kubernetes/ssl when a new node joins.
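
    A quick way to check both items; these are standard kubectl/kubeadm commands, and the certificate key printed on your cluster will of course differ:

    # Inspect the kubeconfig stored in cluster-info (CA data + server endpoint):
    kubectl -n kube-public get cm cluster-info -o jsonpath='{.data.kubeconfig}'

    # Re-upload the control-plane certificates and print a fresh certificate key
    # (the secret created by --upload-certs expires after two hours):
    kubeadm init phase upload-certs --upload-certs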

    1. Regular worker node join (screenshot of the join command omitted)

    2. Control-plane node join
    kubeadm join 10.120.34.53:6443 --token 5r6s2m.g7wimq9aoist154w     --discovery-token-ca-cert-hash sha256:eadb3051b0ea751f058de8805e3c2569769ae8346a889acb835b542b22840d58     --control-plane --certificate-key 6ed53f56e64f12c0cb7a3024203e2a99e95aaaacc52b58b8100a025ff577257e
    
    # A control-plane join must pass --control-plane
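
    For comparison, a worker join uses the same token and CA hash but omits the control-plane flags. A sketch reconstructing the omitted screenshot, reusing the values shown above:

    kubeadm join 10.120.34.53:6443 --token 5r6s2m.g7wimq9aoist154w \
        --discovery-token-ca-cert-hash sha256:eadb3051b0ea751f058de8805e3c2569769ae8346a889acb835b542b22840d58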
    
    

    Replacing the third node

    After the third node is shut down, etcd loses its leader for a period that appears to exceed 15 seconds. (In the transcript below, the first two failed etcdctl calls target 127.0.0.1:2379 only because the ETCDCTL_* environment was lost on re-login; once it is re-exported, only the dead member's endpoint fails.)

    [root@pro-k8s-master-2 ~]# hostname=`hostname`
    [root@pro-k8s-master-2 ~]# export ETCDCTL_API=3
    [root@pro-k8s-master-2 ~]# export ETCDCTL_CERT=/etc/ssl/etcd/ssl/admin-$hostname.pem
    [root@pro-k8s-master-2 ~]# export ETCDCTL_KEY=/etc/ssl/etcd/ssl/admin-$hostname-key.pem
    [root@pro-k8s-master-2 ~]# export ETCDCTL_CACERT=/etc/ssl/etcd/ssl/ca.pem
    [root@pro-k8s-master-2 ~]# export ETCDCTL_ENDPOINTS="https://10.120.33.146:2379,https://10.120.34.53:2379,https://10.120.35.101:2379"
    [root@pro-k8s-master-2 ~]# # verify
    [root@pro-k8s-master-2 ~]# etcdctl member list
    89b51fbdfb2a9906, started, etcd2, https://10.120.34.53:2380, https://10.120.34.53:2379, false
    8fc45151ffa61a8e, started, etcd1, https://10.120.33.146:2380, https://10.120.33.146:2379, false
    d2fccd85a33c58f9, started, etcd3, https://10.120.35.101:2380, https://10.120.35.101:2379, false
    [root@pro-k8s-master-2 ~]# etcdctl endpoint status -w table
    +----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
    |          ENDPOINT          |        ID        | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
    +----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
    | https://10.120.33.146:2379 | 8fc45151ffa61a8e |  3.4.13 |  100 MB |     false |      false |      1561 |  284432261 |          284432261 |        |
    |  https://10.120.34.53:2379 | 89b51fbdfb2a9906 |  3.4.13 |  100 MB |      true |      false |      1561 |  284432262 |          284432261 |        |
    | https://10.120.35.101:2379 | d2fccd85a33c58f9 |  3.4.13 |  100 MB |     false |      false |      1561 |  284432262 |          284432262 |        |
    +----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
    [root@pro-k8s-master-2 ~]#
    [root@pro-k8s-master-2 ~]# logout
    Connection to pro-k8s-master-2 closed.
    [root@deployer ~]# ssh pro-k8s-master-2
    Last login: Wed Sep 28 10:47:26 2022 from 10.120.33.122
    [root@pro-k8s-master-2 ~]#
    [root@pro-k8s-master-2 ~]#
    [root@pro-k8s-master-2 ~]#
    [root@pro-k8s-master-2 ~]# etcdctl endpoint status -w table
    {"level":"warn","ts":"2022-09-28T10:49:44.482Z","caller":"clientv3/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"passthrough:///127.0.0.1:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = latest balancer error: connection closed"}
    Failed to get the status of endpoint 127.0.0.1:2379 (context deadline exceeded)
    +----------+----+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
    | ENDPOINT | ID | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
    +----------+----+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
    +----------+----+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
    [root@pro-k8s-master-2 ~]# etcdctl endpoint status -w table
    {"level":"warn","ts":"2022-09-28T10:50:09.362Z","caller":"clientv3/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"passthrough:///127.0.0.1:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = latest balancer error: connection closed"}
    Failed to get the status of endpoint 127.0.0.1:2379 (context deadline exceeded)
    +----------+----+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
    | ENDPOINT | ID | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
    +----------+----+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
    +----------+----+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
    [root@pro-k8s-master-2 ~]#
    [root@pro-k8s-master-2 ~]#
    [root@pro-k8s-master-2 ~]# hostname=`hostname`
    [root@pro-k8s-master-2 ~]# export ETCDCTL_API=3
    [root@pro-k8s-master-2 ~]# export ETCDCTL_CERT=/etc/ssl/etcd/ssl/admin-$hostname.pem
    [root@pro-k8s-master-2 ~]# export ETCDCTL_KEY=/etc/ssl/etcd/ssl/admin-$hostname-key.pem
    [root@pro-k8s-master-2 ~]# export ETCDCTL_CACERT=/etc/ssl/etcd/ssl/ca.pem
    [root@pro-k8s-master-2 ~]# export ETCDCTL_ENDPOINTS="https://10.120.33.146:2379,https://10.120.34.53:2379,https://10.120.35.101:2379"
    [root@pro-k8s-master-2 ~]# # verify
    [root@pro-k8s-master-2 ~]# etcdctl member list
    89b51fbdfb2a9906, started, etcd2, https://10.120.34.53:2380, https://10.120.34.53:2379, false
    8fc45151ffa61a8e, started, etcd1, https://10.120.33.146:2380, https://10.120.33.146:2379, false
    d2fccd85a33c58f9, started, etcd3, https://10.120.35.101:2380, https://10.120.35.101:2379, false
    [root@pro-k8s-master-2 ~]# etcdctl endpoint status -w table
    {"level":"warn","ts":"2022-09-28T10:50:27.214Z","caller":"clientv3/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"passthrough:///https://10.120.35.101:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = context deadline exceeded"}
    Failed to get the status of endpoint https://10.120.35.101:2379 (context deadline exceeded)
    +----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
    |          ENDPOINT          |        ID        | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
    +----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
    | https://10.120.33.146:2379 | 8fc45151ffa61a8e |  3.4.13 |  100 MB |     false |      false |      1561 |  284434365 |          284434365 |        |
    |  https://10.120.34.53:2379 | 89b51fbdfb2a9906 |  3.4.13 |  100 MB |      true |      false |      1561 |  284434365 |          284434365 |        |
    +----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
    [root@pro-k8s-master-2 ~]#
    
    

    Confirm that the etcd leader has recovered before proceeding; if it has not, wait rather than replacing the node immediately.
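
    A minimal pre-flight check, assuming the same ETCDCTL_* environment exported above; proceed only when exactly one endpoint reports IS LEADER = true and the surviving members are healthy:

    etcdctl endpoint status -w table
    etcdctl endpoint health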
