美文网首页
kubernetes集群管理系列讲座(四)伸缩etcd节点

kubernetes集群管理系列讲座(四)伸缩etcd节点

作者: 炼狱腾蛇Eric | 来源:发表于2020-06-21 23:26 被阅读0次

    课程目标

    • 向etcd集群中添加节点
    • 从etcd集群中删除节点
    • etcd集群的备份与恢复
    • etcd集群的升级

    1. 环境

    IP HOST
    10.0.11.36 infra0
    10.0.12.21 infra1
    10.0.13.126 infra2
    10.0.12.157 infra3
    10.0.13.118 infra4

    且infra0,infra1,infra2是创建好的etcd集群,方法请看这里

    2. 向etcd集群中添加节点

    官方文档中不建议我们向生产系统中添加节点,因为添加etcd节点会损失集群的性能,增加节点并不会增加集群的性能或者是容量。一般来说不建议对etcd集群扩/缩容。更不要配置任何的etcd集群自动扩展策略。我们建议在生产系统中部署五个member的etcd集群。过程参考github

    当然,如果有特殊需求,我们也可以部署7个节点,比如两地三中心的架构,三个DC中分别部署2/2/3个etcd集群。

    2.1. 添加一个正常工作的etcd节点

    一般来说,添加节点有两个步骤:

    • 我们可以通过 HTTP members APIgRPC members API,或者是 etcdctl member add 命令。当然我们肯定是需要使用etcdctl命令的
    • 使用新的集群配置启动新的节点
    • 同步后,使用新的配置重新启动原有节点

    注意1:如果原来的etcd使用的是ssl证书在签署的时候,指定了hosts的地址的证书是不能够使用的,比如:

    cfssl certinfo -csr server.csr
    .
    .
      "IPAddresses": [
        "127.0.0.1",
        "10.0.11.36",
        "10.0.12.21",
        "10.0.13.126"
      ]
    }
    

    这样的证书在添加了新的member之后是无法正常通信的,需要重新签署证书,把新的节点加入到证书中。

    注意2:由于签署的证书只用于数据传输,不用于数据加密,所以即使更换CA,重新签署证书,也是没有问题的。但是,如果我们首次启动etcd集群的时候使用的是非加密方式,后面改成SSL方式启动etcd集群的时候就会报错

    注意3:json文件无法使用hosts: [""]这种方式来通配所有地址(理论上可以行,实际操作上报错)

    注意4:json文件中可以使用域名来通配一类host,理论上可行,但是没有测试

    2.1.1. 环境准备

    请参考环境准备下载和解压

    2.1.2. 添加节点

    • 检查节点健康状态

      $ etcdctl member --cacert="/etc/kubernetes/pki/etcd/ca.pem" --cert="/etc/kubernetes/pki/etcd/members.pem" --key="/etc/kubernetes/pki/etcd/members-key.pem" list
      
      466035bfe5ac7a64, started, infra2, https://10.0.13.126:2380, https://10.0.13.126:2379, false
      784e0050552d81cd, started, infra1, https://10.0.12.21:2380, https://10.0.12.21:2379, false
      f7d6895384ae86d2, started, infra0, https://10.0.11.36:2380, https://10.0.11.36:2379, false
      
    • 在任意一个etcd节点上执行

      $ etcdctl member add infra3 --cacert="/etc/kubernetes/pki/etcd/ca.pem" --cert="/etc/kubernetes/pki/etcd/members.pem" --key="/etc/kubernetes/pki/etcd/members-key.pem" --peer-urls=https://10.0.12.157:2380
      
      Member 6a5e9d82c1c1c471 added to cluster f36e4227f36fdc03
      
      ETCD_NAME="infra3"
      ETCD_INITIAL_CLUSTER="infra2=https://10.0.13.126:2380,infra3=https://10.0.12.157:2380,infra1=https://10.0.12.21:2380,infra0=https://10.0.11.36:2380"
      ETCD_INITIAL_ADVERTISE_PEER_URLS="https://10.0.12.157:2380"
      ETCD_INITIAL_CLUSTER_STATE="existing"
      
    • 再查看集群状态

      $ etcdctl member --cacert="/etc/kubernetes/pki/etcd/ca.pem" --cert="/etc/kubernetes/pki/etcd/members.pem" --key="/etc/kubernetes/pki/etcd/members-key.pem" list
      
      466035bfe5ac7a64, started, infra2, https://10.0.13.126:2380, https://10.0.13.126:2379, false
      6a5e9d82c1c1c471, unstarted, , https://10.0.12.157:2380, , false
      784e0050552d81cd, started, infra1, https://10.0.12.21:2380, https://10.0.12.21:2379, false
      f7d6895384ae86d2, started, infra0, https://10.0.11.36:2380, https://10.0.11.36:2379, false
      
    • 在新节点创建配置文件/etc/etcd/etcd.conf

      DATA_DIR=/data/etcd
      HOST_NAME=infra3
      HOST_IP=10.0.12.157
      CLUSTER=infra0=https://10.0.11.36:2380,infra1=https://10.0.12.21:2380,infra2=https://10.0.13.126:2380,infra3=https://10.0.12.157:2380
      CLUSTER_STATE=existing
      TOKEN=ea8cfe2bfe85b7e6c66fe190f9225838
      
    • 修改systemd文件/lib/systemd/system/etcd.service

      [Unit]
      Description=Etcd Server
      After=network.target
      After=network-online.target
      Wants=network-online.target
      
      [Service]
      Type=notify
      WorkingDirectory=/data/etcd
      EnvironmentFile=-/etc/etcd/etcd.conf
      User=etcd
      # set GOMAXPROCS to number of processors
      ExecStart=/bin/bash -c "GOMAXPROCS=$(nproc) /opt/etcd/etcd \
                --data-dir ${DATA_DIR} \
                --name ${HOST_NAME} \
                --initial-advertise-peer-urls https://${HOST_IP}:2380 \
                --listen-peer-urls https://${HOST_IP}:2380 \
                --advertise-client-urls https://${HOST_IP}:2379 \
                --listen-client-urls https://127.0.0.1:2379,https://${HOST_IP}:2379 \
                --listen-metrics-urls=http://127.0.0.1:2381 \
                --initial-cluster ${CLUSTER} \
                --initial-cluster-state ${CLUSTER_STATE} \
                --initial-cluster-token ${TOKEN} \
                --client-cert-auth \
                --trusted-ca-file=/etc/kubernetes/pki/etcd/ca.pem \
                --cert-file=/etc/kubernetes/pki/etcd/server.pem \
                --key-file=/etc/kubernetes/pki/etcd/server-key.pem \
                --peer-client-cert-auth \
                --peer-trusted-ca-file=/etc/kubernetes/pki/etcd/ca.pem \
                --peer-cert-file=/etc/kubernetes/pki/etcd/members.pem \
                --peer-key-file=/etc/kubernetes/pki/etcd/members-key.pem"
      Restart=on-failure
      LimitNOFILE=65536
      
      [Install]
      WantedBy=multi-user.target
      
    • 通过scp方式把etcd的pki同步到新机器上并修改权限

      chmod 400 /etc/kubernetes/pki/etcd/*
      chown -R etcd:adm /etc/kubernetes/pki/etcd/
      
    • 在新节点上启动etcd

      systemctl start etcd
      
    • 如果报错,就吧数据文件/data/etcd/member删除再重启

    • 这时候再list member

      $ etcdctl member --cacert="/etc/kubernetes/pki/etcd/ca.pem" --cert="/etc/kubernetes/pki/etcd/members.pem" --key="/etc/kubernetes/pki/etcd/members-key.pem" list
      
      466035bfe5ac7a64, started, infra2, https://10.0.13.126:2380, https://10.0.13.126:2379, false
      6a5e9d82c1c1c471, started, infra3, https://10.0.12.157:2380, https://10.0.12.157:2379, false
      784e0050552d81cd, started, infra1, https://10.0.12.21:2380, https://10.0.12.21:2379, false
      f7d6895384ae86d2, started, infra0, https://10.0.11.36:2380, https://10.0.11.36:2379, false
      
    • 注意:修改每个节点上的/etc/etcd/etcd.conf都需要修改为对应的配置

    2.1.3. 添加一个learner etcd节点

    从etcd3.4版本之后,etcd知识把节点添加为learner节点(不会参与投票的节点)。这样做是为了让添加新节点更加的安全,较少添加节点过程中集群的宕机时间,我们建议在节点在完全同步数据之前作为learner节点。所以我们增加节点的步骤变成了下面几个

    • 我们可以通过 HTTP members APIgRPC members API,或者是 etcdctl member add --learner 命令.
    • 使用新的集群配置启动新的节点
    • 使用新的配置重新启动原有节点
    • 使用 gRPC members API,或者是 etcdctl member promote命令来让learner节点成为有投票权的节点。这个时候etcd集群会检查一下promote的请求来确认这个操作是安全的。他会去确认learner节点确实已经同步了leader数据的节点,然后才让这个节点拥有投票权。

    2.1.4. 添加learner etcd节点具体操作如下

    • 检查节点健康状态

      $ etcdctl member --cacert="/etc/kubernetes/pki/etcd/ca.pem" --cert="/etc/kubernetes/pki/etcd/members.pem" --key="/etc/kubernetes/pki/etcd/members-key.pem" list
      
      466035bfe5ac7a64, started, infra2, https://10.0.13.126:2380, https://10.0.13.126:2379, false
      6a5e9d82c1c1c471, started, infra3, https://10.0.12.157:2380, https://10.0.12.157:2379, false
      784e0050552d81cd, started, infra1, https://10.0.12.21:2380, https://10.0.12.21:2379, false
      f7d6895384ae86d2, started, infra0, https://10.0.11.36:2380, https://10.0.11.36:2379, false
      
    • 在任意一个etcd节点上执行

      $ etcdctl member add infra4 --cacert="/etc/kubernetes/pki/etcd/ca.pem" --cert="/etc/kubernetes/pki/etcd/members.pem" --key="/etc/kubernetes/pki/etcd/members-key.pem" --peer-urls=https://10.0.13.118:2380 --learner
      Member d11c0954aee6f056 added to cluster f36e4227f36fdc03
      
      ETCD_NAME="infra4"
      ETCD_INITIAL_CLUSTER="infra2=https://10.0.13.126:2380,infra3=https://10.0.12.157:2380,infra1=https://10.0.12.21:2380,infra4=https://10.0.13.118:2380,infra0=https://10.0.11.36:2380"
      ETCD_INITIAL_ADVERTISE_PEER_URLS="https://10.0.13.118:2380"
      ETCD_INITIAL_CLUSTER_STATE="existing"
      
    • 再查看集群状态

      etcdctl member --cacert="/etc/kubernetes/pki/etcd/ca.pem" --cert="/etc/kubernetes/pki/etcd/members.pem" --key="/etc/kubernetes/pki/etcd/members-key.pem" list
      
      249badc16d370eb0, unstarted, , https://10.0.13.118:2380, , true
      466035bfe5ac7a64, started, infra2, https://10.0.13.126:2380, https://10.0.13.126:2379, false
      6a5e9d82c1c1c471, started, infra3, https://10.0.12.157:2380, https://10.0.12.157:2379, false
      784e0050552d81cd, started, infra1, https://10.0.12.21:2380, https://10.0.12.21:2379, false
      f7d6895384ae86d2, started, infra0, https://10.0.11.36:2380, https://10.0.11.36:2379, false
      
    • 在新节点创建配置文件/etc/etcd/etcd.conf

      DATA_DIR=/data/etcd
      HOST_NAME=infra4
      HOST_IP=10.0.13.118
      CLUSTER=infra0=https://10.0.11.36:2380,infra1=https://10.0.12.21:2380,infra2=https://10.0.13.126:2380,infra3=https://10.0.12.157:2380,infra4=https://10.0.13.118:2380
      CLUSTER_STATE=existing
      TOKEN=ea8cfe2bfe85b7e6c66fe190f9225838
      
    • 修改systemd文件/lib/systemd/system/etcd.service

      [Unit]
      Description=Etcd Server
      After=network.target
      After=network-online.target
      Wants=network-online.target
      
      [Service]
      Type=notify
      WorkingDirectory=/data/etcd
      EnvironmentFile=-/etc/etcd/etcd.conf
      User=etcd
      # set GOMAXPROCS to number of processors
      ExecStart=/bin/bash -c "GOMAXPROCS=$(nproc) /opt/etcd/etcd \
                --data-dir ${DATA_DIR} \
                --name ${HOST_NAME} \
                --initial-advertise-peer-urls https://${HOST_IP}:2380 \
                --listen-peer-urls https://${HOST_IP}:2380 \
                --advertise-client-urls https://${HOST_IP}:2379 \
                --listen-client-urls https://127.0.0.1:2379,https://${HOST_IP}:2379 \
                --listen-metrics-urls=http://127.0.0.1:2381 \
                --initial-cluster ${CLUSTER} \
                --initial-cluster-state ${CLUSTER_STATE} \
                --initial-cluster-token ${TOKEN} \
                --client-cert-auth \
                --trusted-ca-file=/etc/kubernetes/pki/etcd/ca.pem \
                --cert-file=/etc/kubernetes/pki/etcd/server.pem \
                --key-file=/etc/kubernetes/pki/etcd/server-key.pem \
                --peer-client-cert-auth \
                --peer-trusted-ca-file=/etc/kubernetes/pki/etcd/ca.pem \
                --peer-cert-file=/etc/kubernetes/pki/etcd/members.pem \
                --peer-key-file=/etc/kubernetes/pki/etcd/members-key.pem"
      Restart=on-failure
      LimitNOFILE=65536
      
      [Install]
      WantedBy=multi-user.target
      
    • 通过scp方式把etcd的pki同步到新机器上并修改权限

      $ chmod 400 /etc/kubernetes/pki/etcd/*
      $ chown -R etcd:adm /etc/kubernetes/pki/etcd/
      
    • 启动etcd

      $ systemctl start etcd
      
    • 再查看集群状态,发现infra4的learner是true

      $ etcdctl member --cacert="/etc/kubernetes/pki/etcd/ca.pem" --cert="/etc/kubernetes/pki/etcd/members.pem" --key="/etc/kubernetes/pki/etcd/members-key.pem" list
      12d20db544264be2, started, infra4, https://10.0.13.118:2380, https://10.0.13.118:2379, true
      466035bfe5ac7a64, started, infra2, https://10.0.13.126:2380, https://10.0.13.126:2379, false
      6a5e9d82c1c1c471, started, infra3, https://10.0.12.157:2380, https://10.0.12.157:2379, false
      784e0050552d81cd, started, infra1, https://10.0.12.21:2380, https://10.0.12.21:2379, false
      f7d6895384ae86d2, started, infra0, https://10.0.11.36:2380, https://10.0.11.36:2379, false
      
    • 查看集群状态,集群状态的显示结果为The items in the lists are endpoint, ID, version, db size, is leader, is learner, raft term, raft index, raft applied index, errors.

      $ etcdctl endpoint status --cacert="/etc/kubernetes/pki/etcd/ca.pem" --cert="/etc/kubernetes/pki/etcd/members.pem" --key="/etc/kubernetes/pki/etcd/members-key.pem" --endpoints=https://127.0.0.1:2379 --cluster
      https://10.0.13.118:2379, 12d20db544264be2, 3.4.9, 16 kB, false, true, 4232, 26, 26,
      https://10.0.13.126:2379, 466035bfe5ac7a64, 3.4.9, 20 kB, false, false, 4232, 26, 26,
      https://10.0.12.157:2379, 6a5e9d82c1c1c471, 3.4.9, 20 kB, false, false, 4232, 26, 26,
      https://10.0.12.21:2379, 784e0050552d81cd, 3.4.9, 20 kB, false, false, 4232, 26, 26,
      https://10.0.11.36:2379, f7d6895384ae86d2, 3.4.9, 20 kB, true, false, 4232, 26, 26,
      
    • 当信息同步之后,我们就可以开始promote了

      $ etcdctl member promote 12d20db544264be2 --cacert="/etc/kubernetes/pki/etcd/ca.pem" --cert="/etc/kubernetes/pki/etcd/members.pem" --key="/etc/kubernetes/pki/etcd/members-key.pem"
      Member 12d20db544264be2 promoted in cluster f36e4227f36fdc03
      
    • 再查看集群状态

      $ etcdctl member --cacert="/etc/kubernetes/pki/etcd/ca.pem" --cert="/etc/kubernetes/pki/etcd/members.pem" --key="/etc/kubernetes/pki/etcd/members-key.pem" list
      12d20db544264be2, started, infra4, https://10.0.13.118:2380, https://10.0.13.118:2379, false
      466035bfe5ac7a64, started, infra2, https://10.0.13.126:2380, https://10.0.13.126:2379, false
      6a5e9d82c1c1c471, started, infra3, https://10.0.12.157:2380, https://10.0.12.157:2379, false
      784e0050552d81cd, started, infra1, https://10.0.12.21:2380, https://10.0.12.21:2379, false
      f7d6895384ae86d2, started, infra0, https://10.0.11.36:2380, https://10.0.11.36:2379, false
      
    • 注意:修改每个节点上的/etc/etcd/etcd.conf都需要修改为对应的配置

    2.2. 如果添加节点出现问题

    我们这里举例是同步出现问题,比如:

    $ etcd --name infra3 \
      --initial-cluster infra0=http://10.0.1.10:2380,infra1=http://10.0.1.11:2380,infra2=http://10.0.1.12:2380 \
      --initial-cluster-state existing
    etcdserver: assign ids error: the member count is unequal
    exit 1
    

    我们需要清空数据文件,再更换一下节点的信息(节点名称和IP)

    $ etcd --name infra4 \
      --initial-cluster infra0=http://10.0.1.10:2380,infra1=http://10.0.1.11:2380,infra2=http://10.0.1.12:2380,infra4=http://10.0.1.14:2380 \
      --initial-cluster-state existing
    etcdserver: assign ids error: unmatched member while checking PeerURLs
    exit 1
    

    2.3. 如果添加learner节点出现问题

    • 集群中多于1个learner会报错
    $ etcdctl member add infra4 --peer-urls=http://10.0.1.14:2380 --learner
    Error: etcdserver: too many learner members in cluster
    
    • 数据不同步的情况下会报错
    $ etcdctl member promote 9bf1b35fc7761a23
    Error: etcdserver: can only promote a learner member which is in sync with leader
    
    • 提升一个不是learner的节点会报错
    $ etcdctl member promote 9bf1b35fc7761a23
    Error: etcdserver: can only promote a learner member
    
    • 提升一个不存在的learner节点会报错
    $ etcdctl member promote 12345abcde
    Error: etcdserver: member not found
    

    3. 移除节点

    • 查看集群状态

      $ etcdctl member --cacert="/etc/kubernetes/pki/etcd/ca.pem" --cert="/etc/kubernetes/pki/etcd/members.pem" --key="/etc/kubernetes/pki/etcd/members-key.pem" list
      12d20db544264be2, started, infra4, https://10.0.13.118:2380, https://10.0.13.118:2379, false
      466035bfe5ac7a64, started, infra2, https://10.0.13.126:2380, https://10.0.13.126:2379, false
      6a5e9d82c1c1c471, started, infra3, https://10.0.12.157:2380, https://10.0.12.157:2379, false
      784e0050552d81cd, started, infra1, https://10.0.12.21:2380, https://10.0.12.21:2379, false
      f7d6895384ae86d2, started, infra0, https://10.0.11.36:2380, https://10.0.11.36:2379, false
      
    • 从集群中删除节点infra4

      $ etcdctl member remove 12d20db544264be2 --cacert="/etc/kubernetes/pki/etcd/ca.pem" --cert="/etc/kubernetes/pki/etcd/members.pem" --key="/etc/kubernetes/pki/etcd/members-key.pem"
      Member 12d20db544264be2 removed from cluster f36e4227f36fdc03
      
    • 查看状态

      $ etcdctl member --cacert="/etc/kubernetes/pki/etcd/ca.pem" --cert="/etc/kubernetes/pki/etcd/members.pem" --key="/etc/kubernetes/pki/etcd/members-key.pem" list
      466035bfe5ac7a64, started, infra2, https://10.0.13.126:2380, https://10.0.13.126:2379, false
      6a5e9d82c1c1c471, started, infra3, https://10.0.12.157:2380, https://10.0.12.157:2379, false
      784e0050552d81cd, started, infra1, https://10.0.12.21:2380, https://10.0.12.21:2379, false
      f7d6895384ae86d2, started, infra0, https://10.0.11.36:2380, https://10.0.11.36:2379, false
      
    • 注意:修改每个节点上的/etc/etcd/etcd.conf都需要修改为对应的配置

    为了方便大家学习,请大家加我的微信,我会把大家加到微信群(微信群的二维码会经常变)和qq群821119334,问题答案云原生技术课堂,有问题可以一起讨论

    • 个人微信
      640.jpeg

    • 腾讯课堂
      640-20200506145837072.jpeg

    • 微信公众号
      640-20200506145842007.jpeg

    • 专题讲座

    2020 CKA考试视频 真题讲解 https://www.bilibili.com/video/BV167411K7hp

    2020 CKA考试指南 https://www.bilibili.com/video/BV1sa4y1479B/

    2020年 5月CKA考试真题 https://mp.weixin.qq.com/s/W9V4cpYeBhodol6AYtbxIA

    相关文章

      网友评论

          本文标题:kubernetes集群管理系列讲座(四)伸缩etcd节点

          本文链接:https://www.haomeiwen.com/subject/owazxktx.html