An alternative way to set up a highly available Kubernetes cluster with kubeadm on CentOS 7

Author: 647f379ea944 | Published 2018-08-12 09:23

    Introduction

    Configure multiple master nodes with kubeadm to achieve high availability.

    Installation

    Lab environment

    Lab architecture
    lab1: etcd master keepalived 11.11.11.111
    lab2: etcd master keepalived 11.11.11.112
    lab3: etcd master keepalived 11.11.11.113
    lab4: node  11.11.11.114
    lab5: node  11.11.11.115
    lab6: node  11.11.11.116
    
    vip: 11.11.11.110
    
    The Vagrantfile used for the lab
    # -*- mode: ruby -*-
    # vi: set ft=ruby :
    
    ENV["LC_ALL"] = "en_US.UTF-8"
    
    Vagrant.configure("2") do |config|
        (1..6).each do |i|
          config.vm.define "lab#{i}" do |node|
            node.vm.box = "centos-7.4-docker-17"
            node.ssh.insert_key = false
            node.vm.hostname = "lab#{i}"
            node.vm.network "private_network", ip: "11.11.11.11#{i}"
            node.vm.provision "shell",
              inline: "echo hello from node #{i}"
            node.vm.provider "virtualbox" do |v|
              v.cpus = 2
              v.customize ["modifyvm", :id, "--name", "lab#{i}", "--memory", "2048"]
            end
          end
        end
    end
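
    With the Vagrantfile above, the lab VMs can be brought up and reached as follows (a minimal sketch; it assumes Vagrant, VirtualBox, and the centos-7.4-docker-17 box are already available on the host):

    # Create all six VMs defined in the Vagrantfile
    vagrant up

    # Check their state and log in to the first master
    vagrant status
    vagrant ssh lab1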
    

    Install kubeadm on all machines

    See the earlier article “centos7安装kubeadm” (installing kubeadm on CentOS 7).
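
    For reference, a minimal sketch of such an installation, assuming the Aliyun Kubernetes yum mirror and the v1.10.3 packages used later in this article (prerequisites such as disabling swap are omitted):

    # Add the Aliyun Kubernetes yum repository
    cat >/etc/yum.repos.d/kubernetes.repo<<EOF
    [kubernetes]
    name=Kubernetes
    baseurl=https://mirrors.aliyun.com/kubernetes/yum/repos/kubernetes-el7-x86_64/
    enabled=1
    gpgcheck=0
    EOF

    # Install the pinned versions of kubelet, kubeadm and kubectl
    yum install -y kubelet-1.10.3 kubeadm-1.10.3 kubectl-1.10.3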

    Configure the kubelet on all nodes

    # Configure the kubelet to use a pause image reachable from within China
    # Edit /etc/systemd/system/kubelet.service.d/10-kubeadm.conf
    # and add the following setting
    Environment="KUBELET_EXTRA_ARGS=--pod-infra-container-image=registry.cn-shanghai.aliyuncs.com/gcr-k8s/pause-amd64:3.0"
    
    # Or apply it with a single command
    sed -i '/ExecStart=$/i Environment="KUBELET_EXTRA_ARGS=--pod-infra-container-image=registry.cn-shanghai.aliyuncs.com/gcr-k8s/pause-amd64:3.0"'  /etc/systemd/system/kubelet.service.d/10-kubeadm.conf
    
    # Reload the systemd configuration
    systemctl daemon-reload
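
    # Optional check: confirm the drop-in file now contains the pause image argument
    grep KUBELET_EXTRA_ARGS /etc/systemd/system/kubelet.service.d/10-kubeadm.conf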
    

    Configure hosts

    cat >>/etc/hosts<<EOF
    11.11.11.111 lab1
    11.11.11.112 lab2
    11.11.11.113 lab3
    11.11.11.114 lab4
    11.11.11.115 lab5
    11.11.11.116 lab6
    EOF
    

    Start the etcd cluster

    Start the etcd cluster on the lab1, lab2, and lab3 nodes.

    # lab1
    docker stop etcd && docker rm etcd
    rm -rf /data/etcd
    mkdir -p /data/etcd
    docker run -d \
    --restart always \
    -v /etc/etcd/ssl/certs:/etc/ssl/certs \
    -v /data/etcd:/var/lib/etcd \
    -p 2380:2380 \
    -p 2379:2379 \
    --name etcd \
    registry.cn-hangzhou.aliyuncs.com/google_containers/etcd-amd64:3.1.12 \
    etcd --name=etcd0 \
    --advertise-client-urls=http://11.11.11.111:2379 \
    --listen-client-urls=http://0.0.0.0:2379 \
    --initial-advertise-peer-urls=http://11.11.11.111:2380 \
    --listen-peer-urls=http://0.0.0.0:2380 \
    --initial-cluster-token=9477af68bbee1b9ae037d6fd9e7efefd \
    --initial-cluster=etcd0=http://11.11.11.111:2380,etcd1=http://11.11.11.112:2380,etcd2=http://11.11.11.113:2380 \
    --initial-cluster-state=new \
    --auto-tls \
    --peer-auto-tls \
    --data-dir=/var/lib/etcd
    
    # lab2
    docker stop etcd && docker rm etcd
    rm -rf /data/etcd
    mkdir -p /data/etcd
    docker run -d \
    --restart always \
    -v /etc/etcd/ssl/certs:/etc/ssl/certs \
    -v /data/etcd:/var/lib/etcd \
    -p 2380:2380 \
    -p 2379:2379 \
    --name etcd \
    registry.cn-hangzhou.aliyuncs.com/google_containers/etcd-amd64:3.1.12 \
    etcd --name=etcd1 \
    --advertise-client-urls=http://11.11.11.112:2379 \
    --listen-client-urls=http://0.0.0.0:2379 \
    --initial-advertise-peer-urls=http://11.11.11.112:2380 \
    --listen-peer-urls=http://0.0.0.0:2380 \
    --initial-cluster-token=9477af68bbee1b9ae037d6fd9e7efefd \
    --initial-cluster=etcd0=http://11.11.11.111:2380,etcd1=http://11.11.11.112:2380,etcd2=http://11.11.11.113:2380 \
    --initial-cluster-state=new \
    --auto-tls \
    --peer-auto-tls \
    --data-dir=/var/lib/etcd
    
    # lab3
    docker stop etcd && docker rm etcd
    rm -rf /data/etcd
    mkdir -p /data/etcd
    docker run -d \
    --restart always \
    -v /etc/etcd/ssl/certs:/etc/ssl/certs \
    -v /data/etcd:/var/lib/etcd \
    -p 2380:2380 \
    -p 2379:2379 \
    --name etcd \
    registry.cn-hangzhou.aliyuncs.com/google_containers/etcd-amd64:3.1.12 \
    etcd --name=etcd2 \
    --advertise-client-urls=http://11.11.11.113:2379 \
    --listen-client-urls=http://0.0.0.0:2379 \
    --initial-advertise-peer-urls=http://11.11.11.113:2380 \
    --listen-peer-urls=http://0.0.0.0:2380 \
    --initial-cluster-token=9477af68bbee1b9ae037d6fd9e7efefd \
    --initial-cluster=etcd0=http://11.11.11.111:2380,etcd1=http://11.11.11.112:2380,etcd2=http://11.11.11.113:2380 \
    --initial-cluster-state=new \
    --auto-tls \
    --peer-auto-tls \
    --data-dir=/var/lib/etcd
    
    # Verify the cluster
    docker exec -ti etcd ash
    etcdctl member list
    etcdctl cluster-health
    exit
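
    # The member health can also be checked from the host; etcd exposes an
    # HTTP /health endpoint on the client port (2379)
    curl http://11.11.11.111:2379/health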
    

    Configure keepalived

    Run the following on all three master nodes.

    # Load the required kernel modules
    lsmod | grep ip_vs
    modprobe ip_vs
    
    # Start keepalived
    # eth1 is the NIC on the 11.11.11.0/24 network used in this lab
    docker run --net=host --cap-add=NET_ADMIN \
    -e KEEPALIVED_INTERFACE=eth1 \
    -e KEEPALIVED_VIRTUAL_IPS="#PYTHON2BASH:['11.11.11.110']" \
    -e KEEPALIVED_UNICAST_PEERS="#PYTHON2BASH:['11.11.11.111','11.11.11.112','11.11.11.113']" \
    -e KEEPALIVED_PASSWORD=hello \
    --name k8s-keepalived \
    --restart always \
    -d osixia/keepalived:1.4.4
    
    # Check the logs
    # Two nodes should become BACKUP and one MASTER
    docker logs k8s-keepalived
    
    # The VIP 11.11.11.110 is now assigned to one of the machines
    # ping test
    ping -c4 11.11.11.110
    
    # If it fails, clean up and try again
    docker rm -f k8s-keepalived
    ip a del 11.11.11.110/32 dev eth1
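
    # To see which machine currently holds the VIP, check eth1 on each master;
    # 11.11.11.110 should appear on exactly one of them
    ip addr show eth1 | grep 11.11.11.110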
    

    Initialize the first master node

    # Generate a token
    # Keep the token; it is needed again later
    token=$(kubeadm token generate)
    echo $token
    
    # Generate the configuration file
    # advertiseAddress is set to the VIP
    cat >kubeadm-master.config<<EOF
    apiVersion: kubeadm.k8s.io/v1alpha1
    kind: MasterConfiguration
    kubernetesVersion: v1.10.3
    imageRepository: registry.cn-hangzhou.aliyuncs.com/google_containers
    
    api:
      advertiseAddress: 11.11.11.110
    
    apiServerExtraArgs:
      endpoint-reconciler-type: lease
    
    controllerManagerExtraArgs:
      node-monitor-grace-period: 10s
      pod-eviction-timeout: 10s
    
    networking:
      podSubnet: 10.244.0.0/16
    
    etcd:
      endpoints:
      - "http://11.11.11.111:2379"
      - "http://11.11.11.112:2379"
      - "http://11.11.11.113:2379"
    
    apiServerCertSANs:
    - "lab1"
    - "lab2"
    - "lab3"
    - "11.11.11.111"
    - "11.11.11.112"
    - "11.11.11.113"
    - "11.11.11.110"
    - "127.0.0.1"
    
    token: $token
    tokenTTL: "0"
    
    featureGates:
      CoreDNS: true
    EOF
    
    # Initialize
    kubeadm init --config kubeadm-master.config
    systemctl enable kubelet
    
    # Save the join command printed when initialization completes
    # kubeadm join 11.11.11.110:6443 --token nevmjk.iuh214fc8i0k3iue --discovery-token-ca-cert-hash sha256:0e4f738348be836ff810bce754e059054845f44f01619a37b817eba83282d80f
    
    # Configure kubectl
    mkdir -p $HOME/.kube
    sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
    sudo chown $(id -u):$(id -g) $HOME/.kube/config
    
    
    # Install the network plugin
    # Download the manifest
    mkdir flannel && cd flannel
    wget https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
    
    # Edit the manifest
    # The network here must match the podSubnet in the kubeadm config above
      net-conf.json: |
        {
          "Network": "10.244.0.0/16",
          "Backend": {
            "Type": "vxlan"
          }
        }
    
    # Change the image
    image: registry.cn-shanghai.aliyuncs.com/gcr-k8s/flannel:v0.10.0-amd64
    
    # If the nodes have multiple network interfaces, see kubernetes issue 39701:
    # https://github.com/kubernetes/kubernetes/issues/39701
    # Currently the --iface flag must be set in kube-flannel.yml to the name of the
    # cluster's internal NIC; otherwise DNS resolution may fail and containers may
    # be unable to communicate. Download kube-flannel.yml locally and add
    # --iface=<iface-name> to the flanneld startup arguments.
        containers:
          - name: kube-flannel
            image: registry.cn-shanghai.aliyuncs.com/gcr-k8s/flannel:v0.10.0-amd64
            command:
            - /opt/bin/flanneld
            args:
            - --ip-masq
            - --kube-subnet-mgr
            - --iface=eth1
    
    # Apply it
    kubectl apply -f kube-flannel.yml
    
    # Check
    kubectl get pods -n kube-system
    kubectl get svc -n kube-system
    
    # Allow the masters to schedule application pods and take part in the workload;
    # other components such as dashboard, heapster, and EFK can now be deployed
    kubectl taint nodes --all node-role.kubernetes.io/master-
    

    Start the other master nodes

    # Archive the /etc/kubernetes/pki directory of the initialized first master
    cd /etc/kubernetes && tar czvf /root/pki.tgz pki/ && cd ~
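
    # One way to copy the archive over (illustrative; assumes root SSH access
    # from lab1 to the other masters)
    scp /root/pki.tgz root@lab2:~/
    scp /root/pki.tgz root@lab3:~/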
    
    # Upload the archive to the other masters and extract it into /etc/kubernetes
    tar xf pki.tgz -C /etc/kubernetes/
    
    # Copy the kubeadm config used on the first master to the other masters
    
    # Initialize
    kubeadm init --config kubeadm-master.config
    systemctl enable kubelet
    
    # Configure kubectl
    mkdir -p $HOME/.kube
    sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
    sudo chown $(id -u):$(id -g) $HOME/.kube/config
    
    # Check from the first master node
    kubectl get pod --all-namespaces -o wide | grep lab1
    kubectl get pod --all-namespaces -o wide | grep lab2
    kubectl get pod --all-namespaces -o wide | grep lab3
    kubectl get nodes -o wide
    

    Start the worker nodes

    # Join the cluster
    # This is the command printed when the first master finished initializing
    kubeadm join 11.11.11.110:6443 --token nevmjk.iuh214fc8i0k3iue --discovery-token-ca-cert-hash sha256:0e4f738348be836ff810bce754e059054845f44f01619a37b817eba83282d80f
    systemctl enable kubelet
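
    # If the join command was not saved, a new one can be printed on a master with
    kubeadm token create --print-join-command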
    

    Testing

    Rebuild multiple CoreDNS replicas

    # Delete the CoreDNS pods
    kubectl get pods -n kube-system -o wide | grep coredns
    all_coredns_pods=$(kubectl get pods -n kube-system -o wide | grep coredns | awk '{print $1}' | xargs)
    echo $all_coredns_pods
    kubectl delete pods $all_coredns_pods -n kube-system
    
    # Change the replica count
    # replicas: 3
    # e.g. set it to the number of worker nodes
    kubectl edit deploy coredns -n kube-system
    
    # Check the status
    kubectl get pods -n kube-system -o wide | grep coredns
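
    # Alternatively, change the replica count without opening the editor
    kubectl scale deployment coredns -n kube-system --replicas=3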
    

    Basic tests

    1. Start

    # Test directly with commands
    kubectl run nginx --replicas=2 --image=nginx:alpine --port=80
    kubectl expose deployment nginx --type=NodePort --name=example-service-nodeport
    kubectl expose deployment nginx --name=example-service
    
    # Test with a manifest
    cat >example-nginx.yml<<EOF
    apiVersion: extensions/v1beta1
    kind: Deployment
    metadata:
      name: nginx
    spec:
      replicas: 2
      template:
        metadata:
          labels:
            app: nginx
        spec:
          restartPolicy: Always
          containers:
          - name: nginx
            image: nginx:alpine
            ports:
            - containerPort: 80
            livenessProbe:
              httpGet:
                path: /
                port: 80
              initialDelaySeconds: 10
              periodSeconds: 3
            readinessProbe:
              httpGet:
                path: /
                port: 80
              initialDelaySeconds: 10
              periodSeconds: 3
    ---
    kind: Service
    apiVersion: v1
    metadata:
      name: example-service
    spec:
        selector:
          app: nginx
        ports:
          - name: http
            port: 80
            targetPort: 80
    
    ---
    kind: Service
    apiVersion: v1
    metadata:
      name: example-service-nodeport
    spec:
        selector:
          app: nginx
        type: NodePort
        ports:
          - name: http-nodeport
            port: 80
            nodePort: 32223
    EOF
    kubectl apply -f example-nginx.yml
    

    2. Check status

    kubectl get deploy
    kubectl get pods
    kubectl get svc
    kubectl describe svc example-service
    

    3. DNS resolution

    kubectl run curl --image=radial/busyboxplus:curl -i --tty
    nslookup kubernetes
    nslookup example-service
    curl example-service
    
    # If the interactive session times out and returns an error, re-enter the pod as follows
    curlPod=$(kubectl get pod | grep curl | awk '{print $1}')
    kubectl exec -ti $curlPod -- sh
    

    4. Access test

    # 10.96.59.56 is the ClusterIP obtained from kubectl get svc
    curl "10.96.59.56:80"
    
    # 32223 is the NodePort obtained from kubectl get svc
    http://11.11.11.114:32223/
    http://11.11.11.115:32223/
    

    5. Clean up

    kubectl delete svc example-service example-service-nodeport
    kubectl delete deploy nginx curl
    

    High availability test

    Shut down any one master node and check that the cluster can still run the basic tests from the previous step, then inspect the cluster state. Only shut down a single master at a time: etcd runs on the master nodes, so shutting down two of them makes etcd lose quorum and the whole cluster becomes unavailable.

    kubectl get pod --all-namespaces -o wide
    kubectl get pod --all-namespaces -o wide | grep lab1
    kubectl get pod --all-namespaces -o wide | grep lab2
    kubectl get pod --all-namespaces -o wide | grep lab3
    kubectl get nodes -o wide
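
    # One way to simulate a master failure in this Vagrant lab (run on the
    # Vagrant host, not inside a VM)
    vagrant halt lab2
    # re-run the basic tests above, then bring the master back
    vagrant up lab2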
    

    Notes

    • When a worker node is shut down directly, its pods are only detected as failed and migrated to other nodes after about 5 minutes.

      To migrate them faster, you can run kubectl delete node on the failed node.

      You can also tune the controller-manager's pod-eviction-timeout parameter (default 5m)

      and its node-monitor-grace-period parameter (default 40s).

    • Compared with the HA scheme described in the earlier article, the drawback of this approach is that kube-apiserver requests are not load-balanced across the masters. All requests to kube-apiserver go to a single master node; only when that master fails are requests sent to another master.
