Kubernetes (Part 1): Production Deployment of a Highly Available Cluster

Author: 勤_ | Published 2021-09-14 17:16

    Overview


    Every Kubernetes node runs the services required to host application containers, and all of them are controlled by the Master. Each node also runs Docker, which handles image downloads and the actual running of containers. Kubernetes consists of the following core components:

    • etcd stores the state of the entire cluster;
    • apiserver is the single entry point for resource operations and provides authentication, authorization, access control, API registration and discovery;
    • controller manager maintains cluster state, e.g. failure detection, automatic scaling and rolling updates;
    • scheduler handles resource scheduling and places Pods onto suitable machines according to the configured scheduling policy;
    • kubelet maintains the container lifecycle and also manages volumes (CVI) and networking (CNI);
    • Container runtime manages images and actually runs Pods and containers (CRI);
    • kube-proxy provides in-cluster service discovery and load balancing for Services;
    • A Node's job is to run application containers. Nodes are managed by the Master: each Node monitors and reports container status and manages container lifecycles as instructed by the Master. Nodes run on Linux and can be physical or virtual machines.
    • A Pod is the smallest unit of work in Kubernetes. Each Pod contains one or more containers, and the containers in a Pod are scheduled by the Master onto a Node as a single unit; on the Node they are created as Docker containers. (A minimal Pod manifest is sketched right after this list.)
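    To make the Pod concept concrete, here is a minimal, hypothetical Pod manifest (the name and image are placeholders, not part of this deployment). kubectl apply -f pod.yaml creates it, and kubectl get pod demo-nginx -o wide shows which node the scheduler placed it on:

    # pod.yaml -- hypothetical single-container Pod, for illustration only
    apiVersion: v1
    kind: Pod
    metadata:
      name: demo-nginx            # placeholder name
      labels:
        app: demo
    spec:
      containers:
      - name: nginx
        image: nginx:1.19         # placeholder image
        ports:
        - containerPort: 80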

    In addition to the core components, a number of add-ons are recommended:

    • kube-dns provides DNS for the whole cluster
    • Ingress Controller provides an external entry point for services
    • Dashboard provides a GUI
    • Federation provides clusters spanning availability zones
    • Fluentd-elasticsearch provides cluster log collection, storage and querying

    Tools and Versions

    Tool    Version
    docker 18.03.1.ce-1.el7.centos
    centos 7.x
    Kubernetes v1.18.2
    kubeadm、kubelet、kubectl 1.18.3-0
    quay.io/coreos/flannel v0.14.0
    kubernetesui/dashboard v2.0.0-rc7
    registry.aliyuncs.com/google_containers/etcd 3.4.3-0
    k8s.gcr.io/coredns 1.6.7
    k8s.gcr.io/pause 3.2

    Environment

    Node                      Hostname      IP
    Master, etcd, registry    k8s-master    192.168.3.130
    node1                     k8s-node1     192.168.3.31
    node2                     k8s-node2     192.168.3.32

    Set the hostname on each machine accordingly:

    hostnamectl set-hostname k8s-master
    hostnamectl set-hostname k8s-node1
    hostnamectl set-hostname k8s-node2
    

    Add the following entries to /etc/hosts:

    # On the master and all node machines:
    192.168.x.130   k8s-master
    192.168.x.31   k8s-node1
    192.168.x.32   k8s-node2
    

    Installation

    Install docker, kubeadm, kubelet, and kubectl on all three servers.

    Docker

    #!/bin/sh
    
    # Disable the firewall
    systemctl stop firewalld
    systemctl disable firewalld
    
    # Disable SELinux
    sed -i 's/enforcing/disabled/' /etc/selinux/config 
    setenforce 0
    
    # Disable swap
    swapoff -a
    sed -i 's/.*swap.*/#&/' /etc/fstab
    
    # Install docker dependencies, the docker yum repo, and docker itself
    yum install -y yum-utils device-mapper-persistent-data lvm2
    yum-config-manager --add-repo http://mirrors.aliyun.com/docker-ce/linux/centos/docker-ce.repo
    yum makecache fast
    #yum install -y docker-ce-19.03.13 docker-ce-cli-19.03.13
    yum install docker-ce-18.03.1.ce-1.el7.centos -y
    
    mkdir -p /etc/docker
    cat > /etc/docker/daemon.json << "EOF"
    {
    "registry-mirrors": ["https://zfzbet67.mirror.aliyuncs.com"],
    "data-root": "/data/docker",
    "insecure-registries":[
        "192.168.x.234:8089"
     ]
    }
    EOF
    
    systemctl daemon-reload
    systemctl enable docker.service
    systemctl restart docker
    
    IPVS kernel module setup
    #!/bin/sh
    ##
    echo "net.ipv4.ip_forward=1" >> /etc/sysctl.conf
    sysctl -p
    
    cat > /etc/sysconfig/modules/ipvs.modules <<EOF
    #!/bin/bash
    ipvs_modules="ip_vs ip_vs_lc ip_vs_wlc ip_vs_rr ip_vs_wrr ip_vs_lblc ip_vs_lblcr ip_vs_dh ip_vs_sh ip_vs_fo ip_vs_nq ip_vs_sed ip_vs_ftp nf_conntrack_ipv4"
    for kernel_module in \${ipvs_modules}; do
        /sbin/modinfo -F filename \${kernel_module} > /dev/null 2>&1
        if [ \$? -eq 0 ]; then
            /sbin/modprobe \${kernel_module}
        fi
    done
    EOF
    chmod 755 /etc/sysconfig/modules/ipvs.modules && bash /etc/sysconfig/modules/ipvs.modules && lsmod | grep ip_vs
    
    

    Install kubeadm, kubelet, and kubectl

    #!/bin/sh
    
    # Configure the Kubernetes yum repo
    cat > /etc/yum.repos.d/kubernetes.repo << "EOF"
    [kubernetes]
    name=Kubernetes Repository
    baseurl=https://mirrors.aliyun.com/kubernetes/yum/repos/kubernetes-el7-x86_64/
    gpgcheck=1
    gpgkey=https://mirrors.aliyun.com/kubernetes/yum/doc/rpm-package-key.gpg
          https://mirrors.aliyun.com/kubernetes/yum/doc/yum-key.gpg
    EOF
    
    yum install -y kubeadm-1.18.3-0 kubelet-1.18.3-0 kubectl-1.18.3-0
    
    systemctl start kubelet && systemctl enable kubelet && systemctl status kubelet
    

    Master Installation

    Bootstrap the cluster with kubeadm:
    # Switch to the home directory
    cd ~
    # Export the default configuration file
    kubeadm config print init-defaults > kubeadm.yml
    
    vi kubeadm.yml
    # Replace imageRepository: k8s.gcr.io with the line below
    imageRepository: registry.aliyuncs.com/google_containers
    
    # List the images the cluster needs
    kubeadm config images list --config kubeadm.yml
    # Pull the images needed by the k8s cluster
    kubeadm config images pull --config kubeadm.yml
    # After pulling, check the local image list
    docker images | grep registry.aliyuncs.com
    
    kubeadm init --kubernetes-version=1.18.2  \
    --apiserver-advertise-address=192.168.x.130  \
    --image-repository=registry.aliyuncs.com/google_containers  \
    --pod-network-cidr=10.222.0.0/16 \
    --ignore-preflight-errors=NumCPU
    
    # As prompted by the output, run the following commands
    mkdir -p $HOME/.kube
    sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
    sudo chown $(id -u):$(id -g) $HOME/.kube/config
    
    # When the installation finishes, it prints the join command:
    kubeadm join 192.168.x.130:6443 --token yey0s3.khwqa73ykgh0fmvz \
        --discovery-token-ca-cert-hash sha256:606aa551fae0fba4ed1c61ca441a9f7493e9d42f3357037881dc01e8e39b4b96
    
    
    Install the flannel network plugin
    wget  https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
    # The domain raw.githubusercontent.com may fail to resolve, so the download errors out
    # Workaround: open the address below and look up an IP for raw.githubusercontent.com
    https://site.ip138.com/raw.githubusercontent.com/
    # Add the resolved address to /etc/hosts (the Hong Kong IP is used here)
    151.101.76.133 raw.githubusercontent.com
    # Fetch the flannel plugin yaml again
    wget  https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
    
    # Once downloaded successfully, install it
    kubectl apply -f kube-flannel.yml
    
    # Check all pods
    kubectl get pods --all-namespaces
    NAMESPACE     NAME                            READY   STATUS     RESTARTS   AGE
    kube-system   coredns-7ff77c879f-pw7tl        0/1     Pending    0          44m
    kube-system   coredns-7ff77c879f-zhzvz        0/1     Pending    0          44m
    kube-system   etcd-node1                      1/1     Running    0          44m
    kube-system   kube-apiserver-node1            1/1     Running    0          44m
    kube-system   kube-controller-manager-node1   1/1     Running    0          44m
    kube-system   kube-flannel-ds-amd64-58j6d     0/1     Init:0/1   0          22s
    kube-system   kube-proxy-mnvfl                1/1     Running    0          44m
    kube-system   kube-scheduler-node1            1/1     Running    0          44m
    # Some pods are in Pending status, which means they are still being set up
    # Once every STATUS is Running, check the node list again
    
    kubectl get nodes
    NAME    STATUS   ROLES    AGE   VERSION
    node1   Ready    master   49m   v1.18.2
    # The current node (node1, the master) is now fully initialized
    
    Install kubernetes-dashboard
    # Download the dashboard yaml
    wget  https://raw.githubusercontent.com/kubernetes/dashboard/v2.0.0-rc7/aio/deploy/recommended.yaml
    
    # By default kubernetes-dashboard is only reachable from inside the cluster, so edit the yaml to expose the port on the host
    vim recommended.yaml
    
    # Add the following field to the kubernetes-dashboard Service in recommended.yaml (see the sketch below)
    type: NodePort 
    
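    # After the edit, the kubernetes-dashboard Service section should look roughly like this
    # (a sketch based on the upstream v2.0.0-rc7 manifest; only the type: NodePort line is
    # added, and the nodePort itself is left for Kubernetes to choose):
    kind: Service
    apiVersion: v1
    metadata:
      labels:
        k8s-app: kubernetes-dashboard
      name: kubernetes-dashboard
      namespace: kubernetes-dashboard
    spec:
      type: NodePort               # <-- added line
      ports:
        - port: 443
          targetPort: 8443
      selector:
        k8s-app: kubernetes-dashboard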
    # Install the dashboard
    kubectl apply -f recommended.yaml
    
    # Check which host port kubernetes-dashboard is mapped to
    kubectl get svc -n kubernetes-dashboard
    NAME                        TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)         AGE
    dashboard-metrics-scraper   ClusterIP   10.106.200.77    <none>        8000/TCP        10m
    kubernetes-dashboard        NodePort    10.100.186.120   <none>        443:31178/TCP   10m
    
    # Open the dashboard in Firefox at the NodePort shown above and choose Token login
    https://192.168.x.130:31178
    
    # Get the kubernetes-dashboard login token
    kubectl -n kube-system describe $(kubectl get secret -n kube-system -o name | grep namespace) | grep token:
    
    # Log in with the token obtained above:
    

    Worker Installation

    Join the worker nodes to the cluster:

    kubeadm join 192.168.x.130:6443 --token yey0s3.khwqa73ykgh0fmvz     --discovery-token-ca-cert-hash sha256:606aa551fae0fba4ed1c61ca441a9f7493e9d42f3357037881dc01e8e39b4b96
    

    High-Availability Deployment

    (Figure: HA Kubernetes cluster architecture)

    As shown in the architecture above, add two more master nodes, and run keepalived + haproxy on three machines for high availability.

    Environment
    Node                                            Hostname        IP
    Master, etcd, registry, keepalived, haproxy     k8s-master      192.168.3.130
    Master, etcd, registry                          k8s-master-1    192.168.3.151
    Master, etcd, registry                          k8s-master-2    192.168.3.77
    VIP                                             k8s-vip         192.168.3.138

    Set the hostnames and /etc/hosts. Here the VIP is referenced by IP rather than a domain name; using domain names consistently is recommended.

    hostnamectl set-hostname k8s-master-1
    hostnamectl set-hostname k8s-master-2
    
    # /etc/hosts configuration:
    192.168.x.130   k8s-master
    192.168.x.151   k8s-master-1
    192.168.x.77    k8s-master-2
    192.168.x.31    k8s-node1
    192.168.x.32    k8s-node2
    
    Installation
    • Install keepalived and haproxy:
    yum install keepalived haproxy -y
    
    • Configure keepalived

    On each node, adjust the hostname/domain, IP, priority (100 on the primary, 50 on the backups), and interface accordingly; a sketch of the backup-node differences follows the config below.

    cat <<EOF > /etc/keepalived/keepalived.conf
    global_defs {
      script_user root
      enable_script_security
    }
    vrrp_script check_apiserver {
      script "/etc/keepalived/check_apiserver.sh"
      interval 3
      weight 2
    }
    
    vrrp_instance VI_1 {
        state MASTER
        interface ens192
        virtual_router_id  161
        priority 100
        authentication {
            auth_type PASS
            auth_pass nice
        }
        unicast_src_ip 192.168.x.130  
        unicast_peer {
          192.168.x.77
          192.168.x.151
        }
        virtual_ipaddress {
            192.168.x.138
        }
        track_script {
            check_apiserver
        }
    }
    
    EOF
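    For example, on k8s-master-1 the same file would differ only in the following fields (a sketch, assuming the same ens192 interface):

    # /etc/keepalived/keepalived.conf on k8s-master-1 (backup) -- only the differing fields shown
    state BACKUP                      # the primary uses MASTER
    priority 50                       # the primary uses 100
    unicast_src_ip 192.168.x.151      # this node's own IP
    unicast_peer {
      192.168.x.130
      192.168.x.77
    }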
    
    • Configure the keepalived health check script and make it executable with chmod +x /etc/keepalived/check_apiserver.sh:
    vi /etc/keepalived/check_apiserver.sh
    ### Add the following content
    #!/bin/sh
    if [ $(ps -C haproxy --no-header | wc -l) -eq 0 ];then
            systemctl stop keepalived
    fi
    
    • Configure haproxy
    # /etc/haproxy/haproxy.cfg
    #---------------------------------------------------------------------
    # Global settings
    #---------------------------------------------------------------------
    global
        log /dev/log local0
        log /dev/log local1 notice
        daemon
    
    #---------------------------------------------------------------------
    # common defaults that all the 'listen' and 'backend' sections will
    # use if not designated in their block
    #---------------------------------------------------------------------
    defaults
        mode                    http
        log                     global
        option                  httplog
        option                  dontlognull
        option http-server-close
        option forwardfor       except 127.0.0.0/8
        option                  redispatch
        retries                 1
        timeout http-request    10s
        timeout queue           20s
        timeout connect         5s
        timeout client          20s
        timeout server          20s
        timeout http-keep-alive 10s
        timeout check           10s
    
    #---------------------------------------------------------------------
    # apiserver frontend which proxys to the masters
    #---------------------------------------------------------------------
    frontend apiserver
        bind *:16443
        mode tcp
        option tcplog
        default_backend apiserver
    
    #---------------------------------------------------------------------
    # round robin balancing for apiserver
    #---------------------------------------------------------------------
    backend apiserver
        option httpchk GET /healthz
        http-check expect status 200
        mode tcp
        option ssl-hello-chk
        balance     roundrobin
            server k8s-master 192.168.x.130:6443 check
            server k8s-master-1 192.168.x.151:6443 check
            server k8s-master-2 192.168.x.77:6443 check
    
    • After the configuration above is complete, start keepalived and haproxy and enable them to start at boot.
    systemctl enable haproxy --now
    systemctl enable keepalived --now   
    
    Master node installation

    Tear down the previously installed single master first (a minimal teardown sketch follows), then bootstrap the cluster again with kubeadm. Compared with the non-HA installation, the init command changes as shown below; run it on one of the master servers first:
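    A teardown sketch, assuming the old node was set up with kubeadm as above (it mirrors the uninstall steps in the troubleshooting section):

    # Reset kubeadm state and remove the old kubeconfig / CNI config
    kubeadm reset -f
    rm -rf $HOME/.kube /etc/cni/net.d
    # Optionally flush leftover iptables/ipvs rules
    iptables -F && iptables -t nat -F && iptables -t mangle -F && iptables -X
    ipvsadm --clear 2>/dev/null || true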

    kubeadm init   --control-plane-endpoint "192.168.x.138:16443" \
    --kubernetes-version=1.18.2  \
    --image-repository=registry.aliyuncs.com/google_containers  \
    --pod-network-cidr=10.222.0.0/16 \
    --service-cidr=10.1.0.0/16 \
    --upload-certs \
    --ignore-preflight-errors=NumCPU | tee kubeadm-init.log 
    

    Output:

    Your Kubernetes control-plane has initialized successfully!
    
    To start using your cluster, you need to run the following as a regular user:
    
      mkdir -p $HOME/.kube
      sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
      sudo chown $(id -u):$(id -g) $HOME/.kube/config
    
    You should now deploy a pod network to the cluster.
    Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
      https://kubernetes.io/docs/concepts/cluster-administration/addons/
    
    You can now join any number of the control-plane node running the following command on each as root:
    
      kubeadm join 192.168.x.138:16443 --token hflxeu.7kbw3ayaqb9r6mi7 \
        --discovery-token-ca-cert-hash sha256:1b87cc4635ed2005d2d2780cc0fdc779bcfcd38a359c8783d70388ce123c59a0 \
        --control-plane --certificate-key 62fa89c160a01997ef9388d6bc182abf09da2ce9c2ce82ce3b4d83621227ebfa
    
    Please note that the certificate-key gives access to cluster sensitive data, keep it secret!
    As a safeguard, uploaded-certs will be deleted in two hours; If necessary, you can use
    "kubeadm init phase upload-certs --upload-certs" to reload certs afterward.
    
    Then you can join any number of worker nodes by running the following on each as root:
    
    kubeadm join 192.168.x.138:16443 --token hflxeu.7kbw3ayaqb9r6mi7 \
        --discovery-token-ca-cert-hash sha256:1b87cc4635ed2005d2d2780cc0fdc779bcfcd38a359c8783d70388ce123c59a0
    

    Run the following command on the other master nodes:

      kubeadm join 192.168.x.138:16443 --token tb07he.ytnyrlvhlehxx5n6 \
        --discovery-token-ca-cert-hash sha256:1b87cc4635ed2005d2d2780cc0fdc779bcfcd38a359c8783d70388ce123c59a0 \
        --control-plane --certificate-key e8172057200f1ebfeec1c143b2652ae240d8d67aec2149c4bcbc68421f285123
    
    Worker node installation

    Tear down the previously joined worker nodes first (kubeadm reset), then join them to the new cluster:

    kubeadm join 192.168.x.138:16443 --token tb07he.ytnyrlvhlehxx5n6 \
        --discovery-token-ca-cert-hash sha256:1b87cc4635ed2005d2d2780cc0fdc779bcfcd38a359c8783d70388ce123c59a0
    
    Verification
    • Check with the following commands:
    [root@k8s-master ~]# kubectl get pods,svc --all-namespaces -o wide
    [root@k8s-master ~]# kubectl get nodes -o wide
    
    • Verify haproxy + keepalived failover: stop haproxy or keepalived on 192.168.x.130 and the virtual IP floats over to 192.168.x.151 (a sketch of the check is given below).

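    A minimal way to exercise the failover, assuming the ens192 interface from the keepalived config:

    # On 192.168.x.130: stop the health-checked service
    systemctl stop haproxy
    # check_apiserver.sh notices haproxy is gone and stops keepalived, releasing the VIP
    # On 192.168.x.151: the VIP should now be bound to ens192
    ip addr show ens192 | grep 192.168.x.138
    # Restore the original master afterwards
    systemctl start haproxy && systemctl start keepalived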

    Other Required Components

    NFS

    # Pick any machine as the NFS server and install it as follows:
    root# yum install -y nfs-utils rpcbind
    root# mkdir -p /data/work/nfs-share/
    root# vi /etc/exports    # add the following line
    /data/work/nfs-share *(insecure,rw,async,no_root_squash)
    root# systemctl start rpcbind
    root# systemctl enable rpcbind
    root# systemctl enable nfs && systemctl restart nfs
    
    # Client installation; put it in a script for reuse (a quick verification sketch follows the script):
    #!/bin/sh
    yum install -y nfs-utils rpcbind
    mkdir -p /data/work/nfs-share/
    mount -t nfs 192.168.x.31:/data/work/nfs-share/ /data/work/nfs-share/
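    A quick check that the export is visible and writable from a client (a sketch; showmount is part of nfs-utils):

    # On a client: list the exports offered by the NFS server
    showmount -e 192.168.x.31
    # Write a test file through the mount and read it back
    echo "nfs-ok" > /data/work/nfs-share/test.txt && cat /data/work/nfs-share/test.txt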
    

    INGRESS


    We choose traefik here. Create a traffic-ingress.yaml file with the following content and apply it:

    apiVersion: v1
    kind: ServiceAccount
    metadata:
      name: traefik-ingress-controller
      namespace: kube-system
    ---
    kind: DaemonSet
    apiVersion: apps/v1
    metadata:
      name: traefik-ingress-controller
      namespace: kube-system
      labels:
        k8s-app: traefik-ingress-lb
    spec:
      selector:
        matchLabels:
          k8s-app: traefik-ingress-lb
      template:
        metadata:
          annotations:
            prometheus.io/path: /metrics
            prometheus.io/port: "8080"
            prometheus.io/scrape: "true"
          labels:
            k8s-app: traefik-ingress-lb
            name: traefik-ingress-lb
        spec:
          serviceAccountName: traefik-ingress-controller
          terminationGracePeriodSeconds: 60
          hostNetwork: true
          containers:
          - image: traefik:1.7.24
            name: traefik-ingress-lb
            ports:
            - name: http
              containerPort: 80
              hostPort: 80
            - name: admin
              containerPort: 8080
              hostPort: 8080
            - name: https
              containerPort: 443
              hostPort: 443
            securityContext:
              capabilities:
                drop:
                - ALL
                add:
                - NET_BIND_SERVICE
            args:
            - --api
            - --kubernetes
            - --logLevel=ERROR
            - --metrics.prometheus
            - --web
            - --metrics
            - --configFile=/etc/traefik/config.toml
            volumeMounts:
            - mountPath: /etc/traefik
              name: config
          volumes:
          - configMap:
              defaultMode: 420
              name: traefik-conf
            name: config
    ---
    kind: Service
    apiVersion: v1
    metadata:
      name: traefik-ingress-service
      namespace: kube-system
    spec:
      selector:
        k8s-app: traefik-ingress-lb
      ports:
        - protocol: TCP
          port: 80
          name: web
        - protocol: TCP
          port: 8080
          name: admin
        - protocol: TCP
          port: 443
          name: https
    ---
    kind: ClusterRole
    apiVersion: rbac.authorization.k8s.io/v1beta1
    metadata:
      name: traefik-ingress-controller
    rules:
      - apiGroups:
          - ""
        resources:
          - services
          - endpoints
          - secrets
        verbs:
          - get
          - list
          - watch
      - apiGroups:
          - extensions
        resources:
          - ingresses
        verbs:
          - get
          - list
          - watch
      - apiGroups:
        - extensions
        resources:
        - ingresses/status
        verbs:
        - update
    ---
    kind: ClusterRoleBinding
    apiVersion: rbac.authorization.k8s.io/v1beta1
    metadata:
      name: traefik-ingress-controller
    roleRef:
      apiGroup: rbac.authorization.k8s.io
      kind: ClusterRole
      name: traefik-ingress-controller
    subjects:
    - kind: ServiceAccount
      name: traefik-ingress-controller
      namespace: kube-system
    ---
    apiVersion: v1
    kind: Service
    metadata:
      name: traefik-web-ui
      namespace: kube-system
    spec:
      selector:
        k8s-app: traefik-ingress-lb
      ports:
      - name: web
        port: 80
        targetPort: 8080
    ---
    apiVersion: extensions/v1beta1
    kind: Ingress
    metadata:
      name: traefik-web-ui
      namespace: kube-system
    spec:
      rules:
      - host: traefik-ui.minikube
        http:
          paths:
          - path: /
            backend:
              serviceName: traefik-web-ui
              servicePort: web
    ---
    apiVersion: v1
    data:
      config.toml: |
        defaultEntryPoints = ["http","https"]
        insecureSkipVerify = true
        [entryPoints]
          [entryPoints.http]
          address = ":80"
          [entryPoints.https]
          address = ":443"
    kind: ConfigMap
    metadata:
      name: traefik-conf
      namespace: kube-system
    
    [root@k8s-master ~]# kubectl apply -f traffic-ingress.yaml
    

    After applying, the traefik-ingress-controller pods can be seen running in the kube-system namespace; a hypothetical application Ingress for this controller is sketched below.

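    To route an application through this controller, an Ingress of the same extensions/v1beta1 form as the traefik-web-ui one above can be used; the host, service name, and port below are hypothetical placeholders:

    apiVersion: extensions/v1beta1
    kind: Ingress
    metadata:
      name: demo-app                     # hypothetical
      namespace: default
      annotations:
        kubernetes.io/ingress.class: traefik
    spec:
      rules:
      - host: demo.example.com           # placeholder host; point it at any node IP (traefik listens on host port 80)
        http:
          paths:
          - path: /
            backend:
              serviceName: demo-app-svc  # an existing Service in the same namespace
              servicePort: 80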

    DASHBOARD

    As described in the "Install kubernetes-dashboard" section above:

    [root@k8s-master ~]# wget https://raw.githubusercontent.com/kubernetes/dashboard/v2.0.5/aio/deploy/recommended.yaml
    [root@k8s-master ~]# kubectl apply -f recommended.yaml
    

    Troubleshooting

    1. Etcd error: error #0: dial tcp 192.168.x.130:2379

    [root@localhost ~]# etcdctl -C http://etcd:2379 cluster-health
    cluster may be unhealthy: failed to list members
    Error:  client: etcd cluster is unavailable or misconfigured; error #0: dial tcp 192.168.x.130:2379: connect: connection refused
    error #0: dial tcp 192.168.x.130:2379: connect: connection refused
    

    Solution:

    Change ETCD_LISTEN_CLIENT_URLS="http://localhost:2379" to ETCD_LISTEN_CLIENT_URLS="http://0.0.0.0:2379" (a sketch follows).
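    A sketch of the change, assuming a standalone etcd installed from the RPM with its configuration in /etc/etcd/etcd.conf:

    sed -i 's#ETCD_LISTEN_CLIENT_URLS="http://localhost:2379"#ETCD_LISTEN_CLIENT_URLS="http://0.0.0.0:2379"#' /etc/etcd/etcd.conf
    systemctl restart etcd
    etcdctl -C http://192.168.x.130:2379 cluster-health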

    2. 0/1 nodes are available: 1 node(s) had taint {node.kubernetes.io/not-ready: }, that the pod didn't tolerate.

    Solution:

    The node carries a taint that the pod does not tolerate. Running kubectl get no -o yaml | grep taint -A 5 shows the node is unschedulable: by default, for safety, Kubernetes does not schedule pods onto master nodes. Remove the taint with:

    kubectl taint nodes --all node-role.kubernetes.io/master-
    

    3. Worker node error:

    [root@k8s-node2 ~]# kubectl get nodes
    The connection to the server localhost:8080 was refused - did you specify the right host or port?
    

    Solution:

    Copy admin.conf from the master to the worker's home directory and export KUBECONFIG=$HOME/admin.conf (a sketch is shown after the listing below).

    [root@k8s-master ~]# ll /etc/kubernetes/admin.conf
    -rw------- 1 root root 5453 Sep  7 15:23 /etc/kubernetes/admin.conf
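    A sketch of the copy, assuming the master is reachable via the k8s-master hostname configured earlier:

    # On the worker node
    scp root@k8s-master:/etc/kubernetes/admin.conf $HOME/admin.conf
    echo 'export KUBECONFIG=$HOME/admin.conf' >> ~/.bash_profile
    export KUBECONFIG=$HOME/admin.conf
    kubectl get nodes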
    

    4. The docker installation went wrong; how to uninstall it?

    [root@PTC-PTesting-151 ~]# systemctl stop docker
    [root@PTC-PTesting-151 ~]# rpm -aq | grep docker
    docker-ce-cli-19.03.4-3.el7.x86_64
    docker-ce-19.03.4-3.el7.x86_64
    [root@PTC-PTesting-151 ~]# yum remove -y docker-ce-cli-19.03.4-3.el7.x86_64
    

    5. A cluster node installation went wrong; how to uninstall it?

    kubeadm reset
    yum remove -y kubeadm kubelet kubectl
    

    6. (VI_1): ip address associated with VRID 51 not present in MASTER advert : 10.12.50.198

    Solution:

    The virtual_router_id conflicts with another keepalived instance on the same LAN; change virtual_router_id in /etc/keepalived/keepalived.conf to an unused value (a sketch follows).
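    A sketch of the change; 171 is a hypothetical unused ID, and the value must stay identical across the masters of this VRRP instance:

    # pick an ID that no other keepalived cluster on the LAN uses, e.g. 171
    vi /etc/keepalived/keepalived.conf    # change "virtual_router_id  161" accordingly
    systemctl restart keepalived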

    7. The nginx-ingress-controller pod is in CrashLoopBackOff; its logs show that 10.1.0.1:443 is unreachable.

    [root@k8s-master ~]# kubectl logs nginx-ingress-controller-84d976849b-9pdnm -n microservice-component
    -------------------------------------------------------------------------------
    NGINX Ingress controller
      Release:       0.30.0
      Build:         git-7e65b90c4
      Repository:    https://github.com/kubernetes/ingress-nginx
      nginx version: nginx/1.17.8
    
    -------------------------------------------------------------------------------
    
    W0909 07:59:13.830981       6 flags.go:260] SSL certificate chain completion is disabled (--enable-ssl-chain-completion=false)
    W0909 07:59:13.831045       6 client_config.go:543] Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work.
    I0909 07:59:13.831223       6 main.go:193] Creating API client for https://10.1.0.1:443
    

    Solution:

    root# kubectl edit configmap kube-proxy -n kube-system
    # Change mode: "" (empty) to mode: "ipvs", and masqueradeAll: null to masqueradeAll: true
    root# kubectl get pod -n kube-system | grep kube-proxy | awk '{system(" kubectl delete pod "$1" -n kube-system")}'
    
    # Run ip route show table local to confirm the ipvs device (kube-ipvs0) now appears in the local routing table
    

    8. Error when joining a new master node:

    couldn't validate the identity of the API Server: could not find a JWS signature in the cluster-info ConfigMap for token ID "hflxeu"

    Solution:

    kubeadm token create    # generate a new token
    
    openssl x509 -pubkey -in /etc/kubernetes/pki/ca.crt | openssl rsa -pubin -outform der 2>/dev/null | openssl dgst -sha256 -hex | sed 's/^.* //'    # compute the discovery-token-ca-cert-hash with openssl
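    # A sketch of how the regenerated values are used; the angle-bracket values are placeholders:
    kubeadm join 192.168.x.138:16443 --token <new-token> \
        --discovery-token-ca-cert-hash sha256:<new-hash> \
        --control-plane --certificate-key <key-from-upload-certs>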
    

    error downloading certs: error downloading the secret: Secret "kubeadm-certs" was not found in the "kube-system" Namespace. This Secret might have expired. Please, run kubeadm init phase upload-certs --upload-certs on a control plane to generate a new one

    Solution:

    [root@k8s-master ~]# kubeadm init phase upload-certs --upload-certs
    

    9. Error when re-joining a removed master node: error execution phase check-etcd: etcd cluster is not healthy: failed to dial endpoint

    root# kubectl describe configmaps kubeadm-config -n kube-system
    root# kubectl get pods -n kube-system | grep etcd
    root# kubectl exec -it etcd-k8s-master sh -n kube-system
    
    # After entering the etcd-k8s-master container
    ## Set up the environment
    $ export ETCDCTL_API=3
    $ alias etcdctl='etcdctl --endpoints=https://127.0.0.1:2379 --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/server.crt --key=/etc/kubernetes/pki/etcd/server.key'
    
    ## List the etcd cluster members
    $ etcdctl member list
    
    63bfe05c4646fb08, started, k8s-master-2-11, https://192.168.2.11:2380, https://192.168.2.11:2379, false
    8e41efd8164c6e3d, started, k8s-master-2-12, https://192.168.2.12:2380, https://192.168.2.12:2379, false
    a61d0bd53c1cbcb6, started, k8s-master-2-13, https://192.168.2.13:2380, https://192.168.2.13:2379, false
    
    ## Remove etcd cluster member k8s-master-2-11
    $ etcdctl member remove 63bfe05c4646fb08
    
    Member 63bfe05c4646fb08 removed from cluster ed984b9o8w35cap2
    
    ## List the etcd cluster members again
    $ etcdctl member list
    
    8e41efd8164c6e3d, started, k8s-master-2-12, https://192.168.2.12:2380, https://192.168.2.12:2379, false
    a61d0bd53c1cbcb6, started, k8s-master-2-13, https://192.168.2.13:2380, https://192.168.2.13:2379, false
    
    ## Exit the container
    $ exit
    

    10. Pods are still scheduled onto the master node even though the taint is set; what now?

    kubectl cordon master      # mark the node unschedulable
    kubectl uncordon master    # mark the node schedulable again
    

    11. Pinging 192.168.x.x (an IP outside the cluster) from inside a pod fails.

    Solution:

    [root@k8s-master ~]# kubectl run busybox -it --image=datica/busybox-dig --namespace=kube-system --restart=Never --rm sh
    kubectl exec [POD] [COMMAND] is DEPRECATED and will be removed in a future version. Use kubectl kubectl exec [POD] -- [COMMAND] instead.
    / # ping 192.168.x.x    # the IP outside the cluster cannot be reached
    PING 192.168.x.X (192.168.x.X): 56 data bytes  
    
    ## It turns out the pod CIDR does not match the network configured in flannel
    # Dump the flannel configmap
    kubectl get configmap -n kube-system -o yaml kube-flannel-cfg > flannel.yaml
    # Edit the flannel net-conf.json network so that it matches the pod CIDR
    vi flannel.yaml 
    net-conf.json: |
    {
        "Network": "10.222.0.0/16", #修改此行
        "Backend": {
        "Type": "vxlan"
        }
    }
    # Apply the configuration
    kubectl apply -f flannel.yaml
    # Delete the flannel pods so they are recreated with the new config
    kubectl delete pod xxxxx -n kube-system
    

    12. kube-proxy error: Failed to list IPVS destinations

    [root@k8s-master ~]# kubectl -n kube-system logs -f kube-proxy-9stzs
    E0910 07:55:04.913607       1 proxier.go:1192] Failed to sync endpoint for service: 10.1.0.10:53/UDP, err: parseIP Error ip=[10 222 0 7 0 0 0 0 0 0 0 0 0 0 0 0]
    E0910 07:55:04.913750       1 proxier.go:1950] Failed to list IPVS destinations, error: parseIP Error ip=[10 222 0 7 0 0 0 0 0 0 0 0 0 0 0 0]
    E0910 07:55:04.913778       1 proxier.go:1192] Failed to sync endpoint for service: 10.1.0.10:53/TCP, err: parseIP Error ip=[10 222 0 7 0 0 0 0 0 0 0 0 0 0 0 0]
    

    Either upgrade the CentOS kernel to 4.4 or later, or downgrade kube-proxy as shown below.

    Solution:

    [root@k8s-master ~]# kubectl -n kube-system set image daemonset/kube-proxy *=registry.aliyuncs.com/k8sxio/kube-proxy:v1.17.6
    

    13. CoreDNS name resolution fails, with symptoms like the following:

    [root@k8s-master ~]# kubectl logs -f coredns-7ff77c879f-66tp7 -n kube-system
    .:53
    [INFO] plugin/reload: Running configuration MD5 = 4e235fcc3696966e76816bcd9034ebc7
    CoreDNS-1.6.7
    linux/amd64, go1.13.6, da7f65b
    [ERROR] plugin/errors: 2 5711299238672922888.3331703534801112281. HINFO: read udp 10.222.2.31:42435->192.168.1.1:53: i/o timeout
    [ERROR] plugin/errors: 2 5711299238672922888.3331703534801112281. HINFO: read udp 10.222.2.31:38594->192.168.1.1:53: i/o timeout
    [ERROR] plugin/errors: 2 5711299238672922888.3331703534801112281. HINFO: read udp 10.222.2.31:51004->192.168.1.1:53: i/o timeout
    [ERROR] plugin/errors: 2 5711299238672922888.3331703534801112281. HINFO: read udp 10.222.2.31:38502->192.168.1.1:53: i/o timeout
    [ERROR] plugin/errors: 2 5711299238672922888.3331703534801112281. HINFO: read udp 10.222.2.31:56500->192.168.1.1:53: i/o timeout
    [ERROR] plugin/errors: 2 5711299238672922888.3331703534801112281. HINFO: read udp 10.222.2.31:50625->192.168.1.1:53: i/o timeout
    [ERROR] plugin/errors: 2 5711299238672922888.3331703534801112281. HINFO: read udp 10.222.2.31:42169->192.168.1.1:53: i/o timeout
    [ERROR] plugin/errors: 2 5711299238672922888.3331703534801112281. HINFO: read udp 10.222.2.31:55823->192.168.1.1:53: i/o timeout
    [ERROR] plugin/errors: 2 5711299238672922888.3331703534801112281. HINFO: read udp 10.222.2.31:58438->192.168.1.1:53: i/o timeout
    

    Solution:

    Add upstream DNS server entries on the host, then redeploy coredns; after that, name resolution works normally.

    [root@k8s-master ~]# echo "nameserver 114.114.114.114" >> /etc/resolv.conf
    [root@k8s-master ~]# echo "nameserver 8.8.8.8" >> /etc/resolv.conf
    

    14. Kubernetes fails to pull from the image registry: pull: unauthorized to access repository:

    Solution:

    kubectl create secret docker-registry harbor-archetype --namespace=archetype --docker-server=192.168.x.234:8089 --docker-username=liuqianding  --docker-password=123456@Ding --docker-email=dingge8311@dingtalk.com
    
    # Add imagePullSecrets to the deployment yaml (the name must match the secret created above)
    ......
          imagePullSecrets:
          - name: harbor-archetype
          containers:
          - name: harbor-archetype
    ........  
    
    # Note: the secret must be created in the same namespace as the workload
    

    15. Error: nodes are available: 2 node(s) didn't match node selector

    The node type label was not set; set it with the following steps, and make sure the workload's nodeSelector matches it (a sketch follows the commands).

    Solution:

    kubectl get nodes --show-labels
    kubectl label node k8s-node2 type=worker
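    # For reference, a hypothetical workload spec excerpt that targets the label added above:
    spec:
      nodeSelector:
        type: worker          # matches the label applied with kubectl label node
      containers:
      - name: app             # placeholder
        image: app:latest     # placeholder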
    

    16. curl -L requests against a Service are very slow; what to do?

    Solution:

    root# ethtool -K flannel.1 tx-checksum-ip-generic off
    

    17. Unable to authenticate the request due to an error: [invalid bearer token

    To be continued.

