Periodically Cleaning Up Evicted Pods in a K8s Cluster

By awker | Published 2023-02-12 15:27

    Symptom

    Pods in Evicted status are sitting in the k8s cluster and are never cleaned up:

    # kubectl get pod -o wide -A | grep Evicted
    simulation-prod      cloud-simulation-dead-letter-worker-d96bdcf98-dxt7h               0/1     Evicted       0          42d     <none>          cn-shanghai.172.22.0.194   <none>           <none>
    

    Investigation

    From kubectl describe we can see the pod's status is Status: Failed with Reason: Evicted, and the Message field shows why it was evicted: the node ran low on disk (ephemeral-storage).

    # kubectl -n simulation-prod describe pod cloud-simulation-dead-letter-worker-d96bdcf98-dxt7h
    Name:           cloud-simulation-dead-letter-worker-d96bdcf98-dxt7h
    Namespace:      simulation-prod
    Priority:       0
    Node:           cn-shanghai.172.22.0.194/
    Start Time:     Mon, 29 Nov 2021 15:48:25 +0800
    Labels:         app.kubernetes.io/instance=cloud-simulation-dead-letter-worker
                    app.kubernetes.io/name=cloud-simulation-dead-letter-worker
                    pod-template-hash=d96bdcf98
    Annotations:    kubernetes.io/psp: ack.privileged
    Status:         Failed
    Reason:         Evicted
    Message:        The node was low on resource: ephemeral-storage. Container cloud-simulation-dead-letter-worker was using 291599484Ki, which exceeds its request of 0. 
    IP:             
    IPs:            <none>
    Controlled By:  ReplicaSet/cloud-simulation-dead-letter-worker-d96bdcf98
    Containers:
      cloud-simulation-dead-letter-worker:
        Image:      registry-vpc.cn-shanghai.aliyuncs.com/xxx/cloud_sim:1.1.2111290718.f0cfa04
        Port:       <none>
        Host Port:  <none>
        Command:
          /root/entry/dead_letter_worker.py
        Environment:
          DEPLOYMENT:  prod
        Mounts:
          /var/run/secrets/kubernetes.io/serviceaccount from cloud-simulation-dead-letter-worker-token-4z2xv (ro)
    Volumes:
      cloud-simulation-dead-letter-worker-token-4z2xv:
        Type:        Secret (a volume populated by a Secret)
        SecretName:  cloud-simulation-dead-letter-worker-token-4z2xv
        Optional:    false
    QoS Class:       BestEffort
    Node-Selectors:  node-type=simulation-prod
    Tolerations:     node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                     node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
    Events:          <none>
    

    Root cause

    Node-pressure eviction is the process by which the kubelet proactively terminates Pods to reclaim resources on a node.
    The kubelet monitors resources such as CPU, memory, disk space, and filesystem inodes on cluster nodes. When one or more of these resources reaches a specific consumption level, the kubelet can proactively fail one or more Pods on the node in order to reclaim resources and prevent starvation.
    During node-pressure eviction, the kubelet sets the PodPhase of the selected Pods to Failed, which terminates them.
    Node-pressure eviction differs from API-initiated eviction: the kubelet does not honor your configured PodDisruptionBudget or the Pod's terminationGracePeriodSeconds.
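    Note that the evicted pod above is QoS Class: BestEffort, and the eviction message says its ephemeral-storage usage "exceeds its request of 0" — BestEffort pods with zero requests rank first for eviction. As a complementary mitigation (a sketch; the values below are illustrative, not taken from the original deployment), declaring ephemeral-storage requests and limits on the container both improves its eviction ranking and bounds its disk usage:

```yaml
# Container-level resources block -- values are illustrative, tune to the
# workload's real disk footprint.
resources:
  requests:
    ephemeral-storage: "1Gi"    # counted when ranking pods for node-pressure eviction
  limits:
    ephemeral-storage: "10Gi"   # kubelet evicts the pod if its usage exceeds this
```

    With a limit set, the kubelet evicts only the offending pod when it overruns its allowance, instead of the node draining arbitrary BestEffort pods under disk pressure.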

    Solution

    Kubernetes does not automatically remove pods left in Status: Failed / Reason: Evicted (the control plane only garbage-collects terminated pods once their count exceeds the kube-controller-manager's terminated-pod-gc-threshold, 12500 by default), so we use a K8s CronJob to delete them on a schedule.

    $ vim 01-sa.yaml
    apiVersion: v1
    kind: Namespace
    metadata:
      name: delete-evicted-pods
    ---
    apiVersion: v1
    kind: ServiceAccount
    metadata:
      name: delete-evicted-pods
      namespace: delete-evicted-pods
    
    $ vim 02-cr.yaml
    apiVersion: rbac.authorization.k8s.io/v1
    kind: ClusterRole
    metadata:
      name: delete-evicted-pods
    rules:
    - apiGroups: [""]
      resources: ["pods"]
      verbs: ["get", "watch", "list", "delete"]
    
    
    
    $ vim 03-crb.yaml
    apiVersion: rbac.authorization.k8s.io/v1
    kind: ClusterRoleBinding
    metadata:
      name: delete-evicted-pods
    roleRef:
      apiGroup: rbac.authorization.k8s.io
      kind: ClusterRole
      name: delete-evicted-pods
    subjects:
      - kind: ServiceAccount
        name: delete-evicted-pods
        namespace: delete-evicted-pods
    
    $ vim 04-cj.yaml
    apiVersion: batch/v1beta1  # use batch/v1 on Kubernetes >= 1.21; v1beta1 was removed in 1.25
    kind: CronJob
    metadata:
      name: delete-evicted-pods
      namespace: delete-evicted-pods
    spec:
      schedule: "*/30 * * * *"
      jobTemplate:
        spec:
          template:
            spec:
              serviceAccountName: delete-evicted-pods
              containers:
              - name: kubectl-runner
                image: bitnami/kubectl:1.21.8
                imagePullPolicy: IfNotPresent
                command:
                - /bin/sh
                - -c
                - |
                  kubectl get pods --all-namespaces -o go-template='{{range .items}}{{if eq .status.phase "Failed"}}{{.metadata.name}} {{.metadata.namespace}} {{.metadata.creationTimestamp}} {{.status.reason}}{{"\n"}}{{end}}{{end}}' \
                  | while read epod namespace ct reason; do
                      if [ "x$reason" = "xEvicted" ] && [ $(( $(date +%s) - $(date -d "$ct" +%s) )) -gt 259200 ]; then
                        echo "$(date '+%Y-%m-%d %H:%M:%S') delete $namespace $reason $epod"
                        kubectl -n "$namespace" delete pod "$epod"
                      fi
                    done
              restartPolicy: OnFailure
    
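    The heart of the CronJob's command is the filter: keep only pods whose reason is Evicted and whose creationTimestamp is more than 259200 seconds (3 days) old. That check can be pulled out into a small POSIX-shell function and exercised outside the cluster (a sketch; the should_delete helper is not part of the original manifest, and GNU date with -d is assumed, as in the bitnami/kubectl image):

```shell
#!/bin/sh
# should_delete CREATION_TIMESTAMP REASON
# Succeeds (exit 0) only for Evicted pods older than 3 days (259200 s),
# mirroring the filter in the CronJob's command. Requires GNU date (-d).
should_delete() {
  ct=$1
  reason=$2
  age=$(( $(date +%s) - $(date -d "$ct" +%s) ))
  [ "x$reason" = "xEvicted" ] && [ "$age" -gt 259200 ]
}

# An Evicted pod from 2021 is well past the threshold:
should_delete "2021-11-29T07:48:25Z" "Evicted" && echo "delete"
# A Failed pod with a different reason is kept:
should_delete "2021-11-29T07:48:25Z" "OOMKilled" || echo "keep"
```

    The 3-day grace period leaves recently evicted pods around long enough to inspect with kubectl describe before they disappear; shorten the threshold if that history is not needed.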

    References:

    1. Pod Lifecycle: https://kubernetes.io/zh/docs/concepts/workloads/pods/pod-lifecycle/#pod-termination
    2. Node-pressure Eviction: https://kubernetes.io/zh/docs/concepts/scheduling-eviction/node-pressure-eviction/
    3. Pod selection for kubelet eviction: https://kubernetes.io/zh/docs/concepts/scheduling-eviction/node-pressure-eviction/#kubelet-%E9%A9%B1%E9%80%90%E6%97%B6-pod-%E7%9A%84%E9%80%89%E6%8B%A9
    4. Kubelet does not delete evicted pods: https://github.com/kubernetes/kubernetes/issues/55051
    5. Chained field selectors: https://kubernetes.io/zh/docs/concepts/overview/working-with-objects/field-selectors/#chained-selectors
    6. Using RBAC Authorization: https://kubernetes.io/zh/docs/reference/access-authn-authz/rbac/
