Periodically Cleaning Up Evicted Pods in a K8S Cluster

Author: awker | Published 2023-02-12 15:27

Symptom

Pods in the Evicted state are present in the k8s cluster and are not being cleaned up:

# kubectl get pod -o wide -A | grep Evicted
simulation-prod      cloud-simulation-dead-letter-worker-d96bdcf98-dxt7h               0/1     Evicted       0          42d     <none>          cn-shanghai.172.22.0.194   <none>           <none>
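
If there are many evicted pods, it helps to first group them by node to see which nodes are under disk pressure. A small helper one-liner (assuming the default -o wide column layout, where the node name is the 8th column):

# kubectl get pod -o wide -A | grep Evicted | awk '{print $8}' | sort | uniq -c | sort -rn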

Investigation

The pod's Status is Failed and its Reason is Evicted; the Message field shows that the eviction happened because the node was low on disk space (ephemeral-storage):

# kubectl -n simulation-prod describe pod cloud-simulation-dead-letter-worker-d96bdcf98-dxt7h
Name:           cloud-simulation-dead-letter-worker-d96bdcf98-dxt7h
Namespace:      simulation-prod
Priority:       0
Node:           cn-shanghai.172.22.0.194/
Start Time:     Mon, 29 Nov 2021 15:48:25 +0800
Labels:         app.kubernetes.io/instance=cloud-simulation-dead-letter-worker
                app.kubernetes.io/name=cloud-simulation-dead-letter-worker
                pod-template-hash=d96bdcf98
Annotations:    kubernetes.io/psp: ack.privileged
Status:         Failed
Reason:         Evicted
Message:        The node was low on resource: ephemeral-storage. Container cloud-simulation-dead-letter-worker was using 291599484Ki, which exceeds its request of 0. 
IP:             
IPs:            <none>
Controlled By:  ReplicaSet/cloud-simulation-dead-letter-worker-d96bdcf98
Containers:
  cloud-simulation-dead-letter-worker:
    Image:      registry-vpc.cn-shanghai.aliyuncs.com/xxx/cloud_sim:1.1.2111290718.f0cfa04
    Port:       <none>
    Host Port:  <none>
    Command:
      /root/entry/dead_letter_worker.py
    Environment:
      DEPLOYMENT:  prod
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from cloud-simulation-dead-letter-worker-token-4z2xv (ro)
Volumes:
  cloud-simulation-dead-letter-worker-token-4z2xv:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  cloud-simulation-dead-letter-worker-token-4z2xv
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  node-type=simulation-prod
Tolerations:     node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:          <none>
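
Because evicted pods end up in the Failed phase, they can also be listed with a field selector instead of grep (see reference 5 below):

# kubectl get pods -A --field-selector=status.phase=Failed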

Root Cause

Node-pressure eviction is the process by which the kubelet proactively terminates Pods to reclaim resources on a node.
The kubelet monitors resources such as CPU, memory, disk space, and filesystem inodes on the cluster's nodes. When one or more of these resources reach a specific consumption level, the kubelet can proactively fail one or more Pods on the node in order to reclaim resources and prevent starvation.
During a node-pressure eviction, the kubelet sets the PodPhase of the selected Pods to Failed, which terminates them.
Node-pressure eviction is different from API-initiated eviction: the kubelet does not respect your configured PodDisruptionBudget or the Pod's terminationGracePeriodSeconds.
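
The thresholds that trigger node-pressure eviction come from the kubelet configuration. A minimal sketch of the relevant evictionHard settings is shown below; the values are the upstream defaults and are for illustration only, not this cluster's actual configuration (managed offerings such as ACK set their own values):

# KubeletConfiguration (commonly /var/lib/kubelet/config.yaml) -- illustrative default thresholds
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
evictionHard:
  memory.available: "100Mi"
  nodefs.available: "10%"
  nodefs.inodesFree: "5%"
  imagefs.available: "15%"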

Solution

Pods left in Status: Failed / Reason: Evicted are not cleaned up automatically (see reference 4), so a k8s CronJob is used to delete these pods periodically.
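
Before setting up the automation, the existing evicted pods can be cleaned up once by hand. A quick sketch (note that this deletes every pod in the Failed phase, whatever the reason):

# kubectl get pods -A --field-selector=status.phase=Failed --no-headers | \
    awk '{print $1, $2}' | while read ns pod; do kubectl -n "$ns" delete pod "$pod"; done

The manifests below then automate the cleanup. First, a dedicated namespace and ServiceAccount: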

$ vim 01-sa.yaml
apiVersion: v1
kind: Namespace
metadata:
  name: delete-evicted-pods
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: delete-evicted-pods
  namespace: delete-evicted-pods

$ vim 02-cr.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: delete-evicted-pods
rules:
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["get", "watch", "list", "delete"]



$ vim 03-crb.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: delete-evicted-pods
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: delete-evicted-pods
subjects:
  - kind: ServiceAccount
    name: delete-evicted-pods
    namespace: delete-evicted-pods
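
With the ClusterRole and binding applied, the ServiceAccount's permission to delete pods cluster-wide can be checked with kubectl auth can-i (it should answer "yes"):

# kubectl auth can-i delete pods --all-namespaces --as=system:serviceaccount:delete-evicted-pods:delete-evicted-pods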

$ vim 04-cj.yaml
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: delete-evicted-pods
  namespace: delete-evicted-pods
spec:
  schedule: "*/30 * * * *"
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: delete-evicted-pods
          containers:
          - name: kubectl-runner
            image: bitnami/kubectl:1.21.8
            imagePullPolicy: IfNotPresent
            command:
            - /bin/sh
            - -c
            - |
              # For every pod in the Failed phase, print "name namespace creationTimestamp reason",
              # then delete the ones whose reason is Evicted and that were created more than
              # 3 days (259200 s) ago.
              kubectl get pods --all-namespaces -o go-template='{{range .items}} {{if (eq .status.phase "Failed" )}} {{.metadata.name}}{{" "}} {{.metadata.namespace}}{{" "}} {{.metadata.creationTimestamp}}{{" "}} {{.status.reason}} {{"\n"}}{{end}} {{end}}' |
              while read epod namespace ct reason; do
                if [ "$reason" = "Evicted" ] && [ $(( $(date +%s) - $(date -d "$ct" +%s) )) -gt 259200 ]; then
                  echo "$(date '+%Y-%m-%d %H:%M:%S') delete $namespace $reason $epod"
                  kubectl -n "$namespace" delete pod "$epod"
                fi
              done
          restartPolicy: OnFailure
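
Apply the manifests in order and confirm the CronJob exists; the pods created by each run keep logs of what was deleted:

# kubectl apply -f 01-sa.yaml -f 02-cr.yaml -f 03-crb.yaml -f 04-cj.yaml
# kubectl -n delete-evicted-pods get cronjob
# kubectl -n delete-evicted-pods get pods

Note that the batch/v1beta1 CronJob API matches the 1.21-era kubectl image used here; it was removed in Kubernetes 1.25, so on newer clusters apiVersion: batch/v1 must be used instead.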

References:

  1. Pod Lifecycle: https://kubernetes.io/zh/docs/concepts/workloads/pods/pod-lifecycle/#pod-termination
  2. Node-pressure Eviction: https://kubernetes.io/zh/docs/concepts/scheduling-eviction/node-pressure-eviction/
  3. Pod selection for kubelet eviction: https://kubernetes.io/zh/docs/concepts/scheduling-eviction/node-pressure-eviction/#kubelet-%E9%A9%B1%E9%80%90%E6%97%B6-pod-%E7%9A%84%E9%80%89%E6%8B%A9
  4. Kubelet does not delete evicted pods: https://github.com/kubernetes/kubernetes/issues/55051
  5. Chained field selectors: https://kubernetes.io/zh/docs/concepts/overview/working-with-objects/field-selectors/#chained-selectors
  6. Using RBAC Authorization: https://kubernetes.io/zh/docs/reference/access-authn-authz/rbac/
