How to debug a pod on Kubernetes

Author: Lis_ | Published 2019-10-22 14:56

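    How do you debug a failing pod in Kubernetes? Start by listing the pods that are not in the Running state with kubectl get pods --all-namespaces | grep -iv Running. The most common error status is CrashLoopBackOff: a container in the pod crashes shortly after starting, Kubernetes restarts it, and it fails again, so the kubelet waits for a growing back-off delay between restart attempts.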
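    Note that grep -iv Running also hides any line that merely contains the word "running", for example in a pod name. As a minimal sketch, the helper below (not_running is a hypothetical name) keeps the header row and every pod whose STATUS column differs from Running. It assumes the six-column layout printed by kubectl get pods --all-namespaces, where STATUS is the fourth column.

```shell
# Filter `kubectl get pods --all-namespaces` output read from stdin.
# Assumed columns: NAMESPACE NAME READY STATUS RESTARTS AGE
not_running() {
  awk 'NR == 1 || $4 != "Running"'
}

# Usage against a live cluster:
#   kubectl get pods --all-namespaces | not_running
```

    Pods in the Completed state also pass this filter, which is usually harmless when hunting for failures.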

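    Possible causes of a pod crash

    1. The image pull fails: the image name or tag is wrong or missing, or the pull secret is wrong or missing;
    2. The application fails at runtime, for example because environment variables, ConfigMaps or Secrets are missing;
    3. The liveness probe fails;
    4. Resource usage (memory, CPU) is too high, or the resource limits are too strict;
    5. A PersistentVolume was not created or could not be mounted;
    6. The container image was not updated.
      In most cases kubectl logs ... or kubectl describe ... with the appropriate arguments gives you some information about the failure. kubectl logs --help explains the available options.
      Note: even when a pod is in the Running state, a high RESTARTS count suggests that it may have an underlying problem.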
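    To spot such pods quickly, the RESTARTS column can be checked against a threshold. This is only a sketch: high_restarts is a hypothetical helper, the default threshold of 3 is arbitrary, and it assumes the five-column layout of kubectl get pods for a single namespace.

```shell
# Print "<pod> <restarts>" for pods restarted more than a threshold.
# Reads `kubectl get pods` output (single namespace) on stdin.
# Assumed columns: NAME READY STATUS RESTARTS AGE
high_restarts() {
  threshold="${1:-3}"   # the default of 3 is an arbitrary choice
  awk -v t="$threshold" 'NR > 1 && $4 + 0 > t + 0 { print $1, $4 }'
}

# Usage against a live cluster:
#   kubectl get pods | high_restarts 5
```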

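    Pod fails because of a wrong image name

    kubectl describe pod <your-pod> -n <your-namespace> gives more information.
    The Events section shows the error Failed to pull image ... with Reason: Failed. The pod status is first ErrImagePull and then ImagePullBackOff while Kubernetes backs off between pull attempts.
    Create a Pod with a misspelled image name: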

    apiVersion: v1
    kind: Pod 
    metadata:
      name: termination-demo
    spec:
      containers:
      - name: termination-demo-container
        image: debiann
        command: ["/bin/sh"]
        args: ["-c", "sleep 10 && echo Sleep expired > /dev/termination-log"]
    
    # kubectl get pods
    NAME                               READY   STATUS         RESTARTS   AGE
    termination-demo                   0/1     ErrImagePull   0          4s
    
    # kubectl describe pods termination-demo
    ...
    Events:
      Type     Reason     Age                From                     Message
      ----     ------     ----               ----                     -------
      Normal   Scheduled  72s                default-scheduler        Successfully assigned default/termination-demo to 172.16.219.186
      Normal   Pulling    31s (x3 over 71s)  kubelet, 172.16.219.186  pulling image "debiann"
      Warning  Failed     30s (x3 over 70s)  kubelet, 172.16.219.186  Failed to pull image "debiann": rpc error: code = Unknown desc = Error response from daemon: pull access denied for debiann, repository does not exist or may require 'docker login'
      Warning  Failed     30s (x3 over 70s)  kubelet, 172.16.219.186  Error: ErrImagePull
      Normal   BackOff    6s (x4 over 69s)   kubelet, 172.16.219.186  Back-off pulling image "debiann"
      Warning  Failed     6s (x4 over 69s)   kubelet, 172.16.219.186  Error: ImagePullBackOff
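
    In a long Events table the Normal rows can drown out the failures. A minimal sketch that keeps only the Warning rows (warning_events is a hypothetical helper; it relies on the event type being the first column, as in the output above):

```shell
# Keep only the Warning rows of an Events table read from stdin.
warning_events() {
  awk '$1 == "Warning"'
}

# Usage against a live cluster:
#   kubectl describe pod termination-demo | warning_events
# The API server can also do the filtering:
#   kubectl get events --field-selector type=Warning
```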
    

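    Missing ConfigMap or Secret

    Create a Deployment whose container command reads the file named by the environment variable MYFILE, which is not defined anywhere: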

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: termination-demo
      labels:
         app: termination-demo
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: termination-demo
      template:
        metadata:
          labels:
            app: termination-demo
        spec:
          containers:
          - name: termination-demo-container
            image: debian
            command: ["/bin/sh"]
            args: ["-c", "sed \"s/foo/bar/\" < $MYFILE"]
    
    # kubectl get pods
    NAME                                READY   STATUS             RESTARTS   AGE
    termination-demo-6654b86785-vf9bx   0/1     CrashLoopBackOff   2          41s
    
    # kubectl describe pods termination-demo-6654b86785-vf9bx
    ......
    Events:
      Type     Reason     Age                From                     Message
      ----     ------     ----               ----                     -------
      Normal   Scheduled  69s                default-scheduler        Successfully assigned default/termination-demo-6654b86785-vf9bx to 172.16.219.186
      Normal   Pulling    16s (x4 over 68s)  kubelet, 172.16.219.186  pulling image "debian"
      Normal   Pulled     15s (x4 over 63s)  kubelet, 172.16.219.186  Successfully pulled image "debian"
      Normal   Created    14s (x4 over 62s)  kubelet, 172.16.219.186  Created container
      Normal   Started    14s (x4 over 62s)  kubelet, 172.16.219.186  Started container
      Warning  BackOff    1s (x8 over 59s)   kubelet, 172.16.219.186  Back-off restarting failed container
    
    # kubectl logs termination-demo-6654b86785-vf9bx
    /bin/sh: 1: cannot open : No such file
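
    With a container in CrashLoopBackOff, kubectl logs may run while a fresh instance has not written anything yet. The --previous flag prints the log of the last terminated instance instead. A small wrapper as a sketch (crash_logs is a hypothetical helper, and falling back to the default namespace is an assumption):

```shell
# Show the log of the previous (crashed) container instance of a pod.
crash_logs() {
  pod="$1"
  namespace="${2:-default}"
  kubectl logs "$pod" --namespace "$namespace" --previous
}

# Usage:
#   crash_logs termination-demo-6654b86785-vf9bx
```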
    

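    Neither the events nor the log name the real cause: the shell only reports that it cannot open an empty file name, because $MYFILE is not set. The pod is missing the ConfigMap that defines it. Create the ConfigMap and reference it from the Deployment: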

    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: app-env
    data:
      MYFILE: "/etc/profile"
    ---
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: termination-demo
      labels:
         app: termination-demo
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: termination-demo
      template:
        metadata:
          labels:
            app: termination-demo
        spec:
          containers:
          - name: termination-demo-container
            image: debian
            command: ["/bin/sh"]
            args: ["-c", "sed \"s/foo/bar/\" < $MYFILE"]
            envFrom:
            - configMapRef:
                name: app-env 
    
    # kubectl apply -f configmap.yaml
    configmap/app-env created
    deployment.apps/termination-demo configured
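
    The empty file name in the earlier log came from an unset MYFILE. A container command can fail fast with a clearer message when a variable is missing. The sketch below shows one way to do that in POSIX sh; require_env is a hypothetical helper and is not part of the manifests in this article.

```shell
# Return non-zero and name the first variable that is unset or empty.
require_env() {
  for name in "$@"; do
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing required environment variable: $name" >&2
      return 1
    fi
  done
}

# Usage inside the container command:
#   require_env MYFILE && sed "s/foo/bar/" < "$MYFILE"
```

    With this guard kubectl logs would show the name of the missing variable instead of an unexplained "cannot open" error.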
    

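    After adding the ConfigMap, the pod status is still CrashLoopBackOff. The sed command now succeeds, but the container exits as soon as the command finishes, and a Deployment always restarts containers that exit. This workload is not a long-running service. To keep the pod running, replace the command with a loop that never exits: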

    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: app-env
    data:
      MYFILE: "/etc/profile"
      SLEEP: "5"
    ---
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: termination-demo
      labels:
         app: termination-demo
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: termination-demo
      template:
        metadata:
          labels:
            app: termination-demo
        spec:
          containers:
          - name: termination-demo-container
            image: debian
            command: ["/bin/sh"]
            # args: ["-c", "sed \"s/foo/bar/\" < $MYFILE"]
            args: ["-c", "while true; do sleep $SLEEP; echo sleeping; done;"]
            envFrom:
            - configMapRef:
                name: app-env
    

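    Resource limits

    When you define a pod you can specify the resources, such as memory or CPU, that the application may use. If you do not define them, the defaults apply (unless a LimitRange in the namespace sets other values): CPU: 0m (in milli-CPU) and RAM: 0Gi, which means the container is not restricted beyond what the node itself can provide.
    Kubernetes distinguishes between requests and limits. A request is the amount of a resource guaranteed to the container, and the scheduler uses it to place the pod; a limit is the maximum amount the container may use. The two satisfy 0 <= requests <= limit. For both settings you have to take into account the total resources that the available nodes provide.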
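
    CPU quantities can be written in whole cores (6) or in millicores (600m), and mixing the two up is an easy way to request ten times more than intended. The following sketch normalizes both forms to millicores and checks the requests <= limit relation. The function names are hypothetical, and decimal quantities such as 0.5 are deliberately not handled.

```shell
# Convert a CPU quantity to millicores.
# Handles only whole cores ("6") and millicores ("600m"); decimals
# such as "0.5" are out of scope for this sketch.
to_millicpu() {
  case "$1" in
    *m) echo "${1%m}" ;;
    *)  echo "$(( $1 * 1000 ))" ;;
  esac
}

# Succeed when the request does not exceed the limit.
request_within_limit() {
  [ "$(to_millicpu "$1")" -le "$(to_millicpu "$2")" ]
}

# Usage:
#   request_within_limit 600m 1 && echo "request fits under the limit"
```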

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: termination-demo
      labels:
         app: termination-demo
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: termination-demo
      template:
        metadata:
          labels:
            app: termination-demo
        spec:
          containers:
          - name: termination-demo-container
            image: debian
            command: ["/bin/sh"]
            args: ["-c", "sleep 10 && echo Sleep expired > /dev/termination-log"]
            resources:
              requests:
                cpu: "6"
    
    $ kubectl describe po termination-demo-fdb7bb7d9-mzvfw
    Name:           termination-demo-fdb7bb7d9-mzvfw
    Namespace:      default
    ...
    Containers:
      termination-demo-container:
        Image:      debian
        Port:       <none>
        Host Port:  <none>
        Command:
          /bin/sh
        Args:
          -c
          sleep 10 && echo Sleep expired > /dev/termination-log
        Requests:
          cpu:        6
        Environment:  <none>
        Mounts:
          /var/run/secrets/kubernetes.io/serviceaccount from default-token-t549m (ro)
    Conditions:
      Type           Status
      PodScheduled   False
    Events:
      Type     Reason            Age               From               Message
      ----     ------            ----              ----               -------
      Warning  FailedScheduling  9s (x7 over 40s)  default-scheduler  0/2 nodes are available: 2 Insufficient cpu.
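
    When the scheduler reports Insufficient cpu, it helps to compare the request with what each node can offer. As a sketch, the wrapper below lists allocatable CPU and memory per node using the custom-columns output format (node_allocatable is a hypothetical name):

```shell
# List allocatable CPU and memory per node.
node_allocatable() {
  kubectl get nodes \
    -o custom-columns=NODE:.metadata.name,CPU:.status.allocatable.cpu,MEMORY:.status.allocatable.memory
}

# Usage:
#   node_allocatable
```

    Allocatable is the node total minus system reservations; it does not subtract what other pods have already requested. The Allocated resources section of kubectl describe node shows that part.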
    

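    Image not updated

    Suppose you add a fix to your application, build a new image and push it to the registry, but after you redeploy, the new container does not come up. The cause lies in the image pull policy that the pod uses.
    If you did not change the image tag, the default pull policy IfNotPresent tells Kubernetes to reuse the image that is already cached on the node. (For the latest tag, or when no tag is given, the default policy is Always.)
    Best practice: never use the latest tag, and change the image tag every time you change anything in the image.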
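
    A deployment script can enforce this rule before it touches the cluster. The sketch below rejects image references without an explicit tag or with the latest tag, then updates the Deployment with kubectl set image. Both function names are hypothetical, and the debian:10.1 tag in the usage line is only an illustration.

```shell
# Succeed when an image reference carries an explicit tag other than
# "latest", or a digest.
has_pinned_tag() {
  last="${1##*/}"            # drop registry host (and port) and path
  case "$last" in
    *@sha256:*) return 0 ;;  # pinned by digest
    *:latest)   return 1 ;;
    *:?*)       return 0 ;;
    *)          return 1 ;;  # no tag: resolves to latest
  esac
}

# Update one container of a Deployment, refusing unpinned images.
set_pinned_image() {
  deployment="$1"; container="$2"; image="$3"
  has_pinned_tag "$image" || {
    echo "refusing unpinned image: $image" >&2
    return 1
  }
  kubectl set image "deployment/$deployment" "$container=$image"
}

# Usage:
#   set_pinned_image termination-demo termination-demo-container debian:10.1
```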
