
Kubernetes QA

Author: davisgao | Published 2018-06-15 14:21

    1. QA

    Q:   How to control whether the Master node is schedulable
    A:   Allow scheduling:    kubectl taint nodes --all node-role.kubernetes.io/master- 
         Disallow scheduling: kubectl taint nodes centos-master-1 node-role.kubernetes.io/master=true:NoSchedule
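         To check the taints currently applied to a node (node name as in the example above):
         kubectl describe node centos-master-1 | grep Taints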
    Q:   Jan 11 09:42:40 k8s78 kubelet[517]: E0111 09:42:40.935017     517 summary.go:92] Failed to get system container stats for "/system.slice/docker.service": failed to get cgroup stats for "/system.slice/docker.service": failed to get container info for "/system.slice/docker.service": unknown container "/system.slice/docker.service"
    A:   Append the following configuration to the kubelet startup flags:
    --runtime-cgroups=/sys/fs/cgroup/systemd/system.slice --kubelet-cgroups=/sys/fs/cgroup/systemd/system.slice
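    A minimal sketch of where these flags could go, assuming kubelet runs as a systemd service with a kubeadm-style drop-in that passes $KUBELET_EXTRA_ARGS (the drop-in file name below is illustrative):
    # /etc/systemd/system/kubelet.service.d/20-cgroups.conf
    [Service]
    Environment="KUBELET_EXTRA_ARGS=--runtime-cgroups=/sys/fs/cgroup/systemd/system.slice --kubelet-cgroups=/sys/fs/cgroup/systemd/system.slice"
    # reload and restart for the change to take effect
    systemctl daemon-reload && systemctl restart kubelet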
    
    Q:   Failed to list *v1.Node: nodes is forbidden
    A:   Add Node to the authorization modes and NodeRestriction to the admission plugins:
    --authorization-mode=Node,RBAC \
    --admission-control=NamespaceLifecycle,LimitRanger,ServiceAccount,DefaultStorageClass,ResourceQuota,DefaultTolerationSeconds,NodeRestriction 
    
    Q:   REJECT rules for Services are automatically added to iptables
    A:   The KUBE-FIREWALL chain in the iptables rules does not affect Services; the actual impact comes from a REJECT rule that is generated dynamically and cannot be controlled manually.
    This is Kubernetes's internal circuit-breaking mechanism (similar to nginx health checks, which k8s Services also have): when none of the containers behind a Service (only Services that have endpoints) can handle requests, a REJECT rule is automatically added as internal protection.
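    For reference, the auto-generated rule usually looks roughly like the following in iptables-save output (the service name, cluster IP, and port here are illustrative, not from this cluster):
    -A KUBE-SERVICES -d 10.254.0.100/32 -p tcp -m comment --comment "default/myapp: has no endpoints" -m tcp --dport 80 -j REJECT --reject-with icmp-port-unreachable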
    
    Q:   Failed to start container manager: inotify_add_watch /sys/fs/cgroup/cpuacct,cpu: no such file or directory
    A:   This is a bug in the latest cAdvisor; the workaround is recorded below:
    mount -o remount,rw '/sys/fs/cgroup'
    ln -s /sys/fs/cgroup/cpu,cpuacct /sys/fs/cgroup/cpuacct,cpu
    
    Q:   Failed to start cAdvisor inotify_add_watch /sys/fs/cgroup/cpu,cpuacct/system.slice/run-26637.scope: no space left on device
    A:   cat /proc/sys/fs/inotify/max_user_watches # default is 8192
    sudo sysctl fs.inotify.max_user_watches=1048576 # increase to 1048576
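    To make the setting persist across reboots (the sysctl.d file name is arbitrary):
    echo 'fs.inotify.max_user_watches=1048576' | sudo tee /etc/sysctl.d/99-inotify.conf
    sudo sysctl --system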
    
    Q:   unable to get metrics for resource memory: unable to fetch metrics from resource metrics API: the server could not find the requested resource (get pods.metrics.k8s.io)
    A:   [ips@ips81 bin]$ curl 10.254.16.15:18082/apis/metrics/v1alpha1/nodes
          curl: (7) Failed connect to 10.254.16.15:18082; Connection refused
         [ips@ips81 bin]$ ./kubectl top node
         Error from server (ServiceUnavailable): the server is currently unable to handle the request (get services http:heapster:)
    
        1. The flannel network is unreachable between nodes
        2. ./kubectl -n kube-system get ep — check whether the endpoint addresses and ports are correct (see the checks sketched below)
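        A couple of quick checks along those lines (assuming heapster is the metrics backend, as the error above suggests):
        ./kubectl -n kube-system get ep heapster
        ./kubectl -n kube-system get pods -o wide | grep -E 'heapster|flannel'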
    
    Q:   Metric-server: x509: subject with cn=front-proxy-client is not in the allowed list: [aggregator]
    A:   The identity in the request header is not in the allowed list; add the following flag to kube-apiserver:
         --requestheader-allowed-names=aggregator,front-proxy-client 
    
    Q:  failed to register unfinished metric admission_quota_controller: duplicate metrics collector registration attempted
    

    2. Related issues and troubleshooting notes

    Symptom: Error updating node status, will retry: error getting node "10.1.235.82": Get https://10.1.235.7:8443/api/v1/nodes/10.1.235.82?timeout=10s: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
    
    Analysis: The initial theory was that a long-lived TCP connection from kubelet was never being torn down. The symptom matched in that a connection stayed established and the port never changed, but the source address in use was the virtual IP.
    Cause: Because kubelet connected to the apiserver through the virtual IP, the kernel also picked the virtual IP as the connection's source address. After a failover the VIP does move, but only to another host, while the old connection stays established against an address that no longer belongs to this machine. kubelet needs to use the host's own IP, not the virtual IP, when connecting; this requires adjusting the keepalived configuration.
    Solution: 
    #keepalived before the change:
    virtual_ipaddress {
        10.1.235.7
    }
    
    #keepalived after the change: the subnet mask that matches the network must be specified
    virtual_ipaddress {
        10.1.235.7/24
    }
    
    
    #NIC address info before the change:
    eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
        link/ether fa:16:3e:e6:a9:c8 brd ff:ff:ff:ff:ff:ff
        inet 10.1.235.82/24 brd 10.1.235.255 scope global dynamic eth0
           valid_lft 62206361sec preferred_lft 62206361sec
        inet 10.1.235.7/32 scope global eth0
           valid_lft forever preferred_lft forever
        inet6 fe80::f816:3eff:fee6:a9c8/64 scope link 
           valid_lft forever preferred_lft forever
    
    #NIC address info after the change:
    eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
        link/ether fa:16:3e:e6:a9:c8 brd ff:ff:ff:ff:ff:ff
        inet 10.1.235.82/24 brd 10.1.235.255 scope global dynamic eth0
           valid_lft 62198673sec preferred_lft 62198673sec
        inet 10.1.235.7/24 scope global secondary eth0
           valid_lft forever preferred_lft forever
        inet6 fe80::f816:3eff:fee6:a9c8/64 scope link 
           valid_lft forever preferred_lft forever
    
    # kubelet-to-apiserver connection before the change:
    tcp        0      0 10.1.235.7:44134       10.1.235.7:8443         ESTABLISHED 23559/kubelet
    
    # kubelet-to-apiserver connection after the change:
    tcp        0      0 10.1.235.82:44134       10.1.235.7:8443         ESTABLISHED 23559/kubelet
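    For completeness, a minimal keepalived vrrp_instance sketch with the mask specified; the state, virtual_router_id, priority, and advert_int values are illustrative assumptions:
    vrrp_instance VI_1 {
        state MASTER
        interface eth0
        virtual_router_id 51
        priority 100
        advert_int 1
        virtual_ipaddress {
            10.1.235.7/24
        }
    }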
    
    • Containers were not removed before docker was stopped
    Q:  rm: cannot remove ‘work/kubernetes/kubelet/pods/736b274c-d68a-11e8-8c3b-001b21992e84/volumes/kubernetes.io~secret/calico-node-token-9kss7’: Device or resource busy
    A:  cat /proc/mounts |grep "kube" |awk '{print $2}' |xargs umount
    
    • Allowing calico containers to access the external network
    iptables -t nat -A POSTROUTING -s 192.168.0.0/24  -j MASQUERADE
    iptables -t nat -A POSTROUTING -s 192.168.0.0/24 ! -d 192.168.89.0/16 -j MASQUERADE
    
    
    • During a Deployment rolling update, load balancing is disrupted and some requests are lost
    Analysis: While a Pod is Terminating, the iptables rules on some nodes have not yet been refreshed, so part of the traffic is still routed to the Terminating Pod and those requests fail.
    Solution: Use the Kubernetes preStop hook to give each Pod a delayed shutdown, so that on receiving the termination signal the Pod waits for a while before exiting.
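    A minimal sketch of such a preStop hook in the Deployment's Pod template (container name, image, and sleep duration are illustrative):
    spec:
      terminationGracePeriodSeconds: 30
      containers:
      - name: app
        image: myapp:latest
        lifecycle:
          preStop:
            exec:
              command: ["sh", "-c", "sleep 10"]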
    
    • Before Kubernetes 1.9, the kubernetes Endpoints are not updated after an apiserver goes down, so some requests fail
    Analysis: Before Kubernetes 1.9, once an apiserver started successfully the kubernetes Endpoints were never updated again and had to be maintained by hand.
    Solution: After upgrading to Kubernetes 1.10, set --endpoint-reconciler-type=lease
    Use an endpoint reconciler (master-count, lease, none)
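    To confirm that the master endpoints are being reconciled, the kubernetes Service's endpoints in the default namespace can be inspected:
    kubectl -n default get ep kubernetes -o yaml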
    

    15. pids subsystem cannot be mounted [version 1.5.1]

    Jul 31 11:12:08 node1 kubelet[16285]: F0731 11:12:08.727594   16285 kubelet.go:1370] Failed to start ContainerManager failed to initialize top level QOS containers: failed to update top level Burstable QOS cgroup : failed to set supported cgroup subsystems for cgroup [kubepods burstable]: Failed to find subsystem mount for required subsystem: pids
    
    # The operating system kernel does not support the pids cgroup subsystem; upgrade the operating system
    [root@node1 /]# uname -r
    3.10.0-327.el7.x86_64
    [root@node1 /]# cat /proc/cgroups
    #subsys_name    hierarchy       num_cgroups     enabled
    cpuset  3       4       1
    cpu     6       59      1
    cpuacct 6       59      1
    memory  4       59      1
    devices 9       59      1
    freezer 5       4       1
    net_cls 7       4       1
    blkio   8       59      1
    perf_event      2       4       1
    hugetlb 10      4       1
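    A quick check after upgrading: the pids controller should now appear in /proc/cgroups (newer CentOS 7 kernels ship it; the exact minimum kernel version is distribution-specific):
    grep pids /proc/cgroups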
    

    3. Unresolved issues

    Open issue: a large number of run-*.scope entries accumulate under /sys/fs/cgroup/memory/system.slice
    
