1. QA
Q: How to control whether the master node is schedulable
A: Allow scheduling: kubectl taint nodes --all node-role.kubernetes.io/master-
Disallow scheduling: kubectl taint nodes centos-master-1 node-role.kubernetes.io/master=true:NoSchedule
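To verify the node's current taints (the node name centos-master-1 is taken from the example above):
# list the taints currently set on the node
kubectl describe node centos-master-1 | grep -i taint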
Q: Jan 11 09:42:40 k8s78 kubelet[517]: E0111 09:42:40.935017 517 summary.go:92] Failed to get system container stats for "/system.slice/docker.service": failed to get cgroup stats for "/system.slice/docker.service": failed to get container info for "/system.slice/docker.service": unknown container "/system.slice/docker.service"
A: Append the following flags to the kubelet startup arguments:
--runtime-cgroups=/sys/fs/cgroup/systemd/system.slice --kubelet-cgroups=/sys/fs/cgroup/systemd/system.slice
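One way to apply these flags, assuming a kubeadm-style install whose kubelet unit already expands $KUBELET_EXTRA_ARGS (the drop-in path below is an assumption; adjust to your setup):
# /etc/systemd/system/kubelet.service.d/20-cgroups.conf (hypothetical drop-in file)
[Service]
Environment="KUBELET_EXTRA_ARGS=--runtime-cgroups=/sys/fs/cgroup/systemd/system.slice --kubelet-cgroups=/sys/fs/cgroup/systemd/system.slice"
# then reload and restart the kubelet:
systemctl daemon-reload && systemctl restart kubelet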
Q: Failed to list *v1.Node: nodes is forbidden
A: Add Node to the authorization mode and NodeRestriction to the admission plugins:
--authorization-mode=Node,RBAC \
--admission-control=NamespaceLifecycle,LimitRanger,ServiceAccount,DefaultStorageClass,ResourceQuota,DefaultTolerationSeconds,NodeRestriction
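After restarting kube-apiserver, a quick probe of whether a kubelet's node identity is now authorized (the node name k8s78 is reused from the log above and is purely illustrative):
# impersonate the node identity and ask the configured authorizers
kubectl auth can-i list nodes --as=system:node:k8s78 --as-group=system:nodes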
Q: REJECT rules for services are added to iptables automatically
A: The KUBE-FIREWALL chain in the iptables rules does not affect services. What actually affects traffic is a dynamically generated rule that is not under manual control:
this is Kubernetes' internal circuit-breaking mechanism (similar to nginx health checks; Services in Kubernetes have the equivalent). When none of the containers behind a Service (only Services that have Endpoints) can be reached, a REJECT rule is added automatically as an internal safeguard.
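For reference, the dynamically generated rule looks roughly like the one below (service name and cluster IP are illustrative, not taken from the original notes):
# list the auto-generated REJECT rules for services without ready endpoints
iptables-save -t filter | grep 'has no endpoints'
# illustrative output:
# -A KUBE-SERVICES -d 10.254.0.10/32 -p tcp -m comment --comment "default/my-svc: has no endpoints" -m tcp --dport 80 -j REJECT --reject-with icmp-port-unreachable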
Q: Failed to start container manager: inotify_add_watch /sys/fs/cgroup/cpuacct,cpu: no such file or directory
A: A bug in recent cAdvisor versions; the workaround is:
mount -o remount,rw '/sys/fs/cgroup'
ln -s /sys/fs/cgroup/cpu,cpuacct /sys/fs/cgroup/cpuacct,cpu
Q: Failed to start cAdvisor inotify_add_watch /sys/fs/cgroup/cpu,cpuacct/system.slice/run-26637.scope: no space left on device
A: cat /proc/sys/fs/inotify/max_user_watches # default is 8192
sudo sysctl fs.inotify.max_user_watches=1048576 # increase to 1048576
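To make the new limit survive reboots (the file name under /etc/sysctl.d is an arbitrary choice):
# persist the inotify watch limit and reload sysctl settings
echo 'fs.inotify.max_user_watches=1048576' | sudo tee /etc/sysctl.d/99-inotify.conf
sudo sysctl --system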
Q: unable to get metrics for resource memory: unable to fetch metrics from resource metrics API: the server could not find the requested resource (get pods.metrics.k8s.io)
A: [ips@ips81 bin]$ curl 10.254.16.15:18082/apis/metrics/v1alpha1/nodes
curl: (7) Failed connect to 10.254.16.15:18082; Connection refused
[ips@ips81 bin]$ ./kubectl top node
Error from server (ServiceUnavailable): the server is currently unable to handle the request (get services http:heapster:)
1. The flannel network is unreachable.
2. Run ./kubectl -n kube-system get ep to check whether the endpoint addresses and ports are correct.
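If metrics-server (rather than heapster) serves the metrics API, the aggregation layer can also be probed directly through the apiserver; this is a generic check, not output from the original environment:
# a JSON NodeMetricsList here means the aggregated metrics API is reachable
kubectl get --raw "/apis/metrics.k8s.io/v1beta1/nodes"
# the corresponding APIService object should report Available=True
kubectl get apiservices | grep metrics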
Q: Metric-server: x509: subject with cn=front-proxy-client is not in the allowed list: [aggregator]
A: The identity in the request header is not allowed. Add the following flag to kube-apiserver:
--requestheader-allowed-names=aggregator,front-proxy-client
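It can also help to check which CN the aggregator certificate actually carries (the path below is the kubeadm default and may differ; use whatever --proxy-client-cert-file points to):
# print the subject of the front-proxy client certificate
openssl x509 -in /etc/kubernetes/pki/front-proxy-client.crt -noout -subject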
Q: failed to register unfinished metric admission_quota_controller: duplicate metrics collector registration attempted
2. Notes on related issues
Symptom: Error updating node status, will retry: error getting node "10.1.235.82": Get https://10.1.235.7:8443/api/v1/nodes/10.1.235.82?timeout=10s: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
Analysis: The first suspicion was the kubelet's long-lived TCP connection not being torn down, and the observations matched: a connection was always present, its port never changed, and it used the virtual IP.
Cause: The kubelet connected through the virtual IP and used it as the connection's source address; after a failover the VIP moved to another host, but the established connection remained. The kubelet must use the host's own IP when connecting rather than the VIP, and the fix is an adjustment to the keepalived configuration.
Solution:
# keepalived config before the change:
virtual_ipaddress {
10.1.235.7
}
# keepalived config after the change: the subnet mask matching the network must be specified
virtual_ipaddress {
10.1.235.7/24
}
# NIC info before the change:
eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
link/ether fa:16:3e:e6:a9:c8 brd ff:ff:ff:ff:ff:ff
inet 10.1.235.82/24 brd 10.1.235.255 scope global dynamic eth0
valid_lft 62206361sec preferred_lft 62206361sec
inet 10.1.235.7/32 scope global eth0
valid_lft forever preferred_lft forever
inet6 fe80::f816:3eff:fee6:a9c8/64 scope link
valid_lft forever preferred_lft forever
# NIC info after the change:
eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
link/ether fa:16:3e:e6:a9:c8 brd ff:ff:ff:ff:ff:ff
inet 10.1.235.82/24 brd 10.1.235.255 scope global dynamic eth0
valid_lft 62198673sec preferred_lft 62198673sec
inet 10.1.235.7/24 scope global secondary eth0
valid_lft forever preferred_lft forever
inet6 fe80::f816:3eff:fee6:a9c8/64 scope link
valid_lft forever preferred_lft forever
# kubelet-to-apiserver connection before the change:
tcp 0 0 10.1.235.7:44134 10.1.235.7:8443 ESTABLISHED 23559/kubelet
# kubelet-to-apiserver connection after the change:
tcp 0 0 10.1.235.82:44134 10.1.235.7:8443 ESTABLISHED 23559/kubelet
Q: rm: cannot remove ‘work/kubernetes/kubelet/pods/736b274c-d68a-11e8-8c3b-001b21992e84/volumes/kubernetes.io~secret/calico-node-token-9kss7’: Device or resource busy
A: cat /proc/mounts |grep "kube" |awk '{print $2}' |xargs umount
iptables -t nat -A POSTROUTING -s 192.168.0.0/24 -j MASQUERADE
iptables -t nat -A POSTROUTING -s 192.168.0.0/24 ! -d 192.168.89.0/16 -j MASQUERADE
- During a Deployment rolling update, load balancing behaves abnormally and some requests are dropped.
Analysis: While a Pod is Terminating, the iptables rules on some nodes have not yet been refreshed, so part of the traffic is still sent to the Terminating Pod and those requests fail.
Solution: Use Kubernetes' preStop hook to give every Pod an exit delay, so that on receiving the termination signal it waits for a while before exiting (see the sketch after this item).
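A minimal sketch of adding such a delay with kubectl patch (the deployment name my-app, container index 0, and the 10-second sleep are all assumptions; the container image must provide sh, and terminationGracePeriodSeconds, 30s by default, must be longer than the sleep):
# add a preStop hook; the kubelet runs it before sending SIGTERM, giving kube-proxy time to drop the Pod from iptables
kubectl patch deployment my-app --type=json -p='[{"op":"add","path":"/spec/template/spec/containers/0/lifecycle","value":{"preStop":{"exec":{"command":["sh","-c","sleep 10"]}}}}]'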
- Before Kubernetes 1.9, if an apiserver went down the kubernetes Endpoints were not updated, so some requests failed.
Analysis: Before Kubernetes 1.9, once an apiserver had started successfully the kubernetes Endpoints were never updated again and had to be maintained manually.
Solution: After upgrading to Kubernetes 1.10, set --endpoint-reconciler-type=lease (a verification sketch follows this item).
Use an endpoint reconciler (master-count, lease, none)
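To confirm the reconciler keeps the endpoints of the default kubernetes Service in sync after an apiserver goes down (a generic check, not output from the original environment):
# only healthy apiserver IPs should remain listed here
kubectl get endpoints kubernetes -n default -o wide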
- The pids cgroup subsystem cannot be mounted [version 1.5.1]
Jul 31 11:12:08 node1 kubelet[16285]: F0731 11:12:08.727594 16285 kubelet.go:1370] Failed to start ContainerManager failed to initialize top level QOS containers: failed to update top level Burstable QOS cgroup : failed to set supported cgroup subsystems for cgroup [kubepods burstable]: Failed to find subsystem mount for required subsystem: pids
# The operating system does not support the pids cgroup subsystem (it is missing from /proc/cgroups below); upgrade the OS.
[root@node1 /]# uname -r
3.10.0-327.el7.x86_64
[root@node1 /]# cat /proc/cgroups
#subsys_name hierarchy num_cgroups enabled
cpuset 3 4 1
cpu 6 59 1
cpuacct 6 59 1
memory 4 59 1
devices 9 59 1
freezer 5 4 1
net_cls 7 4 1
blkio 8 59 1
perf_event 2 4 1
hugetlb 10 4 1
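A quick check on any candidate kernel (a generic diagnostic, not from the original notes): the pids controller must appear in /proc/cgroups and be enabled.
# prints the pids line if the kernel supports the controller
grep ^pids /proc/cgroups || echo 'pids cgroup controller not available'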
3. Unresolved
Open issue: there are a large number of run-*.scope entries under /sys/fs/cgroup/memory/system.slice.
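A simple way to track the issue over time (generic diagnostic, path as in the note above):
# count the run-*.scope memory cgroups left under system.slice
ls -d /sys/fs/cgroup/memory/system.slice/run-*.scope 2>/dev/null | wc -l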