
Lesson 10: Troubleshooting a Kubernetes Service That Cannot Be Accessed

Author: 笔名辉哥 | Published 2021-08-22 18:11

    Abstract

    While learning Kubernetes you will regularly run into Services that cannot be accessed. This article collects the likely causes and should help you find where the problem lies.

    Content

    To follow along with this walkthrough, first deploy an application:

    # kubectl create deployment web --image=nginx --replicas=3
    deployment.apps/web created
    # kubectl expose deployment web --port=8082 --type=NodePort
    service/web exposed
    

    Make sure the Pods are running:

    #  kubectl get pods,svc
    NAME                      READY   STATUS    RESTARTS   AGE
    pod/dnsutils              1/1     Running   25         25h
    pod/mysql-5ws56           1/1     Running   0          20h
    pod/mysql-fwpgc           1/1     Running   0          25h
    pod/mysql-smggm           1/1     Running   0          20h
    pod/myweb-8dc2n           1/1     Running   0          25h
    pod/myweb-mfbpd           1/1     Running   0          25h
    pod/myweb-zn8z2           1/1     Running   0          25h
    pod/web-96d5df5c8-8fwsb   1/1     Running   0          69s
    pod/web-96d5df5c8-g6hgp   1/1     Running   0          69s
    pod/web-96d5df5c8-t7xzv   1/1     Running   0          69s
    
    NAME                 TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)          AGE
    service/kubernetes   ClusterIP   10.96.0.1        <none>        443/TCP          25h
    service/mysql        ClusterIP   10.99.230.190    <none>        3306/TCP         25h
    service/myweb        NodePort    10.105.77.88     <none>        8080:31330/TCP   25h
    service/web          NodePort    10.103.246.193   <none>        8082:31303/TCP   17s
    

    Problem 1: The Service cannot be accessed by name

    If you are accessing the Service by its name, first make sure CoreDNS is deployed:

    # kubectl get pods -n kube-system
    NAME                                 READY   STATUS    RESTARTS   AGE
    coredns-74ff55c5b-8q44c              1/1     Running   0          26h
    coredns-74ff55c5b-f7j5g              1/1     Running   0          26h
    etcd-k8s-master                      1/1     Running   2          26h
    kube-apiserver-k8s-master            1/1     Running   2          26h
    kube-controller-manager-k8s-master   1/1     Running   0          26h
    kube-flannel-ds-f5tn6                1/1     Running   0          21h
    kube-flannel-ds-ftfgf                1/1     Running   0          26h
    kube-proxy-hnp7c                     1/1     Running   0          26h
    kube-proxy-njw8l                     1/1     Running   0          21h
    kube-scheduler-k8s-master            1/1     Running   0          26h
    
    

    CoreDNS is deployed. If its status is not Running, check the container logs to track the problem down further.
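
    If CoreDNS is not healthy, the following commands are a reasonable first look (a sketch; the kube-dns Service name and the k8s-app=kube-dns label are the kubeadm defaults):

    # CoreDNS logs from both replicas, selected by the standard label
    kubectl logs -n kube-system -l k8s-app=kube-dns --tail=50

    # The cluster DNS Service and its ClusterIP (10.96.0.10 in this cluster)
    kubectl get svc -n kube-system kube-dns

    # CoreDNS Pods should be backing that Service
    kubectl get endpoints kube-dns -n kube-system
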
    Use a dnsutils Pod to test name resolution.
    dnsutils.yaml

    apiVersion: v1
    kind: Pod
    metadata:
      name: dnsutils
    spec:
      containers:
      - name: dnsutils
        image: mydlqclub/dnsutils:1.3
        imagePullPolicy: IfNotPresent
        command: ["sleep","3600"]
    

    Create the Pod and exec into the container:

    # kubectl create -f dnsutils.yaml
    
    # kubectl exec -it dnsutils sh
    kubectl exec [POD] [COMMAND] is DEPRECATED and will be removed in a future version. Use kubectl exec [POD] -- [COMMAND] instead.
    
    / # nslookup web
    Server:     10.96.0.10
    Address:    10.96.0.10#53
    
    Name:   web.default.svc.cluster.local
    Address: 10.103.246.193
    

    If the lookup fails, try qualifying the name with its namespace:

    / # nslookup web.default
    Server:     10.96.0.10
    Address:    10.96.0.10#53
    
    Name:   web.default.svc.cluster.local
    Address: 10.103.246.193
    

    If that resolves, adjust the application to use the namespace-qualified name when accessing a Service in another namespace.

    If it still fails to resolve, try the fully qualified name:

    / # nslookup web.default.svc.cluster.local
    Server:     10.96.0.10
    Address:    10.96.0.10#53
    
    Name:   web.default.svc.cluster.local
    Address: 10.103.246.193
    
    

    Note: "default" here is the namespace being used, "svc" indicates that this is a Service, and "cluster.local" is the cluster domain.
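
    It is also worth checking the resolver configuration that kubelet generates inside the Pod, since the nameserver and search list come from --cluster-dns and the Pod's namespace (a sketch using the dnsutils Pod created above):

    # nameserver should be the CoreDNS ClusterIP (10.96.0.10 here) and the
    # search list should start with <namespace>.svc.cluster.local
    kubectl exec dnsutils -- cat /etc/resolv.conf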

    Next, try the lookup from a Node in the cluster (your cluster DNS IP may differ; check it with kubectl get svc -n kube-system):

    #  nslookup web.default.svc.cluster.local
    Server:     103.224.222.222
    Address:    103.224.222.222#53
    
    ** server can't find web.default.svc.cluster.local: REFUSED
    
    

    The name cannot be found. Check whether the node's /etc/resolv.conf is correct, and add the CoreDNS IP and the search paths.
    Add:

    nameserver 10.96.0.10
    search default.svc.cluster.local svc.cluster.local cluster.local
    options ndots:5
    

    After editing with vim /etc/resolv.conf the file looks like this:

    # Dynamic resolv.conf(5) file for glibc resolver(3) generated by resolvconf(8)
    #     DO NOT EDIT THIS FILE BY HAND -- YOUR CHANGES WILL BE OVERWRITTEN
    nameserver 103.224.222.222
    nameserver 103.224.222.223
    nameserver 8.8.8.8
    nameserver 10.96.0.10
    search default.svc.cluster.local svc.cluster.local cluster.local
    options ndots:5
    

    Notes:

    The nameserver line must specify the CoreDNS Service; inside Pods it is configured automatically through the kubelet --cluster-dns flag.

    The search line must contain appropriate suffixes so that Service names can be found. In this example it looks for Services in the local namespace (default.svc.cluster.local), in all namespaces (svc.cluster.local), and in the cluster (cluster.local).

    The options line must set ndots high enough that the DNS client library prefers the search paths. Kubernetes sets it to 5 by default.
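
    To separate a broken node resolver configuration from a broken CoreDNS, you can also query the CoreDNS Service IP explicitly (a sketch; 10.96.0.10 is this cluster's DNS ClusterIP):

    # Ask CoreDNS directly, bypassing the resolver configured in /etc/resolv.conf
    nslookup web.default.svc.cluster.local 10.96.0.10

    # Equivalent check with dig, if it is installed on the node
    dig +short web.default.svc.cluster.local @10.96.0.10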

    Problem 2: The Service cannot be accessed by IP

    Assuming the Service name resolves (CoreDNS is working), the next thing to test is whether the Service itself works. From a node in the cluster, access the Service IP:

    # curl -I 10.103.246.193
    HTTP/1.1 200 OK
    Server: Tengine
    Date: Sun, 22 Aug 2021 13:04:15 GMT
    Content-Type: text/html
    Content-Length: 1326
    Last-Modified: Wed, 26 Apr 2017 08:03:47 GMT
    Connection: keep-alive
    Vary: Accept-Encoding
    ETag: "59005463-52e"
    Accept-Ranges: bytes
    
    

    In this cluster, however, the request times out:

    # curl -I 10.103.246.193:8082
    curl: (7) Failed to connect to 10.103.246.193 port 8082: Connection timed out
    
    

    Check 1: Is the Service port configuration correct?

    Check the Service configuration and whether the ports being used are correct:

    # kubectl get svc web -o yaml
    apiVersion: v1
    kind: Service
    metadata:
      creationTimestamp: "2021-08-22T04:04:11Z"
      labels:
        app: web
      managedFields:
      - apiVersion: v1
        fieldsType: FieldsV1
        fieldsV1:
          f:metadata:
            f:labels:
              .: {}
              f:app: {}
          f:spec:
            f:externalTrafficPolicy: {}
            f:ports:
              .: {}
              k:{"port":8082,"protocol":"TCP"}:
                .: {}
                f:port: {}
                f:protocol: {}
                f:targetPort: {}
            f:selector:
              .: {}
              f:app: {}
            f:sessionAffinity: {}
            f:type: {}
        manager: kubectl-expose
        operation: Update
        time: "2021-08-22T04:04:11Z"
      name: web
      namespace: default
      resourceVersion: "118039"
      uid: fa5bbc6b-7a79-45a4-b6ba-e015340d2bab
    spec:
      clusterIP: 10.103.246.193
      clusterIPs:
      - 10.103.246.193
      externalTrafficPolicy: Cluster
      ports:
      - nodePort: 31303
        port: 8082
        protocol: TCP
        targetPort: 8082
      selector:
        app: web
      sessionAffinity: None
      type: NodePort
    status:
      loadBalancer: {}
    

    Notes (a quick way to print just these three fields is sketched below):

    • spec.ports[].port: the port used when accessing the ClusterIP, 8082
    • spec.ports[].targetPort: the target port, i.e. the port the service inside the container listens on, 8082
    • spec.ports[].nodePort: the port for access from outside the cluster, http://NodeIP:31303
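
    A compact way to print only those port fields instead of reading the full YAML (a sketch; jsonpath output formatting can differ slightly between kubectl versions):

    # Print port, targetPort and nodePort of the web Service
    kubectl get svc web -o jsonpath='{.spec.ports[0].port} {.spec.ports[0].targetPort} {.spec.ports[0].nodePort}{"\n"}'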

    Check 2: Is the Service associated with the right Pods?

    Check which Pods the Service is associated with:

    # kubectl get pods  -o wide -l app=web
    NAME                  READY   STATUS    RESTARTS   AGE    IP           NODE        NOMINATED NODE   READINESS GATES
    web-96d5df5c8-8fwsb   1/1     Running   0          4h9m   10.244.1.5   k8s-node2   <none>           <none>
    web-96d5df5c8-g6hgp   1/1     Running   0          4h9m   10.244.1.6   k8s-node2   <none>           <none>
    web-96d5df5c8-t7xzv   1/1     Running   0          4h9m   10.244.1.4   k8s-node2   <none>           <none>
    

    The -l app=web argument is a label selector.


    From k8s-node2, however, the Pods do respond:

    root@k8s-node2:/data/k8s# curl -I 10.244.1.4
    HTTP/1.1 200 OK
    Server: nginx/1.21.1
    Date: Sun, 22 Aug 2021 08:16:16 GMT
    Content-Type: text/html
    Content-Length: 612
    Last-Modified: Tue, 06 Jul 2021 14:59:17 GMT
    Connection: keep-alive
    ETag: "60e46fc5-264"
    Accept-Ranges: bytes
    

    All three Pods are scheduled on k8s-node2, not on the k8s-master node the query was made from.
    The two nodes behave differently, so flannel is the most likely culprit.

    In Kubernetes a control loop evaluates each Service's selector and stores the result in an Endpoints object.

    root@k8s-master:/data/k8s# kubectl get endpoints web
    NAME   ENDPOINTS                                         AGE
    web    10.244.1.4:8082,10.244.1.5:8082,10.244.1.6:8082   4h14m
    

    As the output shows, the endpoints controller has found Pods for this Service. That alone does not prove the association is correct; you still need to confirm that the Service's spec.selector matches the labels in the Deployment's Pod template (metadata.labels).

    root@k8s-master:/data/k8s# kubectl get svc web -o yaml
    ...
      selector:
        app: web
    ...
    

    Get the Deployment's information:

    root@k8s-master:/data/k8s# kubectl get deployment web -o yaml
    
    ...
      selector:
        matchLabels:
          app: web
    ...
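
    To compare the two sides quickly, something like the following works (a sketch; what matters is that the Service's selector matches the labels the Deployment stamps onto its Pods):

    # Labels the Service selects on
    kubectl get svc web -o jsonpath='{.spec.selector}{"\n"}'

    # Labels the Deployment puts on its Pods (the Pod template)
    kubectl get deployment web -o jsonpath='{.spec.template.metadata.labels}{"\n"}'

    # Labels actually present on the running Pods
    kubectl get pods -l app=web --show-labels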
    

    Check 3: Are the Pods themselves working?

    To check whether the Pods work, bypass the Service and access a Pod IP directly:

    root@k8s-master:/data/k8s# kubectl get pods -o wide
    NAME                  READY   STATUS    RESTARTS   AGE     IP           NODE         NOMINATED NODE   READINESS GATES
    dnsutils              1/1     Running   29         29h     10.244.0.4   k8s-master   <none>           <none>
    mysql-5ws56           1/1     Running   0          24h     10.244.1.3   k8s-node2    <none>           <none>
    mysql-fwpgc           1/1     Running   0          29h     10.244.0.5   k8s-master   <none>           <none>
    mysql-smggm           1/1     Running   0          24h     10.244.1.2   k8s-node2    <none>           <none>
    myweb-8dc2n           1/1     Running   0          29h     10.244.0.7   k8s-master   <none>           <none>
    myweb-mfbpd           1/1     Running   0          29h     10.244.0.6   k8s-master   <none>           <none>
    myweb-zn8z2           1/1     Running   0          29h     10.244.0.8   k8s-master   <none>           <none>
    web-96d5df5c8-8fwsb   1/1     Running   0          4h21m   10.244.1.5   k8s-node2    <none>           <none>
    web-96d5df5c8-g6hgp   1/1     Running   0          4h21m   10.244.1.6   k8s-node2    <none>           <none>
    web-96d5df5c8-t7xzv   1/1     Running   0          4h21m   10.244.1.4   k8s-node2    <none>           <none>
    

    Pods running on the other node cannot be reached:

    root@k8s-master:/data/k8s# curl -I 10.244.1.3:3306
    curl: (7) Failed to connect to 10.244.1.3 port 3306: Connection timed out
    

    Pods on the local node can be reached:

    root@k8s-master:/data/k8s# curl -I 10.244.0.5:3306
    5.7.35=H9A_)cÿÿ󿿕b.>,q#99~/~mysql_native_password!ÿ#08S01Got packets out of order
    

    The problem now points at Pods on the two nodes being unable to communicate with each other.

    Note: the port used here is the Pod's container port (3306), not the Service port (which for the mysql Service also happens to be 3306).

    If a Pod does not respond properly, the service inside the container has a problem. In that case use kubectl logs to view its logs, or kubectl exec to get into the Pod and inspect the service directly.
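
    For example, with one of the web Pods from this walkthrough (a sketch; substitute your own Pod name):

    # Recent container logs
    kubectl logs web-96d5df5c8-8fwsb --tail=20

    # Open a shell inside the Pod and inspect the service from within
    kubectl exec -it web-96d5df5c8-8fwsb -- sh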

    Apart from a fault in the service itself, the CNI network plugin may also be the problem. The typical symptom: out of 10 curl attempts only two or three succeed, and exactly those requests happened to hit a Pod on the current node, so no cross-host networking was involved.
    If you see this pattern, check the network plugin's status and container logs (a few flannel-specific checks are sketched after the listing):

    root@k8s-master:/data/k8s# kubectl get pods -n kube-system
    NAME                                 READY   STATUS    RESTARTS   AGE
    coredns-74ff55c5b-8q44c              1/1     Running   0          29h
    coredns-74ff55c5b-f7j5g              1/1     Running   0          29h
    etcd-k8s-master                      1/1     Running   2          29h
    kube-apiserver-k8s-master            1/1     Running   2          29h
    kube-controller-manager-k8s-master   1/1     Running   0          29h
    kube-flannel-ds-f5tn6                1/1     Running   0          24h
    kube-flannel-ds-ftfgf                1/1     Running   0          29h
    kube-proxy-hnp7c                     1/1     Running   0          29h
    kube-proxy-njw8l                     1/1     Running   0          24h
    kube-scheduler-k8s-master            1/1     Running   0          29h
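
    Since this cluster uses flannel, its DaemonSet logs and each node's route to the other node's Pod subnet are worth a look (a sketch; the Pod names and the 10.244.x.0/24 subnets are the ones from this cluster, and flannel.1 is the default VXLAN interface name):

    # flannel logs for each node's Pod
    kubectl logs -n kube-system kube-flannel-ds-f5tn6 --tail=50
    kubectl logs -n kube-system kube-flannel-ds-ftfgf --tail=50

    # On k8s-master there should be a route to node2's Pod subnet via flannel.1
    ip route | grep 10.244

    # VXLAN device details; the backend normally uses UDP port 8472
    ip -d link show flannel.1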
    

    Check 4: Is kube-proxy working properly?

    If you have reached this point, the Service exists, it has Endpoints, and the Pods are serving.
    The next component to check is the one responsible for Services: kube-proxy.
    Confirm that kube-proxy is running:

    root@k8s-master:/data/k8s# ps -ef |grep kube-proxy
    root      8494  8469  0 Aug21 ?        00:00:15 /usr/local/bin/kube-proxy --config=/var/lib/kube-proxy/config.conf --hostname-override=k8s-master
    root     24323 25972  0 16:34 pts/1    00:00:00 grep kube-proxy
    

    If the process exists, the next step is to confirm it is not logging errors, such as failing to connect to the API server.
    That means looking at the logs; how you do that depends on how the cluster was deployed. For a kubeadm cluster:
    Check the kube-proxy logs on k8s-master:

    root@k8s-master:/data/k8s# kubectl logs kube-proxy-hnp7c  -n kube-system
    I0821 02:41:24.705408       1 node.go:172] Successfully retrieved node IP: 192.168.0.3
    I0821 02:41:24.705709       1 server_others.go:142] kube-proxy node IP is an IPv4 address (192.168.0.3), assume IPv4 operation
    W0821 02:41:24.740886       1 server_others.go:578] Unknown proxy mode "", assuming iptables proxy
    I0821 02:41:24.740975       1 server_others.go:185] Using iptables Proxier.
    I0821 02:41:24.742224       1 server.go:650] Version: v1.20.5
    I0821 02:41:24.742656       1 conntrack.go:100] Set sysctl 'net/netfilter/nf_conntrack_max' to 131072
    I0821 02:41:24.742680       1 conntrack.go:52] Setting nf_conntrack_max to 131072
    I0821 02:41:24.742931       1 conntrack.go:100] Set sysctl 'net/netfilter/nf_conntrack_tcp_timeout_established' to 86400
    I0821 02:41:24.742990       1 conntrack.go:100] Set sysctl 'net/netfilter/nf_conntrack_tcp_timeout_close_wait' to 3600
    I0821 02:41:24.747556       1 config.go:315] Starting service config controller
    I0821 02:41:24.748858       1 shared_informer.go:240] Waiting for caches to sync for service config
    I0821 02:41:24.748901       1 config.go:224] Starting endpoint slice config controller
    I0821 02:41:24.748927       1 shared_informer.go:240] Waiting for caches to sync for endpoint slice config
    I0821 02:41:24.849006       1 shared_informer.go:247] Caches are synced for endpoint slice config 
    I0821 02:41:24.849071       1 shared_informer.go:247] Caches are synced for service config 
    

    Check the kube-proxy logs on k8s-node2:

    root@k8s-master:/data/k8s# kubectl logs kube-proxy-njw8l  -n kube-system
    I0821 07:43:39.092419       1 node.go:172] Successfully retrieved node IP: 192.168.0.5
    I0821 07:43:39.092475       1 server_others.go:142] kube-proxy node IP is an IPv4 address (192.168.0.5), assume IPv4 operation
    W0821 07:43:39.108196       1 server_others.go:578] Unknown proxy mode "", assuming iptables proxy
    I0821 07:43:39.108294       1 server_others.go:185] Using iptables Proxier.
    I0821 07:43:39.108521       1 server.go:650] Version: v1.20.5
    I0821 07:43:39.108814       1 conntrack.go:52] Setting nf_conntrack_max to 131072
    I0821 07:43:39.109295       1 config.go:315] Starting service config controller
    I0821 07:43:39.109304       1 shared_informer.go:240] Waiting for caches to sync for service config
    I0821 07:43:39.109323       1 config.go:224] Starting endpoint slice config controller
    I0821 07:43:39.109327       1 shared_informer.go:240] Waiting for caches to sync for endpoint slice config
    I0821 07:43:39.209418       1 shared_informer.go:247] Caches are synced for endpoint slice config 
    I0821 07:43:39.209418       1 shared_informer.go:247] Caches are synced for service config 
    
    

    One line stands out: Unknown proxy mode "", assuming iptables proxy, which shows kube-proxy is running in iptables mode.

    If kube-proxy was deployed from binaries (as a systemd service), use:

    journalctl -u kube-proxy
    

    Check 5: Is kube-proxy writing iptables rules?

    kube-proxy's main job is generating the load-balancing rules for Services, which it does with iptables by default. Check whether those rules have actually been written.
    Check the iptables rules on k8s-master:

    root@k8s-master:/data/k8s# iptables-save |grep web
    -A KUBE-NODEPORTS -p tcp -m comment --comment "default/myweb" -m tcp --dport 31330 -j KUBE-MARK-MASQ
    -A KUBE-NODEPORTS -p tcp -m comment --comment "default/myweb" -m tcp --dport 31330 -j KUBE-SVC-FCM76ICS4D7Y4C5Y
    -A KUBE-NODEPORTS -p tcp -m comment --comment "default/web" -m tcp --dport 31303 -j KUBE-MARK-MASQ
    -A KUBE-NODEPORTS -p tcp -m comment --comment "default/web" -m tcp --dport 31303 -j KUBE-SVC-LOLE4ISW44XBNF3G
    -A KUBE-SEP-KYOPKKRUSGN4EPOL -s 10.244.0.8/32 -m comment --comment "default/myweb" -j KUBE-MARK-MASQ
    -A KUBE-SEP-KYOPKKRUSGN4EPOL -p tcp -m comment --comment "default/myweb" -m tcp -j DNAT --to-destination 10.244.0.8:8080
    -A KUBE-SEP-MOKUSSRWIVOFT5Y7 -s 10.244.0.7/32 -m comment --comment "default/myweb" -j KUBE-MARK-MASQ
    -A KUBE-SEP-MOKUSSRWIVOFT5Y7 -p tcp -m comment --comment "default/myweb" -m tcp -j DNAT --to-destination 10.244.0.7:8080
    -A KUBE-SEP-V6Q53FEPJ64J3EJW -s 10.244.1.6/32 -m comment --comment "default/web" -j KUBE-MARK-MASQ
    -A KUBE-SEP-V6Q53FEPJ64J3EJW -p tcp -m comment --comment "default/web" -m tcp -j DNAT --to-destination 10.244.1.6:8082
    -A KUBE-SEP-YCBVNDXW4SG5UDC3 -s 10.244.1.5/32 -m comment --comment "default/web" -j KUBE-MARK-MASQ
    -A KUBE-SEP-YCBVNDXW4SG5UDC3 -p tcp -m comment --comment "default/web" -m tcp -j DNAT --to-destination 10.244.1.5:8082
    -A KUBE-SEP-YQ4MLBG6JI5O2LTN -s 10.244.0.6/32 -m comment --comment "default/myweb" -j KUBE-MARK-MASQ
    -A KUBE-SEP-YQ4MLBG6JI5O2LTN -p tcp -m comment --comment "default/myweb" -m tcp -j DNAT --to-destination 10.244.0.6:8080
    -A KUBE-SEP-ZNATZ23XMS7WU546 -s 10.244.1.4/32 -m comment --comment "default/web" -j KUBE-MARK-MASQ
    -A KUBE-SEP-ZNATZ23XMS7WU546 -p tcp -m comment --comment "default/web" -m tcp -j DNAT --to-destination 10.244.1.4:8082
    -A KUBE-SERVICES ! -s 10.244.0.0/16 -d 10.105.77.88/32 -p tcp -m comment --comment "default/myweb cluster IP" -m tcp --dport 8080 -j KUBE-MARK-MASQ
    -A KUBE-SERVICES -d 10.105.77.88/32 -p tcp -m comment --comment "default/myweb cluster IP" -m tcp --dport 8080 -j KUBE-SVC-FCM76ICS4D7Y4C5Y
    -A KUBE-SERVICES ! -s 10.244.0.0/16 -d 10.103.246.193/32 -p tcp -m comment --comment "default/web cluster IP" -m tcp --dport 8082 -j KUBE-MARK-MASQ
    -A KUBE-SERVICES -d 10.103.246.193/32 -p tcp -m comment --comment "default/web cluster IP" -m tcp --dport 8082 -j KUBE-SVC-LOLE4ISW44XBNF3G
    -A KUBE-SVC-FCM76ICS4D7Y4C5Y -m comment --comment "default/myweb" -m statistic --mode random --probability 0.33333333349 -j KUBE-SEP-YQ4MLBG6JI5O2LTN
    -A KUBE-SVC-FCM76ICS4D7Y4C5Y -m comment --comment "default/myweb" -m statistic --mode random --probability 0.50000000000 -j KUBE-SEP-MOKUSSRWIVOFT5Y7
    -A KUBE-SVC-FCM76ICS4D7Y4C5Y -m comment --comment "default/myweb" -j KUBE-SEP-KYOPKKRUSGN4EPOL
    -A KUBE-SVC-LOLE4ISW44XBNF3G -m comment --comment "default/web" -m statistic --mode random --probability 0.33333333349 -j KUBE-SEP-ZNATZ23XMS7WU546
    -A KUBE-SVC-LOLE4ISW44XBNF3G -m comment --comment "default/web" -m statistic --mode random --probability 0.50000000000 -j KUBE-SEP-YCBVNDXW4SG5UDC3
    -A KUBE-SVC-LOLE4ISW44XBNF3G -m comment --comment "default/web" -j KUBE-SEP-V6Q53FEPJ64J3EJW
    
    

    Check the iptables rules on k8s-node2:

    root@k8s-node2:/data/k8s# iptables-save |grep web
    -A KUBE-NODEPORTS -p tcp -m comment --comment "default/myweb" -m tcp --dport 31330 -j KUBE-MARK-MASQ
    -A KUBE-NODEPORTS -p tcp -m comment --comment "default/myweb" -m tcp --dport 31330 -j KUBE-SVC-FCM76ICS4D7Y4C5Y
    -A KUBE-NODEPORTS -p tcp -m comment --comment "default/web" -m tcp --dport 31303 -j KUBE-MARK-MASQ
    -A KUBE-NODEPORTS -p tcp -m comment --comment "default/web" -m tcp --dport 31303 -j KUBE-SVC-LOLE4ISW44XBNF3G
    -A KUBE-SEP-KYOPKKRUSGN4EPOL -s 10.244.0.8/32 -m comment --comment "default/myweb" -j KUBE-MARK-MASQ
    -A KUBE-SEP-KYOPKKRUSGN4EPOL -p tcp -m comment --comment "default/myweb" -m tcp -j DNAT --to-destination 10.244.0.8:8080
    -A KUBE-SEP-MOKUSSRWIVOFT5Y7 -s 10.244.0.7/32 -m comment --comment "default/myweb" -j KUBE-MARK-MASQ
    -A KUBE-SEP-MOKUSSRWIVOFT5Y7 -p tcp -m comment --comment "default/myweb" -m tcp -j DNAT --to-destination 10.244.0.7:8080
    -A KUBE-SEP-V6Q53FEPJ64J3EJW -s 10.244.1.6/32 -m comment --comment "default/web" -j KUBE-MARK-MASQ
    -A KUBE-SEP-V6Q53FEPJ64J3EJW -p tcp -m comment --comment "default/web" -m tcp -j DNAT --to-destination 10.244.1.6:8082
    -A KUBE-SEP-YCBVNDXW4SG5UDC3 -s 10.244.1.5/32 -m comment --comment "default/web" -j KUBE-MARK-MASQ
    -A KUBE-SEP-YCBVNDXW4SG5UDC3 -p tcp -m comment --comment "default/web" -m tcp -j DNAT --to-destination 10.244.1.5:8082
    -A KUBE-SEP-YQ4MLBG6JI5O2LTN -s 10.244.0.6/32 -m comment --comment "default/myweb" -j KUBE-MARK-MASQ
    -A KUBE-SEP-YQ4MLBG6JI5O2LTN -p tcp -m comment --comment "default/myweb" -m tcp -j DNAT --to-destination 10.244.0.6:8080
    -A KUBE-SEP-ZNATZ23XMS7WU546 -s 10.244.1.4/32 -m comment --comment "default/web" -j KUBE-MARK-MASQ
    -A KUBE-SEP-ZNATZ23XMS7WU546 -p tcp -m comment --comment "default/web" -m tcp -j DNAT --to-destination 10.244.1.4:8082
    -A KUBE-SERVICES ! -s 10.244.0.0/16 -d 10.105.77.88/32 -p tcp -m comment --comment "default/myweb cluster IP" -m tcp --dport 8080 -j KUBE-MARK-MASQ
    -A KUBE-SERVICES -d 10.105.77.88/32 -p tcp -m comment --comment "default/myweb cluster IP" -m tcp --dport 8080 -j KUBE-SVC-FCM76ICS4D7Y4C5Y
    -A KUBE-SERVICES ! -s 10.244.0.0/16 -d 10.103.246.193/32 -p tcp -m comment --comment "default/web cluster IP" -m tcp --dport 8082 -j KUBE-MARK-MASQ
    -A KUBE-SERVICES -d 10.103.246.193/32 -p tcp -m comment --comment "default/web cluster IP" -m tcp --dport 8082 -j KUBE-SVC-LOLE4ISW44XBNF3G
    -A KUBE-SVC-FCM76ICS4D7Y4C5Y -m comment --comment "default/myweb" -m statistic --mode random --probability 0.33333333349 -j KUBE-SEP-YQ4MLBG6JI5O2LTN
    -A KUBE-SVC-FCM76ICS4D7Y4C5Y -m comment --comment "default/myweb" -m statistic --mode random --probability 0.50000000000 -j KUBE-SEP-MOKUSSRWIVOFT5Y7
    -A KUBE-SVC-FCM76ICS4D7Y4C5Y -m comment --comment "default/myweb" -j KUBE-SEP-KYOPKKRUSGN4EPOL
    -A KUBE-SVC-LOLE4ISW44XBNF3G -m comment --comment "default/web" -m statistic --mode random --probability 0.33333333349 -j KUBE-SEP-ZNATZ23XMS7WU546
    -A KUBE-SVC-LOLE4ISW44XBNF3G -m comment --comment "default/web" -m statistic --mode random --probability 0.50000000000 -j KUBE-SEP-YCBVNDXW4SG5UDC3
    -A KUBE-SVC-LOLE4ISW44XBNF3G -m comment --comment "default/web" -j KUBE-SEP-V6Q53FEPJ64J3EJW
    

    If you have already switched the proxy mode to IPVS, you can inspect the rules this way instead. A healthy result looks like this:

    [root@k8s-node1 ~]# ipvsadm -ln
    Prot LocalAddress:Port Scheduler Flags
      -> RemoteAddress:Port Forward Weight ActiveConn InActConn
    ...
    TCP 10.104.0.64:80 rr
      -> 10.244.169.135:80 Masq 1 0 0
      -> 10.244.36.73:80 Masq 1 0 0
      -> 10.244.169.136:80 Masq 1 0 0...
    

    Use ipvsadm to view the IPVS rules; if the command is missing, install it with your package manager:

    apt-get  install -y ipvsadm
    

    At the moment k8s-master shows:

    root@k8s-master:/data/k8s# ipvsadm -ln
    IP Virtual Server version 1.2.1 (size=4096)
    Prot LocalAddress:Port Scheduler Flags
      -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
    
    

    The empty table above is expected while kube-proxy is still in iptables mode. Once IPVS mode is in use you should see rules like the earlier example; if they are missing, kube-proxy is not working, or it is incompatible with the current operating system and failed to generate the rules.
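
    A quick way to confirm which proxy mode is actually active is to pull the line seen earlier from every kube-proxy Pod, and to look at the configured mode (a sketch; the k8s-app=kube-proxy label is the kubeadm default):

    # Prints "Using iptables Proxier." or "Using ipvs Proxier." per Pod
    kubectl logs -n kube-system -l k8s-app=kube-proxy --tail=-1 | grep -i proxier

    # The configured mode; an empty value means the iptables default
    kubectl -n kube-system get configmap kube-proxy -o yaml | grep "mode:"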

    Appendix: Service traffic flow diagram (figure omitted here; the diagram is illustrative and does not show real IP addresses).


    Solving Problem 2: The Service cannot be accessed by IP

    The iptables-save output shows nothing obviously wrong, and the iptables mode is hard to reason about, so let's switch kube-proxy from iptables to IPVS and see.

    Perform the following steps on both nodes, k8s-master and k8s-node2.

    Load the kernel modules

    Check whether the IPVS kernel modules are loaded:

    # lsmod|grep ip_vs
    ip_vs_sh               16384  0
    ip_vs_wrr              16384  0
    ip_vs_rr               16384  0
    ip_vs                 147456  6 ip_vs_rr,ip_vs_sh,ip_vs_wrr
    nf_conntrack          106496  7 ip_vs,nf_nat,nf_nat_ipv4,xt_conntrack,nf_nat_masquerade_ipv4,nf_conntrack_netlink,nf_conntrack_ipv4
    libcrc32c              16384  2 raid456,ip_vs
    

    If they are not loaded, load the IPVS-related modules with the following commands (a way to make this persistent across reboots is sketched afterwards):

    modprobe -- ip_vs
    modprobe -- ip_vs_rr
    modprobe -- ip_vs_wrr
    modprobe -- ip_vs_sh
    modprobe -- nf_conntrack_ipv4
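
    modprobe only loads the modules for the current boot. On systemd-based systems they can be made persistent by listing them for systemd-modules-load (a sketch; the file name is arbitrary):

    # Load the IPVS modules automatically at boot
    cat <<EOF > /etc/modules-load.d/ipvs.conf
    ip_vs
    ip_vs_rr
    ip_vs_wrr
    ip_vs_sh
    nf_conntrack_ipv4
    EOF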
    

    Change the kube-proxy configuration

    # kubectl edit configmap kube-proxy -n kube-system
    

    Find the following section:

        ipvs:
          minSyncPeriod: 0s
          scheduler: ""
          syncPeriod: 30s
        kind: KubeProxyConfiguration
        metricsBindAddress: ""
        mode: "ipvs"
        nodePortAddresses: null
    

    mode was originally empty, which defaults to iptables; change it to ipvs.
    scheduler is also empty; the default load-balancing algorithm is round-robin (rr).
    When you have finished editing, save and exit.

    Delete all kube-proxy Pods so they are recreated with the new configuration:

    # kubectl get pods -n kube-system |grep kube-proxy
    kube-proxy-hnp7c                     1/1     Running   0          30h
    kube-proxy-njw8l                     1/1     Running   0          25h
    
    root@k8s-node2:/data/k8s# kubectl delete pod   kube-proxy-hnp7c  -n kube-system
    pod "kube-proxy-hnp7c" deleted
    root@k8s-node2:/data/k8s# kubectl delete pod   kube-proxy-njw8l  -n kube-system 
    pod "kube-proxy-njw8l" deleted
    
    root@k8s-node2:/data/k8s#  kubectl get pods -n kube-system |grep kube-proxy
    kube-proxy-4sv2c                     1/1     Running   0          36s
    kube-proxy-w7kpm                     1/1     Running   0          16s
    
    # kubectl logs kube-proxy-4sv2c  -n kube-system
    
    root@k8s-node2:/data/k8s# kubectl logs kube-proxy-4sv2c  -n kube-system
    I0822 09:36:38.757662       1 node.go:172] Successfully retrieved node IP: 192.168.0.3
    I0822 09:36:38.757707       1 server_others.go:142] kube-proxy node IP is an IPv4 address (192.168.0.3), assume IPv4 operation
    I0822 09:36:38.772798       1 server_others.go:258] Using ipvs Proxier.
    W0822 09:36:38.774131       1 proxier.go:445] IPVS scheduler not specified, use rr by default
    I0822 09:36:38.774388       1 server.go:650] Version: v1.20.5
    I0822 09:36:38.774742       1 conntrack.go:52] Setting nf_conntrack_max to 131072
    I0822 09:36:38.775051       1 config.go:224] Starting endpoint slice config controller
    I0822 09:36:38.775127       1 shared_informer.go:240] Waiting for caches to sync for endpoint slice config
    I0822 09:36:38.775245       1 config.go:315] Starting service config controller
    I0822 09:36:38.775290       1 shared_informer.go:240] Waiting for caches to sync for service config
    I0822 09:36:38.875365       1 shared_informer.go:247] Caches are synced for endpoint slice config 
    I0822 09:36:38.875616       1 shared_informer.go:247] Caches are synced for service config 
    
    

    Seeing Using ipvs Proxier in the log confirms the switch to IPVS took effect (an alternative restart method is sketched below).
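
    Instead of deleting the Pods one by one, the whole DaemonSet can be restarted in a single step (a sketch; kubectl rollout restart requires v1.15 or newer, and this cluster runs v1.20.5):

    # Recreate all kube-proxy Pods so they pick up the edited ConfigMap
    kubectl -n kube-system rollout restart daemonset kube-proxy
    kubectl -n kube-system rollout status daemonset kube-proxy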

    Run ipvsadm

    Use ipvsadm to view the IPVS rules (install it with apt-get if it is missing):

    root@k8s-master:/data/k8s# ipvsadm -ln
    IP Virtual Server version 1.2.1 (size=4096)
    Prot LocalAddress:Port Scheduler Flags
      -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
    TCP  172.17.0.1:31330 rr
      -> 10.244.0.6:8080              Masq    1      0          0         
      -> 10.244.0.7:8080              Masq    1      0          0         
      -> 10.244.0.8:8080              Masq    1      0          0         
    TCP  192.168.0.3:31303 rr
      -> 10.244.1.4:8082              Masq    1      0          0         
      -> 10.244.1.5:8082              Masq    1      0          0         
      -> 10.244.1.6:8082              Masq    1      0          0         
    TCP  192.168.0.3:31330 rr
      -> 10.244.0.6:8080              Masq    1      0          0         
      -> 10.244.0.7:8080              Masq    1      0          0         
      -> 10.244.0.8:8080              Masq    1      0          0         
    TCP  10.96.0.1:443 rr
      -> 192.168.0.3:6443             Masq    1      0          0         
    TCP  10.96.0.10:53 rr
      -> 10.244.0.2:53                Masq    1      0          0         
      -> 10.244.0.3:53                Masq    1      0          0         
    TCP  10.96.0.10:9153 rr
      -> 10.244.0.2:9153              Masq    1      0          0         
      -> 10.244.0.3:9153              Masq    1      0          0         
    TCP  10.99.230.190:3306 rr
      -> 10.244.0.5:3306              Masq    1      0          0         
      -> 10.244.1.2:3306              Masq    1      0          0         
      -> 10.244.1.3:3306              Masq    1      0          0         
    TCP  10.103.246.193:8082 rr
      -> 10.244.1.4:8082              Masq    1      0          0         
      -> 10.244.1.5:8082              Masq    1      0          0         
      -> 10.244.1.6:8082              Masq    1      0          0         
    TCP  10.105.77.88:8080 rr
      -> 10.244.0.6:8080              Masq    1      0          0         
      -> 10.244.0.7:8080              Masq    1      0          0         
      -> 10.244.0.8:8080              Masq    1      0          0         
    TCP  10.244.0.0:31303 rr
      -> 10.244.1.4:8082              Masq    1      0          0         
      -> 10.244.1.5:8082              Masq    1      0          0         
      -> 10.244.1.6:8082              Masq    1      0          0         
    TCP  10.244.0.0:31330 rr
      -> 10.244.0.6:8080              Masq    1      0          0         
      -> 10.244.0.7:8080              Masq    1      0          0         
      -> 10.244.0.8:8080              Masq    1      0          0         
    TCP  10.244.0.1:31303 rr
      -> 10.244.1.4:8082              Masq    1      0          0         
      -> 10.244.1.5:8082              Masq    1      0          0         
      -> 10.244.1.6:8082              Masq    1      0          0         
    TCP  10.244.0.1:31330 rr
      -> 10.244.0.6:8080              Masq    1      0          0         
      -> 10.244.0.7:8080              Masq    1      0          0         
      -> 10.244.0.8:8080              Masq    1      0          0         
    TCP  127.0.0.1:31303 rr
      -> 10.244.1.4:8082              Masq    1      0          0         
      -> 10.244.1.5:8082              Masq    1      0          0         
      -> 10.244.1.6:8082              Masq    1      0          0         
    TCP  127.0.0.1:31330 rr
      -> 10.244.0.6:8080              Masq    1      0          0         
      -> 10.244.0.7:8080              Masq    1      0          0         
      -> 10.244.0.8:8080              Masq    1      0          0         
    TCP  172.17.0.1:31303 rr
      -> 10.244.1.4:8082              Masq    1      0          0         
      -> 10.244.1.5:8082              Masq    1      0          0         
      -> 10.244.1.6:8082              Masq    1      0          0         
    UDP  10.96.0.10:53 rr
      -> 10.244.0.2:53                Masq    1      0          564       
      -> 10.244.0.3:53                Masq    1      0          563
    
    root@k8s-master:/data/k8s# curl -I 10.103.246.193:8082
    ^C
    root@k8s-master:/data/k8s# curl -I 114.67.107.240:8082
    ^C
    
    

    Still not solved.

    Low-level iptables settings

    A Baidu search turned up an article on solving the problem of flannel-backed Pods and containers being unable to communicate across hosts. Following it, apply the settings below on both k8s-master and k8s-node2.

    # iptables -P INPUT ACCEPT
    # iptables -P FORWARD ACCEPT
    # iptables -F
    
    # iptables -L -n
    
    root@k8s-master:/data/k8s#  iptables -L -n
    Chain INPUT (policy ACCEPT)
    target     prot opt source               destination         
    JDCLOUDHIDS_IN_LIVE  all  --  0.0.0.0/0            0.0.0.0/0           
    JDCLOUDHIDS_IN  all  --  0.0.0.0/0            0.0.0.0/0           
    
    Chain FORWARD (policy ACCEPT)
    target     prot opt source               destination         
    KUBE-FORWARD  all  --  0.0.0.0/0            0.0.0.0/0            /* kubernetes forwarding rules */
    ACCEPT     all  --  10.244.0.0/16        0.0.0.0/0           
    ACCEPT     all  --  0.0.0.0/0            10.244.0.0/16       
    
    Chain OUTPUT (policy ACCEPT)
    target     prot opt source               destination         
    JDCLOUDHIDS_OUT_LIVE  all  --  0.0.0.0/0            0.0.0.0/0           
    JDCLOUDHIDS_OUT  all  --  0.0.0.0/0            0.0.0.0/0           
    
    Chain DOCKER-USER (0 references)
    target     prot opt source               destination         
    
    Chain JDCLOUDHIDS_IN (1 references)
    target     prot opt source               destination         
    
    Chain JDCLOUDHIDS_IN_LIVE (1 references)
    target     prot opt source               destination         
    
    Chain JDCLOUDHIDS_OUT (1 references)
    target     prot opt source               destination         
    
    Chain JDCLOUDHIDS_OUT_LIVE (1 references)
    target     prot opt source               destination         
    
    Chain KUBE-EXTERNAL-SERVICES (0 references)
    target     prot opt source               destination         
    
    Chain KUBE-FIREWALL (0 references)
    target     prot opt source               destination         
    
    Chain KUBE-FORWARD (1 references)
    target     prot opt source               destination         
    ACCEPT     all  --  0.0.0.0/0            0.0.0.0/0            /* kubernetes forwarding rules */ mark match 0x4000/0x4000
    ACCEPT     all  --  0.0.0.0/0            0.0.0.0/0            /* kubernetes forwarding conntrack pod source rule */ ctstate RELATED,ESTABLISHED
    ACCEPT     all  --  0.0.0.0/0            0.0.0.0/0            /* kubernetes forwarding conntrack pod destination rule */ ctstate RELATED,ESTABLISHED
    
    Chain KUBE-KUBELET-CANARY (0 references)
    target     prot opt source               destination         
    
    Chain KUBE-PROXY-CANARY (0 references)
    target     prot opt source               destination         
    
    Chain KUBE-SERVICES (0 references)
    target     prot opt source               destination
    

    After repeating the earlier checks, the Service can now be reached directly from the node, but port 8082 is still unreachable and cross-node pings still fail (some flannel-level checks are sketched after the output below):

    root@k8s-master:/data/k8s# curl -I 10.103.246.193:8082
    ^C
    root@k8s-master:/data/k8s# curl -I 114.67.107.240:8082
    ^C
    
    root@k8s-master:/data/k8s# ping 10.244.1.3
    PING 10.244.1.3 (10.244.1.3) 56(84) bytes of data.
    ^C
    --- 10.244.1.3 ping statistics ---
    12 packets transmitted, 0 received, 100% packet loss, time 10999ms
    
    root@k8s-master:/data/k8s# ping 10.244.0.5
    PING 10.244.0.5 (10.244.0.5) 56(84) bytes of data.
    64 bytes from 10.244.0.5: icmp_seq=1 ttl=64 time=0.089 ms
    64 bytes from 10.244.0.5: icmp_seq=2 ttl=64 time=0.082 ms
    ^C
    --- 10.244.0.5 ping statistics ---
    2 packets transmitted, 2 received, 0% packet loss, time 999ms
    rtt min/avg/max/mdev = 0.082/0.085/0.089/0.009 ms
    
    
    # curl -I 10.103.246.193
    HTTP/1.1 200 OK
    Server: Tengine
    Date: Sun, 22 Aug 2021 13:10:02 GMT
    Content-Type: text/html
    Content-Length: 1326
    Last-Modified: Wed, 26 Apr 2017 08:03:47 GMT
    Connection: keep-alive
    Vary: Accept-Encoding
    ETag: "59005463-52e"
    Accept-Ranges: bytes
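
    Since cross-node Pod traffic is what still fails, the flannel VXLAN path between the two nodes remains the prime suspect. A few checks worth running on both nodes (a sketch; flannel.1 and UDP port 8472 are flannel's VXLAN defaults, and a host firewall or cloud security group blocking that port would produce exactly this symptom):

    # Each node needs a route to the other node's Pod subnet via flannel.1
    ip route | grep 10.244

    # VXLAN device details; note the dstport (8472 by default)
    ip -d link show flannel.1

    # Basic node-to-node reachability (node IPs taken from this cluster)
    ping -c 2 192.168.0.5   # from k8s-master to k8s-node2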
    

    References

    (1) Common Kubernetes problems: troubleshooting a Service that cannot be accessed, https://mp.weixin.qq.com/s/oCRWkBquUnRLC36CPwoZ1Q

    (2) Enabling IPVS in kube-proxy to replace iptables, https://www.shangmayuan.com/a/8fae7d6c18764194a8adce91.html
