Troubleshooting a kube-vip public IP that fails for roughly 1 in 3 requests


Author: cloudFans | Published: 2022-10-07 09:24
  1. Check from inside the cluster and look at the three backend pods behind the svc


[root@pro-k8s-master-1 ~]# kubectl  get po -A -o wide | grep ingress
kube-system                      traefik-ingress-controller-bb4fd888c-6cdlz                       1/1     Running             0          4m23s   10.120.37.211   inner-worker-3                                        <none>           <none>
kube-system                      traefik-ingress-controller-bb4fd888c-7jddp                       1/1     Running             0          3m39s   10.120.37.212   inner-worker-2                                        <none>           <none>
kube-system                      traefik-ingress-controller-bb4fd888c-pmrnt                       1/1     Running             0          4m57s   10.120.37.210   inner-worker-1  


[root@pro-k8s-master-1 ~]# ping 10.120.37.211
PING 10.120.37.211 (10.120.37.211) 56(84) bytes of data.
64 bytes from 10.120.37.211: icmp_seq=1 ttl=64 time=52.5 ms
^C
--- 10.120.37.211 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 52.527/52.527/52.527/0.000 ms
[root@pro-k8s-master-1 ~]# ping 10.120.37.212
PING 10.120.37.212 (10.120.37.212) 56(84) bytes of data.
64 bytes from 10.120.37.212: icmp_seq=1 ttl=64 time=2.33 ms
^C
--- 10.120.37.212 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 2.330/2.330/2.330/0.000 ms
[root@pro-k8s-master-1 ~]# ping 10.120.37.210
PING 10.120.37.210 (10.120.37.210) 56(84) bytes of data.
64 bytes from 10.120.37.210: icmp_seq=1 ttl=64 time=2.38 ms

[root@pro-k8s-master-3 ~]# curl 10.120.37.211
404 page not found
[root@pro-k8s-master-3 ~]# curl 10.120.37.212
404 page not found
[root@pro-k8s-master-3 ~]# curl 10.120.37.210
404 page not found


[root@pro-k8s-master-3 ~]#  kubectl  get svc -A -o wide | grep traefik-ingress-service
kube-system       traefik-ingress-service                         LoadBalancer   10.98.104.134    10.120.47.203   80:31753/TCP,8080:31108/TCP,443:31925/TCP      380d    app=traefik-ingress-lb
[root@pro-k8s-master-3 ~]#
[root@pro-k8s-master-3 ~]#
[root@pro-k8s-master-3 ~]# curl 10.98.104.134
404 page not found
[root@pro-k8s-master-3 ~]# curl 10.98.104.134
404 page not found
[root@pro-k8s-master-3 ~]# curl 10.98.104.134
404 page not found
[root@pro-k8s-master-3 ~]# curl 10.98.104.134
404 page not found
[root@pro-k8s-master-3 ~]# curl 10.98.104.134
404 page not found
[root@pro-k8s-master-3 ~]# curl 10.98.104.134
404 page not found


# As shown above, everything works from inside the cluster
  1. Check the path from outside the cluster into the cluster

ping works without any problem, which shows that the VIP sub-interface on eth0 is fine.
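Since ping says nothing about TCP, it can help to quantify the failure rate over HTTP from a client outside the cluster. The following is a minimal sketch rather than the method used in this article: `tally_codes` is a hypothetical helper name, and the loop in the comment assumes the VIP 10.120.47.203 from this post. curl prints `000` as the status code when no HTTP response arrives, so timeouts are counted separately from the expected 404.

```shell
#!/usr/bin/env bash
# tally_codes: hypothetical helper, not part of any tool used in the article.
# Reads one HTTP status code per line on stdin and prints "<code> <count>".
# curl reports 000 when no HTTP response was received (e.g. connect timeout).
tally_codes() {
  sort | uniq -c | awk '{ print $2, $1 }'
}

# Usage from a client outside the cluster (VIP taken from the article):
#   for i in $(seq 30); do
#     curl -s -o /dev/null -m 3 -w '%{http_code}\n' http://10.120.47.203/
#   done | tally_codes
```

If one of three round-robin backends is unreachable, about a third of the requests should end up in the `000` bucket.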

  1. Check whether the interface carrying 10.120.47.203 was initialized abnormally
(py3env) [root@deployer env-inner-prod-on-prem]# ansible all -i inventory/env-inner-prod-on-prem/inventory.ini  -m shell -a "ip a | grep 10.120.47.203"
pro-k8s-master-1 | CHANGED | rc=0 >>
    inet 10.120.47.203/32 scope global kube-ipvs0
pro-k8s-master-2 | CHANGED | rc=0 >>
    inet 10.120.47.203/32 scope global kube-ipvs0
pro-k8s-master-3 | CHANGED | rc=0 >>
    inet 10.120.47.203/32 scope global kube-ipvs0
inner-worker-2 | CHANGED | rc=0 >>
    inet 10.120.47.203/32 scope global eth0 # the sub-interface lives on this node; capture inbound external packets here
    inet 10.120.47.203/32 scope global kube-ipvs0
inner-prod-common-c6-4xl-asg-ofg-6ns-y5j-server-tv4 | CHANGED | rc=0 >>
    inet 10.120.47.203/32 scope global kube-ipvs0
inner-prod-common-c6-4xl-asg-ofg-ntl-bcr-server-x5k | CHANGED | rc=0 >>
    inet 10.120.47.203/32 scope global kube-ipvs0
inner-worker-3 | CHANGED | rc=0 >> # this dummy interface looks suspicious
    inet 10.120.47.203/32 scope global kube-ipvs0Dump was interrupted and may be inconsistent.
inner-prod-common-c6-4xl-asg-ofg-jkr-2mx-server-n45 | CHANGED | rc=0 >>
    inet 10.120.47.203/32 scope global kube-ipvs0
inner-prod-common-c6-4xl-asg-ofg-tyw-5lw-server-7dp | CHANGED | rc=0 >>
    inet 10.120.47.203/32 scope global kube-ipvs0
inner-prod-common-c6-4xl-asg-ofg-a2s-4jv-server-qcq | CHANGED | rc=0 >>
    inet 10.120.47.203/32 scope global kube-ipvs0
inner-worker-1 | CHANGED | rc=0 >>
    inet 10.120.47.203/32 scope global kube-ipvs0
inner-prod-common-c6-4xl-asg-ofg-hok-aok-server-ujn | CHANGED | rc=0 >>
    inet 10.120.47.203/32 scope global kube-ipvs0
inner-prod-common-c6-4xl-asg-ofg-qgh-63l-server-tci | CHANGED | rc=0 >>
    inet 10.120.47.203/32 scope global kube-ipvs0
cn-xm-logging-es-asg-3pm-qut-4xx | CHANGED | rc=0 >>
    inet 10.120.47.203/32 scope global kube-ipvs0
inner-prod-common-c6-4xl-asg-ofg-j7q-2mj-server-ytk | CHANGED | rc=0 >>
    inet 10.120.47.203/32 scope global kube-ipvs0
cn-xm-logging-es-asg-3pm-eik-cql | CHANGED | rc=0 >>
    inet 10.120.47.203/32 scope global kube-ipvs0
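With this many nodes it is easy to miss which one actually carries the VIP on a real interface. As a rough sketch, the ansible output above can be filtered so that only nodes holding the VIP on something other than kube-ipvs0 are printed. `vip_holders` is a hypothetical helper name, and the parsing assumes the ad-hoc `host | CHANGED | rc=0 >>` output layout shown above.

```shell
#!/usr/bin/env bash
# vip_holders: hypothetical helper for this sketch.
# Reads ansible ad-hoc output of "ip a | grep <vip>" on stdin and prints
# "<host> <iface>" for every node that holds the VIP on an interface other
# than kube-ipvs0 (the dummy interface seen on every node above).
vip_holders() {
  local vip="$1"
  awk -v vip="$vip" '
    / [|] (CHANGED|SUCCESS) [|] / { host = $1; next }
    $1 == "inet" && index($2, vip "/") == 1 {
      iface = ""
      # the interface name is the field right after "global"
      for (i = 3; i < NF; i++) if ($i == "global") { iface = $(i + 1); break }
      if (iface != "" && iface !~ /^kube-ipvs0/) print host, iface
    }'
}

# Usage (inventory path as in the article):
#   ansible all -i inventory/env-inner-prod-on-prem/inventory.ini \
#     -m shell -a "ip a | grep 10.120.47.203" | vip_holders 10.120.47.203
```

For the output above this would print only `inner-worker-2 eth0`, which is the node where the external traffic should be captured.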


  1. Capture packets and compare the successful requests with the failed ones

[root@inner-worker-2 ~]# tcpdump -i any host  10.60.22.36 and port 80 -netvv
tcpdump: listening on any, link-type LINUX_SLL (Linux cooked), capture size 262144 bytes
 In dc:ef:80:5a:44:13 ethertype IPv4 (0x0800), length 68: (tos 0x0, ttl 196, id 14018, offset 0, flags [DF], proto TCP (6), length 52)
    10.60.22.36.57922 > 10.120.47.203.http: Flags [S], cksum 0x0c37 (correct), seq 3301729766, win 64240, options [mss 1460,nop,wscale 8,nop,nop,sackOK], length 0
Out fa:16:3e:47:db:fc ethertype IPv4 (0x0800), length 68: (tos 0x0, ttl 63, id 0, offset 0, flags [DF], proto TCP (6), length 52)
    10.120.47.203.http > 10.60.22.36.57922: Flags [S.], cksum 0xf034 (correct), seq 897740622, ack 3301729767, win 29200, options [mss 1460,nop,nop,sackOK,nop,wscale 9], length 0
 In dc:ef:80:5a:44:13 ethertype IPv4 (0x0800), length 56: (tos 0x0, ttl 196, id 14019, offset 0, flags [DF], proto TCP (6), length 40)
    10.60.22.36.57922 > 10.120.47.203.http: Flags [.], cksum 0xa118 (correct), seq 1, ack 1, win 513, length 0
 In dc:ef:80:5a:44:13 ethertype IPv4 (0x0800), length 133: (tos 0x0, ttl 196, id 14020, offset 0, flags [DF], proto TCP (6), length 117)
    10.60.22.36.57922 > 10.120.47.203.http: Flags [P.], cksum 0x0db7 (correct), seq 1:78, ack 1, win 513, length 77: HTTP, length: 77
        GET / HTTP/1.1
        Host: 10.120.47.203
        User-Agent: curl/7.84.0
        Accept: */*

Out fa:16:3e:47:db:fc ethertype IPv4 (0x0800), length 56: (tos 0x0, ttl 63, id 31000, offset 0, flags [DF], proto TCP (6), length 40)
    10.120.47.203.http > 10.60.22.36.57922: Flags [.], cksum 0xa292 (correct), seq 1, ack 78, win 58, length 0
Out fa:16:3e:47:db:fc ethertype IPv4 (0x0800), length 232: (tos 0x0, ttl 63, id 31001, offset 0, flags [DF], proto TCP (6), length 216)
    10.120.47.203.http > 10.60.22.36.57922: Flags [P.], cksum 0x676a (correct), seq 1:177, ack 78, win 58, length 176: HTTP, length: 176
        HTTP/1.1 404 Not Found
        Content-Type: text/plain; charset=utf-8
        X-Content-Type-Options: nosniff
        Date: Thu, 29 Sep 2022 02:14:32 GMT
        Content-Length: 19

        404 page not found
 In dc:ef:80:5a:44:13 ethertype IPv4 (0x0800), length 56: (tos 0x0, ttl 196, id 14021, offset 0, flags [DF], proto TCP (6), length 40)
    10.60.22.36.57922 > 10.120.47.203.http: Flags [F.], cksum 0xa01b (correct), seq 78, ack 177, win 512, length 0
Out fa:16:3e:47:db:fc ethertype IPv4 (0x0800), length 56: (tos 0x0, ttl 63, id 31002, offset 0, flags [DF], proto TCP (6), length 40)
    10.120.47.203.http > 10.60.22.36.57922: Flags [F.], cksum 0xa1e0 (correct), seq 177, ack 79, win 58, length 0
 In dc:ef:80:5a:44:13 ethertype IPv4 (0x0800), length 56: (tos 0x0, ttl 196, id 14022, offset 0, flags [DF], proto TCP (6), length 40)
    10.60.22.36.57922 > 10.120.47.203.http: Flags [.], cksum 0xa01a (correct), seq 79, ack 178, win 512, length 0


# Path of a healthy request
client eip <--> lb eip <--> backend ip 

traefik-ingress-controller-bb4fd888c-pmrnt      1/1     Running          0       4m57s   10.120.37.210 inner-worker-1                                        
traefik-ingress-controller-bb4fd888c-6cdlz      1/1     Running          0       4m23s   10.120.37.211 inner-worker-3                          
traefik-ingress-controller-bb4fd888c-7jddp      1/1     Running          0       3m39s   10.120.37.212 inner-worker-2    


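# In the failing case the source address is not rewritten to the VIP; the packet is forwarded as-is, and because of routing the reply is never accepted.
# In the capture below the SYN-ACK leaves with the pod IP 10.120.37.212 as its source instead of the VIP, and the client answers with RST.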

 In dc:ef:80:5a:44:13 ethertype IPv4 (0x0800), length 68: (tos 0x0, ttl 196, id 14033, offset 0, flags [DF], proto TCP (6), length 52)
    10.60.22.36.58181 > 10.120.47.203.http: Flags [S], cksum 0x04d9 (correct), seq 2980740963, win 64240, options [mss 1460,nop,wscale 8,nop,nop,sackOK], length 0
Out fa:16:3e:47:db:fc ethertype IPv4 (0x0800), length 68: (tos 0x0, ttl 196, id 14033, offset 0, flags [DF], proto TCP (6), length 52)
    10.60.22.36.58181 > 10.120.37.212.http: Flags [S], cksum 0x0ed0 (correct), seq 2980740963, win 64240, options [mss 1460,nop,wscale 8,nop,nop,sackOK], length 0
  P fa:16:3e:47:db:fc ethertype IPv4 (0x0800), length 68: (tos 0x0, ttl 196, id 14033, offset 0, flags [DF], proto TCP (6), length 52)
    10.60.22.36.58181 > 10.120.37.212.http: Flags [S], cksum 0x0ed0 (correct), seq 2980740963, win 64240, options [mss 1460,nop,wscale 8,nop,nop,sackOK], length 0
Out 00:00:00:c1:56:c1 ethertype IPv4 (0x0800), length 68: (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 52)
    10.120.37.212.http > 10.60.22.36.58181: Flags [S.], cksum 0x3a14 (correct), seq 2311377861, ack 2980740964, win 29200, options [mss 1460,nop,nop,sackOK,nop,wscale 9], length 0
  P dc:ef:80:5a:44:13 ethertype IPv4 (0x0800), length 56: (tos 0x0, ttl 196, id 462, offset 0, flags [DF], proto TCP (6), length 40)
    10.60.22.36.58181 > 10.120.37.212.http: Flags [R], cksum 0x1981 (correct), seq 2980740964, win 0, length 0
 In dc:ef:80:5a:44:13 ethertype IPv4 (0x0800), length 68: (tos 0x0, ttl 196, id 14034, offset 0, flags [DF], proto TCP (6), length 52)
    10.60.22.36.58181 > 10.120.47.203.http: Flags [S], cksum 0x04d9 (correct), seq 2980740963, win 64240, options [mss 1460,nop,wscale 8,nop,nop,sackOK], length 0
Out fa:16:3e:47:db:fc ethertype IPv4 (0x0800), length 68: (tos 0x0, ttl 196, id 14034, offset 0, flags [DF], proto TCP (6), length 52)
    10.60.22.36.58181 > 10.120.37.212.http: Flags [S], cksum 0x0ed0 (correct), seq 2980740963, win 64240, options [mss 1460,nop,wscale 8,nop,nop,sackOK], length 0
  P fa:16:3e:47:db:fc ethertype IPv4 (0x0800), length 68: (tos 0x0, ttl 196, id 14034, offset 0, flags [DF], proto TCP (6), length 52)
    10.60.22.36.58181 > 10.120.37.212.http: Flags [S], cksum 0x0ed0 (correct), seq 2980740963, win 64240, options [mss 1460,nop,wscale 8,nop,nop,sackOK], length 0
Out 00:00:00:c1:56:c1 ethertype IPv4 (0x0800), length 68: (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 52)
    10.120.37.212.http > 10.60.22.36.58181: Flags [S.], cksum 0x50d2 (incorrect -> 0x2178), seq 2326981491, ack 2980740964, win 29200, options [mss 1460,nop,nop,sackOK,nop,wscale 9], length 0
  P dc:ef:80:5a:44:13 ethertype IPv4 (0x0800), length 56: (tos 0x0, ttl 196, id 463, offset 0, flags [DF], proto TCP (6), length 40)
    10.60.22.36.58181 > 10.120.37.212.http: Flags [R], cksum 0x1981 (correct), seq 2980740964, win 0, length 0
 In dc:ef:80:5a:44:13 ethertype IPv4 (0x0800), length 68: (tos 0x0, ttl 196, id 14035, offset 0, flags [DF], proto TCP (6), length 52)
    10.60.22.36.58181 > 10.120.47.203.http: Flags [S], cksum 0x04d9 (correct), seq 2980740963, win 64240, options [mss 1460,nop,wscale 8,nop,nop,sackOK], length 0
Out fa:16:3e:47:db:fc ethertype IPv4 (0x0800), length 68: (tos 0x0, ttl 196, id 14035, offset 0, flags [DF], proto TCP (6), length 52)
    10.60.22.36.58181 > 10.120.37.212.http: Flags [S], cksum 0x0ed0 (correct), seq 2980740963, win 64240, options [mss 1460,nop,wscale 8,nop,nop,sackOK], length 0
  P fa:16:3e:47:db:fc ethertype IPv4 (0x0800), length 68: (tos 0x0, ttl 196, id 14035, offset 0, flags [DF], proto TCP (6), length 52)
    10.60.22.36.58181 > 10.120.37.212.http: Flags [S], cksum 0x0ed0 (correct), seq 2980740963, win 64240, options [mss 1460,nop,wscale 8,nop,nop,sackOK], length 0
Out 00:00:00:c1:56:c1 ethertype IPv4 (0x0800), length 68: (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 52)
    10.120.37.212.http > 10.60.22.36.58181: Flags [S.], cksum 0x50d2 (incorrect -> 0xca8d), seq 2358263936, ack 2980740964, win 29200, options [mss 1460,nop,nop,sackOK,nop,wscale 9], length 0
  P dc:ef:80:5a:44:13 ethertype IPv4 (0x0800), length 56: (tos 0x0, ttl 196, id 464, offset 0, flags [DF], proto TCP (6), length 40)
    10.60.22.36.58181 > 10.120.37.212.http: Flags [R], cksum 0x1981 (correct), seq 2980740964, win 0, length 0
 In dc:ef:80:5a:44:13 ethertype IPv4 (0x0800), length 68: (tos 0x0, ttl 196, id 14036, offset 0, flags [DF], proto TCP (6), length 52)
    10.60.22.36.58181 > 10.120.47.203.http: Flags [S], cksum 0x04d9 (correct), seq 2980740963, win 64240, options [mss 1460,nop,wscale 8,nop,nop,sackOK], length 0
Out fa:16:3e:47:db:fc ethertype IPv4 (0x0800), length 68: (tos 0x0, ttl 196, id 14036, offset 0, flags [DF], proto TCP (6), length 52)
    10.60.22.36.58181 > 10.120.37.212.http: Flags [S], cksum 0x0ed0 (correct), seq 2980740963, win 64240, options [mss 1460,nop,wscale 8,nop,nop,sackOK], length 0
  P fa:16:3e:47:db:fc ethertype IPv4 (0x0800), length 68: (tos 0x0, ttl 196, id 14036, offset 0, flags [DF], proto TCP (6), length 52)
    10.60.22.36.58181 > 10.120.37.212.http: Flags [S], cksum 0x0ed0 (correct), seq 2980740963, win 64240, options [mss 1460,nop,wscale 8,nop,nop,sackOK], length 0
Out 00:00:00:c1:56:c1 ethertype IPv4 (0x0800), length 68: (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 52)
    10.120.37.212.http > 10.60.22.36.58181: Flags [S.], cksum 0x50d2 (incorrect -> 0x0cd8), seq 2420767356, ack 2980740964, win 29200, options [mss 1460,nop,nop,sackOK,nop,wscale 9], length 0
  P dc:ef:80:5a:44:13 ethertype IPv4 (0x0800), length 56: (tos 0x0, ttl 196, id 465, offset 0, flags [DF], proto TCP (6), length 40)
    10.60.22.36.58181 > 10.120.37.212.http: Flags [R], cksum 0x1981 (correct), seq 2980740964, win 0, length 0
 In dc:ef:80:5a:44:13 ethertype IPv4 (0x0800), length 68: (tos 0x0, ttl 196, id 14037, offset 0, flags [DF], proto TCP (6), length 52)
    10.60.22.36.58181 > 10.120.47.203.http: Flags [S], cksum 0x04d9 (correct), seq 2980740963, win 64240, options [mss 1460,nop,wscale 8,nop,nop,sackOK], length 0
Out fa:16:3e:47:db:fc ethertype IPv4 (0x0800), length 68: (tos 0x0, ttl 196, id 14037, offset 0, flags [DF], proto TCP (6), length 52)
    10.60.22.36.58181 > 10.120.37.212.http: Flags [S], cksum 0x0ed0 (correct), seq 2980740963, win 64240, options [mss 1460,nop,wscale 8,nop,nop,sackOK], length 0
  P fa:16:3e:47:db:fc ethertype IPv4 (0x0800), length 68: (tos 0x0, ttl 196, id 14037, offset 0, flags [DF], proto TCP (6), length 52)
    10.60.22.36.58181 > 10.120.37.212.http: Flags [S], cksum 0x0ed0 (correct), seq 2980740963, win 64240, options [mss 1460,nop,wscale 8,nop,nop,sackOK], length 0
Out 00:00:00:c1:56:c1 ethertype IPv4 (0x0800), length 68: (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 52)
    10.120.37.212.http > 10.60.22.36.58181: Flags [S.], cksum 0x50d2 (incorrect -> 0x67da), seq 2545784838, ack 2980740964, win 29200, options [mss 1460,nop,nop,sackOK,nop,wscale 9], length 0
  P dc:ef:80:5a:44:13 ethertype IPv4 (0x0800), length 56: (tos 0x0, ttl 196, id 466, offset 0, flags [DF], proto TCP (6), length 40)
    10.60.22.36.58181 > 10.120.37.212.http: Flags [R], cksum 0x1981 (correct), seq 2980740964, win 0, length 0
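The difference between the two captures is the source address of the SYN-ACK: the VIP in the healthy case, the pod IP in the failing one. A small filter can pull that out of a saved capture. This is only a sketch: `leaked_synacks` is a hypothetical helper, `capture.txt` is an assumed file name, and the parsing relies on the `tcpdump -netvv` text layout shown above.

```shell
#!/usr/bin/env bash
# leaked_synacks: hypothetical helper for this sketch.
# Reads tcpdump text output on stdin and prints "<source-ip> <count>" for
# every SYN-ACK sent to the client whose source address is NOT the VIP.
leaked_synacks() {
  local vip="$1" client="$2"
  awk -v vip="$vip" -v client="$client" '
    /Flags \[S\.\]/ && $2 == ">" {
      src = $1; dst = $3
      sub(/\.[^.]+$/, "", src)     # strip the trailing ".port"
      sub(/\.[^.]+:$/, "", dst)    # strip the trailing ".port:"
      if (dst == client && src != vip) count[src]++
    }
    END { for (s in count) print s, count[s] }' | sort
}

# Usage (capture.txt is an assumed file holding the tcpdump text output):
#   tcpdump -i any host 10.60.22.36 and port 80 -netvv > capture.txt
#   leaked_synacks 10.120.47.203 10.60.22.36 < capture.txt
```

Any line of output means replies are leaving the node without being translated back to the VIP.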




# The IPVS rules look fine
[root@inner-worker-2 ~]# ipvsadm -ln | grep -A 4 10.120.47.203:80
TCP  10.120.47.203:80 rr
  -> 10.120.37.210:80             Masq    1      8          2
  -> 10.120.37.211:80             Masq    1      5          1
  -> 10.120.37.212:80             Masq    1      2          0

# And the problem cannot be reproduced from inside the cluster
[root@inner-worker-2 ~]# curl 10.120.47.203:80
404 page not found
[root@inner-worker-2 ~]# curl 10.120.47.203:80
404 page not found
[root@inner-worker-2 ~]# curl 10.120.47.203:80
404 page not found
[root@inner-worker-2 ~]# curl 10.120.47.203:80
404 page not found
[root@inner-worker-2 ~]# curl 10.120.47.203:80
404 page not found
[root@inner-worker-2 ~]# curl 10.120.47.203:80
404 page not found
[root@inner-worker-2 ~]# curl 10.120.47.203:80
404 page not found
[root@inner-worker-2 ~]# curl 10.120.47.203:80
404 page not found
[root@inner-worker-2 ~]# curl 10.120.47.203:80
404 page not found

Connection count was largely ruled out from several angles:

  1. The cluster is small to begin with: fewer than 100 connections, while the limit is 1024

[root@inner-worker-2 ~]# ulimit -n
1024
[root@inner-worker-2 ~]#
[root@inner-worker-2 ~]#
[root@inner-worker-2 ~]#
[root@inner-worker-2 ~]# netstat -n | awk '/^tcp/ {++S[$NF]} END {for(a in S) print a, S[a]}'
ESTABLISHED 17
TIME_WAIT 53

[https://www.jianshu.com/p/71d554222f9e](https://www.jianshu.com/p/71d554222f9e)

  1. Memory and CPU are also sufficient
[root@inner-worker-2 ~]# vmstat
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 2  0      0 578352 255092 9095200    0    0     4    44    0    0  6  2 92  0  0
[root@inner-worker-2 ~]#

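  1. The key observation is that only the last pod fails, and it fails 100% of the time, so a resource limit or shortage is unlikely
  1. I then looked into the known issue of IPVS reusing connections without SNAT, and it looks very much like this bug
    Reference: https://imroc.cc/kubernetes/networking/faq/ipvs-conn-reuse-mode.html
# The system default is 1
# conn_reuse_mode = 0 enables IPVS connection reuse; 1 disables it.
# Somewhat counterintuitive, and admittedly controversial.
[root@inner-worker-2 ~]# sysctl  -a | grep net.ipv4.vs.conn_reuse_mode
net.ipv4.vs.conn_reuse_mode = 1  # to work around the dropped SYN, this can temporarily be set to 0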

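Setting this kernel parameter to 1 means IPVS does not reuse connections when forwarding: every new connection is rescheduled to a real server (rs) and gets a new ip_vs_conn. The implementation has a flaw, though: when a new connection arrives (a SYN packet) and its client ip:client port matches an old IPVS connection (in TIME_WAIT state) while conntrack is in use, the first SYN is dropped, and the connection only succeeds after the retransmission (1s), which sharply degrades connection setup performance.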
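A dropped first SYN shows up on the client as a TCP connect time of about one second. The sketch below is one possible way to spot that, not something taken from the referenced article: `count_slow_connects` is a hypothetical helper, and the 1.0s default threshold is an assumption based on the 1s retransmission mentioned above. It consumes the values curl prints for its `time_connect` write-out variable.

```shell
#!/usr/bin/env bash
# count_slow_connects: hypothetical helper for this sketch.
# Reads TCP connect times in seconds (one per line) on stdin and reports
# how many are at or above the threshold (default 1.0, an assumed value
# matching the 1s SYN retransmission described in the text).
count_slow_connects() {
  local threshold="${1:-1.0}"
  awk -v t="$threshold" '
    NF { total++; if ($1 + 0 >= t + 0) slow++ }
    END { printf "slow=%d total=%d\n", slow, total }'
}

# Usage against the VIP from the article:
#   for i in $(seq 50); do
#     curl -s -o /dev/null -m 5 -w '%{time_connect}\n' http://10.120.47.203/
#   done | count_slow_connects
```

A non-zero `slow` count only hints at SYN retransmission; a packet capture is still needed to confirm it.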

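The Kubernetes community also found this bug, so when kube-proxy runs in IPVS mode it sets conn_reuse_mode to 0 by default to work around it; see PR #71114 and issue #70747.

Even with the value set to 0 some problems remain, but they do not show up as long as the backend pods are healthy. In practice this can be handled by strengthening the health checks for IPVS; with kube-ovn, its LB and LB health-check mechanism can be used to avoid the problem.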

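Problems caused by conn_reuse_mode=0

To avoid the performance problem of conn_reuse_mode=1, Kubernetes has kube-proxy set conn_reuse_mode to 0 at startup in IPVS mode, i.e. it uses IPVS connection reuse. Connection reuse has two problems of its own:

  1. As soon as a client ip:client port matches an ip_vs_conn (reuse happens), the packet is forwarded straight to the corresponding rs regardless of its current state, even if the rs has weight 0 (usually TIME_WAIT state). An rs in TIME_WAIT is usually a Terminating pod that has already been destroyed, so a connection forwarded there is bound to fail.
  2. Under high concurrency with heavy reuse, new connections are not scheduled to an rs; they are forwarded to the rs of the reused connection, so many new connections get "pinned" to a subset of the rs.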

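The symptoms seen in practice vary:

  1. Connection errors during rolling updates. While the target service is rolling, pods are created and destroyed; when IPVS reuses a connection it may forward to a pod that is already gone, and the connection fails (no route to host).
  2. Unbalanced load during rolling updates. Reused connections are not rescheduled, so new connections also get "pinned" to certain pods.
  3. Newly scaled-out pods receive little traffic. For the same reason, many new connections stay "pinned" to the pods that existed before the scale-out.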
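To see whether connections are piling up on particular real servers, the IPVS connection table can be summarized per destination and state. This is a sketch under assumptions: `conn_by_dest_state` is a hypothetical helper, and it assumes the usual `ipvsadm -lnc` column order (pro, expire, state, source, virtual, destination).

```shell
#!/usr/bin/env bash
# conn_by_dest_state: hypothetical helper for this sketch.
# Reads "ipvsadm -lnc" output on stdin and prints
# "<destination> <state> <count>", sorted by destination.
# Assumed columns: pro expire state source virtual destination.
conn_by_dest_state() {
  awk '$1 == "TCP" || $1 == "UDP" { n[$6 " " $3]++ }
       END { for (k in n) print k, n[k] }' | sort
}

# Usage on the node that holds the VIP:
#   ipvsadm -lnc | grep 10.120.47.203:80 | conn_by_dest_state
```

Many TIME_WAIT entries pointing at a destination that no longer appears in `ipvsadm -ln` would match the first symptom in the list above.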

Workarounds

参考: https://imroc.cc/kubernetes/networking/faq/ipvs-conn-reuse-mode.html

One more point: every k8s cluster needs this enabled
sysctl -w net.bridge.bridge-nf-call-iptables=1
It is already enabled here, so it is not what interferes in this case:
https://imroc.cc/kubernetes/networking/faq/why-enable-bridge-nf-call-iptables.html

Original announcement of the bug fix:
Hello everyone:
We are very fortunate to tell you that this bug has been fixed by us and has been verified to work very well. The patch(ipvs: avoid drop first packet by reusing conntrack) is being submitted to the Linux kernel community. You can also apply this patch to your own kernel, and then only need to set net.ipv4.vs.conn_reuse_mode=1(default) and net.ipv4.vs.conn_reuse_old_conntrack=1(default). As the net.ipv4.vs.conn_reuse_old_conntrack sysctl switch is newly added. You can adapt the kube-proxy by judging whether there is net.ipv4.vs.conn_reuse_old_conntrack, if so, it means that the current kernel is the version that fixed this bug.
That Can solve the following problems:

  1. Rolling update, IPVS keeps scheduling traffic to the destroyed Pod
  2. Unbalanced IPVS traffic scheduling after scaled up or rolling update
  3. fix IPVS low throughput issue #71114
  4. One second connection delay in masque
    https://marc.info/?t=151683118100004&r=1&w=2
  5. IPVS low throughput #70747
  6. Apache Bench can fill up ipvs service proxy in seconds cloudnativelabs/kube-router#544
  7. Additional 1s latency in host -> service IP -> pod when upgrading from 1.15.3 -> 1.18.1 on RHEL 8.1 #90854
  8. kube-proxy ipvs conn_reuse_mode setting causes errors with high load from single client #81775

Thank you.
By Yang Yuxi (TencentCloudContainerTeam)

https://github.com/kubernetes/kubernetes/pull/71114

Final conclusion:

# Fix for kube-vip / IPVS dropping SYN on CentOS 7 with the 3.10 kernel
sysctl -w net.ipv4.vs.conn_reuse_mode=0
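`sysctl -w` only changes the running kernel, so the value is lost on reboot. One way to keep it is a sysctl.d drop-in, sketched below. `persist_conn_reuse_mode` and the file name `99-ipvs-conn-reuse.conf` are arbitrary choices for this example, not something the article prescribes. The `net.ipv4.vs.*` keys only exist once the ip_vs module is loaded, so a drop-in applied early at boot may not take effect, and kube-proxy in IPVS mode may set the value itself at startup as described above.

```shell
#!/usr/bin/env bash
# persist_conn_reuse_mode: hypothetical helper for this sketch.
# Writes the setting to a sysctl.d drop-in so it can be re-applied after
# a reboot. The file name is an arbitrary choice for this example.
#   $1: target directory (default /etc/sysctl.d)
#   $2: value to persist (default 0)
persist_conn_reuse_mode() {
  local dir="${1:-/etc/sysctl.d}" value="${2:-0}"
  printf 'net.ipv4.vs.conn_reuse_mode = %s\n' "$value" \
    > "$dir/99-ipvs-conn-reuse.conf"
}

# Usage (as root), then reload all sysctl configuration:
#   persist_conn_reuse_mode && sysctl --system
```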

Article link: https://www.haomeiwen.com/subject/vvahartx.html