Calico networking in k8s

Author: 分享放大价值 | Published 2020-06-06 20:21

    Environment

    Three VMs are installed on a single physical server, running the following OS:
    root@master:~# lsb_release -a
    No LSB modules are available.
    Distributor ID: Ubuntu
    Description: Ubuntu 19.10
    Release: 19.10
    Codename: eoan

    One VM serves as the master, the other two as workers:

    root@master:~# kubectl get nodes -o wide
    NAME     STATUS   ROLES    AGE    VERSION   INTERNAL-IP      EXTERNAL-IP   OS-IMAGE       KERNEL-VERSION     CONTAINER-RUNTIME
    master   Ready    master   112d   v1.17.3   192.168.122.20   <none>        Ubuntu 19.10   5.3.0-55-generic   docker://19.3.2
    node1    Ready    <none>   112d   v1.17.3   192.168.122.21   <none>        Ubuntu 19.10   5.3.0-55-generic   docker://19.3.2
    node2    Ready    <none>   112d   v1.17.3   192.168.122.22   <none>        Ubuntu 19.10   5.3.0-55-generic   docker://19.3.2
    



    Installing calico
    wget https://docs.projectcalico.org/manifests/calico.yaml
    kubectl apply -f calico.yaml

    root@master:~/calico# kubectl get pod -n kube-system -o wide
    NAME                                       READY   STATUS    RESTARTS   AGE    IP               NODE     NOMINATED NODE   READINESS GATES
    calico-kube-controllers-5b644bc49c-94g6h   1/1     Running   0          82s    10.24.104.2      node2    <none>           <none>
    calico-node-75kns                          1/1     Running   0          82s    192.168.122.20   master   <none>           <none>
    calico-node-fh969                          1/1     Running   0          82s    192.168.122.22   node2    <none>           <none>
    calico-node-lbbd9                          1/1     Running   0          82s    192.168.122.21   node1    <none>           <none>
    coredns-9d85f5447-5s8k9                    0/1     Running   3          112d   10.24.219.65     master   <none>           <none>
    coredns-9d85f5447-zbc8m                    1/1     Running   2          112d   10.24.219.66     master   <none>           <none>
    etcd-master                                1/1     Running   2          112d   192.168.122.20   master   <none>           <none>
    kube-apiserver-master                      1/1     Running   2          112d   192.168.122.20   master   <none>           <none>
    kube-controller-manager-master             1/1     Running   2          112d   192.168.122.20   master   <none>           <none>
    kube-proxy-l4wn7                           1/1     Running   2          112d   192.168.122.22   node2    <none>           <none>
    kube-proxy-prhcm                           1/1     Running   2          112d   192.168.122.21   node1    <none>           <none>
    kube-proxy-psxqt                           1/1     Running   2          112d   192.168.122.20   master   <none>           <none>
    kube-scheduler-master                      1/1     Running   2          112d   192.168.122.20   master   <none>           <none>
    

    calicoctl is calico's client command-line tool, used to view and modify the calico configuration:

    wget https://github.com/projectcalico/calicoctl/releases/download/v3.5.4/calicoctl -O /usr/bin/calicoctl
    chmod +x /usr/bin/calicoctl
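
    Before it can be used, calicoctl must be pointed at the datastore; one minimal way (a sketch assuming the Kubernetes API datastore and root's default kubeconfig path) is via environment variables:

    # tell calicoctl to talk to the Kubernetes API datastore directly
    export DATASTORE_TYPE=kubernetes
    export KUBECONFIG=/root/.kube/config
    calicoctl get nodes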
    

    Network modes

    calico supports three network modes, configurable by editing calico.yaml:

    • overlay: IPIP
    • overlay: VXLAN
    • underlay: BGP

    Each mode is configured and verified below, followed by an analysis of the traffic flow.

    overlay -- ipip

    configure

    After installing calico, IPIP is the default mode.
    The nodes peer with each other in a full mesh:

    root@master:~/calico# calicoctl node status
    Calico process is running.
    
    IPv4 BGP status
    +----------------+-------------------+-------+----------+-------------+
    |  PEER ADDRESS  |     PEER TYPE     | STATE |  SINCE   |    INFO     |
    +----------------+-------------------+-------+----------+-------------+
    | 192.168.122.21 | node-to-node mesh | up    | 17:37:27 | Established |
    | 192.168.122.22 | node-to-node mesh | up    | 17:37:28 | Established |
    +----------------+-------------------+-------+----------+-------------+
    
    IPv6 BGP status
    No IPv6 peers found.
    

    Enter a calico pod and look at the running processes:

    • felix programs the pods' direct routes and manages their interfaces
    • bird learns the pods' direct routes and advertises them to the other nodes over BGP
    • confd dynamically regenerates bird's configuration files
    root@master:~/calico# kubectl exec -it calico-node-lbbd9 -n kube-system bash
    [root@node1 /]# ps -ef
    UID        PID  PPID  C STIME TTY          TIME CMD
    root         1     0  0 17:37 ?        00:00:00 /usr/local/bin/runsvdir -P /etc/service/enabled
    root        44     1  0 17:37 ?        00:00:00 runsv felix
    root        45     1  0 17:37 ?        00:00:00 runsv bird6
    root        46     1  0 17:37 ?        00:00:00 runsv bird
    root        47     1  0 17:37 ?        00:00:00 runsv confd
    root        51    47  0 17:37 ?        00:00:00 calico-node -confd
    root       148    45  0 17:37 ?        00:00:00 bird6 -R -s /var/run/calico/bird6.ctl -d -c /etc/calico/confd/config/bird6.cfg
    root       149    46  0 17:37 ?        00:00:00 bird -R -s /var/run/calico/bird.ctl -d -c /etc/calico/confd/config/bird.cfg
    root       163    44  2 17:37 ?        00:00:06 calico-node -felix
    root       866     0  0 17:40 pts/0    00:00:00 bash
    root      1263   866  0 17:42 pts/0    00:00:00 ps -ef
    

    In addition, each node gains a new network interface, tunl0, which encapsulates/decapsulates the IPIP packets:

    11: tunl0@NONE: <NOARP,UP,LOWER_UP> mtu 1440 qdisc noqueue state UNKNOWN group default qlen 1000
        link/ipip 0.0.0.0 brd 0.0.0.0
        inet 10.24.166.129/32 brd 10.24.166.129 scope global tunl0
           valid_lft forever preferred_lft forever
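
    The pool's IPIP setting can be confirmed with calicoctl (pool name and CIDR match the default install shown later in this article):

    root@master:~# calicoctl get ipPool default-ipv4-ippool -o yaml | grep -E 'cidr|ipipMode'
      cidr: 10.24.0.0/16
      ipipMode: Always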
    

    verify

    Deploy two pods with the yaml files below to verify network connectivity.
    nginx.yaml
    1nginx.yaml -- a copy of nginx.yaml with the name fields changed (a one-liner for generating it appears after the file listing)

    root@master:~# cat nginx.yaml
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: nginx
    spec:
      replicas: 1
      selector:
        matchLabels:
          name: nginx
      template:
        metadata:
          labels:
            name: nginx
        spec:
          containers:
          - name: nginx
            image: nginx:1.7.9
            imagePullPolicy: Always
    
    ---
    kind: Service
    apiVersion: v1
    metadata:
      name: nginx
    spec:
      type: ClusterIP
      ports:
      - name: nginx
        port: 3306
        targetPort: 80
        protocol: TCP
      selector:
        name: nginx
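
    One way to generate 1nginx.yaml is a simple substitution over every name field (illustrative; any consistent rename works):

    root@master:~# sed 's/name: nginx/name: nginx1/' nginx.yaml > 1nginx.yaml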
    
    root@master:~# kubectl apply -f nginx.yaml
    deployment.apps/nginx unchanged
    service/nginx unchanged
    root@master:~# kubectl apply -f 1nginx.yaml
    deployment.apps/nginx1 unchanged
    service/nginx1 unchanged
    root@master:~# kubectl get pod -o wide
    NAME                     READY   STATUS    RESTARTS   AGE   IP              NODE    NOMINATED NODE   READINESS GATES
    nginx-677dc4d96-vrbp5    1/1     Running   0          18s   10.24.104.3     node2   <none>           <none>
    nginx1-677dc4d96-8bjvv   1/1     Running   0          21s   10.24.166.130   node1   <none>           <none>
    

    The two pods land on different workers.
    From inside one pod, the other pod can be pinged:

    root@master:~# kubectl exec -it nginx1-677dc4d96-8bjvv bash
    root@nginx1-677dc4d96-8bjvv:/# ping 10.24.104.3 -c1
    PING 10.24.104.3 (10.24.104.3): 48 data bytes
    56 bytes from 10.24.104.3: icmp_seq=0 ttl=62 time=2.369 ms
    --- 10.24.104.3 ping statistics ---
    1 packets transmitted, 1 packets received, 0% packet loss
    round-trip min/avg/max/stddev = 2.369/2.369/2.369/0.000 ms
    

    traffic flow

    (figure: IPIP traffic flow diagram)

    Take 10.24.166.130 pinging 10.24.104.3 as an example:

    1. A lookup in the pod's routing table (sketched right after the proxy_arp check below) shows the packet must be sent to the default gateway, 169.254.1.1.
      An ARP request for 169.254.1.1's MAC is sent and reaches caliadb5d6cab6f. That device has proxy ARP enabled, so it replies with its own MAC. (Both the ARP request and reply can be captured on caliadb5d6cab6f.)
    root@node1:~# cat /proc/sys/net/ipv4/conf/caliadb5d6cab6f/proxy_arp
    1
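
    For reference, the routing table inside a Calico pod is minimal; it typically looks like this (a sketch; the pod-side device is always eth0):

    root@nginx1-677dc4d96-8bjvv:/# ip route
    default via 169.254.1.1 dev eth0
    169.254.1.1 dev eth0 scope link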
    
    2. Having learned the MAC address, the pod sends the ICMP echo request.
    3. In veth_xmit, the transmit routine of the eth0 veth driver, skb->dev is re-pointed to eth0's peer device caliadb5d6cab6f, and netif_rx is called so the packet re-enters the protocol stack for a route lookup.
      The packet can be captured on caliadb5d6cab6f:
    18:17:50.003013 0a:65:aa:2b:ef:d1 > ee:ee:ee:ee:ee:ee, ethertype IPv4 (0x0800), length 90: (tos 0x0, ttl 64, id 47525, offset 0, flags [DF], proto ICMP (1), length 76)
        10.24.166.130 > 10.24.104.3: ICMP echo request, id 7168, seq 0, length 56
    
    4. The ICMP request arrives at caliadb5d6cab6f. A host routing table lookup shows the next hop is 192.168.122.22 (node2's IP) and the packet must be tunnel-encapsulated via tunl0:
    root@node1:~# ip r
    default via 192.168.122.1 dev ens3 proto static
    10.24.104.0/26 via 192.168.122.22 dev tunl0 proto bird onlink
    blackhole 10.24.166.128/26 proto bird
    10.24.166.130 dev caliadb5d6cab6f scope link
    10.24.219.64/26 via 192.168.122.20 dev tunl0 proto bird onlink
    172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1 linkdown
    192.168.122.0/24 dev ens3 proto kernel scope link src 192.168.122.21
    

    So when the packet reaches the tunl0 device it looks as follows: source and destination IPs are unchanged, and since this is IPIP mode there is no longer any MAC header:

    root@node1:~# tcpdump -vne -i tunl0
    tcpdump: listening on tunl0, link-type RAW (Raw IP), capture size 262144 bytes
    18:20:45.293856 ip: (tos 0x0, ttl 63, id 52265, offset 0, flags [DF], proto ICMP (1), length 76)
        10.24.166.130 > 10.24.104.3: ICMP echo request, id 7424, seq 0, length 56
    18:20:45.294975 ip: (tos 0x0, ttl 63, id 57896, offset 0, flags [none], proto ICMP (1), length 76)
        10.24.104.3 > 10.24.166.130: ICMP echo reply, id 7424, seq 0, length 56
    

    After the IPIP encapsulation, the outer IP is looked up in the host routing table again, and the packet is sent out of the ens3 NIC:

    192.168.122.0/24 dev ens3 proto kernel scope link src 192.168.122.21
    
    5. The encapsulated packet leaves through ens3.
      The fully encapsulated ICMP request can be captured on ens3 as an IPIP packet:
    root@node1:~# tcpdump -vne -i ens3 host 192.168.122.22
    tcpdump: listening on ens3, link-type EN10MB (Ethernet), capture size 262144 bytes
    18:31:17.809729 52:54:00:74:ac:0d > 52:54:00:f3:3a:90, ethertype IPv4 (0x0800), length 110: (tos 0x0, ttl 63, id 2590, offset 0, flags [DF], proto IPIP (4), length 96)
        192.168.122.21 > 192.168.122.22: (tos 0x0, ttl 63, id 61416, offset 0, flags [DF], proto ICMP (1), length 76)
        10.24.166.130 > 10.24.104.3: ICMP echo request, id 7680, seq 0, length 56
    
    6. When the encapsulated packet arrives at node2, its destination IP is local, so node2 accepts it and passes it up the stack.
      After decapsulation the packet is handed to the tunl0 device, where the ICMP request can be captured:
    root@node2:~# tcpdump -vne -i tunl0
    tcpdump: listening on tunl0, link-type RAW (Raw IP), capture size 262144 bytes
    18:38:56.717329 ip: (tos 0x0, ttl 63, id 19824, offset 0, flags [DF], proto ICMP (1), length 76)
        10.24.166.130 > 10.24.104.3: ICMP echo request, id 7936, seq 0, length 56
    
    7. Another host routing table lookup shows that destination IP 10.24.104.3 is sent to calie935ef337bb:
    10.24.104.3 dev calie935ef337bb scope link
    
    8. The packet crosses the veth pair into the pod.
    9. The ICMP reply follows a similar path in reverse.
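
    To watch only the tunneled traffic on the underlay, the ens3 capture can be narrowed to IP protocol 4 (IPIP):

    root@node1:~# tcpdump -vne -i ens3 ip proto 4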

    overlay -- vxlan

    configure

    Reference: https://docs.projectcalico.org/getting-started/kubernetes/installation/config-options : "Switching from IP-in-IP to VXLAN"

    Edit calico.yaml:

    1. Replace environment variable name CALICO_IPV4POOL_IPIP with CALICO_IPV4POOL_VXLAN. Leave the value of the new variable as “Always”.
    2. Optionally, (to save some resources if you’re running a VXLAN-only cluster) completely disable Calico’s BGP-based networking:
      Replace calico_backend: "bird" with calico_backend: "vxlan". This disables BIRD.
      Comment out the lines - -bird-ready and - -bird-live from the calico/node readiness/liveness checks (otherwise disabling BIRD will cause the readiness/liveness check to fail on every node):
              livenessProbe:
                exec:
                  command:
                  - /bin/calico-node
                  - -felix-live
                 # - -bird-live
              readinessProbe:
                exec:
                  command:
                  - /bin/calico-node
                  # - -bird-ready
                  - -felix-ready
    

    Re-apply calico.yaml:

    kubectl apply -f ./calico.yaml
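
    The calico-node pods restart with the new configuration; progress can be watched via the manifest's k8s-app label:

    kubectl get pod -n kube-system -l k8s-app=calico-node -w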
    



    Looking at the processes in a calico node, bird and the other BGP-related processes are gone:

    root@master:~/calico# kubectl exec -it calico-node-9lh84 -n kube-system bash
    [root@node1 /]# ps -ef
    UID        PID  PPID  C STIME TTY          TIME CMD
    root         1     0  0 10:37 ?        00:00:00 /usr/local/bin/runsvdir -P /etc/service/enabled
    root        37     1  0 10:38 ?        00:00:00 runsv felix
    root        38    37  1 10:38 ?        00:02:08 calico-node -felix
    root      2128     0  1 12:45 pts/0    00:00:00 bash
    root      2148  2128  0 12:45 pts/0    00:00:00 ps -ef
    

    calicoctl node status likewise no longer shows any BGP information:

    root@master:~# calicoctl node status
    Calico process is running.
    
    None of the BGP backend processes (BIRD or GoBGP) are running.
    

    And each node gains a new network interface:

    7: vxlan.calico: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1410 qdisc noqueue state UNKNOWN group default
        link/ether 66:f9:37:c3:7e:94 brd ff:ff:ff:ff:ff:ff
        inet 10.24.166.128/32 brd 10.24.166.128 scope global vxlan.calico
           valid_lft forever preferred_lft forever
        inet6 fe80::64f9:37ff:fec3:7e94/64 scope link
           valid_lft forever preferred_lft forever
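
    Its link details reveal the VXLAN parameters felix chose; a sketch (output abbreviated, exact attributes vary), where VNI 4096 and UDP port 4789 match the capture shown later:

    root@node1:~# ip -d link show vxlan.calico | grep 'vxlan id'
        vxlan id 4096 local 192.168.122.21 dev ens3 srcport 0 0 dstport 4789 ...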
    

    verify

    As in the IPIP verification, create two pods.
    From inside one pod, the other pod can be pinged:

    root@master:~# kubectl get pod -o wide
    NAME                     READY   STATUS    RESTARTS   AGE   IP              NODE    NOMINATED NODE   READINESS GATES
    nginx-677dc4d96-xysui   1/1     Running   0          18s   10.24.104.2     node2   <none>           <none>
    nginx-677dc4d96-wkkcn   1/1     Running   0          21s   10.24.166.130   node1   <none>           <none>
    root@master:~# kubectl exec -it nginx-677dc4d96-wkkcn bash
    root@nginx-677dc4d96-wkkcn:/# ping 10.24.104.2 -c1
    PING 10.24.104.2 (10.24.104.2): 48 data bytes
    56 bytes from 10.24.104.2: icmp_seq=0 ttl=62 time=2.519 ms
    --- 10.24.104.2 ping statistics ---
    1 packets transmitted, 1 packets received, 0% packet loss
    round-trip min/avg/max/stddev = 2.519/2.519/2.519/0.000 ms
    

    traffic flow

    (figure: VXLAN traffic flow diagram)

    Take 10.24.166.130 pinging 10.24.104.2 as an example:

    1. A lookup in the pod's routing table shows the packet must be sent to the default gateway, 169.254.1.1.
      The pod's neighbor table already holds a MAC entry for 169.254.1.1 (presumably configured statically by calico):
    root@nginx-677dc4d96-wkkcn:/# ip neigh
    169.254.1.1 dev eth0 lladdr ee:ee:ee:ee:ee:ee STALE
    192.168.122.21 dev eth0 lladdr ee:ee:ee:ee:ee:ee STALE
    

    So the pod emits the ICMP request, which can be captured on eth0.

    2. In veth_xmit, the transmit routine of the eth0 veth driver, skb->dev is re-pointed to eth0's peer device caliea5b03f12b8, and netif_rx is called so the packet re-enters the protocol stack for a route lookup.
      The packet can be captured on caliea5b03f12b8:
    16:39:36.406630 4e:78:56:5f:78:5d > ee:ee:ee:ee:ee:ee, ethertype IPv4 (0x0800), length 90: (tos 0x0, ttl 64, id 59734, offset 0, flags [DF], proto ICMP (1), length 76)
        10.24.166.130 > 10.24.104.2: ICMP echo request, id 21248, seq 0, length 56
    
    3. The ICMP request arrives at caliea5b03f12b8. A host routing table lookup shows the next hop is 10.24.104.0 (the IP of node2's vxlan device) and the packet must be tunnel-encapsulated via vxlan.calico:
    root@node1:~# ip r
    default via 192.168.122.1 dev ens3 proto static
    10.24.104.0/26 via 10.24.104.0 dev vxlan.calico onlink
    10.24.166.130 dev caliea5b03f12b8 scope link
    10.24.219.64/26 via 10.24.219.64 dev vxlan.calico onlink
    172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1 linkdown
    192.168.122.0/24 dev ens3 proto kernel scope link src 192.168.122.21
    
    

    The neighbor table shows that 10.24.104.0 maps to MAC address 66:2d:bf:44:a6:8b:

    root@node1:~# ip neigh
    192.168.122.22 dev ens3 lladdr 52:54:00:f3:3a:90 STALE
    10.24.219.64 dev vxlan.calico lladdr 66:4f:26:ae:af:db PERMANENT
    10.24.166.130 dev caliea5b03f12b8 lladdr 4e:78:56:5f:78:5d STALE
    192.168.122.1 dev ens3 lladdr 52:54:00:32:63:2e REACHABLE
    10.24.104.0 dev vxlan.calico lladdr 66:2d:bf:44:a6:8b PERMANENT
    192.168.122.20 dev ens3 lladdr 52:54:00:d9:d7:07 REACHABLE
    

    So when the packet reaches the vxlan.calico device it looks as follows: source and destination IPs are unchanged, but the destination MAC is now the one for 10.24.104.0 and the source MAC is that of the vxlan.calico device:

    13:44:39.560217 66:f9:37:c3:7e:94 > 66:2d:bf:44:a6:8b, ethertype IPv4 (0x0800), length 90: (tos 0x0, ttl 63, id 48899, offset 0, flags [DF], proto ICMP (1), length 76)
        10.24.166.130 > 10.24.104.2: ICMP echo request, id 16128, seq 0, length 56
    

    vxlan_xmit calls vxlan_find_mac to look up the FDB by destination MAC.
    The FDB shows that MAC 66:2d:bf:44:a6:8b maps to IP 192.168.122.22,
    which becomes the outer destination IP of the VXLAN packet:

    root@node1:~# bridge fdb show dev vxlan.calico
    66:2d:bf:44:a6:8b dst 192.168.122.22 self permanent
    66:4f:26:ae:af:db dst 192.168.122.20 self permanent
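
    felix programs these FDB entries itself; an equivalent entry could be added by hand with the bridge tool (illustrative only):

    # map the remote VTEP MAC to node2's underlay IP on the vxlan device
    bridge fdb add 66:2d:bf:44:a6:8b dev vxlan.calico dst 192.168.122.22 self permanent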
    

    After the VXLAN encapsulation, the outer IP is looked up in the host routing table again and the packet leaves through ens3 (the neighbor entry below supplies the outer destination MAC):

    192.168.122.22 dev ens3 lladdr 52:54:00:f3:3a:90 STALE
    
    4. The encapsulated packet leaves through ens3.
      The fully encapsulated ICMP request can be captured on ens3:
        192.168.122.21.44936 > 192.168.122.22.4789: VXLAN, flags [I] (0x08), vni 4096
    66:f9:37:c3:7e:94 > 66:2d:bf:44:a6:8b, ethertype IPv4 (0x0800), length 90: (tos 0x0, ttl 63, id 1065, offset 0, flags [DF], proto ICMP (1), length 76)
        10.24.166.130 > 10.24.104.2: ICMP echo request, id 15616, seq 0, length 56
    
    5. When the encapsulated packet arrives at node2, its destination IP is local, so node2 accepts it and passes it up the stack.
      node2 listens on UDP port 4789 (the socket was added via vxlan_sock_add when vxlan.calico was created); incoming packets are handled by vxlan_rcv:
    root@node2:~# netstat -nap | grep 4789
    udp        0      0 0.0.0.0:4789            0.0.0.0:*                           -
    

    After decapsulation the packet is handed to the vxlan.calico device, where it can be captured:

    13:44:25.320094 66:f9:37:c3:7e:94 > 66:2d:bf:44:a6:8b, ethertype IPv4 (0x0800), length 90: (tos 0x0, ttl 63, id 47307, offset 0, flags [DF], proto ICMP (1), length 76)
        10.24.166.130 > 10.24.104.2: ICMP echo request, id 15872, seq 0, length 56
    
    6. Another host routing table lookup shows that destination IP 10.24.104.2 is sent to cali82cc91000b8:
    10.24.104.2 dev cali82cc91000b8 scope link
    
    7. The packet crosses the veth pair into the pod.
    8. The ICMP reply follows a similar path in reverse.
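
    As with IPIP, the underlay capture can be narrowed, this time to the VXLAN UDP port:

    root@node1:~# tcpdump -vne -i ens3 udp port 4789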

    underlay -- BGP

    configure

    Edit calico.yaml and change the value of CALICO_IPV4POOL_IPIP to Never:

                # Enable IPIP
                - name: CALICO_IPV4POOL_IPIP
                  value: "Never"
    

    Re-apply calico.yaml:

    kubectl apply -f calico.yaml
    

    calicoctl node status and the processes inside the calico node look the same as in IPIP mode. The difference is in the workers' routing tables: cross-node traffic no longer goes through tunl0.

    root@master:~/calico# calicoctl node status
    Calico process is running.
    
    IPv4 BGP status
    +----------------+-------------------+-------+----------+-------------+
    |  PEER ADDRESS  |     PEER TYPE     | STATE |  SINCE   |    INFO     |
    +----------------+-------------------+-------+----------+-------------+
    | 192.168.122.21 | node-to-node mesh | up    | 19:56:08 | Established |
    | 192.168.122.22 | node-to-node mesh | up    | 19:56:09 | Established |
    +----------------+-------------------+-------+----------+-------------+
    
    IPv6 BGP status
    No IPv6 peers found.
    root@master:~/calico# kubectl exec -it -n kube-system calico-node-czhnn bash
    [root@node1 /]# ps -ef
    UID        PID  PPID  C STIME TTY          TIME CMD
    root         1     0  0 19:56 ?        00:00:00 /usr/local/bin/runsvdir -P /etc/service/enabled
    root        42     1  0 19:56 ?        00:00:00 runsv felix
    root        43     1  0 19:56 ?        00:00:00 runsv bird6
    root        44     1  0 19:56 ?        00:00:00 runsv bird
    root        45     1  0 19:56 ?        00:00:00 runsv confd
    root        47    42  2 19:56 ?        00:00:02 calico-node -felix
    root        48    45  0 19:56 ?        00:00:00 calico-node -confd
    root       144    44  0 19:56 ?        00:00:00 bird -R -s /var/run/calico/bird.ctl -d -c /etc/calico/confd/config/bird.cfg
    root       145    43  0 19:56 ?        00:00:00 bird6 -R -s /var/run/calico/bird6.ctl -d -c /etc/calico/confd/config/bird6.cfg
    root       493     0  1 19:57 pts/0    00:00:00 bash
    root       518   493  0 19:57 pts/0    00:00:00 ps -ef
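
    BIRD can also be queried directly from inside the calico-node container; a sketch using the birdcl client bundled in the image (each mesh peer appears as one Mesh_* BGP protocol instance in the Established state):

    [root@node1 /]# birdcl -s /var/run/calico/bird.ctl show protocols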
    

    Alternatively, switch from IPIP to pure BGP mode dynamically, as follows:

    root@master:~# calicoctl get ipPool --export -o yaml > pool.yaml
    # edit pool.yaml: change ipipMode to Never
    root@master:~# cat pool.yaml
    apiVersion: projectcalico.org/v3
    items:
    - apiVersion: projectcalico.org/v3
      kind: IPPool
      metadata:
        creationTimestamp: 2020-05-30T18:27:41Z
        name: default-ipv4-ippool
        resourceVersion: "4950731"
        uid: 79dac11f-309c-423a-ad5c-8235aafd08ea
      spec:
        cidr: 10.24.0.0/16
        ipipMode: Never
        natOutgoing: true
    kind: IPPoolList
    metadata:
      resourceVersion: "4950758"
    # apply the updated pool
    root@master:~# calicoctl replace -f pool.yaml
    Successfully replaced 1 'IPPool' resource(s)
    

    verify

    As in the IPIP verification, create two pods.
    From inside one pod, the other pod can be pinged:

    root@master:~# kubectl get pod -o wide
    NAME                     READY   STATUS    RESTARTS   AGE   IP              NODE    NOMINATED NODE   READINESS GATES
    nginx-677dc4d96-c6mxz    1/1     Running   0          14s   10.24.104.1     node2   <none>           <none>
    nginx1-677dc4d96-bjnw9   1/1     Running   0          17s   10.24.166.128   node1   <none>           <none>
    root@master:~# kubectl exec -it nginx1-677dc4d96-bjnw9 bash
    root@nginx1-677dc4d96-bjnw9:/# ping 10.24.104.1 -c1
    PING 10.24.104.1 (10.24.104.1): 48 data bytes
    56 bytes from 10.24.104.1: icmp_seq=0 ttl=63 time=4.949 ms
    --- 10.24.104.1 ping statistics ---
    1 packets transmitted, 1 packets received, 0% packet loss
    round-trip min/avg/max/stddev = 4.949/4.949/4.949/0.000 ms
    

    traffic flow

    (figure: BGP traffic flow diagram)

    Take 10.24.166.128 pinging 10.24.104.1 as an example:

    1. A lookup in the pod's routing table shows the packet must be sent to the default gateway, 169.254.1.1.
      An ARP request for 169.254.1.1's MAC is sent and reaches cali5a1d2678510, which has proxy ARP enabled, so it replies with its own MAC. (Both the ARP request and reply can be captured on cali5a1d2678510.)
    2. Having learned the MAC address, the pod sends the ICMP echo request.
    3. In veth_xmit, the transmit routine of the eth0 veth driver, skb->dev is re-pointed to eth0's peer device cali5a1d2678510, and netif_rx is called so the packet re-enters the protocol stack for a route lookup.
      The packet can be captured on cali5a1d2678510:
    20:11:15.035450 7a:17:c4:cf:73:81 > ee:ee:ee:ee:ee:ee, ethertype IPv4 (0x0800), length 90: (tos 0x0, ttl 64, id 57736, offset 0, flags [DF], proto ICMP (1), length 76)
        10.24.166.128 > 10.24.104.1: ICMP echo request, id 6400, seq 0, length 56
    
    4. The ICMP request arrives at cali5a1d2678510. A host routing table lookup shows the next hop is 192.168.122.22 (node2's IP) with outgoing interface ens3; no encapsulation is needed (the lookup can be confirmed with ip route get, shown after the table):
    root@node1:~# ip r
    default via 192.168.122.1 dev ens3 proto static
    10.24.104.0/26 via 192.168.122.22 dev ens3 proto bird
    10.24.166.128 dev cali5a1d2678510 scope link
    blackhole 10.24.166.128/26 proto bird
    10.24.219.65 via 192.168.122.20 dev ens3 proto bird
    10.24.219.66 via 192.168.122.20 dev ens3 proto bird
    172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1 linkdown
    192.168.122.0/24 dev ens3 proto kernel scope link src 192.168.122.21
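
    The forwarding decision for the remote pod IP can be double-checked with ip route get (a sketch; output format varies with the iproute2 version):

    root@node1:~# ip route get 10.24.104.1
    10.24.104.1 via 192.168.122.22 dev ens3 src 192.168.122.21 uid 0
        cache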
    
    5. The ICMP request leaves through the ens3 NIC; source and destination IPs are still the pod IPs:
      20:13:48.448931 52:54:00:74:ac:0d > 52:54:00:f3:3a:90, ethertype IPv4 (0x0800), length 90: (tos 0x0, ttl 63, id 2546, offset 0, flags [DF], proto ICMP (1), length 76)
      10.24.166.128 > 10.24.104.1: ICMP echo request, id 6912, seq 0, length 56
    6. When the request reaches node2, a routing table lookup shows destination IP 10.24.104.1 goes to cali06f028cd84e:
    root@node2:~# ip r
    default via 192.168.122.1 dev ens3 proto static
    10.24.104.0 dev cali1cd7c4c9ed9 scope link
    blackhole 10.24.104.0/26 proto bird
    10.24.104.1 dev cali06f028cd84e scope link
    10.24.166.128/26 via 192.168.122.21 dev ens3 proto bird
    10.24.219.65 via 192.168.122.20 dev ens3 proto bird
    10.24.219.66 via 192.168.122.20 dev ens3 proto bird
    172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1 linkdown
    192.168.122.0/24 dev ens3 proto kernel scope link src 192.168.122.22
    
    7. The packet crosses the veth pair into the pod.
    8. The ICMP reply follows a similar path in reverse.

    Q&A

    Quoted from https://docs.projectcalico.org/reference/faq

    1. Why does my container have a route to 169.254.1.1?
      In a Calico network, each host acts as a gateway router for the workloads that it hosts. In container deployments, Calico uses 169.254.1.1 as the address for the Calico router. By using a link-local address, Calico saves precious IP addresses and avoids burdening the user with configuring a suitable address.
      While the routing table may look a little odd to someone who is used to configuring LAN networking, using explicit routes rather than subnet-local gateways is fairly common in WAN networking.

    2. Why can’t I see the 169.254.1.1 address mentioned above on my host?
      Calico tries hard to avoid interfering with any other configuration on the host. Rather than adding the gateway address to the host side of each workload interface, Calico sets the proxy_arp flag on the interface. This makes the host behave like a gateway, responding to ARPs for 169.254.1.1 without having to actually allocate the IP address to the interface.

    3. Why do all cali* interfaces have the MAC address ee:ee:ee:ee:ee:ee?
      In some setups the kernel is unable to generate a persistent MAC address and so Calico assigns a MAC address itself. Since Calico uses point-to-point routed interfaces, traffic does not reach the data link layer so the MAC Address is never used and can therefore be the same for all the cali* interfaces.
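
    This is easy to confirm on a node; with brief output, every cali* interface reports the same address (interface name taken from the BGP example above; the @if index is illustrative):

    root@node1:~# ip -br link show | grep cali
    cali5a1d2678510@if3  UP  ee:ee:ee:ee:ee:ee <BROADCAST,MULTICAST,UP,LOWER_UP>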
