TL;DR: k8s's networking requirements can be met with host-gw, which simply adds routes on each host, but that only works when the hosts share a layer-2 segment, so in practice a layer-3 overlay network is used instead. The experiments in this post exercise the udp backend overlay.
What is flannel
flannel is a layer-3 networking solution. Every host runs a flanneld process that sets up a flannel0 interface, fetches the host's container subnet from a configuration store (etcd) or the k8s API, and configures the docker0 bridge with that subnet, so containers can talk across hosts. flannel itself is only a framework; the actual data plane comes from one of many backends: host-gw, vxlan, udp, plus cloud-vendor offerings such as alicloud-vpc-backend, aws-vpc-backend, and the GCE backend.
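The backend is selected in the network config stored in etcd; for instance, switching the experiment below from udp to vxlan would only change the Type field:
etcdctl set /coreos.com/network/config '{ "Network": "172.17.0.0/16", "Backend": {"Type": "vxlan"}}'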
udp overlay
There are many ways to implement a layer-3 network. On kernels without vxlan support, udp overlay is one of them, but its performance is so poor that it has been deprecated and is kept only for debugging.
![](https://img.haomeiwen.com/i659877/5249e29d9d01501e.png)
The figure above shows the topology of the udp overlay. Packets never just teleport: overlay or underlay, they have to walk the full protocol stack. Take 172.17.8.2 pinging 172.17.5.2 as an example:
- The container at 172.17.8.2 sends the packet to its gateway, the docker0 bridge at 172.17.8.1
- docker0 consults the host routing table: traffic to 172.17.0.0/16 is routed to the flannel0 interface (see the route-table sketch after this list)
- flannel0 is a tun device, so the packet surfaces in the user-space flanneld process, which wraps it as a UDP payload
- Following the host routing table again, the UDP datagram is sent to 192.168.43.161:8285, the port flanneld listens on
- flanneld on host 192.168.43.161 unwraps the UDP packet and writes the payload out through its own flannel0 interface, from where it reaches container 172.17.5.2, completing the round trip
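On the sending host, the routing table that drives steps 2 and 4 would look roughly like this (a hypothetical sketch matching the subnets used in this post; flanneld installs the /16 route, docker the local /24):
root@ubuntu2:~# ip route
default via 192.168.43.1 dev enp0s3
172.17.0.0/16 dev flannel0
172.17.8.0/24 dev docker0 proto kernel scope link src 172.17.8.1
192.168.43.0/24 dev enp0s3 proto kernel scope link src 192.168.43.222
Traffic for the local /24 never leaves the docker0 bridge; anything else in 172.17.0.0/16 is sent to flannel0 and thus handed to flanneld, which tunnels it over the 192.168.43.0/24 underlay.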
Experiment
The whole experiment is done by hand, without k8s, and there is no load-test data, but even the plain ping results show that the udp overlay really doesn't hold up; on top of that, flanneld is a user-space process written in Go, which probably can't cope under heavy load.
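To put real numbers on it, an iperf3 run between the two containers would do (a sketch, assuming iperf3 is installed in the test image):
# in the container on ubuntu1 (172.17.5.2): start the server
iperf3 -s
# in the container on ubuntu2 (172.17.8.2): run a 10-second throughput test
iperf3 -c 172.17.5.2 -t 10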
1. Start etcd
One gripe here: flannel still speaks the etcd v2 protocol, while v3 has long been the mainstream.
/usr/bin/etcd --listen-client-urls http://0.0.0.0:2379 --advertise-client-urls http://0.0.0.0:2379
By default etcd listens only on the loopback interface, so bind it to 0.0.0.0 (or a specific address). Then write the flannel network config:
etcdctl set /coreos.com/network/config '{ "Network": "172.17.0.0/16", "Backend": {"Type": "udp"}}'
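Reading the key back confirms the config landed (etcdctl here uses the v2 API, matching what flannel expects):
root@ubuntu2:~# etcdctl get /coreos.com/network/config
{ "Network": "172.17.0.0/16", "Backend": {"Type": "udp"}}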
2. Start flanneld
Download flanneld; the current release is 0.11.0:
wget https://github.com/coreos/flannel/releases/download/v0.11.0/flanneld-amd64 && chmod +x flanneld-amd64
Then start it:
./flanneld-amd64 -etcd-endpoints=http://192.168.43.161:2379 -etcd-prefix=/coreos.com/network -v=3 -etcd-username="" > /var/log/flanneld 2>&1 &
The default -etcd-prefix is /coreos.com/network, so the flag could have been omitted. Besides the startup log, you can also inspect etcd directly to see exactly what data flanneld wrote.
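For example, the subnet leases (a sketch assuming both test hosts have registered; flannel keeps one key per leased subnet):
root@ubuntu2:~# etcdctl ls /coreos.com/network/subnets
/coreos.com/network/subnets/172.17.5.0-24
/coreos.com/network/subnets/172.17.8.0-24
The startup log, in turn, shows the subnet file being written and the peer's lease being picked up: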
root@ubuntu2:~# tail -f /var/log/flanneld
I1225 09:03:36.654421 23981 main.go:317] Wrote subnet file to /run/flannel/subnet.env
I1225 09:03:36.654441 23981 main.go:321] Running backend.
I1225 09:03:36.654582 23981 udp_network_amd64.go:100] Watching for new subnet leases
I1225 09:03:36.656860 23981 iptables.go:145] Some iptables rules are missing; deleting and recreating rules
I1225 09:03:36.656880 23981 iptables.go:167] Deleting iptables rule: -s 172.17.0.0/16 -j ACCEPT
I1225 09:03:36.657763 23981 iptables.go:167] Deleting iptables rule: -d 172.17.0.0/16 -j ACCEPT
I1225 09:03:36.659096 23981 iptables.go:155] Adding iptables rule: -s 172.17.0.0/16 -j ACCEPT
I1225 09:03:36.662668 23981 iptables.go:155] Adding iptables rule: -d 172.17.0.0/16 -j ACCEPT
I1225 09:03:36.667833 23981 udp_network_amd64.go:195] Subnet added: 172.17.5.0/24
I1225 09:03:36.669084 23981 main.go:429] Waiting for 22h59m59.902809261s to renew lease
3. Configure the docker subnet
After flanneld starts, a new flannel0 interface appears on the host:
root@ubuntu1:~# ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: enp0s3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
link/ether 08:00:27:50:03:fc brd ff:ff:ff:ff:ff:ff
inet 192.168.43.161/24 brd 192.168.43.255 scope global dynamic enp0s3
valid_lft 2848sec preferred_lft 2848sec
inet6 2409:8900:1d61:462e:a00:27ff:fe50:3fc/64 scope global dynamic mngtmpaddr noprefixroute
valid_lft 3571sec preferred_lft 3571sec
inet6 fe80::a00:27ff:fe50:3fc/64 scope link
valid_lft forever preferred_lft forever
6: flannel0: <POINTOPOINT,MULTICAST,NOARP,UP,LOWER_UP> mtu 1472 qdisc fq_codel state UNKNOWN group default qlen 500
link/none
inet 172.17.5.0/32 scope global flannel0
valid_lft forever preferred_lft forever
inet6 fe80::66f8:e24e:8ce8:24f0/64 scope link stable-privacy
valid_lft forever preferred_lft forever
flanneld also generates a config file describing this host's docker subnet:
root@ubuntu1:~# cat /run/flannel/subnet.env
FLANNEL_NETWORK=172.17.0.0/16
FLANNEL_SUBNET=172.17.5.1/24
FLANNEL_MTU=1472
FLANNEL_IPMASQ=false
These are exactly the IP settings the docker0 bridge needs for this host's subnet. The official flannel repo on GitHub ships a helper script, mk-docker-opts.sh, that turns this file into docker options.
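A fetch sketch (assuming the script still lives under dist/ in the repo; check the tree for your version):
wget https://raw.githubusercontent.com/coreos/flannel/v0.11.0/dist/mk-docker-opts.sh && chmod +x mk-docker-opts.sh
Running it produces /run/docker_opts.env: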
root@ubuntu1:~# ./mk-docker-opts.sh -i
root@ubuntu1:~# cat /run/docker_opts.env
DOCKER_OPT_BIP="--bip=172.17.5.1/24"
DOCKER_OPT_IPMASQ="--ip-masq=true"
DOCKER_OPT_MTU="--mtu=1472"
The mapping is one-to-one, so you could write the file by hand instead of using the script; either way, /run/docker_opts.env has to be wired into docker's unit file.
root@ubuntu1:~# cat /lib/systemd/system/docker.service
[Service]
Type=notify
# the default is not to use systemd for cgroups because the delegate issues still
# exists and systemd currently does not support the cgroup feature set required
# for containers run by docker
EnvironmentFile=/run/docker_opts.env
ExecStart=/usr/bin/dockerd -H fd:// --containerd=/run/containerd/containerd.sock --exec-opt native.cgroupdriver=systemd $DOCKER_OPT_BIP $DOCKER_OPT_IPMASQ $DOCKER_OPT_MTU
Since the service is managed by systemctl here, add EnvironmentFile=/run/docker_opts.env under [Service], append the three variables to the dockerd command line, and run systemctl daemon-reload so systemd picks up the edit. Then start docker, and docker0 comes up with the right IP:
root@ubuntu1:~# systemctl start docker
root@ubuntu1:~# ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: enp0s3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
link/ether 08:00:27:50:03:fc brd ff:ff:ff:ff:ff:ff
inet 192.168.43.161/24 brd 192.168.43.255 scope global dynamic enp0s3
valid_lft 2603sec preferred_lft 2603sec
inet6 2409:8900:1d61:462e:a00:27ff:fe50:3fc/64 scope global dynamic mngtmpaddr noprefixroute
valid_lft 3239sec preferred_lft 3239sec
inet6 fe80::a00:27ff:fe50:3fc/64 scope link
valid_lft forever preferred_lft forever
8: flannel0: <POINTOPOINT,MULTICAST,NOARP,UP,LOWER_UP> mtu 1472 qdisc fq_codel state UNKNOWN group default qlen 500
link/none
inet 172.17.5.0/32 scope global flannel0
valid_lft forever preferred_lft forever
inet6 fe80::ca56:6554:961b:eb4d/64 scope link stable-privacy
valid_lft forever preferred_lft forever
9: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default
link/ether 02:42:27:60:6b:cd brd ff:ff:ff:ff:ff:ff
inet 172.17.5.1/24 brd 172.17.5.255 scope global docker0
valid_lft forever preferred_lft forever
4. Start containers
Start a container on each of the two test machines (docker0 above still shows mtu 1500: the --mtu option governs the container interfaces, and the bridge only inherits the lower MTU once a container's veth attaches):
root@ubuntu2:~# docker run -it myubuntu /bin/bash
5. Ping between containers
Check the test container's IP:
root@f00161eaa2f6:/# ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
7: eth0@if8: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1472 qdisc noqueue state UP group default
link/ether 02:42:ac:11:08:02 brd ff:ff:ff:ff:ff:ff link-netnsid 0
inet 172.17.8.2/24 brd 172.17.8.255 scope global eth0
valid_lft forever preferred_lft forever
Ping the local docker0 bridge:
root@f00161eaa2f6:/# ping 172.17.8.1
PING 172.17.8.1 (172.17.8.1) 56(84) bytes of data.
64 bytes from 172.17.8.1: icmp_seq=1 ttl=64 time=0.122 ms
64 bytes from 172.17.8.1: icmp_seq=2 ttl=64 time=0.051 ms
^C
--- 172.17.8.1 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1007ms
rtt min/avg/max/mdev = 0.051/0.086/0.122/0.036 ms
Ping the container's own host:
root@f00161eaa2f6:/# ping 192.168.43.222
PING 192.168.43.222 (192.168.43.222) 56(84) bytes of data.
64 bytes from 192.168.43.222: icmp_seq=1 ttl=64 time=0.061 ms
64 bytes from 192.168.43.222: icmp_seq=2 ttl=64 time=0.044 ms
64 bytes from 192.168.43.222: icmp_seq=3 ttl=64 time=0.047 ms
64 bytes from 192.168.43.222: icmp_seq=4 ttl=64 time=0.045 ms
64 bytes from 192.168.43.222: icmp_seq=5 ttl=64 time=0.060 ms
^C
--- 192.168.43.222 ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 4083ms
rtt min/avg/max/mdev = 0.044/0.051/0.061/0.009 ms
Ping the other host:
root@f00161eaa2f6:/# ping 192.168.43.161
PING 192.168.43.161 (192.168.43.161) 56(84) bytes of data.
64 bytes from 192.168.43.161: icmp_seq=1 ttl=63 time=0.471 ms
64 bytes from 192.168.43.161: icmp_seq=2 ttl=63 time=0.305 ms
64 bytes from 192.168.43.161: icmp_seq=3 ttl=63 time=0.331 ms
64 bytes from 192.168.43.161: icmp_seq=4 ttl=63 time=0.297 ms
64 bytes from 192.168.43.161: icmp_seq=5 ttl=63 time=0.337 ms
^C
--- 192.168.43.161 ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 4098ms
rtt min/avg/max/mdev = 0.297/0.348/0.471/0.064 ms
Ping a container on the other host:
root@f00161eaa2f6:/# ping 172.17.5.2
PING 172.17.5.2 (172.17.5.2) 56(84) bytes of data.
64 bytes from 172.17.5.2: icmp_seq=1 ttl=60 time=0.532 ms
Everything works, so the layer-3 udp overlay is complete. The ping timings also show what this architecture costs: the cross-host container ping (0.532 ms) is slower than the same container pinging the remote host directly (0.348 ms average), and both are far above the sub-0.1 ms local pings.
6. Packet capture
On each host, capture first on the flannel0 interface and then on the physical enp0s3 NIC:
root@ubuntu2:~# tcpdump -e -n -v -i flannel0
tcpdump: listening on flannel0, link-type RAW (Raw IP), capture size 262144 bytes
09:40:28.999156 ip: (tos 0x0, ttl 63, id 15660, offset 0, flags [DF], proto ICMP (1), length 84)
172.17.8.0 > 172.17.5.1: ICMP echo request, id 75, seq 1, length 64
09:40:28.999920 ip: (tos 0x0, ttl 62, id 52829, offset 0, flags [none], proto ICMP (1), length 84)
172.17.5.1 > 172.17.8.0: ICMP echo reply, id 75, seq 1, length 64
09:40:29.999859 ip: (tos 0x0, ttl 63, id 15859, offset 0, flags [DF], proto ICMP (1), length 84)
172.17.8.0 > 172.17.5.1: ICMP echo request, id 75, seq 2, length 64
09:40:30.000353 ip: (tos 0x0, ttl 62, id 52969, offset 0, flags [none], proto ICMP (1), length 84)
172.17.5.1 > 172.17.8.0: ICMP echo reply, id 75, seq 2, length 64
root@ubuntu1:~# tcpdump -e -n -v -i flannel0
tcpdump: listening on flannel0, link-type RAW (Raw IP), capture size 262144 bytes
09:40:28.897354 ip: (tos 0x0, ttl 61, id 15660, offset 0, flags [DF], proto ICMP (1), length 84)
172.17.8.0 > 172.17.5.1: ICMP echo request, id 75, seq 1, length 64
09:40:28.897380 ip: (tos 0x0, ttl 64, id 52829, offset 0, flags [none], proto ICMP (1), length 84)
172.17.5.1 > 172.17.8.0: ICMP echo reply, id 75, seq 1, length 64
09:40:29.897971 ip: (tos 0x0, ttl 61, id 15859, offset 0, flags [DF], proto ICMP (1), length 84)
172.17.8.0 > 172.17.5.1: ICMP echo request, id 75, seq 2, length 64
09:40:29.897989 ip: (tos 0x0, ttl 64, id 52969, offset 0, flags [none], proto ICMP (1), length 84)
172.17.5.1 > 172.17.8.0: ICMP echo reply, id 75, seq 2, length 64
09:40:30.899467 ip: (tos 0x0, ttl 61, id 16091, offset 0, flags [DF], proto ICMP (1), length 84)
172.17.8.0 > 172.17.5.1: ICMP echo request, id 75, seq 3, length 64
09:40:30.899500 ip: (tos 0x0, ttl 64, id 53127, offset 0, flags [none], proto ICMP (1), length 84)
172.17.5.1 > 172.17.8.0: ICMP echo reply, id 75, seq 3, length 64
On both machines flannel0 behaves as if the two ends were directly connected; these are the packets that flanneld decapsulates and injects into flannel0 on each side. (Note the inner TTL drops by two between the two flannel0 captures, 63 to 61 on the request, which suggests the user-space hops decrement TTL the way routers do.)
root@ubuntu2:~# tcpdump -e -n -v -i enp0s3
09:45:31.075384 08:00:27:c5:a1:4f > 08:00:27:50:03:fc, ethertype IPv4 (0x0800), length 126: (tos 0x0, ttl 64, id 16455, offset 0, flags [DF], proto UDP (17), length 112)
192.168.43.222.8285 > 192.168.43.161.8285: UDP, length 84
09:45:31.075872 08:00:27:50:03:fc > 08:00:27:c5:a1:4f, ethertype IPv4 (0x0800), length 126: (tos 0x0, ttl 64, id 50966, offset 0, flags [DF], proto UDP (17), length 112)
192.168.43.161.8285 > 192.168.43.222.8285: UDP, length 84
09:45:31.076561 08:00:27:c5:a1:4f > 38:f9:d3:2e:a1:6f, ethertype IPv4 (0x0800), length 166: (tos 0x10, ttl 64, id 62105, offset 0, flags [DF], proto TCP (6), length 152)
09:45:31.076761 38:f9:d3:2e:a1:6f > 08:00:27:c5:a1:4f, ethertype IPv4 (0x0800), length 66: (tos 0x48, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 52)
09:45:32.097916 08:00:27:c5:a1:4f > 08:00:27:50:03:fc, ethertype IPv4 (0x0800), length 126: (tos 0x0, ttl 64, id 16597, offset 0, flags [DF], proto UDP (17), length 112)
192.168.43.222.8285 > 192.168.43.161.8285: UDP, length 84
09:45:32.098376 08:00:27:50:03:fc > 08:00:27:c5:a1:4f, ethertype IPv4 (0x0800), length 126: (tos 0x0, ttl 64, id 51097, offset 0, flags [DF], proto UDP (17), length 112)
192.168.43.161.8285 > 192.168.43.222.8285: UDP, length 84
09:45:32.099075 08:00:27:c5:a1:4f > 38:f9:d3:2e:a1:6f, ethertype IPv4 (0x0800), length 166: (tos 0x10, ttl 64, id 62106, offset 0, flags [DF], proto TCP (6), length 152)
root@ubuntu1:~# tcpdump -e -n -v -i enp0s3
09:44:14.088291 08:00:27:50:03:fc > 08:00:27:c5:a1:4f, ethertype IPv4 (0x0800), length 126: (tos 0x0, ttl 64, id 33452, offset 0, flags [DF], proto UDP (17), length 112)
192.168.43.161.8285 > 192.168.43.222.8285: UDP, length 84
09:44:15.114805 08:00:27:c5:a1:4f > 08:00:27:50:03:fc, ethertype IPv4 (0x0800), length 126: (tos 0x0, ttl 64, id 1395, offset 0, flags [DF], proto UDP (17), length 112)
192.168.43.222.8285 > 192.168.43.161.8285: UDP, length 84
09:44:15.114923 08:00:27:50:03:fc > 08:00:27:c5:a1:4f, ethertype IPv4 (0x0800), length 126: (tos 0x0, ttl 64, id 33515, offset 0, flags [DF], proto UDP (17), length 112)
192.168.43.161.8285 > 192.168.43.222.8285: UDP, length 84
09:44:16.132893 08:00:27:c5:a1:4f > 08:00:27:50:03:fc, ethertype IPv4 (0x0800), length 126: (tos 0x0, ttl 64, id 1617, offset 0, flags [DF], proto UDP (17), length 112)
192.168.43.222.8285 > 192.168.43.161.8285: UDP, length 84
09:44:16.133017 08:00:27:50:03:fc > 08:00:27:c5:a1:4f, ethertype IPv4 (0x0800), length 126: (tos 0x0, ttl 64, id 33662, offset 0, flags [DF], proto UDP (17), length 112)
192.168.43.161.8285 > 192.168.43.222.8285: UDP, length 84
09:44:17.156818 08:00:27:c5:a1:4f > 08:00:27:50:03:fc, ethertype IPv4 (0x0800), length 126: (tos 0x0, ttl 64, id 1872, offset 0, flags [DF], proto UDP (17), length 112)
Capturing on the physical NICs shows the real overlay traffic: UDP datagrams shuttled between the two flanneld processes on port 8285. (The interleaved TCP packets are unrelated traffic, likely the ssh session carrying the tcpdump output itself.)
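The sizes in the captures line up with the encapsulation overhead. The inner ICMP packet is 84 bytes (20-byte IP header + 8-byte ICMP header + 56 bytes of payload), which tcpdump reports as "UDP, length 84"; the outer datagram adds an 8-byte UDP header and a 20-byte outer IP header, giving "length 112". That same 28-byte overhead is why FLANNEL_MTU came out as 1472 = 1500 - 28, with 1500 being the MTU of enp0s3.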
Summary
Benchmarks in the article "k8s 网络解决方案" (k8s network solutions) show that udp overlay is the worst-performing implementation, so it won't be used in production. Next up: testing the vxlan-based overlay.