docker study notes 5: the flannel udp implementation of ove

Author: 董泽润 | Published 2019-12-26 13:24

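TL;DR To meet k8s's networking requirements you can use host-gw, which adds routes on each host, but that only works when the hosts share a layer-2 network. In practice a layer-3 overlay network is used instead. This post experiments with an overlay built on udp.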

What is flannel

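flannel is a layer-3 networking solution. Every host runs a flanneld daemon, which sets up a flannel0 interface and fetches the container subnet for that host from a config store (etcd) or from the k8s API. The docker0 bridge is then configured from that subnet, which gives containers cross-host connectivity. flannel itself is only a framework; the concrete backend can be host-gw, vxlan or udp, and there are also backends from cloud vendors such as alicloud-vpc-backend, aws-vpc-backend and google gce-backend.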
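
As a rough illustration of the "one subnet per host" idea, the sketch below splits the network used later in this post (172.17.0.0/16) into /24 subnets and derives the bridge address for one of them. The /24 length is taken from the output shown later in this post; the helper names are hypothetical and this is not flannel's own allocation code.

```python
import ipaddress

def host_subnets(network: str, subnet_len: int = 24):
    """Return every subnet of length subnet_len inside the flannel network."""
    net = ipaddress.ip_network(network)
    return list(net.subnets(new_prefix=subnet_len))

def bridge_address(subnet) -> str:
    """First address of the subnet, in the form docker0 expects for --bip."""
    first_host = subnet.network_address + 1
    return f"{first_host}/{subnet.prefixlen}"

subnets = host_subnets("172.17.0.0/16")
lease = ipaddress.ip_network("172.17.5.0/24")  # the lease seen later in this post

print(len(subnets))           # 256
print(lease in subnets)       # True
print(bridge_address(lease))  # 172.17.5.1/24
```

Each host ends up owning one such subnet, and its docker0 bridge takes the first address in it.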

udp overlay

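There are many ways to build a layer-3 network. On kernels without vxlan support, udp overlay is one option, but its performance is so poor that it has been deprecated and is now only used for debugging.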


Figure: udp overlay

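The figure above shows the topology of the udp overlay. Packets do not fly by themselves: overlay or underlay, they have to walk through the whole network stack. Take 172.17.8.2 pinging 172.17.5.2 as an example: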

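  • The container 172.17.8.2 sends the packet to its local gateway, the docker0 bridge at 172.17.8.1
  • docker0 receives the packet and the host routing table is consulted: traffic for 172.17.0.0/16 goes out through the flannel0 interface
  • The user-space flanneld process reads the packet from flannel0 and wraps it as a udp payload
  • Following the host routes, the udp datagram is sent to 192.168.43.161:8285; this is the port flanneld listens on
  • flanneld on host 192.168.43.161 unpacks the udp datagram and writes the payload to its flannel0 interface, from where it reaches container 172.17.5.2, completing the container-to-container path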
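
The decision flanneld has to make in the third and fourth steps (which host should receive the encapsulated packet) can be sketched as a lookup from destination ip to the host that owns its subnet. The lease table below is hypothetical and hard-coded from the addresses in this post; the real flanneld learns leases from etcd, and `next_hop` is an illustrative name, not a flannel function.

```python
import ipaddress

FLANNELD_PORT = 8285  # udp port flanneld listens on in this post

# Hypothetical lease table: container subnet -> ip of the host running flanneld.
LEASES = {
    ipaddress.ip_network("172.17.5.0/24"): "192.168.43.161",
    ipaddress.ip_network("172.17.8.0/24"): "192.168.43.222",
}

def next_hop(dst_ip: str):
    """Return (host_ip, udp_port) of the flanneld owning dst_ip, or None."""
    dst = ipaddress.ip_address(dst_ip)
    for subnet, host in LEASES.items():
        if dst in subnet:
            return host, FLANNELD_PORT
    return None  # destination is not part of the overlay

print(next_hop("172.17.5.2"))  # ('192.168.43.161', 8285)
print(next_hop("10.0.0.1"))    # None
```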

Experiment

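Everything in this experiment is done by hand: no k8s, and no load-test numbers. Even so, the ping results suggest that the udp overlay performs poorly, and since flanneld is a Go process doing the forwarding in user space, it probably will not hold up under heavy load.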

1. Start etcd

One complaint here: flannel still uses the etcd v2 protocol, while v3 is now the mainstream.

/usr/bin/etcd --listen-client-urls http://0.0.0.0:2379 --advertise-client-urls http://0.0.0.0:2379

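By default etcd listens only on the loopback interface, so it has to be set to 0.0.0.0 or to a specific address. Then write flannel's overall network range: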

etcdctl set /coreos.com/network/config '{ "Network": "172.17.0.0/16", "Backend": {"Type": "udp"}}'
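
If you prefer to generate the value instead of typing the JSON by hand, a small sketch like the following builds the same document and rejects a malformed CIDR before it reaches etcd. `flannel_config` is a hypothetical helper; only the `Network` and `Backend.Type` keys shown in the command above are used.

```python
import ipaddress
import json

def flannel_config(network: str, backend: str = "udp") -> str:
    """Build the JSON value stored under /coreos.com/network/config."""
    # ip_network raises ValueError for a bad CIDR or one with host bits set.
    ipaddress.ip_network(network)
    return json.dumps({"Network": network, "Backend": {"Type": backend}})

value = flannel_config("172.17.0.0/16")
print(value)  # {"Network": "172.17.0.0/16", "Backend": {"Type": "udp"}}
```

The printed string can be passed as the last argument of the etcdctl set command above.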

2. Start flanneld

Download flanneld; the current version is 0.11.0

wget https://github.com/coreos/flannel/releases/download/v0.11.0/flanneld-amd64 && chmod +x flanneld-amd64

Then start flanneld

./flanneld-amd64 -etcd-endpoints=http://192.168.43.161:2379 -etcd-prefix=/coreos.com/network -v=3 -etcd-username="" > /var/log/flanneld 2>&1 &

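The default -etcd-prefix is already /coreos.com/network, so it can be omitted. Next, check the startup log. You can also look in etcd to see exactly which configuration and data were written.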

root@ubuntu2:~# tail -f /var/log/flanneld
I1225 09:03:36.654421   23981 main.go:317] Wrote subnet file to /run/flannel/subnet.env
I1225 09:03:36.654441   23981 main.go:321] Running backend.
I1225 09:03:36.654582   23981 udp_network_amd64.go:100] Watching for new subnet leases
I1225 09:03:36.656860   23981 iptables.go:145] Some iptables rules are missing; deleting and recreating rules
I1225 09:03:36.656880   23981 iptables.go:167] Deleting iptables rule: -s 172.17.0.0/16 -j ACCEPT
I1225 09:03:36.657763   23981 iptables.go:167] Deleting iptables rule: -d 172.17.0.0/16 -j ACCEPT
I1225 09:03:36.659096   23981 iptables.go:155] Adding iptables rule: -s 172.17.0.0/16 -j ACCEPT
I1225 09:03:36.662668   23981 iptables.go:155] Adding iptables rule: -d 172.17.0.0/16 -j ACCEPT
I1225 09:03:36.667833   23981 udp_network_amd64.go:195] Subnet added: 172.17.5.0/24
I1225 09:03:36.669084   23981 main.go:429] Waiting for 22h59m59.902809261s to renew lease

5. Configure the docker subnet

After flanneld starts, a new flannel0 interface appears

root@ubuntu1:~# ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: enp0s3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether 08:00:27:50:03:fc brd ff:ff:ff:ff:ff:ff
    inet 192.168.43.161/24 brd 192.168.43.255 scope global dynamic enp0s3
       valid_lft 2848sec preferred_lft 2848sec
    inet6 2409:8900:1d61:462e:a00:27ff:fe50:3fc/64 scope global dynamic mngtmpaddr noprefixroute
       valid_lft 3571sec preferred_lft 3571sec
    inet6 fe80::a00:27ff:fe50:3fc/64 scope link
       valid_lft forever preferred_lft forever
6: flannel0: <POINTOPOINT,MULTICAST,NOARP,UP,LOWER_UP> mtu 1472 qdisc fq_codel state UNKNOWN group default qlen 500
    link/none
    inet 172.17.5.0/32 scope global flannel0
       valid_lft forever preferred_lft forever
    inet6 fe80::66f8:e24e:8ce8:24f0/64 scope link stable-privacy
       valid_lft forever preferred_lft forever

flanneld also generates a config file describing this host's docker subnet

root@ubuntu1:~# cat /run/flannel/subnet.env
FLANNEL_NETWORK=172.17.0.0/16
FLANNEL_SUBNET=172.17.5.1/24
FLANNEL_MTU=1472
FLANNEL_IPMASQ=false

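This is the ip that the docker0 bridge should use for this host's docker subnet. The official flannel GitHub repository ships a script, mk-docker-opts.sh, which you can download yourself and use to generate the docker options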

root@ubuntu1:~# ./mk-docker-opts.sh -i
root@ubuntu1:~# cat /run/docker_opts.env
DOCKER_OPT_BIP="--bip=172.17.5.1/24"
DOCKER_OPT_IPMASQ="--ip-masq=true"
DOCKER_OPT_MTU="--mtu=1472"

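The values map one to one, so you can also write the file by hand instead of using the script. Either way, /run/docker_opts.env has to be referenced from docker's unit file.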
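
To make the one-to-one mapping concrete, here is a minimal sketch that turns the contents of subnet.env into the three docker options. It is inferred only from the two outputs shown above (note that FLANNEL_IPMASQ=false becomes --ip-masq=true, so the flag is inverted); `parse_env` and `docker_opts` are hypothetical names, and this is not a replacement for mk-docker-opts.sh.

```python
def parse_env(text: str) -> dict:
    """Parse KEY=VALUE lines, ignoring blanks and comments."""
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#") and "=" in line:
            key, value = line.split("=", 1)
            env[key] = value.strip('"')
    return env

def docker_opts(env: dict) -> dict:
    """Map flannel's subnet.env values to dockerd options."""
    # Inverted on purpose: if flannel does the masquerading, docker must not.
    ipmasq = "false" if env.get("FLANNEL_IPMASQ") == "true" else "true"
    return {
        "DOCKER_OPT_BIP": f"--bip={env['FLANNEL_SUBNET']}",
        "DOCKER_OPT_IPMASQ": f"--ip-masq={ipmasq}",
        "DOCKER_OPT_MTU": f"--mtu={env['FLANNEL_MTU']}",
    }

SUBNET_ENV = """\
FLANNEL_NETWORK=172.17.0.0/16
FLANNEL_SUBNET=172.17.5.1/24
FLANNEL_MTU=1472
FLANNEL_IPMASQ=false
"""

for key, value in docker_opts(parse_env(SUBNET_ENV)).items():
    print(f'{key}="{value}"')
```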

root@ubuntu1:~# cat /lib/systemd/system/docker.service
[Service]
Type=notify
# the default is not to use systemd for cgroups because the delegate issues still
# exists and systemd currently does not support the cgroup feature set required
# for containers run by docker
EnvironmentFile=/run/docker_opts.env
ExecStart=/usr/bin/dockerd -H fd:// --containerd=/run/containerd/containerd.sock --exec-opt native.cgroupdriver=systemd $DOCKER_OPT_BIP $DOCKER_OPT_IPMASQ $DOCKER_OPT_MTU

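Services are managed with systemctl here, so add EnvironmentFile=/run/docker_opts.env under [Service] and append the options above to the dockerd command line. Then start docker (run systemctl daemon-reload first if the unit file was edited while systemd was running) and docker0 will have the correct ip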

root@ubuntu1:~# systemctl start docker
root@ubuntu1:~# ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: enp0s3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether 08:00:27:50:03:fc brd ff:ff:ff:ff:ff:ff
    inet 192.168.43.161/24 brd 192.168.43.255 scope global dynamic enp0s3
       valid_lft 2603sec preferred_lft 2603sec
    inet6 2409:8900:1d61:462e:a00:27ff:fe50:3fc/64 scope global dynamic mngtmpaddr noprefixroute
       valid_lft 3239sec preferred_lft 3239sec
    inet6 fe80::a00:27ff:fe50:3fc/64 scope link
       valid_lft forever preferred_lft forever
8: flannel0: <POINTOPOINT,MULTICAST,NOARP,UP,LOWER_UP> mtu 1472 qdisc fq_codel state UNKNOWN group default qlen 500
    link/none
    inet 172.17.5.0/32 scope global flannel0
       valid_lft forever preferred_lft forever
    inet6 fe80::ca56:6554:961b:eb4d/64 scope link stable-privacy
       valid_lft forever preferred_lft forever
9: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default
    link/ether 02:42:27:60:6b:cd brd ff:ff:ff:ff:ff:ff
    inet 172.17.5.1/24 brd 172.17.5.255 scope global docker0
       valid_lft forever preferred_lft forever

6. Start containers

Start a container on each of the two test machines

root@ubuntu2:~# docker run -it myubuntu /bin/bash

7. Ping between containers

Check the ip of the test container

root@f00161eaa2f6:/# ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
7: eth0@if8: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1472 qdisc noqueue state UP group default
    link/ether 02:42:ac:11:08:02 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 172.17.8.2/24 brd 172.17.8.255 scope global eth0
       valid_lft forever preferred_lft forever

Ping the local docker0 bridge

root@f00161eaa2f6:/# ping 172.17.8.1
PING 172.17.8.1 (172.17.8.1) 56(84) bytes of data.
64 bytes from 172.17.8.1: icmp_seq=1 ttl=64 time=0.122 ms
64 bytes from 172.17.8.1: icmp_seq=2 ttl=64 time=0.051 ms
^C
--- 172.17.8.1 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1007ms
rtt min/avg/max/mdev = 0.051/0.086/0.122/0.036 ms

Ping the local host's ip

root@f00161eaa2f6:/# ping 192.168.43.222
PING 192.168.43.222 (192.168.43.222) 56(84) bytes of data.
64 bytes from 192.168.43.222: icmp_seq=1 ttl=64 time=0.061 ms
64 bytes from 192.168.43.222: icmp_seq=2 ttl=64 time=0.044 ms
64 bytes from 192.168.43.222: icmp_seq=3 ttl=64 time=0.047 ms
64 bytes from 192.168.43.222: icmp_seq=4 ttl=64 time=0.045 ms
64 bytes from 192.168.43.222: icmp_seq=5 ttl=64 time=0.060 ms
^C
--- 192.168.43.222 ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 4083ms
rtt min/avg/max/mdev = 0.044/0.051/0.061/0.009 ms

Ping the other host's ip

root@f00161eaa2f6:/# ping 192.168.43.161
PING 192.168.43.161 (192.168.43.161) 56(84) bytes of data.
64 bytes from 192.168.43.161: icmp_seq=1 ttl=63 time=0.471 ms
64 bytes from 192.168.43.161: icmp_seq=2 ttl=63 time=0.305 ms
64 bytes from 192.168.43.161: icmp_seq=3 ttl=63 time=0.331 ms
64 bytes from 192.168.43.161: icmp_seq=4 ttl=63 time=0.297 ms
64 bytes from 192.168.43.161: icmp_seq=5 ttl=63 time=0.337 ms
^C
--- 192.168.43.161 ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 4098ms
rtt min/avg/max/mdev = 0.297/0.348/0.471/0.064 ms

Ping a container on the other host

root@f00161eaa2f6:/# ping 172.17.5.2
PING 172.17.5.2 (172.17.5.2) 56(84) bytes of data.
64 bytes from 172.17.5.2: icmp_seq=1 ttl=60 time=0.532 ms

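Everything works, which shows the layer-3 udp overlay network is up. The ping times also hint at how poorly this architecture performs: 0.532 ms to the remote container, versus an average of about 0.35 ms to the host it runs on.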

8. Packet capture

On each host, capture packets on the flannel0 interface and on the physical interface enp0s3

root@ubuntu2:~# tcpdump -e -n -v -i flannel0
tcpdump: listening on flannel0, link-type RAW (Raw IP), capture size 262144 bytes
09:40:28.999156 ip: (tos 0x0, ttl 63, id 15660, offset 0, flags [DF], proto ICMP (1), length 84)
    172.17.8.0 > 172.17.5.1: ICMP echo request, id 75, seq 1, length 64
09:40:28.999920 ip: (tos 0x0, ttl 62, id 52829, offset 0, flags [none], proto ICMP (1), length 84)
    172.17.5.1 > 172.17.8.0: ICMP echo reply, id 75, seq 1, length 64
09:40:29.999859 ip: (tos 0x0, ttl 63, id 15859, offset 0, flags [DF], proto ICMP (1), length 84)
    172.17.8.0 > 172.17.5.1: ICMP echo request, id 75, seq 2, length 64
09:40:30.000353 ip: (tos 0x0, ttl 62, id 52969, offset 0, flags [none], proto ICMP (1), length 84)
    172.17.5.1 > 172.17.8.0: ICMP echo reply, id 75, seq 2, length 64
root@ubuntu1:~# tcpdump -e -n -v -i flannel0
tcpdump: listening on flannel0, link-type RAW (Raw IP), capture size 262144 bytes
09:40:28.897354 ip: (tos 0x0, ttl 61, id 15660, offset 0, flags [DF], proto ICMP (1), length 84)
    172.17.8.0 > 172.17.5.1: ICMP echo request, id 75, seq 1, length 64
09:40:28.897380 ip: (tos 0x0, ttl 64, id 52829, offset 0, flags [none], proto ICMP (1), length 84)
    172.17.5.1 > 172.17.8.0: ICMP echo reply, id 75, seq 1, length 64
09:40:29.897971 ip: (tos 0x0, ttl 61, id 15859, offset 0, flags [DF], proto ICMP (1), length 84)
    172.17.8.0 > 172.17.5.1: ICMP echo request, id 75, seq 2, length 64
09:40:29.897989 ip: (tos 0x0, ttl 64, id 52969, offset 0, flags [none], proto ICMP (1), length 84)
    172.17.5.1 > 172.17.8.0: ICMP echo reply, id 75, seq 2, length 64
09:40:30.899467 ip: (tos 0x0, ttl 61, id 16091, offset 0, flags [DF], proto ICMP (1), length 84)
    172.17.8.0 > 172.17.5.1: ICMP echo request, id 75, seq 3, length 64
09:40:30.899500 ip: (tos 0x0, ttl 64, id 53127, offset 0, flags [none], proto ICMP (1), length 84)
    172.17.5.1 > 172.17.8.0: ICMP echo reply, id 75, seq 3, length 64

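As the captures show, the flannel0 interfaces on the two machines behave as if they were directly connected. In reality, these are the packets that flanneld has unpacked and forwarded to flannel0 on each side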
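
The wrap and unwrap that flanneld performs can be imitated on a single machine. The sketch below sends a stand-in "inner packet" as a udp payload over loopback and checks that the receiver gets the same bytes back. It is only an illustration of carrying a packet as a udp payload: there is no TUN device, the port is chosen by the OS rather than being 8285, and `tunnel_roundtrip` is a hypothetical name.

```python
import socket

def tunnel_roundtrip(inner_packet: bytes) -> bytes:
    """Send inner_packet as a udp payload over loopback and return what arrives."""
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        receiver.bind(("127.0.0.1", 0))  # OS picks a free port
        receiver.settimeout(2.0)         # do not hang if the datagram is lost
        sender.sendto(inner_packet, receiver.getsockname())
        payload, _addr = receiver.recvfrom(65535)
        return payload
    finally:
        sender.close()
        receiver.close()

# 84 bytes, the size of the ICMP echo packets in the capture above (contents are dummy).
inner = b"\x45" + b"\x00" * 83
print(tunnel_roundtrip(inner) == inner)
```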

root@ubuntu2:~# tcpdump -e -n -v -i enp0s3
09:45:31.075384 08:00:27:c5:a1:4f > 08:00:27:50:03:fc, ethertype IPv4 (0x0800), length 126: (tos 0x0, ttl 64, id 16455, offset 0, flags [DF], proto UDP (17), length 112)
    192.168.43.222.8285 > 192.168.43.161.8285: UDP, length 84
09:45:31.075872 08:00:27:50:03:fc > 08:00:27:c5:a1:4f, ethertype IPv4 (0x0800), length 126: (tos 0x0, ttl 64, id 50966, offset 0, flags [DF], proto UDP (17), length 112)
    192.168.43.161.8285 > 192.168.43.222.8285: UDP, length 84
09:45:31.076561 08:00:27:c5:a1:4f > 38:f9:d3:2e:a1:6f, ethertype IPv4 (0x0800), length 166: (tos 0x10, ttl 64, id 62105, offset 0, flags [DF], proto TCP (6), length 152)
09:45:31.076761 38:f9:d3:2e:a1:6f > 08:00:27:c5:a1:4f, ethertype IPv4 (0x0800), length 66: (tos 0x48, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 52)
09:45:32.097916 08:00:27:c5:a1:4f > 08:00:27:50:03:fc, ethertype IPv4 (0x0800), length 126: (tos 0x0, ttl 64, id 16597, offset 0, flags [DF], proto UDP (17), length 112)
    192.168.43.222.8285 > 192.168.43.161.8285: UDP, length 84
09:45:32.098376 08:00:27:50:03:fc > 08:00:27:c5:a1:4f, ethertype IPv4 (0x0800), length 126: (tos 0x0, ttl 64, id 51097, offset 0, flags [DF], proto UDP (17), length 112)
    192.168.43.161.8285 > 192.168.43.222.8285: UDP, length 84
09:45:32.099075 08:00:27:c5:a1:4f > 38:f9:d3:2e:a1:6f, ethertype IPv4 (0x0800), length 166: (tos 0x10, ttl 64, id 62106, offset 0, flags [DF], proto TCP (6), length 152)
root@ubuntu1:~# tcpdump -e -n -v -i enp0s3
09:44:14.088291 08:00:27:50:03:fc > 08:00:27:c5:a1:4f, ethertype IPv4 (0x0800), length 126: (tos 0x0, ttl 64, id 33452, offset 0, flags [DF], proto UDP (17), length 112)
    192.168.43.161.8285 > 192.168.43.222.8285: UDP, length 84
09:44:15.114805 08:00:27:c5:a1:4f > 08:00:27:50:03:fc, ethertype IPv4 (0x0800), length 126: (tos 0x0, ttl 64, id 1395, offset 0, flags [DF], proto UDP (17), length 112)
    192.168.43.222.8285 > 192.168.43.161.8285: UDP, length 84
09:44:15.114923 08:00:27:50:03:fc > 08:00:27:c5:a1:4f, ethertype IPv4 (0x0800), length 126: (tos 0x0, ttl 64, id 33515, offset 0, flags [DF], proto UDP (17), length 112)
    192.168.43.161.8285 > 192.168.43.222.8285: UDP, length 84
09:44:16.132893 08:00:27:c5:a1:4f > 08:00:27:50:03:fc, ethertype IPv4 (0x0800), length 126: (tos 0x0, ttl 64, id 1617, offset 0, flags [DF], proto UDP (17), length 112)
    192.168.43.222.8285 > 192.168.43.161.8285: UDP, length 84
09:44:16.133017 08:00:27:50:03:fc > 08:00:27:c5:a1:4f, ethertype IPv4 (0x0800), length 126: (tos 0x0, ttl 64, id 33662, offset 0, flags [DF], proto UDP (17), length 112)
    192.168.43.161.8285 > 192.168.43.222.8285: UDP, length 84
09:44:17.156818 08:00:27:c5:a1:4f > 08:00:27:50:03:fc, ethertype IPv4 (0x0800), length 126: (tos 0x0, ttl 64, id 1872, offset 0, flags [DF], proto UDP (17), length 112)

Capturing on the physical interfaces shows that the real overlay packets are carried by flanneld over port 8285.
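
The lengths in the two captures line up with plain header arithmetic, which is a quick way to confirm that the whole inner IP packet travels as the udp payload. The sketch below assumes an Ethernet II frame without a VLAN tag and an IPv4 header without options; the function names are made up for illustration.

```python
ETH_HEADER = 14  # Ethernet II header, no VLAN tag
IP_HEADER = 20   # IPv4 header without options
UDP_HEADER = 8

def encapsulated_sizes(inner_ip_len: int):
    """Return (udp payload, outer IP length, frame length) for an inner packet."""
    udp_payload = inner_ip_len  # the inner IP packet is carried unchanged
    outer_ip_len = IP_HEADER + UDP_HEADER + udp_payload
    frame_len = ETH_HEADER + outer_ip_len
    return udp_payload, outer_ip_len, frame_len

def tunnel_mtu(link_mtu: int) -> int:
    """MTU left for inner packets once the outer IP and udp headers are paid for."""
    return link_mtu - IP_HEADER - UDP_HEADER

print(encapsulated_sizes(84))  # (84, 112, 126)
print(tunnel_mtu(1500))        # 1472
```

The 84-byte ICMP packet seen on flannel0 shows up on enp0s3 as "UDP, length 84" inside an IP packet of length 112 and a frame of length 126, and the same 28 bytes of overhead explain why flannel0 and FLANNEL_MTU report 1472 on a 1500-byte link.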

Summary

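The article "k8s 网络解决方案" (k8s networking solutions) includes benchmarks showing that udp overlay is the worst-performing implementation, so it is not used in production. Next up is testing the overlay implemented with vxlan.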

Link to this article: https://www.haomeiwen.com/subject/ohdkoctx.html