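I. What is TCP retransmission?
After sending a segment, the sender starts a timer. If no ACK for that segment arrives before the timer expires, the segment is retransmitted.
TCP retransmission rate: the ratio of retransmitted segments to all segments sent.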
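II. Possible causes of a high TCP retransmission rate
Retransmissions usually mean packets are being lost in transit. Troubleshooting generally covers three places: the client's network, the server's network, and the links in between.
1. The client machine has a network problem.
2. The server's NIC is saturated and dropping packets; check the error counters in the ifconfig output.
3. An intermediate link is congested, for example a switch uplink or a core switch link; check the traffic on each link in turn.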
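For point 2, the per-interface error and drop counters can also be read directly from sysfs. The following is a minimal sketch, assuming a Linux host that exposes /sys/class/net/IFACE/statistics; the function name nic_error_counters and its optional base-directory argument are illustrative and not part of the original article.

```shell
# Print cumulative error and drop counters for one network interface.
# Usage: nic_error_counters IFACE [SYSFS_NET_DIR]
# SYSFS_NET_DIR defaults to /sys/class/net; the second argument is an
# illustrative hook for pointing the function at a copy of that tree.
nic_error_counters() {
    local iface=$1
    local base=${2:-/sys/class/net}
    local stats_dir="$base/$iface/statistics"
    local name
    if [ ! -d "$stats_dir" ]; then
        echo "no statistics directory for interface: $iface" >&2
        return 1
    fi
    for name in rx_errors tx_errors rx_dropped tx_dropped; do
        # Each file holds a single counter value.
        printf '%s\t%s\n' "$name" "$(cat "$stats_dir/$name")"
    done
}
```

Calling nic_error_counters eth0 (eth0 is a placeholder interface name) twice, some time apart, shows whether the counters are still growing.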
III. Packet statistics
# netstat -s
# which nstat
# rpm -qf /usr/sbin/nstat
iproute-4.11.0-14.el7.x86_64
# nstat -a | grep -i Segs
# cat /proc/net/snmp
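/proc/net/snmp lists each protocol as a header line of counter names followed by a line of values. The sketch below pairs the two Tcp lines so that counters can be looked up by name rather than by field position. The function name tcp_snmp_counters and its optional file argument are illustrative assumptions.

```shell
# Print "name value" pairs for the Tcp rows of a /proc/net/snmp style file.
# Usage: tcp_snmp_counters [FILE]   (FILE defaults to /proc/net/snmp)
tcp_snmp_counters() {
    awk '
        $1 == "Tcp:" && !have_header {
            # The first Tcp: line carries the counter names.
            for (i = 2; i <= NF; i++) name[i] = $i
            have_header = 1
            next
        }
        $1 == "Tcp:" {
            # The second Tcp: line carries the values, in the same order.
            for (i = 2; i <= NF; i++) print name[i], $i
        }
    ' "${1:-/proc/net/snmp}"
}
```

For example, tcp_snmp_counters | grep -E 'InSegs|OutSegs|RetransSegs' prints only the segment counters.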
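These counters are cumulative totals since boot, so a single reading means little on its own.
Instead, measure the change over an interval and watch the following indicators:
1. (Send) TCP segment retransmission ratio: ΔRetransSegs / ΔOutSegs;
The lower the better. Anything above 20% deserves attention (the right threshold depends on your environment);
2. (Send) RST segment ratio: ΔOutRsts / ΔOutSegs;
The lower the better; it should generally stay below 1%;
3. (Receive) errored segment ratio: ΔInErrs / ΔInSegs;
The lower the better; it should generally stay below 1%, and the share caused by checksum failures should be lower still;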
# awk 'BEGIN {OFS=" "} $1 ~ /Tcp:/ && $2 !~ /RtoAlgorithm/ {print "InSegs\t",$11,"\nOutSegs\t",$12,"\nRetransSegs\t",$13,"\nPctReTrans\t",($13/$12*100)}' /proc/net/snmp
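The three interval ratios listed above can be sketched as a function that compares two saved copies of /proc/net/snmp. This is a minimal sketch under stated assumptions: the function name tcp_delta_ratios is hypothetical, and the field positions ($11 InSegs, $12 OutSegs, $13 RetransSegs, $14 InErrs, $15 OutRsts) assume the usual Linux column order for the Tcp line, so verify them against the header line on your kernel.

```shell
# Print the three ratios, as percentages, between two saved copies of
# /proc/net/snmp. Usage: tcp_delta_ratios BEFORE_FILE AFTER_FILE
tcp_delta_ratios() {
    awk '
        # Value line only: the header line has "RtoAlgorithm" in field 2.
        $1 == "Tcp:" && $2 != "RtoAlgorithm" {
            n++
            in_segs[n] = $11; out_segs[n] = $12; retrans[n] = $13
            in_errs[n] = $14; out_rsts[n] = $15
        }
        # Percentage with a guard for an interval with no traffic.
        function pct(num, den) { return den > 0 ? num / den * 100 : 0 }
        END {
            d_in = in_segs[2] - in_segs[1]
            d_out = out_segs[2] - out_segs[1]
            printf "RetransPct\t%.2f\n", pct(retrans[2] - retrans[1], d_out)
            printf "OutRstPct\t%.2f\n", pct(out_rsts[2] - out_rsts[1], d_out)
            printf "InErrPct\t%.2f\n", pct(in_errs[2] - in_errs[1], d_in)
        }
    ' "$1" "$2"
}
```

A possible way to use it: copy /proc/net/snmp to a temporary file, wait for the sampling interval, copy it again to a second file, then pass both files to tcp_delta_ratios.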
IV. Retransmission rate script
# check_tcp_retrans_rate.sh
#########################################################
#!/bin/bash
# Sample OutSegs and RetransSegs twice, 60 seconds apart, and print the
# retransmission rate for that interval as a percentage.

# Print "OutSegs RetransSegs" from the Tcp value line of /proc/net/snmp.
read_tcp_counters() {
    awk '$1 == "Tcp:" && $2 != "RtoAlgorithm" {print $12, $13}' /proc/net/snmp
}

read -r out_segs_1 retrans_segs_1 <<< "$(read_tcp_counters)"
printf 'OutSegs\t%s\nRetransSegs\t%s\n' "$out_segs_1" "$retrans_segs_1"
sleep 60
read -r out_segs_2 retrans_segs_2 <<< "$(read_tcp_counters)"

out_segs=$((out_segs_2 - out_segs_1))
retrans_segs=$((retrans_segs_2 - retrans_segs_1))

# Shell arithmetic is integer-only, so divide in awk; an idle interval reports 0.
awk -v r="$retrans_segs" -v o="$out_segs" \
    'BEGIN {if (o > 0) printf "%.2f%%\n", r / o * 100; else print "0.00%"}'
#########################################################
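For monitoring, a computed rate can be compared against a threshold. The helper below is a hedged sketch: the name retrans_rate_exceeds is hypothetical, and the default of 20 is only the starting figure mentioned earlier in this article, to be tuned per environment. The comparison is done in awk because shell arithmetic cannot compare fractional values.

```shell
# Succeed (exit status 0) when RATE_PCT is strictly above THRESHOLD_PCT.
# Usage: retrans_rate_exceeds RATE_PCT [THRESHOLD_PCT]
# THRESHOLD_PCT defaults to 20; a trailing "%" on either value is ignored
# because awk converts only the leading numeric part.
retrans_rate_exceeds() {
    awk -v rate="$1" -v limit="${2:-20}" 'BEGIN { exit !(rate + 0 > limit + 0) }'
}
```

For example, retrans_rate_exceeds 25 && echo "TCP retransmission rate is high" prints the warning, while a rate of 5 prints nothing.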