Once, running docker ps on a server failed with the following error:
Get http:///var/run/docker.sock/v1.19/containers/json: dial unix /var/run/docker.sock: resource temporarily unavailable. Are you trying to connect to a TLS-enabled daemon without TLS?
Running docker exec -it mysql bash failed with the same kind of error:
Post http:///var/run/docker.sock/v1.19/containers/mysql/exec: dial unix /var/run/docker.sock: resource temporarily unavailable. Are you trying to connect to a TLS-enabled daemon without TLS?
Next, look at the Docker log at /var/log/docker:
time="2020-04-21T19:31:49.322504775+08:00" level=error msg="collecting system cpu usage: open /proc/stat: too many open files"
2020/04/21 19:31:49 http: Accept error: accept unix /var/run/docker.sock: too many open files; retrying in 1s
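Based on the "too many open files" messages in the log, the initial suspicion was that too many processes were accessing /var/run/docker.sock at the same time, leaving the daemon without free file descriptors to accept new connections on it.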
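One way to check this suspicion is to compare the number of file descriptors a process has open with its "Max open files" limit. The following is a minimal sketch, assuming a Linux host with /proc mounted and enough privileges to read the target process's entries. The helper names parse_max_open_files and fd_usage are hypothetical and are not part of any Docker tooling.

```python
import os


def parse_max_open_files(limits_text):
    """Parse the text of /proc/<pid>/limits and return (soft, hard) for
    'Max open files'. A value of 'unlimited' is returned as None."""
    for line in limits_text.splitlines():
        if line.startswith("Max open files"):
            # Remaining columns are: soft limit, hard limit, units
            soft, hard = line[len("Max open files"):].split()[:2]
            to_int = lambda v: int(v) if v.isdigit() else None
            return to_int(soft), to_int(hard)
    return None


def fd_usage(pid):
    """Return (open_fd_count, (soft, hard)) for a process.
    Reading another user's /proc/<pid>/fd usually requires root."""
    with open(f"/proc/{pid}/limits") as f:
        limits = parse_max_open_files(f.read())
    open_fds = len(os.listdir(f"/proc/{pid}/fd"))
    return open_fds, limits
```

Calling fd_usage with the Docker daemon's PID would show how close the daemon is to its limit. An open count at or near the soft limit is consistent with the "too many open files" errors in the log.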
To dig further, check which processes have docker.sock open:
lsof /var/run/docker.sock
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
docker 1587 root 0r unix 0xffff8801fe3e3040 0t0 143125 /var/run/docker.sock
docker 1587 root 4u unix 0xffff88041856f840 0t0 10181 /var/run/docker.sock
docker 1587 root 12u unix 0xffff8804166ad540 0t0 143025 /var/run/docker.sock
docker 1587 root 13u unix 0xffff880417497040 0t0 142902 /var/run/docker.sock
....
The output contains more than 900 entries for docker.sock.
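With this many rows it helps to group them by process rather than read them one at a time. Below is a small sketch, assuming the lsof output has been captured as text in the default column layout shown above. The function name count_socket_holders is hypothetical.

```python
from collections import Counter


def count_socket_holders(lsof_output):
    """Count lsof rows per (command, pid) pair.
    Header rows, blank lines and elided lines are skipped."""
    counts = Counter()
    for line in lsof_output.splitlines():
        fields = line.split()
        # A data row has COMMAND in column 0 and a numeric PID in column 1
        if len(fields) < 2 or not fields[1].isdigit():
            continue
        counts[(fields[0], int(fields[1]))] += 1
    return counts
```

Applied to the excerpt above, every row belongs to the docker process with PID 1587. That is the daemon side of each connection, so the clients opening those connections still have to be identified separately.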
ps -Af | grep docker
A few entries in the ps output look suspicious:
root 93121 1528 0 20:43 ? 00:00:00 /usr/bin/python /etc/zabbix/zabbix_agentd.d/docker.py trial_redis NetIO
root 93122 93121 0 20:43 ? 00:00:00 docker stats trial_redis --no-stream --format {{.NetIO}}
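That is the cause: docker.py, a monitoring script invoked by zabbix, has a bug and keeps calling docker stats to check container status. Some of the containers had exited, and the script does not handle that case correctly. It keeps retrying against them, so docker.sock ends up being opened too many times.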
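The original docker.py is not shown, so the following is only a sketch of how such a check could avoid piling up connections. It is not the actual script or its fix. It assumes Python 3.7+ and a docker CLI that supports inspect --format and stats --format. The function name container_net_io and the injectable run parameter are hypothetical. The idea is to confirm the container is running before asking for stats, to bound every call with a timeout, and to return instead of retrying.

```python
import subprocess


def container_net_io(name, run=subprocess.run, timeout=5):
    """Return the NetIO string for a running container, or None if the
    container is missing, has exited, or docker does not answer in time."""
    try:
        state = run(
            ["docker", "inspect", "--format", "{{.State.Running}}", name],
            capture_output=True, text=True, timeout=timeout,
        )
        if state.returncode != 0 or state.stdout.strip() != "true":
            return None  # container missing or exited: report nothing, no retry
        stats = run(
            ["docker", "stats", name, "--no-stream", "--format", "{{.NetIO}}"],
            capture_output=True, text=True, timeout=timeout,
        )
    except (subprocess.TimeoutExpired, OSError):
        # subprocess.run kills the child on timeout, so no client lingers
        return None
    if stats.returncode != 0:
        return None
    return stats.stdout.strip() or None
```

A zabbix item could call container_net_io("trial_redis") once per polling interval and treat None as "no data". Leaving retries to the next scheduled poll keeps the number of concurrent connections to docker.sock bounded.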