Problem description
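When running Zuul 1.x, a "failed to respond" error shows up occasionally. The underlying exception is HttpClient's NoHttpResponseException.
HttpClient itself can also throw this exception from time to time when it is used with a connection pool.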
Cause
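A major shortcoming of the classic blocking I/O model is that a network socket can react to I/O events only while it is blocked in an I/O operation. Once a connection has been released back to the connection manager it can be kept alive, but it cannot monitor the state of the socket or react to any I/O events. If the connection is closed on the server side, the client side cannot detect the change in connection state and cannot respond by closing the socket on its own end.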
Solution
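The fix is to shorten how long idle connections are kept alive. In SimpleHostRoutingFilter.java, Zuul uses the connectionManagerTimer timer to close connections that have exceeded their time-to-live. By default the first check runs after a 30s delay, and the check then repeats every 5s.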
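The timer-driven eviction described above can be sketched with the JDK alone. This is a minimal illustration, not Zuul's code: the class ExpiringPool and its methods are hypothetical stand-ins for the real connection manager, and only the 30s delay and 5s period come from the text.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Timer;
import java.util.TimerTask;

// Hypothetical stand-in for a pooled connection manager with a time-to-live.
class ExpiringPool {
    private final long timeToLiveMillis;
    // connection id -> creation time in milliseconds
    private final Map<String, Long> createdAt = new LinkedHashMap<>();

    ExpiringPool(long timeToLiveMillis) {
        this.timeToLiveMillis = timeToLiveMillis;
    }

    synchronized void add(String id, long nowMillis) {
        createdAt.put(id, nowMillis);
    }

    // Drops every connection whose age has reached the time-to-live
    // and returns how many were dropped.
    synchronized int closeExpired(long nowMillis) {
        int closed = 0;
        Iterator<Map.Entry<String, Long>> it = createdAt.entrySet().iterator();
        while (it.hasNext()) {
            if (nowMillis - it.next().getValue() >= timeToLiveMillis) {
                it.remove();
                closed++;
            }
        }
        return closed;
    }

    synchronized int size() {
        return createdAt.size();
    }

    // Same schedule as described in the text: first run after 30s, then every 5s.
    Timer startEvictionTimer() {
        Timer timer = new Timer("connection-evictor", true); // daemon thread
        timer.schedule(new TimerTask() {
            @Override
            public void run() {
                closeExpired(System.currentTimeMillis());
            }
        }, 30_000L, 5_000L);
        return timer;
    }
}
```

With a 180000 ms time-to-live, a connection created at time 0 is dropped by the first check that runs at or after 180000 ms, so in the worst case it lingers for one extra 5s period.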
Zuul acts as the client that connects to services proxied by nginx. The relevant settings were:
zuul.host.max-per-route-connections=600
zuul.host.socket-timeout-millis=10000
zuul.host.connect-timeout-millis=10000
zuul.host.time-to-live=600000
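nginx, however, was configured with keepalive_timeout=180s, so once a connection has been idle for 180s the server has already closed it. When HttpClient then takes that connection from the pool, a NoHttpResponseException can occur.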
zuul.host.time-to-live should be set to a value <= nginx's keepalive_timeout.
In addition, zuul.host.max-per-route-connections should be sized according to actual load and should not be set too high.
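The time-to-live rule above can be expressed as a small check. This is only a sketch: KeepAliveCheck and ttlWithinKeepAlive are hypothetical names, and it assumes zuul.host.time-to-live is given in milliseconds (as the values above suggest) while keepalive_timeout is given in seconds.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical helper, not part of Zuul or HttpClient.
class KeepAliveCheck {
    // True when the client-side time-to-live is positive and does not exceed
    // the server-side keep-alive timeout. A non-positive TTL is treated as
    // "no limit" and therefore as unsafe.
    static boolean ttlWithinKeepAlive(long ttlMillis, long keepAliveSeconds) {
        return ttlMillis > 0 && ttlMillis <= TimeUnit.SECONDS.toMillis(keepAliveSeconds);
    }
}
```

For the settings in this post, ttlWithinKeepAlive(600000, 180) is false, because 600000 ms is 10 minutes against a 3 minute keep-alive, while a value such as 180000 passes.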
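Testing also showed that the error only occurs occasionally, because HttpClient uses several internal mechanisms to make sure a request can still go through on a new connection after the old one has been dropped. In some special cases, however, the socket write on a dead connection completes without raising any exception. HttpClient cannot detect this failure, so it ends up with no response to parse.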
See the original explanation:
Most likely persistent connections that are kept alive by the connection manager become stale. That is, the target server shuts down the connection on its end without HttpClient being able to react to that event, while the connection is being idle, thus rendering the connection half-closed or 'stale'. Usually this is not a problem. HttpClient employs several techniques to verify connection validity upon its lease from the pool. Even if the stale connection check is disabled and a stale connection is used to transmit a request message the request execution usually fails in the write operation with SocketException and gets automatically retried. However under some circumstances the write operation can terminate without an exception and the subsequent read operation returns -1 (end of stream). In this case HttpClient has no other choice but to assume the request succeeded but the server failed to respond most likely due to an unexpected error on the server side. The simplest way to remedy the situation is to evict expired connections and connections that have been idle longer than, say, 1 minute from the pool after a period of inactivity.
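The "write succeeds, read returns -1" case from the quote can be reproduced on loopback with plain JDK sockets. This is a simplified simulation, not HttpClient code: the class StaleConnectionDemo is hypothetical, and the server only half-closes its side (shutdownOutput) so that the outcome is deterministic, whereas a real server closing an idle keep-alive connection would close the socket entirely.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical demo class; simulates a connection that went stale while idle.
class StaleConnectionDemo {

    // Returns what the client's first read yields after the server half-closed.
    static int readAfterServerHalfClose() {
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress())) {
            CountDownLatch halfClosed = new CountDownLatch(1);
            Thread serverThread = new Thread(() -> {
                try (Socket peer = server.accept()) {
                    peer.shutdownOutput();            // server ends its side of the stream
                    halfClosed.countDown();
                    InputStream in = peer.getInputStream();
                    while (in.read() != -1) { }       // drain until the client closes
                } catch (IOException e) {
                    halfClosed.countDown();           // do not leave the client waiting
                }
            });
            serverThread.setDaemon(true);
            serverThread.start();

            int firstRead;
            try (Socket client = new Socket(server.getInetAddress(), server.getLocalPort())) {
                // The client is "idle" here and does not notice the half-close.
                if (!halfClosed.await(5, TimeUnit.SECONDS)) {
                    throw new IllegalStateException("server did not half-close in time");
                }
                OutputStream out = client.getOutputStream();
                out.write("GET / HTTP/1.1\r\nHost: localhost\r\n\r\n"
                        .getBytes(StandardCharsets.US_ASCII));
                out.flush();                          // the write completes without an exception
                firstRead = client.getInputStream().read(); // end of stream instead of a response
            }
            serverThread.join(5000);
            return firstRead;
        } catch (IOException | InterruptedException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Calling StaleConnectionDemo.readAfterServerHalfClose() shows the client sending its request without error and then reading end of stream, which is the situation HttpClient reports as NoHttpResponseException.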