A Bloodbath Triggered by Gateway Timeout Waits


Author: 04040d1599e6 | Published 2018-11-10 16:40

    The gateway is Zuul.

    What happened:

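    October 24, 2018 (1024). It should have been a peaceful, quiet Wednesday, a holiday for enjoying a massage from the pretty masseuses~
    But before the workday had even started, the nightmare began.
    I got to the office at 8:30 and was idly browsing Kr when the business chat group suddenly reported that users' requests from the app were just spinning.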

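    Half an hour later we had located the problem: the API gateway service was down.
    The gateway service was down... the gateway service... down...
    A quick jstack showed 250 threads running and 100 threads suspended.
    The trace logs showed that more than half of the requests had taken over 10 minutes. 10 minutes??? WTF? Are you kidding?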

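    In the end we traced it to calls to one particular service timing out like crazy, which left every other service waiting for thread resources.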

    Analysis


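    The first thing I could not understand was why so many calls ran for 10 minutes (measured in the call log from the start of the call to its end) and still timed out, given that the gateway has a 60 s timeout configured.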

    zuul.host.socket-timeout-millis=60000
    zuul.host.connect-timeout-millis=2000
    

    The service's nginx also has a timeout configured: 75 s.
    But the times recorded by the program cannot lie. Something else had to be eating that time. Where exactly?

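    After frantically tracing through the code, I found it.
    The time was being spent waiting. That's right: threads were stuck waiting (as the trace below shows, for a connection from the pool).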
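
To make this kind of hidden wait visible, it helps to time the wait for a pooled connection separately from the backend call itself. The following is only a minimal sketch using the JDK: the `Semaphore` stands in for a one-connection pool, and `LeaseTimingDemo` and `timedRequest` are hypothetical names, not part of Zuul or HttpClient.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

/** Toy model: a one-connection "pool" and a caller that times the lease separately from the call. */
final class LeaseTimingDemo {

    /** Returns {waitedMillis, callMillis} for one simulated request. */
    static long[] timedRequest(long poolBusyMillis, long backendMillis) {
        Semaphore pool = new Semaphore(1);
        pool.acquireUninterruptibly();        // another request already holds the only connection

        Thread holder = new Thread(() -> {
            sleep(poolBusyMillis);            // ...and keeps it for this long
            pool.release();
        });
        holder.start();

        long t0 = System.nanoTime();
        pool.acquireUninterruptibly();        // unbounded wait, like a lease without a deadline
        long t1 = System.nanoTime();
        sleep(backendMillis);                 // the actual backend call
        long t2 = System.nanoTime();
        pool.release();

        return new long[] {
            TimeUnit.NANOSECONDS.toMillis(t1 - t0),   // time spent waiting for a connection
            TimeUnit.NANOSECONDS.toMillis(t2 - t1)    // time spent in the call itself
        };
    }

    private static void sleep(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

A log line that only records start and end of the whole request adds both numbers together, which is how a call with a 60 s socket timeout can still appear to take far longer.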

    SimpleHostRoutingFilter->run()->forward()->forwardRequest()->CloseableHttpClient.execute()->InternalHttpClient.doExecute()
    InternalHttpClient creates or obtains a RequestConfig; if it cannot obtain one, it falls back to the HttpClient's defaultConfig via setupContext,
    as follows:

            try {
                final HttpRequestWrapper wrapper = HttpRequestWrapper.wrap(request, target);
                final HttpClientContext localcontext = HttpClientContext.adapt(
                        context != null ? context : new BasicHttpContext());
                RequestConfig config = null;
                if (request instanceof Configurable) {
                    config = ((Configurable) request).getConfig();
                }
                // If the request carries no config, build one from the request params on top of defaultConfig
                if (config == null) {
                    final HttpParams params = request.getParams();
                    if (params instanceof HttpParamsNames) {
                        if (!((HttpParamsNames) params).getNames().isEmpty()) {
                            config = HttpClientParamConfig.getRequestConfig(params, this.defaultConfig);
                        }
                    } else {
                        config = HttpClientParamConfig.getRequestConfig(params, this.defaultConfig);
                    }
                }
                if (config != null) {
                    localcontext.setRequestConfig(config);
                }
                // Copy the HttpClient's defaults into the context for anything still missing
                setupContext(localcontext);
                final HttpRoute route = determineRoute(target, wrapper, localcontext);
                return this.execChain.execute(route, wrapper, localcontext, execAware);
            } catch (final HttpException httpException) {
                throw new ClientProtocolException(httpException);
            }
    
        private void setupContext(final HttpClientContext context) {
            if (context.getAttribute(HttpClientContext.TARGET_AUTH_STATE) == null) {
                context.setAttribute(HttpClientContext.TARGET_AUTH_STATE, new AuthState());
            }
            if (context.getAttribute(HttpClientContext.PROXY_AUTH_STATE) == null) {
                context.setAttribute(HttpClientContext.PROXY_AUTH_STATE, new AuthState());
            }
            if (context.getAttribute(HttpClientContext.AUTHSCHEME_REGISTRY) == null) {
                context.setAttribute(HttpClientContext.AUTHSCHEME_REGISTRY, this.authSchemeRegistry);
            }
            if (context.getAttribute(HttpClientContext.COOKIESPEC_REGISTRY) == null) {
                context.setAttribute(HttpClientContext.COOKIESPEC_REGISTRY, this.cookieSpecRegistry);
            }
            if (context.getAttribute(HttpClientContext.COOKIE_STORE) == null) {
                context.setAttribute(HttpClientContext.COOKIE_STORE, this.cookieStore);
            }
            if (context.getAttribute(HttpClientContext.CREDS_PROVIDER) == null) {
                context.setAttribute(HttpClientContext.CREDS_PROVIDER, this.credentialsProvider);
            }
            // Key point: fall back to the client's default config
            if (context.getAttribute(HttpClientContext.REQUEST_CONFIG) == null) {
                context.setAttribute(HttpClientContext.REQUEST_CONFIG, this.defaultConfig);
            }
        }
    

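    Because the context passed in is null, a BasicHttpContext is created.
    The call then goes through
    RedirectExec.execute
    RetryExec.execute
    ProtocolExec.execute
    and finally MainClientExec.execute,
    which leases a connection from the connection pool: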

            try {
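                // Read the connection-request timeout (how long to wait for a pooled connection)
                final int timeout = config.getConnectionRequestTimeout();
                // Lease the connection; any non-positive timeout is passed on as 0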
                managedConn = connRequest.get(timeout > 0 ? timeout : 0, TimeUnit.MILLISECONDS);
            } catch(final InterruptedException interrupted) {
                Thread.currentThread().interrupt();
                throw new RequestAbortedException("Request aborted", interrupted);
            } catch(final ExecutionException ex) {
                Throwable cause = ex.getCause();
                if (cause == null) {
                    cause = ex;
                }
                throw new RequestAbortedException("Request execution failed", cause);
            }
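
The effect of that `timeout > 0 ? timeout : 0` expression can be modelled in a few lines. This is a sketch, not HttpClient code: `PoolWaitModel` and its methods are hypothetical, and the value `-1` for an unset connection-request timeout is an assumption based on the documented default of HttpClient 4.x's `RequestConfig`.

```java
/** Models how a connection-request timeout turns into the value handed to the pool. */
final class PoolWaitModel {

    /** Assumption: an unset connection-request timeout is -1 in HttpClient 4.x's RequestConfig. */
    static final int UNSET = -1;

    /** A per-request value wins; otherwise the client default applies. null means "not configured". */
    static int resolve(Integer perRequestMillis, int clientDefaultMillis) {
        return perRequestMillis != null ? perRequestMillis : clientDefaultMillis;
    }

    /** Mirrors the expression quoted from MainClientExec: timeout > 0 ? timeout : 0. */
    static int valuePassedToPool(int configuredMillis) {
        return configuredMillis > 0 ? configuredMillis : 0;
    }

    /** A value of 0 handed to the pool means the lease has no deadline. */
    static boolean waitsForever(int configuredMillis) {
        return valuePassedToPool(configuredMillis) == 0;
    }
}
```

Under these assumptions, a client that sets only the socket and connect timeouts ends up with `waitsForever(resolve(null, UNSET))` being true, whereas any positive value such as 2000 gives a bounded wait.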
    

    Leasing the connection:
    PoolingHttpClientConnectionManager.leaseConnection()
    AbstractConnPool.get

                @Override
                public E get(final long timeout, final TimeUnit tunit) throws InterruptedException, ExecutionException, TimeoutException {
                    if (entry != null) {
                        return entry;
                    }
                    synchronized (this) {
                        try {
                            for (;;) {
                                // Block until a pool entry can be leased
                                final E leasedEntry = getPoolEntryBlocking(route, state, timeout, tunit, this);
                                if (validateAfterInactivity > 0)  {
                                    if (leasedEntry.getUpdated() + validateAfterInactivity <= System.currentTimeMillis()) {
                                        if (!validate(leasedEntry)) {
                                            leasedEntry.close();
                                            release(leasedEntry, false);
                                            continue;
                                        }
                                    }
                                }
                                entry = leasedEntry;
                                done = true;
                                onLease(entry);
                                if (callback != null) {
                                    callback.completed(entry);
                                }
                                return entry;
                            }
                        } catch (IOException ex) {
                            done = true;
                            if (callback != null) {
                                callback.failed(ex);
                            }
                            throw new ExecutionException(ex);
                        }
                    }
                }
    
            };
    
    

    AbstractConnPool.getPoolEntryBlocking
    As the name suggests, this method blocks while obtaining a pool resource.
    Heads up, here comes the important part.

        private E getPoolEntryBlocking(
                final T route, final Object state,
                final long timeout, final TimeUnit tunit,
                final Future<E> future) throws IOException, InterruptedException, TimeoutException {
    
            Date deadline = null;
            if (timeout > 0) {
                deadline = new Date (System.currentTimeMillis() + tunit.toMillis(timeout));
            }
            this.lock.lock();
            try {
                final RouteSpecificPool<T, C, E> pool = getPool(route);
                E entry;
                for (;;) {
                    Asserts.check(!this.isShutDown, "Connection pool shut down");
                    for (;;) {
                        entry = pool.getFree(state);
                        if (entry == null) {
                            break;
                        }
                        if (entry.isExpired(System.currentTimeMillis())) {
                            entry.close();
                        }
                        if (entry.isClosed()) {
                            this.available.remove(entry);
                            pool.free(entry, false);
                        } else {
                            break;
                        }
                    }
                    if (entry != null) {
                        this.available.remove(entry);
                        this.leased.add(entry);
                        onReuse(entry);
                        return entry;
                    }
    
                    // New connection is needed
                    final int maxPerRoute = getMax(route);
                    // Shrink the pool prior to allocating a new connection
                    final int excess = Math.max(0, pool.getAllocatedCount() + 1 - maxPerRoute);
                    if (excess > 0) {
                        for (int i = 0; i < excess; i++) {
                            final E lastUsed = pool.getLastUsed();
                            if (lastUsed == null) {
                                break;
                            }
                            lastUsed.close();
                            this.available.remove(lastUsed);
                            pool.remove(lastUsed);
                        }
                    }
    
                    if (pool.getAllocatedCount() < maxPerRoute) {
                        final int totalUsed = this.leased.size();
                        final int freeCapacity = Math.max(this.maxTotal - totalUsed, 0);
                        if (freeCapacity > 0) {
                            final int totalAvailable = this.available.size();
                            if (totalAvailable > freeCapacity - 1) {
                                if (!this.available.isEmpty()) {
                                    final E lastUsed = this.available.removeLast();
                                    lastUsed.close();
                                    final RouteSpecificPool<T, C, E> otherpool = getPool(lastUsed.getRoute());
                                    otherpool.remove(lastUsed);
                                }
                            }
                            final C conn = this.connFactory.create(route);
                            entry = pool.add(conn);
                            this.leased.add(entry);
                            return entry;
                        }
                    }
    
                    boolean success = false;
                    try {
                        if (future.isCancelled()) {
                            throw new InterruptedException("Operation interrupted");
                        }
                        pool.queue(future);
                        this.pending.add(future);
                        if (deadline != null) {
                            success = this.condition.awaitUntil(deadline);
                        } else {
                            this.condition.await();
                            success = true;
                        }
                        if (future.isCancelled()) {
                            throw new InterruptedException("Operation interrupted");
                        }
                    } finally {
                        // In case of 'success', we were woken up by the
                        // connection pool and should now have a connection
                        // waiting for us, or else we're shutting down.
                        // Just continue in the loop, both cases are checked.
                        pool.unqueue(future);
                        this.pending.remove(future);
                    }
                    // check for spurious wakeup vs. timeout
                    if (!success && (deadline != null && deadline.getTime() <= System.currentTimeMillis())) {
                        break;
                    }
                }
                throw new TimeoutException("Timeout waiting for connection");
            } finally {
                this.lock.unlock();
            }
        }
    

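    This code is a bit long, so let's go through the pool-lease logic piece by piece:
    1. The code first declares a deadline and then checks timeout. Pay attention to this timeout: deadline is assigned only if it is greater than zero. If it is 0, deadline is never assigned, which means deadline stays null.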

            Date deadline = null;
            if (timeout > 0) {
                // If the timeout is positive, set the deadline
                deadline = new Date (System.currentTimeMillis() + tunit.toMillis(timeout));
            }
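
The same rule can be isolated as a small pure function, which makes the three cases easy to check. This is a sketch: `DeadlineDemo.deadlineFor` is a hypothetical helper, and the current time is passed in as a parameter so the result is deterministic.

```java
import java.util.Date;
import java.util.concurrent.TimeUnit;

final class DeadlineDemo {

    /** Returns the absolute deadline, or null when the timeout is zero or negative (no deadline). */
    static Date deadlineFor(long timeout, TimeUnit unit, long nowMillis) {
        return timeout > 0 ? new Date(nowMillis + unit.toMillis(timeout)) : null;
    }
}
```

With `nowMillis = 1000`, a timeout of 2 seconds gives a deadline at 3000, while a timeout of 0 or -1 gives null, which is the case that leads to the unbounded wait.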
    
    

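    2. Enter the locked section. pool.getFree fetches a pool entry. If one is obtained and the check shows its connection has not been closed, the entry is returned directly.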

                    Asserts.check(!this.isShutDown, "Connection pool shut down");
                    for (;;) {
                        // Fetch a free entry from the pool
                        entry = pool.getFree(state);
                        if (entry == null) {
                            break;
                        }
                        // Close the entry if it has expired
                        if (entry.isExpired(System.currentTimeMillis())) {
                            entry.close();
                        }
                        if (entry.isClosed()) {
                            this.available.remove(entry);
                            pool.free(entry, false);
                        } else {
                            break;
                        }
                    }
                    if (entry != null) {
                        this.available.remove(entry);
                        this.leased.add(entry);
                        onReuse(entry);
                        return entry;
                    }
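
Stripped of the pool bookkeeping, the reuse loop amounts to "take free entries until one is still valid". Here is a simplified sketch of that idea with a plain `Deque`; `FreeListDemo`, `Entry`, and `freeListOf` are hypothetical names, and the real pool additionally closes the connection and tracks leased entries.

```java
import java.util.ArrayDeque;
import java.util.Deque;

final class FreeListDemo {

    /** A pooled entry that only knows when it expires. */
    static final class Entry {
        final long expiresAtMillis;

        Entry(long expiresAtMillis) {
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    /** Builds a free list from expiry timestamps, in order. */
    static Deque<Entry> freeListOf(long... expiries) {
        Deque<Entry> free = new ArrayDeque<>();
        for (long expiry : expiries) {
            free.addLast(new Entry(expiry));
        }
        return free;
    }

    /** Returns the first entry that has not expired, discarding expired ones; null if none is left. */
    static Entry lease(Deque<Entry> free, long nowMillis) {
        Entry entry;
        while ((entry = free.pollFirst()) != null) {
            if (entry.expiresAtMillis > nowMillis) {
                return entry;          // still valid: reuse it
            }
            // expired: drop it and look at the next free entry
        }
        return null;                   // nothing reusable, the caller must create or wait
    }
}
```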
    

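    3. If no entry was obtained, continue with the code below.
    4. Check whether the route has reached its configured maximum number of connections and whether a new one needs to be added. If so, the pool is shrunk before the new connection is added, and then the new entry is allocated and returned.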

                    // New connection is needed: look up the per-route maximum
                    final int maxPerRoute = getMax(route);
                    // Shrink the pool prior to allocating a new connection
                    final int excess = Math.max(0, pool.getAllocatedCount() + 1 - maxPerRoute);
                    if (excess > 0) {
                        for (int i = 0; i < excess; i++) {
                            final E lastUsed = pool.getLastUsed();
                            if (lastUsed == null) {
                                break;
                            }
                            lastUsed.close();
                            this.available.remove(lastUsed);
                            pool.remove(lastUsed);
                        }
                    }
    
                    if (pool.getAllocatedCount() < maxPerRoute) {
                        final int totalUsed = this.leased.size();
                        final int freeCapacity = Math.max(this.maxTotal - totalUsed, 0);
                        if (freeCapacity > 0) {
                            final int totalAvailable = this.available.size();
                            if (totalAvailable > freeCapacity - 1) {
                                if (!this.available.isEmpty()) {
                                    final E lastUsed = this.available.removeLast();
                                    lastUsed.close();
                                    final RouteSpecificPool<T, C, E> otherpool = getPool(lastUsed.getRoute());
                                    otherpool.remove(lastUsed);
                                }
                            }
                            final C conn = this.connFactory.create(route);
                            entry = pool.add(conn);
                            this.leased.add(entry);
                            return entry;
                        }
                    }
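
The decision whether a new connection may be opened boils down to two limits, the per-route maximum and the pool-wide maximum. A reduced sketch of that check, with the hypothetical helper `CapacityCheck.canOpenNewConnection` and purely illustrative numbers in the usage note:

```java
final class CapacityCheck {

    /**
     * True if a new connection may be created: the route is below its own limit
     * and the pool as a whole still has capacity that is not leased out.
     */
    static boolean canOpenNewConnection(int allocatedForRoute, int maxPerRoute,
                                        int leasedTotal, int maxTotal) {
        if (allocatedForRoute >= maxPerRoute) {
            return false;                                   // route limit reached
        }
        int freeCapacity = Math.max(maxTotal - leasedTotal, 0);
        return freeCapacity > 0;                            // total limit not yet reached
    }
}
```

For example, with a per-route limit of 20 and a total limit of 200, a route that already has 20 connections allocated gets `false` even if only 50 connections are leased pool-wide. That is the situation in which the caller falls through to the waiting code.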
    

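    5. If none of the above applies, the pool is used up and already at its maximum, so no resource can be obtained from it. All that is left is to wait...
    6. While waiting, deadline is checked: if deadline is not null, the thread awaits until that time. If it is null, the wait is unbounded and lasts until a resource becomes available.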

                    boolean success = false;
                    try {
                        if (future.isCancelled()) {
                            throw new InterruptedException("Operation interrupted");
                        }
                        pool.queue(future);
                        this.pending.add(future);
                        // Check whether a deadline was set
                        if (deadline != null) {
                            // With a deadline, wait until the deadline at most
                            success = this.condition.awaitUntil(deadline);
                        } else {
                           // Without a deadline, wait indefinitely: there is no timeout
                            this.condition.await();
                            success = true;
                        }
                        if (future.isCancelled()) {
                            throw new InterruptedException("Operation interrupted");
                        }
                    } finally {
                        // In case of 'success', we were woken up by the
                        // connection pool and should now have a connection
                        // waiting for us, or else we're shutting down.
                        // Just continue in the loop, both cases are checked.
                        pool.unqueue(future);
                        this.pending.remove(future);
                    }
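
The difference between the two branches can be reproduced with a plain `ReentrantLock` and `Condition` from the JDK. The class below is a deliberately tiny stand-in for the pool (`BoundedWaitDemo` is a hypothetical name): `Condition.awaitUntil` returns false once the deadline has passed, while `await()` only returns when another thread signals.

```java
import java.util.Date;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

final class BoundedWaitDemo {
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition available = lock.newCondition();
    private int free;                      // number of free "connections"

    BoundedWaitDemo(int free) {
        this.free = free;
    }

    /** Returns true if a connection was obtained; timeoutMillis <= 0 means wait without a deadline. */
    boolean lease(long timeoutMillis) {
        Date deadline = timeoutMillis > 0
                ? new Date(System.currentTimeMillis() + timeoutMillis)
                : null;
        lock.lock();
        try {
            while (free == 0) {
                if (deadline == null) {
                    available.await();     // unbounded: returns only after a signal
                } else {
                    boolean beforeDeadline = available.awaitUntil(deadline);
                    if (!beforeDeadline && free == 0) {
                        return false;      // deadline passed and still nothing free
                    }
                }
            }
            free--;
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            lock.unlock();
        }
    }

    /** Returns a connection and wakes up one waiter. */
    void release() {
        lock.lock();
        try {
            free++;
            available.signal();
        } finally {
            lock.unlock();
        }
    }
}
```

With one free connection, a first `lease(50)` succeeds, a second `lease(50)` gives up after about 50 ms, and after `release()` the next lease succeeds again. Calling `lease(0)` on an empty pool would block until some other thread calls `release()`.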
    

    Summary

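    At this point the picture is clear.
    1. One backend service was effectively down because of its calls to a third party, so every request the gateway sent to it timed out.
    2. Because this service receives a very large number of requests, the connection pool the gateway allots to it was exhausted and no connection could be obtained, so request threads kept piling up in the gateway.
    3. The number of gateway threads tied up by this service kept growing, so other services could not work properly either.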
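
The cascade in these three points can be reproduced in miniature with JDK primitives. The sketch below is a simulation under stated assumptions, not gateway code: two worker threads stand in for the gateway's request threads, a `Semaphore` with one permit stands in for the slow route's connection pool, and `GatewayStarvationDemo` and `fastRouteServed` are hypothetical names.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

final class GatewayStarvationDemo {

    /**
     * Returns true if a request for a healthy route is served within one second while the
     * slow route's only connection is stuck. leaseWaitMillis <= 0 means wait without a deadline.
     */
    static boolean fastRouteServed(long leaseWaitMillis) {
        ExecutorService workers = Executors.newFixedThreadPool(2);   // gateway worker threads
        Semaphore slowRoutePool = new Semaphore(1);                  // per-route connection limit
        slowRoutePool.acquireUninterruptibly();                      // the only connection hangs on the dead backend
        try {
            for (int i = 0; i < 2; i++) {
                workers.submit(() -> callSlowRoute(slowRoutePool, leaseWaitMillis));
            }
            Future<String> fast = workers.submit(() -> "ok");        // request for a healthy route
            try {
                return "ok".equals(fast.get(1, TimeUnit.SECONDS));
            } catch (TimeoutException starved) {
                return false;                                        // no worker thread was free
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            } catch (ExecutionException e) {
                return false;
            }
        } finally {
            workers.shutdownNow();                                   // interrupts workers still parked in acquire()
        }
    }

    /** Tries to lease a connection for the slow route; returns false if the lease was abandoned. */
    private static boolean callSlowRoute(Semaphore pool, long waitMillis) {
        try {
            if (waitMillis <= 0) {
                pool.acquire();                                      // unbounded wait
            } else if (!pool.tryAcquire(waitMillis, TimeUnit.MILLISECONDS)) {
                return false;                                        // give up and free the worker thread
            }
            pool.release();
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

In this model `fastRouteServed(0)` returns false because both workers are parked waiting for the slow route, while `fastRouteServed(50)` returns true because the workers give up after 50 ms and become available for the healthy route.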


    Fix

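    It is actually simple: add a timeout.
    This timeout is the maximum time to wait for a pool resource.
    In Zuul, override SimpleHostRoutingFilter and override the creation of the HttpClient so that ConnectionRequestTimeout is set in the RequestConfig: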

        protected CloseableHttpClient newClient() {
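            // Fall back to a default when no valid value is configured.
            // Note: setConnectionRequestTimeout takes milliseconds, so this default means 60 ms.
            if(connectionRequestTimeout ==  null || connectionRequestTimeout <= 0){
                connectionRequestTimeout = 60;
            }
            final RequestConfig requestConfig = RequestConfig.custom()
                    // Socket (read) timeout
                    .setSocketTimeout(SOCKET_TIMEOUT.get())
                    // Connect timeout
                    .setConnectTimeout(CONNECTION_TIMEOUT.get())
                    // Timeout for waiting on a connection from the pool
                    .setConnectionRequestTimeout(connectionRequestTimeout)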
                    .setCookieSpec(CookieSpecs.IGNORE_COOKIES).build();
    
            HttpClientBuilder httpClientBuilder = HttpClients.custom();
            if (!this.sslHostnameValidationEnabled) {
                httpClientBuilder.setSSLHostnameVerifier(NoopHostnameVerifier.INSTANCE);
            }
            return httpClientBuilder.setConnectionManager(newConnectionManager())
                    .disableContentCompression()
                    .useSystemProperties().setDefaultRequestConfig(requestConfig)
                    .setRetryHandler(new DefaultHttpRequestRetryHandler(0, false))
                    .setRedirectStrategy(new RedirectStrategy() {
                        @Override
                        public boolean isRedirected(HttpRequest request,
                                                    HttpResponse response, HttpContext context)
                                throws ProtocolException {
                            return false;
                        }
    
                        @Override
                        public HttpUriRequest getRedirect(HttpRequest request,
                                                          HttpResponse response, HttpContext context)
                                throws ProtocolException {
                            return null;
                        }
                    }).build();
        }
    

          Article link: https://www.haomeiwen.com/subject/ldqvtqtx.html