续租很简单，用了JUC的线程池，首先用 scheduler 管理 TimedSupervisorTask，然后 TimedSupervisorTask利用 executor 来submit心跳线程 HeartbeatThread，心跳线程以 HTTP请求 的方式向Eureka Server续租。

若HeartbeatThread执行超时，则进入 超时处理逻辑：即过一小会再执行心跳线程。

到底过多久再执行？延迟时间2。延时时间初始值就是*续租间隔时间(renewalIntervalInSecs)。假设renewalIntervalInSecs设置为10秒，那第一次超时后将等待20秒再尝试续约，第二次超时后将等待40秒再尝试续约，以此类推。若续约成功，延迟时间将恢复为renewalIntervalInSecs。

了解以上逻辑后，再反向看源码。

HeartbeatThread是DiscoveryClient的内部类，向Eureka Server发送HTTP请求进行续约。

private class HeartbeatThread implements Runnable {

public void run() {

if (renew()) {

lastSuccessfulHeartbeatTimestamp = System.currentTimeMillis();

}

// 伪代码,有删减

boolean renew() {

try {

httpResponse = sendHeartBeat(应用名, 应用ID, 应用info); // HTTP请求

if (httpResponse.getStatusCode() == 404) {

return register();

}

return httpResponse.getStatusCode() == 200;

} catch (Throwable e) {

return false;

}

schedule TimedSupervisorTask来监控HeartbeatThread，如果超时则执行超时处理逻辑；没超时则由HeartbeatThread正常处理即可。

// Heartbeat timer

scheduler.schedule(

new TimedSupervisorTask(

"heartbeat",

scheduler,

heartbeatExecutor,

renewalIntervalInSecs,

TimeUnit.SECONDS,

expBackOffBound,

new HeartbeatThread()

),

renewalIntervalInSecs, TimeUnit.SECONDS);

TimedSupervisorTask这个任务代码如下，主要看下超时处理和finally中的重新schedule(有较大删减，仅保留必要代码)。

public class TimedSupervisorTask extends TimerTask {

public TimedSupervisorTask(...) {

this.scheduler = scheduler;

this.executor = executor;

this.timeoutMillis = timeUnit.toMillis(timeout);

this.task = task; //就是HeartbeatThread

this.delay = new AtomicLong(timeoutMillis); // 延时时间

this.maxDelay = timeoutMillis * expBackOffBound;

}

public void run() {

Future future = null;

try {

future = executor.submit(task);

// 完成心跳或超时

future.get(timeoutMillis, TimeUnit.MILLISECONDS);

delay.set(timeoutMillis);

} catch (TimeoutException e) {

logger.error("task supervisor timed out", e);

long currentDelay = delay.get();

long newDelay = Math.min(maxDelay, currentDelay * 2);

// 延时时间加倍

delay.compareAndSet(currentDelay, newDelay);

} finally {

if (!scheduler.isShutdown()) {

// 重新schedule心跳任务

scheduler.schedule(this, delay.get(), TimeUnit.MILLISECONDS);

}

heartbeatExecutor是JUC中的Executor，不特别介绍。

ThreadPoolExecutor heartbeatExecutor = new ThreadPoolExecutor(

1, clientConfig.getHeartbeatExecutorThreadPoolSize(), 0, TimeUnit.SECONDS,

new SynchronousQueue(),

new ThreadFactoryBuilder()

.setNameFormat("DiscoveryClient-HeartbeatExecutor-%d")

.setDaemon(true)

.build()

scheduler这个线程池用来管理TimedSupervisorTask。

ScheduledExecutorService scheduler = Executors.newScheduledThreadPool(2,

new ThreadFactoryBuilder()

.setNameFormat("DiscoveryClient-%d")

.setDaemon(true)

.build());

回归问题

续租配置

现在，我们回归问题现象。先看看问题应用的续租时间(renewalIntervalInSecs)配置：

eureka.instance.lease-renewal-interval-in-seconds=1

是的，就是1秒，不要问我为什么，我也不知道！

现在，至少可以确定：应用在高频续租

为何TimeoutException

TimedSupervisorTask设定的超时时间是？看看时间监控伙计的timeout属性：

public TimedSupervisorTask(String name, ScheduledExecutorService scheduler, ThreadPoolExecutor executor,int timeout, TimeUnit timeUnit, int expBackOffBound, Runnable task)

timeout配置是new TimedSupervisorTask()时传进来的，使用的是renewalIntervalInSecs。好了，是一秒！在内网，正常来说1秒也是够的，但因各种情况总是免不了会超时，且HTTP超时时间设置为1秒的也是少之又少。

scheduler.schedule(

new TimedSupervisorTask(

"heartbeat",

scheduler,

heartbeatExecutor,

renewalIntervalInSecs,

TimeUnit.SECONDS,

expBackOffBound,

new HeartbeatThread()

),

renewalIntervalInSecs, TimeUnit.SECONDS);

现在，出现 TimeoutException: task supervisor timed out 就比较清楚了。

想本地重现超时异常的话，在续租renew()中加个断点，模拟下超时，就可以抛出该异常。

高频续租对GC的影响

在摘掉应用流量后，800M的Eden区在4分钟内被耗光。一下也找不到原因，在本地创建了Eureka Server和Client两个简单应用，观察Client的内存消耗情况。

为了便于观察，堆内存设置很小，如下：

-Xmx40m -Xms40m -XX:NewRatio=3 -verbosegc -XX:+PrintGCTimeStamps -Xloggc:/Users/cyj/gcdebug.log

接下来做2个对比实验：

1秒1次心跳

# 排除从Eureka Server获取所有实例的影响, 设置为10分钟

eureka.client.registry-fetch-interval-seconds=600

# 1秒一次心跳

eureka.instance.lease-renewal-interval-in-seconds=1

丢弃掉应用启动时的频繁GC，最后三次的 GC如下：

2018-08-28T22:35:25.459-0800: 116.798: [GC (Allocation Failure) ...

2018-08-28T22:37:45.378-0800: 256.719: [GC (Allocation Failure) ...

2018-08-28T22:39:26.500-0800: 357.844: [GC (Allocation Failure) ...

平均GC时间约 2分钟/次。

不进行心跳

接下来，将续租间隔、租约时间都设置成10分钟，达到在观察期间不进行心跳的效果。

eureka.client.registry-fetch-interval-seconds=600

eureka.instance.lease-expiration-duration-in-seconds=600

eureka.instance.lease-renewal-interval-in-seconds=600

丢弃掉应用启动时的频繁GC，最后三次的 GC如下：

2018-08-28T22:39:53.806-0800: 7.546: [Full GC (Ergonomics) ...

2018-08-28T22:49:53.416-0800: 607.167: [GC (Allocation Failure) ...

2018-08-28T22:57:37.248-0800: 1071.009: [GC (Allocation Failure) ...

碰巧来了次FullGC，但是不影响数据。平均GC时间接近9分钟/次的样子。

这两次对比实验虽然不是特别合适，但可以说明不合理的心跳时间会加速内存的消耗。

调整问题应用的结果

摘除问题应用的流量后，将续租间隔调整为默认值30秒后，该应用的Minor GC频率从 4分钟/次 降低到 13分钟/

网友评论

本文标题：Spring Cloud之极端续租间隔时间的影响

本文链接：https://www.haomeiwen.com/subject/zsqjwftx.html

延伸阅读

深度阅读

您也可以注册成为美文阅读网的作者，发表您的原创作品、分享您的心情！

Spring Cloud之极端续租间隔时间的影响

相关文章