异常日志
com.netflix.hystrix.exception.HystrixRuntimeException: UserSettingClient#get(long,String) timed-out and no fallback available.
com.netflix.hystrix.exception.HystrixRuntimeException: UserSettingClient#get(long,String) short-circuited
Arthas 耗时
查看 log
$ tail -n 100000 service.log | grep -i "Started Application"
2021-07-27 09:51:33,460 com.xxx.athena.Application:61 [ INFO]:TID:N/A: Started Application in 147.318 seconds (JVM running for 165.223)
2021-07-27 09:51:59,436 org.springframework.boot.SpringApplication:61 [ INFO]:TID:N/A: Started application in 19.344 seconds (JVM running for 191.198)
$ tail -n 100000 service.log | grep "DynamicServerListLoadBalancer"
2021-07-27 09:51:33,906 com.netflix.loadbalancer.BaseLoadBalancer:197 [ INFO]:TID:[Ignored Trace]: Client: user-service instantiated a LoadBalancer: DynamicServerListLoadBalancer:{NFLoadBalancer:name=user-service,current list of Servers=[],Load balancer stats=Zone stats: {},Server stats: []}ServerList:null
2021-07-27 09:51:33,924 com.netflix.loadbalancer.DynamicServerListLoadBalancer:222 [ INFO]:TID:[Ignored Trace]: Using serverListUpdater PollingServerListUpdater
2021-07-27 09:51:34,010 com.netflix.loadbalancer.DynamicServerListLoadBalancer:150 [ INFO]:TID:[Ignored Trace]: DynamicServerListLoadBalancer for client user-service initialized: DynamicServerListLoadBalancer:{NFLoadBalancer:name=user-service,current list of Servers=[xxxx:8822, xxxx:8822, xxx:8822],Load balancer stats=Zone stats: {defaultzone=[Zone:defaultzone; Instance count:3; Active connections count: 0; Circuit breaker tripped count: 0; Active connections per server: 0.0;]
...
2021-07-27 09:51:40,516 com.netflix.loadbalancer.BaseLoadBalancer:197 [ INFO]:TID:[Ignored Trace]: Client: content-provider instantiated a LoadBalancer: DynamicServerListLoadBalancer:{NFLoadBalancer:name=content-provider,current list of Servers=[],Load balancer stats=Zone stats: {},Server stats: []}ServerList:null
2021-07-27 09:51:40,519 com.netflix.loadbalancer.DynamicServerListLoadBalancer:222 [ INFO]:TID:[Ignored Trace]: Using serverListUpdater PollingServerListUpdater
2021-07-27 09:51:40,551 com.netflix.loadbalancer.DynamicServerListLoadBalancer:150 [ INFO]:TID:[Ignored Trace]: DynamicServerListLoadBalancer for client content-provider initialized: DynamicServerListLoadBalancer:{NFLoadBalancer:name=content-provider,current list of Servers=[xxxx:8811, xxxx:8811, xxxx:8811],Load balancer stats=Zone stats: {defaultzone=[Zone:defaultzone; Instance count:3; Active connections count: 0; Circuit breaker tripped count: 0; Active connections per server: 0.0;]
$ tail -n 100000 service.log | grep "com.netflix.hystrix.exception.HystrixRuntimeException" -B 1
2021-07-27 09:51:35,013 org.apache.rocketmq.spring.support.DefaultRocketMQListenerContainer:328 [ WARN]:TID:[Ignored Trace]: consume message failed. messageExt:MessageExt [brokerName=broker-2, queueId=0, storeSize=505, queueOffset=3783772, sysFlag=0, bornTimestamp=1627350691944, bornHost=/10.0.16.14:41676, storeTimestamp=1627350691944, storeHost=/10.0.16.11:10911, msgId=0A00100B00002A9F0000003DE9918691, commitLogOffset=265911633553, bodyCRC=1259644652, reconsumeTimes=0, preparedTransactionOffset=0, toString()=Message{topic='waiting-notify-learning-queue', flag=0, properties={MIN_OFFSET=3669598, sw6=1-NzcuNDEyLjE2MjczNTA2OTE5NDQ2Mjgx-NzcuNDEyLjE2MjczNTA2OTE5NDQ2Mjgw-0-77-77-MzA=-I1JvY2tldE1RL3dhaXRpbmctbm90aWZ5LWxlYXJuaW5nLXF1ZXVlL1Byb2R1Y2Vy-I1JvY2tldE1RL3dhaXRpbmctbm90aWZ5LWxlYXJuaW5nLXF1ZXVlL1Byb2R1Y2Vy, MAX_OFFSET=3783773, CONSUME_START_TIME=1627350691948, id=209bb1aa-1160-f9e2-2bad-093f79ee12b9, UNIQ_KEY=AC11000162690B25B0958802E8685AB1, CLUSTER=DefaultCluster, WAIT=false, contentType=text/plain, timestamp=1627350691944}, body=[54, 48, 50, 50, 50, 56, 56], transactionId='null'}]
com.xxxx.api.result.ApiException: com.netflix.hystrix.exception.HystrixRuntimeException: UserClient#getUserInfo(long) timed-out and no fallback available.
...
可以看到,应用启动后,RocketMQ 消费者就开始消费,但是 Ribbon 还没有初始化完成。
有人说是要开启 Ribbon 饥饿加载,看了下服务配置,已经开启了。
ribbon.eager-load.enabled=true
ribbon.eager-load.clients=user-service
Ribbon 自动配置
package org.springframework.cloud.netflix.ribbon;
@Configuration
public class RibbonAutoConfiguration {
@Autowired
private RibbonEagerLoadProperties ribbonEagerLoadProperties;
@Bean
@ConditionalOnProperty("ribbon.eager-load.enabled")
public RibbonApplicationContextInitializer ribbonApplicationContextInitializer() {
return new RibbonApplicationContextInitializer(springClientFactory(),
ribbonEagerLoadProperties.getClients());
}
}
@ConfigurationProperties(prefix = "ribbon.eager-load")
public class RibbonEagerLoadProperties {
private boolean enabled = false;
private List<String> clients;
}
public class RibbonApplicationContextInitializer
implements ApplicationListener<ApplicationReadyEvent> {
protected void initialize() {
if (clientNames != null) {
for (String clientName : clientNames) {
// 初始化 IClientConfig
this.springClientFactory.getContext(clientName);
}
}
}
@Override
public void onApplicationEvent(ApplicationReadyEvent event) {
// 在 ApplicationReadyEvent 事件发起后初始化 RibbonApplicationContext
initialize();
}
}
package org.springframework.cloud.openfeign.ribbon;
public class LoadBalancerFeignClient implements Client {
@Override
public Response execute(Request request, Request.Options options) throws IOException {
try {
URI asUri = URI.create(request.url());
String clientName = asUri.getHost();
URI uriWithoutHost = cleanUrl(request.url(), clientName);
FeignLoadBalancer.RibbonRequest ribbonRequest = new FeignLoadBalancer.RibbonRequest(
this.delegate, request, uriWithoutHost);
// 如果没有初始化 IClientConfig,要先初始化 IClientConfig
// 初始化 IClientConfig 耗时比较长
IClientConfig requestConfig = getClientConfig(options, clientName);
return lbClient(clientName)
.executeWithLoadBalancer(ribbonRequest, requestConfig).toResponse();
}
catch (ClientException e) {
IOException io = findIOException(e);
if (io != null) {
throw io;
}
throw new RuntimeException(e);
}
}
}
Spring getSingleton()
protected Object getSingleton(String beanName, boolean allowEarlyReference) {
Object singletonObject = this.singletonObjects.get(beanName);
if (singletonObject == null && isSingletonCurrentlyInCreation(beanName)) {
synchronized (this.singletonObjects) {
singletonObject = this.earlySingletonObjects.get(beanName);
if (singletonObject == null && allowEarlyReference) {
ObjectFactory<?> singletonFactory = this.singletonFactories.get(beanName);
if (singletonFactory != null) {
singletonObject = singletonFactory.getObject();
this.earlySingletonObjects.put(beanName, singletonObject);
this.singletonFactories.remove(beanName);
}
}
}
}
return singletonObject;
}
存在锁 synchronized (this.singletonObjects)
,因此 Ribbon 的初始化是串行的。
解决方案
因为 Ribbon 的初始化是串行的,所以在初始化完成之前要禁止如下两项:
- (1) 定时任务的执行。
- (2) RocketMQ 消费者进行消息消费。
1. 延迟执行定时任务
(1)新增 ScheduleDisableAspect
切面,用来拦截 @Scheduled
方法。
@Component
@Aspect
@RequiredArgsConstructor
public class ScheduleDisableAspect {
private static volatile boolean enable = false;
@Around("@annotation(org.springframework.scheduling.annotation.Scheduled)")
public Object beforePointCut(ProceedingJoinPoint pjp) throws Throwable {
// 先看看是否被禁用,如果被禁用的话,则直接拦截
if (!enable) {
return null;
}
return pjp.proceed();
}
public static void enable() {
enable = true;
}
}
(2)在 SpringApplicationRunListener
的 running()
方法中打开定时任务的开关,让其可以执行。
package com.example.demo;
@Slf4j
public class ScheduleEnableRunListener implements SpringApplicationRunListener {
// Spring 容器实例化使用
public MySpringApplicationRunListener(SpringApplication application, String[] args) {
}
@Override
public void running(ConfigurableApplicationContext context) {
ScheduleDisableAspect.enable();
}
}
(3)在 META-INF/spring.factories 配置:
org.springframework.boot.SpringApplicationRunListener=\
com.example.demo.ScheduleEnableRunListener
因为 ScheduleEnableRunListener
会在 org.springframework.cloud.netflix.ribbon.RibbonApplicationContextInitializer#onApplicationEvent
之后执行,所以在 Ribbon 初始化完成之后才会执行定时任务。
执行顺序的测试,可以参考:SpringApplicationRunListener 的使用
2. 延迟启动 RocketMQ 消费者
(1)去掉 org.apache.rocketmq.spring.autoconfigure.ListenerContainerConfiguration#registerContainer
中的如下代码。
(2)去掉 org.apache.rocketmq.spring.support.DefaultRocketMQListenerContainer#start
中的逻辑,将其拷贝到 doStart()
方法中。
(3)在 SpringApplicationRunListener
的 running()
方法中调用 container.doStart()
方法启动 RocketMQ 消费者。
package org.apache.rocketmq.spring.autoconfigure;
public class ConsumerDelayStartApplicationRunListener implements SpringApplicationRunListener {
private final static Logger log = LoggerFactory.getLogger(ConsumerDelayStartApplicationRunListener.class);
public ConsumerDelayStartApplicationRunListener(SpringApplication application, String[] args) {
}
@Override
public void running(ConfigurableApplicationContext context) {
log.info("ConsumerDelayStartApplicationRunListener.running executed");
context.getBeansOfType(DefaultRocketMQListenerContainer.class).forEach((beanName, container) -> {
if (!container.isRunning()) {
try {
log.info("beanName:{}, topic:{}, consumerGroup:{} container.start() begin", beanName, container.getTopic(), container.getConsumerGroup());
container.doStart();
log.info("beanName:{}, topic:{}, consumerGroup:{} container.start() end", beanName, container.getTopic(), container.getConsumerGroup());
} catch (Exception e) {
log.error("Started container failed. {}", container, e);
throw new RuntimeException(e);
}
}
});
}
}
(4)在 META-INF/spring.factories 配置:
org.springframework.boot.SpringApplicationRunListener=\
org.apache.rocketmq.spring.autoconfigure.ConsumerDelayStartApplicationRunListener
提示:因为定时任务的配置和 MQ 消费者的配置不在同一个项目,所以需要在不同的项目中进行配置。
至此,解决了项目启动时的 Hystrix timed-out
问题。
网友评论