Introduction
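After adding SkyWalking, the UI gives a clear view of traces. As developers, though, we usually only go and look at trace data once something has already gone wrong. SkyWalking's alarm feature helps us notice problems promptly.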
Alarms consist of two parts:
- Alarm rules
- Webhooks
Using alarms
Rules
- Rule name: must be unique and must end with _rule.
- Metrics name: one of the metrics produced by the official analysis scripts, defined in skywalking/oap-server/generated-analysis/src/main/resources/official_analysis.oal (see https://github.com/apache/skywalking/blob/master/docs/en/guides/backend-oal-scripts.md).
- Include names: the entity names the rule applies to, such as services or endpoints. This field is optional; by default the rule matches all entities of the metric.
The following snippet is adapted from the official sample:
```yaml
# [Optional] Default, match all services in this metrics
include-names:
  - dubbox-provider
  - dubbox-consumer
```
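- Threshold: the target value, e.g. 1000 (ms) for response time, or 90 for success rate.
- OP: the comparison operator, > (greater than), < (less than) or = (equal to).
- Period: the length of the time window over which the rule is evaluated.
- Count: within the period window, once the number of values meeting the threshold condition reaches this count, an alarm is sent.
- Silence period: if an alarm fires at time A, no further alarm for the same rule is sent until A + silence period has passed. Most of us have been flooded with repeated alarms for an already-known issue, so this setting is well worth having.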
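To make the interplay of Threshold, OP, Period, Count and Silence period concrete, here is a minimal Java sketch. It is a simplified model under my own assumptions, not SkyWalking's actual rule engine: each call to check() stands for one evaluation step, and the class name AlarmRuleSketch is hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical, simplified model of one alarm rule (not SkyWalking's real engine).
// Each call to check() represents one evaluation step of the metric.
public class AlarmRuleSketch {
    public enum Op { GT, LT, EQ }

    private final double threshold;
    private final Op op;
    private final int period;        // size of the sliding window, in evaluation steps
    private final int count;         // matches needed inside the window to fire
    private final int silencePeriod; // steps to stay quiet after firing
    private final Deque<Boolean> window = new ArrayDeque<>();
    private int silenceLeft = 0;

    public AlarmRuleSketch(double threshold, Op op, int period, int count, int silencePeriod) {
        this.threshold = threshold;
        this.op = op;
        this.period = period;
        this.count = count;
        this.silencePeriod = silencePeriod;
    }

    private boolean matches(double value) {
        switch (op) {
            case GT: return value > threshold;
            case LT: return value < threshold;
            default: return value == threshold;
        }
    }

    /** Records one metric value and returns true if an alarm should be sent now. */
    public boolean check(double value) {
        window.addLast(matches(value));
        if (window.size() > period) {
            window.removeFirst(); // keep only the last `period` results
        }
        if (silenceLeft > 0) {
            silenceLeft--; // still inside the silence period
            return false;
        }
        long hits = window.stream().filter(Boolean::booleanValue).count();
        if (hits >= count) {
            silenceLeft = silencePeriod; // start the silence period after firing
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        // Response time > 1000 ms at least twice in the last 3 steps; then stay silent for 3 steps.
        AlarmRuleSketch rule = new AlarmRuleSketch(1000, Op.GT, 3, 2, 3);
        double[] respTimes = {1200, 1500, 1300, 1400, 1100, 900, 800};
        for (double rt : respTimes) {
            System.out.println(rt + " ms -> alarm: " + rule.check(rt));
        }
    }
}
```

With these settings the sketch fires on the second slow value, stays quiet for the next three checks even though responses are still slow, and fires again on the sixth value because two of the last three values are still over the threshold.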
SkyWalking also ships with a set of default alarm rules, which I won't cover in detail here.
We provide a default alarm-settings.yml in our distribution purely for convenience. It includes the following rules:
- Service average response time over 1s in last 3 minutes.
- Service success rate lower than 80% in last 2 minutes.
- Service 90% response time is over 1s in last 3 minutes.
- Service Instance average response time over 1s in last 2 minutes.
- Endpoint average response time over 1s in last 2 minutes.
Webhooks
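A previous article covered webhooks. In short, a webhook is the callback in an everyday alarm workflow: when an alarm fires, SkyWalking posts it to the configured URLs.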
Webhook requires the peer to be a web container. The alarm message is sent via HTTP POST with the application/json content type. The JSON format is based on List<org.apache.skywalking.oap.server.core.alarm.AlarmMessage>, with the following key information:
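```java
package org.apache.skywalking.oap.server.core.alarm;

import lombok.AccessLevel;
import lombok.Getter;
import lombok.Setter;

// Lombok generates public getters and setters for every non-static field.
@Setter(AccessLevel.PUBLIC)
@Getter(AccessLevel.PUBLIC)
public class AlarmMessage {
    // Shared placeholder meaning "no alarm".
    public static AlarmMessage NONE = new NoAlarm();
    private int scopeId;         // scope type of the entity (service, service instance, endpoint, ...)
    private String name;         // name of the target entity
    private int id0;             // ID of the target entity, matching name
    private int id1;             // not used at the time of writing
    private String alarmMessage; // alarm text
    private long startTime;      // alarm time, in epoch milliseconds

    private static class NoAlarm extends AlarmMessage {
    }
}
```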
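This class uses Lombok. Personally, I don't think open-source components should use Lombok: it only saves a few lines of getters and setters, and code where what you see is what you get is easier for people to read. Lombok is a treat best kept for business application code.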
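For comparison, here is roughly what the class-level @Getter/@Setter annotations expand to when written by hand. It is a sketch for illustration only; the class name PlainAlarmMessage is hypothetical, chosen so it does not clash with the real AlarmMessage.

```java
// Hand-written equivalent of the Lombok-annotated AlarmMessage (hypothetical name).
// Class-level @Getter/@Setter cover non-static fields only, so NONE gets no accessors.
public class PlainAlarmMessage {
    public static PlainAlarmMessage NONE = new NoAlarm();

    private int scopeId;
    private String name;
    private int id0;
    private int id1;
    private String alarmMessage;
    private long startTime;

    public int getScopeId() { return scopeId; }
    public void setScopeId(int scopeId) { this.scopeId = scopeId; }

    public String getName() { return name; }
    public void setName(String name) { this.name = name; }

    public int getId0() { return id0; }
    public void setId0(int id0) { this.id0 = id0; }

    public int getId1() { return id1; }
    public void setId1(int id1) { this.id1 = id1; }

    public String getAlarmMessage() { return alarmMessage; }
    public void setAlarmMessage(String alarmMessage) { this.alarmMessage = alarmMessage; }

    public long getStartTime() { return startTime; }
    public void setStartTime(long startTime) { this.startTime = startTime; }

    private static class NoAlarm extends PlainAlarmMessage {
    }
}
```

Twelve trivial methods for six fields: more lines, but nothing hidden from the reader or the IDE.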
Back to the topic. Here is the code that sends the alarms:
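```java
import com.google.gson.Gson;
import java.io.IOException;
import java.util.List;
import org.apache.http.StatusLine;
import org.apache.http.client.config.RequestConfig;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpPost;
import org.apache.http.entity.ContentType;
import org.apache.http.entity.StringEntity;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.skywalking.oap.server.core.alarm.AlarmCallback;
import org.apache.skywalking.oap.server.core.alarm.AlarmMessage;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

// Adapted from SkyWalking's WebhookCallback: imports and fields added, constructor
// simplified, each response closed, and the JSON body sent as UTF-8.
public class WebhookCallback implements AlarmCallback {
    private static final Logger logger = LoggerFactory.getLogger(WebhookCallback.class);

    private final List<String> remoteEndpoints; // webhook URLs from alarm-settings.yml
    private final RequestConfig requestConfig;  // HTTP connect/socket timeouts
    private final Gson gson = new Gson();

    public WebhookCallback(List<String> remoteEndpoints, RequestConfig requestConfig) {
        this.remoteEndpoints = remoteEndpoints;
        this.requestConfig = requestConfig;
    }

    @Override
    public void doAlarm(List<AlarmMessage> alarmMessage) {
        if (remoteEndpoints.isEmpty()) {
            return; // no webhook configured
        }
        // The whole batch is posted as one JSON array: List<AlarmMessage>.
        String json = gson.toJson(alarmMessage);
        try (CloseableHttpClient httpClient = HttpClients.custom().build()) {
            for (String url : remoteEndpoints) {
                HttpPost post = new HttpPost(url);
                post.setConfig(requestConfig);
                post.setHeader("Accept", "application/json");
                post.setHeader("Content-type", "application/json");
                // Explicit UTF-8, so non-ASCII service names are not mangled.
                post.setEntity(new StringEntity(json, ContentType.APPLICATION_JSON));
                // Closing the response releases the underlying connection.
                try (CloseableHttpResponse httpResponse = httpClient.execute(post)) {
                    StatusLine statusLine = httpResponse.getStatusLine();
                    if (statusLine != null && statusLine.getStatusCode() != 200) {
                        logger.error("send alarm to " + url + " failure. Response code: " + statusLine.getStatusCode());
                    }
                } catch (IOException e) {
                    // One failing endpoint must not stop delivery to the others.
                    logger.error("send alarm to " + url + " failure.", e);
                }
            }
        } catch (IOException e) {
            // Only thrown when closing the HTTP client fails.
            logger.error(e.getMessage(), e);
        }
    }
}
```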
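On the receiving side, any HTTP endpoint that accepts a JSON POST will do. Below is a minimal sketch using the JDK's built-in com.sun.net.httpserver.HttpServer. The class name AlarmWebhookReceiver, the /alarm path and the port 8090 are hypothetical, and the receiver just stores the raw JSON body instead of parsing it into AlarmMessage objects.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.InputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Minimal webhook receiver sketch (hypothetical names, path and port).
public class AlarmWebhookReceiver {
    private final HttpServer server;
    private final List<String> received = new CopyOnWriteArrayList<>();

    public AlarmWebhookReceiver(int port) throws IOException {
        server = HttpServer.create(new InetSocketAddress(port), 0);
        server.createContext("/alarm", exchange -> {
            if (!"POST".equals(exchange.getRequestMethod())) {
                exchange.sendResponseHeaders(405, -1); // only POST is expected
                exchange.close();
                return;
            }
            try (InputStream in = exchange.getRequestBody()) {
                // The body is a JSON array of AlarmMessage objects.
                received.add(new String(in.readAllBytes(), StandardCharsets.UTF_8));
            }
            // WebhookCallback logs an error for any status other than 200.
            exchange.sendResponseHeaders(200, -1);
            exchange.close();
        });
    }

    public void start() { server.start(); }
    public void stop() { server.stop(0); }
    public int port() { return server.getAddress().getPort(); }
    public List<String> received() { return received; }

    public static void main(String[] args) throws IOException {
        AlarmWebhookReceiver receiver = new AlarmWebhookReceiver(8090);
        receiver.start();
        System.out.println("Listening on http://localhost:" + receiver.port() + "/alarm");
    }
}
```

You would then add a URL such as http://localhost:8090/alarm to the webhook list in alarm-settings.yml. Replying 200 matters, since the sender treats any other status code as a failure.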
doAlarm is in turn triggered from org.apache.skywalking.oap.server.core.alarm.provider.AlarmCore#start, which runs the check on a single-threaded scheduled executor:
```java
// First run after 10 s, then every 10 s (task body elided)
Executors.newSingleThreadScheduledExecutor().scheduleAtFixedRate(() -> {}, 10, 10, TimeUnit.SECONDS);
```
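The scheduling pattern can be sketched in isolation: one scheduled thread periodically evaluates rules and hands any triggered alarms to every callback (such as WebhookCallback). This is a simplified, hypothetical model, not the real AlarmCore: the class name AlarmLoopSketch is made up and alarms are represented as plain strings.

```java
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;
import java.util.function.Supplier;

// Hypothetical sketch of a periodic alarm loop (not SkyWalking's AlarmCore).
public class AlarmLoopSketch {
    private final ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();

    public void start(Supplier<List<String>> checkRules,
                      List<Consumer<List<String>>> callbacks,
                      long periodMillis) {
        scheduler.scheduleAtFixedRate(() -> {
            try {
                List<String> alarms = checkRules.get(); // evaluate all rules
                if (!alarms.isEmpty()) {
                    callbacks.forEach(cb -> cb.accept(alarms)); // notify, e.g. webhooks
                }
            } catch (RuntimeException e) {
                // An exception escaping the task would cancel every later run.
                e.printStackTrace();
            }
        }, periodMillis, periodMillis, TimeUnit.MILLISECONDS);
    }

    public void stop() {
        scheduler.shutdownNow();
    }
}
```

The try/catch inside the task is deliberate: per the ScheduledExecutorService contract, if one execution of a fixed-rate task throws, all subsequent executions are suppressed, and the alarm loop would silently stop.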
Result in the UI
I added a random sleep to the Dubbo provider, and alarm messages then appeared in the UI.
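A provider for this kind of test can be sketched as below. The class and method names are hypothetical, and the Dubbo service interface and provider registration are omitted; the point is only the random delay, which pushes some calls past the 1-second response-time thresholds of the default rules.

```java
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical slow service implementation used to trigger response-time alarms.
public class SlowGreetingService {
    private final int maxDelayMillis;

    public SlowGreetingService(int maxDelayMillis) {
        this.maxDelayMillis = maxDelayMillis;
    }

    public String sayHello(String name) {
        try {
            // Uniform random delay in [0, maxDelayMillis] ms.
            Thread.sleep(ThreadLocalRandom.current().nextInt(maxDelayMillis + 1));
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // restore the interrupt flag
        }
        return "Hello, " + name;
    }
}
```

With maxDelayMillis set to 2000, roughly half of the calls take longer than 1 s, enough to trip rules such as "service average response time over 1s".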
Official 6.x alarm documentation:
https://github.com/apache/skywalking/blob/master/docs/en/setup/backend/backend-alarm.md