Background
On a recent project, while fixing a bug I happened to spot other problems in the logs. Because they never crashed the program, they had gone unnoticed for a long time.
Reflection
At small and mid-sized companies, problems are usually discovered and reported by the business side; developers rarely have the bandwidth to watch the software's health across its whole life cycle. Without proactive monitoring, the typical flow is: something breaks, the business side notices, they call the developers, and only then does anyone go look for the cause.
Life works much the same way: small signals that go unrecorded accumulate into a qualitative change, such as illness and hospitalization, divorce, or layoffs.
Solution
Use ELK to collect logs and query them proactively, so problems are found early instead of reported late.
Process
- Visit https://github.com/deviantony/docker-elk and install from there; just follow the official instructions.
Going forward, pick plain CentOS 8 rather than Alibaba Cloud's own OS image: much server-side software is never tested against Alibaba's distribution, and the subtle installation differences can cause unnecessary trouble.
- Post-install configuration
Configure logstash.conf as follows. The filter extracts the appname field from each log line and uses it to name the index.
After changing the config, restart the logstash container (with docker-elk, typically docker-compose restart logstash).
input {
beats {
port => 5044
}
tcp {
port => 50000
}
}
## Add your filters / logstash plugins configuration here
filter {
grok {
match => {
"message" => ".*(?<appname>(?<=appname\S:\").*(?=\")).*"
}
overwrite => ["message"]
}
}
output {
elasticsearch {
hosts => "elasticsearch:9200"
user => "logstash_internal"
password => "d9ULrven2u84CpyBh8Ln"
index => ["logstash-%{[appname]}-%{+YYYY.MM.dd}"]
}
}
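To sanity-check what the grok filter extracts, here is a minimal Python sketch of the same capture. The sample log line is hypothetical but follows the JSON shape that logstash-logback-encoder emits; note the character class stops at the closing quote so the capture cannot run past the appname value:

```python
import re

# Hypothetical log line, shaped like the JSON that
# logstash-logback-encoder writes (one object per line).
log_line = (
    '{"@timestamp":"2023-01-01T00:00:00Z",'
    '"message":"order failed","appname":"robot_iot_prod"}'
)

# Same idea as the grok pattern above: capture the value of the
# "appname" key, stopping at the closing quote.
pattern = re.compile(r'"appname":"(?P<appname>[^"]+)"')

match = pattern.search(log_line)
print(match.group("appname"))  # robot_iot_prod
```

The https://grokdebugger.com/ site mentioned at the end of this post does the same check against the actual grok syntax.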
In Elasticsearch, configure an index lifecycle (ILM) policy and an index template. The effect is that expired indices are deleted automatically, so the disk never fills up. For example, in Kibana Dev Tools:
PUT _ilm/policy/7days
{
"policy": {
"phases": {
"hot": {
"min_age": "0ms",
"actions": {
"set_priority": {
"priority": 100
}
}
},
"delete": {
"min_age": "10m",
"actions": {
"delete": {
"delete_searchable_snapshot": true
}
}
}
}
}
}
PUT _template/logstash_7days_template
{
"index_patterns": ["logstash-*"],
"settings": {
"index.lifecycle.name": "7days"
}
}
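Because the logstash output above names indices per app and per day, the logstash-* template pattern picks up every app's indices and ILM prunes the old ones. A small Python sketch of how the index name is derived (mirroring logstash's %{[appname]} and %{+YYYY.MM.dd} sprintf references; the dates are illustrative):

```python
from datetime import date, timedelta

def index_name(appname: str, day: date) -> str:
    # Mirrors index => "logstash-%{[appname]}-%{+YYYY.MM.dd}"
    return f"logstash-{appname}-{day.strftime('%Y.%m.%d')}"

today = date(2023, 1, 8)
print(index_name("robot_iot_prod", today))  # logstash-robot_iot_prod-2023.01.08

# One new index per app per day; with the 7-day ILM policy,
# indices past the retention window get deleted automatically.
old = index_name("robot_iot_prod", today - timedelta(days=8))
print(old)  # logstash-robot_iot_prod-2022.12.31
```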
- Code changes: ship application logs to the log system
<!-- integrate logstash -->
<dependency>
<groupId>net.logstash.logback</groupId>
<artifactId>logstash-logback-encoder</artifactId>
<version>5.3</version>
</dependency>
<configuration>
<include resource="org/springframework/boot/logging/logback/defaults.xml"/>
<include resource="org/springframework/boot/logging/logback/console-appender.xml"/>
<!-- application name -->
<property name="APP_NAME" value="springboot-logback-elk-demo"/>
<!-- log file save path -->
<property name="LOG_FILE_PATH" value="${LOG_FILE:-${LOG_PATH:-${LOG_TEMP:-${java.io.tmpdir:-/tmp}}}/logs}"/>
<contextName>${APP_NAME}</contextName>
<!-- appender that writes a daily rolling log file -->
<appender name="FILE" class="ch.qos.logback.core.rolling.RollingFileAppender">
<rollingPolicy class="ch.qos.logback.core.rolling.TimeBasedRollingPolicy">
<fileNamePattern>${LOG_FILE_PATH}/${APP_NAME}-%d{yyyy-MM-dd}.log</fileNamePattern>
<maxHistory>30</maxHistory>
</rollingPolicy>
<encoder>
<pattern>${FILE_LOG_PATTERN}</pattern>
</encoder>
</appender>
<!-- appender that ships logs to logstash -->
<appender name="LOGSTASH" class="net.logstash.logback.appender.LogstashTcpSocketAppender">
<!-- reachable logstash log-collection port -->
<destination>your-log-server-ip:50000</destination>
<encoder charset="UTF-8" class="net.logstash.logback.encoder.LogstashEncoder">
<customFields>{"appname":"robot_iot_prod"}</customFields>
<timeZone>UTC</timeZone>
</encoder>
</appender>
<root level="INFO">
<appender-ref ref="LOGSTASH"/>
</root>
</configuration>
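Under the hood, LogstashTcpSocketAppender simply writes newline-delimited JSON over TCP. A minimal Python sketch of that wire format, handy for a manual smoke test of the logstash TCP input (the host is a placeholder; field names match the logback config above):

```python
import json
import socket
from datetime import datetime, timezone

def build_event(message: str, level: str = "INFO") -> bytes:
    # Roughly the shape LogstashEncoder produces, including the
    # customFields entry from logback.xml; newline-delimited JSON.
    event = {
        "@timestamp": datetime.now(timezone.utc).isoformat(),
        "message": message,
        "level": level,
        "appname": "robot_iot_prod",
    }
    return (json.dumps(event) + "\n").encode("utf-8")

def send_event(host: str, payload: bytes, port: int = 50000) -> None:
    # Port matches the tcp { port => 50000 } input in logstash.conf.
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(payload)

payload = build_event("manual test event")
# send_event("your-log-server-ip", payload)  # uncomment with a real host
```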
Finally, create a Watcher (via PUT _watcher/watch/<watch-id>) that checks the logs every 5 minutes and pushes any errors to a Feishu bot:
{
"trigger": {
"schedule": {
"interval": "5m"
}
},
"input": {
"search": {
"request": {
"search_type": "query_then_fetch",
"indices": [
"logstash-robot_iot_prod-*"
],
"rest_total_hits_as_int": true,
"body": {
"query": {
"bool": {
"must": [
{
"match": {
"message": "error"
}
},
{
"range": {
"@timestamp": {
"gte": "now-5m"
}
}
}
]
}
},
"sort": [
{
"@timestamp": {
"order": "desc"
}
}
],
"size": 1
}
}
}
},
"condition": {
"compare": {
"ctx.payload.hits.total": {
"gt": 0
}
}
},
"actions": {
"log_error_count": {
"logging": {
"level": "info",
"text": "There were {{ctx.payload.hits.total}} documents with 'error' in the message field in the last 5 minutes. The latest error message was: {{ctx.payload.hits.hits.0._source.message}}"
}
},
"send_to_feishu": {
"webhook": {
"scheme": "https",
"host": "open.feishu.cn",
"port": 443,
"method": "post",
"path": "open-apis/bot/v2/hook/飞书机器人的token",
"params": {},
"headers": {
"Content-Type": "application/json"
},
"body": """{"msg_type":"text","content":{"text":" There were {{ctx.payload.hits.total}} documents with 'error' in the message field in the last 5 minutes. The latest error message was {{ctx.payload.hits.hits.0._source.desc}} "}}"""
}
}
}
}
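The actions above interpolate {{ctx.payload...}} with Mustache templates. A small Python sketch of the same formatting, useful for previewing the alert text before wiring up the webhook (the payload dict mimics the shape of the watcher's search result, with hits.total as an int because rest_total_hits_as_int is set):

```python
def alert_text(payload: dict) -> str:
    # Mirrors the Mustache template in the "send_to_feishu" action.
    total = payload["hits"]["total"]
    latest = payload["hits"]["hits"][0]["_source"]["message"]
    return (
        f"There were {total} documents with 'error' in the message "
        f"field in the last 5 minutes. The latest error message was: {latest}"
    )

sample = {
    "hits": {
        "total": 3,
        "hits": [{"_source": {"message": "error: device offline"}}],
    }
}
print(alert_text(sample))
```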
Result
(screenshot omitted)
Other
https://grokdebugger.com/ can be used to test grok patterns against sample log lines.