ELK in Practice: Log Dashboards and Log-Monitoring Alerts

Author: 瑜骐 | Published: 2018-04-11 15:59

    1. Logstash

    1.1 Details

    For a full explanation, see the documentation: https://www.elastic.co/guide/en/logstash/current/index.html

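    1.2 Overall configuration

    1.2.1 Configuration in the project

    Add the following to logback-spring.xml, the logback configuration file:

    Note the three fields in the encoder element: requestUrl, traceId and clientIp. They tell the encoder to add these three fields to every log entry, merged with logback's default output. The resulting output is: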
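
As a rough illustration of the merge described above, the following Python sketch builds one JSON log line that carries both the default fields and the three extra ones. The default field names are borrowed from the email template later in this article; treating them as the encoder's default output is an assumption, and every value here is hypothetical.

```python
import json

# Default fields: names taken from the email template later in this article
# (@timestamp, message, logger_name, level, level_value); values are made up.
default_fields = {
    "@timestamp": "2018-04-09T08:28:50.080Z",
    "message": "method=com.kccf.pc.controller.article.ArticleController.getArticleList cost 22 milliseconds",
    "logger_name": "com.kccf.pc.controller.article.ArticleController",
    "level": "INFO",
    "level_value": 20000,
}

# The three extra fields named in the encoder configuration; values are hypothetical.
custom_fields = {
    "requestUrl": "/article/list,method=GET",
    "traceId": "hypothetical-trace-id",
    "clientIp": "192.0.2.10",
}


def merge_log_fields(defaults, custom):
    """Return one log record containing both the default and the custom fields."""
    record = dict(defaults)
    record.update(custom)
    return record


log_line = json.dumps(merge_log_fields(default_fields, custom_fields))
print(log_line)
```

Because each line is a single JSON object, the Logstash file input can decode it with the json codec without any extra parsing rules.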

    1.2.2 Configuration for shipping logs to Elasticsearch

    The corresponding conf file is as follows:

    input {
      file {
        # Value used as the Elasticsearch index type
        type => "XXX_kccf_pc_info"
        # Input log files
        path => "/data/log/kccf_pc_info.*.log"
        start_position => "beginning"
        codec => json {
          charset => "UTF-8"
        }
      }
    }

    filter {
      if [type] == "XXX_kccf_pc_info" {
        # Heartbeat checks (HEAD requests) are dropped and never reach Elasticsearch
        if "method=HEAD" in [requestUrl] {
          drop {}
        }
        mutate {
          split => { "[requestUrl]" => "," }
        }
      }
      geoip {
        source => "ip"
        target => "geoip"
      }
    }

    filter {
      if [type] == "XXX_kccf_pc_info" {
        # Tag ERROR events for email unless the message matches an excluded pattern
        if [level] == "ERROR" and "PARAM_NO_PRIVILEGE" not in [message] and "ERROR-START" not in [message] {
          mutate {
            add_tag => "email"
          }
        }
      }
      geoip {
        source => "ip"
        target => "geoip"
      }
    }

    output {
      if [type] == "XXX_kccf_pc_info" {
        elasticsearch {
          hosts => "XXXXX"
          # Name of the Elasticsearch index
          index => "XXXX-%{+YYYY.MM.dd}"
          sniffing => false
          manage_template => false
          flush_size => 3000
          idle_flush_time => 5
          user => "logstash"
          password => "logstash"
          # ssl => true
          # ssl_certificate_verification => false
          # truststore => "/etc/logstash/truststore.jks"
          # truststore_password => changeit
        }
      }
      if "email" in [tags] and [type] == "XXX_kccf_pc_info" {
        # Send an email for each tagged error log entry
        email {
          port           => "587"
          address        => "smtp.XXXX.com"
          username       => "devops@XXXX.com"
          password       => "xxxxxx"
          authentication => "login"
          use_tls        => true
          from           => "devops@XXX.com"
          subject        => "Warning: you have an error on host 101.201.118.236 (TYJ1)"
          to             => "yjk@XXX.com"
          via            => "smtp"
          body           => "you have an error of kccf_pc_info!  server_ip:XXX\n ERROR time: %{@timestamp} +08:00 Hours ; \n ERROR message: %{message} ; \nLogger Name: %{logger_name}; \n Level:%{level} ; \n level_value:%{level_value}; \n Stack_trace : \n %{stack_trace}\n "
        }
      }
    }
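
The conditionals in this configuration can be read as a small decision procedure. The Python sketch below mirrors that logic for a single event so the routing is easier to follow; it is not Logstash code, the geoip step is left out, and the function name `process` and the sample events are hypothetical.

```python
LOG_TYPE = "XXX_kccf_pc_info"
EXCLUDED_MARKERS = ("PARAM_NO_PRIVILEGE", "ERROR-START")


def process(event):
    """Return (event, outputs); a dropped event is returned as (None, [])."""
    event = dict(event)  # work on a copy, leave the caller's dict untouched
    if event.get("type") != LOG_TYPE:
        return event, []

    # First filter block: drop heartbeat requests, then split requestUrl on commas.
    request_url = event.get("requestUrl", "")
    if "method=HEAD" in request_url:
        return None, []
    event["requestUrl"] = request_url.split(",")

    # Second filter block: tag ERROR events unless the message is excluded.
    message = event.get("message", "")
    tags = list(event.get("tags", []))
    if event.get("level") == "ERROR" and not any(m in message for m in EXCLUDED_MARKERS):
        tags.append("email")
    event["tags"] = tags

    # Output block: everything goes to Elasticsearch, tagged events also go to email.
    outputs = ["elasticsearch"]
    if "email" in tags:
        outputs.append("email")
    return event, outputs


heartbeat = {"type": LOG_TYPE, "requestUrl": "/health,method=HEAD", "level": "INFO", "message": "ping"}
failure = {"type": LOG_TYPE, "requestUrl": "/pay,method=POST", "level": "ERROR", "message": "timeout"}
print(process(heartbeat))
print(process(failure)[1])
```

Keeping the tagging in the filter stage and the sending in the output stage means the email condition is evaluated once and the output only has to check for the tag.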

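    2. Elasticsearch

    2.1 Details

    For details, see: https://www.elastic.co/guide/en/elasticsearch/reference/current/index.html

    In the Elasticsearch 5.x used here, a cluster can contain multiple indices (comparable to databases), and each index contains multiple types (tables). A type contains multiple documents (rows), and each document has multiple fields (columns).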
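
To make the analogy concrete, the short Python sketch below takes one search hit (copied from the alert record shown later in this article) and labels its parts with the database terms used above. The helper name `describe` is hypothetical.

```python
# One hit as returned by Elasticsearch; values copied from the alert record in section 5.
hit = {
    "_index": "cpcn-2018.01.17",
    "_type": "logs",
    "_id": "AWECHmRX1zYDH4X06oyZ",
    "_source": {
        "@version": "1",
        "host": "DESKTOP-7DN8E16",
        "@timestamp": "2018-01-17T03:16:41.677Z",
        "message": "qwe",
    },
}


def describe(hit):
    """Label the parts of a hit with their relational-database counterparts."""
    return {
        "database (index)": hit["_index"],
        "table (type)": hit["_type"],
        "row (document id)": hit["_id"],
        "columns (fields)": sorted(hit["_source"]),
    }


for label, value in describe(hit).items():
    print(label, "->", value)
```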

    2.2 Introduction to elasticsearch-head

    For download and details, see: https://github.com/mobz/elasticsearch-head

    It lets you browse all kinds of information on the Elasticsearch server.

    2.3 Search Guard plugin

    For download and details, see: https://github.com/floragunncom/search-guard

    As the names of its configuration files suggest, the main relationships are as follows:

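    2.4 Script fields

    For details, see: https://www.elastic.co/guide/en/elasticsearch/reference/5.6/search-request-script-fields.html

    A script field is computed dynamically by a script at query time instead of being defined in the mapping up front. It is used later when configuring the Kibana dashboard.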
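
As a minimal sketch of the idea, the Python snippet below assembles a search request body with one script field. The field name `message_length` and the script itself are hypothetical; the `inline` key matches the 5.x syntax used elsewhere in this article (later Elasticsearch versions call it `source`).

```python
import json


def script_field_request(field_name, painless_source):
    """Build a search body that returns one dynamically computed field per hit."""
    return {
        "query": {"match_all": {}},
        "script_fields": {
            field_name: {
                "script": {"lang": "painless", "inline": painless_source}
            }
        },
    }


# Hypothetical example: expose the length of each log message as a field.
body = script_field_request("message_length", "doc['message.keyword'].value.length()")
print(json.dumps(body, indent=2))
```

The body would be sent to the index's `_search` endpoint; nothing is written to the mapping, so the field exists only in the response.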

    2.5 Painless scripts

    Scripting language reference: https://www.elastic.co/guide/en/elasticsearch/reference/5.4/modules-scripting-painless.html

    3. Kibana

    3.1 Details

    For download and details, see: https://www.elastic.co/guide/en/kibana/current/index.html

    3.2 Search Guard plugin

    For the plugin download and documentation, see: https://github.com/floragunncom/search-guard-kibana-plugin

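    3.3 Sentinl email alerting plugin

    Plugin download and documentation: https://github.com/sirensolutions/sentinl/issues/137

    For configuration, see: http://blog.51cto.com/10546390/2051676

    Note: in the email configuration you must use the mail provider's authorization code, not the account password; otherwise authentication fails.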

    4. Log-monitoring dashboard configuration

    4.1 Basic Kibana log search

    For details, see: https://www.elastic.co/guide/en/kibana/5.4/search.html

    The query syntax supports AND, OR, NOT and range queries.
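
A few example queries in the Lucene query string syntax that the Kibana 5.4 search bar accepts are sketched below in Python. The field names come from the log fields mentioned in this article; the specific values and the helper `query_string_body` are hypothetical.

```python
def query_string_body(query):
    """Wrap a search-bar query in the equivalent Elasticsearch query_string request body."""
    return {"query": {"query_string": {"query": query}}}


# Example search-bar queries; field values are illustrative only.
examples = {
    "AND": "level:ERROR AND message:exception",
    "OR": "level:ERROR OR level:WARN",
    "NOT": 'level:ERROR AND NOT message:"PARAM_NO_PRIVILEGE"',
    "range": "level_value:[30000 TO 40000]",
}

for name, query in examples.items():
    print(name, "->", query_string_body(query))
```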

    Kibana queries can also use filters. Newer versions support this directly, but version 5.4, which we use, needs the following steps to add a filter:

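    4.2 Adding a script field to the index

    Script fields are needed because, for example, we want to extract the elapsed time from a log line like this one:

    method=com.kccf.pc.controller.article.ArticleController.getArticleList cost 22 milliseconds

    To pull the elapsed time out of such a line, a script extracts the value and stores it in a script field, which Visualize uses later.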
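
The extraction itself is simple string work. The Python sketch below shows one way to do it with a regular expression; it only illustrates the idea and is not the Painless script configured in Kibana. The function name `extract_cost_ms` is hypothetical.

```python
import re

# Matches "... cost <digits> milliseconds" at the end of a method-timing log line.
COST_PATTERN = re.compile(r"cost\s*(\d+)\s*milliseconds\s*$")


def extract_cost_ms(message):
    """Return the elapsed time in milliseconds, or None if the line is not a timing line."""
    if not message or not message.strip().startswith("method="):
        return None
    match = COST_PATTERN.search(message)
    return int(match.group(1)) if match else None


line = "method=com.kccf.pc.controller.article.ArticleController.getArticleList cost 22 milliseconds"
print(extract_cost_ms(line))
```

Returning None for non-matching lines keeps unrelated log entries out of any average or maximum computed on the field.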

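    4.3 Visualize configuration

    Click the plus sign to add a new visualization:

    Choose the visualization type, then choose the index to pull data from:

    Both the X and Y axes can be configured with multiple aggregations, as shown below: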

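    4.4 Dashboard configuration

    A dashboard displays several of the visualizations configured above in one place, as shown in the figure below:

    To add a dashboard:

    Place the visualizations on the dashboard: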

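    5. Sentinl email alert configuration

    5.1 Adding a watcher

    (1) General: how often the watcher runs, that is, the schedule on which the logs are scanned

    (2) Input: filters an Elasticsearch index or list of indices to produce the final input

    (3) Condition: the condition the filtered logs must satisfy before an alert is raised

    (4) Transform: transforms the result

    (5) Actions: what to do once the alert condition is met, usually sending an email notification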
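
The five parts above fit together as a small pipeline. The Python sketch below models one watcher run under that reading; it is a conceptual illustration, not Sentinl's implementation, and every function in it is a hypothetical stand-in. The General schedule is not modeled: it would simply decide how often `run_watcher` is called.

```python
def run_watcher(search, condition, transform, actions):
    """Run one watcher cycle and return the results of the triggered actions."""
    payload = search()                  # Input: query Elasticsearch
    if not condition(payload):          # Condition: decide whether to alert
        return []
    payload = transform(payload)        # Transform: reshape the result
    return [action(payload) for action in actions]  # Actions: e.g. send an email


# Stand-ins for illustration only; none of these are Sentinl APIs.
def fake_search():
    return {"hits": {"total": 2, "hits": [{"_source": {"message": "a"}},
                                          {"_source": {"message": "b"}}]}}


def more_than_one_hit(payload):
    return payload["hits"]["total"] > 1


def keep_messages(payload):
    return {"total": payload["hits"]["total"],
            "messages": [hit["_source"]["message"] for hit in payload["hits"]["hits"]]}


def fake_email(payload):
    return "Found %d Events: %s" % (payload["total"], ", ".join(payload["messages"]))


results = run_watcher(fake_search, more_than_one_hit, keep_messages, [fake_email])
print(results)
```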

    5.1.1 Input filter:

    The Input filter below selects records from the last hour whose elapsed time exceeds 20 milliseconds:

    {
      "search": {
        "request": {
          "index": [
            "indu_kccf_pc_info-*"
          ],
          "body": {
            "query": {
              "bool": {
                "must": [
                  {
                    "script": {
                      "script": {
                        "lang": "painless",
                        "params": {
                          "costTimeThrehold": 20
                        },
                        "inline": "if(null != doc['message.keyword'].value && doc['message.keyword'].value.trim().startsWith(\"method=\") && doc['message.keyword'].value.trim().endsWith(\"milliseconds\")){Number costTimeNum = NumberFormat.getInstance().parse(doc['message.keyword'].value.substring(doc['message.keyword'].value.indexOf('cost')+4,doc['message.keyword'].value.lastIndexOf('milliseconds')-1).trim());if(costTimeNum.longValue() > params.costTimeThrehold) true; else false;}else false;"
                      }
                    }
                  },
                  {
                    "range": {
                      "@timestamp": {
                        "gte": "now-1h",
                        "lte": "now",
                        "format": "epoch_millis"
                      }
                    }
                  }
                ]
              }
            }
          }
        }
      }
    }

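    Note: the inline script extracts the elapsed time from log lines such as "method=com.kccf.pc.controller.article.ArticleController.getArticleList cost 22 milliseconds" and compares it with the costTimeThrehold parameter. The range clause restricts the search to the given time window.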
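
Because the inline script is packed into a single JSON string, it is hard to read. The Python sketch below restates its logic step by step; it is a reading aid rather than a replacement for the Painless script. Two deviations are assumptions of this sketch: a line without the word "cost" returns False instead of raising an error, and the number is parsed with `float` rather than Java's `NumberFormat`.

```python
def exceeds_threshold(message, threshold_ms):
    """Mirror the inline script: True if the logged elapsed time is above the threshold."""
    if message is None:
        return False
    text = message.strip()
    if not (text.startswith("method=") and text.endswith("milliseconds")):
        return False
    cost_at = message.find("cost")
    if cost_at == -1:  # deviation: the original script does not guard this case
        return False
    start = cost_at + 4                            # first character after "cost"
    end = message.rfind("milliseconds") - 1        # stop before the separating space
    cost = int(float(message[start:end].strip()))  # like longValue(): drop any fraction
    return cost > threshold_ms


line = "method=com.kccf.pc.controller.article.ArticleController.getArticleList cost 22 milliseconds"
print(exceeds_threshold(line, 20))
```

Passing the threshold as a parameter, as the watcher does with `params`, lets the same script be reused with a different limit without editing the script text.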

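    5.1.2 Condition configuration:

    The Condition evaluates the results returned by the Input filter; if they satisfy the configured condition, the subsequent actions are triggered:

    The condition above raises an alert when the Input filter returns more than 1 hit. You can write this script yourself, because the Sentinl plugin keeps its data in its own Elasticsearch indices, watcher and watcher_alarms-<date>, as the following elasticsearch-head screenshots show:

    (1) Documents of type sentinl-watcher in the watcher index: the JSON form of the watcher configured in the Kibana UI.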

    {
      "_index": "watcher",
      "_type": "sentinl-watcher",
      "_id": "38eljz92mycwqi00ian55ng66r-0tqtq7q46valazkmg8yn20ggb9-im815vceg9gap6akbvjejnhfr",
      "_version": 32,
      "_score": 1,
      "_source": {
        "title": "所有",
        "disable": false,
        "report": false,
        "trigger": {
          "schedule": {
            "later": "every 1 secs"
          }
        },
        "input": {
          "search": {
            "request": {
              "index": [
                "cpcn-*"
              ],
              "body": {}
            }
          }
        },
        "condition": {
          "script": {
            "script": "payload.hits.total > 0"
          }
        },
        "actions": {
          "error报警": {
            "throttle_period": "0h0m1s",
            "email": {
              "to": "yjk@XXX.com",
              "from": "devops@XXX.com",
              "subject": "报警",
              "body": "event:{{payload.hits.total}} errorMsgConent:{{payload.hits.hits}}"
            }
          }
        }
      }
    }

    (2) Documents of type sentinl-script in the watcher index: the body of the Input configured in Kibana.

    {
      "_index": "watcher",
      "_type": "sentinl-script",
      "_id": "8mmkkevvdr0g0sz3mmojn61or-s6hfac0a3y36ier49ju1sjor-keeuakonk0v51cntohp1ll3di",
      "_version": 1,
      "_score": 1,
      "_source": {
        "description": "input",
        "title": "error",
        "body": "{
          "search": {
            "request": {
              "index": ["cpcn-*"],
              "body": {
                "query": {
                  "bool": {
                    "must": [
                      {
                        "wildcard": {
                          "message": "*exception*"
                        }
                      },
                      {
                        "wildcard": {
                          "message": "*error*"
                        }
                      },
                      {
                        "range": {
                          "@timestamp": {
                            "gte": "now-1h",
                            "lte": "now",
                            "format": "epoch_millis"
                          }
                        }
                      }
                    ],
                    "must_not": []
                  }
                }
              }
            }
          }
        }"
      }
    }

    (3) Alert records are stored in the watcher_alarms-<date> index, and their type is the name of the configured action:

    {
      "_index": "watcher_alarms-2018.04.09",
      "_type": "email_admin",
      "_id": "AWKphaIhjWbDkGCDylsw",
      "_version": 1,
      "_score": 1,
      "_source": {
        "@timestamp": "2018-04-09T08:28:50.080Z",
        "watcher": "watcher_title",
        "level": "high",
        "message": "Found 177 Events",
        "action": "email_admin",
        "payload": {
          "took": 4,
          "timed_out": false,
          "_shards": {
            "total": 25,
            "successful": 25,
            "skipped": 0,
            "failed": 0
          },
          "hits": {
            "total": 177,
            "max_score": 1,
            "hits": [{
              "_index": "cpcn-2018.01.17",
              "_type": "logs",
              "_id": "AWECHmRX1zYDH4X06oyZ",
              "_score": 1,
              "_source": {
                "@version": "1",
                "host": "DESKTOP-7DN8E16",
                "@timestamp": "2018-01-17T03:16:41.677Z",
                "message": "qwe"
              }
            }]
          }
        },
        "report": false
      }
    }

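    Note that the alert record above contains exactly the information the Condition needs. The condition script below reads payload.hits.total, which is the same value that appears under payload in that record:

    {
      "script": {
        "script": "payload.hits.total > 1"
      }
    }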

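    5.1.3 Actions configuration:

    Actions define what happens when the alert condition is met. The configuration below sends an email notification when the condition is satisfied:

    Note: the body content is taken from the documents in the watcher_alarms-<date> index whose type is the action name, here "方法耗时超过阈值报警" ("method elapsed time exceeds threshold alert").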
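
The email body in a watcher uses placeholders such as {{payload.hits.total}}, which are filled in from the payload when the action fires. The Python sketch below imitates that substitution for simple dotted paths; it is a toy renderer written for illustration, not the template engine Sentinl uses, and the function name `render` is hypothetical.

```python
import re

PLACEHOLDER = re.compile(r"\{\{\s*([\w.]+)\s*\}\}")


def render(template, context):
    """Replace each {{a.b.c}} placeholder with the value found at that path in context."""
    def lookup(match):
        value = context
        for key in match.group(1).split("."):
            value = value[key]  # walk one level down per path segment
        return str(value)
    return PLACEHOLDER.sub(lookup, template)


# Payload trimmed to the part the template reads; the count is taken from the alert record above.
payload = {"hits": {"total": 177, "hits": []}}
print(render("event:{{payload.hits.total}}", {"payload": payload}))
```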


        Link to this article: https://www.haomeiwen.com/subject/piczhftx.html