17. Prometheus in Practice

Author: 負笈在线 | Published 2022-06-10 13:10

    AlertmanagerConfig:
    https://github.com/prometheus-operator/prometheus-operator/blob/master/example/user-guides/alerting/alertmanager-config-example.yaml
    https://github.com/prometheus-operator/prometheus-operator/blob/master/Documentation/api.md#alertmanagerconfig
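
    For reference, a minimal AlertmanagerConfig based on the upstream example linked above could look like the following sketch; the name, namespace, and webhook URL are placeholders, and the alertmanagerConfig label must match whatever alertmanagerConfigSelector is set on the Alertmanager resource:

    apiVersion: monitoring.coreos.com/v1alpha1
    kind: AlertmanagerConfig
    metadata:
      name: config-example
      namespace: monitoring
      labels:
        alertmanagerConfig: example   # must be selected by the Alertmanager's alertmanagerConfigSelector
    spec:
      route:
        groupBy: ["job"]
        groupWait: 30s
        groupInterval: 5m
        repeatInterval: 12h
        receiver: "webhook"
      receivers:
      - name: "webhook"
        webhookConfigs:
        - url: "http://example.com/"   # placeholder webhook endpoint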

    (1) PrometheusRule

    The alerting rules configured by default can be listed with the following command:

    # kubectl get prometheusrule -n monitoring
    NAME                              AGE
    alertmanager-main-rules           19d
    kube-prometheus-rules             19d
    kube-state-metrics-rules          19d
    kubernetes-monitoring-rules       19d
    node-exporter-rules               19d
    prometheus-k8s-prometheus-rules   19d
    prometheus-operator-rules         19d
    

    You can also inspect the detailed configuration of a specific rule with -oyaml:

    # kubectl get prometheusrule -n monitoring node-exporter-rules -oyaml
    apiVersion: monitoring.coreos.com/v1
    kind: PrometheusRule
    ...
    spec:
      groups:
      - name: node-exporter
        rules:
        - alert: NodeFilesystemSpaceFillingUp
          annotations:
            description: Filesystem on {{ $labels.device }} at {{ $labels.instance }}
              has only {{ printf "%.2f" $value }}% available space left and is filling
              up.
            runbook_url: https://github.com/prometheus-operator/kube-prometheus/wiki/nodefilesystemspacefillingup
            summary: Filesystem is predicted to run out of space within the next 24 hours.
          expr: |
            (
              node_filesystem_avail_bytes{job="node-exporter",fstype!=""} / node_filesystem_size_bytes{job="node-exporter",fstype!=""} * 100 < 40
            and
              predict_linear(node_filesystem_avail_bytes{job="node-exporter",fstype!=""}[6h], 24*60*60) < 0
            and
              node_filesystem_readonly{job="node-exporter",fstype!=""} == 0
            )
          for: 1h
          labels:
            severity: warning
    

    - alert: the name of the alerting rule
    - annotations: annotations attached to the alert, usually the human-readable alert message
    - expr: the alerting expression (PromQL)
    - for: how long the condition must keep evaluating true before the alert fires
    - labels: labels attached to the alert, used for alert routing
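
    Putting these fields together, a minimal PrometheusRule skeleton looks like the sketch below; the rule name, expression, and threshold are placeholders, and the prometheus: k8s / role: alert-rules labels assume the default kube-prometheus ruleSelector:

    apiVersion: monitoring.coreos.com/v1
    kind: PrometheusRule
    metadata:
      name: example-rules
      namespace: monitoring
      labels:
        prometheus: k8s      # assumed: matched by the default kube-prometheus ruleSelector
        role: alert-rules
    spec:
      groups:
      - name: example.rules
        rules:
        - alert: ExampleAlert                      # alert: name of the alerting rule
          expr: up{job="example"} == 0             # expr: PromQL alerting expression (placeholder)
          for: 5m                                  # for: condition must hold this long before firing
          labels:
            severity: warning                      # labels: used for routing in Alertmanager
          annotations:
            summary: Instance {{ $labels.instance }} is down   # annotations: alert message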

    (2) Domain Access Latency Alerts

    Suppose we need to monitor domain access latency and fire an alert when the latency exceeds 1 second. A PrometheusRule for this can be created as follows:

    # cat blackbox.yaml
    apiVersion: monitoring.coreos.com/v1
    kind: PrometheusRule
    metadata:
      labels:
        app.kubernetes.io/component: exporter
        app.kubernetes.io/name: blackbox-exporter
        prometheus: k8s
        role: alert-rules
      name: blackbox
      namespace: monitoring
    spec:
      groups:
      - name: blackbox-exporter
        rules:
        - alert: DomainAccessDelayExceeds1s
          annotations:
            description: "Domain {{ $labels.instance }} probe latency is greater than 1 second; current latency: {{ $value }}"
            summary: Domain probe access latency exceeds 1 second
          expr: sum(probe_http_duration_seconds{job=~"blackbox"}) by (instance) > 1
          for: 1m
          labels:
            severity: warning
            type: blackbox
    

    Create and view the PrometheusRule:

    # kubectl create -f blackbox.yaml
    prometheusrule.monitoring.coreos.com/blackbox created
    # kubectl get -f blackbox.yaml
    NAME       AGE
    blackbox   65s
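
    To confirm that Prometheus has actually loaded the new rule, the rules API can be queried as well; the sketch below assumes the default kube-prometheus service name prometheus-k8s in the monitoring namespace:

    # Port-forward the Prometheus service (service name assumed from a default kube-prometheus install)
    kubectl port-forward -n monitoring svc/prometheus-k8s 9090:9090 &
    # Check whether the blackbox-exporter rule group shows up in the loaded rules
    curl -s http://127.0.0.1:9090/api/v1/rules | grep -o '"name":"blackbox-exporter"'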
    

    The rule can then also be seen in the Prometheus Web UI:



    If any domain's probe latency exceeds 1 second, the alert is triggered, as shown below:



    Since the alert routing does not yet match the blackbox-monitoring labels, the alert is sent to the default receiver, i.e. the email receiver:

    Next, the alert can be routed to specific recipients according to business needs. For example, to send the domain-probe alerts to WeChat, change the routing configuration as follows (excerpt):

    - match:
        type: blackbox
      receiver: "wechat-ops"
      repeat_interval: 10m
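
    For this route to work, the wechat-ops receiver must also be defined in the Alertmanager configuration. A minimal sketch is shown below; the corp_id, agent_id, api_secret, and to_party values are placeholders for your own WeChat Work credentials:

    receivers:
    - name: "wechat-ops"
      wechat_configs:
      - send_resolved: true
        corp_id: "your-corp-id"        # placeholder: WeChat Work corporation ID
        agent_id: "1000002"            # placeholder: application agent ID
        api_secret: "your-api-secret"  # placeholder: application secret
        to_party: "2"                  # placeholder: department ID to notify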
    

    The alert will then be received on the WeChat side:

