Configuring Prometheus Alerting Rules and DingTalk Alerts

Author: 祁恩达 | Published 2019-08-15 09:20

    I. Configure alerting rules

    1. Set the path where rule files are stored

    $ vim prometheus-configmap.yaml
    Add the following configuration:
        rule_files:
        - /etc/config/rules/*.rules
    


    2. Re-apply prometheus-configmap.yaml so that the change takes effect.

    $ kubectl apply -f prometheus-configmap.yaml 
    configmap/prometheus-config configured
    

    3. Write the alerting rules
    Here we write a few common alerting rules for testing (prometheus-rules.yaml)

    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: prometheus-rules
      namespace: kube-system
    data:
      general.rules: |
        groups:
        - name: general.rules
          rules:
          - alert: InstanceDown
            expr: up == 0
            for: 2m
            labels:
              severity: error
            annotations:
              summary: "Instance {{ $labels.instance }} is down"
              description: "{{ $labels.instance }} of job {{ $labels.job }} has been down for more than 2 minutes."
      node.rules: |
        groups:
        - name: node.rules
          rules:
          - alert: NodeFilesystemUsage
            # The 1% threshold is low enough for the alert to fire during testing.
            expr: 100 - (node_filesystem_free_bytes{fstype=~"ext4|xfs"} / node_filesystem_size_bytes{fstype=~"ext4|xfs"} * 100) > 1
            for: 2m
            labels:
              severity: warning
            annotations:
              summary: "{{$labels.instance}}: high usage on partition {{$labels.mountpoint }}"
              description: "{{$labels.instance}}: usage on partition {{$labels.mountpoint }} is above 1% (current value: {{ $value }})"
          - alert: NodeMemoryUsage
            expr: 100 - (node_memory_MemFree_bytes+node_memory_Cached_bytes+node_memory_Buffers_bytes) / node_memory_MemTotal_bytes * 100 > 80
            for: 2m
            labels:
              severity: warning
            annotations:
              summary: "{{$labels.instance}}: high memory usage"
              description: "{{$labels.instance}}: memory usage is above 80% (current value: {{ $value }})"
          - alert: NodeCPUUsage
            expr: 100 - (avg(irate(node_cpu_seconds_total{mode="idle"}[5m])) by (instance) * 100) > 80
            for: 2m
            labels:
              severity: warning
            annotations:
              summary: "{{$labels.instance}}: high CPU usage"
              description: "{{$labels.instance}}: CPU usage is above 80% (current value: {{ $value }})"
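
Before applying the ConfigMap, it can help to check the rule syntax locally. The following is a minimal sketch, assuming promtool (shipped with Prometheus) is installed on the workstation. It copies only the InstanceDown group out of the ConfigMap wrapper into a scratch file; the file path is an assumption made for illustration.

```shell
#!/bin/sh
# Write a standalone copy of the InstanceDown group (the body of the
# general.rules key, without the ConfigMap wrapper) to a scratch file.
rules_file="${TMPDIR:-/tmp}/general.rules"
cat > "$rules_file" <<'EOF'
groups:
- name: general.rules
  rules:
  - alert: InstanceDown
    expr: up == 0
    for: 2m
    labels:
      severity: error
EOF

# Run the syntax check only when promtool is available on this machine.
if command -v promtool >/dev/null 2>&1; then
  promtool check rules "$rules_file"
else
  echo "promtool not found; skipping syntax check of $rules_file"
fi
```

The same check can be repeated for node.rules by copying that group into its own file.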
    

    4. Apply prometheus-rules.yaml

    $ kubectl apply -f prometheus-rules.yaml 
    configmap/prometheus-rules created
    

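    5. Mount the ConfigMap into the container's rules directory: edit prometheus-statefulset.yaml and add the prometheus-rules entries under volumeMounts and volumes, as shown below.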

    $ vim prometheus-statefulset.yaml
          volumeMounts:
            - name: config-volume
              mountPath: /etc/config
            - name: prometheus-data
              mountPath: /data
              subPath: ""
            - name: prometheus-rules
              mountPath: /etc/config/rules
    
          terminationGracePeriodSeconds: 300
          volumes:
            - name: config-volume
              configMap:
                name: prometheus-config
            - name: prometheus-rules
              configMap:
                name: prometheus-rules
    
    Note: the configMap name here must match the name of the ConfigMap created from prometheus-rules.yaml

    6. Re-apply prometheus-statefulset.yaml and check that the pods are running

    $ kubectl apply -f prometheus-statefulset.yaml
    $ kubectl get pods -n kube-system
    NAME                                         READY   STATUS    RESTARTS   AGE
    alertmanager-6b5bbd5bd4-g9mpd                2/2     Running   0          66m
    coredns-55f46dd959-9kspv                     1/1     Running   3          35d
    coredns-55f46dd959-l5vww                     1/1     Running   0          35d
    grafana-0                                    1/1     Running   0          2d
    kube-state-metrics-6cf969f79b-29f2r          1/1     Running   0          5d23h
    kubernetes-dashboard-ccd98cd4c-jzlbs         1/1     Running   0          34d
    node-exporter-7x9zl                          1/1     Running   0          18h
    node-exporter-ksslf                          1/1     Running   0          18h
    prometheus-0                                 2/2     Running   0          30m
    

    7. Open the Rules page in the Prometheus UI; the new rules are now loaded

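    II. Configure DingTalk alerts

    1. Register a DingTalk account -> Robot Management -> Custom (connect a custom service via webhook) -> Add -> copy the webhook

    Once the group robot is configured, note down its webhook address; it is needed later when configuring the DingTalk alert plugin. The format is: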
    https://oapi.dingtalk.com/robot/send?access_token=xxxxxxxx
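
The webhook can be tried out on its own before it is wired into Alertmanager. The sketch below assumes curl is installed and that the robot accepts plain-text messages without extra security settings (keyword or signature checks would need additional handling). The helper names build_payload and send_test_message are made up for illustration, and the token is the same placeholder as above.

```shell
#!/bin/sh
# Placeholder token, as in the URL format above; substitute the real one.
WEBHOOK_URL="https://oapi.dingtalk.com/robot/send?access_token=xxxxxxxx"

# Build the JSON body of a plain-text robot message.
# The text is inserted verbatim, so it must not contain quotes or backslashes.
build_payload() {
  printf '{"msgtype":"text","text":{"content":"%s"}}' "$1"
}

# POST the message to the webhook (hypothetical helper; not called here).
send_test_message() {
  curl -sS -X POST "$WEBHOOK_URL" \
    -H 'Content-Type: application/json' \
    -d "$(build_payload "$1")"
}

# Usage: send_test_message "Prometheus alert test"
```
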
    2. Create the DingTalk alert plugin (dingtalk-webhook.yaml) and replace access_token=xxxxxx in the file with the robot token obtained in the previous step
    $ vim dingtalk-webhook.yaml
    ---
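    # apps/v1 replaces extensions/v1beta1, which is no longer served for Deployments since Kubernetes 1.16
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      labels:
        run: dingtalk
      name: webhook-dingtalk
      namespace: monitoring
    spec:
      replicas: 1
      # apps/v1 requires an explicit selector that matches the pod template labels
      selector:
        matchLabels:
          run: dingtalk
      template: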
        metadata:
          labels:
            run: dingtalk
        spec:
          containers:
          - name: dingtalk
            image: timonwong/prometheus-webhook-dingtalk:v0.3.0
            imagePullPolicy: IfNotPresent
            # After setting up the custom robot in the DingTalk group, replace xxxxxx below with the real access_token
            args:
              - --ding.profile=webhook1=https://oapi.dingtalk.com/robot/send?access_token=xxxxxx
            ports:
            - containerPort: 8060
              protocol: TCP
    
    ---
    apiVersion: v1
    kind: Service
    metadata:
      labels:
        run: dingtalk
      name: webhook-dingtalk
      namespace: monitoring
    spec:
      ports:
      - port: 8060
        protocol: TCP
        targetPort: 8060
      selector:
        run: dingtalk
      sessionAffinity: None
    

    3. Apply dingtalk-webhook.yaml (the monitoring namespace must already exist)

    $ kubectl apply -f dingtalk-webhook.yaml
    

    4. Edit the Alertmanager configuration in alertmanager-configmap.yaml, re-apply it, and then test alert delivery

    $ vim alertmanager-configmap.yaml 
    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: alertmanager-config
      namespace: kube-system
      labels:
        kubernetes.io/cluster-service: "true"
        addonmanager.kubernetes.io/mode: EnsureExists
    data:
      alertmanager.yml: |
        global: null
        route:
          group_interval: 5m
          group_wait: 10s
          receiver: dingtalk
          repeat_interval: 10m
        # All receivers must be listed under a single receivers key.
        receivers:
        - name: default-receiver
        - name: dingtalk
          webhook_configs:
          - send_resolved: true
            url: http://webhook-dingtalk.monitoring.svc.cluster.local:8060/dingtalk/webhook1/send
    
    Note: the url can use the Service's in-cluster address directly, in the form servicename.namespace.svc.cluster.local
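
To make the pattern explicit, here is a small sketch that assembles the webhook URL from its parts. The helper name dingtalk_url is hypothetical. The last path segment before /send is taken to be the profile name given to --ding.profile in the Deployment (webhook1 in this article).

```shell
#!/bin/sh
# Compose the in-cluster webhook URL from its parts (hypothetical helper).
dingtalk_url() {
  svc=$1      # Service name
  ns=$2       # namespace the Service lives in
  port=$3     # Service port
  profile=$4  # profile name passed to --ding.profile
  printf 'http://%s.%s.svc.cluster.local:%s/dingtalk/%s/send\n' \
    "$svc" "$ns" "$port" "$profile"
}

# Values taken from the manifests in this article.
dingtalk_url webhook-dingtalk monitoring 8060 webhook1
```

With the values used here, the output matches the url configured in alertmanager.yml.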

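    5. Test that DingTalk receives alerts

    ①. Modify a rule in prometheus-rules.yaml so that its condition is met
    ②. Check the alert state on the Prometheus Alerts page (PENDING or FIRING)
    PENDING: the rule expression is true, but the "for" duration has not yet elapsed, so the alert has not been sent.
    FIRING: the expression has stayed true for the whole "for" duration and the alert has been sent. (For details, check the logs of the webhook-dingtalk pod.)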
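
The relationship between the two states and the "for" field of a rule can be sketched as follows. This is a simplified model for illustration only, not Prometheus's actual implementation; the function name alert_state and its arguments are assumptions.

```shell
#!/bin/sh
# Simplified model of the alert states.
#   $1: seconds the rule expression has been continuously true (0 = not true)
#   $2: the rule's "for" duration in seconds
alert_state() {
  true_for=$1
  for_secs=$2
  if [ "$true_for" -le 0 ]; then
    echo inactive   # condition not met
  elif [ "$true_for" -lt "$for_secs" ]; then
    echo pending    # condition met, still waiting out the "for" duration
  else
    echo firing     # condition held long enough; the alert is sent
  fi
}

# With "for: 2m" (120 seconds), as in the rules above:
alert_state 60 120
alert_state 180 120
```

Under this model, a rule with "for: 2m" stays pending for the first two minutes after its expression becomes true, which is why a freshly triggered alert does not reach DingTalk immediately.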


    Article link: https://www.haomeiwen.com/subject/zgkajctx.html