美文网首页
IoT 边缘集群基于 Kubernetes Events 的告警

IoT 边缘集群基于 Kubernetes Events 的告警

作者: 东风微鸣 | 来源:发表于2023-02-15 10:01 被阅读0次

    背景

    边缘集群(基于 树莓派 + K3S) 需要实现基本的告警功能。

    边缘集群限制

    1. CPU/内存/存储 资源紧张,无法支撑至少需要 2GB 以上内存和大量存储的基于 Prometheus 的完整监控体系方案(即使是基于 Prometheus Agent, 也无法支撑) (需要避免额外的存储和计算资源消耗)
    2. 网络条件,无法支撑监控体系,因为监控体系一般都需要每 1min 定时(或每时每刻)传输数据,且数据量不小;
      1. 存在 5G 收费网络的情况,且访问的目的端地址需要开通权限,且按照流量收费,且因为 5G 网络条件,网络传输能力受限,且不稳定(可能会在一段时间内离线);

    关键需求

    总结下来,关键需求如下:

    1. 实现对边缘集群异常的及时告警,需要知道边缘集群正在发生的异常情况;
    2. 网络:网络条件情况较差,网络流量少,只只能开通极少数目的端地址,可以容忍网络不稳定(一段时间内离线)的情况;
    3. 资源:需要尽量避免额外的存储和计算资源消耗

    方案

    综上所诉,采用如下方案实现:

    基于 Kubernetes Events 的告警通知

    架构图

    kubernetes-events-arch

    技术方案规划

    1. 从 Kubernetes 的各项资源收集 Events, 如:
      1. pod
      2. node
      3. kubelet
      4. crd
      5. ...
    2. 通过 kubernetes-event-exporter 组件来实现对 Kubernetes Events 的收集;
    3. 只筛选 Warning 级别 Events 供告警通知(后续,条件可以进一步定义)
    4. 告警通过 飞书 webhook 等通信工具进行发送(后续,发送渠道可以增加)

    实施步骤

    手动方式:

    在边缘集群上,执行如下操作:

    1. 创建 roles

    如下:

    cat << _EOF_ | kubectl apply -f -
    ---
    apiVersion: v1
    kind: Namespace
    metadata:
      name: monitoring
    ---
    apiVersion: rbac.authorization.k8s.io/v1
    kind: ClusterRole
    metadata:
      name: event-exporter-extra
    rules:
      - apiGroups:
          - ""
        resources:
          - nodes
        verbs:
          - get
          - list
          - watch
    ---
    apiVersion: v1
    kind: ServiceAccount
    metadata:
      namespace: monitoring
      name: event-exporter
    ---
    apiVersion: rbac.authorization.k8s.io/v1
    kind: ClusterRoleBinding
    metadata:
      name: event-exporter
    roleRef:
      apiGroup: rbac.authorization.k8s.io
      kind: ClusterRole
      name: view
    subjects:
      - kind: ServiceAccount
        namespace: monitoring
        name: event-exporter
    ---
    apiVersion: rbac.authorization.k8s.io/v1
    kind: ClusterRoleBinding
    metadata:
      name: event-exporter-extra
    roleRef:
      apiGroup: rbac.authorization.k8s.io
      kind: ClusterRole
      name: event-exporter-extra
    subjects:
      - kind: ServiceAccount
        namespace: kube-event-export
        name: event-exporter
    _EOF_
    

    2. 创建 kubernetes-event-exporter config

    如下:

    cat << _EOF_ | kubectl apply -f -
    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: event-exporter-cfg
      namespace: monitoring
    data:
      config.yaml: |
        logLevel: error
        logFormat: json
        route:
          routes:
            - match:
                - receiver: "dump"      
            - drop:
                - type: "Normal"
              match:
                - receiver: "feishu"                     
        receivers:
          - name: "dump"
            stdout: {}
          - name: "feishu"
            webhook:
              endpoint: "https://open.feishu.cn/open-apis/bot/v2/hook/..."
              headers:
                Content-Type: application/json
              layout:
                msg_type: interactive
                card:
                  config:
                    wide_screen_mode: true
                    enable_forward: true
                  header:
                    title:
                      tag: plain_text
                      content: XXX IoT K3S 集群告警
                    template: red
                  elements:
                    - tag: div
                      text: 
                        tag: lark_md
                        content: "**EventType:**  {{ .Type }}\n**EventKind:**  {{ .InvolvedObject.Kind }}\n**EventReason:**  {{ .Reason }}\n**EventTime:**  {{ .LastTimestamp }}\n**EventMessage:**  {{ .Message }}"
          
    _EOF_
    

    🐾 注意:

    • endpoint: "https://open.feishu.cn/open-apis/bot/v2/hook/..." 按需修改为对应的 webhook endpoint, ❌切记勿对外公布!!!
    • content: XXX IoT K3S 集群告警: 按需调整为方便快速识别的名称,如:"家里测试 K3S 集群告警"

    3. 创建 Deployment

    cat << _EOF_ | kubectl apply -f -
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: event-exporter
      namespace: monitoring
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: event-exporter
          version: v1
      template:
        metadata:
          labels:
            app: event-exporter
            version: v1
        spec:
          volumes:
            - name: cfg
              configMap:
                name: event-exporter-cfg
                defaultMode: 420
            - name: localtime
              hostPath:
                path: /etc/localtime
                type: ''
            - name: zoneinfo
              hostPath:
                path: /usr/share/zoneinfo
                type: ''
          containers:
            - name: event-exporter
              image: ghcr.io/opsgenie/kubernetes-event-exporter:v0.11
              args:
                - '-conf=/data/config.yaml'
              env:
                - name: TZ
                  value: Asia/Shanghai
              volumeMounts:
                - name: cfg
                  mountPath: /data
                - name: localtime
                  readOnly: true
                  mountPath: /etc/localtime
                - name: zoneinfo
                  readOnly: true
                  mountPath: /usr/share/zoneinfo
              imagePullPolicy: IfNotPresent
          serviceAccount: event-exporter
          affinity:
            nodeAffinity:
              preferredDuringSchedulingIgnoredDuringExecution:
                - weight: 100
                  preference:
                    matchExpressions:
                      - key: node-role.kubernetes.io/controlplane
                        operator: In
                        values:
                          - 'true'
                - weight: 100
                  preference:
                    matchExpressions:
                      - key: node-role.kubernetes.io/control-plane
                        operator: In
                        values:
                          - 'true'
                - weight: 100
                  preference:
                    matchExpressions:
                      - key: node-role.kubernetes.io/master
                        operator: In
                        values:
                          - 'true'    
          tolerations:
            - key: node-role.kubernetes.io/controlplane
              value: 'true'
              effect: NoSchedule
            - key: node-role.kubernetes.io/control-plane
              operator: Exists
              effect: NoSchedule
            - key: node-role.kubernetes.io/master
              operator: Exists
              effect: NoSchedule      
    _EOF_
    

    📝 说明:

    1. event-exporter-cfg 相关配置,是用于加载以 ConfigMap 形式保存的配置文件;
    2. localtime zoneinfo TZ 相关配置,是用于修改该 pod 的时区为Asia/Shanghai, 以使得最终显示的通知效果为 CST 时区;
    3. affinity tolerations 相关配置,是为了确保:无论如何,优先调度到 master node 上去,按需调整,此处是因为 master 往往在边缘集群中作为网关存在,配置较高,且在线时间较长;

    自动化部署

    效果:安装 K3S 时就自动部署

    在 K3S server 所在节点,/var/lib/rancher/k3s/server/manifests/ 目录(如果没有该目录就先创建)下,创建 event-exporter.yaml

    ---
    apiVersion: v1
    kind: Namespace
    metadata:
      name: monitoring
    ---
    apiVersion: rbac.authorization.k8s.io/v1
    kind: ClusterRole
    metadata:
      name: event-exporter-extra
    rules:
      - apiGroups:
          - ""
        resources:
          - nodes
        verbs:
          - get
          - list
          - watch
    ---
    apiVersion: v1
    kind: ServiceAccount
    metadata:
      namespace: monitoring
      name: event-exporter
    ---
    apiVersion: rbac.authorization.k8s.io/v1
    kind: ClusterRoleBinding
    metadata:
      name: event-exporter
    roleRef:
      apiGroup: rbac.authorization.k8s.io
      kind: ClusterRole
      name: view
    subjects:
      - kind: ServiceAccount
        namespace: monitoring
        name: event-exporter
    ---
    apiVersion: rbac.authorization.k8s.io/v1
    kind: ClusterRoleBinding
    metadata:
      name: event-exporter-extra
    roleRef:
      apiGroup: rbac.authorization.k8s.io
      kind: ClusterRole
      name: event-exporter-extra
    subjects:
      - kind: ServiceAccount
        namespace: kube-event-export
        name: event-exporter
    ---
    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: event-exporter-cfg
      namespace: monitoring
    data:
      config.yaml: |
        logLevel: error
        logFormat: json
        route:
          routes:
            - match:
                - receiver: "dump"      
            - drop:
                - type: "Normal"
              match:
                - receiver: "feishu"                     
        receivers:
          - name: "dump"
            stdout: {}
          - name: "feishu"
            webhook:
              endpoint: "https://open.feishu.cn/open-apis/bot/v2/hook/dc4fd384-996b-4d20-87cf-45b3518869ec"
              headers:
                Content-Type: application/json
              layout:
                msg_type: interactive
                card:
                  config:
                    wide_screen_mode: true
                    enable_forward: true
                  header:
                    title:
                      tag: plain_text
                      content: xxxK3S集群告警
                    template: red
                  elements:
                    - tag: div
                      text: 
                        tag: lark_md
                        content: "**EventType:**  {{ .Type }}\n**EventKind:**  {{ .InvolvedObject.Kind }}\n**EventReason:**  {{ .Reason }}\n**EventTime:**  {{ .LastTimestamp }}\n**EventMessage:**  {{ .Message }}"
    ---
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: event-exporter
      namespace: monitoring
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: event-exporter
          version: v1
      template:
        metadata:
          labels:
            app: event-exporter
            version: v1
        spec:
          volumes:
            - name: cfg
              configMap:
                name: event-exporter-cfg
                defaultMode: 420
            - name: localtime
              hostPath:
                path: /etc/localtime
                type: ''
            - name: zoneinfo
              hostPath:
                path: /usr/share/zoneinfo
                type: ''
          containers:
            - name: event-exporter
              image: ghcr.io/opsgenie/kubernetes-event-exporter:v0.11
              args:
                - '-conf=/data/config.yaml'
              env:
                - name: TZ
                  value: Asia/Shanghai
              volumeMounts:
                - name: cfg
                  mountPath: /data
                - name: localtime
                  readOnly: true
                  mountPath: /etc/localtime
                - name: zoneinfo
                  readOnly: true
                  mountPath: /usr/share/zoneinfo
              imagePullPolicy: IfNotPresent
          serviceAccount: event-exporter
          affinity:
            nodeAffinity:
              preferredDuringSchedulingIgnoredDuringExecution:
                - weight: 100
                  preference:
                    matchExpressions:
                      - key: node-role.kubernetes.io/controlplane
                        operator: In
                        values:
                          - 'true'
                - weight: 100
                  preference:
                    matchExpressions:
                      - key: node-role.kubernetes.io/control-plane
                        operator: In
                        values:
                          - 'true'
                - weight: 100
                  preference:
                    matchExpressions:
                      - key: node-role.kubernetes.io/master
                        operator: In
                        values:
                          - 'true'    
          tolerations:
            - key: node-role.kubernetes.io/controlplane
              value: 'true'
              effect: NoSchedule
            - key: node-role.kubernetes.io/control-plane
              operator: Exists
              effect: NoSchedule
            - key: node-role.kubernetes.io/master
              operator: Exists
              effect: NoSchedule  
    

    之后启动 K3S 就会自动部署。

    📚️Reference:
    自动部署 manifests 和 Helm charts | Rancher 文档

    最终效果

    如下图:

    image-20220413122040530

    📚️参考文档

    三人行, 必有我师; 知识共享, 天下为公. 本文由东风微鸣技术博客 EWhisper.cn 编写.

    相关文章

      网友评论

          本文标题:IoT 边缘集群基于 Kubernetes Events 的告警

          本文链接:https://www.haomeiwen.com/subject/nfwykdtx.html