Kubernetes monitoring with Prometheus + Grafana


Author: jinnzy | Published 2018-03-19 09:55

    What is Prometheus

    Prometheus is an open-source monitoring and alerting solution from SoundCloud. Development began in 2012, and since the project was open-sourced on GitHub in 2015 it has attracted 9k+ stars and adoption by many large companies. In 2016 Prometheus became the second project to join the CNCF, after Kubernetes. As a new-generation open-source solution, many of its ideas align closely with Google's SRE practices.

    Architecture:


    (Prometheus architecture diagram)

    Installing Prometheus:
    1. Create the ConfigMap for Prometheus and Alertmanager (Prometheus's alerting component)
    The YAML manifests used below have been pushed to GitHub; clone them if you need them.
    cat prometheus-cm.yaml

    apiVersion: v1
    kind: Namespace
    metadata:
      name: kube-ops
    ---
    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: prometheus-config
      namespace: kube-ops
    data:
      prometheus.yml: |
        global:
          scrape_interval: 30s
          scrape_timeout: 30s
        rule_files:
        - /etc/prometheus/rules.yml
        alerting:
          alertmanagers:
            - static_configs:
              - targets: ["localhost:9093"]
        scrape_configs:
        - job_name: 'prometheus'
          static_configs:
            - targets: ['localhost:9090']
        - job_name: 'kubernetes-apiservers'
          kubernetes_sd_configs:
          - role: endpoints
          scheme: https
          tls_config:
            ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
          bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
          relabel_configs:
          - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
            action: keep
            regex: default;kubernetes;https
        - job_name: 'kubernetes-nodes'
          scheme: https
          tls_config:
            ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
          bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
          kubernetes_sd_configs:
          - role: node
          relabel_configs:
          - action: labelmap
            regex: __meta_kubernetes_node_label_(.+)
          - target_label: __address__
            replacement: kubernetes.default.svc:443
          - source_labels: [__meta_kubernetes_node_name]
            regex: (.+)
            target_label: __metrics_path__
            replacement: /api/v1/nodes/${1}/proxy/metrics
        - job_name: 'kubernetes-cadvisor'
          scheme: https
          tls_config:
            ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
          bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
          kubernetes_sd_configs:
          - role: node
          relabel_configs:
          - action: labelmap
            regex: __meta_kubernetes_node_label_(.+)
          - target_label: __address__
            replacement: kubernetes.default.svc:443
          - source_labels: [__meta_kubernetes_node_name]
            regex: (.+)
            target_label: __metrics_path__
            replacement: /api/v1/nodes/${1}/proxy/metrics/cadvisor
        - job_name: 'kubernetes-node-exporter'
          scheme: http
          tls_config:
            ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
          bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
          kubernetes_sd_configs:
          - role: node
          relabel_configs:
          - action: labelmap
            regex: __meta_kubernetes_node_label_(.+)
          - source_labels: [__meta_kubernetes_role]
            action: replace
            target_label: kubernetes_role
          - source_labels: [__address__]
            regex: '(.*):10250'
            replacement: '${1}:31672'
            target_label: __address__
        - job_name: 'kubernetes-service-endpoints'
          kubernetes_sd_configs:
          - role: endpoints
          relabel_configs:
          - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
            action: keep
            regex: true
          - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scheme]
            action: replace
            target_label: __scheme__
            regex: (https?)
          - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
            action: replace
            target_label: __metrics_path__
            regex: (.+)
          - source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
            action: replace
            target_label: __address__
            regex: ([^:]+)(?::\d+)?;(\d+)
            replacement: $1:$2
          - action: labelmap
            regex: __meta_kubernetes_service_label_(.+)
          - source_labels: [__meta_kubernetes_namespace]
            action: replace
            target_label: kubernetes_namespace
          - source_labels: [__meta_kubernetes_service_name]
            action: replace
            target_label: kubernetes_name
        - job_name: 'kubernetes-services'
          metrics_path: /probe
          params:
            module: [http_2xx]
          kubernetes_sd_configs:
          - role: service
          relabel_configs:
          - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_probe]
            action: keep
            regex: true
          - source_labels: [__address__]
            target_label: __param_target
          - target_label: __address__
            replacement: blackbox-exporter.example.com:9115
          - source_labels: [__param_target]
            target_label: instance
          - action: labelmap
            regex: __meta_kubernetes_service_label_(.+)
          - source_labels: [__meta_kubernetes_namespace]
            target_label: kubernetes_namespace
          - source_labels: [__meta_kubernetes_service_name]
            target_label: kubernetes_name
      rules.yml: |
        groups:
        - name: test-rule
          rules:
          - alert: NodeFilesystemUsage
            expr: (node_filesystem_size{device="rootfs"} - node_filesystem_free{device="rootfs"}) / node_filesystem_size{device="rootfs"} * 100 > 80
            for: 2m
            labels:
              team: node
            annotations:
              summary: "{{$labels.instance}}: High Filesystem usage detected"
              description: "{{$labels.instance}}: Filesystem usage is above 80% (current value is: {{ $value }}"
          - alert: NodeMemoryUsage
            expr: (node_memory_MemTotal - (node_memory_MemFree+node_memory_Buffers+node_memory_Cached )) / node_memory_MemTotal * 100 > 80
            for: 2m
            labels:
              team: node
            annotations:
              summary: "{{$labels.instance}}: High Memory usage detected"
              description: "{{$labels.instance}}: Memory usage is above 80% (current value is: {{ $value }}"
          - alert: NodeCPUUsage
            expr: (100 - (avg by (instance) (irate(node_cpu{job="kubernetes-node-exporter",mode="idle"}[5m])) * 100)) > 80
            for: 2m
            labels:
              team: node
            annotations:
              summary: "{{$labels.instance}}: High CPU usage detected"
              description: "{{$labels.instance}}: CPU usage is above 80% (current value is: {{ $value }}"
    ---
    kind: ConfigMap
    apiVersion: v1
    metadata:
      name: alertmanager
      namespace: kube-ops
    data:
      config.yml: |-
        global:
          resolve_timeout: 5m
        route:
          receiver: webhook
          group_wait: 30s
          group_interval: 5m
          repeat_interval: 4h
          group_by: [alertname]
          routes:
          - receiver: webhook
            group_wait: 10s
            match:
              team: node
        receivers:
        - name: webhook
          webhook_configs:
          - url: 'http://apollo/hooks/dingtalk/'
            send_resolved: true
          - url: 'http://apollo/hooks/prome/'
            send_resolved: true
    
    

    kubectl apply -f prometheus-cm.yaml
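
    The 'kubernetes-service-endpoints' job above only keeps Services annotated with prometheus.io/scrape: "true"; the other prometheus.io/* annotations optionally override the scheme, metrics path and port. A minimal sketch of annotating a hypothetical Service named my-app so that it gets scraped:

    apiVersion: v1
    kind: Service
    metadata:
      name: my-app                      # hypothetical example Service
      namespace: default
      annotations:
        prometheus.io/scrape: "true"    # matched by the keep rule above
        prometheus.io/path: "/metrics"  # optional, /metrics is the default
        prometheus.io/port: "8080"      # rewrites the scrape address to this port
    spec:
      selector:
        app: my-app
      ports:
      - port: 8080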

    2. Deploy node-exporter (it must run on every node, so a DaemonSet is used)
    vim node-exporter.yaml

    ---
    apiVersion: extensions/v1beta1
    kind: DaemonSet
    metadata:
      name: node-exporter
      namespace: kube-ops
      labels:
        k8s-app: node-exporter
    spec:
      template:
        metadata:
          labels:
            k8s-app: node-exporter
        spec:
          containers:
          - image: prom/node-exporter
            name: node-exporter
            ports:
            - containerPort: 9100
              protocol: TCP
              name: http
    
    ---
    apiVersion: v1
    kind: Service
    metadata:
      labels:
        k8s-app: node-exporter
      name: node-exporter
      namespace: kube-ops
    spec:
      ports:
      - name: http
        port: 9100
        nodePort: 31672
        protocol: TCP
      type: NodePort
      selector:
        k8s-app: node-exporter
    

    kubectl apply -f node-exporter.yaml
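
    As a quick sanity check (with <node-ip> standing in for the IP of any cluster node), confirm the exporter pods are running and that metrics are reachable on NodePort 31672, the port the relabel rule in the Prometheus config rewrites node-exporter targets to:

    kubectl get pods -n kube-ops -l k8s-app=node-exporter -o wide
    curl -s http://<node-ip>:31672/metrics | head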

    3. Install Prometheus + Alertmanager (note: the data volume here is mounted from a CephFS PV/PVC, which you have to create yourself; PV/PVC templates can be found in the GitHub repo linked above)
    vim prometheus-deployment.yaml

    apiVersion: extensions/v1beta1
    kind: Deployment
    metadata:
      labels:
        k8s-app: prometheus
      name: prometheus
      namespace: kube-ops
    spec:
      replicas: 1
      template:
        metadata:
          labels:
            k8s-app: prometheus
        spec:
          serviceAccountName: prometheus
          securityContext:
            runAsUser: 0
          containers:
          - image: prom/prometheus:v2.0.0
            name: prometheus
            command:
            - "/bin/prometheus"
            args:
            - "--config.file=/etc/prometheus/prometheus.yml"
            - "--storage.tsdb.path=/prometheus"
            - "--storage.tsdb.retention=24h"
            ports:
            - containerPort: 9090
              protocol: TCP
              name: http
            volumeMounts:
            - mountPath: "/prometheus"
              name: data
            - mountPath: "/etc/prometheus"
              name: config-volume
            resources:
              requests:
                cpu: 100m
                memory: 100Mi
              limits:
                cpu: 200m
                memory: 1Gi
          - image: quay.io/prometheus/alertmanager:v0.12.0
            name: alertmanager
            args:
            - "-config.file=/etc/alertmanager/config.yml"
            - "-storage.path=/alertmanager"
            ports:
            - containerPort: 9093
              protocol: TCP
              name: http
            volumeMounts:
            - name: alertmanager-config-volume
              mountPath: /etc/alertmanager
            resources:
              requests:
                cpu: 50m
                memory: 50Mi
              limits:
                cpu: 200m
                memory: 200Mi
          volumes:
          - name: data
            persistentVolumeClaim:
              claimName: prometheus-pvc
          - configMap:
              name: prometheus-config
            name: config-volume
          - name: alertmanager-config-volume
            configMap:
              name: alertmanager
    

    kubectl apply -f prometheus-deployment.yaml
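
    Note that the Deployment runs under serviceAccountName: prometheus, which must exist and have read access to the API objects the scrape jobs discover (nodes, endpoints, services, pods). The actual templates are in the GitHub repo mentioned above; a minimal RBAC sketch would look like this:

    apiVersion: v1
    kind: ServiceAccount
    metadata:
      name: prometheus
      namespace: kube-ops
    ---
    apiVersion: rbac.authorization.k8s.io/v1
    kind: ClusterRole
    metadata:
      name: prometheus
    rules:
    - apiGroups: [""]
      resources: ["nodes", "nodes/proxy", "services", "endpoints", "pods"]
      verbs: ["get", "list", "watch"]
    - nonResourceURLs: ["/metrics"]
      verbs: ["get"]
    ---
    apiVersion: rbac.authorization.k8s.io/v1
    kind: ClusterRoleBinding
    metadata:
      name: prometheus
    roleRef:
      apiGroup: rbac.authorization.k8s.io
      kind: ClusterRole
      name: prometheus
    subjects:
    - kind: ServiceAccount
      name: prometheus
      namespace: kube-ops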

    4. Expose the Prometheus port (it can also be exposed with an Ingress rule, as you prefer; the Grafana setup below uses an Ingress)
    vim prometheus-svc.yaml

    apiVersion: v1
    kind: Service
    metadata:
      name: prometheus
      namespace: kube-ops
      labels:
        k8s-app: prometheus
    spec:
      selector:
        k8s-app: prometheus
      type: NodePort
      ports:
        - name: web
          port: 9090
          targetPort: http
    
    

    kubectl apply -f prometheus-svc.yaml
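
    The Service is assigned a random NodePort; check which one and then open the Prometheus UI to confirm all scrape targets are UP (the port below is only an example):

    kubectl get svc prometheus -n kube-ops
    # e.g. 9090:30090/TCP -> browse to http://<node-ip>:30090/targets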

    5. Install Grafana (Prometheus's built-in graphing is not very polished, so it is paired with the much more capable visualization tool Grafana)
    Installing Grafana uses a CephFS PV/PVC and an Ingress.
    vim grafana.yaml

    apiVersion: v1
    kind: Service
    metadata:
      labels:
        kubernetes.io/cluster-service: 'true'
        kubernetes.io/name: grafana
      name: grafana
      namespace: kube-ops
    spec:
      ports:
      - port: 3000
        targetPort: 3000
      selector:
        k8s-app: grafana
    ---
    apiVersion: extensions/v1beta1
    kind: Ingress
    metadata:
      labels:
        kubernetes.io/cluster-service: 'true'
        kubernetes.io/name: grafana
      name: grafana-ingress
      namespace: kube-ops
    spec:
      rules:
      - host: grafana.io
        http:
          paths:
          - path: /
            backend:
              serviceName: grafana
              servicePort: 3000
    ---
    apiVersion: v1
    kind: PersistentVolume
    metadata:
      name: prometheus-pv
      namespace: kube-ops
    spec:
      capacity:
        storage: 50Gi
      accessModes:
        - ReadWriteMany
      cephfs:
        monitors:
          - 192.168.0.231:6789
          - 192.168.0.242:6789
          - 192.168.0.211:6789
        path: /data/system/grafana
        user: admin
        secretRef:
          name: ceph-secret
        readOnly: false
      persistentVolumeReclaimPolicy: Recycle
    ---
    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: prometheus-pvc
      namespace: kube-ops
    spec:
      accessModes:
        - ReadWriteMany
      resources:
        requests:
          storage: 50Gi
    ---
    apiVersion: extensions/v1beta1
    kind: Deployment
    metadata:
      name: grafana
      namespace: kube-ops
    spec:
      replicas: 1
      template:
        metadata:
          labels:
            task: monitoring
            k8s-app: grafana
        spec:
          containers:
          - name: grafana
            image: gcr.io/google_containers/heapster-grafana-amd64:v4.4.3
            ports:
            - containerPort: 3000
              protocol: TCP
            volumeMounts:
            - mountPath: /var
              name: grafana
              subPath: grafana/data
            - mountPath: /ssl
              name: ssl
            resources:
              limits:
                cpu: 200m
                memory: 200Mi
              requests:
                cpu: 100m
                memory: 100Mi
            env:
            - name: INFLUXDB_HOST
              value: influxdb.kube-system
            - name: GF_SERVER_HTTP_PORT
              value: "3000"
            - name: GF_AUTH_BASIC_ENABLED
              value: "true"
            - name: GF_AUTH_ANONYMOUS_ENABLED
              value: "false"
            - name: GF_SERVER_ROOT_URL
              value: /
            - name: GF_SMTP_ENABLED
              value: "true"
            - name: GF_ALERTING_ENABLED
              value: "true"
            - name: GF_ALERTING_EXECUTE_ALERTS
              value: "true"
            readinessProbe:
              httpGet:
                path: /login
                port: 3000
              initialDelaySeconds: 30
              timeoutSeconds: 2
          volumes:
          - name: ssl
            hostPath:
              path: /etc/ssl/certs
          - name: grafana
            persistentVolumeClaim:
              claimName: prometheus-pvc
    

    kubectl apply -f grafana.yaml
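
    Before opening the UI, it is worth confirming that Grafana started and that the Ingress was created:

    kubectl get pods -n kube-ops -l k8s-app=grafana
    kubectl get ingress grafana-ingress -n kube-ops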

    The Ingress here uses grafana.io as the hostname, so you need to resolve it in your local hosts file to a node IP (if there is no load balancer in front).
    Open the following page and fill in the data source information.

    (Screenshot: Grafana data source configuration page)
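
    For reference, a typical setup might look like the sketch below (the node IP is only an example; use any of your own nodes). Because Grafana and Prometheus both run in the kube-ops namespace, the Prometheus Service name resolves directly inside the cluster:

    # /etc/hosts on your workstation
    192.168.0.231  grafana.io

    # Grafana data source settings (Add data source -> Prometheus):
    #   Name:   prometheus
    #   Type:   Prometheus
    #   URL:    http://prometheus:9090   # the Service created in step 4
    #   Access: proxy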
    Next, set up the dashboards; the two below are recommended:
    https://grafana.com/dashboards/1621
    https://grafana.com/dashboards/315
    Dashboard 315 is used here.
    The result is shown in the figure below.
    (Screenshot: Grafana dashboard 315)

      Reader comments

      • ALTHE: name: INFLUXDB_HOST
        value: influxdb.kube-system
        A question for the author: since Prometheus is used as the data source, are these two parameters optional?
        jinnzy: They are just environment variables passed to the pod.
