A Detailed Tutorial: Highly Available Kubernetes Monitoring with Prometheus and Thanos

Author: Rancher Labs | Published 2020-09-10 11:01

    This article is reproduced from Rancher Labs.

    Introduction

    Why a highly available Prometheus is necessary

    Kubernetes adoption has grown many times over in the past few years, and it is clearly the first choice for container orchestration. At the same time, Prometheus is regarded as an excellent choice for monitoring both containerized and non-containerized workloads. Monitoring is a major concern for any infrastructure, and we should make sure our monitoring setup is highly available and scales with the needs of a growing infrastructure, especially when adopting Kubernetes.

    Therefore, today we will deploy a clustered Prometheus setup that is not only resilient to node failures but also ensures proper data archiving for later reference. Our setup is also very scalable, to the point that we can span multiple Kubernetes clusters under the same monitoring umbrella.

    The current approach

    Most Prometheus deployments use pods with persistent volumes and scale Prometheus via federation. However, not all data can be aggregated through federation, and as you add servers you often need a separate mechanism to manage the Prometheus configuration.

    The solution

    Thanos aims to solve the problems above. With the help of Thanos, we can not only run multiple replicas of Prometheus and deduplicate data across them, but also archive the data in long-term storage such as GCS or S3.

    Implementation

    Thanos architecture

    [Image: Thanos architecture diagram]

    Image source: https://thanos.io/quick-tutorial.md/

    Thanos consists of the following components:

    • Thanos Sidecar: the main component that runs alongside Prometheus. It reads and archives data to the object store. It also manages Prometheus's configuration and lifecycle. To distinguish between Prometheus instances, the sidecar injects external labels into the Prometheus configuration. It can run queries against the Prometheus server's PromQL interface, and it also listens for the Thanos gRPC protocol, translating queries between gRPC and REST.
    • Thanos Store: implements the Store API on top of the historical data in an object storage bucket. It acts mainly as an API gateway, so it does not need a large amount of local disk space. It joins a Thanos cluster on startup and advertises the data it can access. It keeps a small amount of information about all remote blocks on local disk and keeps it in sync with the bucket. This data is generally safe to delete across restarts, at the cost of a longer startup time.
    • Thanos Query: listens on HTTP and translates queries into the Thanos gRPC format. It aggregates query results from different sources and can read data from Sidecar and Store components. In an HA setup, it even deduplicates the query results.

    Run-time deduplication of HA groups

    Prometheus is stateful and does not allow replication of its database, which means that running multiple Prometheus replicas for high availability is not easy to use. Simple load balancing does not work: after a crash, for example, a replica may come back up, but queries against it will show a gap for the period it was down. The second replica may be up at that time, but it could be down at another moment (for example, during a rolling restart), so load balancing on top of these replicas will not work properly.

    • The Thanos Querier instead pulls the data from both replicas and deduplicates those signals, filling the gaps (if any) for the Querier's consumers.

    • The Thanos Compact component applies the compaction procedure of the Prometheus 2.0 storage engine to block data stored in the object store. It is generally not semantically concurrency-safe and must be deployed as a singleton against a bucket. It is also responsible for downsampling the data: 5-minute downsampling after 40 hours, and 1-hour downsampling after 10 days.

    • The Thanos Ruler basically does the same thing as Prometheus rules; the only difference is that it can communicate with Thanos components.

    Configuration

    Prerequisites

    To fully follow this tutorial, you will need the following:

    1. Working knowledge of Kubernetes and kubectl.

    2. A running Kubernetes cluster with at least 3 nodes (this demo uses a GKE cluster).

    3. An Ingress Controller and Ingress objects (this demo uses the Nginx Ingress Controller). This is not mandatory, but it is strongly recommended in order to reduce the number of external endpoints that need to be created.

    4. Credentials for the Thanos components to access the object store (a GCS bucket in this case).

    5. Create 2 GCS buckets and name them prometheus-long-term and thanos-ruler.

    6. Create a service account with the Storage Object Admin role.

    7. Download the key file as JSON credentials and name it thanos-gcs-credentials.json.

    8. Create a Kubernetes secret from the credentials:

    kubectl create secret generic thanos-gcs-credentials --from-file=thanos-gcs-credentials.json
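
    The buckets, service account, and key from steps 5-7 can also be created from the command line. The following is only a sketch under assumptions: the gcloud/gsutil CLIs are authenticated, and the project ID my-project and service-account name thanos-gcs are placeholders you should replace with your own values.

    # Create the two buckets used in this setup (placeholder project ID: my-project)
    gsutil mb -p my-project gs://prometheus-long-term
    gsutil mb -p my-project gs://thanos-ruler
    # Create a service account and grant it the Storage Object Admin role
    gcloud iam service-accounts create thanos-gcs --project my-project
    gcloud projects add-iam-policy-binding my-project \
      --member "serviceAccount:thanos-gcs@my-project.iam.gserviceaccount.com" \
      --role "roles/storage.objectAdmin"
    # Download the key as thanos-gcs-credentials.json
    gcloud iam service-accounts keys create thanos-gcs-credentials.json \
      --iam-account thanos-gcs@my-project.iam.gserviceaccount.com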

    Deploying the components

    Deploy the Prometheus ServiceAccount, ClusterRole and ClusterRoleBinding

    apiVersion: v1
    kind: Namespace
    metadata:
      name: monitoring
    ---
    apiVersion: v1
    kind: ServiceAccount
    metadata:
      name: monitoring
      namespace: monitoring
    ---
    apiVersion: rbac.authorization.k8s.io/v1beta1
    kind: ClusterRole
    metadata:
      name: monitoring
      namespace: monitoring
    rules:
    - apiGroups: [""]
      resources:
      - nodes
      - nodes/proxy
      - services
      - endpoints
      - pods
      verbs: ["get", "list", "watch"]
    - apiGroups: [""]
      resources:
      - configmaps
      verbs: ["get"]
    - nonResourceURLs: ["/metrics"]
      verbs: ["get"]
    ---
    apiVersion: rbac.authorization.k8s.io/v1beta1
    kind: ClusterRoleBinding
    metadata:
      name: monitoring
    subjects:
      - kind: ServiceAccount
        name: monitoring
        namespace: monitoring
    roleRef:
      kind: ClusterRole
      name: monitoring
      apiGroup: rbac.authorization.k8s.io
    ---
    

    The manifest above creates the monitoring namespace as well as the ServiceAccount, ClusterRole and ClusterRoleBinding needed by Prometheus.
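
    Throughout this article, each manifest can be saved to its own file and applied with kubectl; the filename below is just an example:

    kubectl apply -f prometheus-rbac.yaml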

    Deploy the Prometheus configuration ConfigMap

    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: prometheus-server-conf
      labels:
        name: prometheus-server-conf
      namespace: monitoring
    data:
      prometheus.yaml.tmpl: |-
        global:
          scrape_interval: 5s
          evaluation_interval: 5s
          external_labels:
            cluster: prometheus-ha
            # Each Prometheus has to have unique labels.
            replica: $(POD_NAME)
    
        rule_files:
          - /etc/prometheus/rules/*rules.yaml
    
        alerting:
    
          # We want our alerts to be deduplicated
          # from different replicas.
          alert_relabel_configs:
          - regex: replica
            action: labeldrop
    
          alertmanagers:
            - scheme: http
              path_prefix: /
              static_configs:
                - targets: ['alertmanager:9093']
    
        scrape_configs:
        - job_name: kubernetes-nodes-cadvisor
          scrape_interval: 10s
          scrape_timeout: 10s
          scheme: https
          tls_config:
            ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
          bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
          kubernetes_sd_configs:
            - role: node
          relabel_configs:
            - action: labelmap
              regex: __meta_kubernetes_node_label_(.+)
            # Only for Kubernetes ^1.7.3.
            # See: https://github.com/prometheus/prometheus/issues/2916
            - target_label: __address__
              replacement: kubernetes.default.svc:443
            - source_labels: [__meta_kubernetes_node_name]
              regex: (.+)
              target_label: __metrics_path__
              replacement: /api/v1/nodes/${1}/proxy/metrics/cadvisor
          metric_relabel_configs:
            - action: replace
              source_labels: [id]
              regex: '^/machine\.slice/machine-rkt\\x2d([^\\]+)\\.+/([^/]+)\.service$'
              target_label: rkt_container_name
              replacement: '${2}-${1}'
            - action: replace
              source_labels: [id]
              regex: '^/system\.slice/(.+)\.service$'
              target_label: systemd_service_name
              replacement: '${1}'
    
        - job_name: 'kubernetes-pods'
          kubernetes_sd_configs:
            - role: pod
          relabel_configs:
            - action: labelmap
              regex: __meta_kubernetes_pod_label_(.+)
            - source_labels: [__meta_kubernetes_namespace]
              action: replace
              target_label: kubernetes_namespace
            - source_labels: [__meta_kubernetes_pod_name]
              action: replace
              target_label: kubernetes_pod_name
            - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
              action: keep
              regex: true
            - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scheme]
              action: replace
              target_label: __scheme__
              regex: (https?)
            - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
              action: replace
              target_label: __metrics_path__
              regex: (.+)
            - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
              action: replace
              target_label: __address__
              regex: ([^:]+)(?::\d+)?;(\d+)
              replacement: $1:$2
    
    
        - job_name: 'kubernetes-apiservers'
          kubernetes_sd_configs:
            - role: endpoints
          scheme: https 
          tls_config:
            ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
          bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
          relabel_configs:
            - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
              action: keep
              regex: default;kubernetes;https
    
        - job_name: 'kubernetes-service-endpoints'
          kubernetes_sd_configs:
            - role: endpoints
          relabel_configs:
            - action: labelmap
              regex: __meta_kubernetes_service_label_(.+)
            - source_labels: [__meta_kubernetes_namespace]
              action: replace
              target_label: kubernetes_namespace
            - source_labels: [__meta_kubernetes_service_name]
              action: replace
              target_label: kubernetes_name
            - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
              action: keep
              regex: true
            - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scheme]
              action: replace
              target_label: __scheme__
              regex: (https?)
            - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
              action: replace
              target_label: __metrics_path__
              regex: (.+)
            - source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
              action: replace
              target_label: __address__
              regex: ([^:]+)(?::\d+)?;(\d+)
              replacement: $1:$2
    

    The ConfigMap above creates the Prometheus configuration file template. The template will be read by the Thanos sidecar component, which generates the actual configuration file; that file is in turn consumed by the Prometheus container running in the same pod. It is extremely important to add the external_labels section to the configuration so that the Querier can deduplicate data based on it.
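
    For instance, once the sidecar's reloader substitutes the POD_NAME environment variable, the rendered file at /etc/prometheus-shared/prometheus.yaml for the first replica should contain something like the following (a sketch, not the full file):

    global:
      external_labels:
        cluster: prometheus-ha
        replica: prometheus-0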

    Deploy the Prometheus rules ConfigMap

    This creates our alerting rules, which will be forwarded to Alertmanager so it can send notifications.

    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: prometheus-rules
      labels:
        name: prometheus-rules
      namespace: monitoring
    data:
      alert-rules.yaml: |-
        groups:
          - name: Deployment
            rules:
            - alert: Deployment at 0 Replicas
              annotations:
                summary: Deployment {{$labels.deployment}} in {{$labels.namespace}} is currently having no pods running
              expr: |
                sum(kube_deployment_status_replicas{pod_template_hash=""}) by (deployment,namespace)  < 1
              for: 1m
              labels:
                team: devops
    
            - alert: HPA Scaling Limited  
              annotations: 
                summary: HPA named {{$labels.hpa}} in {{$labels.namespace}} namespace has reached scaling limited state
              expr: | 
                (sum(kube_hpa_status_condition{condition="ScalingLimited",status="true"}) by (hpa,namespace)) == 1
              for: 1m
              labels: 
                team: devops
    
            - alert: HPA at MaxCapacity 
              annotations: 
                summary: HPA named {{$labels.hpa}} in {{$labels.namespace}} namespace is running at Max Capacity
              expr: | 
                ((sum(kube_hpa_spec_max_replicas) by (hpa,namespace)) - (sum(kube_hpa_status_current_replicas) by (hpa,namespace))) == 0
              for: 1m
              labels: 
                team: devops
    
          - name: Pods
            rules:
            - alert: Container restarted
              annotations:
                summary: Container named {{$labels.container}} in {{$labels.pod}} in {{$labels.namespace}} was restarted
              expr: |
                sum(increase(kube_pod_container_status_restarts_total{namespace!="kube-system",pod_template_hash=""}[1m])) by (pod,namespace,container) > 0
              for: 0m
              labels:
                team: dev
    
            - alert: High Memory Usage of Container 
              annotations: 
                summary: Container named {{$labels.container}} in {{$labels.pod}} in {{$labels.namespace}} is using more than 75% of Memory Limit
              expr: | 
                ((( sum(container_memory_usage_bytes{image!="",container_name!="POD", namespace!="kube-system"}) by (namespace,container_name,pod_name)  / sum(container_spec_memory_limit_bytes{image!="",container_name!="POD",namespace!="kube-system"}) by (namespace,container_name,pod_name) ) * 100 ) < +Inf ) > 75
              for: 5m
              labels: 
                team: dev
    
            - alert: High CPU Usage of Container 
              annotations: 
                summary: Container named {{$labels.container}} in {{$labels.pod}} in {{$labels.namespace}} is using more than 75% of CPU Limit
              expr: | 
                ((sum(irate(container_cpu_usage_seconds_total{image!="",container_name!="POD", namespace!="kube-system"}[30s])) by (namespace,container_name,pod_name) / sum(container_spec_cpu_quota{image!="",container_name!="POD", namespace!="kube-system"} / container_spec_cpu_period{image!="",container_name!="POD", namespace!="kube-system"}) by (namespace,container_name,pod_name) ) * 100)  > 75
              for: 5m
              labels: 
                team: dev
    
          - name: Nodes
            rules:
            - alert: High Node Memory Usage
              annotations:
                summary: Node {{$labels.kubernetes_io_hostname}} has more than 80% memory used. Plan Capcity
              expr: |
                (sum (container_memory_working_set_bytes{id="/",container_name!="POD"}) by (kubernetes_io_hostname) / sum (machine_memory_bytes{}) by (kubernetes_io_hostname) * 100) > 80
              for: 5m
              labels:
                team: devops
    
            - alert: High Node CPU Usage
              annotations:
                summary: Node {{$labels.kubernetes_io_hostname}} has more than 80% allocatable cpu used. Plan Capacity.
              expr: |
                (sum(rate(container_cpu_usage_seconds_total{id="/", container_name!="POD"}[1m])) by (kubernetes_io_hostname) / sum(machine_cpu_cores) by (kubernetes_io_hostname)  * 100) > 80
              for: 5m
              labels:
                team: devops
    
            - alert: High Node Disk Usage
              annotations:
                summary: Node {{$labels.kubernetes_io_hostname}} has more than 85% disk used. Plan Capacity.
              expr: |
                (sum(container_fs_usage_bytes{device=~"^/dev/[sv]d[a-z][1-9]$",id="/",container_name!="POD"}) by (kubernetes_io_hostname) / sum(container_fs_limit_bytes{container_name!="POD",device=~"^/dev/[sv]d[a-z][1-9]$",id="/"}) by (kubernetes_io_hostname)) * 100 > 85
              for: 5m
              labels:
                team: devops
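
    Before applying the ConfigMap, the rule files can be sanity-checked locally with promtool, which ships with Prometheus (a sketch, assuming you have copied the body of alert-rules.yaml into a standalone file):

    promtool check rules alert-rules.yaml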
    

    Deploy the Prometheus StatefulSet

    apiVersion: storage.k8s.io/v1beta1
    kind: StorageClass
    metadata:
      name: fast
      namespace: monitoring
    provisioner: kubernetes.io/gce-pd
    allowVolumeExpansion: true
    ---
    apiVersion: apps/v1beta1
    kind: StatefulSet
    metadata:
      name: prometheus
      namespace: monitoring
    spec:
      replicas: 3
      serviceName: prometheus-service
      template:
        metadata:
          labels:
            app: prometheus
            thanos-store-api: "true"
        spec:
          serviceAccountName: monitoring
          containers:
            - name: prometheus
              image: prom/prometheus:v2.4.3
              args:
                - "--config.file=/etc/prometheus-shared/prometheus.yaml"
                - "--storage.tsdb.path=/prometheus/"
                - "--web.enable-lifecycle"
                - "--storage.tsdb.no-lockfile"
                - "--storage.tsdb.min-block-duration=2h"
                - "--storage.tsdb.max-block-duration=2h"
              ports:
                - name: prometheus
                  containerPort: 9090
              volumeMounts:
                - name: prometheus-storage
                  mountPath: /prometheus/
                - name: prometheus-config-shared
                  mountPath: /etc/prometheus-shared/
                - name: prometheus-rules
                  mountPath: /etc/prometheus/rules
            - name: thanos
              image: quay.io/thanos/thanos:v0.8.0
              args:
                - "sidecar"
                - "--log.level=debug"
                - "--tsdb.path=/prometheus"
                - "--prometheus.url=http://127.0.0.1:9090"
                - "--objstore.config={type: GCS, config: {bucket: prometheus-long-term}}"
                - "--reloader.config-file=/etc/prometheus/prometheus.yaml.tmpl"
                - "--reloader.config-envsubst-file=/etc/prometheus-shared/prometheus.yaml"
                - "--reloader.rule-dir=/etc/prometheus/rules/"
              env:
                - name: POD_NAME
                  valueFrom:
                    fieldRef:
                      fieldPath: metadata.name
                - name : GOOGLE_APPLICATION_CREDENTIALS
                  value: /etc/secret/thanos-gcs-credentials.json
              ports:
                - name: http-sidecar
                  containerPort: 10902
                - name: grpc
                  containerPort: 10901
              livenessProbe:
                  httpGet:
                    port: 10902
                    path: /-/healthy
              readinessProbe:
                httpGet:
                  port: 10902
                  path: /-/ready
              volumeMounts:
                - name: prometheus-storage
                  mountPath: /prometheus
                - name: prometheus-config-shared
                  mountPath: /etc/prometheus-shared/
                - name: prometheus-config
                  mountPath: /etc/prometheus
                - name: prometheus-rules
                  mountPath: /etc/prometheus/rules
                - name: thanos-gcs-credentials
                  mountPath: /etc/secret
                  readOnly: false
          securityContext:
            fsGroup: 2000
            runAsNonRoot: true
            runAsUser: 1000
          volumes:
            - name: prometheus-config
              configMap:
                defaultMode: 420
                name: prometheus-server-conf
            - name: prometheus-config-shared
              emptyDir: {}
            - name: prometheus-rules
              configMap:
                name: prometheus-rules
            - name: thanos-gcs-credentials
              secret:
                secretName: thanos-gcs-credentials
      volumeClaimTemplates:
      - metadata:
          name: prometheus-storage
          namespace: monitoring
        spec:
          accessModes: [ "ReadWriteOnce" ]
          storageClassName: fast
          resources:
            requests:
              storage: 20Gi
    

    Regarding the manifest above, it is important to understand the following:

    1. Prometheus is deployed as a StatefulSet with 3 replicas, and each replica dynamically provisions its own persistent volume.

    2. The Prometheus configuration is generated by the Thanos sidecar container from the template file we created above.

    3. Thanos handles data compaction, so we need to set --storage.tsdb.min-block-duration=2h and --storage.tsdb.max-block-duration=2h.

    4. The Prometheus StatefulSet is labeled with thanos-store-api: true, so that each pod is discovered by the headless service we create next. It is this headless service that the Thanos Querier will use to query data across all Prometheus instances. We also apply the same label to the Thanos Store and Thanos Ruler components, so that they too are discovered by the Querier and can be queried for metrics.

    5. The path to the GCS bucket credentials is provided via the GOOGLE_APPLICATION_CREDENTIALS environment variable; the credentials file is mounted into the pod from the secret we created as part of the prerequisites.
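
    Once the StatefulSet is up, a quick way to verify that every replica runs both containers and that the sidecar has rendered a configuration is the following (a sketch; names match the manifests above):

    kubectl -n monitoring get pods -l app=prometheus
    kubectl -n monitoring exec prometheus-0 -c prometheus -- cat /etc/prometheus-shared/prometheus.yaml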

    Deploy the Prometheus services

    apiVersion: v1
    kind: Service
    metadata: 
      name: prometheus-0-service
      annotations: 
        prometheus.io/scrape: "true"
        prometheus.io/port: "9090"
      namespace: monitoring
      labels:
        name: prometheus
    spec:
      selector: 
        statefulset.kubernetes.io/pod-name: prometheus-0
      ports: 
        - name: prometheus 
          port: 8080
          targetPort: prometheus
    ---
    apiVersion: v1
    kind: Service
    metadata: 
      name: prometheus-1-service
      annotations: 
        prometheus.io/scrape: "true"
        prometheus.io/port: "9090"
      namespace: monitoring
      labels:
        name: prometheus
    spec:
      selector: 
        statefulset.kubernetes.io/pod-name: prometheus-1
      ports: 
        - name: prometheus 
          port: 8080
          targetPort: prometheus
    ---
    apiVersion: v1
    kind: Service
    metadata: 
      name: prometheus-2-service
      annotations: 
        prometheus.io/scrape: "true"
        prometheus.io/port: "9090"
      namespace: monitoring
      labels:
        name: prometheus
    spec:
      selector: 
        statefulset.kubernetes.io/pod-name: prometheus-2
      ports: 
        - name: prometheus 
          port: 8080
          targetPort: prometheus
    ---
    # This service creates SRV records so that the Querier can discover all Store API endpoints
    apiVersion: v1
    kind: Service
    metadata:
      name: thanos-store-gateway
      namespace: monitoring
    spec:
      type: ClusterIP
      clusterIP: None
      ports:
        - name: grpc
          port: 10901
          targetPort: grpc
      selector:
        thanos-store-api: "true"
    

    In addition to the approach above, you can also refer to this article from Rancher to learn how to quickly deploy and configure the Prometheus service on Rancher.

    We create a separate service for each Prometheus pod in the StatefulSet, even though this is not strictly necessary; these services are created purely for debugging purposes. The purpose of the thanos-store-gateway headless service was explained above. We will expose the Prometheus services later using an Ingress object.
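
    You can check which pods currently back the headless service by listing its endpoints; the set will grow as the Thanos Store and Ruler are deployed further below:

    kubectl -n monitoring get endpoints thanos-store-gateway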

    Deploy the Thanos Querier

    apiVersion: v1
    kind: Namespace
    metadata:
      name: monitoring
    ---
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: thanos-querier
      namespace: monitoring
      labels:
        app: thanos-querier
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: thanos-querier
      template:
        metadata:
          labels:
            app: thanos-querier
        spec:
          containers:
          - name: thanos
            image: quay.io/thanos/thanos:v0.8.0
            args:
            - query
            - --log.level=debug
            - --query.replica-label=replica
            - --store=dnssrv+thanos-store-gateway:10901
            ports:
            - name: http
              containerPort: 10902
            - name: grpc
              containerPort: 10901
            livenessProbe:
              httpGet:
                port: http
                path: /-/healthy
            readinessProbe:
              httpGet:
                port: http
                path: /-/ready
    ---
    apiVersion: v1
    kind: Service
    metadata:
      labels:
        app: thanos-querier
      name: thanos-querier
      namespace: monitoring
    spec:
      ports:
      - port: 9090
        protocol: TCP
        targetPort: http
        name: http
      selector:
        app: thanos-querier
    

    This is one of the main pieces of the Thanos deployment. Note the following:

    1. The container argument --store=dnssrv+thanos-store-gateway:10901 helps discover all the components that metric data should be queried from.

    2. The thanos-querier service provides a web interface for running PromQL queries. It also has the option to deduplicate data across different Prometheus clusters.

    3. This is the endpoint where we point Grafana as the data source for all dashboards.
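
    The Querier also exposes the Prometheus-compatible HTTP query API (with an additional dedup switch), so it can be exercised from inside the cluster before Grafana is wired up. A sketch, using a throwaway busybox pod in the monitoring namespace (tmp-shell is just a placeholder name):

    kubectl -n monitoring run tmp-shell --rm -it --restart=Never --image=busybox -- \
      wget -qO- "http://thanos-querier:9090/api/v1/query?query=up&dedup=true"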

    Deploy the Thanos Store Gateway

    apiVersion: v1
    kind: Namespace
    metadata:
      name: monitoring
    ---
    apiVersion: apps/v1beta1
    kind: StatefulSet
    metadata:
      name: thanos-store-gateway
      namespace: monitoring
      labels:
        app: thanos-store-gateway
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: thanos-store-gateway
      serviceName: thanos-store-gateway
      template:
        metadata:
          labels:
            app: thanos-store-gateway
            thanos-store-api: "true"
        spec:
          containers:
            - name: thanos
              image: quay.io/thanos/thanos:v0.8.0
              args:
              - "store"
              - "--log.level=debug"
              - "--data-dir=/data"
              - "--objstore.config={type: GCS, config: {bucket: prometheus-long-term}}"
              - "--index-cache-size=500MB"
              - "--chunk-pool-size=500MB"
              env:
                - name : GOOGLE_APPLICATION_CREDENTIALS
                  value: /etc/secret/thanos-gcs-credentials.json
              ports:
              - name: http
                containerPort: 10902
              - name: grpc
                containerPort: 10901
              livenessProbe:
                httpGet:
                  port: 10902
                  path: /-/healthy
              readinessProbe:
                httpGet:
                  port: 10902
                  path: /-/ready
              volumeMounts:
                - name: thanos-gcs-credentials
                  mountPath: /etc/secret
                  readOnly: false
          volumes:
            - name: thanos-gcs-credentials
              secret:
                secretName: thanos-gcs-credentials
    ---
    

    This creates the Store component, which serves metrics from the object store to the Querier.
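
    Prometheus writes blocks every two hours (per the min/max block duration flags above), so after a couple of hours the sidecar will have uploaded its first blocks and you can confirm they are landing in the bucket, for example with:

    gsutil ls gs://prometheus-long-term/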

    Deploy the Thanos Ruler

    apiVersion: v1
    kind: Namespace
    metadata:
      name: monitoring
    ---
    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: thanos-ruler-rules
      namespace: monitoring
    data:
      alert_down_services.rules.yaml: |
        groups:
        - name: metamonitoring
          rules:
          - alert: PrometheusReplicaDown
            annotations:
              message: Prometheus replica in cluster {{$labels.cluster}} has disappeared from Prometheus target discovery.
            expr: |
              sum(up{cluster="prometheus-ha", instance=~".*:9090", job="kubernetes-service-endpoints"}) by (job,cluster) < 3
            for: 15s
            labels:
              severity: critical
    ---
    apiVersion: apps/v1beta1
    kind: StatefulSet
    metadata:
      labels:
        app: thanos-ruler
      name: thanos-ruler
      namespace: monitoring
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: thanos-ruler
      serviceName: thanos-ruler
      template:
        metadata:
          labels:
            app: thanos-ruler
            thanos-store-api: "true"
        spec:
          containers:
            - name: thanos
              image: quay.io/thanos/thanos:v0.8.0
              args:
                - rule
                - --log.level=debug
                - --data-dir=/data
                - --eval-interval=15s
                - --rule-file=/etc/thanos-ruler/*.rules.yaml
                - --alertmanagers.url=http://alertmanager:9093
                - --query=thanos-querier:9090
                - "--objstore.config={type: GCS, config: {bucket: thanos-ruler}}"
                - --label=ruler_cluster="prometheus-ha"
                - --label=replica="$(POD_NAME)"
              env:
                - name : GOOGLE_APPLICATION_CREDENTIALS
                  value: /etc/secret/thanos-gcs-credentials.json
                - name: POD_NAME
                  valueFrom:
                    fieldRef:
                      fieldPath: metadata.name
              ports:
                - name: http
                  containerPort: 10902
                - name: grpc
                  containerPort: 10901
              livenessProbe:
                httpGet:
                  port: http
                  path: /-/healthy
              readinessProbe:
                httpGet:
                  port: http
                  path: /-/ready
              volumeMounts:
                - mountPath: /etc/thanos-ruler
                  name: config
                - name: thanos-gcs-credentials
                  mountPath: /etc/secret
                  readOnly: false
          volumes:
            - configMap:
                name: thanos-ruler-rules
              name: config
            - name: thanos-gcs-credentials
              secret:
                secretName: thanos-gcs-credentials
    ---
    apiVersion: v1
    kind: Service
    metadata:
      labels:
        app: thanos-ruler
      name: thanos-ruler
      namespace: monitoring
    spec:
      ports:
        - port: 9090
          protocol: TCP
          targetPort: http
          name: http
      selector:
        app: thanos-ruler
    

    Now, if you start an interactive shell in the same namespace as our workloads and check which pods the thanos-store-gateway service resolves to, you will see something like this:

    root@my-shell-95cb5df57-4q6w8:/# nslookup thanos-store-gateway
    Server:    10.63.240.10
    Address:  10.63.240.10#53
    
    Name:  thanos-store-gateway.monitoring.svc.cluster.local
    Address: 10.60.25.2
    Name:  thanos-store-gateway.monitoring.svc.cluster.local
    Address: 10.60.25.4
    Name:  thanos-store-gateway.monitoring.svc.cluster.local
    Address: 10.60.30.2
    Name:  thanos-store-gateway.monitoring.svc.cluster.local
    Address: 10.60.30.8
    Name:  thanos-store-gateway.monitoring.svc.cluster.local
    Address: 10.60.31.2
    
    root@my-shell-95cb5df57-4q6w8:/# exit
    

    The IPs returned above correspond to our Prometheus pods, thanos-store and thanos-ruler. This can be verified with:

    $ kubectl get pods -o wide -l thanos-store-api="true"
    NAME                     READY   STATUS    RESTARTS   AGE    IP           NODE                              NOMINATED NODE   READINESS GATES
    prometheus-0             2/2     Running   0          100m   10.60.31.2   gke-demo-1-pool-1-649cbe02-jdnv   <none>           <none>
    prometheus-1             2/2     Running   0          14h    10.60.30.2   gke-demo-1-pool-1-7533d618-kxkd   <none>           <none>
    prometheus-2             2/2     Running   0          31h    10.60.25.2   gke-demo-1-pool-1-4e9889dd-27gc   <none>           <none>
    thanos-ruler-0           1/1     Running   0          100m   10.60.30.8   gke-demo-1-pool-1-7533d618-kxkd   <none>           <none>
    thanos-store-gateway-0   1/1     Running   0          14h    10.60.25.4   gke-demo-1-pool-1-4e9889dd-27gc   <none>           <none>
    

    Deploy Alertmanager

    apiVersion: v1
    kind: Namespace
    metadata:
      name: monitoring
    ---
    kind: ConfigMap
    apiVersion: v1
    metadata:
      name: alertmanager
      namespace: monitoring
    data:
      config.yml: |-
        global:
          resolve_timeout: 5m
          slack_api_url: "<your_slack_hook>"
          victorops_api_url: "<your_victorops_hook>"
    
        templates:
        - '/etc/alertmanager-templates/*.tmpl'
        route:
          group_by: ['alertname', 'cluster', 'service']
          group_wait: 10s
          group_interval: 1m
          repeat_interval: 5m  
          receiver: default 
          routes:
          - match:
              team: devops
            receiver: devops
            continue: true 
          - match: 
              team: dev
            receiver: dev
            continue: true
    
        receivers:
        - name: 'default'
    
        - name: 'devops'
          victorops_configs:
          - api_key: '<YOUR_API_KEY>'
            routing_key: 'devops'
            message_type: 'CRITICAL'
            entity_display_name: '{{ .CommonLabels.alertname }}'
            state_message: 'Alert: {{ .CommonLabels.alertname }}. Summary:{{ .CommonAnnotations.summary }}. RawData: {{ .CommonLabels }}'
          slack_configs:
          - channel: '#k8-alerts'
            send_resolved: true
    
    
        - name: 'dev'
          victorops_configs:
          - api_key: '<YOUR_API_KEY>'
            routing_key: 'dev'
            message_type: 'CRITICAL'
            entity_display_name: '{{ .CommonLabels.alertname }}'
            state_message: 'Alert: {{ .CommonLabels.alertname }}. Summary:{{ .CommonAnnotations.summary }}. RawData: {{ .CommonLabels }}'
          slack_configs:
          - channel: '#k8-alerts'
            send_resolved: true
    
    ---
    apiVersion: extensions/v1beta1
    kind: Deployment
    metadata:
      name: alertmanager
      namespace: monitoring
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: alertmanager
      template:
        metadata:
          name: alertmanager
          labels:
            app: alertmanager
        spec:
          containers:
          - name: alertmanager
            image: prom/alertmanager:v0.15.3
            args:
              - '--config.file=/etc/alertmanager/config.yml'
              - '--storage.path=/alertmanager'
            ports:
            - name: alertmanager
              containerPort: 9093
            volumeMounts:
            - name: config-volume
              mountPath: /etc/alertmanager
            - name: alertmanager
              mountPath: /alertmanager
          volumes:
          - name: config-volume
            configMap:
              name: alertmanager
          - name: alertmanager
            emptyDir: {}
    ---
    apiVersion: v1
    kind: Service
    metadata:
      annotations:
        prometheus.io/scrape: 'true'
        prometheus.io/path: '/metrics'
      labels:
        name: alertmanager
      name: alertmanager
      namespace: monitoring
    spec:
      selector:
        app: alertmanager
      ports:
      - name: alertmanager
        protocol: TCP
        port: 9093
        targetPort: 9093
    

    This creates our Alertmanager deployment, which will send out all the alerts generated according to the Prometheus rules.
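
    The routing tree in the ConfigMap can be validated before deployment with amtool, which ships with Alertmanager (a sketch, assuming the config.yml body has been saved to a local file):

    amtool check-config config.yml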

    Deploy kube-state-metrics

    apiVersion: v1
    kind: Namespace
    metadata:
      name: monitoring
    ---
    apiVersion: rbac.authorization.k8s.io/v1 
    # kubernetes versions before 1.8.0 should use rbac.authorization.k8s.io/v1beta1
    kind: ClusterRoleBinding
    metadata:
      name: kube-state-metrics
    roleRef:
      apiGroup: rbac.authorization.k8s.io
      kind: ClusterRole
      name: kube-state-metrics
    subjects:
    - kind: ServiceAccount
      name: kube-state-metrics
      namespace: monitoring
    ---
    apiVersion: rbac.authorization.k8s.io/v1
    # kubernetes versions before 1.8.0 should use rbac.authorization.k8s.io/v1beta1
    kind: ClusterRole
    metadata:
      name: kube-state-metrics
    rules:
    - apiGroups: [""]
      resources:
      - configmaps
      - secrets
      - nodes
      - pods
      - services
      - resourcequotas
      - replicationcontrollers
      - limitranges
      - persistentvolumeclaims
      - persistentvolumes
      - namespaces
      - endpoints
      verbs: ["list", "watch"]
    - apiGroups: ["extensions"]
      resources:
      - daemonsets
      - deployments
      - replicasets
      verbs: ["list", "watch"]
    - apiGroups: ["apps"]
      resources:
      - statefulsets
      verbs: ["list", "watch"]
    - apiGroups: ["batch"]
      resources:
      - cronjobs
      - jobs
      verbs: ["list", "watch"]
    - apiGroups: ["autoscaling"]
      resources:
      - horizontalpodautoscalers
      verbs: ["list", "watch"]
    ---
    apiVersion: rbac.authorization.k8s.io/v1
    # kubernetes versions before 1.8.0 should use rbac.authorization.k8s.io/v1beta1
    kind: RoleBinding
    metadata:
      name: kube-state-metrics
      namespace: monitoring
    roleRef:
      apiGroup: rbac.authorization.k8s.io
      kind: Role
      name: kube-state-metrics-resizer
    subjects:
    - kind: ServiceAccount
      name: kube-state-metrics
      namespace: monitoring
    ---
    apiVersion: rbac.authorization.k8s.io/v1
    # kubernetes versions before 1.8.0 should use rbac.authorization.k8s.io/v1beta1
    kind: Role
    metadata:
      namespace: monitoring
      name: kube-state-metrics-resizer
    rules:
    - apiGroups: [""]
      resources:
      - pods
      verbs: ["get"]
    - apiGroups: ["extensions"]
      resources:
      - deployments
      resourceNames: ["kube-state-metrics"]
      verbs: ["get", "update"]
    ---
    apiVersion: v1
    kind: ServiceAccount
    metadata:
      name: kube-state-metrics
      namespace: monitoring
    ---
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: kube-state-metrics
      namespace: monitoring
    spec:
      selector:
        matchLabels:
          k8s-app: kube-state-metrics
      replicas: 1
      template:
        metadata:
          labels:
            k8s-app: kube-state-metrics
        spec:
          serviceAccountName: kube-state-metrics
          containers:
          - name: kube-state-metrics
            image: quay.io/mxinden/kube-state-metrics:v1.4.0-gzip.3
            ports:
            - name: http-metrics
              containerPort: 8080
            - name: telemetry
              containerPort: 8081
            readinessProbe:
              httpGet:
                path: /healthz
                port: 8080
              initialDelaySeconds: 5
              timeoutSeconds: 5
          - name: addon-resizer
            image: k8s.gcr.io/addon-resizer:1.8.3
            resources:
              limits:
                cpu: 150m
                memory: 50Mi
              requests:
                cpu: 150m
                memory: 50Mi
            env:
              - name: MY_POD_NAME
                valueFrom:
                  fieldRef:
                    fieldPath: metadata.name
              - name: MY_POD_NAMESPACE
                valueFrom:
                  fieldRef:
                    fieldPath: metadata.namespace
            command:
              - /pod_nanny
              - --container=kube-state-metrics
              - --cpu=100m
              - --extra-cpu=1m
              - --memory=100Mi
              - --extra-memory=2Mi
              - --threshold=5
              - --deployment=kube-state-metrics
    ---
    apiVersion: v1
    kind: Service
    metadata:
      name: kube-state-metrics
      namespace: monitoring
      labels:
        k8s-app: kube-state-metrics
      annotations:
        prometheus.io/scrape: 'true'
    spec:
      ports:
      - name: http-metrics
        port: 8080
        targetPort: http-metrics
        protocol: TCP
      - name: telemetry
        port: 8081
        targetPort: telemetry
        protocol: TCP
      selector:
        k8s-app: kube-state-metrics
    

    The kube-state-metrics deployment is needed to relay some important container metrics that are not natively exposed by the kubelet and are therefore not directly available to Prometheus.
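
    A quick way to confirm the exporter is serving data is to port-forward its service and fetch one of the series that the alerting rules above rely on (a sketch):

    kubectl -n monitoring port-forward svc/kube-state-metrics 8080:8080
    # in another terminal:
    curl -s http://localhost:8080/metrics | grep kube_deployment_status_replicas | head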

    Deploy the node-exporter DaemonSet

    apiVersion: v1
    kind: Namespace
    metadata:
      name: monitoring
    ---
    apiVersion: extensions/v1beta1
    kind: DaemonSet
    metadata:
      name: node-exporter
      namespace: monitoring
      labels:
        name: node-exporter
    spec:
      template:
        metadata:
          labels:
            name: node-exporter
          annotations:
             prometheus.io/scrape: "true"
             prometheus.io/port: "9100"
        spec:
          hostPID: true
          hostIPC: true
          hostNetwork: true
          containers:
            - name: node-exporter
              image: prom/node-exporter:v0.16.0
              securityContext:
                privileged: true
              args:
                - --path.procfs=/host/proc
                - --path.sysfs=/host/sys
              ports:
                - containerPort: 9100
                  protocol: TCP
              resources:
                limits:
                  cpu: 100m
                  memory: 100Mi
                requests:
                  cpu: 10m
                  memory: 100Mi
              volumeMounts:
                - name: dev
                  mountPath: /host/dev
                - name: proc
                  mountPath: /host/proc
                - name: sys
                  mountPath: /host/sys
                - name: rootfs
                  mountPath: /rootfs
          volumes:
            - name: proc
              hostPath:
                path: /proc
            - name: dev
              hostPath:
                path: /dev
            - name: sys
              hostPath:
                path: /sys
            - name: rootfs
              hostPath:
                path: /
    

    The node-exporter DaemonSet runs a node-exporter pod on every node and exposes very important node-level metrics that can be pulled by the Prometheus instances.
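
    Because the DaemonSet uses hostNetwork, each exporter listens on port 9100 of its node. You can list the pods with their node IPs and, from a machine that can reach a node (over SSH, for example), pull a few metrics; <node-ip> below is a placeholder:

    kubectl -n monitoring get pods -l name=node-exporter -o wide
    curl -s http://<node-ip>:9100/metrics | head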

    Deploy Grafana

    apiVersion: v1
    kind: Namespace
    metadata:
      name: monitoring
    ---
    apiVersion: storage.k8s.io/v1beta1
    kind: StorageClass
    metadata:
      name: fast
      namespace: monitoring
    provisioner: kubernetes.io/gce-pd
    allowVolumeExpansion: true
    ---
    apiVersion: apps/v1beta1
    kind: StatefulSet
    metadata:
      name: grafana
      namespace: monitoring
    spec:
      replicas: 1
      serviceName: grafana
      template:
        metadata:
          labels:
            task: monitoring
            k8s-app: grafana
        spec:
          containers:
          - name: grafana
            image: k8s.gcr.io/heapster-grafana-amd64:v5.0.4
            ports:
            - containerPort: 3000
              protocol: TCP
            volumeMounts:
            - mountPath: /etc/ssl/certs
              name: ca-certificates
              readOnly: true
            - mountPath: /var
              name: grafana-storage
            env:
            - name: GF_SERVER_HTTP_PORT
              value: "3000"
              # The following env variables are required to make Grafana accessible via
              # the kubernetes api-server proxy. On production clusters, we recommend
              # removing these env variables, setup auth for grafana, and expose the grafana
              # service using a LoadBalancer or a public IP.
            - name: GF_AUTH_BASIC_ENABLED
              value: "false"
            - name: GF_AUTH_ANONYMOUS_ENABLED
              value: "true"
            - name: GF_AUTH_ANONYMOUS_ORG_ROLE
              value: Admin
            - name: GF_SERVER_ROOT_URL
              # If you're only using the API Server proxy, set this value instead:
              # value: /api/v1/namespaces/kube-system/services/monitoring-grafana/proxy
              value: /
          volumes:
          - name: ca-certificates
            hostPath:
              path: /etc/ssl/certs
      volumeClaimTemplates:
      - metadata:
          name: grafana-storage
          namespace: monitoring
        spec:
          accessModes: [ "ReadWriteOnce" ]
          storageClassName: fast
          resources:
            requests:
              storage: 5Gi
    ---
    apiVersion: v1
    kind: Service
    metadata:
      labels:
        kubernetes.io/cluster-service: 'true'
        kubernetes.io/name: grafana
      name: grafana
      namespace: monitoring
    spec:
      ports:
      - port: 3000
        targetPort: 3000
      selector:
        k8s-app: grafana
    

    This creates our Grafana deployment and service, which will be exposed using our Ingress object. To finish the setup, we should add the Thanos Querier as the data source of our Grafana deployment (the UI steps are listed below; a scripted alternative follows the list):

    1. Click Add DataSource

    2. Set Name: DS_PROMETHEUS

    3. Set Type: Prometheus

    4. Set URL: http://thanos-querier:9090

    5. Save and Test. You can now build your custom dashboards or simply import dashboards from grafana.net. Dashboards #315 and #1471 are a good place to start.
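
    Because the manifest above enables anonymous access with the Admin role, the same data source can also be created through Grafana's HTTP API from inside the cluster. This is a sketch only; lock down authentication before relying on it in production:

    curl -s -X POST http://grafana.monitoring.svc:3000/api/datasources \
      -H "Content-Type: application/json" \
      -d '{"name":"DS_PROMETHEUS","type":"prometheus","url":"http://thanos-querier:9090","access":"proxy","isDefault":true}'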

    Deploy the Ingress object

    apiVersion: extensions/v1beta1
    kind: Ingress
    metadata:
      name: monitoring-ingress
      namespace: monitoring
      annotations:
        kubernetes.io/ingress.class: "nginx"
    spec:
      rules:
      - host: grafana.<yourdomain>.com
        http:
          paths:
          - path: /
            backend:
              serviceName: grafana
              servicePort: 3000
      - host: prometheus-0.<yourdomain>.com
        http:
          paths:
          - path: /
            backend:
              serviceName: prometheus-0-service
              servicePort: 8080
      - host: prometheus-1.<yourdomain>.com
        http:
          paths:
          - path: /
            backend:
              serviceName: prometheus-1-service
              servicePort: 8080
      - host: prometheus-2.<yourdomain>.com
        http:
          paths:
          - path: /
            backend:
              serviceName: prometheus-2-service
              servicePort: 8080
      - host: alertmanager.<yourdomain>.com
        http: 
          paths:
          - path: /
            backend:
              serviceName: alertmanager
              servicePort: 9093
      - host: thanos-querier.<yourdomain>.com
        http:
          paths:
          - path: /
            backend:
              serviceName: thanos-querier
              servicePort: 9090
      - host: thanos-ruler.<yourdomain>.com
        http:
          paths:
          - path: /
            backend:
              serviceName: thanos-ruler
              servicePort: 9090
    

    This is the final piece of the puzzle. It exposes all of our services outside the Kubernetes cluster and lets us access them. Make sure you replace <yourdomain> with a domain name that you control and that you can point at the Ingress Controller's service.
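
    The DNS records for these hosts should point at the external address of the Nginx Ingress Controller. Where exactly that service lives depends on how the controller was installed; with a typical installation in the ingress-nginx namespace, something like the following shows its external IP:

    kubectl -n ingress-nginx get svc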

    You should now be able to access the Thanos Querier at http://thanos-querier.<yourdomain>.com. It will look like this:

    [Screenshot: Thanos Querier web UI]

    Make sure deduplication is selected.

    If you click on Stores, you can see all the active endpoints discovered by the thanos-store-gateway service.

    [Screenshot: active Store API endpoints discovered by the Querier]

    Now you can add the Thanos Querier as the data source in Grafana and start creating dashboards.

    [Screenshot: adding the Thanos Querier data source in Grafana]

    [Screenshot: Kubernetes cluster monitoring dashboard]

    [Screenshot: Kubernetes node monitoring dashboard]

    Conclusion

    Integrating Thanos with Prometheus undoubtedly gives you the ability to scale Prometheus horizontally, and since the Thanos Querier can pull metrics from other Querier instances, you can effectively pull metrics across clusters and visualize them in a single dashboard.

    We are also able to archive metric data in an object store, which gives our monitoring system virtually unlimited storage while serving metrics from the object store itself. The main cost of this setup comes from the object storage (S3 or GCS), and it can be reduced further by applying appropriate retention policies to the buckets.
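
    For example, a bucket lifecycle rule that deletes objects older than a year can be applied with gsutil. The one-year age below is purely illustrative; choose a value that matches your own retention requirements:

    printf '{"rule":[{"action":{"type":"Delete"},"condition":{"age":365}}]}' > lifecycle.json
    gsutil lifecycle set lifecycle.json gs://prometheus-long-term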

    However, getting all of this up and running requires quite a bit of configuration. The manifests provided above have been tested in production, so feel free to try them out.
