kube-prometheus

Author: 潘猛_9f76 | Published 2019-06-04 17:41

    Prometheus Operator is often described as the definitive monitoring solution for Kubernetes clusters, but the Prometheus Operator project by itself no longer ships the complete stack; the complete solution is now kube-prometheus. Project address:
    https://github.com/coreos/kube-prometheus

    Installation

    Download the source

    #git clone https://github.com/coreos/kube-prometheus.git
    

    Inspect the manifest files

    #cd manifests
    #ls
    00namespace-namespace.yaml                                         node-exporter-clusterRole.yaml
    0prometheus-operator-0alertmanagerCustomResourceDefinition.yaml    node-exporter-daemonset.yaml
    0prometheus-operator-0prometheusCustomResourceDefinition.yaml      node-exporter-serviceAccount.yaml
    0prometheus-operator-0prometheusruleCustomResourceDefinition.yaml  node-exporter-serviceMonitor.yaml
    0prometheus-operator-0servicemonitorCustomResourceDefinition.yaml  node-exporter-service.yaml
    0prometheus-operator-clusterRoleBinding.yaml                       prometheus-adapter-apiService.yaml
    0prometheus-operator-clusterRole.yaml                              prometheus-adapter-clusterRoleAggregatedMetricsReader.yaml
    0prometheus-operator-deployment.yaml                               prometheus-adapter-clusterRoleBindingDelegator.yaml
    0prometheus-operator-serviceAccount.yaml                           prometheus-adapter-clusterRoleBinding.yaml
    0prometheus-operator-serviceMonitor.yaml                           prometheus-adapter-clusterRoleServerResources.yaml
    0prometheus-operator-service.yaml                                  prometheus-adapter-clusterRole.yaml
    alertmanager-alertmanager.yaml                                     prometheus-adapter-configMap.yaml
    alertmanager-secret.yaml                                           prometheus-adapter-deployment.yaml
    alertmanager-serviceAccount.yaml                                   prometheus-adapter-roleBindingAuthReader.yaml
    alertmanager-serviceMonitor.yaml                                   prometheus-adapter-serviceAccount.yaml
    alertmanager-service.yaml                                          prometheus-adapter-service.yaml
    grafana-dashboardDatasources.yaml                                  prometheus-clusterRoleBinding.yaml
    grafana-dashboardDefinitions.yaml                                  prometheus-clusterRole.yaml
    grafana-dashboardSources.yaml                                      prometheus-prometheus.yaml
    grafana-deployment.yaml                                            prometheus-roleBindingConfig.yaml
    grafana-serviceAccount.yaml                                        prometheus-roleBindingSpecificNamespaces.yaml
    grafana-serviceMonitor.yaml                                        prometheus-roleConfig.yaml
    grafana-service.yaml                                               prometheus-roleSpecificNamespaces.yaml
    kube-state-metrics-clusterRoleBinding.yaml                         prometheus-rules.yaml
    kube-state-metrics-clusterRole.yaml                                prometheus-serviceAccount.yaml
    kube-state-metrics-deployment.yaml                                 prometheus-serviceMonitorApiserver.yaml
    kube-state-metrics-roleBinding.yaml                                prometheus-serviceMonitorCoreDNS.yaml
    kube-state-metrics-role.yaml                                       prometheus-serviceMonitorKubeControllerManager.yaml
    kube-state-metrics-serviceAccount.yaml                             prometheus-serviceMonitorKubelet.yaml
    kube-state-metrics-serviceMonitor.yaml                             prometheus-serviceMonitorKubeScheduler.yaml
    kube-state-metrics-service.yaml                                    prometheus-serviceMonitor.yaml
    node-exporter-clusterRoleBinding.yaml                              prometheus-service.yaml
    

    Edit prometheus-serviceMonitorKubelet.yaml: change the port from https-metrics to http-metrics, and change the scheme to http

    apiVersion: monitoring.coreos.com/v1
    kind: ServiceMonitor
    spec:
        port: http-metrics
        scheme: http  # many guides omit this scheme setting
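
    This assumes the kubelet's read-only HTTP port (10255 by default at the time) is enabled; if it is disabled, keep the https-metrics port instead. A quick sanity check from any machine that can reach a node (<node-ip> is a placeholder):

    #curl -s http://<node-ip>:10255/metrics | head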
    

    In alertmanager-service.yaml, add a NodePort configuration on port 30093

    apiVersion: v1
    kind: Service
    metadata:
      labels:
        alertmanager: main
      name: alertmanager-main
      namespace: monitoring
    spec:
      ports:
      - name: web
        port: 9093
        targetPort: web
        nodePort: 30093
      type: NodePort
      selector:
        alertmanager: main
        app: alertmanager
      sessionAffinity: ClientIP
    

    In grafana-service.yaml, add a NodePort configuration on port 32000

    apiVersion: v1
    kind: Service
    metadata:
      labels:
        app: grafana
      name: grafana
      namespace: monitoring
    spec:
      ports:
      - name: http
        port: 3000
        targetPort: http
        nodePort: 32000
      type: NodePort
      selector:
        app: grafana
    

    In prometheus-service.yaml, add a NodePort configuration on port 30090

    apiVersion: v1
    kind: Service
    metadata:
      labels:
        prometheus: k8s
      name: prometheus-k8s
      namespace: monitoring
    spec:
      ports:
      - name: web
        port: 9090
        targetPort: web
        nodePort: 30090
      type: NodePort
      selector:
        app: prometheus
        prometheus: k8s
      sessionAffinity: ClientIP
    

    Create the resources. The first pass reports that some resource kinds do not exist (the custom resources are applied before their CRDs register), so running the command twice is recommended; alternatively, apply the operator manifests first, as sketched below.

    #kubectl apply -f .
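
    One way to avoid the ordering errors entirely (a sketch, using the manifest names listed above): apply the namespace, operator, and CRD manifests first, wait for the CRDs to be established, then apply everything else.

    #kubectl apply -f 00namespace-namespace.yaml
    #for f in 0prometheus-operator-*.yaml; do kubectl apply -f "$f"; done
    #kubectl wait --for=condition=established --timeout=60s crd/servicemonitors.monitoring.coreos.com
    #kubectl apply -f .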
    

    Check the custom resource definitions (CRDs)

    #kubectl get crd | grep coreos
    alertmanagers.monitoring.coreos.com           2019-06-03T09:17:48Z
    prometheuses.monitoring.coreos.com            2019-06-03T09:17:48Z
    prometheusrules.monitoring.coreos.com         2019-06-03T09:17:48Z
    servicemonitors.monitoring.coreos.com         2019-06-03T09:17:48Z
    

    Check the newly created pods

    #kubectl -n monitoring  get pods  -o wide
    NAME                                   READY   STATUS    RESTARTS   AGE   IP               NODE            NOMINATED NODE   READINESS GATES
    alertmanager-main-0                    2/2     Running   0          16h   10.244.196.134   node01          <none>           <none>
    alertmanager-main-1                    2/2     Running   0          15h   10.244.241.204   ingressnode02   <none>           <none>
    alertmanager-main-2                    2/2     Running   0          15h   10.244.114.4     node05          <none>           <none>
    grafana-69c7b8468d-l8p2b               1/1     Running   0          16h   10.244.17.198    prometheus01    <none>           <none>
    kube-state-metrics-65b5ccc84-kwfgh     4/4     Running   0          15h   10.244.17.199    prometheus01    <none>           <none>
    node-exporter-62mkc                    2/2     Running   0          16h   22.22.3.235      master02        <none>           <none>
    node-exporter-6bsrb                    2/2     Running   0          16h   22.22.3.239      node04          <none>           <none>
    node-exporter-8b5h8                    2/2     Running   0          16h   22.22.3.241      prometheus01    <none>           <none>
    node-exporter-chssb                    2/2     Running   0          16h   22.22.3.243      ingressnode02   <none>           <none>
    node-exporter-dwqkc                    2/2     Running   0          16h   22.22.3.240      node05          <none>           <none>
    node-exporter-kf2cr                    2/2     Running   0          16h   22.22.3.242      ingressnode01   <none>           <none>
    node-exporter-krsm4                    2/2     Running   0          16h   22.22.3.238      node03          <none>           <none>
    node-exporter-lv4gx                    2/2     Running   0          16h   22.22.3.236      node01          <none>           <none>
    node-exporter-v5f9v                    2/2     Running   0          16h   22.22.3.234      master01        <none>           <none>
    node-exporter-zgsr2                    2/2     Running   0          16h   22.22.3.237      node02          <none>           <none>
    prometheus-adapter-6c75d8686d-gq8bn    1/1     Running   0          16h   10.244.17.197    prometheus01    <none>           <none>
    prometheus-k8s-0                       3/3     Running   1          16h   10.244.140.68    node02          <none>           <none>
    prometheus-k8s-1                       3/3     Running   1          16h   10.244.248.198   node04          <none>           <none>
    prometheus-operator-74d449f6b4-q6bjn   1/1     Running   0          16h   10.244.17.196    prometheus01    <none>           <none>
    

    Confirm that the web UIs all open correctly
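
    A quick check from outside the cluster, using the NodePorts configured above (<node-ip> is a placeholder for any node's address):

    #curl -s http://<node-ip>:30090/-/healthy     # Prometheus
    #curl -s http://<node-ip>:30093/-/healthy     # Alertmanager
    #curl -s http://<node-ip>:32000/api/health    # Grafana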


    Configure Prometheus

    Expand the Status menu and look at Targets: only two monitoring jobs, kube-scheduler and kube-controller-manager, have no corresponding targets. This is tied to their ServiceMonitor resource objects.



    Look at prometheus-serviceMonitorKubeScheduler.yaml: its selector matches Service labels, but the kube-system namespace contains no Service labeled k8s-app=kube-scheduler.

    apiVersion: monitoring.coreos.com/v1
    kind: ServiceMonitor
    metadata:
      labels:
        k8s-app: kube-scheduler
      name: kube-scheduler
      namespace: monitoring
    spec:
      endpoints:
      - interval: 30s
        port: http-metrics
      jobLabel: k8s-app
      namespaceSelector:
        matchNames:
        - kube-system
      selector:
        matchLabels:
          k8s-app: kube-scheduler
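
    You can confirm the missing Service with a label query; it should return nothing:

    #kubectl -n kube-system get service -l k8s-app=kube-scheduler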
    

    Create a new file prometheus-kubeSchedulerService.yaml

    apiVersion: v1
    kind: Service
    metadata:
      namespace: kube-system
      name: kube-scheduler
      labels:
        k8s-app: kube-scheduler # matches the ServiceMonitor's selector
    spec:
      selector: 
        component: kube-scheduler # must match the scheduler pods' label
      ports:
      - name: http-metrics
        port: 10251
        targetPort: 10251
        protocol: TCP
    

    Create the kube-scheduler Service

    #kubectl apply -f prometheus-kubeSchedulerService.yaml 
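
    If the label selector matched the scheduler pods, the Service now has endpoints, and the metrics endpoint should answer over plain HTTP (the scheduler served metrics on 10251 at the time; <master-ip> is a placeholder):

    #kubectl -n kube-system get endpoints kube-scheduler
    #curl -s http://<master-ip>:10251/metrics | head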
    

    In the same way, create prometheus-kubeControllerManagerService.yaml

    apiVersion: v1
    kind: Service
    metadata:
      namespace: kube-system
      name: kube-controller-manager
      labels:
        k8s-app: kube-controller-manager
    spec:
      selector:
        component: kube-controller-manager
      ports:
      - name: http-metrics
        port: 10252
        targetPort: 10252
        protocol: TCP
    

    Create the kube-controller-manager Service

    #kubectl apply -f prometheus-kubeControllerManagerService.yaml
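
    Both Services should now list the control-plane pods as endpoints:

    #kubectl -n kube-system get endpoints kube-scheduler kube-controller-manager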
    

    Confirm that all targets are now up


    Configure Grafana

    Log in with admin/admin and change the password.
    The data source is already wired to Prometheus.


    Custom monitoring targets

    Monitoring etcd serves as the example.



    Store the required etcd certificates in a Secret object named etcd-certs

    # kubectl -n monitoring create secret generic etcd-certs --from-file=/etc/kubernetes/pki/etcd/healthcheck-client.crt  --from-file=/etc/kubernetes/pki/etcd/healthcheck-client.key  --from-file=/etc/kubernetes/pki/etcd/ca.crt 
    secret/etcd-certs created
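
    Verify that all three keys landed in the Secret:

    #kubectl -n monitoring describe secret etcd-certs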
    

    Modify the Prometheus resource k8s: add a secrets list to prometheus-prometheus.yaml

    apiVersion: monitoring.coreos.com/v1
    kind: Prometheus
    metadata:
      labels:
        prometheus: k8s
      name: k8s
      namespace: monitoring
    spec:
      alerting:
        alertmanagers:
        - name: alertmanager-main
          namespace: monitoring
          port: web
      baseImage: quay.io/prometheus/prometheus
      nodeSelector:
        beta.kubernetes.io/os: linux
      replicas: 2
      secrets:
      - etcd-certs
      resources:
        requests:
          memory: 400Mi
      ruleSelector:
        matchLabels:
          prometheus: k8s
          role: alert-rules
      securityContext:
        fsGroup: 2000
        runAsNonRoot: true
        runAsUser: 1000
      serviceAccountName: prometheus-k8s
      serviceMonitorNamespaceSelector: {}
      serviceMonitorSelector: {}
      version: v2.7.2
    

    Apply prometheus-prometheus.yaml

    #kubectl apply -f prometheus-prometheus.yaml 
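
    The operator notices the change and rolls the prometheus-k8s pods; you can watch the rollout (pod labels as set by the operator in this version):

    #kubectl -n monitoring get pods -l app=prometheus -w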
    

    Inside the pod, confirm that the certificates were mounted successfully

    # kubectl -n monitoring  exec -it prometheus-k8s-0 /bin/sh
    Defaulting container name to prometheus.
    Use 'kubectl describe pod/prometheus-k8s-0 -n monitoring' to see all of the containers in this pod.
    # ls -l /etc/prometheus/secrets/etcd-certs/
    total 0
    lrwxrwxrwx    1 root     root            13 Jun  4 09:12 ca.crt -> ..data/ca.crt
    lrwxrwxrwx    1 root     root            29 Jun  4 09:12 healthcheck-client.crt -> ..data/healthcheck-client.crt
    lrwxrwxrwx    1 root     root            29 Jun  4 09:12 healthcheck-client.key -> ..data/healthcheck-client.key
    /prometheus $ cat /etc/prometheus/secrets/etcd-certs/ca.crt 
    -----BEGIN CERTIFICATE-----
    MIIC9zCCAd+gAwIBAgIJAMiN3pOWJVGOMA0GCSqGSIb3DQEBCwUAMBIxEDAOBgNV
    BAMMB2V0Y2QtY2EwHhcNMTkwNTI3MDgzNDExWhcNMzkwNTIyMDgzNDExWjASMRAw
    DgYDVQQDDAdldGNkLWNhMIIBIjANBgkqhkiG9w0BAQEFAAOCAQ8AMIIBCgKCAQEA
    rG1xQcAwZ67XXG84PzqIIqoqnq/zM3Ru+02PELbzgiZ4MrNPte32vZuj6HK/JDDQ
    nEirgnQQxQJ6OxvnDrFVwyxveNI8jrd+FRfuh2ae0NIiqkWk88O42OioACBW6cJA
    hILpIcn066+E+t2vh/3TmqMduV8eY5p8VAwRT1B04fJAQVcr0sJh3JXExppbtdWL
    Z0T25QTbbbZ/I6oxLMu/NkS171R5l397rSpD2ox0NV0GASoqiitffPznOHBPa1Zs
    UwOlQnZlWaBM5XQHFhRQTG/Bxxhe45azmmPT3DGCpATk+/GnYDPnt4TSZiX9gZ6O
    beRsGUzPDrX/LOEV/Uv+VQIDAQABo1AwTjAdBgNVHQ4EFgQUxQl8C8RdG+tU2U+T
    gy901tOxUNUwHwYDVR0jBBgwFoAUxQl8C8RdG+tU2U+Tgy901tOxUNUwDAYDVR0T
    BAUwAwEB/zANBgkqhkiG9w0BAQsFAAOCAQEAica5i0wN9ZuCICQOGwMcuVgadBqV
    w4dOyP4EPyD2SKx3YpYREMGXOafYkrX2rWKqsCBqS9xUT34x2DQ4/KuoPY/Ee37h
    pJ+/i47sq8pmiHxqQRUACyGA6SqWtcApfW62+O97qHnRtyUcCftKKLYEu3djzTJd
    FOn6xPehbFzhL9H4tsiZ+kFaXqWDUbhSCAd/LeJ+dxzmOE+Rd0hsPHIyzdmWUKwe
    CTkSaf9X4KPWjBUCqPzB/Td6Mz3HHg8zZo2FgkyI98a7c83rHl3aTfBJEi4LND8x
    PTFwgOGNlZXa6OnUmkn/sHvoNc88EqDm/GjPI6xfLr7BSWE4jJCIwWROvg==
    -----END CERTIFICATE-----
    

    Create the ServiceMonitor etcd-k8s in prometheus-serviceMonitorEtcd.yaml. Note that the endpoint's port refers to the port name (port) in the Service defined below.

    apiVersion: monitoring.coreos.com/v1
    kind: ServiceMonitor
    metadata:
      labels:
        k8s-app: etcd-k8s
      name: etcd-k8s
      namespace: monitoring
    spec:
      endpoints:
      - port: port
        interval: 30s
        scheme: https
        #port: https-metrics
        tlsConfig:
          caFile: /etc/prometheus/secrets/etcd-certs/ca.crt
          certFile: /etc/prometheus/secrets/etcd-certs/healthcheck-client.crt
          keyFile: /etc/prometheus/secrets/etcd-certs/healthcheck-client.key
          insecureSkipVerify: true
      jobLabel: k8s-app
      namespaceSelector:
        matchNames:
        - kube-system
      selector:
        matchLabels:
          k8s-app: etcd
    

    Apply prometheus-serviceMonitorEtcd.yaml

    #kubectl apply -f prometheus-serviceMonitorEtcd.yaml
    

    Create the associated Service. Because etcd sits outside the cluster, its Endpoints must be created by hand. prometheus-service-etcd.yaml:

    apiVersion: v1
    kind: Service
    metadata:
      labels:
        k8s-app: etcd
      name: etcd-k8s
      namespace: kube-system
    spec:
      ports:
      - name: port
        port: 2379
        protocol: TCP
      type: ClusterIP
      clusterIP: None
    ---
    apiVersion: v1
    kind: Endpoints
    metadata:
      name: etcd-k8s
      namespace: kube-system
      labels:
        k8s-app: etcd
    subsets:
    - addresses:
      - ip: 22.22.3.231
        nodeName: etcd01
      - ip: 22.22.3.232
        nodeName: etcd02
      - ip: 22.22.3.233
        nodeName: etcd03
      ports:
      - name: port
        port: 2379
        protocol: TCP
    

    Apply prometheus-service-etcd.yaml

    #kubectl apply -f prometheus-service-etcd.yaml
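
    Confirm that the hand-written Endpoints were accepted and that Prometheus picked up the new targets (<node-ip> is a placeholder):

    #kubectl -n kube-system get endpoints etcd-k8s
    #curl -s http://<node-ip>:30090/api/v1/targets | grep etcd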
    

    Find an etcd dashboard at https://grafana.com/dashboards, for example:
    https://grafana.com/dashboards/3070

    Download the JSON file and import it into Grafana, selecting prometheus as the data source.

    View the dashboard.
    • Misconfigured Prometheus or ServiceMonitor resources can leave the pods prometheus-k8s-0 and prometheus-k8s-1 unhealthy, which makes the Prometheus UI unreachable; correcting the configuration restores them.
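
    When that happens, the operator and Prometheus logs usually point at the offending resource:

    #kubectl -n monitoring logs deploy/prometheus-operator
    #kubectl -n monitoring logs prometheus-k8s-0 -c prometheus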
