k8s-Prometheus+Grafana(Volume)

Author: 会倒立的香飘飘 | Published on 2021-07-26 14:13

    I. Prometheus Overview

    Prometheus is an open-source monitoring and alerting system with a built-in time-series database (TSDB), originally developed at SoundCloud. Since 2012 many companies and organizations have adopted it, the project has a very active developer and user community, and it is now an independent open-source project. In 2016 Prometheus joined the CNCF (Cloud Native Computing Foundation) as the second hosted project after Kubernetes. Its design draws on Google's internal monitoring systems, which makes it a natural fit alongside Kubernetes. Compared with an InfluxDB-based solution it performs better and ships with alerting built in. It targets large cluster environments with a pull-based collection model: an application only needs to expose a metrics endpoint and register it with Prometheus for data collection to happen.
    The figure below shows the Prometheus architecture.

    (Figure: Prometheus architecture diagram)
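    Any process that exposes metrics in the Prometheus text exposition format can be scraped. As a minimal sketch (the address below is a hypothetical node-exporter instance, not part of this article's setup), an endpoint can be inspected by hand:

    # Fetch a metrics endpoint the same way Prometheus does (address is illustrative)
    curl -s http://10.0.0.13:9100/metrics | head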

    1. Prometheus features:

    1. Multi-dimensional data model (a time series is identified by a metric name and a set of key/value labels)
    2. A flexible query language (PromQL) over those dimensions
    3. No reliance on distributed storage; single server nodes are autonomous
    4. Time-series collection via a pull model over HTTP
    5. Pushing time series is supported through an intermediary gateway
    6. Targets are found through service discovery or static configuration
    7. Multiple modes of dashboarding and visualization support

    Prometheus components. The Prometheus ecosystem consists of multiple components, many of which are optional:
    1. The Prometheus server, which scrapes and stores time-series data
    2. Client libraries for instrumenting applications or writing exporters (Go, Java, Python, Ruby)
    3. A push gateway for supporting short-lived jobs
    4. Visualization dashboards (two options, PromDash and Grafana; Grafana is the mainstream choice today)
    5. Special-purpose exporters (for services such as HAProxy, StatsD, Graphite)
    6. Alertmanager, which handles alert aggregation, routing, and silencing separately
    

    2. Monitoring targets

    1. Node level: track resource utilization of each node (hardware, OS)
    2. Pod level: monitor Pod status and resource usage (OS, application)
    3. Kubernetes resource-object monitoring (application)
    4. Business-level monitoring (business)
    5. In summary: hardware, OS, application, API, and business monitoring, plus traffic analysis.
    
    Monitoring target      Implementation        Example
    Pod performance        cAdvisor              Container CPU and memory utilization
    Node performance       node-exporter         Node CPU and memory utilization
    K8s resource objects   kube-state-metrics    Pod/Deployment/Service status
    
    1. How Pods are monitored:
      A deployed project can consist of many Pods and containers. How are they detected dynamically? Prometheus's Kubernetes service discovery performs a list/watch against the kube-apiserver in real time, so newly created objects are noticed and added as scrape targets automatically, while kube-state-metrics exposes the state of those Kubernetes resource objects as metrics.
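    The kubernetes-pods scrape job configured later in this article only keeps Pods that carry the prometheus.io/scrape annotation. A sketch of marking an existing Pod by hand (the Pod name, port, and path are illustrative):

    kubectl annotate pod web-7d4f8b6c9-abcde \
      prometheus.io/scrape=true \
      prometheus.io/port=8080 \
      prometheus.io/path=/metrics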
    
    Related notes:
    *   Besides kube-state-metrics for Kubernetes resources, Prometheus supports many other service-discovery mechanisms, for example azure_sd_config, consul_sd_config, ec2, openstack, file_sd_config, and more; see the link below.
    
    *   [https://prometheus.io/docs/prometheus/latest/configuration/configuration/#kubernetes_sd_config](https://prometheus.io/docs/prometheus/latest/configuration/configuration/#kubernetes_sd_config)
    
    *   Grafana download: [https://grafana.com/grafana/download](https://grafana.com/grafana/download)
    
    *   Dashboard templates: [https://grafana.com/grafana/dashboards](https://grafana.com/grafana/dashboards)
    
    

    II. Deploying Prometheus

    Persistent storage (PV and PVC) is a prerequisite for the deployment.
    node02 is used as the NFS server, but nfs-utils and rpcbind are installed on all three nodes.

    node02:
    
    yum -y install nfs-utils rpcbind
    mkdir -p /nfs/kubernetes
    systemctl enable rpcbind
    systemctl enable nfs
    systemctl start nfs
    vim /etc/exports
    /nfs/kubernetes *(rw,no_root_squash)
    exportfs -r
    Test the mount (here from the master):
    mount -t nfs 10.0.0.13:/nfs/kubernetes /mnt
    
    [root@k8s-master prometheus]# mount |grep nfs
    sunrpc on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw,relatime)
    nfsd on /proc/fs/nfsd type nfsd (rw,relatime)
    10.0.0.13:/nfs/kubernetes on /mnt type nfs4 (rw,relatime,vers=4.1,rsize=262144,wsize=262144,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=10.0.0.11,local_lock=none,addr=10.0.0.13)
    Create three directories on the NFS export for the static PVs:
    mkdir /nfs/kubernetes/{pv01,pv02,pv03}
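    Optionally verify the export before wiring it into Kubernetes (quick sanity checks, assuming the layout above):

    showmount -e 10.0.0.13                 # should list /nfs/kubernetes *
    ls -ld /nfs/kubernetes/pv0{1,2,3}      # run on node02; the directories must exist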
    
    1. Static PV provisioning
    Create the PVs:
    [root@k8s-master prometheus]# cat pv.yaml 
    apiVersion: v1
    kind: PersistentVolume
    metadata:
      name: pv01
    spec:
      capacity:
        storage: 5Gi
      accessModes: 
        - ReadWriteMany
      nfs:
        path: /nfs/kubernetes/pv01
        server: 10.0.0.13
    ---
    apiVersion: v1
    kind: PersistentVolume
    metadata:
      name: pv02
    spec:
      capacity:
        storage: 10Gi
      accessModes:
        - ReadWriteMany
      nfs:
        path: /nfs/kubernetes/pv02
        server: 10.0.0.13
    
    ---
    apiVersion: v1
    kind: PersistentVolume
    metadata:
      name: pv03
    spec:
      capacity:
        storage: 50Gi
      accessModes:
        - ReadWriteMany
      nfs:
        path: /nfs/kubernetes/pv03
        server: 10.0.0.13
    
    Create the PVC and a test Deployment that mounts it:
    [root@k8s-master prometheus]# cat deploy-pvc.yaml 
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      labels:
        app: web
      name: web
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: web
      template:
        metadata:
          labels:
            app: web
        spec:
          containers:
          - image: nginx
            name: nginx
            volumeMounts:
            - name: data
              mountPath: /usr/share/nginx/html
          volumes:
          - name: data
            persistentVolumeClaim: 
              claimName: my-pvc
              
    ---
    
    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: my-pvc
    spec:
      accessModes:
        - ReadWriteMany
      resources:
        requests:
          storage: 5Gi
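    Apply both manifests (filenames as in the cat commands above) before checking the binding:

    kubectl apply -f pv.yaml
    kubectl apply -f deploy-pvc.yaml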
    
    Check:
    [root@k8s-master prometheus]# kubectl get pv 
    NAME   CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS      CLAIM            STORAGECLASS   REASON   AGE
    pv01   5Gi        RWX            Retain           Bound       default/my-pvc                           83s
    pv02   10Gi       RWX            Retain           Available                                            83s
    pv03   50Gi       RWX            Retain           Available                                            83s
    [root@k8s-master prometheus]# kubectl get pvc
    NAME     STATUS   VOLUME   CAPACITY   ACCESS MODES   STORAGECLASS   AGE
    my-pvc   Bound    pv01     5Gi        RWX                           9s
    
    
    2. Dynamic PV provisioning
    A provisioner plugin (nfs-client-provisioner) needs to be installed first:
    
    [root@k8s-master prometheus]# cat nfs-client.yaml 
    apiVersion: v1
    kind: ServiceAccount
    metadata:
      name: nfs-client-provisioner
    ---
    kind: Deployment
    apiVersion: apps/v1 
    metadata:
      name: nfs-client-provisioner
    spec:
      replicas: 1
      strategy:
        type: Recreate
      selector:
        matchLabels:
          app: nfs-client-provisioner
      template:
        metadata:
          labels:
            app: nfs-client-provisioner
        spec:
          serviceAccountName: nfs-client-provisioner
          containers:
            - name: nfs-client-provisioner
              image: quay.io/external_storage/nfs-client-provisioner:latest
              volumeMounts:
                - name: nfs-client-root
                  mountPath: /persistentvolumes
              env:
                - name: PROVISIONER_NAME
                  value: fuseim.pri/nfs
                - name: NFS_SERVER
                  value: 10.0.0.13
                - name: NFS_PATH
                  value: /nfs/kubernetes
          volumes:
            - name: nfs-client-root
              nfs:
                server: 10.0.0.13
                path: /nfs/kubernetes
    Create the StorageClass that provides dynamic storage:
    
    [root@k8s-master prometheus]# cat class.yaml 
    apiVersion: storage.k8s.io/v1
    kind: StorageClass
    metadata:
      name: managed-nfs-storage
    provisioner: fuseim.pri/nfs # or choose another name, must match the deployment's env PROVISIONER_NAME
    parameters:
      archiveOnDelete: "true"
    
    Grant the provisioner RBAC permissions:
    [root@k8s-master prometheus]# cat rbac.yaml 
    kind: ServiceAccount
    apiVersion: v1
    metadata:
      name: nfs-client-provisioner
    ---
    kind: ClusterRole
    apiVersion: rbac.authorization.k8s.io/v1
    metadata:
      name: nfs-client-provisioner-runner
    rules:
      - apiGroups: [""]
        resources: ["persistentvolumes"]
        verbs: ["get", "list", "watch", "create", "delete"]
      - apiGroups: [""]
        resources: ["persistentvolumeclaims"]
        verbs: ["get", "list", "watch", "update"]
      - apiGroups: ["storage.k8s.io"]
        resources: ["storageclasses"]
        verbs: ["get", "list", "watch"]
      - apiGroups: [""]
        resources: ["events"]
        verbs: ["create", "update", "patch"]
    ---
    kind: ClusterRoleBinding
    apiVersion: rbac.authorization.k8s.io/v1
    metadata:
      name: run-nfs-client-provisioner
    subjects:
      - kind: ServiceAccount
        name: nfs-client-provisioner
        namespace: default
    roleRef:
      kind: ClusterRole
      name: nfs-client-provisioner-runner
      apiGroup: rbac.authorization.k8s.io
    ---
    kind: Role
    apiVersion: rbac.authorization.k8s.io/v1
    metadata:
      name: leader-locking-nfs-client-provisioner
    rules:
      - apiGroups: [""]
        resources: ["endpoints"]
        verbs: ["get", "list", "watch", "create", "update", "patch"]
    ---
    kind: RoleBinding
    apiVersion: rbac.authorization.k8s.io/v1
    metadata:
      name: leader-locking-nfs-client-provisioner
    subjects:
      - kind: ServiceAccount
        name: nfs-client-provisioner
        # replace with namespace where provisioner is deployed
        namespace: default
    roleRef:
      kind: Role
      name: leader-locking-nfs-client-provisioner
      apiGroup: rbac.authorization.k8s.io
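    Apply the three manifests and confirm the provisioner Pod and StorageClass exist (filenames as above):

    kubectl apply -f nfs-client.yaml -f rbac.yaml -f class.yaml
    kubectl get pods -l app=nfs-client-provisioner
    kubectl get storageclass managed-nfs-storage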
    
    Provision a volume automatically (a test Deployment whose PVC uses the StorageClass):
    [root@k8s-master prometheus]# cat web.yaml 
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      labels:
        app: web
      name: web
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: web
      template:
        metadata:
          labels:
            app: web
        spec:
          containers:
          - image: nginx
            name: nginx
            volumeMounts:
            - name: data
              mountPath: /usr/share/nginx/html
          volumes:
          - name: data
            persistentVolumeClaim: 
              claimName: my-pvc
              
    ---
    
    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: my-pvc
    spec:
      storageClassName: "managed-nfs-storage"
      accessModes:
        - ReadWriteMany
      resources:
        requests:
          storage: 5Gi
    
    [root@k8s-master prometheus]# kubectl get pvc 
    NAME     STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS          AGE
    my-pvc   Bound    pvc-bcecf31e-ab0b-4428-bbab-8dcae441b51c   5Gi        RWX            managed-nfs-storage   10s
    [root@k8s-master prometheus]# kubectl get pv
    NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS      CLAIM            STORAGECLASS          REASON   AGE
    pv01                                       5Gi        RWX            Retain           Released    default/my-pvc                                  41m
    pv02                                       10Gi       RWX            Retain           Available                                                   41m
    pv03                                       50Gi       RWX            Retain           Available                                                   41m
    pvc-bcecf31e-ab0b-4428-bbab-8dcae441b51c   5Gi        RWX            Delete           Bound       default/my-pvc   managed-nfs-storage            113s
    The PVC was bound to an automatically provisioned PV, so dynamic provisioning works.
    
    
    3. Deploying Prometheus
    Create the namespace:
    kubectl create ns prometheus
    [root@k8s-master prometheus]# cat prome-rbac.yaml 
    apiVersion: v1
    kind: ServiceAccount
    metadata:
      name: prometheus
      namespace: prometheus
      labels:
        kubernetes.io/cluster-service: "true"
        addonmanager.kubernetes.io/mode: Reconcile
    ---
    apiVersion: rbac.authorization.k8s.io/v1beta1
    kind: ClusterRole
    metadata:
      name: prometheus
      labels:
        kubernetes.io/cluster-service: "true"
        addonmanager.kubernetes.io/mode: Reconcile 
    rules:
      - apiGroups:
          - ""
        resources:
          - nodes
          - nodes/metrics
          - services
          - endpoints
          - pods
        verbs:
          - get
          - list
          - watch
      - apiGroups:
          - ""
        resources:
          - configmaps
        verbs:
          - get
      - nonResourceURLs:
          - "/metrics"
        verbs:
          - get
    ---
    apiVersion: rbac.authorization.k8s.io/v1beta1
    kind: ClusterRoleBinding
    metadata:
      name: prometheus
      labels:
        kubernetes.io/cluster-service: "true"
        addonmanager.kubernetes.io/mode: Reconcile
    roleRef:
      apiGroup: rbac.authorization.k8s.io
      kind: ClusterRole
      name: prometheus
    subjects:
    - kind: ServiceAccount
      name: prometheus
      namespace: prometheus
    
    Create the ConfigMap with the Prometheus configuration:
    [root@k8s-master prometheus]# cat configmap.yaml 
    # Prometheus configuration format https://prometheus.io/docs/prometheus/latest/configuration/configuration/
    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: prometheus-config
      namespace: prometheus 
      labels:
        kubernetes.io/cluster-service: "true"
        addonmanager.kubernetes.io/mode: EnsureExists
    data:
      prometheus.yml: |
        rule_files:
        - /etc/config/rules/*.rules
    
        scrape_configs:
        - job_name: prometheus
          static_configs:
          - targets:
            - localhost:9090
    
        - job_name: kubernetes-apiservers
          kubernetes_sd_configs:
          - role: endpoints
          relabel_configs:
          - action: keep
            regex: default;kubernetes;https
            source_labels:
            - __meta_kubernetes_namespace
            - __meta_kubernetes_service_name
            - __meta_kubernetes_endpoint_port_name
          scheme: https
          tls_config:
            ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
            insecure_skip_verify: true
          bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
     
        - job_name: kubernetes-nodes-kubelet
          kubernetes_sd_configs:
          - role: node
          relabel_configs:
          - action: labelmap
            regex: __meta_kubernetes_node_label_(.+)
          scheme: https
          tls_config:
            ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
            insecure_skip_verify: true
          bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
    
        - job_name: kubernetes-nodes-cadvisor
          kubernetes_sd_configs:
          - role: node
          relabel_configs:
          - action: labelmap
            regex: __meta_kubernetes_node_label_(.+)
          - target_label: __metrics_path__
            replacement: /metrics/cadvisor
          scheme: https
          tls_config:
            ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
            insecure_skip_verify: true
          bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
    
        - job_name: kubernetes-service-endpoints
          kubernetes_sd_configs:
          - role: endpoints
          relabel_configs:
          - action: keep
            regex: true
            source_labels:
            - __meta_kubernetes_service_annotation_prometheus_io_scrape
          - action: replace
            regex: (https?)
            source_labels:
            - __meta_kubernetes_service_annotation_prometheus_io_scheme
            target_label: __scheme__
          - action: replace
            regex: (.+)
            source_labels:
            - __meta_kubernetes_service_annotation_prometheus_io_path
            target_label: __metrics_path__
          - action: replace
            regex: ([^:]+)(?::\d+)?;(\d+)
            replacement: $1:$2
            source_labels:
            - __address__
            - __meta_kubernetes_service_annotation_prometheus_io_port
            target_label: __address__
          - action: labelmap
            regex: __meta_kubernetes_service_label_(.+)
          - action: replace
            source_labels:
            - __meta_kubernetes_namespace
            target_label: kubernetes_namespace
          - action: replace
            source_labels:
            - __meta_kubernetes_service_name
            target_label: kubernetes_name
    
        - job_name: kubernetes-services
          kubernetes_sd_configs:
          - role: service
          metrics_path: /probe
          params:
            module:
            - http_2xx
          relabel_configs:
          - action: keep
            regex: true
            source_labels:
            - __meta_kubernetes_service_annotation_prometheus_io_probe
          - source_labels:
            - __address__
            target_label: __param_target
          - replacement: blackbox
            target_label: __address__
          - source_labels:
            - __param_target
            target_label: instance
          - action: labelmap
            regex: __meta_kubernetes_service_label_(.+)
          - source_labels:
            - __meta_kubernetes_namespace
            target_label: kubernetes_namespace
          - source_labels:
            - __meta_kubernetes_service_name
            target_label: kubernetes_name
    
        - job_name: kubernetes-pods
          kubernetes_sd_configs:
          - role: pod
          relabel_configs:
          - action: keep
            regex: true
            source_labels:
            - __meta_kubernetes_pod_annotation_prometheus_io_scrape
          - action: replace
            regex: (.+)
            source_labels:
            - __meta_kubernetes_pod_annotation_prometheus_io_path
            target_label: __metrics_path__
          - action: replace
            regex: ([^:]+)(?::\d+)?;(\d+)
            replacement: $1:$2
            source_labels:
            - __address__
            - __meta_kubernetes_pod_annotation_prometheus_io_port
            target_label: __address__
          - action: labelmap
            regex: __meta_kubernetes_pod_label_(.+)
          - action: replace
            source_labels:
            - __meta_kubernetes_namespace
            target_label: kubernetes_namespace
          - action: replace
            source_labels:
            - __meta_kubernetes_pod_name
            target_label: kubernetes_pod_name
        alerting:
          alertmanagers:
          - static_configs:
              - targets: ["alertmanager:80"]
    Check:
    [root@k8s-master prometheus]# kubectl get cm -n prometheus 
    NAME                DATA   AGE
    prometheus-config   1      7s
    
    Create the alerting rules:
    [root@k8s-master prometheus]# cat prome-rules.yaml 
    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: prometheus-rules
      namespace: prometheus
    data:
      general.rules: |
        groups:
        - name: general.rules
          rules:
          - alert: InstanceDown
            expr: up == 0
            for: 1m
            labels:
              severity: error 
            annotations:
              summary: "Instance {{ $labels.instance }} 停止工作"
              description: "{{ $labels.instance }} job {{ $labels.job }} 已经停止5分钟以上."
              
      node.rules: |
        groups:
        - name: node.rules
          rules:
          - alert: NodeFilesystemUsage
            expr: |
              100 - (node_filesystem_free_bytes{fstype=~"ext4|xfs"} / 
              node_filesystem_size_bytes{fstype=~"ext4|xfs"} * 100) > 80 
            for: 1m
            labels:
              severity: warning 
            annotations:
              summary: "Instance {{ $labels.instance }} : {{ $labels.mountpoint }} 分区使用率过高"
              description: "{{ $labels.instance }}: {{ $labels.mountpoint }} 分区使用大于80% (当前值: {{ $value }})"
    
          - alert: NodeMemoryUsage
            expr: |
              100 - (node_memory_MemFree_bytes+node_memory_Cached_bytes+node_memory_Buffers_bytes) / 
              node_memory_MemTotal_bytes * 100 > 80
            for: 1m
            labels:
              severity: warning
            annotations:
              summary: "Instance {{ $labels.instance }} 内存使用率过高"
              description: "{{ $labels.instance }}内存使用大于80% (当前值: {{ $value }})"
    
          - alert: NodeCPUUsage    
            expr: |
              100 - (avg(irate(node_cpu_seconds_total{mode="idle"}[5m])) by (instance) * 100) > 60 
            for: 1m
            labels:
              severity: warning
            annotations:
              summary: "Instance {{ $labels.instance }} CPU使用率过高"       
              description: "{{ $labels.instance }}CPU使用大于60% (当前值: {{ $value }})"
    
          - alert: KubeNodeNotReady
            expr: |
              kube_node_status_condition{condition="Ready",status="true"} == 0
            for: 1m
            labels:
              severity: error
            annotations:
              message: '{{ $labels.node }} 已经有10多分钟没有准备好了.'
    
      pod.rules: |
        groups:
        - name: pod.rules
          rules:
          - alert: PodCPUUsage
            expr: |
               sum(rate(container_cpu_usage_seconds_total{image!=""}[1m]) * 100) by (pod_name, namespace) > 80
            for: 5m
            labels:
              severity: warning 
            annotations:
              summary: "命名空间: {{ $labels.namespace }} | Pod名称: {{ $labels.pod_name }} CPU使用大于80% (当前值: {{ $value }})"
    
          - alert: PodMemoryUsage
            expr: |
               sum(container_memory_rss{image!=""}) by(pod_name, namespace) / 
               sum(container_spec_memory_limit_bytes{image!=""}) by(pod_name, namespace) * 100 != +inf > 80
            for: 5m
            labels:
              severity: warning 
            annotations:
              summary: "命名空间: {{ $labels.namespace }} | Pod名称: {{ $labels.pod_name }} 内存使用大于80% (当前值: {{ $value }})"
    
          - alert: PodNetworkReceive
            expr: |
               sum(rate(container_network_receive_bytes_total{image!="",name=~"^k8s_.*"}[5m]) /1000) by (pod_name,namespace)  > 30000
            for: 5m
            labels:
              severity: warning
            annotations:
              summary: "命名空间: {{ $labels.namespace }} | Pod名称: {{ $labels.pod_name }} 入口流量大于30MB/s (当前值: {{ $value }}K/s)"           
    
          - alert: PodNetworkTransmit
            expr: | 
               sum(rate(container_network_transmit_bytes_total{image!="",name=~"^k8s_.*"}[5m]) /1000) by (pod_name,namespace) > 30000
            for: 5m
            labels:
              severity: warning 
            annotations:
              summary: "命名空间: {{ $labels.namespace }} | Pod名称: {{ $labels.pod_name }} 出口流量大于30MB/s (当前值: {{ $value }}/K/s)"
    
          - alert: PodRestart
            expr: |
               sum(changes(kube_pod_container_status_restarts_total[1m])) by (pod,namespace) > 0
            for: 1m
            labels:
              severity: warning 
            annotations:
              summary: "命名空间: {{ $labels.namespace }} | Pod名称: {{ $labels.pod }} Pod重启 (当前值: {{ $value }})"
    
          - alert: PodFailed
            expr: |
               sum(kube_pod_status_phase{phase="Failed"}) by (pod,namespace) > 0
            for: 5s
            labels:
              severity: error 
            annotations:
              summary: "命名空间: {{ $labels.namespace }} | Pod名称: {{ $labels.pod }} Pod状态Failed (当前值: {{ $value }})"
    
          - alert: PodPending
            expr: | 
               sum(kube_pod_status_phase{phase="Pending"}) by (pod,namespace) > 0
            for: 1m
            labels:
              severity: error
            annotations:
              summary: "命名空间: {{ $labels.namespace }} | Pod名称: {{ $labels.pod }} Pod状态Pending (当前值: {{ $value }})"
    
    Check:
    
    [root@k8s-master prometheus]# kubectl get cm -n prometheus 
    NAME                DATA   AGE
    prometheus-config   1      3m
    prometheus-rules    3      16s
    
    Deploy the Prometheus StatefulSet and its Service:
    
    [root@k8s-master prometheus]# cat prome-statefulset-svc.yaml 
    kind: Service
    apiVersion: v1
    metadata: 
      name: prometheus
      namespace: prometheus
      labels: 
        kubernetes.io/name: "Prometheus"
        kubernetes.io/cluster-service: "true"
        addonmanager.kubernetes.io/mode: Reconcile
    spec: 
      type: NodePort
      ports: 
        - name: http 
          port: 9090
          protocol: TCP
          targetPort: 9090
          nodePort: 30090
      selector: 
        k8s-app: prometheus
    ---
    apiVersion: apps/v1
    kind: StatefulSet
    metadata:
      name: prometheus 
      namespace: prometheus
      labels:
        k8s-app: prometheus
        kubernetes.io/cluster-service: "true"
        addonmanager.kubernetes.io/mode: Reconcile
        version: v2.2.1
    spec:
      serviceName: "prometheus"
      replicas: 1
      podManagementPolicy: "Parallel"
      updateStrategy:
       type: "RollingUpdate"
      selector:
        matchLabels:
          k8s-app: prometheus
      template:
        metadata:
          labels:
            k8s-app: prometheus
          annotations:
            scheduler.alpha.kubernetes.io/critical-pod: ''
        spec:
          priorityClassName: system-cluster-critical
          serviceAccountName: prometheus
          initContainers:
          - name: "init-chown-data"
            image: "busybox:latest"
            imagePullPolicy: "IfNotPresent"
            command: ["chown", "-R", "65534:65534", "/data"]
            volumeMounts:
            - name: prometheus-data
              mountPath: /data
              subPath: ""
          containers:
            - name: prometheus-server-configmap-reload
              image: "jimmidyson/configmap-reload:v0.1"
              imagePullPolicy: "IfNotPresent"
              args:
                - --volume-dir=/etc/config
                - --webhook-url=http://localhost:9090/-/reload
              volumeMounts:
                - name: config-volume
                  mountPath: /etc/config
                  readOnly: true
              resources:
                limits:
                  cpu: 10m
                  memory: 10Mi
                requests:
                  cpu: 10m
                  memory: 10Mi
    
            - name: prometheus-server
              image: "prom/prometheus:v2.2.1"
              imagePullPolicy: "IfNotPresent"
              args:
                - --config.file=/etc/config/prometheus.yml
                - --storage.tsdb.path=/data
                - --web.console.libraries=/etc/prometheus/console_libraries
                - --web.console.templates=/etc/prometheus/consoles
                - --web.enable-lifecycle
              ports:
                - containerPort: 9090
              readinessProbe:
                httpGet:
                  path: /-/ready
                  port: 9090
                initialDelaySeconds: 30
                timeoutSeconds: 30
              livenessProbe:
                httpGet:
                  path: /-/healthy
                  port: 9090
                initialDelaySeconds: 30
                timeoutSeconds: 30
              # based on 10 running nodes with 30 pods each
              resources:
                limits:
                  cpu: 200m
                  memory: 1000Mi
                requests:
                  cpu: 200m
                  memory: 1000Mi
                
              volumeMounts:
                - name: config-volume
                  mountPath: /etc/config
                - name: prometheus-data
                  mountPath: /data
                  subPath: ""
                - name: prometheus-rules
                  mountPath: /etc/config/rules
    
          terminationGracePeriodSeconds: 300
          volumes:
            - name: config-volume
              configMap:
                name: prometheus-config
            - name: prometheus-rules
              configMap:
                name: prometheus-rules
    
      volumeClaimTemplates:
      - metadata:
          name: prometheus-data
        spec:
          storageClassName: managed-nfs-storage 
          accessModes:
            - ReadWriteOnce
          resources:
            requests:
              storage: "16Gi"
    
    Check:
    [root@k8s-master prometheus]# kubectl get pods -n prometheus 
    NAME           READY   STATUS    RESTARTS   AGE
    prometheus-0   2/2     Running   0          3m55s
    [root@k8s-master prometheus]# kubectl get svc -n prometheus 
    NAME         TYPE       CLUSTER-IP      EXTERNAL-IP   PORT(S)          AGE
    prometheus   NodePort   10.105.99.207   <none>        9090:30090/TCP   4m4s
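    A quick end-to-end check through the HTTP API (again the master address is only an example); every healthy target should report up == 1:

    curl -s 'http://10.0.0.11:30090/api/v1/query' --data-urlencode 'query=up'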
    
    

    4. Deploying Grafana to visualize the data

    
    [root@k8s-master prometheus]# cat granfana.yaml 
    apiVersion: apps/v1 
    kind: Deployment 
    metadata:
      name: grafana
      namespace: prometheus
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: grafana
      template:
        metadata:
          labels:
            app: grafana
        spec:
          containers:
          - name: grafana
            image: grafana/grafana
            ports:
              - containerPort: 3000
                protocol: TCP
            resources:
              limits:
                cpu: 100m            
                memory: 256Mi          
              requests:
                cpu: 100m            
                memory: 256Mi
            volumeMounts:
              - name: grafana-data
                mountPath: /var/lib/grafana
                subPath: grafana
          securityContext:
            fsGroup: 472
            runAsUser: 472
          volumes:
          - name: grafana-data
            persistentVolumeClaim:
              claimName: grafana 
    
    ---
    
    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: grafana 
      namespace: prometheus
    spec:
      storageClassName: "managed-nfs-storage"
      accessModes:
        - ReadWriteMany
      resources:
        requests:
          storage: 5Gi
    
    ---
    
    apiVersion: v1
    kind: Service
    metadata:
      name: grafana
      namespace: prometheus
    spec:
      type: NodePort
      ports:
      - port : 80
        targetPort: 3000
        nodePort: 30007
      selector:
        app: grafana
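    Apply the manifest (keeping the filename from the cat above). From inside the cluster, the Prometheus data source URL for Grafana is http://prometheus:9090, i.e. the Service created earlier in the same namespace:

    kubectl apply -f granfana.yaml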
    
    Check:
    [root@k8s-master prometheus]# kubectl get pods -n prometheus 
    NAME                       READY   STATUS    RESTARTS   AGE
    grafana-85dc8f8cd4-bcjkk   1/1     Running   0          2m59s
    prometheus-0               2/2     Running   0          18m
    [root@k8s-master prometheus]# kubectl get svc -n prometheus 
    NAME         TYPE       CLUSTER-IP       EXTERNAL-IP   PORT(S)          AGE
    grafana      NodePort   10.101.145.125   <none>        80:30007/TCP     3m7s
    prometheus   NodePort   10.105.99.207    <none>        9090:30090/TCP   18m
    
    [root@k8s-master prometheus]# kubectl get pv
    NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS      CLAIM                                     STORAGECLASS          REASON   AGE
    pv01                                       5Gi        RWX            Retain           Released    default/my-pvc                                                           76m
    pv02                                       10Gi       RWX            Retain           Available                                                                            76m
    pv03                                       50Gi       RWX            Retain           Available                                                                            76m
    pvc-2db8aab7-2325-4f13-8f24-c6ba31fcb876   16Gi       RWO            Delete           Bound       prometheus/prometheus-data-prometheus-0   managed-nfs-storage            18m
    pvc-53e19eb4-4f97-4fad-ad88-75ce607632d4   5Gi        RWX            Delete           Bound       prometheus/grafana                        managed-nfs-storage            3m16s
    pvc-bcecf31e-ab0b-4428-bbab-8dcae441b51c   5Gi        RWX            Delete           Bound       default/my-pvc                            managed-nfs-storage            36m
    
    Deploy the Grafana Ingress:
    
    [root@k8s-master prometheus]# cat granfana-ingress.yaml 
    apiVersion: extensions/v1beta1
    kind: Ingress
    metadata:
       name: grafana
       namespace: prometheus
    spec:
       rules:
       - host: k8s.grafana
         http:
           paths:
           - path: /
             backend:
              serviceName: grafana
              servicePort: 80
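    Apply the Ingress (note that extensions/v1beta1 has been removed in newer Kubernetes releases in favour of networking.k8s.io/v1):

    kubectl apply -f granfana-ingress.yaml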
    Check:
    [root@k8s-master prometheus]# kubectl get ingress -n prometheus 
    NAME      CLASS    HOSTS         ADDRESS   PORTS   AGE
    grafana   <none>   k8s.grafana             80      10s
    
    

    Add a hosts entry for k8s.grafana on the client machine, then log in; the default Grafana username and password are both admin.
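    A sketch of that hosts entry on the client machine (the IP must point at a node reachable through the ingress controller; 10.0.0.11 is only illustrative):

    echo "10.0.0.11 k8s.grafana" >> /etc/hosts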


    (Screenshots: Grafana data-source and dashboard setup)
    Download a dashboard template, e.g. https://grafana.com/grafana/dashboards/315
