Introduction to Prometheus Operator
Prometheus Operator is a Prometheus-based Kubernetes monitoring solution developed by CoreOS, and currently one of the most feature-complete open-source options.

The original architecture diagram on the official site is more complex than what is shown here; only the components actually used are listed:
- prometheus-operator: deploys and manages the Prometheus instances.
- kube-state-metrics: collects metrics about resource objects in the Kubernetes cluster.
- node_exporter: collects metrics from each node in the cluster.
- prometheus: stores the data collected by node_exporter and the other collectors.
- alertmanager: handles monitoring alerts.
- grafana: provides data visualization.
Deploying kube-prometheus
- Deployment environment:
- Kubernetes version: v1.16.0
- prometheus-operator (kube-prometheus) version: v0.34.0
kube-prometheus repository: https://github.com/coreos/kube-prometheus
Download the source:
git clone https://github.com/coreos/kube-prometheus.git
# Upstream keeps all manifests in a single directory; for convenience, group the manifests by service
cd kube-prometheus/manifests
# the target directories must exist before moving multiple files into them
mkdir -p serviceMonitor grafana kube-state-metrics alertmanager node-exporter adapter prometheus
mv *-serviceMonitor* serviceMonitor/
mv setup operator
mv grafana-* grafana/
mv kube-state-metrics-* kube-state-metrics/
mv alertmanager-* alertmanager/
mv node-exporter-* node-exporter/
mv prometheus-adapter-* adapter/
mv prometheus-* prometheus/
ls -al
drwxr-xr-x 10 root root  244 Jan  9 15:07 .
drwxr-xr-x  6 root root  108 Jan  9 15:03 ..
drwxr-xr-x  2 root root 4096 Jan  9 15:01 adapter
drwxr-xr-x  2 root root  149 Jan  9 15:01 alertmanager
-rw-r--r--  1 root root  102 Jan  9 15:01 create-secret-job
drwxr-xr-x  2 root root  219 Jan  9 15:01 grafana
drwxr-xr-x  2 root root  305 Jan  9 15:01 kube-state-metrics
drwxr-xr-x  2 root root  200 Jan  9 15:01 node-exporter
drwxr-xr-x  2 root root 4096 Jan  9 15:01 operator
drwxr-xr-x  2 root root 4096 Jan  9 15:01 prometheus
-rw-r--r--  1 root root  995 Jan  9 15:01 prometheus-additional.yaml
-rw-r--r--  1 root root 2516 Jan  9 10:29 prometheus-sc.yaml
drwxr-xr-x  2 root root 4096 Jan  9 15:01 serviceMonitor
Advanced configuration (service auto-discovery and persistent storage)
I. Service auto-discovery
1) Define the scrape job (prometheus-additional.yaml)
$ cd kube-prometheus/manifests
$ vi prometheus-additional.yaml
- job_name: 'kubernetes-service-endpoints'
  kubernetes_sd_configs:
  - role: endpoints
  relabel_configs:
  - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
    action: keep
    regex: true
  - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scheme]
    action: replace
    target_label: __scheme__
    regex: (https?)
  - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
    action: replace
    target_label: __metrics_path__
    regex: (.+)
  - source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
    action: replace
    target_label: __address__
    regex: ([^:]+)(?::\d+)?;(\d+)
    replacement: $1:$2
  - action: labelmap
    regex: __meta_kubernetes_service_label_(.+)
  - source_labels: [__meta_kubernetes_namespace]
    action: replace
    target_label: kubernetes_namespace
  - source_labels: [__meta_kubernetes_service_name]
    action: replace
    target_label: kubernetes_name
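The `__address__` rewrite above is the least obvious rule: Prometheus joins the source labels with `;`, matches the regex `([^:]+)(?::\d+)?;(\d+)`, and substitutes `$1:$2`, so the port from the `prometheus.io/port` annotation replaces (or is appended to) the discovered address. The same regex can be exercised outside Prometheus; this is only an illustration (Python spells the replacement `\1:\2`, and Prometheus anchors relabel regexes, which this sketch does not):

```python
import re

# Same pattern Prometheus applies to "<__address__>;<annotation port>".
pattern = re.compile(r"([^:]+)(?::\d+)?;(\d+)")

def rewrite_address(address: str, annotation_port: str) -> str:
    """Mimic the relabel step: join source labels with ';' and substitute."""
    joined = f"{address};{annotation_port}"
    return pattern.sub(r"\1:\2", joined)

# Endpoint address already carries a port -> it is replaced by the annotated one.
print(rewrite_address("10.244.1.17:8080", "9102"))  # 10.244.1.17:9102
# Address without a port -> the annotated port is appended.
print(rewrite_address("10.244.1.18", "9102"))       # 10.244.1.18:9102
```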
2) Create a Secret object from the job file
kubectl create secret generic additional-configs --from-file=prometheus-additional.yaml -n monitoring
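For a Service to be picked up by the kubernetes-service-endpoints job, it must carry the matching `prometheus.io/*` annotations. A minimal sketch (the service name, port, and path are hypothetical placeholders):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-app            # hypothetical service
  namespace: default
  annotations:
    prometheus.io/scrape: "true"   # kept by the first relabel rule
    prometheus.io/scheme: "http"
    prometheus.io/path: "/metrics"
    prometheus.io/port: "8080"     # rewritten into __address__
spec:
  selector:
    app: my-app
  ports:
  - name: metrics
    port: 8080
```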
3) Reference the extra configuration in the Prometheus resource (prometheus-prometheus.yaml)
Add the additionalScrapeConfigs field under spec:
spec:
  alerting:
    alertmanagers:
    - name: alertmanager-main
      namespace: monitoring
      port: web
  # reference the Secret holding the additional scrape configs
  additionalScrapeConfigs:
    name: additional-configs
    key: prometheus-additional.yaml
4) Extend the prometheus-k8s ClusterRole (prometheus-clusterRole.yaml) with the required access
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus-k8s
rules:
- apiGroups:
  - ""
  resources:
  - nodes
  - services
  - endpoints
  - pods
  - nodes/proxy
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - ""
  resources:
  - configmaps
  - nodes/metrics
  verbs:
  - get
- nonResourceURLs:
  - /metrics
  verbs:
  - get
II. Persistent storage
1) Create a StorageClass and its NFS provisioner (prometheus-sc.yaml)
$ cd kube-prometheus/manifests
$ vi prometheus-sc.yaml
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: nfs-provisioner-runner
rules:
- apiGroups: [""]
  resources: ["persistentvolumes"]
  verbs: ["get", "list", "watch", "create", "delete"]
- apiGroups: [""]
  resources: ["persistentvolumeclaims"]
  verbs: ["get", "list", "watch", "update"]
- apiGroups: ["storage.k8s.io"]
  resources: ["storageclasses"]
  verbs: ["get", "list", "watch"]
- apiGroups: [""]
  resources: ["events"]
  verbs: ["create", "update", "patch"]
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: run-nfs-provisioner
subjects:
- kind: ServiceAccount
  name: nfs-provisioner
  namespace: monitoring
roleRef:
  kind: ClusterRole
  name: nfs-provisioner-runner
  apiGroup: rbac.authorization.k8s.io
---
kind: Role
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: leader-locking-nfs-provisioner
  namespace: monitoring
rules:
- apiGroups: [""]
  resources: ["endpoints"]
  verbs: ["get", "list", "watch", "create", "update", "patch"]
---
kind: RoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: leader-locking-nfs-provisioner
  namespace: monitoring
subjects:
- kind: ServiceAccount
  name: nfs-provisioner
  namespace: monitoring
roleRef:
  kind: Role
  name: leader-locking-nfs-provisioner
  apiGroup: rbac.authorization.k8s.io
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: nfs-provisioner
  namespace: monitoring   # must match the namespace referenced by the bindings
---
kind: Deployment
apiVersion: apps/v1
metadata:
  name: nfs-provisioner
  namespace: monitoring
spec:
  selector:
    matchLabels:
      app: nfs-provisioner
  replicas: 1
  strategy:
    type: Recreate
  template:
    metadata:
      labels:
        app: nfs-provisioner
    spec:
      serviceAccountName: nfs-provisioner
      containers:
      - name: nfs-provisioner
        image: registry.cn-hangzhou.aliyuncs.com/open-ali/nfs-client-provisioner
        volumeMounts:
        - name: nfs-client-root
          mountPath: /persistentvolumes
        env:
        - name: PROVISIONER_NAME
          value: prometheus-data-db/nfs
        - name: NFS_SERVER
          value: 192.168.17.109
        - name: NFS_PATH
          value: /data/nfs/med/prometheus
      volumes:
      - name: nfs-client-root
        nfs:
          server: 192.168.17.109
          path: /data/nfs/med/prometheus
---
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: prometheus-data-db
provisioner: prometheus-data-db/nfs
reclaimPolicy: Retain
2) Add the following to the Prometheus CRD resource (prometheus-prometheus.yaml), under spec:
# reference the additional scrape configs (from the auto-discovery section)
additionalScrapeConfigs:
  name: additional-configs
  key: prometheus-additional.yaml
# reference the StorageClass for persistent storage
storage:
  volumeClaimTemplate:
    spec:
      storageClassName: prometheus-data-db
      resources:
        requests:
          storage: 10Gi
III. Deploying kube-prometheus
1) Apply the StorageClass manifest (prometheus-sc.yaml) and verify it
kubectl apply -f prometheus-sc.yaml
kubectl get sc
NAME PROVISIONER AGE
prometheus-data-db prometheus-data-db/nfs 7h52m
2) Create the dedicated monitoring namespace and deploy the operator (with its CRDs)
kubectl apply -f operator/
3) Deploy kube-state-metrics
kubectl apply -f kube-state-metrics/
4) Deploy the remaining components
kubectl apply -f adapter/
kubectl apply -f alertmanager/
kubectl apply -f node-exporter/
kubectl apply -f grafana/
kubectl apply -f prometheus/
kubectl apply -f serviceMonitor/
# After deployment, check the status of each resource
kubectl get all -n monitoring
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/alertmanager-main ClusterIP 10.96.92.90 <none> 9093/TCP 23h
service/alertmanager-operated ClusterIP None <none> 9093/TCP,9094/TCP,9094/UDP 23h
service/grafana ClusterIP 10.96.156.52 <none> 3000/TCP 23h
service/kube-state-metrics ClusterIP None <none> 8443/TCP,9443/TCP 23h
service/node-exporter ClusterIP None <none> 9100/TCP 23h
service/prometheus-adapter ClusterIP 10.96.169.141 <none> 443/TCP 23h
service/prometheus-k8s ClusterIP 10.96.129.69 <none> 9090/TCP 23h
service/prometheus-operated ClusterIP None <none> 9090/TCP 7h58m
service/prometheus-operator ClusterIP None <none> 8080/TCP 24h
NAME                            DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR            AGE
daemonset.apps/node-exporter    30        30        30      30           30          kubernetes.io/os=linux   23h
NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/grafana 1/1 1 1 23h
deployment.apps/kube-state-metrics 1/1 1 1 23h
deployment.apps/nfs-provisioner 1/1 1 1 18h
deployment.apps/prometheus-adapter 1/1 1 1 23h
deployment.apps/prometheus-operator 1/1 1 1 24h
NAME DESIRED CURRENT READY AGE
replicaset.apps/grafana-58dc7468d7 1 1 1 23h
replicaset.apps/kube-state-metrics-769f4fd4d5 1 1 1 23h
replicaset.apps/nfs-provisioner-567bbdd9f 1 1 1 18h
replicaset.apps/prometheus-adapter-5cd5798d96 1 1 1 23h
replicaset.apps/prometheus-operator-99dccdc56 1 1 1 24h
NAME READY AGE
statefulset.apps/alertmanager-main 3/3 23h
statefulset.apps/prometheus-k8s 2/2 7h58m
Check the automatically created PVs and PVCs, and change the PV reclaim policy:
[root@med-harbor-0 manifests]# kubectl get pv
NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                                           STORAGECLASS         REASON   AGE
pvc-05bef87f-0823-4832-9fb0-a7c559d28ffc   10Gi       RWO            Retain           Bound    monitoring/prometheus-k8s-db-prometheus-k8s-1   prometheus-data-db            8h
pvc-d47c659b-479b-4fd7-9b6c-8a98f697225f   10Gi       RWO            Retain           Bound    monitoring/prometheus-k8s-db-prometheus-k8s-0   prometheus-data-db            8h
[root@med-harbor-0 manifests]# kubectl get pvc -n monitoring
NAME                                 STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS         AGE
prometheus-k8s-db-prometheus-k8s-0   Bound    pvc-d47c659b-479b-4fd7-9b6c-8a98f697225f   10Gi       RWO            prometheus-data-db   8h
prometheus-k8s-db-prometheus-k8s-1   Bound    pvc-05bef87f-0823-4832-9fb0-a7c559d28ffc   10Gi       RWO            prometheus-data-db   8h
kubectl edit pv pvc-05bef87f-0823-4832-9fb0-a7c559d28ffc   # PVs are cluster-scoped, no -n needed
  persistentVolumeReclaimPolicy: Retain   # change Delete to Retain
  storageClassName: prometheus-data-db
  volumeMode: Filesystem
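Instead of editing interactively, the reclaim policy can also be set non-interactively with a patch; a sketch (the PV name is the one from the listing above, and will differ in your cluster):

```shell
kubectl patch pv pvc-05bef87f-0823-4832-9fb0-a7c559d28ffc \
  -p '{"spec":{"persistentVolumeReclaimPolicy":"Retain"}}'
```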
5) Expose the web ports with an Ingress and access them (details omitted):
prometheus-k8s:9090
alertmanager-main:9093
grafana:3000
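An Ingress for Grafana might look like the following sketch (the hostname is a placeholder, and the networking.k8s.io/v1beta1 API matches the Kubernetes v1.16 era used here; adjust both to your environment):

```yaml
apiVersion: networking.k8s.io/v1beta1
kind: Ingress
metadata:
  name: grafana
  namespace: monitoring
spec:
  rules:
  - host: grafana.example.com      # hypothetical hostname
    http:
      paths:
      - path: /
        backend:
          serviceName: grafana
          servicePort: 3000
```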
6) Configure Grafana
Open Grafana; the initial credentials are admin/admin. Click Data Source and add Prometheus:

