Background
A new project needed some data collected from Prometheus, but the data kept appearing and disappearing, which seemed odd. After digging in, I found that running the query in Prometheus produced a graph with gaps, as shown below:
[Figure: Prometheus graph of the query, showing gaps in the collected data]
The graph above suggested the problem was on the collection side, i.e. related to kube-state-metrics in the k8s cluster. After checking, I found that this component's memory usage had essentially hit 100% of its limit, so the fix was straightforward.
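A quick way to confirm this is to compare actual usage against the configured limit. The commands below are a minimal sketch; they assume metrics-server (or Heapster on older clusters) is available and that the pod carries the k8s-app=kube-state-metrics label from the manifest further down:
# Actual CPU/memory usage of the kube-state-metrics pod
kubectl -n kube-system top pod -l k8s-app=kube-state-metrics
# Configured requests/limits and any recent OOMKilled restarts
kubectl -n kube-system describe pod -l k8s-app=kube-state-metrics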
Solution
First locate the kube-state-metrics YAML manifest, check the currently configured CPU and memory, and raise the limits. I looked at the default parameters in the upstream GitHub manifest and then adjusted them to fit my own environment:
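If the original manifest file is not at hand, the live configuration can be dumped from the cluster and edited instead (a sketch; the deployment name and namespace match the manifest below):
kubectl -n kube-system get deployment kube-state-metrics -o yaml > kube-state-metrics-deployment.yaml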
GitHub kube-state-metrics v1.3 default manifest:
apiVersion: apps/v1beta2
# Kubernetes versions after 1.9.0 should use apps/v1
# Kubernetes versions before 1.8.0 should use apps/v1beta1 or extensions/v1beta1
kind: Deployment
metadata:
  name: kube-state-metrics
  namespace: kube-system
spec:
  selector:
    matchLabels:
      k8s-app: kube-state-metrics
  replicas: 1
  template:
    metadata:
      labels:
        k8s-app: kube-state-metrics
    spec:
      serviceAccountName: kube-state-metrics
      containers:
      - name: kube-state-metrics
        image: quay.io/coreos/kube-state-metrics:v1.3.0
        ports:
        - name: http-metrics
          containerPort: 8080
        - name: telemetry
          containerPort: 8081
        readinessProbe:
          httpGet:
            path: /healthz
            port: 8080
          initialDelaySeconds: 5
          timeoutSeconds: 5
      - name: addon-resizer
        image: k8s.gcr.io/addon-resizer:1.7
        resources:
          limits:
            cpu: 100m
            memory: 30Mi
          requests:
            cpu: 100m
            memory: 30Mi
        env:
        - name: MY_POD_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        - name: MY_POD_NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
        command:
        - /pod_nanny
        - --container=kube-state-metrics
        - --cpu=100m       # adjust to your environment; default is 100m
        - --extra-cpu=1m
        - --memory=100Mi   # adjust to your environment; default is 100Mi
        - --extra-memory=2Mi
        - --threshold=5
        - --deployment=kube-state-metrics
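For context, the addon-resizer (pod_nanny) sidecar sets the kube-state-metrics container's resources roughly as base + extra × node count, and rewrites them when they drift more than --threshold percent. A rough sizing sketch under that assumption:
# Estimate what the nanny will set for a cluster of N nodes
# (e.g. with --memory=100Mi and --extra-memory=2Mi, 50 nodes -> ~200Mi)
NODES=$(kubectl get nodes --no-headers | wc -l)
echo "expected memory: $((100 + 2 * NODES))Mi"
echo "expected cpu:    $((100 + 1 * NODES))m"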
After editing, re-apply the manifest with the following command:
kubectl apply -f kube-state-metrics-deployment.yaml
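To watch the new pod roll out before checking the graphs, something like the following can be used (deployment name and label as in the manifest above):
kubectl -n kube-system rollout status deployment/kube-state-metrics
kubectl -n kube-system get pods -l k8s-app=kube-state-metrics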
Once the new pod has rolled out, wait 10 to 30 minutes and then check whether the monitoring data still shows gaps. In my case it had recovered, as shown below:
[Figure: the same Prometheus query after the change, now showing a continuous graph with no gaps]
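A couple of sanity checks for this step (the job label is an assumption; adjust it to however Prometheus scrapes kube-state-metrics in your setup):
# Memory should now sit comfortably below the new limit
kubectl -n kube-system top pod -l k8s-app=kube-state-metrics
# In the Prometheus UI, a query such as
#   up{job="kube-state-metrics"}
# should stay at 1 without gaps once the container is no longer hitting its memory limit.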