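1. Graceful shutdown
1.1 The Pod startup probe (startupProbe)
The startup probe indicates whether the application inside the container has started. If a startup probe is configured, all other probes are disabled until it succeeds. If the startup probe fails, the kubelet kills the container, and the container is then subject to its restart policy. If a container does not define a startup probe, the default state is Success.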
- https://blog.csdn.net/qq_39680564/article/details/106650301
- https://blog.csdn.net/Jerry00713/article/details/123894868
# In this example the probe succeeds based on http://localhost:9036/mgm/health
# The check is considered successful if the response status code is >= 200 and < 400.
startupProbe:
  failureThreshold: 22
  httpGet:
    path: /mgm/health
    port: 9036
    scheme: HTTP
  initialDelaySeconds: 25
  periodSeconds: 10
  successThreshold: 1
  timeoutSeconds: 5
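Probe configuration fields
- initialDelaySeconds: number of seconds to wait after the container has started before startup, liveness, or readiness probes are initiated. Defaults to 0 seconds. Minimum value is 0.
- periodSeconds: how often (in seconds) to perform the probe. Defaults to 10 seconds. Minimum value is 1.
- timeoutSeconds: number of seconds after which the probe times out. Defaults to 1 second. Minimum value is 1.
- successThreshold: minimum number of consecutive successes for the probe to be considered successful after having failed. Defaults to 1. Must be 1 for liveness and startup probes. Minimum value is 1.
- failureThreshold: number of consecutive probe failures after which Kubernetes gives up. For a liveness probe, giving up means restarting the container. For a readiness probe, giving up means the Pod is marked Unready. Defaults to 3. Minimum value is 1.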
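As a rough worked example of how these fields combine, the sketch below applies the common rule of thumb that a startup probe leaves the application about failureThreshold * periodSeconds to start, plus the initial delay. It is only an approximation (probe timeouts and kubelet jitter are ignored), not the kubelet's actual scheduling logic, and the function name `startup_budget_seconds` is hypothetical.

```python
def startup_budget_seconds(initial_delay: int, period: int, failure_threshold: int) -> int:
    """Approximate upper bound, in seconds, on how long a container may take
    to start before the startup probe gives up and the container is killed.

    Rule of thumb only: initialDelaySeconds + failureThreshold * periodSeconds.
    """
    return initial_delay + failure_threshold * period


# Values taken from the startupProbe shown in this post.
budget = startup_budget_seconds(initial_delay=25, period=10, failure_threshold=22)
print(budget)  # 245
```

With the values used in this post, the service has roughly four minutes to come up, which is why failureThreshold is set far above the default of 3.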
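HTTP probes accept additional fields under httpGet:
- host: host name to connect to. Defaults to the Pod IP. In most cases it is better to set "Host" in httpHeaders instead.
- scheme: scheme used to connect to the host (HTTP or HTTPS). Defaults to HTTP.
- path: path to request on the HTTP server.
- httpHeaders: custom headers to set in the request. HTTP allows repeated headers.
- port: number or name of the port to access on the container. A number must be in the range 1 to 65535.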
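To make the httpGet fields and the success rule concrete, here is a minimal sketch of how the request URL is assembled and how a status code is judged. It is an illustration under stated assumptions, not the kubelet's implementation: the helper names `probe_url` and `is_probe_success` are hypothetical, and `127.0.0.1` stands in for the Pod IP.

```python
def probe_url(scheme: str, host: str, port: int, path: str) -> str:
    """Build the URL an HTTP probe would request from the httpGet fields."""
    if not 1 <= port <= 65535:
        raise ValueError("port must be in the range 1 to 65535")
    return f"{scheme.lower()}://{host}:{port}{path}"


def is_probe_success(status_code: int) -> bool:
    """An HTTP probe succeeds when the status code is >= 200 and < 400."""
    return 200 <= status_code < 400


# Fields from the startupProbe in this post; the host is an assumed stand-in.
print(probe_url("HTTP", "127.0.0.1", 9036, "/mgm/health"))
print(is_probe_success(200), is_probe_success(302), is_probe_success(503))
```

Note that redirects (3xx) count as success under this rule, while any 4xx or 5xx response counts as a failure.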
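1.2 Defining hooks in lifecycle
Hooks let a container react to events in its own lifecycle by running user-specified code when the corresponding moment arrives. Kubernetes provides two hooks for the main container: one after it starts and one before it stops.
- postStart: runs right after the container is created. If the handler fails, the container is killed and restarted according to its restart policy.
- preStop: runs before the container is terminated. Deleting the container is blocked until the hook finishes (or the termination grace period runs out), after which the container is stopped.
A hook handler can define its action in one of the following three ways:
Exec: run a command once inside the container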
lifecycle:
  postStart:
    exec:
      command:
        - cat
        - /tmp/healthy
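TCPSocket: try to open a connection to the specified socket on the current container. Note that the Kubernetes API reference marks tcpSocket as deprecated and not supported for lifecycle handlers, so exec or httpGet is the safer choice for hooks.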
lifecycle:
  postStart:
    tcpSocket:
      port: 8080
HttpGet: send an HTTP request to a URL on the current container
lifecycle:
  postStart:
    httpGet:
      path: # URI path
      port:
      host:
      scheme: HTTP # supported protocol, HTTP or HTTPS
preStop hook
lifecycle:
  preStop:
    exec:
      command:
        - /bin/sh
        - '-c'
        - >-
          wget http://127.0.0.1:54199/offline 2>/tmp/null;sleep 45 &&
          /opt/xxx/wrong-answer-service/bin/do_stop.sh
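Time spent in the preStop hook counts against the Pod's terminationGracePeriodSeconds, so the 45-second sleep plus the stop script must fit inside that window or the container is force-killed. The sketch below checks that budget. The grace period of 120 seconds comes from the complete manifest at the end of this post; the 30-second estimate for do_stop.sh is purely an assumption for illustration, and the function name `remaining_grace_seconds` is hypothetical.

```python
def remaining_grace_seconds(grace_period: int, pre_stop_sleep: int,
                            stop_script_estimate: int) -> int:
    """Seconds left in the termination grace period after the preStop hook.

    A negative result means the hook would be cut off by a forced kill.
    """
    return grace_period - pre_stop_sleep - stop_script_estimate


# 120 and 45 come from this post; 30 is an assumed duration for do_stop.sh.
slack = remaining_grace_seconds(grace_period=120, pre_stop_sleep=45,
                                stop_script_estimate=30)
print(slack)  # 45
```

If the stop script can take longer than the remaining slack, either shorten the sleep or raise terminationGracePeriodSeconds.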
2. ARMS data collection
Do not configure these settings under "Global configuration". The version this was verified with is arms-bootstrap-1.7.0-SNAPSHOT.jar.
data:image/s3,"s3://crabby-images/85383/85383a16f44d36252b8863f934058d41e66c5eaf" alt=""
2.1 Sampling rate
data:image/s3,"s3://crabby-images/a6893/a6893800f851d11e719817dd4de1284998d327a0" alt=""
2.2 Excluding some endpoints from collection
Recommended endpoints to ignore: append //mgm/health and //mgm/prometheus to the default list.
data:image/s3,"s3://crabby-images/cc2c7/cc2c797ff933cfb594ed425e5571084ed394a14f" alt=""
Complete YAML example
apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
    deployment.kubernetes.io/revision: '3'
  creationTimestamp: '2022-09-28T05:24:37Z'
  generation: 3
  labels:
    app: wrong-answer-service
    group: xxx
  managedFields:
    - apiVersion: apps/v1
      fieldsType: FieldsV1
      fieldsV1:
        'f:metadata':
          'f:labels':
            .: {}
            'f:app': {}
        'f:spec':
          'f:progressDeadlineSeconds': {}
          'f:replicas': {}
          'f:revisionHistoryLimit': {}
          'f:selector': {}
          'f:strategy':
            'f:rollingUpdate':
              .: {}
              'f:maxSurge': {}
              'f:maxUnavailable': {}
            'f:type': {}
          'f:template':
            'f:metadata':
              'f:labels':
                .: {}
                'f:app': {}
                'f:armsPilotAutoEnable': {}
                'f:armsPilotCreateAppName': {}
            'f:spec':
              'f:containers':
                'k:{"name":"wrong-answer-service"}':
                  .: {}
                  'f:env':
                    .: {}
                    'k:{"name":"TZ"}':
                      .: {}
                      'f:name': {}
                      'f:value': {}
                    'k:{"name":"aliyun_logs_wrong-answer-service"}':
                      .: {}
                      'f:name': {}
                      'f:value': {}
                  'f:image': {}
                  'f:imagePullPolicy': {}
                  'f:lifecycle':
                    .: {}
                    'f:preStop':
                      .: {}
                      'f:exec':
                        .: {}
                        'f:command': {}
                  'f:name': {}
                  'f:ports':
                    .: {}
                    'k:{"containerPort":9036,"protocol":"TCP"}':
                      .: {}
                      'f:containerPort': {}
                      'f:protocol': {}
                  'f:readinessProbe':
                    .: {}
                    'f:failureThreshold': {}
                    'f:httpGet':
                      .: {}
                      'f:path': {}
                      'f:port': {}
                      'f:scheme': {}
                    'f:initialDelaySeconds': {}
                    'f:periodSeconds': {}
                    'f:successThreshold': {}
                    'f:timeoutSeconds': {}
                  'f:resources':
                    .: {}
                    'f:limits':
                      .: {}
                      'f:cpu': {}
                      'f:memory': {}
                    'f:requests':
                      .: {}
                      'f:cpu': {}
                      'f:memory': {}
                  'f:startupProbe':
                    .: {}
                    'f:failureThreshold': {}
                    'f:httpGet':
                      .: {}
                      'f:path': {}
                      'f:port': {}
                      'f:scheme': {}
                    'f:initialDelaySeconds': {}
                    'f:periodSeconds': {}
                    'f:successThreshold': {}
                    'f:timeoutSeconds': {}
                  'f:terminationMessagePath': {}
                  'f:terminationMessagePolicy': {}
                  'f:volumeMounts':
                    .: {}
                    'k:{"mountPath":"/etc/localtime"}':
                      .: {}
                      'f:mountPath': {}
                      'f:name': {}
                    'k:{"mountPath":"/opt/xxx/logs/xxljob-log/"}':
                      .: {}
                      'f:mountPath': {}
                      'f:name': {}
                      'f:subPath': {}
                    'k:{"mountPath":"/opt/xxx/wrong-answer-service/resources/"}':
                      .: {}
                      'f:mountPath': {}
                      'f:name': {}
                      'f:subPath': {}
              'f:dnsPolicy': {}
              'f:nodeSelector': {}
              'f:restartPolicy': {}
              'f:schedulerName': {}
              'f:securityContext': {}
              'f:terminationGracePeriodSeconds': {}
              'f:volumes':
                .: {}
                'k:{"name":"volume-localtime"}':
                  .: {}
                  'f:hostPath':
                    .: {}
                    'f:path': {}
                    'f:type': {}
                  'f:name': {}
                'k:{"name":"volume-resources"}':
                  .: {}
                  'f:name': {}
                  'f:persistentVolumeClaim':
                    .: {}
                    'f:claimName': {}
                'k:{"name":"volume-xxljob"}':
                  .: {}
                  'f:name': {}
                  'f:persistentVolumeClaim':
                    .: {}
                    'f:claimName': {}
      manager: python-requests
      operation: Update
      time: '2022-09-28T05:24:37Z'
    - apiVersion: apps/v1
      fieldsType: FieldsV1
      fieldsV1:
        'f:metadata':
          'f:labels':
            'f:group': {}
        'f:spec':
          'f:template':
            'f:metadata':
              'f:annotations':
                .: {}
                'f:redeploy-timestamp': {}
      manager: ACK-Console Apache-HttpClient
      operation: Update
      time: '2022-09-28T07:10:46Z'
    - apiVersion: apps/v1
      fieldsType: FieldsV1
      fieldsV1:
        'f:metadata':
          'f:annotations':
            .: {}
            'f:deployment.kubernetes.io/revision': {}
        'f:status':
          'f:availableReplicas': {}
          'f:conditions':
            .: {}
            'k:{"type":"Available"}':
              .: {}
              'f:lastTransitionTime': {}
              'f:lastUpdateTime': {}
              'f:message': {}
              'f:reason': {}
              'f:status': {}
              'f:type': {}
            'k:{"type":"Progressing"}':
              .: {}
              'f:lastTransitionTime': {}
              'f:lastUpdateTime': {}
              'f:message': {}
              'f:reason': {}
              'f:status': {}
              'f:type': {}
          'f:observedGeneration': {}
          'f:readyReplicas': {}
          'f:replicas': {}
          'f:updatedReplicas': {}
      manager: kube-controller-manager
      operation: Update
      subresource: status
      time: '2022-09-28T07:30:46Z'
  name: wrong-answer-service
  namespace: java-service
  resourceVersion: '28026688'
  uid: 2f898c1b-b5c1-4c31-a1ab-f5e27788a433
spec:
  progressDeadlineSeconds: 600
  replicas: 2
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: wrong-answer-service
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      annotations:
        redeploy-timestamp: '1664350112420'
      labels:
        app: wrong-answer-service
        armsPilotAutoEnable: 'on'
        armsPilotCreateAppName: wrong-answer-service
    spec:
      containers:
        # Environment variables, in particular the time zone
        - env:
            - name: aliyun_logs_wrong-answer-service
              value: stdout
            - name: TZ
              value: Asia/Shanghai
          # Docker image
          image: >-
            xxx-harbor-registry.cn-hangzhou.cr.aliyuncs.com/xxx-zty/wrong-answer-service:1.0.16
          imagePullPolicy: Always
          lifecycle:
            preStop:
              exec:
                command:
                  - /bin/sh
                  - '-c'
                  - >-
                    wget http://127.0.0.1:54199/offline 2>/tmp/null;sleep 45 &&
                    /opt/xxx/wrong-answer-service/bin/do_stop.sh
          name: wrong-answer-service
          ports:
            - containerPort: 9036
              protocol: TCP
          readinessProbe:
            failureThreshold: 3
            httpGet:
              path: /mgm/health
              port: 9036
              scheme: HTTP
            initialDelaySeconds: 1
            periodSeconds: 5
            successThreshold: 1
            timeoutSeconds: 3
          resources:
            limits:
              cpu: '2'
              memory: 2Gi
            requests:
              cpu: 250m
              memory: 1717986918400m
          startupProbe:
            failureThreshold: 22
            httpGet:
              path: /mgm/health
              port: 9036
              scheme: HTTP
            initialDelaySeconds: 25
            periodSeconds: 10
            successThreshold: 1
            timeoutSeconds: 5
          terminationMessagePath: /dev/termination-log
          terminationMessagePolicy: File
          # Mounted volumes
          volumeMounts:
            - mountPath: /etc/localtime
              name: volume-localtime
            - mountPath: /opt/xxx/logs/xxljob-log/
              name: volume-xxljob
              subPath: wrong-answer-service
            - mountPath: /opt/xxx/wrong-answer-service/resources/
              name: volume-resources
              subPath: wrong-answer-service
      dnsPolicy: ClusterFirst
      nodeSelector:
        pod: normal
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 120
      volumes:
        - hostPath:
            path: /etc/localtime
            type: ''
          name: volume-localtime
        - name: volume-xxljob
          persistentVolumeClaim:
            claimName: xxljob
        - name: volume-resources
          persistentVolumeClaim:
            claimName: resources
status:
  availableReplicas: 2
  conditions:
    - lastTransitionTime: '2022-09-28T05:30:49Z'
      lastUpdateTime: '2022-09-28T05:30:49Z'
      message: Deployment has minimum availability.
      reason: MinimumReplicasAvailable
      status: 'True'
      type: Available
    - lastTransitionTime: '2022-09-28T05:24:37Z'
      lastUpdateTime: '2022-09-28T07:30:46Z'
      message: >-
        ReplicaSet "wrong-answer-service-7f7957f69d" has successfully
        progressed.
      reason: NewReplicaSetAvailable
      status: 'True'
      type: Progressing
  observedGeneration: 3
  readyReplicas: 2
  replicas: 2
  updatedReplicas: 2