1. Temporary workaround for an HA K8S cluster when hardware resources are insufficient
With insufficient resources, keep only two master nodes running:
⇒ Because the etcd cluster is deployed across the three masters, at least two masters must stay up to preserve etcd's minimum working quorum;
⇒ Although masters normally do not run business Pods in a production environment, under resource shortage we let the master nodes run business Pods directly.
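The quorum arithmetic behind "at least two of three masters" can be sketched in shell (the member count mirrors this cluster's three-member etcd):

```shell
# A cluster of n etcd members stays writable only while a majority
# (floor(n/2) + 1) of members is reachable.
n=3                          # etcd members in this cluster
quorum=$(( n / 2 + 1 ))      # minimum members that must stay up
tolerated=$(( n - quorum ))  # masters we may power off safely
echo "members=$n quorum=$quorum tolerated_failures=$tolerated"
# prints: members=3 quorum=2 tolerated_failures=1
```

So with three members, exactly one master (here k8s-master03) can be powered off; losing a second would make etcd read-only.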
Hostname | Role | IP | OS / Versions / Resources | Power state
---|---|---|---|---
k8s-master01 | K8S cluster --master | 172.26.37.121 | OS: AlmaLinux release 8.6; K8S: v1.23.8; 2C4G | On
k8s-master02 | K8S cluster --master | 172.26.37.122 | OS: AlmaLinux release 8.6; K8S: v1.23.8; 2C4G | On
k8s-master03 | K8S cluster --master | 172.26.37.123 | OS: AlmaLinux release 8.6; K8S: v1.23.8; 2C4G | Usually off
k8s-node01 | K8S cluster --node | 172.26.37.124 | OS: AlmaLinux release 8.6; K8S: v1.23.8; 2C4G | Usually off
k8s-node02 | K8S cluster --node | 172.26.37.125 | OS: AlmaLinux release 8.6; K8S: v1.23.8; 2C4G | Usually off
k8s-master-lb | K8S cluster --master-LB | 172.26.37.126 | - | -
Check node status: only the two master nodes are running.
# kubectl get nodes
NAME STATUS ROLES AGE VERSION
k8s-master01 Ready <none> 276d v1.23.8
k8s-master02 Ready <none> 276d v1.23.8
k8s-master03 NotReady <none> 276d v1.23.8
k8s-node01 NotReady <none> 276d v1.23.8
k8s-node02 NotReady <none> 276d v1.23.8
Confirm the etcd cluster's working state (the error for 172.26.37.123 is expected, since k8s-master03 is powered off):
# export ETCDCTL_API=3
# etcdctl --endpoints="172.26.37.123:2379,172.26.37.122:2379,172.26.37.121:2379" --cacert=/etc/kubernetes/pki/etcd/etcd-ca.pem --cert=/etc/kubernetes/pki/etcd/etcd.pem --key=/etc/kubernetes/pki/etcd/etcd-key.pem endpoint status --write-out=table
{"level":"warn","ts":"2023-03-23T15:17:52.600+0800","logger":"etcd-client","caller":"v3/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc00041e540/172.26.37.123:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = latest balancer error: last connection error: connection error: desc = \"transport: Error while dialing dial tcp 172.26.37.123:2379: connect: no route to host\""}
Failed to get the status of endpoint 172.26.37.123:2379 (context deadline exceeded)
+--------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| ENDPOINT | ID | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
+--------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| 172.26.37.122:2379 | c79a1101ab7dd89c | 3.5.1 | 6.4 MB | true | false | 49 | 129359 | 129359 | |
| 172.26.37.121:2379 | 7ee2e2811cb6a7f9 | 3.5.1 | 6.4 MB | false | false | 49 | 129359 | 129359 | |
+--------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
Check the taints on the two running master nodes:
# kubectl describe node k8s-master01|grep Taints
Taints: node-role.kubernetes.io/master:NoSchedule
# kubectl describe node k8s-master02|grep Taints
Taints: node-role.kubernetes.io/master:NoSchedule
Kubernetes taint effects:
- PreferNoSchedule: Kubernetes tries to avoid scheduling Pods onto a Node with this taint, unless no other Node is schedulable
- NoSchedule: Kubernetes will not schedule Pods onto a Node with this taint, but Pods already running on the Node are unaffected
- NoExecute: Kubernetes will not schedule Pods onto a Node with this taint, and also evicts Pods already running on it
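An alternative to removing a node's taint is giving a specific Pod a matching toleration, which leaves the taint in place for everything else. A minimal sketch (the Pod name here is hypothetical) that tolerates the master NoSchedule taint:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: tolerant-demo        # hypothetical name
spec:
  tolerations:
  - key: node-role.kubernetes.io/master
    operator: Exists         # match the taint regardless of its value
    effect: NoSchedule
  containers:
  - name: app
    image: busybox:1.28
    command: ["sleep", "3600"]
```

This scopes the exception to one workload instead of opening the whole master to scheduling.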
Make k8s-master02 schedulable by removing its master taint:
# kubectl taint nodes k8s-master02 node-role.kubernetes.io/master=:NoSchedule-
node/k8s-master02 untainted
# kubectl describe node k8s-master02|grep Taints
Taints: <none>
Taint syntax: kubectl taint node [node] key=value:[effect]
[effect] can be: [ NoSchedule | PreferNoSchedule | NoExecute ]
NoSchedule: Pods will never be scheduled onto the Node
PreferNoSchedule: the scheduler tries to avoid the Node
NoExecute: new Pods are not scheduled, and Pods already on the Node are evicted
Examples:
Check a node's taints:
# kubectl describe nodes k8s-master02 |grep Taints
Add a taint:
# kubectl taint nodes k8s-master01 node-role.kubernetes.io/master=:NoSchedule
Remove a taint (append a `-` after the effect):
# kubectl taint nodes k8s-master01 node-role.kubernetes.io/master=:NoSchedule-
Add a role label to a node:
# kubectl label nodes k8s-master01 node-role.kubernetes.io/node=
Remove a role label (append a `-` after the label key):
# kubectl label nodes k8s-master01 node-role.kubernetes.io/node-
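Such a role label can then be used for scheduling. As an illustration (the Pod name is hypothetical; the label is the empty-valued one added above), a `nodeSelector` restricts a Pod to labeled nodes:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: selector-demo        # hypothetical name
spec:
  nodeSelector:
    node-role.kubernetes.io/node: ""   # empty value, matching the label above
  containers:
  - name: app
    image: busybox:1.28
    command: ["sleep", "3600"]
```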
2. Verify the K8S cluster is still functional
Deploy a busybox test Pod:
# cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: busybox
  namespace: default
spec:
  containers:
  - name: busybox
    image: busybox:1.28
    command:
    - sleep
    - "3600"
    imagePullPolicy: IfNotPresent
  restartPolicy: Always
EOF
The Pod is scheduled onto the k8s-master02 node:
# kubectl get po -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
busybox 1/1 Running 0 76s 172.36.122.144 k8s-master02 <none> <none>
Exec into the container and verify DNS and network connectivity:
# kubectl exec -it busybox -- /bin/sh
/ # nslookup kubernetes
Server: 192.168.0.10
Address 1: 192.168.0.10 kube-dns.kube-system.svc.cluster.local
Name: kubernetes
Address 1: 192.168.0.1 kubernetes.default.svc.cluster.local
/ # nslookup kube-dns.kube-system
Server: 192.168.0.10
Address 1: 192.168.0.10 kube-dns.kube-system.svc.cluster.local
Name: kube-dns.kube-system
Address 1: 192.168.0.10 kube-dns.kube-system.svc.cluster.local
/ # nslookup www.baidu.com
Server: 192.168.0.10
Address 1: 192.168.0.10 kube-dns.kube-system.svc.cluster.local
Name: www.baidu.com
Address 1: 14.119.104.189
Address 2: 14.215.177.38
/ #