Overview
Every Kubernetes node runs the services needed to host application containers, and all of them are controlled by the Master. Each node must of course run Docker, which takes care of downloading images and running containers. Kubernetes consists mainly of the following core components:
- etcd stores the state of the entire cluster;
- apiserver is the single entry point for operating on resources and provides authentication, authorization, access control, API registration and discovery;
- controller manager maintains the cluster state, e.g. failure detection, automatic scaling and rolling updates;
- scheduler handles resource scheduling, placing Pods onto suitable machines according to the configured scheduling policy;
- kubelet maintains the container lifecycle and also manages volumes (CVI) and networking (CNI);
- Container runtime is responsible for image management and for actually running Pods and containers (CRI);
- kube-proxy provides in-cluster service discovery and load balancing for Services;
- A Node's job is to run application containers. Nodes are managed by the Master: each Node monitors and reports the status of its containers and manages their lifecycle according to the Master's instructions. A Node runs Linux and can be a physical machine or a virtual machine.
- A Pod is the smallest unit of work in Kubernetes. Each Pod contains one or more containers, and the containers in a Pod are scheduled to a Node by the Master as a single unit. With Docker as the runtime, each of those containers runs as a Docker container.
Besides the core components, several add-ons are recommended:
- kube-dns provides DNS for the whole cluster
- Ingress Controller provides an external entry point for services
- Dashboard provides a GUI
- Federation provides clusters spanning availability zones
- Fluentd-elasticsearch provides cluster log collection, storage and querying
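Once the cluster described below is running, a quick way to see most of these components in one place is to list the Pods in the kube-system namespace. This is a minimal check of my own, not part of the original walkthrough; component names vary with the version and the add-ons you install.
# Control-plane components, kube-proxy, CoreDNS and the network add-on run as Pods in kube-system
kubectl get pods -n kube-system -o wide
# kubelet and the container runtime run as host services rather than Pods
systemctl status kubelet docker --no-pager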
Tools and versions
Tool | Version |
---|---|
docker | 18.03.1.ce-1.el7.centos |
centos | 7.x |
Kubernetes | v1.18.0 |
kubeadm、kubelet、kubectl | 1.18.3-0 |
quay.io/coreos/flannel | v0.14.0 |
kubernetesui/dashboard | v2.0.0-rc7 |
registry.aliyuncs.com/google_containers/etcd | 3.4.3-0 |
k8s.gcr.io/coredns | 1.6.7 |
k8s.gcr.io/pause | 3.2 |
Environment
Role | Hostname | IP |
---|---|---|
Master, etcd, registry | k8s-master | 192.168.3.130 |
node1 | k8s-node1 | 192.168.3.31 |
node2 | k8s-node2 | 192.168.3.32 |
hostnamectl set-hostname k8s-master
hostnamectl set-hostname k8s-node1
hostnamectl set-hostname k8s-node2
Adjust /etc/hosts:
# On the Master and the node machines:
192.168.x.130 k8s-master
192.168.x.31 k8s-node1
192.168.x.32 k8s-node2
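A small sanity check, assuming the hosts entries above are in place on every machine, to confirm each hostname resolves and answers:
for h in k8s-master k8s-node1 k8s-node2; do
  ping -c 1 "$h" > /dev/null && echo "$h reachable" || echo "$h FAILED"
done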
Installation
Install docker, kubeadm, kubelet, kubectl and related tools on all three servers.
docker
#!/bin/sh
# Disable the firewall
systemctl stop firewalld
systemctl disable firewalld
# Disable SELinux
sed -i 's/enforcing/disabled/' /etc/selinux/config
setenforce 0
## Disable swap
swapoff -a
sed -i 's/.*swap.*/#&/' /etc/fstab
# Install docker dependencies, the docker yum repo and docker itself
yum install -y yum-utils device-mapper-persistent-data lvm2
yum-config-manager --add-repo http://mirrors.aliyun.com/docker-ce/linux/centos/docker-ce.repo
yum makecache fast
#yum install -y docker-ce-19.03.13 docker-ce-cli-19.03.13
yum install docker-ce-18.03.1.ce-1.el7.centos -y
mkdir -p /etc/docker
cat > /etc/docker/daemon.json << "EOF"
{
"registry-mirrors": ["https://zfzbet67.mirror.aliyuncs.com"],
"data-root": "/data/docker",
"insecure-registries":[
"192.168.x.234:8089"
]
}
EOF
systemctl daemon-reload
systemctl enable docker.service
systemctl restart docker
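After the restart it is worth confirming that Docker picked up daemon.json; a quick sketch using the values configured above:
# Expect the aliyun registry mirror, the insecure registry and /data/docker as the data root
docker info | grep -A 3 -E 'Docker Root Dir|Registry Mirrors|Insecure Registries'
# Docker should be enabled to start at boot
systemctl is-enabled docker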
IPVS kernel adjustments
#!/bin/sh
##
echo "net.ipv4.ip_forward=1" >> /etc/sysctl.conf
sysctl -p
cat > /etc/sysconfig/modules/ipvs.modules <<EOF
#!/bin/bash
ipvs_modules="ip_vs ip_vs_lc ip_vs_wlc ip_vs_rr ip_vs_wrr ip_vs_lblc ip_vs_lblcr ip_vs_dh ip_vs_sh ip_vs_fo ip_vs_nq ip_vs_sed ip_vs_ftp nf_conntrack_ipv4"
for kernel_module in \${ipvs_modules}; do
/sbin/modinfo -F filename \${kernel_module} > /dev/null 2>&1
if [ \$? -eq 0 ]; then
/sbin/modprobe \${kernel_module}
fi
done
EOF
chmod 755 /etc/sysconfig/modules/ipvs.modules && bash /etc/sysconfig/modules/ipvs.modules && lsmod | grep ip_vs
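kubeadm's preflight checks also expect bridged traffic to be visible to iptables. If your base image does not already set this, the following is a commonly needed addition (a sketch; br_netfilter is available as a module on stock CentOS 7 kernels):
modprobe br_netfilter
cat >> /etc/sysctl.conf << "EOF"
net.bridge.bridge-nf-call-iptables = 1
net.bridge.bridge-nf-call-ip6tables = 1
EOF
sysctl -p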
Install kubeadm, kubelet and kubectl
#!/bin/sh
# Configure the yum repo
cat > /etc/yum.repos.d/kubernetes.repo << "EOF"
[kubernetes]
name=Kubernetes Repository
baseurl=https://mirrors.aliyun.com/kubernetes/yum/repos/kubernetes-el7-x86_64/
gpgcheck=1
gpgkey=https://mirrors.aliyun.com/kubernetes/yum/doc/rpm-package-key.gpg https://mirrors.aliyun.com/kubernetes/yum/doc/yum-key.gpg
EOF
yum install -y kubeadm-1.18.3-0 kubelet-1.18.3-0 kubectl-1.18.3-0
systemctl start kubelet && systemctl enable kubelet && systemctl status kubelet
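A quick sketch to confirm the expected versions landed before moving on:
kubeadm version -o short
kubelet --version
kubectl version --client --short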
Master installation
Build the cluster with kubeadm.
# Change to the home directory
cd ~
# Export the default configuration
kubeadm config print init-defaults --kubeconfig ClusterConfiguration > kubeadm.yml
vi kubeadm.yml
# Replace imageRepository: k8s.gcr.io with the following
imageRepository: registry.aliyuncs.com/google_containers
# List the images the cluster needs
kubeadm config images list --config kubeadm.yml
# Pull the images required by the cluster
kubeadm config images pull --config kubeadm.yml
# After pulling, check the local image list
docker images | grep registry.aliyuncs.com
kubeadm init --kubernetes-version=1.18.2 \
--apiserver-advertise-address=192.168.x.130 \
--image-repository=registry.aliyuncs.com/google_containers \
--pod-network-cidr=10.222.0.0/16 \
--ignore-preflight-errors=NumCPU
# As prompted, run the following commands
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
# When initialization completes, it prints a join command like:
kubeadm join 192.168.x.130:6443 --token yey0s3.khwqa73ykgh0fmvz \
--discovery-token-ca-cert-hash sha256:606aa551fae0fba4ed1c61ca441a9f7493e9d42f3357037881dc01e8e39b4b96
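Tokens expire after 24 hours by default. If the join command above is lost or the token has expired, a fresh one can be generated on the master (a sketch):
# List existing tokens and their expiry
kubeadm token list
# Print a complete, ready-to-use worker join command with a new token
kubeadm token create --print-join-command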
Install the flannel network plugin
wget https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
# You may find that raw.githubusercontent.com cannot be resolved and the download fails with an error
# Workaround: open the address below and look up the server IP for raw.githubusercontent.com
https://site.ip138.com/raw.githubusercontent.com/
# Add the resolution to /etc/hosts (I picked the Hong Kong IP here)
151.101.76.133 raw.githubusercontent.com
# Fetch the flannel yaml again
wget https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
# Once downloaded, apply it
kubectl apply -f kube-flannel.yml
# List all pods
kubectl get pods --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system coredns-7ff77c879f-pw7tl 0/1 Pending 0 44m
kube-system coredns-7ff77c879f-zhzvz 0/1 Pending 0 44m
kube-system etcd-node1 1/1 Running 0 44m
kube-system kube-apiserver-node1 1/1 Running 0 44m
kube-system kube-controller-manager-node1 1/1 Running 0 44m
kube-system kube-flannel-ds-amd64-58j6d 0/1 Init:0/1 0 22s
kube-system kube-proxy-mnvfl 1/1 Running 0 44m
kube-system kube-scheduler-node1 1/1 Running 0 44m
# Some pods will show Pending while they are still being set up
# Once every STATUS is Running, check the node status again
kubectl get nodes
NAME STATUS ROLES AGE VERSION
node1 Ready master 49m v1.18.2
# At this point the current node, node1 (the master), has finished initializing
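Before joining the workers, it can help to confirm the flannel DaemonSet and CoreDNS are healthy (a sketch; the DaemonSet name matches the kube-flannel.yml applied above):
kubectl -n kube-system get ds kube-flannel-ds-amd64
kubectl -n kube-system get pods -l k8s-app=kube-dns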
Install kubernetes-dashboard
# Download the dashboard yaml
wget https://raw.githubusercontent.com/kubernetes/dashboard/v2.0.0-rc7/aio/deploy/recommended.yaml
# By default kubernetes-dashboard is only reachable from inside the cluster, so edit the yaml to expose the port on the host
vim recommended.yaml
# Add the following field under the kubernetes-dashboard Service in recommended.yaml
type: NodePort
# Install the dashboard
kubectl apply -f recommended.yaml
# Check which host port kubernetes-dashboard is mapped to
kubectl get svc -n kubernetes-dashboard
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
dashboard-metrics-scraper ClusterIP 10.106.200.77 <none> 8000/TCP 10m
kubernetes-dashboard NodePort 10.100.186.120 <none> 443:31178/TCP 10m
# Open the dashboard in a browser (Firefox here) and choose Token login
https://192.168.x.130:31178
# Get a login token for kubernetes-dashboard
kubectl -n kube-system describe $(kubectl get secret -n kube-system -o name | grep namespace) | grep token:
# Log in with the token you obtained
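The token fetched above belongs to a kube-system service account and may have limited rights inside the dashboard. A common alternative, sketched here with a made-up account name (dashboard-admin is my choice, not part of the original setup), is to create a dedicated admin account and log in with its token:
kubectl create serviceaccount dashboard-admin -n kubernetes-dashboard
kubectl create clusterrolebinding dashboard-admin \
  --clusterrole=cluster-admin \
  --serviceaccount=kubernetes-dashboard:dashboard-admin
# Print the new account's token
kubectl -n kubernetes-dashboard describe secret \
  $(kubectl -n kubernetes-dashboard get secret -o name | grep dashboard-admin) | grep token: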
Worker installation
Join the worker nodes to the cluster:
kubeadm join 192.168.x.130:6443 --token yey0s3.khwqa73ykgh0fmvz --discovery-token-ca-cert-hash sha256:606aa551fae0fba4ed1c61ca441a9f7493e9d42f3357037881dc01e8e39b4b96
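Back on the master, confirm the workers registered; a new node usually shows NotReady until flannel starts on it. Optionally label them so ROLES no longer shows <none> (a sketch):
kubectl get nodes -o wide
kubectl label node k8s-node1 node-role.kubernetes.io/worker=
kubectl label node k8s-node2 node-role.kubernetes.io/worker=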
High-availability deployment
Deploy as shown in the architecture diagram above: add two more master nodes and run keepalived + haproxy on all three masters for high availability.
Environment
Role | Hostname | IP |
---|---|---|
Master, etcd, registry, keepalived, haproxy | k8s-master | 192.168.3.130 |
Master, etcd, registry | k8s-master-1 | 192.168.3.151 |
Master, etcd, registry | k8s-master-2 | 192.168.3.77 |
VIP | k8s-vip | 192.168.3.138 |
Set the hostnames and /etc/hosts. The VIP is referenced by IP here rather than by domain name; using a domain name consistently is recommended.
hostnamectl set-hostname k8s-master-1
hostnamectl set-hostname k8s-master-2
# /etc/hosts configuration:
192.168.x.130 k8s-master
192.168.x.151 k8s-master-1
192.168.x.77 k8s-master-2
192.168.x.31 k8s-node1
192.168.x.32 k8s-node2
Installation
Install keepalived and haproxy
- Install keepalived and haproxy
yum install keepalived haproxy -y
- Configure keepalived
On each node, substitute that node's hostname, IP, priority (100 on the primary, 50 on the backups) and network interface.
cat <<EOF > /etc/keepalived/keepalived.conf
global_defs {
script_user root
enable_script_security
}
vrrp_script check_apiserver {
script "/etc/keepalived/check_apiserver.sh"
interval 3
weight 2
}
vrrp_instance VI_1 {
state MASTER
interface ens192
virtual_router_id 161
priority 100
authentication {
auth_type PASS
auth_pass nice
}
unicast_src_ip 192.168.x.130
unicast_peer {
192.168.x.77
192.168.x.151
}
virtual_ipaddress {
192.168.x.138
}
track_script {
check_apiserver
}
}
EOF
- Configure the keepalived health check and make it executable with chmod +x check_apiserver.sh.
vi /etc/keepalived/check_apiserver.sh
### Add the following content
#!/bin/sh
if [ $(ps -C haproxy --no-header | wc -l) -eq 0 ];then
systemctl stop keepalived
fi
- Configure haproxy
# /etc/haproxy/haproxy.cfg
#---------------------------------------------------------------------
# Global settings
#---------------------------------------------------------------------
global
log /dev/log local0
log /dev/log local1 notice
daemon
#---------------------------------------------------------------------
# common defaults that all the 'listen' and 'backend' sections will
# use if not designated in their block
#---------------------------------------------------------------------
defaults
mode http
log global
option httplog
option dontlognull
option http-server-close
option forwardfor except 127.0.0.0/8
option redispatch
retries 1
timeout http-request 10s
timeout queue 20s
timeout connect 5s
timeout client 20s
timeout server 20s
timeout http-keep-alive 10s
timeout check 10s
#---------------------------------------------------------------------
# apiserver frontend which proxys to the masters
#---------------------------------------------------------------------
frontend apiserver
bind *:16443
mode tcp
option tcplog
default_backend apiserver
#---------------------------------------------------------------------
# round robin balancing for apiserver
#---------------------------------------------------------------------
backend apiserver
option httpchk GET /healthz
http-check expect status 200
mode tcp
option ssl-hello-chk
balance roundrobin
server k8s-master 192.168.x.130:6443 check
server k8s-master-1 192.168.x.151:6443 check
server k8s-master-2 192.168.x.77:6443 check
- After the configuration above is complete, start keepalived and haproxy and enable them to start automatically.
systemctl enable haproxy --now
systemctl enable keepalived --now
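A quick check that the load-balancer layer is healthy (a sketch; ens192 and the VIP are the values used in the keepalived configuration above):
systemctl is-active haproxy keepalived
# The VIP should be present only on the node currently holding MASTER state
ip addr show ens192 | grep 192.168.x.138
# haproxy should be listening on the apiserver frontend port on every node
ss -lntp | grep 16443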
Master node installation
Tear down the previous master node first, then rebuild the cluster with kubeadm. Compared with the non-HA installation, the adjustments are as follows. Run the following command on one of the masters first:
kubeadm init --control-plane-endpoint "192.168.x.138:16443" \
--kubernetes-version=1.18.2 \
--image-repository=registry.aliyuncs.com/google_containers \
--pod-network-cidr=10.222.0.0/16 \
--service-cidr=10.1.0.0/16 \
--upload-certs \
--ignore-preflight-errors=NumCPU | tee kubeadm-init.log
Output log:
Your Kubernetes control-plane has initialized successfully!
To start using your cluster, you need to run the following as a regular user:
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
https://kubernetes.io/docs/concepts/cluster-administration/addons/
You can now join any number of the control-plane node running the following command on each as root:
kubeadm join 192.168.x.138:16443 --token hflxeu.7kbw3ayaqb9r6mi7 \
--discovery-token-ca-cert-hash sha256:1b87cc4635ed2005d2d2780cc0fdc779bcfcd38a359c8783d70388ce123c59a0 \
--control-plane --certificate-key 62fa89c160a01997ef9388d6bc182abf09da2ce9c2ce82ce3b4d83621227ebfa
Please note that the certificate-key gives access to cluster sensitive data, keep it secret!
As a safeguard, uploaded-certs will be deleted in two hours; If necessary, you can use
"kubeadm init phase upload-certs --upload-certs" to reload certs afterward.
Then you can join any number of worker nodes by running the following on each as root:
kubeadm join 192.168.x.138:16443 --token hflxeu.7kbw3ayaqb9r6mi7 \
--discovery-token-ca-cert-hash sha256:1b87cc4635ed2005d2d2780cc0fdc779bcfcd38a359c8783d70388ce123c59a0
Run the following on the other master nodes:
kubeadm join 192.168.x.138:16443 --token tb07he.ytnyrlvhlehxx5n6 \
--discovery-token-ca-cert-hash sha256:1b87cc4635ed2005d2d2780cc0fdc779bcfcd38a359c8783d70388ce123c59a0 \
--control-plane --certificate-key e8172057200f1ebfeec1c143b2652ae240d8d67aec2149c4bcbc68421f285123
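Once both extra masters have joined, it is worth confirming that all three control-plane nodes and etcd members are present (a sketch):
kubectl get nodes -o wide
# The static etcd Pods carry the label component=etcd
kubectl -n kube-system get pods -l component=etcd -o wide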
Worker node installation
Tear down the previous worker nodes first, then join them to the kubeadm cluster:
kubeadm join 192.168.x.138:16443 --token tb07he.ytnyrlvhlehxx5n6 \
--discovery-token-ca-cert-hash sha256:1b87cc4635ed2005d2d2780cc0fdc779bcfcd38a359c8783d70388ce123c59a0
Verification
- Check with the following commands:
[root@k8s-master ~]# kubectl get pods,svc --all-namespaces -o wide
[root@k8s-master ~]# kubectl get nodes -o wide
- To verify haproxy + keepalived failover, stop haproxy or keepalived on 192.168.x.130 and watch the virtual IP move to 192.168.x.151.
Other required components
NFS
# Pick any machine as the NFS server and install as follows:
root# yum install -y nfs-utils rpcbind
root# mkdir -p /data/work/nfs-share/
root# vi /etc/exports # add the following line
/data/work/nfs-share *(insecure,rw,async,no_root_squash)
root# systemctl start rpcbind
root# systemctl enable rpcbind
root# systemctl enable nfs && systemctl restart nfs
# Client installation; wrapping it in a script makes it easy to re-run:
#!/bin/sh
yum install -y nfs-utils rpcbind
mkdir -p /data/work/nfs-share/
mount -t nfs 192.168.x.31:/data/work/nfs-share/ /data/work/nfs-share/
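A sketch of verifying the export and the mount, using the example server IP from the client script above:
# List the exports published by the NFS server
showmount -e 192.168.x.31
# Confirm the share is mounted and writable
mount | grep nfs-share
touch /data/work/nfs-share/.nfs-write-test && echo "NFS share is writable"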
INGRESS
Here we choose traefik. Create traffic-ingress.yaml with the following content and apply it:
apiVersion: v1
kind: ServiceAccount
metadata:
name: traefik-ingress-controller
namespace: kube-system
---
kind: DaemonSet
apiVersion: apps/v1
metadata:
name: traefik-ingress-controller
namespace: kube-system
labels:
k8s-app: traefik-ingress-lb
spec:
selector:
matchLabels:
k8s-app: traefik-ingress-lb
template:
metadata:
annotations:
prometheus.io/path: /metrics
prometheus.io/port: "8080"
prometheus.io/scrape: "true"
labels:
k8s-app: traefik-ingress-lb
name: traefik-ingress-lb
spec:
serviceAccountName: traefik-ingress-controller
terminationGracePeriodSeconds: 60
hostNetwork: true
containers:
- image: traefik:1.7.24
name: traefik-ingress-lb
ports:
- name: http
containerPort: 80
hostPort: 80
- name: admin
containerPort: 8080
hostPort: 8080
- name: https
containerPort: 443
hostPort: 443
securityContext:
capabilities:
drop:
- ALL
add:
- NET_BIND_SERVICE
args:
- --api
- --kubernetes
- --logLevel=ERROR
- --metrics.prometheus
- --web
- --metrics
- --configFile=/etc/traefik/config.toml
volumeMounts:
- mountPath: /etc/traefik
name: config
volumes:
- configMap:
defaultMode: 420
name: traefik-conf
name: config
---
kind: Service
apiVersion: v1
metadata:
name: traefik-ingress-service
namespace: kube-system
spec:
selector:
k8s-app: traefik-ingress-lb
ports:
- protocol: TCP
port: 80
name: web
- protocol: TCP
port: 8080
name: admin
- protocol: TCP
port: 443
name: https
---
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
name: traefik-ingress-controller
rules:
- apiGroups:
- ""
resources:
- services
- endpoints
- secrets
verbs:
- get
- list
- watch
- apiGroups:
- extensions
resources:
- ingresses
verbs:
- get
- list
- watch
- apiGroups:
- extensions
resources:
- ingresses/status
verbs:
- update
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
name: traefik-ingress-controller
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: traefik-ingress-controller
subjects:
- kind: ServiceAccount
name: traefik-ingress-controller
namespace: kube-system
---
apiVersion: v1
kind: Service
metadata:
name: traefik-web-ui
namespace: kube-system
spec:
selector:
k8s-app: traefik-ingress-lb
ports:
- name: web
port: 80
targetPort: 8080
---
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
name: traefik-web-ui
namespace: kube-system
spec:
rules:
- host: traefik-ui.minikube
http:
paths:
- path: /
backend:
serviceName: traefik-web-ui
servicePort: web
---
apiVersion: v1
data:
config.toml: |
defaultEntryPoints = ["http","https"]
insecureSkipVerify = true
[entryPoints]
[entryPoints.http]
address = ":80"
[entryPoints.https]
address = ":443"
kind: ConfigMap
metadata:
name: traefik-conf
namespace: kube-system
[root@k8s-master ~]# kubectl apply -f traffic-ingress.yaml
After applying, the result can be checked as follows:
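A sketch of the checks (resource names match the manifest above):
kubectl -n kube-system get ds,svc,ingress | grep traefik
# Traefik binds hostPort 80 on each node; even a 404 here proves it is answering
curl -s -o /dev/null -w "%{http_code}\n" http://192.168.x.130/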
DASHBOARD
As described in the "Install kubernetes-dashboard" section above:
[root@k8s-master ~]# wget https://raw.githubusercontent.com/kubernetes/dashboard/v2.0.5/aio/deploy/recommended.yaml
[root@k8s-master ~]# kubectl apply -f recommended.yaml
Troubleshooting
1. Etcd error: error #0: dial tcp 192.168.x.130:2379
[root@localhost ~]# etcdctl -C http://etcd:2379 cluster-health
cluster may be unhealthy: failed to list members
Error: client: etcd cluster is unavailable or misconfigured; error #0: dial tcp 192.168.x.130:2379: connect: connection refused
error #0: dial tcp 192.168.x.130:2379: connect: connection refused
Solution:
Change ETCD_LISTEN_CLIENT_URLS="http://localhost:2379" to ETCD_LISTEN_CLIENT_URLS="http://0.0.0.0:2379"
2. 0/1 nodes are available: 1 node(s) had taint {node.kubernetes.io/not-ready: }, that the pod didn't tolerate.
Solution:
The node has a taint the pod cannot tolerate. Run kubectl get no -o yaml | grep taint -A 5
and you will see the node is unschedulable. By default, Kubernetes does not schedule pods onto the master node for safety reasons. Remove the taint with:
kubectl taint nodes --all node-role.kubernetes.io/master-
3. Worker node error:
[root@k8s-node2 ~]# kubectl get nodes
The connection to the server localhost:8080 was refused - did you specify the right host or port?
Solution:
Copy admin.conf from the master to the worker's home directory and export KUBECONFIG=$HOME/admin.conf.
[root@k8s-master ~]# ll /etc/kubernetes/admin.conf
-rw------- 1 root root 5453 Sep 7 15:23 /etc/kubernetes/admin.conf
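A sketch of the copy step, assuming root SSH access from the master to the worker:
# On the master
scp /etc/kubernetes/admin.conf root@k8s-node2:~/admin.conf
# On the worker
export KUBECONFIG=$HOME/admin.conf
kubectl get nodes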
4. The docker installation went wrong; how do I uninstall it?
[root@PTC-PTesting-151 ~]# systemctl stop docker
[root@PTC-PTesting-151 ~]# rpm -aq | grep docker
docker-ce-cli-19.03.4-3.el7.x86_64
docker-ce-19.03.4-3.el7.x86_64
[root@PTC-PTesting-151 ~]# yum remove -y docker-ce-cli-19.03.4-3.el7.x86_64
5. A cluster node installation went wrong; how do I uninstall it?
kubeadm reset
yum remove -y kubeadm kubelet kubectl
6. (VI_1): ip address associated with VRID 51 not present in MASTER advert : 10.12.50.198
Solution:
The virtual_router_id conflicts with another keepalived master on the LAN; change virtual_router_id in /etc/keepalived/keepalived.conf.
7. The nginx-ingress-controller pod is in CrashLoopBackOff; the logs show that 10.1.0.1:443 is unreachable.
[root@k8s-master ~]# kubectl logs nginx-ingress-controller-84d976849b-9pdnm -n microservice-component
-------------------------------------------------------------------------------
NGINX Ingress controller
Release: 0.30.0
Build: git-7e65b90c4
Repository: https://github.com/kubernetes/ingress-nginx
nginx version: nginx/1.17.8
-------------------------------------------------------------------------------
W0909 07:59:13.830981 6 flags.go:260] SSL certificate chain completion is disabled (--enable-ssl-chain-completion=false)
W0909 07:59:13.831045 6 client_config.go:543] Neither --kubeconfig nor --master was specified. Using the inClusterConfig. This might not work.
I0909 07:59:13.831223 6 main.go:193] Creating API client for https://10.1.0.1:443
Solution:
root# kubectl edit configmap kube-proxy -n kube-system
# Change mode: "" (empty) to mode: "ipvs", and masqueradeAll: null to masqueradeAll: true
root# kubectl get pod -n kube-system | grep kube-proxy | awk '{system(" kubectl delete pod "$1" -n kube-system")}'
Run ip route show table local and you should see routes on the IPVS dummy interface (kube-ipvs0).
8. Error when joining a new master node:
couldn't validate the identity of the API Server: could not find a JWS signature in the cluster-info ConfigMap for token ID "hflxeu"
Solution:
kubeadm token create # generate a new token
openssl x509 -pubkey -in /etc/kubernetes/pki/ca.crt | openssl rsa -pubin -outform der 2>/dev/null | openssl dgst -sha256 -hex | sed 's/^.* //' # compute the discovery-token-ca-cert-hash with openssl
error downloading certs: error downloading the secret: Secret "kubeadm-certs" was not found in the "kube-system" Namespace. This Secret might have expired. Please, run kubeadm init phase upload-certs --upload-certs
on a control plane to generate a new one
Solution:
[root@k8s-master ~]# kubeadm init phase upload-certs --upload-certs
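Putting the two fixes together, a fresh control-plane join command can be assembled like this (a sketch; the certificate key is the one printed by the upload-certs phase):
# New token plus CA-cert hash in a single ready-made command
kubeadm token create --print-join-command
# New certificate key for control-plane joins
kubeadm init phase upload-certs --upload-certs
# Append to the printed join command:
#   --control-plane --certificate-key <key printed above>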
9. Error when re-joining a removed master node: error execution phase check-etcd: etcd cluster is not healthy: failed to dial endpoint
root# kubectl describe configmaps kubeadm-config -n kube-system
root# kubectl get pods -n kube-system | grep etcd
root# kubectl exec -it etcd-k8s-master sh -n kube-system
# Inside the etcd-k8s-master container
## Set up the environment
$ export ETCDCTL_API=3
$ alias etcdctl='etcdctl --endpoints=https://127.0.0.1:2379 --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/server.crt --key=/etc/kubernetes/pki/etcd/server.key'
## List the etcd cluster members
$ etcdctl member list
63bfe05c4646fb08, started, k8s-master-2-11, https://192.168.2.11:2380, https://192.168.2.11:2379, false
8e41efd8164c6e3d, started, k8s-master-2-12, https://192.168.2.12:2380, https://192.168.2.12:2379, false
a61d0bd53c1cbcb6, started, k8s-master-2-13, https://192.168.2.13:2380, https://192.168.2.13:2379, false
## Remove etcd member k8s-master-2-11
$ etcdctl member remove 63bfe05c4646fb08
Member 63bfe05c4646fb08 removed from cluster ed984b9o8w35cap2
## List the etcd cluster members again
$ etcdctl member list
8e41efd8164c6e3d, started, k8s-master-2-12, https://192.168.2.12:2380, https://192.168.2.12:2379, false
a61d0bd53c1cbcb6, started, k8s-master-2-13, https://192.168.2.13:2380, https://192.168.2.13:2379, false
## Exit the container
$ exit
10. Pods are still being scheduled onto the master node even with the taint set; what to do?
kubectl cordon master # disable scheduling on the node
kubectl uncordon master # re-enable scheduling on the node
11. ping 192.168.x.x: IP addresses outside the cluster are unreachable from pods.
Solution:
[root@k8s-master ~]# kubectl run busybox -it --image=datica/busybox-dig --namespace=kube-system --restart=Never --rm sh
kubectl exec [POD] [COMMAND] is DEPRECATED and will be removed in a future version. Use kubectl kubectl exec [POD] -- [COMMAND] instead.
/ # ping 192.168.x.x # an IP outside the cluster; no reply
PING 192.168.x.X (192.168.x.X): 56 data bytes
## It turns out the pod CIDR differs from the network range configured in flannel
# Dump the flannel configmap
kubectl get configmap -n kube-system -o yaml kube-flannel-cfg > flannel.yaml
# Edit the net-conf.json network in flannel to match the pod CIDR
vi flannel.yaml
net-conf.json: |
{
"Network": "10.222.0.0/16", #修改此行
"Backend": {
"Type": "vxlan"
}
}
# Apply the configuration
kubectl apply -f flannel.yaml
# Delete the flannel pods so they are recreated
kubectl delete pod xxxxx -n kube-system
12. kube-proxy error: Failed to list IPVS destinations
[root@k8s-master ~]# kubectl -n kube-system logs -f kube-proxy-9stzs
E0910 07:55:04.913607 1 proxier.go:1192] Failed to sync endpoint for service: 10.1.0.10:53/UDP, err: parseIP Error ip=[10 222 0 7 0 0 0 0 0 0 0 0 0 0 0 0]
E0910 07:55:04.913750 1 proxier.go:1950] Failed to list IPVS destinations, error: parseIP Error ip=[10 222 0 7 0 0 0 0 0 0 0 0 0 0 0 0]
E0910 07:55:04.913778 1 proxier.go:1192] Failed to sync endpoint for service: 10.1.0.10:53/TCP, err: parseIP Error ip=[10 222 0 7 0 0 0 0 0 0 0 0 0 0 0 0]
Either upgrade the CentOS kernel to 4.4+, or downgrade kube-proxy as shown below.
Solution:
[root@k8s-master ~]# kubectl -n kube-system set image daemonset/kube-proxy *=registry.aliyuncs.com/k8sxio/kube-proxy:v1.17.6
13. CoreDNS name resolution fails during verification; the symptoms look like this:
[root@k8s-master ~]# kubectl logs -f coredns-7ff77c879f-66tp7 -n kube-system
.:53
[INFO] plugin/reload: Running configuration MD5 = 4e235fcc3696966e76816bcd9034ebc7
CoreDNS-1.6.7
linux/amd64, go1.13.6, da7f65b
[ERROR] plugin/errors: 2 5711299238672922888.3331703534801112281. HINFO: read udp 10.222.2.31:42435->192.168.1.1:53: i/o timeout
[ERROR] plugin/errors: 2 5711299238672922888.3331703534801112281. HINFO: read udp 10.222.2.31:38594->192.168.1.1:53: i/o timeout
[ERROR] plugin/errors: 2 5711299238672922888.3331703534801112281. HINFO: read udp 10.222.2.31:51004->192.168.1.1:53: i/o timeout
[ERROR] plugin/errors: 2 5711299238672922888.3331703534801112281. HINFO: read udp 10.222.2.31:38502->192.168.1.1:53: i/o timeout
[ERROR] plugin/errors: 2 5711299238672922888.3331703534801112281. HINFO: read udp 10.222.2.31:56500->192.168.1.1:53: i/o timeout
[ERROR] plugin/errors: 2 5711299238672922888.3331703534801112281. HINFO: read udp 10.222.2.31:50625->192.168.1.1:53: i/o timeout
[ERROR] plugin/errors: 2 5711299238672922888.3331703534801112281. HINFO: read udp 10.222.2.31:42169->192.168.1.1:53: i/o timeout
[ERROR] plugin/errors: 2 5711299238672922888.3331703534801112281. HINFO: read udp 10.222.2.31:55823->192.168.1.1:53: i/o timeout
[ERROR] plugin/errors: 2 5711299238672922888.3331703534801112281. HINFO: read udp 10.222.2.31:58438->192.168.1.1:53: i/o timeout
Solution:
Add upstream DNS servers on the host, then redeploy coredns; resolution then works normally.
[root@k8s-master ~]# echo "nameserver 114.114.114.114" >> /etc/resolv.conf
[root@k8s-master ~]# echo "nameserver 8.8.8.8" >> /etc/resolv.conf
14. Kubernetes fails to pull from the image registry: pull: unauthorized to access repository:
Solution:
kubectl create secret docker-registry harbor-archetype --namespace=archetype --docker-server=192.168.x.234:8089 --docker-username=liuqianding --docker-password=123456@Ding --docker-email=dingge8311@dingtalk.com
# Add imagePullSecrets to the deployment yaml
......
imagePullSecrets:
- name: harbor-archetype
containers:
- name: harbor-archetype
........
# Note: the secret must be created in the same namespace as the workload
15. Error: nodes are available: 2 node(s) didn't match node selector
The node is missing the label required by the pod's nodeSelector; add it as follows.
Solution:
kubectl get nodes --show-labels
kubectl label node k8s-node2 type=worker
16. Requests with curl -L to a Service are very slow; what to do?
Solution:
root# ethtool -K flannel.1 tx-checksum-ip-generic off
17. Unable to authenticate the request due to an error: [invalid bearer token
To be continued.