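Everyone says development is heading toward programming against k8s. I read some of the docs and my head started spinning: lots of terms I didn't recognize, plus things like "software-defined IDC". What the heck is all this, and why is it so hot? Time to learn it, starting with a basic installation to get a feel for things.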
Install Docker
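The biggest obstacle to installing software in China is the Great Firewall. I found that the USTC (University of Science and Technology of China) and Aliyun (Alibaba Cloud) mirrors work well. My test machine runs Ubuntu 16.04 with a 4.4.0 kernel.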
- Add the GPG key of the USTC Docker repository so apt trusts it
curl -fsSL https://mirrors.ustc.edu.cn/docker-ce/linux/ubuntu/gpg | sudo apt-key add -
- Add the apt repository
sudo add-apt-repository "deb [arch=amd64] https://mirrors.ustc.edu.cn/docker-ce/linux/ubuntu $(lsb_release -cs) stable"
- Install docker-ce
apt-get update
apt-get install docker-ce
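For reference, the add-apt-repository command above expands $(lsb_release -cs) to the release codename, which is xenial on Ubuntu 16.04. Here is a minimal sketch that prints the resulting source line; docker_ce_repo_line is my own hypothetical helper, not part of any tool.

```shell
#!/bin/sh
# Hypothetical helper: print the apt source line that the add-apt-repository
# command above adds for a given Ubuntu codename (e.g. "xenial").
docker_ce_repo_line() {
  mirror="https://mirrors.ustc.edu.cn/docker-ce/linux/ubuntu"
  printf 'deb [arch=amd64] %s %s stable\n' "$mirror" "$1"
}
```

For example, docker_ce_repo_line "$(lsb_release -cs)" | sudo tee /etc/apt/sources.list.d/docker-ce.list would write the same entry straight to a file instead of going through add-apt-repository. The file name docker-ce.list is an arbitrary choice.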
After installation, systemd starts Docker automatically, and you may run into the following error:
update-alternatives: using /usr/bin/dockerd-ce to provide /usr/bin/dockerd (dockerd) in auto mode
Job for docker.service failed because the control process exited with error code. See "systemctl status docker.service" and "journalctl -xe" for details.
invoke-rc.d: initscript docker, action "start" failed.
● docker.service - Docker Application Container Engine
Loaded: loaded (/lib/systemd/system/docker.service; enabled; vendor preset: enabled)
Active: activating (auto-restart) (Result: exit-code) since Wed 2019-02-27 11:08:53 CST; 9ms ago
Docs: https://docs.docker.com
Process: 4748 ExecStart=/usr/bin/dockerd -H fd:// (code=exited, status=1/FAILURE)
Main PID: 4748 (code=exited, status=1/FAILURE)
Feb 27 11:08:53 jjh-fusion-sre-backup0 systemd[1]: Failed to start Docker Application Container Engine.
Feb 27 11:08:53 jjh-fusion-sre-backup0 systemd[1]: docker.service: Unit entered failed state.
Feb 27 11:08:53 jjh-fusion-sre-backup0 systemd[1]: docker.service: Failed with result 'exit-code'.
In my case the cause was that the docker0 bridge had not been created by default. Creating it by hand fixes the problem:
ip link add name docker0 type bridge
ip addr add dev docker0 172.17.0.1/16
Then start the Docker service again with systemctl start docker
Test Docker
Testing is simple: just pull any image and start a shell in it
docker pull busybox
docker run -it busybox /bin/sh
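Install kubeadm
kubeadm can set up a k8s environment quickly, though it also hides a lot of details. Because of the firewall, the images and package sources have to be switched to USTC. You need two machines, one as the master and one as a k8s worker node, and Docker must be installed on both first.
- Add the USTC repository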
cat <<EOF > /etc/apt/sources.list.d/kubernetes.list
deb http://mirrors.ustc.edu.cn/kubernetes/apt kubernetes-xenial main
EOF
apt-get update
Running update reports an error:
W: GPG error: http://mirrors.ustc.edu.cn/kubernetes/apt kubernetes-xenial InRelease: The following signatures couldn't be verified because the public key is not available: NO_PUBKEY 6A030B21BA07F4FB
W: The repository 'http://mirrors.ustc.edu.cn/kubernetes/apt kubernetes-xenial InRelease' is not signed.
N: Data from such a repository can't be authenticated and is therefore potentially dangerous to use.
N: See apt-secure(8) manpage for repository creation and user configuration details.
Fetch the missing key and import it into apt
gpg --keyserver keyserver.ubuntu.com --recv-keys BA07F4FB
gpg --export --armor BA07F4FB | sudo apt-key add -
Note that BA07F4FB is the last eight characters of the key ID in the error above (NO_PUBKEY 6A030B21BA07F4FB). Then run apt-get update again.
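Copying the key ID out of the warning by hand is easy to get wrong, so here is a small sketch that extracts it from the apt-get update output. pubkey_short_id is a hypothetical helper. It assumes the NO_PUBKEY ID is 16 hex characters, as in the error above. gpg also accepts the full 16-character ID (for example --recv-keys 6A030B21BA07F4FB), which is less prone to collisions than the 8-character short ID.

```shell
#!/bin/sh
# Hypothetical helper: print the last 8 hex characters of the key ID in
# apt's "NO_PUBKEY <16 hex chars>" warning (the short ID passed to gpg above).
pubkey_short_id() {
  # $1: apt-get update output, or just the "W: GPG error" line
  printf '%s\n' "$1" |
    sed -n 's/.*NO_PUBKEY \([0-9A-Fa-f]\{16\}\).*/\1/p' |
    head -n 1 |
    cut -c 9-16
}
```

Usage sketch: gpg --keyserver keyserver.ubuntu.com --recv-keys "$(pubkey_short_id "$(apt-get update 2>&1)")". The 2>&1 is there because apt prints its warnings on stderr.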
- Install kubeadm
apt-get install kubeadm kubectl kubelet -y
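kubeadm runs the control-plane components in containers, and their images come from k8s.gcr.io, which is blocked. So you have to pull the images from a domestic mirror and re-tag them. First check which images are needed: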
kubeadm config images list
Below is a script that pulls the images from the USTC mirror and re-tags them:
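#!/bin/bash
# Pull the control-plane images from the USTC mirror and re-tag them with the
# k8s.gcr.io names that kubeadm expects.
# The versions must match the output of "kubeadm config images list".
set -e  # stop at the first failed pull or tag
mirror=gcr.mirrors.ustc.edu.cn/google-containers
images=(
  kube-apiserver:v1.13.3
  kube-controller-manager:v1.13.3
  kube-scheduler:v1.13.3
  kube-proxy:v1.13.3
  pause:3.1
  etcd:3.2.24
  coredns:1.2.6
)
for imageName in "${images[@]}"; do
  docker pull "$mirror/$imageName"
  docker tag "$mirror/$imageName" "k8s.gcr.io/$imageName"
done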
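Hard-coding the versions means the script has to be edited every time kubeadm is upgraded. A sketch that derives the list from kubeadm config images list instead, assuming every image lives under the same USTC mirror path as in the script above. mirror_pairs and pull_and_retag are hypothetical helper names.

```shell
#!/bin/sh
# Assumption: every control-plane image is mirrored under this USTC path,
# as in the hard-coded script above.
MIRROR=gcr.mirrors.ustc.edu.cn/google-containers

# Read image references (k8s.gcr.io/NAME:TAG) on stdin and print
# "MIRROR_REF TARGET_REF" pairs; lines not under k8s.gcr.io are ignored.
mirror_pairs() {
  while read -r ref; do
    case "$ref" in
      k8s.gcr.io/*)
        printf '%s/%s %s\n' "$MIRROR" "${ref#k8s.gcr.io/}" "$ref"
        ;;
    esac
  done
}

# Pull each image from the mirror and re-tag it with its k8s.gcr.io name.
pull_and_retag() {
  mirror_pairs | while read -r src dst; do
    docker pull "$src" && docker tag "$src" "$dst"
  done
}
```

Usage: kubeadm config images list | pull_and_retag. Because the pair list is generated from kubeadm itself, the tags always match the installed kubeadm version.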
Run docker images to list the downloaded images
# docker images
REPOSITORY                                                          TAG       IMAGE ID       CREATED        SIZE
busybox                                                             latest    d8233ab899d4   12 days ago    1.2MB
gcr.mirrors.ustc.edu.cn/google-containers/kube-controller-manager   v1.13.3   0482f6400933   3 weeks ago    146MB
k8s.gcr.io/kube-controller-manager                                  v1.13.3   0482f6400933   3 weeks ago    146MB
gcr.mirrors.ustc.edu.cn/google-containers/kube-proxy                v1.13.3   98db19758ad4   3 weeks ago    80.3MB
k8s.gcr.io/kube-proxy                                               v1.13.3   98db19758ad4   3 weeks ago    80.3MB
k8s.gcr.io/kube-apiserver                                           v1.13.3   fe242e556a99   3 weeks ago    181MB
gcr.mirrors.ustc.edu.cn/google-containers/kube-apiserver            v1.13.3   fe242e556a99   3 weeks ago    181MB
gcr.mirrors.ustc.edu.cn/google-containers/kube-scheduler            v1.13.3   3a6f709e97a0   3 weeks ago    79.6MB
k8s.gcr.io/kube-scheduler                                           v1.13.3   3a6f709e97a0   3 weeks ago    79.6MB
gcr.mirrors.ustc.edu.cn/google-containers/coredns                   1.2.6     f59dcacceff4   3 months ago   40MB
k8s.gcr.io/coredns                                                  1.2.6     f59dcacceff4   3 months ago   40MB
gcr.mirrors.ustc.edu.cn/google-containers/etcd                      3.2.24    3cab8e1b9802   5 months ago   220MB
k8s.gcr.io/etcd                                                     3.2.24    3cab8e1b9802   5 months ago   220MB
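Before running kubeadm init, it helps to confirm that every required image is present under its k8s.gcr.io name. For instance, the listing above has no pause:3.1 line. Below is a sketch that compares the two lists. missing_images is a hypothetical helper, and it relies on docker images --format '{{.Repository}}:{{.Tag}}' to print one reference per line.

```shell
#!/bin/sh
# Hypothetical helper: print every reference from $1 (required images, one
# per line) that does not appear verbatim in $2 (local images, one per line).
missing_images() {
  printf '%s\n' "$1" | while read -r ref; do
    [ -n "$ref" ] || continue
    # -x: whole-line match, -F: fixed string (':' and '.' are literal)
    printf '%s\n' "$2" | grep -qxF -- "$ref" || printf '%s\n' "$ref"
  done
}
```

Usage: missing_images "$(kubeadm config images list 2>/dev/null)" "$(docker images --format '{{.Repository}}:{{.Tag}}')" prints nothing when all images are in place. The 2>/dev/null hides the version-lookup warning that kubeadm prints when it cannot reach dl.k8s.io.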
Run kubeadm init
Run kubeadm init on the master node
kubeadm init
I0227 12:33:03.715764 31515 version.go:94] could not fetch a Kubernetes version from the internet: unable to get URL "https://dl.k8s.io/release/stable-1.txt": Get https://storage.googleapis.com/kubernetes-release/release/stable-1.txt: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
I0227 12:33:03.715845 31515 version.go:95] falling back to the local client version: v1.13.3
[init] Using Kubernetes version: v1.13.3
[preflight] Running pre-flight checks
[WARNING SystemVerification]: this Docker version is not on the list of validated versions: 18.09.2. Latest validated version: 18.06
[preflight] Pulling images required for setting up a Kubernetes cluster
[preflight] This might take a minute or two, depending on the speed of your internet connection
[preflight] You can also perform this action in beforehand using 'kubeadm config images pull'
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Activating the kubelet service
[certs] Using certificateDir folder "/etc/kubernetes/pki"
[certs] Generating "etcd/ca" certificate and key
[certs] Generating "etcd/server" certificate and key
[certs] etcd/server serving cert is signed for DNS names [jjh-fusion-sre-backup0 localhost] and IPs [10.20.76.21 127.0.0.1 ::1]
[certs] Generating "etcd/peer" certificate and key
[certs] etcd/peer serving cert is signed for DNS names [jjh-fusion-sre-backup0 localhost] and IPs [10.20.76.21 127.0.0.1 ::1]
[certs] Generating "etcd/healthcheck-client" certificate and key
[certs] Generating "apiserver-etcd-client" certificate and key
[certs] Generating "ca" certificate and key
[certs] Generating "apiserver" certificate and key
[certs] apiserver serving cert is signed for DNS names [jjh-fusion-sre-backup0 kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local] and IPs [10.96.0.1 10.20.76.21]
[certs] Generating "apiserver-kubelet-client" certificate and key
[certs] Generating "front-proxy-ca" certificate and key
[certs] Generating "front-proxy-client" certificate and key
[certs] Generating "sa" key and public key
[kubeconfig] Using kubeconfig folder "/etc/kubernetes"
[kubeconfig] Writing "admin.conf" kubeconfig file
[kubeconfig] Writing "kubelet.conf" kubeconfig file
[kubeconfig] Writing "controller-manager.conf" kubeconfig file
[kubeconfig] Writing "scheduler.conf" kubeconfig file
[control-plane] Using manifest folder "/etc/kubernetes/manifests"
[control-plane] Creating static Pod manifest for "kube-apiserver"
[control-plane] Creating static Pod manifest for "kube-controller-manager"
[control-plane] Creating static Pod manifest for "kube-scheduler"
[etcd] Creating static Pod manifest for local etcd in "/etc/kubernetes/manifests"
[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s
[apiclient] All control plane components are healthy after 20.502631 seconds
[uploadconfig] storing the configuration used in ConfigMap "kubeadm-config" in the "kube-system" Namespace
[kubelet] Creating a ConfigMap "kubelet-config-1.13" in namespace kube-system with the configuration for the kubelets in the cluster
[patchnode] Uploading the CRI Socket information "/var/run/dockershim.sock" to the Node API object "jjh-fusion-sre-backup0" as an annotation
[mark-control-plane] Marking the node jjh-fusion-sre-backup0 as control-plane by adding the label "node-role.kubernetes.io/master=''"
[mark-control-plane] Marking the node jjh-fusion-sre-backup0 as control-plane by adding the taints [node-role.kubernetes.io/master:NoSchedule]
[bootstrap-token] Using token: qhd3wb.udnk15a47dydix4x
[bootstrap-token] Configuring bootstrap tokens, cluster-info ConfigMap, RBAC Roles
[bootstraptoken] configured RBAC rules to allow Node Bootstrap tokens to post CSRs in order for nodes to get long term certificate credentials
[bootstraptoken] configured RBAC rules to allow the csrapprover controller automatically approve CSRs from a Node Bootstrap Token
[bootstraptoken] configured RBAC rules to allow certificate rotation for all node client certificates in the cluster
[bootstraptoken] creating the "cluster-info" ConfigMap in the "kube-public" namespace
[addons] Applied essential addon: CoreDNS
[addons] Applied essential addon: kube-proxy
Your Kubernetes master has initialized successfully!
To start using your cluster, you need to run the following as a regular user:
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
https://kubernetes.io/docs/concepts/cluster-administration/addons/
You can now join any number of machines by running the following on each node
as root:
kubeadm join 10.20.76.21:6443 --token qhd3wb.udnk15a47dydix4x --discovery-token-ca-cert-hash sha256:ed5fb56adffed20bc9546d0b988fa5ab8cabfc35bf3d3a0859bf80b2fc930ba1
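The join command is easy to lose once the terminal scrolls away. One option is to save the init output (for example kubeadm init | tee kubeadm-init.log, where the file name is arbitrary) and extract the line later. This sketch assumes the join command fits on one line, as in the v1.13 output above. join_command is a hypothetical helper. If the bootstrap token has expired (tokens last 24 hours by default), kubeadm token create --print-join-command prints a fresh join command instead.

```shell
#!/bin/sh
# Hypothetical helper: read saved `kubeadm init` output on stdin and print the
# first "kubeadm join ..." line with leading whitespace stripped.
# Assumes the join command is on a single line, as in the v1.13 output above.
join_command() {
  sed -n 's/^[[:space:]]*\(kubeadm join .*\)$/\1/p' | head -n 1
}
```

Usage: join_command < kubeadm-init.log, then run the printed line as root on the worker.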
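During testing, the control plane failed to come up at this step. The cause was that Docker was not managed by systemd. I tried writing a Docker systemd unit by hand, which failed; reinstalling the latest Docker fixed it.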
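So that kubectl can access the cluster as a regular user, run the following (these are the commands kubeadm init printed above):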
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
Check the current node status
# kubectl get nodes
NAME                     STATUS     ROLES    AGE     VERSION
jjh-fusion-sre-backup0   NotReady   master   3m12s   v1.13.3
There is only the master node, and it is NotReady. Check the status of the k8s system pods:
# kubectl get pods -n kube-system
NAME                                             READY   STATUS    RESTARTS   AGE
coredns-86c58d9df4-btcjv                         0/1     Pending   0          4m38s
coredns-86c58d9df4-qmmt8                         0/1     Pending   0          4m38s
etcd-jjh-fusion-sre-backup0                      1/1     Running   0          3m49s
kube-apiserver-jjh-fusion-sre-backup0            1/1     Running   0          3m46s
kube-controller-manager-jjh-fusion-sre-backup0   1/1     Running   0          3m53s
kube-proxy-6lpp7                                 1/1     Running   0          4m38s
kube-scheduler-jjh-fusion-sre-backup0            1/1     Running   0          3m50s
Most pods are Running, but the two coredns pods are Pending. This is because no pod network add-on is installed yet. Install Weave:
kubectl apply -f https://git.io/weave-kube-1.6
Check the pod status again
# kubectl get pods -n kube-system
NAME                                             READY   STATUS             RESTARTS   AGE
coredns-86c58d9df4-btcjv                         0/1     Pending            0          6m54s
coredns-86c58d9df4-qmmt8                         0/1     Pending            0          6m54s
etcd-jjh-fusion-sre-backup0                      1/1     Running            0          6m5s
kube-apiserver-jjh-fusion-sre-backup0            1/1     Running            0          6m2s
kube-controller-manager-jjh-fusion-sre-backup0   1/1     Running            0          6m9s
kube-proxy-6lpp7                                 1/1     Running            0          6m54s
kube-scheduler-jjh-fusion-sre-backup0            1/1     Running            0          6m6s
weave-net-58jhh                                  1/2     CrashLoopBackOff   1          33s
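With more pods this output gets long. Here is a small awk filter that prints only the pods that are not Running or not fully ready (READY such as 1/2), skipping Completed ones. not_ready_pods is a hypothetical helper. It assumes the default kubectl get pods layout for a single namespace (no -o wide, no --all-namespaces, which adds a NAMESPACE column).

```shell
#!/bin/sh
# Hypothetical helper: read `kubectl get pods` output for a single namespace
# (columns NAME READY STATUS RESTARTS AGE) on stdin and print NAME and STATUS
# of pods that are not Running or not fully ready; Completed pods are skipped.
not_ready_pods() {
  awk 'NR > 1 {
    if ($3 == "Completed") next
    split($2, ready, "/")    # READY such as "1/2" -> ready[1]=1, ready[2]=2
    if ($3 != "Running" || ready[1] != ready[2]) print $1, $3
  }'
}
```

Usage: kubectl get pods -n kube-system | not_ready_pods. Against the listing above, it would report the two Pending coredns pods and weave-net-58jhh.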
The network add-on is now in CrashLoopBackOff, which is clearly wrong. After some googling, checking its logs revealed the cause:
# kubectl logs weave-net-58jhh -c weave --namespace=kube-system
Network 10.32.0.0/12 overlaps with existing route 10.0.0.0/8 on host
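The error shows that Weave's pod network 10.32.0.0/12 overlaps with the host route 10.0.0.0/8. As a temporary workaround I deleted the host route by hand; I'll fix it properly later.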
route del -net 10.0.0.0/8 device bond0
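Two CIDR blocks overlap exactly when they agree on the first min(prefix1, prefix2) bits. 10.32.0.0/12 and 10.0.0.0/8 share the first 8 bits (10), hence the conflict. Below is a POSIX shell sketch of that check, useful for comparing Weave's range against each route on the host. ip_to_int and cidr_overlap are hypothetical helpers. Instead of deleting the host route, Weave Net also documents an IPALLOC_RANGE setting for choosing a non-overlapping pod range, which may be the cleaner long-term fix.

```shell
#!/bin/sh
# Hypothetical helper: convert a dotted-quad IPv4 address to an integer.
ip_to_int() {
  old_ifs=$IFS
  IFS=.
  set -- $1          # split "a.b.c.d" into $1..$4
  IFS=$old_ifs
  echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# Hypothetical helper: succeed (exit 0) if two CIDR blocks overlap.
# They overlap iff they agree on the first min(len1, len2) bits.
cidr_overlap() {
  a_ip=$(ip_to_int "${1%/*}"); a_len=${1#*/}
  b_ip=$(ip_to_int "${2%/*}"); b_len=${2#*/}
  len=$a_len
  if [ "$b_len" -lt "$len" ]; then len=$b_len; fi
  # Netmask keeping the first $len bits of a 32-bit address.
  mask=$(( len == 0 ? 0 : (0xFFFFFFFF << (32 - len)) & 0xFFFFFFFF ))
  [ $(( a_ip & mask )) -eq $(( b_ip & mask )) ]
}
```

For example, to test Weave's default range against every route on the host: for r in $(ip route | awk '$1 ~ /\//{print $1}'); do cidr_overlap 10.32.0.0/12 "$r" && echo "overlaps: $r"; done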
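Configure the k8s worker node
On the worker node, install kubeadm and Docker as described above, then join the cluster with the command generated by kubeadm init in the previous step: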
kubeadm join 10.20.76.21:6443 --token qhd3wb.udnk15a47dydix4x --discovery-token-ca-cert-hash sha256:ed5fb56adffed20bc9546d0b988fa5ab8cabfc35bf3d3a0859bf80b2fc930ba1
[discovery] Trying to connect to API Server "10.20.76.21:6443"
[discovery] Created cluster-info discovery client, requesting info from "https://10.20.76.21:6443"
[discovery] Requesting info from "https://10.20.76.21:6443" again to validate TLS against the pinned public key
[discovery] Cluster info signature and contents are valid and TLS certificate validates against pinned roots, will use API Server "10.20.76.21:6443"
[discovery] Successfully established connection with API Server "10.20.76.21:6443"
[join] Reading configuration from the cluster...
[join] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml'
[kubelet] Downloading configuration for the kubelet from the "kubelet-config-1.13" ConfigMap in the kube-system namespace
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Activating the kubelet service
[tlsbootstrap] Waiting for the kubelet to perform the TLS Bootstrap...
[patchnode] Uploading the CRI Socket information "/var/run/dockershim.sock" to the Node API object "jjh-fusion-sre-backup3" as an annotation
This node has joined the cluster:
* Certificate signing request was sent to apiserver and a response was received.
* The Kubelet was informed of the new secure connection details.
Run 'kubectl get nodes' on the master to see this node join the cluster.
Check the status on the master. After a short while the worker becomes Ready:
# kubectl get nodes
NAME                     STATUS   ROLES    AGE   VERSION
jjh-fusion-sre-backup0   Ready    master   84m   v1.13.3
jjh-fusion-sre-backup3   Ready    <none>   73m   v1.13.3
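Rather than re-running kubectl get nodes by hand until the worker shows Ready, you can poll a check in a loop. Here is a sketch. all_nodes_ready is a hypothetical helper that reads the default kubectl get nodes output and succeeds only if every node's STATUS is exactly Ready. A status such as Ready,SchedulingDisabled counts as not ready.

```shell
#!/bin/sh
# Hypothetical helper: read `kubectl get nodes` output on stdin and succeed
# only if there is at least one node and every STATUS is exactly "Ready".
all_nodes_ready() {
  awk 'NR > 1 { nodes++; if ($2 != "Ready") bad++ }
       END { if (nodes > 0 && bad == 0) exit 0; exit 1 }'
}
```

Usage: until kubectl get nodes | all_nodes_ready; do sleep 5; done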
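Deploy a container persistent-storage plugin
Containers can be scheduled onto any worker. So data written by a service's container must be persisted, and it must still be visible after the container is restarted on another worker. That calls for a network file system, much like good old NFS; in the container world Ceph, GlusterFS and the like are typically used. For these notes I installed the k8s storage plugin Rook. Run the following on the master: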
kubectl apply -f https://raw.githubusercontent.com/rook/rook/master/cluster/examples/kubernetes/ceph/operator.yaml
kubectl apply -f https://raw.githubusercontent.com/rook/rook/master/cluster/examples/kubernetes/ceph/cluster.yaml
Check the result
# kubectl get pods -n rook-ceph-system
NAME                                 READY   STATUS    RESTARTS   AGE
rook-ceph-agent-rj5m8                1/1     Running   0          49s
rook-ceph-operator-b996864dd-2vwwf   1/1     Running   0          108s
rook-discover-kfldn                  1/1     Running   0          49s
# kubectl get pods -n rook-ceph
NAME                             READY   STATUS      RESTARTS   AGE
rook-ceph-detect-version-crg2k   0/1     Completed   0          58s
Summary
That's it for the setup, though I'm still pretty lost. I need to work through a test case; that's for the next post.