1. Environment preparation
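Prepare three Ubuntu 18 virtual machines that act as manager, master and node1 respectively. Hardware configuration:
manager: 4 CPU cores, 8 GB RAM
master: 16 cores, 43.3 GB RAM
node1: 16 cores, 17.3 GB RAM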
(1) Hostname configuration
Run vi /etc/hostname on each VM and set the three hostnames to manager, master and node1 respectively (the new name takes effect after a reboot).
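On Ubuntu 18 (systemd), hostnamectl can make the same change without a reboot. The sketch below wraps it in a hypothetical helper, set_node_hostname (not part of OpenPAI), which also rejects upper-case names because layout.yaml later warns against them.

```bash
# Hypothetical helper: set this VM's hostname (run on each VM with its own name)
set_node_hostname() {
  local name="$1"
  # layout.yaml warns against upper-case letters, so reject empty or mixed-case names
  case "$name" in
    "" | *[[:upper:]]*)
      echo "invalid hostname '$name': use lower-case letters only" >&2
      return 1
      ;;
  esac
  # hostnamectl rewrites /etc/hostname and renames the running system in one step
  sudo hostnamectl set-hostname "$name"
}
```

For example, on the master VM: set_node_hostname master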
(2) IP configuration
On every node, open the hosts file with vi /etc/hosts and add the following lines:
192.168.18.150 manager
192.168.18.151 master
192.168.18.152 node1
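Editing /etc/hosts by hand on three machines is easy to get wrong. A minimal sketch, using a hypothetical function add_cluster_hosts, appends the three entries above only when they are missing, so running it twice does no harm. It takes the hosts file path as an optional argument (default /etc/hosts) and must run as root when it writes the real file.

```bash
# Hypothetical helper: append the cluster entries to a hosts file if they are missing
add_cluster_hosts() {
  local hosts_file="${1:-/etc/hosts}"
  local entry
  for entry in "192.168.18.150 manager" "192.168.18.151 master" "192.168.18.152 node1"; do
    # -x matches whole lines, -F treats the entry as a fixed string (dots are literal)
    grep -qxF "$entry" "$hosts_file" || echo "$entry" >> "$hosts_file"
  done
}
```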
(3) SSH installation and passwordless login
Install and start SSH on every node:
apt-get install openssh-server
service ssh start
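Open the SSH configuration file with vim /etc/ssh/sshd_config and set the following option, then restart SSH with service ssh restart so the change takes effect: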
PermitRootLogin yes
Set up passwordless login by running the following commands on every node:
ssh-keygen -t rsa
ssh-copy-id node1
ssh-copy-id master
ssh-copy-id manager
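The SSH steps above can be scripted roughly as follows. enable_root_login and distribute_keys are hypothetical helper names. The sketch assumes the stock Ubuntu sshd_config, where PermitRootLogin appears as a commented-out default, and it creates an RSA key with an empty passphrase only if ~/.ssh/id_rsa does not exist yet.

```bash
# Hypothetical helper: force "PermitRootLogin yes" in sshd_config
enable_root_login() {
  local cfg="${1:-/etc/ssh/sshd_config}"
  if grep -qE '^[#[:space:]]*PermitRootLogin' "$cfg"; then
    # rewrite the existing line, including a commented-out default such as
    # "#PermitRootLogin prohibit-password"
    sed -i -E 's/^[#[:space:]]*PermitRootLogin.*/PermitRootLogin yes/' "$cfg"
  else
    echo 'PermitRootLogin yes' >> "$cfg"
  fi
}

# Hypothetical helper: create a key pair once and copy it to all three nodes
distribute_keys() {
  mkdir -p "$HOME/.ssh" && chmod 700 "$HOME/.ssh"
  # -N "" sets an empty passphrase, -f avoids the interactive path prompt
  [ -f "$HOME/.ssh/id_rsa" ] || ssh-keygen -t rsa -N "" -f "$HOME/.ssh/id_rsa"
  local host
  for host in node1 master manager; do
    ssh-copy-id "$host"
  done
}
```

Run enable_root_login as root followed by service ssh restart on every node; after that, distribute_keys on every node asks once per host for the password.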
(4) Install NTP for time synchronization
apt install ntp
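To check that NTP is actually synchronizing, look at the output of ntpq -p: the peer currently selected for synchronization is marked with * in the first column. A small sketch (ntp_synced is a hypothetical name) that reads that output on stdin:

```bash
# Hypothetical helper: succeed if `ntpq -p` output (on stdin) shows a selected peer
ntp_synced() {
  # ntpq marks the current system peer with '*' in the first column
  grep -q '^\*'
}
```

Usage: ntpq -p | ntp_synced && echo synced. Right after installing ntp it can take several minutes before any peer is selected.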
(5) Docker installation
For installing Docker, refer to:
https://www.cnblogs.com/wt7018/p/11880666.html
On every node, open daemon.json with vi /etc/docker/daemon.json and add the following:
{"debug": true, "registry-mirrors": ["http://192.168.18.151:30500"], "insecure-registries": ["http://192.168.18.151:30500"]}
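A syntax error in daemon.json stops the Docker daemon from starting, so it helps to write the file in one step and validate it before restarting Docker. In the sketch below, write_daemon_json is a hypothetical helper; it overwrites the target file with the same content as above and assumes python3 is installed (it is by default on Ubuntu 18).

```bash
# Hypothetical helper: write daemon.json and check that it is valid JSON
write_daemon_json() {
  local target="${1:-/etc/docker/daemon.json}"
  mkdir -p "$(dirname "$target")" || return 1
  # note: this replaces any existing content of the file
  cat > "$target" <<'EOF'
{
  "debug": true,
  "registry-mirrors": ["http://192.168.18.151:30500"],
  "insecure-registries": ["http://192.168.18.151:30500"]
}
EOF
  # json.tool exits non-zero on a syntax error
  python3 -m json.tool "$target" > /dev/null
}
```

Run it as root, then apply the change with systemctl restart docker; docker info should then show the registry under Registry Mirrors and Insecure Registries.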
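2. Kubernetes and OpenPAI installation
(1) Edit the configuration files
First, enter the deployment directory with cd /home/pai.
Edit config/config.yaml with vi config/config.yaml and point the Kubernetes image repositories to mirrors in China: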
user: root
password: root
docker_image_tag: v1.8.0
gcr_image_repo: "registry.cn-hangzhou.aliyuncs.com"
kube_image_repo: "registry.cn-hangzhou.aliyuncs.com/google_containers"
openpai_kubespray_extra_var:
  pod_infra_image_repo: "registry.cn-hangzhou.aliyuncs.com/google_containers/pause-{{ image_arch }}"
  dnsautoscaler_image_repo: "docker.io/mirrorgooglecontainers/cluster-proportional-autoscaler-{{ image_arch }}"
  tiller_image_repo: "registry.cn-hangzhou.aliyuncs.com/google_containers/kubernetes-helm/tiller"
  registry_proxy_image_repo: "registry.cn-hangzhou.aliyuncs.com/google_containers/kube-registry-proxy"
  metrics_server_image_repo: "registry.cn-hangzhou.aliyuncs.com/google_containers/metrics-server-amd64"
  addon_resizer_image_repo: "registry.cn-hangzhou.aliyuncs.com/google_containers/addon-resizer"
  dashboard_image_repo: "registry.cn-hangzhou.aliyuncs.com/google_containers/kubernetes-dashboard-{{ image_arch }}"
Edit config/layout.yaml with vi config/layout.yaml:
machine-sku:
  master-machine: # define a machine sku
    # the resource requirements for all the machines of this sku
    # We use the same memory format as Kubernetes, e.g. Gi, Mi
    # Reference: https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/#meaning-of-memory
    mem: 28Gi
    cpu:
      # the number of CPU vcores
      vcore: 16
  cpu-machine:
    computing-device:
      # For `type`, please follow the same format specified in device plugin.
      # For example, `nvidia.com/gpu` is for NVIDIA GPU, `amd.com/gpu` is for AMD GPU,
      # and `enflame.com/dtu` is for Enflame DTU.
      # Reference: https://kubernetes.io/docs/concepts/extend-kubernetes/compute-storage-net/device-plugins/
      type: nvidia.com/cpu
      model: K80
      count: 4
    mem: 16Gi
    cpu:
      vcore: 16

machine-list:
  - hostname: master # name of the machine, **do not** use upper case alphabet letters for hostname
    hostip: 192.168.18.151
    machine-type: master-machine # only one master-machine supported
    pai-master: "true"
  - hostname: node1
    hostip: 192.168.18.152
    machine-type: cpu-machine
    pai-worker: "true"
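Every hostname/hostip pair in layout.yaml should match the /etc/hosts entries from step 1. A rough consistency check, assuming the layout format above (each "- hostname:" line followed by its "hostip:" line); check_layout_hosts is a hypothetical helper, not an OpenPAI tool:

```bash
# Hypothetical helper: check that every hostname/hostip pair in layout.yaml
# also appears in the hosts file
check_layout_hosts() {
  local layout="${1:-config/layout.yaml}" hosts="${2:-/etc/hosts}"
  awk '
    # first file (hosts): remember every "ip name" pair, skipping comment lines
    FILENAME == ARGV[1] {
      if ($1 !~ /^#/) for (i = 2; i <= NF; i++) known[$1 " " $i] = 1
      next
    }
    # second file (layout): "- hostname: X" is followed by "hostip: Y"
    /hostname:/ { name = $3 }
    /hostip:/ {
      if (($2 " " name) in known) print "ok: " name " " $2
      else { print "missing from hosts file: " $2 " " name; bad = 1 }
    }
    END { exit bad }
  ' "$hosts" "$layout"
}
```

From /home/pai: check_layout_hosts config/layout.yaml /etc/hosts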
(2) Kubernetes installation commands
cd /home/pai/contrib/kubespray
/bin/bash quick-start-kubespray.sh -v
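Kubespray drives the installation with Ansible, which ends each playbook run with a PLAY RECAP listing ok/changed/unreachable/failed counters per host. Assuming that recap appears in the script's output, a quick way to spot a failed or unreachable host in a saved log (recap_ok is a hypothetical name):

```bash
# Hypothetical helper: read an install log on stdin and fail if the Ansible
# recap reports any failed or unreachable host, e.g.
#   node1 : ok=310 changed=95 unreachable=0 failed=1
recap_ok() {
  ! grep -Eq '(failed|unreachable)=[1-9]'
}
```

Usage: /bin/bash quick-start-kubespray.sh -v 2>&1 | tee ~/kubespray.log, then recap_ok < ~/kubespray.log && echo "no failed or unreachable hosts".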
(3) OpenPAI installation command (run from the same directory)
/bin/bash quick-start-service.sh
If the installation succeeds, output like the following is printed:
OpenPAI is successfully deployed, please check the following information:
Kubernetes cluster config : ~/pai-deploy/kube/config
OpenPAI cluster config : ~/pai-deploy/cluster-cfg
OpenPAI cluster ID : pai
Default username : admin
Default password : admin-password
You can go to http://192.168.18.151, then use the default username and password to log in.
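To confirm the deployment from the manager, you can list the nodes with the generated kubeconfig (kubectl --kubeconfig ~/pai-deploy/kube/config get nodes, if kubectl is installed there) and check that the web portal answers. A sketch of the latter; wait_for_portal is a hypothetical helper that treats any 2xx or 3xx status as "up" and retries every 5 seconds:

```bash
# Hypothetical helper: poll a URL until it answers with 2xx/3xx or attempts run out
wait_for_portal() {
  local url="${1:-http://192.168.18.151}" attempts="${2:-60}" i=0 code=000
  while [ "$i" -lt "$attempts" ]; do
    # -s hides progress, -o discards the body, -w prints only the HTTP status
    # (curl prints 000 when it cannot connect at all)
    code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 5 "$url" || true)
    case "$code" in
      2?? | 3??) echo "portal is up (HTTP $code)"; return 0 ;;
    esac
    i=$((i + 1))
    # wait before the next attempt, but not after the last one
    if [ "$i" -lt "$attempts" ]; then sleep 5; fi
  done
  echo "no 2xx/3xx answer from $url after $attempts attempts (last: $code)" >&2
  return 1
}
```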
3. Problems and solutions
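Retrying a failed Kubernetes deployment runs git clone again
By default, every attempt deletes ${HOME}/pai-deploy/kubespray and clones kubespray again. To reuse the copy from the first run, open vi /home/pai/contrib/kubespray/script/environment.sh and comment out these two lines: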
#sudo rm -rf ${HOME}/pai-deploy/kubespray
#git clone -b release-2.11 https://github.com/kubernetes-sigs/kubespray.git ${HOME}/pai-deploy/kubespray
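The same edit can be made with sed. A sketch, assuming the two lines appear exactly as shown above; skip_reclone is a hypothetical helper, and sed keeps the unmodified file as environment.sh.bak. This only helps if ${HOME}/pai-deploy/kubespray was already cloned by an earlier run.

```bash
# Hypothetical helper: comment out the rm -rf and git clone lines in environment.sh
skip_reclone() {
  local script="${1:-/home/pai/contrib/kubespray/script/environment.sh}"
  # prefix "#" to both lines (already-commented lines no longer match);
  # -i.bak edits in place and keeps a backup of the original
  sed -i.bak -E \
    -e 's|^([[:space:]]*)(sudo rm -rf [$][{]HOME[}]/pai-deploy/kubespray)|\1#\2|' \
    -e 's|^([[:space:]]*)(git clone .*kubespray\.git)|\1#\2|' \
    "$script"
}
```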
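Files that fail to download
During deployment, cni-plugins-linux-amd64-v0.8.1.tgz and calicoctl-linux-amd64 could not be downloaded. Download them separately and upload them to the path printed in the error message, on both the master and node1 nodes.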
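The cni-plugins archive is published on the containernetworking/plugins GitHub releases page (https://github.com/containernetworking/plugins/releases/download/v0.8.1/cni-plugins-linux-amd64-v0.8.1.tgz); take the calicoctl version from the error message. A sketch for copying a downloaded file to both nodes over the passwordless SSH set up in step 1; push_file_to_nodes is a hypothetical helper, and the destination directory must be copied from the error output.

```bash
# Hypothetical helper: copy one file into the same directory on several nodes
# usage: push_file_to_nodes FILE DEST_DIR [NODE...]   (nodes default to master node1)
push_file_to_nodes() {
  local file="$1" dest_dir="$2"
  shift 2
  # default to the two nodes that need the files
  if [ "$#" -eq 0 ]; then set -- master node1; fi
  local node
  for node in "$@"; do
    # create the target directory first, then copy the file into it
    ssh "$node" "mkdir -p '$dest_dir'" && scp "$file" "$node:$dest_dir/" || return 1
  done
}
```

For example: push_file_to_nodes cni-plugins-linux-amd64-v0.8.1.tgz <path-from-error>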