Tungsten Fabric (1): Pitfalls of a k8s-based Deployment


Author: ljyfree | Published 2020-03-16 16:31

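    Tungsten Fabric (formerly OpenContrail) provides a controller that works together with an orchestrator (OpenStack/k8s/vCenter). The vRouters deployed on compute nodes/k8s nodes are managed by this controller and take the place of the native linux-bridge/OVS for forwarding traffic.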

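    Preface

    • The best way to study an open-source controller is to deploy one first, in whatever way is most convenient
    • I started from TF's GitHub, but the run.sh in both tf-devstack and tf-dev-env hung at this point: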
    [setup contrail git sources]
    INFO: source env from /root/contrail/.env/tf-developer-sandbox.env
    INFO: current folder is
    100  2584  100  2584    0     0    934      0  0:00:02  0:00:02 --:--:--   933
    INFO: Download repo tool
    
    
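    • I then found the TF Chinese community, added its contact on WeChat, and was pulled into the TF discussion group
    • With guidance from Mr. Wu and Mr. Yang in the group, I started deploying by following the wiki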

    Hands-on log

    Initial preparation

    • Create three CentOS 7.7 virtual machines
    deployer 192.168.122.160
    master01 192.168.122.96  <--- needs at least 8 GB of RAM
    node01 192.168.122.250
    
    # cat /etc/redhat-release 
    CentOS Linux release 7.7.1908 (Core)
    

    pip acceleration via Aliyun

    • Configure the pip mirror on every node
    mkdir -p ~/.pip && tee ~/.pip/pip.conf <<-'EOF'
    [global]
    trusted-host =  mirrors.aliyun.com
    index-url = https://mirrors.aliyun.com/pypi/simple
    EOF
    

    Docker image acceleration via Aliyun

    • There are plenty of tutorials online; the accelerator address below is masked with asterisks
    sudo mkdir -p /etc/docker
    sudo tee /etc/docker/daemon.json <<-'EOF'
    {
      "registry-mirrors": ["https://********.mirror.aliyuncs.com"]
    }
    EOF
    sudo systemctl daemon-reload
    sudo systemctl restart docker
    

    Source files

    • Many of the required installation files are hosted on http://35.220.208.0/; adapt the commands below to the actual download links
    mkdir pkg_python/
    cd pkg_python/
    wget http://35.220.208.0/packages_python/pip-19.3.1.tar.gz
    easy_install pip-19.3.1.tar.gz
    easy_install --upgrade --dry-run pip
    wget http://35.220.208.0/packages_python/docker_compose-1.24.1-py2.py3-none-any.whl
    pip2 install docker_compose-1.24.1-py2.py3-none-any.whl
    
    mkdir /root/pkg_k8s
    cd /root/pkg_k8s
    wget http://35.220.208.0/k8s_v1.12.9/packages/auto_download.sh
    chmod +x auto_download.sh
    ./auto_download.sh
    
    • The command prints the output below; despite the warnings, it does not seem to cause any problems
    [root@localhost pkg_python]# easy_install --upgrade --dry-run pip
    Searching for pip
    Reading https://pypi.python.org/simple/pip/
    Best match: pip 20.0.2
    Downloading https://files.pythonhosted.org/packages/8e/76/66066b7bc71817238924c7e4b448abdb17eb0c92d645769c223f9ace478f/pip-20.0.2.tar.gz#sha256=7db0c8ea4c7ea51c8049640e8e6e7fde949de672bfa4949920675563a5a6967f
    Processing pip-20.0.2.tar.gz
    Writing /tmp/easy_install-bm8Ztx/pip-20.0.2/setup.cfg
    Running pip-20.0.2/setup.py -n -q bdist_egg --dist-dir /tmp/easy_install-bm8Ztx/pip-20.0.2/egg-dist-tmp-32s9sn
    /usr/lib64/python2.7/distutils/dist.py:267: UserWarning: Unknown distribution option: 'project_urls'
      warnings.warn(msg)
    /usr/lib64/python2.7/distutils/dist.py:267: UserWarning: Unknown distribution option: 'python_requires'
      warnings.warn(msg)
    warning: no files found matching 'docs/docutils.conf'
    warning: no previously-included files found matching '.coveragerc'
    warning: no previously-included files found matching '.mailmap'
    warning: no previously-included files found matching '.appveyor.yml'
    warning: no previously-included files found matching '.travis.yml'
    warning: no previously-included files found matching '.readthedocs.yml'
    warning: no previously-included files found matching '.pre-commit-config.yaml'
    warning: no previously-included files found matching 'tox.ini'
    warning: no previously-included files found matching 'noxfile.py'
    warning: no files found matching 'Makefile' under directory 'docs'
    warning: no files found matching '*.bat' under directory 'docs'
    warning: no previously-included files found matching 'src/pip/_vendor/six'
    warning: no previously-included files found matching 'src/pip/_vendor/six/moves'
    warning: no previously-included files matching '*.pyi' found under directory 'src/pip/_vendor'
    no previously-included directories found matching '.github'
    no previously-included directories found matching '.azure-pipelines'
    no previously-included directories found matching 'docs/build'
    no previously-included directories found matching 'news'
    no previously-included directories found matching 'tasks'
    no previously-included directories found matching 'tests'
    no previously-included directories found matching 'tools'
    warning: install_lib: 'build/lib' does not exist -- no Python modules to install
    
    [root@localhost pkg_python]# 
    

    Local registry

    • Run a registry container on the deployer, mapping port 80 on the host to port 5000 in the container
    [root@deployer ~]# docker run -d -p 80:5000 --restart=always --name registry registry:2
    0c17a03ebdffe3cea98d7cec42c268c1117241f236f9f2443bbb1b77d34b0082
    [root@deployer ~]# 
    [root@deployer ~]# docker ps
    CONTAINER ID        IMAGE               COMMAND                  CREATED             STATUS              PORTS                  NAMES
    0c17a03ebdff        registry:2          "/entrypoint.sh /etc…"   About an hour ago   Up About an hour    0.0.0.0:80->5000/tcp   registry
    [root@deployer ~]# 
    

    Editing the YAML file

    • After getting contrail-ansible-deployer, go into its directory and edit instances.yaml
    [root@deployer inventory]# vim  ../config/instances.yaml
    
    provider_config:
      bms:
       ssh_pwd: Password
       ssh_user: root
       ssh_public_key: /root/.ssh/id_rsa.pub
       ssh_private_key: /root/.ssh/id_rsa
       domainsuffix: local
    instances:
      bms1:
        provider: bms
        roles:
          config_database:
          config:
          control:
          analytics_database:
          analytics:
          webui:
          k8s_master:
          kubemanager:
        ip: 192.168.122.96
      bms2:
        provider: bms
        roles:
          vrouter:
          k8s_node:
        ip: 192.168.122.250
    global_configuration:
      CONTAINER_REGISTRY: hub.juniper.net
    contrail_configuration:
      CONTRAIL_VERSION: 1912-latest
    
    • Replace CONTAINER_REGISTRY with the local registry, and set the Contrail version to 1912-latest so that it matches the tag used when retagging the images later
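When redeploying, the two changes described above can also be scripted. A minimal sketch, assuming each key sits on its own line with the two-space indentation shown in the file; the function name `set_registry_and_version` is hypothetical:

```shell
#!/bin/bash
# Rewrite CONTAINER_REGISTRY and CONTRAIL_VERSION in instances.yaml in place.
# Assumes each key sits on its own line, indented by two spaces as above.
set_registry_and_version() {
  local file="$1" registry="$2" version="$3"
  # Keep the key and its indentation (\1), replace everything after it
  sed -i "s|^\(  CONTAINER_REGISTRY:\).*|\1 ${registry}|" "$file"
  # The version must match the tag pushed to the local registry
  sed -i "s|^\(  CONTRAIL_VERSION:\).*|\1 ${version}|" "$file"
}

# Usage, from the contrail-ansible-deployer directory on the deployer:
# set_registry_and_version config/instances.yaml 192.168.122.160 1912-latest
```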

    Passwordless SSH login

    • The deployer must be able to log in to itself, master01, and node01 without a password
    # ssh-keygen -t rsa
    
    # ssh-copy-id -i ~/.ssh/id_rsa.pub root@master01
    # ssh-copy-id -i ~/.ssh/id_rsa.pub root@node01
    # ssh-copy-id -i ~/.ssh/id_rsa.pub root@node02
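Before running the playbooks it is worth confirming that key-based login really works for every host; otherwise ansible reports the host as unreachable. A sketch using ssh's `BatchMode=yes` option, which makes ssh fail instead of prompting for a password. The function name and the `SSH_BIN` override are hypothetical, and the host list is only an example:

```shell
#!/bin/bash
# Check passwordless SSH to each host given as an argument.
# BatchMode=yes: fail instead of asking for a password.
# SSH_BIN (hypothetical) lets a stub replace ssh for testing; default is ssh.
check_passwordless_ssh() {
  local ssh_bin="${SSH_BIN:-ssh}" failed=0 host
  for host in "$@"; do
    if "$ssh_bin" -o BatchMode=yes -o ConnectTimeout=5 "root@${host}" true >/dev/null 2>&1; then
      echo "OK   ${host}"
    else
      echo "FAIL ${host}"
      failed=1
    fi
  done
  # Non-zero if at least one host needs a password or is unreachable
  return "$failed"
}

# Usage on the deployer:
# check_passwordless_ssh localhost master01 node01
```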
    

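    "easy_install --upgrade --dry-run pip" hangs

    • Without a proxy to get past the firewall, downloads are so slow that the command appears to hang
    • Work around it by first downloading pip-20.1b1.tar.gz some other way
    • Put it in the pkg_python directory, then: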
    # tar -zxvf pip-20.1b1.tar.gz
    # cd pip-20.1b1 
    # python setup.py install
    ...
    # pip install wheel
    
    • Then go back to pkg_python and continue with the remaining steps

    ansible

    • Running ansible on the deployer prints this warning:
    /usr/lib/python2.7/site-packages/requests/__init__.py:91: RequestsDependencyWarning: urllib3 (1.24.3) or chardet (2.2.1) doesn't match a supported version!
      RequestsDependencyWarning)
    

    The fix:

    pip uninstall urllib3    
    pip uninstall chardet
    pip install requests 
    

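    Pulling images

    • The k8s images are manageable, since a domestic mirror is available (the script below pulls them from gcr.azk8s.cn)
    • Contrail's default registry, hub.juniper.net, requires a Juniper account, so use opencontrailnightly (on Docker Hub) instead
    • Mr. Yang provided scripts that pull the images and push them into the local registry, so master and node can later pull directly from the deployer's registry
    • With the latest contrail-ansible-deployer code, one more image is needed: contrail-provisioner
    • Before running the scripts, though, the registry's IP has to be configured as an insecure registry, so that Docker talks to it over HTTP instead of HTTPS
    • One way is to edit /etc/docker/daemon.json (create it if it does not exist); for example, on node01: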
    [root@node01 ~]# cat /etc/docker/daemon.json 
    {
      "insecure-registries": [ "hub.juniper.net","k8s.gcr.io" ]
    }
    [root@node01 ~]# 
    

    Then:

    [root@deployer ~]# systemctl daemon-reload
    [root@deployer ~]# systemctl restart docker
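Note that daemon.json is a single JSON object: writing the insecure-registries file shown above replaces the whole file and drops the registry-mirrors setting from the Aliyun step. A sketch that writes both keys at once; the mirror URL is a placeholder (the real one is masked in this post), and the target path is a parameter so the result can be checked in a scratch file first. The function name is hypothetical:

```shell
#!/bin/bash
# Write a daemon.json holding both the registry mirror and the insecure
# registries, since writing only one of the keys drops the other.
# write_daemon_json TARGET MIRROR_URL REGISTRY...
write_daemon_json() {
  local target="$1" mirror="$2" regs="" r
  shift 2
  # Build the JSON array body, e.g. "hub.juniper.net","k8s.gcr.io"
  for r in "$@"; do
    regs="${regs:+${regs},}\"${r}\""
  done
  cat > "$target" <<EOF
{
  "registry-mirrors": ["${mirror}"],
  "insecure-registries": [${regs}]
}
EOF
}

# Usage on the deployer (its own registry), then daemon-reload and restart docker:
# write_daemon_json /etc/docker/daemon.json "https://<your-id>.mirror.aliyuncs.com" 192.168.122.160
# Usage on master01/node01 (the names they pull from):
# write_daemon_json /etc/docker/daemon.json "https://<your-id>.mirror.aliyuncs.com" hub.juniper.net k8s.gcr.io
```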
    
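    • The scripts are below, already changed to use the deployer's IP
    #!/bin/bash
    # Prepare the offline Kubernetes images: pull from the mirror, retag, and push to the deployer's registry
    # Author: Alex Yang <alex890714@gmail.com>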
    
    set -e
    
    REPOSITORIE="gcr.azk8s.cn/google_containers"
    LOCAL_REPO="192.168.122.160"
    IMAGES="kube-proxy:v1.12.9 kube-controller-manager:v1.12.9 kube-scheduler:v1.12.9 kube-apiserver:v1.12.9 coredns:1.2.2 coredns:1.2.6 pause:3.1 etcd:3.2.24 kubernetes-dashboard-amd64:v1.8.3"
    
    for img in $IMAGES
    do
      echo "===Pulling image: "$img
      docker pull $REPOSITORIE/$img
      echo "===Retag image ["$img"]"
      docker tag $REPOSITORIE/$img $LOCAL_REPO/$img
      echo "===Pushing image: "$LOCAL_REPO/$img
      docker push $LOCAL_REPO/$img
      docker rmi $REPOSITORIE/$img
    done
    
    #!/bin/bash
    # Prepare the offline Tungsten Fabric images: pull from opencontrailnightly, retag, and push to the deployer's registry
    # Author: Alex Yang <alex890714@gmail.com>
    
    set -e
    
    REGISTRY_URL=opencontrailnightly
    LOCAL_REGISTRY_URL=192.168.122.160
    IMAGE_TAG=1912-latest
    COMMON_IMAGES="contrail-node-init contrail-status contrail-nodemgr contrail-external-cassandra contrail-external-zookeeper contrail-external-kafka contrail-external-redis contrail-external-rabbitmq contrail-external-rsyslogd"
    ANALYTICS_IMAGES="contrail-analytics-query-engine contrail-analytics-api contrail-analytics-collector contrail-analytics-snmp-collector contrail-analytics-snmp-topology contrail-analytics-alarm-gen"
    CONTROL_IMAGES="contrail-controller-control-control contrail-controller-control-dns contrail-controller-control-named contrail-controller-config-api contrail-controller-config-devicemgr contrail-controller-config-schema contrail-controller-config-svcmonitor contrail-controller-config-stats contrail-controller-config-dnsmasq"
    WEBUI_IMAGES="contrail-controller-webui-job contrail-controller-webui-web"
    K8S_IMAGES="contrail-kubernetes-kube-manager contrail-kubernetes-cni-init"
    VROUTER_IMAGES="contrail-vrouter-kernel-init contrail-vrouter-agent"
    
    IMAGES=$COMMON_IMAGES" "$ANALYTICS_IMAGES" "$CONTROL_IMAGES" "$WEBUI_IMAGES" "$K8S_IMAGES" "$VROUTER_IMAGES
    
    for image in $IMAGES
    do
      echo "===Pulling image: "$image
      docker pull $REGISTRY_URL/$image:$IMAGE_TAG
      echo "===Retag image ["$image"]"
      docker tag $REGISTRY_URL/$image:$IMAGE_TAG $LOCAL_REGISTRY_URL/$image:$IMAGE_TAG
      echo "===Pushing image: "$LOCAL_REGISTRY_URL/$image:$IMAGE_TAG
      docker push $LOCAL_REGISTRY_URL/$image:$IMAGE_TAG
      docker rmi $REGISTRY_URL/$image:$IMAGE_TAG
    done
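The two scripts above repeat the same pull, retag, push, remove cycle. A condensed sketch of that cycle as one function, with a hypothetical `DRY_RUN` switch that only prints the docker commands. It can also be used for the extra contrail-provisioner image mentioned earlier, assuming a tag matching the other images exists for it:

```shell
#!/bin/bash
# Mirror images: pull from SRC, retag as DST, push to DST, drop the SRC tag.
# Same cycle as the two scripts above, folded into one function.
# DRY_RUN=1 prints the docker commands instead of running them.
set -e

run() {
  if [ -n "${DRY_RUN:-}" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# mirror_images SRC_PREFIX DST_PREFIX TAG IMAGE...
# Pass an empty TAG when the image names already include one (e.g. pause:3.1).
mirror_images() {
  local src="$1" dst="$2" tag="$3" image ref
  shift 3
  for image in "$@"; do
    ref="${image}${tag:+:${tag}}"
    echo "===Mirroring ${ref}" >&2
    run docker pull "${src}/${ref}"
    run docker tag "${src}/${ref}" "${dst}/${ref}"
    run docker push "${dst}/${ref}"
    run docker rmi "${src}/${ref}"
  done
}

# Usage on the deployer, e.g. for the extra image newer deployer code needs
# (assuming opencontrailnightly publishes it with the same tag):
# mirror_images opencontrailnightly 192.168.122.160 1912-latest contrail-provisioner
# DRY_RUN=1 mirror_images gcr.azk8s.cn/google_containers 192.168.122.160 "" pause:3.1
```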
    
    • List the images on the deployer
    [root@deployer ~]# docker image list
    REPOSITORY                                              TAG                 IMAGE ID            CREATED             SIZE
    ubuntu                                                  latest              72300a873c2c        3 weeks ago         64.2MB
    registry                                                2                   708bc6af7e5e        7 weeks ago         25.8MB
    registry                                                latest              708bc6af7e5e        7 weeks ago         25.8MB
    192.168.122.160/contrail-vrouter-kernel-init            1912-latest         92e9cce315a5        3 months ago        581MB
    192.168.122.160/contrail-vrouter-agent                  1912-latest         e8d9457d740e        3 months ago        729MB
    192.168.122.160/contrail-status                         1912-latest         d2264c6741a5        3 months ago        513MB
    192.168.122.160/contrail-nodemgr                        1912-latest         c3428aa7e9b7        3 months ago        523MB
    192.168.122.160/contrail-node-init                      1912-latest         c846ff071cc8        3 months ago        506MB
    192.168.122.160/contrail-kubernetes-kube-manager        1912-latest         983a6307731b        3 months ago        517MB
    192.168.122.160/contrail-kubernetes-cni-init            1912-latest         45c88538c834        3 months ago        525MB
    192.168.122.160/contrail-external-zookeeper             1912-latest         6937c72b866c        3 months ago        290MB
    192.168.122.160/contrail-external-rsyslogd              1912-latest         812ba27a4e08        3 months ago        304MB
    192.168.122.160/contrail-external-redis                 1912-latest         3dc79f0b6eb9        3 months ago        129MB
    192.168.122.160/contrail-external-rabbitmq              1912-latest         a98ac91667b2        3 months ago        256MB
    192.168.122.160/contrail-external-kafka                 1912-latest         7b5a2ce6a656        3 months ago        665MB
    192.168.122.160/contrail-external-cassandra             1912-latest         20109c39696c        3 months ago        545MB
    192.168.122.160/contrail-controller-webui-web           1912-latest         44054aa131c5        3 months ago        552MB
    192.168.122.160/contrail-controller-webui-job           1912-latest         946e2bbd7451        3 months ago        552MB
    192.168.122.160/contrail-controller-control-named       1912-latest         81ef8223a519        3 months ago        575MB
    192.168.122.160/contrail-controller-control-dns         1912-latest         15c1ce0cf26e        3 months ago        575MB
    192.168.122.160/contrail-controller-control-control     1912-latest         ec195cc75705        3 months ago        594MB
    192.168.122.160/contrail-controller-config-svcmonitor   1912-latest         3d53781422be        3 months ago        673MB
    192.168.122.160/contrail-controller-config-stats        1912-latest         46bc77cf1c87        3 months ago        506MB
    192.168.122.160/contrail-controller-config-schema       1912-latest         75acb8ed961f        3 months ago        673MB
    192.168.122.160/contrail-controller-config-dnsmasq      1912-latest         dc2980441d51        3 months ago        506MB
    192.168.122.160/contrail-controller-config-devicemgr    1912-latest         c08868a27a0a        3 months ago        772MB
    192.168.122.160/contrail-controller-config-api          1912-latest         f39ca251b475        3 months ago        706MB
    192.168.122.160/contrail-analytics-snmp-topology        1912-latest         5ee37cbbd034        3 months ago        588MB
    192.168.122.160/contrail-analytics-snmp-collector       1912-latest         29ae502fb74f        3 months ago        588MB
    192.168.122.160/contrail-analytics-query-engine         1912-latest         b5f937d6b6e3        3 months ago        588MB
    192.168.122.160/contrail-analytics-collector            1912-latest         ee1bdbcc460a        3 months ago        588MB
    192.168.122.160/contrail-analytics-api                  1912-latest         ac5c8f7cef89        3 months ago        588MB
    192.168.122.160/contrail-analytics-alarm-gen            1912-latest         e155b24a0735        3 months ago        588MB
    192.168.10.10/kube-proxy                                v1.12.9             295526df163c        9 months ago        95.7MB
    192.168.122.160/kube-proxy                              v1.12.9             295526df163c        9 months ago        95.7MB
    192.168.122.160/kube-controller-manager                 v1.12.9             f473e8452c8e        9 months ago        164MB
    192.168.122.160/kube-apiserver                          v1.12.9             8ea704c2d4a7        9 months ago        194MB
    192.168.122.160/kube-scheduler                          v1.12.9             c79506ccc1bc        9 months ago        58.4MB
    192.168.122.160/coredns                                 1.2.6               f59dcacceff4        16 months ago       40MB
    192.168.122.160/etcd                                    3.2.24              3cab8e1b9802        18 months ago       220MB
    192.168.122.160/coredns                                 1.2.2               367cdc8433a4        18 months ago       39.2MB
    192.168.122.160/kubernetes-dashboard-amd64              v1.8.3              0c60bcf89900        2 years ago         102MB
    192.168.122.160/pause                                   3.1                 da86e6ba6ca1        2 years ago         742kB
    [root@deployer ~]# 
    
    • List the repositories in the local registry
    [root@deployer ~]# curl -X GET http://localhost/v2/_catalog | python -m json.tool
      % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                     Dload  Upload   Total   Spent    Left  Speed
    100  1080  100  1080    0     0  18298      0 --:--:-- --:--:-- --:--:-- 18620
    {
        "repositories": [
            "contrail-analytics-alarm-gen",
            "contrail-analytics-api",
            "contrail-analytics-collector",
            "contrail-analytics-query-engine",
            "contrail-analytics-snmp-collector",
            "contrail-analytics-snmp-topology",
            "contrail-controller-config-api",
            "contrail-controller-config-devicemgr",
            "contrail-controller-config-dnsmasq",
            "contrail-controller-config-schema",
            "contrail-controller-config-stats",
            "contrail-controller-config-svcmonitor",
            "contrail-controller-control-control",
            "contrail-controller-control-dns",
            "contrail-controller-control-named",
            "contrail-controller-webui-job",
            "contrail-controller-webui-web",
            "contrail-external-cassandra",
            "contrail-external-kafka",
            "contrail-external-rabbitmq",
            "contrail-external-redis",
            "contrail-external-rsyslogd",
            "contrail-external-zookeeper",
            "contrail-kubernetes-cni-init",
            "contrail-kubernetes-kube-manager",
            "contrail-node-init",
            "contrail-nodemgr",
            "contrail-status",
            "contrail-vrouter-agent",
            "contrail-vrouter-kernel-init",
            "coredns",
            "etcd",
            "kube-apiserver",
            "kube-controller-manager",
            "kube-proxy",
            "kube-scheduler",
            "kubernetes-dashboard-amd64",
            "pause"
        ]
    }
    [root@deployer ~]# 
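With dozens of repositories, comparing the catalog by eye is error-prone. A sketch that reads the `/v2/_catalog` response on stdin and prints any expected repository that is missing; the function name is illustrative. The Docker Registry HTTP API also exposes `/v2/<name>/tags/list` for checking the tags of a single repository:

```shell
#!/bin/bash
# Print every expected repository that does not appear in a registry
# catalog (the JSON returned by GET /v2/_catalog), read on stdin.
missing_repos() {
  local catalog repo
  catalog=$(cat)
  for repo in "$@"; do
    # Each repository is a quoted string inside the "repositories" array
    case "$catalog" in
      *"\"${repo}\""*) ;;
      *) echo "$repo" ;;
    esac
  done
}

# Usage on the deployer:
# curl -s http://localhost/v2/_catalog | missing_repos kube-proxy pause contrail-provisioner
# Tags of a single repository:
# curl -s http://localhost/v2/kube-proxy/tags/list
```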
    
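    • master01 and node01 can now pull the k8s/contrail images directly from the deployer, and fast! (Don't forget --insecure-registry=192.168.122.160)
    #!/bin/bash
    # Prepare the offline Kubernetes images on master01/node01 by pulling them from the deployer
    # Author: Alex Yang <alex890714@gmail.com>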
    
    set -e
    
    REPOSITORIE="k8s.gcr.io"
    LOCAL_REPO="k8s.gcr.io"
    IMAGES="kube-proxy:v1.12.9 kube-controller-manager:v1.12.9 kube-scheduler:v1.12.9 kube-apiserver:v1.12.9 coredns:1.2.2 coredns:1.2.6 pause:3.1 etcd:3.2.24 kubernetes-dashboard-amd64:v1.8.3"
    
    for img in $IMAGES
    do
      echo "===Pulling image: "$img
      docker pull $LOCAL_REPO/$img
    done
    
    #!/bin/bash
    # Prepare the offline Tungsten Fabric images on master01/node01 by pulling them from the deployer
    # Author: Alex Yang <alex890714@gmail.com>
    
    set -e
    
    REPOSITORIE=hub.juniper.net
    LOCAL_REPO=hub.juniper.net
    IMAGE_TAG=1912-latest
    COMMON_IMAGES="contrail-node-init contrail-status contrail-nodemgr contrail-external-cassandra contrail-external-zookeeper contrail-external-kafka contrail-external-redis contrail-external-rabbitmq contrail-external-rsyslogd"
    ANALYTICS_IMAGES="contrail-analytics-query-engine contrail-analytics-api contrail-analytics-collector contrail-analytics-snmp-collector contrail-analytics-snmp-topology contrail-analytics-alarm-gen"
    CONTROL_IMAGES="contrail-controller-control-control contrail-controller-control-dns contrail-controller-control-named contrail-controller-config-api contrail-controller-config-devicemgr contrail-controller-config-schema contrail-controller-config-svcmonitor contrail-controller-config-stats contrail-controller-config-dnsmasq"
    WEBUI_IMAGES="contrail-controller-webui-job contrail-controller-webui-web"
    K8S_IMAGES="contrail-kubernetes-kube-manager contrail-kubernetes-cni-init"
    VROUTER_IMAGES="contrail-vrouter-kernel-init contrail-vrouter-agent"
    
    IMAGES=$COMMON_IMAGES" "$ANALYTICS_IMAGES" "$CONTROL_IMAGES" "$WEBUI_IMAGES" "$K8S_IMAGES" "$VROUTER_IMAGES
    
    for img in $IMAGES
    do
      echo "===Pulling image: "$img
      docker pull $LOCAL_REPO/$img:$IMAGE_TAG
    done
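On master01 and node01 the images are pulled via the names k8s.gcr.io and hub.juniper.net, so those names must be listed as insecure registries in the Docker daemon configuration (and must resolve to the deployer; the post does not show how, /etc/hosts being one likely way). A sketch that parses `docker info` output to confirm the setting is active after restarting docker. The function name is hypothetical, and it relies only on the entries being indented deeper than the "Insecure Registries:" header line:

```shell
#!/bin/bash
# Succeed if NAME is listed under "Insecure Registries:" in the `docker info`
# output read on stdin. Entries are taken to be the lines indented deeper than
# the header line; the list ends at the first line that is not.
is_insecure_registry() {
  awk -v name="$1" '
    function indent(line,   s) { s = line; sub(/^ +/, "", s); return length(line) - length(s) }
    /Insecure Registries:/ { inside = 1; hdr = indent($0); next }
    inside {
      if (NF > 0 && indent($0) > hdr) { if ($1 == name) found = 1; next }
      inside = 0
    }
    END { exit !found }
  '
}

# Usage on master01/node01, after restarting docker:
# docker info 2>/dev/null | is_insecure_registry k8s.gcr.io && echo "k8s.gcr.io is insecure"
# getent hosts k8s.gcr.io    # shows which IP the name resolves to
```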
    

    Opening the web UI

    • On the deployer, run:
    ansible-playbook -e orchestrator=kubernetes -i inventory/ playbooks/install_k8s.yml
    ansible-playbook -e orchestrator=kubernetes -i inventory/ playbooks/install_contrail.yml
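The redeployment later in this post keeps a log of each playbook run (install_k8s_3node.log and so on). A small sketch of running the playbooks that way; the function name is hypothetical. With `set -o pipefail`, a failing ansible-playbook still makes the whole pipeline fail even though its output goes through `tee`:

```shell
#!/bin/bash
# Run one contrail-ansible-deployer playbook from the repository root and keep
# its output in a log file. With pipefail, a failing ansible-playbook makes the
# pipeline fail even though tee succeeds, so set -e stops the script.
set -eo pipefail

run_playbook() {
  local playbook="$1" log="$2"
  ansible-playbook -e orchestrator=kubernetes -i inventory/ "playbooks/${playbook}" 2>&1 | tee "$log"
}

# Usage on the deployer:
# run_playbook install_k8s.yml install_k8s.log
# run_playbook install_contrail.yml install_contrail.log
```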
    
    • Browse to port 8143 on master01 (https://192.168.122.96:8143); the default landing page is Monitor


    • Log in as admin/contrail123 and leave the domain empty. At last, the WebUI!


    • You can switch to the Config page


    k8s status

    • Nodes and namespaces
    [root@master01 ~]# kubectl get nodes
    NAME       STATUS   ROLES    AGE    VERSION
    master01   Ready    master   6h4m   v1.12.9
    node01     Ready    <none>   6h3m   v1.12.9
    [root@master01 ~]# 
    [root@master01 ~]# kubectl get namespaces
    NAME          STATUS   AGE
    contrail      Active   80m
    default       Active   6h20m
    kube-public   Active   6h20m
    kube-system   Active   6h20m
    [root@master01 ~]# 
    
    • Pods
    [root@master01 ~]# kubectl get pods -n kube-system 
    NAME                                    READY   STATUS             RESTARTS   AGE
    coredns-85c98899b4-4dzzx                0/1     ImagePullBackOff   0          6h2m
    coredns-85c98899b4-w4bcs                0/1     ImagePullBackOff   0          6h2m
    etcd-master01                           1/1     Running            5          28m
    kube-apiserver-master01                 1/1     Running            4          28m
    kube-controller-manager-master01        1/1     Running            5          28m
    kube-proxy-dmmlh                        1/1     Running            5          6h2m
    kube-proxy-ph9gx                        1/1     Running            1          6h2m
    kube-scheduler-master01                 1/1     Running            5          28m
    kubernetes-dashboard-76456c6d4b-x5lz4   0/1     ImagePullBackOff   0          6h2m
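To pick out the problem pods quickly, the two numbers in the READY column can be compared. A sketch that reads `kubectl get pods -n <namespace>` output on stdin; the function name is hypothetical, and it assumes the single-namespace column layout shown above (with `--all-namespaces` an extra NAMESPACE column shifts everything right):

```shell
#!/bin/bash
# Print NAME and STATUS of every pod whose READY column is not n/n (e.g. 0/1),
# reading `kubectl get pods -n <namespace>` output on stdin.
# Assumes the single-namespace layout: NAME READY STATUS RESTARTS AGE.
not_ready_pods() {
  awk 'NR > 1 { split($2, r, "/"); if (r[1] != r[2]) print $1, $3 }'
}

# Usage on master01:
# kubectl get pods -n kube-system | not_ready_pods
```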
    

    Further troubleshooting

    kubectl does not work on node01

    • The problem:
    [root@node01 ~]# kubectl get pods -n kube-system -o wide
    The connection to the server localhost:8080 was refused - did you specify the right host or port?
    
    [root@node01 ~]# scp root@192.168.122.96:/etc/kubernetes/admin.conf /etc/kubernetes/admin.conf
    [root@node01 ~]# echo "export KUBECONFIG=/etc/kubernetes/admin.conf" >> ~/.bash_profile
    [root@node01 ~]# source ~/.bash_profile
    [root@node01 ~]# kubectl get pods -n kube-system -o wide
    NAME                                    READY   STATUS             RESTARTS   AGE     IP                NODE       NOMINATED NODE
    coredns-85c98899b4-4dzzx                0/1     ImagePullBackOff   0          5h45m   10.47.255.252     node01     <none>
    coredns-85c98899b4-w4bcs                0/1     ImagePullBackOff   0          5h45m   10.47.255.251     node01     <none>
    etcd-master01                           1/1     Running            3          11m     192.168.122.96    master01   <none>
    kube-apiserver-master01                 1/1     Running            3          11m     192.168.122.96    master01   <none>
    kube-controller-manager-master01        1/1     Running            3          11m     192.168.122.96    master01   <none>
    kube-proxy-dmmlh                        1/1     Running            3          5h45m   192.168.122.96    master01   <none>
    kube-proxy-ph9gx                        1/1     Running            1          5h44m   192.168.122.250   node01     <none>
    kube-scheduler-master01                 1/1     Running            3          11m     192.168.122.96    master01   <none>
    kubernetes-dashboard-76456c6d4b-x5lz4   0/1     ImagePullBackOff   0          5h44m   192.168.122.250   node01     <none>
    [root@node01 ~]# 
    

    The ImagePullBackOff problem

    • First, look at the coredns pod's description
    [root@master01 ~]# kubectl describe pod coredns-85c98899b4-4dzzx -n kube-system
    Name:               coredns-85c98899b4-4dzzx
    Namespace:          kube-system
    ...
    Events:
      Type     Reason                  Age                    From               Message
      ----     ------                  ----                   ----               -------
      Warning  FailedScheduling        75m (x281 over 4h40m)  default-scheduler  0/2 nodes are available: 2 node(s) had taints that the pod didn't tolerate.
      Warning  FailedCreatePodSandBox  71m                    kubelet, node01    Failed create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "1af3fb24d906d5f82ad3bdcf6d65be328302d3c596e63fc79ed0c134390b4753" network for pod "coredns-85c98899b4-4dzzx": NetworkPlugin cni failed to set up pod "coredns-85c98899b4-4dzzx_kube-system" network: Failed in Poll VM-CFG. Error : Failed in PollVM. Error : Failed HTTP Get operation. Return code 404
      Normal   SandboxChanged          70m (x3 over 71m)      kubelet, node01    Pod sandbox changed, it will be killed and re-created.
      Normal   Pulling                 70m (x3 over 70m)      kubelet, node01    pulling image "k8s.gcr.io/coredns:1.2.6"
      Warning  Failed                  70m (x3 over 70m)      kubelet, node01    Failed to pull image "k8s.gcr.io/coredns:1.2.6": rpc error: code = Unknown desc = Error response from daemon: Get https://k8s.gcr.io/v2/: dial tcp 192.168.122.160:443: getsockopt: no route to host
      Warning  Failed                  70m (x3 over 70m)      kubelet, node01    Error: ErrImagePull
      Warning  Failed                  6m52s (x282 over 70m)  kubelet, node01    Error: ImagePullBackOff
      Normal   BackOff                 103s (x305 over 70m)   kubelet, node01    Back-off pulling image "k8s.gcr.io/coredns:1.2.6"
    [root@master01 ~]# 
    
    • It looks like the pod was started before the insecure-registry setting was in place, so force-recreate it
    [root@master01 ~]# kubectl get pod coredns-85c98899b4-4dzzx -n kube-system -o yaml | kubectl replace --force -f -
    pod "coredns-85c98899b4-4dzzx" deleted
    pod/coredns-85c98899b4-4dzzx replaced
    [root@master01 ~]# 
    
    • It still isn't up; look again:
    [root@master01 ~]# kubectl describe pod coredns-85c98899b4-4dzzx -n kube-system
    Events:
      Type     Reason                  Age                   From               Message
      ----     ------                  ----                  ----               -------
      Normal   Scheduled               6m29s                 default-scheduler  Successfully assigned kube-system/coredns-85c98899b4-fnpd7 to master01
      Warning  FailedCreatePodSandBox  6m26s                 kubelet, master01  Failed create pod sandbox: rpc error: code = Unknown desc = [failed to set up sandbox container "3074c719934789cef519eeae16d2eca4e272fb6bda1b157cee1dbdf2f597a59f" network for pod "coredns-85c98899b4-fnpd7": NetworkPlugin cni failed to set up pod "coredns-85c98899b4-fnpd7_kube-system" network: failed to find plugin "contrail-k8s-cni" in path [/opt/cni/bin], failed to clean up sandbox container "3074c719934789cef519eeae16d2eca4e272fb6bda1b157cee1dbdf2f597a59f" network for pod "coredns-85c98899b4-fnpd7": NetworkPlugin cni failed to teardown pod "coredns-85c98899b4-fnpd7_kube-system" network: failed to find plugin "contrail-k8s-cni" in path [/opt/cni/bin]]
      Normal   SandboxChanged          76s (x25 over 6m25s)  kubelet, master01  Pod sandbox changed, it will be killed and re-created.
    
    • contrail-k8s-cni is missing on master01, so copy it over from node01
    [root@master01 ~]# scp root@node01:/opt/cni/bin/contrail-k8s-cni /opt/cni/bin/
    
    • Recreate the pod again
    [root@master01 ~]# kubectl get pod coredns-85c98899b4-fnpd7 -n kube-system -o yaml | kubectl replace --force -f -
    pod "coredns-85c98899b4-fnpd7" deleted
    pod/coredns-85c98899b4-fnpd7 replaced
    [root@master01 ~]# 
    
    • Unfortunately, it still fails after being recreated
    Events:
      Type     Reason                  Age                  From               Message
      ----     ------                  ----                 ----               -------
      Normal   Scheduled               18m                  default-scheduler  Successfully assigned kube-system/coredns-85c98899b4-8zq9h to master01
      Warning  FailedCreatePodSandBox  17m                  kubelet, master01  Failed create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "ffe9745c42750850e44035ee6413bf573148759738fc6131ce970537e03a5d13" network for pod "coredns-85c98899b4-8zq9h": NetworkPlugin cni failed to set up pod "coredns-85c98899b4-8zq9h_kube-system" network: Failed in Poll VM-CFG. Error : Failed in PollVM. Error : Get http://127.0.0.1:9091/vm-cfg/9bf51269-675b-11ea-ac43-525400c1ec4f: dial tcp 127.0.0.1:9091: connect: connection refused
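The error means the contrail-k8s-cni plugin on master01 could not reach anything on 127.0.0.1:9091. One thing worth checking is whether any process on the host listens on that port at all; in the instances.yaml above only bms2 (node01) has the vrouter role, so master01 may simply have nothing serving it. A sketch that reads `ss -ltn` output on stdin; the function name is hypothetical:

```shell
#!/bin/bash
# Succeed if something listens on TCP port PORT, judging by `ss -ltn` output
# read on stdin. Column 4 is "Local Address:Port"; the port is the part after
# the last colon, which also works for IPv6 forms such as [::]:22.
port_listening() {
  awk -v port="$1" '
    NR > 1 { n = split($4, a, ":"); if (a[n] == port) found = 1 }
    END { exit !found }
  '
}

# Usage on master01:
# ss -ltn | port_listening 9091 || echo "nothing is listening on port 9091"
```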
    

    The next day, kubectl stopped working altogether

    • On both master01 and node01
    [root@master01 ~]# kubectl get nodes
    The connection to the server 192.168.122.96:6443 was refused - did you specify the right host or port?
    [root@master01 ~]# 
    
    • Restarting kubelet several times did not help; it keeps running but logs errors
    [root@master01 ~]# journalctl -xe -u kubelet
    Mar 17 21:57:15 master01 kubelet[28722]: E0317 21:57:15.336303   28722 kubelet.go:2236] node "master01" not found
    Mar 17 21:57:15 master01 kubelet[28722]: E0317 21:57:15.425393   28722 reflector.go:125] k8s.io/kubernetes/pkg/kubelet/kubelet.go:451: Failed to list *v1.Node: Get https://192.168.122.96:6443/api/v1/nodes?fieldSelector=metadata.name%3Dma
    Mar 17 21:57:15 master01 kubelet[28722]: E0317 21:57:15.426388   28722 reflector.go:125] k8s.io/kubernetes/pkg/kubelet/kubelet.go:442: Failed to list *v1.Service: Get https://192.168.122.96:6443/api/v1/services?limit=500&resourceVersion=
    Mar 17 21:57:15 master01 kubelet[28722]: E0317 21:57:15.436468   28722 kubelet.go:2236] node "master01" not found
    Mar 17 21:57:15 master01 kubelet[28722]: E0317 21:57:15.536632   28722 kubelet.go:2236] node "master01" not found
    Mar 17 21:57:15 master01 kubelet[28722]: E0317 21:57:15.636848   28722 kubelet.go:2236] node "master01" not found
    Mar 17 21:57:15 master01 kubelet[28722]: E0317 21:57:15.636961   28722 reflector.go:125] k8s.io/kubernetes/pkg/kubelet/config/apiserver.go:47: Failed to list *v1.Pod: Get https://192.168.122.96:6443/api/v1/pods?fieldSelector=spec.nodeNam
    Mar 17 21:57:15 master01 kubelet[28722]: E0317 21:57:15.737070   28722 kubelet.go:2236] node "master01" not found
    Mar 17 21:57:15 master01 kubelet[28722]: E0317 21:57:15.837781   28722 kubelet.go:2236] node "master01" not found
    
    
    • A search shows that many others have run into the same problem
    • Reportedly this can be caused by kube-apiserver not running, but kube-apiserver cannot be started here:
    [root@master01 ~]# systemctl start kube-apiserver
    Failed to start kube-apiserver.service: Unit not found.
    [root@master01 ~]# 
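The "Unit not found" message is expected on a kubeadm-based cluster, which this appears to be (the kubelet drop-in 10-kubeadm.conf shows up in the redeployment below): kubeadm does not install kube-apiserver as a systemd service but runs it as a static pod that kubelet starts from a manifest under /etc/kubernetes/manifests. A sketch of where to look instead; the function name and the `MANIFEST_DIR` override are hypothetical:

```shell
#!/bin/bash
# kubeadm keeps the control-plane components as static pod manifests that
# kubelet starts itself, so check for the manifest instead of a systemd unit.
# MANIFEST_DIR is a hypothetical override; the default is kubeadm's location.
check_apiserver_manifest() {
  local dir="${MANIFEST_DIR:-/etc/kubernetes/manifests}"
  if [ -f "${dir}/kube-apiserver.yaml" ]; then
    echo "manifest present: ${dir}/kube-apiserver.yaml"
  else
    echo "manifest missing: ${dir}/kube-apiserver.yaml"
    return 1
  fi
}

# Then, on master01, see whether kubelet started the container and why it
# exited (<container-id> is a placeholder):
# docker ps -a --filter name=k8s_kube-apiserver
# docker logs <container-id>
```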
    

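    Calling the northbound API

    • See the reference documentation for details
    • For example, the simplest call: list the virtual-networks (using plain username/password authentication)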
    [root@master01 ~]# curl -X GET -u "admin:contrail123" -H "Content-Type: application/json; charset=UTF-8" http://192.168.122.96:8082/virtual-networks
    {"virtual-networks": [{"href": "http://192.168.122.96:8082/virtual-network/99c4144d-a7b7-4fb1-833e-887f21144320", "fq_name": ["default-domain", "default-project", "default-virtual-network"], "uuid": "99c4144d-a7b7-4fb1-833e-887f21144320"}, {"href": "http://192.168.122.96:8082/virtual-network/6e90abe8-91b6-48ad-99d2-fba6c9e29de4", "fq_name": ["default-domain", "k8s-default", "k8s-default-service-network"], "uuid": "6e90abe8-91b6-48ad-99d2-fba6c9e29de4"}, {"href": "http://192.168.122.96:8082/virtual-network/ab12e6dc-be52-407d-8f1d-37e6d29df0b1", "fq_name": ["default-domain", "default-project", "ip-fabric"], "uuid": "ab12e6dc-be52-407d-8f1d-37e6d29df0b1"}, {"href": "http://192.168.122.96:8082/virtual-network/915156f1-cec3-44eb-b15e-742452084d67", "fq_name": ["default-domain", "k8s-default", "k8s-default-pod-network"], "uuid": "915156f1-cec3-44eb-b15e-742452084d67"}, {"href": "http://192.168.122.96:8082/virtual-network/64a648ee-3ba6-4348-a543-07de6f225486", "fq_name": ["default-domain", "default-project", "dci-network"], "uuid": "64a648ee-3ba6-4348-a543-07de6f225486"}, {"href": "http://192.168.122.96:8082/virtual-network/82890bf9-a8e5-4c85-a32c-e307d9447a0a", "fq_name": ["default-domain", "default-project", "__link_local__"], "uuid": "82890bf9-a8e5-4c85-a32c-e307d9447a0a"}]}[root@master01 ~]# 
    [root@master01 ~]# 
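The response is a single line of JSON. To get a quick overview of the network names without extra tools such as jq, the `fq_name` arrays can be pulled out with grep and sed. A text-based sketch that assumes the compact formatting shown above; it is not a general JSON parser, and the function name is hypothetical:

```shell
#!/bin/bash
# Print the fq_name of each virtual network in a /virtual-networks response
# read on stdin, one per line, joined with ":". Text-based: it assumes the
# compact JSON shown above and is not a general JSON parser.
vn_fq_names() {
  grep -o '"fq_name": \[[^]]*]' |
    sed -e 's/^"fq_name": \[//' -e 's/]$//' -e 's/"//g' -e 's/, /:/g'
}

# Usage on master01:
# curl -s -u "admin:contrail123" http://192.168.122.96:8082/virtual-networks | vn_fq_names
```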
    

    Redeployment

    • Decided to redeploy from scratch, this time as a 1-master/2-node k8s setup, still using the same deployer
    • Logs:
    [root@deployer contrail-ansible-deployer]# cat install_k8s_3node.log 
    ...
    PLAY RECAP **********************************************************************************************************************************************************************************************************************************
    192.168.122.116            : ok=31   changed=15   unreachable=0    failed=0   
    192.168.122.146            : ok=23   changed=8    unreachable=0    failed=0   
    192.168.122.204            : ok=23   changed=8    unreachable=0    failed=0   
    localhost                  : ok=62   changed=4    unreachable=0    failed=0  
    
    [root@deployer contrail-ansible-deployer]# cat install_contrail_3node.log
    ...
    PLAY RECAP **********************************************************************************************************************************************************************************************************************************
    192.168.122.116            : ok=76   changed=45   unreachable=0    failed=0   
    192.168.122.146            : ok=37   changed=17   unreachable=0    failed=0   
    192.168.122.204            : ok=37   changed=17   unreachable=0    failed=0   
    localhost                  : ok=66   changed=4    unreachable=0    failed=0   
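Both recaps show `unreachable=0` and `failed=0` for every host, which is what a clean run looks like. For longer logs, a small sketch can flag bad hosts instead of reading the recap by eye. It assumes the `host : ok=N changed=N unreachable=N failed=N` recap format shown above; `recap_failures` is a hypothetical helper, not part of contrail-ansible-deployer.

```shell
# Hypothetical helper: print every PLAY RECAP line in an ansible log whose
# unreachable or failed count is non-zero.
# Returns 0 when all hosts are clean, 1 when some host failed, 2 on grep error.
recap_failures() {
    grep -E ': ok=[0-9]+ .*(unreachable|failed)=[1-9]' "$1"
    case $? in
        0) return 1 ;;  # at least one recap line reports failures
        1) return 0 ;;  # no failing host found
        *) return 2 ;;  # grep error, e.g. the log file is missing
    esac
}

# Usage on the deployer:
#   recap_failures install_contrail_3node.log && echo "all hosts OK"
```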
    
    
    
    • The new master was NotReady, so I checked the kubelet status
    [root@master02 ~]# systemctl status kubelet
    ● kubelet.service - kubelet: The Kubernetes Node Agent
       Loaded: loaded (/usr/lib/systemd/system/kubelet.service; enabled; vendor preset: disabled)
      Drop-In: /usr/lib/systemd/system/kubelet.service.d
               └─10-kubeadm.conf
       Active: active (running) since Wed 2020-03-18 16:04:35 +08; 32min ago
         Docs: https://kubernetes.io/docs/
     Main PID: 18801 (kubelet)
        Tasks: 20
       Memory: 60.3M
       CGroup: /system.slice/kubelet.service
               └─18801 /usr/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --config=/var/lib/kubelet/config.yaml --cgroup-driver=cgroupfs --network-plugin=cni
    
    Mar 18 16:36:51 master02 kubelet[18801]: W0318 16:36:51.929447   18801 cni.go:188] Unable to update cni config: No networks found in /etc/cni/net.d
    Mar 18 16:36:51 master02 kubelet[18801]: E0318 16:36:51.929572   18801 kubelet.go:2167] Container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready...fig uninitialized
    Mar 18 16:36:56 master02 kubelet[18801]: W0318 16:36:56.930736   18801 cni.go:188] Unable to update cni config: No networks found in /etc/cni/net.d
    

    The master indeed had no /etc/cni/net.d directory, so I copied the CNI config over from node02

    [root@master02 ~]# mkdir -p /etc/cni/net.d/
    [root@master02 ~]# scp root@node02:/etc/cni/net.d/10-contrail.conf /etc/cni/net.d/10-contrail.conf
    
    [root@master02 ~]# systemctl restart kubelet
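The kubelet warning "No networks found in /etc/cni/net.d" means its CNI plugin found no usable config file there. A minimal sketch of a pre-restart check, assuming (to my understanding of kubelet's CNI loader) that it picks up files ending in `.conf`, `.conflist` or `.json`; `has_cni_config` is a hypothetical helper name.

```shell
# Hypothetical helper: succeed if the CNI config directory (default
# /etc/cni/net.d) holds at least one file with an extension kubelet's CNI
# plugin is assumed to load (.conf, .conflist or .json).
has_cni_config() {
    cni_dir=${1:-/etc/cni/net.d}
    for f in "$cni_dir"/*.conf "$cni_dir"/*.conflist "$cni_dir"/*.json; do
        # An unmatched glob stays literal, so check that the file really exists
        if [ -f "$f" ]; then
            return 0
        fi
    done
    return 1
}

# Usage on master02 before restarting kubelet:
#   has_cni_config || echo "no CNI config yet; copy 10-contrail.conf from a working node"
```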
    

    Problem solved

    [root@master02 ~]# kubectl get node
    NAME                    STATUS   ROLES    AGE   VERSION
    localhost.localdomain   Ready    <none>   35m   v1.12.9
    master02                Ready    master   35m   v1.12.9
    node03                  Ready    <none>   35m   v1.12.9
    [root@master02 ~]# 
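To spot a NotReady node like the one above without reading the table by eye, a one-line filter over `kubectl get node` output is enough. This is a sketch; `not_ready_nodes` is a hypothetical name, and it assumes the default column layout shown above (NAME, STATUS, ...). A cordoned node reports `Ready,SchedulingDisabled` and would also be listed.

```shell
# Hypothetical helper: read `kubectl get node` output on stdin and print the
# name of every node whose STATUS column is not exactly "Ready".
not_ready_nodes() {
    awk 'NR > 1 && $2 != "Ready" { print $1 }'
}

# Usage on the master:
#   kubectl get node | not_ready_nodes
```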
    
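    • If the same deployer is used to deploy two environments, the web UI shows a warning when you open it

    • The fix is described here

    • The pods are now running normally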

    [root@master02 ~]# kubectl get pods -n kube-system -o wide
    NAME                                    READY   STATUS    RESTARTS   AGE   IP                NODE                    NOMINATED NODE
    coredns-85c98899b4-4vgk4                1/1     Running   0          69m   10.47.255.252     node03                  <none>
    coredns-85c98899b4-thpz6                1/1     Running   0          69m   10.47.255.251     localhost.localdomain   <none>
    etcd-master02                           1/1     Running   0          55m   192.168.122.116   master02                <none>
    kube-apiserver-master02                 1/1     Running   0          55m   192.168.122.116   master02                <none>
    kube-controller-manager-master02        1/1     Running   0          55m   192.168.122.116   master02                <none>
    kube-proxy-6sp2n                        1/1     Running   0          69m   192.168.122.116   master02                <none>
    kube-proxy-8gpgd                        1/1     Running   0          69m   192.168.122.204   node03                  <none>
    kube-proxy-wtvhd                        1/1     Running   0          69m   192.168.122.146   localhost.localdomain   <none>
    kube-scheduler-master02                 1/1     Running   0          55m   192.168.122.116   master02                <none>
    kubernetes-dashboard-76456c6d4b-9s6vc   1/1     Running   0          69m   192.168.122.204   node03                  <none>
    [root@master02 ~]# 
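The same idea works for pods: a pod is only healthy when STATUS is `Running` and the READY column shows all containers ready (for example `1/1`). A sketch, assuming the single-namespace output format above where the first column is NAME (with `--all-namespaces` the columns shift by one); `unhealthy_pods` is a hypothetical helper. Pods of finished Jobs (`Completed`) would also be reported.

```shell
# Hypothetical helper: read `kubectl get pods` output (single namespace, so
# the first column is NAME) on stdin and print "NAME STATUS READY" for every
# pod that is not Running or has fewer ready containers than total.
unhealthy_pods() {
    awk 'NR > 1 {
        split($2, ready, "/")   # READY column looks like "1/1"
        if ($3 != "Running" || ready[1] != ready[2])
            print $1, $3, $2
    }'
}

# Usage on the master:
#   kubectl get pods -n kube-system -o wide | unhealthy_pods
```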
    

          Article link: https://www.haomeiwen.com/subject/mwsmshtx.html