Notes on statically mounting Alibaba Cloud OSS volumes in k3s/k8s

Author: xun66 | Published 2023-08-18 20:10

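Background

While building a project on k3s, I needed to use OSS as a PV over the Alibaba Cloud internal network. OSS volumes have mediocre read/write performance, so they are better suited to workloads that read the data once and never write to it.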

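Conclusions first

  • The official Alibaba Cloud documentation only covers using OSS on its own clusters such as ACK; on other machines (for example a plain ECS instance) the CSI plugin has to be installed manually.
  • The kubernetes-sigs/alibaba-cloud-csi-driver project contains sample YAML files, split into ecs and nonecs variants (the difference is support for cloud disk volumes, which this post does not cover). They are the main reference here.
  • There are some older write-ups online about using OSS from k8s; they are dated but still work (verified in August 2023).
  • This post adds the differences between k3s and k8s, installation notes for the newer version, and the troubleshooting process.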

Differences on k3s

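  • On k3s the selector field could not be used in the PVC spec; the PVC has to be tied to the PV through storageClassName instead. A likely reason is that k3s ships a default StorageClass (local-path), which is applied to any PVC that omits storageClassName.
  • The k3s local-path StorageClass only accepts ReadWriteOnce in accessModes. (ReadWriteOnce restricts the volume to being mounted by one node at a time; it does not restrict access to a single pod.)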
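
The two constraints above can be checked before the manifests are applied. The following is a minimal sketch in Python, assuming the PV and PVC have already been loaded into plain dictionaries. The function name `can_bind_statically` is hypothetical, and the rules are a deliberate simplification for this post's setup, not the actual logic of the Kubernetes volume binder.

```python
def can_bind_statically(pv: dict, pvc: dict) -> list:
    """Return a list of problems that would keep the PVC from binding to the PV.

    Simplified rules (assumptions for illustration only):
    - both objects must name the same storageClassName
    - every access mode the PVC requests must be offered by the PV
    - the PVC must not rely on a selector (avoided on k3s in this post)
    """
    problems = []
    pv_spec = pv.get("spec", {})
    pvc_spec = pvc.get("spec", {})

    pv_class = pv_spec.get("storageClassName")
    pvc_class = pvc_spec.get("storageClassName")
    if not pvc_class:
        problems.append("PVC has no storageClassName")
    elif pv_class != pvc_class:
        problems.append(f"storageClassName mismatch: {pv_class!r} vs {pvc_class!r}")

    missing = set(pvc_spec.get("accessModes", [])) - set(pv_spec.get("accessModes", []))
    if missing:
        problems.append(f"PV does not offer access modes: {sorted(missing)}")

    if "selector" in pvc_spec:
        problems.append("PVC uses a selector")
    return problems


# Values mirror the oss-pv.yml manifest used later in this post.
pv = {"spec": {"storageClassName": "oss-pv-oss", "accessModes": ["ReadWriteOnce"]}}
pvc = {"spec": {"storageClassName": "oss-pv-oss", "accessModes": ["ReadWriteOnce"]}}
print(can_bind_statically(pv, pvc))
```

An empty list means none of the simplified checks failed; it does not guarantee that the cluster will bind the claim.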

Process

The process follows a CSDN blog post (related link 2) and the official Alibaba Cloud tutorial (related link 3).

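1. Install ossfs

Find the download link on the Alibaba Cloud ossfs installation page and download the build that matches your platform.

Note: either quote the URL or strip its query string. Otherwise the shell interprets the & characters in the URL and the download breaks.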

# Download
wget 'https://gosspublic.alicdn.com/ossfs/ossfs_1.91.1_ubuntu22.04_amd64.deb'
# Install (the file name must match the package downloaded above)
sudo apt-get update
sudo apt-get install gdebi-core
sudo gdebi ossfs_1.91.1_ubuntu22.04_amd64.deb
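
The quoting note above can be illustrated with a short Python sketch that shows both options: quoting the URL for the shell and dropping the query string. The helper name `strip_query` and the query parameters in the example URL are made up for illustration; only the base download URL comes from this post.

```python
import shlex
from urllib.parse import urlsplit, urlunsplit


def strip_query(url: str) -> str:
    """Drop the query string and fragment so the URL no longer contains '&'."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


# The query parameters below are hypothetical, only to show the effect.
example = (
    "https://gosspublic.alicdn.com/ossfs/ossfs_1.91.1_ubuntu22.04_amd64.deb"
    "?spm=abc&file=ossfs.deb"
)

# Option 1: keep the full URL but quote it so the shell does not act on '&'.
print("wget " + shlex.quote(example))

# Option 2: remove the query string altogether.
print("wget " + strip_query(example))
```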

2. Prepare the YAML files

  • oss-secret.yml
apiVersion: v1
kind: Secret
metadata:
  name: oss-secret
  namespace: default
stringData:
  akId: <yourAccessKey ID>
  akSecret: <yourAccessKey Secret>
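  • oss-sci-rbac.yml
    The content is long, so download it directly from GitHub. If GitHub is unreachable, the GitCode mirror can be used instead. The -O option saves the file under the name used in step 3.
wget -O oss-sci-rbac.yml https://raw.githubusercontent.com/kubernetes-sigs/alibaba-cloud-csi-driver/master/deploy/nonecs/rbac.yaml
# Mirror for mainland China (use instead of the command above)
wget -O oss-sci-rbac.yml https://gitcode.net/mirrors/kubernetes-sigs/alibaba-cloud-csi-driver/-/raw/master/deploy/nonecs/rbac.yaml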
  • csi-plugin.yml
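    This is a merge of the official Alibaba Cloud csi-driver.yaml and csi-plugin.yml, with the NAS mounting logic removed. The volumes section lists many directories, noticeably more than the version in the article from three years ago; some of them can probably still be removed.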
apiVersion: storage.k8s.io/v1
kind: CSIDriver
metadata:
  name: ossplugin.csi.alibabacloud.com
spec:
  attachRequired: false
  podInfoOnMount: true
---
kind: DaemonSet
apiVersion: apps/v1
metadata:
  name: csi-plugin
  namespace: kube-system
spec:
  selector:
    matchLabels:
      app: csi-plugin
  template:
    metadata:
      labels:
        app: csi-plugin
    spec:
      tolerations:
        - operator: Exists
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - key: type
                    operator: NotIn
                    values:
                      - virtual-kubelet
      nodeSelector:
        kubernetes.io/os: linux
      serviceAccount: csi-admin
      priorityClassName: system-node-critical
      hostNetwork: true
      hostPID: true
      dnsPolicy: ClusterFirst
      containers:
        - name: oss-driver-registrar
          image: registry-cn-hangzhou.ack.aliyuncs.com/acs/csi-node-driver-registrar:v2.3.1-038aeb6-aliyun
          resources:
            requests:
              cpu: 10m
              memory: 16Mi
            limits:
              cpu: 500m
              memory: 1024Mi
          args:
            - "--v=5"
            - "--csi-address=/var/lib/kubelet/csi-plugins/ossplugin.csi.alibabacloud.com/csi.sock"
            - "--kubelet-registration-path=/var/lib/kubelet/csi-plugins/ossplugin.csi.alibabacloud.com/csi.sock"
          volumeMounts:
            - name: kubelet-dir
              mountPath: /var/lib/kubelet/
            - name: registration-dir
              mountPath: /registration
        - name: csi-plugin
          securityContext:
            privileged: true
            allowPrivilegeEscalation: true
          image: registry-cn-hangzhou.ack.aliyuncs.com/acs/csi-plugin:v1.24.9-74f8490-aliyun
          args:
            - "--endpoint=$(CSI_ENDPOINT)"
            - "--v=2"
            - "--nodeid=$(KUBE_NODE_NAME)"
            - "--driver=oss"
          env:
            - name: KUBE_NODE_NAME
              valueFrom:
                fieldRef:
                  apiVersion: v1
                  fieldPath: spec.nodeName
            - name: CSI_ENDPOINT
              value: unix://var/lib/kubelet/csi-plugins/driverplugin.csi.alibabacloud.com-replace/csi.sock
            - name: MAX_VOLUMES_PERNODE
              value: "15"
            - name: SERVICE_TYPE
              value: "plugin"
          resources:
            requests:
              cpu: 100m
              memory: 128Mi
            limits:
              cpu: 500m
              memory: 1024Mi
          livenessProbe:
            httpGet:
              path: /healthz
              port: healthz
              scheme: HTTP
            initialDelaySeconds: 10
            periodSeconds: 30
            timeoutSeconds: 5
            failureThreshold: 5
          readinessProbe:
            httpGet:
              path: /healthz
              port: healthz
            initialDelaySeconds: 10
            periodSeconds: 30
            timeoutSeconds: 5
            failureThreshold: 5
          ports:
            - name: healthz
              containerPort: 11260
          volumeMounts:
            - name: kubelet-dir
              mountPath: /var/lib/kubelet/
              mountPropagation: "Bidirectional"
            - name: etc
              mountPath: /host/etc
            - name: host-log
              mountPath: /var/log/
            - name: ossconnectordir
              mountPath: /host/usr/
            - name: container-dir
              mountPath: /var/lib/container
              mountPropagation: "Bidirectional"
            - name: host-dev
              mountPath: /dev
              mountPropagation: "HostToContainer"
            - mountPath: /var/addon
              name: addon-token
              readOnly: true
            - mountPath: /host/var/run/
              name: fuse-metrics-dir
            - mountPath: /etc/csi-plugin/config
              name: csi-plugin-cm
            - name: host-mnt
              mountPath: /mnt
              mountPropagation: "Bidirectional"
      volumes:
        - name: fuse-metrics-dir
          hostPath:
            path: /var/run/
            type: DirectoryOrCreate
        - name: registration-dir
          hostPath:
            path: /var/lib/kubelet/plugins_registry
            type: DirectoryOrCreate
        - name: container-dir
          hostPath:
            path: /var/lib/container
            type: DirectoryOrCreate
        - name: kubelet-dir
          hostPath:
            path: /var/lib/kubelet
            type: Directory
        - name: host-dev
          hostPath:
            path: /dev
        - name: host-log
          hostPath:
            path: /var/log/
        - name: etc
          hostPath:
            path: /etc
        - name: ossconnectordir
          hostPath:
            path: /usr/
        - name: host-mnt
          hostPath:
            path: /mnt
            type: DirectoryOrCreate
        - name: csi-plugin-cm
          configMap:
            name: csi-plugin
            optional: true
        - name: addon-token
          secret:
            defaultMode: 420
            optional: true
            items:
              - key: addon.token.config
                path: token-config
            secretName: addon.csi.token
  updateStrategy:
    rollingUpdate:
      maxUnavailable: 20%
    type: RollingUpdate
  • oss-pv.yml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: oss-pv-oss
  labels:
    alicloud-pvname: oss-pv-oss
spec:
  storageClassName: oss-pv-oss # <-- required on k3s
  capacity:
    storage: 2Gi
  accessModes:
    - ReadWriteOnce # <-- on k3s, ReadWriteMany is not supported (see the k3s notes)
  persistentVolumeReclaimPolicy: Retain
  csi:
    driver: ossplugin.csi.alibabacloud.com
    volumeHandle: oss-pv-oss
    nodePublishSecretRef:
      name: oss-secret
      namespace: default
    volumeAttributes:
      bucket: "<bucket name>" # <-- replace with your bucket name
      url: "http://oss-cn-shanghai-internal.aliyuncs.com" # <-- replace with your bucket endpoint; drop -internal outside the internal network
      otherOpts: "-o max_stat_cache_size=0 -o allow_other"
      path: "/model/embedding-models/"
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: oss-pvc-oss
spec:
  storageClassName: oss-pv-oss # <-- required on k3s
  accessModes:
    - ReadWriteOnce # <-- on k3s, ReadWriteMany is not supported (see the k3s notes)
  resources:
    requests:
      storage: 2Gi
  # selector: <-- must be commented out on k3s
  #   matchLabels:
  #     alicloud-pvname: oss-pv-oss
  • nginx-deployment.yml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
spec:
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.7.9
        ports:
        - containerPort: 80
        volumeMounts:
          - name: oss-pvc
            mountPath: "/data"
      volumes:
        - name: oss-pvc
          persistentVolumeClaim:
            claimName: oss-pvc-oss
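
The url attribute in oss-pv.yml differs between internal and public access only by the -internal suffix. As a small sketch, assuming the endpoint pattern seen in that manifest (oss-<region>-internal.aliyuncs.com), the value could be derived as follows. The function name `oss_endpoint` and its parameters are hypothetical; check the endpoint shown for your own bucket before relying on it.

```python
def oss_endpoint(region: str, internal: bool = True, scheme: str = "http") -> str:
    """Build an OSS endpoint URL for the `url` volume attribute.

    Assumes the naming pattern used in this post's manifest;
    `internal=False` drops the "-internal" suffix for access from outside
    the Alibaba Cloud internal network.
    """
    suffix = "-internal" if internal else ""
    return f"{scheme}://oss-{region}{suffix}.aliyuncs.com"


print(oss_endpoint("cn-shanghai"))                  # internal network
print(oss_endpoint("cn-shanghai", internal=False))  # public network
```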

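3. Create the resources

Apply the files in the order below. If they all sit in one directory, you can pass the directory name to -f instead.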

kubectl apply -f oss-secret.yml
kubectl apply -f oss-sci-rbac.yml
kubectl apply -f csi-plugin.yml
kubectl apply -f oss-pv.yml
kubectl apply -f nginx-deployment.yml

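4. Wait and verify resource status

After applying, allow 5-10 minutes and use kubectl get events in the meantime to check for error events. Some errors are expected along the way, such as rpc error messages while the CSIDriver is not ready yet, or messages saying the PVC does not exist; wait a while and check again. Then run kubectl get pod,pv,pvc to see the result: if nginx is Running, everything is fine. You can also use kubectl exec to enter the nginx container and check that /data/ has the expected content.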

root@iZuf6d2kzza6r37xg9w6yvZ:~/kube# kube get pod,pv,pvc,CSIDriver
NAME                                    READY   STATUS    RESTARTS   AGE
pod/nginx-deployment-5f75c98766-wjlvl   1/1     Running   0          32m

NAME                          CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                 STORAGECLASS   REASON   AGE
persistentvolume/oss-pv-oss   2Gi        RWO            Retain           Bound    default/oss-pvc-oss   oss-pv-oss              32m

NAME                                STATUS   VOLUME       CAPACITY   ACCESS MODES   STORAGECLASS   AGE
persistentvolumeclaim/oss-pvc-oss   Bound    oss-pv-oss   2Gi        RWO            oss-pv-oss     32m

NAME                                                      ATTACHREQUIRED   PODINFOONMOUNT   STORAGECAPACITY   TOKENREQUESTS   REQUIRESREPUBLISH   MODES        AGE
csidriver.storage.k8s.io/ossplugin.csi.alibabacloud.com   false            true             false             <unset>         false               Persistent   32m
# Check the status of the CSI plugin pod
root@iZuf6d2kzza6r37xg9w6yvZ:~/kube# kube get pods -n kube-system
NAME                                     READY   STATUS      RESTARTS      AGE
helm-install-traefik-s8nkx               0/1     Completed   1             38d
helm-install-traefik-crd-k5p96           0/1     Completed   0             38d
svclb-traefik-25217d25-9k424             2/2     Running     8 (74m ago)   38d
traefik-64f55bb67d-lxqfv                 1/1     Running     4 (74m ago)   38d
coredns-77ccd57875-nh5vz                 1/1     Running     4 (74m ago)   38d
local-path-provisioner-957fdf8bc-zwxqj   1/1     Running     7 (73m ago)   38d
metrics-server-648b5df564-9drvq          1/1     Running     7 (73m ago)   38d
csi-plugin-q4ns4                         2/2     Running     0             32m
# ^ csi-plugin-q4ns4 is the CSI OSS plugin pod
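
While waiting, the same check can be automated. The sketch below assumes the JSON printed by `kubectl get pod,pv,pvc -o json` (a List whose items carry `kind`, `metadata.name` and `status.phase`) and reports which objects have not reached the expected phase. The function name `not_ready` is hypothetical, and the sample input is hand-written for illustration, not real cluster output.

```python
import json


def not_ready(items_json: str) -> list:
    """Return 'Kind/name: phase' for objects not yet in the expected phase.

    Expects the JSON from `kubectl get pod,pv,pvc -o json`.
    Note: a Pod in phase Running may still have containers that are not ready.
    """
    wanted = {
        "Pod": "Running",
        "PersistentVolume": "Bound",
        "PersistentVolumeClaim": "Bound",
    }
    pending = []
    for item in json.loads(items_json).get("items", []):
        kind = item.get("kind")
        phase = item.get("status", {}).get("phase")
        if kind in wanted and phase != wanted[kind]:
            pending.append(f"{kind}/{item['metadata']['name']}: {phase}")
    return pending


# Hand-written sample; the Pending PVC is made up to show a non-empty result.
sample = json.dumps({
    "kind": "List",
    "items": [
        {"kind": "Pod", "metadata": {"name": "nginx-deployment-5f75c98766-wjlvl"},
         "status": {"phase": "Running"}},
        {"kind": "PersistentVolume", "metadata": {"name": "oss-pv-oss"},
         "status": {"phase": "Bound"}},
        {"kind": "PersistentVolumeClaim", "metadata": {"name": "oss-pvc-oss"},
         "status": {"phase": "Pending"}},
    ],
})
print(not_ready(sample))
```

To use it against a real cluster, the kubectl output could be read from standard input or captured with `subprocess.run` and passed to the function.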

5. Troubleshooting

If something goes wrong, combine the following commands to investigate:

kubectl describe pod <pod-name>
kubectl describe pvc <pvc-name>
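kubectl get events # these events also appear in the describe output
  • Configuration errors or unsupported features (AccessMode and so on) usually show up in the events of describe pvc.
  • If you see a FailedMount event (example below): it can also appear while the CSI plugin has not fully started. To find the cause, follow the official guide: paste the command that follows "--" into a shell on the node and run it. It will report the actual error (for example a malformed endpoint URL or a missing libssl dependency).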

Warning FailedMount 3s kubelet MountVolume.SetUp failed for volume "<PV_NAME>" : rpc error: code = Unknown desc = Mount is failed in host, mntCmd:systemd-run --scope -- /usr/local/bin/ossfs xxx:/path/xxx /var/lib/kubelet/pods/pod_uid_xxxx/volumes/kubernetes.io~csi/pv_name_xxx/mount -ourl=oss-cn-beijing-internal.aliyuncs.com -o allow_other , err: error_message_xxx with error: exit status 1
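
Picking the command out of a long event message by hand is error-prone, so here is a minimal Python sketch that extracts it. It assumes the message has the shape of the example above (`mntCmd:systemd-run --scope -- <command> , err: ...`); the function name `extract_mount_command` is hypothetical, and other plugin versions may format the message differently.

```python
import re


def extract_mount_command(event_message: str):
    """Return the command after ' -- ' in a FailedMount message, or None.

    Assumes the 'mntCmd:... -- <command> , err:' layout shown in this post.
    """
    match = re.search(r"mntCmd:.*? -- (.+?)\s*, err:", event_message)
    return match.group(1).strip() if match else None


# The example event from this post, placeholders included.
sample_event = (
    'MountVolume.SetUp failed for volume "<PV_NAME>" : rpc error: code = Unknown '
    "desc = Mount is failed in host, mntCmd:systemd-run --scope -- "
    "/usr/local/bin/ossfs xxx:/path/xxx "
    "/var/lib/kubelet/pods/pod_uid_xxxx/volumes/kubernetes.io~csi/pv_name_xxx/mount "
    "-ourl=oss-cn-beijing-internal.aliyuncs.com -o allow_other , "
    "err: error_message_xxx with error: exit status 1"
)
print(extract_mount_command(sample_event))
```

The printed command is what would be pasted into a shell on the node to reproduce the mount error.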

Todo

  1. Move the CSI plugin, RBAC roles, and secret credentials into a separate namespace (related link 1)
  2. Encrypt the secret if necessary

Related links

  1. https://blog.etby.org/2020/04/28/k8s-aliyun-oss/
  2. https://blog.csdn.net/weixin_40449300/article/details/106938845
  3. https://help.aliyun.com/zh/ack/ack-managed-and-ack-dedicated/user-guide/mount-statically-provisioned-oss-volumes
  4. https://github.com/kubernetes-sigs/alibaba-cloud-csi-driver

Keywords

k8s k3s oss csi Alibaba Cloud volume mounting
