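Background
While building a project on k3s, I needed to mount Alibaba Cloud OSS as a PV over the internal network. OSS volumes have mediocre read/write performance and are better suited to read-once workloads that do not write.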
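Conclusions first
- The official Alibaba Cloud documentation only covers using OSS on its own clusters such as ACK; on other machines (for example a plain ECS instance) the CSI plugin has to be installed by hand.
- The kubernetes-sigs/alibaba-cloud-csi-driver project contains sample yml files from Alibaba Cloud (split into ecs and nonecs variants, which differ in cloud-disk volume support; this post does not cover that). They are the main reference.
- There are some write-ups online about using OSS with k8s. They are dated but still work (verified in August 2023).
- This post adds the differences between k3s and k8s, an installation log for the newer version, and the troubleshooting process.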
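How k3s differs
- On k3s, the PVC spec may not contain a selector field; bind the PVC to the PV through storageClassName instead.
- The k3s local-path storage type only accepts ReadWriteOnce in accessModes (ReadWriteOnce limits the volume to being mounted on one node at a time; note that it does not limit access to one pod).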
Walkthrough
The process follows a CSDN blog post (related link 2) and the official Alibaba Cloud tutorial (related link 3).
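1. Install ossfs
Find the download link on the Alibaba Cloud ossfs installation page and download the build for your platform.
Note: quote the URL or strip its GET parameters, otherwise the shell interprets the & inside the URL and the download breaks.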
# Download
wget 'https://gosspublic.alicdn.com/ossfs/ossfs_1.91.1_ubuntu22.04_amd64.deb'
# Install (the file name must match the package downloaded above)
sudo apt-get update
sudo apt-get install gdebi-core
sudo gdebi ossfs_1.91.1_ubuntu22.04_amd64.deb
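The quoting advice above can be sketched as a small helper that removes the query string before the URL reaches wget. This is only an illustration: strip_query is a name made up here, and the example URL with GET parameters is hypothetical, not a real download link.

```shell
#!/bin/sh
# strip_query (hypothetical helper): print a URL without its query string,
# i.e. without everything from the first "?" onward.
strip_query() {
  printf '%s\n' "${1%%[?]*}"
}

# Hypothetical URL with GET parameters, for illustration only.
url='https://example.com/ossfs.deb?spm=a.b&file=ossfs.deb'
clean_url=$(strip_query "$url")
echo "$clean_url"

# Either form is safe to pass to wget:
#   wget "$url"         # quoted, so "&" stays part of the URL
#   wget "$clean_url"   # query string removed
```

Without the quotes, the shell treats & as "run in the background", so wget would only receive the part of the URL before it.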
2. Prepare the yml files
- oss-secret.yml
apiVersion: v1
kind: Secret
metadata:
  name: oss-secret
  namespace: default
stringData:
  akId: <yourAccessKey ID>
  akSecret: <yourAccessKey Secret>
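If you would rather not keep the AccessKey in a file on disk, the same Secret can be generated on the fly. The sketch below is an assumption-laden illustration: render_oss_secret and the variable names OSS_AK_ID / OSS_AK_SECRET are made up here, and it assumes the key values contain no double quotes or backslashes.

```shell
#!/bin/sh
# render_oss_secret (hypothetical helper): print the oss-secret manifest
# with the AccessKey ID and Secret taken from the two arguments.
render_oss_secret() {
  ak_id=$1
  ak_secret=$2
  cat <<EOF
apiVersion: v1
kind: Secret
metadata:
  name: oss-secret
  namespace: default
stringData:
  akId: "${ak_id}"
  akSecret: "${ak_secret}"
EOF
}

# Usage (OSS_AK_ID and OSS_AK_SECRET are hypothetical variable names):
#   render_oss_secret "$OSS_AK_ID" "$OSS_AK_SECRET" | kubectl apply -f -
```

Piping into kubectl apply -f - creates the same object as applying oss-secret.yml, so the later steps are unchanged.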
- oss-sci-rbac.yml
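The file is long, so download it directly from GitHub. If GitHub is unreachable, the GitCode mirror also works. The -O option saves the download under the file name listed above.
wget -O oss-sci-rbac.yml https://raw.githubusercontent.com/kubernetes-sigs/alibaba-cloud-csi-driver/master/deploy/nonecs/rbac.yaml
# Mirror for users in mainland China
wget -O oss-sci-rbac.yml https://gitcode.net/mirrors/kubernetes-sigs/alibaba-cloud-csi-driver/-/raw/master/deploy/nonecs/rbac.yaml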
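The "try GitHub, then the mirror" choice can also be scripted. This is a minimal sketch, not part of the original procedure; fetch_with_fallback is a hypothetical helper name, and the timeout and retry values are arbitrary.

```shell
#!/bin/sh
# fetch_with_fallback OUTPUT URL... (hypothetical helper):
# try each URL in order and stop at the first successful download.
fetch_with_fallback() {
  out=$1
  shift
  for url in "$@"; do
    # -q quiet, -T 20 network timeout in seconds, -t 2 attempts, -O output file
    if wget -q -T 20 -t 2 -O "$out" "$url"; then
      echo "downloaded from $url"
      return 0
    fi
  done
  echo "all sources failed" >&2
  return 1
}

# Usage with the two sources named in this post:
#   fetch_with_fallback oss-sci-rbac.yml \
#     https://raw.githubusercontent.com/kubernetes-sigs/alibaba-cloud-csi-driver/master/deploy/nonecs/rbac.yaml \
#     https://gitcode.net/mirrors/kubernetes-sigs/alibaba-cloud-csi-driver/-/raw/master/deploy/nonecs/rbac.yaml
```

Writing to oss-sci-rbac.yml keeps the file name in line with the kubectl apply step later on.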
- csi-plugin.yml
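This is a merge of the official Alibaba Cloud csi-driver.yaml and csi-plugin.yml, with the NAS mount logic removed. The volumes section lists many directories, considerably more than the version in the article from three years ago; some of them could probably still be removed.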
apiVersion: storage.k8s.io/v1
kind: CSIDriver
metadata:
  name: ossplugin.csi.alibabacloud.com
spec:
  attachRequired: false
  podInfoOnMount: true
---
kind: DaemonSet
apiVersion: apps/v1
metadata:
  name: csi-plugin
  namespace: kube-system
spec:
  selector:
    matchLabels:
      app: csi-plugin
  template:
    metadata:
      labels:
        app: csi-plugin
    spec:
      tolerations:
        - operator: Exists
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - key: type
                    operator: NotIn
                    values:
                      - virtual-kubelet
      nodeSelector:
        kubernetes.io/os: linux
      serviceAccount: csi-admin
      priorityClassName: system-node-critical
      hostNetwork: true
      hostPID: true
      dnsPolicy: ClusterFirst
      containers:
        - name: oss-driver-registrar
          image: registry-cn-hangzhou.ack.aliyuncs.com/acs/csi-node-driver-registrar:v2.3.1-038aeb6-aliyun
          resources:
            requests:
              cpu: 10m
              memory: 16Mi
            limits:
              cpu: 500m
              memory: 1024Mi
          args:
            - "--v=5"
            - "--csi-address=/var/lib/kubelet/csi-plugins/ossplugin.csi.alibabacloud.com/csi.sock"
            - "--kubelet-registration-path=/var/lib/kubelet/csi-plugins/ossplugin.csi.alibabacloud.com/csi.sock"
          volumeMounts:
            - name: kubelet-dir
              mountPath: /var/lib/kubelet/
            - name: registration-dir
              mountPath: /registration
        - name: csi-plugin
          securityContext:
            privileged: true
            allowPrivilegeEscalation: true
          image: registry-cn-hangzhou.ack.aliyuncs.com/acs/csi-plugin:v1.24.9-74f8490-aliyun
          args:
            - "--endpoint=$(CSI_ENDPOINT)"
            - "--v=2"
            - "--nodeid=$(KUBE_NODE_NAME)"
            - "--driver=oss"
          env:
            - name: KUBE_NODE_NAME
              valueFrom:
                fieldRef:
                  apiVersion: v1
                  fieldPath: spec.nodeName
            - name: CSI_ENDPOINT
              value: unix://var/lib/kubelet/csi-plugins/driverplugin.csi.alibabacloud.com-replace/csi.sock
            - name: MAX_VOLUMES_PERNODE
              value: "15"
            - name: SERVICE_TYPE
              value: "plugin"
          resources:
            requests:
              cpu: 100m
              memory: 128Mi
            limits:
              cpu: 500m
              memory: 1024Mi
          livenessProbe:
            httpGet:
              path: /healthz
              port: healthz
              scheme: HTTP
            initialDelaySeconds: 10
            periodSeconds: 30
            timeoutSeconds: 5
            failureThreshold: 5
          readinessProbe:
            httpGet:
              path: /healthz
              port: healthz
            initialDelaySeconds: 10
            periodSeconds: 30
            timeoutSeconds: 5
            failureThreshold: 5
          ports:
            - name: healthz
              containerPort: 11260
          volumeMounts:
            - name: kubelet-dir
              mountPath: /var/lib/kubelet/
              mountPropagation: "Bidirectional"
            - name: etc
              mountPath: /host/etc
            - name: host-log
              mountPath: /var/log/
            - name: ossconnectordir
              mountPath: /host/usr/
            - name: container-dir
              mountPath: /var/lib/container
              mountPropagation: "Bidirectional"
            - name: host-dev
              mountPath: /dev
              mountPropagation: "HostToContainer"
            - mountPath: /var/addon
              name: addon-token
              readOnly: true
            - mountPath: /host/var/run/
              name: fuse-metrics-dir
            - mountPath: /etc/csi-plugin/config
              name: csi-plugin-cm
            - name: host-mnt
              mountPath: /mnt
              mountPropagation: "Bidirectional"
      volumes:
        - name: fuse-metrics-dir
          hostPath:
            path: /var/run/
            type: DirectoryOrCreate
        - name: registration-dir
          hostPath:
            path: /var/lib/kubelet/plugins_registry
            type: DirectoryOrCreate
        - name: container-dir
          hostPath:
            path: /var/lib/container
            type: DirectoryOrCreate
        - name: kubelet-dir
          hostPath:
            path: /var/lib/kubelet
            type: Directory
        - name: host-dev
          hostPath:
            path: /dev
        - name: host-log
          hostPath:
            path: /var/log/
        - name: etc
          hostPath:
            path: /etc
        - name: ossconnectordir
          hostPath:
            path: /usr/
        - name: host-mnt
          hostPath:
            path: /mnt
            type: DirectoryOrCreate
        - name: csi-plugin-cm
          configMap:
            name: csi-plugin
            optional: true
        - name: addon-token
          secret:
            defaultMode: 420
            optional: true
            items:
              - key: addon.token.config
                path: token-config
            secretName: addon.csi.token
  updateStrategy:
    rollingUpdate:
      maxUnavailable: 20%
    type: RollingUpdate
- oss-pv.yml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: oss-pv-oss
  labels:
    alicloud-pvname: oss-pv-oss
spec:
  storageClassName: oss-pv-oss # <-- required on k3s
  capacity:
    storage: 2Gi
  accessModes:
    - ReadWriteOnce # <-- k3s does not support ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  csi:
    driver: ossplugin.csi.alibabacloud.com
    volumeHandle: oss-pv-oss
    nodePublishSecretRef:
      name: oss-secret
      namespace: default
    volumeAttributes:
      bucket: "<bucket name>" # <-- replace with your bucket name
      url: "http://oss-cn-shanghai-internal.aliyuncs.com" # <-- replace with your bucket endpoint; drop -internal when not on the internal network
      otherOpts: "-o max_stat_cache_size=0 -o allow_other"
      path: "/model/embedding-models/"
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: oss-pvc-oss
spec:
  storageClassName: oss-pv-oss # <-- required on k3s
  accessModes:
    - ReadWriteOnce # <-- k3s does not support ReadWriteMany
  resources:
    requests:
      storage: 2Gi
  # selector:            <-- comment this out on k3s
  #   matchLabels:
  #     alicloud-pvname: oss-pv-oss
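The url attribute in the PV follows a simple pattern: the internal endpoint carries an -internal suffix after the region, and the public one does not. A small sketch of that rule, assuming the http://oss-<region>[-internal].aliyuncs.com form used in this post; oss_endpoint is a hypothetical helper name.

```shell
#!/bin/sh
# oss_endpoint REGION [internal|public] (hypothetical helper):
# print the OSS endpoint URL for a region, following the pattern used in oss-pv.yml.
oss_endpoint() {
  region=$1
  network=${2:-internal}
  case $network in
    internal) printf 'http://oss-%s-internal.aliyuncs.com\n' "$region" ;;
    public)   printf 'http://oss-%s.aliyuncs.com\n' "$region" ;;
    *) echo "unknown network type: $network" >&2; return 1 ;;
  esac
}

oss_endpoint cn-shanghai internal   # value used in this post (ECS in the same region)
oss_endpoint cn-shanghai public     # same bucket reached from outside the internal network
```

Check the endpoint shown on your bucket's overview page before relying on the generated value.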
- nginx-deployment.yml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
spec:
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
        - name: nginx
          image: nginx:1.7.9
          ports:
            - containerPort: 80
          volumeMounts:
            - name: oss-pvc
              mountPath: "/data"
      volumes:
        - name: oss-pvc
          persistentVolumeClaim:
            claimName: oss-pvc-oss
3. Create the resources
Apply the files in order. If they all sit in one directory, you can pass the directory name to -f instead.
kubectl apply -f oss-secret.yml
kubectl apply -f oss-sci-rbac.yml
kubectl apply -f csi-plugin.yml
kubectl apply -f oss-pv.yml
kubectl apply -f nginx-deployment.yml
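4. Wait and verify resource status
After applying, allow 5 to 10 minutes. In the meantime, run kubectl get events to check for error events. Some errors show up while the CSIDriver is not ready yet (rpc error) or because the PVC does not exist yet; wait a while and check again. Run kubectl get pod,pv,pvc to see the result: if nginx is Running, everything is fine. You can use kubectl exec to get into the nginx container and check that /data/ has the expected contents.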
root@iZuf6d2kzza6r37xg9w6yvZ:~/kube# kube get pod,pv,pvc,CSIDriver
NAME READY STATUS RESTARTS AGE
pod/nginx-deployment-5f75c98766-wjlvl 1/1 Running 0 32m
NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM STORAGECLASS REASON AGE
persistentvolume/oss-pv-oss 2Gi RWO Retain Bound default/oss-pvc-oss oss-pv-oss 32m
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
persistentvolumeclaim/oss-pvc-oss Bound oss-pv-oss 2Gi RWO oss-pv-oss 32m
NAME ATTACHREQUIRED PODINFOONMOUNT STORAGECAPACITY TOKENREQUESTS REQUIRESREPUBLISH MODES AGE
csidriver.storage.k8s.io/ossplugin.csi.alibabacloud.com false true false <unset> false Persistent 32m
# Check the status of the CSI plugin pod
root@iZuf6d2kzza6r37xg9w6yvZ:~/kube# kube get pods -n kube-system
NAME READY STATUS RESTARTS AGE
helm-install-traefik-s8nkx 0/1 Completed 1 38d
helm-install-traefik-crd-k5p96 0/1 Completed 0 38d
svclb-traefik-25217d25-9k424 2/2 Running 8 (74m ago) 38d
traefik-64f55bb67d-lxqfv 1/1 Running 4 (74m ago) 38d
coredns-77ccd57875-nh5vz 1/1 Running 4 (74m ago) 38d
local-path-provisioner-957fdf8bc-zwxqj 1/1 Running 7 (73m ago) 38d
metrics-server-648b5df564-9drvq 1/1 Running 7 (73m ago) 38d
csi-plugin-q4ns4 2/2 Running 0 32m # <-- this is the csi-oss plugin pod
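Instead of polling by hand, the waiting described in this step can be wrapped in one function. This is a sketch under assumptions: wait_for_oss_stack is a hypothetical name, the resource names are the ones from the yml files above, and the jsonpath form of kubectl wait needs a reasonably recent kubectl (it was added in v1.23, as far as I know).

```shell
#!/bin/sh
# wait_for_oss_stack [TIMEOUT] (hypothetical helper): block until the CSI plugin,
# the PVC and the nginx Deployment are ready, then list the mounted directory.
wait_for_oss_stack() {
  timeout=${1:-600s}
  # 1. CSI plugin DaemonSet rolled out on the nodes
  kubectl -n kube-system rollout status daemonset/csi-plugin --timeout="$timeout" &&
  # 2. PVC bound to the static PV
  kubectl wait --for=jsonpath='{.status.phase}'=Bound pvc/oss-pvc-oss --timeout="$timeout" &&
  # 3. nginx Deployment available, which implies the OSS volume was mounted
  kubectl rollout status deployment/nginx-deployment --timeout="$timeout" &&
  # 4. Show what the container sees under /data/
  kubectl exec deploy/nginx-deployment -- ls /data/
}

# Usage:
#   wait_for_oss_stack        # default timeout of 600s per step
#   wait_for_oss_stack 120s
```

If any step times out, the function stops there and returns a non-zero status, which is a good moment to move on to the troubleshooting commands.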
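5. Troubleshooting
If something goes wrong, combine the following commands to investigate:
kubectl describe pod <pod-name>
kubectl describe pvc <pvc-name>
kubectl get events # these events are also visible in the describe output
- A configuration error or an unsupported feature (AccessMode, etc.) usually shows up in the events of describe pvc.
- If you see a FailedMount event (example below): this event can also appear while the CSI plugin has not finished starting. To find the cause, follow the official guide: paste the command that comes after "--" into a shell on the node and run it. It prints the underlying error (for example a malformed endpoint url or a missing libssl dependency).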
Warning FailedMount 3s kubelet MountVolume.SetUp failed for volume "<PV_NAME>" : rpc error: code = Unknown desc = Mount is failed in host, mntCmd:systemd-run --scope -- /usr/local/bin/ossfs xxx:/path/xxx /var/lib/kubelet/pods/pod_uid_xxxx/volumes/kubernetes.io~csi/pv_name_xxx/mount -ourl=oss-cn-beijing-internal.aliyuncs.com -o allow_other , err: error_message_xxx with error: exit status 1
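Pulling the command out of the event by hand is error-prone, so here is a small sed sketch. It assumes the message has exactly the shape of the example above ("mntCmd:systemd-run --scope -- <command> , err: ..."); other plugin versions may word it differently. extract_mount_cmd is a hypothetical helper name.

```shell
#!/bin/sh
# extract_mount_cmd (hypothetical helper): read event text on stdin and print
# the ossfs command found between "systemd-run --scope -- " and " , err:".
extract_mount_cmd() {
  sed -n 's/.*mntCmd:systemd-run --scope -- \(.*\) , err:.*/\1/p'
}

# Sample taken from the FailedMount event shown in this post.
sample='rpc error: code = Unknown desc = Mount is failed in host, mntCmd:systemd-run --scope -- /usr/local/bin/ossfs xxx:/path/xxx /var/lib/kubelet/pods/pod_uid_xxxx/volumes/kubernetes.io~csi/pv_name_xxx/mount -ourl=oss-cn-beijing-internal.aliyuncs.com -o allow_other , err: error_message_xxx with error: exit status 1'
printf '%s\n' "$sample" | extract_mount_cmd

# Usage against a live cluster:
#   kubectl describe pod <pod-name> | extract_mount_cmd
```

Run the printed command on the node itself to see the real ossfs error message.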
Todo
- Move the CSI plugin, RBAC roles, secret credentials, etc. into a separate namespace (related link 1)
- Encrypt the secret if necessary
Related links
- https://blog.etby.org/2020/04/28/k8s-aliyun-oss/
- https://blog.csdn.net/weixin_40449300/article/details/106938845
- https://help.aliyun.com/zh/ack/ack-managed-and-ack-dedicated/user-guide/mount-statically-provisioned-oss-volumes
- https://github.com/kubernetes-sigs/alibaba-cloud-csi-driver
Keywords
k8s k3s oss csi Alibaba Cloud volume mount