How to Schedule GPU Task Pods on OpenShift 3.11

Author: frederickhou | Published: 2019-10-08 14:56

Reference: https://blog.openshift.com/how-to-use-gpus-with-deviceplugin-in-openshift-3-10/

Base environment setup (NVIDIA driver, nvidia-docker installation and testing) is covered in the reference link above.
This article provides three files that have been verified in practice: nvidia-deviceplugin-scc.yaml for creating the SecurityContextConstraints (SCC), nvidia-deviceplugin.yaml for creating the NVIDIA Device Plugin DaemonSet, and test-gpu.yaml for a test pod that exercises the GPU.
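Before creating any OpenShift objects, it helps to confirm that the node-level prerequisites from the reference link actually work. The following is a minimal sanity check, assuming the driver and nvidia-docker2 are installed; with nvidia-docker 1.x the second command would use the "nvidia-docker run" wrapper instead of "--runtime=nvidia".

    # On the GPU node: the driver should enumerate the installed GPUs
    nvidia-smi

    # Container-level check: the same output from inside a CUDA container
    docker run --rm --runtime=nvidia docker.io/nvidia/cuda:9.0-base nvidia-smi

Both commands should print the familiar nvidia-smi table; if they do not, fix the host setup before moving on to the cluster objects below.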

  • 1. Create the SCC: "oc create -f nvidia-deviceplugin-scc.yaml" (the project and service account it references are noted after the YAML)

    allowHostDirVolumePlugin: true
    allowHostIPC: true
    allowHostNetwork: true
    allowHostPID: true
    allowHostPorts: true
    allowPrivilegedContainer: true
    allowedCapabilities:
    - '*'
    allowedFlexVolumes: null
    apiVersion: v1
    defaultAddCapabilities:
    - '*'
    fsGroup:
      type: RunAsAny
    groups:
    - system:cluster-admins
    - system:nodes
    - system:masters
    kind: SecurityContextConstraints
    metadata:
      annotations:
        kubernetes.io/description: anyuid provides all features of the restricted SCC
          but allows users to run with any UID and any GID.
      creationTimestamp: null
      name: nvidia-deviceplugin
    priority: 10
    readOnlyRootFilesystem: false
    requiredDropCapabilities: null
    runAsUser:
      type: RunAsAny
    seLinuxContext:
      type: RunAsAny
    seccompProfiles:
    - '*'
    supplementalGroups:
      type: RunAsAny
    users:
    - system:serviceaccount:nvidia:nvidia-deviceplugin
    volumes:
    - '*'
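The users entry above refers to a nvidia-deviceplugin service account in the nvidia project, so both need to exist before the SCC takes effect for that account. A minimal sketch, assuming neither exists yet:

    # Project/namespace and service account referenced by the SCC's "users" field
    oc new-project nvidia
    oc create serviceaccount nvidia-deviceplugin -n nvidia

    # Now create the SCC itself
    oc create -f nvidia-deviceplugin-scc.yaml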
    
  • 2. Create the NVIDIA Device Plugin DaemonSet: "oc create -f nvidia-deviceplugin.yaml" (node labeling for its affinity rule is sketched after the YAML)

    apiVersion: extensions/v1beta1
    kind: DaemonSet
    metadata:
      name: nvidia-device-plugin-daemonset
      namespace: nvidia
    spec:
      template:
        metadata:
          # Mark this pod as a critical add-on; when enabled, the critical add-on scheduler
          # reserves resources for critical add-on pods so that they can be rescheduled after
          # a failure.  This annotation works in tandem with the toleration below.
          annotations:
            scheduler.alpha.kubernetes.io/critical-pod: ""
          labels:
            name: nvidia-device-plugin-ds
        spec:
          affinity:
            nodeAffinity:
              requiredDuringSchedulingIgnoredDuringExecution:
                nodeSelectorTerms:
                - matchExpressions:
                  - key: openshift.com/gpu-accelerator
                    operator: Exists
          tolerations:
          # Allow this pod to be rescheduled while the node is in "critical add-ons only" mode.
          # This, along with the annotation above, marks this pod as a critical add-on.
          - key: CriticalAddonsOnly
            operator: Exists
          - key: nvidia.com/gpu
            operator: Exists
            effect: NoSchedule
          serviceAccount: nvidia-deviceplugin
          serviceAccountName: nvidia-deviceplugin
          hostNetwork: true
          hostPID: true
          containers:
          - image: nvidia/k8s-device-plugin:1.11
            name: nvidia-device-plugin-ctr
            securityContext:
              allowPrivilegeEscalation: false
              capabilities:
                drop: ["ALL"]
              seLinuxOptions:
                type: nvidia_container_t
            volumeMounts:
            - name: device-plugin
              mountPath: /var/lib/kubelet/device-plugins
          volumes:
          - name: device-plugin
            hostPath:
              path: /var/lib/kubelet/device-plugins
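The nodeAffinity rule above only matches nodes carrying the openshift.com/gpu-accelerator label, so the GPU node has to be labeled before the DaemonSet will place a pod on it. A rough verification flow, with <gpu-node> standing in for your actual node name:

    # Label the GPU node so the affinity term matches it
    oc label node <gpu-node> openshift.com/gpu-accelerator=true

    # The device plugin pod should be scheduled onto that node
    oc get pods -n nvidia -o wide

    # Once the plugin has registered, the node advertises nvidia.com/gpu capacity
    oc describe node <gpu-node> | grep nvidia.com/gpu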
  • 3. Test GPU scheduling with the test pod: "oc create -f test-gpu.yaml" (a quick way to verify the result follows the YAML)
    apiVersion: v1
    kind: Pod
    metadata:
      name: cuda3
      namespace: nvidia
    spec:
      restartPolicy: OnFailure
      containers:
      - name: cuda3
        image: "docker.io/nvidia/cuda:9.0-base"
        args: ["nvidia-smi"]
        resources:
          limits:
            nvidia.com/gpu: 3 # requesting 3 GPUs
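If scheduling works, the pod runs nvidia-smi once and exits (adjust the nvidia.com/gpu limit to match how many GPUs the node actually has). A quick check, using the pod name and namespace from the manifest above:

    # The pod should reach Completed after nvidia-smi has run
    oc get pod cuda3 -n nvidia

    # The log should show the nvidia-smi table with the requested GPUs attached
    oc logs cuda3 -n nvidia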

