Kubernetes Pod Scheduling Principles

Author: ywhu | Published: 2017-10-25 23:16

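    In Kubernetes, the module responsible for pod scheduling is kube-scheduler. Its job is to assign each Pod to a specific Node. kube-scheduler watches the Pod list through the interface exposed by the API Server and picks up the pods that are waiting to be scheduled. It then filters the Nodes with a series of predicates, scores the remaining Nodes with a series of priorities, and binds the Pod to the Node with the highest score. The binding is sent to the API Server, which persists it in etcd.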

    The kubelet on each node watches kube-apiserver, picks up the binding event produced by kube-scheduler, fetches the pod manifest, pulls the images, and starts the containers.

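    Scheduling policies

    Kubernetes scheduling policies fall into two groups, Predicates (filtering policies) and Priorities (scoring policies), and scheduling proceeds in two steps:

    1. Predicates. These are mandatory rules. The scheduler iterates over all Nodes and keeps only those that satisfy every configured predicate. If no Node passes, the Pod stays pending until some Node can satisfy them.

    2. Priorities. The Nodes that survived step 1 are scored and ranked by the priority functions, and the best one is selected.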
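
    The two steps above can be sketched as a filter pass followed by a weighted scoring pass. This is a minimal illustration, not the scheduler's actual code: the Node, Pod, Predicate and WeightedPriority types and the two sample policies are hypothetical, and ties are resolved by taking the first best node, which is a simplification.

```go
package main

import (
	"errors"
	"fmt"
)

// Node and Pod are simplified, hypothetical stand-ins for the real API objects.
type Node struct {
	Name      string
	FreeCPU   int64 // millicores not yet requested on the node
	FreeMemMB int64 // memory not yet requested on the node
}

type Pod struct {
	Name  string
	CPU   int64
	MemMB int64
}

// Predicate is a hard requirement: false removes the node from consideration.
type Predicate func(pod Pod, node Node) bool

// Priority returns a score from 0 to 10; higher is better.
type Priority func(pod Pod, node Node) int

// WeightedPriority pairs a priority function with its weight.
type WeightedPriority struct {
	Fn     Priority
	Weight int
}

// ErrNoFit means no node passed the predicates, so the pod stays pending.
var ErrNoFit = errors.New("no node satisfies all predicates")

func schedule(pod Pod, nodes []Node, preds []Predicate, prios []WeightedPriority) (string, error) {
	// Step 1: keep only the nodes that pass every predicate.
	var feasible []Node
	for _, n := range nodes {
		ok := true
		for _, p := range preds {
			if !p(pod, n) {
				ok = false
				break
			}
		}
		if ok {
			feasible = append(feasible, n)
		}
	}
	if len(feasible) == 0 {
		return "", ErrNoFit
	}

	// Step 2: sum weight*score over all priorities and keep the best node.
	best, bestScore := "", -1
	for _, n := range feasible {
		score := 0
		for _, wp := range prios {
			score += wp.Weight * wp.Fn(pod, n)
		}
		if score > bestScore {
			best, bestScore = n.Name, score
		}
	}
	return best, nil
}

// fitsResources is a toy predicate: the pod must fit into the free resources.
func fitsResources(pod Pod, n Node) bool {
	return pod.CPU <= n.FreeCPU && pod.MemMB <= n.FreeMemMB
}

// freeCPUScore is a toy priority: more CPU left after placement scores higher.
func freeCPUScore(pod Pod, n Node) int {
	if n.FreeCPU <= 0 || pod.CPU > n.FreeCPU {
		return 0
	}
	return int((n.FreeCPU - pod.CPU) * 10 / n.FreeCPU)
}

func main() {
	nodes := []Node{
		{Name: "node-a", FreeCPU: 500, FreeMemMB: 1024},
		{Name: "node-b", FreeCPU: 4000, FreeMemMB: 8192},
		{Name: "node-c", FreeCPU: 2000, FreeMemMB: 4096},
	}
	pod := Pod{Name: "web", CPU: 1000, MemMB: 2048}
	chosen, err := schedule(pod, nodes,
		[]Predicate{fitsResources},
		[]WeightedPriority{{Fn: freeCPUScore, Weight: 1}})
	fmt.Println(chosen, err) // prints "node-b <nil>"
}
```

    Here node-a is removed in step 1 because it cannot hold the pod, and node-b beats node-c in step 2 with a score of 7 against 5.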

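    Source code location:

    The predicates package contains all predicates supported by k8s.

    The priorities package contains all priorities supported by k8s.

    The defaults package under algorithmprovider contains the default predicates and priorities.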

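    Predicates

    Kubernetes v1.7 supports 15 predicates:

    • MatchNodeSelector: checks whether the Node's labels match the Pod's spec.nodeSelector
    • PodFitsResources: checks whether the node has enough resources (CPU and memory) for the Pod. The check is based on the resource requests already allocated on the node, not on actual usage
    • PodFitsHostPorts: checks whether any HostPort required by a container in the Pod is already taken by another container. If a required HostPort is unavailable, the Pod cannot be scheduled onto this node
    • HostName: checks whether the node's name equals the NodeName specified by the Pod
    • NoDiskConflict: checks, based on pod.spec.volumes, whether there would be a volume conflict on this node. If the node already has a volume mounted, other Pods that use the same volume cannot be scheduled onto it. The exact rules differ per storage backend
    • NoVolumeZoneConflict: checks whether placing the Pod on this node would conflict with the zone restrictions of its volumes
    • PodToleratesNodeTaints: ensures that the tolerations defined on the pod cover the taints defined on the node
    • CheckNodeMemoryPressure: checks whether the pod may be scheduled onto a node that has reported memory pressure
    • CheckNodeDiskPressure: checks whether the pod may be scheduled onto a node that has reported disk pressure
    • MaxEBSVolumeCount: ensures that the number of attached EBS volumes does not exceed the configured maximum (default 39)
    • MaxGCEPDVolumeCount: ensures that the number of attached GCE PD volumes does not exceed the configured maximum (default 16)
    • MaxAzureDiskVolumeCount: ensures that the number of attached Azure Disk volumes does not exceed the configured maximum (default 16)
    • MatchInterPodAffinity: checks whether the pod satisfies the affinity and anti-affinity rules with respect to other pods
    • GeneralPredicates: a bundle of the basic checks, namely PodFitsResources, HostName, PodFitsHostPorts and MatchNodeSelector
    • NoVolumeNodeConflict: checks whether placing the Pod on this node would conflict with the node restrictions of its volumes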
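
    To make the shape of a predicate concrete, here is a minimal sketch of two of the checks listed above, MatchNodeSelector and PodFitsHostPorts. The node and pod structs are hypothetical reductions of the real API objects, and the logic is a simplified reading of the descriptions above rather than the code in the predicates package.

```go
package main

import "fmt"

// node and pod are hypothetical, reduced versions of the real objects.
type node struct {
	labels    map[string]string
	usedPorts map[int32]bool // host ports already claimed by pods on this node
}

type pod struct {
	nodeSelector map[string]string
	hostPorts    []int32
}

// matchNodeSelector passes when every key/value pair in the pod's
// nodeSelector is present in the node's labels.
func matchNodeSelector(p pod, n node) bool {
	for k, want := range p.nodeSelector {
		if got, ok := n.labels[k]; !ok || got != want {
			return false
		}
	}
	return true
}

// podFitsHostPorts passes when none of the requested host ports is in use.
func podFitsHostPorts(p pod, n node) bool {
	for _, port := range p.hostPorts {
		if port == 0 {
			continue // 0 means the container did not ask for a host port
		}
		if n.usedPorts[port] {
			return false
		}
	}
	return true
}

func main() {
	n := node{
		labels:    map[string]string{"disk": "ssd"},
		usedPorts: map[int32]bool{8080: true},
	}
	p := pod{
		nodeSelector: map[string]string{"disk": "ssd"},
		hostPorts:    []int32{8080},
	}
	fmt.Println(matchNodeSelector(p, n), podFitsHostPorts(p, n)) // prints "true false"
}
```

    The pod matches the node's labels but still cannot be placed there, because a single failing predicate is enough to rule a node out.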

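    Priorities

    The priorities available in Kubernetes v1.7 are:

    • EqualPriority: gives every node the same priority
    • ImageLocalityPriority: scores a node by whether it already holds the images the Pod needs. The score is 0 if the required images are absent; if they are present, the larger the images, the higher the score
    • LeastRequestedPriority: computes the percentage of the node's CPU and memory that the Pods request; the node with the smallest percentage is the best. Score formula:
    (cpu((capacity - sum(requested)) * 10 / capacity) + memory((capacity - sum(requested)) * 10 / capacity)) / 2
    
    • BalancedResourceAllocation: the node whose resource usage (CPU, memory) is most balanced is the best. Score formula:
    10 - abs(totalCpu/cpuNodeCapacity - totalMemory/memoryNodeCapacity) * 10
    
    • SelectorSpreadPriority: counts the pods on each Node that belong to the same Service or ReplicaSet as the pod being scheduled; the fewer such pods, the higher the score
    • NodePreferAvoidPodsPriority: evaluates the node annotation scheduler.alpha.kubernetes.io/preferAvoidPods. Its weight is set to 10000 so that it overrides the other priorities
    • NodeAffinityPriority: node affinity policy. Two kinds of selector are supported: requiredDuringSchedulingIgnoredDuringExecution (the chosen node must satisfy all of the Pod's rules about nodes) and preferredDuringSchedulingIgnoredDuringExecution (the scheduler tries to satisfy all of the selector's requirements but does not guarantee it)
    • TaintTolerationPriority: the scoring counterpart of the PodToleratesNodeTaints predicate. It prefers nodes with fewer PreferNoSchedule taints that the pod does not tolerate
    • InterPodAffinityPriority: pod affinity policy, similar to NodeAffinityPriority but expressed against other pods. It supports the same two kinds of selector: requiredDuringSchedulingIgnoredDuringExecution (the chosen node must satisfy all of the Pod's rules) and preferredDuringSchedulingIgnoredDuringExecution (the scheduler tries to satisfy the rules but does not guarantee it)
    • MostRequestedPriority: suited to clusters that scale dynamically. It prefers the most utilized nodes, so that when the cluster scales down, idle machines are freed up and can be shut down.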
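
    The two score formulas given for LeastRequestedPriority and BalancedResourceAllocation can be worked through with a small sketch. The function names and the integer truncation are assumptions made for the illustration; the arithmetic follows the formulas as stated above.

```go
package main

import (
	"fmt"
	"math"
)

// unusedScore computes (capacity - requested) * 10 / capacity for one resource.
func unusedScore(requested, capacity int64) int64 {
	if capacity <= 0 || requested > capacity {
		return 0
	}
	return (capacity - requested) * 10 / capacity
}

// leastRequested averages the CPU and memory scores, so emptier nodes win.
func leastRequested(cpuReq, cpuCap, memReq, memCap int64) int64 {
	return (unusedScore(cpuReq, cpuCap) + unusedScore(memReq, memCap)) / 2
}

// balancedAllocation computes 10 - |cpuFraction - memFraction| * 10, so nodes
// whose CPU and memory are used to a similar degree win.
func balancedAllocation(cpuReq, cpuCap, memReq, memCap int64) int64 {
	if cpuCap <= 0 || memCap <= 0 {
		return 0
	}
	cpuFraction := float64(cpuReq) / float64(cpuCap)
	memFraction := float64(memReq) / float64(memCap)
	if cpuFraction >= 1 || memFraction >= 1 {
		return 0 // the node would be full or overcommitted
	}
	return int64(10 - math.Abs(cpuFraction-memFraction)*10)
}

func main() {
	// A node with 4000m CPU and 8192 MB memory; after placement the pods on it
	// would request 2000m CPU and 2048 MB memory in total.
	fmt.Println(leastRequested(2000, 4000, 2048, 8192))     // prints 6
	fmt.Println(balancedAllocation(2000, 4000, 2048, 8192)) // prints 7
}
```

    For this node the CPU score is 5 and the memory score is 7, giving a LeastRequestedPriority score of 6. CPU is 50% requested and memory 25%, so the balance score is 10 - 2.5, truncated to 7.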

    Default policies

    Default predicates

    func defaultPredicates() sets.String {
        predSet := sets.NewString(
            // Volume zone restrictions must be satisfiable on the node.
            factory.RegisterFitPredicateFactory(
                "NoVolumeZoneConflict",
                func(args factory.PluginFactoryArgs) algorithm.FitPredicate {
                    return predicates.NewVolumeZonePredicate(args.PVInfo, args.PVCInfo)
                },
            ),

            // The node must not end up with too many AWS EBS volumes attached.
            factory.RegisterFitPredicateFactory(
                "MaxEBSVolumeCount",
                func(args factory.PluginFactoryArgs) algorithm.FitPredicate {
                    // TODO: allow for generically parameterized scheduler predicates, because this is a bit ugly
                    maxVols := getMaxVols(aws.DefaultMaxEBSVolumes)
                    return predicates.NewMaxPDVolumeCountPredicate(predicates.EBSVolumeFilter, maxVols, args.PVInfo, args.PVCInfo)
                },
            ),

            // The node must not end up with too many GCE PD volumes attached.
            factory.RegisterFitPredicateFactory(
                "MaxGCEPDVolumeCount",
                func(args factory.PluginFactoryArgs) algorithm.FitPredicate {
                    // TODO: allow for generically parameterized scheduler predicates, because this is a bit ugly
                    maxVols := getMaxVols(DefaultMaxGCEPDVolumes)
                    return predicates.NewMaxPDVolumeCountPredicate(predicates.GCEPDVolumeFilter, maxVols, args.PVInfo, args.PVCInfo)
                },
            ),

            // The node must not end up with too many Azure Disk volumes attached.
            factory.RegisterFitPredicateFactory(
                "MaxAzureDiskVolumeCount",
                func(args factory.PluginFactoryArgs) algorithm.FitPredicate {
                    // TODO: allow for generically parameterized scheduler predicates, because this is a bit ugly
                    maxVols := getMaxVols(DefaultMaxAzureDiskVolumes)
                    return predicates.NewMaxPDVolumeCountPredicate(predicates.AzureDiskVolumeFilter, maxVols, args.PVInfo, args.PVCInfo)
                },
            ),

            // Inter-pod affinity and anti-affinity rules must hold.
            factory.RegisterFitPredicateFactory(
                predicates.MatchInterPodAffinity,
                func(args factory.PluginFactoryArgs) algorithm.FitPredicate {
                    return predicates.NewPodAffinityPredicate(args.NodeInfo, args.PodLister)
                },
            ),

            // No conflicting volume may already be mounted on the node.
            factory.RegisterFitPredicate("NoDiskConflict", predicates.NoDiskConflict),

            // Bundle of the basic checks: resources, host name, host ports, node selector.
            factory.RegisterFitPredicate("GeneralPredicates", predicates.GeneralPredicates),

            // The node must not be reporting memory pressure.
            factory.RegisterFitPredicate("CheckNodeMemoryPressure", predicates.CheckNodeMemoryPressurePredicate),

            // The node must not be reporting disk pressure.
            factory.RegisterFitPredicate("CheckNodeDiskPressure", predicates.CheckNodeDiskPressurePredicate),

            // Volume node restrictions must be satisfiable on the node.
            factory.RegisterFitPredicateFactory(
                "NoVolumeNodeConflict",
                func(args factory.PluginFactoryArgs) algorithm.FitPredicate {
                    return predicates.NewVolumeNodePredicate(args.PVInfo, args.PVCInfo, nil)
                },
            ),
        )

        if utilfeature.DefaultFeatureGate.Enabled(features.TaintNodesByCondition) {
            // With the feature gate on, node conditions are expressed as taints,
            // so the taint check becomes mandatory.
            predSet.Insert(factory.RegisterMandatoryFitPredicate("PodToleratesNodeTaints", predicates.PodToleratesNodeTaints))
            glog.Warningf("TaintNodesByCondition is enabled, PodToleratesNodeTaints predicate is mandatory")
        } else {
            // Otherwise the node condition check is mandatory and the taint
            // check is an ordinary default predicate.
            predSet.Insert(factory.RegisterMandatoryFitPredicate("CheckNodeCondition", predicates.CheckNodeConditionPredicate))
            predSet.Insert(factory.RegisterFitPredicate("PodToleratesNodeTaints", predicates.PodToleratesNodeTaints))
        }

        return predSet
    }
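
    The listing above relies on one small trick: each Register call stores the implementation in a global registry and returns the policy name, so the calls can be nested directly inside sets.NewString to build the default set. A toy version of that pattern, with hypothetical names and a drastically simplified predicate signature, might look like this:

```go
package main

import (
	"fmt"
	"sort"
)

// fitPredicate is a hypothetical, heavily simplified predicate signature.
type fitPredicate func(podName, nodeName string) bool

// registry maps a policy name to its implementation.
var registry = map[string]fitPredicate{}

// registerFitPredicate stores the predicate under name and returns the name,
// which lets calls be nested inside a set constructor.
func registerFitPredicate(name string, p fitPredicate) string {
	registry[name] = p
	return name
}

// newStringSet builds a set from the given names.
func newStringSet(names ...string) map[string]struct{} {
	set := make(map[string]struct{}, len(names))
	for _, n := range names {
		set[n] = struct{}{}
	}
	return set
}

func alwaysFits(string, string) bool { return true }

// defaultNames registers two toy predicates and returns their names as the
// default set, mirroring the structure of defaultPredicates.
func defaultNames() map[string]struct{} {
	return newStringSet(
		registerFitPredicate("AlwaysFits", alwaysFits),
		registerFitPredicate("NotOnNodeX", func(_, node string) bool {
			return node != "node-x"
		}),
	)
}

func main() {
	// Registered, and therefore selectable by name, but not a default.
	registerFitPredicate("Optional", alwaysFits)

	defaults := defaultNames()
	names := make([]string, 0, len(defaults))
	for n := range defaults {
		names = append(names, n)
	}
	sort.Strings(names)
	fmt.Println("default:", names, "registered:", len(registry))
}
```

    The split between "registered" and "in the default set" is what makes it possible to have policies that exist but are not enabled unless a user selects them.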
    

    Default priorities

        
    func defaultPriorities() sets.String {
        return sets.NewString(
            // Spread pods of the same service, controller, ReplicaSet or StatefulSet across nodes.
            factory.RegisterPriorityConfigFactory(
                "SelectorSpreadPriority",
                factory.PriorityConfigFactory{
                    Function: func(args factory.PluginFactoryArgs) algorithm.PriorityFunction {
                        return priorities.NewSelectorSpreadPriority(args.ServiceLister, args.ControllerLister, args.ReplicaSetLister, args.StatefulSetLister)
                    },
                    Weight: 1,
                },
            ),

            // Prefer nodes that satisfy the pod's preferred inter-pod affinity rules.
            factory.RegisterPriorityConfigFactory(
                "InterPodAffinityPriority",
                factory.PriorityConfigFactory{
                    Function: func(args factory.PluginFactoryArgs) algorithm.PriorityFunction {
                        return priorities.NewInterPodAffinityPriority(args.NodeInfo, args.NodeLister, args.PodLister, args.HardPodAffinitySymmetricWeight)
                    },
                    Weight: 1,
                },
            ),

            // Prefer nodes with fewer requested resources.
            factory.RegisterPriorityFunction2("LeastRequestedPriority", priorities.LeastRequestedPriorityMap, nil, 1),

            // Prefer nodes with balanced CPU and memory utilization.
            factory.RegisterPriorityFunction2("BalancedResourceAllocation", priorities.BalancedResourceAllocationMap, nil, 1),

            // Weight 10000 so that the preferAvoidPods annotation overrides the other priorities.
            factory.RegisterPriorityFunction2("NodePreferAvoidPodsPriority", priorities.CalculateNodePreferAvoidPodsPriorityMap, nil, 10000),

            // Prefer nodes that satisfy the pod's preferred node affinity rules.
            factory.RegisterPriorityFunction2("NodeAffinityPriority", priorities.CalculateNodeAffinityPriorityMap, priorities.CalculateNodeAffinityPriorityReduce, 1),

            // Prefer nodes with fewer taints the pod does not tolerate.
            factory.RegisterPriorityFunction2("TaintTolerationPriority", priorities.ComputeTaintTolerationPriorityMap, priorities.ComputeTaintTolerationPriorityReduce, 1),
        )
    }
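
    The weights in this listing decide how much each priority contributes to a node's final score. The sketch below reuses the names and weights from the listing to show why a weight of 10000 lets NodePreferAvoidPodsPriority dominate. The summation as weight times score and the 0 to 10 score range are assumptions for the illustration, and the per-priority scores are made up.

```go
package main

import "fmt"

type weightedPriority struct {
	name   string
	weight int
}

// Names and weights copied from the defaultPriorities listing.
var defaultWeights = []weightedPriority{
	{"SelectorSpreadPriority", 1},
	{"InterPodAffinityPriority", 1},
	{"LeastRequestedPriority", 1},
	{"BalancedResourceAllocation", 1},
	{"NodePreferAvoidPodsPriority", 10000},
	{"NodeAffinityPriority", 1},
	{"TaintTolerationPriority", 1},
}

// totalScore sums weight*score over all default priorities. scores holds one
// value from 0 to 10 per priority name; a missing entry counts as 0.
func totalScore(scores map[string]int) int {
	total := 0
	for _, wp := range defaultWeights {
		total += wp.weight * scores[wp.name]
	}
	return total
}

func main() {
	// A node that is perfect in every respect but asks to be avoided.
	avoided := map[string]int{
		"SelectorSpreadPriority":      10,
		"InterPodAffinityPriority":    10,
		"LeastRequestedPriority":      10,
		"BalancedResourceAllocation":  10,
		"NodePreferAvoidPodsPriority": 0,
		"NodeAffinityPriority":        10,
		"TaintTolerationPriority":     10,
	}
	// A node that scores 0 everywhere else but does not ask to be avoided.
	plain := map[string]int{"NodePreferAvoidPodsPriority": 10}

	fmt.Println(totalScore(avoided), totalScore(plain)) // prints "60 100000"
}
```

    The six weight-1 priorities can contribute at most 60 points together, so they can never outvote a node's preferAvoidPods score.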
    

    Policies registered by default but not enabled

    Predicates

    // Registers predicates and priorities that are not enabled by default, but user can pick when creating his
    // own set of priorities/predicates.

    // PodFitsPorts is the older name of PodFitsHostPorts, kept for backwards compatibility.
    factory.RegisterFitPredicate("PodFitsPorts", predicates.PodFitsHostPorts)

    // The four predicates below still run by default, because GeneralPredicates invokes them.
    factory.RegisterFitPredicate("PodFitsHostPorts", predicates.PodFitsHostPorts)
    factory.RegisterFitPredicate("PodFitsResources", predicates.PodFitsResources)
    factory.RegisterFitPredicate("HostName", predicates.PodFitsHost)
    factory.RegisterFitPredicate("MatchNodeSelector", predicates.PodMatchNodeSelector)
    

    Priorities

    // Every node gets the same score.
    factory.RegisterPriorityFunction2("EqualPriority", core.EqualPriorityMap, nil, 1)
    // Prefer nodes that already hold the pod's images.
    factory.RegisterPriorityFunction2("ImageLocalityPriority", priorities.ImageLocalityPriorityMap, nil, 1)
    // Prefer the most utilized nodes.
    factory.RegisterPriorityFunction2("MostRequestedPriority", priorities.MostRequestedPriorityMap, nil, 1)
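
    Policies that are registered but not enabled are selected by name when a user supplies a custom policy, which kube-scheduler of this era reads from a JSON file passed with --policy-config-file. The sketch below parses such a document. The policy and policyEntry structs are local, hypothetical mirrors of the file layout as I understand it (kind, apiVersion, a list of predicate names, a list of priority names with weights), not the scheduler's own API types, and the positive-weight check is an assumption of this example.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// policyEntry and policy are hypothetical local mirrors of the policy file
// layout; they are not the scheduler's real types.
type policyEntry struct {
	Name   string `json:"name"`
	Weight int    `json:"weight,omitempty"`
}

type policy struct {
	Kind       string        `json:"kind"`
	APIVersion string        `json:"apiVersion"`
	Predicates []policyEntry `json:"predicates"`
	Priorities []policyEntry `json:"priorities"`
}

// examplePolicy enables two priorities that are not part of the defaults.
const examplePolicy = `{
  "kind": "Policy",
  "apiVersion": "v1",
  "predicates": [
    {"name": "GeneralPredicates"},
    {"name": "NoDiskConflict"},
    {"name": "PodToleratesNodeTaints"}
  ],
  "priorities": [
    {"name": "MostRequestedPriority", "weight": 2},
    {"name": "ImageLocalityPriority", "weight": 1}
  ]
}`

// parsePolicy decodes the JSON and rejects priorities without a positive weight.
func parsePolicy(data []byte) (policy, error) {
	var p policy
	if err := json.Unmarshal(data, &p); err != nil {
		return policy{}, err
	}
	for _, pr := range p.Priorities {
		if pr.Weight <= 0 {
			return policy{}, fmt.Errorf("priority %q needs a positive weight", pr.Name)
		}
	}
	return p, nil
}

func main() {
	p, err := parsePolicy([]byte(examplePolicy))
	if err != nil {
		fmt.Println("invalid policy:", err)
		return
	}
	for _, pr := range p.Priorities {
		fmt.Printf("%s weight=%d\n", pr.Name, pr.Weight)
	}
}
```

    A policy like this would suit the autoscaling scenario described for MostRequestedPriority, since it packs pods onto busy nodes instead of spreading them out.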
    
