Volcano Scheduling Based on guarantee and capability

Author: oo的布丁 | Published: 2022-08-02 13:48

I. Background

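As business scenarios on a Kubernetes cloud-native platform grow more complex, the default Kubernetes scheduler struggles to keep up, so many companies replace it with a strong open-source alternative. Volcano is one of them. Volcano has many advantages, such as gang scheduling and backfill, which I won't expand on here. It also has some rough edges in practice, for example in the following scenario:
The cluster has 20C of resources in total and is shared by 3 teams. Each team must be guaranteed at least 5C and may use at most 10C. If you set each Volcano queue's guarantee to 5C and its capability to 10C, the queues can end up overcommitted: the resources they are entitled to add up to more than the cluster has.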

II. Problem Analysis

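Let's dig into the cause of the problem described in Section I. To support scheduling policies for a variety of scenarios, Volcano gives a Queue several attributes: weight, guarantee, and capability. weight is the queue's weight when the cluster's remaining resources are divided, and that division can be repeated dynamically over several rounds; guarantee is the reserved resource, a minimum guarantee; capability is the maximum resource, an upper limit. In addition, deserved is the resource assigned to the queue in the current session. When a queue's guarantee exceeds the share of the cluster that its weight gives it, the scheduler has to honor both every queue's weight and that queue's guarantee, and the sum of deserved across all queues ends up exceeding the cluster's total resources.
The official code: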

```go
for {
	totalWeight := int32(0)
	for _, attr := range pp.queueOpts {
		if _, found := meet[attr.queueID]; found {
			continue
		}
		totalWeight += attr.weight
	}

	// If no queues, break
	if totalWeight == 0 {
		klog.V(4).Infof("Exiting when total weight is 0")
		break
	}

	oldRemaining := remaining.Clone()
	// Calculates the deserved of each Queue.
	// increasedDeserved is the increased value for attr.deserved of processed queues
	// decreasedDeserved is the decreased value for attr.deserved of processed queues
	increasedDeserved := api.EmptyResource()
	decreasedDeserved := api.EmptyResource()
	for _, attr := range pp.queueOpts {
		klog.V(4).Infof("Considering Queue <%s>: weight <%d>, total weight <%d>.",
			attr.name, attr.weight, totalWeight)
		if _, found := meet[attr.queueID]; found {
			continue
		}

		oldDeserved := attr.deserved.Clone()
		attr.deserved.Add(remaining.Clone().Multi(float64(attr.weight) / float64(totalWeight)))

		if attr.realCapability != nil {
			attr.deserved.MinDimensionResource(attr.realCapability, api.Infinity)
		}
		attr.deserved.MinDimensionResource(attr.request, api.Zero)

		klog.V(4).Infof("Format queue <%s> deserved resource to <%v>", attr.name, attr.deserved)

		if attr.request.LessEqual(attr.deserved, api.Zero) {
			meet[attr.queueID] = struct{}{}
			klog.V(4).Infof("queue <%s> is meet", attr.name)
		} else if reflect.DeepEqual(attr.deserved, oldDeserved) {
			meet[attr.queueID] = struct{}{}
			klog.V(4).Infof("queue <%s> is meet cause of the capability", attr.name)
		}
		attr.deserved = helpers.Max(attr.deserved, attr.guarantee)
		pp.updateShare(attr)

		klog.V(4).Infof("The attributes of queue <%s> in proportion: deserved <%v>, realCapability <%v>, allocate <%v>, request <%v>, share <%0.2f>",
			attr.name, attr.deserved, attr.realCapability, attr.allocated, attr.request, attr.share)

		increased, decreased := attr.deserved.Diff(oldDeserved, api.Zero)
		increasedDeserved.Add(increased)
		decreasedDeserved.Add(decreased)

		// Record metrics
		metrics.UpdateQueueDeserved(attr.name, attr.deserved.MilliCPU, attr.deserved.Memory)
	}

	remaining.Sub(increasedDeserved).Add(decreasedDeserved)
	klog.V(4).Infof("Remaining resource is  <%s>", remaining)
	if remaining.IsEmpty() || reflect.DeepEqual(remaining, oldRemaining) {
		klog.V(4).Infof("Exiting when remaining is empty or no queue has more reosurce request:  <%v>", remaining)
		break
	}
}
```

III. My Scenario and Design

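Scenario: suppose the cluster has 20C of resources in total and there are 3 teams. Each team must be guaranteed at least 5C and may use at most 10C.
Design: in this scenario you really only need to set each queue's guarantee to 5C and its capability to 10C; the weight attribute does not need to be considered. So we redesigned the approach for this scenario. The steps are:
1. Allocate each queue's guarantee first, on the condition that the sum of all guarantees is less than the cluster's total resources; otherwise the scheduler can simply panic.
2.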

Article link: https://www.haomeiwen.com/subject/cdxjwrtx.html