美文网首页
Binary Search模板总结与LeetCode经典范题

Binary Search模板总结与LeetCode经典范题

作者: 廖致君 | 来源:发表于2020-08-05 01:02 被阅读0次

    Binary Seach在概念上很容易理解,无非就是每次将搜索空间划分为两份,只保留这二者中可能包含目标值的那一份,持续二分下去,直到完成搜索为止,由此将搜索复杂度从线性时间O(n)减少至对数时间O(log\ n)。但在具体实现上,想要在几分钟内成功写出一份bug free的代码并不是件简单的事情。我们常常会遇到一些恼人的细节问题,诸如while循环的条件应该用while left < right还是用while left <= right,初始化边界的时候leftright该设为什么值,更新边界的时候应该如何从left = mid, left = mid + 1, right = mid, right = mid - 1之中选择合适的组合。另外,Binary Search并不是只能适用于「给定一个数组,搜索一个目标数字」这样简单的场合,它有着更为一般化的应用场景。

    我会在这篇文章里详细地总结这些内容,并把它应用到LeetCode的实际题目中。我不希望只是简单地贴出每道题目的代码,我所希望分享的是思路,是如何将最一般化的Binary Search模板应用到各种题目上面,让大家不要再有“诶呀原来这道题目可以用二分查找来做! 我怎么就没有想到呢!”的懊恼。


    一般化的Binary Search

    给定一个从小到大排好序的数组array,在不同的场景下,我们会有不同的搜索目标,诸如搜索特定的数字target,给一个数字寻找插入的位置等。对于绝大多数任务,我们都可以将其转化下面的最一般化的形式
    min\ k \\ s.t. \ condition(k) \ is\ True\\
    下面是最一般化的Binary Search模板:

    def binary_search(array) -> int:
        def condition(value) -> bool:
            pass
    
        left, right = 0, len(array)
        while left < right:
            mid = left + (right - left) // 2
            if condition(mid):
                right = mid
            else:
                left = mid + 1
        return left
    

    这一模板的强大之处在于,对于绝大多数的Binary Search题目,我们只需要修改三个部分就可以照搬这个模板,而不必再去担心代码bug和角点解的问题。

    • 正确地初始化leftright这两个变量。只需要满足一点要求即可:初始化之后的边界必须囊括所有可能的元素
    • 根据具体应用场景修改返回值,是return left还是return left - 1还是return -1。只需要记住一点:在while循环终止之后,left便是上面一般化公式中的最小的k值,并且此时leftright相等;
    • 构思好condition函数具体是什么,这是最困难最巧妙的部分,也是最需要勤加练习的部分。

    下面我会结合一些经典高质量的LeetCode题目来展示如何套用这个万能的模板。


    基本应用

    278. First Bad Version [Easy]

    You are a product manager and currently leading a team to develop a new product. Since each version is developed based on the previous version, all the versions after a bad version are also bad. Suppose you have n versions [1, 2, ..., n] and you want to find out the first bad one, which causes all the following ones to be bad. You are given an API bool isBadVersion(version) which will return whether version is bad.

    Example:

    Given n = 5, and version = 4 is the first bad version.
    
    call isBadVersion(3) -> false
    call isBadVersion(5) -> true
    call isBadVersion(4) -> true
    
    Then 4 is the first bad version. 
    

    首先初始化边界,因为搜索范围是[1, 2, ..., n]所以我们设置left, right = 1, n,然后我们注意到,模板里的condition函数实际上已经给好了(直接调用提供的API即可),我们的搜索目标是找到最小的索引值k^{*},此时版本k^{*}是First Bad Version,而版本k^{*} - 1则是Last Good Version,于是解法如下:

    class Solution:
        def firstBadVersion(self, n) -> int:
            left, right = 1, n
            while left < right:
                mid = left + (right - left) // 2
                if isBadVersion(mid):
                    right = mid
                else:
                    left = mid + 1
            return left
    

    69. Sqrt(x) [Easy]

    Implement int sqrt(int x). Compute and return the square root of x, where x is guaranteed to be a non-negative integer. Since the return type is an integer, the decimal digits are truncated and only the integer part of the result is returned.

    Example:

    Input: 4
    Output: 2
    
    Input: 8
    Output: 2
    

    简单明了的题目,我们需要找的是满足k^2 \le x的最大k值,于是很轻松地就能写出解法:

    def mySqrt(x: int) -> int:
        left, right = 0, x + 1
        while left < right:
            mid = left + (right - left) // 2
            if mid * mid <= x:
                left = mid + 1
            else:
                right = mid
        return left - 1
    

    这里有一点需要注明的是,文章开头我们说「寻找满足条件的最小k值是最具一般化的形式」,而现在却在搜索最大的k值,是否前后矛盾?不矛盾。满足feasible(k) == False的最大k值加上1,便等于满足feasible(k) == True的最小k值。可以这样理解:整个搜索空间分为左右两部分(左边小右边大),左半部分里的所有k都满足k^2 \le x,右半部分里的所有k都满足k^2 \gt x,左半部分的最大值和右半部分的最小值,二者相差1。


    35. Search Insert Position [Easy]

    Given a sorted array and a target value, return the index if the target is found. If not, return the index where it would be if it were inserted in order. You may assume no duplicates in the array.

    Example:

    Input: [1,3,5,6], 5
    Output: 2
    
    Input: [1,3,5,6], 2
    Output: 1
    

    Binary Search很经典的应用。我们需要找的是满足nums[k] >= target的最小k值,于是套用模板就能轻松搞定,无论输入数组中有没有duplicate我们的模板解法都是正确的。另外,在初始化边界的时候要注意,target有可能大于数组中的所有元素,此时我们会将target插入到数组的最后面,效果等同于nums.append(target),所以应该初始化right = len(nums)而不能写right = len(nums) - 1

    class Solution:
        def searchInsert(self, nums: List[int], target: int) -> int:
            left, right = 0, len(nums)
            while left < right:
                mid = left + (right - left) // 2
                if nums[mid] >= target:
                    right = mid
                else:
                    left = mid + 1
            return left
    

    高级应用

    上面的题目之所以简单,是因为它们都已经把数组摆了出来,让我们能够很快就意识到应该使用Binary Search来求解,但更多的时候,我们需要搜索的空间、搜索的目标以及搜索的途径并不是那么地清晰可得,初看题目时甚至可能意识不到可以采用Binary Search来做。对于「什么时候可以使用Binary Search」这个问题,我的回答是,如果能够从题目中挖掘出某种单调性,例如当condition(k)成立时,condition(k+1)也会成立,这个时候可以考虑使用Binary Search。


    1011. Capacity To Ship Packages Within D Days [Medium]

    A conveyor belt has packages that must be shipped from one port to another within D days. The i-th package on the conveyor belt has a weight of weights[i]. Each day, we load the ship with packages on the conveyor belt (in the order given by weights). We may not load more weight than the maximum weight capacity of the ship.

    Return the least weight capacity of the ship that will result in all the packages on the conveyor belt being shipped within D days.

    Example :

    Input: weights = [1,2,3,4,5,6,7,8,9,10], D = 5
    Output: 15
    Explanation: 
    A ship capacity of 15 is the minimum to ship all the packages in 5 days like this:
    1st day: 1, 2, 3, 4, 5
    2nd day: 6, 7
    3rd day: 8
    4th day: 9
    5th day: 10
    
    Note that the cargo must be shipped in the order given, so using a ship of capacity 14 and splitting the packages into parts like (2, 3, 4, 5), (1, 6, 7), (8), (9), (10) is not allowed. 
    

    初看这道题目时可能意识不到可以用Binary Search来做,因为我们很可能会下意识地把weights作为搜索空间,然后无功而返。实际上我们需要搜索的目标是,从诸多可行的capacity中,找到取值最小的那一个。我们可以挖掘出题目中蕴含的单调性——如果运载量为m时可以在D天之内运完所有货物,那么任何比m更大的运载量一定也能顺利运完。接下来我们需要确定搜索空间,capacity至少应该为max(weights),否则传送带无法搭载重量最大的那份货物,capacity至多只需要sum(weights),此时我们可以在一天之内就运送完所有货物。

    我们需要写一个帮助函数feasible,任意输入一个capacity,判断以这个运载量能否顺利完成运货,这个函数以greedy的方式来放置货物,如果当前这个货物放到传送带上后总重量不超过运载量,那么就放置这个货物;否则就把已经放置到传送带上的货物都运走,然后在新一天把这个货物放到传送带上。如果需要的总天数小于等于D天,函数便返回True,否则返回False。我们搜索的最终结果,便是满足feasible(k) == True的最小k值。

    现在我们可以照搬之前的模板了。这一解法的时间复杂度是O(nlog\ S),其中n = len(weights), S = sum(weights)

    def shipWithinDays(weights: List[int], D: int) -> int:
        def feasible(capacity) -> bool:
            days = 1
            total = 0
            for weight in weights:
                total += weight
                if total > capacity:  # 以Greedy的方式来放置货物,如果超过最大运载量,便顺延到下一天
                    total = weight
                    days += 1
                    if days > D:  # 不能在D天之内运完货物,返回False
                        return False
            return True
    
        left, right = max(weights), sum(weights)
        while left < right:
            mid = left + (right - left) // 2
            if feasible(mid):
                right = mid
            else:
                left = mid + 1
        return left
    

    410. Split Array Largest Sum [Hard]

    Given an array which consists of non-negative integers and an integer m, you can split the array into m non-empty continuous subarrays. Write an algorithm to minimize the largest sum among these m subarrays.

    Example:

    Input:
    nums = [7,2,5,10,8]
    m = 2
    
    Output:
    18
    
    Explanation:
    There are four ways to split nums into two subarrays. The best way is to split it into [7,2,5] and [10,8], where the largest sum among the two subarrays is only 18.
    

    仔细观察后便能发现这道题和1011题之间的相似之处。类似地,我们可以写一个帮助函数feasible,输入一个threshold,判断能否将原数组划分为m个子数组,使得每个子数组之和都小于等于threshold。同样地,我们可以发掘出题目的单调性:如果feasible(m) == True,那么所有大于m的输入都能满足feasbile。于是我们照搬模板来解决本题,可以看到,这道题目的代码和题目1011是完全相同的:

    def splitArray(nums: List[int], m: int) -> int:        
        def feasible(threshold) -> bool:
            count = 1
            total = 0
            for num in nums:
                total += num
                if total > threshold:
                    total = num
                    count += 1
                    if count > m:
                        return False
            return True
    
        left, right = max(nums), sum(nums)
        while left < right:
            mid = left + (right - left) // 2
            if feasible(mid):
                right = mid     
            else:
                left = mid + 1
        return left
    

    和题目1011不同之处在于,我们可能会有这样的疑惑——虽然这个解法返回的变量left确实是满足feasible条件的最小值,但是怎么能证明「原数组可以通过某种划分方式拆分成m个子数组,其中最大的子数组之和正好就等于left」呢?或者换一个表达方式,我们知道,将原数组拆分之后,最大子数组之和必定落入闭区间[max(nums), sum(nums)]中,这个闭区间便是我们的搜索空间,这个区间很大,也许对于区间里的某些取值,无论我们如何拆分原数组,都无法得到等于这些取值的子数组之和。也就是说,搜索空间和划分原数组产生的子数组之和,这二者并不是一一对应的。例如nums = [7,2,5,10,8], m = 2,我们可以得到不同划分方式下的最大子数组之和:25:[[7], [2,5,10,8]], 23:[[7,2], [5,10,8]], 18:[[7,2,5], [10,8]], 24:[[7,2,5,10], [8]] , 只有这4个取值,而搜索空间却是[10, 32],此空间中绝大多数的取值都与原数组的子数组之和对应不上。

    证明上面的结论并不困难,只要分析feasible函数即可,函数中的total变量表示当前子数组之和,当它超过threshold时,我们就切换到下一个子数组。设k是使得feasible()成立的最小输入值,那么按照greedy的方式去划分原数组时,所有子数组之和都必定小于等于k。现在我们采用反证法,假设所有子数组之和都小于k,即没有任何一个子数组之和正好等于k,那么在feasible函数运行时,变量total必定始终小于k,最后函数返回True,因此feasible(k - 1)必定也返回True,因为变量total至多只会等于k - 1,永远不会触发条件语句if total > threshold,所以返还结果和feasible(k)是一样的。但是k却是使得feasible函数成立的最小输入值,那么feasible(k - 1)必定返还False,于是出现了矛盾,所以原假设不成立,由此我们得出结论,必定有某个子数组之和正好等于k,这样一来,我们就确保了算法的正确性。


    875. Koko Eating Bananas [Medium]

    Koko loves to eat bananas. There are N piles of bananas, the i-th pile has piles[i] bananas. The guards have gone and will come back in H hours. Koko can decide her bananas-per-hour eating speed of K. Each hour, she chooses some pile of bananas, and eats K bananas from that pile. If the pile has less than K bananas, she eats all of them instead, and won't eat any more bananas during this hour.

    Koko likes to eat slowly, but still wants to finish eating all the bananas before the guards come back. Return the minimum integer K such that she can eat all the bananas within H hours.

    Example :

    Input: piles = [3,6,7,11], H = 8
    Output: 4
    
    Input: piles = [30,11,23,4,20], H = 5
    Output: 30
    
    Input: piles = [30,11,23,4,20], H = 6
    Output: 23
    

    和上面的1011题以及410题是一样的思路,构造一个feasible函数,任意输入一个speed,判断Koko能否在H个小时之内吃完所有香蕉。搜索空间的下界是1,上界是max(piles),因为一次只能吃一个堆里的香蕉。

    def minEatingSpeed(piles: List[int], H: int) -> int:
        def feasible(speed) -> bool:
            # return sum(math.ceil(pile / speed) for pile in piles) <= H  # slower        
            return sum((pile - 1) / speed + 1 for pile in piles) <= H  # faster
    
        left, right = 1, max(piles)
        while left < right:
            mid = left  + (right - left) // 2
            if feasible(mid):
                right = mid
            else:
                left = mid + 1
        return left
    

    1482. Minimum Number of Days to Make m Bouquets [Medium]

    Given an integer array bloomDay, an integer m and an integer k. We need to make m bouquets. To make a bouquet, you need to use k adjacent flowers from the garden. The garden consists of n flowers, the ith flower will bloom in the bloomDay[i] and then can be used in exactly one bouquet. Return the minimum number of days you need to wait to be able to make m bouquets from the garden. If it is impossible to make m bouquets return -1.

    Examples:

    Input: bloomDay = [1,10,3,10,2], m = 3, k = 1
    Output: 3
    Explanation: Let's see what happened in the first three days. x means flower bloomed and _ means flower didn't bloom in the garden.
    We need 3 bouquets each should contain 1 flower.
    After day 1: [x, _, _, _, _]   // we can only make one bouquet.
    After day 2: [x, _, _, _, x]   // we can only make two bouquets.
    After day 3: [x, _, x, _, x]   // we can make 3 bouquets. The answer is 3.
    
    Input: bloomDay = [1,10,3,10,2], m = 3, k = 2
    Output: -1
    Explanation: We need 3 bouquets each has 2 flowers, that means we need 6 flowers. We only have 5 flowers so it is impossible to get the needed bouquets and we return -1.
    

    有了上面三道题目做基础之后,求解这道题应该很轻松了,题目中的单调性很明显:如果等待d天可以顺利制作m个花束的话,那么等待更久一定也可以顺利完成,我们需要搜索的是满足feasible函数的最小输入值。

    def minDays(bloomDay: List[int], m: int, k: int) -> int:
        def feasible(days) -> bool:
            bonquets, flowers = 0, 0
            for bloom in bloomDay:
                if bloom > days:
                    flowers = 0
                else:
                    bonquets += (flowers + 1) // k
                    flowers = (flowers + 1) % k
            return bonquets >= m
    
        if len(bloomDay) < m * k:
            return -1
        left, right = 1, max(bloomDay)
        while left < right:
            mid = left + (right - left) // 2
            if feasible(mid):
                right = mid
            else:
                left = mid + 1
        return left
    

    668. Kth Smallest Number in Multiplication Table [Hard]

    Nearly every one have used the Multiplication Table. But could you find out the k-th smallest number quickly from the multiplication table? Given the height m and the length n of a m * n Multiplication Table, and a positive integer k, you need to return the k-th smallest number in this table.

    乘法表

    Example :

    Input: m = 3, n = 3, k = 5
    Output: 3
    Explanation: 
    The Multiplication Table:
    1   2   3
    2   4   6
    3   6   9
    
    The 5-th smallest number is 3 (1, 2, 2, 3, 3).
    

    对于Kth Smallest类型的题目,最容易想到的解法是用Heap来做,维护一个Min Heap然后取出k次Heap的顶部即可。但在这道题目里不可行,因为我们只有乘法表的长和宽,而没有表格中的具体每个数字,如果用heap来做,我们需要计算出表格中的所有数字并全部保存到Heap里,这一操作的时间复杂度和空间复杂度均为O(mn),是很低效的做法。这个时候就该Binary Search登场了,还记得我们在上面介绍模板的时候提到,构造condition函数是最困难最巧妙的部分,为了从乘法表里找出第k小的数字,我们可以设计一个enough函数,任意给定一个数字num,判断乘法表里是否有k个数值小于等于num,这样一来,满足enough函数的最小输入值便是最终答案。我们说过,Binary Search的关键在于挖掘出单调性,这道题目的单调性在于,如果k满足enough函数,那么任意大于k的输入都能满足,这便是我们使用Binary Search的基础。

    很明显,我们的搜索空间的下界是乘法表的左上角(最小的数字,即1),上界是乘法表的右下角(最大的数字,即mn),于是空间为[1, mn]。Binary Search相比于Heap的优势在于,我们不需要explicitly地计算得到乘法表中的所有数字,我们只需要从搜索空间中挑选出一个数字,判断它是否满足enough函数,然后将整个搜索空间减少一半,直到搜索成功为止,这样一来空间复杂度仅为O(1),远远小于Heap解法。

    接下来我们想想如何高效实现enough函数,观察乘法表可以发现一条最基本的规律:每一行的数字都是这一行的索引的倍数,例如第3行的数字[3,6,9,12,15...]全都是3的倍数,模式很清晰。因此,要统计整个表格里小于等于num的数值有多少个,只需要逐行统计就够了。至此,我们就完成了套用Binary Search模板的所有步骤,这道题目也就完成了,算法的时间复杂度是O(m*log\ (m*n))

    def findKthNumber(m: int, n: int, k: int) -> int:
        def enough(num) -> bool:
            count = 0
            for val in range(1, m + 1):
                add = min(num // val, n)
                if add == 0:
                    break
                count += add
            return count >= k                
    
        left, right = 1, n * m
        while left < right:
            mid = left + (right - left) // 2
            if enough(mid):
                right = mid
            else:
                left = mid + 1
        return left 
    

    在上面的410题里,我们提问「Binary Search返回的结果一定能通过划分原数组为子数组得到吗?」;在这道题里,我们会有类似的困惑「Binary Search返回的结果一定会在乘法表里吗?」,在这道题目的LeetCode Discussion里不少人有相同的疑惑: Why it is guaranteed that the result from binary search is actually in the table?

    答案是肯定的,证明的逻辑和上面410题的证明逻辑相似,观察函数enough内部构造,输入Binary Search最终的返还结果num后函数将返回True。我们同样是使用反证法,假设num在乘法表中不存在,那么num对于所有\forall\ val \in [1, m]都无法整除,即num % val > 0,因此,函数输入num - 1必定也返回True,因为代码add = min(num // val, n)没有任何实际变化。但我们知道num是使得enough函数成立的最小输入值,于是有了矛盾,所以原假设不成立,我们证明了num必定在乘法表中存在。


    719. Find K-th Smallest Pair Distance [Hard]

    Given an integer array, return the k-th smallest distance among all the pairs. The distance of a pair (A, B) is defined as the absolute difference between A and B.

    Example :

    Input:
    nums = [1,3,1]
    k = 1
    Output: 0 
    Explanation:
    Following are all the pairs. The 1st smallest distance pair is (1,1), and its distance is 0.
    (1,3) -> 2
    (1,1) -> 0
    (3,1) -> 2
    

    和上面的668题很相似,都是要找出Kth Smallest,所以我们可以很模式化地设计一个enough函数,任意输入一个distance,判断原数组中是否至少有k个数对的距离小于等于distance。我们可以将原数组从小到大排列,用双指针来线性扫描这个数组,指针同时从左端出发,如果指针当前指向的数对距离小于等于distance,将快指针右移一个单位,显然此时两个指针之间的所有数对的距离都是小于等于distance的;如果当前数对距离大于distance,就将慢指针右移一个单位。当双指针都移动到右端时,扫描结束,统计pair总量是否达到k个。

    def enough(distance) -> bool:  # two pointers
        count, i, j = 0, 0, 0
        while i < n or j < n:
            while j < n and nums[j] - nums[i] <= distance:  # move fast pointer
                j += 1
            count += j - i - 1  # count pairs
            i += 1  # move slow pointer
        return count >= k
    

    显然,我们的搜索空间是[0, max(nums) - min(nums)]。现在,我们可以套用模板了,这一算法的时间复杂度是O(nlog\ S ), 其中Smax(nums) - min(nums)

    def smallestDistancePair(nums: List[int], k: int) -> int:
        nums.sort()
        n = len(nums)
        left, right = 0, nums[-1] - nums[0]
        while left < right:
            mid = left + (right - left) // 2
            if enough(mid):
                right = mid
            else:
                left = mid + 1
        return left
    

    1201. Ugly Number III [Medium]

    Write a program to find the n-th ugly number. Ugly numbers are positive integers which are divisible by a or b or c.

    Example :

    Input: n = 3, a = 2, b = 3, c = 5
    Output: 4
    Explanation: The ugly numbers are 2, 3, 4, 5, 6, 8, 9, 10... The 3rd is 4.
    
    Input: n = 4, a = 2, b = 3, c = 4
    Output: 6
    Explanation: The ugly numbers are 2, 3, 4, 6, 8, 9, 10, 12... The 4th is 6.
    

    这道题和上面的题目大同小异,找出第n小的ugly number,我们构造一个enough函数,任意输入一个值num,判断是否存在n个ugly number小于等于num。注意到,一个数字只要能被a,b,c中的任意一个整除就满足ugly number定义,而a, b, c三者可能存在倍数关系,所以我们需要借助最大公约数来避免统计duplicate的数字。

    def nthUglyNumber(n: int, a: int, b: int, c: int) -> int:
        def enough(num) -> bool:
            total = mid//a + mid//b + mid//c - mid//ab - mid//ac - mid//bc + mid//abc
            return total >= n
    
        ab = a * b // math.gcd(a, b)
        ac = a * c // math.gcd(a, c)
        bc = b * c // math.gcd(b, c)
        abc = a * bc // math.gcd(a, bc)
        left, right = 1, 10 ** 10
        while left < right:
            mid = left + (right - left) // 2
            if enough(mid):
                right = mid
            else:
                left = mid + 1
        return left
    

    1283. Find the Smallest Divisor Given a Threshold [Medium]

    Given an array of integers nums and an integer threshold, we will choose a positive integer divisor and divide all the array by it and sum the result of the division. Find the smallest divisor such that the result mentioned above is less than or equal to threshold.

    Each result of division is rounded to the nearest integer greater than or equal to that element. (For example: 7/3 = 3 and 10/2 = 5). It is guaranteed that there will be an answer.

    Example :

    Input: nums = [1,2,5,9], threshold = 6
    Output: 5
    Explanation: We can get a sum to 17 (1+2+5+9) if the divisor is 1. 
    If the divisor is 4 we can get a sum to 7 (1+1+2+3) and if the divisor is 5 the sum will be 5 (1+1+1+2). 
    

    已经讲解了这么多道题目,做这道题就像砍瓜切菜一样简单了,甚至都不需要费脑去构思condition函数,因为题目已经明白无误地告诉我们需要满足的条件是什么了。

    def smallestDivisor(nums: List[int], threshold: int) -> int:
        def condition(divisor) -> bool:
            return sum((num - 1) // divisor + 1 for num in nums) <= threshold
    
        left, right = 1, max(nums)
        while left < right:
            mid = left + (right - left) // 2
            if condition(mid):
                right = mid
            else:
                left = mid + 1
        return left
    

    最后

    可以看到,上面每一道题目的解法用的都是同一套模板,没有例外,表明了这套模板的强大和普适性。但是我们还是需要通过大量练习来培养Binary Search的手感,培养自己挖掘题目中的单调性和构造condition函数的能力——这个函数的好坏决定了算法最终的复杂度。

    参考资料

    相关文章

      网友评论

          本文标题:Binary Search模板总结与LeetCode经典范题

          本文链接:https://www.haomeiwen.com/subject/hhdjrktx.html