Leetcode #4. Median of Two Sorte

作者: Omega_Ariston | 来源:发表于2017-12-02 13:10 被阅读29次

Description

There are two sorted arrays nums1 and nums2 of size m and n respectively.

Find the median of the two sorted arrays. The overall run time complexity should be O(log (m+n)).

Example 1

nums1 = [1, 3]

nums2 = [2]

The median is 2.0

Example 2

nums1 = [1, 2]

nums2 = [3, 4]

The median is (2 + 3)/2 = 2.5

初始分析: O(m+n)

首先，分析题目要求，该函数：输入参数为两个有序数组，输出要求为这两个数组内所有数字集的中位数。

*额外要求：算法的时间复杂度小于等于 O(log (m+n)).

先撇开额外要求不谈，碰到两个有序数组的合并，我首先想到的是归并排序(Merge Sort)的思想。归并排序本身就是不断递归地分割原始数组再进行子数组排序，最后合并这些有序子数组。而此处我只需要做有序数组的合并。


class Solution {
    public double findMedianSortedArrays(int[] nums1, int[] nums2) {
        int[] sorted = mergeSort(nums1, nums2);
        if((sorted.length&1)==1){
            return sorted[sorted.length/2];
        }else{
            return (sorted[sorted.length/2 - 1] + sorted[sorted.length/2])/2.0;
        }
    }
    
    public int[] mergeSort(int[] a, int[] b){
        int[] output = new int[a.length + b.length];
        int lt = 0;
        int rt = 0;
        int index = 0;
        while(lt<a.length && rt<b.length){
            if(a[lt]<b[rt]){
                output[index++] = a[lt++];
            }else{
                output[index++] = b[rt++];
            }
        }
        if(lt!=a.length){
            System.arraycopy(a, lt, output, lt+rt, a.length-lt);
        }else{
            System.arraycopy(b, rt, output, lt+rt, b.length-rt);
        }
        return output;
    }
}

基于这个想法的代码大概长这样(有可以改进的地方欢迎提出)。
时间复杂度为O(m+n)，能被AC，打败了40%的java算法。

深度分析: O(log(min(m,n)))，思路参考LeetCode网友MissMary

首先理解中位数的代数意义：

中位数, 统计学中的专有名词，代表一个样本、种群或概率分布中的一个数值，其可将数值集合划分为长度相等的上下两部分。

关键在于后半句：将数值集合划分为长度相等的上下两部分
除此之外，对于划分后的两组数字，其中一组内的数字总是大于另一组内的数字。

举个例子，先把第一个数组（称为数组A）在随机位置 i 处分割开：

左半边                    |  右半边
A[0], A[1],....,A[i-1]    |  A[i], A[i+1],...,A[m-1]

同样地，在随机位置 j 处将数组B分割开：

左半边                    |  右半边
B[0], B[1],....,B[j-1]    |  B[j], B[j+1],...,B[n-1]

将A、B数组的左半边放到一起，右半边放到一起，可得：

左半边                    |  右半边
A[0], A[1],....,A[i-1]    |  A[i], A[i+1],...,A[m-1]
B[0], B[1],....,B[j-1]    |  B[j], B[j+1],...,B[n-1]

如果这样的分组能满足下面两个条件：

1. len(左半边)==len(右半边)
2. max(左半边)<=min(右半边)

就说明我们就成功的将{A,B}中的所有元素分割成了等长的两部分，且左半边的数字总小于右半边的数字。
这时，中位数可以很轻易地通过 Median = (max(左半边)+min(右半边))/2 计算出来。

为了满足这两个条件，我们只需要保证：

1). 左右两部分等长：
    i + j == m - i + n - j (或者: m - i + n - j + 1)
    假设 n >= m, 只需要让: i = 0 ~ m, j = (m + n + 1)/2 - i
2). B[j-1] <= A[i] and A[i-1] <= B[j]

为什么需要假设n>=m？因为如果m>n，则B数组的下标索引 j 有可能变成负数。
对于 i 和 j 等于0的边缘情况，留到后面讨论
所以，简而言之，程序需要做的仅仅是：

在[0, m]中寻找合适的分割点 i , 使得:
    B[j-1] <= A[i] and A[i-1] <= B[j], ( 其中 j = (m + n + 1)/2 - i )

这个过程可以通过二分查找法提高效率：

<1> 设 imin = 0, imax = m, 然后在 [imin, imax] 中进行查找

<2> 设 i = (imin + imax)/2, j = (m + n + 1)/2 - i

<3> 接下来有三种情况:
    <a> B[j-1] <= A[i] and A[i-1] <= B[j]
        //找到了合适的分割点 i， 停止搜索
    <b> B[j-1] > A[i]
        //A[i] 的值偏小. 需要调整 i 使得 `B[j-1] <= A[i]`.
        //此处应该增大 i 值，因为当 i 值增大时, j 值会减小.
        //因此B[j-1]会随A[i]的增大而减小, 更容易满足 `B[j-1] <= A[i]`
        //所以应将搜索范围调整为 [i+1, imax]. 即: 使 imin = i+1, 然后回到步骤 <2>.
    <c> A[i-1] > B[j]
       //与前一种情况的处理方式相反，即: 使 imax = i-1，然后回到步骤<2>.

当找到合适的分割点 i 后，易知中位数为：

当 m + n 为奇数：max(A[i-1], B[j-1])
当 m + n 为偶数：(max(A[i-1], B[j-1]) + min(A[i], B[j]))/2

再来考虑边缘情况，即：当 i=0,i=m,j=0,j=n 时 A[i-1],B[j-1],A[i],B[j] 有可能不存在的情况：
其实这些情况非常容易分析，因为我们所要满足的仅仅是 max(左半边) <= min(右半边) 这个条件，假设A[i-1],B[j-1],A[i],B[j]都存在，则该条件可表示为 B[j-1] <= A[i] and A[i-1] <= B[j]。如果有些值不存在的话则完全不用去判断，比如我们假设 i=0，则 A[i-1]不存在，我们便不用判断 A[i-1] <= B[j]这个条件，因为此时数组A不存在左半边。所以，考虑边缘情况后，搜索思路大致为：

在 [0, m] 中寻找合适的分割点 i， 使得:
    (j == 0 or i == m or B[j-1] <= A[i]) and
    (i == 0 or j == n or A[i-1] <= B[j])，
    其中 j = (m + n + 1)/2 - i

在搜索的过程中，会出现下面三种情况：

<a> (j == 0 or i == m or B[j-1] <= A[i]) and
    (i == 0 or j = n or A[i-1] <= B[j])
    //找到合适的 i 值，停止搜索

<b> j > 0 and i < m and B[j - 1] > A[i]
    // i 值偏小，需要增加 i 值

<c> i > 0 and j < n and A[i - 1] > B[j]
    // i 值偏大，需要减小 i 值

其中<b>和<c>的判断条件可以进一步简化为

<b> i < m and B[j - 1] > A[i]
<c> i > 0 and A[i - 1] > B[j]

因为当 i < m时，j一定大于0，且 i>0 时，j一定小于n，推导过程为：

m <= n, i < m ==> j = (m+n+1)/2 - i > (m+n+1)/2 - m >= (2*m+1)/2 - m >= 0    
m <= n, i > 0 ==> j = (m+n+1)/2 - i < (m+n+1)/2 <= (2*n+1)/2 <= n

基于上述的思路的Java代码如下：

class Solution {
    public double findMedianSortedArrays(int[] nums1, int[] nums2) 
    {
        int m = nums1.length;
        int n = nums2.length;
        if(m>n)
        {
            int[] tmp = nums1;
            nums1 = nums2;
            nums2 = tmp;
            m=m^n;
            n=m^n;
            m=m^n;
        } //交换m与n的值，以及nums1和nums2的引用
        int imin = 0;
        int imax = m;
        int half = (m+n+1)>>1;
        
        while(imin<=imax)
        {
            int i = (imin + imax)>>1;
            int j = half - i;
            
            if(i<m && nums2[j-1]>nums1[i])
                imin = ++i; //i值偏小
            else if(i>0 && nums1[i-1]>nums2[j])
                imax = --i; //i值偏大
            else //完美情况
            {
                int max_left = 0;
                
                if(i==0)
                    max_left = nums2[j-1];
                else if(j==0)
                    max_left = nums1[i-1];
                else
                    max_left = Math.max(nums1[i-1], nums2[j-1]);
                
                if(((m+n)&1)==1)
                    return max_left; //m+n为奇数的情况中位数为max_left
                
                int min_right = 0;
                
                if(i==m)
                    min_right = nums2[j];
                else if(j==n)
                    min_right = nums1[i];
                else
                    min_right = Math.min(nums1[i], nums2[j]);
                return (max_left+min_right)/2.0;  //m+n为偶数的情况中位数为(max_left+min_right)/2
            }
        }
        return -1;
    }
}

上述算法的时间复杂度为O(log(min(m,n))), 代码部分有可以改进的地方欢迎提出。

网友评论

本文标题：Leetcode #4. Median of Two Sorte

本文链接：https://www.haomeiwen.com/subject/okhnbxtx.html

延伸阅读

深度阅读

您也可以注册成为美文阅读网的作者，发表您的原创作品、分享您的心情！