美文网首页
4. Median of Two Sorted Arrays

4. Median of Two Sorted Arrays

作者: CharlieGuo | 来源:发表于2017-07-14 21:10 被阅读0次

    Description:

    There are two sorted arrays nums1 and nums2 of size m and n respectively.
    Find the median of the two sorted arrays. The overall run time complexity should be O(log (m+n)).

    Example 1:

    nums1 = [1, 3]
    nums2 = [2]
    The median is 2.0
    

    Example 2:

    nums1 = [1, 2]
    nums2 = [3, 4]
    The median is (2 + 3)/2 = 2.5
    

    Solution:

    What is a median?

    "The middle number in a given sequence of numbers, taken as the average of the two middle numbers when the sequence has an even number of numbers:
    4 is the median of 1, 3, 4, 8, 9." (found in dictionary.com)
    If a sequence is 1, 3, 4, 5, 8, 9, it's median is (4+5)/2=4.5.

    So we must sort the sequence if it is unordered to find the median.

    First let's see how we can find the median of one array and then we deal with two arrays.

    Find the median of one sorted array

    Let's say that "even case" is that the length of the array is even and "odd case" is that the sum length of the array is odd.
    In even case, the median is the average value of the middle 2 elements and in odd case, it is just the value of the middle element.

    • even case
      Since the array is sorted, we just need to find the middle 2 elements. Preparing for the 2 arrays problem, we can split the array into 2 equal length parts. This way, all the elements in the left part are no greater than right. For example, if array x has 6 elements, it can be split as:
                     left       right
    x    x[0] x[1] x[2]       x[3] x[4] x[5]
    

    Then the median is (x[2] + x[3])/2. So in even case, if an array's length is n, the median is (x[n/2]+x[n/2+1])/2.
    Meanwhile, x[n/2] is the first element of the right subset, and x[n/2-1] is the last element of the left subset.

    • odd case
      In odd case, the array cannot be split into 2 length-equal subsets. Say that array x has 7 elements, it can be split as:
                     left       right
    x    x[0] x[1] x[2]       x[3] x[4] x[5] x[6]
    

    But all the elements in the left part is still no greater than right. In this case, the median is x[3]. And clearly we know x[3] is the first element of the right subset.
    However, we can just use x[3] = x[7/2] to get the median. So in odd case, if an array's length is n, the median is x[n/2].
    Meanwhile, x[n/2] is the first element of the right subset and x[n/2-1] is the last element of the left subset.

    Find the median of 2 sorted array

    Let's say the "even case" is that the sum length of the arrays is even and the same meaning with the "odd case".
    From the above we know that if we can split an array into 2 subsets that is equal or "nearly equal", the median can be easily gotten. In the same way, we can split each array into 2 subsets.
    We assume the arrays are x and y. After split, x and y are as follows:

                   left   right
    x    x[0]... x[i-1]   x[i]... x[n-1]
    y    y[0]... y[j-1]   y[j]... y[m-1]
    

    To find the median, there are 2 conditions to be met.

    1. the sum length of the left is equal or "nearly equal" to the right.
    2. all the elements in the left subset are no greater than the right

    This way we can get the median by comparing or calculating with x[i-1], x[i], y[j-1], y[j]. We'll talk about it later.

    To meet the first condition, we calculate the sum length of the subsets is i+j, the right n-i+m-j. If left length is equal to right, then i+j = n-i+m-j.
    So i can be expressed as

    i = (m+n)/2 - j
    

    But how do we split the 2 arrays. At first we just let

    lo = 0;
    hi = m;
    j = (lo + hi)/2 = m/2;
    

    and then we can get i through that equation. One more thing to note is that n >= m is a must. This is because:

    i = (m+n)/2 - j = (m+n)/2 - m/2 = (n-m)/2
    

    If n < m, i will be negative. This will cause ArrayIndexOutOfBoundaryException. To deal with this issue, we can just change x and y and call the function again if n < m. That is :

    if (n < m) return f(y, x);
    

    To meet the second condition, we let xl = x[i-1], xh = [xi], yl = y[j-1], yh = y[j]. Since xl < xh and yl < yh are definitely correct. We just need
    xl < yh and yl < xh. If these two cannot be met, we need to use binary search to adjust the search range. We woulddiscuss these cases in 3 parts:

    1. if xl < yh and yl < xh
      This means the median is found. To find the median, however, the method of the even case is a little different from the odd case.
    • even case
      We have to find the 2 middle elements and calculate the average. That is, we have to find the maximum element in the left subset and the minimum elements in the right subset.
      As the arrays are sorted and split, this can be down by following steps:
    leftMax = max(xl, yl);
    minRight = min(xh, yh);
    median = (leftMax + minRight)/2;
    
    • odd case
      As discussed in the one sorted array case, the middle element is the median, which is the same in the two sorted arrays. Now, the key becomes how can we know which element is the middle element? Can we make sure that the middle element is the one in [xl, yl, xh, yr]? More discussion is needed.
      Calling that at first j = m/2 and i = (m+n)/2 - j, we can use two example to make it simple.
      • x's length is odd and y's length is even
        That is n is odd and m is even. So m%2 = 0 and y can be perfectly split into 2 length-equal subset. However, i = (m+n)/2 - j = (m+n)/2 - m/2 = n/2. Since n is odd, so n/2 actually means (n-1)/2, which would lead the right subset of y is longer than the left (for example, n = 5, then n/2 = (n-1)/2 = 2/2 = 2, so the left subset is [x[0], x[1]] and the right is [x[2], x[3], x[4]]). The split result can be shown as following:
                                        left    right
    x   x[0] ... x[(n-1)/2-1](x[i-1])    (x[i])x[(n-1)/2] ... x[n-1]
    y   y[0] ... y[m/2-1](y[j-1])        (y[j])y[m/2] ... y[m-1]
               \_________________________/      \________________________/
                 left_length = (m+n-1)/2         right_length = (m+n+1)/2
    

    Then right_length - left_length = 1. As discussed above, the middle element is the minimum element of the right subset. So,

    median = min(xh, yh)
    
    - `x`'s length is odd and `y`'s length is even
    

    I'll skip because the result is also

    median = min(xh, yh)
    

    You can do it yourself.

    1. if xl > yh
      It means that y[j] is too small, we need to split y in a way which make s the right subset shorter and left subset longer so that y[j] can be big enough, in the same way i will decrease and x[i-1] will be smaller. How can we do that?
      We can use binary search method. Since j = (lo + hi)/2, let lo = j+1, then j would be bigger and y[j] is too.

    2. if yl > xh
      It means that y[j-1] is too big, in contrast we set hi = j-1 and calculate j again to make y[j-1] smaller.

    By updating lo and hi and then checking if xl < yh and yl < xh are met. We can finally find the median.

    More discussion: boundary condition

    From the above we know j and i can be change if some condition can not be met. Meanwhile, xl = x[i-1], xh = x[i], yl = y[j-1], yh = y[j]. Here comes the problem: what if i = 0 or n, j = 0 or m ? This would make x[-1], x[n], y[-1], y[m] senseless.
    A little trick could handle this. We just need to extend both x and y, which means x[-1] = y[-1] = Integer.MIN_VALUE and x[n] = y[m] = Integer.MAX_VALUE. The sample code is:

    xl = (i == 0 ? Integer.MIN_VALUE : x[i-1]);
    xh = (i == n ? Integer.MAX_VALUE : x[i]);
    yl = (j == 0 ? Integer.MIN_VALUE : y[j-1]);
    yh = (j == m ? Integer.MAX_VALUE : y[j]);
    

    At last, an accepted code is posted:

    public class Solution {
        public double findMedianSortedArrays(int[] x, int[] y) {
            int n = x.length, m = y.length;
    
            // must note that 'return' is needed or the programme 
            // would continue if n < m and this will cause the 
            // programme to fail
            if (n < m) return findMedianSortedArrays(y, x);
            
            // set the initial lo and hi
            int lo = 0, hi = m;
            while (lo <= hi) {
                int j = (lo+hi)/2, i = (m+n)/2-j; // i+j == n-i+m-j
                
                // note that how to deal with the boundary condition
                int xl = (i == 0 ? Integer.MIN_VALUE : x[i-1]);
                int xh = (i == n ? Integer.MAX_VALUE : x[i]);
                int yl = (j == 0 ? Integer.MIN_VALUE : y[j-1]);
                int yh = (j == m ? Integer.MAX_VALUE : y[j]);
                
                if (xl > yh) lo = j+1; // y[j] too small, j need to be greater 
                else if (yl > xh) hi = j-1; // y[j-1] too big, j need to be smaller
                else {
                    if ((m+n)%2 == 1) return xh < yh ? xh : yh; // odd case
                    int leftMax = xl > yl ? xl : yl;
                    int rightMin = xh < yh ? xh : yh;
                    return (double)(leftMax + rightMin)/2; // even case
                }
            }
            return -1; // actually never execute
        }
    }
    

    相关文章

      网友评论

          本文标题:4. Median of Two Sorted Arrays

          本文链接:https://www.haomeiwen.com/subject/veenhxtx.html