美文网首页
Kth Quantiles. Θ(nlgk)

Kth Quantiles. Θ(nlgk)

作者: R0b1n_L33 | 来源:发表于2018-05-14 21:48 被阅读67次

    The kth quantiles of an n-element set are the k-1 order statistics that divide the sorted set into k equal-sized sets (to within 1). Give an O(nlgk)-time algorithm to list the kth quantiles of a set.

    I did a lot of jobs to figure out what the hell the question is actually talking about. In fact, there would have been k groups fragments in an n-size set/list which are divided by k-1 order statistics named quantiles after evaluating one's algorithm. At first, the set/list is unsorted and it can be hardly unsorted after execution.

    So hereafter I make up 2 functions - k_quantiles_sorted_n(which costs Θ(n)) & k_quantiles_sorted_lgk(which takes Θ(lgk)) - to solve the already-sorted-array problems. In these cases, I just work out an array of indices which are on behave of the positions of dividers.

    In the end, I take out the original problem with function k_quantiles_unsorted_nlgk and make it by recursion which consumes Θ(nlgk). As you'll see below, the gross problem is separated into 2 subroutines, each of n(1-1/k)/2 to n/2 scale.

    This works homogeneous with the same size of group when the number of elements (aka n) is ak+k−1 for a positive integer a. Once it were a different number, the interval among groups would be heterogeneous. some additive care with rounding needs to be taken in order to avoid creating two segments that differ by more than 1.

    The inherent consumption takes place in the select function which takes up to Θ(n) time. The base case is that the subproblem is divided into a scale of T(n/k) with a item of Θ(1) so that you could immediately return. The recursion here is

    T(n) <= 2T(n/2) + Θ(n) <= 4T(n/4) + 2Θ(n) <= 8T(n/8) + 3Θ(n) ...
    <= kT(n/k) + lgkΘ(n) = Θ(nlgk)

    It would be nice to illustrate by a binary recursion tree with a depth of lgk and leaves of T(n/k). As we could see, the last subproblem scale determines the log part in final Θ(nlgk), which reflected by the depth of the recursion tree. Furthermore, we could find that if the last problem scale is Θ(1), that is n/n, then the consumption goes up to Θ(nlgn), which means the groups count will be the last divisor. The less groups we divide at last, the less log part we spend.

    i.e. Note that the slice of array in python is returning a new array(aka slice for copy) which doesn't correspond to Swift. I've spent a lot of time getting rid of the startIndex & endIndex parameters using slice of array. But it all went in vain as a result since we cannot regard the slice as a reference of original array partially. Probably it'd take effect with memoryview or something which is quite annoying.

    As we used to say, unit tests run first.
    Test Driven Code:

    import unittest
    
    class TestCase(unittest.TestCase):
    
        def setUp(self):
            self.maxDiff = None
    
        def test_k_quantiles_sorted(self):
            for j in range(0, 100):
                n = randint(10, 20)
                k = randint(0, n+1)
                # l = k_quantiles_sorted_n(n, k, 0)
                l = k_quantiles_sorted_lgk(n, k)
                count = len(l)
                self.assertGreaterEqual(count, 0)
                if count == 0:
                    continue
                sentinel_1 = l[0]
                sentinel_2 = -1
                self.assertLess(sentinel_1, n) 
                for i in range(1, count+1):
                    scope = 0
                    if i == count:
                        scope = n - l[i-1] - 1
                    else:
                        self.assertLess(l[i], n) 
                        scope = l[i] - l[i-1] - 1
                    difference = abs(sentinel_1 - scope)
                    self.assertLessEqual(difference, 1)
                    if difference == 0:
                        continue
                    else:
                        if sentinel_2 < 0:
                            sentinel_2 = difference
                        else:
                            self.assertEqual(sentinel_2, difference)
    
        def test_k_quantiles_unsorted_nlgk(self):
            for j in range(0, 100):
                n = randint(50, 100)
                k = randint(0, n+1)
                l = list(range(0, n))
                quantiles = k_quantiles_unsorted_nlgk(l, 0, len(l)-1, k)
                count = len(quantiles)
                self.assertGreaterEqual(count, 0)
                if count == 0:
                    continue
                dividor = k - 1
                remain = n - dividor 
                base = remain // k
                additive = base + 1
                additiveGroup = remain % k
                baseGroup = k - additiveGroup
                current = -1
                quantiles.append(n)
                for i in range(0, len(quantiles)):
                    scope = quantiles[i]-current-1
                    current = quantiles[i]
                    self.assertTrue(scope==base or scope==additive)
    
                    if scope == base:
                        baseGroup -= 1
                    elif scope == additive:
                        additiveGroup -= 1
                self.assertTrue(baseGroup==0 and additiveGroup==0)
    
    
    if __name__ == '__main__':
        unittest.main()
    

    Then it goes with source code:

    def k_quantiles_sorted_n(n:int, k:int, base:int) -> []:
        if k <= 1 or k > n+1:
            return[]
        remainder = k % 2
        if remainder:
            midst = n // k
            remains = n-midst-2
            n_left = remains // 2
            n_right = remains - n_left
            k_child = (k-1) // 2
            left = base + n_left
            right = base + n - n_right - 1
            return k_quantiles_sorted_n(n_left, k_child, base) + \
            [left, right] + k_quantiles_sorted_n(n_right, k_child, right+1)
        else:
            k_child = k // 2
            n_left = n // 2
            n_right = n - n_left - 1
            halver = base + n_left
            return k_quantiles_sorted_n(n_left, k_child, base) + \
            [halver] + k_quantiles_sorted_n(n_right, k_child, halver+1)
    
    def k_quantiles_sorted_lgk(n:int, k:int) -> []:
        if k < 2 or k > n+1:
            return[]
        dividor = k - 1
        remain = n - dividor 
        base = remain // k
        additiveGroup = remain % k
        additiveDividor = additiveGroup - 1
        baseGroup = k - additiveGroup
        baseDividor = baseGroup - 1
        if additiveGroup > 0:
            baseDividor += 1
        quantiles = []
        current = -1
        for i in range(0, baseDividor):
            current += base+1
            quantiles.append(current)
        for j in range(0, additiveDividor):
            current += base+2
            quantiles.append(current)
        return quantiles
    
    def k_quantiles_unsorted_nlgk(array:[], startIndex:int, endIndex:int, k:int) -> []:
        n = endIndex - startIndex + 1
        if k < 2 or k > n+1:
            return[]
        k_child = k // 2
        remainder = k % 2
        if remainder:
            midstCount = n // k
            remainCount = n - midstCount - 2
            leftCount = remainCount // 2
            rightCount = remainCount - leftCount
            leftIndex = startIndex + leftCount
            rightIndex = leftIndex + midstCount + 1
            localLeftIndex = leftCount
            localRightIndex = midstCount
            left = select(array, startIndex, endIndex, localLeftIndex)
            right = select(array, leftIndex+1, endIndex, localRightIndex)
            return k_quantiles_unsorted_nlgk(array, startIndex, leftIndex-1, k_child) + \
            [left, right] + k_quantiles_unsorted_nlgk(array, rightIndex+1, endIndex, k_child)
        else:
            leftCount = n // 2
            halverIndex = startIndex + leftCount
            localHalverIndex = leftCount
            halver = select(array, startIndex, endIndex, localHalverIndex)
            return k_quantiles_unsorted_nlgk(array, startIndex, halverIndex-1, k_child) + \
            [halver] + k_quantiles_unsorted_nlgk(array, halverIndex+1, endIndex, k_child)
    

    相关文章

      网友评论

          本文标题:Kth Quantiles. Θ(nlgk)

          本文链接:https://www.haomeiwen.com/subject/ctzldftx.html