Kth Quantiles. Θ(nlgk)

作者: R0b1n_L33 | 来源:发表于2018-05-14 21:48 被阅读67次

The kth quantiles of an n-element set are the k-1 order statistics that divide the sorted set into k equal-sized sets (to within 1). Give an O(nlgk)-time algorithm to list the kth quantiles of a set.

I did a lot of jobs to figure out what the hell the question is actually talking about. In fact, there would have been k groups fragments in an n-size set/list which are divided by k-1 order statistics named quantiles after evaluating one's algorithm. At first, the set/list is unsorted and it can be hardly unsorted after execution.

So hereafter I make up 2 functions - k_quantiles_sorted_n(which costs Θ(n)) & k_quantiles_sorted_lgk(which takes Θ(lgk)) - to solve the already-sorted-array problems. In these cases, I just work out an array of indices which are on behave of the positions of dividers.

In the end, I take out the original problem with function k_quantiles_unsorted_nlgk and make it by recursion which consumes Θ(nlgk). As you'll see below, the gross problem is separated into 2 subroutines, each of n(1-1/k)/2 to n/2 scale.

This works homogeneous with the same size of group when the number of elements (aka n) is ak+k−1 for a positive integer a. Once it were a different number, the interval among groups would be heterogeneous. some additive care with rounding needs to be taken in order to avoid creating two segments that differ by more than 1.

The inherent consumption takes place in the select function which takes up to Θ(n) time. The base case is that the subproblem is divided into a scale of T(n/k) with a item of Θ(1) so that you could immediately return. The recursion here is

T(n) <= 2T(n/2) + Θ(n) <= 4T(n/4) + 2Θ(n) <= 8T(n/8) + 3Θ(n) ...
<= kT(n/k) + lgkΘ(n) = Θ(nlgk)

It would be nice to illustrate by a binary recursion tree with a depth of lgk and leaves of T(n/k). As we could see, the last subproblem scale determines the log part in final Θ(nlgk), which reflected by the depth of the recursion tree. Furthermore, we could find that if the last problem scale is Θ(1), that is n/n, then the consumption goes up to Θ(nlgn), which means the groups count will be the last divisor. The less groups we divide at last, the less log part we spend.

i.e. Note that the slice of array in python is returning a new array(aka slice for copy) which doesn't correspond to Swift. I've spent a lot of time getting rid of the startIndex & endIndex parameters using slice of array. But it all went in vain as a result since we cannot regard the slice as a reference of original array partially. Probably it'd take effect with memoryview or something which is quite annoying.

As we used to say, unit tests run first.
Test Driven Code:

import unittest

class TestCase(unittest.TestCase):

    def setUp(self):
        self.maxDiff = None

    def test_k_quantiles_sorted(self):
        for j in range(0, 100):
            n = randint(10, 20)
            k = randint(0, n+1)
            # l = k_quantiles_sorted_n(n, k, 0)
            l = k_quantiles_sorted_lgk(n, k)
            count = len(l)
            self.assertGreaterEqual(count, 0)
            if count == 0:
                continue
            sentinel_1 = l[0]
            sentinel_2 = -1
            self.assertLess(sentinel_1, n) 
            for i in range(1, count+1):
                scope = 0
                if i == count:
                    scope = n - l[i-1] - 1
                else:
                    self.assertLess(l[i], n) 
                    scope = l[i] - l[i-1] - 1
                difference = abs(sentinel_1 - scope)
                self.assertLessEqual(difference, 1)
                if difference == 0:
                    continue
                else:
                    if sentinel_2 < 0:
                        sentinel_2 = difference
                    else:
                        self.assertEqual(sentinel_2, difference)

    def test_k_quantiles_unsorted_nlgk(self):
        for j in range(0, 100):
            n = randint(50, 100)
            k = randint(0, n+1)
            l = list(range(0, n))
            quantiles = k_quantiles_unsorted_nlgk(l, 0, len(l)-1, k)
            count = len(quantiles)
            self.assertGreaterEqual(count, 0)
            if count == 0:
                continue
            dividor = k - 1
            remain = n - dividor 
            base = remain // k
            additive = base + 1
            additiveGroup = remain % k
            baseGroup = k - additiveGroup
            current = -1
            quantiles.append(n)
            for i in range(0, len(quantiles)):
                scope = quantiles[i]-current-1
                current = quantiles[i]
                self.assertTrue(scope==base or scope==additive)

                if scope == base:
                    baseGroup -= 1
                elif scope == additive:
                    additiveGroup -= 1
            self.assertTrue(baseGroup==0 and additiveGroup==0)


if __name__ == '__main__':
    unittest.main()

Then it goes with source code:

def k_quantiles_sorted_n(n:int, k:int, base:int) -> []:
    if k <= 1 or k > n+1:
        return[]
    remainder = k % 2
    if remainder:
        midst = n // k
        remains = n-midst-2
        n_left = remains // 2
        n_right = remains - n_left
        k_child = (k-1) // 2
        left = base + n_left
        right = base + n - n_right - 1
        return k_quantiles_sorted_n(n_left, k_child, base) + \
        [left, right] + k_quantiles_sorted_n(n_right, k_child, right+1)
    else:
        k_child = k // 2
        n_left = n // 2
        n_right = n - n_left - 1
        halver = base + n_left
        return k_quantiles_sorted_n(n_left, k_child, base) + \
        [halver] + k_quantiles_sorted_n(n_right, k_child, halver+1)

def k_quantiles_sorted_lgk(n:int, k:int) -> []:
    if k < 2 or k > n+1:
        return[]
    dividor = k - 1
    remain = n - dividor 
    base = remain // k
    additiveGroup = remain % k
    additiveDividor = additiveGroup - 1
    baseGroup = k - additiveGroup
    baseDividor = baseGroup - 1
    if additiveGroup > 0:
        baseDividor += 1
    quantiles = []
    current = -1
    for i in range(0, baseDividor):
        current += base+1
        quantiles.append(current)
    for j in range(0, additiveDividor):
        current += base+2
        quantiles.append(current)
    return quantiles

def k_quantiles_unsorted_nlgk(array:[], startIndex:int, endIndex:int, k:int) -> []:
    n = endIndex - startIndex + 1
    if k < 2 or k > n+1:
        return[]
    k_child = k // 2
    remainder = k % 2
    if remainder:
        midstCount = n // k
        remainCount = n - midstCount - 2
        leftCount = remainCount // 2
        rightCount = remainCount - leftCount
        leftIndex = startIndex + leftCount
        rightIndex = leftIndex + midstCount + 1
        localLeftIndex = leftCount
        localRightIndex = midstCount
        left = select(array, startIndex, endIndex, localLeftIndex)
        right = select(array, leftIndex+1, endIndex, localRightIndex)
        return k_quantiles_unsorted_nlgk(array, startIndex, leftIndex-1, k_child) + \
        [left, right] + k_quantiles_unsorted_nlgk(array, rightIndex+1, endIndex, k_child)
    else:
        leftCount = n // 2
        halverIndex = startIndex + leftCount
        localHalverIndex = leftCount
        halver = select(array, startIndex, endIndex, localHalverIndex)
        return k_quantiles_unsorted_nlgk(array, startIndex, halverIndex-1, k_child) + \
        [halver] + k_quantiles_unsorted_nlgk(array, halverIndex+1, endIndex, k_child)

网友评论

本文标题：Kth Quantiles. Θ(nlgk)

本文链接：https://www.haomeiwen.com/subject/ctzldftx.html

延伸阅读

深度阅读

您也可以注册成为美文阅读网的作者，发表您的原创作品、分享您的心情！

Kth Quantiles. Θ(nlgk)

相关文章

网友评论

延伸阅读

深度阅读

栏目导航

热点阅读