295 Find Median from Data Stream

题目

输入一个数据流，求这个数据流的中位数median。中位数的意思就是对数据进行排序，如果总数是奇数个，那么就是最中间的数，如果是偶数个，那么就是中间两个数的平均值。

例子

addNum(1)
addNum(2)
findMedian() -> 1.5
addNum(3)
findMedian() -> 2

方法

Simple Sorting

最直接的方法就是有序的存住目前为止的数，用一个PriorityQueue可以实现。
查找的时候，扫描PriorityQueue取中间值。
时间复杂度：O(NlogN)
空间复杂度：O(N)

Insertion Sort

当一个序列是有序的，新加入一个数，如果还想保持有序，可以用Insertion Sort。
Insertion Sort需要我们找到新数需要插入的位置，我们可以用Binary Search实现（时间复杂度：O(logN)）。
查找的时候，在有序序列中取中间值。
时间复杂度：O(logN)+O(N) = O(N)
空间复杂度：O(N)
在Insertion Sort的实现中，如果用LinkedList，那么插入可为O(1)，查找位置是O(N)。如果用ArrayList，那么插入可为O(N) （因为需要把插入处的元素往后移动），查找位置是O(logN)。所以总的时间复杂度为O(N)。

[Best] Two Heaps
前两种方法都保证了整体序列有序，但其实我们只想要Median，不一定要整体有序。

维护两个PriorityQueue：一个存小数maxHeap，一个存大数minHeap，并且尽量维持左右PriorityQueue的size大小相同，或者小的那个size比大的大1。
当新的元素x进来，比较maxHeap里面的最大值，minHeap里面的最小值，和x。再次平衡两个heap。
时间复杂度：O(logN)
空间复杂度：O(N)

Multiset and Two Pointers
方法4本质上和3是一样的。只不过是用了Self-balancing Binary Search Trees（比如AVL Tree）去实现。
实现起来有点复杂，在这里就不展开了。个人觉得方法3简单容易理解。

时间复杂度：O(logN)
空间复杂度：O(N)

Others
Buckets 如果stream里面的数是 statistically distributed，那么很容易通过bucket去获得中位数。一旦知道了正确的bucket，sort这个bucket就能知道中位数。如果bucket大小 << stream的大小，就大大节省了时间。
Reservoir Sampling（蓄水池）. Following along the lines of using buckets: if the stream is statistically distributed, you can rely on Reservoir Sampling. Basically, if you could maintain just one good bucket (or reservoir) which could hold a representative sample of the entire stream, you could estimate the median of the entire stream from just this one bucket. This means good time and memory performance. Reservoir Sampling lets you do just that. Determining a "good" size for your reservoir? Now, that's a whole other challenge. A good explanation for this can be found in this StackOverflow answer.
Segment Trees 如果想对有限范围内的数据进行大量的读取或者大量的插入操作，Segment Tree是个很好的数据结构。They allow us to do all such operations fast and in roughly the same amount of time, always. 缺点是不容易实现（写起来麻烦）。参考 introductory article on Segment Trees。
Order Statistic Trees are data structures which seem to be tailor-made for this problem. They have all the nice features of a BST, but also let you find the kth order element stored in the tree. 缺点是不容易实现（写起来麻烦），面试时几乎不会考到。但如果这个结构已经提供，用起来会很有趣。