Problem Statement in English

You’re supposed to find the median of a data stream in constant time. Your data structure should support the following operations:

  • addNum(int num): Add an integer num from the data stream to the data structure.
  • findMedian(): Return the median of all elements so far.

Approach

We can solve this problem using 2 heaps.

The idea is that we maintain a sorted view, except that we can only view the greatest element from the left half and the smallest element from the right half at any time.

One heap will be a max heap and the other heap will be a min heap.

The max heap will store the left half of the data stream and the min heap will store the right half of the data stream.

From the max heap, we can get the maximum element of the left half of the data stream. From the min heap, we can get the minimum element of the right half of the data stream.

The median will be the average of the maximum element of the left half of the data stream and the minimum element of the right half of the data stream, in case of even number of elements.

In case of odd number of elements, the median will be the maximum element of the left half of the data stream. Since we ensure that the left half will always have 1 more element than the right half in case of odd number of elements.


Solution in Python


from heapq import heappush, heappop

class MedianFinder:

    def __init__(self):
        self.maxheap, self.minheap = [], []

    def addNum(self, num: int) -> None:
        # Add to max heap
        heappush(self.maxheap, -num)

        # Balance max heap to min heap
        # this ensures that when we have an odd number of elements,
        # the min heap has 1 more element than the max heap.
        # 
        # In the next step we balance out the state in case this was a bad addition
        heappush(self.minheap, -heappop(self.maxheap))

        # if the min heap has more than 1 + half the elements, 
        # return one element to max heap
        if len(self.minheap) > len(self.maxheap) + 1:
            heappush(self.maxheap, -heappop(self.minheap))
        

    def findMedian(self) -> float:
        # If both heaps are of equal length, 
        # return the average of the two middle elements
        if len(self.maxheap) == len(self.minheap):
            return (-self.maxheap[0] + self.minheap[0]) / 2
        # else return the middle element from the min heap
        # since we ensure that the min heap has 1 more element
        # than the max heap in case of odd number of elements
        return self.minheap[0]
        

Complexity

  • Time: $O(1)$
    We have constant time complexity for the findMedian operation, and $O(log(n))$ for the addNum operation.

  • Space: $O(n)$
    Since we are storing all the elements in the heaps, the space complexity is $O(n)$.


Mistakes I Made

I had to look this one up :(


And we are done.