Heap

Table of content

Basic concept

  1. Heap is data structure organized by a binary tree in which the value of a parent node is greater than its children (max heap).

  2. Heap in Python is implemented with the heapq module including the following functionalities

    1. heapq.heappush(heap, item): push an item into heap
    2. heapq.heappop(heap): pop an item from heap
    3. heapq.pushpop(heap, item): push an item into heap and pop an item from heap
    4. heapq.heapify(list): transform a list into heap
    5. heapq.heapreplace(heap, tiem): pop an item from heap and push an new item from heap
  3. Time complexity of a binary heap

    1. Insertion: O(logN)
    2. Deletion of min: O(logN)
    3. Remove: O(logN)
    4. findMin: O(1)
  4. Time Complexity compare to other data structures

    StructureInsertionDeletion of MinRemovefindMinsort
    heapO(logN)O(logN)O(logN)O(1)O(NlogN)
    orderedarrayO(N)O(1)O(1)O(N)O(N)
    unordered arrayO(1)O(N)O(1)O(N)O(NlogN)
    BSTO(logN)O(logN)O(logN)O(logN)O(N)

Related coding exercises

Kth Largest Element in an Array

  1. Kth Largest Element in an Array problem is defined as to find the kth largest element in an unsorted array. Note that it is the kth largest element in the sorted order, not the kth distinct element.
  2. The problem can be approached with a heap data structure where
    1. First to construct a heap array with all numbers in O(nlogn) time
    2. Then pop k times, each pop operation uses O(1) time
  3. An example Python solution using heap and internal sort is given as the following
class Solution:
    # @param {integer[]} nums
    # @param {integer} k
    # @return {integer}
    def findKthLargest(self, nums, k):
        '''
        '''
        #return solution_sort(nums,k)
        return solution_heap(nums,k)
import heapq
def solution_heap(nums,k):
    h = []
    for num in nums:
        h.append(-num)
    heapq.heapify(h)
    print h 
    res = []
    for i in xrange(k):
        res = (heapq.heappop(h))
        print res
    return -res
    pass
def solution_sort(nums,k):
    '''
    use build-in sort
    '''
    if len(nums) == 0: return None
    nums.sort()
    return nums[len(nums)-k]
    pass

Sliding Window Maximum

  1. Sliding Window Maximum problem is defined as given an array nums, there is a sliding window of size k which is moving from the very left of the array to the very right. You can only see the k numbers in the window. Each time the sliding window moves right by one position.
  2. heap approach: add element into a heap one by another, in each step
    1. Pop an element from heap until the index is within the k different from the current index
    2. The solution roughly takes O(nlogn) time
  3. ‘deque’ approach:
    1. Maintain a deque structure
    2. In each iteration
      1. if the deque is empty, add current number into the head of deque
      2. pop the element from the tail of the queue when its index is out of range
  4. An example Python solution is given as the following
class Solution:
    # @param {integer[]} nums
    # @param {integer} k
    # @return {integer[]}
    def maxSlidingWindow(self, nums, k):
        return solution(nums,k)
def solution(nums,k):
    '''
    29
    '''
    if len(nums) == 0: return nums
    h = []
    res = []
    for i in xrange(k):
        heapq.heappush( h, (-nums[i],i) )
    (val,i) = heapq.heappop(h)
    res.append(-val)
    heapq.heappush( h, (-nums[i],i) )
    print res
    for j in xrange(k,len(nums)):
        heapq.heappush( h, (-nums[j],j) )
        (val,i) = heapq.heappop(h)
        while i<j-k+1:
            (val,i) = heapq.heappop(h)
        res.append(-val)
        heapq.heappush( h,(-nums[i],i) )
    return res

Merge k Sorted Lists

  1. Merge k Sorted Lists problem is to merge k sorted linked lists and return it as one sorted list.
  2. The problem can be solved by maintaining a heap with $k$ elements, and in each step
    1. pop an element from the heap
    2. add one element from the corresponding list into heap
  3. An example Python solution is given as the following
# Definition for singly-linked list.
# class ListNode:
#     def __init__(self, x):
#         self.val = x
#         self.next = None
class Solution:
    # @param {ListNode[]} lists
    # @return {ListNode}
    def mergeKLists(self, lists):
        return solution(lists)
def solution(lists):
    if len(lists) == 0: return None
    if len(lists) == 1: return lists[0] 
    res = []
    h = []
    for i in xrange(len(lists)):
        if lists[i] != None:
            heapq.heappush(h, (lists[i].val,i) )
            lists[i] = lists[i].next
    while len(h)>0:
        (val,i) = heapq.heappop(h)
        res.append(val)
        if lists[i] != None:
            heapq.heappush(h, (lists[i].val,i))
            lists[i] = lists[i].next
    if len(res) == 0: return None
    print res
    head = ListNode(res[0])
    p = head
    for i in xrange(1,len(res)):
        tmp = ListNode(res[i])
        p.next = tmp
        p=p.next
    return head

The Skyline Problem

  1. The goal of The Skyline Problem is to find a contour formed by a collection of buildings.
  2. The problem can also be solve with a heap structure in O(nlogn) time where n is the number of buildings from the input.
  3. In particular, the solution will keep a active building together with a heap of buildings and in each iteration
    1. it takes in a new building
    2. if the new building is overlap with the active building, the algorithm will either put it into the heap if it is not as high as the active one or use it as a active building and push the previous active building into the heap
    3. if the new building is not overlap with the active building, the algorithm will work out the contour of the building in the heap.
  4. An example Python solution is given as the following
class Solution:
    # @param {integer[][]} buildings
    # @return {integer[][]}
    def getSkyline(self, buildings):
        if len(buildings) == 0: return []
        res = []
        heap = []
        activebuilding = ( buildings[0][2],buildings[0][0],buildings[0][1] )
        res.append( [activebuilding[1],activebuilding[0]] )
        i=0
        while i<len(buildings)-1:
            i+=1
            currentbuilding = (buildings[i][2],buildings[i][0],buildings[i][1])
            if len(buildings) < 10: print i,heap,res,activebuilding,currentbuilding
            if currentbuilding[1] > activebuilding[2]:
                # current start on the right
                # process current heap
                if len(heap) == 0:
                    res.append( [activebuilding[2],0] )
                    activebuilding = currentbuilding
                    res.append( [activebuilding[1],activebuilding[0]] )
                else:
                    while len(heap) > 0:
                        (a,b,c) = heapq.heappop(heap)
                        newactivebuilding = (-a,b,c)
                        if newactivebuilding[2] > activebuilding[2]:
                            break
                    if newactivebuilding == None:
                        res.append( [activebuilding[2],0] )
                        activebuilding = currentbuilding
                        res.append( [activebuilding[1],activebuilding[0]] )
                    else:
                        res.append( [activebuilding[2],newactivebuilding[0]] )
                        activebuilding = newactivebuilding
                        i-=1
            elif currentbuilding[1] < activebuilding[2]:
                # current start on the left
                if currentbuilding[0]>activebuilding[0]:
                    heapq.heappush(heap,(-activebuilding[0],activebuilding[1],activebuilding[2]))
                    activebuilding = currentbuilding
                    print '--', activebuilding
                    res.append([activebuilding[1],activebuilding[0]])
                else:
                    heapq.heappush(heap,(-currentbuilding[0],currentbuilding[1],currentbuilding[2]))
            elif currentbuilding[1] == activebuilding[2]:
                # current start on the same position
                if currentbuilding[0] >= activebuilding[0]:
                    activebuilding = currentbuilding
                else:
                    heapq.heappush(heap,(-currentbuilding[0],currentbuilding[1],currentbuilding[2]))
                    (a,b,c) = heapq.heappop(heap)
                    currentbuilding = (-a,b,c)
                    res.append([activebuilding[2],currentbuilding[0]])
                    activebuilding = currentbuilding
        if len(buildings) < 10: print 'heap', heap,res
        # no more element
        if len(heap) == 0:
            res.append( [activebuilding[2],0] )
        while len(heap) > 0:
            if len(buildings) < 10:print heap
            (a,b,c) = heapq.heappop(heap)
            currentbuilding = (-a,b,c)
            while len(heap) >0 and currentbuilding[2] <= activebuilding[2]:
                (a,b,c) = heapq.heappop(heap)
                currentbuilding = (-a,b,c)
            if currentbuilding[2] > activebuilding[2]:
                res.append( [activebuilding[2],currentbuilding[0]] )
                activebuilding = currentbuilding
        res.append([activebuilding[2],0])
        # process res
        newres = []
        for i in xrange(len(res)-1):
            if res[i][0] != res[i+1][0]:
                newres.append(res[i])
        newres.append(res[len(res)-1])
        print 'newres',newres
        return newres