Heaps & Priority Queues: Top-K, Median, and Scheduling
Learn how heaps maintain partial order for efficient min/max operations. Master the patterns: top-k elements, running median, merge k sorted lists, and task scheduling.
How Heaps Work
A binary heap is a complete binary tree where every parent is smaller (min-heap) or larger (max-heap) than its children. It's stored as an array where parent of index i is at (i-1)//2, and children are at 2i+1 and 2i+2. This gives O(log n) push/pop and O(1) peek at the min/max.
Python's heapq module implements a min-heap. For a max-heap, negate values: push -val and negate when popping. heapq.heappush() and heapq.heappop() are O(log n). heapq.heapify() is O(n).
Pattern: Top-K Elements
To find the k largest elements, maintain a min-heap of size k. For each new element, if it's larger than the heap's minimum, replace it. At the end, the heap contains the k largest. This runs in O(n log k) — better than O(n log n) sorting when k << n.
import heapq
def find_k_largest(nums: list[int], k: int) -> list[int]:
"""Find k largest elements using a min-heap of size k."""
heap = nums[:k]
heapq.heapify(heap)
for num in nums[k:]:
if num > heap[0]:
heapq.heapreplace(heap, num) # Pop min, push new
return sorted(heap, reverse=True)
def kth_largest_element(nums: list[int], k: int) -> int:
"""Find the kth largest element."""
# Min-heap of size k — root is the kth largest
heap = []
for num in nums:
heapq.heappush(heap, num)
if len(heap) > k:
heapq.heappop(heap)
return heap[0]Pattern: Merge K Sorted Lists
When merging k sorted sequences, a heap of size k tracks the smallest available element from each list. Pop the smallest, advance that list's pointer, and push the next element. This runs in O(n log k) where n is the total number of elements.
def merge_k_sorted(lists: list[list[int]]) -> list[int]:
"""Merge k sorted lists into one sorted list."""
heap = []
for i, lst in enumerate(lists):
if lst:
heapq.heappush(heap, (lst[0], i, 0))
result = []
while heap:
val, list_idx, elem_idx = heapq.heappop(heap)
result.append(val)
if elem_idx + 1 < len(lists[list_idx]):
next_val = lists[list_idx][elem_idx + 1]
heapq.heappush(heap, (next_val, list_idx, elem_idx + 1))
return resultPattern: Running Median (Two Heaps)
Maintain a max-heap for the lower half and a min-heap for the upper half. The median is always at the top of one or both heaps. Balance the heaps so their sizes differ by at most 1. Each insertion is O(log n).
class MedianFinder:
def __init__(self):
self.lo = [] # Max-heap (negate values) — lower half
self.hi = [] # Min-heap — upper half
def add_num(self, num: int):
heapq.heappush(self.lo, -num)
# Ensure max of lower <= min of upper
heapq.heappush(self.hi, -heapq.heappop(self.lo))
# Balance sizes: lo can have at most 1 more than hi
if len(self.hi) > len(self.lo):
heapq.heappush(self.lo, -heapq.heappop(self.hi))
def find_median(self) -> float:
if len(self.lo) > len(self.hi):
return -self.lo[0]
return (-self.lo[0] + self.hi[0]) / 2The two-heap pattern also applies to scheduling problems and any scenario where you need to track a partition point in a stream of data.