Hash Tables: O(1) Average Lookup Explained
Understand how hash tables work internally — hash functions, collision resolution, load factors — and master hash map patterns for solving problems efficiently.
The Power of O(1) Lookup
Hash tables are arguably the most important data structure in practical programming. They provide average O(1) insertion, deletion, and lookup by mapping keys to array indices through a hash function. Python's dict and set, Java's HashMap, and JavaScript's Object/Map all use hash tables internally.
How Hashing Works
A hash function converts a key into an integer index. Good hash functions distribute keys uniformly across the table. When two keys hash to the same index (collision), we resolve it via chaining (linked lists at each bucket) or open addressing (probing for the next empty slot). Python uses open addressing with perturbation.
Average case is O(1) for all operations, but worst case is O(n) when all keys collide. In practice, with a good hash function and load factor < 0.75, collisions are rare.
Pattern: Frequency Counting
The most common hash map pattern: count occurrences of elements. This solves problems like finding duplicates, majority elements, anagram detection, and top-k frequent elements.
from collections import Counter
def top_k_frequent(nums: list[int], k: int) -> list[int]:
"""Return the k most frequent elements."""
count = Counter(nums)
# Bucket sort: index = frequency, value = list of nums with that freq
buckets = [[] for _ in range(len(nums) + 1)]
for num, freq in count.items():
buckets[freq].append(num)
result = []
for i in range(len(buckets) - 1, 0, -1):
for num in buckets[i]:
result.append(num)
if len(result) == k:
return result
return result
def group_anagrams(strs: list[str]) -> list[list[str]]:
"""Group strings that are anagrams of each other."""
groups: dict[str, list[str]] = {}
for s in strs:
key = "".join(sorted(s))
groups.setdefault(key, []).append(s)
return list(groups.values())Pattern: Two Sum with Hash Map
The classic two sum problem: find two numbers that add to a target. A hash map stores seen values, turning O(n²) brute force into O(n). This pattern extends to 3Sum, 4Sum, and subarray sum problems.
def two_sum(nums: list[int], target: int) -> list[int]:
"""Find indices of two numbers that sum to target."""
seen = {} # value -> index
for i, num in enumerate(nums):
complement = target - num
if complement in seen:
return [seen[complement], i]
seen[num] = i
return []Pattern: Hash Set for Existence Checks
When you only care about existence (not counts), use a set. Sets provide O(1) membership testing. Classic applications: detecting duplicates, finding intersections, and longest consecutive sequence.
def longest_consecutive(nums: list[int]) -> int:
"""Find the length of the longest consecutive sequence."""
num_set = set(nums)
max_length = 0
for num in num_set:
# Only start counting from the beginning of a sequence
if num - 1 not in num_set:
length = 1
while num + length in num_set:
length += 1
max_length = max(max_length, length)
return max_lengthWhen choosing between sorting (O(n log n)) and hashing (O(n) time, O(n) space): if memory is tight, sort in-place. If speed matters most, use a hash map. In interviews, discuss both trade-offs.