- Understand O(1) average lookup/insert/delete via a hash function that maps keys to array indices
- See why collisions happen and why they need resolution strategies
- Learn how load factor and resizing maintain performance
Introduction to Hash Tables
You're building a contacts app. Thousands of contacts. When a user searches for "Alice", you need instant results.
Arrays? O(n) linear search. Too slow.
Sorted array with binary search? O(log n). Better, but still not instant. Plus, inserting is O(n).
Hash tables? O(1) average. Instant lookup, instant insert, instant delete. That's the promise.
What's a Hash Table?
A data structure that maps keys to values using a hash function.
Components:
- Keys: What you search with (e.g., "Alice")
- Values: What you want to store (e.g., phone number)
- Hash function: Converts key to array index
- Array: Stores the values
```python
# Simplified concept
hash_table = [None] * 10

key = "Alice"
index = hash(key) % 10      # let's say this gives 3
hash_table[3] = "555-1234"

# Later, lookup
index = hash("Alice") % 10  # still 3
phone = hash_table[3]       # "555-1234"
```
Hash function turns "Alice" into a number. Modulo maps it to array index. Store value there.
O(1). Direct access. No searching.
Hash Function Basics
Takes a key, outputs an integer.
Requirements:
- Deterministic: Same key → same hash always
- Fast: O(1) to compute
- Uniform distribution: Spread keys evenly across array
Example for strings (simplified):
```python
def hash_string(s, array_size):
    hash_val = 0
    for char in s:
        hash_val = (hash_val * 31 + ord(char)) % array_size
    return hash_val
```
Multiply by prime (31), add character code, modulo array size.
For "Alice": Process 'A', 'l', 'i', 'c', 'e'. Get some number. Modulo 10 → index in [0, 9].
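To see the arithmetic concretely, here is the loop traced by hand for "Alice" with array size 10:

```python
hash_val = 0
for char in "Alice":
    hash_val = (hash_val * 31 + ord(char)) % 10
    print(char, ord(char), hash_val)
# A 65 5
# l 108 3
# i 105 8
# c 99 7
# e 101 8
```

Final result: index 8, always in [0, 9] because of the modulo.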
The Problem: Collisions
Different keys can hash to the same index.
```python
hash("Alice") % 10  # suppose this gives 3
hash("Bob") % 10    # ...and this also gives 3
```

Both map to index 3. Collision.
You can't just overwrite Alice's value with Bob's. You need collision resolution.
Two main strategies:
- Chaining: Each array slot holds a list of entries
- Open addressing: Find another empty slot
We'll cover these in detail later.
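As a preview, chaining can be sketched in a few lines. The class name, fixed table size, and method names here are illustrative, not a standard API:

```python
class ChainedHashTable:
    """Minimal hash table using chaining: each slot holds a list of (key, value) pairs."""

    def __init__(self, size=10):
        self.buckets = [[] for _ in range(size)]

    def _index(self, key):
        return hash(key) % len(self.buckets)

    def put(self, key, value):
        bucket = self.buckets[self._index(key)]
        for i, (k, _) in enumerate(bucket):
            if k == key:                 # key already present: overwrite its value
                bucket[i] = (key, value)
                return
        bucket.append((key, value))      # new key: append to this slot's chain

    def get(self, key):
        for k, v in self.buckets[self._index(key)]:
            if k == key:
                return v
        return None

table = ChainedHashTable()
table.put("Alice", "555-1234")
table.put("Bob", "555-5678")   # even if "Bob" collides with "Alice", both survive
print(table.get("Alice"))      # 555-1234
```

Colliding keys simply share a chain; lookup scans only that one short list.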
Python's dict
Python's built-in dict is a hash table.
```python
contacts = {}
contacts["Alice"] = "555-1234"
contacts["Bob"] = "555-5678"

print(contacts["Alice"])      # "555-1234"
print("Charlie" in contacts)  # False

del contacts["Bob"]
```
O(1) average for insert, delete, lookup. Python handles collisions, resizing, everything.
When Hash Tables Shine
Fast lookups: Need to check if element exists? O(1).
Frequency counting: Count word occurrences.
```python
def count_words(text):
    counts = {}
    for word in text.split():
        counts[word] = counts.get(word, 0) + 1
    return counts
```
O(n). One pass. Each lookup/insert is O(1) average.
Deduplication: Remove duplicates.
```python
def remove_duplicates(arr):
    seen = {}
    result = []
    for item in arr:
        if item not in seen:
            seen[item] = True
            result.append(item)
    return result
```
O(n). Check membership in O(1).
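Since Python 3.7, dicts preserve insertion order, so the same idea fits in one line with the built-in `dict.fromkeys`:

```python
def remove_duplicates(arr):
    # dict.fromkeys keeps the first occurrence of each key, in insertion order
    return list(dict.fromkeys(arr))

print(remove_duplicates([3, 1, 3, 2, 1]))  # [3, 1, 2]
```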
Caching: Store expensive computation results.
```python
cache = {}

def expensive_function(x):
    if x in cache:
        return cache[x]
    result = compute(x)  # expensive; compute() stands in for the real work
    cache[x] = result
    return result
```
First call: compute and cache. Subsequent calls: O(1) lookup.
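Python's standard library bundles this exact pattern: `functools.lru_cache` memoizes a function for you. The `x * x` body here is just a stand-in for an expensive computation:

```python
from functools import lru_cache

@lru_cache(maxsize=None)  # unbounded cache, keyed by the arguments
def expensive_function(x):
    return x * x          # stand-in for an expensive computation

expensive_function(4)                        # computed and cached (miss)
expensive_function(4)                        # served from the cache (hit)
print(expensive_function.cache_info().hits)  # 1
```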
Arrays vs Hash Tables
| Operation | Array (unsorted) | Hash Table |
|---|---|---|
| Insert | O(1) (at end) | O(1) average |
| Delete | O(n) (find + shift) | O(1) average |
| Search | O(n) | O(1) average |
| Order | Preserved | Not preserved |
| Memory | Exact size | Extra (empty slots + collision handling) |
Hash tables trade memory for speed. Worth it for lookup-heavy workloads.
Load Factor
Load factor = number of entries / array size
Example: 7 entries, array size 10 → load factor 0.7
High load factor → more collisions → slower operations.
Typical strategy: when load factor > 0.75, resize array (double size, rehash all entries).
Python's dict resizes automatically. You don't see it, but it happens.
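A sketch of the resize step, assuming the chaining layout where each bucket is a list of (key, value) pairs (the `resize` function name and bucket layout are illustrative):

```python
def resize(buckets):
    """Double the bucket array and rehash every entry into its new slot."""
    new_buckets = [[] for _ in range(2 * len(buckets))]
    for bucket in buckets:
        for key, value in bucket:
            # Indices change because the modulus changed, so every key must rehash
            new_buckets[hash(key) % len(new_buckets)].append((key, value))
    return new_buckets

buckets = [[("Alice", "555-1234")], [], [("Bob", "555-5678")], []]
buckets = resize(buckets)
print(len(buckets))  # 8
```

Rehashing every entry is O(n), but it happens rarely enough that the cost averages out to O(1) per insert (amortized).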
Why "Average" O(1)?
Worst case can be O(n) if all keys hash to same index (all collide).
But with a good hash function and proper resizing, collisions are rare. Average case is O(1).
That's why we say "average O(1)", not "guaranteed O(1)".
Example: Two Sum
Given array, find two numbers that sum to target.
Brute force: Nested loops. O(n²).
Hash table: One pass.
```python
def two_sum(nums, target):
    seen = {}
    for i, num in enumerate(nums):
        complement = target - num
        if complement in seen:
            return [seen[complement], i]
        seen[num] = i
    return None

print(two_sum([2, 7, 11, 15], 9))  # [0, 1] (2 + 7 = 9)
```
For each number, check if complement exists in hash table. O(1) lookup. Total: O(n).
Example: First Non-Repeating Character
Find first character that appears only once.
```python
def first_unique_char(s):
    counts = {}
    # Count occurrences
    for char in s:
        counts[char] = counts.get(char, 0) + 1
    # Find first with count 1
    for char in s:
        if counts[char] == 1:
            return char
    return None

print(first_unique_char("leetcode"))  # 'l'
print(first_unique_char("aabb"))      # None
```
Two passes, both O(n). Hash table gives O(1) lookups.
The Power
Hash tables give you O(1) average for the big three:
- Insert
- Delete
- Search
Understanding hash tables means understanding:
- How hash functions work
- Why collisions happen
- Trade-offs: speed vs memory
Master hash tables, and you'll solve problems faster. Literally.
