- Implement insert/delete/lookup in O(1) average for both strategies
- Understand resizing: O(n) but amortized O(1) per operation
- Handle iteration and contains operations efficiently
Hash Table Operations
You know hash tables give O(1) average. But what about the details? Insert, delete, lookup, iteration, resizing.
Let's implement them properly.
Insert
Add key-value pair. If key exists, update value.
Chaining:
```python
def insert(self, key, value):
    index = self.hash(key)
    # Check if key exists; update in place if so
    for i, (k, v) in enumerate(self.table[index]):
        if k == key:
            self.table[index][i] = (key, value)
            return
    # Key doesn't exist: append to the bucket
    self.table[index].append((key, value))
    self.count += 1
    # Check load factor
    if self.count / self.size > 0.75:
        self.resize()
```
O(1) average if lists are short.
Open addressing:
```python
def insert(self, key, value):
    if self.count / self.size > 0.7:
        self.resize()
    index = self.hash(key)
    # Note: once deletion introduces tombstones (see Delete below),
    # this loop should also treat DELETED slots as reusable.
    while self.keys[index] is not None and self.keys[index] != key:
        index = (index + 1) % self.size  # Linear probing
    if self.keys[index] is None:
        self.count += 1  # New key, not an update
    self.keys[index] = key
    self.values[index] = value
```
O(1) average with low load factor.
Lookup
Find value for a key.
Chaining:
```python
def get(self, key):
    index = self.hash(key)
    for k, v in self.table[index]:
        if k == key:
            return v
    raise KeyError(key)
```
Open addressing:
```python
def get(self, key):
    index = self.hash(key)
    while self.keys[index] is not None:
        if self.keys[index] == key:
            return self.values[index]
        index = (index + 1) % self.size
    raise KeyError(key)
```
Both O(1) average.
Delete
Remove key-value pair.
Chaining:
```python
def delete(self, key):
    index = self.hash(key)
    for i, (k, v) in enumerate(self.table[index]):
        if k == key:
            del self.table[index][i]
            self.count -= 1
            return
    raise KeyError(key)
```
Simple. Just remove from list.
Open addressing (with tombstones):
```python
DELETED = object()  # Sentinel marking a deleted slot

def delete(self, key):
    index = self.hash(key)
    while self.keys[index] is not None:
        if self.keys[index] == key:
            self.keys[index] = DELETED
            self.values[index] = None
            self.count -= 1
            return
        index = (index + 1) % self.size
    raise KeyError(key)
```
Mark the slot as deleted instead of emptying it. Probing continues past tombstones, so later keys in the same probe chain stay reachable.
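To see why the tombstone matters, here is a hypothetical standalone trace. The `probe_get` helper and the forced collisions are illustrative, not part of the class above: three keys are placed as if they all hashed to slot 0 and were probed linearly.

```python
DELETED = object()

def probe_get(keys, values, key, size):
    index = 0  # assume every demo key hashes to slot 0
    while keys[index] is not None:
        if keys[index] == key:
            return values[index]
        index = (index + 1) % size
    raise KeyError(key)

# "a", "b", "c" all probed from slot 0 into consecutive slots
keys = ["a", "b", "c", None, None]
values = [1, 2, 3, None, None]

# Delete "b" the wrong way: a None hole cuts the probe chain.
keys[1] = None
try:
    probe_get(keys, values, "c", 5)
except KeyError:
    print("lookup for 'c' failed")  # probing stopped at the hole

# Delete "b" the right way: a tombstone keeps the chain intact.
keys[1] = DELETED
print(probe_get(keys, values, "c", 5))  # 3
```

The tombstone is "occupied but reusable": lookups skip over it, while inserts may later overwrite it.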
Iteration
Iterate over all entries.
Chaining:
```python
def items(self):
    for bucket in self.table:
        for key, value in bucket:
            yield (key, value)
```
Visit each bucket, each entry. O(n + size).
Open addressing:
```python
def items(self):
    for i in range(self.size):
        if self.keys[i] is not None and self.keys[i] is not DELETED:
            yield (self.keys[i], self.values[i])
```
Scan array, skip empty/deleted slots. O(size).
Contains
Check if key exists.
```python
def contains(self, key):
    try:
        self.get(key)
        return True
    except KeyError:
        return False
```
Or implement directly for efficiency.
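A direct version avoids the exception machinery. Here is a minimal standalone sketch of the chaining case; the free-function form and the `table`/`size` layout are assumptions mirroring the chaining code above:

```python
def contains(table, size, key):
    # Scan only the bucket the key hashes to; no try/except needed.
    index = hash(key) % size
    return any(k == key for k, _ in table[index])

size = 8
table = [[] for _ in range(size)]
table[hash("name") % size].append(("name", "Alice"))

print(contains(table, size, "name"))  # True
print(contains(table, size, "age"))   # False
```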
Resizing
When load factor exceeds threshold, resize.
```python
def resize(self):
    old_table = self.table
    self.size *= 2
    self.table = [[] for _ in range(self.size)]
    self.count = 0  # Re-counted as entries are reinserted
    for bucket in old_table:
        for key, value in bucket:
            self.insert(key, value)
```
Create new larger table. Rehash all entries. O(n).
Amortized: each resize doubles the table, so resizes get exponentially rarer as the table grows. The O(n) rehash cost spreads across the many cheap insertions between resizes, giving amortized O(1) per insert.
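The amortized claim can be checked with a hypothetical work counter (the `total_work` helper is illustrative): charge one unit per insert plus one unit per entry moved during a rehash, and watch the average stay bounded.

```python
def total_work(n, initial_size=8, load_factor=0.75):
    """Count units of work for n inserts into a doubling table."""
    size, count, work = initial_size, 0, 0
    for _ in range(n):
        work += 1  # the insert itself
        count += 1
        if count / size > load_factor:
            work += count  # rehash every existing entry
            size *= 2
    return work

n = 100_000
# Average work per insert stays below a small constant (~2-3 units),
# even though individual resizes cost O(n).
print(total_work(n) / n)
```

The doubling is what makes this work: each rehash of `count` entries is "paid for" by the roughly `count / 2` cheap inserts that happened since the previous resize.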
Clear
Remove all entries.
```python
def clear(self):
    self.table = [[] for _ in range(self.size)]
    self.count = 0
```
O(size) to create new buckets.
Example: Complete Hash Table with Chaining
```python
class HashTable:
    def __init__(self, size=10):
        self.size = size
        self.table = [[] for _ in range(size)]
        self.count = 0

    def hash(self, key):
        return hash(key) % self.size

    def insert(self, key, value):
        index = self.hash(key)
        for i, (k, v) in enumerate(self.table[index]):
            if k == key:
                self.table[index][i] = (key, value)
                return
        self.table[index].append((key, value))
        self.count += 1
        if self.count / self.size > 0.75:
            self.resize()

    def get(self, key):
        index = self.hash(key)
        for k, v in self.table[index]:
            if k == key:
                return v
        raise KeyError(key)

    def delete(self, key):
        index = self.hash(key)
        for i, (k, v) in enumerate(self.table[index]):
            if k == key:
                del self.table[index][i]
                self.count -= 1
                return
        raise KeyError(key)

    def resize(self):
        old_table = self.table
        self.size *= 2
        self.table = [[] for _ in range(self.size)]
        self.count = 0
        for bucket in old_table:
            for key, value in bucket:
                self.insert(key, value)


# Usage
ht = HashTable()
ht.insert("name", "Alice")
ht.insert("age", 30)
print(ht.get("name"))  # Alice
ht.delete("age")
```
Common Mistakes
Forgetting to update existing keys:
```python
# Wrong: always appends, so duplicates of the same key accumulate
self.table[index].append((key, value))

# Right: check for the key first, then append
for i, (k, v) in enumerate(self.table[index]):
    if k == key:
        self.table[index][i] = (key, value)
        return
self.table[index].append((key, value))
```
Breaking probe sequences in open addressing:
```python
# Wrong: setting the slot to None breaks the probe chain for later keys
self.keys[index] = None

# Right: use a tombstone, so probing continues past the slot
self.keys[index] = DELETED
```
Not resizing:
Load factor too high → performance degrades. Monitor and resize.
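One way to see the degradation is to count key comparisons per lookup under a good versus a degenerate hash function. The `build` and `comparisons` helpers below are illustrative sketches, not part of the class above:

```python
def build(keys, size, hash_fn):
    # Build a chaining table using the given hash function
    table = [[] for _ in range(size)]
    for k in keys:
        table[hash_fn(k) % size].append((k, None))
    return table

def comparisons(table, size, key, hash_fn):
    # Count key comparisons needed to find `key` in its bucket
    bucket = table[hash_fn(key) % size]
    for i, (k, _) in enumerate(bucket, 1):
        if k == key:
            return i
    return len(bucket)

keys = [f"key{i}" for i in range(1000)]
good = build(keys, 2048, hash)              # built-in hash, load ~0.5
bad = build(keys, 2048, lambda k: 0)        # degenerate: one giant bucket

print(comparisons(good, 2048, "key999", hash))        # small: O(1)
print(comparisons(bad, 2048, "key999", lambda k: 0))  # 1000: O(n)
```

With the degenerate hash, every lookup scans one linked-list-like bucket of all n entries; this is exactly what an overloaded or poorly hashed table degrades into.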
Performance Summary
| Operation | Chaining | Open Addressing |
|---|---|---|
| Insert | O(1) average | O(1) average |
| Lookup | O(1) average | O(1) average |
| Delete | O(1) average | O(1) average |
| Iteration | O(n + size) | O(size) |
| Resize | O(n) | O(n) |
| Space | Extra lists/pointers per entry | No per-entry overhead (but needs a lower load factor) |
All assume good hash function and proper load factor management.
The Pattern
Hash table operations:
- Insert/Delete/Lookup: O(1) average
- Iteration: O(n + size)
- Resize: O(n), amortized O(1)
Understanding operations means knowing:
- When to resize (load factor thresholds)
- How collisions affect performance
- Tombstones in open addressing
Master these, and you'll implement hash tables from scratch confidently.
