In this lesson:
  • Compare chaining (linked lists) vs open addressing (probing) strategies
  • Understand linear probing clustering and quadratic/double hashing improvements
  • Learn tombstone technique for deletion in open addressing

Collision Resolution

Collisions are inevitable: with more keys than array slots, distinct keys must share an index (the pigeonhole principle), and in practice collisions appear far sooner than that.

How you handle collisions determines your hash table's performance.

Two main strategies: Chaining and Open Addressing.

Chaining (Separate Chaining)

Each array slot holds a linked list (or other structure) of entries.

class HashTableChaining:
    def __init__(self, size=10):
        self.size = size
        self.table = [[] for _ in range(size)]
    
    def hash(self, key):
        return hash(key) % self.size
    
    def insert(self, key, value):
        index = self.hash(key)
        
        # Check if key already exists, update value
        for i, (k, v) in enumerate(self.table[index]):
            if k == key:
                self.table[index][i] = (key, value)
                return
        
        # Key doesn't exist, append
        self.table[index].append((key, value))
    
    def get(self, key):
        index = self.hash(key)
        
        for k, v in self.table[index]:
            if k == key:
                return v
        
        raise KeyError(key)
    
    def delete(self, key):
        index = self.hash(key)
        
        for i, (k, v) in enumerate(self.table[index]):
            if k == key:
                del self.table[index][i]
                return
        
        raise KeyError(key)

Example:

Array:
[0]: []
[1]: []
[2]: []
[3]: [("Alice", "555-1234"), ("Bob", "555-5678")]  # Collision!
[4]: []
...

Both "Alice" and "Bob" hash to index 3. Stored in same list.

Lookup: Hash key, traverse list at that index. Average O(1) if lists are short. Worst O(n) if everything collides to one index.
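
The bucket logic is easy to exercise directly. Here is a stripped-down, self-contained sketch of the same chaining approach; integer keys are used because CPython hashes small ints to themselves, so bucket placement is deterministic (string hashes are randomized per process):

```python
# Minimal chaining demo: two integer keys land in the same bucket.
size = 10
table = [[] for _ in range(size)]

def insert(key, value):
    bucket = table[hash(key) % size]
    for i, (k, _) in enumerate(bucket):
        if k == key:
            bucket[i] = (key, value)  # existing key: update in place
            return
    bucket.append((key, value))      # new key: append to the chain

insert(3, "Alice")
insert(13, "Bob")   # 13 % 10 == 3 -> collides with key 3
print(table[3])     # [(3, 'Alice'), (13, 'Bob')]
```

Both entries share bucket 3, exactly as in the phone-number example above.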

Pros:

  • Simple to implement
  • Handles high load factors well
  • Never "fills up"

Cons:

  • Extra memory for pointers
  • Cache unfriendly (linked list)

Open Addressing

Store everything in the array itself. No separate structures.

When collision occurs, find another empty slot.

Linear Probing

Try next slot, then next, until you find empty.

class HashTableLinearProbing:
    def __init__(self, size=10):
        self.size = size
        self.keys = [None] * size
        self.values = [None] * size
    
    def hash(self, key):
        return hash(key) % self.size
    
    def insert(self, key, value):
        index = self.hash(key)
        
        while self.keys[index] is not None:
            if self.keys[index] == key:
                self.values[index] = value  # Update existing key
                return
            index = (index + 1) % self.size  # Next slot
            if index == self.hash(key):  # Wrapped around: table is full
                raise RuntimeError("hash table is full")
        
        self.keys[index] = key
        self.values[index] = value
    
    def get(self, key):
        start = index = self.hash(key)
        
        while self.keys[index] is not None:
            if self.keys[index] == key:
                return self.values[index]
            index = (index + 1) % self.size
            if index == start:  # Wrapped around without finding the key
                break
        
        raise KeyError(key)
    
    def delete(self, key):
        index = self.hash(key)
        
        while self.keys[index] is not None:
            if self.keys[index] == key:
                self.keys[index] = None
                self.values[index] = None
                # BUG: this leaves a hole that can break probe sequences
                # for later keys -- see "Deletion in Open Addressing"
                return
            index = (index + 1) % self.size
        
        raise KeyError(key)

Example:

Insert "Alice" (hash 3), "Bob" (hash 3), "Charlie" (hash 4)

Array after:
[0]: None
[1]: None
[2]: None
[3]: Alice
[4]: Bob    # Collision, moved to next slot
[5]: Charlie  # Collision, moved to next slot
...

Clustering: Consecutive filled slots. Makes future insertions slower (longer probing sequences).
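
Clustering is easiest to see by counting probes. In this self-contained sketch (integer keys, so the home slot is just key % size), key 4 never collides by hash at all, yet the cluster built by 3, 13, and 23 still pushes it two slots past its home:

```python
# Linear probing sketch: watch a cluster grow around index 3.
size = 10
keys = [None] * size

def insert(key):
    index = key % size          # home slot
    probes = 0
    while keys[index] is not None:
        index = (index + 1) % size  # step to the next slot
        probes += 1
    keys[index] = key
    return probes

probes = {k: insert(k) for k in (3, 13, 23, 4)}
print(probes)   # {3: 0, 13: 1, 23: 2, 4: 2}
print(keys)     # [None, None, None, 3, 13, 23, 4, None, None, None]
```

This is primary clustering: each collision lengthens the run, making even unrelated keys pay for it.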

Pros:

  • No extra memory for pointers
  • Cache friendly (array access)

Cons:

  • Primary clustering
  • Deletion is tricky (can break probe sequences)
  • Requires low load factor (< 0.7)

Quadratic Probing

Instead of checking the next slot (h+1, h+2, ...), check h+1², h+2², h+3², ... from the home index h.

def quadratic_probe(self, key):
    index = self.hash(key)
    i = 0
    
    while self.keys[index] is not None:
        if self.keys[index] == key:
            return index
        i += 1
        if i > self.size:  # Quadratic probing can cycle without finding a slot
            raise RuntimeError("no free slot found")
        index = (self.hash(key) + i * i) % self.size
    
    return index

Reduces primary clustering. But can create secondary clustering (keys with same hash probe same sequence).
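
Secondary clustering is visible in the probe sequence itself. A small sketch (size 11 chosen arbitrarily; prime sizes help quadratic probing reach more slots): every key with the same home index walks the exact same path, so they still pile up behind one another.

```python
# Quadratic probe offsets from a home slot h: h, h+1, h+4, h+9, h+16, ...
size = 11

def probe_sequence(home, n=5):
    return [(home + i * i) % size for i in range(n)]

print(probe_sequence(3))   # [3, 4, 7, 1, 8]
# Any key hashing to 3 probes this identical sequence -- secondary clustering.
```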

Double Hashing

Use second hash function for probe step.

def double_hash(self, key):
    home = self.hash(key)
    step = self.hash2(key)  # Second hash function
    index = home
    i = 0
    
    while self.keys[index] is not None:
        if self.keys[index] == key:
            return index
        i += 1  # Increment first: with i = 0 the probe would never move
        index = (home + i * step) % self.size
    
    return index

def hash2(self, key):
    # Must never return 0: a step of 0 would stall the probe
    return 7 - (hash(key) % 7)

Different keys probe different sequences. Minimal clustering.
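
To see the divergence, compare two keys that share a home slot but get different step sizes (a self-contained sketch with integer keys and the 7-based second hash from above):

```python
# Double hashing: same home slot, different step, diverging probe paths.
size = 11

def h2(key):
    return 7 - (key % 7)    # step in 1..7, never 0

def probe_sequence(key, n=5):
    home, step = key % size, h2(key)
    return [(home + i * step) % size for i in range(n)]

print(probe_sequence(3))    # home 3, step 4 -> [3, 7, 0, 4, 8]
print(probe_sequence(14))   # home 3, step 7 -> [3, 10, 6, 2, 9]
```

Both keys start at slot 3, but after the first collision they never touch the same slots again.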

Deletion in Open Addressing

Problem: Deleting creates hole. Breaks probe sequences.

Solution 1: Tombstones

Mark deleted slots as "deleted", not "empty". Probing continues past tombstones.

DELETED = object()  # Unique marker

def delete(self, key):
    index = self.hash(key)
    
    while self.keys[index] is not None:
        if self.keys[index] == key:
            self.keys[index] = DELETED  # Tombstone, not None
            self.values[index] = None
            return
        index = (index + 1) % self.size
    
    raise KeyError(key)

Downside: Tombstones accumulate, slow down lookups. Eventually need to rebuild table.
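
For tombstones to work, lookup must treat a DELETED slot differently from an empty one: probing continues past it, but it never matches a key. A minimal self-contained sketch of the lookup side (standalone function with integer keys, not the class method):

```python
DELETED = object()  # unique marker, distinct from None and any real key

def get(keys, values, key):
    size = len(keys)
    index = key % size                   # home slot (integer keys)
    for _ in range(size):                # bound the scan: table may be full
        slot = keys[index]
        if slot is None:                 # truly empty: key can't be further on
            break
        if slot is not DELETED and slot == key:
            return values[index]
        index = (index + 1) % size       # keep probing past tombstones
    raise KeyError(key)

# Slot 3 once held key 3 ("Alice"); after deletion it is a tombstone,
# so the probe for key 13 still reaches slot 4.
keys   = [None, None, None, DELETED, 13, None, None, None, None, None]
values = [None, None, None, None, "Bob", None, None, None, None, None]
print(get(keys, values, 13))   # Bob
```

Without the tombstone, the probe for key 13 would stop at the empty slot 3 and wrongly report a miss.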

Solution 2: Rehashing

After deletion, rehash all following entries in cluster.

Complex but avoids tombstones.
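
One way to do this, sometimes called backward-shift deletion, is to open the hole and then reinsert every following entry in the cluster from its home slot. A self-contained sketch assuming plain linear probing and integer keys (the function name is illustrative):

```python
def delete_and_rehash(keys, values, key):
    """Delete `key`, then reinsert each later entry in the cluster so
    no probe sequence is broken. Assumes linear probing; key must exist."""
    size = len(keys)
    index = key % size
    while keys[index] != key:            # find the key within its cluster
        index = (index + 1) % size
    keys[index] = values[index] = None   # open the hole

    index = (index + 1) % size
    while keys[index] is not None:       # rehash the rest of the cluster
        k, v = keys[index], values[index]
        keys[index] = values[index] = None
        j = k % size                     # reinsert from its home slot
        while keys[j] is not None:
            j = (j + 1) % size
        keys[j], values[j] = k, v
        index = (index + 1) % size

keys   = [None, None, None, 3, 13, 23, None, None, None, None]
values = [None, None, None, "a", "b", "c", None, None, None, None]
delete_and_rehash(keys, values, 3)
print(keys)   # [None, None, None, 13, 23, None, None, None, None, None]
```

Keys 13 and 23 shift back into the freed slots, so lookups starting from their home index 3 still succeed with no tombstones left behind.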

Chaining vs Open Addressing

Feature            | Chaining                           | Open Addressing
Collision handling | Linked lists                       | Probe sequence
Memory             | Extra for pointers                 | No extra (but lower load factor)
Cache performance  | Poor (pointers)                    | Good (array)
Load factor        | Can exceed 1                       | Must stay < 0.7-0.8
Deletion           | Easy                               | Tricky (tombstones)
When to use        | High load factor, many collisions  | Low load factor, cache important

Python's dict uses open addressing (with a custom perturbation-based probe sequence). Java's HashMap uses chaining, converting long chains to balanced trees since Java 8.

Resizing

When load factor gets too high, resize.

Process:

  1. Create new larger array (usually double size)
  2. Rehash all entries into new array
  3. Replace old array with new

def resize(self):
    old_keys = self.keys
    old_values = self.values
    
    self.size *= 2
    self.keys = [None] * self.size
    self.values = [None] * self.size
    
    for key, value in zip(old_keys, old_values):
        if key is not None and key is not DELETED:
            self.insert(key, value)

O(n) operation. But amortized over many insertions, it's O(1) per operation.
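
A common trigger is to check the load factor on every insert; the 0.7 threshold and class below are an illustrative sketch (real implementations pick their own thresholds), using linear probing with integer keys:

```python
# Hypothetical table that resizes once the load factor would exceed 0.7.
class Table:
    def __init__(self, size=8):
        self.size, self.count = size, 0
        self.keys = [None] * size
        self.values = [None] * size

    def insert(self, key, value):
        if (self.count + 1) / self.size > 0.7:
            self._resize()                   # grow before we get too full
        index = key % self.size
        while self.keys[index] is not None:
            if self.keys[index] == key:
                self.values[index] = value   # update, count unchanged
                return
            index = (index + 1) % self.size
        self.keys[index], self.values[index] = key, value
        self.count += 1

    def _resize(self):
        old = [(k, v) for k, v in zip(self.keys, self.values) if k is not None]
        self.size *= 2
        self.keys = [None] * self.size
        self.values = [None] * self.size
        self.count = 0
        for k, v in old:                     # rehash every live entry
            self.insert(k, v)

t = Table(size=8)
for k in range(6):      # the 6th insert would push the load factor to 0.75
    t.insert(k, str(k))
print(t.size)           # doubled from 8 to 16
```

Five entries fit comfortably in eight slots; attempting the sixth trips the threshold, doubles the array, and rehashes everything before inserting.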

The Choice

Collisions are unavoidable. Your choice:

  • Chaining: Simple, handles high load, but pointers cost memory
  • Open Addressing: Cache-friendly, no pointers, but needs low load factor

Understanding collision resolution means understanding the trade-offs and when each strategy shines.

Master this, and you'll design hash tables for your specific needs.

© 2026 forEach. All rights reserved.
