- Understand O(1) average lookup/insert/delete via a hash function that maps keys to array indices
- See why collisions happen and why they need resolution strategies
- Learn how load factor and resizing maintain performance
Introduction to Hash Tables
You're building a contacts app. Thousands of contacts. When a user searches for "Alice", you need instant results.
Arrays? O(n) linear search. Too slow.
Sorted array with binary search? O(log n). Better, but still not instant. Plus, inserting is O(n).
Hash tables? O(1) average. Instant lookup, instant insert, instant delete. That's the promise.
What's a Hash Table?
A data structure that maps keys to values using a hash function.
Components:
- Keys: What you search with (e.g., "Alice")
- Values: What you want to store (e.g., phone number)
- Hash function: Converts key to array index
- Array: Stores the values
```python
# Simplified concept
hash_table = [None] * 10

key = "Alice"
index = hash(key) % 10      # let's say this gives 3
hash_table[3] = "555-1234"

# Later, lookup
index = hash("Alice") % 10  # still 3
phone = hash_table[3]       # "555-1234"
```
Hash function turns "Alice" into a number. Modulo maps it to array index. Store value there.
O(1). Direct access. No searching.
Hash Function Basics
Takes a key, outputs an integer.
Requirements:
- Deterministic: Same key → same hash always
- Fast: O(1) to compute
- Uniform distribution: Spread keys evenly across array
Example for strings (simplified):
```python
def hash_string(s, array_size):
    hash_val = 0
    for char in s:
        hash_val = (hash_val * 31 + ord(char)) % array_size
    return hash_val
```
Multiply by prime (31), add character code, modulo array size.
For "Alice": Process 'A', 'l', 'i', 'c', 'e'. Get some number. Modulo 10 → index in [0, 9].
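To see the arithmetic concretely, here is the loop traced by hand for "Alice" with array size 10:

```python
hash_val = 0
for char in "Alice":
    hash_val = (hash_val * 31 + ord(char)) % 10
    print(char, ord(char), hash_val)
# A 65 5
# l 108 3
# i 105 8
# c 99 7
# e 101 8
```

Final result: index 8, always in [0, 9] because of the modulo.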
The Problem: Collisions
Different keys can hash to the same index.
```python
hash("Alice") % 10  # suppose this gives 3
hash("Bob") % 10    # ...and this also gives 3
```

Both map to index 3. Collision.
You can't just overwrite Alice's value with Bob's. You need collision resolution.
Two main strategies:
- Chaining: Each array slot holds a list of entries
- Open addressing: Find another empty slot
We'll cover these in detail later.
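As a preview, chaining can be sketched in a few lines. The class name, fixed table size, and method names here are illustrative, not a standard API:

```python
class ChainedHashTable:
    """Minimal hash table using chaining: each slot holds a list of (key, value) pairs."""

    def __init__(self, size=10):
        self.buckets = [[] for _ in range(size)]

    def _index(self, key):
        return hash(key) % len(self.buckets)

    def put(self, key, value):
        bucket = self.buckets[self._index(key)]
        for i, (k, _) in enumerate(bucket):
            if k == key:                 # key already present: overwrite its value
                bucket[i] = (key, value)
                return
        bucket.append((key, value))      # new key: append to this slot's chain

    def get(self, key):
        for k, v in self.buckets[self._index(key)]:
            if k == key:
                return v
        return None

table = ChainedHashTable()
table.put("Alice", "555-1234")
table.put("Bob", "555-5678")   # even if "Bob" collides with "Alice", both survive
print(table.get("Alice"))      # 555-1234
```

Colliding keys simply share a chain; lookup scans only that one short list.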
Python's dict
Python's built-in dict is a hash table.
```python
contacts = {}
contacts["Alice"] = "555-1234"
contacts["Bob"] = "555-5678"

print(contacts["Alice"])      # "555-1234"
print("Charlie" in contacts)  # False

del contacts["Bob"]
```
O(1) average for insert, delete, lookup. Python handles collisions, resizing, everything.
When Hash Tables Shine
Fast lookups: Need to check if element exists? O(1).
Frequency counting: Count word occurrences.
```python
def count_words(text):
    counts = {}
    for word in text.split():
        counts[word] = counts.get(word, 0) + 1
    return counts
```
O(n). One pass. Each lookup/insert is O(1) average.
Deduplication: Remove duplicates.
```python
def remove_duplicates(arr):
    seen = {}
    result = []
    for item in arr:
        if item not in seen:
            seen[item] = True
            result.append(item)
    return result
```
O(n). Check membership in O(1).
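Since Python 3.7, dicts preserve insertion order, so the same idea fits in one line with the built-in `dict.fromkeys`:

```python
def remove_duplicates(arr):
    # dict.fromkeys keeps the first occurrence of each key, in insertion order
    return list(dict.fromkeys(arr))

print(remove_duplicates([3, 1, 3, 2, 1]))  # [3, 1, 2]
```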
Caching: Store expensive computation results.
```python
cache = {}

def expensive_function(x):
    if x in cache:
        return cache[x]
    result = compute(x)  # expensive; compute() stands in for the real work
    cache[x] = result
    return result
```
First call: compute and cache. Subsequent calls: O(1) lookup.
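Python's standard library bundles this exact pattern: `functools.lru_cache` memoizes a function for you. The `x * x` body here is just a stand-in for an expensive computation:

```python
from functools import lru_cache

@lru_cache(maxsize=None)  # unbounded cache, keyed by the arguments
def expensive_function(x):
    return x * x          # stand-in for an expensive computation

expensive_function(4)                        # computed and cached (miss)
expensive_function(4)                        # served from the cache (hit)
print(expensive_function.cache_info().hits)  # 1
```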
Arrays vs Hash Tables
| Operation | Array (unsorted) | Hash Table |
|---|---|---|
| Insert | O(1) (at end) | O(1) average |
| Delete | O(n) (find + shift) | O(1) average |
| Search | O(n) | O(1) average |
| Order | Preserved | Not preserved |
| Memory | Exact size | Extra (empty slots + collision handling) |
Hash tables trade memory for speed. Worth it for lookup-heavy workloads.
Load Factor
Load factor = number of entries / array size
Example: 7 entries, array size 10 → load factor 0.7
High load factor → more collisions → slower operations.
Typical strategy: when load factor > 0.75, resize array (double size, rehash all entries).
Python's dict resizes automatically. You don't see it, but it happens.
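A sketch of the resize step, assuming the chaining layout where each bucket is a list of (key, value) pairs (the `resize` function name and bucket layout are illustrative):

```python
def resize(buckets):
    """Double the bucket array and rehash every entry into its new slot."""
    new_buckets = [[] for _ in range(2 * len(buckets))]
    for bucket in buckets:
        for key, value in bucket:
            # Indices change because the modulus changed, so every key must rehash
            new_buckets[hash(key) % len(new_buckets)].append((key, value))
    return new_buckets

buckets = [[("Alice", "555-1234")], [], [("Bob", "555-5678")], []]
buckets = resize(buckets)
print(len(buckets))  # 8
```

Rehashing every entry is O(n), but it happens rarely enough that the cost averages out to O(1) per insert (amortized).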
Why "Average" O(1)?
Worst case can be O(n) if all keys hash to same index (all collide).
But with a good hash function and proper resizing, collisions are rare. Average case is O(1).
That's why we say "average O(1)", not "guaranteed O(1)".
Example: Two Sum
Given array, find two numbers that sum to target.
Brute force: Nested loops. O(n²).
Hash table: One pass.
```python
def two_sum(nums, target):
    seen = {}
    for i, num in enumerate(nums):
        complement = target - num
        if complement in seen:
            return [seen[complement], i]
        seen[num] = i
    return None

print(two_sum([2, 7, 11, 15], 9))  # [0, 1] (2 + 7 = 9)
```
For each number, check if complement exists in hash table. O(1) lookup. Total: O(n).
Example: First Non-Repeating Character
Find first character that appears only once.
```python
def first_unique_char(s):
    counts = {}
    # Count occurrences
    for char in s:
        counts[char] = counts.get(char, 0) + 1
    # Find first with count 1
    for char in s:
        if counts[char] == 1:
            return char
    return None

print(first_unique_char("leetcode"))  # 'l'
print(first_unique_char("aabb"))      # None
```
Two passes, both O(n). Hash table gives O(1) lookups.
The Power
Hash tables give you O(1) average for the big three:
- Insert
- Delete
- Search
Understanding hash tables means understanding:
- How hash functions work
- Why collisions happen
- Trade-offs: speed vs memory
Master hash tables, and you'll solve problems faster. Literally.
