foreach-ui logo
codeLanguages
account_treeDSA

Quick Actions

quizlock Random Quiz
trending_uplock Progress
  • 1
  • 2
  • 3
  • 4
  • 5
  • quiz
Data Structures
  • Understand requirements: deterministic, fast O(1), uniform distribution
  • Learn polynomial rolling hash for strings with prime multiplier
  • Implement __hash__() and __eq__() for custom objects as keys

Hash Functions

The hash function is the heart of a hash table. It determines everything: speed, collisions, distribution.

A bad hash function? Your O(1) hash table becomes O(n). All operations slow.

A good hash function? Fast, uniform distribution, minimal collisions.

What Makes a Good Hash Function

1. Deterministic: Same input → same output, always.

hash("Alice")  # Always gives same number

If hash("Alice") gave different numbers each time, you couldn't find Alice's value later.

2. Fast: O(1) computation.

No expensive operations. Just arithmetic and bitwise ops.

3. Uniform distribution: Spread keys evenly across array.

Bad: All keys hash to index 0. Everything collides.

Good: Keys spread across [0, 1, 2, ..., size-1] evenly.

4. Minimize collisions: Different keys should rarely hash to same index.

Impossible to avoid completely (pigeonhole principle), but good functions minimize them.

Simple Hash for Integers

For integers, just use modulo:

def hash_int(key, size):
    return key % size

Keys [12, 25, 38] with size 10:

  • 12 % 10 = 2
  • 25 % 10 = 5
  • 38 % 10 = 8

No collisions if keys well-distributed.

Problem: If all keys are multiples of 10 (10, 20, 30), all hash to 0. Terrible.

Hash for Strings

Strings are sequences of characters. Convert to number.

Polynomial rolling hash:

def hash_string(s, size):
    hash_val = 0
    prime = 31  # Common choice
    
    for char in s:
        hash_val = (hash_val * prime + ord(char)) % size
    
    return hash_val

For "cat":

  • hash = 0
  • hash = (0 * 31 + ord('c')) % size = 99 % size
  • hash = (99 * 31 + ord('a')) % size = ...
  • hash = (... * 31 + ord('t')) % size = final

Why 31? Prime number. Reduces patterns. Experimentally good distribution.

Division Method

def hash_division(key, size):
    return key % size

Simple. Fast. Works if size is prime and keys are random.

Problem: If size is power of 2 (e.g., 16), only uses lower bits. Patterns in keys → patterns in hashes.

Multiplication Method

def hash_multiplication(key, size):
    A = 0.6180339887  # (sqrt(5) - 1) / 2, golden ratio
    return int(size * ((key * A) % 1))

Multiply by constant, take fractional part, multiply by size.

Less sensitive to size choice. Can use power of 2.

Universal Hashing

Choose hash function randomly from a family of functions.

import random

class UniversalHash:
    def __init__(self, size):
        self.size = size
        self.p = 2**31 - 1  # Large prime
        self.a = random.randint(1, self.p - 1)
        self.b = random.randint(0, self.p - 1)
    
    def hash(self, key):
        return ((self.a * key + self.b) % self.p) % self.size

Different instances use different (a, b). Attacker can't predict collisions. Security benefit.

Python's hash()

Python has built-in hash() function.

print(hash("Alice"))  # Some large integer
print(hash(42))       # 42 (for small ints, hash is identity)
print(hash((1, 2)))   # Hash of tuple

Then modulo to get array index:

index = hash(key) % array_size

Python's hash is well-designed. Fast, good distribution.

Hashing Custom Objects

Want to use your own class as key? Implement __hash__() and __eq__().

class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y
    
    def __hash__(self):
        return hash((self.x, self.y))
    
    def __eq__(self, other):
        return self.x == other.x and self.y == other.y

points = {}
p1 = Point(1, 2)
points[p1] = "origin-ish"

p2 = Point(1, 2)
print(points[p2])  # "origin-ish" (p1 and p2 are equal)

Hash based on immutable fields. If object changes, hash shouldn't.

Avalanche Effect

Small change in key → large change in hash.

print(hash("Alice"))  # e.g., 12345678
print(hash("Alicd"))  # e.g., 98765432 (very different)

One character change, completely different hash. Good for distribution.

Common Mistakes

Using mutable objects as keys: If object changes, hash changes. Can't find it anymore.

# Wrong
my_dict = {}
my_list = [1, 2, 3]
my_dict[my_list] = "value"  # TypeError: unhashable type: 'list'

# Right: use tuple (immutable)
my_tuple = (1, 2, 3)
my_dict[my_tuple] = "value"  # Works

Ignoring distribution: Bad hash function clusters keys.

# Bad: All even numbers hash to 0, odd to 1
def bad_hash(key, size):
    return key % 2

Not handling negative numbers:

# Wrong
hash(-5) % 10  # Might give negative index in some languages

# Right
abs(hash(-5)) % 10  # Ensure positive

The Pattern

Hash functions convert keys to array indices:

  • Must be deterministic and fast
  • Should distribute uniformly
  • Use math tricks (primes, golden ratio)
  • Python's built-in hash() is solid

Understanding hash functions means understanding why collisions happen and how to minimize them.

Master hash functions, and you'll design efficient hash tables.

© 2026 forEach. All rights reserved.

Privacy Policy•Terms of Service