- Build LRU cache with hash table + doubly linked list for O(1) operations
- Solve subarray sum and longest consecutive with cumulative sum patterns
- Know when NOT to use: sorted order, range queries, memory constraints
Hash Table Applications
Hash tables aren't just theory. They're everywhere in real software.
LRU Cache
A Least Recently Used (LRU) cache stores recent results and, when full, evicts the entry that was used least recently.
Combine hash table (O(1) lookup) + doubly linked list (O(1) add/remove).
```python
class Node:
    def __init__(self, key, value):
        self.key = key
        self.value = value
        self.prev = None
        self.next = None

class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.cache = {}         # key -> node
        self.head = Node(0, 0)  # Dummy head
        self.tail = Node(0, 0)  # Dummy tail
        self.head.next = self.tail
        self.tail.prev = self.head

    def get(self, key):
        if key not in self.cache:
            return -1
        node = self.cache[key]
        self._remove(node)
        self._add(node)  # Move to front (most recent)
        return node.value

    def put(self, key, value):
        if key in self.cache:
            self._remove(self.cache[key])
        node = Node(key, value)
        self._add(node)
        self.cache[key] = node
        if len(self.cache) > self.capacity:
            # Evict LRU (node before tail)
            lru = self.tail.prev
            self._remove(lru)
            del self.cache[lru.key]

    def _remove(self, node):
        node.prev.next = node.next
        node.next.prev = node.prev

    def _add(self, node):
        # Add to front (after head)
        node.prev = self.head
        node.next = self.head.next
        self.head.next.prev = node
        self.head.next = node
```
Both get and put are O(1). Hash table for lookup. Linked list for ordering.
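Python ships this exact pattern in the standard library as `functools.lru_cache`, a memoizing decorator backed by a hash table plus a linked list. A quick sketch of the eviction behavior (the function name `expensive` is illustrative):

```python
from functools import lru_cache

@lru_cache(maxsize=2)
def expensive(key):
    print(f"computing {key}")
    return key * 10

expensive(1)  # miss: computed
expensive(2)  # miss: computed
expensive(1)  # hit: no recomputation, 1 becomes most recent
expensive(3)  # miss: evicts 2, the least recently used entry
print(expensive.cache_info())  # hits=1, misses=3
```

Calling `expensive(2)` again at this point would recompute it, since it was the entry evicted to make room for 3.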
Group Anagrams
Group words that are anagrams of each other.
```python
def group_anagrams(words):
    anagrams = {}
    for word in words:
        # Sorted characters form the key; anagrams share it
        key = ''.join(sorted(word))
        if key not in anagrams:
            anagrams[key] = []
        anagrams[key].append(word)
    return list(anagrams.values())

words = ["eat", "tea", "tan", "ate", "nat", "bat"]
print(group_anagrams(words))
# [['eat', 'tea', 'ate'], ['tan', 'nat'], ['bat']]
```
Sorted string as key: anagrams have the same sorted form. O(n × k log k), where n is the number of words and k is the average word length.
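If the words are long, the sort can be replaced with a 26-entry letter-count tuple as the key, dropping the per-word cost from O(k log k) to O(k). A sketch, assuming lowercase ASCII input:

```python
def group_anagrams_counts(words):
    anagrams = {}
    for word in words:
        # A tuple of letter counts is hashable and identical for anagrams
        counts = [0] * 26
        for ch in word:
            counts[ord(ch) - ord('a')] += 1
        anagrams.setdefault(tuple(counts), []).append(word)
    return list(anagrams.values())

words = ["eat", "tea", "tan", "ate", "nat", "bat"]
print(group_anagrams_counts(words))
# [['eat', 'tea', 'ate'], ['tan', 'nat'], ['bat']]
```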
Subarray Sum Equals K
Find number of subarrays with sum equal to k.
```python
def subarray_sum(nums, k):
    count = 0
    sum_map = {0: 1}  # prefix sum -> frequency
    curr_sum = 0
    for num in nums:
        curr_sum += num
        # If curr_sum - k was seen before, subarray(s) ending here sum to k
        if curr_sum - k in sum_map:
            count += sum_map[curr_sum - k]
        sum_map[curr_sum] = sum_map.get(curr_sum, 0) + 1
    return count

print(subarray_sum([1, 1, 1], 2))  # 2 ([1,1] appears twice)
print(subarray_sum([1, 2, 3], 3))  # 2 ([1,2] and [3])
```
Track cumulative sums. If curr_sum - k seen before, those subarrays sum to k. O(n).
Longest Consecutive Sequence
Find longest sequence of consecutive numbers.
```python
def longest_consecutive(nums):
    num_set = set(nums)  # Hash set for O(1) lookup
    longest = 0
    for num in num_set:
        # Only start counting at the beginning of a sequence
        if num - 1 not in num_set:
            current_num = num
            current_length = 1
            while current_num + 1 in num_set:
                current_num += 1
                current_length += 1
            longest = max(longest, current_length)
    return longest

print(longest_consecutive([100, 4, 200, 1, 3, 2]))  # 4 ([1, 2, 3, 4])
```
Only count from sequence starts (where num-1 doesn't exist). Each number processed once. O(n).
Frequency Counter
Count occurrences of elements.
```python
from collections import Counter

def top_k_frequent(nums, k):
    counts = Counter(nums)
    return [num for num, count in counts.most_common(k)]

print(top_k_frequent([1, 1, 1, 2, 2, 3], 2))  # [1, 2]
```
Counter is a hash table. most_common(k) uses a heap (heapq.nlargest) under the hood: O(n log k).
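The heap step can also be written out directly with `heapq.nlargest` over the count table, which makes the two phases (hash-table counting, then heap selection) explicit:

```python
import heapq
from collections import Counter

def top_k_frequent_heap(nums, k):
    counts = Counter(nums)  # O(n) hash-table counting
    # nlargest maintains a heap of size k while scanning the keys: O(n log k)
    return heapq.nlargest(k, counts, key=counts.get)

print(top_k_frequent_heap([1, 1, 1, 2, 2, 3], 2))  # [1, 2]
```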
Checking for Duplicates
Has duplicates?
```python
def contains_duplicate(nums):
    return len(nums) != len(set(nums))
```
Set removes duplicates. If lengths differ, there were duplicates. O(n).
Within distance k?
```python
def contains_nearby_duplicate(nums, k):
    seen = {}  # num -> last index seen
    for i, num in enumerate(nums):
        if num in seen and i - seen[num] <= k:
            return True
        seen[num] = i
    return False
```
Track last index of each number. If seen within distance k, duplicate exists. O(n).
Word Pattern Matching
Check if pattern matches string.
```python
def word_pattern(pattern, s):
    words = s.split()
    if len(pattern) != len(words):
        return False
    char_to_word = {}
    word_to_char = {}
    for char, word in zip(pattern, words):
        if char in char_to_word:
            if char_to_word[char] != word:
                return False
        else:
            char_to_word[char] = word
        if word in word_to_char:
            if word_to_char[word] != char:
                return False
        else:
            word_to_char[word] = char
    return True

print(word_pattern("abba", "dog cat cat dog"))   # True
print(word_pattern("abba", "dog cat cat fish"))  # False
```
Bijection: each char maps to one word, each word to one char. Two hash tables. O(n).
Best Practices
Choose the right initial size: If you know the expected size, initialize accordingly to avoid repeated resizing.
```python
# Good: you expect ~1000 items
my_dict = dict()  # Python resizes automatically, but in a custom implementation...
hash_table = HashTable(size=2048)  # Power of 2, > 1000 / 0.75
```
Use immutable keys: Strings, tuples, frozensets. Not lists or dicts.
Profile before optimizing: Hash tables are usually not the bottleneck. Measure first.
Consider memory: Hash tables use extra memory. If memory-constrained, maybe sorted array + binary search.
Know your language's implementation: Python's dict uses open addressing. Java's HashMap uses chaining. Different trade-offs.
When NOT to Use Hash Tables
Need sorted order: Use balanced BST (TreeMap in Java, SortedDict in some libraries).
Range queries: "Find all values between x and y." BST is O(log n + k), hash table is O(n).
Memory-constrained: Hash tables need low load factor. Alternatives may use less memory.
Iteration order matters: Classic hash tables don't guarantee iteration order. Python dicts preserve insertion order (3.7+); Java offers LinkedHashMap.
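The range-query point above can be sketched with a sorted list and the standard `bisect` module: two binary searches find the window boundaries in O(log n), then the slice costs O(k), whereas a hash table would have to scan every key.

```python
import bisect

def range_query(sorted_vals, lo, hi):
    # Binary search for the window boundaries: O(log n)
    left = bisect.bisect_left(sorted_vals, lo)
    right = bisect.bisect_right(sorted_vals, hi)
    # Slicing out the k matching values: O(k)
    return sorted_vals[left:right]

vals = sorted([15, 3, 9, 27, 21, 6])
print(range_query(vals, 5, 22))  # [6, 9, 15, 21]
```

The trade-off: the list must stay sorted, so inserts cost O(n), which is why a balanced BST is the usual answer when both updates and range queries matter.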
The Versatility
Hash tables solve:
- Caching (LRU)
- Grouping (anagrams)
- Cumulative sums (subarray problems)
- Deduplication
- Pattern matching
Understanding applications means recognizing when O(1) lookup is the key to the solution.
Master hash tables, and you'll solve problems faster and more elegantly.
