- Build LRU cache with hash table + doubly linked list for O(1) operations
- Solve subarray sum and longest consecutive with cumulative sum patterns
- Know when NOT to use: sorted order, range queries, memory constraints
Hash Table Applications
Hash tables aren't just theory. They're everywhere in real software.
LRU Cache
A Least Recently Used (LRU) cache stores recent results and, when full, evicts the entry that was used least recently.
Combine hash table (O(1) lookup) + doubly linked list (O(1) add/remove).
```python
class Node:
    def __init__(self, key, value):
        self.key = key
        self.value = value
        self.prev = None
        self.next = None

class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.cache = {}         # key -> node
        self.head = Node(0, 0)  # Dummy head
        self.tail = Node(0, 0)  # Dummy tail
        self.head.next = self.tail
        self.tail.prev = self.head

    def get(self, key):
        if key not in self.cache:
            return -1
        node = self.cache[key]
        self._remove(node)
        self._add(node)  # Move to front (most recent)
        return node.value

    def put(self, key, value):
        if key in self.cache:
            self._remove(self.cache[key])
        node = Node(key, value)
        self._add(node)
        self.cache[key] = node
        if len(self.cache) > self.capacity:
            # Evict LRU (node before tail)
            lru = self.tail.prev
            self._remove(lru)
            del self.cache[lru.key]

    def _remove(self, node):
        node.prev.next = node.next
        node.next.prev = node.prev

    def _add(self, node):
        # Add to front (after head)
        node.prev = self.head
        node.next = self.head.next
        self.head.next.prev = node
        self.head.next = node
```
Both get and put are O(1). Hash table for lookup. Linked list for ordering.
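Python ships this exact pattern in the standard library as `functools.lru_cache`, a memoizing decorator backed by a hash table plus a linked list. A quick sketch of the eviction behavior (the function name `expensive` is illustrative):

```python
from functools import lru_cache

@lru_cache(maxsize=2)
def expensive(key):
    print(f"computing {key}")
    return key * 10

expensive(1)  # miss: computed
expensive(2)  # miss: computed
expensive(1)  # hit: no recomputation, 1 becomes most recent
expensive(3)  # miss: evicts 2, the least recently used entry
print(expensive.cache_info())  # hits=1, misses=3
```

Calling `expensive(2)` again at this point would recompute it, since it was the entry evicted to make room for 3.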
Group Anagrams
Group words that are anagrams of each other.
```python
def group_anagrams(words):
    anagrams = {}
    for word in words:
        # Sorted characters form the key; anagrams share it
        key = ''.join(sorted(word))
        if key not in anagrams:
            anagrams[key] = []
        anagrams[key].append(word)
    return list(anagrams.values())

words = ["eat", "tea", "tan", "ate", "nat", "bat"]
print(group_anagrams(words))
# [['eat', 'tea', 'ate'], ['tan', 'nat'], ['bat']]
```
Sorted string as key: anagrams have the same sorted form. O(n × k log k), where n is the number of words and k is the average word length.
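If the words are long, the sort can be replaced with a 26-entry letter-count tuple as the key, dropping the per-word cost from O(k log k) to O(k). A sketch, assuming lowercase ASCII input:

```python
def group_anagrams_counts(words):
    anagrams = {}
    for word in words:
        # A tuple of letter counts is hashable and identical for anagrams
        counts = [0] * 26
        for ch in word:
            counts[ord(ch) - ord('a')] += 1
        anagrams.setdefault(tuple(counts), []).append(word)
    return list(anagrams.values())

words = ["eat", "tea", "tan", "ate", "nat", "bat"]
print(group_anagrams_counts(words))
# [['eat', 'tea', 'ate'], ['tan', 'nat'], ['bat']]
```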
Subarray Sum Equals K
Find number of subarrays with sum equal to k.
```python
def subarray_sum(nums, k):
    count = 0
    sum_map = {0: 1}  # prefix sum -> frequency
    curr_sum = 0
    for num in nums:
        curr_sum += num
        # If curr_sum - k was seen before, subarray(s) ending here sum to k
        if curr_sum - k in sum_map:
            count += sum_map[curr_sum - k]
        sum_map[curr_sum] = sum_map.get(curr_sum, 0) + 1
    return count

print(subarray_sum([1, 1, 1], 2))  # 2 ([1,1] appears twice)
print(subarray_sum([1, 2, 3], 3))  # 2 ([1,2] and [3])
```
Track cumulative sums. If curr_sum - k seen before, those subarrays sum to k. O(n).
Longest Consecutive Sequence
Find longest sequence of consecutive numbers.
```python
def longest_consecutive(nums):
    num_set = set(nums)  # Hash set for O(1) lookup
    longest = 0
    for num in num_set:
        # Only start counting at the beginning of a sequence
        if num - 1 not in num_set:
            current_num = num
            current_length = 1
            while current_num + 1 in num_set:
                current_num += 1
                current_length += 1
            longest = max(longest, current_length)
    return longest

print(longest_consecutive([100, 4, 200, 1, 3, 2]))  # 4 ([1, 2, 3, 4])
```
Only count from sequence starts (where num-1 doesn't exist). Each number processed once. O(n).
Frequency Counter
Count occurrences of elements.
```python
from collections import Counter

def top_k_frequent(nums, k):
    counts = Counter(nums)
    return [num for num, count in counts.most_common(k)]

print(top_k_frequent([1, 1, 1, 2, 2, 3], 2))  # [1, 2]
```
Counter is a hash table. most_common(k) uses a heap (heapq.nlargest) under the hood: O(n log k).
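The heap step can also be written out directly with `heapq.nlargest` over the count table, which makes the two phases (hash-table counting, then heap selection) explicit:

```python
import heapq
from collections import Counter

def top_k_frequent_heap(nums, k):
    counts = Counter(nums)  # O(n) hash-table counting
    # nlargest maintains a heap of size k while scanning the keys: O(n log k)
    return heapq.nlargest(k, counts, key=counts.get)

print(top_k_frequent_heap([1, 1, 1, 2, 2, 3], 2))  # [1, 2]
```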
Checking for Duplicates
Has duplicates?
```python
def contains_duplicate(nums):
    return len(nums) != len(set(nums))
```
Set removes duplicates. If lengths differ, there were duplicates. O(n).
Within distance k?
```python
def contains_nearby_duplicate(nums, k):
    seen = {}  # num -> last index seen
    for i, num in enumerate(nums):
        if num in seen and i - seen[num] <= k:
            return True
        seen[num] = i
    return False
```
Track last index of each number. If seen within distance k, duplicate exists. O(n).
Word Pattern Matching
Check if pattern matches string.
```python
def word_pattern(pattern, s):
    words = s.split()
    if len(pattern) != len(words):
        return False
    char_to_word = {}
    word_to_char = {}
    for char, word in zip(pattern, words):
        if char in char_to_word:
            if char_to_word[char] != word:
                return False
        else:
            char_to_word[char] = word
        if word in word_to_char:
            if word_to_char[word] != char:
                return False
        else:
            word_to_char[word] = char
    return True

print(word_pattern("abba", "dog cat cat dog"))   # True
print(word_pattern("abba", "dog cat cat fish"))  # False
```
Bijection: each char maps to one word, each word to one char. Two hash tables. O(n).
Best Practices
Choose the right initial size: If you know the expected size, initialize accordingly to avoid repeated resizing.
```python
# Good: you expect ~1000 items
my_dict = dict()  # Python resizes automatically, but in a custom implementation...
hash_table = HashTable(size=2048)  # Power of 2, > 1000 / 0.75
```
Use immutable keys: Strings, tuples, frozensets. Not lists or dicts.
Profile before optimizing: Hash tables are usually not the bottleneck. Measure first.
Consider memory: Hash tables use extra memory. If memory-constrained, maybe sorted array + binary search.
Know your language's implementation: Python's dict uses open addressing. Java's HashMap uses chaining. Different trade-offs.
When NOT to Use Hash Tables
Need sorted order: Use balanced BST (TreeMap in Java, SortedDict in some libraries).
Range queries: "Find all values between x and y." BST is O(log n + k), hash table is O(n).
Memory-constrained: Hash tables need low load factor. Alternatives may use less memory.
Iteration order matters: Classic hash tables don't guarantee iteration order. Python dicts preserve insertion order (3.7+); Java offers LinkedHashMap.
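The range-query point above can be sketched with a sorted list and the standard `bisect` module: two binary searches find the window boundaries in O(log n), then the slice costs O(k), whereas a hash table would have to scan every key.

```python
import bisect

def range_query(sorted_vals, lo, hi):
    # Binary search for the window boundaries: O(log n)
    left = bisect.bisect_left(sorted_vals, lo)
    right = bisect.bisect_right(sorted_vals, hi)
    # Slicing out the k matching values: O(k)
    return sorted_vals[left:right]

vals = sorted([15, 3, 9, 27, 21, 6])
print(range_query(vals, 5, 22))  # [6, 9, 15, 21]
```

The trade-off: the list must stay sorted, so inserts cost O(n), which is why a balanced BST is the usual answer when both updates and range queries matter.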
The Versatility
Hash tables solve:
- Caching (LRU)
- Grouping (anagrams)
- Cumulative sums (subarray problems)
- Deduplication
- Pattern matching
Understanding applications means recognizing when O(1) lookup is the key to the solution.
Master hash tables, and you'll solve problems faster and more elegantly.
