- Understand how tries store strings as paths sharing common prefixes
- Learn insert/search/delete operations and their O(m) complexity
- See when tries beat hash maps: autocomplete, prefix search, enumeration
Trie Data Structure
You're building autocomplete for a search bar. Type "alg" and it suggests "algorithm", "algebra", "algae". How does it find all words starting with "alg" so fast?
A trie (pronounced "try") is the answer. It's a tree where each path from the root spells out a prefix, and nodes marked as word endings complete the words. Words sharing a prefix share the path for that prefix.
The Structure
Think of a trie like a decision tree for spelling words. At each node, you choose the next letter.
For the words "cat", "car", and "dog":
```
        root
       /    \
      c      d
      |      |
      a      o
     / \     |
    t   r    g
(cat) (car) (dog)
```
Each node has:
- Children (map from character to node)
- A flag: is this the end of a word?
```python
class TrieNode:
    def __init__(self):
        self.children = {}  # char -> TrieNode
        self.is_end_of_word = False

class Trie:
    def __init__(self):
        self.root = TrieNode()
```
Inserting a Word
To insert "cat":
- Start at root
- For 'c': if child 'c' exists, go there; else create it
- For 'a': same
- For 't': same
- Mark the 't' node as end of word
```python
def insert(self, word):
    node = self.root
    for char in word:
        if char not in node.children:
            node.children[char] = TrieNode()
        node = node.children[char]
    node.is_end_of_word = True
```
After inserting "cat", "car", "dog":
```
root
├─ c
│  └─ a
│     ├─ t  (END: "cat")
│     └─ r  (END: "car")
└─ d
   └─ o
      └─ g  (END: "dog")
```
Time: O(m) where m is word length. You visit each character once.
Space: O(m) in the worst case if no prefix is shared. But with shared prefixes, much better.
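To see the prefix sharing concretely, here's a minimal runnable sketch using the insert logic above (the `count_nodes` helper is added here just for illustration):

```python
class TrieNode:
    def __init__(self):
        self.children = {}  # char -> TrieNode
        self.is_end_of_word = False

class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, word):
        node = self.root
        for char in word:
            if char not in node.children:
                node.children[char] = TrieNode()
            node = node.children[char]
        node.is_end_of_word = True

def count_nodes(node):
    # Count this node plus everything below it.
    return 1 + sum(count_nodes(child) for child in node.children.values())

trie = Trie()
for word in ["cat", "car", "dog"]:
    trie.insert(word)

# "cat" and "car" share the c -> a path, so the trie holds
# root + {c, a, t, r} + {d, o, g} = 8 nodes instead of 1 + 9.
print(count_nodes(trie.root))  # 8
```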
Searching for a Word
To search for "car":
- Start at root
- Follow 'c' → 'a' → 'r'
- Check if the 'r' node is marked end of word
```python
def search(self, word):
    node = self.root
    for char in word:
        if char not in node.children:
            return False  # Path doesn't exist
        node = node.children[char]
    return node.is_end_of_word  # True only if word ends here
```
For "ca" in the trie above: we reach the 'a' node, but it's not marked end of word, so return False.
Time: O(m). Follow the path, check the flag.
Prefix Search
Check if any word starts with a prefix. For "ca", we'd find "cat" and "car".
```python
def starts_with(self, prefix):
    node = self.root
    for char in prefix:
        if char not in node.children:
            return False  # Prefix doesn't exist
        node = node.children[char]
    return True  # We found the prefix path
```
Time: O(m) where m is prefix length.
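Putting `search` and `starts_with` side by side makes the distinction clear. A minimal sketch (the shared `_walk` helper is a small refactor of the two loops above, not part of the original code):

```python
class TrieNode:
    def __init__(self):
        self.children = {}
        self.is_end_of_word = False

class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, word):
        node = self.root
        for char in word:
            node = node.children.setdefault(char, TrieNode())
        node.is_end_of_word = True

    def _walk(self, s):
        # Follow the path for s; return the final node, or None if it breaks.
        node = self.root
        for char in s:
            if char not in node.children:
                return None
            node = node.children[char]
        return node

    def search(self, word):
        node = self._walk(word)
        return node is not None and node.is_end_of_word

    def starts_with(self, prefix):
        return self._walk(prefix) is not None

trie = Trie()
for w in ["cat", "car", "dog"]:
    trie.insert(w)

print(trie.search("ca"))       # False: "ca" is a path but not a word
print(trie.starts_with("ca"))  # True: "cat" and "car" live below it
```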
Finding All Words with a Prefix
For autocomplete, you need all words starting with "ca":
```python
def find_words_with_prefix(self, prefix):
    node = self.root
    # Navigate to prefix
    for char in prefix:
        if char not in node.children:
            return []  # No words with this prefix
        node = node.children[char]
    # DFS from this node to collect all words
    words = []
    self._dfs(node, prefix, words)
    return words

def _dfs(self, node, current_word, words):
    if node.is_end_of_word:
        words.append(current_word)
    for char, child in node.children.items():
        self._dfs(child, current_word + char, words)
```
For prefix "ca" in our trie:
- Navigate to the 'a' node
- DFS from there: find "cat" and "car"
Time: O(p + n) where p is prefix length and n is total characters in all matching words.
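An end-to-end check of the autocomplete path, bundling the methods above into a self-contained sketch:

```python
class TrieNode:
    def __init__(self):
        self.children = {}
        self.is_end_of_word = False

class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, word):
        node = self.root
        for char in word:
            node = node.children.setdefault(char, TrieNode())
        node.is_end_of_word = True

    def find_words_with_prefix(self, prefix):
        node = self.root
        for char in prefix:
            if char not in node.children:
                return []  # No words with this prefix
            node = node.children[char]
        words = []
        self._dfs(node, prefix, words)
        return words

    def _dfs(self, node, current_word, words):
        if node.is_end_of_word:
            words.append(current_word)
        for char, child in node.children.items():
            self._dfs(child, current_word + char, words)

trie = Trie()
for w in ["algorithm", "algebra", "algae", "alpha"]:
    trie.insert(w)

print(sorted(trie.find_words_with_prefix("alg")))
# ['algae', 'algebra', 'algorithm']
```

Note that the raw result order depends on insertion order of the children dicts, which is why the demo sorts it.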
Deletion
Deleting is trickier. Remove "cat" from the trie:
```python
def delete(self, word):
    def _delete(node, word, index):
        if index == len(word):
            # Reached end of word
            if not node.is_end_of_word:
                return False  # Word doesn't exist
            node.is_end_of_word = False
            # Delete this node only if it has no children
            return len(node.children) == 0
        char = word[index]
        if char not in node.children:
            return False  # Word doesn't exist
        child = node.children[char]
        should_delete_child = _delete(child, word, index + 1)
        if should_delete_child:
            del node.children[char]
            # Delete current node if it's not end of another word and has no children
            return not node.is_end_of_word and len(node.children) == 0
        return False
    _delete(self.root, word, 0)
```
Why all this bookkeeping? After deleting "cat", we still need "car". We can't delete the 'c' and 'a' nodes because they're part of "car". We only delete the 't' node.
But if we delete "dog" and it's the only word starting with 'd', we can delete the entire branch.
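A sketch exercising both cases, combining the delete method above with the minimal insert/search from earlier:

```python
class TrieNode:
    def __init__(self):
        self.children = {}
        self.is_end_of_word = False

class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, word):
        node = self.root
        for char in word:
            node = node.children.setdefault(char, TrieNode())
        node.is_end_of_word = True

    def search(self, word):
        node = self.root
        for char in word:
            if char not in node.children:
                return False
            node = node.children[char]
        return node.is_end_of_word

    def delete(self, word):
        def _delete(node, word, index):
            if index == len(word):
                if not node.is_end_of_word:
                    return False  # Word doesn't exist
                node.is_end_of_word = False
                return len(node.children) == 0
            char = word[index]
            if char not in node.children:
                return False  # Word doesn't exist
            if _delete(node.children[char], word, index + 1):
                del node.children[char]
                return not node.is_end_of_word and len(node.children) == 0
            return False
        _delete(self.root, word, 0)

trie = Trie()
for w in ["cat", "car", "dog"]:
    trie.insert(w)

trie.delete("cat")  # only the 't' node goes; "car" survives
print(trie.search("cat"), trie.search("car"))  # False True

trie.delete("dog")  # only word under 'd': the whole branch is pruned
print("d" in trie.root.children)  # False
```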
Space Efficiency
The naive implementation uses a hash map at each node. For lowercase letters, that's potentially 26 entries per node.
With many words, this adds up. For 10,000 words averaging 10 characters, you might have up to 100,000 nodes (fewer with shared prefixes), each carrying its own hash map.
Optimization: use an array of size 26 instead of a hash map (for lowercase letters):
```python
class TrieNode:
    def __init__(self):
        self.children = [None] * 26  # a-z
        self.is_end_of_word = False

def char_to_index(char):
    return ord(char) - ord('a')

def insert(self, word):
    node = self.root
    for char in word:
        idx = char_to_index(char)
        if node.children[idx] is None:
            node.children[idx] = TrieNode()
        node = node.children[idx]
    node.is_end_of_word = True
```
Trade-off: fixed memory per node (26 pointers, many possibly unused) with O(1) indexed child access, versus a hash map that grows only as needed.
When Tries Shine
Autocomplete: Type "alg", get all words starting with "alg". Hash maps can't do this efficiently.
Spell checking: Store dictionary in a trie. Check if a word exists in O(m).
IP routing: Routers use tries for longest prefix matching on IP addresses.
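To illustrate longest-prefix matching, here is a toy sketch over bit strings (not how production routers store tables; real ones use compressed binary tries over 32- or 128-bit addresses, and the names below are made up for the example):

```python
class BitTrieNode:
    def __init__(self):
        self.children = {}  # '0'/'1' -> BitTrieNode
        self.route = None   # next hop if a routed prefix ends here

class RoutingTable:
    def __init__(self):
        self.root = BitTrieNode()

    def add_prefix(self, bits, route):
        node = self.root
        for b in bits:
            node = node.children.setdefault(b, BitTrieNode())
        node.route = route

    def lookup(self, address_bits):
        # Walk as far as the address allows, remembering the last
        # routed prefix seen: that's the longest match.
        node, best = self.root, None
        for b in address_bits:
            if node.route is not None:
                best = node.route
            if b not in node.children:
                return best
            node = node.children[b]
        return node.route if node.route is not None else best

table = RoutingTable()
table.add_prefix("10", "gateway A")    # matches addresses starting 10...
table.add_prefix("1011", "gateway B")  # more specific prefix wins

print(table.lookup("10110000"))  # gateway B
print(table.lookup("10010000"))  # gateway A
```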
Dictionary applications: Find all words with a given prefix for word games, predictive text, etc.
vs Hash Maps
Hash map: O(m) insert and O(m) exact-match lookup (hashing a key of length m costs O(m); the expected number of probes after hashing is O(1)).
Trie: O(m) insert, O(m) search, but also O(m) prefix search and the ability to enumerate all words sharing a prefix.
Hash maps can't do prefix operations efficiently. For exact lookups both are O(m), but hash maps usually win on constant factors and memory.
vs Binary Search Trees
BST: O(log n) search where n is number of words, but comparisons are O(m) each, so O(m log n) total.
Trie: O(m) search regardless of how many words are stored.
For string operations, tries are usually better.
The Trade-off
Tries use more memory than hash maps for the same data (all those node objects and child pointers).
But they enable operations that would otherwise be expensive: prefix search, enumeration, range queries on strings.
If you need prefix operations or autocomplete, the memory cost is worth it. If you only need exact lookups, a hash map is simpler and often faster.
Variations
Compressed trie (Patricia trie): Merge nodes with single children to save space.
Ternary search tree (TST): Each node holds one character and 3 children (less-than, equal, greater-than), trading a little search speed for better space efficiency than per-node arrays.
Suffix trie: Store all suffixes of a string for substring search operations.
The basic trie is the foundation. Once you understand it, the variations make sense as optimizations for specific use cases.
