# Generators and Iterators

## Learning Objectives

- Understand the difference between iterators and iterables
- Create and use generator functions and expressions
- Implement custom iterators
- Apply generators for memory-efficient programming

## Introduction
Imagine you're a chef preparing a large banquet. Traditionally, you'd cook all the dishes at once, requiring massive amounts of stove space, ingredients, and cleanup. But what if you could cook each dish only when a guest is ready to eat it?
This is the power of generators: they produce values on demand, using minimal memory and processing power. Instead of creating all the data up front, a generator creates values one at a time, only when they are needed. This "lazy evaluation" approach is fundamental to efficient Python programming.
## Iterables vs Iterators
Before understanding generators, we need to distinguish between iterables and iterators.
### Iterables

An iterable is any object that can be looped over. It has an `__iter__` method that returns an iterator:

```python
# Lists are iterables
numbers = [1, 2, 3, 4, 5]
for num in numbers:
    print(num)

# Strings are iterables
for char in "hello":
    print(char)

# Dictionaries are iterables (iterating yields the keys)
for key in {"a": 1, "b": 2}:
    print(key)
```
### Iterators

An iterator is the object that performs the actual iteration. It has both `__iter__` and `__next__` methods:

```python
# Get an iterator from an iterable
numbers = [1, 2, 3, 4, 5]
iterator = iter(numbers)  # Calls numbers.__iter__()

print(next(iterator))  # 1
print(next(iterator))  # 2
print(next(iterator))  # 3

# Manual iteration over the remaining items
try:
    while True:
        print(next(iterator))  # 4, then 5
except StopIteration:
    print("Iteration complete")
```
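This protocol is exactly what a `for` loop uses internally: it calls `iter()` once, then calls `next()` repeatedly until `StopIteration` is raised. A minimal sketch of that desugaring (the helper name `manual_for` is made up for illustration):

```python
# Roughly what "for item in iterable: action(item)" expands to
def manual_for(iterable, action):
    iterator = iter(iterable)      # __iter__ is called once
    while True:
        try:
            item = next(iterator)  # __next__ on every pass
        except StopIteration:
            break                  # the loop ends silently
        action(item)

collected = []
manual_for([1, 2, 3], collected.append)
print(collected)  # [1, 2, 3]
```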
## Generator Functions

Generator functions use the `yield` keyword instead of `return`. Each `yield` pauses execution, and the function resumes from that exact point on the next request:

```python
def countdown(n):
    """Generator that counts down from n."""
    while n > 0:
        yield n
        n -= 1

# Calling the function returns a generator; no code runs yet
counter = countdown(5)
print(next(counter))  # 5
print(next(counter))  # 4
print(next(counter))  # 3

# A for loop picks up where next() left off
for num in counter:
    print(num)  # 2, 1
```
## How Generators Work

```python
def simple_generator():
    print("Starting")
    yield 1
    print("After first yield")
    yield 2
    print("After second yield")
    yield 3
    print("Done")

gen = simple_generator()
print(next(gen))  # "Starting", then 1
print(next(gen))  # "After first yield", then 2
print(next(gen))  # "After second yield", then 3
# next(gen) would print "Done", then raise StopIteration
```
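You can observe these pause points directly with `inspect.getgeneratorstate`, which reports whether a generator has not started, is suspended at a `yield`, or has finished:

```python
import inspect

def simple_generator():
    yield 1
    yield 2

gen = simple_generator()
print(inspect.getgeneratorstate(gen))  # GEN_CREATED: no code has run yet
next(gen)
print(inspect.getgeneratorstate(gen))  # GEN_SUSPENDED: paused at a yield
for _ in gen:
    pass                               # exhaust the generator
print(inspect.getgeneratorstate(gen))  # GEN_CLOSED: finished
```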
## Generator Expressions

Generator expressions look like list comprehensions but use parentheses and create generators instead of lists:

```python
# List comprehension (creates the full list immediately)
squares_list = [x**2 for x in range(10)]  # [0, 1, 4, 9, 16, ...]

# Generator expression (creates a generator object)
squares_gen = (x**2 for x in range(10))   # <generator object ...>

# Memory comparison (exact sizes vary by Python version)
import sys
print(sys.getsizeof(squares_list))  # ~200 bytes, grows with the data
print(sys.getsizeof(squares_gen))   # ~100 bytes, constant regardless of range size

# Usage
for square in squares_gen:
    print(square)
```
### Conditional Generator Expressions

```python
# Even numbers
evens = (x for x in range(20) if x % 2 == 0)

# Filtered and transformed
processed = (x.upper() for x in ['hello', 'world', 'python'] if len(x) > 4)
```
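Generator expressions shine when passed straight to an aggregating function: no intermediate list is ever built, and the parentheses can be dropped when the expression is the sole argument:

```python
# Sum of squares without materializing a list
total = sum(x**2 for x in range(10))
print(total)  # 285

# any() short-circuits, so the generator stops as soon as a match is found
has_big = any(x > 5 for x in range(1_000_000))
print(has_big)  # True, after examining only the first few values
```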
## Custom Iterators

You can create custom iterator classes by implementing `__iter__` and `__next__`:

```python
class FibonacciIterator:
    def __init__(self, max_value=None):
        self.a, self.b = 0, 1
        self.max_value = max_value

    def __iter__(self):
        return self

    def __next__(self):
        # Compare against None so max_value=0 still works
        if self.max_value is not None and self.a > self.max_value:
            raise StopIteration
        result = self.a
        self.a, self.b = self.b, self.a + self.b
        return result

# Usage
fib = FibonacciIterator(max_value=100)
for num in fib:
    print(num, end=" ")  # 0 1 1 2 3 5 8 13 21 34 55 89
```
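The same sequence takes far less code as a generator function, because the `yield` machinery supplies `__iter__` and `__next__` for free; a sketch for comparison:

```python
def fibonacci(max_value=None):
    """Yield Fibonacci numbers, optionally stopping at max_value."""
    a, b = 0, 1
    while max_value is None or a <= max_value:
        yield a
        a, b = b, a + b

print(list(fibonacci(max_value=100)))
# [0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89]
```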
## Memory Efficiency

Generators are memory-efficient because they never hold the whole dataset in memory at once:

```python
# Memory-intensive approach
def read_large_file_traditional(filename):
    with open(filename) as f:
        return f.readlines()  # Loads the entire file into memory

# Memory-efficient approach
def read_large_file_generator(filename):
    with open(filename) as f:
        for line in f:  # Reads one line at a time
            yield line.strip()

# Usage
for line in read_large_file_generator('large_file.txt'):
    process_line(line)  # Only one line in memory at a time
```
## Infinite Sequences

Generators can represent infinite sequences without memory issues, because values are produced only as they are requested:

```python
def primes():
    """Generate prime numbers indefinitely (trial division by known primes)."""
    yield 2
    primes_found = [2]
    candidate = 3
    while True:
        is_prime = True
        for prime in primes_found:
            if prime * prime > candidate:
                break  # No divisor up to sqrt(candidate): it is prime
            if candidate % prime == 0:
                is_prime = False
                break
        if is_prime:
            primes_found.append(candidate)
            yield candidate
        candidate += 2  # Skip even numbers

# Generate the first 10 primes
prime_gen = primes()
for _ in range(10):
    print(next(prime_gen), end=" ")  # 2 3 5 7 11 13 17 19 23 29
```
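Instead of a manual `range` loop, `itertools.islice` can pull a finite prefix from any infinite generator. A sketch using a hypothetical `squares()` stream so the example is self-contained:

```python
import itertools

def squares():
    """Infinite stream of perfect squares."""
    n = 0
    while True:
        yield n * n
        n += 1

# Take just the first five values from the endless stream
first_five = list(itertools.islice(squares(), 5))
print(first_five)  # [0, 1, 4, 9, 16]
```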
## Real-World Applications

### Data Pipeline Processing

```python
def read_csv(filename):
    """Read a CSV file line by line."""
    with open(filename) as f:
        next(f)  # Skip the header row
        for line in f:
            yield line.strip().split(',')

def filter_valid_data(rows):
    """Filter out invalid rows."""
    for row in rows:
        if len(row) == 3 and all(field for field in row):
            yield row

def convert_to_dict(rows):
    """Convert rows to dictionaries."""
    for row in rows:
        yield {
            'name': row[0],
            'age': int(row[1]),
            'city': row[2]
        }

# Chain the stages into a pipeline; rows stream through one at a time
data_pipeline = convert_to_dict(
    filter_valid_data(
        read_csv('data.csv')
    )
)

for person in data_pipeline:
    print(person)
```
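Because every stage is a generator, nothing executes until the final loop asks for a value; each row then flows through the whole chain one at a time. A self-contained sketch of the same pattern over in-memory rows (the stage names and sample data are invented for illustration):

```python
def parse(lines):
    for line in lines:
        yield line.strip().split(',')

def valid(rows):
    for row in rows:
        if len(row) == 3 and all(row):
            yield row

def to_dict(rows):
    for name, age, city in rows:
        yield {'name': name, 'age': int(age), 'city': city}

raw = ['Ada,36,London', 'broken line', 'Alan,41,Manchester']
result = list(to_dict(valid(parse(raw))))
print(result)
# [{'name': 'Ada', 'age': 36, 'city': 'London'},
#  {'name': 'Alan', 'age': 41, 'city': 'Manchester'}]
```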
### Web Scraping with Rate Limiting

```python
import time
import requests

def scrape_pages(urls, delay=1):
    """Scrape web pages with rate limiting."""
    for url in urls:
        response = requests.get(url)
        yield response.text
        time.sleep(delay)  # Runs only when the next page is requested

# Usage
urls = [f"https://api.example.com/page/{i}" for i in range(1, 11)]
for page_content in scrape_pages(urls, delay=0.5):
    process_page(page_content)
```
### Database Query Results

```python
def query_large_table(db_connection, batch_size=1000):
    """Query a large database table in batches."""
    offset = 0
    while True:
        results = db_connection.execute(
            "SELECT * FROM large_table LIMIT ? OFFSET ?",
            (batch_size, offset)
        ).fetchall()
        if not results:
            break
        for row in results:
            yield row
        offset += batch_size

# Process millions of rows without loading them all into memory
for row in query_large_table(db_conn):
    process_row(row)
```
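A runnable version of this batching pattern, using an in-memory SQLite database (the `items` table and its contents are made up for the demo; an `ORDER BY` is added because OFFSET pagination needs a stable ordering):

```python
import sqlite3

def query_in_batches(conn, batch_size=2):
    """Yield rows from the demo table, fetching batch_size at a time."""
    offset = 0
    while True:
        rows = conn.execute(
            "SELECT * FROM items ORDER BY id LIMIT ? OFFSET ?",
            (batch_size, offset)
        ).fetchall()
        if not rows:
            break
        yield from rows  # Flatten the batch into single rows
        offset += batch_size

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO items VALUES (?, ?)",
                 [(i, f"item{i}") for i in range(5)])

for row in query_in_batches(conn):
    print(row)  # (0, 'item0') through (4, 'item4')
```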
## Generator Methods

Beyond `__next__`, generators have extra methods: `send()` passes a value into the generator, `throw()` raises an exception inside it, and `close()` shuts it down:

```python
def counter():
    count = 0
    while True:
        received = yield count  # send() makes this expression non-None
        if received is not None:
            count = received
        else:
            count += 1

gen = counter()
print(next(gen))  # 0
print(next(gen))  # 1
print(next(gen))  # 2

# Send a value into the paused yield expression
print(gen.send(10))  # Resets count to 10 and yields it

print(next(gen))  # 11

# Close the generator
gen.close()
# next(gen) would now raise StopIteration
```
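`throw()` raises an exception at the paused `yield`; the generator can catch it and keep running. A sketch with a hypothetical counter that resets instead of dying:

```python
def resilient_counter():
    count = 0
    while True:
        try:
            yield count
            count += 1
        except ValueError:
            count = 0  # Reset on demand instead of propagating the error

gen = resilient_counter()
print(next(gen))              # 0
print(next(gen))              # 1
print(gen.throw(ValueError))  # Caught inside; the generator yields 0 again
print(next(gen))              # 1
```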
### Generator with Cleanup

```python
def file_processor(filename):
    try:
        with open(filename) as f:
            for line in f:
                value = yield line.strip()
                if value == 'quit':
                    return  # Early termination; finally still runs
    finally:
        print(f"Cleaning up {filename}")

# Drive the generator manually: mixing send() into a for loop would
# advance the generator twice per pass and skip lines
processor = file_processor('data.txt')
try:
    line = next(processor)
    while True:
        print(f"Processing: {line}")
        if some_condition:               # Placeholder for your stop condition
            processor.send('quit')       # Triggers return, then StopIteration
        else:
            line = next(processor)
except StopIteration:
    pass
```
## itertools Module

The `itertools` module provides powerful tools for working with iterators:

```python
import itertools

# Infinite iterators
counter = itertools.count(start=10, step=2)  # 10, 12, 14, ...
cycle = itertools.cycle(['A', 'B', 'C'])     # A, B, C, A, B, C, ...
repeat = itertools.repeat('X', 5)            # X, X, X, X, X

# Combinatorics
combinations = itertools.combinations('ABC', 2)  # AB, AC, BC
permutations = itertools.permutations('ABC', 2)  # AB, AC, BA, BC, CA, CB

# Grouping and filtering
grouped = itertools.groupby([1, 1, 2, 2, 3, 3, 3])  # Groups consecutive equal elements
filtered = itertools.filterfalse(lambda x: x < 5, [1, 6, 3, 8, 2])  # 6, 8

# Chaining
combined = itertools.chain([1, 2, 3], ['a', 'b', 'c'])  # 1, 2, 3, 'a', 'b', 'c'
```
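`groupby` deserves a closer look: it yields `(key, group)` pairs where each group is itself an iterator, and it only groups *consecutive* equal elements (sort first if you need global grouping). A quick sketch of consuming it:

```python
import itertools

data = [1, 1, 2, 2, 3, 3, 3]
for key, group in itertools.groupby(data):
    print(key, list(group))
# 1 [1, 1]
# 2 [2, 2]
# 3 [3, 3, 3]
```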
## Performance Comparison
| Approach | Memory Usage | Speed | Use Case |
|---|---|---|---|
| List | High (all data) | Fast access | Small datasets, random access |
| Generator | Low (one item) | Fast generation | Large datasets, sequential access |
| Iterator | Low (one item) | Moderate | Custom iteration logic |
## Best Practices

### 1. Use Generators for Large Data

```python
# Good: stream a large file one line at a time
def process_large_file(filename):
    with open(filename) as f:
        for line in f:
            yield process_line(line)

# Avoid: loading everything at once (and never closing the file explicitly)
# bad = [process_line(line) for line in open(filename)]
```
### 2. Generator Functions vs Expressions

```python
# Use a generator function for complex logic
def complex_generator():
    # Multiple yield statements
    # Complex setup/cleanup
    pass

# Use an expression for simple transformations
simple_gen = (x**2 for x in range(1000))
```
### 3. Handle Generator Exhaustion

```python
def safe_next(iterator, default=None):
    try:
        return next(iterator)
    except StopIteration:
        return default

gen = (x for x in range(3))
print(safe_next(gen))  # 0
print(safe_next(gen))  # 1
print(safe_next(gen))  # 2
print(safe_next(gen))  # None (default)
```
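Worth knowing: the built-in `next()` already accepts a default as its second argument, so a wrapper like `safe_next` is often unnecessary:

```python
gen = (x for x in range(3))
print(next(gen, None))  # 0
print(next(gen, None))  # 1
print(next(gen, None))  # 2
print(next(gen, None))  # None (exhausted, default returned instead of raising)
```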
## Key Points to Remember

- Generators use `yield` to produce values on demand, saving memory
- Iterables can be iterated over; iterators perform the iteration
- Generator expressions are memory-efficient alternatives to list comprehensions
- Custom iterators implement the `__iter__` and `__next__` methods
- Generators enable lazy evaluation and infinite sequences
- Use generators for large datasets and streaming data
- `itertools` provides powerful iterator manipulation tools
Generators are powerful for memory-efficient programming, but there are advanced patterns that make them even more useful. In the next lesson, we'll explore advanced generator patterns including generator delegation, context management, and complex data processing pipelines.
