# Generators and Iterators

## Learning Objectives

- Understand the difference between iterators and iterables
- Create and use generator functions and expressions
- Implement custom iterators
- Apply generators for memory-efficient programming

## Introduction
Imagine you're a chef preparing a large banquet. Traditionally, you'd cook all the dishes at once, requiring massive amounts of stove space, ingredients, and cleanup. But what if you could cook each dish only when a guest is ready to eat it?
This is the power of generators: they produce values on demand, using minimal memory and processing power. Instead of creating all the data up front, a generator creates values one at a time, only when they are needed. This "lazy evaluation" approach is fundamental to efficient Python programming.
## Iterables vs Iterators
Before understanding generators, we need to distinguish between iterables and iterators.
### Iterables

An iterable is any object that can be looped over. It has an `__iter__` method that returns an iterator:

```python
# Lists are iterables
numbers = [1, 2, 3, 4, 5]
for num in numbers:
    print(num)

# Strings are iterables
for char in "hello":
    print(char)

# Dictionaries are iterables (iterating yields the keys)
for key in {"a": 1, "b": 2}:
    print(key)
```
### Iterators

An iterator is the object that performs the actual iteration. It has both `__iter__` and `__next__` methods:

```python
# Get an iterator from an iterable
numbers = [1, 2, 3, 4, 5]
iterator = iter(numbers)  # Calls numbers.__iter__()

print(next(iterator))  # 1
print(next(iterator))  # 2
print(next(iterator))  # 3

# Manual iteration over the remaining items
try:
    while True:
        print(next(iterator))  # 4, then 5
except StopIteration:
    print("Iteration complete")
```
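This protocol is exactly what a `for` loop uses internally: it calls `iter()` once, then calls `next()` repeatedly until `StopIteration` is raised. A minimal sketch of that desugaring (the helper name `manual_for` is made up for illustration):

```python
# Roughly what "for item in iterable: action(item)" expands to
def manual_for(iterable, action):
    iterator = iter(iterable)      # __iter__ is called once
    while True:
        try:
            item = next(iterator)  # __next__ on every pass
        except StopIteration:
            break                  # the loop ends silently
        action(item)

collected = []
manual_for([1, 2, 3], collected.append)
print(collected)  # [1, 2, 3]
```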
## Generator Functions

Generator functions use the `yield` keyword instead of `return`. Each `yield` pauses execution, and the function resumes from that exact point on the next request:

```python
def countdown(n):
    """Generator that counts down from n."""
    while n > 0:
        yield n
        n -= 1

# Calling the function returns a generator; no code runs yet
counter = countdown(5)
print(next(counter))  # 5
print(next(counter))  # 4
print(next(counter))  # 3

# A for loop picks up where next() left off
for num in counter:
    print(num)  # 2, 1
```
## How Generators Work

```python
def simple_generator():
    print("Starting")
    yield 1
    print("After first yield")
    yield 2
    print("After second yield")
    yield 3
    print("Done")

gen = simple_generator()
print(next(gen))  # "Starting", then 1
print(next(gen))  # "After first yield", then 2
print(next(gen))  # "After second yield", then 3
# next(gen) would print "Done", then raise StopIteration
```
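You can observe these pause points directly with `inspect.getgeneratorstate`, which reports whether a generator has not started, is suspended at a `yield`, or has finished:

```python
import inspect

def simple_generator():
    yield 1
    yield 2

gen = simple_generator()
print(inspect.getgeneratorstate(gen))  # GEN_CREATED: no code has run yet
next(gen)
print(inspect.getgeneratorstate(gen))  # GEN_SUSPENDED: paused at a yield
for _ in gen:
    pass                               # exhaust the generator
print(inspect.getgeneratorstate(gen))  # GEN_CLOSED: finished
```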
## Generator Expressions

Generator expressions look like list comprehensions but use parentheses and create generators instead of lists:

```python
# List comprehension (creates the full list immediately)
squares_list = [x**2 for x in range(10)]  # [0, 1, 4, 9, 16, ...]

# Generator expression (creates a generator object)
squares_gen = (x**2 for x in range(10))   # <generator object ...>

# Memory comparison (exact sizes vary by Python version)
import sys
print(sys.getsizeof(squares_list))  # ~200 bytes, grows with the data
print(sys.getsizeof(squares_gen))   # ~100 bytes, constant regardless of range size

# Usage
for square in squares_gen:
    print(square)
```
### Conditional Generator Expressions

```python
# Even numbers
evens = (x for x in range(20) if x % 2 == 0)

# Filtered and transformed
processed = (x.upper() for x in ['hello', 'world', 'python'] if len(x) > 4)
```
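Generator expressions shine when passed straight to an aggregating function: no intermediate list is ever built, and the parentheses can be dropped when the expression is the sole argument:

```python
# Sum of squares without materializing a list
total = sum(x**2 for x in range(10))
print(total)  # 285

# any() short-circuits, so the generator stops as soon as a match is found
has_big = any(x > 5 for x in range(1_000_000))
print(has_big)  # True, after examining only the first few values
```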
## Custom Iterators

You can create custom iterator classes by implementing `__iter__` and `__next__`:

```python
class FibonacciIterator:
    def __init__(self, max_value=None):
        self.a, self.b = 0, 1
        self.max_value = max_value

    def __iter__(self):
        return self

    def __next__(self):
        # Compare against None so max_value=0 still works
        if self.max_value is not None and self.a > self.max_value:
            raise StopIteration
        result = self.a
        self.a, self.b = self.b, self.a + self.b
        return result

# Usage
fib = FibonacciIterator(max_value=100)
for num in fib:
    print(num, end=" ")  # 0 1 1 2 3 5 8 13 21 34 55 89
```
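The same sequence takes far less code as a generator function, because the `yield` machinery supplies `__iter__` and `__next__` for free; a sketch for comparison:

```python
def fibonacci(max_value=None):
    """Yield Fibonacci numbers, optionally stopping at max_value."""
    a, b = 0, 1
    while max_value is None or a <= max_value:
        yield a
        a, b = b, a + b

print(list(fibonacci(max_value=100)))
# [0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89]
```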
## Memory Efficiency

Generators are memory-efficient because they never hold the whole dataset in memory at once:

```python
# Memory-intensive approach
def read_large_file_traditional(filename):
    with open(filename) as f:
        return f.readlines()  # Loads the entire file into memory

# Memory-efficient approach
def read_large_file_generator(filename):
    with open(filename) as f:
        for line in f:  # Reads one line at a time
            yield line.strip()

# Usage
for line in read_large_file_generator('large_file.txt'):
    process_line(line)  # Only one line in memory at a time
```
## Infinite Sequences

Generators can represent infinite sequences without memory issues, because values are produced only as they are requested:

```python
def primes():
    """Generate prime numbers indefinitely (trial division by known primes)."""
    yield 2
    primes_found = [2]
    candidate = 3
    while True:
        is_prime = True
        for prime in primes_found:
            if prime * prime > candidate:
                break  # No divisor up to sqrt(candidate): it is prime
            if candidate % prime == 0:
                is_prime = False
                break
        if is_prime:
            primes_found.append(candidate)
            yield candidate
        candidate += 2  # Skip even numbers

# Generate the first 10 primes
prime_gen = primes()
for _ in range(10):
    print(next(prime_gen), end=" ")  # 2 3 5 7 11 13 17 19 23 29
```
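Instead of a manual `range` loop, `itertools.islice` can pull a finite prefix from any infinite generator. A sketch using a hypothetical `squares()` stream so the example is self-contained:

```python
import itertools

def squares():
    """Infinite stream of perfect squares."""
    n = 0
    while True:
        yield n * n
        n += 1

# Take just the first five values from the endless stream
first_five = list(itertools.islice(squares(), 5))
print(first_five)  # [0, 1, 4, 9, 16]
```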
## Real-World Applications

### Data Pipeline Processing

```python
def read_csv(filename):
    """Read a CSV file line by line."""
    with open(filename) as f:
        next(f)  # Skip the header row
        for line in f:
            yield line.strip().split(',')

def filter_valid_data(rows):
    """Filter out invalid rows."""
    for row in rows:
        if len(row) == 3 and all(field for field in row):
            yield row

def convert_to_dict(rows):
    """Convert rows to dictionaries."""
    for row in rows:
        yield {
            'name': row[0],
            'age': int(row[1]),
            'city': row[2]
        }

# Chain the stages into a pipeline; rows stream through one at a time
data_pipeline = convert_to_dict(
    filter_valid_data(
        read_csv('data.csv')
    )
)

for person in data_pipeline:
    print(person)
```
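Because every stage is a generator, nothing executes until the final loop asks for a value; each row then flows through the whole chain one at a time. A self-contained sketch of the same pattern over in-memory rows (the stage names and sample data are invented for illustration):

```python
def parse(lines):
    for line in lines:
        yield line.strip().split(',')

def valid(rows):
    for row in rows:
        if len(row) == 3 and all(row):
            yield row

def to_dict(rows):
    for name, age, city in rows:
        yield {'name': name, 'age': int(age), 'city': city}

raw = ['Ada,36,London', 'broken line', 'Alan,41,Manchester']
result = list(to_dict(valid(parse(raw))))
print(result)
# [{'name': 'Ada', 'age': 36, 'city': 'London'},
#  {'name': 'Alan', 'age': 41, 'city': 'Manchester'}]
```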
### Web Scraping with Rate Limiting

```python
import time
import requests

def scrape_pages(urls, delay=1):
    """Scrape web pages with rate limiting."""
    for url in urls:
        response = requests.get(url)
        yield response.text
        time.sleep(delay)  # Runs only when the next page is requested

# Usage
urls = [f"https://api.example.com/page/{i}" for i in range(1, 11)]
for page_content in scrape_pages(urls, delay=0.5):
    process_page(page_content)
```
### Database Query Results

```python
def query_large_table(db_connection, batch_size=1000):
    """Query a large database table in batches."""
    offset = 0
    while True:
        results = db_connection.execute(
            "SELECT * FROM large_table LIMIT ? OFFSET ?",
            (batch_size, offset)
        ).fetchall()
        if not results:
            break
        for row in results:
            yield row
        offset += batch_size

# Process millions of rows without loading them all into memory
for row in query_large_table(db_conn):
    process_row(row)
```
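A runnable version of this batching pattern, using an in-memory SQLite database (the `items` table and its contents are made up for the demo; an `ORDER BY` is added because OFFSET pagination needs a stable ordering):

```python
import sqlite3

def query_in_batches(conn, batch_size=2):
    """Yield rows from the demo table, fetching batch_size at a time."""
    offset = 0
    while True:
        rows = conn.execute(
            "SELECT * FROM items ORDER BY id LIMIT ? OFFSET ?",
            (batch_size, offset)
        ).fetchall()
        if not rows:
            break
        yield from rows  # Flatten the batch into single rows
        offset += batch_size

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO items VALUES (?, ?)",
                 [(i, f"item{i}") for i in range(5)])

for row in query_in_batches(conn):
    print(row)  # (0, 'item0') through (4, 'item4')
```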
## Generator Methods

Beyond `__next__`, generators have extra methods: `send()` passes a value into the generator, `throw()` raises an exception inside it, and `close()` shuts it down:

```python
def counter():
    count = 0
    while True:
        received = yield count  # send() makes this expression non-None
        if received is not None:
            count = received
        else:
            count += 1

gen = counter()
print(next(gen))  # 0
print(next(gen))  # 1
print(next(gen))  # 2

# Send a value into the paused yield expression
print(gen.send(10))  # Resets count to 10 and yields it

print(next(gen))  # 11

# Close the generator
gen.close()
# next(gen) would now raise StopIteration
```
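`throw()` raises an exception at the paused `yield`; the generator can catch it and keep running. A sketch with a hypothetical counter that resets instead of dying:

```python
def resilient_counter():
    count = 0
    while True:
        try:
            yield count
            count += 1
        except ValueError:
            count = 0  # Reset on demand instead of propagating the error

gen = resilient_counter()
print(next(gen))              # 0
print(next(gen))              # 1
print(gen.throw(ValueError))  # Caught inside; the generator yields 0 again
print(next(gen))              # 1
```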
### Generator with Cleanup

```python
def file_processor(filename):
    try:
        with open(filename) as f:
            for line in f:
                value = yield line.strip()
                if value == 'quit':
                    return  # Early termination; finally still runs
    finally:
        print(f"Cleaning up {filename}")

# Drive the generator manually: mixing send() into a for loop would
# advance the generator twice per pass and skip lines
processor = file_processor('data.txt')
try:
    line = next(processor)
    while True:
        print(f"Processing: {line}")
        if some_condition:               # Placeholder for your stop condition
            processor.send('quit')       # Triggers return, then StopIteration
        else:
            line = next(processor)
except StopIteration:
    pass
```
## itertools Module

The `itertools` module provides powerful tools for working with iterators:

```python
import itertools

# Infinite iterators
counter = itertools.count(start=10, step=2)  # 10, 12, 14, ...
cycle = itertools.cycle(['A', 'B', 'C'])     # A, B, C, A, B, C, ...
repeat = itertools.repeat('X', 5)            # X, X, X, X, X

# Combinatorics
combinations = itertools.combinations('ABC', 2)  # AB, AC, BC
permutations = itertools.permutations('ABC', 2)  # AB, AC, BA, BC, CA, CB

# Grouping and filtering
grouped = itertools.groupby([1, 1, 2, 2, 3, 3, 3])  # Groups consecutive equal elements
filtered = itertools.filterfalse(lambda x: x < 5, [1, 6, 3, 8, 2])  # 6, 8

# Chaining
combined = itertools.chain([1, 2, 3], ['a', 'b', 'c'])  # 1, 2, 3, 'a', 'b', 'c'
```
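`groupby` deserves a closer look: it yields `(key, group)` pairs where each group is itself an iterator, and it only groups *consecutive* equal elements (sort first if you need global grouping). A quick sketch of consuming it:

```python
import itertools

data = [1, 1, 2, 2, 3, 3, 3]
for key, group in itertools.groupby(data):
    print(key, list(group))
# 1 [1, 1]
# 2 [2, 2]
# 3 [3, 3, 3]
```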
## Performance Comparison
| Approach | Memory Usage | Speed | Use Case |
|---|---|---|---|
| List | High (all data) | Fast access | Small datasets, random access |
| Generator | Low (one item) | Fast generation | Large datasets, sequential access |
| Iterator | Low (one item) | Moderate | Custom iteration logic |
## Best Practices

### 1. Use Generators for Large Data

```python
# Good: stream a large file one line at a time
def process_large_file(filename):
    with open(filename) as f:
        for line in f:
            yield process_line(line)

# Avoid: loading everything at once (and never closing the file explicitly)
# bad = [process_line(line) for line in open(filename)]
```
### 2. Generator Functions vs Expressions

```python
# Use a generator function for complex logic
def complex_generator():
    # Multiple yield statements
    # Complex setup/cleanup
    pass

# Use an expression for simple transformations
simple_gen = (x**2 for x in range(1000))
```
### 3. Handle Generator Exhaustion

```python
def safe_next(iterator, default=None):
    try:
        return next(iterator)
    except StopIteration:
        return default

gen = (x for x in range(3))
print(safe_next(gen))  # 0
print(safe_next(gen))  # 1
print(safe_next(gen))  # 2
print(safe_next(gen))  # None (default)
```
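Worth knowing: the built-in `next()` already accepts a default as its second argument, so a wrapper like `safe_next` is often unnecessary:

```python
gen = (x for x in range(3))
print(next(gen, None))  # 0
print(next(gen, None))  # 1
print(next(gen, None))  # 2
print(next(gen, None))  # None (exhausted, default returned instead of raising)
```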
## Key Points to Remember

- Generators use `yield` to produce values on demand, saving memory
- Iterables can be iterated over; iterators perform the iteration
- Generator expressions are memory-efficient alternatives to list comprehensions
- Custom iterators implement the `__iter__` and `__next__` methods
- Generators enable lazy evaluation and infinite sequences
- Use generators for large datasets and streaming data
- `itertools` provides powerful iterator manipulation tools
Generators are powerful for memory-efficient programming, but there are advanced patterns that make them even more useful. In the next lesson, we'll explore advanced generator patterns including generator delegation, context management, and complex data processing pipelines.
