  • Understand what regular expressions are and why they're useful
  • Learn basic regex syntax and metacharacters
  • Use Python's re module for pattern matching
  • Apply simple patterns for text searching

Introduction to Regular Expressions

Introduction

Imagine you're a librarian in a vast library with millions of books. A patron asks for all books about "cats", but they want books that mention cats as pets, wild cats, and even the musical "Cats." How do you find exactly what they need?

This is where regular expressions shine. They're like a super-powered search tool that can find patterns in text, not just exact words. A regular expression (regex) is a sequence of characters that defines a search pattern, letting you match, search, and manipulate text in very flexible ways.

In Python, we use the re module to work with regular expressions. Let's explore how this powerful tool can transform the way you work with text.


What Are Regular Expressions?

A regular expression is a sequence of characters, written in a special syntax, that describes a set of strings. You use it to check whether a string matches that description or to find the parts of a larger text that do.

Think of regex as a "mini programming language" specifically designed for text matching. Instead of writing complex loops to search through text, you write a pattern that describes what you're looking for.
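As a rough illustration of that contrast, the sketch below pulls every run of digits out of a made-up sentence twice: once with a hand-written loop and once with a single pattern. `re.findall()` is introduced properly later in this lesson. The sample text and variable names are only for illustration.

```python
import re

text = "Order 66 shipped in 3 boxes"

# Manual approach: walk the string and collect runs of digits by hand
numbers_manual = []
current = ""
for ch in text:
    if ch.isdigit():
        current += ch
    elif current:
        numbers_manual.append(current)
        current = ""
if current:
    numbers_manual.append(current)

# Regex approach: describe "one or more digits" and let re do the scanning
numbers_regex = re.findall(r"\d+", text)

print(numbers_manual)  # ['66', '3']
print(numbers_regex)   # ['66', '3']
```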

Why Use Regular Expressions?

  • Powerful text processing: Find, replace, and extract text patterns
  • Data validation: Check if input matches expected formats (emails, phone numbers, etc.)
  • Text parsing: Extract structured data from unstructured text
  • Search and replace: Complex find-and-replace operations
  • Efficiency: Process large amounts of text quickly

Basic Regex Syntax

Let's start with the fundamental building blocks of regular expressions.

Literal Characters

The simplest regex patterns are just literal characters:

import re

# Match the word "cat"
pattern = r"cat"
text = "The cat sat on the mat."

match = re.search(pattern, text)
if match:
    print("Found:", match.group())  # "cat"
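Literal patterns match character for character. The short sketch below uses an invented sentence to show two consequences. First, matching is case-sensitive by default. Second, a literal can match inside a longer word such as "concatenate". The real `re.IGNORECASE` flag relaxes the case rule.

```python
import re

text = "Cats concatenate strings; the cat naps."

# Case-sensitive by default: "Cat" in "Cats" is skipped, but "cat" inside
# "concatenate" is found because a literal does not care about word edges
case_sensitive = re.findall(r"cat", text)
print(case_sensitive)  # ['cat', 'cat']

# re.IGNORECASE lets the same literal match any mix of upper and lower case
ignore_case = re.findall(r"cat", text, flags=re.IGNORECASE)
print(ignore_case)  # ['Cat', 'cat', 'cat']
```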

Metacharacters

Metacharacters are special characters with specific meanings:

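| Metacharacter | Description | Example | Matches |
|---|---|---|---|
| `.` | Any character except newline | `c.t` | "cat", "cut", "c9t" |
| `^` | Start of string (start of each line with `re.MULTILINE`) | `^Hello` | "Hello world" (but not "world Hello") |
| `$` | End of string (end of each line with `re.MULTILINE`) | `world$` | "Hello world" (but not "world Hello") |
| `*` | Zero or more of the preceding item | `ca*t` | "ct", "cat", "caat", "caaat" |
| `+` | One or more of the preceding item | `ca+t` | "cat", "caat", "caaat" (but not "ct") |
| `?` | Zero or one of the preceding item | `ca?t` | "ct", "cat" (but not "caat") |
| `{n}` | Exactly n of the preceding item | `ca{2}t` | "caat" |
| `{n,}` | n or more of the preceding item | `ca{2,}t` | "caat", "caaat", "caaaat" |
| `{n,m}` | Between n and m of the preceding item | `ca{1,3}t` | "cat", "caat", "caaat" |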
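One way to check the quantifier rows is to run each example pattern against the same list of strings. This sketch uses `re.fullmatch()`, which succeeds only when the pattern matches the entire string, so a partial match inside a longer candidate doesn't count. The candidate list is made up.

```python
import re

candidates = ["ct", "cat", "caat", "caaat", "caaaat"]
patterns = [r"ca*t", r"ca+t", r"ca?t", r"ca{2}t", r"ca{2,}t", r"ca{1,3}t"]

results = {}
for pattern in patterns:
    # re.fullmatch() returns a match object only if the whole candidate matches
    results[pattern] = [s for s in candidates if re.fullmatch(pattern, s)]
    print(f"{pattern:10} {results[pattern]}")
```

For example, the `ca?t` line lists only `'ct'` and `'cat'`, and the `ca{1,3}t` line stops at `'caaat'`.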

Character Classes

Character classes let you match any one of a set of characters:

import re

# Match any vowel
pattern = r"[aeiou]"
text = "The quick brown fox"
matches = re.findall(pattern, text)
print(matches)  # ['e', 'u', 'i', 'o', 'o']

# Match any digit
pattern = r"[0-9]"
text = "Order #123 and #456"
matches = re.findall(pattern, text)
print(matches)  # ['1', '2', '3', '4', '5', '6']

# Match word characters (letters, digits, underscore)
pattern = r"\w+"
text = "Hello, world! 123"
matches = re.findall(pattern, text)
print(matches)  # ['Hello', 'world', '123']
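Brackets also accept ranges such as `[A-Z]`. A `^` placed first inside the brackets negates the set; outside brackets, `^` means "start of string" instead. Here is a brief sketch with invented sample text.

```python
import re

text = "Room 42B, Floor 7"

# A range: any single uppercase letter from A to Z
uppercase = re.findall(r"[A-Z]", text)
print(uppercase)  # ['R', 'B', 'F']

# A negated set: runs of characters that are neither digits nor spaces
not_digit_or_space = re.findall(r"[^0-9 ]+", text)
print(not_digit_or_space)  # ['Room', 'B,', 'Floor']
```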

Predefined Character Classes

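| Class | Description | ASCII equivalent |
|---|---|---|
| `\d` | Digit | `[0-9]` |
| `\D` | Non-digit | `[^0-9]` |
| `\w` | Word character (letter, digit, or underscore) | `[a-zA-Z0-9_]` |
| `\W` | Non-word character | `[^a-zA-Z0-9_]` |
| `\s` | Whitespace | `[ \t\n\r\f\v]` |
| `\S` | Non-whitespace | `[^ \t\n\r\f\v]` |

The bracket equivalents are exact only for ASCII text or when the re.ASCII flag is set. For ordinary Python 3 strings these classes also match Unicode characters: for example, \d matches Arabic-Indic digits and \w matches accented letters such as "é".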
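A quick way to see the Unicode behavior of `\w` is to run `\w+` on a word containing "é", with and without the `re.ASCII` flag. The sample string is made up. The accented letter is written as the escape `\u00e9`, so the example doesn't depend on how the source file is encoded.

```python
import re

text = "caf\u00e9 costs 3 euros"  # "\u00e9" is the single character é

# Default (Unicode) behavior: é counts as a word character
unicode_words = re.findall(r"\w+", text)
print(unicode_words)  # ['café', 'costs', '3', 'euros']

# With re.ASCII, \w means exactly [a-zA-Z0-9_], so é ends the word
ascii_words = re.findall(r"\w+", text, flags=re.ASCII)
print(ascii_words)  # ['caf', 'costs', '3', 'euros']
```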

Python's re Module

Python provides the re module for working with regular expressions. Here are the most important functions:

re.search() - Find First Match

import re

pattern = r"Python"
text = "I love Python programming!"

match = re.search(pattern, text)
if match:
    print("Found at position:", match.start())  # 7
    print("Match:", match.group())              # "Python"
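A match object records where the match occurred. When nothing matches, `re.search()` returns `None`, which is why the example above checks `if match:` first. Here is a small sketch with an invented order message.

```python
import re

text = "Order 42 shipped"

match = re.search(r"\d+", text)
if match:
    # start(), end() and span() report where the match sits in the string
    print(match.start(), match.end())  # 6 8
    print(match.span())                # (6, 8)

# No match: re.search() returns None instead of raising an error
missing = re.search(r"refund", text)
print(missing)  # None
```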

re.findall() - Find All Matches

pattern = r"\d+"
text = "I have 3 cats and 2 dogs."

numbers = re.findall(pattern, text)
print(numbers)  # ['3', '2']
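`re.findall()` hands back the matched text as strings, so numbers need converting before arithmetic. When nothing matches, it returns an empty list rather than `None`. This sketch reuses the sentence above.

```python
import re

text = "I have 3 cats and 2 dogs."

# findall() returns the matched text as strings, so convert before doing arithmetic
counts = [int(n) for n in re.findall(r"\d+", text)]
print(sum(counts))  # 5

# No matches gives an empty list, not None
empty = re.findall(r"\d+", "no numbers here")
print(empty)  # []
```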

re.match() - Match from Start

pattern = r"Hello"
text1 = "Hello world"
text2 = "world Hello"

match1 = re.match(pattern, text1)  # Matches
match2 = re.match(pattern, text2)  # No match (doesn't start with "Hello")
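The difference between `re.match()` and `re.search()` is easiest to see on a single string. In this sketch, `re.match()` misses a word that appears later in the string, while `re.search()` finds it. Note that `re.match()` only anchors the start, so text after the match is allowed.

```python
import re

text = "Hello world"

# re.match() only tries position 0, so "world" is not found
at_start = re.match(r"world", text)
print(at_start)  # None

# re.search() scans forward through the whole string
anywhere = re.search(r"world", text)
print(anywhere.group())  # world

# re.match() anchors only the start: "Hello" matches even though " world" follows
prefix = re.match(r"Hello", text)
print(prefix.group())  # Hello
```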

re.sub() - Replace Matches

pattern = r"old"
replacement = "new"
text = "The old man and the old dog"

new_text = re.sub(pattern, replacement, text)
print(new_text)  # "The new man and the new dog"
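`re.sub()` also takes an optional `count` argument that limits how many replacements are made. The pattern can also be any regex rather than a fixed word. This sketch shows both; the messy string is invented.

```python
import re

text = "The old man and the old dog"

# count limits the number of replacements (the default, 0, means "replace all")
first_only = re.sub(r"old", "new", text, count=1)
print(first_only)  # The new man and the old dog

# A pattern as the target: collapse any run of whitespace into one space
messy = "too   many \t spaces"
collapsed = re.sub(r"\s+", " ", messy)
print(collapsed)  # too many spaces
```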

re.split() - Split by Pattern

pattern = r"\s+"
text = "Split   this    text"

parts = re.split(pattern, text)
print(parts)  # ['Split', 'this', 'text']
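Because the separator is a pattern, one `re.split()` call can handle several different delimiters. The optional `maxsplit` argument caps the number of splits. Here is a sketch with made-up inputs.

```python
import re

# A character class as separator: split on a comma or semicolon plus optional spaces
fruits = re.split(r"[,;]\s*", "apples, pears;plums")
print(fruits)  # ['apples', 'pears', 'plums']

# maxsplit=1 stops after the first split and keeps the rest intact
key, rest = re.split(r"\s+", "key value with spaces", maxsplit=1)
print(key, "|", rest)  # key | value with spaces
```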

Compiling Regular Expressions

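When you use the same pattern several times, you can compile it once with re.compile() and reuse the resulting pattern object. Python also caches recently used patterns internally, so the speedup is often modest. The main benefits are clarity and a reusable, named pattern: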

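import re

# Compile once, use many times.
# Inside [...] a "|" is a literal character, so the top-level domain is [A-Za-z], not [A-Z|a-z]
email_pattern = re.compile(r'[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}')

emails = [
    "user@example.com",
    "invalid-email",
    "another@test.org"
]

for email in emails:
    # fullmatch() rejects strings with extra text before or after the address
    if email_pattern.fullmatch(email):
        print(f"Valid: {email}")
    else:
        print(f"Invalid: {email}")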
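Compiled pattern objects expose the same operations as the module-level functions (`search()`, `match()`, `findall()`, `sub()`, `split()`). Flags such as `re.IGNORECASE` are fixed once at compile time. This sketch also uses `\b`, the word-boundary anchor, so that "cat" inside "concatenate" or "catalog" is not matched. The sample text is invented.

```python
import re

# Flags given at compile time apply to every later call on the pattern object
word_cat = re.compile(r"\bcat\b", re.IGNORECASE)

text = "Cat, cat, CAT and concatenate"

found = word_cat.findall(text)
print(found)  # ['Cat', 'cat', 'CAT']

replaced = word_cat.sub("dog", text)
print(replaced)  # dog, dog, dog and concatenate

# "cat" followed by more letters is not a whole word, so there is no match
print(bool(word_cat.search("a catalog")))  # False
```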

Practical Examples

Email Validation

import re

def is_valid_email(email):
    pattern = r'^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$'
    return bool(re.match(pattern, email))

emails = ["user@example.com", "invalid@", "test.email@domain.co.uk"]
for email in emails:
    print(f"{email}: {'Valid' if is_valid_email(email) else 'Invalid'}")
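There is one subtlety in the function above. Without the `re.MULTILINE` flag, `$` matches at the very end of the string or just before a final newline, so an input ending in "\n" still passes. This sketch compares that behavior with `re.fullmatch()`, which needs no `^`/`$` anchors. `is_valid_email_strict` is a hypothetical name. Either way, a pattern like this checks only the shape of an address, not whether it exists.

```python
import re

loose = r'^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$'

def is_valid_email_strict(email):
    # Hypothetical helper: fullmatch() must consume the entire string,
    # so a trailing newline makes it fail
    pattern = r'[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}'
    return re.fullmatch(pattern, email) is not None

# $ also matches just before a final "\n", so the loose check passes
print(bool(re.match(loose, "user@example.com\n")))  # True
print(is_valid_email_strict("user@example.com\n"))  # False
print(is_valid_email_strict("user@example.com"))    # True
```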

Phone Number Extraction

def extract_phone_numbers(text):
    # Match 10-digit numbers written as 555-123-4567, 555.987.6543 or 5551234567
    pattern = r'\b\d{3}[-.]?\d{3}[-.]?\d{4}\b'
    return re.findall(pattern, text)

text = "Call me at 555-123-4567 or 555.987.6543"
phones = extract_phone_numbers(text)
print("Found phones:", phones)
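Since the matches come back in different formats, a common follow-up is to normalize them. This sketch strips every non-digit with `\D`. Storing numbers as bare digit strings is an assumption made for illustration.

```python
import re

phones = re.findall(r'\b\d{3}[-.]?\d{3}[-.]?\d{4}\b',
                    "Call me at 555-123-4567 or 555.987.6543")

# \D matches any non-digit, so deleting those characters leaves only digits
normalized = [re.sub(r"\D", "", p) for p in phones]
print(normalized)  # ['5551234567', '5559876543']
```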

URL Extraction

def extract_urls(text):
    pattern = r'https?://(?:[-\w.])+(?:[:\d]+)?(?:/(?:[\w/_.])*(?:\?(?:[\w&=%.])*)?(?:#(?:\w*))*)?'
    return re.findall(pattern, text)

text = "Visit https://www.example.com and http://test.org/page?q=search"
urls = extract_urls(text)
print("Found URLs:", urls)
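The pattern above is precise but hard to read. A looser alternative, shown here as a trade-off rather than a replacement, grabs everything from `http://` or `https://` up to the next whitespace and then trims trailing punctuation. The set of punctuation characters stripped is an assumption.

```python
import re

text = "Docs: https://example.com/guide, and http://test.org/page?q=search."

# Loose pattern: the scheme, then every non-whitespace character that follows
raw = re.findall(r"https?://\S+", text)
print(raw)  # ['https://example.com/guide,', 'http://test.org/page?q=search.']

# \S+ also swallows sentence punctuation, so strip it from the end of each URL
urls = [u.rstrip(".,;:!?") for u in raw]
print(urls)  # ['https://example.com/guide', 'http://test.org/page?q=search']
```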

Common Patterns and Best Practices

Escaping Special Characters

When you want to match a special character literally, escape it with a backslash:

# Match literal dot
pattern = r"\."
text = "file.txt and file.py"
matches = re.findall(pattern, text)
print(matches)  # ['.', '.']
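When the literal text comes from a variable, such as a filename typed by a user, escaping by hand is error-prone. The standard library's `re.escape()` adds the backslashes for you. The filename below is hypothetical.

```python
import re

filename = "report(v2).txt"  # hypothetical input containing ( ) and .

# re.escape() backslash-escapes every character that is special in a regex
pattern = re.escape(filename)
print(pattern)  # report\(v2\)\.txt

text = "Attached: report(v2).txt and reportXv2Ytxt"
found = re.findall(pattern, text)
print(found)  # ['report(v2).txt']
```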

Raw Strings

Always use raw strings (r"pattern") for regex patterns. A raw string passes backslashes through to the re module unchanged, so you don't have to double them:

# Good
pattern = r"\d+\.\d+"

# Avoid (hard to read)
pattern = "\\d+\\.\\d+"
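Here is a minimal sketch of the most common pitfall. In a normal string literal, Python turns `\b` into a backspace character before `re` ever sees it, so the pattern silently matches nothing.

```python
import re

text = "cat concatenate"

# Normal string: "\b" is the backspace character \x08, which never appears in text
without_raw = re.findall("\bcat\b", text)
print(without_raw)  # []

# Raw string: re receives backslash-b and treats it as a word boundary
with_raw = re.findall(r"\bcat\b", text)
print(with_raw)  # ['cat']
```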

Greedy vs Non-Greedy Matching

By default, quantifiers are greedy (match as much as possible):

text = "<div>content</div><div>more content</div>"

# Greedy (matches everything between first <div> and last </div>)
greedy = re.search(r'<div>.*</div>', text)
print(greedy.group())  # "<div>content</div><div>more content</div>"

# Non-greedy (matches as little as possible)
nongreedy = re.search(r'<div>.*?</div>', text)
print(nongreedy.group())  # "<div>content</div>"
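With the non-greedy version, `re.findall()` can pull out each `<div>` block separately. Here is a sketch using the same string.

```python
import re

text = "<div>content</div><div>more content</div>"

# .*? stops at the first </div> after each <div>, so each block is its own match
blocks = re.findall(r"<div>.*?</div>", text)
print(blocks)  # ['<div>content</div>', '<div>more content</div>']
```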

Performance Considerations

  • Compile patterns when used multiple times
  • Use specific patterns rather than broad ones
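  • Avoid nested quantifiers such as (a+)+, which can cause catastrophic backtracking (exponential running time) on input that almost matches
  • Use plain string operations (the in operator, str.startswith(), str.replace()) for simple fixed-text searches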
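For the last point, here is a sketch showing that plain string operations and the regex functions agree when the text you are looking for contains no metacharacters. The `in` operator, `str.startswith()` and `str.replace()` give the same results without needing a pattern. The log line is invented.

```python
import re

text = "error: disk full"

# Substring test: the in operator vs re.search()
print("disk" in text, bool(re.search(r"disk", text)))  # True True

# Prefix test: str.startswith() vs re.match()
print(text.startswith("error"), bool(re.match(r"error", text)))  # True True

# Fixed replacement: str.replace() vs re.sub()
plain = text.replace("full", "ok")
regex = re.sub(r"full", "ok", text)
print(plain == regex, plain)  # True error: disk ok
```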

Key Points to Remember

  • Regular expressions are powerful patterns for text matching and manipulation
  • Use metacharacters like ., *, +, ? for flexible matching
  • Character classes [abc] and predefined classes \d, \w help match specific types of characters
  • Python's re module provides search(), findall(), match(), sub(), and split()
  • Always use raw strings for regex patterns
  • Consider performance and use compiled patterns for repeated use

Now that you understand the basics of regular expressions, we'll dive deeper into pattern matching with groups, which allow you to extract specific parts of matched text and create more complex patterns. This will give you the power to parse structured data from unstructured text.

© 2026 forEach. All rights reserved.
