  • Understand regex syntax and common metacharacters
  • Use Pattern and Matcher classes for searching and extraction
  • Apply regex for validation and text processing tasks

Regular Expressions in Java

You need to validate email addresses, extract phone numbers from documents, or find URLs in web pages. Writing character-by-character checks is tedious and error-prone. Enter regular expressions.

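A regular expression (regex) is a compact notation for describing text patterns, letting you express complex searches in a single line. The same core syntax works in Java, Python, JavaScript, many text editors, and command-line tools such as grep, although each implementation adds its own extensions.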

What is a Regular Expression?

A regex is a sequence of characters that defines a search pattern. Think of it as "find" with far more expressive power: a simple search locates exact text, while a regex can describe patterns such as "any word starting with 'J' and ending with 'a'" or "exactly 10 digits."

Example: \d{3}-\d{4} matches phone numbers like "555-1234": three digits, a hyphen, then four digits.
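As a minimal sketch, here is that phone pattern used from Java. Inside a Java string literal each backslash is doubled, so \d becomes "\\d". The class and method names (PhoneFinder, containsPhone) are hypothetical and chosen only for illustration.

```java
import java.util.regex.Pattern;

// Hypothetical helper class, named for illustration only.
final class PhoneFinder {
    // The regex \d{3}-\d{4}, written as a Java string literal (backslashes doubled).
    private static final Pattern PHONE = Pattern.compile("\\d{3}-\\d{4}");

    // Returns true if a phone-like sequence appears anywhere in the text.
    static boolean containsPhone(String text) {
        return PHONE.matcher(text).find();
    }

    public static void main(String[] args) {
        System.out.println(containsPhone("Call 555-1234 today")); // true
        System.out.println(containsPhone("Call 55-1234 today"));  // false
    }
}
```

The second input fails because only two digits precede the hyphen.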

The Two Main Classes: Pattern and Matcher

Java provides regex support through two main classes in the java.util.regex package:

Pattern - Represents a compiled regular expression. You compile your regex string into a Pattern object once, then use it multiple times.

Matcher - The engine that performs matching operations against a string using a Pattern.

import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Step 1: Compile the pattern
Pattern pattern = Pattern.compile("hello");

// Step 2: Create a matcher for your input
Matcher matcher = pattern.matcher("hello world");

// Step 3: Perform operations
boolean found = matcher.find();  // true
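The Pattern Javadoc describes Pattern instances as immutable and safe for use by multiple concurrent threads, while Matcher instances are not, because a Matcher holds the state of an in-progress match. A common arrangement, sketched below, is to keep the compiled Pattern in a constant and create a fresh Matcher per input. GreetingCounter and countContainingHello are hypothetical names; List.of assumes Java 9 or later.

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical example class.
final class GreetingCounter {
    // Compiled once and shared; Pattern objects are immutable.
    private static final Pattern HELLO = Pattern.compile("hello");

    // Counts how many inputs contain "hello" somewhere.
    static int countContainingHello(List<String> inputs) {
        int count = 0;
        for (String input : inputs) {
            // A new Matcher per input, since a Matcher carries match state.
            Matcher matcher = HELLO.matcher(input);
            if (matcher.find()) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        List<String> inputs = List.of("hello world", "goodbye", "say hello");
        System.out.println(countContainingHello(inputs)); // 2
    }
}
```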

For simple one-time checks, you can use shorthand methods:

// Quick check if entire string matches
boolean matches = Pattern.matches("hello", "hello");  // true

// Even shorter using String's matches method
boolean quick = "hello world".matches(".*world");  // true

Building Patterns: Character Classes

The real power of regex comes from special pattern syntax. Let's start with character classes - ways to match specific sets of characters.

Literal Characters

The simplest pattern matches literal characters. The pattern cat matches the text "cat".

The Dot (.) - Any Character

The dot matches any single character except a line terminator (such as \n):

"cat".matches(".at")  // true - 'c' matches '.'
"bat".matches(".at")  // true - 'b' matches '.'
"hat".matches(".at")  // true - 'h' matches '.'
"at".matches(".at")   // false - nothing to match '.'
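The line-terminator exclusion can be lifted with the Pattern.DOTALL flag (inline form (?s)). A small sketch comparing the two modes; DotDemo and its methods are hypothetical names.

```java
import java.util.regex.Pattern;

// Hypothetical demo class.
final class DotDemo {
    // Default mode: '.' does not match line terminators such as '\n'.
    static boolean dotDefault(String input) {
        return Pattern.compile("a.b").matcher(input).matches();
    }

    // DOTALL mode: '.' matches any character, including '\n'.
    static boolean dotAll(String input) {
        return Pattern.compile("a.b", Pattern.DOTALL).matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(dotDefault("a-b"));  // true
        System.out.println(dotDefault("a\nb")); // false
        System.out.println(dotAll("a\nb"));     // true
    }
}
```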

Character Sets with Square Brackets

Square brackets define a set of characters to match:

Pattern    Meaning                          Examples
[abc]      Match a, b, or c                 "a" ✓, "d" ✗
[a-z]      Match any lowercase letter       "m" ✓, "M" ✗
[A-Z]      Match any uppercase letter       "M" ✓, "m" ✗
[0-9]      Match any digit                  "5" ✓, "x" ✗
[a-zA-Z]   Match any letter                 "a" ✓, "Z" ✓
[^abc]     Match anything EXCEPT a, b, c    "d" ✓, "a" ✗

"cat".matches("[abc]at")  // true - 'c' is in [abc]
"hat".matches("[abc]at")  // false - 'h' is not in [abc]
"hat".matches("[^abc]at") // true - 'h' is NOT a, b, or c
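Since a character set matches exactly one character, repeatedly calling find() with a set is a simple way to count occurrences. This sketch counts vowels; VowelCounter and countVowels are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical example class.
final class VowelCounter {
    // A character set: any one of the listed vowels, in either case.
    private static final Pattern VOWEL = Pattern.compile("[aeiouAEIOU]");

    static int countVowels(String text) {
        Matcher matcher = VOWEL.matcher(text);
        int count = 0;
        // Each successful find() consumes exactly one character.
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countVowels("Regular Expressions")); // 7
    }
}
```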

Predefined Character Classes

Writing [0-9] every time you want a digit is tedious. Java provides shorthand:

Shorthand  Meaning                     Equivalent
\d         Any digit                   [0-9]
\D         Any non-digit               [^0-9]
\w         Any word character          [a-zA-Z0-9_]
\W         Any non-word character      [^a-zA-Z0-9_]
\s         Any whitespace character    [ \t\n\x0B\f\r]
\S         Any non-whitespace          [^ \t\n\x0B\f\r]

Important: In Java string literals, the backslash is itself an escape character, so you must double it: write "\\d" in code to get the regex \d.

"123".matches("\\d+")     // true - one or more digits
"hello".matches("\\w+")   // true - one or more word characters
"   ".matches("\\s+")     // true - one or more whitespace
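By default these shorthands are ASCII-only, so \w does not match accented letters such as é. Java 7 added the Pattern.UNICODE_CHARACTER_CLASS flag, which makes \w, \d, and \s Unicode-aware. A sketch comparing the two; WordCharDemo and its methods are hypothetical names, and "caf\u00e9" is "café" written with a Unicode escape.

```java
import java.util.regex.Pattern;

// Hypothetical demo class.
final class WordCharDemo {
    // Default: \w means [a-zA-Z_0-9] only.
    static boolean asciiWord(String s) {
        return Pattern.compile("\\w+").matcher(s).matches();
    }

    // UNICODE_CHARACTER_CLASS: \w also covers non-ASCII letters.
    static boolean unicodeWord(String s) {
        return Pattern.compile("\\w+", Pattern.UNICODE_CHARACTER_CLASS).matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(asciiWord("cafe"));        // true
        System.out.println(asciiWord("caf\u00e9"));   // false
        System.out.println(unicodeWord("caf\u00e9")); // true
    }
}
```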

Quantifiers: How Many?

Quantifiers specify how many times a pattern element should appear.

Basic Quantifiers

Quantifier  Meaning                        Example pattern  Matches
*           Zero or more                   ab*c             "ac", "abc", "abbc"
+           One or more                    ab+c             "abc", "abbc" (not "ac")
?           Zero or one                    colou?r          "color", "colour"
{n}         Exactly n                      a{3}             "aaa" only
{n,}        n or more                      a{2,}            "aa", "aaa", "aaaa", ...
{n,m}       Between n and m (inclusive)    a{2,4}           "aa", "aaa", "aaaa"

Understanding the Difference

Let's see how these differ in practice:

// * means "zero or more" - the 'b' is optional and can repeat
"ac".matches("ab*c")    // true (zero b's)
"abc".matches("ab*c")   // true (one b)
"abbbc".matches("ab*c") // true (three b's)

// + means "one or more" - at least one 'b' required
"ac".matches("ab+c")    // false (no b - doesn't match)
"abc".matches("ab+c")   // true (one b)

// ? means "zero or one" - the 'u' is optional
"color".matches("colou?r")  // true (American spelling)
"colour".matches("colou?r") // true (British spelling)
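A quick sketch of the bounded {n,m} quantifier from the table, checked against whole strings with matches(). BoundedRepeat and isTwoToFourAs are hypothetical names.

```java
import java.util.regex.Pattern;

// Hypothetical example class.
final class BoundedRepeat {
    // a{2,4}: between two and four 'a' characters, inclusive.
    private static final Pattern TWO_TO_FOUR = Pattern.compile("a{2,4}");

    // matches() requires the entire string to fit the pattern.
    static boolean isTwoToFourAs(String s) {
        return TWO_TO_FOUR.matcher(s).matches();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"a", "aa", "aaaa", "aaaaa"}) {
            System.out.println(s + " -> " + isTwoToFourAs(s));
        }
        // a -> false, aa -> true, aaaa -> true, aaaaa -> false
    }
}
```

With find() instead of matches(), "aaaaa" would still report a match, because four of its characters satisfy a{2,4}.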

Anchors: Position Matters

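Sometimes you need to match at specific positions in the string. Anchors do not consume characters; they only assert a position. Because String.matches() already requires the whole string to match, ^ and $ are redundant there and matter most when searching with find().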

Anchor  Meaning
^       Start of string (or of a line in multiline mode)
$       End of string (or of a line in multiline mode)
\b      Word boundary
\B      Not a word boundary

// ^ ensures pattern is at the start
"hello world".matches("^hello.*")  // true
"say hello".matches("^hello.*")    // false - "hello" not at start

// $ ensures pattern is at the end
"hello world".matches(".*world$")  // true

// \b matches word boundaries
Pattern word = Pattern.compile("\\bcat\\b");
word.matcher("cat").find()           // true - "cat" is a whole word
word.matcher("catch").find()         // false - "cat" is part of "catch"
word.matcher("the cat sat").find()   // true - "cat" is a word here
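The "multiline mode" in the table is enabled with the Pattern.MULTILINE flag. This sketch collects the first word wherever ^ can match, with and without that flag; LineStartDemo and firstWords are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class.
final class LineStartDemo {
    // Returns every word that sits at a position where ^ matches.
    static List<String> firstWords(String text, int flags) {
        Matcher matcher = Pattern.compile("^\\w+", flags).matcher(text);
        List<String> words = new ArrayList<>();
        while (matcher.find()) {
            words.add(matcher.group());
        }
        return words;
    }

    public static void main(String[] args) {
        String text = "alpha beta\ngamma delta";
        System.out.println(firstWords(text, 0));                 // [alpha]
        System.out.println(firstWords(text, Pattern.MULTILINE)); // [alpha, gamma]
    }
}
```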

Groups: Capturing Parts of the Match

Parentheses create groups that allow you to:

  1. Extract specific parts of a match
  2. Apply quantifiers to multiple characters
  3. Use back-references

Basic Capturing Groups

Pattern pattern = Pattern.compile("(\\d{3})-(\\d{4})");
Matcher matcher = pattern.matcher("My number is 555-1234");

if (matcher.find()) {
    System.out.println(matcher.group(0));  // "555-1234" (entire match)
    System.out.println(matcher.group(1));  // "555" (first group)
    System.out.println(matcher.group(2));  // "1234" (second group)
}

Named Groups (Java 7+)

For better readability, you can name your groups:

Pattern pattern = Pattern.compile("(?<area>\\d{3})-(?<number>\\d{4})");
Matcher matcher = pattern.matcher("555-1234");

if (matcher.find()) {
    System.out.println(matcher.group("area"));   // "555"
    System.out.println(matcher.group("number")); // "1234"
}

Non-Capturing Groups

Sometimes you need grouping for structure but don't need to capture:

// (?:...) groups without capturing
Pattern pattern = Pattern.compile("(?:Mr|Mrs|Ms)\\.?\\s+(\\w+)");
Matcher matcher = pattern.matcher("Mr. Smith");

if (matcher.find()) {
    // Group 1 is the name, not the title
    System.out.println(matcher.group(1));  // "Smith"
}
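Back-references, the third use of groups listed earlier, let a pattern refer to text it has already captured: \1 must repeat exactly what group 1 matched. A sketch that finds accidentally doubled words; DoubledWords and repeatedWords are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical example class.
final class DoubledWords {
    // (\w+) captures a word; \1 requires the same text again after whitespace.
    // The trailing \b stops "the theory" from matching as "the the".
    private static final Pattern DOUBLED = Pattern.compile("\\b(\\w+)\\s+\\1\\b");

    static List<String> repeatedWords(String text) {
        Matcher matcher = DOUBLED.matcher(text);
        List<String> repeats = new ArrayList<>();
        while (matcher.find()) {
            repeats.add(matcher.group(1));
        }
        return repeats;
    }

    public static void main(String[] args) {
        System.out.println(repeatedWords("this is is a test test")); // [is, test]
    }
}
```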

Common Practical Patterns

Email Validation

String emailPattern = "^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}$";

"user@example.com".matches(emailPattern)     // true
"user.name@domain.co.uk".matches(emailPattern)  // true
"invalid@".matches(emailPattern)             // false

Breaking down the pattern:

  • ^ - Start of string
  • [a-zA-Z0-9._%+-]+ - One or more valid characters for the local part
  • @ - Literal @ sign
  • [a-zA-Z0-9.-]+ - One or more valid domain characters
  • \\. - Literal dot (escaped)
  • [a-zA-Z]{2,} - Two or more letters for TLD
  • $ - End of string
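Wrapped in a reusable method, the pattern might look like the sketch below (EmailCheck and looksLikeEmail are hypothetical names). Treat it as a practical approximation rather than a full RFC 5322 validator: for example, it accepts consecutive dots in the domain, as the last line shows.

```java
import java.util.regex.Pattern;

// Hypothetical example class.
final class EmailCheck {
    // Same pattern as above, compiled once.
    private static final Pattern EMAIL =
        Pattern.compile("^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}$");

    static boolean looksLikeEmail(String s) {
        return EMAIL.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(looksLikeEmail("user@example.com")); // true
        System.out.println(looksLikeEmail("invalid@"));         // false
        System.out.println(looksLikeEmail("user@localhost"));   // false (no dot before a TLD)
        System.out.println(looksLikeEmail("a@b..com"));         // true (a known gap)
    }
}
```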

Password Strength

// At least 8 chars, one uppercase, one lowercase, one digit, one special character from @$!%*?&
String strongPassword = "^(?=.*[a-z])(?=.*[A-Z])(?=.*\\d)(?=.*[@$!%*?&]).{8,}$";

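The (?=...) construct is a positive lookahead: it checks that a pattern occurs ahead of the current position without consuming any characters. Because every lookahead here starts from the beginning of the string, each requirement is checked independently, and .{8,} then enforces the minimum length.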
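A sketch wrapping the password pattern in a method; PasswordCheck and isStrong are hypothetical names. Note that only the characters @$!%*?& count as "special" here, so a password whose only symbol is # is rejected.

```java
import java.util.regex.Pattern;

// Hypothetical example class.
final class PasswordCheck {
    // Four lookaheads test requirements without consuming input;
    // .{8,} then consumes the whole string and enforces the length.
    private static final Pattern STRONG =
        Pattern.compile("^(?=.*[a-z])(?=.*[A-Z])(?=.*\\d)(?=.*[@$!%*?&]).{8,}$");

    static boolean isStrong(String password) {
        return STRONG.matcher(password).matches();
    }

    public static void main(String[] args) {
        System.out.println(isStrong("Passw0rd!")); // true
        System.out.println(isStrong("password"));  // false (no uppercase, digit, or special)
        System.out.println(isStrong("Pa0!"));      // false (too short)
        System.out.println(isStrong("Passw0rd#")); // false ('#' is not in the special set)
    }
}
```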

Extracting Data

String text = "Contact: john@email.com or jane@company.org";
Pattern emailFinder = Pattern.compile("[\\w.+-]+@[\\w.-]+\\.[a-zA-Z]{2,}");
Matcher matcher = emailFinder.matcher(text);

while (matcher.find()) {
    System.out.println("Found: " + matcher.group());
}
// Output:
// Found: john@email.com
// Found: jane@company.org
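On Java 9 and later, Matcher.results() returns a stream of every match that repeated find() calls would produce, which makes collecting matches into a list concise. A sketch using the same pattern; EmailExtractor and extract are hypothetical names.

```java
import java.util.List;
import java.util.regex.MatchResult;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical example class.
final class EmailExtractor {
    private static final Pattern EMAIL_FINDER =
        Pattern.compile("[\\w.+-]+@[\\w.-]+\\.[a-zA-Z]{2,}");

    // Streams each match (Java 9+) and keeps the matched text.
    static List<String> extract(String text) {
        return EMAIL_FINDER.matcher(text)
                .results()
                .map(MatchResult::group)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(extract("Contact: john@email.com or jane@company.org"));
        // [john@email.com, jane@company.org]
    }
}
```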

String Methods Using Regex

Java's String class has several methods that use regex:

split() - Divide a String

// Split by comma
String csv = "apple,banana,cherry";
String[] fruits = csv.split(",");
// Result: ["apple", "banana", "cherry"]

// Split by any whitespace (one or more)
String text = "Hello    World   Java";
String[] words = text.split("\\s+");
// Result: ["Hello", "World", "Java"]
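Two split() behaviors often surprise people: a delimiter at the very start of the string produces a leading empty string, and trailing empty strings are dropped unless you pass a negative limit. A sketch; SplitEdgeCases and splitWords are hypothetical names.

```java
import java.util.Arrays;

// Hypothetical example class.
final class SplitEdgeCases {
    // Trims first so leading whitespace cannot produce an empty first element.
    static String[] splitWords(String text) {
        return text.trim().split("\\s+");
    }

    public static void main(String[] args) {
        // Leading delimiter: the first element is an empty string.
        System.out.println(Arrays.toString("  Hello World".split("\\s+"))); // [, Hello, World]
        System.out.println(Arrays.toString(splitWords("  Hello World")));   // [Hello, World]

        // Trailing empty strings are dropped by default...
        System.out.println(Arrays.toString("a,b,,".split(",")));     // [a, b]
        // ...but kept when the limit argument is negative.
        System.out.println(Arrays.toString("a,b,,".split(",", -1))); // [a, b, , ]
    }
}
```

An empty or all-whitespace input still yields a single empty string from splitWords, so callers may want to check for that case separately.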

replaceAll() - Replace Patterns

// Mask all digits
String phone = "Call: 123-456-7890";
String masked = phone.replaceAll("\\d", "*");
// Result: "Call: ***-***-****"

// Normalize whitespace
String messy = "Hello    World\n\tJava";
String clean = messy.replaceAll("\\s+", " ");
// Result: "Hello World Java"

// Reorder using group references ($1, $2) in the replacement
String name = "Smith, John";
String reordered = name.replaceAll("(\\w+), (\\w+)", "$2 $1");
// Result: "John Smith"
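Because both arguments of replaceAll() are interpreted (the first as a regex, the second with $ and \ as special), literal text containing characters like + or $ needs quoting. Pattern.quote() and Matcher.quoteReplacement() handle that; for purely literal replacement, String.replace(CharSequence, CharSequence) avoids regex entirely. A sketch; LiteralReplace and replaceLiteral are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical example class.
final class LiteralReplace {
    // Pattern.quote makes the target literal ('+' is not a quantifier);
    // Matcher.quoteReplacement makes '$' literal instead of a group reference.
    static String replaceLiteral(String text, String target, String replacement) {
        return text.replaceAll(Pattern.quote(target), Matcher.quoteReplacement(replacement));
    }

    public static void main(String[] args) {
        System.out.println(replaceLiteral("1+1=2", "1+1", "two")); // two=2
        System.out.println(replaceLiteral("cost: 5", "5", "$5"));  // cost: $5
    }
}
```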

find() vs matches()

Understanding the difference is crucial:

  • matches() - The ENTIRE string must match the pattern
  • find() - Searches for the pattern ANYWHERE in the string

Pattern pattern = Pattern.compile("\\d+");

// matches() - entire string must be digits
pattern.matcher("123").matches()     // true
pattern.matcher("abc123").matches()  // false - has letters

// find() - looks for digits anywhere
pattern.matcher("abc123def").find()  // true - found "123"
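Matcher also offers a middle ground, lookingAt(), which requires a match starting at the beginning of the input but not covering all of it. A sketch comparing the three; MatchModes and its method names are hypothetical, and each call uses a fresh Matcher so earlier match state cannot affect the result.

```java
import java.util.regex.Pattern;

// Hypothetical demo class.
final class MatchModes {
    private static final Pattern DIGITS = Pattern.compile("\\d+");

    static boolean whole(String s)    { return DIGITS.matcher(s).matches(); }   // entire input
    static boolean prefix(String s)   { return DIGITS.matcher(s).lookingAt(); } // from index 0
    static boolean anywhere(String s) { return DIGITS.matcher(s).find(); }      // any position

    public static void main(String[] args) {
        for (String s : new String[] {"123", "123abc", "abc123"}) {
            System.out.println(s + ": " + whole(s) + " " + prefix(s) + " " + anywhere(s));
        }
        // 123: true true true
        // 123abc: false true true
        // abc123: false false true
    }
}
```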

Performance Tips

  1. Compile patterns once: If you use the same pattern multiple times, compile it once and reuse

    // Good - compile once
    private static final Pattern EMAIL = Pattern.compile("...");
    
    // Bad - recompiles every call
    public boolean isValid(String s) {
        return s.matches("...");  // Creates new Pattern each time
    }
    
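  2. Be specific: Broad greedy patterns such as .* can force the engine to backtrack heavily, and nested quantifiers such as (a+)+ can take exponential time on inputs that almost match (catastrophic backtracking). Prefer precise character classes, for example [^,]* instead of .* when matching up to a comma.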

  3. Use non-capturing groups when you don't need the captured value: (?:...) instead of (...)

Regex lets you match character classes ([abc], \d, \w), specify repetition (*, +, ?, {n,m}), match positions (^, $, \b), and capture groups. Use Pattern.compile() to create reusable patterns. matcher.find() searches for the pattern anywhere in the input, while matcher.matches() requires the entire string to match. Regex takes practice, but once it clicks you will find uses for it everywhere.

© 2026 forEach. All rights reserved.
