How do I use predefined character classes in regex?

In regex, you also have a number of predefined character classes that provide you with a shorthand notation for commonly used sets of characters.

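Here is the list:

Predefined Class   Matches
.                  Any character (may or may not match line terminators)
\d                 A digit, shorthand for [0-9]
\D                 A non-digit, [^0-9]
\s                 A whitespace character, [ \t\n\x0B\f\r]
\S                 A non-whitespace character, [^\s]
\w                 A word character, [a-zA-Z_0-9]
\W                 A non-word character, [^\w]

In a Java string literal the backslash itself must be escaped, so \s is written as "\\s" in the code.
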
package org.kodejava.regex;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PredefinedCharacterClassDemo {
    public static void main(String[] args) {
        // Define a regex that searches for a whitespace character
        // followed by 'f' and any two characters.
        String regex = "\\sf..";

        // Compiles the pattern and obtains the matcher object.
        Pattern pattern = Pattern.compile(regex);
        Matcher matcher = pattern.matcher(
                "The quick brown fox jumps over the lazy dog");

        // Find every match and print it
        while (matcher.find()) {
            System.out.format("Text \"%s\" found at %d to %d.%n",
                    matcher.group(), matcher.start(), matcher.end());
        }
    }
}

This program outputs the following result:

Text " fox" found at 15 to 19.
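The demo above only uses \s. As a rough sketch of some of the other shorthands, the class below runs them over a made-up sample string. The class name PredefinedClassesSketch, its findAll helper and the input text are hypothetical, chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PredefinedClassesSketch {
    // Hypothetical helper: collects every substring of input matched by regex.
    static List<String> findAll(String regex, String input) {
        List<String> matches = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(input);
        while (matcher.find()) {
            matches.add(matcher.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        // Made-up sample text used only for illustration.
        String input = "Order #42 shipped on 2024-05-17 to room_7";

        // \d+ : runs of one or more digits
        System.out.println(findAll("\\d+", input)); // [42, 2024, 05, 17, 7]
        // \w+ : runs of word characters; '#' and '-' are not word characters
        System.out.println(findAll("\\w+", input)); // [Order, 42, shipped, on, 2024, 05, 17, to, room_7]
        // \S+ : runs of non-whitespace characters
        System.out.println(findAll("\\S+", input)); // [Order, #42, shipped, on, 2024-05-17, to, room_7]
        // \s : single whitespace characters, counted instead of printed
        System.out.println(findAll("\\s", input).size()); // 6
    }
}
```

Because each shorthand is followed by +, whole runs of matching characters are returned instead of one character at a time.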

How do I do boundary matching in regex?

If you want to find a pattern at a more precise position, for example at the beginning or the end of a line, you can use boundary matchers. Boundary matchers are special sequences in a regular expression that match a particular boundary (a position) rather than a character.

Here is the list:

Matcher   Matches
^         The beginning of a line
$         The end of a line
\b        A word boundary
\B        A non-word boundary
\A        The beginning of the input
\G        The end of the previous match
\Z        The end of the input but for the final terminator, if any
\z        The end of the input

Some examples:

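  • ^Java will find the word Java at the beginning of a line.
  • Java$ will find the word Java at the end of a line.
  • \bJ..a\b will find a four-letter word beginning with 'J' and ending with 'a'.

Note that in Java, ^ and $ match at every line boundary only when the pattern is compiled with the Pattern.MULTILINE flag; otherwise they match only at the beginning and the end of the whole input (for $, also just before a final line terminator).
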
package org.kodejava.regex;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BoundaryMatcherDemo {
    public static void main(String[] args) {
        // Define regex to find the word "dog" at the end of the line.
        String regex = "dog$";

        // Compiles the pattern and obtains the matcher object.
        Pattern pattern = Pattern.compile(regex);
        Matcher matcher = pattern.matcher(
                "The quick brown fox jumps over the lazy dog");

        // Find every match and print it
        while (matcher.find()) {
            System.out.format("Text \"%s\" found at %d to %d.%n",
                    matcher.group(), matcher.start(), matcher.end());
        }
    }
}

This program outputs the following result:

Text "dog" found at 40 to 43.
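To see the three example patterns from the list in action, here is a small sketch that also shows how the Pattern.MULTILINE flag changes ^ and $. The class name BoundaryExamplesSketch, its matches helper and the sample strings are hypothetical, made up for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BoundaryExamplesSketch {
    // Hypothetical helper: returns "text@start" for every match.
    static List<String> matches(Pattern pattern, String input) {
        List<String> result = new ArrayList<>();
        Matcher matcher = pattern.matcher(input);
        while (matcher.find()) {
            result.add(matcher.group() + "@" + matcher.start());
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up two-line input.
        String input = "Java is fun\nI like Java";

        // Without MULTILINE, ^ matches only at the start of the input.
        System.out.println(matches(Pattern.compile("^Java"), input)); // [Java@0]
        System.out.println(matches(Pattern.compile("^I"), input));    // []
        // With MULTILINE, ^ also matches right after each line terminator.
        System.out.println(matches(Pattern.compile("^I", Pattern.MULTILINE), input)); // [I@12]

        // The same rule applies to $: "fun" ends the first line, not the input.
        System.out.println(matches(Pattern.compile("fun$"), input)); // []
        System.out.println(matches(Pattern.compile("fun$", Pattern.MULTILINE), input)); // [fun@8]

        // \bJ..a\b: a four-letter word starting with 'J' and ending with 'a'.
        // "JavaScript" is skipped because no word boundary follows its "Java".
        System.out.println(matches(Pattern.compile("\\bJ..a\\b"), "Java JavaScript Jana")); // [Java@0, Jana@16]
    }
}
```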

How do I use the logical OR operator in regex?

You can use the | operator (logical OR) to match the characters or expression on either its left or its right side. For example, (t|T) matches either t or T in the input string.

package org.kodejava.regex;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LogicalOrRegexDemo {
    public static void main(String[] args) {
        // Define a regex that searches for the character 't' or 'T'
        String regex = "(t|T)";

        // Compiles the pattern and obtains the matcher object.
        Pattern pattern = Pattern.compile(regex);
        String input = "The quick brown fox jumps over the lazy dog";
        Matcher matcher = pattern.matcher(input);

        // Find every match and print it
        while (matcher.find()) {
            System.out.format("Text \"%s\" found at %d to %d.%n",
                    matcher.group(), matcher.start(), matcher.end());
        }
    }
}

The program prints the following result:

Text "T" found at 0 to 1.
Text "t" found at 31 to 32.
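The alternatives on each side of | are not limited to single characters. As a sketch under the same sample sentence, the class below (hypothetical name AlternationSketch, with a hypothetical findAll helper) shows whole-word alternatives and how parentheses change what | applies to.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AlternationSketch {
    // Hypothetical helper: collects every substring of input matched by regex.
    static List<String> findAll(String regex, String input) {
        List<String> matches = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(input);
        while (matcher.find()) {
            matches.add(matcher.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        String input = "The quick brown fox jumps over the lazy dog";

        // Alternatives may be whole words, not just single characters.
        System.out.println(findAll("fox|dog", input)); // [fox, dog]
        // For single characters, the class [tT] finds the same text as (t|T).
        System.out.println(findAll("[tT]", input)); // [T, t]
        // | has the lowest precedence: this means "the quick" OR "lazy".
        System.out.println(findAll("the quick|lazy", input)); // [lazy]
        // Parentheses limit the alternation to the word after "the ".
        System.out.println(findAll("the (quick|lazy)", input)); // [the lazy]
    }
}
```

For alternatives that are single characters, a character class such as [tT] is usually the simpler choice; | becomes necessary once the alternatives are longer than one character.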

How do I write a character class union in regex?

To create a single character class composed of two or more separate character classes, use a union. To create a union, simply nest one class inside the other, such as [0-3[7-9]].

package org.kodejava.regex;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CharacterClassUnionDemo {
    public static void main(String[] args) {
        // Define a regex that matches the digits 0, 1, 2, 3, 7, 8 and 9
        String regex = "[0-3[7-9]]";
        String input = "0123456789";

        // Compiles the given regular expression into a pattern.
        Pattern pattern = Pattern.compile(regex);
        Matcher matcher = pattern.matcher(input);

        // Find matches and print them
        while (matcher.find()) {
            System.out.format("Text \"%s\" found at %d to %d.%n",
                    matcher.group(), matcher.start(),
                    matcher.end());
        }
    }
}

Here is the result of the program:

Text "0" found at 0 to 1.
Text "1" found at 1 to 2.
Text "2" found at 2 to 3.
Text "3" found at 3 to 4.
Text "7" found at 7 to 8.
Text "8" found at 8 to 9.
Text "9" found at 9 to 10.
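A union of ranges can often also be written as one flat class, and more than two classes can be nested at once. The sketch below checks both points; the class name UnionSketch, its keepMatches helper and the sample inputs are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class UnionSketch {
    // Hypothetical helper: concatenates every match of regex found in input.
    static String keepMatches(String regex, String input) {
        StringBuilder kept = new StringBuilder();
        Matcher matcher = Pattern.compile(regex).matcher(input);
        while (matcher.find()) {
            kept.append(matcher.group());
        }
        return kept.toString();
    }

    public static void main(String[] args) {
        // The nested union and a flat class listing both ranges are equivalent.
        System.out.println(keepMatches("[0-3[7-9]]", "0123456789")); // 0123789
        System.out.println(keepMatches("[0-37-9]", "0123456789"));   // 0123789
        // More than two classes can be joined: a-c, x-z and the digit 5.
        System.out.println(keepMatches("[a-c[x-z][5]]", "abcdxyz0567")); // abcxyz5
    }
}
```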

How do I write a character class intersection in regex?

You can use the && operator to combine classes that define sets of characters. The combined class matches only the characters common to both classes (their intersection).

package org.kodejava.regex;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CharacterClassIntersectionDemo {
    public static void main(String[] args) {
        // Define a regex that matches characters in the range 'a' to 'z'
        // that are also a 'c', 'a' or 't' character.
        String regex = "[a-z&&[cat]]";

        // Compiles the given regular expression into a pattern.
        Pattern pattern = Pattern.compile(regex);
        Matcher matcher = pattern.matcher(
                "The quick brown fox jumps over the lazy dog");

        // Find every match and print it
        while (matcher.find()) {
            System.out.format("Text \"%s\" found at %d to %d.%n",
                    matcher.group(), matcher.start(), matcher.end());
        }
    }
}

The program prints the following result:

Text "c" found at 7 to 8.
Text "t" found at 31 to 32.
Text "a" found at 36 to 37.
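Intersecting with a negated class gives a subtraction: everything in the first class except the characters in the second. The sketch below applies that idea; the class name IntersectionSketch and its keepMatches helper are hypothetical, and the sample inputs are reused from earlier examples.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class IntersectionSketch {
    // Hypothetical helper: concatenates every match of regex found in input.
    static String keepMatches(String regex, String input) {
        StringBuilder kept = new StringBuilder();
        Matcher matcher = Pattern.compile(regex).matcher(input);
        while (matcher.find()) {
            kept.append(matcher.group());
        }
        return kept.toString();
    }

    public static void main(String[] args) {
        String input = "The quick brown fox jumps over the lazy dog";

        // [a-z&&[^aeiou]]: lowercase letters that are not vowels.
        // The uppercase 'T' is outside a-z, so it is not kept.
        System.out.println(keepMatches("[a-z&&[^aeiou]]", input)); // hqckbrwnfxjmpsvrthlzydg
        // [0-9&&[^4-6]]: every digit except 4, 5 and 6.
        System.out.println(keepMatches("[0-9&&[^4-6]]", "0123456789")); // 0123789
    }
}
```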