How do I use the logical OR operator in regex?

You can use the | operator (logical OR) to match the expression on either the left or the right side of the operator. For example, (t|T) matches either t or T in the input string.

package org.kodejava.regex;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LogicalOrRegexDemo {
    public static void main(String[] args) {
        // Define a regex that matches the character 't' or 'T'
        String regex = "(t|T)";

        // Compiles the pattern and obtains the matcher object.
        Pattern pattern = Pattern.compile(regex);
        String input = "The quick brown fox jumps over the lazy dog";
        Matcher matcher = pattern.matcher(input);

        // Find every match and print it
        while (matcher.find()) {
            System.out.format("Text \"%s\" found at %d to %d.%n",
                    matcher.group(), matcher.start(), matcher.end());
        }
    }
}

The program prints the following result:

Text "T" found at 0 to 1.
Text "t" found at 31 to 32.
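
The alternatives on either side of | do not have to be single characters. The following is a minimal sketch, not part of the original example, showing whole words as alternatives and a group that limits the alternation to one character. The class name AlternationSketch and the helper findAll are hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AlternationSketch {
    // Hypothetical helper: collects every substring of input matched by regex.
    static List<String> findAll(String regex, String input) {
        List<String> matches = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(input);
        while (matcher.find()) {
            matches.add(matcher.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        String input = "The quick brown fox jumps over the lazy dog";

        // Whole words as alternatives: matches "fox" or "dog".
        System.out.println(findAll("fox|dog", input)); // [fox, dog]

        // The group limits the alternation to the first character,
        // so this matches "The" or "the".
        System.out.println(findAll("(t|T)he", input)); // [The, the]
    }
}
```

Without the parentheses, t|The would mean "t" or "The", because | has the lowest precedence of the regex operators.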

How do I use predefined character classes in regex?

In regex, you also have a number of predefined character classes that provide you with a shorthand notation for commonly used sets of characters.

Here is the list:

Predefined Class   Matches
.                  Any character (may or may not match line terminators)
\d                 Any digit, shorthand for [0-9]
\D                 A non-digit, [^0-9]
\s                 A whitespace character, [ \t\n\x0B\f\r]
\S                 A non-whitespace character, [^\s]
\w                 A word character, [a-zA-Z_0-9]
\W                 A non-word character, [^\w]

package org.kodejava.regex;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PredefinedCharacterClassDemo {
    public static void main(String[] args) {
        // Define a regex that matches a whitespace character
        // followed by 'f' and any two characters.
        String regex = "\\sf..";

        // Compiles the pattern and obtains the matcher object.
        Pattern pattern = Pattern.compile(regex);
        Matcher matcher = pattern.matcher(
                "The quick brown fox jumps over the lazy dog");

        // Find every match and print it
        while (matcher.find()) {
            System.out.format("Text \"%s\" found at %d to %d.%n",
                    matcher.group(), matcher.start(), matcher.end());
        }
    }
}

This program outputs the following result:

Text " fox" found at 15 to 19.
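
To see how the other predefined classes behave, here is a small sketch that counts matches for a few of them. The input string, the class name PredefinedClassSketch, and the helper countMatches are assumptions made for illustration and do not come from the original example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PredefinedClassSketch {
    // Hypothetical helper: counts how many times regex matches within input.
    static int countMatches(String regex, String input) {
        Matcher matcher = Pattern.compile(regex).matcher(input);
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String input = "Order 66 shipped on day 7";

        // In a Java string literal the backslash itself must be escaped.
        System.out.println(countMatches("\\d", input));  // 3 single digits
        System.out.println(countMatches("\\d+", input)); // 2 runs of digits
        System.out.println(countMatches("\\s", input));  // 5 whitespace characters
        System.out.println(countMatches("\\w+", input)); // 6 runs of word characters
    }
}
```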

How do I do boundary matching in regex?

If you want to find a pattern at a more precise position, for example at the beginning or the end of a line, you can use a boundary matcher. Boundary matchers are special sequences in a regular expression that match a particular position in the input rather than a character.

Here is the list:

Matcher   Matches
^         The beginning of a line
$         The end of a line
\b        A word boundary
\B        A non-word boundary
\A        The beginning of the input
\G        The end of the previous match
\Z        The end of the input but for the final terminator, if any
\z        The end of the input

Some examples:

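  • ^Java will find the word Java at the beginning of any line. By default ^ matches only at the start of the whole input; compile the pattern with Pattern.MULTILINE to match at the start of every line.
  • Java$ will find the word Java at the end of any line. The same Pattern.MULTILINE rule applies to $.
  • \bJ..a\b will find a four-character sequence, such as the word Java, that begins with 'J' and ends with 'a' at word boundaries.
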
package org.kodejava.regex;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BoundaryMatcherDemo {
    public static void main(String[] args) {
        // Define regex to find the word "dog" at the end of the line.
        String regex = "dog$";

        // Compiles the pattern and obtains the matcher object.
        Pattern pattern = Pattern.compile(regex);
        Matcher matcher = pattern.matcher(
                "The quick brown fox jumps over the lazy dog");

        // Find every match and print it
        while (matcher.find()) {
            System.out.format("Text \"%s\" found at %d to %d.%n",
                    matcher.group(), matcher.start(), matcher.end());
        }
    }
}

This program outputs the following result:

Text "dog" found at 40 to 43.
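
The example above uses a single-line input, so it does not show the difference between line boundaries and input boundaries. The sketch below, using a made-up three-line input and the hypothetical names BoundarySketch and countMatches, compares ^ with and without Pattern.MULTILINE and shows \b matching whole words only.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BoundarySketch {
    // Hypothetical helper: counts the matches of a compiled pattern in input.
    static int countMatches(Pattern pattern, String input) {
        Matcher matcher = pattern.matcher(input);
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String input = "Java is fun\nI like Java\nJavascript is not Java";

        // Default mode: ^ matches only at the beginning of the input.
        System.out.println(countMatches(Pattern.compile("^Java"), input)); // 1

        // MULTILINE mode: ^ also matches right after each line terminator,
        // so "Java" in line 1 and the "Java" in "Javascript" both match.
        System.out.println(countMatches(
                Pattern.compile("^Java", Pattern.MULTILINE), input)); // 2

        // Word boundaries: "Javascript" is skipped because "Java" is
        // followed by another word character there.
        System.out.println(countMatches(Pattern.compile("\\bJava\\b"), input)); // 3
    }
}
```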

How do I write a character class subtraction in regex?

You can use subtraction to exclude one or more nested character classes from another class. This example creates a single character class that matches everything from a to z, except the vowels (‘a’, ‘i’, ‘u’, ‘e’, ‘o’). It is written as the subtraction pattern [a-z&&[^aiueo]].

package org.kodejava.regex;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CharacterClassSubtractionDemo {
    public static void main(String[] args) {
        // Define a regex that matches characters from 'a' to 'z',
        // excluding vowels.
        String regex = "[a-z&&[^aiueo]]";

        // Compiles the given regular expression into a pattern and
        // creates a matcher that will match the given input against
        // this pattern.
        Pattern pattern = Pattern.compile(regex);
        Matcher matcher = pattern.matcher("The quick brown fox.");

        // Find every match and print it
        while (matcher.find()) {
            System.out.format("Text \"%s\" found at %d to %d.%n",
                    matcher.group(), matcher.start(), matcher.end());
        }
    }
}

Here is the result of the program:

Text "h" found at 1 to 2.
Text "q" found at 4 to 5.
Text "c" found at 7 to 8.
Text "k" found at 8 to 9.
Text "b" found at 10 to 11.
Text "r" found at 11 to 12.
Text "w" found at 13 to 14.
Text "n" found at 14 to 15.
Text "f" found at 16 to 17.
Text "x" found at 18 to 19.
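
In Java, && inside a character class is the intersection operator, so a subtraction is an intersection with a negated class. The following sketch applies the same idea to digits. The class name SubtractionSketch, the helper keepMatches, and the digit pattern are illustrative assumptions, not part of the original example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SubtractionSketch {
    // Hypothetical helper: concatenates every match of regex found in input.
    static String keepMatches(String regex, String input) {
        StringBuilder kept = new StringBuilder();
        Matcher matcher = Pattern.compile(regex).matcher(input);
        while (matcher.find()) {
            kept.append(matcher.group());
        }
        return kept.toString();
    }

    public static void main(String[] args) {
        // Digits 0 to 9, minus 3, 4 and 5.
        System.out.println(keepMatches("[0-9&&[^345]]", "0123456789")); // 0126789

        // The consonant pattern from the example above.
        System.out.println(
                keepMatches("[a-z&&[^aiueo]]", "The quick brown fox.")); // hqckbrwnfx
    }
}
```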

How do I write a union character class in regex?

To create a single character class comprised of two or more separate character classes, use a union. To create a union, nest one class inside the other, such as [0-3[7-9]].

package org.kodejava.regex;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CharacterClassUnionDemo {
    public static void main(String[] args) {
        // Defines a regex that matches the digits 0, 1, 2, 3, 7, 8, 9
        String regex = "[0-3[7-9]]";
        String input = "0123456789";

        // Compiles the given regular expression into a pattern.
        Pattern pattern = Pattern.compile(regex);
        Matcher matcher = pattern.matcher(input);

        // Find every match and print it
        while (matcher.find()) {
            System.out.format("Text \"%s\" found at %d to %d.%n",
                    matcher.group(), matcher.start(),
                    matcher.end());
        }
    }
}

Here is the result of the program:

Text "0" found at 0 to 1.
Text "1" found at 1 to 2.
Text "2" found at 2 to 3.
Text "3" found at 3 to 4.
Text "7" found at 7 to 8.
Text "8" found at 8 to 9.
Text "9" found at 9 to 10.
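
A union with nested brackets matches the same characters as listing the ranges side by side in one class. The sketch below checks this for a letter-based pattern; the class name UnionSketch, the helper keepMatches, and the sample patterns are assumptions made for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class UnionSketch {
    // Hypothetical helper: concatenates every match of regex found in input.
    static String keepMatches(String regex, String input) {
        StringBuilder kept = new StringBuilder();
        Matcher matcher = Pattern.compile(regex).matcher(input);
        while (matcher.find()) {
            kept.append(matcher.group());
        }
        return kept.toString();
    }

    public static void main(String[] args) {
        String input = "abcdwxyz";

        // Nested (union) form: a to c, or x to z.
        System.out.println(keepMatches("[a-c[x-z]]", input)); // abcxyz

        // Flat form with both ranges in one class gives the same result.
        System.out.println(keepMatches("[a-cx-z]", input));   // abcxyz
    }
}
```

The nested form is mostly useful when one of the classes is itself negated or combined with &&, where a flat list of ranges cannot express the same set.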