How do I use predefined character classes in regex?

Regex also offers a number of predefined character classes that provide a shorthand notation for commonly used sets of characters.

Here is the list:

  • . : represents any character (may or may not match line terminators)
  • \d : represents any digit, shorthand for [0-9]
  • \D : represents a non-digit, [^0-9]
  • \s : represents a whitespace character, [ \t\n\x0B\f\r]
  • \S : represents any non-whitespace character, [^\s]
  • \w : represents a word character, [a-zA-Z_0-9]
  • \W : represents a non-word character, [^\w]

package org.kodejava.example.util.regex;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PredefinedCharacterClassDemo {
    public static void main(String[] args) {
        //
        // Define a regex that searches for a whitespace character
        // followed by 'f' and any two characters.
        //
        String regex = "\\sf..";

        //
        // Compiles the pattern and obtains the matcher object.
        //
        Pattern pattern = Pattern.compile(regex);
        Matcher matcher = pattern.matcher(
                "The quick brown fox jumps over the lazy dog");

        //
        // find every match and print it
        //
        while (matcher.find()) {
            System.out.format("Text \"%s\" found at %d to %d.%n",
                    matcher.group(), matcher.start(), matcher.end());
        }
    }
}

This program outputs the following result:

Text " fox" found at 15 to 19.
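The other shorthands in the list can be tried the same way. The following is only a minimal sketch: the class name PredefinedClassSketch, the helper findAll and the sample input are made up for illustration, and the + quantifier (one or more) is used just to group adjacent characters into one match.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PredefinedClassSketch {
    // Hypothetical helper: collects every match of the regex, in order.
    static List<String> findAll(String regex, String input) {
        List<String> matches = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(input);
        while (matcher.find()) {
            matches.add(matcher.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        String input = "Room 42, floor 7";
        // Runs of digits: prints [42, 7]
        System.out.println(findAll("\\d+", input));
        // Runs of word characters: prints [Room, 42, floor, 7]
        System.out.println(findAll("\\w+", input));
        // Runs of non-whitespace characters: prints [Room, 42,, floor, 7]
        System.out.println(findAll("\\S+", input));
    }
}
```

Note that each backslash is doubled inside a Java string literal, so the regex \d is written as "\\d" in source code.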

How do I do boundary matching in regex?

If you want to find the occurrence of a pattern at a more precise position, for example at the beginning or the end of a line, you can use a boundary matcher. Boundary matchers are special sequences in a regular expression that match a particular boundary rather than a character.

Here is the list:

  • ^ : the beginning of a line
  • $ : the end of a line
  • \b : a word boundary
  • \B : a non-word boundary
  • \A : the beginning of the input
  • \G : the end of the previous match
  • \Z : the end of the input but for the final terminator, if any
  • \z : the end of the input

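Some examples:

  • ^Java will find the word Java at the beginning of a line
  • Java$ will find the word Java at the end of a line
  • \bJ..a\b will find a four-character sequence beginning with 'J' and ending with 'a', such as Java

By default ^ and $ match only at the beginning and the end of the entire input; compile the pattern with Pattern.MULTILINE to make them match at every line.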

package org.kodejava.example.util.regex;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BoundaryMatcherDemo {
    public static void main(String[] args) {
        //
        // Define regex to find the word "dog" at the end of the 
        // line.
        //
        String regex = "dog$";

        //
        // Compiles the pattern and obtains the matcher object.
        //
        Pattern pattern = Pattern.compile(regex);
        Matcher matcher = pattern.matcher(
                "The quick brown fox jumps over the lazy dog");

        //
        // Find every match and print it
        //
        while (matcher.find()) {
            System.out.format("Text \"%s\" found at %d to %d.%n",
                    matcher.group(), matcher.start(), matcher.end());
        }
    }
}

This program outputs the following result:

Text "dog" found at 40 to 43.
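To see how ^ behaves with and without Pattern.MULTILINE, and how \b keeps a word from matching inside a longer word, here is a small sketch. The class name BoundarySketch, the helper count and the sample texts are assumptions made for illustration only.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BoundarySketch {
    // Hypothetical helper: counts how many times the pattern is found.
    static int count(Pattern pattern, String input) {
        Matcher matcher = pattern.matcher(input);
        int n = 0;
        while (matcher.find()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        String text = "Java is fun\nI like Java\nJava rocks";

        // Default mode: ^ matches only at the start of the input. Prints 1
        System.out.println(count(Pattern.compile("^Java"), text));

        // MULTILINE: ^ also matches after each line terminator. Prints 2
        System.out.println(
                count(Pattern.compile("^Java", Pattern.MULTILINE), text));

        // \b stops "cat" from matching inside "concatenate". Prints 2
        System.out.println(
                count(Pattern.compile("\\bcat\\b"), "cat concatenate cat"));
    }
}
```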

How do I write a union character class in regex?

To create a single character class composed of two or more separate character classes, use a union. To create a union, simply nest one class inside the other, such as [1-3[5-7]].

package org.kodejava.example.util.regex;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CharacterClassUnionDemo {
    public static void main(String[] args) {
        //
        // Define a regex that matches the digits 1, 2, 3, 5, 6 and 7
        //
        String regex = "[1-3[5-7]]";
        String input = "1234567890";

        //
        // Compiles the given regular expression into a pattern.
        //
        Pattern pattern = Pattern.compile(regex);
        Matcher matcher = pattern.matcher(input);

        //
        // Find every match and print it
        //
        while (matcher.find()) {
            System.out.format("Text \"%s\" found at %d to %d.%n",
                    matcher.group(), matcher.start(), 
                    matcher.end());
        }
    }
}

Here is the result of the program:

Text "1" found at 0 to 1.
Text "2" found at 1 to 2.
Text "3" found at 2 to 3.
Text "5" found at 4 to 5.
Text "6" found at 5 to 6.
Text "7" found at 6 to 7.
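A nested union matches the same set of characters as the two ranges written side by side in one class. The sketch below checks this by removing every matched character from the input; the class name UnionSketch and the helper strip are hypothetical names used only for this illustration.

```java
class UnionSketch {
    // Hypothetical helper: removes every character matched by the class.
    static String strip(String charClass, String input) {
        return input.replaceAll(charClass, "");
    }

    public static void main(String[] args) {
        String input = "1234567890";
        // Nested union: prints 4890
        System.out.println(strip("[1-3[5-7]]", input));
        // The same set written flat: prints 4890
        System.out.println(strip("[1-35-7]", input));
    }
}
```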

How do I write a character class intersection in regex?

You can use the && operator to combine classes that define sets of characters. It will only match characters common to both classes (intersection).

package org.kodejava.example.util.regex;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CharacterClassIntersectionDemo {
    public static void main(String[] args) {
        //
        // Define a regex that matches characters in the range 'a'
        // to 'z' that are also a 'c', an 'a' or a 't'.
        //
        String regex = "[a-z&&[cat]]";

        //
        // Compiles the given regular expression into a pattern.
        //
        Pattern pattern = Pattern.compile(regex);
        Matcher matcher = pattern.matcher(
                "The quick brown fox jumps over the lazy dog");

        //
        // Find every match and print it
        //
        while (matcher.find()) {
            System.out.format("Text \"%s\" found at %d to %d.%n",
                    matcher.group(), matcher.start(), matcher.end());
        }
    }
}

The program prints the following result:

Text "c" found at 7 to 8.
Text "t" found at 31 to 32.
Text "a" found at 36 to 37.
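Intersection also works with two ranges: only the characters that fall inside both ranges are matched. The following is a minimal sketch, with the class name IntersectionSketch and the helper matched invented for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class IntersectionSketch {
    // Hypothetical helper: concatenates every match found in the input.
    static String matched(String regex, String input) {
        StringBuilder sb = new StringBuilder();
        Matcher matcher = Pattern.compile(regex).matcher(input);
        while (matcher.find()) {
            sb.append(matcher.group());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String alphabet = "abcdefghijklmnopqrstuvwxyz";
        // Characters in both a-m and h-z: prints hijklm
        System.out.println(matched("[a-m&&[h-z]]", alphabet));
        // The equivalent plain range: prints hijklm
        System.out.println(matched("[h-m]", alphabet));
    }
}
```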

How do I write a character class subtraction in regex?

You can use subtraction to exclude one or more nested character classes from another class. Java has no dedicated subtraction operator; a subtraction is written as an intersection (&&) with a negated class.
This example creates a single character class that matches everything from a to z, except the vowels (‘a’, ‘i’, ‘u’, ‘e’, ‘o’). This can be written as the subtraction pattern [a-z&&[^aiueo]].

package org.kodejava.example.util.regex;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CharacterClassSubtractionDemo {
    public static void main(String[] args) {
        //
        // Define regex that will search characters from 'a' to 'z'
        // and excluding vowels.
        //
        String regex = "[a-z&&[^aiueo]]";

        //
        // Compiles the given regular expression into a pattern and
        // Creates a matcher that will match the given input against
        // this pattern.
        //
        Pattern pattern = Pattern.compile(regex);
        Matcher matcher = pattern.matcher("The quick brown fox.");

        //
        // Find every match and print it
        //
        while (matcher.find()) {
            System.out.format("Text \"%s\" found at %d to %d.%n",
                    matcher.group(), matcher.start(), matcher.end());
        }
    }
}

Here is the result of the program:

Text "h" found at 1 to 2.
Text "q" found at 4 to 5.
Text "c" found at 7 to 8.
Text "k" found at 8 to 9.
Text "b" found at 10 to 11.
Text "r" found at 11 to 12.
Text "w" found at 13 to 14.
Text "n" found at 14 to 15.
Text "f" found at 16 to 17.
Text "x" found at 18 to 19.
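The same technique works for digits. As a rough sketch (the class name SubtractionSketch and the helper matched are made up for this illustration), the class [0-9&&[^4-6]] matches every digit except 4, 5 and 6:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SubtractionSketch {
    // Hypothetical helper: concatenates every match found in the input.
    static String matched(String regex, String input) {
        StringBuilder sb = new StringBuilder();
        Matcher matcher = Pattern.compile(regex).matcher(input);
        while (matcher.find()) {
            sb.append(matcher.group());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String digits = "0123456789";
        // All digits minus 4-6: prints 0123789
        System.out.println(matched("[0-9&&[^4-6]]", digits));
        // The same set written as two ranges: prints 0123789
        System.out.println(matched("[0-37-9]", digits));
    }
}
```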

How do I write a negated character class in regex?

A negation class is a character class that begins with the "^" metacharacter, which excludes the set of characters defined within the square brackets. For example, the negation class in h[^ao]t in the example below matches only the word hit and excludes the words hat and hot.

package org.kodejava.example.util.regex;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CharacterClassesNegationClassDemo {
    public static void main(String[] args) {
        //
        // Defines a regular expression that will search all
        // sequences of string that begin with 'h' and end with 't'
        // and have a middle letter except those appearing to the
        // right of the ^ character within the square brackets
        // ('a' and 'o')
        //
        String regex = "h[^ao]t";

        //
        // Compiles the pattern and obtains the matcher object.
        //
        Pattern pattern = Pattern.compile(regex);
        Matcher matcher =
                pattern.matcher("Wow, that hot hat will make a hit");

        //
        // Find every match and print it.
        //
        while (matcher.find()) {
            System.out.format("Text \"%s\" found at %d to %d.%n",
                    matcher.group(), matcher.start(), matcher.end());
        }
    }
}

The program outputs the following result:

Text "hit" found at 30 to 33.
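Two points about negated classes are easy to miss: they are handy for stripping unwanted characters, and they still have to consume exactly one character. The sketch below illustrates both; the class name NegationSketch, the helper digitsOnly and the sample strings are assumptions made for illustration.

```java
class NegationSketch {
    // Hypothetical helper: removes everything that is not a digit.
    static String digitsOnly(String input) {
        return input.replaceAll("[^0-9]", "");
    }

    public static void main(String[] args) {
        // Keep only the digits: prints 4521
        System.out.println(digitsOnly("Order #4521-A"));

        // [^ao] must match one character, so "ht" fails: prints false
        System.out.println("ht".matches("h[^ao]t"));

        // Any character other than 'a' or 'o' will do, even a space:
        // prints true
        System.out.println("h t".matches("h[^ao]t"));
    }
}
```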

How do I write a range character class in regex?

To define a character class that includes a range of values, put the "-" metacharacter between the first and last character to be matched, for example [a-e]. You can also specify multiple ranges, like this: [a-zA-Z]. This will match any letter of the alphabet from a to z (lowercase) or A to Z (uppercase).

In the example below we match words that begin with bat and end with a single digit in the range 3 to 7.

package org.kodejava.example.util.regex;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CharacterClassesRangeClassDemo {
    public static void main(String[] args) {
        //
        // Define a regex that searches for every sequence that
        // begins with bat followed by a digit in the range [3-7]
        //
        String regex = "bat[3-7]";
        String input =
                "bat1, bat2, bat3, bat4, bat5, bat6, bat7, bat8";

        //
        // Compiles the given regular expression into a pattern.
        //
        Pattern pattern = Pattern.compile(regex);

        //
        // Creates a matcher that will match the given input
        // against this pattern.
        //
        Matcher matcher = pattern.matcher(input);

        //
        // Find every match and print it.
        //
        while (matcher.find()) {
            System.out.format("Text \"%s\" found at %d to %d.%n",
                    matcher.group(), matcher.start(),
                    matcher.end());
        }
    }
}

The program will match the following strings from the input:

Text "bat3" found at 12 to 16.
Text "bat4" found at 18 to 22.
Text "bat5" found at 24 to 28.
Text "bat6" found at 30 to 34.
Text "bat7" found at 36 to 40.
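The multiple-range form [a-zA-Z] mentioned above can be tried with a short sketch like the one below. The class name RangeSketch, the helper matched and the sample input are hypothetical and exist only for this illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RangeSketch {
    // Hypothetical helper: concatenates every match found in the input.
    static String matched(String regex, String input) {
        StringBuilder sb = new StringBuilder();
        Matcher matcher = Pattern.compile(regex).matcher(input);
        while (matcher.find()) {
            sb.append(matcher.group());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String input = "R2-D2 and C-3PO";
        // Two ranges in one class, letters only: prints RDandCPO
        System.out.println(matched("[a-zA-Z]", input));
        // A single range, digits only: prints 223
        System.out.println(matched("[0-9]", input));
    }
}
```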

How do I write a simple character class in regex?

A character class in the context of regular expressions is a set of characters enclosed within square brackets "[]". It specifies the characters that will successfully match a single character from the given input.

A simple class, the most basic form of character class, is formed simply by placing a set of characters side-by-side within square brackets. For example, the regular expression b[ai]t will match the words "bit" or "bat" because the pattern defines a character class accepting either "i" or "a" as the middle character.

package org.kodejava.example.util.regex;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CharacterClassesSimpleClassDemo {
    public static void main(String[] args) {
        //
        // Creating a simple class type of character classes.
        // The regular expression below will search for every
        // sequence that begins with 'b', ends with 't' and has
        // a middle letter of 'a' or 'i'.
        //
        String regex = "b[ai]t";

        //
        // Compiles the pattern and obtains the matcher object.
        //
        Pattern pattern = Pattern.compile(regex);
        Matcher matcher =
                pattern.matcher("I'm a little bit afraid of bats " +
                        "but not cats.");

        //
        // Find every match and print it.
        //
        while (matcher.find()) {
            System.out.format("Text \"%s\" found at %d to %d.%n",
                    matcher.group(), matcher.start(), matcher.end());
        }
    }
}

The program will print the following output:

Text "bit" found at 13 to 16.
Text "bat" found at 27 to 30.
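A simple class is also a convenient way to accept spelling variants. The following sketch replaces both spellings of a word in one pass; the class name SimpleClassSketch, the helper mask and the sample input are assumptions made for illustration.

```java
class SimpleClassSketch {
    // Hypothetical helper: replaces "gray" and "grey" with an X.
    static String mask(String input) {
        return input.replaceAll("gr[ae]y", "X");
    }

    public static void main(String[] args) {
        // "groy" is untouched because 'o' is not in the class.
        // Prints: X X groy
        System.out.println(mask("gray grey groy"));
    }
}
```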

How do I match a regex pattern case-insensitively?

To find the subsequences of the input that match a pattern while ignoring case, create the pattern with the compile(String regex, int flags) method and pass the Pattern.CASE_INSENSITIVE constant as the second argument.

package org.kodejava.example.util.regex;

import java.util.regex.Pattern;
import java.util.regex.Matcher;

public class IgnoreCaseDemo {
    public static void main(String[] args) {
        String sentence =
                "The quick brown fox and BROWN tiger jumps " +
                "over the lazy dog";

        Pattern pattern = Pattern.compile("brown", 
                Pattern.CASE_INSENSITIVE);
        Matcher matcher = pattern.matcher(sentence);

        while (matcher.find()) {
            System.out.format("Text \"%s\" found at %d to %d.%n",
                matcher.group(), matcher.start(), matcher.end());
        }
    }
}

Here is the result of the program:

Text "brown" found at 10 to 15.
Text "BROWN" found at 24 to 29.
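The same behavior can also be requested from inside the regex with the embedded flag (?i), which is useful when only a regex string can be supplied, for example to String.matches(). The sketch below compares the two; the class name IgnoreCaseSketch and the helper count are hypothetical names for this illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class IgnoreCaseSketch {
    // Hypothetical helper: counts the matches of the regex in the input.
    static int count(String regex, String input) {
        Matcher matcher = Pattern.compile(regex).matcher(input);
        int n = 0;
        while (matcher.find()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        String sentence = "The quick brown fox and BROWN tiger";
        // Case-sensitive by default: prints 1
        System.out.println(count("brown", sentence));
        // Embedded flag (?i) turns on case-insensitive matching: prints 2
        System.out.println(count("(?i)brown", sentence));
        // The flag also works with String.matches(): prints true
        System.out.println("BROWN".matches("(?i)brown"));
    }
}
```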

How do I determine if a string matches a pattern exactly?

If you want the entire string to match your regular expression pattern, you can use the Matcher.matches() method. This method returns true if and only if the entire input string matches the matcher’s pattern.

If the pattern only needs to match the beginning of the string, you can use the Matcher.lookingAt() method. You can find an example in How do I check if a string starts with a pattern?

package org.kodejava.example.util.regex;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MatcherMatchesExample {

    public static void main(String[] args) {
        String[] inputs = {
                "blue sky",
                "blue sea",
                "blue",
                "blue lagoon"
        };

        // Creates an instance of Pattern using the compile method.
        Pattern pattern = Pattern.compile("blue");

        int match = 0;
        for (String s : inputs) {
            // Creates a matcher that will match the given input
            // against this pattern.
            Matcher matcher = pattern.matcher(s);

            // Check if the input matches the pattern exactly and
            // increment the match counter.
            if (matcher.matches()) {
                match++;
            }

        }

        System.out.println("Number of input matched: " + match);
    }
}

The code above prints "Number of input matched: 1". Only one input matches, the one that is exactly equal to the pattern (“blue”), because each of the other three elements of the array contains another word besides blue.
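To compare matches() with lookingAt() and with find(), which searches anywhere in the input, here is a small sketch. The class name MatchModesSketch and the helper describe are made up for illustration; a fresh matcher is created for each call so that one check does not affect the next.

```java
import java.util.regex.Pattern;

class MatchModesSketch {
    // Hypothetical helper: reports, in order, whether the regex matches
    // the whole input, the start of the input, or any part of it.
    static String describe(String regex, String input) {
        Pattern pattern = Pattern.compile(regex);
        boolean whole = pattern.matcher(input).matches();
        boolean prefix = pattern.matcher(input).lookingAt();
        boolean anywhere = pattern.matcher(input).find();
        return whole + " " + prefix + " " + anywhere;
    }

    public static void main(String[] args) {
        System.out.println(describe("blue", "blue"));     // true true true
        System.out.println(describe("blue", "blue sky")); // false true true
        System.out.println(describe("blue", "sky blue")); // false false true
    }
}
```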