How do I count the number of capturing groups?

Capturing groups are numbered by counting their opening parentheses from left to right. To find out how many groups an expression contains, call the groupCount() method on a Matcher object. The method returns an int giving the number of capturing groups in the matcher’s pattern. Group zero, which denotes the entire match, is not included in this count.

package org.kodejava.example.regex;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CountingGroupDemo {
    public static void main(String[] args) {
        // Define regex to find the word 'quick' or 'lazy' or 'dog'
        String regex = "(quick)|(lazy)|(dog)";
        String text = "the quick brown fox jumps over the lazy dog";

        // Obtain the required matcher
        Pattern pattern = Pattern.compile(regex);
        Matcher matcher = pattern.matcher(text);

        int groupCount = matcher.groupCount();
        System.out.println("Number of group = " + groupCount);

        // Find every match and print it
        while (matcher.find()) {
            for (int i = 0; i <= groupCount; i++) {
                // Group i substring
                System.out.println("Group " + i + ": " + matcher.group(i));
            }
        }
    }
}

The result of the program:

Number of group = 3
Group 0: quick
Group 1: quick
Group 2: null
Group 3: null
Group 0: lazy
Group 1: null
Group 2: lazy
Group 3: null
Group 0: dog
Group 1: null
Group 2: null
Group 3: dog
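
The numbering rule can be checked with a few more patterns. The sketch below is only an illustration: the class name GroupCountSketch, the helper countGroups and the sample patterns are assumptions, not part of the original example. It shows that nested groups are counted by their opening parentheses and that a non-capturing group (?:...) is not counted.

```java
import java.util.regex.Pattern;

class GroupCountSketch {
    // Hypothetical helper: number of capturing groups declared in the regex.
    // The input text is irrelevant for groupCount(), so an empty string is used.
    static int countGroups(String regex) {
        return Pattern.compile(regex).matcher("").groupCount();
    }

    public static void main(String[] args) {
        String[] regexes = {
                "(quick)|(lazy)|(dog)", // three sibling groups
                "((a)(b(c)))",          // four groups, counted by opening parenthesis
                "(?:a)(b)",             // the non-capturing group is skipped
                "abc"                   // no groups at all
        };
        for (String regex : regexes) {
            System.out.println(regex + " -> " + countGroups(regex));
        }
    }
}
```

Under these assumptions the four patterns report 3, 4, 1 and 0 groups.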

How do I combine character classes with quantifiers?

This example shows how to attach a quantifier to a character class or a capturing group in a regular expression.

package org.kodejava.example.regex;

import java.util.regex.Pattern;
import java.util.regex.Matcher;

public class CombineWithQuantifier {
    public static void main(String[] args) {
        // [abc]{3} --> apply the quantifier to a character class.
        // Find 'a' or 'b' or 'c', three times in a row.
        //
        // (abc){3} --> apply the quantifier to a capturing group.
        // Find 'abc', three times in a row.
        //
        // abc{3} --> apply the quantifier to the single character 'c'.
        // Find 'ab' followed by 'c' three times in a row ("abccc").
        String[] regexs = {"[abc]{3}", "(abc){3}", "abc{3}"};
        String text = "abcabcabcabcaba";

        for (String regex : regexs) {
            Pattern pattern = Pattern.compile(regex);
            Matcher matcher = pattern.matcher(text);

            // Find every match and print it
            System.out.format("Regex:  %s %n", regex);
            while (matcher.find()) {
                System.out.format("Text \"%s\" found at %d to %d.%n",
                    matcher.group(), matcher.start(),
                    matcher.end());
            }
            System.out.println("------------------------------");
        }
    }
}

This program will print the following output:

Regex:  [abc]{3} 
Text "abc" found at 0 to 3.
Text "abc" found at 3 to 6.
Text "abc" found at 6 to 9.
Text "abc" found at 9 to 12.
Text "aba" found at 12 to 15.
------------------------------
Regex:  (abc){3} 
Text "abcabcabc" found at 0 to 9.
------------------------------
Regex:  abc{3} 
------------------------------
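
The third expression finds nothing in the text above because the quantifier binds only to the character 'c'. As a small sketch (the class QuantifierScopeSketch, the helper firstMatch and the sample text are made up for illustration), the following input contains something for each of the three expressions to match:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuantifierScopeSketch {
    // Hypothetical helper: first match of regex in text, or null if none.
    static String firstMatch(String regex, String text) {
        Matcher matcher = Pattern.compile(regex).matcher(text);
        return matcher.find() ? matcher.group() : null;
    }

    public static void main(String[] args) {
        // Illustrative input, chosen so that every expression can match.
        String text = "abccc abcabcabc cab";

        System.out.println(firstMatch("abc{3}", text));   // abccc
        System.out.println(firstMatch("(abc){3}", text)); // abcabcabc
        System.out.println(firstMatch("[abc]{3}", text)); // abc
    }
}
```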

How do I use reluctant quantifiers in regex?

The reluctant quantifiers start the matcher at the beginning of the input string, then reluctantly eat one character at a time looking for a match. The last thing they try is the entire input string.

package org.kodejava.regex;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ReluctantQuantifierDemo {
    public static void main(String[] args) {
        String[] expressions =
                {"x??", "x*?", "x+?", "x{2}?", "x{2,}?", "x{2,5}?"};
        String input = "xxxxxxx";

        for (String expression : expressions) {
            Pattern pattern = Pattern.compile(expression);
            Matcher matcher = pattern.matcher(input);

            // Find every match and print it
            System.out.println("------------------------------");
            System.out.format("regex:  %s %n", expression);
            while (matcher.find()) {
                System.out.format("Text \"%s\" found at %d to %d%n",
                        matcher.group(), matcher.start(),
                        matcher.end());
            }
        }
    }
}

The snippet prints the following results:

------------------------------
regex:  x?? 
Text "" found at 0 to 0
Text "" found at 1 to 1
Text "" found at 2 to 2
Text "" found at 3 to 3
Text "" found at 4 to 4
Text "" found at 5 to 5
Text "" found at 6 to 6
Text "" found at 7 to 7
------------------------------
regex:  x*? 
Text "" found at 0 to 0
Text "" found at 1 to 1
Text "" found at 2 to 2
Text "" found at 3 to 3
Text "" found at 4 to 4
Text "" found at 5 to 5
Text "" found at 6 to 6
Text "" found at 7 to 7
------------------------------
regex:  x+? 
Text "x" found at 0 to 1
Text "x" found at 1 to 2
Text "x" found at 2 to 3
Text "x" found at 3 to 4
Text "x" found at 4 to 5
Text "x" found at 5 to 6
Text "x" found at 6 to 7
------------------------------
regex:  x{2}? 
Text "xx" found at 0 to 2
Text "xx" found at 2 to 4
Text "xx" found at 4 to 6
------------------------------
regex:  x{2,}? 
Text "xx" found at 0 to 2
Text "xx" found at 2 to 4
Text "xx" found at 4 to 6
------------------------------
regex:  x{2,5}? 
Text "xx" found at 0 to 2
Text "xx" found at 2 to 4
Text "xx" found at 4 to 6
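
The difference from a greedy quantifier is easier to see when the quantified part is followed by more pattern. The sketch below is an illustration under assumed names (GreedyVsReluctantSketch, findAll) and an assumed sample text; it compares the greedy <.+> with the reluctant <.+?>.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GreedyVsReluctantSketch {
    // Hypothetical helper: collects every match of regex in text, in order.
    static List<String> findAll(String regex, String text) {
        List<String> matches = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(text);
        while (matcher.find()) {
            matches.add(matcher.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        String text = "<b>bold</b>"; // illustrative input

        // Greedy: .+ grabs as much as possible, so one long match is found.
        System.out.println(findAll("<.+>", text));  // [<b>bold</b>]

        // Reluctant: .+? stops at the first '>' that completes a match.
        System.out.println(findAll("<.+?>", text)); // [<b>, </b>]
    }
}
```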

How do I use possessive quantifiers in regex?

The possessive quantifiers always eat the entire input string, trying once (and only once) for a match. Unlike the greedy quantifiers, possessive quantifiers never back off, even if doing so would allow the overall match to succeed.

package org.kodejava.regex;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PossessiveQuantifierDemo {
    public static void main(String[] args) {
        String[] regexs = {
                "x?+",
                "x*+",
                "x++",
                "x{2}+",
                "x{2,}+",
                "x{2,5}+"
        };
        String input = "xxxxxxx";

        for (String r : regexs) {
            Pattern pattern = Pattern.compile(r);
            Matcher matcher = pattern.matcher(input);

            // Find every match and print it
            System.out.format("Regex:  %s%n", r);
            while (matcher.find()) {
                System.out.format("Text \"%s\" found at %d to %d.%n",
                        matcher.group(), matcher.start(),
                        matcher.end());
            }
            System.out.println("------------------------------");
        }
    }
}

The output of the code snippet above is:

Regex:  x?+
Text "x" found at 0 to 1.
Text "x" found at 1 to 2.
Text "x" found at 2 to 3.
Text "x" found at 3 to 4.
Text "x" found at 4 to 5.
Text "x" found at 5 to 6.
Text "x" found at 6 to 7.
Text "" found at 7 to 7.
------------------------------
Regex:  x*+
Text "xxxxxxx" found at 0 to 7.
Text "" found at 7 to 7.
------------------------------
Regex:  x++
Text "xxxxxxx" found at 0 to 7.
------------------------------
Regex:  x{2}+
Text "xx" found at 0 to 2.
Text "xx" found at 2 to 4.
Text "xx" found at 4 to 6.
------------------------------
Regex:  x{2,}+
Text "xxxxxxx" found at 0 to 7.
------------------------------
Regex:  x{2,5}+
Text "xxxxx" found at 0 to 5.
Text "xx" found at 5 to 7.
------------------------------
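
The "never back off" behaviour only becomes visible when something must match after the quantified part. A minimal sketch, assuming the hypothetical class PossessiveVsGreedySketch, helper found and sample text, compares the greedy .*foo with the possessive .*+foo:

```java
import java.util.regex.Pattern;

class PossessiveVsGreedySketch {
    // Hypothetical helper: true if regex matches somewhere in text.
    static boolean found(String regex, String text) {
        return Pattern.compile(regex).matcher(text).find();
    }

    public static void main(String[] args) {
        String text = "xfooxxxxxxfoo"; // illustrative input

        // Greedy: .* takes the whole input, then backs off until "foo" fits.
        System.out.println(found(".*foo", text));  // true

        // Possessive: .*+ takes the whole input and never gives anything back,
        // so nothing is left for "foo" and the match fails.
        System.out.println(found(".*+foo", text)); // false
    }
}
```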

How do I write an embedded flag expression?

It’s also possible to enable various flags using embedded flag expressions. Embedded flag expressions are an alternative to the two-argument version of compile, and are specified in the regular expression itself. The example below uses the (?i) flag expression to enable case-insensitive matching.

Other flag expressions are listed below:

  • (?x), equivalent to Pattern.COMMENTS
  • (?m), equivalent to Pattern.MULTILINE
  • (?s), equivalent to Pattern.DOTALL
  • (?u), equivalent to Pattern.UNICODE_CASE
  • (?d), equivalent to Pattern.UNIX_LINES

package org.kodejava.regex;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EmbeddedFlagDemo {
    public static void main(String[] args) {
        // Define a regex that starts with (?i) to enable
        // case-insensitive matching
        String regex = "(?i)the";
        String text = "The quick brown fox jumps over the lazy dog";

        // Obtain the required matcher
        Pattern pattern = Pattern.compile(regex);
        Matcher matcher = pattern.matcher(text);

        // Find every match and print it
        while (matcher.find()) {
            System.out.format("Text \"%s\" found at %d to %d.%n",
                    matcher.group(), matcher.start(),
                    matcher.end());
        }
    }
}

The result of the program is:

Text "The" found at 0 to 3.
Text "the" found at 31 to 34.
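
For comparison, the same search can be written with the two-argument version of compile. The sketch below assumes a hypothetical class FlagEquivalenceSketch and helper countMatches; it counts the matches for the embedded flag, the Pattern.CASE_INSENSITIVE constant, and no flag at all.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FlagEquivalenceSketch {
    // Hypothetical helper: number of matches of pattern in text.
    static int countMatches(Pattern pattern, String text) {
        Matcher matcher = pattern.matcher(text);
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String text = "The quick brown fox jumps over the lazy dog";

        Pattern embedded = Pattern.compile("(?i)the");
        Pattern explicit = Pattern.compile("the", Pattern.CASE_INSENSITIVE);
        Pattern noFlag = Pattern.compile("the");

        System.out.println(countMatches(embedded, text)); // 2
        System.out.println(countMatches(explicit, text)); // 2
        System.out.println(countMatches(noFlag, text));   // 1
    }
}
```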

How do I use quantifiers in regex?

A quantifier following a subsequence of a pattern determines the possibilities for how that subsequence of a pattern can repeat. Quantifiers allow you to specify the number of occurrences to match against.

Quantifiers

  • X? : X, once or not at all
  • X* : X, zero or more times
  • X+ : X, one or more times
  • X{n} : X, exactly n times
  • X{n,} : X, at least n times
  • X{n,m} : X, at least n but not more than m times

package org.kodejava.regex;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexQuantifierDemo {
    public static void main(String[] args) {
        String[] expressions =
                {"x?", "x*", "x+", "x{2}", "x{2,}", "x{2,5}"};

        String input = "xxxxxx yyyxxxxxx zzzxxxxxx";

        for (String expression : expressions) {
            // Compiles the given regular expression into a
            // pattern and creates a matcher that will match
            // the given input against this pattern.
            Pattern pattern = Pattern.compile(expression);
            Matcher matcher = pattern.matcher(input);

            // Find every match and print it
            System.out.format("regex:  %s %n", expression);
            while (matcher.find()) {
                System.out.format("Text \"%s\" found at %d to %d%n",
                        matcher.group(), matcher.start(),
                        matcher.end());
            }
            System.out.println("------------------------------");
        }
    }
}

Here is the result of the program:

regex:  x? 
Text "x" found at 0 to 1
Text "x" found at 1 to 2
Text "x" found at 2 to 3
Text "x" found at 3 to 4
Text "x" found at 4 to 5
Text "x" found at 5 to 6
Text "" found at 6 to 6
Text "" found at 7 to 7
Text "" found at 8 to 8
Text "" found at 9 to 9
Text "x" found at 10 to 11
Text "x" found at 11 to 12
Text "x" found at 12 to 13
Text "x" found at 13 to 14
Text "x" found at 14 to 15
Text "x" found at 15 to 16
Text "" found at 16 to 16
Text "" found at 17 to 17
Text "" found at 18 to 18
Text "" found at 19 to 19
Text "x" found at 20 to 21
Text "x" found at 21 to 22
Text "x" found at 22 to 23
Text "x" found at 23 to 24
Text "x" found at 24 to 25
Text "x" found at 25 to 26
Text "" found at 26 to 26
------------------------------
regex:  x* 
Text "xxxxxx" found at 0 to 6
Text "" found at 6 to 6
Text "" found at 7 to 7
Text "" found at 8 to 8
Text "" found at 9 to 9
Text "xxxxxx" found at 10 to 16
Text "" found at 16 to 16
Text "" found at 17 to 17
Text "" found at 18 to 18
Text "" found at 19 to 19
Text "xxxxxx" found at 20 to 26
Text "" found at 26 to 26
------------------------------
regex:  x+ 
Text "xxxxxx" found at 0 to 6
Text "xxxxxx" found at 10 to 16
Text "xxxxxx" found at 20 to 26
------------------------------
regex:  x{2} 
Text "xx" found at 0 to 2
Text "xx" found at 2 to 4
Text "xx" found at 4 to 6
Text "xx" found at 10 to 12
Text "xx" found at 12 to 14
Text "xx" found at 14 to 16
Text "xx" found at 20 to 22
Text "xx" found at 22 to 24
Text "xx" found at 24 to 26
------------------------------
regex:  x{2,} 
Text "xxxxxx" found at 0 to 6
Text "xxxxxx" found at 10 to 16
Text "xxxxxx" found at 20 to 26
------------------------------
regex:  x{2,5} 
Text "xxxxx" found at 0 to 5
Text "xxxxx" found at 10 to 15
Text "xxxxx" found at 20 to 25
------------------------------
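
The bounds of X{n,m} can also be checked against a whole input rather than searched for inside it. A minimal sketch, assuming the hypothetical class QuantifierBoundsSketch and helper isTwoToFiveX, uses Pattern.matches, which succeeds only when the entire input matches:

```java
import java.util.regex.Pattern;

class QuantifierBoundsSketch {
    // Hypothetical helper: true if the whole input is 2 to 5 'x' characters.
    static boolean isTwoToFiveX(String input) {
        return Pattern.matches("x{2,5}", input);
    }

    public static void main(String[] args) {
        String[] inputs = {"x", "xx", "xxxxx", "xxxxxx"};
        for (String input : inputs) {
            // Expected under these assumptions: false, true, true, false
            System.out.println(input + " -> " + isTwoToFiveX(input));
        }
    }
}
```

Unlike find(), which located "xxxxx" inside a longer run of x characters above, matching the entire input rejects six x characters.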

How do I use capturing groups in regex?

Capturing groups are a way to treat multiple characters as a single unit. They are created by placing the characters to be grouped inside a set of parentheses. For example, the regular expression (dog) creates a single group containing the letters d, o and g.

Regular expressions can also define other capturing groups that correspond to parts of the pattern. Each pair of parentheses in a regular expression defines a separate capturing group in addition to the group that the whole expression defines.

package org.kodejava.regex;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CapturingGroupDemo {
    public static void main(String[] args) {
        // Define regex to find the word 'the' or 'quick'
        String regex = "(the)|(quick)";
        String text = "the quick brown fox jumps over the lazy dog";

        // Compiles the given regular expression into a pattern and
        // creates a matcher that will match the given input against
        // this pattern.
        Pattern pattern = Pattern.compile(regex);
        Matcher matcher = pattern.matcher(text);

        // Find every match and print it
        while (matcher.find()) {
            System.out.format("Text \"%s\" found at %d to %d.%n",
                    matcher.group(), matcher.start(), matcher.end());
        }
    }
}

The results of the program are:

Text "the" found at 0 to 3.
Text "quick" found at 4 to 9.
Text "the" found at 31 to 34.
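
The program above prints only the whole match. To read what each pair of parentheses captured, pass the group number to group(int). The sketch below is an illustration: the class GroupExtractionSketch, the helper dateParts, the date pattern and the sample sentence are all assumptions.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupExtractionSketch {
    // Illustrative pattern: year, month and day as groups 1, 2 and 3.
    private static final Pattern DATE =
            Pattern.compile("(\\d{4})-(\\d{2})-(\\d{2})");

    // Hypothetical helper: returns {whole match, year, month, day},
    // or an empty array when the text contains no date.
    static String[] dateParts(String text) {
        Matcher matcher = DATE.matcher(text);
        if (!matcher.find()) {
            return new String[0];
        }
        return new String[] {
                matcher.group(0), // the group defined by the whole expression
                matcher.group(1),
                matcher.group(2),
                matcher.group(3)
        };
    }

    public static void main(String[] args) {
        String[] parts = dateParts("Released on 2024-03-15."); // made-up text
        for (int i = 0; i < parts.length; i++) {
            System.out.println("Group " + i + ": " + parts[i]);
        }
    }
}
```

With this made-up input, group 0 holds 2024-03-15 and groups 1 to 3 hold 2024, 03 and 15.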

How do I do boundary matching in regex?

If you want to find a pattern at a more precise position, for example at the beginning or the end of a line, you can use a boundary matcher. Boundary matchers are special sequences in a regular expression that match a particular position, such as a line or word boundary, rather than a character.

Here is the list:

Matcher  Matches
^        The beginning of a line
$        The end of a line
\b       A word boundary
\B       A non-word boundary
\A       The beginning of the input
\G       The end of the previous match
\Z       The end of the input but for the final terminator, if any
\z       The end of the input

Some examples:

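  • ^Java will find the word Java at the beginning of a line. By default this means the beginning of the input; with Pattern.MULTILINE it means the beginning of any line.
  • Java$ will find the word Java at the end of a line. By default this means the end of the input; with Pattern.MULTILINE it means the end of any line.
  • \bJ..a\b will find a four-character sequence that begins with 'J' and ends with 'a', such as Java, delimited by word boundaries.

package org.kodejava.regex;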

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BoundaryMatcherDemo {
    public static void main(String[] args) {
        // Define regex to find the word "dog" at the end of the line.
        String regex = "dog$";

        // Compiles the pattern and obtains the matcher object.
        Pattern pattern = Pattern.compile(regex);
        Matcher matcher = pattern.matcher(
                "The quick brown fox jumps over the lazy dog");

        // Find every match and print it
        while (matcher.find()) {
            System.out.format("Text \"%s\" found at %d to %d.%n",
                    matcher.group(), matcher.start(), matcher.end());
        }
    }
}

This program outputs the following result:

Text "dog" found at 40 to 43.
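
The effect of the line and word boundary matchers can be compared on a multi-line input. This is only a sketch: the class BoundarySketch, the helper count and the three-line sample text are assumptions made for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BoundarySketch {
    // Hypothetical helper: number of matches of pattern in text.
    static int count(Pattern pattern, String text) {
        Matcher matcher = pattern.matcher(text);
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // Illustrative three-line input.
        String text = "Java is fun\nI like Java\nJavaScript is not Java";

        // No boundary: also counts the "Java" inside "JavaScript".
        System.out.println(count(Pattern.compile("Java"), text));       // 4

        // Word boundaries on both sides exclude "JavaScript".
        System.out.println(count(Pattern.compile("\\bJava\\b"), text)); // 3

        // Without MULTILINE, ^ matches only at the start of the input.
        System.out.println(count(Pattern.compile("^Java"), text));      // 1

        // With MULTILINE, ^ also matches after each line terminator.
        System.out.println(
                count(Pattern.compile("^Java", Pattern.MULTILINE), text)); // 2
    }
}
```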

How do I use the logical OR operator in regex?

You can use the | operator (logical OR) to match the expression on either its left or its right side. For example, (t|T) will match either t or T in the input string.

package org.kodejava.regex;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LogicalOrRegexDemo {
    public static void main(String[] args) {
        // Define regex that will search characters 't' or 'T'
        String regex = "(t|T)";

        // Compiles the pattern and obtains the matcher object.
        Pattern pattern = Pattern.compile(regex);
        String input = "The quick brown fox jumps over the lazy dog";
        Matcher matcher = pattern.matcher(input);

        // Find every match and print it
        while (matcher.find()) {
            System.out.format("Text \"%s\" found at %d to %d.%n",
                    matcher.group(), matcher.start(), matcher.end());
        }
    }
}

The program prints the following result:

Text "T" found at 0 to 1.
Text "t" found at 31 to 32.
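
The alternatives are not limited to single characters. As a brief sketch (the class AlternationSketch and the helper findAll are hypothetical names), the operator can sit inside a larger expression or choose between whole words:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AlternationSketch {
    // Hypothetical helper: collects every match of regex in text, in order.
    static List<String> findAll(String regex, String text) {
        List<String> matches = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(text);
        while (matcher.find()) {
            matches.add(matcher.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        String text = "The quick brown fox jumps over the lazy dog";

        // The group limits the alternation to the first letter.
        System.out.println(findAll("(t|T)he", text));        // [The, the]

        // For single characters, a character class gives the same result.
        System.out.println(findAll("[tT]he", text));         // [The, the]

        // Whole words as alternatives; "cat" does not occur in the text.
        System.out.println(findAll("quick|lazy|cat", text)); // [quick, lazy]
    }
}
```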

How do I use predefined character classes in regex?

In regex, you also have a number of predefined character classes that provide you with a shorthand notation for commonly used sets of characters.

Here is the list:

Predefined Class  Matches
.                 Any character
\d                Any digit, shorthand for [0-9]
\D                A non-digit, [^0-9]
\s                A whitespace character, [ \t\n\x0B\f\r]
\S                A non-whitespace character, [^\s]
\w                A word character, [a-zA-Z_0-9]
\W                A non-word character, [^\w]

package org.kodejava.regex;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PredefinedCharacterClassDemo {
    public static void main(String[] args) {
        // Define a regex that searches for a whitespace character
        // followed by 'f' and any two characters.
        String regex = "\\sf..";

        // Compiles the pattern and obtains the matcher object.
        Pattern pattern = Pattern.compile(regex);
        Matcher matcher = pattern.matcher(
                "The quick brown fox jumps over the lazy dog");

        // find every match and print it
        while (matcher.find()) {
            System.out.format("Text \"%s\" found at %d to %d.%n",
                    matcher.group(), matcher.start(), matcher.end());
        }
    }
}

This program outputs the following result:

Text " fox" found at 15 to 19.
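
The predefined classes are often combined with a quantifier to pull tokens out of a string. The sketch below is illustrative only: the class PredefinedClassSketch, the helper findAll and the sample text are assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PredefinedClassSketch {
    // Hypothetical helper: collects every match of regex in text, in order.
    static List<String> findAll(String regex, String text) {
        List<String> matches = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(text);
        while (matcher.find()) {
            matches.add(matcher.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        String text = "Room 101, floor 3"; // illustrative input

        // \d+ : runs of digits -> 101 and 3
        System.out.println(findAll("\\d+", text));

        // \w+ : runs of word characters -> Room, 101, floor, 3
        System.out.println(findAll("\\w+", text));

        // \S+ : runs of non-whitespace, so the comma stays attached to 101
        System.out.println(findAll("\\S+", text));
    }
}
```

Note that the backslash has to be doubled inside a Java string literal, as in the "\\sf.." expression above.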