Matcher - Learn Java by Examples

How do I count the number of capturing groups?

By Wayan in Regex Last modified: May 31, 2023 0 Comment

Capturing groups are numbered by counting the opening parentheses from left to right. To find out how many groups are present in the expression, call the groupCount() method on a matcher object. The groupCount() method returns an int showing the number of capturing groups present in the matcher’s pattern.

package org.kodejava.example.regex;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CountingGroupDemo {
    public static void main(String[] args) {
        // Define regex to find the word 'quick' or 'lazy' or 'dog'
        String regex = "(quick)|(lazy)|(dog)";
        String text = "the quick brown fox jumps over the lazy dog";

        // Obtain the required matcher
        Pattern pattern = Pattern.compile(regex);
        Matcher matcher = pattern.matcher(text);

        int groupCount = matcher.groupCount();
        System.out.println("Number of group = " + groupCount);

        // Find every match and print it
        while (matcher.find()) {
            for (int i = 0; i <= groupCount; i++) {
                // Group i substring
                System.out.println("Group " + i + ": " + matcher.group(i));
            }
        }
    }
}

The result of the program:

Number of group = 3
Group 0: quick
Group 1: quick
Group 2: null
Group 3: null
Group 0: lazy
Group 1: null
Group 2: lazy
Group 3: null
Group 0: dog
Group 1: null
Group 2: null
Group 3: dog

How do I compile character classes with quantifier?

By Wayan in Regex Last modified: May 31, 2023 0 Comment

This example show you how to attach quantifier to character classes or capturing group in regular expressions.

package org.kodejava.example.regex;

import java.util.regex.Pattern;
import java.util.regex.Matcher;

public class CombineWithQuantifier {
    public static void main(String[] args) {
        // [abc]{3} --> apply quantifier in character class.
        // Find 'a' or 'b' or 'c', three times in a row.
        //
        // (abc){3} --> apply quantifier in capturing group.
        // Find 'abc', three times in a row.
        //
        // abc{3} --> apply quantifier in character class.
        // Find character 'c', three times in a row.
        String[] regexs = {"[abc]{3}", "(abc){3}", "abc{3}"};
        String text = "abcabcabcabcaba";

        for (String regex : regexs) {
            Pattern pattern = Pattern.compile(regex);
            Matcher matcher = pattern.matcher(text);

            // Find every match and print it
            System.out.format("Regex:  %s %n", regex);
            while (matcher.find()) {
                System.out.format("Text \"%s\" found at %d to %d.%n",
                    matcher.group(), matcher.start(),
                    matcher.end());
            }
            System.out.println("------------------------------");
        }
    }
}

This program will print the following output:

Regex:  [abc]{3} 
Text "abc" found at 0 to 3.
Text "abc" found at 3 to 6.
Text "abc" found at 6 to 9.
Text "abc" found at 9 to 12.
Text "aba" found at 12 to 15.
------------------------------
Regex:  (abc){3} 
Text "abcabcabc" found at 0 to 9.
------------------------------
Regex:  abc{3} 
------------------------------

How do I use reluctant quantifier regex?

By Wayan in Regex Last modified: May 31, 2023 2 Comments

The reluctant quantifiers start the matcher at the beginning of the input string, then reluctantly eat one character at a time looking for a match. The last thing they try is the entire input string.

package org.kodejava.regex;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ReluctantQuantifierDemo {
    public static void main(String[] args) {
        String[] expressions =
                {"x??", "x*?", "x+?", "x{2}?", "x{2,}?", "x{2,5}?"};
        String input = "xxxxxxx";

        for (String expression : expressions) {
            Pattern pattern = Pattern.compile(expression);
            Matcher matcher = pattern.matcher(input);

            // Find every match and print it
            System.out.println("------------------------------");
            System.out.format("regex:  %s %n", expression);
            while (matcher.find()) {
                System.out.format("Text \"%s\" found at %d to %d%n",
                        matcher.group(), matcher.start(),
                        matcher.end());
            }
        }
    }
}

The results of the snippet shown below:

regex:  x?? 
Text "" found at 0 to 0
Text "" found at 1 to 1
Text "" found at 2 to 2
Text "" found at 3 to 3
Text "" found at 4 to 4
Text "" found at 5 to 5
Text "" found at 6 to 6
Text "" found at 7 to 7
------------------------------
regex:  x*? 
Text "" found at 0 to 0
Text "" found at 1 to 1
Text "" found at 2 to 2
Text "" found at 3 to 3
Text "" found at 4 to 4
Text "" found at 5 to 5
Text "" found at 6 to 6
Text "" found at 7 to 7
------------------------------
regex:  x+? 
Text "x" found at 0 to 1
Text "x" found at 1 to 2
Text "x" found at 2 to 3
Text "x" found at 3 to 4
Text "x" found at 4 to 5
Text "x" found at 5 to 6
Text "x" found at 6 to 7
------------------------------
regex:  x{2}? 
Text "xx" found at 0 to 2
Text "xx" found at 2 to 4
Text "xx" found at 4 to 6
------------------------------
regex:  x{2,}? 
Text "xx" found at 0 to 2
Text "xx" found at 2 to 4
Text "xx" found at 4 to 6
------------------------------
regex:  x{2,5}? 
Text "xx" found at 0 to 2
Text "xx" found at 2 to 4
Text "xx" found at 4 to 6

How do I use possessive quantifier regex?

By Wayan in Regex Last modified: May 31, 2023 0 Comment

The possessive quantifiers always eat the entire input string, trying once (and only once) for a match. Unlike the greedy quantifiers, possessive quantifiers never back off, even if doing so would allow the overall match to succeed.

package org.kodejava.regex;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PossessiveQuantifierDemo {
    public static void main(String[] args) {
        String[] regexs = {
                "x?+",
                "x*+",
                "x++",
                "x{2}+",
                "x{2,}+",
                "x{2,5}+"
        };
        String input = "xxxxxxx";

        for (String r : regexs) {
            Pattern pattern = Pattern.compile(r);
            Matcher matcher = pattern.matcher(input);

            // Find every match and print it
            System.out.format("Regex:  %s%n", r);
            while (matcher.find()) {
                System.out.format("Text \"%s\" found at %d to %d.%n",
                        matcher.group(), matcher.start(),
                        matcher.end());
            }
            System.out.println("------------------------------");
        }
    }
}

The output of the code snippet above are:

Regex:  x?+
Text "x" found at 0 to 1.
Text "x" found at 1 to 2.
Text "x" found at 2 to 3.
Text "x" found at 3 to 4.
Text "x" found at 4 to 5.
Text "x" found at 5 to 6.
Text "x" found at 6 to 7.
Text "" found at 7 to 7.
------------------------------
Regex:  x*+
Text "xxxxxxx" found at 0 to 7.
Text "" found at 7 to 7.
------------------------------
Regex:  x++
Text "xxxxxxx" found at 0 to 7.
------------------------------
Regex:  x{2}+
Text "xx" found at 0 to 2.
Text "xx" found at 2 to 4.
Text "xx" found at 4 to 6.
------------------------------
Regex:  x{2,}+
Text "xxxxxxx" found at 0 to 7.
------------------------------
Regex:  x{2,5}+
Text "xxxxx" found at 0 to 5.
Text "xx" found at 5 to 7.
------------------------------

How do I write embedded flag expression?

By Wayan in Regex Last modified: May 31, 2023 0 Comment

It’s also possible to enable various flags using embedded flag expressions. Embedded flag expressions are an alternative to the two-argument version of compile, and are specified in the regular expression itself. The example below is use (?i) flag expression to enable case-insensitive matching.

Another flag expressions are listed below:

(?x), equivalent with Pattern.COMMENTS
(?m), equivalent with Pattern.MULTILINE
(?s), equivalent with Pattern.DOTTAL
(?u), equivalent with Pattern.UNICODE_CASE
(?d), equivalent with Pattern.UNIX_LINES

package org.kodejava.regex;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EmbeddedFlagDemo {
    public static void main(String[] args) {
        // Define regex which starting with (?i) to enable
        // case-insensitive matching
        String regex = "(?i)the";
        String text = "The quick brown fox jumps over the lazy dog";

        // Obtain the required matcher
        Pattern pattern = Pattern.compile(regex);
        Matcher matcher = pattern.matcher(text);

        // Find every match and print it
        while (matcher.find()) {
            System.out.format("Text \"%s\" found at %d to %d.%n",
                    matcher.group(), matcher.start(),
                    matcher.end());
        }
    }
}

The result of the program is:

Text "The" found at 0 to 3.
Text "the" found at 31 to 34.