How do I split a string using Pattern.splitAsStream() method?

Pattern.splitAsStream() in Java is a method that allows us to split a string into a stream of substrings given a regex pattern. This can be useful when we’re working with large strings, and we want to process the substrings in a functional style using Java Streams.

Here’s a basic example of how you can use the Pattern.splitAsStream() method:

package org.kodejava.regex;

import java.util.regex.Pattern;
import java.util.stream.Stream;

public class SplitAsStreamExample {
    public static void main(String[] args) {
        Pattern pattern = Pattern.compile(",");
        String testStr = "one,two,three,four,five";
        Stream<String> words = pattern.splitAsStream(testStr);
        words.forEach(System.out::println);
    }
}

In this example, the pattern is a comma (,), and it is used to split the testStr string in the splitAsStream() call. This creates a Stream of strings. Each element in the stream is a substring of testStr that falls between the commas. The Stream.forEach() method is then used to print out each of these substrings. You can replace the comma with any regex pattern based on your requirements.

The result will be:

one
two
three
four
five

Consider a situation where we have a CSV (Comma Separated Values) file. We need to read the file, process each line and extract the fields. For this purpose, let’s use the BufferedReader, Pattern.splitAsStream(), and Java 8’s Stream API:

package org.kodejava.regex;

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class CSVProcessor {
    public static void main(String[] args) {
        Pattern pattern = Pattern.compile(",");
        String pathToFile = "country.csv";

        try (BufferedReader reader = new BufferedReader(new FileReader(pathToFile))) {
            List<String[]> records = reader.lines()
                    .map(line -> pattern.splitAsStream(line).toArray(String[]::new))
                    .toList();

            // Print each record
            records.forEach(record -> System.out.println(Arrays.toString(record)));
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

In this code snippet, we are reading a file line by line using a BufferedReader. Then for each line, we are splitting it into a Stream of fields using Pattern.splitAsStream(). This produces a Stream of fields for each line of the file. We then collect the fields into a list of String arrays, with each array representing a line in the CSV (as split by commas).

Finally, we print out each record (which is a String array) to the console.

Remember that Pattern.splitAsStream() returns a stream of String and we can use it effectively in a functional programming style, and it integrates very well with the Java’s Stream API.

How do I get pattern string of a SimpleDateFormat?

To format a java.util.Date object we use the SimpleDateFormat class. To get back the string pattern that were used to format the date we can use the toPattern() method of this class.

package org.kodejava.text;

import java.text.SimpleDateFormat;
import java.util.Date;

public class SimpleDateFormatToPattern {
    public static void main(String[] args) {
        SimpleDateFormat format = new SimpleDateFormat("EEEE, dd/MM/yyyy");

        // Gets a pattern string describing this date format used by the
        // SimpleDateFormat object.
        String pattern = format.toPattern();

        System.out.println("Pattern = " + pattern);
        System.out.println("Date    = " + format.format(new Date()));
    }
}

The result of the program will be as follow:

Pattern = EEEE, dd/MM/yyyy
Date    = Tuesday, 26/10/2021

How do I count the number of capturing groups?

Capturing groups are numbered by counting the opening parentheses from left to right. To find out how many groups are present in the expression, call the groupCount() method on a matcher object. The groupCount() method returns an int showing the number of capturing groups present in the matcher’s pattern.

package org.kodejava.example.regex;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CountingGroupDemo {
    public static void main(String[] args) {
        // Define regex to find the word 'quick' or 'lazy' or 'dog'
        String regex = "(quick)|(lazy)|(dog)";
        String text = "the quick brown fox jumps over the lazy dog";

        // Obtain the required matcher
        Pattern pattern = Pattern.compile(regex);
        Matcher matcher = pattern.matcher(text);

        int groupCount = matcher.groupCount();
        System.out.println("Number of group = " + groupCount);

        // Find every match and print it
        while (matcher.find()) {
            for (int i = 0; i <= groupCount; i++) {
                // Group i substring
                System.out.println("Group " + i + ": " + matcher.group(i));
            }
        }
    }
}

The result of the program:

Number of group = 3
Group 0: quick
Group 1: quick
Group 2: null
Group 3: null
Group 0: lazy
Group 1: null
Group 2: lazy
Group 3: null
Group 0: dog
Group 1: null
Group 2: null
Group 3: dog

How do I compile character classes with quantifier?

This example show you how to attach quantifier to character classes or capturing group in regular expressions.

package org.kodejava.example.regex;

import java.util.regex.Pattern;
import java.util.regex.Matcher;

public class CombineWithQuantifier {
    public static void main(String[] args) {
        // [abc]{3} --> apply quantifier in character class.
        // Find 'a' or 'b' or 'c', three times in a row.
        //
        // (abc){3} --> apply quantifier in capturing group.
        // Find 'abc', three times in a row.
        //
        // abc{3} --> apply quantifier in character class.
        // Find character 'c', three times in a row.
        String[] regexs = {"[abc]{3}", "(abc){3}", "abc{3}"};
        String text = "abcabcabcabcaba";

        for (String regex : regexs) {
            Pattern pattern = Pattern.compile(regex);
            Matcher matcher = pattern.matcher(text);

            // Find every match and print it
            System.out.format("Regex:  %s %n", regex);
            while (matcher.find()) {
                System.out.format("Text \"%s\" found at %d to %d.%n",
                    matcher.group(), matcher.start(),
                    matcher.end());
            }
            System.out.println("------------------------------");
        }
    }
}

This program will print the following output:

Regex:  [abc]{3} 
Text "abc" found at 0 to 3.
Text "abc" found at 3 to 6.
Text "abc" found at 6 to 9.
Text "abc" found at 9 to 12.
Text "aba" found at 12 to 15.
------------------------------
Regex:  (abc){3} 
Text "abcabcabc" found at 0 to 9.
------------------------------
Regex:  abc{3} 
------------------------------

How do I use reluctant quantifier regex?

The reluctant quantifiers start the matcher at the beginning of the input string, then reluctantly eat one character at a time looking for a match. The last thing they try is the entire input string.

package org.kodejava.regex;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ReluctantQuantifierDemo {
    public static void main(String[] args) {
        String[] expressions =
                {"x??", "x*?", "x+?", "x{2}?", "x{2,}?", "x{2,5}?"};
        String input = "xxxxxxx";

        for (String expression : expressions) {
            Pattern pattern = Pattern.compile(expression);
            Matcher matcher = pattern.matcher(input);

            // Find every match and print it
            System.out.println("------------------------------");
            System.out.format("regex:  %s %n", expression);
            while (matcher.find()) {
                System.out.format("Text \"%s\" found at %d to %d%n",
                        matcher.group(), matcher.start(),
                        matcher.end());
            }
        }
    }
}

The results of the snippet shown below:

regex:  x?? 
Text "" found at 0 to 0
Text "" found at 1 to 1
Text "" found at 2 to 2
Text "" found at 3 to 3
Text "" found at 4 to 4
Text "" found at 5 to 5
Text "" found at 6 to 6
Text "" found at 7 to 7
------------------------------
regex:  x*? 
Text "" found at 0 to 0
Text "" found at 1 to 1
Text "" found at 2 to 2
Text "" found at 3 to 3
Text "" found at 4 to 4
Text "" found at 5 to 5
Text "" found at 6 to 6
Text "" found at 7 to 7
------------------------------
regex:  x+? 
Text "x" found at 0 to 1
Text "x" found at 1 to 2
Text "x" found at 2 to 3
Text "x" found at 3 to 4
Text "x" found at 4 to 5
Text "x" found at 5 to 6
Text "x" found at 6 to 7
------------------------------
regex:  x{2}? 
Text "xx" found at 0 to 2
Text "xx" found at 2 to 4
Text "xx" found at 4 to 6
------------------------------
regex:  x{2,}? 
Text "xx" found at 0 to 2
Text "xx" found at 2 to 4
Text "xx" found at 4 to 6
------------------------------
regex:  x{2,5}? 
Text "xx" found at 0 to 2
Text "xx" found at 2 to 4
Text "xx" found at 4 to 6