How to truncate a string after n number of words?

package org.kodejava.example.lang;

public class GetNumberOfWordsFromString {
    public static void main(String[] args) {
        String text = "The quick brown fox jumps over the lazy dog.";

        String one = truncateAfterWords(1, text);
        System.out.println("1 = " + one);

        String two = truncateAfterWords(2, text);
        System.out.println("2 = " + two);

        String four = truncateAfterWords(4, text);
        System.out.println("4 = " + four);
    }

    /**
     * Truncate a string after n number of words.
     *
     * @param numberOfWords number of words to truncate after.
     * @param longString the long string.
     * @return truncated long string.
     */
    public static String truncateAfterWords(int numberOfWords, String longString)     {
        return longString.replaceAll("^((?:\W*\w+){" +
                numberOfWords + "}).*$", "$1");
    }
}

The result of the snippet:

1 = The
2 = The quick
4 = The quick brown fox

How to remove non ASCII characters from a string?

The code snippet below remove the characters from a string that is not inside the range of x20 and x7E ASCII code. The regex below strips non-printable and control characters. But it also keeps the linefeed character n (x0A) and the carriage return r (x0D) characters.

package org.kodejava.example.regex;

public class ReplaceNonAscii {
    public static void main(String[] args) {
        String str = "Thè quïck brøwn føx jumps over the lãzy dôg.";
        System.out.println("str = " + str);

        // Replace all non ascii chars in the string.
        str = str.replaceAll("[^\x0A\x0D\x20-\x7E]", "");
        System.out.println("str = " + str);
    }
}

Snippet output:

str = Thè quïck brøwn føx jumps over the lãzy dôg.
str = Th quck brwn fx jumps over the lzy dg.

How do I compile character classes with quantifier?

This example show you how to attach quantifier to character classes or capturing group in regular expressions.

package org.kodejava.example.util.regex;

import java.util.regex.Pattern;
import java.util.regex.Matcher;

public class CombineWithQuantifier {
    public static void main(String[] args) {
        //
        // [abc]{3} --> apply quantifier in character class.
        // Find 'a' or 'b' or 'c', three times in a row.
        //
        // (abc){3} --> apply quantifier in capturing group.
        // Find 'abc', three times in a row.
        //
        // abc{3} --> apply quantifier in character class.
        // Find character 'c', three times in a row.
        //
        String[] regexs = {"[abc]{3}", "(abc){3}", "abc{3}"};
        String text = "abcabcabcabcaba";

        for (String regex : regexs) {
            Pattern pattern = Pattern.compile(regex);
            Matcher matcher = pattern.matcher(text);

            //
            // Find every match and print it
            //
            System.out.println("------------------------------");
            System.out.format("Regex:  %s %n", regex);
            while (matcher.find()) {
                System.out.format("Text "%s" found at %d to %d.%n",
                        matcher.group(), matcher.start(),
                        matcher.end());
            }
        }
    }
}

This program will print the following output:

Regex:  [abc]{3} 
Text "abc" found at 0 to 3.
Text "abc" found at 3 to 6.
Text "abc" found at 6 to 9.
Text "abc" found at 9 to 12.
Text "aba" found at 12 to 15.
------------------------------
Regex:  (abc){3} 
Text "abcabcabc" found at 0 to 9.
------------------------------
Regex:  abc{3} 

How do I use reluctant quantifier regex?

The reluctant quantifiers start the matcher at the beginning of the input string, then reluctantly eat one character at a time looking for a match. The last thing they try is the entire input string.

package org.kodejava.example.util.regex;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ReluctantQuantifierDemo {
    public static void main(String[] args) {
        String[] expressions =
                {"x??", "x*?", "x+?", "x{2}?", "x{2,}?", "x{2,5}?"};
        String input = "xxxxxxx";

        for (String expression : expressions) {
            Pattern pattern = Pattern.compile(expression);
            Matcher matcher = pattern.matcher(input);

            // Find every match and print it
            System.out.println("------------------------------");
            System.out.format("regex:  %s %n", expression);
            while (matcher.find()) {
                System.out.format("Text \"%s\" found at %d to %d%n",
                        matcher.group(), matcher.start(),
                        matcher.end());
            }
        }
    }
}

The results of the snippet shown below:

regex:  x?? 
Text "" found at 0 to 0
Text "" found at 1 to 1
Text "" found at 2 to 2
Text "" found at 3 to 3
Text "" found at 4 to 4
Text "" found at 5 to 5
Text "" found at 6 to 6
Text "" found at 7 to 7
------------------------------
regex:  x*? 
Text "" found at 0 to 0
Text "" found at 1 to 1
Text "" found at 2 to 2
Text "" found at 3 to 3
Text "" found at 4 to 4
Text "" found at 5 to 5
Text "" found at 6 to 6
Text "" found at 7 to 7
------------------------------
regex:  x+? 
Text "x" found at 0 to 1
Text "x" found at 1 to 2
Text "x" found at 2 to 3
Text "x" found at 3 to 4
Text "x" found at 4 to 5
Text "x" found at 5 to 6
Text "x" found at 6 to 7
------------------------------
regex:  x{2}? 
Text "xx" found at 0 to 2
Text "xx" found at 2 to 4
Text "xx" found at 4 to 6
------------------------------
regex:  x{2,}? 
Text "xx" found at 0 to 2
Text "xx" found at 2 to 4
Text "xx" found at 4 to 6
------------------------------
regex:  x{2,5}? 
Text "xx" found at 0 to 2
Text "xx" found at 2 to 4
Text "xx" found at 4 to 6

How do I use possessive quantifier regex?

The possessive quantifiers always eat the entire input string, trying once (and only once) for a match. Unlike the greedy quantifiers, possessive quantifiers never back off, even if doing so would allow the overall match to succeed.

package org.kodejava.example.util.regex;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PossessiveQuantifierDemo {
    public static void main(String[] args) {
        String[] regexs = {
                "x?+",
                "x*+",
                "x++",
                "x{2}+",
                "x{2,}+",
                "x{2,5}+"
        };
        String input = "xxxxxxx";

        for (String r : regexs) {
            Pattern pattern = Pattern.compile(r);
            Matcher matcher = pattern.matcher(input);

            //
            // Find every match and print it
            //
            System.out.println("------------------------------");
            System.out.format("Regex:  %s %n", r);
            while (matcher.find()) {
                System.out.format("Text "%s" found at %d to %d.%n",
                        matcher.group(), matcher.start(), 
                        matcher.end());
            }
        }
    }
}