How do I check if a character is a whitespace in Java?

Whitespace characters in Java (or programming in general) aren’t just the space ' ' character. It also includes other characters that create some form of space or break in the text. The most common ones include:

  • space ' '
  • tab '\t'
  • newline '\n'
  • carriage return '\r'
  • form feed '\f'.

All these characters fall into the category of whitespace characters.

Now, if we want to check if a character in Java is one of these whitespace characters, we can make use of the built-in method Character.isWhitespace(char ch). Character is a class in Java that provides a number of useful class (i.e., static) methods for working with characters. And the isWhitespace() method is one of them which checks if the provided character is a whitespace character.

Here is a simple code snippet:

package org.kodejava.lang;

public class CharacterIsWhitespace {
    public static void main(String[] args) {
        char ch = ' ';

        if (Character.isWhitespace(ch)) {
            System.out.println(ch + " is a whitespace character.");
        } else {
            System.out.println(ch + " is not a whitespace character.");
        }
    }
}

This code first defines a character ch and then uses Character.isWhitespace(ch) to check if it is a whitespace character. The isWhitespace() method returns true if the given character is a space, new line, tab, or other whitespace characters, false otherwise.

Here’s a little more expansive example:

package org.kodejava.lang;

import java.util.Arrays;
import java.util.List;

public class CharacterIsWhitespaceDemo {
    public static void main(String[] args) {
        List<Character> characters = Arrays.asList(' ', '\t', '\n', '\r', '\f', 'a', '1');
        for (char ch : characters) {
            if (Character.isWhitespace(ch)) {
                System.out.println("'" + ch + "' is a whitespace character.");
            } else {
                System.out.println("'" + ch + "' is not a whitespace character.");
            }
        }
    }
}

Output:

' ' is a whitespace character.
'   ' is a whitespace character.
'
' is a whitespace character.
' is a whitespace character.
'' is a whitespace character.
'a' is not a whitespace character.
'1' is not a whitespace character.

In this code snippet, we are checking and outputting whether each character in a list of characters is a whitespace character or not. The list includes a space, a tab, newline, carriage return, form feed, an alphabetic character, and a digit. The isWhitespace() method identifies correctly which ones are the whitespace characters.

The Character.isWhitespace(char ch) method in Java also considers Unicode whitespace. It checks for whitespace according to the Unicode standard. The method considers a character as a whitespace if and only if it is a Unicode space separator (category “Zs”), or if it is one of the following explicit characters:

  • U+0009, HORIZONTAL TABULATION (‘\t’)
  • U+000A, LINE FEED (‘\n’)
  • U+000B, VERTICAL TABULATION
  • U+000C, FORM FEED (‘\f’)
  • U+000D, CARRIAGE RETURN (‘\r’)

Here is an example of checking Unicode whitespace:

package org.kodejava.lang;

public class CharacterIsWhitespaceUnicode {
    public static void main(String[] args) {
        char ch = '\u2003';  // EM SPACE

        if (Character.isWhitespace(ch)) {
            System.out.println("Character '" + ch + "' (\\u2003) is a whitespace character.");
        } else {
            System.out.println("Character '" + ch + "' (\\u2003) is not a whitespace character.");
        }
    }
}

Output:

Character ' ' (\u2003) is a whitespace character.

In this example, \u2003 is a Unicode representation of the “EM SPACE” character, which is a type of space character in the Unicode standard. The isWhitespace() method correctly identifies it as a whitespace character.

How do I compile character classes with quantifier?

This example show you how to attach quantifier to character classes or capturing group in regular expressions.

package org.kodejava.example.regex;

import java.util.regex.Pattern;
import java.util.regex.Matcher;

public class CombineWithQuantifier {
    public static void main(String[] args) {
        // [abc]{3} --> apply quantifier in character class.
        // Find 'a' or 'b' or 'c', three times in a row.
        //
        // (abc){3} --> apply quantifier in capturing group.
        // Find 'abc', three times in a row.
        //
        // abc{3} --> apply quantifier in character class.
        // Find character 'c', three times in a row.
        String[] regexs = {"[abc]{3}", "(abc){3}", "abc{3}"};
        String text = "abcabcabcabcaba";

        for (String regex : regexs) {
            Pattern pattern = Pattern.compile(regex);
            Matcher matcher = pattern.matcher(text);

            // Find every match and print it
            System.out.format("Regex:  %s %n", regex);
            while (matcher.find()) {
                System.out.format("Text \"%s\" found at %d to %d.%n",
                    matcher.group(), matcher.start(),
                    matcher.end());
            }
            System.out.println("------------------------------");
        }
    }
}

This program will print the following output:

Regex:  [abc]{3} 
Text "abc" found at 0 to 3.
Text "abc" found at 3 to 6.
Text "abc" found at 6 to 9.
Text "abc" found at 9 to 12.
Text "aba" found at 12 to 15.
------------------------------
Regex:  (abc){3} 
Text "abcabcabc" found at 0 to 9.
------------------------------
Regex:  abc{3} 
------------------------------

How do I use predefined character classes regex?

In regex, you also have a number of predefined character classes that provide you with a shorthand notation for commonly used sets of characters.

Here are the list:

Predefined Class Matches
. Any character
d Any digit, shorthand for [0-9]
D A non digit, [^0-9]
s A whitespace character [^s]
S Any non whitespace character
w Word character [a-zA-Z_0-9]
W A non word character
package org.kodejava.regex;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PredefinedCharacterClassDemo {
    public static void main(String[] args) {
        // Define regex that will search a whitespace followed by f
        // and two any characters.
        String regex = "\\sf..";

        // Compiles the pattern and obtains the matcher object.
        Pattern pattern = Pattern.compile(regex);
        Matcher matcher = pattern.matcher(
                "The quick brown fox jumps over the lazy dog");

        // find every match and print it
        while (matcher.find()) {
            System.out.format("Text \"%s\" found at %d to %d.%n",
                    matcher.group(), matcher.start(), matcher.end());
        }
    }
}

This program output the following result:

Text " fox" found at 15 to 19.

How do I write union character class regex?

To create a single character class comprised of two or more separate character classes use unions. To create a union, simply nest one class inside the other, such as [0-3[7-9]].

package org.kodejava.regex;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CharacterClassUnionDemo {
    public static void main(String[] args) {
        // Defines regex that matches the number 0, 1, 2, 3, 7, 8, 9
        String regex = "[0-3[7-9]]";
        String input = "0123456789";

        // Compiles the given regular expression into a pattern.
        Pattern pattern = Pattern.compile(regex);
        Matcher matcher = pattern.matcher(input);

        // Find matches and print it
        while (matcher.find()) {
            System.out.format("Text \"%s\" found at %d to %d.%n",
                    matcher.group(), matcher.start(),
                    matcher.end());
        }
    }
}

Here is the result of the program:

Text "0" found at 0 to 1.
Text "1" found at 1 to 2.
Text "2" found at 2 to 3.
Text "3" found at 3 to 4.
Text "7" found at 7 to 8.
Text "8" found at 8 to 9.
Text "9" found at 9 to 10.

How do I write character class intersection regex?

You can use the && operator to combine classes that define a sets of characters. It will only match characters common to both classes (intersection).

package org.kodejava.regex;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CharacterClassIntersectionDemo {
    public static void main(String[] args) {
        // Define regex that will search characters from 'a' to 'z'
        // and is a 'c' or 'a' or 't' character.
        String regex = "[a-z&&[cat]]";

        // Compiles the given regular expression into a pattern.
        Pattern pattern = Pattern.compile(regex);
        Matcher matcher = pattern.matcher(
                "The quick brown fox jumps over the lazy dog");

        // Find every match and print it
        while (matcher.find()) {
            System.out.format("Text \"%s\" found at %d to %d.%n",
                    matcher.group(), matcher.start(), matcher.end());
        }
    }
}

The program print the following result:

Text "c" found at 7 to 8.
Text "t" found at 31 to 32.
Text "a" found at 36 to 37.