What is the purpose of the String.strip() method in Java 11?

The String.strip() method in Java 11 removes whitespace from both the beginning and the end of a string. It is very similar to the String.trim() method available in earlier versions of Java, but there is a key difference between them.

Here’s the difference:

  • String.strip(): Introduced in Java 11, strip() uses a Unicode-aware definition of whitespace, the one implemented by Character.isWhitespace(). Besides the ordinary space, it removes other Unicode space characters such as the thin space \u2009, but not non-breaking spaces such as \u00A0.
  • String.trim(): Available since Java 1.0, trim() is more limited. It removes any leading or trailing character whose code point is less than or equal to U+0020 (decimal 32): the space plus the ASCII control characters, including tab and newline.

Here are examples of how they work:

package org.kodejava.lang;

public class StringStripExample {
    public static void main(String[] args) {
        // String.strip()
        String first = " \u2009Hello  ";
        System.out.println(first.strip()); // Outputs "Hello"

        // String.trim()
        String second = " \u2009Hello  ";
        System.out.println(second.trim()); // Outputs "\u2009Hello"
    }
}

Output:

Hello
 Hello

Thus, the strip() method handles the Unicode whitespace characters that trim() leaves in place, while trim() only removes the space and the ASCII control characters.
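One caveat is that strip() is not a strict superset of trim(). trim() removes every character up to U+0020, including control characters such as U+0000, which Character.isWhitespace() does not treat as whitespace. The following is a minimal sketch of that difference; the class name TrimVersusStrip and the helper describe() are made up for this illustration.

```java
class TrimVersusStrip {
    // Hypothetical helper: reports how many characters remain after each method.
    static String describe(String input) {
        return "trim=" + input.trim().length() + ", strip=" + input.strip().length();
    }

    public static void main(String[] args) {
        // Plain ASCII spaces: both methods remove them.
        System.out.println(describe("  Hello  "));          // trim=5, strip=5
        // Thin space U+2009: only strip() removes it.
        System.out.println(describe("\u2009Hello\u2009"));  // trim=7, strip=5
        // NUL U+0000: only trim() removes it.
        System.out.println(describe("\u0000Hello\u0000"));  // trim=5, strip=7
    }
}
```

Printing lengths instead of the strings themselves makes the result visible even though the characters involved are invisible on most consoles.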

There are also String.stripLeading() and String.stripTrailing() methods that were introduced in Java 11, and they are similar to the strip() method, but they only remove the whitespace characters from either the beginning or the end of the string, respectively.

Here is what they do:

  • String.stripLeading(): This method removes any leading whitespace from the string. “Leading” in this context means any whitespace characters at the beginning of the string.
  • String.stripTrailing(): This method removes any trailing whitespace from the string. “Trailing” in this context means any whitespace characters at the end of the string.

Both stripLeading() and stripTrailing() use the same definition of whitespace as the strip() method.

Here are examples of how they work:

package org.kodejava.lang;

public class StringStripLeadingTrailingExample {
    public static void main(String[] args) {
        // Strip leading whitespace
        String first = " \u2009Hello World  ";
        System.out.println(first.stripLeading());  // Outputs "Hello World  "

        // Strip trailing whitespace
        String second = " \u2009Hello World  ";
        System.out.println(second.stripTrailing()); // Outputs " \u2009Hello World"
    }
}

Output:

Hello World  
  Hello World

As demonstrated, stripLeading() removed the whitespace characters from the front of the string, and stripTrailing() removed the whitespace characters from the end of the string.
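Since the two methods cover one end each, chaining them should give the same result as strip(). Here is a small sketch to check that; the class name StripComposition and the helper stripBothEnds() are hypothetical names used only for this example.

```java
class StripComposition {
    // Hypothetical helper: strips both ends by chaining the one-sided methods.
    static String stripBothEnds(String input) {
        return input.stripLeading().stripTrailing();
    }

    public static void main(String[] args) {
        String text = " \u2009Hello World  ";
        // Compare the chained result with a single strip() call.
        System.out.println(stripBothEnds(text).equals(text.strip())); // true
        // Brackets make the remaining boundaries visible.
        System.out.println("[" + stripBothEnds(text) + "]");          // [Hello World]
    }
}
```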

While \u00A0 (the non-breaking space, or NBSP) is a space character in Unicode, the strip(), stripLeading(), and stripTrailing() methods do not treat it as whitespace, because they follow the Character.isWhitespace() definition of what constitutes a whitespace character.

According to the Java documentation, the Character.isWhitespace(char) method, which the strip() methods use, considers the following characters as whitespace:

  • ‘\t’ U+0009 HORIZONTAL TABULATION
  • ‘\n’ U+000A LINE FEED
  • ‘\u000B’ U+000B VERTICAL TABULATION
  • ‘\f’ U+000C FORM FEED
  • ‘\r’ U+000D CARRIAGE RETURN
  • ‘\u001C’ U+001C FILE SEPARATOR
  • ‘\u001D’ U+001D GROUP SEPARATOR
  • ‘\u001E’ U+001E RECORD SEPARATOR
  • ‘\u001F’ U+001F UNIT SEPARATOR
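  • Characters in the SPACE_SEPARATOR, LINE_SEPARATOR, or PARAGRAPH_SEPARATOR categories, except the non-breaking spaces ‘\u00A0’, ‘\u2007’, and ‘\u202F’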

The \u2009 (thin space) and \u0020 (space) are part of the SPACE_SEPARATOR category according to the Unicode standard, so they are stripped.

The \u00A0 (non-breaking space) also belongs to the SPACE_SEPARATOR category, but Character.isWhitespace() explicitly excludes the non-breaking spaces (\u00A0, \u2007, and \u202F), so it won’t be stripped.
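The rules above can be checked directly with Character.getType() and Character.isWhitespace(). The sketch below is only an illustration; the class name WhitespaceCheck and the helper strippable() are assumptions, not part of the JDK.

```java
class WhitespaceCheck {
    // Hypothetical helper: true when the code point counts as whitespace
    // under the Character.isWhitespace() definition.
    static boolean strippable(int codePoint) {
        return Character.isWhitespace(codePoint);
    }

    public static void main(String[] args) {
        // Space, thin space, and the three non-breaking spaces.
        int[] samples = {0x0020, 0x2009, 0x00A0, 0x2007, 0x202F};
        for (int cp : samples) {
            boolean spaceSeparator = Character.getType(cp) == Character.SPACE_SEPARATOR;
            System.out.printf("U+%04X spaceSeparator=%b isWhitespace=%b%n",
                    cp, spaceSeparator, strippable(cp));
        }
        // A non-breaking space survives strip(), so the length stays at 7.
        System.out.println("\u00A0Hello\u00A0".strip().length()); // 7
    }
}
```

All five code points report spaceSeparator=true, but only U+0020 and U+2009 report isWhitespace=true, which is why the non-breaking spaces are left in place.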

Wayan
