How do I detect non-ASCII characters in a string?

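The code below detects whether a given string contains non-ASCII characters. It uses the CharsetDecoder class from the java.nio.charset package to decode the string's bytes as US-ASCII. Any byte outside the ASCII range (0x00 to 0x7F) makes the decoder throw a CharacterCodingException.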

package org.kodejava.io;

import java.nio.charset.CharsetDecoder;
import java.nio.charset.CharacterCodingException;
import java.nio.CharBuffer;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class NonAsciiValidation {
    public static void main(String[] args) {
        // The first string contains a non-ASCII character (©), so decoding it throws an
        // exception. The second string contains only ASCII characters.
        byte[] invalidBytes = "Copyright © 2021 Kode Java Org".getBytes(StandardCharsets.UTF_8);
        byte[] validBytes = "Copyright (c) 2021 Kode Java Org".getBytes(StandardCharsets.UTF_8);

        // Create a US-ASCII decoder; by default it reports invalid input by throwing an exception.
        CharsetDecoder decoder = StandardCharsets.US_ASCII.newDecoder();
        try {
            CharBuffer buffer = decoder.decode(ByteBuffer.wrap(validBytes));
            System.out.println(Arrays.toString(buffer.array()));

            buffer = decoder.decode(ByteBuffer.wrap(invalidBytes));
            System.out.println(Arrays.toString(buffer.array()));
        } catch (CharacterCodingException e) {
            System.err.println("The information contains a non ASCII character(s).");
            e.printStackTrace();
        }
    }
}
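If you want a true/false answer instead of an exception, the same decoder check can be wrapped in a small helper. This is a minimal sketch, not part of the original example: the class name AsciiCheck and the method isAscii are hypothetical, and the © character is written as the escape \u00A9 so the result does not depend on the source file's encoding.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.StandardCharsets;

public class AsciiCheck {
    // Hypothetical helper: returns true when every byte decodes as US-ASCII.
    static boolean isAscii(byte[] bytes) {
        // Create a fresh decoder per call because CharsetDecoder is not thread-safe.
        CharsetDecoder decoder = StandardCharsets.US_ASCII.newDecoder();
        try {
            decoder.decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            // Bytes above 0x7F are reported as malformed input.
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] valid = "Copyright (c) 2021 Kode Java Org".getBytes(StandardCharsets.UTF_8);
        byte[] invalid = "Copyright \u00A9 2021 Kode Java Org".getBytes(StandardCharsets.UTF_8);

        System.out.println(isAscii(valid));   // true
        System.out.println(isAscii(invalid)); // false
    }
}
```

Catching the exception inside the helper keeps the calling code free of try/catch blocks when all you need is a validation result.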

Below is the result of the program:

[C, o, p, y, r, i, g, h, t,  , (, c, ),  , 2, 0, 2, 1,  , K, o, d, e,  , J, a, v, a,  , O, r, g]
The information contains a non ASCII character(s).
java.nio.charset.MalformedInputException: Input length = 1
    at java.base/java.nio.charset.CoderResult.throwException(CoderResult.java:274)
    at java.base/java.nio.charset.CharsetDecoder.decode(CharsetDecoder.java:820)
    at org.kodejava.io.NonAsciiValidation.main(NonAsciiValidation.java:23)
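The exception tells you that the input is not ASCII but not where the offending character is. If you already have a String, you can also check it without converting it to bytes first. The sketch below shows two options under that assumption: CharsetEncoder.canEncode for a yes/no answer, and a plain loop that reports the index of the first char above 0x7F. The class name NonAsciiFinder and the method firstNonAscii are hypothetical.

```java
import java.nio.charset.StandardCharsets;

public class NonAsciiFinder {
    // Hypothetical helper: index of the first char outside the ASCII range (0-127), or -1 if none.
    static int firstNonAscii(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) > 0x7F) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // \u00A9 is the copyright sign used in the original example.
        String text = "Copyright \u00A9 2021 Kode Java Org";

        // canEncode reports whether the whole sequence can be encoded as US-ASCII.
        boolean ascii = StandardCharsets.US_ASCII.newEncoder().canEncode(text);
        System.out.println(ascii);               // false
        System.out.println(firstNonAscii(text)); // 10
    }
}
```

The loop works on UTF-16 chars, so a character outside the Basic Multilingual Plane is reported at the index of its high surrogate, which is also above 0x7F.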
Wayan
