How do I get the pattern string of a SimpleDateFormat?

To format a java.util.Date object we use the SimpleDateFormat class. To get back the pattern string that was used to format the date we can call the toPattern() method of this class.

package org.kodejava.example.text;

import java.text.SimpleDateFormat;
import java.util.Date;

public class SimpleDateFormatToPattern {
    public static void main(String[] args) {
        SimpleDateFormat format = new SimpleDateFormat("EEEE, dd/MM/yyyy");

        //
        // Gets a pattern string describing this date format used by the
        // SimpleDateFormat object.
        //
        String pattern = format.toPattern();

        System.out.println("Pattern = " + pattern);
        System.out.println("Date    = " + format.format(new Date()));
    }
}

The program prints output like the following; the Date line depends on the current date and the default locale:

Pattern = EEEE, dd/MM/yyyy
Date    = Thursday, 16/06/2011
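
A common use of toPattern() is to find out which pattern a locale-based formatter uses. The sketch below assumes that DateFormat.getDateInstance() hands back a SimpleDateFormat, which is usually the case on standard JDKs but is not promised by the API, so it checks with instanceof first. The class name PatternLookup and the helper patternOf are hypothetical names used only for illustration.

```java
import java.text.DateFormat;
import java.text.SimpleDateFormat;
import java.util.Locale;

class PatternLookup {
    // Returns the pattern behind a DateFormat, or null when the
    // formatter is not a SimpleDateFormat and exposes no pattern.
    static String patternOf(DateFormat format) {
        if (format instanceof SimpleDateFormat) {
            return ((SimpleDateFormat) format).toPattern();
        }
        return null;
    }

    public static void main(String[] args) {
        // A formatter built from a pattern returns that same pattern.
        System.out.println(patternOf(new SimpleDateFormat("dd/MM/yyyy")));

        // The pattern for a locale style depends on the JDK's locale
        // data, so no fixed output is shown for this line.
        DateFormat medium =
                DateFormat.getDateInstance(DateFormat.MEDIUM, Locale.US);
        System.out.println("MEDIUM, Locale.US = " + patternOf(medium));
    }
}
```

Returning null for a non-SimpleDateFormat keeps the helper safe to call on any DateFormat; callers that need a pattern can fall back to one of their own.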

How do I break a paragraph into sentences?

This example shows how to use BreakIterator.getSentenceInstance() to break a paragraph into the sentences that compose it. To get the BreakIterator instance we call the getSentenceInstance() factory method and pass it a locale.

In the count(BreakIterator bi, String source) method we iterate over the break boundaries to extract the sentences that make up the text stored in the paragraph variable. Each sentence is the substring between two consecutive boundaries.

package org.kodejava.example.text;

import java.text.BreakIterator;
import java.util.Locale;

public class BreakSentenceExample {
    public static void main(String[] args) {
        String paragraph =
                "Line boundary analysis determines where a text " +
                "string can be broken when line-wrapping. The " +
                "mechanism correctly handles punctuation and " +
                "hyphenated words. Actual line breaking needs to " +
                "also consider the available line width and is " +
                "handled by higher-level software. ";

        BreakIterator iterator =
                BreakIterator.getSentenceInstance(Locale.US);

        int sentences = count(iterator, paragraph);
        System.out.println("Number of sentences: " + sentences);
    }

    private static int count(BreakIterator bi, String source) {
        int counter = 0;
        bi.setText(source);

        int lastIndex = bi.first();
        while (lastIndex != BreakIterator.DONE) {
            int firstIndex = lastIndex;
            lastIndex = bi.next();

            if (lastIndex != BreakIterator.DONE) {
                String sentence = source.substring(firstIndex, lastIndex);
                System.out.println("sentence = " + sentence);
                counter++;
            }
        }
        return counter;
    }
}

Our program will print the following result on the console screen:

sentence = Line boundary analysis determines where a text string can be broken when line-wrapping. 
sentence = The mechanism correctly handles punctuation and hyphenated words. 
sentence = Actual line breaking needs to also consider the available line width and is handled by higher-level software. 
Number of sentences: 3
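
As the output shows, each sentence keeps its trailing space. When the sentences are needed as values rather than printed, the same loop can collect them into a list and trim them. This is a minimal sketch; the class name SentenceSplitter and the method split are hypothetical names, not part of the original example.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class SentenceSplitter {
    // Splits text into sentences and trims the trailing whitespace
    // that BreakIterator keeps at the end of each sentence.
    static List<String> split(String text, Locale locale) {
        List<String> sentences = new ArrayList<>();
        BreakIterator bi = BreakIterator.getSentenceInstance(locale);
        bi.setText(text);

        int start = bi.first();
        for (int end = bi.next(); end != BreakIterator.DONE;
             start = end, end = bi.next()) {
            String sentence = text.substring(start, end).trim();
            if (!sentence.isEmpty()) {
                sentences.add(sentence);
            }
        }
        return sentences;
    }

    public static void main(String[] args) {
        String text = "The mechanism correctly handles punctuation "
                + "and hyphenated words. Actual line breaking is "
                + "handled by higher-level software. ";
        for (String sentence : split(text, Locale.US)) {
            System.out.println("[" + sentence + "]");
        }
    }
}
```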

How do I break a text or sentence into words?

At first it might look simple. We could just split the text on spaces using String.split(). But what if a word ends with a question mark (?) or an exclamation mark (!)? The punctuation stays attached to the word, and there may be other rules that we also need to take care of.
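
To make the problem concrete, here is a small sketch of the naive approach. The class name SplitBySpace, the method tokens, and the sample sentence are made up for illustration.

```java
import java.util.Arrays;

class SplitBySpace {
    // Splits on runs of whitespace; punctuation stays attached
    // to the neighbouring word.
    static String[] tokens(String text) {
        return text.trim().split("\\s+");
    }

    public static void main(String[] args) {
        String[] result = tokens("Is the dog lazy? Yes, very!");
        // Prints: [Is, the, dog, lazy?, Yes,, very!]
        System.out.println(Arrays.toString(result));
    }
}
```

A search for the word "lazy" with equals() would miss the token "lazy?" here.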

Using java.text.BreakIterator makes it much simpler. The class’s getWordInstance() factory method creates a BreakIterator instance for word breaks. Passing a locale to the factory method makes the iterator break the text or sentence according to the rules of that locale. This is really helpful when we are working with languages such as Japanese or Chinese, which do not separate words with spaces.

Let us see an example of using the BreakIterator below.

package org.kodejava.example.text;

import java.text.BreakIterator;
import java.util.Locale;

public class BreakIteratorExample {
    public static void main(String[] args) {
        String data = "The quick brown fox jumps over the lazy dog.";
        String search = "dog";

        //
        // Gets an instance of BreakIterator for word breaks for the
        // given locale. We can also get a BreakIterator without
        // specifying the locale, in which case the default locale is
        // used. The locale is important when we are working with
        // languages like Japanese or Chinese, where the word-break
        // rules may differ from English.
        //
        BreakIterator bi = BreakIterator.getWordInstance(Locale.US);

        //
        // Set the text string to be scanned.
        //
        bi.setText(data);

        //
        // Iterate over the boundaries. Each pair of consecutive
        // boundaries delimits a word, a space or a punctuation mark.
        //
        System.out.println("Iterates each word: ");
        int count = 0;
        int lastIndex = bi.first();
        while (lastIndex != BreakIterator.DONE) {
            int firstIndex = lastIndex;
            lastIndex = bi.next();

            if (lastIndex != BreakIterator.DONE
                    && Character.isLetterOrDigit(
                    data.charAt(firstIndex))) {
                String word = data.substring(firstIndex, lastIndex);
                System.out.println("'" + word + "' found at (" +
                        firstIndex + ", " + lastIndex + ")");

                //
                // Counts how many times the word dog occurs.
                //
                if (word.equalsIgnoreCase(search)) {
                    count++;
                }
            }
        }

        System.out.println("");
        System.out.println("Number of word '" + search +
                "' found = " + count);
    }
}

Here is the program output:

Iterates each word: 
'The' found at (0, 3)
'quick' found at (4, 9)
'brown' found at (10, 15)
'fox' found at (16, 19)
'jumps' found at (20, 25)
'over' found at (26, 30)
'the' found at (31, 34)
'lazy' found at (35, 39)
'dog' found at (40, 43)

Number of word 'dog' found = 1
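
The same loop can be wrapped in a small helper that returns the words as a list, which is convenient when the words are needed for further processing. This is only a sketch: the class name WordExtractor and the method words are hypothetical, and, like the example above, it treats a segment as a word when its first character is a letter or a digit.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class WordExtractor {
    // Collects the segments that start with a letter or digit,
    // skipping the spaces and punctuation between them.
    static List<String> words(String text, Locale locale) {
        List<String> result = new ArrayList<>();
        BreakIterator bi = BreakIterator.getWordInstance(locale);
        bi.setText(text);

        int start = bi.first();
        for (int end = bi.next(); end != BreakIterator.DONE;
             start = end, end = bi.next()) {
            if (Character.isLetterOrDigit(text.charAt(start))) {
                result.add(text.substring(start, end));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Prints: [Is, the, dog, lazy, Yes, very]
        System.out.println(words("Is the dog lazy? Yes, very!", Locale.US));
    }
}
```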