How do I break a paragraph into sentences?

This example shows how to use BreakIterator.getSentenceInstance() to break a paragraph into the sentences that compose it. To get the BreakIterator instance, we call the getSentenceInstance() factory method and pass it a Locale.

In the count(BreakIterator bi, String source) method, we iterate over the sentence boundaries of the text stored in the paragraph variable. Each pair of consecutive boundaries delimits one sentence, which we extract with substring().

package org.kodejava.example.text;

import java.text.BreakIterator;
import java.util.Locale;

public class BreakSentenceExample {
    public static void main(String[] args) {
        String paragraph =
                "Line boundary analysis determines where a text " +
                "string can be broken when line-wrapping. The " +
                "mechanism correctly handles punctuation and " +
                "hyphenated words. Actual line breaking needs to " +
                "also consider the available line width and is " +
                "handled by higher-level software. ";

        BreakIterator iterator =
                BreakIterator.getSentenceInstance(Locale.US);

        int sentences = count(iterator, paragraph);
        System.out.println("Number of sentences: " + sentences);
    }

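    private static int count(BreakIterator bi, String source) {
        int counter = 0;
        // Tell the iterator which text to analyze.
        bi.setText(source);

        // first() returns the first boundary, which is the start of the text.
        int lastIndex = bi.first();
        while (lastIndex != BreakIterator.DONE) {
            int firstIndex = lastIndex;
            // next() returns the following boundary, or DONE at the end.
            lastIndex = bi.next();

            if (lastIndex != BreakIterator.DONE) {
                // The text between two boundaries is one sentence.
                String sentence = source.substring(firstIndex, lastIndex);
                System.out.println("sentence = " + sentence);
                counter++;
            }
        }
        return counter;
    }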
}

The program prints the following output to the console:

sentence = Line boundary analysis determines where a text string can be broken when line-wrapping. 
sentence = The mechanism correctly handles punctuation and hyphenated words. 
sentence = Actual line breaking needs to also consider the available line width and is handled by higher-level software. 
Number of sentences: 3
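
Note that each extracted sentence keeps the whitespace that follows it. If you would rather collect the sentences than print them, the same loop can fill a list instead. The following is only a minimal sketch of that idea: the SentenceSplitter class and its splitSentences() method are hypothetical names introduced for illustration, and trimming each sentence is an assumption about what a caller would want.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper class, not part of the original example.
class SentenceSplitter {

    // Returns the sentences found in the text, each trimmed of
    // leading and trailing whitespace. Blank segments are skipped.
    static List<String> splitSentences(String text, Locale locale) {
        List<String> sentences = new ArrayList<>();
        BreakIterator bi = BreakIterator.getSentenceInstance(locale);
        bi.setText(text);

        // Walk over consecutive boundary pairs (start, end).
        int start = bi.first();
        int end = bi.next();
        while (end != BreakIterator.DONE) {
            String sentence = text.substring(start, end).trim();
            if (!sentence.isEmpty()) {
                sentences.add(sentence);
            }
            start = end;
            end = bi.next();
        }
        return sentences;
    }

    public static void main(String[] args) {
        String paragraph =
                "The mechanism correctly handles punctuation and "
                + "hyphenated words. Actual line breaking is "
                + "handled by higher-level software. ";

        // Brackets make it visible that the whitespace was trimmed.
        for (String sentence : splitSentences(paragraph, Locale.US)) {
            System.out.println("[" + sentence + "]");
        }
    }
}
```

Returning a list keeps the boundary logic in one place and leaves it to the caller to decide whether to count, print, or further process the sentences.
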
Wayan Saryada
