This example shows how to use BreakIterator.getSentenceInstance() to break a paragraph into the sentences that compose it. To obtain the BreakIterator instance, we call the getSentenceInstance() factory method and pass it a locale. In the count(BreakIterator bi, String source) method, we iterate over the sentence boundaries to extract each sentence from the text stored in the paragraph variable.
package org.kodejava.text;

import java.text.BreakIterator;
import java.util.Locale;

public class BreakSentenceExample {
    public static void main(String[] args) {
        String paragraph = """
                Line boundary analysis determines where a text \
                string can be broken when line-wrapping. The \
                mechanism correctly handles punctuation and \
                hyphenated words. Actual line breaking needs to \
                also consider the available line width and is \
                handled by higher-level software.
                """;

        // Create a sentence iterator for the US locale.
        BreakIterator iterator = BreakIterator.getSentenceInstance(Locale.US);
        int sentences = count(iterator, paragraph);
        System.out.println("Number of sentences: " + sentences);
    }

    private static int count(BreakIterator bi, String source) {
        int counter = 0;
        bi.setText(source);

        // Walk the boundaries; each pair (firstIndex, lastIndex) delimits one sentence.
        int lastIndex = bi.first();
        while (lastIndex != BreakIterator.DONE) {
            int firstIndex = lastIndex;
            lastIndex = bi.next();

            if (lastIndex != BreakIterator.DONE) {
                String sentence = source.substring(firstIndex, lastIndex);
                System.out.println("sentence = " + sentence);
                counter++;
            }
        }
        return counter;
    }
}
The program prints the following output to the console:
sentence = Line boundary analysis determines where a text string can be broken when line-wrapping.
sentence = The mechanism correctly handles punctuation and hyphenated words.
sentence = Actual line breaking needs to also consider the available line width and is handled by higher-level software.
Number of sentences: 3
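Each boundary reported by the sentence iterator falls at the start of the next sentence, so the extracted substrings keep the trailing space or newline that follows the closing punctuation. If you want the sentences as clean values rather than printed text, a variation like the following may help. It is only a minimal sketch: the SentenceSplitter class and its split() method are hypothetical names introduced for this illustration, and stripping whitespace is an assumption about what the caller wants.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class SentenceSplitter {
    // Hypothetical helper (not part of the original example): returns the
    // sentences of the text with surrounding whitespace removed.
    static List<String> split(String text, Locale locale) {
        List<String> sentences = new ArrayList<>();
        BreakIterator bi = BreakIterator.getSentenceInstance(locale);
        bi.setText(text);

        int start = bi.first();
        int end = bi.next();
        while (end != BreakIterator.DONE) {
            // strip() drops the trailing space or newline kept by the boundary.
            String sentence = text.substring(start, end).strip();
            if (!sentence.isEmpty()) {
                sentences.add(sentence);
            }
            start = end;
            end = bi.next();
        }
        return sentences;
    }

    public static void main(String[] args) {
        String text = "The mechanism handles punctuation. Line width matters too.";
        for (String sentence : split(text, Locale.US)) {
            System.out.println("[" + sentence + "]");
        }
    }
}
```

Returning a list keeps the boundary logic separate from the printing, and the number of sentences is then simply the size of the list.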