Why do I get NotOLE2FileException: Invalid header signature error?

When I was creating the code snippet for the How do I replace text in Microsoft Word document using Apache POI? I got the following error when running the snippet . As an additional information, the code snippet for the Kodejava website is written as a Maven project.

org.apache.poi.poifs.filesystem.NotOLE2FileException: Invalid header signature; read 0xE011BDBFEFBDBFEF, expected 0xE11AB1A1E011CFD0 – Your file appears not to be a valid OLE2 document

This error message was produce when the code trying to open a Word document. The Apache POI reports that the Word document has an invalid header signature, not a valid OLE2 document. Comparing the original document with the document under maven’s target directory I found out that the file size was different. This could mean that something alter the document and corrupt the header.

After doing some googling I find out that this error is due to maven resource filtering. The maven resource filtering process cause the Word document header corrupt during the copy phase. The solution to this problem is to make sure that the filtering process is set to false. The pom.xml of the maven project should be altered to have the following configuration.

<build>
    <resources>
        <resource>
            <directory>src/main/resources</directory>
            <filtering>false</filtering>
        </resource>
    </resources>
</build>

How do I replace text in Microsoft Word document using Apache POI?

The code snippet below show you how you can replace string in Microsoft Word document using the Apache POI library. The class below have three method, the openDocument(), saveDocument() and replaceText().

The routine for replacing text is implemented in the replaceText() method. This method take the HWPFDocument, the String to find and the String to replace it as parameters. The openDocument() opens the Word document. When the text replacement is done the Word document will be saved by the saveDocument() method.

This code snippet requires the following library dependency to work:

Maven Dependencies

<!-- http://repo1.maven.org/maven2/org/apache/poi/poi/3.15/poi-3.15.jar -->
<dependency>
    <groupId>org.apache.poi</groupId>
    <artifactId>poi</artifactId>
    <version>3.15</version>
</dependency>
<!-- http://repo1.maven.org/maven2/org/apache/poi/poi-scratchpad/3.15/poi-scratchpad-3.15.jar -->
<dependency>
    <groupId>org.apache.poi</groupId>
    <artifactId>poi-scratchpad</artifactId>
    <version>3.15</version>
</dependency>

And here is the complete code snippet. It will replace every o characters with 0 character in the source document, the lipsum.doc and save the result in a new document called new-lipsum.doc.

package org.kodejava.example.poi;

import org.apache.poi.hwpf.HWPFDocument;
import org.apache.poi.hwpf.usermodel.CharacterRun;
import org.apache.poi.hwpf.usermodel.Paragraph;
import org.apache.poi.hwpf.usermodel.Range;
import org.apache.poi.hwpf.usermodel.Section;
import org.apache.poi.poifs.filesystem.POIFSFileSystem;

import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.net.URL;

public class WordReplaceText {
    public static final String SOURCE_FILE = "lipsum.doc";
    public static final String OUTPUT_FILE = "new-lipsum.doc";

    public static void main(String[] args) throws Exception {
        WordReplaceText instance = new WordReplaceText();
        HWPFDocument doc = instance.openDocument(SOURCE_FILE);
        if (doc != null) {
            doc = instance.replaceText(doc, "o", "0");
            instance.saveDocument(doc, OUTPUT_FILE);
        }
    }

    private HWPFDocument replaceText(HWPFDocument doc, String findText, String replaceText) {
        Range r = doc.getRange();
        for (int i = 0; i < r.numSections(); ++i) {
            Section s = r.getSection(i);
            for (int j = 0; j < s.numParagraphs(); j++) {
                Paragraph p = s.getParagraph(j);
                for (int k = 0; k < p.numCharacterRuns(); k++) {
                    CharacterRun run = p.getCharacterRun(k);
                    String text = run.text();
                    if (text.contains(findText)) {
                        run.replaceText(findText, replaceText);
                    }
                }
            }
        }
        return doc;
    }

    private HWPFDocument openDocument(String file) throws Exception {
        URL res = getClass().getClassLoader().getResource(file);
        HWPFDocument document = null;
        if (res != null) {
            document = new HWPFDocument(new POIFSFileSystem(
                    new File(res.getPath())));
        }
        return document;
    }

    private void saveDocument(HWPFDocument doc, String file) {
        try (FileOutputStream out = new FileOutputStream(file)) {
            doc.write(out);
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

How do I check if a cell contains a date value?

The code below check if the first cell of an Excel file contains a date value. In Excel the cell type for date is returned as HSSFCell.CELL_TYPE_NUMERIC, to make sure if it contains a date we can use a utility method HSSFDateUtil.isCellDateFormatted(HSSFCell cell), this method will check if the cell value is a valid date.

package org.kodejava.example.poi;

import org.apache.poi.hssf.usermodel.HSSFCell;
import org.apache.poi.hssf.usermodel.HSSFDateUtil;
import org.apache.poi.hssf.usermodel.HSSFSheet;
import org.apache.poi.hssf.usermodel.HSSFWorkbook;
import org.apache.poi.ss.usermodel.CellType;

import java.io.FileInputStream;
import java.io.FileNotFoundException;

public class DateCellType {
    public static void main(String[] args) throws Exception {
        String filename = "datecelltype.xls";

        try (FileInputStream fis = new FileInputStream(filename)) {
            HSSFWorkbook workbook = new HSSFWorkbook(fis);
            HSSFSheet sheet = workbook.getSheetAt(0);

            // Read a cell the first cell on the sheet.
            HSSFCell cell = sheet.getRow(0).getCell(0);
            if (cell.getCellTypeEnum() == CellType.NUMERIC) {
                System.out.println("Cell type for date data type is numeric.");
            }

            // Using HSSFDateUtil to check if a cell contains a date.
            if (HSSFDateUtil.isCellDateFormatted(cell)) {
                System.out.println("The cell contains a date value: " + cell.getDateCellValue());
            }
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        }
    }
}

The example result of the code snippet:

Cell type for date data type is numeric.
The cell contains a date value: Wed Mar 01 17:36:59 WITA 2017

Maven Dependencies

<!-- http://repo1.maven.org/maven2/org/apache/poi/poi/3.15/poi-3.15.jar -->
<dependency>
    <groupId>org.apache.poi</groupId>
    <artifactId>poi</artifactId>
    <version>3.15</version>
</dependency>