How do I sort files and directories based on their size?

In this example, you will learn how to sort files and directories based on their size. Using the Apache Commons IO, we can utilize the SizeFileComparator class. This class provides some instances to sort file size such as:

Comparator Description
SIZE_COMPARATOR Size comparator instance – directories are treated as zero size
SIZE_REVERSE Reverse size comparator instance – directories are treated as zero size
SIZE_SUMDIR_COMPARATOR Size comparator instance which sums the size of a directory’s contents
SIZE_SUMDIR_REVERSE Reverse size comparator instance which sums the size of a directory’s contents

Now let’s jump to the code snippet below:

package org.kodejava.commons.io;

import org.apache.commons.io.FileUtils;

import java.io.File;
import java.util.Arrays;

import static org.apache.commons.io.comparator.SizeFileComparator.*;

public class FileSortBySize {
    public static void main(String[] args) {
        File dir = new File(".");
        File[] files = dir.listFiles();

        if (files != null) {
            // Sort files in ascending order based on file size.
            System.out.println("Ascending order.");
            Arrays.sort(files, SIZE_COMPARATOR);
            FileSortBySize.displayFileOrder(files, false);

            // Sort files in descending order based on file size
            System.out.println("Descending order.");
            Arrays.sort(files, SIZE_REVERSE);
            FileSortBySize.displayFileOrder(files, false);

            // Sort files in ascending order based on file / directory
            // size
            System.out.println("Ascending order with directories.");
            Arrays.sort(files, SIZE_SUMDIR_COMPARATOR);
            FileSortBySize.displayFileOrder(files, true);

            // Sort files in descending order based on file / directory
            // size
            System.out.println("Descending order with directories.");
            Arrays.sort(files, SIZE_SUMDIR_REVERSE);
            FileSortBySize.displayFileOrder(files, true);
        }
    }

    private static void displayFileOrder(File[] files, boolean displayDirectory) {
        for (File file : files) {
            if (!file.isDirectory()) {
                System.out.printf("%-25s - %s%n", file.getName(),
                        FileUtils.byteCountToDisplaySize(file.length()));
            } else if (displayDirectory) {
                long size = FileUtils.sizeOfDirectory(file);
                String friendlySize = FileUtils.byteCountToDisplaySize(size);
                System.out.printf("%-25s - %s%n", file.getName(),
                        friendlySize);
            }
        }
        System.out.println("------------------------------------");
    }
}

In the code snippet above we use a couple method from the FileUtils class such as the FileUtils.sizeOfDirectory() to calculate the size of a directory and FileUtils.byteCountToDisplaySize() to create human-readable file size.

The result of the code snippet:

Ascending order.
.editorconfig             - 389 bytes
kodejava.iml              - 868 bytes
pom.xml                   - 1 KB
------------------------------------
Descending order.
pom.xml                   - 1 KB
kodejava.iml              - 868 bytes
.editorconfig             - 389 bytes
------------------------------------
Ascending order with directories.
.editorconfig             - 389 bytes
src                       - 851 bytes
kodejava.iml              - 868 bytes
pom.xml                   - 1 KB
apache-commons-example    - 8 KB
hibernate-example         - 29 KB
.idea                     - 85 KB
------------------------------------
Descending order with directories.
.idea                     - 85 KB
hibernate-example         - 29 KB
apache-commons-example    - 8 KB
pom.xml                   - 1 KB
kodejava.iml              - 868 bytes
src                       - 851 bytes
.editorconfig             - 389 bytes

Maven Dependencies

<dependency>
    <groupId>commons-io</groupId>
    <artifactId>commons-io</artifactId>
    <version>2.16.1</version>
</dependency>

Maven Central

How do I read text file content line by line to a List of Strings using Commons IO?

The following example show how to use the Apache Commons IO library to read a text file line by line to a List of String. In the code snippet below we will read the contents of a file called sample.txt using FileUtils class. We use FileUtils.readLines() method to read the contents line by line and return the result as a List of Strings.

package org.kodejava.commons.io;

import org.apache.commons.io.FileUtils;

import java.io.File;
import java.io.IOException;
import java.util.List;

public class ReadFileToListSample {
    public static void main(String[] args) {
        // Create a file object of sample.txt
        File file = new File("README.md");

        try {
            List<String> contents = FileUtils.readLines(file, "UTF-8");

            // Iterate the result to print each line of the file.
            for (String line : contents) {
                System.out.println(line);
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

Maven Dependencies

<dependency>
    <groupId>commons-io</groupId>
    <artifactId>commons-io</artifactId>
    <version>2.16.1</version>
</dependency>

Maven Central

Why do I get NotOLE2FileException: Invalid header signature error?

When I was creating the code snippet for the How do I replace text in Microsoft Word document using Apache POI? I got the following error when running the snippet . As additional information, the code snippet for the Kodejava website is written as a Maven project.

org.apache.poi.poifs.filesystem.NotOLE2FileException: Invalid header signature; read 0xE011BDBFEFBDBFEF, expected 0xE11AB1A1E011CFD0 – Your file appears not to be a valid OLE2 document

This error message was produce when the code trying to open a Word document. The Apache POI reports that the Word document has an invalid header signature, not a valid OLE2 document. Comparing the original document with the document under maven’s target directory I found out that the file size was different. This could mean that something alter the document and corrupt the header.

After doing some googling, I found out that this error is due to maven resource filtering. The maven resource filtering process cause the Word document header corrupt during the copy phase. The solution to this problem is to make sure that the filtering process is set to false. The pom.xml of the maven project should be altered to have the following configuration.

<build>
    <resources>
        <resource>
            <directory>src/main/resources</directory>
            <filtering>false</filtering>
        </resource>
    </resources>
</build>

How do I replace text in Microsoft Word document using Apache POI?

The code snippet below show you how you can replace string in Microsoft Word document using the Apache POI library. The class below have three method, the openDocument(), saveDocument() and replaceText().

The routine for replacing text is implemented in the replaceText() method. This method take the HWPFDocument, the String to find and the String to replace it as parameters. The openDocument() opens the Word document. When the text replacement is done the Word document will be saved by the saveDocument() method.

And here is the complete code snippet. It will replace every dolor texts into d0l0r texts in the source document, the lipsum.doc, and save the result in a new document called new-lipsum.doc.

package org.kodejava.poi;

import org.apache.poi.hwpf.HWPFDocument;
import org.apache.poi.hwpf.usermodel.CharacterRun;
import org.apache.poi.hwpf.usermodel.Paragraph;
import org.apache.poi.hwpf.usermodel.Range;
import org.apache.poi.hwpf.usermodel.Section;
import org.apache.poi.poifs.filesystem.POIFSFileSystem;

import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.net.URL;

public class WordReplaceText {
    private static final String SOURCE_FILE = "lipsum.doc";
    private static final String OUTPUT_FILE = "new-lipsum.doc";

    public static void main(String[] args) throws Exception {
        WordReplaceText instance = new WordReplaceText();
        try (HWPFDocument doc = instance.openDocument(SOURCE_FILE)) {
            if (doc != null) {
                HWPFDocument newDoc = instance.replaceText(doc, "dolor", "d0l0r");
                instance.saveDocument(newDoc, OUTPUT_FILE);
            }
        }
    }

    private HWPFDocument replaceText(HWPFDocument doc, String findText, String replaceText) {
        Range range = doc.getRange();
        for (int numSec = 0; numSec < range.numSections(); ++numSec) {
            Section sec = range.getSection(numSec);
            for (int numPara = 0; numPara < sec.numParagraphs(); numPara++) {
                Paragraph para = sec.getParagraph(numPara);
                for (int numCharRun = 0; numCharRun < para.numCharacterRuns(); numCharRun++) {
                    CharacterRun charRun = para.getCharacterRun(numCharRun);
                    String text = charRun.text();
                    if (text.contains(findText)) {
                        charRun.replaceText(findText, replaceText);
                    }
                }
            }
        }
        return doc;
    }

    private HWPFDocument openDocument(String file) throws Exception {
        URL res = getClass().getClassLoader().getResource(file);
        HWPFDocument document = null;
        if (res != null) {
            document = new HWPFDocument(new POIFSFileSystem(
                    new File(res.getPath())));
        }
        return document;
    }

    private void saveDocument(HWPFDocument doc, String file) {
        try (FileOutputStream out = new FileOutputStream(file)) {
            doc.write(out);
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

Maven Dependencies

<dependencies>
    <dependency>
        <groupId>org.apache.poi</groupId>
        <artifactId>poi</artifactId>
        <version>5.2.5</version>
    </dependency>
    <dependency>
        <groupId>org.apache.poi</groupId>
        <artifactId>poi-scratchpad</artifactId>
        <version>5.2.5</version>
    </dependency>
</dependencies>

Maven Central Maven Central

Using DigestUtils.sha1hex() method to generate SHA-1 digest

In this example you’ll learn how to generate an SHA-1 digest using the Apache Commons Codec DigestUtils class. In the last two examples you’ve already seen how to generate the MD5 digest using the same library. Compared to the MD5 version the SHA-1 digest is known to be stronger to brute force attacks, but it is slower to generate. The SHA-1 produces a 160 bit (20 byte) message digest while the MD5 produces only a 128 bit message digest (16 byte).

In the code snippet below we demonstrate three different ways to use the DigestUtils.sha1Hex() method. In the first method in the example, the byteDigest(), we calculate the digest from an array of byte data. Followed by the second method, the inputStreamDigest() where we calculate the digest of an InputStream object. And on the last method we call the overload version of the sha1Hex() method to calculate the digest of a string.

Let’s see the full code snippet.

package org.kodejava.commons.codec;

import org.apache.commons.codec.digest.DigestUtils;

import java.io.*;
import java.nio.charset.StandardCharsets;

public class SHAHashDemo {
    public static void main(String[] args) {
        SHAHashDemo demo = new SHAHashDemo();
        demo.byteDigest();
        demo.inputStreamDigest();
        demo.stringDigest();
    }

    /**
     * Calculates SHA-1 digest from byte array.
     */
    private void byteDigest() {
        System.out.println("SHAHashDemo.byteDigest");
        byte[] data = "The quick brown fox jumps over the lazy dog."
                .getBytes(StandardCharsets.UTF_8);
        String digest = DigestUtils.sha1Hex(data);
        System.out.println("Digest          = " + digest);
        System.out.println("Digest.length() = " + digest.length());
    }

    /**
     * Calculates SHA-1 digest of InputStream object.
     */
    private void inputStreamDigest() {
        System.out.println("SHAHashDemo.inputStreamDigest");
        String data = System.getProperty("user.dir") + "/data.txt";
        File file = new File(data);
        try (InputStream is = new FileInputStream(file)) {
            String digest = DigestUtils.sha1Hex(is);
            System.out.println("Digest          = " + digest);
            System.out.println("Digest.length() = " + digest.length());
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    /**
     * Calculate SHA-1 digest of a string / text.
     */
    private void stringDigest() {
        System.out.println("SHAHashDemo.stringDigest");
        String data = "This is just a simple data message for SHA digest demo.";
        String digest = DigestUtils.sha1Hex(data);
        System.out.println("Digest          = " + digest);
        System.out.println("Digest.length() = " + digest.length());
    }
}

When you run the code it will output the following result:

SHAHashDemo.byteDigest
Digest          = 408d94384216f890ff7a0c3528e8bed1e0b01621
Digest.length() = 40
SHAHashDemo.inputStreamDigest
Digest          = da39a3ee5e6b4b0d3255bfef95601890afd80709
Digest.length() = 40
SHAHashDemo.stringDigest
Digest          = 4290d13ca383c2159c442d75355d83e310a2ea15
Digest.length() = 40

Maven Dependencies

<dependency>
    <groupId>commons-codec</groupId>
    <artifactId>commons-codec</artifactId>
    <version>1.16.0</version>
</dependency>

Maven Central