How do I split large excel file into multiple smaller files?

A friend of mine told me that he has a large Excel file, and he asked me if I could split the file into multiple smaller Excel files. So I write this little program using Apache POI to do it.

The code snippet below basically contains the following steps:

  1. Load the Excel as an InputStream from the classpath, if the file was not found the program will exit.
  2. Create XSSFWorkbook from the input stream, and get the first XSSFSheet from the workbook.
  3. Iterate the rows of data from the source worksheet.
  4. On the first rownum for each split file we create a new workbook using SXSSFWorkbook and also create the SXSSFSheet.
  5. Read the first row from the source worksheet, store it in headerRow to be used for creating header row in each new sheet.
  6. If we are at the first row, write the header.
  7. Write each row from the source sheet to the destination sheet untuk the max rows is reached.
  8. Write the workbook into a new file.

And here is the complete code that you can try.

package org.kodejava.poi;

import org.apache.poi.ss.usermodel.Cell;
import org.apache.poi.ss.usermodel.CellType;
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.xssf.streaming.SXSSFCell;
import org.apache.poi.xssf.streaming.SXSSFRow;
import org.apache.poi.xssf.streaming.SXSSFSheet;
import org.apache.poi.xssf.streaming.SXSSFWorkbook;
import org.apache.poi.xssf.usermodel.XSSFRow;
import org.apache.poi.xssf.usermodel.XSSFSheet;
import org.apache.poi.xssf.usermodel.XSSFWorkbook;

import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.time.LocalDateTime;
import java.time.temporal.ChronoUnit;

public class SplitExcelDemo {
    public static final int MAX_ROWS_PER_FILE = 1000 - 1;
    public static final boolean WITH_HEADER = true;

    public static void main(String[] args) {
        LocalDateTime startTime = LocalDateTime.now();
        String filename = "/stock.xlsx";
        try (InputStream is = SplitExcelDemo.class.getResourceAsStream(filename)) {
            if (is == null) {
                System.out.println("Source file was not found!");
                System.exit(1);
            }

            XSSFWorkbook srcWorkbook = new XSSFWorkbook(is);
            XSSFSheet srcSheet = srcWorkbook.getSheetAt(0);
            int physicalNumberOfRows = srcSheet.getPhysicalNumberOfRows();

            int rownum = 0;
            int splitCounter = 0;

            SXSSFWorkbook destWorkbook = null;
            SXSSFSheet destSheet = null;

            XSSFRow headerRow = null;

            for (Row srcRow : srcSheet) {
                if (rownum == 0) {
                    // At the beginning let's create a new workbook and worksheet
                    destWorkbook = new SXSSFWorkbook();
                    destSheet = destWorkbook.createSheet();
                }

                if (srcRow.getRowNum() == 0 && WITH_HEADER) {
                    // Copy header row to be use in each split file
                    headerRow = (XSSFRow) srcRow;
                }

                if (rownum == 0 && WITH_HEADER) {
                    // Add row header to each split file
                    if (headerRow != null) {
                        SXSSFRow firstRow = destSheet.createRow(rownum);
                        int index = 0;
                        for (Cell cell : headerRow) {
                            SXSSFCell headerCell = firstRow.createCell(index++, cell.getCellType());
                            if (cell.getCellType() == CellType.STRING) {
                                headerCell.setCellValue(cell.getStringCellValue());
                            }
                        }
                    }
                } else {
                    // Copy rows from source worksheet into destination worksheet
                    SXSSFRow descRow = destSheet.createRow(rownum);
                    int index = 0;
                    for (Cell cell : srcRow) {
                        SXSSFCell destCell = descRow.createCell(index++, cell.getCellType());
                        switch (cell.getCellType()) {
                            case NUMERIC -> destCell.setCellValue(cell.getNumericCellValue());
                            case STRING -> destCell.setCellValue(cell.getStringCellValue());
                        }
                    }
                }

                // When a max number of rows copied are reached, or when we are at the end of worksheet, 
                // write data into a new file 
                if (rownum == MAX_ROWS_PER_FILE || srcRow.getRowNum() == physicalNumberOfRows - 1) {
                    rownum = -1;
                    String output = String.format("split-%03d.xlsx", splitCounter++);
                    System.out.println("Writing " + output);
                    try (OutputStream os = new FileOutputStream(output)) {
                        destWorkbook.write(os);
                    } catch (IOException e){
                        e.printStackTrace();
                    }
                }

                rownum = rownum + 1;
            }

            // Display processing time
            LocalDateTime endTime = LocalDateTime.now();
            long minutes = startTime.until(endTime, ChronoUnit.MINUTES);
            startTime = startTime.plusMinutes(minutes);
            long seconds = startTime.until(endTime, ChronoUnit.SECONDS);
            System.out.printf("Splitting finished in %d minutes and %d seconds %n", minutes, seconds);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

The output of running this program will look like this:

Writing split-000.xlsx
Writing split-001.xlsx
Writing split-002.xlsx
Writing split-003.xlsx
Writing split-004.xlsx
Writing split-005.xlsx
Writing split-006.xlsx
Writing split-007.xlsx
Writing split-008.xlsx
Writing split-009.xlsx
Splitting finished in 0 minutes and 8 seconds 

Maven Dependencies

<dependencies>
    <dependency>
        <groupId>org.apache.poi</groupId>
        <artifactId>poi</artifactId>
        <version>5.2.5</version>
    </dependency>
    <dependency>
        <groupId>org.apache.poi</groupId>
        <artifactId>poi-ooxml</artifactId>
        <version>5.2.5</version>
    </dependency>
</dependencies>

Maven CentralMaven Central

Wayan

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.