dcsimg
June 18, 2018
Hot Topics:

How to Create an Excel Document from a Java Program Using Apache POI

  • January 15, 2018
  • By Manoj Debnath
  • Send Email »
  • More Articles »

The Apache POI is a popular open source library used to read, write, and manipulate MS Office and Open Office files by using Java code. The library is one of the many open source products maintained by Apache Software Foundation (ASF) contributed to the Java community. The library contains classes and methods to decode file formats based on Open Office XML standards and Microsoft OLE2. Although the library is capable of manipulating Word, Excel spreadsheet and PowerPoint files, this article mainly focuses on spreadsheet documents, only to make it brief.

The Apache POI Library

Intriguingly, in the name Apache POI, the POI stands for "Poor Obfuscation Implementation" and the goal of the library is to provide Java APIs for manipulating various file formats based upon the Office Open XML standards (OOXML) and Microsoft's OLE 2 Compound Document format (OLE2). In short, this enables one to read and write MS Excel, MS Word, and MS PowerPoint files by using Java code. Most Microsoft Office files—such as XLS, DOC, PPT, and MFC serialization API-based file formats—are based on OLE2 standard. The OLE, basically, is a proprietary technique developed by Microsoft, and provides the format for linking objects and embedding objects within container documents. The first format is called OLE1.0 format where the linked object and the embedded object data is laid out as a sequence of bytes within the container document. The second version, the OLE2.0 format, leverages OLE Compound File Technology (MS-CFB) where the linked object or embedded object data is contained in this storage in the form of OLE Compound File Stream objects. Refer to OLE1.0 and OLE2.0 Formats for more details on this. The Apache POI library provides library APIs for the OLE2 file system called POIFS and OLE2 Document Properties called HPSF.

Apache POI Components

The Apache POI library provides classes and methods to work with OLE2 Compound Documents of MS Office. Here is a brief overview of the most commonly used ones:

  • POIFS for OLE2 Documents: The POIFS stands for Poor Obfuscation Implementation File System. This is the basic POI element implemented in the library to port OLE2 Compound Document. It supports read and write functionality for the non-XML binary Microsoft Office format. All APIs of the POI library are built upon this.
  • HSSF and XSSF: The HSSF stands for Horrible Spread Sheet Format. It is a Java implementation port for the Excel 97 file format or for .xls files. XSSF stands for XML Spread Sheet Format and it is a port for the OOXML file format or .xlsx file format.
  • HWPF and XWPF: The HWPF stands for Horrible Word Processor Format. It is a limited read-only port for the older Word 6 or Word 95 file format. The XWPF stands for XML Word Processor Format. It is a Java implementation port for the Word 2007 .docx file format. Both implementations support limited functionality.
  • HSLF and XSLF: The HSLF stands for Horrible Slide Layout Format. The XSLF stands for XML Slide Layout Format. Both of these provide read, write, create, and modify capability of PowerPoint presentations whereas HSLF supports the PowerPoint 97 format and XSLF supports later versions.
  • HPSF: The HPSF stands for Horrible Property Set Format. It is particularly used to work with document properties such as setting the title, category, author, date of modification and so forth of a document
  • HDGF and XDGF: The HDGF stands for Horrible Diagram Format. This component contains classes to work with the Visio binary file format. It provides low-level, read-only APIs to access Visio Documents and VSD files. The XDGF stands for XML Diagram Format. It is for the Visio XML file format or VSDX files.
  • HPBF: The HPBF stands for Horrible Publisher Format. It is a limited Java port to work with the MS Publisher file format.

The acronyms sound humorous because these file systems were supposed to be closed and Microsoft did their best to obfuscate the code so that they are not only difficult to understand but also hard to reverse engineer. But, the developers at Apache hacked it with ease and reverse engineered it successfully. Perhaps, as a mark of rejoice or utter condemnation of closed system, they jocularly named them as such.

Working with HSSF and XSSF Files

The HSSF and XSSF components of the Apache library provides three models of accessing a spreadsheet document as per the HSSF and XSSF Documentation. They are:

  • Low-level structures for special needs
  • The eventmodel APIs for read-only access to Excel documents
  • The usermodel APIs for creating, reading, and modifying Excel files

The limited eventmodel APIs may be used for only reading spreadsheet data. These APIs are located in the org.apache.poi.hssf.eventusermodel package and org.apache.poi.xssf.eventusermodel package, where the first one is used to read data from the .xls file format and the second is used to read data from the .xlsx file format.

The usermodel is much more flexible and easier to use; it can read, write, create, and modify an Excel spreadsheet document. But, it has a much higher memory footprint than the low-level eventmodel.

Also, accessing and manipulating the newer OOXML-based file format with XSSF has a much higher memory footprint than the old HSSF supported binary files.

From POI 3.5 onwards, the HSSF and XSSF model has been clubbed into the SS model, rather tweaked to work for both the models. It is more of tweak of name than a real change. In a way, we can say that SS=HSSF+XSSF.

Migrating Database Table Data to an Excel Spreadsheet

Here, we'll create a simple utility program to migrate some database data into an Excel sheet. This also can be tweaked to work with other ways, such as migrating Excel data to a database table. This is left as an exercise for the reader. The program is simple and self-explanatory. Visit the Apache POI Documentation for detailed information on any classes or methods. To try the following example, what we have used is as follows:

  • JDK 8
  • MS Excel 2007
  • Intellij IDEA IDE
  • Apache POI 3.17
  • Apache Derby 10.14

Visit the appropriate documents and help files for setting up the project. Here are the contents for the Maven pom.xml file we have used.

<project xmlns="http://maven.apache.org/POM/4.0.0"
   xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
   xsi:schemaLocation="http://maven.apache.org/POM/4.0.0
   http://maven.apache.org/xsd/maven-4.0.0.xsd">
   <modelVersion>4.0.0</modelVersion>

   <groupId>com.mano.examples</groupId>
   <artifactId>apache-poi-demo</artifactId>
   <version>1.0-SNAPSHOT</version>
   <packaging>jar</packaging>

   <name>apache-poi-demo</name>
   <url>http://maven.apache.org</url>

   <properties>
      <project.build.sourceEncoding>
         UTF-8
      </project.build.sourceEncoding>
   </properties>

   <build>
      <plugins>
         <plugin>
            <artifactId>
               maven-compiler-plugin
            </artifactId>
            <version>3.7.0</version>
            <configuration>
               <source>1.8</source>
               <target>1.8</target>
            </configuration>
         </plugin>
      </plugins>
   </build>

   <dependencies>
      <dependency>
         <groupId>junit</groupId>
         <artifactId>junit</artifactId>
         <version>3.8.1</version>
         <scope>test</scope>
      </dependency>

      <!-- https://mvnrepository.com/artifact
         /org.apache.maven.plugins/maven-compiler-plugin -->
      <dependency>
         <groupId>org.apache.maven.plugins</groupId>
         <artifactId>maven-compiler-plugin</artifactId>
         <version>3.7.0</version>
      </dependency>

      <!-- https://mvnrepository.com/artifact/
         org.apache.poi/poi -->
      <dependency>
         <groupId>org.apache.poi</groupId>
         <artifactId>poi</artifactId>
         <version>3.17</version>
      </dependency>

      <!-- https://mvnrepository.com/artifact/
         org.apache.poi/poi-ooxml -->
      <dependency>
         <groupId>org.apache.poi</groupId>
         <artifactId>poi-ooxml</artifactId>
         <version>3.17</version>
      </dependency>

      <!-- https://mvnrepository.com/artifact/
         org.apache.derby/derby -->
      <dependency>
         <groupId>org.apache.derby</groupId>
         <artifactId>derby</artifactId>
         <version>10.14.1.0</version>
         <scope>test</scope>
      </dependency>

      <!-- https://mvnrepository.com/artifact/
         org.apache.derby/derbyclient -->
      <dependency>
         <groupId>org.apache.derby</groupId>
         <artifactId>derbyclient</artifactId>
         <version>10.14.1.0</version>
      </dependency>

   </dependencies>
</project>

Listing 1: pom.xml

A database table is created with some dummy records prior to running the utility program. Here is the code of that file.

package com.mano.examples;

import java.sql.*;


public class DummyDatabase {

   public static void createDummyDatabase()
         throws SQLException {
      Connection con=DriverManager.getConnection
         ("jdbc:derby:D:/temp/dummy;create=true");
      Statement stmt=con.createStatement();
      stmt.executeUpdate("drop table semester2");
      stmt.executeUpdate("CREATE TABLE semester2(STUDENT_ID int,
         CARCH INT, DCE INT, WEBTECH INT, JAVA INT, SAD_MIS INT,
         PRIMARY KEY(STUDENT_ID))");

      // Insert 2 rows
      stmt.executeUpdate("insert into semester2
         values (23567932,56,78,97,58,85)");
      stmt.executeUpdate("insert into semester2
         values (47250001,96,34,75,68,12)");
      stmt.executeUpdate("insert into semester2
         values (99568955,45,68,69,78,29)");
      stmt.executeUpdate("insert into semester2
         values (89376473,75,23,56,89,47)");
      stmt.executeUpdate("insert into semester2
         values (29917740,85,78,55,15,48)");
      stmt.executeUpdate("insert into semester2
         values (85776649,23,56,78,25,69)");
      stmt.executeUpdate("insert into semester2
         values (38846455,68,95,78,53,48)");
      stmt.executeUpdate("insert into semester2
         values (40028826,63,56,48,59,75)");
      stmt.executeUpdate("insert into semester2
         values (83947759,85,54,69,36,89)");
      stmt.executeUpdate("insert into semester2
         values (92884775,78,59,25,48,69)");
      stmt.executeUpdate("insert into semester2
         values (24947389,12,10,14,54,68)");
      stmt.executeUpdate("insert into semester2
         values (77399465,44,33,26,88,77)");

      // Query
      ResultSet rs = stmt.executeQuery
      ("SELECT * FROM semester2");

      // Print out query result
      while (rs.next()) {
         System.out.printf
               ("%d\t%d\t%d\t%d\t%d\t%d\n",
            rs.getLong("STUDENT_ID"),
            rs.getInt("CARCH"),
            rs.getInt("DCE"),
            rs.getInt("WEBTECH"),
            rs.getInt("JAVA"),
            rs.getInt("SAD_MIS"));
      }
      stmt.close();
      con.close();
   }
}

Listing 2: DummyDatabase.java.

This is the utility program we are talking about. The code was written in big hurry and the structure not very elegant. However, it works. Restructure or tweak it as you deem fit.

package com.mano.examples;

import org.apache.poi.hssf.usermodel.HSSFWorkbook;
import org.apache.poi.ss.usermodel.*;
import org.apache.poi.xssf.usermodel.XSSFWorkbook;


import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.sql.*;

public class SSFile {

   private static String[] header={"STUDENT_ID",
      "CARCH", "DCE", "WEBTECH", "JAVA",
      "SAD_MIS", "TOTAL", "AVERAGE"};

   public static void databaseToExcel(File file)
         throws IOException, SQLException {
      Workbook workbook = null;
      if (file.getName().endsWith(".xls"))
         workbook = new HSSFWorkbook();
      else if (file.getName().endsWith(".xlsx"))
         workbook = new XSSFWorkbook();
      else {
         System.out.println("Invalid filename!");
         return;
      }
      Sheet sheet = workbook.createSheet();
      Connection con = DriverManager.getConnection
         ("jdbc:derby:D:/temp/dummy;create=true");
      Statement stmt = con.createStatement();
      ResultSet rs = stmt.executeQuery("SELECT * FROM semester2");



      Row rr = sheet.createRow(0);
      for(int i=0;i<header.length;i++){
         createHeaderCell(rr, (short) i, header[i]);
      }

      int i = 1;
      while (rs.next()) {
         rr = sheet.createRow(i++);
         for(int j=0;j<header.length-2;j++){
            createDataCell(rr, (short) j,
               rs.getLong(header[j]));
         }
      }
      rr = sheet.getRow(1);
      Cell total = rr.createCell(6);
      total.setCellType(CellType.FORMULA);
      total.setCellFormula("SUM(B2:F2)");
      Cell avg = rr.createCell(7);
      avg.setCellType(CellType.FORMULA);
      avg.setCellFormula("AVERAGE(B2:F2)");

      FileOutputStream outFile = new
         FileOutputStream(file);
      workbook.write(outFile);
      outFile.flush();
      outFile.close();
      stmt.close();
      con.close();
   }

   private static void createHeaderCell(Row row,
         short col, String cellValue) {
      Cell c = row.createCell(col);
      c.setCellValue(cellValue);
   }

   private static void createDataCell(Row row,
         short col, Number cellValue) {
      Cell c = row.createCell(col);
      c.setCellType(CellType.NUMERIC);
      c.setCellValue(cellValue.doubleValue());
   }
}

Listing 3: SSFile.java

This is the control panel from which the utility program is invoked.

package com.mano.examples;

import java.io.File;
import java.io.IOException;
import java.sql.SQLException;

public class App
{
   public static void main( String[] args )
         throws IOException,SQLException{
      // DummyDatabase.createDummyDatabase();
      SSFile.databaseToExcel(new
         File("d://temp//test1.xls"));
   }
}

Listing 4: App.java

Before Running…

Make sure that the test1.xls or test1.xlsx files do not exist in the d://temp directory prior to running the program because the program neither overwrites nor checks the file with the same name in the directory where the file is supposed to be created. Make sure of this every time the program is run; otherwise, the code gives a nasty error message. However, you may tweak the code to put some checking.

Conclusion

There is another alternative to working with spreadsheets, as the Apache POI Documentation suggests via the Cocoon serializer, although it still uses HSSF indirectly. Cocoon can serialize any XML data source by applying the style sheet and designating the serializer. The HSSF and XSSF model is quite powerful and provides a number of classes and methods to deal with different aspects of an Excel document. This article tried to give a glimpse of what we can do with Apache POI. We often need to write a utility program to bridge between an open and closed system. Apache POI can definitely serve our purpose as one of its kind.

References






Comment and Contribute

 


(Maximum characters: 1200). You have characters left.

 

 


Enterprise Development Update

Don't miss an article. Subscribe to our newsletter below.

By submitting your information, you agree that developer.com may send you developer offers via email, phone and text message, as well as email offers about other products and services that developer believes may be of interest to you. developer will process your information in accordance with the Quinstreet Privacy Policy.

Sitemap

×
We have made updates to our Privacy Policy to reflect the implementation of the General Data Protection Regulation.
Thanks for your registration, follow us on our social networks to keep up-to-date