October 23, 2014
Hot Topics:
RSS RSS feed Download our iPhone app

Reading Excel Files with Apache POI HSSF

  • December 22, 2008
  • By Scott Nelson
  • Send Email »
  • More Articles »

In one of my former positions, I had the delightful task of cataloging potential solutions to new requirements and then rating them (yes, I'm the kind of masochist who enjoys that sort of thing). A requirement came along to create a fully portable version of Business Intelligence Portal Application that could be run from a non-technical user's laptop and have the data updated real-time through Excel. The business case for this was that a technology executive had been presenting static HTML demonstrations of BI portal applications he was proposing and found that the resistance he received was mostly on the data in the demo being out-of-date, distracting his audience from the benefits of the application itself.

The technical solutions needed were a web application that could be run from the tech-exec's laptop with no knowledge of application servers and the need to read Excel files quickly as a data source, preferably with run-time updates. The first part was solved with an application I had built in my spare time (see Open Source-Based Portal-Lite). This left the reading of the Excel files. I found both open-source and commercial solutions to this. The commercial solutions immediately went to the bottom of the list because this was an application for selling applications, which meant the budget was limited. Of the open source applications, the hands-down, stand-out winner of the evaluation was the Apache POI HSSF APIs.

The Apache POI project is a set of Java APIs to read and write Microsoft Office files. J2EE web applications are primarily used to create a browser interface to databases. No matter what your opinion of Microsoft Office may be, it is ubiquitous in the workplace and there is a huge amount of data stored in Excel files on any network or small business PC.

Creating a Java Object from the Excel File

The first step is to take the Excel file itself and turn it into something that Java will work with. This is accomplished by creating a POIFSFileSystem object and an HSSFWorkbook object. The HSSFWorkbook object is the root object from which you will access the part of the Excel file you are interested in later on.

The example class adapted for this article is from an application where Excel acted as the database, so it uses a file path as an argument to a static constructor:

public static ExcelReader getInstance(String sDocPath)
   throws IOException
{
   return new    ' ExcelReader(sDocPath);
}

Which in turns calls a private constructor to initialize the POIFSFileSystem and HSSFWorkbook objects:

private ExcelReader(String sDocPath) throws IOException
{
   this.instance = new ExcelReader();
   this.docPath  = sDocPath;
   this.document = new POIFSFileSystem(new FileInputStream(docPath));
   this.workbook = new HSSFWorkbook(document);
}

In a web application where the Excel file is uploaded, you would create a java.io.InputStream with whatever mechanism your web application has for handling uploads. For example, a Struts 1.x application would use the getInputStream() method from the org.apache.struts.upload.FormFile interface to provide the parameter for the POIFSFileSystem constructor.

Either way, it is handy to abstract the huge number of HSSF APIs with your own class that focuses on just the key methods you will need for your application. For instance, the preceding constructor creates both the POIFSFileSystem and HSSFWorkbook objects because clients of this class are only interested reading values from the Excel file where the POIFSFileSystem class can be used for both reading and writing Excel files.

Identifying the Parts of the Workbook

Most people are aware that an Excel file contains Worksheets, which contain cells arranged in rows and columns. Even if you are aware that the Excel file is a Workbook, you probably seldom think about it, which is why the example implementation here abstracts the Workbook concept away from the client. Although the HSSF libraries provide methods to do just about anything in Java with a Workbook that you would do in Excel, what you are interested in here is simply reading the data from the file so you can use it however you wish. This means you will want your Java class to do exactly what you as users in Excel would do, in the same sequence, only much, much faster. These steps are:

  • Go to a Worksheet
  • Identify the header row
  • Read the data by rows
  • Read the data by columns

Reading Values

Worksheets can be derived from HSSFWorkbook objects either by name or index. Generally, the client application should be more interested in the name of the worksheet if there are multiple worksheets, and the index if there is supposed to be a single worksheet in Excel file where your data is located. The client application should only be interested in the values in the worksheet, so your reader class does not even need to expose the methods for retrieving the HSSFSheet object, only methods for returning values in a useful manner.

For most purposes, assuming the first row in a spreadsheet provided as a data source that contains the field names is safe. From a practical standpoint, data submitted as input should have a defined format. Even though it is possible to come up with logic for discovering format from any layout the user provides, it would be extremely complex. So, your first convenience method will be to return the values of the first row of a given worksheet:

public ArrayList<String> getColNames(int sheetIndex)
{
   ArrayList<String> colNames = new ArrayList<String>();
   HSSFSheet sheet = this.workbook.getSheetAt(sheetIndex);
   HSSFRow   row   = sheet.getRow(0);
   HSSFCell  cell  = null;
   int       cols  = 0;
   if (row != null)
   {
      cols = row.getPhysicalNumberOfCells();
      for (int i = 0; i < cols; i++)
      {
         cell = row.getCell(i);
         if (cell != null && cell.getCellType() == stringCell)
         {
            colNames.add(cell.getRichStringCellValue().getString());
         }
      }
   }
   return colNames;
}

Now, overload the method to work with the name of a worksheet for the multi-worksheet inputs:

public ArrayList<String> getColNames(String sheetName)
{
   return getColNames(this.workbook.getSheetIndex(sheetName));
}




Page 1 of 2



Comment and Contribute

 


(Maximum characters: 1200). You have characters left.

 

 


Sitemap | Contact Us

Rocket Fuel