Working with Java Archive Files

An archive is a collection of one or more files put together as a single unit. In Java programs, we often come across an archive file called a JAR (Java Archive), a file type familiar to every Java programmer. Archive files are created with file archive software such as WinZip, 7-Zip, tar, and so forth. They are particularly useful for storing and transmitting multiple files as a single unit. File archives sometimes employ data compression and encryption as well. This article covers some of the key concepts of working with archive files in Java.

Data Compression Overview

Archive files may employ different data encoding techniques to reduce the overall size of a file's content. There are several compression algorithms, and they fall into two types: lossless and lossy. Lossless compression reduces file size without losing any data, so the original content can be reconstructed exactly. Lossy compression, on the other hand, assumes that some loss of information is acceptable. For example, when compressing an image file, losing a few colour bits makes little visual difference but can reduce the file size considerably. This may be acceptable for files such as images and video, but it is not acceptable for, say, a file containing product information. There, we need lossless compression because the decompressed data must match the original exactly.

For example, in lossless data compression, a string such as 'AAAAAABBBB' may be stored as '6A4B'; in other words, 'six A's and four B's'. Storing '6A4B' takes much less space than storing the repeated sequence of characters. This simple technique is called run-length encoding (RLE). RLE reduces file size by exploiting statistical redundancy: a repeated sequence of characters is replaced by a count and a single copy of the character. There are many other lossy and lossless compression algorithms. For example, the Lempel-Ziv (LZ) family includes DEFLATE, which combines LZ77 with Huffman coding and is the compression method most commonly used in ZIP archives, and LZW (Lempel-Ziv-Welch), which is used in GIF images.
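The run-length idea above can be sketched in a few lines of Java. This is only an illustration of the '6A4B' example, not a production codec: the class and method names (RunLengthDemo, encode, decode) are made up for this sketch, and it assumes the input text contains no digits, because digits would make the count and the character indistinguishable.

```java
final class RunLengthDemo {

   // Encodes each run of identical characters as <count><char>,
   // for example "AAAB" becomes "3A1B".
   // Assumption: the input contains no digit characters.
   static String encode(String input) {
      StringBuilder out = new StringBuilder();
      int i = 0;
      while (i < input.length()) {
         char c = input.charAt(i);
         int run = 1;
         while (i + run < input.length() && input.charAt(i + run) == c) {
            run++;
         }
         out.append(run).append(c);
         i += run;
      }
      return out.toString();
   }

   // Reverses encode(): reads a decimal count, then repeats the
   // character that follows it that many times.
   static String decode(String encoded) {
      StringBuilder out = new StringBuilder();
      int count = 0;
      for (char c : encoded.toCharArray()) {
         if (c >= '0' && c <= '9') {
            count = count * 10 + (c - '0');
         } else {
            for (int k = 0; k < count; k++) {
               out.append(c);
            }
            count = 0;
         }
      }
      return out.toString();
   }

   public static void main(String[] args) {
      String encoded = encode("AAAAAABBBB");
      System.out.println(encoded);           // 6A4B
      System.out.println(decode(encoded));   // AAAAAABBBB
   }
}
```

Note that this scheme only pays off when the data contains long runs; text without repetition, such as 'ABC', grows to '1A1B1C'. Real archivers therefore rely on more general algorithms.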

Encoding techniques such as MP3, Vorbis, and AAC are lossy; FLIF and the lossless mode of JPEG 2000 are lossless. Lossy encodings work well for image, audio, and video compression, where dropping a few bits does not really matter to the overall appeal of the content, although the loss in quality is noticeable if you are picky about minute details. To hear the difference, listen to an MP3 file and then to the same recording in an uncompressed audio format, such as WAV, AIFF, AU, or raw, header-less PCM.

Reversing the compression process to recover the original data content is called decompression.

Figure 1: How the Archiver works

Data Compression in Java

Java provides two API classes, Deflater and Inflater, in the java.util.zip package to compress and decompress data, respectively. These two classes provide the core compression and decompression functionality in Java. They are used as follows.

The steps to deflate/compress are as follows:

  1. Create a Deflater instance.
  2. Supply the input data as a byte array with the setInput() method.
  3. Call the finish() method to signal the end of the input data.
  4. Call the deflate() method repeatedly, until finished() returns true, to compress.
  5. Call the end() method to end the compression process and release its resources.

Similarly, to inflate/decompress, follow these steps:

  1. Create an Inflater instance.
  2. Supply the compressed bytes as input data with the setInput() method.
  3. Call the inflate() method repeatedly, until finished() returns true, to decompress.
  4. Call the end() method to end the decompression process and release its resources.

Let's try it on a simple string. Keep in mind that Deflater is for compression and Inflater is for decompression.

package org.mano.example;

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

public class ArchiveDemo {

   public static void main(String[] args)
         throws IOException, DataFormatException {

      String inputData = "Lorem ipsum dolor sit amet, consectetur adipiscing elit, "
         + "sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. "
         + "Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris "
         + "nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in "
         + "reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla "
         + "pariatur. Excepteur sint occaecat cupidatat non proident, "
         + "sunt in culpa qui officia deserunt mollit anim id est laborum.";

      // Use an explicit charset so the byte form does not depend on the platform default.
      byte[] inputByte = inputData.getBytes(StandardCharsets.UTF_8);
      byte[] compressedByte = compress(inputByte,
         Deflater.BEST_COMPRESSION, false);
      byte[] decompressedByte = decompress(compressedByte, false);

      String outputData = new String(decompressedByte, StandardCharsets.UTF_8);

      System.out.println("Input Data: " + inputData);
      System.out.println("Uncompressed data length: "
         + inputByte.length);
      System.out.println("Compressed data length: "
         + compressedByte.length);
      System.out.println("Decompressed data length: "
         + decompressedByte.length);
      System.out.println("Output Data: " + outputData);

   }

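   // nowrap must match the value used when compressing: true means raw
   // DEFLATE data without the zlib header and checksum.
   public static byte[] decompress(byte[] input, boolean nowrap)
         throws IOException, DataFormatException {
      Inflater inflater = new Inflater(nowrap);
      try {
         inflater.setInput(input);
         ByteArrayOutputStream baout = new ByteArrayOutputStream();
         byte[] buff = new byte[1024];

         while (!inflater.finished()) {
            int count = inflater.inflate(buff);
            if (count > 0) {
               baout.write(buff, 0, count);
            } else if (inflater.needsInput() || inflater.needsDictionary()) {
               // Truncated input or a preset dictionary: fail instead of looping forever.
               throw new DataFormatException("Incomplete or unsupported compressed data");
            }
         }
         return baout.toByteArray();
      } finally {
         inflater.end();   // release native resources
      }
   }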

   // nowrap = false produces zlib-wrapped output (header plus Adler-32 checksum);
   // nowrap = true produces raw DEFLATE data.
   public static byte[] compress(byte[] data, int compressionLevel,
         boolean nowrap) throws IOException {

      Deflater deflater = new Deflater(compressionLevel, nowrap);
      try {
         deflater.setInput(data);
         deflater.finish();   // no more input will follow

         ByteArrayOutputStream baout = new ByteArrayOutputStream();
         byte[] buff = new byte[1024];

         while (!deflater.finished()) {
            int count = deflater.deflate(buff);
            baout.write(buff, 0, count);
         }
         return baout.toByteArray();
      } finally {
         deflater.end();   // release native resources
      }
   }
}

Working with ZIP Format

Suppose we have a number of files to archive in ZIP format. We can use classes such as ZipEntry, ZipInputStream, ZipOutputStream, and ZipFile from the java.util.zip package to work with the ZIP file format. Writing an archive with ZipOutputStream looks as follows.

package org.mano.example;

import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.zip.Deflater;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

public class ArchiveDemo2 {

   public static void main(String[] args) throws IOException {
      String zipFile = "myzipfile.zip";
      // Each file to archive is a separate array element.
      String[] files = { "myfile.txt", "/home/mano/doc/file2.txt",
         "file3.png" };
      zip(zipFile, files);
   }

   public static void zip(String zipFile, String[] files)
         throws IOException {
      // Check every input first so that a missing file does not
      // leave a partially written archive behind.
      for (String name : files) {
         File file = new File(name);
         if (!file.exists()) {
            System.out.println("File " + file.getAbsolutePath()
               + " not found ");
            System.out.println("Aborted.");
            return;
         }
      }

      // try-with-resources closes the streams even if an exception occurs.
      try (ZipOutputStream zout = new ZipOutputStream(new
            BufferedOutputStream(new FileOutputStream(zipFile)))) {
         zout.setLevel(Deflater.BEST_COMPRESSION);
         byte[] buffer = new byte[1024];

         for (String name : files) {
            // The entry name is the path exactly as given in the array.
            zout.putNextEntry(new ZipEntry(name));
            try (BufferedInputStream buffin = new BufferedInputStream(
                  new FileInputStream(name))) {
               int count;
               while ((count = buffin.read(buffer)) != -1) {
                  zout.write(buffer, 0, count);
               }
            }
            zout.closeEntry();
         }
      }

      String currentDirectory = System.getProperty("user.dir");
      System.out.println("Output written to "
         + currentDirectory + File.separator + zipFile);
   }
}

To read the content of a ZIP-formatted file:

package org.mano.example;

import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

public class ArchiveDemo3 {

   public static void main(String[] args) {
      String zipFile = "myzipfile.zip";
      String unziploc = "/home/mano/test";
      unzip(zipFile, unziploc);
   }

   public static void unzip(String zipFile, String unziploc) {
      try (ZipInputStream zin = new ZipInputStream(new
            BufferedInputStream(new FileInputStream(zipFile)))) {

         File destDir = new File(unziploc);
         String destPath = destDir.getCanonicalPath() + File.separator;
         byte[] buffer = new byte[1024];
         ZipEntry ze;
         while ((ze = zin.getNextEntry()) != null) {
            File file = new File(destDir, ze.getName());
            // Reject entries such as "../x" that would land outside the target directory.
            if (!file.getCanonicalPath().startsWith(destPath)) {
               throw new IOException("Entry outside target directory: "
                  + ze.getName());
            }
            if (ze.isDirectory()) {
               file.mkdirs();
               continue;
            }
            File root = file.getParentFile();
            if (!root.exists()) root.mkdirs();
            // Opening the output stream creates the file; it is closed automatically.
            try (BufferedOutputStream buffout = new BufferedOutputStream(
                  new FileOutputStream(file))) {
               int count;
               while ((count = zin.read(buffer)) != -1) {
                  buffout.write(buffer, 0, count);
               }
            }
         }

         System.out.println("Contents extracted to "
            + destDir.getAbsolutePath());
      } catch (IOException e) {
         e.printStackTrace();
      }
   }
}
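The ZipFile class, mentioned earlier alongside the stream classes, offers random access to an existing archive and is handy when you only want to inspect it. The following is a minimal sketch, assuming you just need the entry names; the class name ZipListDemo and the method names() are hypothetical and chosen for this example.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

final class ZipListDemo {

   // Returns the entry names of the archive without extracting anything.
   static List<String> names(String zipPath) throws IOException {
      List<String> result = new ArrayList<>();
      // ZipFile reads the archive's directory; closing it releases the file.
      try (ZipFile zf = new ZipFile(zipPath)) {
         Enumeration<? extends ZipEntry> entries = zf.entries();
         while (entries.hasMoreElements()) {
            result.add(entries.nextElement().getName());
         }
      }
      return result;
   }

   public static void main(String[] args) throws IOException {
      // Defaults to the archive name used in the examples above.
      String zipPath = args.length > 0 ? args[0] : "myzipfile.zip";
      for (String name : names(zipPath)) {
         System.out.println(name);
      }
   }
}
```

Unlike ZipInputStream, which reads entries sequentially from any stream, ZipFile needs a real file on disk, but in exchange it can open an individual entry directly with getInputStream(ZipEntry).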

Conclusion

The Java API has ready-made classes for other common archive file formats as well, such as GZIP files and JAR files. Besides the file data itself, an archive typically contains metadata such as the directory structure of the files, error-detection information, and so on. Apart from archive support, the java.util.zip package also provides utility classes for computing checksums, such as CRC32 and Adler32.
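As a brief illustration of the GZIP and checksum support, the sketch below compresses a byte array in memory with GZIPOutputStream, restores it with GZIPInputStream, and compares CRC32 checksums of the original and restored data. The class name GzipChecksumDemo and its helper methods are assumptions made for this example only.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

final class GzipChecksumDemo {

   // Compresses data into the GZIP format in memory.
   static byte[] gzip(byte[] data) throws IOException {
      ByteArrayOutputStream bout = new ByteArrayOutputStream();
      // Closing the GZIP stream writes the trailer, so close before reading bout.
      try (GZIPOutputStream gz = new GZIPOutputStream(bout)) {
         gz.write(data);
      }
      return bout.toByteArray();
   }

   // Restores the original bytes from GZIP data.
   static byte[] gunzip(byte[] compressed) throws IOException {
      ByteArrayOutputStream bout = new ByteArrayOutputStream();
      try (GZIPInputStream gz = new GZIPInputStream(
            new ByteArrayInputStream(compressed))) {
         byte[] buffer = new byte[1024];
         int count;
         while ((count = gz.read(buffer)) != -1) {
            bout.write(buffer, 0, count);
         }
      }
      return bout.toByteArray();
   }

   // Computes the CRC-32 checksum of a byte array.
   static long crc32(byte[] data) {
      CRC32 crc = new CRC32();
      crc.update(data, 0, data.length);
      return crc.getValue();
   }

   public static void main(String[] args) throws IOException {
      byte[] original = "sample data, sample data, sample data"
         .getBytes(StandardCharsets.UTF_8);
      byte[] restored = gunzip(gzip(original));
      System.out.println("Checksums match: "
         + (crc32(original) == crc32(restored)));
   }
}
```

A matching checksum is a quick integrity check, not a proof of equality; when both byte arrays are at hand, as here, java.util.Arrays.equals() gives the definitive answer.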
