Don't Be a Slave of the File System

  • March 15, 2009
  • By Liviu Tudor

Whether you have worked on small-scale applications or have developed enterprise-wide applications all of your life, at some point you've likely encountered the need to read a file. This has likely led to your seeing the joys of the java.io package. While the package in itself is well designed and structured, it doesn't solve a basic problem - the fact that you are dealing with a file system. As such, your application's response time can be limited by the response time of the file system itself.

In the case of most local file systems that might not be a problem (though if you have a busy file system and some large reads you might start to feel it!), but it does become a problem for things like NFS or mapped drives. A lot of integrations with really old legacy systems are based on files exchanged on network paths. The hospitality industry (hotels, pubs and restaurants) in particular seems to be full of such systems that haven't been changed since the 1980s. In such cases you might exchange just one kilobyte of data at a time, but you are dealing with a remote file system, in most cases a very busy one, with anything from 50 to 100 such "message files" exchanged per minute and then propagated through the network. As a result, you're occasionally looking at about 1-2 seconds to read such a small file.

In a normal web-based application this might not be a problem because the user is used to waiting (though even there the attitude is beginning to shift!). Still, it would be much nicer if, instead of just waiting in an InputStream.read() for the bytes to become available, your application were doing something else in parallel. Sure, you can turn your application's execution flow around so that you only do the read at the end. In this case, the user experience suffers only in the last step - after you have offered a pretty smooth and fast experience throughout. However, this is not always possible, and even when it is, it doesn't eliminate the problem; it merely hides it behind the fact that most of the user's actions have been responded to in a timely fashion, so the user will have more tolerance for such a "small" glitch at the end.

The main problem is due to a piece of code like this in your application:

//open the file
FileInputStream f = new FileInputStream( file );
...
//read, possibly repeatedly
f.read( bytes );
...
//finished, close the file
f.close();

The problem derives from the fact that all those operations are blocking operations, most importantly, read(). Because it is a blocking operation, you can't do anything in your code apart from waiting for it to return. Unless of course, the above block was happening in a separate thread!
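
The "separate thread" idea can be sketched in a few lines. This is only an illustration under some assumptions: the names BackgroundReadSketch and startRead are made up for this example, and Files.readAllBytes() from java.nio.file stands in for the FileInputStream loop purely for brevity (it blocks in exactly the same way).

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical helper for illustration; not part of the article's download.
class BackgroundReadSketch {

    // Starts the blocking read on a worker thread and returns immediately.
    // If the read fails, the result is left as null.
    static Thread startRead(Path file, AtomicReference<byte[]> result) {
        Thread worker = new Thread(() -> {
            try {
                result.set(Files.readAllBytes(file)); // blocks the worker thread only
            } catch (IOException e) {
                result.set(null);
            }
        });
        worker.start();
        return worker;
    }

    public static void main(String[] args) throws Exception {
        Path file = Files.createTempFile("sketch", ".txt");
        Files.write(file, "hello".getBytes(StandardCharsets.UTF_8));
        AtomicReference<byte[]> result = new AtomicReference<>();

        Thread worker = startRead(file, result);
        // ... the calling thread is free to do other work here ...
        worker.join(); // wait only at the point where the bytes are needed

        System.out.println(result.get().length + " bytes read");
        Files.delete(file);
    }
}
```

The caller still has to wait at join() if the read is slower than its other work, which is the gap a callback-based design tries to close.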

And that's where the main idea for this article stemmed from: wouldn't it be nice to have a "framework" that does all the reading for you in the background while you carry on with other stuff, and then notifies you when the reading has finished so you can finally use the contents of the file? While this doesn't eliminate the delay incurred when reading a file, it can greatly reduce the time your application spends waiting on reads - since while the read is being performed you can carry out some other work.

The architecture of this component is quite simple. You need to build a queue where "clients" add their requests for file reading and then a "read manager" will take an item from this queue and spawn a thread to perform the actual reading. Upon finishing the reading, this thread will then call back the client that placed the request in the queue and pass the contents of the file as an array of bytes.

Granted, we're making a few assumptions here:

  • That the contents of the file can be stored in a byte array (Java arrays are indexed by int, which caps them at roughly 2 GB). Though I have yet to see an application constantly reading files of over 2 GB!
  • That the time it takes to read this file is a few seconds. If you have a file that takes minutes to load there's something wrong and while you can still use this component, unless you have work in your application to do for minutes, it is unlikely that you will get a better result. (Not to mention the poor user who might have to wait for minutes!)
  • That your system can take the hit of a few more threads without slowing down the application

But as in most of these cases you don’t have to worry about any of the above (and if you do, then it's time for the drawing board again!), I think it's fair to make such assumptions.

That being the case then, the component you need is quite simple in terms of design - a bit too simple, as I will explain later. The reading thread and manager, together with supporting classes, have been placed under the reader package, which as you can see consists of two classes and an interface:

ReadCallback - This is the interface used to notify the clients that place file read requests in our queue. Either the read has completed successfully, in which case we pass the bytes read by invoking readingFinished(), or some error has occurred, in which case we pass the exception object via readError(). Of course other notifications can be added to this interface if needed - like fileOpened, fileClosed, and bytesRead. From my experience in most of the scenarios described above all you care about is the full contents of the file read, so I didn't consider it necessary to add these...just yet.

ReaderThread - This is the class that implements the reading of the file in the manner shown above via a FileInputStream. Since the reading itself is not the subject of this article, I opted for simplicity and append all the bytes read to a ByteArrayOutputStream. Note that this is not optimal, as the internal byte array used by this class has to grow as more data is written, which leads to repeated memory re-allocation and copying. This class will be run in a separate thread by the ReadManager thread (which is why it implements the Runnable interface) and, as mentioned above, it fires one of the two notifications, reading completed or reading failed, depending on whether any I/O problems were encountered while reading the contents of the file.
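
One way to soften the re-allocation cost mentioned above is to size the stream from the file length up front. The following is a sketch rather than the article's implementation: the class name PresizedReadSketch, the readFully() helper and the 8 KB buffer are assumptions, and the array-size cap is a conservative guess because the exact limit is JVM-specific.

```java
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.nio.file.Files;

// Hypothetical helper for illustration; not part of the article's download.
class PresizedReadSketch {
    // Conservative cap (assumption): the real maximum array size depends on the JVM.
    private static final long MAX_ARRAY = Integer.MAX_VALUE - 8;

    static byte[] readFully(File file) throws IOException {
        long length = file.length(); // 0 if the size cannot be determined
        if (length > MAX_ARRAY)
            throw new IOException(file + " does not fit in a byte array");
        // Pre-size the stream so its internal array rarely needs to grow.
        ByteArrayOutputStream bytes =
            new ByteArrayOutputStream((int) Math.max(length, 32));
        byte[] buffer = new byte[8192];
        try (FileInputStream in = new FileInputStream(file)) {
            int read;
            while ((read = in.read(buffer)) != -1)
                bytes.write(buffer, 0, read);
        }
        return bytes.toByteArray(); // note: this still makes one final copy
    }

    public static void main(String[] args) throws IOException {
        File file = File.createTempFile("presized", ".bin");
        Files.write(file.toPath(), new byte[3000]);
        System.out.println(readFully(file).length + " bytes read");
        file.delete();
    }
}
```

If the file grows between the length() call and the read, the stream simply falls back to growing as before, so the pre-sizing is an optimization and not a correctness requirement.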

The ReadManager - Last but probably most importantly, the ReadManager. This is the class that holds and manages the internal queue to which clients add their file read requests. While this could have been implemented via a regular Queue, I've opted for a BlockingQueue so that the "waiting for elements to be added to the queue" task is handled by the queue implementation rather than by code you have to write. Since the ReadManager class runs in a different thread from the main application, it doesn't matter that the take() operation blocks until an item is available in the queue.

If you need to perform other operations inside the ReadManager class while the queue is being populated, simply replace the call to take() with poll(), which returns null straight away when the queue is empty instead of blocking (and adjust the loop condition, since a null from poll() would otherwise end the loop). You will notice that this class runs in an infinite loop. That is purely for simplicity reasons. In a "live" environment you should design some signalling mechanism for the thread to finish cleanly (as you will notice, the main application exits via System.exit(), which terminates the JVM and the ReadManager thread along with it; the InterruptedException handler in run() is the natural hook for a cleaner shutdown). Since all the read requests should go through this component, it has been made into a singleton. There is only one instance of this class - and therefore only one queue - serving the entire application. As one last note, you will notice that a new thread is spawned for each file request. Again, this is not optimal, since an application with hundreds of concurrent requests can end up spawning hundreds of such threads, thus bringing the system to its knees. In a "live" scenario a thread pool is required around the ReaderThread.
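
The clean shutdown signal and the thread pool could be combined along the following lines. This is a sketch under stated assumptions, not the article's ReadManager: the class name PooledManagerSketch, the stop() and awaitTermination() methods, the pool size of 4 and the 100 ms poll timeout are all invented for illustration, and plain Runnable tasks stand in for ReaderThread instances.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical variation on the manager; not part of the article's download.
class PooledManagerSketch implements Runnable {
    private final BlockingQueue<Runnable> queue = new LinkedBlockingQueue<>();
    private final ExecutorService pool = Executors.newFixedThreadPool(4); // bounded worker count
    private volatile boolean running = true;

    public void submit(Runnable readTask) { queue.add(readTask); }

    // Signals the manager loop to finish once the queue has been drained.
    public void stop() { running = false; }

    public boolean awaitTermination(long timeout, TimeUnit unit) throws InterruptedException {
        return pool.awaitTermination(timeout, unit);
    }

    @Override
    public void run() {
        try {
            while (running || !queue.isEmpty()) {
                // poll() with a timeout wakes up regularly to re-check the flag
                Runnable task = queue.poll(100, TimeUnit.MILLISECONDS);
                if (task != null)
                    pool.execute(task); // reuse pooled threads instead of spawning one per file
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
        } finally {
            pool.shutdown(); // already submitted tasks are still allowed to finish
        }
    }

    public static void main(String[] args) throws InterruptedException {
        PooledManagerSketch manager = new PooledManagerSketch();
        Thread managerThread = new Thread(manager);
        managerThread.start();
        for (int i = 0; i < 3; i++) {
            final int id = i;
            manager.submit(() -> System.out.println("reading file #" + id));
        }
        manager.stop();
        managerThread.join();
        manager.awaitTermination(5, TimeUnit.SECONDS);
    }
}
```

Since an ExecutorService already keeps its own work queue, the explicit queue and manager thread could arguably be dropped altogether; they are kept here only to mirror the structure described above.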

For our "client" application, I've provided a very simplified implementation of a class that does some "lengthy" operations which can be done in parallel to the reading. In this case I've chosen to do about 100,000 random double multiplications, but as I said, this is just to simulate a "lengthy" operation, so this is really left to your imagination. Obviously the client has to implement the callback interface and process the notifications received accordingly. As you should remember, these notifications are sent from a separate thread - specifically from inside the ReaderThread - so it is important to bear in mind that at the time these notifications arrive, the main client thread is also running. Some synchronization might be necessary! In this case, I've stored a flag to indicate that the reading has completed (the readFinished member) and an array that will contain the file contents if the reading was successful or will be set to null in the case of an error (the fileContents member), so there has to be some synchronization involved when operating on these. For the purpose of this exercise, the client code synchronizes on the ReadClient object itself when writing or reading these values. I'm sure you can figure out how to change this for your application's needs.
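
The synchronization described above might look roughly like this. The ReadClient source is not listed in this article, so the sketch below is a guess at its shape: the class name ReadClientSketch and the awaitContents() method are assumptions, and to stay self-contained it does not implement ReadCallback, although its two notification methods mirror that interface.

```java
import java.util.Random;

// Hypothetical client for illustration; the article's ReadClient may differ.
class ReadClientSketch {
    private boolean readFinished; // guarded by "this"
    private byte[] fileContents;  // guarded by "this"; null after an error

    // Called from the reader thread when the file has been read.
    public synchronized void readingFinished(String name, byte[] contents) {
        fileContents = contents;
        readFinished = true;
        notifyAll(); // wake the main thread if it is already waiting
    }

    // Called from the reader thread when the read failed.
    public synchronized void readError(String name, Exception e) {
        fileContents = null;
        readFinished = true;
        notifyAll();
    }

    // Called from the main thread once its own work is done.
    public synchronized byte[] awaitContents() throws InterruptedException {
        while (!readFinished) // loop guards against spurious wake-ups
            wait();
        return fileContents;
    }

    public static void main(String[] args) throws InterruptedException {
        ReadClientSketch client = new ReadClientSketch();
        // Stand-in for the reader thread delivering its notification.
        new Thread(() -> client.readingFinished("demo.txt", new byte[] {1, 2, 3})).start();

        // The "lengthy" operation: about 100,000 random multiplications.
        Random random = new Random();
        double x = 1.0;
        for (int i = 0; i < 100_000; i++)
            x *= random.nextDouble() + 0.5;

        byte[] contents = client.awaitContents();
        System.out.println(contents == null ? "read failed"
                                            : contents.length + " bytes available");
    }
}
```

A java.util.concurrent.CountDownLatch would do the same job with less code; wait()/notifyAll() is used here only because it matches the "synchronize on the client object" approach described above.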

Taking this information into account, the client flow is very simple. Having received the name of the file to read on the command line, it queues a file read request with the manager and then proceeds to do the "lengthy" task while the read manager takes care of reading the file in the background. Having finished the lengthy operation, it then makes sure the file contents have been made available. For this it might occasionally have to wait (but hopefully for much less time than if it had read the file itself in the first place). Once readFinished has been set, it can only mean one of two things: either the file contents are in the fileContents member, in which case you "consume" them (simply print them out on the console), or the read failed and fileContents is null, in which case you bail out.

Conclusion

The presented solution is a very basic implementation of a component that allows your application to perform "asynchronous" file reads using the standard java.io package. This can of course be extended to file writes and other I/O operations, e.g. network communications for sending/receiving emails. As pointed out, for simplicity reasons, a lot of things were left out. As it stands, the component is not scalable out of the box; however, only small changes are needed for that, as explained throughout the article. Also, while the component presented only offers basic file reading notifications, this can be extended with more notifications in the callback interface depending on the application's needs.

One possible extension to this little framework would be adding some more notifications to the callback interface as bytes are being read, so that applications could process the data read from the files as it is made available and transform it into data meaningful to the application (in a SAX-like manner, for instance).
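
Such an extension might look something like the sketch below. Everything in it is hypothetical: the ChunkCallback interface, its bytesRead() signature and the ChunkedReader class are assumptions made for illustration and are not part of the article's download.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.nio.file.Files;

// Hypothetical extension for illustration; not part of the article's download.
class ChunkedReadSketch {

    interface ChunkCallback {
        // The buffer is reused between calls: consume or copy its first
        // 'length' bytes before returning.
        void bytesRead(String name, byte[] buffer, int length);
        void readingFinished(String name);
        void readError(String name, Exception e);
    }

    static class ChunkedReader implements Runnable {
        private final String name;
        private final ChunkCallback callback;

        ChunkedReader(String name, ChunkCallback callback) {
            this.name = name;
            this.callback = callback;
        }

        @Override
        public void run() {
            byte[] buffer = new byte[1024];
            try (FileInputStream in = new FileInputStream(name)) {
                int read;
                while ((read = in.read(buffer)) != -1)
                    callback.bytesRead(name, buffer, read); // one notification per chunk
            } catch (IOException e) {
                callback.readError(name, e);
                return;
            }
            callback.readingFinished(name);
        }
    }

    public static void main(String[] args) throws Exception {
        File file = File.createTempFile("chunked", ".bin");
        Files.write(file.toPath(), new byte[5000]);
        final long[] total = {0};
        Thread reader = new Thread(new ChunkedReader(file.getPath(), new ChunkCallback() {
            public void bytesRead(String n, byte[] b, int length) { total[0] += length; }
            public void readingFinished(String n) { System.out.println("finished " + n); }
            public void readError(String n, Exception e) { System.out.println("failed: " + e); }
        }));
        reader.start();
        reader.join(); // join() makes the reader's writes visible to this thread
        System.out.println(total[0] + " bytes seen in chunks");
        file.delete();
    }
}
```

Because the callbacks run on the reader thread, a slow bytesRead() handler slows the read itself down, so heavier processing is probably best handed off elsewhere.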

Finally, while this doesn't eliminate the delays incurred by slow devices such as network drives, it does allow the programmer to isolate the blocking operations into a separate thread, thus possibly reducing the response time of an application which needs to do such repeated reads.

About the Author

Liviu Tudor is a Java consultant living in the UK with a lot of experience with high-availability systems, mostly in the online media sector. Throughout his years of working with Java, he came to realize that when performance matters, it is the "low level", core Java that makes an application deliver rather than a bloated middleware framework. When the injuries acquired playing rugby with his team, The Phantoms (http://www.phantomsrfc.com) in London, allow it, he writes articles on Java technology for Developer.com.

Code

You can download the source code from here: ReadCarefully.zip

ReadCallback.java


package reader;
public interface ReadCallback 
{
   public void readingFinished( String name, byte[] contents );
   public void readError( String name, Exception e );
}

ReaderThread.java


package reader;
import java.io.ByteArrayOutputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.util.logging.Logger;
public class ReaderThread implements Runnable 
{
   private static Logger      LOG = Logger.getLogger( "ReaderThread" );
   
   private static final int   ONE_KILO = 1024;
   private String name;
   private ReadCallback callback;
   
   public ReaderThread( String name, ReadCallback callback )
   {
      this.name = name;
      this.callback = callback;
   }
   
   @Override
   public void run() 
   {
      try
      {
         LOG.fine( "Reading file " + name );
         ByteArrayOutputStream bytes = new ByteArrayOutputStream();
         byte buffer[] = new byte[ONE_KILO];
         //try-with-resources closes the stream even if a read fails
         try( FileInputStream fis = new FileInputStream(name) )
         {
            int read;
            while( (read = fis.read(buffer)) != -1 )
               bytes.write( buffer, 0, read );
         }
         LOG.fine( "Finished reading file " + name + ", launching callback" );
         callback.readingFinished( name, bytes.toByteArray() );
         LOG.fine( "File " + name + " read and processed" );
      }
      catch( IOException ioe )
      {
         //covers a missing file as well as a failed read
         callback.readError( name, ioe );
      }
   }
}

ReadManager.java


package reader;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.logging.Logger;
public class ReadManager implements Runnable
{
   private static Logger      LOG = Logger.getLogger( "ReadManager" );
   private static ReadManager   instance;
   
   private BlockingQueue<ReadItem> queue;
   
   private ReadManager()
   {
      queue = new LinkedBlockingQueue<ReadItem>();
      new Thread(this).start();
   }
   
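   //synchronized so that two threads racing on the first call cannot create two managers
   public static synchronized ReadManager getManager()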
   {
      if( instance == null )
         instance = new ReadManager();
      return instance;
   }
   
   public void addFile( String name, ReadCallback callback )
   {
      LOG.fine( "Adding file " + name + " to the queue" );
      queue.add( new ReadItem(name,callback) );
   }
   
   @Override
   public void run() 
   {
      try 
      {
         ReadItem i;
         while( (i = queue.take()) != null )
         {
            LOG.fine( "Extracted " + i + " from read queue, starting read thread" );
            Thread t = new Thread( new ReaderThread(i.getName(), i.getCallback()) );
            t.start();
         }
      } catch( InterruptedException e ) 
      {
         return;   //stop the thread
      }
   }
   private static class ReadItem
   {
      private String name;
      private ReadCallback callback;
      
      public ReadItem( String name, ReadCallback callback )
      {
         if( name == null || callback == null )
            throw new NullPointerException( "Invalid arguments for read" );
         this.name = name;
         this.callback = callback;
      }
      
      public String getName()
      {
         return name;
      }
      
      public ReadCallback getCallback()
      {
         return callback;
      }
      
      public String toString()
      {
         return "File name:" + name + " - Callback:" + callback;
      }
   }
}





