JavaJava 7 Fork/Join Framework Tutorial

Java 7 Fork/Join Framework Tutorial

Developer.com content and product recommendations are editorially independent. We may make money when you click on links to our partners. Learn More.

Of the several new features introduced in Java 7, Join/Fork Framework is an important addition. JSR 166 aims to standardize a simple extensible framework for organizing common utility classes for parallel programming into a proper package similar to that of Collection classes bundled in the java.util package. The objective is to make it readily available and easily maintainable for developers. This also focuses on giving high quality implementations on Parallel programming. Several new classes and interfaces have been added for this framework.

Typically, this new feature will address the critical needs of the Java community in low level threading activities such as synchronized, wait and notify actions. The fork/Join framework is designed to parallelize, divide and conquer algorithms easily. Many times programmers opt to use their own concurrency mechanisms (high level), so the idea is to offer standard and the most efficient concurrency utilities that facilitate programmers to write a variety of multithreaded applications. The required classes and interfaces are made available in the java.util.concurrent package.

In this article, we describe the Fork/Join framework and how it is used to address Java parallelism issues (see Part 1 for an overview of these issues).

Fork/Join Framework

As discussed earlier, Fork/Join framework is a new feature, which was added to the Java 7 platform and is one of the major advancements made in the Java7 platform. In the previous sections, we learned how the Multi-threaded model in Java platform emerged from Java threads to the Executor framework. It is very obvious that the support for multi programming in the Executor framework is much better, when compared to handling them through plain Threads. 

In a real time scenario, there are several cases where we would need to address the classic “Divide and Conquer” problem. To solve this problem, the main task is divided into several chunks of tasks (The Divide phase!) and then each of these small tasks is computed individually in parallel. Once their execution is completed, the results are combined or conquered (The Conquer phase!).

The problem of “Divide and Conquer” can be easily achieved using the Executor interface, which works on Callable threads (refer to the previous example for more details on Callable threads). This can be done by instantiating multiple Callable instances for each task and then collecting the results of the computation in the ExecutorService class to compute the final result. Then the question that comes to mind is, if this Interface works fine, why do we need another framework in Java 7?

The main issue using the ExecutorService with Callable is that Callable instances are blocking in nature; once a Callable instance starts executing, all other Callables are blocked. This causes a lot of resources to be unutilized, since the next available Callable instance in the queue will not be executed until the previous one finishes its execution. Fork/Join Framework comes into the picture to address this parallelism problem, while Executor addresses the Concurrency issues.

Work Stealing in the Fork/Join Framework

Fork/Join Framework adds two main classes to the java.util.concurrent package:

  • ForkJoinPool
  • ForkJoinTask

The ForkJoinPool class is an executor class and executes ForkJoinTask instance(s). The main principle of ForkJoinPool is “work stealing“, where the threads attempt to find and execute subtasks, which are created by other tasks whose state is still active.  A ForkJoinTask instance is very light weight when compared to a normal Java thread.

Once a ForkJoinTask is started, it starts subtasks and then waits for the subtask to finish its execution.  The executor ForkJoinPool class is responsible for assigning this subtask to another thread within the Thread Pool, which is currently waiting for some task to be completed. The active threads in the thread pool try to execute some other subtask created by other tasks. Based on the number of processors available, the ForkJoinPool tries to work with that level of parallelism by maintaining adequate active threads at any given point of time.

ForkJoinTask has two main methods, apart from several other API methods, which are:

  • fork () – This method decides on asynchronous execution of the ForkJoinTask. By virtue of this method, a new task can be created.
  • join () – This method is responsible for returning the results once the computation is completed. This allows a task to wait for the completion of another task.

The overall Fork/Join process is depicted below.

Fork/Join Process
Fork/Join Process

Fork/Join Example in Java 7

In this section, we describe the example to compute the sum of n natural numbers using the Fork/Join framework introduced in Java7. The main purpose of this example, as described earlier, is to break the computation of sum of n natural numbers into smaller units to combine to get the final results. The logic to evaluate the sum remains the same.

The basic principle involved during this computation is to divide the value of n into a smaller value to compute the sum independently. During each iteration, the process involves breaking the value of n into two halves 1 to n/2 and (n/2)+1 to n and to compute the sum of each half independently.

The ForkJoinTask class is defined as follows:

The ForkJoinTask class 

The EvaluateSumForkJoinTask is the class that creates the ForkJoinTask instances and extends the RecursiveTask class, which is the recursive result baring class. The RecursiveTask class is a generic class set to Integer in our case. This class overrides the compute () method, which takes care of forking new tasks and joins the computed results.

Since the main problem can be broadly broken into two main chunks, we create two worker threads called myWorker1 and myWorker2 and assigned with EvaluateSum instances with smaller values for n.  Once the value is reduced to 2, the tasks are not further divided for parallelism. This condition can be varied based on the value of n to achieve better parallelism.

The client class maintains the ForkJoinPool and is responsible for invoking the main task. The ForkJoinPool instance is configured based on the number of processors available in the underlying hardware. The machine on which this demo application is executed runs on a dual core processor with 2 processors. The invoke() will fork the main process to evaluate the sum of first n (read as 500) numbers, which will in turn call the compute() of the EvaluateSumForkJoinTask class, which is a RecursiveTask class.

Example Code

Comparison of Results

The computation in the case of Fork/Join is delegated into multiple worker threads that work together to compute the sum. The results are compared with the single Threaded implementation and the following results are obtained:

Value of N

Single Thread model

Time Taken (in ms)

Fork/Join Implementation

Time taken (in ms)

Remarks

20

4

7

In this case, Fork/Join takes more time due to the small value of n. For a very small problem parallelism might not be worth using.

100

15

6

In these 3 cases, the Fork/Join implementation takes less time due to the use of multiple cores.

500

124

6

1000

285

7

It is also very important to note that the CPU utilization in the case of a single threaded model is quite a bit less when compared to the Fork/Join implementation that is as shown:

Single Threaded Model

Single Threaded Model

Fork/Join Implementation

Fork/Join Implementation

Conclusion

This article discussed the parallel programming feature in the Java platform through its evolution from Single threaded model to the Fork/Join framework introduced in Java7. Executors were a major improvement made to the Java platform in Java 5 but they cannot be used for Parallel Computing since the calls are blocking in nature. This is where the Fork/Join framework introduced in Java7 plays a vital role. This article explained with a very simple example the concurrent and parallelism programming concepts and concluded with a comparison in results of both approaches. 

Acknowledgements

The authors would like to sincerely thank Mr. Subrahmanya SV, VP, ECOM Research Group, E&R for the guidance, support and constant encouragement and Mr. Piram for kindly reviewing the article.

Nitin KL. Nitin works as a Technology Lead at Infosys Ltd. He has a very good knowledge of Java EE technologies. He is involved in design and development of Java EE applications using Hibernate, iBATIS and JPA. He has written many online articles for Devx. Nitin can be reached at KL_Nitin@infosys.com.

Sangeetha S. S Sangeetha works as a Senior Technology Architect at the E-Commerce Research Labs at Infosys Ltd. She has over 14 years of experience in design and development of Java and Java EE applications. She has co-authored a book on ‘J2EE Architecture’ and also has written numerous online articles on Java for JavaWorld, java.net, Devx. She can be reached at sangeethas@infosys.com.

Get the Free Newsletter!

Subscribe to Developer Insider for top news, trends & analysis

Latest Posts

Related Stories