What Are Sequential vs. Parallel Streams in Java?

  • July 30, 2018
  • By Manoj Debnath

Java can parallelize stream operations to leverage multi-core systems. This article provides a perspective on sequential and parallel streams and shows, with examples, how a parallel stream can improve performance.

Streams in Java

A stream in Java is a sequence of objects represented as a conduit of data. It usually has a source where the data is situated and a destination where it is transmitted. Note that a stream is not a repository; instead, it operates on a data source such as an array or a collection. The in-between part of this passage is what is called the stream. During transmission, the stream usually goes through one or more transformations, such as filtering or sorting, or any other operation on the data. This shapes the original data into a different form, typically according to the need of the programmer. Each such operation therefore yields a new stream rather than changing the source. For example, when a stream is sorted, the result is a new stream that produces its elements in sorted order. The new data is a transformed copy of the original; the original data source stays in its original form.
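The point that a stream operates on a data source without altering it can be checked with a small sketch. The class name StreamSourceDemo and the helper sortedCopy() are illustrative names, not part of the article's own programs.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class StreamSourceDemo {

   // Returns a new, sorted list; the source list is only read, never modified.
   static List<Integer> sortedCopy(List<Integer> source) {
      return source.stream()
         .sorted()
         .collect(Collectors.toList());
   }

   public static void main(String[] args) {
      List<Integer> source = Arrays.asList(5, 3, 8, 1);
      List<Integer> sorted = sortedCopy(source);

      System.out.println("source: " + source);   // source: [5, 3, 8, 1]
      System.out.println("sorted: " + sorted);   // sorted: [1, 3, 5, 8]
   }
}
```

The source list keeps its original order, whereas the collected result is a separate, sorted list.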

Sequential Stream

Any stream operation in Java, unless explicitly specified as parallel, is processed sequentially. Sequential streams are non-parallel streams that use a single thread to process their pipeline. On their own, they do not take advantage of a multicore system, even if the underlying system supports parallel execution. That single thread runs on one core at a time, although the operating system may move it from one core to another unless it is explicitly pinned to a specific core. Multithreading and parallel processing are also not the same thing: four threads sharing one core merely take turns, whereas four threads on four cores genuinely run at the same time, and the former is no match for the latter. It is quite possible to execute multiple threads in a single-core environment, but parallel processing is a different genre altogether. A program needs to be designed from the ground up for parallel programming, apart from executing in an environment that supports it. This is the reason parallel programming is a complex arena.

Let's try an example to illustrate the idea further.

package org.mano.example;

import java.util.Arrays;
import java.util.List;

public class Main2 {
   public static void main(String[] args) {
      List<Integer> list=Arrays.asList(1,2,3,4,5,6,7,8,9);
      list.stream().forEach(System.out::println);
      System.out.println();
      list.parallelStream().forEach(System.out::println);
   }
}

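Output (println() prints each number on its own line; the digits are shown run together here for brevity, and the order of the second group varies from run to run)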

123456789
685973214

This example illustrates a sequential stream as well as a parallel stream in operation. list.stream() works in sequence on a single thread with the println() operation. list.parallelStream(), on the other hand, is processed in parallel and can take advantage of the underlying multicore environment. The interesting aspect is the output of the preceding program. In the case of the sequential stream, the content of the list is printed in order. The output of the parallel stream, on the other hand, is unordered, and the sequence may change every time the program is run. This signifies at least one thing: invoking list.parallelStream() makes the println statement operate in multiple threads, something that list.stream() does in a single thread.
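One way to observe the threads involved is to record the name of the thread that handles each element. The following is a minimal sketch: the class ThreadNamesDemo and its two helper methods are illustrative names, and the number of threads reported for the parallel stream depends on the machine and on how the runtime splits the work.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

class ThreadNamesDemo {

   // Names of the threads that process the elements of a sequential stream.
   static Set<String> sequentialThreads(List<Integer> list) {
      // A concurrent set, so the same pattern is also safe in the parallel case
      Set<String> names = ConcurrentHashMap.newKeySet();
      list.stream()
         .forEach(n -> names.add(Thread.currentThread().getName()));
      return names;
   }

   // Names of the threads that process the elements of a parallel stream.
   static Set<String> parallelThreads(List<Integer> list) {
      Set<String> names = ConcurrentHashMap.newKeySet();
      list.parallelStream()
         .forEach(n -> names.add(Thread.currentThread().getName()));
      return names;
   }

   public static void main(String[] args) {
      List<Integer> list = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8, 9);
      System.out.println("sequential: " + sequentialThreads(list));
      System.out.println("parallel:   " + parallelThreads(list));
   }
}
```

The sequential run reports only the calling thread. The parallel run typically reports the calling thread together with worker threads of the common fork/join pool, although on a single-core machine it may report just one name.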

Parallel Stream

The primary motivation behind using a parallel stream is to make stream processing a part of parallel programming, even if the whole program is not parallelized. Parallel streams leverage multicore processors, which can result in a substantial increase in performance for suitable workloads. Parallel programming in general is complex and error prone. The Java stream library, however, provides the ability to do it easily and in a reliable manner: we invoke a few methods and the rest is taken care of. There are a couple of ways to do it. One is to obtain a parallel stream by invoking the parallelStream() method defined by Collection. Another is to invoke the parallel() method defined by BaseStream on a sequential stream, which parallelizes that stream. Note that the underlying platform must support parallel execution, such as a multicore system; otherwise, there is little point in the invocation, because the stream would effectively be processed in sequence even though we have made the call. If parallel() is invoked on a stream that is already parallel, it does nothing and simply returns the stream.
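The two ways of obtaining a parallel stream can be compared with the isParallel() method, which BaseStream also defines. This is a minimal sketch; the class ParallelSwitchDemo and its helper methods are illustrative names.

```java
import java.util.Arrays;
import java.util.List;

class ParallelSwitchDemo {

   // A stream obtained with stream() is sequential.
   static boolean plainStreamIsParallel(List<String> list) {
      return list.stream().isParallel();
   }

   // Way 1: ask the collection for a parallel stream directly.
   static boolean viaParallelStream(List<String> list) {
      return list.parallelStream().isParallel();
   }

   // Way 2: switch a sequential stream to parallel mode.
   static boolean viaParallel(List<String> list) {
      return list.stream().parallel().isParallel();
   }

   public static void main(String[] args) {
      List<String> names = Arrays.asList("FFF", "AAA", "DDD");

      System.out.println("stream():            " + plainStreamIsParallel(names));
      System.out.println("parallelStream():    " + viaParallelStream(names));
      System.out.println("stream().parallel(): " + viaParallel(names));

      // Number of processors the JVM can use for parallel work
      System.out.println("available processors: "
         + Runtime.getRuntime().availableProcessors());
   }
}
```

For an ordinary list such as this one, the first line reports false and the other two report true. The processor count gives a rough idea of how much real parallelism the platform can offer.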

To ensure that the result of parallel processing is the same as the result obtained through sequential processing, the operations applied to a parallel stream must be stateless and non-interfering, and any function used to combine results, as in reduce(), must be associative.
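The associativity requirement can be seen by comparing addition, which is associative, with subtraction, which is not. The sketch below uses illustrative names (AssociativityDemo and its helpers). The parallel result of the subtraction is deliberately left without an expected value because it depends on how the runtime splits the data.

```java
import java.util.Arrays;
import java.util.List;

class AssociativityDemo {

   // Addition is associative, so the parallel result matches the sequential one.
   static int parallelSum(List<Integer> list) {
      return list.parallelStream().reduce(0, Integer::sum);
   }

   // Sequential subtraction is well defined: ((0 - 1) - 2) - 3 ...
   static int sequentialDifference(List<Integer> list) {
      return list.stream().reduce(0, (a, b) -> a - b);
   }

   // Subtraction is NOT associative: the result depends on the partitioning.
   static int parallelDifference(List<Integer> list) {
      return list.parallelStream().reduce(0, (a, b) -> a - b);
   }

   public static void main(String[] args) {
      List<Integer> list = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8, 9);

      System.out.println("parallel sum:          " + parallelSum(list));          // 45
      System.out.println("sequential difference: " + sequentialDifference(list)); // -45
      System.out.println("parallel difference:   " + parallelDifference(list));   // may vary
   }
}
```

The parallel sum is reliable because the partial sums can be combined in any grouping. The parallel difference is not, which is why a non-associative function should not be passed to reduce() on a parallel stream.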

A Quick Example

package org.mano.example;

import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;
import java.util.stream.Collectors;

public class Main {

   public static void main(String[] args) {
      List<Employee> employees = Arrays.asList(
         new Employee(1276, "FFF",2000.00),
         new Employee(7865, "AAA",1200.00),
         new Employee(4975, "DDD",3000.00),
         new Employee(4499, "CCC",1500.00),
         new Employee(9937, "GGG",2800.00),
         new Employee(5634, "HHH",1100.00),
         new Employee(9276, "BBB",3200.00),
         new Employee(6852, "EEE",3400.00));

      System.out.println("Original List");
      printList(employees);

      // Using sequential stream
      long start = System.currentTimeMillis();
      List<Employee> sortedItems = employees.stream()
         .sorted(Comparator
            .comparing(Employee::getName))
         .collect(Collectors.toList());
      long end = System.currentTimeMillis();

      System.out.println("sorted using sequential stream");
      printList(sortedItems);
      System.out.println("Total time taken to process: "
         + (end - start) + " millisec.");

      // Using parallel stream
      start = System.currentTimeMillis();
      List<Employee> anotherSortedItems = employees
         .parallelStream().sorted(Comparator
            .comparing(Employee::getName))
         .collect(Collectors.toList());
      end = System.currentTimeMillis();

      System.out.println("sorted using parallel stream");
      printList(anotherSortedItems);
      System.out.println("Total time taken to process: "
         + (end - start) + " millisec.");


      // Sum all salaries with a parallel reduction
      double totsal = employees.parallelStream()
         .map(e -> e.getSalary())
         .reduce(0.00, (a1, a2) -> a1 + a2);
      System.out.println("Total Salary expense: " + totsal);

      // Find the employee with the highest salary
      Optional<Employee> maxSal = employees.parallelStream()
         .reduce((Employee e1, Employee e2) ->
            e1.getSalary() < e2.getSalary() ? e2 : e1);
      if (maxSal.isPresent())
         System.out.println(maxSal.get().toString());
   }

   public static void printList(List<Employee> list) {
      for (Employee e : list)
         System.out.println(e.toString());
   }
}


package org.mano.example;

public class Employee {
   private int empid;
   private String name;
   private double salary;

   public Employee() {
      super();
   }

   public Employee(int empid, String name,
         double salary) {
      super();
      this.empid = empid;
      this.name = name;
      this.salary = salary;
   }

   public int getEmpid() {
      return empid;
   }

   public void setEmpid(int empid) {
      this.empid = empid;
   }

   public String getName() {
      return name;
   }

   public void setName(String name) {
      this.name = name;
   }

   public double getSalary() {
      return salary;
   }

   public void setSalary(double salary) {
      this.salary = salary;
   }

   @Override
   public String toString() {
      return "Employee [empid=" + empid + ", name="
         + name + ", salary=" + salary + "]";
   }
}

In the previous code, note how we first applied sorting to a stream by using sequential execution.

List<Employee> sortedItems = employees.stream()
               .sorted(Comparator
               .comparing(Employee::getName))
               .collect(Collectors.toList());

Parallel execution is achieved by changing the code slightly.

List<Employee> anotherSortedItems = employees
               .parallelStream().sorted(Comparator
               .comparing(Employee::getName))
               .collect(Collectors.toList());

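We also compare the elapsed system time of the two versions to get an idea of which part of the code takes more time. With a list this small, the difference is unlikely to be meaningful, because the overhead of parallel execution can outweigh any gain. Parallel operation begins once the parallel stream is explicitly obtained through the parallelStream() method. Another interesting method is reduce(). When we apply this method to a parallel stream, the partial results can be computed in different threads and then combined.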
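How the work of reduce() can be divided among threads is easier to see in its three-argument form, where a separate combiner merges the partial results. The sketch below is one possible way to total the salaries, not the form used in the program above. The class ReduceDemo and its nested Employee class are cut-down stand-ins introduced here so that the example is self-contained.

```java
import java.util.Arrays;
import java.util.List;

class ReduceDemo {

   // Minimal stand-in for the article's Employee class (assumption).
   static class Employee {
      private final String name;
      private final double salary;

      Employee(String name, double salary) {
         this.name = name;
         this.salary = salary;
      }

      String getName() { return name; }
      double getSalary() { return salary; }
   }

   // identity:    starting value for every partition
   // accumulator: folds one employee into a partition's running total
   // combiner:    merges the totals of two partitions
   static double totalSalary(List<Employee> employees) {
      return employees.parallelStream()
         .reduce(0.0,
            (sum, e) -> sum + e.getSalary(),
            Double::sum);
   }

   public static void main(String[] args) {
      List<Employee> employees = Arrays.asList(
         new Employee("FFF", 2000.00),
         new Employee("AAA", 1200.00),
         new Employee("DDD", 3000.00));

      System.out.println("Total Salary expense: "
         + totalSalary(employees));   // Total Salary expense: 6200.0
   }
}
```

Each partition starts from the identity value and accumulates its own subtotal, possibly on its own thread; the combiner then adds the subtotals together.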

However, we can always switch between parallel and sequential as needed. To change a parallel stream to sequential, we invoke the sequential() method specified by BaseStream. As we saw in our first program, a sequential stream processes elements in the order in which the data source supplies them. This, however, is not the situation in the case of parallel streams. To boost performance, each partition of the stream is processed independently of the other partitions without any coordination, so the order in which forEach() acts on the elements is unpredictable. If we specifically want an operation to be performed on each element of a parallel stream in order, we can use the forEachOrdered() method, which is an alternative to the forEach() method.
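Both methods can be tried out in a short sketch. The class OrderedDemo and its helpers are illustrative names. The output of the forEach() line is intentionally not annotated because it can differ between runs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class OrderedDemo {

   // forEachOrdered() visits the elements in the order of the source,
   // even though the stream itself is parallel.
   static List<Integer> collectInOrder(List<Integer> source) {
      List<Integer> out = Collections.synchronizedList(new ArrayList<>());
      source.parallelStream().forEachOrdered(out::add);
      return out;
   }

   // sequential() switches a parallel stream back to sequential mode.
   static boolean parallelAfterSequential(List<Integer> source) {
      return source.parallelStream().sequential().isParallel();
   }

   public static void main(String[] args) {
      List<Integer> list = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8, 9);

      System.out.print("forEach:        ");
      list.parallelStream().forEach(System.out::print);   // order may vary
      System.out.println();

      System.out.print("forEachOrdered: ");
      list.parallelStream().forEachOrdered(System.out::print);   // 123456789
      System.out.println();

      System.out.println("still parallel after sequential(): "
         + parallelAfterSequential(list));   // false
   }
}
```

The ordering guarantee of forEachOrdered() comes at a price: it can give up some of the benefit of parallelism, so it is best reserved for cases where the order really matters.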

Conclusion

The Stream API has been a part of Java since Java 8, and its support for parallel processing is a very welcome and, at the same time, quite intriguing feature. This is particularly true because modern machines are multicore and there is a perception that parallel programming design is complex. The APIs provided by Java make it possible to incorporate a touch of parallel programming into a Java program whose overall design is sequential. This is perhaps the best part of this feature.





