
Going Parallel with the Task Parallel Library and PLINQ

  • June 30, 2009
  • By Jani Järvinen

Exploring the Task Parallel Library

For managed-code developers, threading is not new. The System.Threading namespace already contains the Thread and ThreadPool classes, which are effective options for enabling threading. Even so, many developers are not comfortable handling locks, semaphores, and other synchronization mechanisms, or with the pitfalls that come with them, such as deadlocks and race conditions.

The TPL aims to solve these problems. The main idea of the library is to present developers with the concept of a task. A task can be any piece of code, such as a method call. Developers use these tasks to compose their applications. Behind the scenes, the framework manages a thread pool, decides how many tasks may run in parallel, and even helps you synchronize the tasks if needed. Tasks provide developers with a higher level of abstraction than plain threads.

In addition to task-related classes, the TPL contains classes to help parallelize for and foreach loops. Using the Parallel class, it's easy to modify existing loops to run iterations in parallel. Of course, developers can't just blindly replace all existing loops with parallel versions: the TPL can't guarantee that adding multithreading won't alter the meaning of the code. Instead, you should think of the TPL as a tool that's available, and—just like any other tool—you have to take some responsibility for using it appropriately; the TPL just makes some things much easier than they were before.

It's still up to developers to find the best spots to use the TPL. To do that, you need to understand where the TPL classes might be useful. This becomes easier if you master two concepts: task parallelism and data parallelism.

In task parallelism, multiple (but not necessarily similar) tasks run concurrently in the system. Data parallelism is a bit different: multiple similar operations run in parallel, but each one processes a different set of data. Roughly put, the Task class supports task parallelism and the Parallel class supports data parallelism, although technically speaking the division isn't that strict.

Consider an example consisting of tasks A, B, and C. Running these tasks simultaneously would be an example of task parallelism. On the other hand, suppose you had an array of 10 elements, and you execute code that processes the first half (elements 0–4) in one thread, and the second half (elements 5–9) using another thread. That's an example of data parallelism (see Figure 4).
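Using the TPL types introduced below, that distinction could be sketched roughly like this (TaskA, TaskB, TaskC, and ProcessRange are hypothetical placeholder methods, not part of the article's sample code):

// Task parallelism: three different operations run concurrently.
Parallel.Invoke(TaskA, TaskB, TaskC);

// Data parallelism: the same operation runs over two halves of the data.
int[] data = new int[10];
Task first = Task.Factory.StartNew(() => ProcessRange(data, 0, 4));
Task second = Task.Factory.StartNew(() => ProcessRange(data, 5, 9));
Task.WaitAll(first, second);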

The Task class lives in the new System.Threading.Tasks namespace. To create a new task object, you can either construct an instance of the Task class directly, or use the TaskFactory class (accessible through the Task.Factory property).

Author's Note: When using tasks, it's assumed that you are familiar with C# 3.0's lambda expressions and the => operator.
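For instance, both construction styles mentioned above accept a lambda expression; here is a minimal sketch (DoWork is a hypothetical method):

// Construct the task first, then start it explicitly:
Task t1 = new Task(() => DoWork());
t1.Start();

// Or create and start it in one step through the factory:
Task t2 = Task.Factory.StartNew(() => DoWork());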

You can parallelize loops easily using the Parallel class, also part of the TPL, defined in the System.Threading namespace. The Parallel class exposes static methods such as For and ForEach, as well as a method called Invoke.

Assume that you had the following for loop:

int[] numbers = new int[50];
for (int i = 0; i < numbers.Length; i++)
{
   numbers[i] = i * i;
} 

To parallelize this loop, you would change the code to look like this:

int[] numbers = new int[50];
Parallel.For(0, numbers.Length,
  i => numbers[i] = i * i); 

As you can see, the code does not need to change much, but those small changes affect the behind-the-scenes inner workings dramatically. Here's a disassembly of the generated code:

// Code size 49 (0x31)
.maxstack  5
.locals init ([0] class ParallelTest.MainForm/
  '<>c__DisplayClass1' 'CS$<>8__locals2')
IL_0000:  newobj instance void ParallelTest.MainForm/
  '<>c__DisplayClass1'::.ctor()
IL_0005:  stloc.0
IL_0006:  nop
IL_0007:  ldloc.0
IL_0008:  ldc.i4.s   50
IL_000a:  newarr [mscorlib]System.Int32
IL_000f:  stfld int32[] ParallelTest.MainForm/
  '<>c__DisplayClass1'::numbers
IL_0014:  ldc.i4.0
IL_0015:  ldloc.0
IL_0016:  ldfld int32[] ParallelTest.MainForm/
  '<>c__DisplayClass1'::numbers
IL_001b:  ldlen
IL_001c:  conv.i4
IL_001d:  ldloc.0
IL_001e:  ldftn instance void ParallelTest.MainForm/
  '<>c__DisplayClass1'::'<button2_Click>b__0'(int32)
IL_0024:  newobj instance void class [mscorlib]System.
  Action`1<int32>::.ctor(object, native int)
IL_0029:  call valuetype [mscorlib]System.Threading.
  ParallelLoopResult [mscorlib]System.Threading.
  Parallel::For(int32, int32,
  class [mscorlib]System.Action`1<int32>)
IL_002e:  pop
IL_002f:  nop
IL_0030:  ret 
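The ForEach method works in much the same way over any enumerable collection. A minimal sketch (the names array is a hypothetical example, not part of the article's sample project):

string[] names = { "Ada", "Grace", "Linus" };
Parallel.ForEach(names, name =>
{
   // Each iteration may run on a different thread pool thread.
   Console.WriteLine(name.ToUpper());
});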

The Parallel.Invoke method lets you start multiple operations quickly via a single statement, either by calling methods directly, as shown in the beginning of the article, or using anonymous methods as shown below:

int a = 123;
int b = 234;
int c = 345;
Parallel.Invoke(
  () =>
  {
    if (PrimeNumbers.IsPrime(a))
    {
        PrimeFound();
    }
  },
  () =>
  {
    if (PrimeNumbers.IsPrime(b))
    {
        PrimeFound();
    }
  },
  () =>
  {
    if (PrimeNumbers.IsPrime(c))
    {
        PrimeFound();
    }
  }); 

Being able to launch parallel operations is very useful, but you will often need to control those operations as well. The TPL supports waiting on tasks and parallel operations to finish. It's also possible to cancel currently executing tasks. In addition, you can chain tasks together so that when one task completes, the TPL automatically continues with the next task in the chain.
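Cancellation is not demonstrated in this article's samples, but a rough sketch using the CancellationTokenSource and CancellationToken types could look like the following (the exact cancellation API may differ in the beta builds this article is based on; DoSomeWork is a hypothetical helper):

CancellationTokenSource cts = new CancellationTokenSource();
Task worker = Task.Factory.StartNew(() =>
{
   for (int i = 0; i < 100000; i++)
   {
      // Stop cooperatively as soon as cancellation is requested.
      cts.Token.ThrowIfCancellationRequested();
      DoSomeWork(i);
   }
}, cts.Token);

cts.Cancel();
try
{
   worker.Wait();
}
catch (AggregateException)
{
   // The wait surfaces the cancellation as an exception.
}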

The Task.WaitAll and Task.WaitAny methods let you wait until all (or any one) of the given tasks complete. For example, note the two different ways to start tasks in the following code:

Task t1 = new Task(MyMethod1);
t1.Start();
// alternative construction method for tasks:
Task t2 = Task.Factory.StartNew(MyMethod2);
Task t3 = Task.Factory.StartNew(MyMethod3);
Task.WaitAll(t1, t2, t3);

Waiting is useful not only for synchronization, but also when a task returns some value, such as a calculation result. In this case, your code could look similar to this:

internal int Calculate(int number)
{
   return number * number + 123;
}
private void button6_Click(object sender, EventArgs e)
{
   Task<int>[] tasks = new Task<int>[] {
      Task<int>.Factory.StartNew(() => Calculate(1)),
      Task<int>.Factory.StartNew(() => Calculate(2)),
      Task<int>.Factory.StartNew(() => Calculate(3))
   };
   int index = Task.WaitAny(tasks);
   MessageBox.Show("Task " + index + " finished first, " +
      "result = " + tasks[index].Result);
} 

To start one task after another, use the ContinueWith method, which supports efficient task chaining.
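For example, a continuation that displays the result of the Calculate method shown above once the first task finishes could be sketched like this:

Task<int> first = Task<int>.Factory.StartNew(() => Calculate(5));
// The continuation receives the completed antecedent task as its argument.
Task continuation = first.ContinueWith(t =>
   MessageBox.Show("Calculation result = " + t.Result));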

Because parallel tasks are still regular code, you need to take possible exceptions into account. You can still handle these with the standard try-catch and try-finally constructs, but if you want to catch exceptions from the code that launched the tasks, you have two new options: you can either protect your WaitAll and WaitAny calls with proper exception handling (more on this in just a second), or you can examine the Exception property of each Task object.

When a task raises an exception that the task itself does not handle, the Task Parallel Library stores the exception in the Task object's Exception property. It also collects the exception into an internal list. In other words, if your code waits for a task and exceptions occur, the wait method raises an exception as well.

Because multiple tasks could raise exceptions simultaneously (and you might be waiting on all of them with WaitAll), the TPL does not directly rethrow the exceptions that the tasks raised. Instead, it throws an AggregateException object, which in turn contains the exceptions raised by the tasks as inner exceptions. You can then loop through the inner exceptions and handle them as appropriate.
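For instance, catching exceptions around a wait call could be sketched like this:

Task failing = Task.Factory.StartNew(() =>
{
   throw new InvalidOperationException("Demo failure");
});
try
{
   failing.Wait();
}
catch (AggregateException ex)
{
   foreach (Exception inner in ex.InnerExceptions)
   {
      MessageBox.Show(inner.Message);
   }
}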




