
Going Parallel with the Task Parallel Library and PLINQ

Developer.com content and product recommendations are editorially independent. We may make money when you click on links to our partners.

When developing applications, most developers tend to think linearly through the logical steps needed to complete a task. While sequential thinking leads to working applications that are relatively easy to understand, such single-threaded applications are not able to benefit from the multiple cores in today’s processors.

Before the multi-core era began, Intel and AMD launched faster processors each year, with ever-increasing clock speeds. Effectively, this meant that the same application code simply ran faster on each processor generation—a real-world case of a free lunch.

However, limitations in current processor technology mean that the fastest clock speeds top out at around 3 GHz. Yet manufacturers still need to come up with faster and faster processors to meet demand. Because raising the clock speed is (currently) out of the question, the only way to increase performance significantly is to increase the number of cores or execution units in the chips. These multi-core processors are then able to execute instructions in parallel, which provides more speed. Today’s two- and four-core processors are only the beginning; in the future, 16-, 32-, and 64-core systems will be commonly available.

But unlike with increasing clock speed, as vendors add multiple cores, your application will not automatically run faster if you just sit on your laurels. The free lunch is over. Because most .NET applications are single-threaded by default (although they may use additional threads for such things as database connection pools), your application code will still run on a single core. For example, if you run a single-threaded .NET application on a PC with a quad-core processor, it will run on one core while the three other cores sit idle.

Surely, a quad-core processor is still able to run multiple applications faster than a traditional single-core processor? To some degree, that’s true, because the Windows thread scheduler can assign different processes to run on different cores. (The same thing would happen if you had multiple processors with a single core each.)

However, to make full use of the multiple cores found even in mainstream PCs these days, you need to make your application use more than one thread. That way, the operating system can schedule your application’s threads onto multiple cores for simultaneous execution. You need two separate skills to do this: one is the ability to identify places where threading can help improve performance; the other is implementing that behavior.

Speaking of implementation, introducing multiple threads into an application is often easier said than done. In fact, using threads properly has been one of programming’s most difficult tasks, at least until now. Although .NET has provided threading support since version 1.0, using the Thread class and the low-level locking mechanisms correctly requires skill that not all developers have.

To help more developers benefit from today’s processors, Microsoft is planning to include support for easier threading in the forthcoming version 4.0 of the .NET Framework. For example, the new libraries support running for and foreach loop iterations in parallel with only small alterations to your code. Similarly, you can use a parallel version of LINQ to help boost the performance of your queries.

This article discusses the new parallel programming features available in the future releases of Visual Studio 2010 and .NET 4.0.

Author’s Note: Both the code and information in this article are based on the Beta 1 release of Visual Studio and .NET 4.0, which of course are subject to change in later releases. Still, the concepts discussed here should remain valid even in the final RTM version.

Understanding the New Features in .NET 4.0

When planning the next version of the .NET Framework, one key design consideration was to let developers harness the power of the current processors more easily (see Figure 1). The results of this planning and development work have culminated in a new concurrency runtime with supporting APIs. Both will be available to developers when Visual Studio 2010 and .NET 4.0 are released to manufacturing.

For .NET developers, the new API classes are probably the most interesting new features. The parallel API can further be divided into two parts: the Task Parallel Library (TPL) and Parallel LINQ (PLINQ). Both features help developers use processors more fully. You can think of the Task Parallel Library as a general-purpose set of parallel capabilities. PLINQ, by contrast, focuses on parallelizing LINQ queries over data held in memory (LINQ to Objects).

Having additional parallelism support in the .NET Framework is great in itself, but the story gets better once you bring Visual Studio’s IDE into the mix. Visual Studio has long had windows to help debug threaded applications, but the new features in Visual Studio 2010 are aimed squarely at developers using the new parallel APIs.

For instance, Visual Studio 2010 has a new window called Parallel Tasks, which can show all tasks running at a given point in time (see Figure 2).

Another new IDE window shows stacks in a new way, referred to as the “cactus view” (see Figure 3), which can help when debugging applications that perform parallelization through the Task Parallel Library. You will also get access to new performance measurement tools that can help you spot bottlenecks in your code.

When designing the Task Parallel Library and PLINQ, Microsoft focused on making the features intuitive to use. For example, to run three simple tasks in parallel, you can use the Task Parallel Library as follows:

Parallel.Invoke(
  () => MyMethod1(),
  () => MyMethod2(),
  () => MyMethod3());

Looks easy! Next, assume you had a traditional LINQ query like this:

int[] numbers = new int[50];
...
var over100 = from n in numbers
              where n > 100
              select n; 

To convert this query to a parallelized PLINQ version, simply call the AsParallel method on the query’s data source:

var over100 = from n in numbers.AsParallel()
              where n > 100
              select n;

Again, that’s quite simple. After the change, PLINQ will attempt to parallelize the query, taking into account the number of processors (or processor cores) available. Although the preceding query is for illustration only (it actually wouldn’t benefit much from parallelization), you’d make the AsParallel method call the same way for more complex queries that would benefit more. But before going into PLINQ specifics, it’s worth exploring the TPL.

Exploring the Task Parallel Library

For managed code developers, threading is not new. The System.Threading namespace contains Thread and ThreadPool classes: effective options for enabling threading. Even so, many developers are not comfortable handling locks, semaphores, and other synchronization mechanisms, and might thus not be familiar with concepts such as deadlocks or race conditions.

The TPL aims to solve these problems. The main idea of the library is to present developers with the concept of a task. A task can be any piece of code, such as a method call. Developers use these tasks to compose their applications. Behind the scenes, the framework manages a thread pool, decides how many tasks may run in parallel, and even helps you synchronize the tasks if needed. Tasks provide developers with a higher level of abstraction than plain threads.

In addition to task-related classes, the TPL contains classes to help parallelize for and foreach loops. Using the Parallel class, it’s easy to modify existing loops to run iterations in parallel. Of course, developers can’t just blindly replace all existing loops with parallel versions: the TPL can’t guarantee that adding multithreading won’t alter the meaning of the code. Instead, you should think of the TPL as a tool that’s available, and—just like any other tool—you have to take some responsibility for using it appropriately; the TPL just makes some things much easier than they were before.

It’s still up to developers to find the best spots to use the TPL. To do that, you need to understand where the TPL classes might be useful. This becomes easier if you master two concepts: task parallelism and data parallelism.

In task parallelism, multiple (but not necessarily similar) tasks run concurrently in the system. Data parallelism is a bit different: multiple similar operations run in parallel, but each unit processes a different set of data. Roughly put, the Task class supports task parallelism, and the Parallel class supports data parallelism. Technically speaking, though, that’s not a strict rule.

Consider an example consisting of tasks A, B, and C. Running these tasks simultaneously would be an example of task parallelism. On the other hand, suppose you had an array of 10 elements, and you execute code that processes the first half (elements 0–4) in one thread, and the second half (elements 5–9) using another thread. That’s an example of data parallelism (see Figure 4).
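The following minimal sketch shows both kinds of parallelism side by side. It is written as a modern C# top-level program against the released TPL API, not Beta 1. The array and the three aggregate computations are hypothetical. The first part runs three different operations at once, which is task parallelism. The second part applies the same squaring operation to elements 0 to 4 in one task and 5 to 9 in another, which is data parallelism and mirrors the split described above.

```csharp
using System;
using System.Threading.Tasks;

int[] data = { 3, 1, 4, 1, 5, 9, 2, 6, 5, 3 };

// Task parallelism: three *different* operations run concurrently.
// Each lambda writes only to its own variable, so no locking is needed.
int sum = 0, product = 1, max = int.MinValue;
Parallel.Invoke(
    () => { foreach (int x in data) sum += x; },
    () => { foreach (int x in data) product *= x; },
    () => { foreach (int x in data) max = Math.Max(max, x); });

// Data parallelism: the *same* operation on different halves of the array.
int[] squares = new int[data.Length];
int half = data.Length / 2;
Task firstHalf = Task.Factory.StartNew(() =>
{
    for (int i = 0; i < half; i++) squares[i] = data[i] * data[i];
});
Task secondHalf = Task.Factory.StartNew(() =>
{
    for (int i = half; i < data.Length; i++) squares[i] = data[i] * data[i];
});
Task.WaitAll(firstHalf, secondHalf);

Console.WriteLine($"sum={sum}, product={product}, max={max}");
Console.WriteLine(string.Join(",", squares));
```

The two halves never touch the same array element, so the data-parallel part needs no synchronization.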

The Task class lives in the new System.Threading.Tasks namespace. To create a new task object, you can either construct an instance of the Task class directly, or use the TaskFactory class (accessible through the Task.Factory property).

Author’s Note: When using tasks, it’s assumed that you are familiar with C# 3.0’s lambda expressions and the => operator.

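You can parallelize loops easily using the Parallel class, which is also part of the TPL. In Beta 1, Parallel is defined in the System.Threading namespace, as the disassembly below shows. In the final .NET 4.0 release, it moved to System.Threading.Tasks. The Parallel class exposes static methods such as For and ForEach, as well as a method called Invoke.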

Assume that you had the following for loop:

int[] numbers = new int[50];
for (int i = 0; i < numbers.Length; i++)
{
   numbers[i] = i * i;
} 

To parallelize this loop, you would change the code to look like this:

int[] numbers = new int[50];
Parallel.For(0, numbers.Length,
  i => numbers[i] = i * i); 

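As you can see, the code does not need to change much, but those small changes affect the behind-the-scenes inner workings dramatically. Here’s the IL disassembly of the method that contains the loop. Note how the compiler moves the numbers array into a generated closure class (<>c__DisplayClass1) so that the lambda passed to Parallel.For can reach it: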

// Code size 49 (0x31)
.maxstack  5
.locals init ([0] class ParallelTest.MainForm/'<>c__DisplayClass1' 'CS$<>8__locals2')
IL_0000:  newobj instance void ParallelTest.MainForm/'<>c__DisplayClass1'::.ctor()
IL_0005:  stloc.0
IL_0006:  nop
IL_0007:  ldloc.0
IL_0008:  ldc.i4.s   50
IL_000a:  newarr [mscorlib]System.Int32
IL_000f:  stfld int32[] ParallelTest.MainForm/'<>c__DisplayClass1'::numbers
IL_0014:  ldc.i4.0
IL_0015:  ldloc.0
IL_0016:  ldfld int32[] ParallelTest.MainForm/'<>c__DisplayClass1'::numbers
IL_001b:  ldlen
IL_001c:  conv.i4
IL_001d:  ldloc.0
IL_001e:  ldftn instance void ParallelTest.MainForm/'<>c__DisplayClass1'::'<button2_Click>b__0'(int32)
IL_0024:  newobj instance void class [mscorlib]System.Action`1<int32>::.ctor(object, native int)
IL_0029:  call valuetype [mscorlib]System.Threading.ParallelLoopResult [mscorlib]System.Threading.Parallel::For(int32, int32, class [mscorlib]System.Action`1<int32>)
IL_002e:  pop
IL_002f:  nop
IL_0030:  ret
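Parallel.ForEach follows the same pattern for any IEnumerable<T>. The minimal sketch below uses modern C# and the released .NET 4.0+ API, and the word list is hypothetical. It also shows something the for loop above did not need: when iterations share state, every update to that state must be thread-safe. Here that is done with Interlocked.Add, which does not appear in the original text.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical input: we want the total number of characters.
List<string> words = new List<string> { "task", "parallel", "library", "plinq" };
int totalLength = 0;

Parallel.ForEach(words, word =>
{
    // Iterations may run on different threads at the same time, so the
    // shared counter is updated atomically instead of with "+=".
    Interlocked.Add(ref totalLength, word.Length);
});

Console.WriteLine(totalLength);
```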

The Parallel.Invoke method lets you start multiple operations quickly via a single statement, either by calling methods directly, as shown in the beginning of the article, or using anonymous methods as shown below:

int a = 123;
int b = 234;
int c = 345;
Parallel.Invoke(
  () =>
  {
    if (PrimeNumbers.IsPrime(a))
    {
        PrimeFound();
    }
  },
  () =>
  {
    if (PrimeNumbers.IsPrime(b))
    {
        PrimeFound();
    }
  },
  () =>
  {
    if (PrimeNumbers.IsPrime(c))
    {
        PrimeFound();
    }
  }); 

While being able to launch parallel operations is indeed very useful, you will often need to control your parallel operations as well. The TPL supports waiting on tasks and parallel operations to finish. It’s also possible to cancel currently executing tasks. In addition, you can chain tasks together so that when one task completes, the TPL automatically continues with the next task in the chain.
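In the released .NET 4.0 API, cancellation is cooperative. You pass a CancellationToken, obtained from a CancellationTokenSource, to the task, and the task’s own code checks that token. The sketch below assumes that released API, which may differ from Beta 1, and uses a hypothetical simulated work loop.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

var cts = new CancellationTokenSource();
CancellationToken token = cts.Token;

// A long-running task that checks the token between units of work.
Task worker = Task.Factory.StartNew(() =>
{
    for (int i = 0; i < 1000; i++)
    {
        // Throws OperationCanceledException once Cancel() has been called.
        token.ThrowIfCancellationRequested();
        Thread.Sleep(10); // simulate a unit of work
    }
}, token);

cts.Cancel(); // request cancellation

bool canceled = false;
try
{
    worker.Wait();
}
catch (AggregateException ex)
{
    // Waiting on a canceled task surfaces a TaskCanceledException.
    canceled = ex.InnerException is TaskCanceledException;
}

Console.WriteLine($"Canceled: {canceled}, status: {worker.Status}");
```

The token is passed both to StartNew and to the loop body. Because of that, the task should end in the Canceled state whether Cancel() runs before or after it starts.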

The Task.WaitAll and Task.WaitAny methods let you wait until all (or any one) of the given tasks complete. For example, note the two different ways to start tasks in the following code:

Task t1 = new Task(MyMethod1);
t1.Start();
// alternative construction method for tasks:
Task t2 = Task.Factory.StartNew(MyMethod2);
Task t3 = Task.Factory.StartNew(MyMethod3);
Task.WaitAll(t1, t2, t3);

Waiting is useful not only for synchronization, but also when a task returns some value, such as a calculation result. In this case, your code could look similar to this:

internal int Calculate(int number)
{
   return number * number + 123;
}
private void button6_Click(object sender, EventArgs e)
{
   Task<int>[] tasks = new Task<int>[] {
      Task<int>.Factory.StartNew(() => Calculate(1)),
      Task<int>.Factory.StartNew(() => Calculate(2)),
      Task<int>.Factory.StartNew(() => Calculate(3))
   };
   int index = Task.WaitAny(tasks);
   MessageBox.Show("Task " + index + " finished first, " +
      "result = " + tasks[index].Result);
} 

To start one task after another, use the ContinueWith method, which supports efficient task chaining.
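Here is a minimal chaining sketch in modern C#, using the released .NET 4.0+ API. It borrows the number * number + 123 formula from the Calculate example; everything else is illustrative. The continuation receives the completed antecedent task as its argument, so it can read the antecedent’s Result without blocking.

```csharp
using System;
using System.Threading.Tasks;

// First task: compute a value (same formula as Calculate(5) in the example above).
Task<int> first = Task.Factory.StartNew(() => 5 * 5 + 123);

// Continuation: scheduled automatically when "first" completes.
// The antecedent task is passed in, so reading its Result does not block.
Task<string> second = first.ContinueWith(antecedent => "Result: " + antecedent.Result);

// Reading second.Result blocks until the whole chain has finished.
Console.WriteLine(second.Result);
```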

Because parallel tasks are still regular code, you need to take possible exceptions into account. Inside a task, you can still use the standard try-catch and try-finally constructs. If you want to catch exceptions in the code that launched the tasks, though, you have two options. You can protect your Wait, WaitAll, and WaitAny calls with proper exception handling (more on this in just a second), or you can examine the Exception property of each Task object.

When a task raises an exception that the task itself does not handle, the Task Parallel Library stores the exception in the Task object’s Exception property. It also keeps the exception so that it can be rethrown later. In other words, if your code waits for a task and that task has failed, the wait method raises an exception as well.

Multiple tasks could raise exceptions simultaneously, and you might be waiting on all of them with a single WaitAll call. For that reason, the TPL does not directly rethrow the exceptions that the tasks raised. Instead, it throws an AggregateException object, which contains the exceptions raised by the tasks as inner exceptions (available through its InnerExceptions property). You can then loop through the exceptions and handle them as appropriate.
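The following sketch uses modern C# and the released .NET 4.0+ API, with deliberately contrived failing tasks. It waits on three tasks, two of which throw, and then walks the AggregateException’s InnerExceptions collection. Do not rely on the order of the inner exceptions; the sketch sorts the messages before printing them.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

Task t1 = Task.Factory.StartNew(() => { throw new InvalidOperationException("first failure"); });
Task t2 = Task.Factory.StartNew(() => { throw new ArgumentException("second failure"); });
Task t3 = Task.Factory.StartNew(() => { /* completes normally */ });

var messages = new List<string>();
try
{
    // WaitAll waits for all three tasks, then throws because two of them failed.
    Task.WaitAll(t1, t2, t3);
}
catch (AggregateException ae)
{
    // One inner exception per failure, in no guaranteed order.
    foreach (Exception inner in ae.InnerExceptions)
    {
        messages.Add(inner.GetType().Name + ": " + inner.Message);
    }
}

messages.Sort();
messages.ForEach(Console.WriteLine);

// The failed task also keeps its own (aggregated) exception.
Console.WriteLine(t1.Exception?.InnerException?.Message);
```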

Parallelizing Queries with PLINQ

Parallel LINQ lets you parallelize LINQ to Objects queries. With PLINQ, you can take existing LINQ queries, and use the client computer’s number-crunching power to process the results. For example, suppose you had the following LINQ query, which retrieves data from an SQL Server database:

DataClasses1DataContext ctx =
    new DataClasses1DataContext();
var orders =
  (from ord in ctx.Orders
   orderby ord.Order_Details.Sum(
     o => o.UnitPrice) descending
   select new
   {
     Customer = ord.Customer.CompanyName,
     OrderId = ord.OrderID,
     Amount = ord.Order_Details.Sum(
       o => o.UnitPrice)
   }).Take(5);

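This query returns the five orders with the highest total unit price, meaning the sum of UnitPrice over each order’s detail lines. Using the preceding code, LINQ to SQL would construct the necessary SQL statement, send it to the server, and return the results to the application. To parallelize this query, you would add the AsParallel method call to the last line. You also need AsOrdered, because a PLINQ query does not preserve the source order unless asked to, and without it Take could pick any five orders:

...
}).AsParallel().AsOrdered().Take(5);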

After this change, query results processing will move from the database server to the client. This is important to understand: in addition to performance considerations, memory usage is also likely to increase on the client, because it must fetch more data from the database.

The original LINQ query produces a SQL SELECT statement containing a TOP clause. The PLINQ version’s SELECT statement has no TOP clause: it fetches all the orders from the database, and the narrowing down to five records happens on the client. This might or might not be what you are after.

In fact, when comparing the Task Parallel Library and PLINQ for database access, it is often easier to find scenarios that benefit from the TPL than from PLINQ. However, this is not to say that PLINQ isn’t useful; it just depends on the situation and your data source. As PLINQ is still a new technology, best practices have yet to form.

Still, it is clear that PLINQ is most useful when you already have a large set of data in memory on the client, and you wish to query this data using complex LINQ queries. In such cases, finding the correct records and doing calculations based on each record are jobs that can often benefit from threading. Splitting the work between multiple processor cores is the key.
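Here is a sketch of that kind of workload in modern C#. The IsPrime helper is a hypothetical, deliberately naive CPU-bound predicate. AsParallel is applied to the in-memory source, so the expensive Where filter is the part that gets spread across cores. Whether the parallel version actually runs faster depends on the machine and the data size, so it is worth measuring both versions.

```csharp
using System;
using System.Linq;

// Hypothetical CPU-bound predicate: naive trial-division primality test.
static bool IsPrime(int n)
{
    if (n < 2) return false;
    for (int d = 2; (long)d * d <= n; d++)
    {
        if (n % d == 0) return false;
    }
    return true;
}

// The data already lives in client memory: LINQ to Objects territory.
int[] numbers = Enumerable.Range(1, 100_000).ToArray();

// Sequential LINQ to Objects query.
int sequentialCount = numbers.Where(IsPrime).Count();

// PLINQ: same query, but AsParallel() lets the filter run on several cores.
int parallelCount = numbers.AsParallel().Where(IsPrime).Count();

Console.WriteLine($"Sequential: {sequentialCount}, parallel: {parallelCount}");
```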

Compared to regular LINQ queries against SQL data sources, the benefits of PLINQ can be smaller. Just as when parallelizing loops, developers must decide whether multi-threading will be appropriate for any given query; not all queries will automatically benefit from parallelization.

PLINQ does not blindly run each and every query in parallel; it analyzes your query first. Based on this analysis, PLINQ then runs the query either sequentially (without threading) or in parallel. But because PLINQ’s analysis isn’t always correct, you can give it hints, much like query hints in SQL. You do this via the WithExecutionMode method (new in Beta 1). You can also use the WithDegreeOfParallelism method to control the maximum number of parallel threads used to run your query.
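Here is a short sketch of both hints. It assumes the signatures from the released .NET 4.0 API, where WithExecutionMode takes a ParallelExecutionMode value and WithDegreeOfParallelism takes the maximum number of concurrently executing tasks; Beta 1 may differ. The query itself is illustrative and far too cheap to benefit from parallelism in practice.

```csharp
using System;
using System.Linq;

int[] values = Enumerable.Range(1, 1000).ToArray();

long sumOfSquares = values
    .AsParallel()
    // Hint: parallelize even if PLINQ's own analysis would choose sequential execution.
    .WithExecutionMode(ParallelExecutionMode.ForceParallelism)
    // Use at most two concurrently executing tasks for this query.
    .WithDegreeOfParallelism(2)
    .Select(v => (long)v * v)
    .Sum();

Console.WriteLine(sumOfSquares); // 333833500
```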

New Structures to Help Developers

In addition to providing the TPL and PLINQ, Microsoft has also enhanced support for thread-safe data structures and classes in .NET 4.0. The new System.Collections.Concurrent namespace introduces classes such as ConcurrentDictionary, ConcurrentQueue, and ConcurrentStack. Although you could use the regular versions of the Dictionary, Queue, and Stack classes in conjunction with your own locking schemes, these new classes are both more convenient, and are also optimized for performance.

Using the new classes is easy: they work much like their non-concurrent counterparts, but remove the need to worry about locks. For example, to add and remove items from a ConcurrentQueue, you could use code like this:

ConcurrentQueue<int> queue = 
   new ConcurrentQueue<int>();
queue.Enqueue(123);
...
int first;
if (queue.TryDequeue(out first))
{
    // success
} 
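The queue snippet above runs on a single thread. The concurrent collections pay off when several threads share one collection. In this sketch (modern C#, with a hypothetical word list), Parallel.ForEach iterations update a shared ConcurrentDictionary without any explicit lock. AddOrUpdate retries internally if another thread changed the value in the meantime, so its update delegate may run more than once and should have no side effects.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

string[] words = { "red", "green", "red", "blue", "green", "red" };
var counts = new ConcurrentDictionary<string, int>();

// Several iterations may hit the same key at the same time. AddOrUpdate
// inserts 1 for a new key or applies the update for an existing one,
// retrying if another thread got there first, so no increment is lost.
Parallel.ForEach(words, word =>
    counts.AddOrUpdate(word, 1, (key, current) => current + 1));

foreach (var pair in counts)
{
    Console.WriteLine($"{pair.Key}: {pair.Value}");
}
```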

In addition to these useful utility classes, the next .NET version also gives developers new low-level threading tools to play with in the extended System.Threading namespace. You can think of the new classes there as being lighter-weight versions of previous locking primitives such as Semaphore and ManualResetEvent. Here’s a list of the new additions:

  • Barrier
  • CountdownEvent
  • ManualResetEventSlim
  • SemaphoreSlim
  • SpinLock
  • SpinWait
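Here is a taste of two of these types, written in modern C# against the released .NET 4.0+ API, with simulated work items. SemaphoreSlim lets at most two tasks into a section at a time. CountdownEvent waits until all five tasks have signaled that they are done. The peak counter is there only to make the semaphore’s effect observable.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

const int workItems = 5;
var done = new CountdownEvent(workItems); // becomes set when its count reaches zero
var gate = new SemaphoreSlim(2);          // admits at most two tasks at a time
var sync = new object();
int current = 0, peak = 0;

for (int i = 0; i < workItems; i++)
{
    Task.Factory.StartNew(() =>
    {
        gate.Wait();
        try
        {
            lock (sync)
            {
                current++;
                peak = Math.Max(peak, current); // highest concurrency observed
            }
            Thread.Sleep(50); // simulate work inside the limited section
            lock (sync) { current--; }
        }
        finally
        {
            gate.Release();
            done.Signal(); // decrement the countdown
        }
    });
}

done.Wait(); // blocks until all five tasks have called Signal()
Console.WriteLine($"Peak concurrency: {peak}");
```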

For example, to lock access to a global resource using the new SpinLock class, you could write:

SpinLock spinlock = new SpinLock(true);
int[] numbers = new int[50];
for (int i = 0; i < numbers.Length; i++)
{
  bool lockTaken = false;
  spinlock.Enter(ref lockTaken);
  try
  {
    if (lockTaken)
    {
      // ... access the shared resource here ...
    }
  }
  finally
  {
    if (lockTaken) spinlock.Exit();
  }
} 
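The loop above runs sequentially, so its lock is never contended. The sketch below (modern C#, with a hypothetical summing workload) puts the same Enter/Exit pattern inside Parallel.For. There, many iterations compete for one lock around a very short critical section, which is the situation SpinLock is designed for. One caveat: SpinLock is a value type, so a copy (for example, one passed by value or stored in a readonly field) is an independent lock.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// One lock shared by all iterations. SpinLock is a struct, so it must not be copied.
var spinLock = new SpinLock(enableThreadOwnerTracking: false);
long total = 0;

Parallel.For(0, 10_000, i =>
{
    bool lockTaken = false;
    try
    {
        spinLock.Enter(ref lockTaken);
        total += i; // keep the protected region as short as possible
    }
    finally
    {
        if (lockTaken) spinLock.Exit();
    }
});

Console.WriteLine(total); // 49995000
```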

Although most of the parallel enhancements in .NET 4.0 are performance-related, there’s at least one new class that gives the chip inside your PC time to breathe. The class sounds like the exact opposite of performance improvement, and it seems tailor-made for any late Friday afternoon development session: System.Lazy<T>.

The idea behind this class is that you often want to create an object, and allocate its memory, only when it’s actually needed. Normally, the framework allocates memory immediately when you construct an object. In contrast, if you wrap the object’s creation in Lazy<T>, the object is created only when you first access it through the Lazy<T> instance’s Value property. Convenient!
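Here is a minimal sketch in modern C#, where a large array stands in for a hypothetical expensive object. The factory delegate passed to Lazy<T> runs only on the first access to Value, and later accesses return the same cached instance. With this constructor, Lazy<T> is also thread-safe by default, so concurrent first accesses still run the factory only once.

```csharp
using System;

int factoryCalls = 0;

// Nothing is allocated here; the delegate is just stored.
var lazyTable = new Lazy<int[]>(() =>
{
    factoryCalls++;
    Console.WriteLine("Allocating the table...");
    return new int[1_000_000];
});

Console.WriteLine(lazyTable.IsValueCreated); // False: not allocated yet
int[] table = lazyTable.Value;               // the factory runs here
int[] again = lazyTable.Value;               // cached instance, factory not called again
Console.WriteLine(lazyTable.IsValueCreated); // True
```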

Now’s a Good Time to Get Parallel

This article showed some of the new parallel features in Visual Studio 2010 and .NET Framework 4.0. To sum up, the three major new portions are the concurrency runtime, the Task Parallel Library, and Parallel LINQ. The Task Parallel Library lets you split application execution into small chunks of code called tasks that can be executed in parallel. The new Task class and related helper classes, all part of TPL, make this work relatively easy. Unfortunately, it’s more difficult to spot the best opportunities for running code in parallel. Because loops are often good candidates for threaded execution, the TPL provides the Parallel class, which helps you parallelize for and foreach loops.

If you are already using LINQ queries, then you’ll probably want to take advantage of the new Parallel LINQ features. PLINQ extends your LINQ to Objects queries to use multiple threads. Although PLINQ isn’t an optimal solution for every possible LINQ query, it can boost the performance of some complex queries.

Underneath, the new concurrency runtime provides the execution engine for both the Task Parallel Library and PLINQ. Of course, all the parallel enhancements in .NET 4.0 ultimately rely on the processor power of your PC. Using these new features looks simple from the developer’s side, but a lot happens underneath. Microsoft has developed a task scheduler that tunes workloads based on the number of cores available on the computer. If for some reason you aren’t happy with Microsoft’s implementation, you are free to roll your own by deriving from the TaskScheduler class.

With this kind of power at hand, it’s definitely a good time to begin looking at the threading options. Even traditional business database GUI applications can benefit from parallelization; server applications are even better targets.


About the Author

Jani Järvinen is a software development trainer and consultant in Finland. He’s a Microsoft C# MVP and frequent author who has published three books about software development. He is the group leader of a Finnish software development expert group at ITpro.fi and a board member of the Finnish Visual Studio Team System User Group.
