Going Parallel with the Task Parallel Library and PLINQ

  • June 30, 2009
  • By Jani Järvinen

Parallelizing Queries with PLINQ

Parallel LINQ lets you parallelize LINQ to Objects queries. With PLINQ, you can take existing LINQ queries and use the client computer's number-crunching power to process the results. For example, suppose you had the following LINQ query, which retrieves data from an SQL Server database:

DataClasses1DataContext ctx =
    new DataClasses1DataContext();
var orders =
  (from ord in ctx.Orders
   orderby ord.Order_Details.Sum(
     o => o.UnitPrice) descending
   select new
   {
     Customer = ord.Customer.CompanyName,
     OrderId = ord.OrderID,
     Amount = ord.Order_Details.Sum(
       o => o.UnitPrice)
   }).Take(5);

This query would return the five orders with the highest combined unit price across their order details. Using the preceding code, LINQ to SQL would construct the necessary SQL statement, send it to the server, and convey the results back to the application. To parallelize this query, you would simply add the AsParallel method call to the last line like this:

...
}).AsParallel().Take(5); 

After this change, processing of the query results moves from the database server to the client. This is important to understand: in addition to the performance considerations, memory usage is also likely to increase on the client, because it must fetch more data from the database.

The original LINQ statement produces a SQL SELECT statement containing a TOP clause, but in the PLINQ version the generated SELECT fetches the records from the Orders table and the Take(5) filtering happens on the client. This might or might not be what you are after.
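If you would rather keep the TOP filtering on the server and parallelize only the client-side work, one option is to let LINQ to SQL run the original query first and then hand the materialized results to PLINQ for any expensive in-memory post-processing. Here is a minimal sketch building on the query above; ComputeScore stands in for a hypothetical, CPU-heavy step:

// Executes the SQL statement; TOP 5 stays on the server.
var topOrders = orders.ToList();

// Post-process the small result set in parallel on the client.
var scored = topOrders
  .AsParallel()
  .Select(o => new
  {
    o.Customer,
    o.OrderId,
    Score = ComputeScore(o.Amount)  // hypothetical CPU-bound work
  })
  .ToList();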

In fact, when comparing the Task Parallel Library and PLINQ for database access, it is often easier to find scenarios that benefit from the TPL than from PLINQ. This is not to say that PLINQ isn't useful; it simply depends on the situation and your data source. Because PLINQ is still a new technology, best practices have yet to emerge.

Still, it is clear that PLINQ is most useful when you already have a large set of data in memory on the client and you want to query it with complex LINQ queries. In such cases, locating the matching records and performing calculations on each one are jobs that often benefit from threading; splitting the work across multiple processor cores is the key.
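To make the in-memory case concrete, here is a minimal, self-contained sketch of the kind of query that tends to benefit: a large array filtered with a deliberately CPU-bound predicate (the IsPrime helper is only an illustration):

using System;
using System.Linq;

class PlinqDemo
{
  // Deliberately CPU-bound helper to simulate heavy per-item work.
  static bool IsPrime(int n)
  {
    if (n < 2) return false;
    for (int i = 2; i * i <= n; i++)
      if (n % i == 0) return false;
    return true;
  }

  static void Main()
  {
    int[] numbers = Enumerable.Range(1, 5000000).ToArray();

    // AsParallel spreads the filtering across the available cores.
    int[] primes =
      (from n in numbers.AsParallel()
       where IsPrime(n)
       select n).ToArray();

    Console.WriteLine(primes.Length);
  }
}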

Compared to regular LINQ queries against SQL data sources, the benefits of PLINQ can be smaller. Just as when parallelizing loops, developers must decide whether multi-threading will be appropriate for any given query; not all queries will automatically benefit from parallelization.

PLINQ does not blindly run each and every query in parallel; it analyzes your query first. Based on this analysis, PLINQ then either runs the query serially (without threading) or in parallel. But because PLINQ's analysis isn't always correct, you can give it hints, much like query hints in SQL. You do this via the WithExecutionMode method (new in Beta 1). You can also use the WithDegreeOfParallelism method to control the number of threads used to run your query.
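For example, to force a query to run in parallel and cap it at four threads, you could write something along these lines (building on the numbers array and IsPrime helper from the earlier sketch; the calls shown here reflect the released .NET 4.0 API):

// Force parallel execution and use at most four threads.
int[] primes =
  numbers.AsParallel()
         .WithExecutionMode(ParallelExecutionMode.ForceParallelism)
         .WithDegreeOfParallelism(4)
         .Where(n => IsPrime(n))
         .ToArray();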

New Structures to Help Developers

In addition to providing the TPL and PLINQ, Microsoft has also enhanced support for thread-safe data structures and classes in .NET 4.0. The new System.Collections.Concurrent namespace introduces classes such as ConcurrentDictionary, ConcurrentQueue, and ConcurrentStack. Although you could use the regular Dictionary, Queue, and Stack classes in conjunction with your own locking schemes, these new classes are both more convenient and optimized for performance.

Using the new classes is easy: they work much the same way as their non-concurrent counterparts, but remove the need to manage locks yourself. For example, to add and remove items from a ConcurrentQueue, you could use code like this:

ConcurrentQueue<int> queue =
   new ConcurrentQueue<int>();
queue.Enqueue(123);   // safe to call from any thread
...
// TryDequeue returns false if the queue is empty.
int first;
if (queue.TryDequeue(out first))
{
    // success: 'first' now holds the dequeued value
}
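ConcurrentDictionary works in a similar spirit. As a minimal sketch, counting word occurrences from parallel work needs no explicit locking:

using System.Collections.Concurrent;
using System.Threading.Tasks;

ConcurrentDictionary<string, int> counts =
    new ConcurrentDictionary<string, int>();

string[] words = { "alpha", "beta", "alpha", "gamma" };

Parallel.ForEach(words, word =>
{
    // Atomically insert 1 or increment the existing count.
    counts.AddOrUpdate(word, 1, (key, oldValue) => oldValue + 1);
});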

In addition to these useful utility classes, the next .NET version also gives developers new low-level threading tools to play with in the extended System.Threading namespace. You can think of the new classes there as being lighter-weight versions of previous locking primitives such as Semaphore and ManualResetEvent. Here's a list of the new additions:

  • Barrier
  • CountdownEvent
  • ManualResetEventSlim
  • SemaphoreSlim
  • SpinLock
  • SpinWait

For example, to lock access to a global resource using the new SpinLock class, you could write:

// Thread-ownership tracking enabled (useful for debugging).
SpinLock spinlock = new SpinLock(true);
int[] numbers = new int[50];
for (int i = 0; i < numbers.Length; i++)
{
  bool lockTaken = false;
  try
  {
    spinlock.Enter(ref lockTaken);
    // Critical section: safe to touch the shared array here.
    numbers[i] = i;
  }
  finally
  {
    // Release the lock only if it was actually acquired.
    if (lockTaken) spinlock.Exit();
  }
}
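Similarly, CountdownEvent makes it easy to wait until a known number of work items have finished. A minimal sketch:

using System.Threading;
using System.Threading.Tasks;

// Wait until all five work items have signaled completion.
using (CountdownEvent countdown = new CountdownEvent(5))
{
  for (int i = 0; i < 5; i++)
  {
    Task.Factory.StartNew(() =>
    {
      // ... do some work here ...
      countdown.Signal();  // decrements the count by one
    });
  }

  countdown.Wait();        // blocks until the count reaches zero
}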

Although most of the parallel enhancements in .NET 4.0 are performance-related, there's at least one new class that gives the chip inside your PC time to breathe. The class sounds like the exact opposite of a performance improvement, and appears to be the perfect solution for any late Friday afternoon development session: System.Lazy<T>.

The idea behind this class is that you often want to allocate an object only when it is actually needed. Normally, the object is constructed the moment you call its constructor. In contrast, if you wrap a type in Lazy<T>, the instance is constructed only when you first read the wrapper's Value property. Convenient!
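A minimal sketch, using a hypothetical ExpensiveReport class whose constructor is assumed to do heavy work:

using System;

class ExpensiveReport
{
  public string Title = "Quarterly totals";
  // Imagine an expensive constructor here...
}

// Nothing is constructed yet; only the factory delegate is stored.
Lazy<ExpensiveReport> report =
    new Lazy<ExpensiveReport>(() => new ExpensiveReport());

// The first access to Value runs the delegate exactly once.
Console.WriteLine(report.Value.Title);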

Now's a Good Time to Get Parallel

This article showed some of the new parallel features in Visual Studio 2010 and .NET Framework 4.0. To sum up, the three major new pieces are the concurrency runtime, the Task Parallel Library, and Parallel LINQ. The Task Parallel Library lets you split application execution into small chunks of code, called tasks, that can be executed in parallel. The new Task class and related helper classes, all part of the TPL, make this work relatively easy. The harder part is spotting the best opportunities for running code in parallel. Because loops are often good candidates for threaded execution, the TPL provides the Parallel class, which helps you parallelize for and foreach loops.
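As a reminder of what that looks like in code, here is a minimal sketch of a loop handed to the Parallel class:

using System;
using System.Threading.Tasks;

double[] values = new double[1000000];

// Iterations are distributed across the thread pool; they must not
// depend on each other for this to be safe.
Parallel.For(0, values.Length, i =>
{
    values[i] = Math.Sqrt(i);
});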

If you are already using LINQ queries, then you'll probably want to take advantage of the new Parallel LINQ features. PLINQ extends your LINQ to Objects queries to use multiple threads. Although PLINQ isn't an optimal solution for every possible LINQ query, it can boost the performance of some complex queries.

Underneath, the new concurrency runtime provides the execution engine for both the Task Parallel Library and PLINQ. Of course, all the parallel enhancements in .NET 4.0 ultimately rely on the processing power of your PC. Although using these new features might seem simple to the developer, Microsoft has developed a custom task scheduler that can tune workloads based on the number of cores available on the computer. If for some reason you aren't happy with Microsoft's implementation, you are free to roll your own.
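Rolling your own means deriving from the abstract TaskScheduler class. Here is a deliberately naive sketch that simply runs every queued task synchronously on the calling thread; a real scheduler would queue work to its own threads instead:

using System.Collections.Generic;
using System.Threading.Tasks;

class SynchronousTaskScheduler : TaskScheduler
{
  // Called when a task is queued; here we just run it immediately.
  protected override void QueueTask(Task task)
  {
    TryExecuteTask(task);
  }

  // Allows tasks to run inline on the current thread.
  protected override bool TryExecuteTaskInline(
    Task task, bool taskWasPreviouslyQueued)
  {
    return TryExecuteTask(task);
  }

  // For debugger support; nothing is ever queued in this sketch.
  protected override IEnumerable<Task> GetScheduledTasks()
  {
    return new Task[0];
  }
}

You would plug such a scheduler in through the Task.Factory.StartNew and Task.Start overloads that accept a TaskScheduler parameter.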

With this kind of power at hand, it's definitely a good time to begin looking at the threading options. Even traditional business database GUI applications can benefit from parallelization; server applications are even better targets.

About the Author

Jani Järvinen is a software development trainer and consultant in Finland. He's a Microsoft C# MVP and frequent author who has published three books about software development. He is the group leader of a Finnish software development expert group at ITpro.fi and a board member of the Finnish Visual Studio Team System User Group. Check out his blog. You can send mail to the author by clicking on his name at the top of the article.




