Thread Synchronization Fairness in the .NET CLR
When I first started learning how thread synchronization was exposed in the .NET Framework's class library, I was immediately concerned about the System.Threading.Monitor class's TryEnter method. There are several overloaded versions of the TryEnter method, some of which accept a timeout argument. This timeout argument allows the caller to specify an amount of time that the caller is willing to wait to gain ownership of the specified object. If the caller gains ownership of the specified object, then TryEnter returns true; if the timeout expires, then TryEnter returns false.
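As a minimal sketch of the timeout overload described above, the following helper tries to take a lock for a bounded time and reports whether it succeeded. The class name TryEnterDemo, the method TryDoWork, and the field s_lock are hypothetical names chosen for illustration; only Monitor.TryEnter and Monitor.Exit come from the class library.

```csharp
using System;
using System.Threading;

public static class TryEnterDemo
{
    // Hypothetical lock object shared by all callers of TryDoWork.
    private static readonly object s_lock = new object();

    // Runs 'work' while owning s_lock. Returns false, without running
    // 'work', if ownership could not be gained within 'timeout'.
    public static bool TryDoWork(TimeSpan timeout, Action work)
    {
        if (!Monitor.TryEnter(s_lock, timeout))
        {
            return false; // The timeout expired; the lock is not owned.
        }

        try
        {
            work();
            return true;
        }
        finally
        {
            // Release ownership only on the path where TryEnter returned true.
            Monitor.Exit(s_lock);
        }
    }
}
```

A caller might write TryEnterDemo.TryDoWork(TimeSpan.FromSeconds(1), () => Console.WriteLine("got the lock")) and fall back to other work when the call returns false.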
You might be wondering why I was concerned about the TryEnter method. Well, in July of 1996, I wrote a Win32 Q & A column for Microsoft Systems Journal where I implemented my own synchronization object called an Optex. My Optex object offered an OPTEX_Enter function that allowed the caller to specify a timeout value (just like Monitor's TryEnter method). This MSJ column can be found here: http://www.microsoft.com/msj/defaulttop.asp?page=/msj/archive/s1da0.htm. Unfortunately, after the column was published, I realized that it contained a bug. The problem is that a user-mode synchronization object can only be implemented so that it either tests for ownership and returns immediately or waits for ownership infinitely. The reason is that making a thread wait requires the thread to jump to kernel mode, and there is no way for a thread to wake up from kernel mode and set a user-mode variable atomically. So a race condition is possible when two threads simultaneously wait on the same object. This race condition allows a thread to think another thread owns the object when, in fact, it doesn't. Once I discovered this bug in my Optex code, I published the discovery in my September 1996 MSJ column: http://www.microsoft.com/msj/defaulttop.asp?page=/msj/archive/s202b.htm.
So, if it's not possible to fix this problem, then you might ask if Monitor's TryEnter method is buggy or not. Well, as it turns out, Monitor's TryEnter is not buggy. Internally, TryEnter sleeps waiting for an owned object to become available. When the thread that owns the object releases it, all waiting threads are awakened. Each of the waiting threads loops around trying to gain ownership of the object again. One of the waiting threads will become the owner and the other threads will go back to sleep. When a thread goes back to sleep, it subtracts the amount of time that the thread has already slept from the amount of time the caller specified to the TryEnter method. So, to the caller, it looks like the thread is sleeping the correct amount of time. While TryEnter is not buggy, it is not fair: It's entirely possible (and quite likely) that multiple threads waiting to own an object will not be serviced in a first-in-first-out fashion.
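The wake-everyone-and-retry behavior described above can be illustrated with a small sketch. This is not the CLR's actual implementation; it is a hypothetical, non-reentrant lock (SketchLock, with the invented members _owned and _released) that only mirrors the described loop: try to take ownership, otherwise sleep for the time remaining, and give up once the caller's timeout is used up.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

// Illustrative only: a hypothetical lock that imitates the loop described
// in the text. It is not how Monitor is implemented inside the CLR.
public sealed class SketchLock
{
    private int _owned; // 0 = free, 1 = owned
    private readonly ManualResetEvent _released = new ManualResetEvent(false);

    public bool TryEnter(int timeoutMs)
    {
        Stopwatch elapsed = Stopwatch.StartNew();
        while (true)
        {
            // Reset before testing so a release that happens after the
            // test below is still seen by the wait.
            _released.Reset();

            // Whichever thread gets here first wins, regardless of how
            // long other threads have been waiting. This is the unfairness.
            if (Interlocked.CompareExchange(ref _owned, 1, 0) == 0)
            {
                return true;
            }

            // Subtract the time already spent from the caller's timeout.
            int remaining = timeoutMs - (int)elapsed.ElapsedMilliseconds;
            if (remaining <= 0)
            {
                return false;
            }

            // Sleep until some owner releases the lock or time runs out.
            _released.WaitOne(remaining);
        }
    }

    public void Exit()
    {
        Interlocked.Exchange(ref _owned, 0);
        _released.Set(); // Wakes every waiting thread; they all retry.
    }
}
```

Because every waiter is released at once and the first one to reach the CompareExchange call wins, nothing in this loop remembers the order in which threads started waiting, which is why the total wait time looks right to the caller while the order of service does not.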
So, the important thing for you to be aware of is that thread synchronization using the Monitor class is not fair in the .NET Framework and there is no way to make it fair. This means that if you have threads that are constantly trying to own an object using a Monitor, it is possible that some threads will never gain ownership! This also means that you should not use the Monitor if you are building an application that tries to simulate some kind of real-world situation that involves a queue. For example, you should not try to build a supermarket simulation where customers stand in line at a cash register, are supposed to be serviced on a first-come, first-served basis, and you want to see how many customers can be serviced per hour. If you use a Monitor for this, the simulation will be broken because it allows customers to jump in front of other customers in the line.
So, after discovering how unfair synchronizing threads via the Monitor class was, I started designing and implementing my own fair thread synchronization code for the .NET Framework. After several hours of playing around with different ideas, I was having no luck at all. All my tests showed that every synchronization technique I tried ended up producing another unfair mechanism. Then, finally, it dawned on me: it's not possible to have a fair thread synchronization mechanism in the managed world. Here's why...
The CLR manages memory via garbage collection. When the CLR wants to start a garbage collection, it determines which threads are currently executing managed code and which threads are currently executing unmanaged code. After making this determination, the CLR suspends the threads executing managed code. Threads that are currently executing unmanaged code hijack themselves when they attempt to return to managed code. There is also a small window of time in which a thread is in managed code when the CLR decides it needs to suspend the thread, but before the suspension happens the thread calls into the unmanaged Win32 WaitForSingleObject or WaitForMultipleObjects function. The CLR then suspends the thread while it is inside one of these functions.
When Windows suspends a thread, it stops the thread from waiting on any thread synchronization object. Later, when the suspended threads are resumed, they all race back to wait on the objects they were waiting on before they were suspended. This means that threads are not guaranteed to gain ownership of an object on a first-in-first-out basis. Since a garbage collection can start at any time (and cannot be prevented), the architecture of the CLR just doesn't support fair thread synchronization, period. In addition, all managed wait methods (such as WaitHandle's WaitOne, WaitAll, and WaitAny methods) put the calling thread into an alertable state, which can also force a thread to stop waiting on an object and to re-queue its wait in a different order.
If you are building an application that absolutely requires fair thread synchronization, you should not use the .NET Framework at all. If your application's threads require synchronized access to resources only periodically, then the CLR's unfairness will most likely not be a problem for your application. In fact, most applications will run fine without fair thread synchronization but, at least, you should be aware of this issue.
About the Author
Jeffrey Richter has concentrated on Windows development since version 1.0 (the version in which all windows were tiled and there was no color). Jeffrey is a Wintellect cofounder and a member of the .NET team at Microsoft. He has also worked on Windows 9x, Windows NT/2000, Microsoft Golf, Visual Studio and Visual C++, and other projects for companies such as Intel and DreamWorks.
Jeffrey has written several books on Windows programming, including Advanced Windows, Programming Applications for Microsoft Windows (formerly Advanced Windows), and Programming Server-Side Applications for Microsoft Windows, all published by Microsoft Press.