Your team is almost done. It’s been a long six months since you started developing this enterprise application: an order entry system for your company’s product catalog. It seems, though, that during stress testing the application bogs down under heavy loads.
Now it’s time to break the news to your boss: the servers all need substantial upgrades. It’s going to be expensive in terms of hardware and network administration (those MCSEs are so costly), and the servers will all have some down time. These are not pleasant thoughts.
Before the big news-breaking meeting, a fresh-out-of-college junior developer says he has some ideas. The new kid can’t possibly overshadow your 20 years, can he? But with the imminent fireworks looming, you give him the green light to experiment.
When you come in early the next morning, your junior developer is slumped in his chair. You ask how it’s going. He straightens up and breaks into a wide grin. He tells you that a simple change from a File DSN to a System DSN allowed the application to function adequately, even under a heavy load. What a relief!
This scenario plays out in most software development shops at one time or another. A simple change can make an enormous difference in performance and save the day. Even more damaging would have been a situation where the application was deployed with great expectations, only to find that it couldn’t handle heavy usage. This article addresses many of the Windows Distributed interNet Architecture (Windows DNA) performance issues. In a follow-up article, I’ll show you how to test your applications before you go live, using Microsoft’s Performance Kit.
Assessing Your Needs
Before you dive headlong into the Windows DNA optimization morass (and before your boss skewers you for spending time unnecessarily), you need to find out what your goal is. It’s possible that your application is ready even without any optimization. The most important hard number you’ll need is the number of transactions per second (TPS) that the application must support. Two variables determine this: the application Think Time and the number of users.
Think Time is the amount of human interaction time for each transaction. This includes reading and viewing screens, making decisions, and making selections or typing input. If your application is an order entry system then the Think Time is the amount of time it takes for a user to view the catalog, decide what to buy, and select the desired items in the user interface.
To calculate the required TPS, divide the number of users by the Think Time as shown in the following formula:
Transactions Per Second = Number of Users / Think Time (in seconds)
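As a quick worked example of the formula, the sketch below computes the target TPS. The function name RequiredTps and the sample figures (600 users, a 30-second Think Time) are assumptions chosen for illustration, not numbers from a real deployment.

```cpp
#include <stdexcept>

// Hypothetical helper: required transactions per second for a given
// number of users and an average Think Time in seconds.
double RequiredTps( int nUsers, double dThinkTimeSeconds )
{
    if( nUsers < 0 || dThinkTimeSeconds <= 0.0 )
        throw std::invalid_argument( "users must be >= 0 and Think Time > 0" );
    return nUsers / dThinkTimeSeconds;
}
```

With these sample numbers, RequiredTps( 600, 30.0 ) works out to 20 transactions per second.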
With a target TPS value in hand, you can determine if your application will handle the projected load. The determination may be a result of your own testing, or a result of using the Windows DNA Performance Kit. The story isn’t over, though, when you test and find out that everything works well enough for the next 12 months. Consider the following:
- Will traffic increase at any time to outpace the capacity of the application before it’s upgraded?
- Will you rely on any of the current components in the next version of the application?
- Will other applications be deployed to the same server reducing its capabilities for this application?
On the first point, I’ve found it rare that traffic increases past expectations. That doesn’t mean it won’t happen, but management and marketing have a way of spinning everything so that projections are always rosy. As a matter of fact, I’m still waiting for traffic to meet the expectations of management on any project.
As for the second point, management always thinks you can reuse components in the next version without alteration. They don’t realize that what delivered adequate performance last year can turn out to be a bottleneck in the next version of the application. This is usually a result of newer components being based on newer, more efficient technology outpacing the components built on older technology. You have to decide before deploying a component if it will be expected to serve in a new version of the application, and if it can handle the increased demands. You may or may not decide that modifications are needed based on the constraints of the project.
The third point is difficult as well. A server that’s humming along is easy prey for overutilization. When someone’s budget is short, the first things targeted are servers that can accommodate an additional application. If another application is installed on the server where your application is performing perfectly, you might suddenly be faced with a performance hit that it can’t sustain. Plan for this eventuality in advance. I have been burned this way several times.
Optimization Tips
In this section, I’ll talk about some specific ways to optimize your Windows DNA applications. I would greatly appreciate your ideas. If you have any, please email them to me. I hope to write an article sometime in the near future with a compilation of suggestions from readers.
Implement User or System DSN instead of File DSN
As mentioned earlier, a User or System DSN will give better performance than a File DSN. That’s because a File DSN requires more resources including RAM and CPU cycles. In Microsoft’s DNA Performance Kit, the documentation contains a graph showing the results when changing from a File DSN to a System DSN. The performance in their example increased 577 percent. Avoid File DSNs whenever possible.
Optimize Algorithms, Especially Iterative Loops
It is amazing how much difference a simple algorithmic change can make. This is especially evident inside loops. Compare, for example, the two code fragments below:
Two similar loops
Loop One
for( i=0; i<10000; i++ )
{
    for( j=0; j<10000; j++ )
    {
        nValue = ( 10000 - i - 1 ) * 50 + j;
    }
}
Loop Two
for( i=0; i<10000; i++ )
{
    nTemp = ( 10000 - i - 1 ) * 50;
    for( j=0; j<10000; j++ )
    {
        nValue = nTemp + j;
    }
}
I created a COM object with two classes, CVersionOne and CVersionTwo. CVersionOne implements the first loop in a method named Perform(). CVersionTwo implements the second loop in a method also named Perform(). Each class also has a property named Milliseconds that contains the number of milliseconds that it took to execute the method. After calling both methods from a Visual Basic program on a 350 MHz Pentium II, I found the second code fragment to be significantly faster, as the following table shows.
Class | Milliseconds | Performance Increase |
CVersionOne | 2,975 | n/a |
CVersionTwo | 2,063 | 44.2% |
The source code for this example is available here.
Note: If you compile the Algorithm COM object in anything but debug mode, the optimizer in Visual C++ will see that nothing is actually being done and reduce the Perform() method to practically nothing. The Visual Basic program will then simply get a value of zero milliseconds for the time it takes to call each method.
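One way to keep the optimizer from discarding the loop body in a release build is to store the result in a volatile variable. The sketch below is an assumption about how such a harness might look, not the code from the Algorithm COM object. It uses the standard std::chrono clock rather than GetTickCount(), and the function names and the nCount parameter are hypothetical.

```cpp
#include <chrono>

// Writes to a volatile object count as observable behavior, so the
// compiler must perform the stores even in an optimized build.
static volatile long g_nValue = 0;

// Loop One: the invariant expression is recomputed on every inner pass.
long RunLoopOne( int nCount )
{
    for( int i = 0; i < nCount; i++ )
        for( int j = 0; j < nCount; j++ )
            g_nValue = ( nCount - i - 1 ) * 50L + j;
    return g_nValue;
}

// Loop Two: the invariant expression is hoisted out of the inner loop.
long RunLoopTwo( int nCount )
{
    for( int i = 0; i < nCount; i++ )
    {
        long nTemp = ( nCount - i - 1 ) * 50L;
        for( int j = 0; j < nCount; j++ )
            g_nValue = nTemp + j;
    }
    return g_nValue;
}

// Returns the elapsed wall-clock milliseconds for one call to pfnLoop.
long long TimeLoop( long (*pfnLoop)( int ), int nCount )
{
    auto start = std::chrono::steady_clock::now();
    pfnLoop( nCount );
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::milliseconds>( stop - start ).count();
}
```

Calling TimeLoop( RunLoopOne, 10000 ) and TimeLoop( RunLoopTwo, 10000 ) gives the two timings to compare. An optimizing compiler may hoist the invariant expression out of Loop One on its own, so the gap between the two versions can shrink or disappear in a release build.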
Avoid Registry Access
For standalone applications, registry access is fast enough. That’s because most standalone applications make a minimum number of trips to the registry. A component that’s part of a middle tier is an entirely different matter. While the trip to the registry exacts a small price in itself, it quickly compounds with a large-scale application.
I wrote a COM object that opens and reads a single registry string 10,000 times to find out a hard number for how expensive registry access really is. The code fragment that does the work follows:
Reading a Registry String 10,000 times
void CWorker::ReadRegistryString( HKEY Key, DWORD dwSize,
    const char *pszKeyname, const char *pszDataname, void *pRetbuffer )
{
    HKEY hKey;
    // KEY_READ would be enough for a query; KEY_ALL_ACCESS is kept to
    // match the code that was timed.
    if( RegOpenKeyEx( Key, pszKeyname, 0, KEY_ALL_ACCESS, &hKey )
        != ERROR_SUCCESS )
        return;
    // dwSize holds the buffer size on entry and the bytes read on return.
    RegQueryValueEx( hKey, pszDataname, NULL, NULL,
        (unsigned char *) pRetbuffer, &dwSize );
    RegCloseKey( hKey );
}
STDMETHODIMP CWorker::Perform()
{
    char szBuffer[500];
    DWORD dwStart = GetTickCount();
    // Open, read, and close the same key 10,000 times.
    for( int i = 0; i < 10000; i++ )
    {
        ReadRegistryString( HKEY_LOCAL_MACHINE, sizeof( szBuffer ),
            "SOFTWARE\\RefillMinder\\RefillMinder",
            "DrugInfoDatabase", szBuffer );
    }
    m_dwMilliseconds = GetTickCount() - dwStart;
    return S_OK;
}
From a Visual Basic program, I called the Perform() method. It took 802 milliseconds for the call to Perform(). This equates to about 0.08 milliseconds per registry access. While that might seem small, it can quickly add up under heavy traffic.
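If a middle-tier component needs the same registry value on every call, one common mitigation is to read it once and keep the result in memory. The sketch below is a minimal illustration under assumptions: CCachedSetting is a hypothetical class, and the loader function passed to it stands in for a routine such as ReadRegistryString(). It isn’t part of the sample COM object, and it assumes the value doesn’t change while the component is running.

```cpp
#include <functional>
#include <mutex>
#include <string>
#include <utility>

// Hypothetical cache: calls the expensive loader once, then serves the
// stored value to every later caller.
class CCachedSetting
{
public:
    explicit CCachedSetting( std::function<std::string()> loader )
        : m_loader( std::move( loader ) ), m_bLoaded( false ) {}

    const std::string &Get()
    {
        // The mutex keeps the first read safe when several threads ask
        // for the value at the same time.
        std::lock_guard<std::mutex> lock( m_mutex );
        if( !m_bLoaded )
        {
            m_strValue = m_loader();
            m_bLoaded = true;
        }
        return m_strValue;
    }

private:
    std::function<std::string()> m_loader;  // stand-in for the registry read
    std::mutex m_mutex;
    bool m_bLoaded;
    std::string m_strValue;
};
```

The trade-off is that a change made to the registry after the first read isn’t seen until the component is restarted, so this only suits settings that are effectively constant.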
The source code for this example is available here.
Use Just-in-Time Activation Whenever Possible
One of the general techniques of optimization is to use resources as late as possible and release them as early as possible. This ties up resources for the shortest period of time possible. Tight resources can steal performance from components as quickly as anything else. This can be caused by several things, but the first and most likely cause is that the system will perform virtual memory swapping when RAM gets low. Resources can also be pooled more efficiently when their usage time is short.
Just-in-Time Activation (JIT) allows MTS and COM+ to manage objects so that they can be easily reused and pooled. That means that even if an object is created, it won’t tie up resources until it’s needed.
There’s a program in the Windows DNA Performance Kit named MTCLIENT (I’m getting ahead of myself already) that makes a great case for this argument. You can turn JIT on and off with command line arguments when executing the program. In my test I found that JIT improved the performance substantially:
Uses JIT | TPS | Performance Increase |
No | 5,174 | n/a |
Yes | 86,882 | 1,579.2% |
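The general rule from the start of this section, acquire late and release early, applies inside ordinary code as well. The sketch below is an illustration under assumptions: the shared total guarded by a std::mutex is a hypothetical stand-in for any scarce resource, and the names are invented for the example. It doesn’t show how MTS or COM+ implements JIT.

```cpp
#include <mutex>
#include <numeric>
#include <vector>

static std::mutex g_totalMutex;   // guards g_nOrderTotal
static long g_nOrderTotal = 0;    // hypothetical shared resource

// Adds one order to the shared total and returns the new total.
long AddOrder( const std::vector<int> &itemPrices )
{
    // Do the private work first, without holding the lock.
    long nOrderSum = std::accumulate( itemPrices.begin(), itemPrices.end(), 0L );

    // Take the lock only for the short update; the lock_guard releases
    // it as soon as the function returns.
    std::lock_guard<std::mutex> lock( g_totalMutex );
    g_nOrderTotal += nOrderSum;
    return g_nOrderTotal;
}
```

Because the summing happens before the lock is taken, other callers are blocked only for the duration of one addition.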
Find and Fix Resource Leaks
Resource leaks over time, even small ones, can slow a server to an absolute crawl. Suppose for example that a component fails to deallocate 1,024 bytes. After 1,024 instantiations of the component you have 1 megabyte of locked RAM. Before long that 1 megabyte becomes 2 megabytes, and so on. Even with lots of memory the server starts to slow down as it relies on the swap file for virtual memory. Worse yet, the server will probably crash eventually.
I have personally learned how costly component memory leaks can be. I wrote a component that did some special email handling for a Web application. It had a memory leak of 12 bytes per instance. Over a two month period the server bogged down, and then crashed. Of course the system administrator was the one worrying over it. I was obliviously leading my life as if nothing were wrong. But six months after the component was in use, they had tracked the slowdown and crashing problems to the component. Needless to say, I had to do some talking just to get out of the system administrator’s office alive.
The point is this: a resource leak of any kind, no matter how small, will over time degrade performance. Testing for resource leaks should be a high priority when you or your team members develop components.
The following code illustrates how easily it can happen. If the file doesn’t open, a return is made before the char buffer is deallocated.
An Easy-To-Miss Memory Leak
void MyMethod( void )
{
    char *pszBuffer;
    HFILE hFile;
    pszBuffer = new char [1024];
    hFile = _lopen( "TestFile.txt", OF_READ );
    if( hFile == HFILE_ERROR )
        return;    // Leak: pszBuffer is never freed on this path
    _lread( hFile, pszBuffer, 1024 );
    _lclose( hFile );
    delete [] pszBuffer;
}
Some of you are looking at the simple code example above saying “I never do that, I’m more careful.” Sure you are. Don’t forget that more complex code, though, has more room for error. Everyone must be eternally vigilant to prevent memory leaks.
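One way to rule out this class of leak is to let an object own the buffer, so that every return path frees it automatically. The sketch below is an assumption about how MyMethod() could be reworked, not the original code: it swaps the Win32 _lopen() calls for the standard C++ std::ifstream so that the file handle is closed automatically too, and the function name and return value are hypothetical.

```cpp
#include <fstream>
#include <vector>

// Reads up to 1,024 bytes from the named file. Returns the number of
// bytes read, or -1 if the file could not be opened.
long ReadFirstBlock( const char *pszFilename )
{
    std::vector<char> buffer( 1024 );   // freed on every return path

    std::ifstream file( pszFilename, std::ios::binary );
    if( !file )
        return -1;                      // early return, but no leak

    file.read( buffer.data(), static_cast<std::streamsize>( buffer.size() ) );
    return static_cast<long>( file.gcount() );
}   // buffer and file are released here by their destructors
```

With this shape there is no delete to forget, which matters more as a method picks up additional early returns over time.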
Balance Objects and Practicality
This topic is an invitation for dissent regardless of what I say. The debate over this rages on. In one camp are those who take a pragmatic approach and almost forsake Object-Oriented (OO) principles in the name of performance. In the other camp are the OO purists who say that good software design carries the day in any situation.
One point to consider is that OO components are typically slower and larger than their non-OO counterparts. That means that under MTS and COM+ you have to worry about whether they consume too many resources and chew up too many CPU cycles. Microsoft seems to be backpedaling from pure OO design in their effort to achieve scalability. Their emphasis is shifting to using many small classes that each do very little. While this helps in the scalability arena, it hurts in the maintainability arena.
I had a recent experience in which I took over a VB project that was developed by someone else. The application concept was pretty simple and straightforward. But where a single method call with one or two arguments would have sufficed, there were twenty properties to look through and understand in the code. Making the required modifications took about 40 percent more of my time than a less object-oriented code base would have required. The person who wrote the original code was pretty fresh out of school, and hadn’t been on the receiving end of an application such as this. He’ll have his day!
Many find that OO hierarchies give extremely poor performance in VB components. These same developers have broken those classes out into separate components. One of the reasons is that smaller, discrete components work better because their load time is shorter. Many classes in a VB project, even simple ones, produce larger compiled components. For MTS and COM+ this is bad. It’s better to have smaller components, and in VB your components should be compiled to optimize for small size rather than fast code.
You can’t throw away OO principles. They have tremendous merit. When your application grows into a huge enterprise application, OO is a good thing because of the organization it provides. But you can’t forget performance and the resource issues either. Everything I do now is a balance of the two. I keep things Object-Oriented when I can, and don’t worry so much about OO architecture when it hurts performance and resource allocation past an unacceptable level.
The Right Language for the Job
Obviously you must decide on a language with which to develop a component. Visual Basic allows for quick and robust component development. Visual C++ takes longer and is more prone to bugs, but gives better performance. I haven’t used Visual J++ in any real-life settings, but many choose it for a variety of tasks.
Avoid Middle Tier State
To achieve scalability and performance, you shouldn’t use stateful components. Specific issues exist with using the Session and Application objects to store state of any kind. Not the least of these issues is the current inability to scale such objects across multiple servers. This becomes especially problematic (even in single-server deployments) when one attempts to cache object instances, such as database connections in Session or Application objects.
Avoid Data Access in the Middle Tier
Developers (especially those under a tight deadline) will directly perform tasks on data from the middle tier where the business logic resides. I know that there are times when rules should be broken, but it would be very unusual to find a justification for breaking this one. Doing direct data access in the middle tier of a Windows DNA application is likely to perform more poorly than the alternative.
For example, it would be a mistake to retrieve multiple data sets from different tables and then join, sort, or search the data in middle-tier objects. The database is designed to handle this kind of activity and removing it to a middle tier is almost certainly a bad practice. True, there may be circumstances where doing so is called for because of the nature of the data store, but as much of this as possible should happen in the database before the dataset is returned to the middle tier.
Conclusions
Large-scale, enterprise applications are complex. Getting them to perform at an acceptable level can be an enormous challenge, especially when the application load is heavy.
The first thing you must do is to assess your needs. An application that performs up to snuff doesn’t need optimization, and you don’t want to waste time doing so.
Some relatively simple considerations can make a big difference. On the list of things to note are DSN types, algorithm efficiency, using JIT, registry use, avoiding memory leaks, language considerations, and middle tier architecture. Some of these can make a difference of over 1,000 percent.
Finally, don’t forget to email me if you know a performance tip I’ve missed. I plan to create a compilation of tips from readers. Don’t forget when you send me your tip to let me know if I can include it in the compilation.
About the author:
Besides being a well-known author and trainer, Rick Leinecker is also a contributing member of the CodeGuru Web site. To contact or find out more about Rick and his work, visit sourceDNA where Rick writes about the technologies required to develop Windows DNA applications.