Performance Improvement: Understanding
Bottlenecks
In computer systems we really deal with four primary bottlenecks. They are: CPU, Memory, Disk, and Network (or Communications). Most performance and scalability challenges break down into one of these four areas. When you're building a system, you should consider the impact on each of these resources and, ideally, test your system while monitoring them.
CPU issues are perhaps the easiest issues to spot these days. In Windows, you can fire up Task Manager and you'll see the CPU utilization. The key issue today when measuring CPU is to watch out for single threading: a single-threaded workload can max out one CPU while the others sit nearly idle, so the overall utilization still looks low. Any time you max out a single CPU in the system, you've got a problem.
The only other concern with looking at CPU time is to determine the overall utilization over a reasonably long period of time. 100% utilization for one second isn't a problem; however, for fifteen minutes it's definitely an issue.
To get statistics on CPU utilization, use Performance Monitor and, in the Processor object, include the % Processor Time counter for each CPU instance.
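As a rough illustration of the two CPU checks described above (one CPU maxed out, and utilization sustained over time), here is a minimal Python sketch. It assumes you have already exported per-CPU % Processor Time readings into lists; the function names, the 95% threshold, and the sample data are all hypothetical and not part of Performance Monitor.

```python
from statistics import mean


def saturated_cpus(samples, threshold=95.0):
    """Return indexes of CPUs whose average utilization meets the threshold.

    samples: list of readings; each reading is a list of per-CPU
    % Processor Time values (one entry per CPU instance).
    """
    cpu_count = len(samples[0])
    averages = [mean(reading[cpu] for reading in samples) for cpu in range(cpu_count)]
    return [cpu for cpu, avg in enumerate(averages) if avg >= threshold]


def overall_average(samples):
    """Average utilization across all CPUs and all readings."""
    return mean(value for reading in samples for value in reading)


def longest_busy_stretch(values, interval_seconds, threshold=95.0):
    """Longest run of consecutive readings at or above the threshold, in seconds."""
    longest = current = 0
    for value in values:
        current = current + 1 if value >= threshold else 0
        longest = max(longest, current)
    return longest * interval_seconds


# Hypothetical readings from a four-CPU server: CPU 0 is pegged while the
# other three are nearly idle, which is the signature of single threading.
readings = [
    [100.0, 4.0, 2.0, 3.0],
    [99.0, 5.0, 1.0, 2.0],
    [100.0, 3.0, 2.0, 4.0],
]

print("Saturated CPUs:", saturated_cpus(readings))
print("Overall average:", round(overall_average(readings), 1))
```

In this made-up data the overall average stays under 30% even though one CPU is saturated, which is why each CPU instance needs to be looked at separately.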
Memory issues are hard to find because there aren't good indicators for memory. The best answer is to look at the Memory object's Pages/sec counter. This is the rate at which pages had to be read from or written to disk because a request could not be satisfied from physical memory. Opinions vary about what this value should be. Generally, I don't get too concerned with activity below 100 pages per second, although ideally it should be zero or near zero.
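To make the 100 pages per second guideline concrete, here is a small Python sketch that summarizes a series of Pages/sec readings. The function name, the dictionary keys, and the sample values are hypothetical; the threshold is simply the author's rule of thumb from the paragraph above, not a hard limit.

```python
def summarize_paging(pages_per_sec, concern_threshold=100.0):
    """Summarize Pages/sec samples against a concern threshold."""
    average = sum(pages_per_sec) / len(pages_per_sec)
    over = sum(1 for value in pages_per_sec if value >= concern_threshold)
    return {
        "average": average,
        "peak": max(pages_per_sec),
        "share_over_threshold": over / len(pages_per_sec),
        # Flag only sustained paging, not a single spike.
        "needs_attention": average >= concern_threshold,
    }


# Hypothetical samples: mostly idle with one brief spike.
samples = [0.0, 2.0, 350.0, 1.0, 0.0, 3.0]
print(summarize_paging(samples))
```

Averaging over the sampling window is a design choice here: a single spike raises the peak but does not by itself flag the server.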
One thing that you can do from an infrastructure perspective, with nothing more than a configuration change, is minimize the paging file on the server. Paging files are really a holdover from when memory was expensive and it was occasionally necessary to swap out parts of a program to disk. In today's world memory isn't that expensive, so you can generally buy all of the memory that you need. The problem with a large paging file is that some applications ask for the available memory to decide how much to cache, and they can try to over-cache when the virtual memory settings are high. One notable exception to this is SQL Server, which is exceptionally good at managing memory. It will make its allocations based only on physical memory and not on virtual memory.
Figuring out how much of your disk is in use isn't difficult; however, it can be tedious because ultimately it's necessary to measure the performance of each disk (or at least each disk array). One of the most common challenges with disks is that most folks look almost exclusively at capacity when planning a system. From a performance perspective, the concern is how many I/O operations you can get from the drive. This number is affected by a number of factors, such as the interface of the drive (SAS is faster than SATA), the rotational speed of the drive (15K is faster than 10K, which is faster than 7.2K), the track seek time, the number of partitions, the partition alignment (see Jimmy May's information on partition alignment), and which array standard is in use (RAID 10 is better, from a performance perspective, than RAID 5).
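To show why rotational speed, seek time, and RAID level matter for I/O operations, here is a back-of-the-envelope Python sketch. It uses a common rule of thumb that is not from this article: random IOPS for a spinning disk is roughly one divided by the sum of average seek time and average rotational latency. The seek times and the RAID write penalties (2 for RAID 10, 4 for RAID 5) are commonly quoted illustrative values, not measurements, and the function names are hypothetical.

```python
def rotational_latency_ms(rpm):
    """Average rotational latency: half of one full rotation, in milliseconds."""
    return 0.5 * 60_000.0 / rpm


def estimated_iops(rpm, avg_seek_ms):
    """Rule-of-thumb random IOPS for a single spinning disk."""
    return 1000.0 / (avg_seek_ms + rotational_latency_ms(rpm))


def array_iops(disk_iops, disk_count, write_fraction, write_penalty):
    """Approximate host-visible IOPS for an array.

    write_penalty is the number of back-end I/Os per host write
    (assumed here: 2 for RAID 10, 4 for RAID 5).
    """
    raw = disk_iops * disk_count
    read_fraction = 1.0 - write_fraction
    return raw / (read_fraction + write_fraction * write_penalty)


# Assumed average seek times in milliseconds; real drives vary.
for label, rpm, seek_ms in [("15K", 15000, 3.5), ("10K", 10000, 4.5), ("7.2K", 7200, 8.5)]:
    print(label, round(estimated_iops(rpm, seek_ms)), "IOPS (estimate)")

# Eight 15K disks with a 30% write workload, RAID 10 versus RAID 5.
per_disk = estimated_iops(15000, 3.5)
print("RAID 10:", round(array_iops(per_disk, 8, 0.3, 2)))
print("RAID 5: ", round(array_iops(per_disk, 8, 0.3, 4)))
```

The estimate ignores caching, queuing, and controller behavior, so treat it as a way to compare options rather than a prediction of measured throughput.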
The actual metrics for disk use are in the PhysicalDisk object. The first counter of interest is Avg. Disk Queue Length, which tells you how busy the drive is. The other counters that you want to watch are Avg. Disk sec/Read and Avg. Disk sec/Write. These tell you how long it takes to read information from and write information to the drives. Ideally this would be less than 20 ms. Each instance should be monitored separately, since you can quite easily end up focusing disk activity onto one disk or disk array. Note that the counters in Windows report on each of the physical drives presented by the storage controllers. Most frequently these counters are per disk array and don't represent the individual disks.
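Because Avg. Disk sec/Read and Avg. Disk sec/Write are reported in seconds, the 20 ms guideline corresponds to a counter value of 0.020. The following Python sketch flags the instances above that limit; the function name, the instance names, and the readings are hypothetical.

```python
LATENCY_LIMIT_SECONDS = 0.020  # the 20 ms guideline from the text


def slow_instances(latencies_by_instance, limit=LATENCY_LIMIT_SECONDS):
    """Return {instance: latency in ms} for instances above the limit."""
    return {
        name: round(seconds * 1000, 1)
        for name, seconds in latencies_by_instance.items()
        if seconds > limit
    }


# Hypothetical Avg. Disk sec/Read values, one per disk instance.
read_latency = {"0 C:": 0.004, "1 D:": 0.035, "2 E:": 0.012}
print(slow_instances(read_latency))
```

Checking each instance separately, rather than a total, is what reveals activity concentrated on one disk or array.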
Finally, there's little point in evaluating the disk performance numbers until you've resolved any memory issues, because in low-memory situations the disks are used as virtual memory. This isn't a desirable or normal situation, so the results you see will be skewed when compared to normal operation.
The final area that can be a problem is network. Network could mean either the connectivity to the clients of the application or the connectivity between the servers in the solution. The good news here is that there are simple counters that you can look at. The Network Interface object includes counters for Bytes Received/sec and Bytes Sent/sec. Since most connections are full duplex these days, each number can be as high as the speed of the connection. A one gigabit (Gb) connection can send one gigabit per second and receive one gigabit per second at the same time, at least in theory. Keep in mind that the counters are in bytes, so one gigabit per second is roughly 125 million bytes per second.
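Since the counters report bytes per second while link speeds are normally quoted in bits per second, a quick conversion helps when judging how close an interface is to its limit. The Python sketch below assumes a link speed of one gigabit per second and uses made-up counter values; the function and constant names are hypothetical.

```python
GIGABIT_LINK_BITS_PER_SEC = 1_000_000_000  # assumed 1 Gb/s link


def link_utilization_percent(bytes_per_sec, link_bits_per_sec=GIGABIT_LINK_BITS_PER_SEC):
    """Convert a Bytes Sent/sec or Bytes Received/sec reading to % of link speed."""
    return bytes_per_sec * 8 / link_bits_per_sec * 100.0


# On a full-duplex link, each direction is compared to the full link speed.
bytes_sent_per_sec = 62_500_000      # hypothetical reading
bytes_received_per_sec = 12_500_000  # hypothetical reading

print("Send utilization %:", link_utilization_percent(bytes_sent_per_sec))
print("Receive utilization %:", link_utilization_percent(bytes_received_per_sec))
```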
The challenge with network interfaces is that the network card in the server may or may not be able to send and receive data at this rate. If there's a problem with the network interface card, you'll likely see it in the Output Queue Length counter. This counter shouldn't be more than a few (less than 10). It can get higher than that if the network card isn't capable of transmitting at the rate that the applications on the server want to send.
Making matters more difficult, the statistics from network interface cards are notoriously unreliable, so you may not be able to trust the numbers that you're getting back from this counter. You'll want to cross-check these numbers against the numbers from the switch the server is connected to.
The solution to network bottlenecks is to use a faster interface or to aggregate multiple network interfaces into one logical network interface. Most servers ship with two network adapters. Through configuration on the server and on the switch (look for Link Aggregation Control Protocol, or LACP), you can create a link aggregation group that uses two (or more) network interfaces as if they were one. This can help to address network connectivity issues.