In last month’s article, “Improve Code Performance with the VSTS Code Profiler,” the VSTS profiler was introduced, and simple techniques for finding performance problems in applications were demonstrated. These simple techniques will scale to a relatively large code base, but for very complex applications, which are much more common in the C++ world, diagnosing and fixing performance problems can pose a challenge. This article covers advanced use of the VSTS profiler and demonstrates techniques to quickly and effectively target performance problems.
Filtering Performance Results
To emphasize one of the key points from the introductory article, one of the simplest and most effective ways to avoid collecting excessive amounts of data in a profiling session is to resist the urge to switch from sampling profiling to instrumentation profiling too early, and to minimize the number of projects that have instrumentation applied. Instrumentation-based profiling collects much more data, and if it is switched on before sampling has identified the specific area of the software system where the performance problem occurs, extremely large profiling files can result.
For large code bases or performance problems that are intermittent, it is still possible to end up with a large performance result file. One of the advances in the VSTS 2008 profiler is the ability to intelligently filter the results of a performance session. By default, functions that do not call other functions, which are referred to as short functions in the profiler’s documentation, are excluded from the results of a profiling session, and the time taken in these functions is charged to their parent functions. For most applications, this is a reasonable default, because small functions are rarely the cause of significant performance problems. If a particular profiling target has small functions that are suspected of causing performance problems, this setting can be changed, as shown in Figure 1.
Figure 1: Including Small Function Results for a Profiling Target.
In addition to the small function exclusion feature, which must be set before a profiling session is conducted, the VSTS 2008 Profiler also supports post-profiling simplification of profiling results through a feature known as Noise Reduction. On the Call Tree View (which is one of the most useful views in tracking down performance problems) and the Allocation View, noise reduction presents a clearer picture of the results by excluding functions whose time or allocation figures fall below a certain threshold. Figure 2 shows the Call Tree View with the link to the Noise Reduction screen, and clicking on this link will bring up the Noise Reduction options shown in Figure 3. In the Call Tree View, functions that fall below a certain percentage time threshold can be trimmed (which removes them from the view) or folded (which combines consecutive functions). For allocations, the thresholds are set based on the number of allocations or the number of bytes allocated.
Figure 2: Call Tree View with Noise Reduction Hyperlink.
Figure 3: Configuring Noise Reduction.
VSTS Profiler Interaction
The VSTS Profiler exposes an API in both native and managed form that allows the amount of data collected in a profiling session to be precisely controlled and also supports the insertion of timestamp and profile mark data into a profile session. The most common usage scenario for the profiler API is when instrumentation profiling is enabled, and the amount of data collected is too large to make a timely determination of the specific area of the code base where the performance problem lies. For C++/CLI applications, the simplest API into the VSTS Profiler is through the classes exposed in Microsoft.VisualStudio.Profiler.dll, which is located in the Microsoft Visual Studio 9.0\Team Tools\Performance Tools directory. A pure native API exists as well, and can be accessed by including the VSPerf.h header file and adding the VSPerf.lib file in the linker options for the project. These two files are located in the Microsoft Visual Studio 9.0\Team Tools\Performance Tools\PerfSDK directory. Because it is extremely unlikely that the profiler interaction code will need to be shipped as part of the final version of the software product, it is advisable to surround all calls to the profiler in #ifdef blocks.
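The advice about keeping profiler calls out of shipping builds can be sketched as follows. This is a minimal illustration rather than a prescribed pattern: the ENABLE_PROFILER_API symbol, the PROFILER_MARK macro, and the CalculateTotal function are hypothetical names introduced for the example, and the mark values are arbitrary.

```cpp
// Sketch: profiler calls compile to nothing unless ENABLE_PROFILER_API
// (a hypothetical build symbol) is defined for the project.
#ifdef ENABLE_PROFILER_API
  #include <VSPerf.h>                      // native profiler API; link with VSPerf.lib
  #define PROFILER_MARK(id) MarkProfile(id)
#else
  #define PROFILER_MARK(id) ((void)0)      // no profiler dependency in shipping builds
#endif

// Hypothetical workload used only to show where the marks would go.
int CalculateTotal(const int* values, int count)
{
    PROFILER_MARK(1);                      // arbitrary mark value: start of calculation
    int total = 0;
    for (int i = 0; i < count; ++i)
        total += values[i];
    PROFILER_MARK(2);                      // arbitrary mark value: end of calculation
    return total;
}
```

Routing every call through one macro keeps the conditional compilation in a single place, so the call sites stay readable and the shipping build carries no reference to VSPerf.lib.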
Controlling the profiler’s data collection is a simple matter of calling the StartProfile, StopProfile, SuspendProfile, and ResumeProfile functions. All four functions require that the scope of the profile session control be nominated—each call can be applied to all processes in the profile session (PROFILE_GLOBALLEVEL), a particular process (PROFILE_PROCESSLEVEL), or a particular thread (PROFILE_THREADLEVEL). The semantics of StartProfile and StopProfile are simple—StartProfile turns profiling on, and StopProfile turns it off. SuspendProfile and ResumeProfile have subtler behavior—SuspendProfile increments a suspend counter, and ResumeProfile decrements it, so to re-enable profiling every call to SuspendProfile must be matched by a corresponding call to ResumeProfile. In situations where it is not clear how many calls to ResumeProfile are required, it is possible to cancel all calls to SuspendProfile by simply calling StartProfile.
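The counter behavior described above can be easier to reason about with a small model. The class below is a toy simulation written for this explanation, not the profiler's implementation and not part of its API. The ProfilerStateModel name and its members are hypothetical, and the choice to stop decrementing at zero is an assumption.

```cpp
// Toy model of the start/stop flag and suspend counter described in the text.
// It does not call the profiler; it only mirrors the documented semantics.
class ProfilerStateModel
{
public:
    ProfilerStateModel() : started_(false), suspendCount_(0) {}

    // StartProfile: turns profiling on and cancels outstanding suspends.
    void Start()   { started_ = true; suspendCount_ = 0; }

    // StopProfile: turns profiling off.
    void Stop()    { started_ = false; }

    // SuspendProfile: each call increments the suspend counter.
    void Suspend() { ++suspendCount_; }

    // ResumeProfile: each call decrements the counter.
    // Assumption: the counter is not taken below zero.
    void Resume()  { if (suspendCount_ > 0) --suspendCount_; }

    // Data is collected only when started and no suspends are outstanding.
    bool IsCollecting() const { return started_ && suspendCount_ == 0; }

private:
    bool started_;
    int  suspendCount_;
};
```

In this model, two calls to Suspend followed by one call to Resume leave collection switched off, while a single call to Start restores it regardless of how many suspends are outstanding.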
Instrumentation profiling can have descriptive information inserted in four distinct ways:
- A thread or process can be named in a profile session by calling NameProfile.
- An event in a profile session can be marked by associating it with an arbitrary 32-bit integer using MarkProfile. For example, it is possible to mark each point where a third-party library is called by designating each call with a mark value of three, and each call to a web service with a mark value of four.
- A comment of up to 256 characters can be associated with a mark by calling CommentMarkProfile. To continue on the example in the previous point, it would be possible to include the web service name each time a mark with a value of four is added.
- The occurrence of external events can be recorded in a profile session using CommentMarkAtProfile, which inserts a comment and a mark at a specified point in time.
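The third-party library and web service example from the list above might be wrapped as shown below. This is a sketch under stated assumptions: the ProfileMarkId enum, the MarkThirdPartyCall and MarkWebServiceCall helpers, the ENABLE_PROFILER_API symbol, and the RecordedMarks stand-in are all hypothetical names. The stand-in branch only records marks in memory so the sketch can be tried without the profiler installed.

```cpp
#include <string>
#include <utility>
#include <vector>

// Mark values from the example in the text; the enum itself is hypothetical.
enum ProfileMarkId
{
    MARK_THIRD_PARTY_CALL = 3,
    MARK_WEB_SERVICE_CALL = 4
};

#ifdef ENABLE_PROFILER_API   // hypothetical build symbol
#include <VSPerf.h>          // native profiler API; link with VSPerf.lib

inline void MarkThirdPartyCall()
{
    MarkProfile(MARK_THIRD_PARTY_CALL);
}

inline void MarkWebServiceCall(const char* serviceName)
{
    // ANSI variant of CommentMarkProfile: attaches the service name to the mark.
    CommentMarkProfileA(MARK_WEB_SERVICE_CALL, serviceName);
}

#else
// Stand-in (assumption): keeps marks in memory instead of calling the profiler.
typedef std::vector<std::pair<long, std::string> > MarkLog;

inline MarkLog& RecordedMarks()
{
    static MarkLog log;
    return log;
}

inline void MarkThirdPartyCall()
{
    RecordedMarks().push_back(
        std::make_pair(static_cast<long>(MARK_THIRD_PARTY_CALL), std::string()));
}

inline void MarkWebServiceCall(const char* serviceName)
{
    RecordedMarks().push_back(
        std::make_pair(static_cast<long>(MARK_WEB_SERVICE_CALL), std::string(serviceName)));
}
#endif
```

Giving the mark values names in one place keeps them consistent across the code base, which makes it easier to tell the marks apart when reading the profile results.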
Comments, marks, and names are quite easy to identify in the results of a profile session. The following code will produce the output shown in Figures 4 and 5:
NameProfile(_T("MyThread"), PROFILE_THREADLEVEL, PROFILE_CURRENTID);
CommentMarkProfile(1, _T("Start of call to calculation engine"));
Figure 4: Profile Marks with Comments.
Figure 5: Named Threads in a Profile Session.
Conclusion
One of the hallmarks of a good craftsperson is how well they can use their tools to produce the desired result, and the same holds true for a developer. Knowing how to use the VSTS Profiler to track down performance problems in a timely manner is an important skill for all developers, and is particularly important for C++ developers, because C++ code bases typically encompass the largest and most complex software systems that are always expected to perform at the highest level.
Out of the box, the VSTS Profiler helps developers quickly identify performance problems by removing clutter through noise reduction and short function exclusion. For large systems, it may be necessary to control explicitly when the profiler collects data, and to add marks and comments to the profiling log, in order to separate the performance impacts of the various software components that are executing. The VSTS Profiler supports this fine-grained control through both a managed and a native interface, and developers can fine-tune the collection of data down to a very granular level.
About the Author
Nick Wienholt is an independent Windows and .NET consultant based in Sydney. He is the author of Maximizing .NET Performance and co-author of A Programmer's Introduction to C# 2.0 from Apress, and specialises in system-level software architecture and development, with a particular focus on performance, security, interoperability, and debugging.
Nick is a keen and active participant in the .NET community. He is the co-founder of the Sydney Deep .NET User group and writes technical articles for Australian Developer Journal, ZDNet, Pinnacle Publishing, CodeGuru, MSDN Magazine (Australia and New Zealand Edition) and the Microsoft Developer Network. An archive of Nick’s SDNUG presentations, articles, and .NET blog is available at www.dotnetperformance.com.
In recognition of his work in the .NET area, he was awarded the Microsoft Most Valuable Professional Award from 2002 through 2007.