Fragile Code, Page 2
Error Handling, Logging and Tracing
The best way to make code more stable is to install error handlers throughout it. This means giving the operating system a place to go when the program encounters an error, and it means testing conditions that were never tested before.
In languages like VB and C++ you have the option of actual error handlers. They provide a place, within the function, that the operating system can go to when an error is encountered. Even if you're not working with one of these languages you can add some basic error handling. For example, you can register critical error handlers with the operating system when the program starts up. This allows the program to receive critical errors itself, rather than leaving the user with a generic operating system error.
The key to the error handler is to log whatever information is available about where the error occurred and whatever other conditions can be captured. For instance, logging the call stack and global variables before exiting may provide clues to what happened to cause the error.
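As a rough illustration of a startup-registered handler that logs the call stack and some global state, here is a minimal Python sketch. It is not tied to the VB or C++ mechanisms mentioned above. The names `APP_STATE` and `log_unhandled` are hypothetical, and an in-memory stream stands in for a real log file.

```python
import io
import logging
import sys
import traceback

# In-memory stream used here in place of a real log file.
log_stream = io.StringIO()
logger = logging.getLogger("crash")
logger.setLevel(logging.ERROR)
logger.addHandler(logging.StreamHandler(log_stream))
logger.propagate = False

# Hypothetical global state worth capturing when something goes wrong.
APP_STATE = {"current_order": None, "retries": 0}


def log_unhandled(exc_type, exc_value, exc_tb):
    """Last-chance handler: record the call stack and selected state."""
    stack = "".join(traceback.format_exception(exc_type, exc_value, exc_tb))
    logger.error("Unhandled %s\n%sState at failure: %r",
                 exc_type.__name__, stack, APP_STATE)


# Register once at startup; Python calls this for any uncaught exception.
sys.excepthook = log_unhandled
```

The handler only records information and does not try to recover, which keeps it unlikely to fail itself while the program is already in a bad state.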
The next step in shoring up fragile code is to test all of the parameters passed to each function to ensure that they are valid. Adding code to test parameters is relatively noninvasive. Although any change to fragile code carries some risk, this is the least invasive way to get the most information. By checking every parameter before it enters the existing code you can identify problems that arise between functions. Problems tend to occur more often at the boundary between two functions than in the middle of a single function.
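One low-risk way to add parameter checks is to wrap the existing function rather than edit it. The sketch below assumes a made-up legacy routine, `legacy_compute_discount`, and a wrapper, `compute_discount`, that checks and logs every argument before handing off to the old code. The specific rules (non-negative price, percent between 0 and 100) are illustrative assumptions.

```python
import logging

logger = logging.getLogger("params")


def legacy_compute_discount(price, percent):
    """Stand-in for existing fragile code that is left untouched."""
    return price - price * percent / 100


def _is_number(value):
    # bool is a subclass of int, so exclude it explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def compute_discount(price, percent):
    """Validate every argument, log what is wrong, then call the old code."""
    problems = []
    if not _is_number(price):
        problems.append(f"price must be a number, got {price!r}")
    elif price < 0:
        problems.append(f"price must be >= 0, got {price!r}")
    if not _is_number(percent):
        problems.append(f"percent must be a number, got {percent!r}")
    elif not 0 <= percent <= 100:
        problems.append(f"percent must be between 0 and 100, got {percent!r}")

    if problems:
        message = "; ".join(problems)
        logger.error("compute_discount rejected arguments: %s", message)
        raise ValueError(message)
    return legacy_compute_discount(price, percent)
```

Collecting all of the problems before raising means a single log entry shows everything that was wrong with the call, not just the first bad argument.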
Finally, by adding a set of logging statements that indicate when execution enters and exits a function, you can determine the entire call history of an application. Obviously, you need the ability to turn off this logging so you don't impact performance when you're not debugging. Think of this logging as a general ledger for an accounting system. It identifies everything that happened. Furthermore, each function that starts should end - just like every credit has a debit in a journaling system.
In some environments the extreme fragility of the code may make you reluctant to add the statements needed for better error handling and logging. However, in the long term the number of problems caused by adding this testing will be far outstripped by the number of subtle errors that are detected and logged.
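The entry/exit logging and the ledger idea might look something like the following Python sketch. The names `traced`, `set_tracing` and `unclosed_calls` are hypothetical, and the "ENTER name" / "EXIT name" line format is an assumption chosen for this example.

```python
import functools
import logging

trace_log = logging.getLogger("trace")
_tracing = False  # off by default so normal runs pay almost nothing


def set_tracing(enabled):
    """Turn entry/exit logging on or off at run time."""
    global _tracing
    _tracing = enabled


def traced(func):
    """Log entry and exit of func, including exits caused by an exception."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        if not _tracing:
            return func(*args, **kwargs)
        trace_log.debug("ENTER %s", func.__name__)
        try:
            return func(*args, **kwargs)
        finally:
            trace_log.debug("EXIT %s", func.__name__)
    return wrapper


def unclosed_calls(lines):
    """Balance the ledger: return functions entered but never exited."""
    stack = []
    for line in lines:
        action, name = line.split(" ", 1)
        if action == "ENTER":
            stack.append(name)
        elif action == "EXIT" and stack and stack[-1] == name:
            stack.pop()
    return stack
```

Decorating a function with `@traced` leaves its body unchanged. Running `unclosed_calls` over a captured trace shows which calls were still open when the log stopped, which is usually where a hard crash happened.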
If one of the signs of fragile code is poor documentation, then it stands to reason that one way to reduce the fragility of code is to generate documentation for it. Unfortunately, documentation requires a certain amount of knowledge of the code itself, knowledge that is difficult to recover once the project is done - and even more difficult if the people who built the code are no longer available.
The key with documentation is to use automated tools that can convert the code itself into meaningful documentation. For instance, a tool or set of tools can show where a function is used, or build a call tree that indicates which functions a given function calls.
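To make the idea of a "where is it used" and "what does it call" tool concrete, here is a small Python sketch built on the standard `ast` module. It handles only direct calls by name inside top-level functions, so it is a toy compared with a real cross-reference tool. `build_call_map`, `callers_of` and the `SAMPLE` source are all invented for the example.

```python
import ast


def build_call_map(source):
    """Map each top-level function name to the names it calls directly."""
    tree = ast.parse(source)
    calls = {}
    for node in tree.body:
        if isinstance(node, ast.FunctionDef):
            callees = calls.setdefault(node.name, set())
            for child in ast.walk(node):
                # Only plain calls like foo(); method calls are skipped.
                if isinstance(child, ast.Call) and isinstance(child.func, ast.Name):
                    callees.add(child.func.id)
    return calls


def callers_of(calls, target):
    """List the functions that call target, in alphabetical order."""
    return sorted(name for name, callees in calls.items() if target in callees)


SAMPLE = '''
def load(path):
    return open(path).read()

def parse(text):
    return text.split()

def main(path):
    return parse(load(path))
'''

call_map = build_call_map(SAMPLE)
for name in sorted(call_map):
    print(name, "calls", sorted(call_map[name]))
print("load is used by", callers_of(call_map, "load"))
```

Even output this simple gives a newcomer a map of the code without anyone having to remember how it was originally put together.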
Automated tools convert the code into useful information about how the solution is architected. The time that can be saved by condensing and converting the code into useful information can reduce the challenges with understanding what the code does. Documentation will continue to be a source of struggle. However, every opportunity to add meaningful documentation should be exercised.
Preventing the problem
Coping with fragile code is a good reactive stance. However, there is also the proactive approach to consider. It's one thing to cope with fragile code but quite another to prevent fragile code in the first place. This involves being aware of the things that result in fragile code during the development phase.
The most challenging thing for most professionals is to make time in a program's development schedule for the "checks and balances" that prevent fragile code from occurring in the first place. With the pressure to deliver code as soon as possible with as many features as possible, it's easy to see how difficult it can be to keep time in the schedule to ensure that the code is built soundly. It's important to create an awareness of what "fragile code" is and how removing checks and balances increases the risk of creating a system composed of "fragile code".
Before the start of development, agreeing on a set of standards for documentation, comments, error handling, parameter testing, trace logging, etc. will simplify the development process and help to ensure that the code has an even level of resilience to problems. It's a simple, easy step that is often overlooked in the rush to get started on a project.
Providing standards makes clear what is expected of all of the developers and increases awareness of the core concepts of software development. This in turn helps to produce better code with minimal rework.
One of the best, but frequently painful, ways to ensure that code isn't fragile is to involve multiple parties at every level of the software development process. While it's typical in even the most harried environments to involve multiple people during the architecture phase, it's fairly rare for projects to maintain a formal code review process during the development phase.
The code review process is not a punitive process designed to punish those developers who do not possess the greatest skill. It's a teaching tool designed to help all of the developers remember the standards that they've agreed to meet and to learn techniques from one another.
Code reviews need not be long, but they should be done because of their power to help prevent fragile code from being written.
Fragile code is expensive. The additional maintenance costs associated with fragile code will quickly eliminate any gains made during the construction and development phases. Spotting fragile code is easy if you know what you're looking for. Perhaps more importantly, fragile code can be prevented.
Robert Bogue, MCSE (NT4/W2K), MCSA, A+, Network+, Server+, I-Net+, IT Project+, E-Biz+, CDIA+ has contributed to more than 100 book projects and numerous other publishing projects. He writes on topics from networking and certification to Microsoft applications and business needs. Robert is a strategic consultant for Crowe Chizek in Indianapolis. Some of Robert's more recent books are Mobilize Yourself!: The Microsoft Guide to Mobile Technology, Server+ Training Kit, and MCSA Training Guide (70-218): Managing a Windows 2000 Network. You can reach Robert at Robert.Bogue@CroweChizek.com.