An Introduction to Catastrophe Disentanglement
In Spencer Johnsons Who Moved My Cheese, the little people keep coming back to where the cheese used to be even though its not there anymore. Its a natural tendency to continue doing what we did before even when, to an outside observer, it no longer makes sense. This behavior is quite common when software projects get into trouble. We keep plodding away at the project hoping that the problems will go away and the cheese will miraculously reappear. In all too many cases, it doesnt.
Just as the smart thing to do when a ball of twine seems hopelessly entangled is to stop whatever we are doing with it (otherwise, the tangle gets worse), so it is with a disastrous project; the longer we keep at it, the worse it gets. At some point, we need to halt all activity and reassess what we are doing.
Disastrous software projects, or catastrophes, are projects that are completely out of control in one or more of the following aspects: schedule, budget, or quality. They are by no means rare; 44% of surveyed development organizations report that they have had software projects cancelled or abandoned due to significant overruns, and 15% say that it has happened to more than 10% of their projects (see Figure 1).
Figure 1 Percentage of surveyed organizations software projects that have been abandoned or cancelled due to significant cost or time overruns in the past three years (source )
But obviously, not every overrun or quality problem means a project is out of control, so at what point should we define a software project as a catastrophe? What are the criteria for taking the drastic step of halting all activities, and how do we go about reassessing the project? Most importantly, how do we go about getting the project moving again? The answers to these questions are the essence of the concept of catastrophe disentanglement.
One of the best-known attempts to disentangle a multi-hundred-milliondollar catastrophe ended recently, more than a decade after it began. In August 2005, the plug was finally pulled on the infamous Denver airport baggage handling system, in a scene reminiscent of Hals demise in the memorable Kubrick space odyssey movie.(1) This was a project that had gained notoriety for costing one million dollars a day for being late. One of the interesting questions about the Denver project is why didnt the repeated efforts to save it succeed?
Of all the problems that plagued the project (see , ), probably the most formidable was the projects unachievable goals. It is unlikely that anyone associated with the project could have brought about a significant change to the goals because the projects extravagant functionality had, in fact, become part of its main attraction. But the ability to define achievable goals is a cornerstone of any catastrophe disentanglement process, without which the process cannot succeed, and that is one of the main reasons the Denver system could not be disentangled.
As indicated by the above survey data, cases like the Denver project are not rare (although few are as extreme). Most development organizations know this even without seeing the survey data. This frustrating reality was expressed in a famous quote from Martin Cobb of the Canadian Treasury Board: We know why projects fail, we know how to prevent their failureso why do they still fail?.
Cobbs quote highlights the conventional approach of software engineering. The objective of existing software engineering practices is to prevent the occurrence of software catastrophesthat is, to prevent the project from spiraling out of control. As such, the practices have an important role to play in software development. However, more than five decades of experience show that despite these methods, software catastrophes will continue to be around for a while.
When a software project is out of control, there is no PMI, IEEE, SEI, or ISO rescue process to follow because these organizations offer preventive, rather than corrective, solutions. But is such a project necessarily doomed? Will it inevitably collapse in failure? The following chapters will show that this is far from inevitable.
This article fills the void for corrective solutions in software engineering. It deals with projects that are already in serious trouble. In fact, this article is less concerned with how we got into trouble; it is more concerned with how we get out.
Overview of the Catastrophe Disentanglement Process
Before the first step in disentangling a project can be taken, we must first establish that the whole process is necessary. This means deciding that the project, as it is currently proceeding, has little chance of success without taking drastic measures.
Many software organizations have difficulty making this decision, and some avoid it entirely. In fact, there is a general tendency to let troubled projects carry on way too long before appropriate action is taken . Keil  uses the term runaways to describe software projects that continue to absorb valuable resources without ever reaching their objective. Keils runaways are, in effect, undiagnosed catastrophes that went on unchecked for much too long. Indeed, the ability to save a project is usually dependent on how early in the schedule a catastrophe is diagnosed. Furthermore, organizations that permit a runaway project to continue are wasting valuable resources. This reality is well demonstrated in the following case.
A Case StudyThe FINALIST case, described next, demonstrates how difficult it is to acknowledge that a project is in serious trouble, even when the problem is obvious to almost anyone looking in from the outside. It is an interesting case because it is by no means unique; it demonstrates just how easy it is to become committed to a failing path.
After the year 2000 passed, and the software prophets of doom faded away, a Canadian software company found itself with almost no customers for one of its small business units. The units main expertise was in supporting Cobol programs (where many of the bug-2000 problems were expected to be), and suddenly there wasnt enough Cobol work to support it.
So the company decided to rewrite one of its core products, FINALIST, a large financial analysis system, but it chose to write it again in Cobol in order to retain the companys unique expertise for solving bug-2000 problems (which it still thought would materialize). The new project, appropriately named FINALIST2, was given a 30-month schedule and a team of 14 developers, eight of whom were veteran Cobol programmers.
At the beginning of the second year of the project, two Cobol programmers retired and, soon after, three more moved to another company. With only three veteran Cobol programmers left, the FINALIST2 project began to experience serious problems and schedule delays. The companys management repeatedly resisted calls to reevaluate the project and attempted to get it back on track by conducting frequent reviews, adding more people to the team, providing incentives, and eventually, by extending the schedule.
Finally, 28 months into the project, a consultant was brought in, and his first recommendation was to halt the project immediately. This drastic advice was based on the conclusion that little or no meaningful progress was being made and the project, as it was defined, would probably never be completed. There were not enough experienced Cobol programmers around to do the work, and it was unlikely that new ones would be hired.
Furthermore, it was unlikely that the new recruits would become sufficiently proficient in Cobol within any reasonable time frame.
The final recommendation was to either restart the project in a modern programming language or to cancel it entirely.
One of the key points in this case is that management failed to notice that what was once a strength (Cobol) had ceased to be onea classic example of who moved my cheese. This failure was clearly fostered by a strong desire to preserve Cobol expertise within the company, but it was also the result of a natural reluctance to acknowledge a mistake (resistance to reevaluate the project). These two factors obscured the solution. And so management attempted to fix almost everything (process, team, schedule) except the problem itself.
This case illustrates the difficulties decision makers have in accepting the need for drastic measures and is reminiscent of a gambler who cannot get up and walk away. First, there is the natural tendency to put off making the difficult decision in hope that conventional methods will eventually get the project back on track. A second difficulty involves over-commitment to previous decisions, prompting the investment of more resources to avoid admitting mistakes (this is known as escalation ).
But troubled projects are never a surprise, and even those most committed to
a failing path know that something is severely wrong. But how severe is severely
wrong? How can we know that it is time for drastic measures? Ideally, there
would be a decision algorithm (a kind of software breathalyzer) to which managers
could subject their projects, and which would make the decision for them.