Software Quality Metrics
Software cost overruns, schedule delays, and poor quality have been endemic in the software industry for more than 50 years.
The system is stable; let's just document the known problems.
—Quality control manager of a tier-one application vendor
A large body of literature has appeared over the past three or four decades on how developers can measure various aspects of software development and use, from the productivity of the programmers coding it to the satisfaction of the ultimate end users applying it to their business problems. Some metrics are broader than others. In any scientific measurement effort, you must balance the sensitivity and the selectivity of the measures employed. Here we are primarily concerned with the quality of the software end product as seen from the end user's point of view. Although much of the software metrics technology used in the past was applied downstream, the overall trend in the field is to push measurement methods and models back upstream to the design phase and even to measurement of the architecture itself. The issue in measuring software performance and quality is clearly its complexity as compared even to the computer hardware on which it runs. Managing complexity and finding significant surrogate indicators of program complexity must go beyond merely estimating the number of lines of code the program is expected to require.
Measuring Software Quality
Historically, software quality metrics have measured exactly the opposite of quality: the frequency of software defects, or bugs. The inference was, of course, that quality in software was the absence of bugs. So, for example, measures such as the number of errors discovered per thousand lines of code per year or per release were used. Lower values of these measures implied higher build or release quality. For example, a density of two bugs per 1,000 lines of code (LOC) discovered per year was considered pretty good, but this is a very long way from today's Six Sigma goals. We will start this article by reviewing some of the leading historical quality models and metrics to establish the state of the art in software metrics today and to develop a baseline on which we can build a true set of upstream quality metrics for robust software architecture. Perhaps at this point we should attempt to settle on a definition of software architecture as well. Most of the leading writers on this topic do not define their subject term, assuming that the reader will construct an intuitive working definition on the metaphor of computer architecture or even its earlier archetype, building architecture. And, of course, almost everyone does! There is no universally accepted definition of software architecture, but one that seems very promising has been proposed by Shaw and Garlan:
Abstractly, software architecture involves the description of elements from which systems are built, interactions among those elements, patterns that guide their composition, and constraints on those patterns. In general, a particular system is defined in terms of a collection of components, and interactions among those components.1
This definition follows a straightforward inductive path from that of building architecture, through system architecture, through computer architecture, to software architecture. As you will see, the key word in this definition—for software, at least—is patterns. Having chosen a definition for software architecture, we are free to talk about measuring the quality of that architecture and ultimately its implementations in the form of running computer programs. But first, we will review some classical software quality metrics to see what we must surrender to establish a new metric order for software.
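Before turning to the classic metrics, the defect-density measure mentioned earlier in this section can be made concrete. The following is a minimal sketch, not a standard implementation; the function name and the release figures are hypothetical and chosen only to reproduce the "two bugs per 1,000 lines of code" example.

```python
def defect_density_per_kloc(defects: int, lines_of_code: int) -> float:
    """Return the number of defects per 1,000 lines of code (KLOC)."""
    if lines_of_code <= 0:
        raise ValueError("lines_of_code must be positive")
    return defects / (lines_of_code / 1000)


# Hypothetical release: 90 defects discovered in one year
# in a program of 45,000 lines of code.
density = defect_density_per_kloc(defects=90, lines_of_code=45_000)
print(f"{density:.2f} defects per KLOC")  # prints "2.00 defects per KLOC"
```

Lower values indicate higher release quality under this measure, which is why it says nothing about design quality upstream of the code.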
Classic Software Quality Metrics
Software quality is a multidimensional concept. The multiple professional views of product quality may be very different from popular or nonspecialist views. Moreover, they have levels of abstraction beyond even the viewpoints of the developer or user. Crosby, among many others, has defined software quality as conformance to specification.2 However, very few end users will agree that a program that perfectly implements a flawed specification is a quality product. Of course, when we talk about software architecture, we are talking about a design stage well upstream from the program's specification. Years ago Juran3 proposed a generic definition of quality. He said products must possess multiple elements of fitness for use. Two of his parameters of interest for software products were quality of design and quality of conformance. These separate design from implementation and may even accommodate the differing viewpoints of developer and user in each area.
Two leading firms that have placed a great deal of importance on software quality are IBM and Hewlett-Packard. IBM measures user satisfaction in eight dimensions for quality as well as overall user satisfaction: capability or functionality, usability, performance, reliability, installability, maintainability, documentation, and availability (see Table 3.1). Some of these factors conflict with each other, and some support each other. For example, usability and performance may conflict, as may reliability and capability or performance and capability. IBM has user evaluations down to a science. We recently participated in an IBM Middleware product study of only the usability dimension. It was five pages of questions plus a two-hour interview with a specialist consultant. Similarly, Hewlett-Packard uses five Juran quality parameters: functionality, usability, reliability, performance, and serviceability. Other computer and software vendor firms may use more or fewer quality parameters and may even weight them differently for different kinds of software or for the same software in different vertical markets. Some firms focus on process quality rather than product quality. Although it is true that a flawed process is unlikely to produce a quality product, our focus here is entirely on software product quality, from architectural conception to end use.
TABLE 3.1 IBM's Measures of User Satisfaction
Total Quality Management
The Naval Air Systems Command coined the term Total Quality Management (TQM) in 1985 to describe its approach to quality improvement, patterned after the Japanese-style management approach to quality improvement. Since then, TQM has taken on many meanings across the world. TQM methodology is based on the teachings of such quality gurus as Philip B. Crosby, W. Edwards Deming, Armand V. Feigenbaum, Kaoru Ishikawa, and Joseph M. Juran. Simply put, it is a management approach to long-term success that is attained through a focus on customer satisfaction. This approach requires the creation of a quality culture in the organization to improve processes, products, and services. In the 1980s and '90s, many quality gurus published specific methods for achieving TQM, and the method was applied in government, industry, and even research universities. The Malcolm Baldrige Award in the United States and the ISO 9000 standards are legacies of the TQM movement, as is the Software Engineering Institute's (SEI's) Capability Maturity Model (CMM), in which organizational maturity level 5 represents the highest level of quality capability.4 In 2000, the SW-CMM was upgraded to Capability Maturity Model Integration (CMMI).
The implementation of TQM has many varieties, but the four essential characteristics of the TQM approach are as follows:
- Customer focus: The objective is to achieve total customer satisfaction—to "delight the customer." Customer focus includes studying customer needs and wants, gathering customer requirements, and measuring customer satisfaction.
- Process improvement: The objective is to reduce process variation and to achieve continuous process improvement of both business and product development processes.
- Quality culture: The objective is to create an organization-wide quality culture, including leadership, management commitment, total staff participation, and employee empowerment.
- Measurement and analysis: The objective is to drive continuous improvement in all quality parameters by a goal-oriented measurement system.
Total Quality Management made an enormous contribution to the development of enterprise applications software in the 1990s. Its introduction as an information technology initiative followed its successful application in manufacturing and service industries. It came to IT just in time for the redevelopment of all existing enterprise software for Y2K. The efforts of one of the authors to introduce TQM in the internal administrative services sector of research universities encountered token resistance from faculty oversight committees. They objected to the term "total" on the curious dogmatic grounds that nothing is really "total" in practice. As CIO, he attempted to explain to a faculty IT oversight committee at the University of Pennsylvania that the name was merely a phrase identifying a methodology commonly practiced worldwide. But this didn't help much. However, he persevered with a new information architecture, followed by (totally!) reengineering all administrative processes using TQM "delight-the-customer" measures. He also designed a (totally) new information system to meet the university's needs in the post-Y2K world (which began in 1996 in higher education, when the class of 2000 enrolled and their student loans were set up).5
Generic Software Quality Measures
In 1993 the IEEE published a standard for software quality metrics methodology that has since defined and led development in the field. Here we begin by summarizing this standard. It was intended as a more systematic approach for establishing quality requirements and identifying, implementing, analyzing, and validating software quality metrics for software system development. It spans the development cycle in five steps, as shown in Table 3.2.
TABLE 3.2 IEEE Software Quality Metrics Methodology
A typical "catalog" of metrics in current use will be discussed later. At this point we merely want to present a gestalt for the IEEE recommended methodology. In the first step it is important to establish direct metrics with values as numerical targets to be met in the final product. The factors to be measured may vary from product to product, but it is critical to rank the factors by priority and assign a direct metric value as a quantitative requirement for that factor. There is no mystery at this point, because Voice of the Customer (VOC) and Quality Function Deployment (QFD) are the means available not only to determine the metrics and their target values, but also to prioritize them.
The second step is to identify the software quality metrics by decomposing each factor into subfactors and those further into the metrics. For example, a direct final metric for the factor reliability could be faults per 1,000 lines of code (KLOC), with a target value of, say, one fault per KLOC. (This level of quality is just 4.59 Sigma; Six Sigma quality would be 3.4 faults per 1,000 KLOC, that is, per one million lines of code.) For each validated metric at the metric level, a value should be assigned that will be achieved during development. Table 3.3 gives the IEEE's suggested paradigm for a description of the metrics set.6
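The Sigma figures quoted in the parenthetical above can be checked with a short calculation. This sketch assumes the usual Six Sigma convention of a 1.5-sigma long-term shift and treats each line of code as one defect opportunity; both are assumptions made here for illustration, and the function names are hypothetical.

```python
from statistics import NormalDist

# Conventional long-term process shift used in Six Sigma tables (assumption).
LONG_TERM_SHIFT = 1.5


def faults_per_kloc_to_dpm(faults_per_kloc: float) -> float:
    """Convert faults per 1,000 lines to faults per million lines."""
    return faults_per_kloc * 1000


def sigma_level(defects_per_million: float, shift: float = LONG_TERM_SHIFT) -> float:
    """Return the Sigma level for a given defects-per-million rate."""
    if not 0 < defects_per_million < 1_000_000:
        raise ValueError("defects_per_million must be between 0 and 1,000,000")
    yield_fraction = 1 - defects_per_million / 1_000_000
    # z-score of the defect-free fraction, plus the conventional shift
    return NormalDist().inv_cdf(yield_fraction) + shift


one_fault_per_kloc = sigma_level(faults_per_kloc_to_dpm(1.0))
six_sigma_target = sigma_level(3.4)
print(f"{one_fault_per_kloc:.2f}")  # about 4.59
print(f"{six_sigma_target:.2f}")    # about 6.00
```

Under these assumptions, one fault per KLOC is 1,000 faults per million lines, which yields roughly 4.59 Sigma, while 3.4 faults per million lines yields roughly 6 Sigma.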
TABLE 3.3 IEEE Metric Set Description Paradigm7
|Item|Description|
|---|---|
|Name|Name of the metric|
|Metric|Mathematical function to compute the metric|
|Cost|Cost of using the metric|
|Benefit|Benefit of using the metric|
|Impact|Can the metric be used to alter or stop the project?|
|Target value|Numerical value to be achieved to meet the requirement|
|Factors|Factors related to the metric|
|Tools|Tools to gather data, calculate the metric, and analyze the results|
|Application|How the metric is to be used|
|Data items|Input values needed to compute the metric|
|Computation|Steps involved in the computation|
|Interpretation|How to interpret the results of the computation|
|Considerations|Metric assumptions and appropriateness|
|Training|Training required to apply the metric|
|Example|An example of applying the metric|
|History|Projects that have used this metric and its validation history|
|References|List of projects used, project details, and so on|
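One way to keep such descriptions consistent across a project is to record each metric as a structured record. The sketch below mirrors the rows of Table 3.3 as fields of a Python dataclass (Python 3.9 or later). The class name, field types, and all example values are hypothetical illustrations, not part of the IEEE standard.

```python
from dataclasses import dataclass, field, fields


@dataclass
class MetricDescription:
    """One entry in a metric set, with one field per row of Table 3.3."""
    name: str
    metric: str            # mathematical function used to compute the metric
    cost: str
    benefit: str
    impact: str            # can the metric alter or stop the project?
    target_value: float    # numerical value that meets the requirement
    factors: list[str]
    tools: list[str]
    application: str
    data_items: list[str]  # input values needed for the computation
    computation: str
    interpretation: str
    considerations: str
    training: str
    example: str
    history: str = ""
    references: list[str] = field(default_factory=list)


# Hypothetical entry for the reliability example used in the text.
fault_density = MetricDescription(
    name="Fault density",
    metric="faults / KLOC",
    cost="Low: counts come from the existing defect tracker",
    benefit="Early warning of unreliable components",
    impact="Release may be held if the target is missed",
    target_value=1.0,
    factors=["reliability"],
    tools=["defect tracker", "line counter"],
    application="Tracked per component at each build",
    data_items=["fault count", "lines of code"],
    computation="Divide the fault count by lines of code, multiply by 1,000",
    interpretation="Lower is better; values above target are noncompliant",
    considerations="Assumes a consistent definition of a line of code",
    training="None beyond familiarity with the defect tracker",
    example="90 faults in 45,000 lines gives 2.0 faults per KLOC",
)

print([f.name for f in fields(fault_density)])
```

Listing the field names, as the final line does, gives a quick check that every row of the paradigm has been filled in for a given metric.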
To implement the metrics in the metric set chosen for the project under design, the data to be collected must be determined, and assumptions about the flow of data must be clarified. Any tools to be employed are defined, and any organizations to be involved are described, as is any necessary training. It is also wise at this point to test the metrics on some known software to refine their use, sensitivity, accuracy, and the cost of employing them.
Analyzing the metrics can help you identify any components of the developing system that appear to have unacceptable quality or that present development bottlenecks. Any components whose measured values deviate from their target values are noncompliant.
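The analysis step can be sketched as a simple comparison of measured values against targets. In this illustration, "deviates from target" is interpreted as missing the target on the unfavorable side, which is an assumption; the component names, metrics, and values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Measurement:
    component: str
    metric: str
    measured: float
    target: float
    lower_is_better: bool = True  # assumption: direction depends on the metric

    def is_compliant(self) -> bool:
        """True if the measured value meets its target."""
        if self.lower_is_better:
            return self.measured <= self.target
        return self.measured >= self.target


def noncompliant(measurements: list[Measurement]) -> list[Measurement]:
    """Return the measurements that miss their targets."""
    return [m for m in measurements if not m.is_compliant()]


# Hypothetical measurements taken during development.
measurements = [
    Measurement("billing", "faults per KLOC", measured=0.8, target=1.0),
    Measurement("reporting", "faults per KLOC", measured=2.4, target=1.0),
    Measurement("login", "availability (%)", measured=99.95, target=99.9,
                lower_is_better=False),
]

flagged = noncompliant(measurements)
for m in flagged:
    print(f"{m.component}: {m.metric} = {m.measured} (target {m.target})")
```

Here only the hypothetical reporting component is flagged, because its fault density exceeds the target of one fault per KLOC.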
Validation of the metrics is a continuous process spanning multiple projects. If the metrics employed are to be useful, they must accurately indicate whether quality requirements have been achieved or are likely to be achieved during development. Furthermore, a metric must be revalidated every time it is used. Confidence in a metric will improve over time as further usage experience is gained.