Software Quality Metrics
Availability and Customer Satisfaction Metrics
To the end user of an application, the only measures of quality are in the performance, reliability, and stability of the application or system in everyday use. This is "where the rubber meets the road," as users often say. Developer quality metrics and their assessment are often referred to as "where the rubber meets the sky." This article is dedicated to the proposition that we can arrive at a priori user-defined metrics that can be used to guide and assess development at all stages, from functional specification through installation and use. These metrics also can meet the road a posteriori to guide modification and enhancement of the software to meet the user's changing needs. Caution is advised here, because software problems are not, for the most part, valid defects, but rather are due to individual user and organizational learning curves. The latter class of problem calls places an enormous burden on user support during the early days of a new release. The catch here is that neither alpha testing (initial testing of a new release by the developer) nor beta testing (initial testing of a new release by advanced or experienced users) of a new release with current users identifies these problems. The purpose of a new release is to add functionality and performance to attract new users, who initially are bound to be disappointed, perhaps unfairly, with the software's quality. The DFTS approach we advocate in this article is intended to handle both valid and perceived software problems.
Typically, customer satisfaction is measured on a five-point scale:11
- Very satisfied
- Very dissatisfied
- Completely satisfied = 100%
- Satisfied = 75%
- Neutral = 50%
- Dissatisfied = 25%
- Completely dissatisfied = 0%
NSI then ranges from 0% (all customers are completely dissatisfied) to 100% (all customers are completely satisfied). Although it is widely used, the NSI tends to obscure difficulties with certain problem products. In this case the developer is better served by a histogram showing satisfaction rates for each product individually.
|Sidebar 3.1: A Software Urban Legend|
|Professor Maurice H. Halstead was a pioneer in the development of ALGOL 58, automatic programming technology, and ALGOL-derived languages for military systems programming at the Naval Electronics Laboratory (NEL) at San Diego. Later, as a professor at Purdue (1967.1979), he took an interest in measuring software complexity and improving software quality. The legend we report, which was circulated widely in the early 1960s, dates from his years at NEL, where he was one of the developers of NELIAC (Naval Electronics Laboratory International ALGOL Compiler). As the story goes, Halstead was offered a position at Lockheed Missile and Space Systems. He would lead a large military programming development effort for the Air Force in the JOVIAL programming language, which he also helped develop. With messianic confidence, Halstead said he could do it with 12 programmers in one year if he could pick the 12. Department of Defense contracts are rewarded at cost plus 10%, and Lockheed had planned for a staff of 1,000 programmers, who would complete the work in 18 months. The revenue on 10% of the burdened cost of 1,000 highly paid professionals in the U.S. aerospace industry is a lot of money. Unfortunately, 10% of the cost of 12 even very highly paid software engineers is not, so Lockheed could not accept Halstead's proposition. This story was widely told and its message applied for the next 20 years by developers of compilers and operating systems with great advantage, but it has never appeared in print as far as we know. Halstead did leave the NEL to join Lockheed about this time. They benefited from his considerable software development expertise until he went to Purdue.|
Current Metrics and Models Technology
The best treatment of current software metrics and models is Software Measurement: A Visualization Toolkit for Project Control and Process Measurement,12 by Simmons, Ellis, Fujihara, and Kuo. It comes with a CD-ROM that contains the Project Attribute Monitoring and Prediction Associate (PAMPA) measurement and analysis software tools. The book begins with Halstead's software science from 1977 and then brings the field up to date to 1997, technologically updating the metrics and models by including later research and experience. The updated metrics are grouped by size, effort, development time, productivity, quality, reliability, verification, and usability.
Size metrics begin with Halstead's volume, now measured in source lines of code (SLOC), and add structure as the number of unconditional branches of control loop nesting and module fan-in and fan-out. The newly added rework attributes describe the size of additions, deletions, and changes made between versions. Combined, they measure the turmoil in the developing product. The authors have also added a new measure of code functionality smaller than the program or module, called a chunk. It is a single integral piece of code, such as a function, subroutine, script, macro, procedure, object, or method. Volume measures are now made on functionally distinct chunks, rather than larger-scale aggregates such as programs or components. Tools are provided that allow the designer to aggregate chunks into larger units and even predict the number of function points or object points. Furthermore, because most software products are not developed from scratch but rather reuse existing code chunks with known quality characteristics, the toolkit allows the prediction of equivalent volume using one of four different algorithms (or all four, if desired) taken from recent software science literature. A new volume measure called unique SLOC has been added. It evaluates new LOC on a per-chunk basis and can calculate unique SLOC for a developing version of the product.
Naturally, volume measures are the major input for effort metrics. Recent research adds five categories of 17 different dominators, which can have serious effort-magnifying effects. The categories into which dominators fall are project, product, organization, suppliers, and customers. For example, potential dominators in the product category include the amount of documentation needed, programming language, complexity, and type of application. In the organization category, they include the number of people, communications, and personnel turnover. Customer includes user interface complexity and requirements volatility, which are negative, but then the dominators are all basically negative. Their name signifies that their presence may have an effort-expansion effect as large as a factor of 10. But when their influence is favorable, they generally have a much smaller positive effect. A range of effort prediction and cost forecasting algorithms based on a variety of theoretical, historical/experiential, statistical, and even composite models are provided.
The third measure category is development time, which is derived from effort, which is derived from size or volume. The only independent new variable here is schedule. Given the resources available to the project manager, the toolkit calculates overall minimum development time and then allows the user to vary or reallocate resources to do more tasks in parallel. However, the system very realistically warns of cost runaways if the user tries to reduce development time by more than 15% of the forecast minimum.
Because effort is essentially volume divided by productivity, you can see that productivity is inversely related to effort. A new set of cost drivers enters as independent variables, unfortunately having mostly negative influences. When cost drivers begin to vary significantly from nominal values, you should take action to bring them back into acceptable ranges. A productivity forecast provides the natural objective function with which to do this.
The quality metrics advocated in Simmons, et al. are dependent on the last three metric sets: reliability, verification, and usability. Usability is a product's fitness for use. This metric depends on the product's intended features, their verified functionality, and their reliability in use. Simply stated, this metric means that all promises were fulfilled, no negative consequences were encountered, and the customer was delighted. This deceptively simple trio masks evaluation of multiple subjective psychometric evaluations plus a few performancebased factors such as learnability, relearnability, and efficiency. Much has been written about measures of these factors. To sell software, vendors develop and add more features. New features contain unique SLOCs, and new code means new opportunities to introduce bugs. As might be expected, a large measure of customer dissatisfaction is the result of new features that don't work, whether due to actual defects or merely user expectations. The only thing in the world increasing faster than computer performance is end-user expectations: A product whose features cannot be validated, or that is delivered late or at a higher-than-expected price, has a quality problem. Feature validation demands that features be clearly described without possible misunderstanding and that metrics for their measurement be identified.
The last point in the quality triangle is reliability, which may be defined as defect potential, defect removal efficiency, and delivered defects. The largest opportunity for software defects to occur is in the interfaces between modules, programs, and components, and with databases. Although the number of interfaces in an application is proportional to the program's size, it varies by application type, programming language, style, and many other factors. One estimate indicates that 70% or more of software reliability problems are in interfaces. Aside from the occurrence of errors or defects, and their number (if any), the major metric for quality is the mean time between their occurrence. Whether you record time to failure, time intervals between failures, cumulative failures in a given time period, or failures experienced in a given time interval, the basic metric of reliability is time.
New Metrics for Architectural Design and Assessment
A new science of software architecture metrics is slowly emerging, amazingly in the absence of any generally accepted definition of software architecture. Software engineers have been coasting on the metaphor of building architecture for a long time. Some clarity is developing, but it is scarcely more than extending the metaphor. For example, an early (1994) intuitive definition states the following:
There is currently no single, universally accepted definition of software architecture, but typically a systems architectural design is concerned with describing its decomposition into components and their interconnections.15
Actually, this is not a bad start. When one of the authors became manager of a large-scale computer design at Univac as chief architect in 1966, this was the operative definition of computer (hardware) architecture. It was sufficient only because of the tradition of hardware systems design, which had led to the large-scale multiprocessor computer system.16
But to be more precise for software that comes later to this tradition, software architecture is "the structure of the components of a program/system, their interrelationships, and principles and guidelines governing their design and evolution over time."17 While descriptive, these definitions still do not give us enough leverage to begin defining metrics for the architectural assessment of software. We would like to again quote Shaw and Garlan's more recent definition:
Abstractly, software architecture involves the description of elements from which systems are built, interactions among those elements, patterns that guide their composition, and constraints on those patterns. In general, a particular system is defined in terms of a collection of components and the interactions among those components.18
A software design pattern is a general repeatable solution to a commonly occurring problem in software design. It is not a finished design that can be transformed directly into program code. Rather, it is a description of or template for how to solve a problem that can be used in many different situations. Object-oriented design patterns typically show relationships and interactions between classes or objects, without specifying the final application classes or objects that are involved. Algorithms are not thought of as design patterns, because they solve computational problems rather than design problems.
In practice, architectural metrics are involved not only upstream in the software development process and architecture discovery but also further downstream before coding begins as architectural review. These terms were introduced by Avritzer and Weyuker19 for use at AT&T and will be used here as well.
Common Architectural Design Problems
The most commonly occurring architectural design problems can be grouped into three categories: project management, requirements, and performance. The following list describes problems affecting project management.20 It's an excellent list that we have reordered to reflect our own experiences with software development management:
- The stakeholders have not been clearly identified.
- No project manager has been identified.
- No one is responsible for the overall architecture.
- No project plan is in place.
- The deployment date is unrealistic.
- The independent requirements team is not yet in place.
- Domain experts have not been committed to the design.
- No software architect(s) have been assigned.
- No overall architecture plan has been prepared.
- No system test plan has been developed.
- No measures of success have been identified.
- No independent performance effort is in place.
- No contingency plans have been written.
- No modification tracking system is in place.
- Project funding is not committed.
- No quality assurance team is in place.
- No hardware installation schedule exists.
Here are the most common issues affecting the definition of requirements for a software development project (again in order of importance according to our experience):
- The project lacks a clear problem statement.
- No requirements document exists.
- The project lacks decision criteria for choosing the software architecture.
- Outputs have not been identified.
- The size of the user community has not been determined.
- Data storage requirements have not been determined.
- Operational administration and maintenance have not been identified.
- Resources to support a new requirement have not been allocated.
Here are the most common performance issues affecting the architecture of a software development project (priority reordered):
- The end user has not established performance requirements.
- No performance model exists.
- Expected traffic rates have not been established.
- No means for measuring transaction time or rates exists.
- No performance budgets have been established.
- No assessment has been made to ensure that hardware will meet processing requirements.
- No assessment has been made to ensure that the system can handle throughput.
- No performance data has been gathered.
In our experience, the leading critical quality issues in each category are either customer requirements issues or aspects of the project management team's commitment to the customer's requirements. This leads to our focus on QFD as a means of hearing the voice of the customer at the beginning of the software development project rather than having to listen to their complaints after the software has been delivered.
Page 3 of 4