Java EE 5 Performance Management and Optimization
An intelligent SLA maintains three core traits. It is reasonable, specific, and flexible.
An SLA must satisfy end-user expectations but still be reasonable enough to be implemented. An unreasonable SLA will be ignored by all parties until end users complain. This is why SLAs need to be defined by both the application business owner and the application technical owner: the business owner pushes for the best SLAs for his users, while the application technical owner impresses upon the business owner the reality of what the business requirement presents. If the business requirement cannot be satisfied in a way acceptable to the application business owner, then the application technical owner needs to present all options and the cost of each (in terms of effort). The business requirement may need to be changed or divided into subprocesses that can be satisfied reasonably.
Finally, an intelligent SLA needs to be flexible. It needs to account for variations in behavior
as a result of unforeseen factors, but define a hard threshold for how flexible it is allowed to be.
For example, an SLA may read, "The search functionality will respond within three seconds
(specific) for 95 percent of requests (flexible)." The occasional seven-second response time is
acceptable, as long as the integrity of the application is preserved: it responds well most of the
time. By defining concrete values for the specific value as well as the limitations of the flexible
value, you can quantify what "most of the time" means to the performance of the application,
and you have a definite value with which to evaluate and verify the SLA.
In addition to specific performance criteria and a measure of flexibility, it is a good idea to define an upper limit of tolerance, either as a hard value or as a relative one. I prefer to specify a relative upper limit, measured in the number of standard deviations from the mean. The reason is that, on paper, a 3-second response time for 95 percent of requests is tolerable, but that criterion alone says nothing about the remaining 5 percent: how do you address a drastically divergent response time, such as 30 seconds? Statistically, such outliers should be rare, but the upper limit is a good safeguard to have.
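The three criteria described above (a specific threshold, a percentage of requests that must meet it, and a relative upper limit in standard deviations) can be checked against a set of measured response times. The following is a minimal sketch, not a definitive implementation: the class SlaEvaluator and all of its method names are hypothetical, and it assumes the population standard deviation, since the text does not say which form is intended.

```java
/**
 * Illustrative SLA check. The class and method names are hypothetical
 * and do not come from the text.
 */
final class SlaEvaluator {

    private SlaEvaluator() { }

    /** Fraction (0.0 to 1.0) of samples that responded within the threshold. */
    static double fractionWithin(double[] seconds, double thresholdSeconds) {
        requireSamples(seconds);
        int within = 0;
        for (double t : seconds) {
            if (t <= thresholdSeconds) {
                within++;
            }
        }
        return (double) within / seconds.length;
    }

    /** Arithmetic mean of the response times. */
    static double mean(double[] seconds) {
        requireSamples(seconds);
        double sum = 0.0;
        for (double t : seconds) {
            sum += t;
        }
        return sum / seconds.length;
    }

    /** Population standard deviation (an assumption; see the lead-in). */
    static double standardDeviation(double[] seconds) {
        double m = mean(seconds);
        double sumOfSquares = 0.0;
        for (double t : seconds) {
            sumOfSquares += (t - m) * (t - m);
        }
        return Math.sqrt(sumOfSquares / seconds.length);
    }

    /** Largest distance of any sample from the mean, in standard deviations. */
    static double maxDeviations(double[] seconds) {
        double m = mean(seconds);
        double sd = standardDeviation(seconds);
        if (sd == 0.0) {
            return 0.0; // all samples identical, so nothing strays from the mean
        }
        double max = 0.0;
        for (double t : seconds) {
            max = Math.max(max, Math.abs(t - m) / sd);
        }
        return max;
    }

    /** True when both the percentage criterion and the relative upper limit hold. */
    static boolean meetsSla(double[] seconds, double thresholdSeconds,
                            double requiredFraction, double maxStdDevs) {
        return fractionWithin(seconds, thresholdSeconds) >= requiredFraction
                && maxDeviations(seconds) <= maxStdDevs;
    }

    private static void requireSamples(double[] seconds) {
        if (seconds == null || seconds.length == 0) {
            throw new IllegalArgumentException("at least one sample is required");
        }
    }
}
```

For 19 responses of 1 second and a single response of 30 seconds, fractionWithin(samples, 3.0) is 0.95, so the 95 percent criterion passes on paper. The 30-second response, however, lies about 4.4 standard deviations from the mean, so meetsSla(samples, 3.0, 0.95, 2.0) returns false. This is the kind of divergent response that the relative upper limit is meant to catch.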
An important aspect of defining intelligent SLAs is tracking them. The best way to do this is to integrate them into your application use cases. A use case is built from a general thought, such as "The application must provide search functionality for its patient medical records," but then the use case is divided into scenarios. Each scenario defines a path that the use case may follow given varying user actions. For example, what does the application do when the patient exists? What does it do when the patient does not exist? What if the search criterion returns more than one patient record? Each of these business processes needs to be explicitly called out in the use case, and each needs to have an SLA associated with it. The following exercise demonstrates the format that a proper use case containing intelligent SLAs should follow.
USE CASE: PATIENT HISTORY SEARCH FUNCTIONALITY
The Patient Management System must provide functionality to search for specific patient medical history information.
Scenario 1: The Patient Management System returns one distinct record.
The user has successfully logged in to the application.
The user enters search criteria and submits data using the Web interface.
The Patient Management System displays the results to the user.
Scenario 1: The Patient Management System will return a specific patient matching the specified criteria in less than three seconds for 95 percent of requests. The response time will at no point stray more than two standard deviations from the mean.
Scenario 2: The Patient Management System will return a collection of patients matching the specified criteria in less than five seconds for 95 percent of requests. The response time will at no point stray more than two standard deviations from the mean.
Scenario 3: When the Patient Management System cannot find a user matching the specified criteria, it will inform the user in less than two seconds for 95 percent of requests. The response time will at no point stray more than two standard deviations from the mean.
The format of this use case varies from traditional use cases with the addition of the SLA component. In the SLA component, you explicitly call out the performance requirements for each scenario. The performance criteria include the following:
- The expected tolerance level: Respond in less than three seconds.
- The measure of flexibility: Meet the tolerance level for 95 percent of requests.
- The upper threshold: Do not stray more than two standard deviations from the observed mean.
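To make these facets available to unit tests, the SLA component of the use case can be captured as data. The sketch below is one possible shape, offered under stated assumptions: the class ScenarioSla, its fields, and its methods are hypothetical names, while the values (three, five, and two seconds, 95 percent of requests, and the two-standard-deviation limit) are taken from the scenario SLAs in the use case.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/**
 * One scenario's SLA from the use case. The class and its members are
 * hypothetical names used for illustration only.
 */
final class ScenarioSla {
    final String scenario;          // label of the use case scenario
    final double toleranceSeconds;  // expected tolerance level
    final int requiredPercent;      // measure of flexibility
    final double maxStdDevs;        // upper threshold, relative to the mean

    ScenarioSla(String scenario, double toleranceSeconds,
                int requiredPercent, double maxStdDevs) {
        this.scenario = scenario;
        this.toleranceSeconds = toleranceSeconds;
        this.requiredPercent = requiredPercent;
        this.maxStdDevs = maxStdDevs;
    }

    /** The use case says "in less than", so the comparison is strict. */
    boolean withinTolerance(double responseSeconds) {
        return responseSeconds < toleranceSeconds;
    }

    /** True when enough requests met the tolerance level (integer math avoids rounding). */
    boolean flexibilityMet(int requestsWithinTolerance, int totalRequests) {
        if (totalRequests <= 0) {
            throw new IllegalArgumentException("totalRequests must be positive");
        }
        return requestsWithinTolerance * 100L >= (long) requiredPercent * totalRequests;
    }

    /** The three scenario SLAs of the patient history search use case. */
    static List<ScenarioSla> patientHistorySearch() {
        return Collections.unmodifiableList(Arrays.asList(
                new ScenarioSla("Scenario 1: one distinct record", 3.0, 95, 2.0),
                new ScenarioSla("Scenario 2: collection of patients", 5.0, 95, 2.0),
                new ScenarioSla("Scenario 3: no matching patient", 2.0, 95, 2.0)));
    }
}
```

A unit test for a scenario could time the search call and assert withinTolerance on the measured value, while a load test could count the requests that met the tolerance level and assert flexibilityMet over the whole run. Keeping the values in one place means developers, QA, and administrators all evaluate against the same numbers.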
With each of these performance facets explicitly defined, the developers implementing code to satisfy the use case understand their expectations and can structure unit tests accordingly. The QA team has a specific value to test and measure the quality of the application against. Next, when the QA team, or a delegated performance capacity assessor, performs a formal capacity assessment, an extremely accurate assessment can be built and a proper degradation model constructed. Finally, when the application reaches production, enterprise Java system administrators have values from which to determine if the application is meeting its requirements.
All of this specific assessment is possible because the application business owner and
application technical owner took time to carefully determine these values in the architecture
phase. My aim here is to impress upon you the importance of up-front research and a solid
communication channel between the business and technical representatives.