The ideas for these articles come from the successes, failures, and discoveries in real-world projects. In this particular case, all three inspirations are at play. The unconventional process described in the first section has proven successful, with success measured by everyone going home on time and the projects delivered on time and under budget. The failure that led to the specific problem and solution covered here was an application that continually required more memory and additional hardware at regular release intervals. A random check of code while looking for re-usable assets revealed that a good percentage of the problem was the inefficient use of String. The discovery of the Eclipse feature that makes this issue easy to track and address was the result of randomly trying different settings (which is less efficient than reading the documentation but much more fun).
Three Costs of String Literals
FUD: Fear, Uncertainty, and Doubt. These are the three roadblocks to improvements. When you read this article, you will probably experience all three until you have read it all the way through. Some may even have to give this a try themselves before they will be cured. And, some need an incentive to keep reading to get over their FUD. So, look at three problems that almost every application suffers from as motivation to get through the next sections with an open mind.
The first cost of a String literal is the overhead of creating the String. There are few more expensive operations than creating a new object. The use of interfaces is a common approach because developers are well aware of this overhead. Yet, because String is so ubiquitous, it is often forgotten that creating a String is creating a new object. The failure alluded to earlier is a perfect example. One class had a large number of Strings declared. They were all declared as static final, a good practice to reduce the cost of String creation. However, a random search for one of these declarations revealed the exact same objects being created in 70 different files. A minimum footprint for a String is 40 bytes. That adds up quickly to more hardware expense and a good amount of labor spent looking for performance improvements.
The second cost is in the processing overhead of String comparison, a frequent reason for declaring a String in the first place. With the same String declared multiple times, the more efficient == comparison cannot be relied upon, requiring the more intensive .equals() comparison.
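The difference between the two comparisons can be seen in a minimal sketch. The class name, the constant, and the buildAtRuntime() helper are hypothetical and exist only for illustration. One assumption worth stating loudly: the Java Language Specification pools identical compile-time literals, so the == shortcut is most clearly unsafe for Strings that are assembled at run time, which is what the helper simulates.

```java
public class StringComparisonSketch {
    // Hypothetical shared constant, standing in for a single application-wide declaration.
    static final String USER_ID = "userId";

    // Simulates a value that is assembled at run time rather than written as a literal.
    static String buildAtRuntime() {
        return new StringBuilder("user").append("Id").toString();
    }

    public static void main(String[] args) {
        String sameConstant = USER_ID;
        String runtimeValue = buildAtRuntime();

        // Same reference, so the cheap identity check is enough.
        System.out.println(USER_ID == sameConstant);      // true
        // Same characters but a different object, so identity fails.
        System.out.println(USER_ID == runtimeValue);      // false
        // equals() compares contents and succeeds.
        System.out.println(USER_ID.equals(runtimeValue)); // true
    }
}
```

Because the outcome of == depends on where each String came from, most teams treat equals() as the safe default and use a single shared declaration to keep the number of distinct objects low.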
The third cost is maintenance. Tracking down every instance of a String is much more tedious and prone to error than changing a single instance.
An Hour of Prevention is Worth a Weekend of Cure
There are some processes that are rarely used that every project can benefit from. One of these is daily code reviews by either a build master or technical lead. This is rarely done for many reasons, two of which are the misconceptions that it takes more time than it is worth and that it is hard to do.
Looking at the time versus value concept, a daily review should take no longer than an hour. That time estimate is based on the reviewer being responsible for no more than eight developers (more than that and re-configuring the teams should be considered) and on the reviews being done from the beginning rather than waiting for a performance issue to occur that can’t be easily traced. In a six-month project, this would total about 125 hours (one hour on each of roughly 125 working days). When compared with how long it takes to tune applications in either QA or production, project savings will generally average 100%.
Daily code reviews should be easy. Every source control application includes a report of what files have changed since the last update and a comparison tool to view differences between versions. Setting aside simple beans and other classes that can (and should) be generated by an IDE, the total lines of code output by a team on a daily basis is far less than one might think. This is not because the team is not productive; it is because producing a line of code consists of thinking about what the line should be, writing the line, testing the line, and correcting the current line or previously written lines based on test results.
The time taken to review the code can be greatly reduced through the use of code analysis tools both native to an IDE and available as plug-ins. By providing feedback from these daily reviews to team members and having them make the corrections themselves, the team will reduce their code standard variations. Code that is clean to begin with takes even less time to review. That hour per day can quickly drop to an average of 30 minutes a day.
These daily code reviews should not be full peer reviews. They only need to be cursory reviews looking for what can be found quickly (once the review becomes a habit). One Eclipse feature that can speed this process is the use of the Errors/Warnings settings under the Compiler preference. While the default settings are very useful, there is one non-default setting that every Java development team can benefit from: setting Non-externalized strings to Warning.
Design for String Performance
Every developer is familiar with the same String values being used repeatedly in a web application. Because example code is rife with String declarations, it is a common (and expensive) habit to declare Strings often. A much more efficient approach is the creation of a single interface to hold String values that will be re-used in more than one class (JSPs included).
A simple example is:
public interface StaticConstants {
    public static final String USER_ID = "userId";
    public static final String PASSWORD = "pword";
}
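As a rough sketch of how other classes might consume such an interface, the example below repeats the two constants and reads them from a plain Map that stands in for request parameters. The LoginFormSketch class, the describe() method, and the sample values are hypothetical; a real web application would read from its own request object.

```java
import java.util.HashMap;
import java.util.Map;

// Repeated here so the sketch compiles on its own; fields in an interface
// are implicitly public static final.
interface StaticConstants {
    String USER_ID = "userId";
    String PASSWORD = "pword";
}

public class LoginFormSketch {
    // Hypothetical stand-in for code that reads request parameters.
    static String describe(Map<String, String> params) {
        String user = params.get(StaticConstants.USER_ID);
        boolean hasPassword = params.containsKey(StaticConstants.PASSWORD);
        return user + ":" + hasPassword;
    }

    public static void main(String[] args) {
        Map<String, String> params = new HashMap<>();
        params.put(StaticConstants.USER_ID, "snelson");
        params.put(StaticConstants.PASSWORD, "secret");
        System.out.println(describe(params)); // snelson:true
    }
}
```

Every class that needs the parameter names refers to the same two declarations, so a rename touches one file instead of many.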
The two Strings above will be familiar to anyone who has ever worked on a web application. If you were to look at your last application, how many times would you find these Strings declared? Multiply that by the memory size required by each and you will see how much time you spent in meetings discussing how to reduce the time it takes for a user to log in when a simple (if minor) reduction was available immediately. Then add in the processing time where the .equals() method is used instead of the faster == comparison operator.
I have used this design approach for many years; I was fortunate enough to be introduced to it on my first Java project. The average number of such Strings used in a web application is 120, with the criterion that the String must be used by more than one object. Frequently, these Strings are used by four or more objects. You would average three objects per String with an average of 60 bytes per String. Gosh, that is only .02 MB. Hardly worth it, eh? Ah, but these Strings are rarely declared as static final, so if you expect 1000 concurrent users, you are now at 20 MB. I’m prone to kill processes on my machine that use anything more than 500k if they aren’t critical because I know they slow my 4GB machine down.
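The arithmetic behind those figures can be checked with a short calculation. The class and method names are hypothetical, the inputs are the averages quoted above, and 1 MB is assumed to mean 1,000,000 bytes.

```java
public class StringFootprintEstimate {
    // Bytes held per user: number of Strings x objects declaring each x bytes per String.
    static long perUserBytes(int strings, int objectsPerString, int bytesPerString) {
        return (long) strings * objectsPerString * bytesPerString;
    }

    public static void main(String[] args) {
        long perUser = perUserBytes(120, 3, 60); // 21,600 bytes, about .02 MB
        long allUsers = perUser * 1000;          // 21,600,000 bytes, about 20 MB

        System.out.println("Bytes per user: " + perUser);
        System.out.println("Bytes for 1000 concurrent users: " + allUsers);
    }
}
```

The per-user number is small; it is the multiplication by concurrent users, when the Strings are not static, that produces the 20 MB figure.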
Although the use of String.intern() would also reduce overhead, that particular approach is much more useful at a class level than an application level.
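For readers who have not used it, here is a minimal sketch of what String.intern() does. The class name and the buildAtRuntime() helper are hypothetical; intern() itself is a standard method on java.lang.String that returns the pooled instance with the same contents.

```java
public class InternSketch {
    // Simulates a String assembled at run time rather than written as a literal.
    static String buildAtRuntime() {
        return new StringBuilder("user").append("Id").toString();
    }

    public static void main(String[] args) {
        String literal = "userId";        // resolved from the string pool
        String built = buildAtRuntime();  // a separate object with the same characters
        String interned = built.intern(); // the pooled instance for those characters

        System.out.println(literal == built);    // false
        System.out.println(literal == interned); // true
    }
}
```

Each call to intern() involves a pool lookup, which is one reason it suits a handful of values inside one class better than every String across an application.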
Tracking the Culprits with Eclipse
The key, as mentioned earlier, is to set Non-externalized strings to Warning in the compiler preferences.
Figure 1: Setting Non-externalized strings to Warning in Compiler Preferences
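For teams that want every developer to pick up the same setting, it can also be stored per project. The fragment below is a sketch that assumes project-specific compiler settings are enabled, in which case Eclipse writes the choice to .settings/org.eclipse.jdt.core.prefs inside the project; verify the key against the file your own Eclipse version generates before committing it.

```properties
# .settings/org.eclipse.jdt.core.prefs (project-specific compiler settings)
# Accepted values include ignore, warning, and error.
org.eclipse.jdt.core.compiler.problem.nonExternalizedStringLiteral=warning
```

Checking this file into source control means the reviewer and the developers see the same warnings.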
The result of changing this configuration is that String declarations can be quickly found in the Problems view:
Figure 2: Quickly Identify Potential String Expenses
Hiding the Obvious Exceptions
Okay, so now you can identify all the non-externalized String literals to reduce overhead. Of course, in Figure 2 the warnings are also displayed for the externalized String literals at their point of declaration. Eclipse provides a comment notation that marks properly constructed Strings so they are no longer reported as a potential issue.
public static final String USER_ID = "userId"; //$NON-NLS-1$
public static final String PASSWORD = "pword"; //$NON-NLS-1$
The use of //$NON-NLS-1$ tells the compiler to ignore the first string literal in the line. Another obvious exception to the rule is strings used for logging. The obvious solution would be something like this:
catch(Exception e) {
    logger.error("Unchecked exception in logoutPortalUser(): " + e); //$NON-NLS-1$
}
In this particular example, it may seem that it would be more efficient to declare a static final at the class level. However, since the logger concatenates the string, there are no actual savings, so adding the //$NON-NLS-1$ notation is adequate. Note that the notation must sit on the same line as the string literal it refers to.
Just as the logger concatenates the string, you frequently need to do so in your own logger statements, such as:
catch(Exception e) {
    logger.error("Unchecked exception logging in " + userId + " with company name of " + companyName + e); //$NON-NLS-1$
}
In this example, you have two strings on the line. This requires a double notation like this:
//$NON-NLS-1$ //$NON-NLS-2$
In some cases, it is cleaner to simply have multiple lines rather than multiple notations.
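The two styles can be compared side by side in a small sketch. The class and method names are hypothetical, and the methods return the message instead of calling a logger so the example stays self-contained. The tags are numbered by the position of each literal on its own line, counting from the left.

```java
public class NonNlsTagSketch {
    // Two literals on one line, so the line carries two tags.
    static String oneLine(String userId, String companyName, Exception e) {
        return "Unchecked exception logging in " + userId + " with company name of " + companyName + e; //$NON-NLS-1$ //$NON-NLS-2$
    }

    // One literal per line, so each line needs only the first tag.
    static String multiLine(String userId, String companyName, Exception e) {
        return "Unchecked exception logging in " + userId //$NON-NLS-1$
                + " with company name of " + companyName //$NON-NLS-1$
                + e;
    }

    public static void main(String[] args) {
        Exception e = new IllegalStateException("boom"); //$NON-NLS-1$
        String a = oneLine("snelson", "FYW", e); //$NON-NLS-1$ //$NON-NLS-2$
        String b = multiLine("snelson", "FYW", e); //$NON-NLS-1$ //$NON-NLS-2$
        System.out.println(a.equals(b)); // true: both forms build the same message
    }
}
```

Both forms build the same message; the multi-line version simply keeps each tag next to the literal it excuses.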
Cheating the Solution
Those of you with larceny in your hearts may be thinking to yourselves, “What is to prevent people from simply adding the notation without checking if the String is used elsewhere?” The answer is not technical in the sense of code, but is technical in the sense that people have tendencies based on thought processes known as meta-programs. Someone who is not conscientious enough to use the notation properly is generally not going to think to use it improperly. Even though this is probably not 100% true, I have only seen someone cheat once. And that is where the process of daily code reviews mentioned earlier comes into play; this is how I knew the cheat had been done. After discussing it with the developer, they were happy to turn their conscientious cheating into conscientious improvement of their code. A definite win-win.
Conclusion
The principle of Occam’s razor leads you to understand that the simplest solution is often the correct one. In software development, it is usually the best one, and often the one that eludes both junior and senior developers. Using a compiler warning to find potential performance issues is a very simple approach. What is interesting about this particular approach is first that the problem is common enough that a method for locating it is built into the world’s most popular Java IDE, and second that it is so commonly overlooked that the default setting is to ignore it.
About the Author
Scott Nelson is a Senior Principal Web Application Consultant with well over 10 years of experience designing, developing, and maintaining web-based applications for manufacturing, pharmaceutical, financial services, non-profit organizations, and real estate agencies for use by employees, customers, vendors, franchisees, executive management, and others who use a browser. For information on how he can help with your web applications, please visit http://www.fywservices.com/. He also blogs all of the funny emails forwarded to him at Frequently Unasked Questions.