Translating machine-level code back into source code for a complex programming language is difficult. Many modern compilers remove variable and function names, move code to optimize execution, and strip all comments, all of which make the translation harder still. Unfortunately, it is easy to recover an assembler-language version of a Java program, because the constant pool in a Java class file contains a great deal of information about the source code.
Java programs are especially vulnerable to decompilers because Java source code is compiled to Java bytecode, a platform-independent instruction set for the Java virtual machine. Bytecode carries interface and type information so that the virtual machine can run safety checks on the code before it is actually executed, and that same information makes decompiling much easier in Java than in most other development languages.
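To make the point about the constant pool concrete, here is a minimal sketch that lists the text entries stored in a class file's constant pool. The class name ConstantPoolStrings and the method utf8Entries are hypothetical names chosen for this illustration. The sketch assumes the standard class file layout (magic number, version, then the constant pool) and reads no further than the pool itself.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not part of any library.
final class ConstantPoolStrings {

    /** Returns every CONSTANT_Utf8 entry found in the constant pool of a class file. */
    static List<String> utf8Entries(InputStream classFile) throws IOException {
        DataInputStream in = new DataInputStream(classFile);
        if (in.readInt() != 0xCAFEBABE) {
            throw new IOException("not a class file");
        }
        in.readUnsignedShort(); // minor version
        in.readUnsignedShort(); // major version
        int count = in.readUnsignedShort(); // valid indices are 1 .. count-1
        List<String> names = new ArrayList<>();
        for (int i = 1; i < count; i++) {
            int tag = in.readUnsignedByte();
            switch (tag) {
                case 1: // Utf8: names, descriptors, string literals
                    names.add(in.readUTF());
                    break;
                case 3: case 4: // Integer, Float
                    in.readInt();
                    break;
                case 5: case 6: // Long, Double occupy two pool slots
                    in.readLong();
                    i++;
                    break;
                case 7: case 8: case 16: case 19: case 20: // Class, String, MethodType, Module, Package
                    in.readUnsignedShort();
                    break;
                case 15: // MethodHandle: one byte kind, two byte index
                    in.readUnsignedByte();
                    in.readUnsignedShort();
                    break;
                case 9: case 10: case 11: case 12: case 17: case 18:
                    // Fieldref, Methodref, InterfaceMethodref, NameAndType, Dynamic, InvokeDynamic
                    in.readInt();
                    break;
                default:
                    throw new IOException("unknown constant pool tag " + tag);
            }
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        // Inspect this class's own compiled form, if it can be found on the class path.
        try (InputStream self =
                ConstantPoolStrings.class.getResourceAsStream("ConstantPoolStrings.class")) {
            if (self == null) {
                System.out.println("class file not found on the class path");
                return;
            }
            for (String entry : utf8Entries(self)) {
                System.out.println(entry);
            }
        }
    }
}
```

Run against its own class file, the listing should include the class name, method names such as utf8Entries, type descriptors, and string literals, which is the kind of information a decompiler builds on.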
Java decompilers simply look for recognizable patterns in Java’s generated bytecode and then translate these patterns back into source code. This vulnerability became evident early in Java’s history with the advent of the controversial free decompiler Mocha, Hanpeter Van Vliet’s Sublime, and other reverse engineering tools.
The truth is that nothing stops Java code from being decompiled and analyzed once it is running on an attacker's machine. Attackers often have better reverse engineering tools than developers do, and security policy should assume that eventually all of a project's source code will be decompiled and made available. Once the source code is compromised, it is easy to decipher how the program's security functions work, bypass them, read hard-coded information such as confidential material, and break or copy hard-coded algorithms.
Fortunately, there are a number of tricks developers can use to make their programs more difficult to decipher.
Salting code is an old programmer's trick. When salting program code, additional nonfunctional code or idiosyncratic symbols are added to make it more difficult to decipher. Programmers build in intentional, harmless errors or loops that compile and run along with the program but do not interfere with the program's function.
Although salting could potentially cause unforeseen bugs or decreased performance, it is useful in that it uniquely stamps code. Salting may not stop a persistent person from decompiling and restructuring the source code, but the added errors and useless bits make the source unique, which can help demonstrate ownership in copyright disputes or lawsuits.
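As a rough illustration of salting, the sketch below adds a distinctive constant and a loop that does no useful work to an otherwise ordinary method. The class SaltedChecksum, the constant SALT_MARK, and its value are all invented for this example. An optimizing JIT may remove such dead work at run time, so treat this as a picture of the idea rather than a tested protection.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical example of "salted" code; all names here are invented.
final class SaltedChecksum {

    // Salt: a distinctive constant that no real logic depends on. Its unusual
    // value is the stamp an author could point to in a decompiled copy.
    private static final int SALT_MARK = 0x5A17ED;

    /** Sums the unsigned byte values of the input; the salt does not change the result. */
    static int checksum(byte[] data) {
        int sum = 0;
        for (byte b : data) {
            sum += b & 0xFF;
        }
        // Salt: XOR-ing twice with the same value restores sum, so this loop
        // compiles and runs but has no effect on the returned value.
        for (int i = 0; i < 3; i++) {
            sum ^= SALT_MARK;
            sum ^= SALT_MARK;
        }
        return sum;
    }

    public static void main(String[] args) {
        byte[] data = "hello".getBytes(StandardCharsets.US_ASCII);
        System.out.println(checksum(data));
    }
}
```

The trade-off the article describes is visible even here: the extra loop costs a little time on every call, and any mistake in the salt would change the program's behavior.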
Obfuscating source code is another tactic developers use to make it more difficult to reuse their code. This is a tradition that goes way back (it used to be called shrouding in Unix), where code is obscured to make it difficult to read. Obfuscation basically consists of replacing variable names with meaningless symbols, removing any or all programmer comments, removing any white space or tabular organization, and leaving as little resemblance to the English language as possible. The theory is that the code will still compile into valid Java bytecode, but it will decompile into something that is nearly impossible to read or organize.
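The effect of obfuscation is easiest to see side by side. The sketch below shows a readable method and a hand-obfuscated copy with the same behavior. Both classes are invented for this example, and the renaming was done by hand rather than by any particular tool.

```java
// Readable original (hypothetical example code).
final class ObfuscationDemo {

    /** Applies a percentage discount to a price given in cents. */
    static long discountedPriceCents(long priceCents, int discountPercent) {
        long discount = priceCents * discountPercent / 100;
        return priceCents - discount;
    }

    public static void main(String[] args) {
        // Both calls perform the same computation and print the same value.
        System.out.println(discountedPriceCents(1999L, 15));
        System.out.println(a.a(1999L, 15));
    }
}

// The same logic after renaming every identifier and stripping comments and layout.
final class a{static long a(long a,int b){long c=a*b/100;return a-c;}}
```

The obfuscated version compiles and behaves identically, but nothing in it tells a reader that the method deals with prices or discounts.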
There are existing products that automate some of this process for coders, such as Crema, Van Vliet’s own answer to Mocha. Crema takes a new tack by replacing variable names in the constant pool with reserved words (like class or if), which not only makes the source code harder to read but also makes it impossible to recompile without serious modification.
Since Java bytecode patterns are extremely vulnerable to decompilation, breaking up these recognizable patterns with fake instruction sequences is an effective deterrent to decompilation. This is called bytecode hosing, and it can be accomplished by existing security software such as Mark LaDue’s HoseMocha. Unfortunately, bytecode hosing severely cripples the performance of the JIT compiler.
Although these tactics can make source code difficult to decompile, relying on the secrecy of Java source code for security is a mistake. Sensitive information should never be hard-coded, particularly passwords, proprietary algorithms, or private cryptographic keys. After all, once Java code is running on someone else's machine, it is only a matter of time before someone with the proper tools and knowledge cracks it.
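One common way to keep a secret out of the class file is to supply it at run time. The sketch below reads a key from the process environment instead of a string literal. The class SecretLookup and the variable name APP_SIGNING_KEY are hypothetical, and an environment variable is only one possible source. The sketch also does not protect the secret once it is in memory.

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical example; the names below are assumptions made for illustration.
final class SecretLookup {

    // Only the name of the setting is compiled into the class, never its value.
    static final String KEY_VARIABLE = "APP_SIGNING_KEY";

    /** Looks up the key in the supplied environment; empty if it is absent or blank. */
    static Optional<char[]> signingKey(Map<String, String> environment) {
        String value = environment.get(KEY_VARIABLE);
        if (value == null || value.trim().isEmpty()) {
            return Optional.empty();
        }
        return Optional.of(value.toCharArray());
    }

    public static void main(String[] args) {
        Optional<char[]> key = signingKey(System.getenv());
        // Report only whether the key was found; never print the key itself.
        System.out.println(key.isPresent() ? "signing key loaded" : "signing key not configured");
    }
}
```

Passing the environment in as a map keeps the lookup easy to test. A decompiled copy of this class reveals where the program looks for its key, but not the key.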
Code security is a process, not an end result. Although it is difficult to stop program code from being compromised, stolen, or decompiled, programmers have a responsibility to reasonably secure their code, particularly when the code deals with sensitive information. Reasonable security begins with a reasonable security policy and continues by focusing on some of Java’s weak links, namely memory management and decompilation.
About the Author
Thomas Gutschmidt is a freelance writer in Bellevue, Wash., who also works for Widevine Technologies.