In general use in English, obfuscate means to obscure, hide, or confuse. It means the same thing in programming, where a developer will obfuscate code to obscure, hide, or confuse the meaning of code.
You might wonder why you would want to obscure, hide, or confuse your code's meaning. After all, well-written code should be well documented, use meaningful names, and follow other standard practices so that other developers can understand it. That is great for the original source you keep, but once your code is compiled and in use by a customer or available on the internet, you might not want everyone who can get the binaries to be able to read your source. Still, a binary isn't source, so that's not a problem, right?
But with today's two most prevalent platforms (Java and .NET), your compiled application can be decompiled fairly easily. Both compile to an intermediate form (Java bytecode or .NET Intermediate Language) rather than native machine code, and that form keeps class, method, and field names along with much of the program's structure. With a good decompiler, anyone with your binary can recreate a reasonably close approximation of your original source. With that in hand, a competitor could appropriate your unique ideas and code into something similar for their product, or a hacker could examine your code for weaknesses to exploit on systems running your application. On the other hand, some languages, such as C, are so difficult to decompile or reverse engineer in the first place that obfuscators would provide little additional benefit.
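To see that names really do survive compilation in Java, here is a small sketch. The PricingEngine class and its applySecretDiscount method are hypothetical; the program uses standard reflection (Class.getDeclaredMethods) to read the method names back from the compiled class at run time. That is the same naming information a decompiler reads out of the .class file.

```java
import java.lang.reflect.Method;
import java.util.Arrays;
import java.util.Set;
import java.util.stream.Collectors;

// Demonstrates that compiled Java classes keep their original method names.
// PricingEngine is a hypothetical example class.
class NamesSurviveCompilation {
    static class PricingEngine {
        double applySecretDiscount(double price) {
            return price * 0.85;
        }
    }

    // Returns the names of the methods declared directly on a compiled class.
    static Set<String> declaredMethodNames(Class<?> type) {
        return Arrays.stream(type.getDeclaredMethods())
                .map(Method::getName)
                .collect(Collectors.toSet());
    }

    public static void main(String[] args) {
        // Prints a set containing "applySecretDiscount".
        System.out.println(declaredMethodNames(PricingEngine.class));
    }
}
```

Nothing in the binary hides the descriptive name, which is exactly what an obfuscator sets out to change.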
So, unless you are distributing your projects as open source and want everyone who has the application to be able to examine the source, you might want a way to make the decompiled source hard to read. That is what obfuscation does.
Obfuscation happens at the very end of the development process. After you've developed and tested a project, you run the obfuscator. Several third-party obfuscators are available for both Java and Microsoft .NET languages. The obfuscator uses several techniques to take your beautiful code and render it into something resembling gibberish to the eye, without changing the run-time behavior of the application. (You'll of course want to retest the obfuscated application before deploying it.)
One method an obfuscator employs is to turn descriptive names into short, meaningless ones. So instead of a name like revenue, an obfuscator might substitute something shorter and cryptic like a. Once all of the names in an application have been stripped of their meaning, anyone reading the code will have a much harder time working out what the application does from its names.
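A before-and-after sketch makes the effect concrete. It is written as Java source for readability, although real obfuscators typically rewrite the compiled bytecode rather than your source files. The names Readable and netRevenue are hypothetical, and the single-letter names stand in for what an obfuscator might generate.

```java
// Renaming changes what the code says, not what it does.
class RenamingDemo {
    public static void main(String[] args) {
        // Both versions compute the same value; only the names differ.
        System.out.println(Readable.netRevenue(1000.0, 250.0)); // 750.0
        System.out.println(a.a(1000.0, 250.0));                 // 750.0
    }
}

// Before obfuscation: descriptive names explain the intent.
class Readable {
    static double netRevenue(double grossRevenue, double refunds) {
        double revenue = grossRevenue - refunds;
        return revenue;
    }
}

// After renaming: identical logic, but the class, method, parameters,
// and local variable no longer tell a reader anything.
class a {
    static double a(double a, double b) {
        double c = a - b;
        return c;
    }
}
```

Java allows a method to share its class's name as long as it declares a return type, so the obfuscated version compiles and behaves exactly like the original.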
Some obfuscators also use a technique called overload induction to give multiple items the same name. The obfuscator examines all of the application code, finds the context in which each name is used, and identifies named items that can be distinguished by something other than their name, for example, a parameter list. So if methods named address and payroll take different parameters, the obfuscator can rename both to the same name, knowing that when they are compiled they won't clash because their parameter lists differ. The more items that share the same name, the harder it is for a human to tell one from another.
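Here is a minimal sketch of the idea behind overload induction, again shown as Java source rather than bytecode. The methods formatAddress and computePayroll are hypothetical stand-ins for the address and payroll items above; after renaming, both are called a, and the compiler still picks the right one from the argument types.

```java
// Overload induction: unrelated methods share one name because their
// parameter lists differ.
class OverloadDemo {
    public static void main(String[] args) {
        System.out.println(Before.formatAddress("12 Main St", "Springfield"));
        System.out.println(Before.computePayroll(40, 25.0));
        // Same results after renaming; overload resolution uses the
        // argument types (String, String) versus (int, double).
        System.out.println(After.a("12 Main St", "Springfield"));
        System.out.println(After.a(40, 25.0));
    }
}

// Before: each method has its own descriptive name.
class Before {
    static String formatAddress(String street, String city) {
        return street + ", " + city;
    }

    static double computePayroll(int hours, double hourlyRate) {
        return hours * hourlyRate;
    }
}

// After: both methods are named "a". The class name is left readable here
// only so the example is easy to follow.
class After {
    static String a(String a, String b) {
        return a + ", " + b;
    }

    static double a(int a, double b) {
        return a * b;
    }
}
```

A reader who sees a call to a now has to work out which of several same-named methods it resolves to before understanding anything about what it does.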
Obfuscators also strip out the extra line breaks, spacing, and other structure you've added for readability. This collapses neatly nested code into compact runs of text whose flow is harder to follow. Obfuscators also look for code that an IDE or designer inserted solely for its own use and strip it out.
Obfuscation can have side benefits beyond protecting your intellectual property. Compiled Java and .NET applications store names as strings inside their binaries, so changing long descriptive names to short meaningless ones reduces the overall size of the application. If the application no longer has to store a multiple-character name like revenue and instead tracks that item by the name a, it needs less space to store that name. Likewise, stripping extra line breaks and character spacing reduces the application size.
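As a rough illustration of the size saving, this sketch assigns short names in the order a, b, ..., z, aa, ab, ... (one common scheme, assumed here rather than taken from any particular obfuscator) and compares the UTF-8 byte counts of the old and new names. The identifier list is hypothetical, and actual savings depend on the binary format; for example, a Java class file keeps names as strings in its constant pool alongside other data.

```java
import java.nio.charset.StandardCharsets;
import java.util.List;

// Estimates the bytes saved by replacing descriptive names with short ones.
class NameSizeDemo {
    // Returns the n-th short name: 0 -> "a", 25 -> "z", 26 -> "aa", 27 -> "ab".
    static String shortName(int n) {
        StringBuilder sb = new StringBuilder();
        n++; // count from 1 so that "aa" follows "z" (bijective base 26)
        while (n > 0) {
            n--;
            sb.append((char) ('a' + n % 26));
            n /= 26;
        }
        return sb.reverse().toString();
    }

    // Number of bytes a name takes when encoded as UTF-8.
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        // Hypothetical identifiers from an application.
        List<String> names = List.of("revenue", "calculateMonthlyRevenue", "customerAddress");
        int before = 0;
        int after = 0;
        for (int i = 0; i < names.size(); i++) {
            String renamed = shortName(i);
            before += utf8Length(names.get(i));
            after += utf8Length(renamed);
            System.out.println(names.get(i) + " -> " + renamed);
        }
        System.out.println("bytes before: " + before + ", after: " + after);
    }
}
```

For these three names the program reports 45 bytes before renaming and 3 after. Across thousands of classes, methods, and fields, savings like that add up.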
Obfuscation is not desirable in every setting. For example, if you are writing components that other applications need to reuse, you shouldn't obfuscate those components, or they'll no longer have the names those applications use to call them.
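The reuse problem comes down to lookup by name. This sketch uses reflection to make that lookup explicit. The TaxCalculator component and its calculateTax method are hypothetical. Ordinary compiled Java callers also refer to methods by name, so a caller built against the original component would fail in a similar way (with a NoSuchMethodError when it tries to link) if handed the renamed version.

```java
import java.lang.reflect.Method;

// Shows why a renamed component can no longer be found by its callers.
class ReuseDemo {
    // The component as its author wrote it.
    static class TaxCalculator {
        public static double calculateTax(double amount) {
            return amount * 0.2;
        }
    }

    // The same component after an obfuscator renamed its method to "a".
    static class ObfuscatedTaxCalculator {
        public static double a(double b) {
            return b * 0.2;
        }
    }

    // Returns true if the class has a public method calculateTax(double).
    static boolean hasCalculateTax(Class<?> component) {
        try {
            Method m = component.getMethod("calculateTax", double.class);
            return true;
        } catch (NoSuchMethodException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(hasCalculateTax(TaxCalculator.class));           // true
        System.out.println(hasCalculateTax(ObfuscatedTaxCalculator.class)); // false
    }
}
```

The logic inside the obfuscated method is unchanged, but any caller that expects calculateTax can no longer find it.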
Jim Minatel is a freelance writer for Developer.com in addition to working with Wiley and WROX publishing.