There’s been a fair amount of buzz lately about code generation: writing code
to write code. To some extent, developers use code generators all the time. When
you use a compiler, for example, you’re generating code in a lower-level
language. But that’s not the sort of code generation I’m discussing in this
article. Today’s code generators use a variety of techniques and their own
high-level languages to generate high-level code.
Like most software tools and techniques, code generation can be overdone.
Though it’s extremely useful in the right circumstances, you need to be sensible
about identifying those circumstances. In this article, I’ll try to offer some
guidance.
A Generator in Action
To get a better feel for this modern world of code generation, let’s take a
look at one particular generator: CodeSmith. CodeSmith has its own
scripting language, with a syntax very similar to that of ASP.NET pages. You use this scripting
language to define templates. For example, here’s a small piece of a CodeSmith
template for a HashTable class in C#:
<% if (TargetNamespace != null && TargetNamespace.Length > 0) { %>
namespace <%= TargetNamespace %>
{
<% } %>
    #region Class <%= ClassName %>
    /// <summary>
    /// Implements a strongly typed collection of <see cref="<%= PairType %>"/>
    /// key-and-value pairs that are organized based on the hash code of the key.
    /// </summary>
    /// <remarks>
    /// <b><%= ClassName %></b> provides a <see cref="Hashtable"/> that is strongly typed
    /// for <see cref="<%= KeyType %>"/> keys and <see cref="<%= ItemType %>"/> values.
    /// </remarks>
    [Serializable]
    <%= GetAccessModifier(Accessibility) %> class <%= ClassName %>:
This is not the place to go into the details of the CodeSmith template
language (there’s a fine tutorial on the CodeSmith Web site), but you should be
able to get the general idea. The template provides a picture of the code to be
generated, and itself contains some scripting logic and some replaceable
parameters. Figure 1 shows this template in use in the CodeSmith interface. To
use CodeSmith, you fill in values on the property sheet and click the Generate
button. The result is code that can be pasted directly into your application.
(CodeSmith also includes a Visual Studio .NET add-in, which can automatically
generate code whenever you rebuild a VS .NET project.)
This little example demonstrates the essential features of most modern code
generators: they are tools that take some sort of abstract input and use that
input to generate source code that can then be fed into a compiler or other
build process. Code generators can be freeware or commercial products, or
developed in-house for specific projects.
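To make the idea of "abstract input in, source code out" concrete, here is a minimal sketch in Python. It is not CodeSmith and does not use CodeSmith’s template language: the standard library’s `string.Template` stands in for the `<%= ... %>` substitutions, a plain dictionary plays the role of the property sheet, and the names `CLASS_TEMPLATE` and `generate_class` are hypothetical.

```python
from string import Template

# Hypothetical, much-simplified stand-in for the CodeSmith fragment above.
# $-placeholders play the role of CodeSmith's <%= ... %> expressions.
CLASS_TEMPLATE = Template(
    "/// <summary>\n"
    "/// Strongly typed collection of $ItemType values keyed by $KeyType.\n"
    "/// </summary>\n"
    "[Serializable]\n"
    "$Access class $ClassName\n"
    "{\n"
    "}\n"
)


def generate_class(properties, namespace=""):
    """Turn abstract input (a property sheet as a dict) into C# source text."""
    body = CLASS_TEMPLATE.substitute(properties)
    if not namespace:
        return body
    # Like the template's conditional block: wrap in a namespace only if one is given.
    indented = "".join("    " + line for line in body.splitlines(keepends=True))
    return "namespace " + namespace + "\n{\n" + indented + "}\n"


source = generate_class(
    {"ClassName": "CustomerHashtable", "KeyType": "string",
     "ItemType": "Customer", "Access": "public"},
    namespace="Sales",
)
print(source)
```

The conditional namespace wrapper mirrors the `if (TargetNamespace ...)` block in the CodeSmith fragment; a real tool would also handle file output, escaping, and far larger templates.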
Signs That You Need a Code Generator
So far, code generation is just a neat parlor trick. The idea of writing code
to write code is interesting to most developers, but that’s not enough to make
it useful. When should you choose to buy or build a code generator for a real
project?
One good sign that it’s time to consider a code generator is that you’re
wearing out the Ctrl-C and Ctrl-V key combinations on your keyboard. If you find
yourself building large chunks of your project by cut and paste, you need to
stop and ask yourself why this is happening. It might be that you’re just being
sloppy, and that a bit of refactoring will eliminate the cut and paste; perhaps
all you need to do is define a utility function that you can call from elsewhere
in the project. In that case, you don’t need a code generator.
But it’s also possible that you’re going through a cut, paste, and edit
cycle. Consider the case of a HashTable class, for example. If your application
contains many business classes, you might already have a HashTable that
holds Customer objects and then discover that you need one that holds Order objects. In that case,
your first temptation will be to copy the first class and then edit it so it
holds Orders instead of Customers. That’s not a situation that can be easily
solved with refactoring — but it is a great place to use a code generator. With
a single template and a code generator that’s capable of making successive
substitutions, you could quickly build both HashTable classes, without needing
to do any manual editing.
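A sketch of the "single template, successive substitutions" approach, again in Python with `string.Template`. The C# it emits is a hypothetical, minimal strongly typed wrapper around `System.Collections.Hashtable` (string keys are an assumption made to keep it short), and `generate_typed_hashtables` is an illustrative name, not part of any tool.

```python
from string import Template

# Hypothetical C# template: a strongly typed wrapper around
# System.Collections.Hashtable. ${Item} is the only replaceable parameter,
# and string keys are assumed to keep the example short.
HASHTABLE_TEMPLATE = Template("""\
using System.Collections;

public class ${Item}Hashtable
{
    private readonly Hashtable inner = new Hashtable();

    public void Add(string key, ${Item} value)
    {
        inner.Add(key, value);
    }

    public ${Item} this[string key]
    {
        get { return (${Item})inner[key]; }
        set { inner[key] = value; }
    }
}
""")


def generate_typed_hashtables(item_types):
    """Apply the same template once per business class: no hand editing."""
    return {f"{item}Hashtable.cs": HASHTABLE_TEMPLATE.substitute(Item=item)
            for item in item_types}


files = generate_typed_hashtables(["Customer", "Order"])
for name in sorted(files):
    print(name)
```

Adding a third business class becomes a one-word change to the input list rather than another round of copy, paste, and edit.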
As the cut-paste-edit cycle gets longer, the attraction of a code generator
correspondingly increases. Consider the case of writing data access logic
classes to interface with a database. Your database might have 50 or 500 tables,
each of which will require stored procedures and code to handle data access. You
could put a junior developer to work writing that boring boilerplate code over
and over again, but a better solution is to use a tool that can iterate through
all of the tables in the database and automatically spit out the required stored
procedures and classes.
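The iterate-over-the-schema idea can be sketched as follows. To keep it runnable without a database server, an in-memory SQLite database stands in for the real one and is read through Python’s `sqlite3` module (`sqlite_master` and `PRAGMA table_info` are real SQLite features). The sample tables, the `_GetByKey` naming convention, the SQL Server-style procedure text, and the SQL-to-C# type map are all assumptions for illustration; a production generator would also emit insert, update, and delete procedures.

```python
import sqlite3

# Hypothetical SQL-to-C# type map; extend it for a real schema.
CSHARP_TYPES = {"int": "int", "nvarchar": "string", "money": "decimal"}


def read_schema(conn):
    """Return {table: [(column, declared_type, is_primary_key), ...]}."""
    schema = {}
    tables = conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
        rows = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
        schema[table] = [(name, ctype, pk > 0) for _, name, ctype, _, _, pk in rows]
    return schema


def select_proc(table, columns):
    """Emit a SQL Server-style stored procedure that fetches one row by key."""
    keys = [col for col in columns if col[2]]
    params = ", ".join(f"@{name} {ctype}" for name, ctype, _ in keys)
    where = " AND ".join(f"{name} = @{name}" for name, _, _ in keys)
    col_list = ", ".join(name for name, _, _ in columns)
    return (f"CREATE PROCEDURE {table}_GetByKey {params}\nAS\n"
            f"    SELECT {col_list} FROM {table} WHERE {where}\nGO\n")


def entity_class(table, columns):
    """Emit a minimal C# class with one public field per column."""
    lines = [f"public class {table}Row", "{"]
    for name, ctype, _ in columns:
        cs_type = CSHARP_TYPES.get(ctype.split("(")[0].lower(), "object")
        lines.append(f"    public {cs_type} {name};")
    lines.append("}")
    return "\n".join(lines) + "\n"


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customers (CustomerID int PRIMARY KEY, Name nvarchar(50));
    CREATE TABLE Orders (OrderID int PRIMARY KEY, CustomerID int, Total money);
""")
schema = read_schema(conn)
for table, columns in schema.items():
    if any(is_pk for _, _, is_pk in columns):  # skip tables that have no key
        print(select_proc(table, columns))
    print(entity_class(table, columns))
```

Because every procedure and class comes from the same loop, a 500-table database costs no more hand-written code than a two-table one.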
In some cases, schedule pressure will force you to consider code generation,
even if it’s not uppermost in your mind. What if you need to generate an HTML
Help file documenting a large class library, but you don’t have much time in the
schedule between the final testing of the class library and its release? The
answer is to use a tool that can automatically generate the documentation from
the final source code. Note that in this case, “code generation” means building
the source files that are compiled into HTML Help, rather than something like C#
or Java files. It’s the same basic process whatever the target.
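For the documentation case, the generator’s input is source code and its output is help-source files. The sketch below pulls the `<summary>` text out of C#-style `///` comments with regular expressions and writes one HTML topic page. It is a deliberate simplification: the C# compiler can emit these comments as an XML file, which is a sturdier input than regex parsing, and the project and table-of-contents files that the HTML Help compiler also needs are not generated here. `extract_docs` and `topic_page` are hypothetical names.

```python
import html
import re

# A run of /// comment lines, then the first non-comment line (the declaration).
DOC_BLOCK = re.compile(r"((?:[ \t]*///[^\n]*\n)+)(?![ \t]*///)[ \t]*([^\n]+)")
SUMMARY = re.compile(r"<summary>(.*?)</summary>", re.S)


def extract_docs(source):
    """Return [(declaration, summary_text), ...] from C#-style /// comments."""
    docs = []
    for comment, decl in DOC_BLOCK.findall(source):
        # Strip the leading /// from each line and join into one XML fragment.
        xml = " ".join(line.strip()[3:].strip() for line in comment.splitlines())
        match = SUMMARY.search(xml)
        if match:
            docs.append((decl.strip(), match.group(1).strip()))
    return docs


def topic_page(title, docs):
    """Emit one HTML topic page of the kind a help compiler could consume."""
    items = "\n".join(
        f"<dt><code>{html.escape(decl)}</code></dt><dd>{html.escape(text)}</dd>"
        for decl, text in docs
    )
    return (f"<html><head><title>{html.escape(title)}</title></head>\n"
            f"<body><h1>{html.escape(title)}</h1>\n<dl>\n{items}\n</dl>\n"
            "</body></html>\n")


sample = """
    /// <summary>
    /// Adds a customer to the collection.
    /// </summary>
    public void Add(string key, Customer value)
    {
    }
"""
docs = extract_docs(sample)
print(topic_page("CustomerHashtable", docs))
```

Run against the final source, a script like this can be rerun in seconds whenever the class library changes late in the schedule.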
Finally, don’t overlook the impact that code generation can have on code
quality. Let’s think again about those data access layer bits. Each table is likely
to require dozens of lines of SQL code and hundreds of lines of high-level
code in your application. If you write all of those lines of code by hand,
what’s the chance of a typo introducing a subtle bug somewhere along the line?
All too high, at least in my experience. By generating the code automatically,
you only need to make sure that the template is correct. After that, the
individual stored procedures and classes are sure to be correct — assuming that
there’s not an error in the code generation code itself! Code generation does
not free you from the necessity to test your code, but it can lower the chance
of silly errors.
Warning Signs
On the other hand, there are times when code generation just doesn’t
make sense. Start with the most obvious barrier: if your application doesn’t
contain a lot of repetitive code, you’re unlikely to save effort with a code
generator. Remember, the code generator itself is code that needs to be
maintained and tested. If you’re only including a single HashTable class, you
might as well just write the class by hand. Using a generator for it only
increases the surface area for potential errors.
In fact, you should be very wary of using a code generator for any code that
you couldn’t write by hand if you needed to. The code generator should be a
time-saving device, not a black box that turns out magic code. You need to
consider that you might have to maintain the code by hand in the future, for a
variety of reasons. Perhaps you’ve upgraded your tools and the code generator
won’t work in the new version, or perhaps you need to customize the code after
all. Either way, you’d best understand the code that’s in your project.
Cultural factors can also get in the way of code generation. Just because
you’ve identified a need for such a tool doesn’t mean that your boss understands
the same need. Building a code generator takes time, and buying one takes money.
Before you can spend either one, you need to make sure that you have buy-in from
your management. Otherwise, the time you spend writing code to write code may
well seem wasted to someone who’s only monitoring your output in the “real”
project.
Other members of your team, too, can get in the way of code generation. Some
developers have the attitude that “real programmers don’t use code generators.”
These developers are unlikely to adopt a code generator, even if it’s plain to you that
it’s the best thing for the project. Worse, they may actively sabotage your
efforts to use code generation. Most code generation tools are designed to
replace the classes they generate when they’re run again. If someone else is
making changes to those classes by hand, you can end up in an endless cycle of
check-ins and check-outs, as people try to recover their code that was
overwritten by your tool. In such a case, if you can’t educate your coworkers,
you might as well give up on code generation.
Finally, beware of the “all I have is a hammer” syndrome, where you treat
everything as a nail whether it is one or not. Code generators tend to be very
targeted tools that build a particular kind of code. Don’t get overly fond of a
particular tool before you determine whether it’s right for your own project. If
a particular object-relational mapper creates business objects that don’t have
the interfaces you’re expecting, you don’t want to end up rewriting your entire
project to accommodate the tool. Rewrite the tool instead, find a different tool,
or rethink your strategy.
If It Works, Do It!
The bottom line is simple: code generation can save you a considerable amount
of time and money on a large project. When you’re faced with a huge project and
time pressure (and when was there ever a huge project without time pressure?),
take a few days to get a feel for the code that you need to build. You may well
identify areas where a targeted code generation tool can help take the pressure
off, and that’s a big win when it happens.
About the Author
Mike Gunderloy is the author of over 20 books and numerous articles on
development topics, and the lead developer for Larkware. Check out his MCAD 70-305, MCAD
70-306, and MCAD 70-310 Training Guides from Que Publishing. When he’s not
writing code, Mike putters in the garden on his farm in eastern Washington
state.