Although fashions come and go in software development, some things
stay remarkably constant. One of these is the use of databases. You
may be wonderfully up-to-date with an AJAX Web interface or the latest
whizbang Windows user interface, but under the covers, you’re probably
still pumping data in and out of a database, just as we all did a
decade or more ago. That makes it all the more surprising that
developers are still making the same database mistakes that date back
to those good old days of Windows 95 and before. Perhaps it’s just
that most of us learn to use databases on the side, rather than really
studying them. In any case, here are my nominations for the biggest
mistakes that I see over and over again.
Choosing the Wrong Database
Not all databases are created equal — which means before you do
anything with a database, you have to pick the appropriate database. Time
and again I’ve seen Access databases groaning to bear the load of huge
data sets that would have been child’s play for SQL Server, or harried
users trying to pay for and set up SQL Server to hold a few hundred rows
of data. Broadly speaking, there are three tiers of databases in the
market these days: desktop and embedded databases suitable for smaller
tasks, “Express” versions of the major players that are good up to a few
gigabytes of data, and the truly enterprise databases like SQL Server,
Oracle, and DB2 that can handle just about anything you can throw at them.
Before you do anything else, you need to make some realistic estimates
about the amount of data that you’ll be storing and pick the appropriate
product to do the storage.
Choosing Too Many Databases
APIs such as ODBC, JDBC, and OLE DB have promoted the notion of
database independence – the idea that you can write your application
code in such a manner that you can plug any database at all in for
data storage. Well, yes, but there are compromises. I’ve seen a lot of
teams go down the rat hole of database independence, writing layers to
translate all of their SQL statements to some lowest common
denominator dialect that every conceivable database will support, and
at the same time giving up on advanced features available in any
particular database. The notion seems to be that some client in the
future might want to switch to Oracle or DB2 or FoxPro or whatever, so
it’s best to be prepared now. On the contrary: when you’re starting
out with a new product, pick your storage engine and write to it. If
your product is good, people will install the database you specify,
and you won’t be wasting untold man-hours supporting “just in case”
scenarios that you’ll probably never need.
Know Your Data
If I had a dollar for every time it turned out that some customer
numbers had seven digits instead of six, or that the registrar’s office
really did allow students to register without a social security number due
to privacy concerns and so the column has to be made nullable – well, I’d
have a lot of dollars. Database design can’t be done in a vacuum, away
from the business rules. It’s critical that you get the input of the
actual users of the data, and hammer on them to find out for sure how big
each column needs to be, what rules apply to it, what types of data it
will hold, who can update it, and so on. Otherwise, you’re setting
yourself up for costly rework down the line. You’ll learn to dread
sentences starting with “Well, it looks fine, except…”
It’s Just Like Excel, Right?
There’s a tendency, especially among managers of smaller shops, to
assume that any developer knows how to set up a database. Frankly,
this perplexes me. You wouldn’t assume that any given developer knows
how to code in C# or set up a Web Service, so why is it that we’re all
supposed to be database pros? The end result is that too many
databases are designed by people who have never even heard the term
normalization, never mind developed any understanding of the various
normal forms. I can’t tell you how many times I’ve found everything
shoved into one big table with hideous update anomalies and
performance issues as a result. If you’re in this situation yourself,
and you’re over your head, demand training – or start job-hunting.
Efficient database design is something you need to learn, not discover
by trial and error.
Third Normal Form is Not the Holy Grail
On the other hand, a little knowledge really can be a dangerous thing.
I’ve seen databases that were normalized to death by well-meaning
developers who insisted on putting everything in lookup tables. I
recall one memorable instance in which “yes” and “no” were relegated to
tblAnswers, where they could be referred to by an
AnswerID foreign key from other tables. Yes, you need to know
the normalization rules, but you also need to develop the skill to know
when to stop normalizing, and when denormalization for performance
actually makes sense.
What a Great Place to Hide Application Logic!
Stored procedures and triggers are a wonderful thing. When you’ve
got multiple clients accessing a database, they can be a great way to
make sure consistent data processing takes place. But they can also
turn into an ugly black box in which application logic hides, unknown
to Web and thick client developers, generally unseen and unreviewed.
Too often database code isn’t subject to the same standards of design,
test, and code review that we demand for the rest of our applications.
When you’re tempted to put code in the database, take a moment to ask
yourself whether it really belongs there.
Who Needs Backups?
Who needs backups? You do. Presumably you’re storing data in a
database because it’s important enough to hang on to. Somehow, though,
I end up walking into situations where “nobody got around to it” on a
regular basis, and valuable data is lost forever because hardware,
hackers, or just plain mistakes munged the database and there wasn’t
any backup. Your backup plan (including things like frequency, type of
backup, and how often you’re going to take backups off-site) needs to
be in place at the start of the development cycle, not at the end.
Yes, You Need Version Control
Speaking of backups, you need to worry about schema changes to your
database as well as data changes, and you need to keep track of those
schema changes in such a way that you can recreate the database at any
point in time. That’s right, if you want to do a really professional
job of building software you need to extend version control to your
database design. It doesn’t do much good to be able to recover version
0.784.5 of the software to test out a customer bug if you can’t also
produce the corresponding database. If your database developers are
cheerfully writing stored procedures and tweaking table designs
without leaving any trace of their work, you’ve got a problem.
Use the Tools
Modern databases offer a lot more than just a series of buckets
that you can toss your data into. They also come with a substantial
variety of tools to make it easier to manage that data. For example,
SQL Server makes it easy to inspect the plan of attack that the server
is using for your queries, and even includes wizards to tell you what
indexes would make your queries more efficient for the actual load
you’ve been throwing at your server. I’ve had great success running
these tools on client databases and speeding things up, or lowering
CPU usage by a factor of two – but the fact is, they shouldn’t have
had to call in a consultant to tell them to use the stuff in the box.
If you don’t know what tools and utilities come with your database,
and what they can do for you, then you’re paying for value that you’re
Don’t Assume Everything is a Nail Just Because You Have a Really
Databases have a tendency to take over all data storage for an
application. I’ve seen applications that tried to build an entire
metadata-driven user interface, and then stored that metadata along
with user preferences in the same database that was holding the
business data. This is a good way to complicate your life and kill
performance; some data really does belong in local files, not in a
client-server database across the network. When you’re storing data,
you need to evaluate the different places you can put it (database,
registry, plain text files, XML files…) and pick the appropriate
spot for each piece of data. Don’t just automatically shove it into a
database just because you have a connection string handy. These days,
there’s probably more of a tendency to overuse XML files than
relational databases, but the principle still holds.
About the Author
Mike Gunderloy is the author of over 20 books and numerous articles on
development topics, and the Senior Technology Partner for Adaptive Strategy, a
Washington State consulting firm. When
he’s not writing code, Mike putters in the garden on his farm in eastern
# # #