Lately, a number of open source relational databases have received favorable press. Indeed, MySQL and PostgreSQL have been touted as capable alternatives to Oracle and Sybase, the traditional heavyweights of the commercial RDBMS world.
But let’s say you, the programmer, need something simpler. You may have outgrown plain-text “flatfiles,” but you might not be quite ready for the planning and maintenance that go along with a full-fledged RDBMS.
Further, as you likely run Linux or some version of BSD, Microsoft Access isn’t a viable option, either. Isn’t there an intermediate data storage solution, somewhere between clunky text files and full-blown relational databases?
Fortunately, the answer is yes: Gdbm!
Gdbm, or the GNU Database Manager, is a fully-capable, yet simple means of storing and retrieving data. There aren’t 1,001 features and procedures to memorize, making for a gentle learning curve. Tuning is not an issue, and you don’t need sophisticated algorithms to get the most out of your data store. Yet, you can still use gdbm-powered databases for customer files, back ends to Websites, and anywhere else that an RDBMS would be appropriate.
On the downside, the speed isn’t as great as it might be with other systems. Gdbm finds a record by hashing its key, but it offers no secondary indexes and no query language, so searching on anything other than the key means walking through every record. You certainly don’t want to use it for a multi-terabyte data warehouse. But, for small to mid-sized data collections, the time differential is negligible.
Best of all, you don’t need to worry about mounting and unmounting anything created with gdbm. Nor is there any server to run. Your data is simply stored as a regular file on disk.
Now, how do we use it?
The first step is to make sure that you have gdbm. Most Linux distributions include it, but you can make a quick check by looking in your /lib directory. If you see something along the lines of libgdbm.so, you already have it.
Now that you have the gdbm library, we’ll jump right in and do some coding. You access the gdbm library by creating C programs that use functions declared in the gdbm.h header file. As with other libraries, you link with a simple command line flag: in this case, -lgdbm.
Gdbm Functions
Creating a new database and opening an existing one are both accomplished with the same function: gdbm_open(). Here’s how it works:
    GDBM_FILE dbf;
    dbf = gdbm_open(name, block_size, flags, mode, fatal_func);
The “flags” argument is what differentiates between creating a new database and opening an old one. But let’s not get out of sequence. Here, then, are the arguments and explanations:
char *name -- The name of your database. This will be the name of the binary data file that's written to disk.
int block_size -- The size, in bytes, of a single transfer between disk and memory. If you pass a value smaller than 512, gdbm uses the file system's block size instead. Once the database is created, the block size cannot be changed.
int flags -- One of the following:
    GDBM_READER -- Opens the database in read-only mode. Multiple programs can access the database while in GDBM_READER mode.
    GDBM_WRITER -- Allows both reading and writing. Only one program can access the database in GDBM_WRITER mode.
    GDBM_WRCREAT -- Creates a new database if none previously exists; otherwise the same as GDBM_WRITER.
    GDBM_NEWDB -- Creates a new database regardless of whether another one with the same name exists (i.e., it will overwrite old databases); otherwise, it's the same as GDBM_WRITER.
int mode -- This sets the file permissions of the database, just as you would with chmod. A sample value would be 0644.
void (*fatal_func) () -- A function for gdbm to call if it detects a fatal error. Its only parameter is a string. If NULL is provided, gdbm will use a default function.
When you’re finished with the database, close it:
    gdbm_close(dbf);
Adding Data
Data is stored in gdbm files by means of key/value pairs. The “key” is analogous to a data string’s name, and the “value” (or “content”) is, of course, your data. So, for example, you might have a key called “name,” and the content would be “John Smith.” Here’s how you add a record:
    ret = gdbm_store(dbf, key, content, flag);
Both “key” and “content” are of type datum, which gdbm.h defines as:
    typedef struct {
        char *dptr;
        int dsize;
    } datum;
Here, dptr points to the data and dsize is its length in bytes. The “flag” argument determines what happens if the key already exists:
GDBM_REPLACE -- Trash the old data, and replace it with the new.
GDBM_INSERT -- Only add the data if it won't overwrite anything else. If a key with the same name already exists, then return an error (without writing anything).
The return value, ret, will be one of the following:
-1 -- The data was NOT stored, because the database was opened in read-only mode, or the data was NULL.
1 -- The data was NOT stored, because the "flag" used was GDBM_INSERT and a key with the same name was already in the database.
0 -- The data was written successfully.
Retrieving Data
Now that you’ve populated your database, how do you get the information back out? Simple! Use gdbm_fetch(), like this:
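    content = gdbm_fetch(dbf, key);
The returned “content” is a datum. If the key wasn’t found, its dptr field will be NULL. Otherwise, dptr points to memory that gdbm allocated with malloc(), and it’s up to you to free() it when you’re done.
If you only want to know whether a key exists, without retrieving its data, use gdbm_exists(), which returns true (non-zero) if the key is in the database and false (zero) if it isn’t:
    ret = gdbm_exists(dbf, key);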
Trashing Records
If you want to delete a record without replacing it, just use gdbm_delete() as follows:
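    ret = gdbm_delete(dbf, key);
The return value is 0 if the record was deleted, or -1 if the key wasn’t found or the database was opened in read-only mode.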
Retrieving ALL Records
If you want each and every piece of data in your database, you can use the next two functions:
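    key = gdbm_firstkey(dbf);
    nextkey = gdbm_nextkey(dbf, key);
gdbm_firstkey() returns the first key in the database, and gdbm_nextkey() returns the key that follows the one you pass in. When there are no more keys, the dptr field of the returned datum is NULL. The keys come back in the order of gdbm’s internal hash table, not in sorted order, and, as with gdbm_fetch(), you must free() each returned dptr yourself.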
Deleting an Entire Database
Remember that a gdbm database is just a single file, so all the programmer has to do to delete a database is delete that one file. Trivial!
The gdbm man page lists a few other functions, but I’m not going to cover them here, as they’re not imperative to the operation of a gdbm database, and this is just a simple overview.
Can the functionality of gdbm be accessed via shell scripts? The answer is yes, but it takes a little bit of ingenious manipulation. What you’ll need to do is create some programs that accept command line arguments, and then pass those values on to the gdbm functions. For example, let’s make a simple program that allows you to populate a database from the command line. Here’s how it might look:
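    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <gdbm.h>

    #define BLOCK_SIZE 512
    #define MODE 0644

    int main(int argc, char *argv[])
    {
        int ret;
        datum key;
        datum value;
        GDBM_FILE dbf;

        if (argc < 4) {
            fprintf(stderr, "Usage: insert <database name> <key> <value>\n");
            exit(EXIT_FAILURE);
        }

        /* Wrap the command line strings in datum structures. */
        key.dptr = argv[2];
        key.dsize = strlen(argv[2]);
        value.dptr = argv[3];
        value.dsize = strlen(argv[3]);

        /* Open the database, creating it if it doesn't exist yet. */
        dbf = gdbm_open(argv[1], BLOCK_SIZE, GDBM_WRCREAT, MODE, NULL);
        if (dbf == NULL) {
            fprintf(stderr, "Couldn't open %s.\n", argv[1]);
            exit(EXIT_FAILURE);
        }

        /* GDBM_INSERT refuses to overwrite an existing key. */
        ret = gdbm_store(dbf, key, value, GDBM_INSERT);
        gdbm_close(dbf);

        if (ret == 1) {
            printf("That key already exists.\n");
        }

        return EXIT_SUCCESS;
    }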
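Save the program as insert.c and compile it like so:
    gcc -O3 -Wall -o insert insert.c -lgdbm
The usage is:
    insert <database name> <key> <value>
For example, to add a key called “name1” with the value “Mary” to a database called “employees”:
    insert employees name1 Mary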
I’m sure you could now figure out how to make another program to retrieve specific key values, using gdbm_fetch()!
In conclusion, I find gdbm to be an excellent tool for creating small data files. Whether used within C programs or called from the shell through a program such as insert, gdbm functions are much easier to work with than the tedious alternative of opening a text file and searching through line after line. This is especially true when dealing with bash scripts!
Related Resources
1. GNU’s home page: This is the home page for the GNU project. You can download gdbm, including the source code, from here.
2. The Linux Documentation Project: The LDP is a vast storehouse of knowledge. More information on gdbm can be found in the ELF-HOWTO guide.
3. PostgreSQL home page: When you’re ready to move up to a full RDBMS, check out PostgreSQL. Pay no attention to the silly name!
About the Author
Jay Link is twentysomething and lives in Springfield, Illinois. Aside from Linux, his interests include mountain climbing and flying. He administrates InterLink BBS (an unintentionally not-for-profit Internet provider) in his fleeting spare moments, as well as working various odd jobs to pay the rent.