An MUA (Mail User Agent) featuring an innovative approach to email message storage is now under development, and it bodes well for those email users who number their messages in the thousands. Gmail (http://gmail.linuxpower.org/), the brainchild of Wayne Schuller, a young Australian programmer, uses the MySQL database server, and retrieves, sorts and displays messages via SQL, the well known “Structured Query Language.” It became possible to develop Gmail as an Open Source project when, earlier this year, MySQL changed its licensing to the GNU GPL. Schuller is a advocate: “I’m a big supporter of the GNU project and the importance of GPL licensing; that pushed me in the [direction of] Gnome two years ago.” (He also reminded this reviewer that Gmail is not the only MUA using a SQL database; Pronto (http://www.muhri.net/pronto/), written in Perl, is very much a going concern.)
It isn’t hard to accumulate thousands of email messages. If one makes some careful choices among the myriad mailing lists that abound across the Internet, messages by the hundreds will pour in daily, and add up to thousands as the days go by. Given that hard drive space has become quite inexpensive, anyone can now create their own archives of lists that are of interest to them. If you’ve ever struggled with those online “searchable” archives of email lists, accessed with a web browser, then you can appreciate how such personal archives can have a use, sometimes a life-saving use, if the solution to a quandry can be quickly found because those archives were created and maintained.
Gnomes, Trees, and Queries
Storing thousands of pieces of information in a fashion that permits easy retrieval is a task computers have always been eminently qualified to undertake. Over the past ten years or so the development of computer databases has converged on SQL as the preeminent tool for manipulating the contents of these vast warehouses or “mines” of data. SQL is common to practically all the major commercial database products. The historical timeline stretches from origins in the Ingres and Postgres databases and forward to the current Open Source incarnations of PostgreSQL and MySQL. Gmail takes advantage of this mature technology to “nail down” the software requirements for storing, sorting, and viewing thousands upon thousands of messages.
Like many Open Source projects, Gmail began as a way to learn programming in a particular environment, in this case, the Gnome graphical desktop. In keeping perhaps with the trend set by pine and elm, several MUAs written for the Gnome desktop have adopted names of trees, such as Balsa, Spruce, Mahagony and Eucalyptus. Initially Schuller wanted to join the Balsa project, already well underway:
“The project [Gmail] allowed me to learn how to code in gtk and the gnome technologies. I did want to contribute to Balsa, but after many hours effort I found I couldn’t really understand the code. Starting from scratch allowed me to learn all the gtk/gnome stuff.”
Although still very much in “beta” phase, Gmail is already on its way to becoming a full-featured email client. For receiving mail it supports POP downloads from multiple mail accounts. APOP and IMAP have not been added yet, but APOP accounts can be utilized via a combination of fetchmail and a POP server (for instance qpopper) running on the local host. There is drag and drop support for utilizing the Gnome addressbook to manage email addresses. Messages stored in the MySQL tables can be exported to a standard Unix mbox file format, and, as noted, Unix inbox spool files can be read into Gmail with a POP server. Sound events can be configured via the Gnome Control Center, and printing is provided by the Gnome-Print facility. For users of dialup connections, outgoing messages can be composed offline and queued for sending later. Just in the past month or so Gmail’s mime support has made great strides. For the true “hacker” (not to be confused with computer security vandals) Gmail’s user interface can be custom tuned with a little proficiency in the use of Gnome’s “glade” facility.
Virtual Folders: Mileage Out of SQL
Despite this impressive list of features, Gmail’s real strength, what sets it apart from the pack, is its use of SQL queries to create and maintain “virtual folders.” Many email clients have built-in facilities for sorting incoming mail into various “folders,” or separate files according to rules defined by the user. The old Unix workhorse for this purpose, procmail, is still widely used, but many newcomers shy away from procmail due to “regex phobia,” fear of the arcane syntax used by the “regular expressions” that make up procmail’s rules for sorting.
For users who need only a few folders, and don’t anticipate accumulating thousands of messages, a few separate “folder” files will not present any problems. But for the industrial strength email list addict (or anyone else who finds it necessary to shepherd thousands of messages) these separate files soon become both numerous and large, factors which make the work of backing up or archiving messages increasingly tedious. At a certain point the diminishing returns of the file folders method become apparent, and it makes more sense to maintain (and eventually archive) *one* file that contains all the messages received in a certain period of time. Instead of individual files (“folders”) containing the messages of each email list (or subject, or sender, or whatever criteria are used to define the folders…more on which later) Gmail creates virtual folders which are simply SQL “views” of the message database. From Gmail’s User Guide:
“It’s main feature is the powerful vfolder mechanism. Imagine all your emails are sitting in the one big folder, but you can look at it in different ways (queries). These queries happen at run-time. So you can have a virtual folder (vfolder) which gives you all messages with ‘ham’ in the subject. And you can have another vfolder which gives you all messages with ‘pork’ in the subject. And if someone sends you a message with ‘pork and ham’ in the subject, it will appear under both queries!”
No new files are created, but from the user’s point of view, these virtual folders appear to be identical in appearance and use to those that are based on a set of individual folder files. When it comes time to archive the MySQL tables (two are used) that hold Gmail’s messages, one has the option of exporting the tables to an mbox format file. Then all sorts of possibilities are open, such as indexing that file with a tool like “glimpse” (http://www.webglimpse.org), or compressing it with either gzip or bzip2 and then using grepmail (http://grepmail.sourceforge.net) to read out messages matching a given condition or set of keywords. Or, other MySQL tools can be utilized (or created!) to access and retrieve messages directly from the tables. The use of MySQL tables means the keyword for Gmail’s message storage is: power.
SQL Made Easy
One does not need to be a SQL guru to set up virtual folders in Gmail, since a central tenet guiding its development is to make SQL queries available without requiring any knowledge of SQL syntax on the part of the user. While a full-fledged query-builder is still only in the planning stage, the process has been simplified a good deal. There is no need to write complete SQL SELECT statements, and the process is documented so that most users, perhaps with some trial and error, can easily produce working virtual folders. For instance, here are the definitions of two of this writer’s Gmail folders:
For the Snort IDS (http://www.snort.org) email list:
(subject LIKE "%Snort%" AND direction = "Incoming") |
For the Gmail discussion email list:
((subject LIKE "%Gmail%" OR tofield LIKE "%gmail-discuss%") AND direction = "Incoming") |
Here, from the Gmail User’s Guide, is a description of the matching fields available for these folder definitions:
“Gmail uses two tables to store emails. Why two? Because one is fixed length for fast access, and the other contains data which is variable in length.
“The fixed length table is called ‘display’, the vfolder usable fields are: date, subject, fromfield, readstatus (Read, Unread, Sent or Queued), direction (Incoming or Outgoing).
“The other table is called ‘details’, the vfolder usable fields are: headers (all the headers from the email), message (the body of the email), tofield, ccfield, bccfield, attachments (not in use yet).”
For each message in the database, Gmail “caches” its “folder matches,” i.e. a list of all the queries that message matches, for speedy navigation among different vfolders. Whenever a vfolder query is added, deleted or modified, this matching index must be rebuilt, but Gmail automatically prompts the user whenever this step is necessary. This is the one Gmail operation that can be time-consuming, depending on the speed of the machine and the size of the message base. On the author’s test machine, an old K6-233, a database of 4450 messages with ten vfolders required two minutes fifty four seconds for this step.
Although Gmail’s performance does not slow down as the size of the message base increases (with the one exception just noted), eventually it may be desirable to archive the messages one has accumulated. The mbox export is perfect in this context; after accumulating, say, six months of the dozen or so Debian (substitute your favorite distro) lists one subscribes to, those six months worth of messages can be read out into one mbox file for long term storage. There’s no need to keep track of twelve archives, one for each list.
Not Your Father’s Emailer
There are rough spots of course. Currently Gmail’s editor is a GTK textentry widget, so real word-wrapping isn’t there yet. Being very much still in “beta” stage, Gmail is still somewhat buggy, and will crash or hang from time to time depending on precisely what series of operations you ask it to perform. However, in several weeks of constant use by this writer, with a database of over four thousand messages (quite small by Gmail standards!), no mail has been lost, either incoming or outgoing. The mail stays safely tucked away in the MySQL tables regardless of what happens to Gmail itself, and not all email clients can make that claim!
And, granted, this is not everyone’s email client. It requires a working installation of MySQL to be present before Gmail can be installed. It requires a complete and up-to-date set of Gnome libraries. But both of these conditions can be met by virtually any current Linux distribution, and support is available, in the form of an email list populated by a handful of dedicated Gmail fans who have been following and debugging Gmail for months now. RPM’s of Gmail binaries are available, but it builds from source code without a great deal of trouble, provided one’s Gnome installation is up to date and “sane.”
This young project exemplifies the founding spirit of the free software movement, which thankfully remains alive and well. Given current trends in the Linux “industry,” we should enjoy projects such as Gmail while they are still with us.
About Author
About the author: Robert Bernstein (poobah@ruptured-duck.com) is a freelance writer specializing in the Open Source and Free Software movements. He has edited Linux texts for Macmillan Publishing, and written technical articles for Caldera Systems. Over the years he has worked as a land surveyor and as a Licensed Mental Health Counselor in Massachusetts. A life-long New Englander, Bob now lives in Esmond, Rhode Island, a village he describes as “apple orchards, cow pastures, and Victorian textile mills that speak to you from a hundred and fifty years ago.”