Uploading Old Email to Gmail using Java
Java Programming Notes # 2404
- General Background Information
- Discussion and Sample Code
- Run the Program
- What's Next?
- Complete Program Listings
The recent availability of low-cost (or no-cost) Email services such as Google's Gmail, which provide massive storage capacity, advanced features, and lightning-fast search capability, has caused many of us to rethink the way that we manage Email.
A whole new outlook
Up until recently, I managed my Email pretty much the same as almost everyone else. I downloaded the messages from several different Email accounts into an Email client program, (Netscape Mail in my case), read messages, deleted messages, filed messages, etc.
Since getting my Gmail account in May of 2005, I have discovered that there is a much better way to manage Email. Quite simply, in my opinion, Gmail is the finest Email program that I have ever seen. It has completely changed my outlook (no pun intended) on how to manage Email.
Solving some operational problems
However, making the switch to Gmail does involve a few operational problems, none of which are of Google's making. The lessons in this series show how I have written special Java programs to deal with those problems.
You may find it useful to open another copy of this lesson in a separate browser window. That will make it easier for you to scroll back and forth among the different listings and figures while you are reading about them.
One lesson in a series of lessons
This is the second lesson in a series of lessons on the general topic of moving Email messages around among servers and local computers. The first and previous lesson in the series was entitled Consolidating Email using Java.
I recommend that you also study the other lessons in my extensive collection of online Java tutorials. You will find those lessons published at Gamelan.com. However, as of the date of this writing, Gamelan doesn't maintain a consolidated index of my Java tutorial lessons, and sometimes they are difficult to locate there. You will find a consolidated index at www.DickBaldwin.com.
Historical approach to managing Email
Up until recently, I managed my Email pretty much the same way that others did. I downloaded messages from several different Email accounts onto my hard drive using a local Email client program. In my case, that client program was Netscape mail.
I would delete most of the messages because they were identified as spam either using Netscape filters or the Netscape junk mail capability. Then I would read the remaining messages.
Delete some, file the rest
Having read the remaining messages, I would delete some of them in order to conserve disk space, and file the rest in an elaborate system of Email folders that had evolved over the years.
(I have accumulated more than 500 Mbytes of Email messages on my hard disk.)
Hmm, now where did I file that message?
Everything usually went pretty well until I needed to find a message that I had received and filed earlier. Oftentimes, I would forget which folder I had filed it in. This often resulted in a long and frequently tedious manual search. Sometimes I would try running the search feature of the Email client program, but it was so slow making one search pass through 500 Mbytes of Email messages that it wasn't really practical.
Gmail to the rescue
Then along came an opportunity to open a Gmail account, which I did mainly out of curiosity. Once I became familiar with Gmail, I quickly decided that it would form the basis of my new approach to Email management.
(Access to a Gmail account is currently available by invitation only. People who already have an account can invite others to open accounts. As of this writing, I have coupons for fifty invitations. If you would like to open a Gmail account, send an Email message to firstname.lastname@example.org notifying me of that fact and I will submit your Email address to Google in order to get an invitation for you. To keep my spam blocker from discarding your message, please include Gmail in the subject of the message.)
Advantages and disadvantages
Gmail offers many advantages over my old approach. So far, at least, I haven't found any disadvantages. Although there are advertisements on the right side of the browser window when I read a message, they are unobtrusive and they don't bother me at all. In fact, I can eliminate them by adjusting the width of my browser window if I want to.
I have read concerns about privacy issues that involve storing Email messages on a server controlled by someone else. When I am concerned about having confidential information in my Email messages I encrypt them anyway, so that isn't a concern for me. I have much greater concerns about privacy related to making online purchases and having my credit card number stored in hundreds of servers scattered around the world than I do about having my Email messages stored on the Gmail server.
Virtually unlimited disk capacity
One advantage is that I don't need to be concerned about conserving disk space. Google makes so much space available that I can even afford to save the messages in the trash folder. For example, here is a statement that appears on the bottom of my Gmail web page today, "You are currently using 243 MB (10%) of your 2502 MB."
Why would I save the trash?
Why, you might wonder, would I want to save the trash? As it turns out, having a large number of messages in the trash folder is very valuable. I periodically do a statistical analysis on those messages to identify emerging spam trends, such as new incorrect spellings for various medications. I use that information to develop new spam filters using Gmail's excellent filter system.
Also, approximately one-third of the messages that I receive are in languages that I can't read. Therefore, there is no point in having them clutter my inbox. Periodically, I use the messages in the trash folder to identify foreign-language characters that can be used to filter out foreign-language messages. This turns out to be a fairly easy filtering task. I am able to trap about ninety-five percent of all foreign-language messages and send them straight to the trash folder, completely bypassing my inbox.
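As an illustration only, here is a minimal sketch of how such a character survey might be done offline in Java. It is not the procedure that I actually use, and the class name ScriptTallySketch and its method are hypothetical. It counts the letters in a piece of text that belong to any Unicode script other than Latin, which gives a rough picture of which character families show up in unwanted messages.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: tallies letters by Unicode script, ignoring Latin.
final class ScriptTallySketch {

    // Returns a count of letters for each non-Latin script found in text.
    static Map<Character.UnicodeScript, Integer> tallyNonLatinLetters(String text) {
        Map<Character.UnicodeScript, Integer> counts = new TreeMap<>();
        text.codePoints()
            .filter(Character::isLetter)              // skip digits, spaces, punctuation
            .mapToObj(Character.UnicodeScript::of)    // code point -> script
            .filter(script -> script != Character.UnicodeScript.LATIN)
            .forEach(script -> counts.merge(script, 1, Integer::sum));
        return counts;
    }

    public static void main(String[] args) {
        // "Hello", then a Cyrillic word, then two Han characters,
        // written as Unicode escapes so that the source file stays ASCII.
        String sample = "Hello \u041f\u0440\u0438\u0432\u0435\u0442 \u4f60\u597d";
        System.out.println(tallyNonLatinLetters(sample));
    }
}
```

Running a tally like this over a sample of trashed messages would suggest which characters are worth trying in a filter. The filters themselves are still created in Gmail's web interface.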
The main advantage is a lightning-fast search
I could go on and on singing the praises of Gmail, but I won't. The main advantage of Gmail (the advantage to which this lesson is devoted) is the ability to use a very sophisticated search capability to search all the messages in the archives (and optionally in the spam and trash folders) with lightning speed.
An alternative to Email folders
Although Gmail makes it possible to apply labels to messages (as an alternative to using mail folders), the ability to search very rapidly reduces the need for filing messages in an organized way.
Basically, Gmail has only three folders:
- All Mail
The All Mail folder is automatically subdivided into the following categories, and you can further subdivide it through the use of labels:
- Sent Mail
A single message may be tagged with none, one, or more labels.
POP3 access and mail forwarding
I now do all of my serious Email processing using Gmail's web mail capability. However, Gmail also provides POP3 access and message forwarding at no cost. I use the POP3 capability to download and save selected messages locally for backup purposes. So far, I haven't found a use for the forwarding capability, but it is good to know that it is available.
Now back to the search capability
As an example of searching, I recently had a question about the availability of campus parking permits for the upcoming school year. I remembered having received an Email message that addressed that subject sometime in the recent past.
I was able to search through tens of thousands of messages for the keyword parking in less time than it took for me to type the keyword into the search field. The search isolated thirteen messages out of the thousands of messages in the archives and it was easy to spot the one that I needed. If need be, I could have further narrowed down the search using a logical combination of multiple keywords (and/or labels) in conjunction with AND, OR, and NOT.
If I had been looking for this message on my hard drive using my old approach, I would probably still be trying to figure out which folder contained the message.
Just use it and be quiet about it!
By now, you are probably wondering why I don't simply use Gmail and be quiet about it. The reason is that there are a couple of operational problems in making the switch to Gmail that are not of Google's making. I want to share the solution to those problems with you just in case you might be interested in making the switch.
Both problems revolve around the fact that in order for Gmail to be most useful, it is important for me to consolidate all of my Email messages on the Gmail server so that I can apply Gmail's fast search capability to all of my Email messages.
Email accounts refuse to forward messages
The first problem is that over the years, I have accumulated several different Email accounts. Unfortunately, a couple of the most important accounts (including my employer's) refuse to forward my Email messages to Gmail (or to any other Email account, for that matter). I addressed that problem in an earlier lesson entitled Consolidating Email using Java. In that lesson, I provided a Java program that can be used to fetch messages from such uncooperative Email accounts and to forward those messages to Gmail (or any other Email account).
Legacy Email messages
The second problem has to do with legacy Email messages. Over the years, I have accumulated many tens of thousands of Email messages under control of the Netscape Mail program on my local hard disk. I need to upload those messages to the Gmail server so that they will be included in my newfound search capability.
In this lesson, I will provide and explain a Java program that can be used to upload legacy Email messages to Gmail or to any other Email account.
Mbox format is required
This program requires that the legacy messages be stored in the well-known Mbox Email format. Many Email client programs (including Netscape, Eudora, and Mozilla's Thunderbird) use this storage format. However, some Email client programs (such as Microsoft's Outlook) do not store their Email messages in the Mbox format.
If you have legacy Email messages stored in Mbox format, you should be able to use this program to upload them to an Email account.
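For readers who have not looked inside an Mbox file, the following is a minimal sketch of the basic idea, not the parsing code presented later in this lesson. An Mbox file is one long text file in which each message is introduced by a separator line that begins with the word From followed by a space. The sketch assumes that a separator is either the first line of the file or follows a blank line. Real mail clients differ in how strictly they follow that convention, so treat the rule, the class name MboxSplitSketch, and the sample addresses as assumptions.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: splits the text of an Mbox file into messages.
final class MboxSplitSketch {

    // Assumption: a message starts at a line beginning with "From "
    // that is either the first line or follows a blank line.
    static List<String> split(String mboxText) {
        List<String> messages = new ArrayList<>();
        StringBuilder current = null;
        boolean previousBlank = true; // treat the start of the file like a blank line

        for (String line : mboxText.split("\r?\n", -1)) {
            if (previousBlank && line.startsWith("From ")) {
                if (current != null) {
                    messages.add(current.toString());
                }
                current = new StringBuilder(); // the separator line itself is dropped
            } else if (current != null) {
                current.append(line).append('\n');
            }
            previousBlank = line.isEmpty();
        }
        if (current != null) {
            messages.add(current.toString());
        }
        return messages;
    }

    public static void main(String[] args) {
        String sample =
              "From alice@example.com Mon Jun  6 10:00:00 2005\n"
            + "Subject: Parking permits\n"
            + "\n"
            + "Permits go on sale in August.\n"
            + "\n"
            + "From bob@example.com Tue Jun  7 11:30:00 2005\n"
            + "Subject: Lunch\n"
            + "\n"
            + "Noon works for me.\n";
        List<String> messages = split(sample);
        System.out.println(messages.size() + " messages found");
        System.out.println(messages.get(1));
    }
}
```

Requiring a blank line before the separator keeps a body line that merely happens to begin with From from being mistaken for the start of a new message.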
Format conversion may be required
If your legacy messages are stored in some other format, you will need to search the web to find a program that will convert them into Mbox format before attempting to upload them using this program. Because I have no experience with such conversions, I can't recommend any specific programs for making the conversion.
Email transmission volume limitations
When I first embarked on this project, my initial inclination was to write a recursive routine that would automatically traverse my entire Email folder tree, extracting messages and uploading them to Gmail in one great blast of Email transmissions. This would entail the sending of many tens of thousands of Email messages.
Then reality set in
Then I realized that if I were to start a program that automatically formatted and sent hundreds (possibly thousands) of Email messages per hour over many consecutive hours, it would probably trip a spam alarm. The ISP's management system would conclude that I was conducting a mass spam mailing campaign and would disable my ability to send Email after the first few hundred messages.
A more conservative approach
So, I was forced to take a more conservative approach. The approach that I settled on involves the following steps:
- One time only, set up a dummy Email account in Netscape Mail that is linked to a dummy POP3 server and a dummy SMTP server. This account cannot be used to actually process Email outside of my local hard disk.
- Once each day, use the Netscape Mail program to delete all messages from the inbox folder of the dummy account and to copy several hundred existing messages from other folders into the inbox folder. (If you do this using Netscape Mail, be sure to Compact Folders after deleting the messages from the inbox folder. Otherwise, they will remain physically in the Mbox file and will be picked up and processed by the Java program discussed below.)
- Start the special Java program running to upload the messages from the inbox folder of the dummy account to the Gmail server. The program knows how to find the inbox folder of the dummy account and how to send those messages to the Gmail server.
May require several weeks to complete
Using this approach, several weeks (maybe several months) will be required for me to upload all of my legacy Email messages to the Gmail server, but the process is relatively painless. (I have no idea how many messages there are to be uploaded. I have been uploading about 750 messages per day for several days and haven't made much of a dent in the total.)
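The arithmetic behind that estimate is simple, and a short sketch makes it concrete. The figure of 750 messages per day comes from my experience so far. The total of 30,000 messages is purely a hypothetical number chosen for illustration, since I don't know the real total. The class name DailyBatchSketch and its methods are also hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: plans daily upload batches of a fixed maximum size.
final class DailyBatchSketch {

    // Number of daily runs needed to send totalMessages at perDay messages per run.
    static int daysNeeded(int totalMessages, int perDay) {
        if (perDay <= 0) {
            throw new IllegalArgumentException("perDay must be positive");
        }
        return (totalMessages + perDay - 1) / perDay; // ceiling division
    }

    // Cuts a list into consecutive batches of at most perDay items each.
    static <T> List<List<T>> batches(List<T> items, int perDay) {
        if (perDay <= 0) {
            throw new IllegalArgumentException("perDay must be positive");
        }
        List<List<T>> result = new ArrayList<>();
        for (int start = 0; start < items.size(); start += perDay) {
            int end = Math.min(start + perDay, items.size());
            result.add(new ArrayList<>(items.subList(start, end)));
        }
        return result;
    }

    public static void main(String[] args) {
        int hypotheticalTotal = 30_000; // assumed; the real total is unknown
        System.out.println(daysNeeded(hypotheticalTotal, 750) + " daily runs");
    }
}
```

At 750 messages per day, a backlog of that assumed size would take 40 daily runs, which is consistent with the estimate of several weeks to several months.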
In this lesson, I will present and explain two classes that work together to:
- Extract individual Email messages from an Mbox file in a specified directory.
- Send those messages to a specified destination Email address using a specified SMTP server.
The names of the two classes are:
- BigDog05parse
- BigDog05upload
Basically, an object of the class named BigDog05parse parses the Mbox file, extracts the individual messages from the Mbox file, and writes each message into an individual file in a working directory. An object of the class named BigDog05upload sends the messages stored in those files to the destination Email address.
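Before getting into the details, it may help to see the shape of that division of labor. The following is a minimal sketch of a two-stage design of this kind. It is not the code of BigDog05parse or BigDog05upload. The class name TwoStageSketch, the file naming scheme, the UTF-8 encoding, and the MessageSender interface are all assumptions made for illustration. The sender in the sketch only prints a line where a real program would talk to an SMTP server.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical illustration of a parse stage feeding an upload stage
// through files in a working directory.
final class TwoStageSketch {

    // Hypothetical stand-in for the part that talks to an SMTP server.
    interface MessageSender {
        void send(String destination, String messageText) throws IOException;
    }

    // Stage 1: write each extracted message to its own file in workDir.
    static List<Path> writeMessages(List<String> messages, Path workDir) throws IOException {
        Files.createDirectories(workDir);
        List<Path> files = new ArrayList<>();
        for (int i = 0; i < messages.size(); i++) {
            Path file = workDir.resolve(String.format("msg%05d.txt", i));
            Files.write(file, messages.get(i).getBytes(StandardCharsets.UTF_8));
            files.add(file);
        }
        return files;
    }

    // Stage 2: hand every file in workDir to the sender, in file-name order.
    // Returns the number of messages handed over.
    static int uploadAll(Path workDir, String destination, MessageSender sender)
            throws IOException {
        List<Path> files;
        try (Stream<Path> listing = Files.list(workDir)) {
            files = listing.sorted().collect(Collectors.toList());
        }
        int sent = 0;
        for (Path file : files) {
            String text = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
            sender.send(destination, text);
            sent++;
        }
        return sent;
    }

    public static void main(String[] args) throws IOException {
        Path workDir = Files.createTempDirectory("mbox-work");
        writeMessages(Arrays.asList("Subject: one\n\nFirst.\n", "Subject: two\n\nSecond.\n"), workDir);
        // The sender below only prints; the destination address is a placeholder.
        int sent = uploadAll(workDir, "someone@example.com",
                (to, text) -> System.out.println("Would send " + text.length() + " chars to " + to));
        System.out.println(sent + " messages processed");
    }
}
```

One benefit of passing messages through individual files is that the two stages can be run and tested separately, and the files left in the working directory show exactly what was extracted.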