Getting Started with the BigDog Email Protection Program

  • June 15, 2004
  • By Richard G. Baldwin

Java Programming Notes # 2187


Preface

Protection against spam and viruses

In recent months, I have been showing you how to write different kinds of Java programs to protect your email inbox from viruses and spam.  For example, the series of lessons that began with the lesson entitled Enlisting Java in the War Against SPAM, Part 1, The Communications Module and ended with the lesson entitled Enlisting Java in the War Against SPAM: Training the Body Screener showed you how to write programs that apply spam screening algorithms to your email.

The two lessons entitled Enlisting Java in the War Against Email Viruses and Enlisting Java in the War Against Email Viruses, Part 2, A Much Faster Program described two versions of a program designed to protect your email database from email-borne viruses.

Pulling it all together

The lesson entitled Overview of the BigDog Email Protection Program provided an overview of a set of programs named BigDog.  These programs combine protection against email-borne viruses and spam.

There are several separate programs in the BigDog set of programs.  This lesson provides source code for each of the separate programs along with a brief description of each program.  This lesson also explains how to set up your computer to use these programs.

Using the BigDog programs

Future lessons will explain the technical aspects of the BigDog programs in detail.  If you already know enough about Java to understand the behavior of the programs based on the source code and comments alone, feel free to copy and use the programs for non-commercial purposes.

If you don't understand the source code, it may be a good idea for you to wait for the explanations before compiling and running the programs.

My experience with BigDog

Because I have published several hundred online programming tutorials during the past seven years, my email address is widely exposed on the Web.  During good times, I typically receive between 250 and 300 email messages each day.

(During bad times involving rampant virus infestation, I typically receive several thousand email messages each day.)

Of the 300 or so messages that I receive each day, only about ten to fifteen are messages that I need to read.  Approximately five to ten of the messages contain viruses.  The remainder are usually spam.

Finally, I feel protected

I have been using increasingly sophisticated versions of the BigDog programs for several months.  For the first time in several years, I feel that I finally have email-borne viruses and spam under control.

Protection against viruses

The virus protection features built into the program make it possible for me to isolate and delete messages containing viruses before they become co-mingled with the other messages in my email inbox.
(I explained the dangers inherent in such co-mingling in the earlier lesson entitled Enlisting Java in the War Against Email Viruses.)

Protection against spam

The BigDog programs combine spam screening with an aggressive challenge/response message verification procedure.
(I discussed the challenge/response message verification procedure in the earlier lesson entitled Overview of the BigDog Email Protection Program.)

As a result, most of my good messages are clearly identified as such. (Very few good messages are falsely identified as spam.)  In addition, I am normally able to completely ignore all but about fifteen or twenty of the several hundred spam messages that I receive each day.

(I need to examine those fifteen or twenty messages to identify the occasional message sent by a computer that I do need to read, but for which the sending computer won't normally respond to the challenge.)

Supplementary material

I recommend that you also study the other lessons in my extensive collection of online Java tutorials.  You will find those lessons published at Gamelan.com.  However, as of the date of this writing, Gamelan doesn't maintain a consolidated index of my Java tutorial lessons, and sometimes they are difficult to locate there.  You will find a consolidated index at www.DickBaldwin.com.

Operational Discussion

Checking my email

Several times each day, I check my email by doing the following:
  • Run the program named BigDog02g to download and save each email message as a separate file in a disk folder named DataFiles.
  • Run my Norton AntiVirus program against the files in the folder named DataFiles, deleting any files that contain a virus.
  • Run the program named BigDog02j to:
    • Forward the remaining messages in the DataFiles folder to my email client program.
    • Send a challenge message to the sender of each message received from a stranger, and place those messages in quarantine.
    • Retrieve any messages previously quarantined for which the sender of the original message has provided a proper response to the earlier challenge.
    • Delete the messages from my public email server.
    • Move the messages from the folder named DataFiles to another folder named Archives.
  • Run my email client program to read the messages now residing in my local email data structure.
That's all there is to it.  The procedure is straightforward, runs relatively fast, and provides the benefits described earlier.

The working disk directory structure

I'm going to begin by showing you how I have the various files and folders organized on my disk.  Once you understand the Java code involved, you can modify the code to support a different directory structure.  However, I recommend that you initially use the same setup that I use.

The contents of my working directory are shown in Figure 1.

BigDog02b.java
BigDog02g.java
BigDog02i.java
BigDog02j.java
BigDog02k.java
BigDog02m.java
BigDog02SpamScreen01.java

BigDog02BadList.txt
BigDog02GoodList.txt
BigDog02RawText.txt
BigDog02SubjAndHtml.txt

Archives
DataFiles
temp


Figure 1

(In addition to the files shown in Figure 1, once the Java source files are compiled, the directory will also contain a variety of compiled Java files with an extension of .class.  In addition, once you start running the programs, several backup files will automatically appear in the directory.)

Java source code files

The items shown in red in Figure 1 (the seven files with the .java extension) are the required Java source code files.  Complete listings of each of these files are provided in Listing 1 through Listing 7 near the end of this lesson.  I will provide a brief description of each of these files later in this lesson, and will provide a detailed discussion in future lessons.

Required text files

The items shown in green in Figure 1 (the four files with the .txt extension) are required text files.  I will provide a brief description of each of these files later in this lesson.  A sample listing of one of the text files is shown in Listing 8 near the end of the lesson.

Required folders

The items shown in black boldface in Figure 1 (Archives, DataFiles, and temp) are folders.  I will provide a brief description of each of these folders later in this lesson.

The BigDog02b program

This Java source code file is the repository for several static utility methods used by other programs in the BigDog02 set of programs.

A complete listing of this file is shown in Listing 1 near the end of the lesson.

I will provide a detailed explanation of the behavior of each of the methods contained in this file in future lessons.

The BigDog02g program

This program downloads all messages from a public email server and writes them as separate files in the local folder named DataFiles (see Figure 1).

A complete listing of this file is shown in Listing 2 near the end of the lesson.

This program makes the individual messages available in separate files for virus scanning before they are co-mingled in your email inbox.

(I explained the dangers inherent in such co-mingling in the earlier lesson entitled Enlisting Java in the War Against Email Viruses.)

After running this program and before running either BigDog02i or BigDog02j, you should use your favorite virus scanner program to identify and delete any message files in the DataFiles folder containing viruses.  That will prevent them from being forwarded to your email account and potentially corrupting your email inbox.

I will explain this program in detail in a future lesson.

The BigDog02i and BigDog02j programs

Complete listings of these two files are provided in Listing 3 and Listing 4 near the end of the lesson.

Whereas all users will use the program named BigDog02g, any individual user will use only one of the two alternative programs named BigDog02i and BigDog02j.

A user whose email client program supports MBOX files can use either program, but will probably use BigDog02j, because it runs faster.  Other users will use the program named BigDog02i.

Both programs have the same purpose, but they accomplish that purpose in different ways.

Categorizing messages

Both BigDog02i and BigDog02j apply various criteria, including spam screening, to categorize each of the virus-free messages into one of four categories:
  • {GD} Good
  • {BD} Bad
  • {SP} Spam
  • {QU} Quarantine
The text shown in matching curly braces in the above list is prefixed (tagged) onto the subject line of each message.  In addition, all messages in the {QU} category receive a spam score indicating the number of offensive words or phrases that were found in the message.

The message is then forwarded to the user's email client program.  The tag and the spam score can be used in conjunction with email filtering in the email client program to direct the messages into different email folders.
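
As a rough illustration of the tagging described above, the following sketch prefixes a tag onto a subject line and, for the {QU} category only, adds a spam score.  The names SubjectTagger and Category are hypothetical, and the exact layout of the tag and score (for example {QU}{2}) is an assumption made for this example.  The actual layout is determined by the code in Listing 3 and Listing 4.

```java
// Hypothetical sketch; not taken from the BigDog02 listings.
final class SubjectTagger {

    // The four categories named in this lesson, with their tags.
    enum Category {
        GOOD("{GD}"), BAD("{BD}"), SPAM("{SP}"), QUARANTINE("{QU}");

        final String tag;

        Category(String tag) {
            this.tag = tag;
        }
    }

    // Prefixes the category tag onto the subject line. Only quarantined
    // messages also carry a spam score. The "{n}" score layout is an
    // assumption made for this sketch.
    static String tag(String subject, Category category, int spamScore) {
        StringBuilder sb = new StringBuilder(category.tag);
        if (category == Category.QUARANTINE) {
            sb.append('{').append(spamScore).append('}');
        }
        return sb.append(' ').append(subject).toString();
    }

    public static void main(String[] args) {
        System.out.println(tag("Meeting notes", Category.GOOD, 0));
        System.out.println(tag("Cheap watches", Category.QUARANTINE, 2));
    }
}
```

Because the tag sits at the very beginning of the subject line, a simple "subject begins with" filter in the email client is enough to separate the categories.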

Delete messages from the server

When either program finishes running, the user is given the option of deleting the messages that have been processed from the public email server.

(Note that the code to delete messages from the server has been disabled in Listing 3 and Listing 4.  You should not enable that code until you have fully tested the behavior of the programs on your system and you are satisfied that you are ready to delete messages from the server.)

Electing this option also causes the individual message files to be moved from the folder named DataFiles to the folder named Archives.

I will explain both of these programs in future lessons.

The BigDog02k program

A complete listing of this program is shown in Listing 5 near the end of the lesson.

As explained in the earlier lesson entitled Overview of the BigDog Email Protection Program, the spam screening algorithm must initially be trained to recognize spam.  It is also useful to provide additional training later to teach the algorithm to recognize new forms of spam.

The algorithm training procedure

The program named BigDog02k is used to accomplish the first step in training the spam screening algorithm.  The procedure for training the algorithm is as follows:
  • Manually copy a batch of message files from the Archives folder into the folder named temp.
  • Run the program named BigDog02k to delete those files that won't make a positive contribution to the process of training the algorithm.
  • Run the program named BigDog02m to expand the vocabulary of offensive words and phrases used by the spam screening algorithm to identify spam.

Delete files that are not needed

Basically, the program named BigDog02k examines all the files in the folder named temp and deletes those files that would be categorized by BigDog02i or BigDog02j as:
  • {GD} Good
  • {BD} Bad
  • {SP} Spam
In addition, all files that would be categorized as {QU} with a spam score greater than zero are deleted.

(I discussed the concept of a spam score in the lesson entitled Overview of the BigDog Email Protection Program, and will have more to say about it in future lessons that explain the programs named BigDog02i and BigDog02j.)

These files are deleted because they have very little to contribute to the training of the spam screening algorithm.
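
The deletion rule above reduces to a single test: a file stays in the temp folder only if it would be tagged {QU} with a spam score of zero.  The following minimal sketch expresses that rule.  The class and method names are hypothetical, and the sketch assumes that the tag and score for each file have already been computed.

```java
// Hypothetical sketch of the rule described above; not taken from
// the BigDog02k listing.
final class TrainingFilter {

    // Returns true when a message file should remain in the temp
    // folder for training. Everything tagged {GD}, {BD}, or {SP} is
    // dropped, as is anything tagged {QU} with a score above zero.
    static boolean keepForTraining(String tag, int spamScore) {
        return "{QU}".equals(tag) && spamScore == 0;
    }

    public static void main(String[] args) {
        System.out.println(keepForTraining("{QU}", 0)); // kept
        System.out.println(keepForTraining("{QU}", 3)); // deleted
        System.out.println(keepForTraining("{SP}", 0)); // deleted
    }
}
```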

I will provide a detailed explanation of the program named BigDog02k in a future lesson.

The BigDog02m program

As indicated in the algorithm training procedure described above, this is the program that is actually used to train the algorithm using the message files remaining in the temp folder after running the program named BigDog02k.

A complete listing of this program is provided in Listing 6 near the end of this lesson.

The user interface and the behavior of this program are similar to those of the program described in the earlier lesson entitled Enlisting Java in the War Against SPAM: Training the Subject Line Screener.  The main difference is that this program uses a much more sophisticated spam screening algorithm than the program described in that lesson.

I will provide a complete technical description of this program in a future lesson.

The BigDog02SpamScreen01 program

A complete listing of this file is provided in Listing 7 near the end of the lesson.

An object of the class defined in this file provides the spam screening algorithm for the BigDog02 set of programs.

This class implements a set of rules for detecting spam messages.  For each message, an int value (the spam score) is returned giving the number of hits against offensive words and phrases found in that message.

Screening against lists of offensive words and phrases

The subject of the message and a clean version of the HTML content of the message are screened against a list of offensive words and phrases contained in the file named BigDog02SubjAndHtml.txt.

Raw body text (non-HTML text) is screened against a different list of offensive words and phrases contained in a file named BigDog02RawText.txt.
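
To make the idea of a spam score concrete, here is a minimal sketch that counts hits against a list of words and phrases.  It is not the algorithm in Listing 7.  The class name is hypothetical, and the sketch assumes case-insensitive substring matching with each list entry counted at most once.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch; the real screening rules are in Listing 7.
final class SpamScoreSketch {

    // Counts how many list entries occur somewhere in the text.
    // Assumptions: matching is a case-insensitive substring test,
    // blank entries are ignored, and each entry counts at most once.
    static int score(String text, List<String> offensivePhrases) {
        String haystack = text.toLowerCase(Locale.ROOT);
        int hits = 0;
        for (String phrase : offensivePhrases) {
            String needle = phrase.trim().toLowerCase(Locale.ROOT);
            if (!needle.isEmpty() && haystack.contains(needle)) {
                hits++;
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> phrases = Arrays.asList("cheap watches", "act now");
        System.out.println(score("Buy CHEAP watches today", phrases));
    }
}
```

A scan like this visits every list entry for every message, so its cost grows with both the length of the text and the length of the list.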

Why use separate lists?

The primary reason for keeping the two lists separate is speed.  The process of screening raw body text tends to be slower than the process of screening the subject line and the clean HTML content.  Therefore, care should be taken to keep the list of offensive words and phrases used to screen raw body text short and to the point.

On the other hand, the process of screening the subject line and clean HTML runs much faster.  Therefore, the list of offensive words and phrases used for that purpose can be much larger without slowing the program down.

I will explain the many aspects of the class named BigDog02SpamScreen01 in a future lesson.

The file named BigDog02BadList.txt

You will need to create and populate a plain text file having this name and put it in the working directory as shown in Figure 1.  Listing 8 near the end of the lesson provides a starter list of text items that you can use initially to populate this file.

I described the contents of this file in some detail in the earlier lesson entitled Overview of the BigDog Email Protection Program.

Briefly, the BAD list is a plain text file containing words and phrases that identify bad messages.  When one of these words or phrases occurs in either the subject or the sender's email address, this causes the message to be tagged {BD} and forwarded to my email account with no further processing.  Simple message filtering within my email client program causes these messages to be stored temporarily in a Bad folder, just in case I need to refer to one of them later.
(I periodically delete the files in the Bad folder to save disk space.  If you wanted to do so, you could eliminate the forwarding of messages in this category without much risk of losing good messages.)

Messages that I don't want to read

This list contains the email addresses and subjects commonly used in messages that I don't want to read.  These are messages that are easy to identify and reject without the requirement for a fancy spam screening algorithm.  Therefore, this process is completed before spam screening begins.

Auto responder messages

My email address is well known across the web and throughout the world.  During each flurry of virus activity on the web, I receive thousands of messages automatically sent by computers claiming that I sent them a message containing a virus.
(These are cases where someone else sent a message containing a virus and faked my email address as the sender.)

Since I am very conscientious about using my antivirus software to keep my computer clean and free of viruses, I'm confident that I didn't send the message containing a virus.  Therefore, I'm not interested in reading these notification messages.

I have identified key phrases contained in many such notification messages and have included those phrases in Listing 8.  This prevents these messages from cluttering my email inbox.

Undeliverable messages

In addition, the use of the challenge/response message verification procedure causes a large number of auto responder messages to be received indicating that the email addresses used by spammers are not valid.  Listing 8 also contains a sampling of subjects produced by those auto responders.

Basically, the messages in the {BD} category (identified by the contents of the file named BigDog02BadList.txt) are messages that I rarely look at.  I simply save them for a few days in case I need to go back and look at one of them, and then I delete them.

The file named BigDog02GoodList.txt

You will also need to create and populate this plain text file.  I didn't provide a sample because I have no idea what you might want to include in this file.  What you include in the file will probably be much different from what I have included in my file.

I also discussed the contents of this list in the earlier lesson entitled Overview of the BigDog Email Protection Program.

Briefly, the GOOD list is a plain text file that contains phrases and words that identify good messages.  The occurrence of one of these phrases or words in either the subject or the sender's email address in a message will cause the message to be tagged {GD} and forwarded to my email account with no further processing.
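
The BAD list and the GOOD list are both described as being checked against the subject and the sender's email address.  The following sketch shows one way such a check could look.  The class name is hypothetical, and case-insensitive substring matching is an assumption.  The sketch also says nothing about which list takes precedence when both match, because that is decided by the code in Listing 3 and Listing 4.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch; not taken from the BigDog02 listings.
final class ListMatcher {

    // Returns true if any non-blank list entry occurs in either the
    // subject or the sender's address. Case-insensitive substring
    // matching is an assumption of this sketch.
    static boolean matches(String subject, String sender,
                           List<String> entries) {
        String subj = subject.toLowerCase(Locale.ROOT);
        String from = sender.toLowerCase(Locale.ROOT);
        for (String entry : entries) {
            String e = entry.trim().toLowerCase(Locale.ROOT);
            if (!e.isEmpty() && (subj.contains(e) || from.contains(e))) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> goodList = Arrays.asList("friend@example.com");
        System.out.println(
            matches("Lunch?", "Friend@Example.com", goodList));
    }
}
```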

Messages that I want to read

The GOOD list is used to identify email messages that I want to read regardless of what the spam screening algorithm might provide as a spam score for the message.

The list is automatically updated by the program whenever the sender responds to a challenge message.  I will explain this process in more detail in a future lesson.

Because this list is automatically updated by the program, and is subject to data loss in the event of a computer crash, several levels of backup are automatically maintained.  Therefore, once you start running the programs, you will notice new files appearing in your working directory with names like BigDog02GoodList.bak5.
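
For readers who want a feel for how several levels of backup can be maintained, here is a minimal rotation sketch.  It is not the code used by the BigDog02 programs.  The class name is hypothetical, and the sketch assumes that higher-numbered backups are older and that the backup name is formed by replacing the .txt extension with .bakN.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical sketch; the numbering scheme is an assumption.
final class BackupRotation {

    // "BigDog02GoodList.txt", 5 -> "BigDog02GoodList.bak5"
    static String backupName(String fileName, int level) {
        int dot = fileName.lastIndexOf('.');
        String stem = (dot < 0) ? fileName : fileName.substring(0, dot);
        return stem + ".bak" + level;
    }

    // Shifts each existing backup up one level (overwriting the
    // oldest), then copies the current file to level 1.
    // The levels argument is expected to be at least 1.
    static void rotate(Path file, int levels) throws IOException {
        String name = file.getFileName().toString();
        for (int level = levels; level > 1; level--) {
            Path newer = file.resolveSibling(backupName(name, level - 1));
            Path older = file.resolveSibling(backupName(name, level));
            if (Files.exists(newer)) {
                Files.move(newer, older,
                        StandardCopyOption.REPLACE_EXISTING);
            }
        }
        if (Files.exists(file)) {
            Files.copy(file, file.resolveSibling(backupName(name, 1)),
                    StandardCopyOption.REPLACE_EXISTING);
        }
    }
}
```

Calling rotate just before each update of the list would leave the most recent prior versions on disk in case the update is interrupted.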

The file named BigDog02RawText.txt

You will need to create and populate this file in your working directory.  Briefly, this is a plain text file containing words and phrases used to screen raw (non-HTML) text in the body of a message.

The occurrence of the words and phrases contained in this list in the raw body text of a message causes the spam screening algorithm to consider the message to contain spam.  The result is to increase the spam score associated with that message.

This list can be populated using a simple text editor.  There are a variety of ways to identify the words and phrases used to populate this list.  I will describe some of those ways in future lessons when I explain the program named BigDog02m.

If you elect to use these programs before I publish that lesson, you should provide this file as an empty file so that it can be found at runtime.  Otherwise the program will throw an exception if it can't find the file.
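
If you would rather not create the folders and placeholder files by hand, a small helper along the following lines can do it.  This is a sketch under stated assumptions: the class name is hypothetical, and treating an empty file as an acceptable starting point is confirmed above only for BigDog02RawText.txt.  You will still want to populate the other text files as described in this lesson.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical setup helper; not part of the BigDog02 programs.
final class WorkingDirSetup {

    // Folder and file names taken from Figure 1.
    static final String[] FOLDERS = {"Archives", "DataFiles", "temp"};
    static final String[] TEXT_FILES = {
        "BigDog02BadList.txt", "BigDog02GoodList.txt",
        "BigDog02RawText.txt", "BigDog02SubjAndHtml.txt"
    };

    // Creates any missing folders and empty text files. Existing
    // files are left untouched, so it is safe to run more than once.
    static void prepare(Path workingDir) throws IOException {
        for (String folder : FOLDERS) {
            Files.createDirectories(workingDir.resolve(folder));
        }
        for (String name : TEXT_FILES) {
            Path file = workingDir.resolve(name);
            if (Files.notExists(file)) {
                Files.createFile(file);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java WorkingDirSetup <workingDir>");
            return;
        }
        prepare(Paths.get(args[0]));
    }
}
```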

Once you start running the programs, you will probably identify words and phrases in spam messages viewed from within your email client that should be copied into this file.  You can copy them using your text editor.

The file named BigDog02SubjAndHtml.txt

You will need to create and populate this file in your working directory.  The contents of this file are used to screen the subject line and to screen the body of email messages containing HTML.

This list can be large

The screening of subject lines and HTML runs relatively fast.  Therefore, you don't need to be particularly concerned about the size of this list.  In other words, you can be fairly aggressive in adding offensive words and phrases to the list.  As of this writing, my file contains more than 2,200 offensive words and phrases accumulated over several months of operation.  So far, I have seen no significant speed degradation as a result of the size of this list.

(On the other hand, you need to keep the size of the file named BigDog02RawText.txt much smaller.  A large number of entries in that file will result in significant speed degradation for the program.)

Populating the BigDog02SubjAndHtml.txt file

This list can be populated using a plain text editor.  However, the best way to populate this list is by running the program named BigDog02k followed by the program named BigDog02m.  These two programs are designed to make it easy to populate this list on the basis of actual email messages captured earlier in the folder named Archives and manually copied to the folder named temp.

I will have much more to say about this in a future lesson when I discuss these two programs.

The folder named DataFiles

This is the folder that receives the individual message files downloaded from the public email server by the program named BigDog02g.

The files should be allowed to remain in this folder while being scanned with your favorite virus scanning software.

The files in this folder provide the input to the alternative programs named BigDog02i and BigDog02j.

The folder named Archives

The alternative programs named BigDog02i and BigDog02j automatically move the individual message files from the folder named DataFiles to the folder named Archives at the end of the run when the user elects to delete the processed messages from the public email server.

Retrieving a message from quarantine

Later, when the sender of an earlier message that was placed in quarantine responds properly to a challenge, the earlier message is retrieved from the folder named Archives, tagged {GD}, and forwarded to the email client program.  Therefore, the files should be allowed to remain in this folder long enough for the response to take place.

Deleting files from the Archives folder

About once each week, I manually delete all the files in this folder that are more than seven days old.  This is based on the assumption that any sender who is going to respond to the challenge will do so within seven days.
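
The weekly cleanup described above is done manually, but it could also be automated.  The following sketch deletes files older than a given age.  The class name is hypothetical, and the sketch assumes that the file's last-modified time is a reasonable stand-in for the age of the message.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

// Hypothetical cleanup helper; not part of the BigDog02 programs.
final class ArchiveCleaner {

    // True when lastModified is more than maxAge before now.
    static boolean isExpired(Instant lastModified, Instant now,
                             Duration maxAge) {
        return lastModified.isBefore(now.minus(maxAge));
    }

    // Deletes regular files in the folder that are older than maxAge
    // and returns the number of files deleted. Age is judged by the
    // last-modified time, which is an assumption of this sketch.
    static int deleteOlderThan(Path folder, Duration maxAge)
            throws IOException {
        Instant now = Instant.now();
        List<Path> expired = new ArrayList<>();
        try (DirectoryStream<Path> stream =
                 Files.newDirectoryStream(folder)) {
            for (Path file : stream) {
                if (Files.isRegularFile(file) && isExpired(
                        Files.getLastModifiedTime(file).toInstant(),
                        now, maxAge)) {
                    expired.add(file);
                }
            }
        }
        for (Path file : expired) {
            Files.delete(file);
        }
        return expired.size();
    }
}
```

For the seven-day policy described above, the call would be deleteOlderThan(archivesFolder, Duration.ofDays(7)), where archivesFolder is the path to the Archives folder.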

This folder also serves as the repository for messages used to train the spam screening algorithm using the programs named BigDog02k and BigDog02m.

The folder named temp

The folder named temp provides the input files for the algorithm training programs named BigDog02k and BigDog02m.

As described earlier, when time comes to train the spam screening algorithm, you should copy a large block of message files from the Archives folder into the temp folder.  Then run the programs named BigDog02k and BigDog02m to train the spam screening algorithm using actual message files as input.

Setting up your email

Perhaps the most complex aspect of using these programs involves setting up your email in a compatible way.  This is complex only because different people use different email client programs, so I am unable to give you step-by-step instructions on how to do it.

A choice of two approaches

The programs named BigDog02i and BigDog02j provide two different approaches to processing the individual message files that have been downloaded and scanned for viruses.  These two programs achieve the same end result, but they achieve that result in different ways.

The program named BigDog02j is for users whose email client program uses the MBOX file format to store email messages locally.

The program named BigDog02i is for all other users.
(Note: Even users who use an MBOX-compatible email client program can use BigDog02i if they so choose.)

The way that you set up your email accounts is different depending on which of these two programs you elect to use.

Using BigDog02i

Before reading further in this lesson, I encourage you to go back and read the earlier lesson entitled Enlisting Java in the War Against Email Viruses.  The program named BigDog02i is based on the same technology as the program described in that lesson.

In order to use BigDog02i, you will need to establish a secret email account as described in that earlier lesson.

Processing, tagging, and forwarding messages

When you run BigDog02i, the program will process each of the individual message files in the DataFiles folder, tagging them appropriately as described earlier, and will forward the messages to your secret email account.  In addition, all messages with a {QU} tag will also be tagged with a spam score.

You can set up ordinary email filters in the email client program to route the messages into specific folders based on the spam score and the following tags:
  • {GD} Good
  • {BD} Bad
  • {SP} Spam
  • {QU} Quarantine

Using BigDog02j

If your email client program stores its messages locally in MBOX format and if you elect to take advantage of that fact, you can use the program named BigDog02j to process the individual message files in the folder named DataFiles.

Before proceeding further in this lesson, I encourage you to go back and read the lesson entitled Enlisting Java in the War Against Email Viruses, Part 2, A Much Faster Program.  The program named BigDog02j uses the same technology described in that lesson.

Locate the proper directory structure

In this case you will need to locate the local directory structure that belongs to your email client program.  The program named BigDog02j creates an MBOX-formatted file containing your messages.  The file is given a unique file name and is written into the directory structure belonging to your email client program.
(The String variable named emailPath in Listing 4 specifies the path to the disk directory where the MBOX file should be written.  You will need to modify the value of this variable to match your own circumstances.)
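
For readers unfamiliar with the MBOX format, the following sketch shows its basic shape: each message is introduced by a separator line that begins with "From ", and any line inside a message that begins with "From " is escaped with a leading ">" character.  This is not the code in Listing 4.  The class name is hypothetical, the envelope sender and date on the separator line are placeholders, and the sketch has not been checked against any particular email client.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of MBOX text layout; not from Listing 4.
final class MboxSketch {

    // Placeholder separator line; a real file would normally carry
    // the actual envelope sender and delivery date.
    static final String SEPARATOR =
        "From MAILER-DAEMON Thu Jan  1 00:00:00 1970\n";

    // Concatenates raw messages into MBOX text. Lines that begin
    // with "From " are escaped so that they are not mistaken for
    // separator lines. A blank line ends each message.
    static String toMbox(List<String> rawMessages) {
        StringBuilder out = new StringBuilder();
        for (String message : rawMessages) {
            out.append(SEPARATOR);
            for (String line : message.split("\r?\n")) {
                if (line.startsWith("From ")) {
                    out.append('>');
                }
                out.append(line).append('\n');
            }
            out.append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(toMbox(Arrays.asList(
            "Subject: {GD} Hello\n\nFrom now on, call me.\n")));
    }
}
```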
Appears as an email folder

The next time you start your email client program after BigDog02j finishes running, the MBOX file will appear as an email folder within your email client program.  The name of the email folder will match the name of the MBOX file.
(At least that is the case with Netscape 7, and is probably true with other MBOX-compatible email client programs as well.)

Tagged messages appear in email folder

The new email folder will contain all of the messages contained in the disk folder named DataFiles.  Each of the messages will be tagged with one of the tags in the list presented earlier.  In addition, all messages with a {QU} tag will also be tagged with a spam score.

At this point, you can use the ordinary email filtering capabilities of your email client program to cause the individual messages to be moved from that new folder to other email folders based on the tags listed above and the spam score.  Then you can use the capabilities of the email client program to delete the email folder that represents the MBOX file.  That will cause the MBOX file to be deleted.

Moving messages to email folders

You can set up the email filters in your email client program any way that you choose to help you to manage your messages.  For example, I currently use email filters in my email client program to move all of my messages into one of the following email folders:
  • Good
  • Bad
  • Spam
  • Quarantine{0}
  • Quarantine{1}
  • Quarantine{2+}
As you can probably guess, messages tagged {GD} are moved into the Good folder.  Messages tagged {BD} are moved into the Bad folder, and messages tagged {SP} are moved into the Spam folder.

Messages tagged {QU} with a spam score of {0} are moved into the Quarantine{0} folder.  Messages tagged {QU} with a spam score of {1} are moved into the Quarantine{1} folder.  All other messages tagged {QU} are moved into the Quarantine{2+} folder.
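
The routing rules above are normally configured as filters inside the email client, but the same logic can be written out in a few lines of code.  In the following sketch, the class name is hypothetical, the subject layout {QU}{n} is an assumption, and the Inbox fallback for untagged subjects is invented for the example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of the filter rules; not part of BigDog02.
final class FolderRouter {

    // Assumed subject layout for quarantined messages: "{QU}{n} ...".
    private static final Pattern QUARANTINE =
        Pattern.compile("^\\{QU\\}\\{(\\d{1,9})\\}");

    // Maps a tagged subject line to the name of an email folder.
    static String folderFor(String subject) {
        if (subject.startsWith("{GD}")) return "Good";
        if (subject.startsWith("{BD}")) return "Bad";
        if (subject.startsWith("{SP}")) return "Spam";
        Matcher m = QUARANTINE.matcher(subject);
        if (m.find()) {
            int score = Integer.parseInt(m.group(1));
            if (score == 0) return "Quarantine{0}";
            if (score == 1) return "Quarantine{1}";
            return "Quarantine{2+}";
        }
        return "Inbox"; // untagged subject; fallback is hypothetical
    }

    public static void main(String[] args) {
        System.out.println(folderFor("{QU}{0} Your reservation"));
        System.out.println(folderFor("{QU}{5} Cheap watches"));
    }
}
```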

Most messages are ignored

As a practical matter, the only messages that I normally read are those in the Good folder.
(The assumption is that a good message from a stranger will automatically be retrieved and forwarded to the Good folder when the sender of that message responds to the challenge.)

The messages in the Good folder are messages that I want to read.

Visually scan {QU}{0} messages

The messages in the Quarantine{0} folder are usually messages that I don't care to read, because they are usually spam.  However, I visually scan them to make certain that none of the messages in this folder are good messages that were sent by a computer that won't respond to a challenge (such as confirmation of an airline reservation, for example).

Whenever I locate such a message in the Quarantine{0} folder, I enter the sending email address and perhaps some key words from the subject into the file named BigDog02GoodList.txt so that future messages from that sender will be routed into the Good folder.

Other messages are ignored

I basically ignore all of the messages in the other folders.  I leave them there for a few days just in case I need to go back and review one of them for some special purpose.

In order to preserve disk space, I use the features of the email client program to delete these messages after they are a few days old.

Run the Programs

If you know enough about Java to understand the programs on the basis of the source code and the comments, I encourage you to copy the code from the Listings near the end of this lesson.  Compile and run the programs for non-commercial purposes.  Experiment with them, improving them as you see fit relative to your specific situation.  If you come up with any good ideas on how to improve them, I would like to hear those ideas.

(IMPORTANT:  Do not enable the DELE code in the programs until you are certain that you actually want to delete messages from the server.  Once a message is deleted from the server, there is no way to recover it from the server.)

Summary

The BigDog set of programs is designed to protect your email inbox from email-borne viruses and spam.  This lesson provides source code and a brief description of each of those programs.  The lesson also explains how to set up your computer and your email to use the programs.

What's Next?

Several more lessons are planned for this series.  Because every lesson is a work in progress until I finish writing it, my plans usually change as I progress through the writing of a series of lessons.  My thinking at this time is that future lessons in this series will cover the following topics:
  • Dealing with base64 data
  • Dealing with HTML data
  • Sending email messages
  • Miscellaneous utility methods
  • Downloading email messages
  • Processing email messages for the non-MBOX case
  • Processing email messages for the MBOX case
  • The improved spam screening module
  • Training the spam screening algorithm
Only time will tell how many changes I will need to make to this list.  In any event, there are lots of interesting technical discussions ahead, so stay tuned.

Program Listings

Listing 1 through Listing 8 contain the programs that make up the BigDog set of programs.

DISCLAIMER OF RESPONSIBILITY:  If you elect to use these programs, you use them at your own risk.  Make absolutely certain that you understand what you are doing before you execute the programs.  Inappropriate use could result in the loss of email messages.  The author of these programs, Richard G. Baldwin, accepts no responsibility for any losses that you may incur as a result of using these programs.

CAUTION:  Do not enable the DELE code in the programs until you are certain that you actually want to delete messages from the server.  Once a message is deleted from the server, there is no way to recover it from the server.

File BigDog02b

/*File BigDog02b.java Copyright 2004, R.G.Baldwin
Rev 01/31/04

This class is the repository for static utility
methods used by other programs in the BigDog02
series of programs.

Tested using SDK 1.4.2 under WinXP
************************************************/
import java.io.*;
import sun.net.smtp.SmtpClient;
import java.awt.*;

public class BigDog02b{

//This method is called to decode a Subject
// line.

//Sometimes the Subject line is encoded using
// techniques designed to allow the use of
// non-ASCII characters in message headers
// (See RFC2047).
//The following code determines if the Subject
// line has been encoded using the ISO-8859-1
// character set with an encoding value of B
// or Q. If so, the encoded material is
// decoded.
//Messages with an encoding value of Q contain
// a mixture of ASCII characters and encoded
// characters, so it is possible to partially
// read them without the need for decoding.
// They also sometimes use an underscore in
// place of a space to make them more readable.
public static String decodeSubj(String data){
try{
if(data.toUpperCase().indexOf(
"=?ISO-8859-1?B?") != -1){
//Need to decode for value of B.
int startIndex = data.toUpperCase().
indexOf("=?ISO-8859-1?B?") + 15;
int endIndex = data.length()-2;
sun.misc.BASE64Decoder dec =
new sun.misc.BASE64Decoder();
data = "Subject: =?ISO-8859-1?B? " +
new String(dec.decodeBuffer(
data.substring(
startIndex,endIndex)));
}//end if..."=?ISO-8859-1?B?"

if(data.toUpperCase().indexOf(
"=?ISO-8859-1?Q?") != -1){
//Need to decode for value of Q.
int startIndex = data.toUpperCase().
indexOf("=?ISO-8859-1?Q?") + 15;
int endIndex = data.length()-2;
String decodedData = data.substring(
startIndex,endIndex);

//Decode non-ASCII characters
StringBuffer stringBuf =
new StringBuffer(decodedData);
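int index = stringBuf.length();
while(index > -1){
//Search backward from just before the
// previous match so that a decoded "="
// (encoded as =3D) is not treated as the
// start of another encoded character.
index = stringBuf.lastIndexOf(
"=",index-1);
if(index > -1){
String hexString =
new String(stringBuf).substring(
index+1,index+3);
char decodedChar =
(char)Integer.parseInt(
hexString.trim(),16);
stringBuf.delete(index,index+3);
stringBuf.insert(index,decodedChar);
}//end if
}//end while(index > -1)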

//Replace underscore with
// space.
index = 0;
while(index > -1){
index = stringBuf.lastIndexOf("_");
if(index > -1){
stringBuf.deleteCharAt(index);
stringBuf.insert(index,' ');
}//end if
}//end while(index > -1)

data = "Subject: =?ISO-8859-1?Q? " +
new String(stringBuf);
}//end if..."=?ISO-8859-1?Q?"

}catch(Exception ex){
System.out.println(
"Failure in decodeSubj method");
ex.printStackTrace();
}//end catch
return data;
}//end decodeSubj
//===========================================//

//This method reads and saves lines of data
// from a file starting with the line that
// startsWith firstLine and ending with the
// line that startsWith lastLine. If lastLine
// is null, data is saved to the end of the
// file.
//The lines of data from the file are saved by
// concatenating them into a single string with
// a newline inserted into the string at the
// end of each line.
//If firstLine is null, data is saved beginning
// with the first line in the file.
//The name and path to the file is given by
// pathFileName.
public static String readLines(
String pathFileName,
String firstLine,
String lastLine){
StringBuffer strBuf = new StringBuffer();
try{
BufferedReader inDataMsg
= new BufferedReader(new FileReader(
pathFileName));

String data;
boolean isSave = false;
while((data = inDataMsg.readLine())
!= null){

if( ((firstLine == null) ||
(data.startsWith(firstLine))) &&
(isSave == false)){
isSave = true;
}//end if

if(isSave){
strBuf.append(data + "\n");
}//end if

if((lastLine != null) &&
(data.startsWith(lastLine))){
break;//no need to read any more
}//end if

}//end while loop
inDataMsg.close();//Close file
}catch(Exception e){e.printStackTrace();}
return new String(strBuf);
}//end readLines

//===========================================//

//This method is used to construct an email
// message and send it to the recipient.
public static boolean forwardEmailMsg(
String recipient,
String smtpServer,
String tag,
String pathFileName){

StringBuffer message = new StringBuffer(
"No message found");

try{
//Pass a string containing the name of
// the smtp server as a parameter to the
// following constructor.
SmtpClient smtp =
new SmtpClient(smtpServer);

//Pass a valid email address to the
// from() method.
smtp.from(recipient);

//Pass the email address of the recipient
// to the to() method.
smtp.to(recipient);

//Get an output stream for the message
PrintStream msg = smtp.startMessage();

//Write the message into the output
// stream.
message = new StringBuffer(readLines(
pathFileName,null,null));

//Insert tag in subject line
message = message.insert(message.indexOf(
"Subject: ")+9,tag);
msg.println(new String(message));
//Close the stream and send the message
smtp.closeServer();

return true;
}catch( Exception e ){
System.out.println("\n" + e);
System.out.println("Forwarding email");
Toolkit.getDefaultToolkit().beep();
try{
Thread.sleep(300);
}catch(Exception ex){
System.out.println(ex);
}//end catch
Toolkit.getDefaultToolkit().beep();
return false;
}//end catch

}//end forwardEmailMsg
//===========================================//

//Method moves a file from its current location
// specified by pathFileName to a new location
// specified by archivePath.
public static void moveFile(
String pathFileName,
String archivePath){
String fileName = pathFileName.substring(
pathFileName.lastIndexOf('/') + 1);
String archivePathFileName =
archivePath + fileName;

boolean moved =
new File(pathFileName).renameTo(
new File(archivePathFileName));

if(!moved)System.out.println(
"Unable to move " + new File(pathFileName)
+ "\nto " + new File(archivePathFileName));
}//end moveFile method
//===========================================//

}//end class BigDog02b
//=============================================//

Listing 1
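The B and Q decoding performed by decodeSubj can also be sketched with standard library classes only. The version below is a minimal sketch and not the code used by BigDog. The class and method names are hypothetical. It assumes Java 8 or later for java.util.Base64 (Listing 1 uses sun.misc.BASE64Decoder, an internal class that newer JDKs may not provide), and it assumes the input is a single, complete encoded word of the form =?ISO-8859-1?B?...?= or =?ISO-8859-1?Q?...?=. The Q decoder scans from left to right so that a decoded equals sign (=3D) is never mistaken for the start of another escape.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical illustration; not part of the BigDog listings.
class SubjectDecodeSketch {
  static final String PREFIX_B = "=?ISO-8859-1?B?";
  static final String PREFIX_Q = "=?ISO-8859-1?Q?";

  // Decodes a base64 payload as ISO-8859-1 text.
  static String decodeB(String payload) {
    byte[] bytes = Base64.getDecoder().decode(payload);
    return new String(bytes, StandardCharsets.ISO_8859_1);
  }

  // Decodes a Q payload in a single left-to-right pass.
  static String decodeQ(String payload) {
    StringBuilder out = new StringBuilder();
    int i = 0;
    while (i < payload.length()) {
      char c = payload.charAt(i);
      if (c == '_') {
        // An underscore stands for a space.
        out.append(' ');
        i++;
      } else if (c == '=' && i + 2 < payload.length()
          && Character.digit(payload.charAt(i + 1), 16) >= 0
          && Character.digit(payload.charAt(i + 2), 16) >= 0) {
        // "=XX" is one byte in hex; in ISO-8859-1 the byte
        // value is also the Unicode code point.
        int value = Integer.parseInt(
            payload.substring(i + 1, i + 3), 16);
        out.append((char) value);
        i += 3;
      } else {
        out.append(c);
        i++;
      }
    }
    return out.toString();
  }

  // Decodes one encoded word; returns the input unchanged if
  // it is not an ISO-8859-1 B or Q encoded word.
  static String decodeWord(String word) {
    String upper = word.toUpperCase();
    if (!word.endsWith("?=")
        || word.length() < PREFIX_B.length() + 2) {
      return word;
    }
    String payload =
        word.substring(PREFIX_B.length(), word.length() - 2);
    if (upper.startsWith(PREFIX_B)) {
      return decodeB(payload);
    }
    if (upper.startsWith(PREFIX_Q)) {
      return decodeQ(payload);
    }
    return word;
  }

  public static void main(String[] args) {
    System.out.println(
        decodeWord("=?ISO-8859-1?B?SGVsbG8gV29ybGQ=?="));
    System.out.println(
        decodeWord("=?iso-8859-1?Q?Caf=E9_1=3D1?="));
  }
}
```

Unlike decodeSubj, this sketch returns only the decoded text and does not rebuild the Subject: prefix, so a caller would need to locate the encoded word within the header line first.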

File BigDog02g

/*File BigDog02g.java Copyright 2004, R.G.Baldwin
Rev 03/14/04

This program downloads all messages from the
public email server and writes them in local
files.

After running this program and before running
either BigDog02i or BigDog02j, the user should
scan all of the message files produced by this
program with an anti virus program to remove any
files containing viruses from the local folder.
That way, messages containing viruses won't be
forwarded to the email account.

For technical information on POP3, see RFC 1725 at
http://www.cis.ohio-state.edu/htbin/rfc/rfc1725.html

A POP3 Command Summary follows based on the
information at that web site.

Minimal POP3 Commands:
USER name
PASS string
QUIT
STAT
LIST [msg]
RETR msg
DELE msg
NOOP
RSET
QUIT

Optional POP3 Commands:
APOP name digest
TOP msg n
UIDL [msg]

POP3 Replies:
+OK
-ERR

Tested using SDK 1.4.2 under WinXP
************************************************/

import java.net.*;
import java.io.*;
import java.util.*;
import java.awt.*;
import java.awt.event.*;

class BigDog02g extends Frame{
//The following is the local folder where
// message files are stored awaiting
// processing. You may want to modify this on
// your machine. On my machine, this folder is
// a subfolder of the folder containing the
// Java class files (the execution directory).
String dataPath = "./DataFiles/";

//The following are working variables used by
// the program for various purposes.
int numberMsgs = 0;
int msgCounter = 0;
int msgNumber;
String uidl = "";//unique msg ID
BufferedReader inputStream;
PrintWriter outputStream;
Socket socket;
String pathFileName;

public static void main(String[] args){
if(args.length != 3){
System.out.println("Usage: java BigDog02g "
+ "server userName password");
System.exit(0);
}//end if

new BigDog02g(args[0],args[1],args[2]);
}//end main
//===========================================//

//Constructor
BigDog02g(String server,String userName,
String password){
int port = 110; //pop3 mail port
try{
//Get a socket, connected to the
// specified server on the specified
// port.
socket = new Socket(server,port);

//Get an input stream from the socket
inputStream = new BufferedReader(
new InputStreamReader(
socket.getInputStream()));

//Get an output stream to the socket.
// Note that this stream will autoflush.
outputStream = new PrintWriter(
new OutputStreamWriter(
socket.getOutputStream()),true);

//Display the msg received from the
// server on the command-line screen
// immediately following connection.
String connectMsg = validateOneLine();
System.out.println("Connected to server "
+ connectMsg);

//The communication process is now in the
// AUTHORIZATION state. Send the user
// name and password to the server.
//Commands are sent in plain text, upper
// case to the server. Some commands
// require an argument following the
// command, as is the case with USER.
//Send the command.
outputStream.println("USER " + userName);
//Get response and confirm that the
// response was +OK and was not -ERR.
String userResponse = validateOneLine();
//Display the response on the command-
// line screen.
System.out.println("USER " + userResponse);
//Send the password to the server
outputStream.println("PASS " + password);
//Validate the server's response as +OK.
// Display the response in the process.
System.out.println(
"PASS " + validateOneLine());
}catch(Exception e){e.printStackTrace();}

//Register a window listener to service
// the close button on the Frame.
this.addWindowListener(
new WindowAdapter(){
public void windowClosing(WindowEvent e){

//Terminate the session with the
// server.
outputStream.println("QUIT");
String quitResponse =
validateOneLine();
//Display the response on the
// command-line screen.
System.out.println(
"QUIT " + quitResponse);

try{
socket.close();
}catch(Exception ex){
System.out.println("\n" + ex);}

System.exit(0);
}//end windowClosing
}//end WindowAdapter()
);//end addWindowListener

//Note that the compiler requires the
// reference to the following components to
// be final because they are accessed from
// within an anonymous class definition.
final Button startButton =
new Button("Start");
final TextArea textArea = new TextArea(
20,50);

//Register an ActionListener on the
// startButton.
startButton.addActionListener(
new ActionListener(){
public void actionPerformed(
ActionEvent e){
try{
//The communication process is now
// in the TRANSACTION state.
//Retrieve and save messages
if(numberMsgs == 0){
outputStream.println("STAT");
String stat = validateOneLine();
//Get the number of messages as
// a String.
String numberMsgsStr =
stat.substring(
4,stat.indexOf(" ",5));
//Convert the String to an int.
numberMsgs = Integer.parseInt(
numberMsgsStr);
}//end if numberMsgs == 0
//NOTE: Msg numbers begin with 1,
// not 0.
//Retrieve and save each
// message. Each msg ends with a
// period on a new line.
msgNumber = msgCounter + 1;
if(msgNumber <= numberMsgs){
//Process the next message.

//Get and save a unique identifier
// for the message from the server
// and validate the response.
outputStream.println(
"UIDL " + msgNumber);
uidl = validateOneLine();

//Open an output file to save
// the message. Use the UIDL
// as the file name.
pathFileName =
dataPath + uidl;
DataOutputStream dataOut =
new DataOutputStream(
new FileOutputStream(
pathFileName));

//Send a RETR command to begin
// the message retrieval process
outputStream.println(
"RETR " + msgNumber);
//Validate the response.
String retrResponse =
validateOneLine();

//Read the first line in the
// message from the server.
String msgLine =
inputStream.readLine();

//Continue reading lines until
// a line consisting of a single
// "." is encountered. That
// signals the end of the msg.
while(!(msgLine.equals("."))){
//Write the line to the output
// file and read the next
// line. Insert newline
// characters when writing the
// output to the file.
dataOut.writeBytes(
msgLine + "\n");
msgLine = inputStream.readLine();

}//end while
//Close the output file. The
// message is now stored in a
// local file with a file name
// based on the unique ID
// provided by the server.
dataOut.close();

//Show progress
textArea.append(msgNumber + "\n");

//Increment the message counter
// in preparation for
// processing the next message.
msgCounter++;

Toolkit.getDefaultToolkit().
getSystemEventQueue().
postEvent(new ActionEvent(
startButton,
ActionEvent.
ACTION_PERFORMED,
"Start/Next"));

}//end if msgNumber <= numberMsgs
else{//msgNumber > numberMsgs
//No more messages. Disable the
//Start/Next button.
startButton.setEnabled(false);
textArea.append(
"DON'T FORGET TO SCAN");

//Alert the user
Toolkit.getDefaultToolkit().beep();
Thread.sleep(300);
Toolkit.getDefaultToolkit().beep();
Thread.sleep(300);
Toolkit.getDefaultToolkit().beep();
}//end else
}//end try
catch(Exception ex){
ex.printStackTrace();}
}//end actionPerformed
}//end ActionListener
);//end addActionListener

//Configure the GUI by placing the
// various components on it, setting the size
// and making it visible.
add(startButton);
add(textArea);
textArea.setText("");
setLayout(new FlowLayout());

setTitle("Copyright 2004, R.G.Baldwin");
setSize(400,400);
//Make the GUI visible.
setVisible(true);
}//end constructor
//===========================================//

//Validate a one-line response.
//The purpose of this method is to confirm that
// the server returned +OK and not -ERR to the
// previous command.
//If +OK, the method returns the string
// returned by the server.
//If -ERR, the method displays the string
// returned by the server and terminates the
// session.
private String validateOneLine(){
try{
String response = inputStream.readLine();
if(response.startsWith("+OK")){
return response;
}else{
System.out.println(response);
//Terminate the session.
outputStream.println("QUIT");
socket.close();
System.out.println(
"Premature QUIT on -ERR");
System.exit(0);
}//end else
}catch(IOException e){
System.out.println("\n" + e);}
//The following return statement is required
// to satisfy the compiler.
return "Make compiler happy";
}//end validateOneLine()
//===========================================//

}//end class BigDog02g
//=============================================//

Listing 2
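The reply handling in Listing 2 comes down to three small parsing tasks: extracting the message count from the STAT reply, extracting the message number from a UIDL reply, and reading a message body up to the line that contains only a period. The following is a minimal sketch of those tasks, not the code used by BigDog02g, and the class and method names are hypothetical. It works on strings and readers, so it can be tried without a mail server. As an addition that Listing 2 does not make, it also strips the extra leading period that the POP3 specification says a server prepends to body lines that begin with a period.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical illustration; not part of the BigDog listings.
class Pop3ParseSketch {

  // Returns the second field of a "+OK n ..." reply as an int.
  private static int secondField(String reply) {
    String[] parts = reply.trim().split("\\s+");
    if (parts.length < 3 || !parts[0].equals("+OK")) {
      throw new IllegalArgumentException(
          "Unexpected reply: " + reply);
    }
    return Integer.parseInt(parts[1]);
  }

  // STAT reply, for example "+OK 2 320": 2 messages, 320 octets.
  static int messageCount(String statReply) {
    return secondField(statReply);
  }

  // UIDL reply, for example "+OK 1 abc123": message 1, id abc123.
  static int messageNumber(String uidlReply) {
    return secondField(uidlReply);
  }

  // Reads body lines until a line that is exactly "." and
  // removes the stuffed leading period from other lines that
  // begin with a period.
  static String readBody(BufferedReader in) throws IOException {
    StringBuilder body = new StringBuilder();
    String line;
    while ((line = in.readLine()) != null
        && !line.equals(".")) {
      if (line.startsWith(".")) {
        line = line.substring(1);
      }
      body.append(line).append('\n');
    }
    return body.toString();
  }

  // Convenience wrapper for trying readBody on a plain string.
  static String readBody(String rawResponse) {
    try {
      return readBody(
          new BufferedReader(new StringReader(rawResponse)));
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(messageCount("+OK 2 320"));
    System.out.println(messageNumber("+OK 1 abc123"));
    System.out.print(
        readBody("Subject: hi\r\n..dot\r\n.\r\nignored\r\n"));
  }
}
```

The null check in readBody also ends the loop if the connection closes before the terminating period arrives.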

File BigDog02i

/*File BigDog02i.java
Copyright 2004, R.G.Baldwin
Rev 02/28/04

This program processes a set of message files
written by the program named BigDog02g. This
program tags messages as {GD}, {QU}, {SP} or {BD}
and forwards the messages to a secret email
account. The secret email account is provided as
a command-line parameter.

Messages tagged {GD} are messages whose sender
or subject matches a word or phrase in a GOOD
list.

Messages tagged {BD} are messages whose sender
or subject matches a word or phrase in a BAD
list.

Messages tagged {SP} are messages that were
identified by a spam screener as containing spam.

Remaining messages are tagged {QU}. The senders
of all messages tagged {QU} are sent a challenge
message asking them to reply and confirm that
they actually sent the original message.

In addition, this program monitors for REPLY
messages where the subject contains +OK. When
a REPLY message is received, the sender is
added to the GOOD list, the original message
referred to by the unique code in the subject
is retrieved from the archive folder, the
retrieved message is tagged {GD}, and the tagged
message is forwarded to the secret email
account.

This program should be run after the program
named BigDog02g has been run, and after a virus
checker has been used to confirm that all files
in the working directory produced by BigDog02g
are free of viruses. See additional comments at
the beginning of BigDog02g.java for a
description of this program.

For technical information on POP3, see RFC 1725 at
http://www.cis.ohio-state.edu/htbin/rfc/rfc1725.html

A POP3 Command Summary follows based on the
information at that web site.

Minimal POP3 Commands:
USER name
PASS string
QUIT
STAT
LIST [msg]
RETR msg
DELE msg
NOOP
RSET
QUIT

Optional POP3 Commands:
APOP name digest
TOP msg n
UIDL [msg]

POP3 Replies:
+OK
-ERR

This program uses the DELE command to delete
messages from the public POP3 server.

This program uses an object of the class named
BigDog02SpamScreen01 to screen messages to
determine if they contain spam.

Certain portions of this program have been
disabled for test purposes. Search for the word
disable to identify those portions.

Tested using SDK 1.4.2 under WinXP
************************************************/

import java.net.*;
import java.io.*;
import java.util.*;
import java.awt.*;
import java.awt.event.*;
import sun.net.smtp.SmtpClient;

class BigDog02i extends Frame{
//All of the user-specific information is
// provided here.

//Beginning of subject for outgoing message.
String subjOut = "Put your subj here ";
//Signature on outgoing message.
String signature = "Your signature\n\n";
//List of email addresses that should not be
// sent an email message regardless of any
// other circumstance. It should probably
// include your own email addresses as a
// minimum.
String[] doNotSendList =
{"you@yourAddress"
};//end of list
//The From: address in outgoing email message.
String fromAddr = "you@yourAddress";

//ID of the secret email account.
String recipient = "See command-line input";
//An smtp server through which the user is
// authorized to send email messages.
String smtpServer = "See command-line input";
//End of user-specific information.

//Local folder where message files are stored
// awaiting processing. You may want to modify
// this on your machine. On my machine, this
// folder is a subfolder of the folder
// containing the Java class files (the
// execution directory).
String dataPath = "./DataFiles/";
//Local folder where the messages are stored
// after they have been processed. They are
// automatically moved to this folder after
// being deleted from the email server.
String archivePath = "./Archives/";
//Following two files contain lists of phrases
// used in processing the messages.
String goodPhraseFile = "BigDog02GoodList.txt";
String badPhraseFile = "BigDog02BadList.txt";

//Following are working variables used by the
// program for various purposes.
TreeSet goodPhraseList;
TreeSet badPhraseList;
BufferedReader inputStream;
PrintWriter outputStream;
Socket socket;
String pathFileName;
Vector msgToDelete = new Vector();
Button startButton = new Button("Start/Next");
Button deleteButton = new Button(
"Delete Msg On Server");
TextArea textArea = new TextArea(20,50);
String uidl;
String subject = "No Subject line found";
String sender = "No From line found";
String msgNumberStr = "000";
boolean okToDelete = false;
int msgNumber = 0;
String subjAndHtmlPhraseFile =
"BigDog02SubjAndHtml.txt";
String rawTextPhraseFile =
"BigDog02RawText.txt";
int hitCount = 0;
int hitLimit = 6;

public static void main(String[] args){
if(args.length != 5){
System.out.println("Usage: java BigDog02i "
+ "pubServer userName password "
+ "secretServer smtpServer");
System.exit(0);
}//end if

//Construct an object of this class
new BigDog02i(args[0],args[1],args[2],
args[3],args[4]);
}//end main
//===========================================//

//Constructor
BigDog02i(final String server,
final String userName,
final String password,
String secretServer,
String smtpServer){

recipient = secretServer;
this.smtpServer = smtpServer;

makeGoodPhraseList();
makeBadPhraseList();

//Register a window listener to service
// the close button on the Frame.
this.addWindowListener(
new WindowAdapter(){
public void windowClosing(WindowEvent e){
System.exit(0);
}//end windowClosing
}//end WindowAdapter()
);//end addWindowListener



//Register an ActionListener on the
// startButton.
startButton.addActionListener(
new ActionListener(){
public void actionPerformed(
ActionEvent e){
startButton.setEnabled(false);
//Get a directory listing
File dataDir = new File(dataPath);
//The following code creates a
// directory listing containing only
// those files that begin with +OK.
//This is an anonymous implementation
// of a class that implements
// FilenameFilter.
String[] dirList = dataDir.list(
new FilenameFilter(){
public boolean accept(
File dir,String name){
if(!(new File(dir,name).
isFile())) return false;
return name.startsWith("+OK");
}//end accept
}//end FilenameFilter
);//end list

//Now process the files in the
// directory
int msgCounter = 0;
for(msgCounter = 0;
msgCounter < dirList.length;
msgCounter++){
String fileName =
dirList[msgCounter];
pathFileName = dataPath + fileName;

//Get the original message number
// used by the server to ID the msg.
String strMsgNumber =
fileName.substring(
fileName.indexOf(" "),
fileName.lastIndexOf(" "))
.trim();
msgNumber =
Integer.parseInt(strMsgNumber);
System.out.print("" + msgNumber
+ ", ");

//Process the message
startProcess();
}//end for loop on directory length

//Write the possibly modified
// goodPhraseList into an output file
writeGoodPhraseList();

//Make it possible for the user to
// delete all processed messages from
// the server, and notify the user that
// the time has come for a deletion
// decision.
deleteButton.setEnabled(true);
textArea.append("\nDo you want to "
+ "delete messages from server?\n");
//Sound an audio alert
try{
Toolkit.getDefaultToolkit().beep();
Thread.sleep(300);
Toolkit.getDefaultToolkit().beep();
Thread.sleep(300);
Toolkit.getDefaultToolkit().beep();
}catch(Exception ex){
ex.printStackTrace();}
}//end actionPerformed
}//end ActionListener
);//end addActionListener



//Register an action listener on the delete
// button
deleteButton.addActionListener(
new ActionListener(){
public void actionPerformed(
ActionEvent e){
deleteButton.setEnabled(false);
textArea.append("\n");

//Get connected to the email server
int port = 110; //pop3 mail port
try{
//Get a socket, connected to the
// specified server on the specified
// port.
socket = new Socket(server,port);

//Get an input stream from the socket
inputStream = new BufferedReader(
new InputStreamReader(
socket.getInputStream()));

//Get an output stream to the socket
outputStream = new PrintWriter(
new OutputStreamWriter(
socket.getOutputStream()),true);

//Display the msg received from the
// server on the command-line screen
// immediately following connection.
String connectMsg =
validateOneLine();
System.out.println(
"Connected to server "
+ connectMsg);

//The communication process is now in
// the AUTHORIZATION state. Send the
// user name and password to the
// server.
outputStream.println("USER "
+ userName);
//Get response and confirm that the
// response was +OK and was not -ERR.
String userResponse =
validateOneLine();
//Display the response on the
// command-line screen.
System.out.println("USER "
+ userResponse);
//Send the password to the server
outputStream.println("PASS "
+ password);
//Validate the server's response as
// +OK. Display the response in the
// process.
System.out.println("PASS "
+ validateOneLine());
}catch(Exception ex){
ex.printStackTrace();}



//Process the files in the msgToDelete
// collection and delete those messages
// from the email server
for(int cnt = 0;
cnt < msgToDelete.size();cnt++){
pathFileName = (String)msgToDelete.
elementAt(cnt);
String strMsgNumber = pathFileName.
substring(pathFileName.indexOf(" "),
pathFileName.lastIndexOf(" ")).
trim();
int msgNumber = Integer.parseInt(
strMsgNumber);

//Deletion of a message from the
// server is accomplished by marking
// the message for deletion while in
// the TRANSACTION state. The
// message is actually deleted when
// the client sends a QUIT command
// to the server causing the server
// to enter the UPDATE state. If the
// program aborts prematurely before
// sending a QUIT command, marked
// messages are not deleted from the
// server.
//Mark the message for deletion.

//Message deletion has been disabled
// for test purposes.
textArea.append(
"\nMessage deletion disabled");

/*
outputStream.println("DELE "
+ msgNumber);


//Validate the response and display
// it on the GUI.
textArea.append(
"Msg: " + msgNumber + " "
+ validateOneLine()+"\n");
textArea.append(
"Deleted:" + msgNumber + "\n");
*/
//Now move the file that has been
// processed and deleted from the
// server to the archive folder on
// the local disk.
BigDog02b.moveFile(pathFileName,
archivePath);

}//end for loop on msgToDelete.size()


//Terminate the session with the
// server causing the messages to
// actually be deleted from the server.
outputStream.println("QUIT");
String quitResponse =
validateOneLine();
//Display the response on the
// command-line screen.
System.out.println(
"QUIT " + quitResponse);

//Server is now in the UPDATE mode.
// It will delete all files marked
// with the DELE command earlier
// in the execution of the program.
//Close the socket
try{
socket.close();
}catch(Exception ex){
System.out.println("\n" + ex);}

textArea.append("\n\nMessages deleted "
+ "from server.\n");
}//end actionPerformed
}//end ActionListener
);//end addActionListener



//Configure the GUI by placing the
// various components on it, setting the
// size, and making it visible.
add(startButton);
add(deleteButton);
deleteButton.setEnabled(false);
add(textArea);
textArea.setText("");
setLayout(new FlowLayout());

setTitle("Copyright 2004, R.G.Baldwin");
setSize(400,400);
//Make the GUI visible.
setVisible(true);
}//end constructor
//===========================================//

//Validate a one-line response.
//The purpose of this method is to confirm that
// the server returned +OK and not -ERR to the
// previous command.
//If +OK, the method returns the string
// returned by the server.
//If -ERR, the method displays the string
// returned by the server and terminates the
// session.
private String validateOneLine(){
try{
String response = inputStream.readLine();
if(response.startsWith("+OK")){
return response;
}else{
System.out.println(response);
//Terminate the session.
outputStream.println("QUIT");
socket.close();
System.out.println(
"Premature QUIT on -ERR");
System.exit(0);
}//end else
}catch(IOException e){
System.out.println("\n" + e);}
//The following return statement is required
// to satisfy the compiler.
return "Make compiler happy";
}//end validateOneLine()
//===========================================//

//The purpose of this method is to kick off the
// processing of a new message.
void startProcess(){
//Create a three-digit string representing
// the message number. This will be used to
// tag the subject before the message is
// forwarded to the secret email account.
if(msgNumber < 10){
msgNumberStr = "00" + msgNumber;
}else if(msgNumber > 99){
msgNumberStr = "" + msgNumber;
}else{
msgNumberStr = "0" + msgNumber;
}//end else

//Get and save the unique identifier assigned
// by the public email server.
uidl = pathFileName.substring(
pathFileName.lastIndexOf(" "));

//Determine the type of message and take the
// appropriate action.

if(isBad()){
//This message was determined to be from
// a confirmed spammer, virus writer, other
// machine, or some other undesirable
// source. No point in sending them a
// message. Tag the message as {BD}
// and forward it to the secret email
// account.
processBad();
}else if(isReply()){
//This message is a reply to a previous
// message sent to someone inviting them
// to confirm that they are a human and
// not a machine. Add the email address
// to the list of good addresses for
// future messages, retrieve the original
// message that triggered the inquiry, tag
// the original message as {GD} and
// forward it to the secret email account.
// This is the most complex of all the
// processing tasks in the program.
processReply();
}else if(isGood()){
//This message was determined either to be
// from an approved sender, or to have an
// approved subject. Tag the message as
// {GD} and forward it to the secret email
// account.
processGood();
}else if(isSpam()){
//This message has been processed by a spam
// screener and has been determined to be
// spam. It will be marked {SP} along with
// a spam score before being written into
// the MBOX file.
processSpam();
}else{
//This message is from an unknown address.
// It is probably spam, but may be from
// someone worth communicating with. Send
// a message asking the sender to confirm
// that they are a human. Tag the message
// as {QU} and forward it to the secret
// email account. If a reply is received
// in a reasonable time, that reply will
// trigger the processReply procedure
// described above. Otherwise, manually
// delete the message from the local
// archive folder after a reasonable
// amount of time has transpired.
processQuarantine();
}//end else

}//end startProcess
//===========================================//

//Purpose: To write the data from a TreeSet
// object into an output file.
//This method is the reverse of the method
// named makeGoodPhraseList.

void writeGoodPhraseList(){
try{
DataOutputStream dataOut =
new DataOutputStream(
new FileOutputStream(
goodPhraseFile));

//Use an iterator to access the data in
// the TreeSet object.
Iterator iter = goodPhraseList.iterator();
String data;

while(iter.hasNext()){
data = (String)iter.next();
dataOut.writeBytes(data + "\n");
}//end while

dataOut.close();
}catch(Exception e){e.printStackTrace();}

}//end writeGoodPhraseList
//===========================================//

//This method tests the sender of the message
// and the subject of the message against the
// list of items in the badPhraseFile.
// Returns true on match, false otherwise.
private boolean isBad(){
boolean match = false;

//Get the Subject line, decode if necessary,
// and convert it to upper case
subject = BigDog02b.readLines(
pathFileName,"Subject:","Subject:");
subject = BigDog02b.decodeSubj(subject);
subject = subject.toUpperCase();

//Get the sender and convert it to upper
// case
sender = BigDog02b.readLines(pathFileName,
"From:","From:");
sender = sender.toUpperCase();

//The Subject and From lines have been
// captured. Screen each of them against
// an upper case version of words and
// phrases in a TreeSet object containing
// quarantine email addresses and subjects.
match = screenForBadSubjAndFromLines();
return match;
}//end isBad method
//===========================================//

//This method screens the Subject and From
// lines to determine if they contain bad
// subjects or email addresses. If so, the
// method returns true. Otherwise, it returns
// false. The bad word or phrase must appear
// verbatim, ignoring case, within either line.
private boolean
screenForBadSubjAndFromLines(){
Iterator iterator =
badPhraseList.iterator();
while(iterator.hasNext()){
String badWord =
((String)(iterator.next())).
toUpperCase();
if(!(badWord.equals(""))){
if((subject.indexOf(badWord) != -1) ||
(sender.indexOf(badWord) != -1)){
//An exact match was found.
return true;
}//end if((subject.indexOf...
}//end if!(badWord.equals("")
}//end while iterator has next
return false;
}//end screenForBadSubjAndFromLines
//===========================================//

//This method is used to process messages that
// have been determined to be in the bad
// category.
void processBad(){

BigDog02b.forwardEmailMsg(
recipient,
smtpServer,
"{BD}{"+msgNumberStr+"}",
pathFileName);

//Add this message to the list of messages
// scheduled to be deleted from the public
// email server
msgToDelete.add(pathFileName);

}//end processBad
//===========================================//

//This method tests the subject of the current
// message to determine if the message is a
// reply to a message sent to an email address
// earlier. If the subject contains +OK, it is
// assumed to be a reply because that is
// the beginning of a unique ID assigned to
// each message that is sent. It is also the
// beginning of the file name by which message
// files are stored locally. Returns true on
// match, false otherwise. If it is a reply,
// the unique ID in the subject of the message
// matches the file name of the earlier
// message that triggered the sending of an
// email message to the email address. That
// makes it possible to locate and retrieve
// the original message from a local archive
// folder.
private boolean isReply(){
boolean match = false;
String subject = "";

//Get the subject, decode if necessary, and
// convert to upper case
subject = BigDog02b.readLines(pathFileName,
"Subject:","Subject:");
subject = BigDog02b.decodeSubj(subject);
subject = subject.toUpperCase();

if(subject.indexOf("+OK ") != -1){
//Tag the message {GD} and forward it to
// the secret email account. The file in
// the archives represented by this message
// should also be forwarded to the secret
// email account by the processReply
// method if it can be found.
BigDog02b.forwardEmailMsg(
recipient,
smtpServer,
"{GD}{"+msgNumberStr+"}",
pathFileName);

return true;
}else{
return false;
}//end else
}//end isReply method
//===========================================//

//This method uses information in the subject
// of the current message to retrieve an
// earlier message file from a local archive
// folder. The earlier message is tagged {GD}
// and forwarded to the secret email account.

private void processReply(){
String sender = "No sender identified";
String emailAddr =
"No email address identified";
String subject = "";

//Beep twice to alert the user that a reply
// is being processed.
Toolkit.getDefaultToolkit().beep();
try{
Thread.sleep(200);
}catch(Exception ex){System.out.println(ex);}
Toolkit.getDefaultToolkit().beep();

//Get the subject, decode if necessary, and
// trim off the newline character
subject = BigDog02b.readLines(pathFileName,
"Subject:","Subject:");
subject = BigDog02b.decodeSubj(subject).
trim();

//Now parse the subject to get the name of
// the original file.
File theFile = null;
try{
//Note, this assumes that the requested
// file is now located in the folder
// pointed to by archivePath
theFile = new File(archivePath
+ subject.substring(
subject.indexOf("+OK")));
}catch(Exception ex){
System.out.println("\n" + ex);
System.out.println("Getting theFile");
System.out.println("pathFileName:"
+ pathFileName);
}//end catch
textArea.append("\nProcessing reply message "
+ msgNumberStr + "\n" + subject
+ "\nFile: " + theFile + "\n");
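//theFile is null if the subject could not be
// parsed above, so guard against that case.
if((theFile != null) && theFile.exists()){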
//Read the file from the local archive
// folder. Extract the email address. Add
// the email address to the goodPhraseList.
// Tag the message {GD} and forward it to
// the secret email account. Note that the
// last parameter identifies the path and
// file name of the file being retrieved.
BigDog02b.forwardEmailMsg(
recipient,
smtpServer,
"{GD}{"+msgNumberStr+"}",
theFile.toString());

//Add the message to the list of messages
// scheduled for deletion from the public
// email server.
msgToDelete.add(pathFileName);

//Now get the sender email address and add
// it to the goodPhraseList
//Get the sender, convert to upper case,
// and trim off the new line character.
sender = BigDog02b.readLines(
theFile.toString(),
"From:","From:");
sender = sender.toUpperCase().trim();

//Deal with the format of the email
// address. Some have the email address
// in angle brackets with something like a
// name ahead of the angle brackets.
// Others simply have an email address.
try{
if((sender.indexOf("<") != -1)
&& (sender.indexOf(">") != -1)){
emailAddr = sender.substring(
sender.indexOf("<") + 1,
sender.indexOf(">")).toUpperCase();
}else if(sender.indexOf(" ") != -1){
//Get rid of text ahead of the email
// address
emailAddr = sender.substring(sender.
lastIndexOf(" ") + 1).toUpperCase();
}else{
emailAddr = sender.toUpperCase();
}//end else
}catch(Exception ex){
System.out.println("\n" + ex);
System.out.println(
"Getting sender for goodPhraseList");
System.out.println("sender:" + sender);
System.out.println("pathFileName:"
+ pathFileName);
}//end catch

//Add the email address to the good list.
goodPhraseList.add(emailAddr);

}else{
textArea.append("\nUnable to locate file "
+ "referred to in reply.\n");
//Beep to alert the user of this problem.
Toolkit.getDefaultToolkit().beep();
try{
Thread.sleep(200);
}catch(Exception ex){
System.out.println(ex);}
Toolkit.getDefaultToolkit().beep();
}//end else

}//end processReply
//===========================================//

//This method tests the sender of the message
// and the subject of the message against the
// list of items in the goodPhraseList. Returns
// true on match, false otherwise.
private boolean isGood(){
boolean match = false;
//Get the subject, decode if necessary, and
// convert to upper case
subject = BigDog02b.readLines(pathFileName,
"Subject:","Subject:");
subject = BigDog02b.decodeSubj(subject);
subject = subject.toUpperCase();

//Get the sender and convert to upper case
sender = BigDog02b.readLines(pathFileName,
"From:","From:");
sender = sender.toUpperCase();

//The Subject and From lines have been
// captured. Screen each of them against
// an upper case version of words and
// phrases in a TreeSet object containing
// good email addresses and subjects.
match = screenForGoodSubjAndFromLines();
return match;
}//end isGood method
//===========================================//

//This method screens the Subject and From
// lines to determine if they contain good
// subjects or email addresses. If so, the
// method returns true. Otherwise, it returns
// false. A match occurs when a line contains
// the phrase on an upper-case basis.
private boolean
screenForGoodSubjAndFromLines(){
Iterator iterator =
goodPhraseList.iterator();
while(iterator.hasNext()){
String goodWord =
((String)(iterator.next())).
toUpperCase();
if(!(goodWord.equals(""))){
if((subject.indexOf(goodWord) != -1) ||
(sender.indexOf(goodWord) != -1)){
//An exact match was found.
System.out.println("\ngoodWord:"
+ goodWord);
return true;
}//end if((subject.indexOf...
}//end if!(goodWord.equals("")
}//end while iterator has next
return false;
}//end screenForGoodSubjAndFromLines
//===========================================//

//This method processes a message that has been
// determined to be a good message. Forward the
// message to the secret email account and add
// the identification of the message to the
// list of messages scheduled for deletion from
// the server later.
//Don't add it to the deletion list if
// forwarding failed.
void processGood(){
okToDelete = BigDog02b.forwardEmailMsg(
recipient,
smtpServer,
"{GD}{"+msgNumberStr+"}",
pathFileName);
if(okToDelete){
msgToDelete.add(pathFileName);
}//end if

}//end processGood
//===========================================//


//This method is used to process messages that
// have been determined to be in the quarantine
// category. These are messages which probably
// are spam or viruses, sent by machines.
// However, some small percentage may have been
// sent by a human who wishes to communicate in
// a meaningful way, but whose email address
// has not yet been entered into the good list.
// As a result, each of these messages
// triggers an email message to be sent
// automatically asking the sender to
// demonstrate that they are a human by
// replying to the message. The original
// message is tagged {QU} and forwarded to the
// secret email account. It is also stored in
// a local archive folder. The receipt of a
// reply later will cause the original message
// to be retrieved from the local archive
// folder, tagged {GD}, and forwarded to the
// secret email account.
void processQuarantine(){

String subject = "";
String sender = "";
String date = "";
String header = "";

//Read the message from a local file, tag it
// {QU} and forward it to the secret email
// account.
//You can tag the subject with any
// string that you want to pass as
// the third parameter. I elected to add
// the original message number and the spam
// score to the tag. This information is
// useful when using ordinary email program
// filters to direct the messages to
// specific email folders.
BigDog02b.forwardEmailMsg(
recipient,
smtpServer,
"{QU}{"+msgNumberStr+"}{"
+hitCount+"}",
pathFileName);

//Add the message to the list of messages
// scheduled for deletion from the public
// email server.
msgToDelete.add(pathFileName);

//Now prepare for composing and sending the
// email message to the sender of the
// current message.
//Get the Subject line, decode if necessary,
// and convert it to upper case
subject = BigDog02b.readLines(pathFileName,
"Subject:","Subject:");
subject = BigDog02b.decodeSubj(subject);
subject = subject.toUpperCase();

//Get the sender in upper case
sender = BigDog02b.readLines(pathFileName,
"From:","From:");
sender = sender.toUpperCase();

//Now get the date.
date = BigDog02b.readLines(pathFileName,
"Date:","Date:");
date = date.toUpperCase();

//Now get the header of the original message.
header = BigDog02b.readLines(pathFileName,
null,"Status:");

//Use this information to send an email
// message to the sender. Need to avoid a
// substring index error later if the sender
// or the subject is blank.
if(!(sender.equals("")
|| subject.equals(""))){
sendEmailMsg(sender,subject,date,
header,pathFileName);
}else{
textArea.append(
"\nUnable to send message\n");
}//end else
}//end processQuarantine
//===========================================//

//This method is used to automatically send an
// email message to the sender of every
// quarantine message, asking them to indicate
// that they are a human rather than a machine
// by replying to the message.
//The incoming sender parameter is used to
// establish the address of the recipient.
//The incoming parameter subject is reported
// to the recipient along with the date to
// identify the message to the recipient.
//The incoming pathFileName is used to
// place a unique identifier in the subject of
// the message that is sent. This identifies
// the original message that triggered this
// event.
private void sendEmailMsg(String sender,
String subject,
String date,
String header,
String pathFileName){
//Enable the following two statements and
// enclose the remaining body of the method
// in a large block comment when testing the
// program to avoid sending nuisance
// messages.
textArea.append("sendEmail disabled\n");
return;

/* //Start a block comment here to disable
//Don't send messages to any email address
// on the doNotSendList.
boolean okToSend = true;
for(int cnt = 0; cnt < doNotSendList.length;
cnt++){
if(sender.toUpperCase().indexOf(
doNotSendList[cnt].
toUpperCase()) != -1){
okToSend = false;
textArea.append("\nDon't send to: "
+ sender.toUpperCase() + "\n");
break;
}//end if
}//end for loop


if(okToSend){
//Get the email address from the incoming
// parameter sender. Sometimes the actual
// address is enclosed in angle brackets.
String emailAddr = "";
if((sender.indexOf("<") != -1) &&
(sender.indexOf(">") != -1)){
try{
emailAddr = sender.substring(
sender.indexOf("<") + 1,
sender.indexOf(">"));
}catch(Exception ex){
System.out.println("\n" + ex);
System.out.println("Sending email");
System.out.println(
"Getting emailAddr in <>");
System.out.println("sender:" + sender);
System.out.println("pathFileName:"
+ pathFileName);
System.out.println("Forcing a valid "
+ "email address structure");
emailAddr = "dummy@dummy.com";
}//end catch
}else{
//Sometimes the email address simply
// follows the word From: in the header
// of the message from which the sender
// parameter is derived.
try{
emailAddr = sender.substring(
sender.toUpperCase().indexOf(
"FROM:")+5);
}catch(Exception ex){
System.out.println("\n" + ex);
System.out.println("Sending email");
System.out.println(
"Getting emailAddr");
System.out.println("sender:" + sender);
System.out.println("pathFileName:"
+ pathFileName);
System.out.println("Forcing a valid "
+ "email address structure");
emailAddr = "dummy@dummy.com";
}//end catch
emailAddr = emailAddr.trim();
}//end else

//Make sure that emailAddr contains an @
// indicating that it is probably a
// properly formatted email address.
if(emailAddr.indexOf("@") == -1){
Toolkit.getDefaultToolkit().beep();
try{
Thread.sleep(200);
}catch(Exception ex){
System.out.println(ex);}
Toolkit.getDefaultToolkit().beep();
System.out.println("\nCan't send to:"
+ emailAddr);
return;
}//end if

//Extract the file name from the
// pathFileName parameter and the actual
// subject from the incoming subject
// parameter.
String fileName = "No file name available";
try{
fileName = pathFileName.substring(
pathFileName.lastIndexOf("/") + 1);
String theSubject = subject.substring(9);
}catch(Exception ex){
System.out.println("\n" + ex);
System.out.println("Sending email");
System.out.println(
"Getting fileName and theSubject");
System.out.println("subject:" + subject);
System.out.println("fileName:"
+ fileName);
System.out.println("pathFileName:"
+ pathFileName);
}//end catch

//Display information about the message. I
// may decide to write this into a history
// file later so that I will have a record
// of messages sent.
textArea.append("\nSending email to:\n"
+ emailAddr +
"\n" + fileName + "\n"
+ date.trim() + "\n");

try{
//Pass a string containing the name of
// the smtp server as a parameter to the
// SmtpClient constructor.
SmtpClient smtp =
new SmtpClient(smtpServer);

//Pass the sender's email address to the
// from() method.
smtp.from(fromAddr);

//Pass the email address of the recipient
// to the method named to().
smtp.to(emailAddr);

//Get an output stream for the message
PrintStream msg = smtp.startMessage();

//Write the message header in the output
// stream.
msg.println("To: " + emailAddr);
msg.println("Subject: " +
subjOut + fileName);
msg.println();//blank line

//Write the text of the message in the
// output stream.
msg.println(
"I recently received a message from your\n"+
"Email address with the following subject\n"+
"and date:\n\n"+

subject + "\n" +
date + "\n\n" +

"Because your Email address has not been \n"+
"entered into the Approved Sender list of my \n"+
"SPAM blocking software, the message has been\n"+
"placed in the Quarantine folder. To move \n"+
"the message from the Quarantine folder into \n"+
"my Inbox, you will need to press your Reply \n"+
"button and send this message back to me \n"+
"making no changes to the Subject line or the\n"+
"body of the message. This will also cause \n"+
"your Email address to be added to my \n"+
"Approved Sender list so that future messages\n"+
"from you won't be similarly delayed.\n\n"+

"I apologize for this inconvenience. \n"+
"However, due to the large amount of SPAM \n"+
"that I must contend with, I have been \n"+
"forced to implement a mail handling system \n"+
"that asks you for a one-time confirmation \n"+
"that you intend to communicate with me via \n"+
"Email.\n\n"+

"If you didn't send the original message, I \n"+
"apologize for the intrusion. However, it is\n"+
"possible that someone is using your Email \n"+
"address for misleading, possibly fraudulent,\n"+
"and possibly malicious purposes. I strongly\n"+
"encourage you to file a complaint regarding \n"+
"the inappropriate use of your Email address.\n"+

"I have provided all of the information below\n"+
"that you will need to file such a \n"+
"complaint.\n\n"+


"The information provided below my signature\n"+
"block is the full header of the original \n"+
"Email message. You will find a short \n"+
"tutorial at \n"+
"http://www.dickbaldwin.com/java/Java2158.htm\n"+
"that explains how to use this header to file\n"+
"a complaint.\n\n"+

"If we all band together in opposing SPAM and \n"+
"Email viruses, perhaps we can have a \n"+
"positive impact on this increasingly serious\n"+
"problem.\n\n"+

"Regards,\n"+ signature +

"=======HEADER BEGINS HERE========\n\n"+
header +"\n"

);//end of message

//Close the stream and send the message
smtp.closeServer();

}catch( Exception e ){
System.out.println("\n" + e);
System.out.println("Sending email");
System.out.println(pathFileName);
}//end catch
}//end if(okToSend)
*/ //end a block comment here to disable
}//end sendEmailMsg
//===========================================//


//Purpose: To create a TreeSet object
// containing words used to screen the message
// From and Subject lines.
//This method reads strings from a text file
// and creates the list as a TreeSet object
// with no duplicates.
//Only the primary portion of the good
// Email address should be included in the
// file used to create the list. This would
// be x@y.z

//After creating the list, it writes the data
// from the list into a backup file named
// ....bakN, where N is the value of the
// next available file name in the directory.
//A new backup file with a unique name is
// created each time the program is run. Once
// the number of backup files reaches 5, the
// program automatically deletes the oldest
// file before creating a new backup
// file. Thus the program automatically
// maintains a sequence of five backup files
// with extensions .bak0 through .bak5 with one
// number missing. The age-order of the files
// should be determined by the modification date
// and not by the name of the file.
//The data read from the file is converted to
// upper case before being added to the TreeSet
// object.

void makeGoodPhraseList(){
goodPhraseList = new TreeSet();

//Read words or phrases from text file and
// populate the TreeSet object.
try{
BufferedReader inData
= new BufferedReader(new FileReader(
goodPhraseFile));
String data; //temp holding area

while((data = inData.readLine()) != null){
goodPhraseList.add(data.toUpperCase());
}//end while loop

inData.close();//Close input file

//Write a backup file before making any
// modifications to the data.

//First determine the name of the next
// backup file allowed in the directory.
int N = 0;
File theFile = null;
String baseFileName = goodPhraseFile.
substring(0,goodPhraseFile.indexOf(
".txt"));
for(N = 0;N < 6;N++){
theFile = new File(baseFileName
+ ".bak" + N);
if(!(theFile.exists()))break;
}//end for loop

//Cause N to rotate from 0 through 5
if(N == 5){//del file 0 for use next time
new File(baseFileName
+ ".bak0").delete();
}//end if
else{//delete the next file in sequence
if(new File(
baseFileName + ".bak"
+ (N + 1)).exists()){
new File(
baseFileName + ".bak"
+ (N + 1)).delete();
}//end if
}//end else

//Now write the output file
DataOutputStream dataOut =
new DataOutputStream(
new FileOutputStream(
theFile));

//Use an Iterator object to access the data
// in the TreeSet object.
Iterator iter = goodPhraseList.iterator();

while(iter.hasNext()){
data = (String)iter.next();
dataOut.writeBytes(data + "\n");
}//end while

dataOut.close();
}catch(Exception e){e.printStackTrace();}
}//end makeGoodPhraseList
//===========================================//

//Purpose: To create a TreeSet object
// containing words used to screen the message
// From and Subject lines.
//This method reads strings from a text file
// and creates the list as a TreeSet object
// with no duplicates.
//Only the primary portion of the bad
// Email address should be included in the
// file used to create the list. This would
// be x@y.z

//After creating the list, it writes the data
// from the list back out into the file. This
// is done to keep the contents of the file
// sorted in upper case. Since the program
// doesn't modify the contents of the list,
// there is no point in creating backup files.

void makeBadPhraseList(){
badPhraseList = new TreeSet();

//Read words or phrases from text file and
// populate the TreeSet object.
try{
BufferedReader inData
= new BufferedReader(new FileReader(
badPhraseFile));
String data; //temp holding area

while((data = inData.readLine()) != null){
badPhraseList.add(data.toUpperCase());
}//end while loop

inData.close();//Close input file

//Now write the output file
DataOutputStream dataOut =
new DataOutputStream(
new FileOutputStream(
badPhraseFile));

//Use an Iterator object to access the data
// in the TreeSet object.
Iterator iter = badPhraseList.iterator();

while(iter.hasNext()){
data = (String)iter.next();
dataOut.writeBytes(data + "\n");
}//end while

dataOut.close();
}catch(Exception e){e.printStackTrace();}
}//end makeBadPhraseList
//===========================================//

//This method passes the message through a spam
// screener to determine if it should be
// considered spam. The screener program
// produces and returns a score based on the
// number of hits against offensive words and
// phrases. The number of hits is compared to
// a hitLimit value that is established in the
// general instance variables at the beginning
// of the program. When the number of hits
// reaches that value, the screener terminates
// in order to avoid wasting time. If that
// limit has been reached, this method returns
// true indicating that the message is thought
// to be spam. Otherwise, it returns false. If
// it returns true, the control program invokes
// the method named processSpam to deal with
// the message.
private boolean isSpam(){
BigDog02SpamScreen01 screener =
new BigDog02SpamScreen01(dataPath,
subjAndHtmlPhraseFile,
rawTextPhraseFile,
hitLimit);

hitCount = screener.screenMsg(pathFileName);
if(hitCount >= hitLimit){
return true;
}else{
return false;
}//end else
}//end isSpam method
//===========================================//

//This method deals with a message that has
// been identified as spam.
void processSpam(){

//Forward the message to the secret email
// account.
//You can tag the subject with any string
// that you want to pass as the third
// parameter. I elected to tag it with {SP}
// indicating that it is spam. I also added
// the message number and the spam score,
// which may be useful for using email
// program filters to cause the messages to
// be directed to specific email folders.
BigDog02b.forwardEmailMsg(
recipient,
smtpServer,
"{SP}{"+msgNumberStr+"}{"+hitCount+"}",
pathFileName);

//Add this message to the list of messages
// scheduled to be deleted from the public
// email server
msgToDelete.add(pathFileName);

}//end processSpam
//===========================================//
}//end class BigDog02i
//=============================================//

Listing 3
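
Both processReply and sendEmailMsg in the listing above have to pull an email address out of a From: line, which may have the address inside angle brackets after a name, or may simply have the address following the word From:.  The following is a minimal stand-alone sketch of that idea, not a part of the BigDog programs.  The class name FromLineSketch and the method name extractAddress are hypothetical names that I made up for illustration, and the sketch assumes that an address without angle brackets is the last space-delimited token on the line.

```java
class FromLineSketch{
  //Hypothetical helper, not part of BigDog.
  // Returns the email address from a From: line
  // in upper case.
  static String extractAddress(String fromLine){
    String sender = fromLine.trim();
    int open = sender.indexOf('<');
    int close = sender.indexOf('>');
    if((open != -1) && (close > open)){
      //The address is enclosed in angle brackets,
      // possibly with a name ahead of it.
      return sender.substring(open + 1,close)
                         .trim().toUpperCase();
    }//end if
    int space = sender.lastIndexOf(' ');
    if(space != -1){
      //Assume the address is the last token and
      // discard the text ahead of it.
      return sender.substring(space + 1)
                                .toUpperCase();
    }//end if
    //Nothing but an address on the line.
    return sender.toUpperCase();
  }//end extractAddress

  public static void main(String[] args){
    System.out.println(extractAddress(
        "From: Joe Smith <joe@example.com>"));
    System.out.println(extractAddress(
        "From: joe@example.com"));
    System.out.println(extractAddress(
        "joe@example.com"));
  }//end main
}//end class FromLineSketch
```

All three calls in main display JOE@EXAMPLE.COM.  Converting to upper case matches the way that the good and bad lists are stored, so the extracted address can be compared with, or added to, either list without further conversion.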

File BigDog02j

/*File BigDog02j.java
Copyright 2004, R.G.Baldwin
Rev 02/28/04

This program processes a set of message files
written by the program named BigDog02g. This
program tags messages as {GD}, {QU}, {SP}, or {BD}
and writes them into an MBOX file in the local
directory tree for a dummy email account.

This is a fast alternative to the program named
BigDog02i. This program can be used by persons
whose email client program stores messages
locally in MBOX format.

Messages tagged {GD} are messages whose sender
or subject matches a word or phrase in a GOOD
list.

Messages tagged {BD} are messages whose sender
or subject matches a word or phrase in a BAD
list.

Messages tagged {SP} are messages that were
identified by a spam filter as containing spam.

Remaining messages are tagged {QU}. The senders
of all messages tagged {QU} are sent a challenge
message asking them to reply and confirm that
they actually sent the original message.

In addition, this program monitors for REPLY
messages where the subject contains +OK. When
a REPLY message is received, the sender is
added to the GOOD list, the original message
referred to by the unique code in the subject
is retrieved from the archive folder, the
retrieved message is tagged {GD}, and the tagged
message is written into the MBOX file.

You should terminate your email client program
before running this program. Otherwise, it may
not recognize the new email folder until you stop
and then restart the email client program.

This program should be run after the program
named BigDog02g has been run, and after a virus
checker has been used to confirm that all files
in the working directory produced by BigDog02g
are free of viruses. See additional comments at
the beginning of BigDog02g.java for a
description of this program.

For technical information on POP3, see RFC 1725
at
http://www.cis.ohio-state.edu/htbin/rfc/rfc1725.html

A POP3 Command Summary follows based on the
information at that web site.

Minimal POP3 Commands:
USER name
PASS string
QUIT
STAT
LIST [msg]
RETR msg
DELE msg
NOOP
RSET
QUIT

Optional POP3 Commands:
APOP name digest
TOP msg n
UIDL [msg]

POP3 Replies:
+OK
-ERR

This program uses the DELE command to delete
messages from the public POP3 server at the
request of the user.

This program uses an object of the class named
BigDog02SpamScreen01 to screen messages to
determine if they contain spam.

Certain portions of this program have been
disabled for test purposes. Search for the word
disable to identify those portions.

Tested using SDK 1.4.2 under WinXP
************************************************/

import java.net.*;
import java.io.*;
import java.util.*;
import java.awt.*;
import java.awt.event.*;
import sun.net.smtp.SmtpClient;

class BigDog02j extends Frame{
//All of the user-specific information is
// provided here.

//Beginning of subject for outgoing message.
String subjOut = "Put your subj here ";
//Signature on outgoing message.
String signature = "Your signature\n\n";
//List of email addresses that should not be
// sent an email message regardless of any
// other circumstance. It should probably
// include your own email addresses as a
// minimum.
String[] doNotSendList =
{"you@yourAddress"
};//end of list
//The From: address in outgoing email message.
String fromAddr = "you@yourAddress";

//An smtp server through which the user is
// authorized to send email messages.
String smtpServer = "See command-line input";
//Local folder where message files are stored
// awaiting processing. You may want to modify
// this on your machine. On my machine, this
// folder is a subfolder of the folder
// containing the Java class files (the
// execution directory).
String dataPath = "./DataFiles/";
//Local folder where the messages are stored
// after they have been processed. They are
// automatically moved to this folder after
// being deleted from the email server.
String archivePath = "./Archives/";
//Path to the local folder where you write
// files to cause them to be treated as email
// folders. Note that this doesn't have to be
// a valid email account so long as the email
// client program considers it to be valid. In
// other words, you can create a new account in
// your email client program using dummy server
// names, etc.
String emailPath =
"C:/Baldwin/DummyMailAccount/";
//Following two files contain lists of phrases
// used in processing the messages.
String goodPhraseFile = "BigDog02GoodList.txt";
String badPhraseFile = "BigDog02BadList.txt";
//End of user-specific information.

//Following are working variables used by the
// program for various purposes.
TreeSet goodPhraseList;
TreeSet badPhraseList;
BufferedReader inputStream;
PrintWriter outputStream;
Socket socket;
String pathFileName;
Vector msgToDelete = new Vector();
Button startButton = new Button("Start/Next");
Button deleteButton = new Button(
"Delete Msg On Server");
TextArea textArea = new TextArea(20,50);
String uidl;
String subject = "No Subject line found";
String sender = "No From line found";
String msgNumberStr = "000";
int msgNumber = 0;
StringBuffer mBoxStrBuf = new StringBuffer("");
String newFolder;
String subjAndHtmlPhraseFile =
"BigDog02SubjAndHtml.txt";
String rawTextPhraseFile =
"BigDog02RawText.txt";
int hitCount = 0;
int hitLimit = 6;


public static void main(String[] args){
if(args.length != 4){
System.out.println(
"Usage: java BigDog02j "
+ "pubServer userName password "
+ "smtpServer");
System.exit(0);
}//end if

//Construct an object of this class
new BigDog02j(args[0],args[1],args[2],
args[3]);
}//end main
//===========================================//

//Constructor
BigDog02j(final String server,
final String userName,
final String password,
String smtpServer){

this.smtpServer = smtpServer;
newFolder = "A" + new Date().getTime();

makeGoodPhraseList();
makeBadPhraseList();

//Register a window listener to service
// the close button on the Frame.
this.addWindowListener(
new WindowAdapter(){
public void windowClosing(WindowEvent e){
System.exit(0);
}//end windowClosing
}//end WindowAdapter()
);//end addWindowListener



//Register an ActionListener on the
// startButton.
startButton.addActionListener(
new ActionListener(){
public void actionPerformed(
ActionEvent e){
startButton.setEnabled(false);
//Get a directory listing
File dataDir = new File(dataPath);
//The following code creates a
// directory listing containing only
// those files that begin with +OK.
//This is an anonymous implementation
// of a class that implements
// FilenameFilter.
String[] dirList = dataDir.list(
new FilenameFilter(){
public boolean accept(
File dir,String name){
if(!(new File(dir,name).
isFile())) return false;
return name.startsWith("+OK");
}//end accept
}//end FilenameFilter
);//end list

//Now process the files in the
// directory
int msgCounter = 0;
for(msgCounter = 0;
msgCounter < dirList.length;
msgCounter++){
String fileName =
dirList[msgCounter];
pathFileName = dataPath + fileName;

//Get the original message number
// used by the server to ID the msg.
String strMsgNumber =
fileName.substring(
fileName.indexOf(" "),
fileName.lastIndexOf(" "))
.trim();
msgNumber =
Integer.parseInt(strMsgNumber);
System.out.print("" + msgNumber
+ ", ");

//Process the message
startProcess();
}//end for loop on directory length


try{
System.out.println("Writing: "
+ emailPath + newFolder);
//Write the updated string into the
// MBOX file (email folder).
DataOutputStream dataOut =
new DataOutputStream(
new FileOutputStream(
emailPath + newFolder));
dataOut.writeBytes(
new String(mBoxStrBuf));
dataOut.close();
}catch(Exception ex){
System.out.println(
"Writing MBOX file");
ex.printStackTrace();
}//end catch




//Write the possibly modified
// goodPhraseList into an output file
writeGoodPhraseList();

//Make it possible for the user to
// delete all processed messages from
// the server, and notify the user that
// the time has come for a deletion
// decision.
deleteButton.setEnabled(true);
textArea.append("\nDo you want to "
+ "delete messages from server?\n");
//Sound an audio alert
try{
Toolkit.getDefaultToolkit().beep();
Thread.sleep(300);
Toolkit.getDefaultToolkit().beep();
Thread.sleep(300);
Toolkit.getDefaultToolkit().beep();
}catch(Exception ex){
ex.printStackTrace();}
}//end actionPerformed
}//end ActionListener
);//end addActionListener



//Register an action listener on the delete
// button
deleteButton.addActionListener(
new ActionListener(){
public void actionPerformed(
ActionEvent e){
deleteButton.setEnabled(false);
textArea.append("\n");

//Get connected to the email server
int port = 110; //pop3 mail port
try{
//Get a socket, connected to the
// specified server on the specified
// port.
socket = new Socket(server,port);

//Get an input stream from the socket
inputStream = new BufferedReader(
new InputStreamReader(
socket.getInputStream()));

//Get an output stream to the socket
outputStream = new PrintWriter(
new OutputStreamWriter(
socket.getOutputStream()),true);

//Display the msg received from the
// server on the command-line screen
// immediately following connection.
String connectMsg =
validateOneLine();
System.out.println(
"Connected to server "
+ connectMsg);

//The communication process is now in
// the AUTHORIZATION state. Send the
// user name and password to the
// server.
outputStream.println("USER "
+ userName);
//Get response and confirm that the
// response was +OK and was not -ERR.
String userResponse =
validateOneLine();
//Display the response on the
// command-line screen.
System.out.println("USER "
+ userResponse);
//Send the password to the server
outputStream.println("PASS "
+ password);
//Validate the server's response as
// +OK. Display the response in the
// process.
System.out.println("PASS "
+ validateOneLine());
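}catch(Exception ex){
ex.printStackTrace();
//Without a connection, nothing can be
// deleted. Quit now to avoid using a
// null output stream below.
textArea.append(
"\nUnable to connect to server\n");
return;
}//end catch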



//Process the files in the msgToDelete
// collection and delete those messages
// from the email server
for(int cnt = 0;
cnt < msgToDelete.size();cnt++){
pathFileName = (String)msgToDelete.
elementAt(cnt);
String strMsgNumber = pathFileName.
substring(pathFileName.indexOf(" "),
pathFileName.lastIndexOf(" ")).
trim();
int msgNumber = Integer.parseInt(
strMsgNumber);

//Deletion of a message from the
// server is accomplished by marking
// the message for deletion while in
// the TRANSACTION state. The
// message is actually deleted when
// the client sends a QUIT command
// to the server causing the server
// to enter the UPDATE state. If the
// program aborts prematurely before
// sending a QUIT command, marked
// messages are not deleted from the
// server.
//Mark the message for deletion.

//Message deletion has been disabled
// for test purposes.
textArea.append(
"\nMessage deletion disabled");

/*
outputStream.println("DELE "
+ msgNumber);


//Validate the response and display
// it on the GUI.
textArea.append(
"Msg: " + msgNumber + " "
+ validateOneLine()+"\n");
textArea.append(
"Deleted:" + msgNumber + "\n");
*/
//Now move the file that has been
// processed and deleted from the
// server to the archive folder on
// the local disk.
BigDog02b.moveFile(pathFileName,
archivePath);

}//end for loop on msgToDelete.size()


//Terminate the session with the
// server causing the messages to
// actually be deleted from the server.
outputStream.println("QUIT");
String quitResponse =
validateOneLine();
//Display the response on the
// command-line screen.
System.out.println(
"QUIT " + quitResponse);

//Server is now in the UPDATE mode.
// It will delete all files marked
// with the DELE command earlier
// in the execution of the program.
//Close the socket
try{
socket.close();
}catch(Exception ex){
System.out.println("\n" + ex);}

textArea.append("\n\nMessages deleted "
+ "from server.\n");
}//end actionPerformed
}//end ActionListener
);//end addActionListener



//Configure the GUI by placing the
// various components on it, setting the
// size, and making it visible.
add(startButton);
add(deleteButton);
deleteButton.setEnabled(false);
add(textArea);
textArea.setText("");
setLayout(new FlowLayout());

setTitle("Copyright 2004, R.G.Baldwin");
setSize(400,400);
//Make the GUI visible.
setVisible(true);
}//end constructor
//===========================================//

//Validate a one-line response.
//The purpose of this method is to confirm that
// the server returned +OK and not -ERR to the
// previous command.
//If +OK, the method returns the string
// returned by the server.
//If -ERR, the method displays the string
// returned by the server and terminates the
// session.
private String validateOneLine(){
try{
String response = inputStream.readLine();
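//readLine returns null if the server has
// closed the connection, so guard against
// that before testing the response.
if(response != null
&& response.startsWith("+OK")){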
return response;
}else{
System.out.println(response);
//Terminate the session.
outputStream.println("QUIT");
socket.close();
System.out.println(
"Premature QUIT on -ERR");
System.exit(0);
}//end else
}catch(IOException e){
System.out.println("\n" + e);}
//The following return statement is required
// to satisfy the compiler.
return "Make compiler happy";
}//end validateOneLine()
//===========================================//

//The purpose of this method is to kick off the
// processing of a new message.
void startProcess(){
//Create a three-digit string representing
// the message number. This will be used to
// tag the subject before the message is
// written into the MBOX file.
if(msgNumber < 10){
msgNumberStr = "00" + msgNumber;
}else if(msgNumber > 99){
msgNumberStr = "" + msgNumber;
}else{
msgNumberStr = "0" + msgNumber;
}//end else

//Get and save the unique identifier assigned
// by the public email server.
uidl = pathFileName.substring(
pathFileName.lastIndexOf(" "));

//Determine the type of message and take the
// appropriate action.

if(isBad()){
//This message was determined to be from
// a confirmed spammer, virus writer, other
// machine, or some other undesirable
// source. No point in sending them a
// message. Tag the message as {BD}
// and write it into the MBOX file
processBad();
}else if(isReply()){
//This message is a reply to a previous
// message sent to someone inviting them
// to confirm that they are a human and
// not a machine. Add the email address
// to the list of good addresses for
// future messages, retrieve the original
// message that triggered the inquiry, tag
// the original message as {GD} and
// write it into the MBOX file.
processReply();
}else if(isGood()){
//This message was determined either to be
// from an approved sender, or to have an
// approved subject. Tag the message as
// {GD} and write it into the MBOX file.
processGood();
}else if(isSpam()){
//This message has been processed by a spam
// filter and has been determined to be
// spam. It will be marked {SP} along with
// a spam score before being written into
// the MBOX file.
processSpam();
}else{
//This message is from an unknown address.
// It is probably spam, but may be from
// someone worth communicating with. Send
// a message asking the sender to confirm
// that they are a human. Tag the message
// as {QU} and write it into the MBOX file.
// If a reply is received
// in a reasonable time, that reply will
// trigger the processReply procedure
// described above. Otherwise, manually
// delete the message from the local
// archive folder after a reasonable
// amount of time has transpired.
processQuarantine();
}//end else

}//end startProcess
//===========================================//

//Purpose: To write the data from a TreeSet
// object into an output file.
//This method is the reverse of the method
// named makeGoodPhraseList.

void writeGoodPhraseList(){
try{
DataOutputStream dataOut =
new DataOutputStream(
new FileOutputStream(
goodPhraseFile));

//Use an iterator to access the data in
// the TreeSet object.
Iterator iter = goodPhraseList.iterator();
String data;

while(iter.hasNext()){
data = (String)iter.next();
dataOut.writeBytes(data + "\n");
}//end while

dataOut.close();
}catch(Exception e){e.printStackTrace();}

}//end writeGoodPhraseList
//===========================================//

//This method tests the sender of the message
// and the subject of the message against the
// list of items in the badPhraseFile.
// Returns true on match, false otherwise.
private boolean isBad(){
boolean match = false;

//Get the Subject line, decode if necessary,
// and convert it to upper case
subject = BigDog02b.readLines(
pathFileName,"Subject:","Subject:");
subject = BigDog02b.decodeSubj(subject);
subject = subject.toUpperCase();

//Get the sender and convert it to upper
// case
sender = BigDog02b.readLines(pathFileName,
"From:","From:");
sender = sender.toUpperCase();

//The Subject and From lines have been
// captured. Screen each of them against
// an upper case version of words and
// phrases in a TreeSet object containing
// bad email addresses and subjects.
match = screenForBadSubjAndFromLines();
return match;
}//end isBad method
//===========================================//

//This method screens the Subject and From
// lines to determine if they contain bad
// subjects or email addresses. If so, the
// method returns true. Otherwise, it returns
// false. A bad phrase matches if it appears
// anywhere in either line, compared on an
// upper-case basis.
private boolean
screenForBadSubjAndFromLines(){
Iterator iterator =
badPhraseList.iterator();
while(iterator.hasNext()){
String badWord =
((String)(iterator.next())).
toUpperCase();
if(!(badWord.equals(""))){
if((subject.indexOf(badWord) != -1) ||
(sender.indexOf(badWord) != -1)){
//An exact match was found.
return true;
}//end if((subject.indexOf...
}//end if!(badWord.equals("")
}//end while iterator has next
return false;
}//end screenForBadSubjAndFromLines
//===========================================//

//This method is used to process messages that
// have been determined to be in the bad
// category.
void processBad(){

//Add the message to the MBOX file.
// You can tag the subject with any
// string that you want to pass as
// the second parameter.
mBoxStrBuf = addToMboxStr(mBoxStrBuf,
"{BD}{"+msgNumberStr+"}",
pathFileName);


//Add this message to the list of messages
// scheduled to be deleted from the public
// email server
msgToDelete.add(pathFileName);

}//end processBad
//===========================================//

//This method tests the subject of the current
// message to determine if the message is a
// reply to a message sent to an email address
// earlier. If the subject contains +OK, it is
// assumed to be a reply because that is
// the beginning of a unique ID assigned to
// each message that is sent. It is also the
// beginning of the file name by which message
// files are stored locally. Returns true on
// match, false otherwise. If it is a reply,
// the unique ID in the subject of the message
// matches the file name of the earlier
// message that triggered the sending of an
// email message to the email address. That
// makes it possible to locate and retrieve
// the original message from a local archive
// folder.
private boolean isReply(){
boolean match = false;
String subject = "";

//Get the subject, decode if necessary, and
// convert to upper case
subject = BigDog02b.readLines(pathFileName,
"Subject:","Subject:");
subject = BigDog02b.decodeSubj(subject);
subject = subject.toUpperCase();

if(subject.indexOf("+OK ") != -1){

//Tag the message {GD} and write it into
// the MBOX file. The file in the archives
// represented by this message should also
// be written into the MBOX file by the
// processReply method if it can be found.
// You can tag the subject with any string
// that you want to pass as the second
// parameter.
mBoxStrBuf = addToMboxStr(mBoxStrBuf,
"{GD}{"+msgNumberStr+"}",
pathFileName);

return true;
}else{
return false;
}//end else
}//end isReply method
//===========================================//

//This method uses information in the subject
// of the current message to retrieve an
// earlier message file from a local archive
// folder. The earlier message is tagged {GD}
// and written into the MBOX file.

private void processReply(){
String sender = "No sender identified";
String emailAddr =
"No email address identified";
String subject = "";

//Beep twice to alert the user that a reply
// is being processed.
Toolkit.getDefaultToolkit().beep();
try{
Thread.sleep(200);
}catch(Exception ex){System.out.println(ex);}
Toolkit.getDefaultToolkit().beep();

//Get the subject, decode if necessary, and
// trim off the newline character
subject = BigDog02b.readLines(pathFileName,
"Subject:","Subject:");
subject = BigDog02b.decodeSubj(subject).
trim();

//Now parse the subject to get the name of
// the original file.
File theFile = null;
try{
//Note, this assumes that the requested
// file is now located in the folder
// pointed to by archivePath
theFile = new File(archivePath
+ subject.substring(
subject.indexOf("+OK")));
}catch(Exception ex){
System.out.println("\n" + ex);
System.out.println("Getting theFile");
System.out.println("pathFileName:"
+ pathFileName);
}//end catch
textArea.append("\nProcessing reply message "
+ msgNumberStr + "\n" + subject
+ "\nFile: " + theFile + "\n");
if(theFile != null && theFile.exists()){

//Read the file from the local archive
// folder. Extract the email address. Add
// the email address to the goodPhraseList.
// Tag the message {GD} and write it into
// the MBOX file. Note that the last
// parameter identifies the path and file
// name of the file being retrieved.
// You can tag the subject with any string
// that you want to pass as the second
// parameter.
mBoxStrBuf = addToMboxStr(mBoxStrBuf,
"{GD}{"+msgNumberStr+"}",
theFile.toString());

//Add the message to the list of messages
// scheduled for deletion from the public
// email server.
msgToDelete.add(pathFileName);

//Now get the sender email address and add
// it to the goodPhraseList
//Get the sender, convert to upper case,
// and trim off the newline character.
sender = BigDog02b.readLines(
theFile.toString(),
"From:","From:");
sender = sender.toUpperCase().trim();

//Deal with the format of the email
// address. Some have the email address
// in angle brackets with something like a
// name ahead of the angle brackets.
// Others simply have an email address.
try{
if((sender.indexOf("<") != -1)
&& (sender.indexOf(">") != -1)){
emailAddr = sender.substring(
sender.indexOf("<") + 1,
sender.indexOf(">")).toUpperCase();
}else if(sender.indexOf(" ") != -1){
//Get rid of text ahead of the email
// address
emailAddr = sender.substring(sender.
lastIndexOf(" ") + 1).toUpperCase();
}else{
emailAddr = sender.toUpperCase();
}//end else
}catch(Exception ex){
System.out.println("\n" + ex);
System.out.println(
"Getting sender for goodPhraseList");
System.out.println("sender:" + sender);
System.out.println("pathFileName:"
+ pathFileName);
}//end catch

//Add the email address to the good list.
goodPhraseList.add(emailAddr);

}else{
textArea.append("\nUnable to locate file "
+ "referred to in reply.\n");
//Beep to alert the user of this problem.
Toolkit.getDefaultToolkit().beep();
try{
Thread.sleep(200);
}catch(Exception ex){
System.out.println(ex);}
Toolkit.getDefaultToolkit().beep();
}//end else

}//end processReply
//===========================================//

//This method tests the sender of the message
// and the subject of the message against the
// list of items in the goodPhraseFile. Returns
// true on match, false otherwise.
private boolean isGood(){
boolean match = false;
//Get the subject, decode if necessary, and
// convert to upper case
subject = BigDog02b.readLines(pathFileName,
"Subject:","Subject:");
subject = BigDog02b.decodeSubj(subject);
subject = subject.toUpperCase();

//Get the sender and convert to upper case
sender = BigDog02b.readLines(pathFileName,
"From:","From:");
sender = sender.toUpperCase();

//The Subject and From lines have been
// captured. Screen each of them against
// an upper case version of words and
// phrases in a TreeSet object containing
// good email addresses and subjects.
match = screenForGoodSubjAndFromLines();
return match;
}//end isGood method
//===========================================//

//This method screens the Subject and From
// lines to determine if they contain good
// subjects or email addresses. If so, the
// method returns true. Otherwise, it returns
// false. A good phrase matches if it appears
// anywhere in either line, compared on an
// upper-case basis.
private boolean
screenForGoodSubjAndFromLines(){
Iterator iterator =
goodPhraseList.iterator();
while(iterator.hasNext()){
String goodWord =
((String)(iterator.next())).
toUpperCase();
if(!(goodWord.equals(""))){
if((subject.indexOf(goodWord) != -1) ||
(sender.indexOf(goodWord) != -1)){
//An exact match was found.
System.out.println("\ngoodWord:"
+ goodWord);
return true;
}//end if((subject.indexOf...
}//end if!(goodWord.equals("")
}//end while iterator has next
return false;
}//end screenForGoodSubjAndFromLines
//===========================================//

//This method processes a message that has been
// determined to be a good message. It writes
// the message into the MBOX file, and adds
// the identification of the message to the
// list of messages scheduled for deletion from
// the server later.
void processGood(){
//Add the message to the MBOX file. You can
// tag the subject with any string that you
// want to pass as the second parameter.
mBoxStrBuf = addToMboxStr(mBoxStrBuf,
"{GD}{"+msgNumberStr+"}",
pathFileName);

//Add the message to the list of messages
// scheduled for deletion from the public
// email server.
msgToDelete.add(pathFileName);

}//end processGood
//===========================================//


//This method is used to process messages that
// have been determined to be in the quarantine
// category. These are messages which probably
// are spam or viruses, sent by machines.
// However, some small percentage may have been
// sent by a human who wishes to communicate in
// a meaningful way, but whose email address
// has not yet been entered into the good list.
// As a result, each of these messages
// triggers an email message to be sent
// automatically asking the sender to
// demonstrate that they are a human by
// replying to the message. The original
// message is tagged {QU} and written into the
// MBOX file. It is also stored in a local
// archive folder. The receipt of a reply
// later will cause the original message to be
// retrieved from the local archive folder,
// tagged {GD}, and written into the MBOX file.
void processQuarantine(){

String subject = "";
String sender = "";
String date = "";
String header = "";

//Read the message from a local file, tag it
// {QU} and write it into the MBOX file.
// You can tag the subject with any
// string that you want to pass as
// the second parameter. I elected to add
// the original message number and the spam
// score to the tag. This information is
// useful when using ordinary email program
// filters to direct the messages to
// specific email folders.
mBoxStrBuf = addToMboxStr(mBoxStrBuf,
"{QU}{"+msgNumberStr+"}{"+hitCount+"}",
pathFileName);

//Add the message to the list of messages
// scheduled for deletion from the public
// email server.
msgToDelete.add(pathFileName);

//Now prepare for composing and sending the
// email message to the sender of the
// current message.
//Get the Subject line, decode if necessary,
// and convert it to upper case
subject = BigDog02b.readLines(pathFileName,
"Subject:","Subject:");
subject = BigDog02b.decodeSubj(subject);
subject = subject.toUpperCase();

//Get the sender in upper case
sender = BigDog02b.readLines(pathFileName,
"From:","From:");
sender = sender.toUpperCase();

//Now get the date.
date = BigDog02b.readLines(pathFileName,
"Date:","Date:");
date = date.toUpperCase();

//Now get the header of the original message.
header = BigDog02b.readLines(pathFileName,
null,"Status:");

//Use this information to send an email
// message to the sender. Need to avoid a
// substring index error later if the sender
// or the subject is blank.
if(!(sender.equals("")
|| subject.equals(""))){
sendEmailMsg(sender,subject,date,
header,pathFileName);
}else{
textArea.append(
"\nUnable to send message\n");
}//end else
}//end processQuarantine
//===========================================//

//This method is used to automatically send an
// email message to the sender of every
// quarantine message, asking them to indicate
// that they are a human rather than a machine
// by replying to the message.
//The incoming sender parameter is used to
// establish the address of the recipient.
//The incoming parameter subject is reported
// to the recipient along with the date to
// identify the message to the recipient.
//The incoming pathFileName is used to
// place a unique identifier in the subject of
// the message that is sent. This identifies
// the original message that triggered this
// event.
private void sendEmailMsg(String sender,
String subject,
String date,
String header,
String pathFileName){
//Enable the following two statements and
// enclose the remaining body of the method
// in a large block comment when testing the
// program to avoid sending nuisance
// messages.
textArea.append("sendEmail disabled\n");
return;

/* //Start a block comment here to disable

//Don't send messages to any email address
// on the doNotSendList.
boolean okToSend = true;
for(int cnt = 0; cnt < doNotSendList.length;
cnt++){
if(sender.toUpperCase().indexOf(
doNotSendList[cnt].
toUpperCase()) != -1){
okToSend = false;
textArea.append("\nDon't send to: "
+ sender.toUpperCase() + "\n");
break;
}//end if
}//end for loop


if(okToSend){
//Get the email address from the incoming
// parameter sender. Sometimes the actual
// address is enclosed in angle brackets.
String emailAddr = "";
if((sender.indexOf("<") != -1) &&
(sender.indexOf(">") != -1)){
try{
emailAddr = sender.substring(
sender.indexOf("<") + 1,
sender.indexOf(">"));
}catch(Exception ex){
System.out.println("\n" + ex);
System.out.println("Sending email");
System.out.println(
"Getting emailAddr in <>");
System.out.println("sender:" + sender);
System.out.println("pathFileName:"
+ pathFileName);
System.out.println("Forcing a valid "
+ "email address structure");
emailAddr = "dummy@dummy.com";
}//end catch
}else{
//Sometimes the email address simply
// follows the word From: in the header
// of the message from which the sender
// parameter is derived.
try{
emailAddr = sender.substring(
sender.toUpperCase().indexOf(
"FROM:")+5);
}catch(Exception ex){
System.out.println("\n" + ex);
System.out.println("Sending email");
System.out.println(
"Getting emailAddr");
System.out.println("sender:" + sender);
System.out.println("pathFileName:"
+ pathFileName);
System.out.println("Forcing a valid "
+ "email address structure");
emailAddr = "dummy@dummy.com";
}//end catch
emailAddr = emailAddr.trim();
}//end else

//Make sure that emailAddr contains an @
// indicating that it is probably a
// properly formatted email address.
if(emailAddr.indexOf("@") == -1){
Toolkit.getDefaultToolkit().beep();
try{
Thread.sleep(200);
}catch(Exception ex){
System.out.println(ex);}
Toolkit.getDefaultToolkit().beep();
System.out.println("\nCan't send to:"
+ emailAddr);
return;
}//end if

//Extract the file name from the
// pathFileName parameter and the actual
// subject from the incoming subject
// parameter.
String fileName = "No file name available";
try{
fileName = pathFileName.substring(
pathFileName.lastIndexOf("/") + 1);
String theSubject = subject.substring(9);
}catch(Exception ex){
System.out.println("\n" + ex);
System.out.println("Sending email");
System.out.println(
"Getting fileName and theSubject");
System.out.println("subject:" + subject);
System.out.println("fileName:"
+ fileName);
System.out.println("pathFileName:"
+ pathFileName);
}//end catch

//Display information about the message. I
// may decide to write this into a history
// file later so that I will have a record
// of messages sent.
textArea.append("\nSending email to:\n"
+ emailAddr +
"\n" + fileName + "\n"
+ date.trim() + "\n");

try{
//Pass a string containing the name of
// the smtp server as a parameter to the
// SmtpClient constructor.
SmtpClient smtp =
new SmtpClient(smtpServer);

//Pass the sender's email address to the
// from() method.
smtp.from(fromAddr);

//Pass the email address of the recipient
// to the method named to().
smtp.to(emailAddr);

//Get an output stream for the message
PrintStream msg = smtp.startMessage();

//Write the message header in the output
// stream.
msg.println("To: " + emailAddr);
msg.println("Subject: " +
subjOut + fileName);
msg.println();//blank line

//Write the text of the message in the
// output stream.
msg.println(
"I recently received a message from your\n"+
"Email address with the following subject\n"+
"and date:\n\n"+

subject + "\n" +
date + "\n\n" +

"Because your Email address has not been \n"+
"entered into the Approved Sender list of my \n"+
"SPAM blocking software, the message has been\n"+
"placed in the Quarantine folder. To move \n"+
"the message from the Quarantine folder into \n"+
"my Inbox, you will need to press your Reply \n"+
"button and send this message back to me \n"+
"making no changes to the Subject line or the\n"+
"body of the message. This will also cause \n"+
"your Email address to be added to my \n"+
"Approved Sender list so that future messages\n"+
"from you won't be similarly delayed.\n\n"+

"I apologize for this inconvenience. \n"+
"However, due to the large amount of SPAM \n"+
"that I must contend with, I have been \n"+
"forced to implement a mail handling system \n"+
"that asks you for a one-time confirmation \n"+
"that you intend to communicate with me via \n"+
"Email.\n\n"+

"If you didn't send the original message, I \n"+
"apologize for the intrusion. However, it is\n"+
"possible that someone is using your Email \n"+
"address for misleading, possibly fraudulent,\n"+
"and possibly malicious purposes. I strongly\n"+
"encourage you to file a complaint regarding \n"+
"the inappropriate use of your Email address.\n"+

"I have provided all of the information below\n"+
"that you will need to file such a \n"+
"complaint.\n\n"+


"The information provided below my signature\n"+
"block is the full header of the original \n"+
"Email message. You will find a short \n"+
"tutorial at \n"+
"http://www.dickbaldwin.com/java/Java2158.htm\n"+
"that explains how to use this header to file\n"+
"a complaint.\n\n"+

"If we all band together in opposing SPAM and \n"+
"Email viruses, perhaps we can have a \n"+
"positive impact on this increasingly serious\n"+
"problem.\n\n"+

"Regards,\n"+ signature +

"=======HEADER BEGINS HERE========\n\n"+
header +"\n"

);//end of message

//Close the stream and send the message
smtp.closeServer();

}catch( Exception e ){
System.out.println("\n" + e);
System.out.println("Sending email");
System.out.println(pathFileName);
}//end catch
}//end if(okToSend)

*/ //end a block comment here to disable
}//end sendEmailMsg
//===========================================//


//Purpose: To create a TreeSet object
// containing words used to screen the message
// From and Subject lines.
//This method reads strings from a text file
// and creates the list as a TreeSet object
// with no duplicates.
//Only the primary portion of the good
// Email address should be included in the
// file used to create the list. This would
// be x@y.z

//After creating the list, it writes the data
// from the list into a backup file named
// ....bakN, where N is the value of the
// next available file name in the directory.
//A new backup file with a unique name is
// created each time the program is run. Once
// the number of backup files reaches 5, the
// program automatically deletes the oldest
// file before creating a new backup
// file. Thus the program automatically
// maintains a sequence of five backup files
// with extensions .bak0 through .bak5 with one
// number missing. The age-order of the files
// should be determined by the modification
// date and not by the name of the file.
//The data read from the file is converted to
// upper case before being added to the TreeSet
// object.

void makeGoodPhraseList(){
goodPhraseList = new TreeSet();

//Read words or phrases from text file and
// populate the TreeSet object.
try{
BufferedReader inData
= new BufferedReader(new FileReader(
goodPhraseFile));
String data; //temp holding area

while((data = inData.readLine()) != null){
goodPhraseList.add(data.toUpperCase());
}//end while loop

inData.close();//Close input file

//Write a backup file before making any
// modifications to the data.

//First determine the name of the next
// backup file allowed in the directory.
int N = 0;
File theFile = null;
String baseFileName = goodPhraseFile.
substring(0,goodPhraseFile.indexOf(
".txt"));
for(N = 0;N < 6;N++){
theFile = new File(baseFileName
+ ".bak" + N);
if(!(theFile.exists()))break;
}//end for loop

//Cause N to rotate from 0 through 5
if(N == 5){//del file 0 for use next time
new File(baseFileName
+ ".bak0").delete();
}//end if
else{//delete the next file in sequence
if(new File(
baseFileName + ".bak"
+ (N + 1)).exists()){
new File(
baseFileName + ".bak"
+ (N + 1)).delete();
}//end if
}//end else

//Now write the output file
DataOutputStream dataOut =
new DataOutputStream(
new FileOutputStream(
theFile));

//Use an Iterator object to access the data
// in the TreeSet object.
Iterator iter = goodPhraseList.iterator();

while(iter.hasNext()){
data = (String)iter.next();
dataOut.writeBytes(data + "\n");
}//end while

dataOut.close();
}catch(Exception e){e.printStackTrace();}
}//end makeGoodPhraseList
//===========================================//

//Purpose: To create a TreeSet object
// containing words used to screen the message
// From and Subject lines.
//This method reads strings from a text file
// and creates the list as a TreeSet object
// with no duplicates.
//Only the primary portion of the bad
// Email address should be included in the
// file used to create the list. This would
// be x@y.z

//After creating the list, it writes the data
// from the list back out into the file. This
// is done to keep the contents of the file
// sorted in upper case. Since the program
// doesn't modify the contents of the list,
// there is no point in creating backup files.

void makeBadPhraseList(){
badPhraseList = new TreeSet();

//Read words or phrases from text file and
// populate the TreeSet object.
try{
BufferedReader inData
= new BufferedReader(new FileReader(
badPhraseFile));
String data; //temp holding area

while((data = inData.readLine()) != null){
badPhraseList.add(data.toUpperCase());
}//end while loop

inData.close();//Close input file

//Now write the output file
DataOutputStream dataOut =
new DataOutputStream(
new FileOutputStream(
badPhraseFile));

//Use an Iterator object to access the data
// in the TreeSet object.
Iterator iter = badPhraseList.iterator();

while(iter.hasNext()){
data = (String)iter.next();
dataOut.writeBytes(data + "\n");
}//end while

dataOut.close();
}catch(Exception e){e.printStackTrace();}
}//end makeBadPhraseList
//===========================================//

private StringBuffer addToMboxStr(
StringBuffer mBoxStrBuf,
String tag,
String pathFileName){

StringBuffer message = new StringBuffer(
"No message found");
message = new StringBuffer(
BigDog02b.readLines(
pathFileName,null,null));

//Prepare the message for appending to the
// end of the MBOX string. This requires
// the creation of four new header lines
// and the prepending of those four lines
// onto the message. Examples of those
// four new header lines follow:

//From - Wed Jan 21 15:59:09 2004
//X-UIDL: 400ed7770000000b
//X-Mozilla-Status: 0000
//X-Mozilla-Status2: 00000000

// The first line contains the date and
// time. The second line contains the UIDL
// from the email server. The meaning of
// the third and fourth lines can be found
// at various web sites including
// http://www.eyrich-net.org/mozilla/
// X-Mozilla-Status.html?en
// For a new message, the status values
// given above are satisfactory.

//Create Mozilla header lines and insert
// them at the beginning of the message.
// First get a 24-char date string matching
// the format required by Mozilla.
String theDate = new Date().toString();
theDate = "From - " +
theDate.substring(0,19) +
theDate.substring(23);
//Create the UIDL string.
String xUidl = "X-UIDL:" +
pathFileName.substring(
pathFileName.lastIndexOf(" "));
//Create the two status strings.
String xMozillaStatus =
"X-Mozilla-Status: 0000";
String xMozillaStatus2 =
"X-Mozilla-Status2: 00000000";

message.insert(0,theDate + "\n" + xUidl
+ "\n" + xMozillaStatus + "\n"
+ xMozillaStatus2 + "\n");
//Append a new line at the end of the
// message.
message.append("\n");
//Insert tag in subject line
message = message.insert(message.indexOf(
"Subject: ")+9,tag);

//Append this message at the end of the
// string that will be used to create the
// MBOX file.
mBoxStrBuf.append(message);

return mBoxStrBuf;

}//end addToMboxStr
//===========================================//

//This method passes the message through a spam
// screener to determine if it should be
// considered spam. The screener program
// produces and returns a score based on the
// number of hits against offensive words and
// phrases. The number of hits is compared to
// a hitLimit value that is established in the
// general instance variables at the beginning
// of the program. When the number of hits
// reaches that value, the screener terminates
// in order to avoid wasting time. If that
// limit has been reached, this method returns
// true indicating that the message is thought
// to be spam. Otherwise, it returns false. If
// it returns true, the control program invokes
// the method named processSpam to deal with
// the message.
private boolean isSpam(){
BigDog02SpamScreen01 screener =
new BigDog02SpamScreen01(dataPath,
subjAndHtmlPhraseFile,
rawTextPhraseFile,
hitLimit);

hitCount = screener.screenMsg(pathFileName);
return hitCount >= hitLimit;
}//end isSpam method
//===========================================//

//This method deals with a message that has
// been identified as spam.
void processSpam(){

//Add the message to the MBOX file.
//You can tag the subject with any string
// you want to pass as the second parameter.
// I elected to tag it with {SP} indicating
// that it is spam. I also added the message
// number and the spam score which may be
// useful for using email program filters to
// cause the messages to be directed to
// specific email folders.
mBoxStrBuf = addToMboxStr(mBoxStrBuf,
"{SP}{"+msgNumberStr+"}{"+hitCount+"}",
pathFileName);

//Add this message to the list of messages
// scheduled to be deleted from the public
// email server
msgToDelete.add(pathFileName);

}//end processSpam
//===========================================//

}//end class BigDog02j
//=============================================//

Listing 4
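
The header-building step near the end of addToMboxStr in the listing above relies on the fixed layout of Date.toString(), so its substring offsets assume a three-character time zone abbreviation. The following is a minimal sketch, not part of the BigDog programs, of one way to build the same four header lines with an explicit date pattern instead. The class name MboxHeaderSketch, its method names, and the sample UIDL value are assumptions made for illustration.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

class MboxHeaderSketch {

    // Builds the "From - " separator line with a 24-character date
    // such as "Thu Jan 01 00:00:00 1970". The pattern omits the time
    // zone, so the length does not depend on the zone abbreviation.
    static String fromLine(Date when, TimeZone zone) {
        SimpleDateFormat fmt = new SimpleDateFormat(
            "EEE MMM dd HH:mm:ss yyyy", Locale.US);
        fmt.setTimeZone(zone);
        return "From - " + fmt.format(when);
    }

    // Assembles the four header lines that the listing inserts at the
    // beginning of each message. The status values are the ones the
    // listing uses for a new message.
    static String header(Date when, TimeZone zone, String uidl) {
        return fromLine(when, zone) + "\n"
            + "X-UIDL: " + uidl + "\n"
            + "X-Mozilla-Status: 0000\n"
            + "X-Mozilla-Status2: 00000000\n";
    }

    public static void main(String[] args) {
        // "abc123" is a made-up UIDL used only for this demonstration.
        System.out.print(header(new Date(),
            TimeZone.getDefault(), "abc123"));
    }
}
```

Passing the Date and TimeZone as parameters keeps the sketch easy to check against a known instant.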

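In addToMboxStr, the tag is inserted at message.indexOf("Subject: ")+9. Unless earlier code guarantees that every message has a Subject line, indexOf can return -1, in which case the tag would land at offset 8 of the message. The following sketch shows one possible guard. It is an illustration only: the class name SubjectTagSketch and the method tagSubject are hypothetical, and the sketch assumes "\n" line endings and a blank line between the headers and the body.

```java
class SubjectTagSketch {

    // Inserts the tag immediately after "Subject: " when that header
    // exists in the header block (the text before the first blank
    // line). Leaves the message unchanged otherwise.
    static StringBuffer tagSubject(StringBuffer message, String tag) {
        String key = "Subject: ";

        // The header block ends at the first blank line, if any.
        int headerEnd = message.indexOf("\n\n");
        if (headerEnd == -1) {
            headerEnd = message.length();
        }

        // Accept the key only at the start of a line.
        int pos;
        if (message.indexOf(key) == 0) {
            pos = 0;
        } else {
            pos = message.indexOf("\n" + key);
            if (pos != -1) {
                pos += 1; // step past the newline
            }
        }

        if (pos != -1 && pos < headerEnd) {
            message.insert(pos + key.length(), tag);
        }
        return message;
    }

    public static void main(String[] args) {
        StringBuffer msg = new StringBuffer(
            "From: a@b.c\nSubject: Hello\n\nBody text\n");
        // The tag mimics the {SP}{message number}{hit count} form.
        System.out.print(tagSubject(msg, "{SP}{7}{6}"));
    }
}
```

Restricting the search to the header block also keeps a "Subject: " string in the message body from being tagged by mistake.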
File BigDog02k

/*File BigDog02k.java
Copyright 2004, R.G.Baldwin
Rev 03/06/04

This is a special modified version of the program
named BigDog02j. The purpose of this version is
to examine message files that have been manually
copied from the archive folder into a folder
named temp, and to delete all files other than
those that are quarantined with a hit count of
zero. The quarantined files can then be used to
train the spam screening algorithms to do a
better job in the future.

The main purpose of this and the program used to
train the algorithm is to identify messages that
clearly seem to contain spam, but which
currently are being categorized as quarantined
with zero spam hits. Messages in quarantine with
no spam hits deserve special manual scrutiny to
make certain that they don't represent messages
from a computer that need to be read (such as
machine-generated airline reservations).

This program processes a set of message files
written by the program named BigDog02g that have
been manually copied into a folder named temp.

This program should be run after a virus
checker has been used to confirm that all files
copied into the temp directory are free of
viruses.

Tested using SDK 1.4.2 under WinXP
************************************************/

import java.net.*;
import java.io.*;
import java.util.*;
import java.awt.*;
import java.awt.event.*;

class BigDog02k extends Frame{

String dataPath = "./temp/";

//Following two files contain lists of phrases
// used in processing the messages before they
// are subjected to the spam screen.
String goodPhraseFile = "BigDog02GoodList.txt";
String badPhraseFile = "BigDog02BadList.txt";

//Following two files contain lists of phrases
// used in performing the spam screen.
String subjAndHtmlPhraseFile =
"BigDog02SubjAndHtml.txt";
String rawTextPhraseFile =
"BigDog02RawText.txt";

//Following are working variables used by the
// program for various purposes.
TreeSet goodPhraseList;
TreeSet badPhraseList;
String pathFileName;
Button startButton = new Button("Start/Next");

TextArea textArea = new TextArea(20,50);
String subject = "No Subject line found";
String sender = "No From line found";

int hitCount = 0;
int hitLimit = 6;
//Will delete all msg files with a hit count
// greater than or equal to the following.
int deleteLimit = 1;

public static void main(String[] args){
//Construct an object of this class
new BigDog02k();
}//end main
//===========================================//

//Constructor
BigDog02k(){
makeGoodPhraseList();
makeBadPhraseList();

//Register a window listener to service
// the close button on the Frame.
this.addWindowListener(
new WindowAdapter(){
public void windowClosing(WindowEvent e){
System.exit(0);
}//end windowClosing
}//end WindowAdapter()
);//end addWindowListener

//Register an ActionListener on the
// startButton.
startButton.addActionListener(
new ActionListener(){
public void actionPerformed(
ActionEvent e){
startButton.setEnabled(false);
//Get a directory listing
File dataDir = new File(dataPath);
//The following code creates a
// directory listing containing only
// those files that begin with +OK.
//This is an anonymous implementation
// of a class that implements
// FilenameFilter.
String[] dirList = dataDir.list(
new FilenameFilter(){
public boolean accept(
File dir,String name){
if(!(new File(dir,name).
isFile())) return false;
return name.startsWith("+OK");
}//end accept
}//end FilenameFilter
);//end list

//Now process the files in the
// directory
int msgCounter = 0;
for(msgCounter = 0;
msgCounter < dirList.length;
msgCounter++){
String fileName =
dirList[msgCounter];
pathFileName = dataPath + fileName;

//Process the message
startProcess();
}//end for loop on directory length
System.out.println("Finished");
}//end actionPerformed
}//end ActionListener
);//end addActionListener

//Configure the GUI by placing the
// various components on it, setting the
// size, and making it visible.
add(startButton);
add(textArea);
textArea.setText("");
setLayout(new FlowLayout());

setTitle("Copyright 2004, R.G.Baldwin");
setSize(400,400);
//Make the GUI visible.
setVisible(true);
}//end constructor
//===========================================//

//The purpose of this method is to kick off the
// processing of a new message.
void startProcess(){
//Determine the type of message and take the
// appropriate action.

if(isBad()){
//This message was determined to be from
// a confirmed spammer, virus writer, other
// machine, or some other undesirable
// source. Print the {BD} tag and delete
// the file.
System.out.print("{BD}: ");
deleteFile(pathFileName);
}else if(isGood()){
//This message was determined either to be
// from an approved sender, or to have an
// approved subject. Print the {GD} tag
// and delete the file.
System.out.print("{GD}: ");
deleteFile(pathFileName);
}else if(isSpam()){
//This message has been processed by a spam
// filter and has been determined to be
// spam. Print the {SP} tag and delete the
// file.
System.out.print("{SP} ");
deleteFile(pathFileName);
}else{
//This message is from an unknown address.
// It is probably spam, but may be from
// someone worth communicating with.
// Process the message to determine the
// number of spam hits and delete it if
// the number is greater than zero. Can
// modify the comparison value if it is
// decided to keep files with greater
// hit count values.
if(hitCount >= deleteLimit){
System.out.print("{QU}{"+hitCount+"} ");
deleteFile(pathFileName);
}//end if
}//end if(isSpam()

}//end startProcess
//===========================================//

//This method tests the sender of the message
// and the subject of the message against the
// list of items in the badPhraseFile.
// Returns true on match, false otherwise.
private boolean isBad(){
boolean match = false;

//Get the Subject line, decode if necessary,
// and convert it to upper case
subject = BigDog02b.readLines(
pathFileName,"Subject:","Subject:");
subject = BigDog02b.decodeSubj(subject);
subject = subject.toUpperCase();

//Get the sender and convert it to upper
// case
sender = BigDog02b.readLines(pathFileName,
"From:","From:");
sender = sender.toUpperCase();

//The Subject and From lines have been
// captured. Screen each of them against
// an upper case version of words and
// phrases in a TreeSet object containing
// quarantine email addresses and subjects.
match = screenForBadSubjAndFromLines();
return match;
}//end isBad method
//===========================================//

//This method screens the Subject and From
// lines to determine if they contain bad
// subjects or email addresses. If so, the
// method returns true. Otherwise, it returns
// false. An exact match on an upper-case basis
// is required
private boolean
screenForBadSubjAndFromLines(){
Iterator iterator =
badPhraseList.iterator();
while(iterator.hasNext()){
String badWord =
((String)(iterator.next())).
toUpperCase();
if(!(badWord.equals(""))){
if((subject.indexOf(badWord) != -1) ||
(sender.indexOf(badWord) != -1)){
//An exact match was found.
return true;
}//end if((subject.indexOf...
}//end if!(badWord.equals("")
}//end while iterator has next
return false;
}//end screenForBadSubjAndFromLines
//===========================================//

//This method tests the sender of the message
// and the subject of the message against the
// list of items in the goodPhraseFile. Returns
// true on match, false otherwise.
private boolean isGood(){
boolean match = false;
//Get the subject, decode if necessary, and
// convert to upper case
subject = BigDog02b.readLines(pathFileName,
"Subject:","Subject:");
subject = BigDog02b.decodeSubj(subject);
subject = subject.toUpperCase();

//Get the sender and convert to upper case
sender = BigDog02b.readLines(pathFileName,
"From:","From:");
sender = sender.toUpperCase();

//The Subject and From lines have been
// captured. Screen each of them against
// an upper case version of words and
// phrases in a TreeSet object containing
// good email addresses and subjects.
match = screenForGoodSubjAndFromLines();
return match;
}//end isGood method
//===========================================//

//This method screens the Subject and From
// lines to determine if they contain good
// subjects or email addresses. If so, the
// method returns true. Otherwise, it returns
// false. An exact match on an upper-case basis
// is required
private boolean
screenForGoodSubjAndFromLines(){
Iterator iterator =
goodPhraseList.iterator();
while(iterator.hasNext()){
String goodWord =
((String)(iterator.next())).
toUpperCase();
if(!(goodWord.equals(""))){
if((subject.indexOf(goodWord) != -1) ||
(sender.indexOf(goodWord) != -1)){
//An exact match was found.
System.out.println("\ngoodWord:"
+ goodWord);
return true;
}//end if((subject.indexOf...
}//end if!(goodWord.equals("")
}//end while iterator has next
return false;
}//end screenForGoodSubjAndFromLines
//===========================================//

//Purpose: To create a TreeSet object
// containing words used to screen the message
// From and Subject lines.
//This method reads strings from a text file
// and creates the list as a TreeSet object
// with no duplicates.
//Only the primary portion of the good
// Email address should be included in the
// file used to create the list. This would
// be x@y.z

//After creating the list, it writes the data
// from the list into a backup file named
// ....bakN, where N is the value of the
// next available file name in the directory.
//A new backup file with a unique name is
// created each time the program is run. Once
// the number of backup files reaches 5, the
// program automatically deletes the oldest
// file before creating a new backup
// file. Thus the program automatically
// maintains a sequence of five backup files
// with extensions .bak0 through .bak5 with one
// number missing. The age-order of the files
// should be determined by the modification date
// and not by the name of the file.
//The data read from the file is converted to
// upper case before being added to the TreeSet
// object.

void makeGoodPhraseList(){
goodPhraseList = new TreeSet();

//Read words or phrases from text file and
// populate the TreeSet object.
try{
BufferedReader inData
= new BufferedReader(new FileReader(
goodPhraseFile));
String data; //temp holding area

while((data = inData.readLine()) != null){
goodPhraseList.add(data.toUpperCase());
}//end while loop

inData.close();//Close input file

//Write a backup file before making any
// modifications to the data.

//First determine the name of the next
// backup file allowed in the directory.
int N = 0;
File theFile = null;
String baseFileName = goodPhraseFile.
substring(0,goodPhraseFile.indexOf(
".txt"));
for(N = 0;N < 6;N++){
theFile = new File(baseFileName
+ ".bak" + N);
if(!(theFile.exists()))break;
}//end for loop

//Cause N to rotate from 0 through 5
if(N == 5){//del file 0 for use next time
new File(baseFileName
+ ".bak0").delete();
}//end if
else{//delete the next file in sequence
if(new File(
baseFileName + ".bak"
+ (N + 1)).exists()){
new File(
baseFileName + ".bak"
+ (N + 1)).delete();
}//end if
}//end else

//Now write the output file
DataOutputStream dataOut =
new DataOutputStream(
new FileOutputStream(
theFile));

//Use an Iterator object to access the data
// in the TreeSet object.
Iterator iter = goodPhraseList.iterator();

while(iter.hasNext()){
data = (String)iter.next();
dataOut.writeBytes(data + "\n");
}//end while

dataOut.close();
}catch(Exception e){e.printStackTrace();}
}//end makeGoodPhraseList
//===========================================//

//Purpose: To create a TreeSet object
// containing words used to screen the message
// From and Subject lines.
//This method reads strings from a text file
// and creates the list as a TreeSet object
// with no duplicates.
//Only the primary portion of the bad
// Email address should be included in the
// file used to create the list. This would
// be x@y.z

//After creating the list, it writes the data
// from the list back out into the file. This
// is done to keep the contents of the file
// sorted in upper case. Since the program
// doesn't modify the contents of the list,
// there is no point in creating backup files.

void makeBadPhraseList(){
badPhraseList = new TreeSet();

//Read words or phrases from text file and
// populate the TreeSet object.
try{
BufferedReader inData
= new BufferedReader(new FileReader(
badPhraseFile));
String data; //temp holding area

while((data = inData.readLine()) != null){
badPhraseList.add(data.toUpperCase());
}//end while loop

inData.close();//Close input file

//Now write the output file
DataOutputStream dataOut =
new DataOutputStream(
new FileOutputStream(
badPhraseFile));

//Use an Iterator object to access the data
// in the TreeSet object.
Iterator iter = badPhraseList.iterator();

while(iter.hasNext()){
data = (String)iter.next();
dataOut.writeBytes(data + "\n");
}//end while

dataOut.close();
}catch(Exception e){e.printStackTrace();}
}//end makeBadPhraseList
//===========================================//

//This method passes the message through a spam
// screener to determine if it should be
// considered spam. The screener program
// produces and returns a score based on the
// number of hits against offensive words and
// phrases. The number of hits is compared to
// a hitLimit value that is established in the
// general instance variables at the beginning
// of the program. When the number of hits
// reaches that value, the screener terminates
// in order to avoid wasting time. If that
// limit has been reached, this method returns
// true indicating that the message is thought
// to be spam. Otherwise, it returns false. If
// it returns true, the control program invokes
// the method named processSpam to deal with
// the message.
private boolean isSpam(){
BigDog02SpamScreen01 screener =
new BigDog02SpamScreen01(dataPath,
subjAndHtmlPhraseFile,
rawTextPhraseFile,
hitLimit);

hitCount = screener.screenMsg(pathFileName);
return hitCount >= hitLimit;
}//end isSpam method
//===========================================//

void deleteFile(String pathFileName){
File tempFile = new File(pathFileName);
if(tempFile.exists()){
boolean deleted = tempFile.delete();
if(deleted){
System.out.println(
"Deleted: " + pathFileName);
}//end if
}//end if

}//end deleteFile

}//end class BigDog02k
//=============================================//

Listing 5
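
The backup rotation in makeGoodPhraseList above writes to the first unused name among .bak0 through .bak5 and then deletes the following name, so one slot always stays free for the next run. The following is a minimal sketch of that scheme, written as a standalone helper for illustration. The class name BackupRotationSketch, the method names, the base name "Demo", and the use of a temporary directory are assumptions and do not appear in the BigDog programs. Unlike the listing, the sketch wraps around with a modulus, so it also frees .bak0 if all six files already exist.

```java
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.util.Arrays;

class BackupRotationSketch {

    static final int SLOTS = 6; // suffixes .bak0 through .bak5

    // Returns the backup file to write next and deletes the slot
    // after it, so that a free slot always marks where the next run
    // will write.
    static File nextBackup(File dir, String baseName) {
        int n = 0;
        while (n < SLOTS - 1
            && new File(dir, baseName + ".bak" + n).exists()) {
            n++;
        }
        // delete() simply returns false if the file does not exist.
        new File(dir, baseName + ".bak" + ((n + 1) % SLOTS)).delete();
        return new File(dir, baseName + ".bak" + n);
    }

    // Runs the rotation the given number of times in a fresh temporary
    // directory and returns the sorted names of the files left behind.
    static String[] simulate(int runs) {
        try {
            File dir = Files.createTempDirectory("bakdemo").toFile();
            for (int run = 0; run < runs; run++) {
                File target = nextBackup(dir, "Demo");
                try (FileWriter out = new FileWriter(target)) {
                    out.write("run " + run + "\n");
                }
            }
            String[] names = dir.list();
            Arrays.sort(names);
            return names;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(simulate(8)));
    }
}
```

After eight simulated runs the directory holds five files, with Demo.bak2 as the missing number. This matches the description in the listing's comments of five backup files with one number missing.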

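The isGood and isBad screens in BigDog02k both reduce to the same test: convert the Subject and From lines to upper case and check whether either one contains any non-empty phrase from a sorted set. A minimal sketch of that test follows. The class name PhraseScreenSketch and its method names are hypothetical, the sample phrases are made up, and the sketch uses generics, so it assumes a newer Java than the SDK 1.4.2 used for the listings.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

class PhraseScreenSketch {

    // Builds a sorted, duplicate-free set of upper-case phrases, as
    // the listing does when it reads the phrase file line by line.
    static Set<String> toPhraseSet(Iterable<String> lines) {
        Set<String> phrases = new TreeSet<String>();
        for (String line : lines) {
            phrases.add(line.toUpperCase());
        }
        return phrases;
    }

    // Returns true if the subject or the sender contains any
    // non-empty phrase. Empty phrases are skipped because every
    // string contains the empty string.
    static boolean matches(Set<String> phrases,
                           String subject, String sender) {
        String subj = subject.toUpperCase();
        String from = sender.toUpperCase();
        for (String phrase : phrases) {
            if (phrase.isEmpty()) {
                continue;
            }
            if (subj.contains(phrase) || from.contains(phrase)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Set<String> good = toPhraseSet(Arrays.asList(
            "friend@example.com", "", "Java Tutorial"));
        System.out.println(matches(good,
            "Subject: Re: java tutorial question",
            "From: someone@example.org"));
    }
}
```

The skip for empty phrases mirrors the guard in the listing. Without it, a single blank line in the phrase file would cause every message to match.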
File BigDog02m

/*File BigDog02m.java Copyright 2003, R.G.Baldwin
Rev 03/07/04

The purpose of this program is to process text
files produced by BigDog02g for the purpose of
using the information contained in those files
to update the word list stored in
BigDog02SubjAndHtml.txt

This program should be run following BigDog02k.
It is used to train the subject line screener to
do a better job of detecting spam in the subject
lines and in the HTML body of messages. It
should not be used in an attempt to train the
raw body text screener, except that when this
program displays raw body text, that text can
be manually copied to the clipboard and then
pasted into the text file named
BigDog02RawText.txt.

In operation, a large block of message files
should be manually copied from the archive folder
to the folder named temp. Then BigDog02k should
be run to delete {GD}, {BD}, and {SP} files and
also to delete {QU} files with a spam hit count
greater than zero. This program should then be
run in an attempt to find and save offensive
words and phrases in the subject line and HTML
body that would cause those files to experience
a spam hit count greater than zero in the
future. This serves to reduce the number of
{QU} messages that must be examined following
the running of either BigDog02i or BigDog02j.

Tested using SDK 1.4.2 under WinXP
************************************************/
import java.io.*;
import java.util.*;
import java.awt.*;
import java.awt.event.*;

class BigDog02m extends Frame{

BufferedReader inData;
TextArea textArea = new TextArea(12,50);
Button copyButton = new Button(
"Copy Selected Text");
Button postButton = new Button("Post Text");
Button deleteButton = new Button(
"Delete Local File");
Button nextButton = new Button("Next");
TextField fromField = new TextField(
"From data will appear here",50);
TextField subjField = new TextField(
"Subject data will appear here",50);
TextField outputWordField = new TextField(
"User pastes output words here",50);
TextField operMsgField = new TextField(
"User instructions appear here. " +
"Press Next to process first message.",50);
TreeSet subjWordList;
String[] dirList;
int fileCounter = 0;
String dataPath = "./temp/";
File dataDir = new File(dataPath);
String msgToUser =
"\nPost phrases for this message.\n" +
"Then press Next to process next message.";

public static void main(String[] args){
BigDog02m thisObj = new BigDog02m();
thisObj.makeSubjWordList();
}//end main
//===========================================//

BigDog02m(){//constructor
//Register a window listener to service
// the close button on the Frame. This is
// an anonymous class definition.
this.addWindowListener(
new WindowAdapter(){
public void windowClosing(WindowEvent e){
//Write the updated word list stored in
// a TreeSet object to an output file
// on shutdown. It is also written
// when you click the Next button and
// there are no remaining files to be
// processed.
writeSubjWordList();
System.exit(0);
}//end windowClosing
}//end WindowAdapter()
);//end addWindowListener

setLayout(new FlowLayout());

//Register an ActionListener on the
// nextButton.
nextButton.addActionListener(
new ActionListener(){
public void actionPerformed(
ActionEvent e){

//Protect against ArrayIndexOutOfBounds
if((fileCounter >= 0) &&
(fileCounter < dirList.length)){

if(fileCounter ==
(dirList.length - 1)){
//The user clicked the Next button
// but there are no more files.
//Write the modified word list
// stored in the TreeSet object to
// an output file. This also
// happens when the user clicks the
// close button on the Frame later,
// but this write operation is
// provided here just in case the
// user terminates without pressing
// the close button. The user can
// post additional words to the
// TreeSet object after this write
// operation occurs. That is why
// an additional write operation
// occurs when the user presses
// the close button.
writeSubjWordList();
msgToUser = "\n\nNo more messages."
+ "\nPost phrases for this "
+ "message.\n Then press "
+ "close to terminate.";
//Disable the Next button so that
// the user cannot fire any more
// events of this type.
nextButton.setEnabled(false);
}//end if no more messages

//Identify the file being processed
textArea.setText("Processing " +
dirList[fileCounter] + "\n");

//Provide instructions to the user.
operMsgField.setText("Paste a phrase"
+ " in the output field and press "
+ "Post. Post as many new phrases "
+ "as you want. Press next to "
+ "process next message.");
outputWordField.setText("Paste "
+ "output phrase here and then "
+ "press Post.");

try{
//Open the file containing a local
// copy of the message.

inData = new BufferedReader(
new FileReader(dataPath
+ dirList[fileCounter]));

String data; //temp holding area

//Precondition the display of
// Subject in the GUI by skipping
// header lines prior to the
// Subject line. Mark the beginning
// of the file with a
// readAheadLimit of 10000
// characters. Reading further
// than that may invalidate the
// mark.
inData.mark(10000);
//Some messages may not contain a
// Subject or From line. Don't
// want the old one to continue to
// be visible in the GUI.
subjField.setText(
"No Subj line found yet");
fromField.setText(
"No From line found yet");
while((data = inData.readLine())
!= null){
//A null result indicates end of
// file.

//Trap the Subject line, decode
// if necessary, convert it to
// upper case, and display it in
// a field on the GUI.
if(data.startsWith("Subject:")){

data = decodeSubj(data);
subjField.setText(
data.toUpperCase());
break;//No need to keep reading
}//end if(data.startsWith("Subj..
}//end while loop on null

//Reset back to beginning of file.
// The Subject for this message is
// now showing in the GUI.
inData.reset();

//Precondition the display of From
// line in the GUI by skipping
// header lines prior to the From
// line. Code is similar to that
// discussed above.
while((data = inData.readLine())
!= null){
if(data.startsWith("From:")){
fromField.setText(
data.toUpperCase());
break;
}//end if
}//end while loop on null

//Reset back to beginning of file.
// The From line for this message
// is now showing in the GUI. Read
// and display the entire file.
// This data is displayed for
// information purposes only to help
// the user decide what to do in
// terms of updating the word list
// used by BigDog02i or BigDog02j
// for processing the Subject line.
inData.reset();

//Start by reading the entire
// message into a single upper case
// String object with no line
// breaks. Limit the size of the
// file that the program is willing
// to read.
int lineLimit = 500;
int lineCount = 0;
inData.reset();//rewind input
String msgString = "";
while(((data = inData.readLine())
!= null) && ++lineCount
< lineLimit){
msgString += data + "\n";
}//end while data != null

if(lineCount == lineLimit){
System.out.println(
dirList[fileCounter]
+ " terminated, excessive "
+ "length");
}//end if(lineCount == lineLimit)

//Expand base64 data in msg body.
msgString = decodeBody(msgString).
toUpperCase();
msgString = removeNewLine(
msgString);

//Get and display embedded email
// addresses
String emailString = getEmailAddrs(
msgString);

//Get the HTML as a single string
String cleanString =
getCleanHtmlString(msgString);

String msgToUser =
"Initial msgToUser";
if(cleanString != null){
msgToUser =
"This is clean HTML\n";
msgString = cleanString;
}//end if(cleanString != null)
else{//cleanString == null
msgToUser = "No HTML found, this"
+ " is raw text\n";
}//end cleanString == null

//Display on multiple lines
int lineLen = 90;
int cnt = 0;
for(cnt = 0;
cnt < (msgString.length())
/lineLen;
cnt++){
textArea.append(
msgString.substring(
lineLen*cnt,
lineLen*cnt+lineLen) + "\n");
}//end for loop
//Display remaining characters
textArea.append(
msgString.substring(
lineLen*(cnt-1)+lineLen)
+ "\n");
textArea.append("\n" + msgToUser
+ "\n");

}catch(Exception ex){
ex.printStackTrace();}

//Increment the fileCounter so that
// the next time the Next button
// fires an ActionEvent, the next
// file in the directory listing will
// be processed.
fileCounter++;

}//end if on fileCounter in bounds
else{
//File counter out of bounds. This
// happens if you delete all the
// files.
textArea.setText(
"No more files. Press Close to "
+ "terminate.");
nextButton.setEnabled(false);
}//end else counter is out of bounds
}//end actionPerformed
}//end ActionListener
);//end addActionListener

//Register an object of the following
// anonymous class on both the Post button
// and the outputWordField. That way, the
// contents of the outputWordField can be
// posted to the new word list by either
// clicking the Post button, or pressing the
// Enter key when the outputWordField has the
// focus.
ActionListener postListener =
new ActionListener(){
public void actionPerformed(
ActionEvent e){
//Get the word or phrase from the field
// and add it to the TreeSet object.
String tempWord =
outputWordField.getText();
subjWordList.add(tempWord);

//Provide feedback to confirm that it
// has been posted. This tells the
// user that she is free to post
// another word if she desires.
outputWordField.setText(
tempWord + " posted");
}//end actionPerformed
};//end ActionListener

//Register the ActionListener object on
// the two source objects.
postButton.addActionListener(postListener);
outputWordField.addActionListener(
postListener);

//Register an ActionListener on the
// copyButton to copy selected text to the
// outputWordField. First tries to copy
// selected text from the Subject. If that
// produces an empty string, tries to copy
// selected text from the text area. There
// must not be any text selected in the
// Subject in order to copy selected text
// from the text area.
copyButton.addActionListener(
new ActionListener(){
public void actionPerformed(
ActionEvent e){
String selected =
subjField.getSelectedText();

if(selected.equals("")){
selected =
textArea.getSelectedText();
}//end if(selected.equals(""))
outputWordField.setText(selected);
}//end actionPerformed
}//end new ActionListener
);//end addActionListener

//Register an ActionListener on the Delete
// button to make it possible for the
// user to remove a file from the local
// directory.
deleteButton.addActionListener(
new ActionListener(){
public void actionPerformed(
ActionEvent e){
//Delete the local file currently being
// displayed in the GUI. Must subtract
// one from the value of the file
// counter to cause it to reference the
// current file because it has already
// been incremented by the event
// handler for the Next button in
// preparation for processing the next
// file.

//Create a File object that represents
// the current file.
File tempFile = new File(dataPath +
dirList[fileCounter-1]);

if(tempFile.exists()){
try{
inData.close();
}catch(Exception ex){
ex.printStackTrace();}
tempFile.delete();//Delete the file
}//end if

//Fire a synthetic event on the Next
// button to cause the program to
// process the next file in the
// directory listing without user
// interaction.
Toolkit.getDefaultToolkit().
getSystemEventQueue().
postEvent(new ActionEvent(
nextButton,
ActionEvent.
ACTION_PERFORMED,
"Next"));
}//end actionPerformed
}//end ActionListener
);//end addActionListener

//Configure the GUI by placing the various
// components on it.
add(copyButton);
add(postButton);
add(nextButton);
add(deleteButton);
add(fromField);
add(subjField);
add(outputWordField);
add(operMsgField);
add(textArea);
setTitle("Copyright 2004, R.G.Baldwin");
//Will need to make the GUI narrower in order
// to create the figures for publication.
setSize(400,400);
//Make the GUI visible.
setVisible(true);

//The following code creates a directory
// listing containing only those files that
// start with +OK.
dirList = dataDir.list(
new FilenameFilter(){
public boolean accept(
File dir,String name){
if(!(new File(dir,name).
isFile())) return false;
return name.startsWith("+OK");
}//end accept
}//end FilenameFilter
);//end list

//Create a message in the text area at
// startup showing the list of files in the
// directory that are available for
// processing.
this.textArea.append("Files to be processed"
+ "\n");
//Display the list of files
for(int cnt = 0;cnt < dirList.length;cnt++){
this.textArea.append(dirList[cnt] + "\n");
}//end for loop

}//end constructor
//===========================================//

//Purpose: To create a TreeSet object
// containing words used to filter the message
// subject lines in the programs named
// BigDog02i and BigDog02j.
//This method reads strings from a text file
// named BigDog02SubjAndHtml.txt and creates
// the list as a TreeSet object sorted in
// natural order with no duplicates.
//After creating the list, it writes the data
// from the list into a backup file named
// BigDog02SubjAndHtml.bakN, where N is the
// value of the next available file name in
// the directory.
//A new backup file with a unique name is
// created each time the program is run. Once
// the number of backup files reaches 5, the
// program automatically deletes the oldest
// file before creating a new backup
// file. Thus the program automatically
// maintains a sequence of five backup files
// with extensions .bak0 through .bak5 with one
// number missing. The age-order of the files
// should be determined by the modification date
// and not by the name of the file.
//The data read from the file is converted to
// upper case before being added to the TreeSet
// object.

void makeSubjWordList(){
subjWordList = new TreeSet();

//Read words or phrases from text file and
// populate the TreeSet object.
try{
BufferedReader inData
= new BufferedReader(new FileReader(
"BigDog02SubjAndHtml.txt"));
String data; //temp holding area

while((data = inData.readLine()) != null){
subjWordList.add(data.toUpperCase());
}//end while loop

inData.close();//Close input file

//Write a backup file before making any
// modifications to the data.

//First determine the name of the next
// backup file allowed in the directory.
int N = 0;
File theFile = null;
for(N = 0;N < 6;N++){
theFile = new File(
"BigDog02SubjAndHtml.bak" + N);
if(!(theFile.exists()))break;
}//end for loop

//Cause N to rotate from 0 through 5
if(N == 5){//del file 0 for use next time
new File("BigDog02SubjAndHtml.bak0").
delete();
}//end if
else{//delete the next file in sequence
if(new File(
"BigDog02SubjAndHtml.bak"
+ (N + 1)).exists()){
new File(
"BigDog02SubjAndHtml.bak" +
(N + 1)).delete();
}//end if
}//end else

//Now write the output file
DataOutputStream dataOut =
new DataOutputStream(
new FileOutputStream(
theFile));

//Use an Iterator object to access the data
// in the TreeSet object.
Iterator iter = subjWordList.iterator();

while(iter.hasNext()){
data = (String)iter.next();
dataOut.writeBytes(data + "\n");
}//end while

dataOut.close();
}catch(Exception e){e.printStackTrace();}
}//end makeSubjWordList
//===========================================//

//Purpose: To write the data from a TreeSet
// object into a file named
// BigDog02SubjAndHtml.txt that is used in the
// programs named BigDog02i or BigDog02j to
// filter the message subject lines.
//This method is the reverse of the method
// named makeSubjWordList.

void writeSubjWordList(){
try{
DataOutputStream dataOut =
new DataOutputStream(
new FileOutputStream(
"BigDog02SubjAndHtml.txt"));

//Use an iterator to access the data in
// the TreeSet object.
Iterator iter = subjWordList.iterator();
String data;

while(iter.hasNext()){
data = (String)iter.next();
dataOut.writeBytes(data + "\n");
}//end while

dataOut.close();
}catch(Exception e){e.printStackTrace();}
}//end writeSubjWordList
//===========================================//

//Replaces each newline character in an incoming
// String object with a space. Returns null if
// the resulting string is empty.
String removeNewLine(String msgString){
StringBuffer stringBuf =
new StringBuffer(msgString);
int index = 0;
while(index > -1){
index = stringBuf.lastIndexOf("\n");
if(index > -1){
stringBuf.delete(index,index+1);
stringBuf.insert(index," ");
}//end if
}//end while
String outputString = new String(stringBuf);
if(outputString.equals("")){
return null;
}else{
return outputString;
}//end else
}//end removeNewLine()
//===========================================//

//This method is called to decode a Subject
// line.

//Sometimes the Subject line is encoded using
// techniques designed to allow the use of
// non-ASCII characters in message headers
// (See RFC2047).
//The following code determines if the Subject
// line has been encoded using the ISO-8859-1
// character set with an encoding value of B or
// Q. If so, the encoded material is decoded.
//Messages with an encoding value of Q contain
// a mixture of ASCII characters and encoded
// characters, so it is possible to partially
// read them without the need for decoding.
// They also sometimes use an underscore in
// place of a space to make them more readable.
private String decodeSubj(String data){
try{
if(data.toUpperCase().indexOf(
"=?ISO-8859-1?B?") != -1){
//Need to decode for value of B.
int startIndex = data.toUpperCase().
indexOf("=?ISO-8859-1?B?") + 15;
int endIndex = data.length()-2;
sun.misc.BASE64Decoder dec =
new sun.misc.BASE64Decoder();
data = "Subject: " + "=?ISO-8859-1?B? "
+ new String(dec.decodeBuffer(
data.substring(startIndex,endIndex)));
}//end if..."=?ISO-8859-1?B?"

if(data.toUpperCase().indexOf(
"=?ISO-8859-1?Q?") != -1){
//Need to decode for value of Q.
int startIndex = data.toUpperCase().
indexOf("=?ISO-8859-1?Q?") + 15;
int endIndex = data.length()-2;
String decodedData = data.substring(
startIndex,endIndex);

//Decode non-ASCII characters
StringBuffer stringBuf =
new StringBuffer(decodedData);
int index = 0;
while(index > -1){
index = stringBuf.lastIndexOf("=");
if(index > -1){
String hexString =
new String(stringBuf).substring(
index+1,index+3);
char decodedChar =
(char)Integer.parseInt(
hexString.trim(),16);
stringBuf.delete(index,index+3);
stringBuf.insert(index,decodedChar);
}//end if
}//end while(index > -1)

//Replace underscore with space.
index = 0;
while(index > -1){
index = stringBuf.lastIndexOf("_");
if(index > -1){
stringBuf.deleteCharAt(index);
stringBuf.insert(index,' ');
}//end if
}//end while(index > -1)

data = "Subject: " +"=?ISO-8859-1?Q? "
+ new String(stringBuf);
}//end if..."=?ISO-8859-1?Q?"
}catch(Exception ex){ex.printStackTrace();}
return data;
}//end decodeSubj
//===========================================//

//Expand base64 data in msg body.
private String decodeBody(String data){
String decodedData = "";
int currentPartIndex;
int nextPartIndex;
try{
if(data.toUpperCase().indexOf(
"Content-Transfer-Encoding: base64".
toUpperCase()) != -1){
//This message has base64 encoding
if((data.toUpperCase().indexOf(
"Content-Type: text/html".
toUpperCase()) != -1)
&& (data.toUpperCase().indexOf(
"Content-Type: multipart".
toUpperCase()) == -1)){
//This is a non-multipart message with
// base64 encoding.
//Locate the end of the header.
int base64Index = data.indexOf(
"Status:");
if(base64Index != -1){
int crIndex = data.indexOf(
"\n",base64Index);
String tempStr = data.substring(
crIndex+2,data.length());
sun.misc.BASE64Decoder dec =
new sun.misc.BASE64Decoder();
decodedData = "Start base64 "
+ new String(
dec.decodeBuffer(tempStr))
+ " End base64";
}//end if(base64Index != -1)
}//end if((data.toUpperCase().indexOf(...
else{
int boundaryIndex = data.indexOf(
"boundary=");
int newLineIndex = data.indexOf(
"\n",boundaryIndex);

if(boundaryIndex != -1){
String multipartCode =
data.substring(
boundaryIndex+10,newLineIndex-1);
nextPartIndex = data.indexOf(
multipartCode,newLineIndex+1);
while(nextPartIndex != -1){
int base64Index = data.indexOf(
"Content-Transfer-Encoding: "
+ "base64",nextPartIndex);
currentPartIndex = nextPartIndex;
nextPartIndex = data.indexOf(
multipartCode,nextPartIndex+1);
if((base64Index != -1) &&
(base64Index < nextPartIndex)){

//Don't process .gif or .jpg file
// attachments
String partBody = data.substring(
currentPartIndex,
nextPartIndex).toUpperCase();
if((partBody.indexOf(".GIF")
== -1) && (partBody.indexOf(
".JPG") == -1)){
//gif image not found. Process
// the data
int crIndex = data.indexOf(
"\n",base64Index);

//Search for the required blank
// line preceding the block
// of base64 data
//Prevent infinite loop on bad
// data
int count = 0;
char firstChar = data.charAt(
crIndex+1);
while((firstChar != '\n')
&& (count < 100)){
crIndex = data.indexOf(
"\n",crIndex+1);
firstChar = data.charAt(
crIndex+1);
count++;
}//end while

String tempStr =
data.substring(
crIndex+2,nextPartIndex);
sun.misc.BASE64Decoder dec =
new sun.misc.BASE64Decoder();
decodedData += new String(
dec.decodeBuffer(tempStr));
decodedData += "\n\n-----End "
+ "base64 part-----\n\n";
}//end if(partBody.toUpperCa...
else{
decodedData += "-----Image "
+ "stripped off-----";
}//end else
}//end if(base64Index != -1)
else{
if(nextPartIndex != -1){
decodedData += data.substring(
currentPartIndex,
nextPartIndex);
decodedData += "\n\n-----End "
+ "non-base64 part-----\n\n";
}//end if(nextPartIndex != -1)
}//end else
}//end while loop on nextPartIndex...
}//end if(boundaryIndex != -1)
}//end else
return decodedData;
}//end if(data.toUpperCase().indexOf("Co...
else{
//This msg does not have base64 encoding
return data;
}//end else
}catch(Exception ex){ex.printStackTrace();}
return "Make Compiler Happy";
}//end decodeBody
//===========================================//

//This method receives an incoming string. It
// searches the string for all occurrences of
// the @ character. When it finds an @
// character, it extracts the substring that
// includes that character along with 50
// previous and 15 following characters. It
// appends a \n to the substring and appends
// it to an output string.
//The purpose is to return a string containing
// concatenated substrings, each of which
// probably contains an Email address.
//If it doesn't find any @ characters, it
// returns null.
private String getEmailAddrs(String data){
String dataOut = "";
int index = data.indexOf("@");
if(index == -1) return null;
while(index != -1){
//Clamp the end of the substring so that an @
// character near the end of the data does
// not cause a
// StringIndexOutOfBoundsException.
int endIndex = Math.min(index + 15,
data.length());
if(index > 50){
//Eliminate as much non-ASCII data as
// possible by testing following
// characters for non-ASCII values
if((index + 6 < data.length()) &&
(data.charAt(index+1) < 126) &&
(data.charAt(index+2) < 126) &&
(data.charAt(index+3) < 126) &&
(data.charAt(index+4) < 126) &&
(data.charAt(index+5) < 126) &&
(data.charAt(index+6) < 126)
){
dataOut += data.substring(index - 50,
endIndex) + "\n";
}//end if
}else{
dataOut += data.substring(0,endIndex)
+ "\n";
}//end else
index = data.indexOf("@",index+1);
}//end while loop
return dataOut;
}//end getEmailAddrs
//===========================================//

private String getCleanHtmlString(
String msgString){
String cleanString = removeTags(msgString);

if(cleanString != null){
cleanString = repNbsp(cleanString);
if(cleanString != null){
cleanString = remEntities(cleanString);
if(cleanString != null){
cleanString = remEquals(cleanString);
if(cleanString != null){
cleanString = remTabs(cleanString);
if(cleanString != null){
cleanString = remMultipleSpaces(
cleanString);
}//end if(cleanString != null){
}//end if(cleanString != null){
}//end if(cleanString != null){
}//end if(cleanString != null){
}//end if(cleanString != null)

return cleanString;
}//end method getCleanHtmlString
//===========================================//

//This method determines if a message
// contains HTML and removes all tags. If
// there is no HTML in the message text, it
// returns null.
private String removeTags(String msgString){
int isHtml = -1;
int startIndex = -1;
int endIndex = -1;

//Search for clues that the message
// contains HTML.
isHtml = msgString.indexOf("<HTML");
if(isHtml == -1) isHtml =
msgString.indexOf("<BODY");
if(isHtml == -1) isHtml =
msgString.indexOf("<FONT");
if(isHtml == -1) isHtml =
msgString.indexOf("<DIV");
if(isHtml == -1) isHtml =
msgString.indexOf("<STRONG");
if(isHtml == -1) isHtml =
msgString.indexOf("<BR");
if(isHtml == -1) isHtml =
msgString.indexOf("<TABLE");
if(isHtml == -1) isHtml =
msgString.indexOf("<SPAN");
if(isHtml == -1) isHtml =
msgString.indexOf("<UL");
if(isHtml == -1) isHtml =
msgString.indexOf("<OL");
if(isHtml == -1) isHtml =
msgString.indexOf("<P>");

if(isHtml != -1){
//Msg contains HTML but not in very good
// form since it is missing the matching
// HTML tags.

//Eliminate as much of the header as
// possible by finding the location of
// the last identifiable item in the
// message header and discarding
// everything prior to that point.

int tempIndex = -1;
startIndex = -1;
String line = "";

//Create an array of valid header lines.
String[] headerLines =
{"STATUS:",
"X-MAILSCANNER:",
"X-MAILSCANNER-INFORMATION:",
"X-MSMAIL-PRIORITY:",
"X-PRIORITY:",
"X-MAILER:",
"DATE:",
"SUBJECT:",
"REPLY-TO:",
"FROM:",
"MESSAGE-ID:",
"RECEIVED:"
};//end array definition

for(int cnt = 0;
cnt < headerLines.length;cnt++){
tempIndex = msgString.lastIndexOf(
headerLines[cnt]);
if(tempIndex > startIndex){
//Save the larger index value
startIndex = tempIndex;
//Save corresponding header line
line = headerLines[cnt];
}//end if
}//end for loop

if(startIndex != -1){
//Use that header line to eliminate
// everything prior from the message
// header.
msgString = msgString.substring(
startIndex);
}//end if(startIndex != -1)
}//end if(isHtml != -1)

//Process the string if it contains HTML
if(isHtml != -1){
//msgString has been determined to contain
// HTML.
//Insert a dummy first character to ensure
// that the first character is not the
// beginning of a tag.
msgString = "X" + msgString;
int leftIndex=0;
int rightIndex=0;
String outputString = "";
while(leftIndex != -1){
leftIndex = msgString.indexOf(
'<',rightIndex);
if((leftIndex != -1) &&
(rightIndex != -1)){
outputString += msgString.substring(
rightIndex+1,leftIndex);
rightIndex = msgString.indexOf(
'>',leftIndex);
//Have to deal with missing > char,
// particularly for truncated messages.
if(rightIndex >
(msgString.length() - 2)){
//Don't try to process the last few
// characters when left and right
// angle brackets don't match.
break;
}//end if(rightIndex > (msgString...
if(rightIndex == -1){
//Create an artificial right angle
// bracket to replace the missing
// one.
rightIndex = leftIndex + 1;
}//end if(rightIndex == -1)
}//end ((leftIndex != -1) && ...
}//end while(leftIndex != -1)
//Get text at the tail end.
if((rightIndex + 1) < msgString.length()){
outputString += msgString.substring(
rightIndex+1);
}//end if
if(outputString.equals("")){
//msgString contained HTML, but it was
// all removed in the cleanup process.
// The output string is empty.
return null;
}else{
//Return the string produced by removing
// HTML material from msgString.
return outputString;
}//end else
}//end if(isHtml != -1)
else{
//Apparently msgString doesn't contain
// HTML.
return null;
}//end else
}//end removeTags
//===========================================//

//Purpose of this method is to replace all
// occurrences of "&NBSP;" with " "
private static String repNbsp(
String msgString){
StringBuffer stringBuf =
new StringBuffer(msgString);
int index = 0;
while(index > -1){
index = stringBuf.lastIndexOf("&NBSP;");
if(index > -1){
stringBuf.replace(index,index+6," ");
}//end if
}//end while
String outputString = new String(stringBuf);
if(outputString.equals("")){
return null;
}else{
return outputString;
}//end else
}//end repNbsp()
//===========================================//

//Removes entities from an HTML body identified
// by the string &...; Converts those entities
// that represent English language characters
// and punctuation (32 - 126) to the
// corresponding character and inserts it into
// the message text.
private static String remEntities(
String msgString){

//Insert a dummy first character
msgString = "X" + msgString;
int leftIndex=0;
int rightIndex=0;
String outputString = "";
while(leftIndex != -1){
leftIndex = msgString.indexOf(
'&',rightIndex);
if((leftIndex != -1) &&
(rightIndex != -1)){

if(leftIndex > rightIndex){
outputString += msgString.substring(
rightIndex+1,leftIndex);
}//end if
rightIndex = msgString.indexOf(
';',leftIndex);

if((leftIndex != -1) &&
(rightIndex != -1)){
String extract = msgString.substring(
leftIndex,rightIndex + 1).
toUpperCase();

//Make sure we didn't extract good text
// by accident. Treat any extract longer
// than six characters, as in &#nnn;, as
// ordinary text rather than an entity.
// Remove spaces before making the test.
if(remSpaces(remEquals(
remTabs(extract))).length() > 6){
//Apparently not an entity. Put it
// back in the text.
outputString += extract;
}//end if(rightIndex-leftIndex > 6)
else{
//Remove any spaces prior to further
// processing
extract = remSpaces(remEquals(
remTabs(extract)));
}//end else

//Convert English language character
// entities to characters and insert
// them in the text.
//Don't try to restore HEX
// representations at this time. Maybe
// add that later. Ignore extracted
// sequences longer than six
// characters.
try{
if((extract.charAt(1) == '#') &&
(extract.charAt(2) != 'X') &&
(extract.length() <=6)){
//Get the internal characters of
// the entity.
String strValue = extract.
substring(2,extract.length()-1);
//Try to convert to a numeric char
// type. May throw an exception.
char theChar =
(char)Integer.parseInt(strValue);
//Ignore all but English language
// characters and punctuation.
if((theChar >= 32) &&
(theChar <= 126)){
char[] charArray = {theChar};
String theStr = new String(
charArray).toUpperCase();
outputString += theStr;
}//end ((theChar >= 32) && ...
}//end ((extract.charAt(1) == ..
}catch(NumberFormatException ex){
//Ignore it. It is apparently a
// badly formed entity.
}//end catch
}//end if((leftIndex != -1) && ...


//Have to deal with missing ; char.
if(rightIndex >
(msgString.length() - 2)){
//Don't try to process the last few
// characters when the & and ; characters
// aren't matching.
break;
}//end if(rightIndex > (msgString...
if(rightIndex == -1){
//Create an artificial right ; char
// to replace the missing one.
rightIndex = leftIndex + 1;
}//end if(rightIndex == -1)
}//end if((leftIndex != -1)...
}//end while(leftIndex != -1)
//Get the text at the tail end
if((rightIndex + 1) < msgString.length()){
outputString += msgString.substring(
rightIndex+1);
}//end if((rightIndex + 1)< msgString...

if(outputString.equals("")){
//The entire string was apparently made up
// of entities. It is empty now.
return null;
}else{
return outputString;
}//end else
}//end remEntities
//===========================================//

//Method removes all '=' characters from an
// incoming string.
private static String remEquals(
String msgString){
StringBuffer stringBuf =
new StringBuffer(msgString);
int index = 0;
while(index > -1){
index = stringBuf.lastIndexOf("=");
if(index > -1){
stringBuf.delete(index,index+1);
}//end if
}//end while
String outputString = new String(stringBuf);
if(outputString.equals("")){
return null;
}else{
return outputString;
}//end else
}//end remEquals()
//===========================================//

//Method removes tab characters from an
// incoming string.
private static String remTabs(
String msgString){
StringBuffer stringBuf =
new StringBuffer(msgString);
int index = 0;
while(index > -1){
index = stringBuf.lastIndexOf("\t");
if(index > -1){
stringBuf.delete(index,index+1);
}//end if
}//end while
String outputString = new String(stringBuf);
if(outputString.equals("")){
return null;
}else{
return outputString;
}//end else
}//end remTabs()
//===========================================//

//Method removes space characters from an
// incoming string.
private static String remSpaces(
String msgString){
StringBuffer stringBuf =
new StringBuffer(msgString);
int index = 0;
while(index > -1){
index = stringBuf.lastIndexOf(" ");
if(index > -1){
stringBuf.delete(index,index+1);
}//end if
}//end while
String outputString = new String(stringBuf);
return outputString;
}//end remSpaces()
//===========================================//

//Method converts all multiple spaces to a
// single space. This is not ideal. If there
// are multiple spaces within a word, all but
// one of the spaces will be removed, leaving
// one extraneous space in the word.
private static String remMultipleSpaces(
String msgString){
StringBuffer stringBuf =
new StringBuffer(msgString);
int index = 0;
while(index > -1){
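//Search for two adjacent spaces. Deleting
// one of them each time collapses a run of
// spaces into a single space.
index = stringBuf.lastIndexOf("  ");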
if(index > -1){
stringBuf.delete(index,index+1);
}//end if
}//end while
String outputString = new String(stringBuf);
if(outputString.equals("")){
return null;
}else{
return outputString;
}//end else
}//end remMultipleSpaces()
//===========================================//

}//end class BigDog02m
//=============================================//

Listing 6
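
The decodeSubj and decodeBody methods in the listing above depend on sun.misc.BASE64Decoder, which is an internal JDK class and may not be available in newer Java releases. The following is a minimal sketch, assuming Java 8 or later, of how a B-encoded ISO-8859-1 Subject line might be decoded with the standard java.util.Base64 class instead. The class name SubjectDecodeSketch and the method names are hypothetical and are not part of the BigDog programs. Unlike decodeSubj, this sketch looks for the closing ?= marker rather than assuming that it occupies the last two characters of the line.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class SubjectDecodeSketch{

  //Marker for an RFC 2047 encoded word that uses
  // ISO-8859-1 with an encoding value of B.
  static final String MARKER = "=?ISO-8859-1?B?";

  //Case-insensitive search that does not change
  // the length of the string being searched.
  static int indexOfIgnoreCase(String s,
                               String target){
    for(int i = 0;
        i + target.length() <= s.length();i++){
      if(s.regionMatches(true,i,target,0,
                         target.length())){
        return i;
      }//end if
    }//end for loop
    return -1;
  }//end indexOfIgnoreCase

  //Decodes the first B-encoded word in the line.
  // Returns the line unchanged if there is no
  // marker, no closing ?=, or the encoded
  // material is not valid base64.
  static String decodeBSubject(String line){
    int start = indexOfIgnoreCase(line,MARKER);
    if(start == -1) return line;
    int payloadStart = start + MARKER.length();
    int end = line.indexOf("?=",payloadStart);
    if(end == -1) return line;
    try{
      byte[] raw = Base64.getDecoder().decode(
                line.substring(payloadStart,end));
      return line.substring(0,start)
          + new String(raw,
                     StandardCharsets.ISO_8859_1)
          + line.substring(end + 2);
    }catch(IllegalArgumentException ex){
      return line;//badly formed base64
    }//end catch
  }//end decodeBSubject

  public static void main(String[] args){
    System.out.println(decodeBSubject(
      "Subject: =?ISO-8859-1?B?SGVsbG8gV29ybGQ=?="));
  }//end main
}//end class SubjectDecodeSketch
```

Returning the original line when decoding fails keeps the behavior close to that of decodeSubj, which also returns its input when an exception occurs.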

File BigDog02SpamScreen01
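
The class in this listing contains a method named matchPhrase that detects a word even when a spammer has inserted extraneous characters into it, as in VI*A-GRA. The following is a minimal sketch of that general idea, not a copy of the author's method. It uses a direct scan with backtracking rather than the filter-and-measure approach used in matchPhrase, and it allows any character to appear in a gap. The class name GappedMatchSketch and the method names are hypothetical.

```java
public class GappedMatchSketch{

  //Returns true if phrase characters k and beyond
  // can be found after position pos, with no more
  // than spanLim extraneous characters between
  // consecutive phrase characters.
  static boolean matchFrom(String data,int pos,
                           String phrase,int k,
                           int spanLim){
    if(k == phrase.length()) return true;
    int limit = Math.min(data.length() - 1,
                         pos + spanLim + 1);
    for(int next = pos + 1;next <= limit;next++){
      if(data.charAt(next) == phrase.charAt(k)
          && matchFrom(data,next,phrase,k + 1,
                       spanLim)){
        return true;
      }//end if
    }//end for loop
    return false;
  }//end matchFrom

  //Returns the index in data at which a gapped
  // match of phrase begins, or -1 if there is no
  // match.
  static int gappedIndexOf(String data,
                           String phrase,
                           int spanLim){
    if(phrase.isEmpty()) return -1;
    for(int i = 0;i < data.length();i++){
      if(data.charAt(i) == phrase.charAt(0)
          && matchFrom(data,i,phrase,1,spanLim)){
        return i;
      }//end if
    }//end for loop
    return -1;
  }//end gappedIndexOf

  public static void main(String[] args){
    System.out.println(gappedIndexOf(
                  "BUY VI*A-GRA NOW","VIAGRA",1));
    System.out.println(gappedIndexOf(
                          "IMPORTANT","PORN",1));
    System.out.println(gappedIndexOf(
                          "IMPORTANT","PORN",2));
  }//end main
}//end class GappedMatchSketch
```

With a spanLim value of 1, the sketch finds VIAGRA in the first string but does not find PORN in IMPORTANT. With a spanLim value of 2, it does find PORN in IMPORTANT, which illustrates why a large spanLim value tends to produce false alarms.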

/*File BigDog02SpamScreen01.java
Copyright 2004, R.G.Baldwin
Rev 02/27/04

This class implements a set of rules for
detecting SPAM messages. An int value is
returned showing the number of hits against
offensive words and phrases that occur up
to a specified hitLimit. The subject and
a clean version of HTML content is screened
against one list of offensive words and
phrases. Raw body text is screened against a
different list of offensive words and phrases.
The process of screening raw body text tends
to be rather slow so care should be taken to
keep that list short.

An object of this class has one entry point and
one exit point, which is the public method
named screenMsg.
************************************************/

import java.util.*;
import java.io.*;

public class BigDog02SpamScreen01{

TreeSet subjAndHtmlPhraseList;
TreeSet rawTextPhraseList;
String subjAndHtmlPhraseFile;
String rawTextPhraseFile;
String phrase;
String dataPath;
int hitLimit;


public BigDog02SpamScreen01(String dataPath,
String subjAndHtmlPhraseFile,
String rawTextPhraseFile,
int hitLimit){

this.dataPath = dataPath;
this.subjAndHtmlPhraseFile =
subjAndHtmlPhraseFile;
this.rawTextPhraseFile = rawTextPhraseFile;
this.hitLimit = hitLimit;
//Read the files containing words and
// phrases and create TreeSet objects
// containing those words and phrases in
// alphabetical order with no duplicates.
makeSubjAndHtmlPhraseList();
makeRawTextPhraseList();
}//end constructor
//===========================================//

//This method is used to identify spam
// messages and to return an int value that
// indicates the number of hits against
// offensive words and phrases up to a limit
// of hitLimit.

public int screenMsg(String pathFileName){
BufferedReader inData = null;
int hitCount = 0;

try{
//Open the file containing a local copy of
// the message.
inData = new BufferedReader(new FileReader(
pathFileName));
String data;

//Get the Subject line by skipping header
// lines prior to the Subject line. Mark
// the beginning of the file to make it
// easy to rewind later. Set the readAhead
// Limit to 150000 characters before the
// mark will be lost. Limit the size of the
// file that the program is willing to
// read.
inData.mark(150000);
int lineLimit = 1000;
int lineCount = 0;
String subject = "No subj found";

while(((data = inData.readLine())!= null)
&& ++lineCount < lineLimit){
if(data.toUpperCase().startsWith(
"SUBJECT:")){
subject = decodeSubj(data);
}//end if(data starts with SUBJECT)
}//end while readLine != null

//Reset back to beginning of file. The
// Subject for this message has now been
// saved.
inData.reset();

//Screen the Subject line against a list of
// offensive words and phrases.
hitCount = screenForOffensiveSubject(
hitCount,subject);
if(hitCount >= hitLimit){
inData.close();
return hitCount;
}//end if(hitCount >= hitLimit)

//Screen HTML (if any) for offensive words
// and phrases.
//Start by reading the entire message into
// a single upper case String object.
// Limit the size of the file that the
// program is willing to read to avoid
// excessive delays in screening very large
// files.
lineCount = 0;
inData.reset();//rewind the input stream
String msgString = "";
while(((data = inData.readLine())!= null)
&& ++lineCount < lineLimit){
msgString += data + "\n";
}//end while data != null

if(lineCount == lineLimit){
System.out.println(pathFileName +
" terminated, excessive length");
}//end if(lineCount == lineLimit)

//Expand base64 data in msg body.
msgString = decodeBody(msgString).
toUpperCase();
msgString = removeNewLine(msgString);

//Screen the HTML portion of the string for
// offensive words and phrases.
hitCount = screenForOffensiveHtml(
msgString,hitCount);
if(hitCount >= hitLimit){
inData.close();
return hitCount;
}//end hitCount >= hitLimit)

//Screen the raw body text for offensive
// words or phrases. This is last in the
// sequence because it probably takes the
// longest amount of time to accomplish.
hitCount = screenForOffensiveRawText(
msgString,hitCount);
if(hitCount >= hitLimit){
inData.close();
return hitCount;
}//end if (hitCount >= hitLimit)

inData.close();
return hitCount;//with hitCount < hitLimit
}catch(Exception e){e.printStackTrace();}
return hitCount;//make compiler happy
}//end screenMsg
//===========================================//

//This method tests a string to see if it
// contains a word or phrase that may have
// extraneous characters inserted into it,
// such as VI*A-GRA.
//If the string contains the sequence of
// characters making up the word or phrase,
// with spanLim or fewer extraneous characters
// between any two of the word's characters,
// the method returns the index at which the
// match begins. Otherwise, it returns -1.
// For example, if
// spanLim = 1, the spammer can insert one
// character between any two of the characters
// that make up the word and the word will
// still be detected. However, if the
// spammer inserts two or more characters,
// the offending word will not be detected.
//Need to be careful to avoid making spanLim
// too large. Large values of spanLim result
// in false alarms due to the fact that
// widely-separated characters can be
// considered to be part of the word or
// phrase. For example, if spanLim = 2 or
// greater, the word PORN will be found in
// the word imPORtaNt.
private int matchPhrase(String data,
String phrase,
int spanLim){
this.phrase = phrase;
StringBuffer str = new StringBuffer();
ArrayList locationData = new ArrayList();

//Compare each char in the data with each
// unique char in the word or phrase. If
// there is a match, append the char to str
// and save the location of the char in
// the ArrayList referred to by locationData.

//Eliminate duplicate char in the word or
// phrase by storing in a TreeSet. Note that
// this will also sort the char, but that
// doesn't matter.
TreeSet treeSet = new TreeSet();
for(int cnt = 0; cnt < phrase.length();
cnt++){
treeSet.add(
new Character(phrase.charAt(cnt)));
}//end for loop

//Get the unique characters from the set and
// save them in a StringBuffer
Iterator iter = treeSet.iterator();
StringBuffer tempPhrase = new StringBuffer();
while(iter.hasNext()){
tempPhrase.append(
((Character)(iter.next())).charValue());
}//end while

//Use the StringBuffer of unique characters
// to test the string and extract matching
// characters from the string. Discard all
// non-matching characters. This converts
// the original data into a string of
// characters, each of which is a character
// in the word or phrase. All other
// characters have been removed. Thus, if
// the data contains the word or phrase, it
// will occur somewhere in the compressed
// string with no extra characters in
// between. An example might be as follows:
// SMSPMASPAMMPAS
for(int i = 0; i < data.length(); i++){
for(int j = 0; j < tempPhrase.length();
j++){
if(data.charAt(i) ==
tempPhrase.charAt(j)){
str.append(data.charAt(i));
locationData.add(new Integer(i));
}//end if
}//end for on tempPhrase
}//end for on data

//Test to see if the extracted char sequence
// contains the word or phrase.
int match = str.indexOf(phrase);
if(match == -1){
return -1;//no match
}//end if

//There is a match. Confirm that the span
// between target characters in data is not
// greater than allowed by the incoming
// spanLim parameter.
int maxSpan = 0;
int locA = ((Integer)locationData.
get(match)).intValue();
int locB = 0;
int startIndex = locA;
for(int cnt = 1; cnt < phrase.length();
cnt++){
locB = ((Integer)locationData.get(
match + cnt)).intValue();
int span = locB - locA;
if(span > maxSpan){
maxSpan = span;
}//end if
locA = locB;
}//end for loop

if(maxSpan > spanLim+1){
return -1;//span too large
}else{
return startIndex;//made a match
}//end else

}//end matchPhrase
//===========================================//

//Purpose: To create a TreeSet object
// containing words used to screen the message
// subject lines and HTML text blocks.
//This method reads strings from a text file
// and creates the list as a TreeSet object
// with no duplicates.
//See additional comments in the later section
// regarding the makeRawTextPhraseList method.

private void makeSubjAndHtmlPhraseList(){
subjAndHtmlPhraseList = new TreeSet();

//Read word list from text file and populate
// the TreeSet object.
try{
BufferedReader inData
= new BufferedReader(new FileReader(
subjAndHtmlPhraseFile));
String data; //temp holding area

while((data = inData.readLine()) != null){
subjAndHtmlPhraseList.add(
data.toUpperCase());

}//end while loop
inData.close();//Close file


//Now write the output file
DataOutputStream dataOut =
new DataOutputStream(
new FileOutputStream(
subjAndHtmlPhraseFile));

//Use an Iterator object to access the data
// in the TreeSet object.
Iterator iter = subjAndHtmlPhraseList.
iterator();

while(iter.hasNext()){
data = (String)iter.next();
dataOut.writeBytes(data + "\n");
}//end while

dataOut.close();

}catch(Exception e){e.printStackTrace();}
}//end makeSubjAndHtmlPhraseList
//===========================================//

//Purpose: To create a TreeSet object
// containing words and phrases used to screen
// the raw BODY text. See notes above
// regarding the list used to screen the
// Subject line of each message in the method
// named makeSubjAndHtmlPhraseList.

//It is important to maintain these two lists
// as separate lists. Because of the much
// larger number of characters in the body than
// in the Subject, false alarms are much more
// likely in the body. Therefore, individual
// words that work well when screening the
// Subject line may produce false alarms when
// screening the body. For example, the word
// PORN appears in the word IMPORTANT. It is
// much more likely that the word IMPORTANT
// will appear somewhere in the body than in
// the Subject line (although it may appear in
// the Subject line as well, thus producing a
// false alarm in both cases). Also, the word
// ANTIVIRUS works well in the Subject, but
// cannot be used to screen the body because
// many servers insert that word into the
// message header after they test the message
// for viruses. Also, IP addresses and URLs
// work well in the body, but rarely appear in
// the Subject. Therefore, testing the Subject
// against a long list of URLs simply wastes
// time.

//The following words (among others) should not
// be added to the list for the reasons given:

//PORN may be confused with IMPORTANT
//SPAM causes lots of false alarms. I inserted
// a space as in "SPAM " to decrease false
// alarms. Will probably also decrease valid
// hits.
//ANTIVIRUS appears in some valid message hdrs
//WEIGHT often appears in messages regarding
// html fonts
//SLUT may be confused with SOLUTION
//==End of prohibited list==


private void makeRawTextPhraseList(){
rawTextPhraseList = new TreeSet();

//Read word list from text file and populate
// the TreeSet object.
try{
BufferedReader inData
= new BufferedReader(new FileReader(
rawTextPhraseFile));
String data; //temp holding area

while((data = inData.readLine()) != null){
rawTextPhraseList.add(data.
toUpperCase());

}//end while loop
inData.close();//Close file


//Now write the output file
DataOutputStream dataOut =
new DataOutputStream(
new FileOutputStream(
rawTextPhraseFile));

//Use an Iterator object to access the data
// in the TreeSet object.
Iterator iter = rawTextPhraseList.
iterator();

while(iter.hasNext()){
data = (String)iter.next();
dataOut.writeBytes(data + "\n");
}//end while

dataOut.close();

}catch(Exception e){e.printStackTrace();}
}//end makeRawTextPhraseList
//===========================================//

//This method screens the Subject line against
// an upper-case version of a list of offensive
// words and phrases, returning the number of
// hits up to a limit of hitLimit. An exact
// match is not required. Rather, the
// characters in the offensive phrase in the
// Subject may be separated by as many as one
// extraneous character.
private int screenForOffensiveSubject(
int hitCount,String subject){
int matchLocation = -1;
Iterator iterator =
subjAndHtmlPhraseList.iterator();
while(iterator.hasNext()){
String offensivePhrase =
((String)(iterator.next())).
toUpperCase();
if(!(offensivePhrase.equals(""))){
//First try for an exact match because it
// is fastest and less prone to false
// positives. Award two hits for a
// successful exact match.
matchLocation = subject.toUpperCase().
indexOf(offensivePhrase);

if(matchLocation != -1){
//An exact match was found. Award one
// hit for the exact match and another
// hit later for a match of either
// type.
hitCount++;
}else{
//There was no exact match.
//Search for a match between the words
// and phrases in the
// subjAndHtmlPhraseList and the
// Subject line allowing for one
// extraneous character between the
// characters in the Subject line.
matchLocation = matchPhrase(
subject.toUpperCase(),
offensivePhrase,1);
}//end else on (matchLocation != -1)
}//end if!(offensivePhrase.equals("")

if(matchLocation != -1){
//A match was found.
hitCount++;
if(hitCount >= hitLimit){
return hitCount;
}//end if
}//end if matchLocation != -1
}//end while iterator has next
return hitCount;//with hitCount < hitLimit

}//end screenForOffensiveSubject
//===========================================//

//This method extracts an HTML code block, if
// it exists, from an incoming string. Then it
// converts that block into clean text free of
// all manifestations of HTML.
//Then it screens the clean text against a list
// of offensive words and phrases looking for
// exact matches. It returns when the value of
// hitCount is equal to hitLimit or when the
// end of the clean text is reached, whichever
// occurs first.
private int screenForOffensiveHtml(
String msgString,int hitCount){

String cleanString = getCleanHtmlString(
msgString);

if(cleanString != null){
//Screen the cleanString for offensive
// words and phrases. Require an exact
// match.
int indexOfOffensivePhrase = 0;
Iterator iterator =
subjAndHtmlPhraseList.iterator();
while(iterator.hasNext()){
String offensivePhrase =
((String)(iterator.next())).
toUpperCase();
if(!(offensivePhrase.equals(""))){
indexOfOffensivePhrase = cleanString.
indexOf(offensivePhrase);
if(indexOfOffensivePhrase != -1){
//An exact match was found. Award
// two hits for an exact match
// because it is less prone to false
// positives than a match with
// intervening extraneous characters.
hitCount++;
hitCount++;

if(hitCount >= hitLimit){
return hitCount;
}//end
}//end if(indexOfOffensivePhrase != -1)
}//end if!(offensivePhrase.equals("")
}//end while iterator has next
}//end if(cleanString != null)
else{//cleanString == null
return hitCount;//with hitCount < hitLimit
}//end cleanString == null
return hitCount;//required by compiler
}//end screenForOffensiveHtml();
//===========================================//

//Removes newline characters from an incoming
// String object.
String removeNewLine(String msgString){
StringBuffer stringBuf =
new StringBuffer(msgString);
int index = 0;
while(index > -1){
index = stringBuf.lastIndexOf("\n");
if(index > -1){
stringBuf.delete(index,index+1);
}//end if
}//end while
return new String(stringBuf);
}//end removeNewLine()
//===========================================//

//Method screens a String containing the entire
// raw text for a message against offensive
// words and phrases. Hopefully a match will
// have been found in one of the faster
// processes performed before this one. Allows
// one extraneous character between each of the
// characters in the offending word or phrase.
private int screenForOffensiveRawText(
String msgString,int hitCount){

int indexOfOffensivePhrase = 0;
Iterator iterator =
rawTextPhraseList.iterator();
while(iterator.hasNext()){
String offensivePhrase =
((String)(iterator.next())).
toUpperCase();
if(!(offensivePhrase.equals(""))){
//First try for an exact match because it
// is faster and less prone to false
// positives. Award two hits for an
// exact match.
indexOfOffensivePhrase =
msgString.toUpperCase().indexOf(
offensivePhrase);
if(indexOfOffensivePhrase != -1){
//An exact match was found. Award one
// hit for the exact match and another
// hit later for a match of either
// type.
hitCount++;
}else{
//An exact match was not found. Try
// for a match with intervening
// extraneous characters, which is more
// prone to false positives.
indexOfOffensivePhrase = matchPhrase(
msgString,offensivePhrase,1);
}//end else on indexOfOffensivePhrase !=.

if(indexOfOffensivePhrase != -1){
//A match was found of one type or the
// other.
hitCount++;

if(hitCount >= hitLimit){
return hitCount;
}//end if(hitCount >= hitLimit)
}//end if(indexOfOffensivePhrase != -1)
}//end if!(offensivePhrase.equals("")
}//end while iterator has next
return hitCount;//with hitCount < hitLimit
}//end screenForOffensiveRawText
//===========================================//

//This method gets and returns a string
// extracted from HTML text. Various features
// are used to make the string as useful as
// practical consistent with speedy operation.
private String getCleanHtmlString(
String msgString){
String cleanString = removeTags(msgString);

if(cleanString != null){
cleanString = repNbsp(cleanString);
if(cleanString != null){
cleanString = remEntities(cleanString);
if(cleanString != null){
cleanString = remEquals(cleanString);
if(cleanString != null){
cleanString = remTabs(cleanString);
if(cleanString != null){
cleanString = remMultipleSpaces(
cleanString);
}//end if(cleanString != null){
}//end if(cleanString != null){
}//endif(cleanString != null){
}//endif(cleanString != null){
}//end if(cleanString != null)
//cleanString is null at this point if the
// message contains no HTML or if one of
// the cleanup steps left it empty.
return cleanString;

}//end method getCleanHtmlString

//===========================================//

//This method determines if a message
// contains HTML and removes all tags. If
// there is no HTML in the message text, it
// returns null.
private String removeTags(String msgString){
int isHtml = -1;
int startIndex = -1;
int endIndex = -1;

//Search for clues that the message
// contains HTML.
isHtml = msgString.indexOf("<HTML");
if(isHtml == -1) isHtml =
msgString.indexOf("<BODY");
if(isHtml == -1) isHtml =
msgString.indexOf("<FONT");
if(isHtml == -1) isHtml =
msgString.indexOf("<DIV");
if(isHtml == -1) isHtml =
msgString.indexOf("<STRONG");
if(isHtml == -1) isHtml =
msgString.indexOf("<BR");
if(isHtml == -1) isHtml =
msgString.indexOf("<TABLE");
if(isHtml == -1) isHtml =
msgString.indexOf("<SPAN");
if(isHtml == -1) isHtml =
msgString.indexOf("<UL");
if(isHtml == -1) isHtml =
msgString.indexOf("<OL");
if(isHtml == -1) isHtml =
msgString.indexOf("<P>");

if(isHtml != -1){
//Msg contains HTML but not in very good
// form since it is missing the matching
// HTML tags.

//Eliminate as much of the header as
// possible by finding the location of
// the last identifiable item in the
// message header and discarding
// everything prior to that point.

int tempIndex = -1;
startIndex = -1;
String line = "";

//Create an array of valid header lines.
String[] headerLines =
{"STATUS:",
"X-MAILSCANNER:",
"X-MAILSCANNER-INFORMATION:",
"X-MSMAIL-PRIORITY:",
"X-PRIORITY:",
"X-MAILER:",
"DATE:",
"SUBJECT:",
"REPLY-TO:",
"FROM:",
"MESSAGE-ID:",
"RECEIVED:"
};//end array definition

for(int cnt = 0;
cnt < headerLines.length;cnt++){
tempIndex = msgString.lastIndexOf(
headerLines[cnt]);
if(tempIndex > startIndex){
//Save the larger index value
startIndex = tempIndex;
//Save corresponding header line
line = headerLines[cnt];
}//end if
}//end for loop

if(startIndex != -1){
//Use that header line to eliminate
// everything prior from the message
// header.
msgString = msgString.substring(
startIndex);
}//end if(startIndex != -1)
}//end if(isHtml != -1)

//Process the string if it contains HTML
if(isHtml != -1){
//msgString has been determined to contain
// HTML.
//Insert a dummy first character to ensure
// that the first character is not the
// beginning of a tag.
msgString = "X" + msgString;
int leftIndex=0;
int rightIndex=0;
String outputString = "";
while(leftIndex != -1){
leftIndex = msgString.indexOf(
'<',rightIndex);
if((leftIndex != -1) &&
(rightIndex != -1)){
outputString += msgString.substring(
rightIndex+1,leftIndex);
rightIndex = msgString.indexOf(
'>',leftIndex);
//Have to deal with missing > char,
// particularly for truncated messages.
if(rightIndex >
(msgString.length() - 2)){
//Don't try to process the last few
// characters when left and right
// angle brackets don't match.
break;
}//end if(rightIndex > (msgString...
if(rightIndex == -1){
//Create an artificial right angle
// bracket to replace the missing
// one.
rightIndex = leftIndex + 1;
}//end if(rightIndex == -1)
}//end ((leftIndex != -1) && ...
}//end while(leftIndex != -1)
//Get text at the tail end.
if((rightIndex + 1) < msgString.length()){
outputString += msgString.substring(
rightIndex+1);
}//end if
if(outputString.equals("")){
//msgString contained HTML, but it was
// all removed in the cleanup process.
// The output string is empty.
return null;
}else{
//Return the string produced by removing
// HTML material from msgString.
return outputString;
}//end else
}//end if(isHtml != -1)
else{
//Apparently msgString doesn't contain
// HTML.
return null;
}//end else
}//end removeTags
//===========================================//

//Purpose of this method is to replace all
// occurrences of "&NBSP;" with " "
private static String repNbsp(
String msgString){
StringBuffer stringBuf =
new StringBuffer(msgString);
int index = 0;
while(index > -1){
index = stringBuf.lastIndexOf("&NBSP;");
if(index > -1){
stringBuf.replace(index,index+6," ");
}//end if
}//end while
String outputString = new String(stringBuf);
if(outputString.equals("")){
return null;
}else{
return outputString;
}//end else
}//end repNbsp()
//===========================================//

//Removes entities from an HTML body identified
// by the string &...; Converts those entities
// that represent English language characters
// and punctuation (32 - 126) to the
// corresponding character and inserts it into
// the message text.
private static String remEntities(
String msgString){

//Insert a dummy first character
msgString = "X" + msgString;
int leftIndex=0;
int rightIndex=0;
String outputString = "";
while(leftIndex != -1){
leftIndex = msgString.indexOf(
'&',rightIndex);
if((leftIndex != -1) &&
(rightIndex != -1)){

if(leftIndex > rightIndex){
outputString += msgString.substring(
rightIndex+1,leftIndex);
}//end if
rightIndex = msgString.indexOf(
';',leftIndex);

if((leftIndex != -1) &&
(rightIndex != -1)){
String extract = msgString.substring(
leftIndex,rightIndex + 1).
toUpperCase();

//Make sure we didn't extract good text
// by accident. The tests below accept
// at most six characters, as in &#nnn;
// and treat anything longer as ordinary
// text. Remove spaces before making the
// test.
if(remSpaces(remEquals(
remTabs(extract))).length() > 6){
//Apparently not an entity. Put it
// back in the text.
outputString += extract;
}//end if(rightIndex-leftIndex > 6)
else{
//Remove any spaces prior to further
// processing
extract = remSpaces(remEquals(
remTabs(extract)));
}//end else

//Convert English language character
// entities to characters and insert
// them in the text.
//Don't try to restore HEX
// representations at this time. Maybe
// add that later. Ignore extracted
// sequences longer than six
// characters.
try{
if((extract.charAt(1) == '#') &&
(extract.charAt(2) != 'X') &&
(extract.length() <=6)){
//Get the internal characters of
// the entity.
String strValue = extract.
substring(2,extract.length()-1);
//Try to convert to a numeric char
// type. May throw an exception.
char theChar =
(char)Integer.parseInt(strValue);
//Ignore all but English language
// characters and punctuation.
if((theChar >= 32) &&
(theChar <= 126)){
char[] charArray = {theChar};
String theStr = new String(
charArray).toUpperCase();
outputString += theStr;
}//end ((theChar >= 32) && ...
}//end ((extract.charAt(1) == ..
}catch(NumberFormatException ex){
//Ignore it. It is apparently a
// badly formed entity.
}//end catch
}//end if((leftIndex != -1) && ...


//Have to deal with missing ; char.
if(rightIndex >
(msgString.length() - 2)){
//Don't try to process the last few
// characters when the & and ;
// characters don't match.
break;
}//end if(rightIndex > (msgString...
if(rightIndex == -1){
//Create an artificial right ; char
// to replace the missing one.
rightIndex = leftIndex + 1;
}//end if(rightIndex == -1)
}//end if((leftIndex != -1)...
}//end while(leftIndex != -1)
//Get the text at the tail end
if((rightIndex + 1) < msgString.length()){
outputString += msgString.substring(
rightIndex+1);
}//end if((rightIndex + 1)< msgString...

if(outputString.equals("")){
//The entire string was apparently made up
// of entities. It is empty now.
return null;
}else{
return outputString;
}//end else
}//end remEntities
//===========================================//

//Method removes all '=' characters from an
// incoming string.
private static String remEquals(
String msgString){
StringBuffer stringBuf =
new StringBuffer(msgString);
int index = 0;
while(index > -1){
index = stringBuf.lastIndexOf("=");
if(index > -1){
stringBuf.delete(index,index+1);
}//end if
}//end while
String outputString = new String(stringBuf);
if(outputString.equals("")){
return null;
}else{
return outputString;
}//end else
}//end remEquals()
//===========================================//

//Method removes tab characters from an
// incoming string.
private static String remTabs(
String msgString){
StringBuffer stringBuf =
new StringBuffer(msgString);
int index = 0;
while(index > -1){
index = stringBuf.lastIndexOf("\t");
if(index > -1){
stringBuf.delete(index,index+1);
}//end if
}//end while
String outputString = new String(stringBuf);
if(outputString.equals("")){
return null;
}else{
return outputString;
}//end else
}//end remTabs()
//===========================================//

//Method removes space characters from an
// incoming string.
private static String remSpaces(
String msgString){
StringBuffer stringBuf =
new StringBuffer(msgString);
int index = 0;
while(index > -1){
index = stringBuf.lastIndexOf(" ");
if(index > -1){
stringBuf.delete(index,index+1);
}//end if
}//end while
String outputString = new String(stringBuf);
return outputString;
}//end remSpaces()
//===========================================//

//Method converts all multiple spaces to a
// single space. This is not ideal. If there
// are multiple spaces within a word, all but
// one of the spaces will be removed, leaving
// one extraneous space in the word.
private static String remMultipleSpaces(
String msgString){
StringBuffer stringBuf =
new StringBuffer(msgString);
int index = 0;
while(index > -1){
//Search for two adjacent spaces.
index = stringBuf.lastIndexOf("  ");
if(index > -1){
stringBuf.delete(index,index+1);
}//end if
}//end while
String outputString = new String(stringBuf);
if(outputString.equals("")){
return null;
}else{
return outputString;
}//end else
}//end remMultipleSpaces()
//===========================================//

//This method is called to decode a Subject
// line.

//Sometimes the Subject line is encoded using
// techniques designed to allow the use of
// non-ASCII characters in message headers
// (See RFC2047).
//The following code determines if the Subject
// line has been encoded using the ISO-8859-1
// character set with an encoding value of B or
// Q. If so, the encoded material is decoded.
//Messages with an encoding value of Q contain
// a mixture of ASCII characters and encoded
// characters, so it is possible to partially
// read them without the need for decoding.
// They also sometimes use an underscore in
// place of a space to make them more readable.
private String decodeSubj(String data){
try{
if(data.toUpperCase().indexOf(
"=?ISO-8859-1?B?") != -1){
//Need to decode for value of B.
int startIndex = data.toUpperCase().
indexOf("=?ISO-8859-1?B?") + 15;
int endIndex = data.length()-2;
sun.misc.BASE64Decoder dec =
new sun.misc.BASE64Decoder();
data = "Subject: " + "=?ISO-8859-1?B? "
+ new String(dec.decodeBuffer(
data.substring(startIndex,endIndex)));
}//end if..."=?ISO-8859-1?B?"

if(data.toUpperCase().indexOf(
"=?ISO-8859-1?Q?") != -1){
//Need to decode for value of Q.
int startIndex = data.toUpperCase().
indexOf("=?ISO-8859-1?Q?") + 15;
int endIndex = data.length()-2;
String decodedData = data.substring(
startIndex,endIndex);

//Decode non-ASCII characters. Scan
// backward and resume to the left of
// each decoded character so that a
// decoded '=' (from =3D) is not treated
// as the start of another sequence.
StringBuffer stringBuf =
new StringBuffer(decodedData);
int index = stringBuf.length();
while(index > -1){
index = stringBuf.lastIndexOf(
"=",index - 1);
if(index > -1){
String hexString =
new String(stringBuf).substring(
index+1,index+3);
char decodedChar =
(char)Integer.parseInt(
hexString.trim(),16);
stringBuf.delete(index,index+3);
stringBuf.insert(index,decodedChar);
}//end if
}//end while(index > -1)

//Replace underscore with space.
index = 0;
while(index > -1){
index = stringBuf.lastIndexOf("_");
if(index > -1){
stringBuf.deleteCharAt(index);
stringBuf.insert(index,' ');
}//end if
}//end while(index > -1)

data = "Subject: " +"=?ISO-8859-1?Q? "
+ new String(stringBuf);
}//end if..."=?ISO-8859-1?Q?"
}catch(Exception ex){ex.printStackTrace();}
return data;
}//end decodeSubj
//===========================================//

//Expand base64 data in msg body.
private String decodeBody(String data){
String decodedData = "";
int currentPartIndex;
int nextPartIndex;
try{
if(data.toUpperCase().indexOf(
"Content-Transfer-Encoding: base64".
toUpperCase()) != -1){
//This message has base64 encoding
if((data.toUpperCase().indexOf(
"Content-Type: text/html".
toUpperCase()) != -1)
&& (data.toUpperCase().indexOf(
"Content-Type: multipart".
toUpperCase()) == -1)){
//This is a non-multipart message with
// base64 encoding.
//Locate the end of the header.
int base64Index = data.indexOf(
"Status:");
if(base64Index != -1){
int crIndex = data.indexOf(
"\n",base64Index);
String tempStr = data.substring(
crIndex+2,data.length());
sun.misc.BASE64Decoder dec =
new sun.misc.BASE64Decoder();
decodedData = "Start base64 "
+ new String(
dec.decodeBuffer(tempStr))
+ " End base64";
}//end if(base64Index != -1)
}//end if((data.toUpperCase().indexOf(...
else{
int boundaryIndex = data.indexOf(
"boundary=");
int newLineIndex = data.indexOf(
"\n",boundaryIndex);

if(boundaryIndex != -1){
String multipartCode =
data.substring(
boundaryIndex+10,newLineIndex-1);
nextPartIndex = data.indexOf(
multipartCode,newLineIndex+1);
while(nextPartIndex != -1){
int base64Index = data.indexOf(
"Content-Transfer-Encoding: "
+ "base64",nextPartIndex);
currentPartIndex = nextPartIndex;
nextPartIndex = data.indexOf(
multipartCode,nextPartIndex+1);
if((base64Index != -1) &&
(base64Index < nextPartIndex)){

//Don't process .gif or .jpg file
// attachments
String partBody = data.substring(
currentPartIndex,
nextPartIndex).toUpperCase();
if((partBody.indexOf(".GIF")
== -1) && (partBody.indexOf(
".JPG") == -1)){
//No .gif or .jpg reference found.
// Process the data
int crIndex = data.indexOf(
"\n",base64Index);

//Search for the required blank
// line preceding the block
// of base64 data
//Prevent infinite loop on bad
// data
int count = 0;
char firstChar = data.charAt(
crIndex+1);
while((firstChar != '\n')
&& (count < 100)){
crIndex = data.indexOf(
"\n",crIndex+1);
firstChar = data.charAt(
crIndex+1);
count++;
}//end while

String tempStr =
data.substring(
crIndex+2,nextPartIndex);
sun.misc.BASE64Decoder dec =
new sun.misc.BASE64Decoder();
decodedData += new String(
dec.decodeBuffer(tempStr));
decodedData += "\n\n-----End "
+ "base64 part-----\n\n";
}//end if(partBody.toUpperCa...
else{
decodedData += "-----Image "
+ "stripped off-----";
}//end else
}//end if(base64Index != -1)
else{
if(nextPartIndex != -1){
decodedData += data.substring(
currentPartIndex,
nextPartIndex);
decodedData += "\n\n-----End "
+ "non-base64 part-----\n\n";
}//end if(nextPartIndex != -1)
}//end else
}//end while loop on nextPartIndex...
}//end if(boundaryIndex != -1)
}//end else
return decodedData;
}//end if(data.toUpperCase().indexOf("Co...
else{
//This msg does not have base64 encoding
return data;
}//end else
}catch(Exception ex){ex.printStackTrace();}
return "Make Compiler Happy";
}//end decodeBody
//===========================================//

}//end class BigDog02SpamScreen01

Listing 7
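
The decodeSubj and decodeBody methods in the listing above rely on sun.misc.BASE64Decoder, an internal class that is not part of the supported Java API and is not available in recent JDK releases. The following is a minimal sketch, not part of the BigDog programs, showing how a B-encoded Subject line could be decoded with the standard java.util.Base64 class (available since Java 8). The class name SubjectDecodeSketch and its method names are hypothetical, and the sketch assumes that the encoded word is terminated by ?= as described in RFC 2047.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

//Hypothetical helper class; not part of BigDog.
class SubjectDecodeSketch{
  static final String MARKER = "=?ISO-8859-1?B?";

  //Returns the index of MARKER in data, ignoring
  // case, or -1 if MARKER is not present.
  static int findMarker(String data){
    int limit = data.length() - MARKER.length();
    for(int i = 0;i <= limit;i++){
      if(data.regionMatches(
               true,i,MARKER,0,MARKER.length())){
        return i;
      }//end if
    }//end for loop
    return -1;
  }//end findMarker

  //Decodes one B-encoded word in a Subject line.
  // Returns the input unchanged if no encoded
  // word is found or if the data is malformed.
  static String decodeBSubject(String data){
    int markerIndex = findMarker(data);
    if(markerIndex == -1) return data;
    int startIndex = markerIndex + MARKER.length();
    //Assumption: the encoded word ends with ?=
    int endIndex = data.indexOf("?=",startIndex);
    if(endIndex == -1) return data;
    try{
      byte[] bytes = Base64.getMimeDecoder().decode(
               data.substring(startIndex,endIndex));
      return data.substring(0,markerIndex)
        + new String(
               bytes,StandardCharsets.ISO_8859_1)
        + data.substring(endIndex + 2);
    }catch(IllegalArgumentException ex){
      //Badly formed base64 data. Leave it alone.
      return data;
    }//end catch
  }//end decodeBSubject

  public static void main(String[] args){
    //Prints: Subject: Hello World
    System.out.println(decodeBSubject(
      "Subject: =?ISO-8859-1?B?SGVsbG8gV29ybGQ=?="));
  }//end main
}//end class SubjectDecodeSketch
```

Unlike decodeSubj, this sketch drops the encoding marker from the result and keeps any text that follows the encoded word. The MIME decoder is used because it ignores line separators, which may also make it a reasonable choice for base64 data in a message body.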

File BigDog02RawText.txt

DELIVERY FAILURE
DELIVERY NOTIFICATION:
DELIVERY STATUS NOTIFICATION
EMAIL QUARANTINED DUE TO VIRUS
FAILURE NOTICE
INBOUND ATTACHMENT REMOVED - ROUTE66
MAIL ADMINISTRATOR
MAIL DELIVERY FAILED: RETURNING MESSAGE TO SENDER
MAIL DELIVERY SUBSYSTEM
MAIL DELIVERY SYSTEM
MAIL SYSTEM ERROR - RETURNED MAIL
MAIL TRANSACTION FAILED
MAILER-DAEMON
NAV DETECTED A VIRUS IN A DOCUMENT YOU AUTHORED
RETURNED MAIL
RETURNED MAIL: SEE TRANSCRIPT FOR DETAILS
RETURNED MAIL: USER UNKNOWN
SYSTEM ADMINISTRATOR
TO SENDER VIRUS FOUND AND ACTION TAKEN.
UNABLE TO DELIVER YOUR MESSAGE
UNDELIVERABLE MAIL
UNDELIVERABLE:
USUARIO INEXISTENTE / USER DOES NOT EXIST
VIRUS FOUND IN A MESSAGE YOU SENT
VIRUS FOUND IN SENT MESSAGE
WARNING: COULD NOT SEND MESSAGE
WARNING: E-MAIL VIRUSES DETECTED
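
The raw-text screening method in Listing 7 first tries an exact match against each phrase and then falls back to a match that tolerates one extraneous character between the characters of the phrase. The matchPhrase method that performs the looser match is not shown in this part of the listing, so the following is only a sketch of one way to get similar behavior with a regular expression. It is not the matchPhrase method used by BigDog. The class name LoosePhraseMatchSketch and its method names are hypothetical. The scoring of two hits for an exact match and one hit for a loose match mirrors screenForOffensiveRawText, and the sketch assumes that phrases such as those in the file above are supplied one per list entry.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

//Hypothetical helper class; not part of BigDog.
class LoosePhraseMatchSketch{

  //Builds a pattern that allows up to maxGap
  // arbitrary characters between consecutive
  // characters of the phrase.
  static Pattern loosePattern(
                       String phrase,int maxGap){
    StringBuilder sb = new StringBuilder();
    for(int i = 0;i < phrase.length();i++){
      if(i > 0){
        sb.append(".{0,").append(maxGap)
          .append("}?");
      }//end if
      //Quote each character so that regex
      // metacharacters are matched literally.
      sb.append(Pattern.quote(
            String.valueOf(phrase.charAt(i))));
    }//end for loop
    return Pattern.compile(sb.toString(),
      Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
  }//end loosePattern

  //Awards two hits for an exact match and one
  // hit for a match with extraneous characters.
  static int countHits(
               String msg,List<String> phrases){
    int hits = 0;
    String upper = msg.toUpperCase();
    for(String phrase : phrases){
      if(phrase.isEmpty()) continue;
      if(upper.contains(phrase.toUpperCase())){
        hits += 2;
      }else if(loosePattern(phrase,1)
                          .matcher(msg).find()){
        hits += 1;
      }//end else
    }//end for loop
    return hits;
  }//end countHits

  public static void main(String[] args){
    List<String> phrases = Arrays.asList(
               "RETURNED MAIL","FAILURE NOTICE");
    System.out.println(countHits(
             "Subject: Returned mail",phrases));
    System.out.println(countHits(
           "R-e-t-u-r-n-e-d m-a-i-l",phrases));
  }//end main
}//end class LoosePhraseMatchSketch
```

With these two phrases, the first call scores two hits for the exact match and the second call scores one hit for the loose match. A real screener would probably compile each pattern once when the phrase file is loaded rather than on every message.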

Listing 8 BigDog02BadList.txt



Copyright 2004, Richard G. Baldwin.  Reproduction in whole or in part in any form or medium without express written permission from Richard Baldwin is prohibited.

About the author

Richard Baldwin is a college professor (at Austin Community College in Austin, TX) and private consultant whose primary focus is a combination of Java, C#, and XML. In addition to the many platform and/or language independent benefits of Java and C# applications, he believes that a combination of Java, C#, and XML will become the primary driving force in the delivery of structured information on the Web.

Richard has participated in numerous consulting projects, and he frequently provides onsite training at the high-tech companies located in and around Austin, Texas.  He is the author of Baldwin's Programming Tutorials, which has gained a worldwide following among experienced and aspiring programmers. He has also published articles in JavaPro magazine.

Richard holds an MSEE degree from Southern Methodist University and has many years of experience in the application of computer technology to real-world problems.

Baldwin@DickBaldwin.com

-end-
 







