Java Programming Notes # 2187
Protection against spam and viruses
In recent months, I have been showing you how to write different kinds
of Java programs
to protect your email inbox from viruses and spam.
For example, the series of lessons that began with the lesson entitled Enlisting
Java in the War Against SPAM, Part 1, The Communications Module and
ended
with the lesson entitled
Enlisting Java
in the War Against SPAM: Training the Body Screener showed you how to
write programs that
apply spam screening algorithms to your email.
The two lessons entitled Enlisting
Java in the War Against Email Viruses and
Enlisting Java
in the War Against Email Viruses, Part 2, A Much Faster Program
described two
versions of a
program designed to protect your email database from email-borne
viruses.
Pulling it all together
The lesson entitled
Overview of
the BigDog Email Protection Program provided an overview of a
set of programs named BigDog. These programs combine
protection against email-borne viruses and spam.
There are several separate programs in the BigDog set
of programs. This lesson provides source code for each of the
separate programs along with a brief description of each program.
This lesson also explains how to set up your computer to use these
programs.
Using the BigDog programs
Future lessons will explain the technical aspects of the BigDog programs
in detail. If you already know enough about Java to understand
the behavior of the programs based on the source code and comments
alone, feel free
to copy and use the programs for non-commercial purposes.
If you
don’t understand the source code, it may be a good idea for you to wait
for the explanations before compiling and running the
programs.
My experience with BigDog
Because I have published several hundred online programming tutorials
during the past seven years,
my
email address is widely exposed on the Web. During good times, I
typically receive between 250 and 300 email messages each day.
(During bad times involving rampant virus infestation, I
typically
receive several thousand email messages each day.)
Of the 300 or so messages that I receive each day,
only about ten to fifteen messages are messages that I need to
read.
Approximately five to ten of the messages contain viruses. The
remainder of the 300 messages are usually spam.
Finally, I feel protected
I have been using increasingly sophisticated versions of the BigDog
programs for several months. For the first time in
several years, I feel that I finally have email-borne viruses and spam
under control.
Protection against viruses
The virus protection features built into the program make it possible
for me to isolate and delete messages containing viruses before they
become co-mingled with the other messages in my email inbox.
(I explained the dangers inherent in such co-mingling
in the earlier lesson entitled Enlisting
Java in the War Against Email Viruses.)
Protection against spam
The BigDog programs combine spam screening with an aggressive challenge/response
message verification procedure.
(I discussed the challenge/response message verification
procedure in the earlier lesson entitled
Overview
of the BigDog Email Protection Program.)
As a result, most of my good messages are clearly identified as
such. (Very few good messages are falsely identified as spam.)
In addition, I am normally able
to completely ignore all but about fifteen or twenty of the several
hundred spam
messages that I receive each day.
(I need to examine those fifteen or twenty messages to
identify the occasional message sent by a computer that I do need to
read, but for which the sending computer won’t normally respond to the
challenge.)
Supplementary material
I recommend that you also study the other lessons in my extensive
collection of online Java tutorials. You will find those lessons
published at Gamelan.com.
However, as of the date of this writing, Gamelan doesn’t maintain a
consolidated index of my Java tutorial lessons, and sometimes
they are difficult to locate there. You will find a consolidated
index at www.DickBaldwin.com.
Operational Discussion
Checking my email
Several times each day, I check my email by doing the following:
- Run the program named BigDog02g to download and save each
email message as a separate file in a disk folder named DataFiles.
- Run my Norton
AntiVirus program against the files in the folder named DataFiles,
deleting any files that contain a virus.
- Run the program named BigDog02j to:
  - Forward the remaining messages in the DataFiles folder
    to my email client program.
  - Send a challenge message to the sender of each message received
    from a stranger and place those messages in quarantine.
  - Retrieve any messages previously quarantined for which the
    sender of the original message has provided a proper response
    to the earlier challenge.
  - Delete the messages from my public email server.
  - Move the messages from the folder named DataFiles to
    another folder named Archives.
- Run my email client program to read the messages now residing in
my local email data structure.
That’s all there is to it. The procedure is straightforward, runs
relatively fast, and provides the benefits described earlier.
The working disk directory structure
I’m going to begin by showing you how I have the various files and
folders organized on my disk. Once you understand the Java code
involved, you can modify the code to support a different directory
structure. However, I recommend that you initially use the same
setup that I use.
The contents of my working directory are shown in Figure 1.
BigDog02b.java
BigDog02g.java
BigDog02i.java
BigDog02j.java
BigDog02k.java
BigDog02m.java
BigDog02SpamScreen01.java
BigDog02BadList.txt
BigDog02GoodList.txt
BigDog02RawText.txt
BigDog02SubjAndHtml.txt
Archives
DataFiles
temp
Figure 1
(In addition to the files shown in Figure 1, once the
Java source files are compiled,
the directory will also contain a variety of compiled Java files with
an extension of
.class. In addition, once you start running the programs, several
backup files will automatically appear in the directory.)
Java source code files
The seven items with a .java extension in Figure 1 are the required Java source code
files. Complete listings of each of these files are provided in
Listing 1
through Listing 7 near the end of this lesson. I will provide a
brief description of each of these files later in this lesson, and will
provide a detailed discussion in future lessons.
Required text files
The four items with a .txt extension in Figure 1 are required text files. I
will provide a brief description of each of these files later in this
lesson. A sample listing of one of the text files is shown in
Listing 8 near the end of the lesson.
Required folders
The remaining three items in Figure 1 (Archives, DataFiles, and temp) are folders. I will
provide a brief description of each of these folders later in this
lesson.
The BigDog02b program
This Java source code file is the repository for several static utility
methods used by other programs in the BigDog02 set of programs.
A complete listing of this file is shown in Listing 1 near the end of
the lesson.
I will provide a detailed explanation of the behavior of each of the
methods contained in this file in future lessons.
The BigDog02g program
This program downloads all messages from a public email server and
writes them as separate files in the local folder named DataFiles
(see
Figure 1).
A complete listing of this file is shown in Listing 2 near the end of
the lesson.
This program makes the individual messages available in separate files
for virus scanning before they are co-mingled in your email inbox.
(I explained the dangers inherent in such co-mingling
in the earlier lesson entitled Enlisting
Java in the War Against Email Viruses.)
After running this program and before running either BigDog02i
or BigDog02j, you should use your favorite virus scanner
program to identify and delete any message files in the DataFiles
folder containing viruses. That will prevent them from being
forwarded to your email account and potentially corrupting your email
inbox.
I will explain this program in detail in a future lesson.
The BigDog02i and BigDog02j programs
Complete listings of these two files are provided in Listing 3 and
Listing 4 near the end of the lesson.
Whereas all users will use the program named BigDog02g, any
individual user will use only one of the two alternative programs named
BigDog02i and BigDog02j.
A
user whose email client program supports MBOX files can use either
program, but will probably use BigDog02j, because it runs
faster. Other users will use the program
named BigDog02i.
Both programs have the same purpose, but they accomplish that purpose
in
different ways.
Categorizing messages
Both BigDog02i and BigDog02j apply various criteria,
including spam screening,
to categorize each of the virus-free messages into
one of four categories:
- {GD} Good
- {BD} Bad
- {SP} Spam
- {QU} Quarantine
The text shown in matching curly braces in the above list is prefixed
(tagged) onto the subject line of each message. In
addition, all messages in the {QU} category receive a spam score
indicating the number of offensive words or phrases that were found in
the message.
The
message
is then forwarded to the user’s email client program. The tag
and the spam score can be used in conjunction with email filtering in
the email client
program to direct the messages into different email folders.
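As a rough illustration of the tagging step, the following sketch prefixes a category tag onto a subject line and, for quarantined messages, adds the spam score in a second pair of curly braces. The class name, the method name, and the exact layout of the tag and score are assumptions made for illustration only. The format actually produced by BigDog02i and BigDog02j is defined in Listing 3 and Listing 4. The sketch assumes Java 8 or later.

```java
// Hypothetical sketch; names and the exact tag layout are assumptions.
final class SubjectTagger {
  enum Category {
    GOOD("{GD}"), BAD("{BD}"), SPAM("{SP}"), QUARANTINE("{QU}");

    final String tag;

    Category(String tag) {
      this.tag = tag;
    }
  }

  // Prefixes the tag (and, for quarantined messages only, the spam
  // score) onto the subject text.
  static String tagSubject(String subject, Category category,
                           int spamScore) {
    StringBuilder sb = new StringBuilder(category.tag);
    if (category == Category.QUARANTINE) {
      sb.append('{').append(spamScore).append('}');
    }
    return sb.append(' ').append(subject).toString();
  }

  public static void main(String[] args) {
    System.out.println(tagSubject("Hello", Category.GOOD, 0));
    System.out.println(
        tagSubject("Cheap pills", Category.QUARANTINE, 3));
  }
}
```

Because the tag is plain text at the front of the subject, an ordinary "subject begins with" filter in the email client is enough to route the message.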
Delete messages from the server
When either program finishes running, the user is given the option of
deleting the messages that have been processed from the public email
server.
(Note that the code to delete messages from the server
has been disabled in Listing 3 and Listing 4. You should not
enable that code until you have fully
tested the behavior of the programs on your system and you are
satisfied that you are ready to delete messages from the server.)
Choosing
this option also causes the individual message files to be moved from
the folder named DataFiles to the folder named Archives.
I will explain both of these programs in future lessons.
The BigDog02k program
A complete listing of this program is shown in Listing 5 near the end
of the lesson.
As explained in the earlier lesson entitled
Overview of
the BigDog Email Protection Program,
the spam screening algorithm must initially be trained to recognize
spam. It is also useful to provide additional
training later to teach the algorithm to recognize new forms of spam.
The algorithm training procedure
The program named BigDog02k is used to accomplish
the first step in training the spam screening algorithm.
The procedure for training the algorithm is as follows:
- Manually copy a batch of message files from the Archives folder
into the folder named temp.
- Run the program named BigDog02k to delete those files
that won’t
make a positive contribution to the process of training the algorithm.
- Run the program named BigDog02m to expand the vocabulary
of
offensive words and phrases used by the spam screening algorithm to
identify spam.
Delete files that are not needed
Basically, the program named BigDog02k examines all the files
in the folder named temp and deletes those files that would be
categorized by BigDog02i or BigDog02j as:
- {GD} Good
- {BD} Bad
- {SP} Spam
In addition, all files that would be categorized as {QU} with a spam
score greater than zero are deleted.
(I discussed the concept of a spam score in the lesson
entitled
Overview of the BigDog Email Protection Program, and will have more to say about it
in future lessons that explain the programs named BigDog02i and
BigDog02j.)
These files are deleted because they have very little to contribute to
the training of the spam screening algorithm.
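The deletion rule described above reduces to a single test: a file survives only if it would be tagged {QU} with a spam score of zero. The following minimal sketch expresses that rule. The class and method names are hypothetical, and the sketch assumes that the tag and score have already been computed by some other code. It is not taken from Listing 5.

```java
// Hypothetical sketch of the retention rule described above.
final class TrainingFilter {
  // Returns true when a file in the temp folder should be kept for
  // training: only messages that would be tagged {QU} with a spam
  // score of zero. Everything else would be deleted.
  static boolean keepForTraining(String tag, int spamScore) {
    return "{QU}".equals(tag) && spamScore == 0;
  }

  public static void main(String[] args) {
    System.out.println(keepForTraining("{QU}", 0)); // kept
    System.out.println(keepForTraining("{QU}", 2)); // deleted
    System.out.println(keepForTraining("{SP}", 0)); // deleted
  }
}
```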
I will provide a detailed explanation of the program named BigDog02k
in a future lesson.
The BigDog02m program
As indicated in the algorithm training procedure described above, this
is the program that is actually used to train the algorithm using the
message files remaining in the temp folder after running the
program named BigDog02k.
A complete listing of this program is provided in Listing 6 near the
end of this lesson.
The user interface and the behavior of this program are similar to those of
the program described in the earlier lesson entitled Enlisting
Java in the War Against SPAM: Training the Subject Line Screener.
The main difference is that this program uses a much more sophisticated
spam screening algorithm than was the case with the program described
in that lesson.
I will provide a complete technical description of this program in a
future lesson.
The BigDog02SpamScreen01 program
A complete listing of this file is provided in Listing 7 near the end
of the lesson.
An object of the class defined in this file provides the spam screening
algorithm for the BigDog02 set of programs.
This class implements a set of rules for detecting spam messages. For
each message, it returns an int value (the spam score) giving the number
of hits against offensive words and phrases found in that message.
Screening against lists of offensive words and
phrases
The subject of the message and a clean version of the HTML content of
the message are screened against a list of offensive words and phrases
contained in the file named BigDog02SubjAndHtml.txt.
Raw body
text (non-HTML text) is screened against a different list of
offensive words and phrases
contained in a file named BigDog02RawText.txt.
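To make the idea of a spam score concrete, here is a minimal sketch that counts hits against two separate lists. It assumes simple case-insensitive substring matching and counts each list entry at most once per message. Both of those choices, along with every class, method, and parameter name, are assumptions made for illustration. The class in Listing 7 implements its own set of rules.

```java
import java.util.List;
import java.util.Locale;

// Hypothetical sketch: counts how many list entries occur in a text.
final class SpamScoreSketch {
  // Returns the number of entries from the list that occur in the
  // text, ignoring case. Blank entries are skipped.
  static int countHits(String text, List<String> offensive) {
    String haystack = text.toLowerCase(Locale.ROOT);
    int hits = 0;
    for (String entry : offensive) {
      String needle = entry.trim().toLowerCase(Locale.ROOT);
      if (!needle.isEmpty() && haystack.contains(needle)) {
        hits++;
      }
    }
    return hits;
  }

  // The subject and the cleaned HTML are checked against one list,
  // and the raw body text against the other. The score is the
  // total number of hits.
  static int score(String subject, String cleanHtml, String rawText,
                   List<String> subjAndHtmlList,
                   List<String> rawTextList) {
    return countHits(subject + "\n" + cleanHtml, subjAndHtmlList)
        + countHits(rawText, rawTextList);
  }
}
```

A score of zero from a sketch like this means only that no list entry matched, not that the message is known to be good.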
Why use separate lists?
The primary reason for keeping the two lists separate is speed.
The process of screening raw body text tends to be
slower than the process of screening the subject line and the clean
HTML content. Therefore, care should be taken to keep the list of
offensive words and phrases used to screen raw body text short and to
the point.
On the other hand, the process of screening the subject line and clean
HTML runs much faster. Therefore, the list of offensive words and
phrases used for that purpose can be much larger without slowing the
program down.
I will explain the many aspects of the class named BigDog02SpamScreen01
in a future lesson.
The file named BigDog02BadList.txt
You will need to create and populate a plain text file having this
name and put it in the working directory as shown in Figure 1.
Listing 8 near the end of the lesson provides a starter
list of text items that you can use initially to populate this file.
I described the contents of this file in some detail in the earlier
lesson entitled
Overview of
the BigDog Email Protection Program.
Briefly, the BAD list is a plain text file containing words and
phrases that identify bad messages. When one of these
words or phrases occurs in either the
subject or the sender’s email address, this causes the message to be
tagged {BD} and forwarded to my email account with no further
processing. Simple message filtering within my email client
program causes these messages to be stored temporarily in a Bad folder,
just in case I need to refer to one of them later.
(I periodically delete the files in the Bad folder to
save
disk space. If you wanted to do so, you could eliminate the
forwarding of messages in this category without much risk of losing
good messages.)
Messages that I don’t want to read
This list contains the email
addresses and subjects commonly used in messages that I don’t want to
read. These are messages that are easy to identify
and reject without the requirement for a fancy spam screening
algorithm. Therefore, this process is completed before spam
screening begins.
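A check of this kind can be sketched in a few lines. The sketch below assumes that the list file holds one word or phrase per line and that matching is a case-insensitive substring test against the subject and the sender's address. Neither assumption is confirmed by this lesson, and the class and method names are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch; the one-entry-per-line layout and the
// case-insensitive substring match are assumptions.
final class BadListSketch {
  // Reads the non-blank lines into a list of lower-case entries.
  static List<String> loadEntries(Reader source) throws IOException {
    List<String> entries = new ArrayList<>();
    try (BufferedReader in = new BufferedReader(source)) {
      String line;
      while ((line = in.readLine()) != null) {
        String entry = line.trim().toLowerCase(Locale.ROOT);
        if (!entry.isEmpty()) {
          entries.add(entry);
        }
      }
    }
    return entries;
  }

  // Returns true when any entry occurs in the subject or in the
  // sender's address. A true result means spam screening can be
  // skipped for this message.
  static boolean isListed(String subject, String sender,
                          List<String> entries) {
    String subj = subject.toLowerCase(Locale.ROOT);
    String from = sender.toLowerCase(Locale.ROOT);
    for (String entry : entries) {
      if (subj.contains(entry) || from.contains(entry)) {
        return true;
      }
    }
    return false;
  }
}
```

In use, the entries would be loaded once from a FileReader on BigDog02BadList.txt and then applied to every message before the spam screener is called.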
Auto responder messages
My email address is well known across the web and throughout the
world. During each flurry of virus activity on the web, I receive
thousands of messages automatically sent by computers claiming that I
sent them a message containing a virus.
(These are cases where someone else sent a message
containing a virus and faked my email address as the sender.)
Since I am very conscientious about using my antivirus software to
keep my
computer clean and free of viruses, I’m confident that I didn’t send
the
message containing a virus. Therefore, I’m not interested in
reading these
notification messages.
I have identified key phrases contained in
many such notification messages and have included those phrases in
Listing 8.
This prevents these messages from cluttering my email inbox.
Undeliverable messages
In addition, the use of the challenge/response message
verification procedure causes a large number of auto responder messages
to be received
indicating that the email addresses used by spammers are not
valid. Listing 8 also contains a sampling of subjects produced
by those auto responders.
Basically, the messages in the {BD} category (identified by the
contents of the file named BigDog02BadList.txt) are messages that I
rarely look at. I simply save them for a few days in case I need
to go back and look at one of them, and then I delete them.
The file named BigDog02GoodList.txt
You will also need to create and populate this plain text file. I
didn’t provide a sample because I have no idea what you might want to
include in this file. What you include in the file will probably
be much different from what I have included in my file.
I also discussed the contents of this list in the earlier lesson
entitled
Overview of the BigDog Email Protection Program.
Briefly, the GOOD list is a
plain text file that contains phrases and words
that identify good messages. The occurrence of
one of these phrases or words in either the subject or the sender’s
email
address in a message will cause the message to be tagged {GD} and
forwarded to my email account with no further processing.
Messages that I want to read
The GOOD list is used to identify email messages that I
want
to read regardless of what the spam screening algorithm might provide
as a spam score for the message.
The list is automatically updated by the program
whenever the
sender responds to a challenge message. I will explain this
process in more detail in a future lesson.
Because this list is
automatically
updated by the program, and is subject to data loss in the event of a
computer crash,
several levels of backup are automatically maintained. Therefore,
once you start running the programs, you will notice new files
appearing in your working directory with names like BigDog02GoodList.bak5.
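One common way to maintain several levels of backup is to rotate numbered copies before each update. The sketch below illustrates that general technique. The rotation order, the number of levels, and all names other than the .bak suffix pattern are assumptions, and the BigDog programs may manage their backups differently. It assumes Java 8 or later.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical sketch of multi-level backup rotation.
final class BackupRotation {
  // Shifts name.bak1 .. name.bak(levels-1) up by one level, then
  // copies the current file to name.bak1. The oldest backup is
  // overwritten.
  static void rotate(Path dir, String baseName, String ext,
                     int levels) throws IOException {
    for (int i = levels - 1; i >= 1; i--) {
      Path from = dir.resolve(baseName + ".bak" + i);
      if (Files.exists(from)) {
        Files.move(from, dir.resolve(baseName + ".bak" + (i + 1)),
            StandardCopyOption.REPLACE_EXISTING);
      }
    }
    Path current = dir.resolve(baseName + ext);
    if (Files.exists(current)) {
      Files.copy(current, dir.resolve(baseName + ".bak1"),
          StandardCopyOption.REPLACE_EXISTING);
    }
  }
}
```

Calling rotate(dir, "BigDog02GoodList", ".txt", 5) just before rewriting the list would leave the five most recent versions on disk.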
The file named BigDog02RawText.txt
You will need to create and populate this file in your working
directory. Briefly, this is a plain text file containing words
and phrases used to screen raw (non-HTML) text in the body of a
message.
The occurrence of the words and phrases contained in this list in the
raw body text of a message causes the spam screening algorithm to
consider the message to contain spam. The result is to increase
the spam score associated with that message.
This list can be populated using a simple text editor. There are
a
variety of ways to identify the words and phrases used to populate this
list. I will describe some of those ways in future lessons when I
explain the program named BigDog02m.
If you elect to use these programs before I publish that lesson, you
should provide this file as an empty file so that it can be found at
runtime. If the program can’t find the file, it will throw an
exception.
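If you would rather not create the empty file by hand, a few lines of Java can do it. This is a minimal sketch with hypothetical class and method names, assuming Java 8 or later. It leaves an existing file untouched, so it is safe to run more than once.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper for creating placeholder list files.
final class EnsureListFiles {
  // Creates an empty file at the given path if nothing exists
  // there. An existing file is left untouched. Returns true only
  // when a new file was created.
  static boolean ensureExists(Path file) throws IOException {
    if (Files.exists(file)) {
      return false;
    }
    Files.createFile(file);
    return true;
  }
}
```

For example, ensureExists(Paths.get("BigDog02RawText.txt")) run from the working directory would create the placeholder on the first call and do nothing afterward.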
Once you start running the programs, you will probably
identify words and phrases in spam messages viewed from within your
email client that should be copied into this file. You can copy
them using your text editor.
The file named BigDog02SubjAndHtml.txt
You will need to create and populate this file in your working
directory. The contents of this file are used to screen the
subject line and to screen the
body of email messages containing HTML.
This list can be large
The screening of subject lines and HTML runs relatively
fast. Therefore, you don’t need to be particularly concerned
about the size of this list. In other words, you can be fairly
aggressive in adding offensive words and phrases to the list. As
of this writing, my file contains more than 2,200 offensive words
and phrases accumulated over several months of operation. So far,
I have seen no significant speed degradation as a result of the size of
this list.
(On the other hand, you need to keep the size of the
file named BigDog02RawText.txt much smaller. A large
number of entries in that file will result in significant speed
degradation for the program.)
Populating the BigDog02SubjAndHtml.txt file
This list can be populated using a plain text editor. However,
the best way to populate this list is by running the program named BigDog02k
followed by the program named BigDog02m. These two
programs are designed to make it easy to populate this list on the
basis of actual email messages captured earlier in the folder named Archives
and manually copied to the folder named temp.
I will have much more to say about this in a future lesson when I
discuss these two programs.
The folder named DataFiles
This is the folder that receives the individual message files
downloaded from the public email server by the program named BigDog02g.
The files should be allowed to remain in this folder while being
scanned with your favorite virus scanning software.
The files in this folder provide the input to the alternative programs
named BigDog02i and BigDog02j.
The folder named Archives
The alternative programs named BigDog02i and BigDog02j
automatically move the individual message files from the folder named DataFiles
to the folder named Archives at the end of the run when the
user elects to delete the processed messages from the public email
server.
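The move itself is a simple file operation. The following sketch shows one way to move every regular file from one folder to another using java.nio.file. The class and method names are hypothetical, the sketch assumes Java 8 or later, and it is not the code used in Listing 3 or Listing 4.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of moving message files between folders.
final class ArchiveMover {
  // Moves every regular file from source to target and returns
  // the number of files moved. Subfolders are left in place.
  static int moveAll(Path source, Path target) throws IOException {
    Files.createDirectories(target);
    // Collect the file names first so that the folder is not
    // modified while it is being listed.
    List<Path> toMove = new ArrayList<>();
    try (DirectoryStream<Path> files =
             Files.newDirectoryStream(source)) {
      for (Path file : files) {
        if (Files.isRegularFile(file)) {
          toMove.add(file);
        }
      }
    }
    for (Path file : toMove) {
      Files.move(file, target.resolve(file.getFileName()),
          StandardCopyOption.REPLACE_EXISTING);
    }
    return toMove.size();
  }
}
```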
Retrieving a message from quarantine
Later, when the sender of an earlier message that was placed in
quarantine responds properly to a challenge, the earlier message is
retrieved from the folder named Archives, tagged {GD}, and
forwarded to the email client program. Therefore, the files
should be allowed to remain in this folder long enough for the response
to take place.
Deleting files from the Archives folder
About once each week, I manually delete all the files in this folder
that are more than seven days old. This is based on the
assumption that any sender
who is going to respond to the challenge will do so within seven days.
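This weekly cleanup could also be automated. The sketch below deletes regular files whose last-modified time is older than a given age. Using the last-modified time as the age of a message file is an assumption, as are the class and method names. The sketch assumes Java 8 or later. Test it on a copy of the folder first, because deleted files cannot be recovered.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of pruning old files from a folder.
final class ArchivePruner {
  // Deletes regular files in the folder that were last modified
  // more than maxAge before now. Returns the number deleted.
  static int deleteOlderThan(Path folder, Duration maxAge,
                             Instant now) throws IOException {
    Instant cutoff = now.minus(maxAge);
    List<Path> stale = new ArrayList<>();
    try (DirectoryStream<Path> files =
             Files.newDirectoryStream(folder)) {
      for (Path file : files) {
        if (Files.isRegularFile(file)
            && Files.getLastModifiedTime(file).toInstant()
                   .isBefore(cutoff)) {
          stale.add(file);
        }
      }
    }
    for (Path file : stale) {
      Files.delete(file);
    }
    return stale.size();
  }
}
```

A call such as deleteOlderThan(Paths.get("Archives"), Duration.ofDays(7), Instant.now()) matches the seven-day assumption described above.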
This folder also serves as the repository for messages used to train
the spam screening algorithm using the programs named BigDog02k
and BigDog02m.
The folder named temp
The folder named temp provides the input files for the
algorithm training programs named BigDog02k and BigDog02m.
As described earlier, when the time comes to train the spam screening
algorithm, you should copy a large block of message files from the Archives
folder into the temp folder. Then run the programs
named BigDog02k and BigDog02m to train the spam
screening algorithm using actual message files as input.
Setting up your email
Perhaps the most complex aspect of using these programs involves
setting up your email in a compatible way. This is complex
only because different people use different email client programs, and
therefore I am unable to give you step-by-step instructions on how to
do it.
A choice of two approaches
The programs named BigDog02i and BigDog02j provide two
different approaches to processing the individual message files that
have been downloaded and scanned for viruses. These two programs
achieve the same end result, but they achieve that result in different
ways.
The program named BigDog02j is for users whose email client
program uses the MBOX file format to store email messages locally.
The program named BigDog02i is for all other users.
(Note: Even users who have an MBOX-compatible
email client program can use BigDog02i if they so choose.)
The way that you set up your email accounts is different depending on
which of these two programs you elect to use.
Using BigDog02i
Before reading further in this lesson, I encourage you to go back and
read the earlier
lesson entitled Enlisting
Java in the War Against Email Viruses. The program named BigDog02i
is based on the same technology as the program described in that lesson.
In order to use BigDog02i, you will need to establish a secret
email account as described in that earlier lesson.
Processing, tagging, and forwarding messages
When you run BigDog02i, the program will
process each of the individual message files in the DataFiles
folder, tagging them appropriately as described earlier, and will
forward
the messages to your secret email account. In addition, all
messages with
a {QU} tag will also be tagged with a spam score.
You can set up ordinary email filters in the email client program to
route the messages into specific folders based on the spam score and
the following tags:
- {GD} Good
- {BD} Bad
- {SP} Spam
- {QU} Quarantine
Using BigDog02j
If your email client program stores its messages locally in MBOX format
and if you elect to take advantage of that fact, you can use the
program named BigDog02j to process the individual message files
in the folder named DataFiles.
Before proceeding further in this lesson, I encourage you to go back
and read the lesson entitled
Enlisting Java
in the War Against Email Viruses, Part 2, A Much Faster Program. The
program named BigDog02j uses the same technology described in
that lesson.
Locate the proper directory structure
In this case you will need to locate the local directory structure
that belongs to your email client program. The program named BigDog02j
creates an MBOX-formatted file containing your messages. The file
is given a unique
file name and is written into the directory structure belonging to your
email client program.
(The String variable named emailPath in
Listing 4 specifies the path to the disk directory where the MBOX file
should be written. You will need to modify the value of this
variable to match your own circumstances.)
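For readers who have not seen an MBOX file, the following sketch shows the widely used layout: each message begins with a separator line that starts with "From " followed by a sender and a date, any message line that itself starts with "From " is escaped with a leading ">", and a blank line ends the message. This is a general illustration of the format under those assumptions, with hypothetical names. It is not the code from Listing 4, and individual email clients may differ in the details they accept.

```java
import java.util.List;

// Hypothetical sketch of appending a message in MBOX style.
final class MboxSketch {
  // Appends one message: a "From " separator line, the message
  // lines (lines starting with "From " are escaped with '>'),
  // and then a blank line.
  static void appendMessage(StringBuilder out,
                            String envelopeSender,
                            String asctimeDate,
                            List<String> messageLines) {
    out.append("From ").append(envelopeSender).append(' ')
        .append(asctimeDate).append('\n');
    for (String line : messageLines) {
      if (line.startsWith("From ")) {
        out.append('>');
      }
      out.append(line).append('\n');
    }
    out.append('\n');
  }
}
```

Header lines such as "From: someone@example.com" are not affected by the escaping, because the test looks for "From" followed by a space rather than a colon.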
Appears as an email folder
The next
time you start your email client program after BigDog02j
finishes running, the MBOX file will appear as
an email folder within your email client program. The name of the
email folder will match the name of the MBOX file.
(At least that is the case with Netscape 7, and is
probably true with other MBOX-compatible email client programs as well.)
Tagged messages appear in email folder
The new email folder will contain all of the messages contained in the
disk folder named DataFiles. Each of the messages will be
tagged with one of the tags in the list presented earlier. In
addition, all messages with a {QU} tag will also be tagged with a spam
score.
At this point, you can use the ordinary email filtering capabilities of
your email client program to cause the individual messages to be moved
from that new folder to other email folders based on the tags listed
above and the spam score. Then you can use the capabilities of
the email client program to delete the email folder that represents the
MBOX file. That will cause the MBOX file to be deleted.
Moving messages to email folders
You can set up the email filters in your email client program any way
that you choose to help you to manage your messages. For example,
I
currently use email filters in my email client program to move all of
my messages into one of the following email folders:
- Good
- Bad
- Spam
- Quarantine{0}
- Quarantine{1}
- Quarantine{2+}
As you can probably guess, messages tagged {GD} are moved into the Good
folder. Messages tagged {BD} are moved into the Bad
folder, and messages tagged {SP} are moved into the Spam folder.
Messages tagged {QU} with a spam score of {0} are moved into the Quarantine{0}
folder. Messages tagged {QU} with a spam score of {1} are moved
into the Quarantine{1} folder. All other messages tagged
{QU} are moved into the Quarantine{2+} folder.
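The routing rules above are configured as filters in the email client, not in Java, but they can be summarized as a small decision function. The sketch below restates them in code purely as an illustration. The class and method names are hypothetical, and the sketch assumes that the tag and the spam score have already been extracted from the subject line.

```java
// Hypothetical restatement of the filter rules described above.
final class FolderRouter {
  // Maps a tag and spam score to the name of the email folder
  // that receives the message.
  static String folderFor(String tag, int spamScore) {
    switch (tag) {
      case "{GD}":
        return "Good";
      case "{BD}":
        return "Bad";
      case "{SP}":
        return "Spam";
      case "{QU}":
        if (spamScore == 0) {
          return "Quarantine{0}";
        }
        if (spamScore == 1) {
          return "Quarantine{1}";
        }
        return "Quarantine{2+}";
      default:
        throw new IllegalArgumentException("Unknown tag: " + tag);
    }
  }
}
```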
Most messages are ignored
As a practical matter, the only messages that I normally read are those
in the Good folder.
(The assumption is that a good message from a stranger
will automatically be retrieved and forwarded to the Good
folder when the
sender of that message responds to the challenge.)
The messages in the Good folder are messages that I want to
read.
Visually scan {QU}{0} messages
The messages in the Quarantine{0} folder are usually messages
that I don’t care to read, because they are usually spam.
However, I visually scan them to make certain that none of the messages
in this folder are good messages that were sent by a computer that
won’t respond to a challenge (such as confirmation of an airline
reservation, for example).
Whenever I locate such a message in the Quarantine{0} folder, I
enter the sending email address and perhaps some key words from the
subject into the file named BigDog02GoodList.txt so that future
messages from that sender will be routed into the Good folder.
Other messages are ignored
I basically ignore all of the messages in the other folders. I
leave them there for a few days just in case I need to go back and
review one of them for some special purpose.
In order to preserve
disk space, I use the features of the email client program to delete
these messages after they are a few days old.
Run the Programs
If you know enough about Java to understand the
programs on the basis of the source code and the comments, I encourage
you to copy the code from the Listings near the end of this
lesson. Compile and
run the programs for non-commercial purposes. Experiment with
them,
improving them as you see fit relative to your specific
situation. If you come up with any good ideas on how to
improve them, I would like to hear those ideas.
(IMPORTANT: Do not enable
the
DELE
code in the programs until you are certain
that you actually want to delete messages from the server. Once a
message is deleted from the server, there is no way to recover it from
the server.)
The BigDog set of programs is designed to protect your email
inbox from email-borne viruses
and spam. This lesson provides source code and a brief
description of
each of those programs. The lesson also explains how to set up
your computer
and your email to use the programs.
Several more lessons are planned for this series. Because every
lesson is a work in progress until I finish writing it, my plans
usually change as I progress through the writing of a series of
lessons. My thinking at this time is that future lessons in this
series will cover the following topics:
- Dealing with base64 data
- Dealing with HTML data
- Sending email messages
- Miscellaneous utility methods
- Downloading email messages
- Processing email messages for the non-MBOX case
- Processing email messages for the MBOX case
- The improved spam screening module
- Training the spam screening algorithm
Only time will tell how many changes I will need to make to this
list. In any event, there are lots of interesting technical
discussions ahead, so stay tuned.
Program Listings
Listing 1 through Listing 8 contain the programs that make up the BigDog
set of programs.
DISCLAIMER OF RESPONSIBILITY: If you elect to use these
programs, you use them at your own risk. Make absolutely certain
that you
understand what you are doing before you execute the programs.
Inappropriate use could result in the loss of email messages. The
author of these programs, Richard G. Baldwin, accepts no responsibility
for any losses that you may incur as a result of using these programs.
CAUTION: Do not enable
the
DELE
code in the programs until you are certain
that you actually want to delete messages from the server. Once a
message is deleted from the server, there is no way to recover it from
the server.
File BigDog02b
/*File BigDog02b.java
Copyright 2004, R.G.Baldwin
Rev 01/31/04

This class is the repository for static utility
methods used by other programs in the BigDog02
series of programs.

Tested using SDK 1.4.2 under WinXP
************************************************/
import java.io.*;
import sun.net.smtp.SmtpClient;
import java.awt.*;

public class BigDog02b{

  //This method is called to decode a Subject
  // line.
  //Sometimes the Subject line is encoded using
  // techniques designed to allow the use of
  // non-ASCII characters in message headers
  // (See RFC2047).
  //The following code determines if the Subject
  // line has been encoded using the ISO-8859-1
  // character set with an encoding value of B
  // or Q.  If so, the encoded material is
  // decoded.
  //Messages with an encoding value of Q contain
  // a mixture of ASCII characters and encoded
  // characters, so it is possible to partially
  // read them without the need for decoding.
  // They also sometimes use an underscore in
  // place of a space to make them more readable.
  public static String decodeSubj(String data){
    try{
      if(data.toUpperCase().indexOf(
                      "=?ISO-8859-1?B?") != -1){
        //Need to decode for value of B.
        int startIndex = data.toUpperCase().
                indexOf("=?ISO-8859-1?B?") + 15;
        int endIndex = data.length()-2;
        sun.misc.BASE64Decoder dec =
                    new sun.misc.BASE64Decoder();
        data = "Subject: =?ISO-8859-1?B? "
              + new String(dec.decodeBuffer(
                          data.substring(
                          startIndex,endIndex)));
      }//end if..."=?ISO-8859-1?B?"

      if(data.toUpperCase().indexOf(
                      "=?ISO-8859-1?Q?") != -1){
        //Need to decode for value of Q.
        int startIndex = data.toUpperCase().
                indexOf("=?ISO-8859-1?Q?") + 15;
        int endIndex = data.length()-2;
        String decodedData = data.substring(
                            startIndex,endIndex);

        //Decode non-ASCII characters
        StringBuffer stringBuf =
                    new StringBuffer(decodedData);
        int index = 0;
        while(index > -1){
          index = stringBuf.lastIndexOf("=");
          if(index > -1){
            String hexString =
                new String(stringBuf).substring(
                                index+1,index+3);
            char decodedChar =
                        (char)Integer.parseInt(
                            hexString.trim(),16);
            stringBuf.delete(index,index+3);
            stringBuf.insert(index,decodedChar);
          }//end if
        }//end while(index > -1)

        //Replace underscore with
        // space.
        index = 0;
        while(index > -1){
          index = stringBuf.lastIndexOf("_");
          if(index > -1){
            stringBuf.deleteCharAt(index);
            stringBuf.insert(index,' ');
          }//end if
        }//end while(index > -1)

        data = "Subject: =?ISO-8859-1?Q? "
                        + new String(stringBuf);
      }//end if..."=?ISO-8859-1?Q?"

    }catch(Exception ex){
      System.out.println(
                "Failure in decodeSubj method");
      ex.printStackTrace();
    }//end catch
    return data;
  }//end decodeSubj
  //===========================================//

  //This method reads and saves lines of data
  // from a file starting with the line that
  // startsWith firstLine and ending with the
  // line that startsWith lastLine.  If lastLine
  // is null, data is saved to the end of the
  // file.
  //The lines of data from the file are saved by
  // concatenating them into a single string with
  // a newline inserted into the string at the
  // end of each line.
  //If firstLine is null, data is saved beginning
  // with the first line in the file.
  //The path and name of the file are given by
  // pathFileName.
  public static String readLines(
                            String pathFileName,
                            String firstLine,
                            String lastLine){
    StringBuffer strBuf = new StringBuffer();
    try{
      BufferedReader inDataMsg =
            new BufferedReader(new FileReader(
                                 pathFileName));

      String data;
      boolean isSave = false;
      while((data = inDataMsg.readLine())
                                       != null){

        if( ((firstLine == null) ||
             (data.startsWith(firstLine)))
                          && (isSave == false)){
          isSave = true;
        }//end if

        if(isSave){
          strBuf.append(data + "\n");
        }//end if

        if((lastLine != null) &&
                 (data.startsWith(lastLine))){
          break;//no need to read any more
        }//end if

      }//end while loop
      inDataMsg.close();//Close file
    }catch(Exception e){e.printStackTrace();}
    return new String(strBuf);
  }//end readLines
//===========================================//
  //This method is used to construct an email
  // message and send it to the recipient.
  public static boolean forwardEmailMsg(
                           String recipient,
                           String smtpServer,
                           String tag,
                           String pathFileName){

    StringBuffer message = new StringBuffer(
                            "No message found");

    try{
      //Pass a string containing the name of
      // the smtp server as a parameter to the
      // following constructor.
      SmtpClient smtp =
                    new SmtpClient(smtpServer);

      //Pass a valid email address to the
      // from() method.
      smtp.from(recipient);

      //Pass the email address of the recipient
      // to the to() method.
      smtp.to(recipient);

      //Get an output stream for the message
      PrintStream msg = smtp.startMessage();

      //Write the message into the output
      // stream.
      message = new StringBuffer(readLines(
                       pathFileName,null,null));

      //Insert tag in subject line
      message = message.insert(message.indexOf(
                            "Subject: ")+9,tag);
      msg.println(new String(message));
      //Close the stream and send the message
      smtp.closeServer();

      return true;
    }catch( Exception e ){
      System.out.println("\n" + e);
      System.out.println("Forwarding email");
      Toolkit.getDefaultToolkit().beep();
      try{
        Thread.currentThread().sleep(300);
      }catch(Exception ex){
        System.out.println(ex);
      }//end catch
      Toolkit.getDefaultToolkit().beep();
      return false;
    }//end catch

  }//end forwardEmailMsg
  //===========================================//

  //Method moves a file from its current location
  // specified by pathFileName to a new location
  // specified by archivePath.
  public static void moveFile(
                          String pathFileName,
                          String archivePath){
    String fileName = pathFileName.substring(
            pathFileName.lastIndexOf('/') + 1);
    String archivePathFileName =
                        archivePath + fileName;

    boolean moved =
        new File(pathFileName).renameTo(
                 new File(archivePathFileName));

    if(!moved)System.out.println(
        "Unable to move "
        + new File(pathFileName) + "\nto "
        + new File(archivePathFileName));
  }//end moveFile method
  //===========================================//

}//end class BigDog02b
//=============================================//
Listing 1
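The decodeSubj method in Listing 1 depends on sun.misc.BASE64Decoder, which is an internal, unsupported class that is not available in more recent JDKs. The following is a minimal sketch of the same idea (decoding RFC 2047 encoded words with an encoding value of B or Q) using only public classes, assuming Java 8 or later for java.util.Base64. The class name EncodedWordSketch and its methods are hypothetical and are not part of the BigDog programs. Unlike decodeSubj, the sketch takes the character set from the encoded word itself rather than assuming ISO-8859-1, and it returns only the decoded text without the marker that decodeSubj inserts.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.Charset;
import java.util.Base64;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper; a sketch, not a full RFC 2047 implementation. */
class EncodedWordSketch {
    // General form of an encoded word: =?charset?encoding?encoded-text?=
    private static final Pattern ENCODED_WORD =
        Pattern.compile("=\\?([^?]+)\\?([BbQq])\\?([^?]*)\\?=");

    /** Replaces every encoded word in the header line with its text. */
    static String decode(String header) {
        Matcher m = ENCODED_WORD.matcher(header);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // Throws an unchecked exception for an unknown charset name.
            Charset cs = Charset.forName(m.group(1));
            byte[] bytes = m.group(2).equalsIgnoreCase("B")
                ? Base64.getDecoder().decode(m.group(3))
                : decodeQ(m.group(3));
            m.appendReplacement(out,
                Matcher.quoteReplacement(new String(bytes, cs)));
        }
        m.appendTail(out);
        return out.toString();
    }

    /** Q encoding: "_" means space and "=XX" is a byte in hexadecimal. */
    private static byte[] decodeQ(String text) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '_') {
                buf.write(' ');
            } else if (c == '=' && i + 2 < text.length()) {
                buf.write(Integer.parseInt(
                    text.substring(i + 1, i + 3), 16));
                i += 2; // skip the two hex digits
            } else {
                buf.write(c);
            }
        }
        return buf.toByteArray();
    }
}
```

Text that contains no encoded word is returned unchanged, so a method like this could be applied to every Subject line. Malformed Base64 or hex digits cause an unchecked exception in this sketch, so a caller would probably want to catch the exception and fall back to the raw line, much as decodeSubj does.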
File BigDog02g
/*File BigDog02g.java
Copyright 2004, R.G.Baldwin
Rev 03/14/04

This program downloads all messages from the
public email server and writes them in local
files.

After running this program and before running
either BigDog02i or BigDog02j, the user should
scan all of the message files produced by this
program with an anti virus program to remove
any files containing viruses from the local
folder.  That way, messages containing viruses
won't be forwarded to the email account.

For technical information on POP3, see RFC 1725
at
http://www.cis.ohio-state.edu/htbin/rfc/rfc1725.html

A POP3 Command Summary follows based on the
information at that web site.

Minimal POP3 Commands:
   USER name
   PASS string
   QUIT
   STAT
   LIST [msg]
   RETR msg
   DELE msg
   NOOP
   RSET
   QUIT

Optional POP3 Commands:
   APOP name digest
   TOP msg n
   UIDL [msg]

POP3 Replies:
   +OK
   -ERR

Tested using SDK 1.4.2 under WinXP
************************************************/

import java.net.*;
import java.io.*;
import java.util.*;
import java.awt.*;
import java.awt.event.*;

class BigDog02g extends Frame{
  //The following is the local folder where
  // message files are stored awaiting
  // processing.  You may want to modify this on
  // your machine.  On my machine, this folder is
  // a subfolder of the folder containing the
  // Java class files (the execution directory).
  String dataPath = "./DataFiles/";

  //The following are working variables used by
  // the program for various purposes.
  int numberMsgs = 0;
  int msgCounter = 0;
  int msgNumber;
  String uidl = "";//unique msg ID
  BufferedReader inputStream;
  PrintWriter outputStream;
  Socket socket;
  String pathFileName;

  public static void main(String[] args){
    if(args.length != 3){
      System.out.println("Usage: java BigDog02g "
                  + "server userName password");
      System.exit(0);
    }//end if

    new BigDog02g(args[0],args[1],args[2]);
  }//end main
  //===========================================//

  //Constructor
  BigDog02g(String server,String userName,
                               String password){
    int port = 110; //pop3 mail port
    try{
      //Get a socket, connected to the
      // specified server on the specified
      // port.
      socket = new Socket(server,port);

      //Get an input stream from the socket
      inputStream = new BufferedReader(
                    new InputStreamReader(
                      socket.getInputStream()));

      //Get an output stream to the socket.
      // Note that this stream will autoflush.
      outputStream = new PrintWriter(
                  new OutputStreamWriter(
                socket.getOutputStream()),true);

      //Display the msg received from the
      // server on the command-line screen
      // immediately following connection.
      String connectMsg = validateOneLine();
      System.out.println("Connected to server "
                                  + connectMsg);

      //The communication process is now in the
      // AUTHORIZATION state.  Send the user
      // name and password to the server.
      //Commands are sent in plain text, upper
      // case to the server.  Some commands
      // require an argument following the
      // command, as is the case with USER.
      //Send the command.
      outputStream.println("USER " + userName);
      //Get response and confirm that the
      // response was +OK and was not -ERR.
      String userResponse = validateOneLine();
      //Display the response on the command-
      // line screen.
      System.out.println("USER " + userResponse);
      //Send the password to the server
      outputStream.println("PASS " + password);
      //Validate the server's response as +OK.
      // Display the response in the process.
      System.out.println(
                   "PASS " + validateOneLine());
    }catch(Exception e){e.printStackTrace();}

    //Register a window listener to service
    // the close button on the Frame.
    this.addWindowListener(
      new WindowAdapter(){
        public void windowClosing(WindowEvent e){

          //Terminate the session with the
          // server.
          outputStream.println("QUIT");
          String quitResponse =
                              validateOneLine();
          //Display the response on the
          // command-line screen.
          System.out.println(
                        "QUIT " + quitResponse);

          try{
            socket.close();
          }catch(Exception ex){
            System.out.println("\n" + ex);}

          System.exit(0);
        }//end windowClosing
      }//end WindowAdapter()
    );//end addWindowListener

    //Note that the compiler requires the
    // reference to the following components to
    // be final because they are accessed from
    // within an anonymous class definition.
    final Button startButton =
                            new Button("Start");
    final TextArea textArea = new TextArea(
                                         20,50);

    //Register an ActionListener on the
    // startButton.
    startButton.addActionListener(
      new ActionListener(){
        public void actionPerformed(
                                 ActionEvent e){
          try{
            //The communication process is now
            // in the TRANSACTION state.
            //Retrieve and save messages
            if(numberMsgs == 0){
              outputStream.println("STAT");
              String stat = validateOneLine();
              //Get the number of messages as
              // a String.
              String numberMsgsStr =
                        stat.substring(
                          4,stat.indexOf(" ",5));
              //Convert the String to an int.
              numberMsgs = Integer.parseInt(
                                  numberMsgsStr);
            }//end if numberMsgs == 0
            //NOTE:  Msg numbers begin with 1,
            // not 0.
            //Retrieve and save each
            // message.  Each msg ends with a
            // period on a new line.
            msgNumber = msgCounter + 1;
            if(msgNumber <= numberMsgs){
              //Process the next message.

              //Get and save a unique identifier
              // for the message from the server
              // and validate the response.
              outputStream.println(
                            "UIDL " + msgNumber);
              uidl = validateOneLine();

              //Open an output file to save
              // the message.  Use the UIDL
              // as the file name.
              pathFileName = dataPath + uidl;
              DataOutputStream dataOut =
                      new DataOutputStream(
                        new FileOutputStream(
                                  pathFileName));

              //Send a RETR command to begin
              // the message retrieval process
              outputStream.println(
                            "RETR " + msgNumber);
              //Validate the response.
              String retrResponse =
                              validateOneLine();

              //Read the first line in the
              // message from the server.
              String msgLine =
                         inputStream.readLine();

              //Continue reading lines until
              // a line consisting only of "."
              // is encountered.  That signals
              // the end of the msg.
              while(!(msgLine.equals("."))){
                //Write the line to the output
                // file and read the next
                // line.  Insert newline
                // characters when writing the
                // output to the file.
                dataOut.writeBytes(
                                msgLine + "\n");
                msgLine =
                         inputStream.readLine();

              }//end while
              //Close the output file.  The
              // message is now stored in a
              // local file with a file name
              // based on the unique ID
              // provided by the server.
              dataOut.close();

              //Show progress
              textArea.append(msgNumber + "\n");

              //Increment the message counter
              // in preparation for
              // processing the next message.
              msgCounter++;

              Toolkit.getDefaultToolkit().
                getSystemEventQueue().
                  postEvent(new ActionEvent(
                    startButton,
                    ActionEvent.
                             ACTION_PERFORMED,
                    "Start/Next"));

            }//end if msgNumber <= numberMsgs
            else{//msgNumber > numberMsgs
              //No more messages.  Disable the
              //Start/Next button.
              startButton.setEnabled(false);
              textArea.append(
                        "DON'T FORGET TO SCAN");

              //Alert the user
              Toolkit.getDefaultToolkit().beep();
              Thread.currentThread().sleep(300);
              Toolkit.getDefaultToolkit().beep();
              Thread.currentThread().sleep(300);
              Toolkit.getDefaultToolkit().beep();
            }//end else
          }//end try
          catch(Exception ex){
                          ex.printStackTrace();}
        }//end actionPerformed
      }//end ActionListener
    );//end addActionListener

    //Configure the GUI by placing the
    // various components on it, setting the size
    // and making it visible.
    add(startButton);
    add(textArea);
    textArea.setText("");
    setLayout(new FlowLayout());

    setTitle("Copyright 2004, R.G.Baldwin");
    setSize(400,400);
    //Make the GUI visible.
    setVisible(true);
  }//end constructor
  //===========================================//

  //Validate a one-line response.
  //The purpose of this method is to confirm that
  // the server returned +OK and not -ERR to the
  // previous command.
  //If +OK, the method returns the string
  // returned by the server.
  //If -ERR, the method displays the string
  // returned by the server and terminates the
  // session.
  private String validateOneLine(){
    try{
      String response = inputStream.readLine();
      if(response.startsWith("+OK")){
        return response;
      }else{
        System.out.println(response);
        //Terminate the session.
        outputStream.println("QUIT");
        socket.close();
        System.out.println(
                      "Premature QUIT on -ERR");
        System.exit(0);
      }//end else
    }catch(IOException e){
      System.out.println("\n" + e);}
    //The following return statement is required
    // to satisfy the compiler.
    return "Make compiler happy";
  }//end validateOneLine()
  //===========================================//

}//end class BigDog02g
//=============================================//
Listing 2
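Listing 2 pulls the message count out of the STAT reply with substring arithmetic and reads each message until it sees a line containing only a period. The following minimal sketch shows the same two steps in isolation so that they can be tried without a mail server. The class Pop3ResponseSketch and its method names are hypothetical and are not part of the BigDog programs. The sketch also removes the extra leading period that the POP3 specification says a server prepends to any message line that begins with a period, and it treats a closed connection as an error rather than looping on a null line. Listing 2 does neither of those things, so this is a variation and not a copy of the author's method.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

/** Hypothetical helper for two kinds of POP3 replies; a sketch only. */
class Pop3ResponseSketch {

    /** Gets the message count from a STAT reply such as "+OK 2 320". */
    static int messageCount(String statReply) {
        String[] parts = statReply.trim().split("\\s+");
        if (parts.length < 3 || !parts[0].equals("+OK")) {
            throw new IllegalArgumentException(
                "Unexpected STAT reply: " + statReply);
        }
        return Integer.parseInt(parts[1]);
    }

    /**
     * Reads the body of a multi-line reply (for example, after RETR)
     * up to the terminating line that contains only a period.
     */
    static String readMultiLine(BufferedReader in)
                                          throws IOException {
        StringBuilder body = new StringBuilder();
        String line;
        while ((line = in.readLine()) != null) {
            if (line.equals(".")) {
                return body.toString(); // end of the message
            }
            if (line.startsWith(".")) {
                line = line.substring(1); // undo byte-stuffing
            }
            body.append(line).append('\n');
        }
        throw new IOException(
            "Connection closed before the terminating period");
    }

    public static void main(String[] args) throws IOException {
        System.out.println(messageCount("+OK 2 320"));
        BufferedReader in = new BufferedReader(new StringReader(
            "Hi\r\n..starts with a period\r\n.\r\n"));
        System.out.print(readMultiLine(in));
    }
}
```

Because the reader is passed in as a parameter, the same method could be given the BufferedReader that wraps the socket's input stream, or a StringReader as shown in main for experimentation.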
File BigDog02i
/*File BigDog02i.java
Copyright 2004, R.G.Baldwin
Rev 02/28/04

This program processes a set of message files
written by the program named BigDog02g.  This
program tags messages as {GD},{QU}, {SP} or
{BD} and forwards the messages to a secret
email account.  The secret email account is
provided as a command-line parameter.

Messages tagged {GD} are messages whose sender
or subject matches a word or phrase in a GOOD
list.

Messages tagged {BD} are messages whose sender
or subject matches a word or phrase in a BAD
list.

Messages tagged {SP} are messages that were
identified by a spam screener as containing
spam.

Remaining messages are tagged {QU}.  The
senders of all messages tagged {QU} are sent a
challenge message asking them to reply and
confirm that they actually sent the original
message.

In addition, this program monitors for REPLY
messages where the subject contains +OK.  When
a REPLY message is received, the sender is
added to the GOOD list, the original message
referred to by the unique code in the subject
is retrieved from the archive folder, the
retrieved message is tagged {GD}, and the
tagged message is forwarded to the secret email
account.

This program should be run after the program
named BigDog02g has been run, and after a virus
checker has been used to confirm that all files
in the working directory produced by BigDog02g
are free of viruses.  See additional comments
at the beginning of BigDog02g.java for a
description of this program.

For technical information on POP3, see RFC 1725
at
http://www.cis.ohio-state.edu/htbin/rfc/rfc1725.html

A POP3 Command Summary follows based on the
information at that web site.

Minimal POP3 Commands:
   USER name
   PASS string
   QUIT
   STAT
   LIST [msg]
   RETR msg
   DELE msg
   NOOP
   RSET
   QUIT

Optional POP3 Commands:
   APOP name digest
   TOP msg n
   UIDL [msg]

POP3 Replies:
   +OK
   -ERR

This program uses the DELE command to delete
messages from the public POP3 server.

This program uses an object of the class named
BigDog02SpamScreen01 to screen messages to
determine if they contain spam.

Certain portions of this program have been
disabled for test purposes.  Search for the
word disable to identify those portions.

Tested using SDK 1.4.2 under WinXP
************************************************/

import java.net.*;
import java.io.*;
import java.util.*;
import java.awt.*;
import java.awt.event.*;
import sun.net.smtp.SmtpClient;

class BigDog02i extends Frame{
  //All of the user-specific information is
  // provided here.

  //Beginning of subject for outgoing message.
  String subjOut = "Put your subj here ";
  //Signature on outgoing message.
  String signature = "Your signature\n\n";
  //List of email addresses that should not be
  // sent an email message regardless of any
  // other circumstance.  It should probably
  // include your own email addresses as a
  // minimum.
  String[] doNotSendList = {"you@yourAddress"
                           };//end of list
  //The From: address in outgoing email message.
  String fromAddr = "you@yourAddress";

  //ID of the secret email account.
  String recipient = "See command-line input";
  //An smtp server through which the user is
  // authorized to send email messages.
  String smtpServer = "See command-line input";
  //End of user-specific information.

  //Local folder where message files are stored
  // awaiting processing.  You may want to modify
  // this on your machine.  On my machine, this
  // folder is a subfolder of the folder
  // containing the Java class files (the
  // execution directory).
  String dataPath = "./DataFiles/";
  //Local folder where the messages are stored
  // after they have been processed.  They are
  // automatically moved to this folder after
  // being deleted from the email server.
  String archivePath = "./Archives/";
  //Following two files contain lists of phrases
  // used in processing the messages.
  String goodPhraseFile = "BigDog02GoodList.txt";
  String badPhraseFile = "BigDog02BadList.txt";

  //Following are working variables used by the
  // program for various purposes.
  TreeSet goodPhraseList;
  TreeSet badPhraseList;
  BufferedReader inputStream;
  PrintWriter outputStream;
  Socket socket;
  String pathFileName;
  Vector msgToDelete = new Vector();
  Button startButton = new Button("Start/Next");
  Button deleteButton = new Button(
                        "Delete Msg On Server");
  TextArea textArea = new TextArea(20,50);
  String uidl;
  String subject = "No Subject line found";
  String sender = "No From line found";
  String msgNumberStr = "000";
  boolean okToDelete = false;
  int msgNumber = 0;
  String subjAndHtmlPhraseFile =
                      "BigDog02SubjAndHtml.txt";
  String rawTextPhraseFile =
                          "BigDog02RawText.txt";
  int hitCount = 0;
  int hitLimit = 6;

public static void main(String[] args){ if(args.length != 5){ System.out.println("Usage: java BigDog02i " + "pubServer userName password " + "secretServer smtpServer"); System.exit(0); }//end if
//Construct an object of this class new BigDog02i(args[0],args[1],args[2], args[3],args[4]); }//end main //===========================================//
//Constructor BigDog02i(final String server, final String userName, final String password, String secretServer, String smtpServer){
recipient = secretServer; this.smtpServer = smtpServer;
makeGoodPhraseList(); makeBadPhraseList();
//Register a window listener to service // the close button on the Frame. this.addWindowListener( new WindowAdapter(){ public void windowClosing(WindowEvent e){ System.exit(0); }//end windowClosing }//end WindowAdapter() );//end addWindowListener
//Register an ActionListener on the // startButton. startButton.addActionListener( new ActionListener(){ public void actionPerformed( ActionEvent e){ startButton.setEnabled(false); //Get a directory listing File dataDir = new File(dataPath); //The following code creates a // directory listing containing only // those files that begin with +OK. //This is an anonymous implementation // of a class that implements // FilenameFilter. String[] dirList = dataDir.list( new FilenameFilter(){ public boolean accept( File dir,String name){ if(!(new File(dir,name). isFile())) return false; return name.startsWith("+OK"); }//end accept }//end FilenameFilter );//end list
//Now process the files in the // directory int msgCounter = 0; for(msgCounter = 0; msgCounter < dirList.length; msgCounter++){ String fileName = dirList[msgCounter]; pathFileName = dataPath + fileName;
//Get the original message number // used by the server to ID the msg. String strMsgNumber = fileName.substring( fileName.indexOf(" "), fileName.lastIndexOf(" ")) .trim(); msgNumber = Integer.parseInt(strMsgNumber); System.out.print("" + msgNumber + ", ");
//Process the message startProcess(); }//end for loop on directory length
//Write the possibly modified // goodPhraseList into an output file writeGoodPhraseList();
          //Make it possible for the user to
          // delete all processed messages from
          // the server, and notify the user that
          // the time has come for a deletion
          // decision.
          deleteButton.setEnabled(true);
          textArea.append("\nDo you want to "
             + "delete messages from server?\n");
          //Sound an audio alert
          try{
            Toolkit.getDefaultToolkit().beep();
            Thread.currentThread().sleep(300);
            Toolkit.getDefaultToolkit().beep();
            Thread.currentThread().sleep(300);
            Toolkit.getDefaultToolkit().beep();
          }catch(Exception ex){
                          ex.printStackTrace();}
        }//end actionPerformed
      }//end ActionListener
    );//end addActionListener

    //Register an action listener on the delete
    // button
    deleteButton.addActionListener(
      new ActionListener(){
        public void actionPerformed(
                                 ActionEvent e){
          deleteButton.setEnabled(false);
          textArea.append("\n");

          //Get connected to the email server
          int port = 110; //pop3 mail port
          try{
            //Get a socket, connected to the
            // specified server on the specified
            // port.
            socket = new Socket(server,port);

            //Get an input stream from the socket
            inputStream = new BufferedReader(
                    new InputStreamReader(
                      socket.getInputStream()));

            //Get an output stream to the socket
            outputStream = new PrintWriter(
                  new OutputStreamWriter(
                socket.getOutputStream()),true);

            //Display the msg received from the
            // server on the command-line screen
            // immediately following connection.
            String connectMsg =
                              validateOneLine();
            System.out.println(
                        "Connected to server "
                                  + connectMsg);

            //The communication process is now in
            // the AUTHORIZATION state.  Send the
            // user name and password to the
            // server.
            outputStream.println("USER "
                                    + userName);
            //Get response and confirm that the
            // response was +OK and was not -ERR.
            String userResponse =
                              validateOneLine();
            //Display the response on the
            // command-line screen.
            System.out.println("USER "
                                + userResponse);
            //Send the password to the server
            outputStream.println("PASS "
                                    + password);
            //Validate the server's response as
            // +OK.  Display the response in the
            // process.
            System.out.println("PASS "
                           + validateOneLine());
          }catch(Exception ex){
                          ex.printStackTrace();}

          //Process the files in the msgToDelete
          // collection and delete those messages
          // from the email server
          for(int cnt = 0;
                cnt < msgToDelete.size();cnt++){
            pathFileName = (String)msgToDelete.
                                 elementAt(cnt);
            String strMsgNumber = pathFileName.
              substring(pathFileName.indexOf(" "),
                pathFileName.lastIndexOf(" ")).
                                         trim();
            int msgNumber = Integer.parseInt(
                                  strMsgNumber);

            //Deletion of a message from the
            // server is accomplished by marking
            // the message for deletion while in
            // the TRANSACTION state.  The
            // message is actually deleted when
            // the client sends a QUIT command
            // to the server causing the server
            // to enter the UPDATE state.  If the
            // program aborts prematurely before
            // sending a QUIT command, marked
            // messages are not deleted from the
            // server.
            //Mark the message for deletion.

            //Message deletion has been disabled
            // for test purposes.
            textArea.append(
                  "\nMessage deletion disabled");

/*
            outputStream.println("DELE "
                                   + msgNumber);

            //Validate the response and display
            // it on the GUI.
            textArea.append(
                 "Msg: " + msgNumber + " "
                      + validateOneLine()+"\n");
            textArea.append(
                 "Deleted:" + msgNumber + "\n");
*/
            //Now move the file that has been
            // processed and deleted from the
            // server to the archive folder on
            // the local disk.
            BigDog02b.moveFile(pathFileName,
                                   archivePath);

          }//end for loop on msgToDelete.size()

          //Terminate the session with the
          // server causing the messages to
          // actually be deleted from the server.
          outputStream.println("QUIT");
          String quitResponse =
                              validateOneLine();
          //Display the response on the
          // command-line screen.
          System.out.println(
                        "QUIT " + quitResponse);

          //Server is now in the UPDATE mode.
          // It will delete all files marked
          // with the DELE command earlier
          // in the execution of the program.
          //Close the socket
          try{
            socket.close();
          }catch(Exception ex){
            System.out.println("\n" + ex);}

          textArea.append("\n\nMessages deleted "
                            + "from server.\n");
        }//end actionPerformed
      }//end ActionListener
    );//end addActionListener

//Configure the GUI by placing the // various components on it, setting the // size, and making it visible. add(startButton); add(deleteButton); deleteButton.setEnabled(false); add(textArea); textArea.setText(""); setLayout(new FlowLayout());
setTitle("Copyright 2004, R.G.Baldwin"); setSize(400,400); //Make the GUI visible. setVisible(true); }//end constructor //===========================================//
  //Validate a one-line response.
  //The purpose of this method is to confirm that
  // the server returned +OK and not -ERR to the
  // previous command.
  //If +OK, the method returns the string
  // returned by the server.
  //If -ERR, the method displays the string
  // returned by the server and terminates the
  // session.
  private String validateOneLine(){
    try{
      String response = inputStream.readLine();
      if(response.startsWith("+OK")){
        return response;
      }else{
        System.out.println(response);
        //Terminate the session.
        outputStream.println("QUIT");
        socket.close();
        System.out.println(
                      "Premature QUIT on -ERR");
        System.exit(0);
      }//end else
    }catch(IOException e){
      System.out.println("\n" + e);}
    //The following return statement is required
    // to satisfy the compiler.
    return "Make compiler happy";
  }//end validateOneLine()
  //===========================================//

//The purpose of this method is to kick off the // processing of a new message. void startProcess(){ //Create a three-digit string representing // the message number. This will be used to // tag the subject before the message is // forwarded to the secret email account. if(msgNumber < 10){ msgNumberStr = "00" + msgNumber; }else if(msgNumber > 99){ msgNumberStr = "" + msgNumber; }else{ msgNumberStr = "0" + msgNumber; }//end else
//Get and save the unique identifier assigned // by the public email server. uidl = pathFileName.substring( pathFileName.lastIndexOf(" "));
//Determine the type of message and take the // appropriate action.
if(isBad()){ //This message was determined to be from // a confirmed spammer, virus writer, other // machine, or some other undesirable // source. No point in sending them a // message. Tag the message as {BD} // and forward it to the secret email // account. processBad(); }else if(isReply()){ //This message is a reply to a previous // message sent to someone inviting them // to confirm that they are a human and // not a machine. Add the email address // to the list of good addresses for // future messages, retrieve the original // message that triggered the inquiry, tag // the original message as {GD} and // forward it to the secret email account. // This is the most complex of all the // processing tasks in the program. processReply(); }else if(isGood()){ //This message was determined either to be // from an approved sender, or to have an // approved subject. Tag the message as // {GD} and forward it to the secret email // account. processGood(); }else if(isSpam()){ //This message has been processed by a spam // screener and has been determined to be // spam. It will be marked {SP} along with // a spam score before being written into // the MBOX file. processSpam(); }else{ //This message is from an unknown address. // It is probably spam, but may be from // someone worth communicating with. Send // a message asking the sender to confirm // that they are a human. Tag the message // as {QU} and forward it to the secret // email account. If a reply is received // in a reasonable time, that reply will // trigger the processReply procedure // described above. Otherwise, manually // delete the message from the local // archive folder after a reasonable // amount of time has transpired. processQuarantine(); }//end else
}//end startProcess //===========================================//
  //Purpose:  To write the data from a TreeSet
  // object into an output file.
  //This method is the reverse of the method
  // named makeGoodPhraseList.

  void writeGoodPhraseList(){
    try{
      DataOutputStream dataOut =
              new DataOutputStream(
                new FileOutputStream(
                               goodPhraseFile));

      //Use an iterator to access the data in
      // the TreeSet object.
      Iterator iter = goodPhraseList.iterator();
      String data;

      while(iter.hasNext()){
        data = (String)iter.next();
        dataOut.writeBytes(data + "\n");
      }//end while

      dataOut.close();
    }catch(Exception e){e.printStackTrace();}

  }//end writeGoodPhraseList
  //===========================================//

  //This method tests the sender of the message
  // and the subject of the message against the
  // list of items in the badPhraseFile.
  // Returns true on match, false otherwise.
  private boolean isBad(){
    boolean match = false;

    //Get the Subject line, decode if necessary,
    // and convert it to upper case
    subject = BigDog02b.readLines(
            pathFileName,"Subject:","Subject:");
    subject = BigDog02b.decodeSubj(subject);
    subject = subject.toUpperCase();

    //Get the sender and convert it to upper
    // case
    sender = BigDog02b.readLines(pathFileName,
                               "From:","From:");
    sender = sender.toUpperCase();

    //The Subject and From lines have been
    // captured.  Screen each of them against
    // an upper case version of words and
    // phrases in a TreeSet object containing
    // quarantine email addresses and subjects.
    match = screenForBadSubjAndFromLines();
    return match;
  }//end isBad method
  //===========================================//

//This method screens the Subject and From // lines to determine if they contain bad // subjects or email addresses. If so, the // method returns true. Otherwise, it returns // false. An exact match on an upper-case basis // is required private boolean screenForBadSubjAndFromLines(){ Iterator iterator = badPhraseList.iterator(); while(iterator.hasNext()){ String badWord = ((String)(iterator.next())). toUpperCase(); if(!(badWord.equals(""))){ if((subject.indexOf(badWord) != -1) || (sender.indexOf(badWord) != -1)){ //An exact match was found. return true; }//end if((subject.indexOf... }//end if!(badWord.equals("") }//end while iterator has next return false; }//end screenForBadSubjAndFromLines //===========================================//
//This method is used to process messages that // have been determined to be in the bad // category. void processBad(){
BigDog02b.forwardEmailMsg( recipient, smtpServer, "{BD}{"+msgNumberStr+"}", pathFileName);
//Add this message to the list of messages // scheduled to be deleted from the public // email server msgToDelete.add(pathFileName);
}//end processBad //===========================================//
//This method tests the subject of the current // message to determine if the message is a // reply to a message sent to an email address // earlier. If the subject contains +OK, it is // assumed to be a reply because that is // the beginning of a unique ID assigned to // each message that is sent. It is also the // beginning of the file name by which message // files are stored locally. Returns true on // match, false otherwise. If it is a reply, // the unique ID in the subject of the message // matches the file name of the earlier // message that triggered the sending of an // email message to the email address. That // makes it possible to locate and retrieve // the original message from a local archive // folder. private boolean isReply(){ boolean match = false; String subject = "";
//Get the subject, decode if necessary, and // convert to upper case subject = BigDog02b.readLines(pathFileName, "Subject:","Subject:"); subject = BigDog02b.decodeSubj(subject); subject = subject.toUpperCase();
if(subject.indexOf("+OK ") != -1){ //Tag the message {GD} and forward it to // the secret email account. The file in // the archives represented by this message // should also be forwarded to the secret // email account by the processReply // method if it can be found. BigDog02b.forwardEmailMsg( recipient, smtpServer, "{GD}{"+msgNumberStr+"}", pathFileName);
return true; }else{ return false; }//end else }//end isReply method //===========================================//
//This method uses information in the subject // of the current message to retrieve an // earlier message file from a local archive // folder. The earlier message is tagged {GD} // and forwarded to the secret email account.
private void processReply(){ String sender = "No sender identified"; String emailAddr = "No email address identified"; String subject = "";
//Beep twice to alert the user that a reply // is being processed. Toolkit.getDefaultToolkit().beep(); try{ Thread.currentThread().sleep(200); }catch(Exception ex){System.out.println(ex);} Toolkit.getDefaultToolkit().beep();
//Get the subject, decode if necessary, and // trim off the newline character subject = BigDog02b.readLines(pathFileName, "Subject:","Subject:"); subject = BigDog02b.decodeSubj(subject). trim();
    //Now parse the subject to get the name of
    // the original file.
    File theFile = null;
    try{
      //Note, this assumes that the requested
      // file is now located in the folder
      // pointed to by archivePath
      theFile = new File(archivePath
                  + subject.substring(
                      subject.indexOf("+OK")));
    }catch(Exception ex){
      System.out.println("\n" + ex);
      System.out.println("Getting theFile");
      System.out.println("pathFileName:"
                              + pathFileName);
    }//end catch
    textArea.append("\nProcessing reply message "
           + msgNumberStr + "\n" + subject
           + "\nFile: " + theFile + "\n");
    //theFile is still null if the subject could
    // not be parsed above, so guard against that
    // before testing for existence.
    if(theFile != null && theFile.exists()){
      //Read the file from the local archive
      // folder. Extract the email address.  Add
      // the email address to the goodPhraseList.
      // Tag the message {GD} and forward it to
      // the secret email account.  Note that the
      // last parameter identifies the path and
      // file name of the file being retrieved.
      BigDog02b.forwardEmailMsg(
                   recipient,
                   smtpServer,
                   "{GD}{"+msgNumberStr+"}",
                   theFile.toString());
//Add the message to the list of messages // scheduled for deletion from the public // email server. msgToDelete.add(pathFileName);
//Now get the sender email address and add // it to the goodPhraseList //Get the sender, convert to upper case, // and trim off the new line character. sender = BigDog02b.readLines( theFile.toString(), "From:","From:"); sender = sender.toUpperCase().trim();
      //Deal with the format of the email
      // address.  Some have the email address
      // in angle brackets with something like a
      // name ahead of the angle brackets.
      // Others simply have an email address.
      try{
        if((sender.indexOf("<") != -1)
               && (sender.indexOf(">") != -1)){
          emailAddr = sender.substring(
            sender.indexOf("<") + 1,
            sender.indexOf(">")).toUpperCase();
        }else if(sender.indexOf(" ") != -1){
          //Get rid of text ahead of the email
          // address
          emailAddr = sender.substring(sender.
            lastIndexOf(" ") + 1).toUpperCase();
        }else{
          emailAddr = sender.toUpperCase();
        }//end else
      }catch(Exception ex){
        System.out.println("\n" + ex);
        System.out.println(
          "Getting sender for goodPhraseList");
        System.out.println("sender:" + sender);
        System.out.println("pathFileName:"
                              + pathFileName);
      }//end catch
//Add the email address to the good list. goodPhraseList.add(emailAddr);
    }else{
      textArea.append("\nUnable to locate file "
                  + "referred to in reply.\n");
      //Beep to alert the user of this problem.
      Toolkit.getDefaultToolkit().beep();
      try{
        Thread.currentThread().sleep(200);
      }catch(Exception ex){
        System.out.println(ex);}
      Toolkit.getDefaultToolkit().beep();
    }//end else
}//end processReply //===========================================//
//This method tests the sender of the message // and the subject of the message against the // list of items in the goodPhraseFile. Returns // true on match, false otherwise. private boolean isGood(){ boolean match = false; //Get the subject, decode if necessary, and // convert to upper case subject = BigDog02b.readLines(pathFileName, "Subject:","Subject:"); subject = BigDog02b.decodeSubj(subject); subject = subject.toUpperCase();
//Get the sender and convert to upper case sender = BigDog02b.readLines(pathFileName, "From:","From:"); sender = sender.toUpperCase();
    //The Subject and From lines have been
    // captured.  Screen each of them against
    // an upper case version of words and
    // phrases in a TreeSet object containing
    // good email addresses and subjects.
    match = screenForGoodSubjAndFromLines();
    return match;
  }//end isGood method
  //===========================================//

  //This method screens the Subject and From
  // lines to determine if they contain good
  // subjects or email addresses.  If so, the
  // method returns true.  Otherwise, it returns
  // false.  A good phrase matches if it appears
  // verbatim, on an upper-case basis, anywhere
  // in the Subject or From line.
  private boolean screenForGoodSubjAndFromLines(){
    Iterator iterator =
                     goodPhraseList.iterator();
    while(iterator.hasNext()){
      String goodWord =
               ((String)(iterator.next())).
                                 toUpperCase();
      if(!(goodWord.equals(""))){
        if((subject.indexOf(goodWord) != -1)
           || (sender.indexOf(goodWord) != -1)){
          //A match was found.
          System.out.println("\ngoodWord:"
                                    + goodWord);
          return true;
        }//end if((subject.indexOf...
      }//end if!(goodWord.equals("")
    }//end while iterator has next
    return false;
  }//end screenForGoodSubjAndFromLines
  //===========================================//
//This method processes a message that has been // determined to be a good message. Forward the // message to the secret email account and add // the identification of the message to the // list of messages scheduled for deletion from // the server later. //Don't add it to the deletion list if // forwarding failed. void processGood(){ okToDelete = BigDog02b.forwardEmailMsg( recipient, smtpServer, "{GD}{"+msgNumberStr+"}", pathFileName); if(okToDelete){ msgToDelete.add(pathFileName); }//end if
}//end processGood //===========================================//
//This method is used to process messages that // have been determined to be in the quarantine // category. These are messages which probably // are spam or viruses, sent by machines. // However, some small percentage may have been // sent by a human who wishes to communicate in // a meaningful way, but whose email address // has not yet been entered into the good list. // As a result, each of these messages // triggers an email message to be sent // automatically asking the sender to // demonstrate that they are a human by // replying to the message. The original // message is tagged {QU} and forwarded to the // secret email account. It is also stored in // a local archive folder. The receipt of a // reply later will cause the original message // to be retrieved from the local archive // folder, tagged {GD}, and forwarded to the // secret email account. void processQuarantine(){
String subject = ""; String sender = ""; String date = ""; String header = "";
//Read the message from a local file, tag it // {QU} and forward it to the secret email // account. //You can tag the subject with any // string that you want to pass as // the third parameter. I elected to add // the original message number and the spam // score to the tag. This information is // useful when using ordinary email program // filters to direct the messages to // specific email folders. BigDog02b.forwardEmailMsg( recipient, smtpServer, "{QU}{"+msgNumberStr+"}{" +hitCount+"}", pathFileName);
//Add the message to the list of messages // scheduled for deletion from the public // email server. msgToDelete.add(pathFileName);
//Now prepare for composing and sending the // email message to the sender of the // current message. //Get the Subject line decode if necessary, // and convert it to upper case subject = BigDog02b.readLines(pathFileName, "Subject:","Subject:"); subject = BigDog02b.decodeSubj(subject); subject = subject.toUpperCase();
//Get the sender in upper case sender = BigDog02b.readLines(pathFileName, "From:","From:"); sender = sender.toUpperCase();
//Now get the date. date = BigDog02b.readLines(pathFileName, "Date:","Date:"); date = date.toUpperCase();
//Now get the header of the original message. header = BigDog02b.readLines(pathFileName, null,"Status:");
    //Use this information to send an email
    // message to the sender.  Need to avoid a
    // substring index error later if the sender
    // or the subject are blank.
    if(!(sender.equals("")
                     || subject.equals(""))){
      sendEmailMsg(sender,subject,date,
                          header,pathFileName);
    }else{
      textArea.append(
                  "\nUnable to send message\n");
    }//end else
  }//end processQuarantine
  //===========================================//
  //This method is used to automatically send an
  // email message to the sender of every
  // quarantine message, asking them to indicate
  // that they are a human rather than a machine
  // by replying to the message.
  //The incoming sender parameter is used to
  // establish the address of the recipient.
  //The incoming parameter subject is reported
  // to the recipient along with the date to
  // identify the message to the recipient.
  //The incoming pathFileName is used to
  // place a unique identifier in the subject of
  // the message that is sent.  This identifies
  // the original message that triggered this
  // event.
  private void sendEmailMsg(String sender,
                            String subject,
                            String date,
                            String header,
                            String pathFileName){
    //Enable the following two statements and
    // enclose the remaining body of the method
    // in a large block comment when testing the
    // program to avoid sending nuisance
    // messages.
    textArea.append("sendEmail disabled\n");
    return;
/* //Start a block comment here to disable
    //Don't send messages to any email address
    // on the doNotSendList.
    boolean okToSend = true;
    for(int cnt = 0;
           cnt < doNotSendList.length; cnt++){
      if(sender.toUpperCase().indexOf(
                     doNotSendList[cnt].
                         toUpperCase()) != -1){
        okToSend = false;
        textArea.append("\nDon't send to: "
               + sender.toUpperCase() + "\n");
        break;
      }//end if
    }//end for loop
    if(okToSend){
      //Get the email address from the incoming
      // parameter sender.  Sometimes the actual
      // address is enclosed in angle brackets.
      String emailAddr = "";
      if((sender.indexOf("<") != -1)
               && (sender.indexOf(">") != -1)){
        try{
          emailAddr = sender.substring(
                       sender.indexOf("<") + 1,
                       sender.indexOf(">"));
        }catch(Exception ex){
          System.out.println("\n" + ex);
          System.out.println("Sending email");
          System.out.println(
                     "Getting emailAddr in <>");
          System.out.println("sender:"
                                     + sender);
          System.out.println("pathFileName:"
                               + pathFileName);
          System.out.println("Forcing a valid "
                  + "email address structure");
          emailAddr = "dummy@dummy.com";
        }//end catch
      }else{
        //Sometimes the email address simply
        // follows the word From: in the header
        // of the message from which the sender
        // parameter is derived.
        try{
          emailAddr = sender.substring(
                 sender.toUpperCase().indexOf(
                                   "FROM:")+5);
        }catch(Exception ex){
          System.out.println("\n" + ex);
          System.out.println("Sending email");
          System.out.println(
                          "Getting emailAddr");
          System.out.println("sender:"
                                     + sender);
          System.out.println("pathFileName:"
                               + pathFileName);
          System.out.println("Forcing a valid "
                  + "email address structure");
          emailAddr = "dummy@dummy.com";
        }//end catch
        emailAddr = emailAddr.trim();
      }//end else
      //Make sure that emailAddr contains an @
      // indicating that it is probably a
      // properly formatted email address.
      if(emailAddr.indexOf("@") == -1){
        Toolkit.getDefaultToolkit().beep();
        try{
          Thread.currentThread().sleep(200);
        }catch(Exception ex){
          System.out.println(ex);}
        Toolkit.getDefaultToolkit().beep();
        System.out.println("\nCan't send to:"
                                  + emailAddr);
        return;
      }//end if
      //Extract the file name from the
      // pathFileName parameter and the actual
      // subject from the incoming subject
      // parameter.
      String fileName = "No file name available";
      try{
        fileName = pathFileName.substring(
            pathFileName.lastIndexOf("/") + 1);
        String theSubject = subject.substring(9);
      }catch(Exception ex){
        System.out.println("\n" + ex);
        System.out.println("Sending email");
        System.out.println(
            "Getting fileName and theSubject");
        System.out.println("subject:"
                                    + subject);
        System.out.println("fileName:"
                                   + fileName);
        System.out.println("pathFileName:"
                               + pathFileName);
      }//end catch
      //Display information about the message. I
      // may decide to write this into a history
      // file later so that I will have a record
      // of messages sent.
      textArea.append("\nSending email to:\n"
                       + emailAddr + "\n"
                       + fileName + "\n"
                       + date.trim() + "\n");
try{ //Pass a string containing the name of // the smtp server as a parameter to the // SmtpClient constructor. SmtpClient smtp = new SmtpClient(smtpServer);
//Pass the sender's email address to the // from() method. smtp.from(fromAddr);
//Pass the email address of the recipient // to the method named to(). smtp.to(emailAddr);
//Get an output stream for the message PrintStream msg = smtp.startMessage();
//Write the message header in the output // stream. msg.println("To: " + emailAddr); msg.println("Subject: " + subjOut + fileName); msg.println();//blank line
        //Write the text of the message in the
        // output stream.
        msg.println(
"I recently received a message from your\n"+
"Email address with the following subject\n"+
"and date:\n\n"+

subject + "\n" + date + "\n\n" +

"Because your Email address has not been \n"+
"entered into the Approved Sender list of my \n"+
"SPAM blocking software, the message has been\n"+
"placed in the Quarantine folder. To move \n"+
"the message from the Quarantine folder into \n"+
"my Inbox, you will need to press your Reply \n"+
"button and send this message back to me \n"+
"making no changes to the Subject line or the\n"+
"body of the message. This will also cause \n"+
"your Email address to be added to my \n"+
"Approved Sender list so that future messages\n"+
"from you won't be similarly delayed.\n\n"+

"I apologize for this inconvenience. \n"+
"However, due to the large amount of SPAM \n"+
"that I must contend with, I have been \n"+
"forced to implement a mail handling system \n"+
"that asks you for a one-time confirmation \n"+
"that you intend to communicate with me via \n"+
"Email.\n\n"+

"If you didn't send the original message, I \n"+
"apologize for the intrusion. However, it is\n"+
"possible that someone is using your Email \n"+
"address for misleading, possibly fraudulent,\n"+
"and possibly malicious purposes. I strongly\n"+
"encourage you to file a complaint regarding \n"+
"the inappropriate use of your Email address.\n"+

"I have provided all of the information below\n"+
"that you will need to file such a \n"+
"complaint.\n\n"+

"The information provided below my signature\n"+
"block is the full header of the original \n"+
"Email message. You will find a short \n"+
"tutorial at \n"+
"http://www.dickbaldwin.com/java/Java2158.htm\n"+
"that explains how to use this header to file\n"+
"a complaint.\n\n"+

"If we all band together in opposing SPAM and \n"+
"Email viruses, perhaps we can have a \n"+
"positive impact on this increasingly serious\n"+
"problem.\n\n"+

"Regards,\n"+
signature +

"=======HEADER BEGINS HERE========\n\n"+
header +"\n"
);//end of message
//Close the stream and send the message smtp.closeServer();
      }catch( Exception e ){
        System.out.println("\n" + e);
        System.out.println("Sending email");
        System.out.println(pathFileName);
      }//end catch
    }//end if(okToSend)
*/ //end a block comment here to disable
  }//end sendEmailMsg
  //===========================================//
//Purpose: To create a TreeSet object // containing words used to screen the message // From and Subject lines. //This method reads strings from a text file // and creates the list as a TreeSet object // with no duplicates. //Only the primary portion of the good // Email address should be included in the // file used to create the list. This would // be x@y.z
  //After creating the list, it writes the data
  // from the list into a backup file named
  // ....bakN, where N is the value of the
  // next available file name in the directory.
  //A new backup file with a unique name is
  // created each time the program is run.  Once
  // the number of backup files reaches 5, the
  // program automatically deletes the oldest
  // file before creating a new backup
  // file.  Thus the program automatically
  // maintains a sequence of five backup files
  // with extensions .bak0 through .bak5 with one
  // number missing.  The age-order of the files
  // should be determined by the modification
  // date and not by the name of the file.
  //The data read from the file is converted to
  // upper case before being added to the TreeSet
  // object.
void makeGoodPhraseList(){ goodPhraseList = new TreeSet();
//Read words or phrases from text file and // populate the TreeSet object. try{ BufferedReader inData = new BufferedReader(new FileReader( goodPhraseFile)); String data; //temp holding area
while((data = inData.readLine()) != null){ goodPhraseList.add(data.toUpperCase()); }//end while loop
inData.close();//Close input file
//Write a backup file before making any // modifications to the data.
//First determine the name of the next // backup file allowed in the directory. int N = 0; File theFile = null; String baseFileName = goodPhraseFile. substring(0,goodPhraseFile.indexOf( ".txt")); for(N = 0;N < 6;N++){ theFile = new File(baseFileName + ".bak" + N); if(!(theFile.exists()))break; }//end for loop
//Cause N to rotate from 0 through 5 if(N == 5){//del file 0 for use next time new File(baseFileName + ".bak0").delete(); }//end if else{//delete the next file in sequence if(new File( baseFileName + ".bak" + (N + 1)).exists()){ new File( baseFileName + ".bak" + (N + 1)).delete(); }//end if }//end else
//Now write the output file DataOutputStream dataOut = new DataOutputStream( new FileOutputStream( theFile));
//Use an Iterator object to access the data // in the TreeSet object. Iterator iter = goodPhraseList.iterator();
      while(iter.hasNext()){
        data = (String)iter.next();
        dataOut.writeBytes(data + "\n");
      }//end while
dataOut.close(); }catch(Exception e){e.printStackTrace();} }//end makeGoodPhraseList //===========================================//
//Purpose: To create a TreeSet object // containing words used to screen the message // From and Subject lines. //This method reads strings from a text file // and creates the list as a TreeSet object // with no duplicates. //Only the primary portion of the bad // Email address should be included in the // file used to create the list. This would // be x@y.z
//After creating the list, it writes the data // from the list back out into the file. This // is done to keep the contents of the file // sorted in upper case. Since the program // doesn't modify the contents of the list, // there is no point in creating backup files.
void makeBadPhraseList(){ badPhraseList = new TreeSet();
//Read words or phrases from text file and // populate the TreeSet object. try{ BufferedReader inData = new BufferedReader(new FileReader( badPhraseFile)); String data; //temp holding area
while((data = inData.readLine()) != null){ badPhraseList.add(data.toUpperCase()); }//end while loop
inData.close();//Close input file
//Now write the output file DataOutputStream dataOut = new DataOutputStream( new FileOutputStream( badPhraseFile));
//Use an Iterator object to access the data // in the TreeSet object. Iterator iter = badPhraseList.iterator();
      while(iter.hasNext()){
        data = (String)iter.next();
        dataOut.writeBytes(data + "\n");
      }//end while
dataOut.close(); }catch(Exception e){e.printStackTrace();} }//end makeBadPhraseList //===========================================//
//This method passes the message through a spam // screener to determine if it should be // considered spam. The screener program // produces and returns a score based on the // number of hits against offensive words and // phrases. The number of hits is compared to // a hitLimit value that is established in the // general instance variables at the beginning // of the program. When the number of hits // reaches that value, the screener terminates // in order to avoid wasting time. If that // limit has been reached, this method returns // true indicating that the message is thought // to be spam. Otherwise, it returns false. If // it returns true, the control program invokes // the method named processSpam to deal with // the message. private boolean isSpam(){ BigDog02SpamScreen01 screener = new BigDog02SpamScreen01(dataPath, subjAndHtmlPhraseFile, rawTextPhraseFile, hitLimit);
hitCount = screener.screenMsg(pathFileName); if(hitCount >= hitLimit){ return true; }else{ return false; }//end else }//end isSpam method //===========================================//
//This method deals with a message that has // been identified as spam. void processSpam(){
//Forward the message to the secret email // account. //You can tag the subject with any string // that you want to pass as the third // parameter. I elected to tag it with {SP} // indicating that it is spam. I also added // the message number and the spam score, // which may be useful for using email // program filters to cause the messages to // be directed to specific email folders. BigDog02b.forwardEmailMsg( recipient, smtpServer, "{SP}{"+msgNumberStr+"}{"+hitCount+"}", pathFileName);
//Add this message to the list of messages // scheduled to be deleted from the public // email server msgToDelete.add(pathFileName);
}//end processSpam //===========================================// }//end class BigDog02i //=============================================//
Listing 3
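The processReply method in Listing 3 has to cope with two shapes of From line: an address inside angle brackets with a name ahead of it, and a bare address following the word From:. The following is a minimal sketch of that branching taken out of the program so that it can be tried on its own. It is not part of the BigDog programs. The class name FromLineSketch and the method name extractAddress are hypothetical, and the sketch assumes that a missing character is detected by comparing indexOf against -1.

```java
// Hypothetical helper for illustration only; not part of BigDog.
class FromLineSketch {

  // Returns the upper-case email address found in a From line.
  static String extractAddress(String fromLine) {
    String sender = fromLine.toUpperCase().trim();
    int open = sender.indexOf("<");
    int close = sender.indexOf(">");
    if (open != -1 && close > open) {
      // Form: From: "Some Name" <x@y.z>
      return sender.substring(open + 1, close);
    } else if (sender.indexOf(" ") != -1) {
      // Form: From: x@y.z
      // Discard the text ahead of the last space.
      return sender.substring(sender.lastIndexOf(" ") + 1);
    } else {
      // Nothing but an address.
      return sender;
    }
  }

  public static void main(String[] args) {
    System.out.println(extractAddress("From: \"Dick\" <x@y.z>"));
    System.out.println(extractAddress("From: x@y.z"));
    System.out.println(extractAddress("x@y.z"));
  }
}
```

Each of the three calls in main prints X@Y.Z. Keeping the extraction in a small static method like this makes it easy to try unusual From lines before trusting the result enough to add it to the good list.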
File BigDog02j
/*File BigDog02j.java Copyright 2004, R.G.Baldwin Rev 02/28/04
This program processes a set of message files written by the program named BigDog02g. This program tags messages as {GD},{QU}, {SP} or {BD} and writes them into an MBOX file in the local directory tree for a dummy email account.
This is a fast alternative to the program named BigDog02i This program can be used by persons whose email client program stores messages locally in MBOX format.
Messages tagged {GD} are messages whose sender or subject matches a word or phrase in a GOOD list.
Messages tagged {BD} are messages whose sender or subject matches a word or phrase in a BAD list.
Messages tagged {SP} are messages that were identified by a spam filter as containing spam.
Remaining messages are tagged {QU}. The senders of all messages tagged {QU} are sent a challenge message asking them to reply and confirm that they actually sent the original message.
In addition, this program monitors for REPLY messages where the subject contains +OK. When a REPLY message is received, the sender is added to the GOOD list, the original message referred to by the unique code in the subject is retrieved from the archive folder, the retrieved message is tagged {GD}, and the tagged message is written into the MBOX file.
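The lookup described above can be sketched in a few lines. This is an illustrative sketch only, not the code used by the program: the class name ReplySubjectSketch and the method name archiveFileFor are hypothetical, and it assumes that everything in the subject from +OK onward is the name of the archived message file.

```java
import java.io.File;

// Hypothetical helper for illustration only; not part of BigDog.
class ReplySubjectSketch {

  // Returns the archived file named by the +OK identifier in a
  // reply subject, or null if the subject has no such identifier.
  static File archiveFileFor(String archivePath, String subject) {
    int start = subject.indexOf("+OK");
    if (start == -1) {
      return null; // not a reply to a challenge message
    }
    // Everything from +OK onward is taken to be the file name.
    return new File(archivePath + subject.substring(start).trim());
  }

  public static void main(String[] args) {
    File f = archiveFileFor("./Archives/",
                            "Re: Please confirm +OK 12 abc123");
    System.out.println(f.getName());
    System.out.println(archiveFileFor("./Archives/", "Hello"));
  }
}
```

Returning null when the identifier is missing lets the caller report the problem instead of failing on a bad substring index.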
You should terminate your email client program before running this program. Otherwise, it may not recognize the new email folder until you stop and then restart the email client program.
This program should be run after the program named BigDog02g has been run, and after a virus checker has been used to confirm that all files in the working directory produced by BigDog02g are free of viruses. See additional comments at the beginning of BigDog02g.java for a description of this program.
For technical information on POP3, see RFC 1725 at
http://www.cis.ohio-state.edu/htbin/rfc/rfc1725.html
A POP3 Command Summary follows based on the information at that web site.
Minimal POP3 Commands:
  USER name
  PASS string
  QUIT
  STAT
  LIST [msg]
  RETR msg
  DELE msg
  NOOP
  RSET
  QUIT

Optional POP3 Commands:
  APOP name digest
  TOP msg n
  UIDL [msg]

POP3 Replies:
  +OK
  -ERR
This program uses the DELE command to delete messages from the public POP3 server at the request of the user.
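As a rough illustration of how the commands fit together, the following sketch builds the sequence of client commands for one deletion session and shows the test applied to each server reply. It performs no network I/O. The class name Pop3DeleteSketch and its method names are hypothetical and are not part of the BigDog programs. Messages marked with DELE are removed by the server only after the client sends QUIT.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not part of BigDog.
class Pop3DeleteSketch {

  // True when a POP3 status line reports success.
  static boolean isOk(String response) {
    return response != null && response.startsWith("+OK");
  }

  // Builds the client commands for a session that logs in, marks
  // the given messages for deletion, and commits with QUIT.
  static List<String> deleteSession(String user, String password,
                                    int[] msgNumbers) {
    List<String> commands = new ArrayList<String>();
    commands.add("USER " + user);
    commands.add("PASS " + password);
    for (int n : msgNumbers) {
      commands.add("DELE " + n);
    }
    commands.add("QUIT");
    return commands;
  }

  public static void main(String[] args) {
    List<String> cmds =
        deleteSession("name", "secret", new int[]{3, 7});
    for (String c : cmds) {
      System.out.println(c);
    }
  }
}
```

In a real session, each command would be written to the socket and the reply checked with a test like isOk before the next command is sent.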
This program uses an object of the class named BigDog02SpamScreen01 to screen messages to determine if they contain spam.
Certain portions of this program have been disabled for test purposes. Search for the word disable to identify those portions.
Tested using SDK 1.4.2 under WinXP ************************************************/
import java.net.*; import java.io.*; import java.util.*; import java.awt.*; import java.awt.event.*; import sun.net.smtp.SmtpClient;
class BigDog02j extends Frame{ //All of the user-specific information is // provided here.
  //Beginning of subject for outgoing message.
  String subjOut = "Put your subj here ";
  //Signature on outgoing message.
  String signature = "Your signature\n\n";
  //List of email addresses that should not be
  // sent an email message regardless of any
  // other circumstance.  It should probably
  // include your own email addresses as a
  // minimum.
  String[] doNotSendList = {"you@yourAddress"
                           };//end of list
  //The From: address in outgoing email message.
  String fromAddr = "you@yourAddress";
//An smtp server through which the user is // authorized to send email messages. String smtpServer = "See command-line input"; //Local folder where message files are stored // awaiting processing. You may want to modify // this on your machine. On my machine, this // folder is a subfolder of the folder // containing the Java class files (the // execution directory). String dataPath = "./DataFiles/"; //Local folder where the messages are stored // after they have been processed. They are // automatically moved to this folder after // being deleted from the email server. String archivePath = "./Archives/"; //Path to the local folder where you write // files to cause them to be treated as email // folders. Note that this doesn't have to be // a valid email account so long as the email // client program considers it to be valid. In // other words, you can create a new account in // your email client program using dummy server // names, etc. String emailPath = "C:/Baldwin/DummyMailAccount/"; //Following two files contain lists of phrases // used in processing the messages. String goodPhraseFile = "BigDog02GoodList.txt"; String badPhraseFile = "BigDog02BadList.txt"; //End of user-specific information.
//Following are working variables used by the // program for various purposes. TreeSet goodPhraseList; TreeSet badPhraseList; BufferedReader inputStream; PrintWriter outputStream; Socket socket; String pathFileName; Vector msgToDelete = new Vector(); Button startButton = new Button("Start/Next"); Button deleteButton = new Button( "Delete Msg On Server"); TextArea textArea = new TextArea(20,50); String uidl; String subject = "No Subject line found"; String sender = "No From line found"; String msgNumberStr = "000"; int msgNumber = 0; StringBuffer mBoxStrBuf = new StringBuffer(""); String newFolder; String subjAndHtmlPhraseFile = "BigDog02SubjAndHtml.txt"; String rawTextPhraseFile = "BigDog02RawText.txt"; int hitCount = 0; int hitLimit = 6;
public static void main(String[] args){ if(args.length != 4){ System.out.println( "Usage: java BigDog02j " + "pubServer userName password " + "smtpServer"); System.exit(0); }//end if
//Construct an object of this class new BigDog02j(args[0],args[1],args[2], args[3]); }//end main //===========================================//
//Constructor BigDog02j(final String server, final String userName, final String password, String smtpServer){
this.smtpServer = smtpServer; newFolder = "A" + new Date().getTime();
makeGoodPhraseList(); makeBadPhraseList();
//Register a window listener to service // the close button on the Frame. this.addWindowListener( new WindowAdapter(){ public void windowClosing(WindowEvent e){ System.exit(0); }//end windowClosing }//end WindowAdapter() );//end addWindowListener
//Register an ActionListener on the // startButton. startButton.addActionListener( new ActionListener(){ public void actionPerformed( ActionEvent e){ startButton.setEnabled(false); //Get a directory listing File dataDir = new File(dataPath); //The following code creates a // directory listing containing only // those files that begin with +OK. //This is an anonymous implementation // of a class that implements // FilenameFilter. String[] dirList = dataDir.list( new FilenameFilter(){ public boolean accept( File dir,String name){ if(!(new File(dir,name). isFile())) return false; return name.startsWith("+OK"); }//end accept }//end FilenameFilter );//end list
//Now process the files in the // directory int msgCounter = 0; for(msgCounter = 0; msgCounter < dirList.length; msgCounter++){ String fileName = dirList[msgCounter]; pathFileName = dataPath + fileName;
//Get the original message number // used by the server to ID the msg. String strMsgNumber = fileName.substring( fileName.indexOf(" "), fileName.lastIndexOf(" ")) .trim(); msgNumber = Integer.parseInt(strMsgNumber); System.out.print("" + msgNumber + ", ");
//Process the message startProcess(); }//end for loop on directory length
try{ System.out.println("Writing: " + emailPath + newFolder); //Write the updated string into the // MBOX file (email folder). DataOutputStream dataOut = new DataOutputStream( new FileOutputStream( emailPath + newFolder)); dataOut.writeBytes( new String(mBoxStrBuf)); dataOut.close(); }catch(Exception ex){ System.out.println( "Writing MBOX file"); ex.printStackTrace(); }//end catch
//Write the possibly modified // goodPhraseList into an output file writeGoodPhraseList();
          //Make it possible for the user to
          // delete all processed messages from
          // the server, and notify the user that
          // the time has come for a deletion
          // decision.
          deleteButton.setEnabled(true);
          textArea.append("\nDo you want to "
            + "delete messages from server?\n");
          //Sound an audio alert
          try{
            Toolkit.getDefaultToolkit().beep();
            Thread.currentThread().sleep(300);
            Toolkit.getDefaultToolkit().beep();
            Thread.currentThread().sleep(300);
            Toolkit.getDefaultToolkit().beep();
          }catch(Exception ex){
            ex.printStackTrace();}
        }//end actionPerformed
      }//end ActionListener
    );//end addActionListener
    //Register an action listener on the delete
    // button
    deleteButton.addActionListener(
      new ActionListener(){
        public void actionPerformed(
                                ActionEvent e){
          deleteButton.setEnabled(false);
          textArea.append("\n");
//Get connected to the email server int port = 110; //pop3 mail port try{ //Get a socket, connected to the // specified server on the specified // port. socket = new Socket(server,port);
//Get an input stream from the socket inputStream = new BufferedReader( new InputStreamReader( socket.getInputStream()));
//Get an output stream to the socket outputStream = new PrintWriter( new OutputStreamWriter( socket.getOutputStream()),true);
//Display the msg received from the // server on the command-line screen // immediately following connection. String connectMsg = validateOneLine(); System.out.println( "Connected to server " + connectMsg);
//The communication process is now in // the AUTHORIZATION state. Send the // user name and password to the // server. outputStream.println("USER " + userName); //Get response and confirm that the // response was +OK and was not -ERR. String userResponse = validateOneLine(); //Display the response on the // command-line screen. System.out.println("USER " + userResponse); //Send the password to the server outputStream.println("PASS " + password); //Validate the server's response as // +OK. Display the response in the // process. System.out.println("PASS " + validateOneLine()); }catch(Exception ex){ ex.printStackTrace();}
//Process the files in the msgToDelete // collection and delete those messages // from the email server for(int cnt = 0; cnt < msgToDelete.size();cnt++){ pathFileName = (String)msgToDelete. elementAt(cnt); String strMsgNumber = pathFileName. substring(pathFileName.indexOf(" "), pathFileName.lastIndexOf(" ")). trim(); int msgNumber = Integer.parseInt( strMsgNumber);
//Deletion of a message from the // server is accomplished by marking // the message for deletion while in // the TRANSACTION state. The // message is actually deleted when // the client sends a QUIT command // to the server causing the server // to enter the UPDATE state. If the // program aborts prematurely before // sending a QUIT command, marked // messages are not deleted from the // server. //Mark the message for deletion.
            //Message deletion has been disabled
            // for test purposes.
            textArea.append(
                 "\nMessage deletion disabled");
/* outputStream.println("DELE " + msgNumber);
            //Validate the response and display
            // it on the GUI.
            textArea.append(
                    "Msg: " + msgNumber + " "
                    + validateOneLine()+"\n");
            textArea.append(
               "Deleted:" + msgNumber + "\n");
*/
            //Now move the file that has been
            // processed and deleted from the
            // server to the archive folder on
            // the local disk.
            BigDog02b.moveFile(pathFileName,
                                   archivePath);
}//end for loop on msgToDelete.size()
//Terminate the session with the // server causing the messages to // actually be deleted from the server. outputStream.println("QUIT"); String quitResponse = validateOneLine(); //Display the response on the // command-line screen. System.out.println( "QUIT " + quitResponse);
          //Server is now in the UPDATE mode.
          // It will delete all files marked
          // with the DELE command earlier
          // in the execution of the program.
          //Close the socket
          try{
            socket.close();
          }catch(Exception ex){
            System.out.println("\n" + ex);}

          textArea.append("\n\nMessages deleted "
                          + "from server.\n");
        }//end actionPerformed
      }//end ActionListener
    );//end addActionListener
    //Configure the GUI by placing the
    // various components on it, setting the
    // size, and making it visible.
    add(startButton);
    add(deleteButton);
    deleteButton.setEnabled(false);
    add(textArea);
    textArea.setText("");
    setLayout(new FlowLayout());

    setTitle("Copyright 2004, R.G.Baldwin");
    setSize(400,400);
    //Make the GUI visible.
    setVisible(true);
  }//end constructor
  //===========================================//
  //Validate a one-line response.
  //The purpose of this method is to confirm that
  // the server returned +OK and not -ERR to the
  // previous command.
  //If +OK, the method returns the string
  // returned by the server.
  //If -ERR, the method displays the string
  // returned by the server and terminates the
  // session.
  private String validateOneLine(){
    try{
      String response = inputStream.readLine();
      if(response.startsWith("+OK")){
        return response;
      }else{
        System.out.println(response);
        //Terminate the session.
        outputStream.println("QUIT");
        socket.close();
        System.out.println(
                       "Premature QUIT on -ERR");
        System.exit(0);
      }//end else
    }catch(IOException e){
      System.out.println("\n" + e);}
    //The following return statement is required
    // to satisfy the compiler.
    return "Make compiler happy";
  }//end validateOneLine()
  //===========================================//
  //The purpose of this method is to kick off the
  // processing of a new message.
  void startProcess(){
    //Create a three-digit string representing
    // the message number. This will be used to
    // tag the subject before the message is
    // written into the MBOX file.
    if(msgNumber < 10){
      msgNumberStr = "00" + msgNumber;
    }else if(msgNumber > 99){
      msgNumberStr = "" + msgNumber;
    }else{
      msgNumberStr = "0" + msgNumber;
    }//end else

    //Get and save the unique identifier assigned
    // by the public email server.
    uidl = pathFileName.substring(
                  pathFileName.lastIndexOf(" "));

    //Determine the type of message and take the
    // appropriate action.
    if(isBad()){
      //This message was determined to be from
      // a confirmed spammer, virus writer, other
      // machine, or some other undesirable
      // source. No point in sending them a
      // message. Tag the message as {BD}
      // and write it into the MBOX file
      processBad();
    }else if(isReply()){
      //This message is a reply to a previous
      // message sent to someone inviting them
      // to confirm that they are a human and
      // not a machine. Add the email address
      // to the list of good addresses for
      // future messages, retrieve the original
      // message that triggered the inquiry, tag
      // the original message as {GD} and
      // write it into the MBOX file.
      processReply();
    }else if(isGood()){
      //This message was determined either to be
      // from an approved sender, or to have an
      // approved subject. Tag the message as
      // {GD} and write it into the MBOX file.
      processGood();
    }else if(isSpam()){
      //This message has been processed by a spam
      // filter and has been determined to be
      // spam. It will be marked {SP} along with
      // a spam score before being written into
      // the MBOX file.
      processSpam();
    }else{
      //This message is from an unknown address.
      // It is probably spam, but may be from
      // someone worth communicating with. Send
      // a message asking the sender to confirm
      // that they are a human. Tag the message
      // as {QU} and write it into the MBOX file.
      // If a reply is received
      // in a reasonable time, that reply will
      // trigger the processReply procedure
      // described above. Otherwise, manually
      // delete the message from the local
      // archive folder after a reasonable
      // amount of time has transpired.
      processQuarantine();
    }//end else
}//end startProcess //===========================================//
  //Purpose: To write the data from a TreeSet
  // object into an output file.
  //This method is the reverse of the method
  // named makeGoodPhraseList.

  void writeGoodPhraseList(){
    try{
      DataOutputStream dataOut =
              new DataOutputStream(
                  new FileOutputStream(
                                goodPhraseFile));

      //Use an iterator to access the data in
      // the TreeSet object.
      Iterator iter = goodPhraseList.iterator();
      String data;

      while(iter.hasNext()){
        data = (String)iter.next();
        dataOut.writeBytes(data + "\n");
      }//end while

      dataOut.close();
    }catch(Exception e){e.printStackTrace();}
}//end writeGoodPhraseList //===========================================//
  //This method tests the sender of the message
  // and the subject of the message against the
  // list of items in the badPhraseFile.
  // Returns true on match, false otherwise.
  private boolean isBad(){
    boolean match = false;

    //Get the Subject line, decode if necessary,
    // convert it to upper case
    subject = BigDog02b.readLines(
            pathFileName,"Subject:","Subject:");
    subject = BigDog02b.decodeSubj(subject);
    subject = subject.toUpperCase();

    //Get the sender and convert it to upper
    // case
    sender = BigDog02b.readLines(pathFileName,
                               "From:","From:");
    sender = sender.toUpperCase();

    //The Subject and From lines have been
    // captured. Screen each of them against
    // an upper case version of words and
    // phrases in a TreeSet object containing
    // quarantine email addresses and subjects.
    match = screenForBadSubjAndFromLines();
    return match;
  }//end isBad method
  //===========================================//
  //This method screens the Subject and From
  // lines to determine if they contain bad
  // subjects or email addresses. If so, the
  // method returns true. Otherwise, it returns
  // false. An exact match on an upper-case basis
  // is required
  private boolean screenForBadSubjAndFromLines(){
    Iterator iterator = badPhraseList.iterator();
    while(iterator.hasNext()){
      String badWord =
                    ((String)(iterator.next())).
                                   toUpperCase();
      if(!(badWord.equals(""))){
        if((subject.indexOf(badWord) != -1) ||
              (sender.indexOf(badWord) != -1)){
          //An exact match was found.
          return true;
        }//end if((subject.indexOf...
      }//end if!(badWord.equals("")
    }//end while iterator has next
    return false;
  }//end screenForBadSubjAndFromLines
  //===========================================//
  //This method is used to process messages that
  // have been determined to be in the bad
  // category.
  void processBad(){

    //Add the message to the MBOX file.
    // You can tag the subject with any
    // string that you want to pass as
    // the second parameter.
    mBoxStrBuf = addToMboxStr(mBoxStrBuf,
                    "{BD}{"+msgNumberStr+"}",
                                  pathFileName);

    //Add this message to the list of messages
    // scheduled to be deleted from the public
    // email server
    msgToDelete.add(pathFileName);
}//end processBad //===========================================//
  //This method tests the subject of the current
  // message to determine if the message is a
  // reply to a message sent to an email address
  // earlier. If the subject contains +OK, it is
  // assumed to be a reply because that is
  // the beginning of a unique ID assigned to
  // each message that is sent. It is also the
  // beginning of the file name by which message
  // files are stored locally. Returns true on
  // match, false otherwise. If it is a reply,
  // the unique ID in the subject of the message
  // matches the file name of the earlier
  // message that triggered the sending of an
  // email message to the email address. That
  // makes it possible to locate and retrieve
  // the original message from a local archive
  // folder.
  private boolean isReply(){
    boolean match = false;
    String subject = "";

    //Get the subject, decode if necessary, and
    // convert to upper case
    subject = BigDog02b.readLines(pathFileName,
                         "Subject:","Subject:");
    subject = BigDog02b.decodeSubj(subject);
    subject = subject.toUpperCase();

    if(subject.indexOf("+OK ") != -1){

      //Tag the message {GD} and write it into
      // the MBOX file. The file in the archives
      // represented by this message should also
      // be written into the MBOX file by the
      // processReply method if it can be found.
      // You can tag the subject with any string
      // that you want to pass as the second
      // parameter.
      mBoxStrBuf = addToMboxStr(mBoxStrBuf,
                      "{GD}{"+msgNumberStr+"}",
                                  pathFileName);

      return true;
    }else{
      return false;
    }//end else
  }//end isReply method
  //===========================================//
  //This method uses information in the subject
  // of the current message to retrieve an
  // earlier message file from a local archive
  // folder. The earlier message is tagged {GD}
  // and written into the MBOX file.

  private void processReply(){
    String sender = "No sender identified";
    String emailAddr =
                  "No email address identified";
    String subject = "";

    //Beep twice to alert the user that a reply
    // is being processed.
    Toolkit.getDefaultToolkit().beep();
    try{
      Thread.currentThread().sleep(200);
    }catch(Exception ex){System.out.println(ex);}
    Toolkit.getDefaultToolkit().beep();

    //Get the subject, decode if necessary, and
    // trim off the newline character
    subject = BigDog02b.readLines(pathFileName,
                         "Subject:","Subject:");
    subject = BigDog02b.decodeSubj(subject).
                                         trim();

    //Now parse the subject to get the name of
    // the original file.
    File theFile = null;
    try{
      //Note, this assumes that the requested
      // file is now located in the folder
      // pointed to by archivePath
      theFile = new File(archivePath +
                        subject.substring(
                        subject.indexOf("+OK")));
    }catch(Exception ex){
      System.out.println("\n" + ex);
      System.out.println("Getting theFile");
      System.out.println("pathFileName:"
                                + pathFileName);
    }//end catch
    textArea.append("\nProcessing reply message "
         + msgNumberStr + "\n" + subject
             + "\nFile: " + theFile + "\n");
    if(theFile.exists()){
      //Read the file from the local archive
      // folder. Extract the email address. Add
      // the email address to the goodPhraseList.
      // Tag the message {GD} and write it into
      // the MBOX file. Note that the last
      // parameter identifies the path and file
      // name of the file being retrieved.
      // You can tag the subject with any string
      // that you want to pass as the second
      // parameter.
      mBoxStrBuf = addToMboxStr(mBoxStrBuf,
                      "{GD}{"+msgNumberStr+"}",
                            theFile.toString());

      //Add the message to the list of messages
      // scheduled for deletion from the public
      // email server.
      msgToDelete.add(pathFileName);

      //Now get the sender email address and add
      // it to the goodPhraseList
      //Get the sender, convert to upper case,
      // and trim off the new line character.
      sender = BigDog02b.readLines(
                            theFile.toString(),
                               "From:","From:");
      sender = sender.toUpperCase().trim();
      //Deal with the format of the email
      // address. Some have the email address
      // in angle brackets with something like a
      // name ahead of the angle brackets.
      // Others simply have an email address.
      try{
        if((sender.indexOf("<") != -1)
              && (sender.indexOf(">") != -1)){
          emailAddr = sender.substring(
            sender.indexOf("<") + 1,
            sender.indexOf(">")).toUpperCase();
        }else if(sender.indexOf(" ") != -1){
          //Get rid of text ahead of the email
          // address
          emailAddr = sender.substring(sender.
            lastIndexOf(" ") + 1).toUpperCase();
        }else{
          emailAddr = sender.toUpperCase();
        }//end else
      }catch(Exception ex){
        System.out.println("\n" + ex);
        System.out.println(
          "Getting sender for goodPhraseList");
        System.out.println("sender:" + sender);
        System.out.println("pathFileName:"
                                + pathFileName);
      }//end catch

      //Add the email address to the good list.
      goodPhraseList.add(emailAddr);
    }else{
      textArea.append("\nUnable to locate file "
                   + "referred to in reply.\n");
      //Beep to alert the user of this problem.
      Toolkit.getDefaultToolkit().beep();
      try{
        Thread.currentThread().sleep(200);
      }catch(Exception ex){
        System.out.println(ex);}
      Toolkit.getDefaultToolkit().beep();
    }//end else
}//end processReply //===========================================//
  //This method tests the sender of the message
  // and the subject of the message against the
  // list of items in the goodPhraseFile. Returns
  // true on match, false otherwise.
  private boolean isGood(){
    boolean match = false;
    //Get the subject, decode if necessary, and
    // convert to upper case
    subject = BigDog02b.readLines(pathFileName,
                         "Subject:","Subject:");
    subject = BigDog02b.decodeSubj(subject);
    subject = subject.toUpperCase();

    //Get the sender and convert to upper case
    sender = BigDog02b.readLines(pathFileName,
                               "From:","From:");
    sender = sender.toUpperCase();

    //The Subject and From lines have been
    // captured. Screen each of them against
    // an upper case version of words and
    // phrases in a TreeSet object containing
    // good email addresses and subjects.
    match = screenForGoodSubjAndFromLines();
    return match;
  }//end isGood method
  //===========================================//
  //This method screens the Subject and From
  // lines to determine if they contain good
  // subjects or email addresses. If so, the
  // method returns true. Otherwise, it returns
  // false. An exact match on an upper-case basis
  // is required
  private boolean screenForGoodSubjAndFromLines(){
    Iterator iterator = goodPhraseList.iterator();
    while(iterator.hasNext()){
      String goodWord =
                    ((String)(iterator.next())).
                                   toUpperCase();
      if(!(goodWord.equals(""))){
        if((subject.indexOf(goodWord) != -1) ||
              (sender.indexOf(goodWord) != -1)){
          //An exact match was found.
          System.out.println("\ngoodWord:"
                                    + goodWord);
          return true;
        }//end if((subject.indexOf...
      }//end if!(goodWord.equals("")
    }//end while iterator has next
    return false;
  }//end screenForGoodSubjAndFromLines
  //===========================================//
  //This method processes a message that has been
  // determined to be a good message. It writes
  // the message into the MBOX file, and adds
  // the identification of the message to the
  // list of messages scheduled for deletion from
  // the server later.
  void processGood(){
    //Add the message to the MBOX file. You can
    // tag the subject with any string that you
    // want to pass as the second parameter.
    mBoxStrBuf = addToMboxStr(mBoxStrBuf,
                    "{GD}{"+msgNumberStr+"}",
                                  pathFileName);

    //Add the message to the list of messages
    // scheduled for deletion from the public
    // email server.
    msgToDelete.add(pathFileName);
}//end processGood //===========================================//
  //This method is used to process messages that
  // have been determined to be in the quarantine
  // category. These are messages which probably
  // are spam or viruses, sent by machines.
  // However, some small percentage may have been
  // sent by a human who wishes to communicate in
  // a meaningful way, but whose email address
  // has not yet been entered into the good list.
  // As a result, each of these messages
  // triggers an email message to be sent
  // automatically asking the sender to
  // demonstrate that they are a human by
  // replying to the message. The original
  // message is tagged {QU} and written into the
  // MBOX file. It is also stored in a local
  // archive folder. The receipt of a reply
  // later will cause the original message to be
  // retrieved from the local archive folder,
  // tagged {GD}, and written into the MBOX file.
  void processQuarantine(){

    String subject = "";
    String sender = "";
    String date = "";
    String header = "";

    //Read the message from a local file, tag it
    // {QU} and write it into the MBOX file.
    // You can tag the subject with any
    // string that you want to pass as
    // the second parameter. I elected to add
    // the original message number and the spam
    // score to the tag. This information is
    // useful when using ordinary email program
    // filters to direct the messages to
    // specific email folders.
    mBoxStrBuf = addToMboxStr(mBoxStrBuf,
       "{QU}{"+msgNumberStr+"}{"+hitCount+"}",
                                  pathFileName);

    //Add the message to the list of messages
    // scheduled for deletion from the public
    // email server.
    msgToDelete.add(pathFileName);

    //Now prepare for composing and sending the
    // email message to the sender of the
    // current message.
    //Get the Subject line, decode if necessary,
    // and convert it to upper case
    subject = BigDog02b.readLines(pathFileName,
                         "Subject:","Subject:");
    subject = BigDog02b.decodeSubj(subject);
    subject = subject.toUpperCase();

    //Get the sender in upper case
    sender = BigDog02b.readLines(pathFileName,
                               "From:","From:");
    sender = sender.toUpperCase();

    //Now get the date.
    date = BigDog02b.readLines(pathFileName,
                               "Date:","Date:");
    date = date.toUpperCase();

    //Now get the header of the original message.
    header = BigDog02b.readLines(pathFileName,
                                null,"Status:");

    //Use this information to send an email
    // message to the sender. Need to avoid a
    // substring index error later if the sender
    // or the subject are blank.
    if(!(sender.equals("")
                       || subject.equals(""))){
      sendEmailMsg(sender,subject,date,
                           header,pathFileName);
    }else{
      textArea.append(
                  "\nUnable to send message\n");
    }//end else
  }//end processQuarantine
  //===========================================//
  //This method is used to automatically send an
  // email message to the sender of every
  // quarantine message, asking them to indicate
  // that they are a human rather than a machine
  // by replying to the message.
  //The incoming sender parameter is used to
  // establish the address of the recipient.
  //The incoming parameter subject is reported
  // to the recipient along with the date to
  // identify the message to the recipient.
  //The incoming pathFileName is used to
  // place a unique identifier in the subject of
  // the message that is sent. This identifies
  // the original message that triggered this
  // event.
  private void sendEmailMsg(String sender,
                            String subject,
                            String date,
                            String header,
                            String pathFileName){
    //Enable the following two statements and
    // enclose the remaining body of the method
    // in a large block comment when testing the
    // program to avoid sending nuisance
    // messages.
    textArea.append("sendEmail disabled\n");
    return;
/* //Start a block comment here to disable
    //Don't send messages to any email address
    // on the doNotSendList.
    boolean okToSend = true;
    for(int cnt = 0;
            cnt < doNotSendList.length; cnt++){
      if(sender.toUpperCase().indexOf(
                  doNotSendList[cnt].
                        toUpperCase()) != -1){
        okToSend = false;
        textArea.append("\nDon't send to: "
              + sender.toUpperCase() + "\n");
        break;
      }//end if
    }//end for loop
    if(okToSend){
      //Get the email address from the incoming
      // parameter sender. Sometimes the actual
      // address is enclosed in angle brackets.
      String emailAddr = "";
      if((sender.indexOf("<") != -1)
              && (sender.indexOf(">") != -1)){
        try{
          emailAddr = sender.substring(
                      sender.indexOf("<") + 1,
                          sender.indexOf(">"));
        }catch(Exception ex){
          System.out.println("\n" + ex);
          System.out.println("Sending email");
          System.out.println(
                    "Getting emailAddr in <>");
          System.out.println("sender:" + sender);
          System.out.println("pathFileName:"
                                + pathFileName);
          System.out.println("Forcing a valid "
                  + "email address structure");
          emailAddr = "dummy@dummy.com";
        }//end catch
      }else{
        //Sometimes the email address simply
        // follows the word From: in the header
        // of the message from which the sender
        // parameter is derived.
        try{
          emailAddr = sender.substring(
                sender.toUpperCase().indexOf(
                                   "FROM:")+5);
        }catch(Exception ex){
          System.out.println("\n" + ex);
          System.out.println("Sending email");
          System.out.println(
                          "Getting emailAddr");
          System.out.println("sender:" + sender);
          System.out.println("pathFileName:"
                                + pathFileName);
          System.out.println("Forcing a valid "
                  + "email address structure");
          emailAddr = "dummy@dummy.com";
        }//end catch
        emailAddr = emailAddr.trim();
      }//end else

      //Make sure that emailAddr contains an @
      // indicating that it is probably a
      // properly formatted email address.
      if(emailAddr.indexOf("@") == -1){
        Toolkit.getDefaultToolkit().beep();
        try{
          Thread.currentThread().sleep(200);
        }catch(Exception ex){
          System.out.println(ex);}
        Toolkit.getDefaultToolkit().beep();
        System.out.println("\nCan't send to:"
                                   + emailAddr);
        return;
      }//end if
      //Extract the file name from the
      // pathFileName parameter and the actual
      // subject from the incoming subject
      // parameter.
      String fileName = "No file name available";
      try{
        fileName = pathFileName.substring(
            pathFileName.lastIndexOf("/") + 1);
        String theSubject = subject.substring(9);
      }catch(Exception ex){
        System.out.println("\n" + ex);
        System.out.println("Sending email");
        System.out.println(
             "Getting fileName and theSubject");
        System.out.println("subject:" + subject);
        System.out.println("fileName:"
                                    + fileName);
        System.out.println("pathFileName:"
                                + pathFileName);
      }//end catch

      //Display information about the message. I
      // may decide to write this into a history
      // file later so that I will have a record
      // of messages sent.
      textArea.append("\nSending email to:\n"
                   + emailAddr + "\n"
                   + fileName + "\n"
                   + date.trim() + "\n");
      try{
        //Pass a string containing the name of
        // the smtp server as a parameter to the
        // SmtpClient constructor.
        SmtpClient smtp =
                     new SmtpClient(smtpServer);

        //Pass the sender's email address to the
        // from() method.
        smtp.from(fromAddr);

        //Pass the email address of the recipient
        // to the method named to().
        smtp.to(emailAddr);

        //Get an output stream for the message
        PrintStream msg = smtp.startMessage();

        //Write the message header in the output
        // stream.
        msg.println("To: " + emailAddr);
        msg.println(
              "Subject: " + subjOut + fileName);
        msg.println();//blank line
        //Write the text of the message in the
        // output stream.
        msg.println(
"I recently received a message from your\n"+
"Email address with the following subject\n"+
"and date:\n\n"+

subject + "\n" + date + "\n\n" +

"Because your Email address has not been \n"+
"entered into the Approved Sender list of my \n"+
"SPAM blocking software, the message has been\n"+
"placed in the Quarantine folder. To move \n"+
"the message from the Quarantine folder into \n"+
"my Inbox, you will need to press your Reply \n"+
"button and send this message back to me \n"+
"making no changes to the Subject line or the\n"+
"body of the message. This will also cause \n"+
"your Email address to be added to my \n"+
"Approved Sender list so that future messages\n"+
"from you won't be similarly delayed.\n\n"+

"I apologize for this inconvenience. \n"+
"However, due to the large amount of SPAM \n"+
"that I must contend with, I have been \n"+
"forced to implement a mail handling system \n"+
"that asks you for a one-time confirmation \n"+
"that you intend to communicate with me via \n"+
"Email.\n\n"+

"If you didn't send the original message, I \n"+
"apologize for the intrusion. However, it is\n"+
"possible that someone is using your Email \n"+
"address for misleading, possibly fraudulent,\n"+
"and possibly malicious purposes. I strongly\n"+
"encourage you to file a complaint regarding \n"+
"the inappropriate use of your Email address.\n"+

"I have provided all of the information below\n"+
"that you will need to file such a \n"+
"complaint.\n\n"+

"The information provided below my signature\n"+
"block is the full header of the original \n"+
"Email message. You will find a short \n"+
"tutorial at \n"+
"http://www.dickbaldwin.com/java/Java2158.htm\n"+
"that explains how to use this header to file\n"+
"a complaint.\n\n"+

"If we all band together in opposing SPAM and \n"+
"Email viruses, perhaps we can have a \n"+
"positive impact on this increasingly serious\n"+
"problem.\n\n"+

"Regards,\n"+
signature +

"=======HEADER BEGINS HERE========\n\n"+
header +"\n"

        );//end of message
        //Close the stream and send the message
        smtp.closeServer();

      }catch( Exception e ){
        System.out.println("\n" + e);
        System.out.println("Sending email");
        System.out.println(pathFileName);
      }//end catch
    }//end if(okToSend)

*/ //end a block comment here to disable
  }//end sendEmailMsg
  //===========================================//
  //Purpose: To create a TreeSet object
  // containing words used to screen the message
  // From and Subject lines.
  //This method reads strings from a text file
  // and creates the list as a TreeSet object
  // with no duplicates.
  //Only the primary portion of the good
  // Email address should be included in the
  // file used to create the list. This would
  // be x@y.z

  //After creating the list, it writes the data
  // from the list into a backup file named
  // ....bakN, where N is the value of the
  // next available file name in the directory.
  //A new backup file with a unique name is
  // created each time the program is run. Once
  // the number of backup files reaches 5, the
  // program automatically deletes the oldest
  // file before creating a new backup
  // file. Thus the program automatically
  // maintains a sequence of five backup files
  // with extensions .bak0 through .bak5 with one
  // number missing. The age-order of the files
  // should be determined by the modification
  // date and not by the name of the file.
  //The data read from the file is converted to
  // upper case before being added to the TreeSet
  // object.

  void makeGoodPhraseList(){
    goodPhraseList = new TreeSet();

    //Read words or phrases from text file and
    // populate the TreeSet object.
    try{
      BufferedReader inData =
           new BufferedReader(new FileReader(
                               goodPhraseFile));
      String data; //temp holding area

      while((data = inData.readLine()) != null){
        goodPhraseList.add(data.toUpperCase());
      }//end while loop

      inData.close();//Close input file
      //Write a backup file before making any
      // modifications to the data.

      //First determine the name of the next
      // backup file allowed in the directory.
      int N = 0;
      File theFile = null;
      String baseFileName = goodPhraseFile.
           substring(0,goodPhraseFile.indexOf(
                                      ".txt"));
      for(N = 0;N < 6;N++){
        theFile = new File(
                    baseFileName + ".bak" + N);
        if(!(theFile.exists()))break;
      }//end for loop

      //Cause N to rotate from 0 through 5
      if(N == 5){//del file 0 for use next time
        new File(baseFileName + ".bak0").delete();
      }//end if
      else{//delete the next file in sequence
        if(new File(baseFileName
                   + ".bak" + (N + 1)).exists()){
          new File(baseFileName
                    + ".bak" + (N + 1)).delete();
        }//end if
      }//end else

      //Now write the output file
      DataOutputStream dataOut =
              new DataOutputStream(
                  new FileOutputStream(
                                     theFile));

      //Use an Iterator object to access the data
      // in the TreeSet object.
      Iterator iter = goodPhraseList.iterator();

      while(iter.hasNext()){
        data = (String)iter.next();
        dataOut.writeBytes(data + "\n");
      }//end while

      dataOut.close();
    }catch(Exception e){e.printStackTrace();}
  }//end makeGoodPhraseList
  //===========================================//
  //Purpose: To create a TreeSet object
  // containing words used to screen the message
  // From and Subject lines.
  //This method reads strings from a text file
  // and creates the list as a TreeSet object
  // with no duplicates.
  //Only the primary portion of the bad
  // Email address should be included in the
  // file used to create the list. This would
  // be x@y.z

  //After creating the list, it writes the data
  // from the list back out into the file. This
  // is done to keep the contents of the file
  // sorted in upper case. Since the program
  // doesn't modify the contents of the list,
  // there is no point in creating backup files.

  void makeBadPhraseList(){
    badPhraseList = new TreeSet();

    //Read words or phrases from text file and
    // populate the TreeSet object.
    try{
      BufferedReader inData =
           new BufferedReader(new FileReader(
                                badPhraseFile));
      String data; //temp holding area

      while((data = inData.readLine()) != null){
        badPhraseList.add(data.toUpperCase());
      }//end while loop

      inData.close();//Close input file

      //Now write the output file
      DataOutputStream dataOut =
              new DataOutputStream(
                  new FileOutputStream(
                                badPhraseFile));

      //Use an Iterator object to access the data
      // in the TreeSet object.
      Iterator iter = badPhraseList.iterator();

      while(iter.hasNext()){
        data = (String)iter.next();
        dataOut.writeBytes(data + "\n");
      }//end while

      dataOut.close();
    }catch(Exception e){e.printStackTrace();}
  }//end makeBadPhraseList
  //===========================================//
  private StringBuffer addToMboxStr(
                        StringBuffer mBoxStrBuf,
                        String tag,
                        String pathFileName){

    StringBuffer message = new StringBuffer(
                            "No message found");
    message = new StringBuffer(
                    BigDog02b.readLines(
                        pathFileName,null,null));

    //Prepare the message for appending to the
    // end of the MBOX string. This requires
    // the creation of four new header lines
    // and the prepending of those four lines
    // onto the message. Examples of those
    // four new header lines follow:

    //From - Wed Jan 21 15:59:09 2004
    //X-UIDL: 400ed7770000000b
    //X-Mozilla-Status: 0000
    //X-Mozilla-Status2: 00000000

    // The first line contains the date and
    // time. The second line contains the UIDL
    // from the email server. The meaning of
    // the third and fourth lines can be found
    // at various web sites including
    // http://www.eyrich-net.org/mozilla/
    // X-Mozilla-Status.html?en
    // For a new message, the status values
    // given above are satisfactory.

    //Create Mozilla header lines and insert
    // them at the beginning of the message.
    // First get a 24-char date string matching
    // the format required by Mozilla.
    String theDate = new Date().toString();
    theDate = "From - "
                     + theDate.substring(0,19)
                     + theDate.substring(23);
    //Create the UIDL string.
    String xUidl = "X-UIDL:"
               + pathFileName.substring(
                  pathFileName.lastIndexOf(" "));
    //Create the two status strings.
    String xMozillaStatus =
                       "X-Mozilla-Status: 0000";
    String xMozillaStatus2 =
                  "X-Mozilla-Status2: 00000000";

    message.insert(0,theDate + "\n"
                     + xUidl + "\n"
                     + xMozillaStatus + "\n"
                     + xMozillaStatus2 + "\n");
    //Append a new line at the end of the
    // message.
    message.append("\n");
    //Insert tag in subject line
    message = message.insert(message.indexOf(
                            "Subject: ")+9,tag);

    //Append this message at the end of the
    // string that will be used to create the
    // MBOX file.
    mBoxStrBuf.append(message);

    return mBoxStrBuf;
}//end addToMboxStr //===========================================//
  //This method passes the message through a spam
  // screener to determine if it should be
  // considered spam. The screener program
  // produces and returns a score based on the
  // number of hits against offensive words and
  // phrases. The number of hits is compared to
  // a hitLimit value that is established in the
  // general instance variables at the beginning
  // of the program. When the number of hits
  // reaches that value, the screener terminates
  // in order to avoid wasting time. If that
  // limit has been reached, this method returns
  // true indicating that the message is thought
  // to be spam. Otherwise, it returns false. If
  // it returns true, the control program invokes
  // the method named processSpam to deal with
  // the message.
  private boolean isSpam(){
    BigDog02SpamScreen01 screener =
          new BigDog02SpamScreen01(dataPath,
                          subjAndHtmlPhraseFile,
                          rawTextPhraseFile,
                          hitLimit);

    hitCount = screener.screenMsg(pathFileName);
    if(hitCount >= hitLimit){
      return true;
    }else{
      return false;
    }//end else
  }//end isSpam method
  //===========================================//
  //This method deals with a message that has
  // been identified as spam.
  void processSpam(){

    //Add the message to the MBOX file.
    //You can tag the subject with any string
    // you want to pass as the second parameter.
    // I elected to tag it with {SP} indicating
    // that it is spam. I also added the message
    // number and the spam score which may be
    // useful for using email program filters to
    // cause the messages to be directed to
    // specific email folders.
    mBoxStrBuf = addToMboxStr(mBoxStrBuf,
       "{SP}{"+msgNumberStr+"}{"+hitCount+"}",
                                  pathFileName);

    //Add this message to the list of messages
    // scheduled to be deleted from the public
    // email server
    msgToDelete.add(pathFileName);
}//end processSpam //===========================================//
}//end class BigDog02j //=============================================//
Listing 4
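The address handling in the processReply method above can be condensed into one small helper. The following is only a sketch under stated assumptions: the class name SenderAddress and the method name extract are hypothetical and do not appear in the BigDog programs, and the rule used here (the text between angle brackets if both are present, otherwise the last space-separated token, converted to upper case) is a simplification of what the listing does.

```java
import java.util.Locale;

// Hypothetical helper, not part of the BigDog programs.
final class SenderAddress {

  // Returns the address portion of a From: line in upper case.
  // Assumption: an address in angle brackets wins; otherwise the
  // last space-separated token is taken to be the address.
  static String extract(String fromLine) {
    String s = fromLine.trim();
    int open = s.indexOf('<');
    int close = s.indexOf('>', open + 1);
    if (open != -1 && close != -1) {
      return s.substring(open + 1, close).trim()
              .toUpperCase(Locale.ROOT);
    }
    // lastIndexOf returns -1 when there is no space, so adding 1
    // gives 0 and the whole string is kept.
    return s.substring(s.lastIndexOf(' ') + 1)
            .toUpperCase(Locale.ROOT);
  }

  public static void main(String[] args) {
    System.out.println(
        extract("From: \"Jane Doe\" <jane@example.com>"));
    System.out.println(extract("From: jane@example.com"));
  }
}
```

Both calls in main are expected to print JANE@EXAMPLE.COM. Passing Locale.ROOT to toUpperCase is a design choice made for this sketch so that the result does not depend on the default locale of the machine; the listing itself uses the no-argument form.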
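The addToMboxStr method above builds the first MBOX header line by slicing the string returned by Date.toString(), which works when the time zone abbreviation in that string is exactly three characters long. As a hedged alternative, the sketch below produces the same 24-character date with an explicit format pattern. The class name MboxFromLine and the method name build are hypothetical; they are not part of the BigDog programs.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical helper, not part of the BigDog programs.
final class MboxFromLine {

  // Builds a line such as "From - Wed Jan 21 15:59:09 2004".
  // The pattern leaves out the time zone field, so no substring
  // arithmetic on Date.toString() is needed.
  static String build(Date when, TimeZone zone) {
    SimpleDateFormat fmt = new SimpleDateFormat(
        "EEE MMM dd HH:mm:ss yyyy", Locale.US);
    fmt.setTimeZone(zone);
    return "From - " + fmt.format(when);
  }

  public static void main(String[] args) {
    // Use the current time and the local time zone, as the
    // listing does.
    System.out.println(
        build(new Date(), TimeZone.getDefault()));
  }
}
```

Passing the time zone as a parameter is a choice made here only to keep the method easy to check with a fixed date; the listing always uses the local default.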
File BigDog02k
/*File BigDog02k.java
Copyright 2004, R.G.Baldwin
Rev 03/06/04

This is a special modified version of the program
named BigDog02j. The purpose of this version is
to examine message files that have been manually
copied from the archive folder into a folder
named temp, and to delete all files other than
those that are quarantined with a hit count of
zero. The quarantined files can then be used to
train the spam screening algorithms to do a
better job in the future.

The main purpose of this and the program used to
train the algorithm is to identify messages that
clearly seem to contain spam, but which currently
are being categorized as quarantined with zero
spam hits. Messages in quarantine with no spam
hits deserve special manual scrutiny to make
certain that they don't represent messages from
a computer that need to be read (such as machine
generated airline reservations).

This program processes a set of message files
written by the program named BigDog02g that have
been manually copied into a folder named temp.

This program should be run after a virus checker
has been used to confirm that all files copied
into the temp directory are free of viruses.

Tested using SDK 1.4.2 under WinXP
************************************************/
import java.net.*;
import java.io.*;
import java.util.*;
import java.awt.*;
import java.awt.event.*;

class BigDog02k extends Frame{

  String dataPath = "./temp/";

  //Following two files contain lists of phrases
  // used in processing the messages before they
  // are subjected to the spam screen.
  String goodPhraseFile = "BigDog02GoodList.txt";
  String badPhraseFile = "BigDog02BadList.txt";

  //Following two files contain lists of phrases
  // used in performing the spam screen.
  String subjAndHtmlPhraseFile =
                       "BigDog02SubjAndHtml.txt";
  String rawTextPhraseFile =
                           "BigDog02RawText.txt";

  //Following are working variables used by the
  // program for various purposes.
  TreeSet goodPhraseList;
  TreeSet badPhraseList;
  String pathFileName;
  Button startButton = new Button("Start/Next");

  TextArea textArea = new TextArea(20,50);
  String subject = "No Subject line found";
  String sender = "No From line found";

  int hitCount = 0;
  int hitLimit = 6;
  //Will delete all msg files with a hit count
  // greater than or equal to the following.
  int deleteLimit = 1;
public static void main(String[] args){ //Construct an object of this class new BigDog02k(); }//end main //===========================================//
//Constructor BigDog02k(){ makeGoodPhraseList(); makeBadPhraseList();
//Register a window listener to service // the close button on the Frame. this.addWindowListener( new WindowAdapter(){ public void windowClosing(WindowEvent e){ System.exit(0); }//end windowClosing }//end WindowAdapter() );//end addWindowListener
//Register an ActionListener on the // startButton. startButton.addActionListener( new ActionListener(){ public void actionPerformed( ActionEvent e){ startButton.setEnabled(false); //Get a directory listing File dataDir = new File(dataPath); //The following code creates a // directory listing containing only // those files that begin with +OK. //This is an anonymous implementation // of a class that implements // FilenameFilter. String[] dirList = dataDir.list( new FilenameFilter(){ public boolean accept( File dir,String name){ if(!(new File(dir,name). isFile())) return false; return name.startsWith("+OK"); }//end accept }//end FilenameFilter );//end list
//Now process the files in the // directory int msgCounter = 0; for(msgCounter = 0; msgCounter < dirList.length; msgCounter++){ String fileName = dirList[msgCounter]; pathFileName = dataPath + fileName;
//Process the message startProcess(); }//end for loop on directory length System.out.println("Finished"); }//end actionPerformed }//end ActionListener );//end addActionListener
//Configure the GUI by placing the // various components on it, setting the // size, and making it visible. add(startButton); add(textArea); textArea.setText(""); setLayout(new FlowLayout());
setTitle("Copyright 2004, R.G.Baldwin"); setSize(400,400); //Make the GUI visible. setVisible(true); }//end constructor //===========================================//
//The purpose of this method is to kick off the // processing of a new message. void startProcess(){ //Determine the type of message and take the // appropriate action.
    if(isBad()){
      //This message was determined to be from
      // a confirmed spammer, virus writer, other
      // machine, or some other undesirable
      // source. It is of no use for training.
      // Report it as {BD} and delete the file.
      System.out.print("{BD}: ");
      deleteFile(pathFileName);
    }else if(isGood()){
      //This message was determined either to be
      // from an approved sender, or to have an
      // approved subject. Report it as {GD} and
      // delete the file.
      System.out.print("{GD}: ");
      deleteFile(pathFileName);
    }else if(isSpam()){
      //This message has been processed by the
      // spam filter and has been determined to
      // be spam. Report it as {SP} and delete
      // the file.
      System.out.print("{SP} ");
      deleteFile(pathFileName);
    }else{
      //This message is from an unknown address.
      // It is probably spam, but may be from
      // someone worth communicating with.
      // The hit count was set by isSpam above.
      // Delete the file if the hit count is
      // greater than zero. Can modify
      // deleteLimit if it is decided to keep
      // files with greater hit count values.
      if(hitCount >= deleteLimit){
        System.out.print("{QU}{"+hitCount+"} ");
        deleteFile(pathFileName);
      }//end if
    }//end else
}//end startProcess //===========================================//
//This method tests the sender of the message // and the subject of the message against the // list of items in the badPhraseFile. // Returns true on match, false otherwise. private boolean isBad(){ boolean match = false;
//Get the Subject line decode if necessary, // convert it to upper case subject = BigDog02b.readLines( pathFileName,"Subject:","Subject:"); subject = BigDog02b.decodeSubj(subject); subject = subject.toUpperCase();
//Get the sender and convert it to upper // case sender = BigDog02b.readLines(pathFileName, "From:","From:"); sender = sender.toUpperCase();
    //The Subject and From lines have been
    // captured. Screen each of them against
    // an upper case version of words and
    // phrases in a TreeSet object containing
    // quarantine email addresses and subjects.
    match = screenForBadSubjAndFromLines();
    return match;
  }//end isBad method
  //===========================================//
//This method screens the Subject and From // lines to determine if they contain bad // subjects or email addresses. If so, the // method returns true. Otherwise, it returns // false. An exact match on an upper-case basis // is required private boolean screenForBadSubjAndFromLines(){ Iterator iterator = badPhraseList.iterator(); while(iterator.hasNext()){ String badWord = ((String)(iterator.next())). toUpperCase(); if(!(badWord.equals(""))){ if((subject.indexOf(badWord) != -1) || (sender.indexOf(badWord) != -1)){ //An exact match was found. return true; }//end if((subject.indexOf... }//end if!(badWord.equals("") }//end while iterator has next return false; }//end screenForBadSubjAndFromLines //===========================================//
//This method tests the sender of the message // and the subject of the message against the // list of items in the goodPhraseFile. Returns // true on match, false otherwise. private boolean isGood(){ boolean match = false; //Get the subject, decode if necessary, and // convert to upper case subject = BigDog02b.readLines(pathFileName, "Subject:","Subject:"); subject = BigDog02b.decodeSubj(subject); subject = subject.toUpperCase();
//Get the sender and convert to upper case sender = BigDog02b.readLines(pathFileName, "From:","From:"); sender = sender.toUpperCase();
    //The Subject and From lines have been
    // captured. Screen each of them against
    // an upper case version of words and
    // phrases in a TreeSet object containing
    // good email addresses and subjects.
    match = screenForGoodSubjAndFromLines();
    return match;
  }//end isGood method
  //===========================================//
  //This method screens the Subject and From
  // lines to determine if they contain good
  // subjects or email addresses. If so, the
  // method returns true. Otherwise, it returns
  // false. An exact match on an upper-case basis
  // is required
  private boolean screenForGoodSubjAndFromLines(){
    Iterator iterator = goodPhraseList.iterator();
    while(iterator.hasNext()){
      String goodWord =
          ((String)(iterator.next())).toUpperCase();
      if(!(goodWord.equals(""))){
        if((subject.indexOf(goodWord) != -1) ||
           (sender.indexOf(goodWord) != -1)){
          //An exact match was found.
          System.out.println("\ngoodWord:"
                                     + goodWord);
          return true;
        }//end if((subject.indexOf...
      }//end if!(goodWord.equals("")
    }//end while iterator has next
    return false;
  }//end screenForGoodSubjAndFromLines
  //===========================================//
//Purpose: To create a TreeSet object // containing words used to screen the message // From and Subject lines. //This method reads strings from a text file // and creates the list as a TreeSet object // with no duplicates. //Only the primary portion of the good // Email address should be included in the // file used to create the list. This would // be x@y.z
  //After creating the list, it writes the data
  // from the list into a backup file named
  // ....bakN, where N is the value of the
  // next available file name in the directory.
  //A new backup file with a unique name is
  // created each time the program is run. Once
  // the number of backup files reaches 5, the
  // program automatically deletes the oldest
  // file before creating a new backup
  // file. Thus the program automatically
  // maintains a sequence of five backup files
  // with extensions .bak0 through .bak5 with one
  // number missing. The age-order of the files
  // should be determined by the modification
  // date and not by the name of the file.
  //The data read from the file is converted to
  // upper case before being added to the TreeSet
  // object.
void makeGoodPhraseList(){ goodPhraseList = new TreeSet();
//Read words or phrases from text file and // populate the TreeSet object. try{ BufferedReader inData = new BufferedReader(new FileReader( goodPhraseFile)); String data; //temp holding area
while((data = inData.readLine()) != null){ goodPhraseList.add(data.toUpperCase()); }//end while loop
inData.close();//Close input file
//Write a backup file before making any // modifications to the data.
//First determine the name of the next // backup file allowed in the directory. int N = 0; File theFile = null; String baseFileName = goodPhraseFile. substring(0,goodPhraseFile.indexOf( ".txt")); for(N = 0;N < 6;N++){ theFile = new File(baseFileName + ".bak" + N); if(!(theFile.exists()))break; }//end for loop
//Cause N to rotate from 0 through 5 if(N == 5){//del file 0 for use next time new File(baseFileName + ".bak0").delete(); }//end if else{//delete the next file in sequence if(new File( baseFileName + ".bak" + (N + 1)).exists()){ new File( baseFileName + ".bak" + (N + 1)).delete(); }//end if }//end else
//Now write the output file DataOutputStream dataOut = new DataOutputStream( new FileOutputStream( theFile));
//Use an Iterator object to access the data // in the TreeSet object. Iterator iter = goodPhraseList.iterator();
      while(iter.hasNext()){
        data = (String)iter.next();
        dataOut.writeBytes(data + "\n");
      }//end while
dataOut.close(); }catch(Exception e){e.printStackTrace();} }//end makeGoodPhraseList //===========================================//
//Purpose: To create a TreeSet object // containing words used to screen the message // From and Subject lines. //This method reads strings from a text file // and creates the list as a TreeSet object // with no duplicates. //Only the primary portion of the bad // Email address should be included in the // file used to create the list. This would // be x@y.z
//After creating the list, it writes the data // from the list back out into the file. This // is done to keep the contents of the file // sorted in upper case. Since the program // doesn't modify the contents of the list, // there is no point in creating backup files.
void makeBadPhraseList(){ badPhraseList = new TreeSet();
//Read words or phrases from text file and // populate the TreeSet object. try{ BufferedReader inData = new BufferedReader(new FileReader( badPhraseFile)); String data; //temp holding area
while((data = inData.readLine()) != null){ badPhraseList.add(data.toUpperCase()); }//end while loop
inData.close();//Close input file
//Now write the output file DataOutputStream dataOut = new DataOutputStream( new FileOutputStream( badPhraseFile));
//Use an Iterator object to access the data // in the TreeSet object. Iterator iter = badPhraseList.iterator();
      while(iter.hasNext()){
        data = (String)iter.next();
        dataOut.writeBytes(data + "\n");
      }//end while
dataOut.close(); }catch(Exception e){e.printStackTrace();} }//end makeBadPhraseList //===========================================//
//This method passes the message through a spam // screener to determine if it should be // considered spam. The screener program // produces and returns a score based on the // number of hits against offensive words and // phrases. The number of hits is compared to // a hitLimit value that is established in the // general instance variables at the beginning // of the program. When the number of hits // reaches that value, the screener terminates // in order to avoid wasting time. If that // limit has been reached, this method returns // true indicating that the message is thought // to be spam. Otherwise, it returns false. If // it returns true, the control program invokes // the method named processSpam to deal with // the message. private boolean isSpam(){ BigDog02SpamScreen01 screener = new BigDog02SpamScreen01(dataPath, subjAndHtmlPhraseFile, rawTextPhraseFile, hitLimit);
hitCount = screener.screenMsg(pathFileName); if(hitCount >= hitLimit){ return true; }else{ return false; }//end else }//end isSpam method //===========================================//
void deleteFile(String pathFileName){ File tempFile = new File(pathFileName); if(tempFile.exists()){ boolean deleted = tempFile.delete(); if(deleted){ System.out.println( "Deleted: " + pathFileName); }//end if }//end if
}//end deleteFile
}//end class BigDog02k //=============================================//
Listing 5
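The backup scheme described in the comments for makeGoodPhraseList amounts to a ring of six file names with one slot always left empty. As a minimal sketch of that idea, the class below claims the first missing slot and then frees the slot that follows it. The class name BackupRotationSketch, the methods claimNextSlot, slot, and simulate, the base name PhraseList, and the use of a temporary directory are all hypothetical. The modulo expression is my restatement of the special case for N equal to 5 in the listing, and the sketch writes empty files rather than real phrase lists.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;

// Hypothetical illustration, not part of the BigDog programs.
final class BackupRotationSketch {
    static final int SLOTS = 6; // names .bak0 through .bak5

    static File slot(File dir, String baseName, int n) {
        return new File(dir, baseName + ".bak" + n);
    }

    // Finds the first missing slot, frees the slot after it so that
    // one slot is always empty, and returns the claimed slot number.
    static int claimNextSlot(File dir, String baseName) {
        int n = 0;
        while (n < SLOTS - 1 && slot(dir, baseName, n).exists()) {
            n++;
        }
        // delete() simply returns false when the file is absent.
        slot(dir, baseName, (n + 1) % SLOTS).delete();
        return n;
    }

    // Simulates several program runs in a scratch directory and
    // returns the slot numbers in the order they were written.
    static String simulate(int runs) {
        try {
            File dir = Files.createTempDirectory("bakdemo").toFile();
            StringBuilder order = new StringBuilder();
            for (int run = 0; run < runs; run++) {
                int n = claimNextSlot(dir, "PhraseList");
                Files.write(slot(dir, "PhraseList", n).toPath(),
                            new byte[0]);
                if (run > 0) {
                    order.append(',');
                }
                order.append(n);
            }
            // Remove the scratch files and directory.
            File[] leftovers = dir.listFiles();
            if (leftovers != null) {
                for (File f : leftovers) {
                    f.delete();
                }
            }
            dir.delete();
            return order.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(simulate(8)); // prints 0,1,2,3,4,5,0,1
    }
}
```

Because the empty slot moves around the ring, the file names alone do not reveal which backup is newest. That is why the comments recommend relying on the modification date.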
File BigDog02m
/*File BigDog02m.java Copyright 2003, R.G.Baldwin Rev 03/07/04
The purpose of this program is to process text files produced by BigDog02g for the purpose of using the information contained in those files to update the word list stored in BigDog02SubjAndHtml.txt
This program should be run following BigDog02k. It is used to train the subject line screener to do a better job of detecting spam in the subject lines and in the HTML body of messages. It should not be used in an attempt to train the raw body text screener, except that when this program displays raw body text, that text can be manually copied to the clipboard and then pasted into the text file named BigDog02RawText.txt.
In operation, a large block of message files should be manually copied from the archive folder to the folder named temp. Then BigDog02k should be run to delete {GD}, {BD}, and {SP} files and also to delete {QU} files with a spam hit count greater than zero. This program should then be run in an attempt to find and save offensive words and phrases in the subject line and HTML body that would cause those files to experience a spam hit count greater than zero in the future. This serves to reduce the number of {QU} messages that must be examined following the running of either BigDog02i or BigDog02j.
Tested using SDK 1.4.2 under WinXP ************************************************/ import java.io.*; import java.util.*; import java.awt.*; import java.awt.event.*;
class BigDog02m extends Frame{
  BufferedReader inData;
  TextArea textArea = new TextArea(12,50);
  Button copyButton = new Button(
                           "Copy Selected Text");
  Button postButton = new Button("Post Text");
  Button deleteButton = new Button(
                            "Delete Local File");
  Button nextButton = new Button("Next");
  TextField fromField = new TextField(
                "From data will appear here",50);
  TextField subjField = new TextField(
             "Subject data will appear here",50);
  TextField outputWordField = new TextField(
             "User pastes output words here",50);
  TextField operMsgField = new TextField(
      "User instructions appear here. "
      + "Press Next to process first message.",50);
  TreeSet subjWordList;
  String[] dirList;
  int fileCounter = 0;
  String dataPath = "./temp/";
  File dataDir = new File(dataPath);
  String msgToUser =
      "\nPost phrases for this message.\n"
      + "Then press Next to process next message.";
public static void main(String[] args){ BigDog02m thisObj = new BigDog02m(); thisObj.makeSubjWordList(); }//end main //===========================================//
  BigDog02m(){//constructor
    //Register a window listener to service
    // the close button on the Frame. This is
    // an anonymous class definition.
    this.addWindowListener(
      new WindowAdapter(){
        public void windowClosing(WindowEvent e){
          //Write the updated word list stored in
          // a TreeSet object to an output file
          // on shutdown. It is also written
          // when you click the Next button and
          // there are no remaining files to be
          // processed.
          writeSubjWordList();
          System.exit(0);
        }//end windowClosing
      }//end WindowAdapter()
    );//end addWindowListener
setLayout(new FlowLayout());
//Register an ActionListener on the // nextButton. nextButton.addActionListener( new ActionListener(){ public void actionPerformed( ActionEvent e){
//Protect against ArrayIndexOutOfBounds if((fileCounter >= 0) && (fileCounter < dirList.length)){
            if(fileCounter ==
                          (dirList.length - 1)){
              //The user clicked the Next button
              // but there are no more files.
              //Write the modified word list
              // stored in the TreeSet object to
              // an output file. This also
              // happens when the user clicks the
              // close button on the Frame later,
              // but this write operation is
              // provided here just in case the
              // user terminates without pressing
              // the close button. The user can
              // post additional words to the
              // TreeSet object after this write
              // operation occurs. That is why
              // an additional write operation
              // occurs when the user presses
              // the close button.
              writeSubjWordList();
              msgToUser = "\n\nNo more messages."
                  + "\nPost phrases for this "
                  + "message.\n Then press "
                  + "close to terminate.";
              //Disable the Next button so that
              // the user cannot fire any more
              // events of this type.
              nextButton.setEnabled(false);
            }//end if no more messages
            //Identify the file being processed
            textArea.setText("Processing "
                + dirList[fileCounter] + "\n");
//Provide instructions to the user. operMsgField.setText("Paste a phrase" + " in the output field and press " + "Post. Post as many new phrases " + "as you want. Press next to " + "process next message."); outputWordField.setText("Paste " + "output phrase here and then " + "press Post.");
try{ //Open the file containing a local // copy of the message.
inData = new BufferedReader( new FileReader(dataPath + dirList[fileCounter]));
String data; //temp holding area
//Precondition the display of // Subject in the GUI by skipping // header lines prior to the // Subject line. Mark the beginning // of the file. Set the // readAheadLimit to 10000 // characters before the mark will // be lost. inData.mark(10000); //Some messages may not contain a // Subject or From line. Don't // want the old one to continue to // be visible in the GUI. subjField.setText( "No Subj line found yet"); fromField.setText( "No From line found yet"); while((data = inData.readLine()) != null){ //A null result indicates end of // file.
//Trap the Subject line, decode // if necessary, convert it to // upper case, and display it in // a field on the GUI. if(data.startsWith("Subject:")){
data = decodeSubj(data); subjField.setText( data.toUpperCase()); break;//No need to keep reading }//end if(data.startsWith("Subj.. }//end while loop on null
//Reset back to beginning of file. // The Subject for this message is // now showing in the GUI. inData.reset();
              //Precondition the display of From
              // line in the GUI by skipping
              // header lines prior to the From
              // line. Code is similar to that
              // discussed above.
              while((data = inData.readLine())
                                       != null){
                if(data.startsWith("From:")){
                  fromField.setText(
                             data.toUpperCase());
                  break;
                }//end if
              }//end while loop on null
              //Reset back to beginning of file.
              // The From line for this message
              // is now showing in the GUI. Read
              // and display the entire file.
              // This data is displayed for
              // information purposes only to help
              // the user decide what to do in
              // terms of updating the word list
              // used by BigDog02i or BigDog02j
              // for processing the Subject line.
              inData.reset();
              //Start by reading the entire
              // message into a single upper case
              // String object with no line
              // breaks. Limit the size of the
              // file that the program is willing
              // to read.
              int lineLimit = 500;
              int lineCount = 0;
              inData.reset();//rewind input
              String msgString = "";
              while(((data = inData.readLine())
                                        != null)
                    && ++lineCount < lineLimit){
                msgString += data + "\n";
              }//end while data != null
if(lineCount == lineLimit){ System.out.println( dirList[fileCounter] + " terminated, excessive " + "length"); }//end if(lineCount == lineLimit)
//Expand base64 data in msg body. msgString = decodeBody(msgString). toUpperCase(); msgString = removeNewLine( msgString);
//Get and display embedded email // addresses String emailString = getEmailAddrs( msgString);
//Get the HTML as a single string String cleanString = getCleanHtmlString(msgString);
              String msgToUser =
                             "Initial msgToUser";
              if(cleanString != null){
                msgToUser =
                         "This is clean HTML\n";
                msgString = cleanString;
              }//end if(cleanString != null)
              else{//cleanString == null
                msgToUser = "No HTML found, this"
                            + " is raw text\n";
              }//end cleanString == null
              //Display on multiple lines
              int lineLen = 90;
              int cnt = 0;
              for(cnt = 0;
                  cnt < (msgString.length())
                                       /lineLen;
                  cnt++){
                textArea.append(
                  msgString.substring(
                    lineLen*cnt,
                    lineLen*cnt+lineLen) + "\n");
              }//end for loop
              //Display remaining characters
              textArea.append(
                msgString.substring(
                  lineLen*(cnt-1)+lineLen)
                                        + "\n");
              textArea.append(
                       "\n" + msgToUser + "\n");
}catch(Exception ex){ ex.printStackTrace();}
//Increment the fileCounter so that // the next time the Next button // fires an ActionEvent, the next // file in the directory listing will // be processed. fileCounter++;
}//end if on fileCounter in bounds else{ //File counter out of bounds. This // happens if you delete all the // files. textArea.setText( "No more files. Press Close to " + "terminate."); nextButton.setEnabled(false); }//end else counter is out of bounds }//end actionPerformed }//end ActionListener );//end addActionListener
//Register an object of the following // anonymous class on both the Post button // and the outputWordField. That way, the // contents of the outputWordField can be // posted to the new word list by either // clicking the Post button, or pressing the // Enter key when the outputWordField has the // focus. ActionListener postListener = new ActionListener(){ public void actionPerformed( ActionEvent e){ //Get the word or phrase from the field // and add it to the TreeSet object. String tempWord = outputWordField.getText(); subjWordList.add(tempWord);
//Provide feedback to confirm that it // has been posted. This tells the // user that she is free to post // another word if she desires. outputWordField.setText( tempWord + " posted"); }//end actionPerformed };//end ActionListener
//Register the ActionListener object on // the two source objects. postButton.addActionListener(postListener); outputWordField.addActionListener( postListener);
//Register an ActionListener on the // copyButton to copy selected text to the // outputWordField. First tries to copy // selected text from the Subject. If that // produces an empty string, tries to copy // selected text from the text area. There // must not be any text selected in the // Subject in order to copy selected text // from the text area. copyButton.addActionListener( new ActionListener(){ public void actionPerformed( ActionEvent e){ String selected = subjField.getSelectedText();
if(selected.equals("")){ selected = textArea.getSelectedText(); }//end if(selected.equals("")) outputWordField.setText(selected); }//end actionPerformed }//end new ActionListener );//end addActionListener
//Register an ActionListener on the Delete // button to make it possible for the // user to remove a file from the local // directory. deleteButton.addActionListener( new ActionListener(){ public void actionPerformed( ActionEvent e){ //Delete the local file currently being // displayed in the GUI. Must subtract // one from the value of the file // counter to cause it to reference the // current file because it has already // been incremented by the event // handler for the Next button in // preparation for processing the next // file.
//Create a File object that represents // the current file. File tempFile = new File(dataPath + dirList[fileCounter-1]);
if(tempFile.exists()){ try{ inData.close(); }catch(Exception ex){ ex.printStackTrace();} tempFile.delete();//Delete the file }//end if
//Fire a synthetic event on the Next // button to cause the program to // process the next file in the // directory listing without user // interaction. Toolkit.getDefaultToolkit(). getSystemEventQueue(). postEvent(new ActionEvent( nextButton, ActionEvent. ACTION_PERFORMED, "Next")); }//end actionPerformed }//end ActionListener );//end addActionListener
//Configure the GUI by placing the various // components on it. add(copyButton); add(postButton); add(nextButton); add(deleteButton); add(fromField); add(subjField); add(outputWordField); add(operMsgField); add(textArea); setTitle("Copyright 2004, R.G.Baldwin"); //Will need to make the GUI narrower in order // to create the figures for publication. setSize(400,400); //Make the GUI visible. setVisible(true);
//The following code creates a directory // listing containing only those files that // start with +OK. dirList = dataDir.list( new FilenameFilter(){ public boolean accept( File dir,String name){ if(!(new File(dir,name). isFile())) return false; return name.startsWith("+OK"); }//end accept }//end FilenameFilter );//end list
    //Create a message in the text area at
    // startup showing the list of files in the
    // directory that are available for
    // processing.
    this.textArea.append("Files to be processed"
                                        + "\n");
    //Display the list of files
    for(int cnt = 0;cnt < dirList.length;cnt++){
      this.textArea.append(dirList[cnt] + "\n");
    }//end for loop
}//end constructor //===========================================//
  //Purpose: To create a TreeSet object
  // containing words used to filter the message
  // subject lines in the programs named
  // BigDog02i and BigDog02j.
  //This method reads strings from a text file
  // named BigDog02SubjAndHtml.txt and creates
  // the list as a TreeSet object sorted in
  // natural order with no duplicates.
  //After creating the list, it writes the data
  // from the list into a backup file named
  // BigDog02SubjAndHtml.bakN, where N is the
  // value of the next available file name in
  // the directory.
  //A new backup file with a unique name is
  // created each time the program is run. Once
  // the number of backup files reaches 5, the
  // program automatically deletes the oldest
  // file before creating a new backup
  // file. Thus the program automatically
  // maintains a sequence of five backup files
  // with extensions .bak0 through .bak5 with one
  // number missing. The age-order of the files
  // should be determined by the modification
  // date and not by the name of the file.
  //The data read from the file is converted to
  // upper case before being added to the TreeSet
  // object.
void makeSubjWordList(){ subjWordList = new TreeSet();
//Read words or phrases from text file and // populate the TreeSet object. try{ BufferedReader inData = new BufferedReader(new FileReader( "BigDog02SubjAndHtml.txt")); String data; //temp holding area
while((data = inData.readLine()) != null){ subjWordList.add(data.toUpperCase()); }//end while loop
inData.close();//Close input file
//Write a backup file before making any // modifications to the data.
//First determine the name of the next // backup file allowed in the directory. int N = 0; File theFile = null; for(N = 0;N < 6;N++){ theFile = new File( "BigDog02SubjAndHtml.bak" + N); if(!(theFile.exists()))break; }//end for loop
//Cause N to rotate from 0 through 5 if(N == 5){//del file 0 for use next time new File("BigDog02SubjAndHtml.bak0"). delete(); }//end if else{//delete the next file in sequence if(new File( "BigDog02SubjAndHtml.bak" + (N + 1)).exists()){ new File( "BigDog02SubjAndHtml.bak" + (N + 1)).delete(); }//end if }//end else
//Now write the output file DataOutputStream dataOut = new DataOutputStream( new FileOutputStream( theFile));
//Use an Iterator object to access the data // in the TreeSet object. Iterator iter = subjWordList.iterator();
      while(iter.hasNext()){
        data = (String)iter.next();
        dataOut.writeBytes(data + "\n");
      }//end while
dataOut.close(); }catch(Exception e){e.printStackTrace();} }//end makeSubjWordList //===========================================//
//Purpose: To write the data from a TreeSet // object into a file named // BigDog02SubjAndHtml.txt that is used in the // programs named BigDog02i or BigDog02j to // filter the message subject lines. //This method is the reverse of the method // named makeSubjWordList.
void writeSubjWordList(){ try{ DataOutputStream dataOut = new DataOutputStream( new FileOutputStream( "BigDog02SubjAndHtml.txt"));
//Use an iterator to access the data in // the TreeSet object. Iterator iter = subjWordList.iterator(); String data;
      while(iter.hasNext()){
        data = (String)iter.next();
        dataOut.writeBytes(data + "\n");
      }//end while
      dataOut.close();
    }catch(Exception e){e.printStackTrace();}
  }//end writeSubjWordList
  //===========================================//
  //Removes newline characters from an incoming
  // String object and converts them to spaces.
  String removeNewLine(String msgString){
    StringBuffer stringBuf =
                     new StringBuffer(msgString);
    int index = 0;
    while(index > -1){
      index = stringBuf.lastIndexOf("\n");
      if(index > -1){
        stringBuf.delete(index,index+1);
        stringBuf.insert(index," ");
      }//end if
    }//end while
    String outputString = new String(stringBuf);
    if(outputString.equals("")){
      return null;
    }else{
      return outputString;
    }//end else
  }//end removeNewLine()
  //===========================================//
//This method is called to decode a Subject // line.
//Sometimes the Subject line is encoded using // techniques designed to allow the use of // non-ASCII characters in message headers // (See RFC2047). //The following code determines if the Subject // line has been encoded using the ISO-8859-1 // character set with an encoding value of B or // Q. If so, the encoded material is decoded. //Messages with an encoding value of Q contain // a mixture of ASCII characters and encoded // characters, so it is possible to partially // read them without the need for decoding. // They also sometimes use an underscore in // place of a space to make them more readable. private String decodeSubj(String data){ try{ if(data.toUpperCase().indexOf( "=?ISO-8859-1?B?") != -1){ //Need to decode for value of B. int startIndex = data.toUpperCase(). indexOf("=?ISO-8859-1?B?") + 15; int endIndex = data.length()-2; sun.misc.BASE64Decoder dec = new sun.misc.BASE64Decoder(); data = "Subject: " + "=?ISO-8859-1?B? " + new String(dec.decodeBuffer( data.substring(startIndex,endIndex))); }//end if..."=?ISO-8859-1?B?"
if(data.toUpperCase().indexOf( "=?ISO-8859-1?Q?") != -1){ //Need to decode for value of Q. int startIndex = data.toUpperCase(). indexOf("=?ISO-8859-1?Q?") + 15; int endIndex = data.length()-2; String decodedData = data.substring( startIndex,endIndex);
//Decode non-ASCII characters StringBuffer stringBuf = new StringBuffer(decodedData); int index = 0; while(index > -1){ index = stringBuf.lastIndexOf("="); if(index > -1){ String hexString = new String(stringBuf).substring( index+1,index+3); char decodedChar = (char)Integer.parseInt( hexString.trim(),16); stringBuf.delete(index,index+3); stringBuf.insert(index,decodedChar); }//end if }//end while(index > -1)
//Replace underscore with space. index = 0; while(index > -1){ index = stringBuf.lastIndexOf("_"); if(index > -1){ stringBuf.deleteCharAt(index); stringBuf.insert(index,' '); }//end if }//end while(index > -1)
data = "Subject: " +"=?ISO-8859-1?Q? " + new String(stringBuf); }//end if..."=?ISO-8859-1?Q?" }catch(Exception ex){ex.printStackTrace();} return data; }//end decodeSubj //===========================================//
  //Expand base64 data in msg body.
  private String decodeBody(String data){
    String decodedData = "";
    int currentPartIndex;
    int nextPartIndex;
    try{
      if(data.toUpperCase().indexOf(
            "Content-Transfer-Encoding: base64".
                         toUpperCase()) != -1){
        //This message has base64 encoding
        if((data.toUpperCase().indexOf(
              "Content-Type: text/html".
                         toUpperCase()) != -1)
           && (data.toUpperCase().indexOf(
              "Content-Type: multipart".
                         toUpperCase()) == -1)){
          //This is a non-multipart message with
          // base64 encoding.
          //Locate the end of the header.
          int base64Index = data.indexOf(
                                      "Status:");
          if(base64Index != -1){
            int crIndex = data.indexOf(
                               "\n",base64Index);
            String tempStr = data.substring(
                       crIndex+2,data.length());
            sun.misc.BASE64Decoder dec =
                   new sun.misc.BASE64Decoder();
            decodedData = "Start base64 "
                + new String(
                      dec.decodeBuffer(tempStr))
                + " End base64";
          }//end if(base64Index != -1)
        }//end if((data.toUpperCase().indexOf(...
        else{
          int boundaryIndex = data.indexOf(
                                    "boundary=");
          int newLineIndex = data.indexOf(
                             "\n",boundaryIndex);
if(boundaryIndex != -1){ String multipartCode = data.substring( boundaryIndex+10,newLineIndex-1); nextPartIndex = data.indexOf( multipartCode,newLineIndex+1); while(nextPartIndex != -1){ int base64Index = data.indexOf( "Content-Transfer-Encoding: " + "base64",nextPartIndex); currentPartIndex = nextPartIndex; nextPartIndex = data.indexOf( multipartCode,nextPartIndex+1); if((base64Index != -1) && (base64Index < nextPartIndex)){
              //Don't process .gif or .jpg file
              // attachments
              String partBody = data.substring(
                  currentPartIndex,
                  nextPartIndex).toUpperCase();
              if((partBody.indexOf(".GIF") == -1)
                  && (partBody.indexOf(
                                 ".JPG") == -1)){
                //gif image not found. Process
                // the data
                int crIndex = data.indexOf(
                               "\n",base64Index);
                //Search for the required blank
                // line preceding the block
                // of base64 data
                //Prevent infinite loop on bad
                // data
                int count = 0;
                char firstChar = data.charAt(
                                     crIndex+1);
                while((firstChar != '\n')
                              && (count < 100)){
                  crIndex = data.indexOf(
                               "\n",crIndex+1);
                  firstChar = data.charAt(
                                     crIndex+1);
                  count++;
                }//end while
                String tempStr = data.substring(
                      crIndex+2,nextPartIndex);
                sun.misc.BASE64Decoder dec =
                    new sun.misc.BASE64Decoder();
                decodedData += new String(
                     dec.decodeBuffer(tempStr));
                decodedData += "\n\n-----End "
                        + "base64 part-----\n\n";
              }//end if(partBody.toUpperCa...
              else{
                decodedData += "-----Image "
                          + "stripped off-----";
              }//end else
            }//end if(base64Index != -1)
            else{
              if(nextPartIndex != -1){
                decodedData += data.substring(
                              currentPartIndex,
                                nextPartIndex);
                decodedData += "\n\n-----End "
                    + "non-base64 part-----\n\n";
              }//end if(nextPartIndex != -1)
            }//end else
          }//end while loop on nextPartIndex...
        }//end if(boundaryIndex != -1)
      }//end else
      return decodedData;
    }//end if(data.toUpperCase().indexOf("Co...
    else{
      //This msg does not have base64 encoding
      return data;
    }//end else
    }catch(Exception ex){ex.printStackTrace();}
    return "Make Compiler Happy";
  }//end decodeBody
  //===========================================//
  //This method receives an incoming string. It
  // searches the string for all occurrences of
  // the @ character. When it finds an @
  // character, it extracts the substring that
  // includes that character along with 50
  // previous and 15 following characters. It
  // appends a \n to the substring and appends
  // it to an output string.
  //The purpose is to return a string containing
  // concatenated substrings, each of which
  // probably contains an Email address.
  //If it doesn't find any @ characters, it
  // returns null.
  private String getEmailAddrs(String data){
    String dataOut = "";
    int index = data.indexOf("@");
    if(index == -1) return null;
    while(index != -1){
      if(index > 50){
        //Eliminate as much non-ASCII data as
        // possible by testing following
        // characters for non-ASCII values
        if((data.charAt(index+1) < 126) &&
           (data.charAt(index+2) < 126) &&
           (data.charAt(index+3) < 126) &&
           (data.charAt(index+4) < 126) &&
           (data.charAt(index+5) < 126) &&
           (data.charAt(index+6) < 126) ){
          dataOut += data.substring(index - 50,
                             index + 15) + "\n";
        }//end if
      }else{
        dataOut += data.substring(0,index + 15)
                                         + "\n";
      }//end else
      index = data.indexOf("@",index+1);
    }//end while loop
    return dataOut;
  }//end getEmailAddrs
  //===========================================//
private String getCleanHtmlString( String msgString){ String cleanString = removeTags(msgString);
if(cleanString != null){ cleanString = repNbsp(cleanString); if(cleanString != null){ cleanString = remEntities(cleanString); if(cleanString != null){ cleanString = remEquals(cleanString); if(cleanString != null){ cleanString = remTabs(cleanString); if(cleanString != null){ cleanString = remMultipleSpaces( cleanString); }//end if(cleanString != null){ }//end if(cleanString != null){ }//endif(cleanString != null){ }//endif(cleanString != null){ }//end if(cleanString != null)
return cleanString; }//end method getCleanHtmlString //===========================================//
//This method determines if a message // contains HTML and removes all tags. If // there is no HTML in the message text, it // returns null. private String removeTags(String msgString){ int isHtml = -1; int startIndex = -1; int endIndex = -1;
//Search for clues that the message // contains HTML. isHtml = msgString.indexOf("<HTML"); if(isHtml == -1) isHtml = msgString.indexOf("<BODY"); if(isHtml == -1) isHtml = msgString.indexOf("<FONT"); if(isHtml == -1) isHtml = msgString.indexOf("<DIV"); if(isHtml == -1) isHtml = msgString.indexOf("<STRONG"); if(isHtml == -1) isHtml = msgString.indexOf("<BR"); if(isHtml == -1) isHtml = msgString.indexOf("<TABLE"); if(isHtml == -1) isHtml = msgString.indexOf("<SPAN"); if(isHtml == -1) isHtml = msgString.indexOf("<UL"); if(isHtml == -1) isHtml = msgString.indexOf("<OL"); if(isHtml == -1) isHtml = msgString.indexOf("<P>");
if(isHtml != -1){ //Msg contains HTML but not in very good // form since it is missing the matching // HTML tags.
//Eliminate as much of the header as // possible by finding the location of // the last identifiable item in the // message header and discarding // everything prior to that point.
int tempIndex = -1; startIndex = -1; String line = "";
//Create an array of valid header lines. String[] headerLines = {"STATUS:", "X-MAILSCANNER:", "X-MAILSCANNER-INFORMATION:", "X-MSMAIL-PRIORITY:", "X-PRIORITY:", "X-MAILER:", "DATE:", "SUBJECT:", "REPLY-TO:", "FROM:", "MESSAGE-ID:", "RECEIVED:" };//end array definition
for(int cnt = 0; cnt < headerLines.length;cnt++){ tempIndex = msgString.lastIndexOf( headerLines[cnt]); if(tempIndex > startIndex){ //Save the larger index value startIndex = tempIndex; //Save corresponding header line line = headerLines[cnt]; }//end if }//end for loop
if(startIndex != -1){ //Use that header line to eliminate // everything prior from the message // header. msgString = msgString.substring( startIndex); }//end if(startIndex != -1) }//end if(isHtml != -1)
//Process the string if it contains HTML if(isHtml != -1){ //msgString has been determined to contain // HTML. //Insert a dummy first character to ensure // that the first character is not the // beginning of a tag. msgString = "X" + msgString; int leftIndex=0; int rightIndex=0; String outputString = ""; while(leftIndex != -1){ leftIndex = msgString.indexOf( '<',rightIndex); if((leftIndex != -1) && (rightIndex != -1)){ outputString += msgString.substring( rightIndex+1,leftIndex); rightIndex = msgString.indexOf( '>',leftIndex); //Have to deal with missing > char, // particularly for truncated messages. if(rightIndex > (msgString.length() - 2)){ //Don't try to process the last few // characters when left and right // angle brackets don't match. break; }//end if(rightIndex > (msgString... if(rightIndex == -1){ //Create an artificial right angle // bracket to replace the missing // one. rightIndex = leftIndex + 1; }//end if(rightIndex == -1) }//end ((leftIndex != -1) && ... }//end while(leftIndex != -1) //Get text at the tail end. if((rightIndex + 1) < msgString.length()){ outputString += msgString.substring( rightIndex+1); }//end if if(outputString.equals("")){ //msgString contained HTML, but it was // all removed in the cleanup process. // The output string is empty. return null; }else{ //Return the string produced by removing // HTML material from msgString. return outputString; }//end else }//end if(isHtml != -1) else{ //Apparently msgString doesn't contain // HTML. return null; }//end else }//end removeTags //===========================================//
  //Purpose of this method is to replace all
  // occurrences of "&NBSP;" with " "
  private static String repNbsp(
                               String msgString){
    StringBuffer stringBuf =
                     new StringBuffer(msgString);
    int index = 0;
    while(index > -1){
      index = stringBuf.lastIndexOf("&NBSP;");
      if(index > -1){
        stringBuf.replace(index,index+6," ");
      }//end if
    }//end while
    String outputString = new String(stringBuf);
    if(outputString.equals("")){
      return null;
    }else{
      return outputString;
    }//end else
  }//end repNbsp()
  //===========================================//
//Removes entities from an HTML body identified // by the string &...; Converts those entities // that represent English language characters // and punctuation (32 - 126) to the // corresponding character and inserts it into // the message text. private static String remEntities( String msgString){
//Insert a dummy first character msgString = "X" + msgString; int leftIndex=0; int rightIndex=0; String outputString = ""; while(leftIndex != -1){ leftIndex = msgString.indexOf( '&',rightIndex); if((leftIndex != -1) && (rightIndex != -1)){
if(leftIndex > rightIndex){ outputString += msgString.substring( rightIndex+1,leftIndex); }//end if rightIndex = msgString.indexOf( ';',leftIndex);
if((leftIndex != -1) && (rightIndex != -1)){ String extract = msgString.substring( leftIndex,rightIndex + 1). toUpperCase();
//Make sure we didn't extract good text // by accident. Apparently real entity // cannot contain more than seven // characters, as in &#nnnn; Remove // spaces before making the test. if(remSpaces(remEquals( remTabs(extract))).length() > 6){ //Apparently not an entity. Put it // back in the text. outputString += extract; }//end if(rightIndex-leftIndex > 6) else{ //Remove any spaces prior to further // processing extract = remSpaces(remEquals( remTabs(extract))); }//end else
//Convert English language character // entities to characters and insert // them in the text. //Don't try to restore HEX // representations at this time. Maybe // add that later. Ignore extracted // sequences longer than six // characters. try{ if((extract.charAt(1) == '#') && (extract.charAt(2) != 'X') && (extract.length() <=6)){ //Get the internal characters of // the entity. String strValue = extract. substring(2,extract.length()-1); //Try to convert to a numeric char // type. May throw an exception. char theChar = (char)Integer.parseInt(strValue); //Ignore all but English language // characters and punctuation. if((theChar >= 32) && (theChar <= 126)){ char[] charArray = {theChar}; String theStr = new String( charArray).toUpperCase(); outputString += theStr; }//end ((theChar >= 32) && ... }//end ((extract.charAt(1) == .. }catch(NumberFormatException ex){ //Ignore it. It is apparently a // badly formed entity. }//end catch }//end if((leftIndex != -1) && ...
//Have to deal with missing ; char. if(rightIndex > (msgString.length() - 2)){ //Don't try to process the last few // characters when left and right // angle brackets aren't matching. break; }//end if(rightIndex > (msgString... if(rightIndex == -1){ //Create an artificial right ; char // to replace the missing one. rightIndex = leftIndex + 1; }//end if(rightIndex == -1) }//end if((leftIndex != -1)... }//end while(leftIndex != -1) //Get the text at the tail end if((rightIndex + 1) < msgString.length()){ outputString += msgString.substring( rightIndex+1); }//end if((rightIndex + 1)< msgString...
if(outputString.equals("")){ //The entire string was apparently made up // of entities. It is empty now. return null; }else{ return outputString; }//end else }//end remEntities //===========================================//
//Method removes all '=' characters from an // incoming string. private static String remEquals( String msgString){ StringBuffer stringBuf = new StringBuffer(msgString); int index = 0; while(index > -1){ index = stringBuf.lastIndexOf("="); if(index > -1){ stringBuf.delete(index,index+1); }//end if }//end while String outputString = new String(stringBuf); if(outputString.equals("")){ return null; }else{ return outputString; }//end else }//end remEquals() //===========================================//
  //Method removes tab characters from an
  // incoming string.
  private static String remTabs(
                               String msgString){
    StringBuffer stringBuf =
                     new StringBuffer(msgString);
    int index = 0;
    while(index > -1){
      index = stringBuf.lastIndexOf("\t");
      if(index > -1){
        stringBuf.delete(index,index+1);
      }//end if
    }//end while
    String outputString = new String(stringBuf);
    if(outputString.equals("")){
      return null;
    }else{
      return outputString;
    }//end else
  }//end remTabs()
  //===========================================//
//Method removes space characters from an // incoming string. private static String remSpaces( String msgString){ StringBuffer stringBuf = new StringBuffer(msgString); int index = 0; while(index > -1){ index = stringBuf.lastIndexOf(" "); if(index > -1){ stringBuf.delete(index,index+1); }//end if }//end while String outputString = new String(stringBuf); return outputString; }//end remSpaces() //===========================================//
  //Method converts all multiple spaces to a
  // single space. This is not ideal. If there
  // are multiple spaces within a word, all but
  // one of the spaces will be removed, leaving
  // one extraneous space in the word.
  private static String remMultipleSpaces(
                               String msgString){
    StringBuffer stringBuf =
                     new StringBuffer(msgString);
    int index = 0;
    while(index > -1){
      //Search for two adjacent spaces and
      // delete one of them.
      index = stringBuf.lastIndexOf("  ");
      if(index > -1){
        stringBuf.delete(index,index+1);
      }//end if
    }//end while
    String outputString = new String(stringBuf);
    if(outputString.equals("")){
      return null;
    }else{
      return outputString;
    }//end else
  }//end remMultipleSpaces()
  //===========================================//
}//end class BigDog02m //=============================================//
Listing 6
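The decodeBody method in Listing 6 relies on sun.misc.BASE64Decoder, an internal, unsupported class that is absent from recent JDK releases. If the listing will not compile for that reason, one possible substitute is the public java.util.Base64 class (available since Java 8). The following is only a minimal sketch under that assumption. The class name Base64BodySketch and the method name decodeBase64Part are hypothetical and are not part of BigDog, and the choice of ISO-8859-1 for the decoded bytes is also an assumption.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class Base64BodySketch{

  //Hypothetical helper, not part of BigDog.
  // Decodes one block of base64 text taken from
  // a message body and returns it as a String.
  static String decodeBase64Part(String encoded){
    //The MIME decoder skips line separators and
    // other characters that are not in the
    // base64 alphabet, which suits wrapped
    // email bodies.
    byte[] bytes =
        Base64.getMimeDecoder().decode(encoded);
    //Assumption: treat the bytes as ISO-8859-1.
    return new String(
            bytes,StandardCharsets.ISO_8859_1);
  }//end decodeBase64Part

  public static void main(String[] args){
    //A short base64 block wrapped across two
    // lines, as it might appear in a message.
    String part = "SGVsbG8sIEJp\r\nZ0RvZyE=";
    System.out.println(decodeBase64Part(part));
  }//end main
}//end class Base64BodySketch
```

In decodeBody, a call such as decodeBase64Part(tempStr) could stand in for the expression new String(dec.decodeBuffer(tempStr)), although that substitution has not been tested against the complete BigDog program.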
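The BigDog02SpamScreen01 class in the listing that follows contains a method named matchPhrase, which detects an offensive word even when a spammer inserts extraneous characters between its letters, as in VI*A-GRA. The sketch below illustrates the same idea in a simpler form so that the effect of the spanLim parameter is easy to see. It is not the author's algorithm: it scans the data directly with backtracking instead of first compressing the data to the phrase's characters. The class name GappedMatchSketch and the method names are hypothetical.

```java
public class GappedMatchSketch{

  //Returns the index in data where phrase
  // begins, allowing at most spanLim extraneous
  // characters between any two consecutive
  // characters of phrase. Returns -1 if there
  // is no such match.
  static int gappedIndexOf(String data,
                           String phrase,
                           int spanLim){
    if(phrase.isEmpty()) return -1;
    for(int start = 0; start < data.length();
                                       start++){
      if(data.charAt(start) == phrase.charAt(0)
          && matchesFrom(data,start,phrase,1,
                                      spanLim)){
        return start;
      }//end if
    }//end for loop
    return -1;
  }//end gappedIndexOf

  //Tries to match phrase[p..] given that
  // phrase[p-1] was matched at index prev.
  private static boolean matchesFrom(
                 String data,int prev,
                 String phrase,int p,
                 int spanLim){
    if(p == phrase.length()) return true;
    //The next phrase character must appear
    // within spanLim+1 positions of prev.
    int last = Math.min(data.length() - 1,
                        prev + 1 + spanLim);
    for(int i = prev + 1; i <= last; i++){
      if(data.charAt(i) == phrase.charAt(p)
          && matchesFrom(data,i,phrase,p + 1,
                                      spanLim)){
        return true;
      }//end if
    }//end for loop
    return false;
  }//end matchesFrom

  public static void main(String[] args){
    System.out.println(gappedIndexOf(
                  "BUY VI*AGRA NOW","VIAGRA",1));
    System.out.println(gappedIndexOf(
                        "IMPORTANT","PORN",1));
    System.out.println(gappedIndexOf(
                        "IMPORTANT","PORN",2));
  }//end main
}//end class GappedMatchSketch
```

With spanLim equal to 1, the sketch finds VIAGRA in VI*AGRA but does not find PORN in IMPORTANT. Raising spanLim to 2 produces the false alarm on IMPORTANT that the comments in the listing warn about.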
File BigDog02SpamScreen01
/*File BigDog02SpamScreen01.java Copyright 2004, R.G.Baldwin Rev 02/27/04
This class implements a set of rules for detecting SPAM messages. An int value is returned showing the number of hits against offensive words and phrases that occur up to a specified hitLimit. The subject and a clean version of HTML content is screened against one list of offensive words and phrases. Raw body text is screened against a different list of offensive words and phrases. The process of screening raw body text tends to be rather slow so care should be taken to keep that list short.
An object of this class has one entry point and one exit point, which is the public method named screenMsg. ************************************************/
import java.util.*; import java.io.*;
public class BigDog02SpamScreen01{
TreeSet subjAndHtmlPhraseList; TreeSet rawTextPhraseList; String subjAndHtmlPhraseFile; String rawTextPhraseFile; String phrase; String dataPath; int hitLimit;
public BigDog02SpamScreen01(String dataPath, String subjAndHtmlPhraseFile, String rawTextPhraseFile, int hitLimit){
this.dataPath = dataPath; this.subjAndHtmlPhraseFile = subjAndHtmlPhraseFile; this.rawTextPhraseFile = rawTextPhraseFile; this.hitLimit = hitLimit; //Read the files containing words and // phrases and create TreeSet objects // containing those words and phrases in // alphabetical order with no duplicates. makeSubjAndHtmlPhraseList(); makeRawTextPhraseList(); }//end constructor //===========================================//
//This method is used to identify spam // messages and to return an int value that // indicates the number of hits against // offensive words and phrases up to a limit // of hitLimit.
public int screenMsg(String pathFileName){ BufferedReader inData = null; int hitCount = 0;
try{ //Open the file containing a local copy of // the message. inData = new BufferedReader(new FileReader( pathFileName)); String data;
//Get the Subject line by skipping header // lines prior to the Subject line. Mark // the beginning of the file to make it // easy to rewind later. Set the readAhead // Limit to 150000 characters before the // mark will be lost. Limit the size of the // file that the program is willing to // read. inData.mark(150000); int lineLimit = 1000; int lineCount = 0; String subject = "No subj found";
while(((data = inData.readLine())!= null) && ++lineCount < lineLimit){ if(data.toUpperCase().startsWith( "SUBJECT:")){ subject = decodeSubj(data); }//end if(data starts with SUBJECT) }//end while readLine != null
//Reset back to beginning of file. The // Subject for this message has now been // saved. inData.reset();
//Screen the Subject line against a list of // offensive words and phrases. hitCount = screenForOffensiveSubject( hitCount,subject); if(hitCount >= hitLimit){ inData.close(); return hitCount; }//end if(hitCount >= hitLimit)
      //Screen HTML (if any) for offensive words
      // and phrases.
      //Start by reading the entire message into
      // a single upper case String object.
      // Limit the size of the file that the
      // program is willing to read to avoid
      // excessive delays in screening very large
      // files.
      lineCount = 0;
      inData.reset();//rewind the input stream
      String msgString = "";
      while(((data = inData.readLine())!= null)
                    && ++lineCount < lineLimit){
        msgString += data + "\n";
      }//end while data != null
if(lineCount == lineLimit){ System.out.println(pathFileName + " terminated, excessive length"); }//end if(lineCount == lineLimit)
//Expand base64 data in msg body. msgString = decodeBody(msgString). toUpperCase(); msgString = removeNewLine(msgString);
//Screen the HTML portion of the string for // offensive words and phrases. hitCount = screenForOffensiveHtml( msgString,hitCount); if(hitCount >= hitLimit){ inData.close(); return hitCount; }//end hitCount >= hitLimit)
//Screen the raw body text for offensive // words or phrases. This is last in the // sequence because it probably takes the // longest amount of time to accomplish. hitCount = screenForOffensiveRawText( msgString,hitCount); if(hitCount >= hitLimit){ inData.close(); return hitCount; }//end if (hitCount >= hitLimit)
inData.close(); return hitCount;//with hitCount < hitLimit }catch(Exception e){e.printStackTrace();} return hitCount;//make compiler happy }//end screenMsg //===========================================//
//This method tests a string to see if it // contains a word or phrase that may have // extraneous characters inserted into it, // such as VI*A-GRA. //If the string contains the sequence of // characters making up the word or phrase, // with spanLim or fewer extraneous characters // between any two of the word's characters, // the method returns true. For example, if // spanLim = 1, the spammer can insert one // character between any two of the characters // that make up the word and the word will // still be detected. However, if the // spammer inserts two or more characters, // the offending word will not be detected. //Need to be careful to avoid making spanLim // too large. Large values of spanLim result // in false alarms due to the fact that // widely-separated characters can be // considered to be part of the word or // phrase. For example, if spanLim = 2 or // greater, the word PORN will be found in // the word imPORtaNt. private int matchPhrase(String data, String phrase, int spanLim){ this.phrase = phrase; StringBuffer str = new StringBuffer(); ArrayList locationData = new ArrayList();
//Compare each char in the data with each // unique char in the word or phrase. If // there is a match, append the char to str // and save the location of the char in // the ArrayList referred to by locationData.
//Eliminate duplicate char in the word or // phrase by storing in a TreeSet. Note that // this will also sort the char, but that // doesn't matter. TreeSet treeSet = new TreeSet(); for(int cnt = 0; cnt < phrase.length(); cnt++){ treeSet.add( new Character(phrase.charAt(cnt))); }//end for loop
//Get the unique characters from the set and // save them in a StringBuffer Iterator iter = treeSet.iterator(); StringBuffer tempPhrase = new StringBuffer(); while(iter.hasNext()){ tempPhrase.append( ((Character)(iter.next())).charValue()); }//end while
//Use the StringBuffer of unique characters // to test the string and extract matching // characters from the string. Discard all // non-matching characters. This converts // the original data into a string of // characters, each of which is a character // in the word or phrase. All other // characters have been removed. Thus, if // the data contains the word or phrase, it // will occur somewhere in the compressed // string with no extra characters in // between. An example might be as follows: // SMSPMASPAMMPAS for(int i = 0; i < data.length(); i++){ for(int j = 0; j < tempPhrase.length(); j++){ if(data.charAt(i) == tempPhrase.charAt(j)){ str.append(data.charAt(i)); locationData.add(new Integer(i)); }//end if }//end for on tempPhrase }//end for on data
//Test to see if the extracted char sequence // contains the word or phrase. int match = str.indexOf(phrase); if(match == -1){ return -1;//no match }//end if
//There is a match. Confirm that the span // between target characters in data is not // greater than allowed by the incoming // spanLim parameter. int maxSpan = 0; int locA = ((Integer)locationData. get(match)).intValue(); int locB = 0; int startIndex = locA; for(int cnt = 1; cnt < phrase.length(); cnt++){ locB = ((Integer)locationData.get( match + cnt)).intValue(); int span = locB - locA; if(span > maxSpan){ maxSpan = span; }//end if locA = locB; }//end for loop
if(maxSpan > spanLim+1){ return -1;//span too large }else{ return startIndex;//made a match }//end else
}//end matchPhrase //===========================================//
  //Purpose: To create a TreeSet object
  // containing words used to screen the message
  // subject lines and HTML text blocks.
  //This method reads strings from a text file
  // and creates the list as a TreeSet object
  // with no duplicates.
  //See additional comments in the later section
  // regarding the makeRawTextPhraseList method.
private void makeSubjAndHtmlPhraseList(){ subjAndHtmlPhraseList = new TreeSet();
//Read word list from text file and populate // the TreeSet object. try{ BufferedReader inData = new BufferedReader(new FileReader( subjAndHtmlPhraseFile)); String data; //temp holding area
while((data = inData.readLine()) != null){ subjAndHtmlPhraseList.add( data.toUpperCase());
}//end while loop inData.close();//Close file
//Now write the output file DataOutputStream dataOut = new DataOutputStream( new FileOutputStream( subjAndHtmlPhraseFile));
//Use an Iterator object to access the data // in the TreeSet object. Iterator iter = subjAndHtmlPhraseList. iterator();
      while(iter.hasNext()){
        data = (String)iter.next();
        dataOut.writeBytes(data + "\n");
      }//end while
dataOut.close();
}catch(Exception e){e.printStackTrace();} }//end makeSubjAndHtmlPhraseList //===========================================//
//Purpose: To create a TreeSet object // containing words and phrases used to screen // the raw BODY text. See notes above // regarding the list used to screen the // Subject line of each message in the method // named makeSubjAndHtmlPhraseList.
//It is important to maintain these two lists // as separate lists. Because of the much // larger number of characters in the body than // in the Subject, false alarms are much more // likely in the body. Therefore, individual // words that work well when screening the // Subject line may produce false alarms when // screening the body. For example, the word // PORN appears in the word IMPORTANT. It is // much more likely that the word IMPORTANT // will appear somewhere in the body than in // the Subject line (although it may appear in // the Subject line as well, thus producing a // false alarm in both cases). Also, the word // ANTIVIRUS works well in the Subject, but // cannot be used to screen the body because // many servers insert that word into the // message header after they test the message // for viruses. Also, IP addresses and URLs // work well in the body, but rarely appear in // the Subject. Therefore, testing the Subject // against a long list of URLs simply wastes // time.
//The following words (among others) should not // be added to the list for the reasons given:
//PORN may be confused with IMPORTANT //SPAM causes lots of false alarms. I inserted // a space as in "SPAM " to decrease false // alarms. Will probably also decrease valid // hits. //ANTIVIRUS appears in some valid message hdrs //WEIGHT often appears in messages regarding // html fonts //SLUT may be confused with SOLUTION //==End of prohibited list==
private void makeRawTextPhraseList(){ rawTextPhraseList = new TreeSet();
//Read word list from text file and populate // the TreeSet object. try{ BufferedReader inData = new BufferedReader(new FileReader( rawTextPhraseFile)); String data; //temp holding area
while((data = inData.readLine()) != null){ rawTextPhraseList.add(data. toUpperCase());
}//end while loop inData.close();//Close file
//Now write the output file DataOutputStream dataOut = new DataOutputStream( new FileOutputStream( rawTextPhraseFile));
//Use an Iterator object to access the data // in the TreeSet object. Iterator iter = rawTextPhraseList. iterator();
      while(iter.hasNext()){
        data = (String)iter.next();
        dataOut.writeBytes(data + "\n");
      }//end while
dataOut.close();
}catch(Exception e){e.printStackTrace();} }//end makeRawTextPhraseList //===========================================//
//This method screens the Subject line against // an upper-case version of a list of offensive // words and phrases, returning the number of // hits up to a limit of hitLimit. An exact // match is not required. Rather, the // characters in the offensive phrase in the // Subject may be separated by as many as one // extraneous character. private int screenForOffensiveSubject( int hitCount,String subject){ int matchLocation = -1; Iterator iterator = subjAndHtmlPhraseList.iterator(); while(iterator.hasNext()){ String offensivePhrase = ((String)(iterator.next())). toUpperCase(); if(!(offensivePhrase.equals(""))){ //First try for an exact match because it // is fastest and less prone to false // positives. Award two hits for a // successful exact match. matchLocation = subject.toUpperCase(). indexOf(offensivePhrase);
if(matchLocation != -1){ //An exact match was found. Award one // hit for the exact match and another // hit later for a match of either // type. hitCount++; }else{ //There was no exact match. //Search for a match between the words // and phrases in the // subjAndHtmlPhraseList and the // Subject line allowing for one // extraneous character between the // characters in the Subject line. matchLocation = matchPhrase( subject.toUpperCase(), offensivePhrase,1); }//end else onmatchLocation != -1) }//end if!(offensivePhrase.equals("")
if(matchLocation != -1){ //A match was found. hitCount++; if(hitCount >= hitLimit){ return hitCount; }//end if }//end if matchLocation != -1 }//end while iterator has next return hitCount;//with hitCount < hitLimit
}//end screenForOffensiveSubject //===========================================//
//This method extracts an HTML code block, if // it exists from an incoming string. Then it // converts that block into clean text free of // all manifestations of HTML. //Then it screens the clean text against a list // of offensive words and phrases looking for // exact matches. It returns when the value of // hitCount is equal to hitLimit or when the // end of the clean text is reached, whichever // occurs first. private int screenForOffensiveHtml( String msgString,int hitCount){
String cleanString = getCleanHtmlString( msgString);
if(cleanString != null){ //Screen the cleanString for offensive // words and phrases. Require an exact // match. int indexOfOffensivePhrase = 0; Iterator iterator = subjAndHtmlPhraseList.iterator(); while(iterator.hasNext()){ String offensivePhrase = ((String)(iterator.next())). toUpperCase(); if(!(offensivePhrase.equals(""))){ indexOfOffensivePhrase = cleanString. indexOf(offensivePhrase); if(indexOfOffensivePhrase != -1){ //An exact match was found. Award // two hits for an exact match // because it is less prone to false // positives than a match with // intervening extraneous characters. hitCount++; hitCount++;
if(hitCount >= hitLimit){ return hitCount; }//end }//end if(indexOfOffensivePhrase != -1) }//end if!(offensivePhrase.equals("") }//end while iterator has next }//end if(cleanString != null) else{//cleanString == null return hitCount;//with hitCount < hitLimit }//end cleanString == null return hitCount;//required by compiler }//end screenForOffensiveHtml(); //===========================================//
  //Removes newline characters from an incoming
  // String object.
  String removeNewLine(String msgString){
    StringBuffer stringBuf =
                     new StringBuffer(msgString);
    int index = 0;
    while(index > -1){
      index = stringBuf.lastIndexOf("\n");
      if(index > -1){
        stringBuf.delete(index,index+1);
      }//end if
    }//end while
    return new String(stringBuf);
  }//end removeNewLine()
  //===========================================//
  //Method screens a String containing the entire
  // raw text for a message against offensive
  // words and phrases. Hopefully a match will
  // have been found in one of the faster
  // processes performed before this one. Allows
  // one extraneous character between each of the
  // characters in the offending word or phrase.
  private int screenForOffensiveRawText(
                  String msgString,int hitCount){
int indexOfOffensivePhrase = 0; Iterator iterator = rawTextPhraseList.iterator(); while(iterator.hasNext()){ String offensivePhrase = ((String)(iterator.next())). toUpperCase(); if(!(offensivePhrase.equals(""))){ //First try for an exact match because it // is faster and less prone to false // positives. Award two hits for an // exact match. indexOfOffensivePhrase = msgString.toUpperCase().indexOf( offensivePhrase); if(indexOfOffensivePhrase != -1){ //An exact match was found. Award one // hit for the exact match and another // hit later for a match of either // type. hitCount++; }else{ //An exact match was not found. Try // for a match with intervening // extraneous characters, which is more // prone to false positives. indexOfOffensivePhrase = matchPhrase( msgString,offensivePhrase,1); }//end else on indexOfOffensivePhrase !=.
if(indexOfOffensivePhrase != -1){ //A match was found of one type or the // other. hitCount++;
if(hitCount >= hitLimit){ return hitCount; }//end }//end if(hitCount >= hitLimit) }//end if!(offensivePhrase.equals("") }//end while iterator has next return hitCount;//with hitCount < hitLimit }//end screenForOffensiveRawText //===========================================//
//This method gets and returns a string // extracted from HTML text. Various features // are used to make the string as useful as // practical consistent with speedy operation. private String getCleanHtmlString( String msgString){ String cleanString = removeTags(msgString);
    if(cleanString != null){
      cleanString = repNbsp(cleanString);
      if(cleanString != null){
        cleanString = remEntities(cleanString);
        if(cleanString != null){
          cleanString = remEquals(cleanString);
          if(cleanString != null){
            cleanString = remTabs(cleanString);
            if(cleanString != null){
              cleanString = remMultipleSpaces(
                                    cleanString);
            }//end if(cleanString != null){
          }//end if(cleanString != null){
        }//endif(cleanString != null){
      }//endif(cleanString != null){
    }//end if(cleanString != null)
    //Returns null if any cleanup step left
    // nothing to screen.
    return cleanString;
}//end method getCleanHtmlString
//===========================================//
  //This method determines if a message
  // contains HTML and removes all tags. If
  // there is no HTML in the message text, it
  // returns null.
  private String removeTags(String msgString){
    int isHtml = -1;
    int startIndex = -1;
    int endIndex = -1;

    //Search for clues that the message
    // contains HTML.
    isHtml = msgString.indexOf("<HTML");
    if(isHtml == -1)
      isHtml = msgString.indexOf("<BODY");
    if(isHtml == -1)
      isHtml = msgString.indexOf("<FONT");
    if(isHtml == -1)
      isHtml = msgString.indexOf("<DIV");
    if(isHtml == -1)
      isHtml = msgString.indexOf("<STRONG");
    if(isHtml == -1)
      isHtml = msgString.indexOf("<BR");
    if(isHtml == -1)
      isHtml = msgString.indexOf("<TABLE");
    if(isHtml == -1)
      isHtml = msgString.indexOf("<SPAN");
    if(isHtml == -1)
      isHtml = msgString.indexOf("<UL");
    if(isHtml == -1)
      isHtml = msgString.indexOf("<OL");
    if(isHtml == -1)
      isHtml = msgString.indexOf("<P>");

    if(isHtml != -1){
      //Msg contains HTML but not in very good
      // form since it is missing the matching
      // HTML tags.

      //Eliminate as much of the header as
      // possible by finding the location of
      // the last identifiable item in the
      // message header and discarding
      // everything prior to that point.

      int tempIndex = -1;
      startIndex = -1;
      String line = "";

      //Create an array of valid header lines.
      String[] headerLines = {
        "STATUS:",
        "X-MAILSCANNER:",
        "X-MAILSCANNER-INFORMATION:",
        "X-MSMAIL-PRIORITY:",
        "X-PRIORITY:",
        "X-MAILER:",
        "DATE:",
        "SUBJECT:",
        "REPLY-TO:",
        "FROM:",
        "MESSAGE-ID:",
        "RECEIVED:"
      };//end array definition

      for(int cnt = 0;
                 cnt < headerLines.length;cnt++){
        tempIndex = msgString.lastIndexOf(
                               headerLines[cnt]);
        if(tempIndex > startIndex){
          //Save the larger index value
          startIndex = tempIndex;
          //Save corresponding header line
          line = headerLines[cnt];
        }//end if
      }//end for loop

      if(startIndex != -1){
        //Use that header line to eliminate
        // everything prior from the message
        // header.
        msgString = msgString.substring(
                                     startIndex);
      }//end if(startIndex != -1)
    }//end if(isHtml != -1)

    //Process the string if it contains HTML
    if(isHtml != -1){
      //msgString has been determined to contain
      // HTML.
      //Insert a dummy first character to ensure
      // that the first character is not the
      // beginning of a tag.
      msgString = "X" + msgString;
      int leftIndex=0;
      int rightIndex=0;
      String outputString = "";
      while(leftIndex != -1){
        leftIndex = msgString.indexOf(
                                 '<',rightIndex);
        if((leftIndex != -1)
                         && (rightIndex != -1)){
          outputString += msgString.substring(
                         rightIndex+1,leftIndex);
          rightIndex = msgString.indexOf(
                                  '>',leftIndex);
          //Have to deal with missing > char,
          // particularly for truncated messages.
          if(rightIndex >
                       (msgString.length() - 2)){
            //Don't try to process the last few
            // characters when left and right
            // angle brackets don't match.
            break;
          }//end if(rightIndex > (msgString...
          if(rightIndex == -1){
            //Create an artificial right angle
            // bracket to replace the missing
            // one.
            rightIndex = leftIndex + 1;
          }//end if(rightIndex == -1)
        }//end ((leftIndex != -1) && ...
      }//end while(leftIndex != -1)
      //Get text at the tail end.
      if((rightIndex + 1) < msgString.length()){
        outputString += msgString.substring(
                                   rightIndex+1);
      }//end if
      if(outputString.equals("")){
        //msgString contained HTML, but it was
        // all removed in the cleanup process.
        // The output string is empty.
        return null;
      }else{
        //Return the string produced by removing
        // HTML material from msgString.
        return outputString;
      }//end else
    }//end if(isHtml != -1)
    else{
      //Apparently msgString doesn't contain
      // HTML.
      return null;
    }//end else
  }//end removeTags
  //===========================================//
  //Purpose of this method is to replace all
  // occurrences of "&NBSP;" with " "
  private static String repNbsp(
                               String msgString){
    StringBuffer stringBuf =
                     new StringBuffer(msgString);
    int index = 0;
    while(index > -1){
      index = stringBuf.lastIndexOf("&NBSP;");
      if(index > -1){
        stringBuf.replace(index,index+6," ");
      }//end if
    }//end while
    String outputString = new String(stringBuf);
    if(outputString.equals("")){
      return null;
    }else{
      return outputString;
    }//end else
  }//end repNbsp()
  //===========================================//
  //Removes entities from an HTML body identified
  // by the string &...; Converts those entities
  // that represent English language characters
  // and punctuation (32 - 126) to the
  // corresponding character and inserts it into
  // the message text.
  private static String remEntities(
                               String msgString){

    //Insert a dummy first character
    msgString = "X" + msgString;
    int leftIndex=0;
    int rightIndex=0;
    String outputString = "";
    while(leftIndex != -1){
      leftIndex = msgString.indexOf(
                                 '&',rightIndex);
      if((leftIndex != -1)
                         && (rightIndex != -1)){

        if(leftIndex > rightIndex){
          outputString += msgString.substring(
                         rightIndex+1,leftIndex);
        }//end if
        rightIndex = msgString.indexOf(
                                  ';',leftIndex);

        if((leftIndex != -1)
                         && (rightIndex != -1)){
          String extract = msgString.substring(
                       leftIndex,rightIndex + 1).
                                   toUpperCase();

          //Make sure we didn't extract good text
          // by accident. Apparently real entity
          // cannot contain more than seven
          // characters, as in &#nnnn; Remove
          // spaces before making the test.
          if(remSpaces(remEquals(
                remTabs(extract))).length() > 6){
            //Apparently not an entity. Put it
            // back in the text.
            outputString += extract;
          }//end if(rightIndex-leftIndex > 6)
          else{
            //Remove any spaces prior to further
            // processing
            extract = remSpaces(remEquals(
                               remTabs(extract)));
          }//end else

          //Convert English language character
          // entities to characters and insert
          // them in the text.
          //Don't try to restore HEX
          // representations at this time. Maybe
          // add that later. Ignore extracted
          // sequences longer than six
          // characters.
          try{
            if((extract.charAt(1) == '#')
                 && (extract.charAt(2) != 'X')
                 && (extract.length() <=6)){
              //Get the internal characters of
              // the entity.
              String strValue = extract.
                  substring(2,extract.length()-1);
              //Try to convert to a numeric char
              // type. May throw an exception.
              char theChar =
                 (char)Integer.parseInt(strValue);
              //Ignore all but English language
              // characters and punctuation.
              if((theChar >= 32)
                            && (theChar <= 126)){
                char[] charArray = {theChar};
                String theStr = new String(
                        charArray).toUpperCase();
                outputString += theStr;
              }//end ((theChar >= 32) && ...
            }//end ((extract.charAt(1) == ..
          }catch(NumberFormatException ex){
            //Ignore it. It is apparently a
            // badly formed entity.
          }//end catch
        }//end if((leftIndex != -1) && ...

        //Have to deal with missing ; char.
        if(rightIndex >
                       (msgString.length() - 2)){
          //Don't try to process the last few
          // characters when the & and ;
          // characters aren't matching.
          break;
        }//end if(rightIndex > (msgString...
        if(rightIndex == -1){
          //Create an artificial right ; char
          // to replace the missing one.
          rightIndex = leftIndex + 1;
        }//end if(rightIndex == -1)
      }//end if((leftIndex != -1)...
    }//end while(leftIndex != -1)
    //Get the text at the tail end
    if((rightIndex + 1) < msgString.length()){
      outputString += msgString.substring(
                                   rightIndex+1);
    }//end if((rightIndex + 1)< msgString...

    if(outputString.equals("")){
      //The entire string was apparently made up
      // of entities. It is empty now.
      return null;
    }else{
      return outputString;
    }//end else
  }//end remEntities
  //===========================================//
  //Method removes all '=' characters from an
  // incoming string.
  private static String remEquals(
                               String msgString){
    StringBuffer stringBuf =
                     new StringBuffer(msgString);
    int index = 0;
    while(index > -1){
      index = stringBuf.lastIndexOf("=");
      if(index > -1){
        stringBuf.delete(index,index+1);
      }//end if
    }//end while
    String outputString = new String(stringBuf);
    if(outputString.equals("")){
      return null;
    }else{
      return outputString;
    }//end else
  }//end remEquals()
  //===========================================//
  //Method removes tab characters from an
  // incoming string.
  private static String remTabs(
                               String msgString){
    StringBuffer stringBuf =
                     new StringBuffer(msgString);
    int index = 0;
    while(index > -1){
      //Search for the tab escape sequence, not
      // the letter t.
      index = stringBuf.lastIndexOf("\t");
      if(index > -1){
        stringBuf.delete(index,index+1);
      }//end if
    }//end while
    String outputString = new String(stringBuf);
    if(outputString.equals("")){
      return null;
    }else{
      return outputString;
    }//end else
  }//end remTabs()
  //===========================================//
  //Method removes space characters from an
  // incoming string.
  private static String remSpaces(
                               String msgString){
    StringBuffer stringBuf =
                     new StringBuffer(msgString);
    int index = 0;
    while(index > -1){
      index = stringBuf.lastIndexOf(" ");
      if(index > -1){
        stringBuf.delete(index,index+1);
      }//end if
    }//end while
    String outputString = new String(stringBuf);
    return outputString;
  }//end remSpaces()
  //===========================================//
  //Method converts all multiple spaces to a
  // single space. This is not ideal. If there
  // are multiple spaces within a word, all but
  // one of the spaces will be removed, leaving
  // one extraneous space in the word.
  private static String remMultipleSpaces(
                               String msgString){
    StringBuffer stringBuf =
                     new StringBuffer(msgString);
    int index = 0;
    while(index > -1){
      //Search for a pair of adjacent spaces and
      // delete one of them.
      index = stringBuf.lastIndexOf("  ");
      if(index > -1){
        stringBuf.delete(index,index+1);
      }//end if
    }//end while
    String outputString = new String(stringBuf);
    if(outputString.equals("")){
      return null;
    }else{
      return outputString;
    }//end else
  }//end remMultipleSpaces()
  //===========================================//
  //This method is called to decode a Subject
  // line.
  //Sometimes the Subject line is encoded using
  // techniques designed to allow the use of
  // non-ASCII characters in message headers
  // (See RFC2047).
  //The following code determines if the Subject
  // line has been encoded using the ISO-8859-1
  // character set with an encoding value of B or
  // Q. If so, the encoded material is decoded.
  //Messages with an encoding value of Q contain
  // a mixture of ASCII characters and encoded
  // characters, so it is possible to partially
  // read them without the need for decoding.
  // They also sometimes use an underscore in
  // place of a space to make them more readable.
  private String decodeSubj(String data){
    try{
      if(data.toUpperCase().indexOf(
                      "=?ISO-8859-1?B?") != -1){
        //Need to decode for value of B.
        int startIndex = data.toUpperCase().
                 indexOf("=?ISO-8859-1?B?") + 15;
        int endIndex = data.length()-2;
        sun.misc.BASE64Decoder dec =
                    new sun.misc.BASE64Decoder();
        data = "Subject: " + "=?ISO-8859-1?B? "
          + new String(dec.decodeBuffer(
            data.substring(startIndex,endIndex)));
      }//end if..."=?ISO-8859-1?B?"

      if(data.toUpperCase().indexOf(
                      "=?ISO-8859-1?Q?") != -1){
        //Need to decode for value of Q.
        int startIndex = data.toUpperCase().
                 indexOf("=?ISO-8859-1?Q?") + 15;
        int endIndex = data.length()-2;
        String decodedData = data.substring(
                            startIndex,endIndex);

        //Decode non-ASCII characters
        StringBuffer stringBuf =
                   new StringBuffer(decodedData);
        int index = 0;
        while(index > -1){
          index = stringBuf.lastIndexOf("=");
          if(index > -1){
            String hexString =
                new String(stringBuf).substring(
                                 index+1,index+3);
            char decodedChar =
                (char)Integer.parseInt(
                             hexString.trim(),16);
            stringBuf.delete(index,index+3);
            stringBuf.insert(index,decodedChar);
          }//end if
        }//end while(index > -1)

        //Replace underscore with space.
        index = 0;
        while(index > -1){
          index = stringBuf.lastIndexOf("_");
          if(index > -1){
            stringBuf.deleteCharAt(index);
            stringBuf.insert(index,' ');
          }//end if
        }//end while(index > -1)

        data = "Subject: " +"=?ISO-8859-1?Q? "
                         + new String(stringBuf);
      }//end if..."=?ISO-8859-1?Q?"
    }catch(Exception ex){ex.printStackTrace();}
    return data;
  }//end decodeSubj
  //===========================================//
  //Expand base64 data in msg body.
  private String decodeBody(String data){
    String decodedData = "";
    int currentPartIndex;
    int nextPartIndex;
    try{
      if(data.toUpperCase().indexOf(
          "Content-Transfer-Encoding: base64".
                          toUpperCase()) != -1){
        //This message has base64 encoding
        if((data.toUpperCase().indexOf(
              "Content-Type: text/html".
                          toUpperCase()) != -1)
            && (data.toUpperCase().indexOf(
              "Content-Type: multipart".
                          toUpperCase()) == -1)){
          //This is a non-multipart message with
          // base64 encoding.
          //Locate the end of the header.
          int base64Index = data.indexOf(
                                      "Status:");
          if(base64Index != -1){
            int crIndex = data.indexOf(
                              "\n",base64Index);
            String tempStr = data.substring(
                      crIndex+2,data.length());
            sun.misc.BASE64Decoder dec =
                  new sun.misc.BASE64Decoder();
            decodedData = "Start base64 "
                + new String(
                      dec.decodeBuffer(tempStr))
                + " End base64";
          }//end if(base64Index != -1)
        }//end if((data.toUpperCase().indexOf(...
        else{
          int boundaryIndex = data.indexOf(
                                    "boundary=");
          int newLineIndex = data.indexOf(
                            "\n",boundaryIndex);

          if(boundaryIndex != -1){
            String multipartCode =
                data.substring(
                  boundaryIndex+10,newLineIndex-1);
            nextPartIndex = data.indexOf(
                   multipartCode,newLineIndex+1);
            while(nextPartIndex != -1){
              int base64Index = data.indexOf(
                  "Content-Transfer-Encoding: "
                      + "base64",nextPartIndex);
              currentPartIndex = nextPartIndex;
              nextPartIndex = data.indexOf(
                  multipartCode,nextPartIndex+1);
              if((base64Index != -1)
                  && (base64Index
                               < nextPartIndex)){

                //Don't process .gif or .jpg file
                // attachments
                String partBody = data.substring(
                    currentPartIndex,
                    nextPartIndex).toUpperCase();
                if((partBody.indexOf(".GIF")
                                          == -1)
                    && (partBody.indexOf(
                                ".JPG") == -1)){
                  //gif image not found. Process
                  // the data
                  int crIndex = data.indexOf(
                              "\n",base64Index);

                  //Search for the required blank
                  // line preceding the block
                  // of base64 data
                  //Prevent infinite loop on bad
                  // data
                  int count = 0;
                  char firstChar = data.charAt(
                                      crIndex+1);
                  while((firstChar != '\n')
                              && (count < 100)){
                    crIndex = data.indexOf(
                                "\n",crIndex+1);
                    firstChar = data.charAt(
                                      crIndex+1);
                    count++;
                  }//end while

                  String tempStr = data.substring(
                        crIndex+2,nextPartIndex);
                  sun.misc.BASE64Decoder dec =
                    new sun.misc.BASE64Decoder();
                  decodedData += new String(
                      dec.decodeBuffer(tempStr));
                  decodedData += "\n\n-----End "
                      + "base64 part-----\n\n";
                }//end if(partBody.toUpperCa...
                else{
                  decodedData += "-----Image "
                          + "stripped off-----";
                }//end else
              }//end if(base64Index != -1)
              else{
                if(nextPartIndex != -1){
                  decodedData += data.substring(
                               currentPartIndex,
                               nextPartIndex);
                  decodedData += "\n\n-----End "
                    + "non-base64 part-----\n\n";
                }//end if(nextPartIndex != -1)
              }//end else
            }//end while loop on nextPartIndex...
          }//end if(boundaryIndex != -1)
        }//end else
        return decodedData;
      }//end if(data.toUpperCase().indexOf("Co...
      else{
        //This msg does not have base64 encoding
        return data;
      }//end else
    }catch(Exception ex){ex.printStackTrace();}
    return "Make Compiler Happy";
  }//end decodeBody
  //===========================================//
}//end class BigDog02SpamScreen01
Listing 7
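The decodeSubj and decodeBody methods in the listing above depend on sun.misc.BASE64Decoder, an internal JDK class that is not available in recent Java releases. The following is a minimal sketch, assuming Java 8 or later, of how the same two base64 steps might be written with the public java.util.Base64 class instead. The class name Base64Sketch and the method names decodeEncodedWord and decodeBodyPart are hypothetical and do not appear in the BigDog programs. Unlike decodeSubj, the sketch locates the closing "?=" marker rather than assuming it occupies the last two characters of the line.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.Locale;

public class Base64Sketch {

  // Hypothetical helper: decodes the first
  // "=?ISO-8859-1?B?...?=" encoded word in a Subject line.
  // Assumes the header contains only ASCII characters, so
  // upper-casing does not shift any index positions.
  static String decodeEncodedWord(String subject) {
    String marker = "=?ISO-8859-1?B?";
    int wordStart =
        subject.toUpperCase(Locale.ROOT).indexOf(marker);
    if (wordStart == -1) {
      return subject; // nothing to decode
    }
    int payloadStart = wordStart + marker.length();
    int payloadEnd = subject.indexOf("?=", payloadStart);
    if (payloadEnd == -1) {
      return subject; // truncated encoded word, leave as-is
    }
    byte[] bytes = Base64.getDecoder().decode(
        subject.substring(payloadStart, payloadEnd));
    return subject.substring(0, wordStart)
        + new String(bytes, StandardCharsets.ISO_8859_1)
        + subject.substring(payloadEnd + 2);
  }

  // Hypothetical helper: decodes one base64 body part. The
  // MIME decoder skips line separators and other characters
  // outside the base64 alphabet.
  static String decodeBodyPart(String base64Text) {
    byte[] bytes = Base64.getMimeDecoder().decode(base64Text);
    return new String(bytes, StandardCharsets.ISO_8859_1);
  }

  public static void main(String[] args) {
    // prints "Subject: Hello World"
    System.out.println(decodeEncodedWord(
        "Subject: =?ISO-8859-1?B?SGVsbG8gV29ybGQ=?="));
    // prints "Hello World"
    System.out.println(
        decodeBodyPart("SGVsbG8g\r\nV29ybGQ=\r\n"));
  }
}
```

The basic decoder throws IllegalArgumentException on malformed input, so a caller that must survive damaged messages would need to catch that exception, much as the listing catches Exception around its own decoding.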
File BigDog02RawText.txt
DELIVERY FAILURE DELIVERY NOTIFICATION: DELIVERY STATUS NOTIFICATION EMAIL QUARANTINED DUE TO VIRUS FAILURE NOTICE INBOUND ATTACHMENT REMOVED - ROUTE66 MAIL ADMINISTRATOR MAIL DELIVERY FAILED: RETURNING MESSAGE TO SENDER MAIL DELIVERY SUBSYSTEM MAIL DELIVERY SYSTEM MAIL SYSTEM ERROR - RETURNED MAIL MAIL TRANSACTION FAILED MAILER-DAEMON NAV DETECTED A VIRUS IN A DOCUMENT YOU AUTHORED RETURNED MAIL RETURNED MAIL: SEE TRANSCRIPT FOR DETAILS RETURNED MAIL: USER UNKNOWN SYSTEM ADMINISTRATOR TO SENDER VIRUS FOUND AND ACTION TAKEN. UNABLE TO DELIVER YOUR MESSAGE UNDELIVERABLE MAIL UNDELIVERABLE: USUARIO INEXISTENTE / USER DOES NOT EXIST VIRUS FOUND IN A MESSAGE YOU SENT VIRUS FOUND IN SENT MESSAGE WARNING: COULD NOT SEND MESSAGE WARNING: E-MAIL VIRUSES DETECTED
Listing 8 BigDog02BadList.txt
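How the programs read and apply this phrase list is not shown in this section. As a rough sketch only, the following code assumes that the file holds one uppercase phrase per line and that a message is flagged when its uppercased Subject line contains any of those phrases. The class name BadListSketch and the methods loadPhrases and isOnBadList are hypothetical and are not taken from the BigDog source code.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

public class BadListSketch {

  // Hypothetical loader: assumes one phrase per line and
  // UTF-8 text. Blank lines are skipped.
  static List<String> loadPhrases(Path file)
      throws IOException {
    List<String> phrases = new ArrayList<>();
    for (String line : Files.readAllLines(file)) {
      String phrase = line.trim().toUpperCase(Locale.ROOT);
      if (!phrase.isEmpty()) {
        phrases.add(phrase);
      }
    }
    return phrases;
  }

  // Assumed matching rule: a case-insensitive substring
  // match of any listed phrase against the Subject line.
  static boolean isOnBadList(String subject,
                             List<String> phrases) {
    String upper = subject.toUpperCase(Locale.ROOT);
    for (String phrase : phrases) {
      if (upper.contains(phrase)) {
        return true;
      }
    }
    return false;
  }

  public static void main(String[] args) {
    // Two phrases taken from the list above.
    List<String> phrases = Arrays.asList(
        "RETURNED MAIL: USER UNKNOWN", "MAILER-DAEMON");
    // prints "true"
    System.out.println(isOnBadList(
        "Subject: Returned mail: User unknown", phrases));
    // prints "false"
    System.out.println(isOnBadList(
        "Subject: Lunch on Friday?", phrases));
  }
}
```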
Copyright 2004, Richard G. Baldwin. Reproduction in whole or in part
in any form or medium without express written permission from Richard
Baldwin is prohibited.
About the author
Richard Baldwin
is a college professor (at Austin Community College in Austin, TX) and
private consultant whose primary focus is a combination of Java, C#,
and XML. In addition to the many platform and/or language independent
benefits of Java and C# applications, he believes that a combination of
Java, C#, and XML will become the primary driving force in the delivery
of structured information on the Web.
Richard has participated in numerous consulting projects, and he
frequently provides onsite training at the high-tech companies located
in and around Austin, Texas. He is the author of Baldwin’s
Programming Tutorials, which
has gained a worldwide following among experienced and aspiring
programmers. He has also published articles in JavaPro magazine.
Richard holds an MSEE degree from Southern Methodist University
and has many years of experience in the application of computer
technology to real-world problems.
Baldwin@DickBaldwin.com
-end-