Enlisting Java in the War Against SPAM: The Communications Module

Java Programming Notes # 2150


Preface


This is the first lesson in a series designed to teach you how to write
a Java program to remove SPAM from your Email server before it is
downloaded into your primary Email client.

The communications module

This lesson explains the communications module used to communicate with
your Email server, and to remove SPAM messages from the
server.

SPAM screening algorithm

The program is designed to allow you to use my SPAM screening
algorithm, or to invent your own.  Subsequent lessons will explain
the
inner workings of my SPAM screening algorithm.  You can use my
algorithm as a starting point if you decide to invent your own. 
Those lessons will
also explain how the system can be trained to do an increasingly better
job of screening SPAM over time.

Viewing tip

You may find it useful to open another copy of this lesson in a
separate browser window.  That will make it easier for you to
scroll back and forth among the different listings and figures while
you are reading about them.

Supplementary material

I recommend that you also study the other lessons in my extensive
collection of online Java tutorials.  You will find those lessons
published at Gamelan.com.
However, as of the date of this writing, Gamelan doesn’t maintain a
consolidated index of my Java tutorial lessons, and sometimes
they are difficult to locate there.  You will find a consolidated
index at www.DickBaldwin.com.

Preview

Can you write better SPAM screening algorithms?

Did you ever think that you might be able to write better SPAM
screening algorithms than those available in the SPAM screening
software that you are now using?  If so, this lesson is for you.

Even if that is not the case, like most of us, you are probably
overwhelmed by SPAM
and therefore you may find this lesson interesting.

Remove SPAM from the server

In this lesson, I will show you how to write a Java program
that supplements the SPAM screening software that you are currently
using.  This program is used to identify and remove SPAM from your
Email server before it is downloaded into your primary Email client.

Any SPAM that makes it past this program can be further acted upon
by the SPAM screener that is built into your Email client.

The communications module

This series will consist of four lessons.  This lesson, which
is the first in the series,
will explain the communications module used to communicate with
your Email server, and to remove SPAM messages from the
server.

As mentioned earlier, the program is designed to allow you to invent
and implement your own
SPAM screening algorithm in addition to, or as an alternative to my
algorithm.

My algorithm and algorithm training programs

The second lesson will explain the inner
workings of my SPAM screening algorithm.  My algorithm operates
separately on the Subject line, the From line,
and the body text of each Email message.

The third lesson will explain a companion program designed to make
use of historical data to easily train the algorithm to do a better job
of identifying SPAM based on the Subject of the message.

The fourth lesson will explain another companion program designed to
make use of historical data to easily train the algorithm to do a
better job of identifying SPAM based on the body text of
the message, which includes the From line.

Effectiveness of my algorithm

At this point in time, after about one week of training, my
algorithm reliably identifies about ninety percent of all SPAM and
allows me to delete it
from my Email server before downloading it into my primary Email
client.  Only time will tell if that percentage improves in the
future.

Discussion and Sample Code


Stripped down Screen class

The version of the program that I will discuss in this lesson has a
stripped-down version of a class named Screen. This version of
the program
allows for testing the communications module on your system with your
Email server
without doing any actual screening for SPAM and without deleting any
messages from
the server.

I will explain the full version of the class named Screen in
the next lesson when I explain my algorithm for identifying SPAM.

Purpose of the program

The purpose of this program is to read messages from a POP3 (Post
Office Protocol – Version 3)
server, to
analyze the messages according to a set of screening rules, and to
delete those messages from the server that fail the screening test.

(As written, the program asks the user to confirm the
deletion of each message from the server, but this confirmation step
could easily be
removed if you decide to do so.)

Key words and phrases

This version of the program screens for SPAM on the basis of key words
or
phrases in the From line, key words or phrases in the Subject
line, and
key words or phrases in the body text.

Friendly Email addresses and subjects

A list of friendly Email addresses and friendly subjects is used to
screen the From
line and the Subject line.  Messages that are from
friendly Email addresses, and messages that have known good Subject
lines are not
deleted from the server and no information about those messages is
saved on the local disk. They are simply ignored after determining that
they are friendly.

Different lists for Subject and body text

Different lists of words and phrases are used for screening Subject
lines and body text for SPAM. This is important because the same set of
words
and phrases can’t always be used for both cases.

For example, the word ANTIVIRUS is appropriate for screening the
Subject line, but is not appropriate for screening the
body text. The
word ANTIVIRUS often appears legitimately in the header of Email messages
that have been scanned for viruses by the server, but also often
appears in the Subject line of SPAM messages.

Common spammer tricks are defeated

Several common spammer tricks are defeated by my SPAM screening
algorithm.

For example, the common spammer trick of inserting extra characters
between the
characters in an offending word or phrase is defeated.  The
common trick of mixing the case of the
characters in an offending word or phrase is also defeated.

As a specific example, my algorithm will recommend deletion of any
message having
any of the following in its Subject line or its body
text if the word VIAGRA is included in the lists used to screen for
SPAM:

vIaGrA
V.IagRA
V.I.A.G.R.A

These two characteristics alone have a significant positive
impact on the effectiveness of training the algorithm to do a better
job of
identifying SPAM in the future.

My algorithm also defeats the common trick of appending random
characters to the end of the Subject line, because it
doesn’t require a
match for the entire Subject line.  Rather, it
searches for words
or phrases internal to the text of the Subject line.

The user interface

Figure 1 shows the GUI through which the user controls the program.

Graphical user interface

Figure 1 Graphical User Interface

(Note that this GUI was purposely made narrow in order
to cause it to fit into this narrow publication format.  I
recommend that you increase the width of the Frame to at least 750
pixels, and increase the width of the TextField and TextArea objects to
at least 100 characters each.)

The Offending Phrase

When the program identifies a message that is a candidate for deletion,
the reason for that recommendation is shown in the third text field
from the top in Figure 1.

(An actual SPAM message is being displayed in the
GUI in Figure 1, but the
stripped-down version of the class
named Screen was being used, so no Offending Phrase is
shown in Figure 1.)

Deleting a message from the server

The user confirms that the message should be deleted from the server by
clicking the Delete button
in Figure 1. If the user doesn’t want to delete the message, she should
click the Start/Next button instead.

(Note that the capability to actually delete messages
from the server was disabled in the program shown in Listing 42 near
the end of this lesson.  Make certain that you are ready to
actually delete messages from the server before re-enabling that
capability.)

The Netscape approach to SPAM screening

I currently use Netscape version 7.1 as my Email client. 
Basically, it provides two forms of SPAM screening.  One form,
which is referred to as Junk Mail Controls, is
apparently based on some sort of artificial intelligence.  This
capability can be
trained over time to identify the kinds of messages that you consider
to be junk mail.  This capability is very easy to train. 
However, it
produces lots of false positives and is very difficult to un-train when
that happens.  (I will have more to say
about false positives later.)

The other form of SPAM screening used by Netscape 7.1 is referred to as
Message Filters.  This approach depends on exact
character matching in the subject, the body, or in other parts
of the message, such as sender, date, priority, etc.

In this case, you must enter the exact words or phrases
into a form that will be used for matching purposes.  This
approach is
practically useless for SPAM screening due to the tendency of spammers
to insert random
characters into the offending words and phrases and to randomly modify
the case of the characters in offending words and phrases.  Also,
the process of entering the words and phrases into the form is very
tedious and time consuming.  I long ago gave up on using
Netscape’s Message Filters for SPAM filtering.

False positives

All SPAM screening algorithms are subject to reporting false
positives to some degree.  That is to say, a message may be
erroneously
identified as SPAM when it is actually a good message.

One of the
major problems with my Netscape 7.1 system results from false
positives.  Because of the high rate of false positives produced
by Junk Mail Controls, whenever
a message is identified as SPAM, I must confirm that it is SPAM before
deleting it.  At that point in time, unless I am willing to
actually open the message and to be confronted with a variety of
offensive images and other offensive material, I must make my decision
solely on the basis of the subject and
the from address information.  Often this is not sufficient
information to
make an informed decision and I have no choice but to open the message.

Also, as I mentioned earlier, when Junk Mail Controls does
report a false positive, there is no definitive way to make certain
that it doesn’t happen again in the future.  It is necessary to un-train
the algorithm regarding messages of that type, which can be a long
process, possibly involving many similar occurrences in the future.

More information is available with my system

When a user of my system is required to confirm deletion of a message
from the server, the following information is available to assist in
the making
of the decision:

  • From line
  • Subject line
  • Offending line of text, which may or may not be the subject
  • Offending word or phrase in the offending line of text
  • Entire raw text of the message down to and including the
    offending line

No images are rendered in my system, so it is not necessary for the
user to view offending images in order to make the decision to delete.

Having viewed the above information, if the user is still unable to
make
an informed decision to delete the message, the user still has the
option to let the message pass through and be downloaded into the
primary Email client.  Once having viewed the message later in the
primary Email client, the user still has the option of updating the
offending word lists in my system with IP addresses, URLs, etc, so that
deletion decisions on future similar messages will be easier to make.

Saved in local archive folder

The raw text of every message that is identified as a candidate for
deletion from the server is saved in an archive folder on the local
disk, regardless of whether the user elects to delete it from the
server or not. Thus, if a message is deleted from the server and it is
later determined that this was a mistake, a raw text copy of the
deleted message is available locally in the archive folder.

(You should probably empty this folder periodically so
that it won’t fill up your disk.)

Saved in history folder

Except for messages from friendly Email addresses or messages with
friendly Subject lines, all messages that
are not identified as candidates for deletion from the server are saved
in a history
folder on the local disk.  These messages are used later to train
the algorithm to do a better job of identifying SPAM in the
future.  I will explain this process in Part 3 and Part 4 of this
series of lessons.

Protection against viruses

Before any message is saved in a local file, asterisks are inserted
into the text at ten-character intervals in an attempt to destroy any
virus code that may be embedded in the message.
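
The exact placement of the asterisks is not shown in this part of the lesson, so the following is only a minimal sketch of the idea. It assumes that one asterisk is inserted after every ten characters of the raw text. The names DefangSketch and insertAsterisks are hypothetical and are not taken from the program.

```java
final class DefangSketch {

  // Hypothetical helper: inserts an asterisk after every `interval`
  // characters so that any embedded byte sequence is broken up.
  static String insertAsterisks(String raw, int interval) {
    if (interval <= 0) {
      throw new IllegalArgumentException("interval must be positive");
    }
    StringBuilder out = new StringBuilder();
    for (int i = 0; i < raw.length(); i++) {
      // A separator goes in front of characters 10, 20, 30, ...
      if (i > 0 && i % interval == 0) {
        out.append('*');
      }
      out.append(raw.charAt(i));
    }
    return out.toString();
  }

  public static void main(String[] args) {
    // Prints 0123456789*ABCDEFGHIJ*KLM
    System.out.println(insertAsterisks("0123456789ABCDEFGHIJKLM", 10));
  }
}
```

Note that a file written this way is no longer a character-for-character copy of the message, which matters if the saved text is searched later for phrases longer than the interval.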

If a message makes it through the screen and is later identified as
having a virus as an attachment, a series of ten or more bytes can be
extracted from the virus code and added to the word list as an
offending phrase.  This will cause any future messages having that
same virus code as an attachment to be identified as a candidate for
deletion from the server.

Possible upgrades

Numerous upgrades to my system are possible and I’m confident that you
will have ideas that I haven’t thought of.  If so, I would like to
hear about them.

One possible upgrade would be to create a premium list of words and
phrases that will always result in automatic deletion of the message
from the server without prior confirmation by the user. For example,
the user might want any message containing the word VIAGRA to be
deleted automatically.

Be careful with this

However, care is urged in this regard. Certain words such as SPAM
and PORN occasionally occur in a message with the letters separated by
only a few characters.  Depending on the degree of separation, my
algorithm may identify those messages as
being candidates for deletion.

For example, the offending word PORN occurs in the non-offending word
imPORtaNt with the letters R and N separated by only two characters.
The word SLUT appears in the word SoLUTion with only one character
between the S and the L. The word SPAM often occurs in different
variations of body text.

If such a premium word list is used for automatic deletion, it should
probably be restricted to only those situations where the characters in
the word exactly match (except for case) a word in the subject
or the body of the message with no intervening characters separating
the characters in the message.  Experience shows, however, that
very few matches would be made on this basis, so it may not be worth
the effort.

Number of separation characters

Another possible upgrade would be to allow the user to specify the
number of characters that may occur between the letters of an offending
word or phrase in the message.

That value is currently hard-coded into the
program.  As of this writing, that value is set to one for
screening against offending words or phrases.  The value is set to
zero when testing for friendly Email
addresses in the From line and known good data in the Subject
line.

If the number of characters is set to zero, many SPAM messages with
offending words or phrases will avoid detection. If that value is set
to a large number, many false positives will occur. Therefore, care
should be taken when adjusting this value.
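
The full screening algorithm is the subject of the next lesson. As a rough illustration only of how a separation limit behaves, the following sketch builds a case-insensitive regular expression that permits up to maxGap arbitrary characters between consecutive characters of a phrase. The names GapMatchSketch and containsWithGap are hypothetical, and the regular-expression approach is an assumption made for this example, not a description of the code used in the program.

```java
import java.util.regex.Pattern;

final class GapMatchSketch {

  // Returns true if `phrase` occurs anywhere in `text`, ignoring case,
  // with at most `maxGap` extra characters between adjacent characters.
  static boolean containsWithGap(String text, String phrase, int maxGap) {
    StringBuilder regex = new StringBuilder();
    for (int i = 0; i < phrase.length(); i++) {
      if (i > 0) {
        regex.append(".{0,").append(maxGap).append('}');
      }
      // Quote each character so that punctuation is matched literally.
      regex.append(Pattern.quote(String.valueOf(phrase.charAt(i))));
    }
    return Pattern.compile(regex.toString(), Pattern.CASE_INSENSITIVE)
                  .matcher(text)
                  .find();
  }

  public static void main(String[] args) {
    // Separated letters are caught with a gap of one but not zero.
    System.out.println(containsWithGap("Buy V.I.A.G.R.A now", "VIAGRA", 1)); // true
    System.out.println(containsWithGap("Buy V.I.A.G.R.A now", "VIAGRA", 0)); // false
    // A larger gap turns an innocent word into a false positive.
    System.out.println(containsWithGap("This is important", "PORN", 1));     // false
    System.out.println(containsWithGap("This is important", "PORN", 2));     // true
  }
}
```

Under these assumptions, a gap of one already matches SLUT inside the word solution, which is the kind of false positive described earlier.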

Automatic deletion of all SPAM candidates

For the brave among us, another possible modification would be to allow
the program to automatically delete all messages that are determined to
be candidates for deletion.

Since a text version of each of these messages is saved locally in an
archive folder, a separate program could be written to allow the user
to review those messages locally at her convenience, just in case a
valid message was inadvertently deleted from the server.

Training programs

Companion programs that I have written provide for maintaining and
upgrading the offending word and phrase lists.  These lists are
saved in local text files.

These
training programs are used to analyze the non-deleted message files
saved
locally in the history folder in order to train the algorithm to do a
better job of identifying SPAM messages in the future.

These programs
are designed for extreme ease of use to encourage the user to train the
algorithm frequently.  The better the algorithm is trained, the
better it will perform.

I will explain these training programs in Part 3 and Part 4 of this
series of
lessons.

Simple text files

All three word lists are maintained in local text files, which can be
created and edited with an ordinary text editor if need be.  Thus,
if some
corruption gets into one of the word lists, it is easy to correct the
situation using an ordinary text editor.

Technical information on POP3 protocol

For technical information on the POP3 protocol, see RFC 1725 at
http://www.cis.ohio-state.edu/htbin/rfc/rfc1725.html.
I will frequently refer to this document as the technical document
in the discussion that follows.

Command summary

A POP3 command summary based on the technical document is
shown in Figure 2.

Minimal POP3 Commands:
USER name
PASS string
QUIT
STAT
LIST [msg]
RETR msg
DELE msg
NOOP
RSET
QUIT

Optional POP3 Commands:
APOP name digest
TOP msg n
UIDL [msg]

POP3 Replies:
+OK
-ERR
Figure 2

This program uses the commands that are highlighted in red in Figure
2.  I will explain those commands in conjunction with the code
that
uses them.

File names

The following file names are hard-coded into the program.  You may
want to change these file names for your version of the program.

  • Local copy – the file name for a local copy of each message is
    based on the
    unique identifier for that message (UIDL) obtained from the
    mail server.
  • Pop302a.txt – contains a word list for screening the Subject
    lines for offensive words and phrases.
  • Pop302b.txt – contains a word list for screening the body text
    lines for offensive words and phrases.
  • Pop302c.txt – contains a list of friendly Email addresses and
    subjects for
    screening the From and Subject lines to
    identify friendly messages.
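
A POP3 unique identifier may contain any printable ASCII character from 0x21 to 0x7E, so it can include characters such as / or : that are not valid in file names on every platform. The following sketch shows one way to derive a safe file name, assuming that it is acceptable to replace every character outside a conservative set with an underscore. The names UidlFileNameSketch and toFileName and the .txt extension are hypothetical; the program in this lesson may build its file names differently.

```java
final class UidlFileNameSketch {

  // Hypothetical helper: keeps letters, digits, '-', '_' and '.',
  // and replaces every other character with an underscore.
  static String toFileName(String uidl) {
    StringBuilder name = new StringBuilder();
    for (int i = 0; i < uidl.length(); i++) {
      char c = uidl.charAt(i);
      boolean safe = (c >= 'A' && c <= 'Z')
                  || (c >= 'a' && c <= 'z')
                  || (c >= '0' && c <= '9')
                  || c == '-' || c == '_' || c == '.';
      name.append(safe ? c : '_');
    }
    return name + ".txt";
  }

  public static void main(String[] args) {
    // Prints 3f7a_21_b.txt
    System.out.println(toFileName("3f7a/21:b"));
  }
}
```

Two different identifiers can map to the same name under this scheme, so a real program might also need to check whether the file already exists.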

Three classes

This program consists of two main classes and one minor class. An
object of the class named
Pop302 handles all communications with the POP3 server.

A method belonging to an object of the class named Screen is
used to screen each message in an
attempt to identify SPAM.

This class can be totally replaced by Java programmers who wish to
design their own screening algorithm provided that they maintain the
interface with the object of the class named Pop302.

An object of a very simple class named ScreenResult is used as
a wrapper to return several items of information from the screening
method.

Testing

The program was tested using SDK 1.4.2 under WinXP in conjunction with
two different POP3 Email servers.

The class named Pop302

As mentioned earlier, an object of the class named Pop302
handles all communications with the Email server, including the
deletion of messages from the server.  An object of the class
named Screen applies screening rules in an attempt to identify
SPAM.

Stripped-down version of the Screen class

I will explain the class named Pop302 in this lesson, and will
explain the class named Screen in the next lesson.

However, I will provide a stripped-down version of the Screen
class in this lesson.  You can use the stripped-down version to
test Pop302 on your
system with your Email server, but no actual screening for SPAM will
take place.

Will discuss in fragments

I will discuss the program in fragments.  A complete listing of
the program is provided in Listing 42 near the end of the lesson. 
You should be able to copy and paste that listing into your Java IDE to
compile and test the program on your system.

Instance variables

The Pop302 class begins in Listing 1 with the declaration of
several instance variables.  The purpose of these variables will
become clear when I discuss them in conjunction with their use.

class Pop302 extends Frame{
  int msgCounter = 0;
  int msgNumber;
  TextArea textArea;
  TextField subjField;
  TextField fromField;
  TextField operMsgField;
  int numberMsgs = 0;
  String uidl = "";//unique msg ID
  BufferedReader inputStream;
  PrintWriter outputStream;
  Socket socket;
  Screen screener;

Listing 1

As you can see, Pop302
extends Frame.  Therefore, an object of the class Pop302
is a GUI.

The main method

The main method is shown in its entirety in Listing 2.

  public static void main(String[] args){
    if(args.length != 3){
      System.out.println("Usage: java Pop302 "
                   + "server userName password");
      System.exit(0);
    }//end if

    new Pop302(args[0],args[1],args[2]);
  }//end main

Listing 2

When you start this program running, you need to provide the following
information regarding your Email server as command line parameters in
the order shown:

  • server
  • user name
  • password

The main method then instantiates an object of the Pop302
class, passing this information as parameters to the constructor.

The constructor

The Pop302 class consists mainly of the constructor plus a
couple of helper methods.  The constructor code begins in Listing
3.

  Pop302(String server,String userName,
                             String password){
    screener = new Screen(this);

Listing 3

The constructor begins by instantiating an object of the class named Screen,
passing a reference to the Pop302 object as a parameter.

Code in the Screen class uses this reference later to display a
progress indicator in the third text field in Figure 1.

(Note that the stripped-down version of the Screen
class discussed in this lesson doesn’t display the progress
indicator.  You will have to wait until the next lesson to see
that code.)

Get a socket

The code in Listing 4 instantiates a new Socket object on the
standard port for POP3 servers.

    int port = 110; //pop3 mail port
    try{
      socket = new Socket(server,port);

Listing 4

When the constructor for the Socket class returns successfully,
a TCP/IP connection will have been made with port 110 on the Email
server identified as server.

If the attempt to make the connection fails, the program will throw an
exception.  For example, if the value of server is
invalid, the program will throw an UnknownHostException.

(If you are unfamiliar with socket programming in Java,
see the lessons beginning with number 550 at www.DickBaldwin.com.)

Ready to communicate

At this point, the Email server is ready to communicate using the POP3
protocol.  In order to communicate, the program must be able to
send messages to the server and read messages that are sent from the
server.

Input and output streams

The code in Listing 5 gets input and output streams on the Socket
object that make it possible to send messages to the server and to read
messages sent from the server.

      inputStream = new BufferedReader(
                      new InputStreamReader(
                        socket.getInputStream()));

      outputStream = new PrintWriter(
                       new OutputStreamWriter(
                         socket.getOutputStream()),true);

Listing 5

The code in Listing 5 is straightforward and shouldn’t require
further explanation.  If you are unfamiliar with this code, see
the lessons on socket programming and input/output at www.DickBaldwin.com.

Basic POP3 operation

The following is a quotation from the technical document
referred to earlier:

“Initially, the server host starts the POP3 service by
listening on TCP port 110. When a client host wishes to make use of the
service, it establishes a TCP connection with the server host. When the
connection is established, the POP3 server sends a greeting.
The client and POP3 server then exchange commands and responses
(respectively)
until the connection is closed or aborted.”

The document goes on to explain:

“Commands in the POP3 consist of a keyword, possibly
followed by one or more arguments. All commands are terminated by a
CRLF pair. Keywords and arguments consist of printable ASCII
characters. Keywords and arguments are each separated by a single SPACE
character. Keywords are three or four characters long. Each argument
may be up to 40 characters long.”

Finally, the document tells us:

“Responses in the POP3 consist of a status indicator
and a keyword possibly followed by additional information. All
responses are terminated by a CRLF pair. There are currently two status
indicators: positive (“+OK”) and negative (“-ERR”).”
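
The listings in this lesson send commands with PrintWriter.println, which ends each line with the platform line separator. That is CRLF on Windows but a bare LF on most other systems. The following sketch shows one way to build a command line that always ends with CRLF, as the quoted text requires. The names Pop3CommandSketch and format are hypothetical and are not part of the program.

```java
final class Pop3CommandSketch {

  // Joins the keyword and its arguments with single spaces and
  // terminates the line with an explicit CRLF pair.
  static String format(String keyword, String... arguments) {
    StringBuilder line = new StringBuilder(keyword);
    for (String argument : arguments) {
      line.append(' ').append(argument);
    }
    return line.append("\r\n").toString();
  }

  public static void main(String[] args) {
    // Make the control characters visible when printing.
    String line = format("USER", "someUser");
    System.out.println(line.replace("\r", "\\r").replace("\n", "\\n"));
  }
}
```

A line built this way would be written with print followed by flush rather than with println, because an auto-flushing PrintWriter flushes only on println, printf, and format.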

The greeting

That brings us to the greeting mentioned above.

The code in Listing 6 gets and displays the greeting received from the
Email server.  In the process, the code in Listing 6 invokes the
method named validateOneLine to confirm that the message
received from the Email server begins with +OK,
and not with -ERR.

      String connectMsg = validateOneLine();
      System.out.println("Connected to server "
                                   + connectMsg);

Listing 6

(If the response begins with -ERR, the program ends the communication
session with the server, prints an error message, and exits.)
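
The fact that the server speaks first can be tried without a real mail server. The following sketch starts a throwaway listener on a free loopback port, lets it send one greeting line, and reads that line from the client side in the same way that Listing 5 and Listing 6 do. Everything here is an assumption made for the demonstration: the class name GreetingSketch, the method name, the fake greeting text, and the use of an arbitrary free port instead of port 110.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

final class GreetingSketch {

  // Starts a one-shot fake server on a free loopback port, connects to
  // it, and returns the single greeting line that the fake server sends.
  static String readGreetingFromFakeServer(String greeting) {
    try (ServerSocket listener =
             new ServerSocket(0, 1, InetAddress.getLoopbackAddress())) {

      // Fake server: accept one client, send the greeting, then close.
      Thread server = new Thread(() -> {
        try (Socket peer = listener.accept();
             OutputStream out = peer.getOutputStream()) {
          out.write((greeting + "\r\n")
              .getBytes(StandardCharsets.US_ASCII));
          out.flush();
        } catch (IOException e) {
          throw new UncheckedIOException(e);
        }
      });
      server.start();

      // Client: connect and read the first line without sending anything.
      String line;
      try (Socket socket = new Socket(
               InetAddress.getLoopbackAddress(), listener.getLocalPort());
           BufferedReader in = new BufferedReader(new InputStreamReader(
               socket.getInputStream(), StandardCharsets.US_ASCII))) {
        line = in.readLine();
      }
      server.join();
      return line;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(
        readGreetingFromFakeServer("+OK fake POP3 server ready"));
  }
}
```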

The validateOneLine method

The code in Listing 6 invokes the method named validateOneLine
to get and validate the message sent by the server.  At this
point, I am going to set the discussion of the
constructor aside for a moment and discuss the method named validateOneLine.

The validateOneLine method begins in Listing 7.

  private String validateOneLine(){
    try{
      String response = inputStream.readLine();
      if(response.startsWith("+OK")){
        return response;
      }//end if

Listing 7

The method begins by reading a line of text sent by the server and
confirming that the text begins with +OK.  If so, the
method simply returns that line of text as a String object,
where it is displayed by the second statement in Listing 6.

If -ERR is received

If the received line of text does not begin with +OK, it
must begin with -ERR, which is the only other possibility
allowed by the protocol.

Listing 8 shows the behavior of the validateOneLine method when
the received line of text does not begin with +OK.

      else{
        System.out.println(response);
        //Terminate the session.
        outputStream.println("QUIT");
        socket.close();
        System.out.println(
                       "Premature QUIT on -ERR");
        System.exit(0);
      }//end else
    }catch(IOException e){e.printStackTrace();}
    //The following return statement is required
    // to satisfy the compiler.
    return "Make compiler happy";
  }//end validateOneLine()

Listing 8

In this case, the method:

  • Displays the line of text that was received.
  • Sends a QUIT command to the server to terminate the
    session.
  • Closes the socket.
  • Prints an error message.
  • Terminates the program.

As you will see later, this method is invoked at numerous places in the
program to get and validate a server response to commands sent to the
server by the
program.
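
Listing 7 calls startsWith on the value returned by readLine, which is null if the server closes the connection, and Listing 8 ends the whole program on -ERR. As a hedged alternative, the following sketch classifies a single response line and reports problems with an exception, so that the caller decides what to do next. The names ResponseSketch, requireOk, and Pop3Exception are hypothetical and are not part of the program.

```java
final class ResponseSketch {

  // Hypothetical unchecked exception used to report a failed command.
  static final class Pop3Exception extends RuntimeException {
    Pop3Exception(String message) {
      super(message);
    }
  }

  // Returns the line unchanged if it is a positive response.
  // Throws if the line is missing or is anything other than +OK.
  static String requireOk(String responseLine) {
    if (responseLine == null) {
      throw new Pop3Exception(
          "Connection closed before a response arrived");
    }
    if (responseLine.startsWith("+OK")) {
      return responseLine;
    }
    throw new Pop3Exception("Server reported: " + responseLine);
  }

  public static void main(String[] args) {
    System.out.println(
        requireOk("+OK User name accepted, password please"));
    try {
      requireOk("-ERR invalid password");
    } catch (Pop3Exception e) {
      System.out.println(e.getMessage());
    }
  }
}
```

With this design, the code that owns the socket would catch the exception, send QUIT, and close the connection in one place.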

The greeting

The greeting sent by one of my Email servers is shown in Figure
3.

+OK POP3 server1.yohance.com v2001.78rh
server ready

Figure 3

(The actual text in the greeting will vary from
one Email
server to the next.

Note that I manually inserted a line break immediately
following
78rh in Figure 3 to force the greeting to fit in this narrow
publication format.)

The AUTHORIZATION state

The following is a quotation from the technical document
mentioned earlier:

“A POP3 session progresses through a number of states
during its lifetime. Once the TCP connection has been opened and the
POP3 server has sent the greeting, the session enters the AUTHORIZATION
state. In this state, the client must identify itself to the POP3
server.”

Returning to the constructor

At this point, the greeting has been received, and the POP3 session is
in the AUTHORIZATION state.  It is now time for the program to
send the user name and the password to the server.

Commands are sent to the server as plain text in upper case.  Some
commands require an argument following the command, as is the case with
the USER command shown in Listing 9.

      //Send the command
      outputStream.println("USER " + userName);

      //Get and validate response
      String userResponse = validateOneLine();

      //Display the response
      System.out.println("USER " + userResponse);

Listing 9

The code in Listing 9
produces the output shown in Figure 4 on my system with my Email server.
 
(The response from your Email server may differ.)
 

USER +OK User name accepted, password please
Figure 4


The APOP command

There is an optional APOP command, which avoids sending the password
over the network in clear text.  Instead, the client sends the user name
together with an MD5 digest computed from a timestamp in the server’s
greeting and a secret shared with the server.  The
use of the APOP command would be more secure than the approach
shown in Listing 9 and Listing 10.  However, this command is not
supported by all Email servers, and apparently is not supported by my
server.
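
For servers that do support APOP, the technical document specifies the digest argument as the MD5 hash, written as lower-case hexadecimal, of the timestamp from the greeting (including its angle brackets) followed immediately by the shared secret. The following sketch computes such a digest. The names ApopSketch and digest, and the timestamp and secret used in main, are hypothetical; nothing here is taken from the program, which uses USER and PASS.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class ApopSketch {

  // Returns the 32-character hexadecimal MD5 digest of
  // timestamp + sharedSecret.
  static String digest(String timestamp, String sharedSecret) {
    try {
      MessageDigest md5 = MessageDigest.getInstance("MD5");
      byte[] hash = md5.digest(
          (timestamp + sharedSecret).getBytes(StandardCharsets.US_ASCII));
      StringBuilder hex = new StringBuilder(hash.length * 2);
      for (byte b : hash) {
        // Mask to an unsigned value and print as two hex digits.
        hex.append(String.format("%02x", b & 0xff));
      }
      return hex.toString();
    } catch (NoSuchAlgorithmException e) {
      throw new IllegalStateException("MD5 is not available", e);
    }
  }

  public static void main(String[] args) {
    // Made-up greeting timestamp and secret, for illustration only.
    String timestamp = "<1234.567890@mail.example.com>";
    System.out.println("APOP someUser " + digest(timestamp, "someSecret"));
  }
}
```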

Send the password

The code in Listing 10 sends the password, validates the response, and
displays the response.

      //Send the password to the server
      outputStream.println("PASS " + password);

      //Validate and display response
      System.out.println(
                    "PASS " + validateOneLine());
    }catch(Exception e){e.printStackTrace();}

Listing 10

The code in Listing 10 produces the output shown in Figure 5.
 

PASS +OK Mailbox open, 7 messages
Figure 5

(Obviously, the number of messages available will vary from one run to
the next.)
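
The wording of the PASS response in Figure 5 is chosen by the server, so the message count in it cannot be parsed reliably. The STAT command listed in Figure 2 returns the count in a fixed format: +OK, a space, the number of messages, a space, and the size of the maildrop in octets. The following sketch parses such a line. The names StatSketch and messageCount are hypothetical, and this part of the lesson does not show how the program itself obtains the count.

```java
final class StatSketch {

  // Extracts the message count from a positive STAT response
  // such as "+OK 7 20480".
  static int messageCount(String statResponse) {
    String[] parts = statResponse.trim().split(" +");
    if (parts.length < 3 || !parts[0].equals("+OK")) {
      throw new IllegalArgumentException(
          "Not a positive STAT response: " + statResponse);
    }
    return Integer.parseInt(parts[1]);
  }

  public static void main(String[] args) {
    // Prints 7
    System.out.println(messageCount("+OK 7 20480"));
  }
}
```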

The TRANSACTION state

Returning now to the technical document, we find:

“… the client must identify itself to the POP3 server.
Once the client has successfully done this, the server acquires
resources associated with the client’s maildrop, and the session enters
the TRANSACTION state. In this state, the client requests actions on
the part of the POP3 server.”

Having received the +OK response shown in Figure 5, our POP3
session is now in the TRANSACTION state.

The QUIT command and the UPDATE state

We find the following information in the technical document:

“When the client has issued the QUIT command, the
session enters the UPDATE state. In this state, the POP3 server
releases any resources acquired during the TRANSACTION state and says
goodbye. The TCP connection is then closed.”

Terminating the POP3 session

We are still discussing the constructor.  Listing 11 shows the
code used to register a WindowListener object on the close
button on the Frame.  The purpose of this listener
is to terminate the POP3 session and to terminate the program when the
user presses the close button.

    this.addWindowListener(
      new WindowAdapter(){
        public void windowClosing(WindowEvent e){

          //Terminate the session
          outputStream.println("QUIT");

          //Validate and display response
          String quitResponse =
                               validateOneLine();
          System.out.println(
                        "QUIT " + quitResponse);
          textArea.append(quitResponse + "\n");

          //Close the socket
          try{
            socket.close();
          }catch(Exception ex){
            ex.printStackTrace();}

          //Terminate the program
          System.exit(0);
        }//end windowClosing
      }//end WindowAdapter()
    );//end addWindowListener

Listing 11

(Note that the code in Listing 11 is an anonymous class
definition.  If you are unfamiliar with anonymous class
definitions in Java, you can learn about them by studying the tutorial
lessons at www.DickBaldwin.com.)

The windowClosing method

By defining the windowClosing method in the anonymous class,
the code in Listing 11:

  • Sends a QUIT command to the server.
  • Validates and displays the response.
  • Closes the socket.
  • Terminates the program.

The goodbye message from the server

In addition to displaying the response on the command-line screen, the
code in Listing 11 also displays it in the large text area in Figure
1.  However, you will have to look very quickly to see it there
before the GUI disappears.

The response provided by my server is shown in Figure 6.
 

QUIT +OK Sayonara
Figure 6

The UPDATE state

At this point, the POP3 session is in the UPDATE state. 
Among other things, this means that the server will delete all of the
messages that were marked for deletion by the DELE command
while the session was in the TRANSACTION state.

Here is some of what the technical document has to say about the
UPDATE state:

“When the client issues the QUIT command from the
TRANSACTION state, the POP3 session enters the UPDATE state. (Note that
if the client issues the QUIT command from the AUTHORIZATION state, the
POP3 session terminates but does NOT enter the UPDATE state.)

If a session terminates for some reason other than a
client-issued QUIT command, the POP3 session does NOT enter the UPDATE
state and MUST not remove any messages from the
maildrop.

The POP3 server removes all messages marked as deleted from the
maildrop. It then releases any exclusive-access lock on the maildrop
and replies as to the status of these operations. The TCP connection is
then closed.”
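
The state rules quoted above can be summarized as a small table of transitions.  The following is an illustrative sketch only; the class name Pop3Session, the State enum, and both methods are hypothetical names that do not appear in the program.

```java
// Hypothetical model of the POP3 session states described in the quotation.
class Pop3Session {
    enum State { AUTHORIZATION, TRANSACTION, UPDATE, CLOSED }

    // The state that follows a client-issued QUIT command.
    static State afterQuit(State current) {
        // Only a QUIT issued from the TRANSACTION state leads to UPDATE.
        return current == State.TRANSACTION ? State.UPDATE : State.CLOSED;
    }

    // Marked messages are removed only when the UPDATE state is reached,
    // which requires a client-issued QUIT from the TRANSACTION state.
    static boolean deletionsCommitted(State current, boolean clientSentQuit) {
        return clientSentQuit && afterQuit(current) == State.UPDATE;
    }
}
```

In this model, a dropped connection corresponds to clientSentQuit being false, so no messages are removed from the maildrop.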

Defining the GUI

Note that the GUI shown in Figure 1 was purposely made narrow so that
it would fit into this narrow publication format.  However, it is much
more useful if it is wide enough to display each text line in
the message in its entirety without a requirement for horizontal
scrolling.  Therefore, I recommend that you resize the GUI to make it
at least 750 pixels wide.  I also recommend that you make each of
the text fields and the text area at least 100 characters wide.

Set the layout

Listing 12 sets the GUI layout to FlowLayout.  Although
this isn’t very fancy, it works pretty well in this case.

    setLayout(new FlowLayout());

Listing 12

Construct GUI components

Listing 13 constructs the two buttons, the three text fields, and the
text area shown in Figure 1.

    final Button startButton =
        new Button("Start/Next");
    final Button deleteButton =
        new Button("Delete");
    fromField = new TextField(
        "Display From line here",50);
    subjField = new TextField(
        "Display Subj here",50);
    operMsgField = new TextField(
        "Display operator messages here",50);
    textArea = new TextArea(15,50);

    //Display initial message
    textArea.append("Display raw data here\n");

Listing 13

No labels are provided

In order to preserve real estate on the screen, I
did not provide labels to identify the text fields in Figure 1. 
Rather, when the text fields are instantiated, the initial text
showing in each text field indicates its purpose.  For example,
the initial text that appears in the topmost text field is “Display
From line here.”

The last statement in Listing 13 also displays the purpose of the text
area in the text area when it first appears on the screen.

Not yet added to the GUI

Note that at this point, the GUI components have been constructed, but
have not yet been placed in the GUI.  This will be taken care of
later.

References to buttons are final

Note also that it is necessary to declare the references to the two Button
objects to be final, because they are accessed later from
within an anonymous class definition.  Local and anonymous classes
can access local variables only if they are declared final.
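
The rule can be seen in a small stand-alone example.  This is a sketch with hypothetical names (FinalCaptureDemo, labelSupplier) that has nothing to do with the GUI; it only shows an anonymous class reading a final local variable after the method that declared it has returned.  (Since Java 8, the compiler also accepts a local variable that is effectively final, meaning that it is never reassigned, even without the final keyword.)

```java
import java.util.function.Supplier;

// Hypothetical demonstration class, not part of the program.
class FinalCaptureDemo {
    static Supplier<String> labelSupplier(String text) {
        // The local variable is final, so the anonymous class may use it.
        final String label = "[" + text + "]";
        return new Supplier<String>() {
            @Override
            public String get() {
                return label;   // captured copy of the local variable
            }
        };
    }
}
```

Calling FinalCaptureDemo.labelSupplier("Delete").get() returns the string [Delete], even though the local variable label went out of scope when labelSupplier returned.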

ActionListener on the Start/Next button

Listing 14 shows the beginning of the registration of an anonymous ActionListener
object on the Start/Next button shown in Figure 1.

    startButton.addActionListener(
      new ActionListener(){
        public void actionPerformed(
            ActionEvent e){

          //Clear the operator message field
          operMsgField.setText("");

Listing 14

Listing 14 clears the third text field from the top in Figure 1 by
storing an empty string in that text field.

Retrieve and screen messages for SPAM

As mentioned earlier, the POP3 session is now in the TRANSACTION
state.  The code in Listing 15 begins the process of retrieving
all the messages currently on the server and screening those messages
for SPAM.

The number of messages on the server

One of the first things that we need to know is how many messages are
currently in the dropbox on the server.  The code in Listing 15
sends a STAT command to the server to get this information.

          try{
            if(numberMsgs == 0){
              outputStream.println("STAT");
              String stat = validateOneLine();

              //Get number of messages as String.
              String numberMsgsStr =
                  stat.substring(
                      4,stat.indexOf(" ",5));

              //Convert the String to an int.
              numberMsgs = Integer.parseInt(
                  numberMsgsStr);
            }//end if numberMsgs == 0

Listing 15

Get number of messages only at beginning of session

As the session progresses and DELE commands are sent to the
server, messages are marked for deletion.  Once a message is
marked for deletion, it is no longer included in the count of messages
on the server.  Therefore, we must make certain that we obtain the
number of messages on the server only at the beginning of the session.

As you will see later, the variable numberMsgs is used by the
program to determine when all of the messages have been
processed.  Since we must
retrieve the number of messages on the server only once at the
beginning of the session, we execute this code only when the value of numberMsgs
is zero.

Issue a STAT command

The code in Listing 15 begins by issuing a STAT command, and
then getting, validating, and saving the response.  Here is part
of what the technical document has to say about the response to the
STAT command.

“The POP3 server issues a positive response with a line
containing information for the maildrop. This line is called a “drop
listing” for that maildrop.

In order to simplify parsing, all POP3 servers are required to use
a certain format for drop listings. The positive response consists of
“+OK” followed by a single space, the number of messages in the
maildrop, a single space, and the size of the maildrop in octets.”

Get number of messages as a String

Having saved the response to the STAT command, the code in
Listing 15 extracts a substring from that string containing the number
of messages as a String.

Convert the String to an int

Then the code in Listing 15 invokes the parseInt method of the Integer
class to convert the string representing the number of messages to an int.
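
An alternative way to pull the count out of the drop listing is to split the line on spaces rather than to compute substring offsets.  This is a minimal sketch with hypothetical names (StatResponse, messageCount); it assumes that the string being parsed is the raw response line in the format quoted above, such as +OK 2 320.

```java
// Hypothetical helper, not part of the program.
class StatResponse {
    // Parses a drop listing such as "+OK 2 320" and returns the message count.
    static int messageCount(String response) {
        String[] parts = response.trim().split(" ");
        // Expect the status indicator, the message count, and the size in octets.
        if (parts.length < 3 || !parts[0].equals("+OK")) {
            throw new IllegalArgumentException(
                "Not a drop listing: " + response);
        }
        return Integer.parseInt(parts[1]);
    }
}
```

For a well-formed drop listing this yields the same value as the substring expression in Listing 15, but a malformed response is reported with a descriptive exception instead of an index error.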

Referring to a message by its number

Later we will see that messages can be referred to by their message
number.

(Note that message numbers begin with 1 and not
with 0.)

Retrieve and screen each message

The next step is to retrieve each message from the server and to screen
it for SPAM.  Basically this consists of:

  • Retrieving each message from the server
  • Writing that message into a local disk file
  • Passing the disk file
    to a method belonging to an object of the Screen class where it
    is screened for SPAM

The screening method returns a boolean value
indicating whether or not the message is a candidate for deletion from
the server due to a failure to satisfy one of the SPAM rules.

Get the unique ID

Each message is stored on the server with a unique ID.  The unique
ID for the message is retrieved first and is used to create a unique
file name for
storing the message in a local disk file.

Note that the msgCounter variable was initialized to 0 when it
was declared in Listing 1.  We will see later that this value is
incremented each time a new message is processed.  Because the
message numbers start with 1 instead of 0, msgNumber must always
be one greater than msgCounter.

The unique ID for a message is obtained from the server by issuing a UIDL
command and saving the response.  Listing 16 shows the code used
to get, validate, and save the unique ID for the next message.

            msgNumber = msgCounter + 1;

            if(msgNumber <= numberMsgs){
              outputStream.println(
                  "UIDL " + msgNumber);
              uidl = validateOneLine();

Listing 16

The UIDL command

Here is some of what the technical document has to say about the
UIDL command:

“Arguments: a message-number (optionally)  If
a message-number is given, it may
NOT refer to a message marked as deleted.

Restrictions: may only be given in the TRANSACTION state.

Discussion: If an argument was given and the POP3 server issues a
positive response
with a line containing information for that message. This line is
called a “unique-id listing” for that message.  … A unique-id
listing consists
of the message-number of the message, followed by a single space and
the unique-id of the message.”

No need to parse the response

In this case, I will use the entire response string as a file
name and therefore I won’t be concerned about parsing the response.

(I’m also not interested in the response produced when
the UIDL command is issued without a message number because
this program never
issues the command without a message number.)

A possible safety upgrade

While writing this lesson, it has occurred to me that a useful safety
upgrade would be to:

  • Parse the response to the UIDL command
  • Extract and save the message number
  • Compare that value with the value of msgNumber being
    maintained internally by this program before sending a DELE
    command to the server

That would ensure that this program is properly synchronized with the
server’s view of message numbers before a command is given to delete a
message.
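
The three steps of that upgrade might look something like the following.  This is only a sketch under stated assumptions: the names UidlCheck, messageNumber, and safeToDelete are hypothetical, and the saved response is assumed to be the raw unique-id listing, for example +OK 3 QhdPYR:00WBw1Ph7x7.

```java
// Hypothetical helper, not part of the program.
class UidlCheck {
    // Extracts the message number from a unique-id listing such as
    // "+OK 3 QhdPYR:00WBw1Ph7x7".
    static int messageNumber(String uidlResponse) {
        String[] parts = uidlResponse.trim().split(" ");
        if (parts.length < 3 || !parts[0].equals("+OK")) {
            throw new IllegalArgumentException(
                "Not a unique-id listing: " + uidlResponse);
        }
        return Integer.parseInt(parts[1]);
    }

    // True only if the server's message number agrees with the number
    // that the client is about to place in a DELE command.
    static boolean safeToDelete(String uidlResponse, int msgNumber) {
        return messageNumber(uidlResponse) == msgNumber;
    }
}
```

The Delete handler could then refuse to send DELE whenever safeToDelete returns false.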

Open an output file

The code in Listing 17 uses the unique ID to open an output file in
which to save the message.

              String fileName =
                  "c:/MailFiles/" + uidl + ".txt";
              DataOutputStream dataOut =
                  new DataOutputStream(
                      new FileOutputStream(
                          fileName));

Listing 17

(You may want to modify this code to cause the messages
to be stored in a different location on the disk.  If so, modify
the string “c:/MailFiles/” in Listing 17.  Make certain
that the folder where you plan to save the files exists before running
the program.)

The code in Listing 17 is straightforward and shouldn’t require further
explanation.  If you are unfamiliar with code like this, see the
tutorials on file I/O at www.DickBaldwin.com.
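
Two practical concerns with the file name are that the folder must exist and that a unique-id may contain characters, such as a colon or a slash, that are not legal in a file name on some systems.  The following sketch shows one way to address both.  The names MailFileNames, safeFileName, and prepare are hypothetical, and replacing every unusual character with an underscore is an assumption made for illustration (two different IDs could, in principle, map to the same name).

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of the program.
class MailFileNames {
    // Replaces every character other than letters, digits, dot,
    // underscore, and hyphen with an underscore, then adds ".txt".
    static String safeFileName(String uidlResponse) {
        return uidlResponse.replaceAll("[^A-Za-z0-9._-]", "_") + ".txt";
    }

    // Creates the folder if it is missing and returns the target path.
    static Path prepare(String folder, String uidlResponse)
            throws IOException {
        Path dir = Paths.get(folder);
        Files.createDirectories(dir);  // does nothing if the folder exists
        return dir.resolve(safeFileName(uidlResponse));
    }
}
```

The string form of the returned path could be passed to the FileOutputStream constructor in place of the concatenated file name.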

Begin the message retrieval process

Listing 18 issues a RETR command to begin the message retrieval
process, and then validates the response.

              outputStream.println(
                  "RETR " + msgNumber);

              String retrResponse =
                  validateOneLine();

Listing 18

Note that the RETR command specifies a particular message based
on its message number.

Response to the RETR command

Figure 7 shows a typical response produced by my Email server to the
receipt of a RETR command.
 

+OK 1818 octets
Figure 7

The RETR command

Here is some of what the technical document has to say about the
RETR command:

“Arguments:
a message-number (required) which may not refer to a message marked as
deleted.

Discussion:
If the POP3 server issues a positive response, then the response given
is multi-line. After the initial +OK, the POP3 server sends the message
corresponding to the given message-number, being careful to byte-stuff
the termination character (as with all multi-line responses).”

What is meant by byte-stuffing?

Here is part of what the technical document has to say about
multi-line responses and byte-stuffing.

“Responses to certain commands are multi-line. In these
cases, … after sending the first line of the response and a CRLF, any
additional lines are sent, each terminated by a CRLF pair. When all
lines of the response have been sent, a final line is sent,
consisting of a termination octet (decimal code 046, “.”) and a CRLF
pair. If any line of the multi-line response begins with the
termination octet, the line is “byte-stuffed” by pre-pending the
termination octet to that line of the response.”

In other words, a message is terminated by a line consisting of a
single period followed immediately by a CRLF pair.  If a normal line
of the message begins with a period, byte-stuffing is used to
distinguish that line from the terminating line.

Didn’t strip any bytes

In the event that a line in the message begins with a period, then it
will begin with two periods after byte-stuffing takes place on the
server.

Since having two periods at the beginning of the line is unlikely to
have a detrimental impact on the screening process, I didn’t bother to
strip any bytes
that may have been prepended onto the line by the server during
byte-stuffing.

However, you may
want to upgrade the program to cause it to deal more correctly with
this situation if you consider it to be a problem.
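
If you do decide to undo the byte-stuffing, the client-side rule is short: a line that begins with a period and contains something more than that period has its first character stripped.  Here is a minimal sketch; the names ByteStuffing, isTerminator, and unstuff are hypothetical and are not part of the program.

```java
// Hypothetical helper, not part of the program.
class ByteStuffing {
    // A line holding only a period marks the end of a multi-line response.
    static boolean isTerminator(String line) {
        return line.equals(".");
    }

    // Removes the period that the server prepended to a line
    // that originally began with a period.
    static String unstuff(String line) {
        if (line.startsWith(".") && !isTerminator(line)) {
            return line.substring(1);
        }
        return line;
    }
}
```

In this program, such a method would need to be applied to each line before the asterisks are inserted, because inserting characters changes the content of the line.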

Clear the text area

The code in Listing 19 clears the text area at the beginning of each
message.  If you don’t do this, the string contained in the text
area will become very long and the program will run slowly as a result.

              textArea.setText("");

Listing 19

Read first line and insert stars

The code in Listing 20 reads the first line of the message from the
server.  Then it invokes the method named insertStars to
insert asterisks on ten-character intervals in the text.

              //Read first line of message
              String msgLine =
                  inputStream.readLine();

              //Insert asterisks
              msgLine = insertStars(msgLine);

Listing 20

There is a possibility of retrieving a message that contains executable
virus code.  My purpose in inserting an asterisk every ten
characters is to break up the byte pattern and hopefully to
corrupt any executable virus code that may be contained in the byte
stream before writing those bytes to a local disk file.

The insertStars method

At this point, I will set the discussion of the constructor aside and
present the method named insertStars, which is shown in Listing
21.

The code in this method is straightforward and should not require
further explanation.

  private String insertStars(String stringIn){
    StringBuffer stringBuffer =
        new StringBuffer(stringIn);
    int length = stringBuffer.length();
    for(int cnt = 9; cnt < length; cnt+=10){
      stringBuffer.insert(cnt,'*');
    }//end for loop
    return new String(stringBuffer);
  }//end insertStars

Listing 21
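
To make the effect of the method concrete, here is a stand-alone sketch that applies the same loop as Listing 21.  The class name InsertStarsDemo is hypothetical, and StringBuilder is used in place of StringBuffer purely as a matter of modern convention; the logic is otherwise unchanged.

```java
// Hypothetical stand-alone wrapper around the logic of Listing 21.
class InsertStarsDemo {
    static String insertStars(String stringIn) {
        StringBuilder builder = new StringBuilder(stringIn);
        // The limit is the original length, measured before any insertion.
        int length = builder.length();
        for (int cnt = 9; cnt < length; cnt += 10) {
            builder.insert(cnt, '*');
        }
        return builder.toString();
    }
}
```

With this sketch, the 26-letter string abcdefghijklmnopqrstuvwxyz comes back as abcdefghi*jklmnopqr*stuvwxyz.  A string shorter than ten characters, including a line that consists of a single period, is returned unchanged, so the line that terminates the message can still be recognized after it has passed through the method.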

Read and save all lines of message

Returning now to the discussion of the constructor, the code in Listing
22 continues reading lines of text from the server, inserting stars,
and writing those lines of text into the output file until a line is
received that contains a single period.

              while(!(msgLine.equals("."))){
                dataOut.writeBytes(
                    msgLine + "\n");
                msgLine = inputStream.readLine();
                msgLine = insertStars(msgLine);
              }//end while

              //Close the output file.
              dataOut.close();

Listing 22

Newline characters are written at the end of each line of text when it
is written into the output file.
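
One thing the loop does not guard against is the connection closing before the terminating line arrives.  If the stream behaves like a BufferedReader (an assumption, since the declaration of the input stream is not shown in this part of the lesson), readLine returns null at end of stream, and passing null to insertStars would throw a NullPointerException.  The following sketch shows a more defensive version of the read loop; the names MultiLineReader and readBody are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the program.
class MultiLineReader {
    // Reads lines up to, but not including, the line that holds
    // a single period.
    static List<String> readBody(BufferedReader in) throws IOException {
        List<String> lines = new ArrayList<>();
        String line;
        while ((line = in.readLine()) != null) {
            if (line.equals(".")) {
                return lines;   // normal end of the multi-line response
            }
            lines.add(line);
        }
        // readLine returned null: the stream ended prematurely.
        throw new IOException(
            "Connection closed before the terminating line");
    }
}
```

A caller could write each returned line to the output file, inserting asterisks first if desired.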

Display messages for the user

It is almost time to pass the file containing the message to the
screening method to allow it to screen for SPAM.  Before doing
that, however, the code in Listing 23 writes messages in the text
fields and text area of Figure 1 to let the user know what is happening.

              fromField.setText("Call screener");
              subjField.setText("Call screener");
              operMsgField.setText(
                  "Call screener");
              textArea.setText(
                  "Progress Meter: ");

Listing 23

The progress indicator

Occasionally a very long message is received that requires a
perceptible amount of time for screening.  When that happens (with
the version of the Screen class that will be discussed in the
next lesson), the screening method writes a stream of periods into
the text area to let the user know that the system is actually
working on a message and isn’t simply hung up.  Hence the words
“Progress Meter” are placed in the text area in
Listing 23 to tell the user what that stream of periods indicates.

(The stripped-down version of the Screen class
that I will discuss in this lesson does not provide this type of visual
feedback.)

Information from the screening method

Several different pieces of information need to be returned from the
screening method.  However, in Java, a method can return only one
value.  To accommodate this, an empty object instantiated from the
ScreenResult class is passed as a parameter to the screening
method.  The code in the screening method populates the fields in
that object so as to make the information available upon return.

The ScreenResult class

At this point, I will set the discussion of the constructor aside and
show you the ScreenResult class in Listing 24.

class ScreenResult{
  public String subject = "";
  public String from = "";
  public String thePhrase = "";
  public String text = "";
}//end ScreenResult

Listing 24

As you can see, this is a very simple class, an object of which exists
solely as a place to store four strings that are populated by the
screening method for later use by the calling method.

Screen the file for SPAM

Returning now to the constructor, the code in Listing 25:

  • Declares a local variable named match and initializes it
    to false.
  • Instantiates a new empty object of the ScreenResult class.
  • Invokes the screenMsg method belonging to an object of
    the Screen class, passing the name of the disk file containing
    the message, the unique identifier for the message, and the empty ScreenResult
    object as parameters, and storing the returned value in the variable
    named match.

              boolean match = false;

              ScreenResult theResult =
                  new ScreenResult();
              match = screener.screenMsg(
                  fileName,uidl,theResult);

Listing 25

Upon further reflection

Frequently when I write a lesson explaining
code that I have written, I realize that there are sections of code
that I would write differently if I had it to do over again.  That
is the case here.

In this case, if I were to rewrite this program, I would upgrade the
definition of the ScreenResult class to include an additional
field of type boolean named match.

Then I would require the screenMsg method of the Screen
class to return a reference to a populated object of type ScreenResult
instead of returning type boolean.  I would eliminate the ScreenResult
parameter from the parameter list of the screenMsg method.

Then I would cause the code in the calling method to accommodate those
changes and to extract the value of match from the object
returned by the screenMsg method instead of dealing with match
separately as is the case in Listing 25.

In my opinion, this would result in a somewhat cleaner programming
interface.  However, at this point, I am too far down the road to
turn back, so I will just leave the program as it is.  I may
upgrade it sometime in the future to implement this improvement.
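
The redesign described above might look something like the following sketch.  None of this code exists in the program: the names ScreenOutcome and MessageScreener are hypothetical and were chosen so that they would not clash with the existing ScreenResult and Screen classes.

```java
// Hypothetical result type that carries the boolean along with the text.
class ScreenOutcome {
    final boolean match;      // true if deletion is recommended
    final String subject;
    final String from;
    final String thePhrase;
    final String text;

    ScreenOutcome(boolean match, String subject, String from,
                  String thePhrase, String text) {
        this.match = match;
        this.subject = subject;
        this.from = from;
        this.thePhrase = thePhrase;
        this.text = text;
    }
}

// Hypothetical screening interface: the result is returned,
// not passed in as a parameter to be populated.
interface MessageScreener {
    ScreenOutcome screenMsg(String fileName, String uidl);
}
```

The calling code would then test the match field of the returned object instead of a separate boolean variable.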

Designing your own SPAM screening algorithm

Should you decide to design your own screening algorithm, this is where
you would connect your algorithm to the communication module.  In
other words, your version of the method named screenMsg should
return true if it is recommending that the message be deleted from the
server. Also, the object of type ScreenResult passed as a
parameter to the method should be populated with information to be
displayed in the text fields and the text area of the GUI shown in
Figure 1.

You may or may not decide to make callbacks on the communication module
to support the progress indicator while your method is working.

Display the results of the screening process

Listing 26 displays the information that was encapsulated in the ScreenResult
object by the screening method in the text fields and text area of
Figure 1.

              fromField.setText(theResult.from);
              subjField.setText(
                  theResult.subject);
              operMsgField.setText(
                  "Offending Phrase: "
                  + theResult.thePhrase);
              textArea.setText(theResult.text);

              //Scroll the text area to the end
              textArea.select(
                  theResult.text.length()-2,
                  theResult.text.length()-1);

Listing 26

The code in Listing 26 is straightforward and shouldn’t require further
explanation.

Information available to the user

At this point, the user can view:

  • The contents of the From line of the message
  • The contents of the Subject line of the message
  • The complete raw text of the message down to the line containing
    the offending word or phrase, if any
  • The offending word or phrase, if any

If the screening method returned true, this information will remain on
the screen for the user to ponder.  However, if the screening
method returned false, it will disappear from the screen very quickly,
and probably won’t even be seen by the user.

Increment the message counter

Listing 27 increments the message counter in preparation for processing
the next message.

              msgCounter++;

Listing 27

A candidate for deletion from the server

A return value of true from the screenMsg method means that the
screening method is recommending
that the message be deleted from the server.

Listing 28 shows the behavior of the actionPerformed method
registered on
the Start/Next button under this circumstance.

              if(match == true){
                return;
              }//end if match == true

Listing 28

Wait for further action by the user

The message has been identified as a candidate for deletion from the
server.  The actionPerformed method simply returns with
the information described above showing in the text fields and text
area of Figure 1.  The user can view this information while
deciding what to do next.  Nothing further will happen in the
program until the user presses either the Delete button
or the Start/Next button.

Pressing the Delete button

If the user presses the Delete button in Figure 1, the
message will
be deleted from the server.  I will explain exactly how this
happens later when I discuss the ActionListener object that
will be registered on the Delete button.

Pressing the Start/Next button

If the user presses the Start/Next button in Figure 1,
the message
will not be deleted from the server, the actionPerformed method
belonging to the ActionListener object registered on that
button will be executed, and the next message on the server
will be retrieved and screened for SPAM.

Message is not a candidate for deletion

If the screenMsg method returns false, the message has not been
identified as a candidate for deletion, and control reaches the point
in the actionPerformed
method shown in Listing 29.

              Toolkit.getDefaultToolkit().
                  getSystemEventQueue().
                  postEvent(new ActionEvent(
                      startButton,
                      ActionEvent.
                          ACTION_PERFORMED,
                      "Start/Next"));
            }//end if msgNumber <= numberMsgs

Listing 29

At this point, we could require the user to press the Start/Next
button to retrieve and screen the next message.  However, in the
interest of convenience, we will relieve the user of that
responsibility.

Firing a synthetic event

The code in Listing 29 fires an ActionEvent identical to that
which would be fired if the user were to press the Start/Next
button.  This causes the program to retrieve the next message on
the server and to begin the screening process immediately.

(If you are unfamiliar with the concept of posting
events in the system event queue, you can learn about that in the
tutorial lessons at www.DickBaldwin.com.)

When all messages have been screened …

Listing 30 shows the completion of the registration of an anonymous ActionListener
object on the Start/Next button that was begun in
Listing 14.

            else{//msgNumber > numberMsgs
              startButton.setEnabled(false);

              subjField.setText(
                  "No more messages, press Close");
              fromField.setText(
                  "No more messages, press Close");
              operMsgField.setText(
                  "No more messages, press Close");
              textArea.setText(
                  "No more messages, press Close");
            }//end else

          }//end try
          catch(Exception ex){
            ex.printStackTrace();}
        }//end actionPerformed
      }//end ActionListener
    );//end addActionListener

Listing 30

The code in Listing 30 is executed when all of the messages on the
server have been
screened.

This code disables the Start/Next button and posts
messages instructing the user to press the close button to terminate
the program.

Beyond that, the code in Listing 30 simply completes a try/catch
block, and wraps up the cryptic code required for the definition of an
anonymous class.

An ActionListener on the Delete button

The Delete button shown in Figure 1 is used to cause
messages to be deleted from the server.  Listing 31 shows the
beginning of the registration of an anonymous ActionListener object
on the Delete button.

    deleteButton.addActionListener(
      new ActionListener(){
        public void actionPerformed(
            ActionEvent e){

          operMsgField.setText("");

Listing 31

The code in Listing 31 simply clears the third text field from the top
in Figure 1 when the user presses the Delete button.

Marking messages for deletion from the server

Deletion of a message from the server is accomplished by marking the
message for deletion while in the TRANSACTION state. The
message is actually deleted later when the client sends a QUIT command
to the server causing the server to enter the UPDATE state.

(If the program aborts prematurely before sending a QUIT
command,
marked messages are not deleted from the server.)
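
The difference between marking a message and actually removing it can be illustrated with a small in-memory model of a maildrop.  This is an illustrative sketch only; the class MaildropModel and its methods are hypothetical, and the model ignores details such as message sizes and the renumbering of messages in a later session.

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical in-memory model of a POP3 maildrop.
class MaildropModel {
    private final Set<Integer> messages = new TreeSet<>();
    private final Set<Integer> marked = new TreeSet<>();

    MaildropModel(int count) {
        for (int n = 1; n <= count; n++) {   // message numbers start at 1
            messages.add(n);
        }
    }

    // DELE: only marks the message; false models an -ERR response.
    boolean dele(int msgNumber) {
        if (!messages.contains(msgNumber) || marked.contains(msgNumber)) {
            return false;
        }
        marked.add(msgNumber);
        return true;
    }

    // STAT: marked messages are no longer counted.
    int stat() {
        return messages.size() - marked.size();
    }

    // QUIT from the TRANSACTION state: marked messages are removed.
    void quit() {
        messages.removeAll(marked);
        marked.clear();
    }

    // Session ends without QUIT: the marks are discarded.
    void abort() {
        marked.clear();
    }

    // Number of messages physically stored in the maildrop.
    int stored() {
        return messages.size();
    }
}
```

In the model, the value returned by stat drops as soon as dele is called, which is why the program must capture the message count only once at the beginning of the session.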

The deletion code

Listing 32 shows the code used to

  • Mark the message for deletion
  • Validate the response
  • Display a deletion message

          outputStream.println(
              "DELE " + msgNumber);
          textArea.append(
              "DELE "+validateOneLine()+"\n");
          textArea.append(
              "Deleted:" + msgNumber + "\n");

Listing 32

(See the earlier section entitled A possible safety upgrade for a suggestion
related to upgrading this program.)

The DELE code has been temporarily disabled

Note that the three corresponding statements in Listing 42
near the end of the lesson have been disabled by marking them as
comments.  I did this to keep you from accidentally deleting
messages from your server during your early stages of testing this
program with your Email server.

You can enable the three statements in Listing 42 by removing the
comment indicators.  However, you should not enable them until you
are confident
that you really do want to delete messages from the server.

(Once a message is deleted from the server, it cannot be
recovered from the server.)

A synthetic ActionEvent

The code in Listing 33 fires a synthetic ActionEvent identical
to that which would be fired if the user presses the Start/Next
button.

          Toolkit.getDefaultToolkit().
              getSystemEventQueue().
              postEvent(new ActionEvent(
                  startButton,
                  ActionEvent.
                      ACTION_PERFORMED,
                  "Start/Next"));
        }//end actionPerformed
      }//end ActionListener
    );//end addActionListener

Listing 33

Thus, when the user presses the Delete button, the
message is marked for deletion on the server and the next message on
the server is retrieved immediately for SPAM screening without a
requirement for the user to request the next message.

Finish configuring the GUI

The code in Listing 34 finishes configuring the GUI by placing the
various components in the Frame, setting its size, and making
it visible.

    add(startButton);
    add(deleteButton);
    add(fromField);
    add(subjField);
    add(operMsgField);
    add(textArea);
    setTitle("Copyright 2004, R.G.Baldwin");
    setSize(400,400);//modify for larger GUI
    setVisible(true);
  }//end constructor

Listing 34

As I mentioned earlier, you will probably find the program to be more
useful if you increase the width of the Frame to at least 750
pixels and increase the size of the text fields and text area in
Listing 13 to be at least 100 characters wide.

That completes the discussion of the class named Pop302.

Stripped-down Screen class

The following sections provide a brief discussion of a stripped-down
version of the class named Screen, which you can use to test
this program on your system with your Email server.

This stripped-down version of the Screen class doesn’t actually
do any SPAM screening.  Rather, it populates the ScreenResult
object with information from the message and toggles its return value
between true and false for each successive message.

My full version of the Screen class implements my SPAM
screening algorithm.  I will explain the details of my full Screen
class in the next lesson in this series.

A dummy constructor

The definition of the stripped-down Screen class begins in
Listing 35.

class Screen{
  boolean returnValue = true;

  Screen(Pop302 dummy){//dummy constructor
  }//end constructor

Listing 35

A dummy constructor is required to satisfy the instantiation of the Screen
object in Listing 3.

The screenMsg method

The code in Listing 25 invokes the screenMsg method of an
object of the Screen class for the purpose of applying SPAM
screening rules to a message stored in a disk file.

The definition of the stripped-down screenMsg method begins in
Listing 36.

  public boolean screenMsg(String fileName,
          String uidl,ScreenResult theResult){
    try{
      BufferedReader inData
          = new BufferedReader(new FileReader(
              fileName));

Listing 36

The code in Listing 36 gets a BufferedReader object that will
be used to read the raw text of the message stored in the file whose
name was received as a parameter.

Initialize the ScreenResult object

The code in Listing 37 populates three of the fields in the ScreenResult
object received as an incoming parameter.  Two of these fields are
populated with messages that will be overwritten later if Subject
and
From data is successfully extracted from the file
containing the
message.

      theResult.subject = "No Subj line found";
      theResult.from = "No From line found";
      theResult.thePhrase = "No phrase "
          + "available for test program.";

Listing 37

The text that is stored in the field named thePhrase will
not be overwritten later because this stripped-down version knows
nothing about offending SPAM words or phrases.

Get the Subject data

Without getting into the details, the code in Listing 38 attempts to
extract a text line from the message that begins with “Subject:”.
If successful, an upper-case version of that line is used to overwrite
the contents of the subject field of the ScreenResult object.

      String data;
      inData.mark(10000);
      while((data = inData.readLine()) != null){
        if(data.startsWith("Subject:")){
          theResult.subject = data.toUpperCase();
          break;
        }//end if
      }//end while loop

Listing 38

Get the From data

Similarly, the code in Listing 39 attempts to extract a text line
whose upper-case version begins with “FROM:”.
If successful, the line is used to overwrite the contents of the from
field of the ScreenResult object.

      inData.reset();
      while((data = inData.readLine()) != null){
        if(data.toUpperCase().
            startsWith("FROM:")){
          theResult.from = data;
          break;
        }//end if data starts with From
      }//end while loop on null

Listing 39

Get the entire message

The code in Listing 40 attempts to read the entire message and deposit
it in the text field of the ScreenResult object.

      inData.reset();
      while((data = inData.readLine()) != null){
        theResult.text =
            theResult.text + data + "\n";
      }//end while loop on read until null
      inData.close();
    }catch(Exception e){e.printStackTrace();}

Listing 40
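
Listings 38 through 40 read the file three times, relying on mark(10000) and reset to rewind.  According to the BufferedReader documentation, an attempt to reset after reading up to or beyond the read-ahead limit may fail, so a long message could cause trouble.  The following sketch gathers the same three pieces of information in a single pass.  The class name HeaderScan and its scan method are hypothetical; this is an alternative offered for comparison, not the code used in the program.

```java
import java.io.BufferedReader;
import java.io.IOException;

// Hypothetical single-pass alternative to Listings 38 through 40.
class HeaderScan {
    String subject = "No Subj line found";
    String from = "No From line found";
    String text = "";

    static HeaderScan scan(BufferedReader in) throws IOException {
        HeaderScan result = new HeaderScan();
        StringBuilder all = new StringBuilder();
        boolean haveSubject = false;
        boolean haveFrom = false;
        String line;
        while ((line = in.readLine()) != null) {
            // Keep only the first matching line for each header.
            if (!haveSubject && line.startsWith("Subject:")) {
                result.subject = line.toUpperCase();
                haveSubject = true;
            }
            if (!haveFrom && line.toUpperCase().startsWith("FROM:")) {
                result.from = line;
                haveFrom = true;
            }
            // StringBuilder avoids repeated String concatenation.
            all.append(line).append('\n');
        }
        result.text = all.toString();
        return result;
    }
}
```

Because nothing is rewound, the sketch needs neither mark nor reset, and the length of the message does not matter.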

Return a boolean value

Finally, the code in Listing 41 returns a boolean value.  This
value toggles between true and false as each successive message is
processed.  Therefore, it has no meaning insofar as SPAM is
concerned.

Notice:  A true return value should not be used to
indicate that you should delete a message from the server.

    if(returnValue == false){
      returnValue = true;
    }else{
      returnValue = false;
    }//end else
    return returnValue;
  }//end screenMsg

}//End stripped-down Screen class

Listing 41

This boolean value will be stored in the variable named match
in Listing 25, and will be tested in the if statement of
Listing 28.
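
The toggling behavior can be written more compactly with the logical complement operator.  The sketch below uses a hypothetical class named ToggleDemo to show the sequence of values; like the stripped-down Screen class, it starts with the value true.

```java
// Hypothetical class that mimics the toggling in Listing 41.
class ToggleDemo {
    private boolean returnValue = true;

    // Flips the stored value and returns the new value.
    boolean next() {
        returnValue = !returnValue;
        return returnValue;
    }
}
```

Because the value is flipped before it is returned, the first call returns false, the second returns true, and so on.  The same is true of Listing 41, which means that the first message screened by the stripped-down class is not reported as a candidate for deletion.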

If the return value is true

If the return value is true, the actionPerformed method will
return immediately in Listing 28, allowing the user to ponder the data
returned by the screenMsg method in deciding whether or not to
delete the message from the server.

Once again, let me caution you not to enable the DELE
code in Listing 42 near the end of the lesson until you are certain
that you actually want to delete messages from the server.  If you
do enable it, do not press the Delete button just because this
stripped-down version of the screenMsg method returns true.

If the return value is false

If the screenMsg method returns false, the code in Listing 29
immediately fires a synthetic ActionEvent, attributable to the Start/Next
button, which causes the next message to be retrieved from the server.

Run the Program

I encourage you to copy the code from Listing 42 into
your text editor.  Compile and execute the
program.  Experiment with it, making changes, and observing the
results of your changes.

You may want to modify this code to cause the messages to be stored
in a different location on your disk.  If so, modify the string in
the statement in Listing 17 that reads "c:/MailFiles/" + uidl + ".txt"
to specify a different folder. Make certain that the folder
where you plan to save the files exists before running the program.
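If you would rather have the program create the folder for you, something along the following lines might work. This is a sketch under stated assumptions, not part of Pop302: it assumes Java 7 or later for java.nio.file, and the class MailFolderSketch and its methods are hypothetical. Note that in the complete listing the variable uidl holds the entire one-line response from the server (for example "+OK 1 whqtswO00WBw418f9t5JxYwZ"), so this sketch also extracts the last token and replaces characters that could be awkward in a file name.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of Pop302.
class MailFolderSketch {
  // Takes a one-line UIDL response, keeps only the last
  // whitespace-separated token (the unique ID), and replaces any
  // character outside A-Z, a-z, 0-9, '.', '_' and '-' with '_'.
  static String safeName(String uidlResponse) {
    String[] parts = uidlResponse.trim().split("\\s+");
    String id = parts[parts.length - 1];
    return id.replaceAll("[^A-Za-z0-9._-]", "_");
  }

  // Creates the folder (and any missing parents) if necessary and
  // returns the path of the file in which to store the message.
  static Path messageFile(String folder, String uidlResponse)
      throws IOException {
    Path dir = Paths.get(folder);
    Files.createDirectories(dir);  // no error if the folder already exists
    return dir.resolve(safeName(uidlResponse) + ".txt");
  }
}
```

The returned path could then be passed to the FileOutputStream constructor via its toString or toFile method.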

(Once again, let me caution you not to enable the DELE code in
Listing 42 until you are certain that you actually want to delete
messages from the server.  Once a message is deleted from the server,
there is no way to recover it from the server.)
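The listing disables deletion by commenting out the DELE statements. As an illustration only, an alternative is to route every deletion through a single guarded method so that one flag controls the behavior. The class DeleteGuard, the constant DELETE_ENABLED, and the method markForDeletion are hypothetical names that do not appear in Pop302.

```java
import java.io.PrintWriter;

// Hypothetical guard, not part of Pop302.
class DeleteGuard {
  // Leave this false until you are certain that you want the program
  // to delete messages from the server.
  static final boolean DELETE_ENABLED = false;

  // Sends "DELE n" on the given stream only when enabled is true.
  // Returns true if the command was sent, false if it was suppressed.
  static boolean markForDeletion(PrintWriter out, int msgNumber,
                                 boolean enabled) {
    if (!enabled) {
      return false;  // nothing is sent to the server
    }
    out.println("DELE " + msgNumber);
    return true;     // caller should now read and validate the response
  }
}
```

A caller would pass DeleteGuard.DELETE_ENABLED as the last argument and read the server's response only when the method returns true. Keep in mind that DELE merely marks a message: as the comments in Listing 42 explain, the server removes marked messages only after it receives QUIT, and the RSET command listed in the POP3 command summary unmarks them.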

Summary

This lesson explains the communications module used to communicate with
your Email server, and to remove SPAM messages from the
server before they are downloaded into your primary Email client.

The program is designed to allow you to use my SPAM screening
algorithm, or to invent your own.  I will present the details of
my SPAM screening algorithm in the next lesson in the series.

The version of the program discussed in this lesson has a
stripped-down version of a class named Screen. This version of
the program makes it possible for you to test the communications
module on your system with your Email server without doing any
actual screening for SPAM.

The capability to actually delete messages from the server is disabled
in the version of the program shown in Listing 42.  You should not
enable that capability until you fully understand what you are doing
and you are certain that you really do want to delete messages from the
server.  Once a message is deleted from the server, it cannot be
recovered from the server.

What’s Next?

In the next lesson, I will present and explain my version of the
class named Screen.  This class contains my version of a
SPAM screening algorithm.  You may want to use my version, replace
my version with an algorithm of your own, or do some combination of the
two.
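To give a feel for what a replacement might look like, here is a deliberately simple sketch. It is not the algorithm that will be presented in the next lesson. The class name KeywordScreen is hypothetical, the ScreenResult class is repeated from the complete listing so that the example stands alone, and the constructor takes a list of phrases instead of a Pop302 reference (a real replacement would need to keep the constructor that Pop302 calls). The sketch assumes Java 7 or later. Because the saved files contain an asterisk at ten-character intervals, the sketch removes asterisks before comparing, which is also an assumption on my part.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

// Same fields as the ScreenResult class in the complete listing.
class ScreenResult {
  public String subject = "";
  public String from = "";
  public String thePhrase = "";
  public String text = "";
}

// Hypothetical screener: recommends deletion when the Subject line
// contains any of the given phrases, ignoring case and asterisks.
class KeywordScreen {
  private final String[] phrases;

  KeywordScreen(String... phrases) {
    this.phrases = phrases;
  }

  // Same parameter list and return type as screenMsg in the listing.
  public boolean screenMsg(String fileName, String uidl,
                           ScreenResult theResult) {
    theResult.subject = "No Subj line found";
    theResult.from = "No From line found";
    StringBuilder text = new StringBuilder();
    try (BufferedReader in =
             new BufferedReader(new FileReader(fileName))) {
      String data;
      while ((data = in.readLine()) != null) {
        text.append(data).append("\n");
        // Upper-case copy with the inserted asterisks removed.
        String upper = data.toUpperCase().replace("*", "");
        if (upper.startsWith("FROM:")) {
          theResult.from = data;
        } else if (upper.startsWith("SUBJECT:")) {
          theResult.subject = data;
          for (String phrase : phrases) {
            if (upper.contains(phrase.toUpperCase())) {
              theResult.thePhrase = phrase;
              // Text up to and including the offending line.
              theResult.text = text.toString();
              return true;
            }
          }
        }
      }
    } catch (IOException e) {
      e.printStackTrace();
    }
    theResult.text = text.toString();
    return false;  // no phrase found: do not recommend deletion
  }
}
```

A screener built this way returns true only when it is actually recommending deletion, which is the contract described in the comments of the complete listing.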

Complete Program Listing


A complete listing of the program follows in Listing 42.  Note
that this listing contains a stripped-down version of the class named
Screen.  The full version of the class named Screen will be provided in
the next lesson in this series.

Also, the three DELE statements shown in red in Listing 42
have been purposely disabled to prevent you from accidentally deleting
messages from your server while testing this program.

Do not enable these three statements until you are ready
to actually delete messages from the server.  Once a message is
deleted from the server, it cannot be recovered from the server.

Disclaimer of responsibility:  If you elect to use this program
you use it at your own risk.  Make absolutely certain that you
understand what you are doing before you execute the program.  The
author of this program, Richard G. Baldwin, accepts no responsibility
for any losses that you may incur as a result of using this program.

/*File Pop302.java Copyright 2004, R.G.Baldwin
Rev 01/01/04

Note: This version has a stripped down class
named Screen. This version allows testing of
the Email server communications without doing
any actual testing for SPAM.

The purpose of this program is to read messages
from a POP3 server, analyze the messages
according to screening rules, and delete those
messages from the server that fail the screening
test. (As written, the program asks the user
to confirm the deletion of each message, but
this confirmation step could easily be removed.)

This version of the program screens on the basis
of key words or phrases in the From line, key
words or phrases in the Subject line, and key
words or phrases in the body text.

A list of friendly Email addresses is used to
screen the From line. Messages that are from
friendly Email addresses are not deleted from
the server and no information about those
messages is saved on the local disk. They are
totally ignored after determining that they were
sent from a friendly Email address.

Different lists of words are used for screening
Subject lines and body text. For example,
ANTIVIRUS is appropriate for screening the
Subject line, but is not appropriate for
screening the body text. The word ANTIVIRUS
often appears legally in the header of Email
messages that have been scanned for viruses by
the server, but also often appears in the Subject
line of SPAM messages.

The common spammer tricks of inserting extra
characters between the characters in the
offending word and mixing the case of the
characters in the offending word are defeated by
this program.

For example, this program will flag for deletion
a message having any of the following in its
Subject line or its body text:

vIaGrA
V.IagRA
V.I.A.G.R.A

This program also defeats the common trick of
appending random characters to the end of the
Subject line, because it doesn't require a match
for the entire Subject line.

When the program detects a message that is a
candidate for deletion, the user is asked to
verify the deletion by clicking the Delete
button. If the user doesn't want to delete the
message, she should click the Start/Next
button.

The following information is available to the
user for making that decision:
- From
- Subject
- Offending line, which may also be the subject
- Offending word or phrase
- Entire raw text of the message up to and
including the offending line

All messages that are candidates for deletion
from the server are saved in an archive folder
on the local disk, regardless of whether the
user elects to delete them from the server. Thus
if a message is deleted from the server and it is
later determined that was a mistake, a raw text
copy of the deleted message is available locally
in the archive folder. You should probably empty
this folder periodically so that it won't fill
up your disk.

Except for friendly messages, all messages that
are not candidates for deletion from the server
are saved in a history folder on the local
disk. These messages can be used later to train
the program to do a better job of recognizing
SPAM.

Before any message is saved in a local file,
asterisks are inserted into the text on
ten-character intervals in an attempt to destroy
any virus code that may be embedded in the
message.

Numerous upgrades are possible. One possible
upgrade is to create a premium list of words and
phrases that will always result in deletion of
the message from the server without prior
approval by the user. For example, the user
might want to have any message containing
VIAGRA to be automatically deleted. However,
great care is urged in this regard. Certain
words such as SPAM and PORN occasionally occur
in a message with the letters separated by only
a few characters. This program would identify
those messages as being candidates for deletion.
For example, the offending word PORN occurs in
the non-offending word imPORtaNt with the letters
R and N separated by only two characters. The
word SLUT appears in the word SoLUTion with only
one character between the S and the L. The word
SPAM often occurs in different variations of
body text.

Another possible upgrade would be to allow the
user to specify the number of characters that may
occur between the letters of an offending word
or phrase. As programmed, that value is
hard-coded into the program, and as of this
writing, that value is one.

If the number of characters is set to zero, many
spam messages will avoid detection. If that
value is set to a large number, many false alarms
will occur. Therefore, care should be taken when
adjusting this value.

Another possible modification would be to allow
the program to automatically delete all
messages that are determined to be candidates
for deletion. Since these messages are saved
locally in an archive folder, a separate program
could be written to allow the user to review
those messages locally at her convenience just
in case a valid message was inadvertently
deleted from the server.

Companion programs that I have written provide
for creating and maintaining the word lists
discussed above in disk files. These programs
are used to analyze the non-deleted message files
saved locally in the history folder in order to
train this program to do a better job of
identifying SPAM messages in the future. These
programs are designed for ease of use to
encourage the user to train the program
frequently.

All three word lists are maintained in simple
text files, which can be edited with an
ordinary text editor if need be.

For technical information on POP3, see RFC 1725
at
http://www.cis.ohio-state.edu/htbin/rfc/rfc1725.html

A POP3 Command Summary follows based on the
information at that web site.

Minimal POP3 Commands:
USER name
PASS string
QUIT
STAT
LIST [msg]
RETR msg
DELE msg
NOOP
RSET
QUIT

Optional POP3 Commands:
APOP name digest
TOP msg n
UIDL [msg]

POP3 Replies:
+OK
-ERR

File names: The following file names are hard-
coded into the program:

The file name for a local copy of a message is
the unique identifier for that message obtained
from the mail server.

Pop302a.txt - contains a word list for screening
the Subject lines.

Pop302b.txt - contains a word list for screening
the body text lines.

Pop302c.txt - contains a list of friendly Email
addresses for screening the From lines to
identify friendly messages.

This program consists of two main classes. An
object of the class named Pop302 handles all
communications with the Pop3 server.

An object of the class named Screen screens each
message in an attempt to identify SPAM. This
class can be totally replaced by Java programmers
who wish to design their own screening algorithm
provided they maintain the interface with the
object of the class named Pop302.

Tested using SDK 1.4.2 under WinXP
************************************************/

import java.net.*;
import java.io.*;
import java.util.*;
import java.awt.*;
import java.awt.event.*;

class Pop302 extends Frame{
int msgCounter = 0;
int msgNumber;
TextArea textArea;
TextField subjField;
TextField fromField;
TextField operMsgField;
int numberMsgs = 0;
String uidl = "";//unique msg ID
BufferedReader inputStream;
PrintWriter outputStream;
Socket socket;
Screen screener;

public static void main(String[] args){
if(args.length != 3){
System.out.println("Usage: java Pop302 "
+ "server userName password");
System.exit(0);
}//end if

new Pop302(args[0],args[1],args[2]);
}//end main
//===========================================//

Pop302(String server,String userName,
String password){
//Instantiate a new Screen object and pass
// this to allow for the object to call back
// and update the progress indicator.
screener = new Screen(this);

int port = 110; //pop3 mail port
try{
//Get a socket, connected to the
// specified server on the specified
// port.
socket = new Socket(server,port);

//Get an input stream from the socket
inputStream = new BufferedReader(
new InputStreamReader(
socket.getInputStream()));

//Get an output stream to the socket.
// Note that this stream will autoflush.
outputStream = new PrintWriter(
new OutputStreamWriter(
socket.getOutputStream()),true);

//Display the msg received from the
// server on the command-line screen
// immediately following connection.
String connectMsg = validateOneLine();
System.out.println("Connected to server "
+ connectMsg);

//The communication process is now in the
// AUTHORIZATION state. Send the user
// name and password to the server. Note
// that the use of an APOP command
// for sending user name and password
// would probably be more secure
// if it is supported by the server.
// However, my server apparently doesn't
// support APOP.
//Commands are sent in plain text, upper
// case to the server. Some commands
// require an argument following the
// command, as is the case with USER.
//Send the command.
outputStream.println("USER " + userName);
//Get response and confirm that the
// response was +OK and was not -ERR.
String userResponse = validateOneLine();
//Display the response on the command-
// line screen. Cannot display in the
// GUI at this point in time because the
// GUI object is not ready for use at
// this point in the execution of the
// constructor.
System.out.println("USER " + userResponse);
//Send the password to the server
outputStream.println("PASS " + password);
//Validate the server's response as +OK.
// Display the response in the process.
System.out.println(
"PASS " + validateOneLine());
}catch(Exception e){e.printStackTrace();}

//Register a window listener to service
// the close button on the Frame. This is
// an anonymous class definition.
this.addWindowListener(
new WindowAdapter(){
public void windowClosing(WindowEvent e){

//Terminate the session with the
// server.
outputStream.println("QUIT");
String quitResponse =
validateOneLine();
//Display the response on the
// command-line screen.
System.out.println(
"QUIT " + quitResponse);
//Also display the response on the
// GUI. However, you probably won't
// see it because the GUI is
// closing.
textArea.append(quitResponse + "\n");

//Server is now in the UPDATE mode.
// It will delete all files marked
// with the DELE command earlier
// in the execution of the program.
//Close the socket
try{
socket.close();
}catch(Exception ex){
ex.printStackTrace();}

System.exit(0);
}//end windowClosing
}//end WindowAdapter()
);//end addWindowListener

//Note, this GUI was purposely made narrow
// in order to make it fit into the
// publication format. You should make
// it wider and also increase the width of
// the text fields and the TextArea defined
// below to make it more useful.
setLayout(new FlowLayout());
//Note that the compiler requires the
// references to the following buttons to
// be final because they are accessed from
// within an anonymous class definition.
final Button startButton =
new Button("Start/Next");
final Button deleteButton =
new Button("Delete");
subjField = new TextField(
"Display Subj here",50);
fromField = new TextField(
"Display From line here",50);
operMsgField = new TextField(
"Display operator messages here",50);
textArea = new TextArea(15,50);
textArea.append("Display raw data here\n");

//Register an ActionListener on the
// startButton. This is an anonymous
// class definition.
startButton.addActionListener(
new ActionListener(){
public void actionPerformed(
ActionEvent e){
//Clear the operator message field
operMsgField.setText("");

try{
//The communication process is now
// in the TRANSACTION state.
//Retrieve and screen messages
if(numberMsgs == 0){
//Calculate numberMsgs only at
// the beginning of the run,
// because it changes when
// messages are deleted.
outputStream.println("STAT");
String stat = validateOneLine();
//Get the number of messages as
// a String.
String numberMsgsStr =
stat.substring(
4,stat.indexOf(" ",5));
//Convert the String to an int.
numberMsgs = Integer.parseInt(
numberMsgsStr);
}//end if numberMsgs == 0
//NOTE: Msg numbers begin with 1,
// not 0.
//Retrieve and screen each
// message. Each msg ends with a
// period on a new line.
msgNumber = msgCounter + 1;

if(msgNumber <= numberMsgs){
//Process the next message.

//Get and save a unique identifier
// for the message from the server
// and validate the response.
outputStream.println(
"UIDL " + msgNumber);
uidl = validateOneLine();

//Open an output file to save
// the message. Use the UIDL
// as the file name. Others
// may need to modify the
// following code to identify
// a folder for local storage of
// the messages.
String fileName =
"c:/MailFiles/" + uidl +".txt";
DataOutputStream dataOut =
new DataOutputStream(
new FileOutputStream(
fileName));

//Send a RETR command to begin
// the message retrieval process
outputStream.println(
"RETR " + msgNumber);
//Validate the response.
String retrResponse =
validateOneLine();

//Clear the text in the TextArea
// at the beginning of each new
// message. If you don't do
// this, the String being
// displayed will become very
// long and the program will run
// very slowly for large numbers
// of messages.
textArea.setText("");

//Read the first line in the
// message from the server.
String msgLine =
inputStream.readLine();
//Insert asterisks in the text
// in an attempt to destroy
// viruses before the file is
// stored locally.
msgLine = insertStars(msgLine);

//Continue reading lines until
// a "." is encountered as the
// first char in a line. That
// signals the end of the msg.
while(!(msgLine.equals("."))){
//Write the line to the output
// file and read the next
// line. Insert newline
// characters when writing the
// output to the file.
dataOut.writeBytes(
msgLine + "\n");
msgLine = inputStream.readLine();
//Insert asterisks to destroy
// virus code.
msgLine = insertStars(msgLine);
}//end while
//Close the output file. The
// message is now stored in a
// local file with a file name
// based on the unique ID
// provided by the server. Note
// that a unique ID provided by
// one server may duplicate a
// unique ID provided by a
// different server.
dataOut.close();

//Now screen the file testing
// for reasons to delete the
// message from the server.
//First initialize the text showing
// in the various components in the
// GUI.
fromField.setText("Call screener");
subjField.setText("Call screener");
operMsgField.setText(
"Call screener");
textArea.setText(
"Progress Meter: ");
//Initialize the match flag
// to false.
boolean match = false;

//Now cause the message file to be
// screened. In the event that you
// decide to design your own
// screening algorithm, this is
// where you would probably
// make the first modification to
// the program. Your version of
// the method named screenMsg
// should return true if it is
// recommending that the message be
// deleted from the server. Also,
// the object of type ScreenResult
// passed as a parameter to the
// method should be populated with
// information to be displayed in
// the text fields and text area of
// the GUI.
ScreenResult theResult =
new ScreenResult();
match = screener.screenMsg(
fileName,uidl,theResult);

//Now display the information
// encapsulated in the ScreenResult
// object by the screenMsg method.
fromField.setText(theResult.from);
subjField.setText(
theResult.subject);
operMsgField.setText(
"Offending Phrase: "
+ theResult.thePhrase);
textArea.setText(theResult.text);
//Scroll the text area to the end
textArea.select(
theResult.text.length()-2,
theResult.text.length()-1);

//At this point, the user can
// view the From line and the
// Subject line for the message,
// the complete text of the message
// down to the line containing the
// offending word or phrase, as
// well as that word or phrase.

//Increment the message counter
// in preparation for
// processing the next message.
msgCounter++;

//A return value of true means that
// the screener is recommending
// deletion of the message from the
// Email server.
if(match == true){
//The message has been flagged
// as a candidate for deletion
// from the server. Return
// from the actionPerformed
// method and take no further
// action until the user
// presses the Delete button
// or the Start/Next button.
//Pressing the Delete button
// causes the message to be
// deleted from the server.
//Pressing the Start/Next
// button causes it to be
// preserved.
return;
}//end if match == true

//Control reaches this point only
// if match is not true.
//The message is not a
// candidate for deletion from
// the server.
//At this point, we could
// require the user to press
// the Start/Next button to
// process the next message.
//However, we won't do that. The
// following code fires an event
// identical to that which would
// be fired if the user pressed
// the Start/Next button.
Toolkit.getDefaultToolkit().
getSystemEventQueue().
postEvent(new ActionEvent(
startButton,
ActionEvent.
ACTION_PERFORMED,
"Start/Next"));
}//end if msgNumber <= numberMsgs
else{//msgNumber > numberMsgs
//No more messages. Disable the
//Start/Next button.
startButton.setEnabled(false);
//Instruct the user to terminate
// the program.
subjField.setText(
"No more messages, press Close");
fromField.setText(
"No more messages, press Close");
operMsgField.setText(
"No more messages, press Close");
textArea.setText(
"No more messages, press Close");
}//end else
}//end try
catch(Exception ex){
ex.printStackTrace();}
}//end actionPerformed
}//end ActionListener
);//end addActionListener

//Register an ActionListener on the Delete
// button to make it possible for the
// user to remove a message from the
// server.
deleteButton.addActionListener(
new ActionListener(){
public void actionPerformed(
ActionEvent e){
//Clear the operator message field
operMsgField.setText("");

//Deletion of a message from the
// server is accomplished by marking
// the message for deletion while in
// the TRANSACTION state. The
// message is actually deleted when
// the client sends a QUIT command
// to the server causing the server
// to enter the UPDATE state. If the
// program aborts prematurely before
// sending a QUIT command, marked
// messages are not deleted from the
// server.
//Mark the message for deletion.
//The following statement has been purposely
// disabled to prevent you from inadvertently
// deleting messages from your Email server. You
// should not enable this statement until you
// are confident that you really do want to
// delete messages from the server.
// outputStream.println(
// "DELE " + msgNumber);

//Validate the response and display
// it on the GUI. You probably won't
// see it on the GUI because of what
// happens next. The program
// immediately clears the display
// and begins processing the
// next message. If you modify the
// program to eliminate the clearing
// of the display between messages,
// you will see this response.
// textArea.append(
// "DELE "+validateOneLine()+"\n");
// textArea.append(
// "Deleted:" + msgNumber + "\n");

//Create and fire a synthetic event
// that simulates the user pressing
// the Start/Next button. This
// initiates the processing of the
// next message.
Toolkit.getDefaultToolkit().
getSystemEventQueue().
postEvent(new ActionEvent(
startButton,
ActionEvent.
ACTION_PERFORMED,
"Start/Next"));
}//end actionPerformed
}//end ActionListener
);//end addActionListener

//Configure the GUI by placing the
// various components on it, setting the size
// and making it visible.
add(startButton);
add(deleteButton);
add(fromField);
add(subjField);
add(operMsgField);
add(textArea);
setTitle("Copyright 2004, R.G.Baldwin");
//Increase the following parameters and
// modify the construction parameters for
// the text fields and the text area to
// increase the size of the GUI.
setSize(400,400);
//Make the GUI visible.
setVisible(true);
}//end constructor
//===========================================//

//Validate a one-line response.
//The purpose of this method is to confirm that
// the server returned +OK and not -ERR to the
// previous command.
//If +OK, the method returns the string
// returned by the server.
//If -ERR, the method displays the string
// returned by the server and terminates the
// session.
private String validateOneLine(){
try{
String response = inputStream.readLine();
if(response.startsWith("+OK")){
return response;
}else{
System.out.println(response);
//Terminate the session.
outputStream.println("QUIT");
socket.close();
System.out.println(
"Premature QUIT on -ERR");
System.exit(0);
}//end else
}catch(IOException e){e.printStackTrace();}
//The following return statement is required
// to satisfy the compiler.
return "Make compiler happy";
}//end validateOneLine()
//===========================================//

//Purpose of this method is to insert an
// asterisk (star) every tenth character in
// order to destroy virus code before it is
// written into the output file. While this
// makes the local version of the message
// harder to read, it does little to reduce its
// usefulness for computer analysis.
private String insertStars(String stringIn){
StringBuffer stringBuffer =
new StringBuffer(stringIn);
int length = stringBuffer.length();
for(int cnt = 9; cnt < length; cnt+=10){
stringBuffer.insert(cnt,'*');
}//end for loop
return new String(stringBuffer);
}//end insertStars
//===========================================//
}//end class Pop302
//=============================================//

//Class to encapsulate screening results. An
// object of this type is passed to the screenMsg
// method where it is populated with the results
// of the screen.
class ScreenResult{
public String subject = "";
public String from = "";
public String thePhrase = "";
public String text = "";
}//end ScreenResult
//=============================================//

//This is a stripped-down version of the class
// named Screen, designed solely to make it
// possible for you to test the class named
// Pop302 on your system with your Email server.
//Each time the method named screenMsg is called,
// its return value toggles between true and
// false.
class Screen{
boolean returnValue = true;

Screen(Pop302 dummy){//dummy constructor
}//end constructor

public boolean screenMsg(String fileName,
String uidl,ScreenResult theResult){
try{
BufferedReader inData
= new BufferedReader(new FileReader(
fileName));

theResult.subject = "No Subj line found";
theResult.from = "No From line found";
theResult.thePhrase = "No phrase "
+ "available for test program.";

String data;
inData.mark(10000);
while((data = inData.readLine()) != null){
if(data.startsWith("Subject:")){
theResult.subject = data.toUpperCase();
break;
}//end if
}//end while loop

inData.reset();
while((data = inData.readLine()) != null){
if(data.toUpperCase().
startsWith("FROM:")){
theResult.from = data;
break;
}//end if data starts with From
}//end while loop on null

inData.reset();
while((data = inData.readLine()) != null){
theResult.text =
theResult.text + data + "\n";
}//end while loop on read until null
inData.close();
}catch(Exception e){e.printStackTrace();}

//Toggle return value between true and false
if(returnValue == false){
returnValue = true;
}else{
returnValue = false;
}//end else
return returnValue;
}//end screenMsg
//===========================================//
}//End stripped-down Screen class

Listing 42
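The header comment in Listing 42 describes flagging an offending word even when extra characters are inserted between its letters or when the case is mixed. The stripped-down Screen class does none of that, and the actual algorithm is left for the next lesson. Purely as an illustration of the idea, the following sketch uses a regular expression that tolerates up to a given number of arbitrary characters between consecutive letters. The class GapMatcher and its methods are hypothetical, and the use of java.util.regex is my assumption, not necessarily the technique used in the full program.

```java
import java.util.regex.Pattern;

// Hypothetical illustration; not the algorithm from the next lesson.
class GapMatcher {
  // Builds a case-insensitive pattern that allows between zero and
  // maxGap arbitrary characters between consecutive letters of word.
  static Pattern build(String word, int maxGap) {
    StringBuilder regex = new StringBuilder();
    for (int i = 0; i < word.length(); i++) {
      if (i > 0) {
        regex.append(".{0,").append(maxGap).append("}");
      }
      // Quote each character so punctuation is matched literally.
      regex.append(Pattern.quote(String.valueOf(word.charAt(i))));
    }
    return Pattern.compile(regex.toString(), Pattern.CASE_INSENSITIVE);
  }

  // Returns true if the word occurs anywhere in the line, allowing
  // for the specified gap between letters.
  static boolean matches(String word, int maxGap, String line) {
    return build(word, maxGap).matcher(line).find();
  }
}
```

With a gap of one, all three VIAGRA variants shown in the header comment match, and so does SLUT inside "SoLUTion", which illustrates the false-alarm risk that the comment warns about. PORN inside "imPORtaNt" matches only when the gap is raised to two.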

 


Copyright 2004, Richard G. Baldwin.  Reproduction in whole or in
part in any form or medium without express written permission from
Richard Baldwin is prohibited.

About the author

Richard Baldwin
is a college professor (at Austin Community College in Austin, TX) and
private consultant whose primary focus is a combination of Java, C#,
and XML. In addition to the many platform and/or language independent
benefits of Java and C# applications, he believes that a combination of
Java, C#, and XML will become the primary driving force in the delivery
of structured information on the Web.

Richard has participated in numerous consulting projects, and he
frequently provides onsite training at the high-tech companies located
in and around Austin, Texas.  He is the author of Baldwin’s
Programming Tutorials, which
has gained a worldwide following among experienced and aspiring
programmers. He has also published articles in JavaPro magazine.

Richard holds an MSEE degree from Southern Methodist University
and has many years of experience in the application of computer
technology to real-world problems.

Baldwin@DickBaldwin.com

-end-
 
