Enlisting Java in the War Against SPAM: The Communications Module

  • January 6, 2004
  • By Richard G. Baldwin

Java Programming Notes # 2150


Preface

This is the first lesson in a series designed to teach you how to write a Java program to remove SPAM from your Email server before it is downloaded into your primary Email client.

The communications module

This lesson explains the communications module used to communicate with your Email server, and to remove SPAM messages from the server.

SPAM screening algorithm

The program is designed to allow you to use my SPAM screening algorithm, or to invent your own.  Subsequent lessons will explain the inner workings of my SPAM screening algorithm.  You can use my algorithm as a starting point if you decide to invent your own.  Those lessons will also explain how the system can be trained to do an increasingly better job of screening SPAM over time.

Viewing tip

You may find it useful to open another copy of this lesson in a separate browser window.  That will make it easier for you to scroll back and forth among the different listings and figures while you are reading about them.

Supplementary material

I recommend that you also study the other lessons in my extensive collection of online Java tutorials.  You will find those lessons published at Gamelan.com.  However, as of the date of this writing, Gamelan doesn't maintain a consolidated index of my Java tutorial lessons, and sometimes they are difficult to locate there.  You will find a consolidated index at www.DickBaldwin.com.

Preview

Can you write better SPAM screening algorithms?

Did you ever think that you might be able to write better SPAM screening algorithms than those available in the SPAM screening software that you are now using?  If so, this lesson is for you.

Even if that is not the case, like most of us, you are probably overwhelmed by SPAM and therefore you may find this lesson interesting.

Remove SPAM from the server

In this lesson, I will show you how to write a Java program that supplements the SPAM screening software that you are currently using.  This program is used to identify and remove SPAM from your Email server before it is downloaded into your primary Email client.

Any SPAM that makes it past this program can be further acted upon by the SPAM screener that is built into your Email client.

The communications module

This series will consist of four lessons.  This lesson, which is the first in the series, will explain the communications module used to communicate with your Email server, and to remove SPAM messages from the server.

As mentioned earlier, the program is designed to allow you to invent and implement your own SPAM screening algorithm in addition to, or as an alternative to my algorithm.

My algorithm and algorithm training programs

The second lesson will explain the inner workings of my SPAM screening algorithm.  My algorithm operates separately on the Subject line, the From line, and the body text of each Email message.

The third lesson will explain a companion program designed to make use of historical data to easily train the algorithm to do a better job of identifying SPAM based on the Subject of the message.

The fourth lesson will explain another companion program designed to make use of historical data to easily train the algorithm to do a better job of identifying SPAM based on the body text of the message, which includes the From line.

Effectiveness of my algorithm

At this point in time, after about one week of training, my algorithm reliably identifies about ninety percent of all SPAM and allows me to delete it from my Email server before downloading it into my primary Email client.  Only time will tell if that percentage improves in the future.

Discussion and Sample Code

Stripped down Screen class

The version of the program that I will discuss in this lesson has a stripped-down version of a class named Screen. This version of the program allows for testing the communications module on your system with your Email server without doing any actual screening for SPAM and without deleting any messages from the server.

I will explain the full version of the class named Screen in the next lesson when I explain my algorithm for identifying SPAM.

Purpose of the program

The purpose of this program is to read messages from a POP3 (Post Office Protocol - Version 3) server, to analyze the messages according to a set of screening rules, and to delete those messages from the server that fail the screening test.
(As written, the program asks the user to confirm the deletion of each message from the server, but this confirmation step could easily be removed if you decide to do so.)

Key words and phrases

This version of the program screens for SPAM on the basis of key words or phrases in the From line, key words or phrases in the Subject line, and key words or phrases in the body text.

Friendly Email addresses and subjects

A list of friendly Email addresses and friendly subjects is used to screen the From line and the Subject line.  Messages that are from friendly Email addresses, and messages that have known good Subject lines are not deleted from the server and no information about those messages is saved on the local disk. They are simply ignored after determining that they are friendly.

Different lists for Subject and body text

Different lists of words and phrases are used for screening Subject lines and body text for SPAM. This is important because the same set of words and phrases can't always be used for both cases.

For example, the word ANTIVIRUS is appropriate for screening the Subject line, but is not appropriate for screening the body text. The word ANTIVIRUS often appears legally in the header of Email messages that have been scanned for viruses by the server, but also often appears in the Subject line of SPAM messages.

Common spammer tricks are defeated

Several common spammer tricks are defeated by my SPAM screening algorithm.

For example, the common spammer trick of inserting extra characters between the characters in an offending word or phrase is defeated.  The common trick of mixing the case of the characters in an offending word or phrase is also defeated.

As a specific example, my algorithm will recommend deletion of any message having any of the following in its Subject line or its body text if the word VIAGRA is included in the lists used to screen for SPAM:

vIaGrA
V.IagRA
V.I.A.G.R.A

These two characteristics alone have a significantly positive impact on the effectiveness of training the algorithm to do a better job of identifying SPAM in the future.

My algorithm also defeats the common trick of appending random characters to the end of the Subject line, because it doesn't require a match for the entire Subject line.  Rather, it searches for words or phrases internal to the text of the Subject line.
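The matching behavior described above can be approximated with a regular expression. The following is only a minimal sketch and is not the Screen class from this series, whose inner workings are explained in the next lesson. The class name GapMatcher, the method name matches, and the maxGap parameter are hypothetical names chosen for illustration.

```java
import java.util.regex.Pattern;

/** Hypothetical helper for illustration; not the Screen class from this series. */
class GapMatcher {

    /**
     * Returns true if the characters of word occur in text, in order,
     * with at most maxGap arbitrary characters between neighboring
     * characters. Case is ignored.
     */
    static boolean matches(String word, String text, int maxGap) {
        StringBuilder regex = new StringBuilder();
        for (int i = 0; i < word.length(); i++) {
            if (i > 0 && maxGap > 0) {
                // Allow up to maxGap filler characters between letters.
                regex.append(".{0,").append(maxGap).append('}');
            }
            // Quote each character so regex metacharacters stay literal.
            regex.append(Pattern.quote(String.valueOf(word.charAt(i))));
        }
        Pattern pattern = Pattern.compile(regex.toString(),
                Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
        // find() searches inside the text, so random characters appended
        // to the end of a Subject line do not prevent a match.
        return pattern.matcher(text).find();
    }

    public static void main(String[] args) {
        String[] subjects = {"vIaGrA", "V.IagRA", "V.I.A.G.R.A"};
        for (String subject : subjects) {
            System.out.println(subject + " -> "
                    + matches("VIAGRA", subject, 1));
        }
    }
}
```

Because the search is for a pattern inside the line rather than for the whole line, one entry in the word list covers all three variations shown above.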

The user interface

Figure 1 shows the GUI through which the user controls the program.

Graphical user interface

Figure 1 Graphical User Interface
(Note that this GUI was purposely made narrow in order to cause it to fit into this narrow publication format.  I recommend that you increase the width of the Frame to at least 750 pixels, and increase the width of the TextField and TextArea objects to at least 100 characters each.)

The Offending Phrase

When the program identifies a message that is a candidate for deletion, the reason for that recommendation is shown in the third text field from the top in Figure 1.
(An actual SPAM message is being displayed in the GUI in Figure 1, but the stripped-down version of the class named Screen was being used, so no Offending Phrase is shown in Figure 1.)

Deleting a message from the server

The user confirms that the message should be deleted from the Server by clicking the Delete button in Figure 1. If the user doesn't want to delete the message, she should click the Start/Next button instead.
(Note that the capability to actually delete messages from the server was disabled in the program shown in Listing 42 near the end of this lesson.  Make certain that you are ready to actually delete messages from the server before re-enabling that capability.)

The Netscape approach to SPAM screening

I currently use Netscape version 7.1 as my Email client.  Basically, it provides two forms of SPAM screening.  One form, which is referred to as Junk Mail Controls, is apparently based on some sort of artificial intelligence.  This capability can be trained over time to identify the kinds of messages that you consider to be junk mail.  This capability is very easy to train.  However, it produces lots of false positives and is very difficult to un-train when that happens.  (I will have more to say about false positives later.)

The other form of SPAM screening used by Netscape 7.1 is referred to as Message Filters.  This approach depends on exact character matching in the subject, the body, or in other parts of the message, such as sender, date, priority, etc.

In this case, you must enter the exact words or phrases into a form that will be used for matching purposes.  This approach is practically useless for SPAM screening due to the tendency of spammers to insert random characters into the offending words and phrases and to randomly modify the case of the characters in offending words and phrases.  Also, the process of entering the words and phrases into the form is very tedious and time consuming.  I long ago gave up on using Netscape's Message Filters for SPAM filtering.

False positives

All SPAM screening algorithms are subject to reporting false positives to some degree.  That is to say, a message may be erroneously identified as SPAM when it is actually a good message.

One of the major problems with my Netscape 7.1 system results from false positives.  Because of the high rate of false positives produced by Junk Mail Controls, whenever a message is identified as SPAM, I must confirm that it is SPAM before deleting it.  At that point in time, unless I am willing to actually open the message and to be confronted with a variety of offensive images and other offensive material, I must make my decision solely on the basis of the subject and the from address information.  Often this is not sufficient information to make an informed decision and I have no choice but to open the message.

Also, as I mentioned earlier, when Junk Mail Controls does report a false positive, there is no definitive way to make certain that it doesn't happen again in the future.  It is necessary to un-train the algorithm regarding messages of that type, which can be a long process, possibly involving many similar occurrences in the future.

More information is available with my system

When a user of my system is required to confirm deletion of a message from the server, the following information is available to assist in the making of the decision:
  • From line
  • Subject line
  • Offending line of text, which may or may not be the subject
  • Offending word or phrase in the offending line of text
  • Entire raw text of the message down to and including the offending line
No images are rendered in my system, so it is not necessary for the user to view offending images in order to make the decision to delete.

Having viewed the above information, if the user is still unable to make an informed decision to delete the message, the user still has the option to let the message pass through and be downloaded into the primary Email client.  After viewing the message later in the primary Email client, the user still has the option of updating the offending word lists in my system with IP addresses, URLs, etc., so that deletion decisions on future similar messages will be easier to make.

Saved in local archive folder

The raw text of every message that is identified as a candidate for deletion from the server is saved in an archive folder on the local disk, regardless of whether the user elects to delete it from the server or not. Thus, if a message is deleted from the server and it is later determined that this was a mistake, a raw text copy of the deleted message is available locally in the archive folder.
(You should probably empty this folder periodically so that it won't fill up your disk.)

Saved in history folder

Except for messages from friendly Email addresses or messages with friendly Subject lines, all messages that are not identified as candidates for deletion from the server are saved in a history folder on the local disk.  These messages are used later to train the algorithm to do a better job of identifying SPAM in the future.  I will explain this process in Part 3 and Part 4 of this series of lessons.
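The three outcomes described so far (ignore, archive, history) can be summarized in a few lines of code. This is only a sketch of the decision as described in the text. The names MessageRouting, Disposition, and route are hypothetical, and the sketch assumes that the friendly test takes precedence over the SPAM test.

```java
/** Hypothetical summary of where a message's raw text ends up locally. */
class MessageRouting {

    enum Disposition { IGNORE, ARCHIVE, HISTORY }

    /**
     * friendly: the From or Subject line matched the friendly list.
     * deletionCandidate: the screen recommended deleting the message.
     */
    static Disposition route(boolean friendly, boolean deletionCandidate) {
        if (friendly) {
            // Assumption: friendly messages are ignored before any
            // SPAM screening result is considered.
            return Disposition.IGNORE;
        }
        // Candidates go to the archive folder whether or not the user
        // deletes them; everything else goes to the history folder.
        return deletionCandidate ? Disposition.ARCHIVE : Disposition.HISTORY;
    }
}
```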

Protection against viruses

Before any message is saved in a local file, asterisks are inserted into the text on ten-character intervals in an attempt to destroy any virus code that may be embedded in the message.
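As an illustration of the idea, the following sketch inserts an asterisk after every tenth character of a string. It is not taken from the program in Listing 42. The class name VirusDefang and the method name insertAsterisks are hypothetical, and the exact placement of the asterisks in the real program may differ.

```java
/** Hypothetical helper for illustration only. */
class VirusDefang {

    /** Returns raw with an asterisk inserted after every interval characters. */
    static String insertAsterisks(String raw, int interval) {
        StringBuilder out = new StringBuilder(
                raw.length() + raw.length() / interval + 1);
        for (int i = 0; i < raw.length(); i++) {
            out.append(raw.charAt(i));
            // After the 10th, 20th, 30th ... character, add an asterisk.
            if ((i + 1) % interval == 0) {
                out.append('*');
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(insertAsterisks("0123456789ABCDEFGHIJKLMNO", 10));
    }
}
```

Note that the saved copy is no longer byte-for-byte identical to the message on the server, which is the point of the exercise.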

If a message makes it through the screen and is later identified as having a virus as an attachment, a series of ten or more bytes can be extracted from the virus code and added to the word list as an offending phrase.  This will cause any future messages having that same virus code as an attachment to be identified as a candidate for deletion from the server.

Possible upgrades

Numerous upgrades to my system are possible and I'm confident that you will have ideas that I haven't thought of.  If so, I would like to hear about them.

One possible upgrade would be to create a premium list of words and phrases that will always result in automatic deletion of the message from the server without prior confirmation by the user. For example, the user might want any message containing the word VIAGRA to be deleted automatically.

Be careful with this

However, care is urged in this regard. Certain words such as SPAM and PORN occasionally occur in a message with the letters separated by only a few characters.  Depending on the degree of separation, my algorithm may identify those messages as being candidates for deletion.

For example, the offending word PORN occurs in the non-offending word imPORtaNt with the letters R and N separated by only two characters. The word SLUT appears in the word SoLUTion with only one character between the S and the L. The word SPAM often occurs in different variations of body text.

If such a premium word list is used for automatic deletion, it should probably be restricted to only those situations where the characters in the word exactly match (except for case) a word in the subject or the body of the message with no intervening characters separating the characters in the message.  Experience shows, however, that very few matches would be made on this basis, so it may not be worth the effort.

Number of separation characters

Another possible upgrade would be to allow the user to specify the number of characters that may occur between the letters of an offending word or phrase in the message.

That value is currently hard-coded into the program.  As of this writing, that value is set to one for screening against offending words or phrases.  The value is set to zero when testing for friendly Email addresses in the From line and known good data in the Subject line.

If the number of characters is set to zero, many spam messages with offending words or phrases will avoid detection. If that value is set to a large number, many false positives will occur. Therefore, care should be taken when adjusting this value.
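To get a feel for this tradeoff, the following sketch reports the smallest separation value at which a word would be found in a piece of text. It is an illustration only and does not reproduce the matching code in the Screen class. The names GapTuning, matches, minGap, and limit are hypothetical.

```java
import java.util.regex.Pattern;

/** Hypothetical experiment for choosing the separation value. */
class GapTuning {

    /** True if word occurs in text with at most maxGap characters between letters. */
    static boolean matches(String word, String text, int maxGap) {
        StringBuilder regex = new StringBuilder();
        for (int i = 0; i < word.length(); i++) {
            if (i > 0 && maxGap > 0) {
                regex.append(".{0,").append(maxGap).append('}');
            }
            regex.append(Pattern.quote(String.valueOf(word.charAt(i))));
        }
        return Pattern.compile(regex.toString(),
                Pattern.CASE_INSENSITIVE | Pattern.DOTALL)
                .matcher(text).find();
    }

    /** Smallest gap from 0 to limit at which word matches text, or -1 if none. */
    static int minGap(String word, String text, int limit) {
        for (int gap = 0; gap <= limit; gap++) {
            if (matches(word, text, gap)) {
                return gap;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(minGap("PORN", "This is imPORtaNt", 5));
        System.out.println(minGap("SLUT", "A SoLUTion", 5));
        System.out.println(minGap("VIAGRA", "V.I.A.G.R.A", 5));
    }
}
```

Running an experiment like this against the messages in the history folder is one way to see how many good messages a larger value would flag.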

Automatic deletion of all SPAM candidates

For the brave among us, another possible modification would be to allow the program to automatically delete all messages that are determined to be candidates for deletion.

Since a text version of each of these messages is saved locally in an archive folder, a separate program could be written to allow the user to review those messages locally at her convenience, just in case a valid message was inadvertently deleted from the server.

Training programs

Companion programs that I have written provide for maintaining and upgrading the offending word and phrase lists.  These lists are saved in local text files.

These training programs are used to analyze the non-deleted message files saved locally in the history folder in order to train the algorithm to do a better job of identifying SPAM messages in the future.

These programs are designed for extreme ease of use to encourage the user to train the algorithm frequently.  The better the algorithm is trained, the better it will perform.

I will explain these training programs in Part 3 and Part 4 of this series of lessons.

Simple text files

All three word lists are maintained in local text files, which can be created and edited with an ordinary text editor if need be.  Thus, if some corruption gets into one of the word lists, it is easy to correct the situation using an ordinary text editor.

Technical information on POP3 protocol

For technical information on the POP3 protocol, see http://www.cis.ohio-state.edu/htbin/rfc/rfc1725.html.  I will frequently refer to this document as the technical document in the discussion that follows.

Command summary

A POP3 command summary based on the technical document is shown in Figure 2.

Minimal POP3 Commands:
USER name
PASS string
QUIT
STAT
LIST [msg]
RETR msg
DELE msg
NOOP
RSET
QUIT

Optional POP3 Commands:
APOP name digest
TOP msg n
UIDL [msg]

POP3 Replies:
+OK
-ERR
Figure 2

This program uses the commands that are highlighted in red in Figure 2.  I will explain those commands in conjunction with the code that uses them.

File names

The following file names are hard-coded into the program.  You may want to change these file names for your version of the program.
  • Local copy - the file name for a local copy of each message is based on the unique identifier for that message (UIDL) obtained from the mail server.
  • Pop302a.txt - contains a word list for screening the Subject lines for offensive words and phrases.
  • Pop302b.txt - contains a word list for screening the body text lines for offensive words and phrases.
  • Pop302c.txt - contains a list of friendly Email addresses and subjects for screening the From and Subject lines to identify friendly messages.

Three classes

This program consists of two main classes and one minor class. An object of the class named Pop302 handles all communications with the POP3 server.

A method belonging to an object of the class named Screen is used to screen each message in an attempt to identify SPAM.

This class can be totally replaced by Java programmers who wish to design their own screening algorithm provided that they maintain the interface with the object of the class named Pop302.

An object of a very simple class named ScreenResult is used as a wrapper to return several items of information from the screening method.

Testing

The program was tested using SDK 1.4.2 under WinXP in conjunction with two different POP3 Email servers.

The class named Pop302

As mentioned earlier, an object of the class named Pop302 handles all communications with the Email server, including the deletion of messages from the server.  An object of the class named Screen applies screening rules in an attempt to identify SPAM.

Stripped-down version of the Screen class

I will explain the class named Pop302 in this lesson, and will explain the class named Screen in the next lesson.

However, I will provide a stripped-down version of the Screen class in this lesson.  You can use the stripped-down version to test Pop302 on your system with your Email server, but no actual screening for SPAM will take place.

Will discuss in fragments

I will discuss the program in fragments.  A complete listing of the program is provided in Listing 42 near the end of the lesson.  You should be able to copy and paste that listing into your Java IDE to compile and test the program on your system.

Instance variables

The Pop302 class begins in Listing 1 with the declaration of several instance variables.  The purpose of these variables will become clear when I discuss them in conjunction with their use.

class Pop302 extends Frame{
  int msgCounter = 0;
  int msgNumber;
  TextArea textArea;
  TextField subjField;
  TextField fromField;
  TextField operMsgField;
  int numberMsgs = 0;
  String uidl = "";//unique msg ID
  BufferedReader inputStream;
  PrintWriter outputStream;
  Socket socket;
  Screen screener;

Listing 1

As you can see, Pop302 extends Frame.  Therefore, an object of the class Pop302 is a GUI.

The main method


The main method is shown in its entirety in Listing 2.

  public static void main(String[] args){
    if(args.length != 3){
      System.out.println("Usage: java Pop302 "
                   + "server userName password");
      System.exit(0);
    }//end if

    new Pop302(args[0],args[1],args[2]);
  }//end main

Listing 2

When you start this program running, you need to provide the following information regarding your Email server as command line parameters in the order shown:
  • server
  • user name
  • password
The main method then instantiates an object of the Pop302 class, passing this information as parameters to the constructor.

The constructor

The Pop302 class consists mainly of the constructor plus a couple of helper methods.  The constructor code begins in Listing 3.

  Pop302(String server,String userName,
                               String password){
    screener = new Screen(this);

Listing 3

The constructor begins by instantiating an object of the class named Screen, passing a reference to the Pop302 object as a parameter.

Code in the Screen class uses this reference later to display a progress indicator in the third text field in Figure 1.
(Note that the stripped-down version of the Screen class discussed in this lesson doesn't display the progress indicator.  You will have to wait until the next lesson to see that code.)

Get a socket

The code in Listing 4 instantiates a new Socket object on the standard port for POP3 servers.

    int port = 110; //pop3 mail port
    try{
      socket = new Socket(server,port);

Listing 4

When the constructor for the Socket class returns successfully, a TCP/IP connection will have been made with port 110 on the Email server identified as server.

If the attempt to make the connection fails, the program will throw an exception.  For example, if the value of server is invalid, the program will throw an UnknownHostException.
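The Socket constructor used in Listing 4 blocks until the connection succeeds or the operating system gives up. If that is a concern, one alternative is to connect with an explicit timeout. The following is a hedged sketch of that alternative and is not part of the program in Listing 42. The class name Pop3Connect and the methods connect and demoLoopback are hypothetical, and the loopback demonstration stands in for a real Email server.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

/** Hypothetical alternative to new Socket(server, port). */
class Pop3Connect {

    /** Opens a TCP connection, giving up after timeoutMillis. */
    static Socket connect(String server, int port, int timeoutMillis)
            throws IOException {
        Socket socket = new Socket();
        try {
            // Throws UnknownHostException for an unresolvable name and
            // SocketTimeoutException if the timeout expires.
            socket.connect(new InetSocketAddress(server, port),
                    timeoutMillis);
            // Also bound the time that later reads may block.
            socket.setSoTimeout(timeoutMillis);
            return socket;
        } catch (IOException e) {
            socket.close();
            throw e;
        }
    }

    /** Connects to a throwaway local listener and returns its greeting. */
    static String demoLoopback() {
        InetAddress local = InetAddress.getLoopbackAddress();
        try (ServerSocket listener = new ServerSocket(0, 1, local);
             Socket client = connect(local.getHostAddress(),
                     listener.getLocalPort(), 2000);
             Socket serverSide = listener.accept()) {
            // Play the role of the server: send a greeting line.
            PrintWriter out =
                    new PrintWriter(serverSide.getOutputStream());
            out.print("+OK demo server ready\r\n");
            out.flush();
            BufferedReader in = new BufferedReader(new InputStreamReader(
                    client.getInputStream(), StandardCharsets.US_ASCII));
            return in.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demoLoopback());
    }
}
```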
(If you are unfamiliar with socket programming in Java, see the lessons beginning with number 550 at www.DickBaldwin.com.)

Ready to communicate

At this point, the Email server is ready to communicate using the POP3 protocol.  In order to communicate, the program must be able to send messages to the server and read messages that are sent from the server.

Input and output streams

The code in Listing 5 gets input and output streams on the Socket object that make it possible to send messages to the server and to read messages sent from the server.

      inputStream = new BufferedReader(
                      new InputStreamReader(
                        socket.getInputStream()));

      outputStream = new PrintWriter(
                       new OutputStreamWriter(
                         socket.getOutputStream()),true);

Listing 5

The code in Listing 5 is straightforward and shouldn't require further explanation.  If you are unfamiliar with this code, see the lessons on socket programming and input/output at www.DickBaldwin.com.

Basic POP3 operation

The following is a quotation from the technical document referred to earlier:
"Initially, the server host starts the POP3 service by listening on TCP port 110. When a client host wishes to make use of the service, it establishes a TCP connection with the server host. When the connection is established, the POP3 server sends a greeting. The client and POP3 server then exchange commands and responses (respectively) until the connection is closed or aborted."
The document goes on to explain:
"Commands in the POP3 consist of a keyword, possibly followed by one or more arguments. All commands are terminated by a CRLF pair. Keywords and arguments consist of printable ASCII characters. Keywords and arguments are each separated by a single SPACE character. Keywords are three or four characters long. Each argument may be up to 40 characters long."
Finally, the document tells us:
"Responses in the POP3 consist of a status indicator and a keyword possibly followed by additional information. All responses are terminated by a CRLF pair. There are currently two status indicators: positive ("+OK") and negative ("-ERR")."
The greeting

That brings us to the greeting mentioned above.

The code in Listing 6 gets and displays the greeting received from the Email server.  In the process, the code in Listing 6 invokes the method named validateOneLine to confirm that the message received from the Email server begins with +OK, and not with -ERR.

      String connectMsg = validateOneLine();
      System.out.println("Connected to server "
                                   + connectMsg);

Listing 6

(If the response begins with -ERR, the program terminates the communication session with the server, prints an error message, and terminates.)

The validateOneLine method

The code in Listing 6 invokes the method named validateOneLine to get and validate the message sent by the server.  At this point, I am going to set the discussion of the constructor aside for a moment and discuss the method named validateOneLine.

The validateOneLine method begins in Listing 7.

  private String validateOneLine(){
    try{
      String response = inputStream.readLine();
      if(response.startsWith("+OK")){
        return response;
      }//end if

Listing 7

The method begins by reading a line of text sent by the server and confirming that the text begins with +OK.  If so, the method simply returns that line of text as a String object, where it is displayed by the second statement in Listing 6.

If -ERR is received

If the received line of text does not begin with +OK, it must begin with -ERR, which is the only other possibility allowed by the protocol.

Listing 8 shows the behavior of the validateOneLine method when the received line of text does not begin with +OK.

      else{
        System.out.println(response);
        //Terminate the session.
        outputStream.println("QUIT");
        socket.close();
        System.out.println(
                       "Premature QUIT on -ERR");
        System.exit(0);
      }//end else
    }catch(IOException e){e.printStackTrace();}
    //The following return statement is required
    // to satisfy the compiler.
    return "Make compiler happy";
  }//end validateOneLine()

Listing 8

In this case, the method:
  • Displays the line of text that was received.
  • Sends a QUIT command to the server to terminate the session.
  • Closes the socket.
  • Prints an error message.
  • Terminates the program.
As you will see later, this method is invoked at numerous places in the program to get and validate a server response to commands sent to the server by the program.
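One detail is worth noting about the code in Listing 7. The readLine method returns null if the server closes the connection, and in that case the call to startsWith would throw a NullPointerException, which the catch clause for IOException does not handle. The following sketch shows one possible variation that checks for null and reports failures with an exception instead of calling System.exit. It is not the method used in Listing 42, and the names Pop3Validator, Pop3Exception, and readOkLine are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;

/** Hypothetical variation on validateOneLine, for illustration only. */
class Pop3Validator {

    /** Signals a -ERR reply or a connection closed by the server. */
    static class Pop3Exception extends RuntimeException {
        Pop3Exception(String message) {
            super(message);
        }
    }

    /** Reads one line and returns it if it begins with +OK. */
    static String readOkLine(BufferedReader in) {
        String response;
        try {
            response = in.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        if (response == null) {
            // End of stream: the server closed the connection.
            throw new Pop3Exception("Connection closed by server");
        }
        if (!response.startsWith("+OK")) {
            // Let the caller decide whether to send QUIT and exit.
            throw new Pop3Exception(response);
        }
        return response;
    }

    public static void main(String[] args) {
        BufferedReader fake = new BufferedReader(
                new StringReader("+OK User name accepted\r\n"));
        System.out.println(readOkLine(fake));
    }
}
```

With this design, the caller is responsible for sending QUIT and closing the socket when an exception is thrown, which is the cleanup that Listing 8 performs inline.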

The greeting

The greeting sent by one of my Email servers is shown in Figure 3.

+OK POP3 server1.yohance.com v2001.78rh
server ready

Figure 3
(The actual text in the greeting will vary from one Email server to the next.

Note that I manually inserted a line break immediately following 78rh in Figure 3 to force the greeting to fit in this narrow publication format.)

The AUTHORIZATION state

The following is a quotation from the technical document mentioned earlier:
"A POP3 session progresses through a number of states during its lifetime. Once the TCP connection has been opened and the POP3 server has sent the greeting, the session enters the AUTHORIZATION state. In this state, the client must identify itself to the POP3 server."
Returning to the constructor

At this point, the greeting has been received, and the POP3 session is in the AUTHORIZATION state.  It is now time for the program to send the user name and the password to the server.

Commands are sent to the server as plain text, in upper case.  Some commands require an argument following the command, as is the case with the USER command shown in Listing 9.

      //Send the command
      outputStream.println("USER " + userName);

      //Get and validate response
      String userResponse = validateOneLine();

      //Display the response
      System.out.println("USER " + userResponse);

Listing 9

The code in Listing 9 produces the output shown in Figure 4 on my system with my Email server.  (The response from your Email server may differ.)
 
USER +OK User name accepted, password please
Figure 4
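A portability note applies to the println call in Listing 9. The technical document states that all commands are terminated by a CRLF pair, but println writes the line separator of the platform on which the program is running, which is CRLF on Windows and a single line feed on most other systems. If you plan to run the program elsewhere, one option is to write the terminator explicitly. The following is a minimal sketch of that idea. The class name Pop3Commands and the methods format and send are hypothetical and do not appear in Listing 42.

```java
import java.io.PrintWriter;
import java.io.StringWriter;

/** Hypothetical helper that always terminates commands with CRLF. */
class Pop3Commands {

    /** Builds a command line: keyword, space-separated arguments, CRLF. */
    static String format(String keyword, String... args) {
        StringBuilder line = new StringBuilder(keyword);
        for (String arg : args) {
            line.append(' ').append(arg);
        }
        return line.append("\r\n").toString();
    }

    /** Writes the command and flushes so it is sent immediately. */
    static void send(PrintWriter out, String keyword, String... args) {
        // print, not println, so no platform line separator is added.
        out.print(format(keyword, args));
        out.flush();
    }

    public static void main(String[] args) {
        StringWriter captured = new StringWriter();
        send(new PrintWriter(captured), "USER", "someUser");
        // Show the control characters that were written.
        System.out.println(captured.toString()
                .replace("\r", "\\r").replace("\n", "\\n"));
    }
}
```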

The APOP command


There is an optional APOP command, which avoids sending the password to the server in clear text.  Instead of the password, the client sends the user name along with an MD5 digest computed from a timestamp in the server's greeting and a secret shared with the server.  The use of the APOP command would be more secure than the approach shown in Listing 9 and Listing 10.  However, this command is not supported by all Email servers, and apparently is not supported by my server.
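For readers whose servers do support APOP, the following sketch shows how the digest argument could be computed. It assumes the definition in the technical document: the digest is the MD5 hash of the timestamp from the greeting (including the angle brackets) followed by the shared secret, written as lower-case hexadecimal. The class name ApopDigest and its methods are hypothetical, and this code is not part of the program in Listing 42.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical helper for building an APOP command. */
class ApopDigest {

    /** MD5 of timestamp followed by sharedSecret, as 32 hex digits. */
    static String digest(String timestamp, String sharedSecret) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            byte[] hash = md5.digest((timestamp + sharedSecret)
                    .getBytes(StandardCharsets.US_ASCII));
            StringBuilder hex = new StringBuilder(32);
            for (byte b : hash) {
                // Mask to an unsigned value, then print two hex digits.
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    /** Builds the command text, without the line terminator. */
    static String command(String name, String timestamp, String secret) {
        return "APOP " + name + " " + digest(timestamp, secret);
    }

    public static void main(String[] args) {
        // The timestamp would be copied from the server's greeting.
        System.out.println(command("mrose",
                "<1896.697170952@dbc.mtview.ca.us>", "tanstaaf"));
    }
}
```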

Send the password

The code in Listing 10 sends the password, validates the response, and displays the response.

      //Send the password to the server
      outputStream.println("PASS " + password);

      //Validate and display response
      System.out.println(
                    "PASS " + validateOneLine());
    }catch(Exception e){e.printStackTrace();}

Listing 10

The code in Listing 10 produces the output shown in Figure 5.
 
PASS +OK Mailbox open, 7 messages
Figure 5
(Obviously the number of messages available will vary from one run to the next.)

The TRANSACTION state

Returning now to the technical document, we find:
"... the client must identify itself to the POP3 server. Once the client has successfully done this, the server acquires resources associated with the client's maildrop, and the session enters the TRANSACTION state. In this state, the client requests actions on the part of the POP3 server."
Having received the +OK response shown in Figure 5, our POP3 session is now in the TRANSACTION state.

The QUIT command and the UPDATE state

We find the following information in the technical document:
"When the client has issued the QUIT command, the session enters the UPDATE state. In this state, the POP3 server releases any resources acquired during the TRANSACTION state and says goodbye. The TCP connection is then closed."
Terminating the POP3 session

We are still discussing the constructor.  Listing 11 shows the code used to register a WindowListener object on the close button on the Frame.  The purpose of this listener is to terminate the POP3 session and to terminate the program when the user presses the close button.

    this.addWindowListener(
      new WindowAdapter(){
        public void windowClosing(WindowEvent e){

          //Terminate the session
          outputStream.println("QUIT");

          //Validate and display response
          String quitResponse =
                            validateOneLine();
          System.out.println(
                        "QUIT " + quitResponse);
          textArea.append(quitResponse + "\n");

          //Close the socket
          try{
            socket.close();
          }catch(Exception ex){
            ex.printStackTrace();}

          //Terminate the program
          System.exit(0);
        }//end windowClosing
      }//end WindowAdapter()
    );//end addWindowListener

Listing 11

(Note that the code in Listing 11 is an anonymous class definition.  If you are unfamiliar with anonymous class definitions in Java, you can learn about them by studying the tutorial lessons at www.DickBaldwin.com.)

The windowClosing method

By defining the windowClosing method in the anonymous class, the code in Listing 11:
  • Sends a QUIT command to the server.
  • Validates and displays the response.
  • Closes the socket.
  • Terminates the program.

The goodbye message from the server

In addition to displaying the response on the command-line screen, the code in Listing 11 also displays it in the large text area in Figure 1.  However, you will have to look very quickly to see it there before the GUI disappears.

The response provided by my server is shown in Figure 6.
 
QUIT +OK Sayonara
Figure 6

The UPDATE state

At this point, the POP3 session is in the UPDATE state.  Among other things, this means that the server will delete all of the messages that were marked for deletion by the DELE command while the session was in the TRANSACTION state.

Here is some of what the technical document has to say about the UPDATE state:
"When the client issues the QUIT command from the TRANSACTION state, the POP3 session enters the UPDATE state. (Note that if the client issues the QUIT command from the AUTHORIZATION state, the POP3 session terminates but does NOT enter the UPDATE state.)

If a session terminates for some reason other than a client-issued QUIT command, the POP3 session does NOT enter the UPDATE state and MUST not remove any messages from the maildrop.

The POP3 server removes all messages marked as deleted from the maildrop. It then releases any exclusive-access lock on the maildrop and replies as to the status of these operations. The TCP connection is then closed."
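
The state rules quoted above can be modeled in a few lines of Java.  The following is only an illustrative sketch: the class Pop3StateSketch and its method names are hypothetical and are not part of the Pop302 program.  It shows that a QUIT command leads to the UPDATE state only when it is issued from the TRANSACTION state.

```java
/** Hypothetical model of the POP3 session states described in the quoted text. */
final class Pop3StateSketch {

  enum State { AUTHORIZATION, TRANSACTION, CLOSED }

  private State state = State.AUTHORIZATION;

  State getState() {
    return state;
  }

  /** A successful login moves the session into the TRANSACTION state. */
  void loginSucceeded() {
    if (state == State.AUTHORIZATION) {
      state = State.TRANSACTION;
    }
  }

  /**
   * The client sends QUIT.  Returns true only if the session passes
   * through the UPDATE state, which is when marked messages are removed.
   */
  boolean quit() {
    boolean entersUpdate = (state == State.TRANSACTION);
    state = State.CLOSED;
    return entersUpdate;
  }

  /** The connection is lost without QUIT: UPDATE is never entered. */
  boolean connectionDropped() {
    state = State.CLOSED;
    return false;
  }

  public static void main(String[] args) {
    Pop3StateSketch session = new Pop3StateSketch();
    session.loginSucceeded();
    System.out.println(
        "QUIT after login enters UPDATE: " + session.quit());

    Pop3StateSketch early = new Pop3StateSketch();
    System.out.println(
        "QUIT before login enters UPDATE: " + early.quit());
  }
}
```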
Defining the GUI

Note that the GUI shown in Figure 1 was purposely made narrow so that it would fit into this narrow publication format.  However, it is much more useful if it is wide enough to display each text line in the message in its entirety without a requirement for horizontal scrolling.  Therefore, I recommend that you resize the GUI to make it at least 750 pixels wide.  I also recommend that you make each of the text fields and the text area at least 100 characters wide.

Set the layout

Listing 12 sets the GUI layout to FlowLayout.  Although this isn't very fancy, it works pretty well in this case.

    setLayout(new FlowLayout());

Listing 12

Construct GUI components

Listing 13 constructs the two buttons, the three text fields, and the text area shown in Figure 1.

    final Button startButton =
        new Button("Start/Next");
    final Button deleteButton =
        new Button("Delete");
    fromField = new TextField(
        "Display From line here",50);
    subjField = new TextField(
        "Display Subj here",50);
    operMsgField = new TextField(
        "Display operator messages here",50);
    textArea = new TextArea(15,50);

    //Display initial message
    textArea.append("Display raw data here\n");

Listing 13

No labels are provided

In order to preserve real estate on the screen, I did not provide labels to identify the text fields in Figure 1.  Rather, when the text fields are instantiated, the initial text showing in each text field indicates its purpose.  For example, the initial text that appears in the topmost text field is "Display From line here".

The last statement in Listing 13 also displays the purpose of the text area in the text area when it first appears on the screen.

Not yet added to the GUI

Note that at this point, the GUI components have been constructed, but have not yet been placed in the GUI.  This will be taken care of later.

References to buttons are final

Note also that it is necessary to declare the references to the two Button objects to be final, because they are accessed later from within an anonymous class definition.  Local and anonymous classes can access local variables only if they are declared final.
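
The rule about final local variables can be seen in isolation in the following minimal sketch.  The class name FinalLocalSketch is hypothetical, and a Runnable stands in for the ActionListener so that the example runs without a GUI.  (Java 8 and later also accept local variables that are never reassigned, even without the final keyword.)

```java
/** Shows local variables being used from inside an anonymous class. */
final class FinalLocalSketch {

  static String describe(String label) {
    // Declared final so the anonymous class below may refer to them.
    final String buttonLabel = label;
    final StringBuilder log = new StringBuilder();

    Runnable listener = new Runnable() {
      public void run() {
        // buttonLabel and log are locals of the enclosing method.
        log.append("pressed ").append(buttonLabel);
      }
    };

    listener.run();
    return log.toString();
  }

  public static void main(String[] args) {
    System.out.println(describe("Start/Next"));
  }
}
```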

ActionListener on the Start/Next button

Listing 14 shows the beginning of the registration of an anonymous ActionListener object on the Start/Next button shown in Figure 1.

    startButton.addActionListener(
      new ActionListener(){
        public void actionPerformed(
            ActionEvent e){

          //Clear the operator message field
          operMsgField.setText("");

Listing 14

Listing 14 clears the third text field from the top in Figure 1 by storing an empty string in that text field.

Retrieve and screen messages for SPAM

As mentioned earlier, the POP3 session is now in the TRANSACTION state.  The code in Listing 15 begins the process of retrieving all the messages currently on the server and screening those messages for SPAM.

The number of messages on the server

One of the first things that we need to know is how many messages are currently in the maildrop on the server.  The code in Listing 15 sends a STAT command to the server to get this information.

          try{
            if(numberMsgs == 0){
              outputStream.println("STAT");
              String stat = validateOneLine();

              //Get number of messages as String.
              String numberMsgsStr =
                  stat.substring(
                      4,stat.indexOf(" ",5));

              //Convert the String to an int.
              numberMsgs = Integer.parseInt(
                  numberMsgsStr);
            }//end if numberMsgs == 0

Listing 15

Get number of messages only at beginning of session

As the session progresses and DELE commands are sent to the server, messages are marked for deletion.  Once a message is marked for deletion, it is no longer included in the count of messages on the server.  Therefore, we must make certain that we obtain the number of messages on the server only at the beginning of the session.

As you will see later, the program compares the value of numberMsgs with the current message number to determine when all of the messages have been processed.  Since we must retrieve the number of messages on the server only once at the beginning of the session, we execute this code only when the value of numberMsgs is zero.

Issue a STAT command

The code in Listing 15 begins by issuing a STAT command, and then getting, validating, and saving the response.  Here is part of what the technical document has to say about the response to the STAT command.
"The POP3 server issues a positive response with a line containing information for the maildrop. This line is called a "drop listing" for that maildrop.

In order to simplify parsing, all POP3 servers are required to use a certain format for drop listings. The positive response consists of "+OK" followed by a single space, the number of messages in the maildrop, a single space, and the size of the maildrop in octets."

Get number of messages as a String

Having saved the response to the STAT command, the code in Listing 15 extracts a substring from that string containing the number of messages as a String.

Convert the String to an int

Then the code in Listing 15 invokes the parseInt method of the Integer class to convert the string representing the number of messages to an int.
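
To make the drop-listing format concrete, here is a small sketch that pulls both numbers out of a sample response.  The class DropListingSketch, its method names, and the sample line "+OK 12 3200" are assumptions made for illustration.  The sketch splits the line on spaces instead of using the substring arithmetic in Listing 15, and it assumes that the response line begins with "+OK".

```java
/** Hypothetical parser for the STAT response ("drop listing"). */
final class DropListingSketch {

  /** Splits a line such as "+OK 12 3200" and checks its shape. */
  private static String[] fields(String statResponse) {
    String[] parts = statResponse.trim().split(" ");
    if (parts.length < 3 || !parts[0].equals("+OK")) {
      throw new IllegalArgumentException(
          "Not a drop listing: " + statResponse);
    }
    return parts;
  }

  /** Returns the number of messages in the maildrop. */
  static int parseMessageCount(String statResponse) {
    return Integer.parseInt(fields(statResponse)[1]);
  }

  /** Returns the size of the maildrop in octets. */
  static long parseSizeInOctets(String statResponse) {
    return Long.parseLong(fields(statResponse)[2]);
  }

  public static void main(String[] args) {
    String stat = "+OK 12 3200";  // sample response, not real data
    System.out.println(parseMessageCount(stat) + " messages, "
        + parseSizeInOctets(stat) + " octets");
  }
}
```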

Referring to a message by its number

Later we will see that messages can be referred to by their message number.
(Note that message numbers begin with 1 and not with 0.)
Retrieve and screen each message

The next step is to retrieve each message from the server and to screen it for SPAM.  Basically this consists of:
  • Retrieving each message from the server
  • Writing that message into a local disk file
  • Passing the disk file to a method belonging to an object of the Screen class where it is screened for SPAM
The screening method returns a boolean value indicating whether or not the message is a candidate for deletion from the server due to a failure to satisfy one of the SPAM rules.

Get the unique ID

Each message is stored on the server with a unique ID.  The unique ID for the message is retrieved first and is used to create a unique file name for storing the message in a local disk file.

Note that the msgCounter variable was initialized to 0 when it was declared in Listing 1.  We will see later that this value is incremented each time a new message is processed.  Because the message numbers start with 1 instead of 0, msgNumber must always be one greater than msgCounter.

The unique ID for a message is obtained from the server by issuing a UIDL command and saving the response.  Listing 16 shows the code used to get, validate, and save the unique ID for the next message.

            msgNumber = msgCounter + 1;

            if(msgNumber <= numberMsgs){
              outputStream.println(
                  "UIDL " + msgNumber);
              uidl = validateOneLine();

Listing 16

The UIDL command

Here is some of what the technical document has to say about the UIDL command:
"Arguments: a message-number (optionally)  If a message-number is given, it may NOT refer to a message marked as deleted.

Restrictions: may only be given in the TRANSACTION state.

Discussion: If an argument was given and the POP3 server issues a positive response with a line containing information for that message. This line is called a "unique-id listing" for that message.  ... A unique-id listing consists of the message-number of the message, followed by a single space and the unique-id of the message."

No need to parse the response

In this case, I will use the entire response string as a file name and therefore I won't be concerned about parsing the response.
(I'm also not interested in the response produced when the UIDL command is issued without a message number because this program never issues the command without a message number.)
A possible safety upgrade

While writing this lesson, it has occurred to me that a useful safety upgrade would be to:
  • Parse the response to the UIDL command
  • Extract and save the message number
  • Compare that value with the value of msgNumber being maintained internally by this program before sending a DELE command to the server
That would ensure that this program is properly synchronized with the server's view of message numbers before a command is given to delete a message.
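
A minimal sketch of that check might look like the following.  The class UidlCheckSketch and its methods are hypothetical, the sample unique ID is made up, and the sketch assumes that the saved response has the form "+OK", a space, the message number, a space, and the unique ID.

```java
/** Hypothetical synchronization check based on the UIDL response. */
final class UidlCheckSketch {

  /** Extracts the message number from a line such as "+OK 2 QhdPYR". */
  static int parseMessageNumber(String uidlResponse) {
    String[] parts = uidlResponse.trim().split(" ");
    if (parts.length < 3 || !parts[0].equals("+OK")) {
      throw new IllegalArgumentException(
          "Not a unique-id listing: " + uidlResponse);
    }
    return Integer.parseInt(parts[1]);
  }

  /** True when the server's number matches the program's own count. */
  static boolean inSync(String uidlResponse, int msgNumber) {
    try {
      return parseMessageNumber(uidlResponse) == msgNumber;
    } catch (IllegalArgumentException e) {
      // Also covers NumberFormatException, which is a subclass.
      return false;
    }
  }

  public static void main(String[] args) {
    String uidl = "+OK 2 QhdPYR";  // sample response, not real data
    int msgNumber = 2;
    if (inSync(uidl, msgNumber)) {
      System.out.println("Safe to send DELE " + msgNumber);
    } else {
      System.out.println("Out of sync; do not send DELE");
    }
  }
}
```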

Open an output file

The code in Listing 17 uses the unique ID to open an output file in which to save the message.

              String fileName =
                  "c:/MailFiles/" + uidl + ".txt";
              DataOutputStream dataOut =
                  new DataOutputStream(
                      new FileOutputStream(
                          fileName));

Listing 17

(You may want to modify this code to cause the messages to be stored in a different location on the disk.  If so, modify the string "c:/MailFiles/" in Listing 17.  Make certain that the folder where you plan to save the files exists before running the program.)
The code in Listing 17 is straightforward and shouldn't require further explanation.  If you are unfamiliar with code like this, see the tutorials on file I/O at www.DickBaldwin.com.
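
If you would rather have the program create the folder than create it by hand, something along the following lines could be called before the output file is opened.  This is only a sketch: the class MailFolderSketch and its methods are hypothetical, and the example writes to the system's temporary folder so that it can be run anywhere.

```java
import java.io.File;

/** Hypothetical helper that makes sure the mail folder exists. */
final class MailFolderSketch {

  /** Creates the folder, and any missing parents, if it is not there. */
  static File ensureFolder(String path) {
    File folder = new File(path);
    if (!folder.isDirectory() && !folder.mkdirs()) {
      throw new IllegalStateException(
          "Could not create folder: " + path);
    }
    return folder;
  }

  /** Builds the output file for one message inside the folder. */
  static File messageFile(File folder, String uidl) {
    return new File(folder, uidl + ".txt");
  }

  public static void main(String[] args) {
    // The temporary folder stands in for c:/MailFiles/.
    String path = new File(
        System.getProperty("java.io.tmpdir"), "MailFiles").getPath();
    File folder = ensureFolder(path);
    System.out.println(messageFile(folder, "exampleUid"));
  }
}
```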

Begin the message retrieval process

Listing 18 issues a RETR command to begin the message retrieval process, and then validates the response.

              outputStream.println(
                  "RETR " + msgNumber);

              String retrResponse =
                  validateOneLine();

Listing 18

Note that the RETR command specifies a particular message based on its message number.

Response to the RETR command

Figure 7 shows a typical response produced by my Email server to the receipt of a RETR command.
 
+OK 1818 octets
Figure 7

The RETR command

Here is some of what the technical document has to say about the RETR command:
"Arguments: a message-number (required) which may not refer to a message marked as deleted.

Discussion: If the POP3 server issues a positive response, then the response given is multi-line. After the initial +OK, the POP3 server sends the message corresponding to the given message-number, being careful to byte-stuff the termination character (as with all multi-line responses)."

What is meant by byte-stuffing?

Here is part of what the technical document has to say about multi-line responses and byte-stuffing.
"Responses to certain commands are multi-line. In these cases, ... after sending the first line of the response and a CRLF, any additional lines are sent, each terminated by a CRLF pair. When all lines of the response have been sent, a final line is sent, consisting of a termination octet (decimal code 046, ".") and a CRLF pair. If any line of the multi-line response begins with the termination octet, the line is "byte-stuffed" by pre-pending the termination octet to that line of the response."
In other words, a message is terminated by a line that contains only a period followed immediately by a CRLF pair.  If a normal line of the message begins with a period, byte-stuffing is used to distinguish that line from the terminating line.

Didn't strip any bytes

In the event that a line in the message begins with a period, then it will begin with two periods after byte-stuffing takes place on the server.

Since having two periods at the beginning of the line is unlikely to have a detrimental impact on the screening process, I didn't bother to strip any bytes that may have been prepended onto the line by the server during byte-stuffing.

However, you may want to upgrade the program to cause it to deal more correctly with this situation if you consider it to be a problem.
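
One possible form of that upgrade is sketched below.  The class ByteStuffingSketch and its methods are hypothetical.  The idea is to remove the leading period from any line that begins with a period and is not the terminating line, and the method would need to be applied to each line before asterisks are inserted.

```java
/** Hypothetical helper that reverses POP3 byte-stuffing on one line. */
final class ByteStuffingSketch {

  /** True for the line that ends a multi-line response. */
  static boolean isTerminator(String line) {
    return line.equals(".");
  }

  /** Removes the extra leading period that the server prepended. */
  static String unstuff(String line) {
    if (line.startsWith(".") && !isTerminator(line)) {
      return line.substring(1);
    }
    return line;
  }

  public static void main(String[] args) {
    System.out.println(unstuff("..a line that began with a period"));
    System.out.println(unstuff("an ordinary line"));
    System.out.println(isTerminator("."));
  }
}
```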

Clear the text area

The code in Listing 19 clears the text area at the beginning of each message.  If you don't do this, the string contained in the text area will become very long and the program will run slowly as a result.

              textArea.setText("");

Listing 19

Read first line and insert stars

The code in Listing 20 reads the first line of the message from the server.  Then it invokes the method named insertStars to insert asterisks on ten-character intervals in the text.

              //Read first line of message
              String msgLine =
                  inputStream.readLine();

              //Insert asterisks
              msgLine = insertStars(msgLine);

Listing 20

There is a possibility of retrieving a message that contains executable virus code.  My purpose in inserting an asterisk every ten characters is to break up the byte pattern and hopefully to corrupt any executable virus code that may be contained in the byte stream before writing those bytes to a local disk file.

The insertStars method

At this point, I will set the discussion of the constructor aside and present the method named insertStars, which is shown in Listing 21.

The code in this method is straightforward and should not require further explanation.

  private String insertStars(String stringIn){
    StringBuffer stringBuffer =
        new StringBuffer(stringIn);
    int length = stringBuffer.length();
    for(int cnt = 9; cnt < length; cnt+=10){
      stringBuffer.insert(cnt,'*');
    }//end for loop
    return new String(stringBuffer);
  }//end insertStars

Listing 21
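
To see what the method does to a line of text, the following small test program wraps the same logic in a hypothetical class named InsertStarsDemo (the method is made static here only so that it can be called from main).  Note that the loop limit is the length of the original string, and that a line shorter than ten characters, such as the terminating period, is returned unchanged.

```java
/** Hypothetical test harness for the logic in Listing 21. */
final class InsertStarsDemo {

  // Same logic as Listing 21, made static for this demonstration.
  static String insertStars(String stringIn) {
    StringBuffer stringBuffer = new StringBuffer(stringIn);
    int length = stringBuffer.length();
    for (int cnt = 9; cnt < length; cnt += 10) {
      stringBuffer.insert(cnt, '*');
    }
    return new String(stringBuffer);
  }

  public static void main(String[] args) {
    // prints abcdefghi*jklmnopqr*stuvwxyz
    System.out.println(insertStars("abcdefghijklmnopqrstuvwxyz"));
    // prints a single period
    System.out.println(insertStars("."));
  }
}
```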

Read and save all lines of message

Returning now to the discussion of the constructor, the code in Listing 22 continues reading lines of text from the server, inserting stars, and writing those lines of text into the output file until a line is received that contains a single period.

              while(!(msgLine.equals("."))){
                dataOut.writeBytes(
                    msgLine + "\n");
                msgLine = inputStream.readLine();
                msgLine = insertStars(msgLine);
              }//end while

              //Close the output file.
              dataOut.close();

Listing 22

Newline characters are written at the end of each line of text when it is written into the output file.
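
One situation that the loop in Listing 22 does not address is a connection that closes before the terminating line arrives.  In that case readLine returns null.  The following sketch shows the same read-until-period loop with a null check added.  The class MultiLineReadSketch is hypothetical, a StringReader stands in for the socket's input stream, and the asterisk insertion is left out to keep the example short.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;

/** Hypothetical reader for the body of a multi-line POP3 response. */
final class MultiLineReadSketch {

  /** Reads lines up to, but not including, the "." terminator line. */
  static String readMessage(BufferedReader in) {
    StringBuilder message = new StringBuilder();
    try {
      String line = in.readLine();
      // null means the stream ended before the terminator arrived.
      while (line != null && !line.equals(".")) {
        message.append(line).append("\n");
        line = in.readLine();
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return message.toString();
  }

  public static void main(String[] args) {
    // Sample data only; a real session would read from the socket.
    String fromServer = "Subject: test\r\n\r\nHello\r\n.\r\n";
    BufferedReader in =
        new BufferedReader(new StringReader(fromServer));
    System.out.print(readMessage(in));
  }
}
```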

Display messages for the user

It is almost time to pass the file containing the message to the screening method to allow it to screen for SPAM.  Before doing that, however, the code in Listing 23 writes messages in the text fields and text area of Figure 1 to let the user know what is happening.

              fromField.setText("Call screener");
              subjField.setText("Call screener");
              operMsgField.setText(
                  "Call screener");
              textArea.setText(
                  "Progress Meter: ");

Listing 23

The progress indicator

Occasionally a very long message is received that requires a perceptible amount of time for screening.  When that happens (with the version of the Screen class that will be discussed in the next lesson), the screening method writes a stream of periods into the text area to let the user know that the system is actually working on a message and isn't simply hung up.  Hence the words "Progress Meter" are placed in the text area in Listing 23 to tell the user what that stream of periods indicates.
(The stripped-down version of the Screen class that I will discuss in this lesson does not provide this type of visual feedback.)
Information from the screening method

Several different pieces of information need to be returned from the screening method.  However, in Java, a method can return only one value.  To accommodate this, an empty object instantiated from the ScreenResult class is passed as a parameter to the screening method.  The code in the screening method populates the fields in that object so as to make the information available upon return.

The ScreenResult class

At this point, I will set the discussion of the constructor aside and show you the ScreenResult class in Listing 24.

class ScreenResult{
  public String subject = "";
  public String from = "";
  public String thePhrase = "";
  public String text = "";
}//end ScreenResult

Listing 24

As you can see, this is a very simple class, an object of which exists solely as a place to store four strings that are populated by the screening method for later use by the calling method.

Screen the file for SPAM

Returning now to the constructor, the code in Listing 25:
  • Declares a local variable named match and initializes it to false.
  • Instantiates a new empty object of the ScreenResult class.
  • Invokes the screenMsg method belonging to an object of the Screen class, passing the name of the disk file containing the message, the unique identifier for the message, and the empty ScreenResult object as parameters, and storing the returned value in the variable named match.

              boolean match = false;

              ScreenResult theResult =
                  new ScreenResult();
              match = screener.screenMsg(
                  fileName,uidl,theResult);

Listing 25

Upon further reflection

Frequently when I write a lesson explaining code that I have written, I realize that there are sections of code that I would write differently if I had it to do over again.  That is the case here.

In this case, if I were to rewrite this program, I would upgrade the definition of the ScreenResult class to include an additional field of type boolean named match.

Then I would require the screenMsg method of the Screen class to return a reference to a populated object of type ScreenResult instead of returning type boolean.  I would eliminate the ScreenResult parameter from the parameter list of the screenMsg method.

Then I would cause the code in the calling method to accommodate those changes and to extract the value of match from the object returned by the screenMsg method instead of dealing with match separately as is the case in Listing 25.

In my opinion, this would result in a somewhat cleaner user interface.  However, at this point, I am too far down the road to turn back, so I will just leave the program as it is.  I may upgrade it sometime in the future to implement this improvement.
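
For readers who want to try that change, the redesign described above might be sketched as follows.  This is not code from the program: the class RevisedScreenSketch is hypothetical, the body of screenMsg is only a placeholder, and the file name in main is sample data.

```java
/** Hypothetical screener whose method returns the populated result. */
class RevisedScreenSketch {

  /** Two parameters only; the verdict travels inside the result. */
  public ScreenResult screenMsg(String fileName, String uidl) {
    ScreenResult result = new ScreenResult();
    // A real implementation would read fileName and apply its rules.
    result.text = "Screened " + fileName;
    result.match = false;
    return result;
  }

  public static void main(String[] args) {
    RevisedScreenSketch screener = new RevisedScreenSketch();
    ScreenResult theResult =
        screener.screenMsg("c:/MailFiles/example.txt", "example");
    if (theResult.match) {
      System.out.println("Candidate for deletion");
    } else {
      System.out.println("Not a candidate for deletion");
    }
  }
}

/** ScreenResult from Listing 24 with the additional match field. */
class ScreenResult {
  public boolean match = false;
  public String subject = "";
  public String from = "";
  public String thePhrase = "";
  public String text = "";
}
```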

Designing your own SPAM screening algorithm


Should you decide to design your own screening algorithm, this is where you would connect your algorithm to the communication module.  In other words, your version of the method named screenMsg should return true if it is recommending that the message be deleted from the server. Also, the object of type ScreenResult passed as a parameter to the method should be populated with information to be displayed in the text fields and the text area of the GUI shown in Figure 1.

You may or may not decide to make callbacks on the communication module to support the progress indicator while your method is working.
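
As a starting point, here is a deliberately simple sketch of a screening class that satisfies that contract.  Everything in it is an assumption made for illustration: the class name PhraseScreenSketch, the two phrases in its list, and the decision to strip every asterisk from a line before looking for a phrase (which undoes the asterisks inserted by insertStars, but also removes any asterisks that were part of the original message).  It is not my screening algorithm.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

/** Hypothetical phrase-matching screener; for illustration only. */
final class PhraseScreenSketch {

  /** Same four fields as Listing 24. */
  static class ScreenResult {
    public String subject = "";
    public String from = "";
    public String thePhrase = "";
    public String text = "";
  }

  // Made-up phrase list; a real screener would supply its own.
  private static final List<String> PHRASES =
      Arrays.asList("free money", "act now");

  /** Same parameter list as the screenMsg call in Listing 25. */
  public boolean screenMsg(
      String fileName, String uidl, ScreenResult theResult) {
    try {
      return screenLines(
          Files.readAllLines(Paths.get(fileName)), theResult);
    } catch (IOException e) {
      e.printStackTrace();
      return false;  // never recommend deleting an unreadable message
    }
  }

  /** Returns true at the first line that contains a listed phrase. */
  static boolean screenLines(
      List<String> lines, ScreenResult theResult) {
    StringBuilder text = new StringBuilder();
    for (String line : lines) {
      text.append(line).append("\n");
      // Assumption: drop all asterisks before matching.
      String lower = line.replace("*", "").toLowerCase();
      if (lower.startsWith("subject:")) theResult.subject = line;
      if (lower.startsWith("from:")) theResult.from = line;
      for (String phrase : PHRASES) {
        if (lower.contains(phrase)) {
          theResult.thePhrase = phrase;
          theResult.text = text.toString();
          return true;
        }
      }
    }
    theResult.text = text.toString();
    return false;
  }

  public static void main(String[] args) {
    ScreenResult result = new ScreenResult();
    boolean match = screenLines(
        Arrays.asList("From: someone@example.com", "Act now!"),
        result);
    System.out.println(match + " " + result.thePhrase);
  }
}
```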

Display the results of the screening process

Listing 26 displays the information that was encapsulated in the ScreenResult object by the screening method in the text fields and text area of Figure 1.

              fromField.setText(theResult.from);
              subjField.setText(
                  theResult.subject);
              operMsgField.setText(
                  "Offending Phrase: "
                      + theResult.thePhrase);
              textArea.setText(theResult.text);

              //Scroll the text area to the end
              textArea.select(
                  theResult.text.length()-2,
                  theResult.text.length()-1);

Listing 26

The code in Listing 26 is straightforward and shouldn't require further explanation.

Information available to the user

At this point, the user can view:
  • The contents of the From line of the message
  • The contents of the Subject line of the message
  • The complete raw text of the message down to the line containing the offending word or phrase, if any
  • The offending word or phrase, if any
If the screening method returned true, this information will remain on the screen for the user to ponder.  However, if the screening method returned false, it will disappear from the screen very quickly, and probably won't even be seen by the user.

Increment the message counter

Listing 27 increments the message counter in preparation for processing the next message.

              msgCounter++;

Listing 27

A candidate for deletion from the server

A return value of true from the screenMsg method means that the screening method is recommending that the message be deleted from the server.

Listing 28 shows the behavior of the actionPerformed method registered on the Start/Next button under this circumstance.

              if(match == true){
                return;
              }//end if match == true

Listing 28

Wait for further action by the user

The message has been identified as a candidate for deletion from the server.  The actionPerformed method simply returns with the information described above showing in the text fields and text area of Figure 1.  The user can view this information while deciding what to do next.  Nothing further will happen in the program until the user presses either the Delete button or the Start/Next button.

Pressing the Delete button

If the user presses the Delete button in Figure 1, the message will be deleted from the server.  I will explain exactly how this happens later when I discuss the ActionListener object that will be registered on the Delete button.

Pressing the Start/Next button

If the user presses the Start/Next button in Figure 1, the message will not be deleted from the server, the actionPerformed method belonging to the ActionListener object registered on that button will be executed, and the next message on the server will be retrieved and screened for SPAM.

Message is not a candidate for deletion

If the screenMsg method returns false, the message has not been identified as a candidate for deletion, and control reaches the point in the actionPerformed method shown in Listing 29.

              Toolkit.getDefaultToolkit().
                  getSystemEventQueue().
                  postEvent(new ActionEvent(
                      startButton,
                      ActionEvent.ACTION_PERFORMED,
                      "Start/Next"));
            }//end if msgNumber <= numberMsgs

Listing 29

At this point, we could require the user to press the Start/Next button to retrieve and screen the next message.  However, in the interest of convenience, we will relieve the user of that responsibility.

Firing a synthetic event

The code in Listing 29 fires an ActionEvent identical to that which would be fired if the user were to press the Start/Next button.  This causes the program to retrieve the next message on the server and to begin the screening process immediately.
(If you are unfamiliar with the concept of posting events in the system event queue, you can learn about that in the tutorial lessons at www.DickBaldwin.com.)
When all messages have been screened ...

Listing 30 shows the completion of the registration of an anonymous ActionListener object on the Start/Next button that was begun in Listing 14.

            else{//msgNumber > numberMsgs
              startButton.setEnabled(false);

              subjField.setText(
                  "No more messages, press Close");
              fromField.setText(
                  "No more messages, press Close");
              operMsgField.setText(
                  "No more messages, press Close");
              textArea.setText(
                  "No more messages, press Close");
            }//end else

          }//end try
          catch(Exception ex){
            ex.printStackTrace();}
        }//end actionPerformed
      }//end ActionListener
    );//end addActionListener

Listing 30

The code in Listing 30 is executed when all of the messages on the server have been screened.

This code disables the Start/Next button and posts messages instructing the user to press the close button to terminate the program.

Beyond that, the code in Listing 30 simply completes a try/catch block, and wraps up the cryptic code required for the definition of an anonymous class.

An ActionListener on the Delete button

The Delete button shown in Figure 1 is used to cause messages to be deleted from the server.  Listing 31 shows the beginning of the registration of an anonymous ActionListener object on the Delete button.

    deleteButton.addActionListener(
      new ActionListener(){
        public void actionPerformed(
            ActionEvent e){

          operMsgField.setText("");

Listing 31

The code in Listing 31 simply clears the third text field from the top in Figure 1 when the user presses the Delete button.

Marking messages for deletion from the server


Deletion of a message from the server is accomplished by marking the message for deletion while in the TRANSACTION state. The message is actually deleted later when the client sends a QUIT command to the server causing the server to enter the UPDATE state.
(If the program aborts prematurely before sending a QUIT command, marked messages are not deleted from the server.)
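
The difference between marking a message and actually removing it can be illustrated with a toy model of a maildrop.  The class MaildropSketch below is hypothetical and does not talk to a server.  It simply shows that marked messages disappear only when the session ends with QUIT.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** Toy model of a maildrop; not part of the Pop302 program. */
final class MaildropSketch {

  private final List<String> messages = new ArrayList<String>();
  private final Set<Integer> marked = new TreeSet<Integer>();

  MaildropSketch(List<String> initial) {
    messages.addAll(initial);
  }

  /** DELE only marks the message.  Message numbers start at 1. */
  void dele(int msgNumber) {
    if (msgNumber < 1 || msgNumber > messages.size()) {
      throw new IllegalArgumentException(
          "No such message: " + msgNumber);
    }
    marked.add(msgNumber);
  }

  /** QUIT from TRANSACTION: marked messages are really removed. */
  void quit() {
    List<String> kept = new ArrayList<String>();
    for (int i = 0; i < messages.size(); i++) {
      if (!marked.contains(i + 1)) {
        kept.add(messages.get(i));
      }
    }
    messages.clear();
    messages.addAll(kept);
    marked.clear();
  }

  /** Session ends without QUIT: marks are dropped, nothing removed. */
  void abort() {
    marked.clear();
  }

  int size() {
    return messages.size();
  }

  public static void main(String[] args) {
    MaildropSketch drop =
        new MaildropSketch(Arrays.asList("a", "b", "c"));
    drop.dele(2);
    drop.abort();
    System.out.println("After abort: " + drop.size());
    drop.dele(2);
    drop.quit();
    System.out.println("After QUIT: " + drop.size());
  }
}
```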
The deletion code

Listing 32 shows the code used to
  • Mark the message for deletion
  • Validate the response
  • Display a deletion message

          outputStream.println(
              "DELE " + msgNumber);
          textArea.append(
              "DELE "+validateOneLine()+"\n");
          textArea.append(
              "Deleted:" + msgNumber + "\n");

Listing 32

(See the earlier section entitled A possible safety upgrade for a suggestion related to upgrading this program.)
The DELE code has been temporarily disabled

Note that the three corresponding statements in Listing 42 near the end of the lesson have been disabled by marking them as comments.  I did this to keep you from accidentally deleting messages from your server during your early stages of testing this program with your Email server.

You can enable the three statements in Listing 42 by removing the comment indicators.  However, you should not enable them until you are confident that you really do want to delete messages from the server.
(Once a message is deleted from the server, it cannot be recovered from the server.)
A synthetic ActionEvent

The code in Listing 33 fires a synthetic ActionEvent identical to that which would be fired if the user presses the Start/Next button.

          Toolkit.getDefaultToolkit().
              getSystemEventQueue().
              postEvent(new ActionEvent(
                  startButton,
                  ActionEvent.ACTION_PERFORMED,
                  "Start/Next"));
        }//end actionPerformed
      }//end ActionListener
    );//end addActionListener

Listing 33

Thus, when the user presses the Delete button, the message is marked for deletion on the server and the next message on the server is retrieved immediately for SPAM screening without a requirement for the user to request the next message.

Finish configuring the GUI

The code in Listing 34 finishes configuring the GUI by placing the various components in the Frame, setting its size, and making it visible.

    add(startButton);
    add(deleteButton);
    add(fromField);
    add(subjField);
    add(operMsgField);
    add(textArea);
    setTitle("Copyright 2004, R.G.Baldwin");
    setSize(400,400);//modify for larger GUI
    setVisible(true);
  }//end constructor

Listing 34

As I mentioned earlier, you will probably find the program to be more useful if you increase the width of the Frame to at least 750 pixels and increase the size of the text fields and text area in Listing 13 to be at least 100 characters wide.

That completes the discussion of the class named Pop302.

Stripped-down Screen class

The following sections provide a brief discussion of a stripped-down version of the class named Screen, which you can use to test this program on your system with your Email server.

This stripped-down version of the Screen class doesn't actually do any SPAM screening.  Rather, it populates the ScreenResult object with information from the message and toggles its return value between true and false for each successive message.

My full version of the Screen class implements my SPAM screening algorithm.  I will explain the details of my full Screen class in the next lesson in this series.

A dummy constructor

The definition of the stripped-down Screen class begins in Listing 35.

class Screen{
  boolean returnValue = true;

  Screen(Pop302 dummy){//dummy constructor
  }//end constructor

Listing 35

A dummy constructor is required to satisfy the instantiation of the Screen object in Listing 3.

The screenMsg method

The code in Listing 25 invokes the screenMsg method of an object of the Screen class for the purpose of applying SPAM screening rules to a message stored in a disk file.

The definition of the stripped-down screenMsg method begins in Listing 36.

  public boolean screenMsg(String fileName,
      String uidl,ScreenResult theResult){
    try{
      BufferedReader inData
          = new BufferedReader(new FileReader(
              fileName));

Listing 36

The code in Listing 36 gets a BufferedReader object that will be used to read the raw text of the message stored in the file whose name was received as a parameter.

Initialize the ScreenResult object

The code in Listing 37 populates three of the fields in the ScreenResult object received as an incoming parameter.  Two of these fields are populated with messages that will be overwritten later if Subject and From data is successfully extracted from the file containing the message.

      theResult.subject = "No Subj line found";
      theResult.from = "No From line found";
      theResult.thePhrase = "No phrase "
          + "available for test program.";

Listing 37

The text that is stored in the field named thePhrase will not be overwritten later because this stripped-down version knows nothing about offending SPAM words or phrases.

Get the Subject data

Without getting into the details, the code in Listing 38 attempts to extract a text line from the message that begins with "Subject:".  If successful, the data is used to overwrite the contents of the subject field of the ScreenResult object.

      String data;
      inData.mark(10000);
      while((data = inData.readLine()) != null){
        if(data.startsWith("Subject:")){
          theResult.subject = data.toUpperCase();
          break;
        }//end if
      }//end while loop

Listing 38

Get the From data

Similarly, the code in Listing 39 attempts to extract a text line from the message that begins with "From:".  In this case, the comparison ignores case by testing an upper-case version of each line against "FROM:".  If successful, the data is used to overwrite the contents of the from field of the ScreenResult object.

      inData.reset();
      while((data = inData.readLine()) != null){
        if(data.toUpperCase().
            startsWith("FROM:")){
          theResult.from = data;
          break;
        }//end if data starts with From
      }//end while loop on null

Listing 39

Get the entire message

The code in Listing 40 attempts to read the entire message and deposit it in the text field of the ScreenResult object.

      inData.reset();
      while((data = inData.readLine()) != null){
        theResult.text =
            theResult.text + data + "\n";
      }//end while loop on read until null
      inData.close();
    }catch(Exception e){e.printStackTrace();}

Listing 40
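
Listings 38 through 40 read the file three times, relying on mark and reset to return to the beginning.  According to the BufferedReader documentation, reset may fail if more characters than the read-ahead limit (10000 in Listing 38) were read after the mark.  As an alternative, the same three fields can be populated in a single pass, as in the following sketch.  The class SinglePassSketch and its populate method are hypothetical, and a StringReader stands in for the FileReader.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;

/** Hypothetical single-pass version of Listings 38 through 40. */
final class SinglePassSketch {

  /** Same four fields as Listing 24. */
  static class ScreenResult {
    public String subject = "";
    public String from = "";
    public String thePhrase = "";
    public String text = "";
  }

  static void populate(
      BufferedReader inData, ScreenResult theResult) {
    theResult.subject = "No Subj line found";
    theResult.from = "No From line found";
    boolean haveSubject = false;
    boolean haveFrom = false;
    StringBuilder text = new StringBuilder();
    try {
      String data;
      while ((data = inData.readLine()) != null) {
        // First line that begins with "Subject:" (case-sensitive).
        if (!haveSubject && data.startsWith("Subject:")) {
          theResult.subject = data.toUpperCase();
          haveSubject = true;
        }
        // First line that begins with "From:" in any case.
        if (!haveFrom
            && data.toUpperCase().startsWith("FROM:")) {
          theResult.from = data;
          haveFrom = true;
        }
        text.append(data).append("\n");
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    theResult.text = text.toString();
  }

  public static void main(String[] args) {
    // Sample message text only.
    String msg = "From: someone@example.com\nSubject: hi\n\nbody\n";
    ScreenResult result = new ScreenResult();
    populate(new BufferedReader(new StringReader(msg)), result);
    System.out.println(result.subject);
    System.out.println(result.from);
  }
}
```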

Return a boolean value

Finally, the code in Listing 41 returns a boolean value.  This value toggles between true and false as each successive message is processed.  Therefore, it has no meaning insofar as SPAM is concerned.
Notice:  A true return value should not be used to indicate that you should delete a message from the server.

    if(returnValue == false){
      returnValue = true;
    }else{
      returnValue = false;
    }//end else
    return returnValue;
  }//end screenMsg

}//End stripped-down Screen class

Listing 41

This boolean value will be stored in the variable named match in Listing 25, and will be tested in the if statement of Listing 28.

If the return value is true

If the return value is true, the actionPerformed method will return immediately in Listing 28, allowing the user to ponder the data returned by the screenMsg method in deciding whether or not to delete the message from the server.
Once again, let me caution you not to enable the DELE code in Listing 42 near the end of the lesson until you are certain that you actually want to delete messages from the server.  If you do enable it, do not press the Delete button just because this stripped-down version of the screenMsg method returns true.
If the return value is false

If the screenMsg method returns false, the code in Listing 29 immediately fires a synthetic ActionEvent, attributable to the Start/Next button, which causes the next message to be retrieved from the server.

Run the Program

I encourage you to copy the code from Listing 42 into your text editor.  Compile and execute the program.  Experiment with it, making changes, and observing the results of your changes.

You may want to modify this code to cause the messages to be stored in a different location on your disk.  If so, modify the string in the statement in Listing 17 that reads "c:/MailFiles/" + uidl + ".txt" to specify a different folder. Make certain that the folder where you plan to save the files exists before running the program.

(Once again, let me caution you not to enable the DELE code in Listing 42 until you are certain that you actually want to delete messages from the server.  Once a message is deleted from the server, there is no way to recover it from the server.)

Summary

This lesson explains the communications module used to communicate with your Email server, and to remove SPAM messages from the server before they are downloaded into your primary Email client.

The program is designed to allow you to use my SPAM screening algorithm, or to invent your own.  I will present the details of my SPAM screening algorithm in the next lesson in the series.

The version of the program discussed in this lesson has a stripped-down version of a class named Screen. This version of the program makes it possible for you to test the communications module on your system with your Email server without doing any actual screening for SPAM.

The capability to actually delete messages from the server is disabled in the version of the program shown in Listing 42.  You should not enable that capability until you fully understand what you are doing and you are certain that you really do want to delete messages from the server.  Once a message is deleted from the server, it cannot be recovered from the server.
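
To make the deletion rules concrete, here is a toy in-memory model of the behavior described in the program comments: DELE only marks a message, the marked messages are removed when QUIT moves the server into the UPDATE state, and nothing is removed if the session ends without QUIT.  The RSET behavior (unmarking all messages) comes from the POP3 specification.  The class MaildropSketch and its method names are hypothetical, and the model never contacts a real server.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Toy model of a POP3 maildrop. Hypothetical; it does not
// communicate with any server.
class MaildropSketch {
  private final List<String> messages = new ArrayList<>();
  private final Set<Integer> marked = new TreeSet<>();

  MaildropSketch(List<String> initial) {
    messages.addAll(initial);
  }

  // DELE: mark message number msg (1-based) for deletion.
  void dele(int msg) {
    if (msg < 1 || msg > messages.size()) {
      throw new IllegalArgumentException(
          "No such message: " + msg);
    }
    marked.add(msg);
  }

  // RSET: remove all deletion marks.
  void rset() {
    marked.clear();
  }

  // QUIT: the UPDATE state removes every marked message.
  void quit() {
    List<String> kept = new ArrayList<>();
    for (int i = 0; i < messages.size(); i++) {
      if (!marked.contains(i + 1)) {
        kept.add(messages.get(i));
      }
    }
    messages.clear();
    messages.addAll(kept);
    marked.clear();
  }

  // Connection lost without QUIT: marks are discarded and
  // no message is removed.
  void abort() {
    marked.clear();
  }

  int count() {
    return messages.size();
  }

  public static void main(String[] args) {
    List<String> mail = new ArrayList<>();
    mail.add("msg 1");
    mail.add("msg 2");
    mail.add("msg 3");

    MaildropSketch drop = new MaildropSketch(mail);
    drop.dele(2);
    drop.abort();
    System.out.println("After abort: " + drop.count());

    drop.dele(2);
    drop.quit();
    System.out.println("After QUIT: " + drop.count());
  }
}
```

The point of the model is that the QUIT step is the one that cannot be undone; up to that moment a marked message can still be rescued.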

What's Next?

In the next lesson, I will present and explain my version of the class named Screen.  This class contains my version of a SPAM screening algorithm.  You may want to use my version, replace my version with an algorithm of your own, or do some combination of the two.
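
If you decide to write your own algorithm, the following sketch shows the general shape that a replacement might take.  It is not the screening algorithm that will be presented in the next lesson.  The class name SubjectWordScreen, the one-word list, and the writeTemp helper are all assumptions made for illustration.  The sketch keeps the screenMsg signature and the ScreenResult fields from Listing 42, returns true to recommend deletion, and strips asterisks before comparing because the saved message files contain the asterisks inserted by insertStars.  To plug a class like this into the program, it would also need to be named Screen and to provide a constructor that accepts a Pop302 object.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.io.PrintWriter;

// Same fields as the ScreenResult class in Listing 42.
class ScreenResult {
  public String subject = "";
  public String from = "";
  public String thePhrase = "";
  public String text = "";
}

// Hypothetical replacement for the class named Screen.
class SubjectWordScreen {
  // Assumed word list, hard-coded for the sketch.
  private final String[] words = {"VIAGRA"};

  // Returns true to recommend deletion. The uidl parameter
  // is unused here but kept to preserve the interface.
  public boolean screenMsg(String fileName, String uidl,
                           ScreenResult theResult) {
    boolean match = false;
    try (BufferedReader in = new BufferedReader(
        new FileReader(fileName))) {
      StringBuilder text = new StringBuilder();
      String line;
      while ((line = in.readLine()) != null) {
        text.append(line).append("\n");
        String upper = line.toUpperCase();
        if (theResult.from.isEmpty()
            && upper.startsWith("FROM:")) {
          theResult.from = line;
        }
        if (theResult.subject.isEmpty()
            && upper.startsWith("SUBJECT:")) {
          theResult.subject = line;
          // Remove asterisks, including those inserted
          // when the message was saved to disk.
          String cleaned = upper.replace("*", "");
          for (String word : words) {
            if (cleaned.contains(word)) {
              theResult.thePhrase = word;
              match = true;
              break;
            }
          }
        }
      }
      theResult.text = text.toString();
    } catch (IOException e) {
      e.printStackTrace();
    }
    return match;
  }

  // Demo helper: writes lines to a temporary file and
  // returns the path of that file.
  static String writeTemp(String... lines) {
    try {
      File f = File.createTempFile("msg", ".txt");
      f.deleteOnExit();
      try (PrintWriter out =
          new PrintWriter(new FileWriter(f))) {
        for (String l : lines) {
          out.println(l);
        }
      }
      return f.getPath();
    } catch (IOException e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    String file = writeTemp(
        "From: someone@example.com",
        "Subject: *Buy VIAGR*A now",
        "",
        "Body text");
    ScreenResult result = new ScreenResult();
    boolean delete = new SubjectWordScreen()
        .screenMsg(file, "example-id", result);
    System.out.println("Recommend deletion: " + delete);
    System.out.println(
        "Offending phrase: " + result.thePhrase);
  }
}
```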

Complete Program Listing

A complete listing of the program follows in Listing 42.  Note that this listing contains a stripped-down version of the class named Screen.  The full version of the class named Screen will be provided in the next lesson in this series.

Also, the three DELE statements shown in red in Listing 42 have been purposely disabled to prevent you from accidentally deleting messages from your server while testing this program.
Do not enable these three statements until you are ready to actually delete messages from the server.  Once a message is deleted from the server, it cannot be recovered from the server.
Disclaimer of responsibility:  If you elect to use this program you use it at your own risk.  Make absolutely certain that you understand what you are doing before you execute the program.  The author of this program, Richard G. Baldwin, accepts no responsibility for any losses that you may incur as a result of using this program.

/*File Pop302.java Copyright 2004, R.G.Baldwin
Rev 01/01/04

Note: This version has a stripped down class
named Screen. This version allows testing of
the Email server communications without doing
any actual testing for SPAM.

The purpose of this program is to read messages
from a POP3 server, analyze the messages
according to screening rules, and delete those
messages from the server that fail the screening
test. (As written, the program asks the user
to confirm the deletion of each message, but
this confirmation step could easily be removed.)

This version of the program screens on the basis
of key words or phrases in the From line, key
words or phrases in the Subject line, and key
words or phrases in the body text.

A list of friendly Email addresses is used to
screen the From line. Messages that are from
friendly Email addresses are not deleted from
the server and no information about those
messages is saved on the local disk. They are
totally ignored after determining that they were
sent from a friendly Email address.

Different lists of words are used for screening
Subject lines and body text. For example,
ANTIVIRUS is appropriate for screening the
Subject line, but is not appropriate for
screening the body text. The word ANTIVIRUS
often appears legally in the header of Email
messages that have been scanned for viruses by
the server, but also often appears in the Subject
line of SPAM messages.

The common spammer tricks of inserting extra
characters between the characters in the
offending word and mixing the case of the
characters in the offending word are defeated by
this program.

For example, this program will flag for deletion
a message having any of the following in its
Subject line or its body text:

vIaGrA
V.IagRA
V.I.A.G.R.A

This program also defeats the common trick of
appending random characters to the end of the
Subject line, because it doesn't require a match
for the entire Subject line.

When the program detects a message that is a
candidate for deletion, the user is asked to
verify the deletion by clicking the Delete
button. If the user doesn't want to delete the
message, she should click the Start/Next
button.

The following information is available to the
user for making that decision:
- From
- Subject
- Offending line, which may also be the subject
- Offending word or phrase
- Entire raw text of the message up to and
including the offending line

All messages that are candidates for deletion
from the server are saved in an archive folder
on the local disk, regardless of whether the
user elects to delete them from the server. Thus
if a message is deleted from the server and it is
later determined that was a mistake, a raw text
copy of the deleted message is available locally
in the archive folder. You should probably empty
this folder periodically so that it won't fill
up your disk.

Except for friendly messages, all messages that
are not candidates for deletion from the server
are saved in a history folder on the local
disk. These messages can be used later to train
the program to do a better job of recognizing
SPAM.

Before any message is saved in a local file,
asterisks are inserted into the text on
ten-character intervals in an attempt to destroy
any virus code that may be embedded in the
message.

Numerous upgrades are possible. One possible
upgrade is to create a premium list of words and
phrases that will always result in deletion of
the message from the server without prior
approval by the user. For example, the user
might want any message containing VIAGRA to
be deleted automatically. However,
great care is urged in this regard. Certain
words such as SPAM and PORN occasionally occur
in a message with the letters separated by only
a few characters. This program would identify
those messages as being candidates for deletion.
For example, the offending word PORN occurs in
the non-offending word imPORtaNt with the letters
R and N separated by only two characters. The
word SLUT appears in the word SoLUTion with only
one character between the S and the L. The word
SPAM often occurs in different variations of
body text.

Another possible upgrade would be to allow the
user to specify the number of characters that may
occur between the letters of an offending word
or phrase. As programmed, that value is
hard-coded into the program, and as of this
writing, that value is one.

If the number of characters is set to zero, many
spam messages will avoid detection. If that
value is set to a large number, many false alarms
will occur. Therefore, care should be taken when
adjusting this value.

Another possible modification would be to allow
the program to automatically delete all
messages that are determined to be candidates
for deletion. Since these messages are saved
locally in an archive folder, a separate program
could be written to allow the user to review
those messages locally at her convenience just
in case a valid message was inadvertently
deleted from the server.

Companion programs that I have written provide
for creating and maintaining the word lists
discussed above in disk files. These programs
are used to analyze the non-deleted message files
saved locally in the history folder in order to
train this program to do a better job of
identifying SPAM messages in the future. These
programs are designed for ease of use to
encourage the user to train the program
frequently.

All three word lists are maintained in simple
text files, which can be edited with an
ordinary text editor if need be.

For technical information on POP3, see RFC 1725
at
http://www.cis.ohio-state.edu/htbin/rfc/rfc1725.
html

A POP3 Command Summary follows based on the
information at that web site.

Minimal POP3 Commands:
USER name
PASS string
QUIT
STAT
LIST [msg]
RETR msg
DELE msg
NOOP
RSET
QUIT

Optional POP3 Commands:
APOP name digest
TOP msg n
UIDL [msg]

POP3 Replies:
+OK
-ERR

File names: The following file names are hard-
coded into the program:

The file name for a local copy of a message is
the unique identifier for that message obtained
from the mail server.

Pop302a.txt - contains a word list for screening
the Subject lines.

Pop302b.txt - contains a word list for screening
the body text lines.

Pop302c.txt - contains a list of friendly Email
addresses for screening the From lines to
identify friendly messages.

This program consists of two main classes. An
object of the class named Pop302 handles all
communications with the Pop3 server.

An object of the class named Screen screens each
message in an attempt to identify SPAM. This
class can be totally replaced by Java programmers
who wish to design their own screening algorithm
provided they maintain the interface with the
object of the class named Pop302.

Tested using SDK 1.4.2 under WinXP
************************************************/

import java.net.*;
import java.io.*;
import java.util.*;
import java.awt.*;
import java.awt.event.*;

class Pop302 extends Frame{
int msgCounter = 0;
int msgNumber;
TextArea textArea;
TextField subjField;
TextField fromField;
TextField operMsgField;
int numberMsgs = 0;
String uidl = "";//unique msg ID
BufferedReader inputStream;
PrintWriter outputStream;
Socket socket;
Screen screener;

public static void main(String[] args){
if(args.length != 3){
System.out.println("Usage: java Pop302 "
+ "server userName password");
System.exit(0);
}//end if

new Pop302(args[0],args[1],args[2]);
}//end main
//===========================================//

Pop302(String server,String userName,
String password){
//Instantiate a new Screen object and pass
// this to allow for the object to call back
// and update the progress indicator.
screener = new Screen(this);

int port = 110; //pop3 mail port
try{
//Get a socket, connected to the
// specified server on the specified
// port.
socket = new Socket(server,port);

//Get an input stream from the socket
inputStream = new BufferedReader(
new InputStreamReader(
socket.getInputStream()));

//Get an output stream to the socket.
// Note that this stream will autoflush.
outputStream = new PrintWriter(
new OutputStreamWriter(
socket.getOutputStream()),true);

//Display the msg received from the
// server on the command-line screen
// immediately following connection.
String connectMsg = validateOneLine();
System.out.println("Connected to server "
+ connectMsg);

//The communication process is now in the
// AUTHORIZATION state. Send the user
// name and password to the server. Note
// that the use of an APOP command
// for sending user name and password
// would probably be more secure
// if it is supported by the server.
// However, my server apparently doesn't
// support APOP.
//Commands are sent in plain text, upper
// case to the server. Some commands
// require an argument following the
// command, as is the case with USER.
//Send the command.
outputStream.println("USER " + userName);
//Get response and confirm that the
// response was +OK and was not -ERR.
String userResponse = validateOneLine();
//Display the response on the command-
// line screen. Cannot display in the
// GUI at this point in time because the
// GUI object is not ready for use at
// this point in the execution of the
// constructor.
System.out.println("USER " + userResponse);
//Send the password to the server
outputStream.println("PASS " + password);
//Validate the server's response as +OK.
// Display the response in the process.
System.out.println(
"PASS " + validateOneLine());
}catch(Exception e){e.printStackTrace();}

//Register a window listener to service
// the close button on the Frame. This is
// an anonymous class definition.
this.addWindowListener(
new WindowAdapter(){
public void windowClosing(WindowEvent e){

//Terminate the session with the
// server.
outputStream.println("QUIT");
String quitResponse =
validateOneLine();
//Display the response on the
// command-line screen.
System.out.println(
"QUIT " + quitResponse);
//Also display the response on the
// GUI. However, you probably won't
// see it because the GUI is
// closing.
textArea.append(quitResponse + "\n");

//Server is now in the UPDATE mode.
// It will delete all files marked
// with the DELE command earlier
// in the execution of the program.
//Close the socket
try{
socket.close();
}catch(Exception ex){
ex.printStackTrace();}

System.exit(0);
}//end windowClosing
}//end WindowAdapter()
);//end addWindowListener

//Note, this GUI was purposely made narrow
// in order to make it fit into the
// publication format. You should make
// it wider and also increase the width of
// the text fields and the TextArea defined
// below to make it more useful.
setLayout(new FlowLayout());
//Note that the compiler requires the
// references to the following buttons to
// be final because they are accessed from
// within an anonymous class definition.
final Button startButton =
new Button("Start/Next");
final Button deleteButton =
new Button("Delete");
subjField = new TextField(
"Display Subj here",50);
fromField = new TextField(
"Display From line here",50);
operMsgField = new TextField(
"Display operator messages here",50);
textArea = new TextArea(15,50);
textArea.append("Display raw data here\n");

//Register an ActionListener on the
// startButton. This is an anonymous
// class definition.
startButton.addActionListener(
new ActionListener(){
public void actionPerformed(
ActionEvent e){
//Clear the operator message field
operMsgField.setText("");

try{
//The communication process is now
// in the TRANSACTION state.
//Retrieve and screen messages
if(numberMsgs == 0){
//Calculate numberMsgs only at
// the beginning of the run,
// because it changes when
// messages are deleted.
outputStream.println("STAT");
String stat = validateOneLine();
//Get the number of messages as
// a String.
String numberMsgsStr =
stat.substring(
4,stat.indexOf(" ",5));
//Convert the String to an int.
numberMsgs = Integer.parseInt(
numberMsgsStr);
}//end if numberMsgs == 0
//NOTE: Msg numbers begin with 1,
// not 0.
//Retrieve and screen each
// message. Each msg ends with a
// period on a new line.
msgNumber = msgCounter + 1;

if(msgNumber <= numberMsgs){
//Process the next message.

//Get and save a unique identifier
// for the message from the server
// and validate the response.
outputStream.println(
"UIDL " + msgNumber);
uidl = validateOneLine();

//Open an output file to save
// the message. Use the UIDL
// as the file name. Others
// may need to modify the
// following code to identify
// a folder for local storage of
// the messages.
String fileName =
"c:/MailFiles/" + uidl +".txt";
DataOutputStream dataOut =
new DataOutputStream(
new FileOutputStream(
fileName));

//Send a RETR command to begin
// the message retrieval process
outputStream.println(
"RETR " + msgNumber);
//Validate the response.
String retrResponse =
validateOneLine();

//Clear the text in the TextArea
// at the beginning of each new
// message. If you don't do
// this, the String being
// displayed will become very
// long and the program will run
// very slowly for large numbers
// of messages.
textArea.setText("");

//Read the first line in the
// message from the server.
String msgLine =
inputStream.readLine();
//Insert asterisks in the text
// in an attempt to destroy
// viruses before the file is
// stored locally.
msgLine = insertStars(msgLine);

//Continue reading lines until
// a line consisting solely of
// "." is encountered. That
// signals the end of the msg.
while(!(msgLine.equals("."))){
//Write the line to the output
// file and read the next
// line. Insert newline
// characters when writing the
// output to the file.
dataOut.writeBytes(
msgLine + "\n");
msgLine = inputStream.readLine();
//Insert asterisks to destroy
// virus code.
msgLine = insertStars(msgLine);
}//end while
//Close the output file. The
// message is now stored in a
// local file with a file name
// based on the unique ID
// provided by the server. Note
// that a unique ID provided by
// one server may duplicate a
// unique ID provided by a
// different server.
dataOut.close();

//Now screen the file testing
// for reasons to delete the
// message from the server.
//First initialize the text showing
// in the various components in the
// GUI.
fromField.setText("Call screener");
subjField.setText("Call screener");
operMsgField.setText(
"Call screener");
textArea.setText(
"Progress Meter: ");
//Initialize the match flag
// to false.
boolean match = false;

//Now cause the message file to be
// screened. In the event that you
// decide to design your own
// screening algorithm, this is
// where you would probably
// make the first modification to
// the program. Your version of
// the method named screenMsg
// should return true if it is
// recommending that the message be
// deleted from the server. Also,
// the object of type ScreenResult
// passed as a parameter to the
// method should be populated with
// information to be displayed in
// the text fields and text area of
// the GUI.
ScreenResult theResult =
new ScreenResult();
match = screener.screenMsg(
fileName,uidl,theResult);

//Now display the information
// encapsulated in the ScreenResult
// object by the screenMsg method.
fromField.setText(theResult.from);
subjField.setText(
theResult.subject);
operMsgField.setText(
"Offending Phrase: "
+ theResult.thePhrase);
textArea.setText(theResult.text);
//Scroll the text area to the end
textArea.select(
theResult.text.length()-2,
theResult.text.length()-1);

//At this point, the user can
// view the From line and the
// Subject line for the message,
// the complete text of the message
// down to the line containing the
// offending word or phrase, as
// well as that word or phrase.

//Increment the message counter
// in preparation for
// processing the next message.
msgCounter++;

//A return value of true means that
// the screener is recommending
// deletion of the message from the
// Email server.
if(match == true){
//The message has been flagged
// as a candidate for deletion
// from the server. Return
// from the actionPerformed
// method and take no further
// action until the user
// presses the Delete button
// or the Start/Next button.
//Pressing the Delete button
// causes the message to be
// deleted from the server.
//Pressing the Start/Next
// button causes it to be
// preserved.
return;
}//end if match == true

//Control reaches this point only
// if match is not true.
//The message is not a
// candidate for deletion from
// the server.
//At this point, we could
// require the user to press
// the Start/Next button to
// process the next message.
//However, we won't do that. The
// following code fires an event
// identical to that which would
// be fired if the user pressed
// the Start/Next button.
Toolkit.getDefaultToolkit().
getSystemEventQueue().
postEvent(new ActionEvent(
startButton,
ActionEvent.
ACTION_PERFORMED,
"Start/Next"));
}//end if msgNumber <= numberMsgs
else{//msgNumber > numberMsgs
//No more messages. Disable the
//Start/Next button.
startButton.setEnabled(false);
//Instruct the user to terminate
// the program.
subjField.setText(
"No more messages, press Close");
fromField.setText(
"No more messages, press Close");
operMsgField.setText(
"No more messages, press Close");
textArea.setText(
"No more messages, press Close");
}//end else
}//end try
catch(Exception ex){
ex.printStackTrace();}
}//end actionPerformed
}//end ActionListener
);//end addActionListener

//Register an ActionListener on the Delete
// button to make it possible for the
// user to remove a message from the
// server.
deleteButton.addActionListener(
new ActionListener(){
public void actionPerformed(
ActionEvent e){
//Clear the operator message field
operMsgField.setText("");

//Deletion of a message from the
// server is accomplished by marking
// the message for deletion while in
// the TRANSACTION state. The
// message is actually deleted when
// the client sends a QUIT command
// to the server causing the server
// to enter the UPDATE state. If the
// program aborts prematurely before
// sending a QUIT command, marked
// messages are not deleted from the
// server.
//Mark the message for deletion.
//The following statement has been purposely
// disabled to prevent you from inadvertently
// deleting messages from your Email server. You
// should not enable this statement until you
// are confident that you really do want to
// delete messages from the server.
// outputStream.println(
// "DELE " + msgNumber);

//Validate the response and display
// it on the GUI. You probably won't
// see it on the GUI because of what
// happens next. The program
// immediately clears the display
// and begins processing the
// next message. If you modify the
// program to eliminate the clearing
// of the display between messages,
// you will see this response.
// textArea.append(
// "DELE "+validateOneLine()+"\n");
// textArea.append(
// "Deleted:" + msgNumber + "\n");

//Create and fire a synthetic event
// that simulates the user pressing
// the Start/Next button. This
// initiates the processing of the
// next message.
Toolkit.getDefaultToolkit().
getSystemEventQueue().
postEvent(new ActionEvent(
startButton,
ActionEvent.
ACTION_PERFORMED,
"Start/Next"));
}//end actionPerformed
}//end ActionListener
);//end addActionListener

//Configure the GUI by placing the
// various components on it, setting the size
// and making it visible.
add(startButton);
add(deleteButton);
add(fromField);
add(subjField);
add(operMsgField);
add(textArea);
setTitle("Copyright 2004, R.G.Baldwin");
//Increase the following parameters and
// modify the construction parameters for
// the text fields and the text area to
// increase the size of the GUI.
setSize(400,400);
//Make the GUI visible.
setVisible(true);
}//end constructor
//===========================================//

//Validate a one-line response.
//The purpose of this method is to confirm that
// the server returned +OK and not -ERR to the
// previous command.
//If +OK, the method returns the string
// returned by the server.
//If -ERR, the method displays the string
// returned by the server and terminates the
// session.
private String validateOneLine(){
try{
String response = inputStream.readLine();
if(response.startsWith("+OK")){
return response;
}else{
System.out.println(response);
//Terminate the session.
outputStream.println("QUIT");
socket.close();
System.out.println(
"Premature QUIT on -ERR");
System.exit(0);
}//end else
}catch(IOException e){e.printStackTrace();}
//The following return statement is required
// to satisfy the compiler.
return "Make compiler happy";
}//end validateOneLine()
//===========================================//

//Purpose of this method is to insert an
// asterisk (star) every tenth character in
// order to destroy virus code before it is
// written into the output file. While this
// makes the local version of the message
// harder to read, it does little to reduce its
// usefulness for computer analysis.
private String insertStars(String stringIn){
StringBuffer stringBuffer =
new StringBuffer(stringIn);
int length = stringBuffer.length();
for(int cnt = 9; cnt < length; cnt+=10){
stringBuffer.insert(cnt,'*');
}//end for loop
return new String(stringBuffer);
}//end insertStars
//===========================================//
}//end class Pop302
//=============================================//

//Class to encapsulate screening results. An
// object of this type is passed to the screenMsg
// method where it is populated with the results
// of the screen.
class ScreenResult{
public String subject = "";
public String from = "";
public String thePhrase = "";
public String text = "";
}//end ScreenResult
//=============================================//

//This is a stripped-down version of the class
// named Screen, designed solely to make it
// possible for you to test the class named
// Pop302 on your system with your Email server.
//Each time the method named screenMsg is called,
// its return value toggles between true and
// false.
class Screen{
boolean returnValue = true;

Screen(Pop302 dummy){//dummy constructor
}//end constructor

public boolean screenMsg(String fileName,
String uidl,ScreenResult theResult){
try{
BufferedReader inData
= new BufferedReader(new FileReader(
fileName));

theResult.subject = "No Subj line found";
theResult.from = "No From line found";
theResult.thePhrase = "No phrase "
+ "available for test program.";

String data;
inData.mark(10000);
while((data = inData.readLine()) != null){
if(data.startsWith("Subject:")){
theResult.subject = data.toUpperCase();
break;
}//end if
}//end while loop

inData.reset();
while((data = inData.readLine()) != null){
if(data.toUpperCase().
startsWith("FROM:")){
theResult.from = data;
break;
}//end if data starts with From
}//end while loop on null

inData.reset();
while((data = inData.readLine()) != null){
theResult.text =
theResult.text + data + "\n";
}//end while loop on read until null
inData.close();
}catch(Exception e){e.printStackTrace();}

//Toggle return value between true and false
if(returnValue == false){
returnValue = true;
}else{
returnValue = false;
}//end else
return returnValue;
}//end screenMsg
//===========================================//
}//End stripped-down Screen class

Listing 42
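
The opening comments in Listing 42 describe matching an offending word even when its letters are in mixed case or separated by extra characters, with the allowed separation hard-coded to one character.  The following sketch illustrates that idea with a regular expression.  It is a stand-in for illustration only and is not the matching code used by the full version of the Screen class; the class name GapMatchSketch and the method names gapPattern and flags are hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical illustration of gap-tolerant matching.
class GapMatchSketch {

  // Builds a pattern that finds the letters of word in
  // order, allowing up to maxGap arbitrary characters
  // between consecutive letters.
  static Pattern gapPattern(String word, int maxGap) {
    String upper = word.toUpperCase();
    StringBuilder regex = new StringBuilder();
    for (int i = 0; i < upper.length(); i++) {
      if (i > 0) {
        regex.append(".{0,").append(maxGap).append("}");
      }
      regex.append(
          Pattern.quote(String.valueOf(upper.charAt(i))));
    }
    return Pattern.compile(regex.toString());
  }

  // Returns true if text contains word under the gap rule.
  // Case is ignored by converting the text to upper case.
  static boolean flags(String text, String word,
                       int maxGap) {
    return gapPattern(word, maxGap)
        .matcher(text.toUpperCase()).find();
  }

  public static void main(String[] args) {
    String[] samples =
        {"vIaGrA", "V.IagRA", "V.I.A.G.R.A"};
    for (String s : samples) {
      System.out.println(
          s + " flagged: " + flags(s, "VIAGRA", 1));
    }
    // False alarms grow as the allowed gap grows.
    System.out.println("important, gap 1: "
        + flags("important", "PORN", 1));
    System.out.println("important, gap 2: "
        + flags("important", "PORN", 2));
    System.out.println("solution, gap 1: "
        + flags("solution", "SLUT", 1));
  }
}
```

With a gap of one, the three VIAGRA variants are flagged and the word important is not, but the word solution is flagged as containing SLUT.  With a gap of two, the word important is flagged as containing PORN.  That is the tradeoff between missed SPAM and false alarms described in the comments.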
 

Copyright 2004, Richard G. Baldwin.  Reproduction in whole or in part in any form or medium without express written permission from Richard Baldwin is prohibited.

About the author

Richard Baldwin is a college professor (at Austin Community College in Austin, TX) and private consultant whose primary focus is a combination of Java, C#, and XML. In addition to the many platform and/or language independent benefits of Java and C# applications, he believes that a combination of Java, C#, and XML will become the primary driving force in the delivery of structured information on the Web.

Richard has participated in numerous consulting projects, and he frequently provides onsite training at the high-tech companies located in and around Austin, Texas.  He is the author of Baldwin's Programming Tutorials, which has gained a worldwide following among experienced and aspiring programmers. He has also published articles in JavaPro magazine.

Richard holds an MSEE degree from Southern Methodist University and has many years of experience in the application of computer technology to real-world problems.

Baldwin@DickBaldwin.com

-end-
 





