Enlisting Java in the War Against SPAM: The Communications Module
Java Programming Notes # 2150
- Preface
- Preview
- Discussion and Sample Code
- Run the Program
- Summary
- What's Next?
- Complete Program Listing
Preface
The communications module
This lesson explains the communications module used to communicate with your Email server, and to remove SPAM messages from the server.
SPAM screening algorithm
The program is designed to allow you to use my SPAM screening algorithm, or to invent your own. Subsequent lessons will explain the inner workings of my SPAM screening algorithm. You can use my algorithm as a starting point if you decide to invent your own. Those lessons will also explain how the system can be trained to do an increasingly better job of screening SPAM over time.
Viewing tip
You may find it useful to open another copy of this lesson in a separate browser window. That will make it easier for you to scroll back and forth among the different listings and figures while you are reading about them.
Supplementary material
I recommend that you also study the other lessons in my extensive collection of online Java tutorials. You will find those lessons published at Gamelan.com. However, as of the date of this writing, Gamelan doesn't maintain a consolidated index of my Java tutorial lessons, and sometimes they are difficult to locate there. You will find a consolidated index at www.DickBaldwin.com.
Preview
Can you write better SPAM screening algorithms?
Did you ever think that you might be able to write better SPAM screening algorithms than those available in the SPAM screening software that you are now using? If so, this lesson is for you. Even if that is not the case, like most of us, you are probably overwhelmed by SPAM and therefore you may find this lesson interesting.
Remove SPAM from the server
In this lesson, I will show you how to write a Java program that supplements the SPAM screening software that you are currently using. This program is used to identify and remove SPAM from your Email server before it is downloaded into your primary Email client. Any SPAM that makes it past this program can be further acted upon by the SPAM screener that is built into your Email client.
The communications module
This series will consist of four lessons. This lesson, which is the first in the series, will explain the communications module used to communicate with your Email server, and to remove SPAM messages from the server.
As mentioned earlier, the program is designed to allow you to invent and implement your own SPAM screening algorithm in addition to, or as an alternative to, my algorithm.
My algorithm and algorithm training programs
The second lesson will explain the inner workings of my SPAM screening algorithm. My algorithm operates separately on the Subject line, the From line, and the body text of each Email message.
The third lesson will explain a companion program designed to make use of historical data to easily train the algorithm to do a better job of identifying SPAM based on the Subject of the message.
The fourth lesson will explain another companion program designed to make use of historical data to easily train the algorithm to do a better job of identifying SPAM based on the body text of the message, which includes the From line.
Effectiveness of my algorithm
At this point in time, after about one week of training, my algorithm reliably identifies about ninety percent of all SPAM and allows me to delete it from my Email server before downloading it into my primary Email client. Only time will tell if that percentage improves in the future.
Discussion and Sample Code
The version of the program that I will discuss in this lesson has a stripped-down version of a class named Screen. This version of the program allows for testing the communications module on your system with your Email server without doing any actual screening for SPAM and without deleting any messages from the server.
I will explain the full version of the class named Screen in the next lesson when I explain my algorithm for identifying SPAM.
Purpose of the program
The purpose of this program is to read messages from a POP3 (Post Office Protocol - Version 3) server, to analyze the messages according to a set of screening rules, and to delete those messages from the server that fail the screening test.
(As written, the program asks the user to confirm the deletion of each message from the server, but this confirmation step could easily be removed if you decide to do so.)
Key words and phrases
This version of the program screens for SPAM on the basis of key words or phrases in the From line, key words or phrases in the Subject line, and key words or phrases in the body text.
Friendly Email addresses and subjects
A list of friendly Email addresses and friendly subjects is used to screen the From line and the Subject line. Messages that are from friendly Email addresses, and messages that have known good Subject lines are not deleted from the server and no information about those messages is saved on the local disk. They are simply ignored after determining that they are friendly.
Different lists for Subject and body text
Different lists of words and phrases are used for screening Subject lines and body text for SPAM. This is important because the same set of words and phrases can't always be used for both cases.
For example, the word ANTIVIRUS is appropriate for screening the Subject line, but is not appropriate for screening the body text. The word ANTIVIRUS often appears legitimately in the header of Email messages that have been scanned for viruses by the server, but also often appears in the Subject line of SPAM messages.
Common spammer tricks are defeated
Several common spammer tricks are defeated by my SPAM screening algorithm.
For example, the common spammer trick of inserting extra characters between the characters in an offending word or phrase is defeated. The common trick of mixing the case of the characters in an offending word or phrase is also defeated.
As a specific example, my algorithm will recommend deletion of any message having any of the following in its Subject line or its body text if the word VIAGRA is included in the lists used to screen for SPAM:
vIaGrA
V.IagRA
V.I.A.G.R.A
These two characteristics alone have a significantly positive impact on the effectiveness of training the algorithm to do a better job of identifying SPAM in the future.
My algorithm also defeats the common trick of appending random characters to the end of the Subject line, because it doesn't require a match for the entire Subject line. Rather, it searches for words or phrases internal to the text of the Subject line.
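The matching behavior described above can be approximated with a regular expression. The following is only a minimal sketch and is not the code from my Screen class, which is explained in the next lesson. The class name ObfuscationMatcher, the method names, and the maxGap parameter are all assumptions made for illustration. The sketch ignores case, tolerates up to maxGap extra characters between adjacent letters, and searches anywhere within the text rather than requiring the entire line to match.

```java
import java.util.regex.Pattern;

// Hypothetical helper; not part of the program discussed in this lesson.
class ObfuscationMatcher {

    // Builds a case-insensitive pattern that allows up to maxGap arbitrary
    // characters between adjacent characters of the phrase.
    static Pattern buildPattern(String phrase, int maxGap) {
        StringBuilder regex = new StringBuilder();
        for (int i = 0; i < phrase.length(); i++) {
            if (i > 0 && maxGap > 0) {
                regex.append(".{0,").append(maxGap).append("}");
            }
            // Quote each character so that punctuation in a phrase is literal.
            regex.append(Pattern.quote(String.valueOf(phrase.charAt(i))));
        }
        return Pattern.compile(regex.toString(), Pattern.CASE_INSENSITIVE);
    }

    // Returns true if the phrase occurs anywhere inside the text.
    static boolean containsPhrase(String text, String phrase, int maxGap) {
        return buildPattern(phrase, maxGap).matcher(text).find();
    }

    public static void main(String[] args) {
        String[] subjects = {"vIaGrA", "V.IagRA", "V.I.A.G.R.A"};
        for (String subject : subjects) {
            System.out.println(subject + " -> "
                    + containsPhrase(subject, "VIAGRA", 1));
        }
    }
}
```

Because the sketch uses find rather than a whole-line match, random characters appended to the end of a Subject line do not prevent a match.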
The user interface
Figure 1 shows the GUI through which the user controls the program.

Figure 1 Graphical User Interface
(Note that this GUI was purposely made narrow in order to cause it to fit into this narrow publication format. I recommend that you increase the width of the Frame to at least 750 pixels, and increase the width of the TextField and TextArea objects to at least 100 characters each.)
The Offending Phrase
When the program identifies a message that is a candidate for deletion, the reason for that recommendation is shown in the third text field from the top in Figure 1.
(An actual SPAM message is being displayed in the GUI in Figure 1, but the stripped-down version of the class named Screen was being used, so no Offending Phrase is shown in Figure 1.)
Deleting a message from the server
The user confirms that the message should be deleted from the Server by clicking the Delete button in Figure 1. If the user doesn't want to delete the message, she should click the Start/Next button instead.
(Note that the capability to actually delete messages from the server was disabled in the program shown in Listing 42 near the end of this lesson. Make certain that you are ready to actually delete messages from the server before re-enabling that capability.)
The Netscape approach to SPAM screening
I currently use Netscape version 7.1 as my Email client. Basically, it provides two forms of SPAM screening. One form, which is referred to as Junk Mail Controls, is apparently based on some sort of artificial intelligence. This capability can be trained over time to identify the kinds of messages that you consider to be junk mail. This capability is very easy to train. However, it produces lots of false positives and is very difficult to un-train when that happens. (I will have more to say about false positives later.)
The other form of SPAM screening used by Netscape 7.1 is referred to as Message Filters. This approach depends on exact character matching in the subject, the body, or in other parts of the message, such as sender, date, priority, etc.
In this case, you must enter the exact words or phrases into a form that will be used for matching purposes. This approach is practically useless for SPAM screening due to the tendency of spammers to insert random characters into the offending words and phrases and to randomly modify the case of the characters in offending words and phrases. Also, the process of entering the words and phrases into the form is very tedious and time consuming. I long ago gave up on using Netscape's Message Filters for SPAM filtering.
False positives
All SPAM screening algorithms are subject to reporting false positives to some degree. That is to say, a message may be erroneously identified as SPAM when it is actually a good message.
One of the major problems with my Netscape 7.1 system results from false positives. Because of the high rate of false positives produced by Junk Mail Controls, whenever a message is identified as SPAM, I must confirm that it is SPAM before deleting it. At that point in time, unless I am willing to actually open the message and to be confronted with a variety of offensive images and other offensive material, I must make my decision solely on the basis of the subject and the from address information. Often this is not sufficient information to make an informed decision and I have no choice but to open the message.
Also, as I mentioned earlier, when Junk Mail Controls does report a false positive, there is no definitive way to make certain that it doesn't happen again in the future. It is necessary to un-train the algorithm regarding messages of that type, which can be a long process, possibly involving many similar occurrences in the future.
More information is available with my system
When a user of my system is required to confirm deletion of a message from the server, the following information is available to assist in the making of the decision:
- From line
- Subject line
- Offending line of text, which may or may not be the subject
- Offending word or phrase in the offending line of text
- Entire raw text of the message down to and including the offending line
Having viewed the above information, if the user is still unable to make an informed decision to delete the message, the user can let the message pass through and be downloaded into the primary Email client. After viewing the message later in the primary Email client, the user still has the option of updating the offending word lists in my system with IP addresses, URLs, etc., so that deletion decisions on future similar messages will be easier to make.
Saved in local archive folder
The raw text of every message that is identified as a candidate for deletion from the server is saved in an archive folder on the local disk, regardless of whether the user elects to delete it from the server or not. Thus if a message is deleted from the server and it is later determined that this was a mistake, a raw text copy of the deleted message is available locally in the archive folder.
(You should probably empty this folder periodically so that it won't fill up your disk.)
Saved in history folder
Except for messages from friendly Email addresses or messages with friendly Subject lines, all messages that are not identified as candidates for deletion from the server are saved in a history folder on the local disk. These messages are used later to train the algorithm to do a better job of identifying SPAM in the future. I will explain this process in Part 3 and Part 4 of this series of lessons.
Protection against viruses
Before any message is saved in a local file, asterisks are inserted into the text on ten-character intervals in an attempt to destroy any virus code that may be embedded in the message.
If a message makes it through the screen and is later identified as having a virus as an attachment, a series of ten or more bytes can be extracted from the virus code and added to the word list as an offending phrase. This will cause any future messages having that same virus code as an attachment to be identified as a candidate for deletion from the server.
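The asterisk insertion described above can be sketched as follows. This is an illustration only. The class name VirusDefang and the method name insertAsterisks are assumptions, and the exact position of each asterisk in my program may differ. This sketch places one asterisk after every complete group of ten characters when more text follows.

```java
// Hypothetical helper; not part of the program discussed in this lesson.
class VirusDefang {

    // Returns a copy of text with '*' inserted before every character whose
    // index is a positive multiple of interval.
    static String insertAsterisks(String text, int interval) {
        if (interval <= 0) {
            throw new IllegalArgumentException("interval must be positive");
        }
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            if (i > 0 && i % interval == 0) {
                out.append('*');
            }
            out.append(text.charAt(i));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(insertAsterisks("0123456789ABCDEFGHIJKLM", 10));
    }
}
```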
Possible upgrades
Numerous upgrades to my system are possible and I'm confident that you will have ideas that I haven't thought of. If so, I would like to hear about them.
One possible upgrade would be to create a premium list of words and phrases that will always result in automatic deletion of the message from the server without prior confirmation by the user. For example, the user might want any message containing the word VIAGRA to be deleted automatically.
Be careful with this
However, care is urged in this regard. Certain words such as SPAM and PORN occasionally occur in a message with the letters separated by only a few characters. Depending on the degree of separation, my algorithm may identify those messages as being candidates for deletion.
For example, the offending word PORN occurs in the non-offending word imPORtaNt with the letters R and N separated by only two characters. The word SLUT appears in the word SoLUTion with only one character between the S and the L. The word SPAM often occurs in different variations of body text.
If such a premium word list is used for automatic deletion, it should probably be restricted to only those situations where the characters in the word exactly match (except for case) a word in the subject or the body of the message, with no intervening characters separating the characters in the message. Experience shows, however, that very few matches would be made on this basis, so it may not be worth the effort.
Number of separation characters
Another possible upgrade would be to allow the user to specify the number of characters that may occur between the letters of an offending word or phrase in the message.
That value is currently hard-coded into the program. As of this writing, that value is set to one for screening against offending words or phrases. The value is set to zero when testing for friendly Email addresses in the From line and known good data in the Subject line.
If the number of characters is set to zero, many spam messages with offending words or phrases will avoid detection. If that value is set to a large number, many false positives will occur. Therefore, care should be taken when adjusting this value.
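The tradeoff can be seen by running the earlier examples with different separation values. The following is a rough sketch based on a regular expression, not the hard-coded logic in my program. The class name GapTradeoff and the method name matches are assumptions.

```java
import java.util.regex.Pattern;

// Hypothetical helper used only to illustrate the effect of the gap value.
class GapTradeoff {

    // True if word occurs in text, ignoring case, with at most maxGap
    // arbitrary characters between adjacent letters of the word.
    static boolean matches(String text, String word, int maxGap) {
        StringBuilder regex = new StringBuilder();
        for (char c : word.toCharArray()) {
            if (regex.length() > 0 && maxGap > 0) {
                regex.append(".{0,").append(maxGap).append('}');
            }
            regex.append(Pattern.quote(String.valueOf(c)));
        }
        return Pattern.compile(regex.toString(), Pattern.CASE_INSENSITIVE)
                .matcher(text).find();
    }

    public static void main(String[] args) {
        String[][] cases = {
            {"V.I.A.G.R.A", "VIAGRA"},   // obfuscated offending word
            {"solution", "SLUT"},        // innocent word, one separator
            {"important", "PORN"}        // innocent word, two separators
        };
        for (int gap = 0; gap <= 2; gap++) {
            for (String[] c : cases) {
                System.out.println("gap=" + gap + " " + c[0] + " / " + c[1]
                        + " -> " + matches(c[0], c[1], gap));
            }
        }
    }
}
```

With a gap of zero the obfuscated word escapes detection. With a gap of one it is caught, but so is the innocent word solution. With a gap of two the innocent word important is flagged as well.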
Automatic deletion of all SPAM candidates
For the brave among us, another possible modification would be to allow the program to automatically delete all messages that are determined to be candidates for deletion.
Since a text version of each of these messages is saved locally in an archive folder, a separate program could be written to allow the user to review those messages locally at her convenience, just in case a valid message was inadvertently deleted from the server.
Training programs
Companion programs that I have written provide for maintaining and upgrading the offending word and phrase lists. These lists are saved in local text files.
These training programs are used to analyze the non-deleted message files saved locally in the history folder in order to train the algorithm to do a better job of identifying SPAM messages in the future.
These programs are designed for extreme ease of use to encourage the user to train the algorithm frequently. The better the algorithm is trained, the better it will perform.
I will explain these training programs in Part 3 and Part 4 of this series of lessons.
Simple text files
All three word lists are maintained in local text files, which can be created and edited with an ordinary text editor if need be. Thus, if some corruption gets into one of the word lists, it is easy to correct the situation using an ordinary text editor.
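Reading such a list into memory takes very little code. The following sketch assumes a format of one word or phrase per line with blank lines ignored. That format, the class name WordListLoader, and the method names are assumptions for illustration and may not match the files used by my program.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical loader; assumes one word or phrase per line.
class WordListLoader {

    // Trims each line and drops lines that are empty after trimming.
    static List<String> parse(List<String> lines) {
        List<String> phrases = new ArrayList<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (!trimmed.isEmpty()) {
                phrases.add(trimmed);
            }
        }
        return phrases;
    }

    // Reads the whole file using the UTF-8 charset and parses it.
    static List<String> load(Path file) throws IOException {
        return parse(Files.readAllLines(file));
    }
}
```

A call such as WordListLoader.load(java.nio.file.Paths.get("Pop302a.txt")) would then return the Subject word list, assuming that the file exists in the current directory.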
Technical information on POP3 protocol
For technical information on the POP3 protocol, see http://www.cis.ohio-state.edu/htbin/rfc/rfc1725.html. I will frequently refer to this document as the technical document in the discussion that follows.
Command summary
A POP3 command summary based on the technical document is shown in Figure 2.
Minimal POP3 Commands: Figure 2 |
This program uses the commands that are highlighted in red in Figure 2. I will explain those commands in conjunction with the code that uses them.
File names
The following file names are hard-coded into the program. You may want to change these file names for your version of the program.
- Local copy - the file name for a local copy of each message is based on the unique identifier for that message (UIDL) obtained from the mail server.
- Pop302a.txt - contains a word list for screening the Subject lines for offensive words and phrases.
- Pop302b.txt - contains a word list for screening the body text lines for offensive words and phrases.
- Pop302c.txt - contains a list of friendly Email addresses and subjects for screening the From and Subject lines to identify friendly messages.
This program consists of two main classes and one minor class. An object of the class named Pop302 handles all communications with the POP3 server.
A method belonging to an object of the class named Screen is used to screen each message in an attempt to identify SPAM.
This class can be totally replaced by Java programmers who wish to design their own screening algorithm provided that they maintain the interface with the object of the class named Pop302.
An object of a very simple class named ScreenResult is used as a wrapper to return several items of information from the screening method.
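A wrapper of this kind simply bundles several values into one returned object. The sketch below is not the actual ScreenResult class, which appears in the complete listing. The class name ScreenResultSketch and its field names are hypothetical, chosen to mirror the information that is presented to the user when a message is a candidate for deletion.

```java
// Hypothetical stand-in for a result wrapper; names are illustrative only.
class ScreenResultSketch {
    final boolean deleteCandidate;   // true if deletion is recommended
    final String offendingLine;      // line of text that triggered the match
    final String offendingPhrase;    // word or phrase found in that line

    ScreenResultSketch(boolean deleteCandidate, String offendingLine,
                       String offendingPhrase) {
        this.deleteCandidate = deleteCandidate;
        this.offendingLine = offendingLine;
        this.offendingPhrase = offendingPhrase;
    }

    // Convenience factory for a message that passed the screen.
    static ScreenResultSketch passed() {
        return new ScreenResultSketch(false, "", "");
    }
}
```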
Testing
The program was tested using SDK 1.4.2 under WinXP in conjunction with two different POP3 Email servers.
The class named Pop302
As mentioned earlier, an object of the class named Pop302 handles all communications with the Email server, including the deletion of messages from the server. An object of the class named Screen applies screening rules in an attempt to identify SPAM.
Stripped-down version of the Screen class
I will explain the class named Pop302 in this lesson, and will explain the class named Screen in the next lesson.
However, I will provide a stripped-down version of the Screen class in this lesson. You can use the stripped-down version to test Pop302 on your system with your Email server, but no actual screening for SPAM will take place.
Will discuss in fragments
I will discuss the program in fragments. A complete listing of the program is provided in Listing 42 near the end of the lesson. You should be able to copy and paste that listing into your Java IDE to compile and test the program on your system.
Instance variables
The Pop302 class begins in Listing 1 with the declaration of several instance variables. The purpose of these variables will become clear when I discuss them in conjunction with their use.
class Pop302 extends Frame{ |
The main method
The main method is shown in its entirety in Listing 2.
public static void main(String[] args){ |
- server
- user name
- password
The constructor
The Pop302 class consists mainly of the constructor plus a couple of helper methods. The constructor code begins in Listing 3.
Pop302(String server,String userName, |
Code in the Screen class uses this reference later to display a progress indicator in the third text field in Figure 1.
(Note that the stripped-down version of the Screen class discussed in this lesson doesn't display the progress indicator. You will have to wait until the next lesson to see that code.)
Get a socket
The code in Listing 4 instantiates a new Socket object on the standard port for POP3 servers.
int port = 110; //pop3 mail port |
If the attempt to make the connection fails, the program will throw an exception. For example, if the value of server is invalid, the program will throw an UnknownHostException.
(If you are unfamiliar with socket programming in Java, see the lessons beginning with number 550 at www.DickBaldwin.com.)
Ready to communicate
At this point, the Email server is ready to communicate using the POP3 protocol. In order to communicate, the program must be able to send messages to the server and read messages that are sent from the server.
Input and output streams
The code in Listing 5 gets input and output streams on the Socket object that make it possible to send messages to the server and to read messages sent from the server.
inputStream = new BufferedReader( |
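The socket and stream setup can be tried without a real mail account. The following sketch is an assumption-laden illustration and not the code from Listing 4 and Listing 5. The class name Pop3ConnectionSketch and both method names are hypothetical. The demoGreeting method starts a stand-in server on the loopback address so that readGreeting has something to connect to. Against a real server you would pass your server name and port 110 instead. An invalid server name would surface as an UnknownHostException, which is a subclass of IOException.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch; not the Pop302 class discussed in this lesson.
class Pop3ConnectionSketch {

    // Connects, wraps the socket streams, and returns the greeting line.
    static String readGreeting(String server, int port) throws IOException {
        try (Socket socket = new Socket(server, port);
             BufferedReader in = new BufferedReader(new InputStreamReader(
                     socket.getInputStream(), StandardCharsets.US_ASCII));
             PrintWriter out =
                     new PrintWriter(socket.getOutputStream(), true)) {
            String greeting = in.readLine();
            out.print("QUIT\r\n");   // end the session politely
            out.flush();
            return greeting;
        }
    }

    // Runs readGreeting against a stand-in server on the loopback address.
    static String demoGreeting() {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket fake = new ServerSocket(0, 1, loopback)) {
            Thread server = new Thread(() -> {
                try (Socket client = fake.accept();
                     BufferedReader in = new BufferedReader(
                             new InputStreamReader(client.getInputStream(),
                                     StandardCharsets.US_ASCII));
                     PrintWriter out = new PrintWriter(
                             client.getOutputStream(), true)) {
                    out.print("+OK stand-in POP3 server ready\r\n");
                    out.flush();
                    in.readLine();   // wait for the client's QUIT
                } catch (IOException ignored) {
                    // a failure here shows up on the client side
                }
            });
            server.start();
            String greeting = readGreeting(loopback.getHostAddress(),
                    fake.getLocalPort());
            server.join();
            return greeting;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }
}
```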
Basic POP3 operation
The following is a quotation from the technical document referred to earlier:
"Initially, the server host starts the POP3 service by listening on TCP port 110. When a client host wishes to make use of the service, it establishes a TCP connection with the server host. When the connection is established, the POP3 server sends a greeting. The client and POP3 server then exchange commands and responses (respectively) until the connection is closed or aborted."
The document goes on to explain:
"Commands in the POP3 consist of a keyword, possibly followed by one or more arguments. All commands are terminated by a CRLF pair. Keywords and arguments consist of printable ASCII characters. Keywords and arguments are each separated by a single SPACE character. Keywords are three or four characters long. Each argument may be up to 40 characters long."
Finally, the document tells us:
"Responses in the POP3 consist of a status indicator and a keyword possibly followed by additional information. All responses are terminated by a CRLF pair. There are currently two status indicators: positive ("+OK") and negative ("-ERR")."
The greeting
That brings us to the greeting mentioned above.
The code in Listing 6 gets and displays the greeting received from the Email server. In the process, the code in Listing 6 invokes the method named validateOneLine to confirm that the message received from the Email server begins with +OK, and not with -ERR.
String connectMsg = validateOneLine(); |
(If the response begins with -ERR, the program terminates the communication session with the server, prints an error message, and terminates.)
The validateOneLine method
The code in Listing 6 invokes the method named validateOneLine to get and validate the message sent by the server. At this point, I am going to set the discussion of the constructor aside for a moment and discuss the method named validateOneLine.
The validateOneLine method begins in Listing 7.
private String validateOneLine(){ |
If -ERR is received
If the received line of text does not begin with +OK, it must begin with -ERR, which is the only other possibility allowed by the protocol.
Listing 8 shows the behavior of the validateOneLine method when the received line of text does not begin with +OK.
else{ |
- Displays the line of text that was received.
- Sends a QUIT command to the server to terminate the session.
- Closes the socket.
- Prints an error message.
- Terminates the program.
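The status check at the heart of this method can be sketched in a few lines. The version below is an illustrative variant and not the code from Listing 7 and Listing 8. It throws an exception on a negative response instead of sending QUIT and terminating the program, which leaves the cleanup decision to the caller. The class name ResponseValidator and the helper isPositive are assumptions.

```java
import java.io.BufferedReader;
import java.io.IOException;

// Hypothetical variant; the method in this lesson terminates the program
// on -ERR, whereas this sketch throws an exception instead.
class ResponseValidator {

    // True only for a response that carries the positive status indicator.
    static boolean isPositive(String line) {
        return line != null && line.startsWith("+OK");
    }

    // Reads one response line and returns it if the status is +OK.
    static String validateOneLine(BufferedReader in) throws IOException {
        String line = in.readLine();
        if (line == null) {
            throw new IOException("Connection closed before a response");
        }
        if (!isPositive(line)) {
            throw new IOException("Server reported an error: " + line);
        }
        return line;
    }
}
```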
The greeting
The greeting sent by one of my Email servers is shown in Figure 3.
+OK POP3 server1.yohance.com v2001.78rh |
(The actual text in the greeting will vary from one Email server to the next. Note that I manually inserted a line break immediately following 78rh in Figure 3 to force the greeting to fit in this narrow publication format.)
The AUTHORIZATION state
The following is a quotation from the technical document mentioned earlier:
"A POP3 session progresses through a number of states during its lifetime. Once the TCP connection has been opened and the POP3 server has sent the greeting, the session enters the AUTHORIZATION state. In this state, the client must identify itself to the POP3 server."
Returning to the constructor
At this point, the greeting has been received, and the POP3 session is in the AUTHORIZATION state. It is now time for the program to send the user name and the password to the server.
Commands are sent to the server as plain text in upper case. Some commands require an argument following the command, as is the case with the USER command shown in Listing 9.
//Send the command |
USER
+OK User name accepted, password please
Figure 4
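The command format quoted earlier from the technical document (a keyword, arguments separated by single spaces, and a terminating CRLF pair) can be assembled explicitly. The following is a small sketch. The class name Pop3Command, the method name format, and the user name alice are hypothetical. Writing the CRLF pair explicitly avoids depending on println, which emits the platform's line separator and therefore produces only a line feed on some systems.

```java
// Hypothetical helper for building a POP3 command line.
class Pop3Command {

    // Joins the keyword and arguments with single spaces and appends CRLF.
    static String format(String keyword, String... args) {
        StringBuilder line = new StringBuilder(keyword);
        for (String arg : args) {
            line.append(' ').append(arg);
        }
        return line.append("\r\n").toString();
    }

    public static void main(String[] args) {
        // "alice" is a made-up user name for illustration.
        System.out.print(format("USER", "alice"));
    }
}
```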
The APOP command
There is an optional APOP command, which avoids sending the password to the server in plain text. Instead of the password, the client sends the user name together with an MD5 digest computed from a timestamp in the server's greeting and a secret shared with the server. The use of the APOP command would be more secure than the approach shown in Listing 9 and Listing 10. However, this command is not supported by all Email servers, and apparently is not supported by my server.
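For readers whose servers do support APOP, the digest argument can be computed with the standard MessageDigest class. This sketch is not part of the program in this lesson. The class name ApopDigest and its method names are assumptions. The POP3 specification defines the digest as the MD5 hash of the timestamp from the greeting (including the angle brackets) followed immediately by the shared secret, written as lower-case hexadecimal. The values in main are the sample values used in RFC 1939.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper; the program in this lesson uses USER and PASS.
class ApopDigest {

    // MD5 of timestamp + sharedSecret as 32 lower-case hex characters.
    static String digest(String timestamp, String sharedSecret) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            byte[] hash = md5.digest((timestamp + sharedSecret)
                    .getBytes(StandardCharsets.US_ASCII));
            StringBuilder hex = new StringBuilder(32);
            for (byte b : hash) {
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is not available", e);
        }
    }

    // Builds the complete APOP command line, terminated by CRLF.
    static String command(String user, String timestamp, String secret) {
        return "APOP " + user + " " + digest(timestamp, secret) + "\r\n";
    }

    public static void main(String[] args) {
        System.out.print(command("mrose",
                "<1896.697170952@dbc.mtview.ca.us>", "tanstaaf"));
    }
}
```

A server that supports APOP includes the timestamp in its greeting. If the greeting contains no timestamp, the client must fall back to USER and PASS.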
Send the password
The code in Listing 10 sends the password, validates the response, and displays the response.
//Send the password to the server |
PASS
+OK Mailbox open, 7 messages
Figure 5
(Obviously the number of messages available will vary from one run to the next.)
The TRANSACTION state
Returning now to the technical document, we find:
"... the client must identify itself to the POP3 server. Once the client has successfully done this, the server acquires resources associated with the client's maildrop, and the session enters the TRANSACTION state. In this state, the client requests actions on the part of the POP3 server."
Having received the +OK response shown in Figure 5, our POP3 session is now in the TRANSACTION state.
The QUIT command and the UPDATE state
We find the following information in the technical document:
"When the client has issued the QUIT command, the session enters the UPDATE state. In this state, the POP3 server releases any resources acquired during the TRANSACTION state and says goodbye. The TCP connection is then closed."
Terminating the POP3 session
We are still discussing the constructor. Listing 11 shows the code used to register a WindowListener object on the close button on the Frame. The purpose of this listener is to terminate the POP3 session and to terminate the program when the user presses the close button.
this.addWindowListener( |
(Note that the code in Listing 11 is an anonymous class definition. If you are unfamiliar with anonymous class definitions in Java, you can learn about them by studying the tutorial lessons at www.DickBaldwin.com.)
The windowClosing method
By defining the windowClosing method in the anonymous class, the code in Listing 11:
- Sends a QUIT command to the server.
- Validates and displays the response.
- Closes the socket.
- Terminates the program.
In addition to displaying the response on the command-line screen, the code in Listing 11 also displays it in the large text area in Figure 1. However, you will have to look very quickly to see it there before the GUI disappears.
The response provided by my server is shown in Figure 6.
QUIT
+OK Sayonara
Figure 6
The UPDATE state
At this point, the POP3 session is in the UPDATE state. Among other things, this means that the server will delete all of the messages that were marked for deletion by the DELE command while the session was in the TRANSACTION state.
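The three session states and the transitions between them can be summarized as a small state machine. The sketch below is only a model for study and does not appear in the program. The class name Pop3SessionStates, the method name next, and the extra CLOSED value (which stands for a terminated connection and is not a state named by the protocol) are assumptions.

```java
// Hypothetical model of the POP3 session states; for illustration only.
class Pop3SessionStates {

    enum State { AUTHORIZATION, TRANSACTION, UPDATE, CLOSED }

    // Returns the state that follows the given command and response status.
    static State next(State current, String command, boolean positive) {
        switch (current) {
            case AUTHORIZATION:
                if (command.equals("QUIT")) {
                    return State.CLOSED;        // UPDATE is NOT entered
                }
                if (positive && (command.equals("PASS")
                        || command.equals("APOP"))) {
                    return State.TRANSACTION;   // client is identified
                }
                return State.AUTHORIZATION;
            case TRANSACTION:
                if (command.equals("QUIT")) {
                    return State.UPDATE;        // marked messages removed
                }
                return State.TRANSACTION;       // STAT, UIDL, DELE, ...
            default:
                return State.CLOSED;            // connection is closed
        }
    }
}
```

The model makes one point easy to see: messages marked by DELE are removed only when QUIT is issued from the TRANSACTION state.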
Here is some of what the technical document has to say about the UPDATE state:
"When the client issues the QUIT command from the TRANSACTION state, the POP3 session enters the UPDATE state. (Note that if the client issues the QUIT command from the AUTHORIZATION state, the POP3 session terminates but does NOT enter the UPDATE state.)
If a session terminates for some reason other than a client-issued QUIT command, the POP3 session does NOT enter the UPDATE state and MUST not remove any messages from the maildrop.
The POP3 server removes all messages marked as deleted from the maildrop. It then releases any exclusive-access lock on the maildrop and replies as to the status of these operations. The TCP connection is then closed."
Defining the GUI
Note that the GUI shown in Figure 1 was purposely made narrow so that it would fit into this narrow publication format. However, it is much more useful if it is wide enough to display each text line in the message in its entirety without a requirement for horizontal scrolling. Therefore, I recommend that you resize the GUI to make it at least 750 pixels wide. I also recommend that you make each of the text fields and the text area at least 100 characters wide.
Set the layout
Listing 12 sets the GUI layout to FlowLayout. Although this isn't very fancy, it works pretty well in this case.
setLayout(new FlowLayout()); |
Listing 13 constructs the two buttons, the three text fields, and the text area shown in Figure 1.
final Button startButton = |
In order to preserve real estate on the screen, I did not provide labels to identify the text fields in Figure 1. Rather, when the text fields are instantiated, the initial text showing in each text field indicates its purpose. For example, the initial text that appears in the topmost text field is "Display From line here."
The last statement in Listing 13 also displays the purpose of the text area in the text area when it first appears on the screen.
Not yet added to the GUI
Note that at this point, the GUI components have been constructed, but have not yet been placed in the GUI. This will be taken care of later.
References to buttons are final
Note also that it is necessary to declare the references to the two Button objects to be final, because they are accessed later from within an anonymous class definition. Local and anonymous classes can access local variables only if they are declared final.
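The rule can be demonstrated without any GUI code. In the sketch below, the class name FinalCaptureDemo, the Listener interface, and the method makeListener are hypothetical. A local variable is declared final and then read from inside an anonymous class, just as the button references are in this program. (Beginning with Java 8, the compiler also accepts a local variable that is never reassigned, even without the final keyword.)

```java
// Hypothetical demonstration; unrelated to the AWT classes in this lesson.
class FinalCaptureDemo {

    interface Listener {
        String onEvent();
    }

    static Listener makeListener(String label) {
        // A local variable that the anonymous class below will read.
        final String message = "clicked " + label;
        return new Listener() {
            @Override
            public String onEvent() {
                return message;   // access to the enclosing local variable
            }
        };
    }

    public static void main(String[] args) {
        System.out.println(makeListener("Delete").onEvent());
    }
}
```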
ActionListener on the Start/Next button
Listing 14 shows the beginning of the registration of an anonymous ActionListener object on the Start/Next button shown in Figure 1.
startButton.addActionListener( |
Retrieve and screen messages for SPAM
As mentioned earlier, the POP3 session is now in the TRANSACTION state. The code in Listing 15 begins the process of retrieving all the messages currently on the server and screening those messages for SPAM.
The number of messages on the server
One of the first things that we need to know is how many messages are currently in the maildrop on the server. The code in Listing 15 sends a STAT command to the server to get this information.
try{ |
As the session progresses and DELE commands are sent to the server, messages are marked for deletion. Once a message is marked for deletion, it is no longer included in the count of messages on the server. Therefore, we must make certain that we obtain the number of messages on the server only at the beginning of the session.
As you will see later, the variable numberMsgs is used by the program to count the number of messages that have been processed. Since we must retrieve the number of messages on the server only once at the beginning of the session, we execute this code only when the value of numberMsgs is zero.
Issue a STAT command
The code in Listing 15 begins by issuing a STAT command, and then getting, validating, and saving the response. Here is part of what the technical document has to say about the response to the STAT command.
"The POP3 server issues a positive response with a line containing information for the maildrop. This line is called a "drop listing" for that maildrop.
In order to simplify parsing, all POP3 servers required to use a certain format for drop listings. The positive response consists of "+OK" followed by a single space, the number of messages in the maildrop, a single space, and the size of the maildrop in octets."
Get number of messages as a String
Having saved the response to the STAT command, the code in Listing 15 extracts a substring from that string containing the number of messages as a String.
Convert the String to an int
Then the code in Listing 15 invokes the parseInt method of the Integer class to convert the string representing the number of messages to an int.
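The two steps just described (extract the count as a String, then convert it to an int) can be sketched as follows. This is a minimal illustration and not the code from Listing 15. The class name StatParser and the method name parseMessageCount are hypothetical, and the sample response "+OK 2 320" simply follows the drop listing format quoted above.

```java
final class StatParser {

    /**
     * Extracts the message count from a positive STAT response such as
     * "+OK 2 320". Throws IllegalArgumentException if the response is
     * not a well-formed drop listing.
     */
    static int parseMessageCount(String response) {
        if (response == null || !response.startsWith("+OK ")) {
            throw new IllegalArgumentException(
                "Not a positive STAT response: " + response);
        }
        // Fields are separated by single spaces:
        // "+OK", number of messages, size of the maildrop in octets.
        String[] fields = response.trim().split(" ");
        if (fields.length < 3) {
            throw new IllegalArgumentException(
                "Malformed drop listing: " + response);
        }
        // NumberFormatException is a subclass of
        // IllegalArgumentException, so a non-numeric count is
        // reported the same way as the other failures.
        return Integer.parseInt(fields[1]);
    }
}
```

Splitting on the single space avoids having to compute substring offsets by hand, which is convenient because the number of digits in the count varies.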
Referring to a message by its number
Later we will see that messages can be referred to by their message number.
(Note that message numbers begin with 1 and not with 0.)
Retrieve and screen each message
The next step is to retrieve each message from the server and to screen it for SPAM. Basically this consists of:
- Retrieving each message from the server
- Writing that message into a local disk file
- Passing the disk file to a method belonging to an object of the Screen class where it is screened for SPAM
Get the unique ID
Each message is stored on the server with a unique ID. The unique ID for the message is retrieved first and is used to create a unique file name for storing the message in a local disk file.
Note that the msgCounter variable was initialized to 0 when it was declared in Listing 1. We will see later that this value is incremented each time a new message is processed. Because the message numbers start with 1 instead of 0, msgNumber must always be one greater than msgCounter.
The unique ID for a message is obtained from the server by issuing a UIDL command and saving the response. Listing 16 shows the code used to get, validate, and save the unique ID for the next message.
msgNumber = msgCounter + 1; |
Here is some of what the technical document has to say about the UIDL command:
"Arguments: a message-number (optionally). If a message-number is given, it may NOT refer to a message marked as deleted.
Restrictions: may only be given in the TRANSACTION state.
Discussion: If an argument was given and the POP3 server issues a positive response with a line containing information for that message. This line is called a "unique-id listing" for that message. ... A unique-id listing consists of the message-number of the message, followed by a single space and the unique-id of the message."
No need to parse the response
In this case, I will use the entire response string as a file name and therefore I won't be concerned about parsing the response.
(I'm also not interested in the response produced when the UIDL command is issued without a message number because this program never issues the command without a message number.)
A possible safety upgrade
While writing this lesson, it has occurred to me that a useful safety upgrade would be to:
- Parse the response to the UIDL command
- Extract and save the message number
- Compare that value with the value of msgNumber being maintained internally by this program before sending a DELE command to the server
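The three steps of this safety upgrade can be sketched as shown below. This is only an illustration of the idea and is not part of the program in this lesson. The names UidlCheck, Listing, parse, and safeToDelete are hypothetical, and the sketch assumes a positive single-line response of the form "+OK", message number, unique ID, separated by single spaces.

```java
final class UidlCheck {

    /** Holds the two fields of a unique-id listing. */
    static final class Listing {
        final int messageNumber;
        final String uniqueId;

        Listing(int messageNumber, String uniqueId) {
            this.messageNumber = messageNumber;
            this.uniqueId = uniqueId;
        }
    }

    /** Parses a positive response such as "+OK 2 QhdPYR:00WBw1Ph7x7". */
    static Listing parse(String response) {
        String[] fields = response.trim().split(" ");
        if (fields.length != 3 || !fields[0].equals("+OK")) {
            throw new IllegalArgumentException(
                "Not a unique-id listing: " + response);
        }
        return new Listing(Integer.parseInt(fields[1]), fields[2]);
    }

    /**
     * Returns true only if the message number reported by the server
     * matches the number that the client believes it is working on.
     */
    static boolean safeToDelete(String uidlResponse, int msgNumber) {
        return parse(uidlResponse).messageNumber == msgNumber;
    }
}
```

A caller would invoke safeToDelete with the saved UIDL response and the current value of msgNumber, and would send the DELE command only when the method returns true.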
Open an output file
The code in Listing 17 uses the unique ID to open an output file in which to save the message.
String fileName = |
(You may want to modify this code to cause the messages to be stored in a different location on the disk. If so, modify the string shown in blue in Listing 17. Make certain that the folder where you plan to save the files exists before running the program.)
The code in Listing 17 is straightforward and shouldn't require further explanation. If you are unfamiliar with code like this, see the tutorials on file I/O at www.DickBaldwin.com.
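One thing to keep in mind when building a file name from a server response is that the POP3 specification allows a unique ID to contain any printable ASCII character (0x21 through 0x7E). Some of those characters, such as a colon or a slash, are rejected in file names on some operating systems. The following sketch shows one possible way to guard against that. It is not what Listing 17 does. The class name MailFileNames and the method name toFileName are hypothetical, and the choice to replace every character other than a letter, digit, hyphen, underscore, or period with an underscore is an assumption made for illustration.

```java
final class MailFileNames {

    /**
     * Builds a file name from a folder and a server-supplied ID,
     * replacing any character that is not a letter, digit, '-', '_',
     * or '.' with an underscore.
     */
    static String toFileName(String folder, String uidl) {
        StringBuilder safe = new StringBuilder();
        for (char c : uidl.toCharArray()) {
            boolean ok = Character.isLetterOrDigit(c)
                || c == '-' || c == '_' || c == '.';
            safe.append(ok ? c : '_');
        }
        return folder + safe + ".txt";
    }
}
```

Note that two different IDs can map to the same file name under this scheme (for example, IDs that differ only in a replaced character), so a production version might need an additional step to keep the names unique.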
Begin the message retrieval process
Listing 18 issues a RETR command to begin the message retrieval process, and then validates the response.
outputStream.println( |
Response to the RETR command
Figure 7 shows a typical response produced by my Email server to the receipt of a RETR command.
+OK 1818 octets Figure 7 |
The RETR command
Here is some of what the technical document has to say about the RETR command:
"Arguments: a message-number (required) which may not refer to a message marked as deleted.
Discussion: If the POP3 server issues a positive response, then the response given is multi-line. After the initial +OK, the POP3 server sends the message corresponding to the given message-number, being careful to byte-stuff the termination character (as with all multi-line responses)."
What is meant by byte-stuffing?
Here is part of what the technical document has to say about multi-line responses and byte-stuffing.
"Responses to certain commands are multi-line. In these cases, ... after sending the first line of the response and a CRLF, any additional lines are sent, each terminated by a CRLF pair. When all lines of the response have been sent, a final line is sent, consisting of a termination octet (decimal code 046, ".") and a CRLF pair. If any line of the multi-line response begins with the termination octet, the line is "byte-stuffed" by pre-pending the termination octet to that line of the response."
In other words, a message is terminated by a line that has a period as the first character followed immediately by a CRLF pair. If a normal line begins with a period, byte-stuffing is used to deal with that situation.
Didn't strip any bytes
In the event that a line in the message begins with a period, then it will begin with two periods after byte-stuffing takes place on the server.
Since having two periods at the beginning of the line is unlikely to have a detrimental impact on the screening process, I didn't bother to strip any bytes that may have been prepended onto the line by the server during byte-stuffing.
However, you may want to upgrade the program to cause it to deal more correctly with this situation if you consider it to be a problem.
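If you do decide to strip the extra period, the logic is small. The following sketch shows one way to do it, under the assumption that each line has already been read without its CRLF terminator (as the readLine method of BufferedReader delivers it). The class name ByteStuffing and the method names isTerminator and unstuff are hypothetical and do not appear in the program.

```java
final class ByteStuffing {

    /** True for the line that ends a multi-line response. */
    static boolean isTerminator(String line) {
        return line.equals(".");
    }

    /**
     * Removes the period that the server prepended during
     * byte-stuffing. A line that begins with a period and contains
     * additional characters had its first period added by the server,
     * so that first character is dropped. All other lines are
     * returned unchanged.
     */
    static String unstuff(String line) {
        if (line.startsWith(".") && line.length() > 1) {
            return line.substring(1);
        }
        return line;
    }
}
```

The terminator test must be applied before the call to unstuff, because a lone period marks the end of the message and is not part of the message text.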
Clear the text area
The code in Listing 19 clears the text area at the beginning of each message. If you don't do this, the string contained in the text area will become very long and the program will run slowly as a result.
textArea.setText(""); |
The code in Listing 20 reads the first line of the message from the server. Then it invokes the method named insertStars to insert asterisks at ten-character intervals in the text.
//Read first line of message |
The insertStars method
At this point, I will set the discussion of the constructor aside and present the method named insertStars, which is shown in Listing 21.
The code in this method is straightforward and should not require further explanation.
private String insertStars(String stringIn){ |
Returning now to the discussion of the constructor, the code in Listing 22 continues reading lines of text from the server, inserting stars, and writing those lines of text into the output file until a line is received that contains a single period.
while(!(msgLine.equals("."))){ |
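The general shape of a loop that copies a multi-line response into a file until the terminating period arrives can be sketched as follows. This is a simplified illustration and not a reproduction of Listing 22: it omits the call to insertStars, the class name MessageCopier and the method name copyMessage are hypothetical, and the check for a null line (which readLine returns if the connection closes early) is an addition made for this sketch.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.PrintWriter;

final class MessageCopier {

    /**
     * Copies lines from the server to the output until a line
     * consisting of a single period is read. Returns the number of
     * lines copied, not counting the terminating line.
     */
    static int copyMessage(BufferedReader fromServer, PrintWriter toFile)
            throws IOException {
        int count = 0;
        String line = fromServer.readLine();
        while (line != null && !line.equals(".")) {
            toFile.println(line);
            count++;
            line = fromServer.readLine();
        }
        if (line == null) {
            // The stream ended before the terminating period arrived.
            throw new IOException(
                "Connection closed before end of message");
        }
        return count;
    }
}
```

Testing for null before calling equals matters because readLine returns null at end of stream, and invoking a method on a null reference would throw a NullPointerException.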
Display messages for the user
It is almost time to pass the file containing the message to the screening method to allow it to screen for SPAM. Before doing that, however, the code in Listing 23 writes messages in the text fields and text area of Figure 1 to let the user know what is happening.
fromField.setText("Call screener"); |
Occasionally a very long message is received that requires a perceptible amount of time for screening. When that happens (with the version of the Screen class that will be discussed in the next lesson), the screening method writes a stream of periods into the text area to let the user know that the system is actually working on a message and isn't simply hung up. Hence the words "Progress Meter" are placed in the text area in Listing 23 to tell the user what that stream of periods indicates.
(The stripped-down version of the Screen class that I will discuss in this lesson does not provide this type of visual feedback.)
Information from the screening method
Several different pieces of information need to be returned from the screening method. However, in Java, a method can return only one value. To accommodate this, an empty object instantiated from the ScreenResult class is passed as a parameter to the screening method. The code in the screening method populates the fields in that object so as to make the information available upon return.
The ScreenResult class
At this point, I will set the discussion of the constructor aside and show you the ScreenResult class in Listing 24.
class ScreenResult{ |
Screen the file for SPAM
Returning now to the constructor, the code in Listing 25:
- Declares a local variable named match and initializes it to false.
- Instantiates a new empty object of the ScreenResult class.
- Invokes the screenMsg method belonging to an object of the Screen class, passing the name of the disk file containing the message, the unique identifier for the message, and the empty ScreenResult object as parameters, and storing the returned value in the variable named match.
boolean match = false; |
Frequently when I write a lesson explaining code that I have written, I realize that there are sections of code that I would write differently if I had it to do over again. That is the case here.
In this case, if I were to rewrite this program, I would upgrade the definition of the ScreenResult class to include an additional field of type boolean named match.
Then I would require the screenMsg method of the Screen class to return a reference to a populated object of type ScreenResult instead of returning type boolean. I would eliminate the ScreenResult parameter from the parameter list of the screenMsg method.
Then I would cause the code in the calling method to accommodate those changes and to extract the value of match from the object returned by the screenMsg instead of dealing with match separately as is the case in Listing 25.
In my opinion, this would result in a somewhat cleaner programming interface. However, at this point, I am too far down the road to turn back, so I will just leave the program as it is. I may upgrade it sometime in the future to implement this improvement.
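The redesign described in the preceding paragraphs might look something like the following sketch. It is only an illustration of the idea, not code from the program. The class names ScreenResultV2 and ScreenV2 are hypothetical, the field names from, subject, and text follow those used elsewhere in this lesson, and the body of screenMsg is a placeholder that performs no screening.

```java
/** Result object that also carries the screening recommendation. */
final class ScreenResultV2 {
    String from = "";
    String subject = "";
    String text = "";
    // True when the screener recommends deleting the message.
    boolean match = false;
}

final class ScreenV2 {

    /**
     * Returns a populated result instead of a boolean, so the caller
     * no longer has to create an empty result object and pass it in.
     */
    ScreenResultV2 screenMsg(String fileName, String uidl) {
        ScreenResultV2 result = new ScreenResultV2();
        // Placeholder only: a real screener would read fileName here
        // and set the fields from the contents of the message.
        result.subject = "No Subj line found";
        result.text = "Message file: " + fileName;
        result.match = false;
        return result;
    }
}
```

With this arrangement, the calling code would write something like `ScreenResultV2 r = screener.screenMsg(fileName, uidl);` and then test `r.match`, so that all of the information produced by the screening method arrives in one object.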
Designing your own SPAM screening algorithm
Should you decide to design your own screening algorithm, this is where you would connect your algorithm to the communication module. In other words, your version of the method named screenMsg should return true if it is recommending that the message be deleted from the server. Also, the object of type ScreenResult passed as a parameter to the method should be populated with information to be displayed in the text fields and the text area of the GUI shown in Figure 1.
You may or may not decide to make callbacks on the communication module to support the progress indicator while your method is working.
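To make the connection point more concrete, here is a minimal sketch of a screening class that follows the contract just described: it returns true to recommend deletion and populates a result object for display. Everything in it is an assumption made for illustration. The class names DemoResult and PhraseScreen, the offender field, and the simple phrase-matching rule are all hypothetical, and the real ScreenResult class may have different fields. The uidl parameter is accepted only to mirror the parameter list described earlier and is not used.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Stand-in for the result object (hypothetical field names). */
class DemoResult {
    String from = "";
    String subject = "";
    String text = "";
    String offender = "";
}

final class PhraseScreen {
    private final List<String> phrases = new ArrayList<>();

    PhraseScreen(List<String> phrases) {
        // Store the phrases in lower case for case-insensitive matching.
        for (String p : phrases) {
            this.phrases.add(p.toLowerCase(Locale.ROOT));
        }
    }

    /** Returns true if any phrase appears in the message file. */
    boolean screenMsg(String fileName, String uidl, DemoResult theResult) {
        StringBuilder seen = new StringBuilder();
        // ISO-8859-1 maps every byte to a character, so arbitrary
        // message bytes cannot cause a decoding error.
        try (BufferedReader in = Files.newBufferedReader(
                Paths.get(fileName), StandardCharsets.ISO_8859_1)) {
            String line;
            while ((line = in.readLine()) != null) {
                seen.append(line).append('\n');
                String lower = line.toLowerCase(Locale.ROOT);
                for (String p : phrases) {
                    if (lower.contains(p)) {
                        // Report the text down to the offending line.
                        theResult.text = seen.toString();
                        theResult.offender = p;
                        return true;
                    }
                }
            }
        } catch (IOException e) {
            theResult.text = "Could not read " + fileName;
            return false;
        }
        theResult.text = seen.toString();
        return false;
    }
}
```

A fixed phrase list is far too crude for real use, but it shows where the decision is made and how the result object carries the supporting information back to the communications module.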
Display the results of the screening process
Listing 26 displays the information that was encapsulated in the ScreenResult object by the screening method in the text fields and text area of Figure 1.
fromField.setText(theResult.from); |
Information available to the user
At this point, the user can view:
- The contents of the From line of the message
- The contents of the Subject line of the message
- The complete raw text of the message down to the line containing the offending word or phrase, if any
- The offending word or phrase, if any
Increment the message counter
Listing 27 increments the message counter in preparation for processing the next message.
msgCounter++; |
A return value of true from the screenMsg method means that the screening method is recommending that the message be deleted from the server.
Listing 28 shows the behavior of the actionPerformed method registered on the Start/Next button under this circumstance.
if(match == true){ |
The message has been identified as a candidate for deletion from the server. The actionPerformed method simply returns with the information described above showing in the text fields and text area of Figure 1. The user can view this information while deciding what to do next. Nothing further will happen in the program until the user presses either the Delete button or the Start/Next button.
Pressing the Delete button
If the user presses the Delete button in Figure 1, the message will be deleted from the server. I will explain exactly how this happens later when I discuss the ActionListener object that will be registered on the Delete button.
Pressing the Start/Next button
If the user presses the Start/Next button in Figure 1, the message will not be deleted from the server, the actionPerformed method belonging to the ActionListener object registered on that button will be executed, and the next message on the server will be retrieved and screened for SPAM.
Message is not a candidate for deletion
If the screenMsg method returns false, the message has not been identified as a candidate for deletion, and control reaches the point in the actionPerformed method shown in Listing 29.
Toolkit.getDefaultToolkit(). |
Firing a synthetic event
The code in Listing 29 fires an ActionEvent identical to that which would be fired if the user were to press the Start/Next button. This causes the program to retrieve the next message on the server and to begin the screening process immediately.
(If you are unfamiliar with the concept of posting events in the system event queue, you can learn about that in the tutorial lessons at www.DickBaldwin.com.)
When all messages have been screened ...
Listing 30 shows the completion of the registration of an anonymous ActionListener object on the Start/Next button that was begun in Listing 14.
else{//msgNumber > numberMsgs |
This code disables the Start/Next button and posts messages instructing the user to press the close button to terminate the program.
Beyond that, the code in Listing 30 simply completes a try/catch block, and wraps up the cryptic code required for the definition of an anonymous class.
An ActionListener on the Delete button
The Delete button shown in Figure 1 is used to cause messages to be deleted from the server. Listing 31 shows the beginning of the registration of an anonymous ActionListener object on the Delete button.
deleteButton.addActionListener( |
Marking messages for deletion from the server
Deletion of a message from the server is accomplished by marking the message for deletion while in the TRANSACTION state. The message is actually deleted later when the client sends a QUIT command to the server causing the server to enter the UPDATE state.
(If the program aborts prematurely before sending a QUIT command, marked messages are not deleted from the server.)
The deletion code
Listing 32 shows the code used to:
- Mark the message for deletion
- Validate the response
- Display a deletion message
outputStream.println( |
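The pattern of sending a command and validating the response, which is used for DELE and for the other commands in this lesson, can be captured in a small helper. The following is a sketch under stated assumptions and is not the code from Listing 32. The class name Pop3Commands and the method name sendAndValidate are hypothetical. Unlike println, which uses the line separator of the local platform, this sketch writes an explicit CRLF pair because the POP3 specification terminates commands with CRLF.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.PrintWriter;

final class Pop3Commands {

    /**
     * Sends one command, reads the single-line response, and returns
     * it if it is positive. Throws IOException if the response is
     * missing or does not begin with "+OK".
     */
    static String sendAndValidate(PrintWriter toServer,
                                  BufferedReader fromServer,
                                  String command) throws IOException {
        toServer.print(command + "\r\n");
        toServer.flush();
        String response = fromServer.readLine();
        if (response == null || !response.startsWith("+OK")) {
            throw new IOException(
                "Command '" + command + "' failed: " + response);
        }
        return response;
    }
}
```

A call such as `sendAndValidate(out, in, "DELE " + msgNumber)` would mark a message for deletion. As explained above, the message is not actually removed until the session ends with a QUIT command.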
(See the earlier section entitled A possible safety upgrade for a suggestion related to upgrading this program.)
The DELE code has been temporarily disabled
Note that the three corresponding statements in Listing 42 near the end of the lesson have been disabled by marking them as comments. I did this to keep you from accidentally deleting messages from your server during your early stages of testing this program with your Email server.
You can enable the three statements in Listing 42 by removing the comment indicators. However, you should not enable them until you are confident that you really do want to delete messages from the server.
(Once a message is deleted from the server, it cannot be recovered from the server.)
A synthetic ActionEvent
The code in Listing 33 fires a synthetic ActionEvent identical to that which would be fired if the user presses the Start/Next button.
Toolkit.getDefaultToolkit(). |
Finish configuring the GUI
The code in Listing 34 finishes configuring the GUI by placing the various components in the Frame, setting its size, and making it visible.
add(startButton); |
That completes the discussion of the class named Pop302.
Stripped-down Screen class
The following sections provide a brief discussion of a stripped-down version of the class named Screen, which you can use to test this program on your system with your Email server.
This stripped-down version of the Screen class doesn't actually do any SPAM screening. Rather, it populates the ScreenResult object with information from the message and toggles its return value between true and false for each successive message.
My full version of the Screen class implements my SPAM screening algorithm. I will explain the details of my full Screen class in the next lesson in this series.
A dummy constructor
The definition of the stripped-down Screen class begins in Listing 35.
class Screen{ |
The screenMsg method
The code in Listing 25 invokes the screenMsg method of an object of the Screen class for the purpose of applying SPAM screening rules to a message stored in a disk file.
The definition of the stripped-down screenMsg method begins in Listing 36.
public boolean screenMsg(String fileName, |
Initialize the ScreenResult object
The code in Listing 37 populates three of the fields in the ScreenResult object received as an incoming parameter. Two of these fields are populated with messages that will be overwritten later if Subject and From data is successfully extracted from the file containing the message.
theResult.subject = "No Subj line found"; |
Get the Subject data
Without getting into the details, the code in Listing 38 attempts to extract a text line from the message that begins with "Subject:". If successful, the data is used to overwrite the contents of the subject field of the ScreenResult object.
String data; |
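For readers who want to see the general idea without the details of Listing 38, the following sketch shows one simple way to find a header line by name. It is not the author's code. The class name HeaderFinder and the method name findHeader are hypothetical, and the sketch assumes that header lines are stored in the file unmodified, that each header occupies a single line (folded headers are not handled), and that the first blank line marks the end of the headers.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.util.Locale;

final class HeaderFinder {

    /**
     * Returns the value of the first header with the given name, or
     * the fallback string if no such header precedes the first blank
     * line. The name is matched without regard to case.
     */
    static String findHeader(BufferedReader in, String name,
                             String fallback) throws IOException {
        String prefix = name.toLowerCase(Locale.ROOT) + ":";
        String line;
        while ((line = in.readLine()) != null) {
            if (line.isEmpty()) {
                break; // A blank line ends the header section.
            }
            if (line.toLowerCase(Locale.ROOT).startsWith(prefix)) {
                return line.substring(prefix.length()).trim();
            }
        }
        return fallback;
    }
}
```

Passing a default such as "No Subj line found" as the fallback gives the same effect as populating the field first and overwriting it only when the header is found.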
Similarly, the code in Listing 39 attempts to extract a text line from an upper-case version of the message that begins with "From:". If successful, the data is used to overwrite the contents of the from field of the ScreenResult object.
inData.reset(); |
The code in Listing 40 attempts to read the entire message and deposit it in the text field of the ScreenResult object.
inData.reset(); |
Finally, the code in Listing 41 returns a boolean value. This value toggles between true and false as each successive message is processed. Therefore, it has no meaning insofar as SPAM is concerned.
Notice: With this stripped-down version, a true return value should not be taken to mean that you should delete the message from the server.
if(returnValue == false){ |
If the return value is true
If the return value is true, the actionPerformed method will return immediately in Listing 28, allowing the user to ponder the data returned by the screenMsg method in deciding whether or not to delete the message from the server.
Once again, let me caution you not to enable the DELE code in Listing 42 near the end of the lesson until you are certain that you actually want to delete messages from the server. If you do enable it, do not press the Delete button just because this stripped-down version of the screenMsg method returns true.
If the return value is false
If the screenMsg method returns false, the code in Listing 29 immediately fires a synthetic ActionEvent, attributable to the Start/Next button, which causes the next message to be retrieved from the server.
Run the Program
I encourage you to copy the code from Listing 42 into your text editor. Compile and execute the program. Experiment with it, making changes, and observing the results of your changes.
You may want to modify this code to cause the messages to be stored in a different location on your disk. If so, modify the string in the statement in Listing 17 that reads "c:/MailFiles/" + uidl + ".txt" to specify a different folder. Make certain that the folder where you plan to save the files exists before running the program.
(Once again, let me caution you not to enable the DELE code in Listing 42 until you are certain that you actually want to delete messages from the server. Once a message is deleted from the server, there is no way to recover it from the server.)
Summary
This lesson explains the communications module used to communicate with your Email server, and to remove SPAM messages from the server before they are downloaded into your primary Email client.
The program is designed to allow you to use my SPAM screening algorithm, or to invent your own. I will present the details of my SPAM screening algorithm in the next lesson in the series.
The version of the program discussed in this lesson has a stripped-down version of a class named Screen. This version of the program makes it possible for you to test the communications module on your system with your Email server without doing any actual screening for SPAM.
The capability to actually delete messages from the server is disabled in the version of the program shown in Listing 42. You should not enable that capability until you fully understand what you are doing and you are certain that you really do want to delete messages from the server. Once a message is deleted from the server, it cannot be recovered from the server.
What's Next?
In the next lesson, I will present and explain my version of the class named Screen. This class contains my version of a SPAM screening algorithm. You may want to use my version, replace my version with an algorithm of your own, or do some combination of the two.
Complete Program Listing
Also, the three DELE statements shown in red in Listing 42 have been purposely disabled to prevent you from accidentally deleting messages from your server while testing this program.
Do not enable these three statements until you are ready to actually delete messages from the server. Once a message is deleted from the server, it cannot be recovered from the server.
Disclaimer of responsibility: If you elect to use this program you use it at your own risk. Make absolutely certain that you understand what you are doing before you execute the program. The author of this program, Richard G. Baldwin, accepts no responsibility for any losses that you may incur as a result of using this program.
/*File Pop302.java Copyright 2004, R.G.Baldwin |
Copyright 2004, Richard G. Baldwin. Reproduction in whole or in part in any form or medium without express written permission from Richard Baldwin is prohibited.
About the author
Richard Baldwin is a college professor (at Austin Community College in Austin, TX) and private consultant whose primary focus is a combination of Java, C#, and XML. In addition to the many platform and/or language independent benefits of Java and C# applications, he believes that a combination of Java, C#, and XML will become the primary driving force in the delivery of structured information on the Web.
Richard has participated in numerous consulting projects, and he frequently provides onsite training at the high-tech companies located in and around Austin, Texas. He is the author of Baldwin's Programming Tutorials, which has gained a worldwide following among experienced and aspiring programmers. He has also published articles in JavaPro magazine.
Richard holds an MSEE degree from Southern Methodist University and has many years of experience in the application of computer technology to real-world problems.
-end-
This article was originally published on January 6, 2004