Using Java to Clean Up Your Bookmark Library

Java Programming Notes # 2410


Preface

Many of us who have been using browsers on the web for many years have
accumulated vast bookmark libraries containing many broken bookmarks.  In my
own case, before I embarked on my bookmark cleanup campaign, I had accumulated more
than 5,200 bookmarks, many of which had probably been broken for years.

In this lesson, I will show you how to write a program that will identify
potentially broken bookmarks so that you can either delete them from your library
or repair them.  The program works for Firefox and Netscape bookmark
libraries as well as Internet Explorer Favorites libraries.

Viewing tip

You may find it useful to open another copy of this lesson in a
separate browser window.  That will make it easier for you to
scroll back and forth among the different listings and figures while
you are reading about them.

Supplementary material

I recommend that you also study the other lessons in my extensive
collection of online Java tutorials.  You will find those lessons
published at Gamelan.com.
However, as of the date of this writing, Gamelan doesn’t maintain a
consolidated index of my Java tutorial lessons, and sometimes they are
difficult to locate there.  You will find a consolidated index at www.DickBaldwin.com.

General Background Information

A Firefox bookmark library

Firefox and Netscape use the same technique for creating and maintaining a
bookmark library.  In particular, by default, the bookmarks are stored in a
file named bookmarks.html that you will find somewhere on your hard disk
in an area that is dedicated to the browser.

(Internet Explorer, on the other hand, uses a completely different
approach to creating and maintaining its library of Favorites.  This
program is compatible with the approaches used by all three browsers.)
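Because bookmarks.html is ordinary HTML, each bookmark in it typically appears as an anchor element on a line such as `<DT><A HREF="..." ADD_DATE="...">Name</A>`. The following is a minimal sketch, not the method used by the program in this lesson, that pulls the name and the URL out of one such line with a regular expression. It assumes one anchor per line and does not decode HTML entities such as `&amp;` in the name; the class name FirefoxLineParser is hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: extracts {name, url} from one line of a
// Firefox/Netscape bookmarks.html file. Assumes one anchor per line.
class FirefoxLineParser {
    // Matches <A HREF="url" ...other attributes...>name</A>, ignoring case.
    private static final Pattern ANCHOR = Pattern.compile(
            "<A\\s+HREF=\"([^\"]*)\"[^>]*>(.*?)</A>",
            Pattern.CASE_INSENSITIVE);

    // Returns {name, url}, or null when the line holds no bookmark anchor
    // (for example, a folder heading or a list delimiter).
    static String[] parseLine(String line) {
        Matcher m = ANCHOR.matcher(line);
        if (!m.find()) {
            return null;
        }
        return new String[] { m.group(2), m.group(1) };
    }

    public static void main(String[] args) {
        String[] bm = parseLine("<DT><A HREF=\"http://www.dickbaldwin.com/\""
                + " ADD_DATE=\"1\">Dick Baldwin</A>");
        System.out.println(bm[0] + " -> " + bm[1]);
    }
}
```

Folder headings in the same file use `<H3>` elements rather than anchors, so this pattern skips them.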

Path to the Firefox bookmark file

For example, here is the path to the Firefox bookmark file on my computer
running under Windows XP:

C:\Documents and Settings\Owner\Application Data\Mozilla\Firefox\Profiles\athy94h2.default\bookmarks.html

Note that by default everything in and beyond the folder named
Application Data is hidden.  You must select "Show hidden files and
folders" under Folder Options in order to be able to see the bookmark
file.  Depending on your operating system, your bookmark file may or may
not be similarly located on your hard disk.

Also note that the folder named athy94h2.default appears to be a
random folder name that is established when you install Firefox.

A browser view of a Firefox bookmark file


Figure 2

IE Favorites

As mentioned earlier, the approach that Microsoft uses to create and maintain
the IE Favorites library is entirely different from the approach used by
Firefox and Netscape.  The IE Favorites library is simply a directory tree structure rooted
in a Windows folder at a location similar to the following:

C:\Documents and Settings\Owner\Favorites

Each Favorite item (bookmark) is stored in a separate text file having an extension of .url.

(The Microsoft properties dialog refers to these files as Internet
Shortcut files.)

The name and the URL for the bookmark

The name of the bookmark is the name of the Internet Shortcut file.

The URL for the bookmark along with some other information is stored in the
Internet Shortcut file.
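An Internet Shortcut file is plain INI-style text. Files of this kind usually contain an [InternetShortcut] section with a line of the form URL=http://..., sometimes preceded by other sections. The sketch below assumes that layout; it returns the URL from the [InternetShortcut] section and derives the bookmark name from the file name, as described above. The class and method names are hypothetical.

```java
// Hypothetical helper for reading IE Internet Shortcut (.url) files.
// Assumes INI-style text with an [InternetShortcut] section holding URL=...
class InternetShortcutReader {
    // Returns the URL= value from the [InternetShortcut] section, or null.
    static String extractUrl(String contents) {
        boolean inShortcutSection = false;
        for (String rawLine : contents.split("\\r?\\n")) {
            String line = rawLine.trim();
            if (line.startsWith("[")) {
                // A new section begins; note whether it is the one we want.
                inShortcutSection = line.equalsIgnoreCase("[InternetShortcut]");
            } else if (inShortcutSection
                    && line.regionMatches(true, 0, "URL=", 0, 4)) {
                return line.substring(4);
            }
        }
        return null;
    }

    // The bookmark name is the file name without its .url extension.
    static String bookmarkName(String fileName) {
        boolean hasExt = fileName.regionMatches(
                true, fileName.length() - 4, ".url", 0, 4);
        return hasExt ? fileName.substring(0, fileName.length() - 4) : fileName;
    }
}
```

Restricting the search to the [InternetShortcut] section keeps a line such as BASEURL= in another section from being mistaken for the bookmark's URL.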

Bookmark library structure

Folders in the IE Favorites library are created by creating ordinary Windows
folders as children, grandchildren, etc., of the folder named Favorites.

The Windows Explorer view

I’m going to show you three views of the IE Favorites, which unfortunately
bear little resemblance to one another.  Figure 3 shows a screen shot of an
ordinary Windows Explorer window in which the files have been sorted
according to the Name by clicking the sorting bar at the
top.


Figure 3

With the exception of the file named aacmd.bat, each of the files in
Figure 3 represents an item in the Favorites library (a bookmark).  There
are, in addition, other bookmarks in the folders named Adobe Studio,
HP Recommended Sites, Links, and Media.  The order of the
files and the folders in the view shown in Figure 3 depends on which of the
sorting bars at the top has most recently been clicked.

Connecting to a server via an Internet Shortcut file

Double-clicking one of the Internet Shortcut files shown in
Figure 3 will
cause the default browser to attempt to connect to the server whose URL
is contained in the Internet Shortcut file.

The IE Favorites view

The view shown in Figure 4 is the view taken from inside the IE browser after
having clicked the button with the large gold star near
the top.


Figure 4

The order is controlled by the user

As you can see, the order of the items in Figure 4 doesn’t match the order of
the items in Figure 3.  In fact, the user can change the order of the items in
Figure 4 by selecting an item and dragging it up or down to a new
location.  The user can also change the order of the items by clicking the
button labeled Organize and making use of tools that are found there
(see Figure 12).  Note, however, that neither of these approaches to
rearranging the items in this view has any effect on the order of the actual
files in the folder.

This ability to rearrange the items makes the Favorites library easier to
use, but, as you will see later, it also makes it more difficult to clean up
the library by deleting or repairing broken links.

The view with the most natural order

The view that shows the Favorites items in the most natural order is the view
shown in Figure 5.  This view is the result of opening a command window and
executing a DIR command in the Favorites folder.

In Figure 5, you can see
the names of the individual files having an extension of .url.  These are
the Internet Shortcut files.  The names of these files match the names of
the Favorites items that appear in the view shown in Figure 4.


Figure 5

Will use this order

The order of the Internet Shortcut files shown in Figure 5 matches the
processing order of the program that I will explain later.  The program
processes the Favorites directory listing recursively.  Thus in the case
shown in Figure 5, the program begins by processing the following three
files having the .url extension in the order shown:

  • .NET Development.url
  • .NET Framework Home Page.url
  • ACC WebMail Login,baldwin,ACC Email Pwd…

Then the program makes a recursive call and processes all of the files in
the directory named Adobe Studio.

Once all the files in the directory named Adobe Studio have been processed
(along with the files in its sub-directories, if any), the program
returns to the level shown in Figure 5 and processes the file named
Antivirus daily download.url.  It continues processing files in the order
shown until it encounters the directory named HP Recommended Sites.
At that point, it makes a recursive call to process the files in that
directory and its sub-directories.
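The processing order just described is a depth-first traversal in which each sub-directory is handled completely at the point where it appears in the listing. Here is a minimal sketch of that traversal, not the program's getIEBookmarks method. Because java.io.File.listFiles() does not promise any particular order, the sketch sorts entries by name, ignoring case; that is an assumption meant to approximate the DIR order in Figure 5, not a guarantee of matching it. The class name FavoritesWalker is hypothetical.

```java
import java.io.File;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch of a depth-first walk over an IE Favorites tree.
class FavoritesWalker {
    // Case-insensitive name order, an approximation of the DIR listing.
    private static final Comparator<File> BY_NAME = new Comparator<File>() {
        public int compare(File a, File b) {
            return a.getName().compareToIgnoreCase(b.getName());
        }
    };

    // Appends every .url file under dir to out. A sub-directory is
    // processed completely (recursively) where it appears in the listing.
    static void collect(File dir, List<File> out) {
        File[] entries = dir.listFiles();
        if (entries == null) {
            return; // not a directory, or it could not be read
        }
        Arrays.sort(entries, BY_NAME);
        for (File entry : entries) {
            if (entry.isDirectory()) {
                collect(entry, out);
            } else if (entry.getName().toLowerCase().endsWith(".url")) {
                out.add(entry);
            }
        }
    }
}
```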

IE Favorites can be difficult to locate

What you will see later is that even when you have identified a Favorites
item with a broken link, it can be difficult to locate it in the IE Favorites
view shown in Figure 4 in order to delete or repair the item.  As near as I
have been able to determine, that view does not provide a mechanism by which you
can search for a specified item (but perhaps I overlooked that capability).

Let me see the headers …

Generally, this program operates by attempting to contact the server
specified in the URL for each bookmark and asking that server to send back the
response header lines for the
resource specified by the URL.

(The program requests that the server send only the
response header lines and not the entire resource in order to preserve bandwidth
and improve speed.)

HTTP requests

According to Wikipedia, whenever
an HTTP client contacts an HTTP server, it can send one of the requests shown in
Figure 6.

HTTP request methods

  • GET: By far the most common method, used to request a
    specified URL.
  • HEAD: Identical to GET, except that the page content is not
    returned; just the headers are. Useful for retrieving meta-information.
  • POST: Similar to GET, except that a message body, typically
    containing key-value pairs from an HTML form submission, is included in
    the request.
  • PUT: Used for uploading files to a specified URI on a web server.
  • DELETE: Rarely implemented, deletes a resource (i.e. a file).
  • TRACE: Echoes back the received request, so that a client can
    see what intermediate servers are adding or changing in the request.
  • OPTIONS: Returns the HTTP methods that the server supports.
    This can be used to check the functionality of a web server.
  • CONNECT: Rarely implemented, for use with a proxy that can
    change to being an SSL tunnel.

HTTP servers are supposed to implement at least the GET and HEAD methods
and, whenever possible, the OPTIONS method.

Figure 6

Response header lines

When this program contacts a server, it sends a HEAD request using the HTTP
1.1 protocol, requesting that only the response header lines be returned.

(You can view request and response headers for any URL at
http://web-sniffer.net/.)
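To make the idea of a HEAD request concrete, here is a minimal sketch that builds the request text and, optionally, sends it over a plain socket. It is a hypothetical illustration, not the code in the program's tryToConnect method, which may do this differently. It assumes port 80, a bare host name with no ":port" suffix, and a 10-second read timeout; HTTP/1.1 requires the Host header, and the empty line marks the end of the request.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.net.Socket;

// Hypothetical sketch of a raw HTTP/1.1 HEAD request over a socket.
class HeadRequest {
    // Lines end in CRLF; "Connection: close" asks the server to close the
    // connection after sending the headers; the blank line ends the request.
    static String build(String server, String path) {
        return "HEAD " + path + " HTTP/1.1\r\n"
                + "Host: " + server + "\r\n"
                + "Connection: close\r\n"
                + "\r\n";
    }

    // Sends the request to port 80 and returns the first response line
    // (the status line), or null if the server sends nothing.
    static String fetchStatusLine(String server, String path)
            throws IOException {
        Socket socket = new Socket(server, 80);
        try {
            socket.setSoTimeout(10000); // give up on reads after 10 s
            Writer out = new OutputStreamWriter(
                    socket.getOutputStream(), "US-ASCII");
            out.write(build(server, path));
            out.flush();
            BufferedReader in = new BufferedReader(new InputStreamReader(
                    socket.getInputStream(), "US-ASCII"));
            return in.readLine();
        } finally {
            socket.close();
        }
    }
}
```

A call such as `HeadRequest.fetchStatusLine("www.dickbaldwin.com", "/abc")` should return a status line like the first line of Figure 7, although the server's response today may well differ.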

For example, entering HTTP://WWW.DICKBALDWIN.COM/ABC into the web sniffer
page at the address given above produced the output shown in Figure 7.

HTTP/1.1 404 Not Found
Date: Sat, 17 Sep 2005 13:56:02 GMT
Server: Apache	
Content-Length: 320
Connection: close
Content-Type: text/html; charset=iso-8859-1
Figure 7

Ignore all but the status line

This program ignores all but the first response header line, taking the
content of that line as an indication of the quality of the bookmark.

According to HTTP Made Really Easy, the initial response line, often
called the status line, has three parts separated by spaces:

  • The HTTP version
  • A response status code that gives the result of the request
  • An English reason phrase describing the status code
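Because the reason phrase may itself contain spaces, a status line should be split on its first two spaces only. The sketch below does that with String.split and a limit of 3; it also accepts a line with no reason phrase, such as the `HTTP/1.1 405` line in Figure 8. The class StatusLine is a hypothetical helper, not part of the program.

```java
// Hypothetical helper that splits a status line into its three parts.
class StatusLine {
    final String version; // e.g. "HTTP/1.1"
    final int code;       // e.g. 404
    final String reason;  // e.g. "Not Found"; empty if the server omits it

    StatusLine(String version, int code, String reason) {
        this.version = version;
        this.code = code;
        this.reason = reason;
    }

    // Splits on the first two spaces only. Throws IllegalArgumentException
    // (or its subclass NumberFormatException) for malformed input.
    static StatusLine parse(String line) {
        String[] parts = line.trim().split(" ", 3);
        if (parts.length < 2 || !parts[0].startsWith("HTTP/")) {
            throw new IllegalArgumentException("Not a status line: " + line);
        }
        int code = Integer.parseInt(parts[1]);
        String reason = (parts.length == 3) ? parts[2] : "";
        return new StatusLine(parts[0], code, reason);
    }
}
```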

Typical HTTP 1.1 status lines

Typical HTTP 1.1 status lines from different servers are shown in
Figure 8.

HTTP/1.1 200 OK
HTTP/1.1 301 Moved Permanently
HTTP/1.1 302 Moved Temporarily
HTTP/1.1 302 Found
HTTP/1.1 302 Object moved
HTTP/1.1 400 Bad Request
HTTP/1.1 401 Authorization Required
HTTP/1.1 403 Access Forbidden
HTTP/1.1 403 Invalid method
HTTP/1.1 404 Not found
HTTP/1.1 404 Object Not Found
HTTP/1.1 405 Method Not Allowed
HTTP/1.1 405
HTTP/1.1 500 Server Error
HTTP/1.1 500 Internal Server Error
HTTP/1.1 501 Method Not Implemented
HTTP/1.1 501 Method Not Supported
Figure 8

As you can see in Figure 8, the reason phrase for the same response
status code varies from one server to another.

The status code

Also according to HTTP Made Really Easy,

  • The status code is meant to be computer-readable; the reason phrase is
    meant to be human-readable, and may vary.
  • The status code is a three-digit integer, and the first digit identifies
    the general category of response:

    • 1xx indicates an informational message only
    • 2xx indicates success of some kind
    • 3xx redirects the client to another URL
    • 4xx indicates an error on the client’s part
    • 5xx indicates an error on the server’s part
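For a three-digit code, integer division by 100 yields its first digit, so the category can be selected with a single switch. The class name StatusCategory and the returned descriptions are illustrative choices for this sketch, paraphrasing the list above.

```java
// Hypothetical helper: maps a status code to the general category
// given by its first digit.
class StatusCategory {
    static String describe(int code) {
        switch (code / 100) { // first digit of a three-digit code
            case 1: return "informational message only";
            case 2: return "success of some kind";
            case 3: return "redirect to another URL";
            case 4: return "error on the client's part";
            case 5: return "error on the server's part";
            default: return "not a recognized status code";
        }
    }
}
```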

Program output

This program processes a specified bookmark library (Firefox, Netscape, or
IE) and produces seven separate reports that indicate the quality of each
bookmark in the library.

(For cases where the bookmark library is large, the user is
allowed to specify a subset of bookmarks to process based on the positional indices of
the bookmarks in the library.)

Six of the seven reports contain the status line plus additional information
about the bookmarks.  The reports are written into
text files named 000.txt through 600.txt.

Why do we need seven different reports?

The file named 000.txt contains information about every bookmark in
the subset of bookmarks being processed.

In addition, the bookmarks are partitioned into five categories based on the first character in the status code.  The
files named 100.txt through 500.txt contain information about
bookmarks where the first character in the status code matches the first
character in the file name.

(For example, only those bookmarks that produced a response status code beginning with the
character 4, indicating an error on the client’s part, are contained in the file
named 400.txt.  Furthermore, those bookmarks are not contained in
any report other than 000.txt, which contains all bookmarks.)
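The partitioning rule can be sketched as a function that lists the report files in which a bookmark's entry would appear: 000.txt always, plus the N00.txt file whose first character matches the first character of the status code. Representing a failed connection (which goes to 600.txt, described next) by a null status line is an assumption of this sketch, not how the program signals it; the class name ReportRouter is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: returns the report files in which a bookmark's
// entry would appear, given the first response header line.
class ReportRouter {
    static List<String> reportsFor(String statusLine) {
        List<String> reports = new ArrayList<String>();
        reports.add("000.txt"); // every processed bookmark goes here
        if (statusLine == null) {
            // Assumption: null stands for "an exception was thrown".
            reports.add("600.txt");
            return reports;
        }
        String[] parts = statusLine.trim().split(" ", 3);
        if (parts.length >= 2 && parts[1].length() == 3) {
            char first = parts[1].charAt(0);
            if (first >= '1' && first <= '5') {
                reports.add(String.valueOf(first) + "00.txt");
            }
        }
        return reports;
    }
}
```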

A report on exceptions

The file named 600.txt contains information about bookmarks for which
the program was unable to successfully communicate with the specified server. 
Figure 9 shows some typical examples in this category.

java.net.ConnectException: Connection timed out: connect
java.net.SocketException: Network is unreachable: connect
Figure 9
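Lines like those in Figure 9 are what you get when an exception object is converted to a string: Throwable.toString() returns the exception's class name, followed by ": " and its message when it has one. The sketch below shows that conversion and a hypothetical connection probe that returns the exception text on failure. The class name, the probe method, and the timeout parameter are illustrative assumptions, and the exact messages vary with the operating system and JDK version.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical sketch of how a failed connection attempt turns into
// text of the kind shown in Figure 9.
class ConnectionProbe {
    // Throwable.toString(): class name, then ": " and the message.
    static String describe(Exception e) {
        return e.toString();
    }

    // Tries to open a TCP connection; returns null on success or the
    // exception text on failure.
    static String probe(String host, int port, int timeoutMillis) {
        Socket socket = new Socket();
        try {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return null;
        } catch (IOException e) {
            return describe(e);
        } finally {
            try {
                socket.close();
            } catch (IOException ignored) {
                // nothing useful to do if close fails
            }
        }
    }
}
```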

Most important results for cleanup effort

Referring back to the meaning of the different status codes, it
is apparent that the contents of the files named 400.txt, 500.txt,
and 600.txt are the most important with regard to the task of identifying
and either deleting or repairing broken bookmarks.

Sample output

However, if you just want to repair an IE bookmark, it may be easier to use
the standard Windows Search tool shown in Figure 13 to
find the Internet Shortcut file that represents the bookmark of interest.


Figure 13

If you are an IE user, you are probably already aware that you activate this
search tool by clicking the button with the picture of the magnifying glass and the word Search
at the top of a standard Windows XP Explorer window.

Searching for Internet Shortcut files

To search for a specific Internet Shortcut file representing an IE bookmark, open an
Explorer window on the Favorites folder, which will probably have a path similar to the
following:

C:\Documents and Settings\Owner\Favorites

Then open the search tool and enter the name of the file (which is also
the name of the bookmark) in the search dialog that appears in the left
pane of Figure 13.  Click the Search button.  If the file exists in the
Favorites folder or one of its sub-folders, a link to the file will appear
in the right pane of Figure 13 when the search is complete.

Double-click to test the bookmark

At this point, you can double-click the link in the right pane to manually
test the bookmark that the file represents if such a test is needed.  You
can also right-click the link and select Properties to expose a dialog
that will allow you to edit the URL in order to repair it.

Deleting the file to delete the bookmark

You could also delete the file showing in the right pane of
Figure 13 to delete the bookmark.  However, I’m
not absolutely certain that is a safe thing to do.  Because Windows has the
ability to maintain the order of the bookmarks in the IE Favorites view
(Figure 4) according to the arrangement that you create by dragging the
bookmarks up and down, the Internet Shortcut files don’t exist in a vacuum. 
There is some linkage (possibly an index file) between the Internet Shortcut
files and IE.  It is possible that deleting those
files outside of IE could cause a problem with IE’s ability to manage the
bookmarks represented by those files.

(However, I frequently drag shortcuts onto the Links toolbar and
delete shortcuts from the Links toolbar with no apparent ill effects. 
The Links toolbar is apparently just another view of the Links folder shown
in Figure 4.  On the basis of that experience,
I suspect that it is probably safe to delete an Internet Shortcut file in
order to delete an IE bookmark.  However, you might want to be a little
cautious in this regard.  For example, it might be a good idea to make
certain that IE isn’t running when you delete the files.)

Program Preview

This section provides a preview of the program named Bookmarks10.

Purpose

The purpose of this program is to help you to clean up your bookmark library
by identifying potentially broken bookmarks.  The program is compatible
with bookmark libraries for the following browsers:

  • Firefox
  • Netscape
  • Internet Explorer

Processes HTTP bookmarks only

This program does not attempt to connect to secure web sites using the HTTPS
protocol, nor does it support FTP or any other protocol besides HTTP. 
If the bookmark library contains bookmarks that specify a protocol other than
HTTP, those bookmarks are not tested; the program simply notes them in the
file named 000.txt.

Methodology

The program attempts to connect to the server using the HTTP 1.1 protocol
and to retrieve the response headers from the server for each bookmark
within a specified range of bookmarks in the bookmark library.

The program uses the first line in the response header to categorize the
response into one of five categories as described at
http://www.jmarshall.com/easy/http/.

According to the source given above, the initial response line, often called the
status line, has three parts separated by spaces:

  • The HTTP version
  • A response status code that gives the result of the request
  • An English reason phrase describing the status code.

The HTTP version is in the format "HTTP/x.x".

The status code is
meant to be computer-readable.

The reason phrase is meant to be
human-readable, and may vary.

Format and meaning of the status code

The status code is a three-digit integer, and the first digit identifies the
general category of response:

  • 1xx indicates an informational message only
  • 2xx indicates success of some kind
  • 3xx redirects the client to another URL
  • 4xx indicates an error on the client’s part
  • 5xx indicates an error on the server’s part

Some typical status lines follow:

  • HTTP/1.1 200 OK
  • HTTP/1.1 301 Moved Permanently
  • HTTP/1.1 302 Moved Temporarily
  • HTTP/1.1 302 Found
  • HTTP/1.1 302 Object moved
  • HTTP/1.1 400 Bad Request
  • HTTP/1.1 401 Authorization Required
  • HTTP/1.1 403 Access Forbidden
  • HTTP/1.1 403 Invalid method
  • HTTP/1.1 404 Not found
  • HTTP/1.1 404 Object Not Found
  • HTTP/1.1 405 Method Not Allowed
  • HTTP/1.1 405
  • HTTP/1.1 500 Server Error
  • HTTP/1.1 500 Internal Server Error
  • HTTP/1.1 501 Method Not Implemented
  • HTTP/1.1 501 Method Not Supported

Note that the reason phrase does vary from one web server to another. 
Also note that I haven’t seen any status lines that show a status code in the
1xx range.

Program output

The first response header line, along with additional information about each
bookmark within the specified range, is stored in a set of output text files
named 100.txt through 500.txt.  The user can examine the
information provided in those text files to determine the quality of each
bookmark.

For those bookmarks that appear to be broken on the basis of the
web server response, the user can either delete the bookmark from the library,
or attempt to repair it.

The program produces two more output files in addition to the five
output files described above.  A file named 000.txt contains
information about every bookmark within the range of specified bookmarks. 
A file named 600.txt contains information about each bookmark for which
the program threw an exception when trying to connect to the server. 
Some sample exceptions follow:

  • java.net.UnknownHostException: www.BadBookmark.com
  • java.net.ConnectException: Connection timed out: connect
  • java.net.SocketException: Network is unreachable: connect

Program input

The following five values must be provided as command-line parameters. 
All command-line parameters are provided as strings, but must be convertible to
the types shown below.

  • String bkMrkPath:  Path to the folder containing a
    Firefox bookmark file or containing a multitude of IE .url files.
  • String bkMrkFile:  Name of the Firefox bookmark file. 
    Use a dummy name for this parameter when processing IE favorites.
  • int lowBkMrkLimit:  Index of first bookmark to
    process.  Indices begin with 0 for the first bookmark.
  • int numToProc:  Number of bookmarks to process.
  • String browser:  Type of browser:  F for
    Firefox, N for Netscape, or I for Internet Explorer.

Figure 14 shows the contents of a typical batch file used to process 200
bookmarks beginning with bookmark index 100 in an IE Favorites library.

java Bookmarks10 
"C:/Documents and Settings/Owner/Favorites/" 
DummyFileName 
100 
200 
I
Figure 14

Note that it was necessary to display each of the command-line parameters on
a different line in Figure 14 to force this material to fit in this narrow
publication format.

Program testing

This program was tested using J2SE 5.0 under WinXP.  J2SE 5.0 or later is required due to the
use of generics.

Discussion and Sample Code


The program named Bookmarks10

I will
explain this program in fragments.  You can view a complete listing of the
program in Listing 20 near the end of the lesson.

The class definition
begins in Listing 1.  The code in Listing 1 simply declares several
variables used to produce the output files.

class Bookmarks10{
  //Output text file streams
  DataOutputStream file000;
  DataOutputStream file100;
  DataOutputStream file200;
  DataOutputStream file300;
  DataOutputStream file400;
  DataOutputStream file500;
  DataOutputStream file600;

Listing 1

The main method

The main method begins in Listing 2.

  public static void main(String[] args){
    //Confirm correct number of command-line parameters.
    // If the number is not correct, display a usage msg
    // and terminate the program.
    if(args.length != 5){
      System.out.println("Command-line parameter error");
      System.out.println();
      System.out.println("Usage: java Bookmarks10");
      System.out.println("followed by:");
      System.out.println("Bookmark path");
      System.out.println("Bookmark file");
      System.out.println("Low bookmark limit");
      System.out.println("Number bookmarks to process");
      System.out.println("Browser, F, N, or I");
      
      System.out.println();
      System.out.println("Terminating Program");
      System.exit(0);      
    }//end if
    
    //The following values are provided as command-line
    // parameters.

    //Path to the folder containing a Firefox bookmark
    // file or containing a multitude of IE .url files.
    String bkMrkPath = args[0];
    //Name of the Firefox bookmark file.  Just use a 
    // dummy name for this parameter when processing IE
    // favorites
    String bkMrkFile = args[1];
    //Index of first bookmark to process.
    int lowBkMrkLimit = Integer.parseInt(args[2]);
    //Number of bookmarks to process.
    int numToProc = Integer.parseInt(args[3]);
    //Type of browser: F for Firefox, N for Navigator,
    // or I for Internet Explorer.
    String browser = args[4];
    //End of command-line parameters

Listing 2

The code in Listing 2 simply deals with the required command-line parameters
and shouldn’t require further explanation.

Instantiate an object of this class

The code in Listing 3 instantiates an object of this class and stores its
reference in a reference variable named thisObj.

    Bookmarks10 thisObj = new Bookmarks10();

Listing 3

The reference variable named thisObj will be used later to invoke
instance methods belonging to the object.

Get name and URL for each bookmark

The code in Listing 4 gets the name and the URL for each of the bookmarks and
encapsulates them in an object of type Bookmark.  All of the
Bookmark
objects are encapsulated in an object of type ArrayList.

    //The following collection encapsulates all of the
    // bookmarks awaiting final processing.  The
    // getIEBookmarks method requires that a method
    // parameter points to the ArrayList object on input
    // because of its recursive nature.  The
    // getFireFoxBookmarks method is not recursive and it
    // overwrites this object with a new ArrayList object
    // that it creates.
    ArrayList <Bookmark> theBookmarks = 
                                new ArrayList <Bookmark>();
    if(browser.toUpperCase().equals("F")){
      //Process Firefox bookmarks.
      theBookmarks = thisObj.getFireFoxBookmarks(
                                      bkMrkPath,bkMrkFile);
    }else if(browser.toUpperCase().equals("N")){
      //Process Netscape Navigator bookmarks.  Same format
      // as Firefox
      theBookmarks = thisObj.getFireFoxBookmarks(
                                      bkMrkPath,bkMrkFile);
    }else if(browser.toUpperCase().equals("I")){
      //Process Internet Explorer favorites.
      theBookmarks = thisObj.getIEBookmarks(
                                   bkMrkPath,theBookmarks);
    }else{
      System.out.println("Don't recognize browser");
      System.out.println("Terminating program");
      System.exit(0);
    }//end else

Listing 4

Code was explained in an earlier lesson

The code in Listing 4, along with the methods named getFireFoxBookmarks
and getIEBookmarks, is very similar to code that I explained in the
earlier lesson entitled Creating a Portable Bookmark Library using Java,
Part 2.  Therefore, I won’t explain that code again
here.  Rather, I will simply refer you to that earlier lesson.  You
can view those methods in Listing 20 near the end of the lesson.

Once the code in Listing 4 has executed, all of the required bookmark
information has been encapsulated in an ArrayList object referred to by
theBookmarks.
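Listing 8 reads each bookmark's name and URL from fields named bkMrkName and bkMrkUrl, but the Bookmark class itself is not shown in this part of the lesson. The following hypothetical minimal version assumes only those two String fields and shows how such objects can be collected in a generic ArrayList; the class in the complete program may contain more than this, and BookmarkDemo is an illustrative name.

```java
import java.util.ArrayList;

// Hypothetical minimal Bookmark class; only the field names
// bkMrkName and bkMrkUrl are taken from Listing 8.
class Bookmark {
    String bkMrkName;
    String bkMrkUrl;

    Bookmark(String bkMrkName, String bkMrkUrl) {
        this.bkMrkName = bkMrkName;
        this.bkMrkUrl = bkMrkUrl;
    }
}

class BookmarkDemo {
    // Builds a small collection shaped like the one in Listing 4.
    static ArrayList<Bookmark> sample() {
        ArrayList<Bookmark> theBookmarks = new ArrayList<Bookmark>();
        theBookmarks.add(new Bookmark("Dick Baldwin",
                "http://www.dickbaldwin.com/"));
        theBookmarks.add(new Bookmark("Gamelan", "http://www.gamelan.com/"));
        return theBookmarks;
    }
}
```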

Process the bookmarks

Continuing with the main method, the code in Listing 5 invokes the
method named processBkMrks to process all of the bookmarks that have been
encapsulated in the ArrayList object.

    thisObj.processBkMrks(lowBkMrkLimit,numToProc,
                                             theBookmarks);
  }// end main

Listing 5

Listing 5 also signals the end of the main method.

The processBkMrks method

The method named processBkMrks begins in Listing 6.  This method
processes bookmarks previously stored in an ArrayList object referred to
by theBookmarks.

  void processBkMrks(int lowBkMrkLimit,
                     int numToProc,
                     ArrayList <Bookmark> theBookmarks){
    int eligibleCounter = 0;
    String theName = null;
    String theUrl = null;

Listing 6

This method receives a reference to the ArrayList object containing
bookmark information along with information identifying the bookmarks to
process.  The parameter named lowBkMrkLimit specifies the index of
the first bookmark to process.  The parameter named numToProc
specifies the number of bookmarks to process.

Listing 6 declares and initializes some local working variables.

Create the output files

The code in Listing 7 creates the seven output files and places one line of
explanatory text in each file.

    try{
      file000 = new DataOutputStream(
                          new FileOutputStream("000.txt"));
      file000.writeBytes(
                     "This file contains all headers\n\n");
      
      file100 = new DataOutputStream(
                          new FileOutputStream("100.txt"));
      file100.writeBytes(
          "This file contains all 100-series headers\n\n");
      
      file200 = new DataOutputStream(
                          new FileOutputStream("200.txt"));
      file200.writeBytes(
          "This file contains all 200-series headers\n\n");
      
      file300 = new DataOutputStream(
                          new FileOutputStream("300.txt"));
      file300.writeBytes(
          "This file contains all 300-series headers\n\n");
      
      file400 = new DataOutputStream(
                          new FileOutputStream("400.txt"));
      file400.writeBytes(
          "This file contains all 400-series headers\n\n");
      
      file500 = new DataOutputStream(
                          new FileOutputStream("500.txt"));
      file500.writeBytes(
          "This file contains all 500-series headers\n\n");
      
      file600 = new DataOutputStream(
                          new FileOutputStream("600.txt"));
      file600.writeBytes(
                "This file contains exception output\n\n");
    }catch(IOException e){
      try{
        file600.writeBytes(e + "\n\n");
      }catch(Exception ex){
        ex.printStackTrace();
      }//end catch
      e.printStackTrace();
      System.exit(0);
    }//end catch

Listing 7

The code in Listing 7 is straightforward and shouldn’t require further
explanation.
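Because the seven files differ only in their names and their first lines, they could also be opened in a loop and kept in a map keyed by file name. The sketch below is a hypothetical alternative to the seven separate fields, not the program's design; the class name ReportFiles and the directory parameter are assumptions. Like Listing 7, it does not close streams that were already opened if a later one fails.

```java
import java.io.DataOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical alternative: open all seven report files in a loop and
// keep them in a map keyed by file name ("000.txt" through "600.txt").
class ReportFiles {
    static Map<String, DataOutputStream> open(File dir) throws IOException {
        Map<String, DataOutputStream> files =
                new LinkedHashMap<String, DataOutputStream>();
        for (int i = 0; i <= 6; i++) {
            String name = i + "00.txt";
            files.put(name, new DataOutputStream(
                    new FileOutputStream(new File(dir, name))));
        }
        // The same one-line headings that Listing 7 writes.
        files.get("000.txt").writeBytes("This file contains all headers\n\n");
        for (int i = 1; i <= 5; i++) {
            files.get(i + "00.txt").writeBytes(
                    "This file contains all " + i + "00-series headers\n\n");
        }
        files.get("600.txt").writeBytes(
                "This file contains exception output\n\n");
        return files;
    }

    static void closeAll(Map<String, DataOutputStream> files)
            throws IOException {
        for (DataOutputStream out : files.values()) {
            out.close();
        }
    }
}
```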

Iterate on the ArrayList object

Listing 8 shows the beginning of a for loop that is used to iterate on
the ArrayList object and to examine each bookmark encapsulated in the
object.

    for(int msgCntr = 0;msgCntr < theBookmarks.size();
                                                msgCntr++){
      theName = theBookmarks.get(msgCntr).bkMrkName;
      theUrl = theBookmarks.get(msgCntr).bkMrkUrl;

Listing 8

The code in Listing 8 extracts and saves the name and the URL for each
bookmark that it examines.

Determine eligibility

Listing 9 shows the beginning of an if statement that determines the eligibility of
the current bookmark for processing based on the specified range of bookmark indices and the
protocol.

      if((msgCntr >= lowBkMrkLimit) && 
                    (msgCntr < lowBkMrkLimit + numToProc)){
        //Strip off the protocol for the HTTP protocol only
        //Case-insensitive prefix test; returns false rather
        // than throwing when theUrl is shorter than 7 chars.
        if(theUrl.regionMatches(true,0,"HTTP://",0,7)){
          theUrl = theUrl.substring(7);
          //This bookmark is eligible for processing.
          eligibleCounter++;
          //Display progress on standard output
          System.out.println("\n" + msgCntr + " " 
                                 + theName + " " + theUrl);
                                 
          //Try to connect to the server to retrieve the
          // response headers.
          tryToConnect(msgCntr,theName,theUrl);

Listing 9

In order to be eligible for processing, the bookmark must specify the HTTP
protocol and the index of the bookmark must fall within the range specified by
the user.
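The index half of this eligibility test could instead be applied once, before the loop, by taking a sub-list. Clamping the upper bound to the list size matters because, as the summary in Figure 15 shows, the requested range can run past the end of the library. The helper below is a hypothetical sketch that handles only the index range, so the HTTP test would still be applied to each element; long arithmetic guards against overflow when the requested count is very large.

```java
import java.util.List;

// Hypothetical helper: selects the elements whose indices fall in
// [low, low + count), clamped to the bounds of the list.
class RangeSelector {
    static <T> List<T> select(List<T> all, int low, int count) {
        int from = Math.max(0, Math.min(low, all.size()));
        // long arithmetic avoids int overflow when low + count is huge
        long end = Math.min((long) all.size(), (long) low + count);
        int to = (int) Math.max(from, end);
        // subList returns a view backed by 'all'; element i of the view
        // has index from + i in the original list.
        return all.subList(from, to);
    }
}
```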

If the bookmark is determined to be eligible, the URL for the bookmark, along
with some other information, is passed to a method named tryToConnect.
This method, which I will explain later, contains the code that attempts to
connect to the server specified by the URL and to retrieve the response header
for the specified resource.
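Before a server can be contacted, the host name has to be separated from the path of the resource in a URL whose http:// prefix has already been stripped, as in Listing 9. The sketch below splits at the first slash. Requesting "/" when the URL contains no slash at all is an assumption of this sketch; the class name UrlSplitter is hypothetical.

```java
// Hypothetical helper: splits a URL without its "http://" prefix into
// {server, file}.
class UrlSplitter {
    static String[] split(String url) {
        int slash = url.indexOf('/');
        if (slash == -1) {
            // No path at all: assume the server's root document.
            return new String[] { url, "/" };
        }
        // The path keeps its leading slash, e.g. "/abc".
        return new String[] { url.substring(0, slash), url.substring(slash) };
    }
}
```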

If protocol is not HTTP

Continuing for the moment with the method named processBkMrks, the
code in the else clause in Listing 10 deals with those bookmarks for
which the index is in the specified range, but for which the protocol is not
HTTP.

        }else{
          //This protocol can't be handled by this program.
          // Document that fact in the file named 000.txt.
          try{
            file000.writeBytes(msgCntr + " " + 
                          "Can't handle this protocol.\n");
            file000.writeBytes(
                         theName + "   " + theUrl +"\n\n");
          }catch(IOException e){
            try{
              file600.writeBytes(e + "\n\n");
            }catch(Exception ex){
              ex.printStackTrace();
            }//end catch
            e.printStackTrace();
            System.exit(0);
          }//end catch
        }//end else regarding protocol
      }//end if regarding the bookmark indices
    }//end for loop iterating on the ArrayList object

Listing 10

The code in the else clause in Listing 10 writes a notification into the file named 000.txt to the
effect that the protocol is not eligible for processing.

Listing 10 also contains some cleanup code, including a catch block and
several closing braces, one of which ends the for loop that began in
Listing 8 and is used to iterate on the bookmarks encapsulated in the
ArrayList object.

Store summary information

Listing 11 stores summary information about the run at the end of the file named
000.txt and closes all output text files.

    try{
      
      file000.writeBytes("Number eligible bookmarks = " 
                                 + eligibleCounter + "\n");
      file000.writeBytes("Bookmark range = " 
            + lowBkMrkLimit
            + " to " + (lowBkMrkLimit + numToProc) + "\n");
      file000.writeBytes("Total number bookmarks = " 
                             + theBookmarks.size() + "\n");
      file000.close();
      file100.close();
      file200.close();
      file300.close();
      file400.close();
      file500.close();
      file600.close();
    }catch(IOException e){
      try{
        file600.writeBytes(e + "\n\n");
      }catch(Exception ex){
        ex.printStackTrace();
      }//end catch
      e.printStackTrace();
      System.exit(0);
    }//end catch
  }//end processBkMrks

Listing 11

Listing 11 also signals the end of the method named processBkMrks.

Sample summary information

Figure 15 shows a sample of the summary information that resulted from running the program on my Firefox bookmark library.

Number eligible bookmarks = 1203
Bookmark range = 2869 to 7919
Total number bookmarks = 4088
Figure 15

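The requested range extended well beyond the end of the library. (The upper bound reported in Figure 15 is exclusive. It is simply the beginning index plus the number of bookmarks requested on the command line.)

As a result, the program examined the 1219 bookmarks between the specified beginning index of 2869 and the end of the library at index 4087. Of this total, 1203 were deemed to be eligible for processing. Presumably the remaining 16 bookmarks specified a protocol other than HTTP.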
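The counts in Figure 15 follow directly from the loop condition msgCntr >= lowBkMrkLimit && msgCntr < lowBkMrkLimit + numToProc. The sketch below computes how many bookmarks actually fall inside the requested range when that range runs past the end of the library. The class and method names are hypothetical.

```java
// Hypothetical helper; reproduces the range arithmetic behind Figure 15.
class RangeSketch {

  // Counts indices i with low <= i < low + count and 0 <= i < size,
  // matching the loop in processBkMrks, which runs from 0 to size-1.
  static int bookmarksInRange(int low, int count, int size) {
    int start = Math.max(low, 0);
    int end = Math.min(low + count, size);   // exclusive upper bound
    return Math.max(end - start, 0);
  }

  public static void main(String[] args) {
    int inRange = bookmarksInRange(2869, 5050, 4088);
    int eligible = 1203;                      // from Figure 15
    System.out.println("In range: " + inRange);               // 1219
    System.out.println("Not HTTP: " + (inRange - eligible));  // 16
  }
}
```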

The method named tryToConnect

Listing 12 shows the beginning of the method named tryToConnect, which is invoked on all eligible bookmarks in Listing 9.

The purpose of this method is to try to connect to the server specified by a given URL and to download the response header lines
for the specified resource.

  void tryToConnect(int cnt, String theName,String URL){
    String server = "";
    String theFile = "";

    //Handle cases with a file specified or with no file
    // specified but a trailing slash on the URL.
    if(URL.indexOf("/") != -1){
      server = URL.substring(0,URL.indexOf("/"));
      theFile = URL.substring(URL.indexOf("/"));
    }else
      //Handle the case of no slash and no file specified.
      if(URL.indexOf("/") == -1){
        server = URL;
        theFile = "/";
    }//end if

Listing 12

After declaring and initializing a couple of local working variables, the
code in Listing 12 gets values for the server and the resource that is requested
by the bookmark.

Different URL formats

The code in Listing 12 deals with the fact that URLs can come in different formats. 
For example, some URLs specify a resource and some do not.  In the latter
case, the expectation is that the server will deliver a default resource, such
as a file named index.html.  In this case, the resource needs to be
specified as a single forward-slash character when the HEAD request is sent to
the server.
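The following sketch isolates the splitting logic of Listing 12 so that you can try it on its own. It behaves like the code in Listing 12 but replaces the redundant second test with a plain else. The class name and the example host names are hypothetical, and the input is assumed to have had its "http://" prefix removed already.

```java
// Hypothetical stand-alone version of the URL-splitting step in Listing 12.
class UrlSplitSketch {

  // Returns {server, resource}. When the URL contains no slash, the
  // resource defaults to "/" so the server delivers its default page.
  static String[] split(String url) {
    int slash = url.indexOf('/');
    if (slash != -1) {
      return new String[] {url.substring(0, slash), url.substring(slash)};
    }
    return new String[] {url, "/"};
  }

  public static void main(String[] args) {
    String[] a = split("www.example.com/dir/page.html");
    System.out.println(a[0] + " | " + a[1]);  // www.example.com | /dir/page.html
    String[] b = split("www.example.com");
    System.out.println(b[0] + " | " + b[1]);  // www.example.com | /
  }
}
```

Like the original, this sketch does not separate an explicit port number such as :8080 from the host name. Listing 13 always connects on port 80.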

Get a Socket connection to the server

The code in Listing 13 tries to get a socket
connection to the server on port 80, the standard HTTP port.

    int port = 80; //http port
    try{
      Socket socket = new Socket(server,port);//get socket

      //Get input and output streams from the socket      
      BufferedReader inputStream = 
                  new BufferedReader(new InputStreamReader(
                                 socket.getInputStream()));
      PrintWriter outputStream = 
                    new PrintWriter(new OutputStreamWriter(
                           socket.getOutputStream()),true);

Listing 13

If the connection is achieved, Listing 13 gets input and output streams on
the socket by which the program can send a request to the server and read the
response provided by the server.

If the attempt to get the socket connection fails, the catch block shown later in Listing 19 is executed. That block records the failure in the output file named 600.txt.
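The Socket(String,int) constructor used in Listing 13 sets no connection timeout of its own, so an unresponsive server can stall the run until the operating system gives up. Similarly, a socket's default read timeout is zero (infinite), so readLine can block indefinitely if a server accepts the connection but never answers.

A possible variation is sketched below. It creates an unconnected socket, calls connect with a timeout, and then uses setSoTimeout to limit blocking reads. The class name, the method name, and any particular timeout value are assumptions.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper; a variation on the connection step in Listing 13.
class TimeoutSocketSketch {

  // Opens a TCP connection, giving up after timeoutMs milliseconds,
  // and applies the same limit to subsequent blocking reads.
  static Socket open(String server, int port, int timeoutMs)
      throws IOException {
    Socket socket = new Socket();               // not yet connected
    try {
      socket.connect(new InetSocketAddress(server, port), timeoutMs);
      socket.setSoTimeout(timeoutMs);           // read timeout
      return socket;
    } catch (IOException e) {
      socket.close();                           // don't leak the socket
      throw e;
    }
  }
}
```

With this helper, the constructor call in tryToConnect could become, for example, TimeoutSocketSketch.open(server, port, 10000). Any resulting SocketTimeoutException would then be caught by the existing catch(Exception e) block and logged to 600.txt.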

Request the response headers

The code in Listing 14 sends a HEAD request to the server asking it to send back the response header lines
pertaining to the resource specified by theFile using the HTTP 1.1 protocol.

      outputStream.println(
                          "HEAD " + theFile + " HTTP/1.1");
      outputStream.println("Host: " + server);
      //May need to modify the following for non-Windows
      // systems, (see Wikipedia reference) to cause hard
      // line breaks consisting of both a carriage return
      // and a line feed to be sent to the server.
      outputStream.println();
      outputStream.println();

Listing 14

You can read more about the format requirements of the HTTP 1.1 protocol at Wikipedia.

(Note the comment in Listing 14 regarding hard line breaks and
non-Windows systems.)
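The HTTP specification calls for each request line and header line to end with a carriage return followed by a line feed (CRLF). It also calls for the header section to end with an empty line. PrintWriter.println, however, uses the platform's line separator, which is CRLF on Windows but a bare line feed on Unix-like systems.

One way to remove that dependency is to build the request string explicitly and send it with print and flush instead of println. The sketch below does this; its names are hypothetical. A single empty line is enough to end the headers.

```java
import java.io.PrintWriter;
import java.io.Writer;

// Hypothetical helper; builds the same HEAD request as Listing 14 but
// with explicit CRLF line endings, independent of the host platform.
class HeadRequestSketch {

  static final String CRLF = "\r\n";

  static String buildHead(String server, String theFile) {
    return "HEAD " + theFile + " HTTP/1.1" + CRLF
         + "Host: " + server + CRLF
         + CRLF;                       // empty line ends the headers
  }

  // Writes the request to an already-open writer, such as one wrapping
  // socket.getOutputStream() as in Listing 13.
  static void send(Writer out, String server, String theFile) {
    PrintWriter pw = new PrintWriter(out);
    pw.print(buildHead(server, theFile));
    pw.flush();                        // print() does not auto-flush
  }
}
```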

Read and save the first response header line

The code in Listing 15 reads and saves the first line sent back by the server
in the response header for the resource.  For the purposes of this program,
we don’t care about the other lines in the response header, so we don’t read
them.

      String line = inputStream.readLine();

Listing 15

Save the first line for all bookmarks

The code in Listing 16 writes the index value, name, and URL of the bookmark
into the file named 000.txt. The first response header line itself is written
into the same file at the beginning of Listing 17. This
information can be useful later for reference purposes.

      file000.writeBytes(cnt + " " + theName + " " + URL 
                                                   + "\n");

Listing 16

Distribute the first line among different output files

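The code in Listing 17 first writes the response header line into 000.txt. If the server responded using HTTP 1.0, the code also adds a warning that the results are not reliable.

The code then distributes another copy of the line among five different output files based on the first digit of the status code. In a well-formed status line such as HTTP/1.1 404 Not found, that digit is the character at index 9, which is what line.substring(9,10) extracts. For example, all lines for which the status code begins with 2 go into the file named 200.txt, and all lines for which the status code begins with 4 go into the file named 400.txt.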

      if(line.startsWith("HTTP/1.0")){
        file000.writeBytes(
                    "HTTP/1.0 results are not reliable\n");
      }//end if
      file000.writeBytes(line + "\n");
      file000.writeBytes("\n");

      //Save first line of all 100 series headers in the
      // file named 100.txt
      if(line.substring(9,10).equals("1")){
        file100.writeBytes(cnt + " " + theName + " " + URL
                                                   + "\n");
        if(line.startsWith("HTTP/1.0")){
          file100.writeBytes(
                    "HTTP/1.0 results are not reliable\n");
        }//end if
        file100.writeBytes(line + "\n");
        file100.writeBytes("\n");
      }//end if

      //Save first line of all 200 series headers in the
      // file named 200.txt
      if(line.substring(9,10).equals("2")){
        file200.writeBytes(cnt + " " + theName + " " + URL
                                                   + "\n");
        if(line.startsWith("HTTP/1.0")){
          file200.writeBytes(
                    "HTTP/1.0 results are not reliable\n");
        }//end if
        file200.writeBytes(line + "\n");
        file200.writeBytes("\n");
      }//end if

      //Save first line of all 300 series headers in the
      // file named 300.txt
      if(line.substring(9,10).equals("3")){
        file300.writeBytes(cnt + " " + theName + " " + URL
                                                   + "\n");
        if(line.startsWith("HTTP/1.0")){
          file300.writeBytes(
                    "HTTP/1.0 results are not reliable\n");
        }//end if
        file300.writeBytes(line + "\n");
        file300.writeBytes("\n");
      }//end if

      //Save first line of all 400 series headers in the
      // file named 400.txt
      if(line.substring(9,10).equals("4")){
        file400.writeBytes(cnt + " " + theName + " " + URL
                                                   + "\n");
        if(line.startsWith("HTTP/1.0")){
          file400.writeBytes(
                    "HTTP/1.0 results are not reliable\n");
        }//end if
        file400.writeBytes(line + "\n");
        file400.writeBytes("\n");
      }//end if
      
      //Save first line of all 500 series headers in the
      // file named 500.txt
      if(line.substring(9,10).equals("5")){
        file500.writeBytes(cnt + " " + theName + " " + URL
                                                   + "\n");
        if(line.startsWith("HTTP/1.0")){
          file500.writeBytes(
                    "HTTP/1.0 results are not reliable\n");
        }//end if
        file500.writeBytes(line + "\n");
        file500.writeBytes("\n");
      }//end if

Listing 17
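Listing 17 relies on line.substring(9,10), which assumes the status line begins with an eight-character version string followed by a single space. If the server closes the connection without responding, readLine returns null. The resulting NullPointerException is caught by the catch block in Listing 19 and recorded in 600.txt.

The sketch below parses the status line into its space-separated parts and maps the first digit of the status code to the name of an output file. This would also let the five nearly identical if blocks collapse into a single lookup. The class and method names are hypothetical.

```java
// Hypothetical helper; an alternative to the substring(9,10) tests in
// Listing 17. Returns the output file name for a status line such as
// "HTTP/1.1 404 Not found", or null if the line can't be classified.
class StatusLineSketch {

  static String outputFileFor(String statusLine) {
    if (statusLine == null) {
      return null;                      // server sent nothing
    }
    // Version, status code, and an optional reason phrase.
    String[] parts = statusLine.trim().split(" ", 3);
    if (parts.length < 2 || !parts[0].startsWith("HTTP/")) {
      return null;
    }
    String code = parts[1];
    if (code.length() != 3) {
      return null;
    }
    char first = code.charAt(0);
    if (first < '1' || first > '5') {
      return null;
    }
    return first + "00.txt";           // e.g. '4' -> "400.txt"
  }
}
```

A caller could then write the record to whichever stream corresponds to the returned name. One option is a hypothetical Map<String, DataOutputStream> filled in when the output files are created.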

Close the connection

The code in Listing 18 closes the Socket connection.

      socket.close();
    }//end try

Listing 18

Listing 18 also signals the end of the try block that began in
Listing 13.

Unable to connect

Listing 19 shows the catch block that is
associated with the try block that began in Listing 13.

    catch(Exception e){
      try{
        file600.writeBytes(cnt + " " + theName + "\n");
        file600.writeBytes(server + theFile + "\n");
        file600.writeBytes(e + "\n");
        file600.writeBytes("\n");
      }catch(IOException ex){
        ex.printStackTrace();
      }//end catch
    }//end catch
  }//end tryToConnect

}//end class Bookmarks10 definition

Listing 19

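The code in Listing 19 is executed if the program is unable to make the
connection with the server specified by the bookmark. Because it catches
Exception, it also handles any other exception thrown while the request is
sent or the response is processed. In either event, information regarding
the problem is recorded in the output file named 600.txt. Figure 11 shows an
example of such output.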

Listing 19 also signals the end of the method named tryToConnect and
the end of the class named Bookmarks10.

Run the Program

I encourage you to copy the code from Listing 20 into your text editor,
compile it, and execute it.  Experiment with it, making changes, and
observing the results of your changes.

If you feel really ambitious, you might want to extend the program so that it
automatically deletes broken bookmarks from the bookmark library.
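One possible starting point for Firefox and Netscape libraries is sketched below under stated assumptions. It filters lines from the contents of a bookmarks.html file, dropping any line whose HREF value appears in a set of URLs you have decided are broken.

Like getFireFoxBookmarks, the sketch assumes that each bookmark occupies a single line containing A HREF="...". It works on lines already read into memory, so the caller decides where to write the result. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical helper; not part of Bookmarks10. Assumes one bookmark per
// line, in the same A HREF="..." form parsed by getFireFoxBookmarks.
class BookmarkFilterSketch {

  static List<String> removeBroken(List<String> lines, Set<String> brokenUrls) {
    List<String> kept = new ArrayList<String>();
    for (String line : lines) {
      int urlIndex = line.indexOf("A HREF=\"");
      if (urlIndex != -1) {
        int start = urlIndex + 8;                  // first char of the URL
        int end = line.indexOf('"', start);        // closing quotation mark
        if (end != -1 && brokenUrls.contains(line.substring(start, end))) {
          continue;                                // drop this bookmark
        }
      }
      kept.add(line);                              // keep everything else
    }
    return kept;
  }
}
```

A bookmark that has a description may be followed by a separate DD line. This sketch leaves that line in place.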

Summary

In this lesson, I showed you how to write a Java program that will help you
to identify broken bookmarks in your bookmark library so that you can either
delete them or repair them.

I began by explaining the differences between IE bookmark libraries and
Firefox/Netscape bookmark libraries.

I explained that this program identifies potentially broken bookmarks in all
three kinds of bookmark libraries: IE, Firefox, and Netscape.

I explained how you can use the output produced by this program to first find
and then to delete or repair broken bookmarks in your bookmark library.

I explained how the HTTP 1.1 protocol can be used to connect to a server and
request the response headers associated with a specified resource.

I explained how you can use the information contained in the first response
header line to assess the quality of a bookmark.

I described and provided examples of each of the seven output text files
produced by this program.

Finally, I explained in detail how this program accomplishes its purpose and gave
usage examples for the program.

Complete Program Listing


A complete listing of the program discussed in this lesson is shown in
Listing 20 below.
 

/* File Bookmarks10.java 
Copyright 2005, R.G.Baldwin
Revised 09/15/05

The purpose of this program is to help you to clean up your
bookmark library.  It is compatible with bookmark libraries
for the following browsers:

Firefox
Netscape
Internet Explorer

For each bookmark within a specified range of bookmarks 
within the bookmark library, the program attempts to use 
the bookmark to connect to the web server using the 
HTTP 1.1 protocol and to retrieve the response headers from
the web server. It uses the first line in the response 
header to categorize the response into one of five
categories as described at 
http://www.jmarshall.com/easy/http/.

This program does not attempt to connect to secure web 
sites using the HTTPS protocol.  Also, it does not support 
FTP and other protocols.  If the bookmark library contains 
bookmarks that specify a protocol other than simple HTTP, 
those bookmarks are simply ignored.

According to the source given above, the initial response 
line, called the status line, has three parts separated by 
spaces:

1. The HTTP version
2. A response status code that gives the result of the 
   request
3. An English reason phrase describing the status code.

The HTTP version is in the format "HTTP/x.x".

The status code is meant to be computer-readable. The 
reason phrase is meant to be human-readable, and may vary.

The status code is a three-digit integer, and the first 
digit identifies the general category of response:

1xx indicates an informational message only
2xx indicates success of some kind
3xx redirects the client to another URL
4xx indicates an error on the client's part
5xx indicates an error on the server's part

The header response line along with additional information 
about each bookmark within the specified range is stored in
a set of output text files named 100.txt through 500.txt.
The user can examine the information provided in those text
files to determine the quality of the bookmark. For those 
bookmarks that are determined to have problems on the basis
of the web server response, the user can either delete the 
bookmark from the library, or attempt to repair it.

Some typical status lines follow:

HTTP/1.1 200 OK
HTTP/1.1 301 Moved Permanently
HTTP/1.1 302 Moved Temporarily
HTTP/1.1 302 Found
HTTP/1.1 302 Object moved
HTTP/1.1 400 Bad Request
HTTP/1.1 401 Authorization Required
HTTP/1.1 403 Access Forbidden
HTTP/1.1 403 Invalid method
HTTP/1.1 404 Not found
HTTP/1.1 404 Object Not Found
HTTP/1.1 405 Method Not Allowed
HTTP/1.1 405
HTTP/1.1 500 Server Error
HTTP/1.1 500 Internal Server Error
HTTP/1.1 501 Method Not Implemented
HTTP/1.1 501 Method Not Supported

Note that the reason phrase does vary from one web server
to another.  Also note that I haven't seen any status 
lines that show a status code in the 1xx range.

The status codes that are probably the most important in 
terms of cleaning up the bookmark library are those in the 
4xx and 5xx range.

In addition to the five output files described above, the 
program also produces two additional output files. A file 
named 000.txt contains information about every bookmark 
within the range of specified bookmarks.

A file named 600.txt contains information about each 
bookmark for which the program threw an exception when
trying to connect to the web site, such as the following:

java.net.UnknownHostException: www.BadBookmark.com
java.net.ConnectException: Connection timed out: connect
java.net.SocketException: Network is unreachable: connect


The following five values must be provided as command-line 
parameters.  All command-line parameters are provided as
strings, but must be convertible to the types shown below.

String bkMrkPath: Path to the folder containing a Firefox
 bookmark file or containing a multitude of IE .url files.
String bkMrkFile: Name of the Firefox bookmark file.  Just
 use a dummy name for this parameter when processing IE
 favorites
int lowBkMrkLimit: Index of first bookmark to process.
int numToProc: Number of bookmarks to process.
String browser: Type of browser: F for Firefox, N for
 Navigator, or I for Internet Explorer.

Tested using J2SE 5.0 under WinXP.  J2SE 5.0 or later is
required due to the use of generics.
**********************************************************/
import java.net.*;
import java.io.*;
import java.util.*;

class Bookmarks10{
  //Output text file streams
  DataOutputStream file000;
  DataOutputStream file100;
  DataOutputStream file200;
  DataOutputStream file300;
  DataOutputStream file400;
  DataOutputStream file500;
  DataOutputStream file600;

  public static void main(String[] args){
    //Confirm correct number of command-line parameters.
    // If the number is not correct, display a usage msg
    // and terminate the program.
    if(args.length != 5){
      System.out.println("Command-line parameter error");
      System.out.println();
      System.out.println("Usage: java Bookmarks10");
      System.out.println("followed by:");
      System.out.println("Bookmark path");
      System.out.println("Bookmark file");
      System.out.println("Low bookmark limit");
      System.out.println("Number bookmarks to process");
      System.out.println("Browser, F, N, or I");
      
      System.out.println();
      System.out.println("Terminating Program");
      System.exit(0);      
    }//end if
    
    //The following values are provided as command-line
    // parameters.

    //Path to the folder containing a Firefox bookmark
    // file or containing a multitude of IE .url files.
    String bkMrkPath = args[0];
    //Name of the Firefox bookmark file.  Just use a 
    // dummy name for this parameter when processing IE
    // favorites
    String bkMrkFile = args[1];
    //Index of first bookmark to process.
    int lowBkMrkLimit = Integer.parseInt(args[2]);
    //Number of bookmarks to process.
    int numToProc = Integer.parseInt(args[3]);
    //Type of browser: F for Firefox, N for Navigator,
    // or I for Internet Explorer.
    String browser = args[4];
    //End of command-line parameters
    
    //Instantiate a new object of this class.    
    Bookmarks10 thisObj = new Bookmarks10();

    //Get the name and the URL for each of the bookmarks.
    // Encapsulate them in an object of type Bookmark.
    // Encapsulate all of the Bookmark objects in an object
    // of type ArrayList.
    
    //The following collection encapsulates all of the
    // bookmarks awaiting final processing.  The
    // getIEBookmarks method requires that a method
    // parameter points to the ArrayList object on input
    // because of its recursive nature.  The
    // getFireFoxBookmarks method is not recursive and it
    // overwrites this object with a new ArrayList object
    // that it creates.
    ArrayList <Bookmark> theBookmarks = 
                                new ArrayList <Bookmark>();
    if(browser.toUpperCase().equals("F")){
      //Process Firefox bookmarks.
      theBookmarks = thisObj.getFireFoxBookmarks(
                                      bkMrkPath,bkMrkFile);
    }else if(browser.toUpperCase().equals("N")){
      //Process Netscape Navigator bookmarks.  Same format
      // as Firefox
      theBookmarks = thisObj.getFireFoxBookmarks(
                                      bkMrkPath,bkMrkFile);
    }else if(browser.toUpperCase().equals("I")){
      //Process Internet Explorer favorites.
      theBookmarks = thisObj.getIEBookmarks(
                                   bkMrkPath,theBookmarks);
    }else{
      System.out.println("Don't recognize browser");
      System.out.println("Terminating program");
      System.exit(0);
    }//end else

    //Process the bookmarks.
    thisObj.processBkMrks(lowBkMrkLimit,numToProc,
                                             theBookmarks);
  }// end main
  //-----------------------------------------------------//
  
  //This method processes bookmarks previously stored in an
  // ArrayList object.
  void processBkMrks(int lowBkMrkLimit,
                     int numToProc,
                     ArrayList <Bookmark> theBookmarks){
    int eligibleCounter = 0;
    String theName = null;
    String theUrl = null;
    
    //Create the output files.
    try{
      file000 = new DataOutputStream(
                          new FileOutputStream("000.txt"));
      file000.writeBytes(
                     "This file contains all headers\n\n");
      
      file100 = new DataOutputStream(
                          new FileOutputStream("100.txt"));
      file100.writeBytes(
          "This file contains all 100-series headers\n\n");
      
      file200 = new DataOutputStream(
                          new FileOutputStream("200.txt"));
      file200.writeBytes(
          "This file contains all 200-series headers\n\n");
      
      file300 = new DataOutputStream(
                          new FileOutputStream("300.txt"));
      file300.writeBytes(
          "This file contains all 300-series headers\n\n");
      
      file400 = new DataOutputStream(
                          new FileOutputStream("400.txt"));
      file400.writeBytes(
          "This file contains all 400-series headers\n\n");
      
      file500 = new DataOutputStream(
                          new FileOutputStream("500.txt"));
      file500.writeBytes(
          "This file contains all 500-series headers\n\n");
      
      file600 = new DataOutputStream(
                          new FileOutputStream("600.txt"));
      file600.writeBytes(
                "This file contains exception output\n\n");
    }catch(IOException e){
      try{
        file600.writeBytes(e + "\n\n");
      }catch(Exception ex){
        ex.printStackTrace();
      }//end catch
      e.printStackTrace();
      System.exit(0);
    }//end catch
    
    //Iterate on the bookmarks in the ArrayList collection.
    for(int msgCntr = 0;msgCntr < theBookmarks.size();
                                                msgCntr++){
      theName = theBookmarks.get(msgCntr).bkMrkName;
      theUrl = theBookmarks.get(msgCntr).bkMrkUrl;

      //Determine eligibility based on the specified 
      // range of bookmark indices and the protocol.
      if((msgCntr >= lowBkMrkLimit) && 
                    (msgCntr < lowBkMrkLimit + numToProc)){
        //Strip off the protocol for the HTTP protocol only
        if(theUrl.substring(0,7).toUpperCase().
                                        equals("HTTP://")){
          theUrl = theUrl.substring(7);
          //This bookmark is eligible for processing.
          eligibleCounter++;
          //Display progress on standard output
          System.out.println("\n" + msgCntr + " " 
                                 + theName + " " + theUrl);
                                 
          //Try to connect to the server to retrieve the
          // response headers.
          tryToConnect(msgCntr,theName,theUrl);
        }else{
          //This protocol can't be handled by this program.
          // Document that fact in the file named 000.txt.
          try{
            file000.writeBytes(msgCntr + " " + 
                          "Can't handle this protocol.\n");
            file000.writeBytes(
                         theName + "   " + theUrl +"\n\n");
          }catch(IOException e){
            try{
              file600.writeBytes(e + "\n\n");
            }catch(Exception ex){
              ex.printStackTrace();
            }//end catch
            e.printStackTrace();
            System.exit(0);
          }//end catch
        }//end else regarding protocol
      }//end if regarding the bookmark indices
    }//end for loop iterating on the ArrayList object
    
    //Store summary information about the run in the file
    // named 000.txt and close all output text files.
    try{
      
      file000.writeBytes("Number eligible bookmarks = " 
                                 + eligibleCounter + "\n");
      file000.writeBytes("Bookmark range = " 
            + lowBkMrkLimit
            + " to " + (lowBkMrkLimit + numToProc) + "\n");
      file000.writeBytes("Total number bookmarks = " 
                             + theBookmarks.size() + "\n");
      file000.close();
      file100.close();
      file200.close();
      file300.close();
      file400.close();
      file500.close();
      file600.close();
    }catch(IOException e){
      try{
        file600.writeBytes(e + "\n\n");
      }catch(Exception ex){
        ex.printStackTrace();
      }//end catch
      e.printStackTrace();
      System.exit(0);
    }//end catch
  }//end processBkMrks
  //-----------------------------------------------------//
  
  //The purpose of this method is to extract all of the
  // bookmarks and to encapsulate them in an ArrayList
  // object.  Each element in the ArrayList object is an
  // object of the inner class named Bookmark.
  //This version of the method is designed to extract
  // bookmarks from Firefox and Netscape bookmark files.
  ArrayList <Bookmark> getFireFoxBookmarks(
                        String bkMrkPath,String bkMrkFile){
    int urlIndex = 0;
    int startIndex = 0;
    int endIndex = 0;
    ArrayList <Bookmark> theBookmarks = 
                                new ArrayList <Bookmark>();
    try{
      BufferedReader bufRdr = new BufferedReader(
                 new InputStreamReader(new FileInputStream(
                                  bkMrkPath + bkMrkFile)));
      //Read each line of text from the copy of the
      // bookmark file.  If the line contains a URL,
      // extract the URL and the name of the bookmark.
      String theName = null;
      String theUrl = null;
      String data = null;
      while((data = bufRdr.readLine()) != null){
        urlIndex = data.indexOf("A HREF=\"");
        //urlIndex will be -1 if line doesn't contain
        // a URL indicated by A HREF...  In that case, just
        // ignore the line of text.
        if(urlIndex != -1){
          //Find the index of the quotation marks at the
          // beginning and the end of the URL.
          startIndex = urlIndex+8;//Index of first quote+1
          //Index of quotation mark at the end of the URL.
          endIndex = data.indexOf("\"",startIndex);
          //Extract and save the URL
          theUrl = data.substring(startIndex,endIndex);
          
          //Get and save the content of the element
          // named A.
          // Get the index of the beginning of the content.
          startIndex = data.indexOf(">",urlIndex) +1;
          //Get the index of the end of the content.
          endIndex = data.indexOf("</A>",startIndex);
          //Get and save the content
          if(endIndex > startIndex){
            //The A element is not empty.
            theName = data.substring(startIndex,endIndex);
          }else{
            //The A element is empty
            theName = "No bookmark name found.";
          }//end else

          theBookmarks.add(new Bookmark(theName,theUrl));
        }//end if
      }//end while
      bufRdr.close();
    }catch(Exception e){
      try{
        file600.writeBytes(e + "\n\n");
      }catch(Exception ex){
        ex.printStackTrace();
      }//end catch
      e.printStackTrace();
      System.exit(0);
    }//end catch
    
    return theBookmarks;
  }//end getFireFoxBookmarks
  //-----------------------------------------------------//
  
  //This method uses recursion to traverse the directory
  // tree containing IE Favorites.  Each bookmark is 
  // represented by a file with an extension of .url. The
  // name of the file is the name of the bookmark.  The
  // URL for the bookmark is contained as a line of text
  // in the file.
  ArrayList <Bookmark> getIEBookmarks(
       String bkMrkPath,ArrayList <Bookmark> theBookmarks){
   
    String theName = null;
    String theUrl = null;
    String fileName = null;
    String pathAndFile = null;
    
    //Get a File object that represents the directory.
    File fileObj = new File(bkMrkPath);
    //Make certain that the directory exists.
    if(fileObj.exists()){
      //Confirm that the File object represents a directory
      // and not a file.
      if(fileObj.isDirectory()){
        //Get a list of the directory contents in an array
        // object.
        File[] dirContents = fileObj.listFiles();
        
        //Sort the directory contents into their natural
        // order by name.  You may want to
        // disable this sort and leave the data in the
        // recursion order.  It all depends on how you plan
        // to locate the Favorites in the IE Favorites
        // display.
        //Arrays.sort(dirContents);
        //Process the contents of the directory that were
        // saved in the list of contents.
        for(int cnt = 0;cnt < dirContents.length;cnt++){
          if(dirContents[cnt].isDirectory()){
            //Make a recursive call to process this
            // directory before processing the remaining
            // contents in the list of contents.
            theBookmarks = getIEBookmarks(
                  dirContents[cnt].getPath(),theBookmarks);
          }else if(dirContents[cnt].isFile()){
            pathAndFile = dirContents[cnt].getPath();
            fileName = dirContents[cnt].getName();

            //All file names that represent bookmarks
            // should end with .url.
            if(fileName.toUpperCase().endsWith(".URL")){
              theName = fileName.substring(
                 0,fileName.toUpperCase().indexOf(".URL"));
              theUrl = getTheUrl(pathAndFile);
              theBookmarks.add(
                             new Bookmark(theName,theUrl));
            }//end if
          }//end else
        }//end for loop
      }else{
        System.out.println(
                  bkMrkPath + ": not a directory.");
      }//end else
    }else{
      System.out.println("Directory " + bkMrkPath
                                     + " does not exist.");
    }//end else
    return theBookmarks;
  }//end getIEBookmarks
  //-----------------------------------------------------//
  
  //This is a helper method called by getIEBookmarks.  The
  // purpose of this method is to extract the URL from a
  // Microsoft .url file.
  String getTheUrl(String pathAndFile){
    try{
      BufferedReader inData = new BufferedReader(
                              new FileReader(pathAndFile));
      String data; //temp holding area

      while((data = inData.readLine()) != null){
        if(data.startsWith("URL=")){
          String theUrl = data.substring(4);
          inData.close();//Close input file
          return theUrl;
        }//end if
      }//end while loop
      inData.close();//Close input file
    }catch(Exception e){
      try{
        file600.writeBytes(e + "\n\n");
      }catch(Exception ex){
        ex.printStackTrace();
      }//end catch
      e.printStackTrace();
    }//end catch
    System.out.println("No URL Found");
    return "No URL Found";
  }//end getTheUrl
  //-----------------------------------------------------//
  
  //This is an inner class, the purpose of which is to
  // encapsulate the name and the URL for a bookmark.
  class Bookmark{
    String bkMrkName;
    String bkMrkUrl;
    
    Bookmark(String bkMrkName,String bkMrkUrl){
      this.bkMrkName = bkMrkName;
      this.bkMrkUrl = bkMrkUrl;
    }//end constructor
  }//end inner class Bookmark
  //-----------------------------------------------------//
  
  //The purpose of this method is to try to connect to the
  // website specified by a given URL and to download
  // the response header lines.
  void tryToConnect(int cnt, String theName,String URL){
    String server = "";
    String theFile = "";

    //Handle cases with a file specified or with no file
    // specified but a trailing slash on the URL.
    if(URL.indexOf("/") != -1){
      server = URL.substring(0,URL.indexOf("/"));
      theFile = URL.substring(URL.indexOf("/"));
    }else
      //Handle the case of no slash and no file specified.
      if(URL.indexOf("/") == -1){
        server = URL;
        theFile = "/";
    }//end if

    int port = 80; //http port
    try{
      Socket socket = new Socket(server,port);//get socket

      //Get input and output streams from the socket      
      BufferedReader inputStream = 
                  new BufferedReader(new InputStreamReader(
                                 socket.getInputStream()));
      PrintWriter outputStream = 
                    new PrintWriter(new OutputStreamWriter(
                           socket.getOutputStream()),true);

      //Send a command to the web server asking it to
      // send back the response header lines only using the
      // HTTP 1.1 protocol.
      outputStream.println(
                          "HEAD " + theFile + " HTTP/1.1");
      outputStream.println("Host: " + server);
      //May need to modify the following for non-Windows
      // systems, (see Wikipedia reference) to cause hard
      // line breaks consisting of both a carriage return
      // and a line feed to be sent to the server.
      outputStream.println();
      outputStream.println();
      
      //Get first response header line.  We don't care
      // about the other lines.
      String line = inputStream.readLine();
      
      //Save first line of all headers in the file
      // named 000.txt.
      file000.writeBytes(cnt + " " + theName + " " + URL
                                                   + "\n");
      if(line.startsWith("HTTP/1.0")){
        file000.writeBytes(
                    "HTTP/1.0 results are not reliable\n");
      }//end if
      file000.writeBytes(line + "\n");
      file000.writeBytes("\n");

      //Save first line of all 100 series headers in the
      // file named 100.txt
      if(line.substring(9,10).equals("1")){
        file100.writeBytes(cnt + " " + theName + " " + URL
                                                   + "\n");
        if(line.startsWith("HTTP/1.0")){
          file100.writeBytes(
                    "HTTP/1.0 results are not reliable\n");
        }//end if
        file100.writeBytes(line + "\n");
        file100.writeBytes("\n");
      }//end if

      //Save first line of all 200 series headers in the
      // file named 200.txt
      if(line.substring(9,10).equals("2")){
        file200.writeBytes(cnt + " " + theName + " " + URL
                                                   + "\n");
        if(line.startsWith("HTTP/1.0")){
          file200.writeBytes(
                    "HTTP/1.0 results are not reliable\n");
        }//end if
        file200.writeBytes(line + "\n");
        file200.writeBytes("\n");
      }//end if

      //Save first line of all 300 series headers in the
      // file named 300.txt
      if(line.substring(9,10).equals("3")){
        file300.writeBytes(cnt + " " + theName + " " + URL
                                                   + "\n");
        if(line.startsWith("HTTP/1.0")){
          file300.writeBytes(
                    "HTTP/1.0 results are not reliable\n");
        }//end if
        file300.writeBytes(line + "\n");
        file300.writeBytes("\n");
      }//end if

      //Save first line of all 400 series headers in the
      // file named 400.txt
      if(line.substring(9,10).equals("4")){
        file400.writeBytes(cnt + " " + theName + " " + URL
                                                   + "\n");
        if(line.startsWith("HTTP/1.0")){
          file400.writeBytes(
                    "HTTP/1.0 results are not reliable\n");
        }//end if
        file400.writeBytes(line + "\n");
        file400.writeBytes("\n");
      }//end if
      
      //Save first line of all 500 series headers in the
      // file named 500.txt
      if(line.substring(9,10).equals("5")){
        file500.writeBytes(cnt + " " + theName + " " + URL
                                                   + "\n");
        if(line.startsWith("HTTP/1.0")){
          file500.writeBytes(
                    "HTTP/1.0 results are not reliable\n");
        }//end if
        file500.writeBytes(line + "\n");
        file500.writeBytes("\n");
      }//end if

      socket.close();
    }//end try
    catch(Exception e){
      try{
        file600.writeBytes(cnt + " " + theName + "\n");
        file600.writeBytes(server + theFile + "\n");
        file600.writeBytes(e + "\n");
        file600.writeBytes("\n");
      }catch(IOException ex){
        ex.printStackTrace();
      }//end catch
    }//end catch
  }//end tryToConnect
  //-----------------------------------------------------//
}//end class Bookmarks10 definition

Listing 20
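The tryToConnect method contains a comment warning that println may not send the carriage return and line feed pair that HTTP expects on every platform. This is because println writes the local platform's own line separator. One way to avoid the problem is to build the request as a string with explicit "\r\n" terminators. The following is a minimal sketch of that approach and is not part of the original program. The class and method names (HeadRequestSketch, splitServerAndFile, buildHeadRequest) are hypothetical. The sketch also assumes, as tryToConnect appears to, that the "http://" prefix has already been removed from the URL before it is split into a server name and a file.

```java
// Hypothetical helper class; NOT part of the program in Listing 20.
final class HeadRequestSketch {

  private HeadRequestSketch() {}

  // Splits a URL whose "http://" prefix has already been removed
  // (an assumption) into {server, file}. A URL with no slash
  // gets "/" as its file, mirroring tryToConnect.
  static String[] splitServerAndFile(String url) {
    int slash = url.indexOf('/');
    if (slash == -1) {
      return new String[] {url, "/"};
    }
    return new String[] {url.substring(0, slash), url.substring(slash)};
  }

  // Builds a HEAD request in which every line ends with CR LF,
  // whatever line separator the local platform uses.
  static String buildHeadRequest(String server, String file) {
    return "HEAD " + file + " HTTP/1.1\r\n"
        + "Host: " + server + "\r\n"
        + "\r\n"; // the empty line ends the request headers
  }

  public static void main(String[] args) {
    String[] parts = splitServerAndFile("www.example.com/dir/page.html");
    String request = buildHeadRequest(parts[0], parts[1]);
    // Make the CR and LF characters visible when printing.
    System.out.println(request.replace("\r", "\\r").replace("\n", "\\n"));
  }
}
```

To send such a request, you would call outputStream.print(request) and then outputStream.flush(). A PrintWriter created with automatic flushing flushes only when println, printf, or format is invoked, not when print is invoked.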
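Listing 20 repeats nearly the same block five times, once for each status-code series. If you wanted to collapse those blocks, one possible approach is to extract the first digit of the status code and look up the destination in a map. The following is a minimal sketch under that assumption and is not the code in Listing 20. The class StatusSorterSketch and its methods are hypothetical. Any Appendable (a StringBuilder here, or for example a Writer in a real program) stands in for the file100 through file500 outputs, and the all-headers file (000.txt) is not modeled. The sketch also checks explicitly for a null or malformed status line. Listing 20 instead leaves such lines to its catch block, which records them in file600.

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical refactoring sketch; NOT the code in Listing 20.
final class StatusSorterSketch {

  // Destination for each status-code series, keyed by first digit.
  private final Map<Character, Appendable> bySeries = new HashMap<>();
  // Destination for lines whose series is unknown or unregistered.
  private final Appendable unexpected;

  StatusSorterSketch(Appendable unexpected) {
    this.unexpected = unexpected;
  }

  void register(char series, Appendable out) {
    bySeries.put(series, out);
  }

  // Returns the first digit of the status code in a line such as
  // "HTTP/1.1 404 Not Found", or '?' if the line is null or is
  // not shaped like an HTTP status line.
  static char seriesOf(String statusLine) {
    if (statusLine == null || !statusLine.startsWith("HTTP/")) {
      return '?';
    }
    String[] parts = statusLine.split(" ");
    if (parts.length < 2 || parts[1].length() != 3
        || !Character.isDigit(parts[1].charAt(0))) {
      return '?';
    }
    return parts[1].charAt(0);
  }

  // Appends one report entry, in the same layout tryToConnect
  // uses, to the destination registered for the line's series.
  void record(int cnt, String name, String url, String statusLine)
      throws IOException {
    Appendable out = bySeries.getOrDefault(seriesOf(statusLine), unexpected);
    out.append(cnt + " " + name + " " + url + "\n");
    if (statusLine != null && statusLine.startsWith("HTTP/1.0")) {
      out.append("HTTP/1.0 results are not reliable\n");
    }
    out.append(statusLine + "\n\n");
  }
}
```

With this design, adding or rerouting a series requires only one register call instead of another copied if block.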

Copyright 2006, Richard G. Baldwin.  Reproduction in whole or in part in any
form or medium without express written permission from Richard Baldwin is
prohibited.

About the author

Richard Baldwin is a
college professor (at Austin Community College in Austin, TX) and private
consultant whose primary focus is a combination of Java, C#, and XML. In
addition to the many platform- and/or language-independent benefits of Java and
C# applications, he believes that a combination of Java, C#, and XML will become
the primary driving force in the delivery of structured information on the Web.

Richard has participated in numerous consulting projects and he
frequently provides onsite training at the high-tech companies located in and
around Austin, Texas.  He is the author of Baldwin’s Programming
Tutorials, which have gained a
worldwide following among experienced and aspiring programmers. He has also
published articles in JavaPro magazine.

In addition to his programming expertise, Richard has many years of
practical experience in Digital Signal Processing (DSP).  His first job after he
earned his Bachelor’s degree was doing DSP in the Seismic Research Department of
Texas Instruments.  (TI is still a world leader in DSP.)  In the following
years, he applied his programming and DSP expertise to other interesting areas
including sonar and underwater acoustics.

Richard holds an MSEE degree from Southern Methodist University and has
many years of experience in the application of computer technology to real-world
problems.

