http://www.developer.com/

Using Java to Clean Up Your Bookmark Library


March 21, 2006

Java Programming Notes # 2410


Preface

Many of us who have been using browsers on the web for many years have accumulated vast bookmark libraries containing many broken bookmarks.  In my own case, before I embarked on my bookmark cleanup campaign, I had accumulated more than 5,200 bookmarks, many of which had probably been broken for years.

In this lesson, I will show you how to write a program that will identify potentially broken bookmarks so that you can either delete them from your library or repair them.  The program works for Firefox and Netscape bookmark libraries as well as Internet Explorer Favorites libraries.

Viewing tip

You may find it useful to open another copy of this lesson in a separate browser window.  That will make it easier for you to scroll back and forth among the different listings and figures while you are reading about them.

Supplementary material

I recommend that you also study the other lessons in my extensive collection of online Java tutorials.  You will find those lessons published at Gamelan.com.  However, as of the date of this writing, Gamelan doesn't maintain a consolidated index of my Java tutorial lessons, and sometimes they are difficult to locate there.  You will find a consolidated index at www.DickBaldwin.com.

General Background Information

A Firefox bookmark library

Firefox and Netscape use the same technique for creating and maintaining a bookmark library.  In particular, by default, the bookmarks are stored in a file named bookmarks.html that you will find somewhere on your hard disk in an area that is dedicated to the browser.

(Internet Explorer, on the other hand, uses a completely different approach to creating and maintaining its library of Favorites.  This program is compatible with the approaches used by all three programs.)
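
Because bookmarks.html is an ordinary HTML file, each bookmark appears in it as a hyperlink.  The following is only a minimal sketch of one way to pull the names and URLs out of such a file with a regular expression.  The class name BookmarkHtmlSketch, the method name extract, and the sample HTML text are all made up for illustration, and the sketch assumes that each bookmark is written as a single anchor tag with a quoted HREF attribute.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BookmarkHtmlSketch {
    // Matches <A HREF="url" ...>name</A>, ignoring case and extra attributes.
    // Assumption: the HREF value is always enclosed in double quotes.
    private static final Pattern ANCHOR = Pattern.compile(
        "<A\\s+[^>]*?HREF=\"([^\"]*)\"[^>]*>(.*?)</A>",
        Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    /** Returns {name, url} pairs in the order they appear in the text. */
    static List<String[]> extract(String html) {
        List<String[]> result = new ArrayList<String[]>();
        Matcher m = ANCHOR.matcher(html);
        while (m.find()) {
            result.add(new String[] { m.group(2).trim(), m.group(1) });
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical fragment in the style of a bookmarks.html file.
        String sample =
            "<DT><H3>Java</H3>\n<DL><p>\n"
          + "<DT><A HREF=\"http://www.dickbaldwin.com/\" ADD_DATE=\"1127000000\">"
          + "Baldwin</A>\n</DL><p>\n";
        int index = 0;
        for (String[] bookmark : extract(sample)) {
            System.out.println(index++ + " " + bookmark[0] + " " + bookmark[1]);
        }
    }
}
```

A regular expression is a shortcut here rather than a real HTML parser, so it is only suitable for files that follow the simple one-anchor-per-bookmark layout assumed above.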

Path to the Firefox bookmark file

For example, here is the path to the Firefox bookmark file on my computer running under Windows XP:

C:\Documents and Settings\Owner\Application Data\
Mozilla\Firefox\Profiles\athy94h2.default\
bookmarks.html

Note that by default everything in and beyond the folder named Application Data is hidden.  You must select "Show hidden files and folders" under Folder Options in order to be able to see the bookmark file.  Depending on your operating system, your bookmark file may or may not be similarly located on your hard disk.

Also note that the folder named athy94h2.default appears to be a random folder name that is established when you install Firefox.

A browser view of a Firefox bookmark file

Figure 1 shows a cropped rectangular section of the Firefox browser window when it has been loaded with its own bookmark file named bookmarks.html.


Figure 1

The Firefox Bookmarks Manager view

Figure 2 shows a screen shot of the Firefox Bookmarks Manager screen, adjusted so as to view the same set of bookmarks shown in Figure 1.  You should be able to correlate the material in the upper portion of the left pane and all of the material in the right pane in Figure 2 with the material in Figure 1.  As you can see, the general structure of the browser view and the Bookmarks Manager view of the file named bookmarks.html are very similar.  As you will learn later, this is a very fortunate circumstance.


Figure 2

IE Favorites

As mentioned earlier, the approach that Microsoft uses to create and maintain the IE Favorites library is entirely different from the approach used by Firefox and Netscape.  The IE Favorites library is simply a directory tree structure rooted in a Windows folder at a location similar to the following:

C:\Documents and Settings\Owner\Favorites

Each Favorite item (bookmark) is stored in a separate text file having an extension of url.

(The Microsoft properties dialog refers to these files as Internet Shortcut files.)

The name and the URL for the bookmark

The name of the bookmark is the name of the Internet Shortcut file.

The URL for the bookmark along with some other information is stored in the Internet Shortcut file.
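
An Internet Shortcut file is plain text laid out like an INI file, with the address on a line that begins with URL=.  The following is only a minimal sketch of how the URL might be picked out of the text of such a file.  The class name ShortcutFileSketch, the method name urlFrom, and the sample file content are hypothetical.

```java
class ShortcutFileSketch {
    /**
     * Returns the value of the first line that starts with "URL="
     * (case-insensitive), or null if the text has no such line.
     */
    static String urlFrom(String shortcutText) {
        for (String line : shortcutText.split("\\r?\\n")) {
            String trimmed = line.trim();
            if (trimmed.regionMatches(true, 0, "URL=", 0, 4)) {
                return trimmed.substring(4).trim();
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Hypothetical content of a .url file.
        String sample = "[InternetShortcut]\r\n"
                      + "URL=http://www.dickbaldwin.com/\r\n";
        System.out.println(urlFrom(sample));
    }
}
```

The sketch works on text that has already been read into memory, which keeps it independent of how the file itself is opened.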

Bookmark library structure

Folders in the IE Favorites library are created by creating ordinary Windows folders as children, grandchildren, etc., of the folder named Favorites.

The Windows Explorer view

I'm going to show you three views of the IE Favorites, which unfortunately bear little resemblance to one another.  Figure 3 shows a screen shot of an ordinary Windows Explorer window in which the files have been sorted according to the Name by clicking the sorting bar at the top.


Figure 3

With the exception of the file named aacmd.bat, each of the files in Figure 3 represents an item in the Favorites library (a bookmark).  There are, in addition, other bookmarks in the folders named Adobe Studio, HP Recommended Sites, Links, and Media.  The order of the files and the folders in the view shown in Figure 3 depends on which of the sorting bars at the top has most recently been clicked.

Connecting to a server via an Internet Shortcut file

Double-clicking one of the Internet Shortcut files shown in Figure 3 will cause the default browser to attempt to connect to the server whose URL is contained in the Internet Shortcut file.

The IE Favorites view

The view shown in Figure 4 is the view taken from inside the IE browser after having clicked the button with the large gold star near the top.


Figure 4

The order is controlled by the user

As you can see, the order of the items in Figure 4 doesn't match the order of the items in Figure 3.  In fact, the user can change the order of the items in Figure 4 by selecting an item and dragging it up or down to a new location.  The user can also change the order of the items by clicking the button labeled Organize and making use of tools that are found there (see Figure 12).  Note, however, that neither of these approaches to rearranging the items in this view has any effect on the order of the actual files in the folder.

This ability to rearrange the items makes the Favorites library more convenient to use but, as you will see later, it also makes it more difficult to clean up the library by deleting or repairing broken links.

The view with the most natural order

The view that shows the Favorites items in the most natural order is the view shown in Figure 5.  This view is the result of opening a command window and executing a DIR command in the Favorites folder.

In Figure 5, you can see the names of the individual files having an extension of url.  These are the Internet Shortcut files.  The names of these files match the names of the Favorites items that appear in the view shown in Figure 4.


Figure 5

Will use this order

The order of the Internet Shortcut files shown in Figure 5 matches the processing order of the program that I will explain later.  The program processes the Favorites directory listing recursively.  Thus in the case shown in Figure 5, the program begins by processing the following three files having the url extension in the order shown:

  • .NET Development.url
  • .NET Framework Home Page.url
  • ACC WebMail Login,baldwin,ACC Email Pwd...

Then the program makes a recursive call and processes all of the files in the directory named Adobe Studio.

Once all the files in the directory named Adobe Studio have been processed (along with the files in its sub-directories, if any), the program returns to the level shown in Figure 5 and processes the file named Antivirus daily download.url.  It will continue processing files in the order shown until it encounters the directory named HP Recommended Sites.  At that point, it makes a recursive call to process the files in that directory and its sub-directories.
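
The recursive processing order described above can be sketched as follows.  This is not the code from the program, only an illustration of the technique.  The class name FavoritesWalkSketch and the method name collect are made up, and the sketch assumes that sorting the entries by name without regard to case reproduces the order of the directory listing.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class FavoritesWalkSketch {
    /** Adds every .url file under dir to found, descending into folders in name order. */
    static void collect(File dir, List<File> found) {
        File[] entries = dir.listFiles();
        if (entries == null) {
            return; // not a directory, or it could not be read
        }
        // listFiles() guarantees no particular order, so sort by name.
        Arrays.sort(entries, new Comparator<File>() {
            public int compare(File a, File b) {
                return a.getName().compareToIgnoreCase(b.getName());
            }
        });
        for (File entry : entries) {
            if (entry.isDirectory()) {
                collect(entry, found); // recursive call for a sub-folder
            } else if (entry.getName().toLowerCase().endsWith(".url")) {
                found.add(entry);
            }
        }
    }

    public static void main(String[] args) {
        File root = new File(args.length > 0 ? args[0] : ".");
        List<File> found = new ArrayList<File>();
        collect(root, found);
        for (int i = 0; i < found.size(); i++) {
            System.out.println(i + " " + found.get(i).getPath());
        }
    }
}
```

Files that are not Internet Shortcut files, such as the batch file visible in Figure 3, are simply skipped by the extension test.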

IE Favorites can be difficult to locate

What you will see later is that even when you have identified a Favorites item with a broken link, it can be difficult to locate it in the IE Favorites view shown in Figure 4 in order to delete or repair the item.  As near as I have been able to determine, that view does not provide a mechanism by which you can search for a specified item (but perhaps I overlooked that capability).

Let me see the headers ...

Generally, this program operates by attempting to contact the server specified in the URL for each bookmark and asking that server to send back the response header lines for the resource specified by the URL.

(The program requests that the server send only the response header lines and not the entire resource in order to preserve bandwidth and improve speed.)

HTTP requests

According to Wikipedia, whenever an HTTP client contacts an HTTP server, it can send one of the requests shown in Figure 6.

HTTP request methods
  • GET By far the most common method used to request for a specified URL.
  • HEAD Identical to GET, except that the page content is not returned; just the headers are. Useful for retrieving meta-information.
  • POST Similar to GET, except that a message body, typically containing key-value pairs from an HTML form submission, is included in the request.
  • PUT Used for uploading files to a specified URI on a web-server.
  • DELETE Rarely implemented, deletes a resource (i.e. a file).
  • TRACE Echoes back the received request, so that a client can see what intermediate servers are adding or changing in the request.
  • OPTIONS Returns the HTTP methods that the server supports. This can be used to check the functionality of a web server.
  • CONNECT Rarely implemented, for use with a proxy that can change to being an SSL tunnel.

HTTP servers are supposed to implement at least GET and HEAD methods and, whenever possible, also OPTIONS method.

Figure 6

Response header lines

When this program contacts a server, it sends a HEAD request using the HTTP 1.1 protocol, requesting that only the response header lines be returned.
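
To make the idea of a HEAD request concrete, the following sketch builds the text of an HTTP 1.1 HEAD request, sends it over a plain socket on port 80, and reads back only the first response line.  It is a minimal illustration under stated assumptions and not the code from the program.  The class name HeadRequestSketch, the method names, and the ten-second timeouts are all hypothetical choices.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.Socket;

class HeadRequestSketch {
    /** Builds the request text.  HTTP 1.1 requires the Host header. */
    static String buildHeadRequest(String host, String path) {
        return "HEAD " + path + " HTTP/1.1\r\n"
             + "Host: " + host + "\r\n"
             + "Connection: close\r\n"
             + "\r\n";
    }

    /** Sends the HEAD request and returns the first response line. */
    static String fetchStatusLine(String host, String path) throws IOException {
        Socket socket = new Socket();
        try {
            // Assumed timeouts: 10 seconds to connect, 10 seconds to read.
            socket.connect(new InetSocketAddress(host, 80), 10000);
            socket.setSoTimeout(10000);
            OutputStream out = socket.getOutputStream();
            out.write(buildHeadRequest(host, path).getBytes("ISO-8859-1"));
            out.flush();
            BufferedReader in = new BufferedReader(
                new InputStreamReader(socket.getInputStream(), "ISO-8859-1"));
            return in.readLine();
        } finally {
            socket.close();
        }
    }

    public static void main(String[] args) throws IOException {
        String host = args.length > 0 ? args[0] : "www.dickbaldwin.com";
        String path = args.length > 1 ? args[1] : "/";
        System.out.print(buildHeadRequest(host, path));
        System.out.println(fetchStatusLine(host, path));
    }
}
```

Any failure to resolve the host name or to connect surfaces as an exception from fetchStatusLine, which the caller is free to record rather than treat as fatal.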

(You can view request and response headers for any URL at http://web-sniffer.net/.)

For example, the entry of HTTP://WWW.DICKBALDWIN.COM/ABC into the web sniffer page shown above produced the output shown in Figure 7.

HTTP/1.1 404 Not Found
Date: Sat, 17 Sep 2005 13:56:02 GMT
Server: Apache	
Content-Length: 320
Connection: close
Content-Type: text/html; charset=iso-8859-1
Figure 7

Ignore all but the status line

This program ignores all but the first response header line, taking the content of that line as an indication of the quality of the bookmark.

According to HTTP Made Really Easy, the initial response line, often called the status line, has three parts separated by spaces:

  • The HTTP version
  • A response status code that gives the result of the request
  • An English reason phrase describing the status code

Typical HTTP 1.1 status lines

Typical HTTP 1.1 status lines from different servers are shown in Figure 8.

HTTP/1.1 200 OK
HTTP/1.1 301 Moved Permanently
HTTP/1.1 302 Moved Temporarily
HTTP/1.1 302 Found
HTTP/1.1 302 Object moved
HTTP/1.1 400 Bad Request
HTTP/1.1 401 Authorization Required
HTTP/1.1 403 Access Forbidden
HTTP/1.1 403 Invalid method
HTTP/1.1 404 Not found
HTTP/1.1 404 Object Not Found
HTTP/1.1 405 Method Not Allowed
HTTP/1.1 405
HTTP/1.1 500 Server Error
HTTP/1.1 500 Internal Server Error
HTTP/1.1 501 Method Not Implemented
HTTP/1.1 501 Method Not Supported
Figure 8

As you can see in Figure 8, the reason phrase for the same response status code varies from one server to another.
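
Because the reason phrase varies and is sometimes missing altogether, code that examines a status line should rely only on the version and the status code.  The following sketch splits a status line into its three parts and tolerates a missing reason phrase.  The class name StatusLineSketch and its members are hypothetical.

```java
class StatusLineSketch {
    final String version; // for example "HTTP/1.1"
    final int code;       // for example 404
    final String reason;  // may be empty, as in "HTTP/1.1 405"

    StatusLineSketch(String version, int code, String reason) {
        this.version = version;
        this.code = code;
        this.reason = reason;
    }

    /** Splits on the first two spaces only, so the reason phrase may contain spaces. */
    static StatusLineSketch parse(String line) {
        String[] parts = line.trim().split(" ", 3);
        if (parts.length < 2) {
            throw new IllegalArgumentException("Not a status line: " + line);
        }
        int code = Integer.parseInt(parts[1]);
        String reason = parts.length == 3 ? parts[2] : "";
        return new StatusLineSketch(parts[0], code, reason);
    }

    /** First digit of the status code: 1 through 5 for valid codes. */
    int category() {
        return code / 100;
    }

    public static void main(String[] args) {
        String[] samples = {
            "HTTP/1.1 200 OK", "HTTP/1.1 302 Object moved", "HTTP/1.1 405"
        };
        for (String sample : samples) {
            StatusLineSketch s = parse(sample);
            System.out.println(s.version + " | " + s.code + " | "
                + s.reason + " | " + s.category() + "xx");
        }
    }
}
```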

The status code

Also according to HTTP Made Really Easy,

  • The status code is meant to be computer-readable; the reason phrase is meant to be human-readable, and may vary.
  • The status code is a three-digit integer, and the first digit identifies the general category of response:
    • 1xx indicates an informational message only
    • 2xx indicates success of some kind
    • 3xx redirects the client to another URL
    • 4xx indicates an error on the client's part
    • 5xx indicates an error on the server's part

Program output

This program processes a specified bookmark library (Firefox, Netscape, or IE) and produces seven separate reports that indicate the quality of each bookmark in the library.

(For cases where the bookmark library is large, the user is allowed to specify a subset of bookmarks to process based on the positional indices of the bookmarks in the library.)

Six of the seven reports contain the status line plus additional information about the bookmarks.  The reports are written into text files named 000.txt through 600.txt.

Why do we need seven different reports?

The file named 000.txt contains information about every bookmark in the subset of bookmarks being processed.

In addition, the bookmarks are partitioned into five categories based on the first character in the status code.  The files named 100.txt through 500.txt contain information about bookmarks where the first character in the status code matches the first character in the file name.

(For example, only those bookmarks that produced a response status code beginning with the character 4, indicating an error on the client's part, are contained in the file named 400.txt.  Furthermore, those bookmarks are not contained in any report other than 000.txt, which contains all bookmarks.)
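
The mapping from a status line to a report file name can be sketched as follows.  The class name ReportNameSketch and the method name reportFileFor are made up for illustration, and returning null for a line that doesn't fit any of the five categories is an assumption of this sketch rather than a documented behavior of the program.

```java
class ReportNameSketch {
    /**
     * Returns "100.txt" through "500.txt" according to the first character
     * of the status code, or null if the line has no usable status code.
     */
    static String reportFileFor(String statusLine) {
        String[] parts = statusLine.trim().split(" ");
        if (parts.length >= 2 && parts[1].length() == 3) {
            char first = parts[1].charAt(0);
            if (first >= '1' && first <= '5') {
                return first + "00.txt";
            }
        }
        return null; // assumption: the caller decides what to do here
    }

    public static void main(String[] args) {
        System.out.println(reportFileFor("HTTP/1.1 404 Not Found"));
        System.out.println(reportFileFor("HTTP/1.1 301 Moved Permanently"));
    }
}
```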

A report on exceptions

The file named 600.txt contains information about bookmarks for which the program was unable to successfully communicate with the specified server.  Figure 9 shows some typical examples in this category.


java.net.ConnectException: Connection timed out: connect
java.net.SocketException: Network is unreachable: connect
Figure 9

Most important results for cleanup effort

Referring back to the meaning of the different status codes, it is apparent that the contents of the files named 400.txt, 500.txt, and 600.txt are the most important with regard to the task of identifying and either deleting or repairing broken bookmarks.

Sample output

Figure 10 shows an example of the type of output that is provided for the bookmarks in the files from 000.txt through 500.txt.

57 Portfolio Rules 
www2.austin.cc.tx.us./ftfac/PPportfolio.htm
HTTP/1.1 301 Moved Permanently
Figure 10

The position, name, and URL

The number at the beginning shows the position of the bookmark in the library, beginning with an index value of 0.  This number is followed by the bookmark name on the same line.  Then, the URL for the bookmark follows the bookmark name on the same line separated by a space.

(Note that in Figure 10, it was necessary to move the URL from the first line to the second line to cause the material to fit in this narrow publication format.)

The status line

The last (second) line of output for each bookmark is the status line from the response header.  This is the information that is used to categorize the bookmarks and to place the information for each bookmark in the files with names ranging from 100.txt through 500.txt.

The exception output format

The format of the information in the file named 600.txt is somewhat different from the other six files.  Figure 11 shows a typical entry in this file.

1048 WebTk Download Page
redsonja.sunlabs.com/research/tcl/
java.net.UnknownHostException: redsonja.sunlabs.com
Figure 11

The number at the beginning shows the index of the bookmark in the bookmark library.  This is followed on the same line by the bookmark name.

The second line in Figure 11 shows the URL for the bookmark.

The bookmarks in this file are those for which an exception was thrown when the program attempted to connect to the server and to request a resource.  The third line shows the error message encapsulated in the exception object.  In the case of Figure 11, for example, the exception occurred when the program contacted the Domain Name Server in an attempt to resolve the host name redsonja.sunlabs.com to an IP address.
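
The third line of such an entry has the same form as the text returned by the toString method of a Java exception object, namely the class name, a colon, and the message.  The following sketch formats an entry in the three-line layout of Figure 11.  The class name ExceptionEntrySketch and the method name formatEntry are hypothetical, and the use of a bare newline as the line separator is an assumption.

```java
import java.net.UnknownHostException;

class ExceptionEntrySketch {
    /** Builds a three-line entry: index and name, then URL, then the exception text. */
    static String formatEntry(int index, String name, String url, Exception e) {
        return index + " " + name + "\n"
             + url + "\n"
             + e.toString(); // class name + ": " + message
    }

    public static void main(String[] args) {
        Exception e = new UnknownHostException("redsonja.sunlabs.com");
        System.out.println(formatEntry(1048, "WebTk Download Page",
            "redsonja.sunlabs.com/research/tcl/", e));
    }
}
```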

Using the information in the reports

Before getting into the program code, I want to make a few comments about how you can use the information contained in the various reports to clean up your bookmark library.

Not a silver bullet

To begin with, this program is not a silver bullet that resolves all of your bookmark library problems when you run it.  Rather, while this program is extremely useful in helping you to identify broken bookmarks, it is still up to you to either delete those bookmarks from your bookmark library, or to repair them.

As mentioned earlier, for the purpose of cleaning up your bookmark library, the information contained in the files named 400.txt, 500.txt, and 600.txt is probably the most important.  These are the files that contain information about bookmarks that are potentially broken.  In addition, the file named 000.txt contains information about all of the bookmarks in the order that they appear in the bookmark library.  This information is sometimes useful for reference purposes.

Firefox bookmark problems are the easiest to deal with

As I mentioned earlier, it is easier to deal with the problems in the Firefox and Netscape bookmark libraries than it is to deal with problems in the IE Favorites library.  Therefore, I will begin my discussion with Firefox.  Since Netscape uses the same approach to creating and maintaining the bookmark library, these comments apply also to Netscape.

I recommend that you begin your cleanup effort with the file named 600.txt.  After you deal with all the bookmarks for which you are unable to connect to the server, you can process the information in the file named 400.txt.  After that, you can finish up with the file named 500.txt.

However, if you prefer a different order, you can process the files in any order that suits your needs.

Three windows on your screen

Regardless of the order in which you process the files, my recommendation is that you open three windows on your screen.

The bookmarks in browser view

The first window that you should open provides a browser view of the bookmark file named bookmarks.html.  Locate this file on your disk and copy it into another folder.  Then open the copy in your favorite browser producing a screen output similar to that shown in Figure 1.

When your library contains hundreds and possibly thousands of bookmarks, it can be very difficult to locate an individual bookmark in the library.  This view of the bookmarks is very useful in helping you to locate a bookmark that has been identified by the program as potentially broken.  You can use the search feature of the browser to search and find a bookmark with a given name in this view.

The bookmarks are hyperlinks

Also note that the bookmarks are hyperlinks in this view.  All that is needed to manually test the quality of a bookmark is to click on the hyperlink with your mouse.  That will cause the browser to attempt to connect to the server and to download the requested resource.

The bookmarks in Bookmarks Manager view

The second window that you should open on your screen is the Firefox Bookmarks Manager, producing a screen output similar to that shown in Figure 2.  This is the view that you should use to either delete or to repair bookmarks.

(While it is possible to edit the Firefox bookmark file directly, that is a bad idea unless you are very skilled at editing HTML.  It is probably also a bad idea to modify that file while Firefox is running even if you are skilled at editing HTML.)

Deleting a bookmark

You can delete a bookmark in the Bookmarks Manager view by highlighting the bookmark in this view and clicking the large red X in the top of the window (not shown in the cropped image in Figure 2).

Repairing a bookmark

You can repair a bookmark in this view by right-clicking a bookmark and selecting Properties.  This will produce a dialog in which you can edit the URL, changing it from a broken URL to a good URL.

Locating a specific bookmark

Recall that the order of the bookmarks in the browser view of Figure 1 is the same as the order of the bookmarks in the Bookmarks Manager view of Figure 2.  Once you get used to the formats involved, there is a strong visual correlation between the formats of Figure 1 and Figure 2.  Thus, once you have used the search feature of the browser to locate a bookmark in the browser view of Figure 1, it is usually an easy task to manually locate that bookmark in the Bookmarks Manager view of Figure 2.

(The Bookmarks Manager also has a very respectable search capability.  However, in the version of Firefox being used when this lesson was originally written, once you had searched for and found a bookmark in the search view of the Bookmarks Manager, you had access to the bookmark itself (or more probably a copy of the bookmark), but you could neither delete nor repair the bookmark in the search view.  Author's update:  With version 1.5, a bookmark that has been located in the search view can be deleted, or can be repaired by accessing its properties.  Even so, the result of a search doesn't provide any information about the actual location of the bookmark in the library.  Therefore, if the number of bookmarks in the library is large, something like the browser view of Figure 1 is needed to locate bookmarks in the Bookmarks Manager view.)

The broken bookmarks window

Depending on which type of problem you are addressing, the third window that you should open is one of the text files produced by the program.  If you open the files named 400.txt or 500.txt in a text editor, you should see something similar to Figure 10.  If you open the file named 600.txt in a text editor, you should see something similar to Figure 11.

Potentially broken bookmarks

Regardless of which type of problem you are addressing, each text file contains information about potentially broken bookmarks.

The order of the bookmarks in the text file is the same as the order of the bookmarks in the browser view of Figure 1 and the order of the bookmarks in the Bookmarks Manager view of Figure 2.  Thus, you can easily start at the top of the broken-bookmarks list and work your way down, or start at the bottom and work your way up.

The basic approach

The basic approach is to copy a bookmark name from the broken-bookmarks window, paste it into the search field of the browser view, and search for the bookmark.  Then scroll the Bookmarks Manager view to the same bookmark and either delete or repair it.

Since it is possible to have two or more bookmarks with the same name in the library, once you locate the bookmark in the browser view, you should compare its URL with the URL shown in the broken-bookmarks window to confirm that you have located the correct bookmark.

(In Firefox, if you point to a bookmark in the browser view, the URL for that hyperlink appears at the bottom of the browser window.)

Dealing with exceptions

My recommendation for dealing with exceptions is that you manually test each bookmark for which the program threw an exception when the attempt was made to connect to the server.  (The server may simply have been down for maintenance when the program was run.)  All that is necessary to manually test the bookmark in browser view is to click on the hyperlink identifying that bookmark.

If the manual test with the browser view indicates that a problem still exists, scroll the Bookmarks Manager view to locate the same bookmark in that view.  Then either delete or repair the bookmark.

Dealing with 400-series errors

As shown in Figure 8, there are several different kinds of errors that you are likely to encounter in the 400-series. You will need to interpret the meaning of the different kinds of errors to determine what to do about them.

My experience is that a 404 error (indicating that the requested resource could not be found) is usually pretty reliable.  After manually testing a number of errors of this type in the browser view, I concluded that unless the bookmark was one that I considered to be very important, it was not worth the effort to manually test it.  After that, whenever I encountered a 404 error, I simply scrolled the Bookmarks Manager view to that same bookmark and deleted it.

However, the true meaning of the other errors in the 400 series seems to be less definitive.  For those cases, I manually tested each bookmark before deleting it from the bookmark library.

Dealing with 500-series errors

I found the 500-series errors to be very unreliable indicators of a broken bookmark.  For the most part, I manually tested all bookmarks that produced 500-series errors in the browser view before deleting them.

Dealing with IE Favorites

I wish that I could give you similarly helpful suggestions as to how to deal with IE Favorites that show up in the reports as being potentially broken.  Unfortunately, I don't have quite as much to offer in this regard.

(Although Microsoft doesn't refer to their Favorites as bookmarks, for simplicity of writing, I will often refer to them as bookmarks in this lesson.)

Finding a specific IE bookmark

As near as I can determine, the only way to find a specific bookmark in the IE bookmark view shown in Figure 4 is to search for it visually and manually.

(The IE bookmark view shown in Figure 4 is exposed by clicking on the button with the large gold star and the word Favorites at the top of an IE browser window.)

Apparently no search capability is available

If the IE bookmark view provides any way to automatically search for a specific bookmark, I have been unable to find it.

(I confess, however, that I rarely use IE and therefore may have overlooked a search capability.)

Deleting a problem bookmark

Therefore, after you run the program and identify bookmarks that are potentially broken in your IE bookmarks library, you may need to manually and visually search the bookmarks view to locate those bookmarks if you want to delete them.

(I will show you another possible but somewhat questionable way to delete IE bookmarks later in this lesson.)

The IE Favorites organizer view

You can delete bookmarks from the IE Favorites library shown in Figure 4 by clicking the Organize link shown at the top of Figure 4 in order to produce the organizer view shown in Figure 12.


Figure 12

You can select a bookmark in the organizer view and click the Delete button to delete it from the Favorites library.

Repairing an IE bookmark

Having located a bookmark in either the IE Favorites view shown in Figure 4 or the organizer view shown in Figure 12, you can right-click on that bookmark and select Properties to expose a dialog that you can use to repair the bookmark.

However, if you just want to repair an IE bookmark, it may be easier to use the standard Windows Search tool shown in Figure 13 to find the Internet Shortcut file that represents the bookmark of interest.


Figure 13

If you are an IE user, you are probably already aware that you activate this search tool by clicking the button with the picture of the magnifying glass and the word Search at the top of a standard Windows XP Explorer window.

Searching for Internet Shortcut files

To search for a specific Internet Shortcut file representing an IE bookmark, open an Explorer window on the Favorites folder, which will probably have a path similar to the following:

C:\Documents and Settings\Owner\Favorites

Then open the search tool and enter the name of the file (which is also the name of the bookmark) in the search dialog that appears in the left pane of Figure 13.  Click the Search button.  If the file exists in the Favorites folder or one of its sub-folders, a link to the file will appear in the right pane of Figure 13 when the search is complete.

Double-click to test the bookmark

At this point, you can double-click the link in the right pane to manually test the bookmark that the file represents if such a test is needed.  You can also right-click the link and select Properties to expose a dialog that will allow you to edit the URL in order to repair it.

Deleting the file to delete the bookmark

You could also delete the file showing in the right pane of Figure 13 to delete the bookmark.  However, I'm not absolutely certain that is a safe thing to do.  Because Windows has the ability to maintain the order of the bookmarks in the IE bookmarks view (Figure 4), according to the arrangement that you create by dragging the bookmarks up and down, the Internet Shortcut files don't exist in a vacuum.  There is some linkage (possibly an index file) between the existence of the Internet Shortcut files and IE.  It is possible that deleting those files outside of IE could cause a problem with IE's ability to manage the bookmarks represented by those files.

(However, I frequently drag shortcuts onto the Links toolbar and delete shortcuts from the Links toolbar with no apparent ill effects.  The Links toolbar is apparently just another view of the Links folder shown in Figure 4.  On the basis of that experience, I suspect that it is probably safe to delete an Internet Shortcut file in order to delete an IE bookmark.  However, you might want to be a little cautious in this regard.  For example, it might be a good idea to make certain that IE isn't running when you delete the files.)

Program Preview

This section provides a preview of the program named Bookmarks10.

Purpose

The purpose of this program is to help you to clean up your bookmark library by identifying potentially broken bookmarks.  The program is compatible with bookmark libraries for the following browsers:

  • Firefox
  • Netscape
  • Internet Explorer

Processes HTTP bookmarks only

This program does not attempt to connect to secure web sites using the HTTPS protocol.  Nor does it support FTP or any other protocol besides HTTP.  If the bookmark library contains bookmarks that specify a protocol other than HTTP, those bookmarks are simply ignored.
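
A test of this kind can be as simple as checking the beginning of the URL.  The following is a minimal sketch, with the hypothetical class name ProtocolFilterSketch and method name isPlainHttp, that accepts only URLs beginning with http:// without regard to case.

```java
class ProtocolFilterSketch {
    /** True only for URLs that begin with "http://" (case-insensitive). */
    static boolean isPlainHttp(String url) {
        return url != null
            && url.trim().regionMatches(true, 0, "http://", 0, 7);
    }

    public static void main(String[] args) {
        String[] samples = {
            "HTTP://WWW.DICKBALDWIN.COM/ABC",
            "https://www.example.com/",
            "ftp://ftp.example.com/pub/"
        };
        for (String sample : samples) {
            System.out.println(sample + " " + isPlainHttp(sample));
        }
    }
}
```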

Methodology

The program attempts to connect to the server using the HTTP 1.1 protocol and to retrieve the response headers from the server for each bookmark within a specified range of bookmarks in the bookmark library.

The program uses the first line in the response header to categorize the response into one of five categories as described at http://www.jmarshall.com/easy/http/.

According to the source given above, the initial response line, often called the status line, has three parts separated by spaces:

  • The HTTP version
  • A response status code that gives the result of the request
  • An English reason phrase describing the status code.

The HTTP version is in the format "HTTP/x.x".

The status code is meant to be computer-readable.

The reason phrase is meant to be human-readable, and may vary.

Format and meaning of the status code

The status code is a three-digit integer, and the first digit identifies the general category of response:

  • 1xx indicates an informational message only
  • 2xx indicates success of some kind
  • 3xx redirects the client to another URL
  • 4xx indicates an error on the client's part
  • 5xx indicates an error on the server's part

Some typical status lines follow:

  • HTTP/1.1 200 OK
  • HTTP/1.1 301 Moved Permanently
  • HTTP/1.1 302 Moved Temporarily
  • HTTP/1.1 302 Found
  • HTTP/1.1 302 Object moved
  • HTTP/1.1 400 Bad Request
  • HTTP/1.1 401 Authorization Required
  • HTTP/1.1 403 Access Forbidden
  • HTTP/1.1 403 Invalid method
  • HTTP/1.1 404 Not found
  • HTTP/1.1 404 Object Not Found
  • HTTP/1.1 405 Method Not Allowed
  • HTTP/1.1 405
  • HTTP/1.1 500 Server Error
  • HTTP/1.1 500 Internal Server Error
  • HTTP/1.1 501 Method Not Implemented
  • HTTP/1.1 501 Method Not Supported

Note that the reason phrase does vary from one web server to another.  Also note that I haven't seen any status lines that show a status code in the 1xx range.

Program output

The first response header line along with additional information about each bookmark within the specified range is stored in a set of output text files named 100.txt through 500.txt.  The user can examine the information provided in those text files to determine the quality of each bookmark.

For those bookmarks that appear to be broken on the basis of the web server response, the user can either delete the bookmark from the library, or attempt to repair it.

The program produces two more output files in addition to the five output files described above.  A file named 000.txt contains information about every bookmark within the range of specified bookmarks.  A file named 600.txt contains information about each bookmark for which the program threw an exception when trying to connect to the server.  Some sample exceptions follow:

  • java.net.UnknownHostException: www.BadBookmark.com
  • java.net.ConnectException: Connection timed out: connect
  • java.net.SocketException: Network is unreachable: connect

Program input

The following five values must be provided as command-line parameters.  All command-line parameters are provided as strings, but must be convertible to the types shown below.

  • String bkMrkPath:  Path to the folder containing a Firefox bookmark file or containing a multitude of IE .url files.
  • String bkMrkFile:  Name of the Firefox bookmark file.  Use a dummy name for this parameter when processing IE favorites.
  • int lowBkMrkLimit:  Index of first bookmark to process.  Indices begin with 0 for the first bookmark.
  • int numToProc:  Number of bookmarks to process.
  • String browser:  Type of browser:  F for Firefox, N for Netscape, or I for Internet Explorer.

Figure 14 shows the contents of a typical batch file used to process 200 bookmarks beginning with bookmark index 100 in an IE Favorites library.

java Bookmarks10 
"C:/Documents and Settings/Owner/Favorites/" 
DummyFileName 
100 
200 
I
Figure 14

Note that it was necessary to display each of the command-line parameters on a different line in Figure 14 to force this material to fit in this narrow publication format.

Program testing

This program was tested using J2SE 5.0 under WinXP.  J2SE 5.0 or later is required due to the use of generics.

Discussion and Sample Code

The program named Bookmarks10

I will explain this program in fragments.  You can view a complete listing of the program in Listing 20 near the end of the lesson.

The class definition begins in Listing 1.  The code in Listing 1 simply declares several variables used to produce the output files.

class Bookmarks10{
  //Output text file streams
  DataOutputStream file000;
  DataOutputStream file100;
  DataOutputStream file200;
  DataOutputStream file300;
  DataOutputStream file400;
  DataOutputStream file500;
  DataOutputStream file600;

Listing 1

The main method

The main method begins in Listing 2.

  public static void main(String[] args){
    //Confirm correct number of command-line parameters.
    // If the number is not correct, display a usage msg
    // and terminate the program.
    if(args.length != 5){
      System.out.println("Command-line parameter error");
      System.out.println();
      System.out.println("Usage: java Bookmarks10");
      System.out.println("followed by:");
      System.out.println("Bookmark path");
      System.out.println("Bookmark file");
      System.out.println("Low bookmark limit");
      System.out.println("Number bookmarks to process");
      System.out.println("Browser, F, N, or I");
      
      System.out.println();
      System.out.println("Terminating Program");
      System.exit(0);      
    }//end if
    
    //The following values are provided as command-line
    // parameters.

    //Path to the folder containing a Firefox bookmark
    // file or containing a multitude of IE .url files.
    String bkMrkPath = args[0];
    //Name of the Firefox bookmark file.  Just use a 
    // dummy name for this parameter when processing IE
    // favorites
    String bkMrkFile = args[1];
    //Index of first bookmark to process.
    int lowBkMrkLimit = Integer.parseInt(args[2]);
    //Number of bookmarks to process.
    int numToProc = Integer.parseInt(args[3]);
    //Type of browser: F for Firefox, N for Navigator,
    // or I for Internet Explorer.
    String browser = args[4];
    //End of command-line parameters

Listing 2

The code in Listing 2 simply deals with the required command-line parameters and shouldn't require further explanation.
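Note that Listing 2 assumes that the third and fourth parameters can be converted to type int.  If they can't, Integer.parseInt throws a NumberFormatException, which is not caught in the main method.  The following sketch shows one way that the conversion might be guarded.  It is not part of Bookmarks10, and the names ArgCheckSketch and toNonNegativeInt are hypothetical.

```java
class ArgCheckSketch{
  //Converts a command-line string to a non-negative int.
  // Returns -1 instead of throwing an exception if the
  // string can't be converted or the value is negative.
  static int toNonNegativeInt(String arg){
    try{
      int value = Integer.parseInt(arg.trim());
      return value >= 0 ? value : -1;
    }catch(NumberFormatException e){
      return -1;
    }//end catch
  }//end toNonNegativeInt

  public static void main(String[] args){
    System.out.println(toNonNegativeInt("100"));
    System.out.println(toNonNegativeInt("abc"));
  }//end main
}//end class ArgCheckSketch
```

A caller could treat a return value of -1 the same way that Listing 2 treats an incorrect number of parameters, by displaying the usage message and terminating.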

Instantiate an object of this class

The code in Listing 3 instantiates an object of this class and stores its reference in a reference variable named thisObj.

    Bookmarks10 thisObj = new Bookmarks10();

Listing 3

The reference variable named thisObj will be used later to invoke instance methods belonging to the object.

Get name and URL for each bookmark

The code in Listing 4 gets the name and the URL for each of the bookmarks and encapsulates them in an object of type Bookmark.  All of the Bookmark objects are encapsulated in an object of type ArrayList.

    //The following collection encapsulates all of the
    // bookmarks awaiting final processing.  The
    // getIEBookmarks method requires that a method
    // parameter points to the ArrayList object on input
    // because of its recursive nature.  The
    // getFireFoxBookmarks method is not recursive and it
    // overwrites this object with a new ArrayList object
    // that it creates.
    ArrayList <Bookmark> theBookmarks = 
                                new ArrayList <Bookmark>();
    if(browser.toUpperCase().equals("F")){
      //Process Firefox bookmarks.
      theBookmarks = thisObj.getFireFoxBookmarks(
                                      bkMrkPath,bkMrkFile);
    }else if(browser.toUpperCase().equals("N")){
      //Process Netscape Navigator bookmarks.  Same format
      // as Firefox
      theBookmarks = thisObj.getFireFoxBookmarks(
                                      bkMrkPath,bkMrkFile);
    }else if(browser.toUpperCase().equals("I")){
      //Process Internet Explorer favorites.
      theBookmarks = thisObj.getIEBookmarks(
                                   bkMrkPath,theBookmarks);
    }else{
      System.out.println("Don't recognize browser");
      System.out.println("Terminating program");
      System.exit(0);
    }//end else

Listing 4

Code was explained in an earlier lesson

The code in Listing 4, along with the methods named getFireFoxBookmarks and getIEBookmarks, is very similar to code that I explained in the earlier lesson entitled Creating a Portable Bookmark Library using Java, Part 2.  Therefore, I won't explain that code again here.  Rather, I will simply refer you to that earlier lesson.  You can view those methods in Listing 20 near the end of the lesson.

Once the code in Listing 4 has executed, all of the required bookmark information has been encapsulated in an ArrayList object referred to by theBookmarks.

Process the bookmarks

Continuing with the main method, the code in Listing 5 invokes the method named processBkMrks to process all of the bookmarks that have been encapsulated in the ArrayList object.

    thisObj.processBkMrks(lowBkMrkLimit,numToProc,
                                             theBookmarks);
  }// end main

Listing 5

Listing 5 also signals the end of the main method.

The processBkMrks method

The method named processBkMrks begins in Listing 6.  This method processes bookmarks previously stored in an ArrayList object referred to by theBookmarks.

  void processBkMrks(int lowBkMrkLimit,
                     int numToProc,
                     ArrayList <Bookmark> theBookmarks){
    int eligibleCounter = 0;
    String theName = null;
    String theUrl = null;

Listing 6

This method receives a reference to the ArrayList object containing bookmark information along with information identifying the bookmarks to process.  The parameter named lowBkMrkLimit specifies the index of the first bookmark to process.  The parameter named numToProc specifies the number of bookmarks to process.

Listing 6 declares and initializes some local working variables.

Create the output files

The code in Listing 7 creates the seven output files and places one line of explanatory text in each file.

    try{
      file000 = new DataOutputStream(
                          new FileOutputStream("000.txt"));
      file000.writeBytes(
                   "This file contains all headers\n\n");
      
      file100 = new DataOutputStream(
                          new FileOutputStream("100.txt"));
      file100.writeBytes(
        "This file contains all 100-series headers\n\n");
      
      file200 = new DataOutputStream(
                          new FileOutputStream("200.txt"));
      file200.writeBytes(
        "This file contains all 200-series headers\n\n");
      
      file300 = new DataOutputStream(
                          new FileOutputStream("300.txt"));
      file300.writeBytes(
        "This file contains all 300-series headers\n\n");
      
      file400 = new DataOutputStream(
                          new FileOutputStream("400.txt"));
      file400.writeBytes(
        "This file contains all 400-series headers\n\n");
      
      file500 = new DataOutputStream(
                          new FileOutputStream("500.txt"));
      file500.writeBytes(
        "This file contains all 500-series headers\n\n");
      
      file600 = new DataOutputStream(
                          new FileOutputStream("600.txt"));
      file600.writeBytes(
              "This file contains exception output\n\n");
    }catch(IOException e){
      try{
        file600.writeBytes(e + "\n\n");
      }catch(Exception ex){
        ex.printStackTrace();
      }//end catch
      e.printStackTrace();
      System.exit(0);
    }//end catch

Listing 7

The code in Listing 7 is straightforward and shouldn't require further explanation.

Iterate on the ArrayList object

Listing 8 shows the beginning of a for loop that is used to iterate on the ArrayList object and to examine each bookmark encapsulated in the object.

    for(int msgCntr = 0;msgCntr < theBookmarks.size();
                                                msgCntr++){
      theName = theBookmarks.get(msgCntr).bkMrkName;
      theUrl = theBookmarks.get(msgCntr).bkMrkUrl;

Listing 8

The code in Listing 8 extracts and saves the name and the URL for each bookmark that it examines.

Determine eligibility

Listing 9 shows the beginning of an if statement that determines the eligibility of the current bookmark for processing based on the specified range of bookmark indices and the protocol.

      if((msgCntr >= lowBkMrkLimit) && 
                    (msgCntr < lowBkMrkLimit + numToProc)){
        //Strip off the protocol for the HTTP protocol only
        if(theUrl.substring(0,7).toUpperCase().
                                        equals("HTTP://")){
          theUrl = theUrl.substring(7);
          //This bookmark is eligible for processing.
          eligibleCounter++;
          //Display progress on standard output
          System.out.println("\n" + msgCntr + " " 
                                 + theName + " " + theUrl);
                                 
          //Try to connect to the server to retrieve the
          // response headers.
          tryToConnect(msgCntr,theName,theUrl);

Listing 9

In order to be eligible for processing, the bookmark must specify the HTTP protocol and the index of the bookmark must fall within the range specified by the user.
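The two eligibility conditions described above can be sketched as a pair of small methods.  This is an illustration only, and the names EligibilitySketch, inRange, and isPlainHttp are hypothetical.  Note one deliberate difference from Listing 9:  the sketch uses the regionMatches method instead of substring(0,7), on the assumption that a bookmark URL could be shorter than seven characters, in which case substring(0,7) would throw a StringIndexOutOfBoundsException.

```java
class EligibilitySketch{
  //True if the index falls in the range specified by the
  // user: low (inclusive) to low + numToProc (exclusive).
  static boolean inRange(int index,int low,int numToProc){
    return index >= low && index < low + numToProc;
  }//end inRange

  //True if the URL begins with http:// in any mix of upper
  // and lower case.  Returns false, rather than throwing an
  // exception, if the URL is null or too short.
  static boolean isPlainHttp(String url){
    return url != null
              && url.regionMatches(true,0,"http://",0,7);
  }//end isPlainHttp

  public static void main(String[] args){
    System.out.println(inRange(100,100,200));
    System.out.println(inRange(300,100,200));
    System.out.println(isPlainHttp("HTTP://www.a.com/"));
    System.out.println(isPlainHttp("https://www.a.com/"));
    System.out.println(isPlainHttp("ftp:"));
  }//end main
}//end class EligibilitySketch
```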

If the bookmark is determined to be eligible, the URL for the bookmark, along with some other information, is passed to a method named tryToConnect.  This method, which I will explain later, contains the code that attempts to connect to the server specified by the URL and to retrieve the response header for the specified resource.

If protocol is not HTTP

Continuing for the moment with the method named processBkMrks, the code in the else clause in Listing 10 deals with those bookmarks for which the index is in the specified range, but for which the protocol is not HTTP.

        }else{
          //This protocol can't be handled by this program.
          // Document that fact in the file named 000.txt.
          try{
            file000.writeBytes(msgCntr + " " + 
                         "Can't handle this protocol.\n");
            file000.writeBytes(
                       theName + "   " + theUrl +"\n\n");
          }catch(IOException e){
            try{
              file600.writeBytes(e + "\n\n");
            }catch(Exception ex){
              ex.printStackTrace();
            }//end catch
            e.printStackTrace();
            System.exit(0);
          }//end catch
        }//end else regarding protocol
      }//end if regarding the bookmark indices
    }//end for loop iterating on the ArrayList object

Listing 10

The code in the else clause in Listing 10 writes a notification into the file named 000.txt to the effect that the protocol is not eligible for processing.

Listing 10 also contains some cleanup code, including a catch block and the closing braces for several constructs.  One of those closing braces ends the for loop that began in Listing 8, which iterates on the bookmarks encapsulated in the ArrayList object.

Store summary information

Listing 11 stores summary information about the run at the end of the file named 000.txt and closes all output text files.

    try{
      
      file000.writeBytes("Number eligible bookmarks = " 
                                + eligibleCounter + "\n");
      file000.writeBytes("Bookmark range = " 
            + lowBkMrkLimit
           + " to " + (lowBkMrkLimit + numToProc) + "\n");
      file000.writeBytes("Total number bookmarks = " 
                            + theBookmarks.size() + "\n");
      file000.close();
      file100.close();
      file200.close();
      file300.close();
      file400.close();
      file500.close();
      file600.close();
    }catch(IOException e){
      try{
        file600.writeBytes(e + "\n\n");
      }catch(Exception ex){
        ex.printStackTrace();
      }//end catch
      e.printStackTrace();
      System.exit(0);
    }//end catch
  }//end processBkMrks

Listing 11

Listing 11 also signals the end of the method named processBkMrks.

Sample summary information

Figure 15 shows a sample of the summary information that resulted from running the program on my Firefox bookmark library.

Number eligible bookmarks = 1203
Bookmark range = 2869 to 7919
Total number bookmarks = 4088
Figure 15

As you can see, there were a little over 1200 available bookmarks between the specified beginning index and the end of the library at an index of 4087.  Of this total, 1203 were deemed to be eligible for processing.  Presumably the remaining bookmarks specified the wrong protocol.

The method named tryToConnect

Listing 12 shows the beginning of the method named tryToConnect, which is invoked on all eligible bookmarks in Listing 9.

The purpose of this method is to try to connect to the server specified by a given URL and to download the response header lines for the specified resource.

  void tryToConnect(int cnt, String theName,String URL){
    String server = "";
    String theFile = "";

    //Handle cases with a file specified or with no file
    // specified but a trailing slash on the URL.
    if(URL.indexOf("/") != -1){
      server = URL.substring(0,URL.indexOf("/"));
      theFile = URL.substring(URL.indexOf("/"));
    }else
      //Handle the case of no slash and no file specified.
      if(URL.indexOf("/") == -1){
        server = URL;
        theFile = "/";
    }//end if

Listing 12

After declaring and initializing a couple of local working variables, the code in Listing 12 gets values for the server and the resource that is requested by the bookmark.

Different URL formats

The code in Listing 12 deals with the fact that URLs can come in different formats.  For example, some URLs specify a resource and some do not.  In the latter case, the expectation is that the server will deliver a default resource, such as a file named index.html.  In this case, the resource needs to be specified as a single forward-slash character when the HEAD request is sent to the server.
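The handling of the different URL formats can be sketched in isolation as follows.  The sketch mirrors the logic of Listing 12 but is not taken from Bookmarks10, and the names UrlSplitSketch and split are hypothetical.  As in Listing 12, it assumes that the http:// prefix has already been stripped from the URL.

```java
class UrlSplitSketch{
  //Returns a two-element array containing the server name
  // followed by the requested resource.  If the URL doesn't
  // contain a slash, the resource defaults to "/".
  static String[] split(String url){
    int slash = url.indexOf('/');
    if(slash == -1){
      //No slash and no file specified.
      return new String[]{url,"/"};
    }//end if
    //A file was specified, or there was a trailing slash.
    return new String[]{
              url.substring(0,slash),url.substring(slash)};
  }//end split

  public static void main(String[] args){
    String[] urls = {"www.example.com",
                     "www.example.com/",
                     "www.example.com/a/b.html"};
    for(String url : urls){
      String[] parts = split(url);
      System.out.println(parts[0] + "  " + parts[1]);
    }//end for loop
  }//end main
}//end class UrlSplitSketch
```

The first two URLs in the main method both produce a resource consisting of a single forward-slash character.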

Get a Socket connection to the server

The code in Listing 13 tries to get a socket connection to the server on port 80, the standard HTTP port.

    int port = 80; //http port
    try{
      Socket socket = new Socket(server,port);//get socket

      //Get input and output streams from the socket      
      BufferedReader inputStream = 
                  new BufferedReader(new InputStreamReader(
                                 socket.getInputStream()));
      PrintWriter outputStream = 
                    new PrintWriter(new OutputStreamWriter(
                           socket.getOutputStream()),true);

Listing 13

If the connection is achieved, Listing 13 gets input and output streams on the socket by which the program can send a request to the server and read the response provided by the server.

If the attempt to get the socket connection fails, the code in a catch block shown later in Listing 19 will be executed to cause that failure to be noted in the output file named 600.txt.
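The constructor used in Listing 13 relies on the default connection behavior of the operating system, and a later call to readLine will wait for as long as the server takes to respond.  The following sketch shows one possible variation, which is not used in Bookmarks10, that places an upper limit on both waits.  The names TimeoutSocketSketch and open, and the timeout value of 2000 milliseconds, are assumptions made for this example.  To avoid contacting a real web server, the main method connects to a throwaway listener on the local machine.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;

class TimeoutSocketSketch{
  //Opens a socket, giving up if the connection can't be
  // made within timeoutMs milliseconds.  The same limit is
  // applied to each subsequent read on the socket.
  static Socket open(String server,int port,int timeoutMs)
                                       throws IOException{
    Socket socket = new Socket();
    try{
      socket.connect(
          new InetSocketAddress(server,port),timeoutMs);
      socket.setSoTimeout(timeoutMs);
      return socket;
    }catch(IOException e){
      socket.close();
      throw e;
    }//end catch
  }//end open

  public static void main(String[] args)
                                       throws IOException{
    //Local listener on any free port, for demonstration.
    ServerSocket listener = new ServerSocket(0);
    Socket socket = open(
               "127.0.0.1",listener.getLocalPort(),2000);
    System.out.println(socket.isConnected());
    socket.close();
    listener.close();
  }//end main
}//end class TimeoutSocketSketch
```

With this approach, a failure to connect within the time limit throws a java.net.SocketTimeoutException.  Because that class is a subclass of IOException, a catch block such as the one shown later in Listing 19 would still handle it.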

Request the response headers

The code in Listing 14 sends a HEAD request to the server asking it to send back the response header lines pertaining to the resource specified by theFile using the HTTP 1.1 protocol.

      outputStream.println(
                          "HEAD " + theFile + " HTTP/1.1");
      outputStream.println("Host: " + server);
      //May need to modify the following for non-Windows
      // systems, (see Wikipedia reference) to cause hard
      // line breaks consisting of both a carriage return
      // and a line feed to be sent to the server.
      outputStream.println();
      outputStream.println();

Listing 14

You can read more about the format requirements of the HTTP 1.1 protocol at Wikipedia.

(Note the comment in Listing 14 regarding hard line breaks and non-windows systems.)
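One way to remove the dependence on the platform line separator is to write the carriage return and line feed characters explicitly instead of relying on println.  The following sketch, which is not part of Bookmarks10, builds the complete request as a single string.  The names HeadRequestSketch and build are hypothetical, and the Connection: close header is an optional addition that I have assumed here to ask the server to close the connection after responding.

```java
class HeadRequestSketch{
  //Builds a HEAD request in which every line ends with an
  // explicit carriage return and line feed, followed by the
  // empty line that marks the end of the request headers.
  static String build(String server,String theFile){
    return "HEAD " + theFile + " HTTP/1.1\r\n"
         + "Host: " + server + "\r\n"
         + "Connection: close\r\n"
         + "\r\n";
  }//end build

  public static void main(String[] args){
    String request = build("www.example.com","/");
    //Make the line endings visible when displayed.
    System.out.println(request.replace("\r\n","<CRLF>"));
  }//end main
}//end class HeadRequestSketch
```

If a string built this way is sent using the print method of a PrintWriter object, the flush method must be called afterwards, because the automatic flushing requested in Listing 13 applies only to the println, printf, and format methods.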

Read and save the first response line header

The code in Listing 15 reads and saves the first line sent back by the server in the response header for the resource.  For the purposes of this program, we don't care about the other lines in the response header, so we don't read them.

      String line = inputStream.readLine();

Listing 15

Save the first line for all bookmarks

The code in Listing 16 writes the index value, the name, and the URL for the bookmark into the file named 000.txt.  (The first header response line itself is written into that file by the first few statements in Listing 17.)  This information can be useful later for reference purposes.

      file000.writeBytes(cnt + " " + theName + " " + URL 
                                                  + "\n");

Listing 16

Distribute the first line among different output files

The code in Listing 17 distributes another copy of the first response header line among five different output files based on the first character of the status code.  For example, all lines for which the status code begins with 2 go into the file named 200.txt, and all lines for which the status code begins with 4 go into the file named 400.txt.

      if(line.startsWith("HTTP/1.0")){
        file000.writeBytes(
                   "HTTP/1.0 results are not reliable\n");
      }//end if
      file000.writeBytes(line + "\n");
      file000.writeBytes("\n");

      //Save first line of all 100 series headers in the
      // file named 100.txt
      if(line.substring(9,10).equals("1")){
        file100.writeBytes(cnt + " " + theName + " " + URL
                                                  + "\n");
        if(line.startsWith("HTTP/1.0")){
          file100.writeBytes(
                   "HTTP/1.0 results are not reliable\n");
        }//end if
        file100.writeBytes(line + "\n");
        file100.writeBytes("\n");
      }//end if

      //Save first line of all 200 series headers in the
      // file named 200.txt
      if(line.substring(9,10).equals("2")){
        file200.writeBytes(cnt + " " + theName + " " + URL
                                                  + "\n");
        if(line.startsWith("HTTP/1.0")){
          file200.writeBytes(
                   "HTTP/1.0 results are not reliable\n");
        }//end if
        file200.writeBytes(line + "\n");
        file200.writeBytes("\n");
      }//end if

      //Save first line of all 300 series headers in the
      // file named 300.txt
      if(line.substring(9,10).equals("3")){
        file300.writeBytes(cnt + " " + theName + " " + URL
                                                  + "\n");
        if(line.startsWith("HTTP/1.0")){
          file300.writeBytes(
                   "HTTP/1.0 results are not reliable\n");
        }//end if
        file300.writeBytes(line + "\n");
        file300.writeBytes("\n");
      }//end if

      //Save first line of all 400 series headers in the
      // file named 400.txt
      if(line.substring(9,10).equals("4")){
        file400.writeBytes(cnt + " " + theName + " " + URL
                                                  + "\n");
        if(line.startsWith("HTTP/1.0")){
          file400.writeBytes(
                   "HTTP/1.0 results are not reliable\n");
        }//end if
        file400.writeBytes(line + "\n");
        file400.writeBytes("\n");
      }//end if
      
      //Save first line of all 500 series headers in the
      // file named 500.txt
      if(line.substring(9,10).equals("5")){
        file500.writeBytes(cnt + " " + theName + " " + URL
                                                  + "\n");
        if(line.startsWith("HTTP/1.0")){
          file500.writeBytes(
                   "HTTP/1.0 results are not reliable\n");
        }//end if
        file500.writeBytes(line + "\n");
        file500.writeBytes("\n");
      }//end if

Listing 17
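The five if statements in Listing 17 differ only in the digit that they test and the output stream that they write to.  The following sketch shows one way that the repeated statements might be factored into a helper method.  It is an alternative arrangement and not the code used in Bookmarks10.  The names DistributeSketch, seriesOf, and writeEntry are hypothetical, and the main method writes to an in-memory buffer instead of a disk file so that the output can be displayed.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

class DistributeSketch{
  //Returns the first digit of the status code (1 through
  // 5), taken from character position 9 of the status line
  // as in Listing 17, or -1 if there is no such digit.
  static int seriesOf(String line){
    if(line == null || line.length() < 10){
      return -1;
    }//end if
    char c = line.charAt(9);
    return (c >= '1' && c <= '5') ? c - '0' : -1;
  }//end seriesOf

  //Writes one entry in the same format as Listing 17.
  static void writeEntry(DataOutputStream out,int cnt,
                String theName,String url,String line)
                                       throws IOException{
    out.writeBytes(cnt + " " + theName + " " + url + "\n");
    if(line.startsWith("HTTP/1.0")){
      out.writeBytes(
                  "HTTP/1.0 results are not reliable\n");
    }//end if
    out.writeBytes(line + "\n");
    out.writeBytes("\n");
  }//end writeEntry

  public static void main(String[] args)
                                       throws IOException{
    ByteArrayOutputStream buffer =
                              new ByteArrayOutputStream();
    DataOutputStream out = new DataOutputStream(buffer);
    String line = "HTTP/1.1 404 Not found";
    writeEntry(out,7,"Example","www.example.com/",line);
    System.out.println("Series: " + seriesOf(line));
    System.out.print(buffer.toString());
  }//end main
}//end class DistributeSketch
```

Given an array containing the five output streams, the value returned by seriesOf could be used to select the stream that is passed to writeEntry.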

Close the connection

The code in Listing 18 closes the Socket connection.

      socket.close();
    }//end try

Listing 18

Listing 18 also signals the end of the try block that began in Listing 13.

Unable to connect

Listing 19 shows the catch block that is associated with the try block that began in Listing 13.

    catch(Exception e){
      try{
        file600.writeBytes(cnt + " " + theName + "\n");
        file600.writeBytes(server + theFile + "\n");
        file600.writeBytes(e + "\n");
        file600.writeBytes("\n");
      }catch(IOException ex){
        ex.printStackTrace();
      }//end catch
    }//end catch
  }//end tryToConnect

}//end class Bookmarks10 definition

Listing 19

The code in Listing 19 is executed if the program is unable to make the connection with the server specified by the bookmark.  In this event, information regarding the problem is recorded in the output file named 600.txt.  Figure 11 shows an example of such output.

Listing 19 also signals the end of the method named tryToConnect and the end of the class named Bookmarks10.

Run the Program

I encourage you to copy the code from Listing 20 into your text editor, compile it, and execute it.  Experiment with it, making changes, and observing the results of your changes.

If you feel really ambitious, you might want to expand the code causing the program to automatically delete broken bookmarks from the bookmark library.

Summary

In this lesson, I showed you how to write a Java program that will help you to identify broken bookmarks in your bookmark library so that you can either delete them or repair them.

I began by explaining the differences between IE bookmark libraries and Firefox/Netscape bookmark libraries.

I explained that this program identifies potentially broken bookmarks in all three kinds of bookmark libraries: IE, Firefox, and Netscape.

I explained how you can use the output produced by this program to first find and then to delete or repair broken bookmarks in your bookmark library.

I explained how the HTTP 1.1 protocol can be used to connect to a server and request the response headers associated with a specified resource.

I explained how you can use the information contained in the first response header line to assess the quality of a bookmark.

I described and provided examples of each of the seven output text files produced by this program.

Finally, I explained in detail how this program accomplishes its purpose and gave usage examples for the program.

Complete Program Listing

A complete listing of the program discussed in this lesson is shown in Listing 20 below.
 
/* File Bookmarks10.java 
Copyright 2005, R.G.Baldwin
Revised 09/15/05

The purpose of this program is to help you to clean up your
bookmark library.  It is compatible with bookmark libraries
for the following browsers:

Firefox
Netscape
Internet Explorer

For each bookmark within a specified range of bookmarks 
within the bookmark library, the program attempts to use 
the bookmark to connect to the web server using the 
HTTP 1.1 protocol and to retrieve the response headers from
the web server. It uses the first line in the response 
header to categorize the response into one of five
categories as described at 
http://www.jmarshall.com/easy/http/.

This program does not attempt to connect to secure web 
sites using the HTTPS protocol.  Also, it does not support 
FTP and other protocols.  If the bookmark library contains 
bookmarks that specify a protocol other than simple HTTP, 
those bookmarks are simply ignored.

According to the source given above, the initial response 
line, called the status line, has three parts separated by 
spaces:

1. The HTTP version
2. A response status code that gives the result of the 
   request
3. An English reason phrase describing the status code.

The HTTP version is in the format "HTTP/x.x".

The status code is meant to be computer-readable. The 
reason phrase is meant to be human-readable, and may vary.

The status code is a three-digit integer, and the first 
digit identifies the general category of response:

1xx indicates an informational message only
2xx indicates success of some kind
3xx redirects the client to another URL
4xx indicates an error on the client's part
5xx indicates an error on the server's part

The header response line along with additional information 
about each bookmark within the specified range is stored in
a set of output text files named 100.txt through 500.txt.
The user can examine the information provided in those text
files to determine the quality of the bookmark. For those 
bookmarks that are determined to have problems on the basis
of the web server response, the user can either delete the 
bookmark from the library, or attempt to repair it.

Some typical status lines follow:

HTTP/1.1 200 OK
HTTP/1.1 301 Moved Permanently
HTTP/1.1 302 Moved Temporarily
HTTP/1.1 302 Found
HTTP/1.1 302 Object moved
HTTP/1.1 400 Bad Request
HTTP/1.1 401 Authorization Required
HTTP/1.1 403 Access Forbidden
HTTP/1.1 403 Invalid method
HTTP/1.1 404 Not found
HTTP/1.1 404 Object Not Found
HTTP/1.1 405 Method Not Allowed
HTTP/1.1 405
HTTP/1.1 500 Server Error
HTTP/1.1 500 Internal Server Error
HTTP/1.1 501 Method Not Implemented
HTTP/1.1 501 Method Not Supported

Note that the reason phrase does vary from one web server
to another.  Also note that I haven't seen any status 
lines that show a status code in the 1xx range.

The status codes that are probably the most important in 
terms of cleaning up the bookmark library are those in the 
4xx and 5xx range.

In addition to the five output files described above, the 
program also produces two additional output files. A file 
named 000.txt contains information about every bookmark 
within the range of specified bookmarks.

A file named 600.txt contains information about each 
bookmark for which the program threw an exception when
trying to connect to the web site, such as the following:

java.net.UnknownHostException: www.BadBookmark.com
java.net.ConnectException: Connection timed out: connect
java.net.SocketException: Network is unreachable: connect


The following five values must be provided as command-line 
parameters.  All command-line parameters are provided as
strings, but must be convertible to the types shown below.

String bkMrkPath: Path to the folder containing a Firefox
 bookmark file or containing a multitude of IE .url files.
String bkMrkFile: Name of the Firefox bookmark file.  Just
 use a dummy name for this parameter when processing IE
 favorites
int lowBkMrkLimit: Index of first bookmark to process.
int numToProc: Number of bookmarks to process.
String browser: Type of browser: F for Firefox, N for
 Navigator, or I for Internet Explorer.

Tested using J2SE 5.0 under WinXP.  J2SE 5.0 or later is
required due to the use of generics.
**********************************************************/
import java.net.*;
import java.io.*;
import java.util.*;

class Bookmarks10{
  //Output text file streams
  DataOutputStream file000;
  DataOutputStream file100;
  DataOutputStream file200;
  DataOutputStream file300;
  DataOutputStream file400;
  DataOutputStream file500;
  DataOutputStream file600;

  public static void main(String[] args){
    //Confirm correct number of command-line parameters.
    // If the number is not correct, display a usage msg
    // and terminate the program.
    if(args.length != 5){
      System.out.println("Command-line parameter error");
      System.out.println();
      System.out.println("Usage: java Bookmarks10");
      System.out.println("followed by:");
      System.out.println("Bookmark path");
      System.out.println("Bookmark file");
      System.out.println("Low bookmark limit");
      System.out.println("Number bookmarks to process");
      System.out.println("Browser, F, N, or I");
      
      System.out.println();
      System.out.println("Terminating Program");
      System.exit(0);      
    }//end if
    
    //The following values are provided as command-line
    // parameters.

    //Path to the folder containing a Firefox bookmark
    // file or containing a multitude of IE .url files.
    String bkMrkPath = args[0];
    //Name of the Firefox bookmark file.  Just use a 
    // dummy name for this parameter when processing IE
    // favorites
    String bkMrkFile = args[1];
    //Index of first bookmark to process.
    int lowBkMrkLimit = Integer.parseInt(args[2]);
    //Number of bookmarks to process.
    int numToProc = Integer.parseInt(args[3]);
    //Type of browser: F for Firefox, N for Navigator,
    // or I for Internet Explorer.
    String browser = args[4];
    //End of command-line parameters
    
    //Instantiate a new object of this class.    
    Bookmarks10 thisObj = new Bookmarks10();

    //Get the name and the URL for each of the bookmarks.
    // Encapsulate them in an object of type Bookmark.
    // Encapsulate all of the Bookmark objects in an object
    // of type ArrayList.
    
    //The following collection encapsulates all of the
    // bookmarks awaiting final processing.  The
    // getIEBookmarks method requires that a method
    // parameter points to the ArrayList object on input
    // because of its recursive nature.  The
    // getFireFoxBookmarks method is not recursive and it
    // overwrites this object with a new ArrayList object
    // that it creates.
    ArrayList <Bookmark> theBookmarks = 
                                new ArrayList <Bookmark>();
    if(browser.toUpperCase().equals("F")){
      //Process Firefox bookmarks.
      theBookmarks = thisObj.getFireFoxBookmarks(
                                      bkMrkPath,bkMrkFile);
    }else if(browser.toUpperCase().equals("N")){
      //Process Netscape Navigator bookmarks.  Same format
      // as Firefox
      theBookmarks = thisObj.getFireFoxBookmarks(
                                      bkMrkPath,bkMrkFile);
    }else if(browser.toUpperCase().equals("I")){
      //Process Internet Explorer favorites.
      theBookmarks = thisObj.getIEBookmarks(
                                   bkMrkPath,theBookmarks);
    }else{
      System.out.println("Don't recognize browser");
      System.out.println("Terminating program");
      System.exit(0);
    }//end else

    //Process the bookmarks.
    thisObj.processBkMrks(lowBkMrkLimit,numToProc,
                                             theBookmarks);
  }// end main
  //-----------------------------------------------------//
  
  //This method processes bookmarks previously stored in an
  // ArrayList object.
  void processBkMrks(int lowBkMrkLimit,
                     int numToProc,
                     ArrayList <Bookmark> theBookmarks){
    int eligibleCounter = 0;
    String theName = null;
    String theUrl = null;
    
    //Create the output files.
    try{
      file000 = new DataOutputStream(
                          new FileOutputStream("000.txt"));
      file000.writeBytes(
                   "This file contains all headers\n\n");
      
      file100 = new DataOutputStream(
                          new FileOutputStream("100.txt"));
      file100.writeBytes(
        "This file contains all 100-series headers\n\n");
      
      file200 = new DataOutputStream(
                          new FileOutputStream("200.txt"));
      file200.writeBytes(
        "This file contains all 200-series headers\n\n");
      
      file300 = new DataOutputStream(
                          new FileOutputStream("300.txt"));
      file300.writeBytes(
        "This file contains all 300-series headers\n\n");
      
      file400 = new DataOutputStream(
                          new FileOutputStream("400.txt"));
      file400.writeBytes(
        "This file contains all 400-series headers\n\n");
      
      file500 = new DataOutputStream(
                          new FileOutputStream("500.txt"));
      file500.writeBytes(
        "This file contains all 500-series headers\n\n");
      
      file600 = new DataOutputStream(
                          new FileOutputStream("600.txt"));
      file600.writeBytes(
              "This file contains exception output\n\n");
    }catch(IOException e){
      try{
        file600.writeBytes(e + "\n\n");
      }catch(Exception ex){
        ex.printStackTrace();
      }//end catch
      e.printStackTrace();
      System.exit(0);
    }//end catch
    
    //Iterate on the bookmarks in the ArrayList collection.
    for(int msgCntr = 0;msgCntr < theBookmarks.size();
                                                msgCntr++){
      theName = theBookmarks.get(msgCntr).bkMrkName;
      theUrl = theBookmarks.get(msgCntr).bkMrkUrl;

      //Determine eligibility based on the specified 
      // range of bookmark indices and the protocol.
      if((msgCntr >= lowBkMrkLimit) && 
                    (msgCntr < lowBkMrkLimit + numToProc)){
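        //Strip off the protocol for the HTTP protocol only.
        // regionMatches returns false rather than throwing
        // an exception when the URL has fewer than seven
        // characters.
        if(theUrl.regionMatches(true,0,"HTTP://",0,7)){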
          theUrl = theUrl.substring(7);
          //This bookmark is eligible for processing.
          eligibleCounter++;
          //Display progress on standard output
          System.out.println("\n" + msgCntr + " " 
                                 + theName + " " + theUrl);
                                 
          //Try to connect to the server to retrieve the
          // response headers.
          tryToConnect(msgCntr,theName,theUrl);
        }else{
          //This protocol can't be handled by this program.
          // Document that fact in the file named 000.txt.
          try{
            file000.writeBytes(msgCntr + " " + 
                         "Can't handle this protocol.\n");
            file000.writeBytes(
                       theName + "   " + theUrl +"\n\n");
          }catch(IOException e){
            try{
              file600.writeBytes(e + "\n\n");
            }catch(Exception ex){
              ex.printStackTrace();
            }//end catch
            e.printStackTrace();
            System.exit(0);
          }//end catch
        }//end else regarding protocol
      }//end if regarding the bookmark indices
    }//end for loop iterating on the ArrayList object
    
    //Store summary information about the run in the file
    // named 000.txt and close all output text files.
    try{
      
      file000.writeBytes("Number eligible bookmarks = " 
                                + eligibleCounter + "\n");
      file000.writeBytes("Bookmark range = " 
            + lowBkMrkLimit
           + " to " + (lowBkMrkLimit + numToProc) + "\n");
      file000.writeBytes("Total number bookmarks = " 
                            + theBookmarks.size() + "\n");
      file000.close();
      file100.close();
      file200.close();
      file300.close();
      file400.close();
      file500.close();
      file600.close();
    }catch(IOException e){
      try{
        file600.writeBytes(e + "\n\n");
      }catch(Exception ex){
        ex.printStackTrace();
      }//end catch
      e.printStackTrace();
      System.exit(0);
    }//end catch
  }//end processBkMrks
  //-----------------------------------------------------//
  
  //The purpose of this method is to extract all of the
  // bookmarks and to encapsulate them in an ArrayList
  // object.  Each element in the ArrayList object is an
  // object of the inner class named Bookmark.
  //This version of the method is designed to extract
  // bookmarks from Firefox and Netscape bookmark files.
  ArrayList <Bookmark> getFireFoxBookmarks(
                        String bkMrkPath,String bkMrkFile){
    int urlIndex = 0;
    int startIndex = 0;
    int endIndex = 0;
    ArrayList <Bookmark> theBookmarks = 
                                new ArrayList <Bookmark>();
    try{
      BufferedReader bufRdr = new BufferedReader(
                 new InputStreamReader(new FileInputStream(
                                  bkMrkPath + bkMrkFile)));
      //Read each line of text from the copy of the
      // bookmark file.  If the line contains a URL,
      // extract the URL and the name of the bookmark.
      String theName = null;
      String theUrl = null;
      String data = null;
      while((data = bufRdr.readLine()) != null){
        urlIndex = data.indexOf("A HREF=\"");
        //urlIndex will be -1 if line doesn't contain
        // a URL indicated by A HREF...  In that case, just
        // ignore the line of text.
        if(urlIndex != -1){
          //Find the index of the quotation marks at the
          // beginning and the end of the URL.
          startIndex = urlIndex+8;//Index of first quote+1
          //Index of quotation mark at the end of the URL.
          endIndex = data.indexOf("\"",startIndex);
          //Extract and save the URL
          theUrl = data.substring(startIndex,endIndex);
          
          //Get and save the content of the element
          // named A.
          // Get the index of the beginning of the content.
          startIndex = data.indexOf(">",urlIndex) +1;
          //Get the index of the end of the content.
          endIndex = data.indexOf("</A>",startIndex);
          //Get and save the content
          if(endIndex > startIndex){
            //The A element is not empty.
            theName = data.substring(startIndex,endIndex);
          }else{
            //The A element is empty
            theName = "No bookmark name found.";
          }//end else

          theBookmarks.add(new Bookmark(theName,theUrl));
        }//end if
      }//end while
      bufRdr.close();
    }catch(Exception e){
      try{
        file600.writeBytes(e + "\n\n");
      }catch(Exception ex){
        ex.printStackTrace();
      }//end catch
      e.printStackTrace();
      System.exit(0);
    }//end catch
    
    return theBookmarks;
  }//end getFireFoxBookmarks
  //-----------------------------------------------------//
  
  //This method uses recursion to traverse the directory
  // tree containing IE Favorites.  Each bookmark is 
  // represented by a file with an extension of .url. The
  // name of the file is the name of the bookmark.  The
  // URL for the bookmark is contained as a line of text
  // in the file.
  ArrayList <Bookmark> getIEBookmarks(
       String bkMrkPath,ArrayList <Bookmark> theBookmarks){
   
    String theName = null;
    String theUrl = null;
    String fileName = null;
    String pathAndFile = null;
    
    //Get a File object that represents the directory.
    File fileObj = new File(bkMrkPath);
    //Make certain that the directory exists.
    if(fileObj.exists()){
      //Confirm that the File object represents a directory
      // and not a file.
      if(fileObj.isDirectory()){
        //Get a list of the directory contents in an array
        // object.
        File[] dirContents = fileObj.listFiles();
        
        //Sort the directory contents into their natural
        // order by name.  You may want to
        // disable this sort and leave the data in the
        // recursion order.  It all depends on how you plan
        // to locate the Favorites in the IE Favorites
        // display.
        //Arrays.sort(dirContents);
        //Process the contents of the directory that were
        // saved in the list of contents.
        for(int cnt = 0;cnt < dirContents.length;cnt++){
          if(dirContents[cnt].isDirectory()){
            //Make a recursive call to process this
            // directory before processing the remaining
            // contents in the list of contents.
            theBookmarks = getIEBookmarks(
                  dirContents[cnt].getPath(),theBookmarks);
          }else if(dirContents[cnt].isFile()){
            pathAndFile = dirContents[cnt].getPath();
            fileName = dirContents[cnt].getName();

            //All file names that represent bookmarks
            // should end with .url.
            if(fileName.toUpperCase().endsWith(".URL")){
              theName = fileName.substring(
                 0,fileName.toUpperCase().indexOf(".URL"));
              theUrl = getTheUrl(pathAndFile);
              theBookmarks.add(
                             new Bookmark(theName,theUrl));
            }//end if
          }//end else
        }//end for loop
      }else{
        System.out.println(
                  bkMrkPath + ": not a directory.");
      }//end else
    }else{
      System.out.println("Directory " + bkMrkPath
                                     + " does not exist.");
    }//end else
    return theBookmarks;
  }//end getIEBookmarks
  //-----------------------------------------------------//
  
  //This is a helper method called by getIEBookmarks.  The
  // purpose of this method is to extract the URL from a
  // Microsoft .url file.
  String getTheUrl(String pathAndFile){
    try{
      BufferedReader inData = new BufferedReader(
                              new FileReader(pathAndFile));
      String data; //temp holding area

      while((data = inData.readLine()) != null){
        if(data.startsWith("URL=")){
          String theUrl = data.substring(4);
          inData.close();//Close input file
          return theUrl;
        }//end if
      }//end while loop
      inData.close();//Close input file
    }catch(Exception e){
      try{
        file600.writeBytes(e + "\n\n");
      }catch(Exception ex){
        ex.printStackTrace();
      }//end catch
      e.printStackTrace();
    }//end catch
    System.out.println("No URL Found");
    return "No URL Found";
  }//end getTheUrl
  //-----------------------------------------------------//
  
  //This is an inner class, the purpose of which is to
  // encapsulate the name and the URL for a bookmark.
  class Bookmark{
    String bkMrkName;
    String bkMrkUrl;
    
    Bookmark(String bkMrkName,String bkMrkUrl){
      this.bkMrkName = bkMrkName;
      this.bkMrkUrl = bkMrkUrl;
    }//end constructor
  }//end inner class Bookmark
  //-----------------------------------------------------//
  
  //The purpose of this method is to try to connect to the
  // website specified by a given URL and to download
  // the response header lines.
  void tryToConnect(int cnt, String theName,String URL){
    String server = "";
    String theFile = "";

    //Handle cases with a file specified or with no file
    // specified but a trailing slash on the URL.
    if(URL.indexOf("/") != -1){
      server = URL.substring(0,URL.indexOf("/"));
      theFile = URL.substring(URL.indexOf("/"));
    }else
      //Handle the case of no slash and no file specified.
      if(URL.indexOf("/") == -1){
        server = URL;
        theFile = "/";
    }//end if

    int port = 80; //http port
    try{
      Socket socket = new Socket(server,port);//get socket

      //Get input and output streams from the socket      
      BufferedReader inputStream = 
                  new BufferedReader(new InputStreamReader(
                                 socket.getInputStream()));
      PrintWriter outputStream = 
                    new PrintWriter(new OutputStreamWriter(
                           socket.getOutputStream()),true);

      //Send a command to the web server asking it to
      // send back the response header lines only using the
      // HTTP 1.1 protocol.
      outputStream.println(
                          "HEAD " + theFile + " HTTP/1.1");
      outputStream.println("Host: " + server);
      //May need to modify the following for non-Windows
      // systems, (see Wikipedia reference) to cause hard
      // line breaks consisting of both a carriage return
      // and a line feed to be sent to the server.
      outputStream.println();
      outputStream.println();
      
      //Get first response header line.  We don't care
      // about the other lines.
      String line = inputStream.readLine();
      
      //Save first line of all headers in the file
      // named 000.txt.
      file000.writeBytes(cnt + " " + theName + " " + URL 
                                                  + "\n");
      if(line.startsWith("HTTP/1.0")){
        file000.writeBytes(
                   "HTTP/1.0 results are not reliable\n");
      }//end if
      file000.writeBytes(line + "\n");
      file000.writeBytes("\n");

      //Save first line of all 100 series headers in the
      // file named 100.txt
      if(line.substring(9,10).equals("1")){
        file100.writeBytes(cnt + " " + theName + " " + URL
                                                  + "\n");
        if(line.startsWith("HTTP/1.0")){
          file100.writeBytes(
                   "HTTP/1.0 results are not reliable\n");
        }//end if
        file100.writeBytes(line + "\n");
        file100.writeBytes("\n");
      }//end if

      //Save first line of all 200 series headers in the
      // file named 200.txt
      if(line.substring(9,10).equals("2")){
        file200.writeBytes(cnt + " " + theName + " " + URL
                                                  + "\n");
        if(line.startsWith("HTTP/1.0")){
          file200.writeBytes(
                   "HTTP/1.0 results are not reliable\n");
        }//end if
        file200.writeBytes(line + "\n");
        file200.writeBytes("\n");
      }//end if

      //Save first line of all 300 series headers in the
      // file named 300.txt
      if(line.substring(9,10).equals("3")){
        file300.writeBytes(cnt + " " + theName + " " + URL
                                                  + "\n");
        if(line.startsWith("HTTP/1.0")){
          file300.writeBytes(
                   "HTTP/1.0 results are not reliable\n");
        }//end if
        file300.writeBytes(line + "\n");
        file300.writeBytes("\n");
      }//end if

      //Save first line of all 400 series headers in the
      // file named 400.txt
      if(line.substring(9,10).equals("4")){
        file400.writeBytes(cnt + " " + theName + " " + URL
                                                  + "\n");
        if(line.startsWith("HTTP/1.0")){
          file400.writeBytes(
                   "HTTP/1.0 results are not reliable\n");
        }//end if
        file400.writeBytes(line + "\n");
        file400.writeBytes("\n");
      }//end if
      
      //Save first line of all 500 series headers in the
      // file named 500.txt
      if(line.substring(9,10).equals("5")){
        file500.writeBytes(cnt + " " + theName + " " + URL
                                                  + "\n");
        if(line.startsWith("HTTP/1.0")){
          file500.writeBytes(
                   "HTTP/1.0 results are not reliable\n");
        }//end if
        file500.writeBytes(line + "\n");
        file500.writeBytes("\n");
      }//end if

      socket.close();
    }//end try
    catch(Exception e){
      try{
        file600.writeBytes(cnt + " " + theName + "\n");
        file600.writeBytes(server + theFile + "\n");
        file600.writeBytes(e + "\n");
        file600.writeBytes("\n");
      }catch(IOException ex){
        ex.printStackTrace();
      }//end catch
    }//end catch
  }//end tryToConnect
  //-----------------------------------------------------//
}//end class Bookmarks10 definition

Listing 20
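
The getFireFoxBookmarks method locates each URL and bookmark name with indexOf arithmetic. As a minimal sketch only, and not part of the program in Listing 20, the same extraction could be attempted with a regular expression, which tolerates lowercase tags and returns nothing when the closing quotation mark is missing. The class name BookmarkLineSketch, the method name parse, and the sample line are hypothetical and chosen for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Bookmarks10.
final class BookmarkLineSketch {

  // Matches <A ... HREF="url" ...>name</A>, ignoring case.
  // Group 1 is the URL and group 2 is the content of the A element.
  private static final Pattern ANCHOR = Pattern.compile(
      "<A\\s[^>]*?HREF=\"([^\"]*)\"[^>]*>(.*?)</A>",
      Pattern.CASE_INSENSITIVE);

  // Returns a two-element array {name, url}, or null when the
  // line does not contain a complete anchor element.
  static String[] parse(String line) {
    Matcher m = ANCHOR.matcher(line);
    if (!m.find()) {
      return null;
    }
    String url = m.group(1);
    String name = m.group(2);
    if (name.isEmpty()) {
      // Same placeholder text that the original method uses.
      name = "No bookmark name found.";
    }
    return new String[] {name, url};
  }

  public static void main(String[] args) {
    // Hypothetical line in the style of a bookmarks.html file.
    String line = "    <DT><A HREF=\"http://www.dickbaldwin.com/\""
        + " ADD_DATE=\"1142900000\">Baldwin Tutorials</A>";
    String[] result = parse(line);
    if (result != null) {
      System.out.println(result[0] + " -> " + result[1]);
    }
  }
}
```

A line that holds no anchor element, such as a folder heading, yields null, so the caller can skip it in the same way that the original code skips lines for which urlIndex is -1.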

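The getTheUrl helper reads a .url file line by line and returns the text that follows URL=. The following is a hedged sketch of the same idea using try-with-resources, a feature added to Java after this lesson was written, so that the reader is closed on every exit path. The class name UrlFileSketch, the method name findUrl, and the sample file content are assumptions made for illustration. The method accepts any Reader so that it can be exercised without a file on disk.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical helper, not part of Bookmarks10.
final class UrlFileSketch {

  // Returns the text after "URL=" on the first line that starts
  // with that prefix, or null if no such line exists.
  static String findUrl(Reader source) {
    // try-with-resources closes the reader on every exit path.
    try (BufferedReader in = new BufferedReader(source)) {
      String data;
      while ((data = in.readLine()) != null) {
        if (data.startsWith("URL=")) {
          return data.substring(4);
        }
      }
      return null;
    } catch (IOException e) {
      // Rethrow as unchecked so callers need no throws clause.
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    // Hypothetical content in the style of an IE .url file.
    String sample =
        "[InternetShortcut]\nURL=http://www.dickbaldwin.com/\n";
    System.out.println(findUrl(new StringReader(sample)));
  }
}
```

To read an actual Favorites file, the caller would pass new FileReader(pathAndFile), just as getTheUrl does. Returning null instead of the string "No URL Found" is a design choice of this sketch. It lets the caller distinguish a missing URL from a real one.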

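The comments in tryToConnect warn that the println calls may need to change on non-Windows systems. The reason is that println writes the platform's line separator, whereas HTTP expects each header line to end with a carriage return followed by a line feed. The sketch below, which is an illustration under stated assumptions rather than the method used in Listing 20, builds the request text with explicit \r\n sequences and reads the status series by splitting the status line instead of relying on fixed character positions. The names HeadRequestSketch, buildHeadRequest, and statusSeries are hypothetical, and the Connection: close header is an addition that the original program does not send.

```java
// Hypothetical helper, not part of Bookmarks10.
final class HeadRequestSketch {

  // Builds the request text with explicit CRLF line endings so
  // that the bytes sent do not depend on the platform.
  static String buildHeadRequest(String server, String theFile) {
    return "HEAD " + theFile + " HTTP/1.1\r\n"
        + "Host: " + server + "\r\n"
        + "Connection: close\r\n" // assumption: not in the original
        + "\r\n";                 // blank line ends the headers
  }

  // Returns the hundreds digit of the status code (1 through 5),
  // or -1 if the line is null or is not an HTTP status line.
  static int statusSeries(String statusLine) {
    if (statusLine == null || !statusLine.startsWith("HTTP/")) {
      return -1;
    }
    String[] parts = statusLine.trim().split("\\s+");
    if (parts.length < 2 || parts[1].length() != 3) {
      return -1;
    }
    try {
      int series = Integer.parseInt(parts[1]) / 100;
      return (series >= 1 && series <= 5) ? series : -1;
    } catch (NumberFormatException e) {
      return -1;
    }
  }

  public static void main(String[] args) {
    // Show the request with the line endings made visible.
    String request = buildHeadRequest("www.example.com", "/");
    System.out.println(request.replace("\r\n", "\\r\\n\n"));
    System.out.println(statusSeries("HTTP/1.1 404 Not Found"));
  }
}
```

To use the request text with a socket, write it with print rather than println and then call flush, so that no extra platform-specific line ending is appended. A return value of -1 from statusSeries covers the case in which the server closes the connection without sending a status line.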
Copyright 2006, Richard G. Baldwin.  Reproduction in whole or in part in any form or medium without express written permission from Richard Baldwin is prohibited.

About the author

Richard Baldwin is a college professor (at Austin Community College in Austin, TX) and private consultant whose primary focus is a combination of Java, C#, and XML. In addition to the many platform and/or language independent benefits of Java and C# applications, he believes that a combination of Java, C#, and XML will become the primary driving force in the delivery of structured information on the Web.

Richard has participated in numerous consulting projects and he frequently provides onsite training at the high-tech companies located in and around Austin, Texas.  He is the author of Baldwin's Programming Tutorials, which have gained a worldwide following among experienced and aspiring programmers. He has also published articles in JavaPro magazine.

In addition to his programming expertise, Richard has many years of practical experience in Digital Signal Processing (DSP).  His first job after he earned his Bachelor's degree was doing DSP in the Seismic Research Department of Texas Instruments.  (TI is still a world leader in DSP.)  In the following years, he applied his programming and DSP expertise to other interesting areas including sonar and underwater acoustics.

Richard holds an MSEE degree from Southern Methodist University and has many years of experience in the application of computer technology to real-world problems.

Baldwin@DickBaldwin.com
