
Using Java to Clean Up Your Bookmark Library

  • March 21, 2006
  • By Richard G. Baldwin

Run the Program

I encourage you to copy the code from Listing 20 into your text editor, compile it, and execute it.  Experiment with it, making changes and observing the results.

If you feel really ambitious, you might want to extend the code so that the program automatically deletes broken bookmarks from the bookmark library.

Summary

In this lesson, I showed you how to write a Java program that will help you to identify broken bookmarks in your bookmark library so that you can either delete them or repair them.

I began by explaining the differences between IE bookmark libraries and Firefox/Netscape bookmark libraries.
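As a minimal sketch of that difference, based on the parsing logic in Listing 20: a Firefox or Netscape library is a single HTML file in which each bookmark appears on a line containing A HREF="...", whereas an IE library is a directory tree holding one .url file per favorite, with the address on a line that begins with URL=. The class and method names below are hypothetical and chosen only for illustration.

```java
class BookmarkFormatsSketch {
    // Firefox/Netscape: each bookmark is on a line of an HTML file, e.g.
    //   <DT><A HREF="http://www.example.com/">Example</A>
    // Returns the URL between the quotes, or null if the line has none.
    static String urlFromFirefoxLine(String line) {
        String marker = "A HREF=\"";
        int start = line.indexOf(marker);
        if (start == -1) {
            return null; // not a bookmark line
        }
        start += marker.length();
        int end = line.indexOf('"', start);
        return (end == -1) ? null : line.substring(start, end);
    }

    // IE: each favorite is its own .url file; one line of that file has
    // the form URL=http://www.example.com/
    // Returns the text after "URL=", or null for any other line.
    static String urlFromIeLine(String line) {
        return line.startsWith("URL=") ? line.substring(4) : null;
    }

    public static void main(String[] args) {
        System.out.println(urlFromFirefoxLine(
            "<DT><A HREF=\"http://www.example.com/\">Example</A>"));
        System.out.println(urlFromIeLine("URL=http://www.example.com/"));
    }
}
```

In both cases the bookmark name comes from elsewhere: the content of the A element for Firefox and Netscape, and the file name minus its .url extension for IE.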

I explained that this program identifies potentially broken bookmarks in all three kinds of bookmark libraries: IE, Firefox, and Netscape.

I explained how you can use the output produced by this program to first find and then to delete or repair broken bookmarks in your bookmark library.

I explained how the HTTP 1.1 protocol can be used to connect to a server and request the response headers associated with a specified resource.
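The request itself is short. The following sketch shows one way to build and send it. It is not the code from Listing 20: it assumes Java 7 or later (for try-with-resources and StandardCharsets), adds a Connection: close header and a ten-second read timeout as my own choices, and uses www.example.com purely as a placeholder host. The class and method names are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

class HeadRequestSketch {
    // Builds the text of a HEAD request. HTTP requires every line to
    // end with CRLF, and an empty line marks the end of the headers.
    static String buildHeadRequest(String host, String path) {
        return "HEAD " + path + " HTTP/1.1\r\n"
             + "Host: " + host + "\r\n"
             + "Connection: close\r\n" // assumption: ask server to close
             + "\r\n";
    }

    // Sends the request on port 80 and returns the first response line
    // (the status line), or null if the server closes without replying.
    static String fetchStatusLine(String host, String path)
            throws IOException {
        try (Socket socket = new Socket(host, 80)) {
            socket.setSoTimeout(10000); // assumption: 10-second timeout
            OutputStream out = socket.getOutputStream();
            out.write(buildHeadRequest(host, path)
                          .getBytes(StandardCharsets.US_ASCII));
            out.flush();
            BufferedReader in = new BufferedReader(new InputStreamReader(
                socket.getInputStream(), StandardCharsets.ISO_8859_1));
            return in.readLine();
        }
    }

    public static void main(String[] args) throws IOException {
        // Placeholder host; substitute the server from a real bookmark.
        System.out.println(fetchStatusLine("www.example.com", "/"));
    }
}
```

Because HEAD asks for the headers only, the server does not send the body of the resource, which keeps each check cheap even for large pages.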

I explained how you can use the information contained in the first response header line to assess the quality of a bookmark.
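To make that assessment concrete, here is a small sketch of how a status line might be reduced to its category digit. It is an illustration under my own assumptions rather than the method used in Listing 20, which tests the character at a fixed position in the line. The names statusCategory and looksBroken are hypothetical.

```java
class StatusLineSketch {
    // Returns the first digit of the status code (1 through 5), or 0
    // if the line is missing or is not a well-formed HTTP status line.
    static int statusCategory(String statusLine) {
        if (statusLine == null) {
            return 0;
        }
        // Expected shape: "HTTP/x.x <3-digit code> <reason phrase>".
        // The reason phrase is optional and varies between servers.
        String[] parts = statusLine.trim().split("\\s+", 3);
        if (parts.length < 2 || !parts[0].startsWith("HTTP/")
                || parts[1].length() != 3) {
            return 0;
        }
        try {
            int category = Integer.parseInt(parts[1]) / 100;
            return (category >= 1 && category <= 5) ? category : 0;
        } catch (NumberFormatException e) {
            return 0;
        }
    }

    // Treats 4xx (client error) and 5xx (server error) responses as
    // signs of a potentially broken bookmark.
    static boolean looksBroken(String statusLine) {
        int category = statusCategory(statusLine);
        return category == 4 || category == 5;
    }

    public static void main(String[] args) {
        String[] samples = {
            "HTTP/1.1 200 OK",
            "HTTP/1.1 301 Moved Permanently",
            "HTTP/1.1 404 Not found",
            "HTTP/1.1 500 Internal Server Error"
        };
        for (String s : samples) {
            System.out.println(s + " -> broken? " + looksBroken(s));
        }
    }
}
```

A 3xx response is not flagged here because the bookmark may still lead somewhere useful, although a permanent redirect suggests that the stored URL could be updated.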

I described and provided examples of each of the seven output text files produced by this program.

Finally, I explained in detail how this program accomplishes its purpose and gave usage examples for the program.

Complete Program Listing

A complete listing of the program discussed in this lesson is shown in Listing 20 below.
 
/* File Bookmarks10.java 
Copyright 2005, R.G.Baldwin
Revised 09/15/05

The purpose of this program is to help you to clean up your
bookmark library.  It is compatible with bookmark libraries
for the following browsers:

Firefox
Netscape
Internet Explorer

For each bookmark within a specified range of bookmarks 
within the bookmark library, the program attempts to use 
the bookmark to connect to the web server using the 
HTTP 1.1 protocol and to retrieve the response headers from
the web server. It uses the first line in the response 
header to categorize the response into one of five
categories as described at 
http://www.jmarshall.com/easy/http/.

This program does not attempt to connect to secure web 
sites using the HTTPS protocol.  Also, it does not support 
FTP and other protocols.  If the bookmark library contains 
bookmarks that specify a protocol other than simple HTTP, 
those bookmarks are simply ignored.

According to the source given above, the initial response 
line, called the status line, has three parts separated by 
spaces:

1. The HTTP version
2. A response status code that gives the result of the 
   request
3. An English reason phrase describing the status code.

The HTTP version is in the format "HTTP/x.x".

The status code is meant to be computer-readable. The 
reason phrase is meant to be human-readable, and may vary.

The status code is a three-digit integer, and the first 
digit identifies the general category of response:

1xx indicates an informational message only
2xx indicates success of some kind
3xx redirects the client to another URL
4xx indicates an error on the client's part
5xx indicates an error on the server's part

The header response line along with additional information 
about each bookmark within the specified range is stored in
a set of output text files named 100.txt through 500.txt.
The user can examine the information provided in those text
files to determine the quality of the bookmark. For those 
bookmarks that are determined to have problems on the basis
of the web server response, the user can either delete the 
bookmark from the library, or attempt to repair it.

Some typical status lines follow:

HTTP/1.1 200 OK
HTTP/1.1 301 Moved Permanently
HTTP/1.1 302 Moved Temporarily
HTTP/1.1 302 Found
HTTP/1.1 302 Object moved
HTTP/1.1 400 Bad Request
HTTP/1.1 401 Authorization Required
HTTP/1.1 403 Access Forbidden
HTTP/1.1 403 Invalid method
HTTP/1.1 404 Not found
HTTP/1.1 404 Object Not Found
HTTP/1.1 405 Method Not Allowed
HTTP/1.1 405
HTTP/1.1 500 Server Error
HTTP/1.1 500 Internal Server Error
HTTP/1.1 501 Method Not Implemented
HTTP/1.1 501 Method Not Supported

Note that the reason phrase does vary from one web server
to another.  Also note that I haven't seen any status 
lines that show a status code in the 1xx range.

The status codes that are probably the most important in 
terms of cleaning up the bookmark library are those in the 
4xx and 5xx range.

In addition to the five output files described above, the 
program also produces two additional output files. A file 
named 000.txt contains information about every bookmark 
within the range of specified bookmarks.

A file named 600.txt contains information about each 
bookmark for which the program threw an exception when
trying to connect to the web site, such as the following:

java.net.UnknownHostException: www.BadBookmark.com
java.net.ConnectException: Connection timed out: connect
java.net.SocketException: Network is unreachable: connect


The following five values must be provided as command-line 
parameters.  All command-line parameters are provided as
strings, but must be convertible to the types shown below.

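String bkMrkPath: Path to the folder containing a Firefox
 bookmark file or containing a multitude of IE .url files.
 For Firefox or Netscape, end the path with a separator
 because the program concatenates it with bkMrkFile.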
String bkMrkFile: Name of the Firefox bookmark file.  Just
 use a dummy name for this parameter when processing IE
 favorites
int lowBkMrkLimit: Index of first bookmark to process.
int numToProc: Number of bookmarks to process.
String browser: Type of browser: F for Firefox, N for
 Navigator, or I for Internet Explorer.

Tested using J2SE 5.0 under WinXP.  J2SE 5.0 or later is
required due to the use of generics.
**********************************************************/
import java.net.*;
import java.io.*;
import java.util.*;

class Bookmarks10{
  //Output text file streams
  DataOutputStream file000;
  DataOutputStream file100;
  DataOutputStream file200;
  DataOutputStream file300;
  DataOutputStream file400;
  DataOutputStream file500;
  DataOutputStream file600;

  public static void main(String[] args){
    //Confirm correct number of command-line parameters.
    // If the number is not correct, display a usage msg
    // and terminate the program.
    if(args.length != 5){
      System.out.println("Command-line parameter error");
      System.out.println();
      System.out.println("Usage: java Bookmarks10");
      System.out.println("followed by:");
      System.out.println("Bookmark path");
      System.out.println("Bookmark file");
      System.out.println("Low bookmark limit");
      System.out.println("Number bookmarks to process");
      System.out.println("Browser, F, N, or I");
      
      System.out.println();
      System.out.println("Terminating Program");
      System.exit(0);      
    }//end if
    
    //The following values are provided as command-line
    // parameters.

    //Path to the folder containing a Firefox bookmark
    // file or containing a multitude of IE .url files.
    String bkMrkPath = args[0];
    //Name of the Firefox bookmark file.  Just use a 
    // dummy name for this parameter when processing IE
    // favorites
    String bkMrkFile = args[1];
    //Index of first bookmark to process.
    int lowBkMrkLimit = Integer.parseInt(args[2]);
    //Number of bookmarks to process.
    int numToProc = Integer.parseInt(args[3]);
    //Type of browser: F for Firefox, N for Navigator,
    // or I for Internet Explorer.
    String browser = args[4];
    //End of command-line parameters
    
    //Instantiate a new object of this class.    
    Bookmarks10 thisObj = new Bookmarks10();

    //Get the name and the URL for each of the bookmarks.
    // Encapsulate them in an object of type Bookmark.
    // Encapsulate all of the Bookmark objects in an object
    // of type ArrayList.
    
    //The following collection encapsulates all of the
    // bookmarks awaiting final processing.  The
    // getIEBookmarks method requires that a method
    // parameter points to the ArrayList object on input
    // because of its recursive nature.  The
    // getFireFoxBookmarks method is not recursive and it
    // overwrites this object with a new ArrayList object
    // that it creates.
    ArrayList <Bookmark> theBookmarks = 
                                new ArrayList <Bookmark>();
    if(browser.toUpperCase().equals("F")){
      //Process Firefox bookmarks.
      theBookmarks = thisObj.getFireFoxBookmarks(
                                      bkMrkPath,bkMrkFile);
    }else if(browser.toUpperCase().equals("N")){
      //Process Netscape Navigator bookmarks.  Same format
      // as Firefox
      theBookmarks = thisObj.getFireFoxBookmarks(
                                      bkMrkPath,bkMrkFile);
    }else if(browser.toUpperCase().equals("I")){
      //Process Internet Explorer favorites.
      theBookmarks = thisObj.getIEBookmarks(
                                   bkMrkPath,theBookmarks);
    }else{
      System.out.println("Don't recognize browser");
      System.out.println("Terminating program");
      System.exit(0);
    }//end else

    //Process the bookmarks.
    thisObj.processBkMrks(lowBkMrkLimit,numToProc,
                                             theBookmarks);
  }// end main
  //-----------------------------------------------------//
  
  //This method processes bookmarks previously stored in an
  // ArrayList object.
  void processBkMrks(int lowBkMrkLimit,
                     int numToProc,
                     ArrayList <Bookmark> theBookmarks){
    int eligibleCounter = 0;
    String theName = null;
    String theUrl = null;
    
    //Create the output files.
    try{
      file000 = new DataOutputStream(
                          new FileOutputStream("000.txt"));
      file000.writeBytes(
                     "This file contains all headers\n\n");

      file100 = new DataOutputStream(
                          new FileOutputStream("100.txt"));
      file100.writeBytes(
          "This file contains all 100-series headers\n\n");

      file200 = new DataOutputStream(
                          new FileOutputStream("200.txt"));
      file200.writeBytes(
          "This file contains all 200-series headers\n\n");

      file300 = new DataOutputStream(
                          new FileOutputStream("300.txt"));
      file300.writeBytes(
          "This file contains all 300-series headers\n\n");

      file400 = new DataOutputStream(
                          new FileOutputStream("400.txt"));
      file400.writeBytes(
          "This file contains all 400-series headers\n\n");

      file500 = new DataOutputStream(
                          new FileOutputStream("500.txt"));
      file500.writeBytes(
          "This file contains all 500-series headers\n\n");

      file600 = new DataOutputStream(
                          new FileOutputStream("600.txt"));
      file600.writeBytes(
                "This file contains exception output\n\n");
    }catch(IOException e){
      try{
        file600.writeBytes(e + "\n\n");
      }catch(Exception ex){
        ex.printStackTrace();
      }//end catch
      e.printStackTrace();
      System.exit(0);
    }//end catch
    
    //Iterate on the bookmarks in the ArrayList collection.
    for(int msgCntr = 0;msgCntr < theBookmarks.size();
                                                msgCntr++){
      theName = theBookmarks.get(msgCntr).bkMrkName;
      theUrl = theBookmarks.get(msgCntr).bkMrkUrl;

      //Determine eligibility based on the specified 
      // range of bookmark indices and the protocol.
      if((msgCntr >= lowBkMrkLimit) && 
                    (msgCntr < lowBkMrkLimit + numToProc)){
        //Strip off the protocol for the HTTP protocol only.
        // regionMatches ignores case and is safe for URLs
        // shorter than seven characters.
        if(theUrl.regionMatches(true,0,"HTTP://",0,7)){
          theUrl = theUrl.substring(7);
          //This bookmark is eligible for processing.
          eligibleCounter++;
          //Display progress on standard output
          System.out.println("\n" + msgCntr + " " 
                                 + theName + " " + theUrl);
                                 
          //Try to connect to the server to retrieve the
          // response headers.
          tryToConnect(msgCntr,theName,theUrl);
        }else{
          //This protocol can't be handled by this program.
          // Document that fact in the file named 000.txt.
          try{
            file000.writeBytes(msgCntr + " " + 
                          "Can't handle this protocol.\n");
            file000.writeBytes(
                         theName + "   " + theUrl +"\n\n");
          }catch(IOException e){
            try{
              file600.writeBytes(e + "\n\n");
            }catch(Exception ex){
              ex.printStackTrace();
            }//end catch
            e.printStackTrace();
            System.exit(0);
          }//end catch
        }//end else regarding protocol
      }//end if regarding the bookmark indices
    }//end for loop iterating on the ArrayList object
    
    //Store summary information about the run in the file
    // named 000.txt and close all output text files.
    try{
      
      file000.writeBytes("Number eligible bookmarks = " 
                                 + eligibleCounter + "\n");
      file000.writeBytes("Bookmark range = " 
            + lowBkMrkLimit
            + " to " + (lowBkMrkLimit + numToProc) + "\n");
      file000.writeBytes("Total number bookmarks = " 
                             + theBookmarks.size() + "\n");
      file000.close();
      file100.close();
      file200.close();
      file300.close();
      file400.close();
      file500.close();
      file600.close();
    }catch(IOException e){
      try{
        file600.writeBytes(e + "\n\n");
      }catch(Exception ex){
        ex.printStackTrace();
      }//end catch
      e.printStackTrace();
      System.exit(0);
    }//end catch
  }//end processBkMrks
  //-----------------------------------------------------//
  
  //The purpose of this method is to extract all of the
  // bookmarks and to encapsulate them in an ArrayList
  // object.  Each element in the ArrayList object is an
  // object of the inner class named Bookmark.
  //This version of the method is designed to extract
  // bookmarks from Firefox and Netscape bookmark files.
  ArrayList <Bookmark> getFireFoxBookmarks(
                        String bkMrkPath,String bkMrkFile){
    int urlIndex = 0;
    int startIndex = 0;
    int endIndex = 0;
    ArrayList <Bookmark> theBookmarks = 
                                new ArrayList <Bookmark>();
    try{
      BufferedReader bufRdr = new BufferedReader(
                 new InputStreamReader(new FileInputStream(
                                  bkMrkPath + bkMrkFile)));
      //Read each line of text from the copy of the
      // bookmark file.  If the line contains a URL,
      // extract the URL and the name of the bookmark.
      String theName = null;
      String theUrl = null;
      String data = null;
      while((data = bufRdr.readLine()) != null){
        urlIndex = data.indexOf("A HREF=\"");
        //urlIndex will be -1 if line doesn't contain
        // a URL indicated by A HREF...  In that case, just
        // ignore the line of text.
        if(urlIndex != -1){
          //Find the index of the quotation marks at the
          // beginning and the end of the URL.
          startIndex = urlIndex+8;//Index of first quote+1
          //Index of quotation mark at the end of the URL.
          endIndex = data.indexOf("\"",startIndex);
          //Extract and save the URL
          theUrl = data.substring(startIndex,endIndex);
          
          //Get and save the content of the element
          // named A.
          // Get the index of the beginning of the content.
          startIndex = data.indexOf(">",urlIndex) +1;
          //Get the index of the end of the content.
          endIndex = data.indexOf("</A>",startIndex);
          //Get and save the content
          if(endIndex > startIndex){
            //The A element is not empty.
            theName = data.substring(startIndex,endIndex);
          }else{
            //The A element is empty
            theName = "No bookmark name found.";
          }//end else

          theBookmarks.add(new Bookmark(theName,theUrl));
        }//end if
      }//end while
      bufRdr.close();
    }catch(Exception e){
      try{
        file600.writeBytes(e + "\n\n");
      }catch(Exception ex){
        ex.printStackTrace();
      }//end catch
      e.printStackTrace();
      System.exit(0);
    }//end catch
    
    return theBookmarks;
  }//end getFireFoxBookmarks
  //-----------------------------------------------------//
  
  //This method uses recursion to traverse the directory
  // tree containing IE Favorites.  Each bookmark is 
  // represented by a file with an extension of .url. The
  // name of the file is the name of the bookmark.  The
  // URL for the bookmark is contained as a line of text
  // in the file.
  ArrayList <Bookmark> getIEBookmarks(
       String bkMrkPath,ArrayList <Bookmark> theBookmarks){
   
    String theName = null;
    String theUrl = null;
    String fileName = null;
    String pathAndFile = null;
    
    //Get a File object that represents the directory.
    File fileObj = new File(bkMrkPath);
    //Make certain that the directory exists.
    if(fileObj.exists()){
      //Confirm that the File object represents a directory
      // and not a file.
      if(fileObj.isDirectory()){
        //Get a list of the directory contents in an array
        // object.
        File[] dirContents = fileObj.listFiles();
        
        //Sort the directory contents into natural order
        // by name.  You may want to
        // disable this sort and leave the data in the
        // recursion order.  It all depends on how you plan
        // to locate the Favorites in the IE Favorites
        // display.
        //Arrays.sort(dirContents);
        //Process the contents of the directory that were
        // saved in the list of contents.
        for(int cnt = 0;cnt < dirContents.length;cnt++){
          if(dirContents[cnt].isDirectory()){
            //Make a recursive call to process this
            // directory before processing the remaining
            // contents in the list of contents.
            theBookmarks = getIEBookmarks(
                  dirContents[cnt].getPath(),theBookmarks);
          }else if(dirContents[cnt].isFile()){
            pathAndFile = dirContents[cnt].getPath();
            fileName = dirContents[cnt].getName();

            //All file names that represent bookmarks
            // should end with .url.
            if(fileName.toUpperCase().endsWith(".URL")){
              theName = fileName.substring(
                 0,fileName.toUpperCase().indexOf(".URL"));
              theUrl = getTheUrl(pathAndFile);
              theBookmarks.add(
                             new Bookmark(theName,theUrl));
            }//end if
          }//end else
        }//end for loop
      }else{
        System.out.println(
                  bkMrkPath + ": not a directory.");
      }//end else
    }else{
      System.out.println("Directory " + bkMrkPath
                                     + " does not exist.");
    }//end else
    return theBookmarks;
  }//end getIEBookmarks
  //-----------------------------------------------------//
  
  //This is a helper method called by getIEBookmarks.  The
  // purpose of this method is to extract the URL from a
  // Microsoft .url file.
  String getTheUrl(String pathAndFile){
    try{
      BufferedReader inData = new BufferedReader(
                              new FileReader(pathAndFile));
      String data; //temp holding area

      while((data = inData.readLine()) != null){
        if(data.startsWith("URL=")){
          String theUrl = data.substring(4);
          inData.close();//Close input file
          return theUrl;
        }//end if
      }//end while loop
      inData.close();//Close input file
    }catch(Exception e){
      try{
        file600.writeBytes(e + "\n\n");
      }catch(Exception ex){
        ex.printStackTrace();
      }//end catch
      e.printStackTrace();
    }//end catch
    System.out.println("No URL Found");
    return "No URL Found";
  }//end getTheUrl
  //-----------------------------------------------------//
  
  //This is an inner class, the purpose of which is to
  // encapsulate the name and the URL for a bookmark.
  class Bookmark{
    String bkMrkName;
    String bkMrkUrl;
    
    Bookmark(String bkMrkName,String bkMrkUrl){
      this.bkMrkName = bkMrkName;
      this.bkMrkUrl = bkMrkUrl;
    }//end constructor
  }//end inner class Bookmark
  //-----------------------------------------------------//
  
  //The purpose of this method is to try to connect to the
  // website specified by a given URL and to download
  // the response header lines.
  void tryToConnect(int cnt, String theName,String URL){
    String server = "";
    String theFile = "";

    //Handle cases with a file specified or with no file
    // specified but a trailing slash on the URL.
    if(URL.indexOf("/") != -1){
      server = URL.substring(0,URL.indexOf("/"));
      theFile = URL.substring(URL.indexOf("/"));
    }else
      //Handle the case of no slash and no file specified.
      if(URL.indexOf("/") == -1){
        server = URL;
        theFile = "/";
    }//end if

    int port = 80; //http port
    try{
      Socket socket = new Socket(server,port);//get socket

      //Get input and output streams from the socket      
      BufferedReader inputStream = 
                  new BufferedReader(new InputStreamReader(
                                 socket.getInputStream()));
      PrintWriter outputStream = 
                    new PrintWriter(new OutputStreamWriter(
                           socket.getOutputStream()),true);

      //Send a command to the web server asking it to
      // send back the response header lines only using the
      // HTTP 1.1 protocol.
      //HTTP requires each request line to end with a
      // carriage return and a line feed regardless of the
      // platform, so send them explicitly rather than
      // relying on println.  A blank line ends the request.
      outputStream.print(
                      "HEAD " + theFile + " HTTP/1.1\r\n");
      outputStream.print("Host: " + server + "\r\n");
      outputStream.print("\r\n");
      outputStream.flush();
      
      //Get first response header line.  We don't care
      // about the other lines.
      String line = inputStream.readLine();
      
      //Save first line of all headers in the file
      // named 000.txt.
      file000.writeBytes(cnt + " " + theName + " " + URL 
                                                   + "\n");
      if(line.startsWith("HTTP/1.0")){
        file000.writeBytes(
                    "HTTP/1.0 results are not reliable\n");
      }//end if
      file000.writeBytes(line + "\n");
      file000.writeBytes("\n");

      //Save first line of all 100 series headers in the
      // file named 100.txt
      if(line.substring(9,10).equals("1")){
        file100.writeBytes(cnt + " " + theName + " " + URL
                                                   + "\n");
        if(line.startsWith("HTTP/1.0")){
          file100.writeBytes(
                    "HTTP/1.0 results are not reliable\n");
        }//end if
        file100.writeBytes(line + "\n");
        file100.writeBytes("\n");
      }//end if

      //Save first line of all 200 series headers in the
      // file named 200.txt
      if(line.substring(9,10).equals("2")){
        file200.writeBytes(cnt + " " + theName + " " + URL
                                                   + "\n");
        if(line.startsWith("HTTP/1.0")){
          file200.writeBytes(
                    "HTTP/1.0 results are not reliable\n");
        }//end if
        file200.writeBytes(line + "\n");
        file200.writeBytes("\n");
      }//end if

      //Save first line of all 300 series headers in the
      // file named 300.txt
      if(line.substring(9,10).equals("3")){
        file300.writeBytes(cnt + " " + theName + " " + URL
                                                   + "\n");
        if(line.startsWith("HTTP/1.0")){
          file300.writeBytes(
                    "HTTP/1.0 results are not reliable\n");
        }//end if
        file300.writeBytes(line + "\n");
        file300.writeBytes("\n");
      }//end if

      //Save first line of all 400 series headers in the
      // file named 400.txt
      if(line.substring(9,10).equals("4")){
        file400.writeBytes(cnt + " " + theName + " " + URL
                                                   + "\n");
        if(line.startsWith("HTTP/1.0")){
          file400.writeBytes(
                    "HTTP/1.0 results are not reliable\n");
        }//end if
        file400.writeBytes(line + "\n");
        file400.writeBytes("\n");
      }//end if

      //Save first line of all 500 series headers in the
      // file named 500.txt
      if(line.substring(9,10).equals("5")){
        file500.writeBytes(cnt + " " + theName + " " + URL
                                                   + "\n");
        if(line.startsWith("HTTP/1.0")){
          file500.writeBytes(
                    "HTTP/1.0 results are not reliable\n");
        }//end if
        file500.writeBytes(line + "\n");
        file500.writeBytes("\n");
      }//end if

      socket.close();
    }//end try
    catch(Exception e){
      try{
        file600.writeBytes(cnt + " " + theName + "\n");
        file600.writeBytes(server + theFile + "\n");
        file600.writeBytes(e + "\n");
        file600.writeBytes("\n");
      }catch(IOException ex){
        ex.printStackTrace();
      }//end catch
    }//end catch
  }//end tryToConnect
  //-----------------------------------------------------//
}//end class Bookmarks10 definition

Listing 20


Copyright 2006, Richard G. Baldwin.  Reproduction in whole or in part in any form or medium without express written permission from Richard Baldwin is prohibited.

About the author

Richard Baldwin is a college professor (at Austin Community College in Austin, TX) and private consultant whose primary focus is a combination of Java, C#, and XML. In addition to the many platform and/or language independent benefits of Java and C# applications, he believes that a combination of Java, C#, and XML will become the primary driving force in the delivery of structured information on the Web.

Richard has participated in numerous consulting projects and he frequently provides onsite training at the high-tech companies located in and around Austin, Texas.  He is the author of Baldwin's Programming Tutorials, which have gained a worldwide following among experienced and aspiring programmers. He has also published articles in JavaPro magazine.

In addition to his programming expertise, Richard has many years of practical experience in Digital Signal Processing (DSP).  His first job after he earned his Bachelor's degree was doing DSP in the Seismic Research Department of Texas Instruments.  (TI is still a world leader in DSP.)  In the following years, he applied his programming and DSP expertise to other interesting areas including sonar and underwater acoustics.

Richard holds an MSEE degree from Southern Methodist University and has many years of experience in the application of computer technology to real-world problems.

Baldwin@DickBaldwin.com




