With the growing popularity of the Web, there is an increasing need for programs that can download information and manipulate it according to user requirements. This article documents a Java program that harvests information from a Web site.
As an example, we’ll download historical stock information. If you wish to analyze only one stock, you don’t need to automate the download. But what about 100 stocks? The program download_hist accepts a ticker symbol on its command line, then downloads historical information from Yahoo’s Web site to a file.
download_hist demonstrates how to use the BufferedInputStream class (which uses buffering to download information quickly). It also demonstrates how to use the StringBuffer and StringTokenizer classes, together with the String methods indexOf and substring, to manipulate information that has been downloaded into memory.
Downloading the Information
download_hist is the class that does all the work (see Listing 1). First, the program retrieves a five-year weekly history of the ticker’s stock price.
import java.io.*;
import java.net.*;
import java.util.*;
import java.text.*;

public class download_hist {
    public static void main(String[] args) throws Exception {
        int year_1;
        int int_from_year;
        if (args.length != 1) {
            System.err.println("Usage: java download_hist " + "ticker");
            System.err.println("Example: java download_hist " + "INTC");
            System.exit(1);
        }
        // Split today's date into year, month and day strings
        SimpleDateFormat formatter = new SimpleDateFormat("yyyy/MM/dd");
        Date dt = new Date(System.currentTimeMillis());
        String yyyy_mm_dd = (formatter.format(dt));
        String curr_year = yyyy_mm_dd.substring(0, 4);
        String curr_month = yyyy_mm_dd.substring(5, 7);
        String curr_date = yyyy_mm_dd.substring(8, 10);
        // Go Back 5 Years
        int_from_year = Integer.parseInt(curr_year) - 5;
        String TickerSymbol = URLEncoder.encode(args[0]);
        String TargetUrl = "http://chart.yahoo.com/table.csv?s=" + TickerSymbol
                + "&a=" + curr_month + "&b=" + curr_date + "&c=" + int_from_year
                + "&d=" + curr_month + "&e=" + curr_date + "&f=" + curr_year
                + "&g=w&q=q&y=0&z=" + TickerSymbol + "&x=.csv";
        URL url = new URL(TargetUrl);
        URLConnection connection = url.openConnection();
        System.out.println("TargetUrl is:" + TargetUrl);
        BufferedInputStream in = (new BufferedInputStream(connection.getInputStream()));
        byte[] buffer = new byte[8192];
        StringBuffer strbuf = new StringBuffer(8192);
        FileOutputStream destination = new FileOutputStream(
                "C:\\Stocks\\Stock Data\\" + TickerSymbol + "_old.csv");
        // Copy each chunk to the file and keep a copy in memory
        while (true) {
            int bytes_read = in.read(buffer);
            if (bytes_read == -1) break;   // end of stream
            destination.write(buffer, 0, bytes_read);
            System.out.println("Downloaded " + bytes_read + " bytes for ticker " + TickerSymbol);
            strbuf.append(new String(buffer, 0, bytes_read));
        }
        in.close();
        destination.close();
        change_string(strbuf, TickerSymbol);
    }

    public static void change_string(StringBuffer strbuf, String TickerSymbol) {
        int pos1, pos2, pos3, pos4, pos5;
        int length = 0, count = 0;
        // Manipulate this string
        String s1 = new String(strbuf.toString());
        StringTokenizer tokens = new StringTokenizer(s1, "\n");
        StringBuffer s2_buf = new StringBuffer();
        StringBuffer actual_line_buf = new StringBuffer();
        tokens.nextToken();   // skip the header line
        while (tokens.hasMoreTokens()) {
            String curr_line = new String(tokens.nextToken());
            // Each posN is the index just past a comma, i.e. the start of the next field
            pos1 = curr_line.indexOf(',', 1) + 1;
            pos2 = curr_line.indexOf(',', pos1) + 1;
            pos3 = curr_line.indexOf(',', pos2) + 1;
            pos4 = curr_line.indexOf(',', pos3) + 1;
            pos5 = curr_line.indexOf(',', pos4) + 1;
            length = curr_line.length();
            // Keep Week of, High, Low and Volume; skip Open and Close
            actual_line_buf.append(curr_line.substring(0, pos1));
            actual_line_buf.append(curr_line.substring(pos2, pos3));
            actual_line_buf.append(curr_line.substring(pos3, pos4));
            actual_line_buf.append(curr_line.substring(pos5, length));
            actual_line_buf.append("\n");
            // Insert at the front so the lines end up in reverse order
            s2_buf.insert(0, actual_line_buf.toString());
            actual_line_buf.delete(0, (length + 1));
        }
        try {
            BufferedWriter destination_1 = new BufferedWriter(
                    new OutputStreamWriter(
                            new FileOutputStream("C:\\Stocks\\Stock Data\\" + TickerSymbol + ".csv")));
            destination_1.write(s2_buf.toString(), 0, s2_buf.toString().length());
            destination_1.flush();
            destination_1.close();
        } catch (IOException io) {
            System.out.println("Error opening output file \n");
        }
    } // End change_string
}
Listing 1. Source code for download_hist.java.
Date dt = new Date(System.currentTimeMillis());
String yyyy_mm_dd = (formatter.format(dt));
String curr_year = yyyy_mm_dd.substring(0, 4);
String curr_month = yyyy_mm_dd.substring(5, 7);
String curr_date = yyyy_mm_dd.substring(8, 10);
// Go Back 5 Years
int_from_year = Integer.parseInt(curr_year) - 5;
In the lines shown above, the current system date is formatted as yyyy/MM/dd and split into year, month, and day strings; five years are then subtracted from the current year.
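For comparison, here is a minimal sketch of the same five-years-back calculation using the java.time API (Java 8 and later) instead of SimpleDateFormat and substring. It is not part of the original program, and the class name DateRangeSketch is hypothetical. One difference is worth knowing: LocalDate.minusYears moves February 29 to February 28 when the target year is not a leap year, whereas subtracting 5 from the year string would request a nonexistent date such as 2019/02/29.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

// Hypothetical helper (not from Listing 1): start of a five-year range
// ending on a given date, computed with java.time instead of string slicing.
class DateRangeSketch {
    // Same pattern the listing passes to SimpleDateFormat.
    static final DateTimeFormatter FORMAT = DateTimeFormatter.ofPattern("yyyy/MM/dd");

    // Subtracts five years; Feb 29 becomes Feb 28 in non-leap target years.
    static LocalDate fiveYearsBefore(LocalDate end) {
        return end.minusYears(5);
    }

    public static void main(String[] args) {
        LocalDate today = LocalDate.now();
        LocalDate from = fiveYearsBefore(today);
        System.out.println("From " + from.format(FORMAT) + " to " + today.format(FORMAT));
    }
}
```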
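Next, construct the TargetUrl string with the query parameters the site requires, and pass it to the URL class. The URL class is extremely useful for this kind of networking: its openConnection method returns a URLConnection whose getInputStream method delivers the response, so the program never has to deal with sockets or HTTP headers directly.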
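The sketch below isolates the query-string construction so it can be checked without a network connection. The class and method names are hypothetical; the parameter letters (s, a through f, g, and the rest) are copied from Listing 1, and the comments describing them are inferred from the listing, not taken from any Yahoo documentation. It uses the URLEncoder.encode(String, Charset) overload, which requires Java 10 or later; the single-argument form used in the listing is deprecated.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: builds the same query string as Listing 1 from its parts.
class QueryUrlSketch {
    static String buildUrl(String ticker, String month, String day,
                           int fromYear, String toYear) {
        // Encode the user-supplied ticker so characters such as '^' or '&' are safe.
        String t = URLEncoder.encode(ticker, StandardCharsets.UTF_8);
        return "http://chart.yahoo.com/table.csv?s=" + t
                + "&a=" + month + "&b=" + day + "&c=" + fromYear   // start of range (inferred)
                + "&d=" + month + "&e=" + day + "&f=" + toYear     // end of range (inferred)
                + "&g=w&q=q&y=0&z=" + t + "&x=.csv";               // remaining fixed parameters
    }

    public static void main(String[] args) {
        String target = buildUrl("INTC", "01", "17", 1995, "2000");
        System.out.println(target);
        // Passing target to new java.net.URL(...) and calling openConnection()
        // is omitted so the sketch runs offline.
    }
}
```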
An input stream is set up and wrapped in a BufferedInputStream (a buffered stream reads from the underlying stream in large blocks, which matters most when a program issues many small reads). The program then reads the response in chunks of up to 8,192 bytes. Each chunk is written to the output file and also added to a StringBuffer, strbuf, using that class’s append method. When there is no more information to be read from the input stream, in.read returns -1, which is stored in the bytes_read variable; the loop checks for this value and breaks out. Finally, the program closes the connection’s input stream with in.close() and the file output stream with destination.close().
while (true) {
    int bytes_read = in.read(buffer);
    if (bytes_read == -1) break;
    destination.write(buffer, 0, bytes_read); // write to File
    System.out.println("Downloaded " + bytes_read + " bytes for ticker " + TickerSymbol);
    strbuf.append(new String(buffer, 0, bytes_read));
}
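The loop can be sketched as a standalone method that works on any InputStream, which makes it possible to exercise without a network connection. This is a sketch, not the author’s code: the method name copyAndCollect is hypothetical, and it uses try-with-resources (Java 7 and later) so both streams are closed even if a read fails. It also collects the raw bytes in a ByteArrayOutputStream and decodes them once at the end; decoding each chunk separately, as the listing does, could split a multibyte character across two chunks (harmless for ASCII CSV data, but worth knowing).

```java
import java.io.*;
import java.nio.charset.StandardCharsets;

// Hypothetical helper (not from Listing 1): copies a stream to a destination
// while also returning the full contents as a String.
class DownloadLoopSketch {
    static String copyAndCollect(InputStream source, OutputStream destination)
            throws IOException {
        ByteArrayOutputStream collected = new ByteArrayOutputStream();
        byte[] buffer = new byte[8192];
        try (InputStream in = new BufferedInputStream(source);
             OutputStream out = destination) {
            int bytesRead;
            // read() returns -1 once the end of the stream is reached.
            while ((bytesRead = in.read(buffer)) != -1) {
                out.write(buffer, 0, bytesRead);       // save to the file
                collected.write(buffer, 0, bytesRead); // keep a copy in memory
            }
        }
        // Decode once, after all bytes have arrived.
        return collected.toString(StandardCharsets.UTF_8.name());
    }

    public static void main(String[] args) throws IOException {
        InputStream fake = new ByteArrayInputStream(
                "Week of,Open\n17-Jan- 0,100.2969\n".getBytes(StandardCharsets.UTF_8));
        ByteArrayOutputStream file = new ByteArrayOutputStream();
        System.out.print(copyAndCollect(fake, file));
    }
}
```

In the real program the destination would be the FileOutputStream and the source connection.getInputStream().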
The information in the strbuf variable is in the format shown below.
Week of, Open, High, Low, Close, Volume
17-Jan- 0,100.2969,105.75,99.875,100.0625,36917100
10-Jan- 0,85.75,106.625,84.125,103.0625,59506200
3-Jan- 0,83.2656,87.375,77.375,82,24701600
..
30-Jan-95,8.8713,9.2455,8.7154,9.1676,6487400
23-Jan-95,8.5283,8.9181,8.4816,8.8557,5790200
16-Jan-95,8.5439,8.809,8.4504,8.5829,6431200
Manipulating the Downloaded Information
The change_string method is then called to manipulate the contents downloaded to strbuf. It makes the following changes to the downloaded information:
- Eliminate the header line (Week of, Open, …)
- Eliminate the Open and the Close columns in every line
- Reverse the order of the lines, so the last line in the original download becomes the first line.
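Taken together, the three changes amount to a small text transformation. The sketch below expresses them with String.split and Collections.reverse instead of StringTokenizer and StringBuffer.insert; it is an alternative written for this discussion, not the author’s method, and the name transform is hypothetical. Column indices 0, 2, 3, and 5 correspond to Week of, High, Low, and Volume in the header of the downloaded data.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical alternative to change_string: drop the header, drop Open and
// Close, and reverse the line order.
class TransformSketch {
    static String transform(String csv) {
        String[] lines = csv.split("\n");
        List<String> out = new ArrayList<>();
        for (int i = 1; i < lines.length; i++) {          // i = 1 skips the header
            String line = lines[i].trim();                // trim removes a stray '\r'
            if (line.isEmpty()) continue;
            String[] f = line.split(",");
            if (f.length < 6) continue;                   // skip malformed lines
            out.add(f[0] + "," + f[2] + "," + f[3] + "," + f[5]);
        }
        Collections.reverse(out);                         // last line becomes first
        return String.join("\n", out) + (out.isEmpty() ? "" : "\n");
    }

    public static void main(String[] args) {
        String sample = "Week of, Open, High, Low, Close, Volume\n"
                + "17-Jan- 0,100.2969,105.75,99.875,100.0625,36917100\n"
                + "10-Jan- 0,85.75,106.625,84.125,103.0625,59506200\n";
        System.out.print(transform(sample));
    }
}
```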
Eliminate the Header Line
The StringTokenizer class is used to parse the contents of the String s1 into individual lines, as shown below.
String s1 = new String(strbuf.toString());
StringTokenizer tokens = new StringTokenizer(s1, "\n");
StringBuffer s2_buf = new StringBuffer();
StringBuffer actual_line_buf = new StringBuffer();
Because “\n” is used as the delimiter, every call to nextToken returns the next line. To skip the header line, the program simply reads and discards the first token.
tokens.nextToken();
while (tokens.hasMoreTokens())
The hasMoreTokens method returns true as long as any tokens remain, so this loop processes all the remaining tokens (lines) until the end of the data.
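Here is a minimal, self-contained version of that loop, useful for experimenting with the delimiter; the class and method names are hypothetical. Two behaviors are worth knowing. StringTokenizer never returns empty tokens, so blank lines are silently skipped. And nextToken throws NoSuchElementException when no tokens remain, so the sketch guards the header-skipping call, which the listing makes unconditionally. The StringTokenizer Javadoc also describes it as a legacy class and recommends String.split or the java.util.regex package for new code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

// Hypothetical demo: split text into lines with StringTokenizer, skipping the header.
class TokenizerSketch {
    static List<String> dataLines(String text) {
        List<String> lines = new ArrayList<>();
        StringTokenizer tokens = new StringTokenizer(text, "\n");
        if (tokens.hasMoreTokens()) {
            tokens.nextToken();               // discard the header line
        }
        while (tokens.hasMoreTokens()) {      // true while any tokens remain
            lines.add(tokens.nextToken());
        }
        return lines;
    }

    public static void main(String[] args) {
        String text = "Week of, Open, High, Low, Close, Volume\n\n"
                + "17-Jan- 0,100.2969,105.75,99.875,100.0625,36917100\n";
        System.out.println(dataLines(text)); // the blank line is not returned
    }
}
```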
Eliminate the Open and the Close Columns
Now, process each line to eliminate the Open and Close columns. Look at the first line processed within the loop:
17-Jan- 0,100.2969,105.75,99.875,100.0625,36917100
You must eliminate the Open value (100.2969) and the Close value (100.0625). Simply identify the location of each ‘,’ character and extract the appropriate portions of the line. The variables pos1 through pos5 record these locations: each holds the index of the character just after a ‘,’, that is, the start of the next field.
pos1 = curr_line.indexOf(',', 1) + 1;
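To make the positions concrete, the following sketch repeats the pos1 through pos5 computation in a loop for the sample line. Each value is the index just past a comma, so pos1 marks the start of Open, pos2 High, pos3 Low, pos4 Close, and pos5 Volume. The class and method names are illustrative only.

```java
import java.util.Arrays;

// Illustrative only: reproduces the pos1..pos5 computation from Listing 1.
class CommaPositions {
    static int[] positions(String line) {
        int[] pos = new int[5];
        int from = 1;                                // Listing 1 starts the first search at index 1
        for (int i = 0; i < pos.length; i++) {
            pos[i] = line.indexOf(',', from) + 1;    // index just past the comma
            from = pos[i];                           // next search starts there
        }
        return pos;
    }

    public static void main(String[] args) {
        String line = "17-Jan- 0,100.2969,105.75,99.875,100.0625,36917100";
        System.out.println(Arrays.toString(positions(line)));
        // prints [10, 19, 26, 33, 42]
    }
}
```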
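The substring method is used to retrieve the required portions of the string. The first of the following lines appends to actual_line_buf the contents of curr_line from index 0 up to (but not including) pos1, which for the sample line is “17-Jan- 0,” (the date and its trailing comma). The next three lines append the High, Low, and Volume fields, and the last adds a newline.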
actual_line_buf.append(curr_line.substring(0, pos1));
actual_line_buf.append(curr_line.substring(pos2, pos3));
actual_line_buf.append(curr_line.substring(pos3, pos4));
actual_line_buf.append(curr_line.substring(pos5, length));
actual_line_buf.append("\n");
Do not extract curr_line.substring(pos1, pos2) (Open) and curr_line.substring(pos4, pos5) (Close), because they are to be discarded.
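Putting the positions and the substring calls together, a hypothetical helper that drops the Open and Close columns from one line might look like this. It follows the listing’s indexOf/substring technique; the name dropOpenClose is not from the original. Because indexOf returns -1 when no comma is found, a line with fewer than five commas would silently produce wrong offsets in the listing, so the sketch rejects such lines instead.

```java
// Hypothetical helper built on Listing 1's indexOf/substring technique:
// keeps Week of, High, Low and Volume; drops Open and Close.
class DropColumnsSketch {
    static String dropOpenClose(String line) {
        int[] pos = new int[5];
        int from = 1;
        for (int i = 0; i < pos.length; i++) {
            int comma = line.indexOf(',', from);
            if (comma < 0) {
                throw new IllegalArgumentException("expected 6 fields: " + line);
            }
            pos[i] = comma + 1;          // first character of the next field
            from = pos[i];
        }
        return line.substring(0, pos[0])          // date and its comma
                + line.substring(pos[1], pos[2])  // High and its comma
                + line.substring(pos[2], pos[3])  // Low and its comma
                + line.substring(pos[4]);         // Volume
    }

    public static void main(String[] args) {
        System.out.println(dropOpenClose(
                "17-Jan- 0,100.2969,105.75,99.875,100.0625,36917100"));
    }
}
```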
Sort the File
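Reverse the line order so the last line in the original download becomes the first line; because the download lists the newest week first, this leaves the file in chronological order. Do this by using the insert method of the StringBuffer class: inserting each processed line at offset 0 places it before the current contents of s2_buf. Note the subtle difference between the insert and the append methods of the StringBuffer class: append adds text at the end of the buffer, while insert adds it at the specified offset.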
s2_buf.insert( 0 , actual_line_buf.toString() ) ;
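The difference between the two methods is easiest to see side by side. In this small demonstration (the class name is hypothetical), appending three lines keeps them in arrival order, while inserting each one at offset 0 reverses them, which is how change_string turns the newest-first download into an oldest-first file. StringBuilder is used here; it offers the same append and insert methods as StringBuffer without synchronization, which a single-threaded program does not need.

```java
// Hypothetical demo of StringBuilder.append versus insert(0, ...).
class InsertVersusAppend {
    static String[] build(String... lines) {
        StringBuilder appended = new StringBuilder();
        StringBuilder inserted = new StringBuilder();
        for (String line : lines) {
            appended.append(line).append('\n');   // adds at the end
            inserted.insert(0, line + "\n");      // adds at the beginning
        }
        return new String[] {appended.toString(), inserted.toString()};
    }

    public static void main(String[] args) {
        String[] result = build("17-Jan- 0", "10-Jan- 0", "3-Jan- 0");
        System.out.print("append:\n" + result[0]);
        System.out.print("insert(0, ...):\n" + result[1]);
    }
}
```

Because each insert(0, ...) shifts everything already in the buffer, the total work grows quadratically with the number of lines. For the roughly 260 weekly rows in five years that cost is negligible; for much larger inputs, collecting lines in a list and reversing it once scales better.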
Contents of s2_buf at the end of the change_string function:
16-Jan-95,8.809,8.4504,6431200
23-Jan-95,8.9181,8.4816,5790200
30-Jan-95,9.2455,8.7154,6487400
……
3-Jan- 0,87.375,77.375,24701600
10-Jan- 0,106.625,84.125,59506200
17-Jan- 0,105.75,99.875,36917100
The contents of s2_buf are then written to the file C:\Stocks\Stock Data\<Ticker>.csv.
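A sketch of the write step using try-with-resources (Java 7 and later), so the file is closed, and therefore flushed, even when write throws. It is not the author’s code: the name writeCsv is hypothetical, the target Path is passed in rather than hard-coded so the method can be tested, and Files.newBufferedWriter with an explicit charset replaces the listing’s BufferedWriter/OutputStreamWriter/FileOutputStream chain. Unlike the listing, the sketch lets an IOException propagate instead of printing a message, so the caller decides how to report it.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical replacement for the write step at the end of change_string.
class WriteSketch {
    static void writeCsv(Path target, String contents) throws IOException {
        // try-with-resources closes (and so flushes) the writer, even on error.
        try (BufferedWriter out = Files.newBufferedWriter(target, StandardCharsets.UTF_8)) {
            out.write(contents);
        }
    }

    public static void main(String[] args) throws IOException {
        String ticker = args.length > 0 ? args[0] : "intc";
        // Same directory as the listing; it must already exist.
        Path target = Paths.get("C:\\Stocks\\Stock Data", ticker + ".csv");
        writeCsv(target, "17-Jan- 0,105.75,99.875,36917100\n");
        System.out.println("Wrote " + target);
    }
}
```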
Summary
Java is well suited to programs that harvest information from the Web. Happily, the URL class hides much of the complexity associated with networking. The StringBuffer class is useful for holding the downloaded contents in memory; the StringTokenizer class is effective for parsing the String into tokens, which can then be manipulated. This program can easily be adapted to harvest other specific kinds of information from the Web.
To compile:
javac download_hist.java
To run:
java download_hist intc
Files created:
C:\Stocks\Stock Data\intc.csv
C:\Stocks\Stock Data\intc_old.csv
About the Author
Sharath Sahadevan is a senior software engineer with MasterCard International in St. Louis. His team supports the MasterCard Settlement Account Management application. He has a bachelor’s degree in electrical and electronics engineering from P.S.G College of Technology, India (1990). When not working, he enjoys spending time with his family, as well as playing tennis, cricket, basketball, and chess.