
Download Historical Stock Information from a Web Site


With the growing popularity of the Web, there is an increasing need for programs that can download information and manipulate it according to user requirements. This article documents a Java program that harvests information from a Web site.

We’ll use downloading historical stock information as the example. If you want to analyze only one stock, there is little need to automate the download. But what about 100 stocks? The program download_hist.java accepts a ticker symbol on its command line, then downloads historical information from Yahoo’s Web site to a file.

download_hist.java demonstrates how to use the URL class together with a buffered input stream to download information quickly. It also demonstrates how to use the String class, the StringTokenizer class, and the StringBuffer class to manipulate information that has been downloaded into memory.

Downloading the Information

download_hist is the class that does all the work (see Listing 1). First, the program retrieves a five-year weekly history of the ticker’s stock price.

    import java.io.*;
    import java.net.*;
    import java.util.*;
    import java.text.*;

    public class download_hist {

      public static void main(String[] args) throws Exception {
        int year_1;
        int int_from_year;

        if (args.length != 1) {
          System.err.println("Usage: java download_hist " + "ticker");
          System.err.println("Example: java download_hist " + "INTC");
          System.exit(1);
        }

        SimpleDateFormat formatter = new SimpleDateFormat("yyyy/MM/dd");
        Date dt = new Date(System.currentTimeMillis());
        String yyyy_mm_dd = (formatter.format(dt));
        String curr_year = yyyy_mm_dd.substring(0, 4);
        String curr_month = yyyy_mm_dd.substring(5, 7);
        String curr_date = yyyy_mm_dd.substring(8, 10);

        // Go Back 5 Years
        int_from_year = Integer.parseInt(curr_year) - 5;

        String TickerSymbol = URLEncoder.encode(args[0]);
        String TargetUrl = "http://chart.yahoo.com/table.csv?s="
            + TickerSymbol
            + "&a=" + curr_month
            + "&b=" + curr_date
            + "&c=" + int_from_year
            + "&d=" + curr_month
            + "&e=" + curr_date
            + "&f=" + curr_year
            + "&g=w&q=q&y=0&z=" + TickerSymbol
            + "&x=.csv";

        URL url = new URL(TargetUrl);
        URLConnection connection = url.openConnection();
        System.out.println("TargetUrl is:" + TargetUrl);

        BufferedInputStream in =
            new BufferedInputStream(connection.getInputStream());
        byte[] buffer = new byte[8192];
        StringBuffer strbuf = new StringBuffer(8192);
        FileOutputStream destination = new FileOutputStream(
            "C:\\Stocks\\Stock Data\\" + TickerSymbol + "_old.csv");

        while (true) {
          // System.out.println(inputLine);
          int bytes_read = in.read(buffer);
          if (bytes_read == -1)
            break;
          destination.write(buffer, 0, bytes_read);
          System.out.println(
              "Downloaded " + bytes_read + " bytes for ticker " + TickerSymbol);
          strbuf.append(new String(buffer, 0, bytes_read));
        }

        in.close();
        destination.close();
        change_string(strbuf, TickerSymbol);
      }

      public static void change_string(StringBuffer strbuf, String TickerSymbol) {
        int pos1, pos2, pos3, pos4, pos5;
        int length = 0, count = 0;

        // Manipulate this string
        String s1 = new String(strbuf.toString());
        StringTokenizer tokens = new StringTokenizer(s1, "\n");
        StringBuffer s2_buf = new StringBuffer();
        StringBuffer actual_line_buf = new StringBuffer();

        tokens.nextToken();
        while (tokens.hasMoreTokens()) {
          String curr_line = new String(tokens.nextToken());
          pos1 = curr_line.indexOf(',', 1) + 1;
          pos2 = curr_line.indexOf(',', pos1) + 1;
          pos3 = curr_line.indexOf(',', pos2) + 1;
          pos4 = curr_line.indexOf(',', pos3) + 1;
          pos5 = curr_line.indexOf(',', pos4) + 1;
          length = curr_line.length();

          actual_line_buf.append(curr_line.substring(0, pos1));
          actual_line_buf.append(curr_line.substring(pos2, pos3));
          actual_line_buf.append(curr_line.substring(pos3, pos4));
          actual_line_buf.append(curr_line.substring(pos5, length));
          actual_line_buf.append("\n");
          // System.out.println("actual_line " + actual_line_buf.toString() + "\n ");
          s2_buf.insert(0, actual_line_buf.toString());
          // System.out.println("s2_buf " + s2_buf.toString() + "\n ");
          actual_line_buf.delete(0, (length + 1));
          // System.out.println("actual_line " + actual_line_buf.toString() + "\n ");
        }

        try {
          // System.out.println("s2_buf " + s2_buf.toString() + "\n ");
          BufferedWriter destination_1 = new BufferedWriter(
              new OutputStreamWriter(
                  new FileOutputStream(
                      "C:\\Stocks\\Stock Data\\" + TickerSymbol + ".csv")));
          destination_1.write(s2_buf.toString(), 0, s2_buf.toString().length());
          destination_1.flush();
          destination_1.close();
        } catch (IOException io) {
          System.out.println("Error opening output file \n ");
        }
      } // End change_string
    }

Listing 1. Source code for download_hist.java.

    Date dt = new Date(System.currentTimeMillis());
    String yyyy_mm_dd = (formatter.format(dt));
    String curr_year = yyyy_mm_dd.substring(0, 4);
    String curr_month = yyyy_mm_dd.substring(5, 7);
    String curr_date = yyyy_mm_dd.substring(8, 10);

    // Go Back 5 Years
    int_from_year = Integer.parseInt(curr_year) - 5;

In the lines shown above, the current year is taken from the system date and five is subtracted from it to get the starting year.
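
As a hedged alternative to slicing a formatted date string, the same calculation can be sketched with the java.time API (Java 8 and later, which postdates this program). The class name DateRange and the method fiveYearsBefore are hypothetical names chosen for illustration; they are not part of download_hist.

```java
import java.time.LocalDate;

// Hypothetical helper, not part of download_hist.
class DateRange {

    // Returns the date five years before the given end date.
    static LocalDate fiveYearsBefore(LocalDate end) {
        return end.minusYears(5);
    }

    public static void main(String[] args) {
        LocalDate end = LocalDate.now();            // current system date
        LocalDate start = fiveYearsBefore(end);

        // Zero-padded month and day, like the substring(5,7) and
        // substring(8,10) pieces taken from the "yyyy/MM/dd" string.
        String month = String.format("%02d", end.getMonthValue());
        String day = String.format("%02d", end.getDayOfMonth());

        System.out.println("From year " + start.getYear()
                + " to year " + end.getYear()
                + ", month " + month + ", day " + day);
    }
}
```

One difference worth knowing: minusYears moves February 29 to February 28 when the target year is not a leap year, whereas the string-slicing approach reuses day 29 unchanged.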

Next, construct a TargetUrl String with the parameters required by the site. The URL class is extremely useful for this kind of networking, since it significantly simplifies the programmer’s work.
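
The URL construction can be isolated in a small helper, sketched below. The parameter layout is copied from Listing 1; the class name HistoryUrl and the method build are hypothetical, and the sketch makes no assumption about whether the endpoint still responds. It uses the two-argument URLEncoder.encode with an explicit charset, since the one-argument form used in the listing is deprecated.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper, not part of download_hist.
class HistoryUrl {

    // Builds the query string in the same layout as TargetUrl in Listing 1.
    static String build(String ticker, String month, String day,
                        int fromYear, int toYear) {
        String symbol;
        try {
            symbol = URLEncoder.encode(ticker, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required on every Java platform, so this is unexpected.
            throw new IllegalStateException(e);
        }
        return "http://chart.yahoo.com/table.csv?s=" + symbol
                + "&a=" + month + "&b=" + day + "&c=" + fromYear
                + "&d=" + month + "&e=" + day + "&f=" + toYear
                + "&g=w&q=q&y=0&z=" + symbol + "&x=.csv";
    }

    public static void main(String[] args) {
        System.out.println(build("INTC", "01", "17", 1995, 2000));
    }
}
```

Keeping the string assembly in one method makes it easy to print or check the URL before any network connection is opened.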

An input stream is set up and buffered (buffered streams increase efficiency substantially). Each chunk that is read is written to a file and also appended to a StringBuffer, strbuf, using the append method of that class. The read method returns -1 when there is no more information to be read from the input stream, so the loop runs until the bytes_read variable is -1 and then breaks. Finally, close the connection with in.close() and the file output stream with destination.close().

    while (true) {
      // System.out.println(inputLine);
      int bytes_read = in.read(buffer);
      if (bytes_read == -1)
        break;
      destination.write(buffer, 0, bytes_read); // write to File
      System.out.println(
          "Downloaded " + bytes_read + " bytes for ticker " + TickerSymbol);
      strbuf.append(new String(buffer, 0, bytes_read));
    }
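
A variation on this loop is sketched below, under two assumptions that are not in the original program: the data is UTF-8 text, and the streams are supplied by the caller. It collects the raw bytes first and decodes them once at the end, because decoding each chunk separately can split a multi-byte character that straddles two reads. For plain ASCII data such as this CSV file, the two approaches give the same result. The class name StreamCopy and the method copyAndCollect are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of download_hist.
class StreamCopy {

    // Copies everything from in to out and returns the same bytes as text.
    static String copyAndCollect(InputStream in, OutputStream out)
            throws IOException {
        byte[] buffer = new byte[8192];
        ByteArrayOutputStream collected = new ByteArrayOutputStream();
        int bytesRead;
        // read returns -1 once the stream is exhausted.
        while ((bytesRead = in.read(buffer)) != -1) {
            out.write(buffer, 0, bytesRead);
            collected.write(buffer, 0, bytesRead);
        }
        // Decode once, after all bytes have arrived.
        return new String(collected.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        byte[] sample = "Week of, Open\n17-Jan- 0,100.2969\n"
                .getBytes(StandardCharsets.UTF_8);
        // try-with-resources closes both streams even if the copy fails.
        try (InputStream in = new ByteArrayInputStream(sample);
             ByteArrayOutputStream out = new ByteArrayOutputStream()) {
            String text = copyAndCollect(in, out);
            System.out.println("Copied " + out.size() + " bytes");
            System.out.print(text);
        }
    }
}
```

In the real program the input would be the buffered connection stream and the output the file stream; in-memory streams are used here only so the sketch runs without a network.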

The information in the strbuf variable is in the format shown below.

    Week of, Open, High, Low, Close, Volume
    17-Jan- 0,100.2969,105.75,99.875,100.0625,36917100
    10-Jan- 0,85.75,106.625,84.125,103.0625,59506200
    3-Jan- 0,83.2656,87.375,77.375,82,24701600
    ..
    30-Jan-95,8.8713,9.2455,8.7154,9.1676,6487400
    23-Jan-95,8.5283,8.9181,8.4816,8.8557,5790200
    16-Jan-95,8.5439,8.809,8.4504,8.5829,6431200

Manipulating the Downloaded Information

The change_string method is then called to manipulate the contents downloaded to strbuf. It makes the following changes to the downloaded information.

  • Eliminate the header line (Week of, …)
  • Eliminate the Open and the Close columns in every line
  • Sort the file in reverse order so the last line in the original download becomes the first line

Eliminate the Header Line

The StringTokenizer class is used to parse the contents of the String s1 into individual lines as shown below.

    String s1 = new String(strbuf.toString());
    StringTokenizer tokens = new StringTokenizer(s1, "\n");
    StringBuffer s2_buf = new StringBuffer();
    StringBuffer actual_line_buf = new StringBuffer();

Because "\n" is used as the delimiter, every call to tokens.nextToken() returns the next line. To skip the header line, call it once and discard the first token.

    tokens.nextToken();
    while (tokens.hasMoreTokens())

The tokens.hasMoreTokens() method returns true if any tokens are left; use it to loop through all the tokens (lines) until the end.
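
The header-skipping loop can be tried on its own with a short sketch like the following. The class name LineSplitter and the method dataLines are hypothetical. Unlike the listing, the sketch checks hasMoreTokens() before discarding the header, because nextToken() throws NoSuchElementException when the input is empty (for example, after a failed download).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

// Hypothetical helper, not part of download_hist.
class LineSplitter {

    // Splits the text on "\n" and returns every line except the first.
    static List<String> dataLines(String csv) {
        StringTokenizer tokens = new StringTokenizer(csv, "\n");
        List<String> lines = new ArrayList<>();
        if (tokens.hasMoreTokens()) {
            tokens.nextToken();                 // discard the header line
        }
        while (tokens.hasMoreTokens()) {
            lines.add(tokens.nextToken());
        }
        return lines;
    }

    public static void main(String[] args) {
        String csv = "Week of, Open, High, Low, Close, Volume\n"
                + "17-Jan- 0,100.2969,105.75,99.875,100.0625,36917100\n"
                + "10-Jan- 0,85.75,106.625,84.125,103.0625,59506200\n";
        for (String line : dataLines(csv)) {
            System.out.println(line);
        }
    }
}
```

Note that StringTokenizer never returns empty tokens, so blank lines are silently skipped, and that input with "\r\n" line endings would leave a trailing "\r" on each token.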

Eliminate the Open and the Close Columns

Now, process each line to eliminate the columns Open and Close. Look at the first line processed within the loop:

        17-Jan- 0,100.2969,105.75,99.875,100.0625,36917100

You must eliminate the Open value (100.2969) and the Close value (100.0625). Simply identify the locations of the ‘,’ characters and extract the appropriate portions of the line. The variables pos1, pos2, … pos5 hold the positions just after each ‘,’ character.

    pos1 = curr_line.indexOf(',', 1) + 1;

The substring method is used to retrieve the required portions of the string. The first of the following lines appends to actual_line_buf the contents of curr_line from 0 up to (but not including) pos1, which is 17-Jan- 0 followed by its comma.

    actual_line_buf.append(curr_line.substring(0, pos1));
    actual_line_buf.append(curr_line.substring(pos2, pos3));
    actual_line_buf.append(curr_line.substring(pos3, pos4));
    actual_line_buf.append(curr_line.substring(pos5, length));
    actual_line_buf.append("\n");

The ranges pos1 to pos2 (Open) and pos4 to pos5 (Close) are not extracted, because they are to be discarded.
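
To see the comma arithmetic in isolation, the column removal can be sketched as a single method applied to one line. The class name ColumnFilter and the method dropOpenAndClose are hypothetical; the index calculations mirror those in change_string and assume the line has the six comma-separated fields shown earlier.

```java
// Hypothetical helper, not part of download_hist.
class ColumnFilter {

    // Keeps the date, High, Low, and Volume fields; drops Open and Close.
    static String dropOpenAndClose(String line) {
        // Each posN is the index just after the Nth comma.
        int pos1 = line.indexOf(',', 1) + 1;
        int pos2 = line.indexOf(',', pos1) + 1;
        int pos3 = line.indexOf(',', pos2) + 1;
        int pos4 = line.indexOf(',', pos3) + 1;
        int pos5 = line.indexOf(',', pos4) + 1;

        StringBuilder kept = new StringBuilder();
        kept.append(line, 0, pos1);              // date and its comma
        kept.append(line, pos2, pos3);           // High and its comma
        kept.append(line, pos3, pos4);           // Low and its comma
        kept.append(line, pos5, line.length());  // Volume
        return kept.toString();
    }

    public static void main(String[] args) {
        String line = "17-Jan- 0,100.2969,105.75,99.875,100.0625,36917100";
        System.out.println(dropOpenAndClose(line));
        // prints 17-Jan- 0,105.75,99.875,36917100
    }
}
```

A line with fewer than five commas would make indexOf return -1 and produce a wrong result, so real input may need a field-count check first.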

Sort the File

Sort so the last line in the original download becomes the first line. Do this by using the insert method, which puts the lines in reverse order. Each call to insert at offset 0 adds the contents of the actual line before the existing contents of s2_buf. Note the subtle difference between the append and the insert methods of the StringBuffer class: append adds at the end, while insert adds at a given offset.

s2_buf.insert( 0 , actual_line_buf.toString() ) ;
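
The difference between the two methods can be seen side by side in the sketch below. It uses StringBuilder, the unsynchronized counterpart of StringBuffer with the same append and insert methods; the class name LineReverser and both method names are hypothetical.

```java
// Hypothetical helper, not part of download_hist.
class LineReverser {

    // insert(0, ...) puts each new line in front of everything added so far,
    // so the result is in reverse order.
    static String reverseLines(String[] lines) {
        StringBuilder out = new StringBuilder();
        for (String line : lines) {
            out.insert(0, line + "\n");
        }
        return out.toString();
    }

    // append(...) adds each new line at the end, so the order is preserved.
    static String keepOrder(String[] lines) {
        StringBuilder out = new StringBuilder();
        for (String line : lines) {
            out.append(line).append("\n");
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String[] lines = {
            "17-Jan- 0,105.75,99.875,36917100",
            "10-Jan- 0,106.625,84.125,59506200",
            "3-Jan- 0,87.375,77.375,24701600"
        };
        System.out.print(reverseLines(lines));
    }
}
```

Each insert at offset 0 shifts the existing characters, so this approach does more copying as the buffer grows; for a few hundred weekly rows that cost is negligible.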

Contents of s2_buf at the end of the change_string function:

    16-Jan-95,8.809,8.4504,6431200
    23-Jan-95,8.9181,8.4816,5790200
    30-Jan-95,9.2455,8.7154,6487400
    ……
    3-Jan- 0,87.375,77.375,24701600
    10-Jan- 0,106.625,84.125,59506200
    17-Jan- 0,105.75,99.875,36917100

The contents of s2_buf are then written to the file C:\Stocks\Stock Data\<Ticker>.csv.
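
The write step in Listing 1 closes the writer only if write succeeds. A hedged alternative, assuming Java 7 or later and UTF-8 output (neither is stated in the original), is to let try-with-resources close the file. The class name CsvWriter and the method write are hypothetical, and the sketch writes to a temporary file rather than to the C:\Stocks directory.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not part of download_hist.
class CsvWriter {

    // Writes the text to the target file, replacing any existing contents.
    static void write(Path target, String contents) throws IOException {
        // The writer is flushed and closed automatically, even on failure.
        try (BufferedWriter writer =
                 Files.newBufferedWriter(target, StandardCharsets.UTF_8)) {
            writer.write(contents);
        }
    }

    public static void main(String[] args) throws IOException {
        Path target = Files.createTempFile("intc", ".csv");
        write(target, "16-Jan-95,8.809,8.4504,6431200\n");
        System.out.println("Wrote " + Files.size(target) + " bytes to " + target);
        Files.delete(target);
    }
}
```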

Summary

Java is well-suited for programs that harvest information from the Web. Happily, the URL class hides much of the complexity associated with networking. The StringBuffer class is useful for holding the contents downloaded into memory; the StringTokenizer class is effective in parsing the String into tokens and then manipulating them. This program is easily adapted to harvesting other specific kinds of information from the Web.

To run:

    java download_hist intc

To compile:

    javac download_hist.java

Files created:

    C:\Stocks\Stock Data\intc.csv
    C:\Stocks\Stock Data\intc_old.csv

About the Author

Sharath Sahadevan is a senior software engineer with MasterCard International in St. Louis. His team supports the MasterCard Settlement Account Management application. He has a bachelor’s degree in electrical and electronics engineering from P.S.G College of Technology, India (1990). When not working, he enjoys spending time with his family, as well as playing tennis, cricket, basketball, and chess.
