
Download Historical Stock Information from a Web Site

  • March 14, 2000
  • By Sharath Sahadevan

With the growing popularity of the Web, there is an increasing need for programs that can download information and manipulate it according to user requirements. This article documents a Java program that harvests information from a Web site.

We'll use downloading historical stock information as the example. If you wish to analyze only one stock, you don't need to automate the download. But what about 100 stocks? The program download_hist.java accepts a ticker symbol on its command line, then downloads historical information from Yahoo's Web site to a file. download_hist.java demonstrates how to use the URL class together with a buffered input stream to download information quickly; it also demonstrates how to use the String, StringTokenizer, and StringBuffer classes to manipulate information that has been downloaded into memory.

Downloading the Information

download_hist is the class that does all the work (see Listing 1). First, the program retrieves a five-year weekly history of the ticker's stock price.

    import java.io.*;
    import java.net.*;
    import java.util.*;
    import java.text.*;

    public class download_hist {

        public static void main(String[] args) throws Exception {
            int year_1;
            int int_from_year;

            if (args.length != 1) {
                System.err.println("Usage: java download_hist " + "ticker");
                System.err.println("Example: java download_hist " + "INTC");
                System.exit(1);
            }

            SimpleDateFormat formatter = new SimpleDateFormat("yyyy/MM/dd");
            Date dt = new Date(System.currentTimeMillis());
            String yyyy_mm_dd = (formatter.format(dt));
            String curr_year = yyyy_mm_dd.substring(0, 4);
            String curr_month = yyyy_mm_dd.substring(5, 7);
            String curr_date = yyyy_mm_dd.substring(8, 10);

            // Go Back 5 Years
            int_from_year = Integer.parseInt(curr_year) - 5;

            String TickerSymbol = URLEncoder.encode(args[0]);
            String TargetUrl = "http://chart.yahoo.com/table.csv?s=" + TickerSymbol
                    + "&a=" + curr_month
                    + "&b=" + curr_date
                    + "&c=" + int_from_year
                    + "&d=" + curr_month
                    + "&e=" + curr_date
                    + "&f=" + curr_year
                    + "&g=w&q=q&y=0&z=" + TickerSymbol + "&x=.csv";

            URL url = new URL(TargetUrl);
            URLConnection connection = url.openConnection();
            System.out.println("TargetUrl is:" + TargetUrl);

            BufferedInputStream in =
                    new BufferedInputStream(connection.getInputStream());
            byte[] buffer = new byte[8192];
            StringBuffer strbuf = new StringBuffer(8192);
            FileOutputStream destination =
                    new FileOutputStream("C:\\Stocks\\Stock Data\\" + TickerSymbol + "_old.csv");

            while (true) {
                // System.out.println(inputLine);
                int bytes_read = in.read(buffer);
                if (bytes_read == -1) break;
                destination.write(buffer, 0, bytes_read);
                System.out.println("Downloaded " + bytes_read + " bytes for ticker " + TickerSymbol);
                strbuf.append(new String(buffer, 0, bytes_read));
            }
            in.close();
            destination.close();
            change_string(strbuf, TickerSymbol);
        }

        public static void change_string(StringBuffer strbuf, String TickerSymbol) {
            int pos1, pos2, pos3, pos4, pos5;
            int length = 0, count = 0;

            // Manipulate this string
            String s1 = new String(strbuf.toString());
            StringTokenizer tokens = new StringTokenizer(s1, "\n");
            StringBuffer s2_buf = new StringBuffer();
            StringBuffer actual_line_buf = new StringBuffer();

            tokens.nextToken();
            while (tokens.hasMoreTokens()) {
                String curr_line = new String(tokens.nextToken());
                pos1 = curr_line.indexOf(',', 1) + 1;
                pos2 = curr_line.indexOf(',', pos1) + 1;
                pos3 = curr_line.indexOf(',', pos2) + 1;
                pos4 = curr_line.indexOf(',', pos3) + 1;
                pos5 = curr_line.indexOf(',', pos4) + 1;
                length = curr_line.length();
                actual_line_buf.append(curr_line.substring(0, pos1));
                actual_line_buf.append(curr_line.substring(pos2, pos3));
                actual_line_buf.append(curr_line.substring(pos3, pos4));
                actual_line_buf.append(curr_line.substring(pos5, length));
                actual_line_buf.append("\n");
                // System.out.println("actual_line " + actual_line_buf.toString() + "\n ");
                s2_buf.insert(0, actual_line_buf.toString());
                // System.out.println("s2_buf " + s2_buf.toString() + "\n ");
                actual_line_buf.delete(0, (length + 1));
                // System.out.println("actual_line " + actual_line_buf.toString() + "\n ");
            }

            try {
                // System.out.println("s2_buf " + s2_buf.toString() + "\n ");
                BufferedWriter destination_1 =
                        new BufferedWriter(
                                new OutputStreamWriter(
                                        new FileOutputStream("C:\\Stocks\\Stock Data\\"
                                                + TickerSymbol + ".csv")));
                destination_1.write(s2_buf.toString(), 0, s2_buf.toString().length());
                destination_1.flush();
                destination_1.close();
            } catch (IOException io) {
                System.out.println("Error opening output file \n ");
            }
        } // End change_string
    }

Listing 1. Source code for download_hist.java.

    Date dt = new Date(System.currentTimeMillis());
    String yyyy_mm_dd = (formatter.format(dt));
    String curr_year = yyyy_mm_dd.substring(0, 4);
    String curr_month = yyyy_mm_dd.substring(5, 7);
    String curr_date = yyyy_mm_dd.substring(8, 10);

    // Go Back 5 Years
    int_from_year = Integer.parseInt(curr_year) - 5;

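In the lines shown above, the current system date is formatted as yyyy/MM/dd and split into year, month, and day strings; five is then subtracted from the current year to get the starting year.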
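The same date arithmetic can be sketched with the java.time API, which is newer than this article and is used here only as an alternative to parsing substrings out of a formatted date. The class and method names below are hypothetical and do not appear in Listing 1.

```java
import java.time.LocalDate;
import java.util.Locale;

// Hypothetical sketch: derive the start year and zero-padded month/day fields.
class DateWindowSketch {

    /** Returns the year five years before the year of the given date. */
    static int fromYear(LocalDate today) {
        return today.minusYears(5).getYear();
    }

    /** Zero-pads a month or day to two digits, like the "MM" and "dd" pattern fields. */
    static String twoDigits(int value) {
        return String.format(Locale.ROOT, "%02d", value);
    }

    public static void main(String[] args) {
        LocalDate today = LocalDate.now();
        System.out.println("Start year: " + fromYear(today));
        System.out.println("Month: " + twoDigits(today.getMonthValue())
                + ", day: " + twoDigits(today.getDayOfMonth()));
    }
}
```

Working with numeric fields avoids depending on the exact character positions of the formatted date string.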

Next, construct a TargetUrl String with the parameters required by the site (the ticker symbol and the start and end dates). The URL class is extremely useful for this kind of networking, since it significantly simplifies the programmer's work.
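As a rough sketch, the URL construction can be pulled into a small helper so the query string is easy to inspect without opening a connection. The helper names are hypothetical; the host, path, and parameter names are copied from Listing 1, and nothing here assumes the site still answers requests in this form.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical sketch: build the same query string as Listing 1, without any network access.
class TargetUrlSketch {

    /** URL-encodes the ticker symbol so special characters are safe in a query string. */
    static String encodeSymbol(String ticker) {
        try {
            return URLEncoder.encode(ticker, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every Java platform is required to support UTF-8.
            throw new IllegalStateException(e);
        }
    }

    /** Assembles the URL; month and day are the two-digit strings used for both dates. */
    static String buildTargetUrl(String ticker, String month, String day,
                                 int fromYear, int toYear) {
        String symbol = encodeSymbol(ticker);
        return "http://chart.yahoo.com/table.csv?s=" + symbol
                + "&a=" + month + "&b=" + day + "&c=" + fromYear
                + "&d=" + month + "&e=" + day + "&f=" + toYear
                + "&g=w&q=q&y=0&z=" + symbol + "&x=.csv";
    }

    public static void main(String[] args) {
        System.out.println(buildTargetUrl("INTC", "01", "17", 1995, 2000));
    }
}
```

The two-argument URLEncoder.encode is used here because the one-argument form in Listing 1 relies on the platform's default encoding.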

An input stream is set up and buffered (buffered streams increase efficiency substantially). The program then reads the stream in a loop, one block of up to 8,192 bytes at a time. Each block is written to a file and also added to a StringBuffer, strbuf, with the append method. When there is no more information to be read from the input stream, in.read returns -1, so bytes_read is set to -1 and the program breaks out of the loop. It then closes the connection with in.close() and the file output stream with destination.close().

    while (true) {
        // System.out.println(inputLine);
        int bytes_read = in.read(buffer);
        if (bytes_read == -1) break;
        destination.write(buffer, 0, bytes_read); // write to File
        System.out.println("Downloaded " + bytes_read + " bytes for ticker " + TickerSymbol);
        strbuf.append(new String(buffer, 0, bytes_read));
    }
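The read loop can be tried without a network connection. The sketch below is an illustration under stated assumptions: an in-memory stream stands in for connection.getInputStream(), a ByteArrayOutputStream stands in for the output file, and the data is assumed to be plain ASCII. The class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: copy a stream to a sink while keeping a copy in memory.
class ReadLoopSketch {

    /** Reads until read() returns -1, writing each block to out and returning all of it as text. */
    static String copyAndCapture(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[8192];
        ByteArrayOutputStream captured = new ByteArrayOutputStream();
        int bytesRead;
        while ((bytesRead = in.read(buffer)) != -1) {
            out.write(buffer, 0, bytesRead);      // the "file" copy
            captured.write(buffer, 0, bytesRead); // the in-memory copy
        }
        // Assumption: the CSV is ASCII. Decoding once at the end avoids
        // splitting a character across two blocks.
        return new String(captured.toByteArray(), StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) throws IOException {
        byte[] sample = ("Week of,Open,High,Low,Close,Volume\n"
                + "17-Jan- 0,100.2969,105.75,99.875,100.0625,36917100\n")
                .getBytes(StandardCharsets.US_ASCII);
        ByteArrayOutputStream file = new ByteArrayOutputStream();
        String text = copyAndCapture(new ByteArrayInputStream(sample), file);
        System.out.println("Captured " + text.length() + " characters, wrote "
                + file.size() + " bytes");
    }
}
```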

The information in the strbuf variable is in the format shown below.

    Week of, Open, High, Low, Close, Volume
    17-Jan- 0,100.2969,105.75,99.875,100.0625,36917100
    10-Jan- 0,85.75,106.625,84.125,103.0625,59506200
    3-Jan- 0,83.2656,87.375,77.375,82,24701600
    ..
    30-Jan-95,8.8713,9.2455,8.7154,9.1676,6487400
    23-Jan-95,8.5283,8.9181,8.4816,8.8557,5790200
    16-Jan-95,8.5439,8.809,8.4504,8.5829,6431200

Manipulating the Downloaded Information

The change_string method is then called to manipulate the contents downloaded to strbuf. It makes the following changes to the downloaded information.
  • Eliminate the header line (Week of, ...)
  • Eliminate the Open and the Close columns in every line
  • Reverse the order of the lines so the last line in the original download becomes the first line.

Eliminate the Header Line

The StringTokenizer class is used to parse the contents of the String s1 into individual lines as shown below.

    String s1 = new String(strbuf.toString());
    StringTokenizer tokens = new StringTokenizer(s1, "\n");
    StringBuffer s2_buf = new StringBuffer();
    StringBuffer actual_line_buf = new StringBuffer();

Because "\n" is used as the delimiter, every call to tokens.nextToken() returns the next line. To skip the header line, call nextToken() once and discard the result.

    tokens.nextToken();
    while (tokens.hasMoreTokens())

The tokens.hasMoreTokens() method returns true while any tokens are left; use it to loop through all the tokens (lines) until the end.
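A minimal sketch of the header-skipping step follows. The class and method names are hypothetical, and the guard before the first nextToken() is an addition that Listing 1 does not have; without it, an empty download would throw NoSuchElementException.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

// Hypothetical sketch: split text into lines and drop the first (header) line.
class SkipHeaderSketch {

    /** Returns every line after the header; an empty input yields an empty list. */
    static List<String> dataLines(String csvText) {
        StringTokenizer tokens = new StringTokenizer(csvText, "\n");
        List<String> lines = new ArrayList<>();
        if (tokens.hasMoreTokens()) {
            tokens.nextToken(); // discard the header line
        }
        while (tokens.hasMoreTokens()) {
            lines.add(tokens.nextToken());
        }
        return lines;
    }

    public static void main(String[] args) {
        String text = "Week of,Open,High,Low,Close,Volume\n"
                + "17-Jan- 0,100.2969,105.75,99.875,100.0625,36917100\n"
                + "10-Jan- 0,85.75,106.625,84.125,103.0625,59506200\n";
        for (String line : dataLines(text)) {
            System.out.println(line);
        }
    }
}
```

Note that StringTokenizer treats consecutive delimiters as one, so blank lines in the input are skipped rather than returned as empty tokens.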

Eliminate the Open and the Close Columns

Now, process each line to eliminate the columns Open and Close. Look at the first line processed within the loop:

        17-Jan- 0,100.2969,105.75,99.875,100.0625,36917100

You must eliminate the Open value (100.2969) and the Close value (100.0625). Simply identify the locations of the ',' characters and extract the appropriate portions of the line. The variables pos1, pos2, ... pos5 hold the positions just past each ',' character.

    pos1 = curr_line.indexOf(',', 1) + 1;

The substring method is used to retrieve the required portions of the string. The first line below appends to actual_line_buf the contents of curr_line from 0 to pos1, which would be 17-Jan- 0, (the date followed by its comma, because pos1 points just past the first comma).

    actual_line_buf.append(curr_line.substring(0, pos1));
    actual_line_buf.append(curr_line.substring(pos2, pos3));
    actual_line_buf.append(curr_line.substring(pos3, pos4));
    actual_line_buf.append(curr_line.substring(pos5, length));
    actual_line_buf.append("\n");

The ranges pos1 to pos2 (Open) and pos4 to pos5 (Close) are not extracted, because they are to be discarded.
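The column-dropping logic can be isolated in one small method. This is a sketch with hypothetical names; it assumes each line has the six comma-separated fields shown earlier and does no validation for shorter lines.

```java
// Hypothetical sketch: remove the Open and Close columns from one CSV line.
class DropColumnsSketch {

    /** Keeps date, High, Low, and Volume; assumes six comma-separated fields. */
    static String dropOpenAndClose(String line) {
        int pos1 = line.indexOf(',', 1) + 1;     // start of Open
        int pos2 = line.indexOf(',', pos1) + 1;  // start of High
        int pos3 = line.indexOf(',', pos2) + 1;  // start of Low
        int pos4 = line.indexOf(',', pos3) + 1;  // start of Close
        int pos5 = line.indexOf(',', pos4) + 1;  // start of Volume
        return line.substring(0, pos1)           // date and its comma
                + line.substring(pos2, pos4)     // High and Low with their commas
                + line.substring(pos5);          // Volume
    }

    public static void main(String[] args) {
        System.out.println(
                dropOpenAndClose("17-Jan- 0,100.2969,105.75,99.875,100.0625,36917100"));
    }
}
```

For the sample line this prints 17-Jan- 0,105.75,99.875,36917100, which matches the last line of the final output.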

Sort the File

Reverse the file so the last line in the original download becomes the first line. Do this by using the insert method with offset 0, which places each processed line before the current contents of s2_buf. Note the subtle difference between the append and the insert methods of the StringBuffer class: append adds text at the end of the buffer, while insert adds it at the offset you specify.

    s2_buf.insert(0, actual_line_buf.toString());
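A small sketch makes the reversing effect of insert(0, ...) visible. The class and method names are hypothetical.

```java
// Hypothetical sketch: reverse line order by always inserting at offset 0.
class ReverseLinesSketch {

    /** Returns the lines in reverse order, each terminated by a newline. */
    static String reverseLines(String[] lines) {
        StringBuffer reversed = new StringBuffer();
        for (String line : lines) {
            // Each new line goes in front of everything added so far.
            reversed.insert(0, line + "\n");
        }
        return reversed.toString();
    }

    public static void main(String[] args) {
        String[] newestFirst = {
            "17-Jan- 0,105.75,99.875,36917100",
            "10-Jan- 0,106.625,84.125,59506200",
            "3-Jan- 0,87.375,77.375,24701600"
        };
        System.out.print(reverseLines(newestFirst));
    }
}
```

Inserting at the front shifts the existing characters on every call, so the cost grows with the buffer size. For a few hundred weekly rows that is negligible; for much larger inputs, collecting the lines in a list and walking it backward would be a reasonable alternative.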

Contents of s2_buf at the end of the change_string method:

    16-Jan-95,8.809,8.4504,6431200
    23-Jan-95,8.9181,8.4816,5790200
    30-Jan-95,9.2455,8.7154,6487400
    ......
    3-Jan- 0,87.375,77.375,24701600
    10-Jan- 0,106.625,84.125,59506200
    17-Jan- 0,105.75,99.875,36917100

The contents of s2_buf are then written to the file C:\Stocks\Stock Data\<Ticker>.csv.
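The write step can also be sketched with try-with-resources, which closes the writer even if the write fails. This uses the java.nio.file API rather than the stream classes in Listing 1, writes to a temporary file instead of the C:\Stocks directory, and assumes ASCII output; the class and method names are hypothetical.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical sketch: write the processed text to a file and close it reliably.
class WriteResultSketch {

    /** Writes text to target, replacing any existing content. */
    static void writeText(Path target, String text) throws IOException {
        try (BufferedWriter writer =
                Files.newBufferedWriter(target, StandardCharsets.US_ASCII)) {
            writer.write(text);
        } // the writer is flushed and closed here, even on failure
    }

    public static void main(String[] args) throws IOException {
        // A temporary file stands in for the real output path.
        Path target = Files.createTempFile("intc", ".csv");
        writeText(target, "16-Jan-95,8.809,8.4504,6431200\n");
        System.out.println("Wrote " + Files.size(target) + " bytes to " + target);
    }
}
```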

Summary

Java is well-suited for programs that harvest information from the Web. Happily, the URL class hides much of the complexity associated with networking. The StringBuffer class is useful for holding the contents downloaded into memory; the StringTokenizer class is effective in parsing the String into tokens and then manipulating them. This program is easily adapted to harvesting other specific kinds of information from the Web.

To compile:

    javac download_hist.java

To run:

    java download_hist intc

Files created:

    C:\Stocks\Stock Data\intc.csv
    C:\Stocks\Stock Data\intc_old.csv

About the Author

Sharath Sahadevan is a senior software engineer with MasterCard International in St. Louis. His team supports the MasterCard Settlement Account Management application. He has a bachelor's degree in electrical and electronics engineering from P.S.G College of Technology, India (1990). When not working, he enjoys spending time with his family, as well as playing tennis, cricket, basketball, and chess.





