Situation: You want to automate the retrieval of web data that is of interest to you, but was not necessarily designed for you.
In this paper, we will download historical NYSE quotes from Yahoo TechTicker.
Yahoo Finance offers a free service that lets you retrieve historical quotes for any NYSE ticker in an easy-to-use CSV format.
Its URL is (and you can try it): http://ichart.finance.yahoo.com/table.csv?s=MSFT
Nothing fancy here: no SOAP, no XML, just a plain download of CSV data, and yet it is a useful web service.
It has a few parameters, but the only required one is s, the ticker symbol.
There are also other parameters (start and end date, frequency...), but it would not be useful to detail them here.
All this interface information is public: you see it in the web browser's URL bar when you request historical quotes from their website.
Because it is so simple, it can all be used directly from HTML; you do not even need a server, because your browser does the work for you.
But using a web browser is not automatic, so some questions need another answer, for example:
- How do I download the quotes for the whole Fortune 500 without spending my day clicking and saving files?
SHEEP's HttpGetClient allows you to automate your downloads. Link: Free HttpGetClient download.
Let us see how we do that with four downloads and I will let you generalize on your own time.
For personal reasons, I am very fond of Johnson and Johnson (JNJ), Pfizer (PFE), Pernod-Ricard (PER.F) and Yahoo (YHOO), so I will use them and download all their quotes from the beginning of 2008.
Here are the no-brainer cmd and shell scripts.
Windows Script
set URL1="http://ichart.finance.yahoo.com/table.csv?s=PFE&a=00&b=1&c=2008"
set URL2="http://ichart.finance.yahoo.com/table.csv?s=JNJ&a=00&b=1&c=2008"
set URL3="http://ichart.finance.yahoo.com/table.csv?s=PER.F&a=00&b=1&c=2008"
set URL4="http://ichart.finance.yahoo.com/table.csv?s=YHOO&a=00&b=1&c=2008"
java -jar CMHttpGetClient14.jar url=%URL1%
java -jar CMHttpGetClient14.jar url=%URL2%
java -jar CMHttpGetClient14.jar url=%URL3%
java -jar CMHttpGetClient14.jar url=%URL4%
Unix Script
export URL1="http://ichart.finance.yahoo.com/table.csv?s=PFE&a=00&b=1&c=2008"
export URL2="http://ichart.finance.yahoo.com/table.csv?s=JNJ&a=00&b=1&c=2008"
export URL3="http://ichart.finance.yahoo.com/table.csv?s=PER.F&a=00&b=1&c=2008"
export URL4="http://ichart.finance.yahoo.com/table.csv?s=YHOO&a=00&b=1&c=2008"
java -jar CMHttpGetClient14.jar url="$URL1"
java -jar CMHttpGetClient14.jar url="$URL2"
java -jar CMHttpGetClient14.jar url="$URL3"
java -jar CMHttpGetClient14.jar url="$URL4"
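To generalize the scripts beyond four tickers, the URLs can be generated rather than typed by hand. The following is only a sketch under stated assumptions and is not part of HttpGetClient: the class QuoteUrls and its methods are hypothetical names, and the reading of a, b and c as zero-based start month, start day and start year is an assumption based on the values used in the scripts.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical helper, not part of HttpGetClient.
final class QuoteUrls {
    // Base address taken from the scripts in this article.
    static final String BASE = "http://ichart.finance.yahoo.com/table.csv";

    // Builds one quote URL. Assumption: "a" is the zero-based start month
    // (January = 00), "b" the start day and "c" the start year.
    static String build(String ticker, int year, int month, int day) {
        try {
            String a = String.format(Locale.ROOT, "%02d", month - 1);
            return BASE + "?s=" + URLEncoder.encode(ticker, "UTF-8")
                    + "&a=" + a + "&b=" + day + "&c=" + year;
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on a Java platform.
            throw new IllegalStateException(e);
        }
    }

    // Builds one URL per ticker, all sharing the same start date.
    static List<String> buildAll(List<String> tickers, int year, int month, int day) {
        List<String> urls = new ArrayList<String>();
        for (String ticker : tickers) {
            urls.add(build(ticker, year, month, day));
        }
        return urls;
    }

    public static void main(String[] args) {
        List<String> tickers = Arrays.asList("PFE", "JNJ", "PER.F", "YHOO");
        for (String url : buildAll(tickers, 2008, 1, 1)) {
            System.out.println(url);
        }
    }
}
```

Each printed line can then be passed as the url= argument of the jar, one invocation per line.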
What to do and what to expect?
Download CMHttpGetClient14.zip using the link above.
We will assume you are running Windows and already have Java installed; adapt the steps to suit your own installation.
Unzip it in your programs folder and create a batch file called getTicker.cmd in C:\Program Files\CMHttpGetClient14.
Copy the Windows script above into getTicker.cmd and save.
Double click the batch file to start it.
The program creates a directory called ichart.finance.yahoo.com in your home directory (C:\Documents and Settings\you).
The program creates such a directory for each server it uses; this helps you find your past queries from the file manager.
You can specify a location other than your home directory, but that is outside the scope of this article.
Inside this server directory you will find a directory called table.csv.
The program creates a sub-directory for each service it uses; again, this helps you find your past queries.
In there, you will find four triplets of files with strange names.
The names of files in a triplet start with a unique id like -SHEEP-1219036228515-SHEEP-BJXRMK-SHEEP-
The first file has the .request.txt extension and remembers the request you issued.
The second one has the .log.txt extension and remembers the technical execution log.
The third one has the .response.txt extension and contains the returned data.
Note: .response.txt is only an extension; the file can actually contain anything (pictures, video, XML...).
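The triplet convention described above makes it easy to regroup files by request. Here is a minimal sketch of that idea; it assumes that each file name is the unique id followed directly by one of the three extensions, which the article suggests but does not guarantee. The class TripletGrouper is a hypothetical name, and the second id in the sample is made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of HttpGetClient.
final class TripletGrouper {
    // The three extensions described in the article.
    static final String[] EXTENSIONS = {".request.txt", ".log.txt", ".response.txt"};

    // Groups file names by the id that precedes a known extension.
    // Names with any other ending are ignored.
    static Map<String, List<String>> group(List<String> fileNames) {
        Map<String, List<String>> byId = new LinkedHashMap<String, List<String>>();
        for (String name : fileNames) {
            for (String ext : EXTENSIONS) {
                if (name.endsWith(ext)) {
                    String id = name.substring(0, name.length() - ext.length());
                    List<String> files = byId.get(id);
                    if (files == null) {
                        files = new ArrayList<String>();
                        byId.put(id, files);
                    }
                    files.add(name);
                    break;
                }
            }
        }
        return byId;
    }

    public static void main(String[] args) {
        // Sample names; the second id is invented for this example.
        List<String> names = Arrays.asList(
                "-SHEEP-1219036228515-SHEEP-BJXRMK-SHEEP-.request.txt",
                "-SHEEP-1219036228515-SHEEP-BJXRMK-SHEEP-.log.txt",
                "-SHEEP-1219036228515-SHEEP-BJXRMK-SHEEP-.response.txt",
                "-SHEEP-1219036229000-SHEEP-QWERTY-SHEEP-.response.txt");
        for (Map.Entry<String, List<String>> entry : group(names).entrySet()) {
            System.out.println(entry.getKey() + " -> " + entry.getValue());
        }
    }
}
```

In practice the list of names could come from java.io.File.list() on the table.csv directory.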
With such a simple script, that is all you get; the rest is up to you.
The point is that the HTTP part has been taken care of, and you already know what you want to do with the response (rename, move, import into a database...).
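As one illustration of what "import into a database" might start with, here is a rough sketch that turns the lines of a CSV response into one map per row. It rests on several assumptions: the first line is a header, fields contain no quoted commas, and the column names and values in the sample are invented for this example rather than taken from a real download. The class QuoteCsv is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of HttpGetClient.
final class QuoteCsv {
    // Parses CSV lines into rows keyed by the header names.
    // Assumption: plain comma-separated values, no quoting or escaping.
    static List<Map<String, String>> parse(List<String> lines) {
        List<Map<String, String>> rows = new ArrayList<Map<String, String>>();
        if (lines.isEmpty()) {
            return rows;
        }
        String[] header = lines.get(0).split(",");
        for (int i = 1; i < lines.size(); i++) {
            String line = lines.get(i).trim();
            if (line.isEmpty()) {
                continue; // skip blank lines
            }
            String[] values = line.split(",", -1); // keep empty trailing fields
            Map<String, String> row = new LinkedHashMap<String, String>();
            for (int c = 0; c < header.length && c < values.length; c++) {
                row.put(header[c].trim(), values[c].trim());
            }
            rows.add(row);
        }
        return rows;
    }

    public static void main(String[] args) {
        // Invented sample data, for illustration only.
        List<String> sample = Arrays.asList(
                "Date,Close,Volume",
                "2008-01-02,10.50,1000",
                "2008-01-03,10.75,1200");
        for (Map<String, String> row : parse(sample)) {
            System.out.println(row.get("Date") + " closed at " + row.get("Close"));
        }
    }
}
```

The lines of a real .response.txt file could be read with java.nio.file.Files.readAllLines and passed to parse in the same way.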
Here is a simple Java program that downloads the same tickers and, at the same time, renames the files to match their ticker.
Compile it using your favorite method; all you have to do is put CMHttpGetClient14.jar on your classpath.
import com.compomentis.httpgetclient.HttpGetClient;
import java.io.File;
import java.io.IOException;
import java.util.Calendar;
import java.util.Vector;

public class TickerImport
{
    Vector tickers = null;
    HttpGetClient httpGetClient = null;

    /** Creates a new instance of TickerImport */
    public TickerImport()
    {
        tickers = new Vector();
    }

    public static void main(String args[]) throws IOException
    {
        TickerImport tickerImport = new TickerImport();
        tickerImport.getTickers();
        tickerImport.createClient();
        tickerImport.run();
    }

    void getTickers()
    {
        // Here I hard-coded a list, you may want to do something more elegant
        // What about getting it from a file, a database or even use HttpGetClient to get it from a web service?
        tickers.add("PFE");
        tickers.add("JNJ");
        tickers.add("YHOO");
        tickers.add("PER.F");
    }

    void createClient() throws IOException
    {
        httpGetClient = new HttpGetClient();
    }

    void run() throws IOException
    {
        for(int i=0; i < tickers.size(); i++)
        {
            String tickerId = (String) tickers.get(i);
            prepareRequest(tickerId);
            httpGetClient.run();
            File responseFile = new File(httpGetClient.getResponseFile());
            File betterFile = new File(responseFile.getParentFile(), tickerId + ".csv");
            responseFile.renameTo(betterFile);
        }
    }

    void prepareRequest(String tickerId) throws IOException
    {
        httpGetClient.setUrl("http://ichart.finance.yahoo.com/table.csv");
        httpGetClient.clearFields();
        httpGetClient.setField("a","00"); // January
        httpGetClient.setField("b","1"); // The first
        httpGetClient.setField("c","2008"); // 2008
        httpGetClient.setField("s", tickerId);
    }
}
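The comment in getTickers() suggests loading the ticker list from a file instead of hard-coding it. One possible way to do that is sketched here, using only the standard library. The class TickerList, the one-symbol-per-line layout and the use of # for comment lines are all assumptions made for this example.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of HttpGetClient.
final class TickerList {
    // Extracts ticker symbols from text lines: one symbol per line,
    // ignoring blank lines and lines starting with '#'.
    static List<String> fromLines(List<String> lines) {
        List<String> tickers = new ArrayList<String>();
        for (String line : lines) {
            String symbol = line.trim();
            if (symbol.isEmpty() || symbol.startsWith("#")) {
                continue;
            }
            tickers.add(symbol);
        }
        return tickers;
    }

    public static void main(String[] args) throws IOException {
        // Read the file named on the command line, or fall back to a small sample.
        List<String> lines = args.length > 0
                ? Files.readAllLines(Paths.get(args[0]))
                : Arrays.asList("# my favorite tickers", "PFE", "", " JNJ ");
        System.out.println(fromLines(lines));
    }
}
```

The resulting list could then replace the hard-coded calls in getTickers(), for example with tickers.addAll(...).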
Conclusion
Automating web data retrieval does not have to be a sophisticated task requiring lengthy coding.
At the time of writing this article (23/08/08), the four downloads amount to about one megabyte.
The script takes a minute to write and runs in about twenty seconds.
The Java program takes five minutes to write and runs in a little less than six seconds.
Just for the fun of it, I spent an hour writing a program that does the same for 1,731 tickers and for all available quotes since 01/01/1970. All up, I retrieved about three million quotes in two hours (database import included).
No XML, no SOAP, no framework...