Downloading stock prices in F# - Part IV - Async loader for splits

Luca Bolognese - Sep 2008

Other parts:

Downloading splits is a messy affair. The problem is that Yahoo doesn’t give you a nice comma-delimitated stream to work with. You have to parse the Html yourself (and it can be on multiple pages). At the end of the post, the overall result is kind of neat, but to get there we need a lot of busywork.

First, let’s deﬁne a function that constructs the correct URL to download splits from. Notice that you need to pass a page number to it.

let splitUrl ticker span page =
    "http://finance.yahoo.com/q/hp?s=" + ticker + "&a="
    + (span.Start.Month - 1).ToString() + "&b=" + span.Start.Day.ToString() + "&c=" 
    + span.Start.Year.ToString() + "&d=" + (span.End.Month - 1).ToString() + "&e="
    + span.End.Day.ToString() + "&f=" + span.End.Year.ToString() + "&g=v&z=66&y="
    + (66 * page).ToString();

The reason for this particular url format (i.e. 66 * page) is completely unknown to me. I also have the feeling that it might change in the future. Or maybe not given how many people rely on it.

I then describe the driver function for loading splits:

let rec loadWebSplitAsync ticker span page splits =
    let parseSplit text splits =
        List.append splits (parseSplits (scrapHtmlRows text)),
                                           not(containsDivsOrSplits (scrapHtmlCells text))
    async {
        let url = splitUrl ticker span page
        let! text = loadWebStringAsync url
        let splits, beyondLastPage = parseSplit text splits
        if beyondLastPage then return splits else
                                 return! loadWebSplitAsync ticker span (page + 1) splits }

This is a bit convoluted (it is an Async recursive function). Let’s go through it in some detail. First there is a nested function parseSplit. It takes an html string and a list of observations and returns a tuple of two elements. The ﬁrst element is the same list of observations augmented with the splits found in the text. The second element is a boolean that is true if we have navigated beyond the last page for the splits.

The function to test that we are beyond the last page is the following:

let containsDivsOrSplits cells =
    cells |> Seq.exists
        (fun (x:string) -> Regex.IsMatch(x, @"$.+Dividend", RegexOptions.Multiline)
                           || Regex.IsMatch(x, "Stock Split"))

This function just checks if the words Stock Split or Dividend are anywhere in the table. If they aren’t, then we have ﬁnished processing the pages for this particular ticker and date span.

The function to extract the splits observations from the web page takes some cells (a seq<seq>) as input and returns an observation list. It is reproduced below:

let parseSplits rows =
    let parseRow row =
        if row |> Seq.exists (fun (x:string) -> x.Contains("Stock Split"))
        then
            let dateS = Seq.hd row
            let splitS = Seq.nth 1 row
            let date = DateTime.Parse(dateS)
            let regex = Regex.Match(splitS,@"(d+)s+:s+(d+)s+Stock Split",
                                                                   RegexOptions.Multiline)
            let newShares = shares (float (regex.Groups.Item(1).Value))
            let oldShares = shares (float (regex.Groups.Item(2).Value))
            Some({Date = date; Event = Split(newShares / oldShares)})
        else None
    rows |> Seq.choose parseRow |> Seq.to_list

It just take a bunch of rows and choose the ones that contain stock split information. For these, it parses the information out of the text and creates a Split Observation out of it. I think it is intuitive what the various Seq functions do in this case. Also note my overall addiction to the pipe operator ( |> ). In my opinion this is the third most important keyword in F# (after ‘let’ and ‘match’).

Let’s now go back to the loadWebSplitAsync function and discuss the rest of it. In particular this part:

async {
    let url = splitUrl ticker span page
    let! text = loadWebStringAsync url
    let splits, beyondLastPage = parseSplit text splits
    if beyondLastPage then return splits else
        return! loadWebSplitAsync ticker span (page + 1) splits }

First of all it is an Async function. You should expect some Async stuff to go on inside it. And indeed, after forming the URL in the ﬁrst line, the very next line is a call to loadWebStringAsync. We discussed this one in the previous installment. It just asynchronously loads a string from an URL. Notice the bang after ‘let’. This is your giveaway that async stuff is being performed.

The result of the async request is parsed to extract splits. Also, the beyondLastPage ﬂag is set if we have ﬁnished our work. If we have, we return the split observation list; if we haven’t, we do it again incrementing the page number to load the html text from.

Now that we have all the pieces in places, we can wrap up the split loading stuff inside this facade function:

let loadSplitsAsync ticker span = loadWebSplitAsync ticker span 0 []

And ﬁnally put together the results of this post and the previous one with the overall function-to-rule-them-all:

let loadTickerAsync ticker span =
    async {
        let prices = loadPricesAsync ticker span
        let divs =  loadDivsAsync ticker span
        let splits = loadSplitsAsync ticker span
        let! prices, divs, splits = Async.Parallel3 (prices, divs, splits)
        return prices |> List.append divs |> List.append splits
        }

All right, that was a lot of work to get to this simple thing. This is a good entry point to our price/divs/split loading framework. It has the right inputs and outputs: it takes a ticker and a date span and returns an Async of a list of observations. Our caller can decide when he wants to execute the returned Async object.

Notice that in the body of the function I call Async.Parallel. This is debatable. A more ﬂexible solution is to return a tuple containing three Asyncs (prices, divs, splits) and let the caller decide how to put them together. I decided against this for simplicity reasons. This kind of trade-off is very common in Async programming: giving maximum ﬂexibility to your caller against exposing something more understandable.

I have to admit I didn’t enjoy much writing (and describing) all this boilerplate code. I’m sure it can be written in a better way. I might rewrite plenty of it if I discover bugs. I kind of like the end result though. loadTickerAsync has an overall structure I’m pretty happy with.

Next post, some algorithms with our observations

Comments

Luca Bolognese's WebLog : Down

2008-09-19T18:01:11Z

PingBack from http://blogs.msdn.com/lucabol/archive/2008/09/12/downloading-stock-prices-in-f-part-iii-async-loader-for-prices-and-divs.aspx

Luca Bolognese's WebLog

2008-10-20T18:48:07Z

Other parts: Part I - Data modeling Part II - Html scraping Part III - Async loader for prices and divs

Tags

Comments

Luca Bolognese's WebLog : Down

Luca Bolognese's WebLog