Downloading stock prices in F# - Part IV - Async loader for splits


Other parts: Part I - Data modeling, Part II - Html scraping, Part III - Async loader for prices and divs

Downloading splits is a messy affair. The problem is that Yahoo doesn’t give you a nice comma-delimited stream to work with. You have to parse the HTML yourself (and it can be spread over multiple pages). At the end of the post, the overall result is kind of neat, but to get there we need a lot of busywork.

First, let’s define a function that constructs the correct URL to download splits from. Notice that you need to pass a page number to it.

let splitUrl ticker span page =
    // Months are sent as Month - 1; the y parameter is the row offset of the page.
    "http://finance.yahoo.com/q/hp?s=" + ticker + "&a="
    + (span.Start.Month - 1).ToString() + "&b=" + span.Start.Day.ToString() + "&c="
    + span.Start.Year.ToString() + "&d=" + (span.End.Month - 1).ToString() + "&e="
    + span.End.Day.ToString() + "&f=" + span.End.Year.ToString() + "&g=v&z=66&y="
    + (66 * page).ToString()

The reason for this particular URL format (i.e. 66 * page) is completely unknown to me. I also have the feeling that it might change in the future. Or maybe not, given how many people rely on it.
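To see what the page parameter does to the URL, here is a minimal sketch. The Span record is a hypothetical stand-in for the date span type from earlier in the series, splitUrlSketch is an illustrative name, and sprintf replaces the string concatenation; the query layout is copied from splitUrl.

```fsharp
open System

// Hypothetical stand-in for the date span type defined earlier in the series.
type Span = { Start: DateTime; End: DateTime }

// Same query layout as splitUrl; the month is sent as Month - 1, as in the original.
let splitUrlSketch (ticker: string) (span: Span) (page: int) =
    sprintf "http://finance.yahoo.com/q/hp?s=%s&a=%d&b=%d&c=%d&d=%d&e=%d&f=%d&g=v&z=66&y=%d"
        ticker
        (span.Start.Month - 1) span.Start.Day span.Start.Year
        (span.End.Month - 1) span.End.Day span.End.Year
        (66 * page)

let span = { Start = DateTime(2008, 1, 15); End = DateTime(2008, 10, 20) }

// Successive pages differ only in the trailing y offset: 0, 66, 132.
let urls = [ for page in 0 .. 2 -> splitUrlSketch "MSFT" span page ]
urls |> List.iter (printfn "%s")
```

Only the last query parameter changes from page to page, which is why the driver function can simply increment the page number.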

I then describe the driver function for loading splits:

let rec loadWebSplitAsync ticker span page splits =
    // Returns the accumulated splits and a flag telling whether we are past the last page.
    let parseSplit text splits =
        List.append splits (parseSplits (scrapHtmlRows text)),
        not(containsDivsOrSplits (scrapHtmlCells text))
    async {
        let url = splitUrl ticker span page
        let! text = loadWebStringAsync url
        let splits, beyondLastPage = parseSplit text splits
        if beyondLastPage then return splits
        else return! loadWebSplitAsync ticker span (page + 1) splits }

This is a bit convoluted (it is a recursive Async function). Let’s go through it in some detail. First there is a nested function, parseSplit. It takes an HTML string and a list of observations and returns a tuple of two elements. The first element is the same list of observations augmented with the splits found in the text. The second element is a boolean that is true if we have navigated beyond the last page for the splits.

The function to test that we are beyond the last page is the following:

let containsDivsOrSplits cells =
    // True if any cell looks like a dividend ("$... Dividend") or a stock split.
    cells |> Seq.exists
                (fun (x:string) -> Regex.IsMatch(x, @"\$.+Dividend", RegexOptions.Multiline)
                                   || Regex.IsMatch(x, "Stock Split"))

This function just checks if the words Stock Split or Dividend are anywhere in the table. If they aren’t, then we have finished processing the pages for this particular ticker and date span.
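A quick way to convince yourself of the stopping test is to run it on hand-written cells. This is a minimal sketch: the cell contents are hypothetical (the real Yahoo markup may differ), and containsDivsOrSplitsSketch is an illustrative copy of the function above.

```fsharp
open System.Text.RegularExpressions

// Same test as containsDivsOrSplits, applied to a flat sequence of cell texts.
let containsDivsOrSplitsSketch (cells: seq<string>) =
    cells |> Seq.exists (fun x ->
        Regex.IsMatch(x, @"\$.+Dividend", RegexOptions.Multiline)
        || Regex.IsMatch(x, "Stock Split"))

// Hypothetical cell contents, made up for illustration.
let pageWithEvents = [ "Feb 18, 2003"; "2 : 1 Stock Split"; "Nov 15, 2005"; "$0.08 Dividend" ]
let pageWithoutEvents = [ "Date"; "Open"; "Close" ]

let hasEvents = containsDivsOrSplitsSketch pageWithEvents
let hasNoEvents = containsDivsOrSplitsSketch pageWithoutEvents
printfn "%b %b" hasEvents hasNoEvents
```

A page with no matching cell is the signal that the loader has walked past the last page.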

The function to extract the split observations from the web page takes some rows (a seq<seq<string>>) as input and returns an observation list. It is reproduced below:

let parseSplits rows =
    let parseRow row =
        if row |> Seq.exists (fun (x:string) -> x.Contains("Stock Split"))
        then
            // First cell is the date, second cell is the "new : old Stock Split" text.
            let dateS = Seq.hd row
            let splitS = Seq.nth 1 row
            let date = DateTime.Parse(dateS)
            let regex = Regex.Match(splitS,@"(\d+)\s+:\s+(\d+)\s+Stock Split",
                                    RegexOptions.Multiline)
            let newShares = shares (float (regex.Groups.Item(1).Value))
            let oldShares = shares (float (regex.Groups.Item(2).Value))
            Some({Date = date; Event = Split(newShares / oldShares)})
        else None
    rows |> Seq.choose parseRow |> Seq.to_list

It just takes a bunch of rows and chooses the ones that contain stock split information. For these, it parses the information out of the text and creates a Split Observation out of it. I think it is intuitive what the various Seq functions do in this case. Also note my overall addiction to the pipe operator ( |> ). In my opinion this is the third most important keyword in F# (after ‘let’ and ‘match’).
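The parsing step can be tried in isolation. This is a minimal sketch under a few assumptions: the split ratio is a plain float instead of the unit-of-measure shares and Observation types from earlier in the series, the sample rows are hypothetical, and it uses current FSharp.Core names (List.choose, pattern matching on the list) rather than the Seq.hd and Seq.nth calls of the original.

```fsharp
open System
open System.Globalization
open System.Text.RegularExpressions

// Hypothetical simplification: returns (date, ratio) instead of an Observation.
let parseSplitRow (row: string list) =
    match row with
    | dateS :: splitS :: _ when splitS.Contains("Stock Split") ->
        let m = Regex.Match(splitS, @"(\d+)\s+:\s+(\d+)\s+Stock Split")
        if m.Success then
            let date = DateTime.Parse(dateS, CultureInfo.InvariantCulture)
            let newShares = float m.Groups.[1].Value
            let oldShares = float m.Groups.[2].Value
            Some (date, newShares / oldShares)
        else None
    | _ -> None

// Hypothetical rows, made up for illustration.
let rows =
    [ [ "Feb 18, 2003"; "2 : 1 Stock Split" ]
      [ "Nov 15, 2005"; "$0.08 Dividend" ] ]

// Only the first row survives: a 2-for-1 split on 2003-02-18.
let splits = rows |> List.choose parseSplitRow
printfn "%A" splits
```

Unlike the original, the sketch checks m.Success before reading the groups, so a row that mentions a split in an unexpected format is skipped rather than failing on an empty string.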

Let’s now go back to the loadWebSplitAsync function and discuss the rest of it. In particular this part:

async {
    let url = splitUrl ticker span page
    let! text = loadWebStringAsync url
    let splits, beyondLastPage = parseSplit text splits
    if beyondLastPage then return splits
    else return! loadWebSplitAsync ticker span (page + 1) splits }

First of all, it is an Async function. You should expect some Async stuff to go on inside it. And indeed, after forming the URL in the first line, the very next line is a call to loadWebStringAsync. We discussed this one in the previous installment. It just asynchronously loads a string from a URL. Notice the bang after ‘let’. This is your giveaway that async stuff is being performed.

The result of the async request is parsed to extract splits. Also, the beyondLastPage flag is set if we have finished our work. If we have, we return the split observation list; if we haven’t, we do it again, incrementing the page number to load the HTML text from.
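The accumulate-then-recurse shape is easier to see without the network in the way. In this minimal sketch, fakePages and loadPageAsync are hypothetical stand-ins for Yahoo and loadWebStringAsync, and a page counts as "beyond the last one" when it yields no items.

```fsharp
open System

// Hypothetical stand-in for the web: pages come from an in-memory list.
let fakePages = [ "split A;split B"; "split C"; "" ]

// Hypothetical stand-in for loadWebStringAsync; pages past the end are empty.
let loadPageAsync (page: int) =
    async {
        return (if page < List.length fakePages then List.item page fakePages else "")
    }

// Same shape as loadWebSplitAsync: load a page, accumulate, then stop or recurse.
let rec loadAllAsync page (acc: string list) =
    async {
        let! text = loadPageAsync page
        let found =
            text.Split([| ';' |], StringSplitOptions.RemoveEmptyEntries)
            |> List.ofArray
        let beyondLastPage = List.isEmpty found
        if beyondLastPage then return acc
        else return! loadAllAsync (page + 1) (acc @ found)
    }

let allItems = loadAllAsync 0 [] |> Async.RunSynchronously
printfn "%A" allItems
```

Nothing runs until Async.RunSynchronously is called; up to that point loadAllAsync 0 [] is just a description of the work, which is the property the real loader relies on.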

Now that we have all the pieces in place, we can wrap up the split loading stuff inside this facade function:

let loadSplitsAsync ticker span = loadWebSplitAsync ticker span 0 []

And finally, put together the results of this post and the previous one with the overall function-to-rule-them-all:

let loadTickerAsync ticker span =
    async {
        let prices = loadPricesAsync ticker span
        let divs =  loadDivsAsync ticker span
        let splits = loadSplitsAsync ticker span
        let! prices, divs, splits = Async.Parallel3 (prices, divs, splits)
        return prices |> List.append divs |> List.append splits
        }

All right, that was a lot of work to get to this simple thing. This is a good entry point to our price/divs/split loading framework. It has the right inputs and outputs: it takes a ticker and a date span and returns an Async of a list of observations. The caller can decide when to execute the returned Async object.

Notice that in the body of the function I call Async.Parallel3. This is debatable. A more flexible solution is to return a tuple containing three Asyncs (prices, divs, splits) and let the caller decide how to put them together. I decided against this for simplicity reasons. This kind of trade-off is very common in Async programming: giving maximum flexibility to your caller against exposing something more understandable.
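The two designs can be put side by side in a minimal sketch. Everything here is an assumption made for illustration: fakeLoad stands in for the three real loaders and returns labels instead of observations, and the combining step uses Async.Parallel over a list, which as far as I know is the closest equivalent in current FSharp.Core to the three-argument call used in the post.

```fsharp
// Hypothetical loader standing in for loadPricesAsync, loadDivsAsync and loadSplitsAsync.
let fakeLoad (label: string) = async { return [ label ] }

// Option 1: combine inside the function and expose a single Async.
let loadCombinedAsync (ticker: string) =
    async {
        let! parts =
            [ fakeLoad (ticker + " prices")
              fakeLoad (ticker + " divs")
              fakeLoad (ticker + " splits") ]
            |> Async.Parallel
        // parts is an array of lists, in the same order as the inputs.
        return List.concat parts
    }

// Option 2: hand the three Asyncs back and let the caller compose them.
let loadSeparateAsync (ticker: string) =
    (fakeLoad (ticker + " prices"),
     fakeLoad (ticker + " divs"),
     fakeLoad (ticker + " splits"))

let combined = loadCombinedAsync "MSFT" |> Async.RunSynchronously

// With option 2 the caller can run just one of the three.
let pricesOnly =
    let prices, _, _ = loadSeparateAsync "MSFT"
    prices |> Async.RunSynchronously

printfn "%A" combined
printfn "%A" pricesOnly
```

Option 2 lets a caller skip work it does not need, at the cost of pushing the composition logic onto every caller.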

I have to admit I didn’t much enjoy writing (and describing) all this boilerplate code. I’m sure it can be written in a better way. I might rewrite plenty of it if I discover bugs. I kind of like the end result though. loadTickerAsync has an overall structure I’m pretty happy with.

Next post: some algorithms with our observations.

Comments

Luca Bolognese's WebLog

2008-10-20T18:48:07Z

Other parts: Part I - Data modeling, Part II - Html scraping, Part III - Async loader for prices and divs