Downloading stock prices in F# - Part V - Adjusting historical data


Here is the problem. When you download prices/divs/splits from Yahoo you get a strange mix of historical numbers and adjusted numbers. To be more precise, the dividends are historically adjusted. The prices are not adjusted, but there is one last column in the data for adjusted close. If you don’t know what ‘adjusted’ means in this context read here.

The problem with using the ‘adjusted close’ column is that, for a particular date in the past, ‘adjusted close’ changes whenever the company pays a dividend or splits its stock. So if I retrieve the value on two different days I might get different numbers because, in the meantime, the company paid a dividend. This prevents me from storing a subset of the data locally and then retrieving other subsets later on. It also has the limitation that just the closing price is present, while I might need the adjusted opening price, adjusted high price or even adjusted volume depending on the operations I want to perform on the data (e.g. calculating oscillators or volume-adjusted moving averages).

The solution I came up with is to download the data and transform it to an ‘asHappened’ state. This state is simply an unadjusted version of what happened in the past. Data in this state is not going to change in the future, which means that I can safely store it locally. I can then produce ‘historically adjusted’ data on demand whenever I need to.

Ok, to the code. As often happens, I need some auxiliary functions before I get to the core of the algorithms. The first one is a way to compare two observations; I will use it later on to sort a list of observations.

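let compareObservations obs1 obs2 =
    // Different dates: the reversed CompareTo puts the most recent observation first
    if obs1.Date <> obs2.Date then obs2.Date.CompareTo(obs1.Date)
    else
        match obs1.Event, obs2.Event with
            | Price _, Price _ | Div _, Div _ | Split _, Split _ -> failwith "Two same date/ same kind observations"
            // Same date: a price sorts before a dividend or a split
            | Price _, _ -> -1
            | _, Price _ -> 1
            // The listing is cut off after this arrow; 0 (compare as equal) is an assumed value
            | _ -> 0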

This is rather simple. If the dates of the two observations are different, just compare them (in reverse, so the most recent one sorts first). If they are the same, then the two observations cannot be of the same type (i.e. I cannot have two prices for a particular date). Given that they are not of the same type, then &(&^%!#$!4. Crap, that teaches me to put comments in my code! I think I’m putting the price information first, but I’m not sure. Anyhow, my universal excuse for not figuring it out is that the testcases succeed, so I must be doing it right (how lame, testcase-addiction I guess).
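To see the ordering concretely, here is a small sketch. The `Event` and `Observation` types are simplified stand-ins for the data model from Part I (plain floats instead of units of measure, and a bare closing price instead of the full price record), the sample data is made up, and the final `| _ -> 0` case is an assumption about how a dividend and a split on the same date compare.

```fsharp
open System

// Simplified stand-ins for the Part I data model (assumptions, not the original types).
type Event =
    | Price of float      // closing price only
    | Div of float        // dividend amount
    | Split of float      // split factor

type Observation = { Date : DateTime; Event : Event }

let compareObservations obs1 obs2 =
    if obs1.Date <> obs2.Date then obs2.Date.CompareTo(obs1.Date)
    else
        match obs1.Event, obs2.Event with
        | Price _, Price _ | Div _, Div _ | Split _, Split _ ->
            failwith "Two same date/ same kind observations"
        | Price _, _ -> -1
        | _, Price _ -> 1
        | _ -> 0          // assumed: a Div and a Split on the same date compare equal

let day d = DateTime(2008, 10, d)

let sorted =
    [ { Date = day 1; Event = Price 10.0 }
      { Date = day 2; Event = Div 0.5 }
      { Date = day 2; Event = Price 11.0 }
      { Date = day 3; Event = Price 12.0 } ]
    |> List.sortWith compareObservations

// Newest date first; on 2 October the price comes before the dividend.
sorted |> List.iter (fun o -> printfn "%s %A" (o.Date.ToString("yyyy-MM-dd")) o.Event)
```

With this data the sorted events come out as Price 12.0, Price 11.0, Div 0.5, Price 10.0, which suggests the price information does indeed go first within a date.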

The next auxiliary function is just a wrapper over fold. I always tend to wrap fold calls in a function with a better name because I remember the old times when I didn’t know what fold was. I want a reader of my code to be able to understand it even if they are not familiar with fold (the universal functional Swiss-Army-Knife). This function is a map that needs to know the value of an accumulator to correctly perform its mapping over each element.

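let mapAcc acc newAccF newItemF inl =
    // Each step produces the next accumulator and conses the mapped item onto the output,
    // so the output list comes out in the reverse order of the input
    let foldF (acc, l) x = newAccF acc x, (newItemF acc x)::l
    // List.fold is the current name for what older F# releases called List.fold_left
    let _, out = inl |> List.fold foldF (acc, [])
    out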

Apart from the implementation details, this function takes an accumulator, an accumulator function, an item function and an input list. For each element in the list it calculates two things:

  1. a new value for the accumulator: newAccValue = newAccF oldAccValue itemValue
  2. a new value for the item: newItemValue = newItemF oldAccValue oldItemValue

Maybe there is a standard functional way to do such a thing with a specific name that I’m not aware of. Luke might know. He is my resident fold expert.
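For what it is worth, more recent versions of the F# core library ship `List.mapFold`, which threads a state through a map in much the same way (Haskell calls the idea `mapAccumL`). The sketch below runs `mapAcc` and `List.mapFold` on the same made-up input; the running-sum functions are only there for illustration. One visible difference is that `mapAcc`, as written in this post, returns its results in reverse order.

```fsharp
// mapAcc as in the post, using List.fold (the current name for fold_left).
let mapAcc acc newAccF newItemF inl =
    let foldF (acc, l) x = newAccF acc x, (newItemF acc x)::l
    let _, out = inl |> List.fold foldF (acc, [])
    out

// Each item is mapped to itself plus the sum of the items that came before it.
let viaMapAcc =
    [1; 2; 3] |> mapAcc 0 (fun acc x -> acc + x) (fun acc x -> acc + x)
// viaMapAcc = [6; 3; 1]: consing onto the output reverses the input order.

// List.mapFold returns (mapped item, new state) from one function and keeps the order.
let viaMapFold, finalAcc =
    [1; 2; 3] |> List.mapFold (fun acc x -> acc + x, acc + x) 0
// viaMapFold = [1; 3; 6], finalAcc = 6

printfn "%A %A %d" viaMapAcc viaMapFold finalAcc
```

The reversal is harmless for the functions in this post because the result is sorted again before it is used, but it is worth knowing about if mapAcc gets reused elsewhere.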

All right, now to the main algorithm.

let asHappened splitFactor observations =
    let newSplitFactor splitFactor obs =
        match obs.Event with
            | Split(factor) -> splitFactor * factor
            | _             -> splitFactor
    let newObs splitFactor obs =
        let date = obs.Date
        let event = match obs.Event with
                        | Price(p)                  -> Price(p)
                        | Div(amount)               -> Div(amount * splitFactor)
                        | Split(factor)             -> Split(factor)
        {Date = date; Event = event}
    observations
    |> List.sortWith compareObservations // older F# releases: List.sort took the comparer
    |> mapAcc splitFactor newSplitFactor newObs

To understand what’s going on, start from the bottom. I’m taking the observation list downloaded from Yahoo and sorting it using my compareObservations function. I then take the resulting list and apply the previously described mapAcc to it. For this function, splitFactor is the accumulator, newSplitFactor is the accumulator function and newObs is the function that generates a new value for each item in the list.

newSplitFactor is trivial: every time it sees a Split observation it updates the value of the split factor. That’s it. newObs is rather simple as well. Every time it sees a dividend, it ‘unadjusts’ it by multiplying its amount by the split factor. The end result is to transform the dividends downloaded from Yahoo (which are adjusted) to an unadjusted state. I could have filtered out the price observations before doing all of this and added them back afterward, but I didn’t. It’d probably be slower.
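A small worked example may help here. It is only a sketch under assumptions: the types are simplified stand-ins for the Part I data model (plain floats, a bare closing price), the numbers are invented, a 2-for-1 split is assumed to be represented as `Split 2.0`, and the last case of `compareObservations` is assumed to return 0.

```fsharp
open System

// Simplified stand-ins for the Part I data model (assumptions).
type Event =
    | Price of float
    | Div of float
    | Split of float

type Observation = { Date : DateTime; Event : Event }

let compareObservations obs1 obs2 =
    if obs1.Date <> obs2.Date then obs2.Date.CompareTo(obs1.Date)
    else
        match obs1.Event, obs2.Event with
        | Price _, Price _ | Div _, Div _ | Split _, Split _ ->
            failwith "Two same date/ same kind observations"
        | Price _, _ -> -1
        | _, Price _ -> 1
        | _ -> 0          // assumed value

let mapAcc acc newAccF newItemF inl =
    let foldF (acc, l) x = newAccF acc x, (newItemF acc x)::l
    let _, out = inl |> List.fold foldF (acc, [])
    out

let asHappened splitFactor observations =
    let newSplitFactor splitFactor obs =
        match obs.Event with
        | Split(factor) -> splitFactor * factor
        | _             -> splitFactor
    let newObs splitFactor obs =
        let event =
            match obs.Event with
            | Price(p)      -> Price(p)
            | Div(amount)   -> Div(amount * splitFactor)
            | Split(factor) -> Split(factor)
        { Date = obs.Date; Event = event }
    observations
    |> List.sortWith compareObservations
    |> mapAcc splitFactor newSplitFactor newObs

let day d = DateTime(2008, 10, d)

// Made-up download: the 0.50 dividend is already adjusted for the later 2-for-1 split.
let downloaded =
    [ { Date = day 1; Event = Price 20.0 }
      { Date = day 1; Event = Div 0.5 }
      { Date = day 2; Event = Split 2.0 }
      { Date = day 2; Event = Price 10.0 } ]

// Start with a neutral split factor of 1.
let result = asHappened 1.0 downloaded

result |> List.iter (fun o -> printfn "%s %A" (o.Date.ToString("yyyy-MM-dd")) o.Event)
```

Walking from the newest observation backwards, the split factor becomes 2 once the split is passed, so the 0.50 dividend is restored to the 1.00 that would have been paid per share at the time. Prices and the split itself pass through unchanged, and because mapAcc reverses its input the result comes back oldest first.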

Now that I can recreate the state of the world as it was at a particular point in time, what if I want to adjust the data? I can call adjusted:

let adjusted (splitFactor, lastDiv, oFact, hFact, lFact, cFact, vFact) asHappenedObs =
    // Updates the accumulator tuple according to the current observation
    let newFactor (splitFactor, lastDiv, oFact, hFact, lFact, cFact, vFact) obs =
        match obs.Event with
            | Split(split)  -> splitFactor * split, lastDiv, oFact, hFact, lFact, cFact, vFact
            | Div(div)      -> splitFactor, div, oFact, hFact, lFact, cFact, vFact
            | Price(p)      -> splitFactor, 0.<money>, oFact / (1. - lastDiv / p.Open),
                               hFact / (1. - lastDiv / p.High), lFact / (1. - lastDiv / p.Low),
                               cFact / (1. - lastDiv / p.Close), vFact / (1. - lastDiv / p.Close)
    // Builds the adjusted observation from the factors accumulated so far
    let newObs (splitFactor, lastDiv, oFact, hFact, lFact, cFact, vFact) obs =
        let date = obs.Date
        let event = match obs.Event with
                        | Price(p)      -> Price({Open = p.Open / splitFactor / oFact;
                                                  High = p.High / splitFactor / hFact;
                                                  Low = p.Low / splitFactor / lFact;
                                                  Close = p.Close / splitFactor / cFact;
                                                  Volume = p.Volume / splitFactor / vFact })
                        | Div(amount)   -> Div (amount / splitFactor)
                        | Split(split)  -> Split(split)
        {Date = date; Event = event}
    asHappenedObs
    |> List.sortWith compareObservations // older F# releases: List.sort took the comparer
    |> mapAcc (splitFactor, lastDiv, oFact, hFact, lFact, cFact, vFact) newFactor newObs
    |> List.filter (fun x -> match x.Event with Split(_) -> false | _ -> true)

Wow, ok, this looks messy. Let’s go through it. Starting from the bottom: sort the observations, perform the right algorithm and filter away all the splits. It doesn’t make sense to have splits in adjusted data.

The interesting piece is the mapAcc function. It takes a tuple of factors as accumulator and the usual two functions to update such a tuple and create new observations. The newObs function creates a new Observation using the factors in the accumulator tuple. Notice how the dividends are divided by the splitFactor (which is the opposite of our asHappened algorithm, where we were multiplying them). Also notice how the prices are divided by both the splitFactor and the pertinent price factor. This is needed because the prices need to be adjusted by the dividends paid out, and the adjustment factor is different for each kind of price (i.e. open, close, etc). The newFactor function simply updates all the factors depending on the current observation.
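To make the dividend factor less abstract, here is the arithmetic for the closing price alone. The numbers are hypothetical, plain floats stand in for the units of measure, and there are no splits, so splitFactor stays at 1.

```fsharp
// Hypothetical inputs: a 2.00 dividend, and a 100.00 close on the price
// observation that triggers the factor update.
let lastDiv = 2.0
let close = 100.0

// Same update as the Price case of newFactor, starting from a neutral factor of 1.
let cFact = 1.0 / (1.0 - lastDiv / close)

// Same division as in newObs: an older close of 50.00 with splitFactor = 1.
let splitFactor = 1.0
let adjustedClose = 50.0 / splitFactor / cFact

printfn "cFact = %f, adjusted close = %f" cFact adjustedClose
```

Here cFact comes out at roughly 1.0204, so the older close of 50.00 is scaled down by about 2% to approximately 49.00. The open, high and low factors are built the same way from their own prices, which is why the accumulator carries one factor per kind of price.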

Notice how asHappened and adjusted are structurally similar. This is an artifact of having a functional approach to writing code: it kind of forces you to identify these commonalities in the way algorithms behave and abstract them out (in this case into the mapAcc function). You often discover that such abstracted-out pieces are more generally useful than the case at hand.


Luca Bolognese's WebLog

2008-10-20T18:48:07Z

Other parts: Part I - Data modeling; Part II - Html scraping; Part III - Async loader for prices and divs