Bayesian inference in F# – Part IIb – Finding Maia's underlying attitude



The previous post ended on this note.

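let MaiaJointProb attitude action =
    // Look up the probability paired with `action` in an (action, probability) list
    let probOf actions = actions |> List.find (fun (a, _) -> a = action) |> snd
    match attitude with
    | Happy     -> happyActions |> probOf
    | UnHappy   -> unHappyActions |> probOf
    | Quiet     -> quietActions |> probOf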

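This is just a three by three matrix (three attitudes by three actions). It simply represents which probability is associated with an (attitude, action) tuple; despite the name, each entry is the conditional probability of the action given the attitude. It is useful to think about it in these terms, because it makes it easier to grasp the following function: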

/// Likelihood of each mental state, given a particular observed action: P(action | attitude) viewed as a function of attitude
let MaiaLikelihood action = fun attitude -> MaiaJointProb attitude action

This is simply one slice of the matrix: the action is fixed and only the attitude varies. It answers the question: given that I observe a particular action, how likely is that action under each attitude Maia might have? This is called the “likelihood function” in statistics. Its general form is: given that I observe an outcome, how probable is that outcome under each possible value of the parameter of the process that generated it? Note that a likelihood is not itself a probability distribution over the parameter; its values need not sum to one.
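A small sketch in current F# makes the matrix picture concrete. The `Attitude` and `Action` cases mirror the previous post, but the numbers in `table` are hypothetical placeholders (picked because they reproduce the posteriors reported later in this post), and `table`, `jointProb` and `likelihood` are illustrative names. Reading along a row (fixed attitude) gives a distribution over actions, which sums to one; reading down a column (fixed action) gives a likelihood, which need not.

```fsharp
type Attitude = Happy | UnHappy | Quiet
type Action = Smile | Cry | LookSilly

let attitudes = [ Happy; UnHappy; Quiet ]
let actions = [ Smile; Cry; LookSilly ]

// Hypothetical P(action | attitude) values.
// Rows: Happy, UnHappy, Quiet; columns: Smile, Cry, LookSilly
let table =
    array2D [ [ 0.6; 0.2; 0.2 ]; [ 0.2; 0.6; 0.2 ]; [ 0.4; 0.3; 0.3 ] ]

/// Look up the cell for an (attitude, action) pair
let jointProb attitude action =
    table.[List.findIndex ((=) attitude) attitudes, List.findIndex ((=) action) actions]

/// Likelihood: fix the observed action, let the attitude vary
let likelihood action = fun attitude -> jointProb attitude action

// Each row is a distribution over actions, so it sums to 1
let rowSums = attitudes |> List.map (fun att -> actions |> List.sumBy (jointProb att))

// A likelihood slice is not a distribution over attitudes: here it sums to 1.2
let smileSum = attitudes |> List.sumBy (likelihood Smile)

printfn "Row sums: %A" rowSums
printfn "Smile likelihood sum: %.2f" smileSum
```

With these placeholder values the Smile column adds up to 1.2, a quick reminder that a likelihood is not a distribution over attitudes.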

A related question is then: what if I observe a sequence of independent actions? How likely is that whole sequence under each attitude? This is answered by the following:

/// Multiple applications of the previous conditional probabilities for a series of actions (multiplied)
let MaiaLikelihoods actions =
    let composeLikelihoods previousLikelihood action  = fun attitude -> previousLikelihood attitude * MaiaLikelihood action attitude
    actions |> Seq.fold composeLikelihoods (fun attitude -> 1.)

It is a trivial extension of the previous function (really), once you know that to combine the likelihoods of independent observations you multiply them.
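Here is a sketch checking the multiplication rule, with the same caveats: the types mirror the previous post, the probabilities are hypothetical placeholders, and `likelihoods` reproduces the shape of `MaiaLikelihoods` rather than being the original code. Folding over the observed actions yields the plain product of the per-action likelihoods, and reversing the sequence leaves it unchanged.

```fsharp
type Attitude = Happy | UnHappy | Quiet
type Action = Smile | Cry | LookSilly

// Hypothetical P(action | attitude) values, standing in for the previous post's lists
let actionProbs attitude =
    match attitude with
    | Happy   -> [ (Smile, 0.6); (Cry, 0.2); (LookSilly, 0.2) ]
    | UnHappy -> [ (Smile, 0.2); (Cry, 0.6); (LookSilly, 0.2) ]
    | Quiet   -> [ (Smile, 0.4); (Cry, 0.3); (LookSilly, 0.3) ]

/// P(action | attitude), looked up in the (action, probability) list
let jointProb attitude action =
    actionProbs attitude |> List.find (fun (a, _) -> a = action) |> snd

/// Likelihood of a single action, as a function of attitude
let likelihood action = fun attitude -> jointProb attitude action

/// Start from the constant function 1 and multiply in one action at a time
let likelihoods actions =
    actions
    |> Seq.fold (fun prev action -> fun attitude -> prev attitude * likelihood action attitude) (fun _ -> 1.0)

let observed = [ Smile; Smile; Cry; Smile; LookSilly ]

// For Happy this is 0.6 * 0.6 * 0.2 * 0.6 * 0.2 = 0.00864
let viaFold = likelihoods observed Happy

// Order does not matter for independent observations (up to floating-point rounding)
let viaReversed = likelihoods (List.rev observed) Happy

printfn "fold: %g, reversed: %g" viaFold viaReversed
```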

We now need to describe our prior. A prior is our preconceived notion about a particular parameter (in this case the baby’s attitude). You might be tempted to express that notion with a single value, but that would be inaccurate: you also need to indicate how confident you are about it. In statistics you do that by choosing a distribution for your belief. This is one of the beauties of Bayesian statistics: everything is a probability distribution. In this case we really don’t have any previous belief, so we pick the uniform distribution, which gives each of the three attitudes probability 1/3.

let MaiaUniformPrior attitude = 1. / 3.

Think of this as: you haven’t read any baby-attitude-specific study or received any external information about the likely attitude of Maia, so you cannot prefer one attitude over another.

We are almost done. Now we have to apply Bayes’ theorem and get the un-normalized posterior distribution. Forget about the word “un-normalized” for now. What is a posterior distribution? This is your output, your return value. It says: given my prior belief on the value of a parameter and given the outcomes that I observed, this is what I now believe the parameter to be. In this case it goes like: I had no opinion on Maia’s attitude to start with, but after I observed her behavior for a while, I now think she is Happy with probability X, UnHappy with probability Y and Quiet with probability Z.

/// Calculates the unNormalized posterior given prior and likelihood
let unNormalizedPosterior (prior:'a -> float) likelihood =
    fun theta -> prior theta * likelihood theta
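For reference, this is Bayes’ theorem written for this model, with $a$ an attitude (the `theta` above), $x_1, \dots, x_n$ the observed actions, assumed independent given the attitude, and the sum in the denominator running over the three attitudes. `unNormalizedPosterior` computes only the numerator, the part after $\propto$:

```latex
P(a \mid x_1, \dots, x_n)
  = \frac{P(a) \prod_{i=1}^{n} P(x_i \mid a)}
         {\sum_{a'} P(a') \prod_{i=1}^{n} P(x_i \mid a')}
  \;\propto\; P(a) \prod_{i=1}^{n} P(x_i \mid a)
```

The denominator does not depend on $a$, so dropping it changes the values but not their ratios; normalizing restores it.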

We then need to normalize this thing (it doesn’t sum to one). The way to do it is to divide each value by the sum of the values over all the possible values of the parameter, here the three attitudes.

/// All possible values for the unobservable parameter (mental state)
let support = [Happy; UnHappy; Quiet]
/// Normalize the posterior (so that it sums to 1 over the support)
let posterior prior likelihood =
    let post = unNormalizedPosterior prior likelihood
    let sum = support |> List.sumBy (fun attitude -> post attitude)
    fun attitude -> post attitude / sum
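Putting the pieces together, here is a self-contained sketch of the pipeline so far that uses only current FSharp.Core functions (`List.find`, `List.fold`, `List.sumBy`). The probability table is a hypothetical placeholder for the previous post’s lists, and the lower-case names are illustrative rather than the post’s own.

```fsharp
type Attitude = Happy | UnHappy | Quiet
type Action = Smile | Cry | LookSilly

// Hypothetical P(action | attitude) values, standing in for the previous post's lists
let actionProbs attitude =
    match attitude with
    | Happy   -> [ (Smile, 0.6); (Cry, 0.2); (LookSilly, 0.2) ]
    | UnHappy -> [ (Smile, 0.2); (Cry, 0.6); (LookSilly, 0.2) ]
    | Quiet   -> [ (Smile, 0.4); (Cry, 0.3); (LookSilly, 0.3) ]

let jointProb attitude action =
    actionProbs attitude |> List.find (fun (a, _) -> a = action) |> snd

/// Product of per-action likelihoods for independent observations
let likelihoods actions attitude =
    actions |> List.fold (fun acc action -> acc * jointProb attitude action) 1.0

let support = [ Happy; UnHappy; Quiet ]
let uniformPrior (_: Attitude) = 1.0 / 3.0

/// Multiply prior by likelihood, then divide by the total over the support
let posterior prior likelihood =
    let post attitude = prior attitude * likelihood attitude
    let total = support |> List.sumBy post
    fun attitude -> post attitude / total

let dist = posterior uniformPrior (likelihoods [ Smile; Smile; Cry; Smile; LookSilly ])
support |> List.iter (fun a -> printfn "%A: %.4f" a (dist a))
```

With these placeholder values it prints 0.5625, 0.0625 and 0.3750 for Happy, UnHappy and Quiet, matching the run reported later in this post.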

We are done. Now we can start modeling scenarios. Let’s say that you observe [Smile;Smile;Cry;Smile;LookSilly]. What could the underlying attitude of Maia be?

let maiaIsANormalBaby = posterior MaiaUniformPrior (MaiaLikelihoods [Smile;Smile;Cry;Smile;LookSilly])

We can then execute our little model:

maiaIsANormalBaby Happy
maiaIsANormalBaby UnHappy
maiaIsANormalBaby Quiet
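To see where these numbers come from, assume, hypothetically, that P(Smile, Cry, LookSilly | attitude) is (0.6, 0.2, 0.2) for Happy, (0.2, 0.6, 0.2) for UnHappy and (0.4, 0.3, 0.3) for Quiet; these values reproduce the result of the three calls, but the actual lists live in the previous post. The uniform prior multiplies every term by 1/3 and cancels in the normalization, so the computation reduces to:

```latex
\begin{aligned}
L(\text{Happy})   &= 0.6^3 \cdot 0.2 \cdot 0.2 = 0.00864 \\
L(\text{UnHappy}) &= 0.2^3 \cdot 0.6 \cdot 0.2 = 0.00096 \\
L(\text{Quiet})   &= 0.4^3 \cdot 0.3 \cdot 0.3 = 0.00576 \\
P(\text{Happy} \mid \text{data}) &= \frac{0.00864}{0.00864 + 0.00096 + 0.00576} = \frac{0.00864}{0.01536} = 0.5625
\end{aligned}
```

The same denominator gives 0.00096 / 0.01536 = 0.0625 for UnHappy and 0.00576 / 0.01536 = 0.375 for Quiet.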

And we get (0.5625, 0.0625, 0.375). So Maia is likely to be happy and unlikely to be unhappy. Let’s now model one extreme case:

/// Extreme cases
let maiaIsLikelyHappyDist = posterior MaiaUniformPrior (MaiaLikelihoods [Smile;Smile;Smile;Smile;Smile;Smile;Smile])
maiaIsLikelyHappyDist Happy
maiaIsLikelyHappyDist UnHappy
maiaIsLikelyHappyDist Quiet

And we get (0.944, 0.000431, 0.05). Now Maia is almost certainly Happy. Notice that I can confidently make this statement because my end result is exactly what I was looking for when I started my quest: a probability for each of Maia’s possible attitudes. Using classical statistics, that wouldn’t be the case.
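To watch the evidence pile up, this sketch computes the posterior probability of Happy after n consecutive smiles. It only needs P(Smile | attitude), again taken from a hypothetical placeholder table, and it leaves out the uniform prior because a constant factor cancels in the normalization; `happyAfter` is an illustrative name.

```fsharp
// Hypothetical P(Smile | attitude) for Happy, UnHappy and Quiet
let pSmileHappy, pSmileUnHappy, pSmileQuiet = 0.6, 0.2, 0.4

/// Posterior probability of Happy after n smiles, under a uniform prior (which cancels out)
let happyAfter (n: int) =
    let h = pSmileHappy ** float n
    let u = pSmileUnHappy ** float n
    let q = pSmileQuiet ** float n
    h / (h + u + q)

for n in 1 .. 7 do
    printfn "%d smile(s): P(Happy) = %.4f" n (happyAfter n)
```

The probability rises with every extra smile, starting at 0.5 after one and reaching about 0.944 after seven, in line with the result above.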

A related question I might want to ask is: given the posterior distribution for attitude that I just found, what is the probability of observing a particular action? In other words, given the model that I built, what does it predict?

let posteriorPredictive jointProb posterior =
    let composeProbs previousProbs attitude = fun action -> previousProbs action + jointProb attitude action * posterior attitude
    support |> Seq.fold composeProbs (fun action -> 0.)
let nextLikelyUnknownActionDist = posteriorPredictive MaiaJointProb maiaIsLikelyHappyDist

I don’t have the strength right now to explain the mathematical underpinning of this. In words, this says: considering that Maia can have each of the three possible attitudes with the probability calculated above, what is the probability that I observe a particular action? Notice that the signature of nextLikelyUnknownActionDist is (Action -> float), which is the compiler’s way to say it.
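As a sketch of that math, assuming the next action is independent of the past ones once the attitude is known, this is the law of total probability: weight each attitude’s action probabilities by that attitude’s posterior and add them up, which is what the fold over `support` in `posteriorPredictive` accumulates:

```latex
P(\text{action} \mid x_1, \dots, x_n)
  = \sum_{a \in \{\text{Happy},\, \text{UnHappy},\, \text{Quiet}\}} P(\text{action} \mid a)\, P(a \mid x_1, \dots, x_n)
```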

Now we can run the thing.

nextLikelyUnknownActionDist Smile
nextLikelyUnknownActionDist Cry
nextLikelyUnknownActionDist LookSilly

And we get (0.588, 0.2056, 0.2055). Why is that? We’ll talk about it in the next post.

