Bayesian inference in F# - Part I - Background



My interest in Bayesian inference comes from my dissatisfaction with ‘classical’ statistics. Whenever I want to know something, for example the probability that an unknown parameter is between two values, ‘classical’ statistics seems to answer a different and more convoluted question.

Try asking someone what “the 95% confidence interval for X is (x1, x2)” means. Very likely he will tell you that it means that there is a 95% probability that X lies between x1 and x2. That is not the case in classical statistics. It is the case in Bayesian statistics. Also, all the funny business of defining a null hypothesis for the sake of proving its falseness always made my head spin. You don’t need any of that in Bayesian statistics. More recently, my discovery that statistical significance is a harmful concept, instead of the bedrock of knowledge I always thought it to be, shook my confidence in ‘classical’ statistics even more.

Admittedly, I’m not that smart. If I have a hard time getting an intuitive understanding of something, it tends to slip from my mind a couple of days after I’ve learned it. This happens all the time with ‘classical’ statistics. I feel like I have learned the thing ten times, because I continuously forget it. This doesn’t happen with Bayesian statistics. It just makes intuitive sense.

At this point you might be wondering what ‘classical’ statistics is. I use the term ‘classical’, but I really shouldn’t. Classical statistics is normally just called ‘statistics’ and it is all you learn if you pick up whatever book on the topic (for example the otherwise excellent Introduction to the Practice of Statistics). Bayesian statistics is just a footnote in such books. This is a shame.

Bayesian statistics provides a much clearer and more elegant framework for understanding the process of inferring knowledge from data. The underlying question that it answers is: “If I hold an opinion about something and I receive additional data on it, how should I rationally change my opinion?” This question of how to update your knowledge is at the very foundation of human learning and progress in general (for example the scientific method is based on it). We’d better be sure that the way we answer it is sound.
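To make that question concrete, here is a minimal sketch of a Bayesian update over a small set of hypotheses. The coin example, the numbers, and the names `prior` and `update` are my own illustrative assumptions, not the model developed later in this series.

```fsharp
// A minimal sketch of a Bayesian update over discrete hypotheses.
// Hypothetical example: is a coin fair, or biased towards heads?

// Prior opinion: (hypothesis name, probability of heads under it, prior probability)
let prior =
    [ ("fair",   0.5, 0.9)
      ("biased", 0.8, 0.1) ]

// Bayes' rule: the posterior is proportional to prior * likelihood of the observation.
let update (observedHeads: bool) hypotheses =
    let unnormalised =
        hypotheses
        |> List.map (fun (name, pHeads, p) ->
            let likelihood = if observedHeads then pHeads else 1.0 - pHeads
            name, pHeads, p * likelihood)
    let total = unnormalised |> List.sumBy (fun (_, _, w) -> w)
    unnormalised |> List.map (fun (name, pHeads, w) -> name, pHeads, w / total)

// Observe three heads in a row, updating the opinion after each one.
let posterior = prior |> update true |> update true |> update true

posterior |> List.iter (fun (name, _, p) -> printfn "%s: %.3f" name p)
```

After three heads, the belief in the biased coin rises from 0.1 to roughly 0.31. The data moved the opinion, but the strong prior in favour of a fair coin still dominates.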

You might wonder how it is possible to go against something that is so widely accepted and taught everywhere as ‘classical’ statistics is. Well, very many things that most people believe are wrong. I always like to cite old Ben on this: “The fact that other people agree or disagree with you makes you neither right nor wrong. You will be right if your facts and your reasoning are correct…” This little rule always served me well.

In this series of posts I will give examples of Bayesian statistics in F#. I am not a statistician, which makes me part of the very dangerous category of ‘people who are not statisticians but talk about statistics’. To try to mitigate the problem I enlisted the help of Ralf Herbrich, who is a statistician and can catch my most blatant errors. Obviously I’ll manage to hide my errors so cleverly that not even Ralf will spot them, in which case the fault is just mine.

In the next post we’ll look at some F# code to model the Bayesian inference process.




Can you recommend some reading for Bayesian statistics?

One of the reasons it is not that popular is the lack of good introductory books on it. The one I use is “Bayesian Data Analysis (2nd edition)” by Gelman, Carlin, Stern and Rubin.
It is a good book with plenty of practical examples, but it is not exactly easy. In this series of posts I’ll try to be easy. We’ll see.



So what does “the 95% confidence interval for X is (x1, x2)” mean in ‘classical’ statistics?

A confidence interval is a range produced by a procedure that, if you repeated the experiment many (how many?) times, would contain the true parameter in the stated proportion of the repetitions. It is a way to describe the sampling procedure, not the distribution of the underlying parameter you set out to discover. As I said in the blog post, things get pretty convoluted in classical statistics.
From Wikipedia (http://wiki/Confidence_interval): For a given proportion p (where p is the confidence level), a confidence interval for a population parameter is an interval that is calculated from a random sample of an underlying population such that, if the sampling was repeated numerous times and the confidence interval recalculated from each sample according to the same method, a proportion p of the confidence intervals would contain the population parameter in question …
Confidence intervals play a similar role in frequentist statistics to the credibility interval in Bayesian statistics. However, confidence intervals and credibility intervals are not only mathematically different; they have radically different interpretations.
A Bayesian interval estimate is called a credible interval. Using much of the same notation as above, the definition of a credible interval for the unknown true value of θ is, for a given α,
$$\Pr_{\Theta \mid X = x}\bigl(u(x) < \Theta < v(x)\bigr) = 1 - \alpha$$
Here Θ is used to emphasize that the unknown value of θ is being treated as a random variable. The definitions of the two types of intervals may be compared as follows.
The definition of a confidence interval involves probabilities calculated from the distribution of X for given (θ,φ) (or conditional on these values) and the condition needs to hold for all values of (θ,φ).
The definition of a credible interval involves probabilities calculated from the distribution of Θ conditional on the observed values of X=x and marginalised (or averaged) over the values of Φ, where this last quantity is the random variable corresponding to the uncertainty about the nuisance parameters in φ.
Note that the treatment of the nuisance parameters above is often omitted from discussions comparing confidence and credible intervals but it is markedly different between the two cases.
In some simple standard cases, the intervals produced as confidence and credible intervals from the same data set can be identical. They are always very different if moderate or strong prior information is included in the Bayesian analysis.
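As a rough illustration of the credible interval definition, here is a sketch that computes one by grid approximation. Everything specific in it is my own assumption for the example: θ is a coin’s probability of heads, the prior is uniform, the data are 7 heads in 10 flips, there are no nuisance parameters, and the names `grid`, `weights` and `quantile` are made up.

```fsharp
// Sketch: a central credible interval by grid approximation.
// Assumptions (mine, for illustration): theta is a coin's probability of heads,
// the prior is uniform on [0, 1], and we observed 7 heads in 10 flips.

let heads, flips = 7, 10
let gridSize = 10001
let grid = Array.init gridSize (fun i -> float i / float (gridSize - 1))

// Unnormalised posterior = uniform prior * binomial likelihood (constant factors dropped).
let weights =
    grid
    |> Array.map (fun theta -> (theta ** float heads) * ((1.0 - theta) ** float (flips - heads)))
let total = Array.sum weights

// Cumulative posterior probability along the grid.
let cumulative =
    weights
    |> Array.map (fun w -> w / total)
    |> Array.scan (+) 0.0
    |> Array.tail

// Smallest grid point whose cumulative posterior probability reaches q.
let quantile q = grid.[cumulative |> Array.findIndex (fun c -> c >= q)]

let alpha = 0.10
let lower, upper = quantile (alpha / 2.0), quantile (1.0 - alpha / 2.0)

printfn "%.0f%% credible interval for theta: (%.3f, %.3f)" (100.0 * (1.0 - alpha)) lower upper
```

Here u(x) and v(x) are the two posterior quantiles, so, given the model and the prior, the probability that θ lies between them is 1 - α. A different prior would give a different interval from the same data.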
Meaning and interpretation
For users of frequentist methods, various interpretations of a confidence interval can be given.
- The confidence interval can be expressed in terms of samples (or repeated samples): “Were this procedure to be repeated on multiple samples, the calculated confidence interval (which would differ for each sample) would encompass the true population parameter 90% of the time.” [5] Note that this need not be repeated sampling from the same population, just repeated sampling [6].
- The explanation of a confidence interval can amount to something like: “The confidence interval represents values for the population parameter for which the difference between the parameter and the observed estimate is not statistically significant at the 10% level” [7]. In fact, this relates to one particular way in which a confidence interval may be constructed.
- The probability associated with a confidence interval may also be considered from a pre-experiment point of view, in the same context in which arguments for the random allocation of treatments to study items are made. Here the experimenter sets out the way in which they intend to calculate a confidence interval and know, before they do the actual experiment, that the interval they will end up calculating has a certain chance of covering the true but unknown value. This is very similar to the “repeated sample” interpretation above, except that it avoids relying on considering hypothetical repeats of a sampling procedure that may not be repeatable in any meaningful sense.
In each of the above, the following applies. If the true value of the parameter lies outside the 90% confidence interval once it has been calculated, then an event has occurred which had a probability of 10% (or less) of happening by chance.
Users of Bayesian methods, if they produced an interval estimate, would by contrast want to say “My degree of belief that the parameter is in fact in this interval is 90%” [8]. See Credible interval. Disagreements about these issues are not disagreements about solutions to mathematical problems. Rather they are disagreements about the ways in which mathematics is to be applied.
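The “repeated samples” reading can be checked with a small simulation. What follows is only a sketch under assumptions I picked for convenience: normally distributed data with a known standard deviation, a 90% z-interval for the mean, and a fixed random seed. The names `sampleNormal` and `covers` are hypothetical.

```fsharp
// Sketch: checking the "repeated sampling" reading of a confidence interval.
// Assumptions (mine, for illustration): normal data with known sigma,
// a 90% z-interval for the mean, and a fixed seed so runs are repeatable.

let rng = System.Random(42)

// One draw from Normal(mu, sigma) using the Box-Muller transform.
let sampleNormal mu sigma =
    let u1 = 1.0 - rng.NextDouble()   // in (0, 1], avoids log 0
    let u2 = rng.NextDouble()
    mu + sigma * sqrt (-2.0 * log u1) * cos (2.0 * System.Math.PI * u2)

let trueMean, sigma, n = 10.0, 2.0, 25
let z = 1.645                          // two-sided 90% normal quantile
let halfWidth = z * sigma / sqrt (float n)

// Run one experiment and report whether its interval covers the true mean.
let covers () =
    let mean = Array.init n (fun _ -> sampleNormal trueMean sigma) |> Array.average
    mean - halfWidth < trueMean && trueMean < mean + halfWidth

let experiments = 10000
let coverage =
    Seq.init experiments (fun _ -> covers ())
    |> Seq.filter id
    |> Seq.length
    |> fun hits -> float hits / float experiments

printfn "Fraction of intervals covering the true mean: %.3f" coverage
```

The fraction should come out close to 0.90. Note what the 90% describes: the long-run behaviour of the procedure. Each individual interval either contains the true mean or it doesn’t, which is exactly why the statement differs from the Bayesian one.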

Eber Irigoyen


“Admittedly, I’m not that smart.”
Where does that leave us, mere mortals?

Luca Bolognese's WebLog

