Bayesian inference in F# - Part I - Background

Luca Bolognese

My interest in Bayesian inference comes from my dissatisfaction with 'classical' statistics. Whenever I want to know something, for example the probability that an unknown parameter is between two values, 'classical' statistics seems to answer a different and more convoluted question.

Try asking someone what "the 95% confidence interval for X is (x1, x2)" means. Very likely he will tell you that it means there is a 95% probability that X lies between x1 and x2. That is not the case in classical statistics. It is the case in Bayesian statistics. Also, all the funny business of defining a null hypothesis for the sake of proving its falseness always made my head spin. You don't need any of that in Bayesian statistics. More recently, my discovery that statistical significance is a harmful concept, instead of the bedrock of knowledge I always thought it to be, shook my confidence in 'classical' statistics even more.

Admittedly, I'm not that smart. If I have a hard time getting an intuitive understanding of something, it tends to slip from my mind a couple of days after I've learned it. This happens all the time with 'classical' statistics. I feel like I have learned the thing ten times, because I continuously forget it. This doesn't happen with Bayesian statistics. It just makes intuitive sense.

At this point you might be wondering what 'classical' statistics is. I use the term classical, but I really shouldn't. Classical statistics is normally just called 'statistics' and it is all you learn if you pick up whatever book on the topic (for example the otherwise excellent Introduction to the Practice of Statistics). Bayesian statistics is just a footnote in such books. This is a shame.

Bayesian statistics provides a much clearer and more elegant framework for understanding the process of inferring knowledge from data. The underlying question that it answers is: if I hold an opinion about something and I receive additional data on it, how should I rationally change my opinion? This question of how to update your knowledge is at the very foundation of human learning and progress in general (for example, the scientific method is based on it). We had better be sure that the way we answer it is sound.
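The update rule this question points at is Bayes' theorem: posterior ∝ prior × likelihood, renormalised. As a small taste of what the later posts do in F#, here is a minimal sketch of a discrete Bayesian update; the coin-flip hypotheses and all the numbers are my own illustration, not from the post:

```fsharp
// Minimal sketch of a Bayesian update (all numbers are illustrative).
// A coin is either fair or biased towards heads; we start undecided.
let prior = [ "fair", 0.5; "biased", 0.5 ]

// Likelihood of observing one head under each hypothesis
let likelihoodHeads = dict [ "fair", 0.5; "biased", 0.8 ]

// posterior ∝ prior × likelihood, then normalise so it sums to 1
let posteriorAfterHeads =
    let unnormalised =
        prior |> List.map (fun (h, p) -> h, p * likelihoodHeads.[h])
    let total = unnormalised |> List.sumBy snd
    unnormalised |> List.map (fun (h, p) -> h, p / total)

// After one observed head, belief shifts towards "biased":
// fair = 0.25/0.65 ≈ 0.385, biased = 0.40/0.65 ≈ 0.615
posteriorAfterHeads |> List.iter (fun (h, p) -> printfn "%s: %.3f" h p)
```

Seeing a second head would simply repeat the same step with the posterior as the new prior, which is the "how should I rationally change my opinion?" loop in code.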

You might wonder how it is possible to go against something as widely accepted and taught everywhere as 'classical' statistics is. Well, many things that most people believe are wrong. I always like to cite old Ben on this: "The fact that other people agree or disagree with you makes you neither right nor wrong. You will be right if your facts and your reasoning are correct..." This little rule has always served me well.

In this series of posts I will give examples of Bayesian statistics in F#. I am not a statistician, which makes me part of the very dangerous category of people who are not statisticians but talk about statistics. To try to mitigate the problem I enlisted the help of Ralf Herbrich, who is a statistician and can catch my most blatant errors. Obviously I'll manage to hide my errors so cleverly that not even Ralf will spot them. In which case the fault is just mine.

In the next post we'll look at some F# code to model the Bayesian inference process.



Can you recommend some reading for Bayesian statistics?

One of the reasons it is not that popular is the lack of good introductory books on it. The one I use is "Bayesian Data Analysis (2nd edition)" by Gelman, Carlin, Stern and Rubin. It is a good book with plenty of practical examples, but it is not exactly easy. In this series of posts I'll try to keep things easy. We'll see.



So what does "the 95% confidence interval for X is (x1, x2)" mean in 'classical' statistics?

A confidence interval is a range constructed so that, if you repeated the experiment many (how many?) times, a given proportion of the intervals would cover the true parameter. It is a way to describe the sampling procedure, not the distribution of the underlying parameter you set out to discover. As I said in the blog post, things get pretty convoluted in classical statistics.
From Wikipedia: "For a given proportion p (where p is the confidence level), a confidence interval for a population parameter is an interval that is calculated from a random sample of an underlying population such that, if the sampling was repeated numerous times and the confidence interval recalculated from each sample according to the same method, a proportion p of the confidence intervals would contain the population parameter in question ...
Confidence intervals play a similar role in frequentist statistics to the credibility interval in Bayesian statistics. However, confidence intervals and credibility intervals are not only mathematically different; they have radically different interpretations.
A Bayesian interval estimate is called a credible interval. Using much of the same notation as above, the definition of a credible interval for the unknown true value of θ is, for a given α,
Pr(u(x) < Θ < v(x) | X = x) = 1 - α
Here Θ is used to emphasize that the unknown value of θ is being treated as a random variable. The definitions of the two types of intervals may be compared as follows.
The definition of a confidence interval involves probabilities calculated from the distribution of X for given (θ,φ) (or conditional on these values) and the condition needs to hold for all values of (θ,φ).
The definition of a credible interval involves probabilities calculated from the distribution of Θ conditional on the observed values of X=x and marginalised (or averaged) over the values of Φ, where this last quantity is the random variable corresponding to the uncertainty about the nuisance parameters in φ.
Note that the treatment of the nuisance parameters above is often omitted from discussions comparing confidence and credible intervals but it is markedly different between the two cases.
In some simple standard cases, the intervals produced as confidence and credible intervals from the same data set can be identical. They are always very different if moderate or strong prior information is included in the Bayesian analysis.
Meaning and interpretation
For users of frequentist methods, various interpretations of a confidence interval can be given.
- The confidence interval can be expressed in terms of samples (or repeated samples): "Were this procedure to be repeated on multiple samples, the calculated confidence interval (which would differ for each sample) would encompass the true population parameter 90% of the time." [5] Note that this need not be repeated sampling from the same population, just repeated sampling [6].
- The explanation of a confidence interval can amount to something like: "The confidence interval represents values for the population parameter for which the difference between the parameter and the observed estimate is not statistically significant at the 10% level"[7]. In fact, this relates to one particular way in which a confidence interval may be constructed.
- The probability associated with a confidence interval may also be considered from a pre-experiment point of view, in the same context in which arguments for the random allocation of treatments to study items are made. Here the experimenter sets out the way in which they intend to calculate a confidence interval and know, before they do the actual experiment, that the interval they will end up calculating has a certain chance of covering the true but unknown value. This is very similar to the "repeated sample" interpretation above, except that it avoids relying on considering hypothetical repeats of a sampling procedure that may not be repeatable in any meaningful sense.
In each of the above, the following applies. If the true value of the parameter lies outside the 90% confidence interval once it has been calculated, then an event has occurred which had a probability of 10% (or less) of happening by chance.
Users of Bayesian methods, if they produced an interval estimate, would by contrast want to say "My degree of belief that the parameter is in fact in this interval is 90%" [8]. See Credible interval. Disagreements about these issues are not disagreements about solutions to mathematical problems. Rather they are disagreements about the ways in which mathematics is to be applied.
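The "repeated samples" interpretation above is easy to check by simulation. Below is a minimal F# sketch, not from the post: the Box-Muller sampler, the known-sigma z-interval, and all the numbers are my own assumptions, chosen only to show that roughly 95% of the computed intervals cover the true mean:

```fsharp
// Simulate the "repeated samples" reading of a 95% confidence interval.
// All names and numbers here are illustrative assumptions.
let rng = System.Random(42)

// Box-Muller transform: one standard-normal draw.
// (1.0 - NextDouble() keeps the argument of log strictly positive.)
let nextGaussian () =
    let u1 = 1.0 - rng.NextDouble()
    let u2 = rng.NextDouble()
    sqrt (-2.0 * log u1) * cos (2.0 * System.Math.PI * u2)

let trueMean, sigma, n = 10.0, 2.0, 25
let z95 = 1.96  // standard-normal quantile giving 95% coverage

// Draw one sample of size n and compute its 95% CI for the mean,
// assuming sigma is known (the simplest textbook case).
let confidenceInterval () =
    let sample = List.init n (fun _ -> trueMean + sigma * nextGaussian ())
    let xbar = List.average sample
    let half = z95 * sigma / sqrt (float n)
    (xbar - half, xbar + half)

// Repeat the experiment many times and count how often the interval
// (which differs each time) happens to cover the fixed true mean.
let repetitions = 10_000
let covered =
    Seq.init repetitions (fun _ -> confidenceInterval ())
    |> Seq.filter (fun (lo, hi) -> lo < trueMean && trueMean < hi)
    |> Seq.length

printfn "coverage = %.3f" (float covered / float repetitions)
```

Note what is random here: the intervals move from run to run while the parameter stays fixed, which is exactly the opposite of the Bayesian credible-interval reading, where the data are fixed and the belief about the parameter is what carries the probability.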

Eber Irigoyen


"Admittedly, I'm not that smart."
where does that leave us, mere mortals?

Other parts: Part II - A simple example - modeling Maia; Part IIb - Finding Maia underlying attitude