Bayesian inference in F# - Part I - Background

Luca Bolognese - Nov 2008

My interest in Bayesian inference comes from my dissatisfaction with ‘classical’ statistics. Whenever I want to know something, for example the probability that an unknown parameter is between two values, ‘classical’ statistics seems to answer a different and more convoluted question.

Try asking someone what “the 95% confidence interval for X is (x1, x2)” means. Very likely they will tell you that there is a 95% probability that X lies between x1 and x2. That is not the case in classical statistics. It is the case in Bayesian statistics. Also, all the funny business of defining a null hypothesis for the sake of proving it false always made my head spin. You don’t need any of that in Bayesian statistics. More recently, my discovery that statistical significance is a harmful concept, instead of the bedrock of knowledge I always thought it to be, shook my confidence in ‘classical’ statistics even more.
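
What a classical 95% confidence interval does promise is a property of the procedure over repeated experiments, and a small simulation can make that concrete. The following F# is only a rough sketch under assumptions that are not part of this post: normally distributed data, a known standard deviation (so a simple z interval applies), a Box-Muller sampler, and arbitrary values for the true mean, sample size, number of trials and random seed.

```fsharp
// A sketch of the "repeated sampling" reading of a 95% confidence interval.
// The true mean, sigma, sample size, trial count and seed are arbitrary assumptions.
open System

let rng = Random(42)

/// Draw one standard normal value using the Box-Muller transform.
let standardNormal () =
    let u1 = 1.0 - rng.NextDouble()   // in (0, 1], so the logarithm is finite
    let u2 = rng.NextDouble()
    sqrt (-2.0 * log u1) * cos (2.0 * Math.PI * u2)

let trueMean, sigma, n, trials = 10.0, 2.0, 30, 10000

/// Run one experiment and report whether its interval covers the true mean.
let intervalCoversTruth () =
    let sampleMean =
        Array.init n (fun _ -> trueMean + sigma * standardNormal ())
        |> Array.average
    let halfWidth = 1.96 * sigma / sqrt (float n)   // known-sigma z interval
    sampleMean - halfWidth < trueMean && trueMean < sampleMean + halfWidth

/// Fraction of the simulated experiments whose interval contains the true mean.
let coverage =
    Seq.init trials (fun _ -> intervalCoversTruth ())
    |> Seq.filter id
    |> Seq.length
    |> fun hits -> float hits / float trials

printfn "Fraction of intervals containing the true mean: %.3f" coverage
```

The printed fraction should come out close to 0.95. Note what is random here: the interval changes from experiment to experiment while the true mean stays fixed, so the 95% describes the long-run behaviour of the method, not the probability that the parameter lies inside any one computed interval.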

Admittedly, I’m not that smart. If I have a hard time getting an intuitive understanding of something, it tends to slip from my mind a couple of days after I’ve learned it. This happens all the time with ‘classical’ statistics. I feel like I have learned the thing ten times, because I keep forgetting it. This doesn’t happen with Bayesian statistics. It just makes intuitive sense.

At this point you might be wondering what ‘classical’ statistics is. I use the term classical, but I really shouldn’t. Classical statistics is normally just called ‘statistics’ and it is all you learn if you pick up almost any book on the topic (for example the otherwise excellent Introduction to the Practice of Statistics). Bayesian statistics is just a footnote in such books. This is a shame.

Bayesian statistics provides a much clearer and more elegant framework for understanding the process of inferring knowledge from data. The underlying question that it answers is: if I hold an opinion about something and I receive additional data on it, how should I rationally change my opinion? This question of how to update your knowledge is at the very foundation of human learning and progress in general (for example the scientific method is based on it). We had better be sure that the way we answer it is sound.
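
To give a flavour of what “rationally changing an opinion” looks like, here is a minimal F# sketch of Bayes’ rule over a finite set of hypotheses. It is not the code from the later posts in this series. The names (Belief, update, coinLikelihood) and the coin example are assumptions made up for illustration. The opinion is a probability for each hypothesis, and each new observation rescales those probabilities by how well each hypothesis predicted it.

```fsharp
// A minimal sketch of Bayes' rule over a finite set of hypotheses.
// Belief, update and the coin example are illustrative assumptions.

/// A belief assigns a probability to each hypothesis.
type Belief<'h> = ('h * float) list

/// Update a belief with one observation:
/// posterior(h) is proportional to prior(h) * likelihood(observation | h).
let update (likelihood: 'h -> 'obs -> float) (prior: Belief<'h>) (obs: 'obs) : Belief<'h> =
    let unnormalized = prior |> List.map (fun (h, p) -> (h, p * likelihood h obs))
    let evidence = unnormalized |> List.sumBy snd
    unnormalized |> List.map (fun (h, p) -> (h, p / evidence))

// Hypotheses: three possible values for the coin's probability of heads,
// all considered equally likely before seeing any data.
let prior : Belief<float> =
    [ (0.25, 1.0 / 3.0); (0.50, 1.0 / 3.0); (0.75, 1.0 / 3.0) ]

// Likelihood of a single toss (true = heads) given the coin's bias.
let coinLikelihood (bias: float) (heads: bool) =
    if heads then bias else 1.0 - bias

// Observe heads, heads, tails and fold the data into the belief one toss at a time.
let posterior =
    [ true; true; false ] |> List.fold (update coinLikelihood) prior

posterior
|> List.iter (fun (bias, p) -> printfn "P(bias = %.2f | data) = %.3f" bias p)
```

With these made-up numbers the three hypotheses end up with probabilities of roughly 0.15, 0.40 and 0.45: two heads out of three tosses shift belief toward the heads-biased coin without ruling the others out. Feeding in more tosses is just more folding, which is the sense in which yesterday’s posterior becomes today’s prior.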

You might wonder how it is possible to go against something that is so widely accepted and taught everywhere as ‘classical’ statistics is. Well, very many things that most people believe are wrong. I always like to cite old Ben on this: “The fact that other people agree or disagree with you makes you neither right nor wrong. You will be right if your facts and your reasoning are correct…” This little rule has always served me well.

In this series of posts I will give examples of Bayesian statistics in F#. I am not a statistician, which makes me part of the very dangerous category of ‘people who are not statisticians but talk about statistics’. To try to mitigate the problem I enlisted the help of Ralf Herbrich, who is a statistician and can catch my most blatant errors. Obviously I’ll manage to hide some errors so cleverly that not even Ralf will spot them. In that case the fault is mine alone.

In the next post we’ll look at some F# code to model the Bayesian inference process.

Comments

barrkel

2008-11-07T15:09:46Z

Can you recommend some reading for Bayesian statistics?

lucabol

2008-11-07T15:30:44Z

One of the reasons it is not that popular is the lack of good introductory books on it. The one I use is “Bayesian Data Analysis (2nd edition)” by Gelman, Carlin, Stern and Rubin.
It is a good book with plenty of practical examples, but it is not exactly easy. In this series of posts I’ll try to be easy. We’ll see.

configurator

2008-11-09T08:25:07Z

So what does “the 95% confidence interval for X is (x1, x2)” mean in ‘classical’ statistics?

lucabol

2008-11-09T09:16:32Z

A confidence interval is a range computed from the observed sample by a procedure that, if you repeated the experiment many (how many?) times, would produce intervals containing the true parameter in a fixed proportion (say 95%) of the repetitions. It is a statement about how the procedure behaves across samples, not about the probability distribution of the underlying parameter you set out to discover. As I said in the blog post, things get pretty convoluted in classical statistics.
From Wikipedia (http://en.wikipedia.org/wiki/Confidence_interval) : “For a given proportion p (where p is the confidence level), a confidence interval for a population parameter is an interval that is calculated from a random sample of an underlying population such that, if the sampling was repeated numerous times and the confidence interval recalculated from each sample according to the same method, a proportion p of the confidence intervals would contain the population parameter in question …
Confidence intervals play a similar role in frequentist statistics to the credibility interval in Bayesian statistics. However, confidence intervals and credibility intervals are not only mathematically different; they have radically different interpretations.
A Bayesian interval estimate is called a credible interval. Using much of the same notation as above, the definition of a credible interval for the unknown true value of θ is, for a given α,
$$\Pr_{\Theta \mid X = x}\left(u(x) < \Theta < v(x)\right) = 1 - \alpha$$
Here Θ is used to emphasize that the unknown value of θ is being treated as a random variable. The definitions of the two types of intervals may be compared as follows.
The definition of a confidence interval involves probabilities calculated from the distribution of X for given (θ,φ) (or conditional on these values) and the condition needs to hold for all values of (θ,φ).
The definition of a credible interval involves probabilities calculated from the distribution of Θ conditional on the observed values of X=x and marginalised (or averaged) over the values of Φ, where this last quantity is the random variable corresponding to the uncertainty about the nuisance parameters in φ.
Note that the treatment of the nuisance parameters above is often omitted from discussions comparing confidence and credible intervals but it is markedly different between the two cases.
In some simple standard cases, the intervals produced as confidence and credible intervals from the same data set can be identical. They are always very different if moderate or strong prior information is included in the Bayesian analysis.
Meaning and interpretation
For users of frequentist methods, various interpretations of a confidence interval can be given.
- The confidence interval can be expressed in terms of samples (or repeated samples): “Were this procedure to be repeated on multiple samples, the calculated confidence interval (which would differ for each sample) would encompass the true population parameter 90% of the time.” [5] Note that this need not be repeated sampling from the same population, just repeated sampling [6].
- The explanation of a confidence interval can amount to something like: “The confidence interval represents values for the population parameter for which the difference between the parameter and the observed estimate is not statistically significant at the 10% level”[7]. In fact, this relates to one particular way in which a confidence interval may be constructed.
- The probability associated with a confidence interval may also be considered from a pre-experiment point of view, in the same context in which arguments for the random allocation of treatments to study items are made. Here the experimenter sets out the way in which they intend to calculate a confidence interval and know, before they do the actual experiment, that the interval they will end up calculating has a certain chance of covering the true but unknown value. This is very similar to the “repeated sample” interpretation above, except that it avoids relying on considering hypothetical repeats of a sampling procedure that may not be repeatable in any meaningful sense.
In each of the above, the following applies. If the true value of the parameter lies outside the 90% confidence interval once it has been calculated, then an event has occurred which had a probability of 10% (or less) of happening by chance.
Users of Bayesian methods, if they produced an interval estimate, would by contrast want to say “My degree of belief that the parameter is in fact in this interval is 90%” [8]. See Credible interval. Disagreements about these issues are not disagreements about solutions to mathematical problems. Rather they are disagreements about the ways in which mathematics is to be applied.
”

Eber Irigoyen

2008-11-09T23:19:52Z

“Admittedly, I’m not that smart.“
where does that leave us, mere mortals?
