Bayes' theorem is a fundamental result in probability theory that expresses the conditional probability of an event A given the occurrence of another event B in terms of the conditional probability of B given A, the prior probability of A, and the marginal probability of B. Formally, if $P(A)$ and $P(B)$ are the probabilities of events A and B, and $P(B|A)$ is the probability of B occurring under the assumption that A has occurred, then the theorem states:
$$ P(A|B) = \frac{P(B|A)\,P(A)}{P(B)}, $$
provided that $P(B) > 0$.
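As a brief worked illustration with hypothetical numbers: if $P(A) = 0.3$, $P(B|A) = 0.8$, and $P(B) = 0.5$, then
$$ P(A|B) = \frac{0.8 \times 0.3}{0.5} = 0.48. $$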
Historical Background
The theorem is named after the 18th‑century English statistician and Presbyterian minister Thomas Bayes (c. 1701 – 1761), who first derived a special case of the result in an essay published posthumously in 1763. Pierre-Simon Laplace independently rediscovered the result in 1774 and later generalized it, developing it within the framework of inverse probability.
Interpretation
- Prior probability $P(A)$: The initial degree of belief in event A before observing any evidence B.
- Likelihood $P(B|A)$: The probability of observing evidence B assuming that A is true.
- Posterior probability $P(A|B)$: The updated belief in A after incorporating the evidence B.
- Evidence $P(B)$: A normalizing constant ensuring that the posterior probabilities sum to one; when the events $\{A_i\}$ form a partition of the sample space, it can be expressed as $P(B) = \sum_{i} P(B|A_i)\,P(A_i)$, as illustrated in the sketch below.
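The following minimal Python sketch shows how the normalizing constant works over a three-hypothesis partition; the priors and likelihoods are illustrative values only.

```python
# A minimal sketch of computing posteriors over a partition {A_i}.
# The priors and likelihoods are illustrative values only.

def posterior_over_partition(priors, likelihoods):
    """Return P(A_i | B) for each hypothesis A_i in a partition.

    priors      -- list of P(A_i), summing to one
    likelihoods -- list of P(B | A_i), aligned with priors
    """
    # Law of total probability: P(B) = sum_i P(B | A_i) * P(A_i)
    evidence = sum(l * p for l, p in zip(likelihoods, priors))
    # Bayes' theorem applied to each hypothesis
    return [l * p / evidence for l, p in zip(likelihoods, priors)]

priors = [0.5, 0.3, 0.2]        # three exhaustive, mutually exclusive A_i
likelihoods = [0.9, 0.5, 0.1]   # P(B | A_i) for each hypothesis
print(posterior_over_partition(priors, likelihoods))
# -> roughly [0.726, 0.242, 0.032]; the posteriors sum to one
```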
Applications
Bayes' theorem underlies many methods in statistics, machine learning, and information theory, including:
- Bayesian inference – updating probability distributions for model parameters as new data become available.
- Naïve Bayes classifiers – a family of simple probabilistic classifiers that assume conditional independence among features.
- Diagnostic testing – evaluating the probability of a disease given a positive test result, incorporating disease prevalence and test sensitivity/specificity (see the worked sketch after this list).
- Spam filtering – determining the likelihood that an email is spam based on word frequencies (a toy example follows the diagnostic sketch below).
- Decision theory – informing optimal decisions under uncertainty by combining prior beliefs with observed outcomes.
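To make the diagnostic-testing case concrete, here is a small Python sketch; the prevalence, sensitivity, and specificity figures are assumed for illustration, not taken from any actual test.

```python
# Hypothetical diagnostic test: probability of disease given a positive
# result. The prevalence, sensitivity, and specificity are assumed values.
prevalence = 0.01    # P(disease)
sensitivity = 0.95   # P(positive | disease)
specificity = 0.90   # P(negative | no disease)

# Law of total probability: P(positive)
p_positive = (sensitivity * prevalence
              + (1 - specificity) * (1 - prevalence))

# Bayes' theorem: P(disease | positive)
p_disease = sensitivity * prevalence / p_positive
print(f"{p_disease:.3f}")  # ~0.088
```

Even with an apparently accurate test, the posterior stays below 9% because the prior prevalence is low; this effect underlies the base-rate fallacy.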
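Similarly, a toy spam check can combine per-word likelihoods under the naïve conditional-independence assumption; every probability below is invented for illustration.

```python
# Toy naive Bayes spam check for a two-word message.
# Every probability here is invented for illustration.
p_spam = 0.4  # prior P(spam)
p_word_spam = {"offer": 0.30, "meeting": 0.02}  # P(word | spam)
p_word_ham  = {"offer": 0.01, "meeting": 0.20}  # P(word | not spam)

words = ["offer", "meeting"]

# Unnormalized posteriors: prior times the product of per-word
# likelihoods, using the naive conditional-independence assumption.
score_spam = p_spam
score_ham = 1 - p_spam
for w in words:
    score_spam *= p_word_spam[w]
    score_ham  *= p_word_ham[w]

# Normalize to get P(spam | words)
print(f"{score_spam / (score_spam + score_ham):.3f}")  # ~0.667
```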
Mathematical Extensions
- Continuous variables: For random variables with probability density functions, the theorem extends by replacing probabilities with densities (the density form is shown after this list).
- Multiple evidence: When multiple independent pieces of evidence $B_1, B_2, \dots, B_n$ are observed, the posterior can be updated iteratively or via the joint likelihood $P(B_1, B_2, \dots, B_n|A)$ (see the sequential-update sketch after this list).
- Bayes factor: The ratio $ \frac{P(B|A_1)}{P(B|A_2)} $ compares the support that evidence B provides for two competing hypotheses $A_1$ and $A_2$.
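For the continuous case noted above, the density form for random variables $X$ and $Y$ reads:
$$ f_{X|Y}(x|y) = \frac{f_{Y|X}(y|x)\,f_X(x)}{f_Y(y)}, \qquad f_Y(y) = \int f_{Y|X}(y|x)\,f_X(x)\,dx. $$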
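The iterative update for multiple independent observations can be sketched in Python as follows; the two-hypothesis setup and all of its numbers are hypothetical.

```python
# Sequential Bayesian updating with conditionally independent evidence,
# for two competing hypotheses A1 and A2. All numbers are illustrative.

def update(prior_a1, lik_a1, lik_a2):
    """One Bayes step; prior_a1 is P(A1), and P(A2) = 1 - P(A1)."""
    numerator = lik_a1 * prior_a1
    evidence = numerator + lik_a2 * (1 - prior_a1)
    return numerator / evidence

p_a1 = 0.5  # indifferent starting prior
# (P(B_k | A1), P(B_k | A2)) for three independent observations
observations = [(0.7, 0.4), (0.6, 0.5), (0.8, 0.3)]
for lik_a1, lik_a2 in observations:
    p_a1 = update(p_a1, lik_a1, lik_a2)  # each posterior becomes the next prior
print(f"{p_a1:.3f}")  # ~0.848, identical to applying the joint likelihood once
```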
Limitations and Considerations
- Choice of prior: The posterior distribution depends on the selected prior; in situations with limited data, the prior can exert substantial influence.
- Computational complexity: Exact Bayesian updating may be infeasible for high‑dimensional models, leading to the use of approximation techniques such as Markov chain Monte Carlo (MCMC) or variational inference.
See Also
- Probability theory
- Conditional probability
- Bayesian statistics
- Likelihood function
- Prior and posterior distributions
References
- Bayes, T. (1763). An Essay towards solving a Problem in the Doctrine of Chances. Philosophical Transactions of the Royal Society of London.
- Laplace, P. S. (1812). Théorie Analytique des Probabilités. Paris: Courcier.