Bayes' theorem

Bayes' theorem is a fundamental result in probability theory that expresses the conditional probability of an event A given the occurrence of another event B in terms of the conditional probability of B given A, the prior probability of A, and the prior probability of B. Formally, if $P(A)$ and $P(B)$ are the probabilities of events A and B, and $P(B|A)$ is the probability of B occurring under the assumption that A has occurred, then the theorem states:

$$ P(A|B) = \frac{P(B|A)\,P(A)}{P(B)}, $$

provided that $P(B) > 0$.
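The formula above can be evaluated directly. A minimal sketch, using hypothetical numbers ($P(A) = 0.3$, $P(B|A) = 0.8$, $P(B) = 0.5$, none of which come from the text):

```python
def posterior(p_b_given_a, p_a, p_b):
    """Apply Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B)."""
    if p_b <= 0:
        raise ValueError("P(B) must be positive")
    return p_b_given_a * p_a / p_b

# Hypothetical values chosen for illustration only.
p = posterior(p_b_given_a=0.8, p_a=0.3, p_b=0.5)
print(p)  # 0.48
```

The guard on $P(B)$ mirrors the condition stated above: the theorem is undefined when the evidence has zero probability.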

Historical Background

The theorem is named after the 18th‑century English statistician, philosopher, and Presbyterian minister Thomas Bayes (c. 1701 – 1761), who first derived a special case of the result in an essay published posthumously in 1763. Pierre-Simon Laplace independently rediscovered and generalized the theorem in the early 19th century, developing it within the framework of inverse probability.

Interpretation

  • Prior probability $P(A)$: The initial degree of belief in event A before observing any evidence B.
  • Likelihood $P(B|A)$: The probability of observing evidence B assuming that A is true.
  • Posterior probability $P(A|B)$: The updated belief in A after incorporating the evidence B.
  • Evidence $P(B)$: A normalizing constant ensuring that the posterior probabilities sum to one; it can be expressed as $P(B) = \sum_{i} P(B|A_i) P(A_i)$ when the set $\{A_i\}$ forms a partition of the sample space.
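When the hypotheses partition the sample space, the evidence term is computed by the law of total probability and the resulting posteriors automatically sum to one. A sketch with an assumed three-hypothesis partition (the priors and likelihoods are invented for illustration):

```python
# Hypothetical partition A_1, A_2, A_3 with assumed priors and likelihoods.
priors = [0.5, 0.3, 0.2]        # P(A_i); must sum to 1
likelihoods = [0.9, 0.5, 0.1]   # P(B | A_i)

# Law of total probability: P(B) = sum_i P(B|A_i) P(A_i)
evidence = sum(l * p for l, p in zip(likelihoods, priors))

# Bayes' theorem applied to each hypothesis in the partition.
posteriors = [l * p / evidence for l, p in zip(likelihoods, priors)]

print(evidence)         # 0.62
print(sum(posteriors))  # 1.0 (normalization by the evidence guarantees this)
```

Dividing by the evidence is exactly the "normalizing constant" role described in the bullet above.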

Applications

Bayes' theorem underlies many methods in statistics, machine learning, and information theory, including:

  1. Bayesian inference – updating probability distributions for model parameters as new data become available.
  2. Naïve Bayes classifiers – a family of simple probabilistic classifiers that assume conditional independence among features.
  3. Diagnostic testing – evaluating the probability of a disease given a positive test result, incorporating disease prevalence and test sensitivity/specificity.
  4. Spam filtering – determining the likelihood that an email is spam based on word frequencies.
  5. Decision theory – informing optimal decisions under uncertainty by combining prior beliefs with observed outcomes.
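The diagnostic-testing application (item 3) is a standard worked example. A sketch using assumed illustrative values (1% prevalence, 99% sensitivity, 95% specificity; none of these figures come from the text):

```python
# Assumed illustrative test characteristics.
prevalence = 0.01    # P(disease)
sensitivity = 0.99   # P(positive | disease)
specificity = 0.95   # P(negative | no disease)

# Total probability of a positive result: true positives + false positives.
p_positive = (sensitivity * prevalence
              + (1 - specificity) * (1 - prevalence))

# Bayes' theorem: probability of disease given a positive test.
p_disease_given_positive = sensitivity * prevalence / p_positive
print(round(p_disease_given_positive, 4))  # 0.1667
```

Despite the high sensitivity, the posterior is only about 17% because the disease is rare, so false positives from the large healthy population dominate.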

Mathematical Extensions

  • Continuous variables: For random variables with probability density functions, the theorem extends by replacing probabilities with densities.
  • Multiple evidence: When multiple independent pieces of evidence $B_1, B_2, \dots, B_n$ are observed, the posterior can be updated iteratively or via a joint likelihood $P(B_1, B_2, \dots, B_n|A)$.
  • Bayes factor: The ratio $ \frac{P(B|A_1)}{P(B|A_2)} $ compares the support that evidence B provides for two competing hypotheses $A_1$ and $A_2$.
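The multiple-evidence and Bayes-factor extensions combine naturally in odds form: the posterior odds equal the prior odds times the product of the Bayes factors for each independent piece of evidence. A sketch with assumed numbers (equal priors and invented likelihood ratios):

```python
# Hypothetical prior odds P(A_1)/P(A_2) with equal priors.
prior_odds = 0.5 / 0.5

# Assumed Bayes factors P(B_k|A_1)/P(B_k|A_2) for three independent
# pieces of evidence B_1, B_2, B_3.
bayes_factors = [2.0, 3.0, 0.5]

# Iterative updating: each piece of evidence multiplies the odds.
posterior_odds = prior_odds
for bf in bayes_factors:
    posterior_odds *= bf

# Convert odds back to a probability for A_1.
p_a1 = posterior_odds / (1 + posterior_odds)
print(posterior_odds, p_a1)  # 3.0 0.75
```

This shows why iterative updating and a joint likelihood give the same answer under independence: multiplication of the per-evidence factors is commutative.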

Limitations and Considerations

  • Choice of prior: The posterior distribution depends on the selected prior; in situations with limited data, the prior can exert substantial influence.
  • Computational complexity: Exact Bayesian updating may be infeasible for high‑dimensional models, leading to the use of approximation techniques such as Markov chain Monte Carlo (MCMC) or variational inference.

See Also

  • Probability theory
  • Conditional probability
  • Bayesian statistics
  • Likelihood function
  • Prior and posterior distributions

References

  • Bayes, T. (1763). An Essay towards solving a Problem in the Doctrine of Chances. Philosophical Transactions of the Royal Society of London.
  • Laplace, P. S. (1812). Théorie Analytique des Probabilités. Paris: Courcier.