Data analysis recipes: Probability calculus for inference
In this pedagogical text aimed at those wanting to start thinking about or brush up on probabilistic inference, I review the rules by which probability distribution functions can (and cannot) be combined. I connect these rules to the operations performed in probabilistic data analysis. Dimensional analysis is emphasized as a valuable tool for helping to construct non-wrong probabilistic statements. The applications of probability calculus in constructing likelihoods, marginalized likelihoods, posterior probabilities, and posterior predictions are all discussed.
💡 Research Summary
The paper serves as a pedagogical bridge for readers who are either new to probabilistic inference or wish to refresh their understanding of the underlying calculus. It begins by revisiting the foundational definitions of probability, emphasizing the relationship between joint, marginal, and conditional probabilities, and deriving Bayes’ theorem in its most general form. A central theme throughout the manuscript is dimensional analysis: the author stresses that probability density functions (pdfs) and probability mass functions (pmfs) carry units (e.g., 1/length for a continuous variable) and that any algebraic manipulation must respect these units. This perspective is used to flag common mistakes such as multiplying unrelated probability terms, which leads to unit inconsistencies and mathematically invalid statements.
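The unit-carrying nature of pdfs described above can be checked numerically. The following sketch (my own illustration, not from the paper) evaluates a Gaussian density for the same physical quantity expressed in metres and in centimetres: the density values differ by exactly the unit-conversion factor, while the probability mass over a matching interval is dimensionless and unchanged.

```python
import math

def gaussian_pdf(x, mu, sigma):
    """Normal density; the returned value carries units of 1/[x]."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

# Same physical quantity in two unit systems (hypothetical numbers).
p_m = gaussian_pdf(1.0, mu=1.0, sigma=0.1)        # units: 1/m
p_cm = gaussian_pdf(100.0, mu=100.0, sigma=10.0)  # units: 1/cm

# Density values differ by the conversion factor (100 cm per m)...
assert abs(p_m / p_cm - 100.0) < 1e-9
# ...but p(x) dx = p(x') dx', so probabilities over matching intervals agree.
```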
The discussion then moves to the construction of likelihood functions. The likelihood L(θ | D) = p(D | θ) is presented not as a probability distribution over the parameters but as a function of θ at fixed observed data D; because each factor is a pdf evaluated at a datum, the likelihood carries units of inverse data (not none). For independent observations, the likelihood factorizes into a product of conditional pdfs, and the author demonstrates how the product retains the correct dimensionality precisely because each factor is itself a pdf evaluated at a data point. The paper clarifies that a likelihood is not normalized over the parameters on its own; it becomes a proper probability distribution in θ only after multiplication by a prior p(θ) and division by the evidence p(D).
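The factorization of the likelihood for independent observations can be sketched in a few lines. This is a minimal illustration under an assumed Gaussian model with known noise level; the data values and σ are hypothetical, not from the paper.

```python
import math

def log_likelihood(mu, data, sigma=1.0):
    """Sum of log-pdf terms for independent Gaussian observations.

    Each factor p(d_i | mu) is a density evaluated at datum d_i, so the
    full (non-log) likelihood carries units of (1/[d])^N, not none.
    """
    norm = -0.5 * math.log(2 * math.pi * sigma ** 2)
    return sum(norm - 0.5 * ((d - mu) / sigma) ** 2 for d in data)

data = [1.2, 0.8, 1.1, 0.9]  # made-up measurements
# For this model the likelihood peaks at the sample mean.
mle = sum(data) / len(data)
assert log_likelihood(mle, data) > log_likelihood(mle + 0.5, data)
```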
Marginalization is treated next. When a model contains nuisance parameters φ, the marginalized likelihood (or evidence) is obtained by integrating over φ:
L_marg(θ | D) = ∫ p(D | θ, φ) p(φ) dφ.
The author shows that the prior p(φ) is essential for preserving dimensional consistency; omitting it would leave the integral with the wrong units and render the result meaningless. Practical examples, such as marginalizing over a variance hyper‑parameter in a Gaussian linear regression, illustrate the technique.
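The marginalization integral above can be approximated on a grid. The sketch below marginalizes an unknown Gaussian noise level φ = σ under a flat, normalized prior; the model, grid, and prior range are my own illustrative assumptions, not taken from the paper's examples.

```python
import math

def gauss(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def marginal_likelihood(theta, data, phi_grid, phi_prior):
    """L_marg(theta) ~= sum_j p(D | theta, phi_j) p(phi_j) dphi.

    The prior p(phi) contributes units of 1/[phi], which cancel against
    dphi, so the result keeps the units of p(D | theta, phi).
    """
    dphi = phi_grid[1] - phi_grid[0]
    total = 0.0
    for phi in phi_grid:
        lik = 1.0
        for d in data:
            lik *= gauss(d, theta, phi)
        total += lik * phi_prior(phi) * dphi
    return total

# Hypothetical setup: flat prior on sigma over [0.5, 2.0].
grid = [0.5 + i * 0.015 for i in range(101)]
prior = lambda phi: 1.0 / 1.5   # normalized on [0.5, 2.0]
data = [1.2, 0.8, 1.1, 0.9]
assert marginal_likelihood(1.0, data, grid, prior) > marginal_likelihood(3.0, data, grid, prior)
```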
Posterior inference follows naturally from Bayes’ theorem: p(θ | D) = L(θ | D) p(θ) / p(D). The manuscript explains how the posterior inherits the dimensionality of the prior (the units of the likelihood cancel against those of the evidence p(D)) and how it can be sampled or approximated using standard methods (MCMC, variational inference, etc.).
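For a one-dimensional parameter, Bayes’ theorem can be evaluated directly on a grid before reaching for MCMC. This sketch (my own toy example, assuming a Gaussian likelihood with known noise and a flat prior) multiplies likelihood by prior and normalizes by the evidence.

```python
import math

def gauss(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def grid_posterior(data, theta_grid, prior, sigma=1.0):
    """p(theta | D) on a grid: likelihood times prior, normalized by the evidence."""
    dtheta = theta_grid[1] - theta_grid[0]
    unnorm = []
    for th in theta_grid:
        lik = 1.0
        for d in data:
            lik *= gauss(d, th, sigma)
        unnorm.append(lik * prior(th))
    evidence = sum(unnorm) * dtheta           # p(D): same units as the likelihood
    return [u / evidence for u in unnorm]     # units 1/[theta]; integrates to 1

grid = [-2.0 + i * 0.04 for i in range(101)]  # theta in [-2, 2]
prior = lambda th: 0.25                       # flat, normalized on [-2, 2]
post = grid_posterior([1.2, 0.8, 1.1, 0.9], grid, prior)
# The posterior integrates to 1 over the grid.
assert abs(sum(post) * 0.04 - 1.0) < 1e-6
```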
The final substantive section covers posterior predictive distributions. The predictive density for a future observation ỹ is given by
p(ỹ | D) = ∫ p(ỹ | θ) p(θ | D) dθ.
Here, dimensional analysis again guarantees that the integral yields a proper pdf in the space of ỹ: p(ỹ | θ) carries the appropriate units in ỹ, while the units of p(θ | D) cancel against the integration measure dθ. The author demonstrates how this formulation yields predictive intervals, model checking tools (e.g., posterior predictive checks), and a natural way to incorporate uncertainty about θ into future predictions.
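In practice the predictive integral is usually approximated by Monte Carlo: draw θ from the posterior, then draw ỹ from p(ỹ | θ). The sketch below is a toy illustration under assumed Gaussian posterior draws (stand-ins for MCMC output); it shows how the predictive spread folds in uncertainty about θ on top of the observation noise.

```python
import random
import statistics

random.seed(0)

# Toy Gaussian model with known noise sigma; posterior draws for theta are
# assumed given (hypothetical values standing in for MCMC samples).
sigma = 1.0
theta_samples = [random.gauss(1.0, 0.5) for _ in range(20000)]

# Posterior predictive: for each theta draw, simulate one future observation.
y_tilde = [random.gauss(th, sigma) for th in theta_samples]

# The predictive standard deviation exceeds the noise alone, because it
# also carries the posterior uncertainty in theta.
assert statistics.stdev(y_tilde) > sigma
```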
Throughout the text, the author intersperses “common‑pitfall” boxes that list typical errors encountered in applied work: ignoring priors, treating likelihoods as probabilities, failing to include Jacobian terms when transforming variables, and neglecting the proper limits of integration during marginalization. Each pitfall is revisited with a dimensional‑analysis checklist that allows practitioners to quickly verify the correctness of their expressions.
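One of the pitfalls listed above, forgetting the Jacobian when transforming variables, is easy to verify numerically. This sketch (my own example, not from the paper) transforms a standard normal x into y = exp(x): the correct density in y needs the factor |dx/dy| = 1/y, and a grid integral confirms it still sums to one.

```python
import math

def p_x(x):
    """Standard normal density in x."""
    return math.exp(-0.5 * x * x) / math.sqrt(2 * math.pi)

def p_y(y):
    """Density of y = exp(x), including the Jacobian |dx/dy| = 1/y (log-normal)."""
    return p_x(math.log(y)) / y

# Sanity check: p_y integrates to ~1 on a grid; dropping the 1/y factor would not.
dy = 0.001
total = sum(p_y(0.0005 + i * dy) * dy for i in range(20000))
assert abs(total - 1.0) < 0.01
```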
In summary, the paper provides a concise yet thorough roadmap for constructing valid probabilistic statements in data analysis. By anchoring every step—likelihood formation, marginalization, posterior computation, and predictive inference—in the language of units and dimensions, it equips readers with a practical diagnostic tool that complements formal mathematical derivations. The result is a clear, actionable guide that bridges theory and practice, enabling both novices and seasoned analysts to perform Bayesian inference without inadvertently violating the fundamental rules of probability calculus.