How to Use Experimental Data to Compute the Probability of Your Theory
This article is geared toward theorists interested in estimating parameters of their theoretical models and computing their own limits using available experimental data and elementary Mathematica code. The examples can also be useful to experimentalists who wish to learn how to use Bayesian methods. A thorough introduction precedes the practical part, making clear the advantages and shortcomings of the method and guarding against its abuse. The goal of this article is to help bridge the gap between theory and experiment.
💡 Research Summary
The paper by Georgios Choudalakis is a practical guide aimed at theorists who wish to confront their models with publicly available experimental data using Bayesian inference. It begins with a clear exposition of Bayes’ theorem, emphasizing the conceptual difference between Bayesian and frequentist approaches: while frequentists evaluate the probability of the data given a hypothesis (P(data | hypothesis)) and construct confidence intervals, Bayesians fix the observed data and compute the posterior probability of the hypothesis (P(hypothesis | data)), thereby providing a direct probability statement about the parameter of interest (POI).
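The Bayesian statement P(hypothesis | data) can be made concrete with a toy numerical example. The sketch below (in Python rather than the paper's Mathematica, with all numbers invented for illustration) applies Bayes' theorem to two discrete hypotheses, "background only" versus "signal plus background," given a single observed event count:

```python
# Bayes' theorem for two discrete hypotheses, with invented numbers:
# "background only" (b) vs "signal + background" (s+b).
from math import exp, factorial

def poisson(n, mu):
    """Poisson probability P(n | mu)."""
    return mu**n * exp(-mu) / factorial(n)

n_obs = 12           # observed event count (hypothetical)
mu_b = 8.0           # expected background yield
mu_sb = 8.0 + 5.0    # background plus an assumed signal yield

prior_b = 0.5        # equal prior belief in each hypothesis
prior_sb = 0.5

# Bayes' theorem: P(H | data) = P(data | H) * P(H) / P(data)
like_b = poisson(n_obs, mu_b)
like_sb = poisson(n_obs, mu_sb)
evidence = like_b * prior_b + like_sb * prior_sb

post_sb = like_sb * prior_sb / evidence
post_b = like_b * prior_b / evidence
print(f"P(s+b | data) = {post_sb:.3f}, P(b | data) = {post_b:.3f}")
```

Note the contrast with the frequentist approach: here the data are fixed and the probability is assigned directly to the hypotheses themselves.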
A substantial portion of the manuscript is devoted to the role of the prior. The author distinguishes “subjective Bayesians,” who accept any prior as a legitimate expression of prior belief, from “objective Bayesians,” who seek priors with special invariance or maximal data‑sensitivity properties. He warns that extreme priors (e.g., a Dirac delta) can dominate the posterior regardless of the data, while more diffuse priors allow the data to speak. The paper encourages the user to explicitly state the chosen prior and to explore the sensitivity of results to alternative choices.
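The recommended sensitivity check is straightforward to carry out numerically. As an illustration (in Python, with an invented observation), the sketch below computes the posterior of a Poisson rate μ under two different priors, a flat prior and a 1/√μ prior, and compares the resulting posterior means:

```python
# Prior-sensitivity check on a Poisson rate mu: compute the posterior
# under two priors and compare. The observed count is invented.
import numpy as np

n_obs = 5
mu = np.linspace(0.01, 25.0, 2500)
dmu = mu[1] - mu[0]
log_like = n_obs * np.log(mu) - mu   # Poisson log-likelihood, up to constants

def posterior(log_prior):
    """Normalized posterior on the mu grid for a given log-prior."""
    unnorm = np.exp(log_like + log_prior)
    return unnorm / (unnorm.sum() * dmu)

post_flat = posterior(np.zeros_like(mu))    # flat prior on mu
post_sqrt = posterior(-0.5 * np.log(mu))    # 1/sqrt(mu) prior

mean_flat = (mu * post_flat).sum() * dmu    # analytic result: n_obs + 1 = 6
mean_sqrt = (mu * post_sqrt).sum() * dmu    # analytic result: n_obs + 1/2 = 5.5
print(f"posterior mean: flat prior {mean_flat:.2f}, 1/sqrt prior {mean_sqrt:.2f}")
```

With only five events the two priors shift the posterior mean by half an event; with larger counts the difference shrinks, which is exactly the "data speaking over the prior" behavior the paper describes.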
The core of the work is a step‑by‑step recipe for constructing a likelihood function from binned experimental results (typically obtained from HepData). Each bin is treated as an independent Poisson (or Gaussian for large counts) observation, with the expected count given by the theoretical prediction after detector effects. The author supplies Mathematica snippets that define the prior, the likelihood, and perform numerical integration to obtain the normalized posterior, as well as routines to locate the maximum a posteriori (MAP) estimate and credible intervals.
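The paper's snippets are in Mathematica; the same recipe can be sketched in Python. The example below uses invented bin contents and templates: it multiplies per-bin Poisson likelihoods, imposes a flat prior on a signal strength s, normalizes the posterior on a grid, and extracts the MAP point and a 68% central credible interval:

```python
# Binned Poisson posterior for a signal strength s (invented numbers).
import numpy as np

observed = np.array([25, 18, 9, 4])              # observed counts per bin
background = np.array([22.0, 15.0, 8.0, 3.0])    # expected background per bin
signal_shape = np.array([1.0, 2.0, 1.5, 0.5])    # signal yield per unit of s

s_grid = np.linspace(0.0, 10.0, 1001)
ds = s_grid[1] - s_grid[0]

# Product of per-bin Poisson likelihoods (flat prior on s), in log space.
mu = background[None, :] + s_grid[:, None] * signal_shape[None, :]
log_post = (observed * np.log(mu) - mu).sum(axis=1)

post = np.exp(log_post - log_post.max())
post /= post.sum() * ds                          # normalize the posterior

s_map = s_grid[np.argmax(post)]                  # maximum a posteriori estimate

# 68% central credible interval from the cumulative posterior.
cdf = np.cumsum(post) * ds
lo = s_grid[np.searchsorted(cdf, 0.16)]
hi = s_grid[np.searchsorted(cdf, 0.84)]
print(f"MAP s = {s_map:.2f}, 68% interval [{lo:.2f}, {hi:.2f}]")
```

Working in log space before exponentiating, as above, avoids numerical underflow when many bins are multiplied together.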
Systematic uncertainties are discussed qualitatively. The author shows how to introduce nuisance parameters with Gaussian (or log‑normal) priors and marginalize over them, but stresses that a full treatment requires detailed knowledge of the experimental systematic model, which is often only available within the collaboration. Consequently, the paper presents a “quick‑and‑dirty” limit that ignores systematics, arguing that for many searches the impact is at the few‑percent level.
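Marginalization over a single nuisance parameter can be sketched in a few lines. In the illustration below (Python, with an invented single-bin measurement and an assumed 10% Gaussian uncertainty on the background normalization), the nuisance θ is integrated out numerically for every value of the signal s:

```python
# Marginalizing a Gaussian-constrained background normalization theta
# out of a single-bin Poisson likelihood (invented numbers).
from math import lgamma
import numpy as np

n_obs = 20
b_nominal = 15.0
sigma_b = 0.10                         # assumed 10% background uncertainty

s_grid = np.linspace(0.0, 20.0, 401)
ds = s_grid[1] - s_grid[0]
theta = np.linspace(-4.0, 4.0, 161)    # nuisance parameter, units of sigma
dtheta = theta[1] - theta[0]
prior_theta = np.exp(-0.5 * theta**2)  # Gaussian constraint (unnormalized)

# Poisson likelihood with the background scaled by the nuisance,
# marginalized over theta for every value of the signal s.
mu = s_grid[:, None] + b_nominal * (1.0 + sigma_b * theta[None, :])
mu = np.clip(mu, 1e-12, None)
log_like = n_obs * np.log(mu) - mu - lgamma(n_obs + 1)
marg = (np.exp(log_like) * prior_theta).sum(axis=1) * dtheta

post = marg / (marg.sum() * ds)        # posterior with a flat prior on s
s_mean = (s_grid * post).sum() * ds
print(f"marginalized posterior mean of s = {s_mean:.2f}")
```

Setting sigma_b to zero recovers the "quick-and-dirty" limit that ignores systematics; comparing the two posteriors shows directly how much the nuisance broadens the result.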
A particularly valuable discussion concerns detector response. The author argues against unfolding because it inevitably introduces bias through regularisation, inflates variances, and destroys the simple Poisson nature of the data. Instead, he recommends folding the theoretical spectrum with a detector response matrix (migration matrix) supplied by the experiment. Folding preserves the known statistical properties of the observed histogram, allows the use of standard χ² or likelihood tests, and avoids the need for regularisation. He acknowledges that different new‑physics models may require different folding matrices (e.g., due to varying η distributions) and suggests a pragmatic approach: smear individual objects (jets, leptons, MET) using parametrised resolutions, which can be applied uniformly to any model.
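Folding itself is just a matrix multiplication. In the sketch below (with an invented migration matrix), the truth-level theory spectrum is multiplied by the response matrix to predict reconstructed-level yields, which can then be compared directly to the observed histogram:

```python
# Folding a truth-level spectrum with a detector response (migration)
# matrix. The matrix and yields here are invented for illustration.
import numpy as np

# R[i, j] = probability that an event generated in truth bin j is
# reconstructed in bin i (columns absorb efficiency, so they may sum to < 1).
R = np.array([
    [0.80, 0.10, 0.00],
    [0.15, 0.75, 0.10],
    [0.00, 0.10, 0.80],
])
truth = np.array([100.0, 60.0, 30.0])    # predicted truth-level yields

reco = R @ truth    # folded, reconstructed-level prediction
print(reco)         # compare these numbers directly to the observed histogram
```

This is the direction of inference the paper advocates: the prediction is transported to the detector level, so the observed counts keep their simple Poisson character and no regularisation is needed.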
Event selection is treated in detail. The paper explains how to reproduce the analysis cuts (pT, η, Δφ, MET, etc.) using generator‑level four‑vectors from tools such as Pythia and FastJet. It discusses the subtle differences between parton‑level, hadron‑level, and reconstructed‑level objects, offering correction factors (e.g., out‑of‑cone energy loss for anti‑kT jets) to map cuts appropriately. Fake MET arising from detector resolution is deemed negligible compared with genuine MET from invisible particles, so it can be ignored in many new‑physics scenarios.
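The object-smearing approach can be illustrated with a toy event sample. In the sketch below (Python; the resolution parametrisation, spectrum, and cut values are all invented rather than taken from any specific analysis), generator-level jet pT values are smeared with a parametrised Gaussian resolution before the published-style cuts are applied:

```python
# Toy example: smear generator-level jet pT with a parametrised resolution,
# then apply selection cuts. All parameters here are invented.
import numpy as np

rng = np.random.default_rng(42)

# Toy generator-level leading jets: (pT in GeV, eta), one per event.
n_events = 10000
pt_true = rng.exponential(scale=80.0, size=n_events) + 20.0
eta = rng.uniform(-3.0, 3.0, size=n_events)

# Parametrised relative pT resolution: a stochastic term falling with pT
# plus a constant term (hypothetical values).
sigma_rel = np.sqrt((0.9 / np.sqrt(pt_true))**2 + 0.05**2)
pt_reco = pt_true * (1.0 + sigma_rel * rng.standard_normal(n_events))

# Reproduce the published cuts on the smeared ("reconstructed") objects.
passed = (pt_reco > 100.0) & (np.abs(eta) < 2.5)
efficiency = passed.mean()
print(f"selection efficiency = {efficiency:.3f}")
```

Because the smearing acts on individual objects rather than on a histogram, the same routine can be reused unchanged for any new-physics model, which is the pragmatic advantage the author highlights.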
Data acquisition is straightforward: the author points to HepData as the primary repository for binned spectra, background expectations, and—when available—migration matrices. He notes that other fields (e.g., observational astronomy) have similar open‑data cultures, implying that the methodology is broadly applicable.
Finally, the manuscript contains a strong ethical warning. The author stresses that any claim of discovery must be coordinated with the experimental collaboration that produced the data, and that theorists should treat their Bayesian results as exploratory rather than definitive. He also reminds readers that systematic uncertainties, background modeling, and detector effects are the domain of experimentalists, and that collaboration is essential for a complete analysis.
In summary, the paper delivers a self‑contained, Mathematica‑based workflow for Bayesian inference with real experimental data: define a prior, build a Poisson‑based likelihood from binned spectra, optionally include systematic nuisance parameters, fold the theoretical prediction with the detector response, apply the published event selection, and compute the posterior PDF and credible intervals. By doing so, theorists can obtain quantitative probability statements about their models without needing large‑scale computing resources or proprietary software, while remaining aware of the methodological limitations and the necessity of close interaction with experimental collaborations.