The RooStats Project
RooStats is a project to create advanced statistical tools required for the analysis of LHC data, with emphasis on discoveries, confidence intervals, and combined measurements. The idea is to provide the major statistical techniques as a set of C++ classes with coherent interfaces, so that they can be used on arbitrary models and datasets in a common way. The classes are built on top of the RooFit package, which provides functionality for easily creating probability models, for analysis combinations and for digital publication of the results. We will present in detail the design and the implementation of the different statistical methods of RooStats. We will describe the various classes for interval estimation and hypothesis testing that rely on different statistical techniques, such as those based on the likelihood function or on frequentist or Bayesian statistics. These methods can be applied to complex problems, including cases with multiple parameters of interest and various nuisance parameters.
💡 Research Summary
The RooStats project delivers a comprehensive suite of advanced statistical tools tailored for the analysis of Large Hadron Collider (LHC) data, with a focus on discovery significance, confidence interval construction, and combined measurements across multiple channels or experiments. Built on top of the ROOT‑based RooFit package, RooStats inherits RooFit’s powerful capabilities for defining probability density functions, performing fits, and managing parameter sets, while adding a coherent, object‑oriented interface that abstracts the statistical methodology from the underlying model.
Design Philosophy
RooStats is organized around two central principles: model independence and interface consistency. Statistical calculators implement common abstract interfaces, RooStats::IntervalCalculator for interval estimation and RooStats::HypoTestCalculator for hypothesis testing, and return their results through standardized objects: a RooStats::ConfInterval (for example a LikelihoodInterval or a SimpleInterval) or a RooStats::HypoTestResult. This architecture lets users plug any RooFit model, however complex (multiple signal components, background shapes, correlated nuisance parameters), into the statistical machinery without rewriting code.
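To make the separation between calculators and result objects concrete, here is a minimal sketch in plain standard C++ (no ROOT). The names ConfInterval, SimpleInterval and IntervalCalculator echo the RooStats concepts, but these classes, their methods (isInInterval, confidenceLevel, getInterval) and the hypothetical CountingLikelihoodCalculator are simplified stand-ins for illustration, not the RooStats API. The concrete calculator builds a likelihood-ratio interval for a single Poisson count with known background, assuming Wilks' theorem holds.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <memory>

// Hypothetical, simplified analogues of a result object and a calculator
// interface. These are NOT the RooStats classes, which work on RooFit
// models and datasets.
class ConfInterval {
 public:
  virtual ~ConfInterval() = default;
  virtual bool isInInterval(double poi) const = 0;
  virtual double confidenceLevel() const = 0;
};

class SimpleInterval : public ConfInterval {
 public:
  SimpleInterval(double lower, double upper, double cl)
      : lower_(lower), upper_(upper), cl_(cl) {}
  bool isInInterval(double poi) const override {
    return poi >= lower_ && poi <= upper_;
  }
  double confidenceLevel() const override { return cl_; }

 private:
  double lower_, upper_, cl_;
};

class IntervalCalculator {
 public:
  virtual ~IntervalCalculator() = default;
  virtual std::unique_ptr<ConfInterval> getInterval() const = 0;
};

// Likelihood-ratio interval for n ~ Poisson(s + b) with known b: the set of
// s >= 0 where -ln L(s) exceeds its minimum by at most deltaNll
// (Wilks: deltaNll = 0.5 gives about 68.3% CL for one parameter).
class CountingLikelihoodCalculator : public IntervalCalculator {
 public:
  CountingLikelihoodCalculator(int n, double b, double deltaNll = 0.5)
      : n_(n), b_(b), deltaNll_(deltaNll) {
    assert(n >= 0 && b >= 0.0 && deltaNll > 0.0);
  }

  std::unique_ptr<ConfInterval> getInterval() const override {
    const double sHat = std::max(0.0, n_ - b_);  // physically allowed MLE
    const double nllMin = nll(sHat);
    // Negative inside the interval, positive outside it.
    auto excess = [&](double s) { return nll(s) - nllMin - deltaNll_; };
    // Bisection between a point inside and a point outside the interval.
    auto bisect = [&](double inside, double outside) {
      for (int i = 0; i < 200; ++i) {
        const double mid = 0.5 * (inside + outside);
        if (excess(mid) < 0.0) inside = mid; else outside = mid;
      }
      return 0.5 * (inside + outside);
    };
    const double lower = excess(0.0) > 0.0 ? bisect(sHat, 0.0) : 0.0;
    double far = sHat + 1.0;
    while (excess(far) <= 0.0) far *= 2.0;  // bracket the upper crossing
    const double upper = bisect(sHat, far);
    const double cl = std::erf(std::sqrt(deltaNll_));  // P(chi2_1 <= 2*deltaNll)
    return std::make_unique<SimpleInterval>(lower, upper, cl);
  }

 private:
  // -ln L(s) up to a constant; n = 0 is handled separately to avoid 0*log(0).
  double nll(double s) const {
    const double mu = s + b_;
    return n_ == 0 ? mu : mu - n_ * std::log(mu);
  }
  int n_;
  double b_, deltaNll_;
};
```

Client code only sees the ConfInterval interface, so a different calculator (for example a Bayesian or toy-based one) could be substituted without changing it. For n = 10 and b = 0 the sketch gives roughly [7.16, 13.50] at about 68.3% CL.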
Interval Estimation
The library provides several complementary approaches:
- ProfileLikelihoodCalculator (returning a LikelihoodInterval) – Implements the profile likelihood ratio (PLR) method: nuisance parameters are profiled, and one‑ or multi‑dimensional confidence regions are read off the likelihood contours, using Wilks' theorem for the asymptotic distribution of the test statistic.
- FeldmanCousins – Performs a Neyman construction with the Feldman‑Cousins likelihood‑ratio ordering principle, producing intervals that respect physical boundaries and move from upper limits to two‑sided intervals without flip‑flopping; useful for low‑count Poisson problems.
- BayesianCalculator and MCMCCalculator – Compute credible intervals from the posterior distribution for arbitrary priors. BayesianCalculator marginalises nuisance parameters by numerical integration, while MCMCCalculator samples the posterior with a Metropolis‑Hastings Markov‑Chain Monte‑Carlo, which scales to high‑dimensional nuisance spaces.
- HybridCalculator – Combines frequentist pseudo‑experiment generation with Bayesian marginalisation of nuisance parameters: before each pseudo‑experiment the nuisance parameters are drawn from their prior PDFs. Intervals follow from inverting the resulting tests over a scan of the parameter of interest (e.g. with HypoTestInverter); their frequentist coverage is approximate rather than guaranteed.
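The Feldman‑Cousins ordering can be illustrated on the classic case of a Poisson count with known background. The sketch below uses standard C++ only; its function names are hypothetical and unrelated to the RooStats FeldmanCousins class, which performs the construction on a RooFit model. For each signal value μ it ranks every possible count n by the ratio P(n|μ+b)/P(n|μ_best+b), with μ_best = max(0, n − b), accepts counts in that order until the acceptance region holds at least the requested probability, and scans μ on a grid.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <utility>
#include <vector>

// Poisson probability P(n | lambda), also valid for lambda == 0.
double poissonProb(int n, double lambda) {
  if (lambda <= 0.0) return n == 0 ? 1.0 : 0.0;
  return std::exp(n * std::log(lambda) - lambda - std::lgamma(n + 1.0));
}

// Is nObs inside the Feldman-Cousins acceptance region for signal mu,
// given known background b, at confidence level cl? Counts above nMax
// are ignored (assumes mu + b is far below nMax).
bool fcAccepts(int nObs, double mu, double b, double cl, int nMax = 200) {
  std::vector<std::pair<double, int>> ranked;  // (likelihood ratio, n)
  for (int n = 0; n <= nMax; ++n) {
    const double muBest = std::max(0.0, n - b);  // best physical signal for n
    const double ratio = poissonProb(n, mu + b) / poissonProb(n, muBest + b);
    ranked.emplace_back(ratio, n);
  }
  // Likelihood-ratio ordering: accept counts with the largest ratio first.
  std::sort(ranked.begin(), ranked.end(),
            [](const std::pair<double, int>& x, const std::pair<double, int>& y) {
              return x.first > y.first;
            });
  double covered = 0.0;
  for (const auto& entry : ranked) {
    covered += poissonProb(entry.second, mu + b);  // entry is now accepted
    if (entry.second == nObs) return true;
    if (covered >= cl) break;  // acceptance region is complete
  }
  return false;
}

// The interval is the range of grid values mu whose acceptance region
// contains nObs; returns {-1, -1} if no grid point accepts nObs.
std::pair<double, double> fcInterval(int nObs, double b, double cl,
                                     double muMax = 20.0, double step = 0.001) {
  assert(nObs >= 0 && b >= 0.0 && step > 0.0);
  double lower = -1.0, upper = -1.0;
  const long nSteps = static_cast<long>(muMax / step);
  for (long i = 0; i <= nSteps; ++i) {
    const double mu = i * step;
    if (fcAccepts(nObs, mu, b, cl)) {
      if (lower < 0.0) lower = mu;
      upper = mu;
    }
  }
  return {lower, upper};
}
```

For n = 0 and b = 0 at 90% CL the scan returns an upper limit close to the 2.44 tabulated by Feldman and Cousins; the grid step limits the precision of the endpoints.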
Hypothesis Testing
Hypothesis tests are performed by calculators implementing the HypoTestCalculator interface, combined with a configurable test statistic:
- SimpleLikelihoodRatioTestStat – Test statistic for comparing two fully specified hypotheses (signal + background vs. background‑only); its sampling distribution, and hence the p‑value, is obtained from toy Monte‑Carlo.
- ProfileLikelihoodTestStat – Uses the profile likelihood ratio as the test statistic. Asymptotic p‑values based on Wilks' theorem are available through ProfileLikelihoodCalculator, and full toy‑based calibration can be used when the asymptotic regime is questionable.
- MCMCCalculator – Provides the fully Bayesian counterpart, sampling the joint posterior with MCMC; the resulting MCMCInterval summarises the posterior of the parameters of interest.
- HybridCalculator – Extends the hybrid approach to hypothesis testing, offering the CLs method and the ability to treat nuisance parameters with priors while still delivering frequentist‑style exclusion limits.
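The hybrid CLs procedure can be sketched with toys for a single counting channel. This is an illustrative standard‑C++ example, not the RooStats HybridCalculator: it assumes the event count is the test statistic and that the background has a Gaussian prior truncated at zero. The background is drawn from that prior before every pseudo‑experiment (marginalisation), and the CLs ratio is formed from the toy tail fractions.

```cpp
#include <cassert>
#include <random>

struct CLsResult {
  double clsb;  // P(n <= nObs | s + b), b drawn from its prior per toy
  double clb;   // P(n <= nObs | b), same prior on b
  double cls;   // CLs = CLs+b / CLb
};

// Hybrid frequentist-Bayesian CLs for n ~ Poisson(s + b). Fewer events are
// less signal-like, so the tail probabilities count toys with n <= nObs.
CLsResult hybridCLs(int nObs, double s, double b0, double sigmaB,
                    int nToys, unsigned seed) {
  assert(nObs >= 0 && s >= 0.0 && b0 > 0.0 && sigmaB > 0.0 && nToys > 0);
  std::mt19937_64 rng(seed);
  std::normal_distribution<double> prior(b0, sigmaB);
  auto drawB = [&] {
    double b;
    do { b = prior(rng); } while (b <= 0.0);  // truncate the prior at zero
    return b;
  };
  long passSB = 0, passB = 0;
  for (int i = 0; i < nToys; ++i) {
    std::poisson_distribution<int> signalPlusBkg(s + drawB());
    std::poisson_distribution<int> bkgOnly(drawB());
    if (signalPlusBkg(rng) <= nObs) ++passSB;
    if (bkgOnly(rng) <= nObs) ++passB;
  }
  CLsResult r;
  r.clsb = static_cast<double>(passSB) / nToys;
  r.clb = static_cast<double>(passB) / nToys;
  // Conservative choice for this sketch: no exclusion if CLb is zero.
  r.cls = passB > 0 ? r.clsb / r.clb : 1.0;
  return r;
}
```

Signal values with CLs < 0.05 are excluded at 95% CL; scanning s for the crossing point gives an upper limit, which is the job of hypothesis‑test inversion.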
Treatment of Systematics
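Nuisance parameters are RooRealVar objects collected in a RooArgSet (typically declared through a ModelConfig). Their constraint terms or priors are ordinary RooFit PDFs, so Gaussian, log‑normal, or custom shapes can be used. Whether a nuisance parameter is profiled (the likelihood is maximised over it) or marginalised (integrated over its prior) is determined by the chosen calculator: ProfileLikelihoodCalculator profiles, whereas BayesianCalculator, MCMCCalculator and HybridCalculator marginalise. Both treatments are thus available within the same framework and on the same model, facilitating rigorous propagation of systematic uncertainties.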
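The difference between profiling and marginalising can be seen in a small counting model, n ~ Poisson(s + b), where b is constrained by an auxiliary estimate b0 ± σ. This is a hypothetical standard‑C++ sketch, not RooStats code; the search range [0, b0 + 10σ] and the numerical methods (golden‑section minimisation, Simpson integration) are choices made for the illustration.

```cpp
#include <cassert>
#include <cmath>

// n ~ Poisson(s + b), with a Gaussian constraint on b from b0 +- sigma.
struct CountingModel {
  int n;
  double b0, sigma;
};

double poissonProb(int n, double lambda) {
  if (lambda <= 0.0) return n == 0 ? 1.0 : 0.0;
  return std::exp(n * std::log(lambda) - lambda - std::lgamma(n + 1.0));
}

// -ln L(s, b) up to constants: Poisson term plus Gaussian constraint.
double nll(const CountingModel& m, double s, double b) {
  const double mu = s + b;
  const double z = (b - m.b0) / m.sigma;
  return mu - m.n * std::log(mu) + 0.5 * z * z;
}

// Profiling: minimise -ln L over b for fixed s. -ln L is convex in b here,
// so golden-section search on the assumed range finds the unique minimum.
double profiledNll(const CountingModel& m, double s) {
  const double g = (std::sqrt(5.0) - 1.0) / 2.0;
  double lo = 1e-9, hi = m.b0 + 10.0 * m.sigma;
  double x1 = hi - g * (hi - lo), x2 = lo + g * (hi - lo);
  double f1 = nll(m, s, x1), f2 = nll(m, s, x2);
  for (int i = 0; i < 100; ++i) {
    if (f1 < f2) {  // minimum lies in [lo, x2]
      hi = x2; x2 = x1; f2 = f1;
      x1 = hi - g * (hi - lo); f1 = nll(m, s, x1);
    } else {        // minimum lies in [x1, hi]
      lo = x1; x1 = x2; f1 = f2;
      x2 = lo + g * (hi - lo); f2 = nll(m, s, x2);
    }
  }
  return nll(m, s, 0.5 * (lo + hi));
}

// Marginalisation: integrate Poisson(n | s + b) * prior(b) over b with
// Simpson's rule; the Gaussian prior is truncated to [0, b0 + 10 sigma] and
// left unnormalised, since a constant factor cancels when comparing s values.
double marginalLikelihood(const CountingModel& m, double s) {
  const int steps = 2000;  // even number of Simpson intervals
  const double lo = 0.0, hi = m.b0 + 10.0 * m.sigma;
  const double h = (hi - lo) / steps;
  double sum = 0.0;
  for (int i = 0; i <= steps; ++i) {
    const double b = lo + i * h;
    const double z = (b - m.b0) / m.sigma;
    const double f = poissonProb(m.n, s + b) * std::exp(-0.5 * z * z);
    const double w = (i == 0 || i == steps) ? 1.0 : (i % 2 == 1 ? 4.0 : 2.0);
    sum += w * f;
  }
  return sum * h / 3.0;
}
```

profiledNll(s) is the input to a profile likelihood ratio 2[profiledNll(s) − min over s], while marginalLikelihood(s) is proportional to the Bayesian marginal likelihood that a flat prior on s would turn into a posterior.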
Extensibility and Reproducibility
All models, data, priors, and configuration settings can be stored in a RooWorkspace, which serves as a portable container for the entire statistical setup. This promotes reproducibility across analysis groups and simplifies the combination of results from different experiments. New statistical methods can be added by implementing the IntervalCalculator or HypoTestCalculator interface, and new test statistics by implementing the TestStatistic interface, without altering existing code.
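Adding a test statistic without touching existing code follows the usual pattern of an abstract interface plus polymorphic client code. The standard‑C++ sketch below uses a hypothetical TestStatistic base class; in RooStats the corresponding interface evaluates a RooAbsData dataset for given parameter values, with a different signature. NumEventsStat and SimpleLikelihoodRatioStat are illustrative implementations, and the sign convention of the likelihood ratio was chosen for this example.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical pluggable test-statistic interface (not the RooStats class):
// the "data" are observed event counts, one per channel.
class TestStatistic {
 public:
  virtual ~TestStatistic() = default;
  virtual double evaluate(const std::vector<int>& counts) const = 0;
  virtual std::string name() const = 0;
};

// Total number of observed events across channels.
class NumEventsStat : public TestStatistic {
 public:
  double evaluate(const std::vector<int>& counts) const override {
    double total = 0.0;
    for (int c : counts) total += c;
    return total;
  }
  std::string name() const override { return "NumEvents"; }
};

// -2 ln[L(s+b) / L(b)] for two simple hypotheses with known per-channel
// signal s_i and background b_i: 2 * sum_i [ s_i - n_i * ln(1 + s_i / b_i) ].
class SimpleLikelihoodRatioStat : public TestStatistic {
 public:
  SimpleLikelihoodRatioStat(std::vector<double> s, std::vector<double> b)
      : s_(std::move(s)), b_(std::move(b)) {
    assert(s_.size() == b_.size());
    for (double bi : b_) assert(bi > 0.0);
  }
  double evaluate(const std::vector<int>& counts) const override {
    assert(counts.size() == s_.size());
    double t = 0.0;
    for (std::size_t i = 0; i < counts.size(); ++i)
      t += s_[i] - counts[i] * std::log1p(s_[i] / b_[i]);
    return 2.0 * t;
  }
  std::string name() const override { return "SimpleLikelihoodRatio"; }

 private:
  std::vector<double> s_, b_;
};

// Generic client code: works with any TestStatistic, including ones added later.
std::vector<double> evaluateAll(
    const std::vector<std::unique_ptr<TestStatistic>>& stats,
    const std::vector<int>& counts) {
  std::vector<double> values;
  for (const auto& stat : stats) values.push_back(stat->evaluate(counts));
  return values;
}
```

A new statistic only needs another subclass; evaluateAll, and any toy-sampling code written against the base class, stays unchanged.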
Performance and Parallelisation
Implemented in native C++, RooStats benefits from ROOT's optimized numerical libraries, such as the Minuit minimisers used for likelihood fits. Computationally intensive tasks, in particular toy Monte‑Carlo generation for frequentist and hybrid calculations, can be distributed over multiple cores or machines; for example, ToyMCSampler can run its pseudo‑experiments through PROOF when configured with a ProofConfig object. This makes analyses with many parameters and large toy ensembles feasible on workstation clusters, although run times depend strongly on model complexity and the number of toys.
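Toy generation parallelises naturally because pseudo‑experiments are independent. The standard‑C++ sketch below is not the RooStats mechanism (which can distribute toys through PROOF); it splits background‑only toys into batches run on std::thread workers and estimates the p‑value P(n ≥ nObs | b). Giving each batch its own seed is an assumption of this sketch that makes the result independent of thread scheduling.

```cpp
#include <cassert>
#include <cstdint>
#include <random>
#include <thread>
#include <vector>

// Counts background-only toys with n >= nObs in one independently seeded batch.
long countTail(int nObs, double b, long nToys, std::uint64_t seed) {
  std::mt19937_64 rng(seed);
  std::poisson_distribution<int> pois(b);
  long count = 0;
  for (long i = 0; i < nToys; ++i)
    if (pois(rng) >= nObs) ++count;
  return count;
}

// p-value P(n >= nObs | b) from nBatches * toysPerBatch toys, one thread per
// batch. Each thread writes only its own slot, so no locking is needed.
double parallelPValue(int nObs, double b, long toysPerBatch, int nBatches,
                      std::uint64_t baseSeed) {
  assert(b > 0.0 && toysPerBatch > 0 && nBatches > 0);
  std::vector<long> counts(nBatches, 0);
  std::vector<std::thread> workers;
  workers.reserve(nBatches);
  for (int k = 0; k < nBatches; ++k)
    workers.emplace_back([&, k] {
      counts[k] = countTail(nObs, b, toysPerBatch, baseSeed + k);
    });
  for (auto& w : workers) w.join();
  long total = 0;
  for (long c : counts) total += c;
  return static_cast<double>(total) /
         (static_cast<double>(toysPerBatch) * nBatches);
}
```

With GCC or Clang, compile with -pthread. For nObs = 5 and b = 3 the exact value is about 0.185, which the estimate should reproduce within its statistical precision.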
Impact
By unifying a wide spectrum of statistical techniques—frequentist, Bayesian, and hybrid—under a single, well‑documented API, RooStats empowers LHC physicists to focus on physics modelling while ensuring that the statistical inference is performed consistently, transparently, and reproducibly. The project thus represents a critical infrastructure component for the discovery and precision measurement program of the LHC and future high‑energy physics experiments.