Constraints on Yield Parameters in Extended Maximum Likelihood Fits

The method of extended maximum likelihood is a well-known approach to parameter estimation. External knowledge on the unknown parameters can be incorporated by multiplying the likelihood by constraint terms. In this note, we emphasize that this is also true for yield parameters in an extended maximum likelihood fit, a method widely used in the particle physics community. We recommend a way to generate pseudo-experiments in the presence of constraint terms on yield parameters, and point out pitfalls in the RooFit framework.

💡 Research Summary

The paper addresses a subtle but important point in the use of extended maximum‑likelihood (EML) fits, which are a staple in particle‑physics data analysis. While it is common practice to incorporate external knowledge about shape parameters (such as means, widths, or efficiencies) by multiplying the likelihood with constraint terms, the authors emphasize that the same technique can and should be applied to yield (event‑count) parameters. They demonstrate mathematically that adding a Gaussian (or any appropriate) constraint on a yield does not break the statistical foundations of the EML formalism; instead, the total likelihood becomes

L_total(θ, ν) = L_EML(θ, ν) × C(ν)

where L_EML is the standard extended likelihood, ν is the total expected number of events, and C(ν) is a constraint term, typically a normal distribution with mean μ_ν and standard deviation σ_ν that encodes prior knowledge (e.g., an independent measurement of a branching fraction). By taking the logarithm, one sees that the yield estimator is a weighted combination of the Poisson‑derived count and the external measurement, with weights set by the relative uncertainties. Consequently, the estimator automatically interpolates between the data‑driven value (when the data are abundant) and the prior value (when the data are scarce).
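As a worked illustration of this interpolation, consider the simplest case, which is assumed here for clarity and is not taken from the paper: a pure counting experiment with observed count N, no shape information, and a Gaussian constraint on ν. The symbols N and \hat{\nu} are introduced only for this sketch.

```latex
% Counting-only sketch: Poisson term for N observed events times a Gaussian constraint on \nu.
\begin{align}
-\ln L_{\mathrm{total}}(\nu) &= \nu - N \ln \nu + \frac{(\nu - \mu_\nu)^2}{2\sigma_\nu^2} + \mathrm{const} \\
\frac{\partial (-\ln L_{\mathrm{total}})}{\partial \nu} &= 1 - \frac{N}{\nu} + \frac{\nu - \mu_\nu}{\sigma_\nu^2} = 0
\;\Longrightarrow\; \nu^2 + (\sigma_\nu^2 - \mu_\nu)\,\nu - N \sigma_\nu^2 = 0 \\
\hat{\nu} &= \frac{1}{2}\left[ (\mu_\nu - \sigma_\nu^2) + \sqrt{(\mu_\nu - \sigma_\nu^2)^2 + 4 N \sigma_\nu^2} \right]
\end{align}
```

In this simplified setting the two limits behave as described: for σ_ν → ∞ the estimate tends to N (the data alone decide), and for σ_ν → 0 it tends to μ_ν (the external measurement alone decides).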

A major contribution of the paper is a clear prescription for generating pseudo‑experiments (toy MC) in the presence of such constraints. Two approaches are discussed:

1. Data‑level generation – draw the observed number of events from a Poisson distribution with the true expected yield, then generate each event’s kinematic variables from the model PDF, and finally apply the constraint during the fit.
2. Parameter‑level generation – first draw a constrained yield value from the external Gaussian, then use that value as the Poisson mean for the event‑count generation.

The authors argue that the second method preserves the correct statistical relationship between the constraint and the Poisson fluctuation, avoids double‑counting of uncertainties, and is easier to implement when multiple constrained yields are present.
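The difference between the two generation schemes can be made concrete with a minimal sketch that only generates event counts (no kinematic variables, no fit). It is not the authors' code: the function names, the seed, the example numbers (yield 50, constraint width 10), and the truncation of negative Gaussian draws at zero are all assumptions made for illustration.

```python
import math
import random
import statistics


def sample_poisson(mean, rng):
    """Draw one Poisson-distributed integer (Knuth's method, fine for small means)."""
    threshold = math.exp(-mean)
    count, product = 0, 1.0
    while True:
        product *= rng.random()
        if product <= threshold:
            return count
        count += 1


def toy_counts_data_level(true_yield, n_toys, rng):
    """Data-level generation: the Poisson mean is fixed to the true expected yield."""
    return [sample_poisson(true_yield, rng) for _ in range(n_toys)]


def toy_counts_parameter_level(mu, sigma, n_toys, rng):
    """Parameter-level generation: draw the yield from the Gaussian constraint first,
    then use that value as the Poisson mean."""
    counts = []
    for _ in range(n_toys):
        # Assumption: negative draws are truncated at zero so the Poisson mean stays valid.
        poisson_mean = max(0.0, rng.gauss(mu, sigma))
        counts.append(sample_poisson(poisson_mean, rng))
    return counts


rng = random.Random(12345)  # fixed seed, chosen arbitrarily for reproducibility
data_level = toy_counts_data_level(50.0, 5000, rng)
param_level = toy_counts_parameter_level(50.0, 10.0, 5000, rng)

for label, counts in (("data-level", data_level), ("parameter-level", param_level)):
    print(label, statistics.mean(counts), statistics.pvariance(counts))
```

Under these assumptions both schemes give counts centred near 50, but the parameter-level counts spread more widely (variance close to ν + σ_ν² instead of ν), because each toy carries the constraint's uncertainty in addition to the Poisson fluctuation.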

The paper also highlights practical pitfalls that arise when using the RooFit toolkit, a widely adopted framework for likelihood modelling. A common mistake is to attach a Gaussian constraint term (in RooFit typically a RooGaussian) directly to a yield variable without explicitly coupling it to the Poisson term. RooFit then treats the constraint as an independent additive term, leading to an underestimation of the total variance and biased yield estimates. The authors provide a recipe: define the yield as a RooRealVar, construct the Poisson term with RooPoisson, build the shape PDF(s) with RooAddPdf, and finally combine everything in a RooProdPdf that includes the Gaussian constraint. During the fit, the “ExternalConstraints” option must be enabled so that RooFit correctly propagates the constraint’s contribution to the covariance matrix.

A realistic physics example is presented: fitting a mass spectrum that contains overlapping signal and background components. An external measurement of the signal branching ratio (μ_ν ± σ_ν) is introduced as a Gaussian constraint on the signal yield. The constrained fit yields a signal count that lies between the pure Poisson estimate and the external value, with an uncertainty that reflects both sources of information. The authors show that the resulting confidence intervals are neither overly conservative (as would happen if the constraint were ignored) nor overly optimistic (as would happen if the constraint were applied incorrectly).
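The qualitative behaviour described here can be reproduced with a toy calculation. The sketch below is a simplification and not the paper's example: it drops the mass-spectrum shapes and fits only a single yield to an observed count with a Gaussian constraint. The numbers (30 observed events, external value 40 ± 5), the function names, and the choice of a ternary search with a finite-difference curvature are assumptions made for illustration.

```python
import math


def nll(nu, n_obs, mu, sigma):
    """Negative log-likelihood: Poisson term plus Gaussian constraint (constants dropped)."""
    return nu - n_obs * math.log(nu) + 0.5 * ((nu - mu) / sigma) ** 2


def fit_yield(n_obs, mu, sigma, iterations=200):
    """Minimise the NLL (convex in nu for nu > 0) by ternary search.

    Returns the estimate and an uncertainty taken from the curvature at the minimum.
    """
    lo = 1e-6
    hi = max(n_obs, mu) + 10.0 * sigma + 10.0  # generous upper bound (assumption)
    for _ in range(iterations):
        a = lo + (hi - lo) / 3.0
        b = hi - (hi - lo) / 3.0
        if nll(a, n_obs, mu, sigma) < nll(b, n_obs, mu, sigma):
            hi = b
        else:
            lo = a
    nu_hat = 0.5 * (lo + hi)

    # Second derivative by central finite differences; 1/sqrt(curvature) is the
    # usual parabolic uncertainty estimate.
    h = 1e-3 * max(1.0, nu_hat)
    curvature = (
        nll(nu_hat + h, n_obs, mu, sigma)
        - 2.0 * nll(nu_hat, n_obs, mu, sigma)
        + nll(nu_hat - h, n_obs, mu, sigma)
    ) / h**2
    return nu_hat, 1.0 / math.sqrt(curvature)


nu_hat, nu_err = fit_yield(n_obs=30, mu=40.0, sigma=5.0)
print(f"constrained yield: {nu_hat:.2f} +/- {nu_err:.2f}")
print(f"data only:         30 +/- {math.sqrt(30):.2f}")
print("external only:     40 +/- 5.00")
```

With these illustrative inputs the constrained estimate comes out at about 35.9 ± 4.0, between the observed count and the external value, and its uncertainty is smaller than either the Poisson uncertainty or the constraint width alone.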

In summary, the paper makes three key points:

1. Yield parameters in an extended maximum‑likelihood fit can be constrained in exactly the same way as shape parameters, without violating the statistical integrity of the method.
2. When performing toy‑MC studies, the constrained yield should be sampled first (parameter‑level generation) and then used as the Poisson mean; this ensures that the pseudo‑experiments faithfully reproduce the combined effect of data fluctuations and external knowledge.
3. Within RooFit, one must explicitly couple the Gaussian constraint to the Poisson term; otherwise the framework will mishandle the uncertainty propagation. The authors provide concrete code snippets and a checklist to avoid this common error.

By integrating external yield information correctly, analysts can reduce bias, improve the precision of parameter estimates, and obtain more reliable uncertainty assessments, especially in analyses where signal yields are small or where multiple channels share common systematic constraints. The paper thus offers both a theoretical justification and a practical guide for anyone performing likelihood fits in high‑energy physics.

📜 Original Paper Content