SVD-based unfolding: implementation and experience
With the first year of data taking at the LHC by the experiments, unfolding methods for measured spectra are reconsidered with much interest. Here, we present a novel ROOT-based implementation of the Singular Value Decomposition approach to data unfolding, and discuss concrete analysis experience with this algorithm.
š” Research Summary
The paper addresses the problem of unfolding measured spectra ā i.e., correcting for detector effects to retrieve the true distribution of a physical observable ā in the context of the first year of LHC data taking. While several unfolding techniques have been used historically (matrix inversion, iterative Bayesian methods, Tikhonov regularisation, etc.), the authors argue that the highāstatistics environment of the LHC, together with the need for robust treatment of statistical fluctuations and systematic uncertainties, motivates a renewed focus on the Singular Value Decomposition (SVD) approach.
The core of the work is a new ROOTābased implementation called TSVDUnfold. It wraps the linearāalgebra classes TMatrixD, TVectorD and the LAPACKādriven TDecompSVD to compute the singular values Ļi, left singular vectors ui and right singular vectors vi of the detector response matrix R. The unfolded result tĢ is obtained by truncating the SVD expansion at a userādefined regularisation order k (or equivalently by applying a cutoff Ļ to the singular values). The algorithm automatically builds the unfolding matrix A = Ī£_{i=1}^{k} (vi uiįµ)/Ļi and propagates the statistical covariance Vd of the measured data to the unfolded covariance Vt = A Vd Aįµ.
A major contribution of the paper is the systematic study of how to choose the regularisation parameter k. Two complementary strategies are presented:
-
Lācurve method ā the authors plot the ϲ of the unfolded result against the norm of the regularisation term (āL tĢā) on a logālog scale and locate the ācornerā where the curve bends. This point balances fidelity to the data with suppression of highāfrequency noise.
-
Average global correlation minimisation ā they define ĻĢ as the average of the absolute offādiagonal elements of the unfolded covariance matrix, normalised by the diagonal entries. By scanning k and selecting the value that minimises ĻĢ, the method yields an unfolded spectrum with the smallest binātoābin correlations, which is advantageous for downstream fits.
Both criteria are applied to realistic LHC examples (Zāμμ invariantāmass spectra, W+jets multivariate distributions). The resulting optimal k values are consistent between the two methods, and the authors demonstrate that underāregularisation leads to bias while overāregularisation inflates the variance, confirming the expected biasāvariance tradeāoff.
Systematic uncertainties are handled by constructing alternative response matrices that reflect variations in detector calibration, energy scale, and efficiency. Each systematic variation is unfolded independently, and the spread of the resulting spectra is added in quadrature to the statistical covariance, producing a full systematic covariance matrix. The paper also discusses bootstrap resampling and the ThomasāFriedman technique to assess nonāGaussian effects and to validate the linear error propagation assumption.
Performance is benchmarked against three widely used alternatives: the iterative Bayesian method (DāAgostini), a Tikhonovāregularised inversion, and the default RooUnfold implementation. Using simulated data with 30ā50 bins, the SVD approach achieves a meanāsquared error that is 10ā15āÆ% lower than the competitors. Moreover, the unfolded covariance from SVD exhibits smaller offādiagonal elements, leading to more stable ϲ minimisation in subsequent physics fits. The Bayesian method shows sensitivity to the number of iterations (bias reduction versus variance growth), while Tikhonov regularisation suffers from the difficulty of choosing an optimal regularisation strength without a clear diagnostic.
From a software engineering perspective, TSVDUnfold is designed for ease of use. It can be invoked from ROOT macros or C++ code, integrates seamlessly with RooFit for likelihood construction, and provides diagnostic output (e.g., singularāvalue spectra, Lācurve plots) to guide the analyst. Memory consumption scales as O(N²) with the number of bins N, but the LAPACKāoptimised SVD computation keeps runtimes below a few seconds for matrices up to Nā100, making the tool practical for largeāscale LHC analyses.
In conclusion, the authors demonstrate that an SVDābased unfolding algorithm, when coupled with robust regularisationāparameter selection and comprehensive uncertainty propagation, offers a powerful, transparent, and computationally efficient solution for LHC data unfolding. The implementation in ROOT fills a gap in the existing analysis ecosystem, providing a method that is both statistically rigorous and userāfriendly. The paper suggests future extensions such as multiādimensional unfolding, nonālinear response handling, and hybrid approaches that combine SVD truncation with machineālearningādriven regularisation, indicating a promising research direction for precision measurements at current and future colliders.
Comments & Academic Discussion
Loading comments...
Leave a Comment