Improving Correlation Function Fitting with Ridge Regression: Application to Cross-Correlation Reconstruction
Cross-correlation techniques provide a promising avenue for calibrating photometric redshifts and determining redshift distributions using spectroscopy which is systematically incomplete (e.g., current deep spectroscopic surveys fail to obtain secure redshifts for 30-50% or more of the galaxies targeted). In this paper we improve on the redshift distribution reconstruction methods presented in Matthews & Newman (2010) by incorporating full covariance information into our correlation function fits. Correlation function measurements are strongly covariant between angular or spatial bins, and accounting for this in fitting can yield substantial reduction in errors. However, frequently the covariance matrices used in these calculations are determined from a relatively small set (dozens rather than hundreds) of subsamples or mock catalogs, resulting in noisy covariance matrices whose inversion is ill-conditioned and numerically unstable. We present here a method of conditioning the covariance matrix known as ridge regression which results in a more well behaved inversion than other techniques common in large-scale structure studies. We demonstrate that ridge regression significantly improves the determination of correlation function parameters. We then apply these improved techniques to the problem of reconstructing redshift distributions. By incorporating full covariance information, applying ridge regression, and changing the weighting of fields in obtaining average correlation functions, we obtain reductions in the mean redshift distribution reconstruction error of as much as ~40% compared to previous methods. In an appendix, we provide a description of POWERFIT, an IDL code for performing power-law fits to correlation functions with ridge regression conditioning that we are making publicly available.
💡 Research Summary
This paper presents a substantial methodological improvement for reconstructing the redshift distribution of photometric galaxy samples using cross‑correlation techniques. The authors build upon the framework introduced by Matthews & Newman (2010), which demonstrated that the angular cross‑correlation between a spectroscopic reference sample and a photometric sample can be used to infer the true redshift distribution φₚ(z) of the latter. While the original work assumed that correlation‑function measurements in different angular or spatial bins were statistically independent, later theoretical work and simulations have shown that these bins are in fact strongly covariant. Ignoring this covariance leads to sub‑optimal parameter estimates and inflated uncertainties.
To address this, the authors first construct realistic mock catalogs based on the Millennium Simulation. They generate 24 DEEP2‑like light‑cones, each covering 0.5° × 2.0° and spanning 0.1 < z < 1.5. From each cone they draw a spectroscopic sample (60 % of galaxies with R < 24, mimicking DEEP2’s selection) and a photometric sample selected with a Gaussian probability in redshift (mean z = 0.75, σ_z = 0.20). The resulting data set contains roughly 35 k spectroscopic objects and 44 k photometric objects per cone.
The central statistical challenge is that the covariance matrices of the measured correlation functions are estimated from a relatively small number of mock realizations or jack‑knife regions. Such matrices are noisy, and their inversion can become unstable, producing wildly varying best‑fit parameters. The authors therefore introduce ridge regression (also known as Tikhonov regularization) as a conditioning technique. By adding a small multiple λ of the identity matrix to the estimated covariance (C′ = C + λI), they obtain a well‑behaved inverse. The optimal λ is determined through a risk analysis that balances bias against variance; the minimum risk occurs for λ ≈ 10⁻³, where the mean squared error of the fitted parameters is reduced by roughly 30 % relative to the unregularized case.
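The conditioning step C′ = C + λI can be illustrated with a minimal Python sketch. This is not the authors' code: the function name `ridge_condition`, the toy "true" covariance, and the bin and mock counts are assumptions chosen only to mimic a covariance matrix estimated from a few dozen realizations. The appropriate scale of λ depends on the units and amplitude of the covariance, so the value used here is purely illustrative.

```python
import numpy as np

def ridge_condition(cov, lam):
    """Return the ridge-conditioned matrix C' = C + lam * I."""
    cov = np.asarray(cov, dtype=float)
    return cov + lam * np.eye(cov.shape[0])

# Toy setup (hypothetical numbers): 10 bins, covariance estimated from 24 mocks.
rng = np.random.default_rng(0)
n_bins, n_mocks = 10, 24

# Assumed "true" covariance with strong bin-to-bin correlation.
idx = np.arange(n_bins)
true_cov = 0.01 * 0.9 ** np.abs(idx[:, None] - idx[None, :])

# Noisy sample covariance from a small number of realizations.
samples = rng.multivariate_normal(np.zeros(n_bins), true_cov, size=n_mocks)
noisy_cov = np.cov(samples, rowvar=False)

lam = 1e-3  # illustrative value only
conditioned = ridge_condition(noisy_cov, lam)

# Every eigenvalue is shifted up by lam, so the condition number shrinks.
cond_before = np.linalg.cond(noisy_cov)
cond_after = np.linalg.cond(conditioned)
print(f"condition number: {cond_before:.1f} -> {cond_after:.1f}")
```

Because adding λI shifts every eigenvalue by the same amount, the smallest (noisiest) eigenvalues are affected most in relative terms, which is what stabilizes the inverse at the cost of a small bias.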
With a stable inverse in hand, the authors perform simultaneous power‑law fits to three correlation functions: the spectroscopic autocorrelation ξ_ss(r, z), the photometric angular autocorrelation w_pp(θ), and the angular cross‑correlation w_sp(θ, z). All three are modeled as power laws with an additive integral‑constraint term (w = A θ^{1‑γ} − C). The spectroscopic autocorrelation is measured in redshift slices, projected along the line of sight to obtain w_p(r_p), and fitted to extract r₀,ss(z) and γ_ss(z). The photometric autocorrelation yields A_pp and γ_pp. For the cross‑correlation, the authors fix γ_sp to the average of γ_ss and γ_pp (γ_sp = (γ_ss + γ_pp)/2) to break the strong degeneracy between amplitude and slope, then fit for A_sp and the integral constraint C_sp in each redshift bin. Importantly, they smooth the C_sp(z) curve with a Gaussian model, which dramatically improves the stability of the reconstruction.
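When the slope γ is held fixed, as described for the cross-correlation, the model w = A θ^{1‑γ} − C is linear in A and C, so a covariance-weighted fit reduces to generalized least squares. The sketch below shows that special case under stated assumptions: the function name `fit_amplitude_and_constraint`, the optional `lam` argument, and all numerical values are hypothetical and are not taken from the paper or from POWERFIT.

```python
import numpy as np

def fit_amplitude_and_constraint(theta, w, cov, gamma, lam=0.0):
    """Fit w = A * theta**(1 - gamma) - C for (A, C) with gamma fixed.

    Minimizes chi^2 = (w - X p)^T C'^-1 (w - X p), where C' = cov + lam * I
    and the columns of X are theta**(1 - gamma) and -1.
    """
    theta = np.asarray(theta, dtype=float)
    w = np.asarray(w, dtype=float)
    cov_c = np.asarray(cov, dtype=float) + lam * np.eye(theta.size)

    design = np.column_stack([theta ** (1.0 - gamma), -np.ones_like(theta)])
    cinv_x = np.linalg.solve(cov_c, design)   # C'^-1 X
    cinv_w = np.linalg.solve(cov_c, w)        # C'^-1 w
    normal = design.T @ cinv_x                # X^T C'^-1 X
    params = np.linalg.solve(normal, design.T @ cinv_w)
    # Formal parameter covariance; only approximate when lam > 0 or cov is noisy.
    param_cov = np.linalg.inv(normal)
    return params, param_cov

# Noiseless toy data (hypothetical values) to check the fit recovers its inputs.
theta = np.logspace(-3, -1, 8)                # angular bins, arbitrary units
gamma_fixed, a_true, c_true = 1.8, 2e-3, 1e-2
w_model = a_true * theta ** (1.0 - gamma_fixed) - c_true

idx = np.arange(theta.size)
cov = 1e-4 * 0.8 ** np.abs(idx[:, None] - idx[None, :])  # correlated bins

params, param_cov = fit_amplitude_and_constraint(theta, w_model, cov, gamma_fixed)
a_fit, c_fit = params
print(a_fit, c_fit)
```

Fitting γ as well makes the problem nonlinear and would need an iterative minimizer; the linear case is shown here because it isolates where the inverse covariance enters the fit.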
A further refinement concerns how the average correlation functions are computed across the 24 light‑cones. The original method summed pair counts over all fields before applying the Landy‑Szalay estimator, which unintentionally gave higher weight to overdense fields. The new approach calculates the correlation function in each field separately and then takes an unweighted average, thereby avoiding bias from field‑to‑field density fluctuations.
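The difference between the two averaging schemes can be seen in a small Python sketch. It uses the standard Landy‑Szalay autocorrelation form (DD − 2DR + RR)/RR with normalized pair counts; the function names, the way pair-count normalizations are pooled, and the two-field toy numbers are assumptions made for illustration, not the authors' implementation.

```python
import numpy as np

def pair_norms(n_d, n_r):
    """Total possible DD, DR and RR pairs for n_d data and n_r random points."""
    n_d = np.asarray(n_d, dtype=float)
    n_r = np.asarray(n_r, dtype=float)
    return n_d * (n_d - 1) / 2, n_d * n_r, n_r * (n_r - 1) / 2

def landy_szalay(dd, dr, rr, norm_dd, norm_dr, norm_rr):
    """Landy-Szalay estimator from raw pair counts and their normalizations."""
    dd_n, dr_n, rr_n = dd / norm_dd, dr / norm_dr, rr / norm_rr
    return (dd_n - 2 * dr_n + rr_n) / rr_n

def pooled_estimate(dd, dr, rr, n_d, n_r):
    """Sum pair counts over fields first, then apply the estimator once."""
    norm_dd, norm_dr, norm_rr = pair_norms(n_d, n_r)
    return landy_szalay(dd.sum(axis=0), dr.sum(axis=0), rr.sum(axis=0),
                        norm_dd.sum(), norm_dr.sum(), norm_rr.sum())

def per_field_average(dd, dr, rr, n_d, n_r):
    """Apply the estimator in each field, then take an unweighted mean."""
    norm_dd, norm_dr, norm_rr = pair_norms(n_d, n_r)
    w_fields = landy_szalay(dd, dr, rr,
                            norm_dd[:, None], norm_dr[:, None], norm_rr[:, None])
    return w_fields.mean(axis=0)

# Two hypothetical fields, one angular bin; field 2 holds twice as many galaxies.
n_d = np.array([10, 20])
n_r = np.array([100, 100])
dd = np.array([[9.0], [19.0]])     # raw data-data pair counts per field
dr = np.array([[100.0], [200.0]])  # raw data-random pair counts per field
rr = np.array([[495.0], [495.0]])  # raw random-random pair counts per field

w_pooled = pooled_estimate(dd, dr, rr, n_d, n_r)
w_mean = per_field_average(dd, dr, rr, n_d, n_r)
print(w_pooled, w_mean)
```

In this toy case the individual fields give w = 1 and w = 0, so the unweighted mean is 0.5, while the pooled estimate is about 0.19 because the denser field contributes more pairs and therefore dominates.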
The reconstruction proceeds iteratively. Starting from an initial guess that the photometric bias evolves like the spectroscopic bias (r₀,pp(z) ∝ r₀,ss(z)), the authors use Limber’s equation to update r₀,pp, recompute r₀,sp, and then solve for φₚ(z) from the measured w_sp using the analytic relation derived by Newman (2008). This loop is repeated until convergence.
When applied to the mock data, the combination of covariance‑aware fitting, ridge regression, and the revised field weighting reduces the mean error on φₚ(z) by as much as ~40 % compared with the original Matthews & Newman (2010) pipeline. The improvement is most pronounced in the estimation of the mean redshift ⟨z⟩ and the width σ_z of the distribution, which are critical for weak‑lensing cosmology.
To facilitate adoption, the authors release POWERFIT, an IDL package that implements power‑law fitting with full covariance handling and optional ridge regression or singular‑value trimming. Users can supply their own covariance matrices, choose the regularization strength, and obtain best‑fit parameters together with robust error estimates.
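The two conditioning options mentioned here, ridge regression and singular‑value trimming, can be contrasted in a short Python sketch. It does not reproduce POWERFIT's IDL interface; the function names, the relative threshold, and the diagonal toy matrix are hypothetical. For a symmetric positive semi-definite covariance matrix the singular values coincide with the eigenvalues, so the sketch uses an eigendecomposition.

```python
import numpy as np

def svd_trimmed_inverse(cov, rel_threshold):
    """Pseudo-inverse that discards modes with eigenvalue <= rel_threshold * max."""
    cov = np.asarray(cov, dtype=float)
    eigvals, eigvecs = np.linalg.eigh(cov)
    keep = eigvals > rel_threshold * eigvals.max()
    inv_vals = np.zeros_like(eigvals)
    inv_vals[keep] = 1.0 / eigvals[keep]   # trimmed modes contribute nothing
    return (eigvecs * inv_vals) @ eigvecs.T

def ridge_inverse(cov, lam):
    """Inverse of the ridge-conditioned matrix C + lam * I."""
    cov = np.asarray(cov, dtype=float)
    return np.linalg.inv(cov + lam * np.eye(cov.shape[0]))

# Hypothetical covariance with one nearly degenerate mode.
toy_cov = np.diag([1.0, 1e-2, 1e-8])

inv_trim = svd_trimmed_inverse(toy_cov, rel_threshold=1e-4)
inv_ridge = ridge_inverse(toy_cov, lam=1e-3)

print(np.diag(inv_trim))    # the smallest mode is removed entirely
print(np.diag(inv_ridge))   # every mode is kept, but smoothly down-weighted
```

Trimming makes a hard cut, so the result can change abruptly as the threshold crosses an eigenvalue, whereas ridge conditioning down-weights noisy modes continuously.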
In summary, this work demonstrates that proper treatment of covariance, especially when the covariance matrix is noisy, combined with ridge regression conditioning yields significantly more accurate cross‑correlation measurements and, consequently, more reliable photometric redshift distribution reconstructions. The methodology is directly applicable to upcoming large‑scale surveys such as LSST, Euclid, and WFIRST, where precise knowledge of φₚ(z) will be essential for dark‑energy studies.