Simulation-based cosmological inference from optically-selected galaxy clusters with $\texttt{Capish}$
Galaxy clusters are powerful probes of the growth of cosmic structure through measurements of their abundance as a function of mass and redshift. Extracting precise cosmological constraints from cluster surveys is challenging, as we must contend with the complex relationship between richness and the underlying halo mass, selection function biases, super-sample covariance, and correlated measurement noise between mass proxies. As upcoming photometric surveys are expected to detect tens to hundreds of thousands of galaxy clusters, controlling these systematics becomes essential. In this paper, we present a forward-modelling approach using Simulation-Based Inference (SBI), which provides a natural framework for jointly modelling cluster abundance and lensing mass observables while capturing systematic uncertainties at higher fidelity than analytic likelihood methods (which rely on simplifying assumptions such as fixed covariances and Gaussianity), without requiring an explicit likelihood formulation. We introduce $\texttt{Capish}$, a Python code for generating forward-modelled galaxy cluster catalogues using halo mass functions and incorporating observational effects. We perform SBI using neural density estimation with normalizing flows, trained on abundance and mean lensing mass measurements in observed redshift-richness bins. Our forward model accounts for realistic noise, redshift uncertainties, selection functions, and correlated scatter between lensing mass and observed richness. We find good agreement with likelihood-based analyses, with broader SBI posteriors reflecting the increased realism of the forward model. We also test $\texttt{Capish}$ on cluster catalogues built from a large cosmological simulation, finding a good fit to cosmological parameters.
💡 Research Summary
This paper introduces Capish (Cluster abundance posterior inference from simulated halos), a Python framework that enables simulation‑based cosmological inference from optically‑selected galaxy cluster surveys. The authors argue that traditional analytic likelihood approaches, which assume fixed Gaussian covariances and neglect many observational systematics, become inadequate for the next generation of surveys (LSST, Euclid) that will detect tens to hundreds of thousands of clusters. To overcome these limitations they adopt Simulation‑Based Inference (SBI), also known as likelihood‑free inference, which relies on forward‑modeling the data and learning the mapping from simulated observables to underlying parameters with neural density estimators.
Capish implements three main stages.
- Halo catalogue generation – Using a state‑of‑the‑art halo mass function (e.g., Tinker et al.) and halo bias, the code draws halo masses and true redshifts over a fine mass–redshift grid. Poisson sampling provides the stochastic count, while Super‑Sample Covariance (SSC) is incorporated as a log‑normal multiplicative factor computed with the PySSC library. This reproduces both the mean halo abundance and its variance, including the extra covariance induced by large‑scale density fluctuations.
- Mapping halo properties to observables – Each halo is assigned an observed richness (λ_obs) and a weak‑lensing mass (log m_WL) via a multivariate Gaussian model. The mean richness follows a power‑law scaling with mass and redshift, and its scatter combines an intrinsic component with a Poisson term that accounts for the count‑in‑cell nature of richness. The lensing mass shares the same mass‑redshift dependence and is correlated with richness through a user‑specified correlation coefficient ρ. Additional observational effects—photometric‑z errors, miscentering, selection‑function biases—are parameterised and injected at this stage.
- Summary statistic computation – For a set of redshift–richness bins the code extracts two summary statistics: the number of clusters in each bin and the mean lensing mass of those clusters. These statistics are the inputs for the SBI pipeline.
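The three stages above can be sketched end to end in a few dozen lines of NumPy. This is a toy illustration, not the Capish implementation: the SSC term is reduced to a single log-normal factor (standing in for the PySSC computation), the mass function is replaced by a uniform draw over a toy log-mass range, and all scaling-relation amplitudes, slopes, and bin edges are illustrative numbers.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_halos(n_expected, ssc_sigma=0.1):
    """Stage 1: stochastic halo count.

    A log-normal multiplicative factor mimics super-sample covariance
    (a stand-in for the PySSC computation), followed by Poisson
    sampling of the modulated mean abundance.
    """
    ssc_factor = rng.lognormal(mean=-0.5 * ssc_sigma**2, sigma=ssc_sigma)
    n = rng.poisson(n_expected * ssc_factor)
    log_m = rng.uniform(14.0, 15.0, size=n)  # toy log-mass range
    z = rng.uniform(0.2, 0.8, size=n)        # toy redshift range
    return log_m, z

def assign_observables(log_m, z, rho=0.5, sigma_lnlam=0.25, sigma_mwl=0.1):
    """Stage 2: map each halo to (observed richness, log lensing mass)
    via a bivariate Gaussian with correlation coefficient rho."""
    mean_lnlam = np.log(30.0) + 1.0 * (log_m - 14.5)  # toy power-law scaling
    mean_mwl = log_m                                   # unbiased lensing mass
    cov = np.array([[sigma_lnlam**2, rho * sigma_lnlam * sigma_mwl],
                    [rho * sigma_lnlam * sigma_mwl, sigma_mwl**2]])
    noise = rng.multivariate_normal([0.0, 0.0], cov, size=log_m.size)
    lam_obs = np.exp(mean_lnlam + noise[:, 0])
    log_mwl = mean_mwl + noise[:, 1]
    return lam_obs, log_mwl

def summarize(lam_obs, log_mwl, z, lam_edges, z_edges):
    """Stage 3: counts and mean lensing mass in redshift-richness bins."""
    counts = np.zeros((len(z_edges) - 1, len(lam_edges) - 1))
    mean_mass = np.full_like(counts, np.nan)
    for i in range(len(z_edges) - 1):
        for j in range(len(lam_edges) - 1):
            sel = ((z >= z_edges[i]) & (z < z_edges[i + 1]) &
                   (lam_obs >= lam_edges[j]) & (lam_obs < lam_edges[j + 1]))
            counts[i, j] = sel.sum()
            if sel.any():
                mean_mass[i, j] = log_mwl[sel].mean()
    return counts, mean_mass

log_m, z = sample_halos(n_expected=5000)
lam_obs, log_mwl = assign_observables(log_m, z)
counts, mean_mass = summarize(lam_obs, log_mwl, z,
                              lam_edges=[20, 45, 100, 1e4],
                              z_edges=[0.2, 0.5, 0.8])
```

One call to this pipeline produces one simulated data vector; repeating it over draws from the prior produces the training set for the density estimator described next.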
The SBI component uses normalizing‑flow based Neural Density Estimators (NDEs). The authors generate tens of thousands of forward‑model realisations spanning a broad prior on cosmological parameters (Ω_m, σ_8, etc.) and nuisance parameters (scatter amplitudes, selection‑function parameters). The NDE learns the conditional density p(θ | d) directly from the simulated summary statistics, eliminating the need for an explicit likelihood function. Once trained, the estimator can be evaluated on real data to obtain posterior distributions.
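The inferential logic of that pipeline can be demonstrated with a drastically simplified stand-in: rejection ABC, which targets the same posterior p(θ | d) without an explicit likelihood by keeping only prior draws whose simulated summary lands near the observation. The paper's normalizing-flow NDE amortizes this step with a learned conditional density rather than a hard accept/reject cut; the one-parameter Poisson simulator below is purely illustrative and not part of Capish.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy simulator: the observed cluster count is Poisson with a mean
# controlled by a single parameter theta (illustrative only).
def simulator(theta):
    return rng.poisson(100.0 * theta)

theta_true = 0.8
d_obs = simulator(theta_true)

# Likelihood-free rejection: draw parameters from the prior, simulate,
# and accept draws whose summary statistic falls close to the data.
# A trained normalizing flow replaces this cut with a learned density.
theta_prior = rng.uniform(0.1, 2.0, size=200_000)
d_sim = rng.poisson(100.0 * theta_prior)
accepted = theta_prior[np.abs(d_sim - d_obs) <= 2]

posterior_mean = accepted.mean()  # concentrates near d_obs / 100
```

The accepted draws approximate the posterior; the NDE approach learns the same mapping once, so evaluating new data requires no further simulation.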
Two validation exercises are presented. First, the authors compare Capish‑SBI results on mock catalogues to those obtained with a conventional Gaussian likelihood that includes Poisson and SSC terms. The posterior means agree, but the SBI posteriors are noticeably broader, reflecting the more realistic treatment of correlated scatter and non‑Gaussian noise. Second, they apply the full pipeline to a mock cluster catalogue derived from the Euclid Flagship N‑body simulation. The recovered cosmological parameters match the input values within the quoted uncertainties, demonstrating that the forward model faithfully captures the underlying physics.
Key insights from the study include:
- SSC matters – The variance of low‑mass cluster counts is dominated by SSC; neglecting it leads to underestimated errors.
- Richness–lensing correlation – Introducing a non‑zero ρ between richness and lensing mass changes the joint covariance and can bias results if ignored.
- Flexibility of SBI – By learning the full likelihood implicitly, SBI naturally accommodates non‑Gaussianities, parameter‑dependent covariances, and complex selection effects that would be cumbersome to encode analytically.
- Open‑source availability – Capish is released on GitHub, with modular code that can be extended to other selection methods (X‑ray, SZ) or additional mass proxies.
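The second insight, on the richness-lensing correlation, can be checked numerically: with ρ > 0, halos that scatter above a richness cut also scatter high in lensing mass, biasing the mean lensing mass of a richness-selected sample even at fixed true halo mass. The sketch below uses illustrative scatter amplitudes, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(2)

# Correlated scatter in (ln richness, log lensing mass) at fixed halo mass.
rho, sig_lam, sig_wl = 0.5, 0.3, 0.15
cov = [[sig_lam**2, rho * sig_lam * sig_wl],
       [rho * sig_lam * sig_wl, sig_wl**2]]
scatter = rng.multivariate_normal([0.0, 0.0], cov, size=500_000)

# Richness selection: keep halos that up-scatter past a one-sigma cut.
selected = scatter[scatter[:, 0] > sig_lam]

# Mean lensing-mass offset of the selected sample. For a Gaussian this
# is rho * sig_wl * E[z | z > 1] ~ rho * sig_wl * 1.525, and it vanishes
# when rho = 0, so ignoring the correlation biases the inferred masses.
bias = selected[:, 1].mean()
```

With these numbers the offset is about 0.11 dex, purely from selection on correlated scatter; a likelihood that assumes ρ = 0 would misattribute it to the mass-observable relation.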
The authors acknowledge limitations. The richness–mass relation is modeled as a simple Gaussian scatter, whereas real data exhibit richer behaviours (e.g., richness cuts, member contamination). The computational cost of generating large training sets and training deep normalizing flows can be substantial, especially when expanding to higher‑dimensional parameter spaces. Future work is suggested on incorporating more sophisticated, possibly non‑Gaussian, scaling relations, on active‑learning strategies to reduce the number of required simulations, and on jointly analysing multi‑wavelength cluster samples.
In summary, the paper demonstrates that simulation‑based inference, combined with a well‑designed forward model of cluster physics and observational systematics, provides a robust and extensible pathway to extract unbiased cosmological constraints from the massive cluster samples expected in the coming decade. Capish serves as a ready‑to‑use toolkit for the community, bridging the gap between high‑precision data and the sophisticated statistical machinery needed to fully exploit it.