Telescopes don't make catalogues!

Astronomical instruments make intensity measurements; any precise astronomical experiment ought to involve modeling those measurements. People make catalogues, but because a catalogue requires hard decisions about calibration and detection, no catalogue can contain all of the information in the raw pixels relevant to most scientific investigations. Here we advocate making catalogue-like data outputs that permit investigators to test hypotheses with almost the power of the original image pixels. The key is to provide users with approximations to likelihood tests against the raw image pixels. We advocate three options, in order of increasing difficulty: The first is to define catalogue entries and associated uncertainties such that the catalogue contains the parameters of an approximate description of the image-level likelihood function. The second is to produce a K-catalogue sampling in “catalogue space” that samples a posterior probability distribution of catalogues given the data. The third is to expose a web service or equivalent that can re-compute on demand the full image-level likelihood for any user-supplied catalogue.


💡 Research Summary

The paper argues that astronomical observations fundamentally consist of intensity measurements recorded in raw image pixels, and that any rigorous scientific analysis must model those measurements directly. Traditional astronomical catalogs, however, are derived after a series of hard decisions—background subtraction, PSF calibration, detection thresholds, and source deblending—that inevitably discard information present in the original pixels. Consequently, catalogs cannot serve as a complete statistical summary for many research questions, especially those involving subtle flux variations, complex morphologies, or joint modeling of multiple sources.

To bridge this gap, the authors propose three increasingly sophisticated solutions. The first is a “likelihood‑approximating catalog.” Instead of providing only point estimates and simple error bars, each entry would include a full set of source parameters (position, flux, shape, etc.) together with a covariance matrix that encodes the local shape of the image‑level likelihood surface. Users could then plug these parameters into an analytic approximation to the image‑level likelihood, enabling hypothesis testing with near‑pixel fidelity while remaining compatible with existing data pipelines, as in the sketch below.
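As a concrete illustration, here is a minimal Python/NumPy sketch of how such an entry might be consumed, under the assumption that each catalog row publishes a best‑fit parameter vector and a covariance matrix describing a local Gaussian approximation to the image‑level likelihood. The function name, parameter ordering, and numerical values are hypothetical, not a published convention.

```python
import numpy as np

def approx_delta_loglike(theta, theta_hat, cov):
    """Approximate log L(theta) - log L(theta_hat) from a catalogue entry.

    Under the local Gaussian (quadratic) approximation,
    log L(theta) ~ log L(theta_hat) - 0.5 * (theta - theta_hat)^T C^{-1} (theta - theta_hat),
    where theta_hat and C come straight from the catalogue.
    """
    d = np.asarray(theta, dtype=float) - np.asarray(theta_hat, dtype=float)
    return -0.5 * d @ np.linalg.solve(cov, d)

# Hypothetical catalogue entry: (RA [deg], Dec [deg], flux [arbitrary units])
theta_hat = np.array([150.1163, 2.2058, 312.0])
cov = np.diag([2e-8, 2e-8, 25.0])        # assumed parameter covariance from the catalogue
cov[0, 2] = cov[2, 0] = 1e-4             # illustrative position-flux covariance term

# Hypothesis to test: "the source really has flux 300"
theta_test = theta_hat.copy()
theta_test[2] = 300.0
print(approx_delta_loglike(theta_test, theta_hat, cov))
# about -2.9: the alternative flux is mildly disfavoured relative to the best fit
```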

The second solution is a “K‑catalog sampling” approach. In a Bayesian framework, the posterior distribution over all possible catalogs given the image data is sampled, producing a collection of plausible catalogs. Researchers can treat this ensemble as a Monte‑Carlo representation of the full uncertainty, allowing them to propagate catalog‑level errors through any downstream analysis or to compare competing models by integrating over the entire posterior. Implementing this requires efficient high‑dimensional samplers (MCMC, variational inference, or modern deep‑learning‑based methods) and raises storage and bandwidth concerns because many catalog realizations must be retained and shared.
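A minimal sketch of consuming K posterior catalog samples follows. The sampler itself is mocked by a stand‑in generator (make_fake_catalogue_sample is hypothetical; real draws would come from a sampler run against the pixels), and the downstream statistic, source counts above a flux threshold, is purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
K = 200  # number of posterior catalogue realizations

def make_fake_catalogue_sample():
    """Stand-in for one posterior catalogue draw: an array of source fluxes.

    The number of sources varies between draws, which is exactly the kind of
    "is that source real?" ambiguity a single hard catalogue cannot carry.
    """
    n = rng.poisson(50)
    return rng.lognormal(mean=1.0, sigma=0.8, size=n)  # fluxes, arbitrary units

catalogue_samples = [make_fake_catalogue_sample() for _ in range(K)]

# Downstream statistic: number of sources brighter than a threshold.
threshold = 5.0
counts = np.array([np.sum(fluxes > threshold) for fluxes in catalogue_samples])

# The spread of the statistic over the K catalogues is the propagated,
# image-level uncertainty on the final measurement.
print(f"N(>{threshold}) = {counts.mean():.1f} +/- {counts.std():.1f}")
```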

The third and most ambitious proposal is an on‑demand image‑likelihood web service. Users submit a candidate catalog (or any parametric model of the sky), and the service computes the exact log‑likelihood against the calibrated raw images in real time, returning the result to the client. This offers the ultimate fidelity: no approximations, no loss of information, and the ability to test arbitrarily complex models such as multi‑component galaxies, time‑variable sources, or joint analyses across multiple bands. The trade‑off is the need for substantial computational infrastructure—high‑performance clusters, fast I/O, robust caching, and secure user authentication—to handle the massive data volumes and to guarantee reasonable response times.
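The sketch below shows, under simplifying assumptions (circular Gaussian PSF, flat background, independent Gaussian pixel noise), the kind of computation such a service would perform server‑side: render a model image from a user‑supplied catalog and score it against the calibrated pixels. The function names and the (x, y, flux) row format are invented for illustration and are not part of the paper's specification.

```python
import numpy as np

def render_model(shape, catalogue, psf_sigma=1.5, background=100.0):
    """Sum of Gaussian-PSF point sources plus a flat background (toy forward model)."""
    yy, xx = np.mgrid[0:shape[0], 0:shape[1]]
    model = np.full(shape, background, dtype=float)
    for x0, y0, flux in catalogue:  # catalogue rows: (x, y, flux) in pixel units / counts
        r2 = (xx - x0) ** 2 + (yy - y0) ** 2
        model += flux * np.exp(-0.5 * r2 / psf_sigma**2) / (2 * np.pi * psf_sigma**2)
    return model

def image_loglike(pixels, invvar, catalogue):
    """Gaussian pixel log-likelihood: -0.5 * sum invvar * (data - model)^2 (up to a constant)."""
    model = render_model(pixels.shape, catalogue)
    return -0.5 * np.sum(invvar * (pixels - model) ** 2)

# Toy usage: compare two candidate catalogues against a simulated image.
rng = np.random.default_rng(1)
truth = [(12.3, 20.1, 5000.0), (30.0, 8.5, 2000.0)]
pixels = render_model((40, 40), truth) + rng.normal(0.0, 3.0, (40, 40))
invvar = np.full((40, 40), 1.0 / 9.0)

print(image_loglike(pixels, invvar, truth))                    # higher log-likelihood
print(image_loglike(pixels, invvar, [(12.3, 20.1, 5000.0)]))   # lower: missing source is penalized
```

A real service would wrap such an evaluation behind an API, with the calibrated images, PSF models, and noise maps held server‑side so that only the candidate catalog and the returned log‑likelihood cross the network.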

The authors emphasize that these three options form a progressive roadmap. The likelihood‑approximating catalog can be deployed with modest changes to current pipelines and delivers immediate scientific gains. K‑catalog sampling adds a richer statistical description at the cost of higher computational and storage demands. The full image‑likelihood service provides the most complete inference capability but requires a dedicated, scalable backend.

In conclusion, the paper advocates a paradigm shift: rather than treating catalogs as the final product of astronomical observation, the community should treat the image‑level likelihood as the primary data product. By exposing approximations or exact evaluations of that likelihood, researchers retain the full informational content of the observations, enabling more accurate, reproducible, and flexible scientific investigations.

