The miniJPAS survey quasar selection V: combined algorithm

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the [Original Paper Viewer] below or the Original ArXiv Source.

Aims. Quasar catalogues from narrow-band photometric data are used in a variety of applications, including targeting for spectroscopic follow-up, measurements of supermassive black hole masses, or Baryon Acoustic Oscillations. Here, we present the final quasar catalogue, including redshift estimates, from the miniJPAS Data Release constructed using several flavours of machine-learning algorithms.

Methods. In this work, we use a machine learning algorithm to classify quasars, optimally combining the output of 8 individual algorithms. We assess the relative importance of the different classifiers. We include results from 3 different redshift estimators to also provide improved photometric redshifts. We compare our final catalogue against both simulated data and real spectroscopic data. Our main comparison metric is the $f_1$ score, which balances the catalogue purity and completeness.

Results. We evaluate the performance of the combined algorithm using synthetic data. In this scenario, the combined algorithm outperforms the rest of the codes, reaching $f_1=0.88$ and $f_1=0.79$ for high- and low-z quasars (with $z\geq2.1$ and $z<2.1$, respectively) down to magnitude $r=23.5$. We further evaluate its performance against real spectroscopic data, finding different performances. We conclude that our simulated data is not realistic enough and that a new version of the mocks would improve the performance. Our redshift estimates on mocks suggest a typical uncertainty of $\sigma_{\rm NMAD}=0.11$, which, according to our results with real data, could be significantly smaller (as low as $\sigma_{\rm NMAD}=0.02$). We note that the data sample is still not large enough for a full statistical consideration.


💡 Research Summary

This paper presents the final quasar catalogue derived from the miniJPAS Data Release, together with photometric redshift estimates, by optimally combining the outputs of several machine‑learning classifiers. The authors build on a series of previous works that introduced eight individual classification algorithms (three convolutional neural networks, two decision‑tree based methods, two artificial neural networks, and the SQUEZE code) and two additional redshift‑only estimators. Rather than feeding raw photometric measurements into a new model, they use the confidence scores and redshift predictions from these pre‑existing tools as input features for a meta‑classifier that produces a single, unified quasar probability and a refined redshift estimate for each object.

The data set consists of 46 441 objects detected in the miniJPAS survey (56 narrow‑band + 4 broad‑band filters) after quality‑flag cleaning. A “point‑like” subsample of 11 419 objects is defined using the ER‑T stellarity index (≥ 0.1) to focus on sources that appear stellar, as high‑redshift quasars are typically unresolved. For validation, the authors employ two complementary data sources: (1) synthetic “mock” catalogs built from SDSS spectra convolved with the J‑PAS filter set and injected with realistic noise, and (2) a cross‑matched sample of 3 730 objects with spectra from the DESI Early Data Release, of which 1 171 are point‑like. The mock catalogues are split into training (100 000), validation (30 000), test (30 000) and a special 1 deg² test set that respects the expected sky fractions of stars, galaxies, and quasars.
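The point‑like cut described above can be sketched as a simple threshold on the stellarity index. This is a minimal illustration only: the record layout and the `stellarity` key are hypothetical names, not the actual miniJPAS catalogue schema, and the toy values are invented for the example.

```python
def select_point_like(catalogue, threshold=0.1, key="stellarity"):
    """Return the objects whose stellarity index is at or above the threshold.

    `catalogue` is assumed to be a list of dicts; `key` is a hypothetical
    column name standing in for the ER-T stellarity index.
    """
    return [obj for obj in catalogue if obj[key] >= threshold]


# Toy catalogue with made-up values, for illustration only.
toy_catalogue = [
    {"id": 1, "stellarity": 0.95},  # compact source, kept
    {"id": 2, "stellarity": 0.03},  # extended source, dropped
    {"id": 3, "stellarity": 0.10},  # exactly at the threshold, kept
]

point_like = select_point_like(toy_catalogue)
print([obj["id"] for obj in point_like])
```

Any quasar whose stellarity falls below the threshold is excluded at this step, before classification is attempted.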

Each of the eight classifiers outputs four confidence values corresponding to the classes star, galaxy, low‑z quasar (z < 2.1) and high‑z quasar (z ≥ 2.1). SQUEZE additionally provides a list of trial redshifts and associated confidence, while the two redshift‑only tools supply independent photometric redshift estimates. Together, these class confidences and redshift estimates constitute the feature vector for the meta‑classifier, which is trained on the mock training set and tuned on the validation set. Feature‑importance analysis reveals that the CNN‑2 model (a 2‑D convolutional network) and SQUEZE dominate the combined performance, especially for the high‑z quasar regime where emission‑line detection is crucial.
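As a rough sketch of the stacking idea, the base‑model outputs for one object could be flattened into a single feature vector as shown below. The exact feature layout used by the authors is not given in this summary, so the ordering, the classifier names (`clf_1` … `clf_8`) and the choice of one best SQUEZE trial redshift are all illustrative assumptions. The meta‑classifier itself is not shown.

```python
CLASSES = ("star", "galaxy", "qso_low_z", "qso_high_z")


def build_feature_vector(class_confidences, squeze_redshift,
                         squeze_confidence, photo_z_estimates):
    """Flatten base-model outputs into one list of floats.

    class_confidences: dict mapping a (hypothetical) classifier name to its
        four confidences, ordered as in CLASSES.
    squeze_redshift, squeze_confidence: assumed best SQUEZE trial redshift
        and its confidence.
    photo_z_estimates: redshifts from the redshift-only estimators.
    """
    features = []
    # Sort names so the feature order is reproducible across objects.
    for name in sorted(class_confidences):
        confidences = class_confidences[name]
        if len(confidences) != len(CLASSES):
            raise ValueError(f"{name}: expected {len(CLASSES)} confidences")
        features.extend(confidences)
    features.append(squeze_redshift)
    features.append(squeze_confidence)
    features.extend(photo_z_estimates)
    return features


# Made-up outputs for a single object, for illustration only.
example_confidences = {
    f"clf_{i}": [0.05, 0.05, 0.10, 0.80] for i in range(1, 9)
}
example_features = build_feature_vector(
    example_confidences,
    squeze_redshift=2.4,
    squeze_confidence=0.7,
    photo_z_estimates=[2.35, 2.50],
)
print(len(example_features))
```

A vector of this kind, built for every object in the mock training set, would then be passed to whichever supervised learner acts as the meta‑classifier.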

Performance is quantified primarily by the F₁ score, the harmonic mean of purity and completeness. On the mock test set the combined algorithm achieves F₁ = 0.88 for high‑z quasars and F₁ = 0.79 for low‑z quasars down to r = 23.5 mag, outperforming each individual classifier. The photometric redshift accuracy on mocks is σ_NMAD = 0.11. When applied to the DESI cross‑matched sample, the performance differs from that on mocks, which the authors attribute to the mocks not being realistic enough (e.g., noise properties, source morphology). Notably, about 18 % of spectroscopically confirmed quasars are classified as extended by the ER‑T metric, leading to missed detections in the point‑like sample. The redshift estimates on real data nevertheless appear considerably better than on mocks, with σ_NMAD as low as 0.02, although the authors caution that the spectroscopic sample is still too small for a full statistical assessment.
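The two metrics quoted above can be sketched in a few lines. The F₁ score below uses the standard definition, F₁ = 2·P·C / (P + C), with P the purity and C the completeness. For σ_NMAD the code assumes the commonly used form 1.48 × median(|Δz − median(Δz)| / (1 + z_spec)) with Δz = z_phot − z_spec; the paper's exact convention may differ. The function names and class labels are illustrative.

```python
from statistics import median


def purity_completeness(true_labels, predicted_labels, target):
    """Purity and completeness for one class (e.g. high-z quasars)."""
    pairs = list(zip(true_labels, predicted_labels))
    true_positives = sum(1 for t, p in pairs if t == target and p == target)
    n_predicted = sum(1 for _, p in pairs if p == target)
    n_true = sum(1 for t, _ in pairs if t == target)
    purity = true_positives / n_predicted if n_predicted else 0.0
    completeness = true_positives / n_true if n_true else 0.0
    return purity, completeness


def f1_score(purity, completeness):
    """Harmonic mean of purity and completeness."""
    if purity + completeness == 0:
        return 0.0
    return 2 * purity * completeness / (purity + completeness)


def sigma_nmad(z_phot, z_spec):
    """Normalised median absolute deviation of the redshift errors.

    Assumed definition: 1.48 * median(|dz - median(dz)| / (1 + z_spec)).
    """
    dz = [zp - zs for zp, zs in zip(z_phot, z_spec)]
    dz_median = median(dz)
    return 1.48 * median(
        abs(d - dz_median) / (1 + zs) for d, zs in zip(dz, z_spec)
    )


# Toy labels, for illustration only.
truth = ["qso_high_z", "qso_high_z", "qso_low_z", "star"]
predicted = ["qso_high_z", "qso_low_z", "qso_low_z", "qso_high_z"]
p, c = purity_completeness(truth, predicted, "qso_high_z")
print(p, c, f1_score(p, c))
```

Because both metrics rely on medians or ratios of counts, they can fluctuate noticeably on small validation samples, which is one reason the limited spectroscopic sample matters.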

The authors acknowledge several limitations. The mock catalogues assume equal numbers of stars, galaxies, and quasars, which is unrealistic and may bias the training. The point‑like/extended classification based on morphology is imperfect, especially at faint magnitudes where the signal‑to‑noise ratio is low. The spectroscopic validation sample is relatively small (≈ 1 200 point‑like objects), limiting the statistical robustness of the conclusions. They suggest that future work should focus on generating more realistic mocks, refining morphological classifiers, and expanding the spectroscopic validation with larger DESI releases.

In summary, this study demonstrates that a well‑designed ensemble of heterogeneous machine‑learning models can improve quasar identification and photometric redshift estimation in a narrow‑band photometric survey. The combined algorithm leverages the strengths of deep‑learning classifiers, tree‑based methods, and line‑detection techniques to achieve high completeness and purity, particularly for high‑redshift quasars where Ly α enters the optical window. The results provide a valuable resource for targeting spectroscopic follow‑up, studying the intergalactic medium, and performing cosmological analyses such as Baryon Acoustic Oscillation measurements with quasar samples derived from miniJPAS.

