Classifying white dwarfs from multi-object spectroscopy surveys with machine learning

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the original arXiv source.

With tens to hundreds of spectra of white dwarfs being taken each night from multi-object spectroscopic surveys, automated spectral classification is essential as part of efficient data processing. In this study, we design a neural network to classify the spectral type of white dwarfs using a combination of spectra from the Dark Energy Spectroscopic Instrument (DESI) Data Release 1 and imaging from Pan-STARRS photometry. The trained network has a near 100% accuracy at identifying DA and DB white dwarf spectral types, while having an 85-95% accuracy for identifying all other primary types, including metal pollution. Distinct spectral or photometric features map into separate structures when performing a Uniform Manifold Approximation and Projection (UMAP) dimensionality reduction. Investigating further and looking at multiple epoch spectra, we performed a separate search for objects that have strongly changing spectral signatures using UMAP, discovering 3 new inhomogeneous surface composition (‘double-faced’) white dwarfs in the process. We lastly show how machine learning has the potential to separate single white dwarfs from double white dwarf binary star systems in a large dataset, ideal for isolating a single star population. The results from all of these techniques show a compelling use of machine learning to boost efficiency in analysing white dwarfs observed in multi-object spectroscopy surveys, at times replacing the need for human-driven spectral classifications. This demonstrates our techniques as powerful tools for batch population analyses, finding outliers as a form of rare subclass detection, and in conducting multi-epoch spectral analyses.

💡 Research Summary

The paper presents a machine‑learning framework for automatically classifying white‑dwarf (WD) spectra obtained from modern multi‑object spectroscopic (MOS) surveys, with a focus on the Dark Energy Spectroscopic Instrument (DESI) Data Release 1 (DR1) and complementary Pan‑STARRS photometry. The authors begin by downloading every co‑added DESI spectrum, cross‑matching the catalogue with the Gaia EDR3 white‑dwarf candidate list of Gentile Fusillo et al. (2021), and retaining only objects with a white‑dwarf probability greater than 0.5. This yields 41 268 unique sources. Human‑assigned spectral types from the Montreal White Dwarf Database (MWDD) are then used to label the data. Quality cuts on the bitmask and signal‑to‑noise ratio, together with the removal of spectra with large gaps, reduce the sample to 19 292 high‑quality spectra. After discarding very rare classes and those with ambiguous labels, the final training set consists of 17 614 spectra spanning the main spectral families: DA, DB, DAB/DBA (merged into DA/DB due to scarcity), DQ, DZ, DC, DO, DAH, DAZ/DZA, DBZ/DZB, and cataclysmic variables.
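
The selection funnel described above can be illustrated with a minimal sketch. Only the probability threshold of 0.5 comes from the text; the field names (`p_wd`, `snr`, `label`), the signal‑to‑noise threshold, and the minimum class size are hypothetical values chosen for illustration, not the authors' settings.

```python
from collections import Counter

# Hypothetical catalogue rows; field names and values are illustrative only.
catalogue = [
    {"source_id": 1, "p_wd": 0.98, "snr": 12.0, "label": "DA"},
    {"source_id": 2, "p_wd": 0.40, "snr": 15.0, "label": "DA"},  # fails the P_WD cut
    {"source_id": 3, "p_wd": 0.91, "snr": 1.5, "label": "DB"},   # fails the S/N cut
    {"source_id": 4, "p_wd": 0.77, "snr": 9.0, "label": "DA"},
    {"source_id": 5, "p_wd": 0.85, "snr": 8.0, "label": "DQ"},   # class too rare here
]

P_WD_MIN = 0.5     # from the text: white-dwarf probability greater than 0.5
SNR_MIN = 5.0      # assumed threshold, not taken from the paper
MIN_PER_CLASS = 2  # assumed minimum class size, not taken from the paper


def select_training_rows(rows):
    """Apply the probability cut, a quality cut, then drop very rare classes."""
    candidates = [r for r in rows if r["p_wd"] > P_WD_MIN]
    good = [r for r in candidates if r["snr"] >= SNR_MIN]
    counts = Counter(r["label"] for r in good)
    return [r for r in good if counts[r["label"]] >= MIN_PER_CLASS]


training = select_training_rows(catalogue)
print([r["source_id"] for r in training])
```

Counting classes only after the quality cuts means a class is judged rare on the spectra that would actually reach the network.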

Spectral preprocessing is performed separately on the blue arm (3600–5800 Å) and red arm (5760–7620 Å). The authors bin the blue arm in 40 Å intervals, apply a 3σ clipping, fit a 7th‑order polynomial to the binned flux, and iteratively remove outliers (±2.5σ). The red arm is binned in 24 Å intervals with a 2.5σ clipping and a 4th‑order polynomial fit. Regions heavily affected by instrumental artefacts (e.g., 4285–4410 Å where the absolute flux calibration drops) are masked. After normalisation, the two arms are concatenated, and any remaining 5σ outliers are linearly interpolated. This uniform scaling ensures that the neural network receives inputs on a common flux scale.
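
A rough sketch of the blue‑arm normalisation follows, assuming NumPy. The bin width (40 Å), polynomial order (7), and clipping levels (3σ in the bins, 2.5σ on the fit) are taken from the description above; the function name, the iteration cap, the wavelength rescaling, and the synthetic test spectrum are assumptions made for illustration, and the masking of artefact regions is omitted.

```python
import numpy as np


def normalise_arm(wave, flux, bin_width=40.0, degree=7, bin_clip=3.0,
                  fit_clip=2.5, max_iter=5):
    """Divide one spectrograph arm by a polynomial continuum estimate."""
    # 1. Sigma-clipped mean flux in fixed-width wavelength bins.
    n_bins = int(np.ceil((wave.max() - wave.min()) / bin_width))
    bin_index = np.minimum(((wave - wave.min()) / bin_width).astype(int), n_bins - 1)
    centres, binned = [], []
    for i in range(n_bins):
        in_bin = bin_index == i
        vals = flux[in_bin]
        if vals.size == 0:
            continue
        ok = np.abs(vals - np.median(vals)) <= bin_clip * vals.std()
        centres.append(wave[in_bin].mean())
        binned.append(vals[ok].mean())
    centres, binned = np.array(centres), np.array(binned)

    # Map wavelengths onto [-1, 1] so the high-order fit stays well conditioned.
    def scale(w):
        return 2.0 * (w - wave.min()) / (wave.max() - wave.min()) - 1.0

    def fit(mask):
        return np.polynomial.polynomial.polyfit(scale(centres[mask]), binned[mask], degree)

    # 2. Polynomial fit to the binned flux with iterative outlier rejection.
    keep = np.ones(centres.size, dtype=bool)
    for _ in range(max_iter):
        resid = binned - np.polynomial.polynomial.polyval(scale(centres), fit(keep))
        new_keep = np.abs(resid) <= fit_clip * resid[keep].std()
        if new_keep.sum() <= degree or np.array_equal(new_keep, keep):
            break
        keep = new_keep

    continuum = np.polynomial.polynomial.polyval(scale(wave), fit(keep))
    return flux / continuum


# Synthetic blue-arm spectrum: sloped continuum plus one broad absorption line.
rng = np.random.default_rng(1)
wave = np.linspace(3600.0, 5800.0, 2200)
true_continuum = 2.0 - 0.5 * (wave - 3600.0) / 2200.0
line = 1.0 - 0.5 * np.exp(-0.5 * ((wave - 4861.0) / 20.0) ** 2)
flux = true_continuum * line + rng.normal(0.0, 0.01, wave.size)
norm = normalise_arm(wave, flux)
```

The rejection step is what keeps broad absorption lines from dragging the continuum estimate down: bins sitting inside a line are dropped from the fit, so the line survives in the normalised spectrum.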

Photometric information is incorporated to improve classification. The authors test several photometric surveys (Gaia, SDSS, Pan‑STARRS, GALEX, 2MASS) and conclude that Pan‑STARRS g, r, i, z, y bands provide the best balance of depth, sky coverage, and photometric precision for the DESI sample. Synthetic SDSS photometry generated from Gaia XP coefficients is comparable, but the need to discard objects with missing or high‑error bands reduces the effective training size, so Pan‑STARRS is preferred.

The machine‑learning model is a fully‑connected feed‑forward neural network with four layers (input → 256 → 128 → 64 → output). Class imbalance is addressed by applying class‑specific weights in the cross‑entropy loss. Training proceeds on the combined spectral‑photometric vectors, and performance is evaluated using a held‑out test set. Results show near‑perfect classification for the dominant DA and DB types (≈99.8 % accuracy). For the more complex or rarer classes—DZ (metal‑polluted), DQ (carbon Swan bands), DC (featureless), DO (He II), DAH (magnetic splitting), and mixed‑type DAZ/DZA—the network attains 85–95 % accuracy. The authors note that DAB/DBA objects could not be reliably separated due to their low numbers and overlapping spectral features.
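
The architecture and the class‑weighted loss can be sketched in plain NumPy. The layer widths (256, 128, 64) and the use of class‑specific weights in the cross‑entropy come from the summary above; the ReLU activations, the He‑style initialisation, the inverse‑frequency weighting formula, and the toy data are assumptions, since the summary does not specify them.

```python
import numpy as np

rng = np.random.default_rng(0)


def init_mlp(n_inputs, n_classes, hidden=(256, 128, 64)):
    """Random weights for input -> 256 -> 128 -> 64 -> output (He-style scale, assumed)."""
    sizes = (n_inputs, *hidden, n_classes)
    return [(rng.normal(0.0, np.sqrt(2.0 / m), (m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]


def forward(params, x):
    """Forward pass: ReLU hidden layers (assumed) and a softmax output."""
    for w, b in params[:-1]:
        x = np.maximum(x @ w + b, 0.0)
    w, b = params[-1]
    logits = x @ w + b
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    probs = np.exp(logits)
    return probs / probs.sum(axis=1, keepdims=True)


def class_weights(labels, n_classes):
    """Inverse-frequency weights (one common choice, assumed here)."""
    counts = np.bincount(labels, minlength=n_classes)
    return len(labels) / (n_classes * np.maximum(counts, 1))


def weighted_cross_entropy(probs, labels, weights):
    """Weighted mean of the negative log-likelihood of the true class."""
    w = weights[labels]
    nll = -np.log(probs[np.arange(len(labels)), labels] + 1e-12)
    return (w * nll).sum() / w.sum()


labels = np.array([0] * 8 + [1] * 2)           # toy imbalanced labels
features = rng.normal(size=(labels.size, 32))  # stand-in for spectrum + photometry
params = init_mlp(n_inputs=32, n_classes=2)
weights = class_weights(labels, n_classes=2)
probs = forward(params, features)
loss = weighted_cross_entropy(probs, labels, weights)
print(weights, loss)
```

With eight examples of class 0 and two of class 1, the rare class receives four times the weight of the common one, so a misclassified rare spectrum costs as much as four misclassified common spectra.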

To visualise the learned feature space, the authors apply Uniform Manifold Approximation and Projection (UMAP) to the penultimate layer activations. Distinct clusters correspond to the major spectral families, confirming that the network has learned physically meaningful representations. Leveraging the multi‑epoch nature of DESI, they project spectra of the same object taken at different times onto the UMAP map. Objects that migrate between clusters are flagged as candidates with strongly variable spectral signatures. This approach uncovers three previously unknown “double‑faced” white dwarfs whose surface composition appears to change dramatically between epochs, a phenomenon that would be difficult to detect by eye.
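
One way to flag objects that migrate across the map is to measure how far apart their epochs land in the embedding. The sketch below assumes the two‑dimensional coordinates have already been computed (for example with the `fit_transform` method of `umap.UMAP` from the umap‑learn package); the function name, the distance threshold, and the toy coordinates are hypothetical, and the authors' actual selection criterion is not described in the summary.

```python
from collections import defaultdict

import numpy as np


def flag_movers(object_ids, embedding, min_shift):
    """Return objects whose epochs are separated by more than min_shift in the map."""
    by_object = defaultdict(list)
    for oid, xy in zip(object_ids, embedding):
        by_object[oid].append(xy)

    flagged = {}
    for oid, pts in by_object.items():
        pts = np.asarray(pts, dtype=float)
        if len(pts) < 2:
            continue  # single-epoch objects cannot show variability
        # Largest pairwise Euclidean distance between epochs of this object.
        diffs = pts[:, None, :] - pts[None, :, :]
        shift = np.sqrt((diffs ** 2).sum(axis=-1)).max()
        if shift > min_shift:
            flagged[oid] = float(shift)
    return flagged


# Toy embedding: object "B" jumps between two regions of the map.
object_ids = ["A", "A", "B", "B", "C"]
embedding = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.0], [3.0, 4.0], [1.0, 1.0]]
flagged = flag_movers(object_ids, embedding, min_shift=1.0)
print(flagged)
```

Because distances in a UMAP projection are only loosely meaningful, a threshold like this is best treated as a way to shortlist candidates for visual inspection rather than as a detection in itself.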

Finally, the paper demonstrates how the trained DA classifier can be repurposed to identify double‑white‑dwarf binaries masquerading as single stars. By examining the probability distribution of the DA class and the widths/asymmetries of Balmer lines, the authors isolate a subset of objects whose spectra are best explained by blended contributions from two white dwarfs. This method provides an efficient, scalable way to build a clean sample of single white dwarfs for population studies while simultaneously flagging binary candidates for follow‑up.

In summary, the study delivers an end‑to‑end pipeline: (1) rigorous data cleaning and normalisation of DESI spectra, (2) integration of high‑quality Pan‑STARRS photometry, (3) supervised training of a weighted neural network, (4) UMAP‑based visualisation and outlier detection, (5) multi‑epoch variability mining, and (6) binary‑system discrimination. The results indicate that machine learning can at times replace human‑driven spectral typing of white dwarfs, uncover rare subclasses, and streamline the analysis of the large data streams expected from current and future MOS surveys.
