Detection of hot subdwarf binaries and sdB stars using machine learning methods and a large sample of Gaia XP spectra

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the original arXiv paper.

Hot subdwarfs (hot sds) are compact, evolved stars near the Extreme Horizontal Branch (EHB) and are key to understanding stellar evolution and the ultraviolet excess in galaxies. We extend our previous analysis of Gaia XP spectra of hot subdwarf stars to a much larger sample, enabling a comprehensive study of their physical and binary properties. Our goal is to identify patterns in Gaia XP spectra, investigate binarity, and assess the influence of parameters such as temperature, helium abundance, and variability. We analyse approximately 20000 hot subdwarf candidates selected from the literature, combining Gaia XP data with published parameters. We apply Uniform Manifold Approximation and Projection (UMAP) to the XP coefficients, which represent the Gaia XP spectra in a compact, feature-based form, to construct a similarity map. We then use self-organizing maps (SOMs) and convolutional neural networks (CNNs) to classify spectra as binaries or singles, and as cool and helium-poor or hot and helium-rich. The spectra are normalised using asymmetric least squares baseline fitting to emphasise individual spectral features. The BP-RP colour dominates the similarity map, with additional influence from temperature, helium abundance, and variability. Most binaries, identified via the Virtual Observatory SED Analyser (VOSA), cluster in two filaments linked to main sequence companions. CNN classification suggests a strong correlation between variability and binarity, with binary fractions exceeding 60 percent for active hot subdwarfs. Gaia XP spectra combined with dimensionality reduction and machine learning effectively reveal patterns in hot subdwarf properties. Our findings indicate that binarity and environmental density strongly shape the evolutionary paths of hot subdwarfs, and we identify possible contamination by main sequence and cataclysmic variable stars in the base sample.


💡 Research Summary

This paper presents a comprehensive machine‑learning‑driven analysis of hot subdwarf (sdB/sdO) stars using Gaia DR3 low‑resolution XP spectra. Building on the authors’ previous work (Paper I), which examined ~2 800 objects, the current study expands the sample to 20 061 hot subdwarf candidates drawn from the Culpan et al. (2022) catalogue of ~62 000 objects. The authors retrieve the 110‑dimensional XP coefficient vectors (55 BP plus 55 RP coefficients) for each source, normalize them by their L2 norm, and apply a suite of unsupervised and supervised algorithms to uncover physical trends, identify binary systems, and classify helium abundance.
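As a minimal sketch of the normalisation step, assuming each source's coefficients arrive as a 55-element BP vector and a 55-element RP vector that are concatenated into one 110-element row (the array layout and the helper name `normalise_xp` are hypothetical, not taken from the paper). Because an XP spectrum is a linear combination of basis functions weighted by these coefficients, dividing by the norm removes the overall flux scale, so only the spectral shape is compared.

```python
import numpy as np

N_COEFF = 55  # Gaia DR3 XP: 55 BP + 55 RP continuous-spectrum coefficients per source


def normalise_xp(bp_coeffs, rp_coeffs):
    """Concatenate BP and RP coefficients and scale every row to unit L2 norm.

    bp_coeffs, rp_coeffs: array-likes of shape (n_sources, 55).
    Returns an array of shape (n_sources, 110).
    """
    x = np.hstack([np.asarray(bp_coeffs, dtype=float), np.asarray(rp_coeffs, dtype=float)])
    if x.shape[1] != 2 * N_COEFF:
        raise ValueError(f"expected {2 * N_COEFF} coefficients per source, got {x.shape[1]}")
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # leave all-zero rows unchanged instead of dividing by zero
    return x / norms


# Usage with random stand-in coefficients (real ones would come from the Gaia archive)
rng = np.random.default_rng(1)
X = normalise_xp(rng.normal(size=(3, 55)), rng.normal(size=(3, 55)))
print(X.shape)  # (3, 110)
```

Scaling both inputs by the same factor (a brighter copy of the same star) returns identical rows, which is the property that makes spectra of different apparent magnitude comparable on one map.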

First, Uniform Manifold Approximation and Projection (UMAP) reduces the high‑dimensional coefficient space to two dimensions, producing a similarity map that reveals distinct structures: a central “main body”, two detached islands (A and B), and two filamentary branches (F1 and F2). Colour‑coding the map with Gaia BP‑RP, effective temperature (T_eff), helium abundance (log Y), RUWE, and excess flux error shows that BP‑RP colour dominates the map, while temperature and helium content create secondary gradients. The islands and filaments correspond to physically different sub‑populations: hot, He‑rich stars; cooler, He‑poor stars; and composite spectra indicative of main‑sequence companions.
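One way to quantify the statement that one parameter "dominates" a 2-D similarity map is to check how similar that parameter is among map neighbours. The sketch below is a hypothetical illustration, not the paper's method: `neighbourhood_consistency` is essentially Geary's C computed with k-nearest-neighbour weights. Scores near 0 mean neighbouring points share similar values (the parameter organises the map); scores near 1 mean the parameter is unrelated to position. The choice k = 10 is an assumption.

```python
import numpy as np


def neighbourhood_consistency(embedding, values, k=10):
    """Mean squared difference of `values` between k-nearest map neighbours,
    divided by twice the global variance (random values give a score of about 1)."""
    emb = np.asarray(embedding, dtype=float)
    v = np.asarray(values, dtype=float)
    # Brute-force pairwise distances: fine for a few thousand points; the full
    # ~20 000-source sample would need a KD-tree or similar neighbour search.
    d2 = ((emb[:, None, :] - emb[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(d2, np.inf)          # a point is not its own neighbour
    nn = np.argsort(d2, axis=1)[:, :k]    # indices of the k nearest neighbours
    local = ((v[:, None] - v[nn]) ** 2).mean()
    return local / (2.0 * v.var())


# Usage on a synthetic "map": one parameter follows the x axis, another is pure noise
rng = np.random.default_rng(0)
emb = rng.uniform(size=(500, 2))
print(neighbourhood_consistency(emb, emb[:, 0]))            # small: organises the map
print(neighbourhood_consistency(emb, rng.normal(size=500))) # close to 1: unrelated
```

Applied to BP‑RP, T_eff and log Y on the same embedding, the parameter with the lowest score would be the one that most strongly structures the map.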

Binary detection proceeds via the Virtual Observatory SED Analyzer (VOSA). Objects whose spectral energy distributions show a red excess, consistent with a cool companion, cluster primarily along the two filaments. This spatial segregation suggests that wide sdB+MS binaries (typically F–K companions) occupy characteristic regions of the XP‑coefficient space.
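To make the idea of a red excess concrete, here is a deliberately simplified, hypothetical test. VOSA itself compares observed photometry against grids of model spectra; this sketch instead scales a single blackbody for the hot subdwarf to the two bluest bands and asks whether the redder bands sit significantly above it. The band wavelengths, the 2 % uncertainties, the 5σ threshold, the temperatures and the companion dilution factor are all illustrative assumptions.

```python
import numpy as np

H = 6.62607015e-34   # Planck constant [J s]
C = 2.99792458e8     # speed of light [m s^-1]
K_B = 1.380649e-23   # Boltzmann constant [J K^-1]


def planck_lambda(wl_m, t_k):
    """Blackbody spectral radiance B_lambda(T) in W m^-3 sr^-1."""
    return 2.0 * H * C**2 / wl_m**5 / np.expm1(H * C / (wl_m * K_B * t_k))


def red_excess_flag(wl_m, flux, flux_err, t_hot, n_blue=2, n_sigma=5.0, min_bands=2):
    """Hypothetical red-excess test (not VOSA's algorithm).

    Scales a blackbody of temperature t_hot to the n_blue bluest bands by weighted
    least squares, then flags the SED when at least min_bands of the redder bands
    lie more than n_sigma above the scaled model.
    """
    order = np.argsort(wl_m)
    wl, f, e = (np.asarray(a, dtype=float)[order] for a in (wl_m, flux, flux_err))
    model = planck_lambda(wl, t_hot)
    w = 1.0 / e[:n_blue] ** 2
    scale = np.sum(w * f[:n_blue] * model[:n_blue]) / np.sum(w * model[:n_blue] ** 2)
    excess_sigma = (f[n_blue:] - scale * model[n_blue:]) / e[n_blue:]
    return int(np.sum(excess_sigma > n_sigma)) >= min_bands


# Hypothetical band centres (roughly u, g, r, i, J, H, Ks) in metres
wl = np.array([0.35, 0.45, 0.62, 0.77, 1.25, 1.65, 2.2]) * 1e-6
sdb = planck_lambda(wl, 30000.0)
companion = 12.0 * planck_lambda(wl, 5000.0)  # assumed (R_MS / R_sdB)^2 of about 12
binary = sdb + companion
print(red_excess_flag(wl, sdb, 0.02 * sdb, 30000.0))        # False
print(red_excess_flag(wl, binary, 0.02 * binary, 30000.0))  # True
```

The cool companion contributes only a few per cent in the blue bands but dominates near 2 µm, which is why composite sdB+MS systems separate so clearly at the red end of the SED.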

To refine classification, the authors train Self‑Organizing Maps (SOMs) on the normalized coefficients, allowing the data to self‑cluster without prior labels. SOM clusters align well with the UMAP structures, providing an unsupervised sanity check on the binary grouping.
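A self-organizing map can be written in a few lines of NumPy. The minimal online (Kohonen) version below is a generic sketch, not the authors' configuration: the 6×6 grid, the linear decay schedules, the Gaussian neighbourhood and the synthetic two-group data are all assumptions. Each step finds the best-matching unit (BMU) for one sample and pulls it and its lattice neighbours toward that sample, so similar spectra end up on nearby nodes.

```python
import numpy as np


def train_som(data, grid_shape=(6, 6), n_iter=2000, lr0=0.5, seed=0):
    """Minimal online self-organizing map trained on the rows of a NumPy array."""
    rng = np.random.default_rng(seed)
    rows, cols = grid_shape
    # (row, col) position of every node on the 2-D lattice
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    # Initialise node weights with randomly chosen training samples
    weights = data[rng.integers(0, len(data), rows * cols)].astype(float)
    sigma0 = max(rows, cols) / 2.0
    for t in range(n_iter):
        frac = t / n_iter
        lr = lr0 * (1.0 - frac)              # learning rate decays linearly to 0
        sigma = sigma0 * (1.0 - frac) + 0.5  # neighbourhood radius shrinks to 0.5
        x = data[rng.integers(0, len(data))]
        bmu = np.argmin(((weights - x) ** 2).sum(axis=1))  # best-matching unit
        d2 = ((grid - grid[bmu]) ** 2).sum(axis=1)         # lattice distance to BMU
        h = np.exp(-d2 / (2.0 * sigma**2))                 # Gaussian neighbourhood
        weights += lr * h[:, None] * (x - weights)
    return weights


def best_matching_units(data, weights):
    """Index of the closest SOM node for every row in `data`."""
    d2 = ((data[:, None, :] - weights[None, :, :]) ** 2).sum(axis=2)
    return np.argmin(d2, axis=1)


# Two synthetic groups of unit-norm 110-D vectors (stand-ins for normalised XP coefficients)
rng = np.random.default_rng(42)
centres = rng.normal(size=(2, 110))
labels = np.repeat([0, 1], 100)
X = centres[labels] + 0.1 * rng.normal(size=(200, 110))
X /= np.linalg.norm(X, axis=1, keepdims=True)
nodes = best_matching_units(X, train_som(X))
```

No labels enter the training; comparing the node assignments with the UMAP structures afterwards is what makes the SOM an independent, unsupervised check.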

The supervised stage employs Convolutional Neural Networks (CNNs) on the flux‑versus‑wavelength representation of the XP spectra, after baseline correction with asymmetric least‑squares fitting. The CNN simultaneously predicts (i) binary versus single status and (ii) a hot, helium‑rich versus cool, helium‑poor classification. Using a 5‑fold cross‑validation scheme, the CNN achieves >92 % accuracy for binary detection and >90 % for helium classification. Crucially, the network’s binary probability correlates strongly with the excess flux error, a proxy for intrinsic variability. Objects flagged as variable have binary fractions exceeding 60 %, markedly higher than the ~40 % reported in earlier works.
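The baseline-correction step can be sketched with the widely used asymmetric least-squares smoother of Eilers and Boelens (2005). This is a minimal dense-matrix version; the smoothness `lam`, the asymmetry `p`, the iteration count and the choice to divide the flux by the baseline are assumptions, since the values used in the paper are not given here. With `p` close to 1 the baseline hugs the upper envelope, i.e. the continuum above absorption lines.

```python
import numpy as np


def als_baseline(y, lam=1e5, p=0.99, n_iter=10):
    """Asymmetric least-squares baseline (dense version, fine for a few hundred points).

    lam: penalty on the second differences of the baseline (larger = smoother).
    p:   weight for points above the baseline; points below get 1 - p.
    """
    y = np.asarray(y, dtype=float)
    size = y.size
    D = np.diff(np.eye(size), n=2)   # (size, size - 2) second-difference operator
    penalty = lam * (D @ D.T)
    w = np.ones(size)
    for _ in range(n_iter):
        # Weighted, penalised least squares: (W + lam D D^T) z = W y
        z = np.linalg.solve(np.diag(w) + penalty, w * y)
        w = np.where(y > z, p, 1.0 - p)  # asymmetric re-weighting
    return z


# Synthetic test spectrum on 343 points (assumed to match a 336-1020 nm, 2 nm XP sampling)
x = np.linspace(0.0, 1.0, 343)
continuum = 1.0 + 0.5 * x
flux = continuum - 0.4 * np.exp(-0.5 * ((x - 0.5) / 0.02) ** 2)  # one absorption line
normalised = flux / als_baseline(flux)
```

After division the continuum sits near 1 and the line stands out as a dip, which is the kind of feature-emphasising input a CNN can pick up regardless of the overall spectral slope.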

Statistical analysis confirms that variability, binarity, and environmental density are inter‑related: dense Galactic regions host a higher proportion of binaries, and many active hot subdwarfs show signs of interaction with a companion. The authors also identify potential contamination by main‑sequence stars and cataclysmic variables, especially among the hot, He‑rich island (B), where the XP spectra lack the resolution to cleanly separate composite signatures.
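Binary fractions such as the >60 % quoted for variable objects are ratios of counts, so they carry binomial uncertainties. The sketch below is hypothetical (the helper names and toy counts are invented for illustration, not the paper's data) and attaches a Wilson score interval to the binary fraction of each group, for example variable versus non-variable objects, or high- versus low-density fields. The Wilson interval behaves better than the simple normal approximation for small groups and for fractions near 0 or 1.

```python
import math


def wilson_interval(k, n, z=1.96):
    """Wilson score interval for a binomial fraction k/n (z = 1.96 gives ~95 %)."""
    if n == 0:
        raise ValueError("empty group")
    p = k / n
    denom = 1.0 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half


def binary_fraction_by_group(is_binary, group):
    """Binary fraction and Wilson interval for each group label."""
    out = {}
    for g in sorted(set(group)):
        flags = [b for b, gg in zip(is_binary, group) if gg == g]
        k, n = sum(flags), len(flags)
        out[g] = (k / n, *wilson_interval(k, n))
    return out


# Hypothetical toy sample: 100 variable objects (60 binaries), 100 non-variable (35 binaries)
is_binary = [True] * 60 + [False] * 40 + [True] * 35 + [False] * 65
group = ["variable"] * 100 + ["non-variable"] * 100
for g, (frac, lo, hi) in binary_fraction_by_group(is_binary, group).items():
    print(f"{g}: {frac:.2f} (95 % CI {lo:.2f} to {hi:.2f})")
```

Non-overlapping intervals between groups are a quick first check that a difference in binary fraction is not just counting noise.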

In conclusion, the study demonstrates that Gaia XP spectra, when combined with dimensionality reduction (UMAP), unsupervised clustering (SOM), and deep learning (CNN), provide a powerful framework for large‑scale classification of hot subdwarfs. The findings highlight (1) the dominant role of BP‑RP colour in shaping spectral similarity, (2) the existence of filamentary structures that map to sdB+MS binaries, (3) a strong link between photometric variability and binarity, and (4) the need for caution regarding contamination from other stellar types. The authors suggest that future work incorporate Gaia RVS data, time‑domain variability from TESS or ground‑based surveys, and refined SED fitting to better constrain orbital parameters and evolutionary pathways of hot subdwarfs.

