Seeking Spectroscopic Binaries with Data-Driven Models
Data-driven stellar classification has a long and important history in astronomy, dating as far back as Annie Jump Cannon’s “by eye” classifications of stars into spectral types still used today. In recent years, data-driven spectroscopy has proven to be an effective means of deriving stellar properties for large samples of stars, sidestepping issues with computational efficiency, incomplete line lists, and radiative transfer calculations associated with physical stellar models. A logical application of these algorithms is the detection of unresolved stellar binaries, which requires accurate spectroscopic models to resolve flux contributions from a fainter secondary star in the spectrum. Here we use The Cannon to train a data-driven model on spectra from the Keck High Resolution Echelle Spectrometer. We show that our model is competitive with existing data-driven models in its ability to predict the stellar properties Teff, stellar radius, [Fe/H], and v sin i, as well as the instrumental PSF, particularly when we apply a novel wavelet-based processing step to spectra before training. We find that even with accurate estimates of stellar properties, our model’s ability to detect unresolved binaries is limited by its approximately 3% accuracy in per-pixel flux predictions, illuminating possible limitations of data-driven model applications.
💡 Research Summary
The paper investigates whether a data‑driven spectral model can be used to identify unresolved spectroscopic binaries in high‑resolution Keck/HIRES spectra of Kepler planet‑host stars. The authors adopt The Cannon, a label‑based model that is linear in its fitted coefficients, and train it on a curated set of high‑signal‑to‑noise (S/N ≈ 150) HIRES spectra from the SpecMatch‑Emp library. To avoid contaminating the training set with binaries, they remove any star flagged as a spectroscopic, eclipsing, or X‑ray binary in SIMBAD, as well as Gaia non‑single‑star candidates, leaving 335 clean spectra. The sample is split into “hot” (Teff > 5500 K) and “cool” (Teff ≤ 5500 K) subsets, and a separate Cannon model is trained on each. Because many cool stars lack a measured v sin i, the authors assign them v sin i = 0 km s⁻¹ and augment the data with artificially broadened copies (v sin i = 3, 5, 7 km s⁻¹) to populate the rotational‑velocity label space.
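The augmentation step can be illustrated with a short sketch. This is not the authors' code: it assumes the spectrum is sampled on a grid with a constant velocity step per pixel, uses the standard rotational broadening profile with a linear limb-darkening coefficient, and the names `rotation_kernel`, `broaden`, and `augment`, the value `epsilon=0.6`, and the pixel step `dv=1.3` km/s are all illustrative assumptions.

```python
import numpy as np

def rotation_kernel(vsini, dv, epsilon=0.6):
    """Rotational broadening profile sampled every `dv` km/s.

    `epsilon` is an assumed linear limb-darkening coefficient.
    The kernel is normalized numerically so that it sums to 1.
    """
    half_width = int(np.floor(vsini / dv))
    v = np.arange(-half_width, half_width + 1) * dv
    x2 = np.clip(1.0 - (v / vsini) ** 2, 0.0, None)
    kernel = 2.0 * (1.0 - epsilon) * np.sqrt(x2) + 0.5 * np.pi * epsilon * x2
    return kernel / kernel.sum()

def broaden(flux, vsini, dv, epsilon=0.6):
    """Convolve a normalized spectrum with the rotation kernel."""
    kernel = rotation_kernel(vsini, dv, epsilon)
    pad = kernel.size // 2
    # Repeat the edge values so the output keeps the input length.
    padded = np.pad(flux, pad, mode="edge")
    return np.convolve(padded, kernel, mode="valid")

def augment(flux, vsini_values=(3.0, 5.0, 7.0), dv=1.3):
    """Return {v sin i label: spectrum}, including the unbroadened original at 0."""
    copies = {0.0: np.array(flux, dtype=float)}
    for vsini in vsini_values:
        copies[float(vsini)] = broaden(flux, vsini, dv)
    return copies
```

Each broadened copy would enter the training set with its v sin i label set to the broadening value, so the model sees the same star at several points along the rotational-velocity axis. Because the kernel is normalized, the broadening makes lines shallower and wider without changing their equivalent width.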
A key methodological innovation is a two‑step blaze‑function correction. After the standard CPS pipeline removal of the instrumental blaze using a composite B‑star spectrum, the authors notice residual low‑frequency variations, especially in lower‑S/N spectra. They therefore apply a wavelet‑filtering step: each echelle order is decomposed with discrete wavelets (using the “sym5” mother wavelet), the lowest‑frequency approximation coefficients (≈20 Å scales) are discarded, and the remaining coefficients are recombined. This process removes continuum‑level systematics while preserving line‑scale information, leading to more stable spectra across multiple exposures.
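The decompose, discard, and recombine idea can be sketched in a few lines. To keep the example dependent only on NumPy, the sketch below swaps in a hand-written Haar wavelet rather than the mother wavelet named above; in practice a wavelet library such as PyWavelets (`pywt.wavedec` and `pywt.waverec`) would be the more natural tool. The function names and the choice of `levels` are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def haar_decompose(signal, levels):
    """Multi-level Haar transform.

    Returns the coarsest approximation and the detail
    coefficients ordered from coarse to fine.
    """
    approx = np.asarray(signal, dtype=float)
    details = []
    for _ in range(levels):
        even, odd = approx[0::2], approx[1::2]
        details.append((even - odd) / np.sqrt(2.0))
        approx = (even + odd) / np.sqrt(2.0)
    return approx, details[::-1]

def haar_reconstruct(approx, details):
    """Invert haar_decompose (details ordered coarse to fine)."""
    for detail in details:
        out = np.empty(2 * approx.size)
        out[0::2] = (approx + detail) / np.sqrt(2.0)
        out[1::2] = (approx - detail) / np.sqrt(2.0)
        approx = out
    return approx

def remove_low_frequency(flux, levels):
    """Zero the coarsest approximation coefficients and recombine the rest."""
    flux = np.asarray(flux, dtype=float)
    if flux.size % (2 ** levels) != 0:
        raise ValueError("length must be a multiple of 2**levels")
    approx, details = haar_decompose(flux, levels)
    return haar_reconstruct(np.zeros_like(approx), details)
```

With the approximation coefficients zeroed, the filtered spectrum fluctuates around zero rather than around a continuum of one, and any offset or slow trend on scales longer than `2**levels` pixels is suppressed, while narrow absorption lines keep their position and most of their depth.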
Training the Cannon on the wavelet‑filtered spectra yields an average per‑pixel flux prediction error of ~3 %, a substantial improvement over the 10–20 % discrepancies found when using ab‑initio synthetic spectra (e.g., from Starfish). The model recovers stellar labels with typical uncertainties of ±70 K in Teff, ±0.07 R⊙ in radius, ±0.04 dex in