A Machine-Learning Approach for Identifying CME-Associated Stellar Flares in TESS Observations


Coronal mass ejections (CMEs) are major drivers of stellar space weather and can strongly influence the habitability of exoplanets. However, compared to the frequent occurrence of white-light flares, confirmed stellar CMEs remain extremely rare. Whether such flares are commonly accompanied by CMEs is a key question for solar-stellar comparative studies. Using Sun-as-a-star soft X-ray flare light curves observed by the GOES XRS 1–8 Å channel, we compiled a sample of 1,766 M-class and larger solar flares and extracted features with both deep convolutional neural networks and manual methods. Five machine-learning classifiers were trained to distinguish eruptive from confined flares, with the random forest (RF) model achieving the best performance (true skill statistic, TSS = 0.31). This TSS value indicates that the model possesses a moderate ability to discriminate between eruptive and confined flares. Normalized white-light and GOES XRS flare light curves show broadly consistent temporal evolution, reflecting their shared energy-release history and supporting a probabilistic transfer of the model to white-light flare data. We applied the best-performing RF model to 41,405 TESS-detected flares on FGKM-type main-sequence stars, predicting that approximately 47% of events show CME-like morphological characteristics, with the model-implied intrinsic association fraction lying in the range 35%–60%. Intriguingly, the CME occurrence rate decreases with increasing flare energy, indicating that the most energetic flares may be more strongly confined by overlying magnetic fields. These results provide new insight into flare-CME connections in diverse stellar environments and have important implications for assessing the impact of stellar eruptive activity on exoplanetary atmospheres.


Research Summary

This paper tackles the long‑standing problem of estimating how often stellar flares are accompanied by coronal mass ejections (CMEs). The authors build a supervised binary classifier using solar “Sun‑as‑a‑star” soft‑X‑ray flare light curves from the GOES X‑ray Sensor (1–8 Å) and CME association information from the SOHO/LASCO catalog. From 1997 to early 2025 they initially retrieve ~4,000 M‑ and X‑class flares, then apply strict quality cuts (removing data gaps, multi‑peak events, etc.) to obtain a high‑quality sample of 1,766 flares (1,656 M‑class, 110 X‑class). A flare is labeled “eruptive” if a CME appears within a two‑hour window after the flare peak, a criterion widely used in solar studies. This yields 1,043 eruptive events (≈59 % of the sample), with CME association rates of 89 % for X‑class and 57 % for M‑class flares.
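
The labeling rule described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code: the function name `label_eruptive`, the example timestamps, and the choice to treat the window as inclusive at both ends are all hypothetical, and the sketch matches on timing alone, as the summary describes it.

```python
from datetime import datetime, timedelta

# Two-hour window after the flare peak, as stated in the text.
WINDOW = timedelta(hours=2)


def label_eruptive(flare_peak, cme_times, window=WINDOW):
    """Return True if any CME first appears within `window` after the flare peak.

    Assumption: both ends of the window are inclusive.
    """
    return any(timedelta(0) <= t - flare_peak <= window for t in cme_times)


# Hypothetical timestamps, for illustration only (not catalog entries).
peak = datetime(2014, 1, 1, 12, 0)
cmes = [datetime(2014, 1, 1, 9, 30), datetime(2014, 1, 1, 12, 48)]
is_eruptive = label_eruptive(peak, cmes)
```

In this sketch a CME that appears before the peak, or more than two hours after it, does not count toward the "eruptive" label.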

Pre‑processing and Feature Engineering
All GOES light curves are peak‑aligned (peak at t = 0) and normalized in time by the full‑width‑at‑half‑maximum (FWHM) to mitigate duration differences. A linear background fit using five points before and after each flare is subtracted, and the net flare signal is scaled to the range 0–1. From each processed curve the authors extract 33 features: 13 handcrafted morphological descriptors (total duration, rise‑to‑total ratio, integrated fluxes in rise/decay phases, rise slope, decay time constant, etc.) and 20 high‑level features obtained by converting the normalized curve into a fixed‑size grayscale image, feeding it through a ResNet‑50 convolutional neural network, and reducing the resulting high‑dimensional representation to 20 components via principal component analysis (PCA) that retain ~80 % of the variance. The hybrid feature vector thus captures both physically interpretable metrics and abstract shape information.
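
The pre-processing steps above can be sketched as follows. This is a rough pure-Python illustration rather than the authors' pipeline: the helper names, the order of operations (background subtraction, then scaling, then time normalization), the min-max scaling, and the linear interpolation of the half-maximum crossings are assumptions made for the example.

```python
def linear_fit(x, y):
    """Ordinary least-squares line through (x, y); returns (slope, intercept)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def half_max_time(times, scaled, i_peak, step):
    """Walk away from the peak (step = -1 or +1) and interpolate the 0.5 crossing."""
    i = i_peak
    while 0 <= i + step < len(scaled) and scaled[i + step] >= 0.5:
        i += step
    j = i + step
    if not 0 <= j < len(scaled):
        raise ValueError("light curve never drops below half maximum")
    frac = (scaled[i] - 0.5) / (scaled[i] - scaled[j])
    return times[i] + frac * (times[j] - times[i])


def preprocess(times, flux, n_edge=5):
    """Background-subtract, scale to [0, 1], peak-align, and divide time by the FWHM."""
    # Assumption: the first and last n_edge samples are flare-free background.
    slope, intercept = linear_fit(times[:n_edge] + times[-n_edge:],
                                  flux[:n_edge] + flux[-n_edge:])
    net = [f - (slope * t + intercept) for t, f in zip(times, flux)]
    lo, hi = min(net), max(net)
    scaled = [(v - lo) / (hi - lo) for v in net]  # min-max scaling to [0, 1]
    i_peak = net.index(hi)
    width = (half_max_time(times, scaled, i_peak, +1)
             - half_max_time(times, scaled, i_peak, -1))
    t_norm = [(t - times[i_peak]) / width for t in times]  # peak at t = 0
    return t_norm, scaled, width


# Synthetic example: a triangular flare on a sloping background.
demo_times = [float(t) for t in range(21)]
demo_flux = [10.0 + 0.1 * t + max(0.0, 4.0 - abs(t - 10.0)) for t in demo_times]
t_norm, scaled, width = preprocess(demo_times, demo_flux)
```

For the synthetic triangle the recovered FWHM is about 4 time units, so the rescaled time axis runs from roughly -2.5 to +2.5 with the peak at zero. Dividing by the FWHM is what lets flares of very different durations be compared on a common shape axis.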

Model Training and Evaluation
Five conventional supervised classifiers are evaluated: logistic regression, random forest (RF), gradient‑boosted trees (XGBoost), linear discriminant analysis, and linear support vector machine. Five independent 80 %/20 % train‑test splits are generated, each stratified to preserve the eruptive/confined class balance. Performance is quantified using the True Skill Statistic (TSS), accuracy, precision, and recall. The random forest consistently outperforms the others, achieving TSS = 0.31, overall accuracy ≈ 68 %, and recall ≈ 71 % for the eruptive class. A TSS of 0.31 indicates only moderate discriminative power, but it is noteworthy given the modest sample size and the intrinsic difficulty of distinguishing CME‑associated from confined flares based solely on light‑curve morphology.
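
For reference, the TSS is conventionally defined as the true-positive rate minus the false-positive rate, TSS = TP/(TP + FN) - FP/(FP + TN), so it ranges from -1 to 1 and equals 0 for a classifier that labels everything the same way. The sketch below computes it from binary labels; the function name and the toy labels are illustrative assumptions, not taken from the paper.

```python
def true_skill_statistic(y_true, y_pred):
    """TSS = recall - false-alarm rate, with 1 = eruptive and 0 = confined."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    if tp + fn == 0 or fp + tn == 0:
        raise ValueError("both classes must be present in y_true")
    recall = tp / (tp + fn)
    false_alarm_rate = fp / (fp + tn)
    return recall - false_alarm_rate


# Toy labels for illustration only.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
tss = true_skill_statistic(y_true, y_pred)  # recall 0.75, false-alarm rate 0.25
```

If the quoted recall (≈ 0.71) and TSS (0.31) refer to the same evaluation, this definition implies a false-alarm rate of about 0.40, meaning roughly 60 % of confined flares are correctly identified. That figure is arithmetic from the numbers in this summary, not a value reported by the authors.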

Application to TESS Stellar Flares
The trained RF model is transferred to a large set of white‑light flares detected by the Transiting Exoplanet Survey Satellite (TESS). The authors identify 41,405 flares on FGKM main‑sequence stars, process them with the same pipeline (peak alignment, FWHM scaling, background subtraction, feature extraction), and feed the resulting vectors into the RF classifier. Using a probability threshold of 0.5, roughly 47 % of the stellar flares are classified as CME‑like. Accounting for model bias and the fact that the solar training set may over‑represent eruptive events, the authors infer an intrinsic CME association fraction between 35 % and 60 % for the TESS sample.

A striking result emerges when the CME probability is examined as a function of flare energy (estimated from the TESS flare amplitude). In contrast to the solar trend, where higher‑energy flares are more likely to be eruptive, the stellar data show a decreasing CME occurrence rate with increasing flare energy. The authors interpret this as evidence that the most energetic flares on active stars may be strongly confined by overlying magnetic fields, preventing the launch of CMEs. This behavior aligns with previous case studies of highly active M dwarfs (e.g., AD Leo) where numerous energetic flares have been observed without accompanying CMEs.
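
One way to examine such a trend is to bin flares by the logarithm of their energy and compute the fraction classified as CME‑like in each bin. The sketch below is a hypothetical illustration: the function name, the bin edges, and the toy energies and probabilities are invented for the example and are not values from the paper; only the 0.5 threshold comes from the text.

```python
import math


def cme_fraction_by_energy(energies_erg, cme_probs, edges_log10, threshold=0.5):
    """Fraction of flares with probability >= threshold in each log10(energy) bin.

    Bins are half-open, [edge_k, edge_k+1); empty bins yield None.
    """
    n_bins = len(edges_log10) - 1
    counts = [0] * n_bins
    hits = [0] * n_bins
    for energy, prob in zip(energies_erg, cme_probs):
        x = math.log10(energy)
        for k in range(n_bins):
            if edges_log10[k] <= x < edges_log10[k + 1]:
                counts[k] += 1
                hits[k] += prob >= threshold
                break
    return [h / c if c else None for h, c in zip(hits, counts)]


# Toy inputs, not taken from the paper.
energies = [2e32, 5e32, 2e33, 5e33, 2e34, 5e34]
probs = [0.8, 0.6, 0.7, 0.4, 0.3, 0.2]
fractions = cme_fraction_by_energy(energies, probs, edges_log10=[32, 33, 34, 35])
```

Because individual classifications are uncertain at TSS = 0.31, a per-bin fraction of this kind is better read as a population-level indicator than as a count of confirmed CMEs.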

Discussion, Limitations, and Future Work
The paper’s strengths lie in (1) the systematic construction of a labeled solar flare‑CME dataset, (2) the innovative combination of handcrafted and deep‑learning‑derived features, (3) a thorough comparison of multiple classical ML algorithms, and (4) the first large‑scale application of a solar‑trained CME classifier to stellar flare observations. Limitations include the modest TSS (indicating that individual flare classifications remain uncertain), potential domain shift between GOES soft‑X‑ray and TESS white‑light signals, and reliance on LASCO CME detections that may miss slow or “stealth” CMEs. The authors suggest several avenues for improvement: incorporating multi‑wavelength flare diagnostics (e.g., Hα, UV, radio), employing sequence models such as Transformers or Temporal Convolutional Networks to capture temporal dynamics directly, expanding the training set with additional solar events (including confined flares with high‑quality X‑ray data), and cross‑validating predictions with independent CME proxies (coronal dimming, radio bursts) in stellar observations.

Implications
By providing a probabilistic estimate of CME occurrence for tens of thousands of stellar flares, this work offers a new quantitative tool for assessing the space‑weather environment around exoplanets. The inferred CME rates, especially the decreasing trend with flare energy, have direct consequences for atmospheric erosion models, magnetospheric protection calculations, and ultimately the habitability assessments of planets orbiting active FGKM stars. The methodology also establishes a framework for future data‑driven studies that bridge solar and stellar physics, leveraging the wealth of archival solar observations to interpret the ever‑growing catalog of stellar flare data from missions like TESS, PLATO, and the upcoming Roman Space Telescope.

