Efficient reduction of stellar contamination and noise in planetary transmission spectra using neural networks
Context: JWST has enabled transmission spectroscopy at unprecedented precision, but stellar heterogeneities (spots and faculae) remain a dominant contamination source that can bias atmospheric retrievals if uncorrected.
Aims: We present a fast, unsupervised methodology to reduce stellar contamination and instrument-specific noise in exoplanet transmission spectra using denoising autoencoders, improving the reliability of retrieved atmospheric parameters.
Methods: We design and train denoising autoencoder architectures on large synthetic datasets of terrestrial (TRAPPIST-1e analogues) and sub-Neptune (K2-18b analogues) planets. Reconstruction quality is evaluated with the $\chi^2$ statistic over a wide range of signal-to-noise ratios (S/N), and atmospheric retrieval experiments on contaminated spectra are used to compare the method with standard correction approaches in accuracy and computational cost.
Results: The autoencoders reconstruct uncontaminated spectra while preserving key molecular features, even at low S/N. In retrieval tests, pre-processing with denoising autoencoders reduces bias in inferred abundances relative to uncorrected baselines and matches the accuracy of simultaneous stellar-contamination fitting while reducing computational time by a factor of three to six.
Conclusions: Denoising autoencoders provide an efficient alternative to conventional correction strategies and are promising components of future atmospheric characterization pipelines for both rocky and gaseous exoplanets.
💡 Research Summary
The paper tackles one of the most pressing challenges in the analysis of exoplanet transmission spectra in the JWST era: contamination from stellar heterogeneities (spots and faculae), known as the Transit Light Source (TLS) effect, compounded by instrument‑specific noise. While JWST’s unprecedented infrared sensitivity enables the detection of multiple molecular species in both gas‑giant and terrestrial atmospheres, the wavelength‑dependent distortion caused by unocculted stellar features can mimic or obscure genuine planetary signals, leading to biased atmospheric retrievals. Traditional mitigation strategies, such as model‑independent sibling‑planet approaches, explicit TLS modeling within retrieval frameworks, or the use of out‑of‑transit stellar spectra, are variously model‑dependent, computationally intensive, or limited by the accuracy of stellar atmosphere models, especially for cool M dwarfs.
To overcome these limitations, the authors propose an unsupervised deep‑learning pipeline based on denoising autoencoders (DAEs). They first formalize the TLS effect through the contamination factor ε_λ(C), where C encapsulates the spot and facula covering fractions, their temperatures, and the fractions of each heterogeneity type crossed by the transit chord. Using this formalism, they generate a large synthetic dataset spanning two planetary regimes: Earth‑like TRAPPIST‑1e analogues and sub‑Neptune K2‑18b analogues. Each synthetic spectrum is drawn from a realistic range of atmospheric compositions (H₂O, CO₂, CH₄, etc.), temperature‑pressure profiles, and stellar contamination scenarios. Instrumental noise is modeled on JWST NIRSpec PRISM and NIRISS characteristics, covering signal‑to‑noise ratios from 5 to 100.
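A common way to write the TLS factor is as the ratio of the stellar flux along the transit chord to the disk‑averaged stellar flux, so that the observed depth equals the true depth multiplied by ε_λ. The minimal NumPy sketch below follows that form, assuming blackbody spectra as a crude stand‑in for the stellar model spectra a real pipeline would use. The function names, the temperatures and covering fractions (loosely TRAPPIST‑1‑like), the toy absorption band, and the S/N convention are all hypothetical and not taken from the paper.

```python
import numpy as np

H, C, KB = 6.62607015e-34, 2.99792458e8, 1.380649e-23  # Planck, light speed, Boltzmann (SI)


def planck(wl_um, temp):
    """Blackbody B_lambda(T); a crude stand-in for stellar model spectra (assumption)."""
    wl = wl_um * 1e-6  # micron -> metre
    return 2.0 * H * C**2 / wl**5 / np.expm1(H * C / (wl * KB * temp))


def contamination_factor(wl_um, t_phot, t_spot, t_fac, f_spot, f_fac,
                         f_spot_chord=0.0, f_fac_chord=0.0):
    """eps_lambda = F_chord / F_disk, the chord-to-disk stellar flux ratio.

    f_spot, f_fac: disk-integrated covering fractions.
    f_*_chord: covering fractions along the transit chord (0 = unocculted).
    """
    r_spot = planck(wl_um, t_spot) / planck(wl_um, t_phot)  # F_spot / F_phot
    r_fac = planck(wl_um, t_fac) / planck(wl_um, t_phot)    # F_fac / F_phot
    chord = 1.0 - f_spot_chord * (1.0 - r_spot) - f_fac_chord * (1.0 - r_fac)
    disk = 1.0 - f_spot * (1.0 - r_spot) - f_fac * (1.0 - r_fac)
    return chord / disk


def contaminate(clean_depth, eps, snr, rng):
    """Apply the TLS factor and add white Gaussian noise.

    Hypothetical S/N convention: peak-to-peak amplitude of the clean
    spectrum divided by the per-bin noise sigma.
    """
    sigma = (clean_depth.max() - clean_depth.min()) / snr
    noisy = clean_depth * eps + rng.normal(0.0, sigma, size=clean_depth.shape)
    return noisy, sigma


# Usage with toy values (all hypothetical)
rng = np.random.default_rng(0)
wl = np.linspace(0.6, 5.0, 200)                                     # micron
clean = 5.0e-3 + 2.0e-4 * np.exp(-0.5 * ((wl - 1.4) / 0.1) ** 2)   # flat depth + one toy band
eps = contamination_factor(wl, t_phot=2566, t_spot=2300, t_fac=2800,
                           f_spot=0.10, f_fac=0.05)
noisy, sigma = contaminate(clean, eps, snr=20, rng=rng)
```

With only unocculted cool spots, ε_λ exceeds 1 and grows toward the blue, which is why uncorrected spots can masquerade as short‑wavelength slopes or extra absorption.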
The DAE architecture is a 1‑D encoder‑decoder network with a bottleneck that forces the model to learn a compact latent representation of the underlying “clean” spectrum. During training, the input spectra are deliberately corrupted by adding Gaussian noise and the TLS‑induced ε_λ distortion; the network is then tasked with reconstructing the original uncontaminated signal. The loss function combines a χ² reconstruction term with a Kullback‑Leibler divergence regularizer, encouraging the latent space to capture the statistical structure of both noise and contamination while preserving essential molecular features.
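The corrupt‑compress‑reconstruct scheme can be illustrated with a toy denoising autoencoder in plain NumPy. This is a sketch under strong assumptions, not the authors' network: a single tanh bottleneck layer stands in for the 1‑D encoder‑decoder, spectra are 64‑bin toys with two Gaussian bands and a smooth, blue‑sloped multiplicative TLS‑like term, the noise is white with known σ, and the KL regularizer is omitted so that only the χ² term is optimized (with Adam). All names and hyperparameters are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
N_WL, N_HID, SIGMA = 64, 8, 0.1                   # bins, bottleneck width, per-bin noise
wl = np.linspace(0.6, 5.0, N_WL)                  # micron
BAND1 = np.exp(-0.5 * ((wl - 1.4) / 0.15) ** 2)   # toy absorption features
BAND2 = np.exp(-0.5 * ((wl - 2.7) / 0.25) ** 2)
TLS_SHAPE = np.exp(-(wl - 0.6) / 1.5)             # toy unocculted-spot slope, strongest in the blue


def make_batch(n):
    """Return (corrupted input, clean target) in normalized depth units."""
    a = rng.uniform(0.5, 1.5, size=(n, 2))
    clean = a[:, :1] * BAND1 + a[:, 1:] * BAND2
    s = rng.uniform(0.0, 1.0, size=(n, 1))            # random contamination strength
    depth, eps = 1.0 + 0.1 * clean, 1.0 + 0.1 * s * TLS_SHAPE
    x = (depth * eps - 1.0) / 0.1                      # multiplicative TLS, renormalized
    return x + rng.normal(0.0, SIGMA, size=x.shape), clean


params = {"W1": rng.normal(0, N_WL ** -0.5, (N_WL, N_HID)), "b1": np.zeros(N_HID),
          "W2": rng.normal(0, N_HID ** -0.5, (N_HID, N_WL)), "b2": np.zeros(N_WL)}


def forward(p, x):
    h = np.tanh(x @ p["W1"] + p["b1"])    # encoder -> bottleneck
    return h, h @ p["W2"] + p["b2"]       # decoder -> reconstruction


def chi2_and_grads(p, x, y):
    h, y_hat = forward(p, x)
    r = y_hat - y
    loss = np.mean((r / SIGMA) ** 2)                  # chi^2 per spectral bin
    d = 2.0 * r / (SIGMA ** 2 * r.size)               # dloss / dy_hat
    dz = (d @ p["W2"].T) * (1.0 - h ** 2)             # backprop through tanh
    return loss, {"W2": h.T @ d, "b2": d.sum(0), "W1": x.T @ dz, "b1": dz.sum(0)}


# Adam optimizer state (first and second moments)
m = {k: np.zeros_like(w) for k, w in params.items()}
v = {k: np.zeros_like(w) for k, w in params.items()}
lr, beta1, beta2, history = 1e-2, 0.9, 0.999, []
for step in range(1, 2001):
    loss, grads = chi2_and_grads(params, *make_batch(256))
    for k in params:
        m[k] = beta1 * m[k] + (1 - beta1) * grads[k]
        v[k] = beta2 * v[k] + (1 - beta2) * grads[k] ** 2
        m_hat, v_hat = m[k] / (1 - beta1 ** step), v[k] / (1 - beta2 ** step)
        params[k] -= lr * m_hat / (np.sqrt(v_hat) + 1e-8)
    history.append(loss)

x_val, y_val = make_batch(1000)
chi2_raw = np.mean(((x_val - y_val) / SIGMA) ** 2)                  # contaminated input vs truth
chi2_dae = np.mean(((forward(params, x_val)[1] - y_val) / SIGMA) ** 2)
print(f"chi2/bin raw = {chi2_raw:.2f}, denoised = {chi2_dae:.2f}")
```

The narrow bottleneck is the key design choice: the two band strengths are the only degrees of freedom the decoder needs, so the smooth contamination term and the white noise are pushed out of the reconstruction.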
Performance evaluation shows that the DAE consistently reduces χ² by 30–70 % across the full S/N range, with the most pronounced gains at low S/N where traditional parametric fitting often over‑fits the noise. Retrieval experiments employ the Bayesian atmospheric retrieval code POSEIDON. When contaminated spectra are fed directly into POSEIDON, retrieved molecular abundances exhibit biases of up to 0.5 dex. Pre‑processing the same spectra with the DAE reduces these biases to 0.1–0.3 dex, matching the accuracy of simultaneous TLS‑and‑atmosphere retrievals but at a fraction of the computational cost. On average, the DAE‑augmented pipeline runs 3–6 times faster, because the retrieval no longer needs to explore the high‑dimensional stellar‑contamination parameter space.
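The figures quoted above rest on two simple quantities: a χ² between a spectrum and its reference, and an abundance bias in dex, i.e. a difference of base‑10 logarithms (0.5 dex is a factor of about 3.2). The sketch below shows one plausible way to compute them. Comparing against the known clean spectrum and using the posterior median as the point estimate are assumptions, not details confirmed by the summary, and all numbers are toy values.

```python
import numpy as np


def chi2(model, reference, sigma):
    """Chi-square of a spectrum against a reference with per-bin uncertainty sigma."""
    return float(np.sum(((model - reference) / sigma) ** 2))


def chi2_reduction_percent(chi2_before, chi2_after):
    """Relative chi-square improvement, in percent."""
    return 100.0 * (chi2_before - chi2_after) / chi2_before


def bias_dex(log10_samples, log10_true):
    """Retrieval bias in dex, using the posterior median as the point estimate."""
    return float(np.median(log10_samples) - log10_true)


# Hypothetical numbers, only to show the arithmetic
truth = np.zeros(100)
sigma = np.full(100, 1.0)
contaminated = truth + 1.0   # constant 1-sigma offset left by contamination
denoised = truth + 0.5       # half of the offset removed
print(chi2_reduction_percent(chi2(contaminated, truth, sigma),
                             chi2(denoised, truth, sigma)))  # 75.0

rng = np.random.default_rng(1)
posterior = rng.normal(-2.5, 0.2, size=5000)  # toy log10 H2O samples; truth = -3.0
print(bias_dex(posterior, -3.0))              # close to 0.5 dex
```

Because χ² is quadratic in the residual, halving a systematic offset removes 75% of its χ², so a percentage reduction in χ² is not the same as a percentage reduction in the residual itself.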
The authors also test the method on realistic JWST observations, including a NIRISS transit of TRAPPIST‑1 b. The DAE successfully removes the TLS imprint, leaving residuals within the photon‑noise limit and preserving key absorption bands (e.g., H₂O at 1.4 µm). This demonstrates that the learned latent space effectively internalizes the statistical signatures of both stellar heterogeneity and instrument noise, enabling model‑independent denoising.
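A statement that residuals sit within the photon‑noise limit can be checked by normalizing the residuals by the per‑bin uncertainty and asking whether they behave like unit‑variance white noise. A minimal sketch, assuming independent Gaussian errors with known σ; what the residuals are measured against (here a flat toy model), the wavelength grid, the 50 ppm noise level, and the leftover slope are all hypothetical and not taken from the paper.

```python
import numpy as np


def residual_diagnostics(observed, model, sigma):
    """Summarize normalized residuals z = (observed - model) / sigma.

    If the residuals are pure white photon noise, z ~ N(0, 1): the mean chi^2
    per bin is close to 1 and about 68% of bins lie within +/-1 sigma.
    """
    z = (observed - model) / sigma
    return {
        "chi2_per_bin": float(np.mean(z ** 2)),
        "frac_within_1sigma": float(np.mean(np.abs(z) < 1.0)),
    }


# Toy check (hypothetical data): noise-only residuals vs. a leftover slope
rng = np.random.default_rng(2)
wl = np.linspace(0.6, 2.8, 2000)          # micron, toy NIRISS-like range
sigma = np.full(wl.shape, 50e-6)          # 50 ppm per bin
model = np.full(wl.shape, 5e-3)
clean_obs = model + rng.normal(0.0, sigma)
slope_obs = clean_obs + 150e-6 * (2.8 - wl) / 2.2   # uncorrected TLS-like trend
print(residual_diagnostics(clean_obs, model, sigma))
print(residual_diagnostics(slope_obs, model, sigma))
```

A leftover contamination trend shows up as a χ² per bin well above 1 even when each individual bin looks unremarkable, which is why a global statistic is a useful complement to visual inspection of the residuals.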
In the discussion, the authors acknowledge that the DAE’s success depends on the fidelity of the synthetic training set, particularly the stellar spectral libraries for cool stars. They suggest future work incorporating variational autoencoders or conditional generative models to jointly infer TLS parameters alongside atmospheric ones, and extending the framework to real‑time pipelines for upcoming missions such as Ariel.
Overall, the study provides compelling evidence that denoising autoencoders constitute a powerful, computationally efficient alternative to conventional TLS correction techniques, improving the reliability of atmospheric retrievals for both rocky and gaseous exoplanets in the JWST era.