A method for filling gaps in solar irradiance and in solar proxy data
Data gaps are ubiquitous in spectral irradiance data, and yet, little effort has been put into finding robust methods for filling them. We introduce a data-adaptive and nonparametric method that allows us to fill data gaps in multi-wavelength or in multichannel records. This method, which is based on the iterative singular value decomposition, uses the coherency between simultaneous measurements at different wavelengths (or between different proxies) to fill the missing data in a self-consistent way. The interpolation is improved by handling different time scales separately. Two major assets of this method are its simplicity, with few tuneable parameters, and its robustness. Two examples of missing data are given: one from solar EUV observations, and one from solar proxy data. The method is also appropriate for building a composite out of partly overlapping records.
💡 Research Summary
The paper presents a robust, data‑adaptive, non‑parametric technique for filling gaps in multi‑wavelength solar irradiance records and related proxy time series. The core of the method is an iterative singular value decomposition (SVD). It exploits the strong linear coherence that typically exists among simultaneous measurements at different wavelengths, or among different solar activity proxies. The algorithm proceeds as follows:

(1) Initialize each missing value with a simple estimate, e.g. the temporal mean of the corresponding record.
(2) Compute the SVD of the current, gap‑filled data matrix.
(3) Reconstruct the matrix using only the K largest singular values (the "modes"), which capture the coherent part of the variability.
(4) Replace the missing entries, and only those, with the reconstructed values.
(5) Repeat steps 2–4 until the filled values converge for a given K, then increase K and iterate again.

This inner‑loop/outer‑loop scheme converges rapidly and requires only three tunable parameters: the number of retained modes K, the number of temporal scales into which the data are decomposed, and the embedding dimension D used to incorporate temporal coherence.
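The five steps above can be sketched in NumPy as follows. This is a minimal illustration, not the authors' MATLAB code. It assumes NaN marks a missing value and omits the normalization, multiscale and embedding refinements. The names `iterative_svd_fill`, `tol` and `max_iter` are hypothetical.

```python
import numpy as np


def iterative_svd_fill(X, k_max, tol=1e-8, max_iter=1000):
    """Fill NaN gaps in X (rows = time, columns = wavelengths or proxies).

    Outer loop: K = 1, 2, ..., k_max retained modes.
    Inner loop: rank-K reconstruction, overwriting only the missing entries,
    until the largest change in the filled values drops below tol.
    """
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    # Step 1: initialise each gap with the temporal mean of its own record
    Y = np.where(missing, np.nanmean(X, axis=0), X)
    if not missing.any():
        return Y
    for k in range(1, k_max + 1):
        for _ in range(max_iter):
            # Step 2: SVD of the current, gap-filled matrix
            U, s, Vt = np.linalg.svd(Y, full_matrices=False)
            # Step 3: keep only the K leading modes
            Y_k = (U[:, :k] * s[:k]) @ Vt[:k, :]
            # Step 4: update the missing entries only; observed data stay fixed
            change = np.max(np.abs(Y_k[missing] - Y[missing]))
            Y[missing] = Y_k[missing]
            if change < tol:
                break
    return Y


# Usage: two coherent signals mixed into six records, one record with a long gap
rng = np.random.default_rng(0)
t = np.linspace(0.0, 20.0, 200)
X_true = np.column_stack([np.sin(t), np.cos(0.3 * t)]) @ rng.normal(size=(2, 6))
X_gap = X_true.copy()
X_gap[50:80, 0] = np.nan            # a 30-sample gap in the first record
X_filled = iterative_svd_fill(X_gap, k_max=2)
```

Because the synthetic data are exactly rank 2, the filled segment should closely match the hidden values. On real irradiance data, the best K has to be found empirically.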
Because solar irradiance exhibits variability on distinct time scales (e.g., solar rotation ~27 days, the 11‑year solar cycle), the authors first separate the data into short‑term and long‑term components using a wavelet transform (or a simple band‑pass filter). The iterative SVD is then applied independently to each component, which improves reconstruction skill by allowing different spectral dependencies for each physical process.
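A simple two‑scale variant of this idea is sketched below. It assumes a centred running mean (81 samples, about three 27‑day rotations at daily sampling) as the low‑pass filter in place of the wavelet transform. It first makes a coarse fill so the filter sees gap‑free input, then reopens the gaps and fills the slow and fast components separately, each with its own K. The function names, the coarse first fill and all parameter values are hypothetical; the paper's actual scale separation may differ.

```python
import numpy as np


def svd_fill(X, k, n_iter=500):
    """Iterative rank-k SVD gap filling with a fixed number of modes."""
    missing = np.isnan(X)
    Y = np.where(missing, np.nanmean(X, axis=0), X)
    for _ in range(n_iter):
        U, s, Vt = np.linalg.svd(Y, full_matrices=False)
        Y[missing] = ((U[:, :k] * s[:k]) @ Vt[:k, :])[missing]
    return Y


def moving_average(Y, window):
    """Centred running mean along time, renormalised near the record ends."""
    kernel = np.ones(window)
    norm = np.convolve(np.ones(Y.shape[0]), kernel, mode="same")
    smooth = [np.convolve(Y[:, j], kernel, mode="same") for j in range(Y.shape[1])]
    return np.column_stack(smooth) / norm[:, None]


def two_scale_fill(X, window, k_init, k_slow, k_fast):
    """Fill gaps separately in a slow (low-pass) and a fast (residual) component."""
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    first_guess = svd_fill(X, k_init)          # gap-free input for the filter
    slow = moving_average(first_guess, window)
    fast = first_guess - slow
    slow[missing] = np.nan                     # reopen the gaps in each scale
    fast[missing] = np.nan
    filled = svd_fill(slow, k_slow) + svd_fill(fast, k_fast)
    return np.where(missing, filled, X)        # observed values are kept as-is


# Usage: a slow trend plus a 27-sample "rotational" modulation in four records
t = np.arange(400)
trend, rotation = 0.01 * t, np.sin(2 * np.pi * t / 27)
X_true = np.outer(trend, [1.0, 0.8, 1.2, 0.5]) + np.outer(rotation, [0.3, 0.5, 0.2, 0.4])
X_gap = X_true.copy()
X_gap[100:130, 0] = np.nan
X_filled = two_scale_fill(X_gap, window=81, k_init=2, k_slow=1, k_fast=1)
```

Splitting the scales lets each component have its own number of modes. This matters when, as in the solar case, rotational and solar‑cycle variability do not share the same spectral dependence.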
Temporal coherence is further exploited by "embedding" the data matrix. Each record (column) is augmented with D − 1 time‑shifted copies of itself, forming an extended matrix of size N × (D × M), where N is the number of time samples, M the number of records, and D the embedding dimension. Applying SVD to this embedded matrix captures correlations across wavelengths and across time simultaneously. In practice, low embedding dimensions (D = 2–5) already yield substantial improvements while keeping computational costs modest.
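The embedding can be illustrated with a standard delay (trajectory) matrix, as used in singular spectrum analysis. This sketch drops the D − 1 incomplete windows at the end of the record, so the embedded matrix has N − D + 1 rows. After every reconstruction it averages the D copies of each sample, so the filled values stay consistent across lags. These choices, and the helper names `embed`, `unembed` and `embedded_svd_fill`, are assumptions for illustration, not a description of the authors' code.

```python
import numpy as np


def embed(Y, D):
    """Delay embedding: column block d holds Y(t + d); rows t = 0 .. N-D."""
    N = Y.shape[0]
    return np.hstack([Y[d:N - D + 1 + d] for d in range(D)])


def unembed(E, D, N, M):
    """Average the D copies of each sample back into an N x M matrix."""
    R = N - D + 1
    acc = np.zeros((N, M))
    count = np.zeros((N, 1))
    for d in range(D):
        acc[d:d + R] += E[:, d * M:(d + 1) * M]
        count[d:d + R] += 1
    return acc / count


def embedded_svd_fill(X, D, k, n_iter=1000):
    """Iterative SVD gap filling on the delay-embedded matrix.

    The copies of each sample are re-averaged after every reconstruction so
    that all lags stay consistent with a single filled time series.
    """
    X = np.asarray(X, dtype=float)
    N, M = X.shape
    missing = np.isnan(X)
    Y = np.where(missing, np.nanmean(X, axis=0), X)
    for _ in range(n_iter):
        U, s, Vt = np.linalg.svd(embed(Y, D), full_matrices=False)
        rec = unembed((U[:, :k] * s[:k]) @ Vt[:k, :], D, N, M)
        Y[missing] = rec[missing]
    return Y


# Usage: a single record cannot be filled by plain SVD (it is always rank 1),
# but a sinusoid becomes a rank-2 pattern once it is delay-embedded.
t = np.arange(300)
x = np.sin(2 * np.pi * t / 27)[:, None]
x_gap = x.copy()
x_gap[100:110] = np.nan
x_filled = embedded_svd_fill(x_gap, D=5, k=2, n_iter=3000)
```

The single‑record example isolates the temporal part of the coherence. With several records, the same code mixes wavelength and time correlations in one decomposition.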
The method is demonstrated on two real‑world cases. The first concerns the Solar EUV Monitor (SEM) on SOHO, which suffered a multi‑month data gap in 1998. SEM's 30.38 nm He II flux is highly correlated with three other proxies: the 10.7 cm radio flux (square‑root transformed), the Mg II core‑to‑wing index, and the Lyman‑α line intensity. Using all four series together, the authors set D = 4. They find that retaining K ≈ 5–6 modes minimizes the normalized reconstruction error; the embedded matrix has 4 × 4 = 16 columns, so 16 modes are possible. That error is about 10 % of the solar‑cycle amplitude, comparable to the intrinsic variability of the data. Synthetic gap tests, in which 5–10 % of the points are removed at random, confirm the optimal K and show that the method can reliably fill both short and long gaps.
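A synthetic gap test of this kind can be sketched as a simple cross‑validation over K. Hide a random fraction of the observed entries, refill them for each candidate K, and keep the K with the smallest error. The error measure here is an assumption: an RMS error divided by each record's standard deviation, whereas the summary above normalizes by the solar‑cycle amplitude. `choose_k` and its parameters are hypothetical.

```python
import numpy as np


def svd_fill(X, k, n_iter=300):
    """Iterative rank-k SVD gap filling with a fixed number of modes."""
    missing = np.isnan(X)
    Y = np.where(missing, np.nanmean(X, axis=0), X)
    for _ in range(n_iter):
        U, s, Vt = np.linalg.svd(Y, full_matrices=False)
        Y[missing] = ((U[:, :k] * s[:k]) @ Vt[:k, :])[missing]
    return Y


def choose_k(X, k_values, frac=0.05, n_trials=5, seed=0):
    """Pick K by hiding a random fraction of the observed entries and refilling them.

    Returns the K with the lowest mean normalised RMS error, and all scores.
    """
    rng = np.random.default_rng(seed)
    observed = np.argwhere(~np.isnan(X))
    n_hide = max(1, int(frac * len(observed)))
    # Use the same artificial gaps for every K so that the scores are comparable
    trials = [observed[rng.choice(len(observed), n_hide, replace=False)]
              for _ in range(n_trials)]
    scale = np.nanstd(X, axis=0)               # normalise errors record by record
    scores = {}
    for k in k_values:
        errors = []
        for idx in trials:
            rows, cols = idx[:, 0], idx[:, 1]
            X_test = X.copy()
            X_test[rows, cols] = np.nan
            Y = svd_fill(X_test, k)
            rel = (Y[rows, cols] - X[rows, cols]) / scale[cols]
            errors.append(np.sqrt(np.mean(rel ** 2)))
        scores[k] = float(np.mean(errors))
    best_k = min(scores, key=scores.get)
    return best_k, scores


# Usage: two coherent signals plus weak noise in six records
rng = np.random.default_rng(1)
t = np.linspace(0.0, 20.0, 200)
X = np.column_stack([np.sin(t), np.cos(0.3 * t)]) @ rng.normal(size=(2, 6))
X += 0.01 * rng.normal(size=X.shape)
best_k, scores = choose_k(X, k_values=[1, 2, 3, 4])
```

Too few modes miss coherent variability, while too many start fitting noise. The minimum of the score curve reflects that trade‑off.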
The second example applies the same framework to a set of solar activity proxies that contain numerous irregular gaps. Here, an embedding dimension D = 3–4 and K = 4–5 provide the best performance, yielding reconstructed time series that are virtually indistinguishable from the original measurements when evaluated with cross‑validation.
Key advantages of the approach are its simplicity (few parameters, straightforward MATLAB implementation), its ability to handle arbitrarily long gaps by leveraging information from other channels, and its natural extension to multiscale and temporally embedded data. The method also lends itself to building composite records from partially overlapping datasets, a common need in solar‑irradiance climatology.
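The composite use case can be sketched as follows. Each partly overlapping record becomes one column, with NaN outside its coverage. After filling, the column of a reference instrument spans the whole period and serves as the composite. The sketch assumes the records differ only by a multiplicative calibration factor, so the data matrix is rank 1 and K = 1 suffices. The record names, gains and coverage intervals are hypothetical.

```python
import numpy as np


def svd_fill(X, k, n_iter=1000):
    """Iterative rank-k SVD gap filling with a fixed number of modes."""
    missing = np.isnan(X)
    Y = np.where(missing, np.nanmean(X, axis=0), X)
    for _ in range(n_iter):
        U, s, Vt = np.linalg.svd(Y, full_matrices=False)
        Y[missing] = ((U[:, :k] * s[:k]) @ Vt[:k, :])[missing]
    return Y


# Hypothetical records: two instruments observe the same quantity over partly
# overlapping intervals with different gains; a proxy covers the full period.
t = np.arange(300)
signal = 2.0 + np.sin(2 * np.pi * t / 27) + 0.01 * t       # stays positive
record_a = np.where(t < 150, signal, np.nan)                 # early instrument
record_b = np.where(t >= 100, 2.0 * signal, np.nan)          # later instrument
proxy = 0.5 * signal                                         # full coverage
X = np.column_stack([record_a, record_b, proxy])

filled = svd_fill(X, k=1)      # all columns are proportional: rank 1
composite = filled[:, 0]       # full record expressed on instrument A's scale
```

Calibration offsets between instruments would add a constant mode (rank 2 uncentered). In that case a larger K, or the per‑record normalization discussed below, would be needed.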
Limitations include the reliance on linear correlations: during impulsive flare phases, where spectral behavior becomes highly non‑linear, the reconstruction may be less accurate. The normalization (subtracting each record's mean and dividing by its standard deviation) must be recomputed at each iteration, because the current estimates of the missing values change these statistics. Computational load grows with the product of the number of wavelengths, the length of the record, and the embedding dimension, which can become demanding for very large datasets.
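The per‑iteration normalization can be sketched as below. Standardizing is assumed here to keep records in very different units (e.g. radio flux versus a dimensionless index) from dominating the SVD purely through their magnitude. The mean and standard deviation are recomputed from the current gap‑filled matrix on every pass. The function name and example values are hypothetical.

```python
import numpy as np


def standardized_svd_fill(X, k, n_iter=500):
    """Iterative SVD gap filling on standardised records.

    The mean and standard deviation of every record are recomputed at each
    iteration, because the current estimates of the missing values change them.
    """
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    Y = np.where(missing, np.nanmean(X, axis=0), X)
    for _ in range(n_iter):
        mu, sigma = Y.mean(axis=0), Y.std(axis=0)
        Z = (Y - mu) / sigma                       # unit-free, unit-variance records
        U, s, Vt = np.linalg.svd(Z, full_matrices=False)
        Z_k = (U[:, :k] * s[:k]) @ Vt[:k, :]
        Y[missing] = (Z_k * sigma + mu)[missing]   # back to physical units
    return Y


# Usage: records with very different units and magnitudes (hypothetical values)
rng = np.random.default_rng(2)
t = np.linspace(0.0, 20.0, 200)
A = np.column_stack([np.sin(t), np.cos(0.3 * t)])
A -= A.mean(axis=0)                                # centred sources: centred data of rank 2
scale = np.array([1.0, 1e3, 1e-3, 1.0])
offset = np.array([10.0, 5000.0, 0.2, -3.0])
X_true = (A @ rng.normal(size=(2, 4))) * scale + offset
X_gap = X_true.copy()
X_gap[40:70, 1] = np.nan
X_filled = standardized_svd_fill(X_gap, k=2)
```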
Overall, the paper delivers a practical, statistically sound framework for gap filling in solar irradiance and proxy data, demonstrating that iterative SVD combined with multiscale decomposition and temporal embedding can achieve high‑fidelity reconstructions without extensive tuning. Future work could explore extensions to non‑linear relationships, real‑time implementation, and application to other astrophysical time‑series such as stellar spectra or planetary atmospheric measurements.