Large gaps imputation in remote sensed imagery of the environment
Imputation of missing data in large regions of satellite imagery is necessary when the acquired image has been damaged by shadows due to clouds, or information gaps produced by sensor failure. The general approach for imputation of missing data, that could not be considered missed at random, suggests the use of other available data. Previous work, like local linear histogram matching, take advantage of a co-registered older image obtained by the same sensor, yielding good results in filling homogeneous regions, but poor results if the scenes being combined have radical differences in target radiance due, for example, to the presence of sun glint or snow. This study proposes three different alternatives for filling the data gaps. The first two involves merging radiometric information from a lower resolution image acquired at the same time, in the Fourier domain (Method A), and using linear regression (Method B). The third method consider segmentation as the main target of processing, and propose a method to fill the gaps in the map of classes, avoiding direct imputation (Method C). All the methods were compared by means of a large simulation study, evaluating performance with a multivariate response vector with four measures: Q, RMSE, Kappa and Overall Accuracy coefficients. Difference in performance were tested with a MANOVA mixed model design with two main effects, imputation method and type of lower resolution extra data, and a blocking third factor with a nested sub-factor, introduced by the real Landsat image and the sub-images that were used. Method B proved to be the best for all criteria.
💡 Research Summary
The paper addresses the problem of filling large missing regions in satellite imagery, which commonly arise from cloud shadows, sensor failures, or other non‑random disturbances. Traditional approaches such as local linear histogram matching rely on an older co‑registered image from the same sensor. While effective for homogeneous areas, these methods break down when the radiometric conditions of the two scenes differ dramatically—for example, due to sun glint, snow cover, or seasonal changes. To overcome these limitations, the authors propose three distinct imputation strategies that make use of a lower‑resolution (LR) image captured simultaneously with the high‑resolution (HR) target image.
Method A – Fourier‑domain fusion: Both the LR and HR images are transformed into the frequency domain. The low‑frequency components (which carry overall brightness and contrast information) are taken from the LR image, while the high‑frequency components (which preserve fine spatial detail) are retained from the HR image. After inverse transforming, the resulting composite image supplies radiometric context for the missing HR pixels. This approach preserves spatial detail but can suffer from edge artifacts, phase misalignment, and loss of high‑frequency energy if the registration between LR and HR images is imperfect.
Method B – Linear regression based imputation: A pixel‑wise or block‑wise linear regression model is fitted between the LR and HR images using only the valid (non‑missing) pixels. The regression coefficients are then applied to the LR values that correspond to the missing HR locations, producing estimates for the missing radiance. The authors regularize the regression to avoid over‑fitting and validate the model with cross‑validation. Because the LR image is captured at the same time as the HR image, the regression captures the contemporaneous radiometric relationship, leading to robust performance across a variety of surface types. The main limitation is the linearity assumption, which may not hold for highly non‑linear reflectance phenomena such as water glint or complex vegetation canopies.
Method C – Class‑map based gap filling: Instead of directly estimating pixel values, this method first classifies the LR image into land‑cover categories (e.g., water, forest, urban). The missing HR region is then filled by assigning the most probable class label based on surrounding classified pixels, effectively imputing a class map rather than radiometric values. This strategy is advantageous when the downstream analysis focuses on classification rather than precise radiometry, but any misclassification propagates directly into the gap‑filled area, and it cannot recover continuous spectral information needed for tasks like temperature retrieval.
To evaluate the three methods, the authors conducted an extensive simulation study using real Landsat scenes. They artificially introduced missing blocks of various sizes and shapes, then applied each imputation technique. Performance was measured with four complementary metrics: Q (image quality index), RMSE (root‑mean‑square error), Kappa (agreement for categorical data), and Overall Accuracy (OA). Because the four metrics jointly describe both radiometric fidelity and classification quality, the authors employed a multivariate analysis of variance (MANOVA) with a mixed‑model design. The two fixed factors were “imputation method” and “type of LR auxiliary data,” while a nested blocking factor accounted for variability among the original Landsat scenes and their sub‑images.
Statistical testing revealed that Method B consistently outperformed the other two across all four metrics, with significant main effects (p < 0.01) in the MANOVA. Method A showed modest improvements over a naïve baseline but was inferior to Method B, especially in preserving high‑frequency detail near gap edges. Method C performed adequately for classification‑oriented tasks when the class boundaries were clear, yet its overall radiometric accuracy lagged behind the regression approach.
The paper’s contributions are threefold: (1) it demonstrates that contemporaneous lower‑resolution imagery can be leveraged effectively for gap filling, (2) it provides a rigorous comparative framework that includes both radiometric and thematic evaluation criteria, and (3) it validates the superiority of a simple linear regression model under realistic remote‑sensing conditions. Limitations include the reliance on simulated missing data rather than fully operational cloud‑masked products, and the linearity assumption inherent in Method B. Future work could explore non‑linear machine‑learning regressors, real‑world operational testing, and computational optimizations for near‑real‑time processing.
In summary, the study establishes linear regression‑based imputation (Method B) as the most reliable technique for restoring large missing regions in high‑resolution satellite images when a synchronized lower‑resolution observation is available, offering a practical solution for maintaining data continuity in environmental monitoring, land‑cover change detection, and related applications.
Comments & Academic Discussion
Loading comments...
Leave a Comment