A Machine Learning Approach to Forecasting Remotely Sensed Vegetation Health
Drought threatens food and water security around the world, and this threat is likely to become more severe under climate change. High resolution predictive information can help farmers, water managers, and others to manage the effects of drought. We have created an open source tool to produce short-term forecasts of vegetation health at high spatial resolution, using data that are global in coverage. The tool automates downloading and processing Moderate Resolution Imaging Spectroradiometer (MODIS) datasets, and training gradient-boosted machine models on hundreds of millions of observations to predict future values of the Enhanced Vegetation Index. We compared the predictive power of different sets of variables (raw spectral MODIS data and Level-3 MODIS products) in two regions with distinct agro-ecological systems, climates, and cloud coverage: Sri Lanka and California. Our tool provides considerably greater predictive power on held-out datasets than simpler baseline models.
💡 Research Summary
The paper presents an open‑source framework that automatically downloads, processes, and models MODIS satellite observations to produce short‑term, high‑resolution forecasts of vegetation health, measured by the Enhanced Vegetation Index (EVI). The authors built a pipeline that retrieves both raw spectral bands (36 bands) and Level‑3 derived products (e.g., NDVI, land‑surface temperature, leaf‑area index) from the Terra and Aqua sensors at 500 m spatial and 8‑day temporal resolution. After filtering on quality flags and linearly interpolating across gaps to mitigate cloud contamination, they construct time‑lagged features for each pixel from the previous four weeks of data, resulting in a dataset of hundreds of millions of observations.
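The gap filling and lagged-feature construction described above can be sketched for a single pixel as follows. This is a minimal illustration in plain Python, not the authors' code: the function names `interpolate_gaps` and `make_lagged_rows`, the toy EVI values, and the choice of four lags with a one-step horizon are all assumptions made for the example.

```python
from typing import List, Optional, Tuple


def interpolate_gaps(series: List[Optional[float]]) -> List[Optional[float]]:
    """Linearly fill None values that lie between two valid observations."""
    filled = list(series)
    valid = [i for i, v in enumerate(series) if v is not None]
    for left, right in zip(valid, valid[1:]):
        span = right - left
        for i in range(left + 1, right):
            weight = (i - left) / span
            filled[i] = series[left] + weight * (series[right] - series[left])
    # Gaps before the first or after the last valid value stay None.
    return filled


def make_lagged_rows(
    series: List[Optional[float]], n_lags: int, horizon: int
) -> List[Tuple[List[float], float]]:
    """Pair each window of n_lags past values with the value `horizon` steps ahead."""
    rows = []
    for t in range(n_lags - 1, len(series) - horizon):
        window = series[t - n_lags + 1 : t + 1]
        target = series[t + horizon]
        # Skip rows that still contain unfilled gaps.
        if target is None or any(v is None for v in window):
            continue
        rows.append((window, target))
    return rows


# Toy EVI series for one pixel; None marks a cloud-masked composite (hypothetical values).
evi = [0.30, None, 0.40, 0.42, None, None, 0.36, 0.35]
filled = interpolate_gaps(evi)
rows = make_lagged_rows(filled, n_lags=4, horizon=1)
for features, target in rows:
    print([round(v, 3) for v in features], "->", round(target, 3))
```

In a full pipeline the same windowing would run over every pixel and every input variable, which is how a modest number of scenes expands into hundreds of millions of training rows.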
For predictive modeling they employ Gradient‑Boosted Machines (GBM) via the XGBoost library. Hyper‑parameters are tuned with Bayesian optimization (Optuna) and a five‑fold temporal cross‑validation scheme to avoid over‑fitting. The dataset is split into 70 % training, 15 % validation, and 15 % hold‑out test sets. The authors evaluate three feature configurations: (1) raw spectral bands only, (2) Level‑3 products only, and (3) a combined set of both.
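One way to realize a 70/15/15 split and a five-fold temporal cross-validation without leaking future data into training is sketched below. The summary does not say how the folds are built, so the expanding-window scheme here is an assumption, and the helper names `chronological_split` and `expanding_window_folds` are hypothetical. The indices stand for time steps (for example, 8-day composites) in chronological order.

```python
from typing import List, Tuple


def chronological_split(
    n_steps: int, train_pct: int = 70, val_pct: int = 15
) -> Tuple[List[int], List[int], List[int]]:
    """Split time-step indices into train, validation, and test blocks in time order."""
    train_end = n_steps * train_pct // 100
    val_end = n_steps * (train_pct + val_pct) // 100
    idx = list(range(n_steps))
    return idx[:train_end], idx[train_end:val_end], idx[val_end:]


def expanding_window_folds(
    n_steps: int, n_folds: int = 5
) -> List[Tuple[List[int], List[int]]]:
    """Build folds where each validation block follows all of its training data."""
    fold_size = n_steps // (n_folds + 1)
    folds = []
    for k in range(1, n_folds + 1):
        train_end = k * fold_size
        # The last fold absorbs any leftover steps.
        val_end = n_steps if k == n_folds else train_end + fold_size
        folds.append((list(range(train_end)), list(range(train_end, val_end))))
    return folds


train, val, test = chronological_split(100)
print(len(train), len(val), len(test))

# Cross-validate inside the training block only, so the hold-out set stays untouched.
for fold_train, fold_val in expanding_window_folds(len(train)):
    print(f"train 0..{fold_train[-1]}  validate {fold_val[0]}..{fold_val[-1]}")
```

Splitting by time rather than at random matters here because neighbouring composites of the same pixel are strongly correlated; a random split would let the model see observations on both sides of a test date.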
Two contrasting agro‑ecological regions are used as case studies: Sri Lanka, characterized by tropical monsoon climate and frequent cloud cover, and California, which exhibits Mediterranean and semi‑arid climates. Baseline models include simple linear regression, ARIMA, and k‑nearest‑neighbors regression. Performance is assessed with root‑mean‑square error (RMSE), mean absolute error (MAE), and coefficient of determination (R²).
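For reference, the three metrics can be computed as in the sketch below, using their standard definitions in plain Python. The observed and predicted EVI values are made up for illustration and are not taken from the paper.

```python
import math
from typing import Sequence


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root-mean-square error: penalizes large errors more than small ones."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error: average magnitude of the errors."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 minus residual variance over total variance."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical observed and forecast EVI values for four pixels.
observed = [0.30, 0.40, 0.50, 0.60]
forecast = [0.32, 0.38, 0.53, 0.57]
print(f"RMSE={rmse(observed, forecast):.4f}  "
      f"MAE={mae(observed, forecast):.4f}  "
      f"R2={r2(observed, forecast):.3f}")
```

RMSE and MAE are in EVI units, so they are directly comparable across models on the same region, while R² expresses how much of the observed variance the forecast explains.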
Results show that the combined feature set consistently outperforms the baselines and the single‑source configurations. In Sri Lanka the best model attains R² = 0.68 and RMSE = 0.042, while in California R² = 0.73 and RMSE = 0.038, representing roughly a 30 % reduction in RMSE relative to the baselines. Feature‑importance analysis reveals that the current EVI value, near‑infrared reflectance, soil‑moisture proxies, and land‑surface temperature are the most influential predictors, confirming their relevance for early detection of drought stress. The models maintain comparable accuracy for both 4‑week and 8‑week ahead forecasts and demonstrate spatial and temporal transferability when evaluated on an independent post‑2018 test period.
The entire workflow (data acquisition scripts, preprocessing routines, model training code, and Docker containers) is released on GitHub under an open‑source license, with detailed documentation so that non‑specialists can generate forecasts for any region covered by MODIS. The authors argue that such readily accessible, high‑resolution vegetation forecasts can support farmers, water‑resource managers, and policymakers in making proactive decisions to mitigate the impacts of drought under a changing climate. For future work, they suggest integrating climate model outputs for longer‑term forecasting and benchmarking the GBM approach against deep‑learning time‑series models.