Predicting Onsets and Dry Spells of the West African Monsoon Season Using Machine Learning Methods

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the original arXiv paper.

The beginning of the rainy season and the occurrence of dry spells in West Africa are notoriously difficult to predict; however, these are the key indicators farmers use to decide when to plant crops, and they have a major influence on overall yield. While many studies have shown correlations between global sea surface temperatures and characteristics of the West African monsoon season, few have effectively incorporated this information into machine learning (ML) prediction models. In this study we investigated the best ways to define our target variables, onset and dry spell, and developed methods to predict them for upcoming seasons using sea surface temperature teleconnections. Defining our target variables required combining two well-known definitions of onset. We then applied custom statistical techniques, such as total variation regularization and predictor selection, to the two models we constructed: a linear model and an adaptive-threshold logistic regression model. We found mixed results for onset prediction: spatial verification showed signs of significant skill, while temporal verification showed little to none. For dry spells, however, we found significant accuracy across multiple binary classification metrics. These models overcome some limitations of current approaches, such as being computationally intensive and needing bias correction. We also present this study as a framework for using ML methods for targeted prediction of specific weather phenomena from climatologically relevant variables. As ML techniques are applied to more problems, we see clear benefits for fields like meteorology, and we lay out a few new directions for further research.


💡 Research Summary

This paper tackles two agriculturally critical aspects of the West African Monsoon (WAM): the onset date of the rainy season and the occurrence of post-onset dry spells. While numerical weather prediction (NWP) models dominate current forecasting efforts, they are computationally expensive, are hampered by sparse ground observations in West Africa, and often require bias correction, which can introduce data leakage. The authors propose a low-cost machine-learning (ML) framework that relies solely on sea-surface temperature (SST) teleconnections, offering lead times of up to six months.

Data and Target Definition
The SST predictor set is built from six oceanic regions known to influence West African climate (Atlantic, North Atlantic, Gulf of Guinea, Indian Ocean, Pacific, Mediterranean). Monthly SST averages for September–December of the preceding year and January–March of the target year are extracted, yielding 42 predictors (6 regions × 7 months). SST data come from ERA5 (1981–2024, 44 years) and the CESM2 climate model (1935–1980, 46 years), providing 90 annual samples in total. Precipitation data combine CHIRPS satellite observations with CESM2 simulated precipitation, interpolated to a common 1°×1° grid covering the study area (8° N–28° N, 12° W–16° E).
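A minimal sketch of the predictor layout described above, assuming a hypothetical dictionary `sst` keyed by `(region, year, month)` that already holds regional-mean SSTs. The region boxes, the data loading, and the way ERA5 and CESM2 years are merged are not reproduced here.

```python
import numpy as np

# Region labels from the text; the actual region boundaries are not reproduced here.
REGIONS = ["Atlantic", "North Atlantic", "Gulf of Guinea",
           "Indian Ocean", "Pacific", "Mediterranean"]
# (year offset, month): Sep-Dec of the preceding year, then Jan-Mar of the target year.
MONTHS = [(-1, 9), (-1, 10), (-1, 11), (-1, 12), (0, 1), (0, 2), (0, 3)]


def build_predictors(sst, years):
    """Return a (len(years), 42) array of lagged regional SST means.

    sst: hypothetical dict mapping (region, year, month) -> regional-mean SST.
    Columns are ordered region-major, then month within each region.
    """
    rows = [[sst[(r, y + off, m)] for r in REGIONS for (off, m) in MONTHS]
            for y in years]
    return np.asarray(rows, dtype=float)
```

Calling it with `years = range(1935, 2025)` gives a 90 × 42 design matrix (46 CESM2 years followed by 44 ERA5 years), which also makes the small-sample concern concrete: fewer than three samples per predictor.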

Defining the onset and dry spell is non-trivial; the literature lists at least 18 definitions. The authors adopt a hybrid approach. First, a “search start date” is set 30 days before the climatological onset, taken as the minimum of the cumulative daily-rainfall anomaly curve built from long-term daily precipitation means (following Liebmann et al., 2012). Second, a fuzzy-logic rule-based detection (Marteau et al., 2009; Laux et al., 2008) is applied from that date onward. The rule looks for a 5-day window in which (1) cumulative rainfall is at least N mm and (2) at least C of the days are “wet” (≥ 1 mm). Rather than acting as hard thresholds, the 5-day total and the wet-day count are mapped by linear interpolation to membership scores γ₁ and γ₂ between 0 and 1. An onset is declared when γ₁·γ₂ ≥ γₜ (γₜ = 0.5). A dry spell is recorded if, within the 30 days following onset, there is at least one 7-day window with total rainfall < 5 mm. Missing onsets (rare: 5 out of 224 grid cells) are replaced by the latest onset detected in the domain that year.
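The detection rules can be sketched as follows. This is a minimal illustration, not the authors' code: the ramp bounds standing in for N and C (10 to 20 mm, and 1 to 3 wet days), starting the 5-day window on the candidate onset day, and starting the 30-day dry-spell search on the day after onset are all hypothetical choices. `anomalous_accumulation_min` follows the anomalous-accumulation idea of Liebmann et al. (2012): the climatological onset is where the running sum of daily-mean rainfall minus its annual mean reaches its minimum.

```python
import numpy as np


def anomalous_accumulation_min(clim_daily):
    """Index of the minimum of the cumulative climatological rainfall anomaly."""
    clim = np.asarray(clim_daily, dtype=float)
    return int(np.argmin(np.cumsum(clim - clim.mean())))


def ramp(x, lo, hi):
    """Linear fuzzy membership: 0 at or below lo, 1 at or above hi."""
    return float(np.clip((x - lo) / (hi - lo), 0.0, 1.0))


def fuzzy_onset(rain, start, n_bounds=(10.0, 20.0), c_bounds=(1, 3),
                wet_mm=1.0, gamma_t=0.5):
    """First day d >= start whose window rain[d:d+5] gives gamma1 * gamma2 >= gamma_t.

    n_bounds and c_bounds are hypothetical stand-ins for the N and C thresholds.
    Returns None when no onset is found (a "missing onset").
    """
    rain = np.asarray(rain, dtype=float)
    for d in range(max(start, 0), len(rain) - 4):
        window = rain[d:d + 5]
        g1 = ramp(window.sum(), *n_bounds)                      # 5-day total
        g2 = ramp(np.count_nonzero(window >= wet_mm), *c_bounds)  # wet-day count
        if g1 * g2 >= gamma_t:
            return d
    return None


def has_dry_spell(rain, onset, horizon=30, spell=7, max_mm=5.0):
    """True if any 7-day window in the 30 days after onset totals < 5 mm."""
    seg = np.asarray(rain, dtype=float)[onset + 1:onset + 1 + horizon]
    return any(seg[i:i + spell].sum() < max_mm
               for i in range(len(seg) - spell + 1))
```

For a grid cell with climatological daily means `clim` and one season's daily rainfall `rain`, the search would start at `max(anomalous_accumulation_min(clim) - 30, 0)`. The product γ₁·γ₂ means a window that only partly satisfies both criteria can still trigger onset, which is the point of replacing hard thresholds with memberships.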

Exploratory Analysis
A comparison of CHIRPS and CESM2 precipitation shows a negligible mean bias (−0.1 mm) but a mean absolute difference of about 2.8 mm, meaning that positive and negative differences largely cancel on average; the authors judge the two sources compatible for the purpose of defining targets. Spatial patterns of mean onset dates follow the northward migration of the ITCZ, supporting the climatological plausibility of the derived labels.
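The two comparison numbers answer different questions, which a small sketch makes explicit. It assumes two co-located precipitation arrays on the common grid; which source is subtracted from which is a hypothetical convention here.

```python
import numpy as np


def compare_sources(a, b):
    """Mean bias and mean absolute difference between two co-located arrays.

    Positive and negative differences cancel in the bias but not in the
    absolute difference. NaNs (e.g. masked cells) are ignored.
    """
    diff = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    return float(np.nanmean(diff)), float(np.nanmean(np.abs(diff)))
```

For example, `compare_sources([0, 4, 0, 4], [2, 2, 2, 2])` has zero bias but a mean absolute difference of 2, the same pattern as a near-zero bias alongside a 2.8 mm absolute difference.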

Modeling Strategy
Two predictive models are built:

  1. Linear regression with total-variation (TV) regularization, which encourages spatial smoothness in the coefficient field, combined with predictor selection.
  2. Adaptive‑threshold logistic regression, where the decision threshold is learned jointly with the coefficients, allowing the model to adapt to class imbalance inherent in dry‑spell detection.
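The total-variation penalty in the linear model can be sketched as a generic graph-TV (fused-lasso-type) least-squares problem, minimize ½‖y − Xb‖² + λ‖Db‖₁, solved here with scaled-form ADMM. The difference matrix `D`, the values of `lam` and `rho`, and the choice of ADMM are assumptions for illustration. On a 2-D grid, `D` would hold one row per pair of neighboring cells; the paper's actual coefficient layout and optimizer are not reproduced here.

```python
import numpy as np


def soft_threshold(v, t):
    """Elementwise shrinkage toward zero: the proximal operator of t * |.|_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def first_difference(p):
    """(p-1) x p matrix with rows e_{i+1} - e_i: neighbors along a 1-D chain."""
    return np.diff(np.eye(p), axis=0)


def tv_regression(X, y, D, lam, rho=10.0, n_iter=2000):
    """Minimize 0.5*||y - X b||^2 + lam*||D b||_1 with scaled-form ADMM.

    Splitting z = D b:
      b-step: (X^T X + rho D^T D) b = X^T y + rho D^T (z - u)
      z-step: z = soft_threshold(D b + u, lam / rho)
      u-step: u = u + D b - z
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    z = np.zeros(D.shape[0])
    u = np.zeros(D.shape[0])
    A = X.T @ X + rho * D.T @ D  # constant across iterations
    Xty = X.T @ y
    b = np.zeros(X.shape[1])
    for _ in range(n_iter):
        b = np.linalg.solve(A, Xty + rho * D.T @ (z - u))
        Db = D @ b
        z = soft_threshold(Db + u, lam / rho)
        u = u + Db - z
    return b
```

With `lam=0` the iterations converge to ordinary least squares; as `lam` grows, neighboring coefficients are pulled together until the fit becomes piecewise constant. Because `A` never changes, a production version would factor it once (e.g. Cholesky) rather than calling `np.linalg.solve` every step.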

Both models are trained and evaluated using leave-one-out cross-validation (LOOCV): each year is predicted by a model fitted on the remaining years, so no year's own data informs its prediction.
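A minimal LOOCV sketch for the dry-spell classifier, assuming a hypothetical predictor matrix `X` (years × SST predictors) and binary labels `y`. The summary describes the threshold as learned jointly with the coefficients; as a simpler stand-in, this sketch fits an L2-penalized logistic regression by gradient descent and then picks the threshold that maximizes balanced accuracy on the training fold. Standardization, fitting, and threshold choice all happen inside each fold.

```python
import numpy as np


def fit_logistic(X, y, l2=1.0, lr=0.5, n_iter=1000):
    """L2-penalized logistic regression fitted by plain gradient descent."""
    Xb = np.column_stack([np.ones(len(X)), X])  # prepend intercept column
    w = np.zeros(Xb.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-Xb @ w))
        grad = Xb.T @ (p - y) / len(y)
        grad[1:] += l2 * w[1:] / len(y)  # intercept is not penalized
        w -= lr * grad
    return w


def predict_proba(w, X):
    Xb = np.column_stack([np.ones(len(X)), X])
    return 1.0 / (1.0 + np.exp(-Xb @ w))


def best_threshold(p, y, grid=np.linspace(0.05, 0.95, 19)):
    """Hypothetical stand-in for the adaptive threshold: maximize balanced accuracy."""
    def balanced_accuracy(t):
        pred = p >= t
        tpr = np.mean(pred[y == 1]) if np.any(y == 1) else 0.0
        tnr = np.mean(~pred[y == 0]) if np.any(y == 0) else 0.0
        return 0.5 * (tpr + tnr)
    return max(grid, key=balanced_accuracy)


def loocv_predict(X, y):
    """Leave-one-year-out: each year is classified without using its own data."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=int)
    out = np.empty(len(y), dtype=int)
    for i in range(len(y)):
        train = np.arange(len(y)) != i
        mu, sd = X[train].mean(axis=0), X[train].std(axis=0) + 1e-12
        Xtr, Xte = (X[train] - mu) / sd, (X[i:i + 1] - mu) / sd
        w = fit_logistic(Xtr, y[train])
        t = best_threshold(predict_proba(w, Xtr), y[train])
        out[i] = int(predict_proba(w, Xte)[0] >= t)
    return out
```

Choosing the threshold and the standardization statistics per fold, rather than once on all 90 years, matters: tuning either on the full dataset would let each held-out year influence its own decision rule.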

Results
Onset prediction: Spatial verification (agreement between predicted and observed onset dates across grid cells) shows modest but statistically significant skill in certain regions, indicating that SST anomalies contain some spatial information about when the monsoon will arrive. However, temporal verification (agreement between predicted and observed onset dates from year to year) yields near-random performance, suggesting that the SST-only approach cannot capture the inter-annual variability of onset timing.
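One common way to make the spatial/temporal distinction concrete is sketched below, assuming predicted and observed onset dates stored as hypothetical `(years, cells)` arrays and Pearson correlation as the skill score; the paper's exact verification scores are not reproduced here. The synthetic test also illustrates why the two can diverge: a shared north-south climatological gradient alone yields high spatial skill even when year-to-year anomalies are uncorrelated.

```python
import numpy as np


def _corr(a, b, axis):
    """Pearson correlation of a and b along the given axis."""
    a = a - a.mean(axis=axis, keepdims=True)
    b = b - b.mean(axis=axis, keepdims=True)
    num = (a * b).sum(axis=axis)
    den = np.sqrt((a ** 2).sum(axis=axis) * (b ** 2).sum(axis=axis))
    return num / den


def spatial_skill(pred, obs):
    """Pattern correlation across grid cells: one value per year."""
    return _corr(np.asarray(pred, dtype=float), np.asarray(obs, dtype=float), axis=1)


def temporal_skill(pred, obs):
    """Correlation across years: one value per grid cell."""
    return _corr(np.asarray(pred, dtype=float), np.asarray(obs, dtype=float), axis=0)
```

A model that only reproduces the climatological onset gradient would score well on the first function and near zero on the second, which is consistent with the mixed onset results reported here.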

Dry‑spell prediction: Binary classification metrics are substantially better. Accuracy, precision, recall, and F1 scores hover around 0.75–0.80 across the domain, demonstrating that SST teleconnections are more directly linked to the likelihood of a post‑onset dry spell than to the exact onset date.
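Because dry-spell and no-dry-spell seasons need not be balanced, accuracy alone can look good for a classifier that mostly predicts the majority class; precision, recall, and F1 expose that. A minimal pure-Python sketch of the four metrics, assuming labels coded 1 = dry spell and 0 = no dry spell (a hypothetical convention):

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 from paired 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": (tp + tn) / len(pairs), "precision": precision,
            "recall": recall, "f1": f1}
```

With `y_true = [1, 1, 1, 0, 0, 0, 1, 0]` and `y_pred = [1, 1, 0, 0, 0, 1, 1, 0]` there are 3 true positives, 3 true negatives, 1 false positive and 1 false negative, so all four metrics equal 0.75.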

Discussion
The study highlights several strengths: (i) a computationally cheap pipeline that can be run on modest hardware, (ii) out-of-sample evaluation through LOOCV, so that no year is predicted by a model trained on its own data, and (iii) a demonstration that SST alone can reliably forecast dry spells several months in advance. Limitations include the relatively small sample size (n = 90) compared to the number of predictors (42), potential over-fitting despite regularization, and the reliance on a complex, partially ad-hoc definition of onset that still leaves some missing values. Moreover, the exclusive use of SST ignores other potentially informative predictors such as atmospheric pressure fields, soil moisture, or vegetation indices.

Future Directions
The authors propose extending the framework by (a) incorporating additional climate variables (e.g., geopotential height, wind anomalies), (b) exploring deep learning architectures (LSTM, Temporal Convolutional Networks) to capture temporal dependencies, and (c) developing operational decision‑support tools that deliver region‑specific planting advisories to farmers. Such extensions could improve onset prediction and further solidify the role of ML as a complementary tool to traditional NWP in data‑sparse regions.

In summary, this work provides a proof‑of‑concept that machine‑learning models based on global SST teleconnections can predict post‑onset dry spells with useful skill, while also outlining the challenges and opportunities for improving onset forecasts in West Africa.

