Radar-Based Raindrop Size Distribution Prediction: Comparing Analytical, Neural Network, and Decision Tree Approaches
Reliable estimation of the raindrop size distribution (RSD) is important for applications including quantitative precipitation estimation, soil erosion modelling, and wind turbine blade erosion. While in situ instruments such as disdrometers provide detailed RSD measurements, they are spatially limited, motivating the use of polarimetric radar for remote retrieval of rain microphysical properties. This study presents a comparative evaluation of analytical and machine-learning approaches for retrieving RSD parameters from polarimetric radar observables. One-minute OTT Parsivel2 disdrometer measurements collected between September 2020 and May 2022 at Sheepdrove Farm, UK, were quality-controlled using collocated weighing and tipping-bucket rain gauges. Measured RSDs were fitted to a normalised three-parameter gamma distribution, from which a range of polarimetric radar variables was analytically simulated. Analytical retrieval equations, neural networks, and decision tree models were then applied to estimate the gamma distribution parameters across multiple radar feature sets and model architectures. To assess robustness and equifinality, each model configuration was trained 100 times using random 70/30 train-test splits, yielding approximately 17,000 trained models in total. Machine-learning approaches generally outperform analytical methods; however, no single model class or architecture is uniformly optimal. Model performance depends strongly on both the target RSD parameter and the available radar observables, with decision trees showing particular robustness in reduced-feature regimes. These results highlight the importance of aligning retrieval model structure with operational data constraints rather than adopting a single universal approach.
💡 Research Summary
This paper presents a systematic comparison of analytical, neural-network, and decision-tree approaches for retrieving raindrop size distribution (RSD) parameters from polarimetric radar observations. One-minute measurements from an OTT Parsivel2 disdrometer, collected between September 2020 and May 2022 at Sheepdrove Farm, UK, were quality-controlled against three collocated rain gauges. The measured drop-size spectra were fitted to a normalized three-parameter gamma distribution (N_w, D_0, μ). From the fitted gamma parameters, a comprehensive set of dual-polarimetric radar variables (horizontal/vertical reflectivity, differential reflectivity, specific differential phase, correlation coefficient, linear depolarization ratio, etc.) was analytically simulated, yielding roughly 10,000 synthetic radar samples.
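The normalized three-parameter gamma form referenced here follows the standard parameterization from the radar-meteorology literature. A minimal illustrative sketch (not the authors' code; the example parameter values are arbitrary but typical):

```python
import numpy as np
from math import gamma as gamma_fn

def normalized_gamma_rsd(D, N_w, D_0, mu):
    """Normalized three-parameter gamma RSD, N(D).

    D   : drop diameter in mm
    N_w : normalized intercept parameter [m^-3 mm^-1]
    D_0 : median volume diameter [mm]
    mu  : dimensionless shape parameter
    """
    # f(mu) normalizes the shape so that N_w is comparable across mu values;
    # f(0) = 1, recovering a simple exponential distribution.
    f_mu = (6.0 / 3.67**4) * (3.67 + mu) ** (mu + 4) / gamma_fn(mu + 4)
    return N_w * f_mu * (D / D_0) ** mu * np.exp(-(3.67 + mu) * D / D_0)

# Concentration at D = 1 mm for typical stratiform-rain values
n = normalized_gamma_rsd(1.0, N_w=8000.0, D_0=1.25, mu=3.0)
```

With μ = 0 the normalization factor equals 1 and the distribution reduces to an exponential, which is a quick sanity check on the implementation.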
Four radar-feature configurations were defined, ranging from the full eight-variable set down to minimal subsets (e.g., only Z_h and Z_dr). Three model families were compared: (1) analytical retrievals based on closed-form equations from the literature (using Z_h and Z_dr for N_w and D_0, with μ fixed); (2) multilayer perceptrons with 1–3 hidden layers and 22–128 neurons per layer, using ReLU/tanh activations, the Adam optimizer, and early stopping; (3) CART-based decision trees with leaf counts of 6, 12, 24, and 36, plus gradient-boosted and random-forest ensembles. Each configuration was trained 100 times on random 70/30 train-test splits, yielding approximately 17,000 trained models. Performance metrics (RMSE, MAE, R²) were computed for each RSD parameter.
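The repeated random-split protocol can be sketched with scikit-learn. This is purely illustrative: synthetic data stands in for the simulated radar features, and the 36-leaf tree echoes one of the configurations above; it is not the authors' pipeline.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
# Synthetic stand-in: 2 features (e.g. Z_h, Z_dr) -> one target (e.g. D_0)
X = rng.uniform(size=(1000, 2))
y = X[:, 0] + 0.5 * X[:, 1] + 0.05 * rng.normal(size=1000)

rmses = []
for seed in range(100):  # 100 random 70/30 splits, as in the paper
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.3, random_state=seed)
    model = DecisionTreeRegressor(max_leaf_nodes=36, random_state=seed)
    model.fit(X_tr, y_tr)
    pred = model.predict(X_te)
    rmses.append(mean_squared_error(y_te, pred) ** 0.5)

# The spread of scores across splits is what quantifies equifinality
print(f"RMSE mean={np.mean(rmses):.3f}, std={np.std(rmses):.3f}")
```

The key design point is that the split seed varies while the model configuration stays fixed, so the score distribution isolates sensitivity to the data partition rather than to hyperparameters.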
The results show that machine-learning models outperform the analytical method overall, but the "best" model depends on the target parameter and the available radar observables. Neural networks achieve the lowest errors when rich feature sets are available (e.g., D_0 RMSE ≈ 0.09 mm, N_w error ≈ 5%). The shape parameter μ, however, remains difficult for all approaches (average absolute error 0.35–0.48). When only limited observables such as Z_h and Z_dr are available, decision trees are markedly more robust: a 36-leaf tree estimates D_0 with ≈ 0.12 mm mean error and shows the smallest variability across the 100 random splits. Increasing neural-network depth improves fit on the training data but raises the risk of overfitting, mitigated by early stopping and L2 regularization. Decision-tree performance improves with leaf count up to 24–36 leaves, which balances bias and variance.
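Early stopping and L2 regularization, as described above, map directly onto standard scikit-learn options. A hypothetical configuration (synthetic features stand in for the eight radar observables; the layer sizes are one point in the reported 1–3 layer sweep, not the authors' chosen setup):

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(1)
X = rng.uniform(size=(1000, 8))                 # stand-in for 8 radar observables
y = X @ rng.uniform(size=8) + 0.05 * rng.normal(size=1000)

# early_stopping holds out 10% of the training data and halts when the
# validation score stops improving; alpha is the L2 penalty strength.
mlp = MLPRegressor(hidden_layer_sizes=(64, 64), activation="relu",
                   solver="adam", alpha=1e-4, early_stopping=True,
                   max_iter=2000, random_state=1)
mlp.fit(X, y)
score = mlp.score(X, y)   # R^2 on the training data
```

Both mechanisms cap effective model capacity, which is why deeper networks in the study could be grown without the training-set fit translating into test-set degradation.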
The study concludes that no single retrieval architecture is universally optimal. Operational deployments should match model complexity to the available radar product suite: simple, interpretable decision trees are preferable under constrained data, while richer neural-network architectures excel when full polarimetric suites are available. Moreover, the extensive random-split training protocol offers a practical way to quantify model equifinality and uncertainty, which is essential for reliable real-world applications.