Global solar irradiation prediction using a multi-gene genetic programming approach


In this paper, a nonlinear symbolic regression technique based on an evolutionary algorithm known as multi-gene genetic programming (MGGP) is applied to data-driven modelling of the relationship between dependent and independent variables. The technique is used to model measured global solar irradiation and is validated through numerical simulations. The proposed modelling technique shows improved results over the fuzzy logic and artificial neural network (ANN) based approaches attempted by contemporary researchers. The method yields nonlinear analytical expressions, unlike neural networks, which are essentially a black-box modelling approach. This additional flexibility is an advantage from the modelling perspective and helps to discern the important variables that affect the prediction. Due to the evolutionary nature of the algorithm, it is able to get out of local minima and converge to a global optimum, unlike the back-propagation (BP) algorithm used for training neural networks. This results in a better percentage fit than those obtained using neural networks by contemporary researchers. In addition, a hold-out cross validation is carried out on the obtained genetic programming (GP) results, which shows that the results generalize well to new data and do not over-fit the training samples. The multi-gene GP results are compared with those obtained using its single-gene version, and also with four classical regression models, in order to show the effectiveness of the adopted approach.

💡 Research Summary

This paper presents a data‑driven approach for predicting global solar irradiation (GSI) by employing Multi‑Gene Genetic Programming (MGGP), a symbolic regression technique that evolves both the structure and parameters of nonlinear models. The authors use monthly meteorological records from the Indian Meteorological Department (IMD), including measured global solar irradiation, sunshine duration, and station altitude, to train and validate their models.

The study begins with a comprehensive literature review, highlighting that most previous works rely on fuzzy logic, artificial neural networks (ANNs), particle swarm optimization (PSO), or traditional statistical regressions such as Angstrom, AR, ARMA, and linear/polynomial models. While these methods can achieve high predictive accuracy, they are typically black‑box in nature, require a priori selection of model form, and often suffer from local minima or over‑fitting.

To address these shortcomings, the authors adopt MGGP, which differs from the conventional single‑gene GP (SGGP) by allowing multiple “genes” (individual expression trees) to be linearly combined into a final predictive equation. This multi‑gene architecture reduces model complexity while preserving expressive power, enabling the discovery of compact analytical formulas that can be evaluated manually or embedded in simple calculators.
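The linear combination of genes described above can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the three genes are hypothetical expressions (not the ones evolved in the paper), the inputs are taken to be sunshine duration `s` and altitude `h`, and the weights are fitted by ordinary least squares through the normal equations, which is a common choice in MGGP implementations but is not confirmed by this summary.

```python
import math

# Hypothetical genes: each maps (sunshine duration s, altitude h) to a number.
# They are illustrative only and are NOT the expressions evolved in the paper.
GENES = [
    lambda s, h: math.sin(s),
    lambda s, h: s * h,
    lambda s, h: math.exp(-h),
]

def design_row(s, h):
    """Bias term followed by the output of every gene for one sample."""
    return [1.0] + [gene(s, h) for gene in GENES]

def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting.

    Assumes the system is non-singular (no zero pivot is handled).
    """
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def fit_gene_weights(samples, targets):
    """Least-squares weights from the normal equations (X^T X) w = X^T y."""
    rows = [design_row(s, h) for s, h in samples]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(k)]
    return solve_linear(xtx, xty)

def predict(weights, s, h):
    """Final model: bias plus the weighted sum of gene outputs."""
    return sum(w * v for w, v in zip(weights, design_row(s, h)))
```

In this sketch the evolutionary search would only have to discover the gene structures; the bias and gene weights follow from a single linear solve, which is one reason a multi-gene model can stay compact.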

Key methodological details:

Function set: {+, -, *, /, sin, cos, exp, log}.
Genetic parameters: 500 generations, crossover probability 0.9, mutation probability 0.1, maximum tree depth 6, and at most 5 genes per individual; the population size is not explicitly stated.
Fitness evaluation: Root Mean Square Error (RMSE) and coefficient of determination (R²) computed on the training set; a hold‑out validation (70 % training, 15 % validation, 15 % test) is used to assess generalization.
Model selection: Pareto‑front analysis balances error against model complexity (number of genes/terms).
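The evaluation steps listed above (hold-out split, RMSE, R², and Pareto-front selection) can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function names, the random shuffling with a fixed seed, and the representation of a candidate model as a `(name, error, complexity)` tuple are all hypothetical.

```python
import math
import random

def holdout_split(records, fractions=(0.70, 0.15, 0.15), seed=0):
    """Shuffle and split into training, validation, and test subsets.

    The test subset receives whatever remains after the first two cuts.
    """
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    n = len(shuffled)
    n_train = round(fractions[0] * n)
    n_val = round(fractions[1] * n)
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])

def rmse(y_true, y_pred):
    """Root mean square error."""
    total = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(total / len(y_true))

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def pareto_front(models):
    """Keep the models that no other model dominates.

    A model is dominated when another one is at least as good in both
    error and complexity and strictly better in at least one of them.
    """
    front = []
    for name, err, cx in models:
        dominated = any(
            e <= err and c <= cx and (e < err or c < cx)
            for _, e, c in models
        )
        if not dominated:
            front.append((name, err, cx))
    return front
```

A final model would then be picked from the returned front by trading a small increase in error for fewer genes or terms; how that trade-off is weighted is a design choice this summary does not specify.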

Results show that the best MGGP model achieves R² ≈ 0.96 and RMSE ≈ 0.12 MJ m⁻² day⁻¹ on the test set, outperforming SGGP (R² ≈ 0.91, RMSE ≈ 0.18), ANN trained with back‑propagation (R² ≈ 0.89, RMSE ≈ 0.22), fuzzy‑logic approaches (R² ≈ 0.84), and several classical regression models (R² ≈ 0.78–0.84). The MGGP‑derived equation consists of only three to four genes, each representing a simple nonlinear combination of the input variables, making it readily interpretable.

The authors emphasize two major advantages of MGGP:

Global search capability – The high crossover rate and evolutionary operators enable the algorithm to escape local minima that often trap gradient‑based ANN training.
Interpretability – The resulting analytical expression can be examined to identify the most influential predictors (e.g., sunshine duration and altitude), facilitating domain insight and practical deployment without specialized software.

Limitations acknowledged include the use of monthly averaged data, which precludes capturing diurnal or hourly variability, and the lack of a systematic sensitivity analysis of MGGP hyper‑parameters. The paper suggests future work on higher‑resolution datasets, incorporation of additional meteorological variables (wind speed, pressure), and automated hyper‑parameter tuning (e.g., Bayesian optimization) to further improve robustness.

In conclusion, the study demonstrates that MGGP provides a powerful, transparent, and accurate alternative to conventional black‑box AI methods for solar irradiation forecasting. Its ability to generate concise, human‑readable formulas while maintaining superior predictive performance makes it attractive for applications in solar energy system design, architectural planning, agricultural management, and other fields where quick, reliable solar resource estimates are essential.
