Macro-Economic Time Series Modeling and Interaction Networks
Macro-economic models describe the dynamics of economic quantities. The estimates and forecasts produced by such models play a substantial role in financial and political decisions. In this contribution we describe an approach based on genetic programming and symbolic regression for identifying variable interactions in large datasets. In the proposed approach, multiple symbolic regression runs are executed for each variable of the dataset to find potentially interesting models. The result is a variable interaction network that describes which variables are most relevant for approximating each variable of the dataset. This approach is applied to a macro-economic dataset with monthly observations of important economic indicators in order to identify potentially interesting dependencies among these indicators. The resulting interaction network of macro-economic indicators is briefly discussed, and two of the identified models are presented in detail. The two models approximate the Help-Wanted Index and CPI inflation in the US.
💡 Research Summary
The paper introduces a novel framework for uncovering and visualizing variable interactions in large macro‑economic time‑series datasets by leveraging genetic programming (GP) and symbolic regression. Traditional macro‑economic modeling techniques such as linear regressions, vector autoregressions (VAR), or structural equation models often assume linearity or require the analyst to pre‑specify interaction terms, limiting their ability to capture the complex, nonlinear dynamics that characterize real‑world economies. In contrast, the proposed approach treats each economic indicator as a target variable and runs multiple independent symbolic‑regression searches to discover compact mathematical expressions that best predict it from the remaining indicators.
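The per-target, multi-run search described above can be sketched as follows. This is a minimal illustrative stand-in, not the paper's implementation: `one_gp_run` replaces a full GP symbolic-regression run with a least-squares fit of each single predictor (a placeholder for an evolved expression tree), and the random predictor subsampling only mimics the way independent GP runs discover different models. All function names here are hypothetical.

```python
import random

def mse(pred, actual):
    """Mean-squared error between predictions and observations."""
    return sum((p - a) ** 2 for p, a in zip(pred, actual)) / len(actual)

def one_gp_run(data, target, seed):
    """Stand-in for one GP run: fit y ~ b*x for each sampled predictor
    and keep the best, returning the 'best model of this run'."""
    rng = random.Random(seed)
    y = data[target]
    predictors = [v for v in data if v != target]
    # Subsample predictors so different runs can find different models,
    # loosely mimicking the stochasticity of independent GP runs.
    subset = rng.sample(predictors, max(1, len(predictors) // 2))
    best = None
    for name in subset:
        x = data[name]
        denom = sum(v * v for v in x)
        b = sum(xi * yi for xi, yi in zip(x, y)) / denom if denom else 0.0
        err = mse([b * xi for xi in x], y)
        if best is None or err < best[0]:
            best = (err, name, b)
    return {"error": best[0], "variables": {best[1]}}

def ensemble_for_target(data, target, n_runs=30):
    """Repeat the search (30-50 runs per target in the paper) and
    collect the resulting candidate models into an ensemble."""
    return [one_gp_run(data, target, seed) for seed in range(n_runs)]
```

In the real method each run evolves full expression trees rather than single-predictor fits; the point of the sketch is the outer structure: every indicator in turn becomes the target, and repeated independent searches produce the model ensemble from which variable relevance is later derived.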
The GP engine evolves populations of expression trees composed of arithmetic operators, elementary functions (log, exp, power), and the candidate predictor variables. Fitness is evaluated using a multi‑objective criterion that balances prediction error (mean‑squared error) against model complexity (tree depth and node count). An elite‑preservation strategy ensures that the best individuals survive across generations, while crossover and mutation introduce diversity. For each target variable, 30–50 GP runs are performed, generating a large ensemble of candidate models. Variable importance is quantified by two complementary metrics: (i) the frequency with which a predictor appears across the elite models, and (ii) the average reduction in out‑of‑sample R² when the predictor is removed (ΔR²). These metrics are assembled into a directed, weighted interaction network where an edge points from a predictor to a target and its weight reflects the predictor’s relevance.
The methodology is applied to a monthly US macro‑economic dataset spanning 1990‑2020, comprising roughly 20 indicators such as unemployment rate, manufacturing PMI, federal funds rate, stock indices, commodity price indices, exchange rates, and consumer confidence. After standard preprocessing (missing‑value imputation, log‑transformations, scaling), the GP searches are executed with each of these variables in turn as the target. The resulting interaction network highlights a small set of “hub” variables—most notably the federal funds rate and the consumer price index (CPI)—that exert strong predictive influence on many others. Conversely, some series (e.g., housing starts) appear as peripheral nodes, influencing only a limited subset of targets.
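The preprocessing steps mentioned above (missing-value imputation, log-transformations, scaling) can be sketched with plain standard-library code. This is a generic sketch of those standard operations, not the paper's exact pipeline; the choice of forward-fill imputation and z-score scaling here is an assumption.

```python
import math
import statistics

def forward_fill(series):
    """Impute missing values (None) with the last observed value.
    A leading None stays None, since there is nothing to carry forward."""
    out, last = [], None
    for v in series:
        last = v if v is not None else last
        out.append(last)
    return out

def log_transform(series):
    """Log-transform a strictly positive series (e.g., price indices)."""
    return [math.log(v) for v in series]

def zscore(series):
    """Scale a series to zero mean and unit (population) variance."""
    mu = statistics.mean(series)
    sd = statistics.pstdev(series)
    return [(v - mu) / sd for v in series]
```

Applied per indicator before the GP searches, such transformations put heterogeneous series (rates, indices, counts) on comparable scales so that no single variable dominates the error measure.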
Two illustrative models are presented in detail. The first predicts the Help‑Wanted Index, a proxy for labor‑market tightness, using a nonlinear combination of unemployment, manufacturing PMI, and consumer confidence (e.g., Unemployment × log(PMI) + Confidence²). This model achieves an out‑of‑sample R² of 0.78 with only 12 nodes, demonstrating both high explanatory power and interpretability. The second model approximates US CPI inflation and incorporates commodity price indices, the federal funds rate, the exchange rate, and lagged CPI changes, again using polynomial and logarithmic transformations. Its out‑of‑sample R² reaches 0.81. Both models are validated through five‑fold cross‑validation and are selected from the Pareto front of error versus complexity, ensuring a balance between predictive accuracy and parsimony.
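The final model selection from the Pareto front of error versus complexity can be sketched as a standard non-dominated filter. This is an illustrative sketch: models are assumed to be dictionaries carrying precomputed `error` and `complexity` (node count) scores, and the function name `pareto_front` is hypothetical.

```python
def pareto_front(models):
    """Keep models that are not dominated in (error, complexity):
    a model is dominated if some other model is no worse in both
    objectives and strictly better in at least one."""
    front = []
    for m in models:
        dominated = any(
            o["error"] <= m["error"]
            and o["complexity"] <= m["complexity"]
            and (o["error"] < m["error"] or o["complexity"] < m["complexity"])
            for o in models
        )
        if not dominated:
            front.append(m)
    return front
```

Selecting a model such as the 12-node Help-Wanted Index formula from this front, rather than simply taking the lowest-error model, is what yields the balance of predictive accuracy and parsimony the paper emphasizes.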
The authors discuss several strengths of the approach: automatic discovery of nonlinear, high‑order interactions; generation of human‑readable formulas that facilitate economic interpretation; and the ability to visualize the entire system of interdependencies as a network, which can guide policymakers in identifying leverage points. Limitations are also acknowledged. GP is computationally intensive, especially when scaling to hundreds of variables or longer time horizons; the multi‑objective fitness formulation can be sensitive to the chosen weighting of error versus complexity; and the current implementation does not explicitly model lagged effects or seasonality, which are intrinsic to macro‑economic time series.
Future work is suggested along four lines: (1) exploiting parallel and GPU‑accelerated GP to reduce runtime; (2) integrating Bayesian optimization for automated hyper‑parameter tuning; (3) extending the symbolic regression to include lagged variables and periodic components, thereby capturing dynamic temporal structure; and (4) testing the framework on multi‑country, multi‑frequency datasets to assess its generalizability.
In conclusion, the paper demonstrates that GP‑based symbolic regression can serve as a powerful, interpretable tool for macro‑economic analysis. By automatically constructing a variable interaction network and delivering compact predictive formulas, the approach offers economists and decision‑makers a new lens through which to explore the intricate web of relationships that drive economic outcomes.