Beyond Accuracy and Complexity: The Effective Information Criterion for Structurally Stable Symbolic Regression

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the original arXiv source.

Symbolic regression (SR) traditionally balances accuracy and complexity, implicitly assuming that simpler formulas are structurally more rational. We argue that this assumption is insufficient: existing algorithms often exploit this metric to discover accurate and compact but structurally irrational formulas that are numerically ill-conditioned and physically inexplicable. Inspired by the structural stability of real physical laws, we propose the Effective Information Criterion (EIC) to quantify formula rationality. EIC models formulas as information channels and measures the amplification of inherent rounding noise during recursive calculation, effectively distinguishing physically plausible structures from pathological ones without relying on ground truth. Our analysis reveals a stark structural stability gap between human-derived equations and SR-discovered results. By integrating EIC into SR workflows, we provide explicit structural guidance: for heuristic search, EIC steers algorithms toward stable regions to yield superior Pareto frontiers; for generative models, EIC-based filtering improves pre-training sample efficiency by 2-4 times and boosts generalization R² by 22.4%. Finally, an extensive study with 108 human experts shows that EIC aligns with human preferences in 70% of cases, validating structural stability as a critical prerequisite for human-perceived interpretability.


💡 Research Summary

The paper challenges the prevailing paradigm in symbolic regression (SR) that evaluates candidate formulas solely on predictive accuracy and syntactic complexity (typically measured by symbol count). While these two objectives have driven impressive results, the authors argue that they overlook a crucial dimension: structural rationality. Real physical laws are not only accurate and concise but also robust to numerical perturbations; they remain stable even when evaluated with low‑precision tools such as slide rules. In contrast, many state‑of‑the‑art SR algorithms frequently produce formulas that, despite achieving high R² and short length, contain pathological nesting (e.g., sin(sin(cot(x)))) that amplifies rounding errors, leads to catastrophic cancellation, and is physically implausible.

To address this gap, the authors introduce the Effective Information Criterion (EIC), a novel, parameter‑free metric that quantifies the structural stability of a symbolic expression. The key idea is to view a formula as a computation tree whose nodes are elementary operators (addition, multiplication, trigonometric functions, etc.). Each node is assumed to introduce a small multiplicative rounding noise εₖ with zero mean and variance σ². Propagating this noise through the tree yields a relative cumulative error ηₖ = (ỹₖ – yₖ)/yₖ at each node. The authors define a variance‑amplification factor sₖ²(x) = lim_{σ→0} Var
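The noise model described above (multiplicative rounding noise injected at every operator node, then a relative error measured at the output) can be illustrated with a small Monte Carlo sketch. This is not the paper's EIC computation, whose exact definition is not reproduced here. It assumes Gaussian noise, uses a small finite σ in place of the σ→0 limit, and measures the amplification only at the root node. The tuple encoding of trees, the `OPS` table, and the function names `evaluate` and `amplification` are hypothetical and chosen only for illustration.

```python
import math
import random

# Hypothetical operator table; the names are illustrative, not from the paper.
OPS = {
    "add": lambda a, b: a + b,
    "sub": lambda a, b: a - b,
    "mul": lambda a, b: a * b,
    "sin": math.sin,
    "cot": lambda a: 1.0 / math.tan(a),
}


def evaluate(tree, x, sigma=0.0, rng=None):
    """Evaluate a tuple-encoded expression tree at the point x.

    When sigma > 0, the output of every operator node is multiplied by
    (1 + eps) with eps ~ N(0, sigma^2), mimicking rounding noise.
    """
    if tree == "x":
        return x
    if isinstance(tree, (int, float)):
        return float(tree)
    op, *args = tree
    value = OPS[op](*(evaluate(a, x, sigma, rng) for a in args))
    if sigma > 0.0:
        value *= 1.0 + rng.gauss(0.0, sigma)
    return value


def amplification(tree, x, sigma=1e-6, trials=4000, seed=0):
    """Estimate Var[eta] / sigma^2 at the root, eta = (noisy - exact) / exact.

    The mean of eta is approximately zero for small sigma, so the mean of
    eta^2 is used as the variance estimate.
    """
    rng = random.Random(seed)
    exact = evaluate(tree, x)
    total = 0.0
    for _ in range(trials):
        eta = (evaluate(tree, x, sigma, rng) - exact) / exact
        total += eta * eta
    return total / trials / sigma ** 2


if __name__ == "__main__":
    benign = ("mul", "x", "x")                     # x * x
    cancelling = ("sub", ("add", "x", 1.0), "x")   # (x + 1) - x
    nested = ("sin", ("sin", ("cot", "x")))        # sin(sin(cot(x)))
    print("x*x at x=2:           ", amplification(benign, 2.0))
    print("(x+1)-x at x=1000:    ", amplification(cancelling, 1000.0))
    print("sin(sin(cot x)) at x=0.01:", amplification(nested, 0.01))
```

Under these assumptions, a single multiplication leaves the noise variance roughly unchanged, while the cancelling and deeply nested expressions amplify it by several orders of magnitude at the chosen evaluation points, which is the qualitative behaviour the summary attributes to pathological formulas.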

