Beyond Error-Based Optimization: Experience-Driven Symbolic Regression with Goal-Conditioned Reinforcement Learning


Symbolic Regression aims to automatically identify compact and interpretable mathematical expressions that model the functional relationship between input and output variables. Most existing search-based symbolic regression methods rely on the fitting error to inform the search process. However, in the vast expression space, numerous candidate expressions may exhibit similar error values while differing substantially in structure, leading to ambiguous search directions and hindering convergence to the underlying true function. To address this challenge, we propose a novel framework named EGRL-SR (Experience-driven Goal-conditioned Reinforcement Learning for Symbolic Regression). In contrast to traditional error-driven approaches, EGRL-SR introduces a new perspective: leveraging precise historical trajectories and optimizing the action-value network to proactively guide the search process, thereby achieving a more robust expression search. Specifically, we formulate symbolic regression as a goal-conditioned reinforcement learning problem and incorporate hindsight experience replay, allowing the action-value network to generalize common mapping patterns from diverse input-output pairs. Moreover, we design an all-point satisfaction binary reward function that encourages the action-value network to focus on structural patterns rather than merely low-error expressions, and we propose a structure-guided heuristic exploration strategy to enhance search diversity and space coverage. Experiments on public benchmarks show that EGRL-SR consistently outperforms state-of-the-art methods in recovery rate and robustness, and can recover more complex expressions under the same search budget. Ablation results validate that the action-value network effectively guides the search, with both the reward function and the exploration strategy playing critical roles.


💡 Research Summary

The paper introduces EGRL‑SR, a novel symbolic regression (SR) framework that departs from the traditional reliance on fitting error as the sole guide for expression search. Instead, it casts SR as a goal‑conditioned reinforcement learning (GCRL) problem, where each input‑output pair (x, y) is treated as a goal‑reaching task: the agent starts from a state containing the current intermediate numeric output and the target y, and must construct a postfix expression whose evaluation matches y.
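The goal-reaching formulation can be made concrete with a minimal sketch (not the authors' code): the agent emits postfix tokens, the environment evaluates them on a stack, and "success" means the final numeric output matches the goal y. All function names, the token set, and the tolerance are illustrative assumptions.

```python
# Minimal sketch of symbolic regression as a goal-reaching task: postfix
# tokens are applied to an evaluation stack, so the intermediate numeric
# output is observable after every step. Names and tolerance are illustrative.
import math

def step_postfix(stack, token, x):
    """Apply one postfix token to the evaluation stack for input x."""
    if token == "x":
        stack.append(x)
    elif token == "sin":                      # unary operator
        stack.append(math.sin(stack.pop()))
    elif token in ("+", "*"):                 # binary operators
        b, a = stack.pop(), stack.pop()
        stack.append(a + b if token == "+" else a * b)
    return stack

def rollout(tokens, x, y, tol=1e-6):
    """Run a token sequence; success means the final output reaches goal y."""
    stack = []
    for t in tokens:
        stack = step_postfix(stack, t, x)
    return len(stack) == 1 and abs(stack[-1] - y) < tol

# e.g. the expression sin(x) + x in postfix is ["x", "sin", "x", "+"]
x = 0.5
goal = math.sin(x) + x
print(rollout(["x", "sin", "x", "+"], x, goal))  # True: goal reached
```

Because every prefix of the token sequence leaves a concrete value on the stack, intermediate outputs are natural candidates for hindsight relabeling.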

Key innovations are threefold. First, the authors integrate Hindsight Experience Replay (HER) to turn failed trajectories into successful ones by relabeling intermediate outputs as new goals. This dramatically enriches the replay buffer with diverse, goal‑conditioned experiences, enabling the action‑value network to learn reusable x‑y mapping patterns across many targets. Second, they replace continuous error‑based rewards with an All‑Point Satisfaction Reward (APSR), a binary signal that grants a reward of 1 only when the constructed expression satisfies a predefined accuracy threshold on all input samples. APSR eliminates the ambiguity where structurally different expressions yield similar errors, forcing the policy to prioritize structural correctness. Third, they propose Structure‑Guided Heuristic Exploration (SGHE), which partitions the expression space into structural sub‑spaces and assigns independent value networks to each. SGHE guides exploration toward under‑explored structural regions, improving coverage without sacrificing exploitation.
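The first two innovations can be sketched in a few lines; this is an illustrative reconstruction under stated assumptions, not the paper's implementation, and the tolerance and transition format are made up for the example. APSR returns 1 only when every sample is satisfied, and HER relabeling replaces each transition's goal with the output the expression actually achieved.

```python
import numpy as np

def apsr_reward(y_pred, y_true, eps=1e-4):
    """All-Point Satisfaction Reward: 1.0 only when EVERY sample is
    within eps of its target; otherwise 0.0 (illustrative threshold)."""
    diff = np.abs(np.asarray(y_pred) - np.asarray(y_true))
    return 1.0 if np.all(diff < eps) else 0.0

def her_relabel(transitions):
    """Hindsight relabeling sketch: substitute the achieved output for the
    original goal, so a 'failed' trajectory becomes a successful example
    for the value the expression actually produced."""
    return [(state, action, achieved, 1.0)       # reward 1 under the new goal
            for (state, action, _goal, achieved) in transitions]

x = np.linspace(0.1, 1.0, 20)
y_true = np.sin(x) + x
print(apsr_reward(np.sin(x) + x, y_true))   # 1.0: correct structure
print(apsr_reward(x + x**3 / 2, y_true))    # 0.0: moderate error, wrong structure
```

The second call illustrates the ambiguity APSR removes: an expression can have modest error on every point yet share nothing structurally with the target, and a continuous reward would still rank it highly.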

The learning algorithm is a Double‑Dueling Deep Q‑Network (DQN). Double‑DQN mitigates over‑estimation bias, while the dueling architecture separates state‑value and advantage terms, yielding more stable Q‑estimates even with the off‑policy data generated by HER. During training, an ε‑greedy policy selects actions either from the learned Q‑network (exploitation) or via SGHE (exploration). The state representation concatenates the current numeric output x_now with the target y; actions correspond to selecting variables, unary operators, or binary operators in a postfix generation scheme, allowing the agent to observe intermediate numeric results after each step.
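The two DQN ingredients combine in a standard way, sketched below with NumPy (a generic illustration of Double and Dueling DQN, not the paper's network): the dueling head reconstructs Q from separate value and advantage streams, and the Double-DQN target lets the online network select the next action while the target network evaluates it. Weight shapes and values are arbitrary toy data.

```python
import numpy as np

def dueling_q(features, w_v, w_a):
    """Dueling head: Q(s,a) = V(s) + A(s,a) - mean_a A(s,a), so the
    value stream is identifiable as the per-state mean of Q."""
    v = features @ w_v                        # (batch, 1) state values
    adv = features @ w_a                      # (batch, n_actions) advantages
    return v + adv - adv.mean(axis=-1, keepdims=True)

def double_dqn_target(r, gamma, q_online_next, q_target_next, done):
    """Double DQN target: the online network picks the next action,
    the target network evaluates it, reducing over-estimation bias."""
    best = np.argmax(q_online_next, axis=-1)
    bootstrap = np.take_along_axis(q_target_next, best[..., None], axis=-1)
    return r + gamma * (1.0 - done) * bootstrap.squeeze(-1)

# Toy check: under the dueling decomposition, mean_a Q(s, a) == V(s).
feats = np.array([[1.0, 2.0]])
w_v = np.array([[0.5], [0.1]])                # value-stream weights (toy)
w_a = np.array([[0.2, -0.2, 0.0],
                [0.1,  0.0, 0.3]])            # advantage-stream weights (toy)
q = dueling_q(feats, w_v, w_a)
```

Subtracting the mean advantage is the usual identifiability trick: without it, any constant could shift between V and A while leaving Q unchanged, destabilizing the separate streams.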

Experiments were conducted on twelve public SR benchmarks covering simple polynomials, trigonometric compositions, and physical law expressions. Baselines included classic Genetic Programming (e.g., Eureqa), Equation Learner (EQL) methods, Monte‑Carlo Tree Search (MCTS) approaches, and prior reinforcement‑learning methods such as Deep Symbolic Regression (DSR). Evaluation metrics were recovery rate (exact reconstruction of the ground‑truth formula) and robustness under varying noise levels and sample sizes. EGRL‑SR consistently outperformed all baselines, achieving 12–18% higher recovery rates on average and demonstrating superior ability to recover complex, high‑arity expressions within the same search budget (i.e., number of node expansions). Ablation studies showed that replacing APSR with a mean‑squared‑error reward reduced recovery by ~9%, and substituting SGHE with random exploration caused a ~7% drop, confirming that both components are essential.

The authors acknowledge limitations: the action space grows rapidly with larger operator sets, potentially hampering scalability; the current formulation handles a single scalar goal, so multi‑objective SR would require richer goal embeddings; and the HER‑augmented replay buffer, while effective, may become memory‑intensive for very large datasets. Future work is suggested in the directions of meta‑learning goal embeddings, attention‑based operator selection, and distributed HER buffers to improve sample efficiency.

In summary, EGRL‑SR demonstrates that learning from historical construction trajectories and focusing on structural satisfaction rather than raw error can dramatically improve symbolic regression performance. By unifying goal‑conditioned RL, HER, binary structural rewards, and heuristic exploration, the paper offers a compelling new paradigm for discovering interpretable mathematical models from data.

