APEX-SQL: Talking to the data via Agentic Exploration for Text-to-SQL

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the [Original Paper Viewer] below or the Original ArXiv Source.

Text-to-SQL systems powered by Large Language Models have excelled on academic benchmarks but struggle in complex enterprise environments. The primary limitation lies in their reliance on static schema representations, which fails to resolve semantic ambiguity and to scale effectively to large, complex databases. To address this, we propose APEX-SQL, an Agentic Text-to-SQL Framework that shifts the paradigm from passive translation to agentic exploration. Our framework employs a hypothesis-verification loop to ground model reasoning in real data. In the schema linking phase, we use logical planning to verbalize hypotheses, dual-pathway pruning to reduce the search space, and parallel data profiling to validate column roles against real data, followed by global synthesis to ensure topological connectivity. For SQL generation, we introduce a deterministic mechanism to retrieve exploration directives, allowing the agent to effectively explore data distributions, refine hypotheses, and generate semantically accurate SQL queries. Experiments on BIRD (70.65% execution accuracy) and Spider 2.0-Snow (51.01% execution accuracy) demonstrate that APEX-SQL outperforms competitive baselines with reduced token consumption. Further analysis reveals that agentic exploration acts as a performance multiplier, unlocking the latent reasoning potential of foundation models in enterprise settings. Ablation studies confirm the critical contributions of each component in ensuring robust and accurate data analysis.

💡 Research Summary

The paper addresses a critical gap in current Text‑to‑SQL systems powered by large language models (LLMs): while they achieve high execution accuracy on academic benchmarks such as Spider 1.0 and BIRD, they falter in real‑world enterprise environments where schemas are large, column names are ambiguous, and the underlying data distribution is essential for correct query formulation. The authors argue that the root cause is the reliance on static schema representations, which disconnects model reasoning from the actual data.
To overcome this limitation, they propose APEX‑SQL, an agentic framework that replaces passive translation with an active hypothesis‑verification (H‑V) loop. The H‑V loop is instantiated at two pivotal stages of the Text‑to‑SQL pipeline: (1) schema linking and (2) SQL generation.
Schema Linking begins with Logical Planning, where the natural‑language question is transformed into a set of schema‑agnostic logical steps (e.g., filter → aggregate → join). Multiple candidate plans are sampled from the LLM and merged via a consensus‑based aggregation to produce a master plan. This isolates high‑level reasoning from noisy table/column names, mitigating “schema bias”. Next, Dual‑Pathway Pruning simultaneously applies semantic similarity filtering and structural relevance checks to dramatically shrink the candidate set, making downstream verification tractable even for massive databases. The reduced candidates undergo Parallel Data Profiling, a multi‑threaded process that samples real rows, examines value distributions, checks domain constraints, and validates foreign‑key relationships. The profiling results are then combined in Global Synthesis, which constructs a connected sub‑graph of schema elements that aligns with the master logical plan, recovering missing dependencies if necessary.
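The Parallel Data Profiling step described above can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a SQLite database file, and the names `profile_column` and `profile_candidates`, as well as the choice of statistics (row count, null count, distinct count, a small value sample), are hypothetical choices made for illustration.

```python
import sqlite3
from concurrent.futures import ThreadPoolExecutor


def profile_column(db_path, table, column, sample_size=5):
    """Collect simple evidence about one candidate column (hypothetical helper)."""
    # Each worker opens its own connection so no connection is shared across threads.
    conn = sqlite3.connect(db_path)
    try:
        # Identifiers cannot be bound as parameters, so they are double-quoted.
        # This sketch assumes table/column names come from the schema, not from users.
        t, c = f'"{table}"', f'"{column}"'
        rows, nulls, distinct = conn.execute(
            f"SELECT COUNT(*), COUNT(*) - COUNT({c}), COUNT(DISTINCT {c}) FROM {t}"
        ).fetchone()
        sample = [
            row[0]
            for row in conn.execute(
                f"SELECT DISTINCT {c} FROM {t} "
                f"WHERE {c} IS NOT NULL ORDER BY {c} LIMIT ?",
                (sample_size,),
            )
        ]
        return {
            "table": table,
            "column": column,
            "rows": rows,
            "nulls": nulls,
            "distinct": distinct,
            "sample": sample,
        }
    finally:
        conn.close()


def profile_candidates(db_path, candidates, max_workers=4):
    """Profile (table, column) pairs concurrently; results keep the input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda tc: profile_column(db_path, *tc), candidates))
```

Calling `profile_candidates("enterprise.db", [("accounts", "status")])` (a hypothetical file and column) would return one dictionary per candidate. An agent could compare such evidence against the role a column is hypothesized to play, for example checking that a presumed status column really holds a small set of categorical values.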
SQL Generation leverages a Deterministic Guidance Retrieval module that maps each logical operation to a concrete exploration directive (e.g., “sample distinct values of column X”, “verify that column Y contains the value ‘Active’”). Guided by these directives, the agent explores the database, aggregates evidence, and iteratively refines candidate SQL statements. A final confirmation step executes the candidate against the accumulated evidence to ensure semantic fidelity before returning the query.
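One way to picture a deterministic mapping from logical operations to exploration directives is a fixed lookup table, so the same plan always yields the same directives without another LLM call. The sketch below is an assumption about how such a module could look, not the paper's actual mechanism: the `DIRECTIVES` table, the plan format, and `retrieve_directives` are all hypothetical.

```python
# Hypothetical directive templates keyed by logical operation.
DIRECTIVES = {
    "filter": "Sample distinct values of {column} and verify that it contains the value {value!r}.",
    "aggregate": "Check the null count and value range of {column} before aggregating.",
    "join": "Verify that the keys in {column} match the keys in {other} before joining.",
}


def retrieve_directives(plan):
    """Map each step of a logical plan to a directive by pure dictionary lookup."""
    directives = []
    for step in plan:
        template = DIRECTIVES.get(step["op"])
        if template is None:
            # Operations without a template produce no directive in this sketch.
            continue
        directives.append(template.format(**step["args"]))
    return directives


# Assumed plan format: a list of {"op": ..., "args": {...}} dictionaries.
plan = [
    {"op": "filter", "args": {"column": "accounts.status", "value": "Active"}},
    {"op": "join", "args": {"column": "orders.account_id", "other": "accounts.id"}},
]
for directive in retrieve_directives(plan):
    print(directive)
```

Because retrieval here is a plain lookup, it adds no sampling variance of its own; any variability would come from the agent that acts on the directives.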
The authors evaluate APEX‑SQL on two challenging benchmarks: BIRD‑Dev and Spider 2.0‑Snow. On BIRD‑Dev, APEX‑SQL achieves 70.65 % execution accuracy, surpassing OpenSearch‑SQL (69.3 %) and RSL‑SQL (67.2 %). On the enterprise‑grade Spider 2.0‑Snow, it reaches 51.01 %, a substantial improvement over DSR‑SQL (35.3 %). In addition to higher accuracy, APEX‑SQL reduces token consumption by an average of 12 %, demonstrating computational efficiency.
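For readers unfamiliar with the metric, execution accuracy is the share of questions whose predicted SQL returns the same result as the reference SQL when both are run on the database. The sketch below assumes a SQLite connection and compares results as unordered sets of rows, which is one common convention; the benchmarks' official evaluation scripts may differ in details, and the function names are hypothetical.

```python
import sqlite3


def execution_match(conn, predicted_sql, gold_sql):
    """Return True if both queries yield the same set of rows (order ignored)."""
    gold_rows = set(conn.execute(gold_sql).fetchall())
    try:
        predicted_rows = set(conn.execute(predicted_sql).fetchall())
    except sqlite3.Error:
        # A predicted query that fails to execute counts as incorrect.
        return False
    return predicted_rows == gold_rows


def execution_accuracy(conn, pairs):
    """Percentage of (predicted_sql, gold_sql) pairs whose results match."""
    if not pairs:
        return 0.0
    hits = sum(execution_match(conn, predicted, gold) for predicted, gold in pairs)
    return 100.0 * hits / len(pairs)
```

Under this definition, a query can be syntactically different from the reference and still count as correct, as long as it returns the same rows.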
Ablation studies reveal the importance of each component: removing Logical Planning drops schema‑linking recall by ~9 %; disabling Dual‑Pathway Pruning inflates exploration cost by 2.3×; omitting Deterministic Guidance reduces final SQL accuracy by ~7 %. These results confirm that the hypothesis‑verification loop, together with its constituent mechanisms, is essential for robust performance.
In summary, APEX‑SQL introduces a paradigm shift from static schema‑prompting to dynamic, data‑grounded agentic reasoning. By actively interrogating the database during both schema linking and query synthesis, it unlocks the latent reasoning power of LLMs in complex, real‑world settings, delivering higher execution accuracy, lower token overhead, and a clear roadmap for future research in agentic Text‑to‑SQL systems.
