SQL-ASTRA: Alleviating Sparse Feedback in Agentic SQL via Column-Set Matching and Trajectory Aggregation
Long Li 1,†, Zhijian Zhou 2,6,†, Jiangxuan Long 3, Peiyang Liu 4, Weidi Xu 5, Zhe Wang 1, Shirui Pan 1,∗, Chao Qu 2,7,∗

1 Griffith University, Brisbane, Australia; 2 Fudan University, Shanghai, China; 3 The University of Hong Kong, Hong Kong, China; 4 Peking University, Beijing, China; 5 InFly, China; 6 Shanghai Innovation Institute, China; 7 Shanghai Academy of Artificial Intelligence for Science, China

long.li@griffithuni.edu.au

Abstract

Agentic Reinforcement Learning (RL) shows promise for complex tasks, but Text-to-SQL remains mostly restricted to single-turn paradigms. A primary bottleneck is the credit assignment problem. In traditional paradigms, rewards are determined solely by the final-turn feedback, which ignores the intermediate process and leads to ambiguous credit evaluation. To address this, we propose Agentic SQL, a framework featuring a universal two-tiered reward mechanism designed to provide effective trajectory-level evaluation and dense step-level signals. First, we introduce Aggregated Trajectory Reward (ATR) to resolve multi-turn credit assignment. Using an asymmetric transition matrix, ATR aggregates process-oriented scores to incentivize continuous improvement. Leveraging Lyapunov stability theory, we prove ATR acts as an energy dissipation operator, guaranteeing a cycle-free policy and monotonic convergence. Second, Column-Set Matching Reward (CSMR) provides immediate step-level rewards to mitigate sparsity. By executing queries at each turn, CSMR converts binary (0/1) feedback into dense [0, 1] signals based on partial correctness. Evaluations on BIRD show a 5% gain over binary-reward GRPO. Notably, our approach outperforms SOTA Arctic-Text2SQL-R1-7B on BIRD and Spider 2.0 using identical models, propelling Text-to-SQL toward a robust multi-turn agent paradigm.
∗ Co-corresponding authors. † Equal contribution.

1 Introduction

In recent years, Agentic Reinforcement Learning (RL) has garnered significant attention (Zhang et al., 2025a; Zhu et al., 2025), empowering Large Language Models (LLMs) to engage in multi-turn interactions with environments to accomplish more complex tasks, such as deep research (OpenAI, 2025; Xu and Peng, 2025; Huang et al., 2025), web searching (Jin et al., 2025; Guo et al., 2025c), and code execution (Jiang et al., 2025; Feng et al., 2025). In most scenarios, the capabilities of these methods are enhanced through Reinforcement Learning with Verifiable Rewards (RLVR) (Zhang et al., 2025b; Zhou et al., 2026; Tan et al., 2025), where feedback is provided based on the correctness of the final outcome (Guo et al., 2025a; Li et al., 2025b; Cai et al., 2025; Cai and Sugiyama, 2026).

However, when applied to complex tasks requiring exploratory reasoning, Agentic RL faces three core challenges: (1) Paradigm Constraint: While agents are designed for multi-turn interaction, most existing work in specific domains like Text-to-SQL remains confined to a single-turn, static generation paradigm (Ma et al., 2025; Yao et al., 2025; Ali et al., 2025). This framework fails to reflect the dynamic process of human data analysts, who use multiple tentative queries to gather context and refine their strategy (Kim et al., 2024; Fürst et al., 2024), which severely limits the model's ability to solve complex, real-world problems (Huo et al., 2025; Lei et al., 2025a). (2) Credit Assignment: In multi-turn trajectories, evaluation signals typically rely exclusively on final-turn feedback (Guo et al., 2025d; Feng et al., 2025; Ding et al., 2026; Pang et al., 2026).
This "all-or-nothing" approach treats the interaction sequence as a black box, introducing a pervasive credit assignment problem in which the agent cannot distinguish which intermediate steps contributed to the final outcome. (3) Micro-level Reward Sparsity: Even when step-level feedback is available, it is often restricted to coarse, binary (0/1) signals based on execution success (Ma et al., 2025; Lei et al., 2025b; Yao et al., 2025). Such sparse feedback ignores the rich information in "partially correct" queries, providing insufficient granular guidance and drastically restricting the efficiency and robustness of RL training.

Figure 1: The framework of Agentic SQL. CSMR captures rich signals from error cases, whereas traditional binary rewards overlook this information.

To systematically address these challenges, we propose an innovative methodology:

(1) We construct a multi-turn interactive framework that enables agents to iteratively gather context and refine SQL queries through dynamic database interaction, effectively overcoming the single-turn limitation.

(2) To address the reliance on final-turn feedback, we design the Aggregated Trajectory Reward (ATR). Unlike traditional methods, ATR utilizes an Asymmetric Transition Matrix to aggregate signals across the entire reasoning path, explicitly incentivizing strategies that exhibit continuous and monotonic improvement. Crucially, we provide a rigorous Theoretical Guarantee: by modeling the reasoning process as a dynamical system, we prove that ATR acts as an energy dissipation operator, mathematically ensuring a cycle-free policy and monotonic convergence based on Lyapunov stability theory.
(3) We propose the Column-Set Matching Reward (CSMR) as an immediate, dense step-level reward. By evaluating partial correctness via column value-set normalization, CSMR transforms 0/1 outcomes into granular signals within the [0, 1] range. This provides precise, step-by-step guidance and serves as the foundational input for ATR trajectory aggregation.

2 Methodology and Convergence Analysis

2.1 Agentic SQL Framework

Traditional Reinforcement Learning (RL) approaches for Large Language Models (LLMs) are mostly confined to a single-turn interaction setting, aiming to maximize the expected reward $J_{\text{single-turn}}(\theta)$ for a single response:

$$J_{\text{single-turn}}(\theta) = \mathbb{E}_{s \sim \mathcal{D},\, a \sim \pi_\theta(\cdot \mid s)}\left[ R(s, a) \right].$$

However, this paradigm is insufficient for complex, exploratory tasks like Text-to-SQL, which require the agent to engage in multi-step interaction, strategy iteration, and error recovery.

To accurately model this dynamic, we formalize the interactive Text-to-SQL task as a Finite-Horizon Markov Decision Process (MDP) defined by the tuple $\mathcal{M} = \langle \mathcal{S}, \mathcal{A}, \mathcal{P}, \mathcal{R} \rangle$. An agent learns a policy $\pi_\theta$ by generating interaction trajectories $\tau = (s_0, a_0, \ldots, s_K)$ to maximize the expected cumulative reward based on our process-based reward function $\mathcal{R}$:

$$J_{\text{multi-turn}}(\theta) = \mathbb{E}_{\tau \sim \pi_\theta}\left[ \sum_{t=0}^{K-1} \mathcal{R}(s_t, a_t) \right] \quad (1)$$

The overall framework is illustrated in Figure 1. In each iteration, after generating the SQL code, the agent interacts with the database to retrieve the execution result and subsequently employs CSMR to derive a dense reward. Upon completion of the multi-turn generation, the ATR mechanism aggregates the step-wise CSMRs to formulate the advantage for the entire trajectory.
2.2 Reward Design

Algorithm 1 CSMR (Column-Set Matching Reward) with Perfect Match Check

Require: Gold result table G, predicted result table P, scaling factor α (e.g., 0.8)
Ensure: Reward score R_CSMR
1: ▷ Step 0: Check for perfect match
2: if IsPerfectMatch(G, P) then ▷ e.g., set(G) == set(P)
3:     return 1.0
4: end if
5: ▷ Step 1: Column value-set extraction (for partial score)
6: G′ ← DeduplicateRows(G)
7: P′ ← DeduplicateRows(P)
8: N_c^G ← GetNumColumns(G′)
9: N_c^P ← GetNumColumns(P′)
10: D_G ← ∅ ▷ Initialize set of gold column-sets
11: D_P ← ∅ ▷ Initialize set of predicted column-sets
12: for i ← 1 to N_c^G do
13:     C_i^G ← GetUniqueValuesInColumn(G′, i)
14:     D_G ← D_G ∪ {C_i^G}
15: end for
16: for j ← 1 to N_c^P do
17:     C_j^P ← GetUniqueValuesInColumn(P′, j)
18:     D_P ← D_P ∪ {C_j^P}
19: end for
20: ▷ Step 2: Column-set matching
21: M ← 0 ▷ Initialize match count
22: for each C_G ∈ D_G do
23:     if there exists C_P ∈ D_P s.t. C_G = C_P then
24:         M ← M + 1
25:     end if
26: end for
27: ▷ Step 3: Compute score
28: N_prod ← N_c^G × N_c^P
29: if N_prod = 0 then
30:     R_CSMR ← 0.0
31: else
32:     R_CSMR ← (M² / N_prod) × α
33: end if
34: return R_CSMR

2.2.1 Column-Set Matching Reward

As shown on the right side of Figure 1, in Text-to-SQL tasks, traditional reward mechanisms typically employ a sparse, binary 0/1 signal. This mechanism requires the execution result of a predicted query, P, to perfectly match the execution result of the gold answer, G. Any slight discrepancy (e.g., row ordering, partially missing data) results in a reward of 0. This sparse reward signal significantly hinders RL convergence, as it ignores the substantial "partially correct" information that predicted results may contain.
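Algorithm 1 can be sketched compactly in Python; the row-tuple representation and helper structure here are ours, not the authors' implementation:

```python
def csmr(gold_rows, pred_rows, alpha=0.8):
    """Column-Set Matching Reward: dense reward in [0, 1] for partial
    correctness. Execution results are given as lists of row tuples;
    this sketch mirrors Algorithm 1."""
    # Step 0: perfect match (order-insensitive row comparison, set(G) == set(P)).
    if set(map(tuple, gold_rows)) == set(map(tuple, pred_rows)):
        return 1.0

    # Step 1: per-column value sets. The row deduplication of Algorithm 1
    # is implicit here, since a set of values already ignores duplicates.
    def column_sets(rows):
        if not rows:
            return []
        return [frozenset(row[i] for row in rows) for i in range(len(rows[0]))]

    d_gold, d_pred = column_sets(gold_rows), column_sets(pred_rows)
    n_prod = len(d_gold) * len(d_pred)  # N_c^G x N_c^P
    if n_prod == 0:
        return 0.0

    # Step 2: count distinct gold column-sets that also appear in the prediction.
    m = sum(1 for c in set(d_gold) if c in set(d_pred))

    # Step 3: product-based normalization (M^2 / N_prod), capped at alpha < 1
    # so column-only "pseudo-perfect" matches stay below a true row-level match.
    return (m * m / n_prod) * alpha
```

For example, a prediction whose column value-sets all match the gold table but whose rows are recombined scores exactly α (0.8), while a prediction matching one of two columns scores 0.2.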
To address this, we propose the Column-Set Matching Reward (CSMR), a novel, denser reward function designed to measure the meaningful structural overlap between P and G. Our core idea is to move away from comparing rows (tuples) and instead compare the sets of values within each column. Specifically, our reward calculation process is detailed in Algorithm 1.

The CSMR mechanism, using product-based normalization, effectively captures partial correctness. It provides a dense reward signal for matching column value-sets, even with incorrect row composition, thereby alleviating the sparsity problem of binary 0/1 rewards.

We note that CSMR is insensitive to row-level combinations. For example, P = [[a1, b2], [a2, b1]] would perfectly match the column-sets of G = [[a1, b1], [a2, b2]]. To address this, we introduce a scaling factor α < 1.0 (e.g., 0.8). This factor caps the maximum reward from CSMR at α, intentionally distinguishing these "pseudo-perfect" (column-only) matches from truly perfect row-level matches.

2.3 Aggregated Trajectory Reward and Theoretical Guarantee

While CSMR provides spatial density, it is insufficient for evaluating the temporal quality of a complete multi-step trajectory τ. In a traditional RL setup, one might employ a Step-wise Update strategy, which assigns and backpropagates reward signals r_t immediately after each interaction turn. However, this naive approach suffers from high gradient variance due to inconsistent signal scaling across heterogeneous database schemas. Furthermore, it is susceptible to reward hacking, a phenomenon where agents learn to exploit the reward function through intentional oscillations.

To overcome these limitations, we design the Aggregated Trajectory Reward (ATR), $R_{\text{ATR}}$.
Unlike step-wise reinforcement, ATR is a single scalar reward provided only at episode termination, designed to explicitly incentivize continuous improvement while suppressing oscillatory behaviors. We provide a detailed ablation study comparing ATR with the Step-wise Update in Section 3.3.1.

ATR Calculation. The ATR aggregates the trajectory's utility by weighing both the direction and magnitude of semantic changes. The total reward is computed by summing the step-wise transition terms over the trajectory:

$$R_{\text{ATR}} = \sum_{t=1}^{K} \Psi(R_{t-1}, R_t)$$

The step-wise transition function Ψ is derived from an asymmetric matrix M scaled by the gradient magnitude $|\Delta R_t|$, where $\Delta R_t = R_t - R_{t-1}$. The core inductive bias is encoded in M, where each element represents a specific semantic behavior:

$$M = \begin{pmatrix} R_{\text{Low} \to \text{Low}} & R_{\text{Low} \to \text{High}} \\ R_{\text{High} \to \text{Low}} & R_{\text{High} \to \text{High}} \end{pmatrix} = \begin{pmatrix} 0.0 & +1.0 \\ -1.5 & 0.0 \end{pmatrix}$$

$$\Psi(R_{t-1}, R_t) = M_{s_{t-1}, s_t} \cdot \begin{cases} |\Delta R_t| & \text{if } s_{t-1} \neq s_t \\ 1.0 & \text{if } s_{t-1} = s_t \end{cases} \quad (2)$$

The binary states $(s_{t-1}, s_t)$ are determined dynamically by the gradient and a stagnation threshold τ: (i) Transition ($s_{t-1} \neq s_t$): a positive gradient $\Delta R_t > 0$ triggers an improvement transition, while $\Delta R_t < 0$ triggers a degradation. (ii) Stagnation ($s_{t-1} = s_t$): when $\Delta R_t \approx 0$, the state is sustained as High if $R_t > \tau$, and otherwise remains Low.

This design ensures that the agent is rewarded proportionally to the extent of improvement ($|\Delta R_t|$), while the asymmetric M (with $|R_{\text{High} \to \text{Low}}| > |R_{\text{Low} \to \text{High}}|$) ensures strict dissipativity.

Theoretical Foundation: Lyapunov-Guided Convergence. We demonstrate that the ATR mechanism is not merely heuristic, but acts as a necessary energy dissipation operator, guaranteeing asymptotic stability (i.e., convergence to the correct SQL query) from the perspective of Lyapunov stability theory (Taylor et al., 2018).

1.
Semantic Error Energy (Lyapunov Candidate Function): We transform the CSMR score $\Phi(s_t) = R_{\text{CSMR}}(s_t)$ into the Semantic Error Energy $V(s_t)$:

$$V(s_t) = 1 - \Phi(s_t) \quad (3)$$

This function satisfies the fundamental Lyapunov properties, where $V(s) = 0$ if and only if the system reaches the equilibrium point $s^*$ (the correct SQL).

2. Stability Criterion and ATR's Dissipative Role: Asymptotic stability requires the energy sequence to be strictly decreasing, $\Delta V(s_t) = V(s_{t+1}) - V(s_t) < 0$. Our $R_{\text{step}}$ is explicitly designed to approximate the negation of the orbital derivative:

$$R_{\text{step}}(s_t, s_{t+1}) \propto -\nabla V(s_t) \approx \Phi(s_{t+1}) - \Phi(s_t) \quad (4)$$

Maximizing the cumulative ATR implicitly optimizes the policy $\pi_\theta$ to maximize the rate of energy dissipation, accelerating convergence towards $s^*$.

3. Necessity of Asymmetric Shaping: Eliminating Limit Cycles: Multi-turn agents often suffer from limit cycles, oscillating between suboptimal states. We prove that an Asymmetric Transition Matrix M is necessary to eliminate such cycles. While symmetric rewards satisfy $\sum_{t \in \mathcal{C}} R_t = 0$ for any cycle $\mathcal{C}$, yielding a merely Lagrange-stable (Leine, 2010; Zhang and Zeng, 2018) and thus oscillatory system, ATR imposes $|R_{\text{High} \to \text{Low}}| > |R_{\text{Low} \to \text{High}}|$, ensuring a net reward loss ($\sum_{t \in \mathcal{C}} R_t < 0$). This transforms the reasoning process into a strictly dissipative system, forcing $\pi^*$ to be cycle-free along a monotonic energy-descent path (Appendix B).

4. Robustness via Quantized Feedback: The threshold τ in the ATR calculation acts as a quantized feedback filter, akin to techniques in Sliding Mode Control:

$$S_t = \mathbb{I}(\Phi(s_t) > \tau)$$

Given the stochastic nature of LLMs, the continuous potential $\Phi(s_t)$ may contain high-frequency noise.
By discretizing the state transitions, we ensure that the Lyapunov derivative ΔV drives policy updates only when a significant semantic phase transition occurs, thereby enhancing the robustness of the learning process.

2.4 GRPO Training with Tool Masking

We use the GRPO algorithm to optimize our policy model (Guo et al., 2025a). For a specific question-answer pair (q, a), GRPO's underlying behavior policy, $\pi_{\theta_{\text{old}}}$, generates a group of G individual responses, denoted as $\{o_i\}_{i=1}^{G}$. The advantage for the i-th response within this ensemble is then calculated by normalizing the rewards specific to that group, $\{R_{\text{ATR},i}\}_{i=1}^{G}$:

$$\hat{A}_{i,t} = \frac{R_{\text{ATR},i} - \operatorname{mean}(\{R_{\text{ATR},j}\}_{j=1}^{G})}{\operatorname{std}(\{R_{\text{ATR},j}\}_{j=1}^{G})} \quad (5)$$

Thus, every token in the trajectory uses this normalized reward as its advantage. Additionally, we introduce a binary mask:

$$M_{i,t} = \begin{cases} 1 & \text{if } o_{i,t} \text{ is a reasoning token} \\ 0 & \text{if } o_{i,t} \text{ is an execution token} \end{cases} \quad (6)$$

This loss masking ensures the model focuses on learning the reasoning process. We define the clipped surrogate objective as $\rho^{\text{clip}}_{i,t}(\theta) = \operatorname{clip}(\rho_{i,t}(\theta), 1-\varepsilon, 1+\varepsilon)$. The optimized GRPO loss is:

$$\mathcal{L}_{\text{group}}(\theta) = \frac{1}{G} \sum_{i=1}^{G} \frac{1}{|o_i|} \sum_{t=1}^{|o_i|} M_{i,t} \min\left( \rho_{i,t}(\theta) \hat{A}_{i,t},\; \rho^{\text{clip}}_{i,t}(\theta) \hat{A}_{i,t} \right)$$

$$\mathcal{L}(\theta) = -\mathbb{E}_{(q,a) \sim \mathcal{D},\, o_i \sim \pi_{\theta_{\text{old}}}(\cdot \mid q)}\left[ \mathcal{L}_{\text{group}}(\theta) \right] \quad (7)$$

where G is the number of sampled trajectories per group, and $\rho_{i,t}(\theta) = \frac{\pi_\theta(o_{i,t} \mid q, o_{i,<t})}{\pi_{\theta_{\text{old}}}(o_{i,t} \mid q, o_{i,<t})}$ is the token-level importance ratio.

B Eliminating Limit Cycles

ATR imposes $|R_{\text{High} \to \text{Low}}| > |R_{\text{Low} \to \text{High}}|$. Let $R_{\text{High} \to \text{Low}} = -k \cdot r$ and $R_{\text{Low} \to \text{High}} = r$, where $k > 1$. The total cycle return becomes:

$$\sum_{t \in \mathcal{C}} R_t = (1 - k) \sum r < 0 \quad (8)$$

3. Conclusion: Per Lyapunov stability theory, since every cycle incurs a net cost, the system acts as an energy dissipation operator. This net loss forces the trajectory to terminate at the global maximum, effectively pruning all limit cycles from the policy space.
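The transition rule of Eq. (2) and the net cycle loss of Eq. (8) can be checked with a minimal sketch. The plain sum over turns is our assumption for how the Ψ terms are aggregated; `tau` is the stagnation threshold:

```python
def atr(csmr_scores, tau=0.6):
    """Aggregated Trajectory Reward sketch: Eq. (2) summed over turns.
    csmr_scores are the per-turn CSMR values of one trajectory."""
    # Asymmetric transition matrix M with states 0 = Low, 1 = High:
    # degradation (-1.5) outweighs improvement (+1.0), so any cycle loses reward.
    M = {(0, 0): 0.0, (0, 1): +1.0, (1, 0): -1.5, (1, 1): 0.0}
    total = 0.0
    for prev, curr in zip(csmr_scores, csmr_scores[1:]):
        delta = curr - prev
        if abs(delta) > 1e-9:
            # Transition case: state pair from the sign of the gradient,
            # scaled by its magnitude |Delta R_t|.
            states = (0, 1) if delta > 0 else (1, 0)
            total += M[states] * abs(delta)
        else:
            # Stagnation case: sustained High if R_t > tau, otherwise Low;
            # the diagonal entries of M are zero, so stagnation adds nothing.
            s = 1 if curr > tau else 0
            total += M[(s, s)] * 1.0
    return total
```

Monotonic improvement accumulates positive reward, while an oscillation such as 0.8 → 0.2 → 0.8 returns to the same state at a strictly negative total, illustrating the dissipativity argument.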
C Training Details

Table 2: Hyperparameters for RL Training

Hyperparameter           Value
Batch Size               128
Learning Rate            1e-6
Rollout Temperature      1.0
Rollout Top-p            0.95
Validation Temperature   0.6 / 1.0
Validation Top-p         0.95
PPO Epochs               1
Max Response Length      2048 / 4096
Number of Rollouts       8
Training Epochs          5
GPU Configuration        32 × A800
Training Data            8,958 / 17,614
B                        2
α                        0.8
C_turn                   0.0001
τ                        0.6

We utilize the BIRD training set to train our models. This dataset comprises 9,428 question-SQL pairs from 70 databases across diverse domains such as airlines, movies, and sales. We filtered out samples where the gold SQL query failed to execute, resulting in a final set of 8,958 training instances. For evaluation, we primarily use the BIRD-Dev set to assess in-domain capabilities. We introduce the Spider benchmark to test the model's out-of-distribution (OOD) generalization. Additionally, we leverage the challenging enterprise-grade Spider 2.0 dataset to evaluate the model's agent capabilities. This dataset targets highly difficult real-world scenarios, representing a strong indicator of practical Text-to-SQL deployment readiness. Specifically, we extract a subset of 135 locally executable examples from Spider-2.0-Lite. To ensure a fair comparison, we evaluate both SQL-R1 and Arctic-Text2SQL-R1-7B using the identical configuration. Throughout all evaluations, we adopt a 0/1 reward mechanism, where a query is considered correct only if all rows in the execution result match the ground truth exactly.

Our experiments employed two sets of trials. The first set involved the Qwen2.5-7B-Instruct model, which underwent Reinforcement Learning (RL) training directly without a cold start. The second set involved the OmniSQL model, which required the Format-6k fine-tuning step to teach it the tool-calling format. Format-6k denotes a subset of data randomly sampled from the training set and formatted into the tool-calling format.
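Under this setup (8 rollouts per group), the group-relative advantage of Eq. (5) and the tool masking of Eqs. (6)-(7) can be sketched in plain Python; this is an illustrative sketch, not the VERL implementation:

```python
from statistics import mean, pstdev

def group_advantages(atr_rewards):
    """Eq. (5): z-score the ATR rewards within one group of G rollouts."""
    mu, sigma = mean(atr_rewards), pstdev(atr_rewards)
    # The small epsilon guarding zero-variance groups is our addition.
    return [(r - mu) / (sigma + 1e-8) for r in atr_rewards]

def masked_grpo_loss(ratios, advantage, mask, eps=0.2):
    """Eqs. (6)-(7): clipped surrogate where execution tokens (mask = 0)
    are excluded, so gradients flow only through reasoning tokens."""
    total = 0.0
    for rho, m in zip(ratios, mask):
        rho_clip = min(max(rho, 1.0 - eps), 1.0 + eps)           # clip(rho, 1-eps, 1+eps)
        total += m * min(rho * advantage, rho_clip * advantage)  # masked token term
    return -total / len(ratios)                                  # negated group average
```

Because the advantage is constant across tokens of a trajectory, masking an execution token simply removes its contribution from the per-trajectory average.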
During our experimentation, we observed that general-purpose code models, such as Qwen2.5-Coder (Hui et al., 2024), are overly specialized. This specialization leads to poor instruction-following performance, rendering them unable to correctly learn the required agent format. Even when SFT techniques were applied to correct their output format, the models failed to exhibit strong exploratory capabilities in subsequent multi-turn generation tasks.

For the RL framework, we utilize the open-source library VERL; for the Spider 2.0 evaluation, we employ the open-source repository ReFoRCE (Deng et al., 2025).

The hyperparameter configurations are detailed in Table 2. A single Reinforcement Learning experiment for Agentic SQL takes approximately 24 hours to complete. Specifically, α = 0.8 serves as the scaling factor for the CSMR, capping structural matches at 0.8 to effectively distinguish them from truly perfect matches. The parameter B = 2 defines the reward clipping boundary, constraining the total reward within [−2, 2] for training stability. Additionally, C_turn = 0.0001 introduces a turn penalty to incentivize concise reasoning, while τ = 0.6 acts as a stagnation threshold that determines state transitions in the ATR calculation.

D Turn-Correctness Distribution

Figure 5: Turn count distribution for models using CSMR on BIRD-Dev.

Table 3: Model Performance on Turn-n Interactions.

Model               Turn 1   Turn 2   Turn 3
Single-Turn GRPO    59.4     -        -
Agentic SQL         62.2     66.0     39.5
Agentic SQL + ATR   72.7     71.0     36.2

Figure 5 presents the turn-number distribution (i.e., the percentage of generations using 2 turns, 3 turns, etc.) for the RL-trained models. Conversely, Table 3 reports the accuracy achieved by these models at a specified number of turns N.
We observe that different reward functions result in markedly distinct distributions. The model trained without ATR exhibits a more balanced distribution, generating outputs across 1, 2, and 3 turns. In contrast, the model employing ATR almost entirely abandons single-turn generation, focusing heavily on 2-turn and 3-turn generations, where it simultaneously achieves the highest overall accuracy.

E Additional Experiment

For experimental simplicity, we conduct these comparative trials using a single-turn GRPO baseline. The primary objective of α is to provide a clear distinction between truly perfect samples and "pseudo-perfect" matches, i.e., cases where the column value-sets match but the row-level compositions are incorrect, such as P = [[a1, b2], [a2, b1]] and G = [[a1, b1], [a2, b2]]. By incorporating α, the model effectively learns to differentiate between these two scenarios, thereby validating the efficacy of our approach. While "pseudo-perfect" matches may still be introduced, this mechanism ensures they are appropriately penalized to avoid misleading the learning process.

Table 4: Sensitivity Analysis of the Scaling Factor α in CSMR.

Model (Single-Turn GRPO)   Bird-Dev
α = 0.6                    59.2
α = 0.7                    60.1
α = 0.8                    59.4

F Case Study

In this section, we present a complete multi-turn case study. It illustrates how the agent uses information about missing columns returned by the executor to iteratively refine its plan, ultimately leading to the correct result.

System Prompt

You are a helpful SQL assistant. You will solve the user's question by calling a tool to execute SQL code and query a SQL database. After each tool call, you must evaluate the result returned in <tool_response> to determine its relevance to the question, and decide for yourself whether to make another query. You may also use a tool to inspect the database schema first before providing your final SQL code. You can call the tool a maximum of 3 times.
User Prompt

Task Overview:
You are a data science expert. Below, you are provided with a database schema and a natural language question. Your task is to understand the schema and generate a valid SQL query to answer the question.

Database Engine:
SQLite

Database Schema:

CREATE TABLE frpm (
  CDSCode text, -- example: ['01100170109835', '01100170112607']
  `Academic Year` text, -- example: ['2014-2015']
  `County Code` text, -- example: ['01', '02']
  `District Code` integer, -- example: [10017, 31609]
  `School Code` text, -- example: ['0109835', '0112607']
  `County Name` text, -- example: ['Alameda', 'Alpine']
  `District Name` text, -- example: ['Alameda County Office of Education', 'California School for the Blind (State S']
  `School Name` text, -- example: ['FAME Public Charter', 'Envision Academy for Arts & Technology']
  `District Type` text, -- example: ['County Office of Education (COE)', 'State Special Schools']
  `School Type` text, -- example: ['K-12 Schools (Public)', 'High Schools (Public)']
  `Educational Option Type` text, -- example: ['Traditional', 'Juvenile Court School']
  `NSLP Provision Status` text, -- example: ['Breakfast Provision 2', 'Provision 2']
  `Charter School (Y/N)` integer, -- example: [1, 0]
  `Charter School Number` text, -- example: ['0728', '0811']
  `Charter Funding Type` text, -- example: ['Directly funded', 'Locally funded']
  IRC integer, -- example: [1, 0]
  `Low Grade` text, -- example: ['K', '9']
  `High Grade` text, -- example: ['12', '8']
  `Enrollment (K-12)` real, -- example: [1087.0, 395.0]
  `Free Meal Count (K-12)` real, -- example: [565.0, 186.0]
  `Percent (%) Eligible Free (K-12)` real, -- example: [0.5197, 0.4708]
  `FRPM Count (K-12)` real, -- example: [715.0, 186.0]
  `Percent (%) Eligible FRPM (K-12)` real, -- example: [0.6577, 0.4708]
  `Enrollment (Ages 5-17)` real, -- example: [1070.0, 376.0]
  `Free Meal Count (Ages 5-17)` real, -- example: [553.0, 182.0]
  `Percent (%) Eligible Free (Ages 5-17)` real, -- example: [0.5168, 0.4840]
  `FRPM Count (Ages 5-17)` real, -- example: [702.0, 182.0]
  `Percent (%) Eligible FRPM (Ages 5-17)` real, -- example: [0.6560, 0.4840]
  `2013-14 CALPADS Fall 1 Certification Status` integer, -- example: [1]
  PRIMARY KEY (CDSCode),
  CONSTRAINT fk_frpm_cdscode FOREIGN KEY (CDSCode) REFERENCES schools (CDSCode)
);

-- (Other tables satscores and schools are omitted for brevity, but follow same pattern)

CREATE TABLE satscores (
  cds text, -- example: ['10101080000000', '10101080109991']
  rtype text, -- example: ['D', 'S']
  sname text, -- school name, example: ['FAME Public Charter', 'Envision Academy for Arts & Technology']
  dname text, -- district name, example: ['Alameda County Office of Education', 'Alameda Unified']
  cname text, -- county name, example: ['Alameda', 'Amador']
  enroll12 integer, -- enrollment (1st-12nd grade), example: [398, 62]
  NumTstTakr integer, -- Number of Test Takers, example: [88, 17]
  AvgScrRead integer, -- average scores in Reading, example: [418, 503]
  AvgScrMath integer, -- average scores in Math, example: [418, 546]
  AvgScrWrite integer, -- average scores in Writing, example: [417, 505]
  NumGE1500 integer, -- Number of Test Takers Whose Total SAT Scores Are Greater or Equal to 1500, example: [14, 9]
  PRIMARY KEY (cds),
  CONSTRAINT fk_satscores_cds FOREIGN KEY (cds) REFERENCES schools (CDSCode)
);

CREATE TABLE schools (
  CDSCode text, -- example: ['01100170000000', '01100170109835']
  NCESDist text, -- National Center for Educational Statistics school district identification number, example: ['0691051', '0600002']
  NCESSchool text, -- National Center for Educational Statistics school identification number, example: ['10546', '10947']
  StatusType text, -- example: ['Active', 'Closed']
  County text, -- example: ['Alameda', 'Alpine']
  District text,
  -- example: ['Alameda County Office of Education', 'California School for the Blind (State S']
  School text, -- example: ['FAME Public Charter', 'Envision Academy for Arts & Technology']
  Street text, -- example: ['313 West Winton Avenue', '39899 Balentine Drive, Suite 335']
  StreetAbr text, -- street address, example: ['313 West Winton Ave.', '39899 Balentine Dr., Ste. 335']
  City text, -- example: ['Hayward', 'Newark']
  Zip text, -- example: ['94544-1136', '94560-5359']
  State text, -- example: ['CA']
  MailStreet text, -- example: ['313 West Winton Avenue', '39899 Balentine Drive, Suite 335']
  MailStrAbr text, -- mailing street address, example: ['313 West Winton Ave.', '39899 Balentine Dr., Ste. 335']
  MailCity text, -- mailing city, example: ['Hayward', 'Newark']
  MailZip text, -- mailing zip, example: ['94544-1136', '94560-5359']
  MailState text, -- mailing state, example: ['CA']
  Phone text, -- example: ['(510) 887-0152', '(510) 596-8901']
  Ext text, -- extension, example: ['130', '1240']
  Website text, -- example: ['www.acoe.org', 'www.envisionacademy.org/']
  OpenDate date, -- example: ['2005-08-29', '2006-08-28']
  ClosedDate date, -- example: ['2015-07-31', '2015-06-30']
  Charter integer, -- example: [1, 0]
  CharterNum text, -- example: ['0728', '0811']
  FundingType text, -- example: ['Directly funded', 'Locally funded']
  DOC text, -- District Ownership Code, example: ['00', '31']
  DOCType text, -- The District Ownership Code Type, example: ['County Office of Education (COE)', 'State Special Schools']
  SOC text, -- School Ownership Code, example: ['65', '66']
  SOCType text, -- School Ownership Code Type, example: ['K-12 Schools (Public)', 'High Schools (Public)']
  EdOpsCode text, -- Education Option Code, example: ['TRAD', 'JUV']
  EdOpsName text, -- Educational Option Name, example: ['Traditional', 'Juvenile Court School']
  EILCode text, -- Educational Instruction Level Code, example: ['ELEMHIGH', 'HS']
  EILName text, -- Educational Instruction Level Name, example: ['Elementary-High Combination', 'High School']
  GSoffered text, -- grade span offered, example: ['K-12', '9-12']
  GSserved text, -- grade span served, example: ['K-12', '9-12']
  Virtual text, -- example: ['P', 'N']
  Magnet integer, -- example: [0, 1]
  Latitude real, -- example: [37.658212, 37.521436]
  Longitude real, -- example: [-122.09713, -121.99391]
  AdmFName1 text, -- administrator's first name 1, example: ['L Karen', 'Laura']
  AdmLName1 text, -- administrator's last name 1, example: ['Monroe', 'Robell']
  AdmEmail1 text, -- administrator's email address 1, example: ['lkmonroe@acoe.org', 'laura@envisionacademy.org']
  AdmFName2 text, -- administrator's first name 2, example: ['Sau-Lim (Lance)', 'Jennifer']
  AdmLName2 text, -- administrator's last name 2, example: ['Tsang', 'Koelling']
  AdmEmail2 text, -- administrator's email address 2, example: ['stsang@unityhigh.org', 'jkoelling@efcps.net']
  AdmFName3 text, -- administrator's first name 3, example: ['Drew', 'Irma']
  AdmLName3 text, -- administrator's last name 3, example: ['Sarratore', 'Munoz']
  AdmEmail3 text, -- administrator's email address 3, example: ['dsarratore@vincentacademy.org', 'gmunoz@piedmont.k12.ca.us']
  LastUpdate date, -- example: ['2015-06-23', '2015-09-01']
  PRIMARY KEY (CDSCode)
);

This schema describes the database's structure, including tables, columns, primary keys, foreign keys, and any relevant relationships or constraints.

Question:
Charter schools refers to `Charter School (Y/N)` = 1 in the frpm. Please list the phone numbers of the direct charter-funded schools that are opened after 2000/1/1.

Instructions:
- Make sure you only output the information that is asked in the question. If the question asks for a specific column, make sure to only include that column in the SELECT clause, nothing more.
- The generated query should return all of the information asked in the question without any missing or extra information.
- Before generating the final SQL query, please think through the steps of how to write the query.

Output Format:
In your answer, please enclose the generated SQL query in a code block:
```sql
-- Your SQL query
```
Take a deep breath and think step by step to find the correct SQL query.

For each round's response, please first think step-by-step, and then make the tool call according to this format:
{
  "name": "run_sql_remote",
  "arguments": {
    "query": "```sql\n-- Your SQL query\n```",
    "db_id": "california_schools/california_schools.sqlite"
  }
}

Response-Turn 1

To find the phone numbers of the direct charter-funded schools that are opened after 2000/1/1, we need to follow these steps:

1. Filter the schools to include only those with `Charter School (Y/N)` = 1 (charter schools).
2.
Ensure the schools are direct charter-funded, which can be checked from the `Charter Funding Type` column in the `frpm` table.
3. Verify that the schools were opened after 2000/1/1 using the `OpenDate` column in the `schools` table.

Let's write the SQL query step by step:

1. Join the `frpm` table with the `schools` table using the `CDSCode` column.
2. Apply the filter for charter schools (`Charter School (Y/N)` = 1).
3. Check for direct charter funding (`Charter Funding Type` = 'Directly funded').
4. Ensure the schools were opened after 2000/1/1 (`OpenDate` > '2000-01-01').
5. Select the phone numbers from the `schools` table.

Now, let's write the SQL query:

{
  "name": "run_sql_remote",
  "arguments": {
    "query": "```sql\nSELECT s.Phone\nFROM schools s\nJOIN frpm f ON s.CDSCode = f.CDSCode\nWHERE f.`Charter School (Y/N)` = 1 AND s.`Charter Funding Type` = 'Directly funded' AND s.OpenDate > '2000-01-01'\n```",
    "db_id": "california_schools/california_schools.sqlite"
  }
}

SQL Executor Interaction Results

error: no such column: s.Charter Funding Type

Response-Turn 2

I apologize for the oversight. The `Charter Funding Type` column is actually in the `frpm` table, not the `schools` table. Therefore, we should join the `frpm` table with the `schools` table and apply the filters using the columns from the `frpm` table.

Let's correct the query:

1. Join the `frpm` table with the `schools` table using the `CDSCode` column.
2. Apply the filter for charter schools (`Charter School (Y/N)` = 1).
3. Check for direct charter funding (`Charter Funding Type` = 'Directly funded') in the `frpm` table.
4. Ensure the schools were opened after 2000/1/1 (`OpenDate` > '2000-01-01').
5. Select the phone numbers from the `schools` table.

Now, let's write the corrected SQL query:

{
  "name": "run_sql_remote",
  "arguments": {
    "query": "```sql\nSELECT s.Phone\nFROM schools s\nJOIN frpm f ON s.CDSCode = f.CDSCode\nWHERE f.`Charter School (Y/N)` = 1 AND f.`Charter Funding Type` = 'Directly funded' AND s.OpenDate > '2000-01-01'\n```",
    "db_id": "california_schools/california_schools.sqlite"
  }
}

SQL Executor Interaction Results

[[None], ['(510) 596-8901'], [None], ['(510) 686-4131'], ['(510) 452-2063'], ['(510) 842-1181'], ['(510) 748-4008'], ['(510) 748-4017'], ['(510) 995-4300'], ['(510) 748-4314'], ['(510) 809-9800'], ['(510) 809-9800'], ['(510) 300-1340'], ['(510) 300-1560'],

Response-Turn 3

The phone numbers of the direct charter-funded schools that are opened after 2000/1/1 are as follows:

```sql
SELECT s.Phone
FROM schools s
JOIN frpm f ON s.CDSCode = f.CDSCode
WHERE f.`Charter School (Y/N)` = 1 AND f.`Charter Funding Type` = 'Directly funded' AND s.OpenDate > '2000-01-01'
```