SEMAG: Self-Evolutionary Multi-Agent Code Generation
Yulin Peng 1, Haowen Hou 2, Xinxin Zhu 1,2, Ying Tiffany He 1, F. Richard Yu 3
1 College of Computer Science and Software Engineering, Shenzhen University, China
2 Guangdong Laboratory of Artificial Intelligence and Digital Economy (SZ), China
3 School of Information Technology, Carleton University, Canada

Abstract

Large Language Models (LLMs) have made significant progress in handling complex programming tasks. However, current methods rely on manual model selection and fixed workflows, which limit their ability to adapt to changing task complexities. To address this, we propose SEMAG, a Self-Evolutionary Multi-Agent code Generation framework that mimics human coding practices. It decomposes programming tasks into stages, including planning, coding, debugging, and discussion, while adapting workflows to task difficulty. Its self-evolutionary agents can access the latest models in real time and automatically upgrade the backbone model. SEMAG sets new state-of-the-art Pass@1 accuracy across benchmarks. Using identical backbone models, SEMAG outperforms prior methods by 3.3% on CodeContests. When augmented with self-evolutionary model selection that automatically identifies optimal backbones, SEMAG reaches 52.6%, showcasing both framework effectiveness and adaptability to evolving LLM capabilities.

1 Introduction

Large Language Models (LLMs) have demonstrated substantial progress in code generation and completion, driven by large-scale pretraining on diverse codebases. The GPT series (Achiam et al., 2023; Hurst et al., 2024), CodeLLaMA-2 (Roziere et al., 2023), Qwen2.5-Coder (Hui et al., 2024), and DeepSeek-v3 (Liu et al., 2024) exhibit strong coding capabilities, unlocking new avenues for automated software development. In parallel, multi-agent frameworks and debugging-enhanced methodologies, such as planning-centric workflows (Lei et al.
, 2024), self-debugging paradigms (Chen et al., 2023), and collaborative agent systems (Zhong et al., 2024), have shown promising performance on standard benchmarks. Nonetheless, real-world scenarios present open-ended tasks, constrained computational budgets, and evolving specifications, revealing critical limitations in current approaches.

Figure 1: Overview workflow of Self-Evolution Agents. Agents integrate insights from recent research, news, and community discussions, and dynamically identify and deploy the most suitable models.

First, frameworks such as Self-Debugging (Chen et al., 2023) and LDB (Zhong et al., 2024) typically adopt a fixed reasoning depth. On simple tasks, they introduce unnecessarily complex workflows, leading to redundant computation and excessive token usage, while on difficult tasks, the shallow reasoning depth results in poor success rates. Although hierarchical prompting has been shown to mitigate unnecessary reasoning (Budagam et al., 2025), these approaches still lack a principled mechanism to adapt reasoning depth dynamically to task complexity.

Second, current pipelines utilize a single debugging iteration. When initial outputs diverge significantly from the target, systems are prone to local minima. Though advanced reasoning paradigms such as Chain-of-Thought (Wei et al., 2022), Tree-of-Thoughts (Yao et al., 2023a), and parallel candidate exploration (Li et al., 2025) enhance complex reasoning, they lack explicit discussion–decision
phases that aggregate diverse reasoning trajectories for improved synthesis.

Figure 2: Overview of SEMAG. (1) Self-Evolve: Agents dynamically select optimal backbone LLMs per task requirements. (2) Plan: the Planning Agent creates solution plans validated by the Plan Verifying Agent through I/O simulation. (3) Debug: the Coding Agent generates code; upon failure, specialized agents (Embedding Trace, Code Explaining, Suggesting, Debugging) collaboratively refine it using trace logs. (4) Debate: when debugging stalls, Debating Agents propose alternatives, with a Discriminating Agent selecting the optimal configuration.

Third, most systems are tightly coupled to a single backbone model. Frameworks built on GPT (Achiam et al., 2023; Hurst et al., 2024), Gemini (Team et al., 2023, 2024), or Claude (Anthropic, 2024) typically depend on a static model throughout execution. As task characteristics shift or new models emerge, backbone switching often requires manual intervention, limiting adaptability and scalability.

To address these challenges, we propose SEMAG, a Self-Evolutionary Multi-Agent code Generation framework.
Our contributions are summarized as follows:

• Adaptive hierarchical prompting: We propose a dynamic strategy that adjusts reasoning depth based on task complexity.

• Collaborative self-evolution: We introduce a discussion–decision module enabling escape from local optima and adaptive backbone switching.

• Empirical gains: SEMAG achieves state-of-the-art performance on seven benchmarks. With controlled backbone comparison, SEMAG improves 3.3% over the previous best method on CodeContests; with self-evolutionary model selection, it further reaches 52.6%.

We evaluate SEMAG across seven text-to-code benchmarks, including four foundational datasets (HumanEval, MBPP, HumanEval-ET, MBPP-ET) and three competition-level benchmarks (APPS, LiveCode, CodeContests). Experimental results show that SEMAG achieves new state-of-the-art performance, including 98.8% Pass@1 (Chen et al., 2021; Dong et al., 2024) on HumanEval, 87.6% on MBPP, and 65.0% on LiveCode. Most notably, on the most challenging dataset, CodeContests, SEMAG achieves 38.0% Pass@1 accuracy with GPT-4o (a 3.3% improvement over LPW under the same backbone). When augmented with self-evolutionary model selection that automatically identifies the optimal backbone, SEMAG further reaches 52.6%. These results demonstrate that SEMAG achieves superior performance and resource efficiency, while offering strong adaptability to evolving programming tasks.

2 Related Work

2.1 Traditional Approaches to Program Synthesis

Program synthesis has a long-standing research foundation in artificial intelligence (Waldinger and Lee, 1969; Manna and Waldinger, 1971). Traditional methods leverage search strategies and data flow analysis (McCarthy, 1978). Early efforts aimed to advance automatic programming and to identify viable approaches (Balzer, 1985; Soloway, 1986) or explore large program spaces through domain-specific languages (Mernik et al., 2005; Gu et al.
, 2021). These approaches struggle with generalization and scalability due to search space complexity.

2.2 Large Language Models for Code Synthesis

Pretrained language models have enhanced code synthesis, with specialized models such as Qwen2.5-Coder (Hui et al., 2024), CodeLLaMA-2 (Roziere et al., 2023), Mistral (Jiang et al., 2024a), and DeepSeek-v3 (Liu et al., 2024) excelling in programming tasks. General-purpose models, including GPT (Achiam et al., 2023; Hurst et al., 2024), Gemini (Team et al., 2023, 2024), and Claude (Anthropic, 2024), also demonstrate robust code generation capabilities. However, these models still face challenges related to syntactic correctness, semantic alignment, generation robustness, and version conflicts. As a result, more refined control and evaluation mechanisms for code generation are necessary.

2.3 Prompting and Debugging Techniques

Researchers have proposed various prompting and debugging techniques to improve code generation. Prompting strategies generally fall into three categories: retrieval-based (Islam et al., 2024), planning-based (Yao et al., 2023b), and debugging-based (Chen et al., 2023) approaches. These aim to guide LLMs in decomposing complex tasks into manageable parts through step-by-step reasoning. Techniques such as Chain-of-Thought (Wei et al., 2022), Tree-of-Thoughts (Yao et al., 2023a), and cumulative reasoning mimic human problem-solving paths, significantly enhancing model performance on complex tasks (Zhou et al., 2022; Zhang et al., 2023). More advanced methods simulate the software development process by constructing multiple candidate programs and exploring the solution space in parallel (Li et al., 2025; Antoniades et al., 2025).

Debugging systems such as Self-Debugging (Chen et al., 2023) and LDB (Zhong et al.
, 2024) iteratively refine code using model explanations, execution, and human feedback. However, their effectiveness decreases when the initial code diverges from the intended function. To improve generation quality with limited supervision, some methods break down the coding task by incorporating visible test cases, step-by-step verification (Hu et al., 2025; Li and Yuan, 2024; Mathews and Nagappan, 2024), and natural language instructions to improve controllability and alignment.

Previous methods either fix reasoning depth, wasting compute on simple tasks and underperforming on hard ones, or rely on a single LLM, limiting recovery from failures. SEMAG tackles both with three mechanisms: a hierarchical controller that scales from one-shot generation to multi-step planning based on feedback; a discussion–decision phase where agents critique and merge solutions to avoid local optima; and an automatic model selector that switches to a more capable backbone, boosting Pass@1 accuracy as difficulty rises.

3 Method

We present a hierarchical multi-agent framework for code synthesis that adapts to task complexity through progressive refinement levels, coupled with a self-evolution mechanism for dynamic model selection. The overview of SEMAG is shown in Figure 2.

3.1 Problem Formulation

We define a code generation task as T = (P, S, C), where P ∈ 𝒫 is the problem description, S = {(x_i, y_i)}_{i=1}^{n} are input-output examples, and C is the program space. The core agent operations are:

  CODER: 𝒫 × S × Π × Θ → C,
  PLANNER: 𝒫 × S → Π,
  VERIFIER: Π × 𝒫 × S → {0, 1} × Π × L,
  DEBUGGER: C × Σ → C,    (1)

where Π is the plan space, Θ the parameter space, L the log space, and Σ the suggestion space. Additional agents include EMBEDTRACE (C → T), EXPLAINER (C × 𝒫 → E), and SUGGESTOR (T × L × E → Σ).
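For concreteness, the agent operations above can be transcribed as Python type signatures. This is an illustrative sketch only; the type aliases (Problem, Plan, Trace, and so on) are our placeholder stand-ins for the paper's abstract spaces, not a released API:

```python
from typing import Callable, NamedTuple

# Placeholder types mirroring the formulation: P (problem), S (examples),
# Pi (plans), Theta (parameters), C (programs), L (logs), Sigma (suggestions).
Problem = str
Examples = list[tuple[str, str]]   # S = {(x_i, y_i)}
Plan = str
Params = dict
Program = str
Logs = str
Suggestion = str
Trace = str
Explanation = str

class VerifierResult(NamedTuple):
    ok: bool      # ν ∈ {0, 1}: verification status
    plan: Plan    # refined plan π
    logs: Logs    # verification logs ℓ

# Core agent operations as callable signatures (Eq. 1).
Coder = Callable[[Problem, Examples, Plan, Params], Program]
Planner = Callable[[Problem, Examples], Plan]
Verifier = Callable[[Plan, Problem, Examples], VerifierResult]
Debugger = Callable[[Program, Suggestion], Program]

# Auxiliary agents.
EmbedTrace = Callable[[Program], Trace]
Explainer = Callable[[Program, Problem], Explanation]
Suggestor = Callable[[Trace, Logs, Explanation], Suggestion]
```

Any concrete agent (for instance, an LLM-backed coder) then only needs to satisfy the corresponding callable signature.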
3.2 Hierarchical Code Synthesis Framework

Our framework employs a four-level hierarchical architecture that progressively increases computational effort based on task complexity.

Level 1 (Direct Generation): The system initially attempts direct code synthesis using minimal prompting:

  Y = CODER(P, S, ∅, ∅),    (2)

where ∅ indicates no plan or parameters.

Algorithm 1 Hierarchical workflow of SEMAG
Input: Problem P, examples S
Output: Program Y
 1: Y ← CODER(P, S)                        ▷ Level 1
 2: if TEST(Y, S) then return Y
 3: end if
 4: π ← PLANNER(P, S)                      ▷ Level 2
 5: for i = 1 to M_plan do
 6:   (ν, π, ℓ) ← VERIFIER(π, P, S)
 7:   if ν = 1 then break
 8:   end if
 9: end for
10: Y ← CODER(P, S, π)
11: if TEST(Y, S) then return Y
12: end if
13: for t = 1 to M_try do                  ▷ Level 3
14:   τ_prev ← ∅
15:   for d = 1 to M_debug do
16:     τ ← EMBEDTRACE(Y)
17:     σ ← SUGGESTOR(τ, ℓ, EXPLAINER(Y, P))
18:     Y ← DEBUGGER(Y, σ)
19:     if TEST(Y, S) then return Y
20:     end if
21:     if ρ(τ, τ_prev) > δ(d, T) then break
22:     end if
23:     τ_prev ← τ
24:   end for
25:   H ← {DEBATER_j(P, τ, Y)}_{j=1}^{N_debater}   ▷ Level 4
26:   Y ← CODER(P, S, DECIDER(H))
27:   if TEST(Y, S) then return Y
28:   end if
29: end for
30: return Y

Level 2 (Planning and Verification): Upon Level 1 failure, the system generates and iteratively refines a structured solution plan. The planning process operates as:

  π_0 = PLANNER(P, S),    (3)

followed by iterative verification:

  (ν_i, π_i, ℓ_i) = VERIFIER(π_{i−1}, P, S),  i ∈ [1, M_plan],    (4)

where ν_i ∈ {0, 1} indicates the verification status, π_i is the refined plan, and ℓ_i contains verification logs. The process terminates when ν_i = 1 or i = M_plan, with the final plan π* guiding code generation:

  Y = CODER(P, S, π*, ∅).    (5)

Level 3 (Trace-Guided Debugging): When Level 2 fails, the system enters an iterative debugging phase with K_pass passes and M_try attempts per pass.
For each attempt, the debugging process consists of:

  τ = EMBEDTRACE(Y),
  ϵ = EXPLAINER(Y, P),
  σ = SUGGESTOR(τ, ℓ*, ϵ),
  Y′ = DEBUGGER(Y, σ).    (6)

This process repeats for M_debug iterations, where τ captures runtime variable states, ϵ provides semantic analysis, and σ synthesizes targeted modifications.

Level 4 (Multi-Agent Collaborative Refinement): When iterative debugging stalls, the system employs collaborative multi-agent discussion. Each of N_debater agents generates proposals incorporating the discussion history:

  d_j = DEBATER_j(P, τ, Y, H_{j−1}),  j ∈ [1, N_debater],    (7)

where H_{j−1} = {d_1, ..., d_{j−1}} represents the accumulated discussion history. The decision aggregation employs weighted consensus:

  (α*, θ*) = argmax_{(α, θ)} Σ_{j=1}^{N_debater} w_j · ϕ(d_j, α, θ),
  w_j = exp(η_j / τ_w) / Σ_k exp(η_k / τ_w),    (8)

where η_j represents historical performance and ϕ evaluates proposal alignment.

3.3 Adaptive Level Transition Mechanism

Rather than using fixed iteration thresholds, we employ an adaptive transition mechanism based on execution trace similarity.
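As a minimal, illustrative sketch of this mechanism (helper names are ours, not from a released implementation), the transition test compares successive serialized execution traces using normalized edit distance and escalates when similarity exceeds a complexity-scaled, decaying threshold, following the ρ and δ defined in this subsection:

```python
import math

def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via dynamic programming (row-by-row)."""
    m, n = len(a), len(b)
    prev = list(range(n + 1))
    for i in range(1, m + 1):
        cur = [i] + [0] * n
        for j in range(1, n + 1):
            cur[j] = min(prev[j] + 1,                           # deletion
                         cur[j - 1] + 1,                        # insertion
                         prev[j - 1] + (a[i - 1] != b[j - 1]))  # substitution
        prev = cur
    return prev[n]

def trace_similarity(t: str, t_prev: str) -> float:
    """ρ(τ_t, τ_{t−1}) = 1 − EditDist(τ_t, τ_{t−1}) / max(|τ_t|, |τ_{t−1}|)."""
    if not t and not t_prev:
        return 1.0
    return 1.0 - edit_distance(t, t_prev) / max(len(t), len(t_prev))

def threshold(t: int, t_max: int, complexity: float,
              delta0: float = 0.85, lam: float = 0.5) -> float:
    """δ(t, T) = δ0 · exp(−λ · (t / T_max) · complexity(T))."""
    return delta0 * math.exp(-lam * (t / t_max) * complexity)

def should_transition(trace: str, prev_trace: str,
                      t: int, t_max: int, complexity: float) -> bool:
    """Escalate to the next level when successive traces stagnate."""
    return trace_similarity(trace, prev_trace) > threshold(t, t_max, complexity)
```

Intuitively, nearly identical consecutive traces mean the debugger is no longer making progress, so the controller moves to the next (more expensive) level; the threshold decays over iterations so that harder, longer-running tasks are given more slack before escalation.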
The transition decision is formulated as:

  Transition(t) = True if ρ(τ_t, τ_{t−1}) > δ(t, T), and False otherwise,    (9)

where ρ measures trace similarity using normalized edit distance:

  ρ(τ_t, τ_{t−1}) = 1 − EditDist(τ_t, τ_{t−1}) / max(|τ_t|, |τ_{t−1}|).    (10)

The adaptive threshold δ(t, T) adjusts based on task complexity and iteration count:

  δ(t, T) = δ_0 · exp(−λ · (t / T_max) · complexity(T)),    (11)

where δ_0 = 0.85 is the initial threshold, λ = 0.5 is the decay rate, t ∈ [1, T_max] is the current iteration count within the active level, and T_max represents the maximum number of iterations before a mandatory level transition.

Table 1: Pass@1 accuracy comparison of different methods using GPT-3.5 on code generation benchmarks. The values in parentheses represent the improvement over the Direct prompting approach. The standard deviation (±) is calculated over three independent runs and applies to subsequent experiments as well.

  Method        | HumanEval    | MBPP         | HumanEval-ET | MBPP-ET
  Direct        | 72.0% ± 1.2% | 55.2% ± 0.8% | 62.8% ± 0.6% | 45.6% ± 0.6%
  Self-Planning | 77.4% ± 1.8% | 69.2% ± 0.4% | 69.5% ± 0.6% | 52.4% ± 1.0%
  MapCoder      | 77.4% ± 0.6% | 72.0% ± 0.6% | 66.5% ± 1.2% | 56.6% ± 0.8%
  LDB           | 81.1% ± 0.6% | 72.4% ± 0.2% | 72.6% ± 1.8% | 55.6% ± 0.4%
  LPW           | 89.0% ± 0.8% | 76.0% ± 0.2% | 77.4% ± 0.8% | 57.6% ± 0.2%
  SEMAG (Ours)  | 91.5% ± 1.8% | 76.2% ± 0.8% | 79.9% ± 0.6% | 64.4% ± 0.4%
                | (+27.1%)     | (+38.0%)     | (+27.2%)     | (+41.2%)

3.4 Self-Evolution Mechanism

To enable dynamic adaptation to evolving LLMs, we propose an automated model selection framework employing N_selectors parallel agents. Each selector i performs four operations. First, it generates task-specific keywords κ_i = KEYWORDGEN(T, context) and retrieves recent information L_i = SEARCH(κ_i) using search tools. Then, relevant links are filtered and summarized:

  L′_i = {l ∈ L_i : relevance(l, T) > θ_r},    (12)

  C_i = ⋃_{ℓ ∈ L′_i} SUMMARIZE(ℓ).    (13)

Third, each selector proposes a model m_i with a confidence score:

  (m_i, r_i, s_i) = SELECTOR(C_i, Perf(m_i, T_sample)),    (14)

where s_i reflects sampled performance on a task subset T_sample. Finally, consensus is achieved through weighted voting:

  m* = argmax_{m ∈ M} Σ_{i=1}^{N_selectors} s_i · 𝟙[m_i = m].    (15)

This mechanism ensures optimal model selection without manual intervention while maintaining adaptability to emerging LLMs.

4 Experiments

4.1 Experimental Setup

Evaluation Datasets. We evaluate SEMAG on seven text-to-code benchmarks across two categories. The foundational datasets include HumanEval (Chen et al., 2021) and HumanEval-ET (164 problems each), and MBPP (Austin et al., 2021) and MBPP-ET (500 problems each). The ET variants (Dong et al., 2025) extend their counterparts with additional edge test cases. For MBPP/MBPP-ET, which lack sample input-output pairs, we follow previous work (Zhong et al., 2024; Lei et al., 2024) by randomly selecting one test case from the hidden test set as a sample (excluded from evaluation). The competition-level datasets consist of APPS (Hendrycks et al., 2021) (139 problems), LiveCode (Jain et al., 2025) (140 problems), and CodeContests (Li et al., 2022) (150 problems). LiveCode, released after the LLM training cutoff, ensures uncontaminated evaluation.

Baseline Methods. We compare SEMAG against several baseline approaches: Direct inputs tasks directly into an LLM; Self-Planning (Jiang et al., 2024b) decomposes tasks into subgoals; MapCoder (Islam et al., 2024) employs four agents for retrieval, planning, execution, and debugging; LDB (Zhong et al., 2024) utilizes control flow diagrams for program decomposition and error localization; and LPW (Lei et al., 2024), the state-of-the-art approach, verifies plans step-by-step and uses print statements for debugging.

4.2 Main Results

Comparison with Baselines.
Tables 1 and 2 present results using GPT-3.5 and GPT-4o as backbone models. With GPT-3.5, SEMAG achieves the highest Pass@1 accuracy across all benchmarks, outperforming the strongest baseline LPW by 2.5%, 0.2%, 2.5%, and 6.8% on HumanEval, MBPP, HumanEval-ET, and MBPP-ET, respectively. Using GPT-4o, SEMAG establishes new state-of-the-art results across all seven benchmarks, achieving 98.8% accuracy on HumanEval (solving 162/164 problems).

Table 2: Pass@1 accuracy comparison of different methods using GPT-4o (2024-05-13) across multiple benchmarks. The values in parentheses represent the improvement over the Direct prompting approach.

  Method        | HumanEval    | MBPP         | HumanEval-ET | MBPP-ET
  Direct        | 91.5% ± 1.8% | 62.8% ± 0.4% | 79.3% ± 1.2% | 51.0% ± 0.2%
  LDB           | 92.1% ± 1.2% | 82.4% ± 0.8% | 81.7% ± 1.8% | 65.4% ± 1.0%
  LPW           | 98.2% ± 0.6% | 84.8% ± 0.6% | 84.8% ± 1.2% | 65.8% ± 0.8%
  SEMAG (Ours)  | 98.8% ± 0.6% | 87.6% ± 0.4% | 86.6% ± 0.6% | 71.8% ± 0.2%
                | (+8.0%)      | (+38.9%)     | (+9.2%)      | (+40.8%)

  Method        | APPS         | LiveCode     | CodeContests | Overall Avg.
  Direct        | 47.5% ± 0.3% | 46.4% ± 0.8% | 24.6% ± 1.3% | 57.6%
  LDB           | 53.2% ± 0.7% | 54.3% ± 0.7% | 29.3% ± 0.7% | 65.5%
  LPW           | 62.6% ± 0.3% | 59.3% ± 1.4% | 34.7% ± 0.7% | 70.0%
  SEMAG (Ours)  | 67.6% ± 0.8% | 65.0% ± 0.7% | 38.0% ± 1.3% | 73.6%
                | (+42.3%)     | (+40.1%)     | (+54.5%)     | (+27.7%)

Table 3: Distribution of prompt difficulty levels across multiple benchmarks using GPT-4o (2024-05-13).

  Level   | HumanEval | MBPP | HumanEval-ET | MBPP-ET | APPS | LiveCode | CodeContests
  Level 1 | 148       | 314  | 130          | 255     | 66   | 65       | 37
  Level 2 | 8         | 18   | 6            | 10      | 9    | 16       | 6
  Level 3 | 4         | 48   | 2            | 46      | 7    | 4        | 5
  Level 4 | 4         | 120  | 26           | 189     | 57   | 55       | 102

Figure 3: Pass@1 accuracy on CodeContests using GPT-4o (2024-05-13), GPT-4.1 (2025-04-14), DeepSeek-v3 (2025-03-24), and Claude-3.7-Sonnet (2025-02-19): 38.0%, 48.7%, 48.7%, and 52.6%, respectively.
Compared to LPW, SEMAG demonstrates consistent improvements of 1.8–6.0% on foundational benchmarks and 3.3–5.7% on competition-level benchmarks, with particularly significant gains of 40–54% over Direct prompting.

Self-Evolution Agents in Code Tasks. To evaluate self-evolution capability, we deploy agents on the CodeContests benchmark to select optimal LLMs autonomously. Agents analyze real-time information to identify three candidate models: Claude-3.7-Sonnet, GPT-4.1, and DeepSeek-v3. Figure 3 shows that Claude-3.7-Sonnet achieves 52.6% Pass@1 accuracy, establishing a new state-of-the-art and significantly outperforming GPT-4o's 38.0%. GPT-4.1 and DeepSeek-v3 both achieve 48.7%, demonstrating that the self-evolution mechanism effectively identifies and evaluates task-optimized models for continuous improvement.

4.3 Ablation Studies and Analyses

Token Efficiency Analysis. Table 3 presents the distribution of prompt difficulty levels (1–4, indicating increasing complexity) across benchmarks using GPT-4o. Simpler datasets (HumanEval, MBPP) predominantly use Level 1 prompts (90.2% and 62.8%, respectively), while complex datasets (APPS, CodeContests) require more Level 3–4 prompts (46.0% and 71.3%, respectively). Figure 4 compares token consumption between LPW and SEMAG. Our hierarchical prompt strategy reduces token usage while improving accuracy across all datasets. On simpler tasks (HumanEval, MBPP), SEMAG achieves 19.3% and 15.5% token reduction compared to LPW, respectively. For complex tasks (APPS, CodeContests), where Level 4 prompts dominate, token reduction is 9.3% and 5.1%, respectively, constrained by inherent task complexity. This demonstrates that SEMAG's hierarchical decomposition effectively optimizes both performance and efficiency.

Table 4: Pass@1 accuracy of different component combinations in SEMAG, showing relative decreases from the full implementation (91.5% baseline). Results obtained using GPT-3.5 on the HumanEval benchmark.

  Plan Verification | Refine Suggestion | Discussion and Decision | Pass@1 accuracy
  ×                 | ×                 | ×                       | 71.9% (−21.4%)
  ✓                 | ×                 | ×                       | 77.4% (−15.4%)
  ×                 | ✓                 | ×                       | 80.5% (−12.0%)
  ×                 | ×                 | ✓                       | 81.7% (−10.7%)
  ×                 | ✓                 | ✓                       | 83.5% (−8.7%)
  ✓                 | ×                 | ✓                       | 83.5% (−8.7%)
  ✓                 | ✓                 | ×                       | 82.9% (−9.4%)
  ✓                 | ✓                 | ✓                       | 91.5%

Figure 4: Comparison of Pass@1 accuracy and average token count per question for LPW and SEMAG across benchmarks, using GPT-4o as the LLM backbone. Here, K = 10^3. For example, on HumanEval SEMAG reaches 98.8% versus LPW's 98.2%, and on CodeContests 38.0% versus LPW's 34.7%, at lower token cost.

Impact of Different Agents. We conduct an ablation study on HumanEval using GPT-3.5 to evaluate each agent's contribution. As shown in Table 4, excluding any component reduces Pass@1 accuracy. Individual agents achieve limited improvements: Plan Verification alone reaches 77.4% (+5.5% over the 71.9% baseline), Refine Suggestion 80.5%, and Discussion and Decision 81.7%. Dual-agent configurations perform better (82.9%–83.5%) but remain 8.7%–9.4% below the full implementation. The complete SEMAG achieves 91.5% Pass@1, demonstrating the synergistic importance of all three components.

Impact of Tool Use. In the planning stage, the planning agent can choose to utilize external tools, such as search engines, to enhance decision-making. We conduct an experiment on the HumanEval benchmark with GPT-3.5. Table 5 shows that when the planning agent uses tools, SEMAG achieves a Pass@1 accuracy of 91.5%. Without tools, the accuracy decreases to 87.8%. This 3.7% decline emphasizes the importance of external tools in planning.
The results demonstrate that these tools help the planning agent access more relevant information, improving the quality of plans and SEMAG's overall performance.

Table 5: Pass@1 accuracy of SEMAG with and without tool usage in the planning stage. Results are obtained using GPT-3.5 on the HumanEval benchmark.

  With Tool Use | Without Tool Use
  91.5%         | 87.8%

Analysis of Self-Evolution Agents. To calibrate the crawl depth of the self-evolution agents, we vary the number of returned pages, N_links ∈ {10, 15, 20, 25, 30}, while fixing all other variables (five random seeds, identical search prompts, temperature = 0.1). After summarizing the first N URLs (published ≤ 30 days ago), the agents ranked the evidence and proposed 3 candidate LLMs for the given code task. Table 6 reports (i) the probability that Claude-3.7-Sonnet appears in the Top-3 list, (ii) average token consumption during summarization and reasoning, and (iii) end-to-end selection latency, all averaged over the five seeds.

Table 6: Impact of crawl depth on the probability (%) of discovering Claude-3.7-Sonnet in the Top-3 and the associated resource costs (averaged over five runs, 30-day window).

  N_links | Pr (%) ↑ | Tokens (K) ↓ | Latency (min) ↓
  10      | 40.0     | 30.4         | 3.5
  15      | 60.0     | 39.1         | 4.6
  20      | 80.0     | 45.9         | 6.0
  25      | 80.0     | 65.2         | 7.8
  30      | 80.0     | 78.3         | 9.2

The results show that shallower crawls of 10–15 pages often miss key benchmark posts, yielding at most a 60% probability of identifying Claude-3.7-Sonnet and defaulting to weaker models, albeit at lower cost. Scaling to N_links = 20 raises the discovery probability to 80% with modest overhead (45.9K tokens, 6 minutes). Further increases add little value but inflate costs by 30–55%. This highlights uncertainties in search-dependent model selection: online information may be incomplete or biased due to search algorithms, recency effects, or uneven coverage.
In our experiments, insufficient depth (N_links ≤ 15) omitted Claude-3.7-Sonnet in up to 60% of runs, risking suboptimal choices. Thus, N_links = 20 balances reliability and efficiency, ensuring top performers are captured while minimizing resources.

Parameter Details. We experiment on how different LLM temperatures influence the accuracy of SEMAG. Figure 5 shows the variation in Pass@1 accuracy on the HumanEval benchmark using GPT-3.5. The highest mean Pass@1 accuracy (91.1%) is achieved at T = 0.1 and T = 0.8, with T = 0.1 exhibiting the lowest variance. To improve the reproducibility and consistency of our experimental results, we maintain a constant temperature of T = 0.1 throughout all stages of SEMAG.

Figure 5: Pass@1 accuracy (right y-axis) and its variance (left y-axis, scaled by 10^-4) on the HumanEval benchmark using GPT-3.5 as the backbone, measured over three independent runs for each temperature setting (0.1 to 1.0).

To further quantify the influence of the number of candidate generations (M_try) and debugging iterations (M_debug), we conduct a grid search over (M_try, M_debug) ∈ {0, 1, ..., 6}^2. Figure 6 shows the variation in Pass@1 accuracy on the HumanEval benchmark using GPT-3.5. Increasing either M_try or M_debug consistently improves performance. Starting from (0, 0), where only 71.3% accuracy is achieved, Pass@1 accuracy increases steadily with higher values of both parameters.

Figure 6: Pass@1 accuracy on the HumanEval benchmark with GPT-3.5 as the backbone, evaluated under different combinations of M_try and M_debug values. Each cell represents the mean Pass@1 accuracy for a specific parameter pair.

The performance begins to plateau near (M_try = 5, M_debug = 4), where SEMAG reaches 91.5%, representing a near-optimal balance between solution diversity and iterative refinement. Although the highest accuracy observed (92.1%) occurs at (5, 6), the gain over (5, 4) is minimal and comes with increased inference costs. As a result, we set M_try = 5 and M_debug = 4 for all subsequent experiments, as these values empirically optimize SEMAG's performance.

5 Conclusion

We introduce SEMAG, a Self-Evolutionary Multi-Agent framework designed for code generation. By employing a division of labor with hierarchical prompting mechanisms, the coding agents of SEMAG significantly enhance the performance of LLMs across diverse programming tasks. The self-evolutionary agents of SEMAG can access the latest models in real time and automatically upgrade the backbone model. SEMAG achieves state-of-the-art Pass@1 accuracy across seven benchmarks, including 98.8% on HumanEval, 87.6% on MBPP, and 38.0% on CodeContests, while substantially reducing computational resource overhead and token consumption. With a controlled backbone, SEMAG improves 3.3% over LPW on CodeContests. With self-evolutionary model selection, it further reaches 52.6%, demonstrating the benefit of adaptive backbone switching.
Future work will explore finer-grained decomposition, cross-modal collaboration, and efficient model selection strategies.

6 Limitations

Among the limitations of our work, firstly, SEMAG involves inference-time hyperparameters (M_try and M_debug) that affect the trade-off between accuracy and cost; however, our experiments in Section 4.3 identify a stable configuration that generalizes across benchmarks, and adaptive tuning strategies are left for future work. Secondly, the hierarchical multi-agent design invests more computation in challenging problems through iterative refinement, which may increase latency in time-sensitive scenarios; our adaptive level transition mechanism partially addresses this by reducing token consumption by 15–20% on simpler tasks compared to fixed-depth baselines. Thirdly, the self-evolutionary model selection component relies on real-time information retrieval to identify optimal backbones; we note that this module is optional, as the core framework operates independently with any fixed model, as shown in Table 1 and Table 2. Offline model recommendation could be explored in future work. Finally, as with any system executing machine-generated code, running outputs inside a sandbox environment is advisable to mitigate potential security risks.

References

Josh Achiam, Steven Adler, Sandhini Agarwal, Lama Ahmad, Ilge Akkaya, Florencia Leoni Aleman, Diogo Almeida, Janko Altenschmidt, Sam Altman, Shyamal Anadkat, and 1 others. 2023. GPT-4 technical report. Preprint.

Anthropic. 2024. Introducing the next generation of Claude. https://www.anthropic.com/news/claude-3-family. Accessed: 2024-03-04.

Antonis Antoniades, Albert Örwall, Kexun Zhang, Yuxi Xie, Anirudh Goyal, and William Wang. 2025. SWE-search: Enhancing software agents with Monte Carlo tree search and iterative refinement. In International Conference on Representation Learning, volume 2025, pages 64485–64515.
Jacob Austin, Augustus Odena, Maxwell Nye, Maarten Bosma, Henryk Michalewski, David Dohan, Ellen Jiang, Carrie Cai, Michael Terry, Quoc Le, and 1 others. 2021. Program synthesis with large language models. Preprint.

R. Balzer. 1985. A 15 year perspective on automatic programming. IEEE Transactions on Software Engineering, 11(11):1257–1268.

Devichand Budagam, Ashutosh Kumar, Mahsa Khoshnoodi, Sankalp KJ, Vinija Jain, and Aman Chadha. 2025. Hierarchical prompting taxonomy: A universal evaluation framework for large language models aligned with human cognitive principles. In First International KDD Workshop on Prompt Optimization, 2025.

Mark Chen, Jerry Tworek, Heewoo Jun, Qiming Yuan, Henrique Ponde De Oliveira Pinto, Jared Kaplan, Harri Edwards, Yuri Burda, Nicholas Joseph, Greg Brockman, and 1 others. 2021. Evaluating large language models trained on code. Preprint.

Xinyun Chen, Maxwell Lin, Nathanael Schärli, and Denny Zhou. 2023. Teaching large language models to self-debug. Preprint, arXiv:2304.05128.

Yihong Dong, Jiazheng Ding, Xue Jiang, Ge Li, Zhuo Li, and Zhi Jin. 2025. CodeScore: Evaluating code generation by learning code execution. ACM Transactions on Software Engineering and Methodology, 34(3):1–22.

Yihong Dong, Xue Jiang, Zhi Jin, and Ge Li. 2024. Self-collaboration code generation via ChatGPT. ACM Transactions on Software Engineering and Methodology, 33(7):1–38.

Yu Gu, Robert Tinn, Hao Cheng, Michael Lucas, Naoto Usuyama, Xiaodong Liu, Tristan Naumann, Jianfeng Gao, and Hoifung Poon. 2021. Domain-specific language model pretraining for biomedical natural language processing. ACM Transactions on Computing for Healthcare (HEALTH), 3(1):1–23.

Dan Hendrycks, Steven Basart, Saurav Kadavath, Mantas Mazeika, Akul Arora, Ethan Guo, Collin Burns, Samir Puranik, Horace He, Dawn Song, and Jacob Steinhardt. 2021. Measuring coding challenge competence with APPS.
In Proceedings of the Neural Information Processing Systems Track on Datasets and Benchmarks, volume 1.

Yaojie Hu, Qiang Zhou, Qihong Chen, Xiaopeng Li, Linbo Liu, Dejiao Zhang, Amit Kachroo, Talha Oz, and Omer Tripp. 2025. QualityFlow: An agentic workflow for program synthesis controlled by LLM quality checks. Preprint.

Binyuan Hui, Jian Yang, Zeyu Cui, Jiaxi Yang, Dayiheng Liu, Lei Zhang, Tianyu Liu, Jiajun Zhang, Bowen Yu, Keming Lu, and 1 others. 2024. Qwen2.5-Coder technical report. Preprint.

Aaron Hurst, Adam Lerer, Adam P. Goucher, Adam Perelman, Aditya Ramesh, Aidan Clark, AJ Ostrow, Akila Welihinda, Alan Hayes, Alec Radford, and 1 others. 2024. GPT-4o system card. Preprint.

Md Ashraful Islam, Mohammed Eunus Ali, and Md Rizwan Parvez. 2024. MapCoder: Multi-agent code generation for competitive problem solving. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 4912–4944.

Naman Jain, King Han, Alex Gu, Wen-Ding Li, Fanjia Yan, Tianjun Zhang, Sida Wang, Armando Solar-Lezama, Koushik Sen, and Ion Stoica. 2025. LiveCodeBench: Holistic and contamination-free evaluation of large language models for code. In International Conference on Learning Representations, pages 58791–58831.

Albert Q. Jiang, Alexandre Sablayrolles, Antoine Roux, Arthur Mensch, Blanche Savary, Chris Bamford, Devendra Singh Chaplot, Diego de las Casas, Emma Bou Hanna, Florian Bressand, and 1 others. 2024a. Mixtral of experts. Preprint.

Xue Jiang, Yihong Dong, Lecheng Wang, Zheng Fang, Qiwei Shang, Ge Li, Zhi Jin, and Wenpin Jiao. 2024b. Self-planning code generation with large language models. ACM Transactions on Software Engineering and Methodology, 33(7):1–30.

Chao Lei, Yanchuan Chang, Nir Lipovetzky, and Krista A. Ehinger. 2024. Planning-driven programming: A large language model programming workflow. Preprint, arXiv:2411.14503.
Dacheng Li, Shiyi Cao, Chengkun Cao, Xiuyu Li, Shangyin Tan, Kurt Keutzer, Jiarong Xing, Joseph E. Gonzalez, and Ion Stoica. 2025. S*: Test-time scaling for code generation. Preprint, arXiv:2502.14382.

Kefan Li and Yuan Yuan. 2024. Large language models as test case generators: Performance evaluation and enhancement. Preprint.

Yujia Li, David Choi, Junyoung Chung, Nate Kushman, Julian Schrittwieser, Rémi Leblond, Tom Eccles, James Keeling, Felix Gimeno, Agustin Dal Lago, and 1 others. 2022. Competition-level code generation with AlphaCode. Science, 378(6624):1092–1097.

Aixin Liu, Bei Feng, Bing Xue, Bingxuan Wang, Bochao Wu, Chengda Lu, Chenggang Zhao, Chengqi Deng, Chenyu Zhang, Chong Ruan, and 1 others. 2024. DeepSeek-V3 technical report. Preprint.

Zohar Manna and Richard J. Waldinger. 1971. Toward automatic program synthesis. Communications of the ACM, 14(3):151–165.

Noble Saji Mathews and Meiyappan Nagappan. 2024. Test-driven development for code generation. Preprint.

John McCarthy. 1978. History of Lisp. SIGPLAN Not., 13(8):217–223.

Marjan Mernik, Jan Heering, and Anthony M. Sloane. 2005. When and how to develop domain-specific languages. ACM Computing Surveys (CSUR), 37(4):316–344.

Baptiste Roziere, Jonas Gehring, Fabian Gloeckle, Sten Sootla, Itai Gat, Xiaoqing Ellen Tan, Yossi Adi, Jingyu Liu, Romain Sauvestre, Tal Remez, and 1 others. 2023. Code Llama: Open foundation models for code. Preprint.

Elliot Soloway. 1986. Learning to program = learning to construct mechanisms and explanations. Communications of the ACM, 29(9):850–858.

Gemini Team, Rohan Anil, Sebastian Borgeaud, Jean-Baptiste Alayrac, Jiahui Yu, Radu Soricut, Johan Schalkwyk, Andrew M. Dai, Anja Hauth, Katie Millican, and 1 others. 2023. Gemini: A family of highly capable multimodal models.
Preprint.

Gemini Team, Petko Georgiev, Ving Ian Lei, Ryan Burnell, Libin Bai, Anmol Gulati, Garrett Tanzer, Damien Vincent, Zhufeng Pan, Shibo Wang, and 1 others. 2024. Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context. Preprint.

Richard J. Waldinger and Richard C. T. Lee. 1969. PROW: A step toward automatic program writing. In Proceedings of the 1st International Joint Conference on Artificial Intelligence, pages 241–252.

Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Fei Xia, Ed Chi, Quoc V. Le, Denny Zhou, and 1 others. 2022. Chain-of-thought prompting elicits reasoning in large language models. Advances in Neural Information Processing Systems, 35:24824–24837.

Shunyu Yao, Dian Yu, Jeffrey Zhao, Izhak Shafran, Tom Griffiths, Yuan Cao, and Karthik Narasimhan. 2023a. Tree of Thoughts: Deliberate problem solving with large language models. Advances in Neural Information Processing Systems, 36:11809–11822.

Shunyu Yao, Jeffrey Zhao, Dian Yu, Nan Du, Izhak Shafran, Karthik Narasimhan, and Yuan Cao. 2023b. ReAct: Synergizing reasoning and acting in language models. In International Conference on Learning Representations.

Yifan Zhang, Jingqin Yang, Yang Yuan, and Andrew Chi-Chih Yao. 2023. Cumulative reasoning with large language models. Preprint, arXiv:2308.04371.

Li Zhong, Zilong Wang, and Jingbo Shang. 2024. Debug like a human: A large language model debugger via verifying runtime execution step by step. In Findings of the Association for Computational Linguistics: ACL 2024, pages 851–870.

Denny Zhou, Nathanael Schärli, Le Hou, Jason Wei, Nathan Scales, Xuezhi Wang, Dale Schuurmans, Claire Cui, Olivier Bousquet, Quoc Le, and 1 others. 2022. Least-to-most prompting enables complex reasoning in large language models.
Preprint.

A Analysis of Solving Different Levels

A.1 APPS

APPS is a well-established dataset for evaluating algorithmic problem-solving capabilities, categorising programming problems into three distinct difficulty levels: Introductory, Interview, and Competition. These levels range from basic coding exercises to advanced competitive programming challenges, providing a structured framework to assess the performance of LLM-based methods across varying complexities.

Figure 7: Pass@1 accuracy on the APPS benchmark across different difficulty levels, Introductory, Interview, and Competition, of Direct, LDB, LPW and SEMAG, when using GPT-4o as the LLM backbone.

Figure 7 compares accuracy on the APPS benchmark across different levels of problems. SEMAG demonstrates superior performance at the Introductory and Interview levels, achieving 89.4% and 80.4% respectively, a significant margin over existing approaches. Specifically, SEMAG surpasses the next-best LPW approach by 2.2% at the Introductory level and establishes a notable 15.2% advantage at the Interview level. However, at the Competition level, SEMAG (32.6%) shows slightly reduced effectiveness compared to LPW's 34.8%, suggesting room for optimization on competition-grade problems: the hierarchical prompting strategy can yield solutions that succeed on visible tests but fail on hidden tests. The baseline Direct exhibits fundamental limitations, particularly in competition contexts (13.0%), while LDB demonstrates moderate improvements over Direct at the Interview (52.2%) and Competition (28.3%) levels. These results collectively highlight SEMAG's strong capability on Introductory- and Interview-level problems.
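All results in this appendix are reported as Pass@1. For reference, a minimal sketch of the standard unbiased pass@k estimator (Chen et al., 2021), of which Pass@1 is the k = 1 special case; the function name is ours:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: n samples generated per problem,
    c of which pass all tests."""
    if n - c < k:
        return 1.0  # every size-k draw contains at least one passing sample
    return 1.0 - comb(n - c, k) / comb(n, k)

# With k = 1 this reduces to the passing fraction c / n:
print(pass_at_k(10, 4, 1))  # 0.4
```

Benchmark-level Pass@1 is then the mean of this per-problem quantity over all problems.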
A.2 LiveCode

The LiveCode benchmark focuses on real-time coding scenarios reflective of practical software development tasks. Its problems are classified into Easy, Medium, and Hard levels, capturing varying degrees of complexity encountered in applied settings.

Figure 8: Pass@1 accuracy on the LiveCode benchmark across different difficulty levels, Easy, Medium, and Hard, of Direct, LDB, LPW and SEMAG, when using GPT-4o as the LLM backbone.

Figure 8 compares accuracy on the LiveCode benchmark across different levels of problems. At the Easy level, both SEMAG and LPW achieve the highest accuracy of 86.7%, which is 6.7% higher than the Direct prompting approach (80.0%). This indicates that both methods possess effective representation capabilities in low-complexity scenarios. At the Medium level, SEMAG demonstrates a significant advantage, achieving an accuracy of 60.0%, surpassing the second-best method, LPW (47.3%), by 12.7%. At the most challenging Hard level, SEMAG continues to lead with an accuracy of 47.5%, outperforming LPW (40.0%) and LDB (35.0%). This validates the strong robustness of SEMAG on extremely complex problems.

B Prompts of SEMAG

Here, we list the prompts of SEMAG in detail as follows.

Plan Agent Prompt

Persona: Your primary role is to decompose complex programming tasks into modular components and provide guidance on how to structure them for reusability and maintainability.

Problem: {problem}

Instructions:
• Break down the coding task into modules and recommend a clear and concise structure for each module.
• Advise on data structures, algorithms, and modularization techniques.
• Guide the Coder Agent step-by-step to implement code.
• You have access to the following tools: {tool_des}.

Output Format:
• Action: The action you take, must be one of [{tool_names}]. (Learn tool usage first, e.g., the parameters of the tool.)
• Action input: The input to the action.
• Final plan: Your concise, structured final plan for the task.

Remember: Strictly follow the JSON Output Format and give only the plan to solve the problem. Avoid extra explanation or words. Tools are optional. If using tools, specify 'Action' and 'Action input', then await output before providing the 'Final plan'. If not using tools, leave 'Action' and 'Action input' empty.

Figure 9: The prompt of Planning Agent.

Plan Verify Agent Prompt

Persona: Your primary role is to verify the solution plan for the given programming task. You must perform a step-by-step analysis of the provided solution plan, record intermediate variable values, and compare the derived results with the expected test outcomes.

Problem: {problem}

Plan: {plan}

Instructions:
• Review the provided solution plan for the given Python problem.
• For each test case, begin by recording all necessary intermediate variables.
• As you process the plan, update any intermediate variable values.
• After executing the verification steps for a test case, compare the derived result with the expected test output.
• If the derived result matches the expected output, output the plan as the 'Final plan' and mark it as correct.
• If the derived result does not match, provide a revised solution plan as the 'Final plan'.
Output Format:
• Verification: A detailed, step-by-step verification process.
• Correctness: Whether the given solution plan is correct (True/False).
• Final plan: The final plan you provide for the task.

Remember: Strictly follow the JSON Output Format and include only the verification process and final plan. Do not include any extra explanations or words.

Figure 10: The prompt of Plan Verifying Agent.

Code Agent Prompt

Persona: Your primary role is to generate Python 3 code to solve the given coding problem based on the given plan and the problem description.

Problem: {problem}

Plan: {plan}

Plan verification: {plan_verification}

Instructions:
• Use the entry point {entry_point} of the problem; do not add a main function.

Output Format:
• Code: The code you generate.

Remember: Start with "```Code" and end with "```". Write all the code into a single .py file. Avoid extra explanation, words, or """ in your output.

Figure 11: The prompt of Coding Agent.

Add Trace Agent Prompt

Persona: Your primary role is to add print statements to the given code to solve the problem.

Code: {code}

Plan verification: {plan_verification}

Instructions:
• Incorporate debug print statements to trace intermediate variable changes as described in the plan verification.

Output Format:
• Code: The program with PRINT statements added to trace variable changes.

Remember: Start with "```Code" and end with "```". Write all the code into a single .py file. Avoid extra explanation, words, or """ in your output.

Figure 12: The prompt of Adding Trace Agent.
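The Coding, Debugging, and Refining agents are all instructed to wrap their output in a ```Code ... ``` fence. A minimal sketch of how a response in that format could be extracted on the framework side; the helper name extract_code is ours, not from the paper:

```python
import re

# Agents are told to start with ```Code and end with ```;
# the delimiters are built programmatically here to keep the source readable.
OPEN = "`" * 3 + "Code"
CLOSE = "`" * 3

def extract_code(response: str) -> str:
    """Return the body of the fenced block in an agent response (hypothetical helper)."""
    match = re.search(re.escape(OPEN) + r"\n(.*?)" + re.escape(CLOSE),
                      response, re.DOTALL)
    if match is None:
        raise ValueError("no fenced code block found in agent response")
    return match.group(1).strip()

response = OPEN + "\ndef decode_shift(s: str):\n    return s\n" + CLOSE
print(extract_code(response))
```

The non-greedy match stops at the first closing fence, so any trailing chatter after the block is ignored rather than swallowed into the code.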
Code Explain Agent Prompt

Persona: Your primary role is to explain each line of a given Python program, describing the effect of each line.

Problem: {problem}

Code: {code}

Instructions:
• You will receive a new problem description and a generated Python program intended to solve the problem.
• Generate a detailed explanation for each line of the provided Python program.

Output Format:
• Code explanation: A detailed explanation for each line of the Python program. Each line's explanation should describe its effect on the program's behaviour.

Remember: Strictly follow the JSON Output Format. Provide only the explanation for the Python program as specified, without extra explanation or additional words.

Figure 13: The prompt of Code Explaining Agent.

Refinement Suggest Agent Prompt

Persona: Your primary role is to verify error execution traces, comparing the provided 'Error Execution Trace' with the 'Correct Plan Verification' for a Python problem and identifying any discrepancies.

Problem: {problem}

Code with Error: {code}

Correct Plan Verification: {plan_verification}

Error Execution Trace: {test_log}

Instructions:
• The 'Error Execution Trace' is the output of the 'Code with Error' when it fails to meet the expected output.
• Compare the 'Error Execution Trace' with the 'Correct Plan Verification' and output the differences and your analysis of the errors.
• Provide a summary of the errors, including reasons for the discrepancies and suggestions for corrections.
Output Format:
• Analysis: Your output including a detailed discrepancy analysis.
• Refine suggestion: Your analysis of the errors and suggestions on how to correct them.

Remember: Strictly follow the JSON Output Format. Provide only the explanation for the Python program as specified, without extra explanation or additional words.

Figure 14: The prompt of Suggesting Agent.

Debug Agent Prompt

Persona: Your primary role is to correct an erroneous Python program based on the provided error analysis and related explanations.

Problem: {problem}

Code with Error: {code}

Code Explanation: {code_explanation}

Refine Suggestion: {refine_suggestion}

Instructions:
• You will be presented with a new problem description, the code with error, its explanation, and analysis of the errors and suggestions on how to correct them.
• Generate the refined Python program based on the 'Refine Suggestion'.
• Output your refined code with only the Python code, and provide a refinement explanation that details the modifications made.

Output Format:
• Refined Explanation: A detailed explanation describing the modifications made to the code.
• Code: The refined code according to the error analysis.

Remember: Start with "```Code" and end with "```". Write all code into a single .py file. Avoid extra explanation, words, or """ in your output.

Figure 15: The prompt of Debugging Agent.

Algorithm Suggest Agent Prompt

Persona: Your primary role is to analyze code failures on sample tests and suggest an improved algorithm, method, or parameters.

Instructions:
• Target problem: {problem}.
• The given code is not working as expected: {test_log}.
• Provide a new algorithm, method, or parameters to address the problem.
• The existing code is: {code}.

Output Format:
• Algorithm: The proposed algorithm or method.
• Parameters: The suggested parameters.
• Reason: Explanation for choosing this algorithm, method, or parameters.

Remember: Strictly follow the JSON Output Format. Avoid additional explanations or text outside the format.

Figure 16: The prompt of Discussing Agent.

Algorithm Decide Agent Prompt

Persona: Your primary role is to decide which algorithm and parameters to use on the problem based on the discussion.

Instructions:
• Evaluate the performance of the algorithm and parameters for the problem {problem} and decide whether to adopt them.
• The given code is not working as expected: {test_log}.
• The discussion is: {discussion}.

Output Format:
• Algorithm: The selected algorithm or method.
• Parameters: The chosen parameters.

Remember: Strictly follow the JSON Output Format. Avoid additional explanations or text outside the format.

Figure 17: The prompt of Discriminating Agent.

Code Refinement Agent Prompt

Instructions:
• Refine the existing code to match the expected output using the algorithm/method {algorithm} and parameters {parameters} provided by the discriminator for improved performance.
• Target problem: {problem}.
• The plan is: {plan}.
• Use the entry point {entry_point} of the problem; do not add a main function.
• Handle edge cases such as invalid inputs, empty values, or boundary conditions.

Remember: Start with "```Code" and end with "```".
Write all code into a single .py file. Avoid extra explanation, words, or """ in your output.

Figure 18: The prompt of Code Refining Agent.

C Prompt of Self-Evolution Agent

Here, we list the prompts of the Self-Evolution agent as follows.

LLM Select Agent Prompt

Persona: Your primary role is to search and decide on the best large language models for the given task.

Date: {date}.

Task: {task}.

Instructions:
• You need to select the 3 best large language models for the task: {task}.
• You have access to the following tools: {tool_des}.

Output Format:
• Action: The action you take, must be one of [{tool_names}]. (Remember to learn how to use the tool first, e.g., the parameters of the tool.)
• Action input: The input to the action.
• Model name: The names of the large language models you select.
• Reason: The reason for selecting the model.

Remember: Strictly follow the JSON Output Format. Avoid additional explanations or text outside the format. You must take the action first, waiting for the output. After receiving the output, you should make a decision on which model is best suited for the task.

Figure 19: The prompt of LLM Selecting Agent.

Link Select Agent Prompt

Persona: Your primary role is to select the 10 links most relevant to the given question.

Question: {question}.

Links with relevant information: {links}.

Instructions:
• You need to select the links most relevant to the question based on the information provided.

Output Format:
• Links: The selected links.

Remember: Strictly follow the JSON Output Format. Avoid additional explanations or text outside the format.
Figure 20: The prompt of Link Selecting Agent.

Content Summary Agent Prompt

Persona: Your primary role is to summarize the content of the given article.

Article: {content}.

Instructions:
• You need to summarize the content of the article.
• The summary should be short and informative.

Output Format:
• Summary: The summary of the article.

Remember: Strictly follow the JSON Output Format. Avoid additional explanations or text outside the format.

Figure 21: The prompt of Content Summarizing Agent.

LLM Decide Agent Prompt

Persona: Your primary role is to decide which large language model is best suited for the given task.

Task: {task}.

Instructions:
• You need to decide which large language model is best suited for the task: {task}.
• The discussion is: {discussion}.

Output Format:
• Model name: The name of the large language model you select.
• Reason: The reason for selecting the model.

Remember: Strictly follow the JSON Output Format. Avoid additional explanations or text outside the format.

Figure 22: The prompt of LLM Deciding Agent.

Model Match Agent Prompt

Persona: Your primary role is to match the given model name with the given model list.

Model name: {model_name}.

Model list: {model_list}.

Instructions:
• You need to find the correct model name from the given list based on the given model name.
• You must use the newest model version (add date of release or '-latest') of the model if available in the model list.

Output Format:
• Model name: The name of the large language model you select.

Remember: Strictly follow the JSON Output Format.
Avoid additional explanations or text outside the format.

Figure 23: The prompt of Model Matching Agent.

D Example Problem

Here, we show how SEMAG works on an example problem (the 51st problem) from the HumanEval benchmark. The detailed prompts and responses are given as follows.

================================User================================
"Persona": "Your primary role is to decompose complex programming tasks into modular components and provide guidance on how to structure them for reusability and maintainability.",
"Problem": "
def encode_shift(s: str):
    """returns encoded string by shifting every character by 5 in the alphabet."""
    return "".join([chr(((ord(ch) + 5 - ord("a")) % 26) + ord("a")) for ch in s])

def decode_shift(s: str):
    """takes as input string encoded with encode_shift function. Returns decoded string."""
",
"Instructions": [
  "Break down the coding task into modules and recommend a clear and concise structure for each module.",
  "Advise on data structures, algorithms, and modularization techniques.",
  "Guide the Coder Agent step-by-step to implement code.",
  "You have access to the following tools: google_search: Use this tool to interact with the Google API. Purpose: Use this tool to perform a Google search. The input must be a valid JSON object. Required Input Format (JSON): {{"search_query": "coffee"}}"
],
"OutputFormat": [
  "Action": "The action you take, must be one of [google_search] (learn tool usage first, e.g.
, the parameters of the tool).",
  "Action input": "The input to the action.",
  "Final plan": "Your concise, structured final plan for the task."
],
"Remember": "Strictly follow the JSON OutputFormat and give only the plan to solve the problem. Avoid extra explanation or words. Tools are optional. If using tools, specify Action and Action input, then await output before providing the Final plan. If not using tools, leave Action and Action input empty."
==============================Assistant==============================
{
  "Action": "google_search",
  "Action input": {"search_query": "how to decode a Caesar cipher"}
}
==============================Search tool==============================
Take the value of 26 minus the shift value, and apply that new value to shift the encoded message back to its original form.
==============================Assistant==============================
{
  "Final plan": "1. Create a function decode_shift(s: str) that takes an encoded string as input. 2. Iterate through each character in the input string. 3. For each character, calculate the new shift value by subtracting 5 from 26. 4. Apply the new shift value to decode the character back to its original form. 5. Return the decoded string."
}

Figure 24: An example of Planning Agent.
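The plan above can be checked directly: shifting back by 5 (equivalently, forward by 26 - 5 = 21) inverts encode_shift. A runnable sketch of the pair, our own illustration built from the problem statement:

```python
def encode_shift(s: str) -> str:
    """Encode a lowercase string by shifting every character forward by 5 (from the problem)."""
    return "".join(chr(((ord(ch) + 5 - ord("a")) % 26) + ord("a")) for ch in s)

def decode_shift(s: str) -> str:
    """Invert encode_shift by shifting every character back by 5; the % 26 wraps
    'a'..'e' around to 'v'..'z'."""
    return "".join(chr(((ord(ch) - 5 - ord("a")) % 26) + ord("a")) for ch in s)

print(encode_shift("fghij"))                # klmno
print(decode_shift(encode_shift("fghij")))  # fghij
```

Python's % operator returns a non-negative result for a positive modulus, so the subtraction needs no special casing at the start of the alphabet.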
================================User================================
"Persona": "Your primary role is to verify the solution plan for the given programming task. You must perform a step-by-step analysis of the provided solution plan, record intermediate variable values, and compare the derived results with the expected test outcomes.",
"Problem": "
def encode_shift(s: str):
    """returns encoded string by shifting every character by 5 in the alphabet."""
    return "".join([chr(((ord(ch) + 5 - ord("a")) % 26) + ord("a")) for ch in s])

def decode_shift(s: str):
    """takes as input string encoded with encode_shift function. Returns decoded string."""
",
"Plan": "1. Create a function decode_shift(s: str) that takes an encoded string as input. 2. Iterate through each character in the input string. 3. For each character, calculate the new shift value by subtracting 5 from 26. 4. Apply the new shift value to decode the character back to its original form. 5. Return the decoded string.",
"Instructions": [
  "Review the provided solution plan for the given Python problem.",
  "For each test case, begin by recording all necessary intermediate variables.",
  "As you process the plan, update any intermediate variable values.",
  "After executing the verification steps for a test case, compare the derived result with the expected test output.
", " I f the der i ved re s u l t m at c h e s the e x pecte d o u tput, o u tpu t the pl an as the Fi n al pl an an d mark it as c o rr e c t . ", " I f the der i ved re s u l t do e s n o t m at c h , pro vi de a revi sed s o l u tio n pl an as the Fi n al pl an . " ], " Ou t putFo rm at " : [ " V e rifi c at i o n " : "A det ai l e d, ste p - by - s te p ver i fi c at i o n pro c e s s . ", " Corre c tnes s " : " T h e gi ven so l u ti o n pl an is c o rr e c t or n o t (T rue/Fal s e ) . " " Fi n al pl an " : " T h e fi n al pl an you pro vi de for the task . " ], " Re m e m ber " : " S tr i c tly f o l l o w the J S O N O u t p u tF o r m at an d i n c l u de o n l y th e ver i fi c at i o n pr o c e s s an d fi n al p l a n . Do n o t i n c l u d e an y e x tr a e x pl an at i o n s or w o rds . " = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = A s s i s ta n t= = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = { " V e rifi c at i o n " : " T e s t Cas e 1 : s = ' fgh i j '" " I n te rm e di at e variab l e s : No n e" " 1 . I terati n g throu gh e ach c h ar acte r in ' fgh i j ' : " " - For ' f' : (( ord (' f' ) + 5 - ord (' a' )) % 26 ) + ord (' a' ) = (( 102 + 5 - 97 ) % 26 ) + 97 = 106 - > 'j '" " - Fo r ' g' : (( o rd (' g') + 5 - o rd (' a' )) % 26 ) + o rd (' a' ) = (( 103 + 5 - 97 ) % 26 ) + 97 = 107 - > 'k '" " - Fo r ' h ' : (( o rd (' h ' ) + 5 - o rd (' a' )) % 26 ) + o rd (' a' ) = (( 104 + 5 - 97 ) % 26 ) + 97 = 108 - > 'l '" " - Fo r ' i ' : (( o rd (' i ') + 5 - o rd (' a' )) % 26 ) + o rd ( ' a' ) = (( 105 + 5 - 97 ) % 26 ) + 97 = 109 - > 'm '" " - Fo r ' j ' : (( o rd (' j ' ) + 5 - o rd (' a' )) % 26 ) + o rd ( ' a' ) = (( 106 + 5 - 97 ) % 26 ) + 97 = 110 - > 'n '" " 2 . Dec o d ed s tr i n g : ' j klmn '" " E x pecte d o u tpu t : ' fgh i j '" " D e riv e d re s u l t m at c h e s the e x pecte d o u tpu t . " " Fi n al pl an : G i ven s o l u tio n pl an is c o rr e c t . 
", " Corre c tnes s " : tr u e , } Figure 25: An example of Plan V erifying Agent. = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = U s e r= = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = " Pers o n a" : " Y o u r pri m ar y rol e is to ge n e ra te p yt h o n 3 c o de to s o l ve gi v e n c o di n g pro b l e m b ased on the gi v e n pl an an d the pro bl e m des c rip tio n . ", " Pro bl e m " : " def e n c o de_ s h i ft (s : s tr ) : """ re turns e n c o ded s tr i n g by sh i fti n g e ver y c h ar acte r by 5 in the al ph ab e t . """ re turn "" . j o i n ([ chr ((( o rd ( ch ) + 5 - o rd ("a " )) % 26 ) + o rd ("a " )) for ch in s]) def deco de_ s h i ft (s : s tr ) : """ ta ke s as i n pu t s tr i n g e n c o ded w i th e n c o de_ s h i ft fun c tio n . Re turns deco ded s tr i n g . """ ", " Pl an " : " 1 . Cre at e a fun c tio n d ec o d e_sh i ft (s : s tr ) that ta ke s an e n c o ded strin g as i n pu t . 2 . I te ra te throu gh e ach c h ar acte r in the i n pu t s tr i n g . 3 . Fo r e ach c h ar acte r, c al c u l at e the n e w s h i ft val u e by s u btra c tin g 5 from 26 . 4 . A pp l y the n e w s h i ft val u e to deco de the c h ar acte r bac k to i ts o rig i n al form . 5 . Re turn the deco ded s tr i n g . ", " Pl an ver i fi c at i o n " : " T e s t Cas e 1 : s = ' fgh i j '" " In t e rm e di at e var i ables : No n e" " 1 . 
Iterating through each character in 'fghij':"
" - For 'f': ((ord('f') + 5 - ord('a')) % 26) + ord('a') = ((102 + 5 - 97) % 26) + 97 = 106 -> 'j'"
" - For 'g': ((ord('g') + 5 - ord('a')) % 26) + ord('a') = ((103 + 5 - 97) % 26) + 97 = 107 -> 'k'"
" - For 'h': ((ord('h') + 5 - ord('a')) % 26) + ord('a') = ((104 + 5 - 97) % 26) + 97 = 108 -> 'l'"
" - For 'i': ((ord('i') + 5 - ord('a')) % 26) + ord('a') = ((105 + 5 - 97) % 26) + 97 = 109 -> 'm'"
" - For 'j': ((ord('j') + 5 - ord('a')) % 26) + ord('a') = ((106 + 5 - 97) % 26) + 97 = 110 -> 'n'"
"2. Decoded string: 'jklmn'"
"Expected output: 'fghij'",
"Instructions": [ "Use the entry point decode_shift of the problem, do not add main function." ],
"OutputFormat": [ "Code": "The code you generate." ],
"Remember": "Start with "```Code" and end with "```". Write all code into a single .py file. Avoid extra explanation, words or """ in your output."
================================Assistant================================
```Code
def decode_shift(s: str):
    return "".join([chr(((ord(ch) - 5 - ord("a")) % 26) + ord("a")) for ch in s])
```

Figure 26: An example of Coding Agent.

==================================User==================================
"Persona": "Your primary role is to add print statements to the given code to solve the problem.",
"Code": "def decode_shift(s: str):
    return "".join([chr(((ord(ch) - 5 - ord("a")) % 26) + ord("a")) for ch in s]) ",
"Plan verification": "Test Case 1: s = 'fghij'"
"Intermediate variables: None"
"1. Iterating through each character in 'fghij':"
" - For 'f': ((ord('f') + 5 - ord('a')) % 26) + ord('a') = ((102 + 5 - 97) % 26) + 97 = 106 -> 'j'"
" - For 'g': ((ord('g') + 5 - ord('a')) % 26) + ord('a') = ((103 + 5 - 97) % 26) + 97 = 107 -> 'k'"
" - For 'h': ((ord('h') + 5 - ord('a')) % 26) + ord('a') = ((104 + 5 - 97) % 26) + 97 = 108 -> 'l'"
" - For 'i': ((ord('i') + 5 - ord('a')) % 26) + ord('a') = ((105 + 5 - 97) % 26) + 97 = 109 -> 'm'"
" - For 'j': ((ord('j') + 5 - ord('a')) % 26) + ord('a') = ((106 + 5 - 97) % 26) + 97 = 110 -> 'n'"
"2. Decoded string: 'jklmn'"
"Expected output: 'fghij'",
"Instructions": [ "Incorporate debug print statements to trace intermediate variable changes as described in the plan verifications." ],
"OutputFormat": [ "Code": "The program with PRINT statements added to trace variable changes." ],
"Remember": "Start with "```Code" and end with "```". Write all code into a single .py file. Avoid extra explanation, words or """ in your output."
================================Assistant================================
```Code
def decode_shift(s: str):
    for ch in s:
        print(f"Character: {ch}")
        print(f"ord(ch): {ord(ch)}")
        print(f"ord('a'): {ord('a')}")
        print(f"((ord(ch) - 5 - ord('a')) % 26): {((ord(ch) - 5 - ord('a')) % 26)}")
        print(f"chr(((ord(ch) - 5 - ord('a')) % 26) + ord('a')): {chr(((ord(ch) - 5 - ord('a')) % 26) + ord('a'))}")
    return "".join([chr(((ord(ch) - 5 - ord("a")) % 26) + ord("a")) for ch in s])
```

Figure 27: An example of Embedding Trace Statement Agent.

==================================User==================================
"Persona": "Your primary role is to provide an explanation for each line of a given Python program, describing the effect of each line.",
"Problem": "def encode_shift(s: str):
    """ returns encoded string by shifting every character by 5 in the alphabet. """
    return "".join([chr(((ord(ch) + 5 - ord("a")) % 26) + ord("a")) for ch in s])
def decode_shift(s: str):
    """ takes as input string encoded with encode_shift function. Returns decoded string. """ ",
"Code": "def decode_shift(s: str):
    return "".join([chr(((ord(ch) - 5 - ord("a")) % 26) + ord("a")) for ch in s]) ",
"Instructions": [
"You will receive a new problem description and a generated Python program intended to solve the problem.",
"Generate a detailed explanation for each line of the provided Python program.",
],
"OutputFormat": [ "Code explanation": "A detailed explanation for each line of the Python program. Each line's explanation should describe its effect on the program's behavior." ],
"Remember": "Strictly follow the JSON OutputFormat. Provide only the explanation for the Python program as specified, without extra explanation or additional words."
================================Assistant================================
{
"Code explanation": {
"1": "Define a function named decode_shift that takes a string s as input.",
"2": "Return a string created by joining the characters generated by a list comprehension.",
"3": "For each character ch in the input string s, calculate the new character by shifting it back by 5 in the alphabet.",
"4": "Convert the shifted character back to its ASCII value using ord() function.",
"5": "Subtract 5 from the ASCII value of the character and subtract the ASCII value of 'a' to normalize the range.",
"6": "Take the modulo 26 of the result to handle wrapping around the alphabet.",
"7": "Add the ASCII value of 'a' to the result to get the final ASCII value of the decoded character.",
"8": "Convert the final ASCII value back to a character using chr() function.",
"9": "Return the decoded character and continue the process for each character in the input string."
}
}

Figure 28: An example of Code Explaining Agent.
==================================User==================================
"Persona": "Your primary role is to verify error execution traces, comparing the provided 'Error Execution Trace' with the 'Correct Plan Verification' for a Python problem and identify any discrepancies.",
"Problem": "def encode_shift(s: str):
    """ returns encoded string by shifting every character by 5 in the alphabet. """
    return "".join([chr(((ord(ch) + 5 - ord("a")) % 26) + ord("a")) for ch in s])
def decode_shift(s: str):
    """ takes as input string encoded with encode_shift function. Returns decoded string. """ ",
"Code with Error": "def decode_shift(s: str):
    for ch in s:
        print(f"Character: {ch}")
        print(f"ord(ch): {ord(ch)}")
        print(f"ord('a'): {ord('a')}")
        print(f"((ord(ch) - 5 - ord('a')) % 26): {((ord(ch) - 5 - ord('a')) % 26)}")
        print(f"chr(((ord(ch) - 5 - ord('a')) % 26) + ord('a')): {chr(((ord(ch) - 5 - ord('a')) % 26) + ord('a'))}")
    return "".join([chr(((ord(ch) - 5 - ord("a")) % 26) + ord("a")) for ch in s]) ",
"Correct Plan Verification": "Test Case 1: s = 'fghij'"
"Intermediate variables: None"
"1. Iterating through each character in 'fghij':"
" - For 'f': ((ord('f') + 5 - ord('a')) % 26) + ord('a') = ((102 + 5 - 97) % 26) + 97 = 106 -> 'j'"
" - For 'g': ((ord('g') + 5 - ord('a')) % 26) + ord('a') = ((103 + 5 - 97) % 26) + 97 = 107 -> 'k'"
" - For 'h': ((ord('h') + 5 - ord('a')) % 26) + ord('a') = ((104 + 5 - 97) % 26) + 97 = 108 -> 'l'"
" - For 'i': ((ord('i') + 5 - ord('a')) % 26) + ord('a') = ((105 + 5 - 97) % 26) + 97 = 109 -> 'm'"
" - For 'j': ((ord('j') + 5 - ord('a')) % 26) + ord('a') = ((106 + 5 - 97) % 26) + 97 = 110 -> 'n'"
"2. Decoded string: 'jklmn'"
"Expected output: 'fghij'",
"Error Execution Trace": "Error in test case: assert decode_shift(encode_shift('abc')) == 'abc'. Status output: Error in test case: assert decode_shift(encode_shift('abcabc')) == 'abcabc'. Status output: ",
"Instructions": [
"The 'Error Execution Trace' is the output of the 'Code with Error' when it fails to meet the expected output."
"Compare the 'Error Execution Trace' with the 'Correct Plan Verification' and output the differences and your analysis of the errors.",
"Provide a summary of the errors, including reasons for the discrepancies and suggestions for corrections."
],
"OutputFormat": [
"Analysis": "Your output including a detailed discrepancy analysis."
"Refine suggestion": "Your analysis of the errors and suggestions how to correct them."
],
"Remember": "Strictly follow the JSON OutputFormat. Provide only the analysis as specified, without any extra explanations or words."
================================Assistant================================
{
"Analysis": "The error in the code lies in the calculation of the decoded character. The formula used to decode the character is incorrect, leading to discrepancies in the output. The code subtracts 5 from the character value and then takes the modulo 26, which is not the correct approach for decoding. This results in incorrect decoded characters and hence the mismatch in the output.",
"Refine suggestion": "To correct the error, the decoding formula should be adjusted to correctly reverse the encoding process. Instead of subtracting 5 and taking modulo 26, the code should add 21 (since 5 is the shift value and 26 - 5 = 21) and then take modulo 26 to get the correct decoded character. By making this adjustment, the decoding process will be accurate and match the expected output."
}

Figure 29: An example of Suggesting Agent.

==================================User==================================
"Persona": "Your primary role is to correct an erroneous Python program based on the provided error analysis and related explanations."
" " Pro bl e m " : " def e n c o de_ s h i ft (s : s tr ) : """ re turns e n c o ded s tr i n g by sh i fti n g e ver y c h ar acte r by 5 in the al ph ab e t . """ re turn "" . j o i n ([ chr ((( o rd ( ch ) + 5 - o rd ("a " )) % 26 ) + o rd ("a " )) for ch in s]) def deco de_ s h i ft (s : s tr ) : """ ta ke s as i n pu t s tr i n g e n c o ded w i th e n c o de_ s h i ft fun c tio n . Re turns deco ded s tr i n g . """ ", " Cod e w i th E rr o r" : " def deco de_ s h i ft (s : s tr ) : re turn "" . j o i n ([ chr ((( o rd ( ch ) - 5 - o rd ("a " )) % 26 ) + o rd ("a " )) for ch in s ]) ", " Cod e Ex p l a n ati o n " : "{ ' 1 ' : ' D e fi n e a fu n c ti o n n am e d dec o de_ s h i ft t h at ta ke s a s tr i n g s as i n p u t . ', ' 2 ' : ' Re turn a s tr i n g c re at e d by j o i n i n g the c h ar acte rs gen e ra te d by a l i s t c o m pre h e n s i o n . ', ' 3 ' : ' Fo r e ach c h ar acte r ch in t h e i n p u t s tr i n g s, c al c u l at e t h e n ew c h ar acte r by s h i f tin g it b ack by 5 in t h e al p h a bet . ', ' 4 ' : ' Co n v ert t h e s h i ft e d ch ar act e r bac k to i t s A S CI I val ue usin g ord () f u n cti o n . ', ' 5 ' : " S u btr act 5 fro m t h e A S CII val ue of t h e c h ar act e r a n d s u btra c t the A S CII val u e of ' a' to n o rm al i ze t h e ra n ge . ", ' 6 ' : ' T ak e th e m o d u l o 26 of t h e re s u l t to h a n dl e w ra pp i n g ar o u n d the al p h a bet . ', ' 7 ' : " A dd the A S CII val u e of ' a' to the re s u l t to get the fi n al A S CII va l u e of the deco ded c h ar acte r . ", ' 8 ' : ' Co n ver t t h e fi n al A S CII v a l u e ba c k to a c h ar acte r u s i n g c h r () f u n c ti o n . ', ' 9 ' : ' R e tur n t h e deco ded c h ar acte r an d c o n tin u e the pro c e s s for e ach c h ar acte r in the i n p u t s tr i n g . 
' }", " Re fi n e S u g g est i o n " : " T o c o rr e c t t h e e rr o r, th e dec o di n g f o rm u l a s h o u l d be adj u s te d to c o rr e c tly re v e rs e th e e n c o di n g pr o c e s s . I n s te a d of s u btra c ti n g 5 an d ta kin g m o d u l o 26 , the c o d e s h o u l d add 21 (sin c e 5 is the s h i ft val u e a n d 26 - 5 = 21 ) an d t h e n ta ke m o d u l o 26 to get t h e c o rr e c t d e c o ded c h ar acte r . By m ak i n g th i s adj u s tm e n t, the deco di n g pro c e s s w i l l be acc u ra te an d m at c h the e x pecte d o u tpu t . " " I n s tr u c tio n s " : [ " Yo u wil l be pre s e n te d w i th a n e w pr o bl e m des c rip ti o n , t h e c o de w i th e r ror, i ts e x pl a n at i o n , a n d a n al ys i s of the e rr o rs an d s u gg e s tio n s h o w to c o rr e c t them . ", " G e n e ra te the re fi n e d Pyth o n pro gram base d on the ' Re fi n e S u gg e s tio n ' . ", " Ou t p ut y o ur re fin e d c ode w i t h o n l y t h e Pyt h o n co de , a n d pr ovi de a re fi n e m e n t e x pl a n at i o n t h at de t ail s t h e m o di fi c at i o n s m ade . " ], " O u tputFo rm at " : [ " Re fi n e d E x pl an at i o n " : "A d etai l ed e x pl an at i o n d esc ri b i n g the m o di fi c at i o n s m ade to the c o d e . ", " Cod e " : " T h e re fi n e d c o de th at acc o rdi n g to the e rr o r an al ys i s . " ], " Re m e m ber " : " S ta rt w i th " `` `Co de" a n d e n d w i th " `` `" . Wr i te al l c o de i n t o a s i n gl e . py fi l e . A v o i d e x tr a e x pl a n at i o n , w o rds or """ in you r o u tpu t . " = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = A s s i s ta n t= = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = ``` Cod e def deco de_ s h i ft (s : s tr ) : re turn "" . 
j o i n ([ chr ((( o rd ( ch ) + 21 - o rd ("a " )) % 26 ) + o rd ("a " )) for ch in s ]) ``` Re fi n e d Ex p l an ati o n : To c o rr e c t t h e e rr o r in t h e dec o di n g fu n c ti o n , I adj u s te d the form u l a to c o rr e c tly re ver s e t h e e n c o di n g pr o c e s s . I n s te ad of s u btra c ti n g 5 a n d tak i n g m o d u l o 26 , I a d d ed 21 (sin c e 5 is th e s h i ft val u e a n d 26 - 5 = 21 ) an d t h e n to o k m o d u l o 26 to get the c o rr e c t dec o ded c h ar acte r . T h i s adj u s tm e n t e n s u re s that th e dec o di n g pro c e s s acc u ra te l y m at c h e s the e x pecte d o u tput . Figure 30: An example of Deb ugging Agent. = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = U s e r= = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = " Pers o n a" : " Y o u r pri m ar y rol e is to a n al y ze c o de fa i l u res on s am pl e te s ts an d s u gg e s t an i m p ro v ed a l go rit h m , m e tho d, or param e te rs . " " I n s tr u c tio n s " : [ " T ar get pro bl e m : def e n c o de_ s h i ft (s : s tr ) : """ re turns e n c o ded s tr i n g by sh i fti n g e ver y c h ar acte r by 5 in the al ph ab e t . """ re turn "" . j o i n ([ chr ((( o rd ( ch ) + 5 - o rd ("a " )) % 26 ) + o rd ("a " )) for ch in s]) def deco de_ s h i ft (s : s tr ) : """ ta ke s as i n pu t s tr i n g e n c o ded w i th e n c o de_ s h i ft fun c tio n . Re turns deco ded s tr i n g . """ . " " T h e g i v e n c o de is n o t w o rk i n g as e x pecte d Er ro r in te s t c as e : as s e rt d e c o de_ s h i ft ( e n c o de_ s h i ft (' abc ' )) == ' abc ' . S ta tus o u tpu t : E rr o r in tes t c as e : as s e rt deco de_ s h i ft ( e n c o de_ s h i ft (' abc ab c ' )) == ' abc abc ' . S ta tus o u tpu t : . " " Pro vi de a n e w al go rith m , met h o d , or param e te rs to add re s s the pro bl e m . 
", " T h e ex i stin g c o de is : def deco de_ s h i ft (s : s tr ) : for ch in s : pri n t( f" Ch ar acte r : { ch }") pri n t( f" o rd ( ch ) : { o rd ( ch )}") pri n t( f" o rd (' a' ) : { o rd (' a' )}") pri n t(f" (( o rd ( ch ) - 5 - o rd (' a' )) % 26 ) : {(( o rd ( ch ) - 5 - o rd (' a' )) % 26 )}") pri n t( f" c h r ((( o rd ( ch ) - 5 - o rd (' a' )) % 26 ) + o rd ( ' a' ) ) : { chr ((( o rd ( ch ) - 5 - o rd (' a' )) % 26 ) + o rd (' a' ))} " ) re turn "" . j o i n ([ chr ((( o rd ( ch ) - 5 - o rd ("a " )) % 26 ) + o rd ("a " )) for ch in s ]) . " ], " O u tputFo rm at " : [ " A l go rith m " : " T h e pro po s e d al go rith m or m e tho d . ", " Par am e t e rs " : " Th e s ugg e s t e d par am e t e rs . ", " Re as o n " : " E x pl an at i o n for c h o o sing thi s al go rith m , m e tho d, or param e te rs . " ], " Re m e m ber " : " S tr i c tly fol l o w th e J S O N O u tputFo rm at . A vo i d add i tio n al ex p l an ati o n s or te x t o u tsid e the form at . " = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = A s s i s ta n t= = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = { " A l go rit h m " : " T o dec o de the s h i ft e d s tr i n g, we n e e d to re ver s e the e n c o di n g pr o c e s s . We c an ac h i e ve thi s by s u btra c ti n g 5 from t h e A S CII val u e of e ac h c h ar acte r an d w ra pp i n g aro u n d if t h e re s u l t is l e s s tha n t h e A S CI I val u e of ' a' . ", " Param e te rs " : " Non e " , " Re as on " : " T h i s alg or i t h m di re ctl y re ve rs e s t h e e n c odi n g pr oc e s s by u n d oi n g t h e s h i ft of 5 p ositi o n s in t h e al ph abet . It e n s u re s that the d ec o d ed s tr i n g m at c h e s the o rig i n al i n pu t . " } Figure 31: An example of Debating Agent. 
================================Assistant================================
{
"Algorithm": "To decode the shifted string, we can modify the existing code by adding 5 to the ASCII value of each character and wrapping around if the result exceeds the ASCII value of 'z'.",
"Parameters": "None",
"Reason": "This algorithm directly reverses the encoding process by undoing the shift of 5 positions in the alphabet. It ensures that the decoded string matches the original input."
}
================================Assistant================================
{
"Algorithm": "To decode the shifted string, we can modify the existing code by subtracting 5 from the ASCII value of each character and wrapping around if the result is less than the ASCII value of 'a'.",
"Parameters": "None",
"Reason": "This algorithm directly reverses the encoding process by undoing the shift of 5 positions in the alphabet. It ensures that the decoded string matches the original input."
}

Figure 32: An example of Debating Agent, following Figure 31.

==================================User==================================
"Persona": "Your primary role is to decide which algorithm and parameters to use on the problem based on the discussion."
"Instructions": [
"Evaluate the performance of the algorithm and parameters for the problem def encode_shift(s: str):
    """ returns encoded string by shifting every character by 5 in the alphabet. """
    return "".join([chr(((ord(ch) + 5 - ord("a")) % 26) + ord("a")) for ch in s])
def decode_shift(s: str):
    """ takes as input string encoded with encode_shift function. Returns decoded string. """
and decide whether to adopt them."
"The given code is not working as expected Error in test case: assert decode_shift(encode_shift('abc')) == 'abc'. Status output: Error in test case: assert decode_shift(encode_shift('abcabc')) == 'abcabc'. Status output: .",
"The discussion is: {
"Algorithm": "To decode the shifted string, we need to reverse the encoding process. We can achieve this by subtracting 5 from the ASCII value of each character and wrapping around if the result is less than the ASCII value of 'a'.",
"Parameters": "None",
"Reason": "This algorithm directly reverses the encoding process by undoing the shift of 5 positions in the alphabet. It ensures that the decoded string matches the original input."
}, {
"Algorithm": "To decode the shifted string, we can modify the existing code by adding 5 to the ASCII value of each character and wrapping around if the result exceeds the ASCII value of 'z'.",
"Parameters": "None",
"Reason": "This algorithm directly reverses the encoding process by undoing the shift of 5 positions in the alphabet. It ensures that the decoded string matches the original input."
}, {
"Algorithm": "To decode the shifted string, we can modify the existing code by subtracting 5 from the ASCII value of each character and wrapping around if the result is less than the ASCII value of 'a'.",
"Parameters": "None",
"Reason": "This algorithm directly reverses the encoding process by undoing the shift of 5 positions in the alphabet. It ensures that the decoded string matches the original input."
} } ."
],
"OutputFormat": [
"Algorithm": "The selected algorithm or method."
"Parameters": "The chosen parameters."
],
"Remember": "Strictly follow the JSON OutputFormat. Avoid additional explanations or text outside the format."
================================Assistant================================
{
"Algorithm": "To decode the shifted string, we need to reverse the encoding process. We can achieve this by subtracting 5 from the ASCII value of each character and wrapping around if the result is less than the ASCII value of 'a'.",
"Parameters": "None"
}

Figure 33: An example of Discriminating Agent.
==================================User==================================
"Instructions": [
"Refine the existing code to match the expected output using the algorithm/method: "To decode the shifted string, we need to reverse the encoding process. We can achieve this by subtracting 5 from the ASCII value of each character and wrapping around if the result is less than the ASCII value of 'a'.", and parameters: "None" } provided by the discriminator for improved performance.",
"Target problem: def encode_shift(s: str):
    """ returns encoded string by shifting every character by 5 in the alphabet. """
    return "".join([chr(((ord(ch) + 5 - ord("a")) % 26) + ord("a")) for ch in s])
def decode_shift(s: str):
    """ takes as input string encoded with encode_shift function. Returns decoded string. """ .",
"The plan is: The provided solution plan for the encode_shift function is correct..",
"Using the entry point decode_shift of the problem, do not add main function."
"Handle edge cases such as invalid inputs, empty values, or boundary conditions."
],
"Remember": "Start with "```Code" and end with "```". Write all code into a single .py file. Avoid extra explanation, words or """ in your output."
================================Assistant================================
```python
def encode_shift(s: str):
    return "".join([chr(((ord(ch) + 5 - ord("a")) % 26) + ord("a")) for ch in s])

def decode_shift(s: str):
    return "".join([chr(((ord(ch) - 5 - ord("a")) % 26) + ord("a")) for ch in s])
```

Figure 34: An example of Coding Agent (Refine stage).
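The refined encode/decode pair can be exercised with a simple round-trip check (a minimal sketch of our own, not part of the paper's prompts); the sample inputs mirror the test cases quoted in the figures above:

```python
def encode_shift(s: str) -> str:
    # Shift every lowercase character forward by 5, wrapping within the alphabet.
    return "".join(chr(((ord(ch) + 5 - ord("a")) % 26) + ord("a")) for ch in s)

def decode_shift(s: str) -> str:
    # Python's % always returns a non-negative value, so the -5 shift wraps correctly.
    return "".join(chr(((ord(ch) - 5 - ord("a")) % 26) + ord("a")) for ch in s)

for sample in ["abc", "abcabc", "fghij", ""]:
    assert decode_shift(encode_shift(sample)) == sample
print("round-trip OK")
```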