SPELL: Synthesis of Programmatic Edits using LLMs
Library migration is a common but error-prone task in software development. Developers may need to replace one library with another due to changing requirements or licensing constraints. Migration typically entails manually updating and rewriting source code. While automated migration tools exist, most rely on mining examples from real-world projects that have already undergone similar migrations. Such data are scarce, however, and collecting them for arbitrary pairs of libraries is difficult. Moreover, these migration tools often fail to leverage modern code transformation infrastructure. In this paper, we present a new approach to automated API migration that sidesteps these limitations. Instead of relying on existing migration data or using LLMs directly for transformation, we use LLMs to extract migration examples. An agent then generalizes those examples into reusable transformation scripts in PolyglotPiranha, a modern code transformation tool. Our method distills latent migration knowledge from LLMs into structured, testable, and repeatable migration logic, without requiring preexisting corpora or manual engineering effort. Experimental results across Python libraries show that our system can generate diverse migration examples and synthesize transformation scripts that generalize to real-world codebases.
💡 Research Summary
The paper introduces Spell, a novel system for automating library API migrations without relying on existing migration corpora. Spell treats large language models (LLMs) not as direct code translators but as knowledge sources from which it extracts concrete, test‑validated migration examples. The workflow consists of two main phases.
Phase 1 – Data Distillation: Given a source‑target library pair (S, T), Spell first prompts an LLM to brainstorm a set of realistic use cases that could be implemented with either library. For each use case, the model generates multiple implementations using S, a corresponding test suite, and then a migration of the implementation to T. The test suite is run on both the original and migrated code; only triples that pass all tests are retained. Additional filters enforce code‑coverage thresholds, API‑usage sanity checks, and test‑quality metrics. This process yields a high‑quality dataset of 870 migration triples across ten Python migration tasks.
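The core filter in Phase 1 is simple to sketch: a candidate triple survives only if its test suite passes on both the source-library implementation and the migrated one. The snippet below is a minimal, self-contained illustration of that idea; the function and field names (`passes_tests`, `distill`, `impl_source`, `impl_target`) are hypothetical, and the real pipeline additionally enforces coverage, API-usage, and test-quality filters that are omitted here.

```python
def passes_tests(impl_code: str, test_code: str) -> bool:
    """Return True iff every test_* function passes against the given code.

    Simplification: the real pipeline runs tests in an isolated process;
    here we exec both strings into one shared namespace.
    """
    ns: dict = {}
    try:
        exec(impl_code, ns)   # load the implementation
        exec(test_code, ns)   # load the test suite into the same namespace
        for name, fn in list(ns.items()):
            if name.startswith("test_") and callable(fn):
                fn()          # a failing assertion raises AssertionError
        return True
    except Exception:
        return False

def distill(candidates: list[dict]) -> list[dict]:
    """Keep only triples whose shared test suite passes on BOTH the
    original (source-library) and migrated (target-library) code."""
    return [c for c in candidates
            if passes_tests(c["impl_source"], c["tests"])
            and passes_tests(c["impl_target"], c["tests"])]

# Toy triples: one valid migration, one whose migrated code lost semantics.
good = {
    "impl_source": "def parse(s):\n    import json\n    return json.loads(s)",
    "impl_target": "def parse(s):\n    import ast\n    return ast.literal_eval(s)",
    "tests": "def test_parse():\n    assert parse('[1, 2]') == [1, 2]",
}
bad = dict(good, impl_target="def parse(s):\n    return s")
kept = distill([good, bad])   # only `good` survives
```

The design point this illustrates is that validation is behavioral, not syntactic: a migration counts as correct only when the unchanged test suite accepts both sides.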
Phase 2 – Script Synthesis: The validated triples are fed into an anti‑unification engine that extracts syntactic differences and abstracts them into generic match‑replace patterns. These patterns are expressed as rules in the PolyglotPiranha domain‑specific language (DSL), which supports template variables, labeled edges, and cascading transformations. An “agentic loop” then iteratively refines the rule set: the current script is applied to real code, test results are fed back, and the agent modifies, composes, or adds rules to resolve failures. The final output is a project‑agnostic PolyglotPiranha script that can be version‑controlled, inspected, and integrated into CI/CD pipelines.
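The generalization step of Phase 2 can be sketched with a toy token-level anti-unifier: tokens shared by two concrete examples are kept literal, and each point of disagreement becomes a template hole. This is illustrative only; Spell's actual engine operates over syntax trees and emits PolyglotPiranha rules, and the `@x0` hole syntax here is invented for the sketch.

```python
def anti_unify(tokens_a: list[str], tokens_b: list[str]) -> list[str]:
    """Token-level anti-unification (toy version, assumes aligned tokens):
    identical tokens stay verbatim; each distinct mismatch becomes a hole."""
    assert len(tokens_a) == len(tokens_b), "toy version assumes aligned tokens"
    pattern, holes = [], {}
    for ta, tb in zip(tokens_a, tokens_b):
        if ta == tb:
            pattern.append(ta)
        else:
            holes.setdefault((ta, tb), f"@x{len(holes)}")
            pattern.append(holes[(ta, tb)])
    return pattern

# Two concrete "before" snippets from different validated examples:
before_1 = ["pd", ".", "read_csv", "(", "path", ")"]
before_2 = ["pd", ".", "read_csv", "(", "f", ")"]
match = anti_unify(before_1, before_2)
# match == ["pd", ".", "read_csv", "(", "@x0", ")"]

# Anti-unifying the corresponding "after" snippets yields the replace side;
# the generalized (match, replace) pair seeds one rewrite rule.
after_1 = ["pl", ".", "read_csv", "(", "path", ")"]
after_2 = ["pl", ".", "read_csv", "(", "f", ")"]
replace = anti_unify(after_1, after_2)
# replace == ["pl", ".", "read_csv", "(", "@x0", ")"]
```

In the full system, the agentic loop then applies such rules to real code and uses test feedback to decide which generalized patterns to keep, split, or refine.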
Evaluation: Spell was evaluated on nine Python library migrations (e.g., cryptography → pycryptodome, pandas → polars). On average, Spell generated 87 validated examples per task and synthesized a correct migration script in a single trial for 61.6% of the tasks, outperforming the state‑of‑the‑art tool Melt on every benchmark. The generated scripts were also applied to a collection of open‑source repositories, achieving a 92% success rate in transforming real code. Notably, total LLM usage cost was kept below $100 per experiment, demonstrating cost‑effectiveness even with small models (e.g., GPT‑4o‑mini).
Contributions:
- A data‑distillation pipeline that extracts and validates migration knowledge from LLMs without any historical migration logs.
- A hybrid synthesis approach combining classic anti‑unification with an agentic refinement loop to produce reusable DSL scripts.
- A publicly released dataset of 870 migration triples with accompanying tests.
- An open‑source implementation of Spell, including the synthesis engine, dataset, and evaluation harness.
Limitations and Future Work: The current study focuses on Python; extending to other languages (Java, JavaScript, etc.) will require language‑specific prompt engineering and DSL support. Complex stateful or asynchronous APIs may not be captured fully by simple match‑replace rules, suggesting a need for richer semantic analyses. Finally, while the system automates validation, incorporating a human‑in‑the‑loop for edge‑case review could further improve reliability.
In summary, Spell demonstrates that latent API‑mapping knowledge embedded in LLMs can be systematically distilled into high‑quality, test‑driven examples and then transformed into maintainable, DSL‑based migration scripts. This bridges the gap between the breadth of LLM knowledge and the precision required for production‑grade code transformations, offering a practical pathway for developers and library maintainers to automate migrations at scale.