Compiling PL/SQL Away

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the [Original Paper Viewer] below or the Original ArXiv Source.

“PL/SQL functions are slow,” is common developer wisdom that derives from the tension between set-oriented SQL evaluation and statement-by-statement PL/SQL interpretation. We pursue the radical approach of compiling PL/SQL away, turning interpreted functions into regular subqueries that can then be efficiently evaluated together with their embracing SQL query, avoiding any PL/SQL to SQL context switches. Input PL/SQL functions may exhibit arbitrary control flow. Iteration, in particular, is compiled into SQL-level recursion. RDBMSs across the board reward this compilation effort with significant run time savings that render established developer lore questionable.

💡 Research Summary

The paper tackles the long‑standing performance problem of PL/SQL (and its variants such as PL/pgSQL and T‑SQL) functions, which stems from the costly context switches between the set‑oriented SQL optimizer and the row‑by‑row PL/SQL interpreter. The authors propose a radical solution: completely compile PL/SQL functions away, turning them into pure SQL sub‑queries that can be inlined into the surrounding query, thereby eliminating all Q→f and f→Qᵢ switches.

The transformation pipeline consists of four systematic steps, illustrated using a non‑trivial example function called walk that simulates a robot moving on a grid while accumulating rewards.

SSA Conversion – The original PL/SQL code is rewritten into static single‑assignment (SSA) form. Each variable receives a unique version, and φ‑functions capture values that may come from different control‑flow paths. All high‑level control constructs (FOR, WHILE, LOOP, EXIT, CONTINUE, etc.) are lowered to a minimal set consisting of assignments, conditionals, gotos, and returns. This step preserves the original imperative logic while exposing data‑flow explicitly.
ANF Translation – The SSA program is transformed into functional administrative normal form (ANF). Each label becomes a separate function Lₓ, and every goto Lₓ is replaced by a call to that function, with the current SSA variables passed as arguments (λ‑lifting). The result is a collection of mutually tail‑recursive functions that express the original loops as recursion. Nested let‑bindings and if‑then‑else expressions replace sequential statements, preparing the code for a functional representation.
UDF Flattening – Mutual recursion between the Lₓ functions is eliminated by introducing a discriminator argument fn (defunctionalization). The whole set of functions is collapsed into a single user‑defined function (UDF) that contains a CASE‑WHEN dispatch on fn. The ANF constructs let … in and if … then … else are mapped to SQL’s SELECT and CASE expressions, respectively. This yields a syntactically valid SQL UDF that still contains recursive calls.
Recursive CTE Embedding – The recursive UDF body is recognized as a tail‑recursive pattern and is wrapped inside a WITH RECURSIVE common table expression (CTE). The resulting query Q_f can be inlined wherever the original PL/SQL function was called. Because the entire computation now resides in a single SQL plan, the optimizer generates the plan once, caches it, and reuses it for all iterations, removing the repeated plan generation, cache lookup, and memory allocation overhead that plagued the original implementation.

The authors evaluate the approach on several major RDBMS platforms—PostgreSQL, Microsoft SQL Server, Oracle, and even SQLite (which lacks any PL/SQL support). Using the walk function as a benchmark, they measure the proportion of runtime spent in embedded queries Q₁…Q₃ and the overhead of context switches. In the native PL/pgSQL version, more than half of the total execution time is attributable to Q→f and f→Qᵢ switches, with PostgreSQL spending over 35 % of its time just on these transitions. After compilation to a recursive CTE, the overall runtime drops dramatically, ranging from a 30 % to a 70 % reduction depending on the engine and data size. Notably, SQLite can now execute the same logical algorithm despite having no PL/SQL interpreter, demonstrating the portability benefit.

Beyond raw speedups, the paper discusses auxiliary optimizations that naturally arise during the compilation stages: constant propagation and dead‑code elimination in SSA, removal of redundant let‑bindings in ANF, and tail‑call elimination in the UDF. These optimizations reduce the size and complexity of the generated SQL, allowing the underlying optimizer to choose more efficient execution strategies (e.g., better join ordering, predicate push‑down).

The work challenges the conventional developer wisdom that “avoid PL/SQL functions whenever possible.” By showing that arbitrary PL/SQL, including loops and complex control flow, can be systematically transformed into efficient set‑based SQL, the authors open a new avenue for in‑database programming. They suggest future directions such as building an automated compiler toolchain, extending support to PL/SQL collection types and record structures, and exploring the approach in distributed or cloud‑native database environments.

In summary, the paper presents a complete, practical methodology for eliminating PL/SQL interpretation overhead by compiling functions into recursive SQL, validates the technique with extensive experiments across multiple database systems, and demonstrates both performance gains and increased portability, thereby reshaping the perception of procedural extensions in relational databases.

Compiling PL/SQL Away

💡 Research Summary

Comments & Academic Discussion

Leave a Comment