Scylla: Translating an Applicative Subset of C to Safe Rust

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the original arXiv paper.

The popularity of the Rust language continues to explode; yet, many critical codebases remain authored in C. Automatically translating C to Rust is thus an appealing course of action. Several works have gone down this path, handling an ever-increasing subset of C through a variety of Rust features, such as unsafe code. While the prospect of automation is appealing, producing code that relies on unsafe negates the memory safety guarantees offered by Rust, and therefore the main advantages of porting existing codebases to memory-safe languages. We instead advocate for a different approach, where the programmer iterates on the original C, gradually making the code more structured until it becomes eligible for compilation to safe Rust. This means that redesigns and rewrites can be evaluated incrementally for performance and correctness against existing test suites and production environments. Compiling structured C to safe Rust relies on the following contributions: a type-directed translation from (a subset of) C to safe Rust; a novel static analysis based on “split trees” which allows expressing C’s pointer arithmetic using Rust’s slices and splitting operations; an analysis that infers which borrows need to be mutable; and a compilation strategy for C pointer types that is compatible with Rust’s distinction between non-owned and owned allocations. We evaluate our approach on real-world cryptographic libraries, binary parsers and serializers, and a file compression library. We show that these can be rewritten to Rust with small refactors of the original C code, and that the resulting Rust code exhibits performance characteristics similar to those of the original C code. As part of our translation process, we also identify and report undefined behaviors in the bzip2 compression library and in Microsoft’s implementation of the FrodoKEM cryptographic primitive.


💡 Research Summary

The paper presents Scylla, a system that enables the incremental migration of existing C codebases to safe Rust without ever generating unsafe Rust code. Rather than attempting to translate the full C language in one step, Scylla encourages developers to refactor their C programs gradually until they fall within an “applicative” subset that can be compiled directly to safe Rust. The workflow consists of three main stages: (1) parsing the original C source with Clang and converting the typed AST into a new intermediate language called Mini‑C, (2) applying a type‑directed translation from Mini‑C to Rust, and (3) performing a series of post‑translation analyses that infer mutability, ownership, and high‑level abstractions.
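To make the workflow concrete, here is a hand-written sketch (not actual Scylla output) of how a small C function that is already in an applicative style might look once translated to safe Rust. The function `xor_into` and its signature are hypothetical; the C source appears in a comment. The sketch shows the general shape of the result: pointers become slices, the pointer that is written through becomes `&mut`, and integer widening is explicit.

```rust
// Hypothetical C input:
//
//   void xor_into(uint8_t *dst, const uint8_t *src, uint32_t len) {
//     for (uint32_t i = 0; i < len; i++)
//       dst[i] ^= src[i];
//   }
//
// Hand-written safe-Rust counterpart (illustrative only).
pub fn xor_into(dst: &mut [u8], src: &[u8], len: u32) {
    let mut i: u32 = 0;
    while i < len {
        // The C `uint32_t` index is widened explicitly to `usize`.
        // Slice indexing is bounds-checked, so an out-of-range `len`
        // panics here instead of causing undefined behavior.
        dst[i as usize] ^= src[i as usize];
        i += 1;
    }
}
```

Because `dst` is only ever written through while `src` is only read, the sketch gives them `&mut [u8]` and `&[u8]` respectively. This is the distinction the mutability analysis in stage (3) is responsible for recovering.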

Mini‑C is an expression‑only language that mirrors C’s control flow and pointer operations but eliminates all “surprises”: every integer has a fixed width, implicit promotions and conversions are made explicit via casts, void* is disallowed, and heap allocation is expressed as malloc(t, n) where the element type and count are known. By normalising these aspects, Mini‑C provides a clean substrate for a sound translation to Rust’s strict type and ownership system.
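The following sketch illustrates two of the normalisations listed above, written directly in Rust. It assumes a 32-bit C `int`. The first pair of functions contrasts an explicit integer promotion with what an 8-bit addition would produce without one. The last function shows how a typed allocation such as malloc(uint32_t, n) can map onto an owned, fixed-length buffer. All function names are hypothetical. Choosing `Box<[u32]>` and zero-initialising its contents are assumptions of this sketch, not details taken from the paper.

```rust
// C:  uint8_t a = 200, b = 100;  int32_t c = a + b;
// C's integer promotions silently widen both operands to `int` before the
// addition. With the promotion written out as casts, it maps directly onto
// Rust's `as`.
pub fn promoted_add(a: u8, b: u8) -> i32 {
    (a as i32) + (b as i32)
}

// For contrast: an addition performed at 8 bits, with explicit wrap-around,
// which is what the result would be if the promotion were dropped.
pub fn unpromoted_add(a: u8, b: u8) -> u8 {
    a.wrapping_add(b)
}

// C:  uint32_t *p = malloc(n * sizeof(uint32_t));
// Typed form: malloc(uint32_t, n). The element type and count are explicit,
// so a translator can emit an owned slice of known length. Safe Rust cannot
// expose uninitialised memory, so this sketch zero-initialises the buffer.
pub fn typed_malloc_u32(n: usize) -> Box<[u32]> {
    vec![0u32; n].into_boxed_slice()
}
```

For example, `promoted_add(200, 100)` yields 300, while the 8-bit addition wraps to 44. This gap is why implicit promotions must be made explicit before the code can be translated soundly.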

The core of the translation is a novel split‑tree analysis. Split trees model pointer arithmetic as a hierarchy of slice‑splitting operations. For example, an expression like p + i becomes the second (right‑hand) part of the pair returned by slice.split_at(i), and more complex multi‑dimensional indexing becomes a sequence of split_at calls. The analysis statically guarantees that each split stays within the bounds of a single allocation, thereby satisfying Rust’s borrow checker without resorting to raw pointers.
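The sketch below illustrates the idea using `split_at_mut`, the mutable counterpart of `split_at` in Rust's standard library. The C snippet and the names `fill`, `init`, and `row_mut` are hypothetical and do not come from the paper. The first case turns `buf + 16` into a single split. The second case turns a row pointer `m + i * cols` into two successive splits.

```rust
// Hypothetical C input:
//
//   void fill(uint8_t *p, uint32_t n, uint8_t v) {
//     for (uint32_t i = 0; i < n; i++) p[i] = v;
//   }
//   void init(uint8_t *buf) {      /* buf points to 32 bytes */
//     fill(buf, 16, 0xAA);         /* bytes [0, 16)  */
//     fill(buf + 16, 16, 0xBB);    /* bytes [16, 32) */
//   }

/// Writes `v` into the first `n` elements of `p`.
pub fn fill(p: &mut [u8], n: u32, v: u8) {
    for x in &mut p[..n as usize] {
        *x = v;
    }
}

/// `buf + 16` becomes the right-hand part of a split at 16. The two parts
/// are disjoint `&mut` borrows, so the borrow checker accepts using both.
pub fn init(buf: &mut [u8]) {
    let (left, right) = buf.split_at_mut(16);
    fill(left, 16, 0xAA);
    fill(right, 16, 0xBB);
}

/// Two-dimensional indexing, C: `uint32_t *row = m + i * cols;`
/// becomes a sequence of splits: skip the first `i * cols` elements,
/// then keep only the next `cols` elements.
pub fn row_mut(m: &mut [u32], cols: usize, i: usize) -> &mut [u32] {
    let (_, from_row) = m.split_at_mut(i * cols); // m + i * cols
    let (row, _) = from_row.split_at_mut(cols); // bounded to one row
    row
}
```

Here `split_at_mut` hands back two non-overlapping mutable slices, so both halves of `buf` can be used without `unsafe`. Dereferencing a raw `buf + 16` pointer would require `unsafe`. Note that `split_at_mut` panics if the split index exceeds the slice length. In this sketch, therefore, an out-of-bounds split becomes a runtime panic rather than undefined behavior.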

Mutability inference is performed by analysing data‑flow: any write through a pointer forces the corresponding Rust reference to be &mut, while read‑only uses become &. This yields the minimal set of mutable borrows needed for correctness. Ownership handling distinguishes between memory that originates from malloc (mapped to owned containers such as `Box<
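As a rough illustration of how such an inference can propagate across function calls, here is a toy fixpoint analysis over a hypothetical, heavily simplified IR. The types `Stmt` and `Func` and the function `infer_mut` are invented for this sketch, and the paper's actual analysis may work quite differently. The rule implemented is the one described above: a pointer parameter must become `&mut` if it is written through directly, or if it is passed to a callee parameter that must itself be `&mut`.

```rust
use std::collections::HashSet;

/// Hypothetical statement forms, reduced to what matters for mutability.
pub enum Stmt {
    /// `*p = ...` or `p[i] = ...` through parameter `p`.
    WriteThrough(&'static str),
    /// A read through a parameter; never forces `&mut` on its own.
    ReadThrough(&'static str),
    /// Passing our parameter `arg` as the callee's parameter `callee_param`.
    Pass { callee: &'static str, callee_param: &'static str, arg: &'static str },
}

pub struct Func {
    pub name: &'static str,
    pub params: Vec<&'static str>,
    pub body: Vec<Stmt>,
}

/// Returns the (function, parameter) pairs that must be translated as `&mut`.
/// Iterates until no new pair is added, so mutability requirements propagate
/// from callees to their callers regardless of the order of `funcs`.
pub fn infer_mut(funcs: &[Func]) -> HashSet<(&'static str, &'static str)> {
    let mut muts = HashSet::new();
    loop {
        let mut changed = false;
        for f in funcs {
            for s in &f.body {
                let needs_mut = match s {
                    Stmt::WriteThrough(p) => Some(*p),
                    Stmt::ReadThrough(_) => None,
                    Stmt::Pass { callee, callee_param, arg } => {
                        if muts.contains(&(*callee, *callee_param)) {
                            Some(*arg)
                        } else {
                            None
                        }
                    }
                };
                if let Some(p) = needs_mut {
                    // `insert` returns true only when the pair is new.
                    if muts.insert((f.name, p)) {
                        changed = true;
                    }
                }
            }
        }
        if !changed {
            return muts;
        }
    }
}
```

Parameters that never end up in the returned set stay as shared `&` borrows. Starting from an empty set and adding a pair only when some write requires it is what keeps the number of mutable borrows small.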

