Specialization of Generic Array Accesses After Inlining
We have implemented an optimization that specializes type-generic array accesses after inlining of polymorphic functions in the native-code OCaml compiler. Polymorphic array operations (read and write) in OCaml require runtime type dispatch because integer and float arrays use different, ad hoc memory representations. This dispatch cannot be removed even after the functions are monomorphized by inlining, because the compiler's intermediate language is mostly untyped. We therefore extended the intermediate language with explicit type application, as in System F (while keeping type abstraction implicit by means of unique identifiers for type variables). Our optimization has achieved up to 21% speed-up of numerical programs.
💡 Research Summary
The paper presents a practical optimization for the OCaml native‑code compiler that specializes generic array accesses after inlining polymorphic functions. In OCaml, integer and floating‑point arrays have different memory layouts, so generic array reads and writes must perform a runtime type check (the “Pgenarray” case). The standard compiler can eliminate this check only when the array’s monomorphic type is known at compile time, but after inlining the intermediate language (lambda) still lacks sufficient type information, so the check remains.
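To make the dispatch concrete, here is a small illustrative sketch (not taken from the paper). It contrasts a polymorphic accessor with monomorphic ones and uses the standard `Obj` module to show that float arrays carry a distinct block tag. The function names are hypothetical, and the tag check assumes the default compiler configuration in which float arrays are stored flat.

```ocaml
(* Polymorphic accessor: the element type 'a is unknown here, so the
   compiler has to treat the array as generic and test its
   representation at run time. *)
let first (a : 'a array) : 'a = a.(0)

(* Monomorphic accessors: the array kind is known statically, so no
   runtime test is needed. *)
let first_int (a : int array) : int = a.(0)
let first_float (a : float array) : float = a.(0)

(* Assumption: flat float arrays are enabled (the default). In that
   case a float array is a block tagged Obj.double_array_tag, which is
   the property a generic access has to inspect. *)
let is_flat_float_array (a : 'a array) : bool =
  Obj.tag (Obj.repr a) = Obj.double_array_tag

let () =
  Printf.printf "%d %g\n" (first [| 1; 2 |]) (first [| 1.5; 2.5 |]);
  Printf.printf "int array flat? %b\n" (is_flat_float_array [| 1; 2 |]);
  Printf.printf "float array flat? %b\n" (is_flat_float_array [| 1.5; 2.5 |])
```

Both calls to `first` go through the same compiled code, which is why that code cannot assume either layout unless it is inlined and specialized at the call site.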
To solve this, the authors extend the intermediate language with explicit type applications, borrowing the idea from System F, while type abstraction stays implicit. They replace the generic array kind marker with a type-variable identifier (Ptvar of int) and introduce a new construct, Lspecialized. This construct pairs a polymorphic function with a mapping from its type variables to concrete array kinds (e.g., {'a → I} for integer arrays, {'a → F} for float arrays). Function definitions carry implicit type parameters ({'a}), and each call site supplies the appropriate mapping. During inlining, the compiler substitutes each type variable according to the mapping, turning a generic access like a.{Pgenarray}(0) into a concrete int_array.{I}(0) or float_array.{F}(0). Consequently, the runtime dispatch code disappears and the access can be compiled as a direct load/store.
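The substitution step can be sketched with a toy model of the intermediate language. This is only an illustration under stated assumptions: the constructor names `Ptvar` and `Lspecialized` come from the summary above, but the rest of the type definitions, and the functions `resolve`, `specialize` and `expose`, are hypothetical simplifications rather than the compiler's actual code.

```ocaml
(* Toy model only: these types mimic, but are not, the compiler's own. *)
type array_kind =
  | Pintarray      (* written I in the text *)
  | Pfloatarray    (* written F in the text *)
  | Pgenarray      (* kind unknown: needs a runtime check *)
  | Ptvar of int   (* kind determined by the type variable with this id *)

type lambda =
  | Lvar of string
  | Lconst of int
  | Larrayref of array_kind * lambda * lambda          (* a.{kind}(i) *)
  | Lfunction of string list * lambda
  | Lapply of lambda * lambda list
  | Lspecialized of lambda * (int * array_kind) list   (* f with {'a -> kind} *)

(* Look a kind up in the mapping; kinds that are not mapped type
   variables are returned unchanged. *)
let resolve (m : (int * array_kind) list) (k : array_kind) : array_kind =
  match k with
  | Ptvar id -> (match List.assoc_opt id m with Some k' -> k' | None -> k)
  | Pintarray | Pfloatarray | Pgenarray -> k

(* Apply the mapping throughout a term. Assumes type-variable ids are
   globally unique, so no capture can occur. *)
let rec specialize m e =
  match e with
  | Lvar _ | Lconst _ -> e
  | Larrayref (k, a, i) ->
      Larrayref (resolve m k, specialize m a, specialize m i)
  | Lfunction (ps, body) -> Lfunction (ps, specialize m body)
  | Lapply (f, args) -> Lapply (specialize m f, List.map (specialize m) args)
  | Lspecialized (f, m') ->
      Lspecialized (specialize m f,
                    List.map (fun (id, k) -> (id, resolve m k)) m')

(* When the callee's body is known, drop the Lspecialized wrapper and
   push its mapping into the body. *)
let expose = function
  | Lspecialized (Lfunction (ps, body), m) -> Lfunction (ps, specialize m body)
  | e -> e

(* fun a -> a.{'a}(0), with 'a represented by id 0 *)
let get0 = Lfunction ([ "a" ], Larrayref (Ptvar 0, Lvar "a", Lconst 0))

(* The same function instantiated at float arrays. *)
let get0_float = expose (Lspecialized (get0, [ (0, Pfloatarray) ]))
```

In this model, `get0_float` contains an access of kind `Pfloatarray`, so a later code generator could emit a direct float load instead of a tag test.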
A subtle issue arises from OCaml’s separate compilation model: the interface file (.cmi) and the implementation file (.cmx) assign different identifiers to the same type variable. The authors maintain a global renaming table at import time, ensuring that the identifiers used in the inlined body match those in the interface, thus preserving consistency across modules.
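One way to picture such a renaming table is the following minimal sketch. It is an assumption about the general shape of the mechanism, not the authors' implementation: the table layout and the names `renaming`, `register`, `rename` and `rename_all` are all hypothetical.

```ocaml
(* Hypothetical sketch: a global table mapping the type-variable ids
   found in an implementation file to the ids used by its interface. *)
let renaming : (int, int) Hashtbl.t = Hashtbl.create 16

(* Record, at import time, that impl_id and intf_id denote the same
   type variable. *)
let register ~impl_id ~intf_id = Hashtbl.replace renaming impl_id intf_id

(* Translate an id from an imported body; unknown ids are left as is. *)
let rename (id : int) : int =
  match Hashtbl.find_opt renaming id with
  | Some id' -> id'
  | None -> id

(* Rename every id mentioned by an imported function body. *)
let rename_all (ids : int list) : int list = List.map rename ids

let () =
  register ~impl_id:41 ~intf_id:3;
  List.iter (Printf.printf "%d ") (rename_all [ 41; 5 ]);
  print_newline ()
```

After renaming, a mapping supplied at a call site (expressed with interface ids) and the type variables inside the inlined body refer to the same identifiers.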
The implementation modifies only about a thousand lines of the OCaml compiler source. After the transformation, any polymorphic function that is fully inlined becomes monomorphic with respect to array kinds, and the generated code no longer contains the generic‑array branch. Functions that cannot be fully inlined (e.g., due to closure sharing or recursion) remain generic.
Experimental evaluation was performed on OCaml 4.02 with a suite of numerical benchmarks: Simple, Random, DKA, FFT, K-means, LD, LU, NN, and QR. All programs were compiled with aggressive inlining (-inline 10000000) and without unsafe optimizations except where needed. Results show speed-ups ranging from 0% (FFT, LU) to 21% (QR). The "Simple" and "Random" programs, which consist entirely of polymorphic array accesses, improve by 5% and 15% respectively. Programs where most work is floating-point arithmetic (DKA, LD) see modest gains because the dominant cost lies elsewhere. The QR benchmark, which heavily uses higher-order array functions (Array.map, Array.fold_left), achieves the maximum 21% improvement due to near-complete specialization.
Related work includes fully typed intermediate languages such as λMLi, TIL, and FLINT, which enable type‑directed optimizations but require extensive changes to the compiler. The authors’ approach is lightweight, targeting only the array‑kind information needed for this specific optimization, and integrates cleanly with the existing OCaml pipeline.
In conclusion, the authors demonstrate that a modest extension of the intermediate language, consisting of explicit type applications and a mapping mechanism, allows the OCaml compiler to eliminate generic array dispatch after inlining, yielding measurable performance gains on numerical code. Future work includes adopting the upcoming Flambda intermediate language to overcome current limitations (closure sharing, lack of recursion specialization) and extending the technique to other polymorphic operations such as generic comparisons and local variable unboxing.