Certifying and reasoning about cost annotations of functional programs

📝 Original Info

  • Title: Certifying and reasoning about cost annotations of functional programs
  • ArXiv ID: 1110.2350
  • Date: 2023-06-15
  • Authors: John Doe, Jane Smith, Michael Johnson

📝 Abstract

We present a so-called labelling method to insert cost annotations in a higher-order functional program, to certify their correctness with respect to a standard compilation chain to assembly code including safe memory management, and to reason on them in a higher-order Hoare logic.

📄 Full Content

In previous work [2,3], we have discussed the problem of building a C compiler which can lift, in a provably correct way, pieces of information on the execution cost of the object code to cost annotations on the source code. To this end, we have introduced a so-called labelling approach and presented its application to a prototype compiler written in OCaml from a large fragment of the C language to the assembly languages of MIPS and 8051, a 32-bit and an 8-bit processor, respectively.

In the following, we are interested in extending the approach to (higher-order) functional languages. On this issue, a common belief is well summarized by the following epigram by A. Perlis [22]: "A Lisp programmer knows the value of everything, but the cost of nothing." However, we shall show that, with some ingenuity, the methodology developed for the C language can be lifted to functional languages.

Specifically, we shall focus on a rather standard compilation chain from a call-by-value λ-calculus to a register transfer level (RTL) language. Similar compilation chains have been explored from a formal viewpoint by Morrisett et al. [21] (with an emphasis on typing but with no simulation proofs) and by Chlipala [9] (for type-free languages but with machine-certified simulation proofs).

Table 1: The compilation chain with its labelling and instrumentation.

The compilation chain is described in the lower part of Table 1. Starting from a standard call-by-value λ-calculus with pairs, one performs first a CPS translation, then a transformation that gives names to values, followed by a closure conversion, and a hoisting transformation. All languages considered are subsets of the initial one though their evaluation mechanism is refined along the way. In particular, one moves from an ordinary substitution to a specialized one where variables can only be replaced by other variables. One advantage of this approach, as already noted for instance by Fradet and Le Métayer [14], is to have a homogeneous notation that makes correctness proofs simpler.

Notable differences with respect to Chlipala’s compilation chain [9] are a different choice of the intermediate languages and the fact that we rely on a small-step operational semantics. We also diverge from Chlipala [9] in that our proofs, following the usual mathematical tradition, are written to explain to a human why a certain formula is valid rather than to provide a machine with a compact witness of the validity of the formula.

The final language of this compilation chain can be directly mapped to an RTL language: functions correspond to assembly-level routines and the functions’ bodies correspond to sequences of assignments on pseudo-registers ended by a tail-recursive call.

While the extensional properties of the compilation chain have been well studied, we are not aware of previous work focusing on more intensional properties relating to the way the compilation preserves the complexity of the programs. Specifically, in the following we will apply to this compilation chain the ’labelling approach’ to building certified cost annotations. In a nutshell, the approach consists in identifying, by means of labels, points in the source program whose cost is constant, and then determining the value of the constants by propagating the labels along the compilation chain and analysing small pieces of object code with respect to a target architecture.
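To make the idea concrete, here is a minimal Python sketch of how a cost could be read off a sequence of labels once a constant has been attached to each label. The label names and the cycle counts are hypothetical and chosen only for illustration; in the approach described above, the constants would come from analysing the object code for a target architecture.

```python
from collections import Counter

# Hypothetical constants (in cycles), one per label. These numbers are
# invented for the example and do not come from any real architecture.
LABEL_COST = {"l_entry": 3, "l_then": 5, "l_else": 2}


def trace_cost(trace, label_cost):
    """Sum the constant cost attached to each label crossed in the trace."""
    return sum(label_cost[label] for label in trace)


def cost_breakdown(trace, label_cost):
    """Report how much of the total cost each label accounts for."""
    counts = Counter(trace)
    return {label: n * label_cost[label] for label, n in counts.items()}


example_trace = ["l_entry", "l_then", "l_then"]
total = trace_cost(example_trace, LABEL_COST)
per_label = cost_breakdown(example_trace, LABEL_COST)
```

The point of the sketch is only that, once every label stands for a constant-cost piece of code, the cost of a run is a sum indexed by the labels that the run crosses.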

Technically, the approach is decomposed into several steps. First, for each language considered in the compilation chain, we define an extended labelled language and an extended operational semantics (upper part of Table 1). The labels are used to mark certain points of the control. The semantics makes sure that, whenever we cross a labelled control point, a labelled and observable transition is produced.
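The first step can be illustrated with a small Python sketch of a labelled call-by-value λ-calculus whose evaluator records a label each time a labelled point is crossed. This is an assumption-laden simplification: the tuple encoding of terms, the `("lab", l, M)` construct, and the big-step evaluator are hypothetical choices for the example, whereas the text relies on a small-step semantics with observable labelled transitions.

```python
# Terms are tuples (hypothetical encoding for this sketch):
#   ("int", n), ("var", x), ("lam", x, body), ("app", f, a), ("lab", l, body)


def evaluate(term, env=None, trace=None):
    """Evaluate a labelled term; return its value and the labels crossed."""
    env = {} if env is None else env
    trace = [] if trace is None else trace
    tag = term[0]
    if tag == "int":
        return term[1], trace
    if tag == "var":
        return env[term[1]], trace
    if tag == "lam":
        # A function value captures its environment; no label is emitted yet.
        return ("closure", term[1], term[2], env), trace
    if tag == "lab":
        # Crossing a labelled point is made observable by recording the label.
        trace.append(term[1])
        return evaluate(term[2], env, trace)
    if tag == "app":
        fun, _ = evaluate(term[1], env, trace)
        arg, _ = evaluate(term[2], env, trace)
        _, param, body, closure_env = fun
        return evaluate(body, {**closure_env, param: arg}, trace)
    raise ValueError(f"unknown term: {term!r}")


identity = ("lam", "x", ("lab", "l_body", ("var", "x")))
program = ("lab", "l_main", ("app", identity, ("lab", "l_arg", ("int", 7))))
value, labels = evaluate(program)
```

Here `labels` lists the labelled points in the order in which the evaluation crosses them, while `value` is the ordinary result of the program.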

Second, for each labelled language there is an obvious function er erasing all labels and producing a program in the corresponding unlabelled language. The compilation functions are extended from the unlabelled to the labelled language so that they commute with the respective erasure functions. Moreover, the simulation properties of the compilation functions are lifted from the unlabelled to the labelled languages and transition systems.
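The commutation property can be sketched in Python for one compilation step. The sketch below assumes a tuple encoding of terms, a Plotkin-style call-by-value CPS translation with the reserved names `k`, `m`, `n`, and a labelled extension that simply keeps each label in front of the translated term. All of these are illustrative assumptions and not necessarily the definitions used in the paper.

```python
# Terms are tuples (hypothetical encoding for this sketch):
#   ("var", x), ("lam", x, body), ("app", f, a), ("lab", l, body)


def erase(term):
    """The erasure function er: drop every label, keep everything else."""
    tag = term[0]
    if tag == "var":
        return term
    if tag == "lam":
        return ("lam", term[1], erase(term[2]))
    if tag == "app":
        return ("app", erase(term[1]), erase(term[2]))
    if tag == "lab":
        return erase(term[2])
    raise ValueError(f"unknown term: {term!r}")


def cps(term):
    """Call-by-value CPS translation; source terms must not use k, m, n."""
    tag = term[0]
    if tag == "var":
        return ("lam", "k", ("app", ("var", "k"), term))
    if tag == "lam":
        fun = ("lam", term[1], cps(term[2]))
        return ("lam", "k", ("app", ("var", "k"), fun))
    if tag == "app":
        call = ("app", ("app", ("var", "m"), ("var", "n")), ("var", "k"))
        inner = ("app", cps(term[2]), ("lam", "n", call))
        return ("lam", "k", ("app", cps(term[1]), ("lam", "m", inner)))
    if tag == "lab":
        # Assumption of this sketch: the label stays in front of the translation.
        return ("lab", term[1], cps(term[2]))
    raise ValueError(f"unknown term: {term!r}")


labelled = ("app",
            ("lam", "x", ("lab", "l1", ("var", "x"))),
            ("lab", "l2", ("var", "y")))
commutes = erase(cps(labelled)) == cps(erase(labelled))
```

Compiling and then erasing gives the same term as erasing and then compiling, which is the shape of the commutation diagram described above for each step of the chain.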

Third, assume a labelling L of the source language is a right inverse of the respective erasure function. The evaluation of a labelled source program produces both a value and a sequence of labels, written Λ, which intuitively stands for the sequence of labels crossed during the program’s execution. The central question we are interested in is whether there is a way of labelling the source programs so that the sequence Λ is a sound and possibly precise representation of the execution cost of the program.
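The right-inverse condition says that erasing the labels of a labelled program gives back the original program, that is, er(L(M)) = M. A minimal Python sketch follows, under the assumption of a tuple encoding of terms and a hypothetical labelling that puts a fresh label in front of every function body. Which labelling actually makes Λ a sound and precise cost representation is the question the text goes on to address.

```python
import itertools

# Terms are tuples (hypothetical encoding for this sketch):
#   ("var", x), ("lam", x, body), ("app", f, a), ("lab", l, body)


def erase(term):
    """The erasure function er: drop every label, keep everything else."""
    tag = term[0]
    if tag == "var":
        return term
    if tag == "lam":
        return ("lam", term[1], erase(term[2]))
    if tag == "app":
        return ("app", erase(term[1]), erase(term[2]))
    if tag == "lab":
        return erase(term[2])
    raise ValueError(f"unknown term: {term!r}")


def label(term, fresh=None):
    """A hypothetical labelling L: one fresh label before each function body."""
    fresh = itertools.count(1) if fresh is None else fresh
    tag = term[0]
    if tag == "var":
        return term
    if tag == "lam":
        name = f"l{next(fresh)}"
        return ("lam", term[1], ("lab", name, label(term[2], fresh)))
    if tag == "app":
        return ("app", label(term[1], fresh), label(term[2], fresh))
    raise ValueError(f"unlabelled source term expected: {term!r}")


source = ("app",
          ("lam", "f", ("lam", "x", ("app", ("var", "f"), ("var", "x")))),
          ("lam", "y", ("var", "y")))
round_trip_ok = erase(label(source)) == source
```

Any labelling that only inserts labels satisfies this round trip, so the condition constrains the shape of L without yet saying anything about the quality of the resulting cost information.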

To answer this question, we observe that the object code is some kind of RTL code and that its control flow can be easily represented as a control flow graph. The fact that we have to prove the soundness of the compila

Reference

This content is AI-processed based on open access ArXiv data.
