Title: Mechanized semantics for the Clight subset of the C language
ArXiv ID: 0901.3619
Date: 2009-09-18
Authors: Researchers from original ArXiv paper
📝 Abstract
This article presents the formal semantics of a large subset of the C language called Clight. Clight includes pointer arithmetic, "struct" and "union" types, C loops and structured "switch" statements. Clight is the source language of the CompCert verified compiler. The formal semantics of Clight is a big-step operational semantics that observes both terminating and diverging executions and produces traces of input/output events. The formal semantics of Clight is mechanized using the Coq proof assistant. In addition to the semantics of Clight, this article describes its integration in the CompCert verified compiler and several ways by which the semantics was validated.
📄 Full Content
Formal semantics of programming languages (that is, the mathematical specification of legal programs and their behaviors) play an important role in several areas of computer science. For advanced programmers and compiler writers, formal semantics provide a more precise alternative to the informal English descriptions that usually pass as language standards. In the context of formal methods such as static analysis, model checking and program proof, formal semantics are required to validate the abstract interpretations and program logics (e.g. axiomatic semantics) used to analyze and reason about programs. The verification of programming tools such as compilers, type-checkers, static analyzers and program verifiers is another area where formal semantics for the languages involved are a prerequisite. While formal semantics for realistic languages can be defined on paper using ordinary mathematics [31,16,7], machine assistance such as the use of proof assistants greatly facilitates their definition and uses.
For high-level programming languages such as Java and functional languages, there exists a sizeable body of mechanized formalizations and verifications of operational semantics, axiomatic semantics, and programming tools such as compilers and bytecode verifiers. Despite being more popular for writing systems software and embedded software, lower-level languages such as C have attracted less interest: several formal semantics for various subsets of C have been published, but only a few have been mechanized.
The present article reports on the definition of the formal semantics of a large subset of the C language called Clight. Clight features most of the types and operators of C, including pointer arithmetic, pointers to functions, and struct and union types, as well as all C control structures except goto. The semantics of Clight is mechanized using the Coq proof assistant [10,4]. It is presented as a big-step operational semantics that observes both terminating and diverging executions and produces traces of input/output events. The Clight subset of C and its semantics are presented in sections 2 and 3, respectively.
The work presented in this paper is part of an ongoing project called CompCert that develops a realistic compiler for the C language and formally verifies that it preserves the semantics of the programs being compiled. A previous paper [6] reports on the development and proof of semantic preservation in Coq of the front-end of this compiler: a translator from Clight to Cminor, a low-level, imperative intermediate language. The formal verification of the back-end of this compiler, which generates moderately optimized PowerPC assembly code from Cminor, is described in [28]. Section 4 describes the integration of the Clight language and its semantics within the CompCert compiler and its verification.
Formal semantics for realistic programming languages are large and complicated. This raises the question of validating these semantics: how can we make sure that they correctly capture the expected behaviors? In section 5, we argue that the correctness proof of the CompCert compiler provides an indirect but original way to validate the semantics of Clight, and discuss other approaches to the validation problem that we considered.
We finish this article with a discussion of related work in section 6, followed by future work and conclusions in section 7.
Availability. The Coq development underlying this article can be consulted on-line at http://compcert.inria.fr.
Notations. [x, y[ denotes the semi-open interval of integers {n ∈ Z | x ≤ n < y}. For functions returning “option” types, ⌊x⌋ (read: “some x”) corresponds to success with return value x, and ∅ (read: “none”) corresponds to failure. In grammars, a* denotes 0, 1 or several occurrences of syntactic category a, and a? denotes an optional occurrence of syntactic category a.
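As an informal illustration of these notations (not part of the Coq development), the semi-open interval and the “option” convention can be sketched in Python. The function names `semi_open` and `checked_div` are hypothetical and chosen only for this example; Python's `None` plays the role of “none”, and any other returned value plays the role of “some x”.

```python
from typing import Optional


def semi_open(x: int, y: int) -> range:
    """Integers n with x <= n < y, i.e. the semi-open interval [x, y[."""
    return range(x, y)


def checked_div(a: int, b: int) -> Optional[int]:
    """Hypothetical partial function in the "option" style.

    Returns the quotient on success ("some q") and None on failure ("none").
    """
    if b == 0:
        return None  # failure: division by zero has no result
    return a // b    # success: the result is returned directly
```

The upper bound is excluded, so `semi_open(3, 3)` is empty, matching the definition {n ∈ Z | x ≤ n < y}.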
Clight is structured into expressions, statements and functions. In the Coq formalization, the abstract syntax is presented as inductive data types, therefore achieving a deep embedding of Clight into Coq.
The abstract syntax of Clight types is given in figure 1. Supported types include arithmetic types (integers and floats in various sizes and signedness), array types, pointer types (including pointers to functions), function types, as well as struct and union types. Named types are omitted: we assume that typedef definitions have been expanded away during parsing and type-checking.
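To give a feel for what a deep embedding of types looks like, here is a minimal Python sketch that mirrors the kinds of types listed above as immutable data. It is only an analogy to the Coq inductive type: the class and field names (`TInt`, `TPointer`, and so on) are assumptions made for this example and are not taken from figure 1.

```python
from dataclasses import dataclass
from typing import Tuple


class CType:
    """Base class for the illustrative type syntax (hypothetical names)."""


@dataclass(frozen=True)
class TVoid(CType):
    pass


@dataclass(frozen=True)
class TInt(CType):
    size: str      # e.g. "I32"
    signed: bool


@dataclass(frozen=True)
class TFloat(CType):
    size: str      # e.g. "F32" or "F64"


@dataclass(frozen=True)
class TPointer(CType):
    target: CType


@dataclass(frozen=True)
class TArray(CType):
    elem: CType
    length: int    # number of elements, a compile-time constant


@dataclass(frozen=True)
class TFunction(CType):
    params: Tuple[CType, ...]
    result: CType


@dataclass(frozen=True)
class TStruct(CType):
    name: str
    fields: Tuple[Tuple[str, CType], ...]


@dataclass(frozen=True)
class TUnion(CType):
    name: str
    fields: Tuple[Tuple[str, CType], ...]


# A pointer to a function taking a 64-bit float and returning a signed 32-bit integer.
fun_ptr = TPointer(TFunction((TFloat("F64"),), TInt("I32", True)))

# A struct with two fields, one of which is an array of 10 integers.
point = TStruct("point", (("x", TInt("I32", True)),
                          ("data", TArray(TInt("I32", True), 10))))
```

Because the types are plain data, two type expressions built the same way compare equal structurally, which is the property that makes such an embedding convenient to reason about.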
The arithmetic types fully specify the bit size of integers and floats, unlike the C types int, long, etc., whose sizes are left largely unspecified in the C standard. Typically, the parser maps int and long to size I32, float to size F32, and double to size F64. Currently, 64-bit integers and extended-precision floats are not supported.
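The typical mapping described in this paragraph can be written down as a small lookup table. The sketch below is illustrative only: the function name `elaborate_size` is hypothetical, and the table covers just the four C types named in the text; it does not reproduce the actual CompCert parser.

```python
# Typical mapping from C type names to fixed sizes, as described in the text.
SIZE_OF_C_TYPE = {
    "int": "I32",
    "long": "I32",
    "float": "F32",
    "double": "F64",
}

# Bit width denoted by each size tag used above.
BITS = {"I32": 32, "F32": 32, "F64": 64}


def elaborate_size(c_type: str) -> str:
    """Return the fixed size tag for a C type name (hypothetical helper).

    Types outside the table, such as 64-bit integers or extended-precision
    floats, are rejected because they are not supported.
    """
    try:
        return SIZE_OF_C_TYPE[c_type]
    except KeyError:
        raise ValueError(f"unsupported C type: {c_type}") from None
```

For example, `elaborate_size("long")` yields the same size tag as `elaborate_size("int")`, while `elaborate_size("long long")` raises `ValueError`.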
Array types carry the number n of elements of the array, as a compile-time constant. Arrays with unknown sizes (τ[] in C) are replaced by pointer types in funct