On the correctness of a branch displacement algorithm
The branch displacement problem is a well-known problem in assembler design. It revolves around the feature, present in several processor families, of having different instructions, of different sizes, for jumps of different displacements. The problem, which is provably NP-hard, is then to select the instructions such that one ends up with the smallest possible program. During our research with the CerCo project on formally verifying a C compiler, we have implemented and proven correct an algorithm for this problem. In this paper, we discuss the problem, possible solutions, our specific solution and the proofs.
💡 Research Summary
The paper tackles the classic branch displacement problem that arises in assembler design when a processor offers multiple jump instructions of different lengths (e.g., short, near, long), each covering a specific displacement range. Selecting the appropriate instruction for every branch so that the final binary is as small as possible is NP‑hard, because the actual displacement of a branch depends on the sizes of the instructions that precede it, while the size of a jump instruction in turn depends on the displacement it must cover. The problem is therefore a global, non‑linear optimization task that simple greedy heuristics cannot solve exactly in the general case.
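The circular dependency can be made concrete with a small sketch. The encoding table below is purely illustrative (the names `short`/`near`/`long`, the byte sizes, and the displacement ranges are hypothetical, not taken from the paper): picking the smallest encoding requires knowing the displacement, yet the displacement depends on the sizes chosen for the jumps in between.

```python
# Hypothetical jump-encoding table for a processor with three jump sizes.
# Names, sizes, and ranges are illustrative assumptions, not the paper's.
ENCODINGS = [
    ("short", 2, (-128, 127)),      # 2-byte jump, 8-bit signed displacement
    ("near",  3, (-32768, 32767)),  # 3-byte jump, 16-bit signed displacement
    ("long",  5, None),             # 5-byte jump, unrestricted displacement
]

def smallest_encoding(displacement):
    """Return (name, size) of the smallest encoding covering `displacement`."""
    for name, size, valid_range in ENCODINGS:
        if valid_range is None or valid_range[0] <= displacement <= valid_range[1]:
            return name, size
    raise ValueError("no encoding covers this displacement")
```

Note that this helper alone cannot assemble a program: each call assumes the displacement is already known, which is exactly what the global algorithm must compute.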
Within the CerCo project, whose goal is to produce a formally verified C compiler, the authors designed, implemented, and mechanically proved correct a branch‑displacement algorithm. The algorithm follows an iterative fixed‑point approach: it begins by assuming that every conditional or unconditional branch uses the longest possible encoding (the “long jump”). Each iteration computes the actual displacement of every branch under the current layout and, whenever a branch’s displacement falls inside the range of a shorter encoding, replaces the instruction with the shorter form (short or near). Because a replacement shortens the code, the displacements of other branches may change, so the whole program is rescanned. The process repeats until no instruction changes, i.e., until a fixed point is reached.
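The iteration described above can be sketched in a few lines of Python. This is a toy model, not the verified implementation: a program is assumed to be a list of `("insn", size)` items for ordinary instructions and `("jump", target_index)` items for branches, and the encoding names, sizes, and ranges are hypothetical.

```python
# Sketch of the iterative fixed-point approach: start with every jump "long",
# then repeatedly shrink jumps whose displacement fits a shorter encoding.
SIZES  = {"short": 2, "near": 3, "long": 5}
RANGES = {"short": (-128, 127), "near": (-32768, 32767)}  # "long": unrestricted

def addresses(program, encoding):
    """Address of each item under the current choice of jump encodings."""
    addr, out = 0, []
    for i, (kind, arg) in enumerate(program):
        out.append(addr)
        addr += arg if kind == "insn" else SIZES[encoding[i]]
    return out

def fix_encodings(program):
    # Pessimistic start: every jump gets the longest encoding.
    enc = {i: "long" for i, (kind, _) in enumerate(program) if kind == "jump"}
    changed = True
    while changed:                       # rescan until a fixed point
        changed = False
        addrs = addresses(program, enc)
        for i, (kind, target) in enumerate(program):
            if kind != "jump":
                continue
            disp = addrs[target] - addrs[i]
            # Replace with the smallest encoding covering the displacement.
            for name in ("short", "near", "long"):
                lo_hi = RANGES.get(name)  # None means unrestricted
                if lo_hi is None or lo_hi[0] <= disp <= lo_hi[1]:
                    if enc[i] != name:
                        enc[i], changed = name, True
                    break
    return enc
```

Because shrinking a jump only brings other instructions closer together, displacements never grow in magnitude, so encodings only ever move toward shorter forms and the loop must terminate.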
The key theoretical properties are: (1) Termination – each iteration either reduces the total byte count or leaves it unchanged, and there are only finitely many possible byte counts, so the loop must stop after a bounded number of steps. (2) Safety – at termination, every branch’s chosen encoding is compatible with its actual displacement; this is proved by maintaining the invariant that the displacement of each branch always lies within the interval prescribed by its current encoding. (3) Optimality – the final fixed point is shown to be a globally minimal‑size program; the proof relies on a “no further shrink” lemma: if a branch cannot be replaced by any shorter encoding without violating its displacement range, then no other sequence of replacements can lead to a smaller overall binary. The authors formalize the displacement intervals, the transition rules between encodings, and the effect of a replacement on neighboring branches in the Matita proof assistant. They also handle “boundary effects”, where shrinking one instruction may push another branch out of its current interval and trigger further replacements; the formal development shows that such cascades still respect the monotonic‑decrease invariant.
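The safety invariant has a direct executable reading. The checker below is a minimal sketch under the same toy program representation as before (`("insn", size)` and `("jump", target_index)` items, with hypothetical size and range tables passed in); it verifies that every branch’s displacement lies within the interval of its chosen encoding.

```python
# Checker for the safety invariant: under the given layout, every jump's
# displacement must fall inside its encoding's interval (None = unrestricted).
def check_safety(program, encoding, sizes, ranges):
    addr, addrs = 0, []
    for i, (kind, arg) in enumerate(program):
        addrs.append(addr)
        addr += arg if kind == "insn" else sizes[encoding[i]]
    for i, (kind, target) in enumerate(program):
        if kind == "jump":
            disp = addrs[target] - addrs[i]
            lo_hi = ranges.get(encoding[i])
            if lo_hi is not None and not (lo_hi[0] <= disp <= lo_hi[1]):
                return False
    return True
```

In the paper this property is not checked after the fact but maintained as a proof invariant throughout the algorithm; the runtime check is only an illustration of what the invariant asserts.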
Implementation details are described for the integration into CerCo’s compiler backend. The algorithm is written in a purely functional style, extracted from the Matita development to OCaml, and plugged into the assembly‑generation phase. Empirical evaluation on standard C benchmark suites (including parts of the LLVM test suite and SPEC CPU) demonstrates modest but consistent size reductions of 2–4 % compared with an unoptimized baseline. More importantly, the authors stress that the reductions are guaranteed to be optimal, a property that conventional heuristics lack. The proof scripts are modular, allowing the same framework to be adapted to other architectures (e.g., ARM, RISC‑V) that feature different jump‑size tables.
The discussion contrasts the presented method with existing heuristics such as greedy shrinking, two‑pass algorithms, and branch‑relaxation techniques. While those approaches can be fast, they provide no formal guarantee that the resulting binary is minimal. The paper argues that the added verification effort is justified in safety‑critical contexts where every byte of code matters and where a formally verified compiler is required. Future work is outlined, including the combination of static displacement optimization with dynamic branch‑prediction information, and extending the approach to multi‑pass optimization pipelines.
In conclusion, the authors deliver a rigorously verified solution to an NP‑hard assembler optimization problem. By coupling a simple iterative algorithm with a machine‑checked correctness proof in the Matita proof assistant, they achieve both practical applicability (the algorithm runs efficiently on real code bases) and mathematical certainty (termination, safety, and optimality are all proved). This contribution advances the state of the art in formally verified compilation and demonstrates that even traditionally “hard” low‑level optimizations can be brought under the umbrella of formal methods.