Structural abstract interpretation, A formal study using Coq

Yves Bertot*
INRIA Sophia-Méditerranée

* This work was partially supported by ANR contract Compcert, ANR-05-SSIA-0019.

Abstract. Abstract interpreters are tools to compute approximations for behaviors of a program. These approximations can then be used for optimisation or for error detection. In this paper, we show how to describe an abstract interpreter using the type-theory based theorem prover Coq, using inductive types for syntax and structural recursive programming for the abstract interpreter's kernel. The abstract interpreter can then be proved correct with respect to a Hoare logic for the programming language.

1 Introduction

Higher-order logic theorem provers provide a description language that is powerful enough to describe programming languages. Inductive types can be used to describe the language's main data structure (the syntax) and recursive functions can be used to describe the behavior of instructions (the semantics). Recursive functions can also be used to describe tools to analyse or modify programs. In this paper, we will describe such a collection of recursive functions to analyse programs, based on abstract interpretation [7].

1.1 An example of abstract interpretation

We consider a small programming language with loop statements and assignments. Loops are written with the keywords while, do and done, assignments are written with :=, and several instructions can be grouped together, separating them with a semicolon. The instructions grouped using a semicolon are supposed to be executed in the same order as they are written. Comments are written after two slashes //. We consider the following simple program:

    x := 0;               // line 1
    while x < 1000 do     // line 2
      x := x + 1          // line 3
    done                  // line 4
We want to design a tool that is able to gather information about the value of the variable x at each position in the program. For instance here, we know that after executing the first line, x is always in the interval [0,0]; we know that before executing the assignment on the third line, x is always smaller than 1000 (because the test x < 1000 was just satisfied). With a little thinking, we can also guess that x increases as the loop executes, so that we can infer that before the third line, x is always in the interval [0,999]. On the other hand, after the third line, x is always in the interval [1,1000]. Now, if execution exits the loop, we can also infer that the test x < 1000 failed, so that we know that x is larger than or equal to 1000, but since it was at best in [0,1000] before the test, we can guess that x is exactly 1000 after executing the program. So we can write the following new program, where the only difference is the information added in the comments:

    // Nothing is known about x on this line
    x := 0;               // 0 <= x <= 0
    while x < 1000 do     // 0 <= x <= 999
      x := x + 1          // 1 <= x <= 1000
    done                  // 1000 <= x <= 1000

We want to produce a tool that performs this analysis and produces the same kind of information for each line in the program. Our tool will do slightly more: first, it will also be able to take as input extra information about variables before entering the program; second, it will produce information about variables after executing the program; third, it will associate an invariant property to all while loops in the program. Such an invariant is a property that is true before and after all executions of the loop body (in our example the loop body is x := x + 1). A fourth feature of our tool is that it will be able to detect occasions when we can be sure that some code is never executed.
In this case, it will mark the program points that are never reached with a false statement, meaning "when this point of the program is reached, the false statement can be proved" (in other words, "this cannot happen"). Our tool will also be designed in such a way that it is guaranteed to terminate in reasonable time. Such a tool is called a static analysis tool, because the extra information can be obtained without running the program: in this example, executing the program requires at least a thousand operations, but our reasoning effort takes less than ten steps.

Tools of this kind are useful, for example to avoid bugs in programs or as part of efficient compilation techniques. For instance, the first mail-spread virus exploited a programming error known as a buffer overflow (an array update was operating outside the memory allocated for that array), but buffer overflows can be detected if we know over which interval each variable is likely to range.

1.2 Formal description and proofs

Users should be able to trust the information added in programs by the analysers. Program analysers are themselves programs and we can reason about their correctness. The program analysers we study in this paper are based on abstract interpretation [7] and we use the Coq system [13,3] to reason on their correctness. The development described in this paper is available on the net at the following address (there are two versions, compatible with the latest stable release of Coq V8.1pl3 and with the upcoming version V8.2).

    http://hal.inria.fr/inria-00329572

This paper has 7 more sections. Section 2 gives a rough introduction to the notion of abstract interpretation. Section 3 describes the programming language that is used as our playground. The semantics of this programming language is described using a weakest pre-condition calculus.
This weakest pre-condition calculus is later used to argue on the correctness of abstract interpreters. In particular, abstract interpretation returns an annotated instruction and an abstract state, where the abstract state is used as a post-condition and the annotations in the instruction describe the abstract state at the corresponding point in the program. Section 4 describes a first simple abstract interpreter, where the main ideas around abstractly interpreting assignments and sequences are covered, but while loops are not treated. In Section 4, we also show that the abstract interpreter can be formally proved correct. In Section 5, we address while loops in more detail and in particular we show how tests can be handled in abstract interpretation, with applications to dead-code elimination. In Section 6, we observe that abstract interpretation is a general method that can be applied to a variety of abstract domains and we recapitulate the types, functions, and properties that are expected from each abstract domain. In Section 7, we show how the main abstract interpreter can be instantiated for a domain of intervals, thus making the analysis presented in the introduction possible. In Section 8, we give a few concluding remarks.

2 An intuitive view of abstract interpretation

Abstract interpretation is a technique for the static analysis of programs. The objective is to obtain a tool that will take programs as data, perform some symbolic computation, and return information about all executions of the input programs. One important aspect is that this tool should always terminate (hence the adjective static). The tool can then be used either directly to provide information about properties of variables in the program (as in the Astree tool [8]), or as part of a compiler, where it can be used to guide optimization.
For instance, the kind of interval-based analysis that we describe in this paper can be used to avoid run time array-bound checking in languages that impose this kind of discipline, like Java.

The central idea of abstract interpretation is to replace the values normally manipulated in a program by sets of values, in such a way that all operations still make sense. For instance, if a program manipulates integer values and performs additions, we can decide to take an abstract point of view and only consider whether values are odd or even. With respect to addition, we can still obtain meaningful results, because we know, for instance, that adding an even and an odd value returns an odd value. Thus, we can decide to run programs with values taken in a new type that contains values even and odd, with an addition that respects the following table:

    odd  + even = odd
    even + odd  = odd
    odd  + odd  = even
    even + even = even

When defining abstract interpretation for a given abstract domain, all operations must be updated accordingly. The behavior of control instructions is also modified, because abstract values may not be precise enough to decide how a given decision should be taken. For instance, if we know that the abstract value for a variable x is odd, then we cannot tell which branch of a conditional statement of the following form will be taken:

    if x < 10 then x := 0 else x := 1

After the execution of this conditional statement, the abstract value for x cannot be odd or even. This example also shows that the domain of abstract values must contain an abstract value that represents the whole set of values, or said differently, an abstract value that represents the absence of knowledge. This value will be called top later in the paper.

There must exist a connection between abstract values and concrete values for abstract interpretation to work well.
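The parity domain discussed above can be sketched in Coq as follows. This is only an illustration and not part of the development described in this paper: the names parity, p_even, p_odd, p_top, and padd are hypothetical, and the choice to return p_top as soon as one argument is p_top is an assumption that simply reflects the absence of knowledge.

```coq
(* Hypothetical parity domain: names are illustrative only. *)
Inductive parity : Type := p_even | p_odd | p_top.

(* Abstract addition, following the table given in the text.
   Any case involving p_top yields p_top (no knowledge). *)
Definition padd (a b : parity) : parity :=
  match a, b with
  | p_even, p_even => p_even
  | p_even, p_odd => p_odd
  | p_odd, p_even => p_odd
  | p_odd, p_odd => p_even
  | _, _ => p_top
  end.

(* Two rows of the table, checked by computation. *)
Example odd_plus_even : padd p_odd p_even = p_odd.
Proof. reflexivity. Qed.

Example odd_plus_odd : padd p_odd p_odd = p_even.
Proof. reflexivity. Qed.

(* Once knowledge is lost, it stays lost. *)
Example top_plus_even : padd p_top p_even = p_top.
Proof. reflexivity. Qed.
```

The three examples are accepted by computation alone, which is typical of abstract operations defined by finite case analysis.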
This connection has been studied since [7] and is known as a Galois connection. For instance, if the abstract values are even, odd, and top, and if we can infer that a value is in {2,4}, then correct choices for the abstract value are top or even, but obviously the abstract interpreter will work better if the more precise even is chosen.

Formal proofs of correctness for abstract interpretation were already studied before, in particular in [11]. The approach taken in this paper is different, in that it follows directly the syntax of a simple structured programming language, while traditional descriptions are tuned to studying a control-flow graph language. The main advantage of our approach is that it supports a very concise description of the abstract interpreter, with very simple verifications that it is terminating.

3 The programming language

In this case study, we work with a very small language containing only assignments, sequences, and while loops. The right-hand sides for assignments are expressions made of numerals, variables, and addition. The syntax of the programming language is as follows:

- variable names are noted x, y, x1, x′, etc.
- integers are noted n, n1, n′, etc.
- arithmetic expressions are noted e, e1, e′, etc. For our case study, these expressions can only take three forms:
      e ::= n | x | e1 + e2
- boolean expressions are noted b, b1, b′, etc. For our case study, these expressions can only take one form:
      b ::= e1 < e2
- instructions are noted i, i1, i′, etc. For our case study, these instructions can only take three forms:
      i ::= x := e | i1 ; i2 | while b do i done

For the Coq encoding, we use pre-defined strings for variable names and integers for the numeric values.
Thus, we use unbounded integers, which is contrary to usual programming languages, but the question of using bounded integers or not is irrelevant for the purpose of this example.

3.1 Encoding the language

In our Coq encoding, the description of the various kinds of syntactic components is given by inductive declarations.

    Require Import String ZArith List.
    Open Scope Z_scope.

    Inductive aexpr : Type :=
      anum (x:Z) | avar (s:string) | aplus (e1 e2:aexpr).

    Inductive bexpr : Type := blt (e1 e2 : aexpr).

    Inductive instr : Type :=
      assign (x:string)(e:aexpr)
    | seq (i1 i2:instr)
    | while (b:bexpr)(i:instr).

The first two lines instruct Coq to load pre-defined libraries and to tune the parsing mechanism so that arithmetic formulas will be understood as formulas concerning integers by default. The definition for aexpr states that expressions can only have the three forms anum, avar, and aplus; it also expresses that the names anum, avar, and aplus can be used as functions of type Z -> aexpr, string -> aexpr, and aexpr -> aexpr -> aexpr, respectively. The definition of aexpr as an inductive type also implies that we can write recursive functions on this type. For instance, we will use the following function to evaluate an arithmetic expression, given a valuation function g, which maps every variable name to an integer value.

    Fixpoint af (g:string->Z)(e:aexpr) : Z :=
      match e with
        anum n => n
      | avar x => g x
      | aplus e1 e2 => af g e1 + af g e2
      end.

This function is defined by pattern-matching. There is one pattern for each possible form of arithmetic expression. The third line indicates that when the input e has the form anum n, then the value n is the result. The fourth line indicates that when the input has the form avar x, then the value is obtained by applying the function g to x. The fifth line describes the computation that is done when the expression is an addition.
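As a small illustration of the evaluation function just described, the following sketch evaluates the expression x + 1. The valuation g41, which maps every variable to 41, is a hypothetical name introduced only for this example; the definitions of aexpr and af are repeated so that the fragment can be checked on its own.

```coq
Require Import String ZArith.
Open Scope Z_scope.

Inductive aexpr : Type :=
  anum (x:Z) | avar (s:string) | aplus (e1 e2:aexpr).

Fixpoint af (g:string->Z)(e:aexpr) : Z :=
  match e with
    anum n => n
  | avar x => g x
  | aplus e1 e2 => af g e1 + af g e2
  end.

(* Hypothetical valuation: every variable holds the value 41. *)
Definition g41 (v : string) : Z := 41.

(* Evaluating x + 1 in this valuation: the avar case looks up x,
   the anum case returns 1, and the aplus case adds the two results. *)
Example af_x_plus_1 :
  af g41 (aplus (avar "x"%string) (anum 1)) = 42.
Proof. reflexivity. Qed.
```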
There are two recursive calls to the function af in the expression returned for the addition pattern. The recursive calls are made on direct subterms of the initial expression; this is known as structural recursion and guarantees that the recursive function will terminate on all inputs. A similar function bf is defined to describe the boolean value of a boolean expression.

3.2 The semantics of the programming language

To describe the semantics of the programming language, we simply give a weakest pre-condition calculus [9]. We describe the conditions that are necessary to ensure that a given logical property is satisfied at the end of the execution of an instruction, when this execution terminates. This weakest pre-condition calculus is defined as a pair of functions whose input is an instruction annotated with logical information at various points in the instruction. The output of the first function, called pc, is a condition that should be satisfied by the variables at the beginning of the execution (this is the pre-condition and it should be as easy to satisfy as possible, hence the adjective weakest); the output of the second function, called vc, is a collection of logical statements. When these statements are valid, we know that every execution starting from a state that satisfies the pre-condition will make the logical annotation satisfied at every point in the program and make the post-condition satisfied if the execution terminates.

Annotating programs. We need to define a new data-type for instructions annotated with assertions at various locations. Each assertion is a quantifier-free logical formula where the variables of the program can occur. The intended meaning is that the formula is guaranteed to hold for every execution of the program that is consistent with the initial assertion.
The syntax for assertions is described as follows:

    Inductive assert : Type :=
      pred (p:string)(l:list aexpr)
    | a_b (b:bexpr)
    | a_conj (a1 a2:assert)
    | a_not (a: assert)
    | a_true
    | a_false.

This definition states that assertions can have six forms: the first form represents the application of a predicate to an arbitrary list of arithmetic expressions; the second represents a boolean test (this assertion holds when the boolean test evaluates to true); the third form is the conjunction of two assertions; the fourth form is the negation of an assertion; the fifth and sixth forms give two constant assertions, which are always and never satisfied, respectively. In a minimal description of a weakest pre-condition calculus, as in [2], the last two constants are not necessary, but they will be useful in our description of the abstract interpreter.

Logical annotations play a central role in our case study, because the result of abstract interpretation will be to add information about each point in the program: this new information will be described by assertions.

To consider whether an assertion holds, we need to know what meaning is attached to each predicate name and what value is attached to each variable name. We suppose the meaning of predicates is given by a function m that maps predicate names and lists of integers to propositional values, and the value of variables is given by a valuation as in the function af given above. Given such a meaning for predicates and such a valuation function for variables, we describe the computation of the property associated to an assertion as follows:

    Fixpoint ia (m:string->list Z->Prop)(g:string->Z)
      (a:assert) : Prop :=
      match a with
        pred s l => m s (map (af g) l)
      | a_b b => bf g b = true
      | a_conj a1 a2 => (ia m g a1) /\ (ia m g a2)
      | a_not a => not (ia m g a)
      | a_true => True
      | a_false => False
      end.
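To see the interpretation function ia at work, the following self-contained sketch interprets the assertion "x < 10 and true" in a valuation where x is 3. Several parts are assumptions made only for illustration: the paper does not show the definition of bf, so the version below (based on the standard library function Z.ltb, available in recent Coq releases) is a plausible reconstruction, and m0 and g3 are hypothetical names.

```coq
Require Import String ZArith List.
Open Scope Z_scope.

Inductive aexpr : Type :=
  anum (x:Z) | avar (s:string) | aplus (e1 e2:aexpr).
Inductive bexpr : Type := blt (e1 e2 : aexpr).
Inductive assert : Type :=
  pred (p:string)(l:list aexpr) | a_b (b:bexpr)
| a_conj (a1 a2:assert) | a_not (a: assert)
| a_true | a_false.

Fixpoint af (g:string->Z)(e:aexpr) : Z :=
  match e with
    anum n => n
  | avar x => g x
  | aplus e1 e2 => af g e1 + af g e2
  end.

(* Assumed definition of bf: the paper only says it is similar to af. *)
Definition bf (g:string->Z)(b:bexpr) : bool :=
  match b with blt e1 e2 => Z.ltb (af g e1) (af g e2) end.

Fixpoint ia (m:string->list Z->Prop)(g:string->Z)
  (a:assert) : Prop :=
  match a with
    pred s l => m s (map (af g) l)
  | a_b b => bf g b = true
  | a_conj a1 a2 => (ia m g a1) /\ (ia m g a2)
  | a_not a => not (ia m g a)
  | a_true => True
  | a_false => False
  end.

(* Hypothetical predicate meaning and valuation. *)
Definition m0 (p : string) (l : list Z) : Prop := True.
Definition g3 (v : string) : Z := 3.

(* The assertion "x < 10 /\ true" holds when x is 3. *)
Example x_lt_10_holds :
  ia m0 g3 (a_conj (a_b (blt (avar "x"%string) (anum 10))) a_true).
Proof. simpl. split; [reflexivity | exact I]. Qed.
```

The result of ia is a proposition, not a boolean: the example is a statement that has to be proved, here by computing the test and using the trivial proof of True.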
The type of this function exhibits a specificity of type theory-based theorem proving: propositions are described by types. The Coq system also provides a type of types, named Prop, whose elements are the types that are intended to be used as propositions. Each of these types contains the proofs of the proposition it represents. This is known as the Curry-Howard isomorphism. For instance, the propositions that are unprovable are represented by empty types. Here, assertions are data, and their interpretations as propositions are types, which belong to the Prop type. More details about this description of propositions as types are given in another article on type theory in the same volume.

Annotated instructions are in a new data-type, named a_instr, which is very close to the instr data-type. The two modifications are as follows: first, an extra operator pre is added to make it possible to attach assertions to any instruction; second, while loops are mandatorily annotated with an invariant assertion. In concrete syntax, we will write { a } i for the instruction i carrying the assertion a (noted pre a i in the Coq encoding).

    Inductive a_instr : Type :=
      pre (a:assert)(i:a_instr)
    | a_assign (x:string)(e:aexpr)
    | a_seq (i1 i2:a_instr)
    | a_while (b:bexpr)(a:assert)(i:a_instr).

Reasoning on assertions. We can reason on annotated programs, because there are logical reasons for programs to be consistent with assertions. The idea is to compute a collection of logical formulas associated to an annotated program and a final logical formula, the post-condition. When this collection of formulas holds, there exists another logical formula, the pre-condition, whose satisfaction before executing the program is enough to guarantee that the post-condition holds after executing the program.
Annotations added to an instruction (with the help of the pre construct) must be understood as formulas that hold just before executing the annotated instruction. Assertions added to while loops must be understood as invariants: they are meant to hold at the beginning and at the end every time the inner part of the while loop is executed.

When assertions are present in the annotated instruction, they are taken for granted. For instance, when the instruction is {x = 3} x := x + 1, the computed pre-condition is x = 3, whatever the post-condition is.

When the instruction is a plain assignment, one can find the pre-condition by substituting the assigned variable with the assigned expression in the post-condition. For instance, when the post-condition is x = 4 and the instruction is the assignment x := x + 1, it suffices that the pre-condition x + 1 = 4 is satisfied before executing the assignment to ensure that the post-condition is satisfied after executing it.

When the annotated instruction is a while loop, the pre-condition simply is the invariant for this while loop. When the annotated instruction is a sequence of two instructions, the pre-condition is the pre-condition computed for the first of the two instructions, but using the pre-condition of the second instruction as the post-condition for the first instruction.

Coq encoding for pre-condition computation. To encode this pre-condition function in Coq, we need to describe functions that perform the substitution of a variable with an arithmetic expression in arithmetic expressions, boolean expressions, and assertions. These substitution functions are given as follows:

    Fixpoint asubst (x:string) (s:aexpr) (e:aexpr) : aexpr :=
      match e with
        anum n => anum n
      | avar x1 => if string_dec x x1 then s else e
      | aplus e1 e2 => aplus (asubst x s e1) (asubst x s e2)
      end.

    Definition bsubst (x:string) (s:aexpr) (b:bexpr) : bexpr :=
      match b with
        blt e1 e2 => blt (asubst x s e1) (asubst x s e2)
      end.

    Fixpoint subst (x:string) (s:aexpr) (a:assert) : assert :=
      match a with
        pred p l => pred p (map (asubst x s) l)
      | a_b b => a_b (bsubst x s b)
      | a_conj a1 a2 => a_conj (subst x s a1) (subst x s a2)
      | a_not a => a_not (subst x s a)
      | any => any
      end.

In the definition of asubst, the function string_dec compares two strings for equality. The value returned by this function can be used in an if-then-else construct, but it is not a boolean value (more detail can be found in [3]). The rest of the code is just a plain traversal of the structure of expressions and assertions. Note also that the last pattern-matching rule in subst is used for both a_true and a_false.

Once we know how to substitute a variable with an expression, we can easily describe the computation of the pre-condition for an annotated instruction and a post-condition. This is given by the following simple recursive procedure:

    Fixpoint pc (i:a_instr) (post : assert) : assert :=
      match i with
        pre a i => a
      | a_assign x e => subst x e post
      | a_seq i1 i2 => pc i1 (pc i2 post)
      | a_while b a i => a
      end.

A verification condition generator. When it receives an instruction carrying an annotation, the function pc simply returns the annotation. In this sense, the pre-condition function takes the annotation for granted. To make sure that an instruction is consistent with its pre-condition, we need to check that the assertion really is strong enough to ensure the post-condition. For instance, when the post-condition is x < 10 and the instruction is the annotated assignment { x = 2 } x := x + 1, satisfying x = 2 before the assignment is enough to ensure that the post-condition is satisfied.
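The behavior of pc on assignments and on annotated instructions can be checked by computation, as in the following self-contained sketch. The definitions are those of the paper, repeated so that the fragment stands alone; the name vx and the two examples are hypothetical and serve only as an illustration. The first example relies on string_dec computing inside Coq, which holds in the standard library versions we are aware of.

```coq
Require Import String ZArith List.
Open Scope Z_scope.

Inductive aexpr : Type :=
  anum (x:Z) | avar (s:string) | aplus (e1 e2:aexpr).
Inductive bexpr : Type := blt (e1 e2 : aexpr).
Inductive assert : Type :=
  pred (p:string)(l:list aexpr) | a_b (b:bexpr)
| a_conj (a1 a2:assert) | a_not (a: assert)
| a_true | a_false.
Inductive a_instr : Type :=
  pre (a:assert)(i:a_instr) | a_assign (x:string)(e:aexpr)
| a_seq (i1 i2:a_instr) | a_while (b:bexpr)(a:assert)(i:a_instr).

Fixpoint asubst (x:string) (s:aexpr) (e:aexpr) : aexpr :=
  match e with
    anum n => anum n
  | avar x1 => if string_dec x x1 then s else e
  | aplus e1 e2 => aplus (asubst x s e1) (asubst x s e2)
  end.

Definition bsubst (x:string) (s:aexpr) (b:bexpr) : bexpr :=
  match b with
    blt e1 e2 => blt (asubst x s e1) (asubst x s e2)
  end.

Fixpoint subst (x:string) (s:aexpr) (a:assert) : assert :=
  match a with
    pred p l => pred p (map (asubst x s) l)
  | a_b b => a_b (bsubst x s b)
  | a_conj a1 a2 => a_conj (subst x s a1) (subst x s a2)
  | a_not a => a_not (subst x s a)
  | any => any
  end.

Fixpoint pc (i:a_instr) (post : assert) : assert :=
  match i with
    pre a i => a
  | a_assign x e => subst x e post
  | a_seq i1 i2 => pc i1 (pc i2 post)
  | a_while b a i => a
  end.

(* Hypothetical shorthand for the variable name "x". *)
Definition vx : string := "x"%string.

(* For x := x + 1 and post-condition x < 10,
   the computed pre-condition is x + 1 < 10. *)
Example pc_assign :
  pc (a_assign vx (aplus (avar vx) (anum 1)))
     (a_b (blt (avar vx) (anum 10)))
  = a_b (blt (aplus (avar vx) (anum 1)) (anum 10)).
Proof. reflexivity. Qed.

(* An annotation is taken for granted, whatever the post-condition. *)
Example pc_pre :
  pc (pre a_true (a_assign vx (anum 0))) a_false = a_true.
Proof. reflexivity. Qed.
```

The second example shows why a separate verification step is needed: pc returns the annotation even when it is clearly too weak for the requested post-condition.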
On the other hand, if the annotated instruction was {x < 10} x := x + 1, there would be a problem because there are cases where x < 10 holds before executing the assignment and x < 10 does not hold after.

In fact, for assignments that are not annotated with assertions, the function pc computes the best formula, the weakest pre-condition. Thus, in presence of an annotation, it suffices to verify that the annotation does imply the weakest pre-condition. We are now going to describe a function that collects all the verifications that need to be done. More precisely, the new function will compute conditions that are sufficient to ensure that the pre-condition from the previous section is strong enough to guarantee that the post-condition holds after executing the program, when the program terminates.

The verification that an annotated instruction is consistent with a post-condition thus returns a sequence of implications between assertions. When all these implications are logically valid, there is a guarantee that satisfying the pre-condition before executing the instruction is enough to ensure that the post-condition will also be satisfied after executing the instruction. This guarantee is proved formally in [2].

When the instruction is a plain assignment without annotation, there is no need to verify any implication because the computed pre-condition is already good enough.

When the instruction is an annotated instruction { A } i and the post-condition is P, we can first compute the pre-condition P′ and a list of implications l for the instruction i and the post-condition P. We then only need to add A ⇒ P′ to l to get the list of conditions for the whole instruction.

For instance, when the post-condition is x = 3 and the instruction is the assignment x := x + 1, the pre-condition computed by pc is x + 1 = 3 and this is obviously good enough for the post-condition to be satisfied.
On the other hand, when the instruction is an annotated instruction, { P } x := x + 1, we need to verify that P ⇒ x + 1 = 3 holds.

If we look again at the first example in this section, concerning an instruction {x < 10} x := x + 1 and a post-condition x < 10, there is a problem, because a value of 9 satisfies the pre-condition, but execution leads to a value of 10, which does not satisfy the post-condition. The condition generator constructs a condition of the form x < 10 ⇒ x + 1 < 10. The fact that this logical formula is actually unprovable relates to the fact that the triple composed of the pre-condition, the assignment, and the post-condition is actually inconsistent.

When the instruction is a sequence of two instructions i1 ; i2 and the post-condition is P, we need to compute lists of conditions for both sub-components i1 and i2. The list of conditions for i2 is computed for the post-condition of the whole construct, P, while the list of conditions for i1 is computed taking as post-condition the pre-condition of i2 for P. This is consistent with the intuitive explanation that it suffices that the pre-condition for an instruction holds to ensure that the post-condition will hold after executing that instruction. If we want P to hold after executing i2, we need the pre-condition of i2 for P to be satisfied and it is the responsibility of the instruction i1 to guarantee this. Thus, the conditions for i1 can be computed with this assertion as a post-condition.

When the instruction is a while loop, of the form while b do { A } i done, we must remember that the assertion A should be an invariant during the loop execution. This is expressed by requiring that A being satisfied before executing i is enough to guarantee that A is also satisfied after executing i.
However, this is needed only in the cases where the loop test b is also satisfied, because when b is not satisfied the inner instruction of the while loop is not executed. At the end of the execution, we can use the information that the invariant A is satisfied and the information that the loop has been exited because the test eventually failed. The program is consistent when these two logical properties are enough to imply the initial post-condition P. Thus, we must first compute the pre-condition A′ for the inner instruction i and the post-condition A, compute the list of conditions for i with A as post-condition, add the condition A ∧ b ⇒ A′, and add the condition A ∧ ¬b ⇒ P.

Coq encoding of the verification condition generator. The verification conditions always are implications. We provide a new data-type for these implications:

    Inductive cond : Type := imp (a1 a2:assert).

The computation of verification conditions is then simply described as a plain recursive function, which follows the structure of annotated instructions.

    Fixpoint vc (i:a_instr)(post : assert) : list cond :=
      match i with
        pre a i => (imp a (pc i post))::vc i post
      | a_assign _ _ => nil
      | a_seq i1 i2 => vc i1 (pc i2 post)++vc i2 post
      | a_while b a i =>
        (imp (a_conj a (a_b b)) (pc i a))::
        (imp (a_conj a (a_not (a_b b))) post)::
        vc i a
      end.

Describing the semantics of a programming language using a verification condition generator is not the only approach that can be used to describe the language. In fact, this approach is partial, because it describes properties of inputs and outputs when instruction execution terminates, but it gives no information about termination. More precise descriptions can be given using operational or denotational semantics and the consistency of this verification condition generator with such a complete semantics can also be verified formally.
This is done in [2], but it is not the purpose of this article.

When reasoning about the correctness of a given annotated instruction, we can use the function vc to obtain a list of conditions. It is then necessary to reason on the validity of this list of conditions. What we want to verify is that the implications hold for every possible instantiation of the program variables. This is described by the following function.

    Fixpoint valid (m:string->list Z->Prop)
      (l:list cond) : Prop :=
      match l with
        nil => True
      | c::tl =>
        (let (a1, a2) := c in forall g, ia m g a1 -> ia m g a2)
        /\ valid m tl
      end.

An annotated program i is consistent with a given post-condition p when the property valid m (vc i p) holds. This means that the post-condition is guaranteed to hold after executing the instruction if the computed pre-condition was satisfied before the execution and the execution of the instruction terminates.

3.3 A monotonicity property

In our study of an abstract interpreter, we will use a property of the condition generator.

Theorem 1. For every annotated instruction i, if p1 and p2 are two post-conditions such that p1 is stronger than p2, if the pre-condition for i and p1 is satisfied and all the verification conditions for i and the post-condition p1 are valid, then the pre-condition for i and p2 is also satisfied and the verification conditions for i and p2 are also valid.

Proof. This proof is done in the context of a given mapping from predicate names to actual predicates, m. The property is proved by induction on the structure of the instruction i. The statement p1 is stronger than p2 when the implication p1 ⇒ p2 is valid. In other words, for every assignment of variables g, the logical value of p1 implies the logical value of p2.
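The notion "p1 is stronger than p2" used in this proof can be written down in Coq as in the following sketch. This is an illustration under stated assumptions and not the formulation used in the actual development: the name stronger and the example are hypothetical, bf is reconstructed with the standard library function Z.ltb, and the proof uses the lia tactic, all of which assume a recent Coq release rather than the versions mentioned in Section 1.2.

```coq
Require Import String ZArith List Lia.
Open Scope Z_scope.

Inductive aexpr : Type :=
  anum (x:Z) | avar (s:string) | aplus (e1 e2:aexpr).
Inductive bexpr : Type := blt (e1 e2 : aexpr).
Inductive assert : Type :=
  pred (p:string)(l:list aexpr) | a_b (b:bexpr)
| a_conj (a1 a2:assert) | a_not (a: assert)
| a_true | a_false.

Fixpoint af (g:string->Z)(e:aexpr) : Z :=
  match e with
    anum n => n
  | avar x => g x
  | aplus e1 e2 => af g e1 + af g e2
  end.

(* Assumed definition of bf, not shown in the paper. *)
Definition bf (g:string->Z)(b:bexpr) : bool :=
  match b with blt e1 e2 => Z.ltb (af g e1) (af g e2) end.

Fixpoint ia (m:string->list Z->Prop)(g:string->Z)
  (a:assert) : Prop :=
  match a with
    pred s l => m s (map (af g) l)
  | a_b b => bf g b = true
  | a_conj a1 a2 => (ia m g a1) /\ (ia m g a2)
  | a_not a => not (ia m g a)
  | a_true => True
  | a_false => False
  end.

(* Hypothetical name: p1 is stronger than p2 when p1 implies p2
   in every valuation g. *)
Definition stronger (m:string->list Z->Prop) (p1 p2 : assert) : Prop :=
  forall g, ia m g p1 -> ia m g p2.

(* The assertion x < 9 is stronger than x < 10. *)
Example lt9_stronger_lt10 : forall m,
  stronger m (a_b (blt (avar "x"%string) (anum 9)))
             (a_b (blt (avar "x"%string) (anum 10))).
Proof.
  intros m. unfold stronger. intros g H.
  (* Expose the boolean tests hidden behind ia and bf. *)
  change ((g "x"%string <? 9) = true) in H.
  change ((g "x"%string <? 10) = true).
  (* Move from boolean tests to propositions on integers. *)
  apply Z.ltb_lt in H. apply Z.ltb_lt. lia.
Qed.
```

The quantification over g in stronger is the same one that appears in the function valid, where each verification condition must hold for every valuation.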
If the instruction is an assignment, we can rely on a lemma: the value of any assertion subst x e p in any valuation g is equal to the value of the assertion p in the valuation g′ that is equal to g on every variable but x, for which it returns the value of e in the valuation g. Thus, the pre-condition for the assignment x := e for pi is subst x e pi and the validity of subst x e p1 ⇒ subst x e p2 simply is an instance of the validity of p1 ⇒ p2, which is given by hypothesis. Also, when the instruction is an assignment, there is no generated verification condition and the second part of the statement holds.

If the instruction is a sequence i1 ; i2, then we know by induction hypothesis that the pre-condition p′1 for i2 and p1 is stronger than the pre-condition p′2 for i2 and p2 and all the verification conditions for that part are valid; we can use an induction hypothesis again to obtain that the pre-condition for i1 and p′1 is stronger than the pre-condition for i1 and p′2, and the corresponding verification conditions are all valid. The last two pre-conditions are the ones we need to compare, and the whole set of verification conditions is the union of the sets which we know are valid.

If the instruction is an annotated instruction { a } i, the two pre-conditions for p2 and p1 are always a, so the first part of the statement trivially holds. Moreover, we know by induction hypothesis that the pre-condition p′1 for i and p1 is stronger than the pre-condition p′2 for i and p2. The verification conditions for the whole instruction and p1 (resp. p2) are the same as for the sub-instruction, with the condition a ⇒ p′1 (resp. a ⇒ p′2) added. By hypothesis, a ⇒ p′1 holds and, by induction hypothesis, p′1 ⇒ p′2 holds; we can thus deduce that a ⇒ p′2 holds.
If the instruction is a loop while b do {a} i done, most verification conditions and generated pre-conditions only depend on the loop invariant. The only thing that we need to check is the verification condition containing the invariant, the negation of the test and the post-condition. By hypothesis, a ∧ ¬b ⇒ p1 and p1 ⇒ p2 are valid. By transitivity of implication we obtain a ∧ ¬b ⇒ p2 easily.

In Coq, we first prove a lemma that expresses that the satisfiability of an assertion a where a variable x is substituted with an arithmetic expression e' for a valuation g is the same as the satisfiability of the assertion a without substitution, but for a valuation that maps x to the value of e' in g and coincides with g for all other variables.

    Lemma subst_sound :
      forall m g a x e',
        i_a m g (subst x e' a) =
        i_a m (fun y => if string_dec x y then af g e' else g y) a.

This lemma requires similar lemmas for arithmetic expressions, boolean expressions, and lists of expressions. All are proved by induction on the structure of expressions.

An example proof for substitution. For instance, the statement for the substitution in arithmetic expressions is as follows:

    Lemma subst_sound_a :
      forall g e x e',
        af g (asubst x e' e) =
        af (fun y => if string_dec x y then af g e' else g y) e.

The proof can be done in Coq by an induction on the expression e. This leads the system to generate three cases, corresponding to the three constructors of the aexpr type. The combined tactic we use is as follows:

    intros g e x e'; induction e; simpl; auto.

The tactic induction e generates three goals and the tactics simpl and auto are applied to all of them. One of the cases is the case for the anum constructor, where both instances of the af function compute to the value carried by the constructor, thus simpl forces the computation and leads to an equality where both sides are equal. In this case, auto solves the goal.
Only the other two goals remain. The first other goal is concerned with the avar construct. In this case the expression has the form avar s and the expression asubst x e' (avar s) is transformed into the following expression by the simpl tactic.

    if string_dec x s then e' else (avar s)

For this case, the system displays a goal that has the following shape:

    g : string -> Z
    s : string
    x : string
    e' : aexpr
    ============================
    af g (if string_dec x s then e' else avar s) =
    (if string_dec x s then af g e' else g s)

In Coq goals, the information that appears above the horizontal bar is data that is known to exist, the information below the horizontal bar is the expression that we need to prove. Here the information that is known only corresponds to typing information.

We need to reason by cases on the value of the expression string_dec x s. The tactic case ... is used for this purpose. It generates two goals, one corresponding to the case where string_dec x s has an affirmative value and one corresponding to the case where string_dec x s has a negative value. In each goal, the if-then-else constructs are reduced accordingly. In the goal where string_dec x s is affirmative, both sides of the equality reduce to af g e'; in the other goal, both sides of the equality reduce to g s. Thus in both cases, the proof becomes easy. This reasoning step is easily expressed with the following combined tactic:

    case (string_dec x s); auto.

There only remains a goal for the last possible form of arithmetic expression, aplus e1 e2. The induction tactic provides induction hypotheses stating that the property we want to prove already holds for e1 and e2. After symbolic computation of the functions af and asubst, as performed by the simpl tactic, the goal has the following shape:

    ...
    IHe1 : af g (asubst x e' e1) =
           af (fun y : string =>
                 if string_dec x y then af g e' else g y) e1
    IHe2 : af g (asubst x e' e2) =
           af (fun y : string =>
                 if string_dec x y then af g e' else g y) e2
    ============================
    af g (asubst x e' e1) + af g (asubst x e' e2) =
    af (fun y : string =>
          if string_dec x y then af g e' else g y) e1 +
    af (fun y : string =>
          if string_dec x y then af g e' else g y) e2

This proof can be finished by rewriting with the two equalities named IHe1 and IHe2 and then recognizing that both sides of the equality are the same, as required by the following tactics.

    rewrite IHe1, IHe2; auto.
    Qed.

We can now turn our attention to the main result, which is then expressed as the following statement:

    Lemma vc_monotonic :
      forall m i p1 p2,
        (forall g, i_a m g p1 -> i_a m g p2) ->
        valid m (vc i p1) ->
        valid m (vc i p2) /\
        (forall g, i_a m g (pc i p1) -> i_a m g (pc i p2)).

To express that this proof is done by induction on the structure of instructions, the first tactic sent to the proof system has the form:

    intros m; induction i; intros p1 p2 p1p2 vc1.

The proof then has four cases, which are solved in about 10 lines of proof script.

4 A first simple abstract interpreter

We shall now define two abstract interpreters, which run instructions symbolically, updating an abstract state at each step. The abstract state is then transformed into a logical expression which is added to the instructions, thus producing an annotated instruction. The abstract state is also returned at the end of execution, in one of two forms. In the first simple abstract interpreter, the final abstract state is simply returned. In the second abstract interpreter, only an optional abstract state will be returned, a None value being used when the abstract interpreter can detect that the program can never terminate: the second abstract interpreter will also perform dead code detection.
For example, if we give our abstract interpreter an input state stating that x is even and y is odd and the instruction x := x+y; y := y+1, the resulting value will be:

    ({even x /\ odd y} x:=x+y; {odd x /\ odd y} y:= y+1,
     (x, odd)::(y,even)::nil)

We suppose there exists a data-type A whose elements will represent abstract values on which instructions are supposed to compute. For instance, the data-type A could be the type containing three values even, odd, and top. Another traditional example of abstract data-type is the type of intervals, that are either of the form [m, n], with m ≤ n, [−∞, n], [m, +∞], or [−∞, +∞].

The data-type of abstract values should come with a few elements and functions, which we will describe progressively.

4.1 Using Galois connections

Abstract values represent specific sets of concrete values. There is a natural order on sets: set inclusion. Similarly, we can consider an order on abstract values, which mimics the order between the sets they represent. The traditional approach to describe this correspondence between the order on sets of values and the order on abstract values is to consider that the type of abstract values is given with a pair of functions α and γ, where α : P(Z) → A and γ : A → P(Z). The function γ maps any abstract value to the set of concrete values it represents. The function α maps any set of concrete values to the smallest abstract value whose interpretation as a set contains the input. Written in a mathematical formula where ⊑ denotes the order on abstract values, the two functions and the orders on sets of concrete values and on abstract values are related by the following statement:

    ∀a ∈ A, ∀b ∈ P(Z). b ⊂ γ(a) ⇔ α(b) ⊑ a.

When the functions α and γ are given with this property, one says that there is a Galois connection.
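As an illustration only, the function γ and the order ⊑ can be written down for a three-valued parity domain. The names parity, p_even, p_odd, p_top, gamma and p_le below are hypothetical and do not come from the development described in this paper; the sketch assumes a recent Coq standard library. It encodes sets of integers as predicates and checks one half of the Galois property, namely that ⊑ is reflected by set inclusion. The function α is not defined here, since it would take an arbitrary set as input.

```coq
Require Import ZArith.
Open Scope Z_scope.

(* Hypothetical parity domain, used only for this illustration. *)
Inductive parity : Type := p_even | p_odd | p_top.

(* gamma: the set of integers represented by an abstract value,
   encoded as a predicate on Z. *)
Definition gamma (a : parity) (z : Z) : Prop :=
  match a with
  | p_even => Z.even z = true
  | p_odd => Z.even z = false
  | p_top => True
  end.

(* The order on abstract values: a value is below itself and below p_top. *)
Definition p_le (a b : parity) : Prop := a = b \/ b = p_top.

(* If a is below b, then the set represented by a is included in
   the set represented by b. *)
Lemma gamma_monotone :
  forall a b, p_le a b -> forall z, gamma a z -> gamma b z.
Proof.
  intros a b H z Hz.
  unfold p_le in H.
  destruct H as [Hab | Hb].
  - rewrite <- Hab. exact Hz.
  - rewrite Hb. exact I.
Qed.
```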
In our study of abstract interpretation, the functions α and γ do not appear explicitly. In a sense, γ will be represented by a function to_pred mapping abstract values to assertions depending on arithmetic expressions. However, it is useful to keep these functions in mind when trying to figure out what properties are expected for the various components of our abstract interpreters, as we will see in the next section.

4.2 Abstract evaluation of arithmetic expressions

Arithmetic expressions contain integer constants and additions, neither of which is concerned with the data-type of abstract values. To be able to associate an abstract value to an arithmetic expression, we need to find ways to establish a correspondence between concrete values and abstract values. This is done by supposing the existence of two functions and a constant, which are the first three values axiomatized for the data-type of abstract values (but there will be more later):

- from_Z : Z -> A, this is used to associate a relevant abstract value to any concrete value,
- a_add : A -> A -> A, this is used to add two abstract values,
- top : A, this is used to represent the abstract value that carries no information.

In terms of Galois connections, the function from_Z corresponds to the function α, when applied to singletons. The function a_add must be designed in such a way that the following property is satisfied:

    ∀ v1 v2, { x + y | x ∈ γ(v1), y ∈ γ(v2) } ⊂ γ(a_add v1 v2).

With this constraint, a function that maps any pair of abstract values to top would be acceptable, however it would be useless. It is better if a_add v1 v2 is the least abstract value for which the above property is satisfied. The value top is the maximal element of A, the image of the whole Z by the function α.
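To make the three axiomatized values concrete, here is a minimal sketch of what they could look like for intervals. It is an assumption-laden illustration rather than the instance used in this paper: the names interval, i_top, i_from_Z, add_bound and i_add are hypothetical, and an infinite bound is encoded as None.

```coq
Require Import ZArith.
Open Scope Z_scope.

(* Hypothetical encoding: (lower, upper), where None is an infinite bound. *)
Definition interval := (option Z * option Z)%type.

(* The value that carries no information: [-oo, +oo]. *)
Definition i_top : interval := (None, None).

(* A concrete integer n is abstracted by the singleton interval [n, n]. *)
Definition i_from_Z (n : Z) : interval := (Some n, Some n).

(* Adding two bounds: the result is infinite as soon as one bound is. *)
Definition add_bound (b1 b2 : option Z) : option Z :=
  match b1, b2 with
  | Some x, Some y => Some (x + y)
  | _, _ => None
  end.

(* Abstract addition adds lower bounds together and upper bounds together. *)
Definition i_add (v1 v2 : interval) : interval :=
  let (l1, u1) := v1 in
  let (l2, u2) := v2 in
  (add_bound l1 l2, add_bound u1 u2).

Example add_0_9_and_1 :
  i_add (Some 0, Some 9) (i_from_Z 1) = (Some 1, Some 10).
Proof. reflexivity. Qed.

Example add_with_top : i_add (Some 0, Some 9) i_top = i_top.
Proof. reflexivity. Qed.
```

The first example mirrors the introductory program: if x is in [0,9] before x := x + 1, it is in [1,10] afterwards.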
4.3 Handling abstract states

When computing the value of a variable, we suppose that this value is given by looking up in a state, which actually is a list of pairs of variables and abstract values.

    Definition state := list(string*A).

    Fixpoint lookup (s:state) (x:string) : A :=
      match s with
        nil => top
      | (y,v)::tl => if string_dec x y then v else lookup tl x
      end.

As we see in the definition of lookup, when a value is not defined in a state, the function behaves as if it was defined with top as abstract value. The computation of abstract values for arithmetic expressions is then described by the following function.

    Fixpoint a_af (s:state)(e:aexpr) : A :=
      match e with
        avar x => lookup s x
      | anum n => from_Z n
      | aplus e1 e2 => a_add (a_af s e1) (a_af s e2)
      end.

When executing assignments abstractly, we are also supposed to modify the state. If the state contained no previous information about the assigned variable, a new pair is created. Otherwise, the first existing pair must be updated. This is done with the following function.

    Fixpoint a_upd(x:string)(v:A)(l:state) : state :=
      match l with
        nil => (x,v)::nil
      | (y,v')::tl =>
        if string_dec x y then (y, v)::tl else (y,v')::a_upd x v tl
      end.

Later in this paper, we define a function that generates assertions from states. For this purpose, it is better to update by modifying existing pairs of a variable and a value rather than just inserting the new pair in front.

4.4 The interpreter's main function

When computing abstract interpretation on instructions we want to produce a final abstract state and an annotated instruction. We will need a way to transform an abstract value into an assertion. This is given by a function with the following type:

- to_pred : A -> aexpr -> assert, this is used to express that the value of the arithmetic expression in a given valuation will belong to the set of concrete values represented by the given abstract value.
So to_pred is axiomatized in the same sense as from_Z, a_add, top. Relying on the existence of to_pred, we can define a function that maps states to assertions:

    Fixpoint s_to_a (s:state) : assert :=
      match s with
        nil => a_true
      | (x,a)::tl => a_conj (to_pred a (avar x)) (s_to_a tl)
      end.

This function is implemented in such a way that all pairs present in the state are transformed into assertions. For this reason, it is important that a_upd works by modifying existing pairs rather than hiding them.

Our first simple abstract interpreter only implements a trivial behavior for while loops. Basically, this says that no information can be gathered for while loops: the resulting state is nil, and the loop invariant is the assertion generated from nil, that is, a_true.

    Fixpoint ab1 (i:instr)(s:state) : a_instr*state :=
      match i with
        assign x e =>
        (pre (s_to_a s) (a_assign x e), a_upd x (a_af s e) s)
      | seq i1 i2 =>
        let (a_i1, s1) := ab1 i1 s in
        let (a_i2, s2) := ab1 i2 s1 in
          (a_seq a_i1 a_i2, s2)
      | while b i =>
        let (a_i, _) := ab1 i nil in
          (a_while b (s_to_a nil) a_i, nil)
      end.

In this function, we see that the abstract interpretation of sequences is simply described as composing the effect on states and recombining the instructions obtained from each component of the sequence.

4.5 Expected properties for abstract values

To prove the correctness of the abstract interpreter, we need to know that the various functions and values provided around the type A satisfy a collection of properties. These are gathered as a set of hypotheses.

One value that we have not talked about yet is the mapping from predicate names to actual predicates on integers, which is necessary to interpret the assertions generated by to_pred. This is given axiomatically, like top and the others:

- m : string -> list Z -> Prop, maps all predicate names used in to_pred to actual predicates on integers.

The first hypothesis expresses that top brings no information.
    Hypothesis top_sem : forall e, (to_pred top e) = a_true.

The next two hypotheses express that the predicates associated to each abstract value are parametric with respect to the arithmetic expression they receive. Their truth does not depend on the exact shape of the expressions, but only on the concrete value such an arithmetic expression may take in the current valuation. Similarly, substitution basically affects the arithmetic expression part of the predicate, not the part that depends on the abstract value.

    Hypothesis to_pred_sem :
      forall g v e, i_a m g (to_pred v e) =
                    i_a m g (to_pred v (anum (af g e))).

    Hypothesis subst_to_pred :
      forall v x e e', subst x e' (to_pred v e) =
                       to_pred v (asubst x e' e).

For instance, if the abstract values are intervals, it is natural that the to_pred function will map an abstract value [3,10] and an arithmetic expression e to an assertion between(3, e, 10). When evaluating this assertion with respect to a given valuation g, the integers 3 and 10 will not be affected by g. Similarly, substitution will not affect these integers.

The last two hypotheses express that the interpretation of the associated predicates for abstract values obtained through from_Z and a_add are consistent with the concrete values computed for immediate integers and additions. The hypothesis from_Z_sem actually establishes the correspondence between from_Z and the abstraction function α of a Galois connection. The hypothesis a_add_sem expresses the condition which we described informally when introducing the function a_add.

    Hypothesis from_Z_sem :
      forall g x, i_a m g (to_pred (from_Z x) (anum x)).

    Hypothesis a_add_sem :
      forall g v1 v2 x1 x2,
        i_a m g (to_pred v1 (anum x1)) ->
        i_a m g (to_pred v2 (anum x2)) ->
        i_a m g (to_pred (a_add v1 v2) (anum (x1+x2))).
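The last two hypotheses can be checked on a small instance. The sketch below is a simplification made for illustration: instead of going through assertions, to_pred and i_a, it uses a direct semantic predicate gamma on integers. The names parity, p_from_Z, p_add and gamma are hypothetical, and the proof relies on the standard library lemma Z.even_add.

```coq
Require Import ZArith Bool.
Open Scope Z_scope.

(* Hypothetical parity domain, used only for this illustration. *)
Inductive parity : Type := p_even | p_odd | p_top.

Definition p_from_Z (n : Z) : parity :=
  if Z.even n then p_even else p_odd.

Definition p_add (v1 v2 : parity) : parity :=
  match v1, v2 with
  | p_even, p_even => p_even
  | p_odd, p_odd => p_even
  | p_even, p_odd => p_odd
  | p_odd, p_even => p_odd
  | _, _ => p_top
  end.

(* Direct semantic reading of an abstract value, standing in for
   the interpretation of (to_pred v (anum z)). *)
Definition gamma (a : parity) (z : Z) : Prop :=
  match a with
  | p_even => Z.even z = true
  | p_odd => Z.even z = false
  | p_top => True
  end.

(* Analogue of from_Z_sem. *)
Lemma p_from_Z_sem : forall x, gamma (p_from_Z x) x.
Proof.
  intros x. unfold p_from_Z.
  destruct (Z.even x) eqn:E; simpl; exact E.
Qed.

(* Analogue of a_add_sem. *)
Lemma p_add_sem :
  forall v1 v2 x1 x2,
    gamma v1 x1 -> gamma v2 x2 -> gamma (p_add v1 v2) (x1 + x2).
Proof.
  intros v1 v2 x1 x2 H1 H2.
  destruct v1; destruct v2; simpl in *; try exact I;
    rewrite Z.even_add, H1, H2; reflexivity.
Qed.
```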
4.6 Avoiding duplicates in states

The way s_to_a and a_upd are defined is not consistent: s_to_a maps every pair occurring in a state to an assertion fragment, while a_upd only modifies the first pair occurring in the state. For instance, when the abstract interpretation computes with intervals, s is ("x",[1,1])::("x",[1,1])::nil, and the instruction is x := x + 1, the resulting state is ("x",[2,2])::("x",[1,1])::nil and the resulting annotated instruction is {1 ≤ x ≤ 1 ∧ 1 ≤ x ≤ 1} x := x+1. The post-condition corresponding to the resulting state is 2 ≤ x ≤ 2 ∧ 1 ≤ x ≤ 1. It is contradictory and cannot be satisfied when executing from valuations satisfying the pre-condition, which is not contradictory.

To cope with this difficulty, we need to express that the abstract interpreter works correctly only with states that contain no duplicates. We formalize this with a predicate consistent, which is defined as follows:

    Fixpoint mem (s:string)(l:list string): bool :=
      match l with
        nil => false
      | x::l => if string_dec x s then true else mem s l
      end.

    Fixpoint no_dups (s:state)(l:list string) :bool :=
      match s with
        nil => true
      | (s,_)::tl => if mem s l then false else no_dups tl (s::l)
      end.

    Definition consistent (s:state) := no_dups s nil = true.

The function no_dups actually returns true when the state s contains no duplicates and no element from the exclusion list l. We prove, by induction on the structure of s, that updating a state that satisfies no_dups for an exclusion list l, using a_upd for a variable x outside the exclusion list, returns a new state that still satisfies no_dups for l. The statement is as follows:

    Lemma no_dups_update :
      forall s l x v, mem x l = false ->
        no_dups s l = true -> no_dups (a_upd x v s) l = true.
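The interval example above can be replayed by computation. In the following sketch, abstract values are hypothetically taken to be pairs of integers standing for bounded intervals (the definition A below is an assumption made for the illustration), while a_upd, mem and no_dups follow the definitions of this section, with pattern variables renamed to avoid shadowing.

```coq
Require Import String ZArith List.
Open Scope string_scope.
Open Scope Z_scope.

(* Hypothetical abstract values: bounded intervals (lower, upper). *)
Definition A := (Z * Z)%type.
Definition state := list (string * A).

Fixpoint a_upd (x : string) (v : A) (l : state) : state :=
  match l with
  | nil => (x, v) :: nil
  | (y, v') :: tl =>
      if string_dec x y then (y, v) :: tl else (y, v') :: a_upd x v tl
  end.

Fixpoint mem (s : string) (l : list string) : bool :=
  match l with
  | nil => false
  | y :: tl => if string_dec y s then true else mem s tl
  end.

Fixpoint no_dups (s : state) (l : list string) : bool :=
  match s with
  | nil => true
  | (x, _) :: tl => if mem x l then false else no_dups tl (x :: l)
  end.

(* The state with a duplicate binding for "x". *)
Definition dup : state := ("x", (1, 1)) :: ("x", (1, 1)) :: nil.

(* Updating only touches the first binding; the second one is stale. *)
Example upd_dup :
  a_upd "x" (2, 2) dup = ("x", (2, 2)) :: ("x", (1, 1)) :: nil.
Proof. reflexivity. Qed.

(* The duplicate state is rejected by the check. *)
Example dup_rejected : no_dups dup nil = false.
Proof. reflexivity. Qed.

(* A state with one binding per variable is accepted. *)
Example ok_state :
  no_dups (("x", (1, 1)) :: ("y", (0, 5)) :: nil) nil = true.
Proof. reflexivity. Qed.
```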
The proof of no_dups_update is done by induction on s, making sure that the property that is established for every s is universally quantified over l: the induction hypothesis is actually used for a different value of the exclusion list.

The corollary from this lemma corresponding to the case where l is instantiated with the empty list expresses that a_upd preserves the consistent property.

    Lemma consistent_update :
      forall s x v, consistent s -> consistent (a_upd x v s).

4.7 Proving the correctness of this interpreter

When the interpreter runs on an instruction i and a state s and returns an annotated instruction i′ and a new state s′, the correctness of the run is expressed with three properties:

- The assertion s_to_a s is stronger than the pre-condition pc i′ (s_to_a s′),
- All the verification conditions in vc i′ (s_to_a s′) are valid,
- The annotated instruction i′ is an annotated version of the input i.

In the next few sections, we will prove that all runs of the abstract interpreter are correct.

4.8 Soundness of abstract evaluation for expressions

When an expression e evaluates abstractly to an abstract value a and concretely to an integer z, z should satisfy the predicate associated to the value a. Of course, the evaluation of e can only be done using a valuation that takes care of providing values for all variables occurring in e. This valuation must be consistent with the abstract state that is used for the abstract evaluation leading to a. The fact that a valuation is consistent with an abstract state is simply expressed by saying that the interpretation of the corresponding assertion for this valuation has to hold. Thus, the soundness of abstract evaluation is expressed with a lemma that has the following shape:

    Lemma a_af_sound :
      forall s g e, i_a m g (s_to_a s) ->
        i_a m g (to_pred (a_af s e) (anum (af g e))).

This lemma is proved by induction on the expression e.
The case where e is a number is a direct application of the hypothesis from_Z_sem, the case where e is an addition is a consequence of a_add_sem, combined with induction hypotheses. The case where e is a variable relies on another lemma:

    Lemma lookup_sem :
      forall s g, i_a m g (s_to_a s) ->
        forall x, i_a m g (to_pred (lookup s x) (anum (g x))).

This other lemma is proved by induction on s. In the base case, s is empty, lookup s x is top, and the hypothesis top_sem makes it possible to conclude; in the step case, if s is (y,v)::s' then the hypothesis i_a m g (s_to_a s) reduces to

    i_a m g (to_pred v (avar y)) /\ i_a m g (s_to_a s')

We reason by cases on whether x is y or not. If x is equal to y then the interpretation of to_pred v (avar y) is the same as the interpretation of to_pred v (anum (g x)) according to to_pred_sem, and lookup s x is the same as v by definition of lookup; this is enough to conclude this case. If x and y are different, we use the induction hypothesis on s'.

4.9 Soundness of update

In the weakest pre-condition calculus, assignments of the form x := e are taken care of by substituting all occurrences of the assigned variable x with the arithmetic expression e in the post-condition to obtain the weakest pre-condition. In the abstract interpreter, assignment is taken care of by updating the first instance of the variable in the state. There is a discrepancy between the two approaches, where the first approach acts on all instances of the variable and the second approach acts only on the first one. This discrepancy is resolved in the conditions of our experiment, where we work with abstract states that contain only one binding for each variable: in this case, updating the first binding of a variable is the same as updating all of its bindings. We express this with the following lemmas:

    Lemma subst_no_occur :
      forall s x l e,
        no_dups s (x::l) = true ->
        subst x e (s_to_a s) = (s_to_a s).
    Lemma subst_consistent :
      forall s g v x e,
        consistent s ->
        i_a m g (s_to_a s) ->
        i_a m g (to_pred v (anum (af g e))) ->
        i_a m g (subst x e (s_to_a (a_upd x v s))).

Both lemmas are proved by induction on s and the second one uses the first in the case where the substituted variable x is the first variable occurring in s. This proof also relies on the hypothesis subst_to_pred.

4.10 Relating input abstract states and pre-conditions

For the correctness proof we consider runs starting from an instruction i and an initial abstract state s and obtaining an annotated instruction i' and a final abstract state s'. We are then concerned with the verification conditions and the pre-condition generated for the post-condition corresponding to s' and the annotated instruction i'. The pre-condition we obtain is either the assertion corresponding to s or the assertion a_true, when the first sub-instruction in i is a while loop. In all cases, the assertion corresponding to s is stronger than the pre-condition. This is expressed with the following lemma, which is easily proved by induction on i.

    Lemma ab1_pc :
      forall i i' s s', ab1 i s = (i', s') ->
        forall g a, i_a m g (s_to_a s) -> i_a m g (pc i' a).

This lemma is actually stronger than needed, because the post-condition used for computing the pre-condition does not matter, since the resulting annotated instruction is heavily annotated with assertions and the pre-condition always comes from one of the annotations.

4.11 Validity of generated conditions

The main correctness statement only concerns states that satisfy the consistent predicate, that is, states that contain at most one entry for each variable. The statement is proved by induction on instructions. As is often the case, what we prove by induction is a stronger statement; such a stronger statement also means stronger induction hypotheses. Here we add the information that the resulting state is also consistent.

Theorem 2.
If s is a consistent state and running the abstract interpreter ab1 on i from s returns a new annotated instruction i′ and a final state s′, then all the verification conditions generated for i′ and the post-condition associated to s′ are valid. Moreover, the state s′ is consistent.

The Coq encoding of this theorem is as follows:

    Theorem ab1_correct :
      forall i i' s s', consistent s -> ab1 i s = (i', s') ->
        valid m (vc i' (s_to_a s')) /\ consistent s'.

This statement is proved by induction on i. Three cases arise, corresponding to the three instructions in the language.

1. When i is an assignment x := e, this is the base case. ab1 i s computes to

    (pre (s_to_a s) (a_assign x e), a_upd x (a_af s e) s)

From the lemma a_af_sound we obtain that the concrete value of e in any valuation g that satisfies i_a m g (s_to_a s) satisfies the following property:

    i_a m g (to_pred (a_af s e) (anum (af g e)))

The lemma subst_consistent can then be used to obtain the validity of the following condition.

    imp (s_to_a s) (subst x e (s_to_a (a_upd x (a_af s e) s)))

This is the single verification condition generated for this instruction. The second part is taken care of by consistent_update.

2. When the instruction i is a sequence seq i1 i2, the abstract interpreter first processes i1 with the state s as input to obtain an annotated instruction a_i1 and an output state s1, it then processes i2 with s1 as input to obtain an annotated instruction a_i2 and a state s2. The state s2 is used as the output state for the whole instruction. We then need to verify that the conditions generated for a_seq a_i1 a_i2 using s_to_a s2 as post-condition are valid and s2 satisfies the consistent property. The conditions can be split in two parts. The second part is vc a_i2 (s_to_a s2). The validity of these conditions is a direct consequence of the induction hypotheses. The first part is vc a_i1 (pc a_i2 (s_to_a s2)).
This is not a direct consequence of the induction hypothesis, which only states vc a_i1 (s_to_a s1). However, the lemma ab1_pc applied on a_i2 states that s_to_a s1 is stronger than pc a_i2 (s_to_a s2) and the lemma vc_monotonic makes it possible to conclude. With respect to the consistent property, it is recursively transmitted from s to s1 and from s1 to s2.

3. When the instruction is a while loop, the body of the loop is recursively processed with the nil state, which is always satisfied. Thus, the verification conditions all have a_true as conclusion, which is trivially true. Also, the nil state trivially satisfies the consistent property.

4.12 The annotated instruction

We also need to prove that the produced annotated instruction really is an annotated version of the initial instruction. To state this new lemma, we first define a simple function that forgets the annotations in an annotated instruction:

    Fixpoint cleanup (i: a_instr) : instr :=
      match i with
        pre a i => cleanup i
      | a_assign x e => assign x e
      | a_seq i1 i2 => seq (cleanup i1) (cleanup i2)
      | a_while b a i => while b (cleanup i)
      end.

We then prove a simple lemma about the abstract interpreter and this function.

    Theorem ab1_clean :
      forall i i' s s', ab1 i s = (i', s') -> cleanup i' = i.

The proof of this lemma is done by induction on the structure of i.

4.13 Instantiating the simple abstract interpreter

We can instantiate this simple abstract interpreter on a data-type of odd-even values, using the following inductive type and functions:

    Inductive oe : Type := even | odd | oe_top.

    Definition oe_from_Z (n:Z) : oe :=
      if Z_eq_dec (Zmod n 2) 0 then even else odd.

    Definition oe_add (v1 v2:oe) : oe :=
      match v1,v2 with
        odd, odd => even
      | even, even => even
      | odd, even => odd
      | even, odd => odd
      | _, _ => oe_top
      end.
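These definitions can be exercised on their own. The sketch below repeats them in a self-contained form, with one adaptation that is our assumption: it uses the names Z.eq_dec and Z.modulo of recent versions of the standard library in place of Z_eq_dec and Zmod. The examples are hypothetical test values, checked by computation.

```coq
Require Import ZArith.
Open Scope Z_scope.

Inductive oe : Type := even | odd | oe_top.

(* Same definition as above, with current standard library names. *)
Definition oe_from_Z (n : Z) : oe :=
  if Z.eq_dec (Z.modulo n 2) 0 then even else odd.

Definition oe_add (v1 v2 : oe) : oe :=
  match v1, v2 with
  | odd, odd => even
  | even, even => even
  | odd, even => odd
  | even, odd => odd
  | _, _ => oe_top
  end.

Example four_is_even : oe_from_Z 4 = even.
Proof. reflexivity. Qed.

(* Z.modulo takes the sign of the divisor, so negative odd numbers
   are classified as odd. *)
Example minus_three_is_odd : oe_from_Z (-3) = odd.
Proof. reflexivity. Qed.

(* Abstract addition agrees with the abstraction of the concrete sum. *)
Example add_agrees :
  oe_add (oe_from_Z 3) (oe_from_Z 5) = oe_from_Z (3 + 5).
Proof. reflexivity. Qed.

(* Adding the uninformative value yields the uninformative value. *)
Example add_top : oe_add odd oe_top = oe_top.
Proof. reflexivity. Qed.
```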
The abstract values can then be mapped into assertions in the obvious way using a function oe_to_pred which we do not describe here for the sake of conciseness. Running this simple interpreter on a small example, representing the program x := x + y; y := y + 1 for the state ("x", even)::("y", odd)::nil, is represented by the following dialog:

    Definition ab1oe := ab1 oe oe_from_Z oe_top oe_add oe_to_pred.

    Eval vm_compute in
      ab1oe (seq (assign "x" (aplus (avar "x") (avar "y")))
                 (assign "y" (aplus (avar "y") (anum 1))))
            (("x",even)::("y",odd)::nil).
      = (a_seq
           (pre
              (a_conj (pred "even" (avar "x" :: nil))
                 (a_conj (pred "odd" (avar "y" :: nil)) a_true))
              (a_assign "x" (aplus (avar "x") (avar "y"))))
           (pre
              (a_conj (pred "odd" (avar "x" :: nil))
                 (a_conj (pred "odd" (avar "y" :: nil)) a_true))
              (a_assign "y" (aplus (avar "y") (anum 1)))),
        ("x", odd) :: ("y", even) :: nil)
      : a_instr * state oe

5 A stronger interpreter

More precise results can be obtained for while loops. For each loop we need to find a state whose interpretation as an assertion will be an acceptable invariant for the loop. We want this invariant to take into account any information that can be extracted from the boolean test in the loop: when entering inside the loop, we know that the test succeeded; when exiting the loop we know that the test failed. It turns out that this information can help us detect cases where the body of a loop is never executed and cases where a loop can never terminate. To describe non-termination, we change the type of values returned by the abstract interpreter: instead of returning an annotated instruction and a state, our new abstract interpreter returns an annotated instruction and an optional state: the optional value is None when we have detected that execution cannot terminate.
This detection of guaranteed non-termination is conservative: when the analyser cannot guarantee that an instruction loops, it returns a state as usual. The presence of optional states will slightly complicate the structure of our static analysis.

We assume the existence of two new functions for this purpose.

- learn_from_success : state -> bexpr -> option state, this is used to encode the information learned when the test succeeded. For instance if the environment initially contains an interval [0,10] for the variable x and the test is x < 6, then we can return the environment so that the value for x becomes [0,5]. Sometimes, the initial environment is such that the test can never be satisfied, in this case a value None is returned instead of an environment.
- learn_from_failure : state -> bexpr -> option state, this is used to compute information about a state knowing that a test failed.

The body of a while loop is often meant to be run several times. In abstract interpretation, this is also true. At every run, the information about each variable at each location of the instruction needs to be updated to take into account more and more concrete values that may be reached at this location. In traditional approaches to abstract interpretation, a binary operation is applied at each location, to combine the information previously known at this location and the new values discovered in the current run. This is modeled by a binary operation.

- join : A -> A -> A, this function takes two abstract values and returns a new abstract value whose interpretation as a set is larger than the two inputs.

The theoretical description of abstract interpretation insists that the set A, together with the values join and top, should constitute an upper semi-lattice. In fact, we will use only part of the properties of such a structure in our proofs about the abstract interpreter.
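To fix ideas, here is a deliberately simplified sketch of these operations for intervals. It is not the interface assumed above: instead of a whole state and a bexpr, the hypothetical functions learn_lt and learn_ge refine a single bounded interval for a variable x with respect to a test x < n, where n is an integer constant, and i_join is a possible join. All names in this block are assumptions made for the illustration.

```coq
Require Import ZArith.
Open Scope Z_scope.

(* Hypothetical abstract values: bounded intervals (lower, upper). *)
Definition interval := (Z * Z)%type.

(* Information learned when the test "x < n" succeeded.
   None means that the test can never succeed. *)
Definition learn_lt (v : interval) (n : Z) : option interval :=
  let (l, u) := v in
  if l <? n then Some (l, Z.min u (n - 1)) else None.

(* Information learned when the test "x < n" failed, that is, x >= n.
   None means that the test can never fail. *)
Definition learn_ge (v : interval) (n : Z) : option interval :=
  let (l, u) := v in
  if n <=? u then Some (Z.max l n, u) else None.

(* A join: the smallest interval containing both inputs. *)
Definition i_join (v1 v2 : interval) : interval :=
  let (l1, u1) := v1 in
  let (l2, u2) := v2 in
  (Z.min l1 l2, Z.max u1 u2).

(* The example from the text: [0,10] and the test x < 6 give [0,5]. *)
Example success_0_10 : learn_lt (0, 10) 6 = Some (0, 5).
Proof. reflexivity. Qed.

(* With x in [10,10], the test x < 10 can never be satisfied. *)
Example success_impossible : learn_lt (10, 10) 10 = None.
Proof. reflexivity. Qed.

(* At loop exit in the introductory example: [0,10] and a failed
   test x < 10 give [10,10]. *)
Example failure_0_10 : learn_ge (0, 10) 10 = Some (10, 10).
Proof. reflexivity. Qed.

Example join_0_and_1 : i_join (0, 0) (1, 1) = (0, 1).
Proof. reflexivity. Qed.
```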
When the funtions learn_from_su ess and learn_from_failu re return a None v alue, w e atually detet that some o de will nev er b e exeuted. F or instane, if learn_from_sue ss returns None , w e an kno w that the test at the en try of a lo op will nev er b e satised and w e an onlude that the b o dy of the lo op is not exeuted. In this ondition, w e an mark this lo op b o dy with a false assertion. W e pro vide a funtion for this purp ose: Fixpoint mark (i:instr) : a_instr := math i with assign x e => pre a_false (a_assign x e) | seq i1 i2 => a_seq (mark i1) (mark i2) | while b i => a_while b a_false (mark i) end. Beause it marks almost ev ery instrution, this funtion mak es it easy to reog- nize at rst glane the fragmen ts of o de that are dead o de. A more ligh t w eigh t approa h ould b e to mark only the sub-instrutions for whi h an annotation is mandatory: while lo ops. 5.1 Main struture of in v arian t sear h In general, nding the most preise in v arian t for a while lo op is an undeidable problem. Here w e are desribing a stati analysis to ol. W e will trade preiseness for guaran teed termination. The approa h w e will desrib e will b e as follo ws: 1. Run the b o dy of the lo op abstratly for a few times, progressiv ely widening the sets of v alues for ea h v ariable at ea h run. If this pro ess stabilizes, w e ha v e rea hed an in v arian t, 2. If no in v arian t w as rea hed, try taking o v er-appro ximations of the v alues for some v ariables and run again the lo op for a few times. This pro ess ma y also rea h an in v arian t, 3. If no in v arian t w as rea hed b y progressiv e widening, pi k an abstrat state that is guaran teed to b e an in v arian t (as w e did for the rst simple in ter- preter: tak e the top state that giv es no information ab out an y v ariable), 4. 
     Invariants that were obtained by over-approximation can then be improved by a narrowing process: when run through the loop again, even if no information about the state is given at the beginning of the loop, we may still be able to gather some information at the end of executing the loop. The state that gathers the information at the end of the loop and the information before entering the loop is most likely to be an invariant, which is more precise (narrower) than the top state. Again this process may be run several times.

We shall now review the operations involved in each of these steps.

5.2 Joining states together

Abstract states are finite lists of pairs of variable names and abstract values. When a variable does not occur in a state, the associated abstract value is top. When joining two states together, every variable that does not occur in one of the two states should receive the top value, and every variable that occurs in both states should receive the join of the two values found in each state. We describe this by writing a function that studies all the variables that occur in one of the lists: it is guaranteed to perform the right behavior for all the variables in both lists, it naturally associates the top value to the variables that do not occur in the first list (because no pair is added for these variables), and it naturally associates the top value to the variables that do not occur in the second list, because top is the value found in the second list and join preserves top.

    Fixpoint join_state (s1 s2:state) : state :=
      match s1 with
        nil => nil
      | (x,v)::tl => a_upd x (join v (lookup s2 x)) (join_state tl s2)
      end.

Because we sometimes detect that some instruction will not be executed, we occasionally have to consider situations where we are not given a state after executing a while loop.
In this case, we have to combine together a state and the absence of a state. But because the absence of state corresponds to a false assertion, the other state is enough to describe the required invariant. We encode this in an auxiliary function.

    Definition join_state' (s: state)(s':option state) : state :=
      match s' with
        Some s' => join_state s s'
      | None => s
      end.

5.3 Running the body a few times

In our general description of the abstract interpretation of loops, we need to execute the body of loops in two different modes: one mode is a widening mode, the other is a narrowing mode. In the widening mode, the state obtained after executing the body of the loop needs to be joined with the initial state before executing the body of the loop, so that the result state is less precise than both the state before executing the body of the loop and the state after executing the body of the loop. In the narrowing mode, we start the execution with an environment that is guaranteed to be large enough, hoping to narrow this environment to a more precise value. In this case, the join operation must not be done with the state that is used to start the execution, but with another state which describes the information known about variables before considering the loop. To accommodate these two modes of abstract execution, we use a function that takes two states as input: the first state is the one with which the result must be joined, the second state is the one with which execution must start. In this function, the argument ab is the function that describes the abstract interpretation on the instruction inside the loop, the argument b is the test of the loop. The function ab returns an optional state and an annotated instruction. The optional state is None when the abstract interpreter can detect that the execution of the program from the input state will never terminate.
When putting all elements together, the argument ab will be instantiated with the recursive call of the abstract interpreter on the loop body.

    Definition step1 (ab: state -> a_instr * option state)
      (b:bexpr) (init s:state) : state :=
      match learn_from_success s b with
        Some s1 => let (_, s2) := ab s1 in join_state' init s2
      | None => s
      end.

We then construct a function that repeats step1 a certain number of times. This number is denoted by a natural number n. In this function, the constant 0 is a natural number and we need to make it precise to Coq's parser, by expressing that the value must be interpreted in a parsing scope for natural numbers instead of integers, using the specifier %nat.

    Fixpoint step2 (ab: state -> a_instr * option state)
      (b:bexpr) (init s:state) (n:nat) : state :=
      match n with
        0%nat => s
      | S p => step2 ab b init (step1 ab b init s) p
      end.

The complexity of these functions can be improved: there is no need to compute all iterations if we can detect early that a fixed point was reached. In this paper, we prefer to keep the code of the abstract interpreter simple but potentially inefficient to make our formal verification work easier.

5.4 Verifying that a state is more precise than another

To verify that we have reached an invariant, we need to check, for a state s, that running this state through step1 ab b s s returns a new state that is not less precise than s. For this, we assume that there exists a function that makes it possible to compare two abstract values:

- thinner : A -> A -> bool, this function returns true when the first abstract value gives more precise information than the second one.

Using this basic function on abstract values, we define a new function on states:

    Fixpoint s_stable (s1 s2 : state) : bool :=
      match s1 with
        nil => true
      | (x,v)::tl => thinner (lookup s2 x) v && s_stable tl s2
      end.
This function traverses the first state to check that the abstract value associated to each variable is less precise than the information found in the second state. This function is then easily used to verify that a given state is an invariant through the abstract interpretation of a loop's test and body.

    Definition is_inv (ab:state-> a_instr * option state)
      (s:state)(b:bexpr): bool := s_stable s (step1 ab b s s).

5.5 Narrowing a state

The step2 function receives two arguments of type state. The first argument is solely used for join operations, while the second argument is used to start a sequence of abstract states that correspond to iterated interpretations of the loop test and body. When the start state is not stable through interpretation, the resulting state is larger than both the first argument and the start argument. When the start state is stable through interpretation, there are cases where the resulting state is smaller than the start state. For instance, in the cases where the abstract values are even and odd, if the first state argument maps the variable y to even and the variable z to odd, the start state maps y and z to the top abstract value (the abstract value that gives no information) and the while loop is the following:

    while (x < 10) do x := x + 1; z := y + 1; y := 2 done

Then, after abstractly executing the loop test and body once, we obtain a state where y has the value even and z has the top abstract value. This state is more precise than the start state. After abstractly executing the loop test and body a second time, we obtain a state where z has the value odd and y has the value even. This state is more precise than the one obtained only after the first abstract run of the loop test and body.

The example above shows that over-approximations are improved by running the abstract interpreter again on them. This phenomenon is known as narrowing.
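The two abstract runs of this example can be replayed with a small stand-alone sketch. It is a simplification made for illustration, not the generic interpreter: a state is reduced to the pair of abstract values for y and z, the variable x and the loop test are ignored, and the names parity, p_top, p_join, p_succ, body and run_once are assumptions of this sketch.

```coq
Inductive parity : Type := even | odd | p_top.

Definition p_join (a b : parity) : parity :=
  match a, b with
  | even, even => even
  | odd, odd => odd
  | _, _ => p_top
  end.

(* Abstract effect of adding 1 to a value whose parity is known. *)
Definition p_succ (a : parity) : parity :=
  match a with
  | even => odd
  | odd => even
  | p_top => p_top
  end.

(* A state restricted to y and z: (value of y, value of z). *)
Definition st := (parity * parity)%type.

(* Abstract effect of the body: z := y + 1; y := 2. *)
Definition body (s : st) : st :=
  let (y, _) := s in (even, p_succ y).

Definition st_join (s1 s2 : st) : st :=
  let (y1, z1) := s1 in
  let (y2, z2) := s2 in (p_join y1 y2, p_join z1 z2).

(* One abstract run, joined with the state known before the loop. *)
Definition run_once (init s : st) : st := st_join init (body s).

(* First run from the top state: y becomes even, z stays unknown. *)
Example first_run :
  run_once (even, odd) (p_top, p_top) = (even, p_top).
Proof. reflexivity. Qed.

(* Second run: z is narrowed to odd. *)
Example second_run :
  run_once (even, odd) (run_once (even, odd) (p_top, p_top)) = (even, odd).
Proof. reflexivity. Qed.
```

Each run starts from the result of the previous one but is always joined with the same first argument, which mirrors the way step2 threads its init and s arguments.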
It is worth forcing a narrowing phase after each phase that is likely to produce an over-approximation of the smallest fixed-point of the abstract interpreter. This is used in the abstract interpreter that we describe below.

5.6 Allowing for over-approximations

In general, the finite amount of abstract computation performed in the step2 function is not enough to reach the smallest stable abstract state. This is related to the undecidability of the halting problem: it is often possible to write a program where a variable will receive a precise value exactly when some other program terminates. If we were able to compute the abstract value for this variable in a finite amount of time, we would be able to design a program that solves the halting problem.

Even if we are facing a program where finding the smallest state can be done in a finite amount of time, we may want to accelerate the process by taking over-approximations. For instance, if we consider the following loop:

    while x < 10 do x := x + 1 done

If the abstract values we are working with are intervals and we start with the interval [0,0], after abstractly interpreting the loop test and body once, we obtain that the value for x should contain at least [0,1], after abstractly interpreting 9 times, we obtain that the value for x should contain at least [0,9]. Until these 9 executions, we have not seen a stable state. At the 10th execution, we obtain that the value for x should contain at least [0,10] and the 11th execution shows that this value actually is stable.

At any time before a stable state is reached, we may choose to replace the current unstable state with a state that is larger. For instance, we may choose to replace [0,3] with [0,100].
When this happens, the abstract interpreter can discover that the resulting state after starting with the one that maps x to [0,100] actually maps x to [0,10], thus [0,100] is stable and is a good candidate to enter a narrowing phase. This narrowing phase actually converges to a state that maps x to [0,10].

The choice of over-approximations is arbitrary and information may actually be lost in the process, because over-approximated states are less precise, but this is compensated by the fact that the abstract interpreter gives quicker answers. The termination of the abstract interpreter can even be guaranteed if we impose that a guaranteed over-approximation is taken after a finite amount of steps. An example of a guaranteed over-approximation is a state that maps every variable to the top abstract value. In our Coq encoding, such a state is represented by the nil value.

The choice of over-approximation strategies varies from one abstract domain to the other. In our Coq encoding, we chose to let this over-approximation be represented by a function with the following signature:

- over_approx : nat -> state -> state -> state

When applied to n, s, and s', this function computes an over-approximation of s'. The value s is supposed to be a value that comes before s' in the abstract interpretation and can be used to choose the over-approximation cleverly, as it gives a sense of direction to the current evolution of successive abstract values. The number n should be used to fine-tune the coarseness of the over-approximation: the lower the value of n, the coarser the approximation.
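The iteration counts quoted for the loop while x < 10 do x := x + 1 done can be checked with a deliberately reduced model. This is a sketch under stated assumptions, not the generic interpreter: there is a single variable x, intervals have finite bounds only and are represented as pairs of integers, and the names itv, learn_lt10, incr, itv_join, run_once and repeat_run are hypothetical.

```coq
Require Import ZArith.
Open Scope Z_scope.

(* An interval [lo, hi] for the single variable x. *)
Definition itv := (Z * Z)%type.

(* Knowing that x < 10 succeeded: cap the upper bound at 9,
   or return None when the test cannot succeed. *)
Definition learn_lt10 (i : itv) : option itv :=
  let (lo, hi) := i in
  if 10 <=? lo then None else Some (lo, Z.min hi 9).

(* Abstract effect of x := x + 1. *)
Definition incr (i : itv) : itv :=
  let (lo, hi) := i in (lo + 1, hi + 1).

Definition itv_join (i j : itv) : itv :=
  let (a, b) := i in
  let (c, d) := j in (Z.min a c, Z.max b d).

(* One abstract run of test and body, joined with init (compare step1). *)
Definition run_once (init s : itv) : itv :=
  match learn_lt10 s with
  | Some s1 => itv_join init (incr s1)
  | None => s
  end.

(* n successive runs (compare step2). *)
Fixpoint repeat_run (init s : itv) (n : nat) : itv :=
  match n with
  | O => s
  | S p => repeat_run init (run_once init s) p
  end.

Example after_1 : repeat_run (0, 0) (0, 0) 1%nat = (0, 1).
Proof. reflexivity. Qed.

Example after_9 : repeat_run (0, 0) (0, 0) 9%nat = (0, 9).
Proof. reflexivity. Qed.

Example after_10 : repeat_run (0, 0) (0, 0) 10%nat = (0, 10).
Proof. reflexivity. Qed.

(* The 11th run no longer changes the state. *)
Example after_11 : repeat_run (0, 0) (0, 0) 11%nat = (0, 10).
Proof. reflexivity. Qed.

(* Starting from the over-approximation [0,100] gives [0,10] in one run. *)
Example from_over_approx : run_once (0, 0) (0, 100) = (0, 10).
Proof. reflexivity. Qed.
```

The last example shows why the over-approximation pays off here: a single run from [0,100] reaches the state that plain iteration needs ten runs to find.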
For instance, when considering the example above, knowing that s = [0,1] and s' = [0,2] are two successive unstable values reached by the abstract interpreter for the variable x can suggest to choose an over-approximation where the upper bound changes but the lower bound remains unchanged. In this case, we expect the function over_approx to return [0,+∞], for example.

5.7 The main invariant searching function

We can now describe the function that performs the process described in section 5.1. The code of this function is as follows:

    Fixpoint find_inv ab b init s i n : state :=
      let s' := step2 ab b init s (choose_1 s i) in
      if is_inv ab s' b then s' else
        match n with
          0%nat => nil
        | S p => find_inv ab b init (over_approx p s s') i p
        end.

The function choose_1 is provided at the same time as all other functions that are specific to the abstract domain A, such as join, a_add, etc. The argument function ab is supposed to be the function that performs the abstract interpretation of the loop inner instruction i (also called the loop body), the boolean expression b is supposed to be the loop test. The state init is supposed to be the initial input state at the first invocation of find_inv on this loop and s is supposed to be the current over-approximation of init, n is the number of over-approximations that are still allowed before the function should switch to the nil state, which is a guaranteed over-approximation. This function systematically runs the abstract interpreter on the inner instruction an arbitrary number of times (given by the function choose_1) and then tests whether the resulting state is an invariant. Narrowing steps actually take place if the number of iterations given by choose_1 is large enough. If the result of the iterations is an invariant, then it is returned.
When the result state is not an invariant, the function find_inv is called recursively with a larger approximation computed by over_approx. When the number of allowed recursive calls is reached, the nil value is returned.

5.8 Annotating the loop body with abstract information

The find_inv function only produces a state, while the abstract interpreter is also supposed to produce an annotated version of the instruction. Once we know the invariant, we can annotate the while loop with this invariant and obtain an annotated version of the loop body by re-running the abstract interpreter on this instruction. This is done with the following function:

    Definition do_annot (ab:state-> a_instr * option state)
      (b:bexpr) (s:state) (i:instr) : a_instr :=
      match learn_from_success s b with
        Some s' => let (ai, _) := ab s' in ai
      | None => mark i
      end.

In this function, ab is supposed to compute the abstract interpretation of the loop body. When the function learn_from_success returns a None value, this means that the loop body is never executed and it is marked as dead code by the function mark.

5.9 The abstract interpreter's main function

With the function find_inv, we can now design a new abstract interpreter. Its main structure is about the same as for the naive one, but there are two important differences. First, the abstract interpreter now uses the find_inv function to compute an invariant state for the while loop. Second, this abstract interpreter can detect cases where instructions are guaranteed to not terminate. This is a second part of dead code detection: when a good invariant is detected for the while loop, a comparison between this invariant and the loop test may give the information that the loop test can never be falsified. If this is the case, no state is returned and the instructions following this while loop in sequences must be marked as dead code.
This is handled by the fact that the abstract interpreter now returns an optional state and an annotated instruction. The case for the sequence is modified to make sure instructions are marked as dead code when receiving no input state.

    Fixpoint ab2 (i:instr)(s:state) : a_instr*option state :=
      match i with
        assign x e =>
        (pre (s_to_a s) (a_assign x e), Some (a_upd x (a_af s e) s))
      | seq i1 i2 =>
        let (a_i1, s1) := ab2 i1 s in
        match s1 with
          Some s1' =>
          let (a_i2, s2) := ab2 i2 s1' in (a_seq a_i1 a_i2, s2)
        | None => (a_seq a_i1 (mark i2), None)
        end
      | while b i =>
        let inv := find_inv (ab2 i) b s s i (choose_2 s i) in
        (a_while b (s_to_a inv) (do_annot (ab2 i) b inv i),
         learn_from_failure inv b)
      end.

This function relies on an extra numeric function choose_2 to decide the number of times find_inv will attempt progressive over-approximations before giving up and falling back on the nil state. Like choose_1 and over_approx, this function must be provided at the same time as the type for abstract values.

6 Proving the correctness of the abstract interpreter

To prove the correctness of our abstract interpreter, we adapt the correctness statements that we already used for the naive interpreter. The main change is that the resulting state is optional, with a None value corresponding to non-termination. This means that when a None value is obtained we can take the post-condition as the false assertion. This is expressed with the following function, mapping an optional state to an assertion.

    Definition s_to_a' (s':option state) : assert :=
      match s' with Some s => s_to_a s | None => a_false end.

The main correctness statement thus becomes the following one:

    Theorem ab2_correct : forall i i' s s',
      consistent s -> ab2 i s = (i', s') -> valid m (vc i' (s_to_a' s')).

By comparison with the similar theorem for ab1, we removed the part about the final state satisfying the consistent property.
This part is actually proved in a lemma beforehand. The reason why we chose to establish the two results at the same time for ab1 and in two stages for ab2 is anecdotal.

As for the naive interpreter, this theorem is paired with a lemma asserting that cleaning up the resulting annotated instruction i' yields back the initial instruction i. We actually need to prove two lemmas, one for the mark function (used to mark code as dead code) and one for ab2 itself.

    Lemma mark_clean : forall i, cleanup (mark i) = i.

    Theorem ab2_clean : forall i i' s s',
      ab2 i s = (i', s') -> cleanup i' = i.

These two lemmas are proved by induction on the structure of the instruction i.

6.1 Hypotheses about the auxiliary functions

The abstract interpreter relies on a collection of functions that are specific to the abstract domain being handled. In our Coq development, this is handled by defining the functions inside a section, where the various components that are specific to the abstract domain of interpretation are given as section variables and hypotheses. When the section is closed, the various functions defined in the section are abstracted over the variables that they use. Thus, the function ab2 becomes a 14-argument function: the instruction, the input state, and twelve extra arguments. The extra twelve arguments are as follows:

  1. A : Type, the type containing the abstract values,
  2. from_Z : Z -> A, a function mapping integer values to abstract values,
  3. top : A, an abstract value representing lack of information,
  4. a_add : A -> A -> A, an addition operation for abstract values,
  5. to_pred : A -> aexpr -> assert, a function mapping abstract values to their interpretations as assertions on arithmetic expressions,
  6. learn_from_success : state A -> bexpr -> option (state A), a function that is able to improve a state, knowing that a boolean expression's evaluation returns true,
  7.
     learn_from_failure : state A -> bexpr -> option (state A), similar to the previous one, but using the knowledge that the boolean expression's evaluation returns false,
  8. join : A -> A -> A, a binary function on abstract values that returns an abstract value that is coarser than the two inputs,
  9. thinner : A -> A -> bool, a comparison function that succeeds when the first argument is more precise than the second,
  10. over_approx : nat -> state A -> state A -> state A, a function that implements heuristics to find over-approximations of its arguments,
  11. choose_1 : state A -> instr -> nat, a function that returns the number of times a loop body should be executed with a given start state before testing for stabilisation,
  12. choose_2 : state A -> instr -> nat, a function that returns the number of times over-approximations should be attempted before giving up and using the coarsest state.

Most of these functions must satisfy a collection of properties to ensure that the correctness statement will be provable. There are fourteen such properties, which can be sorted in the following way:

  1. Three properties are concerned with the assertions created by to_pred, with respect to their logical interpretation and to substitution.
  2. Two properties are concerned with the consistency of interpretation of abstract values obtained through from_Z and a_add as predicates over integers.
  3. Two properties are concerned with the logical properties of abstract states computed with the help of learn_from_success and learn_from_failure.
  4. Four properties are concerned with ensuring that over_approx, join, and thinner do return or detect over-approximations correctly,
  5. Three properties are concerned with ensuring that the consistent property is preserved through learn_from... and over_approx.
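The section mechanism mentioned above can be observed on a much smaller scale. In the following sketch, the section name Generic and the function join3 are hypothetical and chosen only to show how closing a section turns the section variables that a definition uses into extra arguments.

```coq
Require Import ZArith.

Section Generic.
  Variable A : Type.
  Variable top : A.
  Variable join : A -> A -> A.

  (* This definition uses A and join, but not top. *)
  Definition join3 (a b c : A) : A := join a (join b c).
End Generic.

(* Once the section is closed, join3 receives A and join as extra
   arguments; top is not added because join3 never uses it. *)
Check (join3 : forall A : Type, (A -> A -> A) -> A -> A -> A -> A).

(* Instantiating the generic function on integers, with max as join. *)
Example join3_on_Z : join3 Z Z.max 1%Z 5%Z 3%Z = 5%Z.
Proof. reflexivity. Qed.
```

The instantiation in the last example follows the same pattern as providing a concrete abstract domain to ab2: the domain-specific components are simply passed as the leading arguments.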
6.2 Maintaining the consistent property

For this abstract interpreter, we need again to prove that it maintains the property that all states are duplication-free. It is first established for the join_state operation. Actually, since the join_state operation performs repetitive updates from the nil state, the result is duplication-free, regardless of the duplications in the inputs. This is easily obtained with a proof by induction on the first argument. For once, we show the full proof script.

    Lemma join_state_consistent :
      forall s1 s2, consistent (join_state s1 s2).
    intros s1 s2; induction s1 as [ | [x v] s1 IHs1]; simpl; auto.
    apply consistent_update; auto.
    Qed.

The first two lines of this Coq excerpt give the theorem statement. The line intros ... explains that a proof by induction should be done. This proof raises two cases, and the as ... fragment states that in the step case (the second case), one should consider a list whose tail is named s1 and whose first pair contains a variable x and an abstract value v, and we have an induction hypothesis, which should be named IHs1: this induction hypothesis states that s1 already satisfies the consistent property. The simpl directive expresses that the recursive function should be simplified if possible, and auto attempts to solve the goals that are generated. Actually, the computation of recursive functions leads to proving true = true in the base case and auto takes care of this. For the step case, we simply need to rely on the theorem consistent_update (see section 4.6). The premise of this theorem actually is IHs1 and auto finds it.

6.3 Relating input abstract states and pre-conditions

Similarly to what was done for the naive abstract interpreter, we want to ensure that the interpretation of the input abstract state as a logical formula implies the pre-condition for the generated annotated instruction and the generated post-condition.
For the while loop, this relies on the fact that the selected invariant is obtained after repetitive joins with the input state. We first establish two monotonicity properties for the join_state function, we show only the first one:

    Lemma join_state_safe_1 : forall g s1 s2,
      ia m g (s_to_a s1) -> ia m g (s_to_a (join_state s1 s2)).

Then, we only need to propagate the property up from the step1 function. Again, we show only the first one but there are similar lemmas for step2 and find_inv, and we conclude with the property for ab2:

    Lemma step1_pc : forall g ab b s s',
      ia m g (s_to_a s) -> ia m g (s_to_a s') ->
      ia m g (s_to_a (step1 ab b s s')).

    Lemma ab2_pc : forall i i' s s', ab2 i s = (i', s') ->
      forall g a, ia m g (s_to_a s) -> ia m g (pc i' a).

The proof for step1_pc is a direct consequence of the definition and the properties of join_state. The proofs for step2 and find_inv are done by induction on n. The proof for ab2 is an easy induction on the instruction i. In particular, the two state arguments to the function find_inv are both equal to the input state in the case of while loops.

6.4 Validity of the generated conditions

The main theorem is about ensuring that all verification conditions are provable. A good half of this problem is already taken care of when we prove the theorem ab2_pc, which expresses that at each step the state is strong enough to ensure the validity of the pre-condition for the instruction that follows. The main added difficulty is to verify that the invariant computed for each while loop actually is invariant. This difficulty is taken care of by the structure of the function find_inv, which actually invokes the function is_inv on its expected output before returning it.
Thus, we only need to prove that is_inv correctly detects states that are invariants:

    Lemma is_inv_correct :
      forall ab b g s s' s2 ai,
        is_inv ab s b = true -> learn_from_success s b = Some s' ->
        ab s' = (ai, s2) -> ia m g (s_to_a' s2) -> ia m g (s_to_a s).

We can then deduce that find_inv is correct: the proof proceeds by showing that the value this function returns is either verified using is_inv or the nil state. The correctness statement for find_inv has the following form:

    Lemma find_inv_correct : forall ab b g i n init s s' s2 ai,
      learn_from_success (find_inv ab b init s i n) b = Some s' ->
      ab s' = (ai, s2) -> ia m g (s_to_a' s2) ->
      ia m g (s_to_a (find_inv ab b init s i n)).

This can then be combined with the assumptions that learn_from_success and learn_from_failure correctly improve the information given in abstract states to show that the value returned for while loops in ab2 is correct. These assumptions have the following form (the hypothesis for learn_from_failure has a negated third assumption).

    Hypothesis learn_from_success_sem :
      forall s b g, consistent s ->
        ia m g (s_to_a s) -> ia m g (a_b b) ->
        ia m g (s_to_a' (learn_from_success s b)).

7 An interval-based instantiation

The abstract interpreters we have described so far are generic and are ready to be instantiated on specific abstract domains. In this section we describe an instantiation on an abstract domain to represent intervals. This domain of intervals contains intervals with finite bounds and intervals with infinite bounds. The interval with two infinite bounds represents the whole type of integers. We describe these intervals with an inductive type that has four variants:

    Inductive interval : Type :=
      above : Z -> interval
    | below : Z -> interval
    | between : Z -> Z -> interval
    | all_Z : interval.
For instance, the interval containing all values larger than or equal to 10 is represented by above 10 and the whole type of integers is represented by all_Z.

The interval associated to an integer is simply described as the interval with two finite bounds equal to this integer.

    Definition i_from_Z (x:Z) := between x x.

When adding two intervals, it suffices to add the two bounds, because addition preserves the order on integers. Coping with all the variants of each possible input yields a function with many cases.

    Definition i_add (x y:interval) :=
      match x, y with
        above x, above y => above (x+y)
      | above x, between y z => above (x+y)
      | below x, below y => below (x+y)
      | below x, between y z => below (x+z)
      | between x y, above z => above (x+z)
      | between x y, below z => below (y+z)
      | between x y, between z t => between (x+z) (y+t)
      | _, _ => all_Z
      end.

The assertions associated to each abstract value can rely on only one predicate, as we can re-use the same comparison predicate for almost all variants. This is described in the to_pred function.

    Definition i_to_pred (x:interval) (e:aexpr) : assert :=
      match x with
        above a => pred "leq" (anum a::e::nil)
      | below a => pred "leq" (e::anum a::nil)
      | between a b =>
        a_conj (pred "leq" (anum a::e::nil))
               (pred "leq" (e::anum b::nil))
      | all_Z => a_true
      end.

Of course, the meaning attached to the string "leq" must be correctly fixed in the corresponding instantiation for the m parameter:

    Definition i_m (s : string) (l: list Z) : Prop :=
      if string_dec s "leq" then
        match l with x::y::nil => x <= y | _ => False end
      else False.

7.1 Learning from comparisons

The functions i_learn_from_success and i_learn_from_failure used when processing while loops can be made arbitrarily complex. For the sake of conciseness, we have only designed a pair of functions that detect the case where the boolean test has the form x < e, where e is an arbitrary arithmetic expression.
In this case, the function i_learn_from_success updates only the value associated to x: the initial interval associated with x is intersected with the interval of all values that are less than the upper bound of the interval computed for e. An impossibility is detected when the lowest possible value for x is larger than or equal to the upper bound for e. Even this simple strategy yields a function with many cases, of which we show only the cases where both x and e have interval values with finite bounds:

    Definition i_learn_from_success s b :=
      match b with
        blt (avar x) e =>
        match a_af _ i_from_Z all_Z i_add s e,
              lookup _ all_Z s x with
          ...
        | between _ n, between m p =>
          if Z_le_dec n m then None
          else if Z_le_dec n p
               then Some (a_upd _ x (between m (n-1)) s)
               else Some s
          ...
        end
      | _ => Some s
      end.

In the code of this function, the functions a_af, lookup, and a_upd are parameterized by the functions from the datatype of intervals that they use: i_from_Z, all_Z and i_add for a_af, all_Z for lookup, etc.

The function i_learn_from_failure is designed similarly, looking at upper bounds for x and lower bounds for e instead.

7.2 Comparing and joining intervals

The treatment of loops also requires a function to find upper bounds of pairs of intervals and a function to compare two intervals. These functions are simply defined by pattern-matching on the kind of intervals that are encountered and then comparing the upper and lower bounds.

    Definition i_join (i1 i2:interval) : interval :=
      match i1, i2 with
        above x, above y =>
        if Z_le_dec x y then above x else above y
      ...
      | between x y, between z t =>
        let lower := if Z_le_dec x z then x else z in
        let upper := if Z_le_dec y t then t else y in
        between lower upper
      | _, _ => all_Z
      end.

    Definition i_thinner (i1 i2:interval) : bool :=
      match i1, i2 with
        above x, above y => if Z_le_dec y x then true else false
      | above _, all_Z => true
      ...
      | between x _, above y => if Z_le_dec y x then true else false
      | between _ x, below y => if Z_le_dec x y then true else false
      | _, all_Z => true
      ...
      end.

7.3 Finding over-approximations

When the interval associated to a variable does not stabilize, an over-approximation must be found for this interval. We implement an approach where several steps of over-approximation can be taken one after the other. For intervals, finding over-approximations can be done by pushing one of the bounds of each interval to infinity. We use the fact that the generic abstract interpreter calls the over-approximation with two values to choose the bound that should be pushed to infinity: in a first round of over-approximation, only the bound that does not appear to be stable is modified. This strategy is particularly well adapted for loops where one variable is increased or decreased by a fixed amount at each execution of the loop's body.

The strategy is implemented in two functions: the first function over-approximates an interval, the second function applies the first to all the intervals found in a state.

    Definition open_interval (i1 i2:interval) : interval :=
      match i1, i2 with
        below x, below y => if Z_le_dec y x then i1 else all_Z
      | above x, above y => if Z_le_dec x y then i1 else all_Z
      | between x y, between z t =>
        if Z_le_dec x z then
          if Z_le_dec t y then i1 else above x
        else
          if Z_le_dec t y then below y else all_Z
      | _, _ => all_Z
      end.

    Definition open_intervals (s s':state interval) : state interval :=
      map (fun p:string*interval =>
             let (x, v) := p in
             (x, open_interval v (lookup _ all_Z s' x))) s.

The result of open_interval i1 i2 is expected to be an over-approximation of i1. The second argument i2 is only used to choose which of the bounds of i1 should be modified.
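The behaviour of open_interval can be checked on a few inputs. The block below restates the interval type and the open_interval function from the text so that it can be compiled on its own; only the examples are additions, and their names are hypothetical.

```coq
Require Import ZArith.
Open Scope Z_scope.

Inductive interval : Type :=
  above : Z -> interval
| below : Z -> interval
| between : Z -> Z -> interval
| all_Z : interval.

Definition open_interval (i1 i2:interval) : interval :=
  match i1, i2 with
  | below x, below y => if Z_le_dec y x then i1 else all_Z
  | above x, above y => if Z_le_dec x y then i1 else all_Z
  | between x y, between z t =>
    if Z_le_dec x z then
      if Z_le_dec t y then i1 else above x
    else
      if Z_le_dec t y then below y else all_Z
  | _, _ => all_Z
  end.

(* The upper bound grows from 1 to 2: only the upper bound is opened. *)
Example upper_moves :
  open_interval (between 0 1) (between 0 2) = above 0.
Proof. reflexivity. Qed.

(* The lower bound decreases: only the lower bound is opened. *)
Example lower_moves :
  open_interval (between 0 5) (between (-1) 5) = below 5.
Proof. reflexivity. Qed.

(* Both bounds are stable: the interval is returned unchanged. *)
Example nothing_moves :
  open_interval (between 0 5) (between 0 5) = between 0 5.
Proof. reflexivity. Qed.

(* Both bounds move: no information is kept. *)
Example both_move :
  open_interval (between 0 5) (between (-1) 6) = all_Z.
Proof. reflexivity. Qed.
```

The first example corresponds to the situation discussed in section 5.6, where the successive values [0,1] and [0,2] suggest opening the upper bound only.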
The funtion i_over_approx reeiv es a n umeri parameter to indiate the strength of o v er-appro ximation that should b e applied. Here, there are only t w o strengths: at the rst try (when the lev el is larger than 0), the funtion applies open_intervals ; at the seond try , it simply returns the nil state, whi h orresp onds to the top v alue in the domain of abstrat states. Definition i_over_approx n s s' := math n with S _ => open_intervals s s' | _ => nil end. The abstrat in terpreter also requires t w o funtions that ompute the n um b er of attempts at ea h lev el of rep etitiv e op eration. W e dene these t w o funtions as onstan t funtions: Definition i_hoose_1 (s:state interval) (i:instr) := 2%nat. Definition i_hoose_2 (s:state interval) (i:instr) := 3%nat. One the t yp e interval and the v arious funtions are pro vided w e obtain an abstrat in terpreter for omputing with in terv als. Definition abi := ab2 interval i_from_Z all_Z i_add i_to_pred i_learn_from_su e ss i_learn_from_fa ilu re i_join i_thinner i_over_approx i_hoose_1 i_hoose_2. W e an already run this instan tiated in terpreter inside the Co q system. F or instane, w e an run the in terpreter on the instrution: while x < 10 do x := x + 1 done This giv es the follo wing dialog (where the answ er of the Co q system is written in italis): Eval vm_ompute in abi (while (blt (avar "x") (anum 10)) (assign "x" (aplus (avar "x") (anum 1)))) (("X", between 0 0)::nil). 
= (a_while (blt (avar "x") (anum 10)) (a_onj (a_onj (pred "leq" (anum 0 :: avar "x" :: nil)) (pred "leq" (avar "x" :: anum 10 :: nil))) a_true) (pre (a_onj (a_onj (pred "leq" (anum 0 :: avar "x" :: nil)) (pred "leq" (avar "x" :: anum 9 :: nil))) a_true) (a_assign "x" (aplus (avar "x") (anum 1)))), Some (("x", between 10 10) :: nil)) : a_instr * option (state interval) 8 Conlusion This pap er desrib es ho w the funtional language presen t in a higher-order the- orem pro v er an b e used to eno de a to ol to p erform a stati analysis on an arbitrary programming language. The example programming language is  ho- sen to b e extremely simple, so that the example an b e desrib ed preisely in this tutorial pap er. The stati analysis to ol that w e desrib ed is inspired b y the approa h of abstrat in terpretation. Ho w ev er this w ork is not a omprehensiv e in tro dution to abstrat in terpretation, nor do es it o v er all the asp ets of en- o ding abstrat in terpretation inside a theorem pro v er. Better desriptions of abstrat in terpretation and its formal study are giv en in [11 ,5 ,12 ℄. The exp erimen t is p erformed with the Co q system. More extensiv e studies of programming languages using this system ha v e b een dev elop ed o v er the last y ears. In partiular, exp erimen ts b y the Comp ert team sho w that not only stati analysis but also eien t ompilation an b e desrib ed and pro v ed orret [4,10 ,6 ℄. Co q is also used extensiv ely for the study of funtional programming languages, in partiular to study the prop erties of t yp e systems and there are a few Co q-based solutions to the general landmark ob jetiv e kno wn as POPLMark [1℄. 
The abstrat in terpreter w e desrib e here is ineien t in man y resp ets: when analysing the b o dy of a lo op, this lo op needs to b e exeuted abstratly sev eral times, the annotations omputed ea h time are forgotten, and then when an in v arian t is diso v ered, the whole pro ess needs to b e done again to pro due the annotated instrution. A more eien t in terpreter ould b e designed where omputed annotations are k ept in memory long enough to a v oid reomputation when the in v arian t is found. W e did not design the abstrat in terpreter with this optimisation, thinking that the soures of ineieny ould b e alulated a w a y through systemati transformation of programs, as studied in another pap er in this v olume. The abstrat in terpreter pro vided with the pap er [2℄ on tains some of these optimisations. An imp ortan t remark is that program analyses an b e m u h more eien t when they onsider the relations b et w een sev eral v ariables at a time, as opp osed to the exp erimen t desrib ed here where the v ariables are onsidered indep en- den tly of ea h other. More preise w ork where relations b et w een v ariables an b e tra k ed is p ossible, on the ondition that abstrat v alues are used to desrib e omplete states, instead of single v ariables as in [ 4 ℄, where the result of the analy- sis is used as a basis for a ompiler optimisation kno wn as  ommon sub expr ession elimination . W e ha v e onen trated on a v ery simple while language in this pap er, for didatial purp oses. Ho w ev er, abstrat in terpreters ha v e b een applied to m u h more omplete programming languages. F or instane, the Astree [8 ℄ analyser o v ers most of the C programming language. On the other hand, the founda- tional pap ers desrib e abstrat in terpretation in terms of analyses on on trol o w graphs. 
The idea of abstrat in terpretation is general enough that it should b e p ossible to apply it to an y form of programming language. Referenes 1. B. A ydemir, A. Bohannon, M. F airbairn, J. F oster, B. Piere, P . Sew ell, D. V ytin- iotis, G. W ash burn, S. W eiri h, and S. Zdanewi. Me hanized metatheory for the masses: The POPLmark  hallenge. In Pr o  e e dings of the Eighte enth International Confer en e on The or em Pr oving in Higher Or der L o gis (TPHOLs 2005) , 2005. 2. Y v es Bertot. Theorem pro ving supp ort in programming language seman tis. T e h- nial Rep ort 6242, INRIA, 2007. to app ear in a b o ok in memory of Gilles Kahn. 3. Y v es Bertot and Pierre Castéran. Inter ative The or em Pr oving and Pr o gr am Devel- opment, Co q'A rt:the Cal ulus of Indutive Construtions . Springer-V erlag, 2004. 4. Y v es Bertot, Benjamin Grégoire, and Xa vier Lero y . A strutured approa h to pro ving ompiler optimizations based on datao w analysis. In T yp es for Pr o ofs and Pr o gr ams, W orkshop TYPES 2004 , v olume 3839 of L e tur e Notes in Computer Sien e , pages 6681. Springer, 2006. 5. F rédéri Besson, Thomas Jensen, and Da vid Pi hardie. Pro of-arrying o de from ertied abstrat in terpretation to xp oin t ompression. The or eti al Computer Sien e , 364(3):273291, 2006. 6. Sandrine Blazy , Za ynah Darga y e, and Xa vier Lero y . F ormal v eriation of a C ompiler fron t-end. In FM 2006: Int. Symp. on F ormal Metho ds , v olume 4085 of L e tur e Notes in Computer Sien e , pages 460475. Springer, 2006. 7. P atri k Cousot and Radhia Cousot. Abstrat in terpretation: a unied lattie mo del for stati analysis of programs b y onstrution or appro ximation of xp oin ts. In Confer en e R e  or d of the F ourth A CM Symp osium on Priniples of Pr o gr amming L anguages, POPL'77 , pages 238252. A CM Press, 1977. 8. 
P atri k Cousot, Radhia Cousot, Jérome F eret, An toine Mine Lauren t Maub orgne, Da vid Monniaux, and Xa vier Riv al. The Astrée analyzer. In Eur op e an Symp osium on Pr o gr amming, ESOP'XIV , v olume 3444 of LNCS , pages 2130. Springer, 2005. 9. Edsger W. Dijkstra. A disipline of Pr o gr amming . Pren tie Hall, 1976. 10. Xa vier Lero y . F ormal ertiation of a ompiler ba k-end, or: programming a ompiler with a pro of assistan t. In 33r d symp osium Priniples of Pr o gr amming L anguages , pages 4254. A CM Press, 2006. 11. Da vid Pi hardie. Interpr étation abstr aite en lo gique intuitionniste : extr ation d'analyseurs Java  ertiés . PhD thesis, Univ ersité Rennes 1, 2005. In fren h. 12. Da vid Pi hardie. Building ertied stati analysers b y mo dular onstrution of w ell-founded latties. In Pr o . of the 1st International Confer en e on F oundations of Informatis, Computing and Softwar e (FICS'08) , Eletroni Notes in Theoreti- al Computer Siene, 2008. 13. The Co q dev elopmen t team. The o q pro of assistan t, 2008. http://oq.inria.fr .
