Lecture Notes on the Lambda Calculus

Peter Selinger
Department of Mathematics and Statistics
Dalhousie University, Halifax, Canada

Abstract

This is a set of lecture notes that developed out of courses on the lambda calculus that I taught at the University of Ottawa in 2001 and at Dalhousie University in 2007 and 2013. Topics covered in these notes include the untyped lambda calculus, the Church-Rosser theorem, combinatory algebras, the simply-typed lambda calculus, the Curry-Howard isomorphism, weak and strong normalization, polymorphism, type inference, denotational semantics, complete partial orders, and the language PCF.

Contents

1 Introduction
1.1 Extensional vs. intensional view of functions
1.2 The lambda calculus
1.3 Untyped vs. typed lambda-calculi
1.4 Lambda calculus and computability
1.5 Connections to computer science
1.6 Connections to logic
1.7 Connections to mathematics

2 The untyped lambda calculus
2.1 Syntax
2.2 Free and bound variables, α-equivalence
2.3 Substitution
2.4 Introduction to β-reduction
2.5 Formal definitions of β-reduction and β-equivalence

3 Programming in the untyped lambda calculus
3.1 Booleans
3.2 Natural numbers
3.3 Fixed points and recursive functions
3.4 Other data types: pairs, tuples, lists, trees, etc.
4 The Church-Rosser Theorem
4.1 Extensionality, η-equivalence, and η-reduction
4.2 Statement of the Church-Rosser Theorem, and some consequences
4.3 Preliminary remarks on the proof of the Church-Rosser Theorem
4.4 Proof of the Church-Rosser Theorem
4.5 Exercises

5 Combinatory algebras
5.1 Applicative structures
5.2 Combinatory completeness
5.3 Combinatory algebras
5.4 The failure of soundness for combinatory algebras
5.5 Lambda algebras
5.6 Extensional combinatory algebras

6 Simply-typed lambda calculus, propositional logic, and the Curry-Howard isomorphism
6.1 Simple types and simply-typed terms
6.2 Connections to propositional logic
6.3 Propositional intuitionistic logic
6.4 An alternative presentation of natural deduction
6.5 The Curry-Howard Isomorphism
6.6 Reductions in the simply-typed lambda calculus
6.7 A word on Church-Rosser
6.8 Reduction as proof simplification
6.9 Getting mileage out of the Curry-Howard isomorphism
6.10 Disjunction and sum types
6.11 Classical logic vs. intuitionistic logic
6.12 Classical logic and the Curry-Howard isomorphism

7 Weak and strong normalization
7.1 Definitions
7.2 Weak and strong normalization in typed lambda calculus

8 Polymorphism
8.1 Syntax of System F
8.2 Reduction rules
8.3 Examples
8.3.1 Booleans
8.3.2 Natural numbers
8.3.3 Pairs
8.4 Church-Rosser property and strong normalization
8.5 The Curry-Howard isomorphism
8.6 Supplying the missing logical connectives
8.7 Normal forms and long normal forms
8.8 The structure of closed normal forms
8.9 Application: representation of arbitrary data in System F

9 Type inference
9.1 Principal types
9.2 Type templates and type substitutions
9.3 Unifiers
9.4 The unification algorithm
9.5 The type inference algorithm

10 Denotational semantics
10.1 Set-theoretic interpretation
10.2 Soundness
10.3 Completeness

11 The language PCF
11.1 Syntax and typing rules
11.2 Axiomatic equivalence
11.3 Operational semantics
11.4 Big-step semantics
11.5 Operational equivalence
11.6 Operational approximation
11.7 Discussion of operational equivalence
11.8 Operational equivalence and parallel or

12 Complete partial orders
12.1 Why are sets not enough, in general?
12.2 Complete partial orders
12.3 Properties of limits
12.4 Continuous functions
12.5 Pointed cpo's and strict functions
12.6 Products and function spaces
12.7 The interpretation of the simply-typed lambda calculus in complete partial orders
12.8 Cpo's and fixed points
12.9 Example: Streams

13 Denotational semantics of PCF
13.1 Soundness and adequacy
13.2 Full abstraction

14 Acknowledgements

15 Bibliography

1 Introduction

1.1 Extensional vs. intensional view of functions

What is a function? In modern mathematics, the prevalent notion is that of "functions as graphs": each function f has a fixed domain X and codomain Y, and a function f : X → Y is a set of pairs f ⊆ X × Y such that for each x ∈ X, there exists exactly one y ∈ Y such that (x, y) ∈ f. Two functions f, g : X → Y are considered equal if they yield the same output on each input, i.e., f(x) = g(x) for all x ∈ X.
This is called the extensional view of functions, because it specifies that the only thing observable about a function is how it maps inputs to outputs.

However, before the 20th century, functions were rarely looked at in this way. An older notion of functions is that of "functions as rules". In this view, to give a function means to give a rule for how the function is to be calculated. Often, such a rule can be given by a formula, for instance, the familiar f(x) = x² or g(x) = sin(eˣ) from calculus. As before, two functions are extensionally equal if they have the same input-output behavior; but now we can also speak of another notion of equality: two functions are intensionally¹ equal if they are given by (essentially) the same formula.

When we think of functions as given by formulas, it is not always necessary to know the domain and codomain of a function. Consider for instance the function f(x) = x. This is, of course, the identity function. We may regard it as a function f : X → X for any set X.

In most of mathematics, the "functions as graphs" paradigm is the most elegant and appropriate way of dealing with functions. Graphs define a more general class of functions, because it includes functions that are not necessarily given by a rule. Thus, when we prove a mathematical statement such as "any differentiable function is continuous", we really mean this is true for all functions (in the mathematical sense), not just those functions for which a rule can be given.

On the other hand, in computer science, the "functions as rules" paradigm is often more appropriate. Think of a computer program as defining a function that maps input to output.
Most computer programmers (and users) do not only care about the extensional behavior of a program (which inputs are mapped to which outputs), but also about how the output is calculated: How much time does it take? How much memory and disk space is used in the process? How much communication bandwidth is used? These are intensional questions having to do with the particular way in which a function was defined.

¹Note that this word is intentionally spelled "intensionally".

1.2 The lambda calculus

The lambda calculus is a theory of functions as formulas. It is a system for manipulating functions as expressions.

Let us begin by looking at another well-known language of expressions, namely arithmetic. Arithmetic expressions are made up from variables (x, y, z, . . .), numbers (1, 2, 3, . . .), and operators ("+", "−", "×", etc.). An expression such as x + y stands for the result of an addition (as opposed to an instruction to add, or the statement that something is being added). The great advantage of this language is that expressions can be nested without any need to mention the intermediate results explicitly. So for instance, we write

A = (x + y) × z²,

and not

let w = x + y, then let u = z², then let A = w × u.

The latter notation would be tiring and cumbersome to manipulate.

The lambda calculus extends the idea of an expression language to include functions. Where we normally write

Let f be the function x ↦ x². Then consider A = f(5),

in the lambda calculus we just write

A = (λx.x²)(5).

The expression λx.x² stands for the function that maps x to x² (as opposed to the statement that x is being mapped to x²). As in arithmetic, we use parentheses to group terms.

It is understood that the variable x is a local variable in the term λx.x². Thus, it does not make any difference if we write λy.y² instead.
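As an aside (not part of the notes), Python's own "lambda" syntax mirrors the notation closely, so the example A = (λx.x²)(5) can be run directly; the variable names below are ours:

```python
# The lambda term (λx.x²) applied to 5, written as a Python expression.
A = (lambda x: x ** 2)(5)
print(A)  # → 25

# The nested arithmetic example (x + y) × z² is likewise a single expression,
# with no intermediate names needed.
x, y, z = 1, 2, 3
A2 = (x + y) * z ** 2
print(A2)  # → 27
```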
A local variable is also called a bound variable.

One advantage of the lambda notation is that it allows us to easily talk about higher-order functions, i.e., functions whose inputs and/or outputs are themselves functions. An example is the operation f ↦ f ∘ f in mathematics, which takes a function f and maps it to f ∘ f, the composition of f with itself. In the lambda calculus, f ∘ f is written as

λx.f(f(x)),

and the operation that maps f to f ∘ f is written as

λf.λx.f(f(x)).

The evaluation of higher-order functions can get somewhat complex; as an example, consider the following expression:

(λf.λx.f(f(x)))(λy.y²)(5)

Convince yourself that this evaluates to 625. Another example is given in the following exercise:

Exercise 1. Evaluate the lambda expression

(λf.λx.f(f(f(x)))) (λg.λy.g(g(y))) (λz.z + 1) (0).

We will soon introduce some conventions for reducing the number of parentheses in such expressions.

1.3 Untyped vs. typed lambda-calculi

We have already mentioned that, when considering "functions as rules", it is not always necessary to know the domain and codomain of a function ahead of time. The simplest example is the identity function f = λx.x, which can have any set X as its domain and codomain, as long as domain and codomain are equal. We say that f has the type X → X. Another example is the function g = λf.λx.f(f(x)) that we encountered above. One can check that g maps any function f : X → X to a function g(f) : X → X. In this case, we say that the type of g is (X → X) → (X → X).

By being flexible about domains and codomains, we are able to manipulate functions in ways that would not be possible in ordinary mathematics. For instance, if f = λx.x is the identity function, then we have f(x) = x for any x. In particular, we can take x = f, and we get f(f) = (λx.x)(f) = f.
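Both claims can be checked mechanically. The following sketch again uses Python lambdas as a stand-in for lambda terms (the names are ours):

```python
# The higher-order example: (λf.λx.f(f(x)))(λy.y²)(5).
twice = lambda f: lambda x: f(f(x))   # λf.λx.f(f(x))
square = lambda y: y ** 2             # λy.y²
print(twice(square)(5))  # → 625, since (5²)² = 625

# Self-application: with f = λx.x, the application f(f) makes sense
# and yields f itself, exactly as computed above.
ident = lambda x: x
print(ident(ident) is ident)  # → True
```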
Note that the equation f(f) = f never makes sense in ordinary mathematics, since it is not possible (for set-theoretic reasons) for a function to be included in its own domain.

As another example, let ω = λx.x(x).

Exercise 2. What is ω(ω)?

We have several options regarding types in the lambda calculus.

• Untyped lambda calculus. In the untyped lambda calculus, we never specify the type of any expression. Thus we never specify the domain or codomain of any function. This gives us maximal flexibility. It is also very unsafe, because we might run into situations where we try to apply a function to an argument that it does not understand.

• Simply-typed lambda calculus. In the simply-typed lambda calculus, we always completely specify the type of every expression. This is very similar to the situation in set theory. We never allow the application of a function to an argument unless the type of the argument is the same as the domain of the function. Thus, terms such as f(f) are ruled out, even if f is the identity function.

• Polymorphically typed lambda calculus. This is an intermediate situation, where we may specify, for instance, that a term has a type of the form X → X for all X, without actually specifying X.

As we will see, each of these alternatives has dramatically different properties from the others.

1.4 Lambda calculus and computability

In the 1930s, several people were interested in the question: what does it mean for a function f : N → N to be computable? An informal definition of computability is that there should be a pencil-and-paper method allowing a trained person to calculate f(n), for any given n. The concept of a pencil-and-paper method is not so easy to formalize. Three different researchers attempted to do so, resulting in the following definitions of computability:
1. Turing defined an idealized computer we now call a Turing machine, and postulated that a function is computable (in the intuitive sense) if and only if it can be computed by such a machine.

2. Gödel defined the class of general recursive functions as the smallest set of functions containing all the constant functions, the successor function, and closed under certain operations (such as compositions and recursion). He postulated that a function is computable (in the intuitive sense) if and only if it is general recursive.

3. Church defined an idealized programming language called the lambda calculus, and postulated that a function is computable (in the intuitive sense) if and only if it can be written as a lambda term.

It was proved by Church, Kleene, Rosser, and Turing that all three computational models were equivalent to each other, i.e., each model defines the same class of computable functions. Whether or not they are equivalent to the "intuitive" notion of computability is a question that cannot be answered, because there is no formal definition of "intuitive computability". The assertion that they are in fact equivalent to intuitive computability is known as the Church-Turing thesis.

1.5 Connections to computer science

The lambda calculus is a very idealized programming language; arguably, it is the simplest possible programming language that is Turing complete. Because of its simplicity, it is a useful tool for defining and proving properties of programs.

Many real-world programming languages can be regarded as extensions of the lambda calculus. This is true for all functional programming languages, a class that includes Lisp, Scheme, Haskell, and ML. Such languages combine the lambda calculus with additional features, such as data types, input/output, side effects, updatable memory, object-oriented features, etc.
The lambda calculus provides a vehicle for studying such extensions, in isolation and jointly, to see how they will affect each other, and to prove properties of programming languages (such as: a well-formed program will not crash).

The lambda calculus is also a tool used in compiler construction, see e.g. [8, 9].

1.6 Connections to logic

In the 19th and early 20th centuries, there was a philosophical dispute among mathematicians about what a proof is. The so-called constructivists, such as Brouwer and Heyting, believed that to prove that a mathematical object exists, one must be able to construct it explicitly. Classical logicians, such as Hilbert, held that it is sufficient to derive a contradiction from the assumption that it doesn't exist.

Ironically, one of the better-known examples of a proof that isn't constructive is Brouwer's proof of his own fixed point theorem, which states that every continuous function on the unit disk has a fixed point. The proof is by contradiction and does not give any information on the location of the fixed point.

The connection between lambda calculus and constructive logics is via the "proofs-as-programs" paradigm. To a constructivist, a proof (of an existence statement) must be a "construction", i.e., a program. The lambda calculus is a notation for such programs, and it can also be used as a notation for (constructive) proofs.

For the most part, constructivism has not prevailed as a philosophy in mainstream mathematics. However, there has been renewed interest in constructivism in the second half of the 20th century. The reason is that constructive proofs give more information than classical ones, and in particular, they allow one to compute solutions to problems (as opposed to merely knowing the existence of a solution). The resulting algorithms can be useful in computational mathematics, for instance in computer algebra systems.
1.7 Connections to mathematics

One way to study the lambda calculus is to give mathematical models of it, i.e., to provide spaces in which lambda terms can be given meaning. Such models are constructed using methods from algebra, partially ordered sets, topology, category theory, and other areas of mathematics.

2 The untyped lambda calculus

2.1 Syntax

The lambda calculus is a formal language. The expressions of the language are called lambda terms, and we will give rules for manipulating them.

Definition. Assume given an infinite set V of variables, denoted by x, y, z, etc. The set of lambda terms is given by the following Backus-Naur Form:

Lambda terms: M, N ::= x | (M N) | (λx.M)

The above Backus-Naur Form (BNF) is a convenient abbreviation for the following equivalent, more traditionally mathematical definition:

Definition. Assume given an infinite set V of variables. Let A be an alphabet consisting of the elements of V, and the special symbols "(", ")", "λ", and ".". Let A* be the set of strings (finite sequences) over the alphabet A. The set of lambda terms is the smallest subset Λ ⊆ A* such that:

• Whenever x ∈ V then x ∈ Λ.
• Whenever M, N ∈ Λ then (M N) ∈ Λ.
• Whenever x ∈ V and M ∈ Λ then (λx.M) ∈ Λ.

Comparing the two equivalent definitions, we see that the Backus-Naur Form is a convenient notation because: (1) the definition of the alphabet can be left implicit, (2) the use of distinct meta-symbols for different syntactic classes (x, y, z for variables and M, N for terms) eliminates the need to explicitly quantify over the sets V and Λ. In the future, we will always present syntactic definitions in the BNF style. The following are some examples of lambda terms:

(λx.x)    ((λx.(xx))(λy.(yy)))    (λf.(λx.(f(f x))))

Note that in the definition of lambda terms, we have built in enough mandatory parentheses to ensure that every term M ∈ Λ can be uniquely decomposed into subterms. This means, each term M ∈ Λ is of precisely one of the forms x, (M N), (λx.M). Terms of these three forms are called variables, applications, and lambda abstractions, respectively.

We use the notation (M N), rather than M(N), to denote the application of a function M to an argument N. Thus, in the lambda calculus, we write (f x) instead of the more traditional f(x). This allows us to economize more efficiently on the use of parentheses. To avoid having to write an excessive number of parentheses, we establish the following conventions for writing lambda terms:

Convention.

• We omit outermost parentheses. For instance, we write M N instead of (M N).

• Applications associate to the left; thus, M N P means (M N) P. This is convenient when applying a function to a number of arguments, as in f x y z, which means ((f x) y) z.

• The body of a lambda abstraction (the part after the dot) extends as far to the right as possible. In particular, λx.M N means λx.(M N), and not (λx.M) N.

• Multiple lambda abstractions can be contracted; thus λxyz.M will abbreviate λx.λy.λz.M.

It is important to note that this convention is only for notational convenience; it does not affect the "official" definition of lambda terms.

Exercise 3. (a) Write the following terms with as few parentheses as possible, without changing the meaning or structure of the terms:

(i) (λx.(λy.(λz.((xz)(yz))))),
(ii) (((ab)(cd))((ef)(gh))),
(iii) (λx.((λy.(yx))(λv.v)z)u)(λw.w).

(b) Restore all the dropped parentheses in the following terms, without changing the meaning or structure of the terms:

(i) xxxx,
(ii) λx.xλy.y,
(iii) λx.(xλy.yxx)x.
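The BNF above can be transcribed into a concrete data representation. The following sketch uses a hypothetical Python tuple encoding (one tuple shape per syntactic class, names ours), together with a printer that restores all mandatory parentheses:

```python
# Tuple encoding of the three BNF clauses for lambda terms:
#   variables:           ('var', x)
#   applications (M N):  ('app', M, N)
#   abstractions (λx.M): ('lam', x, M)

identity = ('lam', 'x', ('var', 'x'))                        # λx.x
self_app = ('lam', 'x', ('app', ('var','x'), ('var','x')))   # λx.xx

def show(t):
    """Print a term with full (mandatory) parentheses, as in the
    official definition; unique decomposition makes this unambiguous."""
    if t[0] == 'var':
        return t[1]
    if t[0] == 'app':
        return f"({show(t[1])} {show(t[2])})"
    return f"(λ{t[1]}.{show(t[2])})"

print(show(('app', self_app, self_app)))  # → ((λx.(x x)) (λx.(x x)))
```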
2.2 Free and bound variables, α-equivalence

In our informal discussion of lambda terms, we have already pointed out that the terms λx.x and λy.y, which differ only in the name of their bound variable, are essentially the same. We will say that such terms are α-equivalent, and we write M =α N. In the rare event that we want to say that two terms are precisely equal, symbol for symbol, we say that M and N are identical and we write M ≡ N. We reserve "=" as a generic symbol used for different purposes.

An occurrence of a variable x inside a term of the form λx.N is said to be bound. The corresponding λx is called a binder, and we say that the subterm N is the scope of the binder. A variable occurrence that is not bound is free. Thus, for example, in the term

M ≡ (λx.xy)(λy.yz),

x is bound, but z is free. The variable y has both a free and a bound occurrence. The set of free variables of M is {y, z}.

More generally, the set of free variables of a term M is denoted FV(M), and it is defined formally as follows:

FV(x) = {x},
FV(MN) = FV(M) ∪ FV(N),
FV(λx.M) = FV(M) \ {x}.

This definition is an example of a definition by recursion on terms. In other words, in defining FV(M), we assume that we have already defined FV(N) for all subterms of M. We will often encounter such recursive definitions, as well as inductive proofs.

Before we can formally define α-equivalence, we need to define what it means to rename a variable in a term. If x, y are variables, and M is a term, we write M{y/x} for the result of renaming x as y in M. Renaming is formally defined as follows:

x{y/x} ≡ y,
z{y/x} ≡ z, if x ≠ z,
(MN){y/x} ≡ (M{y/x})(N{y/x}),
(λx.M){y/x} ≡ λy.(M{y/x}),
(λz.M){y/x} ≡ λz.(M{y/x}), if x ≠ z.
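Both FV and renaming transcribe clause by clause into code. The sketch below uses a hypothetical tuple encoding of terms, ('var', x) / ('app', M, N) / ('lam', x, M); the encoding and names are ours:

```python
def FV(t):
    """The set of free variables, one branch per clause of the definition."""
    if t[0] == 'var':
        return {t[1]}                    # FV(x) = {x}
    if t[0] == 'app':
        return FV(t[1]) | FV(t[2])       # FV(MN) = FV(M) ∪ FV(N)
    return FV(t[2]) - {t[1]}             # FV(λx.M) = FV(M) \ {x}

def rename(t, x, y):
    """M{y/x}: replace ALL occurrences of x by y, even binding ones."""
    if t[0] == 'var':
        return ('var', y) if t[1] == x else t
    if t[0] == 'app':
        return ('app', rename(t[1], x, y), rename(t[2], x, y))
    z, body = t[1], t[2]
    return ('lam', y if z == x else z, rename(body, x, y))

# The example M ≡ (λx.xy)(λy.yz) from the text has FV(M) = {y, z}:
M = ('app',
     ('lam', 'x', ('app', ('var','x'), ('var','y'))),
     ('lam', 'y', ('app', ('var','y'), ('var','z'))))
print(FV(M) == {'y', 'z'})  # → True
```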
Note that this kind of renaming replaces all occurrences of x by y, whether free, bound, or binding. We will only apply it in cases where y does not already occur in M.

Finally, we are in a position to formally define what it means for two terms to be "the same up to renaming of bound variables":

Definition. We define α-equivalence to be the smallest congruence relation =α on lambda terms, such that for all terms M and all variables y that do not occur in M,

λx.M =α λy.(M{y/x}).

Recall that a relation on lambda terms is an equivalence relation if it satisfies rules (refl), (symm), and (trans). It is a congruence if it also satisfies rules (cong) and (ξ). Thus, by definition, α-equivalence is the smallest relation on lambda terms satisfying the six rules in Table 1.

(refl)   M = M

(symm)   M = N  implies  N = M

(trans)  M = N and N = P  implies  M = P

(cong)   M = M′ and N = N′  implies  MN = M′N′

(ξ)      M = M′  implies  λx.M = λx.M′

(α)      y ∉ M  implies  λx.M = λy.(M{y/x})

Table 1: The rules for alpha-equivalence

It is easy to prove by induction that any lambda term is α-equivalent to another term in which the names of all bound variables are distinct from each other and from any free variables. Thus, when we manipulate lambda terms in theory and in practice, we can (and will) always assume without loss of generality that bound variables have been renamed to be distinct. This convention is called Barendregt's variable convention.

As a remark, the notions of free and bound variables and α-equivalence are of course not particular to the lambda calculus; they appear in many standard mathematical notations, as well as in computer science. Here are four examples where the variable x is bound.
∫₀¹ x² dx        Σ_{x=1}^{10} 1/x        lim_{x→∞} e⁻ˣ        int succ(int x) { return x+1; }

2.3 Substitution

In the previous section, we defined a renaming operation, which allowed us to replace a variable by another variable in a lambda term. Now we turn to a less trivial operation, called substitution, which allows us to replace a variable by a lambda term. We will write M[N/x] for the result of replacing x by N in M. The definition of substitution is complicated by two circumstances:

1. We should only replace free variables. This is because the names of bound variables are considered immaterial, and should not affect the result of a substitution. Thus, x(λxy.x)[N/x] is N(λxy.x), and not N(λxy.N).

2. We need to avoid unintended "capture" of free variables. Consider for example the term M ≡ λx.yx, and let N ≡ λz.xz. Note that x is free in N and bound in M. What should be the result of substituting N for y in M? If we do this naively, we get

M[N/y] = (λx.yx)[N/y] = λx.Nx = λx.(λz.xz)x.

However, this is not what we intended, since the variable x was free in N, and during the substitution, it got bound. We need to account for the fact that the x that was bound in M was not the "same" x as the one that was free in N. The proper thing to do is to rename the bound variable before the substitution:

M[N/y] = (λx′.yx′)[N/y] = λx′.Nx′ = λx′.(λz.xz)x′.

Thus, the operation of substitution forces us to sometimes rename a bound variable. In this case, it is best to pick a variable from V that has not been used yet as the new name of the bound variable. A variable that is currently unused is called fresh. The reason we stipulated that the set V is infinite was to make sure a fresh variable is always available when we need one.

Definition.
The (capture-avoiding) substitution of N for free occurrences of x in M, in symbols M[N/x], is defined as follows:

x[N/x] ≡ N,
y[N/x] ≡ y, if x ≠ y,
(MP)[N/x] ≡ (M[N/x])(P[N/x]),
(λx.M)[N/x] ≡ λx.M,
(λy.M)[N/x] ≡ λy.(M[N/x]), if x ≠ y and y ∉ FV(N),
(λy.M)[N/x] ≡ λy′.(M{y′/y}[N/x]), if x ≠ y, y ∈ FV(N), and y′ fresh.

This definition has one technical flaw: in the last clause, we did not specify which fresh variable to pick, and thus, technically, substitution is not well-defined. One way to solve this problem is to declare all lambda terms to be identified up to α-equivalence, and to prove that substitution is in fact well-defined modulo α-equivalence. Another way would be to specify which variable y′ to choose: for instance, assume that there is a well-ordering on the set V of variables, and stipulate that y′ should be chosen to be the least variable that does not occur in either M or N.

2.4 Introduction to β-reduction

Convention. From now on, unless stated otherwise, we identify lambda terms up to α-equivalence. This means, when we speak of lambda terms being "equal", we mean that they are α-equivalent. Formally, we regard lambda terms as equivalence classes modulo α-equivalence. We will often use the ordinary equality symbol M = N to denote α-equivalence.

The process of evaluating lambda terms by "plugging arguments into functions" is called β-reduction. A term of the form (λx.M)N, which consists of a lambda abstraction applied to another term, is called a β-redex. We say that it reduces to M[N/x], and we call the latter term the reduct. We reduce lambda terms by finding a subterm that is a redex, and then replacing that redex by its reduct. We repeat this as many times as we like, or until there are no more redexes left to reduce.
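The procedure just described — find a redex (λx.M)N, replace it by M[N/x] — can be made executable. The sketch below uses a hypothetical tuple encoding of terms, ('var', x) / ('app', M, N) / ('lam', x, M), restates FV and renaming so as to be self-contained, and implements the capture-avoiding substitution clauses of Section 2.3, choosing the least fresh variable x0, x1, . . . as suggested above. It is a sketch, not the notes' official machinery:

```python
import itertools

def FV(t):
    if t[0] == 'var': return {t[1]}
    if t[0] == 'app': return FV(t[1]) | FV(t[2])
    return FV(t[2]) - {t[1]}

def rename(t, x, y):
    # M{y/x}: replace all occurrences of x by y, including binders
    if t[0] == 'var': return ('var', y) if t[1] == x else t
    if t[0] == 'app': return ('app', rename(t[1], x, y), rename(t[2], x, y))
    return ('lam', y if t[1] == x else t[1], rename(t[2], x, y))

def fresh(avoid):
    # the least variable x0, x1, ... not occurring in `avoid`
    return next(f"x{i}" for i in itertools.count() if f"x{i}" not in avoid)

def subst(m, n, x):
    """M[N/x], clause by clause from the definition of substitution."""
    if m[0] == 'var':
        return n if m[1] == x else m
    if m[0] == 'app':
        return ('app', subst(m[1], n, x), subst(m[2], n, x))
    y, body = m[1], m[2]
    if y == x:
        return m                                   # x is shadowed: λx.M stays
    if y not in FV(n):
        return ('lam', y, subst(body, n, x))
    y2 = fresh(FV(n) | FV(body) | {x})             # capture: rename first
    return ('lam', y2, subst(rename(body, y, y2), n, x))

def step(t):
    """One leftmost β-step, or None if t contains no redex."""
    if t[0] == 'app':
        if t[1][0] == 'lam':                       # (λx.M)N reduces to M[N/x]
            return subst(t[1][2], t[2], t[1][1])
        s = step(t[1])
        if s is not None:
            return ('app', s, t[2])
        s = step(t[2])
        return None if s is None else ('app', t[1], s)
    if t[0] == 'lam':
        s = step(t[2])
        return None if s is None else ('lam', t[1], s)
    return None

# The capture example from the text: (λx.yx)[N/y] with N ≡ λz.xz.
M = ('lam', 'x', ('app', ('var','y'), ('var','x')))
N = ('lam', 'z', ('app', ('var','x'), ('var','z')))
print(subst(M, N, 'y'))   # the bound x is renamed to a fresh x0 first
```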
A lambda term without any β-redexes is said to be in β-normal form. For example, the lambda term (λx.y)((λz.zz)(λw.w)) can be reduced as follows, reducing one redex at each step:

(λx.y)((λz.zz)(λw.w))
→β (λx.y)((λw.w)(λw.w))
→β (λx.y)(λw.w)
→β y.

The last term, y, has no redexes and is thus in normal form. We could reduce the same term differently, by choosing the redexes in a different order:

(λx.y)((λz.zz)(λw.w)) →β y.

As we can see from this example:

- reducing a redex can create new redexes,
- reducing a redex can delete some other redexes,
- the number of steps that it takes to reach a normal form can vary, depending on the order in which the redexes are reduced.

We can also see that the final result, y, does not seem to depend on the order in which the redexes are reduced. In fact, this is true in general, as we will prove later.

If M and M′ are terms such that M →→β M′, and if M′ is in normal form, then we say that M evaluates to M′.

Not every term evaluates to something; some terms can be reduced forever without reaching a normal form. The following is an example:

(λx.xx)(λy.yyy)
→β (λy.yyy)(λy.yyy)
→β (λy.yyy)(λy.yyy)(λy.yyy)
→β . . .

This example also shows that the size of a lambda term need not decrease during reduction; it can increase, or remain the same. The term (λx.xx)(λx.xx), which we encountered in Section 1, is another example of a lambda term that does not reach a normal form.

2.5 Formal definitions of β-reduction and β-equivalence

The concept of β-reduction can be defined formally as follows:

Definition.
We define single-step β-reduction to be the smallest relation →β on terms satisfying:

(β)      (λx.M)N →β M[N/x];
(cong1)  if M →β M′, then MN →β M′N;
(cong2)  if N →β N′, then MN →β MN′;
(ξ)      if M →β M′, then λx.M →β λx.M′.

Thus, M →β M′ iff M′ is obtained from M by reducing a single β-redex of M.

Definition. We write M →→β M′ if M reduces to M′ in zero or more steps. Formally, →→β is defined to be the reflexive transitive closure of →β, i.e., the smallest reflexive transitive relation containing →β.

Finally, β-equivalence is obtained by allowing reduction steps as well as inverse reduction steps, i.e., by making →β symmetric:

Definition. We write M =β M′ if M can be transformed into M′ by zero or more reduction steps and/or inverse reduction steps. Formally, =β is defined to be the reflexive symmetric transitive closure of →β, i.e., the smallest equivalence relation containing →β.

Exercise 4. This definition of β-equivalence is slightly different from the one given in class. Prove that they are in fact the same.

3 Programming in the untyped lambda calculus

One of the amazing facts about the untyped lambda calculus is that we can use it to encode data, such as booleans and natural numbers, as well as programs that operate on the data. This can be done purely within the lambda calculus, without adding any additional syntax or axioms.

We will often have occasion to give names to particular lambda terms; we will usually use boldface letters for such names.

3.1 Booleans

We begin by defining two lambda terms to encode the truth values "true" and "false":

T = λxy.x
F = λxy.y

Let and be the term λab.abF. Verify the following:

and T T →→β T
and T F →→β F
and F T →→β F
and F F →→β F

Note that T and F are normal forms, so we can really say that a term such as and T T evaluates to T. We say that and encodes the boolean function "and".
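As a quick sanity check, the boolean encodings can be transcribed directly into Python lambdas, with Python's lambda playing the role of λ-abstraction; the uppercase names and the decode helper are my own additions for inspecting results.

```python
# Church booleans as Python lambdas: T = λxy.x, F = λxy.y, and = λab.abF
T = lambda x: lambda y: x
F = lambda x: lambda y: y
AND = lambda a: lambda b: a(b)(F)

# helper (not itself a lambda term): decode a Church boolean to a Python bool
def decode(b):
    return b(True)(False)

print(decode(AND(T)(T)))  # True
print(decode(AND(T)(F)))  # False
print(decode(AND(F)(T)))  # False
print(decode(AND(F)(F)))  # False
```

Python evaluates call-by-value rather than by β-reduction, but since all of these terms terminate, the results agree with the reductions stated in the notes.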
It is understood that this coding is with respect to the particular coding of "true" and "false". We don't claim that and M N evaluates to anything meaningful if M or N are terms other than T and F.

Incidentally, there is nothing unique about the term λab.abF. It is one of many possible ways of encoding the "and" function. Another possibility is λab.bab.

Exercise 5. Find lambda terms or and not that encode the boolean functions "or" and "not". Can you find more than one term?

Moreover, we define the term if_then_else = λx.x. This term behaves like an "if-then-else" function — specifically, we have

if_then_else T M N →→β M
if_then_else F M N →→β N

for all lambda terms M, N.

3.2 Natural numbers

If f and x are lambda terms, and n ≥ 0 a natural number, write fⁿx for the term f(f(...(f x)...)), where f occurs n times. For each natural number n, we define a lambda term n, called the nth Church numeral, as n = λfx.fⁿx. Here are the first few Church numerals:

0 = λfx.x
1 = λfx.fx
2 = λfx.f(fx)
3 = λfx.f(f(fx))
...

This particular way of encoding the natural numbers is due to Alonzo Church, who was also the inventor of the lambda calculus. Note that 0 is in fact the same term as F; thus, when interpreting a lambda term, we should know ahead of time whether to interpret the result as a boolean or a numeral.

The successor function can be defined as follows: succ = λnfx.f(nfx). What does this term compute when applied to a numeral?

succ n = (λnfx.f(nfx))(λfx.fⁿx)
       →β λfx.f((λfx.fⁿx)fx)
       →→β λfx.f(fⁿx)
       = λfx.fⁿ⁺¹x
       = n+1

Thus, we have proved that the term succ does indeed encode the successor function, when applied to a numeral.

Here are possible definitions of addition and multiplication:

add = λnmfx.nf(mfx)
mult = λnmf.n(mf).

Exercise 6. (a) Manually evaluate the lambda terms add 2 3 and mult 2 3.
(b) Prove that add n m →→β n+m, for all natural numbers n, m.

(c) Prove that mult n m →→β n·m, for all natural numbers n, m.

Definition. Suppose f : ℕᵏ → ℕ is a k-ary function on the natural numbers, and that M is a lambda term. We say that M (numeralwise) represents f if for all n₁, ..., nₖ ∈ ℕ,

M n₁ ... nₖ →→β f(n₁, ..., nₖ).

This definition makes explicit what it means to be an "encoding". We can say, for instance, that the term add = λnmfx.nf(mfx) represents the addition function. The definition generalizes easily to boolean functions, or functions of other data types.

Often handy is the function iszero from natural numbers to booleans, which is defined by

iszero(0) = true,
iszero(n) = false, if n ≠ 0.

Convince yourself that the following term is a representation of this function:

iszero = λnxy.n(λz.y)x.

Exercise 7. Find lambda terms that represent each of the following functions:

(a) f(n) = (n+3)²,
(b) f(n) = true if n is even, false if n is odd,
(c) exp(n, m) = nᵐ,
(d) pred(n) = n − 1.

Note: part (d) is not easy. In fact, Church believed for a while that it was impossible, until his student Kleene found a solution. (In fact, Kleene said he found the solution while having his wisdom teeth pulled, so his trick for defining the predecessor function is sometimes referred to as the "wisdom teeth trick".)

We have seen how to encode some simple boolean and arithmetic functions. However, we do not yet have a systematic method of constructing such functions. What we need is a mechanism for defining more complicated functions from simple ones. Consider for example the factorial function, defined by:

0! = 1,
n! = n · (n−1)!, if n ≠ 0.

The encoding of such functions in the lambda calculus is the subject of the next section. It is related to the concept of a fixed point.
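The arithmetic encodings of this section can likewise be tried out in Python. In the sketch below, church and decode are my own helpers for converting between Python integers and Church numerals; succ, add, mult, and iszero are direct transcriptions of the terms defined above.

```python
# Church numerals in Python lambdas, checking succ, add, mult, and iszero.
def church(n):
    # build the numeral λfx.f^n x for a Python integer n >= 0
    return lambda f: lambda x: x if n == 0 else f(church(n - 1)(f)(x))

def decode(num):
    # apply the numeral to the real successor function and 0
    return num(lambda k: k + 1)(0)

succ   = lambda n: lambda f: lambda x: f(n(f)(x))              # λnfx.f(nfx)
add    = lambda n: lambda m: lambda f: lambda x: n(f)(m(f)(x)) # λnmfx.nf(mfx)
mult   = lambda n: lambda m: lambda f: n(m(f))                 # λnmf.n(mf)
iszero = lambda n: lambda x: lambda y: n(lambda z: y)(x)       # λnxy.n(λz.y)x

print(decode(add(church(2))(church(3))))    # 5
print(decode(mult(church(2))(church(3))))   # 6
print(iszero(church(0))(True)(False))       # True
```

Note that iszero(n) is itself a Church boolean: applied to two arguments, it returns the first when n is the numeral 0 and the second otherwise, exactly as the representation definition requires.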
3.3 Fixed points and recursive functions

Suppose f is a function. We say that x is a fixed point of f if f(x) = x. In arithmetic and calculus, some functions have fixed points, while others don't. For instance, f(x) = x² has two fixed points 0 and 1, whereas f(x) = x+1 has no fixed points. Some functions have infinitely many fixed points, notably f(x) = x.

We apply the notion of fixed points to the lambda calculus. If F and N are lambda terms, we say that N is a fixed point of F if FN =β N. The lambda calculus contrasts with arithmetic in that every lambda term has a fixed point. This is perhaps the first surprising fact about the lambda calculus we learn in this course.

Theorem 3.1. In the untyped lambda calculus, every term F has a fixed point.

Proof. Let A = λxy.y(xxy), and define Θ = AA. Now suppose F is any lambda term, and let N = ΘF. We claim that N is a fixed point of F. This is shown by the following calculation:

N = ΘF
  = AAF
  = (λxy.y(xxy))AF
  →→β F(AAF)
  = F(ΘF)
  = FN.

The term Θ used in the proof is called Turing's fixed point combinator.

The importance of fixed points lies in the fact that they allow us to solve equations. After all, finding a fixed point for f is the same thing as solving the equation x = f(x). This covers equations with an arbitrary right-hand side, whose left-hand side is x. From the above theorem, we know that we can always solve such equations in the lambda calculus.

To see how to apply this idea, consider the question from the last section, namely, how to define the factorial function. The most natural definition of the factorial function is recursive, and we can write it in the lambda calculus as follows:

fact n = if_then_else (iszero n)(1)(mult n (fact (pred n)))

Here we have used various abbreviations for lambda terms that were introduced in the previous section.
The evident problem with a recursive definition such as this one is that the term to be defined, fact, appears both on the left- and the right-hand side. In other words, to find fact requires solving an equation!

We now apply our newfound knowledge of how to solve fixed point equations in the lambda calculus. We start by rewriting the problem slightly:

fact = λn. if_then_else (iszero n)(1)(mult n (fact (pred n)))
fact = (λf.λn. if_then_else (iszero n)(1)(mult n (f (pred n)))) fact

Let us temporarily write F for the term

λf.λn. if_then_else (iszero n)(1)(mult n (f (pred n))).

Then the last equation becomes fact = F fact, which is a fixed point equation. We can solve it up to β-equivalence, by letting

fact = ΘF = Θ(λf.λn. if_then_else (iszero n)(1)(mult n (f (pred n))))

Note that fact has disappeared from the right-hand side. The right-hand side is a closed lambda term that represents the factorial function. (A lambda term is called closed if it contains no free variables.)

To see how this definition works in practice, let us evaluate fact 2. Recall from the proof of Theorem 3.1 that ΘF →→β F(ΘF), therefore fact →→β F fact.

fact 2 →→β F fact 2
       →→β if_then_else (iszero 2)(1)(mult 2 (fact (pred 2)))
       →→β if_then_else (F)(1)(mult 2 (fact (pred 2)))
       →→β mult 2 (fact (pred 2))
       →→β mult 2 (fact 1)
       →→β mult 2 (F fact 1)
       →→β ...
       →→β mult 2 (mult 1 (fact 0))
       →→β mult 2 (mult 1 (F fact 0))
       →→β mult 2 (mult 1 (if_then_else (iszero 0)(1)(mult 0 (fact (pred 0)))))
       →→β mult 2 (mult 1 (if_then_else (T)(1)(mult 0 (fact (pred 0)))))
       →→β mult 2 (mult 1 1)
       →→β 2

Note that this calculation, while messy, is completely mechanical. You can easily convince yourself that fact 3 reduces to mult 3 (fact 2), and therefore, by the above calculation, to mult 3 2, and finally to 6.
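The fixed-point construction can also be run in Python, with one caveat: Python evaluates arguments eagerly, so applying Turing's Θ (or Curry's Y) directly would loop forever. The standard workaround, not from these notes, is the call-by-value variant Z = λf.(λx.f(λv.xxv))(λx.f(λv.xxv)), which hides the self-application behind an extra abstraction. For brevity the sketch uses native Python numbers and conditionals instead of the Church encodings.

```python
# The call-by-value fixed point combinator Z (a strict-language variant
# of Θ and Y; the name Z is standard but not from these notes).
Z = lambda f: (lambda x: f(lambda v: x(x)(v)))(lambda x: f(lambda v: x(x)(v)))

# F = λf.λn. if n = 0 then 1 else n * f(n - 1), using native numbers
# in place of iszero, mult, and pred to keep the sketch short
F = lambda f: lambda n: 1 if n == 0 else n * f(n - 1)

# fact = Z F is a closed expression: no recursive reference to `fact`
fact = Z(F)
print(fact(2))  # 2
print(fact(5))  # 120
```

As in the notes, the recursion has disappeared from the right-hand side: fact is defined without ever mentioning itself, and the self-reference is manufactured entirely by the fixed point combinator.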
It is now a matter of a simple induction to prove that fact n →→β n!, for any n.

Exercise 8. Write a lambda term that represents the Fibonacci function, defined by

f(0) = 1, f(1) = 1, f(n+2) = f(n+1) + f(n), for n ≥ 0.

Exercise 9. Write a lambda term that represents the characteristic function of the prime numbers, i.e., f(n) = true if n is prime, and false otherwise.

Exercise 10. We have remarked at the beginning of this section that the number-theoretic function f(x) = x+1 does not have a fixed point. On the other hand, the lambda term F = λx.succ x, which represents the same function, does have a fixed point by Theorem 3.1. How can you reconcile the two statements?

Exercise 11. The first fixed point combinator for the lambda calculus was discovered by Curry. Curry's fixed point combinator, which is also called the paradoxical fixed point combinator, is the term

Y = λf.(λx.f(xx))(λx.f(xx)).

(a) Prove that this is indeed a fixed point combinator, i.e., that YF is a fixed point of F, for any term F.

(b) Turing's fixed point combinator not only satisfies ΘF =β F(ΘF), but also ΘF →→β F(ΘF). We used this fact in evaluating fact 2. Does an analogous property hold for Y? Does this affect the outcome of the evaluation of fact 2?

(c) Can you find another fixed point combinator, besides Curry's and Turing's?

3.4 Other data types: pairs, tuples, lists, trees, etc.

So far, we have discussed lambda terms that represented functions on booleans and natural numbers. However, it is easily possible to encode more general data structures in the untyped lambda calculus. Pairs and tuples are of interest to everybody. The examples of lists and trees are primarily interesting to people with experience in a list-processing language such as LISP or PROLOG; you can safely ignore these examples if you want to.

Pairs.
If M and N are lambda terms, we define the pair ⟨M, N⟩ to be the lambda term λz.zMN. We also define two terms π₁ = λp.p(λxy.x) and π₂ = λp.p(λxy.y). We observe the following:

π₁⟨M, N⟩ →→β M
π₂⟨M, N⟩ →→β N

The terms π₁ and π₂ are called the left and right projections.

Tuples. The encoding of pairs easily extends to arbitrary n-tuples. If M₁, ..., Mₙ are terms, we define the n-tuple ⟨M₁, ..., Mₙ⟩ as the lambda term λz.zM₁...Mₙ, and we define the ith projection πⁿᵢ = λp.p(λx₁...xₙ.xᵢ). Then

πⁿᵢ⟨M₁, ..., Mₙ⟩ →→β Mᵢ, for all 1 ≤ i ≤ n.

Lists. A list is different from a tuple, because its length is not necessarily fixed. A list is either empty ("nil"), or else it consists of a first element (the "head") followed by another list (the "tail"). We write nil for the empty list, and H :: T for the list whose head is H and whose tail is T. So, for instance, the list of the first three numbers can be written as 1 :: (2 :: (3 :: nil)). We usually omit the parentheses, where it is understood that "::" associates to the right. Note that every list ends in nil.

In the lambda calculus, we can define

nil = λxy.y and H :: T = λxy.xHT.

Here is a lambda term that adds a list of numbers:

addlist l = l (λht. add h (addlist t))(0).

Of course, this is a recursive definition, and must be translated into an actual lambda term by the method of Section 3.3. In the definition of addlist, l and t are lists of numbers, and h is a number. If you are very diligent, you can calculate the sum of last weekend's Canadian lottery results by evaluating the term

addlist (4 :: 22 :: 24 :: 32 :: 42 :: 43 :: nil).

Note that lists enable us to give an alternative encoding of the natural numbers: we can encode a natural number as a list of booleans, which we interpret as the binary digits 0 and 1.
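The pair and list encodings transcribe into Python in the same style as before; pair, pi1, pi2, nil, cons, and sumlist are my own Python names for ⟨M,N⟩, π₁, π₂, nil, ::, and addlist, and sumlist uses native Python recursion and numbers in place of the fixed-point construction and Church numerals.

```python
# Church pairs and lists as Python lambdas.
pair = lambda m: lambda n: lambda z: z(m)(n)             # ⟨M,N⟩ = λz.zMN
pi1  = lambda p: p(lambda x: lambda y: x)                # π1 = λp.p(λxy.x)
pi2  = lambda p: p(lambda x: lambda y: y)                # π2 = λp.p(λxy.y)

nil  = lambda x: lambda y: y                             # nil = λxy.y
cons = lambda h: lambda t: lambda x: lambda y: x(h)(t)   # H::T = λxy.xHT

# addlist in the same pattern as the notes: l (λht. h + sum of t) (0),
# written as a recursive Python function for brevity
def sumlist(l):
    return l(lambda h: lambda t: h + sumlist(t))(0)

p = pair(3)(4)
print(pi1(p), pi2(p))                                    # 3 4
print(sumlist(cons(1)(cons(2)(cons(3)(nil)))))           # 6
```

The trick in sumlist is the same one addlist uses: a list is a function that, given a "cons case" and a "nil case", selects the appropriate one itself, so no pattern matching is needed.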
Of course, with this encoding, we would have to carefully redesign our basic functions, such as successor, addition, and multiplication. However, if done properly, such an encoding would be a lot more efficient (in terms of number of β-reductions to be performed) than the encoding by Church numerals.

Trees. A binary tree is a data structure that can be one of two things: either a leaf, labeled by a natural number, or a node, which has a left and a right subtree. We write leaf(N) for a leaf labeled N, and node(L, R) for a node with left subtree L and right subtree R. We can encode trees as lambda terms, for instance as follows:

leaf(n) = λxy.xn,
node(L, R) = λxy.yLR.

As an illustration, here is a program (i.e., a lambda term) that adds all the numbers at the leafs of a given tree:

addtree t = t (λn.n)(λlr. add (addtree l)(addtree r)).

Exercise 12. This is a voluntary programming exercise.

(a) Write a lambda term that calculates the length of a list.

(b) Write a lambda term that calculates the depth (i.e., the nesting level) of a tree. You may need to define a function max that calculates the maximum of two numbers.

(c) Write a lambda term that sorts a list of numbers. You may assume given a term less that compares two numbers.

4 The Church-Rosser Theorem

4.1 Extensionality, η-equivalence, and η-reduction

In the untyped lambda calculus, any term can be applied to another term. Therefore, any term can be regarded as a function. Consider a term M, not containing the variable x, and consider the term M′ = λx.Mx. Then for any argument A, we have MA =β M′A. So in this sense, M and M′ define "the same function". Should M and M′ be considered equivalent as terms?

The answer depends on whether we want to accept the principle that "if M and M′ define the same function, then M and M′ are equal".
This is called the principle of extensionality, and we have already encountered it in Section 1.1. Formally, the extensionality rule is the following:

(ext∀)  if MA = M′A for all terms A, then M = M′.

In the presence of the axioms (ξ), (cong), and (β), it can be easily seen that MA = M′A is true for all terms A if and only if Mx = M′x, where x is a fresh variable. Therefore, we can replace the extensionality rule by the following equivalent, but simpler rule:

(ext)  if Mx = M′x, where x ∉ FV(M, M′), then M = M′.

Note that we can apply the extensionality rule in particular to the case where M′ = λx.Mx, where x is not free in M. As we have remarked above, Mx =β M′x, and thus extensionality implies that M = λx.Mx. This last equation is called the η-law (eta-law):

(η)  M = λx.Mx, where x ∉ FV(M).

In fact, (η) and (ext) are equivalent in the presence of the other axioms of the lambda calculus. We have already seen that (ext) and (β) imply (η). Conversely, assume (η), and assume that Mx = M′x, for some terms M and M′ not containing x freely. Then by (ξ), we have λx.Mx = λx.M′x, hence by (η) and transitivity, M = M′. Thus (ext) holds.

We note that the η-law does not follow from the axioms and rules of the lambda calculus that we have considered so far. In particular, the terms x and λy.xy are not β-equivalent, although they are clearly η-equivalent. We will prove that x ≠β λy.xy in Corollary 4.5 below.

Single-step η-reduction is the smallest relation →η satisfying (cong1), (cong2), (ξ), and the following axiom (which is the same as the η-law, directed right to left):

(η)  λx.Mx →η M, where x ∉ FV(M).

Single-step βη-reduction →βη is defined as the union of the single-step β- and η-reductions, i.e., M →βη M′ iff M →β M′ or M →η M′.
Multi-step η-reduction →→η, multi-step βη-reduction →→βη, as well as η-equivalence =η and βη-equivalence =βη are defined in the obvious way, as we did for β-reduction and equivalence. We also get the evident notions of η-normal form, βη-normal form, etc.

4.2 Statement of the Church-Rosser Theorem, and some consequences

Theorem (Church and Rosser, 1936). Let →→ denote either →→β or →→βη. Suppose M, N, and P are lambda terms such that M →→ N and M →→ P. Then there exists a lambda term Z such that N →→ Z and P →→ Z.

In pictures, the theorem states that the following diagram can always be completed:

[diagram: M reduces (→→) to both N and P; the square is completed by a term Z with N →→ Z and P →→ Z]

This property is called the Church-Rosser property, or confluence. Before we prove the Church-Rosser Theorem, let us highlight some of its consequences.

Corollary 4.1. If M =β N, then there exists some Z with M, N →→β Z. Similarly for βη.

Proof. Please refer to Figure 1 for an illustration of this proof. Recall that =β is the reflexive symmetric transitive closure of →β. Suppose that M =β N. Then there exist n ≥ 0 and terms M₀, ..., Mₙ such that M = M₀, N = Mₙ, and for all i = 1...n, either Mᵢ₋₁ →β Mᵢ or Mᵢ →β Mᵢ₋₁. We prove the claim by induction on n. For n = 0, we have M = N and there is nothing to show. Suppose the claim has been proven for n−1. Then by induction hypothesis, there exists a term Z′ such that M →→β Z′ and Mₙ₋₁ →→β Z′. Further, we know that either N →β Mₙ₋₁ or Mₙ₋₁ →β N. In case N →β Mₙ₋₁, then N →→β Z′, and we are done. In case Mₙ₋₁ →β N, we apply the Church-Rosser Theorem to Mₙ₋₁, Z′, and N to obtain a term Z such that Z′ →→β Z and N →→β Z. Since M →→β Z′ →→β Z, we are done. The proof in the case of βη-reduction is identical.

Corollary 4.2. If N is a β-normal form and N =β M, then M →→β N, and similarly for βη.
Proof. By Corollary 4.1, there exists some Z with M, N →→β Z. But N is a normal form, thus N =α Z.

[Figure 1: The proof of Corollary 4.1 — the zigzag M₀, M₁, ..., Mₙ of single steps is completed, step by step, by terms Z′′, Z′, Z]

Corollary 4.3. If M and N are β-normal forms such that M =β N, then M =α N, and similarly for βη.

Proof. By Corollary 4.2, we have M →→β N, but since M is a normal form, we have M =α N.

Corollary 4.4. If M =β N, then neither or both have a β-normal form. Similarly for βη.

Proof. Suppose that M =β N, and that one of them has a β-normal form. Say, for instance, that M has a normal form Z. Then N =β Z, hence N →→β Z by Corollary 4.2.

Corollary 4.5. The terms x and λy.xy are not β-equivalent. In particular, the η-rule does not follow from the β-rule.

Proof. The terms x and λy.xy are both β-normal forms, and they are not α-equivalent. It follows by Corollary 4.3 that x ≠β λy.xy.

4.3 Preliminary remarks on the proof of the Church-Rosser Theorem

Consider any binary relation → on a set, and let →→ be its reflexive transitive closure. Consider the following three properties of such relations:

(a) if M →→ N and M →→ P, then there exists Z with N →→ Z and P →→ Z;
(b) if M → N and M → P, then there exists Z with N →→ Z and P →→ Z;
(c) if M → N and M → P, then there exists Z with N → Z and P → Z.

Each of these properties states that for all M, N, P, if the solid arrows exist, then there exists Z such that the dotted arrows exist. The only difference between (a), (b), and (c) is the difference between where → and →→ are used. Property (a) is the Church-Rosser property. Property (c) is called the diamond property (because the diagram is shaped like a diamond).

A naive attempt to prove the Church-Rosser Theorem might proceed as follows: first, prove that the relation →β satisfies property (b) (this is relatively easy to prove); then use an inductive argument to conclude that it also satisfies property (a).
Unfortunately, this does not work: the reason is that in general, property (b) does not imply property (a)! An example of a relation that satisfies property (b) but not property (a) is shown in Figure 2. In other words, a proof of property (b) is not sufficient in order to prove property (a).

On the other hand, property (c), the diamond property, does imply property (a). This is very easy to prove by induction, and the proof is illustrated in Figure 3. But unfortunately, β-reduction does not satisfy property (c), so again we are stuck.

To summarize, we are faced with the following dilemma:

• β-reduction satisfies property (b), but property (b) does not imply property (a).
• Property (c) implies property (a), but β-reduction does not satisfy property (c).

On the other hand, it seems hopeless to prove property (a) directly. In the next section, we will solve this dilemma by defining yet another reduction relation ⊲, with the following properties:

• ⊲ satisfies property (c), and
• the transitive closure of ⊲ is the same as that of →β (or →βη).

[Figure 2: An example of a relation that satisfies property (b), but not property (a) — an infinite zigzag of single steps]

[Figure 3: Proof that property (c) implies property (a) — a rectangle of multi-step reductions tiled by single-step diamonds]

4.4 Proof of the Church-Rosser Theorem

In this section, we will prove the Church-Rosser Theorem for βη-reduction. The proof for β-reduction (without η) is very similar, and in fact slightly simpler, so we omit it here.
The proof presented here is due to Tait and Martin-Löf. We begin by defining a new relation M ⊲ M′ on terms, called parallel one-step reduction. We define ⊲ to be the smallest relation satisfying

(1)  x ⊲ x;
(2)  if P ⊲ P′ and N ⊲ N′, then PN ⊲ P′N′;
(3)  if N ⊲ N′, then λx.N ⊲ λx.N′;
(4)  if Q ⊲ Q′ and N ⊲ N′, then (λx.Q)N ⊲ Q′[N′/x];
(5)  if P ⊲ P′, where x ∉ FV(P), then λx.Px ⊲ P′.

Lemma 4.6.
(a) For all M, M′, if M →βη M′ then M ⊲ M′.
(b) For all M, M′, if M ⊲ M′ then M →→βη M′.
(c) →→βη is the reflexive transitive closure of ⊲.

Proof. (a) First note that we have P ⊲ P, for any term P. This is easily shown by induction on P. We now prove the claim by induction on a derivation of M →βη M′. Please refer to Sections 2.5 and 4.1 for the rules that define →βη. We make a case distinction based on the last rule used in the derivation of M →βη M′.

• If the last rule was (β), then M = (λx.Q)N and M′ = Q[N/x], for some Q and N. But then M ⊲ M′ by (4), using the facts Q ⊲ Q and N ⊲ N.

• If the last rule was (η), then M = λx.Px and M′ = P, for some P such that x ∉ FV(P). Then M ⊲ M′ follows from (5), using P ⊲ P.

• If the last rule was (cong1), then M = PN and M′ = P′N, for some P, P′, and N where P →βη P′. By induction hypothesis, P ⊲ P′. From this and N ⊲ N, it follows immediately that M ⊲ M′ by (2).

• If the last rule was (cong2), we proceed similarly to the last case.

• If the last rule was (ξ), then M = λx.N and M′ = λx.N′ for some N and N′ such that N →βη N′. By induction hypothesis, N ⊲ N′, which implies M ⊲ M′ by (3).

(b) We prove this by induction on a derivation of M ⊲ M′. We distinguish several cases, depending on the last rule used in the derivation.

• If the last rule was (1), then M = M′ = x, and we are done because x →→βη x.
• If the last rule was (2), then M = PN and M′ = P′N′, for some P, P′, N, N′ with P ⊲ P′ and N ⊲ N′. By induction hypothesis, P →→βη P′ and N →→βη N′. Since →→βη satisfies (cong), it follows that PN →→βη P′N′, hence M →→βη M′ as desired.

• If the last rule was (3), then M = λx.N and M′ = λx.N′, for some N, N′ with N ⊲ N′. By induction hypothesis, N →→βη N′, hence M = λx.N →→βη λx.N′ = M′ by (ξ).

• If the last rule was (4), then M = (λx.Q)N and M′ = Q′[N′/x], for some Q, Q′, N, N′ with Q ⊲ Q′ and N ⊲ N′. By induction hypothesis, Q →→βη Q′ and N →→βη N′. Therefore M = (λx.Q)N →→βη (λx.Q′)N′ →βη Q′[N′/x] = M′, as desired.

• If the last rule was (5), then M = λx.Px and M′ = P′, for some P, P′ with P ⊲ P′, and x ∉ FV(P). By induction hypothesis, P →→βη P′, hence M = λx.Px →βη P →→βη P′ = M′, as desired.

(c) This follows directly from (a) and (b). Let us write R* for the reflexive transitive closure of a relation R. By (a), we have →βη ⊆ ⊲, hence →→βη = →βη* ⊆ ⊲*. By (b), we have ⊲ ⊆ →→βη, hence ⊲* ⊆ →→βη* = →→βη. It follows that ⊲* = →→βη.

We will soon prove that ⊲ satisfies the diamond property. Note that together with Lemma 4.6(c), this will immediately imply that →→βη satisfies the Church-Rosser property.

Lemma 4.7 (Substitution). If M ⊲ M′ and U ⊲ U′, then M[U/y] ⊲ M′[U′/y].

Proof. We assume without loss of generality that any bound variables of M are different from y and from the free variables of U. The claim is now proved by induction on derivations of M ⊲ M′. We distinguish several cases, depending on the last rule used in the derivation:

• If the last rule was (1), then M = M′ = x, for some variable x. If x = y, then M[U/y] = U ⊲ U′ = M′[U′/y].
If x ≠ y, then by (1), M[U/y] = x ⊲ x = M′[U′/y].

• If the last rule was (2), then M = PN and M′ = P′N′, for some P, P′, N, N′ with P ⊲ P′ and N ⊲ N′. By induction hypothesis, P[U/y] ⊲ P′[U′/y] and N[U/y] ⊲ N′[U′/y], hence by (2), M[U/y] = P[U/y]N[U/y] ⊲ P′[U′/y]N′[U′/y] = M′[U′/y].

• If the last rule was (3), then M = λx.N and M′ = λx.N′, for some N, N′ with N ⊲ N′. By induction hypothesis, N[U/y] ⊲ N′[U′/y], hence by (3), M[U/y] = λx.N[U/y] ⊲ λx.N′[U′/y] = M′[U′/y].

• If the last rule was (4), then M = (λx.Q)N and M′ = Q′[N′/x], for some Q, Q′, N, N′ with Q ⊲ Q′ and N ⊲ N′. By induction hypothesis, Q[U/y] ⊲ Q′[U′/y] and N[U/y] ⊲ N′[U′/y], hence by (4), (λx.Q[U/y])N[U/y] ⊲ Q′[U′/y][N′[U′/y]/x] = Q′[N′/x][U′/y]. Thus M[U/y] ⊲ M′[U′/y].

• If the last rule was (5), then M = λx.Px and M′ = P′, for some P, P′ with P ⊲ P′, and x ∉ FV(P). By induction hypothesis, P[U/y] ⊲ P′[U′/y], hence by (5), M[U/y] = λx.P[U/y]x ⊲ P′[U′/y] = M′[U′/y].

A more conceptual way of looking at this proof is the following: consider any derivation of M ⊲ M′ from axioms (1)–(5). In this derivation, replace any axiom y ⊲ y by U ⊲ U′, and propagate the changes (i.e., replace y by U on the left-hand side, and by U′ on the right-hand side of any ⊲). The result is a derivation of M[U/y] ⊲ M′[U′/y]. (The formal proof that the result of this replacement is indeed a valid derivation requires an induction, and this is the reason why the proof of the substitution lemma is so long.)

Our next goal is to prove that ⊲ satisfies the diamond property. Before proving this, we first define the maximal parallel one-step reduct M* of a term M as follows:

1. x* = x, for a variable.
2. (PN)* = P*N*, if PN is not a β-redex.
3. ((λx.Q)N)* = Q*[N*/x].
4. (λx.N)* = λx.N*, if λx.N is not an η-redex.
5. (λx.Px)* = P*, if x ∉ FV(P).

Note that M* depends only on M. The following lemma implies the diamond property for ⊲.

Lemma 4.8 (Maximal parallel one-step reductions). Whenever M ⊲ M′, then M′ ⊲ M*.

Proof. By induction on the size of M. We distinguish five cases, depending on the last rule used in the derivation of M ⊲ M′. As usual, we assume that all bound variables have been renamed to avoid clashes.

• If the last rule was (1), then M = M′ = x, also M* = x, and we are done.

• If the last rule was (2), then M = PN and M′ = P′N′, where P ⊲ P′ and N ⊲ N′. By induction hypothesis P′ ⊲ P* and N′ ⊲ N*. Two cases:

– If PN is not a β-redex, then M* = P*N*. Thus M′ = P′N′ ⊲ P*N* = M* by (2), and we are done.

– If PN is a β-redex, say P = λx.Q, then M* = Q*[N*/x]. We distinguish two subcases, depending on the last rule used in the derivation of P ⊲ P′:

∗ If the last rule was (3), then P′ = λx.Q′, where Q ⊲ Q′. By induction hypothesis Q′ ⊲ Q*, and with N′ ⊲ N*, it follows that M′ = (λx.Q′)N′ ⊲ Q*[N*/x] = M* by (4).

∗ If the last rule was (5), then P = λx.Rx and P′ = R′, where x ∉ FV(R) and R ⊲ R′. Consider the term Q = Rx. Since Rx ⊲ R′x, and Rx is a subterm of M, by induction hypothesis R′x ⊲ (Rx)*. By the substitution lemma, M′ = R′N′ = (R′x)[N′/x] ⊲ (Rx)*[N*/x] = M*.

• If the last rule was (3), then M = λx.N and M′ = λx.N′, where N ⊲ N′. Two cases:

– If M is not an η-redex, then M* = λx.N*. By induction hypothesis, N′ ⊲ N*, hence M′ ⊲ M* by (3).

– If M is an η-redex, then N = Px, where x ∉ FV(P). In this case, M* = P*.
We distinguish two subcases, depending on the last rule used in the derivation of N ⊲ N′:

∗ If the last rule was (2), then N′ = P′x, where P ⊲ P′. By induction hypothesis P′ ⊲ P*. Hence M′ = λx.P′x ⊲ P* = M* by (5).

∗ If the last rule was (4), then P = λy.Q and N′ = Q′[x/y], where Q ⊲ Q′. Then M′ = λx.Q′[x/y] = λy.Q′ (note x ∉ FV(Q′)). But P ⊲ λy.Q′, hence by induction hypothesis, λy.Q′ ⊲ P* = M*.

• If the last rule was (4), then M = (λx.Q)N and M′ = Q′[N′/x], where Q ⊲ Q′ and N ⊲ N′. Then M* = Q*[N*/x], and M′ ⊲ M* by the substitution lemma.

• If the last rule was (5), then M = λx.Px and M′ = P′, where P ⊲ P′ and x ∉ FV(P). Then M* = P*. By induction hypothesis, P′ ⊲ P*, hence M′ ⊲ M*.

The previous lemma immediately implies the diamond property for ⊲:

Lemma 4.9 (Diamond property for ⊲). If M ⊲ N and M ⊲ P, then there exists Z such that N ⊲ Z and P ⊲ Z.

Proof. Take Z = M*.

Finally, we have a proof of the Church-Rosser Theorem:

Proof of the Church-Rosser Theorem: Since ⊲ satisfies the diamond property, it follows that its reflexive transitive closure ⊲* also satisfies the diamond property, as shown in Figure 3. But ⊲* is the same as →→βη by Lemma 4.6(c), and the diamond property for →→βη is just the Church-Rosser property for →βη.

4.5 Exercises

Exercise 13. Give a detailed proof that property (c) from Section 4.3 implies property (a).

Exercise 14. Prove that M ⊲ M, for all terms M.

Exercise 15. Without using Lemma 4.8, prove that M ⊲ M* for all terms M.

Exercise 16. Let Ω = (λx.xx)(λx.xx). Prove that Ω ≠βη ΩΩ.

Exercise 17. What changes have to be made to Section 4.4 to get a proof of the Church-Rosser Theorem for →β, instead of →βη?

Exercise 18. Recall the properties (a)–(c) of binary relations → that were discussed in Section 4.3.
Consider the following similar property, which is sometimes called the "strip property":

(d) Any two reductions M … N and M … P, one of them a single step, can be completed to a common term Z, via N … Z and P … Z. [The original diagram, which indicates which arrows are single-step and which are multi-step, does not survive in this transcription.]

Does (d) imply (a)? Does (b) imply (d)? In each case, give either a proof or a counterexample.

Exercise 19. To every lambda term M, we may associate a directed graph (with possibly multiple edges and loops) G(M) as follows: (i) the vertices are terms N such that M ↠β N, i.e., all the terms that M can β-reduce to; (ii) the edges are given by single-step β-reduction. Note that the same term may have two (or more) reductions coming from different redexes; each such reduction is a separate edge. For example, let I = λx.x and let M = I(Ix). Then G(M) has vertices I(Ix), Ix, and x, with two separate edges from I(Ix) to Ix (one for each redex) and one edge from Ix to x. We also sometimes write bullets instead of terms: • ⇉ • → •. As another example, let Ω = (λx.xx)(λx.xx). Then G(Ω) is a single vertex with one loop.

(a) Let M = (λx.I(xx))(λx.xx). Find G(M).

(b) For each of the following graphs, find a term M such that G(M) is the given graph, or explain why no such term exists. (Note: the "starting" vertex need not always be the leftmost vertex in the picture.) Warning: some of these terms are tricky to find!

(i)–(vii) [Seven small reduction graphs; the pictures do not survive in this transcription.]

5 Combinatory algebras

To give a model of the lambda calculus means to provide a mathematical space in which the axioms of lambda calculus are satisfied. This usually means that the elements of the space can be understood as functions, and that certain functions can be understood as elements.
Naïvely, one might try to construct a model of lambda calculus by finding a set X such that X is in bijective correspondence with the set X^X of all functions from X to X. This, however, is impossible: for cardinality reasons, the equation X ≅ X^X has no solutions except for a one-element set X = 1. To see this, first note that the empty set ∅ is not a solution. Also, suppose X is a solution with |X| ≥ 2. Then |X^X| ≥ |2^X|, but by Cantor's argument, |2^X| > |X|, hence X^X is of greater cardinality than X, contradicting X ≅ X^X.

There are two main strategies for constructing models of the lambda calculus, and both involve a restriction on the class of functions to make it smaller. The first approach, which will be discussed in this section, uses algebra, and the essential idea is to replace the set X^X of all functions by a smaller, suitably defined set of polynomials. The second approach is to equip the set X with additional structure (such as topology, order structure, etc.), and to replace X^X by a set of structure-preserving functions (for example, continuous functions, monotone functions, etc.).

5.1 Applicative structures

Definition. An applicative structure (A, ·) is a set A together with a binary operation "·". Note that there are no further assumptions; in particular, we do not assume that application is an associative operation. We write ab for a · b, and as in the lambda calculus, we follow the convention of left associativity, i.e., we write abc for (ab)c.

Definition. Let (A, ·) be an applicative structure. A polynomial in a set of variables x1, …, xn and with coefficients in A is a formal expression built from variables and elements of A by means of the application operation.
In other words, the set of polynomials is given by the following grammar:

t, s ::= x | a | ts,

where x ranges over variables and a ranges over the elements of A. We write A{x1, …, xn} for the set of polynomials in variables x1, …, xn with coefficients in A. Here are some examples of polynomials in the variables x, y, z, where a, b ∈ A:

x, xy, axx, (x(y(zb)))(ax).

If t(x1, …, xn) is a polynomial in the indicated variables, and b1, …, bn are elements of A, then we can evaluate the polynomial at the given elements: the evaluation t(b1, …, bn) is the element of A obtained by "plugging" xi = bi into the polynomial, for i = 1, …, n, and evaluating the resulting expression in A. Note that in this way, every polynomial t in n variables can be understood as a function from Aⁿ to A. This is very similar to the usual polynomials in algebra, which can also be understood either as formal expressions or as functions.

If t(x1, …, xn) and s(x1, …, xn) are two polynomials with coefficients in A, we say that the equation t(x1, …, xn) = s(x1, …, xn) holds in A if for all b1, …, bn ∈ A, t(b1, …, bn) = s(b1, …, bn).

5.2 Combinatory completeness

Definition (Combinatory completeness). An applicative structure (A, ·) is combinatorially complete if for every polynomial t(x1, …, xn) of n ≥ 0 variables, there exists some element a ∈ A such that ax1 … xn = t(x1, …, xn) holds in A.

In other words, combinatory completeness means that every polynomial function t(x1, …, xn) can be represented (in curried form) by some element of A. We are therefore setting up a correspondence between functions and elements, as discussed in the introduction of this section.
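To make the definitions concrete, here is a small Python sketch (not from the notes; the representation is our own choice) that treats polynomials as syntax trees and evaluates them over an applicative structure. For illustration we take A = ℤ with the arbitrary, non-associative application a·b = 2a + b; since an applicative structure assumes nothing about "·", any binary operation would do.

```python
# A sketch of polynomials over an applicative structure (A, ·).
# Here A = int and the (arbitrarily chosen) application is a·b = 2*a + b.

def app(a, b):
    return 2 * a + b

# Polynomials as nested tuples:
#   ("var", name)   -- a variable
#   ("const", a)    -- a coefficient from A
#   ("app", t, s)   -- the application t·s
def evaluate(t, env):
    """Evaluate a polynomial at the elements given by env (a dict)."""
    tag = t[0]
    if tag == "var":
        return env[t[1]]
    if tag == "const":
        return t[1]
    _, p, q = t
    return app(evaluate(p, env), evaluate(q, env))

# The polynomial (xy)(ax) with coefficient a = 5:
t = ("app",
     ("app", ("var", "x"), ("var", "y")),
     ("app", ("const", 5), ("var", "x")))

print(evaluate(t, {"x": 1, "y": 3}))  # x·y = 5, a·x = 11, then 2*5 + 11 = 21
```

An equation t = s holds in A precisely if `evaluate(t, env) == evaluate(s, env)` for every environment `env`.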
Note that we do not require the element a to be unique in the definition of combinatory completeness. This means that we are dealing with an intensional view of functions, where a given function might in general have several different names (but see the discussion of extensionality in Section 5.6).

The following theorem characterizes combinatory completeness in terms of a much simpler algebraic condition.

Theorem 5.1. An applicative structure (A, ·) is combinatorially complete if and only if there exist two elements s, k ∈ A, such that the following equations are satisfied for all x, y, z ∈ A:

(1) sxyz = (xz)(yz)
(2) kxy = x

Example 5.2. Before we prove this theorem, let us look at a few examples.

(a) The identity function. Can we find an element i ∈ A such that ix = x for all x? Yes, indeed, we can let i = skk. We check that for all x, skkx = (kx)(kx) = x.

(b) The boolean "true". Can we find an element T such that for all x, y, Txy = x? Yes, this is easy: T = k.

(c) The boolean "false". Can we find F such that Fxy = y? Yes, what we need is Fx = i. Therefore a solution is F = ki. And indeed, for all x, y, we have kixy = iy = y.

(d) Find a function f such that fx = xx for all x. Solution: let f = sii. Then siix = (ix)(ix) = xx.

Proof of Theorem 5.1: The "only if" direction is trivial. If A is combinatorially complete, then consider the polynomial t(x, y, z) = (xz)(yz). By combinatory completeness, there exists some s ∈ A with sxyz = t(x, y, z), and similarly for k.

We thus have to prove the "if" direction. Recall that A{x1, …, xn} is the set of polynomials with variables x1, …, xn. For each polynomial t ∈ A{x, y1, …, yn} in n + 1 variables, we will define a new polynomial λ*x.t ∈ A{y1, …
, yn} in n variables, as follows, by recursion on t:

λ*x.x := i,
λ*x.yi := k yi, where yi ≠ x is a variable,
λ*x.a := k a, where a ∈ A,
λ*x.pq := s(λ*x.p)(λ*x.q).

We claim that for all t, the equation (λ*x.t)x = t holds in A. Indeed, this is easily proved by induction on t, using the definition of λ*:

(λ*x.x)x = ix = x,
(λ*x.yi)x = k yi x = yi,
(λ*x.a)x = k a x = a,
(λ*x.pq)x = s(λ*x.p)(λ*x.q)x = ((λ*x.p)x)((λ*x.q)x) = pq.

Note that the last case uses the induction hypothesis for p and q.

Finally, to prove the theorem, assume that A has elements s, k satisfying equations (1) and (2), and consider a polynomial t ∈ A{x1, …, xn}. We must show that there exists a ∈ A such that ax1 … xn = t holds in A. We let

a = λ*x1. … .λ*xn.t.

Note that a is a polynomial in 0 variables, which we may consider as an element of A. Then from the previous claim, it follows that

ax1 … xn = (λ*x1.λ*x2. … .λ*xn.t)x1x2 … xn = (λ*x2. … .λ*xn.t)x2 … xn = … = (λ*xn.t)xn = t

holds in A.

5.3 Combinatory algebras

By Theorem 5.1, combinatory completeness is equivalent to the existence of the s and k operators. We enshrine this in the following definition:

Definition (Combinatory algebra). A combinatory algebra (A, ·, s, k) is an applicative structure (A, ·) together with elements s, k ∈ A, satisfying the following two axioms:

(1) sxyz = (xz)(yz)
(2) kxy = x

Remark 5.3. The operation λ*, defined in the proof of Theorem 5.1, is defined on the polynomials of any combinatory algebra. It is called the derived lambda abstractor, and it satisfies the law of β-equivalence, i.e., (λ*x.t)b = t[b/x], for all b ∈ A.

Finding actual examples of combinatory algebras is not so easy. Here are some examples:

Example 5.4.
The one-element set A = {∗}, with ∗ · ∗ = ∗, s = ∗, and k = ∗, is a combinatory algebra. It is called the trivial combinatory algebra.

Example 5.5. Recall that Λ is the set of lambda terms. Let A = Λ/=β, the set of lambda terms modulo β-equivalence. Define M · N = MN, S = λxyz.(xz)(yz), and K = λxy.x. Then (A, ·, S, K) is a combinatory algebra. Also note that, by Corollary 4.5, this algebra is non-trivial, i.e., it has more than one element. Similar examples are obtained by replacing =β by =βη, and/or replacing Λ by the set Λ⁰ of closed terms.

Example 5.6. We construct a combinatory algebra of SK-terms as follows. Let V be a given set of variables. The set C of terms of combinatory logic is given by the grammar:

A, B ::= x | S | K | AB,

where x ranges over the elements of V. On C, we define combinatory equivalence =c as the smallest equivalence relation satisfying SABC =c (AC)(BC), KAB =c A, and the rules (cong1) and (cong2) (see page 18). Then the set C/=c is a combinatory algebra (called the free combinatory algebra generated by V, or the term algebra). You will prove in Exercise 20 that it is non-trivial.

Exercise 20. On the set C of combinatory terms, define a notion of single-step reduction by the following laws:

SABC →c (AC)(BC),
KAB →c A,

together with the usual rules (cong1) and (cong2) (see page 18). As in the lambda calculus, we call a term a normal form if it cannot be reduced. Prove that the reduction →c satisfies the Church-Rosser property. (Hint: similarly to the lambda calculus, first define a suitable parallel one-step reduction ⊲ whose reflexive transitive closure is that of →c. Then show that it satisfies the diamond property.)

Corollary 5.7. It immediately follows from the Church-Rosser Theorem for combinatory logic (Exercise 20) that two normal forms are =c-equivalent if and only if they are equal.
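The single-step reduction →c of Exercise 20 is easy to implement. The following Python sketch (our own illustration, with an ad-hoc term representation) contracts the leftmost redex repeatedly, and uses this to check Example 5.2(a): the term i = skk behaves as the identity.

```python
# SK-terms: "S", "K", a variable name, or a tuple (A, B) for the application A·B.

def step(t):
    """Perform one leftmost →c step; return None if t is a normal form."""
    if isinstance(t, tuple):
        # K A B -> A
        if isinstance(t[0], tuple) and t[0][0] == "K":
            return t[0][1]
        # S A B C -> (A C)(B C)
        if (isinstance(t[0], tuple) and isinstance(t[0][0], tuple)
                and t[0][0][0] == "S"):
            a, b, c = t[0][0][1], t[0][1], t[1]
            return ((a, c), (b, c))
        # otherwise reduce inside (the congruence rules)
        left = step(t[0])
        if left is not None:
            return (left, t[1])
        right = step(t[1])
        if right is not None:
            return (t[0], right)
    return None

def normalize(t):
    """Reduce to normal form (may loop forever if none exists)."""
    while True:
        t2 = step(t)
        if t2 is None:
            return t
        t = t2

i = (("S", "K"), "K")            # i = skk
print(normalize((i, "x")))       # 'x': skkx -> (kx)(kx) -> x
```

By Corollary 5.7, comparing normal forms computed this way decides =c on terms that have normal forms.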
5.4 The failure of soundness for combinatory algebras

A combinatory algebra is almost a model of the lambda calculus. Indeed, given a combinatory algebra A, we can interpret any lambda term as follows. To each term M with free variables among x1, …, xn, we recursively associate a polynomial [[M]] ∈ A{x1, …, xn}:

[[x]] := x,
[[NP]] := [[N]][[P]],
[[λx.M]] := λ*x.[[M]].

Table 2: The rules for β-equivalence

(refl)   M = M
(symm)   from M = N, infer N = M
(trans)  from M = N and N = P, infer M = P
(cong)   from M = M′ and N = N′, infer MN = M′N′
(ξ)      from M = M′, infer λx.M = λx.M′
(β)      (λx.M)N = M[N/x]

Notice that this definition is almost the identity function, except that we have replaced the ordinary lambda abstractor of the lambda calculus by the derived lambda abstractor of combinatory logic. The result is a polynomial in A{x1, …, xn}. In the particular case where M is a closed term, we can regard [[M]] as an element of A.

To be able to say that A is a "model" of the lambda calculus, we would like the following property to be true:

M =β N ⇒ [[M]] = [[N]] holds in A.

This property is called soundness of the interpretation. Unfortunately, it is in general false for combinatory algebras, as the following example shows.

Example 5.8. Let M = λx.x and N = λx.(λy.y)x. Then clearly M =β N. On the other hand,

[[M]] = λ*x.x = i,
[[N]] = λ*x.(λ*y.y)x = λ*x.ix = s(k i)i.

It follows from Exercise 20 and Corollary 5.7 that the equation i = s(k i)i does not hold in the combinatory algebra C/=c. In other words, the interpretation is not sound.

Let us analyze the failure of the soundness property further. Recall that β-equivalence is the smallest equivalence relation on lambda terms satisfying the six rules in Table 2.
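Example 5.8 can be checked mechanically. The sketch below (our own illustration, reusing the ad-hoc SK-term representation of nested tuples) implements the derived lambda abstractor λ*, treating any subterm not containing the abstracted variable as a coefficient, which matches the computation in the example. Both interpretations come out as distinct normal forms, so by Corollary 5.7 they are not =c-equivalent.

```python
# SK-terms: "S", "K", a variable name, or a tuple (A, B) for the application A·B.

def occurs(x, t):
    """Does variable x occur in term t?"""
    if isinstance(t, tuple):
        return occurs(x, t[0]) or occurs(x, t[1])
    return t == x

I = (("S", "K"), "K")   # i = skk

def lam(x, t):
    """Derived lambda abstractor λ*x.t from the proof of Theorem 5.1.
    A subterm without x is treated as a coefficient, as in Example 5.8."""
    if t == x:
        return I                               # λ*x.x := i
    if not occurs(x, t):
        return ("K", t)                        # λ*x.a := k a
    p, q = t                                   # an application containing x
    return (("S", lam(x, p)), lam(x, q))       # λ*x.pq := s(λ*x.p)(λ*x.q)

m = lam("x", "x")                   # [[λx.x]]       = i
n = lam("x", (lam("y", "y"), "x"))  # [[λx.(λy.y)x]] = λ*x.(i x) = s(k i)i

print(m == I)                       # True
print(n == (("S", ("K", I)), I))    # True: s(k i)i
print(m == n)                       # False: two distinct normal forms
```

Since `m` and `n` are already normal forms and differ, i ≠c s(k i)i, exactly the failure of soundness described above.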
If we define a relation ∼ on lambda terms by

M ∼ N ⟺ [[M]] = [[N]] holds in A,

then we may ask which of the six rules of Table 2 the relation ∼ satisfies. Clearly, not all six rules can be satisfied, or else we would have M =β N ⇒ M ∼ N ⇒ [[M]] = [[N]], i.e., the model would be sound. Clearly, ∼ is an equivalence relation, and therefore satisfies (refl), (symm), and (trans). Also, (cong) is satisfied, because whenever p, q, p′, q′ are polynomials such that p = p′ and q = q′ hold in A, then clearly pq = p′q′ holds in A as well. Finally, we know from Remark 5.3 that the rule (β) is satisfied.

So the rule that fails is the (ξ) rule. Indeed, Example 5.8 illustrates this. Note that x ∼ (λy.y)x (from the proof of Theorem 5.1), but λx.x ≁ λx.(λy.y)x, and therefore the (ξ) rule is violated.

5.5 Lambda algebras

A lambda algebra is, by definition, a combinatory algebra that is a sound model of lambda calculus, and in which s and k have their expected meanings.

Definition (Lambda algebra). A lambda algebra is a combinatory algebra A satisfying the following properties:

(∀M, N ∈ Λ) M =β N ⇒ [[M]] = [[N]]   (soundness),
s = λ*x.λ*y.λ*z.(xz)(yz)   (s-derived),
k = λ*x.λ*y.x   (k-derived).

The purpose of the remainder of this section is to give an axiomatic description of lambda algebras.

Lemma 5.9. Recall that Λ⁰ is the set of closed lambda terms, i.e., lambda terms without free variables. Soundness is equivalent to the following:

(∀M, N ∈ Λ⁰) M =β N ⇒ [[M]] = [[N]]   (closed soundness).

Proof. Clearly soundness implies closed soundness. For the converse, assume closed soundness and let M, N ∈ Λ with M =β N. Let FV(M) ∪ FV(N) = {x1, …, xn}. Then

M =β N
⇒ λx1 … xn.M =β λx1 … xn.N   by (ξ)
⇒ [[λx1 … xn.M]] = [[λx1 … xn.N]]   by closed soundness
⇒ λ*x1 …
xn.[[M]] = λ*x1 … xn.[[N]]   by def. of [[−]]
⇒ (λ*x1 … xn.[[M]])x1 … xn = (λ*x1 … xn.[[N]])x1 … xn
⇒ [[M]] = [[N]]   by the proof of Thm 5.1.

This proves soundness.

Definition (Translations between combinatory logic and lambda calculus). Let A ∈ C be a combinatory term (see Example 5.6). We define its translation Aλ to lambda calculus in the obvious way, recursively:

Sλ = λxyz.(xz)(yz),
Kλ = λxy.x,
xλ = x,
(AB)λ = AλBλ.

Conversely, given a lambda term M ∈ Λ, we recursively define its translation Mc to combinatory logic like this:

xc = x,
(MN)c = McNc,
(λx.M)c = λ*x.(Mc).

Lemma 5.10. For all lambda terms M, (Mc)λ =β M.

Lemma 5.11. Let A be a combinatory algebra satisfying k = λ*x.λ*y.x and s = λ*x.λ*y.λ*z.(xz)(yz). Then for all combinatory terms A, (Aλ)c = A holds in A.

Exercise 21. Prove Lemmas 5.10 and 5.11.

Let C⁰ be the set of closed combinatory terms. The following is our first useful characterization of lambda algebras.

Lemma 5.12. Let A be a combinatory algebra. Then A is a lambda algebra if and only if it satisfies the following property:

(∀A, B ∈ C⁰) Aλ =β Bλ ⇒ A = B holds in A.   (alt-soundness)

Proof. First, assume that A satisfies (alt-soundness). To prove (closed soundness), let M, N be lambda terms with M =β N. Then (Mc)λ =β M =β N =β (Nc)λ, hence by (alt-soundness), Mc = Nc holds in A. But this is the definition of [[M]] = [[N]].

To prove (k-derived), note that

kλ = λx.λy.x   by definition of (−)λ
   =β ((λx.λy.x)c)λ   by Lemma 5.10
   = (λ*x.λ*y.x)λ   by definition of (−)c.

Hence, by (alt-soundness), it follows that k = λ*x.λ*y.x holds in A. Similarly for (s-derived).

Conversely, assume that A is a lambda algebra. Let A, B ∈ C⁰ and assume Aλ =β Bλ.
By soundness, [[Aλ]] = [[Bλ]]. By definition of the interpretation, (Aλ)c = (Bλ)c holds in A. But by (s-derived), (k-derived), and Lemma 5.11, A = (Aλ)c = (Bλ)c = B holds in A, proving (alt-soundness).

Definition (Homomorphism). Let (A, ·A, sA, kA) and (B, ·B, sB, kB) be combinatory algebras. A homomorphism of combinatory algebras is a function ϕ : A → B such that ϕ(sA) = sB, ϕ(kA) = kB, and for all a, b ∈ A, ϕ(a ·A b) = ϕ(a) ·B ϕ(b).

Any given homomorphism ϕ : A → B can be extended to polynomials in the obvious way: we define ϕ̂ : A{x1, …, xn} → B{x1, …, xn} by

ϕ̂(a) = ϕ(a) for a ∈ A,
ϕ̂(x) = x if x ∈ {x1, …, xn},
ϕ̂(pq) = ϕ̂(p)ϕ̂(q).

Example 5.13. If ϕ(a) = a′ and ϕ(b) = b′, then ϕ̂((ax)(by)) = (a′x)(b′y).

The following is the main technical concept needed in the characterization of lambda algebras. We say that an equation holds absolutely if it holds in A and in any homomorphic image of A. If an equation holds only in the previous sense, then we sometimes say it holds locally.

Definition (Absolute equation). Let p, q ∈ A{x1, …, xn} be two polynomials with coefficients in A. We say that the equation p = q holds absolutely in A if for all combinatory algebras B and all homomorphisms ϕ : A → B, ϕ̂(p) = ϕ̂(q) holds in B. If an equation holds absolutely, we write p =abs q.

(a) 1k =abs k,
(b) 1s =abs s,
(c) 1(kx) =abs kx,
(d) 1(sx) =abs sx,
(e) 1(sxy) =abs sxy,
(f) s(s(kk)x)y =abs 1x,
(g) s(s(s(ks)x)y)z =abs s(sxz)(syz),
(h) k(xy) =abs s(kx)(ky),
(i) s(kx)i =abs 1x.

Table 3: An axiomatization of lambda algebras. Here 1 = s(k i).

We can now state the main theorem characterizing lambda algebras. Let 1 = s(k i).

Theorem 5.14.
Let A be a combinatory algebra. Then the following are equivalent:

1. A is a lambda algebra,
2. A satisfies (alt-soundness),
3. for all A, B ∈ C such that Aλ =β Bλ, the equation A = B holds absolutely in A,
4. A absolutely satisfies the nine axioms in Table 3,
5. A satisfies (s-derived) and (k-derived), and for all p, q ∈ A{y1, …, yn}, if px =abs qx then 1p =abs 1q,
6. A satisfies (s-derived) and (k-derived), and for all p, q ∈ A{x, y1, …, yn}, if p =abs q then λ*x.p =abs λ*x.q.

The proof proceeds via 1 ⇒ 2 ⇒ 3 ⇒ 4 ⇒ 5 ⇒ 6 ⇒ 1. We have already proven 1 ⇒ 2 in Lemma 5.12.

To prove 2 ⇒ 3, let FV(A) ∪ FV(B) ⊆ {x1, …, xn}, and assume Aλ =β Bλ. Then λx1 … xn.(Aλ) =β λx1 … xn.(Bλ), hence (λ*x1 … xn.A)λ =β (λ*x1 … xn.B)λ (why?). Since the latter terms are closed, it follows by (alt-soundness) that λ*x1 … xn.A = λ*x1 … xn.B holds in A. Since closed equations are preserved by homomorphisms, the latter also holds in B for any homomorphism ϕ : A → B. Finally, this implies that A = B holds in any such B, proving that A = B holds absolutely in A.

Exercise 22. Prove the implication 3 ⇒ 4.

The implication 4 ⇒ 5 is the most difficult part of the theorem. We first dispense with the easier part:

Exercise 23. Prove that the axioms from Table 3 imply (s-derived) and (k-derived).

The last part of 4 ⇒ 5 needs the following lemma:

Lemma 5.15. Suppose A satisfies the nine axioms from Table 3. Define a structure (B, •, S, K) by:

B = {a ∈ A | a = 1a},  a • b = sab,  S = ks,  K = kk.

Then B is a well-defined combinatory algebra. Moreover, the function ϕ : A → B defined by ϕ(a) = ka is a homomorphism.

Exercise 24. Prove Lemma 5.15.

To prove the implication 4 ⇒ 5, assume ax = bx holds absolutely in A.
Then ϕ̂(ax) = ϕ̂(bx) holds in B, by definition of "absolute". But ϕ̂(ax) = (ϕa)x = s(ka)x and ϕ̂(bx) = (ϕb)x = s(kb)x. Therefore s(ka)x = s(kb)x holds in A. We plug in x = i to get s(ka)i = s(kb)i. By axiom (i), 1a = 1b.

To prove 5 ⇒ 6, assume p =abs q. Then (λ*x.p)x =abs p =abs q =abs (λ*x.q)x by the proof of Theorem 5.1. Then by 5., λ*x.p =abs λ*x.q.

Finally, to prove 6 ⇒ 1, note that if 6 holds, then the absolute interpretation satisfies the (ξ)-rule, and therefore satisfies all the axioms of the lambda calculus.

Exercise 25. Prove 6 ⇒ 1.

Remark 5.16. The axioms in Table 3 are required to hold absolutely. They can be replaced by local axioms by prefacing each axiom with λ*xyz. Note that this makes the axioms much longer.

5.6 Extensional combinatory algebras

Definition. An applicative structure (A, ·) is extensional if for all a, b ∈ A, if ac = bc holds for all c ∈ A, then a = b.

Proposition 5.17. In an extensional combinatory algebra, the (η) axiom is valid.

Proof. By (β), (λ*x.Mx)c = Mc for all c ∈ A. Therefore, by extensionality, λ*x.Mx = M.

Proposition 5.18. In an extensional combinatory algebra, an equation holds locally if and only if it holds absolutely.

Proof. Clearly, if an equation holds absolutely, then it holds locally. Conversely, assume the equation p = q holds locally in A. Let x1, …, xn be the variables occurring in the equation. By (β),

(λ*x1 … xn.p)x1 … xn = (λ*x1 … xn.q)x1 … xn

holds locally. By extensionality, λ*x1 … xn.p = λ*x1 … xn.q holds. Since this is a closed equation (no free variables), it automatically holds absolutely. This implies that (λ*x1 … xn.p)x1 … xn = (λ*x1 … xn.q)x1 …
xn holds absolutely, and finally, by (β) again, that p = q holds absolutely.

Proposition 5.19. Every extensional combinatory algebra is a lambda algebra.

Proof. By Theorem 5.14(6), it suffices to prove (s-derived), (k-derived), and the (ξ)-rule. Let a, b, c ∈ A be arbitrary. Then

(λ*x.λ*y.λ*z.(xz)(yz))abc = (ac)(bc) = sabc

by (β) and the definition of s. Applying extensionality three times (with respect to c, b, and a), we get

λ*x.λ*y.λ*z.(xz)(yz) = s.

This proves (s-derived). The proof of (k-derived) is similar. Finally, to prove (ξ), assume that p =abs q. Then by (β), (λ*x.p)c = (λ*x.q)c for all c ∈ A. By extensionality, λ*x.p = λ*x.q holds.

6 Simply-typed lambda calculus, propositional logic, and the Curry-Howard isomorphism

In the untyped lambda calculus, we spoke about functions without speaking about their domains and codomains. The domain and codomain of any function was the set of all lambda terms. We now introduce types into the lambda calculus, and thus a notion of domain and codomain for functions. The difference between types and sets is that types are syntactic objects, i.e., we can speak of types without having to speak of their elements. We can think of types as names for sets.

6.1 Simple types and simply-typed terms

We assume a set of basic types. We usually use the Greek letter ι ("iota") to denote a basic type. The set of simple types is given by the following BNF:

Simple types: A, B ::= ι | A → B | A × B | 1

The intended meaning of these types is as follows: base types are things like the type of integers or the type of booleans. The type A → B is the type of functions from A to B. The type A × B is the type of pairs ⟨x, y⟩, where x has type A and y has type B. The type 1 is a one-element type. You can think of 1 as an abridged version of the booleans, in which there is only one boolean instead of two.
Or you can think of 1 as the "void" or "unit" type of many programming languages: the result type of a function that has no real result.

When we write types, we adopt the convention that × binds stronger than →, and → associates to the right. Thus, A × B → C is (A × B) → C, and A → B → C is A → (B → C).

The set of raw typed lambda terms is given by the following BNF:

Raw terms: M, N ::= x | MN | λx^A.M | ⟨M, N⟩ | π1M | π2M | ∗

Unlike what we did in the untyped lambda calculus, we have added special syntax here for pairs. Specifically, ⟨M, N⟩ is a pair of terms, πiM is a projection, with the intention that πi⟨M1, M2⟩ = Mi. Also, we have added a term ∗, which is the unique element of the type 1. One other change from the untyped lambda calculus is that we now write λx^A.M for a lambda abstraction, to indicate that x has type A. However, we will sometimes omit the superscripts and write λx.M as before. The notions of free and bound variables and α-conversion are defined as for the untyped lambda calculus; again, we identify α-equivalent terms.

Table 4: Typing rules for the simply-typed lambda calculus

(var)   Γ, x:A ⊢ x : A
(app)   from Γ ⊢ M : A → B and Γ ⊢ N : A, infer Γ ⊢ MN : B
(abs)   from Γ, x:A ⊢ M : B, infer Γ ⊢ λx^A.M : A → B
(pair)  from Γ ⊢ M : A and Γ ⊢ N : B, infer Γ ⊢ ⟨M, N⟩ : A × B
(π1)    from Γ ⊢ M : A × B, infer Γ ⊢ π1M : A
(π2)    from Γ ⊢ M : A × B, infer Γ ⊢ π2M : B
(∗)     Γ ⊢ ∗ : 1

We call the above terms the raw terms, because we have not yet imposed any typing discipline on these terms. To avoid meaningless terms such as ⟨M, N⟩(P) or π1(λx.M), we introduce typing rules. We use the colon notation M : A to mean "M is of type A" (similar to the element notation in set theory). The typing rules are expressed in terms of typing judgments. A typing judgment is an expression of the form

x1 : A1, x2 : A2, …, xn : An ⊢ M : A.
Its meaning is: "under the assumption that xi is of type Ai, for i = 1, …, n, the term M is a well-typed term of type A." The free variables of M must be contained in x1, …, xn. The idea is that in order to determine the type of M, we must make some assumptions about the types of its free variables. For instance, the term xy will have type B if x : A → B and y : A. Clearly, the type of xy depends on the types of its free variables.

A sequence of assumptions of the form x1 : A1, …, xn : An, as on the left-hand side of a typing judgment, is called a typing context. We always assume that no variable appears more than once in a typing context, and we allow typing contexts to be reordered implicitly. We often use the Greek letter Γ to stand for an arbitrary typing context, and we use the notations Γ, Γ′ and Γ, x : A to denote the concatenation of typing contexts, where it is always assumed that the sets of variables are disjoint.

The symbol ⊢, which appears in a typing judgment, is called the turnstile symbol. Its purpose is to separate the left-hand side from the right-hand side.

The typing rules for the simply-typed lambda calculus are shown in Table 4. The rule (var) is a tautology: under the assumption that x has type A, x has type A. The rule (app) states that a function of type A → B can be applied to an argument of type A to produce a result of type B. The rule (abs) states that if M is a term of type B with a free variable x of type A, then λx^A.M is a function of type A → B. The other rules have similar interpretations.
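The rules of Table 4 translate directly into a type checker. The Python sketch below (our own illustration; the term and type representations are ad-hoc) infers the type of a raw term in a given context, or fails on ill-typed terms:

```python
# Types: a base type is a string; ("->", A, B) and ("*", A, B) are arrow and
# product types; the string "1" is the unit type.
# Terms: ("var", x), ("app", M, N), ("abs", x, A, M) for λx^A.M,
#        ("pair", M, N), ("pi1", M), ("pi2", M), and ("star",) for ∗.

def typeof(ctx, t):
    """Infer the type of term t in typing context ctx (a dict),
    following the rules of Table 4."""
    tag = t[0]
    if tag == "var":                       # (var)
        return ctx[t[1]]
    if tag == "app":                       # (app)
        f, a = typeof(ctx, t[1]), typeof(ctx, t[2])
        assert f[0] == "->" and f[1] == a, "ill-typed application"
        return f[2]
    if tag == "abs":                       # (abs)
        _, x, A, body = t
        return ("->", A, typeof({**ctx, x: A}, body))
    if tag == "pair":                      # (pair)
        return ("*", typeof(ctx, t[1]), typeof(ctx, t[2]))
    if tag in ("pi1", "pi2"):              # (π1), (π2)
        p = typeof(ctx, t[1])
        assert p[0] == "*", "projection of a non-pair"
        return p[1] if tag == "pi1" else p[2]
    return "1"                             # (∗)

# λx^{A→A}.λy^A.x(xy) : (A→A) → A → A
AA = ("->", "A", "A")
term = ("abs", "x", AA,
        ("abs", "y", "A",
         ("app", ("var", "x"), ("app", ("var", "x"), ("var", "y")))))
print(typeof({}, term))   # ('->', ('->', 'A', 'A'), ('->', 'A', 'A'))
```

Because there is exactly one rule per kind of term, the checker never has to backtrack; this is the syntax-directedness observed below.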
Here is an example of a valid typing derivation, written here as a sequence of judgments, each following from earlier ones by the indicated rule:

x : A→A, y : A ⊢ x : A→A   (var)
x : A→A, y : A ⊢ y : A   (var)
x : A→A, y : A ⊢ xy : A   (app)
x : A→A, y : A ⊢ x(xy) : A   (app, using the first judgment again)
x : A→A ⊢ λy^A.x(xy) : A→A   (abs)
⊢ λx^{A→A}.λy^A.x(xy) : (A→A) → A → A   (abs)

One important property of these typing rules is that there is precisely one rule for each kind of lambda term. Thus, when we construct typing derivations in a bottom-up fashion, there is always a unique choice of which rule to apply next. The only real choice we have is about which types to assign to variables.

Exercise 26. Give a typing derivation of each of the following typing judgments:

(a) ⊢ λx^{(A→A)→B}.x(λy^A.y) : ((A→A)→B) → B
(b) ⊢ λx^{A×B}.⟨π2x, π1x⟩ : (A×B) → (B×A)

Not all terms are typable. For instance, the terms π1(λx.M) and ⟨M, N⟩(P) cannot be assigned a type, and neither can the term λx.xx. Here, by "assigning a type" we mean assigning types to the free and bound variables such that the corresponding typing judgment is derivable. We say that a term is typable if it can be assigned a type.

Exercise 27. Show that none of the three terms mentioned in the previous paragraph is typable.

Exercise 28. We said that we will identify α-equivalent terms. Show that this is actually necessary. In particular, show that if we didn't identify α-equivalent terms, there would be no valid derivation of the typing judgment ⊢ λx^A.λx^B.x : A → B → B. Give a derivation of this typing judgment using the bound variable convention.

6.2 Connections to propositional logic

Consider the following types:

(1) (A × B) → A
(2) A → B → (A × B)
(3) (A → B) → (B → C) → (A → C)
(4) A → A → A
(5) ((A → A) → B) → B
(6) A → (A × B)
(7) (A → C) → C

Let us ask, in each case, whether it is possible to find a closed term of the given type.
We find the following terms:

(1) λx^{A×B}.π1x
(2) λx^A.λy^B.⟨x, y⟩
(3) λx^{A→B}.λy^{B→C}.λz^A.y(xz)
(4) λx^A.λy^A.x and λx^A.λy^A.y
(5) λx^{(A→A)→B}.x(λy^A.y)
(6) can't find a closed term
(7) can't find a closed term

Can we answer the general question, given a type, whether there exists a closed term for it?

For a new way to look at the problem, take the types (1)–(7) and make the following replacement of symbols: replace "→" by "⇒" and replace "×" by "∧". We obtain the following formulas:

(1) (A ∧ B) ⇒ A
(2) A ⇒ B ⇒ (A ∧ B)
(3) (A ⇒ B) ⇒ (B ⇒ C) ⇒ (A ⇒ C)
(4) A ⇒ A ⇒ A
(5) ((A ⇒ A) ⇒ B) ⇒ B
(6) A ⇒ (A ∧ B)
(7) (A ⇒ C) ⇒ C

Note that these are formulas of propositional logic, where "⇒" is implication and "∧" is conjunction ("and"). What can we say about the validity of these formulas? It turns out that (1)–(5) are tautologies, whereas (6)–(7) are not. Thus, the types for which we could find a lambda term turn out to be the ones that are valid when considered as formulas in propositional logic! This is not entirely coincidental.

Let us consider, for example, how to prove (A ∧ B) ⇒ A. The proof is very short. It goes as follows: "Assume A ∧ B. Then, by the first part of that assumption, A holds. Thus (A ∧ B) ⇒ A." On the other hand, the lambda term of the corresponding type is λx^{A×B}.π1x. You can see that there is a close connection between the proof and the lambda term. Namely, if one reads λx^{A×B} as "assume A ∧ B (call the assumption 'x')", and if one reads π1x as "by the first part of assumption x", then this lambda term can be read as a proof of the proposition (A ∧ B) ⇒ A.

This connection between simply-typed lambda calculus and propositional logic is known as the "Curry-Howard isomorphism".
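The inhabited types (1)–(5) can be written down as programs in any language with functions. The following Python sketch (our own illustration, with pairs modeled as 2-tuples) spells out the five terms found above and runs them on sample values:

```python
# The closed terms for types (1)-(5), as Python functions (pairs = 2-tuples).

term1 = lambda x: x[0]                         # (A × B) → A           : π1
term2 = lambda x: lambda y: (x, y)             # A → B → (A × B)       : pairing
term3 = lambda x: lambda y: lambda z: y(x(z))  # (A→B) → (B→C) → (A→C) : composition
term4 = lambda x: lambda y: x                  # A → A → A             : "true"
term5 = lambda x: x(lambda y: y)               # ((A→A) → B) → B

print(term1((3, "b")))                   # 3
print(term2(3)("b"))                     # (3, 'b')
print(term3(lambda n: n + 1)(str)(41))   # '42'
print(term4("first")("second"))          # 'first'
print(term5(lambda f: f(7)))             # 7
```

No such definitions exist for (6) and (7): a function of type A → (A × B), for example, would have to conjure up a value of type B from nothing, matching the fact that A ⇒ (A ∧ B) is not a tautology.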
Since types of the lambda calculus correspond to formulas in propositional logic, and terms correspond to proofs, the concept is also known as the "proofs-as-programs" paradigm, or the "formulas-as-types" correspondence. We will make the actual correspondence more precise in the next two sections.

Before we go any further, we must make one important point. When we make precise the connection between the simply-typed lambda calculus and propositional logic, we will see that the appropriate logic is intuitionistic logic, and not the ordinary classical logic that we are used to from mathematical practice. The main difference between intuitionistic and classical logic is that the former lacks the principles of "proof by contradiction" and "excluded middle". The principle of proof by contradiction states that if the assumption "not A" leads to a contradiction, then we have proved A. The principle of excluded middle states that either "A" or "not A" must be true.

Intuitionistic logic is also known as constructive logic, because all proofs in it are by construction. Thus, in intuitionistic logic, the only way to prove the existence of some object is by actually constructing the object. This is in contrast with classical logic, where we may prove the existence of an object simply by deriving a contradiction from the assumption that the object doesn't exist. The disadvantage of constructive logic is that it is generally more difficult to prove things. The advantage is that once one has a proof, the proof can be transformed into an algorithm.

6.3 Propositional intuitionistic logic

We start by introducing a system for intuitionistic logic that uses only three connectives: "∧", "→", and "⊤". Formulas A, B, ... are built from atomic formulas α, β, ... via the BNF

Formulas: A, B ::= α | A → B | A ∧ B | ⊤.

We now need to formalize proofs.
The formalized proofs will be called "derivations". The system we introduce here is known as natural deduction, and is due to Gentzen (1935). In natural deduction, derivations are certain kinds of trees. In general, we will deal with derivations of a formula A from a set of assumptions Γ = {A₁, ..., Aₙ}. Such a derivation will be written schematically as

    x₁:A₁, ..., xₙ:Aₙ
           ⋮
           A

We simplify the bookkeeping by giving a name to each assumption, and we will use lower-case letters such as x, y, z for such names. In using the above notation for schematically writing a derivation of A from assumptions Γ, it is understood that the derivation may in fact use a given assumption more than once, or zero times.

The rules for constructing derivations are as follows:

1. (Axiom)

       (ax)  x:A
              A

   is a derivation of A from assumption A (and possibly other assumptions that were used zero times). We have written the letter "x" next to the rule, to indicate precisely which assumption we have used here.

2. (∧-introduction) If

       Γ            Γ
       ⋮     and    ⋮
       A            B

   are derivations of A and B, respectively, then

               Γ      Γ
               ⋮      ⋮
       (∧-I)   A      B
              ──────────
                A ∧ B

   is a derivation of A ∧ B. In other words, a proof of A ∧ B is a proof of A and a proof of B.

3. (∧-elimination) If

       Γ
       ⋮
       A ∧ B

   is a derivation of A ∧ B, then

                 Γ                     Γ
                 ⋮                     ⋮
       (∧-E₁)  A ∧ B         (∧-E₂)  A ∧ B
               ─────                 ─────
                 A                     B

   are derivations of A and B, respectively. In other words, from A ∧ B, we are allowed to conclude both A and B.

4. (⊤-introduction)

       (⊤-I)  ───
               ⊤

   is a derivation of ⊤ (possibly from some assumptions, which were not used). In other words, ⊤ is always true.

5. (→-introduction) If

       Γ, x:A
         ⋮
         B

   is a derivation of B from assumptions Γ and A, then

              Γ, [x:A]
                 ⋮
       (→-I)     B
              ─────── x
               A → B

   is a derivation of A → B from Γ alone.
Here, the assumption x:A is no longer an assumption of the new derivation; we say that it has been "canceled". We indicate canceled assumptions by enclosing them in brackets [ ], and we indicate the place where the assumption was canceled by writing the letter x next to the rule where it was canceled.

6. (→-elimination) If

       Γ               Γ
       ⋮      and      ⋮
       A → B           A

   are derivations of A → B and A, respectively, then

                Γ       Γ
                ⋮       ⋮
       (→-E)  A → B     A
              ───────────
                   B

   is a derivation of B. In other words, from A → B and A, we are allowed to conclude B. This rule is sometimes called by its Latin name, "modus ponens".

This finishes the definition of derivations in natural deduction. Note that, with the exception of the axiom, each rule belongs to some specific logical connective, and the rules come in introduction and elimination flavors: "∧" and "→" have both introduction and elimination rules, whereas "⊤" only has an introduction rule.

In natural deduction, as in real mathematical life, assumptions can be made at any time. The challenge is to get rid of assumptions once they are made. In the end, we would like to have a derivation of a given formula that depends on as few assumptions as possible; in fact, we don't regard the formula as proven unless we can derive it from no assumptions. The rule (→-I) allows us to discard temporary assumptions that we might have made during the proof.

Exercise 29. Give a derivation, in natural deduction, for each of the formulas (1)–(5) from Section 6.2.

6.4 An alternative presentation of natural deduction

The above notation for natural deduction derivations suffers from a problem of presentation: since assumptions are first written down and later canceled dynamically, it is not easy to see when each assumption in a finished derivation was canceled. The following alternative presentation of natural deduction works by deriving entire judgments, rather than formulas.
Rather than keeping track of assumptions as the leaves of a proof tree, we annotate each formula in a derivation with the entire set of assumptions that were used in deriving it. In practice, this makes derivations more verbose, by repeating most assumptions on each line. In theory, however, such derivations are easier to reason about.

A judgment is a statement of the form x₁:A₁, ..., xₙ:Aₙ ⊢ B. It states that the formula B is a consequence of the (labeled) assumptions A₁, ..., Aₙ. The rules of natural deduction can now be reformulated as rules for deriving judgments:

1. (Axiom)

       (ax_x)  ──────────
               Γ, x:A ⊢ A

2. (∧-introduction)

       (∧-I)  Γ ⊢ A    Γ ⊢ B
              ──────────────
                Γ ⊢ A ∧ B

3. (∧-elimination)

       (∧-E₁)  Γ ⊢ A ∧ B       (∧-E₂)  Γ ⊢ A ∧ B
               ─────────               ─────────
                 Γ ⊢ A                   Γ ⊢ B

4. (⊤-introduction)

       (⊤-I)  ──────
              Γ ⊢ ⊤

5. (→-introduction)

       (→-I_x)  Γ, x:A ⊢ B
                ──────────
                Γ ⊢ A → B

6. (→-elimination)

       (→-E)  Γ ⊢ A → B    Γ ⊢ A
              ──────────────────
                    Γ ⊢ B

6.5 The Curry-Howard Isomorphism

There is an obvious one-to-one correspondence between the types of the simply-typed lambda calculus and the formulas of propositional intuitionistic logic introduced in Section 6.3 (provided that the set of basic types can be identified with the set of atomic formulas). We will identify formulas and types from now on, where it is convenient to do so.

Perhaps less obvious is the fact that derivations are in one-to-one correspondence with simply-typed lambda terms. To be precise, we will give a translation from derivations to lambda terms, and a translation from lambda terms to derivations, which are mutually inverse up to α-equivalence.

To any derivation of x₁:A₁, ..., xₙ:Aₙ ⊢ B, we will associate a lambda term M such that x₁:A₁, ..., xₙ:Aₙ ⊢ M : B is a valid typing judgment. We define M by recursion on the definition of derivations. We prove simultaneously, by induction, that x₁:A₁, ..., xₙ:Aₙ ⊢ M : B is indeed a valid typing judgment.

1. (Axiom) If the derivation is

       (ax_x)  ──────────
               Γ, x:A ⊢ A

   then the lambda term is M = x. Clearly, Γ, x:A ⊢ x : A is a valid typing judgment by (var).

2. (∧-introduction) If the derivation is

       (∧-I)  Γ ⊢ A    Γ ⊢ B
              ──────────────
                Γ ⊢ A ∧ B

   then the lambda term is M = ⟨P, Q⟩, where P and Q are the terms associated to the two respective subderivations. By the induction hypothesis, Γ ⊢ P : A and Γ ⊢ Q : B, thus Γ ⊢ ⟨P, Q⟩ : A × B by (pair).

3. (∧-elimination) If the derivation is

       (∧-E₁)  Γ ⊢ A ∧ B
               ─────────
                 Γ ⊢ A

   then we let M = π₁P, where P is the term associated to the subderivation. By the induction hypothesis, Γ ⊢ P : A × B, thus Γ ⊢ π₁P : A by (π₁). The case of (∧-E₂) is entirely symmetric.

4. (⊤-introduction) If the derivation is

       (⊤-I)  ──────
              Γ ⊢ ⊤

   then let M = ∗. We have Γ ⊢ ∗ : 1 by (∗).

5. (→-introduction) If the derivation is

       (→-I_x)  Γ, x:A ⊢ B
                ──────────
                Γ ⊢ A → B

   then we let M = λx^A.P, where P is the term associated to the subderivation. By the induction hypothesis, Γ, x:A ⊢ P : B, hence Γ ⊢ λx^A.P : A → B by (abs).

6. (→-elimination) Finally, if the derivation is

       (→-E)  Γ ⊢ A → B    Γ ⊢ A
              ──────────────────
                    Γ ⊢ B

   then we let M = PQ, where P and Q are the terms associated to the two respective subderivations. By the induction hypothesis, Γ ⊢ P : A → B and Γ ⊢ Q : A, thus Γ ⊢ PQ : B by (app).

Conversely, given a well-typed lambda term M, with associated typing judgment Γ ⊢ M : A, we can construct a derivation of A from assumptions Γ. We define this derivation by recursion on the type derivation of Γ ⊢ M : A. The details are too tedious to spell out here; we simply go through each of the rules (var), (abs), (app), (pair), (π₁), (π₂), (∗) and apply the corresponding rule (ax), (→-I), (→-E), (∧-I), (∧-E₁), (∧-E₂), (⊤-I), respectively.
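The derivation-to-term translation just described is itself a simple recursive program. The following sketch (our own encoding, not from the notes) represents a derivation as a nested tuple headed by its rule name, and extracts the corresponding lambda term:

```python
# Extract the lambda term associated to a natural deduction derivation.
# Derivations and terms are nested tuples; the encoding is illustrative.
def extract(d):
    rule = d[0]
    if rule == 'ax':             # ('ax', x): axiom using assumption x
        return d[1]
    if rule == 'and-i':          # pairing <P, Q>
        return ('pair', extract(d[1]), extract(d[2]))
    if rule == 'and-e1':         # first projection
        return ('pi1', extract(d[1]))
    if rule == 'and-e2':         # second projection
        return ('pi2', extract(d[1]))
    if rule == 'top-i':          # the canonical element of 1
        return ('star',)
    if rule == 'imp-i':          # ('imp-i', x, A, d1): lambda abstraction
        return ('lam', d[1], d[2], extract(d[3]))
    if rule == 'imp-e':          # application P Q
        return ('app', extract(d[1]), extract(d[2]))
    raise ValueError('unknown rule: %s' % rule)

# The proof of (A and B) => A from Section 6.2: assume x, then use and-e1.
proof = ('imp-i', 'x', ('and', 'A', 'B'), ('and-e1', ('ax', 'x')))
# extract(proof) == ('lam', 'x', ('and', 'A', 'B'), ('pi1', 'x'))
```

The extracted term is exactly λx^(A×B).π₁x, as the Curry-Howard correspondence predicts.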
6.6 Reductions in the simply-typed lambda calculus

β- and η-reductions in the simply-typed lambda calculus are defined in much the same way as for the untyped lambda calculus, except that we have introduced some additional terms (such as pairs and projections), which call for some additional reduction rules. We define the following reductions:

(β→)    (λx^A.M)N → M[N/x],
(η→)    λx^A.Mx → M,   where x ∉ FV(M),
(β×,1)  π₁⟨M, N⟩ → M,
(β×,2)  π₂⟨M, N⟩ → N,
(η×)    ⟨π₁M, π₂M⟩ → M,
(η₁)    M → ∗,   if M : 1.

Single- and multi-step β- and η-reduction are then defined as the usual contextual closure of the above rules, and the definitions of β- and η-equivalence also follow the usual pattern. In addition to the usual (cong) and (ξ) rules, we now also have congruence rules that apply to pairs and projections.

We remark that, to be perfectly precise, we should have defined reductions between typing judgments, and not between terms. This is necessary because some of the reduction rules, notably (η₁), depend on the types of the terms involved. However, this would be notationally very cumbersome, and we will blur the distinction, pretending at times that terms appear in some implicit typing context that we do not write.

An important property of the reduction is the "subject reduction" property, which states that well-typed terms reduce only to well-typed terms of the same type. This has an immediate application to programming: subject reduction guarantees that if we write a program of type "integer", then the final result of evaluating the program, if any, will indeed be an integer, and not, say, a boolean.

Theorem 6.1 (Subject Reduction). If Γ ⊢ M : A and M →βη M′, then Γ ⊢ M′ : A.

Proof. By induction on the derivation of M →βη M′, and by case distinction on the last rule used in the derivation of Γ ⊢ M : A.
For instance, if M →βη M′ by (β→), then M = (λx^B.P)Q and M′ = P[Q/x]. If Γ ⊢ M : A, then we must have Γ, x:B ⊢ P : A and Γ ⊢ Q : B. It follows that Γ ⊢ P[Q/x] : A; the latter statement can be proved separately (as a "substitution lemma") by induction on P, and makes crucial use of the fact that x and Q have the same type.

The other cases are similar, and we leave them as an exercise. Note that, in particular, one needs to consider the (cong), (ξ), and other congruence rules as well. □

6.7 A word on Church-Rosser

One important theorem that does not hold for βη-reduction in the simply-typed λ→,×,1-calculus is the Church-Rosser theorem. The culprit is the rule (η₁). For instance, if x is a variable of type A × 1, then the term M = ⟨π₁x, π₂x⟩ reduces to x by (η×), but also to ⟨π₁x, ∗⟩ by (η₁). Both of these terms are normal forms. Thus, the Church-Rosser property fails.

        ⟨π₁x, π₂x⟩
    (η×) ↙       ↘ (η₁)
        x         ⟨π₁x, ∗⟩

There are several ways around this problem. For instance, if we omit all the η-reductions and consider only β-reductions, then the Church-Rosser property does hold. Eliminating η-reductions does not have much of an effect on the lambda calculus from a computational point of view; already in the untyped lambda calculus, we noticed that all interesting calculations could in fact be carried out with β-reductions alone. We can say that β-reductions are the engine of computation, whereas η-reductions only serve to clean up the result. In particular, it can never happen that some η-reduction inhibits another β-reduction: if M →η M′, and if M′ has a β-redex, then it must be the case that M already has a corresponding β-redex. Also, η-reductions always reduce the size of a term.
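The failing diamond above can be reproduced mechanically. In the following sketch (the tuple representation is ours, not the notes'), we implement just the two η-rules involved and apply them to M = ⟨π₁x, π₂x⟩; the rule (η₁) needs the type of the subterm to be supplied externally, exactly as the remark about reductions between typing judgments predicts:

```python
# Terms as nested tuples: ('var', n), ('pair', M, N), ('pi1', M),
# ('pi2', M), ('star',). Illustrative encoding only.

def eta_pair(t):
    # (eta_x): <pi1 M, pi2 M> -> M, when both projections apply to the same M
    if (t[0] == 'pair' and t[1][0] == 'pi1' and t[2][0] == 'pi2'
            and t[1][1] == t[2][1]):
        return t[1][1]
    return t

def eta_unit(t, has_type_1):
    # (eta_1): M -> * when M has type 1; the type must be known externally
    return ('star',) if has_type_1 else t

x = ('var', 'x')                      # assume x : A x 1, so pi2 x : 1
m = ('pair', ('pi1', x), ('pi2', x))  # the term <pi1 x, pi2 x>

left = eta_pair(m)                                         # x, by (eta_x)
right = ('pair', ('pi1', x), eta_unit(('pi2', x), True))   # <pi1 x, *>, by (eta_1)
# left != right, and neither reduces further: two distinct normal forms.
```

Both results are normal, so confluence genuinely fails in the presence of (η₁).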
It follows that if M is a β-normal form, then M can always be reduced to a βη-normal form (not necessarily unique) in a finite sequence of η-reductions.

Exercise 30. Prove the Church-Rosser theorem for β-reductions in the λ→,×,1-calculus. Hint: use the same method that we used in the untyped case.

Another solution is to omit the type 1 and the term ∗ from the language. In this case, the Church-Rosser property holds even for βη-reduction.

Exercise 31. Prove the Church-Rosser theorem for βη-reduction in the λ→,×-calculus, i.e., the simply-typed lambda calculus without 1 and ∗.

6.8 Reduction as proof simplification

Having made a one-to-one correspondence between simply-typed lambda terms and derivations in intuitionistic natural deduction, we may now ask what β- and η-reductions correspond to under this correspondence. It turns out that these reductions can be thought of as "proof simplification steps".

Consider for example the β-reduction π₁⟨M, N⟩ → M. If we translate the left-hand side and the right-hand side via the Curry-Howard isomorphism (here we use the first notation for natural deduction), we get

                Γ      Γ
                ⋮      ⋮
      (∧-I)     A      B             Γ
               ──────────     →      ⋮
      (∧-E₁)     A ∧ B               A
                ───────
                   A

We can see that the left derivation contains an introduction rule immediately followed by an elimination rule. This leads to an obvious simplification if we replace the left derivation by the right one.

In general, β-redexes correspond to situations where an introduction rule is immediately followed by an elimination rule, and η-redexes correspond to situations where an elimination rule is immediately followed by an introduction rule. For example, consider the η-reduction ⟨π₁M, π₂M⟩ → M. This translates to:

                 Γ             Γ
                 ⋮             ⋮
      (∧-E₁)   A ∧ B  (∧-E₂) A ∧ B            Γ
               ─────         ─────     →      ⋮
      (∧-I)      A             B            A ∧ B
                ────────────────
                     A ∧ B

Again, this is an obvious simplification step, but it has a side condition: the left and right subderivations must be the same! This side condition corresponds to the fact that in the redex ⟨π₁M, π₂M⟩, the two subterms called M must be equal. It is another characteristic of η-reductions that they often carry such side conditions.

The reduction M → ∗ translates as follows:

      Γ
      ⋮     →    (⊤-I)  ───
      ⊤                  ⊤

In other words, any derivation of ⊤ can be replaced by the canonical such derivation.

More interesting is the case of the (β→) rule. Here, we have (λx^A.M)N → M[N/x], which can be translated via the Curry-Howard isomorphism as follows:

              Γ, [x:A]
                 ⋮                         Γ
      (→-I)      B          Γ              ⋮
                ────── x    ⋮      →       A
      (→-E)    A → B        A              ⋮
               ──────────────              B
                     B

What is going on here is that we have a derivation M of B from assumptions Γ and A, and we have another derivation N of A from Γ. We can directly obtain a derivation of B from Γ by stacking the second derivation on top of the first!

Notice that this last proof "simplification" step may not actually be a simplification. Namely, if the hypothesis labeled x is used many times in the derivation M, then N will have to be copied many times in the right-hand side. This corresponds to the fact that if x occurs several times in M, then M[N/x] might be a longer and more complicated term than (λx.M)N.

Finally, consider the (η→) rule λx^A.Mx → M, where x ∉ FV(M). This translates to derivations as follows:

                 Γ
                 ⋮        (ax) [x:A]
      (→-E)    A → B          A            Γ
               ────────────────     →      ⋮
                      B                  A → B
      (→-I)        ─────── x
                    A → B

6.9 Getting mileage out of the Curry-Howard isomorphism

The Curry-Howard isomorphism makes a connection between the lambda calculus and logic. We can think of it as a connection between "programs" and "proofs". What is such a connection good for?
Like any isomorphism, it allows us to switch back and forth and think in whichever system suits our intuition in a given situation. Moreover, we can save a lot of work by transferring theorems that were proved about the lambda calculus to logic, and vice versa. As an example, we will see in the next section how to add disjunctions to propositional intuitionistic logic, and then we will explore what we can learn about the lambda calculus from that.

6.10 Disjunction and sum types

To the BNF for formulas of propositional intuitionistic logic from Section 6.3, we add the following clauses:

Formulas: A, B ::= ... | A ∨ B | ⊥.

Here, A ∨ B stands for disjunction, or "or", and ⊥ stands for falsity, which we can also think of as a zero-ary disjunction. The symbol ⊥ is also known by the names "bottom", "absurdity", or "contradiction". The rules for constructing derivations are extended by the following cases:

7. (∨-introduction)

       (∨-I₁)   Γ ⊢ A           (∨-I₂)   Γ ⊢ B
               ─────────                ─────────
               Γ ⊢ A ∨ B                Γ ⊢ A ∨ B

   In other words, if we have proven A or we have proven B, then we may conclude A ∨ B.

8. (∨-elimination)

       (∨-E_{x,y})  Γ ⊢ A ∨ B    Γ, x:A ⊢ C    Γ, y:B ⊢ C
                    ─────────────────────────────────────
                                  Γ ⊢ C

   This is known as the "principle of case distinction". If we know A ∨ B, and we wish to prove some formula C, then we may proceed by cases. In the first case, we assume A holds and prove C. In the second case, we assume B holds and prove C. In either case, we prove C, which therefore holds independently. Note that the ∨-elimination rule differs from all other rules we have considered so far, because it involves some arbitrary formula C that is not directly related to the principal formula A ∨ B being eliminated.

9. (⊥-elimination)

       (⊥-E)  Γ ⊢ ⊥
              ─────
              Γ ⊢ C

   for an arbitrary formula C. This rule formalizes the familiar principle "ex falso quodlibet", which means that falsity implies anything.

There is no ⊥-introduction rule.
This is symmetric to the fact that there is no ⊤-elimination rule.

Having extended our logic with disjunctions, we can now ask what these disjunctions correspond to under the Curry-Howard isomorphism. Naturally, we need to extend the lambda calculus by as many new terms as we have new rules in the logic. It turns out that disjunctions correspond to a concept that is quite natural in programming: "sum" or "union" types. To the lambda calculus, add the type constructors A + B and 0.

Simple types: A, B ::= ... | A + B | 0.

Intuitively, A + B is the disjoint union of A and B, as in set theory: an element of A + B is either an element of A or an element of B, together with an indication of which one is the case. In particular, if we consider an element of A + A, we can still tell whether it is in the left or right component, even though the two types are the same. In programming languages, this is sometimes known as a "union" or "variant" type. We call it a "sum" type here. The type 0 is simply the empty type, corresponding to the empty set in set theory.

What should the lambda terms be that go with these new types? We know from our experience with the Curry-Howard isomorphism that we have to have precisely one term constructor for each introduction or elimination rule of natural deduction. Moreover, we know that if such a rule has n subderivations, then our term constructor has to have n immediate subterms. We also know something about bound variables: each time a hypothesis is canceled in a natural deduction rule, there must be a binder of the corresponding variable in the lambda calculus. This information more or less uniquely determines what the lambda terms should be; the only choice that is left is what to call them! We add four terms to the lambda calculus:

Raw terms: M, N, P ::= ... | in₁M | in₂M | (case M of x^A ⇒ N | y^B ⇒ P) | □_A M.

The typing rules for these new terms are shown in Table 5. By comparing these rules to (∨-I₁), (∨-I₂), (∨-E), and (⊥-E), you can see that they are precisely analogous.

    (in₁)   Γ ⊢ M : A              (in₂)   Γ ⊢ M : B
           ────────────────               ────────────────
           Γ ⊢ in₁M : A + B               Γ ⊢ in₂M : A + B

    (case)  Γ ⊢ M : A + B    Γ, x:A ⊢ N : C    Γ, y:B ⊢ P : C
            ─────────────────────────────────────────────────
                 Γ ⊢ (case M of x^A ⇒ N | y^B ⇒ P) : C

    (□)     Γ ⊢ M : 0
           ─────────────
           Γ ⊢ □_A M : A

    Table 5: Typing rules for sums

But what is the meaning of these new terms? The term in₁M is simply an element of the left component of A + B. We can think of in₁ as the injection function A → A + B. Similarly for in₂. The term (case M of x^A ⇒ N | y^B ⇒ P) is a case distinction: evaluate M of type A + B. The answer is either an element of the left component A or of the right component B. In the first case, assign the answer to the variable x and evaluate N. In the second case, assign the answer to the variable y and evaluate P. Since both N and P are of type C, we get a final result of type C. Note that the case statement is very similar to an if-then-else; the only difference is that the two alternatives also carry a value. Indeed, the booleans can be defined as 1 + 1, in which case T = in₁∗, F = in₂∗, and if then else M N P = case M of x¹ ⇒ N | y¹ ⇒ P, where x and y don't occur in N and P, respectively.

Finally, the term □_A M is a simple type cast, corresponding to the unique function □_A : 0 → A from the empty set to any set A.

6.11 Classical logic vs. intuitionistic logic

We have mentioned before that the natural deduction calculus we have presented corresponds to intuitionistic logic, and not classical logic. But what exactly is the difference? Well, the difference is that in intuitionistic logic, we have no rule for proof by contradiction, and we do not have A ∨ ¬A as an axiom.
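Returning for a moment to the sum types of Section 6.10: the constructors in₁, in₂, and case can be modeled informally in Python as tagged pairs. This is a sketch in our own notation (the tags 'in1'/'in2' and the modeling of ∗ as None are our choices), including the booleans-as-1+1 encoding:

```python
# Sum types A + B as tagged pairs, following in1/in2/case from Table 5.
def in1(m):
    return ('in1', m)

def in2(m):
    return ('in2', m)

def case(m, f, g):
    # case M of x => f(x) | y => g(y)
    tag, v = m
    return f(v) if tag == 'in1' else g(v)

# Booleans as 1 + 1, with the element * of type 1 modeled as None:
T = in1(None)
F = in2(None)

def if_then_else(m, n, p):
    # the two case branches ignore their (unit) value, as in the text
    return case(m, lambda _: n, lambda _: p)
```

For instance, `case(in1(3), lambda x: x + 1, lambda y: y * 2)` takes the left branch and yields `4`, while `if_then_else(F, 'yes', 'no')` yields `'no'`.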
Let us adopt the following convention for negation: the formula ¬A ("not A") is regarded as an abbreviation for A → ⊥. This way, we do not have to introduce special formulas and rules for negation; we simply use the existing rules for → and ⊥.

In intuitionistic logic, there is no derivation of A ∨ ¬A, for general A. Or equivalently, in the simply-typed lambda calculus, there is no closed term of type A + (A → 0). We are not yet in a position to prove this formally, but informally, the argument goes as follows: if the type A is empty, then there can be no closed term of type A (otherwise A would have that term as an element). On the other hand, if the type A is non-empty, then there can be no closed term of type A → 0 (otherwise, if we applied that term to some element of A, we would obtain an element of 0). But if we were to write a generic term of type A + (A → 0), then this term would have to work no matter what A is. Thus, the term would have to decide whether to use the left or right component independently of A. But for any such term, we can get a contradiction by choosing A either empty or non-empty.

Closely related is the fact that in intuitionistic logic, we do not have a principle of proof by contradiction. The "proof by contradiction" rule is the following:

       (contra_x)  Γ, x:¬A ⊢ ⊥
                   ───────────
                      Γ ⊢ A

This is not a rule of intuitionistic propositional logic, but we can explore what would happen if we were to add such a rule. First, we observe that the contradiction rule is very similar to the following:

       Γ, x:A ⊢ ⊥
       ──────────
         Γ ⊢ ¬A

However, since we defined ¬A to be the same as A → ⊥, the latter rule is an instance of (→-I). The contradiction rule, on the other hand, is not an instance of (→-I). If we admit the rule (contra), then A ∨ ¬A can be derived.
The following is such a derivation, written as a sequence of numbered steps:

1. y:¬(A∨¬A), x:A ⊢ ¬(A∨¬A)       (ax_y)
2. y:¬(A∨¬A), x:A ⊢ A             (ax_x)
3. y:¬(A∨¬A), x:A ⊢ A ∨ ¬A        (∨-I₁, from 2)
4. y:¬(A∨¬A), x:A ⊢ ⊥             (→-E, from 1 and 3)
5. y:¬(A∨¬A) ⊢ ¬A                 (→-I_x, from 4)
6. y:¬(A∨¬A) ⊢ A ∨ ¬A             (∨-I₂, from 5)
7. y:¬(A∨¬A) ⊢ ¬(A∨¬A)            (ax_y)
8. y:¬(A∨¬A) ⊢ ⊥                  (→-E, from 7 and 6)
9. ⊢ A ∨ ¬A                       (contra_y, from 8)

Conversely, if we added A ∨ ¬A as an axiom to intuitionistic logic, then this would already imply the (contra) rule. Namely, from any derivation of Γ, x:¬A ⊢ ⊥, we can obtain a derivation of Γ ⊢ A by using A ∨ ¬A as an axiom. Thus, we can simulate the (contra) rule in the presence of A ∨ ¬A:

1. Γ ⊢ A ∨ ¬A                     (the axiom "excluded middle")
2. Γ, y:A ⊢ A                     (ax_y)
3. Γ, x:¬A ⊢ ⊥                    (the given derivation)
4. Γ, x:¬A ⊢ A                    (⊥-E, from 3)
5. Γ ⊢ A                          (∨-E, from 1, 2, and 4)

In this sense, we can say that the rule (contra) and the axiom A ∨ ¬A are equivalent, in the presence of the other axioms and rules of intuitionistic logic. It turns out that the system of intuitionistic logic plus (contra) is equivalent to classical logic as we know it. It is in this sense that we can say that intuitionistic logic is "classical logic without proofs by contradiction".

Exercise 32. The formula ((A → B) → A) → A is called "Peirce's law". It is valid in classical logic, but not in intuitionistic logic. Give a proof of Peirce's law in natural deduction, using the rule (contra).

Conversely, Peirce's law, when added to intuitionistic logic for all A and B, implies (contra). Here is the proof. Recall that ¬A is an abbreviation for A → ⊥.

1. Γ ⊢ ((A → ⊥) → A) → A          (Peirce's law for B = ⊥)
2. Γ, x:A → ⊥ ⊢ ⊥                 (the given derivation)
3. Γ, x:A → ⊥ ⊢ A                 (⊥-E, from 2)
4. Γ ⊢ (A → ⊥) → A                (→-I_x, from 3)
5. Γ ⊢ A                          (→-E, from 1 and 4)

We summarize the results of this section in a slogan:

      intuitionistic logic + (contra)
    = intuitionistic logic + "A ∨ ¬A"
    = intuitionistic logic + Peirce's law
    = classical logic.
The proof theory of intuitionistic logic is a very interesting subject in its own right, and an entire course could be taught just on that subject.

6.12 Classical logic and the Curry-Howard isomorphism

To extend the Curry-Howard isomorphism to classical logic, according to the observations of the previous section, it is sufficient to add to the lambda calculus a term representing Peirce's law. All we have to do is to add a term

    C : ((A → B) → A) → A,

for all types A and B. Such a term is known as Felleisen's C, and it has a specific interpretation in terms of programming languages. It can be understood as a control operator (similar to "goto", "break", or exception handling in some procedural programming languages).

Specifically, Felleisen's interpretation requires a term of the form M = C(λk^(A→B).N) : A to be evaluated as follows. To evaluate M, first evaluate N. Note that both M and N have type A. If N returns a result, then this immediately becomes the result of M as well. On the other hand, if during the evaluation of N, the function k is ever called with some argument x : A, then the further evaluation of N is aborted, and x immediately becomes the result of M.

In other words, the final result of M can be calculated anywhere inside N, no matter how deeply nested, by passing it to k as an argument. The function k is known as a continuation.

There is a lot more to programming with continuations than can be explained in these lecture notes. For an interesting application of continuations to compiling, see e.g. [9] from the bibliography (Section 15).

The above explanation of what it means to "evaluate" the term M glosses over several details. In particular, we have not given a reduction rule for C in the style of β-reduction. To do so is rather complicated and is beyond the scope of these notes.
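The evaluation behaviour just described can be sketched with Python exceptions standing in for the control transfer. This is only a rough model of the "escaping" behaviour (it is not a faithful reduction semantics, and it would need refinement for nested uses of C); the names are ours:

```python
# A sketch of Felleisen's C: evaluate f(k); if k is ever called with x,
# abort the evaluation of f's body and return x instead.
class _Escape(Exception):
    def __init__(self, value):
        self.value = value

def C(f):
    def k(x):
        # calling the continuation aborts the surrounding computation
        raise _Escape(x)
    try:
        return f(k)          # if N returns normally, that is the result
    except _Escape as e:
        return e.value       # otherwise the argument passed to k is
```

If the body never uses `k`, `C(lambda k: 5)` simply returns `5`. If the body invokes `k`, as in `C(lambda k: 1 + k(42))`, the pending computation `1 + ...` is discarded and the result is `42`, no matter how deeply nested the call to `k` was.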
7 Weak and strong normalization

7.1 Definitions

As we have seen, computing with lambda terms means reducing lambda terms to normal form. By the Church-Rosser theorem, such a normal form is guaranteed to be unique if it exists. But so far, we have paid little attention to the question whether normal forms exist for a given term, and if so, how we need to reduce the term to find a normal form.

Definition. Given a notion of term and a reduction relation, we say that a term M is weakly normalizing if there exists a finite sequence of reductions M → M₁ → ... → Mₙ such that Mₙ is a normal form. We say that M is strongly normalizing if there does not exist an infinite sequence of reductions starting from M, or in other words, if every sequence of reductions starting from M is finite.

Recall the following consequence of the Church-Rosser theorem, which we stated as Corollary 4.2: if M has a normal form N, then M ↠ N. It follows that a term M is weakly normalizing if and only if it has a normal form. This does not imply that every possible way of reducing M leads to a normal form. A term is strongly normalizing if and only if every way of reducing it leads to a normal form in finitely many steps. Consider for example the following terms in the untyped lambda calculus:

1. The term Ω = (λx.xx)(λx.xx) is neither weakly nor strongly normalizing. It does not have a normal form.

2. The term (λx.y)Ω is weakly normalizing, but not strongly normalizing. It reduces to the normal form y, but it also has an infinite reduction sequence.

3. The term (λx.y)((λx.x)(λx.x)) is strongly normalizing. While there are several different ways to reduce this term, they all lead to a normal form in finitely many steps.

4. The term λx.x is strongly normalizing, since it has no reductions, much less an infinite reduction sequence. More generally, every normal form is strongly normalizing.
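Examples 1 and 2 can be checked mechanically with a small normal-order (leftmost-outermost) reducer. The following sketch uses our own tuple representation; its substitution is deliberately naive (safe here only because the example terms reuse no variable names in a capturing way):

```python
def subst(t, x, s):
    # substitute s for free occurrences of x in t; no capture avoidance,
    # which is safe for the specific closed examples below
    tag = t[0]
    if tag == 'var':
        return s if t[1] == x else t
    if tag == 'lam':
        return t if t[1] == x else ('lam', t[1], subst(t[2], x, s))
    return ('app', subst(t[1], x, s), subst(t[2], x, s))

def step(t):
    # one leftmost-outermost beta step, or None if t is a normal form
    if t[0] == 'app':
        f, a = t[1], t[2]
        if f[0] == 'lam':
            return subst(f[2], f[1], a)
        r = step(f)
        if r is not None:
            return ('app', r, a)
        r = step(a)
        return None if r is None else ('app', f, r)
    if t[0] == 'lam':
        r = step(t[2])
        return None if r is None else ('lam', t[1], r)
    return None

w = ('lam', 'x', ('app', ('var', 'x'), ('var', 'x')))
omega = ('app', w, w)                              # Omega steps to itself
ky_omega = ('app', ('lam', 'x', ('var', 'y')), omega)  # (lambda x. y) Omega
```

Here `step(omega)` returns `omega` itself, so the leftmost-outermost strategy loops forever on Ω, while `step(ky_omega)` returns the normal form `('var', 'y')` in a single step. A strategy that insisted on reducing the argument Ω first would never terminate on `ky_omega`, which is exactly the gap between weak and strong normalization.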
We see immediately that strongly normalizing implies weakly normalizing. However, as the above examples show, the converse is not true.

7.2 Weak and strong normalization in typed lambda calculus

We found that the term Ω = (λx.xx)(λx.xx) is neither weakly nor strongly normalizing. On the other hand, we also know that this term is not typable in the simply-typed lambda calculus. This is not a coincidence, as the following theorems show.

Theorem 7.1 (Weak normalization theorem). In the simply-typed lambda calculus, all terms are weakly normalizing.

Theorem 7.2 (Strong normalization theorem). In the simply-typed lambda calculus, all terms are strongly normalizing.

Clearly, the strong normalization theorem implies the weak normalization theorem. However, the weak normalization theorem is much easier to prove, which is the reason we proved both these theorems in class. In particular, the proof of the weak normalization theorem gives an explicit measure of the complexity of a term, in terms of the number of redexes of a certain degree in the term. There is no corresponding complexity measure in the proof of the strong normalization theorem. Please refer to Chapters 4 and 6 of "Proofs and Types" by Girard, Lafont, and Taylor [2] for the proofs of Theorems 7.1 and 7.2, respectively.

8 Polymorphism

The polymorphic lambda calculus, also known as "System F", is obtained by extending the Curry-Howard isomorphism to the quantifier ∀. For example, consider the identity function λx^A.x. This function has type A → A. Another identity function is λx^B.x, of type B → B, and so forth for every type. We can thus think of the identity function as a family of functions, one for each type. In the polymorphic lambda calculus, there is a dedicated syntax for such families, and we write Λα.λx^α.x, of type ∀α.α → α.
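As a loose analogy (not from the notes), Python's generics can play the role of the quantifier: a single definition is checked once but works uniformly at every type, roughly mirroring Λα.λx^α.x : ∀α.α → α, with the "type application" step left implicit since Python erases types at run time:

```python
# The polymorphic identity as a Python generic function; the TypeVar
# 'alpha' stands in for the universally quantified type variable.
from typing import TypeVar

alpha = TypeVar('alpha')

def identity(x: alpha) -> alpha:
    # one definition, usable at every instantiating type
    return x
```

Here `identity(3)` and `identity("a")` both typecheck, at the instances int → int and str → str respectively.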
System F was independently discovered by Jean-Yves Girard and John Reynolds in the early 1970s.

8.1 Syntax of System F

The primary difference between System F and the simply-typed lambda calculus is that System F has a new kind of function that takes a type, rather than a term, as its argument. We can also think of such a function as a family of terms that is indexed by a type.

Let α, β, γ range over a countable set of type variables. The types of System F are given by the grammar

    Types: A, B ::= α | A → B | ∀α.A

A type of the form A → B is called a function type, and a type of the form ∀α.A is called a universal type. The type variable α is bound in ∀α.A, and we identify types up to renaming of bound variables; thus, ∀α.α → α and ∀β.β → β are the same type. We write FTV(A) for the set of free type variables of a type A, defined inductively by:

• FTV(α) = {α},
• FTV(A → B) = FTV(A) ∪ FTV(B),
• FTV(∀α.A) = FTV(A) \ {α}.

    (var)      Γ, x : A ⊢ x : A

    (app)      Γ ⊢ M : A → B    Γ ⊢ N : A
               ──────────────────────────
                      Γ ⊢ M N : B

    (abs)      Γ, x : A ⊢ M : B
               ────────────────────
               Γ ⊢ λx^A.M : A → B

    (typeapp)  Γ ⊢ M : ∀α.A
               ───────────────────
               Γ ⊢ M B : A[B/α]

    (typeabs)  Γ ⊢ M : A    α ∉ FTV(Γ)
               ────────────────────────
               Γ ⊢ Λα.M : ∀α.A

    Table 6: Typing rules for System F

We also write A[B/α] for the result of replacing all free occurrences of α by B in A. Just like the substitution of terms (see Section 2.3), this type substitution must be capture-free, i.e., special care must be taken to rename any bound variables of A so that their names are different from the free variables of B.

The terms of System F are:

    Terms: M, N ::= x | M N | λx^A.M | M A | Λα.M

Of these, variables x, applications M N, and lambda abstractions λx^A.M are exactly as for the simply-typed lambda calculus.
The new terms are type application M A, which is the application of a type function M to a type A, and type abstraction Λα.M, which denotes the type function that maps a type α to a term M. The typing rules for System F are shown in Table 6. We also write FTV(M) for the set of free type variables in the term M. We need a final notion of substitution: if M is a term, B a type, and α a type variable, we write M[B/α] for the capture-free substitution of B for α in M.

8.2 Reduction rules

In System F, there are two rules for β-reduction. The first one is the familiar rule for the application of a function to a term. The second one is an analogous rule for the application of a type function to a type.

    (β→)  (λx^A.M) N → M[N/x],
    (β∀)  (Λα.M) A → M[A/α].

Similarly, there are two rules for η-reduction:

    (η→)  λx^A.M x → M,  if x ∉ FV(M),
    (η∀)  Λα.M α → M,    if α ∉ FTV(M).

The congruence and ξ-rules are as expected:

     M → M′          N → N′           M → M′
    ──────────     ──────────    ─────────────────
    M N → M′ N     M N → M N′    λx^A.M → λx^A.M′

     M → M′          M → M′
    ──────────    ─────────────
    M A → M′ A    Λα.M → Λα.M′

8.3 Examples

Just as in the untyped lambda calculus, many interesting data types and operations can be encoded in System F.

8.3.1 Booleans

Define the System F type bool, and terms T, F : bool, as follows:

    bool = ∀α.α → α → α,
    T = Λα.λx^α.λy^α.x,
    F = Λα.λx^α.λy^α.y.

It is easy to see from the typing rules that ⊢ T : bool and ⊢ F : bool are valid typing judgements. We can define an if-then-else operation

    if_then_else : ∀β.bool → β → β → β,
    if_then_else = Λβ.λz^bool.z β.

It is then easy to see that, for any type B and any pair of terms M, N : B, we have

    if_then_else B T M N ↠β M,
    if_then_else B F M N ↠β N.

Once we have if-then-else, it is easy to define other boolean operations, for example

    and = λa^bool.λb^bool. if_then_else bool a b F,
    or  = λa^bool.λb^bool. if_then_else bool a T b,
    not = λa^bool.
if_then_else bool a F T.

Later, in Proposition 8.8, we will show that up to βη-equality, T and F are the only closed terms of type bool. This, together with the if-then-else operation, justifies calling this the type of booleans.

Note that the above encodings of the booleans and their if-then-else operation in System F are exactly the same as the corresponding encodings in the untyped lambda calculus from Section 3.1, provided that one erases all the types and type abstractions. However, there is an important difference: in the untyped lambda calculus, the booleans were just two terms among many, and there was no guarantee that the argument of a boolean function (such as and and or) was actually a boolean. In System F, the typing guarantees that all closed boolean terms eventually reduce to either T or F.

8.3.2 Natural numbers

We can also define a type of Church numerals in System F. We define:

    nat = ∀α.(α → α) → α → α,
    0 = Λα.λf^{α→α}.λx^α.x,
    1 = Λα.λf^{α→α}.λx^α.f x,
    2 = Λα.λf^{α→α}.λx^α.f (f x),
    3 = Λα.λf^{α→α}.λx^α.f (f (f x)),
    ...

It is then easy to define simple functions, such as successor, addition, and multiplication:

    succ = λn^nat.Λα.λf^{α→α}.λx^α.f (n α f x),
    add  = λn^nat.λm^nat.Λα.λf^{α→α}.λx^α.n α f (m α f x),
    mult = λn^nat.λm^nat.Λα.λf^{α→α}.n α (m α f).

Just as for the booleans, these encodings of the Church numerals and functions are exactly the same as those of the untyped lambda calculus from Section 3.2, if one erases all the types and type abstractions. We will show in Proposition 8.9 below that the Church numerals are, up to βη-equivalence, the only closed terms of type nat.

8.3.3 Pairs

You will have noticed that we didn't include a cartesian product type A × B in the definition of System F. This is because such a type is definable. Specifically, let

    A × B = ∀α.
(A → B → α) → α,
    ⟨M, N⟩ = Λα.λf^{A→B→α}.f M N.

Note that when M : A and N : B, then ⟨M, N⟩ : A × B. Moreover, for any pair of types A, B, we have projection functions π1^{AB} : A × B → A and π2^{AB} : A × B → B, defined by

    π1 = Λα.Λβ.λp^{α×β}.p α (λx^α.λy^β.x),
    π2 = Λα.Λβ.λp^{α×β}.p β (λx^α.λy^β.y).

These satisfy the usual laws

    π1^{AB} ⟨M, N⟩ ↠β M,
    π2^{AB} ⟨M, N⟩ ↠β N.

Once again, these encodings of pairs and projections are exactly the same as those we used in the untyped lambda calculus, when one erases all the type-related parts of the terms. You will show in Exercise 36 that every closed term of type A × B is βη-equivalent to a term of the form ⟨M, N⟩.

Remark 8.1. It is also worth noting that the corresponding η-laws, such as ⟨π1 M, π2 M⟩ = M, are not derivable in System F. These laws hold whenever M is a closed term, but not necessarily when M contains free variables.

Exercise 33. Find suitable encodings in System F of the types 1, A + B, and 0, along with the corresponding terms ∗, in1, in2, case M of x^A ⇒ N | y^B ⇒ P, and □A M.

8.4 Church-Rosser property and strong normalization

Theorem 8.2 (Church-Rosser). System F satisfies the Church-Rosser property, both for β-reduction and for βη-reduction.

Theorem 8.3 (Strong normalization). In System F, all terms are strongly normalizing.

The proof of the Church-Rosser property is similar to that of the simply-typed lambda calculus, and is left as an exercise. The proof of strong normalization is much more complex; it can be found in Chapter 14 of "Proofs and Types" [2].

8.5 The Curry-Howard isomorphism

From the point of view of the Curry-Howard isomorphism, ∀α.A is the universally quantified logical statement "for all α, A is true". Here α ranges over atomic propositions. For example, the formula ∀α.
∀β.α → (β → α) expresses the valid fact that the implication α → (β → α) is true for all propositions α and β. Since this quantifier ranges over propositions, it is called a second-order quantifier, and the corresponding logic is second-order propositional logic.

Under the Curry-Howard isomorphism, the typing rules for System F become the following logical rules:

• (Axiom)

    (ax_x)   Γ, x : A ⊢ A

• (→-introduction)

    (→-I_x)  Γ, x : A ⊢ B
             ─────────────
              Γ ⊢ A → B

• (→-elimination)

    (→-E)    Γ ⊢ A → B    Γ ⊢ A
             ───────────────────
                   Γ ⊢ B

• (∀-introduction)

    (∀-I)    Γ ⊢ A    α ∉ FTV(Γ)
             ────────────────────
                  Γ ⊢ ∀α.A

• (∀-elimination)

    (∀-E)    Γ ⊢ ∀α.A
             ─────────────
             Γ ⊢ A[B/α]

The first three of these rules are familiar from propositional logic. The ∀-introduction rule is also known as universal generalization. It corresponds to a well-known logical reasoning principle: if a statement A has been proven for some arbitrary α, then it follows that it holds for all α. The requirement that α is "arbitrary" has been formalized in the logic by requiring that α does not appear in any of the hypotheses that were used to derive A, or in other words, that α is not among the free type variables of Γ.

The ∀-elimination rule is also known as universal specialization. It is the simple principle that if some statement is true for all propositions α, then the same statement is true for any particular proposition B. Note that, unlike the ∀-introduction rule, this rule does not require a side condition.

Finally, we note that the side condition in the ∀-introduction rule is of course the same as that of the typing rule (typeabs) of Table 6. From the point of view of logic, the side condition is justified because it asserts that α is "arbitrary", i.e., no assumptions have been made about it. From a lambda calculus view, the side condition also makes sense: otherwise, the term λx^α.
Λα.x would be well-typed of type α → ∀α.α, which clearly does not make any sense: there is no way that an element x of some fixed type α could suddenly become an element of an arbitrary type.

8.6 Supplying the missing logical connectives

It turns out that a logic with only implication → and a second-order universal quantifier ∀ is sufficient for expressing all the other usual logical connectives, for example:

    A ∧ B  ⟺  ∀α.(A → B → α) → α,               (1)
    A ∨ B  ⟺  ∀α.(A → α) → (B → α) → α,          (2)
    ¬A     ⟺  ∀α.A → α,                          (3)
    ⊤      ⟺  ∀α.α → α,                          (4)
    ⊥      ⟺  ∀α.α,                              (5)
    ∃β.A   ⟺  ∀α.(∀β.(A → α)) → α.               (6)

Exercise 34. Using informal intuitionistic reasoning, prove that the left-hand side is logically equivalent to the right-hand side for each of (1)–(6).

Remark 8.4. The definitions (1)–(6) are somewhat reminiscent of De Morgan's laws and double negations. Indeed, if we replace the type variable α by the constant F in (1), the right-hand side becomes (A → B → F) → F, which is intuitionistically equivalent to ¬¬(A ∧ B). Similarly, the right-hand side of (2) becomes (A → F) → (B → F) → F, which is intuitionistically equivalent to ¬(¬A ∧ ¬B), and similarly for the remaining connectives. However, the versions of (1), (2), and (6) using F are only classically, but not intuitionistically, equivalent to their respective left-hand sides. On the other hand, it is remarkable that by the use of ∀α, each right-hand side is intuitionistically equivalent to its left-hand side.

Remark 8.5. Note the resemblance between (1) and the definition of A × B given in Section 8.3.3. Naturally, this is not a coincidence, as logical conjunction A ∧ B should correspond to cartesian product A × B under the Curry-Howard correspondence. Indeed, by applying the same principle to the other logical connectives, one arrives at a good hint for Exercise 33.

Exercise 35.
Extend System F with an existential quantifier ∃β.A, not by using (6), but by adding a new type with explicit introduction and elimination rules to the language. Justify the resulting rules by comparing them with the usual rules of mathematical reasoning for "there exists". Can you explain the meaning of the type ∃β.A from a programming language or lambda calculus point of view?

8.7 Normal forms and long normal forms

Recall that a β-normal form of System F is, by definition, a term that contains no β-redex, i.e., no subterm of the form (λx^A.M) N or (Λα.M) A. The following proposition gives another useful way to characterize the β-normal forms.

Proposition 8.6 (Normal forms). A term of System F is a β-normal form if and only if it is of the form

    Λa1.Λa2. ... Λan.z Q1 Q2 ... Qk,    (7)

where:

• n ≥ 0 and k ≥ 0;
• each Λai is either a lambda abstraction λx_i^{A_i} or a type abstraction Λα_i;
• each Qj is either a term Mj or a type Bj; and
• each Qj, when it is a term, is recursively in normal form.

Proof. First, it is clear that every term of the form (7) is in normal form: the term cannot itself be a redex, and the only place where a redex could occur is inside one of the Qj, but these are assumed to be normal.

For the converse, consider a term M in β-normal form. We show that M is of the form (7) by induction on M.

• If M = z is a variable, then it is of the form (7) with n = 0 and k = 0.

• If M = N P is normal, then N is normal, so by induction hypothesis, N is of the form (7). But since N P is normal, N cannot be a lambda abstraction, so we must have n = 0. It follows that N P = z Q1 Q2 ... Qk P is itself of the form (7).

• If M = λx^A.N is normal, then N is normal, so by induction hypothesis, N is of the form (7). It follows immediately that λx^A.N is also of the form (7).

• The case for M = N A is like the case for M = N P.
• The case for M = Λα.N is like the case for M = λx^A.N.

Definition. In a term of the form (7), the variable z is called the head variable of the term.

Of course, by the Church-Rosser property together with strong normalization, it follows that every term of System F is β-equivalent to a unique β-normal form, which must then be of the form (7). On the other hand, the normal forms (7) are not unique up to η-conversion; for example, λx^{A→B}.x and λx^{A→B}.λy^A.x y are η-equivalent terms and are both of the form (7). In order to achieve uniqueness up to βη-conversion, we introduce the notion of a long normal form.

Definition. A term of System F is a long normal form if

• it is of the form (7);
• the body z Q1 ... Qk is of atomic type (i.e., its type is a type variable); and
• each Qj, when it is a term, is recursively in long normal form.

Proposition 8.7. Every term of System F is βη-equivalent to a unique long normal form.

Proof. By strong normalization and the Church-Rosser property of β-reduction, we already know that every term is β-equivalent to a unique β-normal form. It therefore suffices to show that every β-normal form is η-equivalent to a unique long normal form.

We first show that every β-normal form is η-equivalent to some long normal form. We prove this by induction. Indeed, consider a β-normal form of the form (7). By induction hypothesis, each of Q1, ..., Qk can be η-converted to long normal form. Now we proceed by induction on the type A of z Q1 ... Qk. If A = α is atomic, then the normal form is already long, and there is nothing to show. If A = B → C, then we can η-expand (7) to

    Λa1.Λa2. ... Λan.λw^B.z Q1 Q2 ... Qk w

and proceed by the inner induction hypothesis. If A = ∀α.B, then we can η-expand (7) to

    Λa1.Λa2. ... Λan.Λα.z Q1 Q2 ... Qk α

and proceed by the inner induction hypothesis.
For uniqueness, we must show that no two different long normal forms can be βη-equivalent to each other. We leave this as an exercise.

8.8 The structure of closed normal forms

It is a remarkable fact that if M is in long normal form, then a lot of the structure of M is completely determined by its type. Specifically: if the type of M is atomic, then M must start with a head variable. If the type of M is of the form B → C, then M must be, up to α-equivalence, of the form λx^B.N, where N is a long normal form of type C. And if the type of M is of the form ∀α.C, then M must be, up to α-equivalence, of the form Λα.N, where N is a long normal form of type C.

So for example, consider the type

    A = B1 → B2 → ∀α3.B4 → ∀α5.β.

We say that this type has five prefixes, where each prefix is of the form "Bi →" or "∀αi.". Therefore, every long normal form of type A must also start with five prefixes; specifically, it must start with

    λx1^{B1}.λx2^{B2}.Λα3.λx4^{B4}.Λα5. ...

The next part of the long normal form is a choice of head variable. If the term is closed, the head variable must be one of x1, x2, or x4. Once the head variable has been chosen, then its type determines how many arguments Q1, ..., Qk the head variable must be applied to, and the types of these arguments. The structure of each of Q1, ..., Qk is then recursively determined by its type, with its own choice of head variable, which then recursively determines its subterms, and so on. In other words, the degree of freedom in a long normal form is a choice of head variable at each level. This choice of head variables completely determines the long normal form.

Perhaps the preceding discussion can be made more comprehensible by means of some concrete examples. The examples take the form of the following propositions and their proofs.

Proposition 8.8.
Every closed term of type bool is βη-equivalent to either T or F.

Proof. Let M be a closed term of type bool. By Proposition 8.7, we may assume that M is a long normal form. Since bool = ∀α.α → α → α, every long normal form of this type must start, up to α-equivalence, with

    Λα.λx^α.λy^α. ...

This must be followed by a head variable, which, since M is closed, can only be x or y. Since both x and y have atomic type, neither of them can be applied to further arguments, and therefore, the only two possible long normal forms are

    Λα.λx^α.λy^α.x    and    Λα.λx^α.λy^α.y,

which are T and F, respectively.

Proposition 8.9. Every closed term of type nat is βη-equivalent to a Church numeral n, for some n ∈ ℕ.

Proof. Let M be a closed term of type nat. By Proposition 8.7, we may assume that M is a long normal form. Since nat = ∀α.(α → α) → α → α, every long normal form of this type must start, up to α-equivalence, with

    Λα.λf^{α→α}.λx^α. ...

This must be followed by a head variable, which, since M is closed, can only be x or f. If the head variable is x, then it takes no argument, and we have

    M = Λα.λf^{α→α}.λx^α.x.

If the head variable is f, then it takes exactly one argument, so M is of the form

    M = Λα.λf^{α→α}.λx^α.f Q1.

Because Q1 has type α, its own long normal form has no prefix; therefore Q1 must start with a head variable, which must again be x or f. If Q1 = x, we have

    M = Λα.λf^{α→α}.λx^α.f x.

If Q1 has head variable f, then we have Q1 = f Q2, and proceeding in this manner, we find that M has to be of the form

    M = Λα.λf^{α→α}.λx^α.f (f (... (f x) ...)),

i.e., a Church numeral.

Exercise 36. Prove that every closed term of type A × B is βη-equivalent to a term of the form ⟨M, N⟩, where M : A and N : B.
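Propositions 8.8 and 8.9 can be checked experimentally on the erased encodings. With all types and type abstractions erased, the terms of Sections 8.3.1–8.3.3 are ordinary untyped lambda terms, which Python lambdas represent directly. This is an illustrative sketch only; the helper to_int is not part of the encoding, just a convenient way to read off a numeral:

```python
# Booleans (Section 8.3.1), with types and type abstractions erased.
T = lambda x: lambda y: x
F = lambda x: lambda y: y
if_then_else = lambda z: z            # the type argument β is erased

assert if_then_else(T)("M")("N") == "M"
assert if_then_else(F)("M")("N") == "N"

# Church numerals and arithmetic (Section 8.3.2), erased.
zero = lambda f: lambda x: x
succ = lambda n: lambda f: lambda x: f(n(f)(x))
add  = lambda n: lambda m: lambda f: lambda x: n(f)(m(f)(x))
mult = lambda n: lambda m: lambda f: n(m(f))

to_int = lambda n: n(lambda k: k + 1)(0)   # helper: evaluate a numeral to an int
two, three = succ(succ(zero)), succ(succ(succ(zero)))

assert to_int(add(two)(three)) == 5
assert to_int(mult(two)(three)) == 6

# Pairs (Section 8.3.3), erased.
pair = lambda m: lambda n: lambda f: f(m)(n)
pi1  = lambda p: p(lambda x: lambda y: x)
pi2  = lambda p: p(lambda x: lambda y: y)

assert pi1(pair(1)(2)) == 1 and pi2(pair(1)(2)) == 2
print("all encoding laws check out")
```

Of course, unlike in System F, nothing in Python prevents applying if_then_else to a non-boolean; it is precisely the typing discipline that rules this out.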
8.9 Application: representation of arbitrary data in System F

Let us consider the definition of a long normal form one more time. By definition, every long normal form is of the form

    Λa1.Λa2. ... Λan.z Q1 Q2 ... Qk,    (8)

where z Q1 Q2 ... Qk has atomic type and Q1, ..., Qk are, recursively, long normal forms. Instead of writing the long normal form on a single line as in (8), let us write it in tree form instead:

    Λa1.Λa2. ... Λan.z
      /     |      \
    Q1     Q2 ...   Qk

where the long normal forms Q1, ..., Qk are recursively also written as trees. For example, with this notation, the Church numeral 2 becomes

    Λα.λf^{α→α}.λx^α.f
            |
            f
            |
            x                (9)

and the pair ⟨M, N⟩ becomes

    Λα.λf^{A→B→α}.f
         /   \
        M     N

We can use this very idea to encode (almost) arbitrary data structures. For example, suppose that the data structure we wish to encode is a binary tree whose leaves are labelled by natural numbers. Let's call such a thing a leaf-labelled binary tree. Here is an example:

        •
       / \
      5   •
         / \
        8   7                (10)

In general, every leaf-labelled binary tree is either a leaf, which is labelled by a natural number, or else a branch that has exactly two children (a left one and a right one), each of which is a leaf-labelled binary tree. Written as a BNF, we have the following grammar for leaf-labelled binary trees:

    Tree: T, S ::= leaf(n) | branch(T, S).

When translating this as a System F type, we think along the lines of long normal forms. We need a type variable α to represent leaf-labelled binary trees. We need two head variables whose type ends in α: the first head variable, let's call it ℓ, represents a leaf, and takes a single argument that is a natural number. Thus ℓ : nat → α. The second head variable, let's call it b, represents a branch, and takes two arguments that are leaf-labelled binary trees.
Thus b : α → α → α. We end up with the following System F type:

    tree = ∀α.(nat → α) → (α → α → α) → α.

A typical long normal form of this type is

    Λα.λℓ^{nat→α}.λb^{α→α→α}.b
               /   \
              ℓ     b
              |    / \
              5   ℓ   ℓ
                  |   |
                  8   7

where 5, 7, and 8 denote Church numerals as in (9), here not expanded into long normal form for brevity. Notice how closely this long normal form follows (10). Here is the same term written on a single line:

    Λα.λℓ^{nat→α}.λb^{α→α→α}.b (ℓ 5) (b (ℓ 8) (ℓ 7)).

Exercise 37. Prove that the closed long normal forms of type tree are in one-to-one correspondence with leaf-labelled binary trees.

9 Type inference

In Section 6, we introduced the simply-typed lambda calculus, and we discussed what it means for a term to be well-typed. We have also asked the question, for a given term, whether it is typable or not. In this section, we will discuss an algorithm that decides, given a term, whether it is typable or not, and if the answer is yes, also outputs a type for the term. Such an algorithm is known as a type inference algorithm.

A weaker kind of algorithm is a type checking algorithm. A type checking algorithm takes as its input a term with full type annotations, as well as the types of any free variables, and it decides whether the term is well-typed or not. Thus, a type checking algorithm does not infer any types; the type must be given to it as an input, and the algorithm merely checks whether the type is legal.

Many compilers of programming languages include a type checker, and programs that are not well-typed are typically refused. The compilers of some programming languages, such as ML or Haskell, go one step further and include a type inference algorithm. This allows programmers to write programs with no or very few type annotations, and the compiler will figure out the types automatically.
This makes the programmer's life much easier, especially in the case of higher-order languages, where types such as ((A → B) → C) → D are not uncommon and would be very cumbersome to write down. However, in the event that type inference fails, it is not always easy for the compiler to issue a meaningful error message that can help the human programmer fix the problem. Often, at least a basic understanding of how the type inference algorithm works is necessary for programmers to understand these error messages.

9.1 Principal types

A simply-typed lambda term can have more than one possible type. Suppose that we have three basic types ι1, ι2, ι3 in our type system. Then the following are all valid typing judgments for the term λx.λy.y x:

    ⊢ λx^{ι1}.λy^{ι1→ι1}.y x : ι1 → (ι1 → ι1) → ι1,
    ⊢ λx^{ι2→ι3}.λy^{(ι2→ι3)→ι3}.y x : (ι2 → ι3) → ((ι2 → ι3) → ι3) → ι3,
    ⊢ λx^{ι1}.λy^{ι1→ι3}.y x : ι1 → (ι1 → ι3) → ι3,
    ⊢ λx^{ι1}.λy^{ι1→ι3→ι2}.y x : ι1 → (ι1 → ι3 → ι2) → ι3 → ι2,
    ⊢ λx^{ι1}.λy^{ι1→ι1→ι1}.y x : ι1 → (ι1 → ι1 → ι1) → ι1 → ι1.

What all these typing judgments have in common is that they are of the form

    ⊢ λx^A.λy^{A→B}.y x : A → (A → B) → B,

for certain types A and B. In fact, as we will see, every possible type of the term λx.λy.y x is of this form. We also say that A → (A → B) → B is the most general type or the principal type of this term, where A and B are placeholders for arbitrary types.

The existence of a most general type is not a peculiarity of the term λx.λy.y x, but it is true of the simply-typed lambda calculus in general: every typable term has a most general type. This statement is known as the principal type property. We will see that our type inference algorithm not only calculates a possible type for a term, but in fact it calculates the most general type, if any type exists at all.
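The principal type of λx.λy.y x can also be computed mechanically. The sketch below is a compact, illustrative version of unification-based inference for pure lambda terms only (no pairs or unit, and no annotations, so the abstraction case invents a fresh template variable for the bound variable). It is not the algorithm of this section verbatim: in particular, it accumulates a substitution in a dictionary instead of composing substitutions explicitly, and all names (fresh, unify, infer, show) are ad hoc:

```python
import itertools

# Type templates: a string is a template variable; ("->", A, B) is a
# function type. A substitution is a dict from variables to templates.
_counter = itertools.count()

def fresh():
    return f"X{next(_counter)}"

def resolve(t, subst):
    """Follow substitution links until t is not a bound variable."""
    while isinstance(t, str) and t in subst:
        t = subst[t]
    return t

def occurs(x, t, subst):
    """Occurs check: does variable x occur in template t?"""
    t = resolve(t, subst)
    if t == x:
        return True
    return isinstance(t, tuple) and (occurs(x, t[1], subst) or occurs(x, t[2], subst))

def unify(a, b, subst):
    """Extend subst to a most general unifier of a and b, or raise TypeError."""
    a, b = resolve(a, subst), resolve(b, subst)
    if a == b:
        return subst
    if isinstance(a, str):
        if occurs(a, b, subst):
            raise TypeError("occurs check failed: no unifier")
        return {**subst, a: b}
    if isinstance(b, str):
        return unify(b, a, subst)
    return unify(a[2], b[2], unify(a[1], b[1], subst))

def infer(ctx, term, ty, subst):
    """Make ctx ⊢ term : ty valid by extending subst, or raise TypeError."""
    if term[0] == "var":
        return unify(ctx[term[1]], ty, subst)
    if term[0] == "app":          # term = M N
        x = fresh()
        subst = infer(ctx, term[1], ("->", x, ty), subst)
        return infer(ctx, term[2], x, subst)
    if term[0] == "lam":          # term = λx.M, with no type annotation
        x, y = fresh(), fresh()
        subst = unify(ty, ("->", x, y), subst)
        return infer({**ctx, term[1]: x}, term[2], y, subst)

def show(t, subst):
    t = resolve(t, subst)
    return t if isinstance(t, str) else f"({show(t[1], subst)} -> {show(t[2], subst)})"

# Principal type of λx.λy.y x:
term = ("lam", "x", ("lam", "y", ("app", ("var", "y"), ("var", "x"))))
root = fresh()
s = infer({}, term, root, {})
print(show(root, s))  # (X5 -> ((X5 -> X4) -> X4)), i.e. A -> ((A -> B) -> B)
```

Up to renaming of the remaining template variables X5 and X4, this is exactly the principal type A → (A → B) → B found above.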
In fact, we will prove the principal type property by closely examining the type inference algorithm.

9.2 Type templates and type substitutions

In order to formalize the notion of a most general type, we need to be able to speak of types with placeholders.

Definition. Suppose we are given an infinite set of type variables, which we denote by upper case letters X, Y, Z, etc. A type template is a simple type, built from type variables and possibly basic types. Formally, type templates are given by the BNF

    Type templates: A, B ::= X | ι | A → B | A × B | 1

Note that we use the same letters A, B to denote type templates that we previously used to denote types. In fact, from now on, we will simply regard types as special type templates that happen to contain no type variables.

The point of type variables is that they are placeholders (just like any other kind of variables). This means we can replace type variables by arbitrary types, or even by type templates. A type substitution is just such a replacement.

Definition. A type substitution σ is a function from type variables to type templates. We often write [X1 ↦ A1, ..., Xn ↦ An] for the substitution defined by σ(Xi) = Ai for i = 1..n, and σ(Y) = Y if Y ∉ {X1, ..., Xn}.

If σ is a type substitution, and A is a type template, then we define σ̄A, the application of σ to A, as follows by recursion on A:

    σ̄X = σX,
    σ̄ι = ι,
    σ̄(A → B) = σ̄A → σ̄B,
    σ̄(A × B) = σ̄A × σ̄B,
    σ̄1 = 1.

In words, σ̄A is simply the same as A, except that all the type variables have been replaced according to σ.

We are now in a position to formalize what it means for one type template to be more general than another.

Definition. Suppose A and B are type templates. We say that A is more general than B if there exists a type substitution σ such that σ̄A = B.
In other words, we consider A to be more general than B if B can be obtained from A by a substitution. We also say that B is an instance of A. Examples:

• X → Y is more general than X → X.
• X → X is more general than ι → ι.
• X → X is more general than (ι → ι) → (ι → ι).
• Neither of ι → ι and (ι → ι) → (ι → ι) is more general than the other. We say that these types are incomparable.
• X → Y is more general than W → Z, and vice versa. We say that X → Y and W → Z are equally general.

We can also speak of one substitution being more general than another:

Definition. If τ and ρ are type substitutions, we say that τ is more general than ρ if there exists a type substitution σ such that σ̄ ∘ τ = ρ.

9.3 Unifiers

We will be concerned with solving equations between type templates. The basic question is not very different from solving equations in arithmetic: given an equation between expressions, for instance x + y = x², is it possible to find values for x and y that make the equation true? The answer is yes in this case; for instance, x = 2, y = 2 is one solution, and x = 1, y = 0 is another possible solution. We can even give the most general solution, which is x = arbitrary, y = x² − x.

Similarly, for type templates, we might ask whether an equation such as

    X → (X → Y) = (Y → Z) → W

has any solutions. The answer is yes, and one solution, for instance, is X = ι → ι, Y = ι, Z = ι, W = (ι → ι) → ι. But this is not the most general solution; the most general solution, in this case, is

    Y = arbitrary, Z = arbitrary, X = Y → Z, W = (Y → Z) → Y.

We use substitutions to represent the solutions to such equations. For instance, the most general solution to the sample equation from the last paragraph is represented by the substitution

    σ = [X ↦ Y → Z, W ↦ (Y → Z) → Y].
If a substitution σ solves the equation A = B in this way, then we also say that σ is a unifier of A and B.

To give another example, consider the equation

    X × (X → Z) = (Z → Y) × Y.

This equation does not have any solution, because we would have to have both X = Z → Y and Y = X → Z, which implies X = Z → (X → Z), which is impossible to solve in simple types. We also say that X × (X → Z) and (Z → Y) × Y cannot be unified.

In general, we will be concerned with solving not just single equations, but systems of several equations. The formal definition of unifiers and most general unifiers is as follows:

Definition. Given two sequences of type templates Ā = A1, ..., An and B̄ = B1, ..., Bn, we say that a type substitution σ is a unifier of Ā and B̄ if σ̄Ai = σ̄Bi for all i = 1..n. Moreover, we say that σ is a most general unifier of Ā and B̄ if it is a unifier, and if it is more general than any other unifier of Ā and B̄.

9.4 The unification algorithm

Unification is the process of determining a most general unifier. More specifically, unification is an algorithm whose input consists of two sequences of type templates Ā = A1, ..., An and B̄ = B1, ..., Bn, and whose output is either "failure", if no unifier exists, or else a most general unifier σ. We call this algorithm mgu for "most general unifier", and we write mgu(Ā; B̄) for the result of applying the algorithm to Ā and B̄.

Before we state the algorithm, let us note that we only use finitely many type variables, namely the ones that occur in Ā and B̄. In particular, the substitutions generated by this algorithm are finite objects that can be represented and manipulated by a computer.

The algorithm for calculating mgu(Ā; B̄) is as follows. By convention, the algorithm chooses the first applicable clause in the following list.
Note that the algorithm is recursive.

1. mgu(X; X) = id, the identity substitution.
2. mgu(X; B) = [X ↦ B], if X does not occur in B.
3. mgu(X; B) fails, if X occurs in B and B ≠ X.
4. mgu(A; Y) = [Y ↦ A], if Y does not occur in A.
5. mgu(A; Y) fails, if Y occurs in A and A ≠ Y.
6. mgu(ι; ι) = id.
7. mgu(A1 → A2; B1 → B2) = mgu(A1, A2; B1, B2).
8. mgu(A1 × A2; B1 × B2) = mgu(A1, A2; B1, B2).
9. mgu(1; 1) = id.
10. mgu(A; B) fails, in all other cases.
11. mgu(A, Ā; B, B̄) = τ̄ ∘ ρ, where ρ = mgu(Ā; B̄) and τ = mgu(ρ̄A; ρ̄B).

Note that clauses 1–10 calculate the most general unifier of two type templates, whereas clause 11 deals with lists of type templates. Clause 10 is a catch-all clause that fails if none of the earlier clauses apply. In particular, this clause causes the following to fail: mgu(A1 → A2; B1 × B2), mgu(A1 → A2; ι), etc.

Proposition 9.1. If mgu(Ā; B̄) = σ, then σ is a most general unifier of Ā and B̄. If mgu(Ā; B̄) fails, then Ā and B̄ have no unifier.

Proof. First, it is easy to prove by induction on the definition of mgu that if mgu(Ā; B̄) = σ, then σ is a unifier of Ā and B̄. This is evident in all cases except perhaps clause 11: but here, by induction hypothesis, ρ̄Ā = ρ̄B̄ and τ̄(ρ̄A) = τ̄(ρ̄B), hence also τ̄(ρ̄(A, Ā)) = τ̄(ρ̄(B, B̄)). Here we have used the evident notation of applying a substitution to a list of type templates.

Second, we prove that if Ā and B̄ can be unified, then mgu(Ā; B̄) returns a most general unifier. This is again proved by induction. For example, in clause 2, we have σ = [X ↦ B]. Suppose τ is another unifier of X and B. Then τ̄X = τ̄B. We claim that τ̄ ∘ σ = τ. But τ̄(σ(X)) = τ̄(B) = τ̄(X) = τ(X), whereas if Y ≠ X, then τ̄(σ(Y)) = τ̄(Y) = τ(Y).
Hence τ̄ ∘ σ = τ, and it follows that σ is more general than τ. The clauses 1–10 all follow by similar arguments.

For clause 11, suppose that A, Ā and B, B̄ have some unifier σ′. Then σ′ is also a unifier for Ā and B̄, and thus the recursive call returns a most general unifier ρ of Ā and B̄. Since ρ is more general than σ′, we have κ̄ ∘ ρ = σ′ for some substitution κ. But then κ̄(ρ̄A) = σ̄′A = σ̄′B = κ̄(ρ̄B), hence κ̄ is a unifier for ρ̄A and ρ̄B. By induction hypothesis, τ = mgu(ρ̄A; ρ̄B) exists and is a most general unifier for ρ̄A and ρ̄B. It follows that τ is more general than κ̄, thus κ̄′ ∘ τ = κ̄, for some substitution κ′. Finally, we need to show that σ = τ̄ ∘ ρ is more general than σ′. But this follows because κ̄′ ∘ σ = κ̄′ ∘ τ̄ ∘ ρ = κ̄ ∘ ρ = σ′. □

Remark 9.2. Proving that the algorithm mgu terminates is tricky. In particular, termination cannot be proved by induction on the size of the arguments, because in the second recursive call in clause 11, the application of ρ̄ may well increase the size of the arguments. To prove termination, note that each substitution σ generated by the algorithm is either the identity, or else it eliminates at least one variable. We can use this to prove termination by nested induction on the number of variables and on the size of the arguments. We leave the details for another time.

9.5 The type inference algorithm

Given the unification algorithm, type inference is now relatively easy. We formulate another algorithm, typeinfer, which takes a typing judgment Γ ⊢ M : B as its input (using templates instead of types, and not necessarily a valid typing judgment). The algorithm either outputs a most general substitution σ such that σ̄Γ ⊢ M : σ̄B is a valid typing judgment, or if no such σ exists, the algorithm fails.
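Before turning to typeinfer, the unification algorithm of the previous section can be made concrete in code. The following Python sketch is ours, not part of the notes: templates are encoded as nested tuples (a type variable is a string, ι is ("iota",), arrow and product types are ("->", A, B) and ("*", A, B), unit is ("1",)), and a substitution is a dictionary from variable names to templates. The "bar" operation appears as apply_subst, and the composition τ̄ ∘ ρ of clause 11 as compose.

```python
# Sketch of mgu, clauses 1-11. Encoding choices (tuples, dicts) are ours.
class UnifyError(Exception):
    pass

def is_var(t):
    return isinstance(t, str)

def apply_subst(s, t):
    """Extend a substitution from variables to templates (the 'bar' operation)."""
    if is_var(t):
        return s.get(t, t)
    return (t[0],) + tuple(apply_subst(s, a) for a in t[1:])

def compose(tau, sigma):
    """The substitution tau_bar ∘ sigma."""
    result = {x: apply_subst(tau, a) for x, a in sigma.items()}
    for x, a in tau.items():
        result.setdefault(x, a)
    return result

def occurs(x, t):
    if is_var(t):
        return x == t
    return any(occurs(x, a) for a in t[1:])

def mgu(A, B):
    if is_var(A) and is_var(B) and A == B:
        return {}                                       # clause 1
    if is_var(A):
        if occurs(A, B):
            raise UnifyError("occurs check: %s in %r" % (A, B))  # clause 3
        return {A: B}                                   # clause 2
    if is_var(B):
        if occurs(B, A):
            raise UnifyError("occurs check: %s in %r" % (B, A))  # clause 5
        return {B: A}                                   # clause 4
    if A[0] == B[0] and len(A) == len(B):
        # clauses 6-9: equal head constructor; unify the argument lists
        return mgu_list(list(A[1:]), list(B[1:]))
    raise UnifyError("cannot unify %r and %r" % (A, B)) # clause 10

def mgu_list(As, Bs):
    """Clause 11: mgu(A, Abar; B, Bbar) = tau_bar ∘ rho."""
    if not As:
        return {}
    rho = mgu_list(As[1:], Bs[1:])
    tau = mgu(apply_subst(rho, As[0]), apply_subst(rho, Bs[0]))
    return compose(tau, rho)
```

For example, unifying X → Y with ι → (ι → ι) yields the substitution [X ↦ ι, Y ↦ ι → ι], while unifying X with X → ι fails by the occurs check (clause 3).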
In other words, the algorithm calculates the most general substitution that makes the given typing judgment valid. It is defined as follows:

1. typeinfer(x₁:A₁, ..., xₙ:Aₙ ⊢ xᵢ : B) = mgu(Aᵢ; B).
2. typeinfer(Γ ⊢ M N : B) = τ̄ ∘ σ, where σ = typeinfer(Γ ⊢ M : X → B) and τ = typeinfer(σ̄Γ ⊢ N : σ̄X), for a fresh type variable X.
3. typeinfer(Γ ⊢ λx^A.M : B) = τ̄ ∘ σ, where σ = mgu(B; A → X) and τ = typeinfer(σ̄Γ, x:σ̄A ⊢ M : σ̄X), for a fresh type variable X.
4. typeinfer(Γ ⊢ ⟨M, N⟩ : A) = ρ̄ ∘ τ̄ ∘ σ, where σ = mgu(A; X × Y), τ = typeinfer(σ̄Γ ⊢ M : σ̄X), and ρ = typeinfer(τ̄σ̄Γ ⊢ N : τ̄σ̄Y), for fresh type variables X and Y.
5. typeinfer(Γ ⊢ π₁M : A) = typeinfer(Γ ⊢ M : A × Y), for a fresh type variable Y.
6. typeinfer(Γ ⊢ π₂M : B) = typeinfer(Γ ⊢ M : X × B), for a fresh type variable X.
7. typeinfer(Γ ⊢ ∗ : A) = mgu(A; 1).

Strictly speaking, the algorithm is non-deterministic, because some of the clauses involve choosing one or more fresh type variables, and the choice is arbitrary. However, the choice is not essential, since we may regard all fresh type variables as equivalent. Here, a type variable is called "fresh" if it has never been used.

Note that the algorithm typeinfer can fail; this happens if and only if the call to mgu fails in steps 1, 3, 4, or 7.

Also note that the algorithm obviously always terminates; this follows by induction on M, since each recursive call only uses a smaller term M.

Proposition 9.3. If there exists a substitution σ such that σ̄Γ ⊢ M : σ̄B is a valid typing judgment, then typeinfer(Γ ⊢ M : B) will return a most general such substitution. Otherwise, the algorithm will fail.

Proof. The proof is similar to that of Proposition 9.1. □

Finally, the question "is M typable" can be answered by choosing distinct type variables X₁, ...
, Xₙ, Y and applying the algorithm typeinfer to the typing judgment x₁:X₁, ..., xₙ:Xₙ ⊢ M : Y. Note that if the algorithm succeeds and returns a substitution σ, then σY is the most general type of M, and the free variables have types x₁ : σX₁, ..., xₙ : σXₙ.

10 Denotational semantics

We introduced the lambda calculus as the "theory of functions". But so far, we have only spoken of functions in abstract terms. Do lambda terms correspond to any actual functions, such as functions in set theory? And what about the notions of β- and η-equivalence? We intuitively accepted these concepts as expressing truths about the equality of functions. But do these properties really hold of real functions? Are there other properties that functions have that are not captured by βη-equivalence?

The word "semantics" comes from the Greek word for "meaning". Denotational semantics means to give meaning to a language by interpreting its terms as mathematical objects. This is done by describing a function that maps syntactic objects (e.g., types, terms) to semantic objects (e.g., sets, elements). This function is called an interpretation or meaning function, and we usually denote it by [[−]]. Thus, if M is a term, we will usually write [[M]] for the meaning of M under a given interpretation.

Any good denotational semantics should be compositional, which means, the interpretation of a term should be given in terms of the interpretations of its subterms. Thus, for example, [[M N]] should be a function of [[M]] and [[N]].

Suppose that we have an axiomatic notion of equality ≃ on terms (for instance, βη-equivalence in the case of the lambda calculus). With respect to a particular class of interpretations, soundness is the property

M ≃ N  ⇒  [[M]] = [[N]], for all interpretations in the class.
Completeness is the property

[[M]] = [[N]] for all interpretations in the class  ⇒  M ≃ N.

Depending on our viewpoint, we will either say the axioms are sound (with respect to a given interpretation), or the interpretation is sound (with respect to a given set of axioms). Similarly for completeness. Soundness expresses the fact that our axioms (e.g., β or η) are true with respect to the given interpretation. Completeness expresses the fact that our axioms are sufficient.

10.1 Set-theoretic interpretation

The simply-typed lambda calculus can be given a straightforward set-theoretic interpretation as follows. We map types to sets and typing judgments to functions. For each basic type ι, assume that we have chosen a non-empty set S_ι. We can then associate a set [[A]] to each type A recursively:

[[ι]] = S_ι
[[A → B]] = [[B]]^[[A]]
[[A × B]] = [[A]] × [[B]]
[[1]] = {∗}

Here, for two sets X, Y, we write Y^X for the set of all functions from X to Y, i.e., Y^X = {f | f : X → Y}. Of course, X × Y denotes the usual cartesian product of sets, and {∗} is some singleton set.

We can now interpret lambda terms, or more precisely, typing judgments, as certain functions. Intuitively, we already know which function a typing judgment corresponds to. For instance, the typing judgment x:A, f:A → B ⊢ f x : B corresponds to the function that takes an element x ∈ [[A]] and an element f ∈ [[B]]^[[A]], and that returns f(x) ∈ [[B]]. In general, the interpretation of a typing judgment

x₁:A₁, ..., xₙ:Aₙ ⊢ M : B

will be a function

[[A₁]] × ... × [[Aₙ]] → [[B]].

Which particular function it is depends of course on the term M. For convenience, if Γ = x₁:A₁, ..., xₙ:Aₙ is a context, let us write [[Γ]] = [[A₁]] × ... × [[Aₙ]]. We now define [[Γ ⊢ M : B]] by recursion on M.
• If M is a variable, we define

[[x₁:A₁, ..., xₙ:Aₙ ⊢ xᵢ : Aᵢ]] = πᵢ : [[A₁]] × ... × [[Aₙ]] → [[Aᵢ]],

where πᵢ(a₁, ..., aₙ) = aᵢ.

• If M = N P is an application, we recursively calculate

f = [[Γ ⊢ N : A → B]] : [[Γ]] → [[B]]^[[A]],
g = [[Γ ⊢ P : A]] : [[Γ]] → [[A]].

We then define [[Γ ⊢ N P : B]] = h : [[Γ]] → [[B]] by h(ā) = f(ā)(g(ā)), for all ā ∈ [[Γ]].

• If M = λx^A.N is an abstraction, we recursively calculate

f = [[Γ, x:A ⊢ N : B]] : [[Γ]] × [[A]] → [[B]].

We then define [[Γ ⊢ λx^A.N : A → B]] = h : [[Γ]] → [[B]]^[[A]] by h(ā)(a) = f(ā, a), for all ā ∈ [[Γ]] and a ∈ [[A]].

• If M = ⟨N, P⟩ is a pair, we recursively calculate

f = [[Γ ⊢ N : A]] : [[Γ]] → [[A]],
g = [[Γ ⊢ P : B]] : [[Γ]] → [[B]].

We then define [[Γ ⊢ ⟨N, P⟩ : A × B]] = h : [[Γ]] → [[A]] × [[B]] by h(ā) = (f(ā), g(ā)), for all ā ∈ [[Γ]].

• If M = πᵢN is a projection (for i = 1, 2), we recursively calculate

f = [[Γ ⊢ N : B₁ × B₂]] : [[Γ]] → [[B₁]] × [[B₂]].

We then define [[Γ ⊢ πᵢN : Bᵢ]] = h : [[Γ]] → [[Bᵢ]] by h(ā) = πᵢ(f(ā)), for all ā ∈ [[Γ]]. Here πᵢ in the meta-language denotes the set-theoretic function πᵢ : [[B₁]] × [[B₂]] → [[Bᵢ]] given by πᵢ(b₁, b₂) = bᵢ.

• If M = ∗, we define [[Γ ⊢ ∗ : 1]] = h : [[Γ]] → {∗} by h(ā) = ∗, for all ā ∈ [[Γ]].

To minimize notational inconvenience, we will occasionally abuse the notation and write [[M]] instead of [[Γ ⊢ M : B]], thus pretending that terms are typing judgments. However, this is only an abbreviation, and it will be understood that the interpretation really depends on the typing judgment, and not just the term, even if we use the abbreviated notation.
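The clauses above are directly executable: each typing judgment becomes a function from environment tuples to values. The following Python sketch is our own illustration, not part of the notes; variables are encoded by their position in the context (so a lambda's bound variable becomes the last component of the environment), and the encoding of terms as tagged tuples is an arbitrary choice.

```python
def interpret(M):
    """Map a term to a function [[Γ]] -> [[B]], where an element of [[Γ]]
    is a tuple (a1, ..., an). Our encoding: ("var", i) is the variable at
    context position i; ("app", N, P); ("lam", N) binds the last context
    position in N; ("pair", N, P); ("proj", i, N) for i in {1, 2};
    ("star",) is the unit constant."""
    tag = M[0]
    if tag == "var":                       # [[x_i]] = i-th projection
        i = M[1]
        return lambda env: env[i]
    if tag == "app":                       # h(a) = f(a)(g(a))
        f, g = interpret(M[1]), interpret(M[2])
        return lambda env: f(env)(g(env))
    if tag == "lam":                       # h(a)(b) = f(a, b)
        f = interpret(M[1])
        return lambda env: (lambda a: f(env + (a,)))
    if tag == "pair":                      # h(a) = (f(a), g(a))
        f, g = interpret(M[1]), interpret(M[2])
        return lambda env: (f(env), g(env))
    if tag == "proj":                      # h(a) = pi_i(f(a))
        i, f = M[1], interpret(M[2])
        return lambda env: f(env)[i - 1]
    if tag == "star":                      # h(a) = *
        return lambda env: "*"
    raise ValueError(M)
```

For instance, the judgment x:A, f:A → B ⊢ f x : B discussed above is the term ("app", ("var", 1), ("var", 0)); applying its interpretation to the environment (3, successor) returns 4, as expected.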
We also refer to an interpretation as a model.

10.2 Soundness

Lemma 10.1 (Context change). The interpretation behaves as expected under reordering of contexts and under the addition of dummy variables to contexts. More precisely, if σ : {1, ..., n} → {1, ..., m} is an injective map, and if the free variables of M are among x_σ(1), ..., x_σ(n), then the interpretations of the two typing judgments,

f = [[x₁:A₁, ..., xₘ:Aₘ ⊢ M : B]] : [[A₁]] × ... × [[Aₘ]] → [[B]],
g = [[x_σ(1):A_σ(1), ..., x_σ(n):A_σ(n) ⊢ M : B]] : [[A_σ(1)]] × ... × [[A_σ(n)]] → [[B]],

are related as follows: f(a₁, ..., aₘ) = g(a_σ(1), ..., a_σ(n)), for all a₁ ∈ [[A₁]], ..., aₘ ∈ [[Aₘ]].

Proof. Easy, but tedious, induction on M. □

The significance of this lemma is that, to a certain extent, the context does not matter. Thus, if the free variables of M and N are contained in Γ as well as Γ′, then we have [[Γ ⊢ M : B]] = [[Γ ⊢ N : B]] iff [[Γ′ ⊢ M : B]] = [[Γ′ ⊢ N : B]]. Thus, whether M and N have equal denotations only depends on M and N, and not on Γ.

Lemma 10.2 (Substitution Lemma). If [[Γ, x:A ⊢ M : B]] = f : [[Γ]] × [[A]] → [[B]] and [[Γ ⊢ N : A]] = g : [[Γ]] → [[A]], then [[Γ ⊢ M[N/x] : B]] = h : [[Γ]] → [[B]], where h(ā) = f(ā, g(ā)), for all ā ∈ [[Γ]].

Proof. Very easy, but very tedious, induction on M. □

Proposition 10.3 (Soundness). The set-theoretic interpretation is sound for βη-reasoning. In other words,

M =βη N  ⇒  [[Γ ⊢ M : B]] = [[Γ ⊢ N : B]].

Proof. Let us write M ∼ N if [[Γ ⊢ M : B]] = [[Γ ⊢ N : B]]. By the remark after Lemma 10.1, this notion is independent of Γ, and thus a well-defined relation on terms (as opposed to typing judgments).
To prove soundness, we must show that M =βη N implies M ∼ N, for all M and N. It suffices to show that ∼ satisfies all the axioms of βη-equivalence. The axioms (refl), (symm), and (trans) hold trivially. Similarly, all the (cong) and (ξ) rules hold, due to the fact that the meaning of composite terms was defined solely in terms of the meaning of their subterms. It remains to prove that each of the various (β) and (η) laws is satisfied (see page 62). We prove the rule (β→) as an example; the remaining rules are left as an exercise.

Assume Γ is a context such that Γ, x:A ⊢ M : B and Γ ⊢ N : A. Let

f = [[Γ, x:A ⊢ M : B]] : [[Γ]] × [[A]] → [[B]],
g = [[Γ ⊢ N : A]] : [[Γ]] → [[A]],
h = [[Γ ⊢ (λx^A.M) : A → B]] : [[Γ]] → [[B]]^[[A]],
k = [[Γ ⊢ (λx^A.M)N : B]] : [[Γ]] → [[B]],
l = [[Γ ⊢ M[N/x] : B]] : [[Γ]] → [[B]].

We must show k = l. By definition, we have k(ā) = h(ā)(g(ā)) = f(ā, g(ā)). On the other hand, l(ā) = f(ā, g(ā)) by the substitution lemma. □

Note that the proof of soundness amounts to a simple calculation; while there are many details to attend to, no particularly interesting new idea is required. This is typical of soundness proofs in general. Completeness, on the other hand, is usually much more difficult to prove and often requires clever ideas.

10.3 Completeness

We cite two completeness theorems for the set-theoretic interpretation. The first one is for the class of all models with finite base types. The second one is for the single model with one countably infinite base type.

Theorem 10.4 (Completeness, Plotkin, 1973). The class of set-theoretic models with finite base types is complete for the lambda-βη calculus.

Recall that completeness for a class of models means that if [[M]] = [[N]] holds in all models of the given class, then M =βη N.
This is not the same as completeness for each individual model in the class. Note that, for each fixed choice of finite sets as the interpretations of the base types, there are some lambda terms such that [[M]] = [[N]] but M ≠βη N. For instance, consider terms of type (ι → ι) → ι → ι. There are infinitely many βη-distinct terms of this type, namely, the Church numerals. On the other hand, if S_ι is a finite set, then [[(ι → ι) → ι → ι]] is also a finite set. Since a finite set cannot have infinitely many distinct elements, there must necessarily be two distinct Church numerals M, N such that [[M]] = [[N]]. Plotkin's completeness theorem, on the other hand, shows that whenever M and N are distinct lambda terms, then there exists some set-theoretic model with finite base types in which M and N are different.

The second completeness theorem is for a single model, namely the one where S_ι is a countably infinite set.

Theorem 10.5 (Completeness, Friedman, 1975). The set-theoretic model with base type equal to ℕ, the set of natural numbers, is complete for the lambda-βη calculus.

We omit the proofs.

11 The language PCF

PCF stands for "programming with computable functions". The language PCF is an extension of the simply-typed lambda calculus with booleans, natural numbers, and recursion. It was first introduced by Dana Scott as a simple programming language on which to try out techniques for reasoning about programs. Although PCF is not intended as a "real world" programming language, many real programming languages can be regarded as (syntactic variants of) extensions of PCF, and many of the reasoning techniques developed for PCF also apply to more complicated languages.

PCF is a "programming language", not just a "calculus".
By this we mean that PCF is equipped with a specific evaluation order, or rules that determine precisely how terms are to be evaluated. We follow the slogan:

Programming language = syntax + evaluation rules.

After introducing the syntax of PCF, we will look at three different equivalence relations on terms.

• Axiomatic equivalence =ax will be given by axioms in the spirit of βη-equivalence.

• Operational equivalence =op will be defined in terms of the operational behavior of terms. Two terms are operationally equivalent if one can be substituted for the other in any context without changing the behavior of a program.

• Denotational equivalence =den is defined via a denotational semantics.

We will develop methods for reasoning about these equivalences, and thus for reasoning about programs. We will also investigate how the three equivalences are related to each other.

11.1 Syntax and typing rules

PCF types are simple types over the two base types bool and nat:

A, B ::= bool | nat | A → B | A × B | 1

The raw terms of PCF are those of the simply-typed lambda calculus, together with some additional constructs that deal with booleans, natural numbers, and recursion.
M, N, P ::= x | M N | λx^A.M | ⟨M, N⟩ | π₁M | π₂M | ∗ | T | F | zero
          | succ(M) | pred(M) | iszero(M) | if M then N else P | Y(M)

(true)  Γ ⊢ T : bool

(false) Γ ⊢ F : bool

(zero)  Γ ⊢ zero : nat

(succ)  Γ ⊢ M : nat
        ─────────────────
        Γ ⊢ succ(M) : nat

(pred)  Γ ⊢ M : nat
        ─────────────────
        Γ ⊢ pred(M) : nat

(iszero) Γ ⊢ M : nat
         ───────────────────
         Γ ⊢ iszero(M) : bool

(fix)   Γ ⊢ M : A → A
        ─────────────
        Γ ⊢ Y(M) : A

(if)    Γ ⊢ M : bool    Γ ⊢ N : A    Γ ⊢ P : A
        ──────────────────────────────────────
        Γ ⊢ if M then N else P : A

Table 7: Typing rules for PCF

The intended meaning of these terms is the same as that of the corresponding terms we used to program in the untyped lambda calculus: T and F are the boolean constants, zero is the constant zero, succ and pred are the successor and predecessor functions, iszero tests whether a given number is equal to zero, if M then N else P is a conditional, and Y(M) is a fixed point of M.

The typing rules for PCF are the same as the typing rules for the simply-typed lambda calculus, shown in Table 4, plus the additional typing rules shown in Table 7.

11.2 Axiomatic equivalence

The axiomatic equivalence of PCF is based on the βη-equivalence of the simply-typed lambda calculus. The relation =ax is the least relation given by the following:

• All the β- and η-axioms of the simply-typed lambda calculus, as shown on page 62.

• One congruence or ξ-rule for each term constructor. This means, for instance,

  M =ax M′    N =ax N′    P =ax P′
  ────────────────────────────────────────────
  if M then N else P =ax if M′ then N′ else P′

and similarly for all the other term constructors.

• The additional axioms shown in Table 8.

pred(zero) = zero
pred(succ(n)) = n
iszero(zero) = T
iszero(succ(n)) = F
if T then N else P = N
if F then N else P = P
Y(M) = M(Y(M))

Table 8: Axiomatic equivalence for PCF

Here, n stands for a numeral, i.e., a term of the form succ(...(succ(zero))...).
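Read left to right, the equations of Table 8 can be used as simplification rules. The following Python sketch (our own illustration, not part of the notes) evaluates closed terms built from the arithmetic and conditional constructs this way; the Y rule is omitted, since unrestricted unfolding of Y(M) = M(Y(M)) need not terminate. Terms are encoded as tagged tuples, an arbitrary choice of ours.

```python
def eval_ax(M):
    """Evaluate a closed term built from zero, succ, pred, iszero, T, F
    and if-then-else by applying the Table 8 equations left to right."""
    tag = M[0]
    if tag in ("zero", "T", "F"):
        return M
    if tag == "succ":
        return ("succ", eval_ax(M[1]))
    if tag == "pred":
        v = eval_ax(M[1])
        return v[1] if v[0] == "succ" else ("zero",)   # pred(succ(n)) = n; pred(zero) = zero
    if tag == "iszero":
        v = eval_ax(M[1])
        return ("T",) if v[0] == "zero" else ("F",)    # iszero(zero) = T; iszero(succ(n)) = F
    if tag == "if":                                    # if T then N else P = N, etc.
        v = eval_ax(M[1])
        return eval_ax(M[2]) if v[0] == "T" else eval_ax(M[3])
    raise ValueError(M)

def numeral(n):
    """The numeral succ(...(succ(zero))...) with n occurrences of succ."""
    t = ("zero",)
    for _ in range(n):
        t = ("succ", t)
    return t
```

For example, eval_ax applied to pred(succ(succ(succ(zero)))) returns the numeral 2, in accordance with the axiom pred(succ(n)) = n.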
11.3 Operational semantics

The operational semantics of PCF is commonly given in two different styles: the small-step or shallow style, and the big-step or deep style. We give the small-step semantics first, because it is closer to the notion of β-reduction that we considered for the simply-typed lambda calculus.

There are some important differences between an operational semantics, as we are going to give it here, and the notion of β-reduction in the simply-typed lambda calculus. Most importantly, the operational semantics is going to be deterministic, which means, each term can be reduced in at most one way. Thus, there will never be a choice between more than one redex. Or in other words, it will always be uniquely specified which redex to reduce next.

As a consequence of the previous paragraph, we will abandon many of the congruence rules, as well as the (ξ)-rule. We adopt the following informal conventions:

• never reduce the body of a lambda abstraction,
• never reduce the argument of a function (except for primitive functions such as succ and pred),
• never reduce the "then" or "else" part of an if-then-else statement,
• never reduce a term inside a pair.

Of course, the terms that these rules prevent from being reduced can nevertheless become subject to reduction later: the body of a lambda abstraction and the argument of a function can be reduced after a β-reduction causes the λ to disappear and the argument to be substituted in the body. The "then" or "else" parts of an if-then-else term can be reduced after the "if" part evaluates to true or false. And the terms inside a pair can be reduced after the pair has been broken up by a projection.

An important technical notion is that of a value, which is a term that represents the result of a computation and cannot be reduced further.
Values are given as follows:

Values: V, W ::= T | F | zero | succ(V) | ∗ | ⟨M, N⟩ | λx^A.M

The transition rules for the small-step operational semantics of PCF are shown in Table 9. We write M → N if M reduces to N by these rules. We write M ↛ if there does not exist N such that M → N. The first two important technical properties of small-step reduction are summarized in the following lemma.

Lemma 11.1. 1. Values are normal forms. If V is a value, then V ↛.
2. Evaluation is deterministic. If M → N and M → N′, then N ≡ N′.

Another important property is subject reduction: a well-typed term reduces only to another well-typed term of the same type.

Lemma 11.2 (Subject Reduction). If Γ ⊢ M : A and M → N, then Γ ⊢ N : A.

Next, we want to prove that the evaluation of a well-typed term does not get "stuck". If M is some term such that M ↛, but M is not a value, then we regard this as an error, and we also write M → error. Examples of such terms are π₁(λx.M) and ⟨M, N⟩P. The following lemma shows that well-typed closed terms cannot lead to such errors.

Lemma 11.3 (Progress). If M is a closed, well-typed term, then either M is a value, or else there exists N such that M → N.

The Progress Lemma is very important, because it implies that a well-typed term cannot "go wrong". It guarantees that a well-typed term will either evaluate to a value in finitely many steps, or else it will reduce infinitely and thus not terminate.
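The small-step rules of Table 9 can be prototyped as a deterministic one-step reduction function. The Python sketch below is our own illustration, not part of the notes; pairs and the unit rule are omitted for brevity, terms are encoded as tagged tuples of our choosing, and substitution is naive (no α-renaming), which is safe here because we only evaluate closed terms whose bound variables have distinct names.

```python
# Terms (our encoding): ("var", x), ("lam", x, body), ("app", M, N),
# ("succ", M), ("pred", M), ("iszero", M), ("if", M, N, P), ("Y", M),
# and the constants ("T",), ("F",), ("zero",).

def subst(M, x, N):
    """Naive substitution M[N/x]; assumes no variable capture."""
    tag = M[0]
    if tag == "var":
        return N if M[1] == x else M
    if tag == "lam":
        return M if M[1] == x else ("lam", M[1], subst(M[2], x, N))
    if tag in ("T", "F", "zero"):
        return M
    return (tag,) + tuple(subst(a, x, N) for a in M[1:])

def is_value(M):
    if M[0] in ("T", "F", "zero", "lam"):
        return True
    return M[0] == "succ" and is_value(M[1])

def step(M):
    """One deterministic reduction step; returns None on normal forms."""
    tag = M[0]
    if tag == "pred":
        if M[1] == ("zero",):
            return ("zero",)                          # pred(zero) -> zero
        if M[1][0] == "succ" and is_value(M[1][1]):
            return M[1][1]                            # pred(succ(V)) -> V
        r = step(M[1])
        return ("pred", r) if r is not None else None
    if tag == "iszero":
        if M[1] == ("zero",):
            return ("T",)
        if M[1][0] == "succ" and is_value(M[1][1]):
            return ("F",)
        r = step(M[1])
        return ("iszero", r) if r is not None else None
    if tag == "succ":
        r = step(M[1])
        return ("succ", r) if r is not None else None
    if tag == "app":
        if M[1][0] == "lam":                          # (lambda x.M)N -> M[N/x]
            return subst(M[1][2], M[1][1], M[2])
        r = step(M[1])                                # reduce the function part
        return ("app", r, M[2]) if r is not None else None
    if tag == "if":
        if M[1] == ("T",):
            return M[2]
        if M[1] == ("F",):
            return M[3]
        r = step(M[1])                                # reduce the "if" part only
        return ("if", r, M[2], M[3]) if r is not None else None
    if tag == "Y":
        return ("app", M[1], ("Y", M[1]))             # Y(M) -> M(Y(M))
    return None

def normalize(M, limit=10000):
    """Iterate step; stops at a normal form (or gives up after limit steps)."""
    for _ in range(limit):
        N = step(M)
        if N is None:
            return M
        M = N
    raise RuntimeError("no normal form within step limit")
```

Note that, in accordance with the conventions above, step never reduces the argument of an application or the branches of a conditional; determinism (Lemma 11.1) corresponds to the fact that step returns at most one result.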
M → N
─────────────────
pred(M) → pred(N)

pred(zero) → zero

pred(succ(V)) → V

M → N
─────────────────────
iszero(M) → iszero(N)

iszero(zero) → T

iszero(succ(V)) → F

M → N
─────────────────
succ(M) → succ(N)

M → N
─────────
M P → N P

(λx^A.M)N → M[N/x]

M → M′
───────────
πᵢM → πᵢM′

π₁⟨M, N⟩ → M

π₂⟨M, N⟩ → N

M : 1,  M ≠ ∗
─────────────
M → ∗

M → M′
────────────────────────────────────────
if M then N else P → if M′ then N else P

if T then N else P → N

if F then N else P → P

Y(M) → M(Y(M))

Table 9: Small-step operational semantics of PCF

But a well-typed term can never generate an error. In programming language terms, a term that type-checks at compile-time cannot generate an error at run-time.

To express this idea formally, let us write M →∗ N in the usual way if M reduces to N in zero or more steps, and let us write M →∗ error if M reduces in zero or more steps to an error.

Proposition 11.4 (Safety). If M is a closed, well-typed term, then M ↛∗ error.

Exercise 38. Prove Lemmas 11.1–11.3 and Proposition 11.4.

11.4 Big-step semantics

In the small-step semantics, if M →∗ V, we say that M evaluates to V. Note that by determinacy, for every M, there exists at most one V such that M →∗ V.

It is also possible to axiomatize the relation "M evaluates to V" directly. This is known as the big-step semantics. Here, we write M ⇓ V if M evaluates to V. The axioms for the big-step semantics are shown in Table 10.

T ⇓ T        F ⇓ F        zero ⇓ zero        ⟨M, N⟩ ⇓ ⟨M, N⟩        λx^A.M ⇓ λx^A.M

M ⇓ zero
──────────────
pred(M) ⇓ zero

M ⇓ succ(V)
───────────
pred(M) ⇓ V

M ⇓ zero
─────────────
iszero(M) ⇓ T

M ⇓ succ(V)
─────────────
iszero(M) ⇓ F

M ⇓ V
─────────────────
succ(M) ⇓ succ(V)

M ⇓ λx^A.M′    M′[N/x] ⇓ V
──────────────────────────
M N ⇓ V

M ⇓ ⟨M₁, M₂⟩    M₁ ⇓ V
──────────────────────
π₁M ⇓ V

M ⇓ ⟨M₁, M₂⟩    M₂ ⇓ V
──────────────────────
π₂M ⇓ V

M : 1
─────
M ⇓ ∗

M ⇓ T    N ⇓ V
──────────────────────
if M then N else P ⇓ V

M ⇓ F    P ⇓ V
──────────────────────
if M then N else P ⇓ V

M(Y(M)) ⇓ V
───────────
Y(M) ⇓ V

Table 10: Big-step operational semantics of PCF

The big-step semantics satisfies properties similar to those of the small-step semantics.
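The big-step rules of Table 10 translate directly into a recursive evaluator: each rule becomes one clause, with the premises as recursive calls. The Python sketch below is our own illustration, not part of the notes; it covers the boolean, arithmetic, function and Y fragments (pairs and unit are omitted), uses a tuple encoding of terms that we chose, and uses naive substitution, assuming bound variable names are distinct from the free variables of arguments.

```python
# Terms (our encoding): ("var", x), ("lam", x, body), ("app", M, N),
# ("succ", M), ("pred", M), ("iszero", M), ("if", M, N, P), ("Y", M),
# plus ("T",), ("F",), ("zero",).

def subst(M, x, N):
    """Naive substitution M[N/x]; assumes no variable capture."""
    tag = M[0]
    if tag == "var":
        return N if M[1] == x else M
    if tag == "lam":
        return M if M[1] == x else ("lam", M[1], subst(M[2], x, N))
    if tag in ("T", "F", "zero"):
        return M
    return (tag,) + tuple(subst(a, x, N) for a in M[1:])

def big_step(M):
    """Return the value V with M ⇓ V (may loop if M diverges)."""
    tag = M[0]
    if tag in ("T", "F", "zero", "lam"):
        return M                                   # values evaluate to themselves
    if tag == "succ":
        return ("succ", big_step(M[1]))
    if tag == "pred":
        v = big_step(M[1])
        return ("zero",) if v == ("zero",) else v[1]
    if tag == "iszero":
        return ("T",) if big_step(M[1]) == ("zero",) else ("F",)
    if tag == "app":                               # M ⇓ λx.M',  M'[N/x] ⇓ V
        f = big_step(M[1])
        return big_step(subst(f[2], f[1], M[2]))
    if tag == "if":
        return big_step(M[2]) if big_step(M[1]) == ("T",) else big_step(M[3])
    if tag == "Y":                                 # M(Y(M)) ⇓ V implies Y(M) ⇓ V
        return big_step(("app", M[1], ("Y", M[1])))
    raise ValueError(M)

def numeral(n):
    t = ("zero",)
    for _ in range(n):
        t = ("succ", t)
    return t

# Addition via the fixed point combinator, as in Section 3.3:
# add = Y(λf. λm. λn. if iszero(m) then n else succ(f (pred m) n))
add = ("Y", ("lam", "f", ("lam", "m", ("lam", "n",
      ("if", ("iszero", ("var", "m")),
             ("var", "n"),
             ("succ", ("app", ("app", ("var", "f"),
                               ("pred", ("var", "m"))),
                      ("var", "n"))))))))
```

Because the Y rule only unfolds the fixed point when its value is demanded, recursive definitions such as add terminate on numerals even though Y(M) = M(Y(M)) could be unfolded forever.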
Lemma 11.5. 1. Values. For all values V, we have V ⇓ V.
2. Determinacy. If M ⇓ V and M ⇓ V′, then V ≡ V′.
3. Subject Reduction. If Γ ⊢ M : A and M ⇓ V, then Γ ⊢ V : A.

The analogues of the Progress and Safety properties cannot be as easily stated for big-step reduction, because we cannot easily talk about a single reduction step or about infinite reduction sequences. However, some comfort can be taken in the fact that the big-step semantics and small-step semantics coincide:

Proposition 11.6. M →∗ V iff M ⇓ V.

11.5 Operational equivalence

Informally, two terms M and N will be called operationally equivalent if M and N are interchangeable as part of any larger program, without changing the observable behavior of the program. This notion of equivalence is also often called observational equivalence, to emphasize the fact that it concentrates on observable properties of terms.

What is an observable behavior of a program? Normally, what we observe about a program is its output, such as the characters it prints to a terminal. Since any such characters can be converted in principle to natural numbers, we take the point of view that the observable behavior of a program is a natural number that it evaluates to. Similarly, if a program computes a boolean, we regard the boolean value as observable. However, we do not regard abstract values, such as functions, as being directly observable, on the grounds that a function cannot be observed until we supply it some arguments and observe the result.

Definition. An observable type is either bool or nat. A result is a closed value of observable type. Thus, a result is either T, F, or n. A program is a closed term of observable type.

A context is a term with a hole, written C[−]. Formally, the class of contexts is defined by a BNF:

C[−] ::= [−] | x | C[−] N | M C[−] | λx^A.C[−] | ...
and so on, extending through all the cases in the definition of a PCF term. Well-typed contexts are defined in the same way as well-typed terms, where it is understood that the hole also has a type. The free variables of a context are defined in the same way as for terms. Moreover, we define the captured variables of a context to be those bound variables whose scope includes the hole. For instance, in the context (λx.[−])(λy.z), the variable x is captured, the variable z is free, and y is neither free nor captured.

If C[−] is a context and M is a term of the appropriate type, we write C[M] for the result of replacing the hole in the context C[−] by M. Here, we do not α-rename any bound variables, so that we allow free variables of M to be captured by C[−].

We are now ready to state the definition of operational equivalence.

Definition. Two terms M, N are operationally equivalent, in symbols M =op N, if for all closed and closing contexts C[−] of observable type and all values V,

C[M] ⇓ V  ⟺  C[N] ⇓ V.

Here, by a closing context we mean that C[−] should capture all the free variables of M and N. This is equivalent to requiring that C[M] and C[N] are closed terms of observable type, i.e., programs. Thus, two terms are equivalent if they can be used interchangeably in any program.

11.6 Operational approximation

As a refinement of operational equivalence, we can also define a notion of operational approximation: We say that M operationally approximates N, in symbols M ⊑op N, if for all closed and closing contexts C[−] of observable type and all values V,

C[M] ⇓ V  ⇒  C[N] ⇓ V.

Note that this definition includes the case where C[M] diverges but C[N] converges, for some context C[−]. This formalizes the notion that N is "more defined" than M. Clearly, we have M =op N iff M ⊑op N and N ⊑op M.
Thus, we get a partial order ⊑op on the set of all terms of a given type, modulo operational equivalence. Also, this partial order has a least element: namely, if we let Ω = Y(λx.x), then Ω ⊑op N for any term N of the appropriate type. Note that, in general, ⊑op is not a complete partial order, due to missing limits of ω-chains.

11.7 Discussion of operational equivalence

Operational equivalence is a very useful concept for reasoning about programs, and particularly for reasoning about program fragments. If M and N are operationally equivalent, then we know that we can replace M by N in any program without affecting its behavior. For example, M could be a slow but simple subroutine for sorting a list. The term N could be a replacement that runs much faster. If we can prove M and N to be operationally equivalent, then this means we can safely use the faster routine instead of the slower one.

Another example is compiler optimizations. Many compilers will try to optimize the code that they produce, to eliminate useless instructions, to avoid duplicate calculations, etc. Such an optimization often means replacing a piece of code M by another piece of code N, without necessarily knowing much about the context in which M is used. Such a replacement is safe if M and N are operationally equivalent.

On the other hand, operational equivalence is a somewhat problematic notion. The problem is that the concept is not stable under adding new language features. It can happen that two terms, M and N, are operationally equivalent, but when a new feature is added to the language, they become nonequivalent, even if M and N do not use the new feature. The reason is that operational equivalence is defined in terms of contexts. Adding new features to a language also means that there will be new contexts, and these new contexts might be able to distinguish M and N.
This can be a problem in practice. Certain compiler optimizations might be sound for a sequential language, but might become unsound if new language features are added. Code that used to be correct might suddenly become incorrect if used in a richer environment. For example, many programs and library functions in C assume that they are executed in a single-threaded environment. If this code is ported to a multi-threaded environment, it often turns out to be no longer correct, and in many cases it must be re-written from scratch.

11.8 Operational equivalence and parallel or

Let us now look at a concrete example in PCF. We say that a term POR implements the parallel or function if it has the following behavior:

POR T P → T, for all P,
POR N T → T, for all N,
POR F F → F.

Note that this in particular implies POR T Ω = T and POR Ω T = T, where Ω is some divergent term. It should be clear why POR is called the "parallel" or: the only way to achieve such behavior is to evaluate both its arguments in parallel, and to stop as soon as one argument evaluates to T or both evaluate to F.

Proposition 11.7. POR is not definable in PCF.

We do not give the proof of this fact, but the idea is relatively simple: one proves by induction that every PCF context C[−, −] with two holes has the following property: either there exists a term N such that C[M, M′] = N for all M, M′ (i.e., the context does not look at M, M′ at all), or else either C[Ω, M] diverges for all M, or C[M, Ω] diverges for all M. Here, again, Ω is some divergent term such as Y(λx.x).

Although POR is not definable in PCF, we can define the following term, called the POR-tester:

POR-test = λx.
if x T Ω
then if x Ω T
     then if x F F then Ω else T
     else Ω
else Ω

The POR-tester has the property that POR-test M = T if M implements the parallel or function, and in all other cases POR-test M diverges. In particular, since parallel or is not definable in PCF, we have that POR-test M diverges, for all PCF terms M. Thus, when applied to any PCF term, POR-test behaves precisely as the function λx.Ω does. One can make this into a rigorous argument showing that POR-test and λx.Ω are operationally equivalent:

POR-test =op λx.Ω   (in PCF).

Now, suppose we want to define an extension of PCF called parallel PCF. It is defined in exactly the same way as PCF, except that we add a new primitive function POR, and small-step reduction rules

M → M′    N → N′
───────────────────
POR M N → POR M′ N′

POR T N → T

POR M T → T

POR F F → F

Parallel PCF enjoys many of the same properties as PCF; for instance, Lemmas 11.1–11.3 and Proposition 11.4 continue to hold for it. But notice that

POR-test ≠op λx.Ω   (in parallel PCF).

This is because the context C[−] = [−] POR distinguishes the two terms: clearly, C[POR-test] ⇓ T, whereas C[λx.Ω] diverges.

12 Complete partial orders

12.1 Why are sets not enough, in general?

As we have seen in Section 10, the interpretation of types as plain sets is quite sufficient for the simply-typed lambda calculus. However, it is insufficient for a language such as PCF. Specifically, the problem is the fixed point operator Y : (A → A) → A. It is clear that there are many functions f : A → A from a set A to itself that do not have a fixed point; thus, there is no chance we are going to find an interpretation for a fixed point operator in the simple set-theoretic model.

On the other hand, if A and B are types, there are generally many functions f : [[A]] → [[B]] in the set-theoretic model that are not definable by lambda terms.
For instance, if [[A]] and [[B]] are infinite sets, then there are uncountably many functions f : [[A]] → [[B]]; however, there are only countably many lambda terms, and thus there are necessarily going to be functions that are not the denotation of any lambda term.

The idea is to put additional structure on the sets that interpret types, and to require functions to preserve that structure. This is going to cut down the size of the function spaces, decreasing the "slack" between the functions definable in the lambda calculus and the functions that exist in the model, and simultaneously increasing the chances that additional structure, such as fixed point operators, might exist in the model. Complete partial orders are one such structure that is commonly used for this purpose. The method is originally due to Dana Scott.

12.2 Complete partial orders

Definition. A partially ordered set or poset is a set X together with a binary relation ⊑ satisfying

• reflexivity: for all x ∈ X, x ⊑ x,
• antisymmetry: for all x, y ∈ X, x ⊑ y and y ⊑ x implies x = y,
• transitivity: for all x, y, z ∈ X, x ⊑ y and y ⊑ z implies x ⊑ z.

The concept of a partial order differs from that of a total order in that we do not require that for any x and y, either x ⊑ y or y ⊑ x. Thus, in a partially ordered set it is permissible to have incomparable elements.

[Figure 4: Some posets — line diagrams of the posets 1, 2, B, N, ω, ω + 1, and B^B.]

We can often visualize posets, particularly finite ones, by drawing their line diagrams as in Figure 4. In these diagrams, we put one circle for each element of X, and we draw an edge from x upward to y if x ⊑ y and there is no z, distinct from x and y, with x ⊑ z ⊑ y.
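A standard example of a partial order with incomparable elements is divisibility on the positive integers. The following Python sketch (our illustration, not from the notes) checks the three poset axioms on {1, ..., 12} by brute force:

```python
# Divisibility on {1,...,12}: x "divides" y plays the role of x below y.
def divides(x, y):
    return y % x == 0

X = range(1, 13)

# reflexivity
assert all(divides(x, x) for x in X)
# antisymmetry: x | y and y | x force x == y
assert all(not (divides(x, y) and divides(y, x)) or x == y
           for x in X for y in X)
# transitivity
assert all(not (divides(x, y) and divides(y, z)) or divides(x, z)
           for x in X for y in X for z in X)

# Not a total order: 4 and 6 are incomparable.
assert not divides(4, 6) and not divides(6, 4)
```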
Such line diagrams are also known as Hasse diagrams.

The idea behind using a partial order to denote computational values is that x ⊑ y means that x is less defined than y. For instance, if a certain term diverges, then its denotation will be less defined than, or below, that of a term that has a definite value. Similarly, a function is more defined than another if it converges on more inputs.

Another important idea in using posets for modeling computational values is that of approximation. We can think of some infinite computational object (such as an infinite stream) as a limit of successive finite approximations (such as longer and longer finite streams). Thus we also read x ⊑ y as "x approximates y". A complete partial order is a poset in which every countable chain of increasing elements approximates something.

Definition. Let X be a poset and let A ⊆ X be a subset. We say that x ∈ X is an upper bound for A if a ⊑ x for all a ∈ A. We say that x is a least upper bound for A if x is an upper bound, and whenever y is also an upper bound, then x ⊑ y.

Definition. An ω-chain in a poset X is a sequence of elements x_0, x_1, x_2, ... such that x_0 ⊑ x_1 ⊑ x_2 ⊑ ...

Definition. A complete partial order (cpo) is a poset such that every ω-chain of elements has a least upper bound.

If x_0, x_1, x_2, ... is an ω-chain of elements in a cpo, we write ⨆_{i∈N} x_i for the least upper bound. We also call the least upper bound the limit of the ω-chain.

Not every poset is a cpo. In Figure 4, the poset labeled ω is not a cpo, because the evident ω-chain does not have a least upper bound (in fact, it has no upper bound at all). The other posets shown in Figure 4 are cpo's.

12.3 Properties of limits

Proposition 12.1.

1. Monotonicity. Suppose {x_i}_i and {y_i}_i are ω-chains in a cpo C, such that x_i ⊑ y_i for all i. Then ⨆_i x_i ⊑ ⨆_i y_i.

2.
Exchange. Suppose {x_ij}_{i,j∈N} is a doubly monotone double sequence of elements of a cpo C, i.e., whenever i ≤ i′ and j ≤ j′, then x_ij ⊑ x_i′j′. Then

⨆_{i∈N} ⨆_{j∈N} x_ij = ⨆_{j∈N} ⨆_{i∈N} x_ij = ⨆_{k∈N} x_kk.

In particular, all limits shown are well-defined.

Exercise 39. Prove Proposition 12.1.

12.4 Continuous functions

If we model data types as cpo's, it is natural to model algorithms as functions from cpo's to cpo's. These functions are subject to two constraints: they have to be monotone and continuous.

Definition. A function f : C → D between posets C and D is said to be monotone if for all x, y ∈ C,

x ⊑ y ⟹ f(x) ⊑ f(y).

A function f : C → D between cpo's C and D is said to be continuous if it is monotone and it preserves least upper bounds of ω-chains, i.e., for all ω-chains {x_i}_{i∈N} in C,

f(⨆_{i∈N} x_i) = ⨆_{i∈N} f(x_i).

The intuitive explanation for the monotonicity requirement is that information is "positive": more information in the input cannot lead to less information in the output of an algorithm. The intuitive explanation for the continuity requirement is that any particular output of an algorithm can only depend on a finite amount of input.

12.5 Pointed cpo's and strict functions

Definition. A cpo is said to be pointed if it has a least element. The least element is usually denoted ⊥ and pronounced "bottom".

All cpo's shown in Figure 4 are pointed. A continuous function between pointed cpo's is said to be strict if it preserves the bottom element.

12.6 Products and function spaces

If C and D are cpo's, then their cartesian product C × D is also a cpo, with the pointwise order given by (x, y) ⊑ (x′, y′) iff x ⊑ x′ and y ⊑ y′. Least upper bounds are also given pointwise, thus

⨆_i (x_i, y_i) = (⨆_i x_i, ⨆_i y_i).
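On a finite cpo, monotonicity can be checked exhaustively (and on finite cpo's every monotone function is automatically continuous, since every ω-chain is eventually constant). The following Python sketch (our illustration; the encoding of B and the example functions are ours) checks monotonicity on the flat cpo B from Figure 4, with ⊥ modeled as `None`:

```python
# The flat cpo B = {bottom, F, T}, with bottom modeled as None.
# x below y iff x is bottom or x == y.
B = [None, False, True]

def below(x, y):
    return x is None or x == y

def is_monotone(f):
    # f is a dict representing a function B -> B
    return all(below(f[x], f[y]) for x in B for y in B if below(x, y))

# Strict negation is monotone ...
strict_neg = {None: None, False: True, True: False}
assert is_monotone(strict_neg)

# ... but a "function" that maps bottom to F while mapping T to T is not:
# it turns less-defined input into more-defined, conflicting output.
bad = {None: False, False: False, True: True}
assert not is_monotone(bad)
```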
Proposition 12.2. The first and second projections, π_1 : C × D → C and π_2 : C × D → D, are continuous functions. Moreover, if f : E → C and g : E → D are continuous functions, then so is the function h : E → C × D given by h(z) = (f(z), g(z)).

If C and D are cpo's, then the set of continuous functions f : C → D forms a cpo, denoted D^C. The order is given pointwise: given two functions f, g : C → D, we say that f ⊑ g iff for all x ∈ C, f(x) ⊑ g(x).

Proposition 12.3. The set D^C of continuous functions from C to D, together with the order just defined, is a complete partial order.

Proof. Clearly the set D^C is partially ordered. What we must show is that least upper bounds of ω-chains exist. Given an ω-chain f_0, f_1, ... in D^C, we define g ∈ D^C to be the pointwise limit, i.e.,

g(x) = ⨆_{i∈N} f_i(x), for all x ∈ C.

Note that {f_i(x)}_i does indeed form an ω-chain in C, so that g is a well-defined function. We claim that g is the least upper bound of {f_i}_i.

First we need to show that g is indeed an element of D^C. To see that g is monotone, we use Proposition 12.1(1) and calculate, for any x ⊑ y in C,

g(x) = ⨆_{i∈N} f_i(x) ⊑ ⨆_{i∈N} f_i(y) = g(y).

To see that g is continuous, we use Proposition 12.1(2) and calculate, for any ω-chain x_0, x_1, ... in C,

g(⨆_j x_j) = ⨆_i ⨆_j f_i(x_j) = ⨆_j ⨆_i f_i(x_j) = ⨆_j g(x_j).

Finally, we must show that g is the least upper bound of the {f_i}_i. Clearly, f_i ⊑ g for all i, so that g is an upper bound. Now suppose h ∈ D^C is any other upper bound of {f_i}. Then for all x, f_i(x) ⊑ h(x). Since g(x) was defined to be the least upper bound of {f_i(x)}_i, we then have g(x) ⊑ h(x). Since this holds for all x, we have g ⊑ h. Thus g is indeed the least upper bound.

Exercise 40.
Recall the cpo B from Figure 4. The cpo B^B is also shown in Figure 4. Its 11 elements correspond to the 11 continuous functions from B to B. Label the elements of B^B with the functions they correspond to.

Proposition 12.4. The application function D^C × C → D, which maps (f, x) to f(x), is continuous.

Proposition 12.5. Continuous functions can be continuously curried and uncurried. In other words, if f : C × D → E is a continuous function, then f* : C → E^D, defined by f*(x)(y) = f(x, y), is well-defined and continuous. Conversely, if g : C → E^D is a continuous function, then g* : C × D → E, defined by g*(x, y) = g(x)(y), is well-defined and continuous. Moreover, (f*)* = f and (g*)* = g.

12.7 The interpretation of the simply-typed lambda calculus in complete partial orders

The interpretation of the simply-typed lambda calculus in cpo's resembles the set-theoretic interpretation, except that types are interpreted by cpo's instead of sets, and typing judgments are interpreted as continuous functions. For each basic type ι, assume that we have chosen a pointed cpo S_ι. We can then associate a pointed cpo [[A]] to each type A recursively:

[[ι]] = S_ι
[[A → B]] = [[B]]^[[A]]
[[A × B]] = [[A]] × [[B]]
[[1]] = 1

Typing judgments are now interpreted as continuous functions

[[A_1]] × ... × [[A_n]] → [[B]]

in precisely the same way as they were defined for the set-theoretic interpretation. The only thing we need to check, at every step, is that the function defined is indeed continuous. For variables, this follows from the fact that projections of cartesian products are continuous (Proposition 12.2).
For applications, we use the fact that the application function of cpo's is continuous (Proposition 12.4), and for lambda abstractions, we use the fact that currying is a well-defined, continuous operation (Proposition 12.5). Finally, the continuity of the maps associated with products and projections follows from Proposition 12.2.

Proposition 12.6 (Soundness and Completeness). The interpretation of the simply-typed lambda calculus in pointed cpo's is sound and complete with respect to the lambda-βη calculus.

12.8 Cpo's and fixed points

One of the reasons, mentioned in the introduction to this section, for using cpo's instead of sets for the interpretation of the simply-typed lambda calculus is that cpo's admit fixed points, and thus they can be used to interpret an extension of the lambda calculus with a fixed point operator.

Proposition 12.7. Let C be a pointed cpo and let f : C → C be a continuous function. Then f has a least fixed point.

Proof. Define x_0 = ⊥ and x_{i+1} = f(x_i), for all i ∈ N. The resulting sequence {x_i}_i is an ω-chain, because clearly x_0 ⊑ x_1 (since x_0 is the least element), and if x_i ⊑ x_{i+1}, then f(x_i) ⊑ f(x_{i+1}) by monotonicity, hence x_{i+1} ⊑ x_{i+2}. It follows by induction that x_i ⊑ x_{i+1} for all i. Let x = ⨆_i x_i be the limit of this ω-chain. Then using continuity of f, we have

f(x) = f(⨆_i x_i) = ⨆_i f(x_i) = ⨆_i x_{i+1} = x.

To prove that it is the least fixed point, let y be any other fixed point, i.e., let f(y) = y. We prove by induction that for all i, x_i ⊑ y. For i = 0 this is trivial, because x_0 = ⊥. Assume x_i ⊑ y; then x_{i+1} = f(x_i) ⊑ f(y) = y. It follows that y is an upper bound for {x_i}_i. Since x is, by definition, the least upper bound, we have x ⊑ y. Since y was arbitrary, x is below any fixed point, hence x is the least fixed point of f.
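The construction in the proof of Proposition 12.7 can be run directly. Here is a hedged Python sketch (our illustration; the functional `phi` and the partial-function encoding are our choices): take C to be the cpo of partial functions from N to N, ordered by inclusion of graphs, with ⊥ the nowhere-defined function. Iterating the "factorial functional" from ⊥ produces the ω-chain of finite approximations whose limit is the factorial function.

```python
# C = partial functions N -> N, represented as dicts; bottom is {}.
# The order is graph inclusion: g below h iff h agrees with g wherever
# g is defined.

def phi(g):
    """The continuous functional whose least fixed point is factorial:
    phi(g)(0) = 1, and phi(g)(n+1) = (n+1) * g(n) where g is defined."""
    out = {0: 1}
    for n, v in g.items():
        out[n + 1] = (n + 1) * v
    return out

approx = {}                 # x_0 = bottom
for _ in range(6):
    approx = phi(approx)    # x_{i+1} = f(x_i)

# After 6 iterations the approximation is defined exactly on 0..5,
# and agrees there with the factorial function (the least fixed point).
assert approx == {0: 1, 1: 1, 2: 2, 3: 6, 4: 24, 5: 120}
```

Each approximation extends the previous one, exactly as x_i ⊑ x_{i+1} in the proof; the least fixed point is the union of all the finite graphs.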
If f : C → C is any continuous function, let us write f† for its least fixed point. We claim that f† depends continuously on f, i.e., that † : C^C → C defines a continuous function.

Proposition 12.8. The function † : C^C → C, which assigns to each continuous function f ∈ C^C its least fixed point f† ∈ C, is continuous.

Exercise 41. Prove Proposition 12.8.

Thus, if we add to the simply-typed lambda calculus a family of fixed point operators Y_A : (A → A) → A, the resulting extended lambda calculus can then be interpreted in cpo's by letting [[Y_A]] = † : [[A]]^[[A]] → [[A]].

12.9 Example: Streams

Consider streams of characters from some alphabet A. Let A^≤ω be the set of finite or infinite sequences of characters. We order A^≤ω by the prefix ordering: if s and t are (finite or infinite) sequences, we say s ⊑ t if s is a prefix of t, i.e., if there exists a sequence s′ such that t = ss′. Note that if s ⊑ t and s is an infinite sequence, then necessarily s = t; i.e., the infinite sequences are the maximal elements with respect to this order.

Exercise 42. Prove that the set A^≤ω forms a cpo under the prefix ordering.

Exercise 43. Consider an automaton that reads characters from an input stream and writes characters to an output stream. For each input character read, it can write zero, one, or more output characters. Discuss how such an automaton gives rise to a continuous function from A^≤ω to A^≤ω. In particular, explain the meaning of monotonicity and continuity in this context. Give some examples.

13 Denotational semantics of PCF

The denotational semantics of PCF is defined in terms of cpo's. It extends the cpo semantics of the simply-typed lambda calculus. Again, we assign a cpo [[A]] to each PCF type A, and a continuous function [[Γ ⊢ M : B]] : [[Γ]] → [[B]] to every PCF typing judgment.
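Returning to the streams of Section 12.9, the prefix ordering on finite sequences is easy to experiment with. A small Python sketch (our illustration, using strings over an alphabet to stand in for finite sequences):

```python
# s below t iff s is a prefix of t.
def prefix(s, t):
    return t.startswith(s)

words = ["", "a", "ab", "abc", "b"]

# The poset axioms hold on these samples:
assert all(prefix(s, s) for s in words)                       # reflexivity
assert all(not (prefix(s, t) and prefix(t, s)) or s == t      # antisymmetry
           for s in words for t in words)
assert all(not (prefix(s, t) and prefix(t, u)) or prefix(s, u)  # transitivity
           for s in words for t in words for u in words)

# The empty stream is the least element, and "a", "b" are incomparable:
# the finite streams form a tree, not a chain.
assert all(prefix("", s) for s in words)
assert not prefix("a", "b") and not prefix("b", "a")
```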
The interpretation is defined in precisely the same way as for the simply-typed lambda calculus. The interpretation for the PCF-specific terms is shown in Table 11. Recall that B and N are the cpo's of lifted booleans and lifted natural numbers, respectively, as shown in Figure 4.

Types:
[[bool]] = B
[[nat]] = N

Terms:
[[T]] = T ∈ B
[[F]] = F ∈ B
[[zero]] = 0 ∈ N
[[succ(M)]] = ⊥ if [[M]] = ⊥; n + 1 if [[M]] = n
[[pred(M)]] = ⊥ if [[M]] = ⊥; 0 if [[M]] = 0; n if [[M]] = n + 1
[[iszero(M)]] = ⊥ if [[M]] = ⊥; T if [[M]] = 0; F if [[M]] = n + 1
[[if M then N else P]] = ⊥ if [[M]] = ⊥; [[N]] if [[M]] = T; [[P]] if [[M]] = F
[[Y(M)]] = [[M]]†

Table 11: Cpo semantics of PCF

Definition. Two PCF terms M and N of equal types are denotationally equivalent, in symbols M =_den N, if [[M]] = [[N]]. We also write M ⊑_den N if [[M]] ⊑ [[N]].

13.1 Soundness and adequacy

We have now defined three notions of equivalence on terms: =_ax, =_op, and =_den. In general, one does not expect the three equivalences to coincide. For example, any two divergent terms are operationally equivalent, but there is no reason why they should be axiomatically equivalent. Also, the POR-tester and the term λx.Ω are operationally equivalent in PCF, but they are not denotationally equivalent (since a function representing POR clearly exists in the cpo semantics). For general terms M and N, one has the following property:

Theorem 13.1 (Soundness). For PCF terms M and N, the following implications hold:

M =_ax N ⟹ M =_den N ⟹ M =_op N.

Soundness is a very useful property, because M =_ax N is in general easier to prove than M =_den N, and M =_den N is in turn easier to prove than M =_op N.
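The clauses of Table 11 can be transcribed almost literally into Python, with ⊥ modeled as `None` (an illustration under our encoding; the function names are ours, not PCF syntax):

```python
# Strict denotations of the PCF constants on lifted values (None = bottom).

def succ(m):
    return None if m is None else m + 1

def pred(m):
    if m is None:
        return None
    return 0 if m == 0 else m - 1

def iszero(m):
    return None if m is None else (m == 0)

def cond(m, n, p):
    # [[if M then N else P]]: strict in the condition, not in the branches.
    if m is None:
        return None
    return n if m else p

assert succ(None) is None            # succ is strict
assert pred(0) == 0 and pred(3) == 2
assert iszero(0) is True and iszero(None) is None
assert cond(True, 1, None) == 1      # the unused branch may be bottom
```

Note that the conditional inspects only its first argument; a branch may denote ⊥ without forcing the whole expression to ⊥, matching Table 11.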
Thus, soundness gives us a powerful proof method: to prove that two terms are operationally equivalent, it suffices to show that they are equivalent in the cpo semantics (if they are), or even that they are axiomatically equivalent.

As the above examples show, the converse implications are not in general true. However, the converse implications do hold if the terms M and N are closed and of observable type, and if N is a value. This property is called computational adequacy. Recall that a program is a closed term of observable type, and a result is a closed value of observable type.

Theorem 13.2 (Computational Adequacy). If M is a program and V is a result, then

M =_ax V ⟺ M =_den V ⟺ M =_op V.

Proof. First note that the small-step semantics is contained in the axiomatic semantics, i.e., if M → N, then M =_ax N. This is easily shown by induction on derivations of M → N. To prove the theorem, by soundness, it suffices to show that M =_op V implies M =_ax V. So assume M =_op V. Since V ⇓ V and V is of observable type, it follows that M ⇓ V. Therefore M →* V by Proposition 11.6. But this already implies M =_ax V, and we are done.

13.2 Full abstraction

We have already seen that the operational and denotational semantics do not coincide for PCF, i.e., there are some terms such that M =_op N but M ≠_den N. Examples of such terms are POR-test and λx.Ω. But of course, the particular denotational semantics that we gave to PCF is not the only possible denotational semantics. One can ask whether there is a better one. For instance, instead of cpo's, we could have used some other kind of mathematical space, such as a cpo with additional structure or properties, or some other kind of object altogether. The search for good denotational semantics is a subject of much research.
The following terminology helps in defining precisely what is a "good" denotational semantics.

Definition. A denotational semantics is called fully abstract if for all terms M and N,

M =_den N ⟺ M =_op N.

If the denotational semantics involves a partial order (such as a cpo semantics), it is also called order fully abstract if

M ⊑_den N ⟺ M ⊑_op N.

The search for a fully abstract denotational semantics for PCF was an open problem for a very long time. Milner proved that there could be at most one such fully abstract model in a certain sense. This model has a syntactic description (essentially, the elements of the model are PCF terms), but for a long time, no satisfactory semantic description was known. The problem has to do with sequentiality: a fully abstract model for PCF must be able to account for the fact that certain parallel constructs, such as parallel or, are not definable in PCF. Thus, the model should consist only of "sequential" functions. Berry and others developed a theory of "stable domain theory", which is based on cpo's with additional properties intended to capture sequentiality. This research led to many interesting results, but the model still failed to be fully abstract.

Finally, in 1992, two competing teams of researchers, Abramsky, Jagadeesan, and Malacaria, and Hyland and Ong, succeeded in giving a fully abstract semantics for PCF in terms of games and strategies. Games capture the interaction between a player and an opponent, or between a program and its environment. By considering certain kinds of "history-free" strategies, it is possible to capture the notion of sequentiality in just the right way to match PCF. In the last decade, game semantics has been extended to give fully abstract semantics to a variety of other programming languages, including, for instance, Algol-like languages.
Finally, it is interesting to note that the problem with "parallel or" is essentially the only obstacle to full abstraction for the cpo semantics. As soon as one adds "parallel or" to the language, the semantics becomes fully abstract.

Theorem 13.3. The cpo semantics is fully abstract for parallel PCF.

14 Acknowledgements

Thanks to Field Cady, Brendan Gillon, and Francisco Rios for reporting typos.

15 Bibliography

Here are some textbooks and other books on the lambda calculus. [1] is a standard reference handbook on the lambda calculus. [2]–[4] are textbooks on the lambda calculus. [5]–[7] are textbooks on the semantics of programming languages. Finally, [8]–[9] are textbooks on writing compilers for functional programming languages. They show how the lambda calculus can be useful in a more practical context.

[1] H. P. Barendregt. The Lambda Calculus, its Syntax and Semantics. North-Holland, 2nd edition, 1984.
[2] J.-Y. Girard, Y. Lafont, and P. Taylor. Proofs and Types. Cambridge University Press, 1989.
[3] J.-L. Krivine. Lambda-Calculus, Types and Models. Masson, 1993.
[4] G. E. Révész. Lambda-Calculus, Combinators and Functional Programming. Cambridge University Press, 1988.
[5] G. Winskel. The Formal Semantics of Programming Languages. An Introduction. MIT Press, London, 1993.
[6] J. C. Mitchell. Foundations for Programming Languages. MIT Press, London, 1996.
[7] C. A. Gunter. Semantics of Programming Languages. MIT Press, 1992.
[8] S. L. Peyton Jones. The Implementation of Functional Programming Languages. Prentice-Hall, 1987.
[9] A. W. Appel. Compiling with Continuations. Cambridge University Press, 1992.