CP-logic: A Language of Causal Probabilistic Events and Its Relation to Logic Programming


Authors: **Joost Vennekens, Marc Denecker, Maurice Bruynooghe**

Under consideration for publication in Theory and Practice of Logic Programming

JOOST VENNEKENS, MARC DENECKER, MAURICE BRUYNOOGHE
{joost, marcd, maurice}@cs.kuleuven.be
Dept. of Computer Science, Katholieke Universiteit Leuven
Celestijnenlaan 200A, B-3001 Leuven, Belgium

submitted July 2008; revised 6 January 2009; accepted 2 April 2009

Abstract

This paper develops a logical language for representing probabilistic causal laws. Our interest in such a language is twofold. First, it can be motivated as a fundamental study of the representation of causal knowledge. Causality has an inherent dynamic aspect, which has been studied at the semantical level by Shafer in his framework of probability trees. In such a dynamic context, where the evolution of a domain over time is considered, the idea of a causal law as something which guides this evolution is quite natural. In our formalization, a set of probabilistic causal laws can be used to represent a class of probability trees in a concise, flexible and modular way. In this way, our work extends Shafer's by offering a convenient logical representation for his semantical objects. Second, this language also has relevance for the area of probabilistic logic programming. In particular, we prove that the formal semantics of a theory in our language can be equivalently defined as a probability distribution over the well-founded models of certain logic programs, rendering it formally quite similar to existing languages such as ICL or PRISM.
Because we can motivate and explain our language in a completely self-contained way as a representation of probabilistic causal laws, this provides a new way of explaining the intuitions behind such probabilistic logic programs: we can say precisely which knowledge such a program expresses, in terms that are equally understandable by a non-logician. Moreover, we also obtain an additional piece of knowledge representation methodology for probabilistic logic programs, by showing how they can express probabilistic causal laws.

KEYWORDS: Uncertainty, Causality, Probabilistic Logic Programming

1 Introduction

Logic based languages, such as logic programming, play an important role in knowledge representation. One of the known weaknesses of such languages is that they are not well suited for representing probabilistic or uncertain knowledge. This has prompted a significant amount of research into probabilistic logic programming languages, both in the knowledge representation community itself and in machine learning, where such languages are developed for the purpose of stochastic relational learning. Syntactically, such a language typically annotates a logic programming rule, or some part thereof, with a probability; the formal semantics of the language then somehow specifies a probability distribution (typically over a set of possible worlds) in terms of these individual probabilities. This is the way in which these probabilistic logic programming languages tend to be formally defined. However, such a formal definition still leaves one important question unanswered, namely that of how expressions in the language should be understood on the informal level, i.e., how would one explain their intuitive meaning to a non-logician?
For the two separate components of logic programming and probability, this question has of course already been addressed at length. For instance, the informal meaning of logic programs, and in particular of their negation-as-failure connective, has been explained among others in epistemic terms, referring to the beliefs of a rational agent (Gelfond and Lifschitz 1991), and in terms of the well-known mathematical concept of an inductive definition (Denecker 1998). The meaning of statements in probability calculus, on the other hand, has been explained among others in frequentist terms, e.g. (Venn 1866), and in terms of degrees of belief, e.g. (De Finetti 1937). So far, research on probabilistic logic programming languages has not yet paid a great deal of attention to this issue of the informal meaning of expressions. It tends to be assumed that one already has sufficient intuitions about the meaning of logic programs and that the probabilities can simply be tacked on top of that. This paper presents an effort to develop a probabilistic logic programming language whose informal semantics¹ is explained in full detail in a completely self-contained way. In general, the advantage of such an approach is that it gives more philosophical insight into the meaning of statements in the language, makes it easier to explain to domain experts, and can help to provide a better modeling methodology. One of the key tasks that such an effort needs to accomplish is to show convincingly that the formal semantics of the language indeed correctly captures the informal meaning that is attributed to its expressions, i.e., that these expressions indeed mean, formally, what we claim they intuitively mean.
To ensure that this is done properly, we will adopt a constructive approach, where we first describe a particular kind of knowledge that we want to represent, then show how we can formalise the meaning of this knowledge in a way which is straightforward enough for its correctness to be intuitively obvious, and finally prove that the language we have thus defined is actually equivalent to a certain probabilistic logic programming construction.

The language that we develop will attempt to formalise probabilistic causal laws. The use of causal laws to compactly represent domains is commonplace in various action languages related to logic programming, e.g. (Gelfond and Lifschitz 1993). Here, we will investigate a probabilistic variant of such laws.

¹ The informal semantics of a language is commonly also referred to as its "intuitive reading". We prefer the term "informal semantics", however, because it stresses the close relation that should exist with the formal semantics.

John and Mary are each holding a rock. With probability 0.5, Mary will throw her rock at a window. This will break the window with probability 0.8. John will then also throw his rock at the window. He will hit it with probability 0.6.

Fig. 1. The story of John and Mary.

Fig. 2. Probability tree for the window breaking story.
We will do this in the semantic context developed by Shafer (1996). In this work, Shafer presents his view on a number of fundamental causal and probabilistic concepts. His central hypothesis is that such concepts are best considered in an explicitly dynamic context: when speaking of probability or causality, we should do so, he says, in the context of a particular story about how the domain evolves, which he formalises by means of probability trees. As he himself puts it:

"A full understanding of probability and causality requires a language for talking about the structure of contingency—a language for talking about the step-by-step unfolding of events. This book develops such a language based on an old and simple yet general and flexible idea: the probability tree."

Figure 2 depicts a probability tree corresponding to the story shown in Figure 1. In natural language, we could say that such a tree paints the following picture. The domain starts out in an initial state. Then, some event happens, which causes the domain to transition to a new state. However, we do not know up front precisely which new state this is going to be; instead, the new state is chosen probabilistically from a set of alternatives. For instance, in the initial state of the tree in Figure 2, the event happens that Mary makes up her mind whether to throw, which leads either to a state in which she does or to a state in which she does not. This step is then repeated, that is, in the new state, a different event happens, which leads to another new state, again chosen probabilistically from some set of alternatives, until finally this process arrives at a final state, in which no further events happen. Throughout this paper, we will continue to talk about probability trees using the language introduced above.
In particular, we will carry on using the word event to refer to the occurrence that causes a transition from one state to the next. This use of the term differs from its standard use in probability theory, where it denotes a set of possible outcomes of an experiment. Shafer introduces the terms Humean event (that which causes a transition between states) and Demoivrian event (a set of possible outcomes) to distinguish between these two different meanings of the word. Using the chain rule, the probability of following any particular branch in this tree can be computed as the product of the probabilities of the individual edges. For instance, the probability of the left-most branch of the tree in Figure 2 is 0.5 · 0.8 · 1 · 0.6 = 0.24. A Demoivrian event corresponds to a set of branches of the tree. For instance, the Demoivrian event of the window being broken corresponds to the set of all branches in which it breaks, i.e., the first, second, third and fifth branch. The probability of such a Demoivrian event can be computed as the sum of the probabilities of all branches that belong to it; e.g., the probability of the window breaking is 0.24 + 0.16 + 0.06 + 0.3 = 0.76. In the rest of this paper, we reserve the term "event" for Humean events (i.e., transitions between states) and will therefore omit the modifier.

This paper will develop a language for describing the causal laws according to which a probability tree unfolds. In other words, we will assume that each event in such a tree happens for a reason, i.e., that it is actually caused by some particular property of the state in which it happens. We then construct a language that allows us to describe these reasons.
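The chain-rule computation above can be checked mechanically. The sketch below (our own illustrative encoding, not from the paper) lists the six branches of the tree in Figure 2 as sequences of edge probabilities, computes each branch probability as the product of its edges, and sums the branches belonging to the Demoivrian event "the window is broken":

```python
from math import prod

# The six branches of Figure 2: edge probabilities along the branch,
# plus whether the window is broken at the leaf of that branch.
branches = [
    ([0.5, 0.8, 1.0, 0.6], True),   # Mary throws, breaks; John throws, breaks
    ([0.5, 0.8, 1.0, 0.4], True),   # Mary throws, breaks; John throws, misses
    ([0.5, 0.2, 1.0, 0.6], True),   # Mary throws, misses; John throws, breaks
    ([0.5, 0.2, 1.0, 0.4], False),  # Mary throws, misses; John throws, misses
    ([0.5, 1.0, 0.6], True),        # Mary doesn't throw; John throws, breaks
    ([0.5, 1.0, 0.4], False),       # Mary doesn't throw; John throws, misses
]

# Chain rule: the probability of a branch is the product of its edge labels.
branch_prob = [prod(edges) for edges, _ in branches]
assert abs(sum(branch_prob) - 1.0) < 1e-9  # branch probabilities sum to 1

# A Demoivrian event is a set of branches; its probability is their sum.
p_broken = sum(p for p, (_, broken) in zip(branch_prob, branches) if broken)
print(round(p_broken, 2))  # 0.76
```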
In the extreme case, it might be that we can say nothing more than that each state of the tree is in itself the reason for the event that happens there; in this case, we obtain nothing more than an alternative representation for the tree itself. However, if the same event can happen in different parts of the tree, each time caused by the same property of the state in which it happens, we might end up with a significantly more compact representation. In the story in Figure 1, we find four probabilistic causal laws:

• John throwing his rock causes the window to break with probability 0.6;
• Mary throwing her rock causes the window to break with probability 0.8;
• Mary decides to throw with probability 0.5 (this event is vacuously caused);
• John always throws (this event is also vacuously caused and it has only one possible outcome).

In the language that we will develop in the next section, we will write down such probabilistic causal laws in the format:

    possible effects ← cause

where the cause can be omitted if the event is vacuously caused. In this syntax, the above probabilistic causal laws can be written down as:

    (Break : 0.8) ← Throws(Mary).     (1)
    (Break : 0.6) ← Throws(John).     (2)
    (Throws(Mary) : 0.5).             (3)
    Throws(John).                     (4)

Fig. 3. Alternate probability tree for the window breaking story.
In this representation, John and Mary each get their own probabilistic causal law. This is necessary because they indeed throw differently, causing the window to break with different probabilities. However, we can also imagine an example in which both hit the window with the same probability. In this case, our language also allows a more compact representation, using a variable to range over the different persons that might throw:

    ∀x (Break : 0.8) ← Throws(x).

The meaning of such a statement is as one would expect: each particular person that throws (i.e., each instantiation of x for which Throws(x) holds) breaks the window with probability 0.8. If we compare our four probabilistic causal laws to the probability tree in Figure 2, we see that this tree indeed obeys the causal laws, in the following sense:

• As we go down any branch of the tree, we find that all events which should happen according to our causal laws actually do so. The two unconditional causal laws (3) and (4) state that the events that Mary decides whether she will throw and that John decides that he will throw should always happen; and indeed, we find that in each branch of the tree, they do. In the left-most branch of the tree, for instance, the result of these two events is that both Mary and John decide to throw, so according to causal laws (1) and (2) the two events by which their respective throws break the window should also happen; and again, they indeed do.

• The events that happen according to the causal laws are also the only events that happen. For instance, in the right-most branch, Mary has decided not to throw, so the event of her rock breaking the window with probability 0.8 does not happen.
Moreover, each of the events which should happen happens precisely once; it is not the case, for instance, that once Mary has decided to throw, the event of her rock breaking the window with probability 0.8 keeps on happening ad infinitum.

To define the semantics of our language, we will formalise this idea of a probability tree obeying a set of probabilistic causal laws. We will call such an obeying probability tree an execution model of the set of causal laws. In general, a single set of causal laws might have many such execution models. Indeed, it is clear that a probability tree contains more information than the causal laws: events in the tree are totally ordered (for instance, in Figure 2, Mary decides to throw before John does), whereas the causal laws only provide a partial order on the events (for instance, the event of Mary's rock hitting the window can only happen after the event of Mary deciding to throw, since one causes the other; however, no order is imposed between, e.g., the events of John throwing and Mary throwing). However, the additional information that is contained in a probability tree is actually irrelevant for the final outcome that will be reached. Let us consider, for instance, the alternative tree of Figure 3, in which John throws before Mary does. This tree also obeys our four causal laws, yet has a different structure than the tree in Figure 2. However, we see that the probability of the window eventually breaking is nevertheless precisely the same in this tree as it was in the other one, namely 0.76. Later in this paper, we will prove that all execution models of a set of causal laws always generate precisely the same probability distribution.
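The order-independence claimed here can be made plausible with a small order-free computation. The sketch below (our own illustrative encoding, not from the paper) enumerates the independent choices made by the four causal laws, without fixing any order between them, and recovers the same probability 0.76 that both Figure 2 and Figure 3 assign to the window breaking:

```python
from itertools import product

# Independent outcomes of the probabilistic choices in the four causal laws:
# Mary throws (p=0.5); her rock breaks the window if she throws (p=0.8);
# John always throws; his rock breaks the window (p=0.6).
mary_throws = [(True, 0.5), (False, 0.5)]
mary_breaks = [(True, 0.8), (False, 0.2)]
john_breaks = [(True, 0.6), (False, 0.4)]

p_window_broken = 0.0
for (mt, p1), (mb, p2), (jb, p3) in product(mary_throws, mary_breaks, john_breaks):
    # Mary's rock can only break the window if she actually throws; since the
    # choices are independent, we enumerate mb regardless and ignore it when
    # mt is False (its two branches then sum back out to 1).
    if jb or (mt and mb):
        p_window_broken += p1 * p2 * p3

print(round(p_window_broken, 2))  # 0.76
```

No ordering between John's and Mary's throws appears anywhere in this computation, which is exactly why the two differently structured trees agree on the final distribution.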
In this sense, causal laws manage to capture the essence of a probability tree, while allowing irrelevant details (e.g., does John throw first or does Mary throw first?) to be ignored. This renders our representation quite succinct.

Shafer's book also recognizes that the naive graphical representation of probability trees tends to grow unwieldy rather quickly. In the final chapter of his book, he therefore briefly examines a number of alternative, more compact representations for such trees, including Bayesian nets (Pearl 1988) and a representation based on Martin-Löf type theory (Martin-Löf 1982). In both these representations, new events are caused by the outcomes of some fixed set of previous events. The description of an event itself therefore already carries within it certain restrictions on the order in which events can happen. By contrast, in CP-logic events are not caused directly by previous events, but rather by properties of the state in which they happen. The fact that we do not represent any explicit a priori information about the order in which events happen makes our representation more flexible and allows probabilistic causal laws to easily be reused in different contexts. Let us suppose, for instance, that we know a probabilistic causal law that describes one particular way in which a certain disease can cause a certain symptom. This law can then be reused, without change, regardless of what might cause the disease, which other causes there might be for the same symptom, or even whether there are still other ways in which the same disease might also cause the same symptom.

This paper is structured as follows. Section 2 briefly introduces some preliminary concepts from lattice theory and logic programming. In Section 3, we formally define an initial, restricted version of CP-logic.
In Section 4, we show how a certain kind of process can be modeled in this basic language, which also suggests a way of defining a more general version of CP-logic. This will be done in Section 5. Section 6 then discusses the resulting definitions in more detail. In Section 7, we investigate the precise relation between CP-logic and Bayesian networks. Section 8 relates CP-logic to logic programming. Finally, Section 9 discusses some related work. Proofs of the theorems presented in this paper are given in Appendix A. Part of the material in this paper was presented at conferences (Vennekens et al. 2004; Vennekens et al. 2006).

2 Preliminaries

This section recalls some well-known definitions and results from lattice theory and logic programming. To a large extent, this material is relevant only for the proofs of our theorems. It can safely be skipped on a first reading of this paper.

2.1 Some concepts from lattice theory

A binary relation ≤ on a set L is a partial order if it is reflexive, transitive and anti-symmetric. A partially ordered set ⟨L, ≤⟩ is a lattice if every pair (x, y) of elements of L has a unique least upper bound and greatest lower bound. Such a lattice ⟨L, ≤⟩ is complete if every non-empty subset S ⊆ L has a least upper bound and greatest lower bound. A complete lattice has a least element ⊥ and a greatest element ⊤. An operator O : L → L is monotone if for every x ≤ y, O(x) ≤ O(y). An element x ∈ L is a prefixpoint of O if x ≥ O(x), a fixpoint if x = O(x) and a postfixpoint if x ≤ O(x). If O is a monotone operator on a complete lattice, then for every postfixpoint y, there exists a least element in the set of all prefixpoints x of O for which x ≥ y. This least prefixpoint of O greater than y is also the least fixpoint of O greater than y.
Moreover, it can be constructed by successively applying O to y, i.e., as the limit of the sequence (y, O(y), O(O(y)), ...). In particular, because ⊥ is a trivial postfixpoint, O has a least prefixpoint, which is equal to its least fixpoint and which can be constructed by successive application of O to ⊥.

2.2 Some concepts from logic programming

We assume familiarity with classical logic. A Herbrand interpretation for a vocabulary Σ is an interpretation which has as its domain the set HU(Σ) of all ground terms that can be constructed from Σ and which interprets each constant as itself and each function symbol f/n as the function mapping a tuple (t1, ..., tn) to f(t1, ..., tn). We can identify a Herbrand interpretation with a set of ground atoms. A partial Herbrand interpretation is a function ν from the set HB(Σ) of all ground atoms, also called the Herbrand base, to the set of truth values {f, u, t}. A (total) Herbrand interpretation corresponds to a partial Herbrand interpretation that does not include u in its range. On the set of truth values, one defines the precision order: u ≤p f and u ≤p t, and the truth order: f ≤t u ≤t t. These orders can be pointwise extended to partial Herbrand interpretations. Each totally ordered set S of partial Herbrand interpretations has a ≤p-least upper bound, denoted lub≤p(S). The three-valued truth function ϕ^ν for sentences ϕ and partial Herbrand interpretations ν is defined by induction:

• p^ν = ν(p), for p ∈ HB(Σ);
• (ψ ∧ φ)^ν = Min≤t(ψ^ν, φ^ν);
• (∀x φ(x))^ν = Min≤t({φ(t)^ν | t ∈ HU(Σ)});
• (¬ϕ)^ν = (ϕ^ν)⁻¹, where f⁻¹ = t, t⁻¹ = f, u⁻¹ = u.

A crucial monotonicity property of three-valued logic is that ν ≤p ν′ implies ϕ^ν ≤p ϕ^ν′.
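The iterative fixpoint construction just described is easy to make concrete on the powerset lattice ⟨2^S, ⊆⟩. The sketch below (our own illustration; the program and names are not from the paper) iterates a monotone operator from the least element ⊥ = ∅ until it stabilizes, using the immediate-consequence operator of a small positive program as the operator O:

```python
def least_fixpoint(op, bottom):
    """Iterate a monotone operator from the least element until it stabilizes."""
    x = bottom
    while True:
        y = op(x)
        if y == x:
            return x
        x = y

# Example: a positive program {p.  q <- p.  r <- q.}, encoded as
# (head, body) pairs on the lattice of sets of atoms ordered by subset.
rules = [("p", []), ("q", ["p"]), ("r", ["q"])]

def T(interp):
    # Immediate-consequence operator: derive every head whose body holds.
    return frozenset(head for head, body in rules if all(b in interp for b in body))

print(sorted(least_fixpoint(T, frozenset())))  # ['p', 'q', 'r']
```

The iterates are ∅, {p}, {p, q}, {p, q, r}, after which the sequence is stationary, mirroring the sequence (⊥, O(⊥), O(O(⊥)), ...) from the text.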
The well-founded semantics of logic programs was originally defined in (Van Gelder et al. 1991). We present an equivalent definition that was developed in (Denecker and Vennekens 2007). Formally, a logic program P is a set of rules of the form p ← φ, where p is a ground atom and φ is a first-order sentence.

Definition 1 (well-founded induction)
We define a well-founded induction of P as a sequence of partial Herbrand interpretations (ν_α)_{0 ≤ α ≤ β} satisfying the following conditions:

• ν_0 = ⊥≤p, the mapping of all atoms to u;
• ν_λ = lub≤p({ν_β | β < λ}), for each limit ordinal λ;
• ν_{α+1} relates to ν_α in one of the following ways:
  — ν_{α+1} = ν_α[p : t], such that for some rule p ← ϕ in P, ϕ^{ν_α} = t;
  — ν_{α+1} = ν_α[U : f], where U is an unfounded set, i.e., a set of ground atoms such that for each p in U, ν_α(p) = u, and for each rule p ← ϕ in P, ϕ^{ν_{α+1}} = f.

A well-founded induction is a sequence of increasing precision. We call a well-founded induction (ν_α)_{α ≤ β} terminal if it cannot be extended with a strictly more precise interpretation. Each well-founded induction whose limit is a total interpretation is terminal. We now define the well-founded model of P as the limit of any such terminal well-founded induction. As the following result shows, this definition coincides with the standard one.

Theorem 1 (Denecker and Vennekens 2007)
Each terminal well-founded induction of P converges to the well-founded model of P, as it was defined in (Van Gelder et al. 1991).

In certain logic programming variants, such as abductive logic programs (Kakas et al. 1992) and ID-logic (Denecker and Ternovska 2007), a distinction is made between predicates that are defined by the program and predicates that are left open. The set of defined predicates must contain at least those predicates that appear in the heads of rules of the program.
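To make Definition 1 concrete, the following sketch (our own tiny example, not from the paper) writes out one well-founded induction for the propositional program {p ← ¬q, q ← q}: first {q} is made false as an unfounded set, then p is derived true:

```python
# One well-founded induction for the program {p <- not q,  q <- q}.
# Partial interpretations map atoms to 't', 'f' or 'u'.
T, F, U = "t", "f", "u"

def neg(v):  # three-valued negation: f^-1 = t, t^-1 = f, u^-1 = u
    return {T: F, F: T, U: U}[v]

nu = {"p": U, "q": U}  # nu_0 maps every atom to u

# Step 1: U = {q} is an unfounded set.  q is u in nu_0, its only rule is
# "q <- q", and once q is set to f that body evaluates to f, as required.
nu_next = dict(nu, q=F)
assert nu["q"] == U and nu_next["q"] == F  # side conditions of the f-step
nu = nu_next

# Step 2: the body "not q" of the rule "p <- not q" is now t, so we may
# take the t-step nu_2 = nu_1[p : t].
assert neg(nu["q"]) == T
nu = dict(nu, p=T)

# The induction is terminal: nu is total, so it is the well-founded model.
print(nu)  # {'p': 't', 'q': 'f'}
```

Each step strictly increases precision, and no further step applies, matching the definition of a terminal well-founded induction.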
This distinction is similar to that between endogenous and exogenous random variables, which is common in probabilistic modeling. It is straightforward to generalize the well-founded semantics to this case. Given an interpretation O of the open predicates, we define a well-founded induction of P in O by the same inductive definition as for ordinary well-founded inductions, only we now have as a base case that ν_0 should be the least precise partial Herbrand interpretation that extends O. It is easy to see that each ν_i in such a well-founded induction in O in fact extends O, and also that if there are no open predicates, this definition simply coincides with the original one. The well-founded model of P in O is then the limit of any terminal well-founded induction of P in O.

3 A logic of probabilistic causal laws

Our goal in this section is to define a language for representing probabilistic causal laws. Before going into the mathematical details, we first outline the general picture. To represent knowledge in a logical language, the first thing that is needed is a suitable vocabulary. Usually in logical modeling, this vocabulary is assumed to be such that each possible state of the domain of discourse corresponds to an interpretation for it. In Shafer's probability trees, possible states of the domain are represented by nodes of the tree. To link these two formal settings, our semantics will therefore consider probability trees in which each node corresponds to an interpretation for a given vocabulary. As we introduced the concept in Section 1, a probabilistic causal law states the cause and possible effects of a particular event or class of events.
The cause specifies in which nodes of the tree the event might happen, i.e., it is some property of the domain of discourse such that the event can happen in precisely those states of the domain in which this property holds. The natural thing, therefore, is to represent such a cause by a first-order formula φ, whose meaning is that the event might happen in those states s of a probability tree for which the associated interpretation I(s) is such that I(s) |= φ. Each event that happens makes a transition from a node s of a probability tree to one of the children s′ of s. The description of the effects of such an event should specify how it will affect the state of the domain, i.e., what the interpretations I(s′) associated to the children s′ of s should be. There are many conceivable ways of representing such knowledge, but in this paper we stick to a very simple one: we assume that each possible effect of an event corresponds to a single ground atom P(t) of our vocabulary, such that the interpretation I(s′) corresponding to the new state s′ is identical to the interpretation I(s), apart from the fact that P(t) is now true. We choose this simple representation for two reasons. First, the aim of our exercise is to come up with a semantics that formalises probabilistic causal laws in a way that clearly coincides with our intuitions about the meaning of such laws. Keeping the representation of effects simple helps to achieve the desired clarity. Second, we are not just interested in this language for its own sake, but also because we want to use it to explain the meaning of certain probabilistic logic programming statements. Our simple representation of effects will also serve to elucidate this link to logic programming.
3.1 Syntax

In this section, we formally define the language of CP-logic. Let us fix a finite relational vocabulary, consisting of a set of predicate symbols and a set of constants. We assume that the predicates of our vocabulary are split into a set of endogenous predicates and a set of exogenous ones. The idea behind this distinction is of course that the endogenous predicates should describe things that are internal to the causal process being modeled, while the exogenous predicates describe things external to it. A causal probabilistic law, or CP-law for short, is a statement of the form:

    ∀x (A1 : α1) ∨ · · · ∨ (An : αn) ← φ,     (5)

where the αi are non-zero probabilities with Σ αi ≤ 1, φ is a first-order formula and the Ai are atoms, such that the universally quantified tuple of variables x contains all free variables in φ and the Ai. Moreover, the predicate symbol of each of the atoms Ai should be an endogenous predicate. Such a CP-law is read as: "For each x, φ causes an event whose effect is that at most one of the Ai becomes true; for each i, the probability of Ai being the effect of this event is αi." If the causal law has a deterministic effect, i.e., it causes some atom A with probability 1, we also write A ← φ instead of (A : 1) ← φ. We allow the precondition φ to be absent, meaning that the event is vacuously caused. In this case, the causal law is called unconditional and we omit the '←'-symbol as well. If the tuple x of variables is empty, we call the causal law ground. We remark that the precondition φ of such a ground causal law may still contain variables, as long as they are all bound by some quantifier in φ. A CP-theory is a finite multiset² of CP-laws.
For now, we will restrict attention to CP-theories in which all formulas φ are positive, i.e., they do not contain negation. Afterwards, Section 5 will examine how negation can be added.

Example 1
In about 25% of the cases, syphilis causes a neuropsychiatric disorder called general paresis, and in fact, syphilis is the only cause for paresis. This can be modeled as follows:

    (Paresis : 0.25) ← Syphilis.     (6)

where Syphilis is an exogenous predicate. This example illustrates the difference between causation and material implication. Indeed, because syphilis is the only cause for paresis, observing that a patient has paresis implies that he must also have syphilis, i.e., the material implication Paresis ⊃ Syphilis holds. So, in this example, the directions of causation and material implication are precisely opposite.

² Example 3 explains why we consider multisets instead of sets.

Example 2
Our running example for this section will also be a medical example. Pneumonia might cause angina with probability 0.2. Vice versa, angina might cause pneumonia with probability 0.3. A bacterial infection can cause either pneumonia (with probability 0.4) or angina (with probability 0.1). We consider bacterial infection as exogenous.

    (Angina : 0.2) ← Pneumonia.     (7)
    (Pneumonia : 0.3) ← Angina.     (8)
    (Pneumonia : 0.4) ∨ (Angina : 0.1) ← Infection.     (9)

Example 3
A CP-theory is a multiset of CP-laws, which means that it may contain several instances of the same event. To illustrate this, consider a variant of the above problem in which the patient comes into contact with two different sources of infection, each of which might cause him to become infected with a probability of 0.1. To model this, we can add the following multiset of two unconditional events to the theory of Example 2:

    (Infection : 0.1).     (10)
    (Infection : 0.1).     (11)

We now define some notation to refer to different components of a ground CP-law. The head head(r) of a rule r of form (5) is the set of all pairs (Ai, αi) appearing in the description of the effects of the event; the body of r, body(r), is its precondition φ. By headAt(r) we denote the set of all atoms Ai for which there exists an αi such that (Ai, αi) ∈ head(r). Similarly, by bodyAt(r) we denote the set of all atoms A which "appear"³ in body(r). For the above Example 2, if r is the CP-law (9), then head(r) = {(Pneumonia, 0.4), (Angina, 0.1)}, headAt(r) = {Pneumonia, Angina}, body(r) = Infection and bodyAt(r) = {Infection}. We will call a CP-law E ← φ a rule if we want to emphasize that we are referring to a syntactical construct.

3.2 Semantics

This section defines the formal semantics of CP-logic. We will restrict attention to Herbrand interpretations, i.e., we consider only interpretations whose domain is the set of constants of the theory and which interpret each constant as itself. This restriction is made for two reasons: it simplifies the presentation, and it is also what is usually done in (probabilistic) logic programming. However, it is easy to extend all our definitions and results to arbitrary domains.

³ More formally, we use bodyAt(r) to denote At(body(r)), where At is the mapping from sentences to sets of ground atoms that is inductively defined by:
• For Q(t) a ground atom, At(Q(t)) = {Q(t)};
• For φ ◦ ψ, with ◦ either ∨ or ∧, At(φ ◦ ψ) is defined as At(φ) ∪ At(ψ);
• For ¬φ, At(¬φ) = At(φ);
• For Θx φ, with Θ either ∀ or ∃, At(Θx φ) = ∪_{t ∈ HU(Σ)} At(φ[x/t]), where HU(Σ) denotes the Herbrand universe for the vocabulary Σ.
We view a non-ground CP-law ∀x r as an abbreviation for the set of all ground CP-laws r[x/t] that result from replacing the variables x by a tuple t of ground terms in vocabulary Σ. For instance, if we wanted to consider multiple people in Example 2, we might include constants {John, Mary} in our vocabulary Σ and write the non-ground rule

  ∀x (Angina(x) : 0.2) ← Pneumonia(x),

to abbreviate the two CP-laws

  (Angina(John) : 0.2) ← Pneumonia(John).
  (Angina(Mary) : 0.2) ← Pneumonia(Mary).

Because CP-theories are finite, the use of such abbreviations only makes sense in the context of a finite domain, i.e., when the vocabulary does not generate an infinite number of terms. By restricting attention to finite relational vocabularies, we ensure that this is the case. In our formal treatment of CP-logic, we will never consider non-ground rules, but always assume that these have already been expanded into a finite set of ground CP-laws. When using such non-ground rules in examples, we will implicitly assume that predicates and constants have been appropriately typed, in such a way as to avoid instantiations that are obviously not intended. We will also allow ourselves to use arithmetic function symbols, such as +/2 and -/2, and assume that the grounding replaces terms made from these symbols by numerical constants in the appropriate way.

As already explained, our basic semantical object will be that of a probability tree in which the nodes correspond to interpretations.

Definition 2 (probabilistic Σ-process)
Let Σ be a vocabulary. A probabilistic Σ-process T is a pair ⟨T; I⟩, where:
• T is a tree structure, in which each edge is labeled with a probability, such that for every non-leaf node s, the probabilities of all edges leaving s sum up to precisely 1;
• I is a mapping from nodes of T to Herbrand interpretations for Σ.
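The expansion of a non-ground CP-law into its ground instances, described at the start of this subsection, can be sketched executably; the string-template encoding below is our own, not the paper's.

```python
# Expanding a non-ground CP-law over the constants of the vocabulary, as in
# the Angina/Pneumonia example above; the string-template encoding is ours.
constants = ["John", "Mary"]

def ground(head_template, body_template, alpha):
    """Expand 'forall x (H(x):alpha) <- B(x)' into one ground law per constant."""
    return [((head_template.format(c), alpha), body_template.format(c))
            for c in constants]

laws = ground("Angina({})", "Pneumonia({})", 0.2)
# laws now holds the two ground CP-laws for John and Mary
```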
In a probability tree, we can associate to each node s the probability P(s) of a random walk in the tree, starting from its root, passing through s. Indeed, for the root ⊥ of the tree, P(⊥) = 1 and for each other node s, P(s) = Π_i α_i, where the α_i are all the probabilities associated to edges on the path from ⊥ to s. Essentially, the mapping P contains all the information that is present in the labeling of the edges and vice versa. To ease notation, we will sometimes take the liberty of identifying a probabilistic Σ-process ⟨T; I⟩ with the triple ⟨T; I; P⟩ and ignoring the labels on the edges of T. Each probabilistic Σ-process now induces an obvious probability distribution over the states in which the domain described by Σ might end up.

Fig. 4. A process T for Example 2 and its distribution π_T. [Tree not reproduced: from the root {Inf}, event (9) fires, followed by events (7) and (8) in the respective branches; the leaves carry the probabilities P: {Inf} 0.5, {Inf,Pn,Ang} 0.08, {Inf,Pn} 0.32, {Inf,Ang,Pn} 0.03, {Inf,Ang} 0.07, giving π_T({Inf,Pn,Ang}) = 0.11, π_T({Inf,Pn}) = 0.32, π_T({Inf,Ang}) = 0.07, π_T({Inf}) = 0.5.]

Definition 3 (π_T)
Let Σ be a vocabulary and T = ⟨T; I; P⟩ a probabilistic Σ-process. By π_T we denote the probability distribution that assigns to each Herbrand interpretation I of Σ the probability Σ_{s ∈ L_T(I)} P(s), where L_T(I) is the set of all leaves s of T for which I(s) = I. Like any probability distribution over interpretations, such a π_T also defines a set of possible worlds, namely that consisting of all I for which π_T(I) > 0.
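The quantities P(s) and π_T can be computed directly from a tree. The sketch below does this for the process of Fig. 4; the nested-tuple tree encoding is our own, not the paper's.

```python
from collections import defaultdict

# Computing P(s) and the distribution pi_T of Definition 3 for the process of
# Fig. 4 (Example 2).  A node is (interpretation, [(edge_probability, child)]);
# leaves have an empty child list.  This encoding is our own illustration.
tree = (frozenset({"Inf"}), [
    (0.4, (frozenset({"Inf", "Pn"}), [
        (0.2, (frozenset({"Inf", "Pn", "Ang"}), [])),
        (0.8, (frozenset({"Inf", "Pn"}), []))])),
    (0.1, (frozenset({"Inf", "Ang"}), [
        (0.3, (frozenset({"Inf", "Ang", "Pn"}), [])),
        (0.7, (frozenset({"Inf", "Ang"}), []))])),
    (0.5, (frozenset({"Inf"}), []))])

def pi(tree):
    dist = defaultdict(float)
    def walk(node, p):           # p is P(s): the product of edge labels to s
        interp, children = node
        if not children:
            dist[interp] += p    # sum P(s) over the leaves with I(s) = I
        for alpha, child in children:
            walk(child, p * alpha)
    walk(tree, 1.0)
    return dict(dist)

dist = pi(tree)  # reproduces the table of Fig. 4, e.g. pi({Inf,Pn,Ang}) = 0.11
```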
If all the probabilities P(s) are non-zero, then this is simply the set of all I(l) for which l is a leaf of T. We now want to relate the transitions in such a probabilistic Σ-process to the events described by a CP-theory.

Definition 4 (rules firing)
Let Σ be a vocabulary, C a CP-theory in this vocabulary and T a probabilistic Σ-process. Let r ∈ C be a CP-law of the form:

  (A_1 : α_1) ∨ · · · ∨ (A_n : α_n) ← φ.

We say that r fires in a node s of T if s has n + 1 children s_1, ..., s_{n+1}, such that:
• For all 1 ≤ i ≤ n, I(s_i) = I(s) ∪ {A_i} and the probability of edge (s, s_i) is α_i;
• For s_{n+1}, I(s_{n+1}) = I(s) and the probability of the edge (s, s_{n+1}) is 1 − Σ_i α_i.

For simplicity, we will omit edges labeled with a probability of zero; this does not affect any of the following material. This definition now allows us to link the transitions in a probabilistic Σ-process T to the events of a CP-theory C. Formally, we will consider a mapping E from each non-leaf node s of T to an associated CP-law r ∈ C. Because the same ground CP-law should fire at most once in each branch, the following definition will also consider, for a node s, the set of all CP-laws that have not yet fired in s, i.e., the set of all r ∈ C for which there does not exist an ancestor s′ of s such that E(s′) = r. We will denote this set as R_E(s).

Definition 5 (execution model: positive case)
Let C be a positive CP-theory and X an interpretation of the exogenous predicates. A probabilistic Σ-process T = ⟨T; I⟩ is an execution model of C in context X, written T |=_X C, iff there exists a mapping E from the non-leaf nodes of T to C, such that:
• For the root ⊥ of T, I(⊥) = X;
• In each non-leaf node s, a CP-law E(s) ∈ R_E(s) fires, such that its precondition is satisfied in s, i.e., I(s) |= body(E(s));
• For each leaf l of T, there are no CP-laws r ∈ R_E(l) for which I(l) |= body(r).

If there are no exogenous predicates, we simply write T |= C. Example 2 has one execution model for every specific context X; the process for X = {Infection} is depicted in Figure 4. As we showed with the window breaking example in Section 1, there also exist theories which allow multiple execution models for a given context. However, all of these execution models must then generate the same probability distribution over their final states.

Theorem 2 (Uniqueness: positive case)
Let C be a positive CP-theory. If T_1 and T_2 are both execution models of C, then π_{T_1} = π_{T_2}.

Proof
Proof of this theorem can be found in Section A.2.

This theorem shows that knowing all the probabilistic causal laws of a domain gives enough information to predict a single probability distribution over the final states that this domain might reach. This result is important for two reasons. First, it suggests an appealing explanation for why causality is such a useful and important concept: causal information tells you just enough about the behaviour of a probabilistic process to be able to predict its final outcome in every possible context, while allowing irrelevant details to be ignored. As such, it offers a compact and robust representation of the class of probability distributions that can result from such a process.
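Theorem 2 can be checked computationally on Example 2: the exhaustive executor below fires, in each node, an arbitrary not-yet-fired law whose body holds, branches over its possible effects, and stops in a leaf when no applicable law remains. The tuple encoding of laws (7)-(9) is our own sketch; by the theorem, the choice of which applicable law to fire first does not change the resulting distribution, which reproduces the π_T of Fig. 4.

```python
from collections import defaultdict

# Exhaustive executor for positive ground CP-theories, sketching Definition 5;
# the tuple encoding (head_pairs, body_atoms) of laws (7)-(9) is our own.
THEORY = [
    ((("Angina", 0.2),), ("Pneumonia",)),                      # law (7)
    ((("Pneumonia", 0.3),), ("Angina",)),                      # law (8)
    ((("Pneumonia", 0.4), ("Angina", 0.1)), ("Infection",)),   # law (9)
]

def execute(state, remaining, dist, p=1.0):
    applicable = [r for r in remaining if all(b in state for b in r[1])]
    if not applicable:
        dist[frozenset(state)] += p            # leaf: nothing left to fire
        return
    rule = applicable[0]                       # any choice works (Theorem 2)
    rest = [q for q in remaining if q is not rule]
    for atom, alpha in rule[0]:                # one child per possible effect
        execute(state | {atom}, rest, dist, p * alpha)
    leftover = 1 - sum(alpha for _, alpha in rule[0])
    if leftover > 0:                           # the "no effect" child
        execute(state, rest, dist, p * leftover)

dist = defaultdict(float)
execute({"Infection"}, THEORY, dist)           # context X = {Infection}
```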
Second, in our construction of CP-logic, we have started from Shafer's dynamic analysis of causality, using the probability tree as a basic semantic object. In this respect, our approach differs from that of causal Bayesian networks (Pearl 2000), in which causal information is viewed more statically, with probability distributions as basic semantical objects. The above theorem relates these two views, because it allows us to not only view a CP-theory as describing a class of processes, but also as defining a unique probability distribution.

Definition 6 (π_C^X)
Let C be a CP-theory and X an interpretation for the exogenous predicates of C. By π_C^X, we denote the unique probability distribution π_T, for which T |=_X C. If there are no exogenous predicates, we simply write π_C.

A CP-theory can be viewed as mapping each interpretation for the exogenous predicates to a probability distribution over interpretations of the endogenous predicates or, to put it another way, as a conditional distribution over interpretations of the endogenous predicates, given an interpretation for the exogenous predicates.

Definition 7 (models of a CP-theory)
Let C be a CP-theory and π a probability distribution over interpretations of all the predicates of C. π is a model of C, denoted π |= C, iff for each interpretation X of the exogenous predicates with π(X) > 0 and each interpretation J of the endogenous predicates, π(J | X) = π_C^X(J).

If a CP-theory C has no exogenous predicates, then there is a unique π for which π |= C and this is, of course, simply the distribution π_C. Having defined this formal semantics for CP-logic, it is natural to ask how the causal interpretation that we have informally attached to expressions in our language is reflected in it.
We see that our semantics essentially consists of the following two causal principles:
• The principle of universal causation states that all changes to the state of the domain must be triggered by a causal law whose precondition is satisfied.
• The principle of sufficient causation states that if the precondition to a causal law is satisfied, then the event that it triggers must eventually happen.

Together with our decision to use Shafer's probability trees as our basic semantical objects and the particular representation that we have chosen for probabilistic causal laws, these two principles essentially determine our logic completely, at least in the positive case. In the following sections, we turn our attention to the question of how to extend our definitions to the case where negation can appear in the precondition of a CP-law. However, this requires us to first discuss in more detail a particular modeling methodology for CP-logic.

4 Modeling more complex processes in CP-logic

In the formal semantics of CP-logic, the interpretations I(s) associated to nodes s of the probability trees play an important role. Indeed, if we forget for a moment the restriction that each causal law can fire at most once in each branch, then the interpretation I(s) completely determines which of the causal laws can fire in s. In our account of CP-logic so far, we have suggested that the logical vocabulary Σ of a CP-theory be chosen in such a way that possible states of the domain of discourse correspond to (Herbrand) interpretations for Σ. However, this assumption restricts the kind of causal laws that can be represented in at least two different, but related, ways. First, it means that we can only describe events that are caused by some property of the current state s.
In particular, it is not possible to say that an event is caused by something which happened previously, but no longer has any visible effect on the current state. Second, since the interpretations I(s) grow monotonically throughout a branch, i.e., no atoms are ever removed from such an interpretation, it also means that we actually cannot even describe events whose effects manifest themselves in some state, but then disappear again in a future state. The following example illustrates these limitations of the view that an interpretation I(s) represents precisely the state of the domain at node s.

Example 4
In 10% of the cases, pneumonia causes permanent lung damage, which persists after the pneumonia itself has disappeared. Let us also assume that the probability of getting pneumonia is 0.3. One attempt to model this is as follows:

  (LungDamage : 0.1) ← Pneumonia.   (12)
  (Pneumonia : 0.3).   (13)

The problem with this theory is that, under the natural interpretation of the predicates, it violates the assumptions made by CP-logic: after pneumonia has been caused and has in turn led to permanent lung damage, it might go away again. As such, it is no surprise that, according to the formal semantics of this theory, the probability of a patient having permanent lung damage and no pneumonia is zero, while in reality, this situation is perfectly possible. There is however a simple solution to this problem, at least if we are prepared to refine our informal interpretation of the atom Pneumonia. Instead of interpreting this atom as representing the real-world property that "the patient has pneumonia", we can also interpret it as representing the property that "at some point in time, the patient has had pneumonia".
It is obvious that this now is a property that, once initiated, will forever persist. The CP-law (12) now reads as: "if the patient has, at some point, had pneumonia, then this causes him to have lung damage with probability 0.1." According to this reading, it is now only the case that it is impossible for a patient to have lung damage if he has not at some point in time had pneumonia, which is of course a conclusion that should follow from our problem statement. To fix this example, we had to subtly change the correspondence between the states of the formal execution model and the states of the real world: whereas previously, each of our formal states precisely matched one state of the real world, it is now the case that a formal state actually represents both the state of the world at some particular time and also certain information about the history of the world up to that time. Taking this idea further actually allows us to describe processes in considerable temporal detail, as the following example illustrates.

Example 5
A patient is admitted to hospital with pneumonia and stays there for a number of days. Each day, the pneumonia might cause him to suffer chest pain on that particular day with probability 0.6. With probability 0.8, a patient who has pneumonia on one day still has pneumonia the next day.

On the one hand, this example describes a progression through a sequence of days. On the other hand, it also describes events that take place entirely during one particular day. In general, a process of this kind will look something like Figure 5: the global structure of the process is a succession between different time points and, at each particular time point, a local process might take place. The question now is how to model such a succession of states in CP-logic.
A first important observation is that we now need to distinguish between the values of properties at different time points, i.e., we can no longer represent every relevant property by a single ground atom, but instead we need a ground atom for every pair of such a property and a time point. Typically, one would construct a vocabulary by adding time as an argument to predicates, as is done in, e.g., the event calculus or situation calculus. For instance, to describe Example 5, we could construct a vocabulary which has the following ground atoms:

• Referring to day 1: {Pneumonia(1), Chestpain(1)};
• Referring to day 2: {Pneumonia(2), Chestpain(2)};
• ...
• Referring to day n: {Pneumonia(n), Chestpain(n)}.

Fig. 5. A global process as a sequence of local processes (Day 1, Day 2, ...).

Of course, it might be equally possible to use some other representation, such as Pneumonia(SecondDay) or Pneumonia2 instead of Pneumonia(2). With the above vocabulary, we can now model Example 5. We assume a fixed range 1..n of days, to ensure finiteness of the grounded theory.

  Pneumonia(1).   (14)
  ∀d (Pneumonia(d + 1) : 0.8) ← Pneumonia(d).   (15)
  ∀d (Chestpain(d) : 0.6) ← Pneumonia(d).   (16)

Here, the CP-laws described by (15) are of the kind that propagate from one time point to a later time point, whereas (16) describes a class of "instantaneous" events, taking place entirely inside of a single time point. Of course, whether a particular event is instantaneous depends greatly on which unit of time is being used: one can imagine that it makes a difference whether we measure time in seconds or in days. According to the informal description of Example 5, the intended model is the process shown in Figure 6.
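The grounding of the temporal theory (14)-(16) over a fixed range of n days can be sketched as follows; the tuple encoding of ground laws is our own illustration.

```python
# Grounding laws (14)-(16) of Example 5 for a fixed range 1..n of days;
# each ground law is encoded as (head_pairs, body_atoms), our own convention.
n = 3

def ground_theory(n):
    laws = [((("Pneumonia(1)", 1.0),), ())]                            # (14)
    for d in range(1, n):                                              # (15)
        laws.append((((f"Pneumonia({d + 1})", 0.8),), (f"Pneumonia({d})",)))
    for d in range(1, n + 1):                                          # (16)
        laws.append((((f"Chestpain({d})", 0.6),), (f"Pneumonia({d})",)))
    return laws

theory = ground_theory(n)
# one unconditional law, n - 1 propagating laws, and n instantaneous laws
```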
It can easily be seen that this is indeed an execution model of the above CP-theory. We remark that this theory also has other execution models, which do not respect the proper ordering of time points, such as, e.g., the process in which all events caused by (15) happen before those caused by (16). However, since these "wrong" processes all generate the same probability distribution as the intended process anyway, this is harmless. We also observe that, again, the correspondence between the states of the execution model and the states of the real world is less direct than it was in the examples of Section 3.2. Indeed, now, a state of an execution model contains a trace of the entire evolution of the real world until a certain point in time. As such, a leaf of the execution model now represents a complete history of the world, whereas in the examples of Section 3.2, it only represented the final state of the process.

Let us now make the above discussion more formal. We assume that, when constructing the vocabulary Σ, we had in mind some function λ from the Herbrand base of Σ to an interval [0..n] ⊆ N, such that, in our desired interpretation of this vocabulary, each atom p refers to the state of some property at time point λ(p). We call such a function a timing and λ(p) the time of atom p.

Fig. 6. Initial segment of the intended model of Example 5. [Tree not reproduced; it is built from the ground events E1: Pn(1):1, E2: (Cp(1):0.6) ← Pn(1), E3: (Pn(2):0.8) ← Pn(1), E4: (Cp(2):0.6) ← Pn(2), E5: (Pn(3):0.8) ← Pn(2), ..., fired in this order along each branch.]

In the typical case of predicates containing an explicit temporal argument, such a timing would simply map atoms onto this argument; for instance, in the case of the above example, we had in mind the following timing λ:

• For each ground atom Pneumonia(i), λ(Pneumonia(i)) = i;
• For each ground atom Chestpain(i), λ(Chestpain(i)) = i.

If we now look again at the CP-laws we wrote for this example, we observe that, whenever there is an atom in the head of a CP-law r that refers to the truth of some property at time i and an atom in the body of r that refers to the truth of some property at time j, it is the case that i ≥ j. This is of course not a coincidence. Indeed, because, in the real world, causes precede effects, it should be impossible that the cause-effect propagation described by a CP-law goes backwards in time. Note that it is also possible that i = j; in this case, the CP-law is instantaneous w.r.t. the granularity of time that is being used, i.e., it describes one of those events (such as (16) in Example 5) that takes place entirely within a single time point. Another, perhaps more illustrative, example of an instantaneous CP-law is the statement that an increase in the current flowing through a resistor causes an increase in the voltage drop across it. Here, the increased current conceptually precedes the increased voltage drop, but we would never expect to actually observe a temporal delay.

Definition 8 (respecting a timing)
Let Σ be a vocabulary.
A CP-theory C respects a timing λ iff, for every r ∈ C, if h ∈ headAt(r) and b ∈ bodyAt(r), then λ(h) ≥ λ(b).

Such a timing λ also contains information about when events might happen. To be more concrete, if a CP-law r fires at time point i, then we would expect i to lie somewhere between the maximum of all λ(b) for which b ∈ bodyAt(r), and the minimum of all λ(h) for which h ∈ headAt(r). For a rule r, we write t_λ(r) to denote this interval, i.e.,

  t_λ(r) = [max_{b ∈ bodyAt(r)} λ(b), min_{h ∈ headAt(r)} λ(h)].

Now, if we are constructing a CP-theory with a particular timing λ in mind, then the process we are trying to model should be such that every CP-law r that actually fires does so at some time point κ(r) ∈ t_λ(r). We will call such a mapping κ from rules r ∈ C to time points κ(r) ∈ t_λ(r) an event-timing of λ. We remark that if a CP-law r is instantaneous, then the interval t_λ(r) will consist of a single time point and it is indeed clearly at this time point that the CP-law should fire. A timing λ therefore imposes the following constraint on which processes can be considered reasonable.

Definition 9 (following a timing)
Let Σ be a vocabulary with timing λ and let C be a CP-theory that respects λ. A probabilistic Σ-process T follows λ if there exists an event-timing κ of λ such that the CP-laws of T fire in the order imposed by κ, i.e., if for all successors s′ of a node s, κ(E(s′)) ≥ κ(E(s)).

It can now be shown that for any timing λ and any CP-theory C respecting λ, C will have an execution model that follows λ.

Theorem 3
Let C be a CP-theory respecting a timing λ. There exists an execution model T of C, such that T follows λ.

Proof
Proof of this theorem can be found in Section A.3.
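The interval t_λ(r) used in Definition 9 can be computed directly from the timing. The helper below does so for the vocabulary of Example 5; the regex extraction of the temporal argument is our own encoding, not part of the formalism.

```python
import re

# The timing lambda of Example 5, and the interval t_lambda(r) defined above;
# the regex extraction of the temporal argument is our own encoding.
def time_of(atom):
    """lambda: maps Pneumonia(i) and Chestpain(i) to their day i."""
    return int(re.search(r"\((\d+)\)", atom).group(1))

def t_interval(head_atoms, body_atoms):
    """t_lambda(r) = [max over bodyAt(r) of lambda(b), min over headAt(r) of lambda(h)]."""
    return (max(map(time_of, body_atoms)), min(map(time_of, head_atoms)))

# Law (15) with d = 1 fires somewhere between day 1 and day 2:
assert t_interval(["Pneumonia(2)"], ["Pneumonia(1)"]) == (1, 2)
# Law (16) with d = 1 is instantaneous: its interval is the single point 1:
assert t_interval(["Chestpain(1)"], ["Pneumonia(1)"]) == (1, 1)
```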
Theorem 3 shows that if we construct a CP-theory C with a particular timing in mind, then C will have an execution model in which the events happen in precisely the order dictated by this timing. Therefore, the modeling methodology that we have suggested in this section is indeed valid. In the case of Example 5, the process shown in Figure 6 is an execution model that follows the timing λ specified above. In the sequel, we will refer to CP-theories, for whose vocabulary we have some intended timing in mind, as temporal CP-theories; other CP-theories will be called atemporal.

5 CP-logic with negation

So far, we have only allowed positive formulas as preconditions of CP-laws. In this section, we examine whether it is possible to relax this requirement. We first look at a small example.

Fig. 7. Two processes for Example 6: (a) the sequence (18)-(17), in which (18) fires in the root {Pn} and (17) fires only in the branch where Treatment was not caused; (b) the sequence (17)-(18), in which (17) fires in the root.

Example 6
Having pneumonia causes a patient to receive treatment with probability 0.95. Untreated pneumonia causes fever with probability 0.7.

  (Fever : 0.7) ← Pneumonia ∧ ¬Treatment.   (17)
  (Treatment : 0.95) ← Pneumonia.   (18)

Figure 7 shows two processes for this example that satisfy all the requirements that we previously imposed for positive theories. It is obvious, however, that in this case the final outcome is affected by the order in which events occur.
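The order-dependence can be made explicit by computing P(Fever) under the two firing orders of Fig. 7; the arithmetic below simply follows the two trees in the figure.

```python
# P(Fever) under the two firing orders of Fig. 7 for Example 6; this makes
# explicit why a naive treatment of negation would break Theorem 2.
def p_fever_order_18_17():
    # Process 7(a): (18) fires first; (17) can only fire in the branch
    # (probability 0.05) where no treatment was caused.
    return 0.05 * 0.7

def p_fever_order_17_18():
    # Process 7(b): (17) fires in the root, where ~Treatment still holds.
    return 0.7

# The two orders yield different probabilities for the same final property.
assert abs(p_fever_order_18_17() - 0.035) < 1e-9
assert p_fever_order_17_18() == 0.7
```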
So, simply including negation in this naive way would give rise to ambiguities, causing our desirable uniqueness property (Theorem 2) to be lost. Giving up the uniqueness property would have grave consequences for the logic and its practical use. One radical solution to the problem might be to force the user to not only specify causal probabilistic events, but also information about the order in which these events can happen. However, such information is difficult to obtain and represent; moreover, in many cases, it would just be useless overhead: indeed, as we have already seen, one of the most interesting features of CP-logic without negation is precisely the fact that we can obtain a complete probability distribution without requiring such information. The solution that we will adopt instead is to restrict the class of processes associated to a CP-theory in such a way that the uniqueness property is preserved, i.e., all processes from this restricted class generate the same distribution over the final states.

To introduce the additional constraint that will be imposed on execution models, let us take a closer look at the above example. We observe that, in process 7(b), event (17) is caused at a moment when its precondition is not yet in its final state. In particular, when (17) happens in the initial state, its precondition ¬Treatment holds, but later on, for instance in the leftmost branch, event (18) causes Treatment, thereby falsifying this precondition. So, in the final state of this branch, we see that Fever holds, while the precondition of the CP-law that caused it no longer holds. In light of this discussion, we can now explain the additional assumption that CP-logic makes about the causal processes. This assumption, called the temporal precedence assumption, is that a CP-law r will not fire until its precondition is in its final state.
More precisely, it cannot fire until the part of the process that determines whether its precondition holds is fully finished. For Example 6, it is clear that only the process in Figure 7(a) satisfies this assumption, and so, in this case, the ambiguity has been resolved. We stress here that temporal precedence is nothing more than an assumption: inherently, there is nothing wrong with the causal process in Figure 7(b), and we could in fact easily imagine that, because fever is one of the earliest symptoms of pneumonia, this process is actually a better model of the real world than that in Figure 7(a). So why do we choose to eliminate precisely these processes, in order to regain our uniqueness result? To explain our motivation for this, we need to go back to the analysis of Section 4. There, we considered timed vocabularies, in which ground atoms are intended to represent properties at some particular point in time. We then proved that each temporal theory without negation has an execution model that follows its timing, i.e., in which events happen in the right order. As we remarked, such a theory may also have other execution models, in which events happen in the wrong order, but this is not a problem, because all execution models of a positive theory generate the same probability distribution anyway. For theories with negation, however, the situation is more complicated. In that case, we can have three different kinds of execution models: those which follow the timing; those which do not follow the timing, but nevertheless generate the same probability distribution as the ones that do; and those which do not follow the timing and also generate a different probability distribution. The right way to resolve the ambiguity for these theories is obviously to reject this last kind of execution model.
As we will prove in Section 6.3, temporal precedence will do precisely this, at least if the CP-laws containing negation are not instantaneous. Intuitively, this can be explained as follows. For such a non-instantaneous CP-law, the timings of the atoms in its precondition are strictly earlier than those of its effects. Therefore, in a process which follows this timing, all events which cause one of these atoms must happen before the CP-law itself fires. This is now precisely what temporal precedence assumes. The following example is a variant, or rather a refinement, of Example 6, which illustrates this.

Example 7
A patient enters the hospital, possibly suffering from pneumonia. At this time, he will be examined by a physician, who will decide to treat the patient with probability 0.95 if he actually has pneumonia. If the patient has pneumonia but the doctor does not treat him, there is a probability of 0.7 that the patient will exhibit a fever by the next morning. We introduce the following propositions:

• Pneumonia: "the patient has pneumonia when entering the hospital";
• Treatment: "the patient is treated upon admission";
• Fever: "the patient has a fever the next morning".

Under this interpretation of our vocabulary, the CP-theory of Example 6 respects the timing and correctly models the example. Clearly, the intended model of the theory is now that of Figure 7(a): in this model, CP-laws fire in the right temporal order. The process of Figure 7(b), on the other hand, goes against the flow of time ("treatment upon admission" is only caused after "fever the next morning"), which should be impossible.
So, as this example illustrates, if our CP-theory respects some intended timing such that the CP-laws containing negation are non-instantaneous, the temporal precedence assumption will resolve the ambiguity in the right way, i.e., by selecting precisely those processes that follow the intended timing. We will now formally define temporal precedence and prove afterwards, in Section 6.3, that this property holds in general.

We start by introducing some mathematical machinery. The basic idea is that a CP-law should only fire after all events that might still affect the truth of its precondition have already happened, i.e., this precondition should not merely be currently true, but should in fact already be guaranteed to also remain true in all potential future states. This naturally leads to a three-valued logic, where we have truth values t (guaranteed to remain true), f (guaranteed to remain false), and u (still subject to change). Recall that a three-valued interpretation ν is a mapping from the ground atoms of our vocabulary to the set of truth values {t, f, u}, which induces for each formula φ a truth value φ^ν. Now, if our probabilistic process is in a state s, then the atoms of which we are already sure that they are true are precisely those in I(s). To figure out which atoms are still unknown, we need to look at which CP-laws might still fire, i.e., at those rules r for which body(r)^ν ≠ f. Whenever we find such a rule, we know that the atoms in head(r) might all still be caused and, as such, they must be at least unknown. We will now look at a derivation sequence, in which we start by assuming that everything that is currently not t is f and then gradually build up the set of unknown atoms by applying this principle.
Definition 10 (hypothetical derivation sequence)
A hypothetical derivation sequence in a node s is a sequence (ν_i)_{0≤i≤n} of three-valued interpretations that satisfies the following properties. Initially, ν_0 assigns f to all atoms not in I(s). For each i < n, there must be a rule r with body(r)^{ν_i} ≠ f, such that, for all p ∈ headAt(r) with ν_i(p) = f, it is the case that ν_{i+1}(p) = u, while for all other atoms p, ν_{i+1}(p) = ν_i(p). Such a sequence is terminal if it cannot be extended.

A crucial property is now that all such sequences reach the same limit.

Theorem 4
Every terminal hypothetical derivation sequence reaches the same limit, i.e., if (ν_i)_{0≤i≤n} and (ν′_i)_{0≤i≤m} are such sequences, then ν_n = ν′_m.

Proof
The proof of this theorem is given in Section A.1.

For a state s in a probabilistic process, we will denote this unique limit as ν_s and refer to it as the potential in s. Such a ν_s now provides us with an estimate of which atoms might still be caused, given that we are already in state s. We can now tell whether the part of the process that determines the truth of a formula φ has already finished by looking at ν_s; indeed, we can consider this process to be finished iff φ^{ν_s} ≠ u. We now extend the concept of an execution model to arbitrary CP-theories as follows.

Definition 11 (execution model—general case)
Let C be a CP-theory in vocabulary Σ, T a probabilistic Σ-process, and X an interpretation of the exogenous predicates of C. T is an execution model of C in context X iff:

• T satisfies the conditions of Definition 5 (execution model—positive case);
• for every node s, body(E(s))^{ν_s} ≠ u, with ν_s the potential in s.
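Definition 10 lends itself to a direct fixpoint computation. The following sketch (ours, not the authors' code; rule bodies are restricted to conjunctions of literals, and the atom and rule names are taken from Example 6) computes the common limit guaranteed by Theorem 4:

```python
# Fixpoint computation of the potential ν_s of Definition 10 (a sketch;
# rule bodies are conjunctions of literals, atoms are those of Example 6).
T, F, U = "t", "f", "u"

def eval_lit(lit, nu):
    atom, positive = lit
    value = nu[atom]
    return value if positive else {T: F, F: T, U: U}[value]

def eval_body(body, nu):
    # three-valued conjunction: f dominates, then u, otherwise t
    values = [eval_lit(lit, nu) for lit in body]
    if F in values:
        return F
    return U if U in values else T

def potential(atoms, rules, true_now):
    """Common limit of every terminal hypothetical derivation sequence."""
    nu = {a: (T if a in true_now else F) for a in atoms}
    changed = True
    while changed:
        changed = False
        for heads, body in rules:
            if eval_body(body, nu) != F:      # this rule might still fire,
                for h in heads:
                    if nu[h] == F:            # so its head atoms are at least u
                        nu[h] = U
                        changed = True
    return nu

# Example 6 at the root, where I(⊥) = {Pneumonia}:
rules = [
    (["Treatment"], [("Pneumonia", True)]),                    # treatment law
    (["Fever"], [("Treatment", False), ("Pneumonia", True)]),  # fever law
]
nu = potential(["Pneumonia", "Treatment", "Fever"], rules, {"Pneumonia"})
```

Only the treatment law's body evaluates to t under the resulting ν; the fever law's body is still u, so only the treatment event may fire initially, matching the terminal derivation sequence computed for Example 6 in the text.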
From now on, we will refer to a probabilistic Σ-process that satisfies the original conditions of Definition 5, but not necessarily the additional condition imposed above, as a weak execution model.

In the case of Example 6, this definition indeed gives us the result described above, i.e., the process in Figure 7(a) is an execution model of the example, while the one in Figure 7(b) is not. Indeed, if we look at the root ⊥ of this tree, with I(⊥) = {Pneumonia}, we see that we can construct the following terminal hypothetical derivation sequence:

• ν_0 assigns f to Treatment and Fever;
• ν_1 assigns u to Treatment;
• ν_2 assigns u to Fever, because (¬Treatment ∧ Pneumonia)^{ν_1} = u.

As such, the only CP-law that can initially fire is the one by which the patient might receive treatment. Afterwards, in every descendant s of ⊥, ν_s(Treatment) will be either t or f. In the branch where it is f, the event by which the patient gets fever because of untreated pneumonia will subsequently happen.

The temporal precedence assumption imposes a constraint on the order in which CP-laws can fire and, hence, on the order in which atoms can be caused to become true. In the case of Example 6, this order is fixed and can easily be derived from the syntactic structure of the CP-theory. This is not always the case. As the following example illustrates, the order of events may depend on the context in which they happen.

Example 8
A software system consists of two servers that provide identical services. One server acts as master and the other as slave, and these roles are assigned randomly. Clients can request services. The master makes a selection among these requests and the slave fulfills the requests that are not accepted by the master.

(Master(S1) : 0.5) ∨ (Slave(S1) : 0.5).   (19)
Master(S2) ← Slave(S1).   (20)
Slave(S2) ← Master(S1).   (21)
∀x ∀s (Accepts(x, s) : 0.6) ← Application(s) ∧ Master(x).   (22)
∀x ∀s Accepts(x, s) ← Application(s) ∧ Slave(x) ∧ Master(y) ∧ ¬Accepts(y, s).   (23)

In all causal processes that satisfy the temporal precedence assumption, the master accepts services before the slave does. However, because who is slave and who is master depends on the result of event (19), this means that we cannot say upfront which of the atoms Accepts(S1, s) and Accepts(S2, s) will be caused first. This shows that the temporal precedence assumption induces a context-dependent stratification on both events and atoms.

The temporal precedence assumption is correct for many theories (including, as we will prove later, all those temporal theories in which CP-laws containing negation are not instantaneous), but not for all.

Example 9
We consider a variant of the problem of Example 8 in which the slave does not have to wait for the decision of the master, but is allowed to accept any request provided it has not yet been accepted by the master. It is then possible that first the slave and later the master accept the same request, in which case the service is provided two times. Unlike Example 8, this specification is an incomplete description of a probability distribution. Indeed, the probability of a request being handled by the slave now also depends on the probability of the slave reaching a decision before the master does, which is not specified. If we try to model this example with the same CP-theory as Example 8, the temporal precedence assumption would make one particular assumption about the relative speed of the two servers, namely, that the slave is always slower than the master.
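For a single application s, the distribution defined by the theory of Example 8 can be checked by brute-force enumeration. The following sketch (ours, not part of the paper) follows the temporal-precedence reading: roles are assigned first, then the master's acceptance event fires, and law (23) deterministically lets the slave pick up a rejected request:

```python
# Brute-force enumeration of the distribution that Example 8 defines
# for one application s under temporal precedence.
def worlds():
    # law (19): the roles are assigned randomly; (20)/(21) fix the other role
    for master, p_role in [("S1", 0.5), ("S2", 0.5)]:
        slave = "S2" if master == "S1" else "S1"
        # law (22): the master accepts the request with probability 0.6
        for master_accepts, p_accept in [(True, 0.6), (False, 0.4)]:
            # law (23): the slave deterministically picks up a rejected request
            handler = master if master_accepts else slave
            yield handler, p_role * p_accept

dist = {}
for handler, p in worlds():
    dist[handler] = dist.get(handler, 0.0) + p
# by symmetry, each server ends up handling the request with probability
# 0.5 * 0.6 (as master) + 0.5 * 0.4 (as slave) = 0.5
```

The enumeration makes explicit that, although the firing order of the acceptance events depends on the role assignment, the resulting distribution is uniquely determined, as Theorem 6 guarantees.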
If we want to model some other distribution, where the slave is sometimes faster than the master, we have to use a different representation style, which allows such information to be incorporated. This will be discussed later in Example 13. What this discussion illustrates is that, ultimately, it is the responsibility of the user to design his CP-theory in such a way that the intended causal processes satisfy the temporal precedence assumption.

6 Discussion

We now check whether the way in which the previous section has extended the concept of an execution model to cope with negation indeed satisfies the goals that we originally stated.

6.1 The case of positive theories

First of all, we remark that, for positive CP-theories, the new definition (Def. 11) simply coincides with the original one (Def. 5), i.e., for positive theories, there is no difference between execution models and weak execution models. Indeed, because, according to our original definition, it must be the case that I(s) ⊨ body(E(s)) for each non-leaf node s, this is an immediate consequence of the following theorem, which follows trivially from the fact that, throughout a hypothetical derivation sequence, the truth of an atom p can only increase.

Theorem 5
Let s be a node in a probabilistic Σ-process. For any positive formula φ, if I(s) ⊨ φ, then φ^{ν_s} = t.

We conclude that, for positive CP-theories, the new definition is simply equivalent to the old one.

6.2 Uniqueness theorem regained

Second, the uniqueness theorem now indeed extends beyond positive theories.

Theorem 6 (Uniqueness—general case)
Let C be a CP-theory and X an interpretation of the exogenous predicates of C. If T and T′ are execution models of C in context X, i.e., T ⊨_X C and T′ ⊨_X C, then π_T = π_T′.
Proof
The proof of this theorem is given in Section A.1.

6.3 Correctness of temporal precedence in temporal CP-theories

In the previous section, we introduced the temporal precedence assumption as a way of solving an ambiguity problem, namely the fact that different weak execution models of a CP-theory with negation might produce different probability distributions. We showed that this assumption was satisfied in the temporal CP-theory of Example 7. We now prove that the temporal precedence assumption is satisfied in a broad class of CP-theories, namely, in all temporal CP-theories in which events containing negation are non-instantaneous. To be more concrete, we will show that if a weak execution model follows the timing of the vocabulary, it also satisfies the temporal precedence assumption.

Our first step is to refine our notion of a theory respecting a timing (Definition 8), to make a distinction between those atoms from some bodyAt(r) that appear only in a positive context and those which occur at least once in a negative context. The set of all the latter atoms will be denoted as body−At(r), whereas that of all the former ones is body+At(r). Formally, we define, for all sentences φ, the sets At+(φ) and At−(φ) by simultaneous induction as follows:

• for a ground atom p(t), At−(p(t)) = {} and At+(p(t)) = {p(t)};
• for φ ◦ ψ, with ◦ either ∨ or ∧, At+(φ ◦ ψ) = At+(φ) ∪ At+(ψ) and At−(φ ◦ ψ) = At−(φ) ∪ At−(ψ);
• for ¬φ, At+(¬φ) = At−(φ) and At−(¬φ) = At+(φ);
• for Θx φ, with Θ either ∀ or ∃, At+(Θx φ) = ∪_{t ∈ HU(Σ)} At+(φ[x/t]) and At−(Θx φ) = ∪_{t ∈ HU(Σ)} At−(φ[x/t]), where HU(Σ) is the Herbrand universe.

We can then define body−At(r) = At−(body(r)) and body+At(r) = bodyAt(r) \ body−At(r).

Definition 12 (strictly respecting a timing)
A CP-theory C (with negation) strictly respects a timing λ if, for all ground atoms h and b:

• if there is a CP-law r with h ∈ headAt(r) and b ∈ body+At(r), then λ(h) ≥ λ(b);
• if there is a CP-law r with h ∈ headAt(r) and b ∈ body−At(r), then λ(h) > λ(b).

Notice that we impose a stronger condition on negative conditions than on positive ones: the times of negative conditions should be strictly less than those of any caused atom. This condition entails that CP-laws with negation are not instantaneous.

Theorem 7
Let C be a CP-theory which strictly respects a timing λ. Every weak execution model of C that follows λ also satisfies temporal precedence and is, therefore, an execution model of C. Moreover, such a process always exists.

Proof
The proof of this theorem is given in Section A.3.

Intuitively, the theorem states that any causal process of C that is physically possible (i.e., in which no event is caused by conditions that arise only in the future) automatically satisfies temporal precedence. Hence, in the context of CP-theories that strictly respect some intended timing, the temporal precedence assumption applies naturally.

Example 10
We consider a time line divided into a number of different time slots, as illustrated in Figure 8. In the first time slot, a client sends a request to a server. If the server receives a request, then with probability 0.5 he accepts it and sends a reply, all within the same time slot as that in which he received the request. If the client has sent a request and has not received a reply at the end of the time slot, he will repeat his request. A message that is sent has a probability of 0.8 of reaching the recipient in the same time slot as it was sent; with probability 0.1, it reaches the recipient only in the next slot; with the remaining probability of 0.1, it will be lost.

Fig. 8. A division into time slots (time slot 1, time slot 2, ...).

(Send(Client, Req, Server, 1) : 0.7).   (24)
∀t (Accept(t) : 0.5) ∨ (Reject(t) : 0.5) ← Recvs(Server, Req, t).   (25)
∀t Send(Server, Answer, Client, t) ← Accept(t).   (26)
∀t, s, r, m (Recvs(r, m, t) : 0.8) ∨ (Recvs(r, m, t + 1) : 0.1) ← Send(s, m, r, t).   (27)
∀t Send(Client, Req, Server, t + 1) ← Send(Client, Req, Server, t) ∧ ¬Recvs(Client, Answer, t).   (28)

In this CP-theory, (24), (25) and (26) are all instantaneous; the events described by (27) might either take place within one time slot or constitute a propagation to a later time slot, depending on which of the possible effects actually occurs; finally, the events described by (28) all propagate to a later time slot. Because these last events are the only ones in which negation occurs, this theory strictly respects its intended timing, and the theorem shows that the semantics gives the intended result.

In summary, a sensible temporal CP-theory should respect its timing. If it strictly respects this timing, that is, if the timing is fine-grained enough to make CP-laws with negation non-instantaneous, then all of its weak execution models will automatically satisfy temporal precedence as well. Otherwise, there may be weak execution models that do not satisfy temporal precedence and that will, hence, be ruled out.

6.4 Validity of a CP-theory

Not all CP-theories have an execution model. Let us illustrate this by the following example.

Example 11
A game is being played between two players, called White and Black.
If White does not win, this causes Black to win, and if Black does not win, this causes White to win.

Win(White) ← ¬Win(Black).   (29)
Win(Black) ← ¬Win(White).   (30)

This theory has two weak execution models: one in which (29) fires first and White wins with probability 1, and one in which (30) fires first and Black wins with probability 1. However, both of these weak execution models are rejected by the temporal precedence assumption. Indeed, in each of these weak execution models, it is the case that, for the root ⊥, (¬Win(White))^{ν_⊥} = u = (¬Win(Black))^{ν_⊥}, so neither of the two events can happen. So, this is an example of an ambiguity that cannot be resolved by assuming temporal precedence. In order to make a sensible CP-theory out of this example, we would have to add additional information about the probability that one CP-law fires before the other. As will be illustrated in Example 13, such information can be modeled in CP-logic, but requires a different representation style in which temporal arguments are added to the predicates.

Theories which have no execution models are obviously not of interest. This motivates the following definition.

Definition 13 (valid CP-theories)
A CP-theory C is valid in an interpretation X for its exogenous predicates if it has at least one execution model in context X. If C is valid in all contexts X, we simply say that C is valid.

Clearly, it is only if C is a valid CP-theory that we can associate a probability distribution π_C to it. The theories of Example 6 and Example 8 are valid. The above discussion raises the question of how to recognize whether a theory is valid. We now propose a simple syntactic criterion that guarantees this.
Definition 14 (stratified CP-theories)
A CP-theory C is stratified if there exists a function λ from the set of its atoms to an interval [0..n] such that C strictly respects λ.

Here, it is possible that the function λ is a timing such as in Section 6.3, but this is not necessary; e.g., it might be the case that λ assigns different natural numbers to atoms that, conceptually, in their intended interpretation, are supposed to refer to the same time points. The following corollary of Theorem 7 is of relevance both to temporal and atemporal CP-theories.

Corollary 1
Each stratified theory C has an execution model.

We remark that, in particular, all positive theories are stratified, because, for such a theory, we can simply assign 0 to all ground atoms. An example of a stratified theory containing negation is given in Example 10. The theory of Example 8 is not stratified, because the atoms Accepts(S1, x) and Accepts(S2, x) cannot be ordered in time, since the times at which they are made true depend on who is the master. This is an example of a valid but unstratified CP-theory. We therefore conclude that the existence of a stratification is a sufficient condition for the existence of an execution model (and hence for the theory actually defining a probability distribution), but not a necessary one.

6.5 The representation of time in CP-logic

In the preceding sections, we have encountered two quite different styles of knowledge representation: temporal theories explicitly include time, while atemporal theories abstract away from it. There may be several reasons for making time explicit. One obvious reason is that we are actually interested in the intermediate states of the process. Another is that the causal processes in a domain are simply too complex to model without explicit time.
Below, we illustrate two such cases.

In CP-logic, each atom starts out as false and might become true during the process; moreover, if at some point an atom becomes true, it will remain true. In applications where the obvious relevant properties of the domain of interest do not behave like this, we cannot simply represent them by atoms in our CP-theory. As already mentioned in Section 4, this problem can typically be solved by explicitly including time in the representation. The following example illustrates how this methodology can be used to handle domains in which there are causes for both a property and its negation.

Example 12
Consider the following variant of Example 2, in which a doctor can now administer a medicine to suppress chest pain with probability 0.9.

Pneumonia(1).   (31)
∀d (Pneumonia(d + 1) : 0.8) ← Pneumonia(d).   (32)
∀d (Chestpain(d) : 0.6) ← Pneumonia(d) ∧ ¬Suppressed(d).   (33)
∀d Medicine(d) ← Chestpain(d).   (34)
∀d (Suppressed(d + 1) : 0.9) ← Medicine(d).   (35)

In this representation, the use of negation allows the predicate Suppressed to act as a cause for not having chest pain.

We now discuss another type of application that requires time to be made explicit. As mentioned before, temporal precedence might give unintended results for theories which are not temporal or whose granularity of time is such that negation occurs in instantaneous events. In such cases, the obvious solution is to make time explicit and ensure it is fine-grained enough to make all events with negation non-instantaneous. To illustrate, we consider the following refinement of Example 9.

Example 13
We again consider the setting of Example 9, where the slave does not necessarily wait for the decision of the master before deciding whether to accept the request himself.
This might be the case, for instance, in a system where the two servers have not been properly synchronized. As we explained, for such a model to be a complete description of a probability distribution, we then also need to include information about the probability that the slave decides before the master. We will assume that, at each time point where the master has not decided yet, there is a probability of 0.2 that he will decide; for the slave, we assume that this probability is 0.8.

(Footnote 5: For most real-world events, there exists, at least in principle, some time scale that would make them non-instantaneous. For instance, even an event such as the temperature of a gas increasing when the space in which it is contained decreases only manifests itself after the molecules of the gas have travelled a certain microscopic distance, which does take a small, but in principle non-zero, amount of time. Examples of truly instantaneous events can be found in quantum mechanics (if the state of one object collapses, this instantaneously causes the collapse of the state of each entangled object) and in abstract properties defined by social convention (e.g., signing a purchase deed instantaneously makes one the owner of a house).)

(Master(S1) : 0.5) ∨ (Slave(S1) : 0.5).
Master(S2) ← Slave(S1).
Slave(S2) ← Master(S1).
∀x ∀s ∀t (Decides(x, s, t) : 0.2) ← Master(x) ∧ Application(s) ∧ ¬∃t′ (t′ < t ∧ Decides(x, s, t′)).
∀x ∀s ∀t (Accepts(x, s, t) : 0.6) ← Master(x) ∧ Decides(x, s, t).
∀x ∀s ∀t (Decides(x, s, t) : 0.8) ← Slave(x) ∧ Application(s) ∧ ¬∃t′ (t′ < t ∧ Decides(x, s, t′)).
∀x ∀s ∀t Accepts(x, s, t) ← Slave(x) ∧ Decides(x, s, t) ∧ ¬∃y ∃t′ (Master(y) ∧ t′ < t ∧ Accepts(y, s, t′)).
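Under one natural reading of these rules (our sketch, not the authors' analysis), the decision times are geometric with per-step probabilities 0.2 for the master and 0.8 for the slave, the master accepts with probability 0.6 upon deciding, and the slave accepts upon deciding unless the master has already accepted strictly earlier. A truncated exact computation of the probability that a single request is served twice is then:

```python
# Truncated exact computation of the probability of double service for one
# request, under the geometric reading of the eager-slave theory above.
HORIZON = 200  # tail probabilities beyond this horizon are negligible

def geom(p, t):
    """Probability that a per-step-p decision first happens at time t (t >= 1)."""
    return (1 - p) ** (t - 1) * p

p_double = 0.0
for t_s in range(1, HORIZON + 1):          # slave decides at t_s
    for t_m in range(1, HORIZON + 1):      # master decides at t_m
        p = geom(0.8, t_s) * geom(0.2, t_m)
        # double service: the master accepts (prob 0.6) at t_m, and the
        # slave's acceptance is not blocked, i.e. no master accept before t_s
        if t_m >= t_s:
            p_double += p * 0.6

# analytically: 0.6 * P(t_m >= t_s) = 0.6 * 0.8 / (1 - 0.16) = 4/7
```

With these numbers, the eager slave beats or ties the master so often that roughly 57% of accepted requests are served twice, which is the situation described next.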
In this CP-theory, we have introduced the predicate Decides(x, s, t) as a reification of the events by which the servers reach their decision (i.e., the events that were described by (22) and (23) in our original theory from Example 8). The meaning of this predicate is that server x makes his decision on application s at time t. The above CP-theory models the situation of an eager slave that decides on applications much faster than the master, which causes many services to be provided twice.

7 The relation to Bayesian networks

In this section, we investigate the relation between CP-logic and Bayesian networks. Before we begin, let us briefly recall the definition of a Bayesian network. Such a network consists of a directed acyclic graph and a number of probability tables. Every node n in the graph represents a random variable, which has some domain dom(n) of possible values. A network B defines a unique probability distribution π_B over the set of all possible assignments n_1 = v_1, ..., n_m = v_m of values to all of these random variables, with all v_i ∈ dom(n_i). First, this π_B must obey a probabilistic independence assumption expressed by the graph, namely, that every node n is probabilistically independent of all of its non-descendants, given its parents. This allows the probability π_B(n_1 = v_1, ..., n_m = v_m) of such an assignment of values to all random variables to be rewritten as a product of conditional probabilities ∏_i π_B(n_i = v_i | pa(n_i) = w_i), where each pa(n_i) is the tuple of all parents of n_i in the graph and w_i is the tuple of values that the assignment gives to these parents. The probability tables associated to the network now specify precisely all of these conditional probabilities π_B(n_i = v_i | pa(n_i) = w_i).
The second condition imposed on π_B is then simply that all of these conditional probabilities must match the corresponding entries in these tables. It can be shown that this indeed suffices to uniquely characterize a single distribution.

Most commonly, Bayesian networks are constructed without any explicit references to time, since this tends to produce the simplest models. However, in some cases such a representation does not suffice; then, one typically uses a so-called dynamic Bayesian network (Ghahramani 1998), which makes time explicit in much the same way as can be done in CP-logic.

Like CP-logic, Bayesian networks are a formal language that can be used to represent causal relations in a domain. This is done by choosing as the parents of a node x all nodes y for which it is the case that the value of y has a direct effect on the value of x. The values in the conditional probability table for x then quantify the joint effect that all of these parents together have on x. Bayesian networks constructed in this way are usually called causal networks, to distinguish them from "non-causal" networks which do not necessarily follow the direction of causal relations. Causal Bayesian networks are more informative than non-causal ones: not only do they define a probability distribution, but they also specify what will happen when an external action intervenes with the normal operation of the causal mechanisms they describe (Pearl 2000).

In this section, we will compare causal Bayesian networks to CP-logic. We first show that, because Bayesian networks can easily be unfolded into probability trees (Shafer 1996), they can be mapped to CP-logic in a straightforward way. We then discuss how CP-logic differs from Bayesian networks. There are essentially three main differences.
First, our representation is more fine-grained and modular, in the sense that a single probabilistic causal law can express the effect that some of the "parents" of an atom have on it, regardless of the effect of others. Second, it is more qualitative, since we can use first-order formulas to specify in which circumstances the "parents" will have a certain effect on the child, whereas Bayesian networks encode such information in probability tables. Finally, it is also more general, in the sense that it can directly represent cyclic causal relations, which a Bayesian network cannot. We remark that these comparisons consider only the "vanilla" way of writing down Bayesian networks, i.e., as a drawing of a directed acyclic graph accompanied by tables of numbers. A large number of alternative notations exist in the literature, e.g., (Comley and Dowe 2003). These provide more elegant ways of handling all but one of the "shortcomings" of Bayesian networks that we will mention, the exception being, to the best of our knowledge, their inability to directly represent cyclic causal relations. After this discussion of representation issues, Section 7.5 will discuss interventions in causal Bayesian networks and show that the semantics of CP-logic induces a natural counterpart to this notion.

7.1 Bayesian networks in CP-logic

As also mentioned in Shafer's book, a Bayesian network can be seen as a description of a class of probability trees. We first make this more precise. To make it easier to compare to CP-logic later on, we will start by introducing a logical vocabulary for describing a Bayesian network.

Definition 15
Let B be a Bayesian network. The vocabulary Σ_B consists of a predicate symbol P_n for each node n of B and a constant C_v for each value v in the domain of n.
Fig. 9. Bayesian network for the sprinkler example: the nodes sprinkler and rain are the parents of the node wet, with P(sprinkler) = 0.2, P(rain) = 0.4, and the following conditional probability table for wet:

wet        | rain | ¬rain
sprinkler  | 0.99 | 0.8
¬sprinkler | 0.9  | 0

Fig. 10. Process corresponding to the sprinkler Bayesian network.

Now, we want to relate a Bayesian network B to a class of Σ_B-processes. Intuitively, we are interested in those processes where the flow of events follows the structure of the graph and every event propagates the values of the parents of a node to this node itself. We illustrate this by the following famous example.

Example 14 (Sprinkler)
The grass can be wet because it has rained or because the sprinkler was on. The probability of the sprinkler causing the grass to be wet is 0.8; the probability of rain causing the grass to be wet is 0.9; and the probability of the grass being wet if both the sprinkler is on and it is raining is 0.99. The a priori probability of rain is 0.4 and that of the sprinkler having been on is 0.2.

The Bayesian network formalization of this example can be seen in Figure 9. Figure 10 shows a process that corresponds to this network. Here, we have exploited the fact that all random variables of the Bayesian network are boolean, by representing every random variable by a single atom, i.e., writing for instance Wet and ¬Wet instead of Wet(True) and Wet(False).
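The factorization described above can be checked mechanically for this network. The sketch below (ours; the variable names are illustrative) enumerates all assignments and multiplies the appropriate CPT entries, which is exactly how the leaf probabilities of the probability tree of Figure 10 arise:

```python
# Joint distribution of the sprinkler network of Figure 9, computed via the
# factorization π_B(n1 = v1, ...) = Π_i π_B(ni = vi | pa(ni)).
from itertools import product

P_SPRINKLER = 0.2
P_RAIN = 0.4
# CPT for wet, indexed by (sprinkler, rain); values taken from Figure 9
P_WET = {(True, True): 0.99, (True, False): 0.8,
         (False, True): 0.9, (False, False): 0.0}

def joint(sprinkler, rain, wet):
    """Product of the three conditional probabilities for one assignment."""
    p = P_SPRINKLER if sprinkler else 1 - P_SPRINKLER
    p *= P_RAIN if rain else 1 - P_RAIN
    p_w = P_WET[(sprinkler, rain)]
    return p * (p_w if wet else 1 - p_w)

# the eight leaf probabilities sum to 1; the marginal of Wet is
# 0.2*0.4*0.99 + 0.2*0.6*0.8 + 0.8*0.4*0.9 + 0.8*0.6*0 = 0.4632
total = sum(joint(s, r, w) for s, r, w in product([True, False], repeat=3))
p_wet = sum(joint(s, r, True) for s, r in product([True, False], repeat=2))
```

Each term in the sums corresponds to one leaf of a B-process in the sense of the definition that follows.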
Formally, we define the following class of processes for a Bayesian network.

Definition 16
Let B be a Bayesian network. A B-process is a probabilistic Σ_B-process T for which there exists a mapping N from nodes of T to nodes of B, such that the following conditions are satisfied. For every branch of T, N is a one-to-one mapping between the nodes on this branch and the nodes of B, which is order preserving, in the sense that, for all s, s′ on this branch, if N(s) is an ancestor of N(s′) in B, then s must be an ancestor of s′ in T. If N(s) is a node n with domain {v_1, ..., v_k} and parents p_1, ..., p_m in B, then the children of s in T are nodes s_1, ..., s_k, for which:
• I(s_i) = I(s) ∪ {P_n(C_{v_i})};
• The edge from s to s_i is labeled with the entry in the table for n that gives the conditional probability of n = v_i given p_1 = w_1, ..., p_m = w_m, where each w_i is the unique value from the domain of p_i for which P_{p_i}(C_{w_i}) ∈ I(s).

It should be clear that every leaf s of such a B-process T describes an assignment of values to all nodes of B, i.e., every node n is assigned the unique value v for which P_n(C_v) ∈ I(s). Moreover, the probability P(s) of such a leaf is precisely the product of all the appropriate entries in the various conditional probability distributions. Therefore, the distribution π_T coincides with the distribution defined by the network B.

We now construct a CP-theory CP_B, such that the execution models of CP_B will be precisely all B-processes. We first illustrate this by showing how the Bayesian network in Figure 9 can be transformed into a CP-theory.

Example 14 (Sprinkler, cont'd)
We can derive the following CP-theory from the Bayesian network in Figure 9.
(Wet : 0.99) ← Sprinkler ∧ Rain.
(Wet : 0.8) ← Sprinkler ∧ ¬Rain.
(Wet : 0.9) ← ¬Sprinkler ∧ Rain.
(Wet : 0.0) ← ¬Sprinkler ∧ ¬Rain.
(Sprinkler : 0.2).
(Rain : 0.4).
Again, this example exploits the fact that the random variables are all boolean, by using the more readable representation of Wet and ¬Wet instead of Wet(True) and Wet(False). It should be obvious that the process in Figure 10 is an execution model of this theory and, therefore, that this theory defines the same probability distribution as the Bayesian network.

It is now easy to see that the encoding used in the above example generalizes. Concretely, for every node n with parents p_1, ..., p_m and domain {v_1, ..., v_k}, we should construct the set of all rules of the form:
(P_n(C_{v_1}) : α_1) ∨ ··· ∨ (P_n(C_{v_k}) : α_k) ← P_{p_1}(C_{w_1}) ∧ ··· ∧ P_{p_m}(C_{w_m}),
where each w_i belongs to the domain of p_i and each α_j is the entry for n = v_j, given p_1 = w_1, ..., p_m = w_m, in the CPT for n. Let us denote the CP-theory thus constructed by CP_B. The following result is then obvious.

Theorem 8
Let B be a Bayesian network. Every B-process T is an execution model of the CP-theory CP_B, i.e., T |= CP_B. Therefore, the semantics of B coincides with the distribution π_{CP_B}.

This result shows that CP-logic offers a straightforward way of representing Bayesian networks. We now discuss three ways in which it offers more expressivity.

7.2 Multiple causes for the same effect

In a process corresponding to a Bayesian network, the value of each random variable is determined by a single event. CP-logic, on the other hand, allows multiple events to affect the same property. This leads to better representations for effects that have a number of independent causes. Let us illustrate this by the following example.
Example 15
We consider a game of Russian roulette that is being played with two guns, one in the player's left hand and one in his right, each of which has a bullet in one of its six chambers.
(Death : 1/6) ← Pulltrigger(Leftgun).
(Death : 1/6) ← Pulltrigger(Rightgun).
In this example, there are two "causal mechanisms" that might lead to Death: one is the fact that pulling the trigger of the left gun might cause a bullet to hit the person's left temple, and the other is the fact that pulling the trigger of the right gun might cause a bullet to hit the person's right temple. They are independent in the following sense: once we know how many and which of these mechanisms are actually activated (i.e., which of the two triggers are pulled), then observing whether one of these possible causes actually results in the effect (i.e., whether one of the bullets is actually fired and kills the person) provides no information about whether one of the other causes will cause the effect (i.e., whether one of the other bullets is also fired). Mathematically, this says nothing more than that the probability of the effect occurring should equal the result of applying a noisy-or to the multiset of the probabilities with which each of the causal mechanisms that are actually activated causes the effect, i.e., if both guns are fired, the probability of death should be 1 − (1 − 1/6)^2 = 11/36. This independence is precisely the condition that is required in order to be able to represent each of these two causal mechanisms by a separate CP-law, as in the above example. To succinctly describe this situation, we say that Pulltrigger(Leftgun) and Pulltrigger(Rightgun) are independent causes for Death.
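The 1 − (1 − 1/6)^2 computation can be reproduced exactly by enumerating the outcomes of the two independent events. A small sketch (the encoding and names are ours):

```python
from itertools import product
from fractions import Fraction

P_HIT = Fraction(1, 6)  # each gun fires its bullet with probability 1/6

def p_death(left_pulled, right_pulled):
    """Exact probability of Death, enumerating the outcomes of the two
    independent causal events (one per possibly-pulled trigger)."""
    total = Fraction(0)
    for left_fires, right_fires in product([True, False], repeat=2):
        p = (P_HIT if left_fires else 1 - P_HIT) * \
            (P_HIT if right_fires else 1 - P_HIT)
        death = (left_pulled and left_fires) or (right_pulled and right_fires)
        if death:
            total += p
    return total
```

With both triggers pulled this yields 11/36, with one trigger 1/6, and with none 0, exactly the noisy-or behaviour described above.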
For instance, in Example 14, Rain and Sprinkler were not independent causes for Wet, since the probability of Wet given both Rain and Sprinkler is 0.99, which is not equal to 1 − (1 − 0.8)(1 − 0.9) = 0.98.

[Footnote 6: The noisy-or maps a multiset of probabilities α_i to 1 − ∏_i (1 − α_i).]

Fig. 11. A Bayesian network for Example 15. [Figure: nodes Left and Right, each with an edge into Death; P(Death | Left, Right) = 11/36, P(Death | Left, ¬Right) = 1/6, P(Death | ¬Left, Right) = 1/6, P(Death | ¬Left, ¬Right) = 0.]

In this case, the causal mechanisms by which Rain and Sprinkler cause Wet therefore appear to reinforce each other: it might be that a light drizzle only causes the grass to get slightly moist, and that sometimes the pressure on the water main is so low that the sprinkler by itself cannot get the grass really wet, but that a light drizzle and a lightly spraying sprinkler together would be enough to cause Wet, even though neither of them separately would do the trick. Because Sprinkler and Rain are not independent causes in this example, we cannot use a representation of the form:
(Wet : α) ← Sprinkler.
(Wet : β) ← Rain.
and instead have no choice but to use the representation shown on page 32.

Figure 11 shows a Bayesian network for the Russian roulette example. The most obvious difference between this representation and ours concerns the independence between the two different causes for death. In the CP-theory, this independence is expressed by the structure of the theory, whereas in the Bayesian network, it is a numerical property of the probabilities in the conditional probability table for Death. Because of this, the CP-theory is more elaboration tolerant, since adding or removing an additional cause for Death simply corresponds to adding or removing a single CP-law.
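The table of Figure 11 is mechanically derivable from the two CP-laws: each of its entries is a noisy-or over the probabilities of the active causes. A sketch (our own helper names) that builds the full 2^n table from n independent causes:

```python
from itertools import product
from fractions import Fraction

def noisy_or(probs):
    """Maps a multiset of probabilities a_i to 1 - prod(1 - a_i)."""
    p_no_effect = Fraction(1)
    for a in probs:
        p_no_effect *= 1 - a
    return 1 - p_no_effect

def cpt_from_independent_causes(cause_probs):
    """Build the full 2^n conditional probability table that a Bayesian
    network needs for an effect with n independent causes."""
    names = list(cause_probs)
    table = {}
    for row in product([True, False], repeat=len(names)):
        active = [cause_probs[n] for n, on in zip(names, row) if on]
        table[row] = noisy_or(active)
    return table

# The Death table of Fig. 11, from the two 1/6 causes of Example 15.
table = cpt_from_independent_causes({"left": Fraction(1, 6),
                                     "right": Fraction(1, 6)})
```

For two causes this reproduces the entries 11/36, 1/6, 1/6 and 0; for n causes it expands n parameters into 2^n table entries, which is exactly the redundancy the CP-theory avoids.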
Moreover, its representation is also more compact, requiring, in general, only n probabilities for n independent causes, instead of the 2^n entries that are needed in a Bayesian network table. Of course, these entries are nothing more than the result of applying a noisy-or to the multiset of the probabilities with which each of the causes that are present actually causes the effect. In graphical modeling, it is common to consider variants of Bayesian networks that use more sophisticated representations of the required conditional probability distributions than a simple table. Including the noisy-or as a structural element in such a representation achieves the same effect as CP-logic when it comes to representing independent causes.

7.3 First-order logic representation of causes

In CP-logic, the cause of an event can be represented in a qualitative way, by means of a first-order formula. Bayesian networks, on the other hand, encode such information in the probability tables.

Example 16
In the so-called Wumpus world, an agent moves through a grid, which contains, among other things, a number of bottomless pits. One aspect of this world is that if a position x is next to such a pit, then with a certain probability α, a breeze can be felt there (often, α is simply taken to be 1). In CP-logic, we could write the following CP-law:
∀x (Breeze(x) : α) ← ∃y NextTo(x, y) ∧ Pit(y).
For a grid in which square A is surrounded by squares B, C, D and E, a Bayesian network could represent the effect of Pit(B), Pit(C), Pit(D), Pit(E) on Breeze(A) by the following table:

P(Breeze(A)) given:
Pit(B), Pit(C), Pit(D), Pit(E): α
Pit(B), Pit(C), Pit(D), ¬Pit(E): α
...
¬Pit(B), ¬Pit(C), ¬Pit(D), Pit(E): α
¬Pit(B), ¬Pit(C), ¬Pit(D), ¬Pit(E): 0

In this example, CP-logic offers a representation which is considerably more concise than that of the Bayesian network. This manifests itself in two ways: first, our first-order representation succeeds in defining the probability of Breeze(x) for all squares x simultaneously by a single CP-law, while each square would need its own (identical) probability table in the Bayesian network; second, it can also summarize the 2^4 entries that make up the probability table for each Breeze(x) by the single first-order precondition of this CP-law. Again, these shortcomings have already been recognized by the Bayesian network community, leading to, for instance, the use of decision trees to represent probability tables (Comley and Dowe 2003), various forms of parameter tying, and first-order versions of Bayesian networks such as Bayesian Logic Programs (Kersting and De Raedt 2000) (see also Section 9.4.3). We remark that this feature of CP-logic cannot really be seen separately from that discussed in the previous section: it is precisely because we split up the effect that "parents" have on their "child" into a number of independent causal laws that we get more opportunity to exploit the expressivity of our first-order representation.

7.4 Cyclic causal relations

In real life, probabilistic processes may consist of events that propagate values in opposite directions. We already saw this in Example 2, where angina could cause pneumonia, but, vice versa, pneumonia could also cause angina.
In CP-logic, such causal loops do not require any special treatment. For instance, the loop formed by the two CP-laws
(Angina : 0.2) ← Pneumonia.
(Pneumonia : 0.3) ← Angina.
correctly behaves as follows:
• If the patient has neither angina nor pneumonia by an external cause ('external' here does not mean exogenous, but simply that this cause is not part of the causal loop), then he will have neither;
• If the patient has angina by an external cause, then with probability 0.3 he will also have pneumonia;
• If the patient has pneumonia by an external cause, then with probability 0.2 he will also have angina;
• If the patient has both pneumonia and angina by an external cause, then he will obviously have both.

Fig. 12. Bayesian network for the angina-pneumonia causal loop. [Figure: nodes external(angina) and external(pneumonia), each with edges into both angina and pneumonia; P(angina | e(a)) = 1, P(angina | ¬e(a), e(p)) = 0.2, P(angina | ¬e(a), ¬e(p)) = 0; P(pneumonia | e(p)) = 1, P(pneumonia | ¬e(p), e(a)) = 0.3, P(pneumonia | ¬e(p), ¬e(a)) = 0.]

In order to get the same behaviour in a Bayesian network, this would have to be explicitly encoded. For instance, one could introduce new, artificial random variables external(angina) and external(pneumonia) to represent the possibility that angina and pneumonia result from an external cause and construct the Bayesian network that is shown in Figure 12. In general, to encode a causal loop formed by n properties, one would introduce n additional nodes, i.e., all of the n original properties would have the same n artificial nodes as parents.

7.5 Interventions in CP-logic

Pearl's work investigates the behaviour of causal models in the presence of interventions, i.e., outside manipulations that preempt the normal behaviour of the system.
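The four cases of the angina-pneumonia loop listed above can be verified mechanically. The sketch below (our own encoding, with the external causes supplied as given facts) enumerates the success or failure of each of the two CP-laws and computes a least fixpoint for each outcome:

```python
from itertools import product

P_ANGINA_FROM_PNEUMONIA = 0.2   # (Angina : 0.2) <- Pneumonia.
P_PNEUMONIA_FROM_ANGINA = 0.3   # (Pneumonia : 0.3) <- Angina.

def loop_probabilities(ext_angina, ext_pneumonia):
    """Enumerate the coin flips of the two CP-laws and, for each outcome,
    propagate through the loop starting from the external causes."""
    p_angina = p_pneumonia = 0.0
    for c1, c2 in product([True, False], repeat=2):
        p = P_ANGINA_FROM_PNEUMONIA if c1 else 1 - P_ANGINA_FROM_PNEUMONIA
        p *= P_PNEUMONIA_FROM_ANGINA if c2 else 1 - P_PNEUMONIA_FROM_ANGINA
        angina, pneumonia = ext_angina, ext_pneumonia
        changed = True
        while changed:   # least fixpoint: the loop alone founds nothing
            changed = False
            if pneumonia and c1 and not angina:
                angina, changed = True, True
            if angina and c2 and not pneumonia:
                pneumonia, changed = True, True
        p_angina += p * angina
        p_pneumonia += p * pneumonia
    return p_angina, p_pneumonia
```

Without any external cause both probabilities are 0 (the loop does not found itself); with an external cause for angina only, pneumonia gets probability 0.3; with one for pneumonia only, angina gets 0.2; with both, both are certain.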
His key observation here is that causal relations are robust, in the sense that, even when some causal relations are intervened with, the other causal relations will still hold as before. Formally, an intervention for Pearl is something of the form do(X = x) that sets the value of a random variable X to x. In doing this, all edges leading into X are removed, because even though they represent the causal mechanism that would normally determine the value of X, they become irrelevant when the value of X is determined by an external intervention instead.

It is also possible to consider such interventions in the context of CP-logic. Our representation of a causal system is a modular one, in which the atomic unit is a single CP-law. Because of this, our language comes with a specific kind of intervention "built-in": if we want to know the result of intervening with a single causal law r, we can simply consider the theory from which this one law is removed (and possibly replaced by some other law). So, to judge the effect of an intervention that prevents r, we simply have to look at π_{C\{r}} instead of π_C, in (roughly) the same way that Pearl would look at something of the form P(· | do(X = x)) instead of P(·). The opposite is of course also possible: if we want to know the effect of an intervention that instead establishes an additional causal law r, we could look at π_{C∪{r}}.

To illustrate, let us consider another medical example. A tumor in a patient's kidney might cause kidney failure, which might cause the death of the patient; however, to make matters even worse, the tumor can also metastasize to the brain, which might also, independently, kill the patient. We can represent this as:
(KidneyFailure : 0.1) ← KidneyTumor.
(BrainTumor : 0.1) ← KidneyTumor.
(Death : 0.5) ← BrainTumor.
(Death : 0.9) ← KidneyFailure.
Now, let us suppose that we want to know what the effect will be of putting the patient on a dialysis machine, which allows him to survive kidney failure. To answer this question, we simply remove the last of these causal laws (since the dialysis is precisely meant to prevent this particular causality from taking effect) and look at the semantics of the resulting theory. If, say, the dialysis also carries some small risk, we can also add new causal laws such as:
(Death : 0.01) ← Dialysis.
In this way, the semantics of CP-logic already carries within it a notion of intervention, which is also slightly more fine-grained than that of Pearl, since we can consider interventions that prohibit a single causal law (as in the above example), whereas Pearl only considers interventions that prohibit all causal laws that affect the value of a certain random variable. In the case of the above example, therefore, Pearl would have to either intervene to prevent all possible causes for death, including the brain tumor, or none at all. Admittedly, it is easy enough to solve this problem by introducing some intermediate variable, say "high levels of toxins in the blood", between kidney failure and death.

Among other things, Pearl uses interventions to make sense of the statistical phenomenon known as Simpson's paradox. Because this is somewhat of a benchmark for causal formalisms, we will briefly discuss how CP-logic can deal with it. Simpson's paradox refers to the phenomenon that a population can sometimes be partitioned in such a way that a certain outcome has a low probability in each of the partitioning sets, yet has a high probability in the population overall (or vice versa).
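The effect of the dialysis intervention on the kidney example can be computed directly. The sketch below (our own encoding; it assumes KidneyTumor, and later Dialysis, hold exogenously) enumerates the success or failure of every CP-law and derives the effects of those that fire:

```python
from itertools import product

def p_atom(laws, atom, facts):
    """Probability that `atom` holds, enumerating the success/failure of
    every CP-law (effect, probability, precondition) and taking the
    least model of each outcome, starting from the given facts."""
    total = 0.0
    for coins in product([True, False], repeat=len(laws)):
        p = 1.0
        for (eff, prob, cond), c in zip(laws, coins):
            p *= prob if c else 1 - prob
        model, changed = set(facts), True
        while changed:
            changed = False
            for (eff, prob, cond), c in zip(laws, coins):
                if c and cond in model and eff not in model:
                    model.add(eff)
                    changed = True
        if atom in model:
            total += p
    return total

laws = [("KidneyFailure", 0.1, "KidneyTumor"),
        ("BrainTumor",    0.1, "KidneyTumor"),
        ("Death",         0.5, "BrainTumor"),
        ("Death",         0.9, "KidneyFailure")]

baseline = p_atom(laws, "Death", {"KidneyTumor"})       # pi_C
dialysis = p_atom(laws[:-1], "Death", {"KidneyTumor"})  # pi_{C \ {r}}
risky = p_atom(laws[:-1] + [("Death", 0.01, "Dialysis")],
               "Death", {"KidneyTumor", "Dialysis"})
```

Removing the last law drops P(Death) from 1 − (1 − 0.05)(1 − 0.09) = 0.1355 to 0.05, and adding the small dialysis risk raises it again to 1 − (1 − 0.05)(1 − 0.01) = 0.0595.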
For instance, a certain drug might be harmful to both men and women, but appear beneficial for persons of unknown sex. More precisely, taking the drug and recovering might be positively correlated in the population at large, but become negatively correlated when conditioning on sex. This can happen, for instance, if men are both more likely to take the drug and to spontaneously recover from the disease. (Because then observing that a patient takes the drug increases the probability that he is male, which in turn increases the probability that he will recover on his own, thus erroneously suggesting that the drug had some beneficial effect on this recovery.) The crux of Simpson's paradox is that the same probability distribution can be generated by different sets of causal laws, and that in order to figure out whether, e.g., some drug has a positive effect on a patient's condition, it is really these causal laws that matter. Pearl's book shows that the paradox can therefore be resolved by considering the causal models behind the probability distribution. In CP-logic, we can do the same.

Let us illustrate this by a famous real-world example: it was found that women had a significantly lower acceptance rate than men for the graduate school of the University of California at Berkeley, which led to a discrimination lawsuit against the university. However, it turned out that none of the individual departments of the university had a lower acceptance rate for women than for men; instead, it was simply the case that women were significantly more likely to apply to departments with a low acceptance rate. A highly simplified model of the real situation might therefore have looked something like this:

∀x (Apply(x, Engineering) : 0.7) ∨ (Apply(x, Literature) : 0.3) ← Man(x).
∀x (Apply(x, Engineering) : 0.2) ∨ (Apply(x, Literature) : 0.8) ← Woman(x).
∀x (Accepted(x) : 0.6) ← Apply(x, Engineering).
∀x (Accepted(x) : 0.3) ← Apply(x, Literature).

Here, there clearly is no gender discrimination: the gender of the applicant plays no role in the CP-laws that describe how the university decides whether to accept an application. The reason for the lawsuit is that, if we only look at the acceptance rates of men and women for the university as a whole, we cannot distinguish this CP-theory from, e.g., the following one:

∀x (Apply(x, Engineering) : 0.1) ∨ (Apply(x, Literature) : 0.9) ← Man(x).
∀x (Apply(x, Engineering) : 0.4) ∨ (Apply(x, Literature) : 0.6) ← Woman(x).
∀x (Accepted(x) : 0.3) ← Apply(x, Engineering) ∧ Woman(x).
∀x (Accepted(x) : 0.6) ← Apply(x, Engineering) ∧ Man(x).
∀x (Accepted(x) : 0.4) ← Apply(x, Literature) ∧ Woman(x).
∀x (Accepted(x) : 0.5) ← Apply(x, Literature) ∧ Man(x).

Indeed, both theories yield an acceptance rate of 0.36 for women and an acceptance rate of 0.51 for men. In the lawsuit, the acceptance rates of individual departments were used to argue that the first model was, in fact, the correct one. Purely theoretically, another option could have been to conduct a randomized experiment to eliminate selection bias: instead of allowing students to choose their department, we would assign it at random. In CP-logic terms, this corresponds to the intervention of removing the first two CP-laws from both theories and replacing them by:

∀x (Apply(x, Engineering) : 0.5) ∨ (Apply(x, Literature) : 0.5).

If we were to perform this intervention, the first theory would predict a new acceptance rate of 0.45 for men and women alike, whereas the second would predict an acceptance rate of 0.35 for women and 0.55 for men. So, we see here that the kind of interventions induced by the semantics of CP-logic is able to explain Simpson's paradox, and does so in essentially the same way that Pearl does.

8 CP-logic and logic programs

There is an obvious similarity between the syntax of CP-logic and that of logic programs. Moreover, the constructive processes that we used to define the semantics of a CP-theory are also similar to the kind of fixpoint constructions used to define certain semantics for logic programs. In this section, we will investigate these similarities. To be more concrete, we will first define a straightforward probabilistic extension of logic programs, called Logic Programs with Annotated Disjunctions, and then prove that this is essentially equivalent to CP-logic.

The connection between causal reasoning and logic programming has long been implicitly present; we can refer in this respect to, for instance, formalizations of situation calculus in logic programming (Pinto and Reiter 1993; Van Belleghem et al. 1997). Here, we make this relation explicit, by showing that the language of CP-logic, which we have constructed directly from causal principles, corresponds to existing logic programming concepts. In this respect, our work is similar to that of McCain and Turner (1996), who defined the language of causal theories, which was then shown to be closely related to logic programming. However, as we will discuss later, McCain and Turner formalise somewhat different causal intuitions, which leads to a correspondence to a different logic programming semantics.
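The acceptance rates quoted in the Berkeley example above are one-line expectations, and can be checked as follows (a sketch; the function name is ours, and the department-choice and per-department acceptance probabilities are those given in the text):

```python
def acceptance_rate(p_apply_eng, p_acc_eng, p_acc_lit):
    """Overall acceptance rate when a student applies to engineering with
    probability p_apply_eng and to literature otherwise."""
    return p_apply_eng * p_acc_eng + (1 - p_apply_eng) * p_acc_lit

# First (non-discriminating) theory: acceptance depends only on department.
men_rate = acceptance_rate(0.7, 0.6, 0.3)     # expected: 0.51
women_rate = acceptance_rate(0.2, 0.6, 0.3)   # expected: 0.36

# Randomised experiment: the application laws are replaced by a 0.5/0.5
# choice.  First theory: both sexes get the same rate.
random_rate = acceptance_rate(0.5, 0.6, 0.3)  # expected: 0.45

# Second (discriminating) theory under the same intervention.
men_random2 = acceptance_rate(0.5, 0.6, 0.5)    # expected: 0.55
women_random2 = acceptance_rate(0.5, 0.3, 0.4)  # expected: 0.35
```

After the intervention the two theories are distinguishable, which is exactly the point of the randomized experiment.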
Our results from this section will help to clarify the position of CP-logic among related work in the area of probabilistic logic programming, such as Poole's Independent Choice Logic (Poole 1997). Moreover, they provide additional insight into the role that causality plays in such probabilistic logic programming languages, as well as in normal and disjunctive logic programs.

8.1 Logic Programs with Annotated Disjunctions

In this section, we define the language of Logic Programs with Annotated Disjunctions, or LPADs for short. This is a probabilistic extension of logic programming, which is based on disjunctive logic programs. This is a natural choice, because disjunctions themselves (and therefore also disjunctive logic programs) already represent a kind of uncertainty. Indeed, to give just one example, we could use these to model indeterminate effects of actions. Consider, for instance, the following disjunctive rule:
Heads ∨ Tails ← Toss.
This offers a quite intuitive representation of the fact that tossing a coin will result in either heads or tails. Of course, this is not all we know. Indeed, a coin also has equal probability of landing on heads or tails. The idea behind LPADs is now simply to express this by annotating each of the disjuncts in the head with a probability, i.e., we write:
(Heads : 0.5) ∨ (Tails : 0.5) ← Toss.
Formally, an LPAD is a set of rules:
(h_1 : α_1) ∨ ··· ∨ (h_n : α_n) ← φ,   (36)
where the h_i are ground atoms and φ is a sentence. As such, LPADs are syntactically identical to CP-logic. However, we will define their semantics quite differently.
For instance, the above example expresses that precisely one of the following logic programming rules holds: either Heads ← Toss holds, i.e., if the coin is tossed this will yield heads, or the rule Tails ← Toss holds, i.e., tossing the coin gives tails. Each of these two rules has a probability of 0.5 of being the actual instantiation of the disjunctive rule. More generally, every rule of form (36) represents a probability distribution over the following set of logic programming rules: {(h_i ← φ) | 1 ≤ i ≤ n}. From these distributions, a probability distribution over logic programs is then derived. To formally define this distribution, we introduce the following concept of a selection. We use the notation head*(r) to denote the set of pairs head(r) ∪ {(∅, 1 − Σ_{(h:α)∈head(r)} α)}, where ∅ represents the possibility that none of the h_i's are caused by the rule r.

Definition 17 (C-selection)
Let C be an LPAD. A C-selection is a function σ from C to ∪_{r∈C} head*(r), such that for all r ∈ C, σ(r) ∈ head*(r). By σ_h(r) and σ_α(r) we denote, respectively, the first and second element of the pair σ(r). The set of all C-selections is denoted S_C. The probability P(σ) of a selection σ is now defined as ∏_{r∈C} σ_α(r). For a set S ⊆ S_C of selections, we define the probability P(S) as Σ_{σ∈S} P(σ).

By C_σ we denote the logic program that consists of all rules σ_h(r) ← body(r) for which r ∈ C and σ_h(r) ≠ ∅. Such a C_σ is called an instance of C. We will interpret these instances by the well-founded model semantics. Recall that, in general, the well-founded model of a program P, wfm(P), is a pair (I, J) of interpretations, where I contains all atoms that are certainly true and J contains all atoms that might possibly be true.
If I = J, then the well-founded model is called exact. Intuitively, if wfm(P) is exact, then the truth of all atoms can be decided, i.e., everything that is not false can be derived. In the semantics of LPADs, we want to ensure that all uncertainty is expressed by means of the annotated disjunctions. In other words, given a specific selection, there should no longer be any uncertainty. We therefore impose the following criterion.

Definition 18 (soundness)
An LPAD C is sound iff all instances of C have an exact well-founded model.

For such LPADs, the following semantics can now be defined.

Definition 19 (instance based semantics μ_C)
Let C be a sound LPAD. For an interpretation I, we denote by W(I) the set of all C-selections σ for which wfm(C_σ) = (I, I). The instance based semantics μ_C of C is the probability distribution on interpretations that assigns to each I the probability P(W(I)) of this set of selections W(I).

It is straightforward to extend this definition to allow for exogenous predicates as well. Indeed, in Section 2.2, we have already seen how to define the well-founded semantics for rule sets with open predicates, and this is basically all that is needed. Concretely, given an interpretation X for a set of exogenous predicates, we can define the instance based semantics μ_C^X given X as the distribution that assigns, to each interpretation I of the endogenous predicates, the probability of the set of all selections σ for which (I, I) is the well-founded model of C_σ given X. Of course, this semantics is only defined for LPADs that are sound in X, meaning that the well-founded model of each C_σ given X is two-valued.

8.2 Equivalence to CP-logic

Every CP-theory is syntactically also an LPAD and vice versa.
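As a small illustration of Definitions 17 to 19, the instance based semantics of the coin LPAD above can be computed by brute force. The sketch below (our own encoding) enumerates all C-selections; since the instances of this particular LPAD are definite programs, a simple least-model computation suffices in place of the full well-founded semantics:

```python
from itertools import product

# LPAD for the coin: (Heads:0.5) ∨ (Tails:0.5) <- Toss.   Toss.
# A rule is (list of (atom, annotation), body atom or None).
rules = [([("Heads", 0.5), ("Tails", 0.5)], "Toss"),
         ([("Toss", 1.0)], None)]

def mu(rules):
    """Instance based semantics: enumerate all selections, compute the
    least model of each instance, and sum the selection probabilities."""
    # head*(r): the head plus the "nothing is caused" pair (None, 1 - sum).
    heads_star = [head + [(None, 1 - sum(a for _, a in head))]
                  for head, _ in rules]
    distribution = {}
    for sel in product(*heads_star):
        p = 1.0
        for _, alpha in sel:
            p *= alpha
        if p == 0:
            continue
        model, changed = set(), True
        while changed:
            changed = False
            for (h, _), (_, body) in zip(sel, rules):
                if h and (body is None or body in model) and h not in model:
                    model.add(h)
                    changed = True
        key = frozenset(model)
        distribution[key] = distribution.get(key, 0.0) + p
    return distribution

dist = mu(rules)
```

This yields probability 0.5 for {Toss, Heads} and 0.5 for {Toss, Tails}, as expected; for LPADs with negation, the least-model step would have to be replaced by a genuine well-founded model computation.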
The key result of this section is that the instance based semantics μ_C for LPADs coincides with the CP-logic semantics π_C defined in Sections 3 and 5.

Theorem 9
Let C be a CP-theory that is valid in X. Then C is also an LPAD that is sound in X and, moreover, μ_C^X = π_C^X.

Proof
The proof of this theorem is given in Section A.2.

We remark that it is not the case that every sound LPAD is also a valid CP-theory. In other words, there are some sound LPADs that cannot be seen as a sensible description of a set of probabilistic causal laws.

Example 17
It is easy to see that the following CP-theory has no execution models.
(P : 0.5) ∨ (Q : 0.5) ← R.
R ← ¬P.
R ← ¬Q.
However, each of its instances has an exact well-founded model: for {P ← R; R ← ¬P; R ← ¬Q} this is {R, P} and for {Q ← R; R ← ¬P; R ← ¬Q} this is {R, Q}. Clearly, this CP-theory does not have execution models that satisfy the temporal precedence assumption.

8.3 Discussion

The results of this section relate CP-logic to LPADs and, more generally speaking, to the area of logic programming and its probabilistic extensions. As such, these results help to position CP-logic among related work, such as Poole's Independent Choice Logic and McCain and Turner's causal theories, which we will discuss in Section 9.2. Moreover, they also provide a valuable piece of knowledge representation methodology for these languages, by clarifying how they can represent causal information. To illustrate, we now discuss the relevance of our theorem for some logic programming variants.

Disjunctive logic programs. In probabilistic modeling, it is often useful to consider the qualitative structure of a theory separately from its probabilistic parameters.
Indeed, for instance, in machine learning, the problems of structure learning and parameter learning are two very different tasks. If we consider only the structure of a CP-theory, then, syntactically speaking, we end up with a disjunctive logic program, i.e., a set of rules:
h_1 ∨ ··· ∨ h_n ← φ.   (37)
We can also single out the qualitative information contained in the semantics π_C of such a CP-theory. Indeed, as we have already seen, like any probability distribution over interpretations, π_C induces a possible world semantics, consisting of those interpretations I for which π_C(I) > 0. Thus we can define:
I |= C if π_C(I) > 0.
Now, let us restrict our attention to only those CP-theories in which, for every CP-law r, the sum of the probabilities α_i appearing in head(r) is precisely 1. This is without loss of generality, since we can simply add an additional disjunct (P : 1 − Σ_i α_i), with P some new atom, to all rules which do not satisfy this property. It is easy to see that the set of possible worlds is then independent of the precise values of the α_i, i.e., the qualitative aspects of the semantics of such a theory depend only on the qualitative aspects of its syntactical form. Stated differently, for any pair of CP-theories C, C′ which differ only on the α_i's, it holds that, for any interpretation I, I |= C iff I |= C′.

From the point of view of disjunctive logic programming, this set of possible worlds therefore offers an alternative semantics for such a program. Under this semantics, the intuitive reading of a rule of form (37) is: "φ causes a non-deterministic event, whose effect is precisely one of h_1, ..., h_n." This is a different informal reading than in the standard stable model semantics for disjunctive programs (Przymusinski 1991).
Indeed, under our reading, a rule corresponds to a causal event, whereas, under the stable model reading, it is supposed to describe an aspect of the reasoning behaviour of a rational agent. Consider, for instance, the disjunctive program {p ∨ q. p.}. To us, this program describes a set of two non-deterministic events: one event causes either p or q and another event always causes p. Formally, this leads to two possible worlds, namely {p} and {p, q}. Under the stable model semantics, however, this program states that an agent believes either p or q and the agent believes p. In this case, he has no reason to believe q and the only stable model is {p}. So, clearly, the causal view on disjunctive logic programming induced by CP-logic is fundamentally different from the standard view and leads to a different formal semantics. Interestingly, the possible model semantics (Sakama and Inoue 1994) for disjunctive programs is quite similar to the LPAD treatment, because it consists of the stable models of instances of a program. Because, as shown in Section 8.2, the semantics of CP-logic considers the well-founded models of instances, these two semantics are very closely related. Indeed, for a large class of programs, including all stratified ones, they coincide completely.

Normal logic programs. Let us consider a logic program P, consisting of a set of rules h ← φ, with h a ground atom and φ a formula. Syntactically, such a program is also a deterministic CP-theory. Its semantics π_P assigns a probability of 1 to a single interpretation and 0 to all other interpretations. Moreover, the results from Section 8.2 tell us that the interpretation with probability 1 will be precisely the well-founded model of P.
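The well-founded models that appear throughout this section can be computed for small ground programs by the standard alternating-fixpoint construction. The following sketch (our own, unoptimized encoding; rules are triples of a head, positive body atoms, and negated body atoms) reproduces, for instance, the exact models of the two instances in Example 17:

```python
def least_model(pos_rules):
    """Least model of a definite program given as (head, [pos body]) pairs."""
    model, changed = set(), True
    while changed:
        changed = False
        for h, body in pos_rules:
            if h not in model and all(b in model for b in body):
                model.add(h)
                changed = True
    return model

def gamma(program, interp):
    """Least model of the Gelfond-Lifschitz reduct of program w.r.t. interp."""
    reduct = [(h, pos) for h, pos, neg in program
              if not any(a in interp for a in neg)]
    return least_model(reduct)

def well_founded(program):
    """Alternating fixpoint: returns (true atoms, non-false atoms);
    the model is exact iff the two sets coincide."""
    true, prev_upper = set(), None
    while True:
        upper = gamma(program, true)      # atoms not certainly false
        new_true = gamma(program, upper)  # atoms certainly true
        if new_true == true and upper == prev_upper:
            return true, upper
        true, prev_upper = new_true, upper

# First instance of Example 17: {P <- R;  R <- not P;  R <- not Q}.
instance1 = [("P", ["R"], []), ("R", [], ["P"]), ("R", [], ["Q"])]
```

On instance1 this returns the exact model {R, P}; on the second instance it returns {R, Q}; and on a program such as {p ← ¬p} it returns the genuinely three-valued pair (∅, {p}).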
As such, these results show that a logic program under the well-founded semantics can be viewed as a description of deterministic causal information. Concretely, we find that we can read a rule h ← φ as: "φ causes a deterministic event, whose effect is h." This observation makes explicit the connection between causal reasoning and logic programming that has long been implicitly present in this field, as is witnessed, e.g., by the work on situation calculus in logic programming. As such, it enhances the theoretical foundations behind the pragmatic use of logic programs to represent causal events.

FO(ID). FO(ID) (also called ID-logic) (Denecker and Ternovska 2007) extends classical logic with inductive definitions. Similar to the way they appear in mathematical texts, an inductive definition is represented as a set of definitional rules, which are of the form ∀x p(t) ← φ, where x is a tuple of variables, φ is a first-order formula and p(t) an atom. Such a definition defines all predicates in the heads of the rules by simultaneous induction in terms of the other predicates, which are called the open predicates of the definition. This syntax offers a uniform way of expressing the most important forms of inductive definitions found in mathematics, including monotone, transfinite and iterated inductive definitions, and inductive definitions over a well-founded order. Formally, the semantics of such a definition is given by the well-founded semantics, which has been shown to correctly formalise these forms of inductive definitions. To be more concrete, an interpretation I is a model of a definition D if it interprets the defined predicates as the well-founded model of D extending the restriction of I to the open symbols of D.
Our results show that finite propositional definitions in FO(ID) are, both syntactically and semantically, identical to deterministic CP-theories. We can therefore view such a set of rules as both an inductive definition and a description of a causal process. This relation between induction and causality may be remarkable, but it is not all that surprising. In essence, an inductive definition defines a concept by describing how to construct it. As such, an inductive definition also specifies a construction process, and such processes are basically causal in nature. Or, to put it another way, an inductive definition is nothing more than a description of a causal process that takes place not in the real world, but in the domain of mathematical objects. This suggests that the ability of mathematicians and formal scientists in general to understand inductive definitions is rooted deeply in human common sense, in particular our ability to understand and reason about causation in the physical world.

9 Related work

In this section, we discuss some research that is related to our work on CP-logic. Roughly speaking, we can divide this into two different categories, namely, the related work that focuses mainly on formalizing causality and that which focuses mainly on representing probabilistic knowledge.

9.1 Pearl's causal models

Our work on CP-logic studies causality from a knowledge representation perspective. As such, it is closely related to the work of Pearl (2000). His work uses Bayesian networks and structural models as formal tools. In Section 7, we have already compared CP-logic to Bayesian networks and showed that it offers certain representational advantages. A structural model is a set of equations, each of which defines the value of one random variable in terms of the values of a set of other random variables.
For each endogenous random variable, there is precisely one such defining equation. As for Bayesian networks, we can say here that a CP-theory is more detailed, since it represents individual causal laws, while a single structural model equation has to take into account all of the random variables that have a direct influence on the value of the defined random variable. Another similarity to Bayesian networks is that structural models have to be acyclic as well, which means that, in this sense, they are also less general than CP-logic. Apart from this, a lot depends of course on the particular form that these equations take, so there is not much more that can be said in general about this. Pearl's work mainly focuses on the behaviour of causal models in the presence of interventions. As we have shown in Section 7.5, it is possible to consider similar interventions in the context of CP-logic. Pearl uses interventions for a number of interesting purposes, such as handling counterfactuals. They have also been used to define concepts such as "actual causes" (Halpern and Pearl 2001a) and "explanations" (Halpern and Pearl 2001b). The explicitly dynamic processes of CP-logic seem to offer an interesting setting in which to investigate these concepts as well. Indeed, in any particular branch of an execution model of a CP-theory, every true atom p is caused by at least one CP-law whose precondition φ was satisfied at the time when this event happened. It now seems sensible to call φ an actual cause of p. An interesting question is to what extent such a definition would coincide with the notion of actual causation defined by Halpern. However, we leave these issues for future work.
9.2 Causality in logic programs

In the area of logic programming, many languages can be found which express some kind of (non-probabilistic) causal laws. A typical example is McCain and Turner's causal theories (McCain and Turner 1996). A causal theory is a set of rules φ ⇐ ψ, where φ and ψ are propositional formulas. The semantics of such a theory T is defined by a fixpoint criterion. Concretely, an interpretation I is a model of T if I is the unique classical model of the theory T^I that consists of all φ for which there is a rule φ ⇐ ψ in T such that I ⊨ ψ. In CP-logic, we assume that the domain is initially in a certain state, which then changes through a series of events. This naturally leads to the kind of constructive processes that we have used to define the formal semantics of CP-logic. By contrast, according to McCain and Turner's fixpoint condition, a proposition can have any truth value, as long as there exists some causal explanation for this truth value. This difference mainly manifests itself in two ways. First, in CP-logic, every endogenous property has an initial truth value, which can only change as the result of an event. As such, there is a fundamental asymmetry between falsity and truth, since only one of them represents the "natural" state of the property. For McCain & Turner, however, truth and falsity are completely symmetric and both need to be causally explained. As such, if the theory is to have any models, then, for every proposition Q, there must always be a cause for either Q or ¬Q. A second difference is that the constructive processes of CP-logic rule out any unfounded causality, i.e., it cannot be the case that properties spontaneously cause themselves. In McCain & Turner's theories, this "spontaneous generation" of properties can occur.
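To make this contrast concrete, the following sketch compares the two fixpoint conditions on the one-rule theory Q ← Q (read as a deterministic CP-law) versus Q ⇐ Q (read as a McCain and Turner rule). It is a minimal sketch, restricted to negation-free programs whose heads are atoms; the helper names and the encoding of the uniqueness condition are our own simplifying choices:

```python
from itertools import combinations

def wf_model(rules):
    # Least fixpoint = well-founded model for a negation-free program:
    # an atom only becomes true if some rule actually derives it.
    true = set()
    changed = True
    while changed:
        changed = False
        for head, body in rules:
            if head not in true and all(b in true for b in body):
                true.add(head)
                changed = True
    return true

def mt_models(rules, atoms):
    # I is a McCain-Turner model iff I is the *unique* classical model of
    # T^I = {head | (head, body) in rules, body true in I}. With atomic
    # heads only, T^I has a unique model iff it forces every atom true,
    # so the test simplifies to: caused(I) == I and caused(I) covers atoms.
    out = []
    for k in range(len(atoms) + 1):
        for I in map(set, combinations(atoms, k)):
            caused = {h for h, b in rules if all(x in I for x in b)}
            if caused == I and caused == set(atoms):
                out.append(sorted(I))
    return out

rules = [("Q", ["Q"])]                 # Q <- Q  /  Q <= Q
print(sorted(wf_model(rules)))         # [] : no unfounded self-causation
print(mt_models(rules, ["Q"]))         # [['Q']] : Q "spontaneously" caused
```

The constructive semantics never fires the rule, since Q is false in the initial state, while the McCain and Turner condition accepts {Q} because, once assumed, Q explains itself.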
For instance, the CP-theory { Q ← Q } has {} as its (unique) model, whereas the causal theory { Q ⇐ Q } has { Q } as its (unique) model. As such, the direct representation of cyclic causal relations that is possible in CP-logic (e.g., Example 2) cannot be done in causal theories; instead, one has to use an encoding similar to the one needed in Bayesian networks (e.g., Figure 12). In practice, the main advantage of McCain & Turner's treatment of causal cycles seems to be that it offers a way of introducing exogenous atoms into the language. Indeed, by including both Q ⇐ Q and ¬Q ⇐ ¬Q, one can express that Q can have any truth value, without this requiring any further causal explanation. Of course, CP-logic has no need for such a mechanism, since we make an explicit distinction between exogenous and endogenous predicates. It is interesting to observe that, given the relation between logic programming and causal theories proven in (McCain 1997), this difference actually corresponds to the difference between the well-founded and completion semantics for logic programs.

9.3 Action languages

In Knowledge Representation, a significant amount of work has been done on the topic of action languages, such as A, B and C (Gelfond and Lifschitz 1998). These languages have in common with CP-logic that causality plays a significant role in their setting, and also that they are closely related to logic programming (Lifschitz and Turner 1999). Moreover, there also exist probabilistic extensions of these languages (Baral et al. 2002). Action languages conceive of the world as consisting of a system of causal laws, together with an external agent who can decide to act upon this system. Crucially, the agent's decisions are themselves not governed by the system's causal laws.
Therefore, if we consider, e.g., Pearl's framework, the closest thing to this concept of an action would be his notion of an intervention. Indeed, it has been shown in (Tran and Baral 2004) that interventions can be encoded in the probabilistic action language PAL (Baral et al. 2002) precisely as actions. In CP-logic, the most natural way of encoding an action would be, for the same reason, by means of an exogenous predicate. While the behaviour of an agent in an action language is not determined by the state of the system, it is however typically restrained by it: certain actions are only available in certain states. Neither of these two ways of encoding actions (i.e., in Pearl's framework or in ours) would be able to directly represent such constraints. Action languages allow the effects of an action to be specified by means of so-called dynamic causal laws. In Pearl's framework, the effect of an action would be given as part of the specification of the intervention that is performed. In CP-logic, such knowledge would take the form of a CP-law whose body contains the exogenous predicate representing the action, and whose head contains some endogenous predicate that is affected by the action. Besides these dynamic causal laws, there are also static causal laws. These represent the causal laws that are obeyed by the system itself; once we know the direct effects of the agent's actions, the static causal laws tell us how these propagate through the rest of the system. In Pearl's framework, they would correspond to the functional equations that were not intervened with. In CP-logic, they would be CP-laws with endogenous predicates in body and head.
As this brief discussion demonstrates, we could conceive of a probabilistic action language in which the (dynamic and static) causal laws are expressed in CP-logic, while the actions and constraints thereon are defined in some other language. To the best of our knowledge, however, all of the approaches in the current literature use essentially a McCain and Turner-style representation for the causal laws (see, e.g., (Giunchiglia and Lifschitz 1998)). To illustrate, let us consider the transmission in a car. This is a system consisting of a number of gear wheels, which can be connected in various ways, such that turning one of the gear wheels causes connected gear wheels to turn also. This system can be represented by a set of causal laws, and this can be done better in CP-logic than in McCain and Turner's logic. The reason is that there exist cyclic causal relations between connected gear wheels: if the engine is turning, this can cause the car to move, but vice versa, if the car is moving, this can also cause the engine to turn ("engine braking"). As explained in Section 9.2, such cyclic causality is handled better in CP-logic. Note also that, in the context of probabilistic planning, the McCain and Turner-style representation poses the problem that loops in the causal laws (e.g., p ⇐ q and q ⇐ p) lead to uncertainty (e.g., p and q can either both be true or both be false) that is not probabilistically quantified. PAL, for instance, solves this problem by assuming that, in such a case, all possible states are equally likely. A CP-logic representation of the transmission would probably use predicates such as Clutch and ShiftGear to refer to the actions available to the driver. These would be exogenous predicates, and we would not say anything more about them. An action language, however, could do more.
It would also allow one to express constraints on these actions, such as that shifting gears is only allowed while the clutch is in operation. In such a setting, we would have enough information to help the driver come up with a plan to, e.g., put the car into fourth gear without stalling, which cannot be done using just CP-logic by itself.

9.4 Probabilistic languages

Part of the motivation behind CP-logic was to provide a probabilistic logic programming language in which statements have an intuitive meaning that can easily be explained to domain experts, without having to rely on any prior knowledge of logic programming. To this end, we have developed, from scratch, a formalization of the concept of a probabilistic causal law; the resulting language was then shown to be equivalent to the probabilistic logic programming construction of LPADs. This result allows us to interpret probabilistic logic programs in a new way, namely as sets of probabilistic causal laws. This does not only hold for LPADs themselves, but also for related languages, which we will now discuss in more detail.

9.4.1 Independent Choice Logic

Independent Choice Logic (ICL) (Poole 1997) by Poole is a probabilistic extension of abductive logic programming that extends the earlier formalism of Probabilistic Horn Abduction (Poole 1993). An ICL theory consists of both a logical and a probabilistic part. The logical part is an acyclic logic program. The probabilistic part consists of a set of rules of the form (in CP-logic syntax):

(h_1 : α_1) ∨ · · · ∨ (h_n : α_n)

such that Σ_{i=1}^n α_i = 1. The atoms h_i in such clauses are called abducibles. Each abducible may only appear once in the probabilistic part of an ICL program; in the logical part of the program, abducibles may only appear in the bodies of clauses.
Syntactically speaking, each ICL theory is also a CP-theory. Moreover, the ICL semantics of such a theory (as formulated in, e.g., (Poole 1997)) can easily be seen to coincide with our instance based semantics for LPADs. As such, an ICL theory can be seen as a CP-theory in which every CP-law is either deterministic or unconditional. We can also translate certain LPADs to ICL in a straightforward way. Concretely, this can be done for acyclic LPADs without exogenous predicates, for which the bodies of all CP-laws are conjunctions of literals. Such a CP-law r of the form:

(h_1 : α_1) ∨ · · · ∨ (h_n : α_n) ← φ

is then transformed into the set of rules:

h_1 ← φ ∧ Choice_r(1).
· · ·
h_n ← φ ∧ Choice_r(n).
(Choice_r(1) : α_1) ∨ · · · ∨ (Choice_r(n) : α_n).

The idea behind this transformation is that every selection of the original theory C corresponds to precisely one selection of the translation C′. More precisely, if we denote by ChoiceRule(r) the last CP-law in the above translation of a rule r, then a C-selection σ corresponds to the C′-selection σ′ for which, for all r ∈ C, σ(r) = (h_i : α_i) iff σ′(ChoiceRule(r)) = (Choice_r(i) : α_i). It is quite obvious that this one-to-one correspondence preserves both the probabilities of selections and the (restrictions to the original vocabulary of the) well-founded models of the instances of selections. This suffices to show that the probability distribution defined by C coincides with the (restriction to the original vocabulary of) the probability distribution defined by C′. So, our result on the equivalence between LPADs and CP-logic shows that the two parts of an ICL theory can be understood as, respectively, a set of unconditional probabilistic events and a set of deterministic causal events.
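The transformation above is purely syntactic and easy to mechanize. The following is a minimal sketch; the representation of rules as (atom, body) pairs and the textual Choice_r(i) naming are our own illustrative choices:

```python
def lpad_to_icl(rule_name, head, body):
    """Translate one LPAD rule into ICL form (illustrative sketch).

    `head` is a list of (atom, probability) pairs, `body` a list of body
    literals. Returns (logical_rules, choice_alternative): one definite
    rule per disjunct, guarded by a fresh Choice atom, plus the single
    unconditional probabilistic alternative over those Choice atoms.
    """
    logical, alternative = [], []
    for i, (atom, alpha) in enumerate(head, start=1):
        choice = f"Choice_{rule_name}({i})"   # fresh abducible per disjunct
        logical.append((atom, body + [choice]))
        alternative.append((choice, alpha))
    return logical, alternative

rules, alt = lpad_to_icl("r", [("h1", 0.4), ("h2", 0.6)], ["phi"])
print(rules)  # [('h1', ['phi', 'Choice_r(1)']), ('h2', ['phi', 'Choice_r(2)'])]
print(alt)    # [('Choice_r(1)', 0.4), ('Choice_r(2)', 0.6)]
```

Selecting one Choice atom from the alternative then enables exactly one of the guarded rules, mirroring the selection of one disjunct of the original CP-law.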
In this sense, our work offers a causal interpretation for ICL. It is, in this respect, somewhat related to the work of Finzi et al. on causality in ICL. In (Finzi and Lukasiewicz 2003), these authors present a mapping of ICL into Pearl's structural models and use this to derive a concept of actual causation for this logic, based on the work by Halpern (Halpern and Pearl 2001a). This approach is, however, somewhat opposite to ours. Indeed, we view a CP-theory, with its structure based on individual probabilistic causal laws, as a more fine-grained model of causality. Transforming a CP-theory into a structural model actually loses information, in the sense that it is not possible to recover the original structure of the theory. From the point of view of CP-logic, the approach of Finzi et al. would therefore not make much sense, since it would attempt to define the concept of actual causation in a more fine-grained model of causal information by means of a transition to a coarser one.

9.4.2 P-log

P-log (Baral et al. 2004; Baral et al. 2008) is an extension of the language of Answer Set Prolog with new constructs for representing probabilistic information. It is a sorted logic, which allows for the definition of attributes, which map tuples (of particular sorts) into a value (of a particular sort). Two kinds of probabilistic statements are considered. The first are called random selection rules and are of the form:

[r] random(A(t) : {x : P(x)}) ← φ.

Here, r is a name for the rule, P is a unary boolean attribute, A is an attribute with t a vector of arguments of appropriate sorts, and φ is a collection of so-called extended literals^8.
The meaning of a statement of the above form is that, if the body φ of the rule is satisfied, the attribute A(t) is selected at random from the intersection of its domain with the set of all terms x for which P(x) holds (unless some deliberate action intervenes). The label r is a name for the experiment that performs this random selection. The choice of which value will be assigned to this attribute is random and, by default, all possible values are considered equally likely. It is, however, possible to override such a default, using the second kind of statements, called probabilistic atoms. These are of the form:

pr_r(A(t) = y |_c φ) = α.

Such a statement should be read as: if the value of A(t) is determined by the experiment r, and if also φ holds, then the probability of A(t) = y is α. The information expressed by a random selection rule and its associated probabilistic atoms is somewhat similar to a CP-law, but stays closer to a Bayesian network style representation. Indeed, it expresses that, under certain conditions, the value of a certain attribute will be determined by some implicit random process, which produces each of a number of possible outcomes with a certain probability. We see that, as in Bayesian networks, there is no way of directly representing information about the actual events that might take place; instead, only information about the way in which they eventually affect the value of some attribute (or random variable, in Bayesian network terminology) can be incorporated.

^8 An extended literal is either a classical literal or a classical literal preceded by the default negation not, where a classical literal is either an atom A(t) = t0 or the classical negation ¬A(t) = t0 thereof.

Therefore,
representing the kind of phenomena discussed in Section 7, namely, cyclic causal relations and effects with a number of independent possible causes, requires the same kind of encoding in P-log as in Bayesian networks. A second interesting difference is that a random-statement of P-log represents an experiment in which a value is selected from a dynamic set of alternatives, whereas, in CP-logic, the set of possible outcomes is specified statically. Consider, for instance, a robot that leaves a room by selecting at random one of the doors that happens to be open. In P-log, this can easily be written down as:

[r] random(Leave_through : {x : Open_door(x)}).

In CP-logic, such a concise representation is currently not possible. Apart from probabilistic statements, a P-log program can also contain a set of regular Answer Set Prolog rules and a set of observations and interventions. The difference between observations and interventions is the same as highlighted by Pearl, and (Baral and Hunsaker 2007) shows that interventions in P-log can be used to perform the same kind of counterfactual reasoning as Pearl does. One interesting difference, however, is that in P-log interventions are actually represented within the theory, whereas Pearl's approach (as well as the one we presented in Section 7.5) views interventions as meta-manipulations of theories. In summary, the scope of P-log is significantly broader than that of CP-logic, and it is a more full-blown knowledge representation language than CP-logic, which is only aimed at expressing a specific kind of probabilistic causal laws. However, when it comes to representing just this kind of knowledge, CP-logic offers the same advantages over P-log that it does over Bayesian networks.
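The dynamic-alternatives selection in the robot example above can be sketched operationally as follows (a minimal sketch; the function and door names are our own illustrative choices, not P-log syntax):

```python
import random

def random_selection(domain, condition):
    """Mimic [r] random(Leave_through : {x : Open_door(x)}): select a
    value uniformly from the part of the domain satisfying `condition`,
    a set of alternatives that depends on the current state."""
    candidates = [x for x in domain if condition(x)]
    return random.choice(candidates)

doors = ["front", "back", "side"]
open_doors = {"front", "side"}           # the current (dynamic) state
chosen = random_selection(doors, lambda d: d in open_doors)
print(chosen)  # either 'front' or 'side', each with probability 1/2
```

The point of the comparison is that the candidate set is computed from the state at selection time, whereas a CP-law must list its possible outcomes in its head, fixed at writing time.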
9.4.3 First-order versions of Bayesian networks

In this section, we discuss two approaches that aim at lifting the propositional formalism of Bayesian networks to a first-order representation, namely Bayesian Logic Programs (BLPs) (Kersting and De Raedt 2000) and Relational Bayesian Networks (RBNs) (Jaeger 1997). A Bayesian Logic Program or BLP consists of a set of definite clauses, using the symbol "|" instead of "←", i.e., clauses of the form

P(t_0) | B_1(t_1), . . . , B_n(t_n).

in which P and the B_i's are predicate symbols and the t_j's are tuples of terms. For every predicate symbol P, there is a domain dom(P) of possible values. The meaning of such a program is given by a Bayesian network, whose nodes consist of all the atoms in the least Herbrand model of the program. The domain of a node for a ground atom P(t) is dom(P). For every ground instantiation P(t_0) | B_1(t_1), . . . , B_n(t_n) of a clause in the program, the network contains an edge from each B_i(t_i) to P(t_0), and these are the only edges that exist. To complete the definition of this Bayesian network, all the relevant conditional probabilities also need to be defined. To this end, the user needs to specify, for each clause in the program, a conditional probability table, which defines the conditional probability of every value in dom(P), given an assignment of values to the atoms in the body of the clause. Now, let us first assume that every ground atom in the Bayesian network is an instantiation of the head of precisely one clause in the program. In this case, the tables for the clauses suffice to determine the conditional probability tables of the network, because every node can then simply take its probability table from this unique clause.
However, in general, there might be many such clauses. To also handle this case, the user needs to specify, for each predicate symbol P, a so-called combination rule, which is a function that produces a single probability from a multiset of probabilities. The conditional probability table for a ground atom P(t) can then be constructed from the set of all clauses r such that P(t) is an instantiation of head(r), by finding the appropriate entries in the tables for all such clauses r and then applying the combination rule for P to the multiset of these values. According to the semantics of Bayesian Logic Programs, this combination rule will always be applied, even when there exists only a single such r. This completes the definition of BLPs as given in, e.g., (Kersting and De Raedt 2000). More recently, a number of issues with this formalism have led to the development of Logical Bayesian Networks (Fierens et al. 2005). These issues have also prompted the addition of so-called "logical atoms" to the original BLP language (Kersting and De Raedt 2007). Since this does not significantly affect any of the comparisons made in this section, however, we will ignore this extension. A Relational Bayesian Network (Jaeger 1997) is a Bayesian network in which the nodes correspond to predicate symbols and the domain of a node for a predicate P/n consists of all possible interpretations of this predicate symbol in some fixed domain D, i.e., all subsets of D^n. The conditional probability distribution associated to such a node P is specified by a probability formula F_P. For every tuple d ∈ D^n, F_P(d) defines the probability of d belonging to the interpretation of P in terms of the probabilities of tuples d′ belonging to the interpretation of a predicate P′, where P′ is either a parent of P in the graph or even, under certain conditions, P itself.
Such a probability formula can contain a number of different operations on probabilities, including the application of arbitrary combination rules. Such a Relational Bayesian Network can also be compiled into a network that is similar to that generated by a BLP, i.e., one in which the nodes correspond to domain atoms instead of predicate symbols. The main advantage of such a compiled network is that it allows more efficient inference. Again, the main difference between these two formalisms and CP-logic is that they both stick to the Bayesian network style of modeling, in the sense that the actual events that determine the values of the random variables are entirely abstracted away and only the resulting conditional probabilities are retained. However, through the use of, respectively, combination rules and probability formulas, these can be represented in a more structured manner than in a simple table. In this way, knowledge about the underlying causal events can be exploited to represent the conditional probability distributions in a concise way. The most common example is probably the use of the noisy-or to handle an effect which has a number of independent possible causes. For instance, let us consider the Russian roulette problem of Example 15. In a BLP, the relation between the guns firing and the player's death could be represented by the following clause:

Death | Fire(X).

              Fire(x) = t    Fire(x) = f
Death = t     1/6            0
Death = f     5/6            1

Combination rule for Death: noisy-or

In Relational Bayesian Networks, this would be represented as follows (in a network with a single edge from Fire to Death):

F_Death = noisy-or({1/6 · Fire(x) | x})

As such, combination rules do allow some knowledge about the events underlying the conditional probabilities to be incorporated into the model.
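The noisy-or combination itself is easy to state: the effect fails only if every independent cause fails to produce it. A minimal sketch, assuming for illustration two guns that each fire with probability 1/6:

```python
def noisy_or(probs):
    """Noisy-or combination of a multiset of probabilities: the effect
    occurs unless every independent cause independently fails."""
    p_none = 1.0
    for p in probs:
        p_none *= (1.0 - p)
    return 1.0 - p_none

# Two independent 1/6 causes of Death (cf. the Russian roulette example):
p_death = noisy_or([1/6, 1/6])
print(round(p_death, 4))  # 0.3056, i.e., 1 - (5/6)^2 = 11/36
```

This is exactly the value a CP-theory with one CP-law per gun would assign, which is why noisy-or is the combination rule that matches independent causation.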
However, this is of course not the same as actually having a structured representation of the events themselves, as is offered by CP-logic. As a consequence of this, cyclic causal relations, such as that of our Pneumonia-Angina example, still need the same kind of encoding as in a Bayesian network.

9.4.4 Other approaches

In this section, we give a quick overview of some other related languages. An important class of probabilistic logic programming formalisms are those following the Knowledge Based Model Construction approach. Such formalisms allow the representation of an entire "class" of propositional models, from which, for a specific query, an appropriate model can then be constructed "at run-time". This approach was initiated by Breese (Breese 1992) and Bacchus (Bacchus 1993) and is followed by both Bayesian Logic Programs and Relational Bayesian Networks. Other formalisms in this class are Probabilistic Knowledge Bases of Ngo and Haddawy (Ngo and Haddawy 1997) and Probabilistic Relational Models of Getoor et al. (Getoor et al. 2001). From the point of view of comparison to CP-logic, both are very similar to Bayesian Logic Programs (see, e.g., (Kersting and De Raedt 2001) for a comparison). The language used in the Programming in Statistical Modeling system (PRISM) (Sato and Kameya 1997) is very similar to Independent Choice Logic. Our comments concerning the relation between CP-logic and Independent Choice Logic therefore carry over to PRISM. Like CP-logic, Many-Valued Disjunctive Logic Programs (Lukasiewicz 2001) are also related to disjunctive logic programming. However, in this language, probabilities are associated with disjunctive clauses as a whole. In this way, uncertainty about the implication itself is expressed, and not, as is the case with LPADs or CP-logic, about the disjuncts in the head.
All the works mentioned so far use point probabilities. There are, however, also a number of formalisms using probability intervals: Probabilistic Logic Programs of Ng and Subrahmanian (Ng and Subrahmanian 1992), their extension to Hybrid Probabilistic Programs of Dekhtyar and Subrahmanian (Dekhtyar and Subrahmanian 2000), and Probabilistic Deductive Databases of Lakshmanan and Sadri (Lakshmanan and Sadri 1994). Contrary to our approach, programs in these formalisms do not define a single probability distribution, but rather a set of possible probability distributions, which allows one to express a kind of "meta-uncertainty", i.e., uncertainty about which probability distribution is the "right" one. Moreover, the techniques used by these formalisms tend to have more in common with constraint logic programming than standard logic programming. The more recent formalism of CLP(BN) (Santos Costa et al. 2003) belongs to this class. We also want to mention Stochastic Logic Programs of Muggleton and Cussens (Cussens 2000; Muggleton 2000), which is a probabilistic extension of Prolog. In this formalism, probabilities are attached to the selection of clauses in Prolog's SLD-resolution algorithm, which basically results in a first-order version of stochastic context-free grammars. Because of this formalism's strong ties to the procedural aspects of Prolog, it appears to be quite different from CP-logic and indeed all of the other formalisms mentioned here. ProbLog (De Raedt et al. 2007) is a more recent probabilistic extension of pure Prolog. Here, too, every clause is labeled with a probability. The semantics of ProbLog is very similar to that of LPADs and, in fact, the semantics of a ground ProbLog program coincides completely with that of the corresponding LPAD.
More precisely put, a ProbLog rule of the form

$\alpha : h \leftarrow b_1, \ldots, b_n$,

where $h$ and the $b_i$ are ground atoms, is entirely equivalent to the LPAD rule

$(h : \alpha) \leftarrow b_1, \ldots, b_n$.

For non-ground programs, however, there is a difference. The semantics of an LPAD first grounds the entire program and then probabilistically selects instantiations of the rules of this ground program. In ProbLog, on the other hand, selections directly pick out rules of the original program. This means that, for instance, the following ProbLog rule:

$0.8 : likes(X, Y) \leftarrow likes(X, Z), likes(Z, Y)$,

specifies that, with probability 0.8, the likes-relation is entirely transitive, whereas the corresponding LPAD rule would mean that, for all individuals a, b and c, the fact that a likes b and b likes c causes a to like c with probability 0.8.

10 Conclusions and future work

Causality has an inherent dynamic aspect, which can be captured at the semantical level by the probability tree framework that Shafer has developed in (Shafer 1996). He concludes this book with the following observation:

  When we think of a Bayes net as a representation of a probability tree, we sometimes may also want to leave indeterminate orderings that are not imposed by arrows in the graph, so that the net can be thought of as a representation not of a single tree but of a class of trees, corresponding to different choices for these orderings. The possibility of introducing indeterminacy in the ordering of judgements is obviously equally present in the type-theoretical representation. [...] [W]e can think of [a large collection of partially ordered judgements] as a set of rules from which martingale trees can be constructed. [...]
  More abstractly, they can be thought of as causal laws, and we can imagine many problems of deliberation being posed and solved directly in terms of these causal laws, without the specification of a martingale tree. Thus type theory can take us beyond probability trees to a more general framework for causal deliberation.

This paper can be seen as an extension of Shafer's work in the direction pointed at by the above remarks. We have developed a logical language which uses probabilistic causal laws to concisely represent classes of probability trees. Our representation does not explicitly impose any order on the possible events, since we move away from a representation in which it is the outcome of previous events that causes a new event, to one in which new events are caused by properties of the current state of the domain. Even though this means that the probability trees corresponding to a given set of causal laws might be considerably different from one another, we prove that they all still generate the same probability distribution over their final states. Therefore, such causal laws capture precisely those properties of probability trees that we need to answer questions about the probabilities in these final states, which is what we are typically interested in.

A first contribution of this paper is the language CP-logic itself. This language allows one to represent a probability distribution over possible states of a domain by an enumeration of the probabilistic causal laws according to which it is generated. As we tried to show by, among others, the comparison to Bayesian networks, this representation is often natural and concise. A second contribution is that we have shown that CP-logic can be equivalently defined as a probabilistic logic programming language.
Because both the meaning of statements in CP-logic and their formal semantics can be completely explained in terms of intuitions about probabilistic causal laws, this formal equivalence offers a new way of (informally) explaining the meaning of probabilistic logic programs. This is a useful contribution to the existing modeling methodology for such languages.

By relating causality and logic programming in this way, our paper also serves as a unifying semantic study of existing probabilistic and non-probabilistic logics and formalisms. We showed how CP-logic refines causal Bayesian networks and several logics based on them. We also elaborated on the links between CP-logic and existing logic programming extensions such as ICL, PRISM and LPADs, thus showing that these logics can also be viewed as causal probabilistic logics. For example, a theory in ICL can be understood as a combination of deterministic causal events and unconditional probabilistic events. As for logic programming itself, we showed that CP-logic induces a causal view on this formalism, in which rules represent deterministic causal events. We also argued that this view basically coincides with the view of logic programs as inductive definitions. To be more concrete, we have shown that a normal logic program under the well-founded semantics can be understood as a set of deterministic causal statements, and we have presented an alternative semantics for disjunctive logic programs (similar to that of (Sakama and Inoue 1994)) under which these can be interpreted as sets of non-deterministic causal events.
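The causal reading of logic programming rules can be made concrete for a negation-free program: each rule is a deterministic event that makes its head true once its body holds, and the final state is independent of the order in which events fire. The three-rule program below is a hypothetical example, not taken from the paper, and the event loop is only a sketch of the construction for the definite case (full well-founded semantics with negation needs the unfounded-set machinery of Appendix A).

```python
# Each rule (head, body) is a deterministic causal event: it may fire once,
# as soon as every body atom is true. Hypothetical example program:
rules = [
    ("wet", ["rain"]),
    ("slippery", ["wet"]),
    ("rain", []),  # a fact: an unconditional event
]

state = set()   # atoms made true so far
fired = set()   # events that have already happened
while True:
    applicable = [i for i, (head, body) in enumerate(rules)
                  if i not in fired and all(b in state for b in body)]
    if not applicable:
        break
    i = applicable[0]  # any order of firing yields the same final state
    fired.add(i)
    state.add(rules[i][0])

print(sorted(state))  # prints: ['rain', 'slippery', 'wet']
```

The final state {rain, wet, slippery} is exactly the least model of the program, illustrating, for the definite case, how the causal process view coincides with the view of the program as an inductive definition.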
This paper is primarily intended to show how the concept of a probabilistic causal law can be formalized in a logical language, and to demonstrate the close relation of such a language to probabilistic logic programs. Because of this, we have intentionally kept our language quite simple. As became apparent in the comparison with other logics, such as P-log, CP-logic therefore lacks the expressivity to be truly useful for a broad class of applications. To make it more suitable for practical purposes, it should be improved in a number of ways. We see the following opportunities for future research.

Refinement of CP-logic. The current language of CP-logic is restricted in a number of ways. First, it only allows a finite number of CP-laws. Let us consider, for instance, a die that is rolled as long as it takes to obtain a six. Here, there is no upper bound on the number of throws that might be needed and, therefore, this example cannot currently be represented in CP-logic. Second, CP-logic is also limited in its representation of the effects of an event. For instance, it is not possible to directly represent events whose range of possible outcomes is not completely fixed beforehand. Also, we currently do not allow different events to cancel out or reinforce each other's effects. Third, and somewhat related to the previous point, CP-logic can currently only handle properties that are either fully present or fully absent. As such, it cannot correctly represent causes which have only a contributory effect; e.g., turning on a tap would not instantaneously cause a basin to be full, but only contribute a certain amount per time unit.

Integration into a larger formalism. To correctly formalise a domain in CP-logic, a user must know exactly the causes and effects of all relevant events that might happen.
For real domains of any significant size, this is an unrealistic assumption. Indeed, typically, one will only have such detailed knowledge about certain parts of a domain. So, in order to still be able to use CP-logic in such a setting, it would have to be integrated with other forms of knowledge. There are some obvious candidates for this: statements about the probabilities of certain properties, statements about probabilistic independencies (such as those in Bayesian networks), and constraints on the possible states of the domain. Integrating these different forms of knowledge without losing conceptual clarity is one of the main challenges for future work regarding CP-logic, and perhaps even for the area of uncertainty in artificial intelligence as a whole.

Inference. The most obvious inference task in the context of CP-logic is calculating the probability $\pi_C(\phi)$ of a formula $\phi$. A straightforward way of doing this would be to exploit the relation between CP-logic and (probabilistic) logic programming, such that we perform these computations by reusing existing algorithms (e.g., the inference algorithm of Poole's Independent Choice Logic (Poole 1997)) in an appropriate way. A more advanced technique, using binary decision diagrams, has recently been developed in (Riguzzi 2007). Another interesting inference task concerns the construction of a theory in CP-logic. For probabilistic modeling languages in general, it is typically not desirable that a user is forced to estimate or compute concrete probability values herself; instead, it should be possible to automatically derive these from a given data set. For CP-logic, there already exist algorithms that are able to do this in certain restricted cases (Riguzzi 2004; Blockeel and Meert 2007).
It would be interesting to generalize these, in order to make them generally applicable. Besides such learning of probabilistic parameters, it is also possible to learn the structure of the theory itself. This too is an important topic, because if we are able to construct the theory that best describes a given data set, we are in effect finding out which causal mechanisms are most likely present in this data. Such information can be relevant for many domains. For instance, when bio-informatics attempts to distinguish active from non-active compounds, this is exactly the kind of information that is needed. In (Meert et al. 2007), it is discussed how certain Bayesian network learning techniques can be adapted to perform structure learning for ground CP-logic.

Acknowledgements

This research was supported by GOA 2003/8 Inductive Knowledge Bases and by FWO Vlaanderen. Joost Vennekens is a postdoctoral researcher of the FWO.

Appendix A Proofs of the theorems

In this section, we present proofs of the theorems that were stated in the previous sections. To ease notation, we will assume that there are no exogenous predicates. This can be done without loss of generality, since all our results can simply be relativized with respect to some fixed interpretation for these predicates.

A.1 The semantics is well-defined

We start by proving that the semantics of CP-logic is indeed well-defined, and in particular the partial interpretation $\nu_s$ (the potential in $s$) used in the additional condition imposed by Definition 11 for handling negation. Since we defined $\nu_s$ as the unique limit of all terminal hypothetical derivation sequences of $s$, this requires us to show that all such sequences indeed end up in the same limit (Theorem 4). Let us consider a CP-theory $C$ and a state $s$ in an execution model of $C$.
We will denote by $R(s)$ the set of all CP-laws $r \in C$ that have not yet happened in $s$, i.e., for which there is no ancestor $s'$ of $s$ with $\mathcal{E}(s') = r$. Consider the collection $O_s$ of all partial interpretations $\nu$ such that for each atom $p$, $p^\nu = \mathbf{t}$ iff $p^{I(s)} = \mathbf{t}$, and for each rule $r \in R(s)$, if $body(r)^\nu \neq \mathbf{f}$, then for each atom $p \in head_{At}(r)$, $p^\nu \neq \mathbf{f}$. Stated differently, $\nu$ can be obtained from $I(s)$ by turning false atoms of $I(s)$ into unknown atoms in such a way that if the body of some rule $r \in R(s)$ is unknown or true in $\nu$, then each of its head atoms is unknown or true in $\nu$ as well.

Proposition 1
Let $(\nu_i)_{0 \leq i \leq n}$ be a hypothetical derivation sequence in state $s$.
• For each $0 \leq i \leq n$ and each $\nu \in O_s$, it holds that $\nu \leq_p \nu_i$.
• The limit $\nu_n = \nu_s$ is an element of $O_s$.

Proof
The first property can be proven by a straightforward induction. Clearly, it holds that $\nu \leq_p \nu_0 = I(s)$. Assume $\nu \leq_p \nu_i$ for some $i < n$. The true atoms of $\nu$ and $\nu_{i+1}$ are those of $I(s)$, so they are the same. Therefore, it suffices to show that every atom $p$ that is false in $\nu$ is also false in $\nu_{i+1}$, or, since $\nu$ and $\nu_{i+1}$ have the same true atoms, that every such $p$ is not unknown in $\nu_{i+1}$. Assume towards contradiction that $p$ is false in $\nu$ and unknown in $\nu_{i+1}$. By the induction hypothesis, $p$ is still false in $\nu_i$. Therefore, $p$ belongs to the head of some rule $r \in R(s)$ such that $body(r)^{\nu_i} \neq \mathbf{f}$. Since $\nu \leq_p \nu_i$, this would imply that $body(r)^\nu \neq \mathbf{f}$, which, given that $\nu \in O_s$, leads to the contradiction that $p^\nu \neq \mathbf{f}$. Hence, $p$ is false in $\nu_{i+1}$. It follows that $\nu \leq_p \nu_{i+1}$.
As for the second property, it is clear that $\nu_s$ can be obtained from $I(s)$ by turning some false atoms into unknown atoms, and that there are no more rules $r \in R(s)$ with a non-false body and false atoms in the head w.r.t. $\nu_s$. Hence, $\nu_s \in O_s$.

We can now use this set $O_s$ to characterize the limit $\nu_n$ of any hypothetical derivation sequence $(\nu_i)_{0 \leq i \leq n}$ in $s$.

Theorem 10
Let $(\nu_i)_{0 \leq i \leq n}$ be a hypothetical derivation sequence in $s$ and let $\nu$ be the least upper bound of $O_s$ w.r.t. the precision order $\leq_p$. Then $\nu_n = \nu$.

Proof
It is obvious that $\nu$ itself also belongs to $O_s$. Therefore, by the first bullet of Proposition 1, $\nu \leq_p \nu_n$. Because, by the second bullet of Proposition 1, $\nu_n$ also belongs to $O_s$, we have that $\nu \geq_p \nu_n$ as well.

Since this theorem shows that all hypothetical derivation sequences converge to the most precise element of $O_s$, it implies Theorem 4 and, therefore, our semantics is indeed well-defined.

A.2 CP-logic and LPADs are equivalent

Let $C$ be an LPAD. Let us define a partial $C$-selection as a partial function $\sigma$ mapping rules $r$ of a subset $dom(\sigma) \subseteq C$ to pairs $(p : \alpha) \in head^*(r)$. The probability function of selections can be extended to partial selections by setting $P(\sigma) = \prod_{r \in dom(\sigma)} \sigma_\alpha(r)$. Define also $S(\sigma)$ as the set of $C$-selections that extend $\sigma$. The following equation is obvious:

$$P(\sigma) = \sum_{\sigma' \in S(\sigma)} P(\sigma').$$

We define an instance of $\sigma$ as any instance $C_{\sigma'}$ in which $\sigma'$ is a $C$-selection that extends $\sigma$. Let $T$ be an execution model of $C$. Clearly, each node $s$ in $T$ determines a unique partial $C$-selection, denoted $\sigma(s)$. Formally, if $(s_i)_{0 \leq i \leq n}$ is the path from the root to $s$, then the domain of $\sigma(s)$ is $\{\mathcal{E}(s_i) \mid 0 \leq i < n\}$ and each rule $r = \mathcal{E}(s_i)$ in its domain is mapped to the atom $p \in head^*(r)$ that was selected for $s_{i+1}$.
Moreover, we have

$$P(s) = P(\sigma(s)) = \sum_{\sigma' \in S(\sigma(s))} P(\sigma'). \qquad \text{(A1)}$$

With the path $(s_i)_{0 \leq i \leq n}$ from the root to some node $s$, we now also associate a sequence of partial interpretations $(K_j)_{j=0}^{2n+1}$, defined as follows:

• $K_0 = \bot$, the partial interpretation mapping all atoms to $\mathbf{u}$.
• $K_{2i+1} = \nu_{s_i}$, for all $0 \leq i \leq n$.
• $K_{2i+2} = \nu_{s_i}[p : \mathbf{t}]$, for all $0 \leq i < n$, where $p$ is the head atom of $\mathcal{E}(s_i)$ selected to obtain $s_{i+1}$.

Proposition 2
For each $\sigma \in S(\sigma(s))$, $(K_j)_{j=0}^{2n+1}$ is a well-founded induction of $C_\sigma$.

Proof
The proof is by induction on the length $n$ of the path from the root of $T$ to $s$. We start by proving that $(K_j)_{j=0}^{2n}$ is a well-founded induction of all instances $C_\sigma$ with $\sigma \in S(\sigma(s))$. If $n = 0$, then $s$ is the root of the tree and $\sigma(s)$ is the empty partial selection. The sequence $(K_0)$ is obviously a well-founded induction of any instance $C_\sigma$. For $n > 0$, the induction hypothesis states that $(K_j)_{j=0}^{2n-1}$ is a well-founded induction of all instances $C_\sigma$, where $\sigma$ belongs to $S(\sigma(s_{n-1}))$. Let $r$ be $\mathcal{E}(s_{n-1})$, the rule selected in $s_{n-1}$, and let $K_{2n} = K_{2n-1}[p : \mathbf{t}]$, where $p$ was selected in the head of $r$ to obtain $s$. Hence, $body(r)$ is true in $K_{2n-1} = \nu_{s_{n-1}}$. Clearly, for each $\sigma \in S(\sigma(s))$, $C_\sigma$ contains the rule $p \leftarrow body(r)$. Consequently, $(K_j)_{j=0}^{2n}$ is a well-founded induction of $C_\sigma$.

Next, we prove that $(K_j)_{j=0}^{2n+1}$ is a well-founded induction of all $C_\sigma$ with $\sigma \in S(\sigma(s))$. Let us investigate the set $U$ of all atoms $q$ such that $K_{2n}(q) \neq K_{2n+1}(q)$. We will prove that all atoms of $U$ are unknown in $K_{2n}$ and false in $K_{2n+1}$, and that $U$ is an unfounded set of $C_\sigma$. It will then follow that $(K_j)_{j=0}^{2n+1}$ is a well-founded induction of $C_\sigma$. Let us first verify that all atoms in $U$ are unknown in $K_{2n}$ and false in $K_{2n+1}$.
If $n = 0$, then $K_0 = \nu_0 = K_1$, so $U = \{\}$ and the statement trivially holds. Let $n > 0$. Recall that $K_{2n}$ is $\nu_{s_{n-1}}[p : \mathbf{t}]$, where $p$ is the atom selected in the head of $\mathcal{E}(s_{n-1})$ to obtain $s$, and $K_{2n+1} = \nu_s$. It is easy to see that the true atoms of $K_{2n}$ and $K_{2n+1}$ are identical to those true in $I(s)$. Hence, $K_{2n}$ and $K_{2n+1}$ only differ on false or unknown atoms. To show that $U$ contains only atoms that are unknown in $K_{2n}$ and false in $K_{2n+1}$, it therefore suffices to show that all atoms false in $K_{2n}$ are also false in $K_{2n+1}$. To prove this, it suffices to show that $K_{2n} \in O_s$. Indeed, if $K_{2n} \in O_s$, Proposition 1 entails that $\nu_s \geq_p K_{2n}$ and hence, all atoms false in $K_{2n}$ are false in $\nu_s = K_{2n+1}$.

We observe that, since $\nu_{s_{n-1}}$ belongs to $O_{s_{n-1}}$ (Proposition 1), all head atoms of rules $r \in R(s_{n-1})$ with a non-false body in $\nu_{s_{n-1}}$ are true or unknown in $\nu_{s_{n-1}}$. In particular, $\mathcal{E}(s_{n-1}) \in R(s_{n-1})$ and has a true body in $\nu_{s_{n-1}}$; hence $p$ is true or unknown in $\nu_{s_{n-1}}$. It follows that $\nu_{s_{n-1}} \leq_p \nu_{s_{n-1}}[p : \mathbf{t}] = K_{2n}$. It follows that any rule $r \in R(s) \subseteq R(s_{n-1})$ with a non-false body in $K_{2n}$ has a non-false body in $\nu_{s_{n-1}}$; hence, all atoms in the head of such an $r$ are true or unknown in $\nu_{s_{n-1}}$ and, a fortiori, in $K_{2n} = \nu_{s_{n-1}}[p : \mathbf{t}]$. Thus, we obtain that $K_{2n} \in O_s$, as desired.

So far, we have proven that $K_{2n+1} = K_{2n}[U : \mathbf{f}]$ and that all elements of $U$ are unknown in $K_{2n}$. It follows that $K_{2n} \leq_p K_{2n+1}$ and, more generally, that $K_j \leq_p K_{2n+1}$ for all $j \leq 2n$. All that remains to be shown is that $U$ is an unfounded set of each instance of $\sigma(s)$. Let $C'$ be such an instance and, for any atom $q \in U$, let $q \leftarrow \varphi$ be a rule of $C'$. We need to show that $\varphi$ is false in $K_{2n+1}$.
The rule is obtained as an instance of some rule $r \in C$ with $q$ in its head. The rule $r$ is not one of the rules $\mathcal{E}(s_i)$ with $i < n$, since otherwise $q$ would be true in $I(s_j)$ for all $j > i$ and, in particular, also in $\nu_{s_n} = K_{2n+1}$, which would contradict the fact that we have already shown $q$ to be false in $K_{2n+1}$. It follows that $r \in R(s)$. Since $K_{2n+1} = \nu_s \in O_s$ and $q$ is false in $\nu_s$, $body(r) = \varphi$ is false in $K_{2n+1}$.

Proposition 3
For each leaf $l$ of an execution model $T$ of $C$, $I(l)$ is the well-founded model of each instance $C_\sigma$ with $\sigma \in S(\sigma(l))$.

Proof
Let $l$ be a leaf and $\sigma \in S(\sigma(l))$. By Proposition 2, $(K_j)_{j=0}^{2n+1}$ is a well-founded induction of $C_\sigma$. Because $l$ is a leaf, we have that for every rule $r \in R(l)$, $body(r)$ is false in $I(l)$. Therefore, $I(l) \in O_l$ and Proposition 1 states that $\nu_l \geq_p I(l)$. However, because $I(l)$ is two-valued, this implies that $\nu_l = I(l)$. Therefore, $K_{2n+1} = \nu_l$ is a total interpretation. Because a well-founded induction with a total limit is terminal, $I(l)$ is the well-founded model of $C_\sigma$.

This now allows us to prove the desired equivalence, which was previously stated as Theorem 9.

Theorem 11
Let $T$ be an execution model of a CP-theory $C$. For each interpretation $J$, $\mu_C(J) = \pi_T(J)$.

Proof
Given an execution model $T$ of a CP-theory $C$ (Def. 11), we associate to each node $s$ of $T$ the set $S(\sigma(s))$ of all those $C$-selections $\sigma$ (Def. 17) that extend $\sigma(s)$. It is easy to see that, with $L_T$ the set of all leaves of $T$, the class $\{S(\sigma(l)) \mid l \in L_T\}$ is a partition of the set $S_C$ of all selections. Let $L_T(J)$ be the set of all leaves $l$ of $T$ for which $I(l) = J$, and let $Sels(J)$ be the set of selections $\sigma$ such that $WFM(C_\sigma) = J$.
Because, for each leaf $l$, the well-founded model of a selection $\sigma \in S(\sigma(l))$ is $I(l)$ (Proposition 3), the class $\{S(\sigma(l)) \mid l \in L_T(J)\}$ is a partition of the collection $Sels(J)$. This now allows us to derive the following equation:

$$\mu_C(J) = \sum_{\sigma \in Sels(J)} P(\sigma) \qquad \text{(Def. 19)}$$
$$= \sum_{l \in L_T(J)} \sum_{\sigma \in S(\sigma(l))} P(\sigma)$$
$$= \sum_{l \in L_T(J)} P(l) \qquad \text{(see equation (A1))}$$
$$= \pi_T(J).$$

For any execution model $T$ of $C$, this theorem now characterizes the probability distribution $\pi_T$ in a way that depends only on $C$ and not on $T$ itself. It follows that, indeed, for all execution models $T$ and $T'$ of $C$, $\pi_T = \pi_{T'}$, which means that we have now also proven Theorem 6 (and, therefore, Theorem 2 as well).

A.3 Execution models that follow the timing

In this section, we prove Theorem 7, which states that every stratified CP-theory has an execution model which follows its stratification. Recall that a CP-theory is stratified if it strictly respects some timing $\lambda$ (i.e., for all $h \in head_{At}(r)$ and $b \in body^+_{At}(r)$, $\lambda(h) \geq \lambda(b)$, and for all $h \in head_{At}(r)$ and $b \in body^-_{At}(r)$, $\lambda(h) > \lambda(b)$). As we did in Definition 9 of Section 4, we will again introduce an event-timing $\kappa$ of $\lambda$ (i.e., $\kappa$ maps rules to time points in such a way that $\lambda(h) \geq \kappa(r) \geq \lambda(b)$ for all $h \in head_{At}(r)$ and $b \in body_{At}(r)$). Moreover, we assume that $\kappa$ is such that for all $b \in body^-_{At}(r)$, $\kappa(r) > \lambda(b)$. It can easily be seen that for any stratified theory $C$, it is always possible to find such a $\kappa$. Our goal is now to show, first, that all weak execution models that follow $\kappa$ (Def. 9) also satisfy temporal precedence and, second, that such a process indeed exists.

Let us start by making some general observations about any weak execution model $T$ that follows $\kappa$. For any descendant $s'$ of a node $s$ of $T$, it is, by definition, the case that $\kappa(\mathcal{E}(s')) \geq \kappa(\mathcal{E}(s))$.
Because every event $r$ can only affect the truth value of atoms with timing $\geq \kappa(r)$, it must be the case that, for each rule $r$ with timing $< \kappa(\mathcal{E}(s))$, $body(r)^{I(s)} = body(r)^{I(s')}$. Suppose now that for such an $r$ it would be the case that $r \in R(s)$ and $I(s) \models body(r)$, i.e., $r$ is an event that could also have happened in $s$. In this case, $body(r)$ would remain satisfied in all descendants of $s$, up to and including each leaf $l$ that might be reached. However, it is impossible that $r$ actually happens in some descendant $s'$ of $s$, since that would violate the constraint that $\kappa(\mathcal{E}(s')) \geq \kappa(\mathcal{E}(s))$. So, it would be the case that $r \in R(l)$ and $I(l) \models body(r)$, which would contradict the fact that $l$ is a leaf. We conclude that such an $r$ cannot exist, i.e., for each $s$, it must be the case that $\mathcal{E}(s)$ is a rule with minimal timing among all rules $r \in R(s)$ for which $I(s) \models body(r)$.

Let us now assume that we non-deterministically construct a probabilistic $\Sigma$-process $T$ as follows:

• We start with only a root $s$, with $I(s) = \{\}$;
• As long as one exists, we select a leaf $s$ of our current tree for which the set of rules $r \in R(s)$ such that $I(s) \models body(r)$ is non-empty. We then extend $T$ by executing one of the rules whose timing is minimal in this set.

As shown in the previous paragraph, all weak execution models that follow $\kappa$ can be constructed in this way. Conversely, each process $T$ that we can construct in this way can easily be seen to also be a weak execution model. Moreover, it is again easy to see that, for all descendants $s'$ of a node $s$ of such a $T$ and each rule $r$ with timing $< \kappa(\mathcal{E}(s))$, $body(r)^{I(s)} = body(r)^{I(s')}$.
Therefore, as we go along any particular branch of $T$, the minimum timing of all rules with a true body can only increase, which means that each process constructed in the above way must follow $\kappa$. So, this provides an alternative, constructive characterization of the set of all weak execution models that follow $\kappa$. An immediate consequence is that such processes exist. Therefore, it now suffices to show that all these processes also satisfy temporal precedence.

Proposition 4
Each weak execution model $T$ that follows the timing $\kappa$ also satisfies temporal precedence and is, therefore, an execution model.

Proof
We need to show that, for each node $s$ of $T$, $\nu_s(body(\mathcal{E}(s))) = \mathbf{t}$. Let $i$ be $\kappa(\mathcal{E}(s))$. In general, applying an event with timing $\geq i$ during a hypothetical derivation sequence only modifies atoms with timing $\geq i$ and, hence, can only modify the truth value of bodies of events with timing $\geq i$. Because, in the first step $\nu_0$ of a sequence constructing $\nu_s$, the only events that can be used are those $r \in R(s)$ for which $I(s) \models body(r)$, and we know that the time of $\mathcal{E}(s)$ is minimal among these events, we conclude that $I(s)$ and $\nu_s$ coincide on all atoms $p$ with timing $\lambda(p) < i$. Because $C$ strictly respects $\lambda$, all atoms $p \in body^-_{At}(r)$ therefore have the same truth value in $\nu_s$ as in $I(s)$. Moreover, $I(s) \leq_t \nu_s$, so, in particular, for all atoms $p \in body^+_{At}(r)$, $I(s)(p) \leq_t \nu_s(p)$. By a well-known monotonicity property of three-valued logic, $\mathbf{t} = body(r)^{I(s)} \leq_t body(r)^{\nu_s}$. Hence, $body(r)^{\nu_s}$ is indeed $\mathbf{t}$.

This concludes our proof of Theorem 7. Since this theorem clearly generalizes Theorem 3, we have now proven all theorems stated in this paper.

References

Bacchus, F. 1993. Using first-order probability logic for the construction of Bayesian networks.
In Proceedings of the Sixth Conference on Uncertainty in Artificial Intelligence, UAI'93. 219–226.

Baral, C., Gelfond, M., and Rushton, N. 2004. Probabilistic reasoning with answer sets. In Proc. of the 7th International Conference on Logic Programming and Nonmonotonic Reasoning (LPNMR-7). Lecture Notes in Artificial Intelligence (LNAI), vol. 2923. Springer-Verlag, 21–33.

Baral, C., Gelfond, M., and Rushton, N. 2008. Probabilistic reasoning with answer sets. Theory and Practice of Logic Programming. To appear.

Baral, C. and Hunsaker, M. 2007. Using the probabilistic logic programming language P-log for causal and counterfactual reasoning and non-naive conditioning. In Proceedings of IJCAI.

Baral, C., Tran, N., and Tuan, L. 2002. Reasoning about actions in a probabilistic setting. In AAAI.

Blockeel, H. and Meert, W. 2007. Towards learning non-recursive LPADs by transforming them into Bayesian networks. In Inductive Logic Programming, ILP'06, Revised Selected Papers. Lecture Notes in Computer Science, vol. 4455. 94–108.

Breese, J. 1992. Construction of belief and decision networks. Computational Intelligence 8, 4, 624–647.

Comley, J. W. and Dowe, D. L. 2003. General Bayesian networks and asymmetric languages. In Proceedings of the Second Hawaii International Conference on Statistics and Related Fields.

Cussens, J. 2000. Stochastic logic programs: Sampling, inference and applications. In Proceedings of the Sixteenth Annual Conference on Uncertainty in Artificial Intelligence. Morgan Kaufmann, 115–122.

De Finetti, B. 1937. La prévision: ses lois logiques, ses sources subjectives. Annales de l'Institut Henri Poincaré 7, 1–68.

De Raedt, L., Kimmig, A., and Toivonen, H. 2007. ProbLog: A probabilistic Prolog and its application in link discovery.
In Proceedings of the 20th International Joint Conference on Artificial Intelligence, IJCAI'07. 2462–2467.

Dekhtyar, A. and Subrahmanian, V. S. 2000. Hybrid probabilistic programs. Journal of Logic Programming 43, 3, 187–250.

Denecker, M. 1998. The well-founded semantics is the principle of inductive definition. In Logics in Artificial Intelligence (JELIA'98), J. Dix, L. Fariñas del Cerro, and U. Furbach, Eds. Lecture Notes in Artificial Intelligence, vol. 1489. Springer-Verlag, 1–16.

Denecker, M. and Ternovska, E. 2007. A logic of non-monotone inductive definitions. Transactions on Computational Logic (TOCL).

Denecker, M. and Vennekens, J. 2007. Well-founded semantics and the algebraic theory of non-monotone inductive definitions. In Logic Programming and Nonmonotonic Reasoning, 9th International Conference, LPNMR 2007, Proceedings. Lecture Notes in Artificial Intelligence, vol. 4483. Springer, 84–96.

Fierens, D., Blockeel, H., Bruynooghe, M., and Ramon, J. 2005. Logical Bayesian networks and their relation to other probabilistic logical models. In Proceedings of the 15th International Conference on Inductive Logic Programming, ILP'05. Lecture Notes in Computer Science, vol. 3625. Springer, 121–135.

Finzi, A. and Lukasiewicz, T. 2003. Structure-based causes and explanations in the independent choice logic. In Proceedings of the 19th Conference on Uncertainty in Artificial Intelligence, UAI'03.

Gelfond, M. and Lifschitz, V. 1991. Classical negation in logic programs and disjunctive databases. New Generation Computing 9, 365–387.

Gelfond, M. and Lifschitz, V. 1993. Representing action and change by logic programs. Journal of Logic Programming 17, 2-4, 301–322.

Gelfond, M. and Lifschitz, V. 1998. Action languages. Linköping Electronic Articles in Computer and Information Science 3, 16.

Getoor, L., Friedman, N., Koller, D.
, and Pfeffer, A. 2001. Learning probabilistic relational models. In Relational Data Mining, S. Dzeroski and N. Lavrac, Eds. Springer-Verlag, 7–34.

Ghahramani, Z. 1998. Learning dynamic Bayesian networks. In Adaptive Processing of Sequences and Data Structures. Lecture Notes in Artificial Intelligence. Springer-Verlag, 168–197.

Giunchiglia, E. and Lifschitz, V. 1998. An action language based on causal explanation: Preliminary report. In Proceedings of AAAI 98.

Halpern, J. and Pearl, J. 2001a. Causes and explanations: A structural model approach – Part I: Causes. In Proceedings of the 17th Conference on Uncertainty in Artificial Intelligence, UAI'01.

Halpern, J. and Pearl, J. 2001b. Causes and explanations: A structural model approach – Part II: Explanations. In Proceedings of the 17th Conference on Uncertainty in Artificial Intelligence, UAI'01.

Jaeger, M. 1997. Relational Bayesian networks. In Proceedings of the Thirteenth Conference on Uncertainty in Artificial Intelligence (UAI-97).

Kakas, A. C., Kowalski, R., and Toni, F. 1992. Abductive logic programming. Journal of Logic and Computation 2, 6, 719–770.

Kersting, K. and De Raedt, L. 2000. Bayesian logic programs. In Proceedings of the Work-in-Progress Track at the 10th International Conference on Inductive Logic Programming, J. Cussens and A. Frisch, Eds. 138–155.

Kersting, K. and De Raedt, L. 2001. Bayesian logic programs. Tech. Rep. 151, Institute for Computer Science, University of Freiburg, Germany.

Kersting, K. and De Raedt, L. 2007. Bayesian logic programming: Theory and tool. In An Introduction to Statistical Relational Learning, L. Getoor and B. Taskar, Eds. MIT Press. To appear.

Lakshmanan, L. V. S. and Sadri, F. 1994. Probabilistic deductive databases.
In Pr o c e e dings of the International Symp osium on L o gic Pr o gr amming, ILPS’94 , M. Bruyn ooghe, Ed. MIT Press, 254–268. Lifschitz, V. a nd Turner, H. 1999. Representi ng transition systems b y logic programs. In LPNMR . Lukasiewicz, T. 2001 . Fixp oint chara cterizations for man y-val ued disjunctive log ic pro- grams. In Pr o c e e dings of the 6th International Confer enc e on L o gic Pr o gr ammi ng and Nonmonotonic R e asoning (LPNMR’01) . Lecture Notes in Artificial Intellige nce, vol . 2173. Sp ringer-V erlag, 336–350. Mar ti n-L ¨ of, P. 1982. Constructiv e mathematics and computer programming. In Pr o- c e e dings of the Sixth International Congr ess of L o gic, Metho dolo gy, and Philosophy of Scienc e . 153–175. McCain, N. 1 997. Causalit y in commonsense reasoning about actions. Ph.D. thesis, Universit y of T exas at Austin. McCain, N. and Turner, H. 199 6. Causal theories of action and change. In Pr o c e e dings of the Thirte enth National Confer enc e on A rtificial Intel li genc e and the Eighth Innova- tive Applic ations of Artificial I ntel ligenc e Confer enc e ( 13th AAAI/8th IAAI ) . AAAI Press, 460–465. Meer t, W. , Struyf, J. , and Blockeel, H . 2007. Learning ground CP-logic theories by means by Bay esian netw ork techniques. In Pr o c e e dings of the 6th International Workshop on Multi-Relational Data Mining . 93–104. Muggleton, S. 2000. Learning sto chastic logic programs. Ele ctr onic T r ansactions in Ar tificial Intel ligenc e 5, 041, 141–1 53. Ng, R. T. and S ubrahmanian, V. S. 1992. Probabil istic logic p rogramming. Information and Computation 101, 2, 150–20 1. Ngo, L. and Hadda wy, P. 1997. A nsw ering queries from con text-sensitive probabilistic knowl edge bases. The or etic al Computer Scienc e 171, 1–2, 147–177. Pearl, J. 1988. Pr ob abili stic R e asoning in Intel ligent Systems : Networks of Plausible Infer enc e . Morgan Kaufmann. CP-lo gic: A L anguage of Causal Pr ob abilistic Events 65 Pearl, J. 2000. 
C ausality: M o dels, R e asoning, and Infer enc e . Cambridge Universit y Press. Pinto, J. a nd Reiter, R . 1993. T emp oral reasoning in logic programming: A case for the situation calculus. In Pr o c. of the International Conf er enc e on L o gic Pr o gr amming . 203–221 . Poole, D. 1993. Probabilistic Horn ab duct ion and Ba yesian netw orks. Ar tificial Intel li- genc e 64 64, 1, 81–129. Poole, D. 1997. The Ind ep endent Choice Logic for mo delling multiple agents un d er uncertaint y . Artificial I ntel li genc e 94, 1-2, 7–56. Przymusinski, T. C. 1991. S table seman tics for disjunctive programs. New Gener ation Computing 3/4 , 401–424 . Riguzzi , F. 2004. Learning logic programs with annotated disjunctions. In 14th Inter- nation Confer enc e on Inductive L o gi c Pr o gr amming (ILP2004), Porto, 6-8 Septemb er 2004 , A. Sriniv asan and R. King, Eds. Springer V erlag, Heidelberg, Germany , 270 –287. Riguzzi , F. 2007. A top do wn in terpreter for LP AD and CP-logic. In The 14th RCRA workshop Exp erimental Evaluation of Algorithms for Solving Pr oblems with Combina- torial Explosion . Sakama, C. and Inoue, K. 1994. An alternativ e approach to the seman tics of disjunctive logic programs and d eductive d atabases. Journal of automate d r e asoning 13, 1, 145–17 2. Santos Cost a, V. , P age, D. , Qazi, M. , and Cu ssens, J. 2003. CLP(BN): Constraint logic programming for prob ab ilistic kn o wledge. In Pr o c e e di ngs of the Ninete enth An nual Confer enc e on Unc ertainty in Art ificial Intel ligenc e (UAI-2003) . Morgan K aufmann, 517–524 . Sa to, T. and Kamey a, Y. 1997. PRISM: A language for symbolic-statistical mo deling. In Pr o c e e dings of the International Joint Confer enc es on Artificial Intel l igenc e, I JCAI’97 . 1330–13 35. Shafer, G. 1996. The art of c ausal c onje ctur e . MIT Press. Tran, N. and Baral, C. 2004. Encod ing probabilistic causal mo del in p robab ilistic action language. In AAAI . V an Belleghem, K. , Denecker, M. 
, and De Schreye, D. 1 997. On the Relation b etw een Situation Calculus and Even t Ca lculus. Journal of L o gic Pr o gr amming, sp e cial issue on R e asoning ab out Ac tions and Change 31, 1-3, 3–37. V an G elder, A. , Ro ss, K. , and Schlipf, J. 1991. The w ell-founded semanti cs for general logic p rograms. Journal of the ACM 38, 3, 620–6 50. Venn, J. 1866. The L o gic of C hanc e: A n Essay on the F oundations and Pr ovinc e of the The ory of Pr ob abili ty . Vennekens, J. , Denecker, M. , and Br uynooghe, M. 2006. Represen ting causal in- formation ab out a probabilistic process. In L o gics in Art i fici al Intel ligenc e, 10th Eur o- p e an Confer enc e, JELIA’ 06, Pr o c e e dings . Lecture Notes in Computer Science, vol. 4160. Springer, 452–464. Vennekens, J. , Verbaeten, S. , and Bruynooghe, M. 2004. Logic programs with an- notated disjunctions. In Lo gic Pr o gr amming, 20th I nternational Confer enc e, IC LP’04, Pr o c e e dings . Lecture Notes in Computer S cience, vol. 3132 . Springer, 431–445.
