XML Static Analyzer User Manual

This document describes how to use the XML static analyzer in practice. It provides informal documentation for using the XML reasoning solver implementation. The solver allows automated verification of properties that are expressed as logical formula…

Authors: Pierre Genevès and Nabil Layaïda (INRIA, WAM project-team)

INRIA Research Report (Rapport de recherche) N° 6726, ISSN 0249-6399, ISRN INRIA/RR--6726--FR+ENG
Institut National de Recherche en Informatique et en Automatique (INRIA), Theme SYM (Symbolic Systems), WAM project-team

XML Static Analyzer User Manual
Pierre Genevès (CNRS), Nabil Layaïda
October 27, 2021, 17 pages

Centre de recherche INRIA Grenoble – Rhône-Alpes, 655, avenue de l'Europe, 38334 Montbonnot Saint Ismier
Phone: +33 4 76 61 52 00, Fax: +33 4 76 61 52 52

Abstract: This manual provides documentation for using the logical solver introduced in [Genevès, 2006; Genevès et al., 2007].

Keywords: Static Analysis, Logic, Satisfiability Solver, XML, Schema, XPath, Queries

French title: Analyseur Statique pour XML et XPath : Manuel Utilisateur (Static Analyzer for XML and XPath: User Manual).

1 Introduction

This document describes the logical solver introduced in [Genevès, 2006; Genevès et al., 2007] and provides informal documentation for using its implementation. The solver allows automated verification of properties that are expressed as logical formulas over trees. A logical formula may, for instance, express structural constraints or navigation properties (such as path existence and node selection) in finite trees. A decision procedure for a logic traditionally defines a partition of the set of logical formulas: formulas which are satisfiable (there is a tree which satisfies the formula) and the remaining formulas, which are unsatisfiable (no tree satisfies the given formula).
Alternatively (and equivalently), formulas can be divided into valid formulas (formulas which are satisfied by all trees) and invalid formulas (formulas that at least one tree does not satisfy). The solver is a satisfiability-testing solver: it allows checking the satisfiability (or unsatisfiability) of a given logical formula. Note that the validity of a formula ϕ can be checked by testing ¬ϕ for unsatisfiability. The solver can be used for reasoning over finite ordered trees, regardless of what these trees actually represent. In particular, the logic and the solver are specifically adapted for formulating and solving problems over XML tree structures [Bray et al., 2004]. The logic can express navigational properties like those expressed with the XPath standard language [Clark and DeRose, 1999] for navigating and selecting sets of nodes from XML trees. Additionally, the logic is expressive enough to encode any regular tree language property (it subsumes finite tree automata). It can encode constraints definable with common XML tree type definition languages (such as DTD [Bray et al., 2004], XML Schema [Fallside and Walmsley, 2004], and Relax NG [Clark and Murata, 2001]). The logic provides high-level constructs specifically designed for reasoning directly with such XML concepts: the user can directly write an expression using XPath notation in the logic, or even refer to an XML type in the logic. These characteristics make the system especially useful for solving problems like those encountered in the static analysis of XML code, the static verification of XML access control policies, XML data security checking, XML query optimization, and the construction of static type-checkers and optimizing compilers for a wide variety of tree-manipulating programs and XML processors.
Outline. This user manual is organized as follows: Section 2 describes the basics for using the solver without requiring any logical knowledge; Section 3 gives some insights into the logic, especially into the simple yet general data tree model used by the logic (Section 3.1) and into the syntax of logical formulas (Section 3.2), including high-level constructs for embedding XPath expressions and XML tree types directly in the logic. Finally, Section 4 provides an overview of the background theory underlying the logic and its solver, with related references.

2 Getting Started with XML Applications

The logical solver is shipped as a compressed file which, once extracted, provides binaries along with all required libraries. The “solver.jar” executable file takes a filename as a parameter¹. The filename refers to a text file containing the logical formula to solve. For example, provided a recent² Java runtime environment is installed, the following command line:

    java -jar solver.jar formula.txt

runs the solver on the logical formula contained in “formula.txt”. The full syntax of logical formulas is given in Section 3.2. The following examples introduce the logical formulation of some simple yet fundamental XML problems, and show how the solver output should be interpreted.

Example 1: emptiness test for an XPath expression. The most basic decision problem for a query language is the emptiness test of an expression: whether or not a query is self-contradictory and always yields an empty result. This test is important for error detection and for the optimization of host-language implementations, i.e. implementations that process languages in which XPath expressions are used. For instance, if one can decide at compile time that a query result is empty, then subsequent bound computations can be ignored.
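Every problem in this section is handled the same way: a formula is written to a text file, the file is passed to “java -jar solver.jar”, and the verdict is read from the trace. The following Python sketch scripts that loop. It assumes, as in the traces shown below, that the verdict appears as a line of the form “Formula is satisfiable” or “Formula is unsatisfiable”, and that a “java” executable is on the PATH; the helper names emptiness_formula, parse_verdict and run_solver are hypothetical and are not part of the solver distribution.

```python
import re
import subprocess
import tempfile
from pathlib import Path
from typing import Optional

# Hypothetical helpers around the command line "java -jar solver.jar <file>".
VERDICT = re.compile(r"Formula is (un)?satisfiable")


def emptiness_formula(xpath: str) -> str:
    """Formula that is unsatisfiable iff the XPath expression is always empty."""
    if '"' in xpath:
        raise ValueError('this sketch does not escape double quotes inside select("...")')
    return f'select("{xpath}")'


def parse_verdict(output: str) -> Optional[bool]:
    """Return True (satisfiable), False (unsatisfiable) or None (no verdict found)."""
    match = VERDICT.search(output)
    if match is None:
        return None
    return match.group(1) is None  # no "un" prefix means satisfiable


def run_solver(formula: str, jar: str = "solver.jar") -> Optional[bool]:
    """Write the formula to a temporary file, run the solver on it, parse the trace."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "formula.txt"
        path.write_text(formula + "\n", encoding="utf-8")
        result = subprocess.run(
            ["java", "-jar", jar, str(path)],
            capture_output=True,
            text=True,
            check=False,
        )
    return parse_verdict(result.stdout + result.stderr)
```

For instance, run_solver(emptiness_formula("a/b[following-sibling::c/parent::d]")) would be expected to return False, matching the unsatisfiable verdict reported for Example 1 below. A result of None only means that no verdict line was found in the output (for example, if the solver stopped with an error); a missing java executable raises FileNotFoundError instead.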
For checking the emptiness of the XPath expression a/b[following-sibling::c/parent::d], the “example1.txt” file simply consists of the following line:

File example1.txt:

    select("a/b[following-sibling::c/parent::d]")

Running the solver with “example1.txt” as parameter yields the following trace:

Output for example1.txt:

    Reading example1.txt
    Satisfiability Tested Formula: (mu X5.(((b & (mu X2.(<-1>(a & (mu X1.(<-1>T | <-2>X1))) | <-2>X2))) & (mu X4.(<2>((mu X3.(<-1>d | <-2>X3)) & c) | <2>X4)))|(<1>X5|<2>X5)))
    Computing Relevant Closure
    Computed Relevant Closure [1 ms].
    Computed Lean [1 ms].
    Lean size is 20. It contains 14 eventualities and 6 symbols.
    Computing Fixpoint.....[4 ms].
    Formula is unsatisfiable [14 ms].

The input XPath expression is first parsed and compiled into the logic, and the corresponding logical translation, whose satisfiability is going to be tested, is printed. The solver then computes the Fischer-Ladner closure and the Lean of the formula: the set of all basic subformulas, which notably defines the search space that is going to be explored by the solver (see [Genevès et al., 2007] for details). The solver attempts to build a satisfying tree in a bottom-up way, in the manner of a fixpoint computation that iteratively updates a set of tree nodes. This computation is performed in at most $2^{O(n)}$ steps, where $n$ is the size of the Lean. In this example, no satisfying tree is found: the formula is unsatisfiable (in other terms, no matter which XML document this XPath expression is evaluated on, it always yields an empty result).

¹ Running the command “java -jar solver.jar” prints the list of required and optional arguments.
² A Java virtual machine version 1.5.0 (or later) and a Java compiler compliance level of 5.0 (or later).
Intuitively, this is because the XPath expression of Example 1 contains a contradiction: according to the query, the same node (the common parent of the b and c nodes) is required to be named both “a” and “d”, which is not allowed for an XML tree.

Empty queries often come from the use of an XPath expression in a constrained setting. The combination of the navigational information of the query and the structural constraints imposed by a DTD (or XML Schema) may rapidly yield contradictions. Such contradictions can also be detected by checking a logical formula for satisfiability.

Example 2: checking XPath emptiness in the presence of tree constraints. Suppose we want to check the emptiness of the XPath expression

    descendant::switch[ancestor::head]/descendant::seq/descendant::audio[preceding-sibling::video]

over the set of documents defined by the DTD of the SMIL language [Hoschka, 1998]. The following formula is used:

File example2.txt:

    select("descendant::switch[ancestor::head]/descendant::seq/descendant::audio[preceding-sibling::video]",
           type("sampleDTDs/smil.dtd", "smil"))

The first argument of the predicate type() is a path to the DTD file (here the DTD is assumed to be located in a subdirectory called “sampleDTDs”), and the second argument is the name of the element to be considered as the top-level start symbol. Running the solver with this “example2.txt” file as parameter yields the following trace:

Output for example2.txt:

    Reading example2.txt
    Converted tree grammar into BTT [169 ms].
    Translated BTT into Tree Logic [60 ms].
    Satisfiability Tested Formula: (mu X22.(((audio & (mu X20.(<-1>((seq & (mu X19.(<-1>(((switch & (mu X17.(<-1>( (let_mu X1=(((meta & ~(<1>T)) & ~(<2>T)) | ((meta & ~(<1>T)) & <2>X1)), ... X16=((smil & (~(<1>T) | <1>X15)) & ~(<2>T)) in X16) | X17) | <-2>X17))) & (mu X18.(<-1>(head | X18) | <-2>X18))) | X19) | <-2>X19))) | X20) | <-2>X20))) & (mu X21.(<-2>video | <-2>X21))) | (<1>X22 | <2>X22)))
    Computing Relevant Closure
    Computed Relevant Closure [39 ms].
    Computed Lean [1 ms].
    Lean size is 50. It contains 31 eventualities and 19 symbols.
    Computing Fixpoint......[37 ms].
    Formula is satisfiable [99 ms].
    A satisfying finite binary tree model was found [52 ms]:
    smil(head(switch(seq(video(#, audio), layout), meta), #), #)

In XML syntax: (figure not reproduced)

The referred external DTD (tree grammar) is first parsed, converted into an internal representation on binary trees (called “BTT”, which corresponds to the mapping described in Section 3.1), and then compiled into the logic. The XPath expression is also parsed and compiled into the logic so that the global formula can be composed. In this case, the formula is satisfiable (the XPath expression is non-empty in the presence of this DTD). The solver outputs a sample tree for which the formula is satisfied. This sample tree is enriched with specific attributes: the “solver:target” attribute marks a sample node selected by the XPath expression when evaluated from a node marked with “solver:context”.

Example 3: checking containment and equivalence between XPath expressions. One of the most essential problems for a query language is the containment problem: whether or not the result of one query is always included in the result of another one. Containment for XPath expressions is needed, for instance, for the static type-checking of XPath queries, for the control-flow analysis of XSLT [Clark, 1999], for checking integrity constraints in XML databases, and for XML data security. Suppose, for example, that we want to check containment between the following XPath expressions:

    descendant::d[parent::b]/following-sibling::a

and:

    ancestor-or-self::*/descendant-or-self::b/a[preceding-sibling::d]

Since containment corresponds to logical implication, we actually want to check whether the implication of the two corresponding formulas is valid.
Since the solver tests satisfiability, we verify this validity by checking the unsatisfiability of the negated implication, as follows:

File example3.txt:

    ~( select("descendant::d[parent::b]/following-sibling::a",#) =>
       select("ancestor-or-self::*/descendant-or-self::b/a[preceding-sibling::d]",#))

Note that the XPath expressions must be compared from the same evaluation context, which can be any set of nodes but must be the same set of nodes for both expressions. This is denoted by “#”. Running the solver with this “example3.txt” file results in the following trace:

Output for example3.txt:

    Reading example3.txt
    Satisfiability Tested Formula: (mu X26.(((a & (mu X15.((<-2>T & (~(<-2>T) | <-2>((d & (mu X13.((<-1>T & (~(<-1>T) | <-1>(_context | X13))) | (<-2>T & (~(<-2>T) | <-2>X13))))) & (mu X14.((<-1>T & (~(<-1>T) | <-1>b)) | (<-2>T & (~(<-2>T) | <-2>X14))))))) | (<-2>T & (~(<-2>T) | <-2>X15))))) & ((~(a) | (mu X22.((~(<-1>T) | <-1>(~(b) | ((~(_context) & (~(<1>T) | <1>(mu X18.((~(_context) & (~(<1>T) | <1>X18)) & (~(<2>T) | <2>X18))))) & (mu X20.((~(<-1>T) | <-1>((~(_context) & (~(<1>T) | <1>(mu X19.((~(_context) & (~(<1>T) | <1>X19)) & (~(<2>T) | <2>X19))))) & X20)) & (~(<-2>T) | <-2>X20)))))) &(~(<-2>T) | <-2>X22)))) | (mu X25.((~(<-2>T) | <-2>~(d)) & (~(<-2>T) | <-2>X25))))) | (<1>X26 | <2>X26)))
    Computing Relevant Closure
    Computed Relevant Closure [4 ms].
    Computed Lean [1 ms].
    Lean size is 29. It contains 23 eventualities and 6 symbols.
    Computing Fixpoint.....[8 ms].
    Formula is unsatisfiable [22 ms].

The tested formula is unsatisfiable (in other terms, the implication is valid), so one can conclude that the first XPath expression is contained in the second XPath expression.

A related decision problem is the equivalence problem: whether or not two queries always return the same result.
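Both decision problems reduce to a single unsatisfiability test, so the formula files can be generated mechanically. The following Python sketch (hypothetical helper names, assuming the XPath expressions contain no double quotes) rebuilds the negated implication of example3.txt and, for equivalence, the analogous negated bi-implication written with the <=> connective of Figure 4. Both expressions are evaluated from the same context mark #, as required above.

```python
# Hypothetical helpers that build the formulas used in Example 3.


def select_formula(xpath: str, context: str = "#") -> str:
    """Embed an XPath expression evaluated from the given context formula."""
    if '"' in xpath:
        raise ValueError('this sketch does not escape double quotes inside select("...")')
    return f'select("{xpath}",{context})'


def containment_formula(e1: str, e2: str) -> str:
    """Unsatisfiable iff e1 is contained in e2 (both evaluated from context #)."""
    return f"~( {select_formula(e1)} => {select_formula(e2)})"


def equivalence_formula(e1: str, e2: str) -> str:
    """Unsatisfiable iff e1 and e2 always select the same nodes from context #."""
    return f"~( {select_formula(e1)} <=> {select_formula(e2)})"
```

Either string can be written to a file and passed to the solver: an unsatisfiable verdict means that containment (respectively equivalence) holds, while a satisfiable one comes with a counter-example tree.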
Equivalence is important for the reformulation and optimization of an expression, which aim at enforcing operational properties while preserving semantic equivalence. Equivalence is reducible to containment (bi-implication) and is noted <=> in the logic. Note that the previous XPath expressions are not equivalent. The reader may check this by using the solver, which generates a counter-example tree (not reproduced here).

3 Logical Insights

3.1 Data Model for the Logic

An XML document is considered as a finite tree of unbounded depth and arity, with two kinds of nodes: elements and attributes. In such a tree, an element may have any number of child elements, and may carry zero, one or more attributes. Attributes are leaves. Elements are ordered whereas attributes are not, as illustrated in Figure 1. The logic allows reasoning on such trees. Notice that, from an XML perspective, data values are ignored.

Figure 1: Sample XML Tree with Attributes. (Tree diagram and XML notation not reproduced; node labels a, b, c, d, e and r, s, t, u, v, w, x.)

Unranked and Binary Trees. There are bijective encodings between unranked trees (trees of unbounded arity) and binary trees. Owing to these encodings, binary trees may be used instead of unranked trees without loss of generality. The logic operates on binary trees. The logic relies on the “first-child & next-sibling” encoding of unranked trees. In this encoding, the first child of a node is preserved in the binary tree representation, whereas the siblings of this node are appended as right successors in the binary representation. The intuition of this encoding is illustrated in Figure 2 for a sample tree. Trees can be seen as terms or function calls.

Figure 2: Binary Encoding Principle. (Diagram not reproduced.)

Figure 3: Binary Encoding of Tree of Figure 1. (Diagram not reproduced.)

More formally, a binary tree $t$ can
be defined by the recursive syntax $t ::= \sigma(t, t') \mid \epsilon$, where $\sigma$ is a node label and $\epsilon$ denotes the empty tree. Similarly, unranked trees can be defined as $t ::= \sigma(h)$, where $h$ is a hedge (a sequence of unranked trees) defined as $h ::= \sigma(h), h' \mid \epsilon$. The function $f$ that translates unranked trees into binary trees is then defined by

$$f(\sigma(h), h') = \sigma(f(h), f(h')) \qquad f(\epsilon) = \epsilon$$

The reverse mapping, used for reconstructing unranked trees from binary trees, can be expressed as:

$$f^{-1}(\sigma(t, t')) = \sigma(f^{-1}(t)), f^{-1}(t') \qquad f^{-1}(\epsilon) = \epsilon$$

In the remaining part of this manual, the binary representation of a tree is implicitly considered, unless stated otherwise. From an XML point of view, notice that only the nested structure of XML elements (which are ordered) is encoded into binary form like this. XML attributes (which are unordered) are left unchanged by this encoding. For instance, Figure 3 presents how the sample tree of Figure 1 is mapped.

3.2 Syntax of Logical Formulas

Modal Formulas for Navigating in Trees. The logic uses two programs for navigating in binary trees: the program 1 allows navigating from a node down to its first successor, and the program 2 from a node down to its second successor. The logic also features converse programs -1 and -2 for navigating upward in binary trees, respectively from the first and second successors to the parent node. Some basic logical formulas together with corresponding satisfying binary trees are shown in Table 1. When using XPath expressions, such as select("a[b]"), the XPath expression is automatically compiled into a logical formula over the binary tree representation (see “Supported XPath Expressions” below). The set of logical formulas is defined by the syntax given in Figure 4, where the meta-syntax ⟨X⟩+ means one or more occurrences of X separated by commas.
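The mappings f and f⁻¹ of Section 3.1 translate directly into code. The following Python sketch uses its own hypothetical tuple representations (not the solver's internal “BTT” format) and ignores attributes, which the encoding leaves unchanged anyway. It also includes a small parser for the term notation in which the solver prints models (as in Example 2), assuming that “#” denotes the empty tree and that a bare label such as “audio” denotes a node with no successors.

```python
import re
from typing import List, Optional, Tuple

# Representations chosen for this sketch (hypothetical, not the solver's API):
#   unranked tree: (label, children), where children is a hedge (a list of trees)
#   binary tree:   (label, first_successor, second_successor); None is the empty tree
Hedge = List[Tuple[str, list]]
Binary = Optional[tuple]


def encode(hedge: Hedge) -> Binary:
    """f(σ(h), h') = σ(f(h), f(h')) and f(ε) = ε."""
    if not hedge:
        return None
    (label, children), rest = hedge[0], hedge[1:]
    return (label, encode(children), encode(rest))


def decode(tree: Binary) -> Hedge:
    """f⁻¹(σ(t, t')) = σ(f⁻¹(t)), f⁻¹(t') and f⁻¹(ε) = ε."""
    if tree is None:
        return []
    label, first, second = tree
    return [(label, decode(first))] + decode(second)


def to_xml(hedge: Hedge) -> str:
    """Serialize element-only trees (attributes and text are not modelled)."""
    return "".join(
        f"<{label}>{to_xml(children)}</{label}>" if children else f"<{label}/>"
        for label, children in hedge
    )


def parse_term(text: str) -> Binary:
    """Parse a printed binary model such as smil(head(...), #).

    Assumption: '#' is the empty tree and a bare label is a leaf label(#, #).
    """
    tokens = re.findall(r"[^\s(),]+|[(),]", text)
    pos = 0

    def expect(symbol: str) -> None:
        nonlocal pos
        if pos >= len(tokens) or tokens[pos] != symbol:
            raise ValueError(f"expected {symbol!r} at token {pos}")
        pos += 1

    def term() -> Binary:
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of term")
        token = tokens[pos]
        pos += 1
        if token in ("(", ")", ","):
            raise ValueError(f"unexpected {token!r} at token {pos - 1}")
        if token == "#":
            return None
        if pos < len(tokens) and tokens[pos] == "(":
            expect("(")
            first = term()
            expect(",")
            second = term()
            expect(")")
            return (token, first, second)
        return (token, None, None)

    tree = term()
    if pos != len(tokens):
        raise ValueError("trailing tokens after term")
    return tree
```

Under these assumptions, decoding the model printed in Example 2 gives <smil><head><switch><seq><video/><audio/></seq><layout/></switch><meta/></head></smil>, without the solver:target and solver:context attributes. The recursion in encode and decode follows the equations literally; for very long sibling lists an iterative version would avoid Python's recursion limit.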
Models of a formula are finite binary trees for which the formula is satisfied at some node. The semantics of logical formulas is formally defined in [Genevès, 2006; Genevès et al., 2007]. Table 1 gives basic formulas that use modalities for navigating in binary trees and node names.

Table 1: Sample Formulas using Modalities.

    Sample formula            Satisfying binary tree    XML syntax
    a & <1>b                  a(b, #)                   <a><b/></a>
    a & <2>b                  a(#, b)                   <a/><b/>
    a & <1>(b & <2>c)         a(b(#, c), #)             <a><b/><c/></a>
    e & <-1>(d & <2>g)        d(e, g)                   <d><e/></d><g/>
    f & <-2>(g & ~<2>T)       none                      none

Figure 4: Syntax of Logical Formulas.

    ϕ ::=                          formula
          T                        true
        | F                        false
        | l                        element name
        | p                        atomic proposition
        | #                        start context
        | ϕ | ϕ                    disjunction
        | ϕ & ϕ                    conjunction
        | ϕ => ϕ                   implication
        | ϕ <=> ϕ                  equivalence
        | (ϕ)                      parenthesized formula
        | ~ϕ                       negation
        | <p>ϕ                     existential modality
        | <l>T                     attribute named l
        | $X                       variable
        | let ⟨$X = ϕ⟩+ in ϕ       binder for recursion
        | predicate                predicate (see Figure 5)

    p ::=                          program inside modalities
          1                        first child
        | 2                        next sibling
        | -1                       parent
        | -2                       previous sibling

Recursive Formulas. The logic allows expressing recursion in trees through the use of a fixpoint operator. For example, the recursive formula:

    let $X = b | <2>$X in $X

means that either the current node is named b or there is a following sibling of the current node which is named b. For this purpose, the variable $X is bound to the subformula b | <2>$X, which contains an occurrence of $X (therefore defining the recursion). The scope of this binding is the subformula that follows the “in” symbol of the formula, that is $X. The entire formula can thus be seen as a compact recursive notation for an infinitely nested formula of the form:

    b | <2>(b | <2>(b | <2>(...)))

Recursion allows expressing global properties.
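To make the satisfaction relation concrete, the following Python sketch evaluates the core of Figure 4 (labels, boolean connectives, the four modalities and a single-variable recursive binder) on one given finite binary tree, returning the nodes where a formula holds. It uses hypothetical tuple encodings, does not cover attributes, predicates or multi-variable let, and reads let $X = ϕ in $X as a least fixpoint computed by iteration. This is an assumption of the sketch, not a description of the solver, which searches for a model rather than checking a fixed one.

```python
from typing import Dict, FrozenSet, Optional

# Hypothetical representations for this sketch:
#   binary tree: (label, first_successor, second_successor); None is the empty tree
#   formula:     ("T",) | ("F",) | ("label", l) | ("not", f) | ("and", f, g)
#                | ("or", f, g) | ("dia", p, f) with p in {1, 2, -1, -2}
#                | ("var", "X") | ("mu", "X", f)   # mu stands for: let $X = f in $X


def satisfying(tree, formula) -> FrozenSet[int]:
    """Preorder indices (root = 0) of the nodes of a non-empty tree where formula holds."""
    labels, succ, parent = [], [], []

    def number(node, via):
        index = len(labels)
        labels.append(node[0])
        succ.append({})
        parent.append(via)  # (parent index, program leading here), or None for the root
        for program, child in ((1, node[1]), (2, node[2])):
            if child is not None:
                succ[index][program] = number(child, (index, program))
        return index

    number(tree, None)
    nodes = frozenset(range(len(labels)))

    def step(i: int, program: int) -> Optional[int]:
        if program > 0:  # 1 or 2: go down to the first or second successor
            return succ[i].get(program)
        via = parent[i]  # -1 or -2: go up only along the converse of the edge taken
        return via[0] if via is not None and via[1] == -program else None

    def ev(f, env: Dict[str, FrozenSet[int]]) -> FrozenSet[int]:
        tag = f[0]
        if tag == "T":
            return nodes
        if tag == "F":
            return frozenset()
        if tag == "label":
            return frozenset(i for i in nodes if labels[i] == f[1])
        if tag == "not":
            return nodes - ev(f[1], env)
        if tag == "and":
            return ev(f[1], env) & ev(f[2], env)
        if tag == "or":
            return ev(f[1], env) | ev(f[2], env)
        if tag == "dia":
            target = ev(f[2], env)
            return frozenset(i for i in nodes if step(i, f[1]) in target)
        if tag == "var":
            return env[f[1]]
        if tag == "mu":
            # Least fixpoint by iteration from the empty set. Assumes $X occurs only
            # positively (under an even number of negations), so each step is
            # monotone and the finite tree guarantees termination.
            current: FrozenSet[int] = frozenset()
            while True:
                updated = ev(f[2], {**env, f[1]: current})
                if updated == current:
                    return current
                current = updated
        raise ValueError(f"unknown formula tag {tag!r}")

    return ev(formula, {})
```

The rows of Table 1 can be replayed this way: for example, the tree a(b, #), written ("a", ("b", None, None), None), satisfies a & <1>b at its root, and let $X = b | <2>$X in $X holds exactly at the nodes that are named b or have a following sibling named b.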
As an example of a global property, the recursive formula:

    ~ let $X = a | <1>$X | <2>$X in $X

expresses the absence of nodes named a in the whole subtree of the current node (including the current node). Furthermore, the fixpoint operator makes it possible to bind several variables at a time, which is especially useful for expressing mutual recursion. For example, the mutually recursive formula:

    let $X = (a & <2>$Y) | <1>$X | <2>$X, $Y = b | <2>$Y in $X

asserts that there is a node somewhere in the subtree such that this node is named a and it has at least one following sibling which is named b. Binding several variables at a time provides a very expressive yet succinct notation for expressing mutually recursive structural patterns (that may occur in DTDs, for instance).

The combination of modalities and recursion makes the logic one of the most expressive (yet decidable) logics known. For instance, regular tree grammars can be expressed with the logic using recursion and (forward) modalities. The combination of converse programs and recursion allows expressing properties about the ancestors of a node, for instance. The possibility of nesting recursive formulas allows XPath expressions to be translated into the logic.

Cycle-Freeness Restriction. There is a restriction on the use of recursive formulas: only formulas that are cycle-free are allowed. Intuitively, a formula is cycle-free if it does not contain both a program and its converse inside the same recursion. For instance, the formula

    let $X = a | <-1>$X | <1>$X in $X

is not cycle-free, since 1 and -1 occur in front of the same variable bound by the same binder. A formula is cycle-free if one cannot find both a program and its converse by starting from a variable and going up in the formula tree to the binder of this variable.
For instance, the following formula is cycle-free:

    let $X = a & (let $X = b | <1>$X in $X) | <-1>$X in $X

since variable binders are properly nested and a program and its converse never appear in front of the same variable bound by the same binder. Translations of XPath expressions and XML tree types into the logic always generate cycle-free formulas, whatever the translated XPath expression or XML type is. The cycle-freeness restriction only matters when one directly writes recursive logical formulas. From a theoretical perspective, the cycle-freeness restriction comes from the fact that converse programs may interact with recursion in a subtle manner such that the finite model property is lost; the restriction ensures that the negation of every formula can also be expressed in the logic, or in other terms, that the logic is closed under negation and all other boolean operations (a detailed discussion of this topic can be found in [Genevès et al., 2007]).

Figure 5: Syntax of Predicates for XML Reasoning.

    predicate ::= select("query")
                | select("query", ϕ)
                | exists("query")
                | exists("query", ϕ)
                | type("f", l)
                | type("f", l, ϕ, ϕ′)
                | forward_incompatible(ϕ, ϕ′)
                | backward_incompatible(ϕ, ϕ′)
                | element(ϕ)
                | attribute(ϕ)
                | descendant(ϕ)
                | exclude(ϕ)
                | added_element(ϕ, ϕ′)
                | added_attribute(ϕ, ϕ′)
                | non_empty("query", ϕ)
                | new_element_name("query", "f", "f′", l)
                | new_region("query", "f", "f′", l)
                | new_content("query", "f", "f′", l)
                | predicate-name(⟨ϕ⟩+)

Figure 6: Global Syntax for Specifying Problems.

    spec ::= ϕ                                 formula (see Figure 4)
           | def ; ϕ
    def  ::= predicate-name(⟨l⟩+) = ϕ′         custom definition
           | def ; def                         list of definitions
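The informal cycle-freeness rule stated earlier in this section can be checked mechanically when writing recursive formulas by hand. The following Python sketch is a literal, simplified reading of that rule, assuming a hypothetical tuple encoding of formulas: for every variable bound by a given let, it collects the programs of the modalities found between each occurrence and its binder, and rejects the formula if a program and its converse both appear. It reproduces the two examples above, but it is not the formal definition from [Genevès et al., 2007] (for instance, it does not follow mutual recursion between variables of the same binder), so it should be taken as an aid rather than an authoritative check.

```python
from collections import defaultdict
from itertools import count

# Hypothetical formula representation for this sketch:
#   ("T",) | ("F",) | ("label", l) | ("not", f) | ("and", f, g) | ("or", f, g)
#   ("dia", p, f) with p in {1, 2, -1, -2} | ("var", "X")
#   ("let", [("X", f), ...], body)   # let $X = f, ... in body


def is_cycle_free(formula) -> bool:
    """No (binder, variable) pair may collect both a program and its converse
    on the paths from the variable's occurrences up to its binder."""
    seen = defaultdict(set)  # (binder id, variable) -> programs met on the way up
    fresh = count()

    def walk(f, scope, programs):
        # scope maps a variable to (binder id, number of enclosing modalities at the binder)
        tag = f[0]
        if tag == "var":
            if f[1] not in scope:
                raise ValueError(f"unbound variable ${f[1]}")
            binder, depth = scope[f[1]]
            seen[(binder, f[1])].update(programs[depth:])
        elif tag == "dia":
            walk(f[2], scope, programs + [f[1]])
        elif tag == "not":
            walk(f[1], scope, programs)
        elif tag in ("and", "or"):
            walk(f[1], scope, programs)
            walk(f[2], scope, programs)
        elif tag == "let":
            binder = next(fresh)
            inner = dict(scope)
            for name, _ in f[1]:
                inner[name] = (binder, len(programs))
            for _, definition in f[1]:
                walk(definition, inner, programs)
            walk(f[2], inner, programs)
        # "T", "F" and "label" contain no variables

    walk(formula, {}, [])
    return all(-p not in progs for progs in seen.values() for p in progs)
```

With this encoding, the first formula of the “Cycle-Freeness Restriction” paragraph is rejected because the pair for $X collects both 1 and -1, whereas the nested example is accepted because its inner and outer bindings of $X collect {1} and {-1} separately.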
Supported XPath Expressions. The logic provides high-level constructions for facilitating the formulation of problems involving XPath expressions. The construct select("e", ϕ), where e is an XPath expression, provides a way of embedding XPath expressions directly into the logic (e is automatically compiled into a logical formula; see [Genevès et al., 2007] for details on the compilation technique). The second parameter ϕ denotes the context from which the XPath expression is applied; it can be any formula. The other construct, select("e"), is simply a shorthand for select("e", #), where # is the initial context node mark. The syntax of supported XPath expressions is given in Figure 7. We observed that, in practice, many XPath expressions contain syntactic sugar that can also fit into this fragment. Figure 8 presents how our XPath parser rewrites some commonly found XPath patterns into the fragment of Figure 7, where the notation (a::nt)^k stands for the composition of k successive path steps of the same form: a::nt/.../a::nt (k steps).

Figure 7: XPath Expressions.

    query ::= /path                        absolute path
            | path                         relative path
            | query | query                union
            | query ∩ query                intersection
    path ::= path/path                     path composition
           | path[qualifier]               qualified path
           | a::nt                         step
    qualifier ::= qualifier and qualifier  conjunction
                | qualifier or qualifier   disjunction
                | not(qualifier)           negation
                | path                     path
                | path/@nt                 attribute path
                | @nt                      attribute step
    nt ::= σ                               node label
         | *                               any node label
    a ::=                                  tree navigation axis
          self | child | parent | descendant | ancestor
        | descendant-or-self | ancestor-or-self
        | following-sibling | preceding-sibling
        | following | preceding

Supported XML Types. The logic is expressive enough to allow for the encoding of any regular tree grammar.
The logical construction type("filename", start) provides a convenient way of referring to tree grammars written in usual notations like DTD, XML Schema, or Relax NG. The referred tree type is automatically parsed and compiled into the logic, starting from the given start symbol (which can be the root symbol or any other symbol defined by the tree type).

Figure 8: Syntactic Sugars and their Rewritings.

    nt[position() = 1]                  →  nt[not(preceding-sibling::nt)]
    nt[position() = last()]             →  nt[not(following-sibling::nt)]
    nt[position() = k], with k > 1      →  nt[(preceding-sibling::nt)^(k-1)]
    count(path) = 0                     →  not(path)
    count(path) > 0                     →  path
    count(nt) > k, with k > 0           →  nt/(following-sibling::nt)^k
    preceding-sibling::*[position() = last() and qualifier]
                                        →  preceding-sibling::*[not(preceding-sibling::*) and qualifier]

3.3 Predicates

We build on the aforementioned query and schema compilers, and define additional predicates that facilitate the formulation of decision problems at a higher level of abstraction. Specifically, these predicates are introduced as logical macros with the goal of allowing users to work with the system while focusing (only) on the XML-side properties, keeping underlying logical issues transparent for the user. Ultimately, we regard the set of basic logical formulas (such as modalities and recursive binders) as an assembly language, to which predicates are translated. Some built-in predicates include:

- ...
- the predicate exclude(ϕ), which is satisfiable iff there is no node that satisfies ϕ in the whole tree. This predicate can be used for excluding specific element names or even nodes selected by a given XPath expression;
- the predicate element(T), which builds the disjunction of all element names occurring in T;
- the predicate descendant(ϕ), which forces the existence of a node satisfying ϕ in the subtree; and
- predicate-name(⟨ϕ⟩+), which is a call to a custom predicate, as explained in the next section.

3.4 Custom Predicates

Following the spirit of the predicates presented in the previous section, users may also define their own custom predicates. The full syntax of XML logical specifications to be used with the system is defined in Figure 6, where the meta-syntax ⟨X⟩+ means one or more occurrences of X separated by commas. A global problem specification can be any formula (as defined in Figure 4), or a list of custom predicate definitions separated by semicolons and followed by a formula. A custom predicate may have parameters that are instantiated with actual formulas when the custom predicate is called (as shown in Figure 5). A formula bound to a custom predicate may include calls to other predicates, but not to the currently defined predicate (recursive definitions must be made through the let binder shown in Figure 4).

4 Overview of the Background Theory

The logic and its solver are formally described in [Genevès, 2006; Genevès et al., 2007]. The logic is a modal logic of trees, more specifically an alternation-free µ-calculus with converse for finite trees. The logic is equipped with forward and backward modalities, which are notably useful for capturing all XPath axes (including reverse axes). The logic is also equipped with a fixed-point operator for expressing recursion in finite trees. An n-ary fixed-point operator is also provided so that mutual recursion occurring in XML types can be succinctly expressed in the logic. The logic is also able to express any propositional property, for instance about node labels (XML element and attribute names). Last but not least, the logic is closed under negation [Genevès, 2006; Genevès et al., 2007], that is, the negation of any logical formula can be expressed in the logic too (this property is essential for checking XPath containment, which corresponds to logical implication). All these features together (propositions, forward and backward modalities, recursion through fixed-point operators, and boolean connectives) yield a logic of very high expressive power. In fact, this logic is one of the most expressive yet decidable logics known. It can express properties of regular tree languages. Specifically, it is as expressive as tree automata (which notably provide the foundation for the Relax NG language in the XML world) and monadic second-order logic of finite trees (often referred to as WS2S or “MSO” in the literature) [Thatcher and Wright, 1968; Doner, 1970]. However, the logical solver is considerably (orders of magnitude) faster than solvers for monadic second-order logic, such as the MONA solver [Klarlund et al., 2001] (the MONA solver nevertheless remains useful when one wants to write logical formulas using MSO syntax). Technically, the truth status of a logical formula (satisfiable or unsatisfiable) is automatically determined in exponential time, more specifically in time $2^{O(n)}$, where $n$ is proportional to (and smaller than) the size of the logical formula [Genevès, 2006; Genevès et al., 2007]. In comparison, the complexity of monadic second-order logic is much higher: it was proved in the late 1960s that the best decision procedure for monadic second-order logic is at least hyper-exponential in the size of the formula [Thatcher and Wright, 1968; Doner, 1970], that is, not bounded by any stack of exponentials (non-elementary). The tree logic described in this document currently offers the best known balance between expressivity and complexity for decidability.
The astute reader may notice that the complexity of the logic is optimal, since it subsumes tree automata and less expressive logics such as CTL [Clarke and Emerson, 1981], for instance. XPath expressions and regular tree types can be linearly translated into the logic. This observation allows the complexity of the algorithm for solving the logic to be generalized to a wide range of problems in the XML world.

The decision procedure for the logic is based on an inverse tableau method that searches for a satisfying tree. The algorithm has been proved sound and complete in [Genevès, 2006; Genevès et al., 2007]. The solver is implemented using symbolic techniques such as binary decision diagrams (BDDs) [Bryant, 1986]. It also uses numerous optimization techniques such as on-the-fly formula normalization and simplification, conjunctive partitioning, and early quantification. Finally, another benefit of this method (illustrated in Section 2) is that the solver can be used to generate an example (or counter-example) XML tree for a given property, which makes it possible, for instance, to reproduce a program's bug in the developer environment, independently of the logical solver.

References

[Bray et al., 2004] Tim Bray, Jean Paoli, C. M. Sperberg-McQueen, Eve Maler, and François Yergeau. Extensible markup language (XML) 1.0 (third edition), W3C recommendation, February 2004. http://www.w3.org/TR/2004/REC-xml-20040204/.

[Bryant, 1986] Randal E. Bryant. Graph-based algorithms for boolean function manipulation. IEEE Transactions on Computers, 35(8):677–691, 1986.

[Clark and DeRose, 1999] James Clark and Steve DeRose. XML path language (XPath) version 1.0, W3C recommendation, November 1999. http://www.w3.org/TR/1999/REC-xpath-19991116.

[Clark and Murata, 2001] James Clark and Makoto Murata. RELAX NG specification, OASIS committee specification, December 2001. http://relaxng.org/spec-20011203.html.

[Clark, 1999] James Clark. XSL transformations (XSLT) version 1.0, W3C recommendation, November 1999. http://www.w3.org/TR/1999/REC-xslt-19991116.

[Clarke and Emerson, 1981] Edmund M. Clarke and E. Allen Emerson. Design and synthesis of synchronization skeletons using branching-time temporal logic. In Logic of Programs, Workshop, volume 131 of LNCS, pages 52–71, London, UK, 1981. Springer-Verlag.

[Doner, 1970] John Doner. Tree acceptors and some of their applications. Journal of Computer and System Sciences, 4:406–451, 1970.

[Fallside and Walmsley, 2004] David C. Fallside and Priscilla Walmsley. XML Schema part 0: Primer second edition, W3C recommendation, October 2004. http://www.w3.org/TR/xmlschema-0/.

[Genevès et al., 2007] Pierre Genevès, Nabil Layaïda, and Alan Schmitt. Efficient static analysis of XML paths and types. In PLDI '07: Proceedings of the 2007 ACM SIGPLAN Conference on Programming Language Design and Implementation, pages 342–351, New York, NY, USA, 2007. ACM Press.

[Genevès, 2006] Pierre Genevès. Logics for XML. PhD thesis, Institut National Polytechnique de Grenoble, December 2006. http://www.pierresoft.com/pierre.geneves/phd.htm.

[Hoschka, 1998] Philipp Hoschka. Synchronized multimedia integration language (SMIL) 1.0 specification, W3C recommendation, June 1998. http://www.w3.org/TR/REC-smil/.

[Klarlund et al., 2001] Nils Klarlund, Anders Møller, and Michael I. Schwartzbach. MONA 1.4, January 2001. http://www.brics.dk/mona/.

[Thatcher and Wright, 1968] James W. Thatcher and Jesse B. Wright. Generalized finite automata theory with an application to a decision problem of second-order logic. Mathematical Systems Theory, 2(1):57–81, 1968.