We introduce a controlled form of recursion in XQuery, inflationary fixed points, familiar from the context of relational databases. This imposes restrictions on the expressible types of recursion, but we show that inflationary fixed points are nevertheless versatile enough to capture a wide range of interesting use cases, including the semantics of Regular XPath and its core transitive closure construct. While the optimization of general user-defined recursive functions in XQuery appears elusive, we describe how inflationary fixed points can be evaluated efficiently, provided that the recursive XQuery expressions exhibit a distributivity property. We show how distributivity can be assessed both syntactically and algebraically, and provide experimental evidence that XQuery processors can benefit substantially during inflationary fixed point evaluation.
The backbone of the XML data model, namely ordered, unranked trees of nodes, is inherently recursive, and it is natural to equip the associated languages with constructs that can query such recursive structures. In getting from the recursive axes in XPath, e.g., ancestor and descendant, to XQuery's [7] recursive user-defined functions, however, language designers took a giant leap. User-defined functions in XQuery admit arbitrary types of recursion, a construct that largely evades optimization approaches beyond "procedural" improvements like tail-recursion elimination or unfolding.
This paper embarks on a journey that explores a controlled form of recursion in XQuery, the inflationary fixed point (IFP), familiar in the context of relational databases [1]. While this imposes restrictions on the expressible types of recursion, IFP embraces a family of widespread use cases of recursion in XQuery, including many forms of horizontal or vertical structural recursion and the pervasive transitive closure problem (IFP captures Regular XPath [25], in particular).
Example 1.1 The DTD of Figure 1 (taken from [22]) describes recursive curriculum data, including courses, their lists of prerequisite courses, the prerequisites of the latter, and so on. The XQuery program of Figure 2 uses the course element node with code “c1” to seed a computation that recursively finds all prerequisite courses, direct or indirect, of course “c1”. For a given sequence $x of course nodes, function fix(•) calls out to rec(•) to find their prerequisites. While new nodes are encountered, fix(•) calls itself with the accumulated course node sequence. (This is not expressible in XPath 2.0.)
Note that fix(•) implements a generic fixed point computation: only the initialization (let $seed := ...) and the payload function rec(•) are specific to the curriculum problem. This motivates the introduction of a syntactic form that can succinctly accommodate this pattern of computation (Section 2). Most importantly, however, such computation in IFP form is susceptible to systematic optimization, provided that the payload (or body) of the recursion exhibits a specific distributivity property.
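The recursion pattern of Example 1.1 can be illustrated with a minimal Python sketch. It is not the XQuery program of Figure 2: the curriculum data, the dictionary `PREREQS`, and the way the seed is passed in are assumptions made for illustration, and Python sets stand in for duplicate-free node sequences (document order is ignored).

```python
# Hypothetical curriculum: course code -> codes of its direct prerequisites.
PREREQS = {
    "c1": ["c2", "c3"],
    "c2": ["c4"],
    "c3": ["c4"],
    "c4": [],
}


def rec(courses):
    """Payload: the direct prerequisites of the given courses."""
    return {p for c in courses for p in PREREQS.get(c, [])}


def fix(x):
    """Generic part: keep accumulating until rec contributes nothing new."""
    result = set(x)
    while True:
        grown = result | rec(result)
        if grown == result:
            return result
        result = grown


# Only the seed and the payload are specific to the curriculum problem.
seed = {"c1"}
all_prereqs = fix(rec(seed))
```

As in the example, `fix` knows nothing about courses; swapping in another `rec` and another seed would reuse the same loop unchanged.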
Unlike general user-defined XQuery functions, this account of recursion puts the query processor in control: it can decide whether the optimization may be safely applied. Distributivity may be assessed on a syntactic level, a non-invasive approach that can easily be realized on top of existing XQuery processors (Section 3). Moreover, if we adopt a relational view of the XQuery semantics (as in [15]), the seemingly XQuery-specific distributivity notion turns out to be elegantly and uniformly tractable on the familiar algebraic level (Section 4).
Compliance with the restriction that IFP imposes on query formulation is rewarded by significant query runtime savings that the IFP-inherent optimization hook can offer. We document the effect for the XQuery processors MonetDB/XQuery [8] and Saxon [20] in Section 5. The savings are primarily due to a substantial reduction of the number of items that are fed into the recursion’s payload function (the naïve implementation of Example 1.1 feeds already discovered course element nodes back into rec(•)).
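The effect described above, fewer items fed into the payload, can be made tangible with a small Python sketch. The `delta` variant below is an assumption on our part: it follows the semi-naive (delta) evaluation scheme known from relational databases and passes only newly discovered items to the payload. It is only safe when the payload distributes over union, as the item-wise `CountingRec` below does. The data and all names are hypothetical.

```python
# Hypothetical prerequisite chain: c1 requires c2, c2 requires c3, and so on.
PREREQS = {"c1": ["c2"], "c2": ["c3"], "c3": ["c4"], "c4": ["c5"], "c5": []}


class CountingRec:
    """Payload that also counts how many items it has been fed in total."""

    def __init__(self):
        self.fed = 0

    def __call__(self, courses):
        self.fed += len(courses)
        return {p for c in courses for p in PREREQS.get(c, [])}


def naive(rec, seed):
    """Feed the whole accumulated result back into the payload each round."""
    result = set(rec(seed))
    while True:
        grown = result | rec(result)
        if grown == result:
            return result
        result = grown


def delta(rec, seed):
    """Feed only the items discovered in the previous round (assumed scheme)."""
    result = set(rec(seed))
    new = set(result)
    while new:
        new = rec(new) - result
        result |= new
    return result


naive_rec, delta_rec = CountingRec(), CountingRec()
naive_result = naive(naive_rec, {"c1"})
delta_result = delta(delta_rec, {"c1"})
```

On this chain both variants return the same four courses, but the naive loop feeds 11 items into the payload while the delta loop feeds 5; the gap widens with the depth of the recursion.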
In Section 6, we review related work on recursion on both the XQuery and the relational side of the fence, and we wrap up in Section 7.
The subsequent discussion will revolve around the recursion pattern embodied by function fix(•) of Figure 2, known as the inflationary fixed point (IFP) [1]. We will introduce a new syntactic form that expresses IFP on the XQuery language level and then explore its semantics, application, and optimization in the XQuery context.
In the following, we regard an XQuery expression e_1 containing a free variable $x as a function of $x. We write e_1(e_2) to denote e_1[e_2/$x], i.e., the consistent replacement of all free occurrences of $x in e_1 by e_2. Function fv(e) returns the set of free variables of expression e. We further introduce set-equality (\(\stackrel{s}{=}\)), a relaxed notion of equality for XQuery item sequences that disregards duplicate items and order, e.g., (1,“a”) \(\stackrel{s}{=}\) (“a”,1,1). To streamline the discussion, in the following we assume computations over sequences of type node()* as trees are the recursive data structure in the XQuery Data Model. In this case, with X_1, X_2 of type node()*, we have
\[ X_1 \stackrel{s}{=} X_2 \;\Leftrightarrow\; \texttt{fs:ddo}(X_1) = \texttt{fs:ddo}(X_2) . \]
Here and in the following, fs:ddo(•) abbreviates the function fs:distinct-doc-order(•) of the XQuery Formal Semantics [9].
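The set-equality notion just introduced can be sketched in Python. This is an illustration under simplifying assumptions, not part of the formal development: nodes are modeled as integers that encode their document-order position, and the helper names `set_equal` and `ddo` are hypothetical stand-ins.

```python
def set_equal(xs, ys):
    """Set-equality: compare item sequences, ignoring duplicates and order."""
    return set(xs) == set(ys)


def ddo(nodes):
    """Stand-in for fs:ddo: drop duplicates, then sort in document order.

    Assumption: each node is an int giving its document-order position.
    """
    return sorted(set(nodes))


# The example from the text: (1, "a") is set-equal to ("a", 1, 1).
text_example = set_equal((1, "a"), ("a", 1, 1))

# For node sequences, set-equality coincides with equality after ddo.
x1, x2 = [3, 1, 3, 2], [2, 3, 1]
nodes_agree = set_equal(x1, x2) == (ddo(x1) == ddo(x2))
```

Comparing the `ddo` results with plain list equality mirrors the right-hand side of the equivalence above, where ordinary sequence equality suffices once duplicates and order have been normalized.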
An extension to general sequences of type item()* is possible and entails the replacement of XQuery’s node set operations (union, except) with appropriate variants. The payload expression e_rec is called the body, e_seed is called the seed, and $x is called the recursion variable of the inflationary fixed point operator.
The semantics of the IFP of e_rec($x) seeded by e_seed is the sequence of nodes res_k, i