Toward Security Verification against Inference Attacks on Data Trees


This paper describes our ongoing work on security verification against inference attacks on data trees. We focus on infinite secrecy against inference attacks, which means that attackers cannot narrow the candidates for the value of the sensitive information down to a finite set using the information available to them. Our purpose is to propose a model under which infinite secrecy is decidable. To be specific, we first propose tree transducers which are expressive enough to represent practical queries. Then, in order to represent attackers’ knowledge, we propose data tree types such that type inference and inverse type inference on those tree transducers are possible with respect to data tree types, and infiniteness of data tree types is decidable.

💡 Research Summary

The paper tackles the problem of inference attacks on XML‑style data trees by introducing a novel security notion called infinite secrecy. Infinite secrecy holds when, even after an adversary has access to all authorized queries, their results, and the schema of the data tree, the set of possible values for a sensitive attribute remains infinite, so the adversary cannot narrow the value down to a finite set of candidates. This contrasts with earlier access‑control approaches that only reason about finite label sets and cannot handle unbounded data domains such as integers or rational numbers.

To make infinite secrecy decidable, the authors construct a three‑layer framework:

1. Query Representation via Deterministic Tree Transducers
They define seven families of deterministic tree transducers that together can express a wide range of practical XML queries: top‑down relabeling, bottom‑up relabeling, node deletion, data‑rewriting (changing the data value attached to a given label), data‑relabeling (changing a label conditioned on a data value), min‑data‑relabeling, and max‑data‑relabeling. The last two simulate natural‑join operations by marking the tuple(s) that carry the minimum or maximum data value for a given label. A query is a composition of zero or more of these transducers, with the restriction that a deleting transducer must appear only at the end and that no unauthorized query may produce the special “♯” label. This restriction guarantees that the transformation of a set of input trees can be captured by a non‑deterministic finite tree automaton (NFT‑A), which is essential for the subsequent type analysis.
2. Attacker Knowledge as Data‑Tree Types
The knowledge an attacker possesses is modeled by data‑tree types, each being a finite union of atomic data‑tree types. An atomic type consists of (i) an NFT‑A describing the shape of admissible trees, (ii) a mapping θ from automaton states and node labels to variables, and (iii) a finite set of conditional expressions over those variables. Two kinds of variables are introduced:
- S‑variables (standard) enforce that all nodes sharing the same variable must have identical data values.
- M‑variables (multiple) allow nodes to have different data values, but those values must satisfy relational constraints (e.g., ordering, set inclusion) expressed in the condition set.
This dual‑variable scheme enables the type system to capture both exact data equality (useful for projection and selection) and relational constraints (crucial for inverse type inference of data‑rewriting transducers, where original values are overwritten).
3. Type Inference, Inverse Type Inference, and Infiniteness Decision
The authors define type inference as computing, given a transducer τ and an input type T_in, the output type T_out that over‑approximates all possible results τ can produce from trees in T_in. Conversely, inverse type inference computes, given τ and a desired output type T_out, the set of input trees that could lead to an output in T_out. Both procedures are realized algorithmically by manipulating the underlying NFT‑A, the θ mapping, and the condition set, while respecting the semantics of S‑ and M‑variables. Crucially, the inverse inference algorithm can handle data‑rewriting transducers because M‑variables can represent the unknown original values that were overwritten.
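
The query layer described above (a composition of transducers in which a deleting step may come only last) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the paper's formal definition: the `Node` and `Transducer` classes, the helper names, the label `min_scholarship`, and the sample data are all hypothetical, and deletion is assumed here to remove a node together with its subtree.

```python
from dataclasses import dataclass
from typing import Callable, Iterator, Optional, Tuple, Union

Data = Union[int, str, None]


@dataclass(frozen=True)
class Node:
    """A data-tree node: a label, an optional data value, ordered children."""
    label: str
    data: Data = None
    children: Tuple["Node", ...] = ()


@dataclass(frozen=True)
class Transducer:
    """A named tree-to-tree function; `deleting` flags node-deleting steps."""
    name: str
    run: Callable[[Node], Node]
    deleting: bool = False


def walk(tree: Node) -> Iterator[Node]:
    """Yield every node of the tree in pre-order."""
    yield tree
    for child in tree.children:
        yield from walk(child)


def map_nodes(tree: Node, f: Callable[[Node], Tuple[str, Data]]) -> Node:
    """Rewrite (label, data) at every node while preserving the tree shape."""
    label, data = f(tree)
    return Node(label, data, tuple(map_nodes(c, f) for c in tree.children))


def data_rewriting(label: str, value: Data) -> Transducer:
    """Overwrite the data value of every node carrying `label`."""
    def run(t: Node) -> Node:
        return map_nodes(t, lambda n: (n.label, value if n.label == label else n.data))
    return Transducer(f"rewrite[{label}]", run)


def min_data_relabeling(label: str, marked: str) -> Transducer:
    """Relabel the node(s) holding the minimum data value among `label` nodes."""
    def run(t: Node) -> Node:
        values = [n.data for n in walk(t) if n.label == label and n.data is not None]
        if not values:
            return t
        lowest = min(values)
        return map_nodes(
            t,
            lambda n: (marked if n.label == label and n.data == lowest else n.label, n.data),
        )
    return Transducer(f"min[{label}]", run)


def deleting(label: str) -> Transducer:
    """Assumption: drop every non-root node with `label`, including its subtree."""
    def run(t: Node) -> Node:
        return Node(t.label, t.data, tuple(run(c) for c in t.children if c.label != label))
    return Transducer(f"delete[{label}]", run, deleting=True)


def compose(*steps: Transducer) -> Callable[[Node], Node]:
    """Compose transducers left to right; a deleting step is allowed only last."""
    if any(s.deleting for s in steps[:-1]):
        raise ValueError("a deleting transducer may appear only at the end")

    def query(t: Node) -> Node:
        for s in steps:
            t = s.run(t)
        return t
    return query


# Hypothetical sample data, not taken from the paper.
db = Node("students", None, (
    Node("student", None, (Node("name", "Alice"), Node("scholarship", 300))),
    Node("student", None, (Node("name", "Bob"), Node("scholarship", 120))),
))
query = compose(min_data_relabeling("scholarship", "min_scholarship"), deleting("name"))
result = query(db)
```

Tagging each transducer with a `deleting` flag lets `compose` reject an ill-formed query up front, mirroring the restriction stated in the text rather than discovering the problem during evaluation.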

Finally, the paper presents a decision procedure for the infiniteness of a data‑tree type. By analyzing the structure of the NFT‑A together with the variable constraints, the procedure determines whether the set of trees described by the type is infinite (i.e., admits infinitely many distinct data‑value assignments) or finite. If the candidate set for the sensitive attribute is infinite, the data tree is deemed infinitely secret with respect to the unauthorized query; otherwise, a breach is possible.
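
To give a feel for what deciding infiniteness means, here is a heavily simplified sketch. It is not the paper's procedure, which works on the NFT‑A together with the variable constraints; instead it assumes a single integer‑valued variable constrained only by comparisons against constants. The function name and the constraint encoding are hypothetical.

```python
import math
from typing import Iterable, Tuple

Constraint = Tuple[str, int]  # (operator, constant), e.g. (">=", 120)


def candidate_set_is_infinite(constraints: Iterable[Constraint]) -> bool:
    """Decide whether {x in Z | all constraints hold} is infinite.

    Assumption: one integer variable, constraints of the form `x op c`.
    The solution set is an integer interval (minus finitely many points
    for "!="), so it is infinite exactly when one side stays unbounded.
    """
    lo, hi = -math.inf, math.inf
    for op, c in constraints:
        if op == "==":
            lo, hi = max(lo, c), min(hi, c)
        elif op == ">=":
            lo = max(lo, c)
        elif op == ">":
            lo = max(lo, c + 1)
        elif op == "<=":
            hi = min(hi, c)
        elif op == "<":
            hi = min(hi, c - 1)
        elif op == "!=":
            pass  # removing one point never changes finiteness
        else:
            raise ValueError(f"unsupported operator: {op}")
    return lo == -math.inf or hi == math.inf


only_lower_bound = candidate_set_is_infinite([(">=", 120)])
both_bounds = candidate_set_is_infinite([(">=", 120), ("<=", 300)])
```

Under these assumptions `only_lower_bound` is true (the value stays infinitely secret) and `both_bounds` is false. Over a dense domain such as the rationals the second case would differ, since a non‑degenerate bounded interval still contains infinitely many values.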

The paper illustrates the framework with a concrete XML database of students, their origins, and scholarship amounts. Authorized queries extract names, origins, and extremal scholarship values; unauthorized queries attempt to retrieve the scholarship amount for a specific name. By modeling the authorized queries as a composition of the defined transducers and the attacker’s knowledge as a data‑tree type, the authors show that for some unauthorized queries the candidate set of scholarship amounts remains infinite (thus secure), while for others it collapses to a finite set (thus insecure).
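
The contrast between secure and insecure cases can be illustrated with a toy version of the scholarship scenario. Everything below is an assumption made for illustration: the names, the amounts, the integer domain, and the idea that an extremal query reveals one holder's name together with the extremal amount. It does not reproduce the paper's actual example or algorithms.

```python
from typing import Optional, Tuple

Observation = Tuple[str, int]          # (student name, scholarship amount)
Bounds = Tuple[int, Optional[int]]     # (lower bound, upper bound or None)


def candidate_bounds(
    target: str,
    observed_min: Observation,
    observed_max: Optional[Observation] = None,
) -> Bounds:
    """Bounds an attacker can derive for `target`'s scholarship amount.

    Assumption: authorized queries revealed who holds the minimum amount
    and, optionally, who holds the maximum amount.
    """
    min_name, min_amount = observed_min
    if target == min_name:
        return (min_amount, min_amount)      # value is pinned down exactly
    if observed_max is None:
        return (min_amount, None)            # only a lower bound is known
    max_name, max_amount = observed_max
    if target == max_name:
        return (max_amount, max_amount)
    return (min_amount, max_amount)


def is_infinite(bounds: Bounds) -> bool:
    """Over the integers, the candidate set is infinite iff it has no upper bound."""
    return bounds[1] is None


# Hypothetical observations, not taken from the paper.
secure_case = candidate_bounds("Bob", ("Alice", 120))
insecure_case = candidate_bounds("Bob", ("Alice", 120), ("Carol", 300))
```

With only the minimum revealed, Bob's amount can be any integer from 120 upward, so the candidate set is infinite. Once the maximum is also revealed, the candidates collapse to the finite range 120 to 300.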

Key contributions:

- Formal definition of infinite secrecy for data trees with unbounded data domains.
- A rich yet decidable class of deterministic tree transducers capable of expressing projection, selection, and natural join on XML data.
- Introduction of data‑tree types with S‑ and M‑variables, enabling precise representation of attacker knowledge even after data‑rewriting operations.
- Algorithms for type inference, inverse type inference, and a decision procedure for infiniteness of data‑tree types, establishing the decidability of infinite secrecy.

Overall, the work provides a rigorous, automata‑theoretic foundation for pre‑emptively verifying that a given XML database configuration resists inference attacks, extending security analysis beyond traditional access‑control to encompass quantitative data leakage in infinite domains.
