Preference Elicitation in Prioritized Skyline Queries

Preference queries incorporate the notion of binary preference relation into relational database querying. Instead of returning all the answers, such queries return only the best answers, according to a given preference relation. Preference queries are a fast-growing area of database research. Skyline queries constitute one of the most thoroughly studied classes of preference queries. A well-known limitation of skyline queries is that skyline preference relations assign the same importance to all attributes. In this work, we study p-skyline queries that generalize skyline queries by allowing varying attribute importance in preference relations. We perform an in-depth study of the properties of p-skyline preference relations. In particular, we study the problems of containment and minimal extension. We apply the obtained results to the central problem of the paper: eliciting relative importance of attributes. Relative importance is implicit in the constructed p-skyline preference relation. The elicitation is based on user-selected sets of superior (positive) and inferior (negative) examples. We show that the computational complexity of elicitation depends on whether inferior examples are involved. If they are not, elicitation can be achieved in polynomial time. Otherwise, it is NP-complete. Our experiments show that the proposed elicitation algorithm has high accuracy and good scalability.

💡 Research Summary

The paper addresses a fundamental limitation of classic skyline queries, namely the assumption that all attributes have equal importance in the preference relation. To overcome this, the authors introduce prioritized skyline (p‑skyline) queries, a generalization that allows attributes to differ in importance. A p‑skyline preference relation is induced by a strict partial order over attributes, represented by a directed acyclic graph (DAG) in which vertices correspond to attributes and an edge (A → B) indicates that attribute A is more important than attribute B. This representation keeps the resulting preference relation a strict partial order (irreflexive and transitive) while allowing dimensions to be prioritized differently.
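To make the graph representation concrete, the following is a minimal sketch of a dominance test driven by an attribute-importance DAG. It assumes a commonly used characterization that the summary does not spell out: tuple `a` dominates tuple `b` when `a` is better on at least one attribute and every attribute on which `a` is worse is a descendant of some attribute on which `a` is better. The names `PGraph`, `descendants`, and `dominates`, and the convention that larger values are preferred, are illustrative assumptions rather than the paper's notation.

```python
from typing import Dict, Set

# Hypothetical encoding: attribute -> set of attributes it is more important than.
PGraph = Dict[str, Set[str]]


def descendants(graph: PGraph, start: str) -> Set[str]:
    """Return every attribute reachable from `start` along importance edges."""
    seen: Set[str] = set()
    stack = list(graph.get(start, ()))
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(graph.get(node, ()))
    return seen


def dominates(graph: PGraph, a: Dict[str, float], b: Dict[str, float]) -> bool:
    """Assumed rule: `a` beats `b` if each attribute where `a` loses is
    outranked by some attribute where `a` wins. Larger values are preferred."""
    better = {attr for attr in a if a[attr] > b[attr]}
    worse = {attr for attr in a if a[attr] < b[attr]}
    if not better:
        return False  # equal or strictly worse tuples never dominate
    covered: Set[str] = set()
    for attr in better:
        covered |= descendants(graph, attr)
    return worse <= covered
```

With an empty graph the test reduces to ordinary skyline (Pareto) dominance; adding an edge such as `{"price": {"quality"}}` lets a win on `price` outweigh a loss on `quality`.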

The theoretical contribution is twofold. First, the paper studies containment: given two p‑skyline relations, does one include the other? The authors show that containment reduces to checking whether the DAG of one relation is a sub‑graph of the other, which can be decided in polynomial time. Second, they investigate minimal extension: for a given set of constraints, find the weakest (i.e., least restrictive) attribute‑importance ordering that satisfies them. This problem is shown to be equivalent to constructing a minimal super‑partial‑order, for which they propose an algorithm that incrementally adds precedence edges while preserving acyclicity.
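The sub-graph test and the acyclicity check described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' algorithm: graphs are compared through their transitive closures, and the function names (`transitive_closure`, `is_contained`, `can_add_edge`) are hypothetical.

```python
from typing import Dict, Set, Tuple

# Hypothetical encoding: attribute -> set of attributes it is more important than.
PGraph = Dict[str, Set[str]]


def transitive_closure(graph: PGraph) -> Set[Tuple[str, str]]:
    """Return all (ancestor, descendant) pairs implied by the graph."""
    edges: Set[Tuple[str, str]] = set()
    for src in graph:
        seen: Set[str] = set()
        stack = list(graph[src])
        while stack:
            node = stack.pop()
            if node not in seen:
                seen.add(node)
                stack.extend(graph.get(node, ()))
        edges |= {(src, dst) for dst in seen}
    return edges


def is_contained(g1: PGraph, g2: PGraph) -> bool:
    """Assumed containment test: every importance pair of g1 also holds in g2."""
    return transitive_closure(g1) <= transitive_closure(g2)


def can_add_edge(graph: PGraph, src: str, dst: str) -> bool:
    """True if adding src -> dst would keep the graph acyclic."""
    return src != dst and (dst, src) not in transitive_closure(graph)
```

Each closure is computed with a depth-first traversal per vertex, so the whole containment check stays polynomial in the number of attributes.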

The central focus is preference elicitation based on user‑provided examples. Users supply a set of positive examples (tuples that should appear in the result) and optionally a set of negative examples (tuples that must be excluded). The goal is to infer a p‑skyline relation that makes all positive examples non‑dominated while ensuring that every negative example is dominated.
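The acceptance condition for a candidate relation can be written down directly. The sketch below is generic over the dominance predicate, so any p-skyline dominance test could be plugged in; `pareto_dominates` is only a placeholder predicate (plain skyline dominance, larger values preferred), and all names here are illustrative assumptions.

```python
from typing import Callable, Dict, List

Record = Dict[str, float]
Dominates = Callable[[Record, Record], bool]


def pareto_dominates(a: Record, b: Record) -> bool:
    """Placeholder predicate: classic skyline dominance, larger is better."""
    return all(a[k] >= b[k] for k in a) and any(a[k] > b[k] for k in a)


def best_tuples(data: List[Record], dominates: Dominates) -> List[Record]:
    """Tuples of `data` that no other tuple in `data` dominates."""
    return [t for t in data if not any(dominates(o, t) for o in data)]


def satisfies_examples(
    data: List[Record],
    positives: List[Record],
    negatives: List[Record],
    dominates: Dominates,
) -> bool:
    """Check that positives are undominated and negatives are dominated."""

    def is_dominated(t: Record) -> bool:
        return any(dominates(o, t) for o in data)

    return not any(is_dominated(t) for t in positives) and all(
        is_dominated(t) for t in negatives
    )
```

For example, with `data = [{"x": 3, "y": 1}, {"x": 1, "y": 3}, {"x": 1, "y": 1}]`, the first two tuples are valid positive examples and the third is a valid negative example under `pareto_dominates`.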

When only positive examples are present, the authors prove that the elicitation problem can be solved in polynomial time. Their algorithm extracts pairwise dominance constraints implied by the positive set, builds a precedence graph that satisfies all constraints, and then computes a topological order that yields a minimal importance hierarchy. The key insight is that positive examples alone generate a set of necessary precedence relations; any DAG extending these relations will make the examples belong to the p‑skyline.
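The final step, turning a precedence graph into an importance ordering, can be illustrated with Python's standard `graphlib` module (Python 3.9+). This sketch assumes the precedence graph has already been built; the attribute names and the `importance_order` helper are hypothetical, and the constraint-extraction step is not shown because the summary does not specify it.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set


def importance_order(more_important_than: Dict[str, Set[str]]) -> List[str]:
    """Return attributes from most to least important.

    `more_important_than` maps an attribute to the attributes it outranks.
    TopologicalSorter expects the opposite direction (node -> predecessors),
    so the edges are inverted first. A cyclic input raises graphlib.CycleError.
    """
    predecessors: Dict[str, Set[str]] = {}
    for src, dsts in more_important_than.items():
        predecessors.setdefault(src, set())
        for dst in dsts:
            predecessors.setdefault(dst, set()).add(src)
    return list(TopologicalSorter(predecessors).static_order())
```

When the graph is not a chain, several topological orders are valid and `static_order()` returns just one of them, so the result should be read as one linearization of the partial order rather than the unique hierarchy.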

In contrast, the inclusion of negative examples dramatically increases complexity. The authors prove that the elicitation problem with both positive and negative examples is NP‑complete. Intuitively, the system must simultaneously ensure that some tuple dominates each negative example while leaving every positive example undominated, which is a combinatorial balancing act. Consequently, exact solutions can become impractical for large inputs. To cope with this, the paper proposes two practical strategies: (1) a bounded search that enumerates candidate precedence edges only up to a user‑defined depth, and (2) a greedy heuristic that first satisfies the most restrictive negative constraints and then relaxes them as needed. Both approaches trade optimality for tractability.

Experimental evaluation is conducted on synthetic benchmarks and real‑world datasets (real‑estate listings, e‑commerce product catalogs, travel packages). For the positive‑only scenario, the polynomial‑time algorithm achieves over 92 % accuracy in reproducing the hidden ground‑truth importance ordering, and processes up to 10⁴ tuples within a few seconds. When negative examples are added, the heuristic method retains an accuracy above 85 % while being 2–3 orders of magnitude faster than exhaustive search. Scalability tests confirm that runtime grows roughly linearly with the number of tuples and examples. User studies further demonstrate that the elicited p‑skyline aligns well with participants’ intuitive judgments, confirming the practical relevance of the approach.

In summary, the paper makes three major contributions: (1) a formal model for attribute‑weighted preference queries (p‑skyline), (2) rigorous complexity analysis of containment, minimal extension, and elicitation problems, and (3) an effective elicitation framework that leverages positive examples for efficient inference and offers viable heuristics when negative examples are required. By enabling the automatic discovery of relative attribute importance from minimal user feedback, the work significantly broadens the applicability of skyline‑based decision support systems, paving the way for more nuanced, user‑centric query processing in modern database applications.

📜 Original Paper Content