Efficient Skyline Querying with Variable User Preferences on Nominal Attributes


📝 Original Info

  • Title: Efficient Skyline Querying with Variable User Preferences on Nominal Attributes
  • ArXiv ID: 0710.2604
  • Date: 2007-10-16
  • Authors: Not specified in the paper. (To be listed separately if included in the original.)

📝 Abstract

Current skyline evaluation techniques assume a fixed ordering on the attributes. However, dynamic preferences on nominal attributes are more realistic in known applications. In order to generate online response for any such preference issued by a user, we propose two methods of different characteristics. The first one is a semi-materialization method and the second is an adaptive SFS method. Finally, we conduct experiments to show the efficiency of our proposed algorithms.

💡 Deep Analysis

📄 Full Content

The skyline operator has emerged as an important summarization technique for multi-dimensional datasets. Given a set of m-dimensional data points, the skyline S is the set of all points p such that no other point q dominates p. A point q is said to dominate p if q is better than p in at least one dimension and not worse than p in all other dimensions. Consider a customer looking for a vacation package to Cancun using three criteria: price, hotel class and number of stops. A lower price, a higher hotel class and fewer stops are preferable. Thus, if p is in the skyline, then there is no other package q which has a lower price, a higher hotel class and fewer stops than p.
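The dominance test and the skyline definition above can be sketched in a few lines of Python. This is a minimal illustration, not the paper's algorithm: the names `dominates`, `skyline` and `to_cost` are hypothetical, the four packages are made-up data, and hotel class is negated so that a smaller value is better in every dimension.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def to_cost(price: float, hotel_class: int, stops: int) -> Point:
    """Convert a package to a cost vector where smaller is better everywhere.

    Hotel class is negated because a higher class is preferred.
    """
    return (price, -hotel_class, stops)


def dominates(q: Sequence[float], p: Sequence[float]) -> bool:
    """q dominates p: not worse in every dimension, strictly better in one."""
    not_worse = all(a <= b for a, b in zip(q, p))
    strictly_better = any(a < b for a, b in zip(q, p))
    return not_worse and strictly_better


def skyline(points: List[Point]) -> List[Point]:
    """Naive O(n^2) skyline: keep every point that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Made-up packages (price, hotel class, stops), for illustration only.
packages = [
    to_cost(1600, 4, 0),
    to_cost(2400, 1, 2),
    to_cost(3000, 5, 1),
    to_cost(3600, 4, 1),
]
result = skyline(packages)
print(result)  # [(1600, -4, 0), (3000, -5, 1)]
```

The quadratic scan is only meant to make the definition concrete; the methods cited later in the text exist precisely because this approach does not scale.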

Skyline queries have been studied since the 1960s in the theory field, where skyline points are known as Pareto sets and admissible points [10] or maximal vectors [9]. However, earlier algorithms such as [9,8] are inefficient when there are many data points in a high-dimensional space. The problem of skyline queries was introduced in the database context in [1].

Most of the existing studies handle only numeric attributes. Consider the example in Table 1, which shows a set of vacation packages with three attributes (or dimensions): Price, Hotel-class and Hotel-group. Most existing works consider only the first two attributes, which are numeric, where a lower price and a higher hotel class are preferable. Many efficient methods have been proposed for so-called full-space skyline queries, which return the set of skyline points in a specific space (a set of dimensions such as price and hotel class). Some representative methods include a block nested loop (BNL) algorithm [1], a sort first skyline (SFS) algorithm [7], a bitmap method [19], a nearest neighbor (NN) algorithm [13] and a branch and bound skyline (BBS) method [14,15]. Recently, skyline computation has been extended to consider subspace skyline queries, which return the skylines in subspaces [23,17,22,18,16].
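To give a feel for the sort-first idea behind SFS, here is a minimal sketch for numeric attributes only. It assumes every dimension is encoded so that smaller is better, and it uses the plain sum of coordinates as the monotone sorting score, which is an assumption chosen for simplicity. The function names and the data are hypothetical, and this is the basic presorting idea rather than the adaptive SFS method proposed in the paper.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(q: Sequence[float], p: Sequence[float]) -> bool:
    """q dominates p when smaller is better in every dimension."""
    return (all(a <= b for a, b in zip(q, p))
            and any(a < b for a, b in zip(q, p)))


def sfs_skyline(points: List[Point]) -> List[Point]:
    """Sort-first skyline sketch.

    If q dominates p then sum(q) < sum(p), so after sorting by the sum a
    point can only be dominated by points that appear before it. Each
    point therefore needs to be checked only against the skyline so far.
    """
    window: List[Point] = []
    for p in sorted(points, key=sum):
        if not any(dominates(w, p) for w in window):
            window.append(p)  # p is final: nothing later can dominate it
    return window


# Made-up cost vectors (price, -hotel_class, stops), for illustration only.
points = [(3600, -4, 1), (3000, -5, 1), (2400, -1, 2), (1600, -4, 0)]
sfs_result = sfs_skyline(points)
print(sfs_result)  # [(1600, -4, 0), (3000, -5, 1)]
```

The useful property of presorting is that a point admitted to the window never has to be removed again, unlike in a block nested loop scan.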

Hotel-group, as shown in Table 1, is a categorical attribute. There can be a partial ordering on categorical attributes. Some recent studies [3,2,4,6,5,12,11,20] consider partially-ordered categorical attributes. In [3,2], each partially-ordered attribute is transformed into two integer attributes such that conventional skyline algorithms can be applied. [4] studies the cost estimation of the skyline operator involving partially-ordered attributes.

Nevertheless, existing work on categorical attributes assumes that each attribute has only one order: either a total or a partial order. In real life, categorical attributes often do not have a fixed predefined order. For example, different customers may prefer different realty locations, different car models, or different airlines. We call a categorical attribute that does not come with a predefined order a nominal attribute. It is easy to name important applications with nominal attributes, such as realties (where type of realty, region and style are examples of nominal attributes) and flight booking (where airline and transit airport are examples of nominal attributes). In this paper, we consider scenarios where different users may have different preferences on nominal attributes. That is, more than one order needs to be considered for a nominal attribute.

Furthermore, a nominal attribute typically has many different values, and a user would not specify an order on all of them, but would only list a few favorite choices. Table 2 shows different customer preferences on Hotel-group. The preference of Alice is “T ≺ M ≺ *”, which means that she prefers Tulips to Mozilla and prefers these two to other hotel groups (i.e., Horizon). We call such preferences implicit preferences.
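One way to make an implicit preference such as “T ≺ M ≺ *” concrete is to rank the listed values and treat everything else as less preferred. The sketch below is illustrative only: the packages are made-up data (not the contents of Table 1), the second preference “H ≺ *” is hypothetical, all function names are assumptions, and treating two different unlisted values as incomparable is an assumption of this sketch.

```python
from typing import Dict, List, Sequence, Tuple

Package = Tuple[Tuple[float, ...], str]  # (numeric costs, hotel group)


def make_rank(preference: Sequence[str]) -> Dict[str, int]:
    """Rank the listed values; unlisted values ('*') get no entry."""
    return {value: i for i, value in enumerate(preference)}


def not_worse(u: str, v: str, rank: Dict[str, int]) -> bool:
    """True if u equals v or u is preferred to v under the preference."""
    if u == v:
        return True
    if u in rank:
        return v not in rank or rank[u] < rank[v]
    return False  # u is unlisted and differs from v: worse or incomparable


def dominates(q: Package, p: Package, rank: Dict[str, int]) -> bool:
    (q_num, q_grp), (p_num, p_grp) = q, p
    if not all(a <= b for a, b in zip(q_num, p_num)):
        return False
    if not not_worse(q_grp, p_grp, rank):
        return False
    # Strictly better on the nominal attribute or on some numeric one.
    return q_grp != p_grp or any(a < b for a, b in zip(q_num, p_num))


def skyline(points: List[Package], preference: Sequence[str]) -> List[Package]:
    rank = make_rank(preference)
    return [p for p in points
            if not any(dominates(q, p, rank) for q in points)]


# Made-up packages: ((price, -hotel_class), group), smaller cost is better.
x = ((1600, -4), "T")
y = ((2400, -1), "T")
z = ((3000, -5), "M")
w = ((1500, -4), "H")
v = ((3600, -4), "H")
packages = [x, y, z, w, v]

alice = skyline(packages, ["T", "M"])  # T ≺ M ≺ *
other = skyline(packages, ["H"])       # hypothetical preference H ≺ *
print(alice)  # x, z and w survive
print(other)  # only z and w survive
```

With the same data, package x is in the skyline under the first preference but is dominated by w under the second, which shows how the preference on a single nominal attribute changes the result.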

Note that different preferences yield different skylines. As shown in Table 2, the skyline is {a, c} for Alice’s preference but {a, c, e, f} for Fred’s preference. The large number of possible skylines makes the problem highly challenging. Some recent works [6,5] study the problem of preference changes, whereupon the query results can be incrementally refined. In [12], a user or a customer can specify some values in nominal attributes as an equivalence class to denote the same “importance” for those values. [11] is an extension of [12]. In [11], whenever a user finds that there are a lot of irrelevant results for a query, he or she can modify the query by adding more conditions so that the result set becomes smaller and better suits his or her needs. However, these works focus either on the effects of query changes on the result size or on the reuse of skyline results when a query is refined in a progressive manner, but not on finding efficient algorithms. Here, we consider that different users may have different preferences, so the preferences are not undergoing refinement but can differ or even conflict from one query to another. Also, we focus on the issue of efficient query answering. Nominal attributes

Reference

This content is AI-processed based on open access ArXiv data.
