Cooperative Update Exchange in the Youtopia System

📝 Original Info

  • Title: Cooperative Update Exchange in the Youtopia System
  • ArXiv ID: 0903.5346
  • Date: 2009-04-01
  • Authors: Researchers from original ArXiv paper

📝 Abstract

Youtopia is a platform for collaborative management and integration of relational data. At the heart of Youtopia is an update exchange abstraction: changes to the data propagate through the system to satisfy user-specified mappings. We present a novel change propagation model that combines a deterministic chase with human intervention. The process is fundamentally cooperative and gives users significant control over how mappings are repaired. An additional advantage of our model is that mapping cycles can be permitted without compromising correctness. We investigate potential harmful interference between updates in our model; we introduce two appropriate notions of serializability that avoid such interference if enforced. The first is very general and related to classical final-state serializability; the second is more restrictive but highly practical and related to conflict-serializability. We present an algorithm to enforce the latter notion. Our algorithm is an optimistic one, and as such may sometimes require updates to be aborted. We develop techniques for reducing the number of aborts and we test these experimentally.

📄 Full Content

Communities everywhere on the Web want to share, store and query data. Their motivations for data sharing are very diverse, from entertainment or commercial activity to the desire to collaborate on scientific or artistic projects. The data involved is also varied, running the gamut from unstructured through semistructured to relational. The solutions used for data sharing are frequently custom-built for a concrete scenario; as such, they exhibit significant diversity themselves. To name only a few prominent solutions, Wiki software has proved very successful for community management of unstructured data; scientific portals such as BIRN [1] and GEON [2] allow scientists to pool their datasets; and an increasingly large number of vertical social networking sites include a topic-specific database that is maintained by the site's members.

While the scenarios mentioned above vary widely in their parameters, they have in common many high-level properties that translate into concrete design desiderata for Collaborative Data Integration (CDI) systems. In the Youtopia project, we are building a system to address these desiderata and enable community data sharing in arbitrary settings. Our initial focus is on relational data; however, the ultimate goal is to include arbitrary data formats and manage the data in its full heterogeneity, as in Dataspaces [16].

CDI has three fundamental aspects that distinguish it from other paradigms such as classical data integration. First, a CDI system must enable best-effort cooperation among community members with respect to maintenance of the data and metadata. That is, no worthwhile contribution to the repository should be rejected because it is incomplete, as another community member may be able to supply the knowledge required to complete it. This means a CDI system must be equipped to deal with incomplete data and metadata, and must provide a way for users to complete them at a later time. Next, a CDI solution must manage disagreement regarding the data and schema or other metadata. Finally, it must maximize data utility.

These three aspects have clear tradeoffs in the extent to which they can be addressed; as such, they define a design space within which we can situate existing solutions and Youtopia. The structure of this design space also clarifies the relationship of CDI to classical data integration; the latter is fundamentally an effort to maintain utility while permitting as much disagreement as possible. CDI builds on this by introducing the added element of best-effort cooperation, familiar from the Web 2.0 model of enabling all users to create their own content on the internet.

Youtopia is a system that allows users to add, register, update and maintain relational data in a collaborative fashion. The architecture of Youtopia is presented in Figure 1. The storage manager provides a logical abstraction of the repository. In this abstraction, the repository consists of a set of logical tables or views containing the data; these are tied together by a set of mappings. The mappings are supplied by the users as the repository grows and serve to propagate changes to the data. Thus, at the logical level Youtopia is an update exchange system. In this paper, we introduce our update exchange model, which is designed to enable best-effort cooperation as far as possible; in this it differs from previous update exchange work such as Orchestra [15]. A small Youtopia repository is shown in Figure 2. It contains relations with travel and tourist information; the relations are connected by a set of mappings or tuple-generating dependencies (tgds). For instance, the tgd σ3 ensures that table R contains review information about all available tours of attractions, as explained in the following example.

Example 1.1. Suppose company ABC Tours starts running tours to Niagara Falls and the tuple T(Niagara Falls, ABC Tours) is added. The mapping will cause the new tuple R(Niagara Falls, ABC Tours, x3) to be inserted by the update exchange module. The x3 is a labelled null or variable which indicates that some review for the tour should exist, but is unknown to the system. The review may subsequently be filled in manually by a user.

This propagation of changes occurs through a process known as the (tgd) chase [4,23,7], a simple mechanism for constraint maintenance in which the corrective operations required are relatively easy to determine and perform.
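The chase step in Example 1.1 can be sketched in a few lines of Python. This is a minimal illustration, not the Youtopia implementation. It assumes that σ3 has the form T(a, c) → ∃r R(a, c, r); the exact mapping is given in Figure 2, which is not reproduced here. The names `LabelledNull`, `chase_step` and `fill_null` are hypothetical.

```python
from dataclasses import dataclass
from itertools import count


@dataclass(frozen=True)
class LabelledNull:
    """A value that must exist but is unknown to the system (e.g. x3)."""
    index: int

    def __repr__(self):
        return f"x{self.index}"


def chase_step(T, R, fresh):
    """Repair violations of the ASSUMED tgd  T(a, c) -> exists r. R(a, c, r).

    T is a set of (attraction, company) tuples and R a set of
    (attraction, company, review) tuples. For every T-tuple with no
    matching R-tuple, insert one whose review is a fresh labelled null.
    Returns the list of tuples inserted into R.
    """
    inserted = []
    for attraction, company in sorted(T):
        witnessed = any(a == attraction and c == company for a, c, _ in R)
        if not witnessed:
            new_tuple = (attraction, company, LabelledNull(next(fresh)))
            R.add(new_tuple)
            inserted.append(new_tuple)
    return inserted


def fill_null(R, null, value):
    """Replace every occurrence of a labelled null in R by a user-supplied value."""
    updated = {tuple(value if v == null else v for v in row) for row in R}
    R.clear()
    R.update(updated)


# Usage: ABC Tours starts running tours to Niagara Falls.
T = {("Niagara Falls", "ABC Tours")}
R = set()
fresh = count(3)  # start at 3 only so that the null is named x3, as in the example
inserted = chase_step(T, R, fresh)
```

After the call, `R` holds a single tuple for the Niagara Falls tour whose review field is the labelled null `x3`. A second call to `chase_step` inserts nothing because the mapping is already satisfied, and a user can later supply the review with `fill_null`.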

Tuple-generating dependencies and equivalent constraints such as GLAV mappings [22] and conjunctive inclusion dependencies [21] are frequently encountered in data integration [15,11,17,19,29]. Their ubiquity points to the fact that they are a very powerful formalism, applicable in a variety of subject domains. On the other hand, it is not always trivial for a user to specify a mapping correctly. However, this problem has been addressed in some existing work [29,26] and we are building on these solutions to set up an infrastructure to f

…(Full text truncated)…

Reference

This content is AI-processed based on ArXiv data.
