Learning Dependency Models for Subset Repair
Inconsistent values are common in real-world data and can negatively impact analysis and decision-making. Existing research primarily focuses on identifying a smallest removal set to resolve inconsistencies, but recent studies have shown that multiple minimum removal sets may exist, making it difficult to choose among them. Some approaches use the most frequent values to guide the subset repair, but this strategy has been criticized for misidentifying rare yet correct values as errors. To address these issues, we consider the dependencies between attribute values to determine a more appropriate subset repair. Our main contributions include (1) formalizing the optimal subset repair problem with attribute dependencies and analyzing its computational hardness; (2) computing the exact solution using integer linear programming; (3) developing an approximation algorithm with performance guarantees based on cliques and LP relaxation; and (4) designing a probabilistic approach with an approximation bound for efficiency. Experimental results on real-world datasets validate the effectiveness of our methods in both subset repair quality and downstream applications.
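For intuition, when every denial constraint involves two tuples, the exact ILP mentioned in contribution (2) can be sketched as a vertex-cover-style program over the conflict graph. The symbols below are illustrative assumptions, not the paper's exact formulation: $x_i = 1$ means tuple $t_i$ is deleted, and the weight $w_i$ would encode how well $t_i$ conforms to the learned attribute dependencies (unit weights recover the plain minimum removal set):

```latex
\begin{aligned}
\min \quad & \sum_{i} w_i \, x_i \\
\text{s.t.} \quad & x_i + x_j \ge 1 && \text{for every pair } (t_i, t_j) \text{ that violates a constraint}, \\
& x_i \in \{0, 1\}.
\end{aligned}
```

Under this reading, deleting low-conformity tuples is cheap, so ties between minimum-size removal sets are broken in favor of keeping dependency-consistent data.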
💡 Research Summary
The paper addresses the problem of subset repair (S‑repair) in relational databases, where the goal is to delete a set of tuples so that the remaining instance satisfies a given set of denial constraints. Traditional approaches focus on finding a minimum‑size removal set, but multiple minimum solutions often exist, and selecting among them using frequency‑based heuristics can misclassify rare but correct data as errors. To overcome these limitations, the authors propose a novel framework that evaluates candidate repairs based on how well the surviving tuples conform to learned attribute‑dependency models.
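The "multiple minimum solutions" issue above can be seen on a toy instance. The sketch below (a brute-force illustration, not the paper's algorithm; the constraint and data are made up) builds the conflict graph for one binary denial constraint and enumerates every minimum-size removal set, i.e., every minimum vertex cover:

```python
from itertools import combinations

# Hypothetical denial constraint: no two tuples may share the same
# name but differ on city. Each violating pair becomes an edge of a
# conflict graph; a minimum removal set is a minimum vertex cover.
tuples = [
    {"id": 0, "name": "Ann", "city": "Paris"},
    {"id": 1, "name": "Ann", "city": "Lyon"},
    {"id": 2, "name": "Bob", "city": "Rome"},
    {"id": 3, "name": "Bob", "city": "Oslo"},
]

def violates(t, u):
    """Binary denial constraint: same name, different city."""
    return t["name"] == u["name"] and t["city"] != u["city"]

conflicts = [(t["id"], u["id"])
             for t, u in combinations(tuples, 2) if violates(t, u)]

def min_removal_sets(ids, conflicts):
    """Enumerate ALL minimum-size removal sets by brute force."""
    for k in range(len(ids) + 1):
        covers = [set(s) for s in combinations(ids, k)
                  if all(a in s or b in s for a, b in conflicts)]
        if covers:
            return covers
    return []

result = min_removal_sets([t["id"] for t in tuples], conflicts)
print(result)  # → [{0, 2}, {0, 3}, {1, 2}, {1, 3}]
```

Four equally small repairs exist, and a frequency-based tie-breaker has no signal here; this is exactly the situation where a learned dependency model can rank the candidates.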
A dependency model for a tuple t_l and attribute A_j predicts the distance between t_l
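The dependency-model idea can be sketched as follows. This is an assumption-laden stand-in, not the paper's model: here the predictor for attribute A_j is a simple conditional mode table over the other attributes, and the "distance" is a 0/1 mismatch indicator; the paper's learned models and distance function would replace both:

```python
from collections import Counter, defaultdict

def fit_mode_model(rows, target, features):
    """Toy dependency model: predict `target` as the most common value
    seen for each combination of `features` (an illustrative choice)."""
    table = defaultdict(Counter)
    for r in rows:
        table[tuple(r[f] for f in features)][r[target]] += 1
    return {key: cnt.most_common(1)[0][0] for key, cnt in table.items()}

def distance(t, target, features, model):
    """0 if the tuple's value matches the model's prediction, 1 otherwise."""
    pred = model.get(tuple(t[f] for f in features))
    return 0 if pred == t[target] else 1

rows = [
    {"zip": "10001", "city": "NYC"},
    {"zip": "10001", "city": "NYC"},
    {"zip": "10001", "city": "LA"},  # disagrees with the zip->city dependency
]
model = fit_mode_model(rows, target="city", features=["zip"])
scores = [distance(r, "city", ["zip"], model) for r in rows]
print(scores)  # → [0, 0, 1]
```

A repair would then prefer to delete the high-distance tuple even if its value were not the globally rarest one, which is the advantage over frequency-based tie-breaking.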