Consistent Query Answering via ASP from Different Perspectives: Theory and Practice

Under consideration for publication in Theory and Practice of Logic Programming

MARCO MANNA, FRANCESCO RICCA, GIORGIO TERRACINA
Department of Mathematics, University of Calabria, Italy
(e-mail: {manna,ricca,terracina}@mat.unical.it)

submitted 23 November 2010; revised 1 January 2003; accepted 24 January 2011

Abstract

A data integration system provides transparent access to different data sources by suitably combining their data, and providing the user with a unified view of them, called global schema. However, source data are generally not under the control of the data integration process; thus, integrated data may violate global integrity constraints even in the presence of locally-consistent data sources. In this scenario, it is nevertheless useful to retrieve as much consistent information as possible. The process of answering user queries under global constraint violations is called consistent query answering (CQA). Several notions of CQA have been proposed, e.g., depending on whether integrated information is assumed to be sound, complete, exact or a variant of them. This paper provides a contribution in this setting: it unifies solutions coming from different perspectives under a common ASP-based core, and provides query-driven optimizations designed for isolating and eliminating inefficiencies of the general approach for computing consistent answers. Moreover, the paper introduces some new theoretical results enriching existing knowledge on decidability and complexity of the considered problems. The effectiveness of the approach is evidenced by experimental results.

To appear in Theory and Practice of Logic Programming (TPLP).
KEYWORDS: Answer Set Programming, Data Integration, Consistent Query Answering

1 Introduction

The enormous amount of information dispersed over many data sources, often stored in different heterogeneous databases, has recently boosted the interest in data integration systems (Lenzerini 2002). Roughly speaking, a data integration system provides transparent access to different data sources by suitably combining their data, and providing the user with a unified view of them, called global schema. In many cases, the application domain imposes some consistency requirements on integrated data. For instance, it is often desirable to impose some integrity constraints (ICs), like primary/foreign keys, on the global relations. Data stored at the sources may violate global ICs when integrated, since in general data sources are not under the control of the data integration process. The standard approach to this problem basically consists of explicitly modifying the data in order to eliminate IC violations (data cleaning). However, the explicit repair of data is not always convenient or possible. Therefore, when answering a user query, the system should be able to "virtually repair" relevant data (in the line of Arenas et al. 2003; Bertossi et al. 2005; Chomicki and Marcinkowski 2005), in order to provide consistent answers; this task is also called Consistent Query Answering (CQA). The database community has spent considerable effort in this area, and relevant research results have been obtained to clarify semantics, decidability, and complexity of data integration under constraints and, specifically, of CQA. In particular, several notions of CQA have been proposed (see Bertossi et al. 2005 for a survey), e.g., depending on whether the information in the database is assumed to be sound, complete or exact.
However, while efficient systems are already available for simple data integration scenarios, solutions that are both scalable and comprehensive have not been implemented yet for CQA, mainly because handling inconsistencies arising from constraint violations is inherently hard. Moreover, mixing different kinds of constraints (e.g., denial constraints and inclusion dependencies) on the same global database often makes the query answering process undecidable (Abiteboul et al. 1995; Calì et al. 2003a).

This paper provides some contributions in this setting. Specifically, it first starts from different state-of-the-art semantic perspectives (Arenas et al. 2003; Calì et al. 2003a; Chomicki and Marcinkowski 2005) and revisits them in order to provide a uniform, common core based on Answer Set Programming (ASP) (Gelfond and Lifschitz 1988; Gelfond and Lifschitz 1991). Then, it provides query-driven optimizations, in the light of the experience we gained in the INFOMIX project (Leone et al. 2005), in order to overcome the limitations observed in real-world scenarios. The main contributions of this paper can be summarized as follows:
• A theoretical analysis of the considered semantics, which extends previous results.
• The definition of a unified framework for CQA based on a purely declarative, logic-based approach which supports the most relevant semantic assumptions on source data. Specifically, the problem of consistent query answering is reduced to cautious reasoning on (disjunctive) ASP programs with aggregates (Faber et al. 2010) automatically built from both the query and the involved constraints.
• The definition of an optimization approach designed to (1) "localize" and limit the inefficient part of the computation of consistent answers to small fragments of the input, and (2) reduce the computational complexity of the repair process, if possible.
• The implementation of the entire framework in a full-fledged prototype system.
• The capability of handling large amounts of data, typical of real-world data integration scenarios, using the DLV^DB system (Terracina et al. 2008) as internal query evaluator; indeed, DLV^DB allows for mass-memory database evaluations and provides distributed data management features.

In order to assess the effectiveness of the proposed approach, we carried out experimental activities both on a real-world scenario and on synthetic data, comparing its behavior on different semantics and constraints.

The plan of the paper is as follows. Section 2 formally introduces the notion of CQA under different semantics and some new theoretical results on decidability and complexity for this problem. Section 3 first introduces a unified (general) solution to handle CQA via ASP, and then presents some optimizations. Section 4 describes the benchmark framework we adopted in the tests and discusses the obtained results. Finally, Section 5 compares related work and draws some concluding considerations.

2 Data Integration Framework

In this paper we exploit the data integration setting to point out motivations and challenges underlying CQA. However, as will be clarified in the following, the techniques and results provided in the paper also hold for a single-database setting. We next formally describe the adopted data integration framework.

The following notation will be used throughout the paper. We always denote by $\Gamma$ a countably infinite domain of totally ordered values; by $t$ a tuple of values from $\Gamma$; by $X$ a variable; by $\bar{x}$ a sequence $X_1, \ldots, X_n$ of (not necessarily distinct) variables, and by $|\bar{x}| = n$ its length. Given two sequences of variables $\bar{x}$ and $\bar{x}'$, we denote by $\bar{x} - \bar{x}'$ the sequence obtained from $\bar{x}$ by discarding every variable that appears in $\bar{x}'$.
Whenever all the variables of a sequence $\bar{x}$ appear in another sequence $\bar{x}'$, we simply write $\bar{x} \leq \bar{x}'$. Given a sequence $\bar{x}$ and a set $\pi \subseteq \{1, \ldots, |\bar{x}|\}$, we denote by $\bar{x}^{\pi}$ the sequence obtained from $\bar{x}$ by discarding each variable whose position is not in $\pi$. (Similarly, given a tuple $t$ and a set $\pi \subseteq \{1, \ldots, |t|\}$, we denote by $t^{\pi}$ the tuple obtained from $t$ by discarding each value whose position is not in $\pi$.) Moreover, we denote by $\sigma(\bar{x})$ a conjunction of comparison atoms of the form $X \odot X'$, where $\odot \in \{\leq, \geq, <, >, \neq\}$, and by $\ominus$ the symmetric difference operator between two sets.

A relational database schema is a pair $\mathcal{R} = \langle names(\mathcal{R}), constr(\mathcal{R}) \rangle$, where $names(\mathcal{R})$ and $constr(\mathcal{R})$ are the relation names and the integrity constraints (ICs) of $\mathcal{R}$, respectively. The arity of a given relation $r \in names(\mathcal{R})$ is denoted by $arity(r)$. A database (instance) for $\mathcal{R}$ is any set of facts (Abiteboul et al. 1995) of the form:

$$F = \{\, r(t) : r \in names(\mathcal{R}) \wedge t \text{ is a tuple from } \Gamma \wedge |t| = arity(r) \,\}$$

In the following, we adopt the unique name assumption, and $dom(F)$ denotes the subset of $\Gamma$ containing all the values appearing in the facts of $F$. Let $r_1, \ldots, r_m \in names(\mathcal{R})$; the set $constr(\mathcal{R})$ contains ICs of the form:
1. $\forall \bar{x}_1, \ldots, \bar{x}_m\ \neg[\, r_1(\bar{x}_1) \wedge \ldots \wedge r_m(\bar{x}_m) \wedge \sigma(\bar{x}_1, \ldots, \bar{x}_m) \,]$ (denial constraints, DCs);
2. $\forall \bar{x}^{\forall}\ [\, r_1(\bar{x}_1) \rightarrow \exists \bar{x}_2^{\exists}\ r_2(\bar{x}_2) \,]$ (inclusion dependencies, INDs);
where $arity(r_i) = |\bar{x}_i|$ for each $i \in [1..m]$. In particular, for INDs we require that all the variables within each $\bar{x}_i$ ($1 \leq i \leq 2$) are distinct, $\bar{x}^{\forall} \leq \bar{x}_1$, $\bar{x}^{\forall} \leq \bar{x}_2$, and $\bar{x}_2^{\exists} = \bar{x}_2 - \bar{x}^{\forall}$. Note that, if $|\bar{x}_2^{\exists}| = 0$, then $\bar{x}^{\forall} = \bar{x}_2 \leq \bar{x}_1$. When we are only interested in emphasizing the relation names involved in an IND, we simply write $r_1(\bar{x}_1) \rightarrow r_2(\bar{x}_2)$ or $r_1 \rightarrow r_2$.
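The sequence operations introduced above are easy to mechanize. The following Python sketch (the function names are ours, not the paper's) represents variables as strings and sequences as lists, and applies the operations to the IND $r_1(X_1, X_3, X_2) \rightarrow \exists X_4\ r_2(X_4, X_2, X_1)$ that is used as an example in Section 2.3.

```python
# Sequence notation of Section 2, with variables represented as strings.


def minus(xs, ys):
    """xs - ys: drop from xs every variable that appears in ys (order preserved)."""
    banned = set(ys)
    return [x for x in xs if x not in banned]


def leq(xs, ys):
    """xs <= ys: every variable of xs also appears in ys."""
    return set(xs) <= set(ys)


def project(seq, pi):
    """seq^pi: keep only the elements whose 1-based position is in pi (sequences or tuples)."""
    return [v for i, v in enumerate(seq, start=1) if i in pi]


# IND r1(X1, X3, X2) -> exists X4 r2(X4, X2, X1), with universal variables X1, X2.
x1 = ["X1", "X3", "X2"]
x2 = ["X4", "X2", "X1"]
x_forall = ["X1", "X2"]
x2_exists = minus(x2, x_forall)  # ['X4']: the existentially quantified part of x2
print(x2_exists, leq(x_forall, x1), leq(x_forall, x2), project(x1, {1, 3}))
```

The same `project` function also covers tuples of values ($t^{\pi}$), since it only relies on positions.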
A database $F$ is said to be consistent w.r.t. $\mathcal{R}$ if all ICs are satisfied. A conjunctive query $cq(\bar{x})$ over $\mathcal{R}$ is a formula of the form $\exists \bar{x}_1^{\exists}, \ldots, \bar{x}_m^{\exists}\ r_1(\bar{x}_1) \wedge \ldots \wedge r_m(\bar{x}_m) \wedge \sigma(\bar{x}_1, \ldots, \bar{x}_m)$, where $\bar{x}_i^{\exists} \leq \bar{x}_i$ for each $i \in [1..m]$, $\bar{w} = \bar{x}_1 - \bar{x}_1^{\exists}, \ldots, \bar{x}_m - \bar{x}_m^{\exists}$ are the free variables of $cq$, and $\bar{x}$ contains only and all the variables of $\bar{w}$ (with no duplicates, and possibly in a different order). A union of conjunctive queries $q(\bar{x})$ is a formula of the form $cq_1(\bar{x}) \vee \ldots \vee cq_n(\bar{x})$. In the following, for simplicity, the term query refers to a union of conjunctive queries, unless otherwise specified. Given a database $F$ for $\mathcal{R}$ and a query $q(\bar{x})$, the answer to $q$ is the set of tuples of values $ans(q, F) = \{t : F \models q(t)\}$.

2.1 The Data Integration Model

A data integration system is formalized (Lenzerini 2002) as a triple $\mathcal{I} = \langle \mathcal{G}, \mathcal{S}, \mathcal{M} \rangle$, where
• $\mathcal{G}$ is the global schema. A global database for $\mathcal{I}$ is any database for $\mathcal{G}$;
• $\mathcal{S}$ is the source schema. A source database for $\mathcal{I}$ is any database consistent w.r.t. $\mathcal{S}$;
• $\mathcal{M}$ is the global-as-view (GAV) mapping, which associates each element $g$ in $names(\mathcal{G})$ with a union of conjunctive queries over $\mathcal{S}$.
Let $F$ be a source database for $\mathcal{I}$. The retrieved global database is the database for $\mathcal{G}$ obtained by evaluating the mapping over $F$:

$$ret(\mathcal{I}, F) = \{\, g(t) : g \in names(\mathcal{G}) \wedge t \in ans(q, F) \wedge q \in \mathcal{M}(g) \,\}$$

Note that, when source data are combined in a unified schema with its own ICs, the retrieved global database might be inconsistent. In the following, when it is clear from the context, we simply use the symbol $\mathcal{D}$ to denote the retrieved global database $ret(\mathcal{I}, F)$. In fact, all results provided in the paper hold for any database $\mathcal{D}$ complying with some schema $\mathcal{G}$ but possibly inconsistent w.r.t. the constraints of $\mathcal{G}$.
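As an illustration of $ans(q, F)$ and $ret(\mathcal{I}, F)$, the following Python sketch evaluates conjunctive queries without comparison atoms by a naive nested-loop join, and uses it to compute the retrieved global database of the bank scenario of Example 1. The query encoding (a list of head variables plus a list of body atoms), the convention that variables start with an uppercase letter and `_` is anonymous, and all function names are assumptions of this sketch; it is not the evaluation strategy of the system described in the paper.

```python
def is_var(term):
    # Convention assumed here: variables start with an uppercase letter, "_" is
    # an anonymous variable, and every other term is a constant.
    return isinstance(term, str) and (term == "_" or term[:1].isupper())


def match(atom_args, fact_args, subst):
    """Extend substitution subst so that the atom matches the fact, or return None."""
    if len(atom_args) != len(fact_args):
        return None
    s = dict(subst)
    for term, value in zip(atom_args, fact_args):
        if term == "_":
            continue
        if is_var(term):
            if s.setdefault(term, value) != value:  # already bound to another value
                return None
        elif term != value:                          # constant mismatch
            return None
    return s


def answer_cq(head, body, facts):
    """ans(cq, F) for a conjunctive query without comparison atoms (naive join)."""
    substs = [{}]
    for rel, args in body:
        substs = [s2 for s in substs
                  for (r, vals) in facts if r == rel
                  for s2 in [match(args, vals, s)] if s2 is not None]
    return {tuple(s[v] for v in head) for s in substs}


def retrieve(mapping, source):
    """ret(I, F): evaluate every conjunctive query in M(g), for each global relation g."""
    return {(g, t)
            for g, ucq in mapping.items()
            for head, body in ucq
            for t in answer_cq(head, body, source)}


# GAV mapping and source data of Example 1 (bank scenario).
mapping = {
    "e": [(["Xc", "Xn"], [("emp", ["Xc", "Xn"])]),
          (["Xc", "Xn"], [("employee", ["Xc", "Xn", "_"])])],
    "m": [(["Xc"], [("man", ["Xc", "_"])]),
          (["Xc"], [("employee", ["Xc", "_", "manager"])])],
}
source = {
    ("emp", ("e1", "john")), ("emp", ("e2", "mary")), ("emp", ("e3", "willy")),
    ("man", ("e1", "john")),
    ("employee", ("e1", "ann", "manager")),
    ("employee", ("e2", "mary", "manager")),
    ("employee", ("e3", "rose", "emp")),
}
global_db = retrieve(mapping, source)
# The key of e is violated by every code carried by two distinct e-tuples.
e_codes = [args[0] for rel, args in global_db if rel == "e"]
key_conflicts = {c for c in e_codes if e_codes.count(c) > 1}
print(sorted(global_db))
print(key_conflicts)
```

Since `global_db` is a set, two e-tuples sharing a code necessarily differ in the name, which is why counting codes is enough to expose the key violations on e1 and e3 discussed in Example 1.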
Example 1 Consider a bank association that wants to unify the databases of two branches. The first (source) database models managers by using a relation man(code, name) and employees by a relation emp(code, name), where code is a primary key for both tables. The second database stores the same data in a relation employee(code, name, role). Suppose that the data have to be integrated under a global schema with two relations m(code) and e(code, name), where the global ICs are:
• $\forall X_1, X_2, X_3\ \neg[\, e(X_1, X_2) \wedge e(X_1, X_3) \wedge X_2 \neq X_3 \,]$, namely, code is the key of e;
• $\forall X_1\ [\, m(X_1) \rightarrow \exists X_2\ e(X_1, X_2) \,]$, i.e., an IND imposing that each manager code must be an employee code as well.
The mapping is defined by the following Datalog rules (as usual, see Abiteboul et al. 1995):

    e(Xc, Xn) :- emp(Xc, Xn).
    m(Xc) :- man(Xc, _).
    e(Xc, Xn) :- employee(Xc, Xn, _).
    m(Xc) :- employee(Xc, _, 'manager').

Assume that emp stores the tuples ('e1','john'), ('e2','mary'), ('e3','willy'), man stores ('e1','john'), and employee stores ('e1','ann','manager'), ('e2','mary','manager'), ('e3','rose','emp'). It is easy to verify that, although the source databases are consistent w.r.t. local constraints, the global database obtained by evaluating the mapping violates the key constraint on e, as both john and ann have the same code e1, and both willy and rose have the same code e3 in table e. ⊓⊔

2.2 Consistent Query Answering under different semantics

In case a database $\mathcal{D}$ violates ICs, one can still be interested in querying the "consistent" information originating from $F$. One possibility is to "repair" $\mathcal{D}$ (by inserting or deleting tuples) in such a way that all the ICs are satisfied. But there are several ways to "repair" $\mathcal{D}$.
As an example, in order to satisfy an IND of the form $r_1 \rightarrow r_2$, one might either remove violating tuples from $r_1$ or insert new tuples in $r_2$. Moreover, the repairing strategy depends on the particular semantic assumption made on the data integration system. Semantic assumptions may range from (strict) soundness to (strict) completeness. Roughly speaking, completeness complies with the closed world assumption, where missing facts are assumed to be false; on the contrary, soundness complies with the open world assumption, where $\mathcal{D}$ may be incomplete. We next define consistent query answering under some relevant semantics, namely loosely-exact, loosely-sound and CM-complete (Arenas et al. 2003; Calì et al. 2003a; Chomicki and Marcinkowski 2005). More formally, let $\Sigma$ denote a semantics and $\mathcal{D}$ a possibly inconsistent database for $\mathcal{G}$; a database $\mathcal{B}$ is said to be a $\Sigma$-repair for $\mathcal{D}$ if it is consistent w.r.t. $\mathcal{G}$ and one of the following conditions holds:
1. $\Sigma$ = CM-complete, $\mathcal{B} \subseteq \mathcal{D}$, and there is no $\mathcal{B}' \subseteq \mathcal{D}$ such that $\mathcal{B}'$ is consistent and $\mathcal{B}' \supset \mathcal{B}$;
2. $\Sigma$ = loosely-sound, and there is no $\mathcal{B}'$ such that $\mathcal{B}'$ is consistent and $\mathcal{B}' \cap \mathcal{D} \supset \mathcal{B} \cap \mathcal{D}$;
3. $\Sigma$ = loosely-exact, and there is no $\mathcal{B}'$ such that $\mathcal{B}'$ is consistent and $\mathcal{B}' \ominus \mathcal{D} \subset \mathcal{B} \ominus \mathcal{D}$.
The CM-complete semantics allows deletions only, and each repair deletes a set of facts that is minimal w.r.t. set inclusion, so that no more data than necessary is lost; insertions are not allowed. The loosely-sound semantics allows insertions, without minimizing them, and a minimal (w.r.t. set inclusion) set of deletions. Finally, the loosely-exact semantics allows both insertions and deletions, minimizing (w.r.t. set inclusion) the symmetric difference between $\mathcal{D}$ and the repairs.

Definition 1 Let $\mathcal{D}$ be a database for a schema $\mathcal{G}$, and $\Sigma$ be a semantics. The consistent answer to a query $q$ w.r.t. $\mathcal{D}$ is the set

$$ans_{\Sigma}(q, \mathcal{G}, \mathcal{D}) = \{\, t : t \in ans(q, \mathcal{B}) \text{ for each } \Sigma\text{-repair } \mathcal{B} \text{ for } \mathcal{D} \,\}$$

Consistent Query Answering (CQA) is the problem of computing $ans_{\Sigma}(q, \mathcal{G}, \mathcal{D})$.
⊓⊔

Observe that other semantics have been considered in the literature, like sound, complete, exact, loosely-complete, etc. (Calì et al. 2003a); however, some of them are trivial for CQA. As an example, in the exact semantics CQA makes sense only if the retrieved database is already consistent with the global constraints, whereas in the complete and loosely-complete semantics CQA always returns an empty answer. Note that the semantics considered in this paper address a wide and significant range of ways to repair the retrieved database, all of which are relevant for CQA.

Example 2 Following Example 1, the retrieved global database admits exactly the following repairs under the CM-complete semantics:
$\mathcal{B}_1$ = { e('e2','mary'), e('e1','john'), e('e3','willy'), m('e1'), m('e2') }
$\mathcal{B}_2$ = { e('e2','mary'), e('e1','john'), e('e3','rose'), m('e1'), m('e2') }
$\mathcal{B}_3$ = { e('e2','mary'), e('e1','ann'), e('e3','willy'), m('e1'), m('e2') }
$\mathcal{B}_4$ = { e('e2','mary'), e('e1','ann'), e('e3','rose'), m('e1'), m('e2') }
The query $m(X)$, asking for the list of manager codes, then has both e1 and e2 as consistent answers, whereas the query $e(X, Y)$, asking for the list of employees, has only e('e2','mary') as consistent answer, since it is the only e-tuple occurring in every CM-complete repair. ⊓⊔

2.3 Restricted Classes of Integrity Constraints

The problem of computing CQA under general combinations of ICs is undecidable (Abiteboul et al. 1995). However, restrictions on ICs can be imposed to retain decidability and to identify tractable cases.

Definition 2 Let $r$ be a relation name of arity $n$, and $\pi$ be a set of $m \leq n$ indices from $I = \{1, \ldots, n\}$.
A key dependency (KD) for $r$ consists of a set of $n - m$ DCs, exactly one for each index $i \in I - \pi$, of the form $\forall \bar{x}_1, \bar{x}_2\ \neg(r(\bar{x}_1) \wedge r(\bar{x}_2) \wedge \bar{x}_1^{i} \neq \bar{x}_2^{i})$, where no variable occurs twice in each $\bar{x}_k$ ($1 \leq k \leq 2$), $|\bar{x}_1| = |\bar{x}_2| = n$, the sequence $\bar{x}_1^{\pi}$ exactly coincides with $\bar{x}_2^{\pi}$, and $\bar{x}_1^{j}$ is distinct from $\bar{x}_2^{j}$ for each $j \in I - \pi$. The set $\pi$ is called the primary key of $r$ and is denoted by $key(r)$. We assume that at most one KD is specified for each relation (Calì et al. 2003a). Finally, for each relation name $r'$ for which no DC is explicitly specified, we say, without loss of generality, that $key(r') = \{1, \ldots, arity(r')\}$. ⊓⊔

Definition 3 Given an inclusion dependency $d$ of the form $\forall \bar{x}^{\forall}\ [r_1(\bar{x}_1) \rightarrow \exists \bar{x}_2^{\exists}\ r_2(\bar{x}_2)]$, we denote by $\pi^d_L \subseteq \{1, \ldots, arity(r_1)\}$ and $\pi^d_R \subseteq \{1, \ldots, arity(r_2)\}$ the two sets of indices induced by the positions of the variables $\bar{x}^{\forall}$ in $\bar{x}_1$ and $\bar{x}_2$, respectively. More formally, $\pi^d_L = \{i : \bar{x}_1^{i} \text{ is universally quantified in } d\}$ and $\pi^d_R = \{i : \bar{x}_2^{i} \text{ is universally quantified in } d\}$. ⊓⊔

For example, let $d$ denote the IND $\forall X_1, X_2\ [r_1(X_1, X_3, X_2) \rightarrow \exists X_4\ r_2(X_4, X_2, X_1)]$. We have that $\pi^d_L = \{1, 3\}$ and $\pi^d_R = \{2, 3\}$.

Definition 4 An IND $d$ is said to be
• a foreign key (FK) if $\pi^d_R = key(r_2)$ (Abiteboul et al. 1995);
• a foreign superkey (FSK) if $\pi^d_R \supseteq key(r_2)$ (Levene and Vincent 2000);
• non-key-conflicting (NKC) if $\pi^d_R \not\supset key(r_2)$ (Calì et al. 2003a). ⊓⊔

Definition 5 An FSK $d$ of the form $r_1 \rightarrow r_2$ is said to be safe (SFSK) if $\pi^d_L \subseteq key(r_1)$. In particular, if $d$ is a safe FK, we call it an SFK. ⊓⊔

For example, let $d$ denote the FSK $\forall X_1, X_2\ [r_1(X_1, X_3, X_2) \rightarrow \exists X_4\ r_2(X_4, X_2, X_1)]$, where $key(r_2) = \{3\}$. Thus, if $key(r_1) = \{1, 3\}$, $d$ is an SFSK, whereas if $key(r_1) = \{1, 2\}$, $d$ is not an SFSK.
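Definitions 3 to 5 reduce to simple comparisons between index sets. The hypothetical helper below (our own encoding: an IND is a tuple of the two relation names, their variable lists, and the set of universally quantified variables) computes $\pi^d_L$ and $\pi^d_R$ and derives the labels FK, FSK, SFSK, SFK and NKC. It reproduces both cases of the example above and classifies the IND of Example 1.

```python
def positions(variables, universal):
    """1-based positions of the universally quantified variables (Definition 3)."""
    return {i for i, v in enumerate(variables, start=1) if v in universal}


def classify_ind(ind, key):
    """Return (pi_L, pi_R, labels) for an IND, following Definitions 3-5.

    ind = (r1, x1, r2, x2, universal); key maps each relation name to the set
    of its 1-based key positions.
    """
    r1, x1, r2, x2, universal = ind
    pi_l, pi_r = positions(x1, universal), positions(x2, universal)
    labels = set()
    if pi_r == key[r2]:
        labels.add("FK")
    if pi_r >= key[r2]:              # pi_R is a superset of (or equal to) key(r2)
        labels.add("FSK")
        if pi_l <= key[r1]:          # safe: pi_L is contained in key(r1)
            labels.add("SFSK")
            if "FK" in labels:
                labels.add("SFK")
    if not pi_r > key[r2]:           # pi_R is not a proper superset of key(r2)
        labels.add("NKC")
    return pi_l, pi_r, labels


# The IND r1(X1, X3, X2) -> exists X4 r2(X4, X2, X1) of the text, with key(r2) = {3}.
d = ("r1", ["X1", "X3", "X2"], "r2", ["X4", "X2", "X1"], {"X1", "X2"})
print(classify_ind(d, {"r1": {1, 3}, "r2": {3}}))  # labels: FSK and SFSK
print(classify_ind(d, {"r1": {1, 2}, "r2": {3}}))  # labels: FSK only (not safe)
# The IND m(X1) -> exists X2 e(X1, X2) of Example 1, where key(m) = key(e) = {1}.
print(classify_ind(("m", ["X1"], "e", ["X1", "X2"], {"X1"}), {"m": {1}, "e": {1}}))
```

The last call labels the IND of Example 1 as an SFK (and hence also FK, FSK, SFSK and NKC), so that example falls within case IV of Theorem 3.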
Table 1 summarizes known and new results about decidability and complexity of CQA under relevant classes of ICs and the three semantic assumptions considered in this paper. In particular, given a query $q$ (without comparison atoms if $\Sigma \in \{\text{loosely-sound}, \text{loosely-exact}\}$), we refer to the decision problem of establishing whether a tuple from $dom(\mathcal{D})$ belongs to $ans_{\Sigma}(q, \mathcal{G}, \mathcal{D})$ or not. Note that Chomicki and Marcinkowski (2005) have proved computability and complexity of CQA for the CM-complete semantics in the case of conjunctive queries with comparison predicates. However, since in such a setting there is a finite number of repairs, each of finite size, their results straightforwardly hold for unions of conjunctive queries as well. New decidability and complexity results for CQA under KDs and SFSKs only, with $\Sigma \in \{\text{loosely-sound}, \text{loosely-exact}\}$, are proved in Section 2.4.

Table 1. Data Complexity of CQA (distinguishing between cyclic/acyclic INDs)

| DCs | INDs | loosely-sound    | loosely-exact    | CM-complete                    |
|-----|------|------------------|------------------|--------------------------------|
| no  | any  | in PTIME (1)     | in PTIME (1)     | in PTIME (2)                   |
| KD  | no   | coNP-c (1)       | coNP-c (1)       | coNP-c (2)                     |
| KD  | NKC  | coNP-c (1)       | $\Pi^p_2$-c (1)  | in $\Pi^p_2$ (2) / in coNP (2) |
| KD  | SFSK | in $\Pi^p_2$ (3) | in $\Pi^p_2$ (3) | in $\Pi^p_2$ (2) / in coNP (2) |
| KD  | any  | undec. (1)       | undec. (1)       | in $\Pi^p_2$ (2) / in coNP (2) |
| any | any  | undec. (4)       | undec. (4)       | $\Pi^p_2$-c (2) / coNP-c (2)   |

(1) Calì et al. 2003a; (2) Chomicki and Marcinkowski 2005; (3) Section 2.4; (4) Abiteboul et al. 1995.

2.4 Loosely-exact and Loosely-sound semantics under KD and SFSK

In this section we provide new decidability and complexity results for CQA under both the loosely-exact and the loosely-sound semantics with KDs and SFSKs. In the rest of the section we always denote by:
• $\mathcal{G}$, a schema containing KDs and SFSKs only;
• $\mathcal{D}$, a possibly inconsistent database for $\mathcal{G}$;
• $q$, a union of conjunctive queries without comparison atoms;
• $\Sigma \in \{\text{loosely-exact}, \text{loosely-sound}\}$.

We first show that, under the aforementioned hypotheses, the size of each repair is finite.

Definition 6 Let $\mathcal{B}$ be a $\Sigma$-repair for $\mathcal{D}$ and $i \geq 0$ be a natural number. We inductively define the sets $\mathcal{B}_i$ as follows:
1. If $i = 0$, then $\mathcal{B}_0 = \mathcal{B} \cap \mathcal{D}$.
2. If $i > 0$, then $\mathcal{B}_i \subseteq \mathcal{B} - (\mathcal{B}_0 \cup \ldots \cup \mathcal{B}_{i-1})$ is arbitrarily chosen in such a way that its facts are necessary and sufficient for satisfying all the INDs in $constr(\mathcal{G})$ that are violated in $\mathcal{B}_0 \cup \ldots \cup \mathcal{B}_{i-1}$.
Observe that $\mathcal{B} = \bigcup_{i \geq 0} \mathcal{B}_i$ and that $\mathcal{B}_i \cap \mathcal{B}_j = \emptyset$ for each $j \neq i$. ⊓⊔

Lemma 1 Let $\mathcal{B}$ be a $\Sigma$-repair for $\mathcal{D}$; then
1. the key of each fact in $\mathcal{B}$ only contains values from $dom(\mathcal{D})$;
2. $|\mathcal{B}|$ is finite.

Proof (1) Let $i > 0$ be a natural number. Let $r_i(t_i)$ be a fact in $\mathcal{B}_i$ such that there is an index $j \in key(r_i)$ for which $t_i^{\,j} \notin dom(\mathcal{B}_0)$. Let $r_{i-1}(t_{i-1})$ be one of the facts in $\mathcal{B}_{i-1}$ that forces the presence of $r_i(t_i)$ in $\mathcal{B}_i$ for satisfying some IND, say $d$. (Note that, by Definition 6, there must be at least one such fact, because otherwise $r_i(t_i)$ would be unnecessary and $\mathcal{B}_i$ would violate condition 2.) Moreover, since $d$ is a safe FSK, there must exist an index $k \in key(r_{i-1})$ such that $t_i^{\,j} = t_{i-1}^{\,k}$. Thus, just like $r_i(t_i)$, the fact $r_{i-1}(t_{i-1})$ contains a value not in $dom(\mathcal{B}_0)$ inside its key. Since $i$ has been chosen arbitrarily, repeating this argument down to $i = 0$ shows that the value $t_i^{\,j}$ has to occur in the key of a fact of $\mathcal{B}_0$, which is clearly a contradiction.
(2) The key of each fact in $\mathcal{B}$ can only contain values from $dom(\mathcal{B}_0)$, and, since $\mathcal{B}$ satisfies the KDs, each relation contains at most one fact per key value. Moreover, $|dom(\mathcal{B}_0)| \leq |\mathcal{B}_0| \cdot \alpha$, where $\alpha = \max\{arity(g) : g \in names(\mathcal{G})\}$. Hence

$$|\mathcal{B}| \leq |names(\mathcal{G})| \cdot |dom(\mathcal{B}_0)|^{\alpha} \leq |names(\mathcal{G})| \cdot (\alpha \cdot |\mathcal{B}_0|)^{\alpha} \leq |names(\mathcal{G})| \cdot (\alpha \cdot |\mathcal{D}|)^{\alpha}.$$

We next characterize representative databases for $\Sigma$-repairs.

Definition 7 Let $\mathcal{B}$ be a $\Sigma$-repair for $\mathcal{D}$.
We denote by $homo(\mathcal{B})$ the (possibly infinite) set of databases defined in such a way that $\mathcal{B}' \in homo(\mathcal{B})$ if and only if:
• $\mathcal{B}'$ can be obtained from $\mathcal{B}$ by replacing each value (if any) that is not in $dom(\mathcal{D})$ with a value from $\Gamma - dom(\mathcal{D})$; and
• none of the values in $\Gamma - dom(\mathcal{D})$ occurs twice in $\mathcal{B}'$.
Finally, we denote by $h_{\mathcal{B},\mathcal{B}'} : dom(\mathcal{B}') \rightarrow dom(\mathcal{B})$ the function (homomorphism) associating values in $dom(\mathcal{B}')$ with values in $dom(\mathcal{B})$, where $h_{\mathcal{B},\mathcal{B}'}(\alpha) = \alpha$ for each $\alpha \in dom(\mathcal{D}) \cap dom(\mathcal{B}')$. ⊓⊔

Note that, since (by Lemma 1) the key of each fact in $\mathcal{B}$ only contains values from $dom(\mathcal{D})$, $|\mathcal{B}'| = |\mathcal{B}|$ holds. For example, if $\mathcal{B} = \{p(1, \varepsilon_1, \varepsilon_2), q(2, \varepsilon_2, \varepsilon_1)\}$ with $dom(\mathcal{D}) = \{1, 2\}$ and $key(p) = key(q) = \{1\}$, then all of the following databases are in $homo(\mathcal{B})$: $\{p(1, \varepsilon_1, \varepsilon_3), q(2, \varepsilon_2, \varepsilon_4)\}$, $\{p(1, \varepsilon_4, \varepsilon_2), q(2, \varepsilon_3, \varepsilon_1)\}$ and $\{p(1, \varepsilon_5, \varepsilon_6), q(2, \varepsilon_7, \varepsilon_8)\}$.

Lemma 2 If $\mathcal{B}$ is a $\Sigma$-repair for $\mathcal{D}$, then each $\mathcal{B}' \in homo(\mathcal{B})$ also is.

Proof Let $\mathcal{B}' \in homo(\mathcal{B})$. First of all, we prove that $\mathcal{B}'$ is consistent w.r.t. $\mathcal{G}$. In particular, since the key of each fact in $\mathcal{B}$ only contains values from $dom(\mathcal{D})$ (by Lemma 1), $\mathcal{B}'$ cannot violate any KD (by Definition 7). Moreover, since each IND has to be satisfied through values of a key (by definition of safe FSKs), and since the key of each fact in $\mathcal{B}$ only contains values from $dom(\mathcal{D})$ (by Lemma 1), $\mathcal{B}'$ cannot violate any IND (by Definition 7). We now prove that $\mathcal{B}'$ is a repair, first for the loosely-sound semantics and then for the loosely-exact semantics.
[loosely-sound] If $\Sigma$ = loosely-sound, observe that $\mathcal{B}' \cap \mathcal{D} = \mathcal{B} \cap \mathcal{D}$, by definition of $homo(\mathcal{B})$. Thus, if $\mathcal{B}'$ were consistent but not a loosely-sound repair, there would exist a loosely-sound repair $\mathcal{B}''$ such that $\mathcal{B}'' \cap \mathcal{D} \supset \mathcal{B}' \cap \mathcal{D} = \mathcal{B} \cap \mathcal{D}$, contradicting the fact that $\mathcal{B}$ is a loosely-sound repair.
[loosely-exact] If $\Sigma$ = loosely-exact, assume that $\mathcal{B}$ is a loosely-exact repair but $\mathcal{B}'$ (although consistent w.r.t. $\mathcal{G}$) is not. By definition, there must be a loosely-exact repair $\mathcal{B}''$ such that $\mathcal{B}'' \ominus \mathcal{D} \subset \mathcal{B}' \ominus \mathcal{D}$. In particular, we distinguish three cases:
(1) $\mathcal{B}'' - \mathcal{D} = \mathcal{B}' - \mathcal{D}$ and $\mathcal{D} - \mathcal{B}'' \subset \mathcal{D} - \mathcal{B}'$;
(2) $\mathcal{B}'' - \mathcal{D} \subset \mathcal{B}' - \mathcal{D}$ and $\mathcal{D} - \mathcal{B}'' = \mathcal{D} - \mathcal{B}'$;
(3) $\mathcal{B}'' - \mathcal{D} \subset \mathcal{B}' - \mathcal{D}$ and $\mathcal{D} - \mathcal{B}'' \subset \mathcal{D} - \mathcal{B}'$.
CASE 1: Since, by Definition 7, for each fact in $\mathcal{B}$ there is a fact in $\mathcal{B}'$ with the same key, if we could add the facts in $\mathcal{B}'' - \mathcal{B}'$ to $\mathcal{B}'$ without violating any KD, then such facts could also be added to $\mathcal{B}$ without violating any KD. Moreover, if we could add to $\mathcal{B}'$ the facts in $\mathcal{B}'' - \mathcal{B}'$ without violating any IND, then such facts could also be added to $\mathcal{B}$ preserving consistency. This follows from the definition of safe FSKs (because each IND has to be satisfied through values of a key), from Lemma 1 (because the key of each fact in a loosely-exact repair only contains values from $dom(\mathcal{D})$), and from Definition 7 (because for each fact in $\mathcal{B}'$ there is a fact in $\mathcal{B}$ with the same key and with the same values from $dom(\mathcal{D})$). Consequently, we could add all the facts in $\mathcal{B}'' - \mathcal{B}'$ to $\mathcal{B}$ preserving consistency. But this is not possible, since $\mathcal{B}$ is a loosely-exact repair.
CASE 2: Since $\mathcal{B}'$ contains unnecessary facts (those in $\mathcal{B}' - \mathcal{B}''$), or equivalently the facts in $\mathcal{B}''$ do not violate any IND, the corresponding facts in $\mathcal{B}$ do not violate any IND, by Lemma 1 and by Definition 7. Consequently, if we removed from $\mathcal{B}$ each fact $f$ such that some fact $f' \in \mathcal{B}' - \mathcal{B}''$ is homomorphic to $f$, we would obtain a database preserving consistency and with a smaller symmetric difference than $\mathcal{B}$. But this is not possible, since $\mathcal{B}$ is a loosely-exact repair.
CASE 3: Analogous considerations can be made by combining Case 1 and Case 2.
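The construction behind Definition 7 can be sketched as follows: every occurrence of a value outside $dom(\mathcal{D})$ is replaced by a distinct fresh value, and a dictionary recording where each value came from plays the role of $h_{\mathcal{B},\mathcal{B}'}$. In this Python sketch the fresh values are the strings n1, n2, ..., which are assumed to stand for values of $\Gamma - dom(\mathcal{D})$; the function name is hypothetical, and the input is the example given after Definition 7.

```python
def homomorphic_copy(b, dom_d, prefix="n"):
    """Build one database B' in homo(B) (Definition 7).

    Every occurrence of a value outside dom(D) is replaced by a fresh value that
    is not in dom(D) and is used only once. Also returns a dict h mapping each
    value of B' back to the value of B it replaces (the homomorphism h_{B,B'}).
    """
    b_prime, h = set(), {}
    counter = 0
    for rel, args in sorted(b, key=repr):      # deterministic processing order
        new_args = []
        for v in args:
            if v in dom_d:
                new_args.append(v)             # values of dom(D) are preserved
                h[v] = v
            else:
                counter += 1
                while f"{prefix}{counter}" in dom_d:
                    counter += 1               # skip names that clash with dom(D)
                fresh = f"{prefix}{counter}"
                new_args.append(fresh)
                h[fresh] = v
        b_prime.add((rel, tuple(new_args)))
    return b_prime, h


# The example after Definition 7: dom(D) = {1, 2}, key(p) = key(q) = {1}.
B = {("p", (1, "eps1", "eps2")), ("q", (2, "eps2", "eps1"))}
B1, h = homomorphic_copy(B, {1, 2})
print(sorted(B1, key=repr))
print(h)
```

Because keys only contain values of $dom(\mathcal{D})$ (Lemma 1), no two facts of $\mathcal{B}$ can collapse into one, which is why the copy has the same size as the input.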
We next define the finite database $\mathcal{D}^*$, having among its subsets a number of $\Sigma$-repairs sufficient for solving CQA.

Definition 8 Let $c$ be a value in $\Gamma - dom(\mathcal{D})$. Consider the largest (possibly inconsistent) database, say $C$, constructible on the domain $dom(\mathcal{D}) \cup \{c\}$ such that $f \in C$ iff the value $c$ does not appear in the key of $f$. Let $N$ be a fixed set of values arbitrarily chosen from $\Gamma - dom(\mathcal{D})$ whose cardinality is equal to the number of occurrences of $c$ in $C$. We denote by $\mathcal{D}^*$ one possible database for $\mathcal{G}$ obtained from $C$ by replacing each occurrence of $c$ with a value from $N$ in such a way that each value in $N$ occurs exactly once in $\mathcal{D}^*$. (Hence, $|C| = |\mathcal{D}^*|$.) ⊓⊔

For example, if $dom(\mathcal{D}) = \{1, 2\}$ and $names(\mathcal{G}) = \{p\}$ with $arity(p) = 2$ and $key(p) = \{1\}$, then $C = \{p(1,1), p(1,2), p(1,c), p(2,1), p(2,2), p(2,c)\}$. Let us fix $N = \{\varepsilon_1, \varepsilon_2\}$. Thus, $\mathcal{D}^*$ has the following form: $\{p(1,1), p(1,2), p(1,\varepsilon_1), p(2,1), p(2,2), p(2,\varepsilon_2)\}$.

Proposition 1 The following hold, where $\alpha = \max\{arity(g) : g \in names(\mathcal{G})\}$ as in Lemma 1:
• $$|N| = \sum_{g \in names(\mathcal{G})} (arity(g) - |key(g)|) \cdot |dom(\mathcal{D})|^{|key(g)|} \cdot (|dom(\mathcal{D})| + 1)^{arity(g) - |key(g)| - 1}$$
• $$|\mathcal{D}^*| \leq \sum_{g \in names(\mathcal{G})} (|dom(\mathcal{D})| + 1)^{arity(g)} \leq \sum_{g \in names(\mathcal{G})} (\alpha \cdot |\mathcal{D}| + 1)^{arity(g)}$$

Lemma 3 If $\mathcal{B}$ is a $\Sigma$-repair for $\mathcal{D}$, then there exists $\mathcal{B}' \in homo(\mathcal{B})$ such that $\mathcal{B}' \subseteq \mathcal{D}^*$.

Proof $\mathcal{B}'$ can be obtained from $\mathcal{B}$ by replacing each fact $r(t_1) \in \mathcal{B}$ with the unique fact $r(t_2) \in \mathcal{D}^*$ such that, for each $i \in \{1, \ldots, arity(r)\}$, either $t_2^{\,i} = t_1^{\,i}$, if $t_1^{\,i} \in dom(\mathcal{D})$, or $t_2^{\,i} \in N$, if $t_1^{\,i} \notin dom(\mathcal{D})$. Moreover, note that, since $\mathcal{B}$ cannot contain two facts with the same key and since keys only contain values from $dom(\mathcal{D})$, each fact in $\mathcal{D}^*$ can replace at most one fact in $\mathcal{B}$. Finally, $\mathcal{B}' \in homo(\mathcal{B})$ by Definition 7.
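A minimal Python sketch of Definition 8, assuming that the fresh values of $N$ can be named eps1, eps2, ... without clashing with $dom(\mathcal{D})$ (the function names are ours). It enumerates $C$ with a placeholder object for $c$, replaces each occurrence of $c$ by a new value, and compares $|N|$ with the formula of Proposition 1; on the example above it yields exactly the six facts listed.

```python
from itertools import product


def build_d_star(dom_d, schema, prefix="eps"):
    """Build one D* (Definition 8) and return it together with |N|.

    schema maps each relation name to (arity, key), key being a set of 1-based
    positions. Key positions range over dom(D); non-key positions also admit
    the placeholder c, and each occurrence of c becomes a distinct fresh value.
    """
    c = object()                               # stands for the value c of Definition 8
    values = sorted(dom_d)
    d_star, n_size = set(), 0
    for rel, (arity, key) in sorted(schema.items()):
        columns = [values if i in key else values + [c] for i in range(1, arity + 1)]
        for tup in product(*columns):          # enumerates the facts of C for rel
            new_tup = []
            for v in tup:
                if v is c:
                    n_size += 1
                    new_tup.append(f"{prefix}{n_size}")   # a fresh value from N
                else:
                    new_tup.append(v)
            d_star.add((rel, tuple(new_tup)))
    return d_star, n_size


def n_size_formula(dom_size, schema):
    """|N| according to Proposition 1."""
    total = 0
    for arity, key in schema.values():
        free = arity - len(key)                # number of non-key positions
        if free > 0:
            total += free * dom_size ** len(key) * (dom_size + 1) ** (free - 1)
    return total


# The example after Definition 8: dom(D) = {1, 2}, arity(p) = 2, key(p) = {1}.
schema = {"p": (2, {1})}
d_star, n = build_d_star({1, 2}, schema)
print(sorted(d_star, key=repr), n, n_size_formula(2, schema))
```

The enumeration makes the polynomial bound used later in Corollary 1 tangible: for a fixed schema, the number of generated facts grows as $|dom(\mathcal{D})|^{arity(g)}$ per relation.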
Lemma 4 Let $\mathcal{B}$ be a $\Sigma$-repair for $\mathcal{D}$, $\mathcal{B}' \in homo(\mathcal{B})$, $q$ be a query, and $t$ be a tuple of values from $dom(\mathcal{D})$. If $t \in ans(q, \mathcal{B}')$, then $t \in ans(q, \mathcal{B})$.

Proof Let $q_i$ be one of the conjunctions in $q$. If $t \in ans(q_i, \mathcal{B}')$, then there is a substitution $\mu'$ from the variables of $q_i$ to values in $\Gamma$ such that $\mathcal{B}' \models q_i(t)$. But since, by Definition 7, each fact in $\mathcal{B}'$ is univocally associated with a unique fact in $\mathcal{B}$ by preserving the values in $dom(\mathcal{D})$, and since all the extra values in $\mathcal{B}'$ are distinct, there must also be a substitution $\mu$ such that $\mathcal{B} \models q_i(t)$. In particular, for each variable $x$ in $q_i$, we can define $\mu(x) = h_{\mathcal{B},\mathcal{B}'}(\mu'(x))$, where $h_{\mathcal{B},\mathcal{B}'}$ is the homomorphism from $\mathcal{B}'$ to $\mathcal{B}$ (see Definition 7). Clearly, if $t \in ans(q_i, \mathcal{B}')$ for at least one $q_i$ in $q$, then $t \in ans(q, \mathcal{B}')$ too and, consequently, $t \in ans(q, \mathcal{B})$.

The next theorem states the decidability of CQA under both the loosely-exact and the loosely-sound semantics with KDs and SFSKs only.

Theorem 1 Let $q$ be a query, and $t$ a tuple from $dom(\mathcal{D})$. Let $\mathbb{B} \subseteq 2^{\mathcal{D}^*}$ denote the set of all $\Sigma$-repairs contained in $\mathcal{D}^*$. Then, $t \in ans_{\Sigma}(q, \mathcal{G}, \mathcal{D})$ iff $t \in ans(q, \mathcal{B})$ for all $\mathcal{B} \in \mathbb{B}$.

Proof ($\Rightarrow$) We have to prove that, if $t \in ans_{\Sigma}(q, \mathcal{G}, \mathcal{D})$, then $t \in ans(q, \mathcal{B})$ for each $\mathcal{B} \in \mathbb{B}$, or equivalently, if $t \notin ans(q, \mathcal{B})$ for some $\mathcal{B} \in \mathbb{B}$, then $t \notin ans_{\Sigma}(q, \mathcal{G}, \mathcal{D})$. This follows from the definition of $ans_{\Sigma}(q, \mathcal{G}, \mathcal{D})$ and from the fact that $\mathbb{B}$ only contains $\Sigma$-repairs.
($\Leftarrow$) We have to prove that, if $t \in ans(q, \mathcal{B})$ for each $\mathcal{B} \in \mathbb{B}$, then $t \in ans_{\Sigma}(q, \mathcal{G}, \mathcal{D})$. Assume that $t \in ans(q, \mathcal{B})$ for each $\mathcal{B} \in \mathbb{B}$ but $t \notin ans_{\Sigma}(q, \mathcal{G}, \mathcal{D})$. This would entail that there is a repair $\mathcal{B}_0$ such that $t \notin ans(q, \mathcal{B}_0)$.
But, since $t \notin ans(q, \mathcal{B}')$ for each $\mathcal{B}' \in homo(\mathcal{B}_0)$ (by Lemma 4), and since $\mathbb{B} \cap homo(\mathcal{B}_0)$ always contains a repair, say $\mathcal{B}''$ (by Lemma 3, $homo(\mathcal{B}_0)$ has an element contained in $\mathcal{D}^*$, which is a repair by Lemma 2), we have a contradiction, since $t \notin ans(q, \mathcal{B}'')$ has to hold whereas we have assumed that $t \in ans(q, \mathcal{B})$ for each $\mathcal{B} \in \mathbb{B}$.

Decidability and complexity results, under KDs and SFSKs only, follow from Theorem 1.

Corollary 1 Let $\mathcal{G}$ be a global schema containing KDs and SFSKs only, $\mathcal{D}$ be a possibly inconsistent database for $\mathcal{G}$, $q$ be a query, $\Sigma \in \{\text{loosely-exact}, \text{loosely-sound}\}$, and $t$ be a tuple of values from $dom(\mathcal{D})$. The problem of establishing whether $t \in ans_{\Sigma}(q, \mathcal{G}, \mathcal{D})$ is in $\Pi^p_2$ in data complexity.

Proof It suffices to prove that the problem of establishing whether $t \notin ans_{\Sigma}(q, \mathcal{G}, \mathcal{D})$ is in $\Sigma^p_2$. This can be done by (i) building $\mathcal{D}^*$, and (ii) guessing $\mathcal{B} \in 2^{\mathcal{D}^*}$ such that $\mathcal{B}$ is a $\Sigma$-repair and $t \notin ans(q, \mathcal{B})$. Since, by Proposition 1, $|\mathcal{D}^*| \in O(|\mathcal{D}|^{\alpha})$, where $\alpha = \max\{arity(g) : g \in names(\mathcal{G})\}$, step (i) (enumerating the facts of $\mathcal{D}^*$) can be done in polynomial time. Moreover, checking that $t \notin ans(q, \mathcal{B})$ can be done in PTIME. It remains to show that checking whether $\mathcal{B}$ is a $\Sigma$-repair can be done in coNP.
[loosely-exact] If $\Sigma$ = loosely-exact, this task corresponds to checking that there is no consistent $\mathcal{B}' \subseteq \mathcal{D} \cup \mathcal{B}$ such that $\mathcal{B}' \ominus \mathcal{D} \subset \mathcal{B} \ominus \mathcal{D}$; this is in coNP, since verifying that a guessed $\mathcal{B}'$ is consistent and satisfies this condition is doable in PTIME.
[loosely-sound] If $\Sigma$ = loosely-sound, this task corresponds to checking that there is no consistent $\mathcal{B}' \subseteq \mathcal{D}^*$ such that $\mathcal{B}' \cap \mathcal{D} \supset \mathcal{B} \cap \mathcal{D}$; again, verifying a guessed $\mathcal{B}'$ is doable in PTIME.
Then the thesis follows.

2.5 Equivalence of CQA under loosely-exact and CM-complete semantics

In this section we identify some relevant cases in which CQA under the loosely-exact and the CM-complete semantics coincide.
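To make the repair definitions of Section 2.2 and the relationship between the two semantics concrete, the following Python sketch enumerates repairs by brute force (exponential, so only suitable for toy inputs). It first recomputes the four CM-complete repairs and the consistent answers of Example 2, and then uses a small hypothetical instance of our own, not taken from the paper, with a DC $\neg[p(X) \wedge q(X)]$ and an IND $r(X) \rightarrow q(X)$ over $\mathcal{D} = \{p(a), r(a)\}$. There, the CM-complete semantics has the single repair $\{p(a)\}$, while the loosely-exact semantics also admits $\{r(a), q(a)\}$, so the consistent answer to $p(X)$ shrinks from $\{a\}$ to the empty set, as permitted by Corollary 2. Restricting candidate insertions to an explicit finite set is an assumption of the sketch; note also that this instance mixes a general DC with an IND and is therefore outside the cases of Theorem 3.

```python
from itertools import combinations


def subsets(facts):
    """All subsets of a finite set of facts (exponential: toy inputs only)."""
    ordered = sorted(facts)
    for k in range(len(ordered) + 1):
        for combo in combinations(ordered, k):
            yield frozenset(combo)


def cm_complete_repairs(d, consistent):
    """Consistent subsets of D that are maximal w.r.t. set inclusion."""
    cands = [b for b in subsets(d) if consistent(b)]
    return [b for b in cands if not any(b < o for o in cands)]


def loosely_exact_repairs(d, extra, consistent):
    """Consistent databases over D | extra whose symmetric difference with D is minimal.

    Restricting insertions to the finite set `extra` is an assumption of this
    sketch: the definition ranges over all databases for the schema.
    """
    d = frozenset(d)
    cands = [b for b in subsets(d | extra) if consistent(b)]
    return [b for b in cands if not any((o ^ d) < (b ^ d) for o in cands)]


def consistent_answers(repairs, relation):
    """Tuples of `relation` that occur in every repair (repairs must be non-empty)."""
    return set.intersection(*({args for rel, args in b if rel == relation} for b in repairs))


# Example 2: retrieved database of Example 1, key of e on the code, IND m -> e.
D_bank = frozenset({
    ("e", ("e1", "john")), ("e", ("e2", "mary")), ("e", ("e3", "willy")),
    ("e", ("e1", "ann")), ("e", ("e3", "rose")), ("m", ("e1",)), ("m", ("e2",)),
})


def bank_consistent(db):
    name_of = {}
    for rel, args in db:
        if rel == "e" and name_of.setdefault(args[0], args[1]) != args[1]:
            return False                       # two names for the same code
    return all(args[0] in name_of for rel, args in db if rel == "m")


cm_bank = cm_complete_repairs(D_bank, bank_consistent)
print(len(cm_bank), consistent_answers(cm_bank, "m"), consistent_answers(cm_bank, "e"))

# Hypothetical toy instance (not from the paper):
# DC not(p(X) and q(X)), IND r(X) -> q(X), D = {p(a), r(a)}.
D_toy = frozenset({("p", ("a",)), ("r", ("a",))})


def toy_consistent(db):
    p = {args for rel, args in db if rel == "p"}
    q = {args for rel, args in db if rel == "q"}
    r = {args for rel, args in db if rel == "r"}
    return not (p & q) and r <= q


cm_toy = cm_complete_repairs(D_toy, toy_consistent)
le_toy = loosely_exact_repairs(D_toy, frozenset({("q", ("a",))}), toy_consistent)
print(consistent_answers(cm_toy, "p"), consistent_answers(le_toy, "p"))
```

On both inputs every CM-complete repair also appears among the loosely-exact ones, in line with Lemma 5.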
Lemma 5
Given a database $\mathcal{D}$ for a schema $\mathcal{G}$, if $\mathcal{B}$ is a CM-complete repair for $\mathcal{D}$, then it is a loosely-exact repair for $\mathcal{D}$.

Proof
Suppose that $\mathcal{B}$ is a CM-complete repair for $\mathcal{D}$ (so it is consistent w.r.t. $\mathcal{G}$), but it is not a loosely-exact one. This means that its symmetric difference with $\mathcal{D}$ can still be reduced. However, by definition of the CM-complete semantics, $\mathcal{B}$ contains only tuples of $\mathcal{D}$, namely $\mathcal{B} - \mathcal{D} = \emptyset$. So the only way of "improving" it is to extend it with tuples from $\mathcal{D}$. This is not possible, because $\mathcal{B}$ is already maximal under the CM-complete semantics; that is, the addition of any other tuple of $\mathcal{D}$ would violate at least one IC.

Corollary 2
$ans_{\text{loosely-exact}}(q,\mathcal{G},\mathcal{D}) \subseteq ans_{\text{CM-complete}}(q,\mathcal{G},\mathcal{D})$.

Proof
This follows directly from Lemma 5, in light of Definition 1.

Theorem 2
There are cases where $ans_{\text{loosely-exact}}(q,\mathcal{G},\mathcal{D}) \subset ans_{\text{CM-complete}}(q,\mathcal{G},\mathcal{D})$.

Proof
This follows from Chomicki and Marcinkowski (2005), who show that the two semantics are different, and from Corollary 2.

Proposition 2
Let $\mathcal{B}$ be a database consistent w.r.t. a set of ICs $\mathcal{C}$.
1. If $\mathcal{C}$ contains DCs only, then each $\mathcal{B}' \subset \mathcal{B}$ is consistent w.r.t. $\mathcal{C}$ as well.
2. If $\mathcal{C}$ contains INDs only, then $\mathcal{B} \cup \mathcal{B}'$ is consistent w.r.t. $\mathcal{C}$ for each $\mathcal{B}'$ consistent w.r.t. $\mathcal{C}$.

Proof
(1) Deleting tuples cannot introduce new DC violations.
(2) Let $r(t)$ be a fact in $\mathcal{B}'$. Let $d_1$ be an IND of the form $r_1 \rightarrow r$ ($r \neq r_1$). Clearly, $r(t)$ cannot violate $d_1$ in any database, because $r$ is in the right-hand side of $d_1$. In particular, $r(t)$ cannot violate $d_1$ in $\mathcal{B} \cup \mathcal{B}'$. Let $d_2$ be an IND of the form $r \rightarrow r_2$ (possibly, $r = r_2$). Since $r(t)$ does not violate $d_2$ in $\mathcal{B}'$, it cannot violate $d_2$ in $\mathcal{B} \cup \mathcal{B}'$ either. The same argument applies, symmetrically, to the facts of $\mathcal{B}$.

Theorem 3
Given a database $\mathcal{D}$ for a schema $\mathcal{G}$, let $\mathcal{B}$ be a loosely-exact repair for $\mathcal{D}$, and let $\overline{\mathcal{B}} = \mathcal{B} \cap \mathcal{D}$.
There is a CM-complete repair $\mathcal{B}' \subseteq \overline{\mathcal{B}}$ for $\mathcal{D}$ if at least one of the following restrictions holds:
I. $\mathcal{G}$ contains DCs only (no INDs);
II. $\mathcal{G}$ contains INDs only (no DCs);
III. $\mathcal{G}$ contains KDs and FKs only, and $\mathcal{D}$ is consistent w.r.t. KDs;
IV. $\mathcal{G}$ contains KDs and SFKs only.

Proof
Case I: Since $\mathcal{B}$ is consistent w.r.t. DCs, by Proposition 2 its subset $\overline{\mathcal{B}} \subseteq \mathcal{B}$ is consistent as well. Now, if $\mathcal{B} - \mathcal{D} \neq \emptyset$, we would have a contradiction, because $\overline{\mathcal{B}} \ominus \mathcal{D} \subset \mathcal{B} \ominus \mathcal{D}$ would hold. Thus, $\mathcal{B} - \mathcal{D} = \emptyset$, and so $\overline{\mathcal{B}} = \mathcal{B}$ is already a CM-complete repair itself.
Case II: Since there is no DC, there exists only one CM-complete repair, say $\mathcal{B}'$, obtained from $\mathcal{D}$ by removing all the facts violating INDs. Now, if $\mathcal{B}'$ were not contained in $\overline{\mathcal{B}}$, then, by Proposition 2, $\mathcal{B}' \cup \mathcal{B}$ would still be consistent and, since $\mathcal{B}' \subseteq \mathcal{D}$, it would satisfy $(\mathcal{B}' \cup \mathcal{B}) \ominus \mathcal{D} \subset \mathcal{B} \ominus \mathcal{D}$, contradicting the assumption that $\mathcal{B}$ is a loosely-exact repair. Hence $\mathcal{B}' \subseteq \overline{\mathcal{B}}$.
Case III: Since $\mathcal{D}$ is consistent w.r.t. DCs (here, KDs), there is only one CM-complete repair, say $\mathcal{B}'$, obtained from $\mathcal{D}$ by removing all the facts violating INDs. As in Case II, if the set $\mathcal{B}' - \overline{\mathcal{B}}$ were nonempty, we could add all these facts to $\mathcal{B}$ without violating any IND. However, one of these facts, say $f$, could violate a DC because of a fact $f'$ in $\mathcal{B} - \mathcal{D}$. Note that $f'$ is in $\mathcal{B}$ only for fixing an IND violation. But, as we are only considering FKs, there would be no reason to have $f'$ in $\mathcal{B}$ instead of $f$. So we could (safely) replace $f'$ by $f$ in $\mathcal{B}$ without violating any KD or FK, thus reducing the symmetric difference with $\mathcal{D}$. This contradicts the assumption that $\mathcal{B}$ is a loosely-exact repair. So there is no fact in $\mathcal{B}'$ that is not in $\overline{\mathcal{B}}$.
Case IV: First of all, we observe that if $\mathcal{B} - \mathcal{D} = \emptyset$, then either $\mathcal{B}$ is a CM-complete repair or $\mathcal{B}$ is not a loosely-exact repair, so the statement holds. Now assume that $\mathcal{B} - \mathcal{D} \neq \emptyset$.
We distinguish three different cases:
(1) $\overline{\mathcal{B}}$ is both consistent and maximal (it is a CM-complete repair);
(2) $\overline{\mathcal{B}}$ is consistent but not maximal (it is not a CM-complete repair);
(3) $\overline{\mathcal{B}}$ is inconsistent (it is not a CM-complete repair).
In case (1), we have a contradiction: $\mathcal{B}$ is assumed to be a loosely-exact repair, but it does not minimize the symmetric difference with $\mathcal{D}$, since $\overline{\mathcal{B}} \ominus \mathcal{D} \subset \mathcal{B} \ominus \mathcal{D}$. In case (2), we again have a contradiction: $\mathcal{B}$ is assumed to be a loosely-exact repair, but it does not minimize the symmetric difference with $\mathcal{D}$, since there is a CM-complete repair $\widetilde{\mathcal{B}} \supset \overline{\mathcal{B}}$ such that $\widetilde{\mathcal{B}} \ominus \mathcal{D} \subset \mathcal{B} \ominus \mathcal{D}$. In case (3), we observe that, since $\mathcal{B}$ is consistent by hypothesis, the inconsistency of $\overline{\mathcal{B}}$ arises, by Proposition 2, only from INDs. Now, assume that (i) $\overline{\mathcal{B}}$ contains a fact $r_1(t_1)$; (ii) there is an IND $d$ of the form $\forall \bar{x}_\forall\,[r_1(\bar{x}_1) \rightarrow \exists \bar{x}_{2\exists}\, r_2(\bar{x}_2)]$; (iii) there is no fact for $r_2$ in $\overline{\mathcal{B}}$ satisfying $d$. This means that a fact of the form $r_2(t_2)$ must be in $\mathcal{B} - \mathcal{D}$, where $t_1[\pi^d_L] = t_2[\pi^d_R]$. We claim that there is no fact of the form $r_2(t_3)$ in $\mathcal{D} - \mathcal{B}$ with $t_1[\pi^d_L] = t_3[\pi^d_R]$. Suppose that $\mathcal{D} - \mathcal{B}$ contained such a fact $r_2(t_3)$, and consider the new database $(\mathcal{B} \cup \{r_2(t_3)\}) - \{r_2(t_2)\}$. This database would necessarily be consistent: the addition of $r_2(t_3)$ (after removing $r_2(t_2)$) cannot violate any KD, since $d$ is an FK (recall that $key(r_2) = \pi^d_R$), and cannot violate any IND, since each IND $d'$ of the form $r_2 \rightarrow r_3$ is an SFK (recall that $key(r_2) \supseteq \pi^{d'}_L$). But this is impossible, because $\mathcal{B}$ is assumed to be a loosely-exact repair, and $(\mathcal{B} \cup \{r_2(t_3)\}) - \{r_2(t_2)\}$ would improve the symmetric difference. This means that no CM-complete repair can contain the tuple $r_1(t_1)$ (which goes in the direction of the statement).
Let $\mathcal{B}'$ be the database, consistent w.r.t. both KDs and SFKs, obtained from $\overline{\mathcal{B}}$ by removing all the facts violating some IND. It remains to show that there is no other fact in $\mathcal{D} - \mathcal{B}$ that can be added to $\mathcal{B}'$ without violating any constraint. Assume that such a fact $r_1(t_1)$ exists; then:
- $\mathcal{B}' \cup \{r_1(t_1)\}$ would not violate any IND;
- $\mathcal{B} \cup (\mathcal{B}' \cup \{r_1(t_1)\}) = \mathcal{B} \cup \{r_1(t_1)\}$ would not violate any IND, by Proposition 2;
- $\mathcal{B} \cup \{r_1(t_1)\}$ would violate some KD, since $\mathcal{B}$ is a loosely-exact repair.
Thus, there would necessarily be a fact in $\mathcal{B}$, say $r_1(t_2)$, not in $\mathcal{B}'$, with the same key as $r_1(t_1)$. Such a fact cannot be in $\overline{\mathcal{B}} - \mathcal{B}'$, because it does not violate any IND; hence it must be in $\mathcal{B} - \mathcal{D}$. But this is impossible, because we could replace $r_1(t_2)$ by $r_1(t_1)$ in $\mathcal{B}$ without violating any KD and also without violating any IND, since we are only considering SFKs. Since $\mathcal{B}$ is already a repair, this is a contradiction. Therefore, $\mathcal{B}'$ is a CM-complete repair.

Corollary 3
$ans_{\text{loosely-exact}}(q,\mathcal{G},\mathcal{D}) = ans_{\text{CM-complete}}(q,\mathcal{G},\mathcal{D})$ in the following cases:
- $\mathcal{G}$ contains DCs only (no INDs);
- $\mathcal{G}$ contains INDs only (no DCs);
- $\mathcal{G}$ contains KDs and FKs only, and $\mathcal{D}$ is consistent w.r.t. KDs;
- $\mathcal{G}$ contains KDs and SFKs only.

Proof
This follows directly from Theorem 3 and Lemma 5, in light of Definition 1.

Proposition 3
In general, Theorem 3 does not hold if $\mathcal{G}$ contains SFSKs and KDs only.

Proof
Consider a database containing two relations of arity 2, namely $r$ and $s$. The schema contains the following ICs: $key(r) = \{1,2\}$, $key(s) = \{1\}$, and $r(X,Y) \rightarrow s(X,Y)$. Note that the last one is a safe FSK. Suppose also that a database $\mathcal{D}$ for this schema contains the facts $r(a,b)$ and $s(a,c)$.
The loosely-exact repairs are $\mathcal{B}_1 = \{s(a,c)\}$ and $\mathcal{B}_2 = \{r(a,b), s(a,b)\}$, but only the first one is also a CM-complete repair. However, $\overline{\mathcal{B}} = \mathcal{B}_2 \cap \mathcal{D} = \{r(a,b)\}$ is not a CM-complete repair (it is inconsistent). The only consistent database contained in $\overline{\mathcal{B}}$ is the empty set, which is not a CM-complete repair (deletions are not minimized).

3 Computation of CQA via ASP
In this section, we show how to exploit Answer Set Programming (ASP) (Gelfond and Lifschitz 1988; Gelfond and Lifschitz 1991) for efficiently computing consistent answers to user queries under different semantic assumptions. ASP is a powerful logic programming paradigm that allows (in its general form) disjunction in rule heads (Minker 1982) and nonmonotonic negation in rule bodies. In the following, we assume that the reader is familiar with ASP with aggregates; in particular, we adopt the DLV syntax (Faber et al. 2010; Leone et al. 2006).
The suitability of ASP for implementing CQA has already been recognized in the literature (Lenzerini 2002; Arenas et al. 2003; Bertossi et al. 2005; Chomicki and Marcinkowski 2005). The general approaches are based on the following idea: produce an ASP program $P$ whose answer sets represent possible repairs, so that computing CQA corresponds to cautious reasoning on $P$. One of the hardest challenges in this context is the automatic identification of a program $P$ that considers a minimal number of repairs actually relevant to answering user queries.
To face these challenges, we first introduce a general encoding that unifies, in a common core, the solutions for CQA under the semantics considered in this paper. Then, based on this unified framework, we define optimization strategies aimed precisely at reducing the computational cost of CQA.
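The idea of reading repairs off answer sets and computing CQA by cautious reasoning can be illustrated without an ASP solver. The Python sketch below is a hypothetical, hand-written simulation for one specific program shape, assumed here for illustration and resembling the program of Example 3: a relation e(code, name) whose key is code, and a relation m(code) with the IND m[1] ⊆ e[1]. Each way of keeping exactly one e-tuple per key value corresponds to one minimal model of the disjunctive key rule. An m-tuple is deleted when all of its e-tuples were deleted, mimicking the #count rule, and the cautious answers are those true in every simulated answer set. The function names and data are made up; a real system would instead hand the generated program to an ASP solver such as DLV.

```python
from itertools import product


def key_repairs(e_facts):
    """Yield the repairs of e(code, name) under key {code}: keep exactly one tuple per code.

    For a key constraint, these are exactly the minimal models of the disjunctive rule
    e_c(C, N) v e_c(C, N1) :- e(C, N), e(C, N1), N != N1.
    """
    groups = {}
    for code, name in sorted(e_facts):
        groups.setdefault(code, []).append((code, name))
    for choice in product(*groups.values()):
        yield set(choice)


def answer_sets(e_facts, m_facts):
    """Simulate one answer set per repair, returning the repaired relation m_r of each."""
    result = []
    for e_r in key_repairs(e_facts):
        surviving_codes = {code for code, _ in e_r}
        # m_c(C) holds iff every e-tuple with code C was deleted (or none existed),
        # i.e., the two #count values coincide; m_r keeps the remaining m-tuples.
        m_r = {code for code in m_facts if code in surviving_codes}
        result.append(m_r)
    return result


def cautious_answers(e_facts, m_facts):
    """Answers to q(X) :- m(X) that hold in every answer set (cautious reasoning)."""
    models = answer_sets(e_facts, m_facts)
    return set.intersection(*models)


# Hypothetical data: codes e1 and e3 violate the key; e4 has no matching employee.
E = {("e1", "alice"), ("e1", "bob"), ("e2", "carl"), ("e3", "dan"), ("e3", "eve")}
M = {"e1", "e2", "e4"}

print(len(answer_sets(E, M)))
print(sorted(cautious_answers(E, M)))
```

With these made-up facts there are two choices for e1 and two for e3, hence four simulated answer sets. Only e1 and e2 appear in the repaired relation m_r of every one of them, so they are the cautious (consistent) answers. The simulation hard-codes a single program shape; the point of the general encoding below is to obtain such programs automatically from the constraints.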
This is done in several ways: (i) by casting the original program down to complexity-wise easier programs; (ii) by identifying portions of the database that do not require repairs at all, according to the query requirements; (iii) by exploiting equivalences between some semantics so as to adopt optimized solutions. We present the general encoding first and then the optimizations.

3.1 General Encoding
The general approach generates a program $\Pi_{cqa}$ and a new query $q_{cqa}$, obtained by rewriting both the constraints and the query $q$ in such a way that CQA reduces to cautious reasoning on $\Pi_{cqa}$ and $q_{cqa}$. Recall that a union of conjunctive queries in ASP is expressed as a set of rules having the same head predicate with the same arity. In what follows, we first present how to generate $\Pi_{cqa}$ and $q_{cqa}$, and then formally prove under which hypotheses cautious reasoning on $\Pi_{cqa}$ and $q_{cqa}$ corresponds to CQA.
Given a database $\mathcal{D}$ for a schema $\mathcal{G}$ and a query $q$ on $\mathcal{G}$, the ASP program $\Pi_{cqa}$ is created by rewriting each IC belonging to $constr(\mathcal{G})$, as well as $q$, as follows.
Denial Constraints. Let $\Sigma \in \{\text{CM-complete}, \text{loosely-sound}, \text{loosely-exact}\}$. For each DC of the form $\forall \bar{x}_1, \ldots, \bar{x}_m\, \neg[g_1(\bar{x}_1) \land \ldots \land g_m(\bar{x}_m) \land \sigma(\bar{x}_1, \ldots, \bar{x}_m)]$ in $constr(\mathcal{G})$, insert the following rule into $\Pi_{cqa}$:
• $g^c_1(\bar{x}_1) \lor \cdots \lor g^c_m(\bar{x}_m) \mathrel{:-} g_1(\bar{x}_1), \ldots, g_m(\bar{x}_m), \sigma(\bar{x}_1, \ldots, \bar{x}_m).$
This rule states that, when a denial constraint is violated, the tuple(s) to be removed in order to repair the database must be guessed.
Inclusion dependencies. Let $\Sigma \in \{\text{CM-complete}, \text{loosely-exact}\}$.
For each IND $d$ in $constr(\mathcal{G})$ of the form $\forall \bar{x}_\forall\,[g_1(\bar{x}_1) \rightarrow \exists \bar{x}_{2\exists}\, g_2(\bar{x}_2)]$, add the following rules into $\Pi_{cqa}$:
• if $|\bar{x}_{2\exists}| > 0$: $g^c_1(\bar{x}_1) \mathrel{:-} g_1(\bar{x}_1), \#count\{\bar{x}_{2\exists} : g^c_2(\bar{x}_2)\} = \#count\{\bar{x}_{2\exists} : g_2(\bar{x}_2)\}.$
• if $|\bar{x}_{2\exists}| = 0$: $g^c_1(\bar{x}_1) \mathrel{:-} g_1(\bar{x}_1), g^c_2(\bar{x}_2).$ and $g^c_1(\bar{x}_1) \mathrel{:-} g_1(\bar{x}_1), \mathit{not}\ g_2(\bar{x}_2).$
The first rule states that a tuple of $g_1$ must be deleted iff either all the tuples in $g_2$ previously referred to by $g_1$ via $d$ have been deleted by the repairing process, or there is no tuple in $g_2$ referred to by $g_1$ via $d$. (This is done by comparing the total count of tuples in $g_2$ and $g^c_2$.) Observe that, if there is a cyclic set of INDs, the rules generated by this rewriting contain recursive aggregates, whose semantics is described in (Faber et al. 2010). The latter two rules replace the first one in the special case $|\bar{x}_{2\exists}| = 0$.
Repaired Relations. Let $\Sigma \in \{\text{CM-complete}, \text{loosely-sound}, \text{loosely-exact}\}$. For each relation name $g \in names(\mathcal{G})$, insert the following rule into $\Pi_{cqa}$:
• $g^r(\bar{x}) \mathrel{:-} g(\bar{x}), \mathit{not}\ g^c(\bar{x}).$
Query rewriting. Build $q_{cqa}(\bar{x})$ from $q(\bar{x})$ as follows:
1. If $\Sigma$ = loosely-sound, apply to $q$ the perfect rewriting algorithm that deals with INDs described in (Calì et al. 2003b)¹.
2. For each atom $g(\bar{y})$ in $q$, replace $g(\bar{y})$ by $g^r(\bar{y})$.
The perfect rewriting introduced in (Calì et al. 2003b) is intuitively described next. Given a query $q(\bar{x})$ and a set of INDs, the algorithm iteratively computes a new query $Q$ as follows.
$Q$ is first initialized with $q$; then, at each iteration, the algorithm carries out the following two steps:
(1) For each conjunction $cq'$ in $Q$, and for each pair of atoms $g_1, g_2$ in $cq'$ that unify (i.e., for which there exists a substitution transforming $g_1$ into $g_2$), $g_1$ and $g_2$ are replaced by one single unifying atom.
(2) For each conjunction $cq'$ in $Q$, and for each applicable IND $d$ of the form $g_1 \rightarrow g$ such that $g$ is in $cq'$, it adds to $Q$ a new conjunction $cq''$ obtained from $cq'$ by interpreting $d$ as a rewriting rule on $g$, applied from right to left.
The algorithm stops when no further modifications to $Q$ are possible with the two steps above.

¹ Observe that, when $\Sigma$ = loosely-sound, INDs are not encoded into logic rules.

The following theorems show how and when cautious reasoning on $\Pi_{cqa}$ and $q_{cqa}$ corresponds to CQA. First, we consider the CM-complete semantics.

Theorem 4
Let $\Sigma$ = CM-complete, let $\mathcal{D}$ be a database for a schema $\mathcal{G}$ with arbitrary DCs and (possibly cyclic) INDs, and let $q$ be a union of conjunctive queries. Then $t \in ans_\Sigma(q,\mathcal{G},\mathcal{D})$ iff $q_{cqa}(t)$ is a cautious consequence of the ASP program $\mathcal{D} \cup \Pi_{cqa}$.

Proof
We claim that $\Pi_{cqa}$ allows us to consider all and only the repairs, exactly one per model. Let $\mathcal{B}_r$ be a repair. In the following, we describe how to obtain a model containing, for each relation $g$, all and only the tuples of $g$ that do not appear in $\mathcal{B}_r$. We collect such tuples in the new relation $g^c$, while we collect in $g^r$ all and only the tuples of $g$ appearing in $\mathcal{B}_r$. For each relation $g$:
(a) by the disjunctive rules (if any) involving $g$, of the form $\cdots \lor g^c(\bar{x}) \lor \cdots \mathrel{:-} \cdots, g(\bar{x}), \cdots, \sigma(\cdots, \bar{x}, \cdots).$, we guess a set of tuples of $g$, collected in $g^c$, that must not appear in $\mathcal{B}_r$;
(b) next, for each IND of the form $g(\bar{x}_1) \rightarrow g_1(\bar{x}_2)$ (involving $g$ in the left-hand side), we use the rule $g^c(\bar{x}_1) \mathrel{:-} g(\bar{x}_1), \#count\{\bar{x}_{2\exists} : g^c_1(\bar{x}_2)\} = \#count\{\bar{x}_{2\exists} : g_1(\bar{x}_2)\}.$ to decide which tuples of $g$ cannot appear in $\mathcal{B}_r$ because of an IND violation; note that, in case $|\bar{x}_{2\exists}| = 0$, the rule is rewritten without the #count aggregate;
(c) finally, by the rule $g^r(\bar{x}) \mathrel{:-} g(\bar{x}), \mathit{not}\ g^c(\bar{x}).$ we obtain the repaired relations.
Importantly, to compute the extension of each $g^c$ we only exploit the minimality of the answer set semantics; the extension of each $g^r$ is computed afterwards. Observe that, by the splitting theorem (Lifschitz and Turner 1994), $\Pi_{cqa}$ can be divided (split) into two parts. By construction, $\Pi_{cqa}$ has exactly one answer set per repair. Finally, the query is reformulated over the repaired relations, and cautious reasoning does the rest.

Example 3
Consider again Example 2. The program (and the query built from $q(X) \mathrel{:-} m(X)$) obtained for it under the CM-complete semantics is:
$e^c(X_c, X_n) \lor e^c(X_c, X'_n) \mathrel{:-} e(X_c, X_n), e(X_c, X'_n), X_n \neq X'_n.$
$m^c(X_c) \mathrel{:-} m(X_c), \#count\{X'_n : e^c(X_c, X'_n)\} = \#count\{X_n : e(X_c, X_n)\}.$
$e^r(X_c, X_n) \mathrel{:-} e(X_c, X_n), \mathit{not}\ e^c(X_c, X_n).$
$m^r(X_c) \mathrel{:-} m(X_c), \mathit{not}\ m^c(X_c).$
$q_{cqa}(X_c) \mathrel{:-} m^r(X_c).$
When this program is evaluated on the database, we obtain four answer sets. It can be verified that all the answer sets contain $m^r(\text{'e1'})$ and $m^r(\text{'e2'})$ (i.e., these atoms are cautious consequences of $\Pi_{cqa}$) and, thus, 'e1' and 'e2' are the consistent answers to the query.
⊓⊔

Theorem 5
Let $\Sigma$ = loosely-sound, let $\mathcal{D}$ be a database for a schema $\mathcal{G}$ with KDs (exactly one key for each relation) and (possibly cyclic) NKC INDs, and let $q$ be a union of conjunctive queries without comparison atoms². Then $t \in ans_\Sigma(q,\mathcal{G},\mathcal{D})$ iff $q_{cqa}(t)$ is a cautious consequence of the ASP program $\mathcal{D} \cup \Pi_{cqa}$.

Proof
Considerations analogous to the CM-complete case apply. Disjunctive rules guess a minimal set of tuples to be removed, whereas the perfect rewriting algorithm deals with NKC INDs. Observe that the separation theorem introduced in (Calì et al. 2003b) shows that INDs can be taken into account as if the KDs were not expressed on $\mathcal{G}$; in particular, it states that it is sufficient to compute the perfect rewriting $q'$ of $q$ and evaluate $q'$ on the maximal subsets of $\mathcal{D}$ consistent with KDs. In our case, these subsets are computed by the part of $\Pi_{cqa}$ dealing with KDs, whereas the separation is carried out by renaming each $g$ in $q'$ to $g^r$.

The general encoding for the loosely-exact semantics is inherently more complex than the ones for the loosely-sound and CM-complete semantics, since both tuple deletions and tuple insertions are subject to minimization. We therefore tackled the loosely-exact encoding by exploiting the common cases in which CQA under the loosely-exact semantics and under the CM-complete semantics actually coincide (see Corollary 3). These cases can be easily checked, so the loosely-exact semantics can be handled with the encoding defined for the CM-complete case.

Theorem 6
Let $\Sigma$ = loosely-exact, and let $\mathcal{D}$ be a database for a schema $\mathcal{G}$ such that one of the following holds:
- $\mathcal{G}$ contains DCs only (no INDs);
- $\mathcal{G}$ contains INDs only (no DCs);
- $\mathcal{G}$ contains KDs and FKs only, and $\mathcal{D}$ is consistent w.r.t. KDs;
- $\mathcal{G}$ contains KDs and SFKs only.
Let $q$ be a union of conjunctive queries.
Then $t \in ans_\Sigma(q,\mathcal{G},\mathcal{D})$ iff $q_{cqa}(t)$ is a cautious consequence of the ASP program $\mathcal{D} \cup \Pi_{cqa}$.

Proof
This follows from Corollary 3 and Theorem 4.

² Recall that equalities are expressed in terms of variables having the same name.

3.2 Optimized Solution
The strategy reported in the previous section is a general solution to the CQA problem but, in several cases, more efficient ASP programs can be produced. First of all, note that the general algorithm blindly considers all the ICs on the global schema, including those that have no effect on the specific query. Consequently, useless logic rules might be produced, which may slow down program evaluation. A very simple optimization may therefore consist of considering relevant ICs only. However, there are several cases in which the complexity of CQA stays in PTIME, yet disjunctive programs, for which cautious reasoning is a hard task (Eiter et al. 1997), are generated even in the presence of denial constraints only. This means that evaluating the produced logic programs might be much more expensive than required in those "easy" cases. In the following, we provide semantics-specific optimizations aimed at overcoming these problems in the settings identified by Theorem 4, Theorem 5, and Theorem 6.
Given a query $q$ and an atom $g$ in $q$, we define the set of relevant indices of $g$ in $q$, denoted $relevant(q,g)$, in such a way that an index $i$ in $[1..arity(g)]$ belongs to $relevant(q,g)$ if at least one of the following holds for an occurrence $g(X_1, \ldots, X_n)$ of $g$ in $q$:
• $X_i$ is not existentially quantified (it is a free variable, i.e., an output variable of $q$);
• $X_i$ is involved in some comparison atom (even if it is existentially quantified);
• $X_i$ appears more than once in the same conjunction;
• $X_i$ is a constant value.
If $g$ does not appear in $q$, we set $relevant(q,g) = \emptyset$.
In the following, we denote by $\pi$ a set of indices. Moreover, given a sequence of variables $\bar{x}$ and a set $\pi \subseteq \{1, \ldots, |\bar{x}|\}$, we denote by $\bar{x}_\pi$ the sequence obtained from $\bar{x}$ by discarding each variable whose position is not in $\pi$. Finally, given a relation name $g$, a set of indices $\pi$, and a label $\ell$, we denote by $g^{\ell\text{-}\pi}(\bar{x}_\pi)$ an auxiliary atom derived from $g$, marked by $\ell$, and using only the variables in $\bar{x}_\pi$.
$\Sigma$ = loosely-sound. The objective of this optimization is to single out, for each relation involved in the query, the set of attributes actually relevant to answering it, and to apply the necessary repairs only to those attributes. As we show next, this may allow us both to reduce (even to zero) the number of disjunctive rules needed to repair key violations and to reduce the cardinality of the relations involved in such disjunctions. Given a schema $\mathcal{G}$ and a query $q$, perform the following steps to build the program $\Pi_{cqa}$ and the query $Q_{cqa}$.
1. Apply the perfect rewriting algorithm that deals with INDs described in (Calì et al. 2003b).
2. Let $Q$ be the union of conjunctive queries obtained from $q$ after Step 1. For each $g \in names(\mathcal{G})$, build the sets
$\pi^g_R = relevant(Q,g)$ and $\pi^g_S = \pi^g_R \cup key(g)$.
These two sets capture the fact that a key attribute is relevant for the repairing process, even though it may not be strictly relevant for answering the query. Observe that the perfect rewriting dealing with INDs must be applied before singling out relevant attributes.
In fact, through INDs, $q$ may also depend on attributes of relations not explicitly mentioned in it. In the last step of this algorithm, the rewriting of the query is completed by substituting each relation in the query with its repaired (and possibly reduced) version.
3. For each $g \in names(\mathcal{G})$ such that $\pi^g_R \neq \emptyset$ and $key(g) \not\supseteq \pi^g_R$, add the following rules into $\Pi_{cqa}$:
• $g^{sr\text{-}\pi^g_S}(\bar{x}_{\pi^g_S}) \mathrel{:-} g(\bar{x}).$
• $g^{c\text{-}\pi^g_S}((\bar{x}_1)_{\pi^g_S}) \lor g^{c\text{-}\pi^g_S}((\bar{x}_2)_{\pi^g_S}) \mathrel{:-} g^{sr\text{-}\pi^g_S}((\bar{x}_1)_{\pi^g_S}), g^{sr\text{-}\pi^g_S}((\bar{x}_2)_{\pi^g_S}), \bar{x}^i_1 \neq \bar{x}^i_2.$ for all $i \in \pi^g_S - key(g)$
• $g^{r\text{-}\pi^g_R}(\bar{x}_{\pi^g_R}) \mathrel{:-} g^{sr\text{-}\pi^g_S}(\bar{x}_{\pi^g_S}), \mathit{not}\ g^{c\text{-}\pi^g_S}(\bar{x}_{\pi^g_S}).$
Observe that, if there is at least one relevant non-key attribute for $g$, the repairing process cannot be avoided; however, violations caused only by irrelevant attributes (i.e., attributes not in $\pi^g_S$) can be ignored, since the projection of $g$ on $\pi^g_S$ is still safe and sufficient for query answering purposes.
4. For each $g \in names(\mathcal{G})$ such that $\pi^g_R \neq \emptyset$ and $key(g) \supseteq \pi^g_R$, add the following rule into $\Pi_{cqa}$:
• $g^{r\text{-}\pi^g_R}(\bar{x}_{\pi^g_R}) \mathrel{:-} g(\bar{x}).$
Observe that, if the relevant attributes of $g$ are a subset of its key, repairing key violations of $g$ through disjunction can be avoided altogether. In fact, the projection of $g$ on $\pi^g_R$ is still safe and sufficient for query answering purposes. For the same reason, it is not necessary to take the whole key of $g$ into account.
5. For each atom of the form $g(\bar{x})$ in $Q$, replace $g(\bar{x})$ by $g^{r\text{-}\pi^g_R}(\bar{x}_{\pi^g_R})$.
$\Sigma$ = CM-complete. For the optimization of the CM-complete semantics, we exploit a graph that is used to navigate the query and the database in order to single out the relations and projections actually relevant for answering the query.
Moreover, the graph allows us to identify possible cycles generated by ICs, which must be suitably handled: acyclic ICs induce a partial order among them, and this information can be effectively exploited for the optimization, whereas cyclic ICs must be handled in a more standard way.
Given a schema $\mathcal{G}$ and a query $q$, build the directed labelled graph $G_q = \langle N, A \rangle$ as follows:
• $N = \{q\} \cup names(\mathcal{G})$;
• $(g_1, g_2, c) \in A$ iff $c$ is a DC in $constr(\mathcal{G})$ involving both $g_1$ and $g_2$;
• $(g_1, g_2, d) \in A$ iff $d$ is an IND in $constr(\mathcal{G})$ of the form $g_1 \rightarrow g_2$;
• $(q, g, \varepsilon) \in A$ iff $g$ appears in a conjunction of $q$.
Perform the following steps to build the program $\Pi_{cqa}$:
1. Visit $G_q$ starting from node $q$.
2. Discard unreachable nodes and update the sets $N$ and $A$.
3. Partition the set $N$ into $(N_{cf}, N_{ncf})$ in such a way that a node $n$ belongs to $N_{cf}$ if it is not involved in any cycle ($q$ always belongs to $N_{cf}$), and to $N_{ncf}$ if it is involved in some cycle.
4. For each node $g \in N - \{q\}$ compute the sets
$\pi^g_R = \big(\bigcup_{(g_L, g, d) \in A} \pi^d_R\big) \cup relevant(q,g)$;
$\pi^g_S = \pi^g_R \cup key(g)$ if $g$ has exactly one primary key as DCs, and $\pi^g_S = \emptyset$ otherwise.
Here, $\pi^g_R$ is the set of relevant variable indices of $g$, and $\pi^g_S$ adds the key of $g$ to $\pi^g_R$.
Observe that Steps 1–4 implement a preprocessing phase in which the relevant relations and their relevant indices are singled out, and each relevant relation is classified as cycle free or non cycle free.
5. For each node $g \in N_{cf}$, if $g$ has only one key as DCs, add the following rules into $\Pi_{cqa}$:
• $g^{\xi\text{-}\pi^g_\chi}(\bar{x}_{\pi^g_\chi}) \mathrel{:-} g(\bar{x}), g_1^{r\text{-}\pi^{d_1}_R}((\bar{x}_1)_{\pi^{d_1}_R}), \ldots, g_k^{r\text{-}\pi^{d_k}_R}((\bar{x}_k)_{\pi^{d_k}_R}).$
• $g_i^{r\text{-}\pi^{d_i}_R}((\bar{x}_i)_{\pi^{d_i}_R}) \mathrel{:-} g_i^{r\text{-}\pi^{g_i}_R}((\bar{x}_i)_{\pi^{g_i}_R}).$ for all $i \in [1..k]$ such that $\pi^{g_i}_R \supset \pi^{d_i}_R$
where:
- $k \geq 0$ is the number of arcs in $G_q$ labelled by INDs and outgoing from $g$;
- the pair $(\xi, \chi)$ is either $(r, R)$ or $(sr, S)$, according to whether $key(g) \supseteq \pi^g_R$ holds or not, respectively. Intuitively, if $key(g) \supseteq \pi^g_R$ holds, then the repair $g^{r\text{-}\pi^g_R}$ of $g$ can be computed directly; otherwise, the computation must first go through a semi-reparation step that computes $g^{sr\text{-}\pi^g_S}$. This semi-reparation step collects the tuples that violate no IND of the form $g \rightarrow g_i$ but must nonetheless be processed in order to fix some key violation (see Steps 6–10);
- atom $g_i^{r\text{-}\pi^{d_i}_R}$ is in the body of the first rule ($1 \leq i \leq k$) only if both $(g, g_i, d_i) \in A$ and $d_i$ is an IND of the form $g(\bar{x}) \rightarrow g_i(\bar{x}_i)$. This atom is just a projection of $g_i^{r\text{-}\pi^{g_i}_R}((\bar{x}_i)_{\pi^{g_i}_R})$.
6. For each node $g \in N_{cf}$, if $g$ has only one primary key as DCs, $key(g) \subset \pi^g_R$, $g$ has incoming arcs only from $q$, all the relevant variables of $g$ w.r.t. $q$ are in the head of $q$, and each occurrence of $g$ in $q$ contains all of its relevant variables, then add the following rules into $\Pi_{cqa}$, considering that the key of $g$ is defined by DCs of the form $\forall \bar{x}_1, \bar{x}_2\, \neg[g(\bar{x}_1) \land g(\bar{x}_2) \land \bar{x}^i_1 \neq \bar{x}^i_2]$:
• $g^{c\text{-}\pi^g_S}((\bar{x}_1)_{\pi^g_S}) \mathrel{:-} g^{sr\text{-}\pi^g_S}((\bar{x}_1)_{\pi^g_S}), g^{sr\text{-}\pi^g_S}((\bar{x}_2)_{\pi^g_S}), \bar{x}^i_1 \neq \bar{x}^i_2.$ for all $i \in \pi^g_S - key(g)$
• $g^{r\text{-}\pi^g_R}((\bar{x}_1)_{\pi^g_R}) \mathrel{:-} g^{sr\text{-}\pi^g_S}((\bar{x}_1)_{\pi^g_S}), \mathit{not}\ g^{c\text{-}\pi^g_S}((\bar{x}_1)_{\pi^g_S}).$
7. For each node $g \in N_{cf}$, if $g$ has only one primary key as DCs, $key(g) \not\supseteq \pi^g_R$, and Step 6 does not apply, then add the following rules into $\Pi_{cqa}$, considering that the key is defined by DCs of the form $\forall \bar{x}_1, \bar{x}_2\, \neg[g(\bar{x}_1) \land g(\bar{x}_2) \land \bar{x}^i_1 \neq \bar{x}^i_2]$:
• $g^{c\text{-}\pi^g_S}((\bar{x}_1)_{\pi^g_S}) \lor g^{c\text{-}\pi^g_S}((\bar{x}_2)_{\pi^g_S}) \mathrel{:-} g^{sr\text{-}\pi^g_S}((\bar{x}_1)_{\pi^g_S}), g^{sr\text{-}\pi^g_S}((\bar{x}_2)_{\pi^g_S}), \bar{x}^i_1 \neq \bar{x}^i_2.$ for all $i \in \pi^g_S - key(g)$
• $g^{r\text{-}\pi^g_R}((\bar{x}_1)_{\pi^g_R}) \mathrel{:-} g^{sr\text{-}\pi^g_S}((\bar{x}_1)_{\pi^g_S}), \mathit{not}\ g^{c\text{-}\pi^g_S}((\bar{x}_1)_{\pi^g_S}).$
Observe that, in this case, disjunctive rules are defined only on the set of relevant indices that are not in the key, and that each $g^{c\text{-}\pi^g_S}$ contains only the projection of the deleted tuples on the set $\pi^g_S$.
Steps 5–7 handle relations for which a key is defined and which are classified as cycle free. In particular, if $key(g) \supseteq \pi^g_R$ holds, key reparation (and thus disjunctive rules) can be avoided altogether; otherwise, a semi-reparation step is required, but Step 6 identifies further cases in which, even if key reparation is needed, disjunction can still be avoided. Finally, Step 7 handles all the other cases. Importantly, Steps 5–7 take into account only the minimal projections of the involved relations, reducing computational costs (and even disjunctive rules) as much as possible by disregarding irrelevant attributes.
8. For each node $g \in N_{ncf}$ add the following rules into $\Pi_{cqa}$:
• $g^c(\bar{x}) \mathrel{:-} g(\bar{x}), \mathit{not}\ g_1^{r\text{-}\pi^d_R}((\bar{x}_1)_{\pi^d_R}).$ and $g_1^{r\text{-}\pi^d_R}((\bar{x}_1)_{\pi^d_R}) \mathrel{:-} g_1^{r\text{-}\pi^{g_1}_R}((\bar{x}_1)_{\pi^{g_1}_R}).$ for each IND $d$ of the form $g(\bar{x}) \rightarrow g_1(\bar{x}_1)$ such that there is no cycle in $G_q$ involving both $g_1$ and $g$;
• $g^c(\bar{x}) \mathrel{:-} g(\bar{x}), \#count\{\bar{x}_{1\exists} : g^c_1(\bar{x}_1)\} = \#count\{\bar{x}_{1\exists} : g_1(\bar{x}_1)\}.$ for each IND $d$ of the form $\forall \bar{x}_\forall\,[g(\bar{x}) \rightarrow \exists \bar{x}_{1\exists}\, g_1(\bar{x}_1)]$ such that $g_1 \in N_{ncf}$;
• $g^c(\bar{x}_1) \lor g^c(\bar{x}_2) \mathrel{:-} g(\bar{x}_1), g(\bar{x}_2), \bar{x}^i_1 \neq \bar{x}^i_2.$ for all $i \in \pi$, where $\pi = \{1, \ldots, arity(g)\} - key(g)$ and the key of $g$ is defined by DCs of the form $\forall \bar{x}_1, \bar{x}_2\, \neg[g(\bar{x}_1) \land g(\bar{x}_2) \land \bar{x}^i_1 \neq \bar{x}^i_2]$;
• $g^{r\text{-}\pi^g_R}(\bar{x}_{\pi^g_R}) \mathrel{:-} g(\bar{x}), \mathit{not}\ g^c(\bar{x}).$ if there is at least one node in $N_{cf}$ with an arc to $g$, or $g$ appears in $q$.
9. For each DC of the form $\forall \bar{x}_1, \ldots, \bar{x}_m\, \neg[g_1(\bar{x}_1) \land \ldots \land g_m(\bar{x}_m) \land \sigma(\bar{x}_1, \ldots, \bar{x}_m)]$ involving at least two different relation names (entailing that each $g_i \in N_{ncf}$), add the following rule into $\Pi_{cqa}$:
• $g^c_1(\bar{x}_1) \lor \cdots \lor g^c_m(\bar{x}_m) \mathrel{:-} g_1(\bar{x}_1), \ldots, g_m(\bar{x}_m), \sigma(\bar{x}_1, \ldots, \bar{x}_m).$
Steps 8 and 9 handle non cycle free relations; in this case, the repairing process mimics the standard rewriting, but projects relations on the relevant attributes whenever possible.
10. For each node $g \in N_{cf}$, if $g$ is involved in DCs that do not form a primary key, add the following rules into $\Pi_{cqa}$:
• $g^{sr}(\bar{x}) \mathrel{:-} g(\bar{x}), g_1^{r\text{-}\pi^{d_1}_R}((\bar{x}_1)_{\pi^{d_1}_R}), \ldots, g_k^{r\text{-}\pi^{d_k}_R}((\bar{x}_k)_{\pi^{d_k}_R}).$
• $g_i^{r\text{-}\pi^{d_i}_R}((\bar{x}_i)_{\pi^{d_i}_R}) \mathrel{:-} g_i^{r\text{-}\pi^{g_i}_R}((\bar{x}_i)_{\pi^{g_i}_R}).$ for all $i \in [1..k]$ such that $\pi^{g_i}_R \supset \pi^{d_i}_R$
• $g^c(\bar{x}_1) \lor \cdots \lor g^c(\bar{x}_m) \mathrel{:-} g^{sr}(\bar{x}_1), \ldots, g^{sr}(\bar{x}_m), \sigma_d(\bar{x}_1, \ldots, \bar{x}_m).$ for all $d$
• $g^{r\text{-}\pi^g_R}(\bar{x}_{\pi^g_R}) \mathrel{:-} g^{sr}(\bar{x}), \mathit{not}\ g^c(\bar{x}).$
where:
- $k \geq 0$ is the number of arcs labelled by INDs and outgoing from $g$;
- atom $g_i^{r\text{-}\pi^{d_i}_R}$ is in the body of the first rule ($1 \leq i \leq k$) iff both $(g, g_i, d_i) \in A$ and $d_i$ is an IND of the form $g(\bar{x}) \rightarrow g_i(\bar{x}_i)$;
- $d$ is a DC of the form $\forall \bar{x}_1, \ldots, \bar{x}_m\, \neg[g(\bar{x}_1) \land \ldots \land g(\bar{x}_m) \land \sigma_d(\bar{x}_1, \ldots, \bar{x}_m)]$.
Step 10 handles the special case in which there is no key for a relation, but denial constraints are defined (only) on it.
11. For each atom of the form $g(\bar{x})$ in $q$, replace $g(\bar{x})$ by $g^{r\text{-}\pi^g_R}(\bar{x}_{\pi^g_R})$.

Example 4
Consider again Example 1, and suppose we extend the global schema by adding the relation $c(code, name)$, which represents the list of customers, where $code$ is the primary key of $c$. Moreover, suppose we pose the query $q(X_c, X_n) \mathrel{:-} c(X_c, X_n), e(X_c, X_n)$, which retrieves the customers that are also employees of the bank. In this case, after building the graph $G_q$, it is easy to see that $m$ is unreachable (so it is discarded) and that both $c$ and $e$ comply with the requirements described at Steps 5 and 6 of the optimized algorithm. Consequently, the optimized program under the CM-complete semantics is:
$e^{sr\text{-}1,2}(X_c, X_n) \mathrel{:-} e(X_c, X_n).$
$c^{sr\text{-}1,2}(X_c, X_n) \mathrel{:-} c(X_c, X_n).$
$e^{c\text{-}1,2}(X_c, X_n) \mathrel{:-} e^{sr\text{-}1,2}(X_c, X_n), e^{sr\text{-}1,2}(X_c, X'_n), X_n \neq X'_n.$
$c^{c\text{-}1,2}(X_c, X_n) \mathrel{:-} c^{sr\text{-}1,2}(X_c, X_n), c^{sr\text{-}1,2}(X_c, X'_n), X_n \neq X'_n.$
$e^{r\text{-}1,2}(X_c, X_n) \mathrel{:-} e^{sr\text{-}1,2}(X_c, X_n), \mathit{not}\ e^{c\text{-}1,2}(X_c, X_n).$
$c^{r\text{-}1,2}(X_c, X_n) \mathrel{:-} c^{sr\text{-}1,2}(X_c, X_n), \mathit{not}\ c^{c\text{-}1,2}(X_c, X_n).$
$q_{cqa}(X_c, X_n) \mathrel{:-} c^{r\text{-}1,2}(X_c, X_n), e^{r\text{-}1,2}(X_c, X_n).$
Note that, since neither $e$ nor $c$ is affected by IND violations and they have no irrelevant variables, the semi-reparation step cannot actually discard tuples. However, the obtained program is non-disjunctive and stratified; thus, it can be evaluated in polynomial time (Leone et al. 2006). In this case, the only answer set of the program contains the consistent answers to the original query. ⊓⊔

Σ = loosely-exact. In Section 3.1 we proved that there are common cases in which CQA under the loosely-exact semantics and under the CM-complete semantics coincide. As a consequence, in these cases all the optimizations defined for the CM-complete semantics also apply to the loosely-exact semantics.

4 Experiments

In this section we present some of the experiments we carried out to assess the effectiveness of our approach to consistent query answering.

Testing has been performed with our complete system for data integration, which is intended to simplify both the design of the integration system and the querying activities through a user-friendly GUI. Indeed, the system supports the user in designing the global schema and the mappings between global relations and source schemas, and it allows the user to specify queries over the global schema via a QBE-like interface. The query evaluation engine adopted for the tests is DLV^DB (Terracina et al. 2008), coupled via ODBC with a PostgreSQL DBMS where the input data were stored. DLV^DB is a DLP evaluator born as a database-oriented extension of the well-known DLV system (Leone et al. 2006). It has recently been extended to deal with unstratified negation, disjunction, and external function calls. We first address tests on a real-world scenario and then report on scalability tests on synthetic data.

Fig. 1. INFOMIX database.

4.1 Tests on a real world scenario

Data Set.
We have exploited the real-world data integration framework developed in the INFOMIX project (IST-2001-33570) (Leone et al. 2005), which integrates data from a real university context. In particular, the considered data sources were available at the University of Rome "La Sapienza". They comprise information on students, professors, curricula, and exams in various faculties of the university. There are about 35 data sources in the application scenario, which are mapped into 12 global schema relations with 20 GAV mappings and 21 integrity constraints. In the following we call this data set Infomix. Figure 1 reproduces the main characteristics of the global database: each node corresponds to a global relation and shows its arity and key. An edge between $r_1$ and $r_2$ labelled by $r_1[I] \subseteq r_2[J]$ indicates an IND of the form $\forall \bar{x}^{\forall}\,[r_1(\bar{x}_1) \rightarrow \exists \bar{x}_2^{\exists}\, r_2(\bar{x}_2)]$, where $I$ and $J$ are the positions of $\bar{x}^{\forall}$ in $\bar{x}_1$ and $\bar{x}_2$, respectively; that is, the arc is labelled with the attributes of $r_1$ and $r_2$ involved in the IND. Observe that there are cyclic INDs involving teaching, exam_record and professor.

Besides the original source database instance (which takes about 16 MB on the DBMS), we obtained bigger instances artificially. Specifically, we generated a number of copies of the original database; each copy is disjoint from the other ones but maintains the same data correlations as the original database. This has been carried out by mapping each original attribute value to a new value having a copy-specific prefix. We then considered two further data sets, namely Infomix-x-10 and Infomix-x-50, storing 10 copies (160 MB of data in total) and 50 copies (800 MB) of the original database, respectively. It holds that Infomix ⊂ Infomix-x-10 ⊂ Infomix-x-50.

Compared Methods and Tested Queries.
In order to assess the characteristics of the proposed optimizations, we measured the execution time of different queries with (i) the standard encoding (identified as STD in the following), (ii) a naïve optimization obtained by only removing relations not strictly needed for answering the queries (OPT1 in the following), and (iii) the fully optimized encoding presented in Section 3 (OPT2 in the following). Each of these cases has been evaluated for the three semantics considered in this paper. In order to isolate the impact of our optimizations, we disabled other optimizations (like magic sets) embedded in the Datalog evaluation engine. Such optimizations are complementary to our own and might further improve overall performance. The tested queries are the following:

Q1(X1) :- course(X2,X1), plan_data(PL,X2,_), student_course_plan(PL,"09089903",_,_,_).
Q2(X1) :- university(X1,_).
Q3(X1,X2,X3) :- university_degree(X1,X2), faculty(X2,_,X3).
Q4(X1,X2,X3) :- student(S,_,X1,_,_,_,_), enrollment(S,_,_), exam_record(S,_,_,X2,X3,_,_), S == "09089903".
Q5(X1,X2) :- student_r(S1,_,X1,_,_,_,_), exam_record_r(S1,C,_,_,_,_,_), student_r(S2,_,X2,_,_,_,_), exam_record_r(S2,C,_,_,_,_,_), S1 == "09089470", S1 <> S2.
Q6(X1,X2,X3) :- student(X1,_,_,_,_,_,_), exam_record(X1,_,_,X2,X3,_,_), X1 == "09089903".

Observe that Q2 involves key constraints only, whereas Q1 and Q3 involve both keys and acyclic INDs; specifically, Q3 involves an SFK while Q1 involves NKC INDs. Finally, Q4, Q5 and Q6 involve keys and cyclic NKC INDs.

Results and discussion. All tests have been carried out on an Intel Xeon X3430, 2.4 GHz, with 4 GB RAM, running Linux. We set a time limit of 120 minutes, after which query execution was killed. Figures 2 and 3 show the results obtained for the loosely-sound and the CM-complete semantics, respectively.
It is worth recalling that, as we pointed out in Section 3.2, the optimizations for the loosely-exact semantics rely on the cases of equivalence with the CM-complete semantics identified in this paper. As a consequence, we tested this semantics only on queries Q2 and Q3, for which such equivalence holds. Moreover, since for Q2 and Q3 the execution times of the optimized encoding coincide with those plotted for the CM-complete semantics, we do not report specific figures for them.

Analyzing the figures, we observe that the proposed optimizations do not introduce computational overhead and, in most cases, turn practically intractable queries into tractable ones; in fact, for all the tested queries the execution time of the standard rewriting exceeded the time limit. OPT1 helps mostly on the smallest data set: for Infomix-x-10 it shows some gain in 33% of the cases, and for Infomix-x-50 only in two cases. As for the comparison among the optimized encodings, we observe that if the query involves no INDs (Q2), the loosely-sound and the CM-complete optimizations have the same performance; this confirms theoretical expectations. When acyclic INDs are involved (Q1, Q3), the loosely-sound optimization performs slightly better, because under the CM-complete semantics the tuples to be deleted due to IND violations must be chosen, whereas the loosely-sound semantics just works on the original data. Finally, when the involved INDs are cyclic (Q4, Q5, Q6), the performance of the CM-complete optimization degrades further w.r.t. the loosely-sound one, because recursive aggregates must be exploited to choose deletions, and this increases the complexity of query evaluation.

Fig. 2. Query evaluation execution times for the loosely-sound semantics. [Six panels, "LS Semantics - Query 1" to "LS Semantics - Query 6": average execution time (s, log scale) of STD, OPT1 and OPT2 on Infomix, Infomix_x10 and Infomix_x50, with a 2h timeout.]

Fig. 3. Query evaluation execution times for the CM-complete semantics. [Six panels, "CM Semantics - Query 1" to "CM Semantics - Query 6", with the same axes, encodings and 2h timeout as Fig. 2.]

4.2 Scalability analysis w.r.t. the number and kind of constraint violations

Since the real-world scenario showed that the CM-complete semantics is more affected than the loosely-sound one by the kind of involved constraints, we carried out a scalability analysis on this semantics, whose results are reported next. We considered a synthetic data set composed of three relations, named $r_1$, $r_2$, and $r_3$, over which we imposed different sets of ICs in order to analyze the scalability of our methods depending on the presence of keys and on the presence or absence of acyclic and cyclic INDs.

In particular, we imposed the key constraints $key(r_2) = \{1, 2\}$ and $key(r_3) = \{1\}$, and we experimented with three different sets of INDs: $\mathit{NOINCL} = \emptyset$, $\mathit{ACYCLIC} = \{r_1(X_1, X_2, X_3, X_4) \rightarrow r_2(X_2, X_5, X_3, X_6),\ r_1(X_1, X_2, X_3, X_4) \rightarrow r_3(X_1, X_5, X_6, X_7)\}$, and $\mathit{CYCLIC} = \mathit{ACYCLIC} \cup \{r_2(X_1, X_2, X_3, X_4) \rightarrow r_1(X_5, X_6, X_7, X_2)\}$. The employed query is:

$query(X_1, X_3) :- r_1(X_1, X_2, X_3, X_4), r_2(X_2, X_3, X_5, X_6)?$

We randomly generated synthetic databases with a growing number of key violations on table $r_2$. The generation process progressively adds key violations to $r_2$ by generating pairs of conflicting tuples; after an instance of $r_2$ is obtained, tables $r_1$ and $r_3$ are generated by taking values from $r_2$ in such a way that the INDs are satisfied. In addition, for each tuple of $r_3$ a key-conflicting tuple is generated.
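The constraint sets above can be made concrete with a small checker. The following Python sketch is illustrative only and is not the generator used in the experiments: it represents each relation as a set of 4-tuples, treats the variables shared by the two atoms of an IND as the universally quantified ones, and counts key violations as pairs of distinct tuples agreeing on the key. The toy instance and all names (`KEYS`, `ACYCLIC`, `CYCLIC`, `key_violations`, `ind_violations`) are hypothetical.

```python
from collections import defaultdict

# Hypothetical encoding of the constraints above: each IND is
# (source relation, source variables, target relation, target variables).
KEYS = {"r2": (1, 2), "r3": (1,)}  # 1-based key positions
ACYCLIC = [
    ("r1", ("X1", "X2", "X3", "X4"), "r2", ("X2", "X5", "X3", "X6")),
    ("r1", ("X1", "X2", "X3", "X4"), "r3", ("X1", "X5", "X6", "X7")),
]
CYCLIC = ACYCLIC + [("r2", ("X1", "X2", "X3", "X4"), "r1", ("X5", "X6", "X7", "X2"))]


def key_violations(rel, key):
    """Number of pairs of distinct tuples of `rel` that agree on the key positions."""
    sizes = defaultdict(int)
    for t in rel:
        sizes[tuple(t[i - 1] for i in key)] += 1
    return sum(n * (n - 1) // 2 for n in sizes.values())


def ind_violations(db, ind):
    """Source tuples with no target tuple agreeing on the shared (universal) variables.

    Target variables that do not occur in the source atom are existential,
    so they may take any value.
    """
    src, src_vars, dst, dst_vars = ind
    shared = [v for v in src_vars if v in dst_vars]
    src_pos = [src_vars.index(v) for v in shared]
    dst_pos = [dst_vars.index(v) for v in shared]
    witnesses = {tuple(t[p] for p in dst_pos) for t in db[dst]}
    return {t for t in db[src] if tuple(t[p] for p in src_pos) not in witnesses}


# Toy instance: one key-conflicting pair in r2, and a conflicting twin for each r3 key.
db = {
    "r1": {("k1", "a", "c", "b"), ("k2", "e", "g", "f")},
    "r2": {("a", "b", "c", "d1"), ("a", "b", "c", "d2"), ("e", "f", "g", "h")},
    "r3": {("k1", "u", "v", "w"), ("k1", "u2", "v2", "w2"),
           ("k2", "u", "v", "w"), ("k2", "u3", "v3", "w3")},
}
print({r: key_violations(db[r], k) for r, k in KEYS.items()})  # {'r2': 1, 'r3': 2}
print(sum(len(ind_violations(db, d)) for d in CYCLIC))           # 0

# Dropping the r3 tuples with key k2 leaves r1's k2 tuple without a witness.
db_minus = dict(db, r3={t for t in db["r3"] if t[0] != "k2"})
print(ind_violations(db_minus, ACYCLIC[1]))                       # {('k2', 'e', 'g', 'f')}
```

The last lines show, on a tiny scale, how removing tuples from $r_3$ turns an instance that satisfies all INDs into one with IND violations, while its key violations are left untouched.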
Fig. 4. Scalability Analysis. [Two panels plotting average execution time (s, log scale) against the number of key constraint violations (21 to 166): (a) "Scalability: Impact of global constraints", series NOINCL-STD, NOINCL-OPT, ACYCLIC-STD, ACYCLIC-OPT, CYCLIC-STD, CYCLIC-OPT; (b) "Scalability: Impact of inclusion constraints", series ACYCLIC-10, CYCLIC-10, ACYCLIC, CYCLIC.]

In order to assess the impact of the number of IND violations, for each database instance $DB_x$ containing $x$ key violations on table $r_2$, we generated a $DB_x$-10 instance in which 10% of the tuples are (randomly) removed from tables $r_1$ and $r_3$, thus causing IND violations. We generated six database instances per size (number of key violations on table $r_2$) and plotted the execution time, averaged over the instances of the same size, in Figure 4. In detail, Figure 4(a) shows the results for an increasing number of KD violations with no IND violations; both the standard and the optimized encodings have been tested. Figure 4(b) compares the optimized encoding only, when the percentage of IND violations is 0% or 10%. Observe that, in general, even when there is no initial IND violation, the KD repairing process may induce some. The analysis of these figures shows that, even if cyclic INDs are generally harder, their scaling is almost the same as that of acyclic ones. On the contrary, in the absence of INDs the optimization may boost performance (see the flat line in Figure 4(a)). Figure 4(b) points out that when the number of IND violations increases, the performance may improve.
This behavior is explained by the fact that tuple deletions due to IND repairs may, in turn, remove KD violations, which reduces the number of disjunctions to be evaluated.

5 Related work and concluding remarks

Since the 1990s, when the founding notions of CQA (Bry 1997), GAV mapping (Garcia-Molina et al. 1997; Tomasic et al. 1998; Goh et al. 1999), and database repair (Arenas et al. 1999) were introduced, data integration (Lenzerini 2002) and inconsistent databases (Bertossi et al. 2005) have been studied in depth. Detailed characterizations of the main problems arising in a data integration system have been provided, taking into account different semantics, constraints, and query types (Calì et al. 2003a; Calì et al. 2003b; Arenas et al. 2003; Chomicki and Marcinkowski 2005; Grieco et al. 2005; Fuxman and Miller 2007; Eiter et al. 2008). This paper contributes to this scenario by extending the decidability boundaries for the loosely-exact semantics (so called in Calì et al. 2003a, but first introduced by Arenas et al. 1999) and the loosely-sound semantics, in the case of both KDs and SFSK INDs.

A first proposal of a unifying framework for CQA in a data integration setting is presented in (Calì et al. 2005) using first-order logic; it considers different semantics defined by interpreting the mapping assertions between the global and the local schemas of the data integration system. A common framework for computing repairs in a single-database setting is proposed in (Eiter et al. 2008); it covers a wide range of semantics relying on the general notion of a preorder over candidate repairs, but only universally quantified constraints are allowed. Moreover, the authors introduce an abstract logic programming framework to compute consistent answers.
Finally, the authors propose an optimization strategy called factorization that, as will be clarified below, is orthogonal to our own.

This paper contributes to this setting since it unifies different semantics, as in (Calì et al. 2005) and (Eiter et al. 2008), and also provides an algorithm that, given a retrieved database, a user query $q$, and a semantics, automatically composes an ASP program capable of computing the consistent answers to $q$. In particular, our ASP rewriting offers a natural, compact, and direct way of encoding even hard cases, where the CQA problem belongs to the $\Pi^p_2$ complexity class.

Theoretical studies gave rise to concrete implementations, most of which were conceived to operate on specific semantics and/or constraint types (Arenas et al. 1999; Calì et al. 2002; Greco and Zumpano 2000; Greco et al. 2001; Calì et al. 2003b; Arenas et al. 2003; Chomicki et al. 2004a; Calì et al. 2004; Chomicki et al. 2004b; Lembo 2004; Grieco et al. 2005; Leone et al. 2005; Fuxman et al. 2005; Fuxman and Miller 2007). As an example, in (Leone et al. 2005) only the loosely-sound semantics was supported. In this paper, we provide both a unified framework based on ASP and a complete system supporting (i) all three aforementioned semantics for conjunctive queries and the most commonly used database constraints (KDs and INDs), (ii) specialized optimizations, and (iii) a user-friendly GUI. Another general contribution of our work is a novel optimization technique that, after analyzing the query and localizing a minimal number of relevant ICs, tries to "simplify" their structure in order to reduce the number of database repairs, which can be exponentially many (Arenas et al. 2001).
Such a technique can be classified as "vertical", since it reduces (whenever possible) the arity of each active relation (with the effect, e.g., of decreasing the number of key conflicts) without looking at the data. It is orthogonal to "horizontal" approaches, such as magic sets (Faber et al. 2007) and factorization (Eiter et al. 2008), which are based on data-filtering strategies. In particular, an ASP-based system incorporating magic-set techniques for CQA is described in (Marileo and Bertossi 2010). Other approaches complementary to our own are based on first-order rewritings of the query (Arenas et al. 1999; Chomicki and Marcinkowski 2002; Calì et al. 2003b; Grieco et al. 2005; Fuxman and Miller 2007). Combining our optimizations with such approaches, and further extending the decidability boundaries for CQA, are among our future lines of research.

Acknowledgments. This work has been partially supported by the Calabrian Region under the PIA (Pacchetti Integrati di Agevolazione industria, artigianato e servizi) project DLVSYSTEM, approved in BURC no. 20, part III, of 15/05/2009 (DR no. 7373 of 06/05/2009).

References

Abiteboul, S., Hull, R., and Vianu, V. 1995. Foundations of Databases: The Logical Level. Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA.
Arenas, M., Bertossi, L., and Chomicki, J. 1999. Consistent query answers in inconsistent databases. In Proceedings of PODS'99. ACM, New York, NY, USA, 68–79.
Arenas, M., Bertossi, L., and Chomicki, J. 2001. Scalar Aggregation in FD-Inconsistent Databases. In Proceedings of ICDT'01. LNCS, vol. 1973. Springer Berlin / Heidelberg, 39–53.
Arenas, M., Bertossi, L., and Chomicki, J. 2003. Answer sets for consistent query answering in inconsistent databases. TPLP 3, 4, 393–424.
Bertossi, L. E., Hunter, A., and Schaub, T., Eds. 2005. Inconsistency Tolerance. LNCS, vol. 3300. Springer, Berlin / Heidelberg.
Bry, F. 1997. Query Answering in Information Systems with Integrity Constraints. In Proceedings of IICIS'97. Chapman & Hall, Ltd., London, UK, 113–130.
Calì, A., Calvanese, D., De Giacomo, G., and Lenzerini, M. 2002. On the Role of Integrity Constraints in Data Integration. IEEE Data Eng. Bull. 25, 3, 39–45.
Calì, A., Calvanese, D., De Giacomo, G., and Lenzerini, M. 2004. Data integration under integrity constraints. Inf. Syst. 29, 2, 147–163.
Calì, A., Lembo, D., and Rosati, R. 2003a. On the decidability and complexity of query answering over inconsistent and incomplete databases. In Proceedings of PODS'03. ACM, New York, NY, USA, 260–271.
Calì, A., Lembo, D., and Rosati, R. 2003b. Query rewriting and answering under constraints in data integration systems. In Proceedings of IJCAI'03. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 16–21.
Calì, A., Lembo, D., and Rosati, R. 2005. A comprehensive semantic framework for data integration systems. Journal of Algorithms 3, 2, 308–328.
Chomicki, J. and Marcinkowski, J. 2002. On the Computational Complexity of Consistent Query Answers. CoRR cs.DB/0204010, 1–9.
Chomicki, J. and Marcinkowski, J. 2005. Minimal-change integrity maintenance using tuple deletions. Inf. Comput. 197, 1-2, 90–121.
Chomicki, J., Marcinkowski, J., and Staworko, S. 2004a. Computing consistent query answers using conflict hypergraphs. In Proceedings of CIKM'04. ACM, New York, NY, USA, 417–426.
Chomicki, J., Marcinkowski, J., and Staworko, S. 2004b. Hippo: A System for Computing Consistent Answers to a Class of SQL Queries. In Advances in Database Technology - EDBT 2004. LNCS, vol. 2992. Springer Berlin / Heidelberg, 661–662.
Eiter, T., Fink, M., Greco, G., and Lembo, D. 2008. Repair localization for query answering from inconsistent databases. ACM TODS 33, 2, 10:1–10:51.
Eiter, T., Gottlob, G., and Mannila, H. 1997. Disjunctive datalog. ACM TODS 22, 3, 364–418.
Faber, W., Greco, G., and Leone, N. 2007. Magic Sets and their application to data integration. JCSS 73, 4, 584–609.
Faber, W., Pfeifer, G., and Leone, N. 2010. Semantics and complexity of recursive aggregates in answer set programming. Artificial Intelligence In Press, Corrected Proof, 1–21.
Fuxman, A., Fazli, E., and Miller, R. J. 2005. ConQuer: efficient management of inconsistent databases. In Proceedings of SIGMOD'05. ACM, New York, NY, USA, 155–166.
Fuxman, A. and Miller, R. J. 2007. First-order query rewriting for inconsistent databases. JCSS 73, 4, 610–635. Special Issue: Database Theory 2005.
Garcia-Molina, H., Papakonstantinou, Y., Quass, D., Rajaraman, A., Sagiv, Y., Ullman, J., Vassalos, V., and Widom, J. 1997. The TSIMMIS Approach to Mediation: Data Models and Languages. JIIS 8, 2, 117–132.
Gelfond, M. and Lifschitz, V. 1988. The Stable Model Semantics for Logic Programming. In Proceedings of ICLP/SLP'88. MIT Press, 1070–1080.
Gelfond, M. and Lifschitz, V. 1991. Classical Negation in Logic Programs and Disjunctive Databases. New Gen. Comput. 9, 3-4, 365–385.
Goh, C. H., Bressan, S., Madnick, S., and Siegel, M. 1999. Context interchange: new features and formalisms for the intelligent integration of information. ACM TOIS 17, 3, 270–293.
Greco, G., Greco, S., and Zumpano, E. 2001. A Logic Programming Approach to the Integration, Repairing and Querying of Inconsistent Databases. In Proceedings of ICLP'01. Number 17 in LNCS. Springer Berlin / Heidelberg, 348–364.
Greco, S. and Zumpano, E. 2000. Querying inconsistent databases. In Proceedings of LPAR'00. Springer-Verlag, Berlin, Heidelberg, 308–325.
Grieco, L., Lembo, D., Rosati, R., and Ruzzi, M. 2005. Consistent query answering under key and exclusion dependencies: algorithms and experiments. In Proceedings of CIKM'05. ACM, New York, NY, USA, 792–799.
Lembo, D. 2004. Dealing with Inconsistency and Incompleteness in Data Integration. Ph.D. thesis, Dipartimento di Informatica e Sistemistica, Università di Roma "La Sapienza".
Lenzerini, M. 2002. Data integration: a theoretical perspective. In Proceedings of PODS'02. ACM, New York, NY, USA, 233–246.
Leone, N., Greco, G., Ianni, G., Lio, V., Terracina, G., Eiter, T., Faber, W., Fink, M., Gottlob, G., Rosati, R., Lembo, D., Lenzerini, M., Ruzzi, M., Kalka, E., Nowicki, B., and Staniszkis, W. 2005. The INFOMIX system for advanced integration of incomplete and inconsistent data. In Proceedings of SIGMOD'05. ACM, New York, NY, USA, 915–917.
Leone, N., Pfeifer, G., Faber, W., Eiter, T., Gottlob, G., Perri, S., and Scarcello, F. 2006. The DLV System for Knowledge Representation and Reasoning. ACM TOCL 7, 3, 499–562.
Levene, M. and Vincent, M. W. 2000. Justification for Inclusion Dependency Normal Form. IEEE TKDE 12, 2, 281–291.
Lifschitz, V. and Turner, H. 1994. Splitting a logic program. In Proceedings of ICLP'94. MIT Press, Cambridge, MA, USA, 23–37.
Marileo, M. C. and Bertossi, L. E. 2010. The consistency extractor system: Answer set programs for consistent query answering in databases. Data Knowl. Eng. 69, 6, 545–572.
Minker, J. 1982. On Indefinite Data Bases and the Closed World Assumption. In Proceedings of CADE'82. LNCS, vol. 138. Springer, Berlin / Heidelberg, 292–308.
Terracina, G., De Francesco, E., Panetta, C., and Leone, N. 2008. Enhancing a DLP System for Advanced Database Applications. In Proceedings of RR'08. LNCS, vol. 5341. Springer, Berlin / Heidelberg, 119–134.
Terracina, G., Leone, N., Lio, V., and Panetta, C. 2008. Experimenting with recursive queries in database and logic programming systems. TPLP 8, 2, 129–165.
Tomasic, A., Raschid, L., and Valduriez, P. 1998. Scaling Access to Heterogeneous Data Sources with DISCO. IEEE TKDE 10, 5, 808–823.
