Interactive ontology debugging: two query strategies for efficient fault localization†

Kostyantyn Shchekotykhin*, Gerhard Friedrich, Philipp Fleiss¹, Patrick Rodler¹

Alpen-Adria Universität, Universitätsstrasse 65-67, 9020 Klagenfurt, Austria

Abstract

Effective debugging of ontologies is an important prerequisite for their broad application, especially in areas that rely on everyday users to create and maintain knowledge bases, such as the Semantic Web. In such systems ontologies capture formalized vocabularies of terms shared by their users. However, in many cases users have different local views of the domain, i.e. of the context in which a given term is used. Inappropriate usage of terms together with natural complications when formulating and understanding logical descriptions may result in faulty ontologies. Recent ontology debugging approaches use diagnosis methods to identify the causes of the faults. In most debugging scenarios these methods return many alternative diagnoses, thus placing the burden of fault localization on the user. This paper demonstrates how the target diagnosis can be identified by performing a sequence of observations, that is, by querying an oracle about entailments of the target ontology. To identify the best query we propose two query selection strategies: a simple "split-in-half" strategy and an entropy-based strategy. The latter allows knowledge about typical user errors to be exploited to minimize the number of queries. Our evaluation showed that the entropy-based method significantly reduces the number of required queries compared to the "split-in-half" approach. We experimented with different probability distributions of user errors and different qualities of the a-priori probabilities. Our measurements demonstrated the superiority of entropy-based query selection even in cases where all fault probabilities are equal, i.e.
where no information about typical user errors is available.

Keywords: Ontology Debugging, Query Selection, Model-based Diagnosis, Description Logic

† This article is a substantial extension of the preliminary results published in Proceedings of the 9th International Semantic Web Conference (ISWC 2010) [1].
* Corresponding author at: Alpen-Adria Universität, Universitätsstrasse 65-67, 9020 Klagenfurt, Austria. Tel: +43 463 2700 3768, Fax: +43 463 2700 993768. Email addresses: kostya@ifit.uni-klu.ac.at (Kostyantyn Shchekotykhin), gerhard@ifit.uni-klu.ac.at (Gerhard Friedrich), pfleiss@ifit.uni-klu.ac.at (Philipp Fleiss), prodler@ifit.uni-klu.ac.at (Patrick Rodler)
¹ The research project is funded by grants of the Austrian Science Fund (Project V-Know, contract 19996).

1. Introduction

Ontology acquisition and maintenance are important prerequisites for the successful application of semantic systems in areas such as the Semantic Web. However, as state-of-the-art ontology extraction methods cannot automatically acquire ontologies in a complete and error-free fashion, users of such systems must formulate and correct logical descriptions on their own. In most cases these users are domain experts who have little or no experience in expressing knowledge in representation languages like OWL 2 DL [2]. Studies in cognitive psychology, e.g. [3, 4], indicate that humans make systematic errors while formulating or interpreting logical descriptions, with the results presented in [5, 6] confirming that these observations also apply to ontology development. Moreover, the problem becomes even more severe if an ontology is developed by a group of users, such as OBO Foundry² or NCI Thesaurus³, or is based on a set of imported third-party ontologies, etc. In this case inconsistencies might appear if some user does not understand or accept the context in which shared ontological descriptions are used.
Therefore, identification of erroneous ontological definitions is a difficult and time-consuming task.

Several ontology debugging methods [7, 8, 9, 10] were proposed to simplify ontology development and maintenance. Usually the main aim of debugging is to obtain a consistent and, optionally, coherent ontology. These basic requirements can be extended with additional ones, such as test cases [9], which must be fulfilled by the target ontology Ot. Any ontology that does not fulfill the requirements is faulty regardless of how it was created. For instance, an ontology might be created by an expert specializing descriptions of imported ontologies (top-down) or by an inductive learning algorithm from a set of examples (bottom-up).

² http://www.obofoundry.org
³ http://ncit.nci.nih.gov

Preprint submitted to Web Semantics: Science, Services and Agents on the World Wide Web, October 8, 2018

Note that even if all requirements are completely specified, many logically equivalent target ontologies might exist. They may differ in aspects such as the complexity of consistency checks, size or readability. However, selecting between logically equivalent theories based on such measures is out of the scope of this paper. Furthermore, although target ontologies may evolve as requirements change over time, we assume that the target ontology remains stable throughout a debugging session.

Given a set of requirements (e.g. formulated by a user) and a faulty ontology, the task of an ontology debugger is to identify the set of alternative diagnoses, where each diagnosis corresponds to a set of possibly faulty axioms. More concretely, a diagnosis D is a subset of an ontology O such that one should remove (change) all the axioms of the diagnosis from the ontology (i.e. O \ D) in order to formulate an ontology O′ that fulfills all the given requirements.
Only if the set of requirements is complete does the only possible ontology O′ correspond to the target ontology Ot. In the following we refer to the removal of a diagnosis from the ontology as a trivial application of a diagnosis. Moreover, in practical applications it might be inefficient to consider all possible diagnoses. Therefore, modern ontology debugging approaches focus on the computation of minimal diagnoses. A set of axioms Di is a minimal diagnosis iff there is no proper subset D′i ⊂ Di which is a diagnosis. Thus, minimal diagnoses constitute the minimal required changes to the ontology.

The application of diagnosis methods can be problematic in cases where many alternative minimal diagnoses exist for a given set of test cases and requirements. A sample study of real-world incoherent ontologies, which were used in [8], showed that hundreds or even thousands of minimal diagnoses may exist. In the case of the Transportation ontology the diagnosis method was able to identify 1782 minimal diagnoses⁴. In such situations a simple visualization of all alternative sets of modifications to the ontology is ineffective.

⁴ In Section 5, we will give a detailed characterization of these ontologies.

Thus an efficient debugging method should be able to discriminate between the diagnoses in order to select the target diagnosis Dt. Trivial application of Dt to the ontology O allows a user to extend (O \ Dt) with a set of additional axioms EX and, thus, to formulate the target ontology Ot, i.e. Ot = (O \ Dt) ∪ EX.

One possible solution to the diagnosis discrimination problem would be to order the set of diagnoses by various preference criteria. For instance, Kalyanpur et al. [11] suggest a measure to rank the axioms of a diagnosis depending on their structure, usage in test cases, provenance, and impact in terms of entailments. Only the top-ranking diagnoses are then presented to the user.
Of course this set of diagnoses will contain the target diagnosis only in cases where the faulty ontology, the given requirements and test cases provide sufficient data to the appropriate heuristic. However, it is difficult to identify which information, e.g. which test cases, is really required to identify the target diagnosis. That is, a user does not know a priori which and how many tests should be provided to the debugger to ensure that it will return the target diagnosis.

In this paper we present an approach for the acquisition of additional information by generating a sequence of queries, the answers of which can be used to reduce the set of diagnoses and ultimately identify the target diagnosis. These queries should be answered by an oracle such as a user or an information extraction system. In order to construct queries we exploit the property that different ontologies resulting from trivial applications of different diagnoses entail unequal sets of axioms. Consequently, we can differentiate between diagnoses by asking the oracle whether the target ontology should entail a set of logical sentences or not. These entailed logical sentences can be generated by the classification and realization services provided in description logic reasoning systems [12, 13, 14]. In particular, the classification process computes a subsumption hierarchy (sometimes also called an "inheritance hierarchy" of parents and children) for each concept description mentioned in a TBox. For each individual mentioned in an ABox, realization computes all the concept names of which the individual is an instance [12].

We propose two methods for selecting the next query from the set of possible queries: The first method employs a greedy approach that selects queries which try to cut the number of diagnoses in half. The second method exploits the fact that some diagnoses are more likely than others because of typical user errors [5, 6].
Beliefs for an error to occur in a given part of a knowledge base, represented as a probability, can be used to estimate the change in entropy of the set of diagnoses if a particular query is answered. In our evaluation the fault probabilities of axioms are estimated by the type and number of the logical operators employed. For example, roughly speaking, the greater the number of logical operators and the more complex these operators are, the greater the fault probability of an axiom. For assigning prior fault probabilities to diagnoses we employ the fault probabilities of axioms. Of course other methods for guessing prior fault probabilities, e.g. based on the context of concept descriptions, measures suggested in previous work [11], etc., can easily be integrated in our framework. Given a set of diagnoses and their probabilities, the method selects the query which minimizes the expected entropy of the set of diagnoses after an oracle answers it, i.e. maximizes the information gain. An oracle should answer such queries until a diagnosis is identified whose probability is significantly higher than those of all other diagnoses. This diagnosis is most likely to be the target diagnosis.

In the first evaluation scenario we compare the performance of both methods in terms of the number of queries needed to identify the target diagnosis. The evaluation is performed using generated examples as well as real-world ontologies presented in Tables 8 and 12. In the first case we alter a consistent and coherent ontology with additional axioms to generate conflicts that result in a predefined number of diagnoses of a required length. Each faulty ontology is then analyzed by the debugging algorithm using entropy, greedy and "random" strategies, where the latter selects queries at random.
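The entropy-based idea sketched above can be illustrated with a small, hypothetical computation. This is not the paper's exact update rule (the formulas appear in Section 3); it simply scores a query by the expected entropy of the renormalized diagnosis probabilities after the oracle's answer, splitting the mass of diagnoses that make no prediction evenly between the two outcomes:

```python
import math

# Illustrative sketch of entropy-based query scoring: a query partitions the
# diagnoses into those predicting "yes" (dP), "no" (dN) and no prediction
# (d0); the score is the expected entropy after the oracle answers.
def entropy(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

def expected_entropy(dP, dN, d0, prob):
    # probability of a "yes" answer: mass of dP plus half the mass of d0
    p_yes = sum(prob[d] for d in dP) + sum(prob[d] for d in d0) / 2
    p_no = 1.0 - p_yes

    def posterior(remaining):
        # renormalize over the diagnoses that survive the answer
        mass = sum(prob[d] for d in remaining)
        return [prob[d] / mass for d in remaining] if mass else []

    return (p_yes * entropy(posterior(dP | d0))
            + p_no * entropy(posterior(dN | d0)))

prob = {"d1": 0.25, "d2": 0.25, "d3": 0.25, "d4": 0.25}
balanced = expected_entropy({"d1", "d2"}, {"d3", "d4"}, set(), prob)
skewed = expected_entropy({"d1", "d2", "d3"}, {"d4"}, set(), prob)
print(balanced < skewed)  # True
```

With equal priors the balanced query scores better (lower expected entropy), matching the observation that "split-in-half" behaves like a special case of information gain when no error statistics are available.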
The evaluation results show that in some cases the entropy-based approach is almost 60% better than the greedy one, whereas both approaches clearly outperformed the random strategy.

In the second evaluation scenario we investigate the robustness of the entropy-based strategy with respect to variations in the prior fault probabilities. We analyze the performance of the entropy-based and greedy strategies on real-world ontologies by simulating different types of prior fault probability distributions as well as the "quality" of these probabilities that might occur in practice. In particular, we consider the cases where all prior fault probabilities are (1) equal, (2) "moderately" varied or (3) "extremely" varied. Regarding the "quality" of the probabilities we investigate cases where the guesses based on the prior diagnosis probabilities are good, average or bad. The results show that the entropy method outperforms "split-in-half" in almost all of the cases, namely when the target diagnosis is located in the more likely two thirds of the minimal diagnoses. In some situations the entropy-based approach even achieves twice the performance of the greedy one. Only in cases where the initial guess of the prior probabilities is very vague (the bad case), and the number of queries needed to identify the target diagnosis is low, may "split-in-half" save on average one query. However, as the number of queries increases, the performance of entropy-based query selection improves compared to the "split-in-half" strategy. We observed that if the number of queries is greater than 10, the entropy-based method is preferable even if the initial guess of the prior probabilities is bad.
This is due to the effect that the initially bad guesses are improved by the Bayes-update of the diagnosis probabilities, as well as the ability of the entropy-based method to stop in cases where the probability of some diagnosis is above an acceptance threshold predefined by the user. Consequently, entropy-based query selection is robust enough to handle different prior fault probability distributions.

Additional experiments performed on big real-world ontologies demonstrate the scalability of the suggested approach. In our experiments we were able to identify the target diagnosis in an ontology with over 33000 axioms using entropy-based query selection in only 190 seconds, using an average of five queries.

The remainder of the paper is organized as follows: Section 2 presents two introductory examples as well as the basic concepts. The details of the entropy-based query selection method are given in Section 3. Section 4 describes the implementation of the approach and is followed by evaluation results in Section 5. The paper concludes with an overview of related work.

2. Motivating examples and basic concepts

We begin by presenting the fundamentals of ontology diagnosis and then show how queries and answers can be generated and employed to differentiate between sets of diagnoses.

2.1. Description logics

Since the underlying knowledge representation method of ontologies in the Semantic Web is based on description logics, we start by briefly introducing the main concepts, employing the usual definitions as in [15, 16]. A knowledge base is comprised of two components, namely a TBox (denoted by T) and an ABox (A). The TBox defines the terminology whereas the ABox contains assertions about named individuals in terms of the vocabulary defined in the TBox. The vocabulary consists of concepts, denoting sets of individuals, and roles, denoting binary relationships between individuals.
These concepts and roles may be either atomic or complex, the latter being obtained by employing description operators. The language of descriptions is defined recursively by starting from a schema S = (CN, RN, IN) of disjoint sets of names for concepts, roles, and individuals. Typical operators for the construction of complex descriptions are C ⊔ D (disjunction), C ⊓ D (conjunction), ¬C (negation), ∀R.C (concept value restriction), and ∃R.C (concept exists restriction), where C and D are elements of CN and R ∈ RN.

Knowledge bases are defined by a finite set of logical sentences. Sentences regarding the TBox are called terminological axioms whereas sentences regarding the ABox are called assertional axioms. Terminological axioms are expressed by C ⊑ D (General Concept Inclusion), which corresponds to logical implication. Let a, b ∈ IN be individual names. C(a) and R(a, b) are then assertional axioms. Concepts (resp. roles) can be regarded as unary (resp. binary) predicates.

Roughly speaking, description logics can be seen as fragments of first-order predicate logic (without considering transitive closure or special fixpoint semantics). These fragments are specifically designed to ensure decidability or favorable computational costs.

The semantics of description terms is usually given using an interpretation I = ⟨Δ^I, (·)^I⟩, where Δ^I is a domain (non-empty universe) of values, and (·)^I is a function that maps every concept description to a subset of Δ^I, and every role name to a subset of Δ^I × Δ^I. The mapping also associates a value in Δ^I with every individual name in IN. An interpretation I is a model of a knowledge base iff it satisfies all terminological and assertional axioms. A knowledge base is satisfiable iff a model exists. A concept description C is coherent (satisfiable) w.r.t. a TBox T, if a model I of T exists such that C^I ≠ ∅.
A TBox is incoherent iff an incoherent concept description exists.

2.2. Diagnosis of ontologies

Example 1. Consider a simple ontology O with the terminology T:

ax1: A ⊑ B    ax2: B ⊑ C    ax3: C ⊑ D    ax4: D ⊑ R

and assertions A: {A(w), ¬R(w), A(v)}.

Assume that the user explicitly states that the three assertional axioms should be considered as correct, i.e. these axioms are added to a background theory B. The introduction of a background theory ensures that the diagnosis method focuses purely on the potentially faulty axioms.

Furthermore, assume that the user requires the currently inconsistent ontology O to be consistent. The only irreducible set of axioms (minimal conflict set) that preserves the inconsistency is CS: ⟨ax1, ax2, ax3, ax4⟩. That is, one has to modify or remove the axioms of at least one of the following diagnoses

D1: [ax1]    D2: [ax2]    D3: [ax3]    D4: [ax4]

to restore the consistency of the ontology. However, it is unclear which of the ontologies Oi = O \ Di obtained by application of the diagnoses from the set D: {D1, ..., D4} is the target one.

Definition 1. A target ontology Ot is a set of logical sentences characterized by a set of background axioms B, a set P of sets of logical sentences that must be entailed by Ot and a set N of sets of logical sentences that must not be entailed by Ot. A target ontology Ot must fulfill the following necessary requirements:

• Ot must be satisfiable (optionally coherent)
• B ⊆ Ot
• Ot |= p ∀p ∈ P
• Ot ⊭ n ∀n ∈ N

Given B, P, and N, an ontology O is faulty iff O does not fulfill all the necessary requirements of the target ontology.

Note that the approach presented in this paper can be used with any knowledge representation language for which there exists a sound and complete procedure to decide whether O |= ax, and for which the entailment operator |= is extensive, monotone and idempotent.
For instance, these requirements are fulfilled by all subsets of OWL 2 which are interpreted under OWL Direct Semantics.

Definition 1 allows a user to identify the target diagnosis Dt by providing sufficient information about the target ontology in the sets B, P and N. For instance, if in Example 1 the user provides the information that Ot |= {B(w)} and Ot ⊭ {C(w)}, the debugger will return only one diagnosis, namely D2. Application of this diagnosis results in a consistent ontology O2 = O \ D2 that entails {B(w)} because of ax1 and the assertion A(w). In addition, O2 does not entail {C(w)} since O2 ∪ {¬C(w)} is consistent and, moreover, {¬R(w), ax4, ax3} |= {¬C(w)}. All other ontologies Oi = (O \ Di) obtained by the application of the diagnoses D1, D3 and D4 do not fulfill the given requirements, since O1 ∪ {B(w)} is inconsistent and therefore any consistent extension of O1 cannot entail {B(w)}. As both O3 and O4 entail {C(w)}, O2 corresponds to the target ontology Ot.

Definition 2. Let ⟨O, B, P, N⟩ be a diagnosis problem instance, where O is an ontology, B a background theory, P a set of sets of logical sentences which must be entailed by the target ontology Ot, and N a set of sets of logical sentences which must not be entailed by Ot. A set of axioms D ⊆ O is a diagnosis iff the set of axioms O \ D can be extended by a logical description EX such that:

1. (O \ D) ∪ B ∪ EX is consistent (and coherent if required)
2. (O \ D) ∪ B ∪ EX |= p ∀p ∈ P
3. (O \ D) ∪ B ∪ EX ⊭ n ∀n ∈ N

A diagnosis Di defines a partition of the ontology O where each axiom axj ∈ Di is a candidate for changes by the user and each axiom axk ∈ O \ Di is correct. If Dt is the set of axioms of O to be changed (i.e. Dt is the target diagnosis) then the target ontology Ot is (O \ Dt) ∪ B ∪ EX for some EX defined by the user.
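To make Example 1 concrete, the following sketch simulates entailment over the subsumption chain by forward chaining on the remaining axioms. The propositional-style reasoning and all names are illustrative simplifications, not the paper's diagnosis engine, which uses a full description logic reasoner:

```python
# Example 1: the chain A ⊑ B ⊑ C ⊑ D ⊑ R with assertions A(w), ¬R(w), A(v).
AXIOMS = {"ax1": ("A", "B"), "ax2": ("B", "C"),
          "ax3": ("C", "D"), "ax4": ("D", "R")}

def entailed(concepts, removed):
    """Concepts entailed for an individual, chaining over remaining axioms."""
    closure = set(concepts)
    changed = True
    while changed:
        changed = False
        for name, (lhs, rhs) in AXIOMS.items():
            if name not in removed and lhs in closure and rhs not in closure:
                closure.add(rhs)
                changed = True
    return closure

def consistent(removed):
    # The ontology is inconsistent iff R(w) is derived, since ¬R(w) is asserted.
    return "R" not in entailed({"A"}, removed)

# Removing any single axiom restores consistency: four minimal diagnoses.
diagnoses = [d for d in ("ax1", "ax2", "ax3", "ax4") if consistent({d})]
print(diagnoses)  # ['ax1', 'ax2', 'ax3', 'ax4']
```

Running it confirms that the unmodified ontology is inconsistent and that each singleton [ax1], ..., [ax4] is a diagnosis, as stated in Example 1.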
In the following we assume that the background theory B together with the sets of logical sentences in P and N always allows formulation of the target ontology. Moreover, a diagnosis exists iff a target ontology exists.

Proposition 1. A diagnosis D for a diagnosis problem instance ⟨O, B, P, N⟩ exists iff

B ∪ ⋃_{p∈P} p is consistent (coherent)

and

∀n ∈ N: B ∪ ⋃_{p∈P} p ⊭ n.

The set of all diagnoses is complete in the sense that at least one diagnosis exists where the ontology resulting from the trivial application of the diagnosis is a subset of the target ontology:

Proposition 2. Let D ≠ ∅ be the set of all diagnoses for a diagnosis problem instance ⟨O, B, P, N⟩ and Ot the target ontology. Then a diagnosis Dt ∈ D exists s.t. (O \ Dt) ⊆ Ot.

The set of all diagnoses can be characterized by the set of minimal diagnoses.

Definition 3. A diagnosis D for a diagnosis problem instance ⟨O, B, P, N⟩ is a minimal diagnosis iff there is no D′ ⊂ D such that D′ is a diagnosis.

Proposition 3. Let ⟨O, B, P, N⟩ be a diagnosis problem instance. For every diagnosis D there is a minimal diagnosis D′ s.t. D′ ⊆ D.

Definition 4. A diagnosis D for a diagnosis problem instance ⟨O, B, P, N⟩ is a minimum cardinality diagnosis iff there is no diagnosis D′ such that |D′| < |D|.

To summarize, a diagnosis describes which axioms are candidates for modification. Despite the fact that multiple diagnoses may exist, some are more preferable than others. E.g. minimal diagnoses require minimal changes, i.e. axioms are not considered for modification unless there is a reason. Minimum cardinality diagnoses require changing a minimal number of axioms. The actual type of error contained in an axiom is irrelevant, as the concept of diagnosis defined here does not make any assumptions about the errors themselves. There can, however, be instances where an ontology is faulty and the empty diagnosis is the only minimal diagnosis, e.g.
if some axioms are missing and nothing must be changed.

The extension EX plays an important role in the ontology repair process, suggesting axioms that should be added to the ontology. For instance, suppose that in Example 1 the user requires that the target ontology must not entail {B(w)} but has to entail {B(v)}, that is N = {{B(w)}} and P = {{B(v)}}. Because the example ontology O is inconsistent, some sentences must be changed. The consistent ontology O1 = O \ D1 entails neither {B(v)} nor {B(w)} (in particular O1 |= {¬B(w)}). Consequently, O1 has to be extended with a set EX of logical sentences in order to entail {B(v)}. This set of logical sentences can be approximated with EX = {B(v)}. O1 ∪ EX is satisfiable, entails {B(v)}, but does not entail {B(w)}. All other ontologies Oi = O \ Di, i = 2, 3, 4 are consistent but entail {B(w), B(v)} and must be rejected because of the monotonic semantics of description logic. That is, there is no extension EX such that (Oi ∪ EX) ⊭ {B(w)}. Therefore, the diagnosis D1 is the minimum cardinality diagnosis which allows the formulation of the target ontology. Note that formulation of the complete extension is impossible, since our diagnosis approach deals with changes to existing axioms and does not learn new axioms.

The following corollary characterizes diagnoses without employing the true extension EX to formulate the target ontology. The idea is to use the sentences which must be entailed by the target ontology to approximate EX, as shown above.

Corollary 1. Given a diagnosis problem instance ⟨O, B, P, N⟩, a set of axioms D ⊆ O is a diagnosis iff

(O \ D) ∪ B ∪ ⋃_{p∈P} p is satisfiable (coherent)    (Condition 1)

and

∀n ∈ N: (O \ D) ∪ B ∪ ⋃_{p∈P} p ⊭ n    (Condition 2)

Proof sketch: (⇒) Let D ⊆ O be a diagnosis for ⟨O, B, P, N⟩. Since there is an EX s.t.
(O \ D) ∪ B ∪ EX is satisfiable (coherent) and (O \ D) ∪ B ∪ EX |= p for all p ∈ P, it follows that (O \ D) ∪ B ∪ EX ∪ ⋃_{p∈P} p is satisfiable (coherent) and therefore (O \ D) ∪ B ∪ ⋃_{p∈P} p is satisfiable (coherent). Consequently, the first condition of the corollary is fulfilled. Since (O \ D) ∪ B ∪ EX |= p for all p ∈ P and (O \ D) ∪ B ∪ EX ⊭ n for all n ∈ N, it follows that (O \ D) ∪ B ∪ EX ∪ ⋃_{p∈P} p ⊭ n for all n ∈ N. Consequently, (O \ D) ∪ B ∪ ⋃_{p∈P} p ⊭ n for all n ∈ N and the second condition of the corollary is fulfilled.

(⇐) Let D ⊆ O and ⟨O, B, P, N⟩ be a diagnosis problem instance. Without limiting generality let EX = ⋃_{p∈P} p. By Condition 1 of the corollary (O \ D) ∪ B ∪ ⋃_{p∈P} p is satisfiable (coherent). Therefore, for this EX the sentences (O \ D) ∪ B ∪ EX are satisfiable (coherent), i.e. the first condition for a diagnosis is fulfilled, and these sentences entail p for all p ∈ P, which corresponds to the second condition a diagnosis must fulfill. Furthermore, by Condition 2 of the corollary (O \ D) ∪ B ∪ EX ⊭ n holds for all n ∈ N and therefore the third condition for a diagnosis is fulfilled. Consequently, D ⊆ O is a diagnosis for ⟨O, B, P, N⟩. □

Conflict sets, which are the parts of the ontology that preserve the inconsistency/incoherency, are usually employed to constrain the search space during the computation of diagnoses.

Definition 5. Given a diagnosis problem instance ⟨O, B, P, N⟩, a set of axioms CS ⊆ O is a conflict set iff CS ∪ B ∪ ⋃_{p∈P} p is inconsistent (incoherent) or an n ∈ N exists s.t. CS ∪ B ∪ ⋃_{p∈P} p |= n.

Definition 6. A conflict set CS for an instance ⟨O, B, P, N⟩ is minimal iff there is no CS′ ⊂ CS such that CS′ is a conflict set.

A set of minimal conflict sets can be used to compute the set of minimal diagnoses as shown in [17]. The idea is that each diagnosis must include at least one element of each minimal conflict set.
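The hitting-set idea just described, that every diagnosis must intersect each minimal conflict set, can be sketched with a brute-force enumeration. This is illustrative only; practical debuggers use the far more efficient hitting-set-tree construction of [17]:

```python
from itertools import combinations

def minimal_hitting_sets(conflict_sets):
    """Enumerate candidates by increasing size; keep those that hit every
    conflict set and have no hitting proper subset (i.e. are minimal)."""
    universe = sorted(set().union(*conflict_sets))
    hitting = []
    for r in range(1, len(universe) + 1):
        for cand in combinations(universe, r):
            s = set(cand)
            if all(s & cs for cs in conflict_sets):
                if not any(h < s for h in hitting):  # no smaller hitter inside
                    hitting.append(s)
    return hitting

# Example 1 has the single minimal conflict set {ax1, ax2, ax3, ax4},
# so each axiom on its own is a minimal hitting set, i.e. a minimal diagnosis.
print(minimal_hitting_sets([{"ax1", "ax2", "ax3", "ax4"}]))
```

With two overlapping conflict sets, e.g. `[{1, 2}, {2, 3}]`, the sketch returns `[{2}, {1, 3}]`: the shared element alone, or one element from each set.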
Proposition 4. D is a minimal diagnosis for the diagnosis problem instance ⟨O, B, P, N⟩ iff D is a minimal hitting set for the set of all minimal conflict sets of ⟨O, B, P, N⟩.

Table 1: Entailments of the ontologies Oi = (O \ Di), i = 1, ..., 4 in Example 1 returned by realization.

Ontology | Entailments
O1       | ∅
O2       | {B(w)}
O3       | {B(w), C(w)}
O4       | {B(w), C(w), D(w)}

Given a set of sets S, a set H is a hitting set of S iff H ∩ Si ≠ ∅ for all Si ∈ S and H ⊆ ⋃_{Si∈S} Si. Most modern ontology diagnosis methods [7, 8, 9, 10] are implemented according to Proposition 4 and differ only in details, such as how and when (minimal) conflict sets are computed, the order in which hitting sets are generated, etc.

2.3. Differentiating between diagnoses

The diagnosis method usually generates a set of diagnoses for a given diagnosis problem instance. Thus, in Example 1 an ontology debugger returns a set of four minimal diagnoses {D1, ..., D4}. As explained in the previous section, additional information, i.e. the sets of sets of logical sentences P and N, can be used by the debugger to reduce the set of diagnoses. However, in the general case the user does not know which sets P and N to provide to the debugger such that the target diagnosis will be identified. Therefore, the debugger should be able to identify sets of logical sentences on its own and only ask the user or some other oracle whether these sentences must or must not be entailed by the target ontology.

To generate these sentences the debugger can apply each of the diagnoses in D = {D1, ..., Dn} and obtain a set of ontologies Oi = O \ Di, i = 1, ..., n that fulfill the user requirements. For each ontology Oi a description logic reasoner can generate a set of entailments, such as entailed subsumptions provided by the classification service and sets of class assertions provided by the realization service.
These entailments can be used to discriminate between the diagnoses, as different ontologies entail different sets of sentences due to the extensivity of the entailment relation. Note that in the examples provided in this section we consider only two types of entailments, namely subsumption and class assertion. In general, the approach presented in this paper is not limited to these types and can use all of the entailment types supported by a reasoner.

For instance, in Example 1, for each ontology Oi = (O \ Di), i = 1, ..., 4, the realization service of a reasoner returns the set of class assertions presented in Table 1. Without any additional information the debugger cannot decide which of these sentences must be entailed by the target ontology. To obtain this information the diagnosis method must query an oracle that can specify whether the target ontology entails some set of sentences or not. E.g. the debugger could ask the oracle if {D(w)} is entailed by the target ontology (Ot |= {D(w)}). If the answer is yes, then {D(w)} is added to P and D4 is considered as the target diagnosis. All other diagnoses are rejected because (O \ Di) ∪ B ∪ {D(w)} for i = 1, 2, 3 is inconsistent. If the answer is no, then {D(w)} is added to N and D4 is rejected, as (O \ D4) ∪ B |= {D(w)}, and we have to ask the oracle another question.

In the following we consider a query Q as a set of logical sentences such that Ot |= Q holds iff Ot |= qi for all qi ∈ Q.

Property 1.
Given a diagnosis problem instance ⟨O, B, P, N⟩, a set of diagnoses D, a set of logical sentences Q representing the query (Ot |= Q) and an oracle able to evaluate the query:

If the oracle answers yes then every diagnosis Di ∈ D is a diagnosis for the instance with P ∪ {Q} iff both conditions hold:

(O \ Di) ∪ B ∪ ⋃_{p∈P} p ∪ Q is consistent (coherent)

∀n ∈ N: (O \ Di) ∪ B ∪ ⋃_{p∈P} p ∪ Q ⊭ n

If the oracle answers no then every diagnosis Di ∈ D is a diagnosis for the instance with N ∪ {Q} iff both conditions hold:

(O \ Di) ∪ B ∪ ⋃_{p∈P} p is consistent (coherent)

∀n ∈ (N ∪ {Q}): (O \ Di) ∪ B ∪ ⋃_{p∈P} p ⊭ n

In particular, a query partitions the set of diagnoses D into three disjoint subsets.

Definition 7. For a query Q, each diagnosis Di ∈ D of a diagnosis problem instance ⟨O, B, P, N⟩ can be assigned to one of the three sets D^P, D^N or D^∅, where

• Di ∈ D^P iff it holds that (O \ Di) ∪ B ∪ ⋃_{p∈P} p |= Q

• Di ∈ D^N iff it holds that (O \ Di) ∪ B ∪ ⋃_{p∈P} p ∪ Q is inconsistent (incoherent)

• Di ∈ D^∅ iff Di ∈ D \ (D^P ∪ D^N)

Given a diagnosis problem instance, we say that the diagnoses in D^P predict a positive answer (yes) as a result of the query Q, diagnoses in D^N predict a negative answer (no), and diagnoses in D^∅ do not make any predictions.

Property 2. Given a diagnosis problem instance ⟨O, B, P, N⟩, a set of diagnoses D, a query Q and an oracle: If the oracle answers yes then the set of rejected diagnoses is D^N and the set of remaining diagnoses is D^P ∪ D^∅. If the oracle answers no then the set of rejected diagnoses is D^P and the set of remaining diagnoses is D^N ∪ D^∅.

Consequently, given a query Q, either D^P or D^N is eliminated, but D^∅ always remains after the query is answered. For generating queries we have to investigate for which subsets D^P, D^N ⊆ D a query exists that can differentiate between these sets.
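For the running example, Definition 7 can be simulated without a reasoner by precomputing, for each repaired ontology O \ Di, which class assertions it entails (Table 1) and which assertions would make it inconsistent (obtained by propagating ¬R(w) back through the remaining axioms). The dictionaries below are a hypothetical encoding of exactly that information:

```python
# POS[d]: class assertions entailed by O \ Dd (Table 1).
# NEG[d]: assertions whose addition makes O \ Dd inconsistent.
POS = {"D1": set(), "D2": {"B(w)"},
       "D3": {"B(w)", "C(w)"}, "D4": {"B(w)", "C(w)", "D(w)"}}
NEG = {"D1": {"B(w)", "C(w)", "D(w)"}, "D2": {"C(w)", "D(w)"},
       "D3": {"D(w)"}, "D4": set()}

def classify(query):
    """Partition the diagnoses into (D^P, D^N, D^∅) for the given query."""
    dP, dN, d0 = set(), set(), set()
    for d in POS:
        if query <= POS[d]:
            dP.add(d)       # the repaired ontology entails every sentence of Q
        elif query & NEG[d]:
            dN.add(d)       # adding Q would be inconsistent
        else:
            d0.add(d)       # no prediction
    return dP, dN, d0

print(classify({"B(w)"}) == ({"D2", "D3", "D4"}, {"D1"}, set()))  # True
```

The sketch reproduces the partitions discussed in the text: {B(w)} yields ⟨{D2, D3, D4}, {D1}, ∅⟩, while {C(w)} and {B(w), C(w)} both yield ⟨{D3, D4}, {D1, D2}, ∅⟩.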
A straightforward approach is to investigate all possible subsets of D. In our evaluation we show that this is feasible if we limit the number n of minimal diagnoses to be considered during query generation and selection. E.g. for n = 9, the algorithm has to verify 512 possible partitions in the worst case. Given a set of diagnoses D for the ontology O, a set P of sets of sentences that must be entailed by the target ontology O_t, and a set of background axioms B, the set of partitions PR for which a query exists can be computed as follows:

1. Generate the power set P(D) and set PR ← ∅.
2. Assign an element of P(D) to the set D^P_i and generate a set of common entailments E_i of all ontologies (O \ D_j) ∪ B ∪ ⋃_{p ∈ P} p, where D_j ∈ D^P_i.
3. If E_i = ∅, then reject the current element D^P_i, i.e. set P(D) ← P(D) \ {D^P_i}, and go to Step 2. Otherwise set Q_i ← E_i.
4. Use Definition 7 and the query Q_i to classify the diagnoses D_k ∈ D \ D^P_i into the sets D^P_i, D^N_i and D^∅_i. Add the generated partition to the set of partitions, PR ← PR ∪ {⟨Q_i, D^P_i, D^N_i, D^∅_i⟩}, and set P(D) ← P(D) \ {D^P_i}. If P(D) ≠ ∅ then go to Step 2.

In Example 1 the set of diagnoses D of the ontology O contains 4 elements. Therefore, the power set P(D) includes 15 elements {{D₁}, {D₂}, …, {D₁, D₂, D₃, D₄}}, assuming we omit the element corresponding to ∅ as it does not contain any diagnoses to be evaluated. Moreover, assume that P and N are empty. In each iteration an element of P(D) is assigned to the set D^P_i. For instance, the algorithm assigns D^P₁ = {D₁, D₂}. In this case the set of common entailments is empty, as (O \ D₁) ∪ B has no entailed sentences (see Table 1). Therefore, the set {D₁, D₂} is rejected and removed from P(D). Assume that in the next iteration the algorithm selects D^P₂ = {D₂, D₃}.
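The four steps above can be sketched in a few lines of Python. This is a minimal sketch, not the authors' implementation: the reasoner is abstracted by two hypothetical callables, `entailments_of(d)` (the entailments of (O \ D_d) ∪ B ∪ ⋃_{p ∈ P} p) and `consistent_with(d, q)` (consistency of that ontology extended by the query q).

```python
from itertools import combinations

def generate_partitions(diagnoses, entailments_of, consistent_with):
    """Enumerate candidate queries over all non-empty subsets of the diagnoses.

    Returns tuples (query, d_pos, d_neg, d_zero), one per subset for which
    a query exists (Steps 1-4 above); duplicates may occur and can be
    deduplicated on the query set afterwards.
    """
    partitions = []
    for r in range(1, len(diagnoses) + 1):
        for seed in combinations(diagnoses, r):          # Step 1: power set
            dp = set(seed)
            # Step 2: common entailments of all ontologies induced by dp
            common = set.intersection(*(set(entailments_of(d)) for d in dp))
            if not common:                               # Step 3: no query exists
                continue
            query = frozenset(common)
            d_pos, d_neg, d_zero = set(dp), set(), set()
            # Step 4: classify the remaining diagnoses (Definition 7)
            for d in (x for x in diagnoses if x not in dp):
                if query <= set(entailments_of(d)):
                    d_pos.add(d)
                elif not consistent_with(d, query):
                    d_neg.add(d)
                else:
                    d_zero.add(d)
            partitions.append((query, d_pos, d_neg, d_zero))
    return partitions
```

In a toy rendering of Example 1 (where each induced ontology happens to be inconsistent with any sentence it does not entail), the seed {D₂} yields the partition ⟨{B(w)}, {D₂, D₃, D₄}, {D₁}, ∅⟩ from Table 2.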
In this case the set of common entailments E₂ = {B(w)} is not empty, and so Q₂ = {B(w)}. The remaining diagnoses D₁ and D₄ are classified according to Definition 7. That is, the algorithm selects the first diagnosis D₁ and verifies whether (O \ D₁) ∪ B ⊨ {B(w)}. Given the negative answer of the reasoner, the algorithm checks whether (O \ D₁) ∪ B ∪ {B(w)} is inconsistent. Since the condition is satisfied, the diagnosis D₁ is added to the set D^N₂. The second diagnosis D₄ is added to the set D^P₂ as it satisfies the first requirement, (O \ D₄) ∪ B ⊨ {B(w)}. The resulting partition ⟨{B(w)}, {D₂, D₃, D₄}, {D₁}, ∅⟩ is added to the set PR.

However, a query need not include all of the entailed sentences. If a query Q partitions the set of diagnoses into D^P, D^N and D^∅, and an (irreducible) subset Q′ ⊂ Q exists which preserves the partition, then it is sufficient to query Q′. In our example, Q₂: {B(w), C(w)} can be reduced to its subset Q′₂: {C(w)}. If there are multiple irreducible subsets that preserve the partition, then we select one of them. All of the queries and their corresponding partitions generated in Example 1 are presented in Table 2.

Given these queries, the debugger has to decide which one should be asked first in order to minimize the number of queries to be answered. A popular query selection heuristic (called "split-in-half") prefers queries which allow half of the diagnoses to be removed from the set D regardless of the answer of the oracle. Using the data presented in Table 2, the "split-in-half" heuristic determines that asking the oracle whether O_t ⊨ {C(w)} is the best query (i.e. the reduced query Q₂), as two diagnoses from the set D are removed regardless of the answer. Assuming that D₁ is the target diagnosis, an oracle will answer no to our question (i.e. O_t ⊭ {C(w)}).
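The "split-in-half" preference can be stated compactly: among the candidate partitions, pick the query that minimizes the worst-case number of diagnoses remaining after either answer. A small sketch (the tuple layout and function names are my own):

```python
def worst_case_remaining(d_pos, d_neg, d_zero):
    # By Property 2, D^0 survives either answer, so it counts in both branches.
    return max(len(d_pos) + len(d_zero), len(d_neg) + len(d_zero))

def split_in_half(partitions):
    """partitions: iterable of (query, d_pos, d_neg, d_zero) tuples;
    returns the partition whose query halves the diagnoses best."""
    return min(partitions, key=lambda t: worst_case_remaining(t[1], t[2], t[3]))
```

Applied to the three partitions of Table 2, the heuristic selects Q₂, whose worst case leaves two diagnoses, while Q₁ and Q₃ can leave three.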
Based on this feedback, the diagnoses D₃ and D₄ are removed according to Property 2. Given the updated set of diagnoses D and P = {{C(w)}}, the partitioning algorithm returns the only partition ⟨{B(w)}, {D₂}, {D₁}, ∅⟩. The heuristic then selects the query {B(w)}, which is also answered with no by the oracle. Consequently, D₁ is identified as the only remaining minimal diagnosis.

In general, if n is the number of diagnoses and we can split the set of diagnoses in half with each query, then the minimum number of queries is log₂ n. Note that this minimum number of queries can only be achieved when all minimal diagnoses are considered at once, which is intractable even for relatively small values of n. However, in case the probabilities of diagnoses are known, we can reduce the number of queries by utilizing two effects:

1. We can exploit diagnosis probabilities to assess the likelihood of each answer and the expected value of the information contained in the set of diagnoses after an answer is given.
2. Even if multiple diagnoses remain, further query generation may not be required if one diagnosis is highly probable and all other remaining diagnoses are highly improbable.

Example 2. Consider an ontology O with the terminology T:

  ax₁: A₁ ⊑ A₂ ⊓ M₁ ⊓ M₂        ax₄: M₂ ⊑ ∀s.A ⊓ D
  ax₂: A₂ ⊑ ¬∃s.M₃ ⊓ ∃s.M₂      ax₅: M₃ ≡ B ⊔ C
  ax₃: M₁ ⊑ ¬A ⊓ B

and the background theory containing the assertions A: {A₁(w), A₁(u), s(u, w)}. The ontology is inconsistent and the set of minimal conflict sets is CS = {⟨ax₁, ax₃, ax₄⟩, ⟨ax₁, ax₂, ax₃, ax₅⟩}. To restore consistency, the user should modify all axioms of at least one minimal diagnosis:

  D₁: [ax₁]        D₃: [ax₄, ax₅]
  D₂: [ax₃]        D₄: [ax₂, ax₄]

Following the same approach as in the first example, we compute a set of possible queries and corresponding partitions using the algorithm presented above.
A set of possible irreducible queries for Example 2 and their partitions is presented in Table 3. These queries partition the set of diagnoses D in a way that makes the application of myopic strategies, such as "split-in-half", inefficient. A greedy algorithm based on such a heuristic would first select the query Q₁, since there is no query that cuts the set of diagnoses in half. If D₄ is the target diagnosis, then Q₁ will be answered with yes by an oracle (see Figure 1). In the next iteration the algorithm would also choose a suboptimal query, the first untried query Q₂, since there is no partition that divides the diagnoses D₁, D₂, and D₄ into two groups of equal size. Once again the oracle answers yes, and the algorithm identifies query Q₄ to differentiate between D₁ and D₄.

However, in real-world settings the assumption that all axioms fail with the same probability rarely holds. For example, Roussey et al. [6] present a list of "anti-patterns", where an anti-pattern is a set of axioms, such as {C₁ ⊑ ∀R.C₂, C₁ ⊑ ∀R.C₃, C₂ ≡ ¬C₃}, that corresponds to a minimal conflict set. The study performed by [6] shows that such conflict sets often occur in practice due to frequent misuse of certain language constructs like quantification or disjointness.

Table 2: Possible queries in Example 1

  Query                     D^P             D^N             D^∅
  Q₁: {B(w)}                {D₂, D₃, D₄}    {D₁}            ∅
  Q₂: {B(w), C(w)}          {D₃, D₄}        {D₁, D₂}        ∅
  Q₃: {B(w), C(w), Q(w)}    {D₄}            {D₁, D₂, D₃}    ∅

Table 3: Possible queries in Example 2

  Query                 D^P             D^N         D^∅
  Q₁: {B ⊑ M₃}          {D₁, D₂, D₄}    {D₃}        ∅
  Q₂: {B(w)}            {D₃, D₄}        {D₂}        {D₁}
  Q₃: {M₁ ⊑ B}          {D₁, D₃, D₄}    {D₂}        ∅
  Q₄: {M₁(w), M₂(u)}    {D₂, D₃, D₄}    {D₁}        ∅
  Q₅: {A(w)}            {D₂}            {D₃, D₄}    {D₁}
  Q₆: {M₂ ⊑ D}          {D₁, D₂}        ∅           {D₃, D₄}
  Q₇: {M₃(u)}           {D₄}            ∅           {D₁, D₂, D₃}
Such studies are ideal sources for estimating prior fault probabilities. However, this is beyond the scope of this paper. Our approach for computing the prior fault probabilities of axioms is inspired by Rector et al. [5] and considers the syntax of a knowledge representation language, such as restrictions, conjunction, negation, etc. For instance, if a user frequently changes the universal to the existential quantifier and vice versa in order to restore coherency, then we can assume that axioms including such restrictions are more likely to fail than the other ones. In [5] the authors report that in most cases inconsistent ontologies are created because users (a) mix up ∀r.S and ∃r.S, (b) mix up ¬∃r.S and ∃r.¬S, (c) mix up ⊔ and ⊓, (d) wrongly assume that classes are disjoint by default or overuse disjointness, or (e) wrongly apply negation. Observing that misuses of quantifiers are more likely than other failure patterns, one might find that the axioms ax₂ and ax₄ are more likely to be faulty than ax₃ (because of the use of quantifiers), whereas ax₃ is more likely to be faulty than ax₅ and ax₁ (because of the use of negation).

[Figure 1: The search tree of the greedy algorithm]

Detailed justifications of diagnosis probabilities are given in the next section. However, let us assume some probability distribution of the faults according to the observations presented above such that: (a) the diagnosis D₂ is the most probable one, i.e. a
single-fault diagnosis of an axiom containing a negation; (b) although D₄ is a double-fault diagnosis, it follows D₂ closely as its axioms contain quantifiers; (c) D₁ and D₃ are significantly less probable than D₄ because conjunction/disjunction in ax₁ and ax₅ have a significantly lower fault probability than negation in ax₃. Taking this information into account, asking query Q₁ is essentially useless, because it is highly probable that the target diagnosis is either D₂ or D₄ and, therefore, it is highly probable that the oracle will respond with yes. Instead, asking Q₃ is more informative, because regardless of the answer we can exclude one of the highly probable diagnoses, i.e. either D₂ or D₄. If the oracle responds to Q₃ with no, then D₂ is the only remaining diagnosis. However, if the oracle responds with yes, the diagnoses D₄, D₃, and D₁ remain, where D₄ is significantly more probable than D₃ and D₁. If the difference between the probabilities of the diagnoses is high enough such that D₄ can be accepted as the target diagnosis, no additional questions are required. Obviously this strategy can lead to a substantial reduction in the number of queries compared to myopic approaches, as we demonstrate in our evaluation.

Note that in real-world application scenarios failure patterns and their probabilities can be discovered by analyzing the debugging actions of a user in an ontology editor, like Protégé. Learning of fault probabilities can be used to "personalize" the query selection algorithm to prefer user-specific faults. However, as our evaluation shows, even a rough estimate of the probabilities is capable of outperforming the "split-in-half" heuristic.

3. Entropy-based query selection

To select the best query we exploit a-priori failure probabilities of each axiom derived from the syntax of description logics or some other knowledge representation language, such as OWL.
That is, the user is able to specify their own beliefs in terms of the probability of a syntax element such as ∀, ∃, ⊓, etc. being erroneous; alternatively, the debugger can compute these probabilities by analyzing the frequency of various syntax elements in the target diagnoses of different debugging sessions. If no failure information is available, then the debugger can initialize all of the probabilities with some small value. Compared to statistically well-founded probabilities, the latter approach provides a suboptimal but useful diagnosis discrimination process, as discussed in the evaluation.

Given the failure probabilities of all syntax elements se ∈ S of a knowledge representation language used in O, we can compute the failure probability of an axiom ax_i ∈ O as

  p(ax_i) = p(F_se₁ ∪ F_se₂ ∪ ⋯ ∪ F_seₙ)

where F_se₁ … F_seₙ represent the events that an occurrence of the syntax element se_j in ax_i is faulty. E.g. for ax₂ of Example 2, p(ax₂) = p(F_⊑ ∪ F_¬ ∪ F_∃ ∪ F_⊓ ∪ F_∃). Assuming that each occurrence of a syntax element fails independently, i.e. an erroneous usage of a syntax element se_k makes it neither more nor less probable that an occurrence of syntax element se_j is faulty, the failure probability of an axiom is computed as:

  p(ax_i) = 1 − ∏_{se ∈ S} (1 − p(F_se))^{c(se)}    (1)

where c(se) returns the number of occurrences of the syntax element se in the axiom ax_i. If, among other failure probabilities, the user states that p(F_⊑) = 0.001, p(F_¬) = 0.01, p(F_∃) = 0.05 and p(F_⊓) = 0.001, then p(ax₂) = p(F_⊑ ∪ F_¬ ∪ F_∃ ∪ F_⊓ ∪ F_∃) = 0.108.

Given the failure probabilities p(ax_i) of the axioms, the diagnosis algorithm first calculates the a-priori probability p(D_j) that D_j is the target diagnosis.
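Equation 1 is straightforward to implement directly. The sketch below (a hypothetical helper of my own, not the authors' code) reproduces the running example: for ax₂ the syntax-element counts are ⊑ ×1, ¬ ×1, ∃ ×2, ⊓ ×1, giving p(ax₂) ≈ 0.108.

```python
def axiom_fault_probability(counts, p_fault):
    """Equation 1: p(ax) = 1 - prod_se (1 - p(F_se))^c(se).

    counts  -- number of occurrences of each syntax element in the axiom
    p_fault -- failure probability p(F_se) of each syntax element
    """
    p_correct = 1.0
    for se, c in counts.items():
        p_correct *= (1.0 - p_fault[se]) ** c
    return 1.0 - p_correct

# Example 2, axiom ax2 = A2 subsumed-by (not exists s.M3) and (exists s.M2)
p_fault = {"subsumption": 0.001, "negation": 0.01, "exists": 0.05, "and": 0.001}
ax2 = {"subsumption": 1, "negation": 1, "exists": 2, "and": 1}
```

With these inputs `axiom_fault_probability(ax2, p_fault)` evaluates to 0.1083…, matching the value 0.108 given in the text.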
Since all axioms fail independently, this probability can be computed as [18]:

  p(D_j) = ∏_{ax_n ∈ D_j} p(ax_n) × ∏_{ax_m ∈ O \ D_j} (1 − p(ax_m))    (2)

The prior probabilities of the diagnoses are then used to initialize an iterative algorithm that includes two main steps: (a) the selection of the best query and (b) updating the diagnosis probabilities given the query feedback.

According to information theory, the best query is the one that, given the answer of an oracle, minimizes the expected entropy of the set of diagnoses [18]. Let p(Q_i = yes) be the probability that query Q_i is answered with yes and p(Q_i = no) the probability for the answer no. Furthermore, let p(D_j | Q_i = yes) be the probability of diagnosis D_j after the oracle answers yes, and p(D_j | Q_i = no) the probability after the oracle answers no. The expected entropy after querying Q_i is:

  H_e(Q_i) = Σ_{v ∈ {yes, no}} p(Q_i = v) × ( − Σ_{D_j ∈ D} p(D_j | Q_i = v) log₂ p(D_j | Q_i = v) )

Based on this one-step-look-ahead information-theoretic measure, the query which minimizes the expected entropy is considered best. This formula can be simplified to the following score function [18], which we use to evaluate all available queries and select the one with the minimum score, thereby maximizing the information gain:

  sc(Q_i) = Σ_{v ∈ {yes, no}} p(Q_i = v) log₂ p(Q_i = v) + p(D^∅_i) + 1    (3)

where v ∈ {yes, no} is the feedback of an oracle and D^∅_i is the set of diagnoses which do not make any predictions for the query Q_i. The probability of the set of diagnoses D^∅_i, as well as of any other set of diagnoses D_i such as D^P_i and D^N_i, is computed as

  p(D_i) = Σ_{D_j ∈ D_i} p(D_j)

because, by Definition 2, each diagnosis uniquely partitions all of the axioms of an ontology O into two sets, correct and faulty, and thus all diagnoses are mutually exclusive events.
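Equations 2 and 3 in code, as a sketch under the same independence assumptions (zero-probability log terms are dropped, since x·log₂x → 0 as x → 0):

```python
import math

def diagnosis_prior(diagnosis, ontology, p_ax):
    """Equation 2: axioms in the diagnosis are faulty, all others correct."""
    p = 1.0
    for ax in ontology:
        p *= p_ax[ax] if ax in diagnosis else 1.0 - p_ax[ax]
    return p

def query_score(p_dpos, p_dneg, p_dzero):
    """Equation 3: sc(Q) from the partition probabilities p(D^P), p(D^N), p(D^0).
    Lower is better; sc = 0 for an ideal 50/50 split with empty D^0."""
    p_yes = p_dpos + p_dzero / 2.0   # answer probabilities, Equation 4
    p_no = p_dneg + p_dzero / 2.0
    sc = p_dzero + 1.0
    for p in (p_yes, p_no):
        if p > 0.0:
            sc += p * math.log2(p)
    return sc
```

With the uniform normalized priors of Example 1 (0.25 each), `query_score(0.75, 0.25, 0.0)` reproduces sc(Q₁) = 0.1887, and the even split `query_score(0.5, 0.5, 0.0)` gives the optimal score 0.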
Since, for a query Q_i, the set of diagnoses D can be partitioned into the sets D^P_i, D^N_i and D^∅_i, the probability that an oracle will answer a query Q_i with either yes or no can be computed as:

  p(Q_i = yes) = p(D^P_i) + p(D^∅_i)/2
  p(Q_i = no)  = p(D^N_i) + p(D^∅_i)/2    (4)

Clearly this assumes that for each diagnosis of D^∅_i both outcomes are equally likely, and thus the probability that the set of diagnoses D^∅_i predicts either Q_i = yes or Q_i = no is p(D^∅_i)/2.

Following feedback v for a query Q_s, i.e. Q_s = v, the probabilities of the diagnoses must be updated to take the new information into account. The update is made using Bayes' rule for each D_j ∈ D:

  p(D_j | Q_s = v) = p(Q_s = v | D_j) p(D_j) / p(Q_s = v)    (5)

where the denominator p(Q_s = v) is known from the query selection step (Equation 4) and p(D_j) is either a prior probability (Equation 2) or a probability calculated using Equation 5 after a previous iteration of the debugging algorithm. We assign p(Q_s = v | D_j) as follows:

  p(Q_s = v | D_j) = 1,   if D_j predicted Q_s = v;
                     0,   if D_j is rejected by Q_s = v;
                     1/2, if D_j ∈ D^∅_s.

Example 1 (continued) Suppose that the debugger is not provided with any information about possible failures and therefore assumes that all syntax elements fail with the same probability 0.01, so that p(ax_i) = 0.01 for all ax_i ∈ O. Using Equation 2 we can calculate the probability of each diagnosis. For instance, D₁ suggests that only one axiom, ax₁, should be modified by the user. Hence, we can calculate the probability of diagnosis D₁ as p(D₁) = p(ax₁)(1 − p(ax₂))(1 − p(ax₃))(1 − p(ax₄)) = 0.0097. All other minimal diagnoses have the same probability, since every other minimal diagnosis also suggests the modification of one axiom. To simplify the discussion we only consider minimal diagnoses for query selection.
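The Bayesian update of Equation 5 reduces to zeroing out the rejected diagnoses, halving the weight of D^∅, and renormalizing by p(Q_s = v). A sketch (my own function names):

```python
def update_probabilities(p, d_pos, d_neg, d_zero, answer):
    """Equation 5. p maps each diagnosis to its current probability;
    d_pos/d_neg/d_zero is the partition of the answered query."""
    predicted, rejected = (d_pos, d_neg) if answer == "yes" else (d_neg, d_pos)
    # Denominator p(Q_s = v), Equation 4.
    p_v = sum(p[d] for d in predicted) + sum(p[d] for d in d_zero) / 2.0
    updated = {}
    for d, prob in p.items():
        if d in rejected:
            likelihood = 0.0
        elif d in d_zero:
            likelihood = 0.5
        else:                  # d predicted the answer
            likelihood = 1.0
        updated[d] = likelihood * prob / p_v
    return updated
```

For Example 1, answering the reduced query Q₂ with no (partition D^P = {D₃, D₄}, D^N = {D₁, D₂}) turns the uniform priors of 0.25 into p(D₁) = p(D₂) = 0.5 and p(D₃) = p(D₄) = 0, as in the text.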
Therefore, the prior probabilities of the diagnoses can be normalized to p(D_j) ← p(D_j) / Σ_{D_j ∈ D} p(D_j) and are all equal to 0.25.

Given the prior probabilities of the diagnoses and a set of queries (see Table 2), we evaluate the score function (Equation 3) for each query. E.g. for the first query Q₁: {B(w)} the probability p(D^∅) = 0 and the probabilities of the positive and negative outcomes are p(Q₁ = yes) = p(D₂) + p(D₃) + p(D₄) = 0.75 and p(Q₁ = no) = p(D₁) = 0.25. Therefore the query score is sc(Q₁) = 0.1887.

The scores computed during the initial stage (see Table 4) suggest that Q₂ is the best query. Taking into account that D₁ is the target diagnosis, the oracle answers no to the query. The additional information obtained from the answer is then used to update the probabilities of the diagnoses using Equation 5. Since D₁ and D₂ predicted this answer, their probabilities are updated to p(D₁) = p(D₂) = 0.25 / p(Q₂ = no) = 0.5. The probabilities of the diagnoses D₃ and D₄, which are rejected by the oracle's answer, are also updated, p(D₃) = p(D₄) = 0.

Table 4: Expected scores for minimized queries (p(ax_i) = 0.01)

  Query         Initial score   Q₂ = no
  Q₁: {B(w)}    0.1887          0
  Q₂: {C(w)}    0               1
  Q₃: {Q(w)}    0.1887          1

Table 5: Expected scores for minimized queries (p(ax₁) = 0.025, p(ax₂) = p(ax₃) = p(ax₄) = 0.01)

  Query         Initial score
  Q₁: {B(w)}    0.250
  Q₂: {C(w)}    0.408
  Q₃: {Q(w)}    0.629

In the next iteration the algorithm recomputes the scores using the updated probabilities. The results show that Q₁ is the best query. The other two queries, Q₂ and Q₃, are irrelevant, since no information will be gained by asking them. Given the oracle's negative feedback to Q₁, we update the probabilities to p(D₁) = 1 and p(D₂) = 0.
In this case the target diagnosis D₁ was identified using the same number of steps as the "split-in-half" heuristic. However, if the user specifies that the first axiom is more likely to fail, e.g. p(ax₁) = 0.025, then Q₁: {B(w)} will be selected first (see Table 5). The recalculation of the probabilities given the negative outcome Q₁ = no sets p(D₁) = 1 and p(D₂) = p(D₃) = p(D₄) = 0. Therefore the debugger identifies the target diagnosis in only one step.

Example 2 (continued) Suppose that in ax₄ the user specified ∀s.A instead of ∃s.A, and ¬∃s.M₃ instead of ∃s.¬M₃ in ax₂. Therefore D₄ is the target diagnosis. Moreover, assume that the debugger is provided with observations of three types of faults: (1) conjunction/disjunction occurs with probability p₁ = 0.001, (2) negation with p₂ = 0.01, and (3) restrictions with p₃ = 0.05. Using Equation 1 we can calculate the probability of each axiom containing an error: p(ax₁) = 0.0019, p(ax₂) = 0.1074, p(ax₃) = 0.012, p(ax₄) = 0.051, and p(ax₅) = 0.001. These probabilities are exploited to calculate the prior probabilities of the diagnoses (see Table 6) and to initialize the query selection process. To simplify matters we focus on the set of minimal diagnoses.

In the first iteration the algorithm determines that Q₃ is the best query and asks the oracle whether O_t ⊨ {M₁ ⊑ B} holds or not (see Table 7). The obtained information is then used to recalculate the probabilities of the diagnoses and to compute the next best query, i.e. Q₄, and so on. The query process stops after the third query, since D₄ is the only diagnosis with probability p(D₄) > 0. Given the oracle's feedback Q₄ = yes to the second query, the updated probabilities of the diagnoses show that the target diagnosis has a probability of p(D₄) = 0.9918, whereas p(D₃) is only 0.0082.
In order to reduce the number of queries, a user can specify a threshold, e.g. σ = 0.95. If the absolute difference in the probabilities of the two most probable diagnoses exceeds this threshold, the query process stops and returns the most probable diagnosis. Therefore, in this example the debugger based on entropy query selection requires fewer queries than the "split-in-half" heuristic. Note that already after the first answer Q₃ = yes the most probable diagnosis D₄ is three times more likely than the second most probable diagnosis D₁. Given such a great difference, we could suggest stopping the query process after the first answer if the user were to set σ = 0.65.

4. Implementation details

The iterative ontology debugger (Algorithm 1) takes a faulty ontology O as input. Optionally, a user can provide a set of axioms B that are known to be correct, as well as a set P of axioms that must be entailed by the target ontology and a set N of axioms that must not. If these sets are not given, the corresponding input arguments are initialized with ∅. Moreover, the algorithm takes a set FP of fault probabilities for the axioms ax_i ∈ O, which can be computed as described in Section 3 by exploiting knowledge about typical user errors. Alternatively, if no estimates of such probabilities are available, all probability values can be initialized with a small constant. We show the results of such a strategy in our evaluation section. The two other arguments, σ and n, are used to improve the performance of the algorithm. σ specifies the diagnosis acceptance threshold, i.e. the minimum difference in probabilities between the most likely and second-most likely diagnoses. The parameter n defines the maximum number of most probable diagnoses that should be considered by the algorithm during each iteration.
A further performance gain in Algorithm 1 can be achieved if we approximate the set of the n most probable diagnoses with the set of the n most probable minimal diagnoses, i.e. we neglect non-minimal diagnoses. We call this set of at most n most probable minimal diagnoses the leading diagnoses. Note that, under the reasonable assumption that the fault probability of each axiom p(ax_i) is less than 0.5, for every non-minimal diagnosis ND there exists a minimal diagnosis D ⊂ ND which, by Equation 2, is more probable than ND. Consequently, the query selection algorithm presented here operates on the set of minimal diagnoses instead of all diagnoses (i.e. non-minimal diagnoses are excluded). However, the algorithm can be adapted with moderate effort to also consider non-minimal diagnoses.

We use the approach proposed by Friedrich et al. [9] to compute diagnoses and employ the combination of two algorithms, QuickXplain [19] and HS-Tree [17]. In a standard implementation the latter is a breadth-first search algorithm that takes an ontology O, the sets P and N, and the maximum number n of most probable minimal diagnoses as input. The algorithm generates minimal hitting sets of minimal conflict sets, which are computed on demand. This is motivated by the fact that in some circumstances a subset of all minimal conflict sets is sufficient for generating a subset of all required minimal diagnoses. For instance, if in Example 2 the user wants to compute only n = 2 leading minimal diagnoses and the minimal conflict search algorithm returns CS₁, then HS-Tree identifies the two required minimal diagnoses D₁ and D₂, avoiding the computation of the minimal conflict set CS₂. Of course, in the worst case, when all minimal diagnoses have to be computed, the algorithm must compute all minimal conflict sets. In addition, the HS-Tree generation reuses minimal conflict sets in order to avoid unnecessary computations.
Thus, in the real-world scenarios we evaluated (see Table 8), the faulty ontologies contained fewer than 10 minimal conflict sets with at most 13 elements each, while the maximal cardinality of the observed minimal diagnoses was at most 9. Therefore, space limitations were not a problem for the breadth-first generation. However, for scenarios involving diagnoses of greater cardinalities, iterative-deepening strategies could be applied.

In our implementation of HS-Tree we use the uniform-cost search strategy. Given additional information in terms of the axiom fault probabilities FP, the algorithm expands a leaf node in the search tree if it is an element of the path corresponding to the maximum-probability hitting set of the minimal conflict sets computed so far. The probability of each minimal hitting set can be computed using Equation 2. Consequently, the algorithm computes a set of diagnoses ordered by their probability, starting from the most probable one. HS-Tree terminates if either the n most probable minimal diagnoses are identified or no further minimal diagnoses can be found. Thus the algorithm computes at most n minimal diagnoses regardless of the total number of minimal diagnoses.

Table 6: Probabilities of diagnoses after answers

  Answers                         D₁       D₂       D₃       D₄
  Prior                           0.0970   0.5874   0.0026   0.3130
  Q₃ = yes                        0.2352   0        0.0063   0.7585
  Q₃ = yes, Q₄ = yes              0        0        0.0082   0.9918
  Q₃ = yes, Q₄ = yes, Q₁ = yes    0        0        0        1

Table 7: Expected scores for queries

  Queries               Initial   Q₃ = yes   Q₃ = yes, Q₄ = yes
  Q₁: {B ⊑ M₃}          0.974     0.945      0.931
  Q₂: {B(w)}            0.151     0.713      1
  Q₃: {M₁ ⊑ B}          0.022     1          1
  Q₄: {M₁(w), M₂(u)}    0.540     0.213      1
  Q₅: {A(w)}            0.151     0.713      1
  Q₆: {M₂ ⊑ D}          0.686     0.805      1
  Q₇: {M₃(u)}           0.759     0.710      0.970

HS-Tree uses QuickXplain to compute the required minimal conflicts.
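For reference, the core divide-and-conquer recursion of QuickXplain fits in a few lines. The sketch below is my own simplification of Junker's scheme, not the implementation used in the paper; `is_consistent` stands in for the reasoner call, and axioms are kept in lists to preserve preference order.

```python
def quickxplain(background, axioms, is_consistent):
    """Return one minimal conflict set among `axioms` w.r.t. `background`,
    or [] if background + axioms is consistent."""
    if not axioms or is_consistent(background + axioms):
        return []

    def qx(b, delta, c):
        # If the axioms added in the last split already make b inconsistent,
        # nothing from c is needed for the conflict.
        if delta and not is_consistent(b):
            return []
        if len(c) == 1:
            return list(c)
        k = len(c) // 2
        c1, c2 = c[:k], c[k:]
        d2 = qx(b + c1, c1, c2)   # conflict part inside c2, given all of c1
        d1 = qx(b + d2, d2, c1)   # conflict part inside c1, given d2
        return d1 + d2

    return qx(background, [], axioms)
```

With a toy consistency check in which exactly the pair {2, 5} is contradictory, `quickxplain([], [1, 2, 3, 4, 5], ok)` isolates that pair with a handful of checks instead of testing all subsets.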
Given a set of axioms AX and a set of correct axioms B, QuickXplain returns a minimal conflict set CS ⊆ AX, or ∅ if the axioms AX ∪ B are consistent. In the worst case, to compute a minimal conflict QuickXplain performs 2k(log(s/k) + 1) consistency checks, where k is the size of the generated minimal conflict set and s is the number of axioms in the ontology. In the best case only log(s/k) + 2k checks are performed [19]. Importantly, the size of the ontology appears only inside the log function. Therefore, the time needed for consistency checks in our test ontologies remained below 0.2 seconds, even for real-world knowledge bases with thousands of axioms. The maximum time to compute a minimal conflict was observed in the Sweet-JPL ontology and took approx. 5 seconds (see Table 9).

In order to take past answers into account, HS-Tree updates the prior probabilities of the diagnoses by evaluating Equation 5. All required data is stored in the query history QH as well as in the sets P and N. When complete, HS-Tree returns a set of tuples of the form ⟨D_i, p(D_i)⟩, where D_i is contained in the set of the n most probable minimal diagnoses (leading diagnoses) and p(D_i) is its probability calculated using Equations 2 and 5.

In the query-selection phase, Algorithm 1 calls the selectQuery function (Algorithm 2) to generate a tuple T = ⟨Q, D^P, D^N, D^∅⟩, where Q is the minimum-score query (Equation 3) and D^P, D^N and D^∅ are the sets of diagnoses constituting the partition.
Algorithm 1: ontoDebugging(O, B, P, N, FP, n, σ)

  Input: ontology O, set of background axioms B, set of sets of logical sentences to be entailed P, set of sets of logical sentences not to be entailed N, set of fault probabilities for axioms FP, maximum number of most probable minimal diagnoses n, acceptance threshold σ
  Output: a diagnosis D

  1   DP ← ∅; QH ← ∅; T ← ⟨∅, ∅, ∅, ∅⟩
  2   while belowThreshold(DP, σ) ∧ getScore(T) ≠ 1 do
  3       DP ← HS-Tree(O, B, P, N, FP, QH, n)
  4       T ← selectQuery(DP, O, B, P)
  5       Q ← getQuery(T)
  6       if Q = ∅ then exit loop
  7       if getAnswer(O_t ⊨ Q) then P ← P ∪ {Q}
  8       else N ← N ∪ {Q}
  9       QH ← QH ∪ {T}
  10  return mostProbableDiagnosis(DP)

The generation algorithm carries out a depth-first search, removing the top element of the set D and calling itself recursively to generate all possible subsets of the leading diagnoses. The set of leading diagnoses D is extracted from the set of tuples DP by the getDiagnoses function. In each leaf node of the search tree the generate function calls createQuery, which creates a query for a given set of diagnoses D^P by computing common entailments and partitioning the set of diagnoses D \ D^P, as described in Section 2.3. If no query exists for the set D^P (i.e. there are no common entailments) or D^P = ∅, then createQuery returns an empty tuple T = ⟨∅, ∅, ∅, ∅⟩.
Algorithm 2: selectQuery(DP, O, B, P)

  Input: set DP of tuples ⟨D_i, p(D_i)⟩, ontology O, set of background axioms B, set P of sets of logical sentences that must be entailed by the target ontology
  Output: a tuple ⟨Q, D^P, D^N, D^∅⟩

  1   D ← getDiagnoses(DP)
  2   T ← generate(∅, D, O, B, P, DP)
  3   return minimizeQuery(T)

  4   function generate(D^P, D, O, B, P, DP) returns a tuple ⟨Q, D^P, D^N, D^∅⟩
  5       if D = ∅ then
  6           D ← getDiagnoses(DP)
  7           return createQuery(D^P, O, B, P, D)
  8       D ← pop(D)
  9       left ← generate(D^P, D, O, B, P, DP)
  10      right ← generate(D^P ∪ {D}, D, O, B, P, DP)
  11      if getScore(left, DP) < getScore(right, DP) then return left
  12      else return right

In all inner nodes of the tree the algorithm selects the tuple corresponding to the query with the minimum score, as determined by the getScore function. This function may implement the entropy-based measure (Equation 3), "split-in-half" or any other preference criterion. Given an empty tuple T = ⟨∅, ∅, ∅, ∅⟩, the function returns the highest possible score of the used measure. In general, createQuery is called 2ⁿ times, where we set n = 9 in our evaluation. Furthermore, for each leading diagnosis not in D^P, createQuery has to check whether the associated query is entailed. If a query is not entailed, a consistency check has to be performed. Entailments are determined by classification/realization and a subset check of the generated sentences. Common entailments are computed by intersecting the entailments of each diagnosis contained in D^P. Note that the entailments of each leading diagnosis are computed just once and reused in subsequent calls of createQuery.
In the function minimizeQuery, the query Q of the resulting tuple ⟨Q, D^P, D^N, D^∅⟩ is iteratively reduced by applying QuickXplain such that the sets D^P, D^N and D^∅ are preserved. This is implemented by replacing the consistency checks performed by QuickXplain with checks that ensure that the reduction of the query preserves the partition. To check whether a partition is preserved, a consistency/entailment check is performed for each element in D^N and D^∅. Elements of D^P need not be checked, because these elements entail the query and therefore any reduction of it. In the worst case, n(2k log(s/k) + 2k) consistency checks have to be performed in minimizeQuery, where k is the length of the minimized query. Entailments of leading diagnoses are reused.

Algorithm 1 invokes the function getQuery to obtain the query from the tuple stored in T and calls getAnswer to query the oracle. Depending on the answer, Algorithm 1 extends either the set P or the set N and thus excludes diagnoses not compliant with the query answer from the results of HS-Tree in further iterations. Note that the algorithm can easily be adapted to allow the oracle to reject a query if the answer is unknown. In this case the algorithm proceeds with the next best query (w.r.t. the getScore function) until no further queries are available.

Algorithm 1 stops if the difference in the probabilities of the top two diagnoses is greater than the acceptance threshold σ, or if no query can differentiate between the remaining diagnoses (i.e. the score of the minimum-score query equals the maximum score of the used measure). The most probable diagnosis is then returned to the user. If it is impossible to differentiate between a number of highly probable minimal diagnoses, the algorithm returns a set that includes all of them.
Moreover, in the first case (termination due to σ), the algorithm can continue if the user is not satisfied with the returned diagnosis and at least one further query exists.

Additional performance improvements can be achieved by using greedy strategies in Algorithm 2. The idea is to guide the search such that a leaf node of the left-most branch of the search tree contains a set of diagnoses D^P that is likely to result in a tuple ⟨Q, D^P, D^N, D^∅⟩ with a low-score query. This method is based on the property of Equation 3 that

sc(Q) = 0   if   ∑_{Di ∈ D^P} p(Di) = ∑_{Dj ∈ D^N} p(Dj) = 0.5   and   p(D^∅) = 0

Consequently, the query selection problem can be viewed as a two-way number partitioning problem: given a set of numbers, divide them into two sets such that the difference between the sums of the numbers in each set is as small as possible. The Complete Karmarkar-Karp (CKK) algorithm [20], one of the best algorithms developed for the two-way partitioning problem, corresponds to an extension of Algorithm 2 with a set-differencing heuristic [21]. The algorithm stops if the optimal solution to the two-way partitioning problem is found or if there are no further subsets to be investigated. In the latter case the best solution found is returned.

The main drawback of applying CKK to the query selection process is that none of its pruning techniques can be used. Moreover, even if the algorithm finds an optimal solution to the two-way partitioning problem, there might simply be no query for the found set of diagnoses D^P. Furthermore, since the algorithm is complete, it still has to investigate all subsets of the set of diagnoses in order to find the minimum-score query. To avoid this exhaustive search we extended CKK with an additional termination criterion: the search stops if a query is found with a score below some predefined threshold γ.
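The set-differencing idea behind CKK can be illustrated with the plain (greedy, non-backtracking) Karmarkar-Karp heuristic. CKK makes this search complete by also backtracking over the alternative of placing the two largest numbers in the same set; and, as noted above, in the debugger each candidate partition must additionally admit an actual query.

```python
import heapq

def kk_difference(numbers):
    # Karmarkar-Karp set-differencing heuristic for two-way
    # partitioning: repeatedly commit the two largest numbers to
    # opposite sets and push back their difference; the single
    # remaining value is the achieved difference of the subset sums.
    heap = [-x for x in numbers]    # max-heap via negation
    heapq.heapify(heap)
    while len(heap) > 1:
        a = -heapq.heappop(heap)
        b = -heapq.heappop(heap)
        heapq.heappush(heap, -(a - b))
    return -heap[0] if heap else 0.0

# Diagnosis probabilities: a difference near 0 means both halves sum
# to about 0.5, i.e. the sc(Q) = 0 condition above.
print(kk_difference([0.4, 0.3, 0.2, 0.1]))  # ~0.0
```

Being a heuristic, the greedy pass is not always optimal: for [8, 7, 6, 5, 4] it yields a difference of 2 although a perfect split (8 + 7 = 6 + 5 + 4) exists, which is exactly the gap CKK's backtracking closes.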
In our evaluation section we demonstrate substantial savings from applying the CKK partitioning algorithm.

To sum up, the efficiency of the proposed method depends on the efficiency of the classification/realization system and of the consistency/coherency checks for a particular ontology. The number of calls to a reasoning system can be reduced by decreasing the number of leading diagnoses n. However, more leading diagnoses provide more data for generating the next best query. Consequently, by varying the number of leading diagnoses it is possible to balance runtime against the number of queries needed to isolate the target diagnosis.

5. Evaluation

We evaluated our approach using the real-world ontologies presented in Table 8, with the aim of demonstrating its applicability to real-world settings. In addition, we employed generated examples to perform controlled experiments in which the number of minimal diagnoses and their cardinality could be varied to make the identification of the target diagnosis more difficult. Finally, we carried out a set of tests using randomly modified large real-world ontologies to provide some insights into the scalability of the suggested debugging method.

For the first test we created a generator which takes as inputs a consistent and coherent ontology, a set of fault patterns together with their probabilities, the minimum number of minimum-cardinality diagnoses m, and the required cardinality |Dt| of these minimum-cardinality diagnoses. We also assumed that the target diagnosis has cardinality |Dt|. The output of the generator is an alteration of the input ontology for which at least the given number of minimum-cardinality diagnoses with the required cardinality exist. Furthermore, to introduce inconsistencies (incoherencies), the generator applies fault patterns randomly to the input ontology depending on their probabilities.

5 The source code as well as precompiled binaries can be downloaded from http://rmbd.googlecode.com. The package also includes a Protégé plugin implementing the described methods.

In this experiment we took five fault patterns from a case study reported by Rector et al. [5] and assigned fault probabilities according to their observations of typical user errors. Thus we assumed that in cases (a) and (b) (see Section 2.3), where an axiom includes some roles (i.e. property assertions), axiom descriptions are faulty with a probability of 0.025; in cases (c) and (d) with a probability of 0.01; and in case (e) with a probability of 0.001. In each iteration, the generator randomly selected an axiom to be altered and applied a fault pattern. Following this, another axiom was selected using the concept taxonomy and altered correspondingly to introduce an inconsistency (incoherency). The fault patterns were randomly selected in each step using the probabilities provided above.

For instance, given the description of a randomly selected concept A and the fault pattern "misuse of negation", we added the construct ⊓¬X to the description of A, where X is a new concept name. Next, we randomly selected concepts B and S such that S ⊑ A and S ⊑ B, and added ⊓X to the description of B. During the generation process, we applied the HS-Tree algorithm after each introduction of an incoherency/inconsistency to control two parameters: the minimum number of minimal-cardinality diagnoses in the ontology and their cardinality. The generator continues to introduce incoherencies/inconsistencies until the specified parameter values are reached. For instance, if the minimum number of minimum-cardinality diagnoses is m = 6 and their cardinality is |Dt| = 4, then the generated ontology will include at least 6 diagnoses of cardinality 4 and possibly some additional minimal diagnoses of higher cardinalities.
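To make explicit why the "misuse of negation" pattern yields an incoherency, the following short derivation uses the names from the example above (assuming the new conjuncts are added to the right-hand sides of the concept definitions, so that A ⊑ ¬X and B ⊑ X hold after the alteration):

S ⊑ A ⊑ ¬X   and   S ⊑ B ⊑ X   ⟹   S ⊑ X ⊓ ¬X ⊑ ⊥

Hence S is unsatisfiable, the altered ontology is incoherent, and it becomes inconsistent as soon as S has an instance.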
The resulting faulty ontology as well as the fault patterns and their probabilities were inputs for the ontology debugger. The acceptance threshold σ was set to 0.95 and the number of most probable minimal diagnoses n was set to 9. In addition, one of the minimal diagnoses with the required cardinality was randomly selected as the target diagnosis. Note that the target ontology is not equal to the original ontology, but rather a corrected version of the altered one, in which the faulty axioms were repaired by replacing them with their original (correct) versions according to the target diagnosis. The tests were performed using the ontologies bike2 to bike9, bcs3, galen and galen2 from Racer's benchmark suite.

6 Available at http://www.racer-systems.com/products/download/benchmark.phtml

Ontology           DL          Axioms  #C/#P/#I      #CS/min/max  #D/min/max  Domain
1. Chemical        ALCHF(D)    144     48/20/0       6/5/6        6/1/3       Chemical elements
2. Koala           ALCON(D)    44      21/5/6        3/4/4        10/1/3      Training
3. Sweet-JPL       ALCHOF(D)   2579    1537/121/50   1/13/13      13/1/1      Earth science
4. miniTambis      ALCN        173     183/44/0      3/2/6        48/3/3      Biological science
5. University      SOIN(D)     49      30/12/4       4/3/5        90/3/4      Training
6. Economy         ALCH(D)     1781    339/53/482    8/3/4        864/4/8     Mid-level
7. Transportation  ALCH(D)     1300    445/93/183    9/2/6        1782/6/9    Mid-level

Table 8: Diagnosis results for several of the real-world ontologies presented in [8]. #C/#P/#I are the numbers of concepts, properties and individuals in each ontology. #CS/min/max are the number of conflict sets and their minimum and maximum cardinality; the same notation is used for diagnoses (#D/min/max). The ontologies are available upon request.
Figure 2: Average number of queries required to select the target diagnosis Dt with threshold σ = 0.95, plotted against the required number of minimum-cardinality diagnoses in a faulty ontology, for |Dt| = 2, 4 and 8. Random and "split-in-half" selection are shown for |Dt| = 2.

The average results of the evaluation performed on each test ontology (presented in Figure 2) show that the entropy-based approach outperforms the "split-in-half" heuristic as well as the random query selection strategy by more than 50% in the |Dt| = 2 case, owing to its ability to estimate the probabilities of diagnoses and to stop once the target diagnosis crosses the acceptance threshold. On average, the algorithm required 8 seconds to generate a query. In addition, Figure 2 shows that the number of queries required increases with the cardinality of the target diagnosis, regardless of the method. Despite this, the entropy-based approach remains better than the "split-in-half" method as the cardinality of the diagnoses increases. The approach did, however, require more queries to discriminate between high-cardinality diagnoses, because in such cases more minimal conflicts were generated; consequently, the debugger must consider more minimal diagnoses in order to identify the target one.

For the next test we selected seven real-world ontologies, described in Tables 8 and 9.7 The performance of both the entropy-based and "split-in-half" selection strategies was evaluated using a variety of different prior fault probabilities, in order to investigate under which conditions the entropy-based method should be preferred. In our experiments we distinguished between three different distributions of prior fault probabilities: extreme, moderate and uniform (see Figure 3 for an example).
The extreme distribution simulates a situation in which very high failure probabilities are assigned to a small number of syntax elements; that is, the provider of the estimates is quite sure that exactly these elements are causing a fault. For instance, it may be well known that a user has problems formulating restrictions in OWL, whereas all other elements, such as subsumption and conjunction, are well understood. In the case of a moderate distribution the estimates provide a slight bias towards some syntax elements. This distribution has the same motivation as the extreme one; however, in this case the probability estimator is less sure about the sources of possible errors in axioms. Both the extreme and the moderate distribution correspond to the exponential distribution, with λ = 1.75 and λ = 0.5 respectively.

7 All experiments were performed on a PC with a Core2 Duo (E8400) 3 GHz CPU and 8 GB RAM, running Windows 7 and Java 6.

                     Leading diagnoses                                All diagnoses
Ontology             Consistency  Conflicts       Diagnoses      Consistency  Conflicts       Diagnoses
Chemical       time  0/3/8        90/107/128      1/97/326       0/3/18       105/130/179     2/126/402
               calls 264          6               8              262          6               7
                     runtime: 723                                runtime: 892
Koala          time  0/1/3        19/25/30        0/11/70        0/2/4        24/30/37        0/12/105
               calls 74           3               10             75           3               11
                     runtime: 120                                runtime: 148
Sweet-JPL      time  1/31/112     5185/5185/5185  0/586/5332     31/106/195   5192/5192/5192  1/438/5319
               calls 187          1               10             195          1               14
                     runtime: 5991                               runtime: 6312
miniTambis     time  0/5/14       84/157/210      0/57/504       1/5/15       88/167/225      3/19/537
               calls 111          3               10             189          3               49
                     runtime: 586                                runtime: 1027
University     time  0/2/3        31/41/54        0/20/157       0/2/5        37/46/60        2/5/200
               calls 126          4               10             283          4               91
                     runtime: 205                                runtime: 536
Economy        time  1/12/26      410/460/569     0/282/2085     1/9/80       418/510/681     16/25/1929
               calls 239          6               10             2064         8               865
                     runtime: 2857                               runtime: 25369
Transportation time  0/11/58      237/438/683     0/352/3176     1/9/130      222/429/636     16/29/6394
               calls 337          7               10             3966         9               1783
                     runtime: 3671                               runtime: 65010

Table 9: Min/avg/max time and calls required to compute the nine leading (most probable) diagnoses as well as all diagnoses for the real-world ontologies. Values are given for each stage, i.e. consistency checking, computation of minimal conflicts and minimal diagnoses, together with the total runtime needed to compute the diagnoses. All time values are averages over 15 trials and are given in milliseconds.

Figure 3: Example of prior fault probabilities of syntax elements sampled from the extreme, moderate and uniform distributions.
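Under one plausible reading of Figure 3 (syntax element number k is weighted by the exponential density with the λ values quoted above and the weights are then normalized; the normalization step is our assumption), the three prior distributions can be sketched as:

```python
import math

def prior_probabilities(n_elements, distribution):
    # Assign each syntax element k a weight proportional to exp(-lam*k)
    # (or a constant weight for "uniform") and normalize to a
    # probability vector. lam = 1.75 models the extreme distribution,
    # lam = 0.5 the moderate one, as quoted in the text.
    if distribution == "uniform":
        weights = [1.0] * n_elements
    else:
        lam = 1.75 if distribution == "extreme" else 0.5
        weights = [math.exp(-lam * k) for k in range(n_elements)]
    total = sum(weights)
    return [w / total for w in weights]

extreme = prior_probabilities(23, "extreme")
moderate = prior_probabilities(23, "moderate")
# The extreme priors concentrate far more mass on the first elements
# than the moderate ones, mirroring the curves in Figure 3.
```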
The uniform distribution models the situation where no prior fault probabilities are provided and the system assigns equal probabilities to all syntax elements found in a faulty ontology. Of course, the prior probabilities of diagnoses may not reflect the actual situation. Therefore, for each of the three distributions we differentiate between good, average and bad cases. In the good case the estimates of the prior fault probabilities are correct and the target diagnosis is assigned a high probability. The average case corresponds to the situation in which the target diagnosis is neither favored nor penalized by the priors. In the bad case the prior distribution is unreasonable and disfavors the target diagnosis by assigning it a low probability.

We executed 30 tests for each combination of distribution and case, with an acceptance threshold σ = 0.85 and a required number of most probable minimal diagnoses n = 9. Each iteration started with the generation of a set of prior fault probabilities of syntax elements by sampling from the selected distribution (extreme, moderate or uniform). Given the priors, we computed the set of all minimal diagnoses D of a given ontology and selected the target one according to the chosen case (good, average or bad). In the good case the prior probabilities favor the target diagnosis and, therefore, it should be selected from the diagnoses with high probability. The set of diagnoses was ordered by probability, and the algorithm iterated through the set starting from the most probable element. In the first iteration the most probable minimal diagnosis D1 was added to the set G. In each subsequent iteration j, a diagnosis Dj was added to the set G if ∑_{i≤j} p(Di) ≤ 1/3, and to the set A if ∑_{i≤j} p(Di) ≤ 2/3. The obtained set G contained the most probable diagnoses, which we considered good.
All diagnoses in the set A \ G were classified as average, and the remaining diagnoses D \ A as bad. Depending on the selected case, we randomly selected one of the diagnoses from the appropriate set as the target.

The results of the evaluation presented in Table 10 show that the entropy-based query selection approach clearly outperforms "split-in-half" in the good and average cases for all three probability distributions. The average time required by the debugger to perform such basic operations as consistency checking and the computation of minimal conflicts and diagnoses is presented in Table 11. The results indicate that on average at most 17 seconds were required to compute up to 9 minimal diagnoses and a query. Moreover, the number of axioms in a query remains bounded in most cases, i.e. between 1 and 4 axioms per query.

In the uniform case better results than might be expected were observed, since the diagnoses have different cardinality and structure, i.e. they include different syntax elements. Consequently, even if equal probabilities are given for all syntax elements (uniform distribution), the probabilities of the diagnoses differ: axioms with a greater number of syntax elements receive a higher fault probability, and diagnoses of smaller cardinality in many cases receive a higher probability. This information provides enough bias to favor the entropy-based method.

In the bad case, where the target diagnosis received a low probability and no information regarding the prior fault probabilities was given, we observed that the performance of the entropy-based method improved as more queries were posed. In particular, on the University ontology the performance of the two methods is essentially similar (7.27 vs. 7.37 queries), whereas on the Economy and Transportation ontologies the entropy-based method saves an average of two queries.
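The good/average/bad construction described above can be sketched as follows (diagnosis names and probabilities are illustrative):

```python
def classify_diagnoses(ranked):
    # ranked: list of (diagnosis, probability) pairs sorted by
    # descending probability. Reproduces the construction above:
    # D1 always enters G; D_j enters G while the cumulative probability
    # stays <= 1/3 and A while it stays <= 2/3; then
    # average = A \ G and bad = everything else.
    good, a_set = set(), set()
    cumulative = 0.0
    for j, (name, p) in enumerate(ranked):
        cumulative += p
        if j == 0 or cumulative <= 1 / 3:
            good.add(name)
        if j == 0 or cumulative <= 2 / 3:
            a_set.add(name)
    everything = {name for name, _ in ranked}
    return good, a_set - good, everything - a_set

ranked = [("D1", 0.3), ("D2", 0.25), ("D3", 0.2),
          ("D4", 0.15), ("D5", 0.1)]
good, average, bad = classify_diagnoses(ranked)
# good == {'D1'}, average == {'D2'}, bad == {'D3', 'D4', 'D5'}
```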
"Split-in-half" appears to be particularly inefficient in all of the good, average and bad cases when applied to ontologies with a large number of minimal diagnoses, such as Economy and Transportation. The main problem is that no stopping criterion can be used with this greedy method, as it is unable to provide any ordering on the set of diagnoses. Instead, the method continues until no further queries can be generated, i.e. until only one minimal diagnosis exists or there are no discriminating queries.

Figure 4: Average time/query gain resulting from the application of the extended CKK partitioning algorithm. The whiskers indicate the maximum and minimum possible average gain of queries/time using extended CKK.

Conversely, the entropy-based method is able to improve its probability estimates via Bayes updates as more queries are answered, and to exploit the differences in the probabilities in order to decide when to stop.

The most significant gains are achieved for ontologies with many minimal diagnoses and for the average and good cases, i.e. when the target diagnosis is within the first or second third of the minimal diagnoses ranked by their prior probability. In these cases the entropy-based method can save up to 60% of the queries. Therefore, we can conclude that even rough estimates of the prior fault probabilities are sufficient, provided that the target diagnosis is not significantly penalized. Even if no fault probabilities are available and there are many minimal diagnoses, the entropy-based method is advantageous.
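A minimal sketch of such a Bayes update, assuming the common convention that a query answer is uninformative (likelihood 1/2) for diagnoses in D^∅ (this convention, and the simple 0/1 likelihoods for D^P and D^N, are our assumptions for the illustration):

```python
def bayes_update(probs, dp, dn, answer_yes):
    # Posterior p(D_i | answer) is proportional to
    # p(answer | D_i) * p(D_i). Diagnoses predicting the query (D^P)
    # are confirmed by "yes" and refuted by "no"; D^N behaves
    # symmetrically; diagnoses in D^0 are unaffected (likelihood 1/2).
    posterior = {}
    for d, p in probs.items():
        if d in dp:
            like = 1.0 if answer_yes else 0.0
        elif d in dn:
            like = 0.0 if answer_yes else 1.0
        else:
            like = 0.5
        posterior[d] = p * like
    z = sum(posterior.values())
    return {d: p / z for d, p in posterior.items()}

probs = {"D1": 0.4, "D2": 0.3, "D3": 0.2, "D4": 0.1}
updated = bayes_update(probs, dp={"D1"}, dn={"D2", "D3"},
                       answer_yes=True)
# D2 and D3 are ruled out by the "yes" answer; D1 rises to
# 0.4 / 0.45, D4 (in D^0) keeps its relative share.
```

This is the mechanism that lets the entropy-based method recover from bad priors: each answer zeroes out incompatible diagnoses and renormalizes the rest, so the estimates sharpen with every query.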
The differences between the probabilities of individual syntax elements appear not to influence the results of the query selection process; they affect only the number of outliers, i.e. cases in which the diagnosis approach required either very few or very many queries compared to the average.

Another interesting observation is that both methods often eliminated more than n diagnoses in one iteration. For instance, in the case of the Transportation ontology both methods were able to remove hundreds of minimal diagnoses with a small number of queries. This behavior appears to stem from relations between the diagnoses: the addition of a query to either P or N allows the method to remove not only the diagnoses in the sets D^P or D^N, but also some unobserved diagnoses that were not in any of the sets of n leading diagnoses computed by HS-Tree. Given the sets P and N, HS-Tree automatically invalidates all diagnoses which do not fulfill the requirements (see Definition 2).

The extended CKK method presented in Section 4 was evaluated in the same settings as the complete Algorithm 2, with acceptance threshold γ = 0.1.

Entropy-based query selection
                         Extreme          Moderate         Uniform
Ontology        Case     min  avg   max   min  avg   max   min  avg   max
Chemical        Good     1    1.63  3     1    1.7   2     1    1.83  2
                Avg.     1    1.87  4     1    1.73  3     1    1.7   2
                Bad      2    3.03  4     2    3.03  4     2    3.17  4
Koala           Good     1    1.7   3     1    2.4   4     1    2.67  3
                Avg.     1    1.8   3     1    2.37  4     1    2.4   3
                Bad      1    3.5   6     2    4.33  7     3    4.13  5
Sweet-JPL       Good     1    3.27  7     2    3.43  7     3    3.87  7
                Avg.     1    3.5   6     1    4.03  7     3    4.07  6
                Bad      3    3.93  6     2    4.03  6     3    3.37  4
miniTambis      Good     1    2.37  4     2    2.73  4     2    2.77  3
                Avg.     1    2.53  4     2    4.03  8     3    4.53  7
                Bad      3    6.43  11    3    7.93  17    5    9.03  13
University      Good     1    2.7   4     3    3.83  7     3    4.4   8
                Avg.     1    3.4   6     3    7.03  12    4    7.27  10
                Bad      5    9.13  15    5    9.7   14    6    10.03 14
Economy         Good     1    3.2   11    3    3.1   4     3    3.93  6
                Avg.     1    4.63  14    3    5.57  12    5    6.5   8
                Bad      8    12.3  19    6    11.5  21    7    11.67 19
Transportation  Good     1    5.63  14    1    6.97  12    3    9.5   14
                Avg.     1    6.9   16    1    7.73  12    3    8.73  14
                Bad      3    12.4  18    8    12.8  20    3    12.1  18

"Split-in-half" query selection
Chemical        Good     2    2.63  3     2    2.7   3     2    2.53  3
                Avg.     2    2.63  3     2    2.67  3     2    2.77  3
                Bad      2    2.63  3     2    2.6   3     2    2.4   3
Koala           Good     3    3.3   4     3    3.3   4     3    3.47  4
                Avg.     3    3.33  4     3    3.2   4     3    3.23  4
                Bad      3    3.43  4     3    3.4   4     3    3.5   4
Sweet-JPL       Good     3    3.83  4     3    3.8   4     4    4     4
                Avg.     3    3.57  4     3    3.8   4     3    3.47  4
                Bad      3    3.87  4     3    3.8   4     3    3.8   4
miniTambis      Good     4    5.33  6     4    5     6     4    4     4
                Avg.     4    5.1   6     4    4.93  7     5    5.43  7
                Bad      5    5.93  8     4    5.8   7     5    6.3   7
University      Good     4    5.93  8     4    6     8     4    5.43  8
                Avg.     4    5.87  7     5    6.73  9     6    7.37  8
                Bad      5    6.97  9     5    7.2   9     5    7     8
Economy         Good     6    7.87  11    6    7.4   10    6    7.5   10
                Avg.     6    8     12    5    7.63  12    6    8.73  13
                Bad      9    11.50 14    6    11.1  14    8    11.3  15
Transportation  Good     5    8.03  13    5    7.3   11    6    11.43 18
                Avg.     3    9     16    5    9.4   13    5    11.43 18
                Bad      10   12.67 19    7    13    19    6    13.8  20

Table 10: Minimum, average and maximum number of queries required by the entropy-based and "split-in-half" query selection methods to identify the target diagnosis in real-world ontologies. Ontologies are ordered by the number of diagnoses.

Ontology        Good                     Average                  Bad
                DT      QT      QL       DT      QT       QL      DT      QT      QL
Chemical        459.33  117.67  3        461.33  121      3.34    256.67  75.67   2.19
Koala           88.33   1308.33 3.47     92      1568.67  3.90    56.33   869.33  2.36
Sweet-JPL       2387.33 691.67  1.48     2272    926      1.61    2103    1240.33 1.57
miniTambis      481.33  2764.33 3.27     398.33  2892     2.53    238.67  3223    1.76
University      189.33  822.67  3.91     145     903.33   2.82    113     872     2.11
Economy         2953.33 6927    3.06     3239    8789     3.80    3083    8424.67 1.58
Transportation  6577.33 9426.33 2.37     7080.67 10135.33 2.29    7186.67 9599.67 1.64

Table 11: Average time required to compute at most nine minimal diagnoses (DT) and a query (QT) in each iteration, as well as the average number of axioms in a query after minimization (QL). The averages are taken over the extreme, moderate and uniform distributions using the entropy-based query selection method. Time is measured in milliseconds.
The obtained results, presented in Figure 4, show that the extended CKK method decreases the length of a debugging session by at least 60% while requiring on average only 0.1 more queries than Algorithm 2. In some cases (mostly for the uniform distribution) the debugger using CKK search required even fewer queries than Algorithm 2, owing to the inherent uncertainty of the domain. The plot of the average time required by Algorithm 2 and CKK to identify the target diagnosis, presented in Figure 5, shows that applying the latter can reduce runtime significantly.

In the last experiment we simulated an expert developing large real-world ontologies,8 as described in Table 12. Often in such settings an expert makes small changes to the ontology and then runs the reasoner to verify that the changes are valid, i.e. that the ontology is consistent and its entailments are correct. To simulate this scenario we used the generator described in the first experiment to introduce 1 to 3 random changes that would make the ontology incoherent. Then, for each modified ontology, we performed 15 tests using the same fault distributions as in the second test. The results obtained by the entropy-based query selection method, using CKK for query computation, are presented in Table 13. These results show that the method can be used for the analysis of large ontologies with over 33000 axioms while requiring a user to wait only about a minute for the next query to be computed.

8 The ontologies were taken from the TONES repository: http://owl.cs.manchester.ac.uk/repository

6. Related work

Despite the range of ontology diagnosis methods available (see [7, 8, 9]), to the best of our knowledge no interactive ontology debugging methods, such as our "split-in-half" or entropy-based methods, have been proposed so far. The idea of ranking diagnoses and proposing a target diagnosis is presented in [11].
This method uses a number of measures, such as: (a) the frequency with which an axiom appears in conflict sets; (b) the impact on an ontology, in terms of its "lost" entailments, when an axiom is modified or removed; (c) ranking of test cases; (d) provenance information about axioms; and (e) syntactic relevance. For each axiom in a conflict set these measures are evaluated and combined to produce a rank value. These ranks are then used by a modified HS-Tree algorithm to identify diagnoses with a minimal rank. However, the method fails when a target diagnosis cannot be determined reliably with the given a-priori knowledge. In our work, the required information is acquired until the target diagnosis can be identified with confidence. In general, the work of [11] can be combined with the ideas presented in this paper, as axiom ranks can be taken into account together with other observations when calculating the prior probabilities of the diagnoses.

The idea of selecting the next best query based on expected entropy was exploited in the generation of decision trees [22] and further refined for selecting measurements in the model-based diagnosis of circuits [18]. We extend these methods to query selection in the domain of ontology debugging.

In the area of debugging logic programs, Shapiro [23] developed debugging methods based on query answering. Roughly speaking, Shapiro's method aims to detect one fault at a time by querying an oracle about the intended behavior of the Prolog program at hand. In our terminology, for each answer that must not be entailed, this diagnosis approach generates one conflict at a time by exploiting the proof tree of the Prolog program. The method then identifies a query that splits the conflict in half.
Our approach can deal with multiple diagnoses and conflicts simultaneously, which can be exploited by query generation strategies such as the "split-in-half" and entropy-based methods. Whereas the "split-in-half" strategy splits the set of diagnoses in half, Shapiro's method focuses on one conflict. Furthermore, the exploitation of failure probabilities is not considered in [23].

Ontology         Cton               Opengalen-no-propchains
Axioms           33203              9664
DL               SHF                ALEHIF(D)
#CS/min/max      6/3/7              9/5/8
#D/min/max       15/1/5             110/2/6
Consistency      5/209/1078         1/98/471
QuickXplain      17565/20312/38594  7634/10175/12622
Diagnosis        1/5285/38594       10/1043/19543
Overall runtime  146186             119973

Table 12: Statistics for the real-world ontologies used in the stress tests, measured for a single random alteration. #CS/min/max are the number of minimal conflict sets and their minimum and maximum cardinality; the same notation is used for diagnoses (#D/min/max). The minimum/average/maximum times required to perform a consistency check (Consistency), compute a minimal conflict set (QuickXplain) and compute a minimal diagnosis are measured in milliseconds. Overall runtime indicates the time required to compute all minimal diagnoses, in milliseconds.

         Ontology                 #Query  Overall  QT    DT     QL
Good     Cton                     3       176828   6918  52237  4
         Opengalen-no-propchains  8       154145   2349  22905  4
Average  Cton                     4       177383   6583  52586  3
         Opengalen-no-propchains  7       151048   3752  21344  4
Bad      Cton                     5       190407   5742  35608  1
         Opengalen-no-propchains  14      177728   1991  11319  3

Table 13: Average values measured over the extreme, moderate and uniform distributions in each of the good, average and bad cases. #Query is the number of queries required to find the target diagnosis. The overall runtime as well as the time required to compute a query (QT) and at least nine minimal diagnoses (DT) are given in milliseconds. Query length (QL) shows the average number of axioms in a query.
However, Shapiro's method includes the learning of new clauses in order to cover answers that are not entailed. Interleaving the discrimination of diagnoses with the learning of descriptions is currently not considered in our approach because of the additional computational costs.

From a general point of view, Shapiro's method can be seen as a prominent example of inductive logic programming (ILP), which includes systems such as [24, 25]. In particular, [25] proposes inverse entailment combined with a general-to-specific search through a refinement graph, with the goal of generating a theory (hypothesis) which covers the examples and fulfills additional properties. Compared to ILP, the focus of our work lies on theory revision. However, our knowledge representation languages are variants of description logics, not logic programs. Moreover, our method aims to discover axioms which must be changed while minimizing user interaction. Preferences over theory changes are expressed by probabilities, which are updated through Bayes' rule. Other preferences, based on plausible extensions of the theory, were not considered, again because of their computational costs.

Although model-based diagnosis has also been applied to logic programs [26], constraint knowledge bases [27] and hardware descriptions [28], none of these approaches proposes a query generation method to discriminate between diagnoses.

7. Conclusions

In this paper we presented an approach to the interactive diagnosis of ontologies. This approach is applicable to any ontology language with monotonic semantics. We showed that the axioms generated by the classification and realization reasoning services can be exploited to generate queries which differentiate between diagnoses.
Figure 5: Average time required to identify the target diagnosis using the CKK and brute-force query selection algorithms, for the extreme, moderate and uniform distributions in the good, average and bad cases.

For selecting the best next query we proposed two strategies: the "split-in-half" strategy prefers queries which allow half of the leading diagnoses to be eliminated, whereas the entropy-based strategy employs information-theoretic concepts to exploit knowledge about the likelihood of axioms needing to be changed because the ontology at hand is faulty. Based on the probability of an axiom containing an error, we predict the information gain produced by a query result, enabling us to select the best subsequent query according to a one-step-lookahead entropy-based scoring function. We described the implementation of an interactive debugging algorithm and compared the entropy-based method with the "split-in-half" strategy. Our experiments showed a significant reduction in the number of queries required to identify the target diagnosis when the entropy-based method is applied. Depending on the quality of the prior probabilities, the number of queries required may be reduced by up to 60%.

In order to evaluate the robustness of the entropy-based method we experimented with different prior fault probability distributions as well as different qualities of the prior probabilities. Furthermore, we investigated cases where knowledge about failure probabilities is missing or inaccurate. Where such knowledge is unavailable, the entropy-based method ranks the diagnoses based on the number of syntax elements contained in an axiom and the number of axioms in a diagnosis. If we assume that this is a reasonable guess (i.e.
the target diagnosis is not at the lower end of the diagnoses ranked by their prior probabilities), then the entropy-based method outperforms "split-in-half". Moreover, even if the initial guess is not reasonable, the entropy-based method improves the accuracy of the probabilities as more questions are asked. Furthermore, the applicability of the approach to real-world ontologies containing thousands of axioms was demonstrated by an extensive set of evaluations, which are publicly available.

References

[1] K. Shchekotykhin, G. Friedrich, Query strategy for sequential ontology debugging, in: P. F. Patel-Schneider, P. Yue, P. Hitzler, P. Mika, Z. Lei, J. Pan, I. Horrocks, B. Glimm (Eds.), The Semantic Web - ISWC 2010 - 9th International Semantic Web Conference, Shanghai, China, 2010, pp. 696–712.
[2] B. C. Grau, I. Horrocks, B. Motik, B. Parsia, P. Patel-Schneider, U. Sattler, OWL 2: The next step for OWL, Web Semantics: Science, Services and Agents on the World Wide Web 6 (4) (2008) 309–322.
[3] J. Ceraso, A. Provitera, Sources of error in syllogistic reasoning, Cognitive Psychology 2 (4) (1971) 400–410.
[4] P. N. Johnson-Laird, Deductive reasoning, Annual Review of Psychology 50 (1999) 109–135.
[5] A. Rector, N. Drummond, M. Horridge, J. Rogers, H. Knublauch, R. Stevens, H. Wang, C. Wroe, OWL Pizzas: Practical Experience of Teaching OWL-DL: Common Errors & Common Patterns, in: E. Motta, N. R. Shadbolt, A. Stutt, N. Gibbins (Eds.), Engineering Knowledge in the Age of the Semantic Web, 14th International Conference, EKAW 2004, Springer, Whittenbury Hall, UK, 2004, pp. 63–81.
[6] C. Roussey, O. Corcho, L. M. Vilches-Blázquez, A catalogue of OWL ontology antipatterns, in: International Conference on Knowledge Capture, ACM, Redondo Beach, California, USA, 2009, pp. 205–206.
[7] S. Schlobach, Z. Huang, R. Cornet, F. van Harmelen, Debugging Incoherent Terminologies, Journal of Automated Reasoning 39 (3) (2007) 317–349.
[8] A.
Kalyanpur, B. Parsia, M. Horridge, E. Sirin, Finding all Justifications of OWL DL Entailments, in: K. Aberer, K.-S. Choi, N. F. Noy, D. Allemang, K.-I. Lee, L. J. B. Nixon, J. Golbeck, P. Mika, D. Maynard, R. Mizoguchi, G. Schreiber, P. Cudré-Mauroux (Eds.), The Semantic Web, 6th International Semantic Web Conference, 2nd Asian Semantic Web Conference, ISWC 2007 + ASWC 2007, Vol. 4825 of LNCS, Springer Verlag, Berlin, Heidelberg, 2007, pp. 267–280.
[9] G. Friedrich, K. Shchekotykhin, A General Diagnosis Method for Ontologies, in: Y. Gil, E. Motta, R. Benjamins, M. Musen (Eds.), The Semantic Web - ISWC 2005, 4th International Semantic Web Conference, Springer, 2005, pp. 232–246.
[10] M. Horridge, B. Parsia, U. Sattler, Laconic and Precise Justifications in OWL, in: Proc. of the 7th International Semantic Web Conference, ISWC 2008, Vol. 5318, 2008, pp. 323–338.
[11] A. Kalyanpur, B. Parsia, E. Sirin, B. Cuenca-Grau, Repairing Unsatisfiable Concepts in OWL Ontologies, in: Y. Sure, J. Domingue (Eds.), The Semantic Web: Research and Applications, 3rd European Semantic Web Conference, ESWC 2006, Vol. 4011 of Lecture Notes in Computer Science, Springer, Berlin, Heidelberg, 2006, pp. 170–184.
[12] E. Sirin, B. Parsia, B. C. Grau, A. Kalyanpur, Y. Katz, Pellet: A practical OWL-DL reasoner, Web Semantics: Science, Services and Agents on the World Wide Web 5 (2) (2007) 51–53.
[13] V. Haarslev, R. Möller, RACER System Description, in: R. Goré, A. Leitsch, T. Nipkow (Eds.), 1st International Joint Conference on Automated Reasoning, Vol. 2083 of Lecture Notes in Computer Science, Springer, Berlin, Heidelberg, 2001, pp. 701–705.
[14] B. Motik, R. Shearer, I. Horrocks, Hypertableau Reasoning for Description Logics, Journal of Artificial Intelligence Research 36 (1) (2009) 165–228.
[15] A. Borgida, On the relative expressiveness of description logics and predicate logics, Artificial Intelligence 82 (1-2) (1996) 353–367.
[16] F. Baader, Appendix: Description Logic Terminology, in: F. Baader, D. Calvanese, D. L. McGuinness, D. Nardi, P. F. Patel-Schneider (Eds.), Description Logic Handbook, Cambridge University Press, 2003, pp. 485–495.
[17] R. Reiter, A Theory of Diagnosis from First Principles, Artificial Intelligence 32 (1) (1987) 57–95.
[18] J. de Kleer, B. C. Williams, Diagnosing multiple faults, Artificial Intelligence 32 (1) (1987) 97–130.
[19] U. Junker, QUICKXPLAIN: Preferred Explanations and Relaxations for Over-Constrained Problems, in: D. L. McGuinness, G. Ferguson (Eds.), Proceedings of the Nineteenth National Conference on Artificial Intelligence, Sixteenth Conference on Innovative Applications of Artificial Intelligence, Vol. 3, AAAI Press / The MIT Press, 2004, pp. 167–172.
[20] R. E. Korf, A complete anytime algorithm for number partitioning, Artificial Intelligence 106 (2) (1998) 181–203.
[21] N. Karmarkar, R. M. Karp, G. S. Lueker, A. M. Odlyzko, Probabilistic analysis of optimum partitioning, Journal of Applied Probability 23 (3) (1986) 626–645.
[22] J. R. Quinlan, Induction of Decision Trees, Machine Learning 1 (1) (1986) 81–106.
[23] E. Y. Shapiro, Algorithmic Program Debugging, MIT Press, 1983.
[24] S. Muggleton, W. L. Buntine, Machine Invention of First-order Predicates by Inverting Resolution, in: J. Laird (Ed.), Proceedings of the 5th International Conference on Machine Learning (ICML '88), Morgan Kaufmann, 1988, pp. 339–352.
[25] S. Muggleton, Inverse Entailment and Progol, New Generation Computing, Special issue on Inductive Logic Programming 13 (3-4) (1995) 245–286.
[26] L. Console, G. Friedrich, D. T. Dupré, Model-Based Diagnosis Meets Error Diagnosis in Logic Programs, in: IJCAI, 1993, pp. 1494–1501.
[27] A. Felfernig, G. Friedrich, D. Jannach, M. Stumptner, Consistency-based diagnosis of configuration knowledge bases, Artificial Intelligence 152 (2004) 213–234.
[28] G.
Friedrich, M. Stumptner, F. Wotawa, Model-based diagnosis of hardware designs, Artificial Intelligence 111 (1-2) (1999) 3–39.