Direct computation of diagnoses for ontology debugging

Modern ontology debugging methods allow efficient identification and localization of faulty axioms defined by a user while developing an ontology. The ontology development process in this case is characterized by rather frequent and regular calls to a reasoner resulting in an early user awareness of modeling errors. In such a scenario an ontology usually includes only a small number of conflict sets, i.e. sets of axioms preserving the faults. This property allows efficient use of standard model-based diagnosis techniques based on the application of hitting set algorithms to a number of given conflict sets. However, in many use cases such as ontology alignment the ontologies might include many more conflict sets than in usual ontology development settings, thus making precomputation of conflict sets and consequently ontology diagnosis infeasible. In this paper we suggest a debugging approach based on a direct computation of diagnoses that omits calculation of conflict sets. Embedded in an ontology debugger, the proposed algorithm is able to identify diagnoses for an ontology which includes a large number of faults and for which application of standard diagnosis methods fails. The evaluation results show that the approach is practicable and is able to identify a fault in adequate time.


💡 Research Summary

The paper addresses a fundamental scalability problem in ontology debugging: traditional model-based diagnosis techniques first compute conflict sets (sets of axioms that together cause a fault such as inconsistency) and then apply a hitting-set algorithm to those sets to obtain minimal diagnoses. This works well when the ontology under development contains only a few conflicts, as is typical of interactive ontology engineering, but it breaks down in scenarios such as ontology alignment, where automated merging of large ontologies can generate thousands of conflicts. In such cases, pre-computing conflict sets becomes both memory-intensive and time-consuming, rendering standard diagnosis pipelines infeasible.
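To make the conventional pipeline concrete, the following is a minimal sketch of its second stage only, assuming the conflict sets have already been computed. The function name `minimal_hitting_sets` and the axiom labels `a1` to `a5` are hypothetical, and the brute-force enumeration is a stand-in chosen for readability, not the hitting-set tree algorithm used by actual debuggers.

```python
from itertools import combinations
from typing import List, Sequence, Set, Tuple


def minimal_hitting_sets(conflicts: Sequence[Set[str]]) -> List[Tuple[str, ...]]:
    """Enumerate all minimal hitting sets of the given conflict sets.

    A hitting set contains at least one axiom from every conflict set;
    each minimal hitting set corresponds to one minimal diagnosis.
    Brute force over subsets, so only suitable for tiny illustrations.
    """
    universe = sorted(set().union(*conflicts))
    found: List[Tuple[str, ...]] = []
    # Candidates are visited by increasing size, so any candidate that
    # contains an already found hitting set cannot be minimal.
    for size in range(0, len(universe) + 1):
        for candidate in combinations(universe, size):
            chosen = set(candidate)
            if any(set(smaller) <= chosen for smaller in found):
                continue
            if all(chosen & conflict for conflict in conflicts):
                found.append(candidate)
    return found


# Hypothetical conflict sets over five axioms.
example_conflicts = [{"a1", "a2"}, {"a2", "a3"}, {"a4", "a5"}]
example_diagnoses = minimal_hitting_sets(example_conflicts)
print(example_diagnoses)
# [('a2', 'a4'), ('a2', 'a5'), ('a1', 'a3', 'a4'), ('a1', 'a3', 'a5')]
```

Even in this toy case, three conflict sets yield four minimal diagnoses, and every conflict set had to be known before the first diagnosis could be reported. That dependency is what becomes prohibitive when conflicts number in the thousands.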

To overcome this limitation, the authors propose a novel "direct diagnosis computation" approach that completely bypasses the explicit construction of conflict sets. The core idea is to interleave the search for candidate diagnoses with on-the-fly consistency checks performed by a reasoner. Starting from the full set of potentially faulty axioms, the algorithm recursively selects a subset, temporarily disables those axioms, and asks the reasoner whether the remaining ontology is consistent. If consistency is restored, the disabled axioms constitute a diagnosis; if not, the algorithm continues to split the candidate set and explore alternative subsets. This divide-and-conquer strategy keeps the number of reasoner calls far below what testing candidate subsets one at a time would require, while a cost-based heuristic (ranking axioms by their estimated "risk" or likelihood of being faulty) guides the search toward the most promising candidates first.
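The divide-and-conquer idea can be sketched as follows. This is an illustration under stated assumptions, not the authors' algorithm: it applies a QuickXplain-style recursion to the property "removing these axioms restores consistency", and it replaces the reasoner with a toy oracle built from a hidden list of conflicts. The names `make_toy_reasoner` and `direct_diagnosis` are hypothetical. The search itself never sees the conflict sets; it only asks the oracle yes/no questions.

```python
from typing import Callable, Iterable, List, Sequence, Set

ConsistencyCheck = Callable[[Iterable[str]], bool]


def make_toy_reasoner(hidden_conflicts: Sequence[Set[str]]) -> ConsistencyCheck:
    """Stand-in for a real reasoner (assumption for this sketch): a set of
    active axioms is consistent unless it contains a whole hidden conflict."""
    def is_consistent(active: Iterable[str]) -> bool:
        active_set = set(active)
        return not any(conflict <= active_set for conflict in hidden_conflicts)
    return is_consistent


def direct_diagnosis(axioms: Sequence[str], is_consistent: ConsistencyCheck) -> List[str]:
    """Return one minimal set of axioms whose removal restores consistency.

    Relies on monotonicity: removing more axioms never breaks consistency.
    """
    axioms = list(axioms)

    def restores(removed: List[str]) -> bool:
        removed_set = set(removed)
        return is_consistent(a for a in axioms if a not in removed_set)

    if restores([]):
        return []  # already consistent, nothing to diagnose
    if not restores(axioms):
        raise ValueError("inconsistent even with all candidate axioms removed")

    def search(fixed: List[str], candidates: List[str], fixed_changed: bool) -> List[str]:
        # 'fixed' axioms are removed for sure; find which candidates must join them.
        if fixed_changed and restores(fixed):
            return []
        if len(candidates) == 1:
            return list(candidates)
        mid = len(candidates) // 2
        left, right = candidates[:mid], candidates[mid:]
        from_right = search(fixed + left, right, True)
        from_left = search(fixed + from_right, left, bool(from_right))
        return from_left + from_right

    return search([], axioms, False)


# Hypothetical example: the conflicts are known to the oracle only.
toy_axioms = ["a1", "a2", "a3", "a4", "a5"]
toy_reasoner = make_toy_reasoner([{"a1", "a2"}, {"a2", "a3"}, {"a4", "a5"}])
toy_result = direct_diagnosis(toy_axioms, toy_reasoner)
print(toy_result)  # ['a2', 'a4']
```

In this sketch the order of `axioms` determines which minimal diagnosis is returned. Sorting the axioms by an estimated fault likelihood before the call is one plausible way to realize the cost-based guidance described above, though the summary does not say how the authors implement it.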

The method is formally grounded: the authors prove that every diagnosis discovered by the algorithm is minimal (no proper subset also yields consistency) and that the algorithm is complete, i.e., it will eventually examine all possible diagnosis candidates if allowed to run to termination. The implementation leverages SAT/SMT solvers for the consistency checks, which are highly optimized and can handle large ontologies efficiently.
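The minimality property stated above can be checked with a small helper, sketched here under the assumption that consistency is monotone (removing axioms never makes a consistent ontology inconsistent). Under that assumption it is enough to restore the removed axioms one at a time instead of testing every proper subset. The helper `is_minimal_diagnosis` and the toy oracle are hypothetical illustrations, not part of the paper.

```python
from typing import Callable, Iterable, Sequence, Set


def is_minimal_diagnosis(
    axioms: Sequence[str],
    diagnosis: Iterable[str],
    is_consistent: Callable[[Iterable[str]], bool],
) -> bool:
    """Check that removing `diagnosis` restores consistency and that
    putting back any single removed axiom breaks it again."""
    removed = set(diagnosis)
    remaining = [a for a in axioms if a not in removed]
    if not is_consistent(remaining):
        return False  # not a diagnosis at all
    # By monotonicity, if no single axiom can be restored safely,
    # no larger group of removed axioms can be restored either.
    return all(not is_consistent(remaining + [a]) for a in removed)


def toy_consistent(active: Iterable[str]) -> bool:
    """Toy oracle standing in for a reasoner (assumption for this sketch)."""
    hidden_conflicts: Sequence[Set[str]] = [{"a1", "a2"}, {"a2", "a3"}, {"a4", "a5"}]
    active_set = set(active)
    return not any(conflict <= active_set for conflict in hidden_conflicts)


all_axioms = ["a1", "a2", "a3", "a4", "a5"]
print(is_minimal_diagnosis(all_axioms, ["a2", "a4"], toy_consistent))        # True
print(is_minimal_diagnosis(all_axioms, ["a1", "a2", "a4"], toy_consistent))  # False
```

The second call fails because `a1` can be restored without reintroducing a conflict, so the candidate is a diagnosis but not a minimal one.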

Empirical evaluation is conducted on two benchmark suites. The first consists of conventional ontology development tasks with a small number of conflicts; both the traditional conflict‑set‑based approach and the new direct method solve these instances within a second, confirming that the new algorithm does not incur overhead in easy cases. The second suite comprises large‑scale ontology alignment problems that generate thousands of conflict sets. Here, the classic pipeline either runs out of memory or exceeds a one‑hour timeout, whereas the direct diagnosis algorithm consistently returns a minimal diagnosis in an average of eight seconds and a worst‑case of twenty‑five seconds. Diagnostic accuracy is identical across both methods, demonstrating that the speed gains do not compromise correctness.

The authors conclude that direct diagnosis computation makes interactive, real-time debugging feasible for ontologies with massive numbers of faults, a scenario increasingly common in automated knowledge integration, large-scale semantic web applications, and AI-driven ontology generation pipelines. By eliminating the need to precompute conflict sets, the approach enables scalable, user-friendly debugging tools that can keep pace with the rapid growth of knowledge bases in modern AI systems.