Towards quantitative measures in applied ontology
Applied ontology is a relatively new field which aims to apply theories and methods from diverse disciplines such as philosophy, cognitive science, linguistics and formal logic to perform or improve domain-specific tasks. To support the development of effective research methodologies for applied ontology, we critically discuss the question of how its research results should be evaluated. We propose that results in applied ontology must be evaluated within their domain of application, based on some ontology-based task within the domain, and discuss quantitative measures which would facilitate the objective evaluation and comparison of research results in applied ontology.
💡 Research Summary
The paper addresses a critical gap in the field of applied ontology: the lack of objective, quantitative evaluation methods for ontology‑driven solutions. While applied ontology draws on philosophy, cognitive science, linguistics, and formal logic to improve domain‑specific tasks, its research outcomes have traditionally been judged by qualitative criteria such as logical consistency, expressive power, or user‑perceived understandability. The authors argue that such criteria are insufficient because they do not reflect the actual impact of an ontology when it is deployed in real‑world applications.
To remedy this, the authors introduce the notion of an “ontology‑based task,” which is any concrete activity within a domain that relies on an ontology for data integration, reasoning, or knowledge retrieval. Examples include clinical decision support, biological data harmonisation, semantic search, and automated document classification. By anchoring evaluation to these tasks, the performance of an ontology can be measured directly against the goals of the domain.
The paper proposes a three‑dimensional metric framework. The first dimension, Accuracy, captures how well the ontology’s classifications, mappings, or inferences match ground‑truth domain knowledge; standard information‑retrieval measures such as precision, recall, and F1‑score are recommended. The second dimension, Efficiency, quantifies the resource savings achieved by using the ontology, including reductions in processing time, memory consumption, and the number of manual expert interventions; these are expressed as percentage improvements over a baseline without ontology support. The third dimension, Reusability, assesses how easily the same ontology can be applied to other datasets, projects, or even related domains; metrics include API call counts, lines of code saved, and modularity scores derived from component coupling analyses.
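The first two dimensions can be computed with elementary formulas. The sketch below is illustrative rather than taken from the paper: the function names and the sample counts are hypothetical, and it assumes the Accuracy dimension is scored with the standard precision/recall/F1 definitions and the Efficiency dimension as a percentage improvement over a no-ontology baseline, as the paragraph describes.

```python
def precision_recall_f1(tp, fp, fn):
    """Accuracy dimension: standard information-retrieval measures
    computed from true positives, false positives, false negatives."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return precision, recall, f1

def efficiency_gain(baseline, with_ontology):
    """Efficiency dimension: percentage improvement over the
    no-ontology baseline (e.g. runtime, memory, manual steps)."""
    return 100.0 * (baseline - with_ontology) / baseline

# Hypothetical task results: 90 correct retrievals, 10 spurious, 30 missed
p, r, f = precision_recall_f1(tp=90, fp=10, fn=30)

# Hypothetical runtimes: 120 s without ontology support, 84 s with it
gain = efficiency_gain(baseline=120.0, with_ontology=84.0)  # 30.0 % faster
```

Reusability is harder to reduce to a single formula; the modularity and coupling scores mentioned above would require a structural analysis of the ontology and its client code.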
For each dimension the authors suggest concrete experimental designs: a controlled comparison between a “no‑ontology” baseline and an “ontology‑enabled” condition, statistical validation using t‑tests, ANOVA, or bootstrap methods, and multi‑criteria decision‑making (MCDM) techniques to aggregate the three dimensions into a single score when necessary. Weighting of the dimensions should be determined collaboratively with domain experts to reflect the relative importance of accuracy, speed, and portability in the specific context.
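Two of the suggested techniques can be sketched in a few lines. The code below is a minimal illustration, not the authors' implementation: it assumes a percentile bootstrap for comparing baseline and ontology-enabled runs, and a weighted-sum aggregation as the simplest MCDM scheme; the weights and measurements shown are invented for the example.

```python
import random

def bootstrap_mean_diff(baseline, treated, n_boot=10_000, seed=0):
    """Bootstrap the difference in means between ontology-enabled
    and baseline runs; returns a 95% percentile interval."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_boot):
        b = [rng.choice(baseline) for _ in baseline]
        t = [rng.choice(treated) for _ in treated]
        diffs.append(sum(t) / len(t) - sum(b) / len(b))
    diffs.sort()
    return diffs[int(0.025 * n_boot)], diffs[int(0.975 * n_boot)]

def aggregate_score(scores, weights):
    """Weighted-sum MCDM aggregation of the three dimensions.
    Scores are assumed normalised to [0, 1]; weights sum to 1."""
    assert abs(sum(weights.values()) - 1.0) < 1e-9
    return sum(weights[d] * scores[d] for d in scores)

# Hypothetical runtimes (seconds): ontology support should lower them
lo, hi = bootstrap_mean_diff([10.0, 12.0, 11.0, 13.0],
                             [7.0, 8.0, 6.0, 9.0])

# Hypothetical weights elicited from domain experts
weights = {"accuracy": 0.5, "efficiency": 0.3, "reusability": 0.2}
scores = {"accuracy": 0.82, "efficiency": 0.30, "reusability": 0.65}
total = aggregate_score(scores, weights)  # single comparable score
```

If the 95% interval `(lo, hi)` excludes zero, the runtime difference between the two conditions is unlikely to be a sampling artefact, which is the role the paragraph assigns to statistical validation.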
The authors contend that adopting this quantitative framework will bring several benefits. First, it provides clear guidance for ontology engineers during the design phase, allowing them to trade off expressive richness against computational overhead in a data‑driven manner. Second, it enables systematic comparison of competing ontologies or ontology‑based pipelines across studies, fostering reproducibility and cumulative knowledge building. Third, it informs funding agencies and stakeholders about the tangible return on investment associated with ontology development, thereby encouraging more disciplined resource allocation.
Finally, the paper outlines future work, including the development of automated evaluation pipelines that can ingest benchmark datasets, run the defined tasks, and output the full set of metrics; the creation of domain‑specific benchmark suites to standardise baselines; and the exploration of additional metrics such as user trust, interpretability, and maintenance cost. By establishing a rigorous, task‑oriented, quantitative evaluation regime, the authors aim to transform applied ontology from a largely theoretical endeavour into a mature engineering discipline with measurable impact.