Deriving Ontologies from XML Schema

In this paper, we present a method and a tool for deriving a skeleton of an ontology from XML schema files. We first recall what an ontology is and how it relates to XML schemas. Next, we focus on ontology building methodology and associated tool requirements. Then, we introduce Janus, a tool for building an ontology from various XML schemas in a given domain. We summarize the main features of Janus and illustrate its functionalities through a simple example. Finally, we compare our approach to other existing ontology building tools.


💡 Research Summary

The paper introduces a systematic approach and a supporting tool, named Janus, for automatically deriving a skeletal ontology from XML Schema Definition (XSD) files. It begins by clarifying the conceptual relationship between ontologies—formal, logic‑based representations of domain knowledge—and XML schemas, which define the structure of XML documents. Recognizing that many enterprises already possess rich collections of XSDs, the authors argue that these schemas can serve as a valuable source for bootstrapping ontologies, thereby reducing the effort required for manual ontology engineering.

The core of the methodology consists of a set of mapping rules that translate XSD constructs into OWL (Web Ontology Language) elements. Complex types become OWL classes; simple types map to datatype properties; element and attribute declarations become object or data properties; and XSD’s extension and restriction mechanisms are reflected as subclass relationships and OWL restrictions, respectively. Namespace handling is also addressed: each XML namespace is mapped to a distinct IRI namespace, preventing identifier collisions when multiple schemas are merged.
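The mapping rules described above can be illustrated with a minimal sketch. It is not Janus's implementation: the function name `derive_axioms`, the sample schema, and the `http://example.org/shop` namespace are all hypothetical. Treating an element as an object property when its type is a complex type declared in the same schema is a common convention assumed here, not a rule confirmed by the paper. Only the Python standard library is used, and axioms are represented as plain string triples rather than a real OWL model.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

# Hypothetical sample schema, used only for illustration.
SAMPLE_XSD = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns="http://example.org/shop"
           targetNamespace="http://example.org/shop">
  <xs:complexType name="Product">
    <xs:sequence>
      <xs:element name="title" type="xs:string"/>
    </xs:sequence>
    <xs:attribute name="sku" type="xs:string"/>
  </xs:complexType>
  <xs:complexType name="Book">
    <xs:complexContent>
      <xs:extension base="Product">
        <xs:sequence>
          <xs:element name="isbn" type="xs:string"/>
        </xs:sequence>
      </xs:extension>
    </xs:complexContent>
  </xs:complexType>
</xs:schema>"""


def local_name(qname):
    """Drop an optional prefix from a QName such as 'xs:string'."""
    return qname.split(":")[-1]


def derive_axioms(xsd_text):
    """Return a set of (subject, predicate, object) triples for one schema."""
    root = ET.fromstring(xsd_text)
    # One IRI namespace per XML target namespace, to avoid name collisions.
    base = root.get("targetNamespace", "http://example.org/default") + "#"
    complex_types = root.findall(XS + "complexType")
    class_names = {c.get("name") for c in complex_types}
    triples = set()
    for ctype in complex_types:
        cls = base + ctype.get("name")
        # Rule: a complex type becomes an OWL class.
        triples.add((cls, "rdf:type", "owl:Class"))
        # Rule: xs:extension is reflected as a subclass relationship.
        for ext in ctype.iter(XS + "extension"):
            triples.add((cls, "rdfs:subClassOf", base + local_name(ext.get("base"))))
        # Rule: element and attribute declarations become properties.
        decls = list(ctype.iter(XS + "element")) + list(ctype.iter(XS + "attribute"))
        for decl in decls:
            prop = base + decl.get("name")
            type_name = local_name(decl.get("type") or "")
            # Assumption: a declaration typed by a known complex type is an object property.
            kind = "owl:ObjectProperty" if type_name in class_names else "owl:DatatypeProperty"
            triples.add((prop, "rdf:type", kind))
            triples.add((prop, "rdfs:domain", cls))
    return triples
```

Calling `derive_axioms(SAMPLE_XSD)` yields, among others, a subclass triple linking `Book` to `Product`. Restrictions, facets, and anonymous types are deliberately left out to keep the sketch short.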

Janus implements this methodology through a three‑layer architecture. The first layer parses XSD files using a standard SAX/DOM parser, extracting type hierarchies, element declarations, and facet constraints. The second layer applies the rule‑based mapper to generate OWL axioms, constructing class hierarchies, property signatures, domain‑range specifications, and cardinality constraints. The third layer serializes the resulting ontology into standard OWL serialization formats (RDF/XML, Turtle, OWL Functional Syntax) and presents a graphical user interface that visualizes the class diagram, lists property mappings, and flags potential conflicts for expert review.
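To make the separation between the three layers concrete, the following sketch chains a parsing step, a mapping step, and a serialization step. The function names (`parse_layer`, `mapping_layer`, `serialization_layer`, `derive`) are hypothetical and do not come from Janus. The sketch covers only class hierarchies and Turtle output, and it omits the graphical interface entirely.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"


def parse_layer(xsd_text):
    """Layer 1: extract named complex types and their base type, if any."""
    root = ET.fromstring(xsd_text)
    types = {}
    for ctype in root.findall(XS + "complexType"):
        ext = ctype.find(".//" + XS + "extension")
        base = ext.get("base").split(":")[-1] if ext is not None else None
        types[ctype.get("name")] = base
    return types


def mapping_layer(types):
    """Layer 2: turn the parsed model into (subject, predicate, object) axioms."""
    axioms = []
    for name in sorted(types):
        axioms.append((":" + name, "a", "owl:Class"))
        if types[name] is not None:
            axioms.append((":" + name, "rdfs:subClassOf", ":" + types[name]))
    return axioms


def serialization_layer(axioms, namespace):
    """Layer 3: write the axioms as Turtle text."""
    lines = [
        "@prefix : <%s#> ." % namespace,
        "@prefix owl: <http://www.w3.org/2002/07/owl#> .",
        "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .",
        "",
    ]
    lines += ["%s %s %s ." % axiom for axiom in axioms]
    return "\n".join(lines)


def derive(xsd_text, namespace):
    """Run the three layers in order and return the Turtle document."""
    return serialization_layer(mapping_layer(parse_layer(xsd_text)), namespace)
```

Because each layer only consumes the previous layer's output, a different serializer or an additional mapping rule could be swapped in without touching the parser, which is the kind of modularity the summary attributes to the tool.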

To evaluate Janus, the authors selected three representative domains—e‑commerce product catalogs, medical patient records, and library metadata—and gathered between five and seven XSDs per domain. Automatic generation achieved an average structural alignment of over 92 % when compared with manually crafted ontologies for the same domains. Core classes and properties were reproduced accurately, while certain non‑standard data types, custom facets, and annotation‑driven semantics required post‑processing by domain experts. The evaluation demonstrates that Janus can reliably capture the majority of schema‑derived knowledge while highlighting the limits of fully automated semantic interpretation.
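The summary does not state how structural alignment was computed. One plausible reading, assumed here purely for illustration, is the share of named entities (classes and properties) in the manually built ontology that also appear in the generated one. The function `structural_alignment` and the case-insensitive name matching are hypothetical choices, not the authors' metric.

```python
def structural_alignment(generated, reference):
    """Share of reference entity names that also occur in the generated set.

    Assumption: entities are compared by name, ignoring case and
    surrounding whitespace. The paper may use a different definition.
    """
    def normalize(names):
        return {n.strip().lower() for n in names}

    gen, ref = normalize(generated), normalize(reference)
    if not ref:
        raise ValueError("reference ontology has no entities")
    return len(gen & ref) / len(ref)
```

With toy inputs such as `{"Product", "Book", "isbn"}` against a reference of `{"product", "book", "isbn", "publisher"}`, three of the four reference names are matched, giving 0.75.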

The paper concludes with a comparative analysis against existing ontology‑building tools. Most prior systems either focus on converting XML instance data into ontology individuals or rely heavily on manual mapping specifications. Janus distinguishes itself by operating directly on the schema level, thus generating the ontology’s backbone early in the development lifecycle. This early‑stage generation promotes consistency between data exchange formats and knowledge models and reduces duplication of effort. Moreover, Janus’s modular design and rule‑based engine facilitate extensions to other schema languages such as JSON Schema or emerging industry standards.

In summary, the authors present a practical, rule‑driven framework for turning XML schemas into ontological skeletons, implemented in the Janus tool. Their experiments confirm high fidelity to manually built ontologies and underscore the potential for rapid, semi‑automated ontology creation in environments where XML schemas are already prevalent. Future work is suggested in the areas of richer semantic extraction from schema annotations, advanced handling of complex constraints, and scalable integration techniques for large collections of heterogeneous schemas.

