Janus: Automatic Ontology Builder from XSD Files

The construction of a reference ontology for a large domain remains a hard manual task. The process is sometimes assisted by software tools that facilitate information extraction from a textual corpus. Despite the widespread use of XML Schema files on the internet, especially in the B2B domain, tools that offer a complete semantic analysis of XML schemas are rare. In this paper we introduce Janus, a tool for automatically building a reference knowledge base starting from XML Schema files. Janus also provides several useful views to simplify B2B application integration.


💡 Research Summary

The paper introduces Janus, an automated tool that builds a reference ontology directly from XML Schema (XSD) files, addressing the scarcity of comprehensive semantic analysis tools for the vast number of XSDs used in B2B integration. Janus operates in three main phases. In the first phase, an XSD parser reads complex types, simple types, element and attribute declarations, groups, and imports/includes, constructing a unified schema graph that resolves namespace conflicts and captures inter-schema relationships. The second phase performs semantic extraction: type hierarchies (extension/restriction) become subclass relations, element-attribute associations become classes and properties, and lexical constraints (patterns, ranges, enumerations) are translated into datatype restrictions. To handle duplicate concepts across different schemas, Janus normalizes labels and applies string-similarity clustering, merging synonymous entities into single ontology nodes. The third phase serializes the resulting model into RDF/OWL, exposing a SPARQL endpoint for downstream applications.

Additionally, Janus offers three visualization views (concept-tree, mapping-matrix, and data-flow diagrams) that help developers and business analysts understand and validate schema mappings, facilitating smoother B2B integration. The architecture is modular, allowing easy extension to newer XSD versions (e.g., XSD 1.1) or custom extraction rules via plug-ins.

Empirical evaluation on a corpus of about thirty public B2B standards (such as UBL, ebXML, RosettaNet) shows that Janus achieves over 85% concept-level agreement and more than 80% relationship-level agreement with manually crafted reference ontologies, while reducing the total construction time from hours to minutes. These results demonstrate that Janus can significantly cut manual effort, provide a solid semantic foundation for ontology-driven services (search, matching, data synthesis), and improve the maintainability of large-scale schema repositories. The paper concludes with a comparison to existing XSD analysis tools, outlines limitations, and proposes future work on automated ontology quality assessment, dynamic schema evolution tracking, and integration of machine-learning techniques for deeper semantic inference.
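
To make the three phases more concrete, the following Python sketch walks a tiny, made-up schema through them: it reads `complexContent` extensions and restrictions as subclass relations, groups type names whose normalized labels are similar, and prints the subclass relations as Turtle. This is an illustration under stated assumptions, not Janus's implementation. All function names, the sample schema, and the `http://example.org/schema#` namespace are hypothetical; `difflib.SequenceMatcher` stands in for the string-similarity measure, which the summary does not specify; and the 0.85 threshold is an arbitrary choice.

```python
import re
import xml.etree.ElementTree as ET
from difflib import SequenceMatcher

XS = "{http://www.w3.org/2001/XMLSchema}"  # XSD namespace in Clark notation

# Hypothetical sample schema, invented for this sketch.
SAMPLE_XSD = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:complexType name="PartyType">
    <xs:sequence><xs:element name="Name" type="xs:string"/></xs:sequence>
  </xs:complexType>
  <xs:complexType name="SupplierPartyType">
    <xs:complexContent>
      <xs:extension base="PartyType">
        <xs:sequence><xs:element name="SupplierID" type="xs:string"/></xs:sequence>
      </xs:extension>
    </xs:complexContent>
  </xs:complexType>
  <xs:complexType name="Supplier_Party">
    <xs:sequence><xs:element name="Id" type="xs:string"/></xs:sequence>
  </xs:complexType>
</xs:schema>"""


def subclass_pairs(xsd_text):
    """Return (derived, base) pairs for named complex types that extend
    or restrict another complex type via xs:complexContent."""
    root = ET.fromstring(xsd_text)
    pairs = []
    for ctype in root.iter(XS + "complexType"):
        name = ctype.get("name")
        if name is None:  # anonymous types are skipped in this sketch
            continue
        for tag in ("extension", "restriction"):
            derivation = ctype.find(f"{XS}complexContent/{XS}{tag}")
            if derivation is not None and derivation.get("base"):
                # Drop any namespace prefix from the QName (a simplification).
                pairs.append((name, derivation.get("base").split(":")[-1]))
    return pairs


def normalize_label(name):
    """Split camelCase and underscores, lowercase, drop a trailing 'type'."""
    spaced = re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", name).replace("_", " ")
    words = spaced.lower().split()
    if len(words) > 1 and words[-1] == "type":
        words = words[:-1]
    return " ".join(words)


def cluster_labels(names, threshold=0.85):
    """Greedy one-pass clustering: a name joins the first cluster whose
    representative label is at least `threshold` similar, else starts a new one."""
    clusters = []  # list of (representative label, member names)
    for name in names:
        label = normalize_label(name)
        for rep, members in clusters:
            if SequenceMatcher(None, rep, label).ratio() >= threshold:
                members.append(name)
                break
        else:
            clusters.append((label, [name]))
    return clusters


def to_turtle(pairs):
    """Serialize subclass pairs as Turtle text (example.org is a placeholder)."""
    lines = [
        "@prefix ex: <http://example.org/schema#> .",
        "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .",
    ]
    lines += [f"ex:{child} rdfs:subClassOf ex:{parent} ." for child, parent in pairs]
    return "\n".join(lines)


pairs = subclass_pairs(SAMPLE_XSD)
clusters = cluster_labels(["PartyType", "SupplierPartyType", "Supplier_Party"])
print(to_turtle(pairs))
print(clusters)
```

In this sketch `SupplierPartyType` and `Supplier_Party` fall into the same cluster because both normalize to the same label, while `PartyType` stays separate. A real tool would also need to resolve prefixed QNames against their namespaces, follow imports and includes, and handle simple types and facets, all of which are omitted here.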

