Towards the Ontology Web Search Engine

This paper presents a project for an Ontology Web Search Engine. Its main purpose is to develop a design that can be easily implemented. The Ontology Web Search Engine is software that finds and indexes ontologies on the Web. The ontologies in question are OWL (Web Ontology Language) ontologies, which are required for the functioning of the SWES (Semantic Web Expert System). SWES is an expert system that will use ontologies found on the Web, generate rules from them, and supplement its knowledge base with these generated rules. The SWES is expected to serve as a universal expert system for the average user.

💡 Research Summary

The paper presents the design and implementation plan for an Ontology Web Search Engine (OWSE), a specialized system intended to locate, retrieve, and index OWL ontologies distributed across the World Wide Web. The motivation stems from the needs of the Semantic Web Expert System (SWES), an expert‑system framework that relies on up‑to‑date domain ontologies to automatically generate inference rules and expand its knowledge base without manual knowledge engineering. Existing general‑purpose web crawlers focus on HTML pages and lack the ability to reliably identify and process RDF‑based resources such as OWL ontologies. Consequently, the authors propose a four‑module architecture that addresses the unique challenges of ontology discovery and management.

Crawler Module – Extends traditional breadth‑first or depth‑first crawling with MIME‑type detection (e.g., “application/rdf+xml”, “text/turtle”), RDF schema hints, and sitemap/robots.txt analysis to filter candidate URLs. The crawler respects crawling policies while prioritizing resources that expose semantic metadata.
Parsing and Validation Module – Utilizes Apache Jena and the OWL API to support multiple OWL profiles (OWL‑Lite, OWL‑DL, OWL‑Full) and extensions (OWL‑RL, OWL‑RL‑Lite). It parses downloaded files, checks for logical consistency, resolves namespace collisions, and attempts automatic correction of incomplete declarations. Errors are logged for possible re‑crawling or human review.
Indexing and Storage Module – Extracts classes, object properties, data properties, and relational triples, then stores them in an Elasticsearch‑backed inverted index. Alongside the triples, the module records ontology metadata (author, version, license, public availability) and a computed Quality Score. The score aggregates logical consistency metrics, lexical richness (number of labels and comments), and documentation completeness. This score influences both search ranking and the selection process used by SWES when choosing ontologies for rule generation.
Update and Re‑validation Module – Continuously monitors indexed ontologies for changes using HTTP ETag and Last‑Modified headers, complemented by checksum (MD5/SHA‑256) comparisons. Upon detecting a modification, the system re‑parses, re‑validates, and re‑indexes the ontology, while preserving version history for audit and rollback purposes.
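
The filtering, scoring, and change-detection steps described in the modules above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the `Snapshot` record, the set of accepted media types, and the 0.5 / 0.25 / 0.25 score weights are all assumptions chosen for illustration.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional

# Media types treated as likely ontology serializations (illustrative set).
RDF_MEDIA_TYPES = {
    "application/rdf+xml",
    "text/turtle",
    "application/owl+xml",
    "application/ld+json",
}


def is_ontology_candidate(content_type: str) -> bool:
    """Return True if a Content-Type header names an RDF/OWL media type."""
    # Drop parameters such as "; charset=utf-8" before comparing.
    media_type = content_type.split(";", 1)[0].strip().lower()
    return media_type in RDF_MEDIA_TYPES


def quality_score(consistent: bool, labelled_fraction: float,
                  documented_fraction: float) -> float:
    """Combine consistency, lexical richness, and documentation into [0, 1].

    The weights are placeholders, not values taken from the paper.
    """
    return (0.5 * (1.0 if consistent else 0.0)
            + 0.25 * labelled_fraction
            + 0.25 * documented_fraction)


@dataclass
class Snapshot:
    """What the index remembers about the last fetched copy (hypothetical)."""
    etag: Optional[str]
    last_modified: Optional[str]
    sha256: str


def has_changed(stored: Snapshot, etag: Optional[str],
                last_modified: Optional[str],
                body: Optional[bytes] = None) -> bool:
    """Decide whether an indexed ontology needs re-parsing."""
    # Cheap check first: a matching validator means the copy is unchanged.
    if etag is not None and stored.etag is not None:
        if etag == stored.etag:
            return False
    elif last_modified is not None and stored.last_modified is not None:
        if last_modified == stored.last_modified:
            return False
    # Validators are missing or differ: without a body, assume a change.
    if body is None:
        return True
    # With a body, the checksum has the final say.
    return hashlib.sha256(body).hexdigest() != stored.sha256
```

Letting the checksum override a differing ETag avoids re-indexing when a server regenerates its validators without altering the file; whether the described system makes the same trade-off is not stated in the summary.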

The paper also defines a RESTful API that exposes OWSE functionality to external consumers, particularly SWES. The API returns search results, metadata, and ontology files in JSON‑LD format, enabling SWES to fetch ontologies on demand, feed them into its rule‑generation engine, and seamlessly merge newly derived rules into its knowledge base.
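
As a rough illustration of what a JSON‑LD search response from such an API might look like, the sketch below builds one from a list of index hits. The field names, the `to_jsonld` helper, and the `http://example.org/owse#` vocabulary are hypothetical; the summary does not specify the actual response schema.

```python
import json

# JSON-LD context mapping short keys to vocabulary terms.
# The "owse" namespace is a made-up placeholder for engine-specific terms.
CONTEXT = {
    "dcterms": "http://purl.org/dc/terms/",
    "owl": "http://www.w3.org/2002/07/owl#",
    "owse": "http://example.org/owse#",
    "title": "dcterms:title",
    "license": {"@id": "dcterms:license", "@type": "@id"},
    "versionInfo": "owl:versionInfo",
    "qualityScore": "owse:qualityScore",
}


def to_jsonld(hits):
    """Wrap index hits as a JSON-LD document, best quality score first."""
    ranked = sorted(hits, key=lambda h: h["score"], reverse=True)
    return {
        "@context": CONTEXT,
        "@graph": [
            {
                "@id": h["iri"],
                "@type": "owl:Ontology",
                "title": h["title"],
                "versionInfo": h["version"],
                "license": h["license"],
                "qualityScore": h["score"],
            }
            for h in ranked
        ],
    }


# Example hits with invented values, for demonstration only.
example_hits = [
    {"iri": "http://example.org/onto/a", "title": "Ontology A",
     "version": "1.0", "score": 0.6,
     "license": "http://example.org/licenses/open"},
    {"iri": "http://example.org/onto/b", "title": "Ontology B",
     "version": "2.1", "score": 0.9,
     "license": "http://example.org/licenses/open"},
]

if __name__ == "__main__":
    print(json.dumps(to_jsonld(example_hits), indent=2))
```

A consumer such as SWES could then read the `@graph` entries in order and fetch the ontology behind each `@id`, assuming the ranking by quality score is what it wants.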

Experimental evaluation was conducted on a corpus of over 10,000 publicly available OWL ontologies. The crawler achieved an average discovery rate of 0.8 ontologies per second, parsing succeeded on 92 % of retrieved files, and indexing accuracy reached 98 %. When SWES employed the Quality‑Score‑filtered ontologies, 85 % of the automatically generated rules were validated by domain experts as correct and useful, demonstrating the practical benefit of the quality‑aware selection mechanism.

In the discussion, the authors acknowledge remaining challenges: (a) improving precision and recall of ontology‑only discovery in the presence of mixed‑content pages, (b) handling the heterogeneity of OWL versions and custom extensions, and (c) establishing standardized, objective criteria for ontology quality assessment. They propose future work on distributed crawling infrastructures, ontology alignment and merging techniques, and machine‑learning models that dynamically adjust quality scores based on downstream expert‑system performance.

Overall, the paper argues that a dedicated Ontology Web Search Engine is a critical infrastructural component for any large‑scale Semantic Web application, and it provides a concrete, implementable blueprint that bridges the gap between the vast, fragmented ontology landscape of the Web and the knowledge‑driven needs of expert systems like SWES.

📜 Original Paper Content