Ontology Based Information Extraction for Disease Intelligence

Disease Intelligence (DI) is based on acquiring and aggregating fragmented knowledge about diseases from multiple sources around the world in order to provide valuable information to doctors, researchers and the information-seeking community. Some diseases change their characteristics rapidly in different parts of the world, and these changes are reported in documents as unrelated, heterogeneous information that may go unnoticed or may not be quickly available. This research presents an ontology-based theoretical framework in the context of medical intelligence across countries and regions. The ontology is designed to store information about rapidly spreading and changing diseases, incorporating existing disease taxonomies and linking them to the genetic information of both humans and infectious organisms. It further maps disease symptoms to diseases and drug effects to disease symptoms. The machine-understandable disease ontology, published as a website, thus allows drug effects to be evaluated against disease symptoms and exposes genetic involvement in human diseases. Infectious agents that have no known place in an existing classification but do have genetic data can still be identified as organisms through the intelligence of this system. The framework will further help researchers in the field to try out different solutions for curing diseases.

💡 Research Summary

The paper presents a comprehensive ontology‑driven framework for Disease Intelligence (DI), aiming to overcome the fragmentation of disease‑related knowledge that is scattered across heterogeneous sources worldwide. The authors begin by highlighting the limitations of conventional biomedical databases and classification schemes such as ICD, SNOMED CT, and MeSH, which are primarily designed for structured, static records. In the era of rapidly evolving pathogens and the explosion of unstructured data—clinical narratives, news reports, genomic sequences—these systems fail to provide timely, integrated insights.

To address this gap, the authors design a multi‑layered ontology using the Web Ontology Language (OWL). At the top level, four principal classes are defined: Disease, Pathogen, HumanGene, and Drug. Each of these is further specialized (e.g., InfectiousDisease vs. NonInfectiousDisease; ViralAgent, BacterialAgent, FungalAgent). Object properties such as hasSymptom, causes, interactsWith, hasGeneticMarker, and treatedBy interconnect the classes, forming a rich semantic graph that captures symptom‑disease, drug‑symptom, and gene‑disease relationships in a single model.
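The class hierarchy and object properties described above can be sketched as plain Python data structures. This is a minimal, closed-world illustration, not the authors' OWL model: the subclass links follow the text, but the auxiliary `Symptom` class and every domain/range pair are assumptions made for the example. In OWL, `rdfs:domain` and `rdfs:range` license inferences rather than rejecting triples, so the `is_valid_triple` check below is a validation-style simplification.

```python
# Minimal in-memory sketch of the ontology's top-level schema.
# Subclass links follow the text; the "Symptom" class and all
# domain/range pairs are illustrative assumptions, not from the paper.

SUBCLASS_OF = {
    "InfectiousDisease": "Disease",
    "NonInfectiousDisease": "Disease",
    "ViralAgent": "Pathogen",
    "BacterialAgent": "Pathogen",
    "FungalAgent": "Pathogen",
}

# Hypothetical (domain, range) for each object property named in the text.
PROPERTIES = {
    "hasSymptom": ("Disease", "Symptom"),
    "causes": ("Pathogen", "Disease"),
    "interactsWith": ("Drug", "HumanGene"),
    "hasGeneticMarker": ("Disease", "HumanGene"),
    "treatedBy": ("Disease", "Drug"),
}


def superclasses(cls: str) -> list[str]:
    """Return cls followed by all of its ancestors (subclass-of is transitive)."""
    chain = [cls]
    while chain[-1] in SUBCLASS_OF:
        chain.append(SUBCLASS_OF[chain[-1]])
    return chain


def is_valid_triple(subject_cls: str, prop: str, object_cls: str) -> bool:
    """Closed-world check of a triple against the assumed domain and range."""
    if prop not in PROPERTIES:
        return False
    domain, range_ = PROPERTIES[prop]
    return domain in superclasses(subject_cls) and range_ in superclasses(object_cls)


if __name__ == "__main__":
    print(superclasses("ViralAgent"))                                    # ['ViralAgent', 'Pathogen']
    print(is_valid_triple("ViralAgent", "causes", "InfectiousDisease"))  # True
    print(is_valid_triple("Drug", "causes", "InfectiousDisease"))        # False
```

Walking up the subclass chain is what lets a `ViralAgent` satisfy a property whose domain is the more general `Pathogen`.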

A key contribution is the systematic alignment of the ontology with existing taxonomies and genomic repositories. The authors employ owl:equivalentClass and owl:sameAs axioms to map ICD‑10, SNOMED CT, and MeSH terms to the corresponding ontology classes. Simultaneously, they ingest sequence data from NCBI GenBank, Ensembl, and UniProt, linking the sequences to Pathogen and HumanGene instances via a hasGeneticSequence property. This dual alignment enables the system to recognize newly discovered organisms that lack a formal taxonomic entry but possess genetic data; such entities are automatically instantiated as UnclassifiedPathogen within the ontology.
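The fallback to `UnclassifiedPathogen` can be sketched as follows, assuming one simple record per incoming organism. The `EQUIVALENT_CLASS` dictionary stands in for the `owl:equivalentClass` mappings; its keys are made-up placeholders rather than real ICD‑10 or MeSH codes, and `OrganismRecord`, `classify_organism` and `to_triples` are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical stand-in for owl:equivalentClass mappings.
# Keys are placeholders, NOT real ICD-10 / SNOMED CT / MeSH codes.
EQUIVALENT_CLASS = {
    "ICD10:EXAMPLE-1": "InfectiousDisease",
    "MeSH:EXAMPLE-2": "ViralAgent",
}


@dataclass
class OrganismRecord:
    name: str
    taxonomy_code: Optional[str] = None                    # external classification code, if any
    sequence_ids: list[str] = field(default_factory=list)  # e.g. sequence accessions


def classify_organism(record: OrganismRecord) -> Optional[str]:
    """Return the ontology class an incoming organism record is instantiated as."""
    if record.taxonomy_code in EQUIVALENT_CLASS:
        return EQUIVALENT_CLASS[record.taxonomy_code]
    if record.sequence_ids:
        # Genetic data but no taxonomic entry: fall back to UnclassifiedPathogen.
        return "UnclassifiedPathogen"
    return None  # nothing to anchor the record; leave it for manual curation


def to_triples(record: OrganismRecord) -> list[tuple[str, str, str]]:
    """Emit a type triple plus one hasGeneticSequence triple per sequence."""
    cls = classify_organism(record)
    if cls is None:
        return []
    triples = [(record.name, "rdf:type", cls)]
    triples += [(record.name, "hasGeneticSequence", s) for s in record.sequence_ids]
    return triples


if __name__ == "__main__":
    novel = OrganismRecord("novel_agent_1", sequence_ids=["SEQ-0001"])
    print(to_triples(novel))
```

With these definitions, a record that carries only a sequence accession yields an `rdf:type UnclassifiedPathogen` triple plus one `hasGeneticSequence` triple, whereas a record with neither a known code nor sequence data produces no triples.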

The framework incorporates a reasoning engine that applies logical rules to infer novel knowledge. For example, a rule stating “if Drug X alleviates Symptom Y, then Drug X is potentially therapeutic for any Disease that includes Symptom Y” allows the system to suggest repurposing opportunities without explicit curation. Similarly, the reasoner can deduce that a particular genetic variant is implicated in multiple diseases, supporting precision‑medicine strategies. Queries are expressed in SPARQL, and the authors provide a web‑based console that lets users retrieve complex disease‑symptom‑drug‑gene networks on demand.
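The repurposing rule is a single Horn-style join between symptom relief and disease symptoms. In an OWL toolchain it would typically be written in a rule language or as a SPARQL `CONSTRUCT` query; the sketch below spells it out in plain Python over toy facts instead. The facts, the `ALLEVIATES` relation, the `potentiallyTreats` property name, and both function names are hypothetical. Because the rule's conclusion never feeds back into its own premises, one pass computes every inferred triple; the second function is a simple grouping query for variants linked to more than one disease.

```python
from collections import defaultdict

# Hypothetical toy facts; names are placeholders, not clinical data.
ALLEVIATES = {("DrugA", "Fever"), ("DrugB", "Cough")}
HAS_SYMPTOM = {("Disease1", "Fever"), ("Disease1", "Cough"), ("Disease2", "Fever")}
HAS_GENETIC_MARKER = {("Disease1", "VariantV"), ("Disease2", "VariantV")}


def infer_repurposing(alleviates, has_symptom):
    """Apply: alleviates(drug, s) AND hasSymptom(disease, s) => potentiallyTreats(drug, disease)."""
    diseases_by_symptom = defaultdict(set)
    for disease, symptom in has_symptom:
        diseases_by_symptom[symptom].add(disease)
    inferred = set()
    for drug, symptom in alleviates:
        for disease in diseases_by_symptom[symptom]:
            inferred.add((drug, "potentiallyTreats", disease))
    return inferred


def multi_disease_variants(has_marker):
    """Group diseases by genetic variant and keep variants linked to two or more diseases."""
    by_variant = defaultdict(set)
    for disease, variant in has_marker:
        by_variant[variant].add(disease)
    return {v: ds for v, ds in by_variant.items() if len(ds) > 1}


if __name__ == "__main__":
    for triple in sorted(infer_repurposing(ALLEVIATES, HAS_SYMPTOM)):
        print(triple)
    print(multi_disease_variants(HAS_GENETIC_MARKER))
```

Inferred triples of this kind are candidates for review, which matches the text's wording that a drug is only *potentially* therapeutic.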

Implementation details reveal a web‑served ontology coupled with interactive visualizations built on D3.js. Users—clinicians, researchers, policy makers—can search for a disease and instantly view associated symptoms, causative pathogens, relevant human genes, approved drugs, and ongoing clinical trials. Real‑time feeds from sources such as WHO situation reports and GISAID genomic updates are ingested, ensuring that the ontology reflects the latest evidence.
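D3's force layout consumes graphs as a `nodes` array and a `links` array of `source`/`target` pairs, so one plausible way to feed the disease view is to extract the triples around a selected disease into that shape. This sketch assumes the ontology has already been flattened to (subject, property, object) triples; the triple data and the `neighborhood_graph` function are hypothetical. On the D3 side, string node identifiers are resolved by configuring `d3.forceLink().id(d => d.id)`.

```python
import json

# Hypothetical triples around one disease; names are placeholders.
TRIPLES = [
    ("Disease1", "hasSymptom", "Fever"),
    ("Disease1", "hasSymptom", "Cough"),
    ("PathogenP", "causes", "Disease1"),
    ("Disease1", "hasGeneticMarker", "GeneG"),
    ("Disease1", "treatedBy", "DrugA"),
    ("Disease2", "hasSymptom", "Fever"),
]


def neighborhood_graph(disease, triples):
    """Collect every triple touching `disease` into a D3-style nodes/links dict."""
    links = [
        {"source": s, "target": o, "label": p}
        for s, p, o in triples
        if disease in (s, o)
    ]
    ids = []  # preserve first-seen order so the output is deterministic
    for link in links:
        for end in (link["source"], link["target"]):
            if end not in ids:
                ids.append(end)
    return {"nodes": [{"id": i} for i in ids], "links": links}


if __name__ == "__main__":
    print(json.dumps(neighborhood_graph("Disease1", TRIPLES), indent=2))
```

Keeping the property name as a `label` on each link lets the front end distinguish symptom, pathogen, gene, and drug edges when styling the graph.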

In the discussion, the authors argue that their ontology‑based DI platform delivers three major benefits. First, semantic integration of disparate data sources dramatically improves information retrieval speed and completeness. Second, the inclusion of genomic data alongside traditional clinical descriptors enables rapid identification of emerging pathogens and variant strains, even when they are absent from conventional classification systems. Third, the reasoning component provides decision‑support capabilities, facilitating drug repurposing, multi‑target therapy design, and genotype‑guided treatment recommendations.

The paper concludes by outlining future work: automated ontology expansion using natural‑language processing pipelines, scaling the system to handle massive streaming data, and embedding the platform within electronic health record (EHR) workflows for real‑time clinical decision support. Overall, the study demonstrates that a well‑engineered, openly accessible ontology can serve as the backbone of a dynamic, global disease intelligence ecosystem, bridging the gap between fragmented biomedical knowledge and actionable insight.