📝 Original Info
- Title: Evaluation of Semantic Web Technologies for Storing Computable Definitions of Electronic Health Records Phenotyping Algorithms
- ArXiv ID: 1707.07673
- Date: 2017-07-26
- Authors: Václav Papež, Spiros Denaxas, Harry Hemingway
📝 Abstract
Electronic Health Records (EHR) are electronic data generated during or as a byproduct of routine patient care. Structured, semi-structured and unstructured EHR offer researchers unprecedented phenotypic breadth and depth and have the potential to accelerate the development of precision medicine approaches at scale. A main EHR use-case is defining phenotyping algorithms that identify disease status, onset and severity. Phenotyping algorithms utilize diagnoses, prescriptions, laboratory tests, symptoms and other elements in order to identify patients with or without a specific trait. No common standardized, structured, computable format exists for storing phenotyping algorithms. The majority of algorithms are stored as human-readable descriptive text documents, which makes their translation to code challenging due to their inherent complexity and hinders their sharing and re-use across the community. In this paper, we evaluate the two key Semantic Web Technologies, the Web Ontology Language and the Resource Description Framework, for enabling computable representations of EHR-driven phenotyping algorithms.
📄 Full Content
Evaluation of Semantic Web Technologies for Storing Computable Definitions
of Electronic Health Records Phenotyping Algorithms
Václav Papež1,2,*, MSc, Spiros Denaxas1,2,*, PhD, Harry Hemingway1,2, FRCP
1 Institute of Health Informatics, University College London, London, UK
2 Farr Institute of Health Informatics Research, University College London, London, UK
Abstract
Electronic Health Records are electronic data generated during or as a byproduct of routine patient care. Structured,
semi-structured and unstructured EHR offer researchers unprecedented phenotypic breadth and depth and have the
potential to accelerate the development of precision medicine approaches at scale. A main EHR use-case is defining
phenotyping algorithms that identify disease status, onset and severity. Phenotyping algorithms utilize diagnoses,
prescriptions, laboratory tests, symptoms and other elements in order to identify patients with or without a specific
trait. No common standardized, structured, computable format exists for storing phenotyping algorithms. The majority
of algorithms are stored as human-readable descriptive text documents, which makes their translation to code challenging
due to their inherent complexity and hinders their sharing and re-use across the community. In this paper, we evaluate
the two key Semantic Web Technologies, the Web Ontology Language and the Resource Description Framework, for
enabling computable representations of EHR-driven phenotyping algorithms.
Introduction
Electronic Health Records (EHR) are structured, semi-structured and unstructured data that are generated during
routine interactions of patients with primary care, hospital care and tertiary healthcare or as a byproduct of those
interactions for billing or administrative purposes1. Structured EHR are recorded using controlled clinical
terminologies while unstructured data include clinical text and narrative. Semi-structured EHR data often loosely
follow a data specification (e.g. prescription events, medical imaging reports) but this varies greatly across information
systems, clinical specialties and healthcare providers. High-throughput genotyping and increased availability of EHR
data are giving scientists an unprecedented opportunity to exploit routinely generated clinical data to advance
precision medicine at scale. EHR data can fundamentally alter the manner in which genetic association studies are
performed and enable scientists to examine the association of genetic variants and traits with larger sample sizes and
greater phenotypic breadth2.
A primary use-case of EHR data is the creation of phenotyping (or “case finding”) algorithms3: computational
algorithms that identify patients who have (or have not) been diagnosed with a particular condition4 (e.g. acute
myocardial infarction, prostate cancer or anxiety) and, where applicable, the disease onset and severity.
Phenotyping algorithms tend to use clinical information such as diagnoses, laboratory tests, symptoms, clinical
examination findings, prescriptions, referrals and other EHR data elements. While the term phenotype is traditionally
defined as the physical manifestation of a particular trait, in the context of EHR research, phenotypes are broadly (but
not exclusively) defined as the presence or absence of a particular clinical condition. In EHR resources linked with genetic
data, such as the Electronic Medical Records and Genomics (eMERGE) consortium5, these phenotypes can enable
large-scale genomic association studies, which have traditionally been limited to a small set of traits. Phenotyping,
however, is a challenging and time-consuming process since the data have often been collected for care, auditing or
administrative purposes and not for research. The contents of EHR data sources are an indirect representation of the
true patient state as skewed by the underlying healthcare process, e.g. clinical guidelines, information systems and data
standards6.
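To make the notion of a phenotyping algorithm concrete, the following is a minimal Python sketch of a rule-based case definition over toy data. It is not taken from the paper: the event structure, the lab code name, the threshold and the single-entry diagnosis code list are all illustrative assumptions, and a real algorithm would rely on validated code lists and local data models. The sketch flags a patient as a case if any qualifying diagnosis or lab result is present and reports the earliest qualifying date as the onset.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, Iterable, Optional


@dataclass(frozen=True)
class Event:
    """One EHR data element (hypothetical structure, for illustration only)."""
    patient_id: str
    kind: str                      # "diagnosis", "prescription" or "lab"
    code: str
    when: date
    value: Optional[float] = None  # numeric result for lab events


# Illustrative definition only; not a validated code list or clinical cut-off.
MI_DIAGNOSIS_CODES = {"I21"}   # ICD-10 chapter code for acute myocardial infarction
TROPONIN_LAB = "troponin"      # hypothetical local lab code
TROPONIN_THRESHOLD = 0.04      # hypothetical threshold, arbitrary units


def phenotype(events: Iterable[Event]) -> Dict[str, date]:
    """Return {patient_id: onset_date} for patients meeting the toy definition."""
    onset: Dict[str, date] = {}
    for e in events:
        is_dx = e.kind == "diagnosis" and e.code in MI_DIAGNOSIS_CODES
        is_lab = (
            e.kind == "lab"
            and e.code == TROPONIN_LAB
            and e.value is not None
            and e.value > TROPONIN_THRESHOLD
        )
        if is_dx or is_lab:
            # Keep the earliest qualifying event as the onset date.
            if e.patient_id not in onset or e.when < onset[e.patient_id]:
                onset[e.patient_id] = e.when
    return onset


events = [
    Event("p1", "diagnosis", "I21", date(2015, 3, 2)),
    Event("p1", "lab", "troponin", date(2015, 3, 1), 0.9),
    Event("p2", "lab", "troponin", date(2016, 7, 9), 0.01),
    Event("p3", "prescription", "aspirin", date(2014, 1, 5)),
]
cases = phenotype(events)
print(cases)
```

Even in this toy form, the definition embeds choices (which codes count, how onset is derived, how the rules combine) that a free-text description can leave ambiguous.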
Defining and validating EHR phenotyping algorithms is challenging and time-consuming. Challenges are amplified
by the lack of a common definition standard for algorithms, making their sharing across the scientific community
problematic. Although phenotype components are structured and often annotated with controlled clinical
terminology terms, phenotype definitions and their underlying logic are usually expressed as free text, which is not
readily machine-readable. The translation from this narrative to programming code used to identify and extract patients
(e.g. implementing a phenotyping algorithm using Structured Query Language for use in a relational database
management system) can be problematic due to potential ambiguities in the manner in which the algorithm was
expressed or potential ways of implementing it using local data. There is a clear and urgent need to develop and
evaluate a computable, standards-driven format to facilitate the systematic creation, sharing and re-use of EH
…(Full text truncated)…