A standard transformation from XML to RDF via XSLT
A generic transformation of XML data into the Resource Description Framework (RDF) and its implementation in XSLT are presented. It was developed by the AstroGrid-D grid integration project for robotic telescopes to connect network communication in the Remote Telescope Markup Language (RTML) to the project's RDF-based information service. This use case serves as the example that explains the transformation's generality. The transformation automates the conversion of XML data into RDF and thus solves this problem in semantic computing. Its design also permits the inverse transformation, but this is not yet implemented.
💡 Research Summary
The paper presents a generic, standards‑based approach for converting arbitrary XML documents into RDF triples using XSLT. The work originates from the AstroGrid‑D project, which needed to expose data encoded in the Remote Telescope Markup Language (RTML) to an RDF‑driven information service. The authors argue that, despite the widespread use of XML for scientific data exchange and the growing importance of RDF for semantic integration, there is a lack of reusable, schema‑agnostic transformation tools. Existing solutions are either tightly coupled to specific XML schemas or require complex configuration, limiting their applicability across domains.
To address this gap, the authors design a transformation framework that relies on XSLT 1.0 templates. The core idea is to treat each XML element as an RDF subject, the element’s name as a predicate, and its attributes and text content as predicate‑object pairs. Namespaces declared in the source XML are mapped to RDF prefixes, preserving the original semantic partitioning. The transformation proceeds recursively, handling nested elements of any depth, and converts list‑like structures into RDF containers (e.g., rdf:Seq) when necessary.
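The recursive mapping described above can be illustrated with a short sketch. It is written in Python rather than XSLT and is not the authors' stylesheet; the sample element names, the blank-node labels, the `value` predicate for text content, and the `http://example.org/rtml#` base URI are all assumptions made for illustration.

```python
import itertools
import xml.etree.ElementTree as ET

# Hypothetical RTML-like fragment; the element names are illustrative only.
SAMPLE = '<Observation id="123"><Target name="M31"/><Exposure>30</Exposure></Observation>'


def xml_to_triples(xml_text, base="http://example.org/rtml#"):
    """Return (subject, predicate, object) tuples for an XML string."""
    counter = itertools.count(1)
    triples = []

    def uri(tag):
        # ElementTree spells namespaced names as "{namespace}local".
        if tag.startswith("{"):
            ns, local = tag[1:].split("}", 1)
            sep = "" if ns.endswith(("#", "/")) else "#"
            return ns + sep + local
        return base + tag

    def visit(elem):
        # Each element becomes a subject, here a blank node (an assumption).
        subject = "_:n%d" % next(counter)
        # Attributes become predicate-object pairs on that subject.
        for name, value in elem.attrib.items():
            triples.append((subject, uri(name), value))
        # Non-empty text content is attached with an assumed "value" predicate.
        text = (elem.text or "").strip()
        if text:
            triples.append((subject, base + "value", text))
        # The child's element name is the predicate linking parent to child.
        for child in elem:
            triples.append((subject, uri(child.tag), visit(child)))
        return subject

    visit(ET.fromstring(xml_text))
    return triples


if __name__ == "__main__":
    for triple in xml_to_triples(SAMPLE):
        print(triple)
```

Because the function only inspects tags, attributes, and text, it needs no knowledge of the source schema, which is the sense in which such a mapping is generic.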
A key design decision is the separation of mapping rules from the XSLT code itself. The mapping is expressed in an external XML file that defines “element → class” and “attribute → property” correspondences, optionally allowing the user to supply custom URIs. This decoupling makes the approach reusable: the same XSLT engine can be applied to completely different XML vocabularies simply by swapping the mapping file.
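A minimal sketch of such an external mapping file and its lookup follows. The file layout (`<mapping>`, `<element>`, `<attribute>`) and all URIs are hypothetical, since the summary does not give the actual format; the point is only that names without an entry fall back to a default URI, so swapping the file changes the output vocabulary without touching the transformation code.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping file: the layout and URIs are assumptions.
MAPPING_XML = """
<mapping>
  <element name="Target" class="http://example.org/astro#Target"/>
  <attribute name="name" property="http://example.org/astro#label"/>
</mapping>
"""


def load_mapping(mapping_text):
    """Parse the mapping into two dictionaries: element->class, attribute->property."""
    root = ET.fromstring(mapping_text)
    classes = {e.get("name"): e.get("class") for e in root.findall("element")}
    properties = {a.get("name"): a.get("property") for a in root.findall("attribute")}
    return classes, properties


def resolve(name, table, default_base="http://example.org/rtml#"):
    """Use the custom URI if one is mapped, otherwise derive a default one."""
    return table.get(name, default_base + name)


if __name__ == "__main__":
    classes, properties = load_mapping(MAPPING_XML)
    print(resolve("Target", classes))
    print(resolve("Exposure", classes))
```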
The authors demonstrate the method with a concrete RTML example. RTML describes telescope configurations, observation schedules, and target specifications. After transformation, the resulting RDF graph contains triples such as <http://example.org/rtml#Observation123> <http://example.org/rtml#hasTarget> <http://example.org/rtml#M31>, and similar statements for instrument settings and timestamps. The RDF output can be loaded into a SPARQL endpoint, enabling semantic queries like “find all observations of a given target within a time window” that would be cumbersome with pure XML processing.
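To make the example query concrete, the sketch below evaluates "all observations of a given target within a time window" directly over a list of triples in plain Python. It stands in for a SPARQL endpoint and is not part of the paper; the `startTime` predicate and the ISO 8601 timestamp format are assumptions, while `hasTarget` follows the triple quoted above.

```python
from datetime import datetime

NS = "http://example.org/rtml#"


def observations_of(triples, target, start, end):
    """Return subjects that have the given target and start inside [start, end]."""
    # First pattern: ?obs hasTarget <target>
    with_target = {s for s, p, o in triples if p == NS + "hasTarget" and o == target}
    hits = []
    # Second pattern: ?obs startTime ?t, filtered by the time window.
    for s, p, o in triples:
        if s in with_target and p == NS + "startTime":
            if start <= datetime.fromisoformat(o) <= end:
                hits.append(s)
    return sorted(hits)


if __name__ == "__main__":
    data = [
        (NS + "Observation123", NS + "hasTarget", NS + "M31"),
        (NS + "Observation123", NS + "startTime", "2008-03-01T22:00:00"),
    ]
    print(observations_of(data, NS + "M31", datetime(2008, 3, 1), datetime(2008, 3, 31)))
```

The same join across two statements about one subject is what would require hand-written tree navigation in pure XML processing.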
Performance tests on a 10 MB RTML file (≈ 4 000 elements) show an average conversion time of 2.8 seconds and peak memory usage below 120 MB on a standard XSLT processor. These figures suggest that the approach is suitable for near‑real‑time services, even for moderately large scientific datasets.
The paper also discusses limitations and future work. The current implementation supports only forward transformation; however, the XSLT templates are written in a way that inverse rules could be added, allowing RDF‑to‑XML reconstruction. The authors note that RDF’s graph nature can lose ordering and multiplicity information present in XML, so additional annotations (e.g., rdf:Seq, rdf:Bag) would be required for lossless round‑tripping. Moreover, XSLT 1.0 lacks advanced features such as regular‑expression matching and streaming, which could improve handling of very large files and complex conditional logic. The authors propose migrating to XSLT 2.0/3.0 or implementing a custom SAX/DOM‑based pipeline to overcome these constraints.
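The ordering concern can be illustrated with a small sketch. An rdf:Seq records position through the numbered membership properties rdf:_1, rdf:_2, and so on, so the original child order can be recovered even though a set of triples has no inherent order. The helper names and the container label below are hypothetical, and this is a Python illustration rather than the authors' XSLT.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def as_seq(container, members):
    """Describe an ordered list as an rdf:Seq with rdf:_1, rdf:_2, ... members."""
    triples = [(container, RDF + "type", RDF + "Seq")]
    for index, member in enumerate(members, start=1):
        triples.append((container, RDF + "_%d" % index, member))
    return triples


def seq_members(triples, container):
    """Recover the members in their original order, whatever the triple order."""
    prefix = RDF + "_"
    indexed = [
        (int(p[len(prefix):]), o)
        for s, p, o in triples
        if s == container and p.startswith(prefix) and p[len(prefix):].isdigit()
    ]
    return [o for _, o in sorted(indexed)]


if __name__ == "__main__":
    shuffled = list(reversed(as_seq("_:exposures", ["first", "second", "third"])))
    print(seq_members(shuffled, "_:exposures"))
```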
In conclusion, the paper delivers a practical, reusable XSLT‑based transformation that bridges XML‑centric legacy systems and RDF‑centric semantic services. By abstracting the mapping rules and adhering to a simple element‑as‑subject, attribute‑as‑predicate model, the method can be adapted to a wide range of scientific domains, facilitating data integration, discovery, and interoperability in the emerging Semantic Web of scientific information.