Is SHACL Suitable for Data Quality Assessment?
Knowledge graphs have been widely adopted in both enterprises, such as the Google Knowledge Graph, and open platforms like Wikidata, to represent domain knowledge and support artificial intelligence applications. They model real-world information as nodes and edges. To embrace flexibility, knowledge graphs often lack enforced schemas (i.e., ontologies), leading to potential data quality issues, such as semantically overlapping nodes. Yet ensuring their quality is essential, as issues in the data can affect applications relying on them. To assess the quality of knowledge graphs, existing works either propose high-level frameworks comprising various data quality dimensions without concrete implementations, define tools that measure data quality with ad-hoc SPARQL queries, or promote the usage of constraint languages, such as the Shapes Constraint Language (SHACL), to assess and improve the quality of the graph. Although the latter approaches claim to address data quality assessment, none of them attempts to cover all data quality dimensions comprehensively. In this paper, we explore this gap by investigating the extent to which SHACL can be used to assess data quality in knowledge graphs. Specifically, we defined SHACL shapes for 69 data quality metrics proposed by Zaveri et al. [1] and implemented a prototype that automatically instantiates these shapes and computes the corresponding data quality measures from their validation results. All resources are provided for repeatability.
💡 Research Summary
The paper investigates whether the Shapes Constraint Language (SHACL) core can serve as a comprehensive tool for assessing data quality (DQ) in knowledge graphs (KGs). Recognizing that KGs often forgo enforced schemas to maintain flexibility, the authors focus on the resulting quality challenges such as overlapping concepts, missing constraints, and inconsistent data. Building on the systematic literature review by Zaveri et al., which identified 69 DQ metrics across four high‑level categories—Accessibility, Intrinsic, Contextual, and Representational—the study attempts to map each metric to a SHACL core shape.
To make this mapping feasible, the authors introduce three assumptions: (A1) every entity is explicitly typed, (A2) the ontology supplies complete class and property definitions (including domain, range, and other characteristics), and (A3) domain experts provide necessary background knowledge (e.g., expected cardinalities, reference value sets). These assumptions are realistic for well‑curated KGs but may not hold for noisy, crowd‑sourced graphs. When an assumption is violated, the corresponding metric is either excluded or only partially validated.
The authors systematically evaluate each metric’s implementability, marking it as fully supported (✓), partially supported (p), or not supported (x) by SHACL core. The results are summarized in four tables (one per category). In the Accessibility group, most metrics—such as licensing detection (L1), interlink existence (I2), digital signature usage (S1), and the performance metric P1 (prohibition of hash‑URIs in large graphs)—are fully expressible using SHACL’s built‑in constraint components (e.g., sh:pattern, sh:minCount). Metrics that depend on runtime characteristics (e.g., SPARQL endpoint latency, high‑throughput processing) cannot be captured by static graph validation and are therefore marked as partially or not supported.
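To make this concrete, the following is a minimal sketch of what a shape for the licensing metric (L1) might look like. The shape targets dataset nodes and requires at least one machine-readable license; the `ex:` namespace and the choice of `dcat:Dataset` / `dct:license` are illustrative assumptions, not taken from the paper.

```turtle
# Hypothetical L1 (licensing) shape: every node typed as a dataset must
# declare at least one license as an IRI. Relies on assumption A1
# (entities are explicitly typed).
@prefix sh:   <http://www.w3.org/ns/shacl#> .
@prefix dct:  <http://purl.org/dc/terms/> .
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix ex:   <http://example.org/shapes#> .

ex:LicenseShape
    a sh:NodeShape ;
    sh:targetClass dcat:Dataset ;
    sh:property [
        sh:path dct:license ;
        sh:minCount 1 ;
        sh:nodeKind sh:IRI ;
        sh:message "Dataset lacks a machine-readable license." ;
    ] .
```

Each dataset without a `dct:license` triple would then surface as one violation in the validation report.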
In the Intrinsic category, syntactic validity and consistency metrics are largely implementable. For example, cardinality violations, datatype mismatches, and IRI pattern checks can be expressed directly. However, semantic accuracy metrics that require reasoning over ontology axioms or comparison with external reference datasets exceed SHACL core’s expressive power; they would need SHACL‑SPARQL or external reasoning engines. Completeness metrics that involve measuring redundancy, information richness, or compactness often require aggregation across many triples, leading to partial support.
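A sketch of an intrinsic-category shape illustrates how cardinality, datatype, and IRI-pattern checks combine in SHACL core. All class and property names here are illustrative placeholders, not definitions from the paper.

```turtle
# Hypothetical intrinsic-quality shape for persons: exactly one birth date
# with the correct datatype, and syntactically valid homepage IRIs.
@prefix sh:   <http://www.w3.org/ns/shacl#> .
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix ex:   <http://example.org/shapes#> .

ex:PersonShape
    a sh:NodeShape ;
    sh:targetClass foaf:Person ;
    sh:property [                      # cardinality check
        sh:path ex:birthDate ;
        sh:minCount 1 ;
        sh:maxCount 1 ;
        sh:datatype xsd:date ;         # datatype mismatch check
    ] ;
    sh:property [                      # IRI pattern check
        sh:path foaf:homepage ;
        sh:nodeKind sh:IRI ;
        sh:pattern "^https?://" ;
    ] .
```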
The Contextual and Representational categories show similar patterns. Temporal freshness (e.g., “data not older than X days”) can be modeled with sh:pattern on timestamp literals, but assessing whether the freshness aligns with business SLAs demands external logic. Trustworthiness, provenance depth, and user‑centric relevance metrics also require information beyond what static SHACL constraints can capture.
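The freshness workaround can be sketched as follows. Since SHACL core has no notion of “now” or date arithmetic, the cutoff must be baked into a regular expression over the literal’s lexical form and regenerated over time — the external logic the summary refers to. The vocabulary choices (`dct:modified`, the `ex:` prefix, the 2024 cutoff) are illustrative assumptions.

```turtle
# Hypothetical freshness shape: flag resources whose dct:modified
# timestamp predates 2024. The regex matches the lexical form of the
# literal (e.g. "2024-06-01T12:00:00"); the cutoff year is hard-coded
# and must be refreshed by tooling outside SHACL.
@prefix sh:  <http://www.w3.org/ns/shacl#> .
@prefix dct: <http://purl.org/dc/terms/> .
@prefix ex:  <http://example.org/shapes#> .

ex:FreshnessShape
    a sh:NodeShape ;
    sh:targetSubjectsOf dct:modified ;
    sh:property [
        sh:path dct:modified ;
        sh:pattern "^202[4-9]" ;
        sh:message "Resource may be stale (last modified before 2024)." ;
    ] .
```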
To demonstrate practicality, the authors built a prototype that automatically generates SHACL shapes for the selected metrics, loads them into an RDF4J SHACL engine, and runs validation against a target KG. The validation report lists each violating node; the prototype then computes DQ scores by converting violations into ratio measures (e.g., #violations / #entities) or binary indicators. Composite scores are derived by aggregating multiple shape results with configurable weights. All source code, shape definitions, and example datasets are released for reproducibility.
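The scoring step can be pictured from the structure of a standard SHACL validation report, sketched below with invented focus nodes. If a shape targets, say, 100 persons and the report contains 2 violation results for it, the corresponding ratio measure would be 2/100, i.e., a quality score of 0.98 under a “1 − violation ratio” convention (the exact aggregation weights are configurable in the prototype).

```turtle
# Sketch of the validation report an engine such as RDF4J produces.
# Counting sh:result entries per shape yields the numerator of the
# ratio measure; the target count yields the denominator.
@prefix sh: <http://www.w3.org/ns/shacl#> .
@prefix ex: <http://example.org/> .

[] a sh:ValidationReport ;
    sh:conforms false ;
    sh:result [
        a sh:ValidationResult ;
        sh:focusNode ex:person42 ;
        sh:resultPath ex:birthDate ;
        sh:sourceConstraintComponent sh:MinCountConstraintComponent ;
        sh:resultSeverity sh:Violation ;
    ] ,
    [
        a sh:ValidationResult ;
        sh:focusNode ex:person77 ;
        sh:resultPath ex:birthDate ;
        sh:sourceConstraintComponent sh:DatatypeConstraintComponent ;
        sh:resultSeverity sh:Violation ;
    ] .
```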
The discussion acknowledges SHACL core’s strengths: a formal, machine‑readable way to encode many low‑level quality constraints, ease of integration with existing RDF pipelines, and the ability to generate quantitative DQ scores from validation reports. Nevertheless, the authors argue that SHACL alone cannot fully address higher‑level DQ dimensions that involve dynamic performance characteristics, deep semantic reasoning, or domain‑specific business rules. They suggest extending the approach with SHACL‑SPARQL for custom query‑based constraints, SHACL‑JS for procedural checks, and coupling with external quality‑assessment services.
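For checks beyond core expressiveness, SHACL-SPARQL embeds a SPARQL query inside a shape; violations are the query’s solutions. The sketch below shows one such constraint for a duplicate-label consistency check — a rule that core components cannot express because it compares values across distinct nodes. The `ex:` names are illustrative, and full IRIs are used in the query to avoid a separate `sh:prefixes` declaration.

```turtle
# Hypothetical SHACL-SPARQL constraint: report a violation whenever two
# distinct concepts share the same label. Each solution of the SELECT
# query becomes one validation result.
@prefix sh: <http://www.w3.org/ns/shacl#> .
@prefix ex: <http://example.org/shapes#> .

ex:NoDuplicateLabelShape
    a sh:NodeShape ;
    sh:targetClass ex:Concept ;
    sh:sparql [
        a sh:SPARQLConstraint ;
        sh:message "Two distinct concepts share the same label." ;
        sh:select """
            SELECT $this ?value WHERE {
                $this  <http://example.org/label> ?value .
                ?other <http://example.org/label> ?value .
                FILTER ($this != ?other)
            }
        """ ;
    ] .
```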
In conclusion, the paper provides a thorough, metric‑by‑metric evaluation of SHACL core’s suitability for KG data quality assessment. It shows that while SHACL can cover a substantial portion of the DQ landscape—especially syntactic, structural, and some semantic checks—comprehensive DQ evaluation still requires complementary techniques. The work lays a solid foundation for future research on hybrid DQ frameworks that combine SHACL’s declarative validation with reasoning, statistical profiling, and domain‑expert input.