Q${}^2$Forge: Minting Competency Questions and SPARQL Queries for Question-Answering Over Knowledge Graphs


The SPARQL query language is the standard method to access knowledge graphs (KGs). However, formulating SPARQL queries is a significant challenge for non-expert users, and remains time-consuming even for experienced ones. Best practices recommend documenting KGs with competency questions and example queries to contextualise the knowledge they contain and illustrate their potential applications. In practice, however, this is either not done or the examples are provided only in limited numbers. Large Language Models (LLMs) are being used in conversational agents and are proving to be an attractive solution with a wide range of applications, from simple question-answering about common knowledge to generating code in a targeted programming language. However, training and testing these models to produce high-quality SPARQL queries from natural language questions requires substantial datasets of question-query pairs. In this paper, we present Q${}^2$Forge, which addresses the challenge of generating new competency questions for a KG and corresponding SPARQL queries. It iteratively validates those queries with human feedback and an LLM acting as a judge. Q${}^2$Forge is open source, generic, extensible and modular, meaning that the different modules of the application (CQ generation, query generation and query refinement) can be used separately, as an integrated pipeline, or replaced by alternative services. The result is a complete pipeline from competency question formulation to query evaluation, supporting the creation of reference query sets for any target KG.


💡 Research Summary

The paper presents Q²Forge, a comprehensive framework designed to automate the creation of question-query (Q²) datasets for Knowledge Graphs (KGs). The core challenge addressed is the significant bottleneck in exploiting KGs: formulating SPARQL queries is difficult for non-experts and time-consuming even for experts. While best practices recommend documenting KGs with Competency Questions (CQs) and example SPARQL queries, such resources are often scarce. Furthermore, training and evaluating Large Language Model (LLM)-based natural language-to-SPARQL translation systems require substantial, high-quality Q² datasets, which are largely limited to general KGs like DBpedia and Wikidata. Q²Forge fills this gap by providing an end-to-end pipeline to generate, validate, and refine tailored CQs and SPARQL queries for any target KG, including domain-specific or private ones.

Q²Forge operates through a modular, three-stage pipeline orchestrated via a configurable backend service (Gen²KGBot). The process begins with KG Configuration and Pre-processing. Users define a KG profile (name, endpoint, prefixes, etc.). The system then extracts ontological class information and, crucially, samples instances to analyze the actual properties and value types used in the data, bridging the gap between ontology definitions and practical KG representation. This contextual information is vital for subsequent steps.
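The instance-sampling step described above can be sketched as a SPARQL query that, for a given class, counts which properties its instances actually use. The query shape and the function below are illustrative assumptions, not Q²Forge's exact implementation:

```python
# Sketch of the pre-processing step: given a class IRI, build a SPARQL
# query that lists the properties actually used by instances of that
# class, ordered by frequency. This reveals how the data is really
# represented, beyond what the ontology alone declares.
def property_sampling_query(class_iri: str, limit: int = 100) -> str:
    """Build a SPARQL query profiling property usage for a class (assumed query shape)."""
    return f"""
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
SELECT DISTINCT ?property (COUNT(?instance) AS ?usage)
WHERE {{
  ?instance rdf:type <{class_iri}> ;
            ?property ?value .
}}
GROUP BY ?property
ORDER BY DESC(?usage)
LIMIT {limit}
"""

# Example: profile a hypothetical Gene class
print(property_sampling_query("http://example.org/Gene"))
```

Running such a query against the configured endpoint yields the property/value-type summary that later stages use as schema context.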

The second stage is Competency Question Generation. Leveraging an LLM, the system generates natural language CQs based on the KG’s configuration and any additional domain context provided by the user (e.g., a paper abstract). The model is instructed to output questions categorized by complexity (Basic, Intermediate, Advanced) and with relevant tags.
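The generator's output can be pictured as a small structured list, with each question carrying a complexity level and topic tags. The exact schema below is an assumption based on the paper's description, shown here only to illustrate the shape of the result:

```python
import json

# Illustrative shape of the CQ generation output: each competency
# question is labeled with a complexity level (Basic, Intermediate,
# Advanced) and free-form topic tags. Field names are assumptions.
competency_questions = [
    {
        "question": "Which genes are associated with Alzheimer's disease?",
        "complexity": "Basic",
        "tags": ["gene", "disease"],
    },
    {
        "question": "What is the average number of publications supporting "
                    "each gene-disease association reported after 2020?",
        "complexity": "Advanced",
        "tags": ["association", "aggregation", "provenance"],
    },
]
print(json.dumps(competency_questions, indent=2))
```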

The third and most critical stage is SPARQL Query Generation, Execution, and Refinement. Here, a natural language question (from the previous stage or user input) is translated into a SPARQL query by an LLM, which uses the pre-processed KG schema as context. The generated query is executed against the live KG endpoint. The results are then interpreted back into natural language by another LLM call to aid user understanding. This creates a candidate Q² pair. The system incorporates an iterative human-in-the-loop validation process where the user judges the pair’s relevance. A key feature is the use of an LLM as a “judge” to suggest automated query refinements. The user can iteratively refine the query based on this feedback or manual edits until satisfied.
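The generate-execute-judge-refine loop described above can be sketched as follows. All function names and signatures are hypothetical stand-ins (not the Gen²KGBot API): `generate` stands for the LLM translation call, `execute` for running the query against the endpoint, and `judge` for the LLM-as-judge critique, returning `None` when it is satisfied:

```python
from typing import Callable, List, Optional, Tuple

def refine_query(question: str,
                 generate: Callable[[str], str],
                 execute: Callable[[str], List],
                 judge: Callable[[str, str, List], Optional[str]],
                 max_rounds: int = 3) -> Tuple[str, List]:
    """Iteratively refine a SPARQL query until the judge raises no feedback
    or the round budget is exhausted (a sketch of the loop, not the real API)."""
    query = generate(question)
    for _ in range(max_rounds):
        results = execute(query)                 # run against the live endpoint
        feedback = judge(question, query, results)
        if feedback is None:                     # judge (or user) is satisfied
            return query, results
        # Fold the critique back into the prompt for the next attempt
        query = generate(f"{question}\nRevise the query using this feedback: {feedback}")
    return query, execute(query)
```

In Q²Forge this loop is interleaved with human-in-the-loop validation: the user can accept the pair, edit the query manually, or trigger another refinement round.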

A defining strength of Q²Forge is its flexible and extensible architecture. It is open-source and built with modularity in mind. The three core modules (CQ generation, query generation, query refinement) can be used independently, as a complete pipeline, or replaced with alternative implementations (e.g., a user’s proprietary text-to-SPARQL service). The backend exposes a documented web API, facilitating community extensions and integration with robust frameworks like LangChain.
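Because the modules are exposed over a web API, each can be invoked independently with a JSON request. The endpoint path and payload fields below are illustrative assumptions for the query-generation module, not the documented Gen²KGBot schema:

```python
import json

# Hypothetical request body for calling the query-generation module on
# its own. Every field name here is an assumption made for illustration.
payload = {
    "question": "Which genes are associated with Alzheimer's disease?",
    "kg_profile": "my-biomedical-kg",   # assumed identifier of a configured KG profile
    "model": "gpt-4o",                  # assumed selector for the backing LLM
}
request_body = json.dumps(payload)
print(request_body)
```

A proprietary text-to-SPARQL service could be swapped in behind the same request shape, which is what makes the pipeline's modules replaceable.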

In summary, Q²Forge offers a practical, automated solution for a critical resource gap. It lowers the barrier for creating reference Q² sets needed for documenting KGs, benchmarking question-answering systems, and training/fine-tuning LLMs for accurate SPARQL generation across diverse and specialized knowledge domains.

