Enhancing QA Systems with Complex Temporal Question Processing Capabilities

This paper presents a multilayered architecture that enhances the capabilities of current QA systems and allows different types of complex questions or queries to be processed. The answers to these questions need to be gathered from factual information scattered throughout different documents. Specifically, we designed a specialized layer to process the different types of temporal questions. Complex temporal questions are first decomposed into simple questions, according to the temporal relations expressed in the original question. The answers to the resulting simple questions are then recomposed so that they fulfil the temporal restrictions of the original complex question. A novel aspect of this approach lies in the decomposition, which uses a minimal quantity of resources, with the final aim of obtaining a portable platform that is easily extensible to other languages. In this paper we also present a methodology for evaluating the decomposition of the questions as well as the ability of the implemented temporal layer to perform at a multilingual level. The temporal layer was first implemented for English, then evaluated and compared with: a) a general purpose QA system (F-measure 65.47% for QA plus English temporal layer vs. 38.01% for the general QA system), and b) a well-known QA system. Much better results were obtained for temporal questions with the multilayered system. This system was therefore extended to Spanish and very good results were again obtained in the evaluation (F-measure 40.36% for QA plus Spanish temporal layer vs. 22.94% for the general QA system).

Research Summary

The paper addresses a well‑known limitation of current question‑answering (QA) systems: the inability to correctly handle complex temporal questions whose answers are distributed across multiple documents. To overcome this, the authors propose a multilayered architecture that adds a dedicated temporal processing layer on top of an existing QA engine. The core idea is to decompose a complex temporal query into a set of simpler, temporally unambiguous sub‑questions, let the underlying QA system answer each sub‑question independently, and then recombine the partial answers while enforcing the original temporal constraints.

The temporal layer consists of two main phases. In the decomposition phase, the input question undergoes morphological analysis, dependency parsing, and temporal expression detection. Identified temporal cues (e.g., “before”, “after”, “between 1995 and 2000”) are mapped to a small set of hand‑crafted rules that generate logical operators (AND, OR, NOT) and produce a logical structure. This structure is then used to generate a series of simple questions that each contain a single temporal reference. The rule set is intentionally minimal and language‑agnostic; only a limited lexicon of temporal markers and pattern templates is required, which makes the approach portable to new languages.
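The decomposition step can be illustrated with a minimal Python sketch. It is not the paper's rule set: the signal lexicon `TEMPORAL_SIGNALS`, the `Decomposition` record, and the "When was ...?" template are all hypothetical, and the sketch assumes the text after the temporal signal is a noun phrase naming an event. It splits a question at the first temporal signal into a main question and a question that asks for the date of the event.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical lexicon of temporal signals (an assumption, not the paper's list).
TEMPORAL_SIGNALS = ("before", "after", "during", "since")


@dataclass
class Decomposition:
    main_question: str                # question with the temporal clause removed
    signal: Optional[str]             # temporal signal linking the two parts
    temporal_question: Optional[str]  # question asking when the event took place


def decompose(question: str) -> Decomposition:
    """Split a question at the first temporal signal, if there is one."""
    pattern = r"\b(" + "|".join(TEMPORAL_SIGNALS) + r")\b"
    match = re.search(pattern, question, flags=re.IGNORECASE)
    if match is None:
        # No temporal signal found: treat the question as already simple.
        return Decomposition(question, None, None)
    main = question[:match.start()].strip().rstrip("?,") + "?"
    event = question[match.end():].strip().rstrip("?")
    # Assumes the event is a noun phrase, e.g. "the Gulf War".
    return Decomposition(main, match.group(1).lower(), f"When was {event}?")


result = decompose("Who was the US president during the Gulf War?")
print(result.main_question)      # Who was the US president?
print(result.signal)             # during
print(result.temporal_question)  # When was the Gulf War?
```

Because the only language-specific parts are the signal list and the question template, porting such a sketch to another language would mean replacing those two items, which mirrors the portability argument made above.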

In the recomposition phase, the answers returned by the base QA system for each simple question are collected. Each answer is normalized with respect to dates or intervals, and the temporal constraints of the original query are re‑applied. For instance, the complex query “Who was the US president between 1995 and 2000?” is split into “Who was the US president in 1995?” and “Who was the US president in 2000?”. If both sub‑answers refer to the same individual, that individual is returned as the final answer; otherwise, additional sub‑queries may be generated to resolve ambiguities. This explicit verification step is meant to filter out false positives, that is, facts that the QA system retrieves correctly but that do not satisfy the temporal relationship stated in the question.
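The "between" example can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' implementation: `recompose_between`, `toy_qa`, and the `TOY_ANSWERS` table are hypothetical names, and the lookup table merely stands in for a real base QA system.

```python
from typing import Callable, Optional

# Toy stand-in for the base QA system (hypothetical data for illustration only).
TOY_ANSWERS = {
    "Who was the US president in 1995?": "Bill Clinton",
    "Who was the US president in 2000?": "Bill Clinton",
    "Who was the US president in 2002?": "George W. Bush",
}


def toy_qa(question: str) -> Optional[str]:
    """Return the stored answer for a simple question, or None if unknown."""
    return TOY_ANSWERS.get(question)


def recompose_between(
    ask: Callable[[str], Optional[str]],
    template: str,
    start_year: int,
    end_year: int,
) -> Optional[str]:
    """Ask the simple question for both interval endpoints and keep the
    answer only when the two sub-answers agree."""
    first = ask(template.format(year=start_year))
    last = ask(template.format(year=end_year))
    if first is not None and first == last:
        return first
    # Disagreement or a missing answer: the constraint is not satisfied here.
    return None


template = "Who was the US president in {year}?"
print(recompose_between(toy_qa, template, 1995, 2000))  # Bill Clinton
print(recompose_between(toy_qa, template, 1995, 2002))  # None
```

Checking only the two endpoints does not prove that the same person held the office throughout the interval, which is one reason further sub-queries for intermediate dates may be needed.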

The authors evaluate the approach on English and Spanish. In English, a baseline general‑purpose QA system achieved an F‑measure of 38.01%, whereas the same system augmented with the temporal layer reached 65.47%, a relative improvement of about 72% (27.46 percentage points). When compared with a well‑known QA system (OpenEphyra), the multilayered architecture still outperformed it on temporal questions. In Spanish, the QA system with the temporal layer reached an F‑measure of 40.36% versus 22.94% for the baseline, a relative improvement of about 76%, which suggests that the minimal‑resource rule set can be transferred across languages with limited effort.
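The relative gains implied by these F‑measures can be checked with a short calculation. The four scores are the ones reported above; the helper name `relative_improvement` is only illustrative.

```python
def relative_improvement(baseline: float, improved: float) -> float:
    """Relative gain of `improved` over `baseline`, in percent."""
    return (improved - baseline) / baseline * 100.0


# F-measures reported for the baseline and for the QA system plus temporal layer.
english = relative_improvement(38.01, 65.47)
spanish = relative_improvement(22.94, 40.36)

print(f"English: +{65.47 - 38.01:.2f} points, {english:.1f}% relative")
print(f"Spanish: +{40.36 - 22.94:.2f} points, {spanish:.1f}% relative")
# English: +27.46 points, 72.2% relative
# Spanish: +17.42 points, 75.9% relative
```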

Key contributions of the work are: (1) a systematic method for decomposing complex temporal questions into atomic sub‑questions using a lightweight, language‑independent rule base; (2) an answer recomposition mechanism that enforces temporal constraints after the base QA engine has produced candidate answers; (3) empirical evidence that the multilayered architecture can be plugged into existing QA pipelines and substantially improve performance on temporal queries in multiple languages.

The paper also outlines future research directions. Extending the rule set to handle more sophisticated temporal logic—such as overlapping intervals, recurring events, and causal relations—would broaden the applicability of the system. Integrating machine‑learning models for temporal expression recognition could reduce the manual effort required to craft rules while preserving the benefits of the decomposition‑recomposition paradigm. Finally, scaling the evaluation to larger, truly multilingual datasets would provide a more comprehensive picture of the approach’s robustness.

Overall, the study demonstrates that a modest, well‑engineered temporal processing layer can transform a generic QA system into a capable answerer of complex, time‑sensitive questions, paving the way for more nuanced, context‑aware information retrieval services.