Towards Unsupervised Learning of Temporal Relations between Events

Automatic extraction of temporal relations between event pairs is an important task for several natural language processing applications such as Question Answering, Information Extraction, and Summarization. Since most existing methods are supervised and require large annotated corpora, which for many languages do not exist, we have concentrated our efforts on reducing the need for annotated data as much as possible. This paper presents two different algorithms towards this goal. The first algorithm is a weakly supervised machine learning approach for classification of temporal relations between events. In the first stage, the algorithm learns a general classifier from an annotated corpus. Then, inspired by the hypothesis of “one type of temporal relation per discourse”, it extracts useful information from a cluster of topically related documents. We show that by combining the global information of such a cluster with the local decisions of a general classifier, a bootstrapping cross-document classifier can be built to extract temporal relations between events. Our experiments show that, without any additional annotated data, the accuracy of the proposed algorithm is higher than that of several previous successful systems. The second proposed method for temporal relation extraction is based on the expectation-maximization (EM) algorithm. Within EM, we use different techniques, such as a greedy best-first search and integer linear programming, for temporal inconsistency removal. We consider the experimental results of our EM-based algorithm encouraging as a first step toward a fully unsupervised temporal relation extraction method.


💡 Research Summary

The paper tackles the challenging task of extracting temporal relations between events, a capability that underpins many downstream NLP applications such as question answering, information extraction, and summarization. While most prior work relies on fully supervised learning with large, manually annotated corpora, such resources are scarce for many languages and domains. To mitigate this dependency, the authors propose two distinct algorithms that drastically reduce the need for annotated data.

The first algorithm is a weakly supervised, cross‑document bootstrapping approach. It begins by training a general temporal‑relation classifier on a modestly sized, fully annotated corpus (e.g., TimeBank). This classifier uses a rich set of local features: tense, aspect, lexical cues, dependency‑tree paths, and distance between events. After the initial training, the system gathers a cluster of topically related documents, identified via topic modeling or embedding‑based similarity. Within each cluster, the classifier’s predictions are aggregated, and the most frequent relation type is treated as a “cluster‑level label.” This global label is then combined with the classifier’s local probability scores to re‑rank or adjust the final decision for each event pair. The key hypothesis driving this method is that a single discourse tends to exhibit a dominant temporal relation pattern, allowing the global cluster information to compensate for the scarcity of supervision. Experiments on English news and blog corpora show that, without any additional annotated examples, the bootstrapped system outperforms several strong supervised baselines, achieving a 3–5 % absolute gain in F1 score.
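The cluster-level re-ranking step described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the function names, the linear interpolation, and the `weight` parameter are all hypothetical, standing in for whatever combination scheme the authors use to merge global and local evidence.

```python
from collections import Counter

def cluster_label(predictions):
    # Majority relation type over all event pairs in a topical cluster,
    # following the "one type of temporal relation per discourse" hypothesis.
    return Counter(predictions).most_common(1)[0][0]

def rerank(local_probs, cluster_rel, weight=0.3):
    # Blend the local classifier's probability distribution for one event
    # pair with the cluster-level label. `weight` (hypothetical) controls
    # how strongly the global cluster label pulls the final decision.
    adjusted = {
        rel: (1 - weight) * p + (weight if rel == cluster_rel else 0.0)
        for rel, p in local_probs.items()
    }
    return max(adjusted, key=adjusted.get)

# Toy example: the local model slightly prefers AFTER for one pair,
# but the cluster as a whole is dominated by BEFORE.
cluster_predictions = ["BEFORE", "BEFORE", "AFTER", "BEFORE"]
dominant = cluster_label(cluster_predictions)            # "BEFORE"
decision = rerank({"BEFORE": 0.45, "AFTER": 0.55}, dominant)
print(dominant, decision)
```

With the toy numbers above, the global label flips the locally preferred `AFTER` back to `BEFORE`, which is exactly the compensation effect the hypothesis predicts.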

The second algorithm aims at fully unsupervised temporal‑relation extraction by employing the Expectation‑Maximization (EM) framework. Here, the temporal relation between any pair of events is treated as a latent variable. In the E‑step, the current model parameters are used to compute posterior probabilities for each possible relation given the observed linguistic features. The M‑step updates the parameters to maximize the expected complete‑data likelihood. Pure EM, however, can produce temporally inconsistent graphs (e.g., cycles or contradictory ordering). To address this, the authors integrate two complementary techniques. First, a greedy best‑first search fixes the most confident relations early and iteratively refines the remaining ones. Second, an Integer Linear Programming (ILP) formulation encodes global consistency constraints such as transitivity (if A → B and B → C then A → C) and acyclicity, and solves for the optimal, globally consistent set of relations. The ILP step effectively removes contradictions that would otherwise degrade performance. The EM‑ILP system, evaluated without any gold‑standard training data, attains over 70 % accuracy and demonstrates particularly strong performance on documents with complex event chains.
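The greedy best-first consistency repair can be illustrated as follows. This is a simplified sketch under stated assumptions: it covers only BEFORE/AFTER relations (the paper's relation inventory is richer), and it enforces transitivity and acyclicity by rejecting any candidate edge that would close a cycle among the relations already committed, rather than by solving the full ILP.

```python
from collections import defaultdict

def reaches(graph, src, dst):
    # DFS reachability over the committed BEFORE edges.
    stack, seen = [src], set()
    while stack:
        node = stack.pop()
        if node == dst:
            return True
        if node in seen:
            continue
        seen.add(node)
        stack.extend(graph[node])
    return False

def greedy_consistent(candidates):
    # candidates: list of (confidence, event_a, event_b, relation) tuples,
    # relation in {"BEFORE", "AFTER"}. Commit the most confident relations
    # first; skip any edge that would create a temporal cycle.
    graph = defaultdict(set)
    accepted = []
    for conf, a, b, rel in sorted(candidates, reverse=True):
        u, v = (a, b) if rel == "BEFORE" else (b, a)  # normalize: u BEFORE v
        if reaches(graph, v, u):   # adding u -> v would close a cycle
            continue
        graph[u].add(v)
        accepted.append((a, b, rel))
    return accepted

# Toy posteriors: the low-confidence claim C BEFORE A contradicts the
# transitive chain A < B < C and is discarded.
cands = [(0.9, "A", "B", "BEFORE"),
         (0.8, "B", "C", "BEFORE"),
         (0.4, "C", "A", "BEFORE")]
print(greedy_consistent(cands))
```

The ILP formulation in the paper solves the same problem globally instead of greedily: binary variables encode each (pair, relation) choice, and linear constraints enforce transitivity (A → B and B → C imply A → C), so the solver finds the highest-scoring assignment that is consistent everywhere at once.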

Both methods are evaluated using standard metrics (accuracy, precision, recall, F1) on publicly available datasets. The weakly supervised bootstrapping approach consistently surpasses fully supervised baselines, while the EM‑ILP approach establishes a solid baseline for truly unsupervised temporal relation extraction. Error analysis reveals that noisy cluster composition and the choice of initial parameters can affect outcomes, suggesting avenues for further refinement.

In terms of contributions, the paper (1) introduces a practical way to leverage large unannotated corpora together with a small seed classifier, (2) demonstrates how EM can be combined with logical constraints via ILP to enforce temporal coherence, and (3) provides extensive empirical evidence that both strategies can outperform traditional supervised systems. The authors also outline future work, including multilingual extensions, handling of complex events with attributes (duration, intensity), and more sophisticated topic‑clustering techniques. Overall, the study offers a compelling roadmap for reducing annotation bottlenecks in temporal relation extraction and advancing the field toward more language‑agnostic, data‑efficient solutions.