Negative-Aware Diffusion Process for Temporal Knowledge Graph Extrapolation
Temporal Knowledge Graph (TKG) reasoning seeks to predict future missing facts from historical evidence. While diffusion models (DMs) have recently gained attention for their ability to capture complex predictive distributions, two gaps remain: (i) the generative path is conditioned only on positive evidence, overlooking informative negative context, and (ii) training objectives are dominated by cross-entropy ranking, which improves candidate ordering but provides little supervision over the calibration of the denoised embedding. To bridge these gaps, we introduce the Negative-Aware Diffusion model for TKG Extrapolation (NADEx). Specifically, NADEx encodes subject-centric histories of entities, relations, and temporal intervals into sequential embeddings, perturbs the query object in the forward process, and reconstructs it in reverse with a Transformer denoiser conditioned on the temporal-relational context. We further introduce a cosine-alignment regularizer built from batch-wise negative prototypes, which tightens the decision boundary against implausible candidates. Comprehensive experiments on four public TKG benchmarks demonstrate that NADEx delivers state-of-the-art performance.
💡 Research Summary
The paper tackles the problem of temporal knowledge graph (TKG) extrapolation, i.e., forecasting future facts beyond the observed time horizon. While deterministic embedding methods dominated early work, recent diffusion‑based approaches introduce stochastic modeling but suffer from two critical shortcomings: (i) they condition the generative diffusion path solely on positive triples, ignoring the discriminative signal provided by negative examples, and (ii) they rely on generic cross‑entropy ranking losses that improve candidate ordering but do not enforce a calibrated separation between plausible and implausible entities.
To address these gaps, the authors propose NADEx (Negative‑Aware Diffusion model for TKG Extrapolation). The method first constructs subject‑centric histories for each query (s, r, ?, t) as three aligned sequences of past objects, relations, and time intervals. These sequences are embedded via learnable matrices (Eₒ, Eᵣ, E_Δt) and fed into a Transformer‑based denoiser.
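The history construction can be sketched in plain Python. The vocabulary sizes, the random initialization, and the choice to fuse the three streams by summation are illustrative assumptions, not details from the paper; `encode_history` and the table names are likewise hypothetical stand-ins for (Eₒ, Eᵣ, E_Δt):

```python
import random

# Toy sizes; the paper's actual vocabulary sizes and dimension are not given here.
NUM_ENT, NUM_REL, NUM_DT, DIM = 50, 10, 20, 8
random.seed(0)

def embed_table(n, d):
    """Stand-in for a learnable embedding matrix: n rows of d-dim vectors."""
    return [[random.uniform(-1, 1) for _ in range(d)] for _ in range(n)]

E_o = embed_table(NUM_ENT, DIM)    # past objects  (E_o)
E_r = embed_table(NUM_REL, DIM)    # relations     (E_r)
E_dt = embed_table(NUM_DT, DIM)    # time intervals (E_Δt)

def encode_history(objects, relations, intervals):
    """Turn the three aligned id sequences of a query (s, r, ?, t) into
    one embedding per past event (fused here by summation, an assumption)."""
    assert len(objects) == len(relations) == len(intervals)
    return [[E_o[o][k] + E_r[r][k] + E_dt[dt][k] for k in range(DIM)]
            for o, r, dt in zip(objects, relations, intervals)]

# Three past events for one subject: object ids, relation ids, interval ids.
hist = encode_history([3, 7, 3], [1, 4, 2], [0, 2, 5])
```

The resulting sequence (one vector per past event) is what the Transformer denoiser would consume as conditioning context.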
In the forward diffusion stage, Gaussian noise is added not only to the target object embedding oₜ but also to a batch‑wise negative prototype o₋ₜ. The negative prototype is obtained by averaging the embeddings of all other target entities in the same mini‑batch, thereby providing a compact representation of “what is not the answer.” Both noisy positive and negative embeddings evolve through M diffusion steps under a linear α schedule, with a global scaling factor δ controlling overall diffusion strength.
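A minimal sketch of this forward stage, assuming a linear schedule on the cumulative ᾱ with δ as a global scale; the paper's exact parameterization may differ, and all names here are illustrative:

```python
import math
import random

random.seed(0)
M, DIM, DELTA = 100, 8, 1.0  # diffusion steps, embedding dim, global scale δ (toy values)

def alpha_bar(m):
    """One reading of a 'linear α schedule': ᾱ decays linearly in m, scaled by δ."""
    return 1.0 - DELTA * m / M

def forward_diffuse(x0, m):
    """q(x_m | x_0): sqrt(ᾱ_m)·x_0 + sqrt(1 - ᾱ_m)·ε, with ε ~ N(0, I)."""
    ab = alpha_bar(m)
    return [math.sqrt(ab) * v + math.sqrt(1.0 - ab) * random.gauss(0.0, 1.0)
            for v in x0]

def negative_prototype(batch_targets, i):
    """Average the embeddings of all OTHER targets in the mini-batch:
    a compact representation of 'what is not the answer' for query i."""
    others = [t for j, t in enumerate(batch_targets) if j != i]
    return [sum(col) / len(others) for col in zip(*others)]

batch = [[random.uniform(-1, 1) for _ in range(DIM)] for _ in range(4)]
o_pos = batch[0]                          # target object embedding o_t
o_neg = negative_prototype(batch, 0)      # batch-wise negative prototype
x_pos_m = forward_diffuse(o_pos, m=40)    # both evolve under the same schedule
x_neg_m = forward_diffuse(o_neg, m=40)
```

Note that ᾱ reaches 0 at m = M under this toy schedule, so the final state is pure noise; both the positive and the negative branch share the same noising process, as described above.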
During reverse diffusion, the Transformer predicts the mean µ_θ and covariance Σ_θ of the denoised embedding conditioned on the current noisy state, the relation, the timestamp, and the diffusion step. The process iteratively refines the noisy vector back to a clean representation of either the true object or the negative prototype.
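The reverse loop can be sketched as follows. The `denoiser` here is a dummy stand-in for the Transformer (which in the actual model predicts µ_θ and Σ_θ from the noisy state, relation, timestamp, and step), so only the sampling structure x_{m−1} ~ N(µ_θ, Σ_θ) reflects the method:

```python
import math
import random

random.seed(1)
DIM, M = 8, 100

def denoiser(x_m, rel_emb, time_emb, m):
    """Dummy stand-in for the Transformer denoiser: returns (µ_θ, Σ_θ).
    A fixed linear blend replaces the learned network; the diagonal
    covariance shrinks as m → 0 so late steps are nearly deterministic."""
    mu = [0.9 * x + 0.05 * r + 0.05 * t
          for x, r, t in zip(x_m, rel_emb, time_emb)]
    sigma = [0.01 * (1.0 - m / M)] * DIM
    return mu, sigma

def reverse_diffusion(x_M, rel_emb, time_emb):
    """Iteratively refine the noisy vector: sample x_{m-1} ~ N(µ_θ, Σ_θ)
    at each step, from m = M down to a clean reconstruction."""
    x = x_M
    for m in range(M, 0, -1):
        mu, sigma = denoiser(x, rel_emb, time_emb, m)
        x = [mu_k + math.sqrt(s_k) * random.gauss(0.0, 1.0)
             for mu_k, s_k in zip(mu, sigma)]
    return x

x_M = [random.gauss(0.0, 1.0) for _ in range(DIM)]  # start from pure noise
o_hat = reverse_diffusion(x_M, rel=[0.1] * DIM if False else [0.1] * DIM, time_emb=[0.2] * DIM) if False else reverse_diffusion(x_M, [0.1] * DIM, [0.2] * DIM)
```

The same loop applies whether the chain is reconstructing the true object or the negative prototype; only the clean endpoint differs.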
Training combines two objectives: (1) a standard cross‑entropy reconstruction loss that encourages the denoised embedding to match the true target, and (2) a novel cosine‑alignment regularizer that minimizes the cosine similarity between the reconstructed embedding and the batch‑wise negative prototype. This regularizer explicitly widens the angular margin between positive and negative candidates, sharpening the decision boundary and yielding a more calibrated predictive distribution.
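The regularizer itself reduces to a cosine term added to the ranking loss. In this sketch the trade-off weight `lam` is a hypothetical hyperparameter, not a value from the paper:

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) *
                  math.sqrt(sum(b * b for b in v)))

def cosine_alignment_reg(o_hat, o_neg):
    """Penalize alignment between the denoised embedding and the batch-wise
    negative prototype; minimizing this pushes cos(·) toward -1, widening
    the angular margin against implausible candidates."""
    return cosine(o_hat, o_neg)

def total_loss(ce_loss, o_hat, o_neg, lam=0.1):
    # lam is an illustrative trade-off weight; the paper's value is not given here.
    return ce_loss + lam * cosine_alignment_reg(o_hat, o_neg)

# A reconstruction pointing away from the negative prototype is rewarded:
print(cosine_alignment_reg([1.0, 0.0], [0.0, 1.0]))   # 0.0  (orthogonal)
print(cosine_alignment_reg([1.0, 0.0], [-1.0, 0.0]))  # -1.0 (opposed)
```

Because the term is bounded in [−1, 1], it shapes the angular geometry of the embedding space without overwhelming the cross-entropy objective.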
Extensive experiments on four public TKG benchmarks—ICEWS‑14, ICEWS‑05‑15, GDELT, and WIKI‑KG—show that NADEx consistently outperforms prior state‑of‑the‑art methods, including DiffuTKG, DPCL‑Diff, and LLM‑DR, across MRR and Hits@1/3/10 metrics. Ablation studies confirm that both the negative‑aware diffusion and the cosine‑alignment regularizer contribute substantially to performance gains; removing either component leads to noticeable drops in accuracy and calibration.
The paper also discusses computational efficiency: the batch‑wise negative prototype is computed with negligible overhead, and the Transformer denoiser scales similarly to existing diffusion models. A limitation noted is that batch size is dictated by the number of events at each timestamp, which can be small for sparse time steps, potentially reducing the diversity of negative prototypes. Future work may explore dynamic batching, multi‑level negative prototypes (e.g., per relation or time segment), and integration with larger language‑model backbones for richer temporal reasoning.
Overall, NADEx advances TKG extrapolation by (i) incorporating negative context directly into the diffusion process to reduce predictive variance, (ii) enforcing an explicit angular margin via cosine alignment to improve discriminability, and (iii) leveraging a Transformer‑based denoiser to capture complex temporal‑relational dependencies. The result is a diffusion‑based TKG model that delivers both higher accuracy and better calibrated uncertainty estimates, setting a new benchmark for stochastic TKG forecasting.