Autoregressive Models for Knowledge Graph Generation
Knowledge Graph (KG) generation requires models to learn complex semantic dependencies between triples while maintaining domain validity constraints. Unlike link prediction, which scores triples independently, generative models must capture interdependencies across entire subgraphs to produce semantically coherent structures. We present ARK (Auto-Regressive Knowledge Graph Generation), a family of autoregressive models that generate KGs by treating graphs as sequences of (head, relation, tail) triples. ARK learns implicit semantic constraints directly from data, including type consistency, temporal validity, and relational patterns, without explicit rule supervision. On the IntelliGraphs benchmark, our models achieve 89.2% to 100.0% semantic validity across diverse datasets while generating novel graphs not seen during training. We also introduce SAIL, a variational extension of ARK that enables controlled generation through learned latent representations, supporting both unconditional sampling and conditional completion from partial graphs. Our analysis reveals that model capacity (hidden dimensionality >= 64) is more critical than architectural depth for KG generation, with recurrent architectures achieving comparable validity to transformer-based alternatives while offering substantial computational efficiency. These results demonstrate that autoregressive models provide an effective framework for KG generation, with practical applications in knowledge base completion and query answering.
💡 Research Summary
This paper introduces a novel approach to Knowledge Graph (KG) generation through autoregressive modeling, addressing the limitations of traditional link prediction methods. The core problem is defined as learning to generate complete, semantically valid subgraphs that satisfy domain-specific constraints (e.g., type consistency, temporal validity) collectively, rather than scoring individual triples independently.
The authors propose ARK (Auto-Regressive Knowledge Graph Generation), a family of models that treat a KG as a sequence of (head, relation, tail) triples. ARK employs a Gated Recurrent Unit (GRU)-based decoder to generate graphs token-by-token in an autoregressive manner. A key pre-processing step involves randomizing the order of triples within each graph during training. This forces the model to learn order-invariant semantic constraints inherent to the graph structure itself, rather than memorizing positional patterns.
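The serialization step described above can be sketched in a few lines. The vocabulary, function name, and example triples below are illustrative assumptions, not the paper's actual implementation; the key idea is that each graph is flattened into a token sequence and the triple order is reshuffled every time, so positional patterns carry no signal.

```python
import random

BOS, EOS = "<bos>", "<eos>"  # hypothetical boundary tokens

def serialize_graph(triples, rng):
    """Flatten a KG into a token sequence for an autoregressive decoder.
    Triple order is randomized so the model must learn order-invariant
    semantic constraints rather than memorizing positions."""
    triples = list(triples)
    rng.shuffle(triples)               # per-example order randomization
    tokens = [BOS]
    for h, r, t in triples:
        tokens.extend([h, r, t])       # each triple becomes 3 tokens
    tokens.append(EOS)
    return tokens

rng = random.Random(0)
graph = {("alice", "works_at", "acme"), ("acme", "located_in", "nyc")}
seq = serialize_graph(graph, rng)
```

At training time the decoder is then taught to predict each token given the previous ones; at generation time it samples tokens until `EOS`, and every consecutive (head, relation, tail) window is read back as a triple.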
Building upon ARK, the paper presents SAIL (Sequential Auto-Regressive Knowledge Graph Generation with Latents), a variational extension that incorporates a Variational Autoencoder (VAE) framework. SAIL uses a multi-layer perceptron (MLP) encoder to map an input graph sequence into parameters of a latent distribution. The sampled latent code z is then broadcast to all time steps of the GRU decoder, conditioning the entire generation process. This architecture enables three modes: unconditional sampling from the prior, conditional completion from a partial graph, and controlled generation via latent space manipulation (e.g., interpolation).
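The latent conditioning mechanism can be illustrated with a minimal NumPy sketch. The dimensions and random encoder outputs below are placeholder assumptions, not SAIL's actual configuration; the sketch shows only the reparameterization step and how a single latent code z is broadcast to every decoder time step.

```python
import numpy as np

rng = np.random.default_rng(0)
d_latent, d_hidden, seq_len = 16, 64, 9   # illustrative sizes

# Stand-ins for the MLP encoder's outputs: parameters of q(z | graph).
mu = rng.normal(size=d_latent)
log_var = rng.normal(size=d_latent)

# Reparameterization trick: z = mu + sigma * eps, with eps ~ N(0, I),
# keeping the sampling step differentiable.
eps = rng.normal(size=d_latent)
z = mu + np.exp(0.5 * log_var) * eps

# Broadcast z to all time steps: each decoder input is the token
# embedding concatenated with the same z, so the entire generation
# is conditioned on one latent code.
token_emb = rng.normal(size=(seq_len, d_hidden))  # placeholder embeddings
decoder_inputs = np.concatenate(
    [token_emb, np.tile(z, (seq_len, 1))], axis=1
)
```

Controlled generation then reduces to choosing z: sampling it from the prior gives unconditional graphs, encoding a partial graph gives conditional completion, and mixing two codes (e.g. `0.5 * z_a + 0.5 * z_b`) gives interpolation.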
The models are evaluated on the IntelliGraphs benchmark, which comprises three synthetic datasets with algorithmically verifiable semantics (testing path structures, type constraints, and temporal interval reasoning) and two real-world, Wikidata-derived datasets (movie and academic publication domains) with complex relational patterns and large vocabularies.
Experimentally, ARK and SAIL achieve semantic validity scores ranging from 89.2% to 100.0% across the five datasets. A notable finding from the ablation studies is that model capacity—specifically hidden dimensionality (>=64)—matters more for performance than architectural depth (number of layers). Furthermore, the paper shows that recurrent architectures (single-layer GRUs) can match the validity of deeper transformer-based alternatives while offering substantial computational efficiency, suggesting that KG generation may not require the extreme sequence-modeling capacity needed in other NLP domains.
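To see why widening can add capacity faster than stacking layers, it helps to count where a GRU's parameters live. The helper below uses the standard PyTorch GRU parameterization (three gates, separate input and hidden biases) as an assumption; the concrete sizes are illustrative, not the paper's configurations.

```python
def gru_params(d_in: int, d_hidden: int, n_layers: int) -> int:
    """Parameter count of a stacked GRU, assuming the PyTorch
    convention: per layer, weight_ih (3H x I), weight_hh (3H x H),
    and two bias vectors of size 3H each."""
    total = 0
    for layer in range(n_layers):
        i = d_in if layer == 0 else d_hidden  # upper layers consume H-dim inputs
        total += 3 * d_hidden * (i + d_hidden) + 6 * d_hidden
    return total

# A single wide layer vs. a deeper stack of narrow layers (illustrative).
wide_shallow = gru_params(d_in=64, d_hidden=128, n_layers=1)
narrow_deep = gru_params(d_in=64, d_hidden=32, n_layers=4)
```

Because the recurrent weight matrix grows quadratically in the hidden size while stacking only adds layers linearly, a single wide layer can hold more parameters than several narrow ones, which is consistent with the finding that hidden dimensionality dominates depth.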
The main contributions of the work are: 1) Introducing ARK, an autoregressive approach that learns implicit semantic constraints for KG generation without explicit rule supervision; 2) Proposing SAIL, a variational extension enabling controlled generation via latent representations; 3) Providing empirical evidence that model capacity outweighs depth, and that efficient RNNs can match transformer performance for this task; 4) Releasing code and models to establish baselines for future research. This work establishes autoregressive models as a powerful and practical framework for KG generation, with direct applications in knowledge base completion, query answering, and data augmentation.