Generative Ontology: When Structured Knowledge Learns to Create

Notice: This research summary and analysis were automatically generated using AI technology. For full accuracy, please refer to the [Original Paper Viewer] below or the original arXiv source.

Traditional ontologies describe domain structure but cannot generate novel artifacts. Large language models generate fluently but produce outputs lacking structural validity, hallucinating mechanisms without components, goals without end conditions. We introduce Generative Ontology, a framework synthesizing these complementary strengths: ontology provides the grammar; the LLM provides the creativity. Generative Ontology encodes domain knowledge as executable Pydantic schemas constraining LLM generation via DSPy signatures. A multi-agent pipeline assigns specialized roles: a Mechanics Architect designs game systems, a Theme Weaver integrates narrative, a Balance Critic identifies exploits, each carrying a professional “anxiety” that prevents shallow outputs. Retrieval-augmented generation grounds designs in precedents from existing exemplars. We demonstrate the framework through GameGrammar, generating complete tabletop game designs, and present three empirical studies. An ablation study (120 designs, 4 conditions) shows multi-agent specialization produces the largest quality gains (fun d=1.12, depth d=1.59; p<.001), while schema validation eliminates structural errors (d=4.78). A benchmark against 20 published board games reveals structural parity but a bounded creative gap (fun d=1.86): generated designs score 7-8 while published games score 8-9. A test-retest study (50 evaluations) validates the LLM-based evaluator, with 7/9 metrics achieving Good-to-Excellent reliability (ICC 0.836-0.989). The pattern generalizes beyond games. Any domain with expert vocabulary, validity constraints, and accumulated exemplars is a candidate for Generative Ontology.


💡 Research Summary

The paper introduces “Generative Ontology,” a framework that unites the rigor of traditional ontologies with the creative fluency of large language models (LLMs). Traditional ontologies excel at describing domain concepts, relationships, and constraints, but they are passive and cannot produce novel artifacts. Conversely, LLMs generate fluent, imaginative text but often suffer from “structural hallucination,” producing designs that lack coherent components, valid mechanisms, or proper end conditions. The authors propose to treat an ontology as a grammar and an LLM as a poet, thereby enabling structured yet creative generation.

The technical core consists of two steps. First, domain knowledge is encoded as executable Pydantic schemas. In the case study of tabletop games, a “GameOntology” schema defines required fields such as title, theme, goal, end condition, a list of mechanisms (restricted by an enum of known mechanism types), components, player dynamics, and optional metadata. The schema enforces type checking, minimum/maximum string lengths, and hierarchical nesting, guaranteeing that any generated artifact conforms to the ontology’s conceptual structure.
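A minimal sketch of such an executable schema, assuming Pydantic v2 (the field names, enum values, and length bounds below are illustrative, not the paper's exact definitions):

```python
from enum import Enum
from typing import List, Optional
from pydantic import BaseModel, Field, ValidationError

class MechanismType(str, Enum):
    # The enum restricts generated mechanisms to known types (illustrative subset).
    WORKER_PLACEMENT = "worker_placement"
    DECK_BUILDING = "deck_building"
    AREA_CONTROL = "area_control"
    SET_COLLECTION = "set_collection"

class Component(BaseModel):
    name: str = Field(min_length=2, max_length=60)
    quantity: int = Field(ge=1)

class GameOntology(BaseModel):
    """Executable ontology: any instance is structurally valid by construction."""
    title: str = Field(min_length=3, max_length=80)
    theme: str = Field(min_length=10)
    goal: str = Field(min_length=10)
    end_condition: str = Field(min_length=10)
    mechanisms: List[MechanismType] = Field(min_length=1)
    components: List[Component] = Field(min_length=1)
    player_dynamics: str = Field(min_length=10)
    designer_notes: Optional[str] = None  # optional metadata

# A draft with an empty mechanism list is rejected at parse time,
# which is exactly the "structural hallucination" the schema rules out.
try:
    GameOntology(
        title="Oasis Traders",
        theme="Rival merchant families trade goods in a desert oasis market.",
        goal="Accumulate the most prestige by fulfilling trade contracts.",
        end_condition="The game ends when the contract deck is exhausted.",
        mechanisms=[],
        components=[{"name": "Goods cube", "quantity": 60}],
        player_dynamics="Players compete for contracts but may broker side deals.",
    )
except ValidationError as exc:
    print("rejected:", exc.error_count(), "structural error(s)")
```

Because the enum and length bounds live in the type definitions themselves, there is no separate validation pass to keep in sync with the ontology.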

Second, the authors operationalize generation with DSPy (Declarative Self‑improving Python). DSPy treats LLM calls as typed functions (signatures) that declare inputs (e.g., desired theme, player count) and an output that must instantiate the Pydantic model. The signature’s docstring becomes the system prompt, automatically embedding field descriptions and the full schema into the prompt. At runtime DSPy (1) builds the prompt, (2) queries the LLM, (3) parses the response into a GameOntology instance, (4) validates it against the schema, and (5) either returns the validated object or triggers a retry. A Chain‑of‑Thought module can be added to make the model reason step‑by‑step before committing to structured output, improving coherence beyond raw text generation.
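DSPy performs this loop internally; the generate–parse–validate–retry pattern it implements can be sketched in plain Python with a stubbed LLM call (all names below are hypothetical stand-ins, not DSPy's API):

```python
import json

def call_llm(prompt: str, attempt: int) -> str:
    # Hypothetical stand-in for an LLM call that returns JSON for the schema.
    if attempt == 0:
        return '{"title": "Oasis Traders"}'  # first draft: missing required fields
    return json.dumps({
        "title": "Oasis Traders",
        "goal": "Earn the most prestige from trade contracts.",
        "end_condition": "Play ends when the contract deck runs out.",
        "mechanisms": ["set_collection"],
    })

REQUIRED = {"title", "goal", "end_condition", "mechanisms"}

def generate_validated(prompt: str, max_retries: int = 3) -> dict:
    """Query, parse, validate; retry on structural failure (as DSPy does)."""
    for attempt in range(max_retries):
        raw = call_llm(prompt, attempt)
        try:
            draft = json.loads(raw)
        except json.JSONDecodeError:
            continue  # unparseable response: retry
        if not (REQUIRED - draft.keys()):
            return draft  # structurally valid: return the artifact
        # In a real pipeline the validation errors would be folded back
        # into the next prompt rather than silently retried.
    raise ValueError("no structurally valid design within retry budget")

design = generate_validated("Design a desert-trading game for 2-4 players.")
print(design["title"])  # → Oasis Traders
```

In the actual framework the validation step is the full Pydantic parse rather than this key-presence check, but the control flow is the same.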

Recognizing that a single LLM call would have to juggle mechanisms, theme, components, balance, and player experience, the framework introduces a multi‑agent pipeline. The ontology is decomposed into sub‑domains, each assigned a specialized agent:

  • Mechanics Architect – generates the set of game mechanisms, limited to the enum defined in the ontology.
  • Theme Weaver – crafts narrative and setting, ensuring alignment with the chosen mechanisms.
  • Balance Critic – analyzes interactions among mechanisms and components, flagging exploits or over‑powered strategies.
  • Validation Manager – performs additional logical checks beyond Pydantic validation (e.g., consistency between end condition and victory points).

Each agent is given a “professional anxiety” prompt that forces it to avoid shallow, generic answers and to produce depth‑oriented output. This mirrors human design teams where responsibilities are divided, but the process is fully automated.
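The decomposition can be sketched as prompt assembly over role/anxiety pairs; the roles match the paper, but the anxiety wordings and function names here are illustrative paraphrases:

```python
# Each agent owns one sub-domain of the ontology and carries a "professional
# anxiety" folded into its system prompt (wordings below are illustrative).
AGENTS = [
    ("Mechanics Architect", "A shallow mechanism list embarrasses you; justify every inclusion."),
    ("Theme Weaver", "A theme pasted onto mechanics is a failure; weave the two together."),
    ("Balance Critic", "Any exploit you miss will be found on the first playthrough."),
    ("Validation Manager", "An end condition that can never trigger invalidates the design."),
]

def build_prompt(role: str, anxiety: str, design_so_far: dict) -> str:
    return (f"You are the {role}. {anxiety}\n"
            f"Current design state: {design_so_far}\n"
            f"Contribute your sub-domain and return the updated design.")

def pipeline_prompts(seed: dict) -> list[str]:
    """One specialized prompt per agent, applied in sequence to the design."""
    return [build_prompt(role, anxiety, seed) for role, anxiety in AGENTS]

prompts = pipeline_prompts({"theme": "desert trading"})
print(len(prompts))  # one prompt per specialist
```

In the full system each prompt would be issued as a schema-constrained DSPy call whose output feeds the next agent; this sketch only shows how the role division is expressed.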

The authors evaluate the approach through three empirical studies.

  1. Ablation Study (120 designs, 4 conditions) – Designs were generated under: (a) baseline LLM, (b) LLM + schema validation, (c) LLM + multi‑agent specialization, and (d) full Generative Ontology. Results show the multi‑agent condition yields the largest gains in fun (Cohen’s d = 1.12) and strategic depth (d = 1.59), both p < .001. Schema validation alone eliminates structural errors dramatically (d = 4.78).

  2. Benchmark Comparison (20 published board games) – Generated designs were scored against real games on structural parity, fun, tension/drama, and social interaction. Structural metrics were equivalent, but generated games scored 7–8 on fun versus 8–9 for published games, indicating a bounded creative gap.

  3. Test‑Retest Reliability of LLM‑Based Evaluator (50 evaluations) – Nine quality metrics (e.g., originality, balance, thematic cohesion) were rated twice by the same LLM evaluator. Seven metrics achieved Good‑to‑Excellent reliability (intraclass correlation coefficients ranging from 0.836 to 0.989), confirming that the LLM can serve as a stable judge when calibrated with the ontology‑driven schema.
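For readers unfamiliar with the effect sizes quoted above, Cohen's d with a pooled standard deviation can be computed as follows; the scores below are invented for illustration and are not the paper's data:

```python
from statistics import mean, stdev

def cohens_d(a: list, b: list) -> float:
    """Cohen's d for two independent samples using the pooled standard deviation."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * stdev(a) ** 2 + (nb - 1) * stdev(b) ** 2) / (na + nb - 2)
    return (mean(b) - mean(a)) / pooled_var ** 0.5

# Illustrative fun scores: baseline LLM vs. multi-agent condition.
baseline = [5.1, 5.4, 4.8, 5.0, 5.3]
multi_agent = [6.9, 7.2, 6.8, 7.1, 7.0]
print(f"d = {cohens_d(baseline, multi_agent):.2f}")
```

Values of d around 0.8 are conventionally considered large, which puts the reported gains of 1.12–1.59 (and the d = 4.78 for structural errors) well past that threshold.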

The paper argues that the pattern generalizes beyond games. Any domain possessing (i) a rich expert vocabulary, (ii) well‑defined validity constraints, and (iii) a corpus of exemplars can be modeled as an executable ontology, then used to guide LLM generation. Potential applications include medical protocol design, curriculum development, software architecture, and policy drafting.

Philosophically, the authors invoke Whitehead’s distinction between eternal objects (abstract patterns) and actual occasions (concrete instantiations), positioning the ontology as the “eternal object” and the LLM as the process that actualizes it. They claim that, just as grammar makes poetry possible, ontology makes structured generation possible.

In conclusion, Generative Ontology demonstrates that coupling formal knowledge representations with modern LLMs yields a system capable of producing structurally valid, creatively rich artifacts. The combination of Pydantic schemas, DSPy signatures, and a multi‑agent anxiety‑driven pipeline provides a reproducible, extensible blueprint for AI‑augmented design across diverse fields.

