A Knowledge-Based Language Model: Deducing Grammatical Knowledge in a Multi-Agent Language Acquisition Simulation

February 23, 2026

Reading time: 6 minute

...

📝 Original Info

Title: A Knowledge-Based Language Model: Deducing Grammatical Knowledge in a Multi-Agent Language Acquisition Simulation
ArXiv ID: 2512.02195
Date: 2025-12-01
Authors: - David Ph. Shakouri (Leiden University Centre for Linguistics, Leiden University) - Crit Cremers (Leiden University Centre for Linguistics, Leiden University) - Niels O. Schiller (City University of Hong Kong)

📝 Abstract

This paper presents an initial study performed by the MODOMA system. The MODOMA is a computational multi-agent laboratory environment for unsupervised language acquisition experiments such that acquisition is based on the interaction between two language models, an adult and a child agent. Although this framework employs statistical as well as rule-based procedures, the result of language acquisition is a knowledge-based language model, which can be used to generate and parse new utterances of the target language. This system is fully parametrized and researchers can control all aspects of the experiments while the results of language acquisition, that is, the acquired grammatical knowledge, are explicitly represented and can be consulted. Thus, this system introduces novel possibilities for conducting computational language acquisition experiments. The experiments presented by this paper demonstrate that functional and content categories can be acquired and represented by the daughter agent based on training and test data containing different amounts of exemplars generated by the adult agent. Interestingly, similar patterns, which are well-established for human-generated data, are also found for these machine-generated data. As the procedures resulted in the successful acquisition of discrete grammatical categories by the child agent, these experiments substantiate the validity of the MODOMA approach to modelling language acquisition.

💡 Deep Analysis

Deep Dive into A Knowledge-Based Language Model: Deducing Grammatical Knowledge in a Multi-Agent Language Acquisition Simulation.

📄 Full Content

Computational Linguistics in the Netherlands Journal 14 (2025) 167-189 Submitted 12/2024; Published 07/2025 A Knowledge-Based Language Model: Deducing Grammatical Knowledge in a Multi-Agent Language Acquisition Simulation David Ph. Shakouri∗,∗∗ d.p.shakouri@hum.leidenuniv.nl Crit Cremers∗ c.l.j.m.cremers@hum.leidenuniv.nl Niels O. Schiller∗,∗∗,∗∗∗ Niels.Schiller@cityu.edu.hk ∗Leiden University Centre for Linguistics (LUCL), Leiden University, the Netherlands ∗∗Leiden Institute for Brain and Cognition (LIBC), Leiden University, the Netherlands ∗∗∗City University of Hong Kong (CityU), Hong Kong Abstract This paper presents an initial study performed by the MODOMA system. The MODOMA is a computational multi-agent laboratory environment for unsupervised language acquisition experi- ments such that acquisition is based on the interaction between two language models, an adult and a child agent. Although this framework employs statistical as well as rule-based procedures, the result of language acquisition is a knowledge-based language model, which can be used to generate and parse new utterances of the target language. This system is fully parametrized and researchers can control all aspects of the experiments while the results of language acquisition, that is, the acquired grammatical knowledge, are explicitly represented and can be consulted. Thus, this sys- tem introduces novel possibilities for conducting computational language acquisition experiments. The experiments presented by this paper demonstrate that functional and content categories can be acquired and represented by the daughter agent based on training and test data containing dif- ferent amounts of exemplars generated by the adult agent. Interestingly, similar patterns, which are well-established for human-generated data, are also found for these machine-generated data. As the procedures resulted in the successful acquisition of discrete grammatical categories by the child agent, these experiments substantiate the validity of the MODOMA approach to modelling language acquisition. 1. Introduction This paper presents a study demonstrating the acquisition of grammatical knowledge by the MOD- OMA. The term MODOMA is an acronym for moeder-dochter-machine (Dutch for ‘mother-daughter- machine’). This framework is aimed at providing a language acquisition laboratory, that is, a sim- ulation environment for language acquisition experiments. On the one hand, all aspects of the system are parametrized so that the settings are user-controlled while on the other hand, all results and executed language acquisition procedures are logged and can be retrieved. The MODOMA integrates several characteristics that enable unique possibilities for language acquisition experi- ments. The MODOMA implements a multi-agent design modelling parent-child interaction such that both the parent and the child are language models in a single system. Both agents employ explicit representations of their grammatical knowledge, making the acquired knowledge and lan- guage processing retrievable. This explicit representation distinguishes the MODOMA system from other language learning systems, such as large language models (LLMs), which do not rely on such explicit knowledge structures. This combination of properties provides new opportunities for con- ducting computational experiments simulating first language acquisition. In a typical MODOMA experiment the input data to the language acquisition algorithm are interactively generated by the mother agent while the daughter agent constantly updates grammatical representations and takes ©2025 David Ph. Shakouri, Crit Cremers, and Niels O. Schiller. arXiv:2512.02195v1 [cs.CL] 1 Dec 2025 part in the interaction with the mother agent based on the currently acquired grammar. This design represents a novel approach as most often computational models of language acquisition are based on inputting corpora to the language acquisition procedures (e.g. Alishahi and Stevenson 2008, Alishahi and Chrupa la 2012, Conner et al. 2009, Matusevych et al. 2013). In this paper, rather than limiting the term ”language model” to the typical usage in current NLP, where it usually refers to neural models trained on a next-word prediction objective (e.g., transformer-based models like GPT, cf. Brown et al. 2020, Radford et al. 2018, and BERT, cf. Devlin et al. 2019), we use it in a broader sense, encompassing rule-based approaches, statistical methods, and artificial neural networks as all of these approaches are employed to address common tasks in NLP. This distinction is important because the mother agent in our multi-agent system relies on explicit linguistic rules, while language acquiring agent incorporates statistical methods to infer a rule-based model for a linguistic phenomenon. The study presented by this paper investigates whether the MODOMA daughter agent is able to acquire and represent functional and content categories by integrating an application

…(Full text truncated)…

📄 Read Full PDF on ArXiv