The emergence of numerical representations in communicating artificial agents
Human languages provide efficient systems for expressing numerosities, but whether the sheer pressure to communicate is enough for numerical representations to arise in artificial agents, and whether the emergent codes resemble human numerals at all, remains an open question. We study two neural network-based agents that must communicate numerosities in a referential game using either discrete tokens or continuous sketches, thus exploring both symbolic and iconic representations. Without any pre-defined numeric concepts, the agents achieve high in-distribution communication accuracy in both channels and converge on high-precision symbol-meaning mappings. However, the emergent code is non-compositional: the agents fail to derive systematic messages for unseen numerosities, typically reusing the symbol of the highest trained numerosity (discrete) or collapsing extrapolated values onto a single sketch (continuous). We conclude that communication pressure alone suffices for precise transmission of learned numerosities, but additional pressures are needed to yield compositional codes and generalisation abilities.
💡 Research Summary
The paper investigates whether the communicative pressures that shape human language are sufficient for artificial agents to develop numerical representations that resemble natural numerals. Two neural‑network agents are placed in a referential game: a sender sees an image containing a certain number of black dots (numerosity) and must convey this quantity to a receiver, which must select the matching image among distractors. Two communication modalities are explored. In the discrete condition, the sender uses an LSTM to emit a sequence of tokens drawn from a fixed vocabulary; in the continuous condition, the sender draws a fixed number of straight lines on a blank canvas, producing a sketch that is interpreted by the receiver through the same pretrained ViT visual encoder used for the dot images.
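The discrete-channel setup described above can be sketched as a minimal sender–receiver pair in PyTorch. All names, layer sizes, and the greedy decoding below are illustrative assumptions, not the paper's code: the point is only the wiring (image features condition an LSTM that emits tokens; the receiver scores candidate images against the decoded message).

```python
import torch
import torch.nn as nn

# Hypothetical sketch of the discrete-channel referential game; sizes and
# wiring are illustrative assumptions, not the paper's implementation.
VOCAB, EMBED, HIDDEN, MSG_LEN, FEAT = 10, 32, 64, 3, 16

class Sender(nn.Module):
    """Conditions an LSTM on target-image features and emits a token sequence."""
    def __init__(self):
        super().__init__()
        self.proj = nn.Linear(FEAT, HIDDEN)       # image features -> initial hidden state
        self.embed = nn.Embedding(VOCAB, EMBED)
        self.cell = nn.LSTMCell(EMBED, HIDDEN)
        self.to_vocab = nn.Linear(HIDDEN, VOCAB)

    def forward(self, feats):
        h = self.proj(feats)
        c = torch.zeros_like(h)
        tok = torch.zeros(feats.size(0), dtype=torch.long)   # start token
        msg = []
        for _ in range(MSG_LEN):
            h, c = self.cell(self.embed(tok), (h, c))
            tok = self.to_vocab(h).argmax(-1)   # greedy; training would sample (e.g. REINFORCE)
            msg.append(tok)
        return torch.stack(msg, dim=1)          # (batch, MSG_LEN) token ids

class Receiver(nn.Module):
    """Encodes the received message and scores each candidate image against it."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, EMBED)
        self.lstm = nn.LSTM(EMBED, HIDDEN, batch_first=True)
        self.img_proj = nn.Linear(FEAT, HIDDEN)

    def forward(self, msg, candidate_feats):
        _, (h, _) = self.lstm(self.embed(msg))
        msg_repr = h[-1]                                    # (batch, HIDDEN)
        cands = self.img_proj(candidate_feats)              # (batch, n_cands, HIDDEN)
        return torch.einsum("bh,bnh->bn", msg_repr, cands)  # similarity scores

sender, receiver = Sender(), Receiver()
feats = torch.randn(4, FEAT)        # stand-in for ViT features of the target image
cands = torch.randn(4, 5, FEAT)     # target + 4 distractors
scores = receiver(sender(feats), cands)
print(scores.shape)  # torch.Size([4, 5])
```

The receiver picks the candidate with the highest score; in the continuous condition the message would instead be a rendered sketch passed through the same visual encoder as the dot images.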
During training, agents are exposed to numerosity values 1–5 (with 700 examples per class) under two target‑image conditions: “Same” (both agents see the identical image) and “Diff” (different instances of the same quantity). A multi‑class hinge loss trains the sender–receiver pair to score the correct target above the distractors. Message‑length regularisation is added to create an information bottleneck. Results show that both channels achieve high in‑distribution accuracy (≈85‑95 %) and low conditional entropy, indicating that each numerosity is mapped to a near‑bijective, high‑precision signal. When length penalties are increased, messages become shorter with negligible loss in accuracy, demonstrating that precision and efficiency emerge purely from the joint pressure of coordination and bottleneck constraints.
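A multi-class hinge objective of this kind, with an added length penalty, can be written in a few lines. The margin, the length-penalty weight, and the toy scores below are illustrative assumptions, not the paper's values:

```python
import numpy as np

# Toy multi-class hinge loss over receiver scores plus a message-length
# penalty; margin and weight values are illustrative assumptions.
def hinge_loss(scores, target_idx, margin=1.0):
    """scores: (n_candidates,) similarity scores; target_idx: index of the true image."""
    s_target = scores[target_idx]
    distractors = np.delete(scores, target_idx)
    # Penalise every distractor that comes within `margin` of the target score.
    return np.maximum(0.0, margin - s_target + distractors).sum()

def total_loss(scores, target_idx, msg_len, length_weight=0.1):
    # The length term acts as an information bottleneck: shorter messages are cheaper.
    return hinge_loss(scores, target_idx) + length_weight * msg_len

scores = np.array([2.0, 0.5, -1.0, 0.3])  # target at index 0 clears the margin
print(total_loss(scores, target_idx=0, msg_len=3))  # hinge term is 0, so 0.1 * 3 = 0.3
```

Because every distractor here trails the target by more than the margin, only the length penalty contributes; raising `length_weight` is what pushes messages to shrink, as the summary notes.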
Generalisation, however, fails dramatically. When tested on unseen numerosities (e.g., 6, 7, 8), agents in the discrete setting reuse the token sequence that was assigned to the largest trained numerosity (5). The continuous sketches exhibit a similar pattern: all larger, out‑of‑distribution quantities collapse onto a single sketch that closely resembles the one used for the highest trained numerosity. Interpolation within the trained range also shows no systematic compositional structure; messages are holistic labels rather than composable parts.
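The collapse onto the highest trained numerosity can be quantified with the conditional entropy H(numerosity | message) mentioned above. A minimal sketch with fabricated message–numerosity observations: a bijective mapping gives H = 0, while reusing one message for several unseen quantities pushes H above 0.

```python
from collections import Counter
from math import log2

# Toy illustration of mapping precision via conditional entropy
# H(numerosity | message); the observation pairs are fabricated examples.
def conditional_entropy(pairs):
    """pairs: list of (message, numerosity) observations."""
    msg_counts = Counter(m for m, _ in pairs)
    joint = Counter(pairs)
    n = len(pairs)
    h = 0.0
    for (m, num), c in joint.items():
        p_joint = c / n
        p_cond = c / msg_counts[m]   # P(numerosity | message)
        h -= p_joint * log2(p_cond)
    return h

# In-distribution: each numerosity 1-5 gets its own message -> H = 0 (bijective).
in_dist = [(f"m{k}", k) for k in range(1, 6) for _ in range(10)]
print(conditional_entropy(in_dist))  # 0.0

# Out-of-distribution: numerosities 6-8 all reuse the message for 5 -> H > 0.
ood = in_dist + [("m5", k) for k in (6, 7, 8) for _ in range(10)]
print(conditional_entropy(ood))  # 1.0
```

The same statistic drops to zero exactly when messages identify numerosities uniquely, which is why low conditional entropy in-distribution can coexist with the degenerate reuse seen under extrapolation.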
A further manipulation varies the frequency of training examples across classes (Uniform, Increase, Decrease). Despite these frequency biases, more frequent numbers do not acquire shorter or simpler codes. The authors attribute this to the absence of explicit production or perception costs beyond task accuracy; in human languages, such costs drive the Zipf‑like shortening of high‑frequency words.
The authors conclude that while a pure communication objective is enough to induce precise, arbitrary mappings for the numerosities that agents experience, it does not generate the compositionality, systematic generalisation, or frequency‑based economy characteristic of human numeral systems. They suggest that additional pressures—such as iterated learning, population‑level interaction, explicit message‑length penalties, or biases toward symbol reuse—may be required to push emergent protocols toward the kind of structured, recursive numeral systems observed across cultures. The work thus supports the view that human numerical language likely arose from a combination of transmission efficiency, economic constraints, and the need to generalise beyond the immediate training set, rather than from communication pressure alone.