Can a Large Language Model Learn the Regularities of the Prime Factorization Tree Sequence?

Reading time: 5 minutes

📝 Abstract

We study whether a Large Language Model can learn the deterministic sequence of trees generated by the iterated prime factorization of the natural numbers. Each integer is mapped into a rooted planar tree, and the resulting sequence NT defines an arithmetic text with measurable statistical structure. A transformer network (the GPT-2 architecture) is trained from scratch on the first 10^11 elements to subsequently test its predictive ability under next-word and masked-word prediction tasks. Our results show that the model partially learns the internal grammar of NT, capturing non-trivial regularities and correlations. This suggests that learnability may extend beyond empirical data to the very structure of arithmetic.


📄 Content

Testing Transformer Learnability on the Arithmetic Sequence of Rooted Trees

Alessandro Breccia*¹, Federica Gerace¹, Marco Lippi², Gabriele Sicuro¹, and Pierluigi Contucci¹
¹Department of Mathematics, University of Bologna, Italy
²Department of Information Engineering, University of Florence, Italy

December 2, 2025

Abstract

We study whether a Large Language Model can learn the deterministic sequence of trees generated by the iterated prime factorization of the natural numbers. Each integer is mapped into a rooted planar tree and the resulting sequence NT defines an arithmetic text with measurable statistical structure. A transformer network (the GPT-2 architecture) is trained from scratch on the first 10^11 elements to subsequently test its predictive ability under next-word and masked-word prediction tasks. Our results show that the model partially learns the internal grammar of NT, capturing non-trivial regularities and correlations. This suggests that learnability may extend beyond empirical data to the very structure of arithmetic.

1 Introduction

Prime factorization, the decomposition of a natural number into its constituent primes, lies at the crossroads of arithmetic, complexity theory, and computational practice. While every integer admits a unique factorization, the operational effort required to obtain it grows quickly with its magnitude. State-of-the-art algorithms achieve remarkable performance for moderately large inputs, yet their complexity escalates rapidly when confronted with truly large instances. Moreover, in this limit, the sequence of integers with known prime factorizations becomes effectively sparse, with regions where the factorizations of intermediate values are computationally inaccessible. It is therefore natural to ask whether modern machine learning methods, and more specifically Large Language Models (LLMs), can offer any advantages from this perspective.
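To make the cost discussed above concrete, here is a standard textbook trial-division routine (our own baseline illustration, not an algorithm from the paper): it performs O(√n) divisions in the worst case, which is exactly why truly large instances become computationally inaccessible.

```python
def factorize(n):
    """Return the prime factorization of n as a list of (prime, exponent)
    pairs, using trial division: O(sqrt(n)) divisions in the worst case."""
    factors = []
    p = 2
    while p * p <= n:
        if n % p == 0:
            e = 0
            while n % p == 0:
                n //= p
                e += 1
            factors.append((p, e))
        p += 1
    if n > 1:
        factors.append((n, 1))  # the remaining cofactor is prime
    return factors
```

For example, `factorize(360)` yields `[(2, 3), (3, 2), (5, 1)]`, i.e. 360 = 2³ · 3² · 5. State-of-the-art factoring algorithms are far more sophisticated, but their complexity still grows super-polynomially in the bit length of n.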
*alessandro.breccia2@unibo.it

arXiv:2512.01870v1 [cs.AI] 1 Dec 2025

In this work we start from the central observation that the sequence of prime factorizations, when iterated to the exponents, can be converted into an arithmetic text: each integer can be mapped to a rooted tree (see Eq. 1) encoding its multiplicative and exponential prime structure, or equivalently represented as a Dyck word, namely a balanced binary string. The resulting infinite sequence, denoted by NT, is a deterministic text, a symbolic unfolding where each structural "word" appears infinitely many times, much like recurring syntactic motifs in natural language that reemerge across different sentences. In fact, the statistical and some grammatical properties of NT have been studied in previous works, revealing a self-organized hierarchy of symbolic units, a sublinear growth of the dictionary, and long-range correlations that recall several features of natural languages (Contucci et al. 2025; Conti and Contucci 2025; Conti, Contucci, and Iudelevich 2024; Conti, Contucci, and Iudelevich 2025). These observations suggest that the arithmetic text possesses an internal grammar, an emergent syntax rooted in the structure of the integers themselves.

Machine-learning approaches to arithmetic sequences have been investigated in numerous settings. Neural networks applied to the primes (He 2018) showed limited ability to reproduce their distribution, and theoretical arguments based on Kolmogorov complexity (Kolpakov and Rocke 2023; Kolpakov and Rocke 2024) suggest that the prime indicator function is not compressible within standard statistical frameworks. Related work on modular classification of integers (Jian Wu et al. 2023) shows that high accuracy is obtained only when externally provided arithmetic features are incorporated into the data representation.
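The integer-to-Dyck-word map can be sketched as follows. This is a minimal illustration of one plausible encoding, where factorization is iterated on the exponents; the paper's exact convention in Eq. 1 may differ, and the function names are our own.

```python
def prime_exponents(n):
    """Exponents in the prime factorization of n, in increasing prime order."""
    exps, p = [], 2
    while p * p <= n:
        if n % p == 0:
            e = 0
            while n % p == 0:
                n //= p
                e += 1
            exps.append(e)
        p += 1
    if n > 1:
        exps.append(1)
    return exps

def dyck(n):
    """Encode n as a balanced-parenthesis (Dyck) word: 1 is a leaf "()",
    and for n > 1 each prime exponent becomes a recursively encoded
    child subtree.  (Assumed encoding, shown for illustration only.)"""
    if n == 1:
        return "()"
    return "(" + "".join(dyck(e) for e in prime_exponents(n)) + ")"
```

Under this sketch, `dyck(2)` is `"(())"` and `dyck(6)` is `"(()())"`; concatenating `dyck(n)` over n = 1, 2, 3, … produces a deterministic balanced-parenthesis text of the kind the paper tokenizes and feeds to the transformer.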
Attempts to predict prime gaps or infer prime ratios using sequential or dense architectures (Pylov, Maitak, and Protodyakonov 2023; Blake 2023) demonstrate local predictive ability over restricted ranges, but accuracy degrades as size increases and no structural inference emerges. Early neural factorization experiments (Jansen and Nakayama 2005) concluded that numerical encodings behave as noise beyond superficial correlations, and this has been confirmed with modern models (Nene and Uludag 2022), where degradation in performance with bit length strongly suggests that structural information is absent from raw inputs. More advanced diffusion-based refinement techniques (Freivalds, Ozoliņš, and Bārzdiņš 2023) have produced improved numerical results on limited ranges, but scaling issues remain.

Machine-learning methods have also been used to approximate the Möbius function and related arithmetic predicates (Qin and Ye 2024; Lee and Kim 2024), often achieving high accuracy, but only when the input includes explicit structured information such as modular reductions or sparse encodings, indicating that the models exploit externally supplied arithmetic descriptors rather than di
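For reference, the Möbius function that these works try to approximate has an exact elementary definition: μ(n) = 0 if n is divisible by a perfect square, and (−1)^k otherwise, where k is the number of distinct prime factors of n. A direct trial-division implementation (a standard routine, not code from any of the cited works):

```python
def mobius(n):
    """Möbius function: 0 if n has a squared prime factor,
    otherwise (-1)**k where k = number of distinct prime factors."""
    if n == 1:
        return 1
    k = 0
    p = 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0  # squared factor found
            k += 1
        p += 1
    if n > 1:
        k += 1  # the remaining cofactor is a distinct prime
    return (-1) ** k
```

For example, μ(6) = 1 (two distinct primes) and μ(4) = 0 (squared factor). The function is exactly computable, so the interest of the cited works lies in whether a learned model can approximate it from raw inputs without such arithmetic descriptors.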

This content is AI-processed based on ArXiv data.
