Revised comment on the paper titled 'The Origin of Quantum Mechanical Statistics: Insights from Research on Human Language'


📝 Original Info

  • Title: Revised comment on the paper titled ‘The Origin of Quantum Mechanical Statistics: Insights from Research on Human Language’
  • ArXiv ID: 2512.07881
  • Date: 2025-11-27
  • Authors: Mikołaj Sienicki∗, Krzysztof Sienicki†

📝 Abstract

This short note comments on Aerts et al. [1] , which proposes that ranked word frequencies in texts should be read through the lens of Bose-Einstein (BE) statistics and even used to illuminate the origin of quantum statistics in physics. The core message here is modest: the paper offers an interesting analogy and an eye-catching fit, but several key steps mix physical claims with definitions and curve-fitting choices. We highlight three such points: (i) a normalization issue that is presented as "bosonic enhancement," (ii) an identification of rank with energy that makes the BE fit only weakly diagnostic of an underlying mechanism, and (iii) a baseline comparison that is too weak to support an ontological conclusion. We also briefly flag a few additional concerns (interpretation drift, parameter semantics, and reproducibility).

💡 Deep Analysis

Figure 1

📄 Full Content

Revised comment on the paper titled “The Origin of Quantum Mechanical Statistics: Insights from Research on Human Language” (arXiv preprint arXiv:2407.14924, 2024)

Mikołaj Sienicki∗, Krzysztof Sienicki†

∗Polish-Japanese Academy of Information Technology, ul. Koszykowa 86, 02-008 Warsaw, Poland, European Union.

December 10, 2025

Keywords: Bose–Einstein statistics; Zipf’s law; rank–frequency; Zipf–Mandelbrot; statistical mechanics analogy; Hong–Ou–Mandel; model selection (AIC/BIC); count data likelihood; arXiv:2407.14924.

1 What the paper claims, in plain terms

Aerts et al. [1] propose a mapping from a text to an “ideal gas” picture: word-types are treated as if they were particles occupying “energy levels,” where the level index is simply the word’s rank in the frequency table. A Bose–Einstein-shaped occupancy curve is then fitted to the rank–frequency list, and the quality of the fit is taken to support a stronger interpretation: that texts behave like a gas of indistinguishable bosons, and that this analogy may even shed light on why Bose–Einstein statistics appears in physics.
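The mapping just described can be sketched in a few lines of Python. This is a minimal illustration, not the pipeline of [1]: `rank_frequency_table` is a hypothetical helper, and the naive lowercase whitespace tokenization is an assumption. It shows one structural point: sorting by count makes the occupation numbers non-increasing in the level index by construction, so the decay of “occupation” with “level” is built into the table before any curve is fitted.

```python
from collections import Counter


def rank_frequency_table(text: str) -> list[tuple[int, str, int]]:
    """Return (rank, word_type, count) triples, rank 1 = most frequent.

    Rank plays the role of the "level index" and count the role of the
    "occupation number". Lowercase whitespace tokenization is an assumption.
    """
    counts = Counter(text.lower().split())
    # most_common() sorts by count in descending order; tied word-types keep
    # first-seen order, so their relative ranks are a convention.
    ordered = counts.most_common()
    return [(rank, word, n) for rank, (word, n) in enumerate(ordered, start=1)]


table = rank_frequency_table("the cat saw the dog and the dog saw a cat")
for rank, word, n in table:
    print(rank, word, n)

# Occupations are non-increasing in rank purely because of the sorting step.
occupations = [n for _, _, n in table]
assert all(a >= b for a, b in zip(occupations, occupations[1:]))
```

For the toy sentence above, the occupations come out as 3, 2, 2, 2, 1, 1; the three word-types tied at count 2 receive ranks 2 to 4 in first-seen order, which is a tie-breaking convention rather than a measured property.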
†Chair of Theoretical Physics of Naturally Intelligent Systems (NIS), Lipowa 2/Topolowa 19, 05-807 Podkowa Leśna, Poland, European Union.

arXiv:2512.07881v1 [q-bio.NC] 27 Nov 2025

There is nothing wrong with exploratory analogies. The issue is that the paper repeatedly slides from “this curve fits” to “this is evidence for a specific physical mechanism.” The three points below explain why that slide is not justified by the present analysis.

2 Three core technical concerns

2.1 Normalization does not create a probability boost

A central step argues that when two single-particle states are set equal inside a symmetrized two-boson expression, the state vector acquires a factor √2, and hence its squared norm becomes 2, which is then read as a doubling of the probability that two bosons occupy the same microstate [1]. For instance, with the usual 1/√2 prefactor, the symmetrized state $\tfrac{1}{\sqrt{2}}\big(|\phi\rangle|\chi\rangle + |\chi\rangle|\phi\rangle\big)$ becomes $\sqrt{2}\,|\phi\rangle|\phi\rangle$ when $\chi = \phi$; its squared norm is 2, yet after normalization it is simply $|\phi\rangle|\phi\rangle$. But an overall scale factor of a ket is not a physical probability. Probabilities are computed from normalized states; rescaling a vector does not change physics. To be clear: (anti-)symmetrization can change joint detection statistics once an observable and a measurement scenario are specified, but the mistake is to read the norm of an unnormalized ket as a propensity. If the intended point is bosonic “bunching,” that phenomenon arises from interference in a specified measurement set-up (e.g. Hong–Ou–Mandel-type effects), not from treating the norm of an unnormalized ket as a probability [2].

2.2 Rank-as-energy makes the BE fit only weakly diagnostic

The “energy levels” used in the paper are defined by rank,

$$E_i = i, \tag{1}$$

and the “total energy” is then defined as

$$E = \sum_i i\, N(E_i), \tag{2}$$

with $N(E_i)$ the frequency (occupation) of the $i$-th ranked word-type [1]. These quantities are not measured constraints in the sense of statistical mechanics; they are constructed from the rank–frequency table by definition. For notational convenience, once (1) is adopted we write $N(i) \equiv N(E_i)$.
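To make the definitional character of (1) and (2) concrete, here is a minimal Python sketch; `rank_energy_summary` is a hypothetical helper, not code from [1], and reading the total token count as a particle number is an assumption of this sketch. Given only a list of raw counts, sorting fixes every level, and the “total energy” then follows by arithmetic:

```python
def rank_energy_summary(counts: list[int]) -> tuple[int, int]:
    """Return (token count, 'total energy' of eq. (2)) for raw word-type counts.

    Levels follow eq. (1): the i-th most frequent word-type sits at E_i = i.
    Both outputs are pure functions of the rank-frequency table. Treating the
    token count as a particle number is an assumption of this sketch.
    """
    occupations = sorted(counts, reverse=True)  # N(1) >= N(2) >= ...
    tokens = sum(occupations)  # sum_i N(i)
    energy = sum(i * n for i, n in enumerate(occupations, start=1))  # sum_i i N(i)
    return tokens, energy


# The same multiset of counts yields the same "energy", whatever order it arrives in.
print(rank_energy_summary([3, 1, 2]))  # (6, 10)
print(rank_energy_summary([1, 2, 3]))  # (6, 10)
```

Both totals are fixed once the counts are known; they are not independent measurements of the text.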
As a result, a BE-shaped fit cannot be taken as evidence for BE physics unless the mapping is operationally justified and shown to be robust. One can also see why a BE curve can mimic familiar linguistic scaling when energy is identified with rank. With (1), the BE functional form reads

$$N(i) = \frac{1}{A\, e^{i/B} - 1}. \tag{3}$$

For $i \ll B$, $e^{i/B} = 1 + i/B + O\big((i/B)^2\big)$, so

$$N(i) \approx \frac{1}{(A - 1) + (A/B)\, i}. \tag{4}$$

If a fit yields $A$ close to 1, then in the same small-$i$ regime (4) is approximately Zipf–Mandelbrot-like. Written in a form that avoids denominator ambiguity (multiplying numerator and denominator of (4) by $B$),

$$N(i) \approx \frac{B}{A\, i + B(A - 1)}. \tag{5}$$

Equivalently, $N(i) \approx (B/A)/(i + q)$ with offset $q = B(A - 1)/A$, which is a Zipf–Mandelbrot law with exponent 1. Only when the offset term is negligible, i.e. when

$$i \gg \frac{B(A - 1)}{A} \quad \text{and still} \quad i \ll B, \tag{6}$$

does (5) simplify further to an approximately Zipf-like scaling:

$$N(i) \approx \frac{B}{i} \quad \text{(in the window (6), with } A \approx 1\text{)}. \tag{7}$$

In other words, one gets a Zipf-like window only when $A$ is sufficiently close to 1 and there exists an intermediate range of ranks satisfying (6). This is not a refutation of the fit; it is a reminder that, under rank-as-energy, the BE form has enough flexibility to reproduce classical rank–frequency regularities over an intermediate range [3, 4]. For completeness, note that
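The expansion in (3)–(7) can be checked numerically. The following Python sketch compares the exact BE form (3) with approximations (5) and (7). The parameter values A = 1 + 10⁻⁵ and B = 10⁵ are hypothetical, chosen only so that A ≈ 1 and window (6) spans roughly 1 ≪ i ≪ 10⁵; they are not fitted to any corpus, and the helper names are illustrative.

```python
import math


def be_occupation(i: float, A: float, B: float) -> float:
    """Exact BE form with rank as energy, eq. (3): N(i) = 1 / (A e^{i/B} - 1)."""
    return 1.0 / (A * math.exp(i / B) - 1.0)


def zipf_mandelbrot_approx(i: float, A: float, B: float) -> float:
    """Small-i approximation, eq. (5): N(i) ~ B / (A i + B (A - 1))."""
    return B / (A * i + B * (A - 1.0))


def zipf_approx(i: float, B: float) -> float:
    """Zipf-like limit inside window (6), eq. (7): N(i) ~ B / i."""
    return B / i


def rel_err(approx: float, exact: float) -> float:
    """Relative deviation of an approximation from the exact value."""
    return abs(approx - exact) / exact


# Hypothetical parameters (not fitted to any corpus): A close to 1, B large.
A, B = 1.0 + 1e-5, 1e5
lower_edge = B * (A - 1.0) / A  # left edge of window (6), about 1 here

print(f"window (6): {lower_edge:.3f} << i << {B:.0f}")
for i in (1, 10, 100, 1_000, 10_000, 100_000):
    exact = be_occupation(i, A, B)
    print(
        f"i={i:>7}  N={exact:12.4f}  "
        f"err(5)={rel_err(zipf_mandelbrot_approx(i, A, B), exact):8.3%}  "
        f"err(7)={rel_err(zipf_approx(i, B), exact):8.3%}"
    )
```

With these values, the Zipf form (7) is within about 1% of the exact curve at i = 100 and i = 1000, but fails badly at i = 1 (where the offset term is comparable to A i) and at i = B (where the exponential is far from linear), while (5) tracks the exact curve closely for i ≪ B.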


Reference

This content is AI-processed based on open access ArXiv data.
