Self-organization of progress across the century of physics
We make use of information provided in the titles and abstracts of over half a million publications that were published by the American Physical Society during the past 119 years. By identifying all unique words and phrases and determining their monthly usage patterns, we obtain quantifiable insights into the trends of physics discovery from the end of the 19th century to today. We show that the magnitudes of upward and downward trends yield heavy-tailed distributions, and that their emergence is due to the Matthew effect. This indicates that both the rise and fall of scientific paradigms are driven by robust principles of self-organization. Data also confirm that periods of war decelerate scientific progress, and that the latter is very much subject to globalization.
💡 Research Summary
The authors conduct a large‑scale “culturomics” study of physics by mining the titles and abstracts of over half a million papers published in the American Physical Society (APS) journals from July 1893 to October 2012. After extracting all unique lexical items—single words and multi‑word phrases up to four words—they compute the monthly relative frequency f(t) of each token, normalizing by the total number of papers published in that month for each journal and for the whole APS corpus. Common English stop‑words, numbers, formulas, and tokens that begin or end with stop‑words are filtered out, leaving a set of roughly 118 k single words, 3.3 M two‑word phrases, 13.3 M three‑word phrases, and 23.8 M four‑word phrases (≈40 M time series).
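The extraction and normalization steps described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' pipeline: the stop-word list, the punctuation handling, the `(month, text)` input format, and the choice to count each token once per paper are all assumptions made for the example.

```python
from collections import Counter, defaultdict

# Illustrative stop-word list (assumption); the list used in the study is not given here.
STOP_WORDS = {"the", "of", "a", "an", "in", "and", "for", "on", "to", "with"}

def ngrams(text, max_n=4):
    """Return the unique 1- to max_n-word phrases in a text, skipping phrases
    that begin or end with a stop word or that contain digits."""
    words = [w.strip(".,;:()").lower() for w in text.split()]
    words = [w for w in words if w]
    found = set()
    for n in range(1, max_n + 1):
        for i in range(len(words) - n + 1):
            gram = words[i:i + n]
            if gram[0] in STOP_WORDS or gram[-1] in STOP_WORDS:
                continue
            if any(ch.isdigit() for w in gram for ch in w):
                continue
            found.add(" ".join(gram))
    return found

def monthly_frequency(papers):
    """papers: iterable of (month, text) pairs.
    Returns {token: {month: f}}, where f is the fraction of that month's
    papers containing the token (one count per paper, by assumption)."""
    papers_per_month = Counter()
    counts = defaultdict(Counter)
    for month, text in papers:
        papers_per_month[month] += 1
        for token in ngrams(text):
            counts[token][month] += 1
    return {
        token: {m: c / papers_per_month[m] for m, c in per_month.items()}
        for token, per_month in counts.items()
    }

papers = [
    ("1990-01", "Quantum Hall effect in graphene"),
    ("1990-01", "Theory of the quantum Hall effect"),
    ("1990-02", "Neutron stars"),
]
freq = monthly_frequency(papers)
```

Here `freq["quantum hall effect"]["1990-01"]` is 1.0 because both January papers contain the phrase, while "graphene" appears in only half of them. Phrases such as "of the" never enter the table because they begin with a stop word.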
For each time series the authors apply a moving average and then search for windows of width w = 2, 4, 8, and 16 years where the linear fit yields the most positive (upward) and most negative (downward) slope x. The absolute values |x| are ranked, and the cumulative distribution P(|x|) is plotted. Across most APS journals the distributions display heavy tails: many tokens show only modest changes, while a few “trendsetters” exhibit very large upward or downward shifts. Using maximum‑likelihood estimation and Kolmogorov‑Smirnov goodness‑of‑fit tests, they find that for several journals (e.g., PR, PRI, PRL) the tail of the cumulative distribution follows a power law P(|x|) ∝ |x|^{−α+1}. Other journals are better described by a power law with an exponential cutoff or by a stretched exponential, with parameters reported in Table I.
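The trend search and the tail fit can be illustrated as follows. This is a sketch under stated assumptions rather than the authors' code: the smoothing window `k`, the helper names, and the choice of `x_min` are hypothetical. The exponent estimator is the standard continuous maximum-likelihood formula α = 1 + n / Σ ln(x_i / x_min), which may differ in detail from the fitting procedure used in the study. For monthly data, a window of w years would correspond to `width = 12 * w` points.

```python
import math

def moving_average(series, k=3):
    """Centered moving average over k points (the window shrinks at the edges)."""
    half = k // 2
    out = []
    for i in range(len(series)):
        lo, hi = max(0, i - half), min(len(series), i + half + 1)
        out.append(sum(series[lo:hi]) / (hi - lo))
    return out

def slope(ys):
    """Ordinary least-squares slope of ys against 0, 1, ..., len(ys) - 1."""
    n = len(ys)
    x_mean = (n - 1) / 2
    y_mean = sum(ys) / n
    num = sum((i - x_mean) * (y - y_mean) for i, y in enumerate(ys))
    den = sum((i - x_mean) ** 2 for i in range(n))
    return num / den

def extreme_trends(series, width):
    """Return (largest slope, smallest slope) over all windows of `width`
    consecutive points; width must be at least 2."""
    slopes = [slope(series[i:i + width])
              for i in range(len(series) - width + 1)]
    return max(slopes), min(slopes)

def powerlaw_alpha(values, x_min):
    """Continuous maximum-likelihood estimate of the density exponent alpha
    for the tail x >= x_min; the cumulative distribution then decays
    with exponent alpha - 1."""
    tail = [x for x in values if x >= x_min]
    return 1 + len(tail) / sum(math.log(x / x_min) for x in tail)

series = [0, 1, 2, 3, 2, 1, 0]
up, down = extreme_trends(series, 3)  # steepest rise and steepest fall
```

In practice the smoothed series from `moving_average` would be passed to `extreme_trends`, and the absolute slopes collected over all tokens would be passed to `powerlaw_alpha`.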
To test for the Matthew effect (preferential attachment), the authors calculate a trend rate r(f) = Δf/Δt for each token and examine its dependence on the token’s average frequency f. The relationship is approximately linear: the more frequently a word or phrase is used, the larger its expected upward momentum, and likewise the larger its expected downward drop during periods of decline. This confirms that popularity begets further popularity (or faster decay), mirroring the preferential attachment observed in citation networks and collaboration graphs.
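One way to check the linear dependence described above is to fit |r| = c · f^β on a log-log scale and see whether β is close to 1. The following is a minimal sketch, assuming the (f, r) pairs have already been computed per token; the function name and the least-squares fit are illustrative choices, not necessarily the procedure used in the study.

```python
import math

def attachment_exponent(pairs):
    """Fit |r| = c * f**beta by least squares in log-log space and return beta.
    pairs: iterable of (average frequency f, trend rate r).
    beta close to 1 indicates linear preferential attachment;
    beta below 1 indicates sub-linear attachment."""
    pts = [(math.log(f), math.log(abs(r)))
           for f, r in pairs if f > 0 and r != 0]
    n = len(pts)
    mx = sum(x for x, _ in pts) / n
    my = sum(y for _, y in pts) / n
    sxx = sum((x - mx) ** 2 for x, _ in pts)
    sxy = sum((x - mx) * (y - my) for x, y in pts)
    return sxy / sxx

# Synthetic data: rates exactly proportional to frequency, plus a sub-linear case.
linear = [(f, 0.5 * f) for f in (0.001, 0.01, 0.1)]
sublinear = [(f, -(f ** 0.5)) for f in (0.001, 0.01, 0.1)]
beta_linear = attachment_exponent(linear)
beta_sublinear = attachment_exponent(sublinear)
```

Taking the absolute value of r lets the same function handle both upward and downward trends, which matches the observation that frequent tokens both rise and fall faster.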
Historical events leave clear imprints. During both World Wars the overall monthly output of APS papers drops by roughly an order of magnitude (from ~100 to <10 papers per month), demonstrating that large‑scale conflicts decelerate scientific dissemination. Geocoding author affiliations reveals a shift from early‑century US dominance to a more global landscape after the 1950s, shaped by the Soviet collapse, the fall of the Berlin Wall, and the rise of China, Russia, Japan, Canada, and many European and South‑American countries, all of which contribute substantially by the 2010s. Measured per capita, the most productive contributors are Switzerland, Israel, Denmark, Sweden, Slovenia, Finland, Germany, the Netherlands, France, and Austria.
Methodologically, the study is limited by incomplete abstract coverage (some APS journals lack abstracts), the brevity of many abstracts (which can inflate trends for generic words), and the imperfect separation of physics‑relevant tokens from linguistic noise. Moreover, not all heavy‑tailed distributions fit a pure power law; deviations are attributed to finite‑size effects, saturation, and sub‑linear attachment mechanisms. Nonetheless, the work demonstrates that massive textual corpora can be harnessed to quantify the rise and fall of scientific paradigms, showing that the same self‑organizing principles that govern citation accrual also drive the semantic evolution of physics over more than a century.