Dynamics of Human-AI Collective Knowledge on the Web: A Scalable Model and Insights for Sustainable Growth

Notice: This research summary and analysis were automatically generated using AI technology. For full accuracy, please refer to the original arXiv source.

Humans and large language models (LLMs) now co-produce and co-consume the web’s shared knowledge archives. Such human-AI collective knowledge ecosystems contain feedback loops with both benefits (e.g., faster growth, easier learning) and systemic risks (e.g., quality dilution, skill reduction, model collapse). To understand such phenomena, we propose a minimal, interpretable dynamical model of the co-evolution of archive size, archive quality, model (LLM) skill, aggregate human skill, and query volume. The model captures two content inflows (human, LLM) controlled by a gate on LLM-content admissions, two learning pathways for humans (archive study vs. LLM assistance), and two LLM-training modalities (corpus-driven scaling vs. learning from human feedback). Through numerical experiments, we identify different growth regimes (e.g., healthy growth, inverted flow, inverted learning, oscillations), and show how platform and policy levers (gate strictness, LLM training, human learning pathways) shift the system across regime boundaries. Two domain configurations (PubMed, GitHub and Copilot) illustrate contrasting steady states under different growth rates and moderation norms. We also fit the model to Wikipedia’s knowledge flow during pre-ChatGPT and post-ChatGPT eras separately. We find a rise in LLM additions with a concurrent decline in human inflow, consistent with a regime identified by the model. Our model and analysis yield actionable insights for sustainable growth of human-AI collective knowledge on the Web.


💡 Research Summary

The paper addresses the emerging reality that humans and large language models (LLMs) now co‑produce and co‑consume the same web‑based knowledge archives (e.g., Wikipedia, PubMed, GitHub). While this joint activity can accelerate the growth of shared knowledge, it also creates systemic risks such as quality dilution, human competence erosion, and model collapse. Existing work typically studies LLMs or human learning in isolation; a unified, dynamic view of the coupled human‑AI ecosystem is missing.

To fill this gap, the authors propose a minimal yet interpretable continuous‑time dynamical system with five state variables: archive size K(t) (number of distinct knowledge items), archive quality q(t) (average fidelity of items, bounded between 0 and 1), LLM skill θ(t) (a latent capacity measure that maps to base answer accuracy via a logistic “skill curve”), aggregate human skill H(t), and LLM query volume Q(t) (the smoothed demand for model answers). The system is defined by equations (1a)–(1e), each capturing a specific feedback loop:

  1. Content inflow – Human contributions arrive at rate α_H · H(t); LLM‑generated content arrives at rate α_A · Q(t) · g(a(t)), where a(t)=σ(θ(t))·q(t) is the current LLM answer accuracy and g(a) is a logistic gate function parameterized by a₀ (strictness) and κ_gate (sharpness). The gate models platform moderation (editors, peer review, community standards).

  2. Quality dynamics – The average quality q(t) drifts toward the intrinsic human‑only benchmark q_H when humans add material, and toward the current LLM accuracy a(t) when LLM content is admitted. The drift magnitude scales with the respective contribution rates and is inversely proportional to the archive size K(t), reflecting the diminishing marginal impact of new items in a large corpus. A decay term δ_q · q(t) captures gradual obsolescence.

  3. LLM skill evolution – θ(t) evolves via two additive terms: (i) supervised pre‑training on the archive, pulling θ toward a scaling‑law target θ*(K, q) = θ_max · ln(1+K)/ln(1+K_max) · q; (ii) reinforcement learning from human feedback (RLHF), modeled as an additive term η_RLHF · RLHF(θ, Q) driven by human feedback on queries.
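Taken together, the feedback loops above can be sketched as a small Euler-integrated simulation. This is a minimal illustration, not the paper's calibration: all parameter values are invented, the RLHF term (truncated in this summary) is replaced by a saturating placeholder in Q, and the human-skill and query equations, which are not detailed here, are held constant.

```python
import math

# Illustrative parameters (assumed, not taken from the paper)
params = dict(
    alpha_H=0.5, alpha_A=0.3,    # human / LLM content inflow rates
    a0=0.6, kappa_gate=10.0,     # gate strictness and sharpness
    q_H=0.9, delta_q=0.01,       # human quality benchmark, obsolescence rate
    theta_max=5.0, K_max=1e6,    # scaling-law skill ceiling and corpus scale
    eta_pre=0.1, eta_RLHF=0.05,  # pre-training and RLHF learning rates
)

def sigma(x):
    """Logistic skill curve mapping latent skill theta to base accuracy."""
    return 1.0 / (1.0 + math.exp(-x))

def gate(a, a0, kappa):
    """Logistic admission gate g(a): near 0 below strictness a0, near 1 above."""
    return 1.0 / (1.0 + math.exp(-kappa * (a - a0)))

def derivatives(state, p):
    K, q, theta, H, Q = state
    a = sigma(theta) * q                       # current LLM answer accuracy
    inflow_H = p["alpha_H"] * H                # human contributions
    inflow_A = p["alpha_A"] * Q * gate(a, p["a0"], p["kappa_gate"])
    dK = inflow_H + inflow_A
    # Quality drifts toward q_H (human inflow) and a (admitted LLM inflow),
    # damped by archive size K, with gradual obsolescence.
    dq = (inflow_H * (p["q_H"] - q) + inflow_A * (a - q)) / K - p["delta_q"] * q
    # Scaling-law target for supervised pre-training on the archive.
    theta_star = p["theta_max"] * math.log1p(K) / math.log1p(p["K_max"]) * q
    # RLHF term: a saturating placeholder in query volume (assumed form).
    dtheta = p["eta_pre"] * (theta_star - theta) + p["eta_RLHF"] * Q / (1.0 + Q)
    dH, dQ = 0.0, 0.0  # human skill and query dynamics frozen in this sketch
    return (dK, dq, dtheta, dH, dQ)

def euler_step(state, p, dt=0.1):
    d = derivatives(state, p)
    return tuple(s + dt * ds for s, ds in zip(state, d))

state = (1000.0, 0.8, 1.0, 100.0, 50.0)  # K, q, theta, H, Q
for _ in range(100):
    state = euler_step(state, params)
```

Raising the gate strictness a₀ in this sketch throttles LLM inflow until accuracy a(t) clears the threshold, which is how the paper's moderation lever shifts the system between growth regimes.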

