Compound Deception in Elite Peer Review: A Failure Mode Taxonomy of 100 Fabricated Citations at NeurIPS 2025


Large language models (LLMs) are increasingly used in academic writing workflows, yet they frequently hallucinate by generating citations to sources that do not exist. This study analyzes 100 AI-generated hallucinated citations that appeared in papers accepted by the 2025 Conference on Neural Information Processing Systems (NeurIPS), one of the world’s most prestigious AI conferences. Despite review by 3-5 expert researchers per paper, these fabricated citations evaded detection, appearing in 53 published papers (approx. 1% of all accepted papers). We develop a five-category taxonomy that classifies hallucinations by their failure mode: Total Fabrication (66%), Partial Attribute Corruption (27%), Identifier Hijacking (4%), Placeholder Hallucination (2%), and Semantic Hallucination (1%). Our analysis reveals a critical finding: every hallucination (100%) exhibited compound failure modes. The distribution of secondary characteristics was dominated by Semantic Hallucination (63%) and Identifier Hijacking (29%), which often appeared alongside Total Fabrication to create a veneer of plausibility and false verifiability. These compound structures exploit multiple verification heuristics simultaneously, explaining why peer review fails to detect them. The distribution exhibits a bimodal pattern: 92% of contaminated papers contain 1-2 hallucinations (minimal AI use) while 8% contain 4-13 hallucinations (heavy reliance). These findings demonstrate that current peer review processes do not include effective citation verification and that the problem extends beyond NeurIPS to other major conferences, government reports, and professional consulting. We propose mandatory automated citation verification at submission as an implementable solution to prevent fabricated citations from becoming normalized in scientific literature.


💡 Research Summary

The paper “Compound Deception in Elite Peer Review: A Failure Mode Taxonomy of 100 Fabricated Citations at NeurIPS 2025” presents a systematic investigation of AI‑generated citation hallucinations that slipped through the peer‑review process of the 2025 Neural Information Processing Systems conference. The authors obtained a dataset from GPTZero, which had applied an automated Hallucination Check to 4,841 of the 5,290 accepted papers (≈ 91 % of the conference). The tool flagged citations that could not be verified via Google Scholar, PubMed, arXiv, CrossRef, or DOI lookup; human experts then confirmed 100 citations as genuine fabrications. These 100 citations were spread across 53 distinct papers, representing roughly 1 % of all accepted submissions despite each paper being examined by three to five domain experts.

The core contribution is a five‑category taxonomy of “failure modes” that describes how a fabricated citation deviates from legitimate scholarly practice:

  1. Total Fabrication (TF) – every element (authors, title, venue, identifiers) is invented. Dominates the sample at 66 %.
  2. Partial Attribute Corruption (PAC) – a mixture of real and invented components (e.g., real author names attached to a non‑existent title). 27 %.
  3. Identifier Hijacking (IH) – a valid DOI or arXiv ID is used, but the accompanying metadata (authors, title, venue) does not match the linked document, creating a false sense of verifiability. 4 %.
  4. Placeholder Hallucination (PH) – obvious generation artifacts such as “Firstname Lastname” or incomplete identifiers. 2 %.
  5. Semantic Hallucination (SH) – a title that sounds plausible and domain‑appropriate but corresponds to no real work. 1 %.
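The taxonomy above can be read as a decision procedure over verifiable properties of a citation. The sketch below is a hypothetical rule-based classifier illustrating that reading; the field names, check order, and heuristics are our own assumptions, not the authors' actual labeling procedure.

```python
# Illustrative sketch only: field names and rule ordering are assumptions,
# not the paper's actual annotation protocol.
from dataclasses import dataclass
from typing import Optional

@dataclass
class Citation:
    authors_exist: bool               # author names match a real publication record
    title_exists: bool                # title resolves to a real work
    identifier: Optional[str]         # DOI / arXiv ID as printed, if any
    identifier_resolves: bool         # the identifier points to *some* document
    identifier_matches_metadata: bool # ...and that document matches the metadata
    has_placeholder_text: bool        # e.g. "Firstname Lastname", truncated IDs

def classify(c: Citation) -> str:
    """Map a citation's verification results onto the five-category taxonomy."""
    if c.has_placeholder_text:
        return "Placeholder Hallucination (PH)"
    if c.identifier and c.identifier_resolves and not c.identifier_matches_metadata:
        return "Identifier Hijacking (IH)"
    if c.authors_exist and not c.title_exists:
        return "Partial Attribute Corruption (PAC)"
    if not c.authors_exist and not c.title_exists:
        return "Total Fabrication (TF)"
    # Remaining case: metadata looks plausible but no real work is found.
    return "Semantic Hallucination (SH)"
```

For example, a citation with invented authors and title and no identifier would classify as TF, while one whose DOI resolves to an unrelated paper would classify as IH.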

A striking finding is that every single hallucination exhibits at least one secondary failure mode, making them “compound” in nature. Secondary characteristics are heavily weighted toward Semantic Hallucination (63 % of all citations) and Identifier Hijacking (29 %). For example, a citation may be a total fabrication of authors and title while also embedding a real DOI; reviewers clicking the link see a legitimate paper and may assume the citation is correct, overlooking the mismatch. This dual‑layer deception attacks the two primary heuristics reviewers rely on: (a) surface‑level plausibility (recognizable author names, venue titles) and (b) the existence of a resolvable identifier.

Statistical analysis reveals a bimodal distribution of contamination severity. In 92 % of the affected papers, only one or two fabricated citations appear, suggesting modest AI assistance. In the remaining 8 %, papers contain four to thirteen fabricated citations, indicating heavy reliance on language‑model‑generated references. The authors argue that the probability of a paper containing fabricated citations rises non‑linearly with the extent of LLM use.

The paper also identifies a subtle “Contamination Inheritance” phenomenon: some hallucinations appear to be reproduced from already‑polluted training data. An example is a fabricated citation that previously existed in a pre‑print version of another paper; the LLM likely learned this erroneous pattern and regenerated it. This blurs the line between pure hallucination and the propagation of existing data errors.

In the discussion, the authors critique current peer‑review workflows for lacking systematic citation verification. Reviewers typically check references for relevance but do not routinely validate identifiers or cross‑check every citation against bibliographic databases. Consequently, sophisticated fabricated citations can evade detection, especially when they exploit multiple failure modes simultaneously.

To mitigate the problem, the authors propose a three‑pronged solution:

  1. Automated citation verification at submission – mandatory checks that resolve DOIs, arXiv IDs, and URLs, and flag mismatches between metadata and the linked document.
  2. Human verification of flagged citations – reviewers or editorial staff must examine any citation that fails the automated check before acceptance.
  3. Community‑wide infrastructure – development of an open‑source API or service that conferences and journals can integrate, providing real‑time cross‑database validation and a “trust score” for each reference.
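A minimal sketch of what step 1 could look like in practice, under our own assumptions: resolve each identifier, fuzzy-compare the claimed title against the resolved metadata, and emit flags for the human review in step 2. The resolver is injected as a callable so a real deployment could back it with CrossRef or arXiv lookups; the stub and all names here are illustrative, not the authors' implementation.

```python
# Hypothetical verification sketch; function names, threshold, and the stub
# resolver are assumptions for illustration.
import difflib
from typing import Callable, Optional

def titles_match(claimed: str, resolved: str, threshold: float = 0.9) -> bool:
    """Fuzzy title comparison, tolerating minor case/punctuation differences."""
    ratio = difflib.SequenceMatcher(
        None, claimed.lower().strip(), resolved.lower().strip()
    ).ratio()
    return ratio >= threshold

def check_citation(doi: str, claimed_title: str,
                   resolve: Callable[[str], Optional[str]]) -> str:
    """Return 'ok' or a flag string for human follow-up."""
    resolved_title = resolve(doi)
    if resolved_title is None:
        return "flag: identifier does not resolve"
    if not titles_match(claimed_title, resolved_title):
        return "flag: possible identifier hijacking (metadata mismatch)"
    return "ok"

# Stub standing in for a real DOI/CrossRef lookup (hypothetical DOI).
def stub_resolve(doi: str) -> Optional[str]:
    known = {"10.1000/real": "A Real Paper Title"}
    return known.get(doi)
```

Injecting the resolver keeps the mismatch logic testable offline, and a community service (step 3) could expose exactly this interface across multiple bibliographic databases.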

The paper concludes that without such safeguards, AI‑generated fabricated citations risk contaminating the scholarly citation graph, eroding reproducibility, and wasting researcher time. The authors call for immediate policy changes at major AI conferences, journals, and even governmental reporting bodies, emphasizing that the issue is not confined to NeurIPS but is a systemic threat to research integrity in the era of large language models.

