The Rise of Null Hypothesis Significance Testing (NHST): Institutional Massification and the Emergence of a Procedural Epistemology


It has long been a puzzle why, despite sustained reform efforts, many applied scientific fields remain dominated by Null Hypothesis Significance Testing (NHST), a framework that dichotomizes study results and privileges “statistically significant” findings. This paper examines that puzzle by situating the development and rise of NHST within its historical and institutional context. Taking Actor-Network Theory as a point of entry, the analysis identifies the conditions under which particular inferential technologies stabilize and endure. The analysis shows that, although NHST does not resolve the technical problem of statistical inference, it came to dominate as a social technology that addressed the most pressing institutional challenge of the postwar period: the mass expansion of scientific networks. Under conditions of rapid institutional growth, NHST’s technical slippages, its purging of research context and its replacement of epistemic judgment with mechanical procedures, became functional features rather than flaws. These features enabled procedural self-sufficiency across settings marked by heterogeneous goals and uneven expertise, thereby sealing NHST’s position as the obligatory passage point in many postwar scientific fields.


💡 Research Summary

The paper tackles the enduring dominance of Null Hypothesis Significance Testing (NHST) in applied research by situating its rise within the post-World War II expansion of scientific institutions. While the p-value threshold (p ≤ 0.05) has become a de facto currency for journal editors, reviewers, funding agencies, and media, the authors argue that this persistence cannot be explained solely by statistical misconceptions or inadequate education. Instead, they adopt an Actor-Network Theory (ANT) framework to trace how two historically distinct inferential traditions—Fisher’s significance testing and Neyman-Pearson’s hypothesis-testing paradigm—were fused into a single procedural technology that met the needs of a rapidly massifying research ecosystem.

The historical section reconstructs Fisher’s 1922 “On the Mathematical Foundations of Theoretical Statistics,” emphasizing his focus on specification, estimation, and the derivation of sampling distributions. Fisher’s approach relied on expert judgment to choose a population model and then used sample data to estimate parameters, with the sampling distribution serving as a bridge between the two. In parallel, Neyman and Pearson introduced a decision-theoretic framework that formalized Type I and Type II error rates, power, and the concept of an “acceptance region.” Although the two frameworks were logically incompatible (Fisher’s tests were inferentially exploratory, while Neyman-Pearson’s were prescriptive), the post-war surge in university enrollments, government research funding, and the creation of large, heterogeneous research networks created a demand for a method that could be applied uniformly across disciplines, institutions, and levels of expertise.
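The contrast between the two traditions can be made concrete with a one-sample z-test. This is a minimal illustrative sketch, not anything from the paper: the data are invented, and a known population standard deviation is assumed for simplicity.

```python
import math

def normal_cdf(x: float) -> float:
    """Standard normal CDF, computed from the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def z_test_p_value(sample_mean: float, mu0: float, sigma: float, n: int) -> float:
    """Fisher-style output: a two-sided p-value, read as graded
    evidence against the null hypothesis mean mu0."""
    z = (sample_mean - mu0) / (sigma / math.sqrt(n))
    return 2.0 * (1.0 - normal_cdf(abs(z)))

def np_decision(sample_mean: float, mu0: float, sigma: float, n: int) -> str:
    """Neyman-Pearson-style output: a fixed-alpha accept/reject
    verdict, with no graded evidence reported at all."""
    z_crit = 1.96  # two-sided critical value at alpha = 0.05
    z = (sample_mean - mu0) / (sigma / math.sqrt(n))
    return "reject H0" if abs(z) > z_crit else "accept H0"

# Invented data: observed mean 0.3, null mean 0, sigma 1, n = 50.
p = z_test_p_value(0.3, 0.0, 1.0, 50)
decision = np_decision(0.3, 0.0, 1.0, 50)
print(round(p, 3))   # Fisher: a continuous evidential summary
print(decision)      # Neyman-Pearson: a binary decision
```

The same arithmetic yields two different epistemic objects: a continuous measure of evidence in one tradition, a behavioral accept/reject rule in the other. NHST, as the paper describes it, fuses the two by computing the p-value and then immediately dichotomizing it.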

Using ANT’s concepts of “black-boxing” and “obligatory passage points,” the authors show that NHST became a black box that concealed its underlying assumptions (e.g., random sampling, independence, correct model specification) and offered a simple, mechanistic decision rule: if the p-value falls below 0.05, the result is “significant.” This rule functioned as an obligatory passage point for a wide array of actors—researchers, journal editors, peer reviewers, funding bodies, and policy makers—because it allowed them to bypass detailed contextual judgments and rely on a standardized procedural output. The paper describes this property as “procedural self-sufficiency”: NHST could travel across settings without repeated epistemic renegotiation, thereby supporting the scalability required by the massive expansion of scientific activity.
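The black-boxed rule can be caricatured in a few lines of code. The study names and p-values below are invented for illustration; the point is only that the rule’s output depends on nothing but a single number crossing a threshold.

```python
ALPHA = 0.05  # the near-universal threshold discussed in the paper

def nhst_verdict(p_value: float) -> str:
    """The mechanical rule: research context, effect size, and model
    assumptions play no role in the output."""
    return "significant" if p_value < ALPHA else "not significant"

# Hypothetical studies from entirely different fields: the rule
# processes all of them identically, which is the "obligatory
# passage point" property described above.
studies = {
    "drug trial": 0.049,
    "agronomy plot": 0.051,
    "survey item": 0.004,
}
verdicts = {name: nhst_verdict(p) for name, p in studies.items()}
print(verdicts)
```

Note that the first two studies receive opposite verdicts despite nearly identical p-values, a dichotomization the paper identifies as central to NHST’s portability and its pathologies.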

The authors argue that the very “technical slippages” of NHST—its tendency to purge contextual information, its reliance on a single numeric threshold, and its insensitivity to effect size—were not flaws in this institutional context but functional features. They enabled rapid, uniform decision‑making, facilitated the evaluation of research productivity, and supported the construction of a common metric that could be easily communicated to non‑specialist audiences. Consequently, NHST entrenched itself as a social technology of “institutional massification,” analogous to Theodore Porter’s “technology of trust,” but focused on procedural uniformity rather than professional autonomy.
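The insensitivity to effect size mentioned above can be demonstrated numerically. In this sketch (illustrative numbers only; a one-sample z-test with known sigma is assumed), a practically negligible effect clears the 0.05 threshold simply because the sample is huge, while a much larger effect in a small sample does not.

```python
import math

def two_sided_p(effect: float, sigma: float, n: int) -> float:
    """Two-sided p-value of a one-sample z-test when the observed
    mean equals `effect` (sigma assumed known)."""
    z = effect / (sigma / math.sqrt(n))
    phi = 0.5 * (1.0 + math.erf(abs(z) / math.sqrt(2.0)))
    return 2.0 * (1.0 - phi)

tiny_effect_huge_n = two_sided_p(effect=0.02, sigma=1.0, n=100_000)
big_effect_small_n = two_sided_p(effect=0.50, sigma=1.0, n=10)

# The threshold rewards sample size, not practical importance:
print(tiny_effect_huge_n < 0.05)  # tiny effect, yet "significant"
print(big_effect_small_n < 0.05)  # 25x larger effect, "not significant"
```

This is the technical slippage the authors reframe: what looks like a defect from an inferential standpoint is exactly what makes the procedure a uniform, context-free metric for institutional use.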

The paper then turns to contemporary debates: the replication crisis, widespread criticism of p‑hacking, and calls for statistical reform. While many reform efforts target individual cognition (e.g., better teaching of effect sizes, confidence intervals), the authors contend that such interventions overlook the structural role NHST plays in the research ecosystem. Because NHST is embedded in evaluation criteria, publication incentives, and funding decisions, merely improving statistical literacy will not dismantle its dominance. Effective reform must therefore address institutional incentives—introducing preregistration, rewarding transparent reporting, diversifying decision criteria beyond binary significance, and reshaping peer‑review norms.

In conclusion, the paper presents NHST as a “procedural epistemology”: a set of mechanical procedures that supplanted substantive epistemic judgment in the service of organizational efficiency and scalability. Its persistence is less a matter of statistical necessity and more a product of historical contingencies that aligned a flawed yet highly portable inferential technology with the needs of a rapidly expanding scientific infrastructure. The authors suggest that future reforms should be grounded in this sociotechnical understanding, targeting the institutional scaffolding that sustains NHST rather than focusing solely on individual statistical competence.

