A History of Cluster Analysis Using the Classification Society’s Bibliography Over Four Decades

The Classification Literature Automated Search Service, an annual bibliography based on citation of one or more of a set of around 80 book or journal publications, ran from 1972 to 2012. We analyze here the years 1994 to 2011. The Classification Society’s Service, as it was termed, was produced by the Classification Society. In earlier decades it was distributed as a diskette or CD with the Journal of Classification. Among our findings are the following: an enormous increase in scholarly production after approximately 2000; a very large increase in quantity, coupled with work in different disciplines, from approximately 2004; and a major shift from cluster analysis in earlier times, when mathematics and psychology were the disciplines of the publishing journals and of the authors’ affiliations, to a “centre of gravity” in management and engineering in more recent times.

💡 Research Summary

The paper conducts a comprehensive bibliometric and content analysis of the Classification Literature Automated Search Service (CLAS), an annual bibliography compiled by the Classification Society from 1972 to 2012. Focusing on the 1994‑2011 interval, the authors extract all records that cite any of roughly 80 core books or journals, clean the dataset by removing duplicates, standardizing author names and affiliations, and assigning each paper to a disciplinary category using a combination of the International Standard Classification (ISC) and a custom keyword mapping scheme.
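The cleaning steps described above (duplicate removal, author-name standardization, keyword-based discipline assignment) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the record fields (`author`, `title`, `year`), the `KEYWORD_MAP` entries, and the function names are hypothetical, and the authors' actual procedure and mapping scheme are not given in this summary.

```python
import re
import unicodedata

# Hypothetical keyword-to-discipline map; not the authors' actual scheme.
KEYWORD_MAP = {
    "psychometr": "psychology",
    "marketing": "management",
    "supply chain": "management",
    "signal": "engineering",
    "theorem": "mathematics",
}

def normalize_author(name: str) -> str:
    """Lowercase, strip accents and punctuation, collapse whitespace."""
    ascii_name = unicodedata.normalize("NFKD", name).encode("ascii", "ignore").decode()
    ascii_name = re.sub(r"[^\w\s]", " ", ascii_name.lower())
    return " ".join(ascii_name.split())

def dedupe(records):
    """Keep the first record for each (normalized author, title, year) key."""
    seen, unique = set(), []
    for rec in records:
        key = (normalize_author(rec["author"]), rec["title"].strip().lower(), rec["year"])
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique

def assign_discipline(text: str) -> str:
    """Return the discipline of the first matching keyword, else 'unclassified'."""
    lowered = text.lower()
    for keyword, discipline in KEYWORD_MAP.items():
        if keyword in lowered:
            return discipline
    return "unclassified"
```

Matching on substrings of a journal title or abstract is the simplest possible rule; it also shows why such a mapping introduces subjectivity, since the outcome depends entirely on which keywords are listed and in what order.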

Time‑series analysis of publication counts reveals a modest output of about 200 papers per year through the late 1990s, followed by a sharp increase after the turn of the millennium. By 2004 the annual volume exceeds 1,000 items, indicating a “publication explosion” that coincides with the rise of data‑intensive fields such as data mining, business intelligence, and web analytics. The authors attribute this surge to the diffusion of clustering techniques beyond their traditional strongholds in mathematics, statistics, and psychology into applied domains that demand large‑scale pattern discovery.
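A time series of this kind reduces to counting records per publication year. The sketch below is a hedged illustration only: the function names are hypothetical, the 1994 to 2011 window is taken from the summary, and any input passed to it would have to come from the actual bibliography, which is not reproduced here.

```python
from collections import Counter

def yearly_counts(years, start=1994, end=2011):
    """Count records per year, filling years with no records with zero."""
    counts = Counter(y for y in years if start <= y <= end)
    return {year: counts.get(year, 0) for year in range(start, end + 1)}

def first_year_exceeding(counts, threshold):
    """Return the earliest year whose count is above the threshold, or None."""
    for year in sorted(counts):
        if counts[year] > threshold:
            return year
    return None
```

Calling `first_year_exceeding(counts, 1000)` on the real per-year counts would locate the point at which annual volume passes 1,000 items.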

Disciplinary composition further underscores this shift. In the early period, mathematics/statistics (≈45 % of papers) and psychology/social sciences (≈30 %) dominate. After 2004, engineering/systems (≈35 %) and management/marketing (≈30 %) become the leading fields, while the share of pure mathematics declines. Institutional analysis shows a decreasing proportion of U.S. and European university affiliations and a growing presence of corporate research labs and consulting firms, especially in Asia and the Middle East. This pattern reflects the commercialization of clustering methods for product design, supply‑chain optimization, and customer segmentation.
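Shares such as these are proportions of records per discipline within a period. A minimal sketch, assuming each record has already been reduced to a `(year, discipline)` pair and that the split falls at 2004 (with 2004 itself counted in the later period, which is an assumption, not something the summary specifies):

```python
from collections import Counter, defaultdict

def discipline_shares(records, split_year=2004):
    """Return, per period, the fraction of records in each discipline."""
    by_period = defaultdict(Counter)
    for year, discipline in records:
        period = "early" if year < split_year else "late"
        by_period[period][discipline] += 1
    shares = {}
    for period, counter in by_period.items():
        total = sum(counter.values())
        shares[period] = {d: n / total for d, n in counter.items()}
    return shares
```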

Citation‑network analysis identifies the “core” intellectual nodes. Classic algorithmic papers such as Ward (1963) and MacQueen (1967) were central in the 1990s, but the 2000s see new hubs: variants of K‑means, hierarchical clustering extensions, spectral clustering, and deep‑learning‑based approaches. Moreover, the co‑authorship network becomes increasingly interdisciplinary as multidisciplinary teams proliferate, indicating that modern clustering research is highly collaborative and application‑driven.
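One simple way to find hubs in a citation network is to rank cited works by in-degree, that is, by the number of distinct papers citing them. The summary does not say which centrality measure the authors used, so in-degree is an assumption here, and the function name and edge format are hypothetical.

```python
from collections import Counter

def top_cited(citations, k=2):
    """Rank cited works by in-degree.

    citations: iterable of (citing_id, cited_id) pairs.
    Returns the k most-cited ids with their counts; ties are broken by id.
    """
    unique_edges = set(citations)  # ignore repeated citations from the same paper
    in_degree = Counter(cited for _, cited in unique_edges)
    return sorted(in_degree.items(), key=lambda item: (-item[1], item[0]))[:k]
```

Running the same ranking separately on each decade's edges would show whether the most-cited works change between periods.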

The discussion interprets these findings as evidence of a paradigm shift: clustering research has moved from a primarily theoretical, methodological focus to a problem‑solving, industry‑oriented one. The surge in management and engineering publications aligns with the demand for scalable, software‑enabled clustering tools, which in turn fuels further scholarly output.

Limitations are acknowledged. Because CLAS only indexes a curated set of ~80 sources, recent open‑access journals, conference proceedings, and emerging venues may be under‑represented, potentially biasing the observed trends. The disciplinary classification, reliant on keyword mapping, also introduces subjectivity. The authors propose future work that expands the data horizon via web‑scraping, applies machine‑learning‑based text mining for automatic field assignment, and integrates richer metadata (e.g., funding sources, citation contexts) to refine trend detection.

In conclusion, the 1994‑2011 bibliographic record demonstrates two salient transformations: a dramatic post‑2000 increase in scholarly production and a pronounced gravitation of clustering research toward management and engineering domains. These shifts signal that clustering techniques have matured into indispensable tools for contemporary data‑driven decision making, and they suggest that forthcoming research should prioritize algorithmic innovations that address real‑world scalability and domain‑specific constraints.

📜 Original Paper Content