An analysis of the abstracts presented at the annual meetings of the Society for Neuroscience from 2001 to 2006
We extracted and processed abstract data from the SFN annual meeting abstracts during the period 2001-2006, using techniques and software from natural language processing, database management, and data visualization and analysis. An important first step in the process was the application of data cleaning and disambiguation methods to construct a unified database, since the data were too noisy to be of full utility in the raw form initially available. The resulting co-author graph in 2006, for example, had 39,645 nodes (with an estimated 6% error rate in our disambiguation of similar author names) and 13,979 abstracts, with an average of 1.5 abstracts per author, 4.3 authors per abstract, and 5.96 collaborators per author (including all authors on shared abstracts). Recent work in related areas has focused on reputational indices such as highly cited papers or scientists and journal impact factors, and to a lesser extent on creating visual maps of the knowledge space. In contrast, there has been relatively less work on the demographics and community structure, the dynamics of the field over time to examine major research trends and the structure of the sources of research funding. In this paper we examined each of these areas in order to gain an objective overview of contemporary neuroscience. Some interesting findings include a high geographical concentration of neuroscience research in north eastern United States, a surprisingly large transient population (60% of the authors appear in only one out of the six studied years), the central role played by the study of neurodegenerative disorders in the neuroscience community structure, and an apparent growth of behavioral/systems neuroscience with a corresponding shrinkage of cellular/molecular neuroscience over the six year period.
💡 Research Summary
The paper presents a comprehensive quantitative analysis of the Society for Neuroscience (SFN) annual meeting abstracts spanning the years 2001 to 2006. Using a pipeline that combines web crawling, natural‑language‑processing (NLP) techniques, and database management, the authors first collected 13,979 abstracts and extracted metadata such as author names, institutional affiliations, and keywords. Because the raw data contained numerous inconsistencies—misspellings, duplicate records, and varied author‑name formats—a multi‑stage disambiguation process was applied. This process employed string‑similarity metrics (Levenshtein distance) together with institutional matching to normalize author identities, achieving an estimated 94 % accuracy (≈6 % residual error).
With a cleaned dataset, a co‑authorship graph for 2006 was constructed, comprising 39,645 nodes (authors) and 5.96 average degree per node. The graph’s basic statistics reveal an average of 1.5 abstracts per author and 4.3 authors per abstract. Centrality measures (Betweenness, PageRank) identified a core group of researchers, many of whom focus on neurodegenerative diseases such as Alzheimer’s and Parkinson’s. Community detection (Louvain algorithm) uncovered twelve distinct clusters that align both with research themes (behavioral/systems, cellular/molecular, cognitive/psychiatric) and geographic regions (Northeastern US, West Coast, Europe, Asia).
Geospatial analysis of institutional addresses showed a pronounced concentration of neuroscience activity in the Northeastern United States (Boston, New York, Philadelphia). Demographically, 60 % of authors appear in only a single year of the six‑year window, indicating a highly transient population, while roughly 10 % are persistent contributors across all years, acting as the network’s “core maintainers.”
Temporal trends were examined by weighting abstract keywords with TF‑IDF and tracking their frequencies year‑by‑year. The early part of the period (2001) was dominated by cellular/molecular topics (synaptic mechanisms, ion channels, gene expression). By 2006, behavioral and systems neuroscience topics (learning, memory, circuit dynamics) had increased by about 18 %, whereas cellular/molecular topics declined by roughly 12 %. Throughout the entire interval, neurodegenerative disease research retained a high centrality, suggesting ongoing integration of clinical and basic science efforts.
The authors discuss several limitations: residual author‑name disambiguation errors may bias network metrics; reliance on abstracts rather than full papers could miss nuanced methodological details; and the absence of funding data prevents direct analysis of how grant allocations shape research trajectories. They propose future work that incorporates full‑text mining, links to NIH and other grant databases, and applies machine‑learning‑driven dynamic network models to predict field evolution and inform science‑policy decisions.
Overall, the study provides an objective, data‑driven portrait of contemporary neuroscience, highlighting geographic concentration, high author turnover, shifting thematic emphasis, and the pivotal role of neurodegenerative disease research within the community. These insights are valuable for researchers, funding agencies, and policymakers seeking to understand and guide the development of the neuroscience enterprise.