Collaboration networks from a large CV database: dynamics, topology and bonus impact
Understanding the dynamics of research production and collaboration may reveal better strategies for scientific careers, academic institutions and funding agencies. Here we propose the use of a large and multidisciplinary database of scientific curricula in Brazil, namely, the Lattes Platform, to study patterns of scientific production and collaboration. In this database, detailed information about publications and researchers is made available by the researchers themselves, so that coauthorship is unambiguous and individuals can be evaluated by scientific productivity, geographical location and field of expertise. Our results show that the collaboration network has been growing exponentially over the last three decades, with a distribution of the number of collaborators per researcher that approaches a power law as the network gets older. Moreover, the distributions of both the number of collaborators and the production per researcher follow power laws, regardless of geographical location or field, suggesting that the same universal mechanism might be responsible for network growth and productivity. We also show that the collaboration network under investigation displays typical assortative mixing, where highly connected researchers (i.e., those with high degree) tend to collaborate with others alike. Finally, our analysis reveals that researchers awarded governmental scholarships have a distinctive collaboration profile, which suggests a strong bonus impact on their productivity.
💡 Research Summary
This paper leverages the Brazilian Lattes Platform—a comprehensive, self‑maintained CV repository covering roughly 2.7 million researchers—to construct and analyze a large‑scale scientific collaboration network spanning three decades. By parsing each curriculum vitae, the authors extracted researcher identifiers, institutional addresses, fields of expertise, and bibliographic records. Duplicate papers were merged using a Damerau‑Levenshtein distance threshold (≤10 % of the maximum distance) applied to titles sharing the same year, author count, and initial letter, thereby minimizing name‑disambiguation errors that plague many bibliometric studies.
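The duplicate-merging rule described above can be sketched in a few lines of Python. This is only an illustration, not the authors' implementation: the function names, the record fields (`title`, `year`, `n_authors`), and the reading of "maximum distance" as the length of the longer title are all assumptions. The distance used is the restricted (optimal string alignment) variant of Damerau-Levenshtein.

```python
def osa_distance(a: str, b: str) -> int:
    """Restricted Damerau-Levenshtein (optimal string alignment) distance."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all i characters of a
    for j in range(cols):
        d[0][j] = j  # insert all j characters of b
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            # adjacent transposition, e.g. "or" <-> "ro"
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


def is_duplicate(p: dict, q: dict, max_ratio: float = 0.10) -> bool:
    """Hypothetical duplicate test: same year, author count and initial letter,
    and a title distance of at most max_ratio times the longer title length
    (an assumed interpretation of the 10 % threshold)."""
    t1, t2 = p["title"].lower().strip(), q["title"].lower().strip()
    if not t1 or not t2:
        return False
    if (p["year"], p["n_authors"], t1[0]) != (q["year"], q["n_authors"], t2[0]):
        return False
    return osa_distance(t1, t2) <= max_ratio * max(len(t1), len(t2))


# Illustrative records (made up for this example)
p = {"title": "Collaboration networks in Brazil", "year": 2010, "n_authors": 3}
q = {"title": "Collaboration netwroks in Brazil", "year": 2010, "n_authors": 3}
print(is_duplicate(p, q))
```

Filtering on year, author count, and initial letter before computing the distance keeps the number of expensive pairwise comparisons small, which matters for a database of this size.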
From the cleaned data, a bipartite researcher‑paper graph was built and then projected onto a weighted, undirected co‑authorship network. The final network (TCN) comprises 275 061 researchers (nodes) and 1 095 871 co‑authorship links (edges), with 90.4 % of nodes belonging to a giant component, indicating a highly interconnected scientific community across all eight CNPq‑defined disciplinary areas.
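As a rough sketch of the projection step, the snippet below turns a paper-to-authors mapping into a weighted co-authorship edge list and measures the share of researchers in the largest connected component. The function names and the toy data are hypothetical, and the edge weight is assumed to be the number of shared papers; the summary does not state which weighting the authors used.

```python
from collections import Counter, defaultdict, deque
from itertools import combinations


def project_coauthorship(papers):
    """Project a {paper: [authors]} mapping onto weighted co-author pairs.
    Weight = number of papers a pair shares (an assumed convention)."""
    weights = Counter()
    for authors in papers.values():
        for u, v in combinations(sorted(set(authors)), 2):
            weights[(u, v)] += 1
    return weights


def giant_component_fraction(nodes, edges):
    """Fraction of nodes in the largest connected component (BFS)."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    seen, largest = set(), 0
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        queue, size = deque([start]), 0
        while queue:
            node = queue.popleft()
            size += 1
            for nb in adj[node]:
                if nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
        largest = max(largest, size)
    return largest / len(nodes) if nodes else 0.0


# Toy data (made up): three papers, four researchers
papers = {"p1": ["ana", "bruno", "carla"], "p2": ["ana", "bruno"], "p3": ["davi"]}
weights = project_coauthorship(papers)
researchers = {a for authors in papers.values() for a in authors}
print(weights[("ana", "bruno")], giant_component_fraction(researchers, weights))
```

In this toy case the single-author paper leaves one researcher isolated, so three of the four nodes fall in the largest component.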
Growth dynamics reveal exponential increases in both the number of active researchers (s_r ∝ e^{0.139 t}) and collaborations (s_c ∝ e^{0.181 t}) over the last thirty years. Moreover, collaborations scale super‑linearly with researchers (s_c ∝ s_r^{1.31}), confirming that the network becomes denser as it expands—a pattern also observed in other empirical networks.
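The super-linear exponent can be cross-checked against the two growth rates by eliminating time from the exponentials. This is a consistency check on the numbers quoted above, not a step taken from the paper:

```latex
s_r \propto e^{0.139\,t}
\;\Longrightarrow\;
t = \frac{\ln s_r}{0.139} + \text{const},
\qquad
s_c \propto e^{0.181\,t}
\propto s_r^{\,0.181/0.139}
\approx s_r^{1.30}.
```

The ratio of the fitted rates gives an exponent of about 1.30, close to the reported value of 1.31.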
Degree distributions evolve from a power law with an exponential cutoff (P(k) ∝ k^{−γ_y} e^{−k/l_y}) toward a pure power law as the network ages: the cutoff length l_y grows faster than linearly while the exponent γ_y stabilizes. This suggests that early constraints on the number of collaborators relax over time, allowing highly connected “hub” researchers to emerge. Publication productivity follows a similar law (P(n) ∝ n^{−β_p} e^{−n/l_p}) with β_p ≈ 1.7 and l_p ≈ 157, a Lotka-type distribution in a modern, large-scale context (Lotka's original law has an exponent close to 2).
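Because ln P(k) = c − γ ln k − k/l is linear in ln k and k, the two parameters of a power law with exponential cutoff can be estimated by ordinary least squares. The sketch below is a minimal pure-Python illustration on noise-free synthetic data; the function names are hypothetical, and this is not necessarily how the authors fitted their curves. For real histograms, least squares on logarithms can be biased, and maximum-likelihood fitting is usually preferred.

```python
import math


def solve3(A, b):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    n = 3
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def fit_truncated_power_law(ks, ps):
    """Least-squares fit of ln p = c - gamma*ln k - k/l; returns (gamma, l)."""
    rows = [(1.0, math.log(k), float(k)) for k in ks]
    ys = [math.log(p) for p in ps]
    # Normal equations (X^T X) beta = X^T y for the three coefficients
    A = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    b = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(3)]
    _, slope_logk, slope_k = solve3(A, b)
    return -slope_logk, -1.0 / slope_k


# Synthetic, noise-free data generated with the productivity parameters quoted above
ks = list(range(1, 200))
ps = [k ** -1.7 * math.exp(-k / 157.0) for k in ks]
gamma, cutoff = fit_truncated_power_law(ks, ps)
print(round(gamma, 3), round(cutoff, 1))
```

As the cutoff length grows, the exponential factor stays close to 1 over a wider range of k, which is why the distribution looks increasingly like a pure power law.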
Structural analysis shows a relatively high clustering coefficient (C = 0.465) and a modest positive assortativity (r = 0.094), indicating that researchers tend to form tightly knit groups and that high‑degree nodes preferentially connect with other high‑degree nodes. The average nearest‑neighbor degree k_nn(k) rises logarithmically with k, reinforcing the assortative mixing pattern.
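Both structural measures can be computed directly from an adjacency structure. The sketch below assumes that C is the average local clustering coefficient and that r is the Pearson correlation between the degrees at the two ends of each edge; the summary does not say which definitions the authors used, and edge weights are ignored here. All names and the toy graph are illustrative.

```python
from collections import defaultdict
from itertools import combinations


def build_adjacency(edges):
    """Undirected, unweighted adjacency sets; self-loops are dropped."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    return adj


def average_clustering(adj):
    """Mean over nodes of (links among neighbours) / (possible links).
    Nodes with fewer than two neighbours contribute 0."""
    total = 0.0
    for nbrs in adj.values():
        k = len(nbrs)
        if k < 2:
            continue
        links = sum(1 for u, v in combinations(nbrs, 2) if v in adj[u])
        total += 2.0 * links / (k * (k - 1))
    return total / len(adj) if adj else 0.0


def degree_assortativity(adj):
    """Pearson correlation of end-point degrees, each edge counted in both
    directions. Undefined (division by zero) if all degrees are equal."""
    xs, ys = [], []
    for u, nbrs in adj.items():
        for v in nbrs:
            xs.append(len(adj[u]))
            ys.append(len(adj[v]))
    n = len(xs)
    mean = sum(xs) / n
    cov = sum(x * y for x, y in zip(xs, ys)) / n - mean * mean
    var = sum(x * x for x in xs) / n - mean * mean
    return cov / var


# Toy graph (made up): a triangle a-b-c with a pendant node d attached to c
adj = build_adjacency([("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")])
print(average_clustering(adj), degree_assortativity(adj))
```

The toy graph is disassortative because its best-connected node links to a degree-1 node. A positive r, as reported for the collaboration network, means the opposite tendency dominates.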
A focal point of the study is the impact of governmental scholarships (CNPq “bolsas”), which provide a bonus salary to distinguished researchers. The scholarship network (SCN) contains only 12 302 researchers (≈5 % of TCN) but accounts for 20 % of all publications. Scholars are on average five times more productive (papers per author) and have roughly four times more collaborators than non‑scholars. Their subgraph is more cohesive (94.6 % in the giant component) yet exhibits a lower clustering coefficient (C = 0.266), reflecting a broader reach across research groups.
Geographic analysis across Brazil’s 26 states and the Federal District shows that degree distributions are virtually identical, indicating a universal collaboration mechanism irrespective of location. However, the average number of collaborators per researcher scales with the number of researchers in a state as ⟨k⟩ ∝ N^{0.12}. This is a weak allometric relationship: a state with ten times as many researchers has only about 32 % more collaborators per researcher (10^{0.12} ≈ 1.32). Larger scientific communities therefore provide slightly more collaboration opportunities but do not fundamentally alter the network’s shape.
In summary, the Lattes database enables a detailed, individual‑level view of scientific production and collaboration. The authors demonstrate that Brazilian scientific collaboration exhibits universal growth laws (exponential expansion, power‑law degree and productivity distributions, assortative mixing) and that policy‑driven bonuses are associated with substantially higher productivity and more collaborators. The work highlights the value of large, self‑reported CV repositories for scientometric research and provides a baseline for future studies on the dynamics of scientific ecosystems, including cross‑national comparisons and the modeling of policy interventions.