How do OSS projects change in number and size? A large-scale analysis to test a model of project growth


Established Open Source Software (OSS) projects can grow in size if new developers join, but also the number of OSS projects can grow if developers choose to found new projects. We discuss to what extent an established model for firm growth can be applied to the dynamics of OSS projects. Our analysis is based on a large-scale data set from SourceForge (SF) consisting of monthly data for 10 years, for up to 360'000 OSS projects and up to 340'000 developers. Over this time period, we find an exponential growth both in the number of projects and developers, with a remarkable increase of single-developer projects after 2009. We analyze the monthly entry and exit rates for both projects and developers, the growth rate of established projects and the monthly project size distribution. To derive a prediction for the latter, we use modeling assumptions of how newly entering developers choose to either found a new project or to join existing ones. Our model applies only to collaborative projects that are deemed to grow in size by attracting new developers. We verify, by a thorough statistical analysis, that the Yule-Simon distribution is a valid candidate for the size distribution of collaborative projects except for certain time periods where the modeling assumptions no longer hold. We detect and empirically test the reason for this limitation, i.e., the fact that an increasing number of established developers found additional new projects after 2009.


💡 Research Summary

This paper investigates the dynamics of open‑source software (OSS) projects and developers by testing whether a classic economic model of firm growth can describe OSS communities. Using a large‑scale dataset from SourceForge, the authors collected monthly snapshots from January 2003 to June 2012, covering up to 360 000 projects, 340 000 developers, and 576 000 developer‑project links. After cleaning the data (removing corrupted months, autopurge effects, and automatically generated “personal” projects after 2010), they performed an aggregated analysis.

The total numbers of projects (Np), developers (Nd), and links (K) all exhibit exponential growth, following the law of proportional growth (ΔX/Δt = ωX). Growth rates are statistically significant: ω≈1.30 % for links, 1.27 % for developers, and 1.54 % for projects. Notably, the project growth rate accelerates after 2010 (from 1.33 % to 1.81 %), while the link growth rate remains stable, indicating that the developer-project network becomes sparser. This acceleration coincides with a sharp rise in single‑developer projects, while multi‑developer project creation stays roughly constant.
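The proportional-growth law can be made concrete with a short sketch. With a time step of one month, ΔX/Δt = ωX gives X(t+1) = (1+ω)·X(t), so log X is linear in t and ω can be recovered from the slope of a least-squares line. The function names and the synthetic series below are illustrative assumptions, not the authors' code or data, and the sketch assumes the quoted rates are per month.

```python
import math


def estimate_growth_rate(counts):
    """Estimate omega in X(t+1) = (1 + omega) * X(t) from equally spaced counts.

    Fits an ordinary least-squares line to log(count) versus time index and
    converts the slope back to a per-step proportional growth rate.
    """
    n = len(counts)
    if n < 2:
        raise ValueError("need at least two observations")
    ts = list(range(n))
    ys = [math.log(c) for c in counts]
    t_mean = sum(ts) / n
    y_mean = sum(ys) / n
    sxx = sum((t - t_mean) ** 2 for t in ts)
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(ts, ys))
    slope = sxy / sxx
    return math.exp(slope) - 1.0


def doubling_time(omega):
    """Number of time steps needed for X to double at growth rate omega."""
    return math.log(2.0) / math.log(1.0 + omega)


if __name__ == "__main__":
    # Synthetic (hypothetical) series: 114 monthly values growing at 1.27 %.
    series = [1000.0 * (1.0127 ** t) for t in range(114)]
    omega = estimate_growth_rate(series)
    print(f"estimated omega: {omega:.4%}")
    print(f"doubling time in months: {doubling_time(omega):.1f}")
```

Fitting on the log scale is one common choice; it weights relative rather than absolute deviations, which suits a series spanning several orders of magnitude.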

Programming‑language analysis (limited to ~40 % of projects) shows that seven languages (C, C#, C++, Java, JavaScript, PHP, Python) dominate. Over the decade, C’s share drops from 25 % to 15 %, whereas Java, C#, and scripting languages gain ground. All languages display an increasing proportion of single‑developer projects, with C#, PHP, Python, and JavaScript exceeding 70 % single‑developer share by 2012, suggesting a preference for lightweight, script‑based development in solo contexts.

The core theoretical contribution adapts Simon’s model of firm entry and proportional growth to OSS. The model assumes that each newly entering developer either (with probability p) founds a new project or (with probability 1‑p) joins an existing project chosen with probability proportional to its current size. Under these assumptions, the size distribution of collaborative (multi‑developer) projects follows a Yule‑Simon distribution, characterized by a power‑law tail f(x) ∝ x⁻γ. The authors estimate γ via an Expectation‑Maximization (EM) algorithm and validate the fit using Kolmogorov‑Smirnov tests, QQ‑plots, and likelihood comparisons. For the period 2003‑2009 the Yule‑Simon model fits the empirical size distribution well.
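A minimal simulation sketch of Simon's classic process may help here. In that process a joining developer picks an existing project with probability proportional to its size, which is the mechanism that yields the Yule‑Simon law. The sketch compares the simulated share of size‑1 projects with the Yule‑Simon probability mass function f(k) = ρ·B(k, ρ+1), where ρ = 1/(1‑p) and the tail exponent is γ = ρ+1. The function names, the seed, and the parameter values are assumptions chosen for illustration; this is not the authors' estimation procedure.

```python
import math
import random
from collections import Counter


def simulate_simon(n_developers, p, seed=0):
    """Simulate Simon's process and return the list of project sizes.

    Each entering developer founds a new project with probability p;
    otherwise they join an existing project chosen proportionally to size.
    """
    rng = random.Random(seed)
    # One entry per developer: the id of the project that developer belongs to.
    memberships = [0]
    n_projects = 1
    for _ in range(n_developers - 1):
        if rng.random() < p:
            memberships.append(n_projects)
            n_projects += 1
        else:
            # Sampling a random existing membership selects a project with
            # probability proportional to its current size.
            memberships.append(rng.choice(memberships))
    return list(Counter(memberships).values())


def rho_from_p(p):
    """Yule-Simon shape parameter implied by founding probability p."""
    return 1.0 / (1.0 - p)


def yule_simon_pmf(k, rho):
    """Yule-Simon probability of size k: rho * B(k, rho + 1)."""
    log_beta = math.lgamma(k) + math.lgamma(rho + 1.0) - math.lgamma(k + rho + 1.0)
    return rho * math.exp(log_beta)


if __name__ == "__main__":
    p = 0.5  # hypothetical founding probability
    sizes = simulate_simon(50_000, p, seed=1)
    share_size_one = sum(1 for s in sizes if s == 1) / len(sizes)
    print("simulated share of size-1 projects:", round(share_size_one, 3))
    print("Yule-Simon prediction:", round(yule_simon_pmf(1, rho_from_p(p)), 3))
```

The simulation keeps single‑developer projects in the size counts; restricting a comparison to collaborative projects, as the model's scope requires, would mean conditioning on size greater than one.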

However, after 2009 the fit deteriorates. Empirical analysis reveals that the probability p is no longer constant: many established developers begin to launch additional projects, inflating the number of single‑developer projects and violating the model’s entry assumption. Consequently, the observed size distribution deviates from the Yule‑Simon prediction during 2010‑2012. The authors discuss this limitation and suggest extending the model to allow a time‑varying p(t) or to incorporate heterogeneous developer behavior.

Network‑theoretic analysis treats the OSS ecosystem as a bipartite graph of developers and projects. Links are unweighted and represent a developer’s registration to a project within a given month. Projections onto the developer layer (co‑participation) and the project layer (shared developers) reveal that larger projects attract more newcomers, confirming a “network effect” consistent with proportional growth.
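The bipartite representation and its two projections can be sketched as follows. The link list is a toy, hypothetical example and the function names are assumptions; the sketch only illustrates how project sizes and the projected edge sets follow from unweighted developer‑project links.

```python
from collections import defaultdict
from itertools import combinations


def project_sizes(links):
    """Count the distinct developers registered to each project."""
    members = defaultdict(set)
    for developer, project in links:
        members[project].add(developer)
    return {project: len(devs) for project, devs in members.items()}


def developer_projection(links):
    """Edges between developers who share at least one project."""
    members = defaultdict(set)
    for developer, project in links:
        members[project].add(developer)
    edges = set()
    for devs in members.values():
        for a, b in combinations(sorted(devs), 2):
            edges.add((a, b))
    return edges


def project_projection(links):
    """Edges between projects that share at least one developer."""
    portfolios = defaultdict(set)
    for developer, project in links:
        portfolios[developer].add(project)
    edges = set()
    for projects in portfolios.values():
        for a, b in combinations(sorted(projects), 2):
            edges.add((a, b))
    return edges


if __name__ == "__main__":
    # Hypothetical monthly snapshot of (developer, project) links.
    links = [
        ("ann", "alpha"), ("bob", "alpha"), ("cy", "alpha"),
        ("bob", "beta"), ("dee", "gamma"),
    ]
    print(project_sizes(links))
    print(sorted(developer_projection(links)))
    print(sorted(project_projection(links)))
```

Storing each edge as a sorted pair keeps the projections undirected and avoids counting the same pair twice.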

In summary, the study demonstrates that classic firm‑growth models can capture OSS community dynamics under certain conditions, especially when developer entry behavior is homogeneous. The observed shift around 2010 highlights the importance of accounting for evolving developer strategies, such as multi‑project participation. Future work should incorporate newer platforms (e.g., GitHub), model heterogeneous entry probabilities, and explore the impact of developer motivation and project lifecycles on growth patterns.

