Multi-Ecosystem Modeling of OSS Project Sustainability


Many OSS projects join foundations such as Apache, Eclipse, and OSGeo to aid their immediate plans and improve long-term prospects by getting governance advice, incubation support, and community-building mechanisms. But foundations differ in their policies, funding models, and support strategies. Moreover, since projects joining these foundations are diverse, arriving at different lifecycle stages and with different needs, it can be challenging to decide on the appropriate project-foundation match and on the project-specific plan for sustainability. Here, we present an empirical study and quantitative analysis of the sustainability of incubator projects in the Apache, Eclipse, and OSGeo foundations, and, additionally, of OSS projects from GitHub outside of foundations. We develop foundation-specific sustainability models and a project triage, based on projects’ sociotechnical trace profiles, and demonstrate their effectiveness across the foundations. Our results show that our models with triage can effectively forecast sustainability outcomes not only within but across foundations. In addition, the generalizability of the framework allows us to apply the approach to GitHub projects outside the foundations. We complement our findings with actionable recovery strategies from previous work and apply them to case studies of failed incubator projects. Our study highlights the value of sociotechnical frameworks in characterizing and addressing software project sustainability issues.


💡 Research Summary

This paper investigates how open‑source software (OSS) projects can be assessed for long‑term sustainability across multiple ecosystem contexts, specifically the Apache Software Foundation (ASF), Eclipse Foundation (EF), OSGeo Foundation (OF), and projects hosted on GitHub that are not affiliated with any foundation. The authors collect extensive trace data for 329 ASF incubator projects, 161 EF incubator projects, 20 OF incubator projects, and 21 GitHub projects. Each project is labeled according to its final outcome: “Graduated” or “Retired” for foundation incubators, and “Successful” or “Unsuccessful” for GitHub projects.

The core methodological contribution is the construction of sociotechnical networks (STNs) that capture both technical activity (commits, pull‑request reviews, issue resolutions) and social interaction (mailing‑list discussions, issue comments, forum posts). From these networks the authors extract roughly 50 features, including node centralities, clustering coefficients, community structures, temporal growth rates, contributor diversity metrics, and release cadence indicators.
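As a rough illustration of the feature-extraction step described above, the sketch below computes a handful of network features from toy edge lists. It is not the authors' pipeline: the function names, the feature names, and the choice of degree centrality and local clustering as representative features are all assumptions made for this example, and the social and technical layers are simplified to "who replied to whom" and "who touched which file".

```python
from collections import defaultdict
from itertools import combinations


def build_graph(edges):
    """Build an undirected adjacency map from (u, v) pairs, ignoring self-loops."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    return dict(adj)


def degree_centrality(adj):
    """Degree of each node divided by the maximum possible degree (n - 1)."""
    n = len(adj)
    if n <= 1:
        return {node: 0.0 for node in adj}
    return {node: len(nbrs) / (n - 1) for node, nbrs in adj.items()}


def clustering_coefficient(adj, node):
    """Fraction of a node's neighbor pairs that are themselves connected."""
    nbrs = adj[node]
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
    return 2.0 * links / (k * (k - 1))


def stn_features(social_edges, technical_edges):
    """Summarize one project snapshot as a small feature dictionary.

    social_edges:    (person, person) pairs, e.g. mailing-list replies.
    technical_edges: (developer, file) pairs, e.g. commits touching a file.
    All feature names are hypothetical, chosen for this sketch only.
    """
    social = build_graph(social_edges)
    centrality = degree_centrality(social)
    clustering = [clustering_coefficient(social, n) for n in social]
    unique_tech = set(technical_edges)
    devs = {d for d, _ in unique_tech}
    files = {f for _, f in unique_tech}
    return {
        "s_num_nodes": len(social),
        "s_max_degree_centrality": max(centrality.values(), default=0.0),
        "s_mean_clustering": sum(clustering) / len(clustering) if clustering else 0.0,
        "t_num_devs": len(devs),
        "t_num_files": len(files),
        "t_files_per_dev": len(unique_tech) / len(devs) if devs else 0.0,
    }
```

Calling `stn_features` once per time window (for example, per month of incubation) would give a sequence of feature vectors, which is the kind of input a temporal model needs; the window length is likewise an assumption here.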

Using these features, the study builds foundation‑specific sustainability prediction models. Each model combines a multilayer perceptron (MLP), which handles the static graph metrics, with a Long Short‑Term Memory (LSTM) component, which handles the temporal dynamics. Five‑fold cross‑validation yields high performance within each foundation: ASF models achieve 0.91 accuracy and 0.88 F1‑score, EF models 0.84 accuracy and 0.79 F1‑score, and OF models 0.86 accuracy and 0.81 F1‑score. However, when a model trained on one foundation is applied to another, performance drops sharply, indicating that sustainability signals are not fully transferable across ecosystems.
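The evaluation protocol named above (five-fold cross-validation, reported as accuracy and F1-score) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' evaluation code: `k_fold_indices` and `accuracy_and_f1` are hypothetical helper names, the folds are contiguous with no shuffling or stratification, and the label `1` is assumed to stand for "Graduated".

```python
def k_fold_indices(n_samples, k=5):
    """Yield (train_idx, test_idx) pairs for k contiguous, near-equal folds.

    No shuffling or stratification is done here; a real evaluation would
    normally shuffle first and keep class ratios similar across folds.
    """
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size


def accuracy_and_f1(y_true, y_pred, positive=1):
    """Return (accuracy, F1) for binary labels, treating `positive` as the target class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)
    accuracy = correct / len(pairs) if pairs else 0.0
    # F1 = 2*TP / (2*TP + FP + FN), the harmonic mean of precision and recall.
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    return accuracy, f1
```

Reporting F1 alongside accuracy matters when the outcome classes are imbalanced: a model that always predicts the majority outcome can score high accuracy while its F1 on the minority class stays near zero.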

To address this, the authors introduce OSS‑Prof, a project‑foundation routing classifier that predicts the most suitable foundation for a given OSS project based solely on its STN features. OSS‑Prof attains 0.87 routing accuracy. After routing, the appropriate foundation‑specific model is applied, restoring cross‑foundation prediction accuracy to 0.81.
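The route-then-predict flow described above can be illustrated with a toy pipeline. The summary does not say how OSS-Prof is implemented, so the nearest-centroid router below is a stand-in chosen for brevity, and every name in it (`fit_router`, `route`, `predict_with_routing`, the two-dimensional feature vectors, the lambda "models") is hypothetical.

```python
import math


def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length feature vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def fit_router(profiles_by_foundation):
    """Summarize each foundation by the centroid of its projects' feature vectors."""
    return {name: centroid(vecs) for name, vecs in profiles_by_foundation.items()}


def route(router, features):
    """Return the foundation whose centroid is closest (Euclidean) to the project."""
    return min(router, key=lambda name: math.dist(router[name], features))


def predict_with_routing(router, models, features):
    """Route the project first, then apply that foundation's sustainability model."""
    foundation = route(router, features)
    return foundation, models[foundation](features)


# Toy usage: two foundations, two made-up features per project.
router = fit_router({
    "ASF": [[0.9, 0.1], [0.7, 0.3]],
    "EF": [[0.1, 0.9], [0.3, 0.7]],
})
# Stand-in "models": simple thresholds, not trained classifiers.
models = {
    "ASF": lambda x: "Graduated" if x[0] > 0.5 else "Retired",
    "EF": lambda x: "Graduated" if x[1] > 0.5 else "Retired",
}
result = predict_with_routing(router, models, [0.75, 0.2])
```

The design point this illustrates is the separation of concerns: the router only decides which ecosystem a project most resembles, and the outcome prediction is left to a model trained on that ecosystem's own projects.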

The paper also evaluates the applicability of these models to non‑foundation GitHub projects. By first routing a GitHub project to the most similar foundation and then using that foundation’s sustainability model, the authors achieve 0.78 overall accuracy (F1‑score 0.74). In contrast, a model trained directly on GitHub success metrics performs poorly when used to predict foundation sustainability, confirming that “success” (stars, forks, short‑term activity) and “sustainability” (governance maturity, licensing compliance, long‑term contributor retention) are distinct constructs.

Feature importance analysis reveals ecosystem‑specific drivers: ASF sustainability correlates strongly with contributor diversity and mailing‑list activity; EF relies on regular releases and code‑review intensity; OF depends on GIS‑domain issue activity and external contributor participation. These findings map directly onto each foundation’s policy requirements (e.g., Apache’s IP compliance, Eclipse’s DP standards, OSGeo’s geospatial community focus).

Limitations include the relatively small GitHub sample, potential bias in labeling (especially the “Retired” status, which may represent a natural lifecycle conclusion rather than failure), and the black‑box nature of deep‑learning models. The authors suggest future work to incorporate additional foundations (e.g., Linux Foundation, CNCF), longer observation windows (5+ years), and explainable AI techniques (SHAP, LIME) to make the models more actionable for governance bodies.

In summary, the study demonstrates that sociotechnical network features are powerful predictors of OSS project sustainability, that foundation‑specific models outperform generic ones, and that a routing mechanism can effectively bridge the gap between diverse ecosystems. This provides project maintainers with data‑driven guidance for selecting an appropriate foundation and offers foundations insight into which community dynamics to nurture for long‑term project health.

