Determining Context Factors for Hybrid Development Methods with Trained Models

Selecting a suitable development method for a specific project context is one of the most challenging activities in process design. Every project is unique, so many context factors have to be considered. Recent research has taken initial steps towards statistically constructing hybrid development methods, but has paid little attention to the peculiarities of the context factors that influence method and practice selection. In this paper, we use exploratory factor analysis and logistic regression analysis to learn such context factors and to identify methods that are correlated with these factors. Our analysis is based on 829 data points from the HELENA dataset. We provide five base clusters of methods, each consisting of up to 10 methods, that lay the foundation for devising hybrid development methods. The analysis of the five clusters using trained models reveals only a few context factors, e.g., project/product size and target application domain, that seem to significantly influence the selection of methods. An extended descriptive analysis of the practices used in the context of the identified method clusters also suggests a consolidation of the relevant practice sets used in specific project contexts.

💡 Research Summary

The paper tackles the long‑standing problem of selecting an appropriate software development method for a given project context. While many studies have proposed hybrid or blended methods, they often overlook which contextual factors truly drive method and practice selection. To fill this gap, the authors conduct a large‑scale empirical investigation using the HELENA dataset, which contains 829 project records spanning a wide range of industries, sizes, domains, regulatory environments, and other project characteristics.

The research proceeds in four analytical stages. First, an exploratory factor analysis (EFA) reduces the original set of roughly twenty raw variables (e.g., project size, team size, target domain, regulatory constraints, risk level, customer pressure) to five latent factors. These factors are interpreted as “Project/Product Size,” “Target Application Domain,” “Regulatory/Standard Requirements,” “Team Capability & Experience,” and “Customer/Market Pressure.” The factor solution is validated through eigenvalue criteria, scree‑plot inspection, and factor loadings above 0.5.
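The two retention rules mentioned above (eigenvalue criterion and loadings above 0.5) can be illustrated with a minimal sketch. All numbers below are hypothetical and are not taken from the paper; the function names `retained_factors` and `assign_variables` are assumptions made for illustration, and only two of the five factors are shown to keep the example short.

```python
def retained_factors(eigenvalues):
    """Kaiser criterion: count factors whose eigenvalue exceeds 1."""
    return sum(1 for ev in eigenvalues if ev > 1.0)


def assign_variables(loadings, factor_names, cutoff=0.5):
    """Assign each variable to the factor with its largest absolute loading.

    A variable whose strongest loading does not exceed the cutoff is left
    unassigned (None), mirroring the 'loadings above 0.5' rule.
    """
    assignment = {}
    for var, row in loadings.items():
        best = max(range(len(row)), key=lambda k: abs(row[k]))
        assignment[var] = factor_names[best] if abs(row[best]) > cutoff else None
    return assignment


# Hypothetical eigenvalues of a correlation matrix (illustration only).
eigenvalues = [3.1, 2.4, 1.6, 1.3, 1.1, 0.8, 0.6]
n_factors = retained_factors(eigenvalues)

# Hypothetical loading matrix for two factors (illustration only).
factor_names = ["Project/Product Size", "Target Application Domain"]
loadings = {
    "project size": [0.82, 0.10],
    "team size": [0.71, 0.22],
    "target domain": [0.15, 0.77],
    "risk level": [0.35, 0.41],  # no loading above 0.5, so it stays unassigned
}
assignment = assign_variables(loadings, factor_names)
print(n_factors, assignment)
```

In practice the eigenvalues and loadings would come from a factor-analysis routine rather than being typed in by hand; the sketch only shows how the two thresholds are applied once those values exist.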

Second, the authors fit a series of multiple logistic regression models, one for each of the thirty‑plus development methods recorded in the dataset (including Scrum, XP, Waterfall, V‑Model, DevOps, Lean, etc.). The five latent factors serve as independent variables, and the binary outcome indicates whether a method was used in a given project. The regression results reveal that “Project/Product Size” and “Target Application Domain” are statistically significant predictors for the majority of methods (p < 0.01). For example, large‑scale, enterprise‑level projects show high odds ratios for Waterfall, V‑Model, and RUP, whereas small, fast‑moving startups show strong positive coefficients for Scrum, XP, and DevOps. “Regulatory/Standard Requirements” is significant mainly for heavily regulated domains (medical, finance), while “Team Capability” and “Customer Pressure” have more modest, method‑specific effects.
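To make the link between a regression coefficient and an odds ratio concrete, the following is a minimal sketch of a per-method logistic regression with a single predictor, fitted by plain gradient descent. The data points, the variable names (`size_factor`, `used_waterfall`), and the choice of one predictor instead of five are assumptions for illustration only; they are not the paper's data or fitting procedure.

```python
import math


def sigmoid(z):
    """Logistic function mapping a log-odds value to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, lr=0.1, epochs=5000):
    """Fit intercept b0 and slope b1 by batch gradient descent on the log-loss."""
    b0, b1 = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(b0 + b1 * x) - y  # gradient of log-loss w.r.t. the logit
            g0 += err
            g1 += err * x
        b0 -= lr * g0 / n
        b1 -= lr * g1 / n
    return b0, b1


# Hypothetical factor scores for "Project/Product Size" and a binary flag
# saying whether the project used Waterfall (illustration only).
size_factor = [-1.5, -1.0, -0.5, -0.2, 0.0, 0.3, 0.6, 1.0, 1.4, 1.8]
used_waterfall = [0, 0, 0, 1, 0, 0, 1, 1, 1, 1]

b0, b1 = fit_logistic(size_factor, used_waterfall)
odds_ratio = math.exp(b1)  # multiplicative change in odds per unit of the factor
p_small = sigmoid(b0 + b1 * -2.0)
p_large = sigmoid(b0 + b1 * 2.0)
print(f"slope={b1:.2f} odds_ratio={odds_ratio:.2f} "
      f"p_small={p_small:.2f} p_large={p_large:.2f}")
```

An odds ratio above 1 means that higher factor scores go along with higher odds of the method being used; a full model would include all five factors and report a significance test for each coefficient.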

Third, using the regression‑derived method‑selection probabilities, the authors construct a similarity matrix among methods and apply hierarchical agglomerative clustering (average linkage). This yields five “base clusters,” each comprising up to ten methods that share a common contextual profile. The clusters can be described as follows:

Cluster 1 – Large‑scale, regulation‑intensive projects (Waterfall, V‑Model, RUP, etc.).
Cluster 2 – Medium‑size, mixed‑domain projects (Scrum, Kanban, Scrumban).
Cluster 3 – Small, market‑driven, highly automated projects (DevOps, Lean, XP).
Cluster 4 – Innovation‑focused, research‑oriented projects (Design‑Thinking, Rapid‑Prototyping).
Cluster 5 – Maintenance‑and‑operations heavy environments (ITIL, Service‑Now).
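The clustering step can be sketched as a naive average-linkage agglomeration. This is an illustration under stated assumptions: the probability profiles are made up, Euclidean distance between profiles stands in for whatever similarity measure the authors used, and the helper names (`euclid`, `average_linkage`, `agglomerate`) are hypothetical.

```python
import math
from itertools import combinations


def euclid(a, b):
    """Euclidean distance between two probability profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def average_linkage(c1, c2, profiles):
    """Mean pairwise distance between the members of two clusters."""
    total = sum(euclid(profiles[a], profiles[b]) for a in c1 for b in c2)
    return total / (len(c1) * len(c2))


def agglomerate(profiles, k):
    """Merge the two closest clusters repeatedly until k clusters remain."""
    clusters = [[name] for name in profiles]
    while len(clusters) > k:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: average_linkage(clusters[p[0]], clusters[p[1]], profiles),
        )
        clusters[i] = clusters[i] + clusters[j]  # i < j, so deleting j is safe
        del clusters[j]
    return clusters


# Hypothetical selection probabilities of each method in three example
# contexts (large regulated, medium mixed, small market-driven).
profiles = {
    "Waterfall": [0.80, 0.30, 0.10],
    "V-Model": [0.75, 0.35, 0.10],
    "Scrum": [0.30, 0.70, 0.60],
    "Kanban": [0.25, 0.65, 0.55],
    "DevOps": [0.10, 0.40, 0.90],
    "XP": [0.15, 0.45, 0.85],
}
clusters = agglomerate(profiles, k=3)
print(clusters)
```

Average linkage is used here because the summary names it; the number of clusters is fixed in advance for simplicity, whereas a real analysis would typically inspect the dendrogram before choosing a cut.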

Finally, the paper performs a descriptive analysis of the practices (e.g., continuous integration, test automation, sprint retrospectives) associated with each cluster. By mapping practices to clusters, the authors illustrate how a practitioner can quickly assemble a context‑aware practice set, or even design a custom hybrid method by mixing elements from different clusters while respecting the underlying factor profile.
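One way to picture the practice-to-cluster mapping is to keep, for each cluster, the practices used by at least a given share of its projects. The sketch below assumes a hypothetical record layout and threshold; the project records, the "formal reviews" practice, and the `practice_set` helper are invented for illustration and do not come from the paper.

```python
from collections import Counter


def practice_set(projects, cluster, min_share=0.5):
    """Return practices used by at least min_share of the cluster's projects."""
    in_cluster = [p for p in projects if p["cluster"] == cluster]
    if not in_cluster:
        return []
    # set() ensures a practice is counted at most once per project.
    counts = Counter(pr for p in in_cluster for pr in set(p["practices"]))
    return sorted(pr for pr, n in counts.items() if n / len(in_cluster) >= min_share)


# Hypothetical project records (illustration only).
projects = [
    {"cluster": 3, "practices": ["continuous integration", "test automation"]},
    {"cluster": 3, "practices": ["continuous integration", "sprint retrospectives"]},
    {"cluster": 3, "practices": ["continuous integration", "test automation",
                                 "sprint retrospectives"]},
    {"cluster": 1, "practices": ["formal reviews", "test automation"]},
    {"cluster": 1, "practices": ["formal reviews"]},
]

core_cluster3 = practice_set(projects, 3, min_share=0.75)
core_cluster1 = practice_set(projects, 1)

# A hybrid candidate: union of the practice sets of two clusters.
hybrid = sorted(set(core_cluster1) | set(practice_set(projects, 3, min_share=0.6)))
print(core_cluster3, core_cluster1, hybrid)
```

Raising `min_share` yields a smaller, more consolidated core set, while taking the union across clusters is one simple reading of "mixing elements from different clusters".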

Key contributions include: (1) a data‑driven identification of the most influential contextual factors for method selection; (2) a systematic clustering of methods that serves as a foundation for constructing hybrid development approaches; and (3) a consolidated view of practice bundles that are empirically linked to specific context clusters.

The study acknowledges several limitations. The HELENA dataset relies on self‑reported survey data, which may introduce respondent bias and over‑representation of certain industries (e.g., finance, healthcare). Logistic regression assumes that each factor acts linearly and additively on the log‑odds of using a method, so it may miss non‑linear effects and interactions among factors. Moreover, the dataset is a static snapshot and does not capture how project contexts evolve over time. Suggested future work includes exploring non‑linear machine‑learning models (random forests, gradient boosting), incorporating time‑series data to model dynamic context changes, and validating the proposed hybrid method templates in real‑world case studies.
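The linearity limitation can be stated more precisely. The notation below is an assumption introduced for illustration (the summary does not give the paper's symbols): $y_m$ indicates use of method $m$, $F_1,\dots,F_5$ are the five factor scores, and $\beta_{mk}$ are the fitted coefficients. The first line is the standard additive logistic model; the second adds one hypothetical interaction term that the additive form cannot represent.

```latex
% Additive model: each factor shifts the log-odds independently,
% and exp(beta_{mk}) is the odds ratio for factor k.
\[
  \ln\frac{P(y_m = 1 \mid F)}{1 - P(y_m = 1 \mid F)}
  = \beta_{m0} + \sum_{k=1}^{5} \beta_{mk} F_k
\]

% Hypothetical extension with an interaction between F_1 and F_2:
% the effect of F_1 now depends on the value of F_2.
\[
  \ln\frac{P(y_m = 1 \mid F)}{1 - P(y_m = 1 \mid F)}
  = \beta_{m0} + \sum_{k=1}^{5} \beta_{mk} F_k + \gamma_m F_1 F_2
\]
```

Tree-based models such as random forests can pick up effects like the $\gamma_m F_1 F_2$ term without it being specified in advance, which is the motivation for the non-linear models listed as future work.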

In sum, the paper provides a rigorous, statistically grounded framework for linking project context to development method choice, thereby offering practitioners a practical roadmap for tailoring or hybridizing methodologies to fit the unique demands of their projects.

📜 Original Paper Content