Applying empirical software engineering to software architecture: challenges and lessons learned
In the last 15 years, software architecture has emerged as an important software engineering field for managing the development and maintenance of large, software-intensive systems. The software architecture community has developed numerous methods, techniques, and tools to support the architecture process (analysis, design, and review). Historically, most advances in software architecture have been driven by talented people and industrial experience, but there is now a growing need to systematically gather empirical evidence about the advantages or otherwise of tools and methods rather than just rely on promotional anecdotes or rhetoric. The aim of this paper is to promote and facilitate the application of the empirical paradigm to software architecture. To this end, we describe the challenges and lessons learned when assessing software architecture research that used controlled experiments, replications, expert opinion, systematic literature reviews, observational studies, and surveys. Our research will support the emergence of a body of knowledge consisting of the more widely accepted and well-formed software architecture theories.
💡 Research Summary
The paper addresses a critical gap in software architecture research: the lack of systematic empirical evidence supporting the myriad methods, techniques, and tools that have been introduced over the past fifteen years. While the field has historically advanced through practitioner experience and anecdotal success stories, the authors argue that a disciplined empirical approach is now essential for distinguishing truly effective practices from mere hype. To that end, the authors examine six major categories of empirical work that have been applied to software architecture—controlled experiments, replications, expert opinion studies, systematic literature reviews, observational studies, and surveys—and they document the specific challenges encountered in each category as well as the lessons learned from attempting to overcome those challenges.
In controlled experiments, the primary difficulty lies in the mismatch between the scale and complexity of academic testbeds and the realities of large‑scale industrial projects. This threatens external validity, making it hard to generalize findings. The authors recommend co‑designing experiments with industry partners, using real project artefacts, and openly sharing protocols and data to improve reproducibility. Replication studies suffer from insufficient reporting in the original work; without detailed experimental designs, data sets, and analysis scripts, reproducing results becomes a guessing game. The paper calls for mandatory open‑access repositories for all replication artefacts and a clear definition of success criteria before attempting a replication.
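The sample-size concern behind these validity threats can be made concrete with an a-priori power analysis. The sketch below is a minimal illustration, assuming Python's statsmodels library and placeholder effect-size values (not taken from the paper); it estimates how many participants a two-group experiment needs to detect a medium effect, which is often more than a classroom testbed can supply.

```python
# A-priori power analysis for a two-group architecture experiment:
# how many participants per group are needed to detect a given effect?
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()

# Assumed values (illustrative only): a medium-sized effect of the
# treatment (e.g., an architecture review method) on defect detection.
effect_size = 0.5   # Cohen's d, conventionally "medium"
alpha = 0.05        # significance level
power = 0.80        # desired probability of detecting a true effect

n_per_group = analysis.solve_power(effect_size=effect_size,
                                   alpha=alpha,
                                   power=power,
                                   alternative='two-sided')
print(f"Participants needed per group: {n_per_group:.0f}")  # ~64
```

A result like 64 participants per group helps explain why small academic studies struggle with statistical as well as external validity, and why the authors push for industry co-design.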
Expert opinion research is valuable for capturing tacit knowledge, yet it is vulnerable to selection bias and inconsistent elicitation methods. The authors suggest establishing transparent criteria for expert selection (e.g., years of experience, domain diversity) and employing structured techniques such as the Delphi method to reduce bias and improve consensus reliability. Systematic literature reviews (SLRs) face the problem of terminology heterogeneity in architecture research, which can cause relevant studies to be missed. The authors advocate for multi‑database searches, exhaustive keyword synonym lists, and adherence to PRISMA‑like reporting standards to ensure completeness and traceability.
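To make the synonym-list advice concrete, here is a minimal sketch of how an SLR search string can be assembled from synonym groups; the terms and the OR-within, AND-between boolean structure are illustrative choices, not drawn from the paper.

```python
# Building a boolean search string from synonym lists, so that
# terminology variants in architecture research are not missed.
# The synonym groups below are illustrative, not taken from the paper.
synonym_groups = [
    ["software architecture", "architectural design", "architecture description"],
    ["empirical study", "controlled experiment", "case study", "survey"],
    ["evaluation", "assessment", "validation"],
]

# Each group is OR-ed internally; groups are AND-ed together, matching
# the query syntax accepted by most bibliographic databases.
query = " AND ".join(
    "(" + " OR ".join(f'"{term}"' for term in group) + ")"
    for group in synonym_groups
)
print(query)
```

Keeping the synonym lists as explicit data also gives the traceability that PRISMA-like reporting standards ask for: the exact query can be published alongside the review.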
Observational studies provide rich, context‑sensitive data but are limited by access constraints, observer effects, and difficulties in quantifying qualitative observations. The paper recommends blind observation protocols, pilot studies to refine coding schemes, and mixed‑methods analysis to translate observations into measurable constructs. Survey research, while scalable, often suffers from ambiguous question wording, low response rates, and non‑representative samples. The authors propose rigorous pilot testing, the use of validated Likert scales, incentives for participants, and stratified sampling to enhance validity and reliability.
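As one way to ground the stratified-sampling recommendation, the following standard-library sketch draws the same fraction from each stratum of a hypothetical sampling frame; the strata (respondent roles) and frame are invented for illustration.

```python
import random
from collections import defaultdict

# Hypothetical sampling frame: (respondent_id, stratum) pairs, where a
# stratum might be a role such as "architect", "developer", or "manager".
frame = [(i, random.choice(["architect", "developer", "manager"]))
         for i in range(1000)]

def stratified_sample(frame, fraction, seed=42):
    """Draw the same fraction from every stratum so that the sample
    mirrors the composition of the population."""
    rng = random.Random(seed)
    by_stratum = defaultdict(list)
    for unit, stratum in frame:
        by_stratum[stratum].append(unit)
    sample = []
    for units in by_stratum.values():
        k = max(1, round(len(units) * fraction))
        sample.extend(rng.sample(units, k))
    return sample

print(len(stratified_sample(frame, fraction=0.10)))  # ~100 respondents
```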
Beyond methodological specifics, the authors present a conceptual framework for integrating empirical findings into a coherent body of software architecture theory. They introduce an “Evidence Hierarchy” that ranks empirical contributions from controlled experiments (highest) down to expert opinion (lowest), and they propose a “theory‑empiricism feedback loop” whereby emerging architectural theories are continuously tested, refined, or discarded in light of new empirical data. This loop is intended to foster a self‑correcting knowledge ecosystem, moving the field away from anecdotal justification toward scientifically grounded principles.
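The summary specifies only the endpoints of the hierarchy (controlled experiments highest, expert opinion lowest), so the encoding below is a hypothetical sketch: the intermediate ranks and the conflict-resolution rule are assumptions made purely for illustration.

```python
from enum import IntEnum

# One possible encoding of an evidence hierarchy: higher values indicate
# stronger evidence. Only the two endpoints come from the paper; the
# ordering of the intermediate levels is an assumption.
class Evidence(IntEnum):
    EXPERT_OPINION = 1
    OBSERVATIONAL_STUDY = 2
    SURVEY = 3
    SYSTEMATIC_REVIEW = 4
    REPLICATION = 5
    CONTROLLED_EXPERIMENT = 6

def strongest(findings):
    """Given (claim, Evidence) pairs, keep the best-supported instance
    of each claim, a toy version of how a feedback loop might prefer
    higher-ranked evidence when theories conflict."""
    best = {}
    for claim, level in findings:
        if claim not in best or level > best[claim]:
            best[claim] = level
    return best

findings = [("reviews find more defects", Evidence.EXPERT_OPINION),
            ("reviews find more defects", Evidence.CONTROLLED_EXPERIMENT)]
print(strongest(findings))
```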
Finally, the paper outlines a roadmap for researchers and practitioners: (1) embed industry collaboration early in study design; (2) make all raw data, analysis scripts, and experimental protocols publicly available; (3) structure results in a meta‑analytic friendly format; (4) strengthen peer‑review criteria for empirical rigor in conferences and journals; and (5) incorporate empirical methods into software architecture curricula to train the next generation of scholars.
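Roadmap item (3) can be illustrated with a structured result record; the fields below reflect what meta-analyses commonly pool (effect size, sample size, context) and are an assumption rather than a format prescribed by the authors.

```python
from dataclasses import dataclass, asdict
import json

@dataclass
class StudyResult:
    """One experimental outcome, recorded with the fields a later
    meta-analysis needs to pool results across studies. The field
    choice is illustrative, not prescribed by the paper."""
    study_id: str
    treatment: str        # e.g., the architecture method under test
    outcome_metric: str   # e.g., "defects found per hour"
    effect_size: float    # standardized effect (Cohen's d)
    sample_size: int
    p_value: float
    context: str          # "industrial" vs. "academic" setting

result = StudyResult(
    study_id="exp-2009-01",                      # hypothetical study
    treatment="scenario-based architecture review",
    outcome_metric="defects found per hour",
    effect_size=0.42,
    sample_size=48,
    p_value=0.03,
    context="academic",
)
print(json.dumps(asdict(result), indent=2))
```

Publishing results in a machine-readable form like this is what makes the later aggregation steps, and ultimately the evidence hierarchy itself, workable in practice.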
In sum, the paper provides a comprehensive, practice‑oriented guide to applying empirical software engineering to software architecture. By cataloguing challenges, proposing concrete mitigation strategies, and offering a higher‑level integration framework, it aims to accelerate the emergence of a well‑validated, evidence‑based body of software architecture knowledge that can reliably inform both research and industry practice.