Large-scale Complex IT Systems
This paper explores the issues around the construction of large-scale complex systems which are built as ‘systems of systems’ and suggests that there are fundamental reasons, derived from the inherent complexity in these systems, why our current software engineering methods and techniques cannot be scaled up to cope with the engineering challenges of constructing such systems. It then goes on to propose a research and education agenda for software engineering that identifies the major challenges and issues in the development of large-scale complex, software-intensive systems. Central to this is the notion that we cannot separate software from the socio-technical environment in which it is used.
💡 Research Summary
The paper argues that the rapid growth of large‑scale, software‑intensive systems—particularly those formed by independently owned and managed components—has outpaced the capabilities of traditional software engineering. Using the 2010 “Flash Crash” in U.S. equity markets as a vivid illustration, the authors show how algorithmic trading systems, each operated by separate firms, interacted in an unforeseen way that caused a market‑wide failure. This failure was not due to a coding bug but to emergent behavior arising from a coalition of systems that had no single owner, no unified design authority, and often competing or even hostile interests.
To capture this reality the authors introduce the term “coalition of systems,” distinguishing it from the more familiar “system‑of‑systems” (SoS) literature, which usually assumes a single organization integrates the constituent parts. In a coalition, components are voluntarily linked by agreed protocols, yet they retain autonomy, can be replaced unilaterally, and may change their behavior for political or economic reasons. Consequently, the three reductionist assumptions that underpin most software engineering—single ownership, rational technical decision‑making, and clearly bounded problems—are violated.
The paper further separates complexity into two categories. Inherent complexity stems from dynamic relationships among components (e.g., trust, contracts, real‑time data exchange) and from the evolving operational environment. These relationships are non‑deterministic, making system‑level properties emergent and unpredictable. Epistemic complexity arises from the limits of human knowledge and tooling; as system size grows, traceability between requirements, design, and tests becomes impractical, and existing analysis tools cannot keep pace. Traditional software engineering has been successful at reducing epistemic complexity through modularization, object‑oriented design, and test‑first approaches, but these techniques assume low inherent complexity and centralized control.
Given the inadequacy of reductionist methods for coalitions, the authors propose a research and education agenda focused on outward‑looking, interdisciplinary solutions. Key elements include:
- Dynamic, real‑time modeling and simulation of inter‑system interactions to explore “what‑if” scenarios and anticipate emergent failures.
- Protocol and interface design that explicitly accounts for political, economic, and trust considerations, not just technical correctness.
- Governance frameworks that provide soft authority (e.g., standards bodies, regulatory incentives) to align independently managed components without imposing a single owner.
- Multi‑disciplinary collaboration involving computer scientists, systems engineers, economists, sociologists, and policy experts, as exemplified by the US SEI Ultra‑Large Scale Systems (ULSS) consortium and the UK Large‑Scale Complex IT Systems (LSCITS) initiative.
- New analytical tools such as formal verification for emergent properties, adaptive infrastructure that can reconfigure in response to changing relationships, and metrics for measuring both inherent and epistemic complexity.
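The first agenda item, simulating inter-system interactions to explore "what-if" scenarios, can be illustrated with a deliberately tiny toy model. The sketch below is not taken from the paper and is not a model of the actual Flash Crash; the `Agent` class, the `simulate` function, the stop-loss rule, and the fixed price `impact` per sale are all hypothetical simplifications chosen for illustration. It shows only how independently operated components, each following a locally sensible rule, can combine into a system-level cascade that no single component's code would reveal.

```python
from dataclasses import dataclass


@dataclass
class Agent:
    """One independently operated trading system (hypothetical toy model)."""
    stop_loss: float        # the agent sells once the price falls below this
    has_sold: bool = False  # each agent sells at most once


def simulate(initial_price: float, shock: float,
             thresholds: list[float], impact: float) -> tuple[float, int]:
    """Apply a one-off price shock, then let agents react until nothing changes.

    Returns the final price and the number of agents that sold.
    Assumption: every sale lowers the price by the same fixed `impact`.
    """
    agents = [Agent(stop_loss=t) for t in thresholds]
    price = initial_price - shock
    changed = True
    while changed:
        changed = False
        for agent in agents:
            if not agent.has_sold and price < agent.stop_loss:
                agent.has_sold = True
                price -= impact  # this sale may push the price below other thresholds
                changed = True
    return price, sum(agent.has_sold for agent in agents)


if __name__ == "__main__":
    thresholds = [99.0, 98.0, 97.0, 96.0, 95.0]
    # "What-if" comparison: two shocks that differ by a single price unit.
    for shock in (0.5, 1.5):
        final_price, sellers = simulate(100.0, shock, thresholds, impact=1.5)
        print(f"shock={shock}: final price={final_price}, agents that sold={sellers}")
```

Under these assumed parameters, the smaller shock triggers no sales, while the slightly larger one trips the first stop-loss and every subsequent sale trips the next. Each agent behaves exactly as specified, so the failure belongs to the coalition rather than to any individual component.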
The authors stress that incremental improvements to existing methods will not be sufficient on their own: breakthrough research is needed, although practical, incremental advances driven by real‑world problems should be pursued alongside it. Ultimately, the paper calls for a paradigm shift in software engineering: from a focus on building isolated, well‑defined monoliths to engineering resilient, adaptable coalitions of systems that can safely operate in the complex socio‑technical ecosystems that dominate modern society.