Loosely-Structured Software: Engineering Context, Structure, and Evolution Entropy in Runtime-Rewired Multi-Agent Systems
Authors: Weihao Zhang, Yitong Zhou, Huanyu Qu, Hongyi Li
Weihao Zhang, The Hong Kong University of Science and Technology, Hong Kong SAR, China
Yitong Zhou, AI Chip Center for Emerging Smart Systems, Hong Kong SAR, China
Huanyu Qu, University of Macau, Macau SAR, China
Hongyi Li, Tsinghua University, China

Abstract

As LLM-based multi-agent systems (MAS) become more autonomous, their free-form interactions increasingly dominate system behavior. However, scaling the number of agents often amplifies context pressure, coordination errors, and system drift. It is well known that building robust MAS requires more than prompt tuning or increased model intelligence. It necessitates engineering discipline focused on architecture to manage complexity under uncertainty. We characterize agentic software by a core property: runtime generation and evolution under uncertainty. Drawing upon and extending software engineering experience, especially object-oriented programming, this paper introduces Loosely-Structured Software (LSS), a new class of software systems that shifts the engineering focus from constructing deterministic logic to managing the runtime entropy generated by View-constructed programming, semantic-driven self-organization, and endogenous evolution. To make this entropy governable, we introduce design principles under a three-layer engineering framework: View/Context Engineering to manage the execution environment and maintain task-relevant Views, Structure Engineering to organize dynamic binding over artifacts and agents, and Evolution Engineering to govern the lifecycle of self-rewriting artifacts. Building on this framework, we develop LSS design patterns as semantic control blocks that stabilize fluid, inference-mediated interactions while preserving agent adaptability.
Together, these abstractions improve the designability, scalability, and evolvability of agentic infrastructure. We provide basic experimental validation of key mechanisms, demonstrating the effectiveness of LSS.

Keywords: Multi-agent Systems, LLM Agents, Software Engineering, Design Patterns

1 Introduction

Large language model (LLM) multi-agent systems (MAS) are experiencing a "Cambrian explosion" [4, 13, 29, 33, 47, 51, 54]. Across academia and industry, teams are assembling systems that search, plan, code, and iterate, aiming to turn probabilistic models into reliable and productive infrastructure. Powered by semantic understanding and intelligent reasoning, LLM agents can synthesize information, propose actions, and coordinate across tools. Yet progress in MAS has also exposed a hard limit: capability does not scale linearly with the number of agents [18]. In practice, larger teams often amplify errors and coordination overhead, and some claims argue that simpler agent architectures can better align with user intent [31]. A critical gap persists: while prototypes dazzle with emergent capabilities, production systems often hit a ceiling of complexity where additional agents add more failures than utility.

Figure 1: The engineering focus shifts from designing specific logical code to designing runtime generation and evolution.

Much of the current MAS literature focuses on building "more intelligent" orchestration frameworks [16, 43, 55, 56]. Increasingly, however, the dominant failures are not merely a matter of prompt tuning or base model intelligence, but an engineering problem of architecture (recently emerging OpenClaw-/NanoClaw-like architectures [29, 33] also reflect this trend): the discipline required to manage complexity, interfaces, and drift under uncertainty.
Accordingly, this paper treats MAS as a new kind of software system and develops an engineering perspective that draws from, adapts, and extends conventional software engineering principles and patterns. MAS need not be treated as an opaque "intelligence-on-intelligence" black box; rather, they can be approached as software systems that can be designed, governed, tested, and evolved.

Engineering this new kind of software requires confronting its core property: runtime generation and evolution under uncertainty. Classic software architecture assumes build-time modular decomposition and slow-changing boundaries. Agentic software violates these assumptions through three coupled "physics":

• View-Constructed Programming: Unlike modular software where logic is encapsulated in code, an agent's effective program is determined by a step-specific View constructed from global Artifact files (e.g., system prompts, skills, plans, tools, memories) and projected into its context window at runtime [21, 27, 34].

• Runtime-Generated Structure, Abstraction, and Semantic Binding: Unlike code-defined linking, component connections are formed dynamically via semantic understanding. The system's connectivity and abstraction can be generated on the fly via agent/tool selection, temporary team organization, and interfaces negotiated in natural language rather than fixed function signatures [13, 15, 38, 42, 51].

• Endogenous Evolution: The system's executable substrate (Artifact files that mediate behavior in-context) is itself rewritable by the system, and thus forms the foundation of adaptation and self-improvement [24, 43, 46].

This paper defines Loosely-Structured Software (LSS) as a software paradigm characterized by these properties.
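To make the third property concrete, a minimal sketch of Endogenous Evolution: behavior lives in a readable artifact file (a hypothetical skill.md), and the system changes behavior by rewriting that artifact. The deterministic append below is a stand-in for an LLM-proposed rewrite, not an actual implementation from this paper.

```python
import pathlib
import tempfile

# A skill artifact on disk: the system's executable substrate is a file.
workdir = pathlib.Path(tempfile.mkdtemp())
skill = workdir / "skill.md"  # hypothetical artifact name
skill.write_text("# Search skill\n1. Query the index.\n")

def evolve_artifact(path: pathlib.Path, lesson: str) -> None:
    """Rewrite the artifact in place, appending a lesson distilled from a
    trajectory. In a real system the new text would be LLM-generated."""
    path.write_text(path.read_text() + f"2. {lesson}\n")

evolve_artifact(skill, "Retry with narrower keywords if zero hits.")
print(skill.read_text())
```

Because the skill is projected into future Views, the rewritten file immediately changes downstream behavior without any code redeploy.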
While many existing frameworks provide valuable structure through deterministic pipelines to ensure reliability (e.g., LangChain, AutoGPT) [4, 54], LSS explores an alternative path by treating runtime variability as a first-class design dimension (Figure 1). For engineers trained in traditional software development, this gap signals a deeper shift. Instead of writing deterministic logic behind stable interfaces, one must engineer the conditions under which agents construct Views, bind to capabilities, and revise their own Artifacts.

A recurring challenge is deciding what to constrain and what to leave flexible. Traditional software design workflows favor stable abstractions (protocols, intermediate representations) and stable topologies (e.g., classic Object-Oriented Programming, OOP, patterns). In LSS, however, abstractions and architecture are often unstable at design time and can be generated and revised at runtime. Moreover, because agents interpret the same Artifact through task-driven perspectives, the abstraction can be viewpoint-dependent rather than objective. Addressing these issues cannot rely solely on adding more higher-level wrappers.

Beyond these design differences, practical MAS characteristics further complicate engineering: (i) View-boundedness (context budgets make context pollution a main failure source) [10, 21, 27], (ii) collaboration cognitive gaps (agents often lack explicit supply–demand contracts, so coordination devolves into semantic probing) [1, 38, 51], and (iii) persistent-memory and Artifact maintenance (large, evolving stores of skills, traces, and long-term memories accumulate operational debt and require coherent management) [34, 53, 58]. Together, these characteristics drive the poor scalability observed in practice: as more agents are added, the system bears greater "context pressure" and a growing communication tax without proportional gains in task success.
Figure 2 summarizes the key differences between LSS and traditional software in both behavior and system features. These differences shift the engineering objective from "constructing deterministic logic" to "managing runtime entropy distribution": guiding the growth of the system itself. We introduce a three-layer framework that systematizes this entropy management: Layer 1 (View/Context Engineering) manages the execution environment and attention span (taming Context Entropy); Layer 2 (Structure Engineering) manages the organization of artifacts and agents to enable dynamic capability discovery and binding (taming Self-Organization Entropy); and Layer 3 (Evolution Engineering) manages the lifecycle of self-rewriting artifacts (taming Evolutionary Entropy).

Building on this framework, we also adapt reusable software design wisdom—especially from OOP [9]—and recast it as dynamic engineering mechanisms for regulating information flow and distribution. Unlike OOP design patterns that typically map to static class hierarchies and compile-time relationships, LSS Design Patterns (e.g., Semantic Router, Mediator) operate as semantic control blocks that stabilize fluid, inference-mediated interactions. These patterns transform structural uncertainty into a feature that increases designability—ensuring that an intuitively designed agent organization yields expected behaviors; scalability—where "more agents" translates to higher task completion rather than increased noise; and evolvability—where the system can adapt to task environments and grow its own structure and capabilities through managed self-rewriting.

This paper makes the following contributions:

• We define Loosely-Structured Software (LSS) as a distinct paradigm, contrasting its dynamic "physics" with traditional static software engineering.
• We propose a Three-Layer Framework (View, Structure, Evolution) that systematizes the management of agent context, self-organization, and evolution entropy respectively.

• We formalize some reusable Design Principles and Design Patterns that provide concrete mechanisms for stabilizing fluid interactions, and we discuss how these logical patterns can be mapped into physical agent realizations through practical design strategies.

• We demonstrate through Workflow Examples how LSS principles solve common failure modes like context overflow and binding hallucination with multi-agent evaluation.

This paper is intended to be complementary to existing MAS research rather than a replacement. Some of the design principles and patterns discussed in this paper build on established ideas in the MAS and agentic-systems literature and practice. We reframe and extend them through a software engineering lens to recapture design intuitions. This also implies the limitations of this work. The goal of this paper is to provide a higher-level design language, not an exhaustive catalog of implementation techniques or a detailed treatment of the technical pitfalls of each pattern across frameworks and deployments. We present experimental demonstrations in the Workflow Examples to illustrate how the proposed principles and patterns address common architectural concerns, but a comprehensive pattern-by-pattern implementation is beyond the scope of this paper.

2 Background and Related Work

2.1 Basic Concepts of Multi-Agent Systems

Multi-agent systems (MAS) are composed of autonomous entities that interact to solve problems beyond the capabilities of individual agents. In contemporary LLM-based systems, an agent is commonly modeled as an LLM-centered control loop equipped with four key modules: Profile, Memory, Planning, and Action [47].
This abstraction is useful because it separates (i) what the agent is supposed to be (role and constraints), (ii) what it knows (task-relevant context), (iii) how it decides (goal decomposition and selection), and (iv) what it can do (tool- and environment-facing operations).

Figure 2: Loosely-Structured Software: a software system with unstable, runtime-generated structure & abstraction and its dominant engineering focuses (View, Binding, Evolution).

• Profile: Defines the role, persona, and constraints of the agent, often encoded in system prompts.

• Memory: Stores interaction history and knowledge, ranging from short-term working context to long-term vector stores or knowledge graphs.

• Planning: Decomposes complex user goals into executable subtasks and selects next steps under uncertainty (e.g., ReAct [56]).

• Action: Enables the agent to perceive and affect its environment via external tools and APIs, where tool schemas, permissions, and side effects become part of the execution boundary.

In production stacks, these conceptual modules map to concrete architectural components: agents (execution units), tools (external services), and skills (reusable procedures or playbooks encoded as prompts and structured steps).

Recent interoperability protocols further clarify these surfaces. The Model Context Protocol (MCP) standardizes how an agent host connects to external tool and data servers through a discoverable, typed interface, making "tool access" a portable capability across applications [14, 28]. Complementarily, Agent2Agent (A2A) standardizes how agents advertise capabilities and coordinate work across boundaries (e.g., long-running tasks and modality negotiation), making "agent access" portable across heterogeneous agentic applications [1].
Together, MCP and A2A expose a layered interoperability stack, while skills remain lightweight, in-context procedures that guide how tools are invoked and how agents collaborate. A practical distinction in this architecture is the trade-off between stability and agility, particularly between tool interfaces and skill procedures. Tools stabilize delivery via explicit schemas and implementations, while skills are agile, natural-language procedures executed in-context and frequently regenerated.

2.2 Related Work

LSS differentiates itself by addressing the architectural mismatch in current agentic engineering: the attempt to constrain probabilistic, fluid intelligence within static, deterministic software patterns. We categorize existing work through this lens.

Loose Coupling Software. Loose coupling is an established concept in software engineering, spanning modular decomposition and information hiding [36] through Service-Oriented Architecture (SOA) and microservices [6]. It aims to reduce change propagation by minimizing shared assumptions, narrowing dependency surfaces, and mediating interactions through stable abstractions; classic guidance frames this in terms of cohesion–coupling trade-offs [45]. In distributed systems, loose coupling is also discussed via interaction styles that decouple components in time and space (e.g., publish/subscribe), reducing direct knowledge and synchronization requirements between producers and consumers.

In many practical stacks, loose coupling is operationalized through explicit, versioned interface contracts (e.g., IDLs, schemas, REST-style APIs). Such contracts stabilize syntactic interoperability, yet they also introduce coupling through shared definitions and compatibility management, especially under ongoing API evolution [19, 57].
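To ground this contrast, a minimal sketch sets an explicit versioned contract next to runtime semantic routing. All names are hypothetical, and keyword overlap stands in for the LLM- or embedding-based matching a real semantic router would use.

```python
from typing import Protocol

# Conventional loose coupling: an explicit, versioned interface contract.
# Caller and implementer share this definition, so evolving it requires the
# compatibility management discussed above.
class SummarizerV1(Protocol):
    def summarize(self, text: str, max_words: int) -> str: ...

# Semantic-layer coupling: the caller states an intent in natural language
# and a router resolves it at runtime against capability descriptions,
# with no shared type definition between the two sides.
def route_intent(intent: str, capabilities: dict[str, str]) -> str:
    """Pick the capability whose description best overlaps the intent.
    Keyword overlap is a deterministic stand-in for semantic matching."""
    def overlap(desc: str) -> int:
        return len(set(intent.lower().split()) & set(desc.lower().split()))
    return max(capabilities, key=lambda name: overlap(capabilities[name]))

capabilities = {
    "summarizer": "summarize long text into short bullet points",
    "translator": "translate text between languages",
}
chosen = route_intent("please summarize this long report", capabilities)
print(chosen)  # -> summarizer
```

The contract version pins both sides to a shared definition; the router pays no such definition cost, but its binding is probabilistic and must be governed rather than type-checked.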
LSS emphasizes decoupling at the semantic layer: components coordinate through natural-language intents and runtime interpretation, enabling probabilistic Semantic Binding beyond fixed contracts.

Context Engineering. Research in retrieval-augmented generation (RAG) [20] and long-context management (e.g., MemGPT [34]) studies how to assemble task-relevant information for LLMs under context-window constraints, often discussed as context engineering in recent surveys [27]. This includes retrieval and reranking, context compaction (summarization/compression), and structured memory mechanisms, as well as variants that interleave retrieval and generation with self-critique [2]. It is also motivated by empirical findings on long-context utilization (e.g., attention degradation and "lost in the middle" effects), which influence how context should be ordered, scoped, and refreshed during multi-step execution [21].

Orchestration Frameworks. Frameworks such as LangChain [4], AutoGen [51], and MetaGPT [13] provide reusable primitives for defining agents, tools, and message passing, while DSPy [16] explores declarative compilation and optimization of prompt pipelines. This line of work motivates practical design choices around interaction topology (e.g., graphs, pipelines, routers) and the mechanics of routing, delegation, and tool invocation.

Autonomous Agents. Autonomous agent systems such as AutoGPT [54] and BabyAGI explore recursive self-prompting and tool-augmented task execution, while later work such as Voyager [46] and Generative Agents [35] studies long-horizon learning and open-ended social simulation. Across these systems, a recurring engineering concern is how to represent and update agent state (e.g., memories and skill libraries) and how to incorporate verification and rollback mechanisms into long-running loops.

Harness Engineering.
The recent notion of harness engineering frames agentic software development as the design of the surrounding environment that makes agents reliable: repository-local documentation, deterministic guardrails (linters, structural tests), automated feedback loops, and runtime telemetry that agents can inspect while iterating. OpenAI reports using such a harness to build and maintain a large codebase with Codex agents, with the harness spanning tests, integration, documentation, and observability [32]. Fowler further discusses harnesses as a practical vocabulary for the tooling and practices that keep agents "in check" while enabling sustained iteration at scale [8].

Summary. While existing work excels at providing the mechanisms (how to call an LLM, how to store vectors), LSS seeks to synthesize these diverse engineering practices into a unified architectural vocabulary. It aims to provide a complementary layer of formalization that helps developers reason about system-level entropy.

3 Definition of Loosely-Structured Software

In this section, we formalize the concept of Loosely-Structured Software (LSS) to distinguish it from traditional software. We first identify four conceptual features that characterize LSS systems. Based on these features, we then introduce a system model that captures the essential runtime dynamics of such systems. Finally, we outline a three-layer conceptual framework that organizes the core engineering problems of LSS.

3.1 Conceptual Features

We describe a system as Loosely-Structured Software (LSS) when it exhibits the following four conceptual features:

View-Constructed Programming. The system controls execution by repeatedly constructing a step-specific View from global information.
Under context window constraints, it selects, filters, and projects the information scope for the current step, rather than operating directly on the full dataset or statically encoding a fixed information scope per step (such as explicitly restricting the scope of a variable in the code in traditional programming).

Runtime Semantic Binding. The invocation relationships between artifacts are not determined solely by hardcoded linking (such as function calls, inheritance, composition, or message passing). Instead, the system employs Semantic Routing to dynamically bind agents and intents to artifacts and construct execution and information flow paths at runtime, achieving pure runtime-defined binding of targets.

Figure 3: Comparison of Architectural Paradigms: Traditional Software vs. LSS. Left: Traditional software relies on compile-time program structure to constrain runtime behavior. Right: LSS makes behavior depend on the runtime trajectory of Views and Artifacts, which induces the effective agent configuration over time.

Endogenous Evolution. Core logic is encoded in readable and rewritable Artifacts (e.g., skill files) rather than immutable code. The system can programmatically read, analyze, and Rewrite these Artifacts at runtime, enabling behavior change by modifying the Artifacts themselves.

Dynamic Abstraction Conversion. Abstraction boundaries are not statically fixed. The same artifact can be used in different roles across Views—serving as an interface definition, an intermediate contract, or concrete implementation for analysis—with abstractions generated and reinterpreted on demand as the execution context shifts.

3.2 System Model: Runtime Elements and Primitives

Agent frameworks often model an agent as a bundle of files, with the LLM abstracted as a background reasoning engine.
In practical agentic execution, behavior is trajectory-dependent: what the agent can do at step t is shaped by the accumulated interaction history and, critically, by the step-specific information that the system constructs from files and injects into the model. In LSS, an agent's capability, stability, and identity are therefore realized via step-by-step construction rather than being fixed as a single predeclared program. To formalize this runtime, we describe LSS using four fundamental elements: Intent, Global Artifacts, View, and Output.

1. Four Core Runtime Elements. For formalization, we model the LSS from the perspective of ongoing interaction trajectories, where t denotes the discrete interaction step. At any step t, the system dynamics are governed by:

• Artifacts (A_t): The global collection of all persistent artifacts at step t. The LLM is abstracted away as part of the default infrastructure. We view LSS as a collection of files; we call files that shape the system's capabilities and behavior artifacts. Artifacts can include reusable prompts (e.g., system prompts, agent.md), skills (e.g., skill.md), plans, code, tool registries (e.g., MCP), contracts (e.g., A2A), traces, documents, databases, and memories used by some MAS frameworks [33]. They also include artifacts introduced in this paper (e.g., index.md for defining the file system, contract.md and team.md for agent collaboration, fork.md for agentic inheritance, evolve.md for system evolution, lens.md and route.md for semantic binding, task.md in the automatic task CI/CD). Artifacts can be viewed as the system's "hard drive" at step t. They are indexed by t because the system can rewrite artifacts, causing the global capability space to evolve over time.
• Intent (I_t): The explicit query or driving force of the current computation step (e.g., an external user instruction, a subtask invoked from another agent, or a self-driven intent). It acts as the semantic input of the "Agent program".

• View (V_t): The specific, transient projection dynamically assembled from A_t and injected into the LLM's context window. It acts as the system's "active RAM." Logically, V_t is distinct from I_t: while I_t is the objective to be solved, V_t is the step-specific execution scope required to solve it (a step-level executable program). However, the boundary between them can be subjective, depending on the agent's perspective when executing the task.

• Output (O_t): The contents or actions generated by the LLM and the automatic environmental feedback (e.g., tool execution results). It closes one interaction step.

2. Architectural Inversion: Instances Inducing Classes. By tracking these four elements over time (Figure 3), we reveal a fundamental architectural inversion between traditional object-oriented software and LSS. In traditional OOP, a developer usually statically defines a Class in the codebase, which then explicitly spawns execution Instances at runtime. In LSS, the relationship can be inverted. First, multiple interaction steps define an Agent Instance as the complete context trajectory (τ) of its execution up to step n:

Agent Instance (τ_n) ≡ {(V_0, I_0, O_0), (V_1, I_1, O_1), ..., (V_n, I_n, O_n)}    (1)

This trajectory captures the specific, ongoing physical execution of an agent solving a problem. We ignore cases where an agent framework fine-tunes LLM weights. LSS conceptually abstracts the LLM as a general-purpose, uniform reasoning engine whose behavior is conditioned by the runtime View sequence. Consequently, the structural identity, capabilities, and cognitive boundaries of the agent—its Agent Class—are not fixed upfront.
Instead, the Agent Class is defined by the View sequence it is exposed to, which is dynamically induced by the Agent Instance:

Agent Class ≡ {V_0, V_1, ..., V_n}    (2)

This suggests a shift in perspective within the LSS paradigm: the 'Agent Class' can be viewed as an emergent property induced by its runtime Context trajectory. Even if two agent instances start with the identical initial profile in A_t, a divergence in their runtime View sequences (e.g., one retrieving clean API schemas and the other retrieving polluted conversational noise) means they may mutate into entirely different Agent Classes mid-execution. Some rule-based software can also generate types or behaviors at runtime via mechanisms such as reflection and metaprogramming [17, 25], but most mainstream OOP design patterns are not built on the assumption that class structure is routinely generated at runtime. In LSS, by contrast, this kind of runtime-induced structure is a core characteristic.¹

Comparison:
• Traditional software: Class defines instances.
• LSS: Agent instances define class.

3. The LSS Execution Cycle. Based on this formalization, the continuous execution cycle of an LSS system transitions the state from step t to t+1 and can be modeled via a set of primitives, which dynamically operate over the global Artifacts, the current Intent, and the historical context (let τ_{<t} = {(V_0, I_0, O_0), ..., (V_{t-1}, I_{t-1}, O_{t-1})} denote the trajectory up to step t-1):

(1) Perception (View Construction): V_t = Project(A_t, I_t, τ_{<t}). The Project primitive assembles a step-specific View through a coupled selection process: I_t and τ_{<t} jointly shape what gets retrieved and prioritized from A_t, and what ultimately gets included in the resulting V_t.
Note that every procedure in LSS can be semantic: Project not only selects relevant Artifacts, but can also transform them (e.g., compressing, filtering, or merging information) based on the Intent. In practice, Project typically requires only a subset of, or a compressed representation of, τ_{<t}.

(2) Decision & Action (Execution): O_t = Execute(LLM(V_t, I_t, τ_{<t})). The underlying LLM takes the constructed View V_t, the explicit objective I_t, and the historical context τ_{<t} as input. The Execute primitive resolves and performs the agent's inference and carries out any accompanying operations (e.g., executing a Python script, calling an API) to produce the output result O_t.

(3) Evolution (Artifact Update): A_{t+1} = Update(A_t, τ_t). The Update primitive uses the current trajectory τ_t (including the latest (V_t, I_t, O_t)) to modify the system's Artifacts (e.g., rewriting a skill file or recording a distilled trace), evolving the global artifact space A for the future.

(4) Propagation (Intent Formulation): I_{t+1} = Formulate(τ_t). The Formulate primitive analyzes the current context to derive or generate explicit Intents for either itself or other agents. This entails breaking down a complex result into a new subtask, issuing a self-correction incentive, or formulating a query for another agent.

Crucially, in an LSS architecture, these primitives do not have to be executed sequentially by a single monolithic entity. For instance, in Figure 4, a "Lens" might execute the Project primitive to construct a V_t, which is then handed over to a "Worker" to Execute. Similarly, the I_{t+1} generated by the Formulate primitive can be targeted at an entirely different agent.
This also implies that the same information can play different roles for different agents: what is an Output for one agent can become part of the View for another agent (e.g., the Router's O_t becomes the Worker's V_t). In Section 7, we detail the mapping between these roles and the concrete agents in our implementation. Multiple roles can be embedded into one agent based on the Semantic Cohesion Principle. Before introducing that, we can view each role as a logically isolated agent for simplicity.

¹ To clarify the conceptual shift, we contrast LSS with a stylized model of traditional software—acknowledging that real-world systems often exhibit hybrid behaviors.

Figure 4: Multi-agent interaction in LSS, represented using UML-style sequence diagram notation. The diagram illustrates how the primitives decompose the loosely-structured runtime.

Figure 5: Three-layer engineering framework for LSS: View–Context, Structure–Capability, and Evolution–Adaptation.

Together, these runtime elements and primitives make the structure and abstraction dynamics of LSS universal. Unlike traditional software, where developers mostly precisely predefine control logic and information flow, LSS accepts runtime uncertainty induced by View construction, semantic binding, and Artifact evolution. Engineering LSS therefore shifts from exact control to semantic entropy management. Concretely, we identify three forms of runtime entropy: Context Entropy (Section 4), Self-Organization Entropy (Section 5), and Evolutionary Entropy (Section 6). It can be imagined that if an LSS system owns thousands of artifacts or hundreds of agents, the runtime uncertainty will be extremely large and lead to system failure [18]. Figure 5 organizes these challenges into three layers (View–Context, Structure–Capability, and Evolution–Adaptation).
The following sections discuss possible engineering principles and patterns for taming each entropy respectively.

4 View/Context Engineering: Taming Context Entropy

The first layer concerns the immediate execution environment, shifting the engineering focus from designing Control Flow to designing Context Flow. In traditional software, the runtime environment (stack, heap, variable scope) is managed by the OS and compiler. In LSS, the runtime environment for each reasoning step is the information projected into the model's View: whatever enters the View participates in execution by activating capabilities (tools and skills), constraining behavior (policies), and biasing decisions (examples and retrieved memories). This is View-Constructed Programming: rather than executing a fixed program against different inputs, the system repeatedly assembles a step-specific "executable slice" from a large Artifact pool (prompts, skills, documents, traces) and then runs the model within that slice [3, 20, 34, 41]. The engineering question becomes how to govern View construction so produced Views are stable, economical, and aligned with the step Intent.

Challenge: Context Entropy & Two-Sided Failure. We define Context Entropy as the instability introduced by the gap between the actual View projected to the agent and the ideally most helpful View for the current step. View construction makes information boundaries highly dynamic (the textual execution scope can be assembled and recombined arbitrarily), and long contexts further degrade the model's ability to reliably use evidence. This entropy manifests as a two-sided failure mode:

• Excess (Context Pollution): Providing too much information introduces irrelevant noise and conflicting constraints, leading to attention dilution [10, 50].
• Deficiency (Context Starvation): Providing too little information causes the absence of critical constraints or local state, leading to reasoning breaks and execution errors [2, 20].

Both failure modes are systematic rather than incidental. Retrieval may surface stale or weakly relevant memories; an upstream agent may over-serialize its chain-of-thought into downstream context [48, 49]; and tool outputs may encode assumptions that are each locally reasonable but jointly inconsistent. Because the View is assembled at runtime, small inclusion/omission changes can flip behavior, causing capabilities to appear "non-deterministic" even when the model is unchanged. The core difficulty is thus not merely context window size, but enforcing step-level View governance (what is revealed, when, and under which contracts) so that the executed View stays close to the ideal View under a bounded cognitive budget.

4.1 Design Principles

Progressive Disclosure. To reduce Context Entropy, View exposure should be staged based on the agent's confidence and the step Intent:

• Minimal Sufficient: Default to projecting only what is necessary for the current step; keep everything else out to reduce accidental coupling and instruction interference.

• Adaptive Context Expansion: When the agent's confidence drops, expand the View with additional semantically relevant evidence to increase the task success probability [2].

• Context Backpressure: Use observable context pressure (e.g., token budget, ambiguity signals) to throttle exposure or to compress and summarize under high pressure; allow more evidence under low pressure [34].

This is the most widely adopted context-engineering principle in current MAS frameworks [27, 31].
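The three staged-disclosure tactics above can be combined into a single View-assembly routine. The following is a minimal sketch, not the paper's implementation: the `Artifact` record, the relevance thresholds, and the rough 4-characters-per-token estimate are all illustrative assumptions.

```python
# Hedged sketch of Progressive Disclosure. All names (Artifact,
# assemble_view) and the numeric thresholds are illustrative
# assumptions, not part of the paper's system.
from dataclasses import dataclass


@dataclass
class Artifact:
    name: str
    body: str
    relevance: float  # assumed to come from a retriever, in [0, 1]


def assemble_view(intent: str, pool: list[Artifact],
                  confidence: float, token_budget: int) -> list[Artifact]:
    """Stage View exposure: minimal by default, expanding as confidence drops."""
    ranked = sorted(pool, key=lambda a: a.relevance, reverse=True)
    # Minimal Sufficient vs. Adaptive Context Expansion: a lower confidence
    # lowers the relevance bar, admitting more supporting evidence.
    threshold = 0.8 if confidence >= 0.7 else 0.5
    view, used = [], 0
    for art in ranked:
        if art.relevance < threshold:
            break
        cost = len(art.body) // 4  # rough token estimate (~4 chars/token)
        if used + cost > token_budget:
            break  # Context Backpressure: stop; the caller compresses instead
        view.append(art)
        used += cost
    return view
```

Under this sketch, a confident agent sees only the top-relevance slice, while a struggling agent under a loose budget sees more; once the budget is hit, further evidence must arrive compressed rather than verbatim.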
In practice, an agent skill is already a concrete embodiment of Progressive Disclosure: a skill packages a task into step-wise, View-sized instructions and constraints that can be selectively loaded as execution progresses, rather than dumping the entire Artifact pool into the View at once.

Step-level Customization. To reduce long-run cross-contamination, context engineering can also be customized at the granularity of each step as an optional governance tactic:

• Context Branching & Stitching: For complex sub-tasks, fork multiple clean sub-contexts (i.e., multiple reasoning branches) to isolate local reasoning traces and avoid contaminating the main thread, and only stitch a distilled outcome back [48, 55].

• Context Isolation: For each step, construct a temporary sub-context by selecting a subset of the trajectory τ_{<t} that best matches the current Intent, then merge the step outcome into the main context.

Note that many agent frameworks already provide context compression and cleanup mechanisms that can serve as building blocks for these operations.

Comparison:
• Traditional software: Variable scope is largely fixed at runtime.
• LSS: Effective scope (the step View and its carried history) is dynamically constructed at runtime.

Figure 6: Sequence diagram of a Mediator constructing a task-specific A2A protocol between two agents.

4.2 Design Patterns

Semantic Lens. Intent. Retrieve and compose the right artifacts/information on demand. Mechanism. When the global Artifact pool becomes too large for a worker agent to reliably self-select, even exposing only artifact metadata (e.g., thousands of skills) can impose significant context pressure, and metadata alone may be insufficient to choose reliably among many near-duplicate skills without reading their bodies.
In this case, a dedicated agent (or module) can implement the Project runtime primitive: given the step Intent and trajectory, it retrieves and assembles a compact View from the relevant artifacts (often starting minimal and expanding when needed), thereby operationalizing Progressive Disclosure in View construction. Because a Lens can repeatedly generate an agent's step Views along a trajectory, it can also be viewed as an Agent Class Generator that induces different worker classes via different View policies.

Context Curator. Intent. Keep agents' execution history usable through distillation and compression. Mechanism. The Curator distills and compresses an agent's accumulated context, then produces a step-level temporary sub-context for the next operation. Since an Agent Class can be characterized by its View trajectory, the sub-context effectively induces a temporary "narrower subclass" for that step. We revisit and generalize this mechanism as agentic inheritance in the next section, and the Context Curator can accordingly be viewed as an Inheritance Generator.

Mediator. Intent. Enable collaboration without mutual View pollution. Mechanism. In LSS, a Mediator can be seen as dynamically constructing a task-specific A2A-style collaboration contract: it concentrates the negotiation in its own View, distills the result into a clean contract, and then delivers only the protocolized outcome to the Workers.
Concretely, the coordination unfolds as follows: (i) the Mediator receives an Intent that requires collaboration among agents; (ii) the Mediator iteratively negotiates and compiles an explicit contract (e.g., roles, I/O schema, state commitments, and allowed side effects); (iii) instead of continuing with the original agents (whose Views now contain bargaining traces), the Mediator forks two derived, clean-scoped child agents that inherit only the minimal necessary trajectory and the finalized contract in their context trajectories; (iv) the child agents execute tasks under contract-aligned Views. Figure 6 illustrates this flow, and reusable protocols can be persisted as contract.md artifacts for future reuse.

End Criteria. Intent. Decide when an agent instance has finished its responsibility and can be safely retired. Mechanism. In LSS, many agent instances are intentionally ephemeral: they are terminated once a transient task is completed. End criteria can attach a set of completion predicates to an agent (e.g., required outputs produced, invariants satisfied, budget exhausted, receipt of a specific signal from the user or another agent, etc.). Once the criteria are met, the agent can be terminated, and optional termination hooks can be triggered to finalize side effects (e.g., Claude Code hooks): for example, distilling and returning a summary to other agents, or archiving reusable artifacts.

5 Structure Engineering: Taming Self-Organization Entropy

This layer focuses on Structure Engineering: governing runtime bindings so that artifact and agent topologies form in a task-appropriate way. Unlike traditional software, where most dependency edges are fixed at compile time, "structure" in LSS is not a static module graph; it is the binding topology that emerges as the system repeatedly performs runtime semantic reasoning.
Each binding decision (routing a message to an agent [13, 38, 51], selecting an artifact, invoking a skill, or using a tool [42, 56]) implicitly creates a dependency edge for that step. These edges accumulate into a topology: which capabilities are frequently composed, which agents become hubs, and which indices become discovery gateways. This is also why Structure Engineering cannot be reduced to "better prompts" or "more tools": even if the View is clean (Layer 1), the system can still fail if it binds to the wrong capability, binds in the wrong order, or binds across incompatible assumptions. Structure Engineering therefore treats binding as a first-class architectural event and asks how to make binding outcomes predictable and scalable.

Challenge: Self-Organization Entropy & Binding Failures. We define Self-Organization Entropy as the uncertainty of binding failures while the system moves from its current binding topology toward a topology that would be most favorable for the current task. The core issue is not the gap itself; it is the uncertainty in whether the system can reliably reach the favorable topology under weak constraints. As the number of artifacts and agents scales, the combinatorial candidate space expands while prior constraints remain weak, which makes the formation of a useful topology increasingly unstable [23, 52]. In practice, this often manifests as:

• Binding Miss: The system fails to find a suitable artifact or agent to bind to, so progress stalls, loops, or falls back to low-confidence guesses [2, 23].

• Binding Wrong: The system binds to an unsuitable artifact or agent (e.g., an incompatible capability, an irrelevant index, or a peer that cannot deliver the required output), leading to incorrect execution paths or non-convergent coordination [23, 37, 39].

• Binding Too Much: The system binds to too many artifacts and agents.
This over-exposes candidates and their metadata in the View, causing context pressure and pollution; it can also create an overly dense topology that increases coordination overhead and reduces task efficiency [34, 40].

These failures share a common cause: the system is trying to construct a task-appropriate topology under weak constraints. When routing is under-specified, it compensates by guessing; when multiple partially relevant candidates exist, it oscillates; and when bindings are made without clear contracts, the resulting topology becomes brittle and hard to debug.

5.1 Design Principles

LSS systems are intelligent enough to self-organize and adapt their collaboration topology to different tasks, but this creates an engineering dilemma: binding cannot be left completely free, yet it also cannot be fully hard-coded as in traditional software. Structure Engineering therefore aims to keep control on the "edge of chaos" through semantic design. To tame Self-Organization Entropy without over-constraining semantic adaptability, we propose three governance principles.

Task-Scoped Modularity. Artifacts and agents that participate in the same task often exhibit stronger cohesion: within-task bindings are typically more frequent and reuse-oriented, while cross-task bindings should be mediated by explicit semantic gates [13, 38, 51]. A task cluster can start from a small seed set of agents/artifacts and expand along high-confidence, task-oriented edges. Different task clusters can still be connected through semantic gates, which helps reduce cross-task misbinding and prevents unbounded search from overwhelming the View.

Binding Provenance. Provenance can be recorded for binding events (routing decisions, retrieval hits, tool calls, and inheritance spawns).
Provenance turns "why did it bind to that?" from a semantic black box into a traceable explanation: which artifact introduced a dependency, what evidence justified it, and how downstream steps relied on it. This is essential for debugging misbindings, both for human users and for higher-level agents [30, 44].

Structure as Ability. Delivering a structure to an agent is not delivering a single, precise answer; it is empowering the agent to find, compose, and validate answers under constraints. For example, delivering a file is a direct information delivery that requires unusually strong mutual understanding of what "the right file" is. Delivering a file-system organization (e.g., a tree or a graph) instead delivers an ability: the receiver can locate the right file by probing the structure with its own task needs. The same applies to multi-agent collaboration. Delivering a team structure (roles, connections, etc.) gives an agent the ability to coordinate and reuse peers without re-negotiating bindings from scratch. In this sense, indices, team specifications, and inheritance designs are structures that compress future binding uncertainty into more reliable, callable affordances. Even a skill can be abstracted as a structure that shapes tool invocation and agent behavior.

This concept is abstract, so let us illustrate it with a concrete example. Consider an agent tasked with debugging a massive codebase. Giving the agent the full text of 5,000 source files (or even just the file names) leads to context collapse. However, delivering a hierarchical index.md (a Structure) that maps features to files does more than provide a specific file. It equips the agent with the capability to systematically navigate, probe, and isolate the desired files in the file system. The structure acts as an executable map that the LLM's reasoning engine can "run," thereby providing an ability rather than a single answer.

Comparison:
• Traditional software: Structure constrains Information.
Information processing defines capability.
• LSS: Information Emerges Structure. Structure extends capability.

In the patterns below, we call the structure among artifacts an Index. The structures that relate artifacts/temporally evolving information and agents can be defined by the previous layer's Semantic Lens and, in this layer, by the Semantic Router. For agent–agent structure, we distinguish two families: (i) communication-based cooperation topologies, which we call a Team; and (ii) agentic inheritance, which structures how View and Context are composed across derived agents.

5.2 Design Patterns

Following the above principles, we can give a set of structure-level design patterns.

Semantic Router. Intent. Route information to the right agent under semantic constraints. Mechanism. Given a piece of information, for example an Intent (what needs to be done next) or an output (what has just been produced), a Semantic Router forwards it to the most suitable agent. This differs from the Semantic Lens: the Lens selects the proper information (Artifacts) to place into a given agent's View; the Router decides which agent should receive a given message. The router can also record binding provenance for its routing decisions so that downstream agents can understand why a route was chosen. More subjectively, the Semantic Lens constructs an agent's View and thus acts like an agent-class generator, whereas the Router primarily determines the inputs to an agent class, effectively instantiating the class.

Index Generator. Intent. Enable an agent to discover relevant Artifacts. Mechanism. An Index Generator produces an indexing system for the Artifacts relevant to the current task, such as a tree, a graph, or a semantic linked list. Since Artifacts are files, the Index can be viewed as a generated, task-specific file system.
The index is a navigational structure that makes intentional discovery easier and more reliable than free-form searching. The index can be recorded and exchanged as index.md. A particularly useful form is a file-centered pointer graph that links semantically related files. Given a focal file, the Index Generator can produce a neighborhood structure centered on it, where each edge explains how the linked file relates to the focal one.

Comparison:
• Traditional software: The file system is a rigid, objective hierarchy manipulated by explicit paths.
• LSS: The file system (Index) is a fluid, subjective structure generated on the fly by semantic Intent.

It is also interesting that the Semantic Lens can be viewed as a kind of Index. Unlike an explicit tree or graph structure, it is an intelligent index whose structure is largely a black box. Therefore, if an agent can generate a "Semantic Lens", that generator can also be seen as an Index Generator. This example highlights a key difference from traditional software design: in LSS, a role or an abstraction can shift with the designer's/agents' perspective. We call this subjectivity of design viewpoints the polymorphism of abstractions; together with their composability (e.g., packaging skill-defined services as MCP tools), it reflects the mindset shift required when moving beyond traditional software thinking. These fluid concepts can further be generated, evolved, and nested at runtime, which gives LSS a very flexible and inclusive design space.

Team Generator. Intent. Enable an agent to collaborate with other agents. Mechanism. A Team Generator produces a cooperation structure among agents for a given task: roles, who talks to whom, and what each agent is responsible for. The team specification can be written and exchanged as team.md. With an explicit team, routing becomes a structural operation rather than an ad-hoc choice. A team can include Mediators and Routers to keep collaboration well-structured.
Similarly, a Router can be viewed as an intelligent team structure.

Comparison:
• Traditional software: Components integrate through rigid, syntactic contracts (APIs).
• LSS: Agents integrate through fluid, semantic negotiation (Natural Language Contracts) (Figure 7).

Figure 7: Establishing a more stable structure and contract through flexible multi-agent negotiation.

After-task team. Sometimes the team can be created after a task: agents first self-organize to complete the task, and then the successful collaboration experience is summarized into team.md as a persistent team structure [52]. This can be viewed as an evolution mechanism, discussed in the next section. A team.md itself can also serve as a task-completion playbook for future agents.

Inheritance Generator. Intent. Derive a task-fit, context-clean agent by inheriting from existing agent instances. Mechanism. For a specific step, the Inheritance Generator derives a child agent from one or more existing agent instances, inheriting only the minimal required trajectory and constraints while keeping the child's View clean and avoiding pollution of the parent. This resembles a fork operation in modern OSes. The Inheritance Generator can also govern the lifecycle of child agents by specifying when to fork and by applying End Criteria. When there is only one parent, the Inheritance Generator reduces to the Context Curator introduced in the previous section. Therefore, in this section, we emphasize multiple inheritance: the primary responsibility is not merely cleaning or compressing a parent's context, but integrating and migrating capabilities composed from multiple agent instances to produce an agent that is better fit for a specific task. The inheritance structure can be recorded in fork.md. Figure 8 contrasts single-inheritance context isolation with multiple-inheritance capability integration.
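The multiple-inheritance fork just described can be made concrete with a small sketch. Everything here is an illustrative assumption rather than the paper's implementation: `AgentInstance`, the keyword-overlap relevance score (a stand-in for semantic matching), and the `fork_child` helper.

```python
# Hedged sketch of the Inheritance Generator (multiple inheritance).
# AgentInstance, the relevance scoring, and fork_child are illustrative
# assumptions; a real system would use semantic matching, not keywords.
from dataclasses import dataclass, field


@dataclass
class AgentInstance:
    name: str
    trajectory: list[str] = field(default_factory=list)  # context fragments


def fork_child(parents: list[AgentInstance], intent: str,
               max_fragments: int = 4) -> AgentInstance:
    """Compose a minimal, clean child View from trajectory fragments of
    several parents, without mutating any parent (fork semantics)."""
    # Toy relevance score: count of intent keywords appearing in a fragment.
    keywords = set(intent.lower().split())
    scored = [
        (sum(w in frag.lower() for w in keywords), frag)
        for p in parents for frag in p.trajectory
    ]
    # Inherit only the most intent-relevant fragments (minimal required
    # trajectory), dropping everything irrelevant to keep the View clean.
    scored.sort(key=lambda kv: kv[0], reverse=True)
    inherited = [frag for score, frag in scored[:max_fragments] if score > 0]
    return AgentInstance(name="child:" + intent, trajectory=inherited)
```

The key property mirrored from the text is that the parents' contexts remain untouched: the child starts from a curated selection across instances, not a shared mutable state.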
While we intentionally borrow terminology from OOP (such as Classes and Inheritance), we emphasize that LSS reinterprets these concepts from a syntactic space into a semantic space. In classical OOP, inheritance is a static, compile-time relationship defining structural reuse. In LSS, Semantic Inheritance is a dynamic, runtime event where a child agent is instantiated by inheriting the precise contextual trajectory (the semantic state), allowing execution to branch safely without polluting the main reasoning thread. (This runtime derivation somewhat resembles prototype-based programming in, e.g., JavaScript.) By mapping fluid AI behaviors to established software engineering concepts, we provide a cognitive bridge from traditional software engineering for governing prompt-driven execution.

Using the Inheritance Generator, a parent agent can fork several isolated child branches to explore different binding hypotheses (e.g., alternative retrieval paths, routing choices, or binding orders) without polluting the main execution. Each branch runs under a small contract (scope, budget, allowed side effects, and expected return). A selector then compares outcomes and merges only distilled, verifiable deltas back into the parent (for example, a validated index entry, a corrected routing rule, or a reliable invocation recipe), discarding the rest.

Figure 8: Inheritance in LSS: single inheritance isolates a child agent's context under a Context Curator; multiple inheritance selects and composes trajectory fragments across instances under an Inheritance Generator to integrate capabilities and memories.

Supply Chain. Intent. Make multi-hop binding explanations traceable and stable. Mechanism. Provenance is a property of each binding event.
By linking together the provenance of multiple bindings, the system forms a supply chain across artifacts and agents: which routing decision led to which retrieval, which retrieval led to which invocation, and which invocation produced which downstream dependency — and, crucially, why each hop was justified. Such chains can be maintained at runtime and viewed as a semantic, distributed logging mechanism or a call stack [30, 44].

Facade & Filter. Intent. Encapsulate a clean semantic gate for complex subsystems. Mechanism. Similar in spirit to the OOP Facade Pattern, the system exposes a simplified external interface for a complex internal collaboration subgraph, hiding internal negotiations and intermediate artifacts. In practice, the core of OpenClaw can be viewed as such a Facade with an agent Router (a local gateway) [33]. Unlike a purely syntactic API wrapper, the facade is semantic: it can filter and reorganize inbound and outbound information (e.g., redact sensitive details, enforce policy constraints, or normalize outputs into a stable schema). Metaphorically, a Facade can act like a "Maxwell's demon" that reduces runtime entropy in the system by preventing unnecessary or harmful information passing and reducing failures caused by mismatched bindings.

It is worth noting that dynamic or static skill creators are mechanisms widely supported by current agent frameworks; they can also be seen as a design pattern for constructing structure among tool calls and agent behaviors.

6 Evolution Engineering: Taming Evolutionary Entropy

The third layer marks the transition from designing program execution to governing endogenous program evolution. An LSS does not merely run; it continuously rewrites the Artifacts that make it functional. More concretely, an LSS is a collection of Artifact files.
Agent capabilities, team organization, tools and data, prompts, skills, indices, and routing policies are all files that actively determine the system's behavioral boundary. They shape what can be bound, which Views are constructible, and which actions are reachable. Evolution Engineering studies this reality directly. If the system can change these persistent files from inside its own execution loop, then "running the program" and "changing the program" collapse into one continuous process, and using the system becomes part of developing it.

Comparison:
• Traditional software: Developing for usage.
• LSS: Usage as part of development.

In this view, Endogenous Evolution is the long-term self-development of the system by modifying its own Artifacts. Because these changes happen inside the execution loop, they create a feedback cycle. The same View-defined inference that produced a behavior can also justify a patch that changes the Artifact behind that behavior, and the patched Artifact then shapes future Views that justify future patches. This is less like editing a static codebase with a clear split between development and deployment, and more like online incremental learning on a moving target, where biases can be amplified and friction reduced through repeated reuse [11, 40].

Note that Evolution Engineering in this section mainly focuses on the system scale: how artifacts are created, revised, retained, and retired. Improvements to the base model's capability are outside the engineering boundary here. It also distinguishes cross-session evolution from in-context adaptation. A single agent may improvise tactics within one run to finish a task. Evolution Engineering is about what the system keeps across runs: stable, structural improvements and capability deposits that change what future runs can do.
Therefore, the evolution engineering goal is to make the system better under repeated use:

• More adaptive: as it completes tasks and interacts with users, it becomes increasingly aligned with user preferences and working conventions, reducing context entropy by stabilizing what information is relevant and how it is presented.

• More coherent: internal structure becomes easier to navigate and compose. Early versions often carry visible hand-designed constraints and internal mismatches across the LSS. Two agents may lack an efficient communication protocol, interfaces may be inconsistent, and Artifacts may be formatted or scoped in incompatible ways. Over time, the Artifact set can also accumulate stale, redundant, or conflicting items. The system then behaves like a machine whose gears do not mesh: it turns with friction. Evolution should make these gears engage smoothly, reducing structural entropy.

• More capable: new, reusable abilities emerge as Artifacts and are consolidated into stable services.

• Bounded blast radius: failures are increasingly confined to small, local mistakes rather than spreading into system-wide drift.

Challenge: Evolutionary Entropy (Knowledge Rot & Long-term Drift). We define Evolutionary Entropy as the long-term uncertainty and unpredictable system drift induced by self-modification. After many edits, it becomes hard to tell what truly helped and what else was unintentionally affected. In practice, it often manifests as:

• Too lazy: evolution is too slow. Valuable information and working methods discovered in tasks are not retained, so the system cannot accumulate useful capability.

• Too active: evolution is too frequent. Behavior becomes unstable, changes oscillate, and the Artifact store fills with debris and inconsistencies, accelerating knowledge rot [40].
• Goal misalignment: evolution does not move toward the human or task intent. Over repeated self-evolution cycles, locally rewarded patches can compound into systematic deviation from human intent as a post-deployment failure [11].

What makes evolution engineering hard is that a patch can help on the current task while making the overall system worse. Evolution needs to learn from short-term success, but it also needs a long-term direction. This requires the evolving system to model its own operation and objectives, and to apply changes in a planned, deliberate way.

6.1 Design Principles

Plan the Ephemeral/Persistent Boundary. Since an LSS is a collection of Artifact files, Evolution Engineering is mainly about planning the boundary between what is temporary for a task and what becomes part of the persistent Artifact set. At its simplest, the LSS maintains a global persistent Artifact pool, and the Evolver deposits important, effective ephemeral structures and Artifacts from each task session into this pool; an example of such a structure is OpenClaw's two-level memory [33]. However, "ephemeral vs. persistent" is not a binary property but a time-scale-relative spectrum (Figure 9): a memory item may be persistent over a day yet ephemeral relative to a week-long redesign; a routing rule may be stable for a project phase but later become a transient cache after the project finishes. Across multi-scale agent lifecycles and multi-layer memories, different layers can have different levels of evolutionary activity.

Figure 9: Evolution engineering should accept a relative and unstable ephemeral/persistent boundary.

Evolution as Learning. Evolution is a learning process that needs environmental feedback. Useful feedback signals include:

• Objective verification: whether the system outputs satisfy tests, constraints, or user-specified checks.
• Human feedback: corrections, suggestions, accept/reject behavior, rollbacks, and edit traces as ground truth for alignment.

• Self-reflection: internal evaluation of process and outcome, e.g., a reviewer/critic agent iteratively provides feedback to a worker agent for refinement [24].

Hebb's Rule. "Fires together, wires together" is a local, unsupervised learning rule inspired by synaptic plasticity [12]. For LSS, the analogous engineering principle is to reinforce structural pathways that are repeatedly used across tasks. When a binding path, skill, index, or collaboration topology is repeatedly validated, the system can stabilize it by promoting it into a persistent artifact or wrapping it in an explicit contract (e.g., an MCP or A2A contract). Conversely, changes correlated with regressions can be down-weighted or pushed back into ephemeral scope.

Entropy-aware Evolving. Evolution should stay aware of the three entropies introduced in this paper. One practical requirement here is history residues: evolution should not be purely overwriting. The system can retain historical slices and their rationales so that it can roll back, and past trajectories act as a prior that smooths abrupt oscillations during exploration.

Design the Structure of Evolving. Finally, the engineer's job shifts from hand-authoring rules for how the structure should evolve to building a structure that makes evolution possible and safe. This means trusting the LSS's own intelligence to propose improvements, leaning toward endogenous and spontaneous evolution, while providing meta-structure (incentives, exploration space, selection pressure) that keeps evolution inside a desired scope.

Comparison:
• Traditional software: Design the evolution rules of the structure.
• LSS: Design the structure of evolution.
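The Hebbian reinforcement principle described in this subsection can be sketched as a small scoring ledger over binding pathways. This is a minimal illustration under assumed names and thresholds (`PathwayLedger`, `PROMOTE_AFTER`, `DEMOTE_BELOW`); the paper does not prescribe a concrete mechanism.

```python
# Hedged sketch of Hebb-style pathway reinforcement. The counters,
# thresholds, and class name are illustrative assumptions.
from collections import defaultdict

PROMOTE_AFTER = 3   # validated uses before a pathway is promoted to persistent
DEMOTE_BELOW = -2   # regression score at which it falls back to ephemeral


class PathwayLedger:
    """Reinforce binding pathways that repeatedly succeed across tasks;
    demote pathways that correlate with regressions."""

    def __init__(self) -> None:
        self.score: dict[str, int] = defaultdict(int)
        self.persistent: set[str] = set()

    def observe(self, pathway: str, validated: bool) -> None:
        # "Fires together": each validated use strengthens the pathway.
        self.score[pathway] += 1 if validated else -1
        if self.score[pathway] >= PROMOTE_AFTER:
            self.persistent.add(pathway)       # "wires together": stabilize
        elif self.score[pathway] <= DEMOTE_BELOW:
            self.persistent.discard(pathway)   # push back into ephemeral scope
```

In a fuller design, promotion would emit a persistent artifact or an explicit contract rather than just a set membership, but the asymmetry (slow promotion, eventual demotion on sustained regressions) is the point of the principle.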
6.2 Design Patterns

Following the above principles, we can give a set of system-evolution design patterns.

Sandbox Mode. Intent. Bound the blast radius by default while still enabling fast exploration. Mechanism. When the system explores changes to persistent artifacts, it creates one or more sandbox environments that run in isolated scopes. It works on sandbox copies of the relevant artifacts and write targets. Comparable tasks (or replay tasks) are executed to evaluate each candidate, including A/B-style comparisons when feasible. Because the ephemeral/persistent boundary is relative, nested sandboxes could be introduced. A dedicated agent can decide when to spawn sandboxes and let the Evolver merge their results. Only changes that remain safe and consistently helpful are merged back into the persistent artifact set; others are discarded.

Evolver. Intent. Improve the system by observing and revising its artifact set over time. Mechanism. The Evolver continuously monitors task completion, interaction traces, and user feedback, then proposes concrete edits to the artifact set. Each proposal is packaged as an explicit, reviewable patch with a stated hypothesis (what behavior it should change) and a rollback chain. The Evolver gates merges through verification: it may replay representative tasks, run tests and static checks, and perform sandboxed A/B comparisons before promoting changes into the persistent artifact set. The behavior of the Evolver can be specified in evolve.md.

As a concrete example, an Evolver can implement an agent-level genetic algorithm: starting from a baseline agent instance and its traces, it generates a population of derived candidate instances via counterfactual edits and controlled "mutations" (e.g., perturbing a binding choice, swapping a tool return, or injecting/removing a small context fragment) while keeping external writes isolated.
Each candidate is evaluated by replaying tasks or running representative checks in sandbox mode, producing fitness signals. The Evolver then applies selection pressure by retaining the best-performing instances (and optionally recombining their compatible improvements) and discarding the rest, thereby performing an explicit natural-selection loop over agent instances. Finally, the Evolver distills valuable lessons and memories from the top-performing agent instances, updates the artifacts accordingly, and thus drives system evolution. This entire process can be repeated during idle periods by continuously replaying previously completed tasks.

Comparison:
• Traditional software: An exception breaks the control flow.
• LSS: An exception is appended to the context flow to generate the next action.

Semantic Palimpsest. Intent. A new file type that carries continuous semantic history residues. Mechanism. A Semantic Palimpsest is a file that stores more than the current version. It carries continuous semantic residues: how an Artifact evolved, which changes mattered for behavior, and why they were made. Even when the file displays a "current" version, the semantic shadow of its trajectory remains available and influential. Compared with version control systems like git, it can still support rollback, but it emphasizes semantic history over line-by-line diffs. Small edits that cause large behavior changes may be salient [7], and the reasons behind changes stay attached to the Artifact so that the evolution path is explainable.

Artifact Maintainer. Intent. Counteract entropy accumulation in the Artifact store. Mechanism. An asynchronous maintainer scans for redundancy, outdated items, and fragmentation. It consolidates duplicates, marks stale items, detects conflicts, and retires structures that are rarely bound, keeping the Artifact space navigable and reducing the inventory cost of evolution. The Maintainer can also report warnings of structural incoherence.
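A single Artifact Maintainer pass, as just described, can be sketched as follows. The `ArtifactRecord` shape, the staleness window, and the exact-duplicate rule are illustrative assumptions; a real maintainer would use semantic similarity and binding statistics rather than byte equality.

```python
# Hedged sketch of one Artifact Maintainer pass. ArtifactRecord, the
# staleness window, and the dedup rule are illustrative assumptions.
from dataclasses import dataclass


@dataclass
class ArtifactRecord:
    name: str
    body: str
    last_bound_step: int  # last step at which something bound to this artifact


def maintain(store: list[ArtifactRecord], now_step: int,
             stale_after: int = 100) -> tuple[list[ArtifactRecord], list[str]]:
    """Consolidate duplicates and retire rarely-bound artifacts, returning
    the surviving store plus human/agent-readable warnings."""
    kept: list[ArtifactRecord] = []
    seen_bodies: dict[str, str] = {}
    warnings: list[str] = []
    for rec in store:
        if rec.body in seen_bodies:  # consolidate exact duplicates
            warnings.append(f"duplicate: {rec.name} ~ {seen_bodies[rec.body]}")
            continue
        seen_bodies[rec.body] = rec.name
        if now_step - rec.last_bound_step > stale_after:  # rarely bound: retire
            warnings.append(f"stale: {rec.name}")
            continue
        kept.append(rec)
    return kept, warnings
```

Running such a pass asynchronously (e.g., during idle periods) keeps the Artifact space navigable without blocking task execution.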
Artifact Tiering. Intent. Make the layered "ephemeral vs. persistent" distinction a property of the file system. Mechanism. Memory and Artifacts are organized into explicit tiers with different retention and read/write policies:
• Hot tier: fast-changing work products created, modified, and deleted frequently for active tasks.
• Warm tier: stable core skills, indices, and other long-term memories that are read often and written rarely.
• Cold tier: low-frequency, edge Artifacts that may dissipate or be forgotten.

Evolution is then not only content change but also tier migration: promoting or demoting artifacts across tiers becomes an explicit operation that expresses lifecycle intent. This framing echoes a large body of research on MAS memory, where memory is treated as differentiated stores with different read/write policies, maintained by retrieval, summarization/reflection, consolidation, and forgetting to preserve long-horizon coherence under limited context [20, 21, 34, 35, 43, 46, 47, 53, 58]. Accordingly, we do not delve further into memory management here.

Shared Interaction Space. Intent. Establish a work-together space in which human and agent actions are both explicit, permissioned operations. Mechanism. A shared space supports (i) co-creation via joint editing of Artifacts, (ii) human specification via direct manipulation of objects (which acts as structured instruction), and (iii) agent learning by observing edit traces and acceptance behavior. This reduces ambiguity about intent and couples direct manipulation with automation and interactive learning from user feedback [5, 43]. Ideally, this shared space exposes humans to a stable file system, while the indices and internal working representations maintained by agents can remain more volatile and messy from a human perspective.
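Returning to Artifact Tiering: the tier structure and the migration operation can be made literal with a small sketch. The enum values and the dict-based store below are our illustrative stand-ins for file-system-level policy metadata:

```python
from enum import Enum

class Tier(Enum):
    HOT = "hot"    # fast-changing work products for active tasks
    WARM = "warm"  # stable skills/indices: read often, write rarely
    COLD = "cold"  # low-frequency edge artifacts that may be forgotten

def migrate(tiers, name, target):
    """Tier migration as an explicit lifecycle operation: the move
    itself (promotion or demotion) records intent, independently of
    any content change."""
    previous = tiers.get(name)
    tiers[name] = target
    return previous, target

tiers = {"scratch_notes.md": Tier.HOT, "search_skill.md": Tier.HOT}
# Promote a skill that proved stable across many tasks.
move = migrate(tiers, "search_skill.md", Tier.WARM)
```

In a real system the migration record, not just the resulting tier, would be persisted so the Evolver can reason about lifecycle history.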
7 Design Pattern Mapping

While LSS Design Patterns provide top-down inspiration for system engineering, they function as logical guidelines rather than strict physical implementations. Unlike traditional OOP, where a design pattern often corresponds directly to a class structure, an LSS pattern like "Router" does not necessarily mandate a dedicated "Router Agent." Instead, these patterns should be viewed as cognitive mechanisms or plugins. A single physical Agent might embody multiple logical patterns; conversely, a single logical pattern might require a collaborative team of Agents to execute.

Figure 10: Selected evolution patterns in Layer 3: Sandbox Mode evaluates changes before they enter the persistent artifact set, Semantic Palimpsest keeps changes traceable and rollback-capable, and Artifact Tiering manages artifact lifecycles by moving them across tiers with different read/write policies.

The mapping from logical design to physical Agents involves several strategies:
• Dedicated Agent: Assigning a logical functional module to a specialized Agent.
• Embedded Mechanism: Implementing multiple logical modules as internal steps or built-in mechanisms within a single Agent.
• Collaboration: Utilizing a team of Agents to collaboratively implement a single logical module.
• Derived Execution: Deriving a new Agent from an existing one to execute a module. Suitable when sharing the parent's context is necessary, but the parent cannot afford the additional context burden of the sub-process.
• Post-Execution Derived: Implementing the module directly within the current Agent's context, but deriving a fresh Agent after execution to "forget" the temporary module implementation.
• Self-Derived Loop: An Agent uses a derivation loop to continuously derive its own "next-hop instance," maintaining execution continuity while refreshing its context at each iteration.

Consider the "Semantic Lens" example introduced in Figure 4. Figure 11 shows one possible mapping implementation. When a Worker Agent needs a specific skill artifact to complete a task, it can reuse its own context to derive a temporary "Lens Agent," injecting it with the Intent for the required file (Derived Execution). In practice, this is often handled by a system-level Agent Generator, which invokes a Lens Agent skill based on the Worker's request (Collaboration) and derives the Lens Agent's initial state from the Worker; exposing a Lens-creator skill directly to the Worker is not a scalable design. The Lens Agent then iterates over a Skill Index, querying files one by one. After reading each file's content, it determines whether the content satisfies the Worker's needs. If so, it records the file path. Regardless of the match result, after processing a file the Lens Agent derives a fresh instance to clear the context occupied by the file it just read (Self-Derived Loop). Finally, the aggregated paths are returned to the Worker, and the Lens Agent is terminated once its end criteria are met. The Worker remains uncontaminated by massive search noise.

Figure 11: Sequence diagram of the Semantic Lens pattern illustrating Derived Execution, Collaboration, and Self-Derived Loops. The Worker Agent initiates the process, and the Lens Agent recursively derives new instances to handle each artifact, keeping the Worker's context clean.

Design Principle: Semantic Cohesion Principle. One of the guiding principles for these architectural decisions is Semantic Cohesion, which replaces the Single Responsibility Principle (SRP) [26] from OOP.
While SRP separates classes based on function, Semantic Cohesion groups capabilities based on the shared semantic information they require for reasoning. If two mechanisms rely on the same documents and memory, they should likely be merged to avoid the cost of mutual understanding among multiple agents. Conversely, if they require incompatible Views, or if we intentionally want to explore diverse Views, they are better split to prevent context pollution, even if they are functionally related. Additionally, the choice of mapping strategy should also consider the Agent's real-time context pressure.

Comparison:
• Traditional software: Single Responsibility Principle for designing a class.
• LSS: Semantic Cohesion Principle for mapping an agent.

Design Pattern: Vibe Compiler. Intent. Enable developers to define system architecture through loose, standardized descriptions rather than rigid code, and automate the physical agent mapping. Mechanism. We envision a Vibe Compiler as a Meta Agent that takes a standardized, loose architectural description (akin to a UML class diagram) and automatically generates an LSS implementation satisfying the Semantic Cohesion Principle. It frees developers from low-level agent-orchestration details. Empirically, such strongly semantic systems risk over-engineering: they need space for self-organization and evolution to maintain adaptability.

Design Pattern: Runtime Pattern Shifter. Intent. Dynamically adapt the system topology to real-time context pressure and task complexity. Mechanism. A Runtime Pattern Shifter monitors execution metrics (e.g., context window usage, ambiguity) and triggers different mapping modes on the fly. This pattern can be integrated with the Team Generator discussed in the previous section and can also enable evolution mechanisms.
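The merge-or-split judgment at the heart of Semantic Cohesion can be caricatured as an overlap test over the Views two mechanisms require. Jaccard overlap over file sets is our stand-in here for a real semantic similarity measure, and the threshold is arbitrary:

```python
def cohesion_decision(view_a, view_b, merge_threshold=0.5):
    """Caricature of the Semantic Cohesion check: mechanisms whose
    required Views overlap heavily are merged into one agent (cheap
    mutual understanding); otherwise they are split to avoid context
    pollution. Jaccard overlap stands in for semantic similarity."""
    a, b = set(view_a), set(view_b)
    overlap = len(a & b) / len(a | b) if a | b else 1.0
    return ("merge" if overlap >= merge_threshold else "split"), overlap

# Two mechanisms reasoning over largely the same documents: merge.
decision, score = cohesion_decision(
    ["design_doc.md", "api_spec.md", "memory/tasks.md"],
    ["design_doc.md", "api_spec.md", "memory/errors.md"],
)
```

In practice the decision would also weigh real-time context pressure, as noted above, rather than a static overlap score alone.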
8 LSS Workflow Evaluation

8.1 Evaluation on RepoBench-R

We use RepoBench-R as a retrieval-focused code-completion benchmark to validate two mechanisms: Semantic Lens and Index Generator. Specifically, we use the Python subset (python_cff) with the test_easy split. RepoBench-R [22] provides code-completion queries paired with a candidate pool of cross-file snippets drawn from the same repository, along with a gold snippet identifier indicating which retrieved context is most relevant for predicting the next line. This setup isolates the retrieval subproblem: given the local code context, select a small set of supporting snippets that would best enable a downstream model to continue the code.

We compare three retrieval configurations under the same model (DeepSeek API) and the same candidate budget. Each query starts from a lexical pre-selection pool, from which each variant outputs the top-K snippets with K = 5. In addition, evidence length is explicitly bounded: candidate briefs for Lens ranking are truncated to 280 characters, and Worker-side snippet reads are truncated to 700 characters per selected item. (A) Worker-only retrieval asks a single Worker to both scan all candidates and select the top-K snippets. (B) Lens + Worker splits responsibilities: a Lens selects the top-K candidates using brief evidence, and the Worker then reads only the selected snippets. (C) Lens + Index + Worker further inserts an Index Generator that produces compact per-candidate index descriptions; the Lens makes its selection using these indexes rather than raw snippet text, after which the Worker still consumes only the selected snippets. Across variants we track total token cost, per-agent token usage, and average input-context tokens per agent.

Figure 12: Evaluation results on RepoBench-R: (a) retrieval success comparison, and (b) average input-context tokens across variants.
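The variants differ mainly in what reaches the Worker's context. A toy accounting makes the redistribution concrete; character counts proxy for tokens, the variant names are ours, and top-K selection is stubbed as "first K" rather than a model call:

```python
def worker_context_chars(variant, pool, k=5, read_chars=700):
    """Toy accounting of what each retrieval variant feeds the Worker
    (character counts stand in for tokens; selection is stubbed)."""
    if variant == "worker_only":
        # (A) The Worker scans the entire candidate pool itself.
        return sum(len(snippet) for snippet in pool)
    if variant in ("lens_worker", "lens_index_worker"):
        # (B)/(C) A Lens (over briefs or index lines) selects top-K;
        # the Worker reads only those snippets, truncated.
        return sum(min(len(s), read_chars) for s in pool[:k])
    raise ValueError(f"unknown variant: {variant}")

pool = ["x" * 900] * 40  # 40 candidate snippets of 900 chars each
worker_only = worker_context_chars("worker_only", pool)  # full pool
with_lens = worker_context_chars("lens_worker", pool)    # K truncated reads
```

Under this accounting the Worker's exposure shrinks from the whole pool to at most K * read_chars, which is the bounded-context behavior the design intends; the Lens's own evidence cost is charged to a separate, bounded-scope agent.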
Retrieval quality is summarized in Figure 12(a) using two complementary metrics. Hit@5 measures recall: whether the gold snippet appears anywhere in the selected top-5 results. Top1 Accuracy measures precision at rank 1: whether the first selected candidate is exactly the gold snippet. On RepoBench-R, Lens + Worker improves Hit@5 over Worker-only (0.70 → 0.78), and Lens + Index yields the highest Hit@5 (0.84), indicating that indexing helps the Lens make more robust, recall-oriented selections. In contrast, Top1 Accuracy remains low and largely unchanged across variants (0.10–0.12), suggesting that while the mechanisms reliably broaden coverage of the correct evidence, ranking the single best snippet remains difficult and may require stronger reranking objectives or richer dependency signals.

To understand context pressure, we first examine average input-context tokens in Figure 12(b). Introducing a Semantic Lens substantially reduces the Worker's average input (1543 → 1395) because the Worker no longer needs to ingest the full candidate pool. Adding the Index Generator keeps the Worker's average bounded at a similar level (1422) while shifting part of the selection burden from the Lens's raw-snippet evidence to compact index lines. The net effect matches the intended behavior: retrieval-related context pressure is redistributed away from the Worker toward specialized, bounded-scope agents.

Figure 13: Per-query total token traces across the three variants.

Total token cost in Figure 13 increases for the Lens-assisted variants because selection is externalized into additional agent calls.
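For reference, the two quality metrics above are computed as follows; the snippet ids in the toy example are invented, and real evaluation averages over the full test_easy split:

```python
def hit_at_k(selected, gold, k=5):
    """Recall-style: is the gold snippet anywhere in the top-K?"""
    return gold in selected[:k]

def top1_accuracy(selected, gold):
    """Precision at rank 1: is the first selection exactly the gold?"""
    return bool(selected) and selected[0] == gold

# Toy evaluation over three queries (snippet ids are invented).
queries = [
    (["s3", "s1", "s7", "s2", "s9"], "s1"),  # in top-5, not rank 1
    (["s4", "s8", "s6", "s5", "s0"], "s2"),  # missed entirely
    (["s2", "s3", "s1", "s8", "s4"], "s2"),  # rank-1 hit
]
hit5 = sum(hit_at_k(sel, gold) for sel, gold in queries) / len(queries)
top1 = sum(top1_accuracy(sel, gold) for sel, gold in queries) / len(queries)
```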
Although total token cost increases, the Index Generator's cost is an amortizable overhead: if index packages are persisted and reused across many queries within the same repository (or the same stable candidate set), the one-time indexing cost can be spread over multiple Lens decisions, making the Lens + Index configuration attractive for repeated or long-horizon workflows.

Figure 14: Per-query Worker token traces across the three variants.

Worker-side token traces in Figure 14 show that Lens-assisted routing often reduces Worker tokens and trims average context consumption, although the magnitude is query-dependent. Overall, the RepoBench-R results demonstrate a quality–cost trade-off aligned with the LSS objective of controlled context routing: adding the Semantic Lens and Index Generator improves recall-oriented retrieval (Hit@5: 0.70 → 0.84) and bounds Worker context pressure, at the price of additional orchestration tokens that can be partially amortized via index reuse.

8.2 Workflow on a Comprehensive LSS Environment

Beyond benchmark retrieval, we build an automated research environment to serve as a harness-engineering substrate for LSS workflows. At its core is a file-based project knowledge base that stores thousands of atomic entries (ideas, thoughts, references, experiment records, design decisions, and intermediate drafts), each with lightweight metadata for linking, status tracking, and incremental archival. The knowledge base grows continuously during day-to-day human–Agent collaboration: Agents can distill valuable artifacts from a human's daily outputs, and can also contribute new artifacts produced by literature review, code exploration, or experiment execution.
Over time, this file-mediated knowledge base becomes a shared interaction space where both humans and Agents can read, write, and build upon the same persistent state.

Figure 15: The agent class architecture of the task distribution and completion workflow in the knowledge-base-centric environment.

This knowledge base can act as both a productivity environment and an experiment environment for LSS: this paper itself was developed inside this environment, and the environment can in turn be used for this paper's evaluation. As a concrete example, we implement a file-mediated, auditable task distribution and completion workflow with persistent state. As shown in Figure 15, a user request is first issued as an Intent, which is then expanded into a set of concrete tasks written as structured task.md items into a shared Task Pool. A semantic router selects appropriate planning/execution agents (e.g., Planner/Worker instances) conditioned on reusable capability specifications (skill.md); agents act over the same knowledge-base substrate (articles, code, experiment reports, references) and commit outcomes back to the filesystem as traceable artifacts. These agents proceed through the following steps:

(1) Generate & Dispatch: An AI reviewer/controller transforms the user intent into tasks via a task generator, materializes them as task.md files in the Task Pool, and routes to find suitable planners to facilitate the construction of each task.md.
(2) Execute & Log: A task dispatcher assigns (or workers claim) tasks from the pool; workers execute the actions, consult relevant skills/knowledge through semantic lenses, and append stepwise execution logs and outputs to the corresponding task files and the Result Memory.
(3) Review & Iterate: The reviewer reads task summaries and accumulated results from the Result Memory and performs acceptance; if requirements are not met, it emits a new round of tasks, otherwise the workflow halts.
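The three steps above form a review-gated control loop, which can be sketched as follows. The callables stand in for the Task Generator, Worker agents, and AI Reviewer, and real systems persist tasks as task.md files rather than in-memory strings:

```python
def run_workflow(intent, generate_tasks, execute, review,
                 max_rounds=10, max_tasks_per_round=10):
    """Illustrative review-gated loop over a shared Task Pool.
    generate_tasks/execute/review stand in for the Task Generator,
    Worker agents, and AI Reviewer."""
    result_memory = []
    for round_no in range(1, max_rounds + 1):
        # (1) Generate & Dispatch: intent -> bounded batch of tasks.
        task_pool = generate_tasks(intent, result_memory)[:max_tasks_per_round]
        # (2) Execute & Log: workers act and append results.
        for task in task_pool:
            result_memory.append(execute(task))
        # (3) Review & Iterate: accept, or emit another round.
        if review(result_memory):
            return round_no, result_memory
    return max_rounds, result_memory

# Toy run: each round yields 2 results; reviewer accepts at >= 4.
rounds, results = run_workflow(
    intent="survey related work",
    generate_tasks=lambda intent, mem: [f"task-{len(mem) + i}" for i in range(2)],
    execute=lambda task: f"done:{task}",
    review=lambda mem: len(mem) >= 4,
)
```

The per-round task cap mirrors the limit of 10 task.md items per round used in our replay experiment below; without such a bound, agents that can always find improvements would iterate indefinitely.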
We use this workflow to assist us in completing a research process, but a full research cycle requires strong human interaction at key control points. For instance, manual review of task.md, fine-grained adjustment of article-organization logic, and targeted investigations or experiments are often necessary. This makes the highly interactive process difficult to quantify. Furthermore, because it is challenging for agents to apply a standardized metric for "good research," fully automated research generation often struggles to meet our requirements. Therefore, after completing this paper, we tasked the aforementioned LSS workflow with reproducing the research process of this paper, but in a completely automated manner. We created an AI reviewer to evaluate the results generated by LSS by comparing them with the results of this paper and providing corresponding Intents to the Task Generator. Except for the AI Reviewer, no other LSS Agents had access to this paper. Since agents can always identify points for further improvement, we limited the number of task.md items produced by the Task Generator to a maximum of 10 per round. In total, 10 rounds of research activities were performed. Worker Agents were responsible for writing, proposing new ideas, conducting literature research (via a Web Search tool), performing experimental verification of basic concepts, and generating figures. As illustrative figures cannot be generated directly, we configured the agents to produce prompts for Nano Banana.

Figure 16: Illustrative per-round trace of task volume and token usage in the replay workflow.

We record per-round task counts and token traces for each type of agent in Figure 16 (DeepSeek API). For typical research, the experimental component often consumes far more tokens than other parts, and exploration can potentially continue indefinitely.
To manage this, we reduced the activity of the Experiment Agent, allowing only a single round of basic experimentation. Even so, the Experiment Agent still consumed the most tokens. Throughout the run, LSS dynamically generated 23 skills. In this specific evaluation, we simplified the process by using task.md as a View to trigger Worker execution and categorized token usage by task type. This bypassed the Semantic Router, although a router would normally be required to dispatch tasks to existing Workers. Figure 16 also presents the AI reviewer's subjective evaluation scores for each round, covering both the generated manuscript sections and the image prompts.

9 Conclusion

Multi-agent systems increasingly behave like Loosely-Structured Software. It is sometimes assumed that as AI models become capable of end-to-end autonomous execution, the need for system engineering will shrink. In practice, a transitional period may emerge: higher autonomy makes free-form interactions more open-ended, which can increase system-level entropy. In the LSS paradigm, the value of engineering is therefore less about prescribing step-by-step task logic and more about designing the macro-level "physics" (governance mechanisms for context isolation, structural bindings, and safe self-evolution) so that a system of capable agents remains stable under repeated use. Moreover, these engineering design experiences also benefit AI development itself. It also suggests a shift in the software ecosystem: if most executable behavior is synthesized at runtime, the volume of "generated code" may eventually dwarf the amount of code written ahead of time. This challenges today's software development and distribution models and raises unresolved questions about intellectual property and ownership.

10 AI Use Statement

The authors acknowledge that generative AI contributed to this manuscript.
GPT-5.2 (OpenAI) supported refinement of language, while Nano Banana Pro 2 (Google) assisted with some image generation. The automated research environment proposed in Section 8 of this paper helped to explore new ideas and perform preliminary validation. All AI-generated content was critically reviewed, edited, and approved by the authors, who retain full responsibility for the manuscript's integrity and accuracy.

References

[1] A2A Project Contributors. 2025. A2A Protocol Documentation. https://a2a-protocol.org/latest/
[2] Akari Asai, Zeqiu Wu, Yizhong Wang, Avirup Sil, and Hannaneh Hajishirzi. 2023. Self-RAG: Learning to retrieve, generate, and critique through self-reflection. In The Twelfth International Conference on Learning Representations.
[3] Tom Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared D Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, et al. 2020. Language models are few-shot learners. Advances in Neural Information Processing Systems 33 (2020), 1877–1901.
[4] Harrison Chase and LangChain Contributors. 2022. LangChain. https://github.com/langchain-ai/langchain
[5] Paul F Christiano, Jan Leike, Tom Brown, Miljan Martic, Shane Legg, and Dario Amodei. 2017. Deep reinforcement learning from human preferences. Advances in Neural Information Processing Systems 30 (2017).
[6] Nicola Dragoni, Saverio Giallorenzo, Alberto Lluch Lafuente, Manuel Mazzara, Fabrizio Montesi, Ruslan Mustafin, and Larisa Safina. 2017. Microservices: yesterday, today, and tomorrow. Present and Ulterior Software Engineering (2017), 195–216.
[7] Federico Errica, Davide Sanvito, Giuseppe Siracusano, and Roberto Bifulco. 2025. What did I do wrong? Quantifying LLMs' sensitivity and consistency to prompt engineering. In Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers). 1543–1558.
[8] Martin Fowler. 2026. Harness Engineering. https://martinfowler.com/articles/exploring-gen-ai/harness-engineering.html
[9] Erich Gamma. 1995. Design Patterns: Elements of Reusable Object-Oriented Software. Pearson Education India.
[10] Kai Greshake, Sahar Abdelnabi, Shailesh Mishra, Christoph Endres, Thorsten Holz, and Mario Fritz. 2023. Not what you've signed up for: Compromising real-world LLM-integrated applications with indirect prompt injection. In Proceedings of the 16th ACM Workshop on Artificial Intelligence and Security. 79–90.
[11] Siwei Han, Kaiwen Xiong, Jiaqi Liu, Xinyu Ye, Yaofeng Su, Wenbo Duan, Xinyuan Liu, Cihang Xie, Mohit Bansal, Mingyu Ding, et al. 2025. Alignment Tipping Process: How Self-Evolution Pushes LLM Agents Off the Rails. arXiv preprint arXiv:2510.04860 (2025).
[12] Donald Olding Hebb. 2005. The Organization of Behavior: A Neuropsychological Theory. Psychology Press.
[13] Sirui Hong, Mingchen Zhuge, Jonathan Chen, Xiawu Zheng, Yuheng Cheng, Jinlin Wang, Ceyao Zhang, Zili Wang, Steven Ka Shing Yau, Zijuan Lin, et al. 2023. MetaGPT: Meta programming for a multi-agent collaborative framework. In The Twelfth International Conference on Learning Representations.
[14] Xinyi Hou, Yanjie Zhao, Shenao Wang, and Haoyu Wang. 2025. Model Context Protocol (MCP): Landscape, security threats, and future research directions. ACM Transactions on Software Engineering and Methodology (2025).
[15] Ehud Karpas, Omri Abend, Yonatan Belinkov, Barak Lenz, Opher Lieber, Nir Ratner, Yoav Shoham, Hofit Bata, Yoav Levine, Kevin Leyton-Brown, et al. 2022. MRKL Systems: A modular, neuro-symbolic architecture that combines large language models, external knowledge sources and discrete reasoning. arXiv preprint arXiv:2205.00445 (2022).
[16] Omar Khattab, Arnav Singhvi, Paridhi Maheshwari, Zhiyuan Zhang, Keshav Santhanam, Sri Vardhamanan, Saiful Haq, Ashutosh Sharma, Thomas T Joshi, Hanna Moazam, et al. 2023.
DSPy: Compiling declarative language model calls into self-improving pipelines. arXiv preprint arXiv:2310.03714 (2023).
[17] Gregor Kiczales, Jim Des Rivieres, and Daniel G Bobrow. 1991. The Art of the Metaobject Protocol. MIT Press.
[18] Yubin Kim, Ken Gu, Chanwoo Park, Chunjong Park, Samuel Schmidgall, A Ali Heydari, Yao Yan, Zhihan Zhang, Yuchen Zhuang, Mark Malhotra, et al. 2025. Towards a science of scaling agent systems. arXiv preprint (2025).
[19] Alexander Lercher, Johann Glock, Christian Macho, and Martin Pinzger. 2024. Microservice API evolution in practice: A study on strategies and challenges. Journal of Systems and Software 215 (2024), 112110.
[20] Patrick Lewis, Ethan Perez, Aleksandra Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich Küttler, Mike Lewis, Wen-tau Yih, Tim Rocktäschel, et al. 2020. Retrieval-augmented generation for knowledge-intensive NLP tasks. Advances in Neural Information Processing Systems 33 (2020), 9459–9474.
[21] Nelson F Liu, Kevin Lin, John Hewitt, Ashwin Paranjape, Michele Bevilacqua, Fabio Petroni, and Percy Liang. 2024. Lost in the middle: How language models use long contexts. Transactions of the Association for Computational Linguistics 12 (2024), 157–173.
[22] Tianyang Liu, Canwen Xu, and Julian McAuley. 2023. RepoBench: Benchmarking Repository-Level Code Auto-Completion Systems. arXiv:2306.03091 [cs.CL]
[23] Xiao Liu, Hao Yu, Hanchen Zhang, Yifan Xu, Xuanyu Lei, Hanyu Lai, Yu Gu, Hangliang Ding, Kaiwen Men, Kejuan Yang, et al. 2023. AgentBench: Evaluating LLMs as agents. arXiv preprint arXiv:2308.03688 (2023).
[24] Aman Madaan, Niket Tandon, Prakhar Gupta, Skyler Hallinan, Luyu Gao, Sarah Wiegreffe, Uri Alon, Nouha Dziri, Shrimai Prabhumoye, Yiming Yang, et al. 2023. Self-Refine: Iterative refinement with self-feedback. Advances in Neural Information Processing Systems 36 (2023), 46534–46594.
[25] Stefan Marr, Chris Seaton, and Stéphane Ducasse. 2015.
Zero-overhead metaprogramming: Reflection and metaobject protocols fast and without compromises. In Proceedings of the 36th ACM SIGPLAN Conference on Programming Language Design and Implementation. 545–554.
[26] Robert Martin and Robert C Martin. 2013. Agile Software Development, Principles, Patterns, and Practices. BoD–Books on Demand.
[27] Lingrui Mei, Jiayu Yao, Yuyao Ge, Yiwei Wang, Baolong Bi, Yujun Cai, Jiazhi Liu, Mingyu Li, Zhong-Zhi Li, Duzhen Zhang, et al. 2025. A survey of context engineering for large language models. arXiv preprint arXiv:2507.13334 (2025).
[28] Model Context Protocol Contributors. 2025. Specification - Model Context Protocol. https://modelcontextprotocol.io/specification/2025-11-25 Accessed: 2026-02-26.
[29] NanoClaw Project. 2026. NanoClaw: Secure Personal AI Agent. https://nanoclaw.dev/
[30] Victor Ojewale, Harini Suresh, and Suresh Venkatasubramanian. 2026. Audit Trails for Accountability in Large Language Models. arXiv preprint arXiv:2601.20727 (2026).
[31] OpenAI. 2025. A Practical Guide to Building Agents. https://cdn.openai.com/business-guides-and-resources/a-practical-guide-to-building-agents.pdf
[32] OpenAI. 2026. Harness Engineering: Leveraging Codex in an Agent-First World. https://openai.com/index/harness-engineering/
[33] OpenClaw Team. 2026. OpenClaw: A Local Gateway for High-Reliability AI Agents. https://openclaw.ai
[34] Charles Packer, Vivian Fang, Shishir G Patil, Kevin Lin, Sarah Wooders, and Joseph E Gonzalez. 2023. MemGPT: Towards LLMs as operating systems. (2023).
[35] Joon Sung Park, Joseph O'Brien, Carrie Jun Cai, Meredith Ringel Morris, Percy Liang, and Michael S Bernstein. 2023. Generative agents: Interactive simulacra of human behavior. In Proceedings of the 36th Annual ACM Symposium on User Interface Software and Technology. 1–22.
[36] David Lorge Parnas. 1972. On the criteria to be used in decomposing systems into modules. Commun. ACM 15, 12 (1972), 1053–1058.
[37] Shishir G Patil, Tianjun Zhang, Xin Wang, and Joseph E Gonzalez. 2024. Gorilla: Large language model connected with massive APIs. Advances in Neural Information Processing Systems 37 (2024), 126544–126565.
[38] Chen Qian, Wei Liu, Hongzhang Liu, Nuo Chen, Yufan Dang, Jiahao Li, Cheng Yang, Weize Chen, Yusheng Su, Xin Cong, et al. 2024. ChatDev: Communicative agents for software development. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 15174–15186.
[39] Yujia Qin, Shihao Liang, Yining Ye, Kunlun Zhu, Lan Yan, Yaxi Lu, Yankai Lin, Xin Cong, Xiangru Tang, Bill Qian, et al. 2023. ToolLLM: Facilitating large language models to master 16000+ real-world APIs. arXiv preprint arXiv:2307.16789 (2023).
[40] Abhishek Rath. 2026. Agent Drift: Quantifying Behavioral Degradation in Multi-Agent LLM Systems Over Extended Interactions. arXiv preprint (2026).
[41] Laria Reynolds and Kyle McDonell. 2021. Prompt programming for large language models: Beyond the few-shot paradigm. In Extended Abstracts of the 2021 CHI Conference on Human Factors in Computing Systems. 1–7.
[42] Timo Schick, Jane Dwivedi-Yu, Roberto Dessì, Roberta Raileanu, Maria Lomeli, Eric Hambro, Luke Zettlemoyer, Nicola Cancedda, and Thomas Scialom. 2023. Toolformer: Language models can teach themselves to use tools. Advances in Neural Information Processing Systems 36 (2023), 68539–68551.
[43] Noah Shinn, Federico Cassano, Ashwin Gopinath, Karthik Narasimhan, and Shunyu Yao. 2023. Reflexion: Language agents with verbal reinforcement learning. Advances in Neural Information Processing Systems 36 (2023), 8634–8652.
[44] Renan Souza, Timothy Poteet, Brian Etz, Daniel Rosendo, Amal Gueroudji, Woong Shin, Prasanna Balaprakash, and Rafael Ferreira da Silva. 2025.
LLM Agents for Interactive Workflow Provenance: Reference Architecture and Evaluation Methodology. In Proceedings of the SC'25 Workshops of the International Conference for High Performance Computing, Networking, Storage and Analysis. 2257–2268.
[45] Wayne P. Stevens, Glenford J. Myers, and Larry L. Constantine. 1974. Structured design. IBM Systems Journal 13, 2 (1974), 115–139.
[46] Guanzhi Wang, Yuqi Xie, Yunfan Jiang, Ajay Mandlekar, Chaowei Xiao, Yuke Zhu, Linxi Fan, and Anima Anandkumar. 2023. Voyager: An open-ended embodied agent with large language models. arXiv preprint arXiv:2305.16291 (2023).
[47] Lei Wang, Chen Ma, Xueyang Feng, Zeyu Zhang, Hao Yang, Jingsen Zhang, Zhiyuan Chen, Jiakai Tang, Xu Chen, Yankai Lin, et al. 2024. A survey on large language model based autonomous agents. Frontiers of Computer Science 18, 6 (2024), 186345.
[48] Xuezhi Wang, Jason Wei, Dale Schuurmans, Quoc Le, Ed Chi, Sharan Narang, Aakanksha Chowdhery, and Denny Zhou. 2023. Self-Consistency Improves Chain of Thought Reasoning in Language Models. arXiv preprint (2023).
[49] Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Fei Xia, Ed Chi, Quoc V Le, Denny Zhou, et al. 2022. Chain-of-thought prompting elicits reasoning in large language models. Advances in Neural Information Processing Systems 35 (2022), 24824–24837.
[50] Tongyu Wen, Chenglong Wang, Xiyuan Yang, Haoyu Tang, Yueqi Xie, Lingjuan Lyu, Zhicheng Dou, and Fangzhao Wu. 2025. Defending against indirect prompt injection by instruction detection. arXiv preprint arXiv:2505.06311 (2025).
[51] Qingyun Wu, Gagan Bansal, Jieyu Zhang, Yiran Wu, Beibin Li, Erkang Zhu, Li Jiang, Xiaoyun Zhang, Shaokun Zhang, Jiale Liu, et al. 2024. AutoGen: Enabling next-gen LLM applications via multi-agent conversations. In First Conference on Language Modeling.
[52] Boming Xia, Qinghua Lu, Liming Zhu, Zhenchang Xing, Dehai Zhao, and Hao Zhang. 2024.
Evaluation-Driven Development and Operations of LLM Agents: A Process Model and Reference Architecture. arXiv preprint (2024).
[53] Wujiang Xu, Zujie Liang, Kai Mei, Hang Gao, Juntao Tan, and Yongfeng Zhang. 2025. A-MEM: Agentic memory for LLM agents. arXiv preprint (2025).
[54] Hui Yang, Sifu Yue, and Yunzhong He. 2023. Auto-GPT for online decision making: Benchmarks and additional opinions. arXiv preprint arXiv:2306.02224 (2023).
[55] Shunyu Yao, Dian Yu, Jeffrey Zhao, Izhak Shafran, Tom Griffiths, Yuan Cao, and Karthik Narasimhan. 2023. Tree of Thoughts: Deliberate problem solving with large language models. Advances in Neural Information Processing Systems 36 (2023), 11809–11822.
[56] Shunyu Yao, Jeffrey Zhao, Dian Yu, Nan Du, Izhak Shafran, Karthik R Narasimhan, and Yuan Cao. 2022. ReAct: Synergizing reasoning and acting in language models. In The Eleventh International Conference on Learning Representations.
[57] Chenxing Zhong, He Zhang, Chao Li, Huang Huang, and Daniel Feitosa. 2023. On measuring coupling between microservices. Journal of Systems and Software 200 (2023), 111670.
[58] Wanjun Zhong, Lianghong Guo, Qiqi Gao, He Ye, and Yanlin Wang. 2024. MemoryBank: Enhancing Large Language Models with Long-Term Memory. Proceedings of the AAAI Conference on Artificial Intelligence 38, 17 (Mar. 2024), 19724–19731.