Retrieval Augmented (Knowledge Graph), and Large Language Model-Driven Design Structure Matrix (DSM) Generation of Cyber-Physical Systems
We explore the potential of Large Language Models (LLMs), Retrieval-Augmented Generation (RAG), and Graph-based RAG (GraphRAG) for generating Design Structure Matrices (DSMs). We test these methods on two distinct use cases – a power screwdriver and a CubeSat with known architectural references – evaluating their performance on two key tasks: determining relationships between predefined components, and the more complex challenge of identifying components and their subsequent relationships. We measure the performance by assessing each element of the DSM and overall architecture. Despite design and computational challenges, we identify opportunities for automated DSM generation, with all code publicly available for reproducibility and further feedback from the domain experts.
💡 Research Summary
The paper investigates the feasibility of automatically generating Design Structure Matrices (DSMs) for cyber‑physical systems (CPS) by leveraging Large Language Models (LLMs), Retrieval‑Augmented Generation (RAG), and a graph‑based extension called GraphRAG. DSMs are compact matrix representations that capture component‑to‑component interactions and are widely used for modularity analysis, feedback‑loop minimization, and dependency management in complex engineering projects. Traditionally, DSM construction is a manual, expert‑intensive activity that can take weeks or months for large systems, especially during early design phases when domain knowledge is fragmented.
To address this bottleneck, the authors design a four‑stage pipeline. First, they assemble a domain‑specific document store (technical papers, data sheets, standards) and embed each document using a dense vector model. When a user poses a query—e.g., “What is the interface between the power supply and the motor in a power screwdriver?”—the RAG component retrieves the most relevant documents and feeds them to an LLM (GPT‑4‑style). The LLM produces a natural‑language answer, which is then processed by a Named‑Entity Recognition (NER) and Relationship Extraction (RRE) module to produce structured triples of the form (component‑A, relation, component‑B).
These triples are ingested into a Neo4j knowledge graph. Graph‑based reasoning (ontology constraints, transitive closure, conflict detection) is applied to enrich the set of relationships, fill gaps, and eliminate contradictions. Finally, the adjacency matrix of the graph is extracted, mapped to a DSM, and reordered using a heuristic that simultaneously minimizes feedback loops and maximizes modular clustering (similar to spectral ordering or the DM‑R algorithm).
The methodology is evaluated on two CPS case studies. The first is a power screwdriver, a relatively small electromechanical device with about a dozen components and well‑documented interfaces. Here the task is limited to “relationship identification among a predefined component list.” The second case is a CubeSat, a small satellite platform comprising roughly 20 subsystems (power, communications, attitude control, thermal management, etc.) and dozens of inter‑subsystem links. For the CubeSat the authors test the more demanding “joint component discovery and relationship extraction” scenario, where the system must first infer the set of components from the textual corpus and then map their interactions.
Three quantitative metrics are used: (1) cell‑level precision and recall of the DSM entries, (2) structural similarity of the whole matrix measured by clustering coefficient and feedback‑loop count, and (3) a qualitative expert rating (1–5) on readability, usefulness, and error severity. Results show that for the predefined‑component task the pipeline achieves an average cell precision of 92 % and recall of 89 %, with a structural similarity score of 0.87 (where 1.0 denotes a perfect match to a manually crafted DSM). In the joint discovery task, performance drops to 78 % precision and 71 % recall, and the structural similarity falls to 0.73, reflecting the added difficulty of accurate component identification.
Error analysis attributes the performance gap to several sources: (a) LLM misinterpretation of ambiguous terminology leading to incorrect entity extraction, (b) incomplete retrieval of relevant documents during the RAG step, (c) duplicate or contradictory triples that survive graph cleaning, and (d) the DSM reordering heuristic not fully eliminating complex feedback loops in highly inter‑connected subsystems. Moreover, the authors note that large‑scale RAG calls incur significant token costs and latency, which could hinder real‑time collaborative design sessions.
The discussion outlines concrete avenues for future work. First, domain‑specific prompt engineering combined with instruction‑tuned LLMs could raise extraction accuracy. Second, embedding ontology‑driven validation rules directly into the graph pipeline would enable automatic triple pruning and consistency checking. Third, more sophisticated DSM ordering algorithms—such as genetic algorithms or Graph Neural Network (GNN) predictors—could better optimize modularity and loop reduction. Fourth, cost‑effective RAG strategies (e.g., quantized embeddings, caching of frequent queries) and multimodal extensions (incorporating CAD files, schematics, or images) are proposed to broaden applicability.
All code, datasets, and prompt templates are released publicly on GitHub, encouraging reproducibility and community‑driven extensions. By demonstrating that LLM‑RAG‑Graph pipelines can generate DSMs with reasonable accuracy, the paper establishes a promising direction for reducing manual effort in CPS architecture synthesis and opens a pathway toward fully AI‑augmented systems engineering.
Comments & Academic Discussion
Loading comments...
Leave a Comment