Autonomous requirements specification processing using natural language processing
We describe our ongoing research that centres on the application of natural language processing (NLP) to software engineering and systems development activities. In particular, this paper addresses the use of NLP in the requirements analysis and systems design processes. We have developed a prototype toolset that can assist the systems analyst or software engineer in selecting and verifying terms relevant to a project. In this paper we describe the processes employed by the system to extract and classify objects of interest from requirements documents. These processes are illustrated using a small example.
💡 Research Summary
The paper presents a prototype system that applies natural language processing (NLP) techniques to automate the extraction and classification of key terms from software requirements documents. Recognizing that early‑stage requirements analysis is traditionally a labor‑intensive, error‑prone activity reliant on manual reading and ad‑hoc keyword matching, the authors propose a structured pipeline that transforms free‑form textual specifications into a semi‑formal representation suitable for downstream design and verification tasks.
The pipeline begins with standard preprocessing: document normalization, sentence segmentation, tokenization, and part‑of‑speech tagging using a Korean morphological analyzer. Next, noun‑phrase (NP) extraction rules and dependency parsing are employed to identify candidate “object‑attribute‑relationship” triples. For example, in the sentence “The system must authenticate the user within five seconds,” the system extracts “system” (entity), “authenticate” (action), and “within five seconds” (temporal constraint).
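The extraction step above can be illustrated with a minimal sketch. This is a hypothetical rule-based extractor over English text using simple regular expressions; the paper's prototype instead relies on a Korean morphological analyzer, NP extraction rules, and dependency parsing, so the pattern and sentence below are assumptions made purely for illustration.

```python
import re

# Hypothetical pattern for requirements of the form
# "The <entity> must <action> ... within <constraint>."
# This stands in for the NP-extraction and dependency-parsing
# rules described in the paper; it is not the authors' method.
MODAL = r"(?:must|shall|should|will)"

def extract_triple(sentence: str):
    """Return an (entity, action, constraint) triple, or None if no match."""
    m = re.match(
        rf"The (?P<entity>\w+) {MODAL} (?P<action>\w+)"
        r".*?(?P<constraint>within [\w ]+?)[.]?$",
        sentence,
    )
    if not m:
        return None
    return m.group("entity"), m.group("action"), m.group("constraint")

print(extract_triple("The system must authenticate the user within five seconds."))
# → ('system', 'authenticate', 'within five seconds')
```

A real pipeline would replace the regex with POS tags and dependency arcs, which is what lets the prototype handle compound noun phrases that a flat pattern cannot.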
Extracted candidates are then matched against a domain‑specific lexicon that the analyst curates. The lexicon categorizes terms into classes such as Entity, Event, and Constraint and includes synonym lists. Matching yields a confidence score; low‑confidence items are flagged for analyst review through a graphical user interface that allows manual correction, addition, or deletion. The classification module assigns each term to its semantic class and enriches it with metadata (e.g., data type, permissible range).
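The lexicon-matching and flagging behaviour can be sketched as follows. The lexicon entries, synonym lists, the string-similarity scoring, and the 0.8 review threshold are all assumptions for illustration; the paper does not specify how its confidence score is computed.

```python
from difflib import SequenceMatcher

# Hypothetical domain lexicon of the kind the analyst would curate.
# Classes follow the summary (Entity, Event, Constraint); the entries
# and synonyms are illustrative, not taken from the paper.
LEXICON = {
    "system": {"class": "Entity", "synonyms": ["application", "software"]},
    "authenticate": {"class": "Event", "synonyms": ["log in", "verify identity"]},
    "within five seconds": {"class": "Constraint", "synonyms": []},
}

def classify(term: str, threshold: float = 0.8):
    """Return (semantic_class, confidence, needs_review) for a candidate term."""
    best_class, best_score = None, 0.0
    for entry, info in LEXICON.items():
        for variant in [entry, *info["synonyms"]]:
            score = SequenceMatcher(None, term.lower(), variant).ratio()
            if score > best_score:
                best_class, best_score = info["class"], score
    # Low-confidence matches are flagged for analyst review in the GUI.
    return best_class, round(best_score, 2), best_score < threshold

print(classify("system"))            # exact match, no review needed
print(classify("authentification"))  # near match, flagged for review
```

The key design point is the human-in-the-loop: anything below the threshold is surfaced to the analyst rather than silently classified.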
A small illustrative case study demonstrates the end‑to‑end flow. The prototype processes a handful of requirement sentences, automatically produces a structured list of entities, actions, and constraints, and visualizes the results for the analyst. Quantitative evaluation on a test set of 30 sentences (120 noun phrases) reports a precision of 78 % and a recall of 71 %, outperforming a baseline keyword‑matching approach, particularly in handling compound noun phrases and verb‑driven relations.
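The reported figures follow the standard precision/recall definitions. The true-positive, false-positive, and false-negative counts below are hypothetical, chosen only so that the arithmetic reproduces figures of the same magnitude over the 120-noun-phrase test set; the paper's raw counts are not given.

```python
def precision_recall(tp: int, fp: int, fn: int):
    """Precision = TP/(TP+FP); recall = TP/(TP+FN)."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return precision, recall

# Hypothetical counts: 85 correct extractions, 24 spurious, 35 missed
# (85 + 35 = 120 gold noun phrases, as in the summary's test set).
p, r = precision_recall(tp=85, fp=24, fn=35)
print(f"precision={p:.2f} recall={r:.2f}")
# → precision=0.78 recall=0.71
```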
The authors acknowledge two primary limitations. First, the system’s performance heavily depends on the completeness of the domain lexicon; unseen or novel terminology leads to a sharp drop in accuracy. Second, the current implementation is Korean‑specific, limiting applicability in multilingual development environments. To address these issues, future work will explore machine‑learning‑based term candidate generation to reduce lexicon dependence, large‑scale pre‑training on diverse requirements corpora to improve generalization, and extensions to support additional languages. Additional planned features include automatic traceability link generation, integration with version‑control systems for incremental updates, and richer visualizations for design engineers.
Overall, the research contributes a concrete, prototype‑driven demonstration that NLP can substantially reduce manual effort in requirements analysis while preserving—or even enhancing—accuracy. By converting natural‑language specifications into a structured, machine‑readable form, the system lays groundwork for tighter integration between requirements engineering, design modeling, and automated verification in modern software development pipelines.