Semantic annotation of requirements for automatic UML class diagram generation
The increasing complexity of software engineering requires effective methods and tools to support requirements analysts’ activities. While much of a company’s knowledge can be found in text repositories, current content management systems have limited capabilities for structuring and interpreting documents. In this context, we propose a tool for transforming text documents that describe users’ requirements into a UML model. The tool uses Natural Language Processing (NLP) and semantic rules to generate a UML class diagram. Its main contribution is to assist designers in moving from a textual description of user requirements to the corresponding UML diagrams. It is built on GATE (General Architecture for Text Engineering) and formulates the rules needed to generate new semantic annotations.
💡 Research Summary
The paper addresses the persistent challenge of converting natural‑language software requirements into formal UML class diagrams, a step that traditionally requires substantial manual effort and expertise. To bridge this gap, the authors propose a prototype tool built on the GATE (General Architecture for Text Engineering) framework, leveraging a combination of standard NLP components and custom semantic annotation rules.
The processing pipeline consists of three main phases. In the first phase, the input requirement document is tokenized, split into sentences, and tagged with parts of speech using GATE’s Tokeniser, Sentence Splitter, and POS Tagger. This lexical analysis produces token, sentence, and part‑of‑speech annotations that capture the basic structure of the text and serve as input to the later phases.
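As a rough illustration of what this first phase produces, the sketch below splits text into sentences, tokenizes each sentence, and attaches a part‑of‑speech tag to every token. It is not GATE code: the regular expressions and the tiny `TOY_LEXICON` lookup table are assumptions made for this example and stand in for GATE's trained components.

```python
import re

# Hypothetical toy lexicon standing in for a trained POS tagger.
TOY_LEXICON = {
    "the": "DT", "a": "DT", "an": "DT",
    "client": "NN", "order": "NN", "name": "NN",
    "passes": "VBZ", "has": "VBZ",
}


def split_sentences(text):
    """Split after '.', '!' or '?' when followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def tokenize(sentence):
    """Return word tokens and single punctuation tokens in order."""
    return re.findall(r"\w+|[^\w\s]", sentence)


def tag(tokens):
    """Tag by lexicon lookup: punctuation is 'PUNCT', unknown words 'UNK'."""
    tagged = []
    for tok in tokens:
        if re.fullmatch(r"[^\w\s]", tok):
            tagged.append((tok, "PUNCT"))
        else:
            tagged.append((tok, TOY_LEXICON.get(tok.lower(), "UNK")))
    return tagged


def preprocess(text):
    """Run the three steps and return one tagged token list per sentence."""
    return [tag(tokenize(s)) for s in split_sentences(text)]


if __name__ == "__main__":
    for sentence in preprocess("The client passes an order. The client has a name."):
        print(sentence)
```

A real tagger assigns tags from context rather than from a fixed table, so this lookup only shows the shape of the data handed to the semantic rules.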
The second phase performs semantic extraction of UML concepts. The authors extend GATE’s ANNIE Named Entity Transducer with a set of handcrafted JAPE (Java Annotation Patterns Engine) rules and domain‑specific Gazetteer lists. These rules identify three UML elements:
- Classes – recognized primarily through noun entries in the Gazetteer and noun‑verb‑noun patterns (e.g., “client passes order”).
- Associations – detected when a verb appears between two class nouns, with the verb itself annotated as the association.
- Attributes – extracted either from tokens that belong to an attribute list or from noun‑verb‑noun constructions where the second noun does not belong to the class list.
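The noun‑verb‑noun pattern behind these rules can be sketched in a few lines. The example below is a simplified Python stand‑in, not the authors' JAPE grammar: the `CLASS_NOUNS` set plays the role of a gazetteer list, the input is assumed to be already POS‑tagged with Penn Treebank style tags, and the separate attribute‑list lookup is left out.

```python
# Hypothetical gazetteer of known class nouns (an assumption for this sketch).
CLASS_NOUNS = {"client", "order"}


def extract(tagged_sentence):
    """Apply a noun-verb-noun rule to one POS-tagged sentence.

    Returns (classes, associations, attributes), where associations are
    (class, verb, class) triples and attributes are (class, noun) pairs.
    """
    # Keep only nouns and verbs so determiners do not break the pattern.
    content = [
        (word.lower(), pos)
        for word, pos in tagged_sentence
        if pos.startswith(("NN", "VB"))
    ]
    classes, associations, attributes = set(), [], []
    # Slide a window of three content words over the sentence.
    for (w1, t1), (w2, t2), (w3, t3) in zip(content, content[1:], content[2:]):
        is_nvn = t1.startswith("NN") and t2.startswith("VB") and t3.startswith("NN")
        if not is_nvn or w1 not in CLASS_NOUNS:
            continue
        classes.add(w1)
        if w3 in CLASS_NOUNS:
            # Verb between two class nouns: the verb names the association.
            classes.add(w3)
            associations.append((w1, w2, w3))
        else:
            # Second noun is not a known class: treat it as an attribute.
            attributes.append((w1, w3))
    return classes, associations, attributes


if __name__ == "__main__":
    s1 = [("The", "DT"), ("client", "NN"), ("passes", "VBZ"),
          ("an", "DT"), ("order", "NN"), (".", "PUNCT")]
    s2 = [("The", "DT"), ("client", "NN"), ("has", "VBZ"),
          ("a", "DT"), ("name", "NN"), (".", "PUNCT")]
    print(extract(s1))
    print(extract(s2))
```

Because the decision rests entirely on list membership, a noun missing from the class list is silently treated as an attribute, which illustrates why such rule‑based extraction depends on well‑curated gazetteer entries.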
The third phase resolves relationships among the extracted entities. Using GATE’s OrthoMatcher component, the system performs a lightweight coreference resolution, linking different mentions of the same concept and annotating complex relational structures. The output of the pipeline is an XML file that contains all semantic tags (Class, Association, Attribute) while extraneous tags such as
To evaluate the approach, the authors assembled a corpus of requirement texts from several application domains and ran the tool on this dataset. The experimental results show that the system can reliably identify UML elements with a low error rate for relatively simple sentences. However, the authors acknowledge that performance degrades on more intricate or domain‑specific sentences, reflecting the inherent limitations of a rule‑based method that depends on manually curated Gazetteer entries and JAPE patterns.
In the discussion, the paper situates its contribution among related works that either rely on domain ontologies, transform use‑case diagrams into class diagrams, or generate natural language from UML models. Unlike those approaches, the presented tool works directly on raw textual requirements without requiring an intermediate use‑case model, but it shares the common drawback of limited scalability and adaptability.
The conclusion emphasizes that semantic annotation of requirements, as demonstrated with GATE, can automate a substantial portion of the requirements‑to‑design transition, reducing manual effort and improving traceability. Future work is outlined as the automatic generation of Gazetteer lists, incorporation of machine‑learning‑based entity recognition to handle linguistic variability, and the development of a seamless XML‑to‑UML diagram conversion module. The authors also thank the GATE community for support, underscoring the collaborative nature of the research.