A Note on Semantics (with an Emphasis on UML)
This note clarifies the concepts of syntax and semantics and their relationship. Today, much confusion arises from the fact that the word “semantics” is used with different meanings. We discuss a general approach to defining semantics that is feasible for both textual and diagrammatic notations and illustrate this approach with an example formalization. The formalization of hierarchical Mealy automata and their semantics, defined using input/output behaviors, allows us to define both a specification semantics and an implementation semantics. Finally, we give a classification of different approaches that fit into this framework. This classification may also serve as a guideline when defining a semantics for a new language.
💡 Research Summary
The paper tackles a persistent source of confusion in software engineering and formal methods: the ambiguous use of the terms “syntax” and “semantics.” It begins by rigorously distinguishing the two concepts. Syntax is defined as the purely structural aspect of a language—its tokens, grammar rules, parse trees, and other formal representations that describe how symbols may be combined. Semantics, by contrast, is the interpretation of those structures, i.e., the rules that assign meaning to syntactic entities. The authors further split semantics into static semantics (type checking, well‑formedness constraints) and dynamic semantics (execution behavior, input/output relations).
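To make the three-way split concrete, the sketch below uses a hypothetical toy expression language (not taken from the paper): the dataclasses are the syntax (an abstract syntax tree), `well_formed` is a static-semantics rule, and `evaluate` is a dynamic semantics. All names here are illustrative assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Union


# Syntax: abstract syntax tree nodes of a hypothetical toy expression language.
@dataclass(frozen=True)
class Num:
    value: int


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Add:
    left: Expr
    right: Expr


Expr = Union[Num, Var, Add]


# Static semantics: a well-formedness rule (every variable must be declared).
def well_formed(e: Expr, declared: set[str]) -> bool:
    if isinstance(e, Num):
        return True
    if isinstance(e, Var):
        return e.name in declared
    return well_formed(e.left, declared) and well_formed(e.right, declared)


# Dynamic semantics: evaluation assigns a value to a tree under an environment.
def evaluate(e: Expr, env: dict[str, int]) -> int:
    if isinstance(e, Num):
        return e.value
    if isinstance(e, Var):
        return env[e.name]
    return evaluate(e.left, env) + evaluate(e.right, env)


expr = Add(Num(2), Var("x"))
print(well_formed(expr, {"x"}))
print(well_formed(expr, set()))
print(evaluate(expr, {"x": 40}))
```

The same tree `expr` is syntactically valid in both calls to `well_formed`; only the static-semantic context (the declared variables) decides whether it is accepted.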
Having clarified terminology, the authors propose a unified framework for defining semantics that works for both textual languages and diagrammatic notations such as UML. The framework consists of three stages: (1) Model Construction, where a concrete syntactic artifact (a parse tree, graph, or diagram) is built; (2) Semantic Mapping, which translates the model into a mathematical object—typically a state‑transition system, a function, or a relational structure; and (3) Behavior Extraction, where the mathematical object is used to generate a set of observable behaviors, usually expressed as sequences of inputs and corresponding outputs. This three‑step pipeline abstracts away the concrete representation of the language, allowing the same semantic definition technique to be applied to source code, state‑chart diagrams, class diagrams, or any other formal visual language.
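A minimal sketch of the three stages for a flat Mealy machine, assuming a made-up textual notation (`source -input/output-> target`, one transition per line). `build_model`, `semantic_mapping`, and `extract_behaviors` are hypothetical names; the mathematical object chosen here is a transition relation, and behaviors are bounded to input words of length at most `max_len` so the set stays finite.

```python
from itertools import product

# Hypothetical concrete syntax: one transition per line, "source -input/output-> target".
SOURCE = """
s0 -a/x-> s1
s1 -a/y-> s0
s1 -b/x-> s1
"""


def build_model(text):
    """Stage 1 (model construction): parse the text into transition tuples."""
    model = []
    for line in text.strip().splitlines():
        src, label, dst = line.split()
        inp, out = label[1:-2].split("/")  # "-a/x->" -> "a/x" -> ("a", "x")
        model.append((src, inp, out, dst))
    return model


def semantic_mapping(model):
    """Stage 2 (semantic mapping): a transition relation
    (state, input) -> set of (output, next_state)."""
    delta = {}
    for src, inp, out, dst in model:
        delta.setdefault((src, inp), set()).add((out, dst))
    return delta


def extract_behaviors(delta, initial, alphabet, max_len):
    """Stage 3 (behavior extraction): all (input word, output word) pairs
    observable from `initial` for input words of length <= max_len."""
    behaviors = set()
    for n in range(max_len + 1):
        for word in product(alphabet, repeat=n):
            configs = {(initial, ())}  # (current state, outputs produced so far)
            for symbol in word:
                configs = {
                    (nxt, outs + (out,))
                    for state, outs in configs
                    for out, nxt in delta.get((state, symbol), set())
                }
            behaviors.update((word, outs) for _, outs in configs)
    return behaviors


behaviors = extract_behaviors(semantic_mapping(build_model(SOURCE)), "s0", "ab", 2)
for pair in sorted(behaviors):
    print(pair)
```

Only stage 1 depends on the notation; a diagram editor that produced the same tuples could reuse stages 2 and 3 unchanged. Input words with no enabled transition simply contribute no behavior in this sketch.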
To demonstrate the practicality of the approach, the paper formalizes Hierarchical Mealy Automata (HMA) as a running example. An HMA extends the classic Mealy machine with a hierarchy of states, each of which may contain a nested automaton. This hierarchical structure mirrors the modular decomposition found in many real‑world systems and in UML state‑charts. The authors first give a precise syntactic description of HMAs, then define two distinct semantics.
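One possible way to write down HMA syntax as data is sketched below. The representation is an assumption, not the paper's definition: each `State` optionally holds a nested `Automaton`, transitions connect states on the same level and carry an input and an output (Mealy style), and `well_formed` checks a few plausible context conditions.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Transition:
    source: str   # name of a state at the same level
    inp: str      # input symbol that triggers the transition
    out: str      # output symbol emitted (Mealy style: output on the transition)
    target: str   # name of a state at the same level


@dataclass
class State:
    name: str
    sub: Optional[Automaton] = None  # nested automaton; None marks a basic state


@dataclass
class Automaton:
    states: list[State]
    initial: str
    transitions: list[Transition] = field(default_factory=list)


def well_formed(aut: Automaton) -> bool:
    """Context conditions: known initial state, transitions stay on one level."""
    names = {s.name for s in aut.states}
    if aut.initial not in names:
        return False
    if any(t.source not in names or t.target not in names for t in aut.transitions):
        return False
    return all(s.sub is None or well_formed(s.sub) for s in aut.states)


def leaf_paths(aut: Automaton, prefix: tuple = ()) -> list:
    """All basic-state paths, e.g. ('On', 'Idle') for Idle nested inside On."""
    paths = []
    for s in aut.states:
        if s.sub is None:
            paths.append(prefix + (s.name,))
        else:
            paths.extend(leaf_paths(s.sub, prefix + (s.name,)))
    return paths


inner = Automaton(
    states=[State("Idle"), State("Busy")],
    initial="Idle",
    transitions=[Transition("Idle", "job", "ack", "Busy"),
                 Transition("Busy", "done", "result", "Idle")],
)
hma = Automaton(
    states=[State("Off"), State("On", sub=inner)],
    initial="Off",
    transitions=[Transition("Off", "power", "boot", "On"),
                 Transition("On", "power", "halt", "Off")],
)
print(well_formed(hma))
print(leaf_paths(hma))
```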
The first is a specification semantics. Here the hierarchy is “flattened” into a single Mealy machine, and the semantics is defined as the set of all possible input‑output functions that the machine can realize. Formally, each HMA is mapped to a relation \(R \subseteq I^{*} \times O^{*}\), where \(I\) and \(O\) are the input and output alphabets and \(X^{*}\) denotes the set of finite sequences over \(X\). This relation captures every admissible behavior without committing to any particular execution strategy. It is therefore suitable for requirements specification, formal verification, and refinement checking, because it abstracts away implementation details while preserving observable behavior.
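The sketch below flattens a small HMA and computes a finite part of \(R\): all pairs whose input word has length at most `max_len`. The nested-dict encoding, the example machine, and the flattening rule are assumptions for illustration: entering a composite state enters its initial sub-state, and a transition leaving a composite state applies from every leaf inside it. Conflicting outer and inner transitions are both kept, since a specification semantics does not pick an execution strategy.

```python
from itertools import product

# Hypothetical HMA encoding: a state maps to None (basic) or a nested automaton.
INNER = {
    "initial": "Idle",
    "states": {"Idle": None, "Busy": None},
    "transitions": [
        ("Idle", "job", "ack", "Busy"),
        ("Busy", "done", "result", "Idle"),
        ("Busy", "power", "save", "Idle"),
    ],
}
HMA = {
    "initial": "Off",
    "states": {"Off": None, "On": INNER},
    "transitions": [
        ("Off", "power", "boot", "On"),
        ("On", "power", "halt", "Off"),
    ],
}


def entry(aut, name, prefix=()):
    """Leaf path reached by entering `name`, descending through initial sub-states."""
    path = prefix + (name,)
    sub = aut["states"][name]
    return path if sub is None else entry(sub, sub["initial"], path)


def leaves(aut, name, prefix=()):
    """All leaf paths contained in state `name`."""
    path = prefix + (name,)
    sub = aut["states"][name]
    if sub is None:
        return [path]
    return [leaf for child in sub["states"] for leaf in leaves(sub, child, path)]


def flatten(aut, prefix=()):
    """Flat transitions over leaf paths; a transition leaving a composite state
    applies from every leaf inside it (an assumed flattening rule)."""
    flat = set()
    for src, inp, out, dst in aut["transitions"]:
        target = entry(aut, dst, prefix)
        for leaf in leaves(aut, src, prefix):
            flat.add((leaf, inp, out, target))
    for name, sub in aut["states"].items():
        if sub is not None:
            flat |= flatten(sub, prefix + (name,))
    return flat


def spec_relation(aut, alphabet, max_len):
    """Finite part of R: all (input, output) pairs with len(input) <= max_len."""
    start = entry(aut, aut["initial"])
    flat = flatten(aut)
    relation = set()
    for n in range(max_len + 1):
        for word in product(alphabet, repeat=n):
            configs = {(start, ())}
            for sym in word:
                configs = {
                    (dst, outs + (out,))
                    for state, outs in configs
                    for src, inp, out, dst in flat
                    if src == state and inp == sym
                }
            relation.update((word, outs) for _, outs in configs)
    return relation


R = spec_relation(HMA, ("power", "job", "done"), 3)
# Both the outer "halt" and the inner "save" reaction to the third input are admitted.
print(sorted(outs for word, outs in R if word == ("power", "job", "power")))
```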
The second is an implementation semantics. In this view the HMA is interpreted as an executable state machine. The mapping retains the hierarchical nesting and introduces an execution policy that resolves nondeterminism, concurrency, and hidden states. The resulting semantics is a set of concrete traces generated by a deterministic (or policy‑guided nondeterministic) scheduler. This version is directly useful for code generation, simulation, and testing, because it provides a concrete operational model that can be executed on a target platform.
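A minimal sketch of an implementation semantics that keeps the hierarchy and fixes an execution policy, using the same hypothetical nested-dict encoding. The policy is an assumption: transitions of the innermost active state win over those of enclosing states (resembling the priority rule UML state machines apply to conflicting transitions), declaration order breaks remaining ties, and an input with no enabled transition is ignored and produces no output.

```python
# Hypothetical HMA encoding as nested dicts: None marks a basic state.
INNER = {
    "initial": "Idle",
    "states": {"Idle": None, "Busy": None},
    "transitions": [
        ("Idle", "job", "ack", "Busy"),
        ("Busy", "done", "result", "Idle"),
        ("Busy", "power", "save", "Idle"),
    ],
}
HMA = {
    "initial": "Off",
    "states": {"Off": None, "On": INNER},
    "transitions": [
        ("Off", "power", "boot", "On"),
        ("On", "power", "halt", "Off"),
    ],
}


def entry(aut, name):
    """Names entered when entering `name`: the state plus its initial sub-states."""
    path = [name]
    sub = aut["states"][name]
    while sub is not None:
        name = sub["initial"]
        path.append(name)
        sub = sub["states"][name]
    return path


def step(aut, path, symbol):
    """One execution step under the assumed policy: innermost level first,
    declaration order within a level; unhandled inputs are ignored."""
    levels = [aut]  # automaton governing each level of the active path
    for name in path[:-1]:
        levels.append(levels[-1]["states"][name])
    for depth in range(len(path) - 1, -1, -1):
        for src, inp, out, dst in levels[depth]["transitions"]:
            if src == path[depth] and inp == symbol:
                return path[:depth] + entry(levels[depth], dst), out
    return path, None


def run(aut, inputs):
    """Deterministic trace: (input, output, active configuration) per step."""
    path = entry(aut, aut["initial"])
    trace = []
    for symbol in inputs:
        path, out = step(aut, path, symbol)
        trace.append((symbol, out, tuple(path)))
    return trace


for record in run(HMA, ["power", "job", "power", "power"]):
    print(record)
```

Where the specification semantics admits both "halt" and "save" after `power, job, power`, this policy commits to "save"; any such deterministic choice yields one trace per input sequence, which is what makes it directly executable.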
The paper then classifies semantic definition approaches into four broad categories:
- Structural Semantics – maps syntactic elements to mathematical structures (e.g., categories, algebras) preserving the compositional architecture of the model.
- Behavioral Semantics – focuses on the dynamic execution, often expressed as transition systems, trace languages, or operational rules.
- Abstract Semantics – abstracts away low‑level details to capture high‑level properties such as invariants, safety conditions, or security policies.
- Implementation Semantics – ties the model to concrete execution artifacts (code, hardware) and defines the exact runtime behavior.
This taxonomy serves as a decision‑making guide for language designers: depending on the intended use (verification, code generation, documentation, etc.) one can select the appropriate semantic style or combine several.
Finally, the authors discuss the practical impact of having a well‑defined semantics. A precise semantics enables formal verification (model checking, theorem proving) by providing an unambiguous target for proof obligations. It also facilitates automatic code generation because the implementation semantics can be directly translated into executable artifacts. Moreover, in a model‑based development workflow, a shared semantics for textual and diagrammatic notations improves tool interoperability and reduces the risk of semantic drift when models are transformed or refined.
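As a small illustration of using a precise semantics as a verification target, the sketch below performs a bounded refinement check: an implementation is accepted if, for every input word up to `max_len`, its output lies among the outputs the specification admits. The two flat machines and all function names are hypothetical; the specification here requires that "busy" is never answered twice in a row. This is bounded testing of trace inclusion, not a full model-checking procedure, and the implementation is assumed to be total over the alphabet.

```python
from itertools import product

# Hypothetical specification (nondeterministic): (state, input) -> {(output, next)}.
# It allows "ok" or "busy", but never two consecutive "busy" answers.
SPEC = {
    ("s0", "req"): {("ok", "s0"), ("busy", "s1")},
    ("s1", "req"): {("ok", "s0")},
}
# Hypothetical implementations (deterministic): (state, input) -> (output, next).
IMPL_GOOD = {("q0", "req"): ("busy", "q1"), ("q1", "req"): ("ok", "q0")}
IMPL_BAD = {("q0", "req"): ("busy", "q0")}


def spec_outputs(spec, start, word):
    """All output words the specification admits for an input word."""
    configs = {(start, ())}
    for sym in word:
        configs = {
            (nxt, outs + (out,))
            for state, outs in configs
            for out, nxt in spec.get((state, sym), set())
        }
    return {outs for _, outs in configs}


def impl_output(impl, start, word):
    """The single output word a total deterministic implementation produces."""
    state, outs = start, ()
    for sym in word:
        out, state = impl[(state, sym)]
        outs += (out,)
    return outs


def refines(impl, impl_start, spec, spec_start, alphabet, max_len):
    """Bounded check; returns (True, None) or (False, first counterexample word)."""
    for n in range(max_len + 1):
        for word in product(alphabet, repeat=n):
            if impl_output(impl, impl_start, word) not in spec_outputs(spec, spec_start, word):
                return False, word
    return True, None


print(refines(IMPL_GOOD, "q0", SPEC, "s0", ["req"], 6))
print(refines(IMPL_BAD, "q0", SPEC, "s0", ["req"], 6))
```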
In conclusion, the paper delivers a comprehensive treatment of syntax versus semantics, proposes a language‑agnostic three‑stage semantic definition framework, validates it with a detailed formalization of hierarchical Mealy automata, and offers a clear classification of semantic approaches. This work equips language designers, tool builders, and researchers with a solid methodological foundation for defining, analyzing, and applying semantics to both existing and future modeling languages, including UML and its many diagram types.