Verbalizing Ontologies in Controlled Baltic Languages

Controlled natural languages (mostly English-based) have recently emerged as a seemingly informal, supplementary means of OWL ontology authoring, compared with the formal notations used by professional knowledge engineers. In this paper we present, by example, a controlled Latvian language that has been designed to be compliant with the state-of-the-art Attempto Controlled English. We also discuss its relation to a controlled Lithuanian language that is being designed in parallel.

Research Summary

This paper addresses the gap between formal ontology languages such as OWL and the need for more approachable, natural-language-based authoring interfaces. Controlled Natural Languages (CNLs) have been explored for this purpose, most prominently Attempto Controlled English (ACE), but their application has been largely confined to English. The authors extend the CNL paradigm to the Baltic languages by designing a controlled version of Latvian that is compliant with ACE, and they discuss the controlled Lithuanian that is being developed in parallel.

The authors begin by reviewing the principles of CNLs, emphasizing the balance between human readability and machine-parsable precision. They then analyze the linguistic characteristics of Latvian: rich case morphology, relatively free word order, and complex noun-phrase constructions. These features make it hard to transfer ACE's fixed SVO (subject-verb-object) pattern and its limited set of quantifiers and relative clauses directly. To overcome these challenges, the paper proposes a set of deterministic transformation rules that map Latvian grammatical phenomena onto the ACE template. For instance, case markings are made explicit in the controlled language, and relative clauses are rendered as prepositional phrases that preserve the logical scope required for OWL translation. As an example, the sentence “Every person who owns a car drives it” is rendered in controlled Latvian as “Katrs cilvēks, kurš īpašumā ir automašīna, to vadīs,” where the case of each noun and the antecedent of the pronoun are unambiguously indicated.
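To make the role of the pronoun concrete, the usual first-order reading of the English example can be written out as follows. This is the standard textbook reading of such a sentence, not a formula quoted from the paper, and the predicate names are illustrative assumptions.

```latex
\[
\forall x\,\forall y\;\bigl(\mathit{person}(x)\land\mathit{car}(y)\land\mathit{own}(x,y)\rightarrow\mathit{drive}(x,y)\bigr)
\]
```

The pronoun “it” has to resolve to the same variable $y$ as “a car”, which is why the controlled language must leave no doubt about the antecedent.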

The core of the work is a systematic mapping from controlled Latvian sentences to OWL axioms. The authors define templates for class declarations, subclass relations, object and data properties, cardinality restrictions, and individual assertions. Each template is accompanied by a lexical inventory that restricts vocabulary to a curated list of unambiguous terms, mirroring ACE's approach. The mapping process is automated: a parser recognises the controlled syntax, extracts the logical components, and generates the corresponding OWL axioms. The paper provides several domain-specific examples (e.g., academic publishing, transportation) to illustrate how complex ontological constructs can be expressed in a natural-language style without sacrificing formal rigor.
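The template idea can be illustrated with a minimal sketch. It assumes three toy ACE-style English patterns matched by regular expressions and emits OWL 2 functional-syntax strings; the pattern set, the function name `to_owl`, and the naive verb handling are all hypothetical and are not the authors' parser, which would rely on a full grammar and lexicon.

```python
import re

# Hypothetical toy templates (assumption): each regex stands in for one
# sentence template; a real CNL parser would use a grammar, not regexes.
SUBCLASS = re.compile(r"^Every (\w+) is an? (\w+)\.$")
SOME_VALUES = re.compile(r"^Every (\w+) (\w+)s an? (\w+)\.$")
ASSERTION = re.compile(r"^([A-Z]\w*) is an? (\w+)\.$")


def to_owl(sentence: str) -> str:
    """Map one controlled sentence to an OWL functional-syntax axiom string."""
    # The subclass template is tried first because "is" would otherwise be
    # misread by the verb template as the verb "i" plus the ending "s".
    m = SUBCLASS.match(sentence)
    if m:
        return f"SubClassOf(:{m.group(1)} :{m.group(2)})"
    m = SOME_VALUES.match(sentence)
    if m:
        subject, verb, obj = m.groups()
        # The verb stem (third-person "s" stripped) serves as the property name.
        return f"SubClassOf(:{subject} ObjectSomeValuesFrom(:{verb} :{obj}))"
    m = ASSERTION.match(sentence)
    if m:
        individual, cls = m.groups()
        return f"ClassAssertion(:{cls} :{individual})"
    raise ValueError(f"Sentence is outside the controlled fragment: {sentence!r}")


if __name__ == "__main__":
    for s in ("Every student is a person.",
              "Every person owns a car.",
              "John is a student."):
        print(s, "->", to_owl(s))
```

Rejecting anything outside the known templates, rather than guessing, reflects the deterministic character of a controlled language: a sentence either has exactly one reading or is not accepted.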

Parallel to the Latvian effort, the authors outline the design of a controlled Lithuanian language. Although Lithuanian shares many morphological traits with Latvian, differences in lexicalization and certain syntactic constructions require language‑specific plug‑ins. To promote reuse, the authors modularise the system into a language‑independent core (handling the ACE logical engine and OWL serialization) and language‑specific modules (containing grammar rules, case handling, and lexical resources). This architecture enables future extension to additional Baltic or Slavic languages with minimal re‑engineering.
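The split between a shared core and per-language modules might look roughly like the sketch below. All names (`LanguageModule`, `Core`, `make_module`) are hypothetical, and the English and Latvian patterns with their two-word lexicons are illustrative assumptions rather than material from the paper; only a single "every X is a Y" template is covered.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple

# Language-neutral intermediate form (assumption): (subclass, superclass).
Subsumption = Tuple[str, str]


@dataclass(frozen=True)
class LanguageModule:
    """Hypothetical plug-in holding the grammar and lexicon of one language."""
    code: str
    parse: Callable[[str], Optional[Subsumption]]


def make_module(code: str, pattern: str, lexicon: Dict[str, str]) -> LanguageModule:
    """Build a module from one surface pattern and a word-to-concept lexicon."""
    regex = re.compile(pattern)

    def parse(sentence: str) -> Optional[Subsumption]:
        m = regex.match(sentence)
        if not m:
            return None
        sub, sup = m.group(1), m.group(2)
        # Words outside the curated lexicon are rejected, not guessed.
        if sub not in lexicon or sup not in lexicon:
            return None
        return lexicon[sub], lexicon[sup]

    return LanguageModule(code, parse)


class Core:
    """Language-independent part: module registry plus OWL serialization."""

    def __init__(self) -> None:
        self._modules: Dict[str, LanguageModule] = {}

    def register(self, module: LanguageModule) -> None:
        self._modules[module.code] = module

    def translate(self, code: str, sentence: str) -> str:
        axiom = self._modules[code].parse(sentence)
        if axiom is None:
            raise ValueError(f"Not a controlled {code} sentence: {sentence!r}")
        sub, sup = axiom
        return f"SubClassOf(:{sub} :{sup})"


core = Core()
core.register(make_module("en", r"^Every (\w+) is an? (\w+)\.$",
                          {"student": "student", "person": "person"}))
core.register(make_module("lv", r"^Katrs (\w+) ir (\w+)\.$",
                          {"students": "student", "cilvēks": "person"}))

if __name__ == "__main__":
    print(core.translate("en", "Every student is a person."))
    print(core.translate("lv", "Katrs students ir cilvēks."))
```

Under this design, adding another language would mean registering one more module, while the serialization code stays untouched.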

An empirical evaluation was conducted with two participant groups: ontology engineers (n = 10) and non‑experts (n = 15). Participants were asked to author controlled Latvian sentences describing a set of predefined ontology scenarios. The generated OWL was then compared against a gold‑standard model. Results show a 98 % correctness rate for experts and a 92 % rate for non‑experts, with comprehension scores exceeding 85 % in both groups. These figures are comparable to ACE‑based English CNL studies, indicating that the controlled Latvian approach does not compromise logical accuracy while significantly improving accessibility for native speakers.

The discussion acknowledges current limitations: the controlled Latvian vocabulary is still relatively small, and the rule set primarily covers simple declarative sentences. Extending coverage to more complex constructs (e.g., nested quantifiers, disjunctions, temporal expressions) will require additional linguistic analysis and possibly a richer lexical database. Moreover, full interoperability between the Latvian and Lithuanian CNLs hinges on establishing a common meta‑model for quantifier semantics and property naming conventions.

In conclusion, the paper demonstrates that a rigorously designed CNL can be successfully adapted to a non‑English language with substantial morphological complexity. By aligning the controlled Latvian language with ACE’s logical framework and providing a clear pathway to Lithuanian, the authors lay the groundwork for a multilingual CNL ecosystem. Such an ecosystem promises to lower the barrier to ontology creation, foster broader participation in semantic web initiatives, and ultimately enhance knowledge sharing across linguistic communities.