Syntagma Lexical Database

This paper discusses the structure of Syntagma’s Lexical Database (focused on Italian). The basic database consists in four tables. Table Forms contains word inflections, used by the POS-tagger for the identification of input-words. Forms is related to Lemma. Table Lemma stores all kinds of grammatical features of words, word-level semantic data and restrictions. In the table Meanings meaning-related data are stored: definition, examples, domain, and semantic information. Table Valency contains the argument structure of each meaning, with syntactic and semantic features for each argument. The extended version of SLD contains the links to Syntagma’s Semantic Net and to the WordNet synsets of other languages.

💡 Research Summary

The paper presents the design and implementation of the Syntagma Lexical Database (SLD), a comprehensive lexical resource focused on Italian. SLD is organized around four relational tables—Forms, Lemma, Meanings, and Valency—each addressing a distinct layer of linguistic information while maintaining tight inter‑table links.

Forms stores every inflected surface form of a word together with a pointer to its canonical lemma. This one‑to‑many mapping enables rapid lookup during part‑of‑speech tagging and morphological analysis, allowing downstream components to retrieve the full set of grammatical features associated with a token.

Lemma contains the lemma‑level grammatical description (part of speech, gender, number, case, etc.) and, importantly, lexical restrictions that limit how the lemma can combine with other words (e.g., “only countable nouns”, “cannot co‑occur with a verb”). By encoding these constraints, the database helps syntactic parsers prune impossible constructions early, reducing parsing errors and improving overall efficiency.

Meanings captures the polysemous nature of each lemma. For every sense the database records a definition, illustrative example sentences, a domain tag (such as medicine, law, technology), and semantic metadata. Each sense is also linked to a WordNet synset identifier, providing a bridge to the multilingual WordNet ecosystem. Domain tags serve as a coarse‑grained sense‑disambiguation cue, allowing applications to filter senses according to the topical context.

Valency models the argument structure of each sense. It enumerates the logical roles (subject, direct object, indirect object, adjunct, etc.) and, for each role, specifies both syntactic category information and semantic role labels (agent, patient, instrument, experiencer, etc.). This dual annotation supports fine‑grained semantic role labeling, deep parsing, and meaning‑driven generation tasks.

The extended version of SLD connects the lexical tables to two larger knowledge structures: Syntagma’s own semantic network and the multilingual WordNet synsets. Through these links, lexical items become nodes in a graph of semantic relations (synonymy, antonymy, hypernymy, meronymy), and cross‑lingual mappings are automatically available. This integration is particularly valuable for machine translation, multilingual information retrieval, and language‑learning applications, where a unified representation of form, syntax, and meaning across languages is essential.

Overall, SLD demonstrates a unified approach that collapses the traditional separation between morphological dictionaries, part‑of‑speech inventories, and sense‑level lexical databases. By storing form, grammatical constraints, sense definitions, domain information, and argument structure in a single, relational schema, the resource minimizes data redundancy and simplifies the interfaces between processing modules. The inclusion of lexical restrictions and domain tags improves disambiguation accuracy for polysemous words, while the valency layer provides the detailed semantic scaffolding required for advanced natural‑language‑understanding systems. Consequently, SLD offers a robust, extensible foundation for both linguistic research and practical NLP pipelines that require tight coupling of morphological, syntactic, and semantic information.

💡 Research Summary

📜 Original Paper Content