Constraining constructions with WordNet: pros and cons for the semantic annotation of fillers in the Italian Constructicon


The paper discusses the role of WordNet-based semantic classification in the formalization of constructions, and more specifically in the semantic annotation of schematic fillers, in the Italian Constructicon. We outline how the Italian Constructicon project uses Open Multilingual WordNet topics to represent semantic features and constraints of constructions.


Research Summary

The paper presents the Italian Constructicon (ItCon) project, focusing on how to constrain the semantic fillers of constructions using WordNet‑based classification. In Construction Grammar, constructions are pairings of form and function that can range from simple lexical items to complex multi‑word patterns. Recent work on “constructicography” seeks to build structured repositories (Constructicons) that capture these patterns, but Italian has lagged behind, lacking both a Constructicon and an Italian FrameNet.

To address this gap, the authors introduce ItCon, an open, collaborative resource designed to interoperate with existing Italian linguistic resources such as treebanks and lexical databases. ItCon consists of three linked components: (1) a database of constructions (cxns) with metadata; (2) a graph in which each node represents a construction and edges encode horizontal and vertical relations; and (3) a set of annotated examples in CoNLL‑U format. For the formal representation of constructions, the authors develop a new format called CoNLL‑C, which extends the UD‑compatible CoNLL‑U schema with four additional fields: REQUIRED (which tokens must appear), ADJACENCY (which slots may be separated by intervening material), WITHOUT (which values are excluded), and IDENTITY (which features must be shared across slots).
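To make the role of the four extra fields more concrete, the following is a minimal sketch of how a construction slot might be modelled in memory. The `Slot` class, its attribute names, the `slot_accepts` helper, and the plural-exclusion example are all hypothetical; the summary does not specify the actual CoNLL‑C column layout or value syntax.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Slot:
    """One slot of a construction, loosely modelled on a CoNLL-C row (hypothetical layout)."""
    slot_id: int
    lemma: Optional[str] = None   # fixed lexical material; None for a schematic slot
    upos: Optional[str] = None    # UD part-of-speech constraint
    required: bool = True         # REQUIRED: the token must appear
    adjacent: bool = True         # ADJACENCY: no material may intervene before this slot
    without: Dict[str, str] = field(default_factory=dict)   # WITHOUT: excluded feature values
    identity: Dict[str, int] = field(default_factory=dict)  # IDENTITY: feature -> slot sharing it


def slot_accepts(slot: Slot, token: dict) -> bool:
    """Check one token against the lemma, UPOS and WITHOUT constraints of a slot."""
    if slot.lemma is not None and token["lemma"] != slot.lemma:
        return False
    if slot.upos is not None and token["upos"] != slot.upos:
        return False
    feats = token.get("feats", {})
    # A token is rejected as soon as it carries any excluded feature value.
    return all(feats.get(name) != value for name, value in slot.without.items())


# "fare" + schematic noun; the plural exclusion is an illustrative assumption only.
verb = Slot(slot_id=1, lemma="fare", upos="VERB")
noun = Slot(slot_id=2, upos="NOUN", without={"Number": "Plur"})

print(slot_accepts(verb, {"lemma": "fare", "upos": "VERB", "feats": {}}))
print(slot_accepts(noun, {"lemma": "paura", "upos": "NOUN", "feats": {"Number": "Plur"}}))
```

The sketch only checks token-level constraints; REQUIRED, ADJACENCY and IDENTITY concern relations between slots and would need a matcher that looks at the whole token sequence.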

The novel contribution lies in the semantic layer: each slot can be annotated with an OntoClass feature that draws on Open Multilingual WordNet (OMW) topics, i.e., the top‑level lexicographer files used in Princeton WordNet and mapped to Italian MultiWordNet. Currently the authors employ 26 noun topics and 15 verb topics. By tagging the noun slot of the construction “fare N feeling” with the topic noun.feeling, the system can filter out false positives such as “fare demagogia” or “fare cassa”, while retaining genuine psychological‑state nouns like “schifo”, “paura”, and “piacere”.
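The filtering effect described above can be illustrated with a toy lookup. In this sketch the `TOPICS` dictionary and the `filter_fillers` function are hypothetical, and the topics assigned to "demagogia" and "cassa" are assumptions made for the example rather than values taken from Open Multilingual WordNet.

```python
# Toy lemma -> OMW topic lookup. The entries are illustrative assumptions,
# not values read from Open Multilingual WordNet.
TOPICS = {
    "schifo": {"noun.feeling"},
    "paura": {"noun.feeling"},
    "piacere": {"noun.feeling"},
    "demagogia": {"noun.communication"},  # assumed topic
    "cassa": {"noun.artifact"},           # assumed topic
}


def filter_fillers(candidates, required_topic, topics=TOPICS):
    """Keep only the candidate lemmas that carry the required topic."""
    return [lemma for lemma in candidates if required_topic in topics.get(lemma, set())]


# Nouns found after "fare" in a corpus query (order preserved).
candidates = ["schifo", "demagogia", "paura", "cassa", "piacere"]
kept = filter_fillers(candidates, "noun.feeling")
print(kept)
```

A lemma can belong to several topics, which is why each entry is a set; a lemma missing from the lookup is treated here as not matching.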

A coverage experiment was conducted on all Italian UD treebanks (excluding the Old Italian treebank). Lemmas with a frequency above 5 (5,273 items) were examined; 90% of noun lemmas and 87.3% of verb lemmas could be linked to at least one OMW topic. When token frequencies are considered, only 3.5% of forms (nouns and verbs together) lacked a topic. These figures suggest that topic annotation is available for the large majority of noun and verb fillers attested in Italian corpora.
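The two figures reported here correspond to type-level and token-level coverage. The sketch below shows one way such numbers could be computed; the `coverage` function, the sample counts, and the topic lookup are invented for illustration and do not reproduce the authors' data or procedure.

```python
def coverage(lemma_counts, topics, min_freq=5):
    """Return (type coverage, token coverage) over lemmas with frequency > min_freq."""
    frequent = {lemma: n for lemma, n in lemma_counts.items() if n > min_freq}
    covered = {lemma for lemma in frequent if topics.get(lemma)}
    total_tokens = sum(frequent.values())
    type_cov = len(covered) / len(frequent) if frequent else 0.0
    token_cov = sum(frequent[lemma] for lemma in covered) / total_tokens if total_tokens else 0.0
    return type_cov, token_cov


# Invented counts and topics, for illustration only.
lemma_counts = {"paura": 60, "casa": 30, "svapare": 10, "raro": 2}
topics = {"paura": {"noun.feeling"}, "casa": {"noun.artifact"}}

type_cov, token_cov = coverage(lemma_counts, topics)
print(f"type coverage: {type_cov:.3f}, token coverage: {token_cov:.3f}")
```

Token coverage is usually higher than type coverage because uncovered lemmas tend to be infrequent, which matches the pattern in the reported figures.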

The authors also discuss limitations. First, the pre‑defined OMW topics are coarse‑grained; they may not cover all needed semantic nuances, and extending the tagset is difficult without sacrificing interoperability. Second, the current scheme only handles nouns and verbs; adjectives and adverbs lack a comparable topic hierarchy in Italian WordNet, limiting the ability to constrain those slots. Third, OMW topics provide class information but no explicit semantic relations (e.g., antonymy, similarity) and no cross‑POS links. To overcome this, the paper proposes future work that would exploit WordNet’s synset relations via the IDENTITY field, allowing constraints such as “the two nouns in an oxymoron must be antonyms” or “the verb and its object in a cognate construction must be semantically related”. However, this approach depends on the completeness of Italian WordNet (ItalWordNet) relations; currently many cross‑POS relations are missing, which could cause over‑filtering.
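The over-filtering risk can be made explicit with a three-valued check. The following sketch assumes a toy set of antonym pairs in place of real WordNet relations; `ANTONYMS`, `KNOWN_LEMMAS`, and `check_antonymy` are hypothetical names, and the paper's actual IDENTITY syntax for such constraints is not given in this summary.

```python
# Toy antonymy relation standing in for WordNet synset relations (assumed data).
ANTONYMS = {
    frozenset({"amore", "odio"}),
    frozenset({"vita", "morte"}),
}
# Lemmas for which the toy resource records any relation at all.
KNOWN_LEMMAS = {"amore", "odio", "vita", "morte", "gioia"}


def check_antonymy(noun_a, noun_b, relations=ANTONYMS, known=KNOWN_LEMMAS):
    """Return True/False when the resource can decide, None when it has no information."""
    if noun_a not in known or noun_b not in known:
        return None  # missing coverage: do not silently reject the candidate
    return frozenset({noun_a, noun_b}) in relations


print(check_antonymy("amore", "odio"))
print(check_antonymy("amore", "gioia"))
print(check_antonymy("amore", "silenzio"))
```

Returning `None` for lemmas outside the resource keeps "not antonyms" separate from "unknown", so a matcher can choose to retain undecided candidates for manual review instead of discarding them.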

In summary, the paper demonstrates that using OMW topics for semantic classification offers a quick, standards‑based way to annotate filler constraints in an Italian Constructicon, improving interoperability with other multilingual resources. At the same time, it highlights the need for richer, more fine‑grained ontologies and better coverage of semantic relations to fully capture the complexity of constructional meaning in Italian. Future directions include expanding the semantic tagset to adjectives and adverbs, integrating synset‑level relations for inter‑slot constraints, and enhancing the underlying Italian WordNet resources.

