Unveiling the relationship between complex networks metrics and word senses
The automatic disambiguation of word senses (i.e., the identification of which meaning of a polysemous word is used in a given context) is essential for applications such as machine translation and information retrieval, and represents a key step in developing the so-called Semantic Web. Humans disambiguate words in a straightforward fashion, but computers do not. In this paper we address the problem of Word Sense Disambiguation (WSD) by treating texts as complex networks, and show that word senses can be distinguished by characterizing the local structure around ambiguous words. Our goal was not to obtain the best possible disambiguation system, but we nevertheless found that in half of the cases our approach outperforms traditional shallow methods. We show that the hierarchical connectivity and clustering of words are usually the most relevant features for WSD. The results reported here shed light on the relationship between semantic and structural parameters of complex networks. They also indicate that, when combined with traditional techniques, the complex network approach may be useful for enhancing the discrimination of senses in large texts.
💡 Research Summary
The paper tackles the problem of Word Sense Disambiguation (WSD) by representing texts as complex networks (CN) and exploiting the local topological properties of ambiguous words. The authors argue that while humans effortlessly disambiguate word meanings from context, computers struggle, and existing approaches either rely on extensive lexical resources (deep paradigm) or on shallow statistical cues such as neighboring word frequencies. To explore an alternative, the study models each word as a node and each successive word pair as a directed, weighted edge, constructing adjacency networks from 18 books. Stop‑words are removed and remaining words are lemmatized; crucially, each occurrence of an ambiguous word is treated as a distinct node, allowing the structural signature of each sense to be examined separately.
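The construction described above can be sketched in plain Python. This is a minimal illustration, not the authors' code: the function name and node-labeling scheme (`word#0`, `word#1`, …) are assumptions, and the tokens are presumed to be already stop-word-filtered and lemmatized as the paper describes.

```python
from collections import defaultdict

def build_adjacency_network(tokens, ambiguous=frozenset()):
    """Build a directed, weighted word-adjacency network.

    Each word is a node and each successive word pair adds (or
    reinforces) a directed edge. Occurrences of words listed in
    `ambiguous` become distinct nodes (word#0, word#1, ...), so the
    local structure around each occurrence can be examined separately.
    """
    counts = defaultdict(int)   # occurrence counter for ambiguous words
    edges = defaultdict(int)    # (source, target) -> edge weight
    nodes = []
    for tok in tokens:
        if tok in ambiguous:
            node = f"{tok}#{counts[tok]}"
            counts[tok] += 1
        else:
            node = tok
        nodes.append(node)
    for a, b in zip(nodes, nodes[1:]):
        edges[(a, b)] += 1
    return set(nodes), dict(edges)

# Toy pre-processed token stream with one ambiguous word, "bear":
nodes, edges = build_adjacency_network(
    ["bear", "market", "fall", "market", "rise", "bear", "cub"],
    ambiguous={"bear"},
)
# The two occurrences of "bear" become separate nodes bear#0 and bear#1.
```

Treating each occurrence as its own node is the crucial design choice: it lets the classifier compare the topological signatures of individual occurrences rather than a single merged node that mixes all senses.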
Ten polysemous words (save, note, march, present, jam, ring, just, bear, rock, close) are manually annotated with their correct senses. For each occurrence, the authors compute sixteen local network measurements, grouped into four categories: (1) basic degree (k₁) and strength (s₁) and their hierarchical extensions (k₂‑k₄, s₂‑s₄); (2) clustering coefficients (C₁‑C₄) reflecting the density of triangles; (3) neighbor‑based statistics – average degree and strength of neighbors (h_kⁿ, h_sⁿ) and their standard deviations (Δkⁿ, Δsⁿ); (4) path‑based metrics – average shortest‑path length (l) and betweenness centrality (B). Hierarchical expansions are performed by merging a node with its neighbors iteratively, a technique previously shown to improve network characterization.
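Two of the measurement families above are easy to make concrete: the hierarchical degrees (k₁–k₄) count how many nodes are first reached at each BFS distance from an occurrence, and the clustering coefficient (C₁) measures triangle density among its neighbors. The sketch below, on an undirected neighbor map, is an illustrative reimplementation under those standard definitions, not the authors' code.

```python
def hierarchical_degree(adj, node, max_level=3):
    """k_1..k_max_level: number of nodes first reached at each distance.

    `adj` maps a node to the set of its neighbors. The level-m degree
    counts the m-th BFS ring, i.e. the hierarchical extension of the
    ordinary degree k_1 obtained by merging a node with its neighbors.
    """
    seen = {node}
    ring = {node}
    ks = []
    for _ in range(max_level):
        nxt = {v for u in ring for v in adj[u]} - seen
        ks.append(len(nxt))
        seen |= nxt
        ring = nxt
    return ks

def clustering(adj, node):
    """C_1: fraction of neighbor pairs that are themselves connected."""
    nbrs = list(adj[node])
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for i in range(k) for j in range(i + 1, k)
                if nbrs[j] in adj[nbrs[i]])
    return 2.0 * links / (k * (k - 1))

# Tiny undirected graph: triangle a-b-c plus a pendant node d on c.
adj = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
```

For node `a`, the first BFS ring is {b, c} and the second is {d}, so k₁ = 2 and k₂ = 1; its two neighbors are connected, so C₁(a) = 1. Strength (s₁) would be computed the same way with edge weights summed instead of nodes counted.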
To assess discriminative power, three standard classifiers—C4.5 decision trees, Naïve Bayes, and k‑Nearest Neighbors (kNN)—are trained using 10‑fold cross‑validation. As a baseline, a “most‑frequent‑sense” classifier is used, and a traditional shallow approach based on the frequencies of the 5, 20, or roughly 50 nearest words is also evaluated. Results (Tables 2 and 3) reveal that the CN‑based method achieves statistically significant accuracy (p‑value α_cn < 5 × 10⁻²) for nine out of ten words, outperforming the traditional frequency‑based method for five of them. Notably, the word “jam” is disambiguated with 100 % accuracy. The best classifiers typically require only a small subset of the sixteen features—often five or fewer—and in some cases just two features (e.g., for “save”, only neighbor strength h_sⁿ and average shortest‑path length l suffice).
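The "most-frequent-sense" baseline mentioned above has a closed form: always predicting the majority sense yields an accuracy equal to that sense's relative frequency. A minimal sketch (the function name and toy labels are illustrative, not from the paper):

```python
from collections import Counter

def most_frequent_sense_accuracy(labels):
    """Accuracy of the baseline that always predicts the majority sense.

    `labels` holds the gold senses of all occurrences of one ambiguous
    word; the baseline accuracy is the relative frequency of the most
    common sense, since every prediction is that sense.
    """
    counts = Counter(labels)
    return counts.most_common(1)[0][1] / len(labels)

# Hypothetical gold annotation for a word with two senses:
acc = most_frequent_sense_accuracy(["s1", "s1", "s2", "s1", "s2"])
# 3 of 5 occurrences carry the majority sense s1, so acc == 0.6
```

A CN-based classifier is only informative for a word when its cross-validated accuracy significantly exceeds this baseline, which is what the reported α_cn < 5 × 10⁻² test checks.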
Feature importance analysis shows that hierarchical connectivity measures (kₘ, sₘ) and higher‑order clustering coefficients (Cₘ) are the most informative for distinguishing senses. This suggests that the way a word’s immediate and extended neighborhoods are wired captures semantic nuances, likely because different senses appear in distinct syntactic or topical contexts, which are reflected in the network’s structure. Neighbor‑based statistics also contribute, indicating that the variability of surrounding nodes’ connectivity carries discriminative signal.
The authors conclude that complex‑network topology provides a complementary source of information for WSD. While the current approach does not aim to build the state‑of‑the‑art disambiguation system, it demonstrates that structural cues can rival or surpass shallow lexical cues, especially when large corpora are available to ensure statistical robustness. They propose that integrating CN‑derived features with traditional semantic resources (e.g., WordNet, lexical databases) or modern deep‑learning embeddings could yield hybrid systems with superior performance. Overall, the study illuminates a concrete link between semantic meaning and the underlying graph‑theoretic properties of text, opening avenues for further interdisciplinary research between network science and natural language processing.