Resolving Lexical Ambiguity in Tensor Regression Models of Meaning
This paper presents a method for improving tensor-based compositional distributional models of meaning by adding an explicit disambiguation step prior to composition. In contrast with previous research, where this hypothesis was tested only against relatively simple compositional models, our work uses a robust model trained with linear regression. The results of two experiments show the superiority of the prior disambiguation method and suggest that the effectiveness of this approach is model-independent.
💡 Research Summary
The paper investigates whether an explicit disambiguation step before composition improves the performance of tensor‑based compositional distributional models of meaning when the tensors are learned via linear regression, i.e., without the simplifying assumptions used in earlier work. The authors first construct a standard distributional vector space from the ukWac corpus, reduce it to 300 dimensions with SVD, and then train a separate linear regression model for each verb. For a given verb, the rows of the input matrix X are the vectors of nouns that appear as objects of that verb, while the rows of the output matrix Y are the holistic vectors of the corresponding verb‑object phrases. The regression objective minimizes a regularized squared error, producing a matrix W that serves as the verb’s tensor (order‑2 in this implementation, but conceptually a full order‑3 tensor could be built in the same way).
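The per-verb training step described above can be sketched as ridge regression with a closed-form solution. This is a minimal illustration, not the authors' code; the function names, the regularization constant, and the use of the normal equations are assumptions.

```python
import numpy as np

def train_verb_matrix(X, Y, lam=1.0):
    """Learn W minimizing ||XW - Y||^2 + lam * ||W||^2 (ridge regression).

    X : (n_objects, d) array of object-noun vectors (rows)
    Y : (n_objects, d) array of holistic verb-object phrase vectors (rows)
    Returns W : (d, d) matrix acting as the verb's order-2 tensor.
    """
    d = X.shape[1]
    # Closed-form ridge solution: W = (X^T X + lam * I)^{-1} X^T Y
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ Y)

def compose(W, obj_vec):
    """Composite phrase vector: apply the verb matrix to an object vector."""
    return obj_vec @ W
```

With a small regularization constant and enough training pairs, `W` recovers the linear map relating object vectors to phrase vectors.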
Two experimental settings are presented. In the supervised setting, five highly polysemous verbs (break, catch, play, admit, draw) are manually split into two distinct senses, each associated with a set of objects that unambiguously evoke that sense. For each verb, the authors train three matrices: one on the combined data (ambiguous) and two on the sense‑specific data (disambiguated). Composite vectors are obtained by multiplying the appropriate matrix with the object vector, and the resulting vectors are compared against the true holistic vectors of the verb‑object phrases. Evaluation uses three metrics: strict accuracy (the correct holistic vector is ranked first), mean reciprocal rank (MRR), and average cosine similarity. Across all verbs and metrics, the disambiguated models outperform the ambiguous baseline, with statistically significant improvements (p < 0.001). For example, the verb “play” shows accuracy rising from 0.20 to 0.60, MRR from 0.28 to 0.68, and average cosine similarity from 0.41 to 0.68.
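The three metrics above can be computed as follows. This is an illustrative sketch, assuming each composite vector is ranked against the full pool of true holistic phrase vectors by cosine similarity; function names are hypothetical.

```python
import numpy as np

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def evaluate(composites, holistics):
    """Return (strict accuracy, MRR, average cosine similarity).

    composites, holistics : (n, d) arrays; composites[i] should match
    holistics[i], and is ranked against all n holistic vectors.
    """
    n = len(composites)
    hits, rr_sum, cos_sum = 0, 0.0, 0.0
    for i, c in enumerate(composites):
        sims = [cosine(c, h) for h in holistics]
        # Rank of the correct holistic vector (1 = best)
        rank = 1 + sum(s > sims[i] for s in sims)
        hits += (rank == 1)      # strict accuracy: correct vector ranked first
        rr_sum += 1.0 / rank     # reciprocal rank
        cos_sum += sims[i]       # similarity to the correct holistic vector
    return hits / n, rr_sum / n, cos_sum / n
```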
In the unsupervised setting, the authors replace manual sense annotation with a clustering approach. For each occurrence of a verb, they build a context vector by averaging the vectors of all other words in the same sentence. Hierarchical agglomerative clustering (HAC) is then applied to these context vectors, assuming that distinct clusters correspond to distinct senses. For each cluster, a separate regression model is trained, yielding sense‑specific tensors automatically. The same evaluation pipeline as in the supervised experiment is used, and again the sense‑specific (disambiguated) tensors produce higher accuracy, MRR, and cosine similarity than the single ambiguous tensor.
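The unsupervised sense-induction step can be sketched as follows, using SciPy's hierarchical agglomerative clustering. The function names, the Ward linkage criterion, and the fixed two-sense cut are assumptions for illustration; the paper's exact clustering configuration may differ.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

def context_vector(sentence, vectors, target):
    """Average the vectors of all words in the sentence except the target verb."""
    words = [w for w in sentence if w != target and w in vectors]
    return np.mean([vectors[w] for w in words], axis=0)

def induce_senses(context_vectors, n_senses=2):
    """Cluster occurrence contexts with HAC; each cluster is one induced sense."""
    Z = linkage(np.vstack(context_vectors), method="ward")
    return fcluster(Z, t=n_senses, criterion="maxclust")  # labels in 1..n_senses
```

The occurrences assigned to each cluster then provide the training pairs for one sense-specific regression model.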
The key insight is that the benefit of prior disambiguation is not tied to the simplicity of the compositional function; even when the composition is performed by a full‑blown linear‑regression‑based tensor, separating sense selection from composition yields markedly better semantic representations. This suggests that the disambiguation step reduces noise in the input vectors, allowing the learned tensors to capture more precise relational information. Moreover, the results demonstrate model‑independence: the same improvement is observed for both simple element‑wise models (as shown in earlier work) and for the more expressive tensor regression model used here.
The authors conclude that explicit sense disambiguation should be considered a standard preprocessing step for compositional distributional semantics, regardless of the underlying compositional architecture. They propose future work on extending the approach to higher‑order tensors (e.g., full order‑3 tensors for transitive verbs), integrating deep neural disambiguation modules, and scaling the method to larger corpora and real‑time applications.