A Computational Model to Disentangle Semantic Information Embedded in Word Association Norms

Reading time: 5 minutes

📝 Original Info

  • Title: A Computational Model to Disentangle Semantic Information Embedded in Word Association Norms
  • ArXiv ID: 0812.3070
  • Date: 2008-12-17
  • Authors: Researchers from original ArXiv paper

📝 Abstract

Two well-known databases of semantic relationships between pairs of words used in psycholinguistics, feature-based and association-based, are studied as complex networks. We propose an algorithm to disentangle feature based relationships from free association semantic networks. The algorithm uses the rich topology of the free association semantic network to produce a new set of relationships between words similar to those observed in feature production norms.

💡 Deep Analysis

Deep Dive into A Computational Model to Disentangle Semantic Information Embedded in Word Association Norms.


📄 Full Content

Understanding the structure of semantic knowledge is an open challenge of fundamental importance in cognitive science. Alongside the most powerful computational probabilistic approaches to this challenge [1,2,3,4,5], recent studies have also used the perspective offered by the theory of complex networks to gain insight into it [6,7]. The main idea behind the network approach is to map empirical data onto a graph (usually called a complex network) that summarizes the observed relations between words in a given experiment. Once the network is constructed, its statistical characterization (degree distribution of nodes, clustering measures, etc.) reveals properties that help to better understand the large-scale structure of semantic relations in the specific data set. However, while the network approach has so far been merely descriptive, computational models like LSA [1], WAS [4] or the Topic Model [3] have an intrinsic predictive capability. In particular, some of these models are used to reveal the interaction between episodic and semantic memory, considering empirical data that reflect the impact of environmental (i.e. nonlinguistic) experience upon linguistic phenomena [4,8,9].
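As an illustration of this pipeline, here is a minimal sketch (not taken from the paper) that maps a handful of made-up word-pair relations onto a weighted graph with networkx and computes two of the statistics mentioned above; the association triples and their strengths are placeholders, not norm data.

```python
# Minimal sketch of the "network approach": map word-pair relations onto a
# graph and characterize its topology. The triples below are illustrative
# placeholders, not actual norm data.
import networkx as nx

associations = [
    ("car", "road", 0.35),
    ("car", "wagon", 0.12),
    ("fire", "smoke", 0.41),
    ("road", "street", 0.28),
]

G = nx.Graph()
G.add_weighted_edges_from(associations)

# Basic statistical characterization of the semantic network.
degrees = dict(G.degree())             # degree of each node (word)
clustering = nx.average_clustering(G)  # how tightly neighborhoods interconnect

print("degrees:", degrees)
print("average clustering:", clustering)
```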

The description of semantic knowledge as a complex network of interactions between words does not suffice to get a clear picture of the specific relations between complex networks representing different semantic empirical data sets. One of the main reasons is that while the notion of a node is quite uncontroversial (in our case, a word), the concept of an edge is not, because it must be committed to a definition of relationship. As far as semantics is concerned, we can consider that a word is related to another if they belong to the same class (category-related, such as car and wagon); if they tend to co-occur in many contexts (car and road); or if they have a cause-effect relationship (fire and smoke), and so on. For some of these types of relationship there exist empirical data that quantify how strongly two words are related. (Notice that two words may have several of these relationships.)

It is clear that different semantic networks will arise depending on the type of association used to link words by the subjects of a cognitive experiment. Moreover, given the intricate complexity of the human mind, the freer the association scenario, the richer the types of relationship that will appear. These different association scenarios can reflect semantic or episodic memory contents, depending on the experiment. One of the main challenges is to understand the interaction between both memory representations. In [4] the authors propose the prediction of semantic similarity effects in episodic memory using empirical data. The procedure applied is a modification of the general LSA scheme, using singular value decomposition and multidimensional scaling over a specific data set [10]. The results show the emergence of feature association groups in a multidimensional space known as Word Association Space (WAS).
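To make the LSA/WAS-style construction concrete, the following sketch reduces a word-by-word association matrix with a truncated SVD and embeds each word in a low-dimensional space. The random matrix, vocabulary size, and number of retained dimensions are illustrative assumptions, not values taken from [4] or [10].

```python
# Sketch of an LSA/WAS-style reduction: factor a word-by-word association
# matrix with a truncated SVD and embed each word in a low-dimensional space.
import numpy as np

rng = np.random.default_rng(0)
n_words, k = 50, 5                  # vocabulary size and retained dimensions (assumed)
A = rng.random((n_words, n_words))  # stand-in for an empirical association matrix

U, s, Vt = np.linalg.svd(A, full_matrices=False)
embedding = U[:, :k] * s[:k]        # word coordinates in the reduced space

# Words with similar association profiles end up close in this space.
print(embedding.shape)              # (50, 5)
```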

We will consider the same problem from a complex network perspective, adding a different interpretation of the disentanglement process with plausible cognitive implications. In our work, this prediction is reformulated in the following terms: whether it is possible to disentangle similarity relationships from general word association networks by navigating the semantic network. We address this question assuming that: (i) each available data set is a partial exposure to semantic knowledge; (ii) some data sets are more general than others, in that they grasp the heterogeneity of semantic knowledge more precisely; and (iii) as a consequence of (ii), some information from a less general data set might be partially implicit in a more general one. We will build upon these hypotheses and propose an algorithm that allows the disentanglement of a type of relationship embedded in the structure of a more general association network. In particular, we will focus on two well-known data sets in English: the free-association database constructed by [10], and the semantic feature production norms by [11].
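The excerpt is truncated before the algorithm itself is described, so the sketch below is only one plausible illustration of "navigating" an association network to surface similarity-like relations: compare the distributions that short random walks started at each word spread over the vocabulary. The step count, the toy association matrix, and the use of cosine overlap are assumptions for illustration, not the authors' procedure.

```python
# Illustrative sketch (not the paper's algorithm): derive similarity-like
# relations from an association network by comparing the profiles that short
# random walks starting from each word induce over the whole vocabulary.
import numpy as np

def walk_profiles(W, steps=3):
    """Accumulate transition probabilities over a few steps of navigation."""
    P = W / W.sum(axis=1, keepdims=True)  # one-step transition probabilities
    profile, Pk = np.zeros_like(P), np.eye(len(W))
    for _ in range(steps):
        Pk = Pk @ P                       # probabilities after one more step
        profile += Pk
    return profile

def cosine(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

# Tiny made-up association matrix over 4 words (assumed data).
W = np.array([[0.0, 0.5, 0.3, 0.2],
              [0.4, 0.0, 0.4, 0.2],
              [0.3, 0.4, 0.0, 0.3],
              [0.2, 0.3, 0.5, 0.0]])

profiles = walk_profiles(W)
print(cosine(profiles[0], profiles[1]))   # similarity of words 0 and 1
```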

Feature Production Norms (FP from now on) were produced by McRae et al. by asking subjects to produce conceptual features when confronted with a certain word. This feature collection is used to build a vector of characteristics for each word, where each dimension represents a feature. In particular, participants are presented with a set of concept names and are asked to produce the features they think are important for each concept. Each feature stands as a vector component, with a value that represents its production frequency across participants. These norms include 541 living and nonliving thing concepts, for which semantic closeness or similarity is computed as the cosine (overlap) between pairs of vectors of characteristics. The cosine is obtained as the dot product between two concept vectors, divided by the product of their lengths.
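Written out, with $\mathbf{v}_a$ and $\mathbf{v}_b$ denoting the production-frequency vectors of two concepts $a$ and $b$, this similarity is the standard cosine:

$$ \mathrm{sim}(a,b) \;=\; \cos(\mathbf{v}_a, \mathbf{v}_b) \;=\; \frac{\mathbf{v}_a \cdot \mathbf{v}_b}{\lVert \mathbf{v}_a \rVert \, \lVert \mathbf{v}_b \rVert} $$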


…(Full text truncated)…


Reference

This content is AI-processed based on ArXiv data.
