A Computational Model to Disentangle Semantic Information Embedded in Word Association Norms
Two well-known psycholinguistic databases of semantic relationships between word pairs—feature-based and association-based—are studied as complex networks. We propose an algorithm to disentangle feature-based relationships from free-association semantic networks. The algorithm uses the rich topology of the free-association semantic network to produce a new set of relationships between words similar to those observed in feature production norms.
💡 Research Summary
The paper investigates two widely used semantic relationship databases in psycholinguistics—feature‑production norms (FPN) and free‑association norms (FAN)—by treating them as complex networks. While FPN capture the intrinsic attributes of concepts (e.g., “bird” → “feathers, flies”), FAN record the dynamic associative links that emerge when participants are prompted with a cue word. Historically, these resources have been studied separately, and attempts to integrate them have been limited to simple correlation analyses.
The authors first model FAN as a weighted graph: each word is a node, and the frequency with which one word elicits another becomes the edge weight. Standard network‑science metrics—degree, betweenness, eigenvector centrality, clustering coefficient, and modularity—are computed to characterize the topology. The analysis reveals a small‑world, high‑dimensional structure with a few high‑centrality hubs (e.g., “love”, “time”) that bridge otherwise distinct semantic clusters.
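The graph construction described above can be sketched in pure Python. The word pairs and frequencies below are illustrative toy data, not drawn from the actual norms, and the metrics shown (unweighted degree and local clustering coefficient) are just two of the measures the paper computes:

```python
from collections import defaultdict

# Toy free-association counts: (cue, response, frequency).
# These pairs are invented for illustration, not taken from the real FAN data.
responses = [
    ("bird", "feathers", 12), ("bird", "fly", 9), ("bird", "nest", 4),
    ("fly", "wings", 6), ("feathers", "wings", 3), ("feathers", "fly", 2),
]

# Build an undirected weighted adjacency map: edge weight = association frequency.
graph = defaultdict(dict)
for cue, resp, freq in responses:
    graph[cue][resp] = graph[cue].get(resp, 0) + freq
    graph[resp][cue] = graph[resp].get(cue, 0) + freq

def degree(node):
    """Number of distinct neighbours (unweighted degree)."""
    return len(graph[node])

def clustering(node):
    """Local clustering coefficient: fraction of neighbour pairs that are linked."""
    nbrs = list(graph[node])
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for i in range(k) for j in range(i + 1, k)
                if nbrs[j] in graph[nbrs[i]])
    return 2 * links / (k * (k - 1))

print(degree("bird"))                 # 3 distinct neighbours
print(round(clustering("bird"), 2))   # 0.33: one of three neighbour pairs linked
```

On the real norms, the same construction yields the small-world topology and high-centrality hubs the authors report; in practice a library such as NetworkX would supply betweenness, eigenvector centrality, and modularity directly.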
Building on this rich topology, the authors propose an algorithm to extract feature‑based relationships from the FAN graph. The algorithm consists of two stages. In the first stage, raw association weights are normalized, and a new “association strength” is calculated by combining the number of shared neighbors and the shortest‑path distance between two nodes, weighted by each node’s centrality. This step assumes that highly central nodes are more likely to embody shared semantic features. In the second stage, the recalculated strengths are thresholded to produce a binary graph, which is then matched against the FPN graph. Matching employs both Jaccard similarity (for edge overlap) and cosine similarity (for vector‑space alignment) to quantify structural correspondence.
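The two stages above can be sketched as follows. The summary does not give the paper's exact formula, so the `strength` function below is a hypothetical combination of shared-neighbour overlap, inverse shortest-path distance, and degree centrality (standing in for whatever centrality measure the authors use), with `alpha` as the centrality weight:

```python
import math
from itertools import combinations

# Toy undirected FAN adjacency (word -> set of neighbours); illustrative only.
fan = {
    "bird": {"feathers", "fly", "nest"},
    "feathers": {"bird", "fly"},
    "fly": {"bird", "feathers", "wings"},
    "nest": {"bird"},
    "wings": {"fly"},
}

def shortest_path_len(a, b):
    """Breadth-first-search distance; math.inf if the words are disconnected."""
    if a == b:
        return 0
    frontier, seen, d = {a}, {a}, 0
    while frontier:
        d += 1
        frontier = {n for w in frontier for n in fan[w]} - seen
        if b in frontier:
            return d
        seen |= frontier
    return math.inf

def centrality(w):
    """Degree centrality as a stand-in for the paper's centrality measure."""
    return len(fan[w]) / (len(fan) - 1)

def strength(a, b, alpha=0.6):
    """Stage 1 (hypothetical form): shared neighbours plus inverse distance,
    boosted by the pair's mean centrality; alpha is the centrality weight."""
    shared = len(fan[a] & fan[b])
    dist = shortest_path_len(a, b)
    base = shared / max(len(fan[a] | fan[b]), 1) + 1 / (1 + dist)
    cent = (centrality(a) + centrality(b)) / 2
    return (1 - alpha) * base + alpha * cent * base

# Stage 2: threshold the recalculated strengths into a binary feature graph.
threshold = 0.35
feature_edges = {frozenset((a, b)) for a, b in combinations(fan, 2)
                 if strength(a, b) >= threshold}
```

The resulting binary edge set is what gets matched against the FPN graph via Jaccard and cosine similarity in the second stage.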
The method is evaluated on a corpus of roughly 5,000 English words for which both FAN and FPN data are available. Three performance metrics are reported: (1) Feature‑pair overlap—the proportion of edges in the algorithm‑generated graph that also appear in the FPN; (2) Within‑cluster connectivity—the average edge density inside semantic clusters (e.g., animals, tools, emotions); and (3) Inter‑cluster separation—the average distance between nodes belonging to different clusters. After applying the algorithm, feature‑pair overlap rises to over 70%, a three‑fold improvement over random baselines. Within‑cluster connectivity increases from 0.42 to 0.58, and inter‑cluster separation improves from 0.21 to 0.35, indicating that the derived network more faithfully mirrors the modular organization observed in feature‑production data.
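The first metric, feature-pair overlap, reduces to set arithmetic over edge sets, as does the Jaccard similarity used for graph matching. A minimal sketch on invented toy edge sets (the real evaluation runs over the ~5,000-word corpus):

```python
# Hypothetical edge sets: the algorithm's output and the FPN reference graph.
derived = {frozenset(e) for e in [("bird", "feathers"), ("bird", "fly"),
                                  ("cat", "fur"), ("dog", "bone")]}
fpn = {frozenset(e) for e in [("bird", "feathers"), ("bird", "fly"),
                              ("cat", "fur"), ("bird", "nest")]}

def feature_pair_overlap(pred, ref):
    """Proportion of predicted edges that also appear in the reference norms."""
    return len(pred & ref) / len(pred)

def jaccard(pred, ref):
    """Edge-set Jaccard similarity, as used for structural graph matching."""
    return len(pred & ref) / len(pred | ref)

print(feature_pair_overlap(derived, fpn))  # 0.75 (3 of 4 derived edges in FPN)
print(round(jaccard(derived, fpn), 2))     # 0.6  (3 shared of 5 total edges)
```

Within-cluster connectivity and inter-cluster separation would additionally require a cluster assignment for each word, e.g. from the modularity analysis.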
A sensitivity analysis explores how the centrality weighting factor and the threshold influence results. Excessively high centrality weighting amplifies the influence of frequent association words, causing loss of finer‑grained features, while too low a threshold admits noise and reduces matching accuracy. The authors identify an optimal parameter set (centrality weight = 0.6, threshold = 0.35) that maximizes all three metrics.
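A sensitivity analysis of this kind amounts to a grid search over the two parameters. The sketch below uses a stand-in `evaluate` function (not the paper's actual metrics) that simply peaks at the reported optimum, to show the search structure only:

```python
# Illustrative grid search over the two parameters the paper tunes.
# `evaluate` is a placeholder scoring function, constructed to peak at the
# reported optimum (centrality weight 0.6, threshold 0.35); the real study
# would score each setting by the three evaluation metrics.
def evaluate(centrality_weight, threshold):
    return -((centrality_weight - 0.6) ** 2 + (threshold - 0.35) ** 2)

best = max(
    ((w / 10, t / 100) for w in range(0, 11) for t in range(10, 61, 5)),
    key=lambda p: evaluate(*p),
)
print(best)  # (0.6, 0.35)
```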
In the discussion, the authors argue that free‑association networks contain latent feature information that can be uncovered through topological analysis. This insight has several implications. First, it suggests a cost‑effective way to approximate feature‑production norms without conducting separate feature‑generation experiments; a single free‑association task could yield both associative and feature‑based data. Second, the extracted feature‑like edges could be used to enrich training data for neural language models (e.g., BERT, GPT), potentially improving semantic generalization and reasoning. Third, the approach offers a quantitative framework for testing theories of human semantic cognition, as the network‑derived modules align with psychologically meaningful categories.
The paper acknowledges limitations: the study is confined to English, ignoring cross‑linguistic variations in associative structure; it treats the networks as static, thereby missing temporal dynamics of meaning change; and the matching procedure relies on simple similarity measures that may not capture deeper semantic nuances. Future work is proposed to extend the method to multilingual corpora, incorporate dynamic network models, and integrate neuroimaging data to link the computational findings with brain‑based representations of meaning.
Overall, the study presents a novel algorithm that leverages the rich topology of free‑association networks to disentangle and reconstruct feature‑based semantic relationships, offering a bridge between two traditionally separate psycholinguistic resources and opening new avenues for computational modeling of meaning.