TOPol: Capturing and Explaining Multidimensional Semantic Polarity Fields and Vectors
Traditional approaches to semantic polarity in computational linguistics treat sentiment as a unidimensional scale, overlooking the multidimensional structure of language. This work introduces TOPol (Topic-Orientation POLarity), a semi-unsupervised framework for reconstructing and interpreting multidimensional narrative polarity fields under human-on-the-loop (HoTL) defined contextual boundaries (CBs). The framework embeds documents using a transformer-based large language model (tLLM), applies neighbor-tuned UMAP projection, and segments topics via Leiden partitioning. Given a CB between discourse regimes A and B, TOPol computes directional vectors between corresponding topic-boundary centroids, yielding a polarity field that quantifies fine-grained semantic displacement during regime shifts. This vectorial representation enables assessing CB quality and detecting polarity changes, guiding HoTL CB refinement. To interpret identified polarity vectors, the tLLM compares their extreme points and produces contrastive labels with estimated coverage. Robustness analyses show that only CB definitions (the main HoTL-tunable parameter) significantly affect results, confirming methodological stability. We evaluate TOPol on two corpora: (i) U.S. Central Bank speeches around a macroeconomic breakpoint, capturing non-affective semantic shifts, and (ii) Amazon product reviews across rating strata, where affective polarity aligns with NRC valence. Results demonstrate that TOPol consistently captures both affective and non-affective polarity transitions, providing a scalable, generalizable, and interpretable framework for context-sensitive multidimensional discourse analysis.
💡 Research Summary
The paper introduces TOPol (Topic‑Orientation Polarity), a semi‑unsupervised framework that models semantic polarity as a multidimensional vector field rather than a single scalar. Documents are first embedded with a general‑purpose transformer‑based large language model (tLLM). The high‑dimensional embeddings are reduced with neighbor‑tuned UMAP, preserving local topology, and then partitioned into latent topics using Leiden community detection. A human‑on‑the‑loop (HoTL) defined contextual boundary (CB) splits the corpus into two disjoint subsets (regimes A and B). For each topic, the centroids of the A‑ and B‑subsets are computed, and their difference forms a polarity vector v_i = μ_B,i − μ_A,i. All vectors are anchored at a common origin, yielding a semantic polarity field that captures both magnitude (strength of semantic shift) and direction (semantic orientation) for each topic across the CB.
To make these abstract vectors interpretable, TOPol employs a contrastive explainability step using a large language model (gemini‑2.5‑flash). For each vector, the nearest documents to the two centroids are retrieved and fed to the LLM, which generates natural‑language labels for the dominant poles, estimates coverage percentages, and provides exemplar sentences and keywords. This yields human‑readable descriptions such as “confidence‑doubt” or “transparency‑obfuscation” for non‑affective corpora, and aligns with standard sentiment dimensions for affective data.
The framework is evaluated on two contrasting corpora. (1) A macro‑economic dataset of 600 U.S. central‑bank speeches split at the May 2007 pre‑crisis vs. post‑crisis breakpoint. The resulting polarity field reveals heterogeneous, multi‑dimensional shifts (e.g., policy‑focus, risk‑perception) that are far more coherent than those obtained from random CBs. (2) A balanced set of 10 000 Amazon product reviews divided by rating (positive vs. negative). Here the polarity vectors align strongly with the primary sentiment axis and the LLM‑generated labels closely match NRC valence scores. Robustness experiments perturb UMAP parameters, Leiden resolution, number of topics, and even swap the embedding model; only the definition of the CB substantially alters the polarity field, confirming methodological stability.
Key contributions include (a) formalizing semantic polarity as a vector field induced by contextual boundaries, (b) a flexible pipeline that can be applied to any transformer embedding, (c) an LLM‑driven contrastive labeling mechanism that automatically discovers interpretable semantic dimensions, and (d) extensive validation showing that TOPol captures both affective and non‑affective polarity transitions across domains. The authors argue that TOPol extends sentiment analysis into a broader class of discourse analysis tools capable of detecting, quantifying, and explaining context‑sensitive meaning changes, with potential applications in policy monitoring, market research, and dynamic text analytics. Future work may explore multi‑step CBs, real‑time streaming data, and integration with downstream decision‑making systems.