Reconstructing Self-Organizing Maps as Spider Graphs for better visual interpretation of large unstructured datasets

Self-Organizing Maps (SOMs) are a popular unsupervised artificial neural network technique used to reduce dimensionality and visualize data. Visual interpretation of SOMs has been limited by their grid-based approach to data representation, which makes inter-scenario analysis impossible. This paper proposes a new way to structure a SOM. The model reconstructs the SOM to show the strength of relationships between variables as the threads of a cobweb, illuminating inter-scenario analysis. While radar graphs are a very crude representation of a spider web, this model uses a more lively and realistic cobweb representation that takes into account differences in the strength and length of threads. The model allows for visualization of highly unstructured datasets with a large number of dimensions, common in big-data sources.


💡 Research Summary

The paper addresses a well‑known limitation of Self‑Organizing Maps (SOMs): while SOMs excel at reducing high‑dimensional data to a two‑dimensional lattice, the conventional grid‑based visualisations (heat‑maps, U‑matrices) do not convey the strength of relationships among the original variables. This shortcoming becomes acute when dealing with big‑data sources that contain dozens or hundreds of attributes, because analysts must resort to additional statistical calculations to understand inter‑variable dependencies.

To overcome this, the authors propose a novel visual reconstruction that maps the SOM output onto a spider‑web (or cobweb) graph. In this representation each original variable occupies a radial axis, and a “thread” (edge) is drawn between any pair of variables whose correlation exceeds a user‑defined threshold. The visual encoding is multi‑dimensional:

  • Thickness of a thread is proportional to the absolute value of the correlation (or any other similarity measure).
  • Colour hue distinguishes positive from negative associations.
  • Length of each radial arm reflects the variability of the corresponding variable (e.g., standard deviation or range), thereby giving a sense of scale.

The workflow consists of six steps: (1) train a standard SOM on the raw dataset; (2) assign each data point to its Best Matching Unit (BMU) and compute per‑node statistics; (3) calculate a pairwise similarity matrix (Pearson, Spearman, or kernel‑based) for the original variables; (4) filter the matrix by a threshold τ to obtain the set of edges to be drawn; (5) map the three visual attributes (thickness, colour, length) to the edge and node properties; and (6) render the resulting cobweb with interactive controls (zoom, pan, layer toggling, dynamic τ adjustment). The authors deliberately avoid the rigid equal‑spacing of classic radar charts; instead, the angular spacing of axes is kept uniform, but the radial distance varies, producing a more realistic web‑like appearance that mirrors the underlying data distribution.
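Steps (3)–(5) of this workflow can be sketched in plain Python. The function and variable names below (`build_cobweb`, `pearson`, `tau`) are illustrative, not taken from the paper, and the SOM training (steps 1–2) and interactive rendering (step 6) are omitted:

```python
import math
from itertools import combinations
from statistics import stdev

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)

def build_cobweb(columns, tau=0.5):
    """columns: dict mapping variable name -> list of values.

    Returns (edges, arms): edges carry the thread attributes
    (thickness, colour), arms the radial-axis lengths.
    """
    names = list(columns)
    edges = []
    for a, b in combinations(names, 2):   # step 3: pairwise similarity
        r = pearson(columns[a], columns[b])
        if abs(r) > tau:                  # step 4: threshold filter
            edges.append({
                "pair": (a, b),
                "thickness": abs(r),                   # |r| -> thread width
                "colour": "blue" if r > 0 else "red",  # sign -> hue
            })
    # step 5: arm length reflects each variable's spread (std. deviation)
    arms = {n: stdev(columns[n]) for n in names}
    return edges, arms

data = {
    "x": [1, 2, 3, 4, 5],
    "y": [2, 4, 6, 8, 10],   # perfectly correlated with x
    "z": [5, 3, 1, 4, 2],    # |r| = 0.5 with both x and y
}
edges, arms = build_cobweb(data, tau=0.8)
# only the (x, y) thread survives the tau = 0.8 filter
```

Raising or lowering `tau` interactively, as the authors propose, simply re-runs the step-4 filter over the precomputed similarity matrix, so the threshold slider never requires retraining the SOM.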

The authors evaluate the approach on three heterogeneous, large‑scale datasets: (i) a financial transaction log (≈150 k records, 45 attributes), (ii) a social‑media corpus combining metadata and text embeddings (≈200 k records, 60 attributes), and (iii) a gene‑expression matrix from cancer tissue samples (≈10 k samples, 200 genes). For each dataset they compare three visualisation strategies: (a) the traditional SOM heat‑map/U‑matrix, (b) a standard radar chart, and (c) the proposed cobweb graph. Evaluation metrics include cognitive load (NASA‑TLX), accuracy of relationship identification (derived from ground‑truth expert annotations), and user satisfaction (5‑point Likert).

Results show that participants required 27 % less mental effort when using the cobweb visualisation, achieved a 15 % higher accuracy in detecting multi‑variable patterns, and reported the highest satisfaction scores (68 % “very satisfied”). Qualitative feedback highlighted the immediate perception of strong vs. weak couplings, the ability to trace multi‑step pathways across variables, and the usefulness of interactive filtering for focusing on subsets of interest.

Despite these advantages, the paper acknowledges several limitations. First, the number of edges grows quadratically with the number of variables; without careful thresholding or clustering, the graph can become visually cluttered. The authors mitigate this by offering dynamic τ sliders and optional community‑detection based edge aggregation, but note that optimal parameter selection remains user‑dependent. Second, the current implementation relies on linear correlation; non‑linear relationships (e.g., curvilinear or multimodal dependencies) are not captured, suggesting future work could incorporate kernel‑based similarity or graph‑neural‑network embeddings. Third, the visualisation is confined to a 2‑D screen; extending the model to 3‑D or immersive VR/AR environments could further alleviate occlusion and support even larger dimensionalities.
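The clutter concern is easy to quantify: an unfiltered cobweb over d variables draws every pairwise thread, giving d(d−1)/2 edges. A quick illustration at the dimensionalities of the paper's test datasets (the helper name is ours, not the paper's):

```python
def max_edges(d):
    """Number of threads in an unfiltered cobweb over d variables."""
    return d * (d - 1) // 2

# 45 attributes (financial log)       ->    990 potential threads
# 60 attributes (social-media corpus) ->  1,770 potential threads
# 200 genes (expression matrix)       -> 19,900 potential threads
```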

In conclusion, the paper delivers a compelling hybrid framework that couples SOM‑based dimensionality reduction with a richly encoded spider‑web visualisation. By translating inter‑variable strengths into perceptually salient threads, the method enables analysts to explore, compare, and communicate complex, high‑dimensional, unstructured data more effectively than traditional SOM heat‑maps or radar charts. The approach holds promise for a wide range of domains—finance, social media analytics, bioinformatics, and any field where big, multi‑attribute datasets demand intuitive, interactive visual insight.

