Probing Neural Topology of Large Language Models
Probing large language models (LLMs) has yielded valuable insights into their internal mechanisms by linking neural activations to interpretable semantics. However, the complex mechanisms that link neurons' functional co-activation with emergent model capabilities remain largely unknown, hindering a deeper understanding and safer development of LLMs. In this work, we introduce graph probing, a method for uncovering the functional connectivity of LLM neurons and relating it to language generation performance. By probing models across diverse LLM families and scales, we discover a universal predictability of language generation and understanding performance using only neural topology, which persists even when retaining just 1% of neuron connections. Strikingly, probing on topology outperforms probing on activation by up to 130.4% and 67.7% on perplexity and space/time semantic regression respectively, suggesting that neural topology contains far richer information about LLM performance than neural activation, and that this information can be extracted with simple linear or MLP probes. To explain the dependence between neural topology and language performance, we identify default networks and hub neurons in LLMs and provide causal evidence through interventional experiments on multiple benchmarks, showing that LLMs actually exploit this topological information. Further analyses suggest that graph probing can be effectively leveraged to improve the efficiency and reliability of LLMs through proof-of-concept applications in model pruning and hallucination detection. Code and data for the graph probing toolbox are available at https://github.com/DavyMorgan/llm-graph-probing.
💡 Research Summary
The paper introduces "graph probing," a novel methodology for extracting and exploiting the functional connectivity of neurons inside large language models (LLMs). Instead of focusing on raw activation vectors, the authors treat each neuron's hidden-state activations over a token sequence as a time-series signal and compute Pearson correlations between every pair of neurons. This yields a dense n × n weighted graph A, where each node is a neuron and each edge weight aᵢⱼ reflects functional co-activation.
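This construction can be sketched in a few lines. The helper name `functional_connectivity` and the toy hidden states below are illustrative stand-ins, not the paper's actual code:

```python
import numpy as np

def functional_connectivity(hidden_states):
    """Build a functional-connectivity graph from neuron time series.

    hidden_states: array of shape (T, n) -- activations of n neurons
    over T tokens of one sequence.
    Returns an (n, n) matrix A where A[i, j] is the Pearson correlation
    between neuron i's and neuron j's activation series.
    """
    # np.corrcoef expects variables in rows, so transpose to (n, T).
    A = np.corrcoef(hidden_states.T)
    # A neuron with constant activation produces NaN correlations;
    # map those to zero so downstream probes see a valid graph.
    return np.nan_to_num(A)

# Toy stand-in: 32 tokens, 8 neurons of random "activations".
rng = np.random.default_rng(0)
H = rng.standard_normal((32, 8))
A = functional_connectivity(H)
```

Because Pearson correlation is symmetric and self-correlation is 1, the resulting A is a symmetric matrix with a unit diagonal.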
Two lightweight probes—a linear regression and a single‑hidden‑layer multilayer perceptron (MLP)—are trained on the flattened adjacency matrix to predict a variety of downstream metrics: perplexity (PPL), semantic regression of temporal (artwork release year) and spatial (world‑place latitude/longitude) information, hallucination scores, and more. The loss is mean‑squared error over a dataset of roughly 10 k token sequences drawn from OpenWebText, each paired with its induced graph and ground‑truth performance values.
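The probe setup can be illustrated with scikit-learn. The synthetic graphs and the specific estimators (`Ridge` as the linear probe, `MLPRegressor` with one hidden layer) are assumptions for this sketch; the targets stand in for metrics such as perplexity:

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
n_neurons, n_samples = 16, 200

# Each sample: flattened upper triangle of an n x n connectivity matrix
# (the diagonal and duplicate lower triangle carry no extra information).
iu = np.triu_indices(n_neurons, k=1)
graphs = rng.standard_normal((n_samples, n_neurons, n_neurons))
X = np.stack([g[iu] for g in graphs])            # (n_samples, n*(n-1)/2)

# Synthetic scalar target standing in for a performance metric like PPL.
w = rng.standard_normal(X.shape[1])
y = X @ w + 0.1 * rng.standard_normal(n_samples)

# Linear probe and single-hidden-layer MLP probe, both trained with
# a squared-error objective.
linear_probe = Ridge(alpha=1.0).fit(X, y)
mlp_probe = MLPRegressor(hidden_layer_sizes=(64,), max_iter=500,
                         random_state=0).fit(X, y)
```

On real data, X would come from graphs induced by the ~10k OpenWebText sequences and y from the measured performance values.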
Experiments span three model families (GPT‑2, Pythia, Qwen2.5) across six scales ranging from 160 M to 14 B parameters. Across all settings, graph‑based probes dramatically outperform traditional activation‑based probes. For perplexity prediction, graph probes achieve R² values of 0.85–0.96, Pearson correlations above 0.93, and Spearman correlations above 0.95, whereas activation probes linger around 0.35–0.45 R². The MSE and MAE of graph probes are 4–6× lower. Remarkably, retaining only 1 % of the strongest edges (i.e., 99 % sparsification) barely degrades performance, indicating that a tiny core of connections carries most predictive power.
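The 99% sparsification step amounts to keeping only the top-1% of edges by absolute correlation. A minimal sketch (the function name is hypothetical):

```python
import numpy as np

def sparsify_top_edges(A, keep_frac=0.01):
    """Zero all but the strongest keep_frac fraction of off-diagonal
    edges, ranked by absolute correlation."""
    n = A.shape[0]
    iu = np.triu_indices(n, k=1)
    weights = np.abs(A[iu])
    k = max(1, int(keep_frac * weights.size))
    # Threshold at the k-th largest absolute weight.
    thresh = np.partition(weights, -k)[-k]
    mask = np.abs(A) >= thresh
    np.fill_diagonal(mask, False)
    return np.where(mask, A, 0.0)

# Toy symmetric "connectivity" matrix for 50 neurons.
rng = np.random.default_rng(0)
M = rng.standard_normal((50, 50))
A = (M + M.T) / 2
A_sparse = sparsify_top_edges(A, keep_frac=0.01)
```

The finding that probes trained on `A_sparse`-style graphs match those trained on dense graphs is what indicates a small core of connections carries most of the predictive signal.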
Semantic regression experiments on “Arts (time)” and “World Places (space)” datasets further confirm the advantage: graph probes improve R² by up to 67 % relative to activation probes, showing that the structural pattern of neuron interactions is more informative for high‑level concepts than raw activation magnitudes.
The authors identify hub neurons (top 0.5% by betweenness/cluster centrality) and "default networks" (stable sub‑graphs concentrated in specific layers and attention heads). Causal ablation studies reveal that masking hub neurons sharply reduces probe accuracy, whereas pruning everything except the default network leaves performance largely intact. These findings demonstrate that LLMs actively exploit their internal topology during generation.
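Hub identification can be sketched with a simple centrality measure. Node strength (the sum of absolute incident edge weights) is used here as a lightweight stand-in for the betweenness/cluster centrality measures the paper uses; `hub_neurons` and its parameters are hypothetical:

```python
import numpy as np

def hub_neurons(A, top_frac=0.005):
    """Return indices of the top_frac most central neurons, ranked by
    node strength (sum of absolute incident edge weights) -- a simple
    proxy for the paper's betweenness/cluster centrality."""
    # Exclude the self-loop (diagonal) contribution from each strength.
    strength = np.abs(A).sum(axis=1) - np.abs(np.diag(A))
    k = max(1, int(top_frac * A.shape[0]))
    return np.argsort(strength)[::-1][:k]

# Toy symmetric connectivity matrix for 40 neurons; top 5% -> 2 hubs.
rng = np.random.default_rng(0)
M = rng.standard_normal((40, 40))
A = (M + M.T) / 2
hubs = hub_neurons(A, top_frac=0.05)
```

A causal ablation then masks the activations (or incident edges) of `hubs` and re-measures probe accuracy against a random-neuron baseline.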
Two proof‑of‑concept applications illustrate practical utility. First, topology‑guided pruning that keeps only hubs and default networks reduces model parameters by over 70 % with less than 2 % increase in perplexity, outperforming naive weight‑magnitude pruning. Second, perturbing hub connections correlates strongly with higher hallucination scores, suggesting that monitoring topological anomalies could serve as an early warning system for unsafe outputs.
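The pruning application can be sketched as a neuron-selection mask that always retains hubs and fills the remaining budget by connectivity strength. This simplified sketch omits the paper's preservation of default-network sub-graphs; `topology_pruning_mask` and `keep_frac` are hypothetical names:

```python
import numpy as np

def topology_pruning_mask(A, hub_idx, keep_frac=0.3):
    """Boolean mask over neurons: keep all hubs, then fill the budget
    (keep_frac of all neurons) with the strongest remaining neurons.
    Neurons outside the mask would be pruned from the model."""
    n = A.shape[0]
    strength = np.abs(A).sum(axis=1)
    keep = np.zeros(n, dtype=bool)
    keep[hub_idx] = True                      # hubs are always retained
    budget = max(0, int(keep_frac * n) - keep.sum())
    for i in np.argsort(strength)[::-1]:      # strongest first
        if budget == 0:
            break
        if not keep[i]:
            keep[i] = True
            budget -= 1
    return keep

# Toy example: 20 neurons, neurons 0 and 1 designated as hubs.
rng = np.random.default_rng(0)
M = rng.standard_normal((20, 20))
A = (M + M.T) / 2
keep = topology_pruning_mask(A, hub_idx=[0, 1], keep_frac=0.3)
```

A mask like this would then be applied to the model's weight matrices, after which perplexity is re-measured to check the degradation budget.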
The paper also discusses scalability: although dense graphs scale quadratically with neuron count, the observed sparsity of informative edges enables efficient storage and computation. Limitations include reliance on linear Pearson correlation (which may miss non‑linear interactions) and the need for more sophisticated sparse‑graph representations for the largest models.
In summary, “graph probing” establishes functional connectivity as a powerful, interpretable lens for linking micro‑level neural dynamics to macro‑level language performance. It provides strong empirical evidence that LLMs’ internal wiring—not just activation magnitudes—encodes crucial information for generation, semantic reasoning, and safety, opening new avenues for model analysis, compression, and robust deployment.