NEX: Neuron Explore-Exploit Scoring for Label-Free Chain-of-Thought Selection and Model Ranking
Large language models increasingly spend inference compute sampling multiple chain-of-thought traces or searching over merged checkpoints. This shifts the bottleneck from generation to selection, often without supervision on the target distribution. We show that entropy-based exploration proxies follow an inverted-U relationship with accuracy, suggesting that extra exploration can become redundant and induce overthinking. We propose NEX, a white-box, label-free scoring framework that views reasoning as alternating E-phases (exploration) and X-phases (exploitation). NEX detects the E-phase as spikes in newly activated MLP neurons per token from sparse activation caches, then uses a sticky two-state HMM to infer E-X phases and credits E-introduced neurons according to whether they are reused in the following X span. These signals yield interpretable neuron weights and a single Good-Mass Fraction score to rank candidate responses and merged variants without task answers. Across reasoning benchmarks and Qwen3 merge families, NEX computed on a small unlabeled activation set predicts downstream accuracy and identifies better variants; we further validate the E-X signal with human annotations and provide causal evidence via “Effective-vs-Redundant” neuron transfer.
💡 Research Summary
The paper addresses a growing bottleneck in large language model (LLM) inference: as models increasingly generate multiple chain‑of‑thought (CoT) traces or search over merged checkpoints, the costly step shifts from generation to selection. Existing selection methods rely on output‑level proxies such as token entropy or require labeled validation data, which are often unavailable for the target distribution. The authors propose NEX (Neuron Explore‑Exploit Scoring), a white‑box, label‑free framework that evaluates reasoning quality by inspecting internal neuron dynamics rather than external outputs.
NEX’s core insight is to view a CoT trace as a temporal sequence of token‑level activations in the model’s MLP layers. For each fixed‑size row (32 tokens), the set of active sparse neurons 𝒩ᵣ is recorded. The “novelty‑slope” sᵣ = |𝒩ᵣ \ 𝒩<ᵣ| / |Tᵣ| quantifies how many previously unseen neurons are recruited per token. After log‑transform, detrending, and MAD‑based normalization, the series {zᵣ} is fed into a sticky two‑state Gaussian Hidden Markov Model (HMM). The state with the higher emission mean is labeled as the exploration phase (E‑phase) and the other as the exploitation phase (X‑phase). The sticky transition parameter discourages rapid flips, enforcing that each phase spans multiple rows.
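The pipeline above (novelty-slope per row, log/detrend/MAD normalization, sticky two-state Gaussian HMM) can be sketched in numpy. This is a minimal illustration, not the authors' implementation: the helper names (`novelty_slope`, `sticky_hmm_viterbi`) and the fixed emission means and stay-probability are assumptions; the paper presumably fits the HMM parameters rather than hard-coding them.

```python
import numpy as np

def novelty_slope(active_sets, row_len=32):
    """s_r = |N_r \\ N_<r| / |T_r|: fraction of neurons in each 32-token row
    that have not appeared in any earlier row (hypothetical helper)."""
    seen, slopes = set(), []
    for s in active_sets:
        slopes.append(len(s - seen) / row_len)
        seen |= s
    return np.array(slopes)

def normalize(s, eps=1e-8):
    """Log-transform, linear detrend, MAD-based z-score, as described in the text."""
    x = np.log(s + eps)
    t = np.arange(len(x))
    a, b = np.polyfit(t, x, 1)          # linear trend
    r = x - (a * t + b)                 # detrended residual
    mad = np.median(np.abs(r - np.median(r))) + eps
    return (r - np.median(r)) / (1.4826 * mad)

def sticky_hmm_viterbi(z, mu=(-1.0, 1.0), sigma=1.0, stay=0.9):
    """Viterbi decoding of a two-state Gaussian HMM; stay > 0.5 makes it 'sticky'.
    State 1 (higher emission mean) is the E-phase, state 0 the X-phase.
    Parameters here are illustrative constants, not fitted values."""
    logA = np.log(np.array([[stay, 1 - stay], [1 - stay, stay]]))
    ll = lambda x: -0.5 * ((x - np.array(mu)) / sigma) ** 2  # per-state log-lik.
    T = len(z)
    dp = np.zeros((T, 2))
    bp = np.zeros((T, 2), dtype=int)
    dp[0] = np.log(0.5) + ll(z[0])
    for t in range(1, T):
        cand = dp[t - 1][:, None] + logA    # rows: previous state, cols: current
        bp[t] = cand.argmax(axis=0)
        dp[t] = cand.max(axis=0) + ll(z[t])
    states = np.zeros(T, dtype=int)
    states[-1] = dp[-1].argmax()
    for t in range(T - 2, -1, -1):          # backtrack
        states[t] = bp[t + 1, states[t + 1]]
    return states                            # 1 = E-phase, 0 = X-phase
```

The sticky transition matrix penalizes state changes by `log((1 - stay) / stay)` per flip, which is what forces each decoded phase to span several rows rather than flickering.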
Each contiguous E→X segment constitutes an “E‑X cycle”. For a cycle i, the set of newly introduced neurons Nᵢ is identified. During the subsequent X‑phase, the activation mass of each neuron k (Aₖ,ᵣ) is summed, yielding a reuse share: reuse_shareᵢ = Σ₍ᵣ∈Xᵢ₎ Σ₍ₖ∈Nᵢ₎ Aₖ,ᵣ / (Σ₍ᵣ∈Xᵢ₎ Σ₍ₖ₎ Aₖ,ᵣ + ε). To make this comparable across cycles, the reuse share is centered by subtracting the median reuse across all cycles, producing progressᵢ. Positive progress indicates that neurons introduced during exploration are reused more than typical, suggesting productive exploration.
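The reuse-share and progress formulas above translate directly into code. The sketch below is a plain reading of those equations under simplifying assumptions (activation mass pre-summed per neuron over the X-phase; function names are hypothetical):

```python
import numpy as np

def reuse_share(new_neurons, x_mass, eps=1e-8):
    """Share of the X-phase activation mass carried by neurons the E-phase introduced.
    new_neurons: set N_i of neuron ids newly introduced in cycle i's E-phase.
    x_mass: dict neuron id -> sum of A_{k,r} over rows r in the following X-phase."""
    total = sum(x_mass.values()) + eps
    introduced = sum(m for k, m in x_mass.items() if k in new_neurons)
    return introduced / total

def progress(reuse_shares):
    """progress_i: each cycle's reuse share centered by the median across cycles,
    so positive values mark cycles whose explored neurons are reused more than typical."""
    r = np.asarray(reuse_shares, dtype=float)
    return r - np.median(r)
```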
A consolidation signal consᵢ is also computed as the relative drop in novelty‑slope from the E‑phase to the X‑phase: consᵢ = clip(1 − median(s_X)/median(s_E), 0, 1). Additionally, a binary strength gate activates only cycles whose E‑phase average novelty‑slope exceeds the overall median: the gate is Iᵢ = 1 when that condition holds and Iᵢ = 0 otherwise.
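The consolidation signal and strength gate are simple to state in code. A minimal sketch, with hypothetical function names, following the two formulas as written:

```python
import numpy as np

def consolidation(s_E, s_X, eps=1e-8):
    """cons_i = clip(1 - median(s_X)/median(s_E), 0, 1): relative drop in
    novelty-slope from a cycle's E-phase rows to its X-phase rows."""
    return float(np.clip(1.0 - np.median(s_X) / (np.median(s_E) + eps), 0.0, 1.0))

def strength_gate(cycle_mean_slopes, all_slopes):
    """I_i = 1 only for cycles whose E-phase average novelty-slope exceeds
    the median novelty-slope over the whole trace, else I_i = 0."""
    thresh = np.median(all_slopes)
    return (np.asarray(cycle_mean_slopes) > thresh).astype(int)
```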