Dopamine: Brain Modes, Not Brains
Parameter-efficient fine-tuning (PEFT) methods such as \lora{} adapt large pretrained models by adding small weight-space updates. While effective, weight deltas are hard to interpret mechanistically, and they do not directly expose \emph{which} internal computations are reused versus bypassed for a new task. We explore an alternative view inspired by neuromodulation: adaptation as a change in \emph{mode} – selecting and rescaling existing computations – rather than rewriting the underlying weights. We propose \methodname{}, a simple activation-space PEFT technique that freezes base weights and learns per-neuron \emph{thresholds} and \emph{gains}. During training, a smooth gate decides whether a neuron's activation participates; at inference the gate can be hardened to yield explicit conditional computation and neuron-level attributions. As a proof of concept, we study ``mode specialization'' on MNIST (0$^\circ$) versus rotated MNIST (45$^\circ$). We pretrain a small MLP on a 50/50 mixture (foundation), freeze its weights, and then specialize to the rotated mode using \methodname{}. Across seeds, \methodname{} improves rotated accuracy over the frozen baseline while using only a few hundred trainable parameters per layer, and exhibits partial activation sparsity (a minority of units strongly active). Compared to \lora{}, \methodname{} trades some accuracy for substantially fewer trainable parameters and a more interpretable ``which-neurons-fire'' mechanism. We discuss limitations, including reduced expressivity when the frozen base lacks features needed for the target mode.
💡 Research Summary
The paper introduces TauGate (Threshold‑and‑Gain Tuning), a novel activation‑space parameter‑efficient fine‑tuning (PEFT) method that keeps the backbone weights frozen and learns per‑neuron thresholds and gains. By inserting a smooth sigmoid gate g = σ(s·(z − τ)) into each layer's pre‑activation z, the method can either softly modulate neuron activity during training or harden the gate at inference to obtain a deterministic sub‑network defined by the indicator 1[z > τ].
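To make the mechanism concrete, here is a minimal PyTorch sketch of the gate described above. The module name `TauGate` follows the summary's naming; the steepness hyperparameter `s`, the choice to gate a ReLU activation, and the freezing boilerplate are illustrative assumptions, not the authors' released implementation.

```python
import torch
import torch.nn as nn

class TauGate(nn.Module):
    """Per-neuron threshold-and-gain gating (minimal sketch).

    Applies the smooth gate g = sigmoid(s * (z - tau)) to a layer's
    pre-activation z, then rescales by a learned per-neuron gain.
    At inference the gate can be hardened to the indicator 1[z > tau].
    """
    def __init__(self, num_units: int, steepness: float = 10.0):
        super().__init__()
        self.tau = nn.Parameter(torch.zeros(num_units))   # per-neuron threshold
        self.gain = nn.Parameter(torch.ones(num_units))   # per-neuron gain
        self.steepness = steepness                        # gate sharpness s (assumed fixed)
        self.hard = False                                 # set True at inference to harden

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        if self.hard:
            g = (z > self.tau).float()                    # hard gate: 1[z > tau]
        else:
            g = torch.sigmoid(self.steepness * (z - self.tau))
        # Assumption: the gate modulates a ReLU activation of z.
        return self.gain * g * torch.relu(z)

# Usage: freeze the backbone layer, train only tau and gain.
layer = nn.Linear(784, 256)
for p in layer.parameters():
    p.requires_grad = False                               # frozen base weights
gate = TauGate(256)                                       # 2 * 256 trainable parameters
h = gate(layer(torch.randn(32, 784)))
```

With two parameters per neuron, this matches the abstract's claim of only a few hundred trainable parameters per layer; setting `gate.hard = True` yields the deterministic sub‑network used for neuron‑level attributions.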