CosineGate: Semantic Dynamic Routing via Cosine Incompatibility in Residual Networks
Modern deep residual networks perform substantial redundant computation by evaluating all residual blocks for every input, even when identity mappings suffice. We introduce CosineGate, an end-to-end differentiable architecture for dynamic routing in residual networks that uses cosine incompatibility between identity and residual feature representations as a self-supervised skip signal. CosineGate measures semantic redundancy through the Cosine Incompatibility Ratio (CIR), defined as 1 - cos(x, F(x)), and uses Gumbel-Softmax relaxation to enable per-sample, per-block gating during training. A progressive FLOPs regularization term controls average compute usage without destabilizing optimization. On CIFAR-10, CosineGate systematically spans the accuracy-efficiency Pareto frontier: an aggressive configuration achieves 89.9% accuracy with 24.1% FLOPs savings, a balanced configuration achieves 91.3% accuracy with 28.5% savings at epoch 160, and a conservative configuration reaches a peak of 93.2% accuracy with minimal compute reduction. These results match or exceed ResNet-20 (91.3%) while reducing computation, without auxiliary supervision, distillation, or task-specific heuristics. Our results demonstrate that simple geometric measures of feature incompatibility provide a principled and effective signal for dynamic residual routing.
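To make the CIR definition concrete, the following is a minimal pure-Python sketch of the quantity 1 - cos(x, F(x)) and of a two-way Gumbel-Softmax gate driven by it. It is an illustration under stated assumptions, not the paper's implementation: the function names, the linear mapping from CIR to gate logits (`scale`, `bias`), the temperature `tau`, and the gated form x + g * F(x) are all hypothetical choices made here, since the abstract does not specify them.

```python
import math
import random


def cosine_incompatibility(x, fx, eps=1e-8):
    """CIR = 1 - cos(x, F(x)) for two flattened feature vectors."""
    dot = sum(a * b for a, b in zip(x, fx))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_fx = math.sqrt(sum(b * b for b in fx))
    # eps guards against division by zero for all-zero features.
    return 1.0 - dot / (norm_x * norm_fx + eps)


def gumbel_softmax_gate(logit_execute, logit_skip, tau=1.0, rng=random):
    """Relaxed binary choice; returns the soft weight on 'execute'."""
    def gumbel():
        # Clamp the uniform sample so both logarithms stay finite.
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        return -math.log(-math.log(u))

    a = (logit_execute + gumbel()) / tau
    b = (logit_skip + gumbel()) / tau
    m = max(a, b)  # subtract the max for a numerically stable softmax
    ea, eb = math.exp(a - m), math.exp(b - m)
    return ea / (ea + eb)


def gated_block(x, residual_fn, scale=5.0, bias=0.0, tau=1.0, rng=random):
    """One gated residual step: y = x + g * F(x).

    ASSUMPTION: the 'execute' logit is scale * CIR + bias, so a residual
    that points away from the identity features is more likely to be kept.
    """
    fx = residual_fn(x)
    cir = cosine_incompatibility(x, fx)
    g = gumbel_softmax_gate(scale * cir + bias, 0.0, tau=tau, rng=rng)
    y = [a + g * b for a, b in zip(x, fx)]
    return y, cir, g
```

Under this sketch, a residual branch whose output is nearly parallel to its input yields a CIR near 0 and carries little directional information beyond the identity path, while an orthogonal output yields a CIR of 1. At inference time the soft weight `g` would typically be thresholded to a hard skip decision; that step is omitted here.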