T3C: Test-Time Tensor Compression with Consistency Guarantees

We present T3C, a train-once, test-time budget-conditioned compression framework that exposes rank and precision as a controllable deployment knob. T3C combines elastic tensor factorization (maintained up to a maximal rank) with rank-tied mixed-precision quantization and a lightweight controller that maps a latency/energy/size budget token to per-layer rank/bit assignments; the policy snaps to hardware-aligned profiles and is monotone in the budget. A fast, layerwise consistency certificate, computed from spectral proxies and activation statistics, upper-bounds logit drift and regularizes training, yielding a practical reliability signal with negligible overhead. On ImageNet-1k, T3C shifts the vision Pareto frontier: for ResNet-50 at matched accuracy (≤ 0.5% drop), p50 latency is 1.18 ms with a 38 MB model, outperforming PTQ-8b (1.44 ms, 88 MB); for ViT-B/16, T3C reaches 2.30 ms p50 with 59 MB, improving over strong PTQ/QAT baselines. A single T3C checkpoint therefore provides predictable, certificate-backed accuracy-latency-size trade-offs on demand across devices.
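
To make the idea of a budget-monotone policy that snaps to hardware-aligned profiles concrete, the following is a minimal sketch, not the controller described in the paper. It assumes a size budget measured in bits, an equal split of that budget across layers, and a fixed profile table; the names `PROFILES`, `profile_cost`, and `assign_profiles`, along with the profile values themselves, are hypothetical and chosen only for illustration.

```python
from bisect import bisect_right

# Hypothetical hardware-aligned (rank, bits) profiles, ordered by increasing cost.
# These values are illustrative assumptions, not taken from the paper.
PROFILES = [(16, 4), (32, 4), (64, 8), (128, 8)]


def profile_cost(rank, bits, d_in, d_out):
    """Storage in bits of a rank-`rank` factorization of a d_out x d_in layer."""
    return rank * (d_in + d_out) * bits


def assign_profiles(budget_bits, layer_shapes, profiles=PROFILES):
    """Pick, per layer, the largest profile that fits an equal share of the budget.

    Assumption: the budget is split evenly across layers. Because each layer's
    costs are sorted in ascending order, a larger budget can never select a
    smaller profile, so the assignment is monotone in the budget.
    """
    share = budget_bits / len(layer_shapes)
    plan = []
    for d_in, d_out in layer_shapes:
        costs = [profile_cost(r, b, d_in, d_out) for r, b in profiles]
        # Snap down to the nearest affordable profile. If even the smallest
        # profile exceeds the share, fall back to it anyway.
        idx = max(bisect_right(costs, share) - 1, 0)
        plan.append(profiles[idx])
    return plan


if __name__ == "__main__":
    layers = [(256, 256), (512, 256)]  # (d_in, d_out) per layer, made up
    for budget in (0, 200_000, 1_000_000, 10**12):
        print(budget, assign_profiles(budget, layers))
```

Snapping down to a small table of profiles keeps the number of distinct kernel configurations bounded, and sorting profiles by cost is what makes monotonicity in the budget immediate in this simplified setting.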

