LARV: Data-Free Layer-wise Adaptive Rescaling Veneer for Model Merging


Model merging aims to combine multiple fine-tuned models into a single multi-task model without access to training data. Existing task-vector merging methods such as TIES, TSV-M, and Iso-C/CTS differ in their aggregation rules but treat all layers nearly uniformly. This assumption overlooks the strong layer-wise heterogeneity in large vision transformers, where shallow layers are sensitive to interference while deeper layers encode stable task-specific features. We introduce LARV, a training-free, data-free, merger-agnostic Layer-wise Adaptive Rescaling Veneer that plugs into any task-vector merger and assigns a per-layer scale to each task vector before aggregation, and we show that it consistently boosts diverse merging rules. LARV adaptively suppresses shallow-layer interference and amplifies deeper-layer alignment using a simple deterministic schedule, requiring no retraining or modification to existing mergers. To our knowledge, this is the first work to perform layer-aware scaling for task-vector merging. LARV computes simple data-free layer proxies and turns them into scales through a lightweight rule; we study several instantiations within one framework (e.g., tiered two/three-level scaling with fixed values, or continuous mappings) and find that tiered choices offer the best robustness, while continuous mappings serve as an ablation. LARV is orthogonal to the base merger and adds negligible cost. On FusionBench with Vision Transformers, LARV consistently improves all task-vector baselines across 8/14/20-task settings; for example, Iso-C + LARV reaches 85.9% on ViT-B/32, 89.2% on ViT-B/16, and 92.6% on ViT-L/14. Layer-wise analysis and corruption tests further indicate that LARV suppresses shallow-layer interference while modestly amplifying deeper, task-stable features, turning model merging into a robust, layer-aware procedure rather than a uniform one.


💡 Research Summary

Model merging without data has become a practical way to combine several fine‑tuned models into a single multi‑task model by representing each fine‑tuned model as a “task vector”, the difference between its weights and those of a shared pretrained backbone. Existing task‑vector merging methods such as TIES, TSV‑M, and Iso‑C/CTS apply a single global scaling factor or a uniform rule across all layers. This ignores the well‑known hierarchical nature of deep vision transformers: shallow layers encode high‑frequency, local patterns and are highly sensitive to noise and interference, while deeper layers capture semantic, task‑stable features. Consequently, uniform merging can suppress useful deep‑layer updates or amplify harmful shallow‑layer conflicts, leading to avoidable performance regressions.
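To make the setup concrete, here is a minimal sketch, assuming PyTorch state dicts, of the per-layer task-vector construction and a uniform global-scale merge of the kind these baselines perform. The function names and the alpha value are illustrative assumptions, not the paper's code.

```python
import torch

def task_vectors(pretrained_sd, finetuned_sds):
    """Per-layer task vectors: each fine-tuned model's weights minus the shared backbone."""
    return [
        {name: ft[name] - pretrained_sd[name] for name in pretrained_sd}
        for ft in finetuned_sds
    ]

def uniform_merge(pretrained_sd, vectors, alpha=0.3):
    """Task arithmetic with one global scale: the same alpha is applied to every layer,
    which is the uniform treatment that LARV's per-layer scales are meant to refine."""
    return {
        name: base + alpha * sum(v[name] for v in vectors)
        for name, base in pretrained_sd.items()
    }
```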

The paper introduces LARV (Layer‑wise Adaptive Rescaling Veneer), a training‑free, data‑free add‑on that can be wrapped around any base merger. LARV computes two weight‑only diagnostics for each layer ℓ:

  1. Effective‑Rank Contrast (eℓ) – a spectral‑entropy based measure that compares the effective rank of the pretrained layer weights θ₀,ℓ with that of the merged update Δθℓ. A higher eℓ indicates that the update is low‑rank (concentrated) relative to the base, which the authors interpret as “information‑rich” and thus trustworthy.

  2. Commutator Conflict Coefficient (cℓ) – a normalized Frobenius norm of a commutator computed from the layer's weight matrices, used as a data-free proxy for how strongly updates conflict at layer ℓ; a higher cℓ signals more interference.
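A minimal sketch of how such weight-only diagnostics could be computed and turned into tiered per-layer scales is shown below. The exact contrast ratio, commutator arguments, normalization, threshold, and scale values are assumptions made for illustration; the paper's precise formulas may differ.

```python
import torch

def effective_rank(W, eps=1e-12):
    """Effective rank as the exponential of the spectral entropy of W's singular values."""
    s = torch.linalg.svdvals(W)
    p = s / (s.sum() + eps)
    entropy = -(p * torch.log(p + eps)).sum()
    return torch.exp(entropy)

def effective_rank_contrast(theta0, delta):
    """e_l (assumed form): large when the update is low-rank relative to the pretrained layer."""
    return effective_rank(theta0) / (effective_rank(delta) + 1e-12)

def commutator_conflict(A, B, eps=1e-12):
    """c_l (assumed form): ||AB - BA||_F / (||A||_F ||B||_F), for square weight matrices."""
    comm = A @ B - B @ A
    return torch.linalg.norm(comm) / (torch.linalg.norm(A) * torch.linalg.norm(B) + eps)

def tiered_scale(c_l, low=0.7, high=1.1, threshold=0.5):
    """Two-level veneer: damp high-conflict (typically shallow) layers, mildly boost the rest."""
    return low if c_l > threshold else high
```

A fixed two-level (or three-level) schedule of this kind is consistent with the abstract's finding that tiered scaling with fixed values is more robust than mapping the proxies to continuous scales.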


Comments & Academic Discussion

Loading comments...

Leave a Comment