Beyond One-Way Pruning: Bidirectional Pruning-Regrowth for Extreme Accuracy-Sparsity Tradeoff

Reading time: 4 minutes
...

📝 Original Info

  • Title: Beyond One-Way Pruning: Bidirectional Pruning-Regrowth for Extreme Accuracy-Sparsity Tradeoff
  • ArXiv ID: 2511.11675
  • Date: 2025-11-11
  • Authors: Not available (the paper does not provide author information)

📝 Abstract

As a widely adopted model compression technique, model pruning has demonstrated strong effectiveness across various architectures. However, we observe that when sparsity exceeds a certain threshold, both iterative and one-shot pruning methods lead to a steep decline in model performance. This rapid degradation limits the achievable compression ratio and prevents models from meeting the stringent size constraints required by certain hardware platforms, rendering them inoperable. To overcome this limitation, we propose a bidirectional pruning-regrowth strategy. Starting from an extremely compressed network that satisfies hardware constraints, the method selectively regenerates critical connections to recover lost performance, effectively mitigating the sharp accuracy drop commonly observed under high sparsity conditions.

💡 Deep Analysis

📄 Full Content

As artificial intelligence (AI) continues to advance, deep learning models (DNNs) are increasingly deployed on a wide range of edge devices to enable diverse intelligent functionalities [1], [2]. Nevertheless, deploying such models on resource-constrained devices introduces substantial challenges, particularly in terms of model size [3], energy efficiency [4], and inference latency [5]. To address these limitations, model compression techniques such as pruning have been extensively explored [6], [7]. However, when aiming for extreme compression ratios, model performance typically deteriorates sharply. Consequently, achieving a balance between compactness and performance remains a critical challenge, especially when deploying models on highly resource-limited edge devices.

DNNs have grown substantially in size over time (e.g., VGG16 [8] and DenseNet-121 [9] have tens of millions of trainable parameters), which makes them difficult to deploy efficiently on edge devices such as smartphones, the Raspberry Pi [10], and the Nvidia Jetson Nano [11]. Even tiny models designed for edge deployment (e.g., ResNet-20 [12], with only 270k trainable parameters) suffer severe performance degradation during pruning. Fig. 1 presents accuracy evaluations for ResNet-20, EfficientNet-B0 [13], DenseNet-121, and VGG16. In each sub-plot, a green-filled, black-outlined circle marks the pretrained model, while blue and orange squares denote iterative and one-shot pruning, respectively. The x-axis indicates the sparsity level (i.e., the fraction of zero-valued weights), and the y-axis shows classification accuracy on CIFAR-10 [14]. All listed models follow a similar degradation pattern: although they retain impressive accuracy at moderate sparsity (e.g., EfficientNet-B0 loses only 0.69 accuracy points at 96% sparsity, and DenseNet reaches 93.47 accuracy at 95% sparsity), even a larger network like DenseNet drops sharply from 94.86 to 89.83 at 99% sparsity.
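For concreteness, the following is a minimal PyTorch sketch of the two baseline setups compared in Fig. 1: one-shot global magnitude pruning versus an iterative schedule that alternates pruning with fine-tuning. The excerpt does not include the paper's implementation, so the helper names (`prunable_params`, `global_sparsity`, `one_shot_prune`, `iterative_prune`), the linear cumulative schedule, and the L1-magnitude criterion are illustrative assumptions.

```python
import torch
import torch.nn.utils.prune as prune

PRUNABLE = (torch.nn.Conv2d, torch.nn.Linear)

def prunable_params(model):
    return [(m, "weight") for m in model.modules() if isinstance(m, PRUNABLE)]

def global_sparsity(model):
    """Fraction of zero-valued weights across Conv/Linear layers (x-axis of Fig. 1)."""
    zeros = total = 0
    for m, name in prunable_params(model):
        w = getattr(m, name)
        zeros += (w == 0).sum().item()
        total += w.numel()
    return zeros / total

def one_shot_prune(model, target_sparsity=0.99):
    """Remove the globally smallest-magnitude weights in a single step."""
    prune.global_unstructured(prunable_params(model),
                              pruning_method=prune.L1Unstructured,
                              amount=target_sparsity)

def iterative_prune(model, target_sparsity=0.99, rounds=10, finetune_fn=None):
    """Reach the target sparsity over several prune/fine-tune rounds
    (linear cumulative schedule; the paper's actual schedule is not given)."""
    prev = 0.0
    for r in range(1, rounds + 1):
        cum = target_sparsity * r / rounds   # cumulative sparsity after this round
        step = (cum - prev) / (1.0 - prev)   # fraction of *remaining* weights to prune
        prune.global_unstructured(prunable_params(model),
                                  pruning_method=prune.L1Unstructured,
                                  amount=step)
        if finetune_fn is not None:
            finetune_fn(model)               # e.g., a few epochs on CIFAR-10
        prev = cum
```

After either routine, `global_sparsity(model)` reports the reached zero-weight ratio, i.e., the quantity plotted on the x-axis of Fig. 1.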

To handle the degradation issue at high sparsity, several approaches have been proposed. Pruning On-the-fly [15] dynamically adjusts pruning masks during training. HRank [16] identifies and preserves filters that produce high-rank feature maps. ResRep [17] adopts a structure-aware training scheme that separates the learning of important and redundant filters.

While these solutions have proved effective, they share several limitations. The rank criterion used by HRank requires computing feature maps over data batches, which adds considerable overhead for large DNNs; moreover, some layers may be inherently low-rank yet still critical. The technique proposed by Pruning On-the-fly may force parts of the network to become permanently zeroed, losing the flexibility to recover. ResRep, in turn, depends on well-chosen hyperparameters, which demands exhaustive exploration.

Although modern one-shot and iterative pruning methods employ sophisticated designs to remove uninformative parameters at various granularities, we maintain that this process inevitably eliminates critical parameters. This motivates us to regrow the model starting from a highly sparse state. We evaluated our solution on VGG16, applying regrowth after both one-shot and iterative pruning. As Fig. 2 shows, blue and orange squares represent the models obtained by one-shot and iterative pruning, respectively, and red triangles denote the regrowth results. Both subplots reveal the unavoidable performance degradation under extreme compression ratios: accuracy degrades from 93.39 (pretrained) to 91.73 with iterative pruning at 98.85% sparsity, and to 91.03 with one-shot pruning at 99% sparsity.

With the proposed model regrowth, this degradation can be alleviated. In Fig. 2(a), starting iterative regrowth of VGG16 from 98.85% sparsity, we recover accuracy to 92.64 at 96.85% sparsity. Similarly, the one-shot setting in Fig. 2(b) shows a clear improvement, with accuracy recovered to 92.77 at 96.39% sparsity and 92.9 at 92.96% sparsity. Notably, even at 96.39% sparsity our method outperforms the one-shot pruning baseline at 95% sparsity, indicating that pruning alone selects weights sub-optimally.
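The excerpt does not specify how connections are selected for regrowth, so the sketch below is only a plausible reconstruction under stated assumptions: starting from a model pruned with `torch.nn.utils.prune`, it reactivates a small fraction of the pruned positions whose stored original magnitudes are largest, after which the model would be fine-tuned. The function name `regrow_step`, the `regrow_fraction` parameter, and the magnitude-based selection rule are all illustrative, not the paper's method.

```python
import torch

@torch.no_grad()
def regrow_step(model, regrow_fraction=0.01):
    """Reactivate a fraction of pruned connections in a model pruned with
    torch.nn.utils.prune.

    Placeholder criterion (the paper's selection rule is not given in this
    excerpt): re-enable the pruned positions whose stored original magnitudes
    (weight_orig) are largest, i.e., undo the most borderline pruning
    decisions; accuracy recovery then relies on subsequent fine-tuning.
    """
    for m in model.modules():
        if not hasattr(m, "weight_mask"):
            continue                               # layer was never pruned
        mask, orig = m.weight_mask, m.weight_orig
        scores = orig.abs() * (mask == 0)          # score only currently pruned positions
        n_regrow = int(regrow_fraction * mask.numel())
        if n_regrow == 0:
            continue
        idx = torch.topk(scores.view(-1), n_regrow).indices
        mask.view(-1)[idx] = 1.0                   # next forward pass recomputes
                                                   # weight = weight_orig * weight_mask
```

Repeating such a step between fine-tuning epochs moves the model back along the sparsity axis, which is the direction of the red triangles in Fig. 2; the actual regrowth criterion and schedule belong to the paper's full method.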

In this work, we mitigate the performance degradation caused by aggressive pruning by introducing a model regrowth strategy that selectively restores pruned weights. Beyond enhancing pruning outcomes, we hope our approach contributes to making DNNs more practical for deployment on resource-constrained edge devices. Furthermore, we envision extending this framework to large-scale models such as large language models (LLMs), where maintaining performance while reducing computational cost remains a central challenge.

Reference

This content is AI-processed based on open access ArXiv data.
