Bonsai: A Framework for Convolutional Neural Network Acceleration Using Criterion-Based Pruning
As the need for more accurate and powerful Convolutional Neural Networks (CNNs) increases, so too do their size, execution time, memory footprint, and power consumption. To overcome this, solutions such as pruning have been proposed, each with its own metrics and methodologies, or criteria, for how weights should be removed. These solutions do not share a common implementation, making them difficult to reproduce and compare. In this work, we introduce Combine, a criterion-based pruning solution, and demonstrate that it is a fast and effective framework for iterative pruning; we show that criteria have differing effects on different models, create a standard language for comparing criterion functions, and propose several novel criterion functions. We demonstrate the capacity of these criterion functions and the framework on VGG-inspired models, pruning up to 79% of filters while retaining or improving accuracy and reducing the computations needed by the network by up to 68%.
💡 Research Summary
The paper addresses the growing challenge of deploying increasingly large convolutional neural networks (CNNs) on resource‑constrained platforms such as embedded devices and multi‑threaded web services. While pruning has emerged as a popular technique to reduce model size, memory footprint, and computational demand, the literature suffers from a lack of standardization: different works propose various “pruning criteria” (measures of filter importance) and disparate implementation details, making reproducibility and fair comparison difficult.
To solve this, the authors introduce Combine, a unified, criterion‑based pruning framework built on top of Keras. Combine abstracts the pruning process into four configurable components: (1) Prunable layer selection – users can designate convolutional, dense, or both types of layers as candidates for removal; (2) Criterion function – any mapping from a filter’s weight tensor to a scalar importance score. The paper implements four simple statistical criteria (standard deviation, range, mean absolute value, maximum absolute value) and allows custom, layer‑specific or rank‑based functions; (3) Application mode – either static (compute all scores once before any pruning) or progressive (re‑compute scores after each layer’s pruning, better reflecting the changing network dynamics); (4) Threshold – a single scalar t that discards all filters whose criterion value falls below t.
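The four statistical criteria and the thresholding rule described above can be sketched in a few lines. This is an illustrative sketch, not the framework's actual API: the function names and the `keep_filter` helper are hypothetical, and each criterion simply maps a filter's weight tensor to a scalar score.

```python
import numpy as np

# Hypothetical sketches of the four statistical criteria named above.
# Each maps a filter's weight tensor to a scalar importance score.

def std_criterion(w):
    """Standard deviation of the filter's weights."""
    return float(np.std(w))

def range_criterion(w):
    """Spread between the largest and smallest weight."""
    return float(np.max(w) - np.min(w))

def mean_abs_criterion(w):
    """Mean absolute value of the weights."""
    return float(np.mean(np.abs(w)))

def max_abs_criterion(w):
    """Largest absolute weight."""
    return float(np.max(np.abs(w)))

def keep_filter(w, criterion, t):
    """A filter survives only if its score meets the threshold t."""
    return criterion(w) >= t
```

Because every criterion shares the same shape (weight tensor in, scalar out), custom, layer-specific, or rank-based criteria plug into the same slot.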
The framework’s algorithmic flow is detailed in pseudo‑code (Algorithm 1) and an iterative threshold‑search procedure (Algorithm 2). The threshold is chosen by evaluating a small validation subset across a range of t values, plotting a “threshold function” that maps pruning percentage to loss (or accuracy). The area under this curve can be used to compare criteria on the same model, while the plateau region indicates a safe pruning zone.
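A minimal sketch of the threshold-curve idea follows; the paper's Algorithm 2 may differ in detail, and `evaluate` here is an assumed callback that scores the network on the validation subset given the surviving filters.

```python
def threshold_curve(filters, criterion, thresholds, evaluate):
    """For each candidate threshold t, record the fraction of filters
    pruned and the resulting validation score.

    `evaluate` is a hypothetical callback: it receives the surviving
    filters and returns a loss or accuracy on a small validation subset.
    """
    curve = []
    for t in thresholds:
        keep = [f for f in filters if criterion(f) >= t]
        pruned_frac = 1.0 - len(keep) / len(filters)
        curve.append((t, pruned_frac, evaluate(keep)))
    return curve
```

Plotting `pruned_frac` against the evaluation score yields the "threshold function" described above: the area under the curve compares criteria on a model, and the plateau marks a safe pruning zone.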
Experiments are conducted on three VGG‑inspired architectures: Model A (convolution‑heavy), Model B (balanced), and Model C (dense‑heavy). Each model is trained on MNIST and CIFAR‑10, providing a scenario where the networks are over‑parameterized for MNIST but under‑parameterized for CIFAR‑10. For every combination of (prunable‑layer set, criterion function), the authors generate threshold curves and perform iterative pruning. Key findings include:
- Model B, when pruning both convolutional and dense layers, can lose up to 79 % of its filters while maintaining ≈99 % accuracy on MNIST and ≈73 % top‑1 on CIFAR‑10.
- Computational demand (measured in FLOPs) drops by as much as 68 %.
- Different criteria exhibit distinct trade‑offs: mean absolute value and maximum absolute value generally preserve accuracy better than standard deviation, which can be overly aggressive on certain layers.
- Progressive application yields higher post‑pruning accuracy at the cost of additional computation during pruning, whereas static application is faster but may prune sub‑optimally.
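The static/progressive trade-off noted above can be illustrated with a hypothetical sketch: static mode scores every filter once up front, while progressive mode re-scores after each layer is pruned so later decisions reflect the already-pruned network. The function names and the `score_fn` hook are illustrative assumptions, not the framework's API.

```python
def prune_static(layers, criterion, t):
    """Score every filter once, then prune all layers with those scores."""
    scores = {id(f): criterion(f) for layer in layers for f in layer}
    return [[f for f in layer if scores[id(f)] >= t] for layer in layers]

def prune_progressive(layers, score_fn, t):
    """Prune one layer at a time.

    `score_fn(network, filter)` is an assumed hook that scores a filter
    in the context of the partially pruned network, so each step sees
    the effect of earlier pruning (the source of the extra computation
    relative to static mode).
    """
    network = [list(layer) for layer in layers]
    for i, layer in enumerate(network):
        network[i] = [f for f in layer if score_fn(network, f) >= t]
    return network
```

With a score function that ignores network state, the two modes coincide; they diverge only when scores depend on the current (partially pruned) network.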
The authors claim four main contributions: (1) a general, library‑agnostic pruning framework that can reproduce prior methods; (2) a systematic language for describing and comparing pruning criteria; (3) demonstration of novel simple criteria that perform competitively; and (4) empirical evidence that criterion choice and layer selection significantly affect pruning outcomes.
Limitations are acknowledged: the current implementation does not support dynamic, input‑dependent pruning, hardware‑aware optimizations, or sophisticated learned importance measures (e.g., gradient‑based). Experiments are limited to relatively small datasets and VGG‑style networks; scalability to deeper architectures such as ResNet or to large‑scale datasets like ImageNet remains untested.
Future work suggested includes automated criterion selection (e.g., Bayesian optimization), integration with hardware‑specific cost models, and extension to non‑linear or learning‑based importance scores. Overall, Combine provides a practical, extensible tool for researchers and engineers to explore pruning strategies systematically, facilitating reproducible research and accelerating the deployment of efficient CNNs on constrained platforms.