Accelerated Rotation-Invariant Convolution for UAV Image Segmentation
Rotation invariance is essential for precise, object-level segmentation in UAV aerial imagery, where targets can have arbitrary orientations and exhibit fine-scale details. Conventional segmentation architectures like U-Net rely on convolution operators that are not rotation-invariant, leading to degraded segmentation accuracy across varying viewpoints. Rotation invariance can be achieved by expanding the filter bank across multiple orientations; however, this significantly increases computational cost and memory traffic. In this paper, we introduce a GPU-optimized rotation-invariant convolution framework that eliminates the traditional data-lowering (im2col) step required for matrix-multiplication-based convolution. By exploiting structured data sharing among symmetrically rotated filters, our method achieves multi-orientation convolution with greatly reduced memory traffic and computational redundancy. We further generalize the approach to accelerate convolution with arbitrary (non-symmetric) rotation angles. Across extensive benchmarks, the proposed convolution achieves 20–55% faster training and 15–45% lower energy consumption than cuDNN, while maintaining accuracy comparable to state-of-the-art rotation-invariant methods. In the eight-orientation setting, our approach achieves up to 45% speedup and 41% energy savings on 256×256 inputs, and 32% speedup and 23% lower energy usage on 1024×1024 inputs. Integrated into a U-Net segmentation model, the framework yields up to 6% improvement in accuracy over the non-rotation-aware baseline. These results demonstrate that the proposed method provides an effective and highly efficient alternative to existing rotation-invariant CNN frameworks.
💡 Research Summary
This paper presents a novel and highly efficient framework for achieving rotation-invariant convolution, specifically targeting the challenge of semantic segmentation in UAV aerial imagery where objects appear in arbitrary orientations. The core problem addressed is the high computational cost and memory overhead associated with conventional rotation-invariant approaches, which typically expand the filter bank across multiple orientations (e.g., 4 or 8 rotations), drastically increasing FLOPs and memory traffic.
The authors’ key innovation is the reformulation of the convolution operation using a “scatter” paradigm, as opposed to the standard “gather” approach. In traditional convolution, data from input neighborhoods is gathered and multiplied to produce a single output pixel. The proposed scatter convolution reverses this flow: each input pixel is multiplied by the filter weights, and the resulting products are directly scattered (added) to their corresponding destination locations in the output feature map for all rotations. This fundamental shift eliminates the need for the “im2col” data-lowering step, a major source of memory duplication and bandwidth bottleneck in GPU-optimized matrix multiplication-based convolutions.
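The gather/scatter distinction can be made concrete with a small sketch. The two functions below are illustrative reference implementations (not the authors' GPU kernels): the gather version reduces an input neighborhood into each output pixel, while the scatter version multiplies each input pixel by every filter weight and accumulates the products into the output locations they contribute to, with no im2col buffer ever materialized.

```python
import numpy as np

def gather_conv2d(x, w):
    """Standard 'gather' convolution (valid padding, correlation form):
    each output pixel gathers an input neighborhood and reduces it."""
    H, W = x.shape
    k = w.shape[0]
    out = np.zeros((H - k + 1, W - k + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + k, j:j + k] * w)
    return out

def scatter_conv2d(x, w):
    """'Scatter' convolution: each input pixel is multiplied by every
    filter weight, and each product is added (scattered) into the one
    output pixel it contributes to. No im2col lowering is performed."""
    H, W = x.shape
    k = w.shape[0]
    out = np.zeros((H - k + 1, W - k + 1))
    for i in range(H):
        for j in range(W):
            for a in range(k):
                for b in range(k):
                    oi, oj = i - a, j - b  # destination output pixel
                    if 0 <= oi < out.shape[0] and 0 <= oj < out.shape[1]:
                        out[oi, oj] += x[i, j] * w[a, b]
    return out
```

Both routines produce identical results; they differ only in dataflow, which is exactly the property the paper exploits to remove the memory duplication of im2col.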
The true power of the scatter approach is revealed when applied to rotation invariance. For symmetric rotations (e.g., 90° increments), the rotated filters are simply permutations of the same set of weights. The scatter framework naturally exposes this structured weight sharing. It allows computing the multiplication of an input pixel with a filter weight once and then reusing the result by scattering it to appropriately rotated positions in the output grids for all symmetric orientations. This avoids redundant computations, collapsing what would be N separate convolutions for N rotations into a much more efficient single operation with shared arithmetic. The method is further generalized to support convolution with arbitrary (non-symmetric) rotation angles using interpolation.
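A minimal sketch of this weight sharing, assuming 90° rotations in the counterclockwise convention of `np.rot90` (the paper's actual kernels and indexing scheme may differ): each product `x[i, j] * w[a, b]` is computed once and scattered into all four rotated output maps, at the offset where that weight lands in each rotated filter.

```python
import numpy as np

def rot_offset(a, b, k, r):
    """Position of original weight w[a, b] inside a k-by-k filter rotated
    by r * 90 degrees counterclockwise (matching np.rot90)."""
    for _ in range(r):
        a, b = k - 1 - b, a
    return a, b

def multi_rot_scatter_conv(x, w, n_rot=4):
    """Convolve x with all n_rot 90-degree-rotated copies of w in one
    pass. Each product x[i, j] * w[a, b] is computed once and reused
    (scattered) across every rotated output map."""
    H, W = x.shape
    k = w.shape[0]
    oh, ow = H - k + 1, W - k + 1
    outs = np.zeros((n_rot, oh, ow))
    for i in range(H):
        for j in range(W):
            for a in range(k):
                for b in range(k):
                    p = x[i, j] * w[a, b]  # one multiply, shared below
                    for r in range(n_rot):
                        da, db = rot_offset(a, b, k, r)
                        oi, oj = i - da, j - db
                        if 0 <= oi < oh and 0 <= oj < ow:
                            outs[r, oi, oj] += p
    return outs
```

The four orientations thus cost one set of multiplications plus extra additions, rather than four full convolutions, which is the arithmetic collapse the summary describes.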
The proposed framework is integrated into a U-Net segmentation model and evaluated extensively. The results demonstrate significant advantages:
- Accuracy: The rotation-invariant model achieves up to a 5.7% improvement in segmentation accuracy over a non-rotation-aware baseline, validating the importance of rotation invariance for UAV imagery.
- Speed & Efficiency: Compared to the highly optimized cuDNN library, the proposed convolution achieves 20–57% faster training times and 15–45% lower energy consumption across various workloads. For example, with eight orientations on 256×256 inputs, it attains up to 45% speedup and 41% energy savings.
- Practical High-Resolution Rotation: The efficiency gains are so substantial that they enable the practical use of 16-orientation convolution and pooling, a setting that is infeasible for conventional rotation-invariant methods due to exploding memory requirements for intermediate feature maps. This leads to further accuracy gains.
In summary, this work provides a groundbreaking solution to the efficiency problem plaguing rotation-invariant CNNs. By leveraging a scatter-based convolution that eliminates data-lowering overhead and exploits structural weight sharing across rotations, it delivers state-of-the-art accuracy with dramatically improved computational performance and energy efficiency. This opens the door for deploying robust, rotation-invariant segmentation models in real-world UAV and other vision applications where computational resources are often constrained.