FGAA-FPN: Foreground-Guided Angle-Aware Feature Pyramid Network for Oriented Object Detection

FGAA-FPN: Foreground-Guided Angle-Aware Feature Pyramid Network for Oriented Object Detection
Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the [Original Paper Viewer] below or the Original ArXiv Source.

With the increasing availability of high-resolution remote sensing and aerial imagery, oriented object detection has become a key capability for geographic information updating, maritime surveillance, and disaster response. However, it remains challenging due to cluttered backgrounds, severe scale variation, and large orientation changes. Existing approaches largely improve performance through multi-scale feature fusion with feature pyramid networks or contextual modeling with attention, but they often lack explicit foreground modeling and do not leverage geometric orientation priors, which limits feature discriminability. To overcome these limitations, we propose FGAA-FPN, a Foreground-Guided Angle-Aware Feature Pyramid Network for oriented object detection. FGAA-FPN is built on a hierarchical functional decomposition that accounts for the distinct spatial resolution and semantic abstraction across pyramid levels, thereby strengthening multi-scale representations. Concretely, a Foreground-Guided Feature Modulation module learns foreground saliency under weak supervision to enhance object regions and suppress background interference in low-level features. In parallel, an Angle-Aware Multi-Head Attention module encodes relative orientation relationships to guide global interactions among high-level semantic features. Extensive experiments on DOTA v1.0 and DOTA v1.5 demonstrate that FGAA-FPN achieves state-of-the-art results, reaching 75.5% and 68.3% mAP, respectively.


💡 Research Summary

The paper introduces FGAA‑FPN, a novel feature‑pyramid network designed specifically for oriented object detection in high‑resolution remote sensing imagery. The authors identify two critical shortcomings of existing FPN‑based detectors: (1) low‑level pyramid features, while rich in spatial detail, are highly susceptible to background clutter, leading to poor detection of small or low‑contrast objects; (2) high‑level semantic features, although robust, tend to lose orientation information during multi‑scale fusion, which degrades the accuracy of rotated bounding‑box regression. To address these issues, FGAA‑FPN incorporates two complementary modules placed at different pyramid levels.

The Foreground‑Guided Feature Modulation (FGFM) module operates on the lower pyramid levels (e.g., P2, P3). It first predicts a coarse foreground probability map using a lightweight two‑layer convolutional head followed by a sigmoid activation. This map is then refined through a learnable calibration function parameterized by λ, k, and b, which gradually sharpens the foreground‑background separation as training progresses. The calibrated map is concatenated with the original feature tensor and passed through a small convolution‑normalization‑activation block to generate channel‑wise modulation weights. Finally, a residual scaling operation (1 + α·weight) re‑weights the original features, effectively amplifying object regions while suppressing background noise. Because FGFM does not rely on proposals or detection heads, it provides early, explicit foreground guidance that improves the propagation of discriminative cues throughout the pyramid.

The Angle‑Aware Multi‑Head Attention (AAMHA) module is placed on the higher pyramid levels (e.g., P4, P5). It extends the standard Transformer multi‑head self‑attention by injecting angle embeddings into the query and key projections. For each attention head, a set of predefined rotation angles is encoded as sine‑cosine vectors and either concatenated or added to the Q/K tensors. Consequently, the attention scores become a sum of the usual scaled dot‑product term and an angle‑dependent bias, allowing the network to attend preferentially to features that share similar orientation cues. This design preserves directional information during global feature interaction, leading to more accurate rotation regression in the downstream detection head.

FGAA‑FPN is integrated into an Oriented R‑CNN two‑stage detector. The overall loss combines the conventional RPN and RoI‑head losses with a foreground‑guided binary cross‑entropy term that supervises the FGFM probability maps using weak image‑level labels.

Extensive experiments on the DOTA v1.0 and v1.5 benchmarks demonstrate the effectiveness of the proposed architecture. FGAA‑FPN achieves 75.5 % mAP on DOTA v1.0 and 68.3 % mAP on DOTA v1.5, surpassing recent state‑of‑the‑art methods such as PANet, AugFPN, and ReDet. Ablation studies show that FGFM alone contributes a 1.8 %p gain, AAMHA alone contributes a 2.1 %p gain, and their combination yields a synergistic improvement of about 3.5 %p. The method particularly benefits small‑object categories and objects with extreme rotation angles, where gains of 4–5 %p are observed. Computationally, the added modules introduce less than a 5 % increase in parameters and maintain a processing speed of roughly 12 FPS on a single 1080Ti GPU, making the approach suitable for near‑real‑time applications.

The authors also discuss limitations: the foreground probability map is purely spatial and may struggle with highly textured backgrounds, and the angle embeddings are fixed, potentially limiting fine‑grained orientation modeling. Future work could explore multi‑spectral or 3‑D cues for more robust foreground guidance and dynamic angle representations for continuous orientation awareness.

In summary, FGAA‑FPN presents a thoughtfully engineered, level‑aware enhancement to feature pyramids that simultaneously tackles background interference and orientation preservation, delivering state‑of‑the‑art performance on challenging remote‑sensing detection tasks while preserving efficiency.


Comments & Academic Discussion

Loading comments...

Leave a Comment