AurigaNet: A Real-Time Multi-Task Network for Enhanced Urban Driving Perception

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the original arXiv source.

Self-driving cars hold significant potential to reduce traffic accidents, alleviate congestion, and enhance urban mobility. However, developing reliable AI systems for autonomous vehicles remains a substantial challenge. Over the past decade, multi-task learning has emerged as a powerful approach to address complex problems in driving perception. Multi-task networks offer several advantages, including increased computational efficiency, real-time processing capabilities, optimized resource utilization, and improved generalization. In this study, we present AurigaNet, an advanced multi-task network architecture designed to push the boundaries of autonomous driving perception. AurigaNet integrates three critical tasks: object detection, lane detection, and drivable area instance segmentation. The system is trained and evaluated using the BDD100K dataset, renowned for its diversity in driving conditions. Key innovations of AurigaNet include its end-to-end instance segmentation capability, which significantly enhances both accuracy and efficiency in path estimation for autonomous vehicles. Experimental results demonstrate that AurigaNet achieves an 85.2% IoU in drivable area segmentation, outperforming its closest competitor by 0.7%. In lane detection, AurigaNet achieves a remarkable 60.8% IoU, surpassing other models by more than 30%. Furthermore, the network achieves an mAP@0.5:0.95 of 47.6% in traffic object detection, exceeding the next leading model by 2.9%. Additionally, we validate the practical feasibility of AurigaNet by deploying it on embedded devices such as the Jetson Orin NX, where it demonstrates competitive real-time performance. These results underscore AurigaNet’s potential as a robust and efficient solution for autonomous driving perception systems. The code can be found here https://github.com/KiaRational/AurigaNet.


💡 Research Summary

AurigaNet is a unified, real‑time multi‑task deep neural network designed for autonomous driving perception. It simultaneously performs three critical perception tasks—traffic object detection, lane detection, and drivable‑area instance segmentation—within a single architecture, thereby reducing computational redundancy and memory footprint compared with deploying separate models for each task.

The backbone of the network is CSPDarknet, chosen for its efficient gradient flow and low parameter count. A neck composed of Spatial Pyramid Pooling Fusion (SPPF) and a Feature Pyramid Network (FPN) aggregates multi‑scale features, which are then fed into three dedicated decoder heads. The object detection head follows a YOLOv5‑style anchor‑based design, employing a Path Aggregation Network (PAN) to fuse top‑down and bottom‑up features and predicting bounding‑box offsets, confidence scores, and class probabilities. The lane detection head is a binary segmentation branch that mirrors the drivable‑area segmentation pipeline but outputs a single‑channel lane mask.
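The shared-backbone design described above can be sketched in a few lines. This is a toy numpy stand-in, not the actual CSPDarknet/SPPF/FPN modules or head implementations; the point it illustrates is that features are computed once and reused by all three decoder heads, which is where the multi-task efficiency comes from.

```python
import numpy as np

def backbone(image):
    # Stand-in for CSPDarknet + SPPF/FPN: any shared feature extractor fits here.
    return image.mean(axis=-1, keepdims=True)  # toy "feature map"

def detection_head(feats):
    # A YOLOv5-style head would predict boxes, objectness, and classes;
    # here we return a placeholder summary instead.
    return {"num_candidates": int(feats.size)}

def lane_head(feats):
    return (feats > 0.5).astype(np.uint8)   # single-channel binary lane mask

def drivable_head(feats):
    return (feats > 0.3).astype(np.uint8)   # per-pixel drivable-area mask

def forward(image):
    feats = backbone(image)  # computed once, shared by all three heads
    return detection_head(feats), lane_head(feats), drivable_head(feats)
```

Running three single-task models would repeat the backbone computation three times; the shared design amortizes it across all heads.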

The most innovative component is the drivable‑area instance segmentation head. It consists of (i) a binary segmentation branch that produces a per‑pixel drivable‑area probability map and (ii) a feature‑embedding branch that learns an 8‑dimensional embedding for each pixel. Two technical mechanisms enable end‑to‑end instance segmentation without any post‑processing clustering: (i) deformable convolutions introduce learnable 2‑D offsets to the regular convolution grid, allowing the receptive field to adapt to irregular lane geometries and free‑space boundaries; (ii) a discriminative loss (as proposed by De Brabandere et al.) simultaneously pulls pixel embeddings toward their instance mean and pushes the means of different instances apart, ensuring that pixels belonging to the same drivable region are tightly clustered in the embedding space while different regions are well separated. This eliminates the need for computationally expensive clustering algorithms such as DBSCAN or K‑means, which are commonly used in prior works.
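The pull/push structure of the discriminative loss can be illustrated with a minimal numpy sketch. This is a simplified version of the loss of De Brabandere et al. (the margin values `delta_v` and `delta_d` are illustrative defaults, not the paper's tuned hyper-parameters, and the regularization term on mean magnitudes is omitted):

```python
import numpy as np

def discriminative_loss(emb, labels, delta_v=0.5, delta_d=1.5):
    """emb: (N, D) pixel embeddings; labels: (N,) instance ids (0 = background).
    Returns variance (pull) term + distance (push) term."""
    ids = [i for i in np.unique(labels) if i != 0]
    means = {i: emb[labels == i].mean(axis=0) for i in ids}
    # Pull term: hinged distance of each pixel to its own instance mean.
    var = np.mean([
        np.mean(np.maximum(
            np.linalg.norm(emb[labels == i] - means[i], axis=1) - delta_v, 0) ** 2)
        for i in ids
    ])
    # Push term: hinged distance between every pair of instance means.
    pairs = [(a, b) for ai, a in enumerate(ids) for b in ids[ai + 1:]]
    dist = 0.0
    if pairs:
        dist = np.mean([
            np.maximum(2 * delta_d - np.linalg.norm(means[a] - means[b]), 0) ** 2
            for a, b in pairs
        ])
    return var + dist
```

When instances are compact and their means are farther apart than `2 * delta_d`, both hinge terms vanish, so a well-trained embedding space incurs (near-)zero loss and instances can be separated without any clustering step.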

The total training loss is a weighted sum of task‑specific losses: object detection loss (box loss using L‑SIoU, objectness loss, and class loss), drivable‑area loss (Dice + binary cross‑entropy) combined with the discriminative embedding loss, and lane loss (binary cross‑entropy). Hyper‑parameters γ₁, γ₂, γ₃ control the relative importance of each task, while α coefficients balance the components within each loss term.
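The weighting scheme just described amounts to a simple linear combination. A sketch of its shape (the γ and α values below are placeholders, not the paper's tuned coefficients, and the per-task losses are assumed to be precomputed scalars):

```python
def total_loss(l_det, l_da, l_emb, l_lane, g1=1.0, g2=1.0, g3=1.0, alpha=0.5):
    """Weighted multi-task objective: gamma terms trade off the three tasks,
    alpha balances segmentation vs. embedding loss inside the drivable-area term."""
    return g1 * l_det + g2 * (alpha * l_da + (1 - alpha) * l_emb) + g3 * l_lane
```

In practice these coefficients matter: an unbalanced weighting lets one task dominate the shared backbone's gradients and degrades the others.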

Experiments were conducted on the BDD100K dataset, which offers diverse weather, illumination, and road‑type conditions. AurigaNet achieved 85.2 % IoU for drivable‑area segmentation, surpassing the closest competitor by 0.7 percentage points. For lane detection, it reached 60.8 % IoU, more than 30 percentage points higher than previously reported methods. In traffic object detection, the model obtained a mean average precision (mAP) of 47.6 % averaged over IoU thresholds 0.5–0.95, improving over the next best model by 2.9 percentage points.
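For reference, the IoU metric quoted above is the ratio of intersection to union between predicted and ground-truth pixel sets. A minimal sketch (pixel masks represented as Python sets of coordinates for clarity; real evaluations operate on arrays):

```python
def iou(pred, gt):
    """Intersection over Union for two binary masks given as sets of pixels."""
    union = len(pred | gt)
    if union == 0:
        return 1.0  # both masks empty: conventionally a perfect match
    return len(pred & gt) / union
```

Segmentation benchmarks typically accumulate intersections and unions over the whole test set before dividing, rather than averaging per-image IoUs.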

Real‑time feasibility was validated on an NVIDIA Jetson Orin NX embedded platform. The network runs at over 30 frames per second while maintaining a modest memory footprint and power consumption suitable for low‑cost Advanced Driver‑Assistance Systems (ADAS). The combination of a lightweight backbone, efficient neck, and streamlined decoder heads contributes to this performance.
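Throughput figures like the 30+ FPS quoted above are usually measured by discarding warm-up iterations (which include JIT compilation, memory allocation, and cache effects) and timing the steady state. A generic sketch of that procedure (`infer` stands in for any model's inference call; this is not the paper's benchmarking script):

```python
import time

def benchmark_fps(infer, frames, warmup=10):
    """Return sustained frames-per-second of `infer` over `frames`,
    excluding the first `warmup` iterations from the measurement."""
    for f in frames[:warmup]:
        infer(f)  # warm-up: excluded from timing
    start = time.perf_counter()
    for f in frames[warmup:]:
        infer(f)
    elapsed = time.perf_counter() - start
    return (len(frames) - warmup) / elapsed
```

On embedded targets such as the Jetson Orin NX, it is also common to lock clock frequencies before benchmarking so that thermal throttling does not skew the result.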

Limitations include reliance solely on RGB camera input; the absence of LiDAR or radar data may affect robustness in adverse weather or low‑visibility scenarios. Additionally, while the discriminative loss provides strong instance separation, extremely crowded scenes with many overlapping drivable regions could still challenge the embedding space. The authors suggest future work on multimodal sensor fusion, more sophisticated embedding regularization, and evaluation on larger-scale urban datasets.

In summary, AurigaNet demonstrates that a carefully engineered multi‑task network can deliver state‑of‑the‑art perception accuracy across detection, lane, and instance segmentation tasks while satisfying the stringent latency and resource constraints of embedded autonomous‑driving hardware. The authors have released the code and pretrained models publicly (https://github.com/KiaRational/AurigaNet), facilitating reproducibility and further research in the community.

