A New Dataset and Performance Benchmark for Real-time Spacecraft Segmentation in Onboard Computers


Spacecraft deployed in outer space are routinely subjected to various forms of damage due to exposure to hazardous environments. In addition, subsequent in-space repairs, whether through human extravehicular activity or robotic manipulation, carry significant risks and incur substantial operational costs. Recent developments in image segmentation could enable the development of reliable and cost-effective autonomous inspection systems. While these models often require large amounts of training data to achieve satisfactory results, publicly available annotated spacecraft segmentation data are very scarce. Here, we present a new dataset of nearly 64k annotated spacecraft images that was created using real spacecraft models, superimposed on a mixture of real and synthetic backgrounds generated using NASA’s TTALOS pipeline. To mimic camera distortions and noise in real-world image acquisition, we also added different types of noise and distortion to the images. Our dataset includes images with several real-world challenges, including noise, camera distortions, glare, varying lighting conditions, varying fields of view, partial spacecraft visibility, brightly lit city backgrounds, densely patterned and confounding backgrounds, aurora borealis, and a wide variety of spacecraft geometries. Finally, we fine-tuned YOLOv8 and YOLOv11 models for spacecraft segmentation to generate performance benchmarks for the dataset under well-defined hardware and inference-time constraints, mimicking real-world image segmentation challenges for real-time onboard applications on NASA’s inspector spacecraft. When tested under these constraints, the resulting models achieved a Dice score of 0.92, a Hausdorff distance of 0.69, and an inference time of about 0.5 seconds. The dataset and models for the performance benchmark are available at https://github.com/RiceD2KLab/SWiM.


💡 Research Summary

The paper addresses a critical gap in autonomous spacecraft inspection by providing a large‑scale, high‑quality dataset and a well‑defined performance benchmark for real‑time segmentation on resource‑constrained onboard computers. Existing public datasets such as NASA’s PoseBowl and the Spacecrafts dataset either lack pixel‑level masks or contain limited geometric and environmental diversity, making them unsuitable for training models that must operate under the strict hardware limits of an inspector spacecraft.

To overcome these limitations, the authors construct the Spacecraft With Masks (SWiM) dataset, comprising nearly 64,000 images with pixel‑level segmentation masks. The dataset is built through a dual‑methodology pipeline. First, they take the PoseBowl and Spacecrafts datasets, convert all images to a uniform JPEG format, and generate accurate masks automatically using the Segment Anything Model 2 (SAM 2): bounding boxes are transformed to VOC format, SAM 2 produces binary masks, and these masks are merged and converted into polygon contours compatible with the YOLO‑mask format. Second, they synthesize additional images using NASA’s TTALOS pipeline combined with Stable Diffusion. This approach renders 3D spacecraft models from multiple agencies under random poses, lighting conditions, and camera parameters, then composites them onto both real satellite imagery and AI‑generated backgrounds. A comprehensive augmentation suite (glare, motion blur, exposure shifts, Gaussian noise, and color distortions) is applied to emulate the harsh imaging conditions encountered in orbit (e.g., aurora, city lights, glare from the Sun). The resulting dataset is split into baseline and augmented versions, each pre‑partitioned into training, validation, and test sets to facilitate reproducible benchmarking.
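The label-conversion steps in this pipeline can be sketched in a few lines of plain Python. The SAM 2 inference call itself is omitted, and the helper names below (`yolo_bbox_to_voc`, `contour_to_yolo_seg`) are illustrative rather than taken from the paper's released code; they show only the box-format transformation and the final polygon-to-YOLO-label step.

```python
def yolo_bbox_to_voc(xc, yc, w, h, img_w, img_h):
    """Convert a normalized YOLO box (center x/y, width, height)
    to a VOC-style pixel box (xmin, ymin, xmax, ymax)."""
    xmin = (xc - w / 2) * img_w
    ymin = (yc - h / 2) * img_h
    xmax = (xc + w / 2) * img_w
    ymax = (yc + h / 2) * img_h
    return [round(v) for v in (xmin, ymin, xmax, ymax)]

def contour_to_yolo_seg(points, img_w, img_h, class_id=0):
    """Flatten a polygon contour (list of pixel (x, y) points) into a
    single YOLO segmentation label line with normalized coordinates."""
    coords = []
    for x, y in points:
        coords += [x / img_w, y / img_h]
    return " ".join([str(class_id)] + [f"{c:.6f}" for c in coords])
```

For example, a centered box covering half the frame of a 640x480 image maps to the pixel box `[160, 120, 480, 360]`, which would then be passed to SAM 2 as a box prompt.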

The authors explicitly define the operational constraints for onboard inference: a 4‑core CPU with less than 4 GB of RAM and a maximum inference latency of 0.95 seconds per frame. These constraints mirror the specifications of NASA’s inspector spacecraft flight computers. Within this environment, they fine‑tune two state‑of‑the‑art real‑time object detection frameworks—YOLOv8 and YOLOv11—by adding a segmentation head that predicts pixel masks alongside bounding boxes. Training is performed on the full SWiM dataset, while evaluation uses both region‑based (Dice coefficient) and boundary‑aware (Hausdorff distance) metrics, addressing the inadequacy of Dice alone for proximity‑critical tasks.
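The two evaluation metrics can be sketched with NumPy; this is a generic implementation of the Dice coefficient over binary masks and the symmetric Hausdorff distance over boundary point sets, not the paper's exact evaluation code.

```python
import numpy as np

def dice_coefficient(pred, target):
    """Region overlap: 2|A∩B| / (|A| + |B|) over two binary masks."""
    pred, target = pred.astype(bool), target.astype(bool)
    total = pred.sum() + target.sum()
    if total == 0:  # both masks empty: treat as perfect agreement
        return 1.0
    return 2.0 * np.logical_and(pred, target).sum() / total

def hausdorff_distance(a, b):
    """Symmetric Hausdorff distance between two (N, 2) point sets,
    e.g. boundary pixels of predicted and ground-truth masks."""
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    return max(d.min(axis=1).max(), d.min(axis=0).max())
```

Dice rewards area overlap but is insensitive to a few badly misplaced boundary pixels, which is exactly the failure mode the Hausdorff distance penalizes; reporting both captures area agreement and worst-case boundary error.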

Experimental results show that the YOLOv11‑based model achieves a Dice score of 0.92, a Hausdorff distance of 0.69, and an average inference time of approximately 0.5 seconds on the target hardware. This performance satisfies the predefined latency budget while delivering high‑quality segmentation suitable for downstream tasks such as pose estimation, collision avoidance, and robotic manipulation. The paper highlights four primary contributions: (1) the creation of the largest publicly available spacecraft segmentation dataset tailored for hardware‑constrained environments; (2) a formal problem formulation that encodes real‑world deployment constraints; (3) a dual‑metric evaluation protocol that captures both area overlap and boundary precision; and (4) a benchmark suite that reports concrete performance numbers under realistic onboard conditions.
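Checking a model against the latency budget reduces to timing it over a batch of frames. In this minimal sketch, `infer` is a placeholder for the actual YOLO segmentation call; only the 0.95-second budget comes from the paper.

```python
import time

def benchmark_latency(infer, frames, budget_s=0.95):
    """Time `infer` on each frame; return (mean latency in seconds,
    whether the mean stays within the per-frame budget)."""
    latencies = []
    for frame in frames:
        start = time.perf_counter()
        infer(frame)
        latencies.append(time.perf_counter() - start)
    mean = sum(latencies) / len(latencies)
    return mean, mean <= budget_s
```

On the target 4-core CPU, the reported mean of roughly 0.5 seconds per frame would pass this check with headroom under the 0.95-second budget.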

By releasing the dataset, code, and trained models on GitHub, the authors enable the research community to reproduce results, compare alternative lightweight architectures (e.g., MobileNet‑based segmenters, transformer‑based models), and explore further optimizations such as model pruning, quantization, or neural architecture search targeted at space‑grade processors. The work sets a solid foundation for future advancements in autonomous, real‑time spacecraft inspection, and it underscores the importance of aligning dataset diversity, evaluation metrics, and hardware constraints to drive practical space‑borne computer vision solutions.

