R2MF-Net: A Recurrent Residual Multi-Path Fusion Network for Robust Multi-directional Spine X-ray Segmentation


📝 Original Info

  • Title: R2MF-Net: A Recurrent Residual Multi-Path Fusion Network for Robust Multi-directional Spine X-ray Segmentation
  • ArXiv ID: 2512.07576
  • Date: 2025-12-08
  • Authors: Xuecheng Li, Weikuan Jia, Komildzhon Sharipov, Sharipov Hotam Beknazarovich, Farzona S. Ataeva, Qurbonaliev Alisher, Yuanjie Zheng

📝 Abstract

Accurate segmentation of spinal structures in X-ray images is a prerequisite for quantitative scoliosis assessment, including Cobb angle measurement, vertebral translation estimation and curvature classification. In routine practice, clinicians acquire coronal, left-bending and right-bending radiographs to jointly evaluate deformity severity and spinal flexibility. However, the segmentation step remains heavily manual, time-consuming and non-reproducible, particularly in low-contrast images and in the presence of rib shadows or overlapping tissues. To address these limitations, this paper proposes R2MF-Net, a recurrent residual multi-path encoder-decoder network tailored for automatic segmentation of multi-directional spine X-ray images. The overall design consists of a coarse segmentation network and a fine segmentation network connected in cascade. Both stages adopt an improved Inception-style multi-branch feature extractor, while a recurrent residual jump connection (R2-Jump) module is inserted into skip paths to gradually align encoder and decoder semantics. A multi-scale cross-stage skip (MC-Skip) mechanism allows the fine network to reuse hierarchical representations from multiple decoder levels of the coarse network, thereby strengthening the stability of segmentation across imaging directions and contrast conditions. Furthermore, a lightweight spatial-channel squeeze-and-excitation block (SCSE-Lite) is employed at the bottleneck to emphasize spine-related activations and suppress irrelevant structures and background noise. We evaluate R2MF-Net on a clinical multi-view radiograph dataset comprising 228 sets of coronal, left-bending and right-bending spine X-ray images with expert annotations.
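The abstract does not detail the internals of SCSE-Lite. For orientation, the block below sketches the standard concurrent spatial and channel squeeze-and-excitation (scSE) recalibration on a single (C, H, W) feature map in NumPy. The reduction ratio `r`, the weight shapes, the function names and the additive aggregation of the two branches are assumptions; the paper's lightweight variant may differ in any of these.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def scse(x, w1, w2, w_s):
    """Concurrent spatial and channel squeeze-and-excitation on a (C, H, W) map.

    w1 has shape (C // r, C) and w2 has shape (C, C // r): the channel bottleneck.
    w_s has shape (C,): a 1x1 convolution that squeezes channels into one map.
    """
    # Channel branch (cSE): global average pool -> bottleneck MLP -> per-channel gate.
    z = x.mean(axis=(1, 2))                              # (C,)
    gate_c = sigmoid(w2 @ np.maximum(w1 @ z, 0.0))       # (C,), values in (0, 1)
    x_c = x * gate_c[:, None, None]
    # Spatial branch (sSE): 1x1 convolution over channels -> per-pixel gate.
    gate_s = sigmoid(np.tensordot(w_s, x, axes=(0, 0)))  # (H, W), values in (0, 1)
    x_s = x * gate_s[None, :, :]
    # Aggregate the two recalibrated maps (element-wise addition assumed here).
    return x_c + x_s


rng = np.random.default_rng(0)
C, r = 8, 4
x = rng.standard_normal((C, 16, 16))
w1 = rng.standard_normal((C // r, C))
w2 = rng.standard_normal((C, C // r))
w_s = rng.standard_normal(C)
y = scse(x, w1, w2, w_s)
print(y.shape)  # (8, 16, 16)
```

Because both gates lie in (0, 1), each branch can only attenuate activations. With all weights set to zero, both gates equal 0.5 and the block reduces to the identity, which is a convenient sanity check.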

💡 Deep Analysis

Figure 1

📄 Full Content

The human spine constitutes the central load-bearing structure of the torso and plays a crucial role in maintaining posture, protecting the spinal cord and enabling complex body movements. Spinal deformities, particularly scoliosis, have a substantial impact on quality of life. Adolescent idiopathic scoliosis is reported to affect a non-negligible proportion of adolescents worldwide, and severe deformities may lead to chronic pain, cosmetic concerns, respiratory compromise or neurological complications [1]. For early detection, progression monitoring and treatment planning, radiographic examination remains the primary imaging modality.

In clinical practice, the assessment of scoliosis relies not only on a single anteroposterior radiograph but often on a series of multi-directional X-ray images, including coronal standing, left-bending, right-bending and sometimes sagittal views [2]. Coronal radiographs provide an overview of global curvature, whereas bending radiographs are essential for evaluating spinal flexibility, distinguishing structural from non-structural curves and guiding surgical planning. The most widely used quantitative metric, the Cobb angle, is measured on the relevant view between the superior endplate of the upper end vertebra and the inferior endplate of the lower end vertebra of a curve [3]. Other parameters such as apical vertebral translation, T1 tilt and coronal balance also depend on accurate identification of spinal structures.
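Since the Cobb angle is defined by two endplate lines, it can be computed directly from landmark coordinates once the end vertebrae and their endplates have been located. A minimal sketch, assuming each endplate is supplied as two (x, y) points and using the common convention of taking the absolute difference of the two signed endplate tilts. The function names and coordinates are hypothetical and not taken from the paper.

```python
import math


def endplate_tilt(p, q):
    """Signed tilt, in degrees, of the endplate line through points p and q.

    The line is oriented left-to-right first, so the result lies in [-90, 90]
    regardless of the order in which the two points are given.
    """
    (x1, y1), (x2, y2) = p, q
    if x2 < x1:
        (x1, y1), (x2, y2) = (x2, y2), (x1, y1)
    return math.degrees(math.atan2(y2 - y1, x2 - x1))


def cobb_angle(upper_endplate, lower_endplate):
    """Cobb angle between the superior endplate of the upper end vertebra and
    the inferior endplate of the lower end vertebra, as the absolute difference
    of their signed tilts (this also covers curves larger than 90 degrees)."""
    return abs(endplate_tilt(*upper_endplate) - endplate_tilt(*lower_endplate))


# Hypothetical endplate landmarks (x, y) in pixels; the image y-axis points down.
upper = ((100.0, 200.0), (200.0, 200.0 - 100.0 * math.tan(math.radians(20))))
lower = ((110.0, 600.0), (210.0, 600.0 + 100.0 * math.tan(math.radians(15))))
print(round(cobb_angle(upper, lower), 1))  # 35.0
```

Here the upper endplate tilts by -20 degrees and the lower by +15 degrees, so the curve measures 35 degrees. Flipping the image's y-axis changes the sign of both tilts but not their difference.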

Despite advances in digital imaging and picture archiving systems, the delineation of spine contours and vertebral boundaries is still largely performed manually. Radiologists or spine surgeons draw or infer the lines for endplates and vertebral walls by hand, a process that is inherently subjective, time-consuming and prone to intra- and interobserver variability. These problems are most pronounced when spinal anatomy is obscured by ribs, scapulae, bowel gas or imaging noise. Consequently, the development of automatic, robust and reproducible segmentation methods for spine X-ray images is of high clinical relevance.

Traditional segmentation approaches based on edge detection, thresholding, region growing or deformable models have been applied to spine images with varying degrees of success [4,5]. However, due to the low contrast of X-rays, overlapping bone structures and variable projection geometry, these methods often fail in challenging cases. Machine learning techniques, including random forests, support vector machines and clustering-based schemes, have improved robustness but still require extensive feature engineering and may not generalize well across diverse imaging conditions [6].

More recently, convolutional neural networks (CNNs) and fully convolutional architectures have revolutionized semantic segmentation in both natural and medical imaging domains. U-shaped encoder-decoder networks with skip connections, such as U-Net and its variants, have become standard baselines for organ and lesion segmentation [7,8]. Nevertheless, directly applying generic CNN architectures to multi-directional spine X-ray segmentation still faces several challenges:

• Directional diversity: coronal and bending radiographs exhibit markedly different spine curvature, rib orientation and soft-tissue overlap. Models trained on a single view often show degraded performance on other views.

• Semantic gap in skip connections: classical skip connections transfer shallow encoder features directly to the decoder. These features are high in spatial resolution but low in semantic abstraction. When simply concatenated with deep decoder features, they may introduce noise or conflicting cues, especially near blurred boundaries [9].

• Sensitivity to image quality: X-ray images vary in exposure, contrast, noise level and occlusion. Generic segmentation networks may overfit to high-quality images and fail on low-contrast cases or on images contaminated by artifacts.
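The semantic-gap concern in the second bullet can be made concrete with a plain U-Net-style skip connection, sketched below in NumPy: the shallow encoder tensor reaches the decoder unchanged and is simply concatenated with the upsampled deep features, so whatever noise it carries is passed along verbatim at the fusion point. Channel counts, nearest-neighbour upsampling and the function names are illustrative assumptions; U-Net itself uses a learned up-convolution, and subsequent convolutions then mix the stacked channels.

```python
import numpy as np


def upsample2x(x):
    """Nearest-neighbour 2x upsampling of a (C, H, W) feature map."""
    return x.repeat(2, axis=1).repeat(2, axis=2)


def plain_skip(encoder_feat, decoder_feat):
    """U-Net-style skip connection: the deep decoder map is upsampled to the
    encoder's resolution and stacked with the shallow encoder map along the
    channel axis. The encoder features pass through untransformed."""
    up = upsample2x(decoder_feat)
    if up.shape[1:] != encoder_feat.shape[1:]:
        raise ValueError("spatial sizes of the two maps must match")
    return np.concatenate([encoder_feat, up], axis=0)


enc = np.zeros((64, 128, 128))  # shallow: high resolution, low abstraction
dec = np.zeros((128, 64, 64))   # deep: low resolution, high abstraction
fused = plain_skip(enc, dec)
print(fused.shape)  # (192, 128, 128)
```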

To cope with these challenges, we propose R2MF-Net, a recurrent residual multi-path network designed specifically for multi-directional spine X-ray segmentation. The central objective is to robustly segment the spinal column region across coronal and bending radiographs, providing accurate masks for downstream measurement algorithms.

The main contributions of this work are summarized as follows:

  1. We design a two-stage segmentation framework consisting of a coarse network and a fine network. The first stage focuses on global localization and coarse segmentation, while the second stage refines edges and corrects local errors. Both stages adopt enhanced Inception-style modules to capture multi-scale context without significantly increasing computational cost.
  2. We propose a recurrent residual jump connection (R2-Jump) mechanism that replaces standard skip connections. By applying recurrent convolutions and residual projections to encoder features before they are fused into the decoder, R2-Jump gradually narrows the semantic gap between encoder and decoder representations, leading t…
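The section does not give the exact layers of R2-Jump. A minimal NumPy sketch, assuming it follows the recurrent residual convolutional unit popularized by R2U-Net: a 1x1 projection of the encoder feature serves as the residual shortcut, and a recurrent convolutional layer with a shared 3x3 kernel, unrolled for `steps` iterations, refines it before the result is added back. Kernel sizes, the number of steps, the weight scaling and all names are assumptions; in a skip path, the refined tensor would take the place of the raw encoder feature in the decoder concatenation.

```python
import numpy as np


def relu(x):
    return np.maximum(x, 0.0)


def conv3x3(x, w):
    """Zero-padded 3x3 convolution (cross-correlation, as in deep learning).

    x: (C_in, H, W); w: (C_out, C_in, 3, 3); returns (C_out, H, W).
    """
    _, h, wd = x.shape
    xp = np.pad(x, ((0, 0), (1, 1), (1, 1)))
    out = np.zeros((w.shape[0], h, wd))
    for i in range(3):
        for j in range(3):
            # Accumulate the contribution of kernel tap (i, j) across channels.
            out += np.einsum("oc,chw->ohw", w[:, :, i, j], xp[:, i:i + h, j:j + wd])
    return out


def recurrent_conv(x, w_f, w_r, steps=2):
    """Recurrent convolutional layer: the feed-forward response is computed
    once, then the state is refined `steps` times with a shared recurrent kernel."""
    feed = conv3x3(x, w_f)
    h = relu(feed)
    for _ in range(steps):
        h = relu(feed + conv3x3(h, w_r))
    return h


def r2_unit(x, w_proj, w_f, w_r, steps=2):
    """Recurrent residual unit: a 1x1 projection forms the shortcut, and the
    recurrent convolution's output is added back onto it."""
    shortcut = np.einsum("oc,chw->ohw", w_proj, x)  # 1x1 residual projection
    return shortcut + recurrent_conv(shortcut, w_f, w_r, steps)


rng = np.random.default_rng(0)
enc = rng.standard_normal((16, 32, 32))  # hypothetical encoder skip feature
w_proj = 0.1 * rng.standard_normal((8, 16))
w_f = 0.1 * rng.standard_normal((8, 8, 3, 3))
w_r = 0.1 * rng.standard_normal((8, 8, 3, 3))
refined = r2_unit(enc, w_proj, w_f, w_r, steps=2)
print(refined.shape)  # (8, 32, 32)
```

Setting both 3x3 kernels to zero collapses the unit to its 1x1 shortcut, which illustrates the residual design: the recurrent path only adds a learned correction on top of the projected encoder features.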


Reference

This content is AI-processed based on open access ArXiv data.
