Disc-Centric Contrastive Learning for Lumbar Spine Severity Grading

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the [Original Paper Viewer] below or the Original ArXiv Source.

This work examines a disc-centric approach for automated severity grading of lumbar spinal stenosis from sagittal T2-weighted MRI. The method combines contrastive pretraining with disc-level fine-tuning, using a single anatomically localized region of interest per intervertebral disc. Contrastive learning is employed to help the model focus on meaningful disc features and reduce sensitivity to irrelevant differences in image appearance. The framework includes an auxiliary regression task for disc localization and applies weighted focal loss to address class imbalance. In experiments, the method reaches 78.1% balanced accuracy and lowers the severe-to-normal misclassification rate to 2.13% relative to supervised training from scratch. Discs of moderate severity remain hard to detect, but focusing on disc-level features provides a practical way to assess lumbar spinal stenosis.


💡 Research Summary

The paper presents a disc‑centric contrastive learning framework for automated severity grading of lumbar spinal stenosis (LSS) from sagittal T2‑weighted MRI. Recognizing that clinical assessment of LSS is fundamentally disc‑based, the authors propose to focus the entire pipeline on a single anatomically localized region of interest (ROI) per intervertebral disc, rather than processing whole 3D volumes or relying on extensive manual annotations.

Data are drawn from the RSNA 2024 Lumbar Spine Degenerative Classification Dataset, comprising 2,697 patients and 8,593 image series across multiple institutions. Only the sagittal T2 sequence is used, as it provides optimal contrast for disc height, hydration, and canal narrowing. The raw DICOM files are converted to PNG after intensity normalization, padding, and coordinate‑guided cropping to extract a fixed‑size ROI centered on the disc. Splits are performed at the disc level with stratification to preserve the three‑class (Normal/Mild, Moderate, Severe) distribution.
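The normalization and coordinate-guided cropping steps described above can be illustrated with a minimal sketch. It is not the authors' preprocessing code: the function names, min-max normalization, and zero padding are assumptions chosen for illustration, and plain nested lists stand in for image arrays to keep the example dependency-free.

```python
def min_max_normalize(image):
    """Rescale pixel intensities to [0, 1] (assumed normalization scheme)."""
    flat = [v for row in image for v in row]
    lo, hi = min(flat), max(flat)
    scale = (hi - lo) or 1.0  # avoid division by zero on constant images
    return [[(v - lo) / scale for v in row] for row in image]


def crop_roi(image, center_xy, size):
    """Crop a size x size window around (x, y), zero-padding outside the image.

    The window starts size // 2 pixels before the centre on each axis, so for
    even sizes it extends one pixel less after the centre than before it.
    """
    height, width = len(image), len(image[0])
    cx, cy = center_xy
    half = size // 2
    roi = []
    for r in range(cy - half, cy - half + size):
        out_row = []
        for c in range(cx - half, cx - half + size):
            inside = 0 <= r < height and 0 <= c < width
            out_row.append(image[r][c] if inside else 0.0)
        roi.append(out_row)
    return roi


# Hypothetical 4x4 "slice" with a disc centre near the top-left corner.
image = [[r * 4 + c for c in range(4)] for r in range(4)]
roi = crop_roi(min_max_normalize(image), center_xy=(0, 0), size=3)
```

Because padding is folded into the crop, an ROI always has the requested size even when the disc centre lies close to the image border.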

An auxiliary regression head predicts the (x, y) disc centre coordinates from a 2.5‑D input (three consecutive slices) to automate ROI extraction at inference time. This task is trained independently of the main grading network.

The core representation learning stage employs a ResNet‑18 encoder adapted for single‑channel MRI input, with the final fully‑connected layer removed to keep the encoder generic. For each disc, three stochastic augmentations generate multiple positive views, forming a multi‑positive contrastive setup. The authors use the NT‑Xent (InfoNCE) loss, maximizing cosine similarity among augmented views of the same disc while minimizing similarity across different discs. This encourages invariance to acquisition‑related artifacts and focuses the encoder on disc‑specific morphological cues.
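A multi-positive NT-Xent objective of the kind described above can be sketched as follows. The summary does not give the exact formulation, so this sketch assumes that the log-probabilities of all positives are averaged per anchor; the temperature value and the function names are likewise illustrative assumptions.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def multi_positive_nt_xent(embeddings, disc_ids, temperature=0.5):
    """Average NT-Xent loss where views sharing a disc id are positives.

    Assumption: for each anchor, the loss is the mean negative
    log-probability over all of its positives.
    """
    n = len(embeddings)
    total, anchors = 0.0, 0
    for i in range(n):
        others = [k for k in range(n) if k != i]
        positives = [k for k in others if disc_ids[k] == disc_ids[i]]
        if not positives:
            continue  # an anchor without a second view contributes nothing
        logits = {k: cosine(embeddings[i], embeddings[k]) / temperature
                  for k in others}
        # Numerically stable log-sum-exp over every non-anchor view.
        peak = max(logits.values())
        log_denom = peak + math.log(
            sum(math.exp(z - peak) for z in logits.values()))
        total -= sum(logits[p] - log_denom for p in positives) / len(positives)
        anchors += 1
    if anchors == 0:
        raise ValueError("no anchor has a positive view")
    return total / anchors


# Two hypothetical discs, three augmented views each (2-D toy embeddings).
views = [[1.0, 0.0], [0.9, 0.1], [1.0, 0.05],
         [0.0, 1.0], [0.1, 0.9], [0.05, 1.0]]
loss = multi_positive_nt_xent(views, disc_ids=[0, 0, 0, 1, 1, 1])
```

The loss is small when views of the same disc sit close together and far from other discs, which is the invariance the pretraining stage is meant to encourage.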

After self‑supervised pretraining, the encoder is transferred to a supervised grading task. Disc embeddings are pooled using a Deep Sets mean aggregation, which currently operates on a single ROI (S = 1) but is designed to handle variable‑size inputs (S > 1) without architectural changes, facilitating future multi‑slice extensions. The classifier head maps the pooled representation to three ordinal grades. To address severe class imbalance (few Severe cases), the authors replace standard cross‑entropy with a weighted focal loss, where class‑specific weights α are inversely proportional to frequency and the focusing parameter γ emphasizes hard examples.
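The pooling and loss described above can be sketched in a few lines. This is an illustrative reading, not the paper's implementation: the inverse-frequency normalization, the default γ = 2, the class counts, and all function names are assumptions.

```python
import math


def mean_pool(embeddings):
    """Deep Sets style mean aggregation over S embeddings (works for S = 1)."""
    count = len(embeddings)
    return [sum(column) / count for column in zip(*embeddings)]


def softmax(logits):
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def inverse_frequency_weights(class_counts):
    """Class weights proportional to 1 / frequency (assumed normalization)."""
    total = sum(class_counts)
    num_classes = len(class_counts)
    return [total / (num_classes * c) for c in class_counts]


def weighted_focal_loss(logits, target, alpha, gamma=2.0):
    """Focal loss for one sample: -alpha_t * (1 - p_t)^gamma * log(p_t)."""
    p_t = softmax(logits)[target]
    return -alpha[target] * (1.0 - p_t) ** gamma * math.log(p_t)


# Hypothetical class counts for Normal/Mild, Moderate, Severe.
alpha = inverse_frequency_weights([60, 30, 10])
pooled = mean_pool([[0.2, 0.4, 0.6]])          # single ROI, S = 1
loss = weighted_focal_loss([2.0, 0.5, -1.0], target=2, alpha=alpha)
```

With γ = 0 and uniform weights the expression reduces to ordinary cross-entropy, so γ and α can be read as the two knobs that shift emphasis toward hard and rare examples respectively.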

Training proceeds with differential fine‑tuning: the ResNet‑18 backbone is partially frozen and updated with a lower learning rate than the classification head, mitigating catastrophic forgetting of the robust disc‑level features learned during contrastive pretraining.
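One way to organize differential fine-tuning is to split parameters into frozen, backbone, and head groups with separate learning rates. The sketch below only builds that grouping as plain dictionaries. Which layers are frozen and the learning-rate values are not given in the summary, so the prefixes, rates, and the `head.` naming are hypothetical.

```python
def build_param_groups(param_names, frozen_prefixes, head_prefix,
                       backbone_lr, head_lr):
    """Assign each parameter name to a frozen, backbone, or head group."""
    frozen, backbone, head = [], [], []
    for name in param_names:
        if name.startswith(head_prefix):
            head.append(name)
        elif name.startswith(tuple(frozen_prefixes)):
            frozen.append(name)
        else:
            backbone.append(name)
    return [
        # lr 0.0 stands in for "no updates"; in a real framework these
        # parameters would typically be excluded from the optimizer.
        {"name": "frozen", "params": frozen, "lr": 0.0},
        {"name": "backbone", "params": backbone, "lr": backbone_lr},
        {"name": "head", "params": head, "lr": head_lr},
    ]


# Hypothetical parameter names following a ResNet-style layout.
names = ["conv1.weight", "bn1.weight", "layer1.0.conv1.weight",
         "layer3.0.conv1.weight", "layer4.1.conv2.weight", "head.weight"]
groups = build_param_groups(
    names,
    frozen_prefixes=["conv1.", "bn1.", "layer1."],  # assumed frozen stem
    head_prefix="head.",
    backbone_lr=1e-5,   # assumed value, lower than the head rate
    head_lr=1e-3,       # assumed value
)
```

Keeping the backbone rate well below the head rate lets the classifier adapt quickly while the pretrained features change slowly.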

Experimental results show a balanced accuracy of 78.1% and a severe‑to‑normal misclassification rate of 2.13%, both substantially better than a baseline model trained from scratch on the same data. The authors note that distinguishing Moderate from Normal/Mild remains challenging, suggesting that further augmentation or multi‑scale modeling could improve performance.
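Both reported metrics can be computed from a three-class confusion matrix. The sketch below assumes that the severe-to-normal rate is the fraction of truly Severe discs predicted as Normal/Mild, a definition the summary does not spell out. The confusion matrix is made up for illustration and does not reproduce the paper's results.

```python
def balanced_accuracy(confusion):
    """Mean per-class recall; rows are true classes, columns predictions."""
    recalls = []
    for cls, row in enumerate(confusion):
        support = sum(row)
        if support:  # skip classes with no true samples
            recalls.append(row[cls] / support)
    return sum(recalls) / len(recalls)


def severe_to_normal_rate(confusion, severe=2, normal=0):
    """Share of truly Severe discs predicted as Normal/Mild (assumed definition)."""
    support = sum(confusion[severe])
    return confusion[severe][normal] / support if support else 0.0


# Hypothetical confusion matrix, class order: Normal/Mild, Moderate, Severe.
confusion = [
    [80, 15, 5],
    [20, 60, 20],
    [2, 8, 90],
]
bal_acc = balanced_accuracy(confusion)          # mean of 0.80, 0.60, 0.90
miss_rate = severe_to_normal_rate(confusion)    # 2 of 100 severe discs
```

Balanced accuracy weights each class equally regardless of size, which is why it is preferred over plain accuracy when Severe cases are rare.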

In summary, the paper makes four contributions: (1) ROI‑based data efficiency that reduces memory and computational demands; (2) multi‑positive contrastive pretraining that yields disc‑consistent, artifact‑robust representations; (3) weighted focal loss to handle class imbalance; and (4) a Deep Sets‑based pooling mechanism designed to extend to multi‑slice inputs without architectural changes, although the current model operates on a single ROI per disc. By focusing on disc‑level features, the proposed method is presented as a practical, safer, and more interpretable approach to automated lumbar spinal stenosis grading, with potential for clinical integration.

