Future frame prediction in chest and liver cine MRI using the PCA respiratory motion model: comparing transformers and dynamically trained recurrent neural networks

Notice: This research summary and analysis were automatically generated using AI technology. For authoritative details, please refer to the original arXiv source.

Respiratory motion complicates accurate irradiation of thoraco-abdominal tumors in radiotherapy, as treatment-system latency entails target-location uncertainties. This work addresses frame forecasting in chest and liver cine MRI to compensate for such delays. We investigate RNNs trained with online learning algorithms, enabling adaptation to changing respiratory patterns via on-the-fly parameter updates, and transformers, increasingly common in time series forecasting for their ability to capture long-term dependencies. Experiments were conducted using 12 sagittal thoracic and upper-abdominal cine-MRI sequences from ETH Zürich and OvGU. PCA decomposes the Lucas-Kanade optical-flow field into static deformations and low-dimensional time-dependent weights. We compare various methods for forecasting the latter: linear filters, population and sequence-specific encoder-only transformers, and RNNs trained with real-time recurrent learning (RTRL), unbiased online recurrent optimization, decoupled neural interfaces, and sparse one-step approximation (SnAp-1). Predicted displacements were used to warp the reference frame and generate future images. Prediction accuracy decreased with the horizon h. Linear regression performed best at short horizons (1.3 mm geometrical error at h = 0.32 s, ETH Zürich data), while RTRL and SnAp-1 outperformed the other algorithms at medium-to-long horizons, with geometrical errors below 1.4 mm and 2.8 mm on the sequences from ETH Zürich and OvGU (the latter featuring higher motion variability, noise, and lower contrast), respectively. The sequence-specific transformer was competitive for low-to-medium horizons, but transformers remained overall limited by data scarcity and domain shift between datasets. Predicted frames visually resembled the ground truth, with notable errors occurring near the diaphragm at end-inspiration and regions affected by out-of-plane motion.


💡 Research Summary

This paper tackles a critical problem in MR‑guided radiotherapy: the latency between image acquisition and beam delivery creates a mismatch between the planned and actual tumor position due to respiratory motion. To mitigate this, the authors propose a future‑frame prediction pipeline that operates directly on 2‑D sagittal cine‑MRI of the thorax and upper abdomen. The core of the method is a Principal Component Analysis (PCA) of Lucas‑Kanade optical‑flow fields. PCA separates a static deformation field (the reference anatomy) from a low‑dimensional set of time‑dependent weights (the principal‑component coefficients). Because breathing can be largely described by the first two components, the forecasting problem reduces to predicting a short multivariate time series of these coefficients.
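The decomposition described above can be sketched numerically. The snippet below is a minimal illustration, not the paper's implementation: the "flow fields" are a synthetic matrix with two dominant temporal modes plus noise, and all sizes (200 frames, a 64×64 two-channel field) are hypothetical. It shows the key idea that after subtracting the static (mean) deformation, the leading principal components reduce the field sequence to a low-dimensional weight series, and that a predicted weight vector maps back to a full displacement field.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for Lucas-Kanade optical-flow fields: T frames, each a
# flattened 2-D displacement field with P scalar entries (hypothetical sizes).
T, P = 200, 64 * 64 * 2
t = np.arange(T)
breathing = np.sin(2 * np.pi * t / 25)       # dominant respiratory cycle
drift = 0.1 * np.sin(2 * np.pi * t / 90)     # slower secondary mode
basis = rng.standard_normal((2, P))
U = (breathing[:, None] * basis[0] + drift[:, None] * basis[1]
     + 0.01 * rng.standard_normal((T, P)))   # (T, P) flow matrix

# PCA: subtract the temporal mean (static deformation) and project the
# residual onto the leading principal components.
mean_field = U.mean(axis=0)
Uc = U - mean_field
# Thin SVD gives the components without forming the P x P covariance matrix.
_, s, Vt = np.linalg.svd(Uc, full_matrices=False)
k = 2                                        # two components describe breathing
weights = Uc @ Vt[:k].T                      # (T, k) time-dependent weights

# Forecasting operates on `weights`; a predicted weight vector w_hat is
# mapped back to a full displacement field:
w_hat = weights[-1]                          # placeholder for a forecast
field_hat = mean_field + w_hat @ Vt[:k]      # (P,) reconstructed field
```

Working in the k-dimensional weight space rather than on raw pixels is what makes real-time forecasting tractable here.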

Four families of predictors are evaluated on twelve cine‑MRI sequences (six from ETH Zürich, six from the Otto‑von‑Guericke University Magdeburg). The predictors are: (i) simple linear filters (linear regression), (ii) encoder‑only transformers trained on the whole population, (iii) encoder‑only transformers trained on a single sequence, and (iv) recurrent neural networks (RNNs) equipped with online learning algorithms that update parameters on‑the‑fly. The online RNN variants include Real‑Time Recurrent Learning (RTRL), Unbiased Online Recurrent Optimization (UORO), Decoupled Neural Interfaces (DNI), and Sparse One‑step Approximation (SnAp‑1). After forecasting the PCA weights, the predicted displacement field is used to warp the reference frame, thereby generating future MR images.
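The final warping step can be sketched as follows. This is a generic backward-warping routine using bilinear interpolation, under the assumption (not stated in the summary) that the predicted field gives a per-pixel displacement in a backward-mapping convention; the 4×4 image and constant shift are purely illustrative.

```python
import numpy as np
from scipy.ndimage import map_coordinates

def warp_reference(reference, displacement):
    """Warp a 2-D reference frame with a per-pixel displacement field.

    reference:    (H, W) image.
    displacement: (2, H, W) predicted (dy, dx) field, e.g. reconstructed
                  from forecast PCA weights (backward-warping convention
                  assumed here).
    """
    H, W = reference.shape
    yy, xx = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
    coords = np.stack([yy + displacement[0], xx + displacement[1]])
    # Bilinear interpolation at the displaced sampling locations;
    # out-of-bounds coordinates are clamped to the nearest edge value.
    return map_coordinates(reference, coords, order=1, mode="nearest")

# Toy usage: sample each pixel from one row below, so content shifts up.
ref = np.arange(16.0).reshape(4, 4)
disp = np.zeros((2, 4, 4))
disp[0] = 1.0
out = warp_reference(ref, disp)
```

In the pipeline described above, `displacement` would come from `mean_field` plus the forecast PCA weights, reshaped to `(2, H, W)`.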

Performance is measured by the average geometric error (mm) between the predicted and ground‑truth displacement fields, and by visual inspection of the warped images. As expected, prediction error grows with the look‑ahead horizon h. Linear regression excels at very short horizons (h ≈ 0.32 s), achieving a geometric error of 1.3 mm on the ETH Zürich data. For medium‑to‑long horizons, the online‑trained RNNs (RTRL and SnAp‑1) outperform all other methods, staying below 1.4 mm on the ETH data and below 2.8 mm on the OvGU data, which is more challenging due to higher motion variability, noise, and lower contrast. The sequence‑specific transformer is competitive for low‑to‑medium horizons but its performance is limited by the scarcity of training data (only twelve sequences) and by domain shift between the two institutions. Transformers also suffer from quadratic memory and compute scaling with input length, forcing a short context window that hampers long‑range dependency modeling.
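The linear-filter baseline mentioned above amounts to least-squares regression from a sliding window of past samples to the value h steps ahead. The sketch below illustrates this on a single synthetic series; in the paper's setting it would be applied to the PCA weight signals, and the window length and toy sinusoid are assumptions for illustration.

```python
import numpy as np

def fit_linear_filter(series, order, horizon):
    """Least-squares linear filter: predict x[t + horizon] from the
    `order` most recent samples x[t-order+1 .. t] (scalar-series sketch)."""
    X, y = [], []
    for t in range(order - 1, len(series) - horizon):
        X.append(series[t - order + 1 : t + 1])
        y.append(series[t + horizon])
    coeffs, *_ = np.linalg.lstsq(np.asarray(X), np.asarray(y), rcond=None)
    return coeffs

def predict(series, coeffs, order):
    """h-step-ahead forecast from the most recent window."""
    return float(series[-order:] @ coeffs)

# Toy usage: a clean sinusoid obeys a linear recurrence, so the filter
# extrapolates it almost exactly.
t = np.arange(300)
x = np.sin(2 * np.pi * t / 25)
h, order = 3, 10
w = fit_linear_filter(x, order=order, horizon=h)
x_hat = predict(x, w, order=order)              # forecast of x[299 + h]
x_true = np.sin(2 * np.pi * (299 + h) / 25)
```

Such filters are cheap and accurate while the signal stays regular, which is consistent with linear regression winning at very short horizons but losing ground as the horizon grows.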

Qualitatively, predicted frames closely resemble the ground truth across the image, with the most noticeable errors near the diaphragm at end‑inspiration and in regions affected by out‑of‑plane motion—areas where the two‑component PCA model cannot fully capture complex non‑linear deformations. The online RNNs demonstrate rapid adaptation to drift and amplitude shifts, updating their parameters after each new observation, which is crucial for handling intra‑fraction breathing irregularities. In contrast, the transformer, which is trained offline and then used only for inference, cannot adjust to sudden pattern changes.
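The predict-observe-update loop behind this online adaptation can be sketched with a deliberately simplified recurrent model. The gradient here is truncated to a single step, which is a much cruder credit-assignment rule than RTRL, UORO, DNI, or SnAp‑1; the network sizes, learning rate, and drifting toy signal are all assumptions for illustration only.

```python
import numpy as np

class OnlineRNN:
    """Minimal recurrent predictor updated after every observation.
    Gradients are truncated to the current step: a crude stand-in for
    RTRL/UORO/DNI/SnAp-1, which track credit through the recurrent
    state more faithfully."""

    def __init__(self, n_in, n_hidden, lr=0.02, seed=0):
        rng = np.random.default_rng(seed)
        self.Wx = 0.3 * rng.standard_normal((n_hidden, n_in))
        self.Wh = 0.3 * rng.standard_normal((n_hidden, n_hidden))
        self.Wo = 0.3 * rng.standard_normal((n_in, n_hidden))
        self.h = np.zeros(n_hidden)
        self.h_prev = np.zeros(n_hidden)
        self.lr = lr

    def step(self, x):
        self.h_prev = self.h
        self.h = np.tanh(self.Wx @ x + self.Wh @ self.h)
        return self.Wo @ self.h

    def update(self, y_pred, y_true, x):
        err = y_pred - y_true                      # dL/dy for L = 0.5*||err||^2
        dh = (self.Wo.T @ err) * (1.0 - self.h**2) # backprop through tanh only
        self.Wo -= self.lr * np.outer(err, self.h)
        self.Wx -= self.lr * np.outer(dh, x)
        self.Wh -= self.lr * np.outer(dh, self.h_prev)

# Online loop on a drifting toy "PCA weight" signal: predict the next
# sample, observe it, then update the parameters immediately.
t = np.arange(600)
signal = np.sin(2 * np.pi * t / 25) * (1 + 0.3 * np.sin(2 * np.pi * t / 200))
rnn = OnlineRNN(n_in=1, n_hidden=16)
errors = []
for k in range(len(signal) - 1):
    pred = rnn.step(np.array([signal[k]]))
    errors.append(float((pred[0] - signal[k + 1]) ** 2))
    rnn.update(pred, np.array([signal[k + 1]]), np.array([signal[k]]))
```

The per-step update is what lets such a predictor track amplitude shifts and drift within a treatment session, whereas an offline-trained transformer keeps fixed weights at inference time.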

The study’s contributions are threefold: (1) it validates a compact PCA‑based representation of respiratory motion for real‑time prediction, (2) it shows that online‑trained recurrent networks are more data‑efficient and robust than transformers in low‑resource medical imaging settings, and (3) it provides a thorough analysis of why transformers underperform here—principally data scarcity, domain shift, and limited context length. The authors suggest future work such as incorporating additional principal components, extending the approach to 3‑D volumetric data, and exploring hybrid architectures that combine the sample efficiency of RNNs with the long‑range modeling capabilities of transformers (e.g., LST‑Transformer or attention‑augmented RNNs). Ultimately, integrating such a predictor into MR‑guided radiotherapy systems could enable real‑time motion compensation, reducing geometric and dosimetric errors and improving treatment outcomes.

