GMG: A Video Prediction Method Based on Global Focus and Motion Guided
In recent years, weather forecasting has gained significant attention. However, accurately predicting weather remains a challenge due to the rapid variability of meteorological data and potential teleconnections. Current spatiotemporal forecasting models rely primarily on convolution operations or sliding windows for feature extraction. These methods are limited by the size of the convolutional kernel or sliding window, making it difficult to capture and identify potential teleconnection features in meteorological data. Additionally, weather data often involve non-rigid bodies, whose motion is accompanied by unpredictable deformations, further complicating the forecasting task. In this paper, we propose the GMG model to address these two core challenges. The Global Focus Module, a key component of our model, enhances the global receptive field, while the Motion Guided Module adapts to the growth or dissipation processes of non-rigid bodies. Through extensive evaluations, our method demonstrates competitive performance across various complex tasks, providing a novel approach to improving the predictive accuracy of complex spatiotemporal data.
💡 Research Summary
The paper introduces GMG, a video prediction framework specifically designed to tackle two persistent challenges in spatiotemporal forecasting: (1) the difficulty of capturing long‑range dependencies such as teleconnections in meteorological data, and (2) the difficulty of modeling non‑rigid objects whose shapes grow, shrink, or deform over time (e.g., cloud systems). GMG is built around two novel modules. The Global Focus Module (GFM) extracts a global representation from the entire input frame using adaptive average pooling followed by a 1×1 convolution and a linear projection. This global feature is fused with the local hidden state via a learned gate and further refined with multi‑scale convolutions (kernel sizes 1, 3, and 5). Unlike conventional self‑attention, GFM adds virtually no extra computational overhead while still providing a mechanism for the network to attend to distant regions, thereby expanding the effective receptive field beyond the limits of fixed convolution kernels or sliding windows.
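The GFM pipeline described above (adaptive average pooling → 1×1 convolution → linear projection, gated fusion with the hidden state, multi-scale refinement with kernels 1, 3, and 5) can be sketched in PyTorch. This is a minimal illustration assuming a per-channel sigmoid gate and a summed multi-scale output; the layer names, pooled size, and exact fusion order are assumptions, not the authors' released code.

```python
import torch
import torch.nn as nn


class GlobalFocusModule(nn.Module):
    """Sketch of the GFM: a cheap global summary of the frame is
    fused with the local hidden state via a learned gate, then
    refined with multi-scale (1/3/5) convolutions."""

    def __init__(self, channels: int, pooled: int = 4):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(pooled)          # global summary grid
        self.conv1x1 = nn.Conv2d(channels, channels, kernel_size=1)
        self.proj = nn.Linear(channels * pooled * pooled, channels)
        self.gate = nn.Conv2d(2 * channels, channels, kernel_size=1)
        self.multi_scale = nn.ModuleList(
            nn.Conv2d(channels, channels, kernel_size=k, padding=k // 2)
            for k in (1, 3, 5)
        )

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = h.shape
        g = self.conv1x1(self.pool(h))                    # (B, C, p, p)
        g = self.proj(g.flatten(1))                       # (B, C) global vector
        g = g.view(b, c, 1, 1).expand_as(h)               # broadcast over space
        gate = torch.sigmoid(self.gate(torch.cat([h, g], dim=1)))
        fused = gate * g + (1 - gate) * h                 # gated global/local mix
        return sum(conv(fused) for conv in self.multi_scale)
```

Because the global branch is a single pooled vector rather than pairwise attention over all positions, the extra cost is negligible compared with self-attention, which matches the paper's efficiency claim.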
The Motion Guided Module (MGM) addresses non‑rigid motion by introducing two deformation factors: a balance factor α that controls how much local motion contributes to the overall shape, and a decay factor β that models temporal attenuation (growth or dissipation) of the object’s morphology. These factors are applied to the hidden state after the self‑attention memory stage, allowing the model to explicitly represent growth, shrinkage, and inelastic shape changes that are common in atmospheric phenomena.
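A minimal sketch of how the two deformation factors could act on the hidden state after the self-attention memory stage. Here α and β are learned per-channel parameters and the local motion cue is a convolved frame difference; both choices are assumptions made for illustration, as the paper may parameterize the factors differently.

```python
import torch
import torch.nn as nn


class MotionGuidedModule(nn.Module):
    """Sketch of the MGM: a balance factor alpha mixes a local
    motion cue into the hidden state, and a decay factor beta
    applies temporal attenuation to model growth/dissipation."""

    def __init__(self, channels: int):
        super().__init__()
        self.alpha = nn.Parameter(torch.full((1, channels, 1, 1), 0.5))
        self.beta = nn.Parameter(torch.zeros(1, channels, 1, 1))
        self.motion_conv = nn.Conv2d(channels, channels, 3, padding=1)

    def forward(self, h: torch.Tensor, h_prev: torch.Tensor) -> torch.Tensor:
        motion = self.motion_conv(h - h_prev)          # local motion cue
        shaped = self.alpha * motion + (1 - self.alpha) * h
        # exp(-relu(beta)) keeps the attenuation factor in (0, 1]
        return torch.exp(-torch.relu(self.beta)) * shaped
```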
The overall architecture stacks four GMG units per time step. Each unit consists of an ST‑ConvLSTM cell that updates the hidden state and spatiotemporal memory, the GFM that injects global context, a Self‑Attention Memory (SAM) module that preserves long‑term dependencies, and the MGM that refines motion dynamics. A “time‑delay” connection passes the memory from the fourth layer to the first layer of the next time step, ensuring deep temporal propagation. Gradient Highway Units and causal LSTM structures are retained to mitigate gradient vanishing.
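The per-time-step update loop, including the "time-delay" memory connection from the top layer back to the bottom layer, can be sketched as follows. The unit below is a deliberately simplified stand-in (shape-preserving convolutions) for the full ST-ConvLSTM + GFM + SAM + MGM bundle; only the wiring pattern is taken from the description above.

```python
import torch
import torch.nn as nn


class ToyGMGUnit(nn.Module):
    """Stand-in for one GMG unit; the real unit combines an
    ST-ConvLSTM cell, GFM, SAM, and MGM. Simplified here to
    show how hidden state, cell state, and memory flow."""

    def __init__(self, channels: int):
        super().__init__()
        self.f = nn.Conv2d(3 * channels, 3 * channels, 3, padding=1)

    def forward(self, x, h, c, m):
        out = self.f(torch.cat([x, h, m], dim=1))
        h_new, c_new, m_new = out.chunk(3, dim=1)
        return torch.tanh(h_new), c + c_new, torch.tanh(m_new)


def step(units, x_t, hs, cs, m):
    """One time step through the four stacked units. The memory `m`
    leaving the fourth layer is returned so the caller can feed it
    into the first layer at the next step (the time-delay link)."""
    inp = x_t
    for i, unit in enumerate(units):
        hs[i], cs[i], m = unit(inp, hs[i], cs[i], m)
        inp = hs[i]                                    # feed upward
    return hs, cs, m
```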
Extensive experiments were conducted on six diverse datasets, including radar echo sequences, precipitation maps, and traffic flow videos. Quantitative metrics (PSNR, SSIM, MSE) show that GMG consistently outperforms state‑of‑the‑art baselines such as PredRNN++, MIM, SwinLSTM, and SimVP‑ViT, achieving improvements of 1.2–2.5 dB in PSNR and 3–5 % in SSIM. Qualitative analysis highlights that GFM successfully captures teleconnection‑like patterns, while MGM accurately predicts the expansion and contraction of cloud regions, reducing visual artifacts associated with shape deformation.
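For concreteness, the PSNR figures above follow the standard definition, computed from the mean squared error over a frame. A minimal NumPy version (standard formula, not code from the paper):

```python
import numpy as np


def psnr(pred: np.ndarray, target: np.ndarray, data_range: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB from the MSE between two frames."""
    mse = np.mean((pred - target) ** 2)
    if mse == 0:
        return float("inf")          # identical frames
    return 10.0 * np.log10(data_range ** 2 / mse)
```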
In summary, GMG provides a unified solution that simultaneously expands the receptive field to capture global correlations and introduces a principled motion‑guidance mechanism for non‑rigid dynamics. The modular nature of GFM allows it to be integrated into other transformer‑based or convolutional forecasting models without architectural changes, and the deformation factors in MGM could be further linked to physical parameters in weather models. The work opens a promising avenue for more accurate, interpretable, and computationally efficient video prediction in meteorology and other domains where long‑range dependencies and non‑rigid motion are prevalent.