Skillful Kilometer-Scale Regional Weather Forecasting via Global and Regional Coupling


Authors: Weiqi Chen, Wenwei Wang, Qilong Yuan

Weiqi Chen (DAMO Academy, Alibaba Group, Hangzhou, China), Wenwei Wang (DAMO Academy, Alibaba Group, Hangzhou, China), Qilong Yuan (Northwest Polytechnical University, Xi'an, China), Lefei Shen (Zhejiang University, Hangzhou, China), Bingqing Peng (DAMO Academy, Alibaba Group, Hangzhou, China), Jiawei Chen (Zhejiang University, Hangzhou, China), Bo Wu (Institute of Atmospheric Physics, Chinese Academy of Sciences, Beijing, China), Liang Sun (DAMO Academy, Alibaba Group, Hangzhou, China)

Abstract
Data-driven weather models have advanced global medium-range forecasting, yet high-resolution regional prediction remains challenging due to unresolved multiscale interactions between large-scale dynamics and small-scale processes such as terrain-induced circulations and coastal effects. This paper presents a global-regional coupling framework for kilometer-scale regional weather forecasting that couples a pretrained Transformer-based global model with a high-resolution regional network via a novel bidirectional coupling module, ScaleMixer. ScaleMixer dynamically identifies meteorologically critical regions through adaptive key-position sampling and enables cross-scale feature interaction through dedicated attention mechanisms. The framework produces forecasts at 0.05° (~5 km) and 1-hour resolution over China, significantly outperforming operational NWP and AI baselines on both gridded reanalysis data and real-time weather station observations. It exhibits exceptional skill in capturing fine-grained phenomena such as orographic wind patterns and Foehn warming, demonstrating both global-scale coherence and high-resolution fidelity. The code is available at https://anonymous.4open.science/r/ScaleMixer-6B66.
Keywords: Regional weather forecasting, Downscaling, Deep neural networks

ACM Reference Format: Weiqi Chen, Wenwei Wang, Qilong Yuan, Lefei Shen, Bingqing Peng, Jiawei Chen, Bo Wu, and Liang Sun. 2018. Skillful Kilometer-Scale Regional Weather Forecasting via Global and Regional Coupling. In . ACM, New York, NY, USA, 24 pages. https://doi.org/XXXXXXX.XXXXXXX

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
Conference'17, Washington, DC, USA
© 2018 Copyright held by the owner/author(s). Publication rights licensed to ACM.
ACM ISBN 978-1-4503-XXXX-X/2018/06
https://doi.org/XXXXXXX.XXXXXXX

1 Introduction
Accurate weather forecasting is essential for disaster mitigation, agriculture, transportation, and energy management [6]. Traditional numerical weather prediction (NWP) systems solve the governing equations of atmospheric dynamics (mass continuity, momentum conservation, and thermodynamics) and parameterize subgrid-scale processes such as turbulence and cloud microphysics [3, 13]. Although NWP models provide physically consistent forecasts and remain the operational standard, their computational demands and sensitivity to parameterization schemes limit their skill in resolving kilometer-scale weather phenomena governed by multiscale interactions.
Recent data-driven AI models, particularly Transformer-based architectures trained on global reanalysis data such as ERA5, have achieved remarkable success in medium-range forecasting at synoptic scales, at resolutions of 0.25° or coarser. However, high-resolution operational regional forecasting (e.g., 0.05°, or ~5 km) remains a significant challenge. Kilometer-scale weather is governed by complex multiscale interactions: large-scale circulations modulate local processes such as topographic flows, coastal breezes, and convective systems, while fine-scale features also feed back to the broader dynamics. A prime example is the Hengduan Mountains, where large-scale dynamics, including the Indian Monsoon, the East Asian Monsoon, and the Tibetan Plateau climate, interact with extreme terrain gradients. These gradients, which exceed 3,000 m within 100 km, drive localized wind accelerations, sharp temperature contrasts, and convective processes that are poorly captured by coarse global models or isolated regional models [30]. Such intricate multiscale interactions challenge conventional models, calling for forecasting models that reconcile global-scale coherence with high-resolution fidelity.

Recent studies have begun to explore data-driven regional weather forecasting and downscaling, typically treating global forecasts as static inputs [20, 21, 23, 26, 31]. However, these decoupled methods neglect dynamic cross-scale interactions and suffer from temporal misalignment between low-frequency global forecasts (e.g., 6-hourly) and high-frequency regional observations (e.g., hourly).
In summary, accurate high-resolution regional weather forecasting requires addressing two key challenges: (1) a mechanism to dynamically identify regions where cross-scale interactions are active, and (2) a bidirectional coupling framework that ensures spatio-temporal consistency across scales.

To address these challenges, we propose a novel global–regional coupling framework for high-resolution regional weather prediction. Our approach integrates a pretrained global Transformer model, which provides synoptic-scale (large-scale) context, with a regional refinement model operating at 0.05° resolution. Central to this architecture is ScaleMixer, a module that adaptively identifies key spatial regions exhibiting strong multiscale interactions and enables bidirectional feature encoding between global and regional tokens. This allows the model to prioritize meteorologically critical areas such as typhoon boundaries and mountain ridges, maintaining global coherence while resolving fine-grained regional dynamics. The main contributions of this work are summarized as follows:
• A global–regional coupling framework for 0.05° and 1-hour forecasting that integrates a pretrained global model for synoptic-scale context with a high-resolution regional model.
• The ScaleMixer module for dynamic identification of cross-scale interaction regions and bidirectional feature fusion.
• Experiments in both hindcast and operational settings showing our model's superiority over operational NWP and leading AI baselines. Case studies over complex terrain in China further demonstrate the model's skill in capturing orographic wind effects and Foehn warming.
• Like other AI-based models, the model is highly efficient at inference: a 48-hour forecast takes less than 3 minutes on a single GPU, whereas numerical models such as IFS-HRES typically require about 1 hour on large CPU clusters for a similar forecasting window.

2 Related Work
Numerical Weather Forecasting. As the predominant paradigm, NWP systems typically formulate atmospheric physical laws as PDEs and solve them through numerical simulation. Representative examples include Earth system models (ESMs) [13] and the operational Integrated Forecasting System (IFS) of the European Centre for Medium-Range Weather Forecasts (ECMWF) [3]. By building in physical laws, NWP approaches have achieved remarkable success in accuracy, stability, and interpretability. IFS-HRES [10] is a world-leading high-resolution deterministic NWP system (0.1°) and serves as a benchmark for operational forecasting and research. However, NWP models are sensitive to initial conditions, prone to parameterization errors, and computationally expensive [15]. These limitations hinder their ability to accurately resolve kilometer-scale weather driven by complex multiscale interactions.

Deep Learning for Global Weather Forecasting. Recent progress in deep learning for global weather forecasting has been transformative. These models predominantly follow two architectural paradigms: Transformer-based models [2, 4, 5, 22] and Graph Neural Network (GNN)-based architectures [14, 16, 25]. They are computationally efficient and competitive in predicting synoptic-scale weather patterns. However, they fail to capture finer-grained mesoscale dynamics because of their limited resolution (0.25° or coarser).

Deep Learning for Regional Weather Forecasting and Downscaling. Recently, regional weather models have been developed for fine-scale forecasting and downscaling over regions of interest.
For instance, CorrDi [ 20 ] combines U-Net and diusion models to correct and downscale the global forecasts to improve local pre- dictions. Machine learning limited area models [ 1 ] further take into account boundary conditions, b ecause the evolution of atmo- spheric states relies on b oth the internal dynamics and the exter- nal forcing. A sequence of GNN lay ers is applied to capture both large-scale circulations and local microscale pr ocesses. Other lim- ited area modeling methods also employ GNN ar chitectures with stretched-grid [ 11 ] and neste d-grid [ 21 ] to make low-r esolution global and high-resolution regional weather forecasts simultane- ously . They model cross-scale interactions through grid deforma- tion and nesting, with a static graph structur e; how ever , these rigid, geometry-driven interactions limit the model’s ability to eciently capture highly dynamic and non-local coupling processes. In con- trast, our model employs a bidirectional coupling module to learn the content-dependent cross-scale interactions between global and regional tokens adaptively . 3 Methodology Accurate regional weather forecasting requires seamless integration of large-scale atmospheric dynamics with localized, high-resolution features. As nearly all AI-based global models are trained on the EAR5 dataset, we assume a pretrained Vision Transformer (ViT)- based global weather forecasting model, denote d M global , which operates on low-resolution ( 0 . 25 ◦ ) global reanalysis data U 𝑡 0 ∈ R 𝐻 × 𝑊 × 𝐶 . At time 𝑡 0 , the model generates 6-hour-ahead global pre- dictions capturing synoptic-scale dynamics: ˆ U 𝑡 0 + 6H = M global ( U 𝑡 0 ) , (1) where 𝐻 × 𝑊 × 𝐶 represents the stacked weather state with multi- ple levels of upper air and surface variables, in which latitude and longitude are divided into 𝐻 and 𝑊 grids for each variable. 
Con- currently , high-resolution regional analysis data 𝒖 𝑡 0 ∈ R ℎ × 𝑤 × 𝑉 reg provides critical surface variables ( wind components 𝑈 , 𝑉 , temper- ature 𝑇 , specic humidity 𝑄 , pr essure 𝑃 , radiation uxes 𝑆 𝑆 𝑅𝐷 , and total cloud cover 𝑇 𝐶𝐶 ) within a region of interest at 1 hour temporal resolution and 0 . 05 ◦ spatial resolution. Problem Formulation. As the fundamental challenge lies in ef- fectively coupling multiscale information: coarse-grained global features from M global and ne-resolution regional features, we for- malize the task as dev eloping a hybrid global-regional weather fore- casting framework M global − regional that extends M global through the integration of large-scale atmospheric dynamics and small-scale weather eects. ˆ U 𝑡 0 + 6H ;  ˆ 𝒖 𝑡 0 + 𝑖 H  6 𝑖 = 1 = M global − regional  U 𝑡 0 ;  𝒖 𝑡 0 + 𝑖 H  0 𝑖 = − 5  , (2) Skillful Kilometer-Scale Regional W eather Forecasting via Global and Regional Coupling Conference’17, July 2017, W ashington, DC, USA where  ˆ 𝒖 𝑡 0 + 𝑖 H  0 𝑖 = − 5 denotes the temporally aligned regional analy- sis data with 1-hour intervals. This formulation establishes a prin- cipled framework for generating high-delity regional forecasts by systematically bridging global-scale dynamics with lo calized meteorological processes with deep learning architectures. Model Overview . W e propose a multiscale w eather forecasting framework that dynamically integrates global and regional-scale atmospheric dynamics to resolve high-resolution mesoscale fea- tures in the target region. As shown in Figure 1, the framework comprises two T ransformer-based sub-models with shared archi- tectural principles: (1) a global model for synoptic-scale dynamics, and (2) a regional model for mesoscale processes. The ScaleMixer module enables bidirectional coupling between the global and re- gional models via adaptive key position identication and encod- ing to preserve cross-scale meteorological consistency . 
The global model, pretrained on ERA5 reanalysis data [12], remains fixed during regional optimization, while the regional model and the ScaleMixer module are trained from scratch.

3.1 Pretrained Global Weather Model
Our global model $\mathcal{M}_{\text{global}}$ prioritizes architectural simplicity, flexibility, and scalability, and implements a Vision Transformer (ViT) architecture [7]. Without loss of generality, our framework can work with any ViT-based global weather forecasting model. In what follows, we restrict the discussion to our in-house ViT-based global model, which comprises three core components:
Patch Embedding and Tokenization: A 2-D convolutional layer partitions the multivariate input atmospheric state $\mathbf{U}_{t_0} \in \mathbb{R}^{H \times W \times C}$ into non-overlapping spatial patches of size $P \times P$. This generates token representations $\mathbf{S} \in \mathbb{R}^{N \times d}$, where $N = (H/P) \times (W/P)$ and $d$ is the embedding dimension.
Transformer Encoder: A stack of $M$ Transformer encoder layers processes the sequence $\mathbf{S}$ through multi-head self-attention and feed-forward networks [7, 29], enabling global information exchange across spatial scales.
Prediction Head: A deconvolution block upsamples the processed sequence back to the original spatial resolution $H \times W$, producing a 6-hour-ahead deterministic global forecast $\hat{\mathbf{U}}_{t_0+6\text{H}}$ of the full atmospheric state.
The global model $\mathcal{M}_{\text{global}}$ is pretrained on ERA5 reanalysis [12] using the weighted mean absolute error (MAE) as the loss function (detailed in Section 3.4). The dataset includes five pressure-level variables (13 vertical levels each): geopotential (z), specific humidity (q), wind components (u, v), and temperature (t), and multiple surface variables, e.g., 2-meter temperature (t2m), 10-meter wind (u10, v10), mean sea level pressure (msl), and surface pressure (sp) (detailed in Section B).
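The tokenization step can be illustrated with a short NumPy sketch. It uses the equivalent form of a 2-D convolution with kernel size = stride = $P$: flatten each non-overlapping patch, then apply one linear projection. The function name, toy grid sizes, and random weights are hypothetical; the 77-channel count is derived from the variable counts in Section 4.1 (5 × 13 pressure-level fields, 6 surface and 6 static fields).

```python
import numpy as np


def patch_embed(U: np.ndarray, P: int, W_proj: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Non-overlapping P x P patch embedding.

    Equivalent to a Conv2d with kernel_size = stride = P and C input channels:
    each patch is flattened to P*P*C values and linearly projected to d dims.
    U: (H, W, C) field, W_proj: (P*P*C, d), b: (d,). Returns S: (N, d) with
    N = (H/P) * (W/P) tokens in row-major (latitude, longitude) order.
    """
    H, W, C = U.shape
    if H % P or W % P:
        raise ValueError("H and W must be divisible by the patch size P")
    # (H/P, P, W/P, P, C) -> (H/P, W/P, P, P, C) -> (N, P*P*C)
    patches = U.reshape(H // P, P, W // P, P, C).transpose(0, 2, 1, 3, 4)
    return patches.reshape(-1, P * P * C) @ W_proj + b


rng = np.random.default_rng(0)
C, P, d = 5 * 13 + 6 + 6, 4, 32            # 77 input channels
U = rng.normal(size=(8, 16, C))            # toy H x W instead of 720 x 1440
W_proj = rng.normal(size=(P * P * C, d)) / np.sqrt(P * P * C)
b = np.zeros(d)
S = patch_embed(U, P, W_proj, b)
print(S.shape)  # (8, 32): N = (8/4) * (16/4) = 8 tokens

# A field 5x finer covering the same area, tokenized with p = 5P (Section 3.2),
# yields tokens with the same geographic footprint as the global tokens.
u = rng.normal(size=(40, 80, 7))           # one time step, 7 regional variables
p = 5 * P
s = patch_embed(u, p, rng.normal(size=(p * p * 7, d)), b)
print(s.shape)  # (8, 32)
```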
3.2 Modications in Regional W eather Model The regional model M regional inherits the Transformer ar chitecture from the global model but introduces necessary mo dications: (1) modied patch embedding layer to incorporate ne-grained to- pography and temporal encodings, (2) enhanced prediction head with adaptive layer normalization (A daLN) [ 24 ] to amplify the high-frequency signal for hourly temporal alignment, and (3) fewer Transformer encoder layers ( 𝑘 ≪ 𝑀 ) to reduce computational overhead while preserving regional meteorological delity . Patch Embedding: In addition to the input  𝒖 𝑡 0 + 𝑖 H  0 𝑖 = − 5 ∈ R ℎ × 𝑤 × 𝑉 reg × 6 , the block also nee ds to process the static topography , land-sea mask, and dynamic hourly temporal information. Regional analyses are tokenized acr oss 6 time steps using a shared patch embedding layer , with topography , land-sea masks, and temporal embeddings (hour-of-day , day-of-year) added via MLP. T o ensure ge- ographic consistency with global patches, we set patch size 𝑝 = 5 × 𝑃 , generating regional tokens s ∈ R 𝑛 × 𝑑 , where 𝑛 = ( ℎ / 𝑝 ) × ( 𝑤 / 𝑝 ) . Transformer Encoders: The regional model employs 𝑘 encoder layers ( 𝑘 ≪ 𝑀 , where 𝑀 = 𝑘 × 𝐿 ) to achieve computational e- ciency in regional optimization. Each cross-scale coupling block comprises 𝐿 global encoder layers, 1 regional encoder layer , and 1 ScaleMixer module. Prediction Head: T o generate 6-hour forecasts at hourly in- tervals, 6 dedicated prediction heads produce lead time-specic outputs ( Δ 𝑡 = 1H to 6H ). T emporal alignment is enforced via AdaLN[ 24 ], where scale and shift parameters 𝛾 , 𝛽 are derived from Fourier embeddings of Δ 𝑡 : FourierEmbed ( Δ 𝑡 ) = [ cos ( 2 𝜋 𝑎 𝑖 Δ 𝑡 + 𝑏 𝑖 ) , sin ( 2 𝜋 𝑎 𝑖 Δ 𝑡 + 𝑏 𝑖 ) ] (3) for 0 ≤ 𝑖 < 𝑑 / 2 , 𝛾 , 𝛽 = MLP ( FourierEmbed ( Δ 𝑡 ) ) , (4) where 𝑎 𝑖 and 𝑏 𝑖 are learnable Fourier embedding parameters. This formulation ensures high-frequency signal amplication for re- gional forecasting. 
Moreover, the regional prediction heads take the concatenation of regional tokens and spatially aligned global tokens as input to make full use of multiscale information.

3.3 ScaleMixer: Bidirectional Global and Regional Scale Coupling
Accurate high-resolution regional prediction requires resolving multiscale atmospheric processes, from synoptic-scale forcing to mesoscale circulations, while maintaining global dynamical consistency. To this end, we introduce ScaleMixer, a differentiable coupling mechanism that explicitly models interactions between the global foundation model and the regional refinement model. As illustrated in Figure 1 (right), ScaleMixer enables bidirectional feature fusion by adaptively identifying meteorologically critical regions and performing token-level encoding, effectively prioritizing areas with strong cross-scale interactions.
Adaptive key position identification. To capture spatial regions exhibiting strong multiscale interactions, we implement a dynamics-aware sampling module that identifies critical spatial positions from the global token embeddings $\mathbf{S}$. Spatial dynamics are extracted by a convolutional network, followed by softmax-normalized importance scores $\mathrm{Pr} \in \mathbb{R}^{N}$ ($N$ is the number of global tokens):

$$\mathrm{Pr} = \mathrm{Softmax}\big(\mathrm{Conv}(\mathbf{S})\big), \tag{5}$$

where $\mathrm{Conv}(\cdot)$ consists of a convolutional layer followed by a linear projection.
[Figure 1 diagram. Recoverable labels: Patch Embedding (global and regional), Transformer Encoder stacks (annotated L and k), Coupling Block, ScaleMixer, Prediction Head (global and regional), ERA5 pretrained, Global Forecast, Regional Forecast, Orography, Time embedding, Adaptive Layer Normalization, Forecast lead time Δt; legend: Pretrained and frozen, Train from scratch, Target region in global field. ScaleMixer panel: global patch embeddings S, regional patch embeddings s, Key position Identification, Position embeddings, Global-to-position Attention, Position-to-regional Attention (Q, K, V), Concat & Projection.]
Figure 1: Left: Architecture of the global-regional weather forecasting model. Synoptic-scale context ($\mathcal{M}_{\text{global}}$) drives mesoscale regional refinement ($\mathcal{M}_{\text{regional}}$) via ScaleMixer, ensuring cross-scale coupling and consistency. Right: ScaleMixer module: bidirectional cross-scale coupling via key position identification and encoding. Key components include (1) key position identification, (2) coupling regional dynamics with global context via global-to-position and position-to-regional attention, and (3) global token adaptation incorporating regional features.

We then select the top-$m$ salient positions:

$$\mathbf{c} = \operatorname{arg\,top}\text{-}m\,(\mathrm{Pr}), \qquad \mathbf{h} = \mathrm{Pr}[\mathbf{c}] \odot \mathbf{S}[\mathbf{c}], \tag{6}$$
with $\odot$ denoting the element-wise product, $\mathbf{c} = \{\mathbf{c}_i\}_{i=1}^{m} \in \mathbb{R}^{m \times 2}$ ($\mathbf{c}_i \in [0 : H/P-1] \times [0 : W/P-1]$) the coordinates of the $m$ selected tokens, and $\mathbf{h} \in \mathbb{R}^{m \times d}$ their corresponding embeddings.
Regional features alignment with global context. To effectively bridge the scale gap between global context and regional features, we design a two-stage cross-attention mechanism operating on the identified key positions. Directly correlating all global and regional tokens is computationally expensive and may weaken localized meteorological features. Instead, we first condense global information into a sparse set of dynamically identified key positions, then propagate these enriched features to the regional tokens.
Global-to-Position Attention first aggregates global context into the key positions. Using the concatenated token embeddings and coordinates of the key positions, $\mathbf{h} \| \mathbf{c} \in \mathbb{R}^{m \times (d+2)}$, as queries, and the global tokens $\mathbf{S}$ as keys and values, we compute:

$$\text{Glo-to-Pos}(\mathbf{h}\|\mathbf{c}, \mathbf{S}, \mathbf{S}) = \mathrm{Softmax}\!\left(\frac{(\mathbf{W}_Q (\mathbf{h}\|\mathbf{c}))(\mathbf{W}_K \mathbf{S})^{\top}}{\sqrt{d}}\right)\mathbf{W}_V \mathbf{S}, \tag{7}$$
$$\mathbf{h}_{\text{global}} \| \mathbf{c}' = \mathbf{h}\|\mathbf{c} + \text{Glo-to-Pos}(\mathbf{h}\|\mathbf{c}, \mathbf{S}, \mathbf{S}), \tag{8}$$

where $\mathbf{W}_Q$, $\mathbf{W}_K$, and $\mathbf{W}_V$ are linear projections. To better model the dynamics of key positions, the key representations are further refined by incorporating regional features via bilinear interpolation at the updated coordinates $\mathbf{c}'$:

$$\mathbf{h}' = \mathrm{MLP}_{\text{Proj}}\big(\mathrm{Bilinear}(\boldsymbol{s}, \mathbf{c}') \,\|\, \mathbf{h}_{\text{global}}\big). \tag{9}$$

Position-to-Regional Attention subsequently integrates the globally informed key features into the regional tokens $\boldsymbol{s}$:

$$\text{Pos-to-Reg}(\boldsymbol{s}, \mathbf{h}'\|\mathbf{c}', \mathbf{h}'\|\mathbf{c}') = \mathrm{Softmax}\!\left(\frac{(\mathbf{W}'_Q \boldsymbol{s})(\mathbf{W}'_K (\mathbf{h}'\|\mathbf{c}'))^{\top}}{\sqrt{d}}\right)\mathbf{W}'_V (\mathbf{h}'\|\mathbf{c}'), \tag{10}$$
$$\boldsymbol{s}' = \boldsymbol{s} + \text{Pos-to-Reg}(\boldsymbol{s}, \mathbf{h}'\|\mathbf{c}', \mathbf{h}'\|\mathbf{c}'), \tag{11}$$

with distinct learnable projections $\mathbf{W}'_Q$, $\mathbf{W}'_K$, and $\mathbf{W}'_V$. This two-stage attention mechanism integrates synoptic-scale dynamics into high-resolution regional features, supporting globally consistent and locally accurate weather prediction.
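For intuition, here is a single-head NumPy sketch of Eqs. (5) to (11). It is not the authors' implementation and rests on several assumptions: a linear score replaces $\mathrm{Conv}(\cdot)$ in Eq. (5), $\mathrm{MLP}_{\text{Proj}}$ in Eq. (9) is a single linear map, there is no batching or head split, and the updated coordinates $\mathbf{c}'$ are treated as already expressed in regional-token-grid units (the text does not specify this mapping). The weight names in `p` are hypothetical.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attention(q, k, v, d):
    """Single-head scaled dot-product attention with the 1/sqrt(d) factor of Eqs. (7), (10)."""
    return softmax(q @ k.T / np.sqrt(d)) @ v


def bilinear(grid: np.ndarray, coords: np.ndarray) -> np.ndarray:
    """Sample grid (Hr, Wr, d) at fractional (row, col) coords (m, 2), clamped to the grid."""
    Hr, Wr, _ = grid.shape
    r = np.clip(coords[:, 0], 0, Hr - 1)
    c = np.clip(coords[:, 1], 0, Wr - 1)
    r0, c0 = np.floor(r).astype(int), np.floor(c).astype(int)
    r1, c1 = np.minimum(r0 + 1, Hr - 1), np.minimum(c0 + 1, Wr - 1)
    fr, fc = (r - r0)[:, None], (c - c0)[:, None]
    top = grid[r0, c0] * (1 - fc) + grid[r0, c1] * fc
    bottom = grid[r1, c0] * (1 - fc) + grid[r1, c1] * fc
    return top * (1 - fr) + bottom * fr


def scale_mixer_coupling(S, grid_w, s_grid, m, p):
    """One ScaleMixer call. S: (N, d) global tokens on a grid with grid_w columns;
    s_grid: (Hr, Wr, d) regional tokens; p: dict of (hypothetical) weights."""
    d = S.shape[1]
    Pr = softmax(S @ p["w_score"])                               # Eq. (5), linear score
    idx = np.argsort(-Pr)[:m]                                    # Eq. (6): top-m positions
    c = np.stack(np.divmod(idx, grid_w), axis=1).astype(float)   # (row, col) coordinates
    h = Pr[idx][:, None] * S[idx]                                # Pr[c] ⊙ S[c]
    hc = np.concatenate([h, c], axis=1)                          # h || c, (m, d+2)
    hc = hc + attention(hc @ p["Wq"], S @ p["Wk"], S @ p["Wv"], d)   # Eqs. (7)-(8)
    h_global, c_new = hc[:, :d], hc[:, d:]
    h_new = np.concatenate([bilinear(s_grid, c_new), h_global], axis=1) @ p["W_proj"]  # Eq. (9)
    s = s_grid.reshape(-1, d)
    kv = np.concatenate([h_new, c_new], axis=1)                  # h' || c', (m, d+2)
    s_out = s + attention(s @ p["Wq2"], kv @ p["Wk2"], kv @ p["Wv2"], d)  # Eqs. (10)-(11)
    return s_out.reshape(s_grid.shape), idx


rng = np.random.default_rng(0)
d, dk, m = 16, 16, 4
gh, gw = 4, 8
S = rng.normal(size=(gh * gw, d))
s_grid = rng.normal(size=(gh, gw, d))     # assume region and token grid coincide
shapes = {"w_score": (d,), "Wq": (d + 2, dk), "Wk": (d, dk), "Wv": (d, d + 2),
          "W_proj": (2 * d, d), "Wq2": (d, dk), "Wk2": (d + 2, dk), "Wv2": (d + 2, d)}
params = {name: rng.normal(size=shape) * 0.1 for name, shape in shapes.items()}
s_new, idx = scale_mixer_coupling(S, gw, s_grid, m, params)
print(s_new.shape, idx.shape)  # (4, 8, 16) (4,)
```

Because both attention stages are residual, setting $\mathbf{W}'_V$ to zero returns the regional tokens unchanged, which is a convenient sanity check for an implementation.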
Global token adaptation with regional feedback. To let large-scale dynamics adapt to regional details, the global tokens spatially aligned with regional tokens ($\mathbf{S}_{\text{aligned}}$) are updated via token-wise concatenation and an adapter MLP:

$$\mathbf{S}'_{\text{aligned}} = \mathrm{Concat}\big(\mathbf{S}_{\text{aligned}}, \boldsymbol{s}'\big) \in \mathbb{R}^{n \times 2d}, \tag{12}$$
$$\mathbf{S}''_{\text{aligned}} = \mathbf{S}_{\text{aligned}} + \mathrm{MLP}_{\text{Adapter}}\big(\mathbf{S}'_{\text{aligned}}\big). \tag{13}$$

These adapted tokens $\mathbf{S}''_{\text{aligned}}$ replace their counterparts in the global token sequence, allowing fine-scale regional information to recursively influence the global context in subsequent encoder layers.

3.4 Model Optimization
The optimization follows a three-stage training protocol: (1) global model pretraining, (2) regional model one-step training (6 hours ahead), and (3) regional model autoregressive roll-out fine-tuning (12 to 48 hours ahead).
Training objective. For global model pretraining, we employ the weighted mean absolute error (MAE) across multivariate atmospheric states. Decomposing the weather state $\mathbf{U}_t$ into surface variables and upper-air atmospheric variables, $\hat{\mathbf{U}}_t = (\hat{\mathbf{S}}_t, \hat{\mathbf{A}}_t)$ and $\mathbf{U}_t = (\mathbf{S}_t, \mathbf{A}_t)$ (here $\mathbf{S}$ denotes surface fields rather than global tokens), the loss can be written as:

$$\mathcal{L}(\hat{\mathbf{U}}_t, \mathbf{U}_t) = \frac{1}{V_S + V_A}\left[\sum_{k=1}^{V_S} \frac{w^S_k}{H \times W} \sum_{i=1}^{H}\sum_{j=1}^{W} \big|\hat{S}_t^{i,j,k} - S_t^{i,j,k}\big| + \sum_{k=1}^{V_A} \frac{1}{H \times W \times N_{\text{lev}}} \sum_{p=1}^{N_{\text{lev}}} w^A_{p,k} \sum_{i=1}^{H}\sum_{j=1}^{W} \big|\hat{A}_t^{i,j,p,k} - A_t^{i,j,p,k}\big|\right], \tag{14}$$

where $V_S$ and $V_A$ are the numbers of surface and upper-air variables, $N_{\text{lev}}$ is the number of pressure levels, $w^S_k$ is the weight associated with surface variable $k$, and $w^A_{p,k}$ is the weight associated with atmospheric variable $k$ at pressure level $p$.
During both one-step training and roll-out fine-tuning of the regional model, we directly use the MAE as the objective:

$$\mathcal{L}(\hat{\boldsymbol{u}}_t, \boldsymbol{u}_t) = \frac{1}{V_{\text{reg}}}\sum_{k=1}^{V_{\text{reg}}} \frac{1}{h \times w}\sum_{i=1}^{h}\sum_{j=1}^{w}\big|\hat{u}_t^{i,j,k} - u_t^{i,j,k}\big|, \tag{15}$$

where $V_{\text{reg}}$ is the number of variables in the regional analyses.
Roll-out Fine-tuning. The model forecasts the next 6 hours in one step, and longer forecasts are obtained by rolling out autoregressively, which can suffer from error accumulation. To improve multi-step forecasting accuracy, we adopt a roll-out fine-tuning strategy over 48 hours. For $n = 0, 1, \dots$, we recursively predict $\hat{\mathbf{U}}_{t_0+6(n+1)+6\text{H}}$ and $\{\hat{\boldsymbol{u}}_{t_0+6(n+1)+i\text{H}}\}_{i=1}^{6}$ using the previous step's predictions $\hat{\mathbf{U}}_{t_0+6n+6\text{H}}$ and $\{\hat{\boldsymbol{u}}_{t_0+6n+i\text{H}}\}_{i=1}^{6}$ as input, and optimize the losses in Eqs. (14) and (15) over the 48-hour span.
Implementation and Training Details. The global Transformer encoder comprises 24 layers ($M = 24$), while the regional encoder and the ScaleMixer modules each contain 4 layers ($k = 4$), so each coupling block contains $L = M/k = 6$ global layers. The model uses a hidden dimension of 1536 and identifies $m = 64$ key positions for cross-scale interaction in each ScaleMixer module. The framework contains 1.07 billion parameters, of which the global model $\mathcal{M}_{\text{global}}$ accounts for 736 million. Full implementation details are summarized in Section A. The global model was pretrained for 150,000 steps on 32 NVIDIA A800 GPUs using the AdamW optimizer [19] with a per-GPU batch size of 1. A cosine learning rate schedule with linear warmup over 1,000 steps was applied, decaying from $7 \times 10^{-4}$ to $1 \times 10^{-7}$. Regional model training used identical hyperparameters over 80,000 iterations on 8 A800 GPUs, with the $\mathcal{M}_{\text{global}}$ parameters frozen. During regional roll-out fine-tuning, the model was trained for 100,000 steps at a fixed learning rate of $1 \times 10^{-6}$. The one-time training takes approximately 20 days on 8 NVIDIA A800 GPUs, and inference is efficient, taking less than 3 minutes for a 48-hour forecast on a single GPU.
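The roll-out bookkeeping can be sketched as follows, assuming a generic `step` callable that stands in for $\mathcal{M}_{\text{global-regional}}$. The toy dynamics in `toy_step` and all function names are hypothetical; the sketch only shows how eight 6-hour steps yield the 48 hourly regional forecasts over which the regional MAE of Eq. (15) is averaged. In training, gradients would flow through the whole chain, which this NumPy sketch does not model.

```python
import numpy as np


def regional_mae(u_hat: np.ndarray, u: np.ndarray) -> float:
    """Eq. (15) for fields shaped (h, w, V_reg): per-variable spatial mean of |error|,
    then averaged over variables."""
    return float(np.abs(u_hat - u).mean(axis=(0, 1)).mean())


def rollout(step, U0, u_window, horizon_h: int = 48):
    """Autoregressive roll-out in 6-hour steps.

    step(U, window) -> (U_next, next_window): U is the global state at t, window
    holds the 6 hourly regional fields ending at t; the outputs are the state at
    t+6H and the 6 hourly regional forecasts t+1H..t+6H, which become the next
    step's input window.
    """
    if horizon_h % 6:
        raise ValueError("horizon must be a multiple of 6 hours")
    U, window, hourly = U0, list(u_window), []
    for _ in range(horizon_h // 6):
        U, window = step(U, window)
        hourly.extend(window)
    return U, hourly


def rollout_loss(hourly_pred, hourly_true) -> float:
    """Average Eq. (15) over all hourly lead times (48 for a 48-hour roll-out)."""
    return float(np.mean([regional_mae(p, t) for p, t in zip(hourly_pred, hourly_true)]))


def toy_step(U, window):
    """Hypothetical dynamics: global state +1 per step, regional fields +1 per hour."""
    last = window[-1]
    return U + 1.0, [last + (i + 1) for i in range(6)]


U0 = np.zeros((2, 3, 4))
history = [np.full((5, 5, 7), float(i)) for i in range(-5, 1)]  # hours t0-5 .. t0
U48, hourly = rollout(toy_step, U0, history)
print(len(hourly), hourly[-1][0, 0, 0])  # 48 48.0
```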
4 Experiments
To resolve high-impact meteorological phenomena such as convective storms and boundary-layer dynamics, weather prediction systems require high-resolution spatio-temporal modeling capabilities. We evaluate ScaleMixer through two complementary experimental paradigms: (1) hindcasts for verification using reanalysis data, and (2) operational forecasts to assess predictive skill under dynamically evolving initial conditions, consistent with a production environment.

[Figure 2 panels: (a) Latitude-weighted RMSE for 7 surface variables (2024/10–2024/12 hindcast period); (b) Latitude-weighted RMSE for 7 surface variables (2025/01–2025/04 operational period).]
Figure 2: ScaleMixer demonstrates superior deterministic forecasting skill compared to IFS-HRES at 0.05° resolution. Seven surface variables (T2M, U10, V10, Q, P, TCC, and SSRD) are evaluated using latitude-weighted RMSE (lower is better). (a) Hindcast results show that ScaleMixer outperforms IFS-HRES across all variables during 2024/10–2024/12. (b) Operational forecasts confirm that ScaleMixer maintains superior performance (2025/01–2025/04).

4.1 Datasets
Global Reanalysis (ERA5). The European Centre for Medium-Range Weather Forecasts (ECMWF) ERA5 reanalysis provides atmospheric states at 0.25° horizontal resolution (a 1440 × 720 longitude-latitude grid) on 37 pressure levels. We use data from 1979–2015 as the primary training source for the global model ($\mathcal{M}_{\text{global}}$), with 2016 reserved for validation. ERA5's spatiotemporal continuity and multivariate fidelity make it a standard for data-driven weather modeling [12].
Similar to most AI-based weather forecasting foundation models, we select 5 atmospheric variables (u-component of wind, v-component of wind, temperature, specific humidity, and geopotential) at 13 pressure levels (50, 100, 150, 200, 250, 300, 400, 500, 600, 700, 850, 925, and 1000 hPa), 6 surface variables, and 3 static variables from the raw ERA5 dataset, and apply z-score normalization to each variable. All variables are inputs to M_global, and all variables except the static ones are used as model outputs.

Global Operational Analysis. The operational analysis uses the initial conditions of ECMWF's High-Resolution Deterministic Prediction (HRES) system, which assimilates observations through 4D-variational data assimilation [27]. The 0.1° analysis fields (interpolated to the ERA5 resolution of 0.25°) provide real-time initial conditions for ScaleMixer's operational deployment during 2025/01–2025/04.

Conference'17, July 2017, Washington, DC, USA Weiqi Chen, Wenwei Wang, Qilong Yuan, Lefei Shen, Bingqing Peng, Jiawei Chen, Bo Wu, and Liang Sun

Table 1: Average latitude-weighted RMSE and ACC over lead times Δt = 1–48 hours for regional weather hindcasts at 0.05° resolution. ScaleMixer achieves the best result for every variable and metric.

Latitude-weighted RMSE (lower is better)
Variable | IFS-HRES | Baguan | M_global | M_regional | OneForecast | LAM | ScaleMixer | ΔRMSE
T2M | 1.815 | 1.928 | 1.991 | 2.452 | 1.571 | 1.742 | 1.382 | ↓ 23.86%
U10 | 1.934 | 1.956 | 1.967 | 2.631 | 2.251 | 1.889 | 1.644 | ↓ 14.99%
V10 | 1.928 | 1.937 | 1.970 | 2.886 | 2.316 | 1.905 | 1.617 | ↓ 16.13%
Q | 0.807 | 0.611 | 0.811 | 1.243 | 0.714 | 0.676 | 0.559 | ↓ 30.73%
P | 13.27 | 22.97 | 10.27 | 4.241 | 2.545 | 2.14 | 1.874 | ↓ 85.88%
TCC | 38.83 | 31.83 | N/A | 43.71 | 37.74 | 34.13 | 28.76 | ↓ 25.93%
SSRD | 51.25 | 41.18 | N/A | 68.42 | 52.56 | 42.41 | 33.26 | ↓ 34.04%

Latitude-weighted ACC (higher is better)
Variable | IFS-HRES | M_global | M_regional | OneForecast | LAM | ScaleMixer | ΔACC
T2M | 0.862 | 0.881 | 0.845 | 0.897 | 0.901 | 0.921 | ↑ 6.84%
U10 | 0.721 | 0.753 | 0.716 | 0.744 | 0.742 | 0.793 | ↑ 9.99%
V10 | 0.723 | 0.751 | 0.713 | 0.732 | 0.737 | 0.785 | ↑ 8.57%
Q | 0.774 | 0.768 | 0.771 | 0.751 | 0.781 | 0.812 | ↑ 4.91%
P | 0.887 | 0.902 | 0.894 | 0.899 | 0.912 | 0.920 | ↑ 3.72%
TCC | 0.563 | N/A | 0.617 | 0.621 | 0.679 | 0.721 | ↑ 28.06%
SSRD | 0.824 | N/A | 0.834 | 0.838 | 0.850 | 0.887 | ↑ 7.64%

(1) ΔRMSE and ΔACC denote the relative RMSE and ACC improvement of ScaleMixer over IFS-HRES.
(2) M_global denotes the standalone global model, and M_regional denotes the uncoupled regional model.
(3) Results of IFS-HRES (0.1°), Baguan, and M_global (0.25°) are bias-corrected and downscaled to the target grid (0.05°) for comparison, using a pretrained bias-correction and downscaling model (based on a ViT backbone trained on ERA5 and CLDAS data).
(4) T2M: 2m temperature; U10/V10: 10m wind components; Q: specific humidity; P: surface pressure; TCC: total cloud cover; SSRD: radiation flux (surface solar radiation downwards).

[Figure 3 panels: CLDAS, ScaleMixer, and IFS-HRES at 2024/10/30 22 UTC and 2024/10/31 08 UTC; 10m wind speed (m/s) and 2m temperature (°C); annotations: wind, ridge, leeward slope, windward slope.]
Figure 3: Left: Temporal evolution of 10m wind speed predictions initialized at 2024/10/30 12 UTC over the Hengduan Mountains (25.0–35.0°N, 95.0–105.0°E), China. Black arrows represent wind flow fields. ScaleMixer resolves orographic wind heterogeneity at higher resolution (peaking above 10 m/s at crests and below 2 m/s in valleys). Right: Corresponding temperature fields. The panels illustrate Foehn effects, characterized by 4–8°C leeward warming relative to windward slopes through adiabatic compression. ScaleMixer captures fine-grained temperature gradients, whereas IFS-HRES produces spatially smoothed forecasts.

Regional Analysis (CLDAS). The China Meteorological Administration's Land Data Assimilation System (CLDAS) is a near-real-time regional reanalysis product. It contains 7 critical surface variables at 1-hour intervals, covering East Asia (0–65°N, 60–160°E) at a spatial resolution of 0.01°. To reduce computational complexity while retaining detailed information, we interpolate the data to a 0.05° latitude-longitude grid. The full dataset covers 2020 onward. We use data from 2022/01–2024/09 to train the global-regional model (M_global-regional), with two independent evaluation periods: hindcast evaluation (ERA5 input), 2024/10–2024/12, and operational evaluation (operational analysis input), 2025/01–2025/04. All raw variables are also z-score normalized before being fed into the model. More details on the datasets and experimental settings can be found in Section B.

4.2 Evaluation Metrics and Baselines

Evaluation metrics. To measure the performance of regional weather forecasting, we evaluate all methods using the latitude-weighted root mean squared error (RMSE) and the latitude-weighted anomaly correlation coefficient (ACC). More details on the metrics can be found in Section C.

Baselines.
We comprehensively evaluate ScaleMixer against several strong baselines: (1) our internal global model (M_global); (2) a standalone regional model initialized from CLDAS data without global coupling (M_regional); (3) Baguan [22], an AI-based global forecast model that demonstrates superior performance among a set of state-of-the-art data-driven weather models on WeatherBench¹ and provides more comprehensive surface meteorological variables than other global models such as Pangu-Weather [2] and GraphCast [17] (detailed comparisons can be found in Section F); (4) IFS-HRES, the operational high-resolution NWP system from ECMWF [9], which serves as a gold-standard reference. The resolutions of the AI-based global forecasts and IFS-HRES are 0.25° and 0.1°, respectively, and their forecasts may be systematically biased relative to CLDAS. For a fair comparison, we employ a downscaling and correction model to map the original forecasts to the target 0.05° grid. This model is trained on ERA5 and CLDAS, using a Swin Transformer [18] backbone with 4 layer blocks and a hidden dimension of 192; (5) OneForecast [11], which introduces a Neural Nested Grid method that passes boundary feature maps between grids of different resolutions via direct interpolation and concatenation; and (6) a Limited Area Model (LAM), which is built upon M_regional (the standalone regional model) but is enhanced to take both regional initial conditions and external global forecasts as input.

4.3 Skillful Regional Weather Forecasting at 0.05° Resolution

We focus on short-term forecasting for the next 48 hours, primarily because such outlooks have a more immediate impact on societal functions and daily routines.
Furthermore, this is the period in which NWP models perform best. The deterministic forecasting results of ScaleMixer and the baselines are summarized in Table 1 and Figure 2, which evaluate forecast skill over lead times Δt = 1–48 hours.

Hindcast evaluation. For the ERA5-driven hindcast period (2024/10–2024/12), ScaleMixer achieves significant improvements on all seven surface variables (T2M, U10, V10, Q, P, TCC, and SSRD) over both the standalone global/regional baselines and IFS-HRES (Table 1), verifying the effectiveness of coupling global and regional scales. ScaleMixer achieves 40.86% lower latitude-weighted RMSE and 9.96% higher ACC than IFS-HRES, indicating an enhanced capability to resolve mesoscale convective systems and boundary-layer dynamics. As shown in Figure 2a, these advantages persist consistently across forecast horizons.

Operational forecast evaluation. Under dynamically evolving operational initial conditions (2025/01–2025/04), ScaleMixer maintains superior skill despite uncertainties in the real-time analysis fields (Figure 2b). Compared to IFS-HRES at 0.1° resolution, statistically significant RMSE improvements are sustained through 48-hour lead times under operational constraints, with pronounced improvements at 1–24-hour lead times, where regional-scale processes dominate.

Evaluation against station observations. Although we use downscaling models to map IFS-HRES to CLDAS, some fairness concerns may remain, since ScaleMixer is trained end-to-end on the CLDAS ground truth. We therefore make extensive comparisons against station observations, which provide a fairer benchmark. The real-time station observation dataset is a product provided by the China Meteorological Administration. It contains more than 2000 stations distributed across China, some of which are located in complex terrain such as plateaus and mountainous regions, as shown in Figure 4. These stations have undergone quality control and provide hourly observations of several meteorological variables. We focus on key variables that are important for downstream use and are provided by our model: 2m temperature (T2M), 2m dewpoint temperature (D2M), and 10m wind speed (WS10). In our implementation, we interpolate the gridded predictions of ScaleMixer and IFS-HRES to the stations according to their latitudes and longitudes. Table 2 shows the RMSE of IFS-HRES, Baguan, and ScaleMixer. Compared to IFS-HRES and Baguan, ScaleMixer achieves an average improvement of 27.63% across the three meteorological variables.

¹ https://sites.research.google/gr/weatherbench/scorecards-2020/

Figure 4: Station distribution map. The station observation dataset contains 2216 weather stations across China, which record hourly observations of meteorological variables such as temperature, air pressure, and wind speed.

Table 2: Average RMSE over lead times Δt = 1–48 hours for regional weather operational forecasts, evaluated against weather station observations.
Variable | IFS-HRES | Baguan | ScaleMixer | ΔRMSE
T2M | 3.158 | 3.108 | 1.822 | ↓ 41.4%
D2M | 2.812 | 2.655 | 1.789 | ↓ 36.4%
WS10 | 1.401 | 1.402 | 1.329 | ↓ 5.10%

4.4 Case Studies

Orographic-induced wind and temperature. As exemplified in Figure 3 (left) for wind prediction over the complex terrain of the Hengduan Mountains (25.0–35.0°N, 95.0–105.0°E), China, ScaleMixer (0.05°) resolves wind characteristics across topographic gradients: maximum wind speeds at mountain crests (exceeding 10 m/s) and deceleration within valleys (below 2 m/s). This contrasts with IFS-HRES (0.1°), which systematically underestimates orographic wind characteristics due to insufficient subgrid-scale orographic parametrization.

Table 3: Ablation study of ScaleMixer components on 48-hour forecast performance.
Model variant | Configuration | T2M RMSE | U10 RMSE
ScaleMixer | Adaptive Sampling + Bidirectional | 1.382 | 1.644
Variant A | Random Sampling + Bidirectional | 1.605 (+16.1%) | 1.882 (+14.5%)
Variant B | Fixed Uniform Grid + Bidirectional | 1.512 (+9.4%) | 1.795 (+9.2%)
Variant C | Adaptive Sampling + Unidirectional | 1.468 (+6.2%) | 1.721 (+4.7%)
Variant D | No Interaction (Standalone) | 1.991 (+44.1%) | 1.967 (+19.6%)
(Percentages in parentheses give the RMSE increase relative to the full ScaleMixer.)

Moreover, the same orographic forcing that generates wind heterogeneity also drives temperature variations. ScaleMixer resolves pronounced temperature contrasts across elevation gradients (Figure 3, right), with leeward slopes exhibiting 4–8°C warming relative to windward sides, a canonical Foehn effect signature² arising from adiabatic compression of descending air masses. In contrast, IFS-HRES underestimates these temperature gradients, failing to capture the dependence of temperature variation on terrain steepness. The higher resolution and data-driven design of ScaleMixer enable a superior representation of fine-grained weather features in complex terrain. Additional visualizations of forecasts are provided in Section E, demonstrating the framework's capability to capture high-resolution meteorological details.

4.5 Ablation Studies

To rigorously validate the architectural design of ScaleMixer, we conducted fine-grained ablation studies along two core dimensions: the sampling strategy for key-position identification and the directionality of cross-scale coupling. Furthermore, we analyzed the sensitivity of the model to critical hyperparameters. All ablation experiments were conducted on the validation set with a forecast lead time of Δt = 24 hours.

Effectiveness of ScaleMixer Components.
We compared our proposed framework against four variants: (A) Random Sampling, which replaces adaptive identification with random selection; (B) Fixed Uniform Grid, which uses a static grid for interaction; (C) Unidirectional Coupling, which allows only global-to-regional information flow; and (D) No Interaction, equivalent to the standalone regional model. The results are summarized in Table 3.

The results demonstrate the following. (1) Effectiveness of adaptive sampling: the proposed adaptive key-position identification significantly outperforms the Fixed Uniform Grid (Variant B), whose T2M RMSE is 9.4% higher. This indicates that dynamically focusing computation on meteorologically active regions (e.g., high-gradient boundaries) is far more efficient than uniform processing, which may waste capacity on static areas. (2) Effectiveness of bidirectional coupling: unidirectional coupling (Variant C) increases T2M RMSE by 6.2% relative to our bidirectional mechanism. This confirms that allowing high-resolution regional features to explicitly refine global tokens creates a necessary closed-loop feedback, enhancing the consistency of the synoptic-scale context.

² https://en.wikipedia.org/wiki/Foehn_wind

Hyperparameter Sensitivity. We further investigated the sensitivity to the regional encoder depth (k). As shown in Table 4, increasing the number of layers from k = 2 to k = 4 yields significant gains, while k = 8 offers diminishing returns at nearly double the inference time. Based on these results, we adopted k = 4 as the default configuration to balance forecasting accuracy and inference efficiency.

Table 4: Sensitivity analysis of Regional Encoder Layers (k).
Metric | k = 2 | k = 4 | k = 8
T2M RMSE | 1.485 | 1.382 | 1.379
U10 RMSE | 1.752 | 1.644 | 1.641
Inference time | 22 ms | 28 ms | 51 ms

5 Conclusion

In this paper, we present a multiscale deep learning framework for high-resolution regional weather forecasting that bridges synoptic-scale dynamics with localized mesoscale processes. By integrating a pretrained global foundation model with a novel bidirectional global-regional coupling module, ScaleMixer achieves state-of-the-art performance in resolving complex weather phenomena at 0.05° (~5 km) resolution. Experimental results establish ScaleMixer as a robust data-driven approach for regional weather forecasting. In the future, we will extend the framework to probabilistic forecasting and assimilate multi-modal observations (e.g., radar and satellite) for real-time forecasting.

Limitations and Ethical Considerations

Limitations. While ScaleMixer demonstrates significant advances in regional weather forecasting, it is a purely data-driven model and lacks explicit equation-based constraints (e.g., the Navier–Stokes equations or hydrostatic balance), which can cause unphysical artifacts in long rollouts, especially in extreme regimes. Future work will add physics-informed regularization to better enforce conservation laws.

Ethical Considerations. Both the global reanalysis and the global operational analysis data are publicly available. The CLDAS dataset and weather station observations were obtained from the China Meteorological Administration, and we have been granted permission to use them for academic research, which poses no potential ethical risks.

References

[1] Simon Adamov, Joel Oskarsson, Leif Denby, Tomas Landelius, Kasper Hintz, Simon Christiansen, Irene Schicker, Carlos Osuna, Fredrik Lindsten, Oliver Fuhrer, et al. 2025.
Building Machine Learning Limited Area Models: Kilometer-Scale Weather Forecasting in Realistic Settings. arXiv preprint arXiv:2504.09340 (2025).
[2] Kaifeng Bi, Lingxi Xie, Hengheng Zhang, Xin Chen, Xiaotao Gu, and Qi Tian. 2023. Accurate medium-range global weather forecasting with 3D neural networks. Nature 619, 7970 (2023), 533–538.
[3] Zied Ben Bouallègue, Mariana C. A. Clare, Linus Magnusson, Estibaliz Gascón, Michael Maier-Gerber, Martin Janoušek, Mark Rodwell, Florian Pinault, Jesper S. Dramsch, Simon T. K. Lang, Baudouin Raoult, Florence Rabier, Matthieu Chevallier, Irina Sandu, Peter Dueben, Matthew Chantry, and Florian Pappenberger. 2024. The Rise of Data-Driven Weather Forecasting: A First Statistical Assessment of Machine Learning–Based Weather Forecasts in an Operational-Like Context. Bulletin of the American Meteorological Society 105, 6 (2024), E864–E883. doi:10.1175/BAMS-D-23-0162.1
[4] Kang Chen, Tao Han, Junchao Gong, Lei Bai, Fenghua Ling, Jing-Jia Luo, Xi Chen, Leiming Ma, Tianning Zhang, Rui Su, et al. 2023. FengWu: Pushing the Skillful Global Medium-range Weather Forecast beyond 10 Days Lead. arXiv preprint arXiv:2304.02948 (2023).
[5] Lei Chen, Xiaohui Zhong, Feng Zhang, Yuan Cheng, Yinghui Xu, Yuan Qi, and Hao Li. 2023. FuXi: A cascade machine learning forecasting system for 15-day global weather forecast. npj Climate and Atmospheric Science 6, 1 (2023), 190.
[6] Jean Coiffier. 2011. Fundamentals of numerical weather prediction. Cambridge University Press, Cambridge; New York.
[7] Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, et al. 2020. An image is worth 16x16 words: Transformers for image recognition at scale. arXiv preprint arXiv:2010.11929 (2020).
[8] Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, and Neil Houlsby. 2021. An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale. In International Conference on Learning Representations.
[9] ECMWF. 2023. IFS Documentation CY48R1. ECMWF.
[10] European Centre for Medium-Range Weather Forecasts. 2023. Description of the Integrated Forecasting System (IFS) (cycle 48r1 ed.). https://www.ecmwf.int/en/publications/manuals/deterministic-model
[11] Yuan Gao, Hao Wu, Ruiqi Shu, Huanshuo Dong, Fan Xu, Rui Chen, Yibo Yan, Qingsong Wen, Xuming Hu, Kun Wang, et al. 2025. OneForecast: A Universal Framework for Global and Regional Weather Forecasting. arXiv preprint arXiv:2502.00338 (2025).
[12] Hans Hersbach, Bill Bell, Paul Berrisford, Shoji Hirahara, András Horányi, Joaquín Muñoz-Sabater, Julien Nicolas, Carole Peubey, Raluca Radu, Dinand Schepers, et al. 2020. The ERA5 global reanalysis. Quarterly Journal of the Royal Meteorological Society 146, 730 (2020), 1999–2049.
[13] James Hurrell, Marika Holland, Peter Gent, Steven Ghan, Jennifer Kay, Paul Kushner, J-F Lamarque, William Large, D Lawrence, Keith Lindsay, et al. 2013. The community earth system model: a framework for collaborative research. Bulletin of the American Meteorological Society 94, 9 (2013), 1339–1360.
[14] Ryan Keisler. 2022. Forecasting global weather with graph neural networks. arXiv preprint arXiv:2202.07575 (2022).
[15] Dmitrii Kochkov, Janni Yuval, Ian Langmore, Peter Norgaard, Jamie Smith, Griffin Mooers, Milan Klöwer, James Lottes, Stephan Rasp, Peter Düben, et al. 2024. Neural general circulation models for weather and climate. Nature 632, 8027 (2024), 1060–1066.
[16] Remi Lam, Alvaro Sanchez-Gonzalez, Matthew Willson, Peter Wirnsberger, Meire Fortunato, Ferran Alet, Suman Ravuri, Timo Ewalds, Zach Eaton-Rosen, Weihua Hu, et al. 2023. Learning skillful medium-range global weather forecasting. Science 382, 6677 (2023), 1416–1421.
[17] Remi Lam, Alvaro Sanchez-Gonzalez, Matthew Willson, Peter Wirnsberger, Meire Fortunato, Ferran Alet, Suman Ravuri, Timo Ewalds, Zach Eaton-Rosen, Weihua Hu, et al. 2023. Learning skillful medium-range global weather forecasting. Science 382, 6677 (2023), 1416–1421.
[18] Ze Liu, Yutong Lin, Yue Cao, Han Hu, Yixuan Wei, Zheng Zhang, Stephen Lin, and Baining Guo. 2021. Swin transformer: Hierarchical vision transformer using shifted windows. In Proceedings of the IEEE/CVF International Conference on Computer Vision. 10012–10022.
[19] Ilya Loshchilov and Frank Hutter. 2017. Decoupled weight decay regularization. arXiv preprint arXiv:1711.05101 (2017).
[20] Morteza Mardani, Noah Brenowitz, Yair Cohen, Jaideep Pathak, Chieh-Yu Chen, Cheng-Chin Liu, Arash Vahdat, Mohammad Amin Nabian, Tao Ge, Akshay Subramaniam, et al. 2025. Residual corrective diffusion modeling for km-scale atmospheric downscaling. Communications Earth & Environment 6, 1 (2025), 124.
[21] Thomas Nils Nipen, Håvard Homleid Haugen, Magnus Sikora Ingstad, Even Marius Nordhagen, Aram Farhad Shafiq Salihi, Paulina Tedesco, Ivar Ambjørn Seierstad, Jørn Kristiansen, Simon Lang, Mihai Alexe, et al. 2024. Regional data-driven weather modeling with a global stretched-grid. arXiv preprint (2024).
[22] Peisong Niu, Ziqing Ma, Tian Zhou, Weiqi Chen, Lefei Shen, Rong Jin, and Liang Sun. 2025. Utilizing strategic pre-training to reduce overfitting: Baguan - a pre-trained weather forecasting model. In Proceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining V.2. 2186–2197.
[23] Joel Oskarsson, Tomas Landelius, and Fredrik Lindsten. 2023.
Graph-based neural weather prediction for limited area modeling. arXiv preprint (2023).
[24] William Peebles and Saining Xie. 2023. Scalable diffusion models with transformers. In Proceedings of the IEEE/CVF International Conference on Computer Vision. 4195–4205.
[25] Ilan Price, Alvaro Sanchez-Gonzalez, Ferran Alet, Timo Ewalds, Andrew El-Kadi, Jacklynn Stott, Shakir Mohamed, Peter Battaglia, Remi Lam, and Matthew Willson. 2023. GenCast: Diffusion-based ensemble forecasting for medium-range weather. arXiv preprint arXiv:2312.15796 (2023).
[26] Haoyu Qin, Yungang Chen, Qianchuan Jiang, Pengchao Sun, Xiancai Ye, and Chao Lin. 2024. MetMamba: Regional weather forecasting with spatial-temporal Mamba model. arXiv preprint arXiv:2408.06400 (2024).
[27] Florence Rabier and Zhiquan Liu. 2003. Variational data assimilation: theory and overview. In Proc. ECMWF Seminar on Recent Developments in Data Assimilation for Atmosphere and Ocean, Reading, UK. 29–43.
[28] Stephan Rasp, Peter D Dueben, Sebastian Scher, Jonathan A Weyn, Soukayna Mouatadid, and Nils Thuerey. 2020. WeatherBench: a benchmark data set for data-driven weather forecasting. Journal of Advances in Modeling Earth Systems 12, 11 (2020), e2020MS002203.
[29] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. Advances in Neural Information Processing Systems 30 (2017).
[30] Ruolan Xiang, Christian R Steger, Shuping Li, Loïc Pellissier, Silje Lund Sørland, Sean D Willett, and Christoph Schär. 2024. Assessing the regional climate response to different Hengduan Mountains geometries with a high-resolution regional climate model. Journal of Geophysical Research: Atmospheres 129, 6 (2024), e2023JD040208.
[31] Pengbo Xu, Xiaogu Zheng, Tianyan Gao, Yu Wang, Junping Yin, Juan Zhang, Xuanze Zhang, San Luo, Zhonglei Wang, Zhimin Zhang, et al. [n. d.].
YingLong-weather: AI-Based Limited Area Models for Forecasting. ([n. d.]).

A Implementation Details

In our multiscale regional weather forecasting framework, the backbones of the global model (M_global) and the regional model (M_regional) are based on ViT [8]. The framework contains 1.07 billion parameters, of which the global model M_global accounts for 736 million. The hyperparameter configurations of the model are summarized in Table 5.

B Dataset Details and Experimental Settings

In our experiments, we use the preprocessed ERA5 data from WeatherBench [28]. ERA5 is a well-established dataset that is widely used as a benchmark in data-driven weather forecasting. WeatherBench processed the raw ERA5 dataset³, which includes 8 atmospheric variables across 13 pressure levels, 6 surface variables, and 3 static variables. We normalize all inputs via z-score normalization for each variable at each pressure level, and we apply the inverse normalization to the predicted future states for performance evaluation.

We collected and processed operational analysis data, used for the operational forecasts, from the initial conditions of ECMWF's High-Resolution Deterministic Prediction (HRES) system, which assimilates observations with 4D-variational data assimilation. The 0.1° analysis fields (interpolated to the ERA5 resolution of 0.25°) provide real-time initial conditions. We configure and process the atmospheric variables consistently with the ERA5 dataset.

We also collected and processed the China Meteorological Administration's Land Data Assimilation System (CLDAS) data, which offers 0.01° resolution meteorological fields over East Asia (0–65°N, 60–160°E). This regional analysis dataset is used to train and evaluate the regional weather forecasting model.
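The per-variable, per-pressure-level z-score normalization described in this appendix, together with its inverse, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: it assumes each (variable, pressure level) pair is stored as its own channel of a (time, channel, lat, lon) array and that the statistics are computed on the training period only; the paper does not say which period the statistics come from, and all names here are hypothetical.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class ZScoreStats:
    mean: np.ndarray  # shape (C,): one value per (variable, level) channel
    std: np.ndarray   # shape (C,)


def fit_stats(train: np.ndarray, eps: float = 1e-6) -> ZScoreStats:
    """Per-channel mean/std over time and space for data shaped (T, C, H, W)."""
    mean = train.mean(axis=(0, 2, 3))
    std = train.std(axis=(0, 2, 3))
    return ZScoreStats(mean=mean, std=np.maximum(std, eps))  # avoid dividing by zero


def normalize(x: np.ndarray, stats: ZScoreStats) -> np.ndarray:
    """Map raw fields to zero mean / unit variance, channel by channel."""
    return (x - stats.mean[None, :, None, None]) / stats.std[None, :, None, None]


def denormalize(z: np.ndarray, stats: ZScoreStats) -> np.ndarray:
    """Inverse transform, applied to model outputs before computing metrics."""
    return z * stats.std[None, :, None, None] + stats.mean[None, :, None, None]


if __name__ == "__main__":
    # Toy data: 8 time steps, 3 channels (e.g. temperature at 3 levels), 4 x 5 grid.
    train = np.random.default_rng(0).normal(280.0, 5.0, size=(8, 3, 4, 5))
    stats = fit_stats(train)
    z = normalize(train, stats)
    print(z.mean(axis=(0, 2, 3)), z.std(axis=(0, 2, 3)))  # ~0 and ~1 per channel
    assert np.allclose(denormalize(z, stats), train)
```

Clamping the standard deviation with a small `eps` guards against division by zero for any channel that is constant in the training data; because `denormalize` reuses the same clamped value, the round trip remains exact.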
The CLDAS dataset includes 7 critical surface variables: wind components (U, V), temperature (T), specific humidity (Q), pressure (P), radiation flux (SSRD), and total cloud cover (TCC). We normalize all inputs via z-score normalization for each variable, and we apply the inverse normalization to the predicted future states for performance evaluation.

B.1 ERA5 and operational analysis with 0.25° resolution

We selected 5 atmospheric variables at all 13 pressure levels, 6 surface variables, and 3 static variables for the ERA5 dataset with 0.25° resolution, as detailed in Table 6. In model training, we use all variables as inputs, and all variables except the three static variables as outputs for the loss calculation used to pretrain the global model M_global.

B.2 CLDAS with 0.05° resolution

We selected 7 surface variables for the CLDAS dataset with 0.05° resolution, as detailed in Table 7. In model training, we use all variables both as inputs and as outputs for the loss calculation used to train the global-regional model M_global-regional.

³ More details of ERA5 data can be found at https://confluence.ecmwf.int/display/CKB/ERA5%3A+data+documentation.

C Evaluation Metrics for Regional Weather Forecasting

This section provides detailed explanations of all the evaluation metrics for regional weather forecasting used in the main experiments. For each metric, $\boldsymbol{u}$ and $\hat{\boldsymbol{u}}$ represent the predicted and ground-truth values, respectively, both shaped as $h \times w \times V_{\mathrm{reg}}$, where $V_{\mathrm{reg}}$ is the total number of weather variables and $h \times w$ is the spatial resolution in latitude ($h$) and longitude ($w$). To account for non-uniform grid-cell areas, the latitude weighting term $\alpha(\cdot)$ is introduced.

Latitude-weighted Root Mean Square Error (RMSE) assesses model accuracy while accounting for the Earth's curvature.
The latitude weighting adjusts for the varying grid-cell areas at different latitudes, ensuring that errors are appropriately measured. Lower RMSE values indicate better model performance.

$$\mathrm{RMSE} = \frac{1}{V_{\mathrm{reg}}} \sum_{k=1}^{V_{\mathrm{reg}}} \sqrt{\frac{1}{hw} \sum_{i=1}^{h} \sum_{j=1}^{w} \alpha(i) \left( \hat{\boldsymbol{u}}_{i,j,k} - \boldsymbol{u}_{i,j,k} \right)^{2}}, \qquad \alpha(i) = \frac{\cos\left(\mathrm{lat}(i)\right)}{\frac{1}{h} \sum_{i'=1}^{h} \cos\left(\mathrm{lat}(i')\right)}.$$

Anomaly Correlation Coefficient (ACC) measures a model's ability to predict deviations from the mean. Higher ACC values indicate better accuracy in capturing anomalies, which is crucial in meteorology and climate science.

$$\mathrm{ACC} = \frac{\sum_{i,j,k} \alpha(i)\, \hat{\boldsymbol{u}}'_{i,j,k}\, \boldsymbol{u}'_{i,j,k}}{\sqrt{\sum_{i,j,k} \alpha(i) \left( \hat{\boldsymbol{u}}'_{i,j,k} \right)^{2} \; \sum_{i,j,k} \alpha(i) \left( \boldsymbol{u}'_{i,j,k} \right)^{2}}},$$

where $\boldsymbol{u}' = \boldsymbol{u} - C$ and $\hat{\boldsymbol{u}}' = \hat{\boldsymbol{u}} - C$, with the climatology $C$ representing the temporal mean of the same period over the training set.

D Broader Impacts

This research focuses on high-resolution regional weather forecasting, which has an essential influence on related fields such as energy, transportation, and agriculture. As an AI application for social good, our model improves predictions of various weather variables such as temperature, wind speed, and radiation flux. Our work focuses solely on scientific questions, and we have carefully taken ethical considerations into account. Thus, we believe that there is no ethical risk associated with our research.

E Visualization of Forecasts

To intuitively demonstrate the forecasting capacity of our model, we present example weather forecasts over China and zoomed-in regions in Figures 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, and 16.

F Comparison with WeatherBench2's Baselines

The global branch of our framework, M_global, demonstrates promising performance that aligns with top-tier AI weather models. Figure 17 shows RMSE and ACC for some key variables of our global model and of Pangu-Weather [2], FuXi [5], GraphCast [17], and IFS-HRES [9], as reported on WeatherBench2⁴. The evaluation uses the 2022 ERA5 data, with lead times ranging from 6 to 240 hours. For both surface variables and pressure-level variables, our model outperforms Pangu-Weather, GraphCast, and EC-IFS and is comparable to FuXi in most cases. Notably, our global model achieves average RMSE improvements of 12.59%, 0.90%, 3.56%, and 16.30% over Pangu-Weather, FuXi, GraphCast, and EC-IFS, respectively, and average ACC improvements of 8.07%, 0.87%, 4.32%, and 9.51% over the same models, respectively.

Table 5: Default hyperparameters of the framework.
Hyperparameter | Description | Value
Global model (M_global)
P | Patch size of global tokens | 6
d | Hidden dimension | 1536
M | Number of Transformer encoder layers in the global model | 24
Heads | Number of attention heads | 8
MLP ratio | Expansion factor of the MLP | 4
Depth of prediction head | Number of deconvolution layers in the final prediction head | 2
Drop path | Stochastic depth rate | 0.1
Dropout | Dropout rate | 0.1
Regional model (M_regional)
p | Patch size of regional tokens | 30
d | Hidden dimension | 1536
k | Number of Transformer encoder layers in the regional model | 4
Heads | Number of attention heads | 8
MLP ratio | Expansion factor of the MLP | 4
Depth of prediction head | Number of deconvolution layers in the final prediction head | 2
Drop path | Stochastic depth rate | 0.1
Dropout | Dropout rate | 0.1
ScaleMixer
Depth | Total number of ScaleMixer modules | 4
Depth of position identification block | Number of convolution layers in the position identification block | 1
Kernel size | Kernel size of the convolution layers in the position identification block | 3
m | Number of key positions | 64
⁴ https://sites.research.google/gr/weatherbench/deterministic-scores/

Table 6: Summary of ECMWF variables used in the ERA5 and operational analysis datasets at 0.25° resolution. The variables lsm and oro are constant in time.

| Type | Variable name | Abbrev. | Description | Pressure levels |
|---|---|---|---|---|
| Static variable | Land-sea mask | lsm | Binary mask distinguishing land (1) from sea (0) | N/A |
| | Orography | oro | Height of Earth's surface | N/A |
| | Latitude | lat | Latitude of each grid point | N/A |
| Surface variable | 2 metre temperature | t2m | Temperature measured 2 metres above the surface | Single level |
| | 10 metre U wind component | u10 | East-west wind speed at 10 metres above the surface | Single level |
| | 10 metre V wind component | v10 | North-south wind speed at 10 metres above the surface | Single level |
| | Mean sea level pressure | msl | Pressure of the atmosphere adjusted to the height of mean sea level | Single level |
| | Surface pressure | sp | Pressure of the atmosphere on the surface of land, sea and inland water | Single level |
| | 2 metre dewpoint temperature | d2m | Temperature to which the air at 2 metres above the surface would have to be cooled for saturation to occur | Single level |
| Upper-air variable | Geopotential | z | Height relative to a pressure level | 13 levels |
| | U wind component | u | Wind speed in the east-west direction | 13 levels |
| | V wind component | v | Wind speed in the north-south direction | 13 levels |
| | Temperature | t | Atmospheric temperature | 13 levels |
| | Specific humidity | q | Ratio of water vapour mass to total (moist) air mass | 13 levels |

Upper-air variables are given on 13 pressure levels: 50, 100, 150, 200, 250, 300, 400, 500, 600, 700, 850, 925 and 1000 hPa.

Table 7: Summary of variables used in CLDAS at 0.05° resolution.

| Type | Variable name | Abbrev. | Description |
|---|---|---|---|
| Surface variable | 2 metre temperature | T | Temperature measured 2 metres above the surface |
| | 10 metre U wind component | U | East-west wind speed at 10 metres above the surface |
| | 10 metre V wind component | V | North-south wind speed at 10 metres above the surface |
| | Surface specific humidity | Q | Ratio of water vapour mass to total (moist) air mass at 2 metres above the surface |
| | Surface pressure | P | Pressure of the atmosphere on the surface of land, sea and inland water |
| | Total cloud cover | TCC | Fraction of the grid cell covered by cloud, combining cloud at all levels of the atmosphere |
| | Radiation flux (surface solar radiation downwards) | SSRD | Flux of solar radiation that reaches a horizontal plane at the surface of the Earth |

Figure 5: 2 metre temperature forecasts over China (panels: 2024/11/19 23 UTC, forecast lead time = 11; 2024/11/19 11 UTC, forecast lead time = 23).

Figure 6: 2 metre temperature forecasts over a subregion spanning latitudes 30°N to 40°N and longitudes 95°E to 115°E (panels: 2024/11/19 23 UTC, forecast lead time = 11; 2024/11/19 11 UTC, forecast lead time = 23).

Figure 7: Surface pressure forecasts over China (panels: 2024/11/19 23 UTC, forecast lead time = 11; 2024/11/19 11 UTC, forecast lead time = 23).

Figure 8: Surface pressure forecasts over a subregion spanning latitudes 30°N to 40°N and longitudes 95°E to 115°E (panels: 2024/11/19 23 UTC, forecast lead time = 11; 2024/11/19 11 UTC, forecast lead time = 23).

Figure 9: 10 metre wind speed U component forecasts over China (panels: 2024/11/19 23 UTC, forecast lead time = 11; 2024/11/19 11 UTC, forecast lead time = 23).

Figure 10: 10 metre wind speed U component forecasts over a subregion spanning latitudes 16.3°N to 26°N and longitudes 115°E to 125°E (panels: 2024/11/19 23 UTC, forecast lead time = 11; 2024/11/19 11 UTC, forecast lead time = 23).

Figure 11: 10 metre wind speed V component forecasts over China (panels: 2024/11/19 23 UTC, forecast lead time = 11; 2024/11/19 11 UTC, forecast lead time = 23).

Figure 12: 10 metre wind speed V component forecasts over a subregion spanning latitudes 16.3°N to 26°N and longitudes 115°E to 125°E (panels: 2024/11/19 23 UTC, forecast lead time = 11; 2024/11/19 11 UTC, forecast lead time = 23).

Figure 13: Surface solar radiation downwards forecasts over China (panels: 2025/01/19 2 UTC, forecast lead time = 14; 2025/01/19 4 UTC, forecast lead time = 16).

Figure 14: Surface solar radiation downwards forecasts over a subregion spanning latitudes 35°N to 50°N and longitudes 115°E to 135°E (panels: 2025/01/19 2 UTC, forecast lead time = 14; 2025/01/19 4 UTC, forecast lead time = 16).

Figure 15: Total cloud cover forecasts over China (panels: 2025/01/18 13 UTC, forecast lead time = 1; 2025/01/18 20 UTC, forecast lead time = 8).

Figure 16: Total cloud cover forecasts over a subregion spanning latitudes 35°N to 50°N and longitudes 115°E to 135°E (panels: 2025/01/18 13 UTC, forecast lead time = 1; 2025/01/18 20 UTC, forecast lead time = 8).

Figure 17: RMSE and ACC for some key variables of various AI-based weather models on WeatherBench2.
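The scores in Figure 17 follow WeatherBench2. As a hedged illustration, here is a pure-Python sketch of the usual WeatherBench-style definitions for a single forecast/verification pair: RMSE with cos-latitude weights normalised to mean 1, and an uncentred anomaly correlation coefficient computed against a climatology. It also includes one plausible reading of the percentage improvements quoted for Figure 17, namely the change relative to each baseline, with the sign flipped for RMSE because lower is better. The function names are hypothetical, the aggregation over initialisation and lead times used by WeatherBench2 is omitted, and the paper does not state its improvement formula, so treat this as an assumption rather than the authors' evaluation code.

```python
import math
from typing import Sequence

Grid = Sequence[Sequence[float]]  # one field, indexed [lat][lon]


def latitude_weights(lats_deg: Sequence[float]) -> list[float]:
    """cos(latitude) weights, normalised so their mean over latitude rows is 1."""
    cos_lat = [math.cos(math.radians(lat)) for lat in lats_deg]
    mean = sum(cos_lat) / len(cos_lat)
    return [c / mean for c in cos_lat]


def weighted_rmse(forecast: Grid, truth: Grid, lats_deg: Sequence[float]) -> float:
    """Latitude-weighted RMSE over all grid points of one field."""
    total, count = 0.0, 0
    for w, f_row, t_row in zip(latitude_weights(lats_deg), forecast, truth):
        for f, t in zip(f_row, t_row):
            total += w * (f - t) ** 2
            count += 1
    return math.sqrt(total / count)


def weighted_acc(forecast: Grid, truth: Grid, clim: Grid, lats_deg: Sequence[float]) -> float:
    """Latitude-weighted, uncentred anomaly correlation with respect to `clim`."""
    num = f_sq = t_sq = 0.0
    for w, f_row, t_row, c_row in zip(latitude_weights(lats_deg), forecast, truth, clim):
        for f, t, c in zip(f_row, t_row, c_row):
            fa, ta = f - c, t - c  # forecast and verifying anomalies
            num += w * fa * ta
            f_sq += w * fa * fa
            t_sq += w * ta * ta
    return num / math.sqrt(f_sq * t_sq)


def relative_improvement(ours: float, baseline: float, higher_is_better: bool) -> float:
    """Percentage change relative to the baseline (an assumed convention)."""
    gain = ours - baseline if higher_is_better else baseline - ours
    return 100.0 * gain / abs(baseline)


# RMSE: lower is better; ACC: higher is better.
print(relative_improvement(0.87, 1.0, higher_is_better=False))
print(relative_improvement(0.95, 0.90, higher_is_better=True))
```

Keeping the sign convention explicit matters: under this reading a 12.59% RMSE improvement means the error shrank, whereas an 8.07% ACC improvement means the correlation grew.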
