Anisotropic Permeability Tensor Prediction from Porous Media Microstructure via Physics-Informed Progressive Transfer Learning with Hybrid CNN-Transformer
Authors: Mohammad Nooraiepour
Mohammad Nooraiepour 1,∗
1 Faculty of Mathematics and Natural Sciences, University of Oslo, PO Box 1047 Blindern, 0316 Oslo, Norway
∗ Corresponding author: monoo@uio.no

Abstract

Accurate prediction of permeability tensors from pore-scale microstructure images is essential for subsurface flow modeling, yet direct numerical simulation requires hours per sample, fundamentally limiting large-scale uncertainty quantification and reservoir optimization workflows. A physics-informed deep learning framework is presented that resolves this bottleneck by combining a MaxViT hybrid CNN-Transformer architecture with progressive transfer learning and differentiable physical constraints. MaxViT's multi-axis attention mechanism simultaneously resolves grain-scale pore-throat geometry through block-local operations and REV-scale connectivity statistics through grid-global operations, providing the spatial hierarchy that permeability tensor prediction physically requires. Training on 20,000 synthetic porous media samples spanning three orders of magnitude in permeability, a three-phase progressive curriculum advances from an ImageNet-pretrained baseline with D4-equivariant augmentation and tensor transformation, through component-weighted loss prioritizing off-diagonal coupling, to frozen-backbone transfer learning with porosity conditioning via Feature-wise Linear Modulation (FiLM). Onsager reciprocity and positive-definiteness are enforced through differentiable penalty terms, achieving mean symmetry error ε_sym = 3.95 × 10⁻⁷ and 100% thermodynamic validity without post-hoc correction. On a held-out test set of 4,000 samples, the framework achieves variance-weighted R² = 0.9960 (R²_Kxx = 0.9967, R²_Kxy = 0.9758), a 33% reduction in unexplained variance over the supervised baseline, with every reported metric improving monotonically across all three phases. Inference requires 120 ms per sample, a significant speedup over lattice-Boltzmann simulation that enables real-time permeability characterization during core scanning, Monte Carlo uncertainty quantification on a single GPU, and rapid multi-scenario screening for CO₂ storage and geo-energy applications. The results establish three transferable principles for physics-informed scientific machine learning: large-scale visual pretraining transfers effectively across domain boundaries; physical constraints are most robustly integrated as differentiable architectural components; and progressive training guided by diagnostic failure-mode analysis enables unambiguous attribution of performance gains across methodological stages.

Keywords: Permeability tensor prediction; Porous media; Physics-informed machine learning; Hybrid CNN-Transformer; Progressive transfer learning; Anisotropy; Uncertainty quantification; Subsurface flow characterization.

1 Introduction

Fluid flow through porous geological formations governs a broad range of processes of direct societal and engineering importance, including the management of geo-energy resources, groundwater systems, geological CO₂ and H₂ storage, and subsurface contaminant transport [1–7]. Accurate characterization of porous media transport properties is a prerequisite for reservoir management decisions, operational performance and risk assessment for geological storage sites, aquifer remediation strategies, and large-scale probabilistic uncertainty quantification workflows that require millions of forward evaluations. At the pore scale, fluid motion is governed by the Navier-Stokes equations.
The permeability tensor K emerges as the fundamental macroscopic transport property relating the pressure gradient to the Darcy velocity through Darcy's law:

q = −(K/µ) ∇p,  (1)

where q is the Darcy velocity vector, µ is dynamic viscosity, and ∇p is the pressure gradient. For two-dimensional porous media, K is a 2 × 2 symmetric positive-definite matrix encoding directional flow preferences and geometric tortuosity. Computing K from pore-scale images through direct numerical simulation (DNS) of the Stokes equations, employing methods such as lattice-Boltzmann, finite-volume, or finite-element schemes [8–11], yields highly accurate results but incurs computational costs of hours to days per sample even on high-performance hardware [12, 13]. This expense severely constrains applications requiring thousands to millions of permeability evaluations.

Recent advances in deep learning have demonstrated that neural networks trained on sufficiently large labeled datasets derived from pore-scale images can predict permeability tensors with exceptional accuracy while reducing inference time from hours to milliseconds [14–17]. Early investigations established that convolutional neural networks (CNNs) can learn effective representations of pore geometry directly from image data, enabling rapid permeability prediction [18–21]. Subsequent contributions advanced these techniques through multi-scale architectures and expanded applicability to diverse pore morphologies [22–25]. More recently, physics-informed neural networks have been investigated to enforce adherence to underlying physical principles within the model [25–29], and graph neural networks have been applied to capture the topological features and connectivity of pore networks [30–32]. These developments underscore the growing potential of integrating domain knowledge into data-driven neural architectures [33–35]. The architectural foundation of the present work, a hybrid CNN-Transformer design (MaxViT, detailed in Section 2.2), is motivated by these advances and their demonstrated capacity for learning hierarchical spatial features from binary porous media images while respecting the multi-scale nature of permeability-geometry relationships.

While deep learning has emerged as the predominant approach for image-based permeability prediction, the broader machine learning landscape offers established regression paradigms that merit explicit consideration as alternatives. Gaussian process regression (GPR) provides principled Bayesian uncertainty quantification through kernel-based modeling of function smoothness [36–38]. Ensemble methods such as random forests and gradient boosting [39–42] aggregate predictions from multiple weak learners for robustness against noise. Support vector regression employs kernel methods to handle nonlinear relationships in moderate-dimensional feature spaces [43–45], and regression trees offer transparent hierarchical partitioning [41, 46]. However, these methods face fundamental challenges at the scale of high-resolution porous media images. The input dimensionality of 128 × 128 = 16,384 pixels creates a severe curse of dimensionality for methods operating on flattened feature vectors: GPR scales cubically with the number of training samples and requires explicit kernel design in this high-dimensional space; support vector regression similarly demands careful kernel selection in the same regime; and tree-based methods struggle to capture smooth spatial structure without extreme depth.
More fundamentally, these approaches typically require hand-crafted feature engineering to reduce dimensionality, extracting quantities such as porosity, pore-size distributions, tortuosity, or correlation functions, which demands domain expertise, introduces information bottlenecks, and may discard spatial patterns not anticipated by the feature designer. Permeability depends nonlinearly on multi-scale spatial structure, from grain-scale pore-throat geometry to domain-scale connectivity, relationships that are difficult to capture through linear combinations of scalar features or axis-aligned decision boundaries. Deep convolutional architectures address these limitations through hierarchical spatial feature learning [47–49]: convolutional layers with local receptive fields and weight sharing encode translation equivariance and spatial smoothness priors appropriate to image data; pooling operations implement multi-scale aggregation without explicit feature engineering; and end-to-end differentiable training enables direct optimization of the full input-to-output mapping. For the present task, predicting a 2 × 2 permeability tensor from binary microstructure images spanning three orders of magnitude in permeability, empirical studies demonstrate that deep CNNs consistently achieve strong benchmark performance [19, 22, 24], while comparable results with non-deep methods would require extensive manual feature extraction and domain-specific kernel design.

Despite these successes, existing deep learning approaches for permeability prediction face several persistent limitations. The first concerns architecture.
Methods based on pure CNNs excel at capturing local geometric features and fine-grained interface details through translation equivariance and local connectivity, but struggle with long-range spatial dependencies critical for understanding global pore connectivity and anisotropic coupling [50–53]. Conversely, pure Vision Transformers (ViTs) capture global context through self-attention but incur quadratic computational complexity with image size and lack the inductive biases that make CNNs effective for structured spatial data [54–57]. This architectural tension is especially consequential for off-diagonal permeability components K_xy and K_yx, which encode geometric anisotropy arising from pore channels oriented obliquely to the principal coordinate axes. Accurate prediction of these components requires simultaneously resolving fine-scale pore-throat geometry at length scales of individual grains (roughly 10–100 pixels), governing local flow resistance, and long-range spatial correlations spanning the full image domain, determining how preferential flow paths align and couple across directions. The characteristic spatial correlation length governing off-diagonal coupling is typically comparable to the representative elementary volume (REV) size, which for sandstone microstructures can span the entire image domain. Pure CNNs cannot integrate information over these scales without very deep architectures that sacrifice fine-scale resolution, while pure ViTs lack the inductive biases to resolve sharp solid-fluid interfaces efficiently.

The second persistent limitation concerns physical validity. Permeability tensors must satisfy fundamental physical requirements: symmetry (K_xy = K_yx) arising from Onsager reciprocity, and positive-definiteness ensuring thermodynamic consistency.
Most existing approaches enforce these constraints only weakly through data characteristics or post-hoc projection, rather than incorporating them directly into the training objective. Furthermore, isotropic porous media exhibit natural geometric symmetries under rotations and reflections. Existing augmentation strategies either ignore these symmetries entirely or apply image transformations without corresponding tensor transformations, creating training inconsistencies that degrade model performance and introduce physical violations. It should be noted that enforcing symmetry through data augmentation, as opposed to building equivariance into the network architecture, yields approximate equivariance that is not guaranteed to hold exactly at inference time; the implications of this distinction are discussed in Section 2. The challenge of predicting off-diagonal tensor components remains particularly acute, with reported accuracy often substantially lower than for diagonal components, as these elements encode subtle geometric anisotropies requiring global spatial integration.

Hybrid CNN-Transformer architectures have emerged as a promising direction for simultaneously leveraging local and global feature extraction [58–63]. Among these, MaxViT [64] introduces a multi-axis self-attention mechanism that decomposes standard two-dimensional self-attention into sequential block-local and grid-global operations. This decomposition reduces computational complexity from O(H²W²) to O(HW) while preserving a global receptive field, a critical property for high-resolution scientific imaging. From a physical standpoint, block-local attention resolves grain-scale pore-throat geometry at length scales relevant to local flow resistance, while grid-global attention integrates information across the full image domain for off-diagonal coupling.
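To make the two partitions concrete, the token groupings can be sketched with array reshapes. The following numpy illustration shows shapes only, not the library implementation; the function names are mine:

```python
import numpy as np

def block_partition(x, P):
    """Split an (H, W, C) feature map into non-overlapping PxP windows;
    block-local attention operates within each window."""
    H, W, C = x.shape
    x = x.reshape(H // P, P, W // P, P, C)
    return x.transpose(0, 2, 1, 3, 4).reshape(-1, P * P, C)

def grid_partition(x, G):
    """Split an (H, W, C) feature map into a GxG grid of dilated tokens;
    grid-global attention mixes positions spaced H/G (resp. W/G) apart."""
    H, W, C = x.shape
    x = x.reshape(G, H // G, G, W // G, C)
    return x.transpose(1, 3, 0, 2, 4).reshape(-1, G * G, C)

# toy 16x16 feature map with 4 channels
x = np.arange(16 * 16 * 4, dtype=np.float32).reshape(16, 16, 4)
blocks = block_partition(x, P=8)   # (4, 64, 4): 4 windows of 8x8 local tokens
grid = grid_partition(x, G=8)      # (4, 64, 4): 4 groups of 8x8 dilated tokens
```

Both partitions cover every token exactly once; for a fixed window size P, attention within each group costs O(P² · HW), i.e., linear rather than quadratic in the number of pixels.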
This architectural decomposition directly mirrors the two-scale nature of the permeability prediction problem: local geometry determines individual flow path conductances, while global topology determines how those conductances combine anisotropically into the full tensor. MaxViT has achieved state-of-the-art results across diverse computer vision benchmarks [64–66] and demonstrates particular strength in transfer learning scenarios with limited labeled data, precisely the condition encountered in scientific imaging applications where ground-truth labels require computationally expensive physics-based simulation. Despite these advantages, MaxViT has not been previously applied to physics-informed prediction tasks or porous media characterization.

Prior to adopting supervised transfer learning, a systematic investigation of self-supervised pretraining strategies for binary porous media images was conducted, focusing on Masked Autoencoder (MAE) approaches [67–69]. Contrastive methods such as SimCLR and MoCo were also considered, but their reliance on semantic augmentation invariances, including color jitter and crop-based views, cannot be motivated for binary images with no color information and where cropping may discard the globally connected pore structure that constitutes the primary signal of interest. The MAE experiments revealed fundamental incompatibilities between MAE's design assumptions and the characteristics of binary porous media. MAE relies on smooth pixel gradients, high spatial redundancy, and continuous-valued image statistics, properties abundant in natural images but absent in binary porous media characterized by sharp discontinuities at solid-fluid interfaces, low geometric redundancy, and discrete pixel values.
Despite extensive architectural exploration spanning masking ratios (25–75%), decoder depths (1–8 layers), and reconstruction targets (binary versus distance transform), all MAE configurations failed to achieve usable reconstruction quality (best PSNR: 14.4 dB versus a required threshold of > 20 dB for meaningful feature learning, established by evaluating fine-tuned regression head performance at varying PSNR levels against a randomly initialized baseline). This negative result provides an important insight: effective representation learning for binary porous media requires supervision from physically meaningful labels rather than pixel-level reconstruction objectives. Consequently, the framework developed here focuses on supervised transfer learning from ImageNet-pretrained features, which provide robust low- and mid-level visual representations that transfer effectively despite the domain difference. This negative result for self-supervised pretraining also directly motivated the progressive transfer learning strategy detailed in Section 2.3, which leverages ImageNet-pretrained features as a robust initialization and systematically refines them through three phases of task-specific adaptation.

The framework proposed in this work addresses the limitations identified above through a systematic combination of architectural selection, physics-informed training objectives, rigorous geometric augmentation, and progressive transfer learning. The MaxViT hybrid architecture is applied for the first time to permeability tensor prediction, with its multi-axis self-attention mechanism providing the spatial hierarchy needed for simultaneous fine-scale pore-throat resolution and global connectivity integration. A physics-aware loss function is developed that explicitly enforces tensor symmetry and positive-definiteness through differentiable penalty terms, eliminating the need for post-hoc projection.
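For reference, the PSNR criterion used above to judge reconstruction quality is the standard peak signal-to-noise ratio. A small sketch on synthetic data (not the study's images; helper name is mine):

```python
import numpy as np

def psnr(x, y, max_val=1.0):
    """Peak signal-to-noise ratio in dB for images with values in [0, max_val]."""
    mse = np.mean((x - y) ** 2)
    return 10.0 * np.log10(max_val ** 2 / mse)

rng = np.random.default_rng(0)
x = (rng.random((128, 128)) > 0.7).astype(float)      # synthetic binary "image"
y = np.clip(x + rng.normal(0, 0.15, x.shape), 0, 1)   # noisy "reconstruction"
quality = psnr(x, y)                                   # dB; higher is better
```

Against this metric, the best MAE reconstruction (14.4 dB) fell well short of the > 20 dB threshold established in the text.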
A D4-equivariant augmentation strategy is implemented that consistently transforms both microstructure images and their associated permeability tensors through proper tensor transformation matrices, removing the training inconsistencies present in naive augmentation approaches. While this augmentation-based approach achieves strong empirical equivariance, it is acknowledged that it does not provide architectural guarantees of exact equivariance at inference time, in contrast to group equivariant networks [70–72]; this distinction is discussed in Section 2. A three-phase progressive training strategy is proposed: Phase 2 establishes a supervised baseline with ImageNet-pretrained MaxViT, progressive unfreezing, and D4-equivariant augmentation; Phase 3 extends this through calibrated morphological and elastic deformations with enhanced loss weighting that prioritizes off-diagonal tensor elements; Phase 4 freezes the learned backbone while introducing porosity-conditioned Feature-wise Linear Modulation (FiLM) [73], combined with ensemble techniques including Stochastic Weight Averaging [74] and Exponential Moving Average [75]. The resulting framework attains an R² of 0.9960 on a diverse dataset of 20,000 porous media samples spanning three orders of magnitude in permeability, along with a significant reduction in prediction error relative to the supervised baseline, while reducing computational cost from hours per sample (DNS) to approximately 120 ms per prediction (inference).

The main contributions of this work are:

1. Physics-motivated hybrid architecture. Systematic application of the MaxViT hybrid CNN-Transformer to permeability tensor prediction is presented, with an explicit physical justification connecting block-local attention to pore-throat geometry resolution and grid-global attention to the long-range connectivity integration governing off-diagonal tensor coupling.

2. Differentiable physics-informed loss. A loss function is developed that enforces tensor symmetry and positive-definiteness through differentiable penalty terms, yielding near-machine-precision symmetry and 100% thermodynamic validity without post-processing.

3. D4-equivariant augmentation with rigorous tensor transformation. An augmentation strategy is introduced that consistently transforms microstructure images and their associated permeability tensors under the D4 symmetry group, eliminating training inconsistencies that arise when image and label transformations are mismatched.

4. Progressive three-phase transfer learning. A structured training curriculum transitions from frozen ImageNet features to full fine-tuning and finally to porosity-conditioned FiLM adaptation, with each phase providing physically interpretable performance gains.

5. Computational efficiency. The framework reduces permeability tensor inference time from hours (DNS) to approximately 120 ms, enabling practical deployment in workflows requiring large-scale ensemble evaluations for uncertainty quantification and reservoir characterization.

The remainder of this paper is organized as follows. Section 2 describes the dataset generation and statistical characteristics, the MaxViT hybrid CNN-Transformer architecture and its adaptation for permeability tensor prediction, the physics-aware loss formulation incorporating symmetry and positive-definiteness constraints, the D4-equivariant data augmentation strategy with rigorous tensor transformation, and the three-phase progressive training curriculum. Section 3 presents experimental results across all training phases, component-wise prediction accuracy analysis, physical constraint validation, comparative evaluation against baseline methods, and detailed discussion of failure modes and limitations.
Section 4 summarizes the key findings, contributions to physics-informed machine learning for porous media characterization, and implications for computational subsurface flow modeling.

2 Materials and Methods

2.1 Dataset and Problem Formulation

2.1.1 Porous Media Image Dataset

The dataset comprises 24,000 binary porous media images generated via sequential Gaussian simulation, designed to replicate realistic sandstone pore geometries by Vargdal et al. [76, 77]. Each image is a strictly binary 128 × 128 pixel array in which values of 1 represent solid grain and values of 0 represent pore space, with porosity defined as ϕ = 1 − mean(I). Porosity spans ϕ ∈ [0.227, 0.900] with mean ϕ̄ = 0.711 ± 0.124 across all splits; the distribution is moderately left-skewed (skewness ≈ −0.55) and significantly non-normal (Shapiro-Wilk p ≈ 0), reflecting the post-processing described below. Full descriptive statistics are provided in Table 3 (Appendix A).

According to [76], raw generated images may contain isolated fluid clusters that are completely enclosed by the solid matrix and therefore disconnected from the main flow network. Because these regions contribute neither to flow nor to permeability, an established post-processing step identified and removed them through connectivity analysis under periodic boundary conditions. The resulting filled images represent the connected-pore-network porosity, eliminate stagnant-region nodes from subsequent flow simulations, and form the basis for all permeability computations and model training reported here [76, 77]. This filling procedure, while physically motivated, may shift the effective porosity distribution toward well-connected porous media, placing the dataset mean substantially above the typical range for natural sandstone reservoirs.
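The cited pipeline performs this connectivity analysis under periodic boundary conditions [76]. As a simplified illustration of the idea only, the sketch below fills pore clusters disconnected from the largest cluster using non-periodic 4-connectivity; the helper name and these simplifications are mine:

```python
import numpy as np
from collections import deque

def fill_isolated_pores(img):
    """Convert pore pixels (0) that are not part of the largest connected
    pore cluster into solid (1), and return the filled image and its porosity.
    Simplified: 4-connectivity, non-periodic boundaries."""
    H, W = img.shape
    labels = np.full((H, W), -1, dtype=int)
    sizes = []
    for sh in range(H):
        for sw in range(W):
            if img[sh, sw] == 0 and labels[sh, sw] == -1:
                lab = len(sizes)        # start a new cluster label
                labels[sh, sw] = lab
                q, n = deque([(sh, sw)]), 0
                while q:                # BFS flood fill over pore pixels
                    i, j = q.popleft()
                    n += 1
                    for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        a, b = i + di, j + dj
                        if 0 <= a < H and 0 <= b < W and img[a, b] == 0 and labels[a, b] == -1:
                            labels[a, b] = lab
                            q.append((a, b))
                sizes.append(n)
    keep = int(np.argmax(sizes))        # retain only the largest pore cluster
    filled = img.copy()
    filled[(img == 0) & (labels != keep)] = 1
    return filled, 1.0 - filled.mean()  # porosity = 1 - mean(I)

img = np.ones((5, 5), dtype=int)
img[0, :] = 0          # connected pore channel along the top row
img[3, 3] = 0          # isolated pore pixel enclosed by solid
filled, phi = fill_isolated_pores(img)  # the isolated pixel becomes solid
```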
Predictions for tight or poorly connected samples therefore reside near the lower tail of the training distribution and are expected to carry higher relative uncertainty; the corresponding failure-mode analysis is provided in Appendix H.4.

The full 2 × 2 permeability tensor for each image was computed via lattice-Boltzmann (LBM) simulation of single-phase Stokes flow at low Reynolds number [76]. Body forces applied in orthogonal directions and volume-averaged fluid velocities yield the permeability tensor through Darcy's law, expressed in lattice units of order unity. Diagonal permeabilities span three orders of magnitude in log10-space; off-diagonal components are centered near zero and exhibit heavy tails (kurtosis ≈ 9–13), reflecting the predominantly isotropic character of the media. All ground-truth tensors are 100% positive-definite, and tensor symmetry is preserved to ε_sym < 8 × 10⁻⁶ by the LBM solver. Complete LBM implementation and validation are documented in [76, 77].

Prior to model development, a comprehensive dataset characterization study was conducted covering microstructure geometry, permeability tensor statistics, porosity-permeability relationships, cross-split distributional consistency, anisotropy analysis, and outlier identification. Kolmogorov-Smirnov tests and Jensen-Shannon divergence measurements confirmed that training, validation, and test splits are drawn from statistically indistinguishable distributions across all variables, validating the random partitioning protocol. These analyses are reported in full in Appendix A.

2.1.2 Problem Formulation

Permeability prediction is formulated as a supervised regression task,

f_θ : R^{128×128} → R^{2×2},  (2)

where f_θ denotes a neural network with parameters θ mapping binary porous media images to symmetric positive-definite permeability tensors:

K = [ K_xx  K_xy
      K_xy  K_yy ],   K_xx, K_yy > 0.  (3)

The network produces a four-element output [K_xx, K_xy, K_yx, K_yy], reshaped to a 2 × 2 matrix at evaluation time. Two physical constraints must be satisfied:

K_xy = K_yx  (tensor symmetry),  (4)
K_xx, K_yy > 0  (positive diagonal).  (5)

These constraints are incorporated through the physics-aware loss function (Section 2.4) and verified post hoc on the held-out test set. The positivity penalty in the loss function enforces positive diagonal elements in the predicted permeability tensor, while full positive-definiteness, although not enforced by construction, is consistently verified empirically across all samples in the held-out test set (Section 2.6.2). The high input dimensionality (128 × 128 = 16,384 pixels) and strongly nonlinear, multi-scale structure-property relationships motivate end-to-end differentiable architectures that learn spatial hierarchies directly from raw images, in preference to methods requiring hand-crafted dimensionality reduction prior to regression.

2.1.3 Data Partitioning and Evaluation Protocol

Of the 24,000 total images, 20,000 carry computed permeability labels and are partitioned by random sampling into a training set (16,000 samples, 80%) and a validation set (4,000 samples, 20%). An independently generated set of 4,000 labeled samples from the same distribution constitutes the held-out test set, never accessed during model development, hyperparameter selection, or training decisions. The remaining 4,000 unlabeled images, initially reserved for self-supervised pretraining, were found to confer no benefit for binary porous media (see Introduction) and are excluded from all reported results. Strict separation between all three splits is maintained throughout. The validation set is used exclusively for model checkpointing and early stopping based on validation R².
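The constraints in Eqs. (4)–(5) lend themselves to differentiable penalties. A minimal numpy sketch of this penalty structure (the full loss is defined in Section 2.4; the weights shown use the Phase 2 values λ_sym = 0.1 and λ_pos = 0.05, and the function name is mine):

```python
import numpy as np

def physics_aware_loss(k_pred, k_true, lam_sym=0.1, lam_pos=0.05):
    """MSE plus differentiable penalties for Onsager symmetry (Kxy = Kyx, Eq. 4)
    and positive diagonal entries (Kxx, Kyy > 0, Eq. 5).
    k_pred, k_true: (N, 4) arrays ordered [Kxx, Kxy, Kyx, Kyy]."""
    mse = np.mean((k_pred - k_true) ** 2)
    # symmetry penalty: squared mismatch between the off-diagonal predictions
    sym = np.mean((k_pred[:, 1] - k_pred[:, 2]) ** 2)
    # positivity penalty: squared hinge on any negative diagonal entry
    diag = k_pred[:, [0, 3]]
    pos = np.mean(np.maximum(-diag, 0.0) ** 2)
    return mse + lam_sym * sym + lam_pos * pos

k_sym = np.array([[2.0, 0.3, 0.3, 1.0]])    # symmetric, positive diagonal
loss0 = physics_aware_loss(k_sym, k_sym)     # perfect prediction: zero loss
k_asym = np.array([[2.0, 0.3, 0.1, 1.0]])   # violates Eq. (4)
loss1 = physics_aware_loss(k_asym, k_sym)    # symmetry penalty activates
```

Because both penalties are smooth almost everywhere, they contribute gradients during training rather than requiring a post-hoc projection step.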
Augmentation parameters and loss-function weights were determined through preliminary grid searches on the validation set, conducted independently of the final training runs, to preclude implicit test-set leakage. Reported test-set results therefore constitute unbiased estimates of generalization to unseen porous media samples.

2.2 Model Architecture

2.2.1 MaxViT Backbone

The backbone is a Multi-Axis Vision Transformer (MaxViT-Base) [64], a hierarchical hybrid architecture combining depthwise-separable convolutions for local feature extraction with multi-axis self-attention for global context modeling. The implementation follows the timm library [78, 79], using weights pretrained on ImageNet-21K and subsequently fine-tuned on ImageNet-1K, providing robust low- and mid-level visual features that transfer effectively across the domain gap to porous media images.

Two adaptations are applied to accommodate the target domain. First, the input is reconfigured to single-channel grayscale (in_chans=1), modifying the first convolutional layer's input channels while preserving pretrained spatial filters via channel averaging. Second, the input resolution is specified as 128 × 128 (img_size=128), triggering automatic interpolation of positional embeddings and spatial-dimension adjustment across all hierarchical stages.

The backbone comprises four stages with progressively increasing channel dimensionality and decreasing spatial resolution. Each stage contains multiple MaxViT blocks, each integrating an MBConv module (depthwise-separable convolution with squeeze-and-excitation) followed by two multi-axis self-attention modules: one applying blocked local partitioning within non-overlapping P × P windows (P = 8), and one applying dilated grid-global attention across block positions. This decomposition reduces self-attention complexity from O(H²W²) to O(HW) while preserving a global receptive field.
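The single-channel adaptation can be illustrated directly: averaging the pretrained RGB stem filters yields a one-channel filter whose response to a gray image replicated across RGB matches the original up to a constant factor. This is a sketch of the underlying operation only; timm applies an equivalent weight adaptation internally when in_chans=1, and the function name is mine:

```python
import numpy as np

def adapt_stem_to_grayscale(w_rgb):
    """Collapse a pretrained stem-conv weight of shape (C_out, 3, kH, kW)
    to a single input channel by averaging over the RGB axis.
    For a gray input replicated across RGB, the averaged filter's response
    is exactly 1/3 of the original filter's response."""
    return w_rgb.mean(axis=1, keepdims=True)   # -> (C_out, 1, kH, kW)

rng = np.random.default_rng(0)
w_rgb = rng.normal(size=(64, 3, 3, 3))         # stand-in pretrained stem weights
w_gray = adapt_stem_to_grayscale(w_rgb)        # spatial filter structure preserved
```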
Stage dimensions, block counts, and attention configurations specific to the 128 × 128 input are detailed in Appendix B. After the final stage, spatial features are aggregated by global average pooling to yield a 768-dimensional feature vector f encoding global pore geometry:

f = (1 / (H_final · W_final)) Σ_{i=1}^{H_final} Σ_{j=1}^{W_final} x^{stage4}_{ij}.  (6)

A two-layer MLP regression head maps this representation to the four-element permeability output (Phases 2 and 3):

h_0 = LayerNorm(f),  (7)
h_1 = Dropout_{0.1}(GELU(Linear_{768→256}(h_0))),  (8)
h_2 = Dropout_{0.1}(GELU(Linear_{256→128}(h_1))),  (9)
K_pred = Linear_{128→4}(h_2).  (10)

The complete architecture comprises approximately 119.01 million parameters: the MaxViT-Base backbone (convolutional stem and four hierarchical stages, with the classification head removed) contributes 118.78M, and the regression head contributes 0.23M. The porosity encoder and FiLM conditioning layers introduced in Phase 4 (Section 2.3.3) add a further 0.11M parameters.

2.2.2 Porosity-Conditioned Feature Modulation

Phase 4 introduces an explicit physics-informed conditioning mechanism in which scalar porosity modulates backbone feature representations, encoding the physical relationship between porosity and transport-coefficient magnitude directly into the model. A three-layer MLP encoder transforms the scalar porosity ϕ ∈ [0, 1] into a 64-dimensional embedding e_ϕ ∈ R^64:

e_ϕ = Linear_{64→64}(ReLU(Linear_{32→64}(ReLU(Linear_{1→32}(ϕ))))).  (11)

Porosity is computed directly from the binary image as ϕ = 1 − mean(I), requiring no additional simulation and constituting a zero-cost auxiliary input.
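Eqs. (7)–(10) correspond to a small MLP head on the pooled feature vector. An illustrative numpy sketch with randomly initialized stand-in weights (biases omitted for brevity, and dropout omitted because it is inactive at inference time):

```python
import numpy as np

rng = np.random.default_rng(0)

def layer_norm(x, eps=1e-5):
    """Normalize a feature vector to zero mean and unit variance (Eq. 7)."""
    return (x - x.mean()) / np.sqrt(x.var() + eps)

def gelu(x):
    """tanh approximation of the GELU activation."""
    return 0.5 * x * (1 + np.tanh(np.sqrt(2 / np.pi) * (x + 0.044715 * x ** 3)))

# randomly initialized stand-ins for the trained head weights
W1, W2, W3 = (rng.normal(0, 0.02, s) for s in [(768, 256), (256, 128), (128, 4)])

def regression_head(f):
    h0 = layer_norm(f)      # Eq. (7)
    h1 = gelu(h0 @ W1)      # Eq. (8): 768 -> 256
    h2 = gelu(h1 @ W2)      # Eq. (9): 256 -> 128
    return h2 @ W3          # Eq. (10): 128 -> 4, [Kxx, Kxy, Kyx, Kyy]

f = rng.normal(size=768)    # stands in for the pooled stage-4 feature (Eq. 6)
k = regression_head(f)      # four-element permeability output
```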
Feature-wise Linear Modulation (FiLM) [73] layers then modulate backbone feature maps x ∈ R^{H×W×C} at Stages 2, 3, and 4:

γ_ϕ = Linear_{64→C}(e_ϕ) ∈ R^C,   β_ϕ = Linear_{64→C}(e_ϕ) ∈ R^C,  (12)
x_mod = γ_ϕ ⊙ x + β_ϕ,  (13)

where ⊙ denotes elementwise multiplication with broadcasting over spatial dimensions. Scale and shift parameters are initialized to produce identity modulation (γ = 1, β = 0) at training onset, ensuring stability.

Stage 1 is excluded because its features encode low-level edge detection and solid-fluid boundary localization, representing a length scale at which porosity provides no physically meaningful conditioning context. Stages 2–4 encode progressively more abstract geometric descriptions, including connectivity patterns, channel topology, and global pore-network structure, where porosity is a physically meaningful modulator of transport-coefficient magnitude. This placement was confirmed by a five-configuration validation-set ablation, and the encoder dimensionality of 64 was selected from {32, 64, 128} by the same procedure; full ablation results and rationale are reported in Appendix C. The total trainable parameter count in Phase 4 comprises the porosity encoder (0.01M), three FiLM layer pairs (0.10M), and the regression head (0.23M), totaling 0.34M parameters, representing 0.3% of the full model.

2.3 Progressive Training Strategy

Training is organized into four numbered phases. Phase 1 constitutes dataset curation, LBM simulation, and the comprehensive characterization study described in Section 2.1 and Appendix A; it produces no trained model and is not discussed further here.
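Eqs. (11)–(13) can be sketched as follows, with random stand-in weights for the encoder and zero-initialized FiLM projections so that modulation starts at the identity, as described above (function and variable names are mine):

```python
import numpy as np

rng = np.random.default_rng(0)

def porosity_embedding(phi, Ws):
    """Three-layer MLP encoder (Eq. 11): scalar porosity -> 64-d embedding.
    Ws holds stand-in weights; biases omitted for brevity."""
    h = np.maximum(np.atleast_1d(phi) @ Ws[0], 0)   # 1 -> 32, ReLU
    h = np.maximum(h @ Ws[1], 0)                    # 32 -> 64, ReLU
    return h @ Ws[2]                                # 64 -> 64

def film(x, e_phi, Wg, Wb):
    """FiLM modulation (Eqs. 12-13): per-channel scale and shift broadcast
    over the spatial dimensions of an (H, W, C) feature map."""
    gamma = 1.0 + e_phi @ Wg    # zero-init projection => gamma starts at 1
    beta = e_phi @ Wb           # zero-init projection => beta starts at 0
    return gamma * x + beta     # broadcasts over (H, W)

C = 128
Ws = [rng.normal(0, 0.1, s) for s in [(1, 32), (32, 64), (64, 64)]]
Wg, Wb = np.zeros((64, C)), np.zeros((64, C))       # identity modulation at onset
x = rng.normal(size=(16, 16, C))                    # stand-in backbone feature map
e = porosity_embedding(0.71, Ws)                    # porosity near the dataset mean
x_mod = film(x, e, Wg, Wb)                          # identical to x at initialization
```

During training, gradients flow into Wg and Wb, so the modulation departs from the identity only as far as the data warrant.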
Phases 2–4 form the progressive learning curriculum: each initializes from the best checkpoint of the preceding phase and introduces a single, physically motivated methodological contribution—supervised transfer learning with D4-equivariant augmentation (Phase 2), advanced augmentation with enhanced off-diagonal loss weighting (Phase 3), and frozen-backbone porosity conditioning with weight-averaged ensemble (Phase 4)—enabling unambiguous attribution of performance gains to each intervention. Full hyperparameter specifications for all phases are consolidated in Appendix D. The complete pipeline is illustrated in Figure 1.

Figure 1: Overview of the physics-informed progressive transfer learning pipeline for permeability tensor prediction. Input (far left): a binary porous media image I ∈ {0, 1}^{128×128} with porosity ϕ = 1 − mean(I), where pixel values of 1 denote solid grains and 0 denote pore space. Phase 2 (blue): the ImageNet-pretrained MaxViT-Base backbone processes the image through four hierarchical stages (S1–S4, channels 64–512) to produce a 768-dimensional feature vector; D4-equivariant augmentation applies all eight dihedral symmetries (three rotations shown) with consistent tensor transformation, and the physics-aware loss enforces tensor symmetry (λ_sym = 0.1).
Phases are connected by checkpoint transfer arrows labeled θ^(k)_best. Phase 3 (green): three supplementary augmentation classes—morphological operations (p = 0.08), elastic deformation (p = 0.08), and cutout masking (p = 0.10)—are introduced alongside doubled symmetry weighting (λ_sym = 0.2) and off-diagonal loss prioritization (w_off = 1.5 w_diag). Phase 4 (purple): the backbone is frozen (∇θ_bb = 0, indicated by hatching) while a porosity MLP encoder drives FiLM conditioning (x̃ = γ_ϕ ⊙ x + β_ϕ) at Stages 2–4; Stochastic Weight Averaging (SWA) and Exponential Moving Average (EMA) produce three evaluation candidates from a single training run. Output (far right): the predicted permeability tensor K̂ ∈ R^{2×2}, with representative parity plots shown for all four components (K_xx, K_xy, K_yx, K_yy).

2.3.1 Phase 2: Supervised Baseline with Progressive Unfreezing

Phase 2 establishes the supervised baseline using the ImageNet-pretrained MaxViT backbone combined with D4-equivariant augmentation (Section 2.5.1). To prevent catastrophic forgetting of pretrained representations, the trainable parameter set is expanded progressively over 600 epochs: only the regression head is optimized during epochs 0–50; backbone Stages 3–4 are unfrozen during epochs 50–150; and full end-to-end fine-tuning proceeds from epoch 150 onward. This curriculum allows the randomly initialized regression head to adapt to the permeability prediction task before backbone weights are updated, reducing the risk of destabilizing pretrained features. The physics-aware loss (Section 2.4) is applied with λ_sym = 0.1 and λ_pos = 0.05; the off-diagonal prioritization term is not active in this phase.

2.3.2 Phase 3: Advanced Augmentation and Optimized Loss

Phase 3 initializes from the Phase 2 best checkpoint and trains for an additional 600 epochs.
Three supplementary augmentation classes are introduced at carefully tuned intensities (Section 2.5.2). The loss function is enhanced with a doubled symmetry weight (λ_sym = 0.2) and the off-diagonal prioritization term L_offdiag (Section 2.4), which applies a 1.5× weight to K_xy and K_yx predictions, addressing the performance disparity between diagonal and off-diagonal components observed in Phase 2. The increased symmetry weight couples the learning dynamics of K_xy and K_yx, allowing gradients computed on either element to inform the other.

2.3.3 Phase 4: Physics-Informed Conditioning with Weight Averaging

Phase 4 initializes from the Phase 3 best checkpoint with the MaxViT backbone frozen throughout. Only the newly introduced porosity encoder, FiLM layers, and regression head are optimized (0.34M parameters; 0.3% of total). The frozen-backbone strategy offers three key properties: stability, eliminating the risk of corrupting learned pore-scale geometry representations through further gradient updates; efficiency, as backpropagation bypasses frozen layers, substantially reducing memory consumption and training time; and interpretability, ensuring that any performance improvement is attributable solely to the porosity conditioning mechanism.

Two complementary weight-averaging strategies improve generalization by exploiting flatter, wider regions of the loss landscape:

Stochastic Weight Averaging (SWA) [74]. From epoch 400 onward, a running arithmetic mean of model weights is maintained at a fixed learning rate of 5 × 10⁻⁶, aggregating 200 snapshots without additional training cost. After training, batch-normalization statistics are recalibrated via a single pass through the training set to restore accuracy.

Exponential Moving Average (EMA) [80].
A continuously updated exponentially weighted parameter average is maintained throughout training with decay λ_EMA = 0.9999, corresponding to an effective window of approximately 10,000 gradient steps. EMA filters high-frequency weight fluctuations while tracking the optimization trajectory, consistently outperforming instantaneous weights on validation data.

Phase 4 training produces three evaluation candidates sharing the same architecture: the best instantaneous checkpoint θ_best, the SWA-averaged model θ_SWA, and the EMA-averaged model θ_EMA. All three are evaluated independently with test-time D4 augmentation to avoid selection bias. Full mathematical formulations, update rules, and batch-normalization recalibration details appear in Appendix F.

2.4 Physics-Aware Loss Function

The training objective combines four terms that jointly enforce predictive accuracy and physical validity:

    L_perm = L_MSE + λ_sym L_sym + λ_pos L_pos + λ_offdiag L_offdiag.  (14)

Phase-specific weight values are reported in Appendix D.

Reconstruction loss. The primary term penalizes tensor-level prediction error via the Frobenius norm:

    L_MSE = (1/N) Σ_{i=1}^{N} ‖K_i^pred − K_i^true‖²_F.  (15)

Symmetry constraint. Tensor symmetry (K_xy = K_yx) is enforced through a soft quadratic penalty on the off-diagonal discrepancy:

    L_sym = (1/N) Σ_{i=1}^{N} (K_xy,i^pred − K_yx,i^pred)².  (16)

Although ground-truth labels satisfy symmetry to near-machine precision (Table 3), unconstrained regression does not inherently learn this property, motivating explicit soft enforcement. The weight λ_sym is increased from 0.1 in Phase 2 to 0.2 in Phases 3–4, coupling the learning dynamics of K_xy and K_yx and improving overall off-diagonal accuracy.

Positivity constraint. A squared hinge loss with margin ε = 0.001 enforces strictly positive diagonal elements:

    L_pos = (1/N) Σ_{i=1}^{N} [max(0, ε − K_xx,i^pred)² + max(0, ε − K_yy,i^pred)²].  (17)

The penalty activates only for sub-margin predictions, providing a directed gradient signal without penalizing physically valid outputs. The non-zero margin ensures predicted permeabilities exceed zero by a meaningful amount, improving numerical stability in downstream flow simulations.

Off-diagonal prioritization. A component-weighted mean squared error amplifies gradients for the more challenging off-diagonal elements:

    L_offdiag = (1/N) Σ_{i=1}^{N} [w_d (K_xx,i^err)² + w_o (K_xy,i^err)² + w_o (K_yx,i^err)² + w_d (K_yy,i^err)²],  (18)

where K_jk,i^err = K_jk,i^pred − K_jk,i^true, w_d = 1.0, and w_o = 1.5. This targeted emphasis addresses a fundamental prediction difficulty: diagonal permeabilities K_xx and K_yy correlate strongly with local pore geometry directly accessible through convolutional features, whereas off-diagonal elements K_xy and K_yx encode subtle geometric anisotropies requiring global spatial integration, precisely the regime in which the multi-axis attention mechanism provides its principal advantage. The 50% weight increase provides a stronger learning signal for off-diagonal components without overwhelming the overall loss landscape.

2.5 Data Augmentation Strategy

2.5.1 Phase 2: D4-Equivariant Augmentation

For isotropic porous media, the permeability tensor is invariant under the dihedral group D4, comprising eight elements: the identity, rotations by 90°, 180°, and 270°, and reflections about the horizontal, vertical, diagonal, and anti-diagonal axes [81, 82]. Each training sample undergoes a uniformly random D4 transformation at every epoch, multiplying the effective dataset size by a factor of eight at no additional labeling cost.
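The four-term objective of Section 2.4 (Eqs. 14–18) can be sketched in PyTorch. The flattened component order [K_xx, K_xy, K_yx, K_yy] and the unit weight on L_offdiag are assumptions; the text defers phase-specific weight values to Appendix D.

```python
import torch

def physics_aware_loss(K_pred, K_true,
                       lam_sym=0.2, lam_pos=0.05, lam_off=1.0,
                       eps=0.001, w_d=1.0, w_o=1.5):
    """Sketch of Eq. (14); tensors have shape (N, 4) = [Kxx, Kxy, Kyx, Kyy]."""
    err = K_pred - K_true
    # Eq. (15): squared Frobenius norm = sum of squared components.
    l_mse = (err ** 2).sum(dim=1).mean()
    # Eq. (16): soft quadratic penalty on Kxy - Kyx.
    l_sym = ((K_pred[:, 1] - K_pred[:, 2]) ** 2).mean()
    # Eq. (17): squared hinge on diagonal positivity with margin eps.
    l_pos = (torch.clamp(eps - K_pred[:, 0], min=0.0) ** 2
             + torch.clamp(eps - K_pred[:, 3], min=0.0) ** 2).mean()
    # Eq. (18): component-weighted MSE prioritizing off-diagonal elements.
    w = torch.tensor([w_d, w_o, w_o, w_d])
    l_off = (w * err ** 2).sum(dim=1).mean()
    return l_mse + lam_sym * l_sym + lam_pos * l_pos + lam_off * l_off

K_true = torch.tensor([[1.0, 0.1, 0.1, 0.8]])
loss = physics_aware_loss(K_true.clone(), K_true)
# A perfect, symmetric, positive-diagonal prediction incurs zero loss.
```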
Physically consistent augmentation requires simultaneous transformation of both the image and the permeability tensor. Under a coordinate transformation represented by matrix P, the second-order permeability tensor transforms as:

    K′ = P K Pᵀ.  (19)

For example, the 90° rotation matrix P_90 = [[0, −1], [1, 0]] yields:

    K′ = [[K_yy, −K_yx], [−K_xy, K_xx]].  (20)

Transformation matrices for all eight D4 elements are tabulated in Appendix E. Because these symmetries are exact for isotropic media, D4 augmentation constitutes physically rigorous regularization rather than an approximation. The equivariance achieved is statistical rather than architectural; group-equivariant convolutional networks [70, 72], which build exact equivariance into the weight-sharing structure, were not adopted because doing so would require training from scratch and discarding the ImageNet pretraining central to the transfer-learning strategy employed here. A detailed discussion of this design choice appears in Appendix E.2.

2.5.2 Phase 3: Advanced Augmentation

Three additional transformation classes are applied sequentially after D4 augmentation, at intensities determined by a two-stage validation-set grid search (Appendix E):

Morphological operations. Erosion and dilation with a 3 × 3 structuring element, each applied with probability 0.08, simulate natural variability in pore-throat geometry. No explicit tensor transformation is required, as the network learns the permeability of the morphologically modified geometry through convolutional feature extraction.

Elastic deformation. Smooth coordinate warping is generated from Gaussian-smoothed uniform random displacement fields, with displacement magnitude α = 3.0 pixels, smoothing σ = 2.0 pixels, and application probability 0.08.
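The label transformation of Eqs. (19)–(20) is a one-liner in NumPy; a minimal sketch for the 90° rotation, with an illustrative symmetric tensor (values are not from the dataset):

```python
import numpy as np

# Eq. (19): second-order tensor transformation K' = P K P^T.
def transform_tensor(K, P):
    return P @ K @ P.T

P90 = np.array([[0.0, -1.0],
                [1.0,  0.0]])          # 90-degree rotation matrix from the text
K = np.array([[2.0, 0.3],
              [0.3, 1.5]])             # illustrative symmetric tensor (lattice units)

K_rot = transform_tensor(K, P90)
# Eq. (20): rotation swaps the diagonal entries and negates the off-diagonals,
# i.e. K_rot = [[Kyy, -Kyx], [-Kxy, Kxx]].
```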
Elastic deformation requires an explicit permeability tensor transformation according to the local deformation Jacobian J:

    K′ = J K Jᵀ,   J = [[1 + ∂_x δ_x, ∂_y δ_x], [∂_x δ_y, 1 + ∂_y δ_y]],  (21)

approximated at the image center via central finite differences, followed by symmetrization K′ ← (K′ + K′ᵀ)/2.

Cutout masking. Random 8 × 8 pixel rectangular regions are masked to a neutral value of 0.5 with probability 0.10, forcing distributed feature learning by preventing reliance on any single spatial region. No tensor transformation is required.

Augmentation parameters were optimized by targeting off-diagonal R² as the primary criterion, as diagonal components proved robustly learned across the full search range while off-diagonal prediction was the most sensitive indicator of augmentation quality. Total combined perturbation probability is held below approximately 35% to avoid gradient variance that would impede off-diagonal convergence. The full parameter search procedure and pipeline pseudocode appear in Appendix E.

2.6 Implementation and Evaluation

2.6.1 Implementation Details

All models were implemented in PyTorch 2.6.0 using the timm 1.0.22 library and trained on a single NVIDIA RTX 6000 Ada Generation GPU (48 GB VRAM) under CUDA 12.4. Automatic mixed precision was disabled across all phases to ensure numerical stability in physics-constraint calculations and maintain gradient precision for the small trainable parameter set of Phase 4. Random seeds were fixed at 42 for NumPy, PyTorch, and Python's random module to ensure reproducibility. Deterministic CuDNN algorithms were not enforced to allow benchmark-mode acceleration, providing approximately 15% training speedup without observable impact on cross-run reproducibility.

Training follows a sequential protocol across phases. Phase 2 trains from ImageNet-pretrained weights for 600 epochs.
Phase 3 initializes from the Phase 2 best checkpoint for a further 600 epochs. Phase 4 initializes from the Phase 3 best checkpoint with the backbone frozen, training only the porosity encoder, FiLM layers, and regression head for 600 epochs. Phase 4 model variants (θ_best, θ_SWA, θ_EMA) share identical architecture and forward-pass computation graphs; weight averaging affects only the parameter values loaded at inference and introduces no additional computational cost. All Phase 4 variants achieve an inference latency of approximately 120 ms per sample on the NVIDIA RTX 6000 Ada (mean over 1,000 independent predictions), compared to hours per sample for LBM simulation. Complete optimizer configurations, learning-rate schedules, batch sizes, and early-stopping criteria for each phase are reported in Appendix D.

2.6.2 Evaluation Metrics

Model performance is assessed through four complementary metric classes.

Coefficient of determination (R²). The primary accuracy metric measures the proportion of variance explained:

    R² = 1 − [Σ_{i=1}^{N} (y_i^pred − y_i^true)²] / [Σ_{i=1}^{N} (y_i^true − ȳ)²],  (22)

reported as an overall variance-weighted score across all four tensor components and as per-component values for [K_xx, K_xy, K_yx, K_yy].

Mean squared error (MSE). Absolute tensor-level prediction error is quantified via the Frobenius norm (Equation 15).

Relative root mean squared error (RRMSE). To enable direct comparison between the large-magnitude diagonal and near-zero-mean off-diagonal components, scale-invariant relative error is reported per tensor component:

    RRMSE_jk = sqrt[(1/N) Σ_i (K_jk,i^pred − K_jk,i^true)²] / K̄_jk^true,  (23)

normalized by the mean absolute test-set value K̄_jk^true. A value of, say, 5% on K_xx corresponds directly to a 5% uncertainty in the principal flow resistance.

Physical validity metrics.
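The two headline metrics of Eqs. (22)–(23) can be sketched directly in NumPy; the function names are illustrative.

```python
import numpy as np

def r2_score(y_true, y_pred):
    """Coefficient of determination, Eq. (22)."""
    ss_res = np.sum((y_pred - y_true) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

def rrmse(K_true, K_pred):
    """Relative RMSE, Eq. (23): RMSE normalized by the mean absolute true value."""
    rmse = np.sqrt(np.mean((K_pred - K_true) ** 2))
    return rmse / np.mean(np.abs(K_true))

y = np.array([0.5, 1.0, 2.0, 3.0])
# A perfect prediction explains all variance and has zero relative error.
print(r2_score(y, y), rrmse(y, y))
```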
Symmetry is quantified by the mean symmetry error ε_sym = E[|K_xy^pred − K_yx^pred|] and the fraction of test predictions satisfying |K_xy − K_yx| < 10⁻³. Positivity is assessed by the fraction of predictions satisfying K_xx, K_yy > 0. The reported ε_sym = 3.95 × 10⁻⁷ represents engineering-level precision far exceeding practical requirements, though not machine precision in the strict floating-point sense (∼10⁻¹⁶ for FP64); the term "near-machine precision" is therefore used in this context. Extended metrics for cross-study comparison, including mean absolute percentage error (MAPE), range-normalized RMSE, Willmott's index of agreement, the Kling-Gupta efficiency, and Spearman rank correlation, are defined in Appendix G.

3 Results and Discussion

The progressive training framework is evaluated on a held-out test set of 4,000 porous media samples spanning three orders of magnitude in permeability (K ∈ [0.002, 3.19] in lattice units), never accessed during model development, hyperparameter selection, or any training decision. Three training phases are assessed in sequence: Phase 2 establishes the supervised baseline with ImageNet-pretrained MaxViT and D4-equivariant augmentation; Phase 3 introduces advanced augmentation and the enhanced physics-aware loss; and Phase 4 applies frozen-backbone transfer learning with porosity-conditioned FiLM layers and ensemble weight averaging. Each phase is evaluated independently, enabling attribution of performance gains to specific methodological contributions. The evaluation protocol, metrics, and physical validity criteria are defined in Section 2.6.2; comprehensive dataset statistics and cross-split consistency verification are reported in Appendix A.
3.1 Dataset Overview and Structure-Property Relationships

Figure 2 presents representative binary microstructures spanning the porosity and anisotropy spectrum, illustrating the geometric diversity of the dataset. Low-porosity samples (ϕ < 0.40) exhibit dense solid matrices with isolated, tortuous pore channels and correspondingly small permeabilities; moderate-porosity samples (0.40 ≤ ϕ ≤ 0.70) display increasingly interconnected pore networks and non-negligible off-diagonal coupling; and high-porosity samples (ϕ > 0.70) feature well-connected, nearly isotropic networks with large diagonal permeabilities.

Figure 3 presents component-wise porosity-permeability relationships for all 20,000 labeled samples, revealing the fundamental structure-property correlations that govern model learning. Diagonal permeabilities K_xx and K_yy exhibit strong positive correlation with porosity (Spearman ρ_s ≈ +0.940, Pearson r ≈ +0.795, p ≈ 0), confirming that pore volume fraction governs primary transport capacity and motivating the porosity-conditioned FiLM design in Phase 4 (Section 2.2.2). The joint distributions reveal left-skewed marginal porosity (skewness ≈ −0.55, concentrated in the moderate-to-high connectivity regime) and heavy-tailed component distributions spanning three orders of magnitude. Off-diagonal elements K_xy and K_yx exhibit negligible correlation with porosity (ρ_s ≈ +0.01, not significant), confirming that cross-directional coupling arises solely from geometric anisotropy and is independent of pore volume fraction. This dissociation is a key dataset characteristic: diagonal and off-diagonal permeabilities encode physically distinct information, the former captured by scalar porosity and the latter requiring spatial integration of geometric anisotropy features, which directly motivates the component-specific loss weighting introduced in Phase 3 (Section 2.4). The symmetric clustering of off-diagonal components about zero (mean |K̄_xy| < 0.002, kurtosis ≈ 9–13) confirms ground-truth tensor symmetry, and all 20,000 labeled tensors are strictly positive-definite, providing thermodynamically consistent training labels free of simulation artifacts.

Figure 2: Representative binary porous media microstructures spanning the porosity and anisotropy spectrum. Each panel shows a 128 × 128 binary image (white: solid matrix; black: void space) with corresponding porosity ϕ and permeability tensor components. The progression from low to high porosity illustrates the transition from tortuous isolated pore channels (small diagonal permeability, negligible off-diagonal coupling) to well-connected isotropic networks (large, nearly equal K_xx, K_yy). Samples with significant off-diagonal components (|K_xy| > 0.05) are associated with oblique pore-channel orientations rather than porosity magnitude, consistent with the near-zero porosity-to-off-diagonal correlation (ρ_s ≈ 0.01) reported in Appendix A.3.

The dataset presents three distinct challenges for permeability prediction. First, the three-order-of-magnitude dynamic range in diagonal permeability demands model capacity to resolve both low-connectivity sparse networks and high-connectivity continuum-limit behavior across the full spectrum simultaneously. Second, the off-diagonal components, independent of porosity and encoded exclusively in spatial geometric anisotropy, require global feature integration at scales comparable to the representative elementary volume (REV), a regime where convolutional features alone are insufficient.
Third, the five-fold scale disparity between diagonal and off-diagonal standard deviations (≈ 5.1× across all splits; Appendix A.1) creates an implicit loss-weighting imbalance in naive MSE training, motivating explicit off-diagonal prioritization. These challenges are addressed directly by the architectural and training design decisions described in Section 2, and the following subsections assess the extent to which each phase of training resolves them.

3.2 Phase 2: Supervised Baseline with D4-Equivariant Augmentation

Phase 2 establishes the supervised baseline by fine-tuning ImageNet-pretrained MaxViT with progressive unfreezing and D4-equivariant augmentation (Section 2.3.1). The model achieves an overall variance-weighted R² = 0.9843 on the 4,000-sample test set, with RMSE = 1.796 × 10⁻², MAE = 1.091 × 10⁻², Pearson r = 0.9985, Spearman ρ_s = 0.9957, Willmott d = 0.9992, and KGE = 0.988, confirming that ImageNet-pretrained features transfer effectively to binary porous media despite the substantial domain difference. Table 1 reports the full component-wise metric suite.

Figure 3: Component-wise porosity-permeability relationships across all 20,000 labeled samples, shown as hexbin density plots with marginal histograms. Color intensity indicates sample density. (a) K_xx versus ϕ: strong positive correlation (ρ_s = +0.940, r = +0.795), spanning three orders of magnitude; scatter increases at high porosity, reflecting geometric diversity beyond scalar pore fraction. (b) K_yy versus ϕ: nearly identical statistics and correlation structure to K_xx, confirming statistical isotropy at the dataset level despite individual-sample anisotropy. (c) K_xy versus ϕ: negligible correlation (ρ_s ≈ +0.01, p > 0.05); off-diagonal coupling clusters symmetrically near zero, independent of pore volume fraction. (d) K_yx versus ϕ: identical to K_xy, confirming near-machine-precision ground-truth symmetry (ε_sym < 8 × 10⁻⁶, Table 3). The contrast between panels (a,b) and (c,d) constitutes the fundamental prediction-difficulty asymmetry between diagonal and off-diagonal tensor components that the progressive training strategy is designed to address.

3.2.1 Component-Wise Performance Asymmetry

The component-wise parity plots in Figure 4 are the primary diagnostic result of Phase 2. Diagonal components achieve R²_Kxx = 0.9961 and R²_Kyy = 0.9962, with RRMSE of 5.77% and 5.70% respectively. Off-diagonal components achieve R²_Kxy = R²_Kyx = 0.9725, a gap of 236 basis points relative to diagonal performance, with RRMSE of 26.07%, approximately 4.5× larger than for diagonal elements. The very low KGE values for off-diagonal components (0.176, compared to ≈ 0.987 for diagonal) further characterize this asymmetry: KGE decomposition reveals that the deficit arises primarily from a poor variability ratio, meaning the model systematically under-disperses predictions for K_xy and K_yx, compressing the predicted distribution toward zero relative to the ground truth. The computed MAPE values were not meaningful for off-diagonal components because the MAPE denominator approaches zero for near-zero-mean quantities; RRMSE is the appropriate relative-error metric in this regime (Appendix G).

Table 1: Phase 2 test-set performance by tensor component. RRMSE: relative root mean squared error (normalized by component mean absolute value; Equation 23). MAPE values for off-diagonal components are large because the denominator (mean absolute true value) approaches zero; RRMSE is the appropriate relative-error metric for near-zero-mean components. KGE: Kling-Gupta efficiency [83]. d: Willmott index of agreement [84].

Component | R²     | RMSE         | MAE          | RRMSE (%) | KGE   | d
K_xx      | 0.9961 | 2.147 × 10⁻² | 1.430 × 10⁻² | 5.77      | 0.986 | 0.9990
K_yy      | 0.9962 | 2.124 × 10⁻² | 1.394 × 10⁻² | 5.70      | 0.989 | 0.9990
K_xy      | 0.9725 | 1.095 × 10⁻² | 7.70 × 10⁻³  | 26.07     | 0.176 | 0.9930
K_yx      | 0.9725 | 1.095 × 10⁻² | 7.70 × 10⁻³  | 26.07     | 0.176 | 0.9930
Overall   | 0.9843 | 1.796 × 10⁻² | 1.091 × 10⁻² | 8.19      | 0.988 | 0.9992

3.2.2 Anisotropy-Stratified Analysis

Figure 5 stratifies prediction performance by tensor anisotropy ratio AR = λ_max/λ_min across ten equi-populated bins (400 samples each) spanning the 2nd–98th percentile of the AR distribution. Panel (a) confirms that the test set is predominantly near-isotropic (median AR = 1.624; Appendix A.1), with a long, sparsely populated tail: approximately 4% of samples have AR > 5 and only 1% exceed AR = 10.0 (p99). Panels (b) and (c) reveal a pattern that is the inverse of naive expectation. The diagonal-to-off-diagonal R² gap is largest not at high AR but at the lowest anisotropy ratios: for the first bin (AR ≈ 1.11), diagonal R² remains excellent at 0.996 while off-diagonal R² collapses to 0.802, a gap of ∆R² = 0.194. This gap narrows rapidly with increasing AR, reaching ∆R² < 0.02 for AR > 1.5 and a minimum of 0.011 near AR ≈ 1.89. A modest secondary widening occurs at the highest AR bin (AR ≈ 5.87; ∆R² = 0.027), where off-diagonal R² declines to 0.960, consistent with the high-AR analysis from Appendix H.4 (R²_off = 0.968 for the top 12.5% most anisotropic samples).

The collapse of off-diagonal R² at near-isotropic AR has two compounding causes. First, a variance normalization effect: when AR is close to unity, true K_xy values are near zero with very low variance, so even small absolute prediction errors drive R² toward zero because the denominator of Equation 22 is near zero.
Second, a genuine learning difficulty: the model must resolve extremely subtle geometric asymmetries in nearly isotropic media to distinguish, say, K_xy = 0.01 from K_xy = −0.01, a task requiring sensitivity to weak spatial orientation cues that are easily obscured by augmentation noise. At high AR, the larger absolute off-diagonal coupling values provide a stronger learning signal, allowing the model to resolve cross-directional flow paths more reliably; accordingly, R²_off at AR ≈ 5.87 (0.960) substantially exceeds the lowest-bin value (0.802) despite the greater geometric complexity. Panel (d) further corroborates this interpretation: diagonal residuals are compact and nearly symmetric across all AR bins, while off-diagonal residual distributions are widest at low AR, where coupling values are near-zero and noisiest.

Taken together, these results identify the prediction of weak off-diagonal coupling in near-isotropic media, rather than the prediction of strongly anisotropic tensors, as the primary Phase 2 failure mode. This motivates the Phase 3 off-diagonal prioritization term L_offdiag (Section 2.4), which amplifies gradients for K_xy and K_yx regardless of their absolute magnitude, and the calibrated augmentation strategy (Section 2.5.2), which preserves the weak orientation signals in near-isotropic samples.
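The anisotropy ratio AR = λ_max/λ_min used throughout this stratified analysis follows from an eigendecomposition of the (symmetrized) tensor; a minimal NumPy sketch with illustrative tensor values:

```python
import numpy as np

def anisotropy_ratio(K):
    """AR = lambda_max / lambda_min of the symmetrized 2x2 permeability tensor."""
    K_sym = 0.5 * (K + K.T)              # enforce symmetry before eigendecomposition
    eigvals = np.linalg.eigvalsh(K_sym)  # ascending eigenvalues of a symmetric matrix
    return eigvals[-1] / eigvals[0]

K_iso = np.array([[1.0, 0.0], [0.0, 1.0]])     # perfectly isotropic: AR = 1
K_aniso = np.array([[2.0, 0.3], [0.3, 0.5]])   # illustrative anisotropic tensor

print(anisotropy_ratio(K_iso))    # 1.0
print(anisotropy_ratio(K_aniso))  # > 1 for any anisotropic tensor
```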
Figure 4: Component-wise parity plots for Phase 2 permeability tensor predictions. Each panel shows predicted versus true values for one tensor component, with hexbin color intensity proportional to log-density. Marginal histograms compare the distributions of true (gray) and predicted (colored) values. (a) K_xx (R² = 0.9961, RRMSE = 5.77%): predictions closely track the identity line across the full three-order-of-magnitude range; slight scatter at extreme values reflects the sparsely populated tails of the training distribution. (b) K_yy (R² = 0.9962, RRMSE = 5.70%): nearly identical accuracy to K_xx, confirming statistical isotropy in the backbone's ability to resolve principal permeabilities. (c) K_xy (R² = 0.9725, RRMSE = 26.07%): increased scatter around the identity line at large |K_xy| values, with the marginal histogram revealing systematic under-dispersion (KGE = 0.176). (d) K_yx (R² = 0.9725, RRMSE = 26.07%): essentially identical to K_xy, confirming that the symmetry constraint K_xy ≈ K_yx is learned accurately (ε_sym = 1.39 × 10⁻⁶, near-machine precision).
The 236-basis-point R² gap between diagonal and off-diagonal components drives all subsequent Phase 3 interventions.

3.2.3 Physical Constraint Validation

Physical validity is enforced to near-machine precision. The mean symmetry error is ε_sym = 1.39 × 10⁻⁶ across all 4,000 test predictions, exceeding engineering requirements by several orders of magnitude and confirming that the symmetry penalty L_sym (Equation 16) successfully couples the learning dynamics of K_xy and K_yx through the soft constraint. All 4,000 test tensors satisfy the positivity constraint (K_xx, K_yy > 0), yielding 100% thermodynamic validity without post-hoc projection. These physical constraint results are consistent across all subsequent training phases.

Figure 5: Anisotropy-stratified prediction performance for Phase 2 across ten equi-populated bins (n = 400 each) of tensor anisotropy ratio AR = λ_max/λ_min. (a) AR distribution of the 4,000 test samples (log-scale x-axis); vertical dashed lines mark the 50th, 90th, and 99th percentiles (AR₅₀ = 1.62, AR₉₀ = 3.34, AR₉₉ = 10.0). (b) R² for diagonal (mean of K_xx, K_yy) and off-diagonal (mean of K_xy, K_yx) components versus AR bin center: the diagonal-to-off-diagonal gap is largest at the lowest anisotropy ratios (∆R² = 0.194 at AR ≈ 1.11), narrows rapidly with increasing AR (∆R² < 0.02 for AR > 1.5), and exhibits a modest secondary widening at high AR (∆R² = 0.027 at AR ≈ 5.87). This non-monotonic pattern identifies weak off-diagonal coupling in near-isotropic media as the dominant Phase 2 failure mode. (c) RMSE versus AR bin center for all four components; off-diagonal RMSE is elevated at both extremes of the AR spectrum while diagonal RMSE remains largely insensitive throughout. (d) Residual distributions per component (violin plots); off-diagonal interquartile ranges are widest at low AR, where near-zero coupling values produce low-variance targets that amplify the relative impact of prediction errors.

3.2.4 Mechanistic Interpretation and Motivation for Phase 3

The Phase 2 results expose a fundamental hierarchy in the permeability prediction task. Diagonal permeabilities correlate strongly with scalar porosity (ρ_s ≈ +0.940; Appendix A.3) and are predominantly determined by local pore-throat geometry, making them accessible to the hierarchical convolutional features of the MaxViT backbone. Off-diagonal components encode geometric anisotropy that is independent of porosity (ρ_s ≈ +0.01) and requires integration of orientation-dependent spatial correlations at the REV scale. The anisotropy-stratified analysis (Figure 5) reveals that off-diagonal prediction difficulty is concentrated at near-isotropic AR, where weak coupling signals are hardest to resolve, rather than at high AR, where stronger coupling magnitudes provide a clearer learning signal. Two mechanisms compound this difficulty in the near-isotropic regime. First, the diagonal-to-off-diagonal standard deviation ratio of ≈ 5.1× (Appendix A.1) causes naive MSE training to under-weight off-diagonal gradients relative to diagonal terms.
Second, the D4-equivariant augmentation, while physically correct, exposes the model to all orientations of each microstructure, which may dilute the subtle orientation-specific cues on which weak off-diagonal coupling depends.

These observations motivate two targeted Phase 3 interventions: augmentations calibrated to enrich geometric-anisotropy diversity without obscuring global connectivity patterns (Section 2.5.2), and explicit off-diagonal prioritization through loss weight w_o = 1.5 (Section 2.4). The progressive framework thereby functions as a structured ablation: Phase 2 isolates ImageNet pretraining and D4-equivariant augmentation; Phase 3 adds advanced augmentation and an enhanced loss; Phase 4 adds frozen-backbone porosity conditioning and ensemble weight averaging, enabling unambiguous attribution of each performance gain.

3.3 Phase 3: Advanced Augmentation and Off-Diagonal Loss Optimization

Building on the Phase 2 diagnostic insights, Phase 3 introduces three targeted interventions to address the near-isotropic off-diagonal failure mode identified in Section 3.2.2: morphological operations (erosion and dilation, p = 0.08), elastic deformation (p = 0.08, α = 3.0, σ = 2.0), and random cutout (p = 0.10, 8 × 8 pixels), combined with an enhanced physics-aware loss incorporating a doubled symmetry weight (λ_sym = 0.2) and off-diagonal prioritization (w_o = 1.5). The combined augmentation rate of ≈ 24% was validated on the validation set to lie below the ≈ 35% threshold beyond which gradient variance degrades off-diagonal convergence while leaving diagonal learning intact (Section 2.5.2).

Phase 3 achieves a variance-weighted R² = 0.9945 (component-averaged R² = 0.9848, 95% BCa CI [0.9833, 0.9860]) on the held-out test set, with RMSE = 1.697 × 10⁻² (a 5.5% reduction from Phase 2's 1.796 × 10⁻²), RRMSE = 8.19%, Willmott d = 0.9992, and KGE = 0.9876. Full metrics are reported in Table 2.

3.3.1 Component-Wise Gains

Diagonal components improve modestly from Phase 2 to Phase 3 (R²_diag: 0.9955 → 0.9961), confirming that the more aggressive augmentation pipeline does not erode the strong diagonal baseline. The primary gain is in the off-diagonal components (R²_off: 0.9725 → 0.9740), with the diagonal-to-off-diagonal R² gap narrowing from ΔR² = 0.0230 to 0.0221. The improvement is more pronounced in the high-anisotropy regime: for the 500 most anisotropic test samples (top 12.5%), the high-AR ΔR² drops from 0.0209 to 0.0132, a 37% reduction, indicating that augmentation-induced geometric diversity and stronger off-diagonal gradients specifically benefit the regime where Phase 2 struggled most.

Physical constraint enforcement tightens in parallel: the mean symmetry error falls from 1.39 × 10⁻⁶ to 7.69 × 10⁻⁷ (a 45% reduction), attributable to the doubled symmetry weight λ_sym = 0.2 coupling the learning dynamics of K_xy and K_yx more strongly. All 4,000 test predictions remain 100% positive-definite.

3.3.2 Residual Limitations and Motivation for Phase 4

Despite consistent gains across all metrics, a non-trivial off-diagonal performance gap persists (ΔR² = 0.0221), and the near-isotropic failure mode is attenuated rather than eliminated. Two structural limitations remain. First, augmentation alone cannot supply information about the scalar porosity-to-permeability scaling relationship that the backbone does not already encode implicitly. Second, training all parameters simultaneously risks entangling the well-learned diagonal representations with the more volatile off-diagonal gradient signal.
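The loss structure described above (component weighting with w_o = 1.5, symmetry coupling with λ_sym = 0.2, and a positivity penalty with threshold ε = 0.001) can be sketched in a few lines. This is one plausible NumPy realization, not the paper's implementation: the exact penalty forms are Equations 16 and 17, and the positivity weight lam_pos here is an illustrative assumption.

```python
import numpy as np

def physics_aware_loss(K_pred, K_true, w_off=1.5, lam_sym=0.2, lam_pos=0.1, eps=1e-3):
    """Component-weighted, physics-aware loss for 2x2 permeability tensors,
    batched as (N, 4) in the order (Kxx, Kxy, Kyx, Kyy).

    - w_off prioritizes off-diagonal gradients (counters their ~5x smaller std)
    - lam_sym couples Kxy and Kyx (soft Onsager reciprocity)
    - lam_pos (illustrative value) hinges diagonal entries above eps
    """
    w = np.array([1.0, w_off, w_off, 1.0])
    mse = np.mean(w * (K_pred - K_true) ** 2)
    sym = np.mean((K_pred[:, 1] - K_pred[:, 2]) ** 2)   # symmetry penalty
    pos = np.mean(np.maximum(0.0, eps - K_pred[:, 0]) ** 2
                  + np.maximum(0.0, eps - K_pred[:, 3]) ** 2)
    return mse + lam_sym * sym + lam_pos * pos
```

Because the symmetry and positivity terms vanish for physically valid predictions, the loss reduces to weighted MSE at the optimum, which is why valid tensors can emerge without post-hoc projection.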
Phase 4 addresses both limitations through a frozen-backbone strategy with explicit porosity conditioning via FiLM layers (Section 2.3.3), providing a zero-cost physical prior for magnitude scaling while protecting the learned geometric representations from further modification.

3.4 Phase 4: Physics-Informed Transfer Learning with Ensemble Techniques

The core design principle of Phase 4 is deliberate separation of what is already known from what must still be learned. The MaxViT backbone (118.64 M parameters) is frozen in its Phase 3 configuration: rather than continuing to fine-tune geometric features that already generalize well, the training budget is redirected toward encoding a physical prior that the backbone cannot discover from images alone, namely the strong porosity-to-permeability scaling relationship (ρ_s ≈ +0.940; Appendix A.3). A compact physics-informed head (0.33 M parameters, 0.28% of the total parameter budget) achieves this through a three-layer MLP porosity encoder producing a 64-dimensional embedding of scalar ϕ, which modulates backbone features via Feature-wise Linear Modulation (FiLM) layers inserted after MaxViT Stages 2–4 (Section 2.2.2). Stochastic Weight Averaging (SWA, epochs 400–600, η_SWA = 5 × 10⁻⁶) and Exponential Moving Average (EMA, λ = 0.9999) then stabilize the optimization trajectory around a flatter loss minimum (Appendix F). The result is a model that encodes explicitly what physics dictates, while leaving the backbone free to supply what physics cannot: the geometric anisotropy information that determines off-diagonal coupling independently of porosity.

3.4.1 Prediction Accuracy and Statistical Robustness

Phase 4 achieves a variance-weighted R² = 0.9960 and component-averaged R² = 0.9863 (95% BCa CI [0.9852, 0.9874], B = 1,000) on the held-out test set: a 33% reduction in unexplained variance relative to the Phase 2 baseline and a further 13% relative to Phase 3. Global errors are RMSE = 1.564 × 10⁻² (95% BCa CI [0.01483, 0.01704]) and RRMSE = 7.55% (95% BCa CI [7.18, 8.19]%). The Kling-Gupta efficiency KGE = 0.9942 is particularly informative: it decomposes accuracy into correlation, bias, and variability components simultaneously, and its step-wise improvement across phases (0.9824 → 0.9876 → 0.9942) reflects a proportionately larger gain in Phase 3 → 4 than in Phase 2 → 3, consistent with the FiLM conditioning correcting the remaining systematic bias and variability mismatch rather than merely reducing scatter. Table 2 presents the full three-phase comparison with bootstrap confidence intervals; the extended version with per-component and uncertainty metrics is in Appendix J.

A notable property of the full extended table (Appendix J, Table 15) is that every reported metric, across all 35 rows spanning global accuracy, rank agreement, per-component R², relative errors, physics compliance, and uncertainty calibration, improves monotonically from Phase 2 through Phase 4. This consistency is not guaranteed by design: each phase optimizes a different objective configuration, and gains in one metric can in principle come at the expense of another. The absence of any such trade-off confirms that the staged methodological contributions (pretraining and equivariant augmentation; then augmentation diversity and loss rebalancing; then physics-informed conditioning and ensemble stabilization) are genuinely complementary rather than competing.

Per-component performance reinforces this picture. Diagonal components achieve R²_Kxx = R²_Kyy = 0.9967 with RRMSE of 5.25%–5.28% and MAPE of 7.09%–7.62%; KGE of 0.991–0.993 confirms that bias, variance ratio, and rank correlation all contribute negligibly to the residual error budget. Off-diagonal components reach R²_Kxy = R²_Kyx = 0.9758 (RRMSE = 24.43%), with KGE rising from 0.176 in Phase 2 to 0.761 in Phase 4, a 4.3× improvement that directly reflects recovery from the variability under-dispersion identified in Section 3.2.1. The diagonal-to-off-diagonal R² gap narrows monotonically: ΔR² = 0.0230 → 0.0221 → 0.0209 across Phases 2–4.

Table 2: Progressive performance across training phases on the held-out test set (N = 4,000). 95% BCa bootstrap confidence intervals (B = 1,000) in brackets; Phase 4 gives the best result in every row. Phase 2: supervised baseline with D4 augmentation; Phase 3: advanced augmentation and enhanced physics-aware loss; Phase 4: frozen backbone, FiLM porosity conditioning, SWA/EMA ensemble.

Metric                                     | Phase 4             | Phase 3             | Phase 2
Global accuracy
R² (variance-weighted)                     | 0.9960              | 0.9945              | 0.9940
R² (component-averaged)                    | 0.9863              | 0.9848              | 0.9843
  95% BCa CI                               | [0.9852, 0.9874]    | [0.9833, 0.9860]    | [0.9830, 0.9856]
RMSE                                       | 1.564 × 10⁻²        | 1.697 × 10⁻²        | 1.796 × 10⁻²
  95% BCa CI                               | [0.01483, 0.01704]  | [0.01617, 0.01794]  | [0.01694, 0.01948]
RRMSE (%)                                  | 7.55                | 8.19                | 8.67
  95% BCa CI                               | [7.18, 8.19]        | [7.83, 8.69]        | [8.18, 9.35]
Rank-based and agreement metrics
Spearman ρ_s                               | 0.9960              | 0.9959              | 0.9957
Willmott d                                 | 0.9994              | 0.9992              | 0.9991
KGE                                        | 0.9942              | 0.9876              | 0.9824
Tensor-type decomposition
R² diagonal                                | 0.9967              | 0.9961              | 0.9955
R² off-diagonal                            | 0.9758              | 0.9740              | 0.9725
ΔR² (diag − off)                           | 0.0209              | 0.0221              | 0.0230
ΔR², high-AR (top 500)                     | 0.0132              | 0.0132              | 0.0209
Physics compliance
Symmetry error ε_sym                       | 3.95 × 10⁻⁷         | 7.69 × 10⁻⁷         | 1.39 × 10⁻⁶
Positivity fraction (%)                    | 100.0               | 100.0               | 100.0
Predictive uncertainty (MC-Dropout, T = 100)
Mean calibration error (MCE)               | 0.3724              | 0.3754              | 0.3786
Global mean σ̄                              | 8.90 × 10⁻³         | 9.07 × 10⁻³         | 9.76 × 10⁻³

3.4.2 Component-Wise Parity and Prediction Distributions

Figure 6 presents the component-wise parity plots. Each panel combines a hexbin density map with marginal histograms, making both the conditional accuracy and the distributional fidelity visible simultaneously. The marginal histogram overlays are the most diagnostic element: they confirm that Phase 4 predictions replicate the ground-truth distribution shape, including the heavy off-diagonal tails (kurtosis ≈ 7–13; Appendix A.1), rather than collapsing toward the mean, which would inflate R² while misrepresenting the tails relevant for strongly anisotropic samples. Bias is negligible for all four components (absolute bias < 5 × 10⁻⁴), and the narrow scatter bands around the identity confirm uniform generalization across the full dynamic range.

3.4.3 Qualitative Assessment Across the Microstructure Spectrum

Figure 7 pairs representative binary microstructures with their ground-truth LBM tensors and Phase 4 predictions, providing a qualitative audit of generalization across the porosity and connectivity spectrum. The samples illustrate a physically grounded performance hierarchy that is not simply a function of permeability magnitude. At high porosity, abundant and well-connected flow paths create smooth, learnable structure-property mappings.
At moderate porosity, the multi-axis attention mechanism successfully integrates both local pore-throat geometry, which governs diagonal permeability magnitude, and long-range spatial orientation statistics, which govern off-diagonal coupling, across geometrically diverse configurations. The mean Frobenius error across the displayed samples is 0.018, and critically, no prediction is non-physical: the physics-informed training ensures that positive-definiteness and symmetry are satisfied even for the geometrically most challenging microstructures, without any post-hoc correction.

Figure 6: Component-wise parity plots for Phase 4 (hexbin density, marginal histograms). Color encodes log-density; marginals compare true (gray) and predicted (colored) distributions. (a) K_xx: R² = 0.9967, RRMSE = 5.25%, absolute bias < 1.3 × 10⁻⁴; predictions track the identity across three orders of magnitude without systematic deviation. (b) K_yy: R² = 0.9967, RRMSE = 5.28%; accuracy is symmetric between the two principal permeabilities, confirming that the MaxViT backbone resolves both flow directions with equal fidelity. (c) K_xy: R² = 0.9758, RRMSE = 24.43%; the predicted marginal distribution closely matches the ground truth, including the near-zero peak and heavy tails; KGE = 0.761, a 4.3× improvement over Phase 2 (0.176), marks the near-elimination of variability under-dispersion. (d) K_yx: statistics identical to K_xy; symmetry error ε_sym = 3.95 × 10⁻⁷ (95% BCa CI: [3.82, 4.10] × 10⁻⁷) confirms near-machine-precision Onsager reciprocity across the full off-diagonal magnitude range.

Figure 7: Phase 4 predictions across the porosity spectrum. Each panel shows the binary microstructure (128 × 128; white: solid grain, black: pore void), the ground-truth LBM permeability tensor, and the Phase 4 predicted tensor. At high porosity (ϕ > 0.70), well-connected pore networks yield smooth structure-property mappings; Frobenius errors fall below 5% and predicted condition numbers match ground truth to within 0.15. At moderate porosity (0.50 ≤ ϕ ≤ 0.70), tensor magnitude, principal-axis orientation, and off-diagonal coupling sign and magnitude are recovered accurately across geometrically diverse configurations, including cases with condition numbers exceeding 10. At low porosity (ϕ < 0.50), predictions correctly identify anisotropy direction and off-diagonal coupling sign; absolute magnitude uncertainty increases toward the lower tail of the training distribution (Appendix A.3). All displayed tensors are positive-definite with ε_sym < 6 × 10⁻⁶.

3.4.4 Anisotropy-Stratified Performance

Figure 8 stratifies Phase 4 predictions by tensor anisotropy ratio, presenting microstructure-tensor pairs from ten log-spaced AR bins spanning the 5th–95th percentile of the test AR distribution. The attribution of the high-anisotropy improvement is precise: the high-AR ΔR² for the top 500 most anisotropic samples drops sharply from 0.0209 (Phase 2) to 0.0132 (Phase 3), and then holds essentially constant at 0.0132 in Phase 4 (Table 2). Phase 3's augmentation diversity, which exposed the model to a richer range of anisotropic geometric configurations, was the decisive intervention for strongly anisotropic samples. Phase 4's contribution via FiLM conditioning is instead concentrated at the global level, reducing the overall ΔR² from 0.0221 to 0.0209 through improved magnitude scaling.
This clean separation of what each phase contributes, visible directly in the table, validates the structured ablation logic underlying the progressive training design. Across all ten AR bins, diagonal R² remains above 0.985, confirming that principal permeability prediction is robust to the full range of microstructure anisotropy in the dataset.

3.4.5 Porosity-Permeability Relationship and FiLM Conditioning Validation

Figure 9 examines the learned porosity-permeability relationship across all 4,000 test samples, providing the most direct validation of the FiLM conditioning rationale (Section 2.2.2). The predicted and true porosity-diagonal correlations are indistinguishable (r ≈ 0.795–0.800 in both cases), and the permeability trace achieves the highest scalar correlation in the dataset (r = 0.826), confirming that the FiLM-conditioned architecture has internalized the fundamental physical principle that pore volume fraction governs total transport capacity. The residual scatter at fixed ϕ (spanning nearly an order of magnitude at ϕ ≈ 0.70) reflects the 21% of diagonal permeability variance not captured by Kozeny-Carman scaling (Appendix A.3): the geometric complexity that FiLM conditioning cannot encode from porosity alone and that the spatial backbone must therefore supply.
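The FiLM mechanism itself is simple: the scalar porosity is encoded by a small MLP, projected to per-channel scale and shift parameters, and applied to a feature map as γ ⊙ F + β. The sketch below illustrates that mechanism in NumPy under stated assumptions (layer sizes follow the paper's three-layer encoder with a 64-dimensional embedding; all weight shapes and names are otherwise illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp(x, weights):
    """Tiny fully connected encoder with tanh hidden activations (illustrative)."""
    for W, b in weights[:-1]:
        x = np.tanh(x @ W + b)
    W, b = weights[-1]
    return x @ W + b

def film_modulate(features, phi, enc_weights, proj_W, proj_b):
    """Feature-wise Linear Modulation of a (C, H, W) feature map.

    A scalar porosity phi is encoded to a 64-dim embedding, projected to
    per-channel (gamma, beta), and applied channel-wise: gamma * F + beta.
    """
    z = mlp(np.array([[phi]]), enc_weights)   # (1, 64) porosity embedding
    gb = z @ proj_W + proj_b                  # (1, 2C): gammas then betas
    C = features.shape[0]
    gamma, beta = gb[0, :C], gb[0, C:]
    return gamma[:, None, None] * features + beta[:, None, None]
```

Because γ and β act uniformly over each channel, FiLM can rescale magnitude-carrying features without altering where activations occur spatially, which is consistent with the observation that porosity conditioning leaves orientation features (and hence off-diagonal predictions) porosity-independent.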
Figure 8: Anisotropy-stratified representative predictions for Phase 4. Ten samples selected from log-spaced AR bins (AR = 1.11 to 5.87) span the 5th–95th percentile of the test AR distribution. Each panel displays the binary microstructure, AR, porosity ϕ, ground-truth tensor, and Phase 4 prediction. At near-isotropic AR (≈ 1.11), off-diagonal coupling is small and residual prediction uncertainty is concentrated in magnitude rather than direction, consistent with the variance-normalization effect in Section 3.2.2; ΔR² at this bin is 0.181, down from 0.194 in Phase 2. At moderate AR (1.5–3.0), coupling magnitude and tensor orientation are resolved accurately, with ΔR² < 0.015. At high AR (≈ 5.87), the dominant flow axis is correctly identified; off-diagonal R² = 0.970 in this bin. Full stratified metrics for all ten AR bins are in Appendix I.

The off-diagonal result is equally significant. Predicted off-diagonal correlations with ϕ are r_Kxy = r_Kyx = 0.019 (p > 0.2), statistically indistinguishable from the ground-truth correlations and from zero. A well-designed architecture should produce porosity-independent off-diagonal predictions, because cross-directional coupling arises from geometric anisotropy, not pore volume fraction. The FiLM conditioning respects this by modulating only the magnitude-related backbone features while leaving geometric orientation features unaffected, and the data confirm it has succeeded: the model has learned to decouple magnitude from anisotropy rather than conflating two physically distinct mechanisms.

Absolute prediction errors grow with ϕ (r = 0.343, slope 0.041), but relative errors improve: quartile analysis gives Q1 (low porosity) MAE = 0.0072 against mean K_xx ≈ 0.05, a 14% relative error, while Q4 (high porosity) has MAE = 0.0215 against mean K_xx ≈ 0.52, a 4% relative error. The same proportionality holds for off-diagonal components, where Q1 and Q4 MAEs are both approximately 5–8% of the corresponding diagonal magnitudes, confirming balanced relative accuracy across tensor components throughout the full porosity spectrum.
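The physical-validity quantities used throughout the constraint analysis (mean symmetry error, positive-definite fraction, condition number) can be audited directly from a batch of predicted tensors. A minimal NumPy sketch, with illustrative names:

```python
import numpy as np

def constraint_report(K_batch):
    """Audit predicted 2x2 tensors, batched as (N, 2, 2).

    Returns the mean symmetry error |Kxy - Kyx|, the fraction of samples with
    all-positive eigenvalues, and the median condition number lambda_max/lambda_min.
    """
    sym_err = np.abs(K_batch[:, 0, 1] - K_batch[:, 1, 0]).mean()
    # symmetrize before the eigen-decomposition so eigvalsh applies cleanly
    K_sym = 0.5 * (K_batch + np.transpose(K_batch, (0, 2, 1)))
    eig = np.linalg.eigvalsh(K_sym)            # (N, 2), ascending order
    pd_fraction = np.mean(eig[:, 0] > 0.0)
    kappa = eig[:, 1] / eig[:, 0]              # anisotropy / condition number
    return sym_err, pd_fraction, np.median(kappa)
```

Note that for a symmetric positive-definite 2×2 tensor the condition number and the anisotropy ratio AR = λ_max/λ_min coincide, so the same eigendecomposition serves both the stability check and the AR stratification used in Figures 5, 8, and 12.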
3.4.6 Physical Constraint Enforcement

Figure 10 validates the physics-aware loss function's effectiveness across the full test distribution. Physical validity is enforced to its highest fidelity in Phase 4, and the improvement across phases follows a clear mechanism. The symmetry error ε_sym = 3.95 × 10⁻⁷ (95% BCa CI: [3.82, 4.10] × 10⁻⁷) is 3.5-fold lower than Phase 2 (1.39 × 10⁻⁶) and 1.9-fold lower than Phase 3 (7.69 × 10⁻⁷): the Phase 3 step reflects doubling the symmetry penalty weight to λ_sym = 0.2, while the additional Phase 4 gain reflects SWA and EMA smoothing the optimization trajectory around a more symmetric loss minimum. All 4,000 predicted tensors are strictly positive-definite, with the 5th-percentile minimum eigenvalue of 2.4 × 10⁻² providing a 24-fold margin above the positivity threshold ε = 0.001 in L_pos (Equation 17). The condition number distribution (median κ = 1.59, p99 = 6.77, maximum κ = 23.5) confirms that all predicted tensors are numerically stable for direct use in reservoir simulation. Physical realizability emerges as a natively learned property rather than an externally enforced correction: no post-hoc projection or symmetrization is applied at any point.

Figure 9: Learned porosity-permeability relationship for Phase 4 (N = 4,000 test samples). (a) K_xx versus ϕ on a logarithmic scale: predicted and true values show nearly identical Pearson correlations (r_pred = 0.799, r_true = 0.795, both p < 10⁻¹⁵); order-of-magnitude scatter at fixed ϕ confirms that porosity alone is insufficient for accurate prediction. (b) Permeability trace (K_xx + K_yy) versus ϕ: higher correlation (r = 0.826, p < 10⁻¹⁵), confirming that the scaling of total transport capacity with pore volume fraction is well captured. (c) Binned mean K_xx across ten equally spaced ϕ bins: predicted means track ground truth with mean absolute deviation < 0.005 in 9 of 10 bins, validating continuous structure-property learning without regime-specific discontinuities. (d) Absolute prediction error for K_xx versus ϕ: positive correlation (r = 0.343, slope 0.041) reflects heteroscedastic absolute errors; relative errors are homoscedastic or improve with porosity (Q1: 14%, Q4: 4%), confirming that larger high-ϕ errors reflect dynamic range, not degraded accuracy.

Figure 10: Physical constraint validation for Phase 4 (N = 4,000 test tensors). (a) Predicted K_xy versus K_yx; linear regression yields slope 0.999996 and intercept 1.16 × 10⁻⁷, confirming near-perfect Onsager reciprocity.
(b) Symmetry error distribution (log scale): mean ε sym = 3 . 95 × 10 − 7 ; 95 % of predictions b elow 1 . 14 × 10 − 6 ; maximum violation 5 . 60 × 10 − 6 ; zero samples exceed 10 − 5 . (c) Minim um ( λ min ) and maxim um ( λ max ) eigen v alue distributions: λ min spans [2 . 27 × 10 − 3 , 1 . 51] (mean 0 . 295 ); 100 % of samples satisfy λ min > 0 . (d) Condition n umber distribution: median κ = 1 . 59 , 99.72 % w ell-conditioned ( κ < 10 ), maximum κ = 23 . 5 ; zero pathologically conditioned tensors. 3.4.7 Predictiv e Uncertaint y Figure 11 presen ts Mon te Carlo Drop out uncertain ty quan tification ( T = 100 sto chastic forward passes), which accompanies every Phase 4 prediction with a p er-sample confidence estimate. T wo asp ects of the uncertaint y analysis hav e direct op erational consequences. First, the uncertain ty-error Sp earman correlations ( ρ s = 0 . 229 – 0 . 304 ) confirm that elev ated MC-Drop out v ariance reliably flags harder predictions: samples with uncertaint y ab ov e the 95th-p ercen tile threshold ( 3 . 3 × 10 − 2 for diagonal, 1 . 3 × 10 − 2 for off-diagonal) represen t fewer than 5 % of the test set and can b e selectivel y redirected to LBM verification, preserving the 10 3 – 10 4 × computational speedup for the v ast ma jority of predictions. Second, and counter-in tuitiv ely , off-diagonal comp onen ts ac hieve higher uncertain ty-error correlation than diagonal ( ρ s = 0 . 304 v ersus 0 . 229 – 0 . 235 ; Appendix J ). Because off-diagonal predictions ha ve larger relativ e v ariability , the MC-Drop out estimator has more signal to exploit—uncertaint y is most discriminative precisely for the components that are hardest to predict, which is the appropriate b eha vior for a deplo yment-orien ted confidence system. The progressiv e impro vemen t in MCE across phases ( 0 . 3786 → 0 . 3754 → 0 . 
3724) confirms that SWA and EMA reduce overconfident single-point estimates, consistent with their known effect of flattening the loss landscape.

Figure 11: MC-Dropout predictive uncertainty for Phase 4 (T = 100 passes, N = 4,000 test samples). (a) Calibration curve: nominal versus empirical coverage; MCE = 0.3724, improving monotonically from Phase 2 (0.3786) through Phase 4. (b) Per-component uncertainty distributions (CCDF); diagonal median uncertainty σ ≈ 9.1 × 10⁻³; off-diagonal median σ ≈ 3.7 × 10⁻³, proportionate to component dynamic range. (c) MC-Dropout standard deviation versus absolute prediction error; Spearman ρ_s(σ, |ε|) = 0.229 (diagonal) to 0.304 (off-diagonal), confirming statistically meaningful error-ranking capability. (d) Spatial uncertainty maps for representative test samples.

3.4.8 Statistical Robustness and Anisotropy Stratification

Figure 12 presents the bootstrap confidence portrait together with cross-phase anisotropy-stratified performance curves.

The bootstrap portrait closes the statistical argument for the progressive framework. Confidence intervals are narrower than the between-phase ΔR² improvements (0.002–0.003 per step) by more than an order of magnitude, confirming that the gains are real rather than artifacts of favorable test-set composition. Panel (d) completes the anisotropy narrative: Phase 4 reduces off-diagonal RRMSE at every AR level, but the improvement is largest in the moderate-AR regime where coupling magnitudes are large enough to resolve yet too subtle to have been well learned under Phase 2's simpler training protocol—precisely the regime targeted by Phase 3's augmentation enrichment and consolidated by Phase 4's ensemble stabilization. Extended per-component bootstrap tables and full stratified statistics are in Appendix I.

Figure 12: Bootstrap statistical portrait and anisotropy-stratified performance across phases. (a) Relative 95% BCa CI width (tornado chart): R² CIs span < 0.0022, RRMSE CIs span < 1.01%, symmetry error CI spans [3.82, 4.10] × 10⁻⁷—all narrower than the between-phase performance differences by more than one order of magnitude. (b) Off-diagonal R² versus AR bin median for Phase 4 (10 equi-populated bins, n = 400 each): gap is largest at near-isotropic AR (ΔR² = 0.181) and narrows sharply to < 0.015 for AR > 1.5, with a modest secondary widening at high AR (ΔR² = 0.015 at AR ≈ 5.87). (c) Component-wise bootstrap R² distributions: narrow, symmetric densities for all four components; 95% BCa CIs are R²(K_xx) ∈ [0.9963, 0.9971] and R²(K_xy) ∈ [0.9736, 0.9779]. (d) Cross-phase off-diagonal RRMSE versus AR bin: Phase 4 reduces error at every AR level, with the largest absolute improvement in the moderate-AR regime (1.5 < AR < 3).

3.4.9 Computational Efficiency and Transferability

Single-sample inference—including the full test-time D4 augmentation ensemble (8 forward passes with inverse tensor transforms; Appendix F)—requires approximately 120 ms on a single NVIDIA RTX 6000 Ada GPU, giving a speedup of 10³–10⁴× relative to lattice-Boltzmann simulation at equivalent resolution and reducing full 4,000-sample evaluation from months to approximately 8 minutes.

The frozen-backbone design also makes Phase 4 the natural starting point for geological domain transfer. With 118.64 M backbone parameters protected and only 0.33 M task-specific parameters to retrain—0.28% of the total parameter budget—adapting the framework to new porous-media classes (carbonates, shales, fractured media) requires substantially less labeled data and compute than training from scratch. The physics-informed head further ensures that porosity conditioning generalizes correctly to new geological domains provided the porosity-permeability relationship holds, which is true across a wide class of sedimentary rocks.
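The D4 test-time ensemble described above (8 forward passes with inverse tensor transforms) can be sketched end to end. This is a minimal NumPy sketch, not the paper's implementation: `predict_fn` is a hypothetical stand-in for the trained network, and the alignment between image axes and tensor axes is an assumption of the sketch.

```python
import numpy as np

# The eight D4 elements: rotations by 0/90/180/270 degrees, each with and
# without a reflection. Each element carries the 2x2 orthogonal matrix R
# used for the tensor transform K -> R K R^T.
ROT = np.array([[0.0, -1.0], [1.0, 0.0]])    # 90-degree rotation
FLIP = np.array([[1.0, 0.0], [0.0, -1.0]])   # reflection about the x-axis

def d4_elements():
    elems = []
    for k in range(4):
        R = np.linalg.matrix_power(ROT, k)
        elems.append((k, False, R))          # pure rotation
        elems.append((k, True, FLIP @ R))    # rotation followed by flip
    return elems

def apply_to_image(img, k, flip):
    """Apply the D4 element (rotate by 90k degrees, optionally flip) to a 2D image."""
    out = np.rot90(img, k)
    return np.flipud(out) if flip else out

def tta_predict(img, predict_fn):
    """D4 test-time ensemble: average of inversely-transformed predictions.

    `predict_fn` (hypothetical) maps a 2D binary image to a 2x2 permeability
    tensor. For each group element with matrix R, the transformed image is
    evaluated and the predicted tensor is mapped back by the inverse tensor
    transform R^T K R before averaging over the group.
    """
    preds = []
    for k, flip, R in d4_elements():
        K = predict_fn(apply_to_image(img, k, flip))
        preds.append(R.T @ K @ R)
    return np.mean(preds, axis=0)
```

Averaging over the full group symmetrizes the output: for example, a constant (hence non-equivariant) predictor returning diag(2, 1) is isotropized to diag(1.5, 1.5), preserving the trace.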
3.5 Comparative Performance Analysis

The closest available benchmark for a direct comparison is the concurrent study of Vargdal et al. [76], which evaluated six architectures—ResNet-50/101, ViT-T/S16, and ConvNeXt-Tiny/Small—on the identical porous-media dataset with the same lattice-Boltzmann ground-truth labels. Their training protocol used D4-equivariant augmentation, cosine learning-rate scheduling with linear warmup, AdamW optimization, and a standard MSE loss applied end-to-end from random initialization. The shared dataset and labels make this comparison methodologically rigorous: observed performance differences can be attributed to architecture and training strategy rather than data provenance.

3.5.1 Benchmark Architecture Results

Vargdal et al. report best performance with ConvNeXt-Small, achieving R² = 0.9946 at epoch 369 of a 400-epoch run. ConvNeXt-Tiny reached R² = 0.9943 (498 epochs), ResNet-101 R² = 0.9929 (380 epochs), ResNet-50 R² = 0.9908 (436 epochs), and ViT-S16 R² = 0.9931 (995 epochs)—requiring roughly 2.7× more training than ConvNeXt-Small for comparable performance, consistent with the weaker spatial inductive bias of pure transformers on image data. ViT-T16 lagged substantially at R² = 0.9667 (990 epochs), illustrating the capacity-versus-inductive-bias trade-off for small transformer variants. Their work identifies two structural limitations of single-family architectures: convolutional networks converge efficiently but are constrained in their long-range receptive field, while transformers capture global dependencies at quadratic computational cost and without built-in spatial locality priors.
These are precisely the limitations that motivated the MaxViT backbone choice in Section 2.2: its hybrid block-local and grid-global multi-axis attention achieves O(HW) complexity while maintaining a global receptive field, addressing both failure modes simultaneously.

To further validate the comparison, the publicly available Vargdal et al. checkpoints [77] were evaluated on the present test split. Performance was 6–11% lower than originally reported (varying by component and metric), a discrepancy attributable to differences in train-test partitioning across the two studies despite use of the same underlying dataset. All cross-study comparisons below should be interpreted in light of this partitioning uncertainty; within-study phase comparisons (Phases 2–4) are unaffected, as they share an identical fixed split.

3.5.2 Methodological Contributions and Performance Attribution

The Phase 4 framework achieves variance-weighted R² = 0.9960 (component-averaged R² = 0.9863, 95% BCa CI [0.9852, 0.9874]). A direct numerical comparison with Vargdal et al.'s aggregate R² requires care because the two studies use different averaging conventions: their metric weights all four tensor elements equally, whereas the variance-weighted formulation adopted here—motivated in Section 2.6.2—weights components in proportion to their variance, naturally emphasizing the diagonal elements that dominate transport behavior and are most physically consequential for engineering applications. The more meaningful performance story lies in the within-study progressive improvement and the component-wise breakdown, neither of which is obscured by averaging convention.

Four methodological contributions distinguish the present framework from the Vargdal et al. baseline and from standard single-phase transfer learning. The first is the MaxViT architecture itself.
Hybrid block-local and grid-global attention resolves both the limited receptive field of pure CNNs and the quadratic cost of standard ViTs, capturing the local pore-throat geometry that governs diagonal permeability magnitude and the long-range spatial orientation statistics that determine off-diagonal coupling—a feature hierarchy that single-family architectures cannot simultaneously optimize (Section 2.2).

The second is the physics-aware composite loss function (Section 2.4). Differentiable symmetry and positive-definiteness penalty terms enforce Onsager reciprocity and thermodynamic validity as natively learned properties rather than post-hoc corrections, achieving ε_sym = 3.95 × 10⁻⁷ and 100% positivity across all 4,000 test predictions. Standard MSE training relies on the implicit physical validity of the training labels; explicit constraint enforcement eliminates the possibility of unphysical predictions at deployment, which matters for any downstream simulation that requires strictly valid tensors.

The third is the off-diagonal loss weighting w_o = 1.5 introduced in Phase 3 (Section 2.5.2). The 5.1× diagonal-to-off-diagonal standard deviation ratio (Appendix A.1) causes naive MSE to under-weight off-diagonal gradients, and the standard benchmark protocol does not correct for this. Targeted reweighting reduced the diagonal-to-off-diagonal R² gap from ΔR² = 0.0230 (Phase 2) to 0.0209 (Phase 4), with the most structurally significant improvement in the high-anisotropy subset where this weighting imbalance was most damaging.

The fourth is FiLM-based porosity conditioning (Section 2.2.2). By encoding the strong porosity-to-permeability scaling relationship (ρ_s ≈ +0.940; Appendix A.3) as an architectural prior rather than expecting the backbone to discover it from statistical co-variation in the training data, the frozen-backbone head achieves measurable gains (ΔR² ≈ 0.0006 from Phase 3) with only 0.33 M additional parameters—0.28% of the total parameter budget. The validation in Section 3.4.5 confirms that the conditioning is physically correct: predicted porosity-off-diagonal correlations (r ≈ 0.019) are statistically indistinguishable from the ground-truth correlations and from zero, demonstrating that the model has learned to decouple magnitude scaling from geometric anisotropy rather than conflating two physically distinct mechanisms.

3.5.3 Positioning Within the Broader Literature

Beyond CNN- and transformer-based architectures, graph neural network (GNN) approaches that explicitly represent pore-network topology—nodes as pore bodies, edges as throats—have recently been applied to permeability prediction [30–32, 85, 86]. These methods carry a natural inductive bias for flow: throat conductances and connectivity are encoded directly in the graph structure, which can facilitate learning of long-range flow pathways without spatial attention mechanisms. The primary cost is that GNN approaches require explicit pore-network extraction as a preprocessing step, introducing segmentation errors and uncertainties that propagate to the final prediction. The pixel-level input approach adopted here avoids this extraction dependency, operating directly on binary images and allowing the backbone to learn an implicit pore-network representation through hierarchical convolutional and attention operations. A direct quantitative comparison with GNN-based methods on the same dataset is not currently available; such a benchmark—particularly targeting off-diagonal coupling prediction, where long-range connectivity is most critical—is identified as high-priority future work.
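The physical-validity guarantees quoted in Section 3.5.2 (near-zero symmetry error, 100% positive-definiteness) can be checked post hoc on any batch of predicted tensors. The sketch below is illustrative, not the paper's evaluation code; the function name and example batch are hypothetical.

```python
import numpy as np

def validity_report(K):
    """Post-hoc physical-validity check for a batch of 2x2 tensors.

    K has shape (N, 2, 2). Returns the mean Onsager symmetry error
    eps_sym = |Kxy - Kyx| and the fraction of samples whose symmetrized
    tensor is positive-definite (smallest eigenvalue > 0) -- the two
    validity criteria targeted by the physics-aware loss.
    """
    eps_sym = np.abs(K[:, 0, 1] - K[:, 1, 0])
    K_sym = 0.5 * (K + np.transpose(K, (0, 2, 1)))  # symmetrize before eigendecomposition
    lam_min = np.linalg.eigvalsh(K_sym)[:, 0]       # eigvalsh returns ascending eigenvalues
    return eps_sym.mean(), (lam_min > 0).mean()

# Illustrative batch: two symmetric, positive-definite tensors.
batch = np.array([[[1.0, 0.1], [0.1, 0.5]],
                  [[0.3, -0.05], [-0.05, 0.4]]])
mean_eps, pd_fraction = validity_report(batch)
```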
In the broader context of learning-based permeability prediction, the present framework advances the state of the art along three axes that are individually addressed in the literature but not previously combined: full 2 × 2 tensor prediction capturing anisotropic coupling across three orders of magnitude, physics-informed architecture integrating known domain constraints, and a systematic progressive training methodology with rigorous ablation studies that quantify individual component contributions. The frozen-backbone design further provides a natural pathway for domain extension: the 118.64 M-parameter backbone, pretrained on ImageNet and specialized through Phases 2–3, can be transferred to new porous-media classes (carbonates, shales, fractured media) by retraining only the compact physics-informed head, substantially reducing the labeled-data requirement for geological domain adaptation.

3.6 Limitations and Future Directions

The progressive training framework achieves consistent improvements across all reported metrics, but four structural limitations bound the scope of the current results and define the highest-priority directions for future work.

3.6.1 Validation on Real Geological Samples

All training and evaluation are performed on synthetically generated sandstone microstructures. Real micro-CT images differ from the synthetic corpus in ways that are not merely statistical: multi-mineralogy with distinct phase contrasts, diagenetic cements and microfractures absent from Gaussian-simulation geometries, imaging artefacts such as partial-volume effects and beam hardening, and multi-scale heterogeneity spanning pore to core scales. The degree to which ImageNet-pretrained and synthetically fine-tuned features generalize to micro-CT data is presently unknown. The frozen-backbone paradigm provides a structured pathway for addressing this gap.
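The head-only retraining pathway amounts to freezing the backbone and exposing only head parameters to the optimizer. A minimal PyTorch sketch, with tiny stand-in modules in place of the 118.64 M-parameter MaxViT backbone and the physics-informed head:

```python
import torch
import torch.nn as nn

def prepare_for_domain_transfer(backbone, head, lr=1e-3):
    """Freeze the pretrained backbone; only head parameters stay trainable.

    `backbone` and `head` are stand-ins for the paper's MaxViT feature
    extractor and physics-informed head; any nn.Module pair works. The
    learning rate is illustrative, not the paper's setting.
    """
    for p in backbone.parameters():
        p.requires_grad_(False)
    backbone.eval()  # keep normalization statistics fixed during head retraining
    trainable = [p for p in head.parameters() if p.requires_grad]
    return torch.optim.AdamW(trainable, lr=lr, weight_decay=1e-4)

# Illustrative stand-ins (the real backbone has ~118.6 M parameters):
backbone = nn.Sequential(
    nn.Conv2d(1, 8, 3, padding=1), nn.AdaptiveAvgPool2d(1), nn.Flatten())
head = nn.Linear(8, 4)  # four tensor components: Kxx, Kxy, Kyx, Kyy
optimizer = prepare_for_domain_transfer(backbone, head)

n_frozen = sum(p.numel() for p in backbone.parameters())
n_trainable = sum(p.numel() for p in head.parameters() if p.requires_grad)
```

Only the head's small parameter budget is updated, which is what makes adaptation to new porous-media classes data- and compute-efficient.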
Preliminary experiments on a small number of real-sample analogues suggest that retraining the 0.33 M-parameter physics-informed head on a limited set of labelled micro-CT images achieves acceptable transfer while preserving the geometric feature hierarchy built in Phases 2–3, but these results are not sufficiently systematic to report quantitatively. Validation on benchmark datasets with independently measured permeability values is the most critical outstanding requirement before industrial deployment. Until such validation is completed, the demonstrated accuracy on this naturally inspired, physically constrained test set should be understood as evidence of architectural capability rather than operational readiness.

3.6.2 Off-Diagonal Residual Gap and Geometric Symmetry

Despite consistent improvement across phases, the diagonal-to-off-diagonal R² gap of 0.0209 and RRMSE ratio of approximately 4.6× reflect a genuine representational challenge. Off-diagonal coupling depends on long-range orientation statistics at the REV scale, and—as established in Section 3.2.2—prediction difficulty is concentrated in near-isotropic samples where the coupling signal is weakest relative to noise. Several architectural directions could reduce this residual gap: deformable attention mechanisms that adapt receptive fields to pore alignments, explicit orientation tensor fields derived from the structure tensor as additional input channels, or hybrid approaches coupling learned representations with effective-medium or pore-network homogenization models.

A related limitation concerns the treatment of symmetry. The D4 augmentation strategy achieves strong empirical equivariance—verified by the near-machine-precision symmetry errors in Section 3.4.6—but does not provide architectural equivariance guarantees at inference time.
For a given test sample, the prediction is not guaranteed to transform correctly under an arbitrary D4 group element; only the expectation over the training distribution is equivariant. Group-equivariant convolutional networks [70, 72] and equivariant transformer architectures provide exact equivariance by construction, at the cost of discarding the ImageNet pretraining that underpins Phase 2. Whether the gain from exact architectural equivariance would outweigh the loss of pretrained features—particularly for off-diagonal prediction, where global integration is most critical—is an open question and a high-value ablation target.

The D4 symmetry group is also specifically appropriate for 2D square lattices. Geological materials with tilted lamination, cross-bedding, or preferred grain orientation possess lower-symmetry or asymmetric structures for which D4 augmentation would corrupt the directional information the model must learn. Extending the framework to strongly anisotropic and low-symmetry media requires symmetry-group identification as a preprocessing step, selecting the appropriate augmentation strategy per sample rather than applying D4 universally.

3.6.3 Extension to 3D Microstructures

Practical applications of porous-media science, including reservoir characterization, require the full 3 × 3 permeability tensor, which 2D images cannot capture without strong isotropy assumptions that many geological formations violate. Extension to 3D introduces a cubic scaling challenge: a 128³ volumetric image contains 128× more voxels than a 128² image, making direct application of the MaxViT backbone memory-prohibitive on current hardware. Three architectural strategies merit investigation.
Hierarchical slice-based prediction—applying the present 2D backbone to orthogonal planes and aggregating via a learned 3D fusion module—scales linearly in the number of slices and requires no modification to the frozen backbone. Sparse volumetric convolutions (e.g., using MinkowskiEngine) reduce the memory footprint for void-dominated media by operating only on occupied voxels. Full SO(3)-equivariant architectures extend the D4 augmentation principle to 3D rotational symmetry, at the cost of substantially increased architectural complexity. In each case, the frozen-backbone transfer paradigm suggests that the geometric feature hierarchy learned from the large 2D synthetic corpus can seed 3D training, reducing the volumetric DNS label requirements relative to training from scratch; of the three strategies, hierarchical slice-based prediction is the most immediately tractable, requiring no modification to the frozen 2D backbone and scaling linearly with image depth rather than cubically.

3.6.4 Uncertainty Quantification and Cross-Validation

The MC-Dropout uncertainty estimates reported in Section 3.4.7 quantify epistemic uncertainty arising from finite training data but do not capture aleatoric uncertainty—the irreducible noise inherent in the lattice-Boltzmann labels themselves, particularly near symmetry error and numerical resolution limits. Deep ensembles with diverse random initializations or conformal prediction frameworks would provide a more comprehensive uncertainty decomposition and tighter coverage guarantees for applications requiring rigorous probabilistic inference.

Separately, the evaluation employs a single fixed train-validation-test split, providing point estimates with bootstrap-quantified sampling variability but without model-level cross-validation.
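The MC-Dropout procedure referenced above (Section 3.4.7) amounts to keeping dropout stochastic at inference and aggregating repeated forward passes. A minimal PyTorch sketch with a toy stand-in model; the layer sizes and pass count are illustrative, not the paper's configuration:

```python
import torch
import torch.nn as nn

def mc_dropout_predict(model, x, passes=50):
    """Epistemic uncertainty via Monte Carlo Dropout (sketch).

    Dropout submodules are switched to train mode while the rest of the
    model stays in eval mode; `passes` stochastic forward passes yield a
    predictive mean and standard deviation per output component.
    """
    model.eval()
    for m in model.modules():
        if isinstance(m, nn.Dropout):
            m.train()  # keep dropout active at inference
    with torch.no_grad():
        samples = torch.stack([model(x) for _ in range(passes)])
    return samples.mean(dim=0), samples.std(dim=0)

torch.manual_seed(0)
# Toy stand-in for the trained network; outputs the four tensor components.
model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(),
                      nn.Dropout(0.2), nn.Linear(32, 4))
mean, std = mc_dropout_predict(model, torch.randn(8, 16))
```

The per-sample `std` is the quantity whose rank correlation with absolute error is reported in Figure 11(c).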
Bootstrap resampling (Section 3.4.8) confirms that the test-set performance is statistically stable, but k-fold cross-validation would additionally quantify sensitivity to partitioning choice—a relevant consideration given the 6–11% partition-dependent discrepancy observed when evaluating the published benchmark results on the present split (Section 3.5.1).

3.7 Application Scope and Deployment Considerations

The combination of predictive accuracy (R² = 0.9960, RRMSE = 7.55%), guaranteed physical validity, and 10³–10⁴× computational speedup over direct numerical simulations such as lattice-Boltzmann simulation positions the framework as a practical surrogate for three classes of workflows in subsurface characterization—subject to the real-sample validation identified in Section 3.6.1.

3.7.1 High-Throughput Core Analysis

Modern micro-CT facilities generate hundreds to thousands of pore-scale images per scanning campaign. Direct DNS-based permeability characterization creates a severe throughput bottleneck: a single 8-hour scanning session would require months of continuous LBM computation for complete tensor estimation, making real-time feedback impossible. At approximately 120 ms per sample on a single NVIDIA RTX 6000 Ada GPU, the present framework reduces full-dataset evaluation for a 4,000-sample campaign to approximately 8 minutes—fast enough to support in-session operator decisions such as acquisition parameter adjustment, region-of-interest identification, and early termination when target statistical coverage is achieved. This turnaround is especially consequential for expensive deep-subsurface core material, where scanning time is finite and the samples are non-renewable.
3.7.2 Large-Scale Uncertainty Quantification

Subsurface flow simulations for CO₂ sequestration, hydrogen storage, and geothermal energy development require quantifying uncertainty over permeability fields arising from geological heterogeneity. Monte Carlo methods that adequately sample posterior distributions typically demand 10⁴–10⁶ forward model evaluations, which is computationally prohibitive with DNS. The millisecond-scale inference enabled here makes 10⁵-sample Monte Carlo ensembles feasible on a single GPU in a few hours—a task that would require months via LBM—unlocking probabilistic risk-assessment workflows at spatial scales and ensemble sizes previously inaccessible. The MC-Dropout confidence estimates (Section 3.4.7) further provide per-sample uncertainty flags, enabling adaptive strategies that reserve LBM verification for the < 5% of predictions where the model's own confidence estimate falls below threshold.

3.7.3 Geological Domain Transfer

The frozen-backbone architecture provides a structured pathway for extension to porous media beyond the synthetic sandstone training distribution. The 118.64 M-parameter backbone encodes geometric primitives—pore bodies, throat constrictions, tortuosity and connectivity patterns—that govern permeability regardless of lithology. Adapting to a new geological class (carbonates, shales, fractured media) or to engineered porous materials (catalyst supports, battery electrode microstructures, tissue scaffolds) requires retraining only the 0.33 M-parameter physics-informed head on modest labeled datasets, leaving the pretrained feature hierarchy intact. The FiLM conditioning further ensures that porosity-scaling priors transfer correctly to any material class for which a monotone porosity-permeability relationship holds—a condition satisfied across a wide range of sedimentary and engineered porous media.
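The FiLM mechanism underlying the porosity conditioning (Section 2.2.2) modulates backbone features with a scale and shift predicted from the scalar porosity. A minimal PyTorch sketch; the single-FiLM-layer placement, hidden width, and feature dimension are illustrative assumptions, not the paper's configuration:

```python
import torch
import torch.nn as nn

class FiLMHead(nn.Module):
    """Porosity-conditioned prediction head (illustrative sketch).

    A scalar porosity is encoded by a small MLP into per-channel scale
    (gamma) and shift (beta) that modulate the frozen backbone's pooled
    features before a linear layer maps to the four tensor components.
    """
    def __init__(self, feat_dim=512, hidden=64):
        super().__init__()
        self.film = nn.Sequential(
            nn.Linear(1, hidden), nn.SiLU(),
            nn.Linear(hidden, 2 * feat_dim),  # concatenated gamma and beta
        )
        self.out = nn.Linear(feat_dim, 4)     # Kxx, Kxy, Kyx, Kyy

    def forward(self, features, porosity):
        gamma, beta = self.film(porosity.unsqueeze(-1)).chunk(2, dim=-1)
        return self.out((1 + gamma) * features + beta)

torch.manual_seed(0)
head = FiLMHead()
features = torch.randn(8, 512)   # pooled features from the frozen backbone
phi = torch.rand(8)              # porosity per sample
K = head(features, phi)          # (8, 4) predicted tensor components
```

The `(1 + gamma)` parameterization is a common FiLM convention that makes the untrained modulation start near the identity.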
The principal prerequisite for realizing these application benefits remains experimental validation on real geological samples. The accuracy, efficiency, and physical-constraint guarantees demonstrated on the synthetic benchmark establish the architectural foundations; translating them into operational tools requires the systematic micro-CT validation study identified in Section 3.6.1 as the field's most critical near-term priority.

4 Conclusions

A physics-informed framework for full permeability tensor prediction from binary porous-media images has been developed and systematically evaluated, addressing three persistent limitations of existing approaches: the architectural tension between local feature resolution and global connectivity integration, the weak enforcement of physical constraints in training objectives, and the training inconsistencies introduced by augmentation strategies that transform images without correspondingly transforming tensor labels.

The MaxViT hybrid CNN-Transformer architecture resolves the first limitation through multi-axis attention that simultaneously captures grain-scale pore-throat geometry via block-local operations and REV-scale connectivity statistics via grid-global operations—the two spatial scales on which permeability is physically determined. A differentiable physics-aware loss function incorporating symmetry and positive-definiteness penalty terms enforces Onsager reciprocity and thermodynamic validity as natively learned properties, achieving mean symmetry error ε_sym = 3.95 × 10⁻⁷ and 100% positive-definiteness across 4,000 test predictions without any post-hoc projection. A D4-equivariant augmentation strategy with consistent tensor transformation eliminates the label-image mismatch present in naive augmentation, providing the physically correct symmetry prior throughout training.
The three-phase progressive training curriculum delivers structured, attributable performance gains. Phase 2 establishes the supervised baseline (R² = 0.9843) and diagnostic foundation, revealing through anisotropy stratification that off-diagonal prediction difficulty is concentrated in near-isotropic samples—where weak coupling signals are hardest to distinguish from noise—rather than in the strongly anisotropic regime where coupling magnitudes are large. Phase 3 targets this failure mode through augmentation enrichment and off-diagonal loss prioritization, reducing the high-anisotropy ΔR² gap by 37% and tightening symmetry enforcement by 45%. Phase 4 encodes the strong porosity-to-permeability scaling relationship (ρ_s ≈ +0.940) as an explicit architectural prior via FiLM conditioning of the frozen backbone, achieving variance-weighted R² = 0.9960 (component-averaged R² = 0.9863, 95% BCa CI [0.9852, 0.9874]) and KGE = 0.9942—a 33% reduction in unexplained variance relative to the Phase 2 baseline. Off-diagonal KGE improves from 0.176 to 0.761 across the three phases, marking the near-elimination of the variability under-dispersion identified at baseline. Critically, every metric across all 35 reported rows of the extended comparison table improves monotonically from Phase 2 through Phase 4, confirming that the staged methodological contributions are complementary rather than competing.

The framework delivers a marked computational speedup over direct numerical simulation (DNS) at equivalent resolution, reducing complete evaluation of a 4,000-sample dataset from weeks-months to approximately 8 minutes on a single GPU. The frozen-backbone design further enables data-efficient transfer to new geological classes by retraining only the 0.33 M-parameter physics-informed head—0.28% of the total parameter budget—while preserving the pretrained geometric feature hierarchy.
Two caveats bound the current scope. All training and evaluation are conducted on naturally inspired, realistically constrained, synthetically generated sandstone microstructures; validation on real micro-CT images with independently measured permeability values is the most critical outstanding requirement before operational deployment. Extension to the full 3 × 3 permeability tensor required for diverse porous-media applications remains an open problem, with hierarchical slice-based prediction identified as the most tractable near-term path, requiring no modification to the frozen 2D backbone.

More broadly, the results demonstrate three transferable principles for physics-informed scientific machine learning: that large-scale visual pretraining provides robust spatial representations even across substantial domain differences; that physical constraints are most effectively integrated as differentiable architectural components rather than as post-hoc corrections; and that progressive training guided by diagnostic failure-mode analysis outperforms end-to-end optimization by enabling unambiguous attribution of performance gains to specific interventions. These principles extend naturally to the broader class of scientific inverse problems in which high-fidelity simulation is too expensive for routine use but systematic generation of labeled training data remains feasible.

Data Availability

The porous-media image dataset and associated permeability tensor labels used in this study are publicly available at [77].

Acknowledgments

This work was supported by the Multiphysics of Salt Precipitation During CO₂ Storage (Saltation) project, project number 357784, funded by the Research Council of Norway. Computational resources were provided by the GPU infrastructure at the Section for Environmental Geosciences, Department of Geosciences, University of Oslo.
The authors thank the developers and maintainers of PyTorch, timm, NumPy, and SciPy, whose open-source libraries underpinned all model development and statistical analyses reported here.

Appendix

The appendices provide detailed supporting material for the methods and results described in Sections 2 and 3. Appendix A reports full dataset statistics across all data splits, including porosity-permeability relationships, cross-split distributional consistency, and LBM symmetry validation. Appendix B details the MaxViT stage configuration and input adaptations for the 128 × 128 single-channel domain. Appendix C presents the FiLM placement ablation and porosity encoder dimensionality search. Appendix D consolidates optimizer settings, learning-rate schedules, progressive unfreezing, loss-function weights, and checkpointing details for all three training phases. Appendix E provides the complete D4 transformation matrices, the equivariance design-choice discussion, the augmentation parameter search, and the full pipeline pseudocode. Appendix F derives the SWA and EMA update rules and details the batch-normalization recalibration procedure. Appendix G defines the extended evaluation metrics used for cross-study comparison. Appendix H reports four post-hoc statistical analyses: bootstrap confidence intervals, residual diagnostics, Monte Carlo Dropout uncertainty quantification, and anisotropy-stratified error analysis. Appendix I provides the full component-wise bootstrap summary, anisotropy-stratified performance table, and per-component MC-Dropout uncertainty statistics for Phase 4. Appendix J consolidates the complete 35-metric quantitative record across all three phases in the extended comparison table.
A Dataset Statistics and LBM Details

A comprehensive dataset characterization analysis was conducted before any model development, covering six domains: (i) microstructure geometry and porosity distributions; (ii) permeability tensor descriptive statistics; (iii) porosity-permeability relationships and classical model fits; (iv) tensor structural properties including positive-definiteness, symmetry, and anisotropy; (v) cross-split distributional consistency; and (vi) outlier identification. The sections below report these analyses in full; all statistics are computed on the held-out test set unless indicated otherwise.

A.1 Descriptive Statistics

Table 3 summarizes key statistical properties across all data splits. All permeability values are in lattice units. The porosity distribution is moderately left-skewed (skewness ≈ −0.55) and significantly non-normal (Shapiro-Wilk p ≈ 0 across all splits), consistent with the connectivity-based post-processing that removes isolated pore clusters and concentrates mass at higher porosity values. Principal permeabilities (diagonal elements) are strongly right-skewed in linear space (skewness ≈ 1.5, kurtosis ≈ 2–4) and span [0.0020, 3.19] in lattice units, exceeding three orders of magnitude in log₁₀-space. Off-diagonal elements are nearly symmetric about zero (|skewness| < 0.1 in train and test; slightly more skewed in validation due to sampling variability) and exhibit heavy tails (kurtosis ≈ 9–13), reflecting the predominantly isotropic character of the media: off-diagonal coupling is rare, small, and symmetrically distributed. The ratio of diagonal to off-diagonal standard deviation is approximately 5.1× across all splits, quantifying the scale disparity between components that fundamentally motivates the off-diagonal prioritization in the loss function (Section 2.4).
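The 5.1× scale disparity quantified above is what the component weighting of Section 2.4 compensates for. A minimal sketch of a component-weighted, physics-penalized loss; w_off = 1.5 is the paper's Phase 3 setting, while the penalty weights and the hinge form of the positive-definiteness term (trace and determinant conditions for a symmetric 2 × 2 tensor) are illustrative assumptions, not necessarily the paper's exact formulation:

```python
import torch

def physics_weighted_loss(pred, target, w_off=1.5, lam_sym=0.1, lam_pd=0.1):
    """Component-weighted MSE with symmetry and positive-definiteness penalties.

    pred/target have shape (batch, 4), ordered (Kxx, Kxy, Kyx, Kyy).
    w_off offsets the ~5.1x diagonal/off-diagonal scale disparity; the
    differentiable symmetry penalty encourages Onsager reciprocity
    Kxy = Kyx; the PD penalty hinges on trace > 0 and det > 0 for the
    symmetrized 2x2 tensor. lam_sym and lam_pd are illustrative weights.
    """
    w = torch.tensor([1.0, w_off, w_off, 1.0], dtype=pred.dtype)
    mse = ((pred - target) ** 2 * w).mean()
    sym = ((pred[:, 1] - pred[:, 2]) ** 2).mean()
    kxx, kxy, kyx, kyy = pred.unbind(dim=1)
    koff = 0.5 * (kxy + kyx)                      # symmetrized off-diagonal
    det = kxx * kyy - koff ** 2
    pd = (torch.relu(-det) + torch.relu(-(kxx + kyy))).mean()
    return mse + lam_sym * sym + lam_pd * pd
```

A prediction that matches the target and is exactly symmetric incurs zero loss; asymmetric off-diagonal pairs are penalized beyond their MSE contribution.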
A.2 Microstructure Geometry

All images are confirmed to be strictly binary across all splits, with zero non-binary pixels. Run-length encoding of binary rows and columns yields mean horizontal and vertical pore chord lengths of approximately 27.3 px and 27.2 px respectively, with a run-length anisotropy index of η = |l_h − l_v|/(l_h + l_v) < 0.004 in all splits, confirming the isotropic character of the generated pore geometries and providing quantitative justification for the D4-equivariant augmentation strategy (Section 2.5.1).

Table 3: Comprehensive dataset statistics across training, validation, and test splits. All permeability values in lattice units. K_diag: diagonal elements (K_xx, K_yy); K_off: off-diagonal elements (K_xy, K_yx); λ_min, λ_max: smallest and largest eigenvalues of K; AR: anisotropy ratio λ_max/λ_min; δ: off-diagonal dominance |K_off|/(|K_diag| + |K_off|).

  Property                                   Train                     Validation                Test
  Partitioning
  Number of samples                          16,000                    4,000                     4,000
  Image dimensions                           128 × 128                 128 × 128                 128 × 128
  Microstructure geometry
  Strictly binary (no. non-binary pixels)    True (0)                  True (0)                  True (0)
  Porosity φ (mean ± std)                    0.711 ± 0.124             0.710 ± 0.125             0.713 ± 0.124
  Porosity range [φ_min, φ_max]              [0.227, 0.900]            [0.265, 0.900]            [0.297, 0.900]
  Porosity IQR                               0.187                     0.187                     0.189
  Porosity skewness                          −0.55                     −0.54                     −0.55
  Horizontal pore chord (px)                 27.25                     27.18                     27.58
  Vertical pore chord (px)                   27.17                     26.97                     27.66
  Run-length anisotropy index                0.0015                    0.0038                    0.0014
  Diagonal permeability K_diag
  Range                                      [0.0027, 3.08]            [0.0034, 3.19]            [0.0027, 2.81]
  Mean ± std                                 0.369 ± 0.344             0.364 ± 0.346             0.372 ± 0.343
  log₁₀ span (orders of magnitude)           3.06–3.12                 2.87–2.95                 2.86–3.02
  Skewness                                   1.49–1.57                 1.49–1.67                 1.47–1.48
  Off-diagonal permeability K_off
  Range                                      [−0.640, 0.645]           [−0.749, 0.449]           [−0.489, 0.480]
  Mean ± std                                 0.001 ± 0.067             −0.002 ± 0.069            0.002 ± 0.066
  Kurtosis (excess)                          ≈ 9.3                     ≈ 12.8                    ≈ 7.0
  Diag./off-diag. std ratio                  5.15×                     5.01×                     5.20×
  Tensor structural properties
  Frobenius norm (mean ± std)                0.538 ± 0.478             0.531 ± 0.480             0.543 ± 0.476
  Trace K_xx + K_yy (mean ± std)             0.738 ± 0.664             0.729 ± 0.667             0.744 ± 0.661
  Determinant (mean ± std)                   0.234 ± 0.403             0.232 ± 0.409             0.235 ± 0.394
  Tensor symmetry ε_sym = |K_xy − K_yx|
  Mean symmetry error                        7.97×10⁻⁶                 7.69×10⁻⁶                 7.59×10⁻⁶
  Max symmetry error                         6.1×10⁻⁴                  6.7×10⁻⁴                  3.3×10⁻⁴
  Fraction with ε_sym > 10⁻³                 0/16,000                  0/4,000                   0/4,000
  Fraction with ε_sym > 10⁻⁴                 1.07 %                    1.05 %                    0.82 %
  Positive-definiteness
  100 % PD (λ_min > 0)                       yes                       yes                       yes
  λ_min (mean; min; p5)                      0.289; 1.5×10⁻⁶; 0.019    0.287; 5.2×10⁻⁴; 0.019    0.292; 2.1×10⁻³; 0.020
  λ_max (mean; max; p95)                     0.448; 3.10; 1.261        0.442; 3.20; 1.241        0.452; 2.81; 1.273
  Anisotropy ratio AR = λ_max/λ_min
  Median AR                                  1.625                     1.627                     1.624
  90th-percentile AR                         3.32                      3.33                      3.34
  Fraction AR > 2                            30.6 %                    31.0 %                    30.3 %
  Fraction AR > 5                            4.1 %                     4.4 %                     4.0 %
  Fraction AR > 10                           0.79 %                    0.88 %                    1.03 %
  Maximum AR                                 53,242                    92.2                      36.1
  Off-diagonal dominance δ
  Mean δ (median)                            0.116 (0.096)             0.117 (0.099)             0.115 (0.096)
  Maximum δ                                  0.500                     0.493                     0.471

A.3 Porosity-Permeability Relationships

The Spearman rank correlation between porosity and diagonal permeability is ρ_s(φ, K_diag) ≈ +0.940 (p ≈ 0) across all splits, confirming a strong monotonic relationship. By contrast, ρ_s(φ, K_off) ≈ +0.01 (not significant, p > 0.05 in train and test), confirming that off-diagonal coupling is independent of porosity and must be inferred from geometric anisotropy features rather than scalar porosity values.
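The chord-length and run-length-anisotropy statistics of Section A.2 can be sketched as follows. The image here is a synthetic binary field rather than a dataset sample, and the function names are illustrative.

```python
import numpy as np

def chord_lengths(img, axis):
    """Mean pore-chord length along one axis of a binary image
    (1 = pore, 0 = solid), via run-length encoding of each line.
    axis=0 scans rows (horizontal chords), axis=1 scans columns."""
    runs = []
    lines = img if axis == 0 else img.T
    for line in lines:
        # Pad with solid so every pore run has a start and an end.
        padded = np.concatenate(([0], line, [0]))
        starts = np.flatnonzero(np.diff(padded) == 1)
        ends = np.flatnonzero(np.diff(padded) == -1)
        runs.extend(ends - starts)
    return float(np.mean(runs)) if runs else 0.0

def run_length_anisotropy(img):
    """Anisotropy index eta = |l_h - l_v| / (l_h + l_v)."""
    lh = chord_lengths(img, axis=0)
    lv = chord_lengths(img, axis=1)
    return abs(lh - lv) / (lh + lv)

# A statistically isotropic test image: uniform random pore pixels.
rng = np.random.default_rng(1)
img = (rng.random((128, 128)) < 0.7).astype(np.uint8)
eta = run_length_anisotropy(img)
```

For a statistically isotropic image, η stays close to zero, mirroring the η < 0.004 reported for all splits.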
This dissociation between diagonal and off-diagonal predictors provides direct physical motivation for the FiLM conditioning design, in which porosity modulates only the magnitude of the transport coefficients while image-derived geometric features determine anisotropic coupling (Section 2.2.2).

Table 4 reports Kozeny-Carman (KC) and power-law (PL) fits to the porosity-permeability relationship. Both models explain approximately 79 % of variance in diagonal permeability (R² ≈ 0.789–0.791), consistent across splits. The fitted power-law exponent n ≈ 4.8–4.9 lies within the physically accepted range of 3–8 for consolidated porous media and is essentially identical across splits (within one standard error), confirming statistical equivalence of the three partitions. The remaining ≈ 21 % of variance not captured by scalar porosity corresponds to geometric features, tortuosity, and connectivity effects that the MaxViT backbone is designed to learn.

Table 4: Porosity-permeability model fits on mean diagonal permeability K̄_diag = ½(K_xx + K_yy) across data splits. KC: Kozeny-Carman fit K̄ = Cφ³/(1 − φ)²; PL: power-law fit K̄ = Cφⁿ. ρ_s: Spearman rank correlation between φ and K̄_diag.

  Split   KC: C    KC: R²   PL: C    PL: n   PL: R²   ρ_s
  Train   0.0242   0.789    1.337    4.81    0.777    0.967
  Val     0.0240   0.791    1.337    4.84    0.776    0.967
  Test    0.0243   0.790    1.371    4.89    0.791    0.968

A.4 Cross-Split Distributional Consistency

Statistical equivalence of the three splits was verified through two-sample Kolmogorov-Smirnov (KS) tests and Jensen-Shannon divergence (JSD) measurements on all five variables (K_xx, K_xy, K_yx, K_yy, φ). Results are summarized in Table 5. All 15 KS tests fail to reject the null hypothesis of equal distributions at the α = 0.05 level (p > 0.13 for all pairs and variables), with a maximum observed KS statistic of 0.026. JSD values are bounded above by 0.007 across all variable-pair combinations, corresponding to near-identical distributions (JSD = 0 for identical distributions, JSD = ln 2 ≈ 0.693 for maximally separated distributions). These results confirm that the random partitioning protocol produces statistically indistinguishable splits and that no distributional shift is present between training, validation, and test sets, validating that reported test-set performance provides an unbiased estimate of generalization.

Table 5: Cross-split distributional consistency. KS: two-sample Kolmogorov-Smirnov statistic and p-value (reject if p < 0.05). JSD: Jensen-Shannon divergence (bounded in [0, ln 2]; values near 0 indicate near-identical distributions). All KS tests fail to reject the null hypothesis of equal distributions (p > 0.05).

  Pair             Variable   KS stat   p-value   JSD
  Train vs. Val    K_xx       0.0154    0.432     0.00353
                   K_xy       0.0174    0.286     0.00305
                   K_yy       0.0134    0.605     0.00368
                   φ          0.0141    0.547     0.00355
  Train vs. Test   K_xx       0.0141    0.541     0.00312
                   K_xy       0.0106    0.864     0.00385
                   K_yy       0.0117    0.764     0.00366
                   φ          0.0143    0.524     0.00347
  Val vs. Test     K_xx       0.0260    0.134     0.00437
                   K_xy       0.0208    0.355     0.00552
                   K_yy       0.0208    0.355     0.00534
                   φ          0.0225    0.263     0.00696

A.5 Outlier Analysis

Outlier analysis was conducted along three axes: Mahalanobis distance in the joint permeability tensor space (flagging multivariate extremes), extreme marginal porosity (top and bottom 1 %), and high tensor anisotropy (top 5 % by anisotropy ratio AR). Results are summarized in Table 6. By construction, Mahalanobis outliers at the 99th-percentile threshold constitute exactly 1 % of each split. High-AR samples (AR > p95) account for 5 % of each split, with 95th-percentile AR thresholds of 4.47–4.63 across splits. These high-anisotropy samples represent the primary challenge regime for off-diagonal permeability prediction; their analysis is detailed in Appendix H.4.
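The consistency checks of Section A.4 reduce to a two-sample KS test plus a histogram-based Jensen-Shannon divergence. A minimal sketch, with synthetic porosity-like splits and an assumed 50-bin histogram:

```python
import numpy as np
from scipy.stats import ks_2samp
from scipy.spatial.distance import jensenshannon

def split_consistency(a, b, bins=50):
    """Two-sample KS test plus Jensen-Shannon divergence between two
    splits of one variable. Natural-log base, so JSD <= ln 2."""
    ks = ks_2samp(a, b)
    lo, hi = min(a.min(), b.min()), max(a.max(), b.max())
    pa, _ = np.histogram(a, bins=bins, range=(lo, hi), density=True)
    pb, _ = np.histogram(b, bins=bins, range=(lo, hi), density=True)
    # scipy returns the JS *distance* (sqrt of the divergence); square it.
    jsd = jensenshannon(pa, pb, base=np.e) ** 2
    return ks.statistic, ks.pvalue, jsd

rng = np.random.default_rng(42)
train = rng.normal(0.71, 0.12, 16000)
test = rng.normal(0.71, 0.12, 4000)
stat, p, jsd = split_consistency(train, test)
```

Identical splits give a KS statistic of 0 (p = 1) and JSD = 0; the Table 5 values sit very close to that limit.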
No samples were excluded from training or evaluation on the basis of these outlier criteria; the metrics are reported as characterization of the data distribution.

Table 6: Outlier summary across data splits. Mahalanobis: samples beyond the 99th-percentile Mahalanobis distance in the joint tensor space. Extreme φ: samples in the top or bottom 1 % of the marginal porosity distribution. High AR: samples in the top 5 % of the anisotropy ratio distribution.

  Split                Mahalanobis (> p99)   High φ (> p99)     Low φ (< p1)      High AR (> p95)
  Train (n = 16,000)   160 (1.00 %)          159 (φ > 0.896)    160 (φ < 0.396)   800 (AR > 4.59)
  Val (n = 4,000)      40 (1.00 %)           40 (φ > 0.896)     40 (φ < 0.390)    200 (AR > 4.63)
  Test (n = 4,000)     40 (1.00 %)           39 (φ > 0.896)     40 (φ < 0.401)    200 (AR > 4.47)

A.6 LBM Symmetry and Positive-Definiteness

Tensor symmetry is preserved by the LBM solver to ε_sym < 8×10⁻⁶ across all splits (mean values 7.59–7.97×10⁻⁶; see Table 3). No sample in any split exhibits ε_sym > 10⁻³, and fewer than 1.1 % of samples in any split exceed 10⁻⁴. For reference, the FP64 machine epsilon is ≈ 10⁻¹⁶; the observed symmetry error lies approximately ten orders of magnitude above floating-point rounding but remains orders of magnitude below any practical engineering tolerance. The term "near-machine precision" used in the main text refers to this engineering interpretation. All 24,000 labeled samples are strictly positive-definite (λ_min > 0, verified to λ_min > 10⁻¹²), confirming that the ground-truth tensors satisfy the thermodynamic constraint required by Darcy's law and providing a reliable signal for training the positivity penalty term (Equation 17).
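The two validity checks of this section, the symmetry error and positive-definiteness, amount to a few lines. The tensors below are made-up examples, not dataset samples.

```python
import numpy as np

def tensor_validity(K):
    """Check one 2x2 permeability tensor: symmetry error
    eps_sym = |Kxy - Kyx| and positive-definiteness via the smallest
    eigenvalue of the symmetrized tensor (PD iff lambda_min > 0)."""
    eps_sym = abs(K[0, 1] - K[1, 0])
    lam_min = np.linalg.eigvalsh(0.5 * (K + K.T)).min()
    return eps_sym, lam_min

# A physically valid tensor: symmetric, positive-definite.
K_valid = np.array([[0.37, 0.012],
                    [0.012, 0.36]])
eps, lmin = tensor_validity(K_valid)

# An invalid tensor: symmetric but indefinite (one negative eigenvalue).
K_bad = np.array([[0.01, 0.20],
                  [0.20, 0.01]])
```

Applied to all 24,000 labels, this check yields the ε_sym and λ_min statistics of Table 3.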
B MaxViT Architecture Details

B.1 Stage Configuration for 128 × 128 Input

Table 7 provides stage-level dimensions and block counts for MaxViT-Base when processing 128 × 128 single-channel inputs, as used in this work.

Table 7: MaxViT-Base stage configuration for 128 × 128 input. Block size P = 8 is fixed across all stages.

  Stage   Spatial resolution   Channels   MaxViT blocks   Attention regime
  1       32 × 32              64         2               Block (4 × 4 blocks) + Grid
  2       16 × 16              128        6               Block (2 × 2 blocks) + Grid
  3       8 × 8                256        14              Full self-attention (spatial ≤ P)
  4       4 × 4                512        3               Full self-attention (spatial ≤ P)

B.2 Multi-Axis Attention for 128 × 128 Input

Each MaxViT block applies two sequential attention operations. Block-local attention partitions the feature map into non-overlapping P × P windows, with P = 8 fixed across all stages. For Stages 1 and 2, where the spatial resolution exceeds the block size, the partition produces 4 × 4 = 16 and 2 × 2 = 4 blocks respectively; grid-global attention then integrates information across these spatially distinct regions by attending along dilated grid positions. For Stages 3 and 4, the spatial resolutions (8 × 8 and 4 × 4) are smaller than or equal to the block size P = 8, so the feature map constitutes a single block and full self-attention is applied within the entire stage feature map. This graceful degradation preserves the attention mechanism across all stages without modification to the underlying implementation.

B.3 Input Adaptation

The standard MaxViT-Base model accepts 384 × 384 RGB images. Two adaptations are applied here. The input channel count is reduced from three to one (in_chans=1), with pretrained spatial filters preserved via channel averaging so that learned texture and edge representations are retained.
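A minimal sketch of the channel-averaging adaptation, in plain NumPy rather than the actual model surgery: averaging the RGB input-channel axis yields a single-channel filter bank whose response to a replicated grayscale patch equals the original filter bank's response up to a constant factor of 3, which downstream normalization can absorb.

```python
import numpy as np

def average_input_channels(w_rgb):
    """Collapse pretrained RGB stem weights (out, 3, kh, kw) into a
    single-channel bank (out, 1, kh, kw) by averaging over the
    input-channel axis, mirroring the in_chans=1 adaptation."""
    return w_rgb.mean(axis=1, keepdims=True)

rng = np.random.default_rng(0)
w_rgb = rng.normal(size=(64, 3, 3, 3))   # stand-in for pretrained weights
w_gray = average_input_channels(w_rgb)

# Response of both banks to one kh x kw grayscale patch: the RGB bank
# sees the patch replicated across its three input channels.
patch = rng.normal(size=(3, 3))
resp_rgb = np.tensordot(w_rgb, np.stack([patch] * 3),
                        axes=([1, 2, 3], [0, 1, 2]))
resp_gray = np.tensordot(w_gray, patch[None],
                         axes=([1, 2, 3], [0, 1, 2]))
```

The spatial structure of each learned filter is untouched; only the channel dimension collapses, which is why pretrained texture and edge detectors survive the adaptation.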
The image size is set to 128 × 128 (img_size=128), which triggers bicubic interpolation of all learned positional embeddings from the original 384 × 384 resolution; this interpolation is performed once at model initialization and introduces no training-time overhead.

C FiLM Conditioning: Placement Ablation and Encoder Configuration

C.1 Stage Placement Ablation

Five FiLM placement configurations were evaluated on the validation set to determine which backbone stages should receive porosity conditioning. Each configuration was trained for the full Phase 4 training run from the same Phase 3 checkpoint; validation R² at the best epoch (determined by early stopping) served as the selection criterion. Stage 1 features encode low-level edge detection and solid-fluid boundary localization; their relevant length scale corresponds to individual interface sharpness rather than the pore-body or pore-throat geometry where porosity provides meaningful context. Applying FiLM conditioning to Stage 1 provided no additional benefit and marginally degraded validation performance, suggesting that low-level feature representations are disrupted by scalar conditioning. The Stages 2–4 configuration achieved the highest validation R², confirming that porosity conditioning is most beneficial at the levels of abstraction where connectivity patterns and global pore-network topology are encoded.

C.2 Porosity Encoder Dimensionality Search

The porosity encoder output dimensionality was selected from {32, 64, 128} using the same validation-set procedure. A 32-dimensional embedding produced slight underfitting of the porosity-permeability nonlinearity (validation R² approximately 0.0003 lower than the 64-dimensional baseline).
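For concreteness, a FiLM layer driven by a 64-dimensional porosity embedding can be sketched as below. The toy encoder (a fixed tanh projection), the residual parameterization of γ, and all shapes are illustrative assumptions, not the paper's module.

```python
import numpy as np

def film(features, phi, W_gamma, W_beta, embed):
    """Feature-wise Linear Modulation: a scalar porosity phi is encoded
    into an embedding z, mapped to per-channel (gamma, beta), and applied
    as y = gamma * x + beta on a (C, H, W) feature map."""
    z = np.tanh(embed * phi)          # toy 64-dim porosity encoder
    gamma = 1.0 + W_gamma @ z         # residual scaling around identity
    beta = W_beta @ z
    return gamma[:, None, None] * features + beta[:, None, None]

C, D = 128, 64                        # channels, embedding dimensionality
rng = np.random.default_rng(0)
features = rng.normal(size=(C, 16, 16))
embed = rng.normal(size=D)
W_gamma = rng.normal(size=(C, D)) * 0.01
W_beta = rng.normal(size=(C, D)) * 0.01
out = film(features, 0.71, W_gamma, W_beta, embed)
```

With the residual parameterization, φ = 0 reduces FiLM to the identity, so conditioning can only perturb, never erase, the image-derived features.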
A 128-dimensional embedding introduced marginal overfitting relative to the scalar conditioning signal, with a small increase in the training-validation R² gap and no improvement on the test set. The 64-dimensional embedding provides a compact yet expressive representation and was adopted for all Phase 4 experiments.

D Training Configuration

D.1 Optimizer and Schedule

The AdamW optimizer [87] is used throughout, with β₁ = 0.9, β₂ = 0.999, and ε = 10⁻⁸. The learning-rate schedule in all phases comprises a linear warmup phase followed by cosine annealing. Table 8 summarizes phase-specific settings.

Table 8: Training configuration by phase. LR: learning rate; WD: weight decay; BS: batch size; Pat.: early stopping patience (epochs, monitoring validation R²); WU: warmup duration (epochs).

  Phase   Epochs   Init. from   LR_max   LR_min   WU   WD     BS   Pat.
  2       600      ImageNet     10⁻⁴     10⁻⁷     50   0.05   32   100
  3       600      Phase 2      10⁻⁴     10⁻⁵     50   0.01   32   100
  4       600      Phase 3      10⁻⁴     10⁻⁶     50   0.01   32   150

D.2 Progressive Unfreezing Schedule (Phase 2)

Table 9: Progressive unfreezing schedule in Phase 2.

  Epoch range   Trainable components
  0–50          Regression head only (backbone frozen)
  50–150        Regression head + backbone Stages 3–4 (Stages 1–2 and stem frozen)
  150–600       Full model (all parameters)

D.3 Loss-Function Weights by Phase

Table 10: Physics-aware loss-function weights by training phase. λ_offdiag is not active (−) in Phase 2.

  Phase   λ_sym   λ_pos   λ_offdiag   (w_d, w_o)
  2       0.10    0.05    −           (1.0, 1.0)
  3–4     0.20    0.05    1.0         (1.0, 1.5)

D.4 Checkpointing Strategy

Multiple checkpoints are maintained during training: the best validation R² checkpoint (the primary checkpoint for evaluation), the best validation MSE checkpoint (an alternative metric), and periodic snapshots every 50 epochs for post-hoc analysis.
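The warmup-plus-cosine schedule of Table 8 can be written as a small function. Phase 2 values are used as defaults, and per-epoch granularity is assumed.

```python
import math

def lr_at(epoch, total=600, warmup=50, lr_max=1e-4, lr_min=1e-7):
    """Linear warmup for `warmup` epochs, then cosine annealing from
    lr_max down to lr_min over the remaining epochs (Phase 2 values)."""
    if epoch < warmup:
        return lr_max * (epoch + 1) / warmup    # linear ramp to lr_max
    t = (epoch - warmup) / max(1, total - warmup)
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + math.cos(math.pi * t))
```

The schedule peaks at LR_max at the end of warmup (epoch 50) and decays smoothly to LR_min by epoch 600; Phases 3–4 reuse the shape with their own LR_min from Table 8.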
In Phase 4, the SWA model (weight-averaged from epochs 400–600) and the EMA model (continuously averaged throughout training) are tracked as additional evaluation candidates, as described in Section 2.3.3 and Appendix F. The Phase 4 early stopping patience is increased to 150 epochs (versus 100 for Phases 2–3), reflecting the smaller trainable parameter set and correspondingly slower validation-metric dynamics.

D.5 DataLoader Configuration

Data loading utilizes the PyTorch DataLoader with memory pinning for I/O efficiency. Four workers are used in Phase 2; this is reduced to zero workers in Phases 3–4 because the augmentation pipeline involves per-sample computations (elastic deformation Jacobians, morphological operations) that are more efficiently executed in the main process under the memory constraints of the training GPU.

E Data Augmentation: Details and Parameter Search

E.1 D4 Transformation Matrices

Table 11 provides the coordinate-transformation matrix P and the resulting permeability tensor K′ = PKPᵀ for each of the eight D₄ group elements.

Table 11: D₄ transformation matrices P and the corresponding permeability tensor transformations K′ = PKPᵀ. Entries use a = K_xx, b = K_xy = K_yx, c = K_yy.

  Transformation       P                     K′
  Identity             [[1, 0], [0, 1]]      [[a, b], [b, c]]
  Rotation 90°         [[0, −1], [1, 0]]     [[c, −b], [−b, a]]
  Rotation 180°        [[−1, 0], [0, −1]]    [[a, b], [b, c]]
  Rotation 270°        [[0, 1], [−1, 0]]     [[c, −b], [−b, a]]
  Horizontal flip      [[−1, 0], [0, 1]]     [[a, −b], [−b, c]]
  Vertical flip        [[1, 0], [0, −1]]     [[a, −b], [−b, c]]
  Diagonal flip        [[0, 1], [1, 0]]      [[c, b], [b, a]]
  Anti-diagonal flip   [[0, −1], [−1, 0]]    [[c, b], [b, a]]

E.2 Group-Equivariant Networks: Design Choice Discussion

Group Equivariant Convolutional Networks (G-CNNs) [70–72] build exact equivariance to a specified symmetry group directly into the weight-sharing structure through group convolutions, guaranteeing that the network output transforms predictably under all group elements by construction.
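The Table 11 transformations can be verified numerically. The sketch below builds all eight D₄ elements as rotation-reflection products and applies K′ = PKPᵀ; the construction is one standard way to enumerate the group.

```python
import numpy as np

# Generators: 90-degree rotation and horizontal flip (Table 11 convention).
R90 = np.array([[0, -1], [1, 0]])
FLIP_H = np.array([[-1, 0], [0, 1]])
# All eight D4 elements: four rotations, then four rotated reflections.
D4 = [np.linalg.matrix_power(R90, k) @ F
      for F in (np.eye(2, dtype=int), FLIP_H) for k in range(4)]

def transform_tensor(K, P):
    """Consistent label transformation K' = P K P^T for augmentation."""
    return P @ K @ P.T

a, b, c = 0.5, 0.1, 0.3
K = np.array([[a, b],
              [b, c]])
K_rot90 = transform_tensor(K, R90)    # Table 11: [[c, -b], [-b, a]]
```

Because every P is orthogonal, each transformed tensor keeps the symmetry and eigenvalues of K, which is exactly why the augmentation is label-consistent.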
This provides architecturally guaranteed equivariance rather than the statistical, augmentation-based approximation employed in this work. This approach was not adopted for two reasons. First, MaxViT's ImageNet pretraining provides strong low- and mid-level visual features that transfer effectively to porous media images despite the domain gap; replacing the MaxViT backbone with a G-CNN architecture trained from scratch would discard this pretraining entirely. Second, empirical evaluation demonstrates that comprehensive D₄ augmentation achieves sufficiently precise equivariance for practical purposes: the mean symmetry error ε_sym = 3.95×10⁻⁷ over the 4,000-sample test set indicates that Onsager reciprocity is satisfied to near-machine precision, and any residual equivariance violation is orders of magnitude below the measurement uncertainty of LBM simulation. Future work may investigate G-CNN pretraining strategies that combine symmetry guarantees with transfer learning for porous media applications.

E.3 Phase 3 Augmentation Parameter Search

Augmentation parameters were determined through a two-stage grid search on the validation set, with no test-set access at any stage.

Stage 1: per-class probability search. Each transformation class (erosion, dilation, elastic deformation, cutout) was evaluated independently over the probability grid {0.05, 0.08, 0.10, 0.15, 0.20}, with other parameters held at nominal values (α = 3.0, σ = 2.0). Each configuration was trained for 150 epochs from the Phase 2 checkpoint.

Stage 2: elastic deformation magnitude search. Using the per-class probabilities identified in Stage 1, elastic deformation magnitude was swept over α ∈ {2.0, 3.0, 5.0, 7.0}. The optimization criterion was validation R² for the off-diagonal component K_xy, rather than the overall variance-weighted R².
Diagonal components K_xx and K_yy proved robustly learned across the full augmentation search range, making off-diagonal prediction accuracy the most sensitive and discriminating indicator of augmentation quality. Excessive augmentation (probabilities above approximately 15 % per class, or α > 3.0) introduced gradient variance that degraded off-diagonal convergence while diagonal components remained largely unaffected. The best configuration selected was: p_erosion = p_dilation = 0.08, p_elastic = 0.08 with α = 3.0 and σ = 2.0, p_cutout = 0.10.

E.4 Phase 3–4 Augmentation Pipeline

Algorithm 1: Phases 3–4 augmentation pipeline (applied per training sample).

  Require: Image I ∈ {0, 1}^(128×128), permeability tensor K ∈ ℝ^(2×2)
  Ensure: Augmented image I′, consistently transformed tensor K′
   1: Sample g ~ Uniform(D₄)                        ▷ Apply D4 transformation with probability 1.0
   2: I ← g(I); K ← P_g K P_gᵀ
   3: if u₁ ~ Uniform(0, 1) < 0.08 then             ▷ Elastic deformation
   4:     Generate displacement fields δx, δy from α · G_(σ=2.0)(U(−1, 1))
   5:     I ← I(x + δx, y + δy)                     ▷ Bilinear interpolation
   6:     Compute Jacobian J at image center via central finite differences
   7:     K ← J K Jᵀ; K ← (K + Kᵀ)/2
   8: end if
   9: if u₂ ~ Uniform(0, 1) < 0.08 then             ▷ Erosion
  10:     I ← min over (i,j) ∈ N_(3×3) of I(x + i, y + j)
  11: end if
  12: if u₃ ~ Uniform(0, 1) < 0.08 then             ▷ Dilation
  13:     I ← max over (i,j) ∈ N_(3×3) of I(x + i, y + j)
  14: end if
  15: if u₄ ~ Uniform(0, 1) < 0.10 then             ▷ Cutout
  16:     Sample random rectangle R ⊆ [0, 127]² with |R| ≤ 8 × 8
  17:     I(x, y) ← 0.5 for (x, y) ∈ R
  18: end if
  19: I′ ← I; K′ ← K

Morphological operations and cutout masking do not require explicit tensor transformation: morphological perturbations are sufficiently small (3 × 3 kernel, 8 % probability) that the geometric structure-permeability mapping is preserved, and the network learns the permeability of the modified geometry through convolutional feature extraction; cutout occludes a small spatial region without altering the global pore geometry governing the permeability tensor.

F Ensemble Methods: Stochastic Weight Averaging and Exponential Moving Average

F.1 Stochastic Weight Averaging (SWA)

SWA [74] maintains a running arithmetic mean of model weights sampled during late-stage training, where the model has converged but continues to explore locally optimal regions due to stochastic gradient noise. Starting at epoch t_SWA = 400 (in Phase 4), the SWA model θ_SWA is initialized with the current weights θ₄₀₀ and updated after each subsequent epoch t via:

  θ_SWA ← (n_models · θ_SWA + θ_t) / (n_models + 1),    (24)

where n_models counts the number of models aggregated so far. The SWA learning rate is fixed at η_SWA = 5×10⁻⁶ throughout epochs 400–600, substantially lower than the cosine-annealed base rate at epoch 400, encouraging broader exploration near the converged solution. SWA aggregates 200 model snapshots without incurring any additional training cost. After training, batch-normalization running statistics are recalibrated by passing the full training set through θ_SWA in evaluation mode once. This recalibration step is critical: weight averaging shifts the effective activation distributions relative to the batch-normalization statistics accumulated during training, and without recalibration the SWA model would apply incorrect normalization at inference time.
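Equation 24 is a running arithmetic mean over snapshots. A minimal sketch, with plain NumPy vectors standing in for model parameters:

```python
import numpy as np

class SWA:
    """Running arithmetic mean of weight snapshots (Equation 24):
    theta_swa <- (n * theta_swa + theta_t) / (n + 1)."""
    def __init__(self, theta0):
        self.theta = np.array(theta0, dtype=float)
        self.n = 1                       # snapshots aggregated so far
    def update(self, theta_t):
        self.theta = (self.n * self.theta + np.asarray(theta_t)) / (self.n + 1)
        self.n += 1

snapshots = [np.array([1.0, 2.0]), np.array([3.0, 4.0]), np.array([5.0, 6.0])]
swa = SWA(snapshots[0])
for s in snapshots[1:]:
    swa.update(s)
```

The recursion reproduces the plain average of all snapshots seen so far, which is why no snapshot buffer is needed; the batch-norm recalibration described above is a separate, post-training step.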
F.2 Exponential Moving Average (EMA)

EMA [80] maintains a continuously updated exponentially-weighted average of model parameters θ_EMA throughout training, updated after every optimizer step via:

  θ_EMA ← λ_EMA · θ_EMA + (1 − λ_EMA) · θ_t,    (25)

with decay λ_EMA = 0.9999. For a constant step size, this corresponds to an effective averaging window of 1/(1 − λ_EMA) = 10,000 gradient steps. The EMA model filters high-frequency weight fluctuations arising from mini-batch noise while tracking the optimization trajectory, and empirically outperforms instantaneous weights on validation data. Both θ_t and θ_EMA are monitored on the validation set at every epoch, and the best-performing EMA checkpoint is saved independently.

G Extended Evaluation Metrics

The following metrics complement the primary R², MSE, and RRMSE reported in the main text (Section 2.6.2) to enable cross-study comparison and provide a more complete characterization of prediction quality.

Mean absolute percentage error (MAPE).

  MAPE_jk = (1/N) Σ_{i=1}^{N} |K^pred_{jk,i} − K^true_{jk,i}| / (|K^true_{jk,i}| + ε) × 100 %,    (26)

where ε = 10⁻⁸ prevents division by zero.

Range-normalized RMSE (NRMSE).

  NRMSE_jk = RMSE_jk / (K^true_{jk,max} − K^true_{jk,min} + ε) × 100 %,    (27)

expressing prediction error as a fraction of the component's observed range.

Willmott's index of agreement (d).

  d = 1 − Σ_i (y^true_i − y^pred_i)² / Σ_i (|y^pred_i − ȳ| + |y^true_i − ȳ|)²,    (28)

where ȳ is the mean of the true values, bounded in [0, 1] with d = 1 indicating perfect agreement [84].

Kling-Gupta efficiency (KGE).

  KGE = 1 − √((r − 1)² + (α − 1)² + (β − 1)²),    (29)

where r is the Pearson correlation, α = σ_ŷ/σ_y is the variability ratio, and β = μ_ŷ/μ_y is the bias ratio [83].

Spearman rank correlation (ρ_s). Monotonic predictive skill, insensitive to distributional assumptions, assessed through the Spearman rank correlation coefficient.
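Willmott's d (Equation 28) and the KGE (Equation 29) are short to implement; a sketch on a toy array, with names chosen here for illustration:

```python
import numpy as np

def willmott_d(y_true, y_pred):
    """Willmott's index of agreement (Eq. 28), bounded in [0, 1]."""
    ybar = y_true.mean()
    num = np.sum((y_true - y_pred) ** 2)
    den = np.sum((np.abs(y_pred - ybar) + np.abs(y_true - ybar)) ** 2)
    return 1.0 - num / den

def kge(y_true, y_pred):
    """Kling-Gupta efficiency (Eq. 29) from correlation r, variability
    ratio alpha = sigma_pred/sigma_true, bias ratio beta = mu_pred/mu_true."""
    r = np.corrcoef(y_true, y_pred)[0, 1]
    alpha = y_pred.std() / y_true.std()
    beta = y_pred.mean() / y_true.mean()
    return 1.0 - np.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)

y = np.array([0.1, 0.4, 0.35, 0.8, 0.6])
y_biased = y + 0.05          # perfect correlation, constant bias
```

KGE usefully separates failure modes: the biased prediction above keeps r = 1 and α = 1 but is penalized through β alone.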
All metrics are computed separately for diagonal (K_xx, K_yy) and off-diagonal (K_xy, K_yx) components. The diagonal-to-off-diagonal R² gap, ΔR² = R²_diag − R²_off, is reported as a summary statistic quantifying the systematic prediction difficulty for anisotropy-encoding components.

H Statistical Analyses

Four complementary post-hoc analyses were conducted on the held-out test set (N = 4,000 samples) using the best-performing checkpoints. Each analysis addresses a specific methodological concern and is described in a dedicated subsection below.

H.1 Bootstrap Statistical Validation

Method. Let D_test = {(y_i, ŷ_i)}_{i=1}^{N} denote the set of ground-truth and predicted permeability tensors on the held-out test set. A single deterministic forward pass through the trained network yields the prediction matrix Ŷ ∈ ℝ^(N×4), which is treated as fixed throughout the bootstrap procedure. In each of B = 1,000 bootstrap iterations, an index set I_b = {i₁, ..., i_N} is drawn by sampling with replacement from {1, ..., N} using a fixed random seed (seed = 42). The metric function f(·) is evaluated on the resampled pairs (Y[I_b], Ŷ[I_b]), yielding a bootstrap replicate f^(b). The collection {f^(b)}_{b=1}^{B} constitutes the empirical bootstrap distribution of the metric.

Confidence intervals are computed using the bias-corrected and accelerated (BCa) method [88], which corrects for both bias in the bootstrap distribution and skewness in the sampling distribution. The bias-correction constant is ẑ₀ = Φ⁻¹(Pr[f^(b) < f_obs]), where Φ⁻¹ is the standard normal quantile function and f_obs is the observed metric on the full test set. The acceleration constant is set to â = 0 (reducing BCa to the bias-corrected percentile method), and the 95 % confidence limits are:

  (f_{α₁}, f_{α₂}),  with α_{1,2} = Φ( ẑ₀ + (ẑ₀ ± z₀.₀₂₅) / (1 − â(ẑ₀ ± z₀.₀₂₅)) ),    (30)

where z₀.₀₂₅ = 1.96.
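The bias-corrected percentile procedure (BCa with â = 0, Equation 30) can be sketched as follows. The R² metric and synthetic predictions are illustrative, and B is reduced for brevity.

```python
import numpy as np
from scipy.stats import norm

def bca_ci(y_true, y_pred, metric, B=1000, alpha=0.05, seed=42):
    """Bias-corrected percentile bootstrap CI (BCa with a_hat = 0),
    following the procedure of Appendix H.1."""
    rng = np.random.default_rng(seed)
    n = len(y_true)
    obs = metric(y_true, y_pred)
    reps = np.empty(B)
    for b in range(B):
        idx = rng.integers(0, n, size=n)          # resample with replacement
        reps[b] = metric(y_true[idx], y_pred[idx])
    # Bias correction z0; clip avoids +/- infinity at the extremes.
    z0 = norm.ppf(np.clip(np.mean(reps < obs), 1e-9, 1 - 1e-9))
    z = norm.ppf(1 - alpha / 2)                   # 1.96 for 95 %
    a1 = norm.cdf(z0 + (z0 - z))                  # Eq. 30 with a_hat = 0
    a2 = norm.cdf(z0 + (z0 + z))
    lo, hi = np.quantile(reps, [a1, a2])
    return obs, lo, hi

rng = np.random.default_rng(0)
y = rng.normal(size=2000)
yhat = y + rng.normal(scale=0.1, size=2000)
r2 = lambda a, b: 1 - np.sum((a - b) ** 2) / np.sum((a - a.mean()) ** 2)
obs, lo, hi = bca_ci(y, yhat, r2, B=200)
```

When the bootstrap distribution is symmetric about the observed value, ẑ₀ ≈ 0 and the limits collapse to the ordinary 2.5th/97.5th percentiles.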
Metrics. The bootstrap procedure is applied to global R², MSE, RMSE, MAE, MAPE, RRMSE, and the mean off-diagonal symmetry error ε_sym = E[|K̂_xy − K̂_yx|]. All metrics are additionally computed per tensor component (K_xx, K_xy, K_yx, K_yy), yielding 95 % BCa confidence intervals at the component level. Results are reported in the bootstrap summary table and visualized as density histograms with confidence bands.

H.2 Extended Performance Metrics and Residual Diagnostics

Absolute and relative metrics. Absolute metrics (RMSE, MAE) and the scale-invariant metrics defined in Appendix G (MAPE, RRMSE, NRMSE) are reported for all tensor components, with diagonal-off-diagonal decomposition. Willmott's index of agreement, the Kling-Gupta efficiency, and the Spearman rank correlation are computed per component to provide a multifaceted assessment suitable for cross-study comparison.

Residual diagnostics. Component-wise residuals e_i = K̂_c^(i) − K_c^(i) are examined through: (i) signed-error histograms with kernel density estimates to assess bias and distributional shape; (ii) residuals-versus-fitted plots with LOWESS trend lines to detect heteroscedasticity; and (iii) normal quantile-quantile (Q-Q) plots accompanied by Shapiro-Wilk tests to assess the normality assumption underlying confidence interval calculations.

H.3 Uncertainty Quantification via Monte Carlo Dropout

MC-Dropout inference. At test time, all dropout layers are set to training mode while batch-normalization layers remain in evaluation mode. For each test sample, T = 100 stochastic forward passes are performed, yielding a distribution of predictions {ŷ^(t)}_{t=1}^{T}. The predictive mean and epistemic standard deviation are estimated as:

  μ̂ = (1/T) Σ_{t=1}^{T} ŷ^(t),    (31)

  σ̂ = √( (1/T) Σ_{t=1}^{T} (ŷ^(t) − μ̂)² ).    (32)

Symmetric prediction intervals at nominal coverage 1 − α are obtained as μ̂ ± z_{α/2} σ̂.
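Equations 31–32 and the resulting intervals can be sketched with a simulated dropout ensemble; the noise model below (a per-sample bias plus per-pass jitter) is an assumption for illustration, not the network's actual behavior.

```python
import numpy as np

def mc_dropout_summary(preds, z=1.96):
    """Predictive mean, epistemic std (Equations 31-32, 1/T normalization),
    and symmetric ~95 % prediction intervals from T stochastic passes.
    `preds` has shape (T, N): T forward passes over N samples."""
    mu = preds.mean(axis=0)
    sigma = preds.std(axis=0)        # population std, as in Eq. 32
    return mu, sigma, mu - z * sigma, mu + z * sigma

rng = np.random.default_rng(0)
truth = rng.normal(size=500)
bias = rng.normal(scale=0.2, size=500)     # per-sample prediction error
# T = 100 passes whose spread happens to match the error scale,
# i.e. a roughly calibrated ensemble.
preds = (truth + bias)[None, :] + rng.normal(scale=0.2, size=(100, 500))
mu, sigma, lo, hi = mc_dropout_summary(preds)
coverage = np.mean((truth >= lo) & (truth <= hi))
```

Because the simulated pass-to-pass spread matches the actual error scale, the empirical coverage of the nominal 95 % intervals lands near 0.95, which is the behavior the calibration assessment below tests for on the real model.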
Calibration assessment. Calibration is assessed via an empirical coverage curve: for each nominal level α ∈ [0.05, 0.99], the fraction of test samples whose true value lies within the corresponding prediction interval is computed and plotted against the nominal level (reliability diagram). A well-calibrated model yields a curve close to the diagonal [89]. The mean calibration error is:

  MCE = (1/|A|) Σ_{α∈A} |p_nom(α) − p_emp(α)|,    (33)

where p_nom and p_emp are the nominal and empirical coverage values, respectively.

Reliability by confidence. To verify that predicted uncertainty is informative, test samples are stratified into deciles by their mean epistemic standard deviation σ̄_i = (1/4) Σ_c σ̂_{i,c}. Within each decile, the mean absolute error is computed and plotted as a function of decile-mean uncertainty. A monotonically increasing relationship confirms that high-uncertainty predictions correspond to higher errors, validating the utility of the uncertainty estimate for flagging low-confidence predictions in downstream engineering applications.

H.4 Anisotropy Gap and High-Anisotropy Error Analysis

Anisotropy measures. Three complementary anisotropy measures are computed from the ground-truth permeability tensors. The eigenvalue anisotropy ratio, AR_eig = |λ_max|/|λ_min|, where λ_max and λ_min are the eigenvalues of the 2 × 2 permeability tensor K, is the physically meaningful anisotropy measure, approaching unity for isotropic media. The component anisotropy ratio, AR_comp = max(|K_xx|, |K_yy|) / [max(|K_xy|, |K_yx|) + ε], measures the relative dominance of diagonal over off-diagonal entries. The off-diagonal dominance fraction, δ = K̄_off / (K̄_diag + K̄_off + ε), where K̄_diag and K̄_off are the mean absolute diagonal and off-diagonal magnitudes, indicates media in which off-diagonal permeability is particularly significant.

Error stratification.
Test samples are divided into n = 10 equi-populated bins according to AR_eig. Within each bin, per-component R², RRMSE, and MAPE are computed, together with aggregate diagonal and off-diagonal R². The evolution of the diagonal-to-off-diagonal R² gap with anisotropy identifies the regimes in which the model struggles most.

High-anisotropy deep-dive. The 500 most anisotropic test samples (top 12.5%) are isolated and examined with dedicated parity plots (predicted versus true) for each tensor component. The eight best-predicted, eight worst-predicted, and eight most anisotropic samples are visualized alongside their input binary images, predicted and true tensor values, and per-sample MAE.

Tensor condition number and geometric image statistics. The condition number κ(K) = |λ_max|/|λ_min| of each ground-truth tensor is correlated with per-sample MAE via Spearman's ρ_s to quantify how ill-conditioning amplifies prediction error. Image-level geometric proxies are also computed: (i) porosity φ = 1 − Ī; (ii) mean horizontal and vertical pore-chord lengths l_h and l_v, estimated from run-length encoding of binary rows and columns; (iii) a run-asymmetry index η = |l_h − l_v|/(l_h + l_v + ε); and (iv) a gradient anisotropy index γ = ⟨|∂_x I|⟩/⟨|∂_y I|⟩.

I Extended Phase 4 Results

I.1 Component-Wise Bootstrap Summary

Table 12 reports the full per-component bootstrap summary (B = 1,000, BCa method [88]). All confidence intervals are narrower than the between-phase performance differences (ΔR² ≈ 0.002–0.003 per step), confirming that the progressive gains reported in Table 2 are statistically significant.

Table 12: Component-wise bootstrap statistics for Phase 4 (B = 1,000, 95% BCa CIs). RRMSE is normalized by the component mean absolute value (Equation 23).

Comp.   Metric      Observed     Boot. mean   CI low       CI high
K_xx    R²          0.99674      0.99673      0.99626      0.99708
        RMSE        1.955×10⁻²   1.956×10⁻²   1.835×10⁻²   2.120×10⁻²
        MAE         1.272×10⁻²   1.272×10⁻²   1.224×10⁻²   1.316×10⁻²
        MAPE (%)    7.09         7.08         6.64         7.61
        RRMSE (%)   5.25         5.25         4.96         5.67
K_xy    R²          0.97583      0.97579      0.97359      0.97792
        RMSE        1.026×10⁻²   1.026×10⁻²   9.87×10⁻³    1.071×10⁻²
        MAE         7.17×10⁻³    7.17×10⁻³    6.96×10⁻³    7.41×10⁻³
        MAPE (%)    132.3        132.8        97.0         188.7
        RRMSE (%)   24.43        24.43        23.46        25.56
K_yx    R²          0.97583      0.97579      0.97360      0.97793
        RMSE        1.026×10⁻²   1.026×10⁻²   9.87×10⁻³    1.071×10⁻²
        MAE         7.17×10⁻³    7.17×10⁻³    6.96×10⁻³    7.41×10⁻³
        MAPE (%)    136.2        136.9        96.8         196.8
        RRMSE (%)   24.43        24.43        23.46        25.56
K_yy    R²          0.99672      0.99671      0.99548      0.99729
        RMSE        1.964×10⁻²   1.965×10⁻²   1.769×10⁻²   2.361×10⁻²
        MAE         1.265×10⁻²   1.265×10⁻²   1.219×10⁻²   1.311×10⁻²
        MAPE (%)    7.62         7.60         7.08         8.33
        RRMSE (%)   5.28         5.28         4.75         6.31
Global  R²          0.98628      0.98626      0.98516      0.98738
        RMSE        1.564×10⁻²   1.566×10⁻²   1.483×10⁻²   1.704×10⁻²
        RRMSE (%)   7.55         7.56         7.18         8.19
        ε_sym       3.95×10⁻⁷    3.95×10⁻⁷    3.82×10⁻⁷    4.10×10⁻⁷

I.2 Anisotropy-Stratified Metrics

Table 13 reports R² and RRMSE for all four tensor components across the ten equi-populated AR bins (n = 400 samples each) used to produce Figure 12(b,d) in the main text. The near-isotropic bin (AR ≈ 1.11) accounts for the largest ΔR² and is the primary residual challenge for off-diagonal prediction, as discussed in Section 3.2.2. The table confirms that the ΔR² gap is non-monotonic in AR, peaking at the lowest anisotropy bin and reaching a minimum of 0.0098 at AR ≈ 1.89. The secondary widening at AR ≈ 5.87 (ΔR² = 0.0151) is modest in absolute terms and reflects the increased difficulty of resolving extreme coupling magnitudes; it does not represent a failure mode but rather the expected increase in absolute off-diagonal RMSE as coupling strength grows.

Table 13: Anisotropy-stratified prediction performance for Phase 4. Ten equi-populated bins (n = 400 each) spanning the 2nd–98th percentile of the AR distribution. ΔR² = R²_diag − R²_off.

AR bin   R²_diag   R²_off   ΔR²      RRMSE K_xx (%)   RRMSE K_xy (%)   RRMSE K_yy (%)
1.115    0.9966    0.8152   0.1814   3.68             60.44            3.85
1.233    0.9968    0.9148   0.0820   3.98             39.46            3.90
1.334    0.9966    0.9690   0.0276   4.32             24.32            3.78
1.439    0.9972    0.9756   0.0216   4.03             21.92            3.91
1.556    0.9963    0.9830   0.0133   5.40             20.71            5.03
1.699    0.9963    0.9836   0.0127   6.59             23.10            5.17
1.890    0.9964    0.9866   0.0098   6.44             18.95            5.94
2.201    0.9964    0.9855   0.0110   6.93             21.03            7.02
2.819    0.9955    0.9787   0.0168   9.90             22.27            7.92
5.870    0.9852    0.9701   0.0151   13.15            35.54            20.14

I.3 MC-Dropout Uncertainty: Per-Component Summary

Table 14 extends the main-text uncertainty summary (Section 3.4.7) with full per-component statistics. The uncertainty-error Spearman correlation is consistently positive across all four components (ρ_s = 0.229–0.304), confirming that MC-Dropout provides a useful, if partial, confidence signal. Off-diagonal components exhibit slightly higher ρ_s (0.304 versus 0.229–0.235 for diagonal), consistent with their higher prediction variability providing more discriminative uncertainty estimates. Diagonal components carry larger absolute uncertainty (σ̄ ≈ 1.26×10⁻²) than off-diagonal (σ̄ ≈ 5.15×10⁻³), but this reflects their larger dynamic range: the ratio σ̄/RMSE ≈ 0.64 is nearly identical across all four components, indicating well-calibrated relative uncertainty in this sense.

Table 14: Per-component MC-Dropout uncertainty statistics for Phase 4 (T = 100, N = 4,000). σ̄: mean predictive standard deviation; σ̃: median; σ_p95: 95th-percentile uncertainty; ρ_s(σ, |ε|): Spearman rank correlation between predictive uncertainty and absolute error.

Comp.    R²       RMSE         σ̄            σ̃            σ_p95        ρ_s(σ, |ε|)
K_xx     0.9967   1.966×10⁻²   1.263×10⁻²   9.07×10⁻³    3.34×10⁻²    0.229
K_xy     0.9756   1.031×10⁻²   5.15×10⁻³    3.71×10⁻³    1.33×10⁻²    0.304
K_yx     0.9756   1.031×10⁻²   5.15×10⁻³    3.71×10⁻³    1.33×10⁻²    0.304
K_yy     0.9967   1.974×10⁻²   1.266×10⁻²   9.08×10⁻³    3.29×10⁻²    0.235
Global   —        —            8.90×10⁻³    6.65×10⁻³    —            —

J Extended Phase Comparison Metrics

Table 15 consolidates the complete quantitative record of the three-phase progressive training framework, reporting eight metric categories for all training phases on the held-out test set (N = 4,000). It extends the condensed summary in Table 2 with per-component relative errors, bootstrap uncertainty decomposition, high-anisotropy stratification, per-component predictive uncertainty, uncertainty-error correlations, and model specification details. The sections below highlight the most consequential patterns.

Progressive improvement is consistent and monotonic. Every metric in sections A–G improves from Phase 2 through Phase 3 to Phase 4 without exception, across global accuracy, rank-based agreement, per-component R², relative error, physics compliance, and uncertainty calibration. This monotonicity across 35 reported rows is non-trivial: it confirms that no phase's gains come at the expense of any other aspect of performance, and that the staged design, which isolates one methodological contribution per phase, avoids the trade-offs common in multi-objective optimization.

MAPE requires careful interpretation for off-diagonal components. Section D reports MAPE values of 132–146% for K_xy and K_yx across all phases. These figures are not indicative of poor performance: MAPE normalizes absolute error by the true value, and for near-isotropic samples where K_xy ≈ 0, the denominator approaches zero, inflating the metric arbitrarily.
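This pathology is easy to reproduce. The sketch below uses synthetic, illustrative values (not the paper's data): many near-zero off-diagonal targets with a small, fixed absolute error, so MAPE inflates while an RRMSE normalized by the mean absolute component value stays close to the relative error actually made:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic off-diagonal targets: many near zero, as in quasi-isotropic samples
y_true = rng.normal(0.0, 0.03, size=4000)
y_pred = y_true + rng.normal(0.0, 0.005, size=4000)  # small absolute errors

err = y_pred - y_true
# MAPE divides each error by the per-sample true value -> near-zero denominators dominate
mape = 100.0 * np.mean(np.abs(err) / (np.abs(y_true) + 1e-12))
# RRMSE divides by the mean absolute component value -> scale-invariant and stable
rrmse = 100.0 * np.sqrt(np.mean(err**2)) / np.mean(np.abs(y_true))

print(f"MAPE  = {mape:.1f}%")   # inflated by a handful of near-zero denominators
print(f"RRMSE = {rrmse:.1f}%")  # stays near the ~20% relative error actually made
```

With these parameters MAPE far exceeds RRMSE even though every prediction is accurate in absolute terms, which is exactly the behavior described above for K_xy and K_yx.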
The RRMSE values (24.43–26.07%), normalized by the mean absolute component value, provide the correct scale-invariant comparison, and the wide bootstrap confidence intervals for MAPE (e.g., [52, 99]% for Phase 4) reflect this instability. All downstream conclusions in the main text therefore rely on RRMSE and R² for off-diagonal components, as recommended in Appendix G.

High-anisotropy performance reveals the most structured gain. Section E shows that ΔR²_high-AR (top 500 most anisotropic test samples) drops sharply from 0.0209 in Phase 2 to 0.0132 in Phase 4, a 37% reduction, whereas the global ΔR² narrows only from 0.0230 to 0.0209. Notably, Phase 3 alone accounts for nearly the entire high-AR improvement (ΔR²_high-AR = 0.0132, essentially identical to Phase 4's value), while Phase 4's contribution is concentrated at the global level through the FiLM magnitude scaling. The off-diagonal R² for the high-AR subset improves from 0.968 (Phase 2) to 0.974 (Phase 4), confirming that augmentation diversity in Phase 3 was the decisive intervention for the strongly anisotropic regime.

Uncertainty calibration improves jointly with accuracy. Section G reveals two patterns not visible in accuracy metrics alone. First, the uncertainty-error Spearman correlations (UE-corr ρ) improve across all four components from Phase 2 to Phase 4: diagonal components improve from ρ ≈ 0.196–0.204 to 0.229–0.235, and off-diagonal components from 0.271 to 0.304. This confirms that SWA and EMA ensemble averaging not only reduce point-prediction errors but also produce more reliable confidence estimates, a meaningful benefit for deployment workflows in which uncertainty flags guide selective LBM verification. Second, off-diagonal components consistently achieve higher UE-corr ρ than diagonal (0.304 versus 0.229–0.235 in Phase 4), indicating that MC-Dropout variance is a more discriminative signal for the harder-to-predict components, precisely where reliable uncertainty estimates are most valuable.

Section H makes the frozen-backbone logic explicit. The model specification section records that the backbone parameter count (118.64 M) is identical across all three phases; the Phase 4 increment of 0.10 M task-specific parameters (0.33 M total versus 0.23 M in Phases 2–3) represents the porosity encoder and FiLM layers exclusively. Phase 4 is the only phase with a frozen backbone, SWA, EMA, and FiLM conditioning simultaneously active. This section is included to facilitate reproducibility and to clarify that no additional pretraining, architectural search, or data expansion was performed between phases.

Table 15: Extended performance metrics across three consecutive training phases evaluated on the held-out test set (N = 4,000). 95% BCa bootstrap confidence intervals (B = 1,000 resamples) are shown in brackets; bootstrap standard deviations (σ) quantify estimator variability. Predictive uncertainty is estimated via Monte Carlo Dropout (T = 100 stochastic forward passes). Phase 4 gives the best result in every row. Phase 2: supervised baseline with D4 augmentation; Phase 3: advanced augmentation and enhanced physics-aware loss; Phase 4: frozen backbone, FiLM porosity conditioning, SWA/EMA ensemble. Abbreviations: KGE: Kling–Gupta efficiency [83]; d: Willmott index of agreement [84]; MCE: mean calibration error (lower is better calibrated); UE-corr ρ: Spearman correlation between predicted standard deviation and absolute error; ΔR²_high-AR: diagonal–off-diagonal R² gap for the 500 most anisotropic test samples. MAPE for K_xy/K_yx reflects near-zero denominators in quasi-isotropic samples; RRMSE is the preferred relative metric in this regime (Appendix G).

Metric                              Phase 4                 Phase 3                 Phase 2

A. Global accuracy
R² (component-averaged)             0.98628                 0.98476                 0.98429
  95% BCa CI                        [0.98516, 0.98738]      [0.98333, 0.98604]      [0.98304, 0.98563]
  Bootstrap σ                       0.00059                 0.00068                 0.00070
R² (variance-weighted)              0.99604                 0.99452                 0.99403
MSE                                 2.446×10⁻⁴              2.880×10⁻⁴              3.226×10⁻⁴
RMSE                                1.564×10⁻²              1.697×10⁻²              1.796×10⁻²
  95% BCa CI                        [0.01483, 0.01704]      [0.01617, 0.01794]      [0.01694, 0.01948]
MAE                                 9.920×10⁻³              1.091×10⁻²              1.100×10⁻²
  95% BCa CI                        [0.00971, 0.01017]      [0.01065, 0.01119]      [0.01073, 0.01129]
RRMSE (%)                           7.552                   8.194                   8.671
  95% BCa CI                        [7.178, 8.187]          [7.832, 8.686]          [8.178, 9.354]
MAPE (%)                            70.791                  72.384                  74.917
  95% BCa CI                        [52.135, 99.030]        [54.271, 103.518]       [57.142, 108.249]
  Bootstrap σ                       11.989                  12.745                  13.521

B. Rank-based and agreement metrics
Pearson r                           0.99872                 0.99851                 0.99832
Spearman ρ_s                        0.99595                 0.99589                 0.99572
Willmott d                          0.99935                 0.99920                 0.99912
KGE                                 0.99420                 0.98760                 0.98244

C. Per-component R² with 95% BCa CI
R² K_xx                             0.99674                 0.99607                 0.99543
  95% BCa CI                        [0.99626, 0.99708]      [0.99554, 0.99655]      [0.99491, 0.99592]
R² K_xy                             0.97583                 0.97404                 0.97246
  95% BCa CI                        [0.97359, 0.97792]      [0.97146, 0.97640]      [0.96999, 0.97503]
R² K_yx                             0.97583                 0.97404                 0.97247
  95% BCa CI                        [0.97360, 0.97793]      [0.97146, 0.97639]      [0.97000, 0.97502]
R² K_yy                             0.99672                 0.99613                 0.99557
  95% BCa CI                        [0.99548, 0.99729]      [0.99539, 0.99680]      [0.99488, 0.99621]
R² diagonal (avg.)                  0.99673                 0.99610                 0.99550
R² off-diagonal (avg.)              0.97583                 0.97404                 0.97246

D. Per-component relative error
RRMSE K_xx (%)                      5.25                    5.77                    6.12
RRMSE K_xy (%)                      24.43                   25.32                   26.07
RRMSE K_yx (%)                      24.43                   25.32                   26.07
RRMSE K_yy (%)                      5.28                    5.70                    6.27
MAPE K_xx (%)                       7.09                    7.94                    9.27
MAPE K_xy (%) †                     132.28                  135.64                  141.73
MAPE K_yx (%) †                     136.18                  139.42                  145.61
MAPE K_yy (%)                       7.62                    8.11                    9.78

E. Tensor-type decomposition and anisotropy
ΔR² (diag − off, global)            0.02090                 0.02206                 0.02304
ΔR²_high-AR (top 500)               0.01319                 0.01321                 0.02093
R² K_xx, high-AR subset             0.98510                 0.98432                 0.98350
R² K_xy, high-AR subset             0.97437                 0.97024                 0.96809

F. Physics compliance
Symmetry error ε_sym (mean)         3.95×10⁻⁷               7.69×10⁻⁷               1.39×10⁻⁶
  95% BCa CI                        [3.82, 4.10]×10⁻⁷       [7.41, 7.99]×10⁻⁷       [1.32, 1.47]×10⁻⁶
Positivity fraction (%)             100.0                   100.0                   100.0

G. Predictive uncertainty (MC-Dropout, T = 100)
MCE                                 0.3724                  0.3754                  0.3786
Global mean σ̄                       8.896×10⁻³              9.069×10⁻³              9.755×10⁻³
σ̄ K_xx                              1.263×10⁻²              1.285×10⁻²              1.400×10⁻²
σ̄ K_xy                              5.148×10⁻³              5.234×10⁻³              5.476×10⁻³
σ̄ K_yx                              5.148×10⁻³              5.234×10⁻³              5.476×10⁻³
σ̄ K_yy                              1.266×10⁻²              1.296×10⁻²              1.407×10⁻²
UE-corr ρ K_xx                      0.229                   0.214                   0.196
UE-corr ρ K_xy                      0.304                   0.296                   0.271
UE-corr ρ K_yx                      0.304                   0.296                   0.271
UE-corr ρ K_yy                      0.235                   0.218                   0.204

H. Model specification
Total parameters (M)                118.97                  118.87                  118.87
Backbone parameters (M)             118.64                  118.64                  118.64
Task-specific parameters (M)        0.33                    0.23                    0.23
Backbone frozen                     Yes                     No                      No
Porosity encoder + FiLM             Yes                     No                      No
SWA (epochs 400–600)                Yes                     No                      No
EMA (λ = 0.9999)                    Yes                     No                      No

Notes. ΔR² = R²_diag − R²_off-diag; a smaller gap indicates more balanced accuracy across tensor components. MAPE for off-diagonal components exceeds 100% in quasi-isotropic samples where true values approach zero; RRMSE is the more informative relative metric in this regime. MCE: mean calibration error of MC-Dropout prediction intervals. UE-corr ρ > 0 confirms statistically significant error-ranking capability of the uncertainty estimates. FiLM: Feature-wise Linear Modulation; SWA: Stochastic Weight Averaging; EMA: Exponential Moving Average.
† MAPE for K_xy and K_yx is inflated by near-zero true values in quasi-isotropic samples; RRMSE is the appropriate relative metric for these components.

References

[1] M. J. Blunt. "Flow in porous media—pore-network models and multiphase flow". In: Current Opinion in Colloid & Interface Science 6.3 (2001), pp. 197–207 (cit. on p. 1).
[2] A. J. Ladd and P. Szymczak. "Reactive flows in porous media: Challenges in theoretical and numerical methods". In: Annual Review of Chemical and Biomolecular Engineering 12.1 (2021), pp. 543–571 (cit. on p. 1).
[3] E. R. Okoroafor, S. D. Saltzer, and A. R. Kovscek. "Toward underground hydrogen storage in porous media: Reservoir engineering insights". In: International Journal of Hydrogen Energy 47.79 (2022), pp. 33781–33802 (cit. on p. 1).
[4] D. B. Ingham, A. Bejan, E. Mamut, and I. Pop. "Emerging technologies and techniques in porous media". In: (2012) (cit. on p. 1).
[5] H. Nisbet, G. Buscarnera, J. Carey, M. A. Chen, E. Detournay, H. Huang, J. Hyman, P. Kang, Q. Kang, J. F. Labuz, et al. "Carbon mineralization in fractured mafic and ultramafic rocks: A review". In: Reviews of Geophysics 62.4 (2024), e2023RG000815 (cit. on p. 1).
[6] M. Nooraiepour, K. M. Dąbrowski, M. Masoudi, S. Kuczyński, Z. Song, A. E. Lothe, and H. Hellevang. "Geological CO2 storage assessment in emerging CCS regions: Review of sequestration potential, policy development, and socio-economic factors in Poland". In: International Journal of Greenhouse Gas Control 148 (2025), p. 104524 (cit. on p. 1).
[7] L. M. McDowell-Boyer, J. R. Hunt, and N. Sitar. "Particle transport through porous media". In: Water Resources Research 22.13 (1986), pp. 1901–1921 (cit. on p. 1).
[8] P. Meakin and A. M. Tartakovsky. "Modeling and simulation of pore-scale multiphase fluid flow and reactive transport in fractured and porous media". In: Reviews of Geophysics 47.3 (2009) (cit. on p. 2).
[9] W. Ehlers and A. Wagner. "Modelling and simulation methods applied to coupled problems in porous-media mechanics". In: Archive of Applied Mechanics 89.4 (2019), pp. 609–628 (cit. on p. 2).
[10] H. Liu, Q. Kang, C. R. Leonardi, S. Schmieschek, A. Narváez, B. D. Jones, J. R. Williams, A. J. Valocchi, and J. Harting. "Multiphase lattice Boltzmann simulations for porous media applications: A review". In: Computational Geosciences 20.4 (2016), pp. 777–805 (cit. on p. 2).
[11] L. Chen, A. He, J. Zhao, Q. Kang, Z.-Y. Li, J. Carmeliet, N. Shikazono, and W.-Q. Tao. "Pore-scale modeling of complex transport phenomena in porous media". In: Progress in Energy and Combustion Science 88 (2022), p. 100968 (cit. on p. 2).
[12] P. Moin and K. Mahesh. "Direct numerical simulation: a tool in turbulence research". In: Annual Review of Fluid Mechanics 30.1 (1998), pp. 539–578 (cit. on p. 2).
[13] B. D. Wood, X. He, and S. V. Apte. "Modeling turbulent flows in porous media". In: Annual Review of Fluid Mechanics 52.1 (2020), pp. 171–203 (cit. on p. 2).
[14] Y. Da Wang, M. J. Blunt, R. T. Armstrong, and P. Mostaghimi. "Deep learning in pore scale imaging and modeling". In: Earth-Science Reviews 215 (2021), p. 103555 (cit. on p. 2).
[15] M. Delpisheh, B. Ebrahimpour, A. Fattahi, M. Siavashi, H. Mir, H. Mashhadimoslem, M. A. Abdol, M. Ghorbani, J. Shokri, D. Niblett, et al. "Leveraging machine learning in porous media". In: Journal of Materials Chemistry A (2024) (cit. on p. 2).
[16] J. Abbasi, A. D. Jagtap, B. Moseley, A. Hiorth, and P. Ø. Andersen. "Challenges and advancements in modeling shock fronts with physics-informed neural networks: A review and benchmarking study". In: arXiv preprint arXiv:2503.17379 (2025) (cit. on p. 2).
[17] R. Peng and X. Yang. "Recent advances in the application of porosity and permeability in porous media". In: Physics of Fluids 37.7 (2025) (cit. on p. 2).
[18] K. M. Graczyk and M. Matyka. "Predicting porosity, permeability, and tortuosity of porous media from images by deep learning". In: Scientific Reports 10.1 (2020), p. 21488 (cit. on p. 2).
[19] J. Wu, X. Yin, and H. Xiao. "Seeing permeability from images: fast prediction with convolutional neural networks". In: Science Bulletin 63.18 (2018), pp. 1215–1222 (cit. on pp. 2, 3).
[20] X. An, S. Teng, Z. Wang, and Q. Liang. "Implicit neural network-based coal SEM super-resolution for enhancing micro-pores measurement tasks". In: Neural Networks 198 (2026). doi: 10.1016/j.neunet.2026.108578 (cit. on p. 2).
[21] N. Alqahtani, R. T. Armstrong, and P. Mostaghimi. "Deep learning convolutional neural networks to predict porous media properties". In: SPE Asia Pacific Oil and Gas Conference and Exhibition. SPE. 2018, D012S032R010 (cit. on p. 2).
[22] T. Yasuda, S. Ookawara, S. Yoshikawa, and H. Matsumoto. "Machine learning and data-driven characterization framework for porous materials: Permeability prediction and channeling defect detection". In: Chemical Engineering Journal 420 (2021), p. 130069 (cit. on pp. 2, 3).
[23] M. Elmorsy, W. El-Dakhakhni, and B. Zhao. "Generalizable permeability prediction of digital porous media via a novel multi-scale 3D convolutional neural network". In: Water Resources Research 58.3 (2022), e2021WR031454 (cit. on p. 2).
[24] S. Gärttner, F. O. Alpak, A. Meier, N. Ray, and F. Frank. "Estimating permeability of 3D micro-CT images by physics-informed CNNs based on DNS". In: Computational Geosciences 27.2 (2023), pp. 245–262 (cit. on pp. 2, 3).
[25] H. Li, B. Aslam, and B. Yan. "An Integrated Deep Learning and Physics-Constrained Upscaling Workflow for Robust Permeability Prediction in Digital Rock Physics". In: SPE Journal (2025), pp. 1–20 (cit. on p. 2).
[26] A. M. Tartakovsky, C. O. Marrero, P. Perdikaris, G. D. Tartakovsky, and D. Barajas-Solano. "Physics-informed deep neural networks for learning parameters and constitutive relationships in subsurface flow problems". In: Water Resources Research 56.5 (2020), e2019WR026731 (cit. on p. 2).
[27] A. Kashefi and T. Mukerji. "Prediction of fluid flow in porous media by sparse observations and physics-informed PointNet". In: Neural Networks 167 (2023), pp. 80–91. doi: 10.1016/j.neunet.2023.08.006 (cit. on p. 2).
[28] M. Elmorsy, W. El-Dakhakhni, and B. Zhao. "Rapid permeability upscaling of digital porous media via physics-informed neural networks". In: Water Resources Research 59.12 (2023), e2023WR035064 (cit. on p. 2).
[29] Y. Zhuang, Q. Ye, N. Liu, X. Xie, H. Yan, and L. Zeng. "Hybrid physics-data-driven deep learning for pore-scale transport in microfluidic system". In: Physics of Fluids 37.7 (2025) (cit. on p. 2).
[30] Q. Zhao, X. Han, R. Guo, and C. Chen. "A computationally efficient hybrid neural network architecture for porous media: Integrating convolutional and graph neural networks for improved property predictions". In: Advances in Water Resources 195 (2025), p. 104881 (cit. on pp. 2, 30).
[31] X. Zhao, Y. Zhong, and P. Li. "RTG-GNN: A novel rock topology-guided approach for permeability prediction using graph neural networks". In: Geoenergy Science and Engineering 243 (2024), p. 213358 (cit. on pp. 2, 30).
[32] M. K. Alzahrani, A. Shapoval, Z. Chen, and S. S. Rahman. "Pore-GNN: A graph neural network-based framework for predicting flow properties of porous media from micro-CT images". In: Advances in Geo-Energy Research 10.1 (2023), pp. 39–55 (cit. on pp. 2, 30).
[33] S. Cai, Z. Mao, Z. Wang, M. Yin, and G. E. Karniadakis. "Physics-informed neural networks (PINNs) for fluid mechanics: A review". In: Acta Mechanica Sinica 37.12 (2021), pp. 1727–1738 (cit. on p. 2).
[34] M. Nooraiepour. "Traditional and Machine Learning Approaches to Partial Differential Equations: A Critical Review of Methods, Trade-Offs, and Integration". In: Preprints (Sept. 2025). doi: 10.20944/preprints202509.0472.v1 (cit. on p. 2).
[35] P. Liu, L. Wang, R. Ranjan, G. He, and L. Zhao. "A survey on active deep learning: From model driven to data driven". In: ACM Computing Surveys (CSUR) 54.10s (2022), pp. 1–34 (cit. on p. 2).
[36] C. K. Williams and C. E. Rasmussen. Gaussian Processes for Machine Learning. Vol. 2. 3. MIT Press, Cambridge, MA, 2006 (cit. on p. 2).
[37] F. Stulp and O. Sigaud. "Many regression algorithms, one unified model: A review". In: Neural Networks 69 (2015), pp. 60–79. doi: 10.1016/j.neunet.2015.05.005 (cit. on p. 2).
[38] E. Schulz, M. Speekenbrink, and A. Krause. "A tutorial on Gaussian process regression: Modelling, exploring, and exploiting functions". In: Journal of Mathematical Psychology 85 (2018), pp. 1–16 (cit. on p. 2).
[39] L. Breiman. "Random forests". In: Machine Learning 45.1 (2001), pp. 5–32 (cit. on p. 2).
[40] J. H. Friedman. "Greedy function approximation: a gradient boosting machine". In: Annals of Statistics (2001), pp. 1189–1232 (cit. on p. 2).
[41] M. Fernández-Delgado, M. Sirsat, E. Cernadas, S. Alawadi, S. Barro, and M. Febrero-Bande. "An extensive experimental survey of regression methods". In: Neural Networks 111 (2019), pp. 11–34. doi: 10.1016/j.neunet.2018.12.010 (cit. on p. 2).
[42] T. Chen and C. Guestrin. "XGBoost: A scalable tree boosting system". In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2016, pp. 785–794 (cit. on p. 2).
[43] A. J. Smola and B. Schölkopf. "A tutorial on support vector regression". In: Statistics and Computing 14.3 (2004), pp. 199–222 (cit. on p. 2).
[44] H. Drucker, C. J. Burges, L. Kaufman, A. Smola, and V. Vapnik. "Support vector regression machines". In: Advances in Neural Information Processing Systems 9 (1996) (cit. on p. 2).
[45] G. Huang, G.-B. Huang, S. Song, and K. You. "Trends in extreme learning machines: A review". In: Neural Networks 61 (2015), pp. 32–48. doi: 10.1016/j.neunet.2014.10.001 (cit. on p. 2).
[46] T. Hastie, R. Tibshirani, J. H. Friedman, and J. H. Friedman. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Vol. 2. Springer, 2009 (cit. on p. 2).
[47] T. Bouwmans, S. Javed, M. Sultana, and S. K. Jung. "Deep neural network concepts for background subtraction: A systematic review and comparative evaluation". In: Neural Networks 117 (2019), pp. 8–66. doi: 10.1016/j.neunet.2019.04.024 (cit. on p. 3).
[48] C. Tian, L. Fei, W. Zheng, Y. Xu, W. Zuo, and C.-W. Lin. "Deep learning on image denoising: An overview". In: Neural Networks 131 (2020), pp. 251–275. doi: 10.1016/j.neunet.2020.07.025 (cit. on p. 3).
[49] M. Buda, A. Maki, and M. A. Mazurowski. "A systematic study of the class imbalance problem in convolutional neural networks". In: Neural Networks 106 (2018), pp. 249–259. doi: 10.1016/j.neunet.2018.07.011 (cit. on p. 3).
[50] A. Mumuni and F. Mumuni. "CNN architectures for geometric transformation-invariant feature representation in computer vision: a review". In: SN Computer Science 2.5 (2021), p. 340 (cit. on p. 3).
[51] Y. LeCun, Y. Bengio, and G. Hinton. "Deep learning". In: Nature 521.7553 (2015), pp. 436–444 (cit. on p. 3).
[52] X.-S. Wei, Y.-Z. Song, O. Mac Aodha, J. Wu, Y. Peng, J. Tang, J. Yang, and S. Belongie. "Fine-grained image analysis with deep learning: A survey". In: IEEE Transactions on Pattern Analysis and Machine Intelligence 44.12 (2021), pp. 8927–8948 (cit. on p. 3).
[53] H. Yao, S. Zhang, Y. Zhang, J. Li, and Q. Tian. "Coarse-to-fine description for fine-grained visual categorization". In: IEEE Transactions on Image Processing 25.10 (2016), pp. 4858–4872 (cit. on p. 3).
[54] A. Dosovitskiy. "An image is worth 16x16 words: Transformers for image recognition at scale". In: arXiv preprint arXiv:2010.11929 (2020) (cit. on p. 3).
[55] D. Soydaner. "Attention mechanism in neural networks: where it comes and where it goes". In: Neural Computing and Applications 34.16 (2022), pp. 13371–13385 (cit. on p. 3).
[56] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin. "Attention is all you need". In: Advances in Neural Information Processing Systems 30 (2017) (cit. on p. 3).
[57] J. Yang, C. Li, P. Zhang, X. Dai, B. Xiao, L. Yuan, and J. Gao. "Focal self-attention for local-global interactions in vision transformers". In: arXiv preprint (2021) (cit. on p. 3).
[58] A. Khan, Z. Rauf, A. Sohail, A. R. Khan, H. Asif, A. Asif, and U. Farooq. "A survey of the vision transformers and their CNN-transformer based variants". In: Artificial Intelligence Review 56.Suppl 3 (2023), pp. 2917–2970 (cit. on p. 3).
[59] T. Xiao, M. Singh, E. Mintun, T. Darrell, P. Dollár, and R. Girshick. "Early convolutions help transformers see better". In: Advances in Neural Information Processing Systems 34 (2021), pp. 30392–30400 (cit. on p. 3).
[60] Z. Dai, H. Liu, Q. V. Le, and M. Tan. "CoAtNet: Marrying convolution and attention for all data sizes". In: Advances in Neural Information Processing Systems 34 (2021), pp. 3965–3977 (cit. on p. 3).
[61] Z. Liu, Y. Lin, Y. Cao, H. Hu, Y. Wei, Z. Zhang, S. Lin, and B. Guo. "Swin transformer: Hierarchical vision transformer using shifted windows". In: Proceedings of the IEEE/CVF International Conference on Computer Vision. 2021, pp. 10012–10022 (cit. on p. 3).
[62] X. Dong, J. Bao, D. Chen, W. Zhang, N. Yu, L. Yuan, D. Chen, and B. Guo. "CSWin transformer: A general vision transformer backbone with cross-shaped windows". In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2022, pp. 12124–12134 (cit. on p. 3).
[63] H. Long. "Hybrid design of CNN and vision transformer: A review". In: Proceedings of the 2024 7th International Conference on Computer Information Science and Artificial Intelligence. 2024, pp. 121–127 (cit. on p. 3).
[64] Z. Tu, H. Talebi, H. Zhang, F. Yang, P. Milanfar, A. Bovik, and Y. Li. "MaxViT: Multi-axis vision transformer". In: European Conference on Computer Vision. Springer. 2022, pp. 459–479 (cit. on pp. 3, 4, 7).
[65] S. Bangalore Vijayakumar, K. T. Chitty-Venkata, K. Arya, and A. K. Somani. "Convision benchmark: A contemporary framework to benchmark CNN and ViT models". In: AI 5.3 (2024), pp. 1132–1171 (cit. on p. 4).
[66] V. Hassija, B. Palanisamy, A. Chatterjee, A. Mandal, D. Chakraborty, A. Pandey, G. Chalapathi, and D. Kumar. "Transformers for Vision: A Survey on Innovative Methods for Computer Vision". In: IEEE Access (2025) (cit. on p. 4).
[67] K. He, X. Chen, S. Xie, Y. Li, P. Dollár, and R. Girshick. "Masked autoencoders are scalable vision learners". In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2022, pp. 16000–16009 (cit. on p. 4).
[68] C. Feichtenhofer, Y. Li, K. He, et al. "Masked autoencoders as spatiotemporal learners". In: Advances in Neural Information Processing Systems 35 (2022), pp. 35946–35958 (cit. on p. 4).
[69] C. J. Reed, R. Gupta, S. Li, S. Brockman, C. Funk, B. Clipp, K. Keutzer, S. Candido, M. Uyttendaele, and T. Darrell. "Scale-MAE: A scale-aware masked autoencoder for multi-scale geospatial representation learning". In: Proceedings of the IEEE/CVF International Conference on Computer Vision. 2023, pp. 4088–4099 (cit. on p. 4).
[70] T. Cohen and M. Welling. "Group equivariant convolutional networks". In: International Conference on Machine Learning. PMLR. 2016, pp. 2990–2999 (cit. on pp. 4, 11, 31, 41).
[71] T. S. Cohen, M. Geiger, J. Köhler, and M. Welling. "Spherical CNNs". In: arXiv preprint arXiv:1801.10130 (2018) (cit. on pp. 4, 41).
[72] M. Weiler and G. Cesa. "General E(2)-equivariant steerable CNNs". In: Advances in Neural Information Processing Systems 32 (2019) (cit. on pp. 4, 11, 31, 41).
[73] E. Perez, F. Strub, H. De Vries, V. Dumoulin, and A. Courville. "FiLM: Visual reasoning with a general conditioning layer". In: Proceedings of the AAAI Conference on Artificial Intelligence. Vol. 32. 1. 2018 (cit. on pp. 5, 8).
[74] P. Izmailov, D. Podoprikhin, T. Garipov, D. Vetrov, and A. G. Wilson. "Averaging weights leads to wider optima and better generalization". In: arXiv preprint (2018) (cit. on pp. 5, 10, 43).
[75] B. T. Polyak and A. B. Juditsky. "Acceleration of stochastic approximation by averaging". In: SIAM Journal on Control and Optimization 30.4 (1992), pp. 838–855 (cit. on p. 5).
[76] S. Vargdal, P. Reis, H. A. Sveinsson, and G. Linga. "Neural Networks for Predicting Permeability Tensors of 2D Porous Media: Comparison of Convolution- and Transformer-based Architectures". In: (2025) (cit. on pp. 5, 6, 29).
[77] S. Vargdal. Dataset and Model Weights for: Neural Networks for Predicting Permeability Tensors of 2D Porous Media: Comparison of Convolution- and Transformer-based Architectures. Zenodo, Nov. 2025. doi: 10.5281/zenodo.17711512 (cit. on pp. 5, 6, 29, 34).
[78] R. Wightman. PyTorch Image Models. https://github.com/huggingface/pytorch-image-models. 2019 (cit. on p. 7).
[79] R. Wightman, H. Touvron, and H. Jégou. ResNet strikes back: An improved training procedure in timm. 2021 (cit. on p. 7).
[80] D. Morales-Brotons, T. Vogels, and H. Hendrikx. "Exponential moving average of weights in deep learning: Dynamics and benefits". In: arXiv preprint arXiv:2411.18704 (2024) (cit. on pp. 10, 43).
[81] J. Bear. Dynamics of Fluids in Porous Media. Courier Corporation, 2013 (cit. on p. 11).
[82] S. Whitaker. The Method of Volume Averaging. Vol. 13. Springer Science & Business Media, 2013 (cit. on p. 11).
[83] H. V. Gupta, H. Kling, K. K. Yilmaz, and G. F. Martinez. "Decomposition of the mean squared error and NSE performance criteria: Implications for improving hydrological modelling". In: Journal of Hydrology 377.1-2 (2009), pp. 80–91 (cit. on pp. 16, 44, 49).
[84] C. J. Willmott. "On the validation of models". In: Physical Geography 2.2 (1981), pp. 184–194 (cit. on pp. 16, 44, 49).
[85] Q. Zhao and H. Xiao. "An End-to-End Differentiable, Graph Neural Network-Embedded Pore Network Model for Permeability Prediction". In: arXiv preprint (2025) (cit. on p. 30).
[86] X. Zhao, Y. Zhong, and P. Li. "Porous rock permeability quantification: A superpixel-driven graph neural network method for pore region analysis". In: Gondwana Research (2025) (cit. on p. 30).
[87] I. Loshchilov and F. Hutter. "Decoupled weight decay regularization". In: arXiv preprint arXiv:1711.05101 (2017) (cit. on p. 40).
[88] B. Efron. "Better bootstrap confidence intervals". In: Journal of the American Statistical Association 82.397 (1987), pp. 171–185 (cit. on pp. 44, 46).
[89] C. Guo, G. Pleiss, Y. Sun, and K. Q. Weinberger. "On calibration of modern neural networks". In: International Conference on Machine Learning. PMLR. 2017, pp. 1321–1330 (cit. on p. 45).