PathoSyn: Imaging-Pathology MRI Synthesis via Disentangled Deviation Diffusion
📝 Original Info
- Title: PathoSyn: Imaging-Pathology MRI Synthesis via Disentangled Deviation Diffusion
- ArXiv ID: 2512.23130
- Date: 2025-12-29
- Authors: Jian Wang, Sixing Rong, Jiarui Xing, Yuling Xu, Weide Liu
📝 Abstract
We present PathoSyn, a unified generative framework for Magnetic Resonance Imaging (MRI) synthesis that reformulates imaging pathology as a disentangled additive deviation on a stable anatomical manifold. Current generative models typically operate in the global pixel domain or rely on binary masks; these paradigms often suffer from feature entanglement, leading to corrupted anatomical substrates or structural discontinuities. PathoSyn addresses these limitations by decomposing the synthesis task into deterministic anatomical reconstruction and stochastic deviation modeling. Central to our framework is a Deviation-Space Diffusion Model designed to learn the conditional distribution of pathological residuals, thereby capturing localized intensity variations while preserving global structural integrity by construction. To ensure spatial coherence, the diffusion process is coupled with a seam-aware fusion strategy and an inference-time stabilization module, which collectively suppress boundary artifacts and produce high-fidelity internal lesion heterogeneity. PathoSyn provides a mathematically principled pipeline for generating high-fidelity patient-specific synthetic datasets, facilitating the development of robust diagnostic algorithms in low-data regimes. By allowing interpretable counterfactual disease progression modeling, the framework supports precision intervention planning and provides a controlled environment for benchmarking clinical decision-support systems. Quantitative and qualitative evaluations on tumor imaging benchmarks demonstrate that PathoSyn significantly outperforms holistic diffusion and mask-conditioned baselines in both perceptual realism and anatomical fidelity. The source code of this work will be made publicly available.
📄 Full Content
Despite growing interest, generating pathological images remains challenging due to the intrinsic asymmetry between anatomy and disease. The anatomical structure, including organ geometry and spatial layout, is largely stable for a given subject, whereas the pathological appearance is the main source of uncertainty, varying in intensity, texture, and clinical stage. The most clinically relevant variability is concentrated in how disease perturbs a stable anatomical substrate. Existing generative models [3], [12] rarely make this distinction explicit. When operating directly in image space, anatomy and pathology are treated as equally stochastic; the model must learn a high-dimensional distribution over all pixels simultaneously. This unnecessarily enlarges the solution space and permits the generator to introduce anatomical modifications in order to fit the data, thereby producing systematic distortions of anatomically stable structures. Consequently, current approaches fall into two suboptimal extremes: either they model the image as a holistic entity with no explicit separation [13], or they enforce complete separation via masking, which identifies location but overlooks internal lesion appearance and continuity with surrounding tissue.
Although deep learning-based methods have significantly advanced visual fidelity compared to traditional rule-based algorithms, they have remained largely confined to the same image-space paradigm [14], [15]. Early applications of generative adversarial networks (GANs) synthesized lesions by modifying entire slices or volumes, sometimes conditioned on labels or coarse masks [1], [16]-[19]; these models can create sharp and visually plausible images, but tend to be unstable, difficult to control, and prone to hallucination of anatomy that does not correspond to any realistic background. Variational autoencoders (VAEs) introduced a probabilistic latent representation and more stable training [20]-[22], and structured or disentangled variants attempted to separate anatomy and pathology in latent space [23], [24]. However, because decoding still happens directly to full images, they often oversmooth fine pathological details and leak stochastic variation into non-lesional regions. More recently, diffusion models have become the dominant paradigm for high-fidelity synthesis in both natural and medical imaging [12], [13], [25]-[27]. By iteratively denoising from a simple prior, they capture rich, multimodal image statistics and offer strong sample diversity. However, most medical diffusion models still learn a distribution over entire images or over mask-conditioned images, without explicitly isolating where uncertainty should reside. Mask conditioning helps to specify where pathology should appear but not what it should look like internally [28]-[30]; lesions may have plausible boundaries, but lack realistic internal texture, heterogeneity, or progression patterns. As a result, even advanced models still entangle anatomical background with pathological variation and fail to exploit that anatomy is largely deterministic while pathology is a structured deviation (Fig. 1).
These limitations indicate that progress in pathological synthesis is not primarily determined by the choice of network architecture but by how the problem itself is represented. When a model treats the entire image as an unconstrained random variable, it must learn a full high-dimensional distribution in which anatomy and pathology vary simultaneously, even though only one of them is expected to change. Instead, a more principled formulation views a pathological image as the combination of two qualitatively different factors: a stable anatomical substrate that should remain preserved and a pathological deviation that carries the uncertainty, diversity and progression of the disease. Structurally separating these components reduces the effective complexity of the generative task and better reflects clinical reality: anatomy behaves as the baseline, while pathology introduces variation. In other words, disease does not replace the entire image; it perturbs an existing anatomical state. Motivated by this perspective, we introduce PathoSyn, a deviation-based framework for pathological image generation. Rather than end-to-end synthesizing full images, PathoSyn focuses the generative process on the pathology component while preserving the underlying anatomical substrate. We instantiate this concept using a diffusion process defined over the deviation rather than the image itself, enabling pathological variation to be modeled as a controlled and spatially coherent modification that can be integrated with anatomy without introducing unintended structural change. This localizes stochasticity to where it belongs, within pathology, while maintaining anatomical fidelity by construction. As a result, PathoSyn captures clinically significant lesion heterogeneity, preserves global structure, and supports controllable manipulation of pathology appearance without destabilizing non-lesional regions. In summary, PathoSyn provides three key contributions:
• A representation shift that models pathology as a structured deviation from a preserved anatomical substrate, rather than treating the entire image as stochastic.
• A joint deep learning framework with a deviation-space diffusion formulation that localizes generative uncertainty to disease regions, enabling controlled, interpretable, and anatomically consistent synthesis.
• A practical synthesis pipeline that produces realistic pathological variation while reducing anatomical distortion and improving the reliability of downstream analysis.
Prior approaches to pathological image generation can be differentiated by the domain in which the generative distribution is defined and, consequently, which components of the image are treated as stochastic.
a) Holistic Image Synthesis: Early GAN- and VAE-based models [1], [16]-[22] assume a holistic generative process x ∼ p_θ(x), which requires the network to assign probability mass across the entire image manifold X ⊂ R^{H×W×C}. This implicitly treats both the anatomical substrate a and the pathological appearance d as coupled random variables. Because p_θ(x) must account for lesion morphology without explicit structural constraints, the model often exploits anatomical degrees of freedom to minimize the training objective, producing realistic pathology at the cost of distorted background anatomy.
b) Mask-Conditioned and Inpainting Methods: To constrain the generative search space, mask-conditioned methods [28]-[30] narrow the problem to x ∼ p_θ(x | m), where m ∈ {0, 1}^{H×W} defines a binary lesion region. Although this specifies where changes occur, it does not define how the disease should manifest. The mask m encodes spatial support but lacks internal semantic identity; consequently, the generator solves a boundary-respecting fill-in problem rather than a clinically grounded deviation. This produces plausible contours but insufficient internal heterogeneity, as m resolves spatial ambiguity but not the distributional uncertainty of the disease state.
c) Factorized Representation Learning: Representation-based approaches introduce latent factors [31], often decomposing the image into (z_anat, z_path) to separate structural and pathological attributes. This aims to model p_θ(x) through a factorized prior p(z_anat) p(z_path). However, the transformation back to pixel space, x = f_θ(z_anat, z_path), inevitably recouples these factors. In practice, stochastic variation intended for the pathology leaks into non-lesional regions. While the representation is disentangled in latent space, the generative mapping is not, meaning anatomical preservation is not guaranteed by construction.
d) Diffusion-based Generative Models: Modern diffusion models improve stability and multimodal sampling, yet they typically evolve the stochastic process over the image x itself. Even context-aware variants implement denoising transitions x_t → x_{t−1} across the entire spatial domain. This enforces neither anatomical invariance nor deviation locality; the model effectively re-synthesizes the entire image rather than expressing uncertainty exclusively where pathology resides. Consequently, diffusion improves local fidelity while retaining the core representational limitation of image-space modeling.
In summary, existing methods either (i) conflate anatomy and pathology in a single stochastic space or (ii) enforce rigid spatial partitioning without modeling pathology as a continuous, anatomy-dependent deviation. Both reflect the same underlying issue: they attempt to learn p_θ(x) directly, while a clinically grounded formulation should localize uncertainty to the disease-bearing component d while treating anatomy a as a stable substrate. This motivates the shift toward our proposed deviation-space representation.
We formulate generative modeling of pathological images by explicitly disentangling the subject-specific anatomical structure from disease-induced appearance variability. This decomposition is based on the distinct statistical properties of these two factors: for a given subject, the anatomical structure is largely invariant, while pathological manifestations exhibit high stochasticity, internal heterogeneity, and spatial localization, even within identical anatomical contexts.
In this framework, a pathological image x ∈ R^{H×W} is represented as the additive superposition of a deterministic anatomical substrate x_sub and a stochastic pathological deviation field r:

x = x_sub + r,  (1)

where x_sub characterizes the stable anatomical configuration, including organ morphology and global tissue organization, and r represents localized intensity deviations induced by pathological processes. This additive formulation contrasts fundamentally with deformation-based models, such as metamorphosis, which account for pathological changes through geometric transformations of an underlying template. By isolating disease-related variation within the appearance space, our formulation enables independent modeling of pathological features without distorting the underlying anatomical geometry. Consequently, x_sub is treated as a subject-specific baseline, while r is interpreted as a structured residual that captures the uncertainty and progression of the disease.
The decomposition in Eq. (1) naturally leads to a conditional probabilistic formulation. Since the anatomical substrate is assumed to be preserved for a given subject, we model the deviation field as a random variable conditioned on both the anatomy and the spatial support of the lesion:

r ∼ p(r | x_sub, m),  (2)

where m ∈ {0, 1}^{H×W} denotes a binary mask defining the spatial domain of the pathology. By conditioning the distribution on m, we explicitly constrain stochastic variation to the lesion site, thereby preventing unintended alterations to healthy tissue and ensuring anatomical fidelity by construction.
a) Inference of Latent Components: Under the additive model defined in Eq. (1), we assume that an observed pathological image x admits a latent decomposition x = x_sub + r.
Let Ω ⊂ Z² denote the discrete image lattice, where x : Ω → R represents the intensity function. The spatial support of the pathology is defined by a binary mask m : Ω → {0, 1}, where m(p) = 1 indicates that the pixel p ∈ Ω resides within the lesional region. We denote the complementary non-lesional mask as m̄ := 1 − m.
Recovering (x_sub, r) from a single observation x is an ill-posed inverse problem. To achieve a well-posed formulation, we impose structural constraints consistent with clinical priors: (i) intensities in the non-lesional domain m̄ are dominated by subject-specific anatomy, and (ii) pathological appearance variations are strictly localized to the support m. These assumptions motivate a constrained inference scheme in which x_sub is estimated primarily from the healthy-tissue context and r is limited to the pathological region.
b) Anatomical Substrate Estimation: We define the anatomical substrate x_sub : Ω → R as a pathologically suppressed representation that preserves subject-specific morphology. To enforce structural consistency outside the lesion, we use the Hadamard product ⊙. The constraint x_sub ≈ x on m̄ is formalized by a masked fidelity term ∥(x_sub − x) ⊙ m̄∥²₂, where ∥y∥²₂ := Σ_{p∈Ω} y(p)². Because the underlying anatomy within the lesional region m is not observed, we introduce a counterfactual healthy reference x_ph := Inp(x, m), where Inp(·) denotes a fixed inpainting operator that extrapolates the anatomical context from m̄ into m. We estimate x_sub by minimizing the variational objective:

L_sub = λ_out ∥(x_sub − x) ⊙ m̄∥²₂ + λ_in ∥(x_sub − x_ph) ⊙ m∥²₂,  (3)

where λ_out, λ_in > 0 are hyperparameters that balance extrinsic fidelity and intrinsic regularization. In practice, x_sub is parameterized by a neural predictor f_θ optimized for this objective.

c) Pathological Deviation Field: Given the estimated substrate x_sub, the subject-specific deviation field is defined as the residual supported by the lesion:

r_0 = (x − x_sub) ⊙ m.  (4)
By construction, r_0(p) = 0 for all p with m(p) = 0, ensuring that stochasticity is restricted to the support of the lesion. Unlike deformation-based models that characterize pathology via non-rigid warps, r_0 captures additive intensity and texture deviations in the appearance space. To bound the dynamic range and ensure numerical stability for the subsequent diffusion process, we apply a point-wise saturation operator:

r_0(p) ← max(−δ, min(δ, r_0(p))),  (5)

where δ > 0 defines the maximum admissible deviation magnitude.
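As a concrete sketch of the decomposition step, the masked substrate objective and the saturated deviation extraction can be written in a few lines of NumPy. Function names and the equal default weighting are our own illustrative choices, not the paper's:

```python
import numpy as np

def extract_deviation(x, x_sub, m, delta=1.0):
    """Subject-specific deviation field: residual restricted to the lesion
    support, then saturated point-wise to [-delta, delta]."""
    r0 = (x - x_sub) * m                # zero outside the lesion by construction
    return np.clip(r0, -delta, delta)   # bound the dynamic range for diffusion

def substrate_objective(x_sub, x, x_ph, m, lam_out=1.0, lam_in=1.0):
    """Masked variational objective for the anatomical substrate.
    `x_ph` is the inpainted counterfactual healthy reference Inp(x, m)."""
    m_bar = 1.0 - m
    fidelity_out = np.sum(((x_sub - x) * m_bar) ** 2)  # match x on healthy tissue
    fidelity_in = np.sum(((x_sub - x_ph) * m) ** 2)    # match inpainting inside lesion
    return lam_out * fidelity_out + lam_in * fidelity_in
```

In practice x_sub would come from the neural predictor f_θ; the snippet only makes the masking arithmetic explicit.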
Building upon the decomposition defined in Eq. (1), we model the conditional distribution p(r | x_sub, m) using a Denoising Diffusion Probabilistic Model (DDPM) [25]. The diffusion process is defined over the deviation field r rather than the global image x. This formulation ensures that the generative modeling capacity is allocated exclusively to the pathological appearance, while the anatomical substrate remains deterministic and invariant.
a) Forward Diffusion Process: Let r_0 denote the clean deviation field extracted via Eq. (4). We define a Markovian forward process that iteratively adds Gaussian noise over T steps according to a variance schedule {β_t}_{t=1}^{T} ⊂ (0, 1):

q(r_t | r_{t−1}) = N(r_t; √(1 − β_t) r_{t−1}, β_t I).  (6)
The latent state at any timestep t can be expressed in closed form as:

r_t = √(ᾱ_t) r_0 + √(1 − ᾱ_t) ε,  ε ∼ N(0, I),  (7)

where α_t = 1 − β_t and ᾱ_t = ∏_{s=1}^{t} α_s. To ensure that stochasticity remains confined to the pathological domain, we enforce a spatial support constraint at each step:

r_t ← r_t ⊙ m.  (8)
This projection prevents the diffusion process from introducing spurious variances or intensity shifts in the non-lesional anatomical regions.
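The masked forward step of Eqs. (7)-(8) reduces to a one-line sampling rule; a NumPy sketch under the standard DDPM parameterization (the returned masked noise is what our illustrative training loop would regress against):

```python
import numpy as np

def forward_diffuse(r0, m, t, alpha_bar, rng):
    """Closed-form forward sample r_t = sqrt(abar_t) r0 + sqrt(1 - abar_t) eps,
    followed by projection onto the lesion support m."""
    eps = rng.standard_normal(r0.shape)
    a = alpha_bar[t]
    rt = np.sqrt(a) * r0 + np.sqrt(1.0 - a) * eps
    return rt * m, eps * m  # support constraint: no stochasticity in healthy regions
```

With a linear schedule, `alpha_bar = np.cumprod(1.0 - np.linspace(1e-4, 0.02, T))` reproduces the usual DDPM noising trajectory.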
b) Reverse Denoising and Training Objective: The reverse process is parameterized by a conditional noise predictor ε_θ, which estimates the noise component added to the deviation field. The network is conditioned on the noisy deviation r_t, the anatomical substrate x_sub, the spatial mask m, and the diffusion time step t. We optimize the parameters θ via the mean-squared error (MSE) objective:

L(θ) = E_{t, r_0, ε} [ ∥(ε − ε_θ(r_t, x_sub, m, t)) ⊙ m∥²₂ ],  (9)
where t ∼ U(1, T) and r_t is sampled according to Eq. (7) subject to the constraint in Eq. (8).

c) Ancestral Sampling and Synthesis: During inference, synthesis begins with sampled Gaussian noise restricted to the lesion support: r_T ∼ N(0, I), followed by r_T ← r_T ⊙ m. We iteratively recover the deviation field by applying the reverse transition for t = T, …, 1:
r_{t−1} = (1/√(α_t)) ( r_t − ((1 − α_t)/√(1 − ᾱ_t)) ε_θ(r_t, x_sub, m, t) ) + σ_t z,  (10)

where z ∼ N(0, I) for t > 1 and z = 0 for t = 1. To preserve anatomical fidelity, we re-apply the spatial projection r_{t−1} ← r_{t−1} ⊙ m after each update. The final sample r = r_0 constitutes a stochastic realization of the pathological deviation conditioned on the subject-specific anatomy.
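The masked ancestral sampler above can be sketched as follows. `eps_theta` stands in for the trained noise predictor, and σ_t = √β_t is one common DDPM variance choice (an assumption here):

```python
import numpy as np

def sample_deviation(eps_theta, x_sub, m, betas, rng=np.random.default_rng(0)):
    """Ancestral sampling of the deviation field with per-step support projection.
    `eps_theta(r_t, x_sub, m, t)` is any callable noise predictor."""
    alphas = 1.0 - betas
    alpha_bar = np.cumprod(alphas)
    T = len(betas)
    r = rng.standard_normal(x_sub.shape) * m          # r_T restricted to the lesion
    for t in range(T - 1, -1, -1):
        z = rng.standard_normal(r.shape) if t > 0 else np.zeros_like(r)
        eps = eps_theta(r, x_sub, m, t)
        coef = (1.0 - alphas[t]) / np.sqrt(1.0 - alpha_bar[t])
        r = (r - coef * eps) / np.sqrt(alphas[t]) + np.sqrt(betas[t]) * z
        r = r * m                                     # re-apply the spatial projection
    return r
```

Because the projection is applied after every update, the healthy region stays exactly zero throughout the trajectory.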
We implement the proposed formulation through a unified deep learning framework that jointly estimates the anatomical substrate and models the conditional distribution of the pathological deviation field. All components are optimized under a shared objective, ensuring that anatomical-pathological decomposition and subsequent recomposition remain mutually consistent throughout training.
a) Network Architecture: The anatomical substrate estimator f sub is realized as a symmetric U-Net with four downsampling stages and skip connections, facilitating the preservation of high-frequency anatomical details while aggregating global contextual information. The diffusion noise predictor ϵ θ utilizes a high-capacity U-Net architecture composed of Wide-ResNet blocks, with spatial self-attention integrated in the 16 × 16 bottleneck resolution to capture long-range dependencies within pathological regions. The diffusion time step t is projected through sinusoidal positional embeddings and injected into each residual block to condition the denoising process.
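The sinusoidal time-step embedding used to condition the residual blocks follows the standard DDPM/Transformer construction; a minimal sketch (embedding dimension and `max_period` are illustrative):

```python
import numpy as np

def timestep_embedding(t, dim, max_period=10000.0):
    """Sinusoidal embedding of diffusion step t: sin/cos at geometrically
    spaced frequencies, concatenated to a `dim`-dimensional vector."""
    half = dim // 2
    freqs = np.exp(-np.log(max_period) * np.arange(half) / half)
    args = t * freqs
    return np.concatenate([np.sin(args), np.cos(args)])
```

In the network, this vector would be passed through an MLP and added (or scaled) into each Wide-ResNet block's activations.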
b) Pathology-Deviation Regularization: Although the diffusion objective in Eq. (9) learns the conditional distribution of pathological deviation fields, it does not explicitly enforce boundary smoothness or subject-specific consistency. We therefore introduce pathology-aware regularization to suppress spatial leakage and stabilize the interaction between the anatomical substrate and the deviation field.
Let S ∈ [0, 1]^{H×W} denote a soft blend map derived by applying a Gaussian kernel to the binary lesion mask m, defining a narrow transition band at the periphery of the lesion. From S, we derive a boundary weighting function (Eq. 14), which emphasizes the lesion boundary while vanishing in both the interior and exterior domains. During training, we compute a denoised estimate of the clean deviation field:

r̂_0 = (r_t − √(1 − ᾱ_t) ε_θ(r_t, x_sub, m, t)) / √(ᾱ_t).
We constrain r̂_0 to match the subject-specific deviation target r_0 = (x − x_sub) ⊙ m via a multi-term regularization loss:
where the terms, respectively, enforce internal fidelity, boundary consistency, and the suppression of pathological leakage into non-lesional regions.

c) Seam-Aware Recomposition: To explicitly couple deviation modeling with image synthesis, a seam-aware recomposition layer is incorporated during training. Given the estimated anatomical substrate and deviation field, the synthesized pathological image is formed as:

x̂ = x_sub + S ⊙ r̂_0.

Reconstruction consistency is enforced through L_syn = ∥(x̂ − x) ⊙ S∥₁, which prioritizes supervision of the anatomically sensitive transition zone to ensure seamless integration.
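The soft blend map and the seam-aware recomposition can be prototyped as below. The separable Gaussian kernel parameters and the exact recomposition form x̂ = x_sub + S ⊙ r̂ are our assumptions; only the S-weighted L1 seam loss is stated explicitly in the text:

```python
import numpy as np

def soft_blend_map(m, sigma=2.0, radius=4):
    """Soft blend map S: separable Gaussian smoothing of the binary mask
    (kernel size/sigma are illustrative)."""
    x = np.arange(-radius, radius + 1)
    k = np.exp(-x ** 2 / (2.0 * sigma ** 2))
    k /= k.sum()
    rows = np.apply_along_axis(lambda v: np.convolve(v, k, mode="same"), 1, m.astype(float))
    return np.apply_along_axis(lambda v: np.convolve(v, k, mode="same"), 0, rows)

def seam_aware_recompose(x_sub, r_hat, S):
    """Assumed recomposition: deviation injected through the soft blend map."""
    return x_sub + S * r_hat

def seam_loss(x_hat, x, S):
    """L_syn = ||(x_hat - x) * S||_1, emphasizing the transition band."""
    return float(np.sum(np.abs((x_hat - x) * S)))
```

Because the smoothing kernel is normalized, S stays in [0, 1], so the recomposition degrades gracefully from full deviation inside the lesion to pure substrate far from it.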
Algorithm 2: PathoSyn Inference. Sample r_T ∼ N(0, I) ⊙ m, run the reverse transitions with per-step support projection, then combine substrate and residual: return x = x_sub + r.

IV. EXPERIMENTAL EVALUATION

1) Dataset and Pre-processing: Experiments are conducted on the BraTS 2020 brain tumor dataset [32]-[34], which comprises multi-institutional, multi-modal magnetic resonance images, including T1, T1-weighted contrast-enhanced (T1Gd), T2-weighted, and T2 Fluid-Attenuated Inversion Recovery (FLAIR) volumes. The dataset provides expert voxel-level annotations for three tumor sub-regions: the necrotic and non-enhancing tumor core (NCR/NET), the peritumoral edema (ED), and the GD-enhancing tumor (ET). All imaging data are provided in a pre-processed format, including co-registration to a common anatomical template, interpolation to a uniform 1 mm³ resolution, and skull-stripping.
We utilize the official training partition consisting of n = 369 subjects. To ensure a rigorous evaluation framework and prevent data leakage, these subjects are divided at the patient level into three disjoint sets: a training set of 295 subjects, a validation set of 37 subjects, and a hold-out testing set of 37 subjects. While the training set is used to optimize the parameters of f_sub and ε_θ, the validation set is used to monitor convergence and tune hyperparameters. The final synthesis performance is evaluated exclusively on the testing set to ensure that the model generalizes to unseen anatomical substrates. For computational efficiency, all 3D MRI volumes are processed into 2D axial slices and stored in HDF5 format, maintaining strict subject-level separation across all partitions.
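Patient-level splitting is what prevents slices from the same subject leaking across partitions; a minimal sketch (helper name and seed are ours, split sizes from the paper):

```python
import random

def subject_level_split(subject_ids, n_val=37, n_test=37, seed=0):
    """Disjoint patient-level partition: shuffle subject IDs once, then slice.
    All 2D slices of a subject inherit that subject's partition."""
    ids = sorted(subject_ids)            # sort first so the shuffle is reproducible
    random.Random(seed).shuffle(ids)
    test = ids[:n_test]
    val = ids[n_test:n_test + n_val]
    train = ids[n_test + n_val:]
    return train, val, test
```

Any slice-level dataset would then be built by filtering slices against these ID lists, never by shuffling slices directly.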
All scans undergo standardized preprocessing, including skull stripping, spatial resampling to a common grid, and intensity normalization. To disentangle the effect of synthesis quality from that of training-set size, all augmentation regimes, including the baseline methods (Brain-LDM [13], MaskDiff-Inpaint [35]), are configured to contribute an equivalent number of synthetic samples to the training pool. To ensure a fair, controlled comparison and seamless integration with our proposed methods, we implement a patch-wise VAE-GAN [36] within our unified PathoSyn framework. This allows us to standardize training protocols, architectural choices, and evaluation procedures across PathoSyn: VAE-GAN and our final model, PathoSyn: Diff, thus isolating the contribution of the generative paradigm rather than confounding factors from differing implementations.
For methodological parity, all task-specific models share an identical 2D architecture and optimization pipeline. We employ the AdamW optimizer (η = 10⁻⁴, weight decay 10⁻⁵) with a cosine annealing scheduler. Generative models are trained for 300 epochs, while downstream task models are trained for 200. PathoSyn: Diff uses a T = 1000 step linear noise schedule with DDIM sampling during inference. PathoSyn: VAE-GAN employs a 128-dimensional latent space and a PatchGAN discriminator with an adversarial weight λ_adv = 0.1. Training is performed on NVIDIA RTX 3090/A6000 GPUs (24-48 GB VRAM). Inference uses test-time normalization only, without augmentation, to avoid masking potential domain shifts.
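The reported hyperparameters can be collected into a small config alongside the standard cosine-annealing rule (the exact annealing variant and η_min = 0 are our assumptions):

```python
import math

# Hyperparameters as reported in the paper.
TRAIN_CFG = {
    "lr": 1e-4,
    "weight_decay": 1e-5,
    "epochs_generative": 300,
    "epochs_downstream": 200,
    "diffusion_steps": 1000,   # T, with a linear beta schedule
    "lambda_adv": 0.1,         # PathoSyn: VAE-GAN adversarial weight
}

def cosine_annealing(epoch, total_epochs, eta_max=1e-4, eta_min=0.0):
    """Cosine-annealed learning rate: eta_max at epoch 0, eta_min at the end."""
    return eta_min + 0.5 * (eta_max - eta_min) * (1 + math.cos(math.pi * epoch / total_epochs))
```

In a PyTorch pipeline the same schedule is typically obtained via `torch.optim.lr_scheduler.CosineAnnealingLR` attached to an AdamW optimizer.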
a) Data Realism Analysis: We quantify visual and statistical fidelity through a real-versus-synthetic discrimination task. A binary classifier is trained to distinguish authentic BraTS 2020 slices from generated samples across all synthesis paradigms. Performance is measured via Receiver Operating Characteristic (ROC) curves and the Area Under the Curve (AUC).
• 95% bootstrap confidence intervals are used to evaluate statistical robustness.
• Lower AUC values (↓) signify higher realism, indicating that synthetic samples are statistically closer to the real data manifold.

b) Mathematical Disentanglement Analysis: To rigorously evaluate the separation of subject-specific anatomy and pathological variation, we analyze the statistical independence between the anatomical substrate x_sub and the deviation field r. Ideally, a robust generative framework should achieve high-level feature orthogonality while maintaining low-level spatial alignment. We define two metrics to quantify disentanglement in the latent feature space Φ:
1) Feature Orthogonality (C): We measure the absolute cosine similarity between the feature vectors f_sub = φ(x_sub) and f_res = φ(r) extracted by a frozen medical encoder φ(·). A value C ≈ 0 indicates that the two representations reside in orthogonal subspaces.
2) Mutual Information (MI): We estimate the non-linear dependency between f_sub and f_res. A reduction in MI signifies that stochastic pathological synthesis does not leak anatomical information into the deviation field.

c) Downstream Task Utility: We evaluate the functional utility of synthetic data by measuring performance gains in segmentation and classification tasks. All experiments are conducted in 2D to isolate the influence of generative quality from architectural advantages.
• Segmentation: We benchmark across supervised regimes (nnU-Net [37], U-Net [38]), semi-supervised regimes (Mean-Teacher U-Net [39]), and unsupervised regimes (Contra-Seg [40]). Results are reported as mean Dice similarity coefficient (DSC) ± standard deviation.

Fig. 3 (bottom) details the distribution of discriminability AUC values obtained via bootstrap resampling. The violin and box plots depict the variance of the AUC estimates, while diamonds denote the point-estimate AUC derived from the mean ROC curve. Narrower distributions paired with lower medians (↓) reflect improved generative stability and consistent alignment with the real data distribution. This statistical stability across resamples demonstrates that the achieved realism is an inherent property of the model's learned representation rather than a stochastic artifact, ensuring robust generalization and minimal domain shift between synthetic and authentic pathological images.
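The two disentanglement metrics can be prototyped as follows. The cosine-based C matches the definition above; the MI estimator shown here is a simple histogram proxy, since the paper does not specify its estimator:

```python
import numpy as np

def feature_orthogonality(f_sub, f_res):
    """Absolute cosine similarity C between substrate and deviation features.
    C close to 0 means the representations occupy orthogonal subspaces."""
    num = abs(float(np.dot(f_sub, f_res)))
    return num / (np.linalg.norm(f_sub) * np.linalg.norm(f_res) + 1e-12)

def mutual_information(a, b, bins=16):
    """Histogram estimate of MI between two 1-D feature projections
    (a coarse proxy for the non-linear dependency described in the text)."""
    joint, _, _ = np.histogram2d(a, b, bins=bins)
    p = joint / joint.sum()
    px = p.sum(axis=1, keepdims=True)
    py = p.sum(axis=0, keepdims=True)
    nz = p > 0
    return float(np.sum(p[nz] * np.log(p[nz] / (px @ py)[nz])))
```

With real features, `f_sub` and `f_res` would come from the frozen encoder φ applied to x_sub and r respectively; kernel-based or neural MI estimators (e.g., MINE) would be drop-in replacements for the histogram proxy.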
As summarized in Tab. I, we evaluate the mathematical independence between the anatomical substrate and pathological deviations.
Anatomical Corruption in Holistic Models: A fundamental flaw of holistic models like Brain-LDM is the lack of explicit structural constraints. Since they treat the image as a monolithic distribution, the synthesis of a lesion inevitably leads to anatomical leakage, where the intensity and texture of healthy brain tissues are non-linearly corrupted. This is reflected in the high feature coupling (C=0.425), indicating that the model cannot alter the pathology without inadvertently modifying the identity of the underlying subject.
Structured Disentanglement via Deviation Space: Unlike dual-encoder frameworks (e.g., Meta-Auto [31]) that still exhibit significant latent overlap, PathoSyn achieves near-orthogonality (C=0.072). By constraining generation to a dedicated deviation space, we effectively "shield" the anatomical manifold from pathological stochasticity. This ensures that the generated digital twins are not only visually realistic but also mathematically consistent, providing a safer and more reliable foundation for training clinical AI.
Tab. II summarizes Dice segmentation performance across four learning regimes, from fully supervised to unsupervised. Adding PathoSyn: Diff synthetic data yields consistent, statistically significant improvements (p < 0.01) for all architectures. On the supervised nnU-Net baseline, PathoSyn: Diff reaches 0.845 ± 0.021 Dice, an absolute gain of +6.5% over "No Augmentation." Comparable gains are observed in the unsupervised Contra-Seg regime.

Tab. III summarizes the effect of different synthesis strategies on pathological state classification across three backbones: ResNet-50 (CNN), DenseNet (2D), and Swin Transformer. PathoSyn: Diff augmentation consistently yields the best performance, reaching a peak AUROC of 0.915 on ResNet-50, substantially outperforming holistic synthesis models such as Brain-LDM and MaskDiff-Inpaint. This highlights the benefit of modeling stochasticity within a constrained deviation space rather than across the full image manifold. PathoSyn: Diff also delivers the best calibration, with the lowest Expected Calibration Error (ECE) across all test regimes (0.043-0.045). While conventional and holistic augmentations increase AUROC, Tab. III shows they offer only modest ECE gains. In contrast, our diffusion-based approach reduces calibration error by about 35% relative to No Augmentation. This indicates that high-fidelity, anatomically grounded lesions from PathoSyn prevent overconfidence on synthetic artifacts, yielding more reliable probability estimates crucial for clinical decision support.

Fig. 5 (top left) shows tumor segmentation performance for models trained with different synthesis strategies. Beyond overall gains, we assess sensitivity to tumor size. Most generative methods offer modest improvements for large, high-contrast lesions but degrade sharply for smaller ones, largely due to loss of high-frequency texture in the latent space. In contrast, PathoSyn: Diff maintains strong performance across all lesion sizes, including small tumors in the lowest quartile of lesion area.
Fig. 4 qualitatively compares the clinical plausibility of generated lesions. A key requirement in neuro-oncology is respecting anatomical boundaries and tissue-specific intensity profiles. Holistic models like Brain-LDM often generate "low-level" lesions that ignore underlying textures, whereas PathoSyn: Diff ensures that deviations, such as peritumoral edema, follow white matter tracts and ventricular constraints. Preserving the anatomical substrate during diffusion makes the synthesized pathology appear physiologically anchored rather than superimposed.
Fig. 5 (top right) reports Intensity and Texture Consistency, including first-order statistics (Mean, Skewness, Kurtosis) and second-order GLCM metrics (Contrast, Homogeneity). High-grade gliomas are radiologically defined by pronounced heterogeneity, with high intensity variance and characteristic gradients at infiltrative margins. The close match between our results and clinical data across these descriptors shows that PathoSyn accurately models the lesion’s internal photometric structure. This textural fidelity is crucial for training diagnostic models that are sensitive to subtle tissue variations used to determine tumor grade.
Fig. 5 (bottom left) measures the Deep Conceptual Texture Distance as a proxy for high-level semantic alignment. Radiologists characterize gliomas by complex morphologies, including internal mottling, rim enhancement, and central necrosis. Holistic models tend to yield over-smoothed or biologically sterile textures, but PathoSyn: Diff achieves the smallest conceptual distance to the BraTS distribution, indicating that its latent space captures the semantic markers critical for diagnosis. This high-level alignment helps prevent shortcut learning: if synthetic data contain digital artifacts or lack biological complexity, downstream models may learn fake cues instead of real pathology. By narrowing this conceptual gap, PathoSyn encourages segmentation and classification networks to learn the same radiomic signatures found in patients, improving clinical reliability and generalization to real-world workflows.
Fig. 5 (bottom right) shows the Empirical CDF of the global feature distance to the real pathological manifold, providing a population-level view of generative reliability. The leftward shift of the PathoSyn curve indicates that samples consistently fall within the clinically acceptable region of the true distribution. Clinically, this distributional stability means the framework does not just produce occasional high-quality images but reliably generates diverse, biologically credible digital twins, reducing the risk of outliers that could undermine downstream clinical AI robustness.
In this paper, we introduce PathoSyn, a pathology-aware medical image synthesis framework that disentangles stable anatomical structure from stochastic pathological variation. Instead of synthesizing complete images from a single joint distribution, PathoSyn represents disease as a deviation field superimposed on the preserved anatomy. This formulation limits identity leakage, protects patient-specific structure, and improves the fidelity of tumor-related appearance changes. Compared to state-of-the-art approaches (holistic diffusion models, mask-based inpainting, and VAE-GANs), PathoSyn produces images with sharper infiltrative tumor margins, more realistic edema and necrotic textures, and fewer structural distortions. The low empirically measured correlation between anatomical and pathological representations (C = 0.072) indicates that pathology can be modulated with minimal unintended alteration of core brain anatomy. Our framework removes the need for manual post hoc blending or heuristic inference-time adjustments, allowing synthetic images to be used seamlessly in downstream research and clinical assessment. More broadly, the deviation field formulation provides a conceptual tool for modeling structured variation in computer vision and machine learning. It enables the representation of subtle changes in internal features and localized textural shifts without altering the global geometry. This principle extends to other medical imaging modalities such as CT, PET, ultrasound, and digital pathology, where disease progression often appears as regional texture and intensity changes rather than large-scale shape deformation.
A prominent clinical use case for PathoSyn is preoperative planning for brain tumor surgery. By producing multiple plausible MRI variants for an individual patient, PathoSyn can support clinicians in anticipating the short-term evolution of tumor texture, infiltration patterns, and peritumoral edema in the interval preceding surgery. These synthetic scenarios may help neurosurgeons and radiologists in operative planning, risk stratification, and evaluation of alternative strategies in settings where updated images are unavailable or when rapid disease progression is suspected. However, the current implementation still has room for refinement. PathoSyn primarily models appearance changes and does not yet simulate large-scale structural deformations such as mass effect, ventricular compression, or midline shift. In addition, robustness across scanners, institutions, and acquisition protocols remains constrained by domain shift, indicating a need for improved domain adaptation and harmonization. These issues must be resolved before the framework can be considered for regulated clinical deployment.
Future work will extend PathoSyn into a unified generative framework capable of jointly modeling both appearance alterations and geometric deformations. This expansion will enable representation of not only tumor texture but also the spatial impact of neoplastic growth on adjacent brain structures. The overarching objective is to support realistic data generation for surgical planning, longitudinal disease monitoring, and robustness evaluation of machine learning models. By explicitly combining anatomical stability with controlled, realistic pathological variation, PathoSyn constitutes a step toward clinically meaningful synthetic data generation that can underpin more reliable, interpretable, and trustworthy medical AI systems.