Title: A-QCF-Net: An Adaptive Quaternion Cross-Fusion Network for Multimodal Liver Tumor Segmentation from Unpaired Datasets
ArXiv ID: 2512.21760
Date: 2025-12-25
Authors: Arunkumar V. (University College of Engineering, Bharathidasan Institute of Technology Campus, Anna University, Tiruchirappalli, Tamil Nadu, India); V. M. Firos (National Institute of Technology, Tiruchirappalli, Tamil Nadu, India); S. Senthilkumar (National Institute of Technology, Tiruchirappalli, Tamil Nadu, India); G. R. Gangadharan (National Institute of Technology, Tiruchirappalli, Tamil Nadu, India)
📝 Abstract
Multimodal medical imaging provides complementary information that is crucial for accurate delineation of pathology, but the development of deep learning models is limited by the scarcity of large datasets in which different modalities are paired and spatially aligned. This paper addresses this fundamental limitation by proposing an Adaptive Quaternion Cross-Fusion Network (A-QCF-Net) that learns a single unified segmentation model from completely separate and unpaired CT and MRI cohorts. The architecture exploits the parameter efficiency and expressive power of Quaternion Neural Networks to construct a shared feature space. At its core is the Adaptive Quaternion Cross-Fusion (A-QCF) block, a data-driven attention module that enables bidirectional knowledge transfer between the two streams. By learning to modulate the flow of information dynamically, the A-QCF block allows the network to exchange abstract modality-specific expertise, such as the sharp anatomical boundary information available in CT and the subtle soft tissue contrast provided by MRI. This mutual exchange regularizes and enriches the feature representations of both streams. We validate the framework by jointly training a single model on the unpaired LiTS (CT) and ATLAS (MRI) datasets. The jointly trained model achieves Tumor Dice scores of 76.7% on CT and 78.3% on MRI, significantly exceeding the strong unimodal nnU-Net baseline by margins of 5.4% and 4.7%, respectively. Furthermore, comprehensive explainability analysis using Grad-CAM and Grad-CAM++ confirms that the model correctly focuses on relevant pathological structures, ensuring the learned representations are clinically meaningful. This provides a robust and clinically viable paradigm for unlocking the large unpaired imaging archives that are common in healthcare.
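The A-QCF block is characterized above only as a data-driven attention module that exchanges information bidirectionally between the CT and MRI streams. The snippet below is a minimal sketch of that idea, assuming a standard multi-head cross-attention with learned gates; the class and parameter names (GatedCrossFusion, gate_ct, gate_mri) are illustrative assumptions, and the paper's actual block operates on quaternion-valued features and may differ substantially.

```python
# Hedged sketch of gated, bidirectional cross-fusion between two encoder streams.
# This approximates the behavior described in the abstract; it is NOT the authors' A-QCF code.
import torch
import torch.nn as nn

class GatedCrossFusion(nn.Module):
    def __init__(self, channels, heads=4):
        super().__init__()
        self.attn_ct_from_mri = nn.MultiheadAttention(channels, heads, batch_first=True)
        self.attn_mri_from_ct = nn.MultiheadAttention(channels, heads, batch_first=True)
        # Per-stream gates, initialized at zero so fusion starts as a small perturbation.
        self.gate_ct = nn.Parameter(torch.zeros(1))
        self.gate_mri = nn.Parameter(torch.zeros(1))

    def forward(self, feat_ct, feat_mri):
        # feat_ct, feat_mri: (B, C, H, W) feature maps, assumed here to share one shape.
        b, c, h, w = feat_ct.shape
        ct = feat_ct.flatten(2).transpose(1, 2)    # (B, HW, C)
        mri = feat_mri.flatten(2).transpose(1, 2)
        ct_ctx, _ = self.attn_ct_from_mri(ct, mri, mri)   # CT queries MRI features
        mri_ctx, _ = self.attn_mri_from_ct(mri, ct, ct)   # MRI queries CT features
        fused_ct = ct + torch.tanh(self.gate_ct) * ct_ctx
        fused_mri = mri + torch.tanh(self.gate_mri) * mri_ctx
        to_map = lambda t: t.transpose(1, 2).reshape(b, c, h, w)
        return to_map(fused_ct), to_map(fused_mri)
```

Starting the gates at zero lets each stream begin from its unimodal behavior and admit cross-modal context only as far as it helps the segmentation objective, which loosely mirrors the "learning to modulate the flow of information dynamically" described in the abstract.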
📄 Full Content
A-QCF-Net: An Adaptive Quaternion Cross-Fusion Network for Multimodal Liver Tumor Segmentation from Unpaired Datasets
Arunkumar V.^a, V. M. Firos^b, S. Senthilkumar^a, G. R. Gangadharan^b
^a University College of Engineering, Bharathidasan Institute of Technology Campus, Anna University, Tiruchirappalli, Tamil Nadu, 620 024, India
^b National Institute of Technology, Tiruchirappalli, Tamil Nadu, 620 015, India
Abstract
Multimodal medical imaging provides complementary information that is crucial for accurate delineation of pathology, but the development of deep learning models is limited by the scarcity of large datasets in which different modalities are paired and spatially aligned. This paper addresses this fundamental limitation by proposing an Adaptive Quaternion Cross-Fusion Network (A-QCF-Net) that learns a single unified segmentation model from completely separate and unpaired CT and MRI cohorts. The architecture exploits the parameter efficiency and expressive power of Quaternion Neural Networks to construct a shared feature space. At its core is the Adaptive Quaternion Cross-Fusion (A-QCF) block, a data-driven attention module that enables bidirectional knowledge transfer between the two streams. By learning to modulate the flow of information dynamically, the A-QCF block allows the network to exchange abstract modality-specific expertise, such as the sharp anatomical boundary information available in CT and the subtle soft tissue contrast provided by MRI. This mutual exchange regularizes and enriches the feature representations of both streams. We validate the framework by jointly training a single model on the unpaired LiTS (CT) and ATLAS (MRI) datasets. The jointly trained model achieves Tumor Dice scores of 76.7% on CT and 78.3% on MRI, significantly exceeding the strong unimodal nnU-Net baseline by margins of 5.4% and 4.7%, respectively. Furthermore, comprehensive explainability analysis using Grad-CAM and Grad-CAM++ confirms that the model correctly focuses on relevant pathological structures, ensuring the learned representations are clinically meaningful. This provides a robust and clinically viable paradigm for unlocking the large unpaired imaging archives that are common in healthcare.
Keywords: Cross-Attention, Deep Learning, Explainable AI, Grad-CAM, Medical Image Segmentation, Multimodal Learning, Quaternion Neural Networks, Unpaired Data
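The abstract and keywords credit Quaternion Neural Networks with the parameter efficiency that makes a shared feature space practical. For context, below is a minimal sketch of a quaternion 2D convolution built from the Hamilton product; the class name, initialization scale, and channel ordering are assumptions for illustration rather than the authors' implementation.

```python
# Minimal quaternion 2D convolution sketch (Hamilton product), illustrating the
# parameter sharing attributed to Quaternion Neural Networks; not the authors' code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class QuaternionConv2d(nn.Module):
    def __init__(self, in_channels, out_channels, kernel_size, padding=0):
        super().__init__()
        assert in_channels % 4 == 0 and out_channels % 4 == 0
        ic, oc = in_channels // 4, out_channels // 4
        # One real-valued kernel per quaternion component: r, i, j, k.
        self.r = nn.Parameter(torch.randn(oc, ic, kernel_size, kernel_size) * 0.02)
        self.i = nn.Parameter(torch.randn(oc, ic, kernel_size, kernel_size) * 0.02)
        self.j = nn.Parameter(torch.randn(oc, ic, kernel_size, kernel_size) * 0.02)
        self.k = nn.Parameter(torch.randn(oc, ic, kernel_size, kernel_size) * 0.02)
        self.padding = padding

    def forward(self, x):
        # Input channels are assumed ordered as [r | i | j | k] blocks.
        # Assemble the block-structured weight implied by the Hamilton product w ⊗ q,
        # so a single conv2d call realizes the quaternion convolution.
        r, i, j, k = self.r, self.i, self.j, self.k
        weight = torch.cat([
            torch.cat([r, -i, -j, -k], dim=1),
            torch.cat([i,  r, -k,  j], dim=1),
            torch.cat([j,  k,  r, -i], dim=1),
            torch.cat([k, -j,  i,  r], dim=1),
        ], dim=0)
        return F.conv2d(x, weight, padding=self.padding)
```

Because the four component kernels are reused across the block-structured weight, the layer has roughly one quarter of the parameters of a real-valued convolution with the same input and output width.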
1. Introduction
Multimodal medical imaging, exemplified by Computed Tomography (CT) and Magnetic Resonance Imaging (MRI), provides complementary views that are clinically important for the diagnosis and delineation of complex pathologies such as liver tumors. In the setting of hepatic malignancy, no single imaging modality captures all clinically relevant information. Computed Tomography, with its submillimeter resolution, offers excellent anatomical detail and is often preferred for visualizing sharp organ boundaries and vascular structures (Heimann et al., 2009). However, its ability to distinguish subtle variations in soft tissue, for example, differentiating a necrotic tumor core from a viable margin or identifying small isodense lesions, is limited. Magnetic Resonance Imaging, in contrast, excels at soft tissue contrast and can reveal tumor textures and margins that are often inconspicuous on CT (Gross et al., 2024). Different MRI sequences emphasize distinct biophysical properties, which supports more nuanced lesion characterization. MRI is therefore superior for detecting small satellite lesions and for assessing tumor infiltration into the surrounding parenchyma. These properties suggest that a model capable of integrating the structural clarity of CT with the textural sensitivity of MRI could reach a level of precision and robustness that is not achievable with either modality alone. This clinical motivation creates a clear need for advanced multimodal AI systems.
Most existing attempts to exploit unpaired cohorts follow an indirect strategy. They first synthesize a missing modality, for example, CT to MRI or MRI to CT, and then train a segmentation network on the generated images. Such a two-stage pipeline ties the final segmentation accuracy to the fidelity of the synthesis model and introduces the risk that synthesis artifacts will propagate into the predicted masks. In this work, we follow a more direct strategy: we learn segmentation from unpaired cohorts without generating synthetic images, by encouraging the network to share modality-invariant semantics at the feature level during joint training. This removes the surrogate synthesis objective and aligns optimization with the end segmentation task.
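As a concrete illustration of this direct strategy, the sketch below interleaves unpaired CT and MRI batches and optimizes only the segmentation loss; the names model, dice_ce_loss, and the modality keyword are assumptions, and the authors' actual sampling schedule and loss configuration are not specified in this excerpt.

```python
def train_one_epoch(model, ct_loader, mri_loader, optimizer, dice_ce_loss, device):
    """One epoch of synthesis-free joint training on two unpaired cohorts (sketch)."""
    model.train()
    # zip truncates to the shorter cohort; a real schedule would cycle or resample
    # the smaller dataset so both modalities are seen in comparable proportion.
    for (ct_img, ct_mask), (mr_img, mr_mask) in zip(ct_loader, mri_loader):
        for img, mask, modality in ((ct_img, ct_mask, "ct"), (mr_img, mr_mask, "mri")):
            img, mask = img.to(device), mask.to(device)
            # Hypothetical modality-aware forward pass through the shared backbone.
            logits = model(img, modality=modality)
            # Only the end segmentation objective is optimized: no synthesis surrogate.
            loss = dice_ce_loss(logits, mask)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
```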
Despite the clear clinical motivation for multimodal learning, the dominant deep learning paradigm for multimodal fusion is severely constrained by data availability. Most current methods assume access to large collections of paired and spatially aligned datasets, in which individual patients