Title: MMCTOP: A Multimodal Textualization and Mixture-of-Experts Framework for Clinical Trial Outcome Prediction
ArXiv ID: 2512.21897
Date: 2025-12-26
Authors: Carolina Aparício, Qi Shi, Bo Wen, Tesfaye Yadete, Qiwei Han
Abstract
Addressing the challenge of multimodal data fusion in high-dimensional biomedical informatics, we propose MMCTOP, a MultiModal Clinical-Trial Outcome Prediction framework that integrates heterogeneous biomedical signals spanning (i) molecular structure representations, (ii) protocol metadata and long-form eligibility narratives, and (iii) disease ontologies. MMCTOP couples schema-guided textualization and input-fidelity validation with modality-aware representation learning, in which domain-specific encoders generate aligned embeddings that are fused by a transformer backbone augmented with a drug-disease-conditioned sparse Mixture-of-Experts (SMoE). This design explicitly supports specialization across therapeutic and design subspaces while maintaining scalable computation through top-k routing. MMCTOP achieves consistent improvements in precision, F1, and AUC over unimodal and multimodal baselines on benchmark datasets, and ablations show that schema-guided textualization and selective expert routing contribute materially to performance and stability. We additionally apply temperature scaling to obtain calibrated probabilities, ensuring reliable risk estimation for downstream decision support. Overall, MMCTOP advances multimodal trial modeling by combining controlled narrative normalization, context-conditioned expert fusion, and operational safeguards aimed at auditability and reproducibility in biomedical informatics.
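The abstract names two concrete mechanisms: top-k routing in a drug-disease-conditioned sparse Mixture-of-Experts, and post-hoc temperature scaling of the output probabilities. The block below is a minimal, dependency-free Python sketch of both, written under stated assumptions rather than as MMCTOP's implementation. It assumes that the gate scores experts from the fused trial representation concatenated with a drug embedding and a disease embedding (the exact conditioning is not specified here), that experts are toy single-layer maps, that labels are binary (trial success or failure), and that the temperature is fitted by a simple grid search on validation negative log-likelihood (gradient-based fitting is an equally common choice). All function names, dimensions, and data are hypothetical.

```python
import math
import random

# Toy sizes chosen for illustration only (hypothetical, not MMCTOP's settings).
D_MODEL, D_CTX, N_EXPERTS, TOP_K = 4, 2, 4, 2


def matvec(w, x):
    """Dense matrix-vector product for a weight matrix stored as a list of rows."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def make_expert(w):
    """Toy expert: one linear map plus tanh (real experts are usually small FFNs)."""
    return lambda h: [math.tanh(v) for v in matvec(w, h)]


def top_k_gate(h, drug, disease, w_gate, k=TOP_K):
    """Assumed conditioning: score experts from [h; drug; disease], keep the k
    largest logits, and renormalize with a softmax over those k only."""
    gate_input = h + drug + disease  # list concatenation
    logits = matvec(w_gate, gate_input)
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    return dict(zip(top, softmax([logits[i] for i in top])))


def smoe_forward(h, drug, disease, w_gate, experts, k=TOP_K):
    """Gate-weighted sum of the selected experts; unselected experts are never run."""
    gates = top_k_gate(h, drug, disease, w_gate, k)
    out = [0.0] * len(h)
    for i, g in gates.items():
        out = [o + g * y for o, y in zip(out, experts[i](h))]
    return out, gates


def sigmoid(z):
    # Numerically stable logistic function for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def binary_nll(logits, labels, temperature):
    """Mean negative log-likelihood of binary labels under sigmoid(z / T)."""
    eps = 1e-12
    total = 0.0
    for z, y in zip(logits, labels):
        p = sigmoid(z / temperature)
        total -= y * math.log(p + eps) + (1 - y) * math.log(1.0 - p + eps)
    return total / len(logits)


def fit_temperature(logits, labels, grid=None):
    """Pick the T that minimizes validation NLL (grid search over T in (0, 10])."""
    grid = grid or [0.05 * i for i in range(1, 201)]
    return min(grid, key=lambda t: binary_nll(logits, labels, t))


# --- Usage on random toy inputs ---
random.seed(0)
w_gate = [[random.gauss(0, 1) for _ in range(D_MODEL + 2 * D_CTX)] for _ in range(N_EXPERTS)]
experts = [make_expert([[random.gauss(0, 1) for _ in range(D_MODEL)] for _ in range(D_MODEL)])
           for _ in range(N_EXPERTS)]
h = [0.5, -0.2, 0.1, 0.3]               # fused trial representation (toy)
drug, disease = [1.0, 0.0], [0.0, 1.0]  # hypothetical context embeddings
out, gates = smoe_forward(h, drug, disease, w_gate, experts)
print("selected experts and weights:", gates)

# Hypothetical over-confident validation logits for trial success (1) / failure (0).
val_logits = [4.0, 3.5, -4.0, 3.0, -3.5, 4.5]
val_labels = [1, 0, 0, 1, 1, 1]
T = fit_temperature(val_logits, val_labels)
print("fitted temperature:", T, "calibrated p:", [round(sigmoid(z / T), 3) for z in val_logits])
```

Two properties hold regardless of the exact architecture. Only the k selected experts are evaluated for each input, which is what keeps sparse routing inexpensive as the expert pool grows. Dividing logits by a single positive T is a monotone transformation, so temperature scaling changes calibration without changing how trials are ranked, and therefore leaves AUC unchanged.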
Index Terms—Multimodal learning; Clinical trials; Textualization; Sparse Mixture-of-Experts; Biomedical informatics
I. INTRODUCTION
Clinical trials are the cornerstone of biomedical innovation, providing the mechanism through which new drugs and treatments are rigorously vetted for safety and efficacy before reaching patients. Yet the process remains plagued by inefficiencies, escalating costs, and low success rates; each failed study is a double loss, consuming scarce resources and delaying access to effective therapies [1], [2]. The stakes are enormous: the global pharmaceutical industry, valued at $390 billion in 2000, had grown to more than $1.5 trillion by 2024 [3], intensifying pressure to streamline clinical development. Regulators, including the U.S. Food and Drug Administration (FDA), continue to emphasize improving the predictability and efficiency of clinical research to accelerate patient access [4], [5].
This work was funded by Fundação para a Ciência e a Tecnologia (UIDB/00124/2020, UIDP/00124/2020 and Social Sciences DataLab - PINFRA/22209/2016), POR Lisboa and POR Norte (Social Sciences DataLab, PINFRA/22209/2016). Bo Wen and Tesfaye Yadete acknowledge support from the Cleveland Clinic - IBM Discovery Accelerator.
C. Aparício and Q. Shi are with Nova School of Business and Economics, Carcavelos, Portugal (e-mail: 61582@novasbe.pt; qi.shi@novasbe.pt).
B. Wen is with Hogarthian Technologies, New York, NY, USA (e-mail: bwen@hogarthian.com). He was with IBM Research, Yorktown Heights, NY, USA, when this work was performed.
T. Yadete is with the School of Medicine, Oregon Health & Science University, Portland, OR, USA (e-mail: yadete@ohsu.edu). He was with the Cleveland Clinic, Cleveland, OH, USA, when this work was performed.
Q. Han is with Nova School of Business and Economics, Carcavelos, Portugal (corresponding author; e-mail: qiwei.han@novasbe.pt).
Typically, clinical trial development proceeds through a structured, multi-phase pathway. Phase I trials evaluate pharmacokinetics, tolerability, and initial safety in small cohorts; Phase II trials assess preliminary efficacy alongside continued safety monitoring; and Phase III trials conduct large-scale, controlled evaluations against standard-of-care or placebo with heightened statistical rigor [5]. Although early-phase studies may show promising signals, a substantial fraction of programs fail in Phase II or Phase III, where both financial and temporal stakes are highest. Late-stage failures are particularly costly: out-of-pocket expenditures for a single Phase III trial commonly range from $11.5 million to $52.9 million, with additional downstream costs arising from extended timelines and delayed market entry [1], [2]. The ability to anticipate trial outcomes, both within individual phases and across the full development trajectory, is therefore critical for portfolio de-risking, resource reallocation, and evidence-based decision-making in biomedical R&D [6].
In particular, clinical trial planning and evaluation increasingly depend on multimodal biomedical evidence. Structured representations, such as disease ontologies and diagnostic codes, provide standardized descriptions of indications and comorbidities; unstructured protocol narratives and eligibility criteria specify populations, endpoints, and operational constraints