Enhancing Lung Cancer Treatment Outcome Prediction through Semantic Feature Engineering Using Large Language Models
📝 Abstract
Accurate prediction of treatment outcomes in lung cancer remains challenging due to the sparsity, heterogeneity, and contextual overload of real-world electronic health data. Traditional models often fail to capture semantic information across multimodal streams, while large-scale fine-tuning approaches are impractical in clinical workflows. We introduce a framework that uses Large Language Models (LLMs) as Goal-oriented Knowledge Curators (GKC) to convert laboratory, genomic, and medication data into high-fidelity, task-aligned features. Unlike generic embeddings, GKC produces representations tailored to the prediction objective and operates as an offline preprocessing step that integrates naturally into hospital informatics pipelines. Using a lung cancer cohort (N=184), we benchmarked GKC against expert-engineered features, direct text embeddings, and an end-to-end transformer. Our approach achieved a mean AUROC of 0.803 (95% CI: 0.799-0.807) and outperformed all baselines. An ablation study further confirmed the complementary value of combining all three modalities. These results show that the quality of semantic representation is a key determinant of predictive accuracy in sparse clinical data settings. By reframing LLMs as knowledge curation engines rather than black-box predictors, this work demonstrates a scalable, interpretable, and workflow-compatible pathway for advancing AI-driven decision support in oncology.
📄 Content
Enhancing Lung Cancer Treatment Outcome Prediction through Semantic Feature Engineering Using Large Language Models
Mun Hwan Lee, PhD¹, Shaika Chowdhury, PhD¹, Xiaodi Li, PhD¹, Sivaraman Rajaganapathy, PhD¹, Eric W. Klee, PhD¹, Ping Yang, MD, PhD¹, Terence Sio, MD, PhD¹, Liewei Wang, MD, PhD¹, James Cerhan, MD, PhD¹, Nansu N. A. Zong, PhD¹
¹Mayo Clinic, Rochester, Minnesota, USA
Abstract—Accurate prediction of treatment outcomes in lung cancer is critical yet remains limited by the sparsity, heterogeneity, and contextual overload of real-world electronic health data. Traditional models struggle to capture semantic meaning across multimodal streams, while large-scale fine-tuning approaches are infeasible in clinical workflows. We introduce a novel framework that employs Large Language Models (LLMs) as Goal-oriented Knowledge Curators (GKC) to transform raw laboratory, genomic, and medication data into high-fidelity, task-specific features. Unlike generic embeddings, GKC aligns representation with the prediction objective and integrates seamlessly as an offline preprocessing step, making it practical for deployment in hospital informatics pipelines. Using a lung cancer cohort (N=184), we benchmarked GKC against expert-engineered features, direct text embeddings, and an end-to-end transformer. Our approach achieved a superior mean AUC-ROC of 0.803 (95% CI: 0.799–0.807) and significantly outperformed all baselines. An ablation study further confirmed the synergistic value of combining all three modalities. These results demonstrate that the quality of semantic representation drives predictive accuracy in sparse clinical data settings. By reframing LLMs as knowledge curation engines rather than black-box predictors, our work highlights a scalable, interpretable, and workflow-compatible pathway for advancing AI-driven decision support in oncology.

Keywords—Treatment Outcome Prediction, Multi-Modal Data, Large Language Models (LLMs), Semantic Representation, Lung Cancer
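The GKC idea described above can be sketched in code. The following is a minimal, illustrative implementation (not the authors' released code): each modality's raw records are wrapped in a prompt that states the prediction objective, and an LLM's summary is stored offline as a task-aligned feature text. The `llm` callable, the prompt wording, and the modality names are assumptions; any completion function (an API client or a local model) could be plugged in.

```python
# Sketch of goal-oriented knowledge curation (GKC) as an offline
# preprocessing step. Prompt templates and modality names are illustrative.
from typing import Callable, Dict

OBJECTIVE = "predicting treatment outcome in lung cancer"

PROMPTS = {
    "labs": "Summarize the prognostic significance of these laboratory results for {obj}:\n{data}",
    "genomics": "Summarize these genomic findings as they relate to {obj}:\n{data}",
    "medications": "Summarize this medication history as it relates to {obj}:\n{data}",
}

def curate_patient(raw: Dict[str, str], llm: Callable[[str], str]) -> Dict[str, str]:
    """One offline curation pass: one task-aligned summary per modality."""
    curated = {}
    for modality, template in PROMPTS.items():
        if modality not in raw:  # sparse data: skip missing modalities
            continue
        prompt = template.format(obj=OBJECTIVE, data=raw[modality])
        curated[modality] = llm(prompt)
    return curated

if __name__ == "__main__":
    # Stub LLM so the sketch runs without an API; a real deployment would
    # substitute an actual model call here.
    stub_llm = lambda prompt: "summary: " + prompt.splitlines()[-1]
    out = curate_patient({"labs": "ALC 0.4 x10^9/L (low)"}, stub_llm)
    print(out["labs"])
```

Because curation runs once, offline, its output can be cached per patient and consumed by any downstream classifier, which is what makes the step compatible with hospital informatics pipelines.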
I. INTRODUCTION
Lung cancer remains the leading cause of cancer-related mortality worldwide, accounting for over 1.8 million deaths annually [1]. Despite advances in diagnosis and therapy, the treatment outcome for lung cancer patients is often poor, with five-year survival rates remaining below 20% in most populations [1, 2]. Accurate prediction of treatment outcome is essential for informed clinical decision-making, guiding treatment selection, risk stratification, and the allocation of scarce medical resources [3, 4].
For decades, the TNM staging system has served as the foundation for lung cancer treatment outcome prediction, providing critical guidance based on the anatomical extent of disease [5, 6]. While it remains the gold standard for initial risk stratification, substantial clinical evidence reveals that patients within the same TNM stage can exhibit markedly different outcomes [7, 8]. This variability arises from complex biological heterogeneity, which includes molecular driver mutations, the tumor immune microenvironment, and key laboratory-based biomarkers [9]. As a result, there is a growing imperative to develop more accurate, personalized models by integrating multi-modal data sources—including laboratory results, genomic profiles, and medication histories—to capture the multifaceted determinants of patient outcomes.
Numerous studies have explored using traditional machine learning to integrate multi-modal structured data, such as concatenating feature vectors from different sources [10-13]. However, these models treat each feature as an isolated, context-free data point. While they can identify correlations, they fail to capture the underlying biological or clinical meaning that expert clinicians extract by synthesizing information holistically across modalities [14]. This gap highlights a critical need for computational paradigms capable of performing deeper semantic integration and interpretive inference on complex biomedical data.
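The concatenation baseline described above can be sketched in a few lines. This is a generic "early fusion" illustration, not the cited studies' exact pipelines; the feature names, dimensions, and values are invented for the example.

```python
# Early fusion baseline: each modality is encoded as a fixed-length numeric
# vector, and the vectors are simply concatenated, so every feature enters
# the model as an isolated, context-free value. All values are illustrative.
import numpy as np

def concat_features(lab_vec, genomic_vec, med_vec):
    """Stack per-modality vectors into one flat feature row."""
    return np.concatenate([lab_vec, genomic_vec, med_vec])

labs = np.array([4.2, 1.1])      # e.g. WBC, ALC (hypothetical lab panel)
genomics = np.array([1.0, 0.0])  # e.g. one-hot mutation indicators
meds = np.array([0.0, 1.0, 1.0]) # e.g. drug-class exposure flags

row = concat_features(labs, genomics, meds)
print(row.shape)  # (7,)
```

The flat row can feed any standard classifier, but nothing in it encodes how an abnormal lab value interacts with a given mutation or drug exposure, which is exactly the semantic gap the paragraph identifies.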
Recent advances in large language models (LLMs) show promise for biomedical applications [15, 16]. Fine-tuned multimodal systems can integrate diverse inputs [17], and advanced prompt engineering techniques [18] can approach specialist performance without task-specific training. However, practical barriers remain. Fine-tuning demands large, high-quality labeled datasets, while sophisticated few-shot prompting depends on many well-crafted exemplars. In clinical prediction, such resources are rare and datasets are typically small, sparse, and heterogeneous. This creates a clear disconnect: the most powerful LLM strategies are often unusable in real-world clinical settings, withholding advanced AI capabilities precisely where they are most urgently needed.

In this study, we address this gap with a framework that uses LLMs for modality-specific semantic summarization. Rather than asking a model
This content is AI-processed based on ArXiv data.