Enhancing Lung Cancer Treatment Outcome Prediction through Semantic Feature Engineering Using Large Language Models

Reading time: 5 minutes

📝 Abstract

Accurate prediction of treatment outcomes in lung cancer remains challenging due to the sparsity, heterogeneity, and contextual overload of real-world electronic health data. Traditional models often fail to capture semantic information across multimodal streams, while large-scale fine-tuning approaches are impractical in clinical workflows. We introduce a framework that uses Large Language Models (LLMs) as Goal-oriented Knowledge Curators (GKC) to convert laboratory, genomic, and medication data into high-fidelity, task-aligned features. Unlike generic embeddings, GKC produces representations tailored to the prediction objective and operates as an offline preprocessing step that integrates naturally into hospital informatics pipelines. Using a lung cancer cohort (N=184), we benchmarked GKC against expert-engineered features, direct text embeddings, and an end-to-end transformer. Our approach achieved a mean AUROC of 0.803 (95% CI: 0.799-0.807) and outperformed all baselines. An ablation study further confirmed the complementary value of combining all three modalities. These results show that the quality of semantic representation is a key determinant of predictive accuracy in sparse clinical data settings. By reframing LLMs as knowledge curation engines rather than black-box predictors, this work demonstrates a scalable, interpretable, and workflow-compatible pathway for advancing AI-driven decision support in oncology.

📄 Content

Enhancing Lung Cancer Treatment Outcome Prediction through Semantic Feature Engineering Using Large Language Models

Mun Hwan Lee, PhD¹, Shaika Chowdhury, PhD¹, Xiaodi Li, PhD¹, Sivaraman Rajaganapathy, PhD¹, Eric W. Klee, PhD¹, Ping Yang, MD, PhD¹, Terence Sio, MD, PhD¹, Liewei Wang, MD, PhD¹, James Cerhan, MD, PhD¹, Nansu N. A. Zong, PhD¹

¹Mayo Clinic, Rochester, Minnesota, USA

Abstract—Accurate prediction of treatment outcomes in lung cancer is critical yet remains limited by the sparsity, heterogeneity, and contextual overload of real-world electronic health data. Traditional models struggle to capture semantic meaning across multimodal streams, while large-scale fine-tuning approaches are infeasible in clinical workflows. We introduce a novel framework that employs Large Language Models (LLMs) as Goal-oriented Knowledge Curators (GKC) to transform raw laboratory, genomic, and medication data into high-fidelity, task-specific features. Unlike generic embeddings, GKC aligns representation with the prediction objective and integrates seamlessly as an offline preprocessing step, making it practical for deployment in hospital informatics pipelines. Using a lung cancer cohort (N=184), we benchmarked GKC against expert-engineered features, direct text embeddings, and an end-to-end transformer. Our approach achieved a superior mean AUC-ROC of 0.803 (95% CI: 0.799–0.807) and significantly outperformed all baselines. An ablation study further confirmed the synergistic value of combining all three modalities. These results demonstrate that the quality of semantic representation drives predictive accuracy in sparse clinical data settings. By reframing LLMs as knowledge curation engines rather than black-box predictors, our work highlights a scalable, interpretable, and workflow-compatible pathway for advancing AI-driven decision support in oncology.

Keywords—Treatment Outcome Prediction, Multi-Modal Data, Large Language Models (LLMs), Semantic Representation, Lung Cancer

I. INTRODUCTION
Lung cancer remains the leading cause of cancer-related mortality worldwide, accounting for over 1.8 million deaths annually [1]. Despite advances in diagnosis and therapy, the treatment outcome for lung cancer patients is often poor, with five-year survival rates languishing below 20% in most populations [1, 2]. Accurate prediction of treatment outcome is essential for informed clinical decision-making, guiding treatment selection, risk stratification, and the allocation of scarce medical resources [3, 4].

For decades, the TNM staging system has served as the foundation for lung cancer treatment outcome prediction, providing critical guidance based on the anatomical extent of disease [5, 6]. While it remains the gold standard for initial risk stratification, substantial clinical evidence reveals that patients within the same TNM stage can exhibit markedly different outcomes [7, 8]. This variability arises from complex biological heterogeneity, which includes molecular driver mutations, the tumor immune microenvironment, and key laboratory-based biomarkers [9]. As a result, there is a growing imperative to develop more accurate, personalized models by integrating multi-modal data sources—including laboratory results, genomic profiles, and medication histories—to capture the multifaceted determinants of patient outcomes.

Numerous studies have explored using traditional machine learning to integrate multi-modal structured data, such as concatenating feature vectors from different sources [10-13]. However, these models treat each feature as an isolated, context-free data point. While they can identify correlations, they fail to capture the underlying biological or clinical meaning that expert clinicians extract by synthesizing information holistically across modalities [14]. This gap highlights a critical need for computational paradigms capable of performing deeper semantic integration and interpretive inference on complex biomedical data.
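The feature-concatenation baseline described above can be sketched as follows. This is a minimal illustration, not the paper's actual setup: the feature names, dimensions, and choice of logistic regression are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical per-modality feature vectors for one patient
# (names and values are illustrative, not the paper's schema).
lab_features = np.array([0.8, 1.2, 0.3])      # e.g. normalized lab values
genomic_features = np.array([1.0, 0.0, 1.0])  # e.g. mutation indicators
medication_features = np.array([0.0, 1.0])    # e.g. drug-class flags

# Naive multi-modal integration: concatenate into one flat vector.
# Each entry is treated as an isolated, context-free number.
x = np.concatenate([lab_features, genomic_features, medication_features])
print(x.shape)  # (8,)

# A conventional classifier then learns correlations over this vector
# (toy training data shown purely to make the sketch runnable).
rng = np.random.default_rng(0)
X_train = np.tile(x, (8, 1)) + rng.normal(0.0, 0.1, (8, 8))
y_train = np.array([0, 1, 0, 1, 0, 1, 0, 1])
clf = LogisticRegression().fit(X_train, y_train)
```

The limitation the paragraph identifies is visible here: after concatenation, nothing tells the model that one column is a lab value and another a mutation flag, so any cross-modal clinical meaning must be rediscovered from correlations alone.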

Recent advances in large language models (LLMs) show promise for biomedical applications [15, 16]. Fine-tuned multimodal systems can integrate diverse inputs [17], and advanced prompt engineering techniques [18] can approach specialist performance without task-specific training. However, practical barriers remain. Fine-tuning demands large, high-quality labeled datasets, while sophisticated few-shot prompting depends on many well-crafted exemplars. In clinical prediction, such resources are rare and datasets are typically small, sparse, and heterogeneous. This creates a clear disconnect: the most powerful LLM strategies are often unusable in real-world clinical settings, withholding advanced AI capabilities precisely where they are most urgently needed. In this study, we address this gap with a framework that uses LLMs for modality-specific semantic summarization. Rather than asking a model
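The modality-specific summarization step can be sketched as an offline preprocessing loop. The `summarize` function below is a stand-in for an LLM call, and the prompt wording, record format, and goal string are all illustrative assumptions rather than the paper's implementation.

```python
def summarize(modality: str, record: str, goal: str) -> str:
    """Placeholder for an offline LLM call that curates one modality's
    raw record into a goal-aligned summary (here a canned template)."""
    return f"[{modality}] relevant to '{goal}': {record}"

# Hypothetical multi-modal record for a single patient.
patient = {
    "laboratory": "ALB 2.9 g/dL (low); LDH 310 U/L (high)",
    "genomic": "EGFR exon 19 deletion detected",
    "medication": "osimertinib 80 mg daily",
}
goal = "predict treatment outcome"

# One targeted summary per modality, produced once and offline,
# then stored for downstream feature extraction and modeling.
curated = {m: summarize(m, rec, goal) for m, rec in patient.items()}
for summary in curated.values():
    print(summary)
```

Because this runs as batch preprocessing rather than at inference time, it fits the deployment constraint the abstract describes: no fine-tuning, no online LLM dependency in the prediction path.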

This content is AI-processed based on ArXiv data.
