Transparent Toxicity Prediction via Multi-Task Learning: Interpreting Molecular Fragments with Sparse Attention

📝 Abstract

Reliable in silico molecular toxicity prediction is a cornerstone of modern drug discovery, offering a scalable alternative to experimental screening. However, the black-box nature of state-of-the-art models remains a significant barrier to adoption, as high-stakes safety decisions demand verifiable structural insights alongside predictive performance. To address this, we propose a novel multi-task learning (MTL) framework designed to jointly enhance accuracy and interpretability. Our architecture integrates a shared chemical language model with task-specific attention modules. By imposing an L1 sparsity penalty on these modules, the framework is constrained to focus on a minimal set of salient molecular fragments for each distinct toxicity endpoint. The resulting framework is trained end-to-end and is readily adaptable to various transformer-based backbones. Evaluated on the ClinTox, SIDER, and Tox21 benchmark datasets, our approach consistently outperforms both single-task and standard MTL baselines. Crucially, the sparse attention weights provide chemically intuitive visualizations that reveal the specific fragments influencing predictions, thereby enhancing insight into the model’s decision-making process.
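The abstract's core idea, a shared encoder feeding task-specific modules that are pushed toward sparsity by an L1 penalty, can be illustrated with a small sketch. This is not the authors' implementation: the sigmoid gating, the token-level masks, the linear prediction heads, and every name below (`task_mask`, `masked_pool`, `mtl_loss`, `lam`) are assumptions made purely for illustration, and the token embeddings stand in for the output of a chemical language model.

```python
import math


def sigmoid(x):
    """Logistic function mapping a real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def task_mask(scores):
    """Turn one task's unconstrained per-token scores into weights in (0, 1).

    Assumption: a sigmoid gate per token; the paper's modules may differ.
    """
    return [sigmoid(s) for s in scores]


def masked_pool(token_embeddings, mask):
    """Weighted sum of shared token embeddings under one task's mask."""
    dim = len(token_embeddings[0])
    pooled = [0.0] * dim
    for emb, w in zip(token_embeddings, mask):
        for j in range(dim):
            pooled[j] += w * emb[j]
    return pooled


def bce(prob, label):
    """Binary cross-entropy for a single prediction."""
    eps = 1e-12
    return -(label * math.log(prob + eps)
             + (1 - label) * math.log(1 - prob + eps))


def mtl_loss(token_embeddings, task_scores, head_weights, labels, lam):
    """Sum over tasks of the prediction loss plus lam * L1 norm of the mask.

    token_embeddings: shared encoder output, one vector per token.
    task_scores:      one list of per-token scores for each task.
    head_weights:     one linear head (weight vector) for each task.
    labels:           one binary label for each task.
    lam:              hypothetical sparsity strength.
    """
    total = 0.0
    for scores, w, y in zip(task_scores, head_weights, labels):
        mask = task_mask(scores)
        pooled = masked_pool(token_embeddings, mask)
        logit = sum(wi * pi for wi, pi in zip(w, pooled))
        total += bce(sigmoid(logit), y)
        # L1 penalty: mask weights are non-negative, so this is their sum.
        total += lam * sum(abs(m) for m in mask)
    return total
```

Because each task owns its own scores, the penalty can drive different tokens toward zero for different endpoints while the embeddings stay shared. Raising `lam` trades predictive fit for fewer active tokens per task.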

📄 Content

Task-Specific Sparse Feature Masks for Molecular Toxicity Prediction with Chemical Language Models

Kwun Sy Lee, School of Science and Technology, Hong Kong Metropolitan University, Ho Man Tin, Hong Kong, kwslee@hkmu.edu.hk
Jiawei Chen, School of Science and Technology, Hong Kong Metropolitan University, Ho Man Tin, Hong Kong, jwchen@hkmu.edu.hk
Fuk Sheng Ford Chung, School of Science and Technology, Hong Kong Metropolitan University, Ho Man Tin, Hong Kong, ffschung@hkmu.edu.hk
Tianyu Zhao, Faculty of Engineering, Hong Kong Polytechnic University, Hung Hom, Hong Kong, ztyshawnnn@gmail.com
Zhenyuan Chen, School of Science and Technology, Hong Kong Metropolitan University, Ho Man Tin, Hong Kong, chenzh@hkmu.edu.hk
Debby D. Wang∗, School of Science and Technology, Hong Kong Metropolitan University, Ho Man Tin, Hong Kong, dwang@hkmu.edu.hk
∗Corresponding author

Abstract—Reliable in silico molecular toxicity prediction is a cornerstone of modern drug discovery, offering a scalable alternative to experimental screening. However, the black-box nature of state-of-the-art models remains a significant barrier to adoption, as high-stakes safety decisions demand verifiable structural insights alongside predictive performance. To address this, we propose a novel multi-task learning (MTL) framework designed to jointly enhance accuracy and interpretability. Our architecture integrates a shared chemical language model with task-specific attention modules. By imposing an L1 sparsity penalty on these modules, the framework is constrained to focus on a minimal set of salient molecular fragments for each distinct toxicity endpoint. The resulting framework is trained end-to-end and is readily adaptable to various transformer-based backbones. Evaluated on the ClinTox, SIDER, and Tox21 benchmark datasets, our approach consistently outperforms both single-task and standard MTL baselines.
Crucially, the sparse attention weights provide chemically intuitive visualizations that reveal the specific fragments influencing predictions, thereby enhancing insight into the model’s decision-making process.

Index Terms—Chemical language models, computational toxicology, explainable AI, molecular property prediction, multi-task learning, sparsity regularization

This work is supported by Hong Kong Metropolitan University (Project RD/2025/1.15) and Hong Kong Research Grants Council (Project UGC/FDS16/E16/23).

I. INTRODUCTION

The assessment of molecular toxicity is a crucial yet challenging step in early-stage drug discovery and chemical safety screening. Traditional experimental assays are often resource-intensive and time-consuming, creating a significant bottleneck. Consequently, developing accurate and efficient in silico methods has become a key priority in computational toxicology to accelerate the identification of potentially hazardous compounds.

In recent years, the field has increasingly adopted sequence-based deep learning approaches operating on SMILES representations of molecules. Chemical language models, such as ChemBERTa [1], MoLFormer [2], and SMILES-BERT [3], pre-trained on vast unlabeled chemical corpora, have shown remarkable success in learning rich, contextual representations for downstream prediction tasks. However, their performance is often constrained by the scarcity of labeled toxicological data. To address this, MTL [4] has become a widely adopted paradigm [5], [6], enabling models to generalize better by jointly learning from several related endpoints.

Despite their advantages, prevailing transformer-based MTL architectures [7], [8], which typically use a hard parameter sharing (HPS) [9] approach, face two critical limitations.
First, HPS lacks an explicit mechanism for task-specific feature selection; the final latent representation from the shared backbone is passed wholesale to each prediction head, assuming all features are equally relevant to every downstream task. This is often suboptimal, as distinct toxicity mechanisms depend on different molecular properties [10], potentially leading to negative transfer where conflicting gradients destabilize the shared representation [11]. Second, the opacity of these models restricts their utility in safety-critical decision making. While standard architectures may achieve high predictive accuracy, they obscure the decision-making process, failing to provide transparent feature attribution relative to the underlying chemical structure. In toxicology, where adverse effects are frequently driven by precise molecular substructures (toxicophores), this interpretability gap remains a significant barrier

© 2026 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.

arXiv:25

This content is AI-processed based on ArXiv data.
