Title: Harnessing Large Language Models for Biomedical Named Entity Recognition
ArXiv ID: 2512.22738
Date: 2025-12-28
Authors: Jian Chen, Leilei Su, Cong Sun*
📝 Abstract
Background and Objective: Biomedical Named Entity Recognition (BioNER) is a foundational task in medical informatics, crucial for downstream applications like drug discovery and clinical trial matching. However, adapting general-domain Large Language Models (LLMs) to this task is often hampered by their lack of domain-specific knowledge and the performance degradation caused by low-quality training data. To address these challenges, we introduce BioSelectTune, a highly efficient, data-centric framework for fine-tuning LLMs that prioritizes data quality over quantity. Methods and Results: BioSelectTune reformulates BioNER as a structured JSON generation task and leverages our novel Hybrid Superfiltering strategy, a weak-to-strong data curation method that uses a homologous weak model to distill a compact, high-impact training dataset. Conclusions: Through extensive experiments, we demonstrate that BioSelectTune achieves state-of-the-art (SOTA) performance across multiple BioNER benchmarks. Notably, our model, trained on only 50% of the curated positive data, not only surpasses the fully-trained baseline but also outperforms powerful domain-specialized models like BioMedBERT.
📄 Full Content
Harnessing Large Language Models for Biomedical Named Entity Recognition
Jian Chen^a, Leilei Su^b and Cong Sun^c,*
^a Department of Data Science and Big Data Technology, Hainan University, Haikou 570228, China
^b Department of Mathematics, Hainan University, Haikou 570228, China
^c Department of Population Health Sciences, Weill Cornell Medicine, New York 10022, USA
* Corresponding author. Email: csun.nlp@gmail.com (C. Sun)
ARTICLE INFO
Keywords:
Instruction Tuning
Data Filtering
Large Language Models
Biomedical Named Entity Recognition
ABSTRACT
Background and Objective: Biomedical Named Entity Recognition (BioNER) is a foundational task in medical informatics, crucial for downstream applications like drug discovery and clinical trial matching. However, adapting general-domain Large Language Models (LLMs) to this task is often hampered by their lack of domain-specific knowledge and the performance degradation caused by low-quality training data. To address these challenges, we introduce BioSelectTune, a highly efficient, data-centric framework for fine-tuning LLMs that prioritizes data quality over quantity.

Methods and Results: BioSelectTune reformulates BioNER as a structured JSON generation task and leverages our novel Hybrid Superfiltering strategy, a weak-to-strong data curation method that uses a homologous weak model to distill a compact, high-impact training dataset.

Conclusions: Through extensive experiments, we demonstrate that BioSelectTune achieves state-of-the-art (SOTA) performance across multiple BioNER benchmarks. Notably, our model, trained on only 50% of the curated positive data, not only surpasses the fully-trained baseline but also outperforms powerful domain-specialized models like BioMedBERT.
1. INTRODUCTION
Large Language Models (LLMs), such as GPT-4 [1], have sparked a paradigm shift in Natural Language Processing (NLP), demonstrating exceptional performance across a wide spectrum of tasks. Pre-trained on vast text corpora, LLMs possess powerful generalization capabilities, enabling them to tackle complex problems through zero-shot and few-shot prompting [2]. This has accelerated their adoption in diverse fields, including education, law, and healthcare.

In the biomedical domain, specialized LLMs like Med-PaLM2 [3], PMC-Llama [4], and Chat-Doctor [5] have shown promise in conversational and question-answering tasks. However, a significant performance gap remains when applying these models to fundamental information extraction tasks, particularly Biomedical Named Entity Recognition (BioNER). General-domain LLMs often lack the deep, domain-specific knowledge required to interpret complex biomedical texts accurately. Furthermore, studies have shown that generative LLMs tend to yield low precision and recall on NER tasks, failing to meet the high-accuracy demands of biomedical research [6, 7]. As BioNER is a cornerstone for downstream applications such as drug discovery, gene function analysis, and clinical trial matching, bridging this performance gap is of critical importance.
To address these challenges, we formulate BioNER as an instruction-driven, structured data generation task. As illustrated in Figure 1, the model is provided with a piece of biomedical text and a specific instruction, and is trained to generate a standardized, machine-readable JSON list of the identified entities. This approach not only unifies the extraction paradigm across different entity types but also capitalizes on the powerful instruction-following and text generation capabilities of modern LLMs.

Building on this formulation, we propose a novel framework to efficiently adapt general-domain LLMs for high-performance BioNER. Rather than relying on costly domain-specific pre-training, we focus on unlocking the potential of existing models through instruction tuning. To this end, we select the Qwen3 family of models as our foundation [8] and curate and unify four benchmark BioNER datasets [9] into an instruction-following format. The core of our framework is a novel data curation strategy we term "Hybrid Superfiltering" [10], which leverages a computationally inexpensive "weak" model to intelligently identify and select the most informative and difficult training samples for fine-tuning a more powerful "strong" model. Our main contributions are as follows:

• We introduce Hybrid Superfiltering, a weak-to-strong data filtering strategy tailored for BioNER instruction tuning. By separating positive and negative samples and using a homologous weak model to score Instruction-Following Difficulty (IFD), this method curates a high-quality training subset that significantly boosts learning efficiency and model performance (an illustrative IFD scoring sketch is given at the end of this section).
• We reformulate BioNER as an end-to-end text-to-structured-data generation task. By fine-tuning the LLM to directly output ent

Figure 1: Templates for instruction-following data and test data.
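Since Figure 1 itself is not reproduced here, the sketch below gives a rough, hypothetical illustration of what an instruction-following BioNER record and the corresponding test prompt might look like. The field names (`instruction`, `input`, `output`) and the example sentence are our own assumptions, not the paper's exact templates.

```python
# Hypothetical sketch of an instruction-following BioNER record; the actual
# templates are those shown in Figure 1 and may use different fields/wording.
import json

train_record = {
    "instruction": (
        "Extract all disease entities from the biomedical text below and "
        "return them as a JSON list of strings."
    ),
    "input": "Mutations in BRCA1 are associated with hereditary breast cancer.",
    # The training target is a standardized, machine-readable JSON list of entities.
    "output": json.dumps(["hereditary breast cancer"]),
}

# At test time, only the instruction and the text are provided; the fine-tuned
# model is expected to generate the JSON list itself.
test_prompt = f"{train_record['instruction']}\n\nText: {train_record['input']}"
print(test_prompt)
```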
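Likewise, the IFD scoring referenced in the first contribution can be sketched as follows. This is a minimal illustration that assumes the common definition of IFD (the weak model's loss on the response conditioned on the instruction, divided by its loss on the response alone) and uses Hugging Face transformers with `gpt2` as a stand-in weak model; BioSelectTune's actual homologous weak model, prompt templates, positive/negative handling, and selection thresholds may differ.

```python
# Minimal sketch of weak-model IFD scoring for instruction-tuning data selection.
# Assumptions: Hugging Face transformers, gpt2 as a stand-in "weak" model, and
# IFD = (conditioned response loss) / (unconditioned response loss).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

def answer_loss(prompt: str, answer: str) -> float:
    """Mean cross-entropy over the answer tokens, optionally conditioned on a prompt."""
    answer_ids = tokenizer(answer, return_tensors="pt").input_ids
    if prompt:
        prompt_ids = tokenizer(prompt, return_tensors="pt").input_ids
        input_ids = torch.cat([prompt_ids, answer_ids], dim=1)
        labels = input_ids.clone()
        labels[:, : prompt_ids.shape[1]] = -100  # exclude prompt tokens from the loss
    else:
        input_ids, labels = answer_ids, answer_ids.clone()
    with torch.no_grad():
        return model(input_ids, labels=labels).loss.item()

def ifd_score(instruction: str, response: str) -> float:
    """Higher IFD means the instruction helps less in predicting the response,
    i.e. the sample is harder and potentially more informative for tuning."""
    return answer_loss(instruction, response) / answer_loss("", response)

# Rank candidate samples by IFD and keep the hardest fraction (e.g. top 50%).
samples = [
    {"instruction": "Extract all disease entities ...", "response": '["breast cancer"]'},
    {"instruction": "Extract all gene entities ...", "response": '["BRCA1"]'},
]
ranked = sorted(samples, key=lambda s: ifd_score(s["instruction"], s["response"]), reverse=True)
selected = ranked[: max(1, len(ranked) // 2)]
```

In this sketch the weak-model ranking alone decides which samples are kept; how positive and negative samples are split and filtered in Hybrid Superfiltering follows the paper's own strategy rather than this simplified cutoff.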