Biosecurity-Aware AI: Agentic Risk Auditing of Soft Prompt Attacks on ESM-Based Variant Predictors

Notice: This research summary and analysis were automatically generated using AI. For authoritative details, please refer to the original arXiv source.

Genomic Foundation Models (GFMs), such as Evolutionary Scale Modeling (ESM), have demonstrated remarkable success in variant effect prediction. However, their security and robustness under adversarial manipulation remain largely unexplored. To address this gap, we introduce the Secure Agentic Genomic Evaluator (SAGE), an agentic framework for auditing the adversarial vulnerabilities of GFMs. SAGE functions through an interpretable and automated risk auditing loop. It injects soft prompt perturbations, monitors model behavior across training checkpoints, computes risk metrics such as AUROC and AUPR, and generates structured reports with large language model-based narrative explanations. This agentic process enables continuous evaluation of embedding-space robustness without modifying the underlying model. Using SAGE, we find that even state-of-the-art GFMs like ESM2 are sensitive to targeted soft prompt attacks, resulting in measurable performance degradation. These findings reveal critical and previously hidden vulnerabilities in genomic foundation models, underscoring the importance of agentic risk auditing in securing biomedical applications such as clinical variant interpretation.


💡 Research Summary

The paper addresses a critical gap in the security evaluation of genomic foundation models (GFMs), specifically the Evolutionary Scale Modeling (ESM) family, which has become the state of the art for variant effect prediction (VEP). While these models excel at zero‑shot and few‑shot generalization across protein and DNA sequence tasks, their robustness to adversarial manipulation has been largely ignored. To fill this void, the authors introduce the Secure Agentic Genomic Evaluator (SAGE), an autonomous, interpretable auditing framework that continuously monitors a model’s behavior under soft‑prompt perturbations without altering the model’s weights.

SAGE operates in a closed loop consisting of five stages: OBSERVE (load sequences, compute model embeddings, define random soft‑prompt probes), INTERVENE (inject learnable soft‑prompt embeddings at scheduled intervals), EVALUATE (compute risk metrics such as AUROC, AUPR, and pseudo‑log‑likelihood ratio (PLLR) changes), REASON (classify risk levels based on thresholds and generate natural‑language explanations using a large language model), and REPORT (compile markdown/HTML reports with metric trends and narrative insights). This pipeline enables reproducible, scalable, and human‑readable risk assessments for any GFM, even when only black‑box access is available.
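The five‑stage loop above can be sketched in plain Python. Everything here is an illustrative assumption rather than the authors' actual API: the `SageAuditor` class name, the 0.7 risk threshold, and the stand‑in `observe`/`intervene` bodies are placeholders, and the rank‑based AUROC helper assumes distinct scores (ties are not averaged).

```python
# Minimal sketch of the OBSERVE -> INTERVENE -> EVALUATE -> REASON -> REPORT
# loop. All names and thresholds are illustrative assumptions, not the
# paper's implementation; model calls are replaced with cheap stand-ins.

def auroc(labels, scores):
    """Rank-based AUROC (Mann-Whitney U); assumes no tied scores."""
    pairs = sorted(zip(scores, labels))
    pos = sum(labels)
    neg = len(labels) - pos
    rank_sum = sum(i + 1 for i, (_, y) in enumerate(pairs) if y == 1)
    return (rank_sum - pos * (pos + 1) / 2) / (pos * neg)

class SageAuditor:
    def __init__(self, risk_threshold=0.7):
        self.risk_threshold = risk_threshold  # assumed threshold

    def observe(self, sequences):
        # Stand-in for embedding sequences with the GFM.
        return [hash(s) % 100 / 100.0 for s in sequences]

    def intervene(self, embeddings, prompt_strength):
        # Stand-in for injecting a soft-prompt perturbation.
        return [min(1.0, e + prompt_strength) for e in embeddings]

    def evaluate(self, labels, scores):
        # The real pipeline also tracks AUPR and PLLR changes.
        return {"AUROC": auroc(labels, scores)}

    def reason(self, metrics):
        level = "HIGH" if metrics["AUROC"] < self.risk_threshold else "LOW"
        return f"Risk {level}: AUROC at {metrics['AUROC']:.2f}"

    def report(self, narrative, metrics):
        # The real system renders markdown/HTML; a dict suffices here.
        return {"narrative": narrative, "metrics": metrics}
```

In practice the EVALUATE stage would be re-run at each training checkpoint, so the REPORT stage can plot metric trends over time rather than a single snapshot.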

The adversarial attack itself is a targeted soft‑prompt optimization. A trainable embedding sequence of ten tokens is prepended to both wild‑type and mutant protein inputs. The attack objective is asymmetric: it minimizes a loss L_benign = −log σ̂(λ) only on benign (label 0) variants, where λ is the absolute difference between the PLL of the wild‑type and mutant sequences, and σ̂ rescales the sigmoid output to the full [0, 1] range.
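The benign‑only objective can be sketched with NumPy. Two caveats: the rescaling σ̂(λ) = 2σ(λ) − 1 (which stretches the sigmoid of a non‑negative λ from [0.5, 1) over [0, 1)) is my assumption about what σ̂ denotes, and the actual paper optimizes the ten trainable prompt tokens by gradient descent, which is not shown here.

```python
import numpy as np

def sigma_hat(lam):
    # Assumed rescaling: for lam >= 0, sigmoid(lam) lies in [0.5, 1),
    # so 2*sigmoid(lam) - 1 maps it onto [0, 1).
    return 2.0 / (1.0 + np.exp(-lam)) - 1.0

def benign_loss(pll_wt, pll_mut, labels, eps=1e-12):
    """Asymmetric attack loss: -log sigma_hat(|PLL_wt - PLL_mut|),
    averaged over benign (label 0) variants only."""
    lam = np.abs(pll_wt - pll_mut)          # per-variant PLL gap
    per_variant = -np.log(sigma_hat(lam) + eps)
    benign = labels == 0                    # mask out pathogenic variants
    return per_variant[benign].mean()
```

Minimizing this loss pushes λ upward on benign variants, making them resemble high‑effect (pathogenic) mutations, which is consistent with the AUROC degradation the audit reports.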

