Simple Agents Outperform Experts in Biomedical Imaging
Workflow Optimization
Xuefei (Julie) Wang1
Kai A. Horstmann2
Ethan Lin2
Jonathan Chen2
Alexander R. Farhang1
Sophia Stiles1
Atharva Sehgal3
Jonathan Light4
David Van Valen1
Yisong Yue1
Jennifer J. Sun2
1Caltech
2Cornell
3UT Austin
4Rensselaer Polytechnic Institute
Abstract
Adapting production-level computer vision tools to bespoke scientific datasets is a critical “last mile” bottleneck. Current solutions are impractical: fine-tuning requires large annotated datasets scientists often lack, while manual code adaptation costs scientists weeks to months of effort. We consider using AI agents to automate this manual coding, and focus on the open question of optimal agent design for this targeted task. We introduce a systematic evaluation framework for agentic code optimization and use it to study three production-level biomedical imaging pipelines. We demonstrate that a simple agent framework consistently generates adaptation code that outperforms human-expert solutions. Our analysis reveals that common, complex agent architectures are not universally beneficial, leading to a practical roadmap for agent design. We open source our framework and validate our approach by deploying agent-generated functions into a production pipeline, demonstrating a clear pathway for real-world impact. The code can be found here: https://github.com/xuefei-wang/simple-agent-opt
1. Introduction
Automated computer vision (CV) tools are rapidly being adopted as production-level solutions in clinical and laboratory settings, fundamentally reshaping scientific discovery in biomedical imaging [23, 31, 40, 46]. Despite this progress, a critical and ubiquitous “last mile” bottleneck, tool adaptation, still persists. When a scientist applies these tools to their own bespoke datasets, models frequently underperform or fail [2, 6, 10, 26, 32, 51] due to inevitable variability in acquisition conditions between labs, such as different microscopes, lighting, resolutions, staining protocols, or unique artifacts [11, 14, 24, 44, 54].
Figure 1. Overview. (Top) Production-level tools [23, 31, 41] accelerate scientific discovery but face a “last mile” adaptation bottleneck. (Middle) Domain experts spend weeks to months manually coding preprocessing and postprocessing steps in order to adapt the tools to their bespoke datasets. AI agents can automate this adaptation, but it remains unclear how to navigate their complex design space to build simple, practical agents. (Bottom) Our work systematically studies the design choices of tool adaptation agents.

Current solutions to this adaptation bottleneck are hindered by the need for either massive amounts of labeled data or prohibitively long manual development cycles. Scientists are typically forced to choose between: (1) fine-tuning complex models, a process that is data-inefficient and requires a large, annotated training set (e.g., thousands of images) that is often unavailable to individual labs [27, 35, 55]; or (2) manually writing custom pre- and post-processing code to bridge the domain gap, which can take a scientist weeks or months, significantly diverting valuable time away from scientific discovery (Figure 1).

arXiv:2512.06006v1 [cs.CV] 2 Dec 2025
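To make the second option concrete, the adaptation code a scientist writes is often a thin wrapper that reshapes a bespoke dataset to match the pretrained tool's expectations. The sketch below is illustrative only, not taken from the paper (the function name and defaults are our own): it percentile-normalizes intensities and pads or crops to a fixed input size, two common fixes for microscope and resolution variability.

```python
import numpy as np

def adapt_image(img, target_shape=(512, 512), p_low=1.0, p_high=99.0):
    """Hypothetical preprocessing adapter: percentile-normalize intensities
    to [0, 1] and pad/crop to the shape the pretrained tool expects."""
    lo, hi = np.percentile(img, [p_low, p_high])
    img = np.clip((img - lo) / max(hi - lo, 1e-8), 0.0, 1.0)
    # Zero-pad or crop each axis to the target shape.
    out = np.zeros(target_shape, dtype=np.float32)
    h = min(img.shape[0], target_shape[0])
    w = min(img.shape[1], target_shape[1])
    out[:h, :w] = img[:h, :w]
    return out

# A bespoke 16-bit microscope image with a size and intensity range
# different from what the production tool was trained on.
raw = (np.random.default_rng(0).random((300, 700)) * 65535).astype(np.uint16)
prepped = adapt_image(raw)
```

Writing, tuning, and validating a handful of such functions (and their postprocessing counterparts) across a whole pipeline is what consumes the weeks to months described above.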
Recent work in agentic AI presents a new way to tackle this adaptation bottleneck. In principle, an AI agent could use the small “gold-standard” validation set (typically 10-100 images) that scientists do have as an objective function to automatically generate the necessary adaptation code. However, most existing AI agents may not be directly applicable to this specific, high-demand application due to their sophisticated architectures or specialized target tasks. “AI Agents for Science” are often large, complex systems (featuring hierarchical planning and large tool spaces) designed for high-level, open-ended discovery [12, 17, 29] or specialized scientific tasks [25, 36, 37, 47], rather than the targeted adaptation of existing tools. Concurrently, while MLE agents [15, 28, 52] are progressing quickly, they typically focus on building new solutions from scratch, not on integrating with and tuning existing, production-level scientific pipelines. Thus, for the narrower problem of tool adaptation, it remains unclear whether such complex designs are necessary or which specific design components drive performance.
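One way to operationalize “validation set as objective function” is to score each candidate adaptation function by the downstream pipeline's accuracy on the gold-standard images. The sketch below is a toy illustration under our own assumptions (the thresholding “pipeline”, the gamma-correction candidate, and all names are hypothetical), using mean mask IoU as the objective:

```python
import numpy as np

def iou(pred, gold):
    """Intersection-over-union between two binary masks."""
    inter = np.logical_and(pred, gold).sum()
    union = np.logical_or(pred, gold).sum()
    return inter / union if union else 1.0

def score_candidate(adapt_fn, pipeline_fn, val_images, val_masks):
    """Mean IoU of the fixed pipeline run on adapted images,
    measured against the small gold-standard validation set."""
    scores = [iou(pipeline_fn(adapt_fn(x)), m)
              for x, m in zip(val_images, val_masks)]
    return float(np.mean(scores))

# Toy stand-ins: the fixed "pipeline" segments by thresholding at 0.5,
# while the bespoke data's gold masks correspond to a 0.4 threshold.
rng = np.random.default_rng(1)
val_images = [rng.random((32, 32)) for _ in range(5)]
val_masks = [x > 0.4 for x in val_images]
pipeline_fn = lambda x: x > 0.5

identity = lambda x: x            # no adaptation
gamma_fix = lambda x: x ** 0.76   # candidate adaptation brightening the image

base = score_candidate(identity, pipeline_fn, val_images, val_masks)
fixed = score_candidate(gamma_fix, pipeline_fn, val_images, val_masks)
```

An agent can then treat `score_candidate` as a black-box objective: propose adaptation code, score it on the held-out gold images, and keep the best-scoring candidate.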
Therefore, we set out to answer: what is the simplest, most practical agent framework that can reliably adapt a fixed, pretrained production tool to a new, bespoke dataset? To this end, we introduce a systematic evaluation framework for benchmarking agentic code optimization for tool adaptation. We apply this framework to three production-level biomedical imaging pipelines: Polaris [23], Cellpose [41], and MedSAM [31], chosen for their collective coverage of the full spectrum of biological length scales.
Our investigation yields a promising and