Simple Agents Outperform Experts in Biomedical Imaging
Workflow Optimization
Xuefei (Julie) Wang1
Kai A. Horstmann2
Ethan Lin2
Jonathan Chen2
Alexander R. Farhang1
Sophia Stiles1
Atharva Sehgal3
Jonathan Light4
David Van Valen1
Yisong Yue1
Jennifer J. Sun2
1Caltech
2Cornell
3UT Austin
4Rensselaer Polytechnic Institute
Abstract
Adapting production-level computer vision tools to bespoke scientific datasets is a critical “last mile” bottleneck. Current solutions are impractical: fine-tuning requires large annotated datasets scientists often lack, while manual code adaptation costs scientists weeks to months of effort. We consider using AI agents to automate this manual coding, and focus on the open question of optimal agent design for this targeted task. We introduce a systematic evaluation framework for agentic code optimization and use it to study three production-level biomedical imaging pipelines. We demonstrate that a simple agent framework consistently generates adaptation code that outperforms human-expert solutions. Our analysis reveals that common, complex agent architectures are not universally beneficial, leading to a practical roadmap for agent design. We open source our framework and validate our approach by deploying agent-generated functions into a production pipeline, demonstrating a clear pathway for real-world impact. The code can be found here: https://github.com/xuefei-wang/simple-agent-opt
1. Introduction
Automated computer vision (CV) tools are rapidly being adopted as production-level solutions in clinical and laboratory settings, fundamentally reshaping scientific discovery in biomedical imaging [23, 31, 40, 46]. Despite this progress, a critical and ubiquitous “last mile” bottleneck, tool adaptation, still persists. When a scientist applies these tools to their own bespoke datasets, models frequently underperform or fail [2, 6, 10, 26, 32, 51] due to inevitable variability in acquisition conditions between labs, such as different microscopes, lighting, resolutions, staining protocols, or unique artifacts [11, 14, 24, 44, 54].
Figure 1. Overview. (Top) Production-level tools [23, 31, 41] accelerate scientific discovery but face a “last mile” adaptation bottleneck. (Middle) Domain experts spend weeks to months manually coding preprocessing and postprocessing steps in order to adapt the tools to their bespoke datasets. AI agents can automate this adaptation, but it remains unclear how to navigate their complex design space to build simple, practical agents. (Bottom) Our work systematically studies the design choices of tool adaptation agents.

Current solutions to this adaptation bottleneck are hindered by the need for either massive amounts of labeled data or prohibitively long manual development cycles. Scientists are typically forced to choose between: (1) fine-tuning complex models, a process that is data-inefficient and requires a large, annotated training set (e.g., thousands of images) that is often unavailable to individual labs [27, 35, 55]; or (2) manually writing custom pre- and post-processing code to bridge the domain gap, which can take a scientist weeks or months, significantly diverting valuable time away from scientific discovery (Figure 1).

arXiv:2512.06006v1 [cs.CV] 2 Dec 2025
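To make the second option concrete, the adaptation code a scientist writes is often a thin wrapper that reshapes a bespoke dataset to match the pretrained tool's expectations. The sketch below is illustrative only, not taken from the paper (the function name and defaults are our own): it percentile-normalizes intensities and pads or crops to a fixed input size, two common fixes for microscope and resolution variability.

```python
import numpy as np

def adapt_image(img, target_shape=(512, 512), p_low=1.0, p_high=99.0):
    """Hypothetical preprocessing adapter: percentile-normalize intensities
    to [0, 1] and pad/crop to the shape the pretrained tool expects."""
    lo, hi = np.percentile(img, [p_low, p_high])
    img = np.clip((img - lo) / max(hi - lo, 1e-8), 0.0, 1.0)
    # Zero-pad or crop each axis to the target shape.
    out = np.zeros(target_shape, dtype=np.float32)
    h = min(img.shape[0], target_shape[0])
    w = min(img.shape[1], target_shape[1])
    out[:h, :w] = img[:h, :w]
    return out

# A bespoke 16-bit microscope image with a size and intensity range
# different from what the production tool was trained on.
raw = (np.random.default_rng(0).random((300, 700)) * 65535).astype(np.uint16)
prepped = adapt_image(raw)
```

Writing, tuning, and validating a handful of such functions (and their postprocessing counterparts) across a whole pipeline is what consumes the weeks to months described above.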
Recent work in agentic AI presents a new way to tackle this adaptation bottleneck. In principle, an AI agent could use the small “gold-standard” validation set (typically 10-100 images) that scientists do have as an objective function to automatically generate the necessary adaptation code. However, most existing AI agents may not be directly applicable to this specific, high-demand application due to their sophisticated architectures or specialized target tasks. “AI Agents for Science” are often large, complex systems (featuring hierarchical planning and large tool spaces) designed for high-level, open-ended discovery [12, 17, 29] or specialized scientific tasks [25, 36, 37, 47], rather than the targeted adaptation of existing tools. Concurrently, while MLE agents [15, 28, 52] are progressing quickly, they typically focus on building new solutions from scratch, not on integrating with and tuning existing, production-level scientific pipelines. Thus, for the narrower problem of tool adaptation, it remains unclear whether such complex designs are necessary or which specific design components drive performance.
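One way to operationalize “validation set as objective function” is to score each candidate adaptation function by the downstream pipeline's accuracy on the gold-standard images. The sketch below is a toy illustration under our own assumptions (the thresholding “pipeline”, the gamma-correction candidate, and all names are hypothetical), using mean mask IoU as the objective:

```python
import numpy as np

def iou(pred, gold):
    """Intersection-over-union between two binary masks."""
    inter = np.logical_and(pred, gold).sum()
    union = np.logical_or(pred, gold).sum()
    return inter / union if union else 1.0

def score_candidate(adapt_fn, pipeline_fn, val_images, val_masks):
    """Mean IoU of the fixed pipeline run on adapted images,
    measured against the small gold-standard validation set."""
    scores = [iou(pipeline_fn(adapt_fn(x)), m)
              for x, m in zip(val_images, val_masks)]
    return float(np.mean(scores))

# Toy stand-ins: the fixed "pipeline" segments by thresholding at 0.5,
# while the bespoke data's gold masks correspond to a 0.4 threshold.
rng = np.random.default_rng(1)
val_images = [rng.random((32, 32)) for _ in range(5)]
val_masks = [x > 0.4 for x in val_images]
pipeline_fn = lambda x: x > 0.5

identity = lambda x: x            # no adaptation
gamma_fix = lambda x: x ** 0.76   # candidate adaptation brightening the image

base = score_candidate(identity, pipeline_fn, val_images, val_masks)
fixed = score_candidate(gamma_fix, pipeline_fn, val_images, val_masks)
```

An agent can then treat `score_candidate` as a black-box objective: propose adaptation code, score it on the held-out gold images, and keep the best-scoring candidate.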
Therefore, we set out to answer: what is the simplest, most practical agent framework that can reliably adapt a fixed, pretrained production tool to a new, bespoke dataset? To this end, we introduce a systematic evaluation framework for benchmarking agentic code optimization for tool adaptation. We apply this framework to three production-level biomedical imaging pipelines: Polaris [23], Cellpose [41], and MedSAM [31], chosen for their collective coverage of the full spectrum of biological length scales.
Our investigation yields a promising and