Simple Agents Outperform Experts in Biomedical Imaging Workflow Optimization

Reading time: 5 minutes

📝 Original Info

  • Title: Simple Agents Outperform Experts in Biomedical Imaging Workflow Optimization
  • ArXiv ID: 2512.06006
  • Date: 2025-12-02
  • Authors: Xuefei (Julie) Wang, Kai A. Horstmann, Ethan Lin, Jonathan Chen, Alexander R. Farhang, Sophia Stiles, Atharva Sehgal, Jonathan Light, David Van Valen, Yisong Yue, Jennifer J. Sun

📝 Abstract

Adapting production-level computer vision tools to bespoke scientific datasets is a critical "last mile" bottleneck. Current solutions are impractical: fine-tuning requires large annotated datasets scientists often lack, while manual code adaptation costs scientists weeks to months of effort. We consider using AI agents to automate this manual coding, and focus on the open question of optimal agent design for this targeted task. We introduce a systematic evaluation framework for agentic code optimization and use it to study three production-level biomedical imaging pipelines. We demonstrate that a simple agent framework consistently generates adaptation code that outperforms human-expert solutions. Our analysis reveals that common, complex agent architectures are not universally beneficial, leading to a practical roadmap for agent design. We open source our framework and validate our approach by deploying agent-generated functions into a production pipeline, demonstrating a clear pathway for real-world impact.


📄 Full Content

Simple Agents Outperform Experts in Biomedical Imaging Workflow Optimization

Xuefei (Julie) Wang¹, Kai A. Horstmann², Ethan Lin², Jonathan Chen², Alexander R. Farhang¹, Sophia Stiles¹, Atharva Sehgal³, Jonathan Light⁴, David Van Valen¹, Yisong Yue¹, Jennifer J. Sun²
¹Caltech ²Cornell ³UT Austin ⁴Rensselaer Polytechnic Institute

Abstract

Adapting production-level computer vision tools to bespoke scientific datasets is a critical "last mile" bottleneck. Current solutions are impractical: fine-tuning requires large annotated datasets scientists often lack, while manual code adaptation costs scientists weeks to months of effort. We consider using AI agents to automate this manual coding, and focus on the open question of optimal agent design for this targeted task. We introduce a systematic evaluation framework for agentic code optimization and use it to study three production-level biomedical imaging pipelines. We demonstrate that a simple agent framework consistently generates adaptation code that outperforms human-expert solutions. Our analysis reveals that common, complex agent architectures are not universally beneficial, leading to a practical roadmap for agent design. We open source our framework and validate our approach by deploying agent-generated functions into a production pipeline, demonstrating a clear pathway for real-world impact. The code can be found here: https://github.com/xuefei-wang/simple-agent-opt

1. Introduction

Automated computer vision (CV) tools are rapidly being adopted as production-level solutions in clinical and laboratory settings, fundamentally reshaping scientific discovery in biomedical imaging [23, 31, 40, 46]. Despite this progress, a critical and ubiquitous "last mile" bottleneck—tool adaptation—still persists.
When a scientist applies these tools to their own bespoke datasets, models frequently underperform or fail [2, 6, 10, 26, 32, 51] due to inevitable variability in acquisition conditions between labs, such as different microscopes, lighting, resolutions, staining protocols, or unique artifacts [11, 14, 24, 44, 54].

Figure 1. Overview. (Top) Production-level tools [23, 31, 41] accelerate scientific discovery but face a "last mile" adaptation bottleneck. (Middle) Domain experts spend weeks to months manually coding preprocessing and postprocessing steps in order to adapt the tools to their bespoke datasets. AI agents can automate this adaptation, but it remains unclear how to navigate their complex design space to build simple, practical agents. (Bottom) Our work systematically studies the design choices of tool adaptation agents.

Current solutions to this adaptation bottleneck are hindered by the need for either massive amounts of labeled data or prohibitively long manual development cycles. Scientists are typically forced to choose between: (1) fine-tuning complex models, a process that is data-inefficient and requires a large, annotated training set (e.g., thousands of images) that is often unavailable to individual labs [27, 35, 55]; or (2) manually writing custom pre- and post-processing code to bridge the domain gap, which can take a scientist weeks or months, significantly diverting valuable time away from scientific discovery (Figure 1).

Recent work in agentic AI presents a new way to tackle this adaptation bottleneck. In principle, an AI agent could use the small "gold-standard" validation set (typically 10-100 images) that scientists do have as an objective function to automatically generate the necessary adaptation code. However, most existing AI agents may not be directly applicable to this specific, highly-demanded application due to their sophisticated architecture or specialized target tasks.
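The core idea above, using a small gold-standard validation set as the objective function for agent-generated adaptation code, can be sketched as a propose-and-score loop. Everything below (the function names, the IoU scorer, the candidate transforms) is a hypothetical illustration of that idea, not the paper's actual framework:

```python
import numpy as np


def iou(pred: np.ndarray, gold: np.ndarray) -> float:
    """Intersection-over-union between two binary masks."""
    inter = np.logical_and(pred, gold).sum()
    union = np.logical_or(pred, gold).sum()
    return float(inter) / float(union) if union else 1.0


def evaluate(preprocess, model, val_images, val_masks) -> float:
    """Mean IoU of a frozen model after a candidate preprocessing step."""
    scores = [iou(model(preprocess(img)), gold)
              for img, gold in zip(val_images, val_masks)]
    return float(np.mean(scores))


def adapt(model, val_images, val_masks, candidates):
    """Keep whichever candidate preprocessing function scores best on the
    small gold-standard validation set (the 'objective function')."""
    best_fn, best_score = None, -1.0
    for fn in candidates:
        score = evaluate(fn, model, val_images, val_masks)
        if score > best_score:
            best_fn, best_score = fn, score
    return best_fn, best_score
```

In a real agent, `candidates` would be code proposed by an LLM rather than a fixed list, but the selection signal is the same: a handful of annotated images, not a large training set.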
"AI Agents for Science" are often large, complex systems—featuring hierarchical planning and large tool spaces—designed for high-level, open-ended discovery [12, 17, 29] or specialized scientific tasks [25, 36, 37, 47], rather than the targeted adaptation of existing tools. Concurrently, while MLE agents [15, 28, 52] are progressing quickly, they typically focus on building new solutions from scratch, not on integrating with and tuning existing, production-level scientific pipelines. Thus, for the narrower problem of tool adaptation, it remains unclear whether such complex designs are necessary or which specific design components drive performance.

Therefore, we set out to answer: What is the most practical and simplest agent framework that can reliably adapt a fixed, pretrained production tool to a new, bespoke dataset? To this end, we introduce a systematic evaluation framework for benchmarking agentic code optimization for tool adaptation. We apply this framework to three production-level biomedical imaging pipelines: Polaris [23], Cellpose [41], and MedSAM [31]—chosen for their collective coverage of the full spectrum of biological length scales. Our investigation yields a promising and
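Adapting a fixed, pretrained production tool, as framed above, amounts to searching over pre- and post-processing code while the model weights stay frozen. A minimal sketch of that interface (the class and field names are hypothetical, not from the paper's codebase):

```python
from dataclasses import dataclass
from typing import Callable

import numpy as np

Array = np.ndarray


@dataclass
class AdaptedPipeline:
    """A frozen production tool wrapped with swappable adaptation code.

    Only `preprocess` and `postprocess` are edited during adaptation
    (by a human expert or an agent); `model` is never retrained.
    """
    preprocess: Callable[[Array], Array]
    model: Callable[[Array], Array]       # frozen production tool
    postprocess: Callable[[Array], Array]

    def __call__(self, image: Array) -> Array:
        return self.postprocess(self.model(self.preprocess(image)))
```

Framing adaptation this way keeps the search space small, which is one reason a simple agent can be competitive: it only has to write two ordinary functions, not modify or retrain the tool itself.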

…(Full text truncated)…

📸 Image Gallery

cellpose_post_ag.png cellpose_post_exp.png cellpose_post_skeleton.png cellpose_pre_ag.png cellpose_pre_exp.png cvpr_ablate_API_list.png cvpr_dispersion_measure.png cvpr_example_plot_cellpose.png cvpr_example_plot_medsam.png cvpr_example_plot_spot.png cvpr_function_diverstity_and_length.png cvpr_param_indepth_analysis.png cvpr_param_measure.png cvpr_sciseek_examples.png cvpr_sciseek_flowchart.png cvpr_sciseek_overview_v3.png deployment.png medsam_post_ag.png medsam_post_exp.png medsam_post_skeleton.png medsam_pre_ag.png medsam_pre_exp.png spot_post_ag.png spot_post_exp.png spot_post_skeleton.png spot_pre_ag.png spot_pre_exp.png (each also available as .webp)

Reference

This content is AI-processed based on ArXiv data.
