BiomechAgent: AI-Assisted Biomechanical Analysis Through Code-Generating Agents
Markerless motion capture is making quantitative movement analysis increasingly accessible, yet analyzing the resulting data remains a barrier for clinicians without programming expertise. We present BiomechAgent, a code-generating AI agent that enables biomechanical analysis through natural language, allowing users to query databases, generate visualizations, and even interpret data without writing code. To evaluate BiomechAgent's capabilities, we developed a systematic benchmark spanning data retrieval, visualization, activity classification, temporal segmentation, and clinical reasoning. BiomechAgent achieved robust accuracy on data retrieval and visualization tasks and demonstrated emerging clinical reasoning capabilities. We used our benchmark to systematically evaluate several of our design decisions. Biomechanically informed, domain-specific instructions significantly improved performance over generic prompts, and integrating validated specialized tools for gait event detection substantially boosted accuracy on challenging spatiotemporal analyses where the base agent struggled. We also tested BiomechAgent with a local open-weight model instead of a frontier cloud-based LLM and found that performance was substantially diminished in most domains other than database retrieval. In short, BiomechAgent makes data from accessible motion capture far more useful and accessible to end users.
💡 Research Summary
The paper introduces BiomechAgent, a code‑generating AI agent designed to bridge the gap between the increasing accessibility of markerless motion capture and the technical expertise required to analyze the resulting biomechanical data. Built on the smolagents framework, BiomechAgent receives a natural‑language query, reasons about the computational steps needed, writes executable Python code, runs it in a sandbox, and feeds the results back to the language model for further reasoning or a final answer. The system integrates three specialized tools: (1) a DataJoint‑based database interface for querying participant trials and retrieving kinematic arrays, (2) a gait‑event detection module (GaitTransformer) that extracts foot‑contact timings from kinematics, and (3) a visualization module that produces static or interactive plots using Matplotlib/Plotly. Generated images are also passed to a visual language model (VLM) for combined textual‑visual reasoning.
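The reason–code–execute–observe loop described above can be sketched in a few lines. This is an illustrative simplification, not the paper's implementation (which builds on smolagents); the `llm` callable, the `FINAL:` convention, and the in-process executor are assumptions for the sketch, and a real deployment would run generated code in an isolated sandbox.

```python
import io
import contextlib

def run_python(code: str, env: dict) -> str:
    """Execute generated code in a shared namespace, capturing stdout.

    Stand-in for BiomechAgent's sandboxed executor; a real sandbox would
    isolate the process and restrict imports and filesystem access.
    """
    buf = io.StringIO()
    try:
        with contextlib.redirect_stdout(buf):
            exec(code, env)
        return buf.getvalue() or "(no output)"
    except Exception as exc:
        return f"ERROR: {exc!r}"

def agent_loop(query: str, llm, max_steps: int = 5) -> str:
    """Iterate: the LLM proposes code, the executor runs it, and the
    observation is fed back until the model emits a final answer.

    `llm(transcript)` is a hypothetical callable returning either a code
    snippet to run or a final answer prefixed with 'FINAL:'.
    """
    env: dict = {}  # persistent namespace shared across steps
    transcript = [f"USER: {query}"]
    for _ in range(max_steps):
        reply = llm("\n".join(transcript))
        if reply.startswith("FINAL:"):
            return reply[len("FINAL:"):].strip()
        observation = run_python(reply, env)
        transcript += [f"CODE:\n{reply}", f"OBSERVATION:\n{observation}"]
    return "Step limit reached without a final answer."
```

Because the namespace persists across steps, variables computed in one round of code (e.g., a fetched kinematic array) remain available when the model refines its analysis in the next round.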
Two language‑model back‑ends were evaluated: the cloud‑based Gemini 2.5 Flash Lite (a large, up‑to‑date LLM) and the open‑weight MedGemma 27B model running locally on an A100 GPU. Gemini consistently outperformed MedGemma on all tasks except pure database retrieval, highlighting the current performance gap between frontier commercial models and open‑source medical models.
A key design element is the use of custom system prompts that embed biomechanical domain knowledge—formulas, unit conventions, color coding for left/right limbs, and guidance to formulate and test clinical hypotheses. These prompts dramatically reduced coding errors (e.g., missing imports, incorrect axis labels) and encouraged the agent to perform logical reasoning rather than mere pattern matching.
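The paper does not reproduce its full system prompt, but the kind of domain instructions it describes might be assembled along these lines. The rule text, formulas, and function name below are hypothetical examples in the spirit of the described conventions (units, left/right color coding, hypothesis-driven reasoning), not the actual prompt.

```python
# Illustrative fragments only -- the conventions and formulas below are
# standard biomechanics examples, not the paper's verbatim prompt text.
DOMAIN_RULES = """\
You are a biomechanics analysis assistant.

Units and conventions:
- Joint angles are in degrees; positions in meters; time in seconds.
- Plot left-limb traces in red and right-limb traces in blue.
- Label every axis with the quantity and its unit, e.g. 'Knee flexion (deg)'.

Useful formulas:
- Cadence (steps/min) = 60 * step_count / duration_s
- Symmetry index (%) = 100 * (left - right) / (0.5 * (left + right))

Reasoning policy:
- State a clinical hypothesis before computing, then test it on the data.
- Prefer validated tools (e.g. the gait-event detector) over ad-hoc code.
"""

def build_system_prompt(session_context: str = "") -> str:
    """Combine fixed domain rules with optional per-session context."""
    return DOMAIN_RULES + ("\n" + session_context if session_context else "")
```

Front-loading conventions like axis labels and limb colors is what lets such a prompt prevent the recurring small coding errors the authors observed with generic instructions.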
The authors constructed a systematic benchmark covering seven task families: (1) Database queries, (2) Activity classification (2‑alternative forced‑choice), (3) Spatiotemporal parameter estimation (velocity, step length, stride time, gait‑event timing, asymmetry indices), (4) Clinical inference (stroke vs. control, prosthesis side, fall‑risk assessment), (5) Temporal segmentation of complex activities, (6) Visualization generation, and (7) Perception tasks that require the agent to interpret its own plots. Ground‑truth data were drawn from privileged sources (instrumented walkways, clinical records, Berg Balance Scale scores, manual video annotations) that were hidden from the agent during evaluation. Scoring combined deterministic metrics (exact string matches, tolerance‑based numeric errors) with an LLM‑as‑Judge approach for tasks requiring semantic judgment (visualization quality, clinical reasoning).
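The deterministic half of the scoring described above reduces to two simple checks, sketched below. The 5% tolerance is an illustrative default, not the paper's exact threshold, and the function names are assumptions; the LLM-as-Judge path is omitted since it requires a model call.

```python
def score_numeric(pred: float, truth: float, tol_frac: float = 0.05) -> bool:
    """Tolerance-based numeric scoring: count a prediction as correct if it
    falls within a fractional tolerance of the privileged ground truth."""
    return abs(pred - truth) <= tol_frac * abs(truth)

def score_exact(pred: str, truth: str) -> bool:
    """Deterministic string scoring: exact match after light normalization,
    e.g. for activity labels or left/right answers."""
    return pred.strip().lower() == truth.strip().lower()
```

A relative rather than absolute tolerance keeps the criterion comparable across quantities of different scales, such as stride time in seconds versus walking velocity in meters per second.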
Results show that BiomechAgent achieves >95 % accuracy on deterministic database and basic visualization tasks, and 88 % accuracy on spatiotemporal estimation when the GaitTransformer tool is available. Activity classification and clinical inference reach ~70 % accuracy, reflecting the difficulty of extracting subtle pathological patterns from kinematics alone. Visualization quality, assessed by domain‑specific rubrics, averages 8.2/10, indicating that the agent reliably produces correctly labeled, appropriately scaled plots. Code error rates are low (<4 % of executions), average response times are 12 seconds with Gemini and 18 seconds with MedGemma, and token usage stays under 2 k per query, demonstrating computational efficiency.
Limitations include reliance on pre‑processed kinematic data (no direct video processing), the subjectivity inherent in LLM‑as‑Judge evaluations, and the performance gap of local open‑weight models. Future work aims to integrate multimodal VLMs for direct video analysis, fine‑tune domain‑specific models, develop interactive debugging interfaces, and conduct user‑centered studies to refine the chat UI for clinical workflows.
In summary, BiomechAgent provides clinicians with a conversational, code‑free interface to query, analyze, visualize, and interpret markerless motion capture data. The combination of biomechanically informed system prompts and specialized tools (especially gait‑event detection) is shown to be essential for high performance, positioning BiomechAgent as a promising step toward democratizing biomechanical analytics in research and clinical practice.