TorchTraceAP: A New Benchmark Dataset for Detecting Performance Anti-Patterns in Computer Vision Models
Hanning Chen1*
Keyu Man2
Kevin Zhu2
Chenguang Zhu2
Haonan Li3
Tongbo Luo2
Xizhou Feng2
Wei Sun2
Sreen Tallam2
Mohsen Imani1
Partha Kanuparthy2
1 University of California, Irvine, CA, USA
2 Meta, Menlo Park, CA, USA
3 University of California, Riverside, CA, USA
{hanningc}@uci.edu
Abstract
Identifying and addressing performance anti-patterns in machine learning (ML) models is critical for efficient training and inference, but it typically demands deep expertise spanning system infrastructure, ML models, and kernel development. While large tech companies rely on dedicated ML infrastructure engineers to analyze torch traces and benchmarks, such resource-intensive workflows are largely inaccessible to computer vision researchers in general. Among the challenges, pinpointing problematic trace segments within lengthy execution traces remains the most time-consuming task, and is difficult to automate with current ML models, including LLMs. In this work, we present the first benchmark dataset specifically designed to evaluate and improve ML models' ability to detect anti-patterns in traces. Our dataset contains over 600 PyTorch traces from diverse computer vision models (classification, detection, segmentation, and generation) collected across multiple hardware platforms. We also propose a novel iterative approach: a lightweight ML model first detects trace segments with anti-patterns, followed by a large language model (LLM) for fine-grained classification and targeted feedback. Experimental results demonstrate that our method significantly outperforms unsupervised clustering and rule-based statistical techniques for detecting anti-pattern regions. Our method also effectively compensates for LLMs' limited context length and reasoning inefficiencies.
1. Introduction
Detecting inefficiencies in the training and inference of computer vision (CV) models is a critical task for deploying these models in real-world applications [28]. On identical hardware platforms, a well-optimized model pipeline can achieve up to an 8× speedup in both training and inference compared with the unoptimized version [2]. For large or computationally intensive models, such a significant performance improvement can transform an otherwise unusable application into a practical and deployable solution.

*This work was done during a Meta internship.

Figure 1. (a) Traditionally, ML model development involves both machine learning engineers and system infrastructure engineers. (b) In this work, we propose using ML models to improve ML models' performance.
Despite its importance, torch trace analysis and anti-pattern detection remain challenging for most computer vision researchers. The typical workflow uses the PyTorch profiler [33] to collect torch traces, visualizes them with tools like TensorBoard or Perfetto [1], and identifies anomalous segments (Figure 1(a)). This is difficult for two reasons: (1) interpreting torch traces requires expertise in ML, computer architecture, and system profiling (knowledge often beyond the scope of CV researchers); and (2) the process is time-consuming, as a single profiling run can generate thousands of CPU and GPU events (e.g., 8,000 for ResNet-50 [18]), with even greater complexity for advanced models like vision transformers [13] used in segmentation [22].
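As a minimal sketch of this collection step (the toy model, input size, and output filename below are illustrative, not from the paper), the PyTorch profiler can record events and export them in the Chrome trace format that tools like Perfetto and TensorBoard consume:

```python
import json
import torch
from torch.profiler import profile, ProfilerActivity

# A toy network stands in for a real CV model; even this tiny graph
# emits dozens of profiler events, and a model like ResNet-50 emits
# thousands, which is what makes manual inspection so slow.
model = torch.nn.Sequential(
    torch.nn.Conv2d(3, 8, kernel_size=3),
    torch.nn.ReLU(),
)
inputs = torch.randn(1, 3, 32, 32)

with profile(
    activities=[ProfilerActivity.CPU],  # add ProfilerActivity.CUDA on a GPU host
    record_shapes=True,
) as prof:
    with torch.no_grad():
        model(inputs)

# Export in Chrome trace format, viewable in Perfetto or TensorBoard.
prof.export_chrome_trace("toy_trace.json")

# The export is JSON: either a flat event list or a dict whose
# "traceEvents" field holds the timed CPU/GPU events.
with open("toy_trace.json") as f:
    trace = json.load(f)
events = trace["traceEvents"] if isinstance(trace, dict) else trace
print(f"collected {len(events)} trace events")
```

Anti-pattern hunting then amounts to scanning this event list for suspicious segments, which is the step this paper's dataset and detector target.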
arXiv:2512.14141v1 [cs.CV] 16 Dec 2025

As shown in Figure 1(a), a common industry practice is to divide responsibilities: machine learning engineers and researchers focus on model accuracy and application development, while ML infrastructure engineers concentrate on profiling and optimizing model efficiency. However, for many academic and smaller research groups, it is not feasible to form such multidisciplinary teams. As a result, many CV models are developed with a primary focus on novel applications and accuracy improvements, often at the expense of system-level efficiency and optimization.
Recent advances in large language models (LLMs) [10], retrieval-augmented generation (RAG) [4], and LLM agents [20] have enabled applications in code generation [29], program analysis [26, 50], and performance debugging [23]. Building on these developments, and as shown in Figure 1(b), we propose leveraging ML models to improve other ML models by simplifying torch trace anti-pattern detection, making performance optimization accessible to smaller research groups. However, LLMs' limited context length and slow inference make direct