TorchTraceAP: A New Benchmark Dataset for Detecting Performance Anti-Patterns in Computer Vision Models

Reading time: 5 minutes
...

📝 Original Info

  • Title: TorchTraceAP: A New Benchmark Dataset for Detecting Performance Anti-Patterns in Computer Vision Models
  • ArXiv ID: 2512.14141
  • Date: 2025-12-16
  • Authors: Hanning Chen, Keyu Man, Kevin Zhu, Chenguang Zhu, Haonan Li, Tongbo Luo, Xizhou Feng, Wei Sun, Sreen Tallam, Mohsen Imani, Partha Kanuparthy

📝 Abstract

Identifying and addressing performance anti-patterns in machine learning (ML) models is critical for efficient training and inference, but it typically demands deep expertise spanning system infrastructure, ML models, and kernel development. While large tech companies rely on dedicated ML infrastructure engineers to analyze torch traces and benchmarks, such resource-intensive workflows are largely inaccessible to computer vision researchers in general. Among the challenges, pinpointing problematic trace segments within lengthy execution traces remains the most time-consuming task, and is difficult to automate with current ML models, including LLMs. In this work, we present the first benchmark dataset specifically designed to evaluate and improve ML models' ability to detect anti-patterns in traces. Our dataset contains over 600 PyTorch traces from diverse computer vision models (classification, detection, segmentation, and generation) collected across multiple hardware platforms. We also propose a novel iterative approach: a lightweight ML model first detects trace segments with anti-patterns, followed by a large language model (LLM) for fine-grained classification and targeted feedback. Experimental results demonstrate that our method significantly outperforms unsupervised clustering and rule-based statistical techniques for detecting anti-pattern regions. Our method also effectively compensates for LLMs' limited context length and reasoning inefficiencies.
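The detect-then-classify pipeline described in the abstract can be sketched in miniature. Everything below is illustrative, not the authors' implementation: the function name `flag_windows`, the fixed window size, and the GPU-idle-ratio score are assumed stand-ins for the paper's lightweight detector; the flagged segments are what would be forwarded to the LLM stage for fine-grained classification.

```python
# Toy sketch of the two-stage idea: a cheap detector scores fixed-size
# windows of trace events, and only the suspicious windows (short enough
# to fit an LLM context) move on to the LLM classification stage.
def flag_windows(durations, gaps, window=4, idle_threshold=0.5):
    """durations[i]: busy time of event i; gaps[i]: idle time before it.

    Returns (start, end) index ranges whose idle fraction exceeds the
    threshold -- a crude proxy for an anti-pattern region.
    """
    flagged = []
    for start in range(0, len(durations), window):
        busy = sum(durations[start:start + window])
        idle = sum(gaps[start:start + window])
        if idle / (busy + idle + 1e-9) > idle_threshold:
            flagged.append((start, min(start + window, len(durations))))
    return flagged  # segments worth handing to the LLM stage

# First window is busy (low idle), second is dominated by idle gaps.
print(flag_windows([10, 10, 10, 10, 2, 2, 2, 2],
                   [1, 1, 1, 1, 30, 30, 30, 30]))  # → [(4, 8)]
```

In a real system the scorer would be a learned model rather than a ratio test, but the shape of the pipeline is the same: cheap localization first, expensive reasoning only on the localized segments.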

💡 Deep Analysis

Figure 1

📄 Full Content

Hanning Chen¹, Keyu Man², Kevin Zhu², Chenguang Zhu², Haonan Li³, Tongbo Luo², Xizhou Feng², Wei Sun², Sreen Tallam², Mohsen Imani¹, Partha Kanuparthy²
¹ University of California, Irvine, CA, USA; ² Meta, Menlo Park, CA, USA; ³ University of California, Riverside, CA, USA
Contact: {hanningc}@uci.edu
*This work was done during a Meta internship.

1. Introduction

Detecting inefficiencies in the training and inference of computer vision (CV) models is a critical task for deploying these models in real-world applications [28]. On identical hardware platforms, a well-optimized model pipeline can achieve up to an 8× speedup in both training and inference compared with the unoptimized version [2]. For large or computationally intensive models, such a significant performance improvement can transform an otherwise unusable application into a practical and deployable solution.

Figure 1. (a) Traditionally, ML model development involves both machine learning engineers and system infrastructure engineers. (b) In this work, we propose using ML models to improve ML models' performance.

Despite its importance, torch trace analysis and anti-pattern detection remain challenging for most computer vision researchers. The typical workflow uses the PyTorch profiler [33] to collect torch traces, visualizes them with tools like TensorBoard or Perfetto [1], and identifies anomalous segments (Figure 1(a)). This is difficult for two reasons: (1) interpreting torch traces requires expertise in ML, computer architecture, and system profiling, knowledge that is often beyond the scope of CV researchers; and (2) the process is time-consuming, as a single profiling run can generate thousands of CPU and GPU events (e.g., 8,000 for ResNet-50 [18]), with even greater complexity for advanced models like vision transformers [13] used in segmentation [22].
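As a concrete starting point for the workflow above: traces exported by the PyTorch profiler (`export_chrome_trace`) are Chrome trace-event JSON, which can be summarized with the standard library alone. A minimal sketch, assuming the usual `traceEvents` / `cat` / `dur` fields of that format; the tiny inline trace stands in for a real multi-thousand-event one:

```python
import json
from collections import Counter

def summarize_trace(trace_json: str):
    """Count events per category and total recorded duration (microseconds)."""
    events = json.loads(trace_json).get("traceEvents", [])
    counts = Counter(e.get("cat", "unknown") for e in events)
    total_us = sum(e.get("dur", 0) for e in events)
    return counts, total_us

# Synthetic three-event trace in lieu of a real ~8,000-event ResNet-50 run.
demo = json.dumps({"traceEvents": [
    {"name": "aten::conv2d", "cat": "cpu_op", "dur": 120},
    {"name": "volta_sgemm", "cat": "kernel", "dur": 300},
    {"name": "Memcpy HtoD", "cat": "gpu_memcpy", "dur": 40},
]})
counts, total = summarize_trace(demo)
print(counts["kernel"], total)  # → 1 460
```

Even this trivial tally hints at why manual inspection does not scale: per-category aggregates are easy, but spotting *which segment* of thousands of events is anomalous is the hard part the paper targets.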
As shown in Figure 1(a), a common industry practice is to divide responsibilities: machine learning engineers and researchers focus on model accuracy and application development, while ML infrastructure engineers concentrate on profiling and optimizing model efficiency. However, for many academic and smaller research groups, it is not feasible to form such multidisciplinary teams. As a result, many CV models are developed with a primary focus on novel applications and accuracy improvements, often at the expense of system-level efficiency and optimization. Recent advances in large language models (LLMs) [10], retrieval-augmented generation (RAG) [4], and LLM agents [20] have enabled applications in code generation [29], program analysis [26, 50], and performance debugging [23]. Building on these developments, and as shown in Figure 1(b), we propose leveraging ML models to improve other ML models by simplifying torch trace anti-pattern detection, making performance optimization accessible to smaller research groups. However, LLMs' limited context length and slow inference make direct
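The context-length constraint raised here can be made concrete with back-of-envelope arithmetic. The events-per-run figure comes from the text; the tokens-per-event and context-window numbers are assumptions for illustration only:

```python
# Rough illustration of why feeding a whole trace to an LLM is infeasible.
EVENTS_PER_RUN = 8_000    # one ResNet-50 profiling run (figure from the text)
TOKENS_PER_EVENT = 30     # assumption: one serialized JSON event ≈ 30 tokens
CONTEXT_WINDOW = 128_000  # assumption: a typical large-model context size

trace_tokens = EVENTS_PER_RUN * TOKENS_PER_EVENT
print(trace_tokens, trace_tokens > CONTEXT_WINDOW)  # → 240000 True
```

Under these assumptions even a modest model's trace overflows the context window by nearly 2×, which motivates localizing suspicious segments before invoking the LLM.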


Reference

This content is AI-processed based on open access ArXiv data.
