Optimizing video analytics inference pipelines: a case study

February 20, 2026

Reading time: 4 minutes

...

📝 Original Info

Title: Optimizing video analytics inference pipelines: a case study
ArXiv ID: 2512.07009
Date: 2025-12-07
Authors: Saeid Ghafouri, Yuming Ding, Katerine Diaz Chito, Jesús Martinez del Rincón, Niamh O’Connell, Hans Vandierendonck (Queen’s University Belfast, United Kingdom)

📝 Abstract

Cost-effective and scalable video analytics are essential for precision livestock monitoring, where high-resolution footage and near-real-time monitoring needs from commercial farms generate substantial computational workloads. This paper presents a comprehensive case study on optimizing a poultry welfare monitoring system through system-level improvements across detection, tracking, clustering, and behavioral analysis modules. We introduce a set of optimizations, including multi-level parallelization, replacing CPU code with GPU-accelerated code, vectorized clustering, and memory-efficient post-processing. Evaluated on real-world farm video footage, these changes deliver up to a 2x speedup across pipelines without compromising model accuracy. Our findings highlight practical strategies for building high-throughput, low-latency video inference systems that reduce infrastructure demands in agricultural and smart sensing deployments as well as other large-scale video analytics applications.
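
To illustrate what "vectorized clustering" in the abstract can mean in a Python pipeline, here is a minimal sketch. It assumes the clustering step needs pairwise distances between detected bird centroids; that assumption, the function names, and the `radius` parameter are all hypothetical and are not taken from the FlockFocus code. The sketch compares a pure-Python double loop with an equivalent NumPy broadcasting version.

```python
import numpy as np


def pairwise_distances_loop(points):
    """Baseline: pure-Python double loop over all point pairs."""
    n = len(points)
    dist = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            dx = points[i][0] - points[j][0]
            dy = points[i][1] - points[j][1]
            dist[i, j] = (dx * dx + dy * dy) ** 0.5
    return dist


def pairwise_distances_vectorized(points):
    """Same result via NumPy broadcasting, with no Python-level loops."""
    pts = np.asarray(points, dtype=float).reshape(-1, 2)
    # (n, 1, 2) - (1, n, 2) broadcasts to an (n, n, 2) array of differences.
    diff = pts[:, None, :] - pts[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def neighbor_counts(points, radius):
    """Hypothetical density proxy: other points within `radius` of each point."""
    dist = pairwise_distances_vectorized(points)
    # Each point is at distance 0 from itself, so subtract 1 to exclude it.
    return (dist <= radius).sum(axis=1) - 1


if __name__ == "__main__":
    centroids = [(0.0, 0.0), (3.0, 4.0), (0.0, 1.0)]
    print(pairwise_distances_vectorized(centroids))
    print(neighbor_counts(centroids, radius=1.5))
```

The broadcasting version materializes an (n, n, 2) intermediate array, so it trades memory for speed. For very large numbers of detections per frame, a chunked or library-provided distance routine may be the better choice.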

💡 Deep Analysis

📄 Full Content

Optimizing Video Analytics Inference Pipelines: A Case Study

Saeid Ghafouri, Yuming Ding, Katerine Diaz Chito, Jesús Martinez del Rincón, Niamh O’Connell, Hans Vandierendonck
{s.ghafouri,yding12,k.diazchito,j.martinez-del-rincon,niamh.oconnell,h.vandierendonck}@qub.ac.uk
Queen’s University Belfast, Belfast, United Kingdom

Abstract

Cost-effective and scalable video analytics are essential for precision livestock monitoring, where high-resolution footage and near-real-time monitoring needs from commercial farms generate substantial computational workloads. This paper presents a comprehensive case study on optimizing a poultry welfare monitoring system through system-level improvements across detection, tracking, clustering, and behavioral analysis modules. We introduce a set of optimizations, including multi-level parallelization, replacing CPU code with GPU-accelerated code, vectorized clustering, and memory-efficient post-processing. Evaluated on real-world farm video footage, these changes deliver up to a 2× speedup across pipelines without compromising model accuracy. Our findings highlight practical strategies for building high-throughput, low-latency video inference systems that reduce infrastructure demands in agricultural and smart sensing deployments as well as other large-scale video analytics applications.

CCS Concepts

• General and reference → General conference proceedings
• Computing methodologies → Parallel algorithms; Machine learning

Keywords

Video Analytics, GPU Acceleration, Parallel Processing, Cloud Computing, System Optimisation, Precision Agriculture

1 Introduction

Video analytics has emerged as a cornerstone technology across domains requiring automated perception and decision-making, including smart city surveillance [16], industrial automation [20], autonomous vehicles [12], and healthcare monitoring [7].
Recently, its application has expanded into agriculture and animal husbandry, where continuous video-based observation can provide actionable insights into welfare, health, and productivity [5, 24]. In particular, video analytics enables the collection of high-resolution temporal data that exceeds what is feasible through manual observation in quantity, quality, and added value.

Despite their potential, deploying large-scale video analytics systems in commercial poultry farms presents significant performance and cost challenges. A typical poultry house may contain 10,000 to 30,000 birds, and multiple houses per farm can collectively generate terabytes of high-resolution video data each week. Scaling analytics workloads involving decoding, inference, and data transfer without careful design leads to inefficient resource use and rapidly growing infrastructure costs. Optimizing video pipelines for latency is therefore critical to increasing system efficiency, which in turn substantially lowers operational costs [21, 23].

This paper investigates performance bottlenecks and solutions for the FlockFocus pipeline, a multi-camera video analytics system developed for automated broiler chicken welfare monitoring in commercial farms [5]. It analyzes high-resolution video from multiple behavioral zones, including feeder, drinker, activity, and wall areas, to extract metrics such as feeding frequency, locomotion, and bird density. The system processes terabytes of video weekly across zones and houses, leading to high compute load, memory use, and data transfer. As is common in data analytics, the system is written in Python as the main programming language and leverages several highly optimized back-end libraries such as PyTorch, skimage, and OpenCV. This software environment restricts the types of optimizations that can be applied.
The primary objective of our optimizations is to improve the resource usage efficiency of the analytics pipelines with a view to reducing the cost of analytics. Our optimizations fall into three categories: (i) increasing utilization of GPU and CPU computing resources through parallel execution; (ii) increasing computational efficiency of analytics by code restructuring and using efficient back-end libraries; (iii) enhancing efficiency of video input. A key lesson is that bottlenecks stem not only from algorithmic complexity and are not necessarily centered on neural network inference. Instead, they also stem from inefficiencies in scheduling, data flow, and component interactions.

The main contributions of this work are:

• A real-world case study of optimizing a multi-zone animal monitoring system, identifying architectural inefficiencies related to scheduling, I/O, and inter-stage communication in a modular pipeline.
• A set of system-level optimizations applied to each component module composing the pipeline, from low-level to high-level analytics, i.e. detection, tracking, clustering, and behavior inference. Optimizations include batched and parallelized execution, GPU-accelerated post-processing, and effic
