Optimizing video analytics inference pipelines: a case study
📝 Original Info
Title: Optimizing video analytics inference pipelines: a case study
ArXiv ID: 2512.07009
Date: 2025-12-07
Authors: Saeid Ghafouri, Yuming Ding, Katerine Diaz Chito, Jesús Martinez del Rincón, Niamh O’Connell, Hans Vandierendonck (Queen’s University Belfast, United Kingdom)
📝 Abstract
Cost-effective and scalable video analytics are essential for precision livestock monitoring, where high-resolution footage and near-real-time monitoring needs from commercial farms generate substantial computational workloads. This paper presents a comprehensive case study on optimizing a poultry welfare monitoring system through system-level improvements across detection, tracking, clustering, and behavioral analysis modules. We introduce a set of optimizations, including multi-level parallelization, replacing CPU code with GPU-accelerated code, vectorized clustering, and memory-efficient post-processing. Evaluated on real-world farm video footage, these changes deliver up to a 2× speedup across pipelines without compromising model accuracy. Our findings highlight practical strategies for building high-throughput, low-latency video inference systems that reduce infrastructure demands in agricultural and smart sensing deployments as well as other large-scale video analytics applications.
📄 Full Content
Optimizing Video Analytics Inference Pipelines: A Case Study
Saeid Ghafouri, Yuming Ding, Katerine Diaz Chito, Jesús Martinez del Rincón, Niamh O’Connell,
Hans Vandierendonck
{s.ghafouri,yding12,k.diazchito,j.martinez-del-rincon,niamh.oconnell,h.vandierendonck}@qub.ac.uk
Queen’s University Belfast
Belfast, United Kingdom
Abstract
Cost-effective and scalable video analytics are essential for precision livestock monitoring, where high-resolution footage and near-real-time monitoring needs from commercial farms generate substantial computational workloads. This paper presents a comprehensive case study on optimizing a poultry welfare monitoring system through system-level improvements across detection, tracking, clustering, and behavioral analysis modules. We introduce a set of optimizations, including multi-level parallelization, replacing CPU code with GPU-accelerated code, vectorized clustering, and memory-efficient post-processing. Evaluated on real-world farm video footage, these changes deliver up to a 2× speedup across pipelines without compromising model accuracy. Our findings highlight practical strategies for building high-throughput, low-latency video inference systems that reduce infrastructure demands in agricultural and smart sensing deployments as well as other large-scale video analytics applications.
CCS Concepts
• General and reference → General conference proceedings;
• Computing methodologies → Parallel algorithms; Machine learning.
Keywords
Video Analytics, GPU Acceleration, Parallel Processing, Cloud Computing, System Optimisation, Precision Agriculture
1 Introduction
Video analytics has emerged as a cornerstone technology across domains requiring automated perception and decision-making, including smart city surveillance [16], industrial automation [20], autonomous vehicles [12], and healthcare monitoring [7]. Recently, its application has expanded into agriculture and animal husbandry, where continuous video-based observation can provide actionable insights into welfare, health, and productivity [5, 24]. In particular, video analytics enables the collection of high-resolution temporal data that exceeds what is feasible through manual observation in quantity, quality, and added value.
Despite this potential, deploying large-scale video analytics systems in commercial poultry farms presents significant performance and cost challenges. A typical poultry house may contain 10,000 to 30,000 birds, and multiple houses per farm can collectively generate terabytes of high-resolution video data each week. Scaling analytics workloads that involve decoding, inference, and data transfer without careful design leads to inefficient resource use and rapidly growing infrastructure costs. Optimizing video pipelines for latency is therefore critical to increasing system efficiency, which in turn substantially lowers operational costs [21, 23].
This paper investigates performance bottlenecks and solutions for the FlockFocus pipeline, a multi-camera video analytics system developed for automated broiler chicken welfare monitoring in commercial farms [5]. The pipeline analyzes high-resolution video from multiple behavioral zones, including feeder, drinker, activity, and wall areas, to extract metrics such as feeding frequency, locomotion, and bird density.
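To make the idea of per-zone metrics concrete, the following is a minimal sketch of how a bird count and density per zone could be derived from detection centroids. It is not the FlockFocus implementation: the `Zone` class, the `zone_metrics` function, the rectangular zone shapes, the coordinates, and the definition of density as count divided by zone area are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Zone:
    """Hypothetical axis-aligned rectangular zone (units assumed to be metres)."""
    name: str
    x0: float
    y0: float
    x1: float
    y1: float

    @property
    def area(self) -> float:
        return (self.x1 - self.x0) * (self.y1 - self.y0)

    def contains(self, x: float, y: float) -> bool:
        # Half-open interval so a point on a shared edge belongs to one zone only.
        return self.x0 <= x < self.x1 and self.y0 <= y < self.y1


def zone_metrics(zones, centroids):
    """Return {zone name: {"count": n, "density": n / area}} for one frame."""
    metrics = {}
    for zone in zones:
        count = sum(1 for (x, y) in centroids if zone.contains(x, y))
        metrics[zone.name] = {"count": count, "density": count / zone.area}
    return metrics


# Illustrative data only: two zones and four detected bird centroids.
zones = [Zone("feeder", 0.0, 0.0, 2.0, 1.0), Zone("drinker", 2.0, 0.0, 4.0, 1.0)]
centroids = [(0.5, 0.5), (1.5, 0.2), (3.0, 0.9), (5.0, 5.0)]
frame_metrics = zone_metrics(zones, centroids)
```

In this toy frame, two centroids fall in the feeder zone and one in the drinker zone; the fourth lies outside both zones and is ignored.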
The system processes terabytes of video weekly across zones and houses, leading to high compute load, memory use, and data transfer. As is common in data analytics, the system is written in Python and leverages several highly optimized back-end libraries such as PyTorch, scikit-image (skimage), and OpenCV. This software environment restricts the types of optimizations that can be applied.
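One practical consequence of a Python-based stack is that performance often depends on moving inner loops out of the interpreter and into an optimized back-end library. The sketch below illustrates this with pairwise Euclidean distances, a building block of many clustering routines, written first as a Python loop and then with NumPy broadcasting. It is an illustration under assumptions, not the pipeline's code: the function names and the choice of pairwise distances as the example are hypothetical.

```python
import numpy as np


def pairwise_dist_loop(points):
    """Pairwise Euclidean distances using explicit Python loops (slow path)."""
    n = len(points)
    out = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            dx = points[i][0] - points[j][0]
            dy = points[i][1] - points[j][1]
            out[i, j] = (dx * dx + dy * dy) ** 0.5
    return out


def pairwise_dist_vectorized(points):
    """Same result via broadcasting; the loops run inside NumPy's compiled code."""
    pts = np.asarray(points, dtype=float)      # shape (n, 2)
    diff = pts[:, None, :] - pts[None, :, :]   # shape (n, n, 2)
    return np.sqrt((diff ** 2).sum(axis=-1))   # shape (n, n)


# Illustrative centroids only.
points = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
d_loop = pairwise_dist_loop(points)
d_vec = pairwise_dist_vectorized(points)
```

Both functions produce the same matrix. The vectorized version trades an intermediate array of size n × n × 2 for the removal of interpreter overhead, so memory use should be considered for large n.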
The primary objective of our optimizations is to improve the resource usage efficiency of the analytics pipelines with a view to reducing the cost of analytics. Our optimizations fall into three categories: (i) increasing utilization of GPU and CPU computing resources through parallel execution; (ii) increasing computational efficiency of analytics by code restructuring and using efficient back-end libraries; (iii) enhancing efficiency of video input. A key lesson is that bottlenecks do not stem only from algorithmic complexity, nor are they centered on neural network inference. They also stem from inefficiencies in scheduling, data flow, and component interactions.
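The first category, parallel execution, can be sketched with Python's standard `concurrent.futures` module. This is a minimal example under stated assumptions rather than the scheduling used in the pipeline: `analyze_zone` is a hypothetical stand-in for the decode, inference, and post-processing work of one zone, and `time.sleep` simulates work that releases the global interpreter lock, as I/O and many native library calls do.

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Zone names taken from the text; everything else here is illustrative.
ZONES = ["feeder", "drinker", "activity", "wall"]


def analyze_zone(zone: str) -> tuple[str, int]:
    """Hypothetical stand-in for processing one zone's video."""
    time.sleep(0.05)        # simulated work that does not hold the GIL
    return zone, len(zone)  # placeholder result


def run_sequential(zones):
    """Process zones one after another."""
    return dict(analyze_zone(z) for z in zones)


def run_parallel(zones, max_workers=4):
    """Process zones concurrently; pool.map preserves input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(analyze_zone, zones))


sequential_result = run_sequential(ZONES)
parallel_result = run_parallel(ZONES)
```

Both functions return the same results; only the scheduling differs. Threads help here because the simulated work does not hold the interpreter lock. For pure-Python CPU-bound stages, a process pool would be the more likely choice.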
The main contributions of this work are:
• A real-world case study of optimizing a multi-zone animal
monitoring system, identifying architectural inefficiencies
related to scheduling, I/O, and inter-stage communication
in a modular pipeline.
• A set of system-level optimizations applied to each module composing the pipeline, from low-level to high-level analytics: detection, tracking, clustering, and behavior inference. Optimizations include batched and parallelized execution, GPU-accelerated post-processing, and
effic