Performance Comparison Between OpenCV Built-in CPU and GPU Functions on Image Processing Operations
Image processing is a specialized area of digital signal processing that involves various mathematical and algebraic operations such as matrix inversion, matrix transposition, differentiation, convolution, and the Fourier transform. Such operations demand far more computational power than everyday computing tasks. As image sizes grow and operations become more complex, CPUs may prove unsatisfactory because they process data serially by default. GPUs offer a solution, delivering greater speed than CPUs thanks to their inherently parallel architecture. CUDA, a parallel computing platform and programming model created by NVIDIA, is implemented by NVIDIA GPUs. In this paper, the computing performance of several commonly used image-processing operations is compared between OpenCV's built-in CPU functions and its CUDA-based GPU functions.
💡 Research Summary
The paper investigates the performance differences between OpenCV’s built‑in CPU functions and its CUDA‑accelerated GPU counterparts for four fundamental image‑processing operations: image resizing, thresholding, histogram equalization, and edge detection. The authors begin by contextualizing the evolution of digital signal processing (DSP) and the growing computational demands of two‑dimensional image data, arguing that the parallel nature of graphics processing units (GPUs) makes them well‑suited for such workloads. They then describe each operation in mathematical terms—bilinear interpolation for resizing, Otsu’s method for automatic threshold selection, global histogram equalization for contrast enhancement, and the Canny algorithm for edge detection—highlighting the specific OpenCV functions used in both CPU and GPU modes.
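Of the four operations, Otsu's method is the most algorithmically self-contained: it selects the threshold that maximizes the between-class variance of the grayscale histogram. The following is a minimal CPU-side sketch in plain C++ (independent of OpenCV, for illustration only; the paper itself uses OpenCV's built-in threshold functions):

```cpp
#include <array>
#include <cstdint>
#include <vector>

// Otsu's method: choose the threshold that maximizes the
// between-class variance of an 8-bit grayscale histogram.
int otsuThreshold(const std::vector<std::uint8_t>& pixels) {
    std::array<double, 256> hist{};
    for (std::uint8_t p : pixels) hist[p] += 1.0;

    const double total = static_cast<double>(pixels.size());
    double sumAll = 0.0;
    for (int i = 0; i < 256; ++i) sumAll += i * hist[i];

    double sumB = 0.0, wB = 0.0, bestVar = -1.0;
    int bestT = 0;
    for (int t = 0; t < 256; ++t) {
        wB += hist[t];                 // cumulative weight of background class
        if (wB == 0.0) continue;
        const double wF = total - wB;  // weight of foreground class
        if (wF == 0.0) break;
        sumB += t * hist[t];
        const double mB = sumB / wB;                // background mean
        const double mF = (sumAll - sumB) / wF;     // foreground mean
        const double var = wB * wF * (mB - mF) * (mB - mF);
        if (var > bestVar) { bestVar = var; bestT = t; }
    }
    return bestT;
}
```

Because each pixel contributes independently to the histogram, this computation maps naturally onto a parallel reduction, which is what makes the GPU version attractive at large pixel counts.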
The experimental platform consists of a Windows 7 64‑bit workstation equipped with an Intel Core i7‑6700 CPU (8 logical cores, 3.4 GHz), 16 GB of RAM, and an NVIDIA GeForce GTX 970 GPU (CUDA Compute Capability 5.2). All code is written in C++ and compiled against OpenCV 3.x. Execution time is measured with the C++ standard library’s chrono facility, reporting elapsed milliseconds. For each operation, the authors generate a series of test images with progressively larger resolutions, ranging from a few hundred pixels in each dimension up to roughly 5 000 × 3 000 pixels (i.e., tens of millions of pixels). This systematic scaling allows them to plot processing time as a function of total pixel count (N × M).
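The timing methodology can be reproduced with the standard chrono facility. A minimal sketch of such a wall-clock measurement helper (the helper name is illustrative, not taken from the paper):

```cpp
#include <chrono>
#include <functional>

// Measure the wall-clock time of a single operation in milliseconds,
// mirroring the paper's use of the C++ chrono facility.
double elapsedMs(const std::function<void()>& op) {
    const auto start = std::chrono::high_resolution_clock::now();
    op();
    const auto stop = std::chrono::high_resolution_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

A call such as `elapsedMs([&]{ cv::resize(src, dst, size); });` would then time one OpenCV operation; note that for GPU functions, host-to-device and device-to-host transfers should be included or excluded deliberately, since they can dominate the measurement for small images.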
Results consistently show that for small images (under about one megapixel) the CPU and GPU implementations perform comparably, with differences often within a few milliseconds. However, as image size grows, the GPU’s parallel execution yields dramatic speedups. In the resizing experiments, the GPU maintains sub‑30 ms processing times for images exceeding 4 megapixels, whereas the CPU’s time climbs beyond 150 ms. Thresholding with Otsu’s method exhibits a similar trend: the GPU’s time remains roughly constant while the CPU’s time scales linearly with pixel count. Histogram equalization, which requires a full‑image histogram and cumulative distribution computation, benefits especially from parallel reduction on the GPU, achieving up to a six‑fold improvement for images larger than 2 megapixels. Edge detection using the Canny pipeline (Gaussian blur, gradient calculation, non‑maximum suppression, hysteresis) also shows pronounced gains; the GPU processes images larger than 3 megapixels in under 200 ms, a fraction of the CPU’s runtime.
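The histogram-equalization workload mentioned above maps each intensity through the normalized cumulative distribution of the image histogram. A compact CPU reference in plain C++ (a textbook formulation for illustration; OpenCV's `cv::equalizeHist` may differ in rounding and normalization details):

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

// Global histogram equalization for 8-bit grayscale data:
// remap each intensity through the normalized cumulative histogram.
std::vector<std::uint8_t> equalizeHist8u(const std::vector<std::uint8_t>& src) {
    std::array<double, 256> hist{};
    for (std::uint8_t p : src) hist[p] += 1.0;

    // Build a lookup table from the cumulative distribution, scaled to [0, 255].
    std::array<std::uint8_t, 256> lut{};
    double cdf = 0.0;
    const double n = static_cast<double>(src.size());
    for (int i = 0; i < 256; ++i) {
        cdf += hist[i];
        lut[i] = static_cast<std::uint8_t>(255.0 * cdf / n + 0.5);
    }

    // Apply the lookup table to every pixel.
    std::vector<std::uint8_t> dst(src.size());
    for (std::size_t i = 0; i < src.size(); ++i) dst[i] = lut[src[i]];
    return dst;
}
```

The histogram build and the final remapping are both embarrassingly parallel, and the cumulative sum is a classic parallel-scan primitive, which explains the pronounced GPU advantage the authors report for this operation.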
The authors conclude that OpenCV’s GPU functions provide substantial performance benefits for image‑processing tasks, particularly when dealing with high‑resolution data. They acknowledge that their study relies on high‑level OpenCV wrappers, which may not expose the full optimization potential of CUDA. Consequently, they propose future work that replaces OpenCV’s built‑in GPU calls with native CUDA Toolkit kernels, explores multi‑GPU configurations, and employs asynchronous data transfers (CUDA streams) to further reduce latency. Additionally, they suggest extending the evaluation to include end‑to‑end pipeline measurements—covering data acquisition, preprocessing, computation, and output—to better reflect real‑time computer‑vision system requirements. The paper thereby reinforces the growing consensus in the literature that GPU acceleration is a critical enabler for modern, computationally intensive image‑processing applications.