cuAPO: A CUDA-based Parallelization of Artificial Protozoa Optimizer

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the original arXiv source.

Metaheuristic algorithms are widely used to solve complex problems because they can provide near-optimal solutions. However, their execution time grows with the problem size and/or the solution space. Obtaining more promising results also requires running these algorithms for a large number of iterations, which takes a large amount of time; this is one of their main drawbacks. To address it, researchers are increasingly designing and developing parallel versions of state-of-the-art metaheuristic optimization algorithms. In this paper, we present a CUDA-based parallelization of the state-of-the-art Artificial Protozoa Optimizer that leverages GPU acceleration. We implement both the existing sequential version and the proposed parallel version of the Artificial Protozoa Optimizer for a performance comparison. Our experimental results on a set of CEC2022 benchmark functions demonstrate a significant performance gain: the proposed parallel version achieves a speedup of up to 6.7 times. We also compare both algorithms on a real-world application, image thresholding.


💡 Research Summary

This paper, titled “cuAPO: A CUDA-based Parallelization of Artificial Protozoa Optimizer,” addresses the computational bottleneck of metaheuristic algorithms by proposing a GPU-accelerated parallel version of the state-of-the-art Artificial Protozoa Optimizer (APO). The authors observe that although population-based metaheuristics such as APO are powerful for complex, high-dimensional optimization problems, repeatedly evaluating and updating a large population in every iteration leads to prohibitively long execution times, which hinders real-time applications.

The paper begins by establishing the context, classifying metaheuristics and highlighting APO’s reported superiority over 32 other algorithms on challenging benchmarks. APO is a bio-inspired algorithm that mimics the survival strategies of protozoa: dormancy, reproduction, autotrophic foraging, and heterotrophic foraging. These strategies are mathematically modeled to balance exploration (via dormancy and autotrophic foraging) and exploitation (via reproduction and heterotrophic foraging). However, evaluating each protozoan (candidate solution) sequentially in every iteration creates a significant computational load.

The core contribution, cuAPO, is a novel parallelization scheme using NVIDIA’s CUDA framework. The key design principle is to map the update of each individual protozoan in the population to a separate CUDA thread. The host (CPU) initializes the parameters and the population, which is then transferred to device (GPU) memory. A GPU kernel is then launched in which each thread is responsible for the complete update cycle of its assigned protozoan: determining its action from probabilistic formulas (using globally sorted fitness information), performing the corresponding mathematical operation (e.g., calculating a new position with the dormancy or foraging equations), and evaluating the new solution. Because the update of one protozoan is largely independent of the others until the next sorting step, this approach enables massive parallelism. The paper provides a detailed, step-by-step example illustrating how four threads would concurrently update four protozoa in one iteration.
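The dependence structure described above (sort once, then update every individual independently against a read-only snapshot) can be illustrated with a small CPU-side sketch in Python. This is not the authors' CUDA code: the names `update_one`, `run`, and `sphere` are hypothetical, the move rule is a deliberately simple placeholder rather than APO's dormancy, reproduction, or foraging equations, and a thread pool only stands in for a kernel launch (it shows which work is independent, not a real speedup).

```python
import random
from concurrent.futures import ThreadPoolExecutor


def sphere(x):
    # Placeholder objective (sum of squares); not one of the paper's benchmarks.
    return sum(v * v for v in x)


def update_one(i, snapshot, fitness, objective, seed):
    """Work of one hypothetical 'thread': update individual i only.

    Reads the shared snapshot without modifying it and returns the new
    position and fitness. The move is a placeholder, NOT APO's equations.
    """
    rng = random.Random(seed * 1_000_003 + i)  # private RNG stream per worker
    j = rng.randrange(len(snapshot))           # random peer from the snapshot
    candidate = [xi + rng.random() * (xj - xi) + rng.gauss(0.0, 0.1)
                 for xi, xj in zip(snapshot[i], snapshot[j])]
    f_new = objective(candidate)
    # Assumed greedy selection: keep the old position unless the new one is better.
    if f_new < fitness[i]:
        return candidate, f_new
    return snapshot[i], fitness[i]


def run(pop_size=8, dim=4, iterations=20, seed=0, objective=sphere):
    rng = random.Random(seed)
    # "Host" side: initialize the population and its fitness values.
    pop = [[rng.uniform(-5.0, 5.0) for _ in range(dim)] for _ in range(pop_size)]
    fit = [objective(x) for x in pop]
    history = [min(fit)]
    with ThreadPoolExecutor() as pool:
        for it in range(iterations):
            # Synchronization point: sort by fitness before the parallel step.
            order = sorted(range(pop_size), key=fit.__getitem__)
            pop = [pop[k] for k in order]
            fit = [fit[k] for k in order]
            # Stand-in for the kernel launch: one independent task per individual.
            results = list(pool.map(
                lambda i: update_one(i, pop, fit, objective, seed + it + 1),
                range(pop_size)))
            pop = [r[0] for r in results]
            fit = [r[1] for r in results]
            history.append(min(fit))
    return pop, fit, history


if __name__ == "__main__":
    _, final_fit, best_per_iteration = run()
    print(min(final_fit), len(best_per_iteration))
```

Because every task only reads the sorted snapshot and writes its own result, the tasks can run in any order or all at once, which is the property that makes a one-thread-per-individual mapping possible. In a GPU version each worker would also need its own random number stream, which the per-index seeding above imitates.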

For experimental validation, both the sequential APO and the parallel cuAPO implementations were tested on five CEC2022 benchmark functions (Bent Cigar, High Conditioned Elliptic, HGBat, Rosenbrock’s, and Griewank’s) in 1,000 dimensions, with population sizes varying from 1,000 to 10,000. The hardware platform was a Google Colab instance with a Tesla T4 GPU. The results, averaged over multiple runs, demonstrate substantial speedups. cuAPO consistently outperformed the sequential version, achieving speedup factors ranging from approximately 3x to over 6.7x across different functions and population sizes. For instance, on the HGBat function with a population of 2,000, cuAPO was 6.72 times faster. The performance gain generally scaled with population size, confirming the effectiveness of the parallelization for larger problems. Additionally, a real-world image thresholding application was used to further corroborate the practical speedup.
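For readers unfamiliar with the five objective functions named above, the following Python sketch gives their standard textbook definitions. It assumes the basic unshifted, unrotated forms; benchmark suites typically apply shifts, rotations, and bias terms on top of these, so the values will not match an official CEC implementation. The function names are chosen here for illustration.

```python
import math


def bent_cigar(x):
    # x_1^2 + 10^6 * sum_{i>=2} x_i^2
    return x[0] ** 2 + 1e6 * sum(v * v for v in x[1:])


def high_conditioned_elliptic(x):
    # sum_i (10^6)^((i-1)/(D-1)) * x_i^2, with i = 1..D
    d = len(x)
    if d == 1:
        return x[0] ** 2
    return sum((1e6 ** (i / (d - 1))) * v * v for i, v in enumerate(x))


def rosenbrock(x):
    # sum_{i<D} 100*(x_i^2 - x_{i+1})^2 + (x_i - 1)^2; minimum 0 at x = (1, ..., 1)
    return sum(100.0 * (x[i] ** 2 - x[i + 1]) ** 2 + (x[i] - 1.0) ** 2
               for i in range(len(x) - 1))


def griewank(x):
    # sum x_i^2 / 4000 - prod cos(x_i / sqrt(i)) + 1; minimum 0 at the origin
    s = sum(v * v for v in x) / 4000.0
    p = math.prod(math.cos(v / math.sqrt(i)) for i, v in enumerate(x, start=1))
    return s - p + 1.0


def hgbat(x):
    # |(sum x_i^2)^2 - (sum x_i)^2|^(1/2) + (0.5*sum x_i^2 + sum x_i)/D + 0.5
    d = len(x)
    s2 = sum(v * v for v in x)
    s1 = sum(x)
    return math.sqrt(abs(s2 * s2 - s1 * s1)) + (0.5 * s2 + s1) / d + 0.5


if __name__ == "__main__":
    point = [0.5] * 10
    for f in (bent_cigar, high_conditioned_elliptic, hgbat, rosenbrock, griewank):
        print(f.__name__, f(point))
```

Each function costs O(D) work per evaluation, so with 1,000 dimensions and thousands of individuals the fitness evaluations alone involve millions of arithmetic operations per iteration, which is the kind of workload that benefits from evaluating individuals concurrently.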

In conclusion, the paper successfully demonstrates that the proposed cuAPO framework significantly reduces the execution time of the APO algorithm by leveraging GPU parallelism, making it a viable candidate for high-performance and real-time optimization tasks. The work underscores the potential of GPU acceleration to overcome the computational limitations of advanced metaheuristic algorithms.
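To put the reported speedup factors in terms of running time, the short calculation below assumes the usual definition of speedup as the ratio of sequential to parallel execution time; the symbols $S$, $T_{\text{seq}}$, and $T_{\text{par}}$ are introduced here for illustration and are not taken from the paper.

$$
S = \frac{T_{\text{seq}}}{T_{\text{par}}}
\quad\Longrightarrow\quad
\frac{T_{\text{par}}}{T_{\text{seq}}} = \frac{1}{S},
\qquad
\frac{1}{6.72} \approx 0.149,
\qquad
\frac{1}{3} \approx 0.333
$$

Under that definition, the best reported case (6.72x) corresponds to the parallel version taking about 15% of the sequential running time, and a 3x speedup corresponds to about one third of it.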

