DiVinE-CUDA - A Tool for GPU Accelerated LTL Model Checking


In this paper we present a tool that performs CUDA-accelerated LTL model checking. The tool exploits the parallel MAP algorithm, adjusted to the NVIDIA CUDA architecture, to efficiently detect the presence of accepting cycles in a directed graph. Accepting-cycle detection is the core algorithmic procedure in automata-based LTL model checking. We demonstrate that the tool outperforms the non-accelerated version of the algorithm, and we discuss the tool's current limits and how we intend to overcome them in future work.


💡 Research Summary

The paper introduces DiVinE‑CUDA, a verification tool that accelerates Linear Temporal Logic (LTL) model checking by exploiting the NVIDIA CUDA architecture. The authors focus on the most computationally intensive part of automata‑based LTL verification – detecting accepting cycles in the product graph – and adapt the MAP (Maximal Accepting Predecessor) algorithm to run as a series of sparse matrix‑vector multiplications on a GPU.
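To make the data layout concrete, the sketch below shows a compressed sparse row (CSR) encoding of a small directed graph, the same representation the summary says the tool builds for the GPU. The graph, variable names, and helper function are illustrative assumptions, not taken from the tool itself.

```python
# Minimal CSR (compressed sparse row) encoding of a directed graph.
# Vertex i's successors are stored in col_idx[row_ptr[i]:row_ptr[i+1]].
# Hypothetical example graph: 0 -> 1, 0 -> 2, 1 -> 2, 2 -> 0
row_ptr = [0, 2, 3, 4]   # length |V| + 1; cumulative edge offsets
col_idx = [1, 2, 2, 0]   # length |E|; concatenated successor lists

def successors(v):
    """Return the list of successors of vertex v."""
    return col_idx[row_ptr[v]:row_ptr[v + 1]]
```

Because all edges leaving a vertex sit in one contiguous slice, each GPU thread can scan its vertex's successors with coalesced reads, which is why CSR is a natural fit for the matrix-vector-style propagation described above.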

The workflow starts from a model described in the DVE language. After preprocessing and compilation, the model is turned into a dynamically linked library that provides state‑generation functions. While the tool explores the state space, a separate CPU thread incrementally builds a compressed sparse row (CSR) representation of the adjacency matrix, but only for those components that contain accepting vertices, thereby reducing the matrix size by roughly 20‑30 % compared to the full state space. As soon as a portion of the matrix is ready, a CUDA kernel launches to perform the MAP iteration: each thread propagates the current “maximal accepting predecessor” value along outgoing edges, updates the target vertex with the maximum of incoming values, and repeats until a fixed point is reached. If a vertex becomes its own maximal predecessor, an accepting cycle is found and the algorithm terminates; otherwise, vertices that cannot belong to a cycle are discarded and the process restarts.
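The propagation step above can be sketched as a sequential fixed-point loop over the CSR graph. This is a hypothetical CPU illustration of a single MAP pass, not the tool's CUDA kernel; the encoding (value `v + 1` for accepting vertex `v`, `0` for "no accepting predecessor") is an assumption chosen so that a plain integer maximum works. The full algorithm, as described above, would discard vertices and re-run when no cycle is found.

```python
def map_iterate(row_ptr, col_idx, accepting):
    """One MAP pass over a CSR graph (illustrative sketch).

    map_val[v] holds the maximal accepting predecessor of v, encoded
    as vertex number + 1, with 0 meaning "none". Returns True iff some
    accepting vertex becomes its own maximal accepting predecessor,
    i.e. an accepting cycle has been detected in this pass.
    """
    n = len(row_ptr) - 1
    map_val = [0] * n
    changed = True
    while changed:  # iterate to a fixed point
        changed = False
        for u in range(n):
            # u propagates itself if accepting, otherwise its current value.
            prop = max(map_val[u], u + 1 if u in accepting else 0)
            for v in col_idx[row_ptr[u]:row_ptr[u + 1]]:
                if prop > map_val[v]:
                    map_val[v] = prop
                    changed = True
    return any(map_val[a] == a + 1 for a in accepting)
```

For example, on the cycle 0 → 1 → 2 → 0 with vertex 2 accepting, vertex 2's value propagates around the cycle back to itself and the pass reports a cycle; on the acyclic chain 0 → 1 → 2 it does not. On the GPU, the inner loops become one thread per vertex (or per edge), with the iteration repeated until no value changes.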

Experimental evaluation was carried out on a workstation equipped with two AMD Phenom II X4 940 CPUs and an NVIDIA GeForce GTX 280 GPU (1 GB). Six benchmark models were used, including leader election, elevator control, Peterson’s and Anderson’s mutual‑exclusion algorithms, and the dining philosophers problem, each tested both with and without specification violations. The results show that the CUDA‑accelerated MAP algorithm achieves an average speed‑up of 5.2× over the non‑accelerated MAP implementation and 6.5× over the OWCTY algorithm when the whole verification pipeline (CSR construction + CUDA computation) is considered. However, the data‑preparation phase (CSR construction) consumes a substantial portion of the total runtime—up to 40 % in some cases—highlighting a bottleneck that limits the overall gain. The first MAP iteration on the CPU is slower than CSR construction because it simultaneously generates the state space and computes initial MAP values.

The authors acknowledge several current limitations. The tool can only handle graphs whose reduced CSR matrix fits into the memory of a single GPU; larger models exceed this bound. Counterexample generation is not supported, so users receive only a Boolean answer about the presence of an accepting cycle. Additionally, CSR construction is performed by a single thread, preventing exploitation of multi‑core CPUs for further speed‑up.

Future work aims to address these issues. The authors plan to implement a swapping mechanism that moves parts of the matrix between host and device memory, and to add support for multiple GPUs, thereby lifting the size restriction. Parallelizing the CSR construction across several CPU cores is expected to dramatically reduce the preparation overhead, as illustrated by a hypothetical linear‑speed‑up scenario in Table 3. Finally, a counterexample extraction module will be added, and hybrid scheduling strategies that run MAP and OWCTY concurrently will be explored to automatically select the most efficient algorithm for a given model.

In summary, DiVinE‑CUDA demonstrates that GPU acceleration can substantially speed up the core cycle‑detection step of LTL model checking. While the current prototype already outperforms traditional CPU‑only approaches, the identified bottlenecks and missing features point to clear avenues for further research and engineering, promising a robust, scalable solution for large‑scale formal verification tasks.

