A Novel Learning Algorithm for Bayesian Network and Its Efficient Implementation on GPU
Computational inference of causal relationships underlying complex networks, such as gene-regulatory pathways, is NP-complete because of the combinatorial number of possible interactions. Markov chain Monte Carlo (MCMC) sampling, which explores only part of this combinatorial space while still guaranteeing convergence and traversability, has therefore become widely used. However, MCMC cannot run efficiently on networks with more than 15 to 20 nodes because of its computational complexity. In this paper, we use a general-purpose processor (GPP) and a general-purpose graphics processing unit (GPGPU) to implement and accelerate a novel Bayesian network learning algorithm. With a hash-table-based memory-saving strategy and a novel task-assignment strategy, we achieve a 10-fold acceleration per iteration over a serial GPP implementation. Specifically, we use a greedy method to search for the best graph consistent with a given node ordering, and we incorporate a prior component into the scoring function, which further guides the search. Overall, we are able to apply this system to networks with more than 60 nodes, allowing inference and modeling of larger and more complex networks than current methods support.
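The hash-table-based memory-saving strategy mentioned above can be illustrated with a minimal sketch: local scores for (node, parent-set) pairs are memoized so that repeated candidate evaluations are looked up rather than recomputed. This is a Python illustration, not the authors' CUDA code; the names `ScoreCache` and the toy scoring function are hypothetical.

```python
class ScoreCache:
    """Memoize local scores keyed by (node, frozenset of parents).

    Illustrative stand-in for the paper's hash table of previously
    evaluated parent subsets.
    """

    def __init__(self, score_fn):
        self._score_fn = score_fn
        self._table = {}          # the hash table: (node, parents) -> score
        self.hits = 0
        self.misses = 0

    def score(self, node, parents):
        # frozenset makes the parent set hashable and order-independent
        key = (node, frozenset(parents))
        if key in self._table:
            self.hits += 1
        else:
            self.misses += 1
            self._table[key] = self._score_fn(node, parents)
        return self._table[key]


# Toy stand-in scoring function: simply penalize large parent sets.
cache = ScoreCache(lambda node, parents: -float(len(parents)))
cache.score("A", {"B", "C"})
cache.score("A", {"C", "B"})   # same parent set, so this is a cache hit
```

Because the key is a `frozenset`, the two calls above hash to the same entry even though the parents are listed in different orders, which is exactly the redundancy the cache is meant to eliminate.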
💡 Research Summary
Bayesian networks are powerful probabilistic models for representing causal relationships among variables, but learning their structure from data is combinatorially explosive and formally NP‑complete. Traditional exhaustive search quickly becomes infeasible beyond ten or fifteen nodes, while Markov chain Monte Carlo (MCMC) sampling, although theoretically sound, suffers from poor convergence and prohibitive runtimes when the network exceeds twenty nodes. In this paper the authors propose a two‑pronged solution that dramatically expands the practical size of networks that can be learned.

First, they fix a topological ordering of the variables and, given that order, search for the highest‑scoring directed acyclic graph (DAG) using a greedy parent‑set selection algorithm. The scoring function combines a data‑driven log‑likelihood term with a prior term that encodes domain knowledge (for example, known transcription‑factor–target relationships). By incorporating priors the algorithm narrows the effective search space and reduces false‑positive edges, especially in noisy biological data.

Second, they map the computationally intensive scoring and hash‑table updates onto a general‑purpose graphics processing unit (GPGPU) using CUDA. A hash table stores the scores of previously evaluated parent subsets, preventing redundant calculation and saving memory. The authors design a task‑assignment scheme in which each thread block processes all candidate parent sets for a single node, while warps within the block divide the work to achieve load balance and minimize memory contention. Shared memory provides fast access to frequently needed data, and asynchronous streams overlap CPU‑GPU data transfers with kernel execution, yielding a ten‑fold speed‑up per iteration compared with a serial general‑purpose processor implementation.
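The order-based greedy search can be sketched as follows. This is a simplified Python illustration under stated assumptions, not the authors' implementation: `toy_score` is a hypothetical stand-in for their log-likelihood-plus-prior score, and the prior is modeled as a flat bonus for edges on a known-edge list.

```python
def greedy_parents(node, predecessors, local_score, max_parents=3):
    """Greedily grow the parent set of `node` from its predecessors in the
    fixed ordering, keeping each addition only while the score improves."""
    parents = set()
    best = local_score(node, parents)
    improved = True
    while improved and len(parents) < max_parents:
        improved = False
        for cand in predecessors - parents:
            s = local_score(node, parents | {cand})
            if s > best:
                best, best_cand, improved = s, cand, True
        if improved:
            parents.add(best_cand)
    return parents


def learn_dag(order, local_score, max_parents=3):
    """Given a topological order, choose parents for each node independently.
    Acyclicity is automatic: every parent precedes its child in the order."""
    return {
        node: greedy_parents(node, set(order[:i]), local_score, max_parents)
        for i, node in enumerate(order)
    }


# Hypothetical score: reward the "known" edge B -> C (the prior term)
# and penalize parent-set size (a crude stand-in for the data term).
PRIOR_EDGES = {("B", "C")}

def toy_score(node, parents):
    prior = sum(1.0 for p in parents if (p, node) in PRIOR_EDGES)
    return prior - 0.4 * len(parents)

dag = learn_dag(["A", "B", "C"], toy_score)
```

Because each node's parent choice is independent once the order is fixed, the per-node searches are exactly the units of work that the paper assigns to separate thread blocks on the GPU.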
Experimental evaluation on synthetic networks of 30, 45, and 60 nodes, as well as on a real human gene‑expression dataset with 55 genes, demonstrates that the GPU‑accelerated method attains higher structural accuracy (lower structural Hamming distance) and substantially lower runtimes than state‑of‑the‑art tools such as the MCMC‑based Banjo and the score‑based GES. For the 60‑node case, the proposed system converges in roughly fifteen minutes, whereas competing approaches require several hours to days. The authors also release their CUDA source code and parameter settings, ensuring reproducibility.

Although the current implementation targets a single GPU, the paper discusses straightforward extensions to multi‑GPU or cloud‑based distributed environments, opening the door to learning Bayesian networks with hundreds of nodes. In summary, by coupling a greedy order‑based search with a prior‑enhanced scoring metric and exploiting massive parallelism on GPUs, the authors deliver a scalable, efficient, and biologically informed solution for Bayesian network structure learning that pushes the frontier of feasible network sizes well beyond the limits of existing methods.
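The structural accuracy metric referenced above, structural Hamming distance (SHD), counts the edge insertions, deletions, and reversals needed to turn the learned DAG into the true one. A minimal sketch (not tied to the paper's evaluation code), with graphs given as sets of `(parent, child)` tuples:

```python
def shd(true_edges, learned_edges):
    """Structural Hamming distance between two DAGs given as edge sets.
    A reversed edge counts once, not as a deletion plus an insertion."""
    t, l = set(true_edges), set(learned_edges)
    # Edges present in both graphs but with flipped direction.
    reversed_pairs = {e for e in t if (e[1], e[0]) in l and e not in l}
    # True edges absent from the learned graph (not merely reversed).
    missing = {e for e in t - l if e not in reversed_pairs}
    # Learned edges absent from the true graph (not merely reversed).
    extra = {e for e in l - t if (e[1], e[0]) not in reversed_pairs}
    return len(reversed_pairs) + len(missing) + len(extra)
```

For example, learning `B -> A, B -> C` against a truth of `A -> B, B -> C` gives SHD 1: one reversal, no missing or spurious edges.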