Best-First Heuristic Search for Multicore Machines

To harness modern multicore processors, it is imperative to develop parallel versions of fundamental algorithms. In this paper, we compare different approaches to parallel best-first search in a shared-memory setting. We present a new method, PBNF, that uses abstraction to partition the state space and to detect duplicate states without requiring frequent locking. PBNF allows speculative expansions when necessary to keep threads busy. We identify and fix potential livelock conditions in our approach, proving its correctness using temporal logic. Our approach is general, allowing it to extend easily to suboptimal and anytime heuristic search. In an empirical comparison on STRIPS planning, grid pathfinding, and sliding tile puzzle problems using 8-core machines, we show that A*, weighted A* and Anytime weighted A* implemented using PBNF yield faster search than improved versions of previous parallel search proposals.


💡 Research Summary

The paper addresses the need for parallel best-first search algorithms that can fully exploit modern multicore processors. It introduces PBNF (Parallel Best-NBlock-First), a framework designed for shared-memory systems. The core idea is to partition the state space using an abstraction function that maps concrete states to abstract regions called nblocks. Each nblock carries its own priority queue and duplicate-detection table, so duplicate detection is performed locally within a partition rather than through a global hash table, and a thread that has acquired an nblock can expand from it without synchronizing with other threads. This dramatically reduces the contention and locking overhead of traditional parallel A* implementations.
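As a rough sketch of this idea, the following Python fragment shows per-partition duplicate detection under an abstraction function. All names here are invented for illustration (a simple grid abstraction that coarsens coordinates), not the paper's actual implementation:

```python
import heapq


class NBlock:
    """One partition of the state space: a local open list and a
    local duplicate-detection table (hypothetical minimal structure)."""
    def __init__(self, block_id):
        self.id = block_id
        self.open = []      # heap of (f, g, state)
        self.closed = {}    # state -> best g-value seen so far


def abstraction(state):
    # Illustrative abstraction: project a grid cell onto an 8x8 tile,
    # so nearby states map to the same nblock.
    x, y = state
    return (x // 8, y // 8)


def insert(nblocks, state, g, h):
    """Insert a generated state into its nblock. Duplicate detection
    consults only the local closed table, not a global hash table."""
    bid = abstraction(state)
    block = nblocks.setdefault(bid, NBlock(bid))
    if state in block.closed and block.closed[state] <= g:
        return False                     # duplicate, no improvement
    block.closed[state] = g              # new state or better path
    heapq.heappush(block.open, (g + h, g, state))
    return True
```

Because the abstraction is many-to-one, all duplicates of a state land in the same nblock, which is what makes purely local duplicate detection sound.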

To keep all cores busy, PBNF incorporates speculative expansion. When no free partition contains nodes with an f-value at or below the current best, a thread may still acquire the best available partition and expand its nodes, even though those expansions are not yet known to lie on an optimal path. In the spirit of work stealing, this prevents idle time and balances load even when the search frontier is highly irregular.
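A minimal sketch of this acquisition step, again with invented names and a deliberately simplified free list (the paper's actual protocol also tracks which nblocks' duplicate-detection scopes overlap):

```python
import threading
from dataclasses import dataclass, field


@dataclass
class NBlock:
    """Minimal partition: `open` is a heap of (f, state) tuples."""
    open: list = field(default_factory=list)


def next_nblock(nblocks, free_set, lock, bound):
    """A worker picks the best free nblock. Only acquisition is
    locked; expansion of the acquired nblock proceeds lock-free.
    If the nblock's best f-value is at or above the incumbent
    bound, expanding it is speculative work done to stay busy."""
    with lock:
        best = None
        for bid in free_set:
            block = nblocks[bid]
            if block.open and (best is None or
                               block.open[0][0] < nblocks[best].open[0][0]):
                best = bid
        if best is None:
            return None, False          # nothing to do right now
        free_set.discard(best)          # claim the partition
        speculative = nblocks[best].open[0][0] >= bound
        return nblocks[best], speculative
```

The returned flag lets the caller treat speculative expansions differently, e.g. deferring them when truly useful work appears.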

The authors also identify a potential livelock scenario, in which threads repeatedly release and re-acquire partitions on each other's behalf without any thread making progress. They formalize the required liveness property in linear temporal logic (LTL) and prove that PBNF's design, in particular its controlled release of contended partitions and the absence of cyclic waiting among them, guarantees that the search always makes forward progress.
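The liveness requirement can be phrased as a standard LTL response property. In a simplified, illustrative form (the symbols below are ours, not the paper's exact notation):

$$\Box\,\big(\mathit{waiting}_i \;\rightarrow\; \Diamond\,\mathit{expanding}_i\big)$$

That is, it is always the case that if thread $i$ is waiting for a partition, it eventually gets to expand nodes again; livelock is precisely a violation of this property, so proving it rules livelock out.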

Experimental evaluation covers three representative domains: STRIPS planning problems, grid‑based pathfinding, and sliding‑tile puzzles (15‑puzzle and 24‑puzzle). For each domain the authors implement three search variants—standard A*, weighted A* (WA*), and anytime weighted A* (AWA*)—both with PBNF and with several state‑of‑the‑art parallel search baselines (including Parallel A* and improved PRA*). All experiments run on an 8‑core Intel Xeon machine under identical memory limits and heuristic functions.

Results show that PBNF consistently outperforms the baselines. Across all benchmarks, PBNF‑based A* achieves an average speed‑up of 2.1× over the best prior method, with peak improvements exceeding 3× on the most challenging instances. Weighted and anytime variants exhibit the same trend, confirming that the framework is agnostic to the optimality trade‑off. Memory consumption is comparable or slightly lower (a 5–10% reduction) because each partition maintains a much smaller duplicate‑detection structure. The benefits are most pronounced on problems with accurate heuristics, where duplicate generation is rare and the local lock strategy eliminates almost all contention.

The paper concludes by discussing extensibility. Because the abstraction function is domain‑specific, PBNF can be adapted to new problem classes by designing an appropriate partitioning scheme. Future work could explore dynamic abstraction refinement (adjusting partitions during search), hybrid shared‑distributed memory deployments, and integration with GPU‑accelerated node expansion.

In summary, PBNF offers a theoretically sound and empirically validated solution to the long‑standing challenges of parallel best‑first search on multicore machines. By combining abstraction‑based partitioning, low‑overhead local duplicate detection, and speculative expansion, it delivers substantial speed‑ups without sacrificing solution quality, making it a compelling foundation for next‑generation AI search systems.