Distributed Work Stealing for Constraint Solving

With affordable parallel and distributed hardware now widely available, parallel and distributed constraint solving has attracted growing attention. To exploit the power of distributed computational systems, the work involved in searching for a solution to a Constraint Satisfaction Problem (CSP) must be shared effectively among all participating agents, and this sharing must happen dynamically, since it is hard to predict the effort required to explore a given part of the search space. We describe an implementation of a work-stealing approach to distributed CSP solving and provide an initial experimental assessment of it.

💡 Research Summary

The paper addresses the challenge of efficiently solving Constraint Satisfaction Problems (CSPs) on modern, inexpensive parallel and distributed hardware. Traditional parallel CSP solvers often rely on static work partitioning, which assumes that the computational effort required for each part of the search space can be predicted in advance. In practice, the size and difficulty of sub‑trees in a back‑tracking search are highly irregular, leading to severe load imbalance: some processors become idle while others are overloaded.

To overcome this limitation, the authors propose a work‑stealing based framework that dynamically redistributes work among participating agents during the search. Each worker maintains a local deque of sub‑trees (tasks). When a worker’s deque becomes empty, it issues a “steal request” to a randomly chosen peer. The peer responds by removing a sufficiently large sub‑tree from the head of its deque and sending it to the requester. The transferred sub‑tree becomes the new root of the requester’s search, allowing it to continue exploration without waiting for the original owner to finish.
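The steal protocol described above can be illustrated with a minimal single-process sketch. It is not the authors' implementation: the `Worker` class, the `steal` function, and the string task labels are hypothetical names chosen for illustration, and messages between agents are replaced by direct method calls. The sketch also assumes the usual work-stealing convention that the owner works at the tail of its deque, which the summary does not state explicitly.

```python
import random
from collections import deque


class Worker:
    """One search agent holding a local deque of pending sub-trees (tasks)."""

    def __init__(self, name, tasks=()):
        self.name = name
        self.tasks = deque(tasks)

    def next_local_task(self):
        # Assumption: the owner takes work from the tail of its own deque.
        return self.tasks.pop() if self.tasks else None

    def handle_steal_request(self):
        # The victim gives away the task at the head of its deque, if any.
        return self.tasks.popleft() if self.tasks else None


def steal(thief, workers, rng):
    """Ask one randomly chosen peer for work; return the stolen task or None."""
    peers = [w for w in workers if w is not thief]
    victim = rng.choice(peers)
    task = victim.handle_steal_request()
    if task is not None:
        # The stolen sub-tree becomes the thief's next unit of work.
        thief.tasks.append(task)
    return task


# Two workers: w0 holds four pending sub-trees, w1 is idle and steals one.
workers = [Worker("w0", ["t0", "t1", "t2", "t3"]), Worker("w1")]
stolen = steal(workers[1], workers, random.Random(0))
```

In a depth-first search the head of the deque typically holds the oldest, shallowest sub-tree, which is one plausible reason for stealing from that end: it tends to carry the most remaining work per message.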

Key technical contributions include:

Task Granularity Definition – A task is defined as a sub‑tree together with the current domain reductions. The authors introduce a heuristic that selects the depth of the sub‑tree to be stolen based on the remaining domain size and estimated branching factor, ensuring that stolen tasks are neither too small (causing excessive communication) nor too large (leading to long idle periods for the victim).
Unique Task Identifiers – To avoid duplicate exploration, each sub‑tree is assigned a globally unique identifier when it is first generated. When a steal occurs, the identifier is transferred along with the task, and both the donor and the receiver update their local bookkeeping to mark the task as “in‑flight”.
Asynchronous Pull‑Based Communication – Rather than pushing work proactively, workers pull tasks only when needed. This reduces unnecessary messages and adapts naturally to heterogeneous network latencies. The protocol also incorporates a back‑off mechanism: after a failed steal attempt, a worker waits for an exponentially increasing interval before trying again, limiting contention on the network.
Steal‑Threshold Policy – The system monitors the number of pending steal requests and dynamically adjusts a threshold that determines when a worker should consider offering work voluntarily, further smoothing load distribution.
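The back-off mechanism mentioned under pull-based communication can be sketched as follows. The summary gives no concrete parameters, so the base delay, growth factor, cap, and attempt limit below are assumptions, as are the names `steal_with_backoff`, `try_steal`, and `sleep`. The waiting function is passed in so that the example runs without real delays.

```python
def steal_with_backoff(try_steal, sleep, base=0.01, factor=2.0, cap=1.0,
                       max_attempts=6):
    """Call try_steal() until it returns a task, waiting longer after each failure.

    try_steal: callable returning a task, or None when the steal attempt failed.
    sleep:     callable taking a delay in seconds (time.sleep in a real system).
    All numeric defaults are illustrative assumptions, not values from the paper.
    """
    delay = base
    for _ in range(max_attempts):
        task = try_steal()
        if task is not None:
            return task
        sleep(delay)                      # wait before the next attempt
        delay = min(cap, delay * factor)  # exponential growth, bounded by cap
    return None                           # give up after max_attempts failures


# Simulated run: two failed attempts, then a successful steal.
responses = iter([None, None, "subtree-42"])
waits = []  # records the delays instead of actually sleeping
task = steal_with_backoff(lambda: next(responses), waits.append)
```

Capping the delay is a design choice of this sketch: without a cap, a worker that has been idle for a long time could react very slowly once work becomes available again.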

The experimental evaluation uses a suite of well‑known CSP benchmarks, including N‑Queens (up to 16 queens), Sudoku, Graph Coloring, and a real‑world scheduling instance. Experiments were conducted on a homogeneous cluster with 8, 16, 32, and 64 workers. Results show:

Speed‑up – The work‑stealing solver achieves an average speed‑up of 2.3× over a static partitioning baseline, with peak improvements of up to 3.8× on highly irregular search spaces.
Load Balance – The standard deviation of per‑worker workload is reduced by more than 70 % compared with static partitioning, indicating a much more even distribution of effort.
Communication Overhead – Steal‑related messages account for less than 5 % of total execution time, even at 64 workers, confirming that the pull‑based protocol scales well.
Scalability – Performance scales nearly linearly up to 32 workers; beyond that, the marginal gains taper slightly due to increased network contention, but the system still outperforms the static approach.
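For readers who want to reproduce this kind of comparison, the two headline metrics can be computed as in the sketch below. The function names and all input numbers are made up for illustration and are not measurements from the paper. The summary does not say whether a sample or a population standard deviation was used; the population form is assumed here.

```python
from statistics import pstdev


def speedup(t_baseline, t_new):
    """Ratio of baseline run time to new run time (greater than 1 means faster)."""
    return t_baseline / t_new


def imbalance_reduction(static_loads, stealing_loads):
    """Fractional drop in the standard deviation of per-worker workload."""
    return 1.0 - pstdev(stealing_loads) / pstdev(static_loads)


# Hypothetical per-worker workloads (for example, search nodes visited).
static_loads = [100, 20, 60, 20]    # static partitioning: uneven
stealing_loads = [55, 45, 50, 50]   # work stealing: nearly even

s = speedup(120.0, 40.0)            # hypothetical run times in seconds
r = imbalance_reduction(static_loads, stealing_loads)
```

Both workload lists sum to the same total, so the comparison isolates how evenly the work is spread rather than how much work is done.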

The authors acknowledge several limitations. The current implementation assumes homogeneous processing speeds and network characteristics; extending the approach to heterogeneous clusters or cloud environments may require adaptive load‑estimation models. Moreover, the stealing policy always selects the largest available sub‑tree, which may not be optimal for all problem domains; a more sophisticated policy that considers estimated remaining work or priority could yield further gains.

Future research directions suggested include:

Integrating cost models and priority queues to guide steal decisions.
Extending the framework to hybrid CPU‑GPU architectures, where tasks could be off‑loaded to accelerators.
Applying the work‑stealing mechanism to real‑time CSPs, such as dynamic scheduling or planning, where deadlines impose additional constraints on load redistribution.
Investigating fault‑tolerance mechanisms so that lost tasks can be recovered without compromising correctness.

In conclusion, the paper demonstrates that a work‑stealing strategy, when carefully adapted to the characteristics of CSP search, provides dynamic load balancing, low communication overhead, and good scalability on distributed systems. This makes it a promising foundation for building high‑performance, scalable CSP solvers capable of exploiting the growing availability of inexpensive parallel hardware.

📜 Original Paper Content