Learning to Split: A Reinforcement-Learning-Guided Splitting Heuristic for Neural Network Verification


State-of-the-art neural network verifiers operate by encoding neural network verification as constraint satisfaction problems. When dealing with standard piecewise-linear activation functions, such as ReLUs, verifiers typically employ branching heuristics that break a complex constraint satisfaction problem into multiple, simpler problems. The verifier’s performance depends heavily on the order in which this branching is performed: a poor selection may give rise to exponentially many sub-problems, hampering scalability. Here, we focus on the setting where multiple verification queries must be solved for the same neural network. The core idea is to use past experience to make good branching decisions, expediting verification. We present a reinforcement-learning-based branching heuristic that achieves this by applying the learning-from-demonstrations (DQfD) technique. Our experimental evaluation demonstrates a substantial reduction in average verification time and in the average number of iterations required, compared to modern splitting heuristics. These results highlight the great potential of reinforcement learning in the context of neural network verification.


💡 Research Summary

This paper introduces a novel, reinforcement learning (RL)-guided approach to optimize a critical component in formal neural network verification: the splitting heuristic. The verification of neural networks with piecewise-linear activation functions like ReLU is often framed as a constraint satisfaction problem and solved using Branch-and-Bound (BaB) algorithms. A core step in BaB is “splitting” on ambiguous ReLU neurons, deciding whether they are active or inactive. The order of these splits, governed by a heuristic, drastically impacts performance, as a poor choice can lead to an exponential explosion of sub-problems.
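To make the splitting step concrete, here is a minimal sketch of ReLU case splitting in branch-and-bound. A sub-problem is modeled (as an illustrative assumption, not the paper's actual data structures) by per-neuron pre-activation bounds; a ReLU is ambiguous when its bounds straddle zero, and splitting on it fixes the neuron's phase in each child:

```python
# Toy model of BaB splitting on ReLUs. The names (SubProblem, split_relu)
# are illustrative and not taken from the paper's implementation.
from dataclasses import dataclass, field


@dataclass
class SubProblem:
    lower: dict                                # neuron -> pre-activation lower bound
    upper: dict                                # neuron -> pre-activation upper bound
    fixed: dict = field(default_factory=dict)  # neuron -> "active" | "inactive"


def ambiguous_relus(p):
    """ReLUs whose phase is undetermined: bounds straddle zero."""
    return [n for n in p.lower
            if n not in p.fixed and p.lower[n] < 0 < p.upper[n]]


def split_relu(p, neuron):
    """Branch on one ambiguous ReLU: active (input >= 0) vs. inactive (input <= 0)."""
    active = SubProblem(dict(p.lower), dict(p.upper), dict(p.fixed))
    active.lower[neuron] = 0.0          # active phase: pre-activation >= 0
    active.fixed[neuron] = "active"

    inactive = SubProblem(dict(p.lower), dict(p.upper), dict(p.fixed))
    inactive.upper[neuron] = 0.0        # inactive phase: pre-activation <= 0
    inactive.fixed[neuron] = "inactive"
    return active, inactive


root = SubProblem(lower={"n1": -1.0, "n2": -0.5}, upper={"n1": 2.0, "n2": 3.0})
a, b = split_relu(root, "n1")
print(ambiguous_relus(a))   # only "n2" remains ambiguous in each branch
```

Each split removes one ambiguity but doubles the number of sub-problems, which is why the order of splits can make an exponential difference.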

Current state-of-the-art verifiers rely on static, hand-crafted heuristics (e.g., Pseudo-Impact, Polarity, BaBSR) chosen a priori. This approach has key limitations: it’s difficult to know the best heuristic beforehand, the optimal heuristic may change during the search, and it fails to leverage knowledge across multiple verification queries on the same network.
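As an example of what such a static rule looks like, here is a sketch in the style of a polarity heuristic: score each ambiguous ReLU by how symmetric its pre-activation bounds [l, u] are around zero, and split on the most balanced one. The exact scoring used in practice may differ; this formula is an illustrative assumption:

```python
# Sketch of a polarity-style static splitting rule (illustrative, not
# necessarily the exact formula used by any particular verifier).
def polarity(l, u):
    """(l + u) / (u - l): equals 0 when bounds are symmetric around zero."""
    return (l + u) / (u - l)


def pick_split(bounds):
    """bounds: neuron -> (l, u) with l < 0 < u. Pick the most balanced neuron."""
    return min(bounds, key=lambda n: abs(polarity(*bounds[n])))


print(pick_split({"n1": (-1.0, 3.0), "n2": (-2.0, 2.5)}))  # "n2" is more balanced
```

The rule is applied identically at every node of the search tree, which is exactly the rigidity the paper's learned policy is meant to overcome.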

To address these issues, the authors propose learning an adaptive splitting policy using Deep Reinforcement Learning. The core idea is to use past verification experience to inform future splitting decisions, expediting the process. They formulate the verification process as a Markov Decision Process (MDP). The state is the current internal state of the verifier (e.g., computed bounds on neuron pre-activations, current constraint violations). The action is the choice of which unfixed ReLU neuron to split on next. The reward is shaped to encourage faster verification completion (e.g., a negative reward per step and a large positive reward upon solving the query).
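The MDP framing above can be sketched as a tiny environment. The state encoding, reward constants, and termination rule here are illustrative assumptions in the spirit of the paper's shaping (a per-step penalty plus a terminal bonus for solving the query), not its actual implementation:

```python
# Toy MDP for the splitting problem: the agent picks which unfixed ReLU to
# split next. STEP_PENALTY / SOLVED_BONUS are assumed shaping constants.
STEP_PENALTY = -1.0
SOLVED_BONUS = 100.0


class SplittingEnv:
    def __init__(self, num_relus, budget):
        self.num_relus = num_relus
        self.budget = budget            # splits allowed before giving up

    def reset(self):
        self.unfixed = set(range(self.num_relus))
        self.steps = 0
        return self._state()

    def _state(self):
        # toy state: one indicator per ReLU (1.0 = still splittable); a real
        # verifier state would include bounds, violations, etc.
        return [1.0 if i in self.unfixed else 0.0 for i in range(self.num_relus)]

    def step(self, action):
        assert action in self.unfixed, "can only split an unfixed ReLU"
        self.unfixed.discard(action)
        self.steps += 1
        done = not self.unfixed or self.steps >= self.budget
        solved = done and not self.unfixed
        reward = STEP_PENALTY + (SOLVED_BONUS if solved else 0.0)
        return self._state(), reward, done


env = SplittingEnv(num_relus=3, budget=10)
state = env.reset()
state, reward, done = env.step(0)
```

Because every step costs a fixed penalty, a policy that maximizes return is one that resolves the query in as few splits as possible.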

The methodology employs a Double Deep Q-Network (Double DQN) trained using the “Learning from Demonstrations” (DQfD) technique. DQfD allows the agent to learn effectively from a limited set of high-quality expert trajectories (generated using existing verifiers) while also exploring via its own interactions. This combination accelerates learning and stabilizes training. The learned policy maps the complex verification state to a Q-value estimate for each potential split, enabling dynamic, state-aware decision-making that no single static rule can provide.
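The distinctive ingredient of DQfD is the large-margin supervised loss on demonstration data, which pushes the expert's action above all other actions by at least a margin; it is combined with the usual (double-)DQN TD loss on both demonstration and self-generated transitions. A sketch of that margin term (the margin value here is an illustrative assumption):

```python
# DQfD large-margin classification loss on an expert transition:
#   J_E(Q) = max_a [Q(s, a) + l(a_E, a)] - Q(s, a_E),
# where l(a_E, a) = margin for a != a_E and 0 otherwise.
def margin_loss(q_values, expert_action, margin=0.8):
    augmented = [q + (0.0 if a == expert_action else margin)
                 for a, q in enumerate(q_values)]
    return max(augmented) - q_values[expert_action]


# Zero loss once the expert's action dominates every other by the margin:
print(margin_loss([2.0, 0.5, 0.1], expert_action=0))   # 0.0
print(margin_loss([1.0, 0.9, 0.1], expert_action=0))   # ≈ 0.7
```

In this setting the "expert" demonstrations are trajectories produced by existing static heuristics, so the agent starts from competent behavior and then improves on it through its own exploration.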

The authors implemented a proof-of-concept system on top of the Marabou verifier. They trained and evaluated their RL agent on the ACAS Xu family of benchmarks, considering both standard safety properties and local robustness specifications. The experimental results demonstrate significant improvements over all static heuristics native to Marabou. The RL agent achieved a substantial reduction in average verification time (between 5.88% and 56.20%) and in the average number of BaB iterations required. Notably, the gains were most pronounced on the harder problem instances. The agent also solved more instances within a time limit. Analysis showed that the learned policy often mimicked the best static heuristic when it was effective but could also diverge to discover more efficient splitting strategies.

In conclusion, this work successfully bridges the gap between formal verification and machine learning. By framing heuristic selection as a learnable policy optimization problem, it opens a promising direction for creating more scalable and adaptive verification tools. The results underscore the significant potential of reinforcement learning to tackle core combinatorial challenges in automated reasoning, particularly in settings where multiple related queries allow for knowledge transfer and cumulative improvement.

