WSBD: Freezing-Based Optimizer for Quantum Neural Networks

Notice: This research summary and analysis were automatically generated using AI technology. For complete accuracy, please refer to the original arXiv source.

The training of Quantum Neural Networks (QNNs) is hindered by the high computational cost of gradient estimation and the barren plateau problem, where optimization landscapes become intractably flat. To address these challenges, we introduce Weighted Stochastic Block Descent (WSBD), a novel optimizer with a dynamic, parameter-wise freezing strategy. WSBD intelligently focuses computational resources by identifying and temporarily freezing less influential parameters based on a gradient-derived importance score. This approach significantly reduces the number of forward passes required per training step and helps navigate the optimization landscape more effectively. Unlike pruning or layer-wise freezing, WSBD maintains full expressive capacity while adapting throughout training. Our extensive evaluation shows that WSBD converges on average 63.9% faster than Adam for the popular ground-state-energy problem, an advantage that grows with QNN size. We provide a formal convergence proof for WSBD and show that parameter-wise freezing outperforms traditional layer-wise approaches in QNNs. Project page: https://github.com/Damrl-lab/WSBD-Stochastic-Freezing-Optimizer.


💡 Research Summary

The paper tackles two fundamental bottlenecks in training quantum neural networks (QNNs): the prohibitive cost of gradient estimation using the parameter‑shift rule (PSR) and the barren‑plateau phenomenon that causes gradients to vanish exponentially with system size. To alleviate both issues, the authors propose Weighted Stochastic Block Descent (WSBD), a QNN‑specific optimizer that dynamically freezes the least influential parameters on a per‑parameter basis during training.
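To make the gradient-estimation cost concrete, here is a minimal NumPy sketch of the parameter-shift rule on a toy one-qubit circuit (an RY rotation measured in Z, for which the cost is cos(θ)). The circuit and function names are illustrative, not from the paper; the ±π/2 shift is the standard PSR for Pauli-generated gates and requires two circuit evaluations per parameter, which is why freezing parameters directly saves forward passes.

```python
import numpy as np

def expectation(theta: float) -> float:
    """Cost C(theta) = <Z> after RY(theta) on |0>.

    For this one-qubit toy circuit C(theta) = cos(theta),
    so the exact derivative -sin(theta) serves as ground truth.
    """
    return np.cos(theta)

def psr_gradient(cost, theta: float) -> float:
    """Parameter-shift rule: two circuit evaluations per parameter."""
    shift = np.pi / 2
    return 0.5 * (cost(theta + shift) - cost(theta - shift))

theta = 0.7
grad = psr_gradient(expectation, theta)
# PSR is exact (not a finite-difference approximation) for
# gates generated by Pauli operators:
assert abs(grad - (-np.sin(theta))) < 1e-12
```

Scaling this to |θ| parameters costs 2|θ| circuit evaluations per step, which motivates restricting PSR to the active set A.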

WSBD operates in training windows of size τ (empirically set to 100). Within each window, gradients are computed only for the currently active set A of parameters using PSR. The gradients are accumulated to form an importance score Iₚ(θₖ) = ∑ₜ ∂C/∂θₖ for each parameter over the window. After adding a tiny constant ε = 10⁻⁸ to guarantee positivity, the scores are normalized to produce a probability distribution pₖ ∝ |Iₚ(θₖ)| + ε. At the end of the window, a fraction λ_f (set to 70%) of the parameters with the lowest probabilities are stochastically frozen, while the remaining (1 - λ_f)·|θ| parameters stay active. Newly activated parameters have their importance scores reset to zero, whereas scores of already frozen parameters are retained. This stochastic-plus-reset mechanism encourages exploration and prevents permanent bias toward parameters that were important only early in training.
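The windowed freeze-and-reset loop above can be sketched in NumPy as follows. This is an illustrative reconstruction under stated assumptions, not the authors' implementation: the gradients are random stand-ins for PSR estimates, and the active set is resampled without replacement from the normalized importance distribution.

```python
import numpy as np

rng = np.random.default_rng(0)

def refresh_mask(importance, lambda_f=0.7, eps=1e-8, rng=rng):
    """Resample the active-parameter mask at the end of a window.

    importance : accumulated per-parameter scores I_p(theta_k)
    lambda_f   : fraction of parameters to freeze (paper uses 0.70)
    eps        : keeps every selection probability strictly positive
    Returns a boolean mask: True = active for the next window.
    """
    n = importance.size
    n_active = max(1, int(round((1.0 - lambda_f) * n)))
    probs = np.abs(importance) + eps   # eps ensures p_k > 0 for all k
    probs /= probs.sum()               # normalize to a distribution
    active_idx = rng.choice(n, size=n_active, replace=False, p=probs)
    mask = np.zeros(n, dtype=bool)
    mask[active_idx] = True
    return mask

# One training window of size tau over the active set:
n_params, tau = 10, 100
importance = np.zeros(n_params)
mask = np.ones(n_params, dtype=bool)   # all parameters start active
for _ in range(tau):
    grads = rng.normal(size=n_params)  # stand-in for PSR gradients
    importance[mask] += grads[mask]    # accumulate only active params

new_mask = refresh_mask(importance)
importance[new_mask] = 0.0             # activated scores reset to zero;
                                       # frozen scores are retained
```

Because ε gives every parameter a nonzero selection probability, even a long-frozen parameter can re-enter the active set, which is the exploration property the convergence proof relies on.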

The authors provide a convergence proof under standard assumptions: the loss C is L-smooth and bounded below, the underlying classical optimizer (e.g., SGD, Adam) yields an expected descent direction, and the mask δ(t) is independent of the optimizer's update, with a minimum selection probability p_min > 0 (ensured by ε). By bounding the first-order descent term with p_min and the second-order term with the smoothness constant L, they show that ∑ₜ ηₜ E[‖∇C(θₜ)‖²] is finite, so the expected squared gradient norm vanishes along the iterates and training converges to a stationary point.
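The skeleton of this style of argument can be written out as follows. This is a generic sketch of the standard masked-descent derivation under the stated assumptions; the paper's exact constants and notation may differ (here G² bounds the second moment of the gradient estimate, and ⊙ denotes elementwise masking).

```latex
% L-smoothness gives the descent lemma:
\[
C(\theta_{t+1}) \le C(\theta_t)
  + \langle \nabla C(\theta_t),\, \theta_{t+1}-\theta_t \rangle
  + \tfrac{L}{2}\,\|\theta_{t+1}-\theta_t\|^2 .
\]
% With the masked update \theta_{t+1} = \theta_t - \eta_t\,\delta^{(t)} \odot g_t,
% where the mask is independent of g_t and E[\delta_k^{(t)}] \ge p_{\min}:
\[
\mathbb{E}\!\left[C(\theta_{t+1})\right]
  \le \mathbb{E}\!\left[C(\theta_t)\right]
  - \eta_t\, p_{\min}\, \mathbb{E}\!\left[\|\nabla C(\theta_t)\|^2\right]
  + \tfrac{L}{2}\,\eta_t^2\, G^2 .
\]
% Summing over t and using that C is bounded below yields
% \sum_t \eta_t\, \mathbb{E}[\|\nabla C(\theta_t)\|^2] < \infty.
```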

