BOND: License to Train with Black-Box Functions

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the original arXiv source.

We introduce Bounded Numerical Differentiation (BOND), a perturbative method for estimating the gradients of black-box functions. BOND is distinguished by its formulation, which adaptively bounds perturbations to ensure accurate sign estimation, and by its implementation, which operates at black-box interfaces. This enables BOND to be more accurate and scalable than existing methods, facilitating end-to-end training of architectures that incorporate non-autodifferentiable modules. We observe that these modules, implemented in our experiments as frozen networks, can enhance model performance without increasing the number of trainable parameters. Our findings highlight the potential of leveraging fixed transformations to expand model capacity, point to hybrid analogue-digital devices as a path to scaling networks, and provide insights into the dynamics of adaptive optimizers.


💡 Research Summary

This paper introduces Bounded Numerical Differentiation (BOND), a novel zeroth-order gradient estimation method designed to enable end-to-end training of neural network architectures that incorporate non-differentiable or “black-box” functions. Such black-box functions, which lack a computational graph and are therefore incompatible with standard backpropagation, could include specialized hardware components like neuromorphic devices or physical reservoir computing systems.

The core innovation of BOND lies in its two-part strategy. First, it employs adaptive perturbation bounds. Instead of using a global smoothing coefficient like traditional Simultaneous Perturbation Stochastic Approximation (SPSA), BOND dynamically calculates upper and lower bounds for the perturbation magnitude applied to the black-box function’s inputs. The upper bound is inspired by the Adam optimizer’s update rule, using estimates of the gradient’s first and second moments, while the lower bound ensures estimation stability. This design focuses on ensuring accurate estimation of the gradient sign, which is crucial for convergence with adaptive optimizers.
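The idea can be sketched as follows. This is a minimal illustration, not the paper's exact formulation: the function names and the `floor` hyperparameter are assumptions, and the sign estimate uses a standard SPSA-style Rademacher central difference.

```python
import numpy as np

def bounded_perturbation(m, v, lr=1e-3, eps=1e-8, floor=1e-6):
    """Illustrative adaptive perturbation bound (names are assumptions).

    The upper bound mimics an Adam-style step magnitude built from the
    first-moment estimate m and second-moment estimate v; the lower
    bound `floor` keeps the finite difference numerically stable.
    """
    upper = lr * np.abs(m) / (np.sqrt(v) + eps)  # Adam-like step size
    return np.clip(upper, floor, None)           # enforce lower bound

def spsa_sign_estimate(f, x, c, rng):
    """Standard SPSA-style central difference, keeping only the sign."""
    delta = rng.choice([-1.0, 1.0], size=x.shape)  # Rademacher directions
    g_hat = (f(x + c * delta) - f(x - c * delta)) / (2.0 * c * delta)
    return np.sign(g_hat)
```

For a smooth function and a perturbation within these bounds, the sign of the estimate tends to match the true gradient sign, which is the quantity adaptive optimizers like Adam are most sensitive to.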

Second, BOND uses an interface-based estimation approach. Rather than estimating gradients for all parameters of the upstream network (the “read-in” network), it only estimates the partial derivatives of the black-box function’s outputs with respect to its inputs (∂Y_R/∂Y_A). This estimated Jacobian is then combined with the known gradients from the read-in network’s computational graph via the chain rule. This reduces the estimation complexity from O(d_θ) (number of parameters) to O(d_R) (number of black-box inputs), where d_R is typically much smaller, making BOND highly scalable.
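A rough sketch of the interface-based estimation, under simplifying assumptions: plain forward differences with a fixed step stand in for BOND's adaptively bounded perturbations, and the helper names are hypothetical.

```python
import numpy as np

def estimate_interface_jacobian(black_box, y_a, c=1e-4):
    """Finite-difference estimate of dY_R/dY_A at the black-box interface.

    The cost scales with the interface width d_R = y_a.size, not with
    the number of read-in parameters d_theta.
    """
    y_r = black_box(y_a)
    jac = np.zeros((y_r.size, y_a.size))
    for j in range(y_a.size):
        e = np.zeros_like(y_a)
        e[j] = c
        jac[:, j] = (black_box(y_a + e) - y_r) / c  # perturb input j only
    return jac

def chain_rule_grad(grad_wrt_output, jac_interface, jac_readin):
    """Combine: dL/dtheta = (dL/dY_R) @ (dY_R/dY_A) @ (dY_A/dtheta)."""
    return grad_wrt_output @ jac_interface @ jac_readin
```

The last factor, `jac_readin`, comes for free from the read-in network's computational graph, so only the d_R columns of the interface Jacobian ever need black-box queries.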

The authors demonstrate BOND’s efficacy by integrating black-box modules—implemented as frozen neural networks or Echo State Networks—into trainable architectures, termed Network-Frozen-Network (NFN) and Network-Echo-Network (NEN). Experiments show that these hybrid models can achieve performance improvements over standard networks without increasing the number of trainable parameters. This suggests that fixed, non-trainable transformations can effectively expand a model’s capacity.
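The NFN layout above can be pictured with a small forward-pass sketch. The layer sizes and weights here are arbitrary placeholders, not the paper's architecture; the point is only that the frozen module adds capacity without adding trainable parameters.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical NFN sketch: a frozen random network sits between a
# trainable read-in layer and a trainable read-out layer.
W_in = rng.normal(size=(16, 8))       # trainable read-in weights
W_frozen = rng.normal(size=(16, 16))  # frozen "black-box" module
W_out = rng.normal(size=(4, 16))      # trainable read-out weights

def frozen_module(y_a):
    # Treated as a black box at training time: no autodiff through it;
    # its Jacobian would be estimated at this interface (as BOND does).
    return np.tanh(W_frozen @ y_a)

def forward(x):
    y_a = W_in @ x                # read-in activations (interface input)
    y_r = frozen_module(y_a)      # fixed, non-trainable transformation
    return W_out @ y_r

trainable_params = W_in.size + W_out.size  # frozen weights excluded
```

Only `W_in` and `W_out` would receive updates; the frozen block contributes depth and nonlinearity at zero cost in trainable parameter count.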

The work positions BOND as a key enabler for future hybrid analog-digital computing systems, where energy-efficient physical devices perform complex, fixed transformations within a learnable digital framework. It also provides insights into optimization dynamics, highlighting the importance of gradient sign accuracy over magnitude precision when using adaptive optimizers like Adam.

