Observation-dependent Bayesian active learning via input-warped Gaussian processes

Notice: This research summary and analysis were automatically generated using AI technology. For authoritative details, please refer to the original arXiv source.

Bayesian active learning relies on the precise quantification of predictive uncertainty to explore unknown function landscapes. While Gaussian process surrogates are the standard for such tasks, an underappreciated fact is that their posterior variance depends on the observed outputs only through the hyperparameters, rendering exploration largely insensitive to the actual measurements. We propose to inject observation-dependent feedback by warping the input space with a learned, monotone reparameterization. This mechanism allows the design policy to expand or compress regions of the input space in response to observed variability, thereby shaping the behavior of variance-based acquisition functions. We demonstrate that while such warps can be trained via marginal likelihood, a novel self-supervised objective yields substantially better performance. Our approach improves sample efficiency across a range of active learning benchmarks, particularly in regimes where non-stationarity challenges traditional methods.


💡 Research Summary

The paper addresses a fundamental limitation of Gaussian‑process (GP) based Bayesian active learning (BAL): under fixed hyper‑parameters the posterior variance, which drives most acquisition functions, is conditionally independent of the observed function values. Consequently, the sequence of queries is determined solely by the geometry of the input locations, making the process effectively open‑loop and blind to regions of high variability until they are sampled by chance.
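This open-loop property is visible directly in the closed-form GP posterior variance: with hyper-parameters fixed, the observed outputs cancel out entirely. A minimal NumPy sketch (the kernel, lengthscale, and query locations below are illustrative, not taken from the paper):

```python
import numpy as np

def rbf(a, b, ls=0.5):
    """Squared-exponential kernel matrix between 1-D point sets a and b."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / ls) ** 2)

def gp_posterior_var(X, x_star, ls=0.5, noise=1e-6):
    # Posterior variance of a GP with fixed hyper-parameters:
    #   var(x*) = k(x*, x*) - k(x*, X) K^{-1} k(X, x*).
    # The observed outputs y appear nowhere -- only the input locations do,
    # which is exactly the "open-loop" property the paper highlights.
    K = rbf(X, X, ls) + noise * np.eye(len(X))
    ks = rbf(X, x_star, ls)
    return np.diag(rbf(x_star, x_star, ls)) - np.einsum(
        "ij,ij->j", ks, np.linalg.solve(K, ks))

X = np.array([0.0, 0.3, 0.9])                 # query locations so far
v = gp_posterior_var(X, np.linspace(0.0, 1.0, 5))
# v is identical whether the measurements were flat or wildly varying,
# so a variance-maximising policy would pick the same next point either way.
```

Because `gp_posterior_var` never needs a `y` argument, the query sequence is fixed once the kernel and initial design are chosen.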

To overcome this, the authors introduce an observation‑dependent input‑warping mechanism. A bijective, monotone mapping (T_{\phi}) re‑parameterises the input space before the acquisition function is evaluated. The warp is realised with conditional rational‑quadratic splines (C‑RQS), guaranteeing smoothness and global injectivity. Importantly, the GP surrogate itself remains unchanged; the warp alters only the geometry in which predictive variances are computed. By locally expanding regions where the target function varies rapidly and compressing smoother ones, the warped geometry makes variance‑based acquisition functions (e.g., Expected Information Gain, Max‑Var) sensitive to observed variability while leaving the GP's predictions in the original coordinates unchanged.
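The effect of such a warp on a variance-based policy can be sketched with a simple monotone map standing in for the paper's C-RQS splines (the tanh warp and all constants below are illustrative assumptions):

```python
import numpy as np

def rbf(a, b, ls=0.3):
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / ls) ** 2)

def posterior_var(X, xs, ls=0.3, noise=1e-6):
    K = rbf(X, X, ls) + noise * np.eye(len(X))
    ks = rbf(X, xs, ls)
    return np.diag(rbf(xs, xs, ls)) - np.einsum(
        "ij,ij->j", ks, np.linalg.solve(K, ks))

def warp(x, sharpness=6.0):
    # Monotone, bijective map on [0, 1] that expands the neighbourhood of
    # x = 0.5 and compresses the edges (a stand-in for the paper's C-RQS).
    return 0.5 * (1 + np.tanh(sharpness * (x - 0.5)) / np.tanh(sharpness / 2))

X = np.array([0.1, 0.5, 0.9])                 # observed input locations
xs = np.linspace(0.0, 1.0, 201)               # candidate queries
v_plain = posterior_var(X, xs)                # variance in the raw geometry
v_warped = posterior_var(warp(X), warp(xs))   # variance in warped geometry
# A Max-Var policy in the warped geometry queries closer to the expanded
# region around x = 0.5 than the same policy in the raw geometry does.
```

Only the coordinates fed to the kernel change; the GP itself, its hyper-parameters, and the observed data are untouched, mirroring the decoupling described above.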

Two training objectives for the warp parameters (\phi) are examined. The first is marginal log‑likelihood (MLL) maximisation, the standard approach for learning GP hyper‑parameters. Empirical results show that MLL, which focuses on fitting the observed data, does not align well with the goal of shaping exploration. The second, novel self‑supervised objective, treats a standard, unwarped GP as a fixed geometric reference. A set of probe points is sampled (Sobol sequence) and the warped GP’s predictive density of the reference mean is maximised, i.e., the expected negative log‑predictive density is minimised. This loss directly regularises the posterior geometry induced by the warp, encouraging a redistribution of predictive variance across the domain while leaving the GP’s predictive performance untouched.
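The self-supervised objective can be written down compactly. In this sketch a uniform probe grid stands in for the paper's Sobol sequence, and a one-parameter tanh warp stands in for C-RQS; all function names and constants are illustrative:

```python
import numpy as np

def rbf(a, b, ls):
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / ls) ** 2)

def gp_predict(X, y, xs, ls, noise=1e-4):
    """Posterior mean and variance of a zero-mean GP."""
    K = rbf(X, X, ls) + noise * np.eye(len(X))
    ks = rbf(X, xs, ls)
    sol = np.linalg.solve(K, ks)
    return sol.T @ y, 1.0 - np.einsum("ij,ij->j", ks, sol) + noise

def warp(x, phi):
    # Monotone stand-in for the paper's C-RQS warp (phi controls sharpness).
    return 0.5 * (1 + np.tanh(phi * (x - 0.5)) / np.tanh(phi / 2))

def self_supervised_loss(phi, X, y, probes, ls):
    # Geometric reference: the *unwarped* GP's posterior mean at probe points
    # (a uniform grid stands in for the paper's Sobol sequence here).
    m_ref, _ = gp_predict(X, y, probes, ls)
    # Warped GP: same data and outputs, distances taken in warped coordinates.
    m_w, v_w = gp_predict(warp(X, phi), y, warp(probes, phi), ls)
    # Expected negative log predictive density of the reference mean.
    return np.mean(0.5 * np.log(2 * np.pi * v_w)
                   + 0.5 * (m_ref - m_w) ** 2 / v_w)

X = np.array([0.1, 0.4, 0.5, 0.6, 0.9])
y = np.sin(12 * X ** 2)                       # toy non-stationary signal
probes = np.linspace(0.0, 1.0, 33)
losses = {phi: self_supervised_loss(phi, X, y, probes, ls=0.2)
          for phi in (0.5, 2.0, 6.0)}
```

Minimising this loss in (\phi) redistributes predictive variance across the domain while the data-fitting GP, which generates the reference mean, stays fixed.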

The learning loop alternates between (1) fitting GP hyper‑parameters (\theta) with the warp frozen, (2) updating the warp (\phi) with (\theta) frozen using the self‑supervised loss, and (3) selecting the next query by maximising the acquisition function evaluated on warped inputs. This alternating optimisation enables the acquisition policy to adapt in real time to observed function complexity.
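A toy end-to-end version of the alternating loop above, with grid searches standing in for the paper's gradient-based updates, a tanh warp standing in for C-RQS, and Max-Var as the acquisition; every function and constant here is an illustrative assumption:

```python
import numpy as np

def rbf(a, b, ls):
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / ls) ** 2)

def gp_predict(X, y, xs, ls, noise=1e-4):
    K = rbf(X, X, ls) + noise * np.eye(len(X))
    ks = rbf(X, xs, ls)
    sol = np.linalg.solve(K, ks)
    return sol.T @ y, 1.0 - np.einsum("ij,ij->j", ks, sol) + noise

def mll(X, y, ls, noise=1e-4):
    """Marginal log-likelihood of a zero-mean GP."""
    K = rbf(X, X, ls) + noise * np.eye(len(X))
    _, logdet = np.linalg.slogdet(K)
    return -0.5 * (y @ np.linalg.solve(K, y) + logdet
                   + len(y) * np.log(2 * np.pi))

def warp(x, phi):
    return 0.5 * (1 + np.tanh(phi * (x - 0.5)) / np.tanh(phi / 2))

def ssl_loss(phi, X, y, probes, ls):
    m_ref, _ = gp_predict(X, y, probes, ls)          # unwarped reference
    m_w, v_w = gp_predict(warp(X, phi), y, warp(probes, phi), ls)
    return np.mean(0.5 * np.log(2 * np.pi * v_w)
                   + 0.5 * (m_ref - m_w) ** 2 / v_w)

f = lambda x: np.sin(12 * x ** 2)                    # toy non-stationary target
X = np.array([0.05, 0.5, 0.95]); y = f(X)
phi, probes, cands = 2.0, np.linspace(0, 1, 33), np.linspace(0, 1, 101)
for _ in range(5):
    # (1) GP hyper-parameters by MLL, warp frozen (grid search as stand-in).
    ls = max(np.linspace(0.05, 0.5, 10),
             key=lambda l: mll(warp(X, phi), y, l))
    # (2) warp parameter by the self-supervised loss, GP frozen.
    phi = min(np.linspace(0.5, 8.0, 16),
              key=lambda p: ssl_loss(p, X, y, probes, ls))
    # (3) next query: Max-Var acquisition evaluated on warped inputs.
    _, v = gp_predict(warp(X, phi), y, warp(cands, phi), ls)
    x_next = cands[np.argmax(v)]
    X, y = np.append(X, x_next), np.append(y, f(x_next))
```

Each pass through the loop refits the surrogate, reshapes the exploration geometry from the newly observed values, and only then scores candidates, which is what makes the policy closed-loop.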

Experiments span synthetic 1‑D and multi‑dimensional functions, non‑stationary physical simulators, and real scientific datasets such as solar‑cell efficiency prediction. Baselines include standard GP‑BALD, Max‑Var, and non‑stationary GP models without warping. Results consistently demonstrate that the self‑supervised warp yields 30‑50 % fewer queries to achieve a target error compared with both the MLL‑trained warp and unwarped baselines, especially in regimes with abrupt changes or heteroscedastic noise. Ablation studies confirm that the monotonic spline parameterisation and regularisation are crucial for stable training and that excessive non‑linearity can distort the GP’s coverage.

The contributions are threefold: (i) a clear exposition of why variance‑based acquisition is inherently open‑loop under stationary GP models, (ii) a decoupled framework that learns an input re‑parameterisation solely for exploration while preserving the predictive GP, and (iii) a self‑supervised geometric loss that aligns warp learning with the objectives of active learning, outperforming traditional marginal‑likelihood optimisation. Limitations include increased computational overhead in high‑dimensional spaces due to spline parameter growth and the need for careful regularisation to avoid over‑warping. Future work may explore more scalable warp architectures, adaptive dimensionality reduction, and integration with gradient‑based acquisition strategies.

