Dual Control with Active Learning using Gaussian Process Regression
In many real-world problems, control decisions have to be made with limited information. The controller may have no a priori (or even a posteriori) data on the nonlinear system, except for a limited number of points that are obtained over time. This is either due to the high cost of observation or the highly non-stationary nature of the system. The resulting conflict between information collection (identification, exploration) and control (optimization, exploitation) necessitates an active learning approach for iteratively selecting the control actions, which concurrently provide the data points for system identification. This paper presents a dual control approach where the information acquired at each control step is quantified using the entropy measure from information theory and serves as the training input to a state-of-the-art Gaussian process regression (Bayesian learning) method. The explicit quantification of the information obtained from each data point allows for iterative optimization of both identification and control objectives. The approach developed is illustrated with two examples: control of the logistic map as a chaotic system and position control of a cart with an inverted pendulum.
💡 Research Summary
The paper tackles the problem of controlling a nonlinear discrete‑time system when only a few costly observations are available. Traditional “identify‑then‑control” schemes are unsuitable because the system may be highly non‑stationary and each control action yields only a single new data point. The authors therefore propose a dual‑control framework that tightly couples active learning with control, using Gaussian Process (GP) regression as the learning engine and Shannon entropy as a quantitative measure of information gain.
The methodology proceeds in three recurring steps: (1) observe the current state, (2) update the GP model with the new data point, and (3) solve a multi‑objective optimization problem to select the next control input. The GP provides a posterior mean (the best estimate of the unknown dynamics) and a posterior covariance that quantifies prediction uncertainty. Because the GP's predictive distribution is multivariate Gaussian, the entropy of the model at any candidate input can be expressed as a function of the determinant of the updated covariance matrix. The information obtained from a prospective observation is defined as the reduction in entropy, i.e., the difference between the prior and posterior entropies. Maximizing this reduction is equivalent to minimizing the log‑determinant of the augmented covariance matrix.
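The entropy calculation above can be sketched numerically. The following is a minimal illustration, not the paper's implementation: it assumes a zero-mean GP with a squared-exponential kernel (the kernel choice, hyperparameters, and the toy sine target are all placeholders) and shows that the differential entropy of the Gaussian predictive distribution, H = ½ log((2πe)ⁿ det Σ), drops after one additional observation is folded into the model.

```python
import numpy as np

def rbf_kernel(A, B, length_scale=1.0, signal_var=1.0):
    """Squared-exponential kernel between row-vector inputs A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return signal_var * np.exp(-0.5 * d2 / length_scale**2)

def gp_posterior(X_train, y_train, X_test, noise_var=1e-4):
    """Posterior mean and covariance of a zero-mean GP at X_test."""
    K = rbf_kernel(X_train, X_train) + noise_var * np.eye(len(X_train))
    Ks = rbf_kernel(X_train, X_test)
    Kss = rbf_kernel(X_test, X_test)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = Ks.T @ alpha
    v = np.linalg.solve(L, Ks)
    cov = Kss - v.T @ v
    return mean, cov

def gaussian_entropy(cov):
    """Differential entropy of N(mu, cov): 0.5 * log((2*pi*e)^n * det(cov))."""
    n = cov.shape[0]
    cov = cov + 1e-9 * np.eye(n)  # jitter for numerical stability
    _, logdet = np.linalg.slogdet(cov)
    return 0.5 * (n * np.log(2 * np.pi * np.e) + logdet)

# Tiny demo: one new observation reduces the model's entropy at test inputs.
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(5, 1))
y = np.sin(3 * X[:, 0])              # hypothetical unknown dynamics
X_test = np.linspace(-1, 1, 4)[:, None]

_, cov_prior = gp_posterior(X, y, X_test)
x_new = np.array([[0.5]])            # prospective observation
X2 = np.vstack([X, x_new])
y2 = np.append(y, np.sin(3 * 0.5))
_, cov_post = gp_posterior(X2, y2, X_test)

# Information gain = prior entropy - posterior entropy (always positive here).
info_gain = gaussian_entropy(cov_prior) - gaussian_entropy(cov_post)
```

Since adding an observation can only shrink the posterior covariance (in the Loewner order), the determinant and hence the entropy decrease, which is exactly the equivalence between entropy reduction and log-determinant minimization stated above.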
Control performance (e.g., tracking error) and information gain are combined into a weighted scalar cost, yielding a meta‑optimization problem. Because the decision space is continuous and infinite, the authors approximate it by sampling a finite candidate set Θ from the admissible region X (grid sampling in low dimensions, quasi‑Monte‑Carlo in higher dimensions). The optimal input is then chosen as the candidate that minimizes the product of determinants of the covariance matrices associated with each sampled point, which is a tractable surrogate for the original entropy‑maximization objective.
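The weighted selection over a sampled candidate set can be illustrated as follows. This is a schematic stand-in, not the paper's algorithm: `predicted_tracking_cost` uses a made-up linear surrogate model and `predicted_info_gain` is a hypothetical bump function, where the real method would evaluate the GP posterior and the entropy reduction from the previous sketch.

```python
import numpy as np

def predicted_tracking_cost(u, x, x_ref):
    x_next = 0.9 * x + u             # hypothetical linear surrogate model
    return (x_next - x_ref) ** 2

def predicted_info_gain(u, x):
    # Hypothetical stand-in: pretend inputs near u = 0.3 are most informative.
    return np.exp(-(u - 0.3) ** 2)

def select_input(x, x_ref, weight=0.5, n_candidates=101):
    """Grid-sample a finite candidate set Theta from the admissible interval
    and pick the input minimizing the weighted dual objective:
    (1 - w) * control cost - w * information gain."""
    theta = np.linspace(-1.0, 1.0, n_candidates)   # candidate set Theta
    costs = [(1 - weight) * predicted_tracking_cost(u, x, x_ref)
             - weight * predicted_info_gain(u, x) for u in theta]
    return theta[int(np.argmin(costs))]

u_star = select_input(x=0.2, x_ref=0.0, weight=0.3)
```

Setting `weight=0` recovers pure exploitation (the input that best tracks the reference) and `weight=1` pure exploration (the most informative input); intermediate weights trade the two off, which is the dual-control behavior the meta-optimization encodes.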
Two illustrative examples validate the approach. The first controls a chaotic logistic map, demonstrating rapid convergence and lower cumulative control effort compared with a static controller that does not actively seek information. The second example concerns the position control of a cart‑inverted pendulum under noisy, sparse measurements. Despite limited data, the GP‑based controller successfully stabilizes the pendulum and drives the cart to the desired position, outperforming conventional methods that either ignore uncertainty or rely on extensive offline identification.
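The first benchmark has a compact form: the logistic map with an additive control input, x_{k+1} = r·x_k·(1 − x_k) + u_k, which is chaotic in open loop for r near 3.9. The sketch below deliberately cheats by using the true dynamics to invert the map one step ahead (the setpoint 0.6 is an arbitrary choice), purely to show the control structure; the paper's point is that the GP controller achieves stabilization while learning these dynamics online from sparse data.

```python
def logistic_step(x, u=0.0, r=3.9):
    """One step of the controlled logistic map: r*x*(1-x) + u."""
    return r * x * (1.0 - x) + u

x_ref = 0.6   # hypothetical setpoint
x = 0.3       # initial state
for _ in range(20):
    # With the true model known, exact one-step inversion cancels the
    # chaotic dynamics; the GP-based controller must approximate this.
    u = x_ref - logistic_step(x)
    x = logistic_step(x, u)
```

Running the same loop with u = 0 instead produces the familiar chaotic orbit, which is what makes this map a stringent test for a controller learning from one data point per step.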
The authors discuss computational challenges, notably the O(N³) scaling of GP inference, which is mitigated in their setting by the inherently small data set but would require sparse or local GP approximations for larger problems. They also note that the sampling strategy for Θ critically affects performance in high‑dimensional spaces, suggesting adaptive Bayesian optimization techniques as future work. Finally, the paper calls for deeper integration of stability guarantees from control theory with Bayesian learning to enable real‑time, safety‑critical applications. Overall, the work presents a coherent and practically relevant framework that unifies active learning and control through principled information‑theoretic criteria and modern non‑parametric regression.