Closed-loop Model Selection for Kernel-based Models using Bayesian Optimization
Thomas Beckers^1, Somil Bansal^2, Claire J. Tomlin^2 and Sandra Hirche^1

Abstract — Kernel-based nonparametric models have become very attractive for model-based control approaches for nonlinear systems. However, the selection of the kernel and its hyperparameters strongly influences the quality of the learned model. Classically, these hyperparameters are optimized to minimize the prediction error of the model, but this process totally neglects its later usage in the control loop. In this work, we present a framework to optimize the kernel and hyperparameters of a kernel-based model directly with respect to the closed-loop performance of the model. Our framework uses Bayesian optimization to iteratively refine the kernel-based model using the observed performance on the actual system until a desired performance is achieved. We demonstrate the proposed approach in a simulation and on a 3-DoF robotic arm.

I. INTRODUCTION

Given a dynamic model, control mechanisms such as model predictive control and feedback linearization can be used to effectively control nonlinear systems. However, when an accurate mathematical model of the system is not available, machine learning offers powerful tools for the modeling of dynamical systems. A special class of models that has recently received a lot of attention is kernel-based models, such as Support Vector Machines (SVM) and Gaussian Processes (GP). In contrast to parametric models, kernel-based models require only minimal prior knowledge about the system dynamics, and have been successfully used to model complex, nonlinear systems [1]. Using the kernel-based approach for modeling a system requires the selection of an appropriate kernel function and a set of hyperparameters for that function. Typically, these selections are data-based, e.g.
through minimizing a loss function that is often a trade-off between the prediction error and the complexity of the model. However, the full complex and accurate dynamics model might not even be required depending on the task. Moreover, this procedure neglects the fact that the learned model is used for the control of the actual system, which can result in reduced controller performance [2]–[4].

In this work, we propose a Bayesian Optimization (BO)-based active learning framework to optimize the kernel and its hyperparameters directly with respect to the performance of the closed loop rather than the prediction error, see Fig. 1. This optimization is performed in a sequential fashion where, at each step of the optimization, BO takes into account all the past data points and proposes the most promising kernel and hyperparameters for the next evaluation. The outcome is used to define a kernel-based model that is utilized by a given controller. The obtained model-based controller is then applied to the actual system in a closed-loop fashion to evaluate its performance. This information is then used by BO to optimize the next evaluation. Consequently, multiple evaluations on the actual system must be performed, which is often feasible, such as for systems with repetitive trajectories.

^1 The authors are with the Chair of Information-oriented Control (ITR), Department of Electrical and Computer Engineering, Technical University of Munich, 80333 Munich, Germany, {t.beckers, hirche}@tum.de
^2 The authors are with the Department of Electrical Engineering and Computer Sciences, UC Berkeley, USA, {somil, tomlin}@eecs.berkeley.edu

Fig. 1. Closed-loop model selection for kernel-based models. BO is used to optimize the kernel and its hyperparameters directly based on the evaluation of a cost function.
BO thus does not aim to obtain the most accurate dynamics model of the system, but rather to optimize the performance of the closed-loop system.

Typically, system identification approaches aim to obtain an open-loop dynamics model of the system by minimizing the state prediction error. This problem has been well studied in the literature both for linear systems, e.g. [5], and for nonlinear systems using function approximators such as GPs [6]–[8] and neural networks (NN) [9], [10]. However, a model obtained using this open-loop procedure can result in reduced controller performance on the actual system [4]. To overcome these challenges, adaptive control mechanisms and iterative learning control have been studied, where the system dynamics or control parameters are optimized based on the performance on the actual system, e.g. [11]–[13]. However, these approaches are mostly limited to linear systems and controllers, or assume at least a parametric system model. Recently, learning-based controller tuning mechanisms have also been proposed [14], [15], but such methods might be highly data-inefficient for general nonlinear systems as they typically completely disregard the underlying dynamics [16]. To overcome the challenges of open-loop system identification, closed-loop system identification methods have been studied that lead to more robust control performance on the actual system [2]–[4]. A similar approach is presented in [17], wherein the authors also propose a goal-driven dynamics learning approach via BO. However, the authors aim to identify a linear dynamics model from scratch, which might be a) unnecessary, as often an approximate dynamics model of the system is available, and b) insufficient for general nonlinear systems.
Moreover, the stability of the closed-loop system where the controller is based on the linear dynamics model cannot be guaranteed, whereas our approach explicitly allows us to preserve the convergence properties of the initial closed-loop system. To summarize, our key contributions are: a) we present a BO-based framework to optimize the kernel function and hyperparameters of a kernel-based model to maximize the resultant control performance on the actual system; b) through numerical examples and an experiment on a real 3-DoF robot, we demonstrate the advantages of the proposed approach over classical model selection methods.

Notation: Vectors $\mathbf{a}$ are denoted with bold characters. Matrices $A$ are described with capital letters. The term $A_{i,:}$ denotes the $i$-th row of the matrix $A$. The expression $\mathcal{N}(\mu, \Sigma)$ describes a normal distribution with mean $\mu$ and covariance $\Sigma$. The set $\mathbb{R}_{>0}$ denotes the set of positive real numbers.

II. PROBLEM SETTING

Consider a discrete-time, potentially nonlinear system
$$x_{k+1} = f(x_k, u_k), \quad k = \{0, \ldots, n-1\},\ n \in \mathbb{N}$$
$$y_k = g(x_k, u_k) \tag{1}$$
in which $f$, $g$ are unknown functions of the state $x_k \in \mathbb{R}^{n_x}$ and input $u_k \in \mathbb{R}^{n_u}$. For the following, we assume that the state mapping $f\colon \mathbb{R}^{n_x} \times \mathbb{R}^{n_u} \to \mathbb{R}^{n_x}$ and the output mapping $g\colon \mathbb{R}^{n_x} \times \mathbb{R}^{n_u} \to \mathbb{R}^{n_y}$ are such that there exist a unique state and output trajectory for all $u_k \in \mathbb{R}^{n_u}$ and $x_0$, $k \geq 0$. We assume that a control law $h\colon \mathbb{R}^{n_y} \times \mathbb{R}^{n_m} \to \mathbb{R}^{n_u}$,
$$u_k = h(y_k - r_k, m_k), \tag{2}$$
is given for the system (1). The reference $r_k \in \mathbb{R}^{n_y}$ is assumed to be zero, but the framework is also applicable for a varying signal. In addition to the reference, the control law also depends on the output $m_k \in \mathbb{R}^{n_m}$ of a kernel-based model, a regression technique that uses a kernel to perform the regression in a higher-dimensional feature space.
The output of a kernel-based model, $m_k$, depends on the kernel function $\mathcal{K}$, its hyperparameters $\varphi \in \mathbb{R}^{n_\varphi}$, and the system input and output, i.e. $m_k = M(u_{0:k-1}, y_{0:k}, \mathcal{K}, \varphi)$, where the function $M$ depends on the class of the kernel-based model, such as GP or SVM, used for the prediction.

Remark 1: For example, the output $m_k$ can be the prediction of the next state or output of the system based on the current state and input, using the mean and possibly the variance of a GP model. This information can then be used by the controller to compute an appropriate system input $u_k$. The control law $h$ might be an output tracking controller designed based on the predicted model output. For possible control laws for different classes of systems, we refer to [6]–[8], [18], [19].

The goal of this work is to optimize the model kernel and its hyperparameters such that the corresponding model output $m_k$ minimizes the following cost functional
$$C(y_{0:k}, u_{0:k}) = \sum_{k=0}^{n-1} c(y_k, u_k), \tag{3}$$
where $c(y_k, u_k)\colon \mathbb{R}^{n_y} \times \mathbb{R}^{n_u} \to \mathbb{R}$ represents the cost incurred for the control input $u_k$ and the system output $y_k$. The cost function here might represent the requirements concerning the closed loop, e.g. an accurate tracking behavior or a minimized power consumption. Note that the cost functional in (3) implicitly depends on the kernel-based model $M$ through $u_k$, see (2). The optimization of (3) is challenging since the system dynamics in (1) are unknown and the kernel-based model output $m_k$ only indirectly influences the cost. To overcome this challenge, we use BO to optimize the kernel and the hyperparameters based on the direct evaluation of the control law in (2) on the system (1), to find those that minimize the cost functional in (3).

III. PRELIMINARIES

A. Kernel-based models

The prediction of parametric models is based on a parameter vector $w \in \mathbb{R}^{n_a}$ which is typically learned using a set of training data points.
In contrast, nonparametric models typically maintain a subset of the training data points in memory in order to make predictions for new data points. Many linear models can be transformed into a dual representation where the prediction is based on a linear combination of kernel functions. The idea is to transform the data points of a model to an often high-dimensional feature space where a linear regression can be applied to predict the model output. For a nonlinear feature map $\phi\colon \mathbb{R}^{n_a} \to \mathbb{R}^{n_\phi}$ with $n_\phi \in \mathbb{N} \cup \{\infty\}$, the kernel function is given by the inner product $\mathcal{K}(a, a') = \langle \phi(a), \phi(a') \rangle$ for all $a, a' \in \mathbb{R}^{n_a}$. Thus, the kernel implicitly encodes the way the data points are transformed into a higher-dimensional space. The formulation as an inner product in a feature space allows many standard regression methods to be extended. A drawback of kernel-based models is that the selection of the kernel and its hyperparameters heavily influences the interpretation of the data and thus the quality of the model. Commonly, the kernel and hyperparameters are determined by optimizing a loss function such as the cross-validation loss or the likelihood function. In our work, the kernel and its hyperparameters are optimized with respect to the performance of the closed-loop system.

B. Gaussian process

Extending the concept of kernel functions to probabilistic models leads to the framework of Gaussian process regression (GPR). In particular, GPR is a supervised learning technique which combines several advantages. As a probabilistic kernel technique, a GP provides not only a mean function but also a measure for the uncertainty of the regression. In this work, we use GPR in BO to model the unknown closed-loop objective function, as well as for the kernel-based dynamics model $M$ in the experiment.
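As a concrete illustration of GPR, the following minimal sketch (toy data and a squared exponential kernel chosen by us for illustration, not taken from the paper) computes the posterior mean and variance at a test point:

```python
import numpy as np

def se_kernel(a, b, lengthscale=1.0):
    # Squared exponential kernel K(a, b) = exp(-||a - b||^2 / (2 l^2))
    return np.exp(-np.sum((a - b) ** 2) / (2.0 * lengthscale ** 2))

def gp_predict(A, B, a_star, lengthscale=1.0, noise_var=1e-4):
    """Posterior mean and variance of a zero-mean GP at test input a_star.

    A: (m, d) training inputs, B: (m,) training targets.
    """
    m = A.shape[0]
    K = np.array([[se_kernel(A[i], A[j], lengthscale) for j in range(m)]
                  for i in range(m)])
    K_ss = K + noise_var * np.eye(m)      # K** = K + sigma_n^2 I
    k_star = np.array([se_kernel(a_star, A[i], lengthscale) for i in range(m)])
    mean = k_star @ np.linalg.solve(K_ss, B)                  # k*^T K**^-1 B
    var = se_kernel(a_star, a_star, lengthscale) \
        - k_star @ np.linalg.solve(K_ss, k_star)              # k** - k*^T K**^-1 k*
    return mean, var

# Fit to noisy-free samples of sin(x) and predict at an unseen point.
A = np.linspace(-3, 3, 15).reshape(-1, 1)
B = np.sin(A).ravel()
mean, var = gp_predict(A, B, np.array([0.5]))
```

The predicted mean lies close to sin(0.5) and the posterior variance is small, since the test point sits between densely placed training points.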
The GPR can be derived using a standard linear regression model $q(a) = a^\top w$, $b = q(a) + \epsilon$, where $a \in \mathbb{R}^{n_a}$ is the input vector, $w$ the vector of weights, and $q\colon \mathbb{R}^{n_a} \to \mathbb{R}$ the function value. The observed value $b \in \mathbb{R}$ is corrupted by Gaussian noise $\epsilon \sim \mathcal{N}(0, \sigma_n^2)$. Using the feature map $\phi(a)$ instead of $a$ leads to $f(a) = \phi(a)^\top w$ with $f\colon \mathbb{R}^{n_a} \to \mathbb{R}$. The analysis of this model is analogous to standard linear regression, i.e. we put a prior on the weights such that $w \sim \mathcal{N}(0, \Sigma_p)$ with $\Sigma_p \in \mathbb{R}^{n_\phi \times n_\phi}$. The mean function is usually defined to be zero, see [20]. Based on $m$ collected training data points $A = [a_1, \ldots, a_m]$ and $B = [b_1, \ldots, b_m]^\top$, the prediction $q_* \in \mathbb{R}$ for a new test point $a_* \in \mathbb{R}^{n_a}$ can be computed using Bayes' rule. In particular, it is given by
$$q_* \mid a_*, A, B \sim \mathcal{N}\big(k_*^\top K_{**}^{-1} B,\; k_{**} - k_*^\top K_{**}^{-1} k_*\big),$$
where $\mathcal{K}(a, a') = \phi(a)^\top \Sigma_p \phi(a')$, $k_{**} = \mathcal{K}(a_*, a_*)$, and $k_* = [\mathcal{K}(a_*, A_{1,:}), \ldots, \mathcal{K}(a_*, A_{m,:})]^\top$. The covariance matrix $K_{**} = (K + \sigma_n^2 I)$ is defined by $K_{i,j} = \mathcal{K}(a_i, a_j)$. Thus, based on the training data $A, B$, the estimation of the function value $q_*$ follows a normal distribution where the mean and the variance depend on the test input $a_*$. Following Remark 1, the mean and variance can be used for state estimation in the control law (2). The choice of the kernel and hyperparameters $\varphi \in \mathbb{R}^{n_\varphi}$ can be seen as degrees of freedom of the regression. A popular kernel choice in GPR is the squared exponential kernel, see [20]. One possibility for estimating the hyperparameters $\varphi$ is by means of the likelihood function, i.e. by minimizing the negative log likelihood
$$\varphi^* = \arg\min_{\varphi}\; \tfrac{1}{2} B^\top K_{**}^{-1} B + \tfrac{1}{2} \log |K_{**}| + \tfrac{m}{2} \log 2\pi, \tag{4}$$
which results in an automatic trade-off between the data fit $B^\top K_{**}^{-1} B$ and the model complexity $\log |K_{**}|$, see [20].

C.
Bayesian Optimization (BO)

Bayesian optimization is an approach to minimize an unknown objective function based on (only a few) evaluated samples. We use BO to optimize the cost function (3) with respect to the kernel-based model, as this is in general a non-convex optimization problem with an unknown objective function (because the system dynamics are unknown) and possibly multiple local extrema. BO is well suited for this optimization as the task evaluations can be expensive and noisy [21]. Furthermore, BO is a gradient-free optimization method which only requires that the objective function can be evaluated for any given input. Since the objective function is unknown, the Bayesian strategy is to treat it as a random function with a prior, often a Gaussian process. Note that the GP here is used for the closed-loop cost functional approximation in BO and is not related to the kernel-based model for the controller (2) described in Remark 1. The prior captures the beliefs about the behaviour of the function, e.g. continuity or boundedness. After gathering the cost (3) of a task evaluation, the prior is updated to form the posterior distribution over the objective function. The posterior distribution is used to construct an acquisition function that determines the most promising kernel/hyperparameters for the next evaluation to minimize the cost. Different acquisition functions are used in the literature to trade off between exploration of unseen kernels/hyperparameters and exploitation of promising combinations during the optimization process. Common acquisition functions are expected improvement, entropy search, and upper confidence bound [22]. To escape a local minimum of the objective function, the authors of [23] propose a method, namely expected-improvement-plus, that modifies the acquisition function when it seems to over-exploit an area.
This results in a more comprehensive and also partially random exploration of the area, and is thus likely to find the global minimum faster. We also use this acquisition function for BO in our simulation and the experiment.

IV. CLOSED-LOOP MODEL SELECTION

Our goal is to optimize the model's kernel and its hyperparameters with respect to the cost functional $C(y_{0:k}, u_{0:k})$. Thus, in contrast to the classical kernel selection problem, where the kernel is selected to minimize the state prediction error, our goal here is not to obtain the most accurate model but the one that achieves the best closed-loop behavior. We now describe the proposed overall procedure for the kernel selection to optimize the closed-loop behavior; we then describe each step in detail. We start with an initial kernel $\mathcal{K}$ with hyperparameters $\varphi$, and obtain the control law for the system (1) using (2) with the model output $m_k = M(u_{0:k-1}, y_{0:k}, \mathcal{K}, \varphi)$. This control law is then applied to the actual system, and the cost function (3) is evaluated after performing the control task. Depending on the obtained cost value, BO suggests a new kernel and corresponding hyperparameters for the kernel-based model $M$ in order to minimize the cost function on the actual system. With this model, the control task is repeated and, based on the cost evaluation, BO suggests the next kernel and hyperparameters. This procedure continues until a maximum number of task evaluations is reached or the user rates the closed-loop performance as sufficient. We now describe the above three steps, i.e. initialization, evaluation, and optimization, in detail.

A. Initialization

We define a set $\mathbb{K} = \{\mathcal{K}_1, \ldots, \mathcal{K}_{n_K}\}$ of $n_K \in \mathbb{N}$ kernel candidates $\mathcal{K}_j$ from which we want to choose the kernel for our kernel-based model. BO will be used to select the kernel with the best closed-loop performance in this set.
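The initialization-evaluation-optimization loop described above can be sketched as follows. All plant, model, and cost details here are illustrative stand-ins, and a seeded random search replaces the actual BO proposal step for brevity:

```python
import math
import random

# Illustrative kernel candidate set K (linear, polynomial, Gaussian).
KERNELS = {
    "linear":     lambda a, b, phi: a * b,
    "polynomial": lambda a, b, phi: (1.0 + a * b) ** 3,
    "gaussian":   lambda a, b, phi: math.exp(-((a - b) ** 2) / phi ** 2),
}

def rollout_cost(kernel_name, phi, n_steps=10, x0=3.0):
    """Task evaluation: run the closed loop once and return the cost (3).

    A toy scalar plant and a kernel-smoother 'model' stand in for the
    real system (1) and the kernel-based model M.
    """
    data = [(x, math.sin(x)) for x in [-2.0, -1.0, 0.0, 1.0, 2.0]]
    k = KERNELS[kernel_name]

    def model(x):  # kernel-weighted prediction from the training pairs
        w = [max(k(x, xi, phi), 1e-9) for xi, _ in data]
        return sum(wi * yi for wi, (_, yi) in zip(w, data)) / sum(w)

    x, cost = x0, 0.0
    for _ in range(n_steps):
        u = -model(x) + 0.5 * x          # control law (2), here feedback linearization
        x = math.sin(x) + u              # unknown toy plant
        cost += x ** 2                   # running cost c(y_k, u_k)
    return cost

# Proposal loop: evaluate candidate (kernel, hyperparameter) pairs and
# keep the best closed-loop cost observed so far.
random.seed(0)
best = min(
    (rollout_cost(name, phi), name, phi)
    for name in KERNELS
    for phi in [random.uniform(0.1, 5.0) for _ in range(20)]
)
```

In the actual framework, the `min` over random proposals is replaced by BO with an acquisition function, which reuses all past evaluations instead of sampling blindly.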
Remark 2: The selection of possible kernels can be based on prior knowledge about the system, e.g. smoothness with the Matérn kernel or the number of equilibria with a polynomial kernel; see [24] and [1] for the respective general properties.

In addition, each kernel depends on a set of hyperparameters. Since the number of hyperparameters could be different for each kernel, we define a set of sets $\mathbb{P} = \{\Phi_1, \ldots, \Phi_{n_K}\}$ such that $\Phi_j \subset \mathbb{R}^{n_{\Phi_j}}$ is a closed set. Here, $n_{\Phi_j}$ represents the number of hyperparameters for the kernel $\mathcal{K}_j$. Moreover, we assume that $\Phi_j$ is a valid hyperparameter set.

Definition 1: The set $\Phi$ is called a hyperparameter set for a kernel function $\mathcal{K}$ iff the set $\Phi$ is a domain for the hyperparameters of $\mathcal{K}$.

For the first evaluation of the closed loop, the kernel-based model function $M$ is created with an initial kernel $\mathcal{K}_j$ of the set $\mathbb{K}$ and hyperparameters $\varphi_j \in \Phi_j$ with $j \in \{1, \ldots, n_K\}$.

Remark 3: One potential way to select the initial kernel and hyperparameters is to set them equal to the kernel and hyperparameters of a prediction model that is optimized with respect to a loss function, e.g. using cross-validation or maximization of the likelihood function [1].

B. Task Evaluation

For the $i$-th task evaluation, BO determines an index value $j \in \{1, \ldots, n_K\}$ and a $\varphi_j \in \Phi_j$. The control law (2) for the kernel-based model $M$, with the determined kernel $\mathcal{K}_j$ and hyperparameters $\varphi_j$, is applied to the system (1):
$$x_{k+1} = f(x_k, h(y_k, M(u_{0:k-1}, y_{0:k}, \mathcal{K}_j, \varphi_j)))$$
$$y_k = g(x_k, u_k)$$
for $k = \{0, \ldots, n-1\}$ with fixed $x_0 \in \mathbb{R}^{n_x}$.

Remark 4: We focus here on a single, fixed initial state $x_0$. However, multiple (close-by) initial states can be considered by using the expected cost across all initial states.

The corresponding input and output sequences $u_{0:k}$ and $y_{0:k}$, respectively, are recorded. Afterwards, the cost function given by $C(y_{0:k}, u_{0:k})$ is evaluated.

C.
Model Optimization

In the next step, we use BO to minimize the cost function with respect to the kernel and its hyperparameters, i.e.
$$[\mathcal{K}_j, \varphi_j] = \arg\min_{j \in \{1, \ldots, n_K\},\ \varphi_j \in \Phi_j} C(y_{0:k}, u_{0:k}). \tag{5}$$
Thus, this problem involves continuous and discrete variables in the optimization task, whereas classical BO assumes continuous variables only. To overcome this restriction, a modified version of BO is used where the kernel function is transformed in a way such that integer-valued inputs are properly included [25]. Based on previous evaluations of the cost function, BO updates the prior and minimizes the acquisition function. The result is a kernel $\mathcal{K}_j$ and hyperparameters $\varphi_j$, which are used in the model function $M(u_{0:k-1}, y_{0:k}, \mathcal{K}_j, \varphi_j)$. Then, the corresponding control law is evaluated again on the system, and the procedure is repeated until a maximum number of task evaluations has been reached or a sufficient performance level has been achieved.

D. Theoretical Analysis

In this section, we show that, under some additional assumptions, the stability of the closed loop is preserved during the task evaluation process and that BO converges to the minimum of the closed-loop cost function. Here, we focus on stationary kernels $\mathcal{K}(x, x') = \mathcal{K}\big((x - x')^\top \Sigma^{-1} (x - x')\big)$, $x, x' \in \mathbb{R}^{n_x}$, with lengthscales $\varphi \in \mathbb{R}^{n_\varphi}_{>0}$ and $\Sigma = \operatorname{diag}(\varphi_1, \ldots, \varphi_{n_x})$. Stationary kernels can always be expressed as a function of the difference between their inputs, and they are a common choice for kernel-based models [1].

Assumption 1: Let $\|f\|_{\mathcal{K}^*_{\varphi^*}} < \infty$, and let the selected control law (2), based on the model $M$ with stationary kernel $\mathcal{K}^*$ and hyperparameters $\varphi^* \in \mathbb{R}^{n_\varphi}_{>0}$, guarantee that $\|y_k\| \leq r_y \in \mathbb{R}_{>0}$ for the given system (1) for $k > n_1 \in \mathbb{N}$.

The first part of the assumption, i.e.
the bounded reproducing kernel Hilbert space (RKHS) norm, is a measure of the smoothness of the function with respect to the kernel $\mathcal{K}^*$ with hyperparameters $\varphi^* \in \mathbb{R}^{n_\varphi}_{>0}$. It is a common assumption for stabilizing controllers using kernel-based methods and is discussed in more detail in [19]. Controllers that satisfy this property for nonlinear, unknown systems are given, e.g., by [6], [19], [26]. The focus on stationary kernels is barely restrictive, as many successfully applied kernels for nonlinear control are stationary.

Lemma 1: With Assumption 1, there exists a non-empty set $\mathbb{K}$ and a hyperparameter set $\Phi_1 \supset \{\varphi^*\}$ such that for all $\mathcal{K}_j \in \mathbb{K}$ and all $\varphi_j \in \Phi_j$, the boundedness $\|y_k\| \leq r_y$ of the system (1) for $k > n_1$ is preserved.

This lemma guarantees that there exists a kernel set $\mathbb{K}$ and a set $\mathbb{P}$ of hyperparameter sets that contains the stabilizing kernel $\mathcal{K}^*$ and the hyperparameters $\varphi^*$ of Assumption 1. Thus, the proposed method can be applied to existing kernel-based control methods without losing achieved guarantees. Before we start with the proof, the following lemma is recalled.

Lemma 2 ([23, Lemma 4]): If $f \in \mathcal{H}_{\mathcal{K}_\varphi}$, then $f \in \mathcal{H}_{\mathcal{K}_{\varphi'}}$ holds for all $0 < \varphi'_i \leq \varphi_i$, $\forall i \in \{1, \ldots, n_\varphi\}$, and
$$\|f\|^2_{\mathcal{K}_{\varphi'}} \leq \left( \prod_{i=1}^{n_\varphi} \frac{\varphi_i}{\varphi'_i} \right) \|f\|^2_{\mathcal{K}_\varphi}.$$

Proof (Lemma 1): Assumption 1 inherently guarantees that at least one kernel $\mathcal{K}_1 = \mathcal{K}^*$ exists that preserves the boundedness of the system, so that we define $\mathbb{K} = \{\mathcal{K}_1\}$. Since Assumption 1 ensures that $\|f\|_{\mathcal{K}^*_{\varphi^*}}$ is bounded, Lemma 2 yields $f \in \mathcal{H}_{\mathcal{K}_1, \varphi}$, and thus $\|f\|_{\mathcal{K}_1, \varphi}$ is bounded, for all $\underline{\varphi}_i \in \mathbb{R}_{>0}$ with $\underline{\varphi}_i < \varphi^*_i$, $\forall i$. For an upper bound, there exist $\bar{\varphi}_i \in \mathbb{R}_{>0}$, $\forall i$, such that $\varphi^*_i < \bar{\varphi}_i$ and $f \in \mathcal{H}_{\mathcal{K}_1, \bar{\varphi}}$, following Lemma 2. Thus, we define the set
$$\Phi_1 = \{\varphi : \underline{\varphi}_i \leq \varphi_i \leq \bar{\varphi}_i,\ \forall i\} \tag{6}$$
as a proper superset of $\{\varphi^*\}$. Based on this set, $\|f\|_{\mathcal{K}_1, \varphi} < \infty$ for all $\varphi \in \Phi_1$, which guarantees the boundedness.
Consequently, with Assumption 1, the stability of the control loop is preserved during the task evaluation. Furthermore, the minimum cost after BO is not worse than the initial cost, as stated in the following.

Corollary 1: Let $C_{cl}$ be the minimum cost (3) after BO (5) with $\mathbb{K} = \{\mathcal{K}_1 = \mathcal{K}^*\}$ and $\Phi_1$ of (6). Let $C_{ol}$ be the initial cost based on the control with kernel $\mathcal{K}^*$ and hyperparameters $\varphi^*$. Then $C_{cl} \leq C_{ol}$ holds.

Proof: Since $C_{cl}$ is the minimum cost after BO that starts with the initial, data-based selected kernel $\mathcal{K}^*$ and hyperparameters $\varphi^*$, it clearly follows that $C_{cl} \leq C_{ol}$ because $\mathcal{K}^* \in \mathbb{K}$ and $\varphi^* \in \Phi_1$.

We now show that BO can converge to the global minimum of the cost function $C$ under specific conditions, starting with the following assumption.

Assumption 2: The RKHS norm of the cost function is bounded, i.e. $\|C\|_{\mathcal{K}} \leq r \in \mathbb{R}_{>0}$, with respect to the kernel $\mathcal{K}$ of the GP that is used as prior $C \sim \mathcal{GP}(0, \mathcal{K})$ in the Bayesian optimization (5).

Intuitively, Assumption 2 states that the kernel of the GP for BO is selected such that the GP can properly approximate the cost function. This sounds paradoxical, since the cost function is unknown because of the unknown system behavior. However, there exist some kernels, so-called universal kernels, which can approximate any continuous function arbitrarily precisely [27, Lemma 4.55].

Lemma 3 ([28]): With Assumption 2, BO in (5) with the upper confidence bound acquisition function [28, Eq. (6)] converges with high probability to the global minimum of $C$.

V. EVALUATION

In this section, we present a simple illustrative example that highlights our closed-loop model selection approach for kernel-based models. In addition, an example with a 3-DoF robot demonstrates the applicability of the proposed approach to hardware testbeds.
BO is used with expected-improvement-plus as the acquisition function, using a GP as prior, because of its satisfactory performance in practical applications, see [23].

A. Simulation

Consider the following one-dimensional system
$$x_{k+1} = \exp\!\left(-\tfrac{1}{100} x_k^2\right) \sin(x_k) + \tfrac{1}{3} x_k + u_k, \quad y_k = x_k \tag{7}$$
with state $x_k$ and control $u_k$ at time $k$. For the purpose of this example, we assume that the system dynamics in (7) are unknown, yet we wish to avoid a high-gain control approach due to its unfavorable properties [29], and use the proposed closed-loop model selection framework to optimize the control performance. As control law, a feedback linearization
$$u_k = -\hat{f}(x_k \mid M, D) + \tfrac{1}{2} x_k \tag{8}$$
is applied with the prediction $\hat{f}$ of a support vector machine model $M$. The data set $D$ consists of 11 homogeneously distributed training pairs $\{x^j_k, x^j_{k+1}\}_{j=1}^{11}$ of the system (7) in the interval $x_k \in [-10, 10]$ with $u_k = 0$. The linear, polynomial (cubic), and Gaussian kernels are selected as possible kernel candidates; see Table I for details. The Gaussian kernel possesses one hyperparameter $\varphi_1$, which is a scaling factor for the data. In addition, the regression of the SVM depends on a hyperparameter $\varphi_2$ that defines the smoothness of the prediction and affects the number of support vectors, see [30].

First, we evaluate a classical, data-based procedure which optimizes the kernel and the hyperparameters with respect to the cross-validation loss function [27] based on the training data only. Using BO, a minimum loss of 0.9127 is found using the linear kernel with $\varphi_2 = 0.0336$, see Table II. Using this linear model in the control loop with the nonlinear system (7) and control law (8) for $x_0 = 3$, the control error remains above zero, see Fig. 2. With the cost function $C = \sum_{k=0}^{9} x_k^2$, the trajectory generates a cost of 204.4769. In comparison, the hyperparameters and the kernel are optimized with the proposed method.
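To make the structure of this simulation concrete, the following sketch reimplements the plant (7), the data collection, and the feedback-linearizing law (8) in plain Python. A Gaussian-kernel ridge regression stands in for the paper's SVM model; the stand-in model, the regularization value, and the test hyperparameters are our assumptions:

```python
import math

def plant(x, u):
    # System (7): x_{k+1} = exp(-x^2/100) sin(x) + x/3 + u
    return math.exp(-x ** 2 / 100.0) * math.sin(x) + x / 3.0 + u

# 11 evenly spaced training pairs (x_k, x_{k+1}) in [-10, 10] with u = 0.
X = [-10 + 2.0 * j for j in range(11)]
Y = [plant(x, 0.0) for x in X]

def gaussian_kernel(a, b, phi1):
    return math.exp(-((a - b) ** 2) / phi1 ** 2)

def fit_predict(x_star, phi1, reg=1e-3):
    """Kernel ridge prediction f_hat(x*), a stand-in for the SVM model M."""
    n = len(X)
    K = [[gaussian_kernel(X[i], X[j], phi1) + (reg if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    # Solve K alpha = Y by Gaussian elimination with partial pivoting.
    A = [row[:] + [Y[i]] for i, row in enumerate(K)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(c + 1, n):
            m = A[r][c] / A[c][c]
            for j in range(c, n + 1):
                A[r][j] -= m * A[c][j]
    alpha = [0.0] * n
    for r in range(n - 1, -1, -1):
        alpha[r] = (A[r][n] - sum(A[r][j] * alpha[j]
                                  for j in range(r + 1, n))) / A[r][r]
    return sum(alpha[j] * gaussian_kernel(x_star, X[j], phi1) for j in range(n))

def closed_loop_cost(phi1, x0=3.0, n_steps=10):
    """Cost C = sum_k x_k^2 of the loop (7)-(8) over one task evaluation."""
    x, cost = x0, 0.0
    for _ in range(n_steps):
        u = -fit_predict(x, phi1) + 0.5 * x   # feedback linearization (8)
        x = plant(x, u)
        cost += x ** 2
    return cost
```

With a well-scaled lengthscale (e.g. near the value 2.333 reported for the closed-loop optimum), the model approximately cancels the nonlinearity and the state decays; with a badly scaled lengthscale, the prediction collapses off the training grid and the cost stays large.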
For this purpose, we evaluate the performance of the closed-loop system and use BO to compute the next promising kernel and hyperparameter combination. Figure 3 shows the mean and standard deviation of 20 repetitions of 50 trials each. The repetitions are run since the BO exploration of the cost is also affected by randomness. The cost is reduced to a mean value of $C = 16.410$, while the loss is 2.491. Figure 2 shows that the regression is more accurate, which results in a reduced control error. Table II also presents the results of adding the data collected during all 50 trials to the existing data to redefine the model (Data-based AT). Even with more training data, the data-based optimization favors the linear kernel.

1) Discussion: The example demonstrates that optimization based on the training data only can lead to a reduced performance of the closed-loop system. Table II clearly shows that the data-based optimization results in a smaller loss with the linear kernel but generates a higher cost of the closed-loop system. In comparison, the closed-loop optimization finds a set of hyperparameters with the Gaussian kernel that significantly reduces the control error even if the loss of the model is higher. Thus, especially in the case of sparse data, the data-based optimization can misinterpret the data, which can be avoided with the closed-loop model selection. We observe that at the beginning of the closed-loop optimization, BO switches a lot between the kernels, and towards the end, it focuses on the hyperparameters. Using the data obtained during the 50 trials to refine the model in a data-based manner only slightly improves the performance but heavily increases the computation time of the kernel-based model due to the larger training data set.

TABLE I: Kernel candidates
  Linear:              K(x, x') = x^T x'
  Polynomial (cubic):  K(x, x') = (1 + x^T x')^3
  Gaussian:            K(x, x') = exp(-||x - x'||^2 / phi_1^2), phi_1 in R

TABLE II: Comparison between data-based, data-based with additional training data (AT), and closed-loop optimization
  Method         Selected kernel   phi_1, phi_2    Loss     Cost
  Data-based     Linear            -, 0.034        0.913    204.477
  Data-based AT  Linear            -, 0.301        0.09     199.634
  Closed-loop    Gaussian          2.333, 0.013    2.491    16.410

Fig. 2. Control error (top) and system model (bottom) using closed-loop model optimization for 20 repetitions with mean and 5-sigma deviation (blue) and data-based model selection (red).

Fig. 3. Minimum of the cost function over the number of trials for 20 repetitions of the closed-loop model selection algorithm.

B. Experiment

1) Setup: For the experimental evaluation, we use the 3-DoF SCARA robot CARBO as pictured in Fig. 4. The links between the joints have a length of 0.3 m, and a spoon is attached at the end effector of the robot. The goal is to follow a given trajectory as precisely as possible without using high feedback gains, which might result in several practical disadvantages, see [31]. Therefore, a precise model of the system's dynamics is necessary. Since modeling the nonlinear fluid dynamics with a parametric model would be very time consuming, we use a computed torque control method based on a GP model, which allows high-performance tracking control while also being able to guarantee the stability of the control loop [26]. An underlying low-level PD controller enforces the generated torque by regulating the voltage based on a measurement of the current. The controller is implemented in MATLAB/Simulink on a Linux real-time system with a sample rate of 1 ms. For the implementation of the GP model, we use the GPML toolbox^1.
The desired trajectory follows a circular stirring movement through the fluid with a frequency of 0.5 Hz.

Modeling: Here, we use a Gaussian process model $M$ as the kernel-based model, based on 223 collected training points. The data is collected around the desired trajectory using a high-gain controller. The placement of the training points heavily influences the control performance; however, the proposed approach focuses on improving the performance based on existing data. Each data pair consists of the position and velocity of all joints, $[q, \dot{q}]^\top$, and the corresponding torque for the $i$-th joint, $\tau_i$. Since a GP produces one-dimensional outputs only, 3 GPs are used in total for the modeling of the robot's dynamics. Each GP $i = 1, \ldots, 3$ uses a squared exponential kernel
$$\mathcal{K}(x, x') = \varphi_i^2 \exp\!\left(-\frac{\|x - x'\|^2}{\varphi_{i+3}^2}\right), \quad \varphi_i \in \mathbb{R} \setminus \{0\},$$
which can approximate any continuous function arbitrarily exactly. With $\varphi = [\varphi_1, \ldots, \varphi_6]$ and the signal noise $\sigma_n \in \mathbb{R}^3$, see [20], a total number of 9 parameters must be optimized. In contrast to the simulation, the kernel is fixed to reduce the optimization space and thus the number of task evaluations.

Control law: The control input, i.e. the torque $\tau(\dot{q}, q)$ for all joints, is generated based on an estimated parametric model and the mean prediction $\mu$ of the GP model as feed-forward component, and a low-gain PD-feedback part:
$$\tau_d = \hat{H} \ddot{q}_d + \hat{C} \dot{q}_d + \hat{g} + \mu(\dot{q}, q \mid M) - K_d \dot{e} - K_p e.$$
Here, the desired trajectory is given by $q_d$, $\dot{q}_d$, and $\ddot{q}_d$, with the errors $\dot{e} = \dot{q}_d - \dot{q}$ and $e = q_d - q$. The feedback matrices are given by $K_p = \operatorname{diag}([60, 40, 10])$ and $K_d = \operatorname{diag}([1, 1, 0.4])$. The estimated parametric model is derived from a mathematical model where the parameters are physically measured. For the discretization of the control input, a zero-order hold is used.

^1 http://www.gaussianprocess.org/gpml/code

Fig. 4. Stirring with the 3-DoF SCARA robot CARBO.
For more details see [26].

2) Evaluation: The evaluation of the closed-loop performance is based on the cost function

C = (1/2000) Σ_{k=0}^{2000} e(kT)ᵀ e(kT),

with T = 1 ms. Therefore, the cost function is a measure of the tracking accuracy of the stirring movement. We consider as kernel candidate the squared exponential kernel, such that only the hyperparameters σₙ, ϕ are optimized. Table III shows the comparison between the data-based and the closed-loop optimization. In the data-based case, the hyperparameters are optimized with a gradient method to minimize the log-likelihood function (in this case, BO of the hyperparameters results in the same values). In contrast, BO is used to minimize the tracking error in the closed-loop optimization. The initial values of the hyperparameters are set to the values of the data-based optimization. The bounds are defined as 0.5 and 2 times the initial values. The evolution of the minimum cost over the trials, where each trial is a single stirring movement, is shown in Fig. 5.

TABLE III
COMPARISON: DATA-BASED AND CLOSED-LOOP OPTIMIZATION

Value                  Data-based            Closed-loop
σₙ                     [0.10, 3e−3, 6e−4]    [0.20, 4e−3, 3e−4]
ϕ₁,₂,₃                 [3.49, 1.42, 2.87]    [2.61, 1.68, 5.70]
ϕ₄,₅,₆                 [1.21, 0.25, 0.27]    [0.80, 0.27, 0.29]
Log. likelihood        [89, −121, −176]      [115, −113, −136]
Cost (tracking error)  1.49                  1.05

Fig. 5. Minimum of the cost function over the number of trials.

Fig. 6. Comparison of the root square position error of all joints.

The comparison of the joint position error for the data-based and closed-loop optimization is shown in Fig. 6.
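The tracking cost and the hyperparameter box bounds described above can be sketched as follows. This is an illustrative sketch under our own naming; the array shapes and function names are assumptions, not the authors' code.

```python
import numpy as np

T = 1e-3  # sample time T = 1 ms

def tracking_cost(e):
    """C = (1/2000) * sum_{k=0}^{2000} e(kT)^T e(kT).
    `e` holds the per-sample error vectors, shape (2001, n_joints)."""
    return float(np.einsum('kn,kn->', e, e)) / 2000.0

def bo_bounds(theta_init):
    """Box bounds for the BO search: 0.5x and 2x the initial
    (data-based) hyperparameter values, as described in the text."""
    theta_init = np.asarray(theta_init, dtype=float)
    return 0.5 * theta_init, 2.0 * theta_init
```

One stirring movement yields one evaluation of C, so each BO trial in Fig. 5 corresponds to a single 2 s rollout of the closed loop.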
3) Discussion: After 100 trials, the tracking error is decreased by 30% through the optimization of the Gaussian process model only. Even though the resulting hyperparameters are sub-optimal with respect to the likelihood function, see Table III, the performance of the closed loop is significantly improved. In comparison to collecting more training data to improve the model, the proposed method does not increase the computational burden of the Gaussian process prediction, which is often critical in real-time applications. Since only the model is adapted, the properties of the closed-loop control architecture are also preserved.

CONCLUSION

In this paper, we present a framework for model selection for kernel-based models that directly optimizes the overall closed-loop control performance. For this purpose, the kernel and its hyperparameters are optimized using Bayesian optimization with respect to a cost function that evaluates the performance of the closed loop. It is shown that this approach allows preserving the control architecture properties, as only the model is adapted. Simulations and hardware experiments demonstrate the advantages of the proposed approach over data-based model selection techniques.

ACKNOWLEDGMENTS

The research has received funding from the ERC Starting Grant "Con-humo" no. 337654 and BaCaTec grant 9-[2018/1].

REFERENCES

[1] C. M. Bishop et al., Pattern Recognition and Machine Learning, vol. 4. Springer New York, 2006.
[2] P. Abbeel, M. Quigley, and A. Y. Ng, "Using inaccurate models in reinforcement learning," in International Conference on Machine Learning, 2006.
[3] M. Gevers, "Identification for control: From the early achievements to the revival of experiment design," European Journal of Control, vol. 11, pp. 1–18, 2005.
[4] H. Hjalmarsson, M. Gevers, and F. De Bruyne, "For model-based control design, closed-loop identification gives better performance," Automatica, vol. 32, no.
12, pp. 1659–1673, 1996.
[5] L. Ljung, "System identification," in Signal Analysis and Prediction, pp. 163–173, Springer, 1998.
[6] G. Chowdhary, J. How, and H. Kingravi, "Model reference adaptive control using nonparametric adaptive elements," in Conference on Guidance Navigation and Control, 2012.
[7] J. Umlauft, T. Beckers, M. Kimmel, and S. Hirche, "Feedback linearization using Gaussian processes," in IEEE Conference on Decision and Control, 2017.
[8] T. Beckers, J. Umlauft, and S. Hirche, "Stable model-based control with Gaussian process regression for robot manipulators," in IFAC World Congress, 2017.
[9] K. S. Narendra and K. Parthasarathy, "Identification and control of dynamical systems using neural networks," IEEE Transactions on Neural Networks, vol. 1, no. 1, pp. 4–27, 1990.
[10] S. Bansal, A. K. Akametalu, F. J. Jiang, F. Laine, and C. J. Tomlin, "Learning quadrotor dynamics using neural network for flight control," in IEEE Conference on Decision and Control, 2016.
[11] K. J. Åström and B. Wittenmark, Adaptive Control. Courier Corporation, 2013.
[12] D. Clarke, P. Kanjilal, and C. Mohtadi, "A generalized LQG approach to self-tuning control part I. Aspects of design," International Journal of Control, vol. 41, no. 6, pp. 1509–1523, 1985.
[13] D. A. Bristow, M. Tharayil, and A. G. Alleyne, "A survey of iterative learning control," IEEE Control Systems Magazine, vol. 26, no. 3, pp. 96–114, 2006.
[14] F. L. Lewis and D. Liu, Reinforcement Learning and Approximate Dynamic Programming for Feedback Control, vol. 17. John Wiley & Sons, 2013.
[15] R. Calandra, A. Seyfarth, J. Peters, and M. P. Deisenroth, "Bayesian optimization for learning gaits under uncertainty," Annals of Mathematics and Artificial Intelligence, vol. 76, no. 1, pp. 5–23, 2015.
[16] B.
Recht, "A tour of reinforcement learning: The view from continuous control," Annual Review of Control, Robotics, and Autonomous Systems, 2018.
[17] S. Bansal, R. Calandra, T. Xiao, S. Levine, and C. J. Tomlin, "Goal-driven dynamics learning via Bayesian optimization," in IEEE Conference on Decision and Control, 2018.
[18] J. A. Suykens, J. Vandewalle, and B. De Moor, "Optimal control by least squares support vector machines," Neural Networks, vol. 14, no. 1, pp. 23–35, 2001.
[19] F. Berkenkamp, R. Moriconi, A. P. Schoellig, and A. Krause, "Safe learning of regions of attraction for uncertain, nonlinear systems with Gaussian processes," in IEEE Conference on Decision and Control, 2016.
[20] C. E. Rasmussen and C. K. Williams, Gaussian Processes for Machine Learning, vol. 1. MIT Press, Cambridge, 2006.
[21] B. Shahriari, K. Swersky, Z. Wang, R. P. Adams, and N. De Freitas, "Taking the human out of the loop: A review of Bayesian optimization," Proceedings of the IEEE, vol. 104, no. 1, pp. 148–175, 2016.
[22] J. Mockus, Bayesian Approach to Global Optimization: Theory and Applications, vol. 37. Springer Science & Business Media, 2012.
[23] A. D. Bull, "Convergence rates of efficient global optimization algorithms," Journal of Machine Learning Research, vol. 12, no. Oct, pp. 2879–2904, 2011.
[24] T. Beckers and S. Hirche, "Equilibrium distributions and stability analysis of Gaussian process state space models," in IEEE Conference on Decision and Control, 2016.
[25] E. C. Garrido-Merchán and D. Hernández-Lobato, "Dealing with integer-valued variables in Bayesian optimization with Gaussian processes," arXiv preprint, 2017.
[26] T. Beckers, D. Kulić, and S. Hirche, "Stable Gaussian process based tracking control of Euler-Lagrange systems," Automatica, vol. 103, pp. 390–397, 2019.
[27] I. Steinwart and A. Christmann, Support Vector Machines. Springer Science & Business Media, 2008.
[28] N. Srinivas, A. Krause, S. M. Kakade, and M. W.
Seeger, "Information-theoretic regret bounds for Gaussian process optimization in the bandit setting," IEEE Transactions on Information Theory, vol. 58, no. 5, pp. 3250–3265, 2012.
[29] A. Isidori, Nonlinear Control Systems. Springer Science & Business Media, 2013.
[30] V. Kecman, Learning and Soft Computing: Support Vector Machines, Neural Networks, and Fuzzy Logic Models. MIT Press, 2001.
[31] D. Nguyen-Tuong, M. Seeger, and J. Peters, "Computed torque control with nonparametric regression models," in American Control Conference, 2008.