Informative Planning and Online Learning with Sparse Gaussian Processes


Authors: Kai-Chieh Ma, Lantao Liu, Gaurav S. Sukhatme

Abstract: A big challenge in environmental monitoring is the spatiotemporal variation of the phenomena to be observed. To enable persistent sensing and estimation in such a setting, it is beneficial to have a time-varying underlying environmental model. Here we present a planning and learning method that enables an autonomous marine vehicle to perform persistent ocean monitoring tasks by learning and refining an environmental model. To alleviate the computational bottleneck caused by the large-scale accumulated data, we propose a framework that iterates between a planning component aimed at collecting the most information-rich data, and a sparse Gaussian Process learning component where the environmental model and hyperparameters are learned online by taking advantage of only a subset of data that provides the greatest contribution. Our simulations with ground-truth ocean data show that the proposed method is both accurate and efficient.

I. INTRODUCTION AND RELATED WORK

Scientists are able to gain a greater understanding of environmental processes (e.g., physical, chemical, or biological parameters) through environmental sensing and monitoring [7]. However, many environmental monitoring scenarios involve a large environmental space and require a considerable amount of work for collecting the data. Increasingly, a variety of autonomous robotic systems, including marine vehicles [8], aerial vehicles [27], and ground vehicles [26], are designed and deployed for environmental monitoring in order to replace the conventional method of deploying static sensors to areas of interest [16].
In particular, autonomous underwater vehicles (AUVs) such as marine gliders are becoming popular due to their long-range (hundreds of kilometers) and long-term (weeks, even months) monitoring capabilities [10, 14, 19]. We are interested in the problem of collecting data about a scalar field of important environmental attributes, such as temperature, salinity, or chlorophyll content of the ocean, and learning a model that best describes the environment (i.e., the levels or contents of the chosen attribute at every spot in the entire field). However, the unknown environmental phenomena that we are interested in can be non-stationary [18]. Fig. 1 shows the variations of salinity data in the Southern California Bight region generated by the Regional Ocean Modeling System (ROMS) [23]. In order to provide a good estimate of the state of the environment and maintain the prediction model at any time, the environmental sensing (information gathering) needs to be carried out persistently to keep up with possible variations [13].

The authors are with the Department of Computer Science at the University of Southern California, Los Angeles, CA 90089, USA. {kaichiem, lantao.liu, gaurav}@usc.edu

Fig. 1. Ocean salinity data in the Southern California Bight region on (a) Aug 15, 2016; (b) Aug 22, 2016; (c) Aug 29, 2016, generated by the Regional Ocean Modeling System (ROMS) [23]. Color indicates levels of salinity content.

We aim at estimating the current state of the environment and providing a nowcast (not a forecast or hindcast) of the environment by navigating robots to collect the information. To model spatial phenomena, a common approach in spatial statistics is to use the rich class of Gaussian Processes [18, 22, 24]. In this work, we also employ this broadly-adopted approach to build and learn an underlying model of interest.
Still, there are challenges:
• The first challenge lies in model learning with the most useful sensing inputs; i.e., we wish to seek the samples that best describe the environment. Navigating the robot to obtain such samples is called informative planning [2]. In this work, we use the mutual information between visited locations and the remainder of the space to characterize the amount of information (information gain) collected.
• The second challenge is the relaxation of the prohibitive computational cost of model prediction. The most accurate way to estimate a latent model is to use all historical sensing data. However, since the environmental monitoring task can be long-range and long-term, the data size continuously grows until it "explodes". Consequently, an efficient estimator needs to dynamically select only the most information-rich data while abandoning samples that are less informatively novel.

Planning and environment monitoring are two big and well-studied topics. Here we briefly review the works related to informative planning and model prediction with sparse GPs. Representative informative planning approaches include, for example, algorithms based on a recursive-greedy style [13, 24], where informativeness is generalized as submodular functions and a sequential-allocation mechanism is designed to obtain subsequent waypoints. This recursive-greedy framework was later extended by incorporating obstacle avoidance [1] and diminishing returns [2]. In addition, a differential entropy based framework [4, 11] was proposed in which a batch of waypoints can be obtained through dynamic programming. We recently proposed a similar informative planning method based on the dynamic programming structure to compute the informative waypoints [12].
This method is further extended here as an adaptive path-planning component by incorporating online learning and re-planning mechanisms. There are also many methods optimizing over complex deterministic and static information (e.g., see [25, 28]).

A critical problem one must consider for persistent (long-term, even life-long) tasks is the large-scale accumulated data. Although abundant data might predict the most accurate model, in practice a huge amount of data is very likely to exceed the capacity of onboard computational hardware. Methods for reducing the computing burden of GPs have been proposed. For example, GP regression can be done in a real-time fashion where the problem is estimated locally with local data [15]. Another representative framework is a sparse representation of the GP model [5], based on a combination of a Bayesian online algorithm with a sequential construction of the most relevant subset of the data. This method allows the model to be refined recursively as the data streams in. The framework has been further extended to many application domains such as visual tracking [21].

We propose an informative planning and online learning approach for long-term environmental monitoring. The objective is to construct an estimated model by navigating the robot to the most informative regions to collect the data with the greatest information. Our method integrates the sparse variant of GPs so that both the model and the hyperparameters can be improved online with a dynamic but fixed-size subset of data. The ameliorated environment model is in turn used to improve the planning component at appropriate re-planning moments. We conducted simulations on ocean salinity data, and the results show that the predicted model matches the patterns of the ground-truth model very well.

II. PRELIMINARIES

In this section, we briefly present the preliminary background for GP-based environmental modeling.

A. Gaussian Process Regression on Spatial Data

A GP is defined as a collection of random variables, any finite number of which have a joint Gaussian distribution. A GP's prediction behavior is determined by the prior covariance function (also known as the kernel) and the training points. The prior covariance function describes the relation between two independent data points and typically comes with some free hyperparameters to control that relation. Formally, let X be the set of n training points associated with target values y, and let X_* be the testing points. The predictive equations of GP regression can be summarized as:

f_* | X, y, X_* \sim \mathcal{N}(\bar{f}_*, \mathrm{cov}(f_*))
\bar{f}_* \triangleq \mathbb{E}[f_* | X, y, X_*] = K(X_*, X) K(X, X)^{-1} y
\mathrm{cov}(f_*) = K(X_*, X_*) - K(X_*, X) K(X, X)^{-1} K(X, X_*)    (1)

where K(·,·) denotes a covariance matrix. For example, K(X, X_*) is evaluated by a pre-selected kernel function for all pairwise data points in X and X_*. A widely adopted choice of kernel function for spatial data is the squared exponential automatic relevance determination function:

k(x, x') = \sigma_f^2 \exp\left(-\tfrac{1}{2}(x - x')^T M (x - x')\right) + \sigma_n^2 \delta_{xx'}    (2)

where M = \mathrm{diag}(l)^{-2}. The parameters l are the length-scales in each dimension of x and determine the level of correlation (each l_i models the degree of smoothness of the spatial variation of the measurements in the i-th dimension of the feature vector x). \sigma_f^2 and \sigma_n^2 denote the variances of the signal and noise, respectively. \delta_{xx'} is the Kronecker delta, which is 1 if x = x' and zero otherwise.

B. Estimation of Hyperparameters Using Training Data

Let \theta \triangleq \{\sigma_n^2, \sigma_f^2, l\} be the set of hyperparameters of the kernel function.
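As a concrete illustration of Eqs. (1)-(2), the prediction step can be sketched in a few lines of NumPy. This is a minimal sketch, not the paper's implementation (which builds on libgp); the inputs and hyperparameter values below are toy placeholders.

```python
import numpy as np

def se_ard_kernel(A, B, ls, sf2):
    """Noise-free part of the squared-exponential ARD kernel, Eq. (2)."""
    d = (A[:, None, :] - B[None, :, :]) / ls   # pairwise length-scaled differences
    return sf2 * np.exp(-0.5 * np.sum(d**2, axis=-1))

def gp_predict(X, y, Xs, ls, sf2, sn2):
    """Predictive mean and covariance of Eq. (1)."""
    K = se_ard_kernel(X, X, ls, sf2) + sn2 * np.eye(len(X))  # K(X, X) with noise
    Ks = se_ard_kernel(Xs, X, ls, sf2)                       # K(X*, X)
    Kss = se_ard_kernel(Xs, Xs, ls, sf2)                     # K(X*, X*)
    mean = Ks @ np.linalg.solve(K, y)
    cov = Kss - Ks @ np.linalg.solve(K, Ks.T)
    return mean, cov

# toy one-dimensional example (placeholder values, not the paper's data)
X = np.array([[0.0], [1.0], [2.0]])
y = np.array([0.0, 1.0, 0.5])
mean, cov = gp_predict(X, y, np.array([[1.5]]),
                       ls=np.array([1.0]), sf2=1.0, sn2=0.1)
```

The posterior variance at the test point is smaller than the prior variance sf2, reflecting the information gained from the three training samples.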
We are interested in estimating these hyperparameters so that the kernel function can describe the underlying phenomena as accurately as possible. A common approach to learning the set of hyperparameters is maximum likelihood estimation combined with k-fold cross-validation (CV) [22]. An extreme case of k-fold cross-validation is k = n, the number of training points, also known as leave-one-out cross-validation (LOO-CV). Mathematically, the log-likelihood when leaving out training case i is

\log p(y_i | X, y_{-i}, \theta) = -\tfrac{1}{2}\log\sigma_i^2 - \frac{(y_i - \mu_i)^2}{2\sigma_i^2} - \tfrac{1}{2}\log 2\pi    (3)

where y_{-i} denotes all targets in the training set except the one with index i, and \mu_i and \sigma_i^2 are calculated according to Eq. (1). The log-likelihood of LOO is therefore

L_{LOO}(X, y, \theta) = \sum_{i=1}^{n} \log p(y_i | X, y_{-i}, \theta).    (4)

Notice that each of the |y| LOO-CV iterations requires a matrix inverse, K^{-1}, which is costly if computed repeatedly. These quantities can instead be computed efficiently from the inverse of the complete covariance matrix using inversion by partitioning [20]. The resulting predictive mean and variance can then be formulated as

\mu_i = y_i - [K^{-1} y]_i / [K^{-1}]_{ii}, \qquad \sigma_i^2 = 1 / [K^{-1}]_{ii}.    (5)

To obtain the optimal values of the hyperparameters \theta, we can compute the partial derivatives of L_{LOO} and use conjugate gradient optimization techniques. The partial derivatives of L_{LOO} are

\frac{\partial L_{LOO}}{\partial \theta_j} = \sum_{i=1}^{n} \frac{1}{[K^{-1}]_{ii}} \left( \alpha_i [Z_j \alpha]_i - \tfrac{1}{2}\left(1 + \frac{\alpha_i^2}{[K^{-1}]_{ii}}\right) [Z_j K^{-1}]_{ii} \right),    (6)

where \alpha = K^{-1} y and Z_j = K^{-1} \frac{\partial K}{\partial \theta_j}. With the standard gradient ascent method, we update each \theta_j iteratively:

\theta_j^{(t+1)} = \theta_j^{(t)} + \eta \frac{\partial L_{LOO}}{\partial \theta_j^{(t)}},    (7)

where \eta is the learning rate.

III. TECHNICAL APPROACH

As aforementioned, one limitation of GPs for long-term missions is the memory requirement for large (possibly infinite) training sets.
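Before describing how we bound that memory, note that the LOO-CV machinery of Sec. II-B (Eqs. (3)-(5)) admits a compact sketch: a single matrix inverse yields all n leave-one-out predictive means and variances. The covariance matrix and targets below are toy placeholders.

```python
import numpy as np

def loo_log_likelihood(K, y):
    """LOO-CV log-likelihood of Eqs. (3)-(5): one inverse of the full
    covariance matrix gives every leave-one-out mean and variance."""
    Kinv = np.linalg.inv(K)
    diag = np.diag(Kinv)                   # [K^-1]_ii
    mu = y - (Kinv @ y) / diag             # Eq. (5), LOO predictive means
    var = 1.0 / diag                       # Eq. (5), LOO predictive variances
    return np.sum(-0.5 * np.log(var)
                  - (y - mu)**2 / (2 * var)
                  - 0.5 * np.log(2 * np.pi))   # sum of Eq. (3), i.e. Eq. (4)

# toy covariance matrix and targets (placeholders)
K = np.array([[1.1, 0.5],
              [0.5, 1.1]])
y = np.array([0.2, -0.1])
L = loo_log_likelihood(K, y)
```

For n = 2 the result can be checked directly against the naive leave-one-out computation, which is what the inversion-by-partitioning identity guarantees.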
In our system, we borrow the idea of the Sparse Online Gaussian Process (SOGP) [5] to overcome this limitation. The method combines a Bayesian online algorithm with a sequential construction of a relevant subsample of the data that best describes the latent model.

A. Online Learning with Gaussian Processes

Given a prior GP \hat{p}_t(f) at time t, when a new data point (x_{t+1}, y_{t+1}) arrives at time t+1, it is incorporated by performing a Bayesian update to yield a posterior:

p_{post}(f) = \frac{p(y_{t+1} | f)\, \hat{p}_t(f)}{\mathbb{E}_{\hat{p}_t(f)}[p(y_{t+1} | f_D)]},    (8)

where f = [f(x_1), \ldots, f(x_M)]^T denotes a set of function values, and f_D \subseteq f is the set of f(x_i) = f_i with x_i in the training set. In general, p_{post}(f) is no longer Gaussian unless the likelihood itself is also Gaussian. Therefore, p_{post}(f) is projected onto the closest GP, \hat{p}_{t+1} = \arg\min_{\hat{p}} \mathrm{KL}(p_{post}(f) \| \hat{p}), where KL denotes the Kullback-Leibler divergence, which measures the difference between two probability distributions. It is shown in [17] that this projection results in a good matching of the first two moments (mean and covariance) of p_{post} and the new Gaussian posterior \hat{p}_{t+1}. Following the lemma of [6], we arrive at the parametrization of the approximate posterior GP at time t as a function of the kernel and likelihoods (natural parametrization):

\bar{f}_* = \sum_{i=1}^{t} k(x_*, x_i)\, \alpha_t(i) = \alpha_t^T k_{x_*,t}
\mathrm{var}(f_*) = k(x_*, x_*) + \sum_{i,j=1}^{t} k(x_*, x_i) [C_t]_{ij} k(x_j, x_*) = k(x_*, x_*) + k_{x_*,t}^T C_t k_{x_*,t}    (9)

where k_{x_*,t} = [k(x_1, x_*), \ldots, k(x_t, x_*)]^T, and \alpha_t and C_t are updated using

\alpha_t = T_t(\alpha_{t-1}) + q_t s_t
C_t = U_t(C_{t-1}) + r_t s_t s_t^T
s_t = T_t(C_{t-1} k_{x_t,t-1}) + e_t
q_t = \frac{\partial}{\partial\, \mathbb{E}_{\hat{p}_{t-1}(f)}[f_t]} \log \mathbb{E}_{\hat{p}_{t-1}(f)}[p(y_t | f_t)]
r_t = \frac{\partial^2}{\partial\, \mathbb{E}_{\hat{p}_{t-1}(f)}[f_t]^2} \log \mathbb{E}_{\hat{p}_{t-1}(f)}[p(y_t | f_t)]    (10)

where e_t is the t-th unit vector. The operator T_t (respectively U_t) extends a (t-1)-dimensional vector (matrix) to a t-dimensional one by appending a zero at the end of the vector (zeros at the last row and column of the matrix). For regression with Gaussian noise (variance \sigma_0^2), the expected likelihood is a normal distribution with mean \bar{f}_* and variance \mathrm{var}(f_*) + \sigma_0^2. Hence, the logarithm of the expected likelihood is

\log \mathbb{E}_{\hat{p}_{t-1}(f)}[p(y_t | f_t)] = -\tfrac{1}{2} \log\left[2\pi(\mathrm{var}(f_*) + \sigma_0^2)\right] - \frac{(y_t - \bar{f}_*)^2}{2(\mathrm{var}(f_*) + \sigma_0^2)},    (11)

and its first and second derivatives with respect to the mean \bar{f}_* give the scalars

q_t = \frac{y_t - \bar{f}_*}{\mathrm{var}(f_*) + \sigma_0^2}, \qquad r_t = -\frac{1}{\mathrm{var}(f_*) + \sigma_0^2}.    (12)

B. Sparseness in Gaussian Processes

To prevent the unbounded growth of the memory requirement as data accumulate, it is necessary to limit the number of training points stored in a basis vector set (BV-set), while preserving the predictive accuracy of the model. This is done in two stages. First, when the new training point (x_{t+1}, y_{t+1}) arrives at time t+1, we calculate the squared norm of the "residual vector" of its projection onto the space spanned by the current BV-set. Denoting this quantity \gamma_{t+1}, we have

\gamma_{t+1} = k(x_{t+1}, x_{t+1}) - k_{x_{t+1},t}^T Q_t k_{x_{t+1},t},    (13)

where Q_t = K(X_t, X_t)^{-1} is the inverse of the full kernel matrix.
The costly matrix inversion can be alleviated via the following recursive updates:

Q_t = U_t(Q_{t-1}) + \gamma_t^{-1} \left(T_t(\hat{e}_t) - e_t\right)\left(T_t(\hat{e}_t) - e_t\right)^T, \qquad \hat{e}_t = Q_{t-1} k_{x_t,t-1}.    (14)

Essentially, \gamma_{t+1} can be thought of as a measure of "novelty" of the new training point (x_{t+1}, y_{t+1}). The point is therefore included in the BV-set only if \gamma_{t+1} exceeds a predefined threshold \omega. Otherwise, only an update of \hat{s}_{t+1} is necessary:

\hat{s}_{t+1} = C_t k_{x_{t+1},t} + \hat{e}_{t+1}.    (15)

Second, when the size of the BV-set exceeds the memory limit (or any pre-defined limit) m, a score measure is used to pick out the lowest-scoring element and remove it from the existing BV-set. Formally, let \varepsilon_i be the score of the i-th element in the BV-set; it measures the change in the expected posterior mean of a sample due to the sparse approximation [6]:

\varepsilon_i = \frac{|[\alpha_{t+1}]_i|}{[Q_{t+1}]_{ii}}.    (16)

Assuming the j-th element of the BV-set has the lowest score \varepsilon, its removal requires a re-update of the parameters \alpha_{t+1}, C_{t+1}, and Q_{t+1}:

\hat{\alpha}_{t+1} = \alpha^{(t)} - \alpha_j \frac{Q_j}{q_j}
\hat{C}_{t+1} = C^{(t)} + c_j \frac{Q_j Q_j^T}{q_j^2} - \frac{1}{q_j}\left[Q_j C_j^T + C_j Q_j^T\right]
\hat{Q}_{t+1} = Q^{(t)} - \frac{Q_j Q_j^T}{q_j},    (17)

where C^{(t)} is the resized matrix obtained by removing the j-th column and the j-th row from C_{t+1}, C_j is the j-th column of C_{t+1} excluding the j-th element, and c_j = [C_{t+1}]_{jj}. Similar definitions apply to Q^{(t)}, Q_j, q_j, \alpha^{(t)}, and \alpha_j.

C. Environment Representation & Informative Sampling Locations

To facilitate the computation of future informative sampling locations, we discretize the environment into a grid map where each cell represents a possible sampling spot. The mean and variance of the measurement at each cell can be predicted via the SOGP model. We use the mutual information between the visited locations and the remainder of the space to characterize the amount of information (information gain) collected.
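Before turning to the planning objective, the SOGP machinery of Secs. III-A and III-B can be condensed into a minimal sketch. For brevity it recomputes Q by direct inversion rather than via the rank-one update of Eq. (14), omits the reduced update of Eq. (15), and prunes by simply deleting the lowest-score element instead of applying the exact re-update of Eq. (17); the kernel hyperparameters, tolerance, and capacity are illustrative values, not the paper's.

```python
import numpy as np

def kern(a, b, sf2=1.0, ls=1.0):
    """Noise-free squared-exponential kernel (cf. Eq. (2))."""
    return sf2 * np.exp(-0.5 * np.sum((a - b)**2) / ls**2)

class SOGPSketch:
    """Minimal sparse online GP regression: Eqs. (9)-(13) and (16)."""
    def __init__(self, noise=0.1, tol=1e-6, capacity=20):
        self.noise, self.tol, self.capacity = noise, tol, capacity
        self.BV = []                       # basis-vector set
        self.alpha = np.zeros(0)
        self.C = np.zeros((0, 0))

    def predict(self, x):
        k = np.array([kern(x, b) for b in self.BV])
        mean = self.alpha @ k                      # Eq. (9)
        var = kern(x, x) + k @ self.C @ k          # Eq. (9)
        return mean, var, k

    def update(self, x, y):
        mean, var, k = self.predict(x)
        q = (y - mean) / (var + self.noise)        # Eq. (12)
        r = -1.0 / (var + self.noise)              # Eq. (12)
        if self.BV:                                # novelty, Eq. (13)
            Q = np.linalg.inv(np.array([[kern(a, b) for b in self.BV]
                                        for a in self.BV]))
            gamma = kern(x, x) - k @ Q @ k
        else:
            gamma = kern(x, x)
        if gamma > self.tol:                       # grow the BV-set
            self.BV.append(x)
            s = np.append(self.C @ k, 1.0)         # s_t of Eq. (10)
            self.alpha = np.append(self.alpha, 0.0) + q * s
            self.C = np.pad(self.C, ((0, 1), (0, 1))) + r * np.outer(s, s)
        # else: reduced update of Eq. (15) omitted in this sketch
        if len(self.BV) > self.capacity:
            self._prune()

    def _prune(self):
        """Drop the lowest-score basis vector, Eq. (16) (sketch: deletes
        the element instead of the exact re-update of Eq. (17))."""
        Q = np.linalg.inv(np.array([[kern(a, b) for b in self.BV]
                                    for a in self.BV]))
        j = int(np.argmin(np.abs(self.alpha) / np.diag(Q)))  # Eq. (16)
        keep = [i for i in range(len(self.BV)) if i != j]
        self.BV = [self.BV[i] for i in keep]
        self.alpha = self.alpha[keep]
        self.C = self.C[np.ix_(keep, keep)]
```

After a single observation (x, y) with unit prior variance, the sketch reproduces the standard GP posterior: mean y/(1 + noise) and variance 1 - 1/(1 + noise) at the observed input.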
Formally, the mutual information between two sets of sampling spots A and B can be evaluated as

I(Z_A; Z_B) = I(Z_B; Z_A) = H(Z_A) - H(Z_A | Z_B).    (18)

The entropy H(Z_A) and the conditional entropy H(Z_A | Z_B) can be calculated as

H(Z_A) = \tfrac{1}{2} \log\left((2\pi e)^k |\Sigma_{AA}|\right)
H(Z_A | Z_B) = \tfrac{1}{2} \log\left((2\pi e)^k |\Sigma_{A|B}|\right)    (19)

where k is the size of A. The covariance matrices \Sigma_{AA} and \Sigma_{A|B} can be calculated from the posterior GP described in Eq. (9).

To compute the future sampling spots, let X denote the entire sampling space (all grid cells) and Z_X the measurements for the data points in X. The objective is to find a subset of sampling points P \subset X of size |P| = n that gives us the most information for predicting our model. This is equivalent to finding new sampling points in the un-sampled space that maximize the mutual information between sampled locations and the un-sampled part of the map. The optimal subset of sampling points P^* with maximal mutual information is

P^* = \arg\max_{P \in \mathcal{X}} I(Z_P; Z_{X \setminus P})    (20)

where \mathcal{X} represents all possible combinatorial sets, each of size n. P^* can be computed efficiently using a dynamic programming (DP) scheme [12]. The basic idea is as follows. Let x_i \in X denote an arbitrary sampling point at DP stage i, and let x_{a:b} represent a sequence of sampling points from stage a to stage b. The mutual information between the desired sampling points (which eventually form P) and the remaining map can then be written as I(Z_{x_{1:n}}; Z_{X \setminus \{x_{1:n}\}}), which can be approximated as follows:

I(Z_{x_{1:n}}; Z_{X \setminus \{x_{1:n}\}}) \approx I(Z_{x_1}; Z_{X \setminus \{x_1\}}) + \sum_{i=2}^{n} I(Z_{x_i}; Z_{X \setminus \{x_1, \ldots, x_i\}} | Z_{x_{1:i-1}}).    (21)

Eq. (21) can be expressed in a recursive form; i.e., for stages i = 2, \ldots, n, the value V_i(x_i) of x_i is

V_i(x_i) = \max_{x_i \in X \setminus \{x_1, \ldots, x_{i-1}\}} I(Z_{x_i}; Z_{X \setminus \{x_1, \ldots, x_i\}} | Z_{x_{1:i-1}}) + V_{i-1}(x_{i-1}),

with the base case V_1(x_1) = I(Z_{x_1}; Z_{X \setminus \{x_1\}}). With the optimal solution at the last stage, x_n^* = \arg\max_{x_n \in X} V_n(x_n), we can backtrace all optimal sampling points back to the first stage x_1^*, obtaining P^* = \{x_1^*, x_2^*, \ldots, x_n^*\}.

Note that this informativeness-maximization procedure only outputs batches of sampling points; it does not convey "a path", i.e., a sequence of ordered waypoints. Therefore, the sampling points are post-processed with a customized Travelling Salesman Problem (TSP) [9] solver to generate a shortest path without returning to the starting point (by setting all edges that return to the starting point to zero cost). We then route the robot along the path from its initial location to visit the remaining path waypoints.

D. Informative Planning and Online Learning Framework

For a dynamic environment, the prediction accuracy of a GP degrades as time elapses because the model does not incorporate the temporal variation of the environment. To address this issue, we re-estimate the hyperparameters repeatedly at appropriate moments. The re-estimate triggering mechanism depends on two factors:
• The first factor stems from computational concerns. Any re-estimate is immediately followed by a re-planning of the future routing path, and the computation time for path planning is much more costly than that of the hyperparameter re-estimate. Thus, an appropriate frequency for the simultaneous re-estimate and re-planning needs to be determined to match the computational constraint.
• The second factor relates to the intensity of spatiotemporal variations. Since the kernel function that describes two points' spatial relation is an indicator of a GP's prediction capacity, the repetitive hyperparameter re-estimates of the kernel function should reflect the variation intensity of the environment.

Fig. 2. (a) Salinity data obtained from ROMS, treated as the ground truth throughout the paper; (b) the predicted model using a GP without data-driven hyperparameter optimization.

In our implementation, we use a measure \rho \in [0, 1] to decide the moment for triggering the re-estimate and re-planning processes. The measure \rho represents the proportion of samples that have been added to the current BV-set since the last re-estimate. The hyperparameter re-estimate and path re-planning are carried out if \rho is above a certain pre-defined threshold \rho_0. Roughly, \rho_0 can be defined to be inversely proportional to the computational power and the intensity of environmental variation; the higher the threshold, the less frequent the re-estimate. The whole informative planning and online learning framework is pseudo-coded in Alg. 1.

Algorithm 1: Informative Planning and Online Learning
1  Initialize SOGP
2  while true do
3      ρ = 0   /* for hyperparameter re-estimate */
4      Calculate sampling spots as described in III-C
5      Use a Travelling Salesman Problem (TSP) solver to generate a routing path P
6      for each point p ∈ P do
7          Sample at p to get a scalar value v
8          Use (p, v) as a training point to update the SOGP as described in III-A and III-B
9          if (p, v) replaces some sample in the BV-set then
10             Increase ρ
11         if ρ > ρ0 then
12             Re-estimate the hyperparameters as described in II-B
13             break

IV. EXPERIMENTAL RESULTS

We validated our method in the scenario of ocean monitoring. The simulation environment was constructed as a two-dimensional ocean surface, which we tessellated into a grid map. Our method applies to any environmental phenomenon; in our experiments, we use salinity data recently observed in the Southern California Bight region, obtained from ROMS [23]. Fig. 2(a) shows the salinity data as a scalar field (the black regions represent land while the gray areas denote ocean), which is used as the ground truth for comparison.

Fig. 3. Informative sampling spots before being post-processed into paths. (a) Results under empirically set hyperparameters: {σ_n² = exp(−2), σ_f² = exp(2), l_x = exp(1), l_y = exp(1)}; (b) results under hyperparameters learned from collected data: {σ_n² = exp(−4.6), σ_f² = exp(6.8), l_x = exp(3.4), l_y = exp(3.2)}.

We implemented a sparse online variant of GP (SOGP) built upon the open-source library libgp [3]. A careful down-sampling of the ROMS data to a desired resolution is performed to alleviate the computational cost of generating informative sampling locations. The resolution of the grid map is 351 × 391, whereas the resolution for sampling-spot generation (path planning) is 12 × 12.

First, we show the predictive accuracy using un-tuned hyperparameters, i.e., hyperparameter values set empirically/manually rather than from data. Fig. 2(b) shows the prediction result with 50 prior random samples and manually set hyperparameters θ = {σ_n² = exp(−2), σ_f² = exp(2), l_x = exp(1), l_y = exp(1)}. We can observe that the prediction does not match the ground truth well (see the area circled in red). Next, we investigate and compare the generated informative sampling points under empirical and data-driven hyperparameters. Figs. 3(a) and 3(b) show the results for manually set and data-driven hyperparameters, respectively. The relative distances among the points (and the covered areas) in Fig. 3(b) are larger than those in Fig. 3(a). This is mainly an effect of l, which controls the pairwise spatial correlations.

The process of long-term informative planning and online learning is demonstrated in Fig. 4. Each sub-figure depicts an informative path after each hyperparameter re-estimate.
The red and blue points stand for the robot's current starting position and the informative sampling locations, respectively; the yellow dots represent the points stored in the SOGP BV-set. The robot launched from a shore location (79, 236) and performed the sampling operations at each time step along the planned path. We emulated the memory limit by setting the maximum size of the BV-set to m = 100, and the threshold to ρ0 = 0.6. The distribution patterns of the yellow dots in Figs. 4(a) to 4(f) reveal the sparseness of the BV-set, indicating that as the robot gradually explores the whole map, the BV-set stores only those points that are the most useful for predicting the model.

Fig. 4. (a)-(f) Informative paths resulting from subsequent re-plannings. The red and blue points represent the robot's starting locations and the informative sampling spots, respectively. The robot initially launched at (79, 236). The yellow dots denote the points stored in the SOGP BV-set.

Fig. 5. (a)-(f) The learned environment models. Each corresponds to a step in Fig. 4.

Fig. 6. Maps of prediction variances. Variances reduce as the robot follows the planned path and collects data samples; the final variance map corresponds to the moments in Fig. 4(f) and Fig. 5(f).

The corresponding prediction maps are shown in Fig. 5, from which we can see that the constructed models steadily converge to the ground truth and are able to characterize the general patterns of the environment in the final stages. Finally, we investigate the variances of our predictions. We create a variance map on which each value records the variance of a spot on the grid map. Fig. 6 illustrates a series of variance maps along the sampling operations. We can see that the map gradually "falls towards the ground", indicating a decrease of prediction variances along the robot's exploration.
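The sampling locations that generate these paths come from the mutual-information objective of Sec. III-C. As a sketch of Eqs. (18)-(19), the following uses a simple greedy selection in place of the paper's DP solver (Eq. (21)); the 4-cell covariance is a toy placeholder, not the ROMS grid.

```python
import numpy as np

def entropy(S):
    """Gaussian entropy, Eq. (19): H = 0.5 * log((2*pi*e)^k * |S|)."""
    k = S.shape[0]
    return 0.5 * np.log((2 * np.pi * np.e)**k * np.linalg.det(S))

def mutual_information(Sigma, A, B):
    """I(Z_A; Z_B) = H(Z_A) - H(Z_A | Z_B), Eq. (18)."""
    SAA = Sigma[np.ix_(A, A)]
    SAB = Sigma[np.ix_(A, B)]
    SBB = Sigma[np.ix_(B, B)]
    S_cond = SAA - SAB @ np.linalg.solve(SBB, SAB.T)  # Sigma_{A|B}
    return entropy(SAA) - entropy(S_cond)

def greedy_select(Sigma, n):
    """Greedily pick n grid indices maximizing the mutual information
    between the chosen set and the rest; a stand-in for the DP of Eq. (21)."""
    all_idx = list(range(Sigma.shape[0]))
    P = []
    for _ in range(n):
        best, best_mi = None, -np.inf
        for i in all_idx:
            if i in P:
                continue
            cand = P + [i]
            rest = [j for j in all_idx if j not in cand]
            mi = mutual_information(Sigma, cand, rest)
            if mi > best_mi:
                best, best_mi = i, mi
        P.append(best)
    return P

# toy covariance over a 4-cell "grid" (placeholder values)
G = np.array([[0.0], [1.0], [2.0], [3.0]])
Sigma = np.exp(-0.5 * (G - G.T)**2) + 1e-6 * np.eye(4)
P = greedy_select(Sigma, 2)
```

The symmetry I(Z_A; Z_B) = I(Z_B; Z_A) stated in Eq. (18) holds for any partition of the toy grid, which provides a quick sanity check of the implementation.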
Lastly, Fig. 7 shows plots of the mean squared error (MSE) against the ground truth. We use different thresholds ρ0 and different launch locations for the statistics. The x-axis corresponds to the total number of sampling operations, which is roughly proportional to the travel time (or distance); the y-axis is the MSE calculated with the whole map as a testing set. The figure reveals that, in general, every setting follows a descending (error-reducing) trend along the coverage of the planned informative regions. By comparing the results of different thresholds ρ0, we can observe more error fluctuations for low ρ0 values. A possible reason is that, if the explored regions are not yet well covered, the hyperparameter re-estimate might optimize only over some local regions rather than the entire map, causing a loss of generality and an overfitting problem.

Fig. 7. The MSE plots under different launch locations {(79, 236), (207, 68)} and thresholds ρ0 = {0.1, 0.2, ..., 1.0}. The y-axis is the MSE value; the x-axis is the total number of sampling operations.

V. CONCLUSIONS

Environmental monitoring entails persistent presence by robots. This suggests that both planning and learning are likely to constitute critical components of any robotic system built for monitoring. In this paper, we presented an informative planning and online learning method that enables an autonomous marine vehicle to effectively perform persistent ocean monitoring tasks. Our proposed framework iterates between a planning component designed to collect data with the richest information content, and a sparse Gaussian Process learning component where the environmental model and hyperparameters are learned online by selecting and utilizing only the subset of data that makes the greatest contribution.
We conducted simulations with ocean salinity data; the results show a good match between the predicted model and the ground truth, with converging decreases of both prediction errors and map variances.

REFERENCES

[1] J. Binney, A. Krause, and G. S. Sukhatme. Informative path planning for an autonomous underwater vehicle. In International Conference on Robotics and Automation, pages 4791-4796, 2010.
[2] J. Binney, A. Krause, and G. S. Sukhatme. Optimizing waypoints for monitoring spatiotemporal phenomena. International Journal of Robotics Research (IJRR), 32(8):873-888, 2013.
[3] M. Blum. libgp. https://github.com/mblum/libgp.
[4] N. Cao, K. H. Low, and J. M. Dolan. Multi-robot informative path planning for active sensing of environmental phenomena: A tale of two algorithms. In Proceedings of the 2013 International Conference on Autonomous Agents and Multi-agent Systems, pages 7-14, 2013.
[5] L. Csató and M. Opper. Sparse on-line Gaussian processes. Neural Computation, 14(3):641-668, 2002.
[6] L. Csató. Sparse on-line Gaussian processes. Neural Computation.
[7] M. Dunbabin and L. Marques. Robotics for environmental monitoring: Significant advancements and applications. IEEE Robotics & Automation Magazine, 19(1):24-39, 2012.
[8] E. Fiorelli, P. Bhatta, and N. E. Leonard. Adaptive sampling using feedback control of an autonomous underwater glider fleet. In Proc. 13th Int. Symposium on Unmanned Untethered Submersible Technology, pages 1-16, 2003.
[9] G. Laporte. The traveling salesman problem: An overview of exact and approximate algorithms. European Journal of Operational Research, 59(2):231-247, 1992.
[10] N. E. Leonard, D. A. Paley, R. E. Davis, D. M. Fratantoni, F. Lekien, and F. Zhang. Coordinated control of an underwater glider fleet in an adaptive ocean sampling field experiment in Monterey Bay. Journal of Field Robotics, 27(6):718-740, 2010.
[11] K. H. Low. Multi-robot Adaptive Exploration and Mapping for Environmental Sensing Applications. PhD thesis, Carnegie Mellon University, Pittsburgh, PA, USA, 2009.
[12] K.-C. Ma, L. Liu, and G. S. Sukhatme. An information-driven and disturbance-aware planning method for long-term ocean monitoring. In IEEE/RSJ International Conference on Intelligent Robots and Systems, 2016.
[13] A. Meliou, A. Krause, C. Guestrin, and J. M. Hellerstein. Nonmyopic informative path planning in spatio-temporal models. In Proceedings of the National Conference on Artificial Intelligence (AAAI), pages 602-607, 2007.
[14] T. Miles, S. H. Lee, A. Wåhlin, H. K. Ha, T. W. Kim, K. M. Assmann, and O. Schofield. Glider observations of the Dotson Ice Shelf outflow. Deep Sea Research Part II: Topical Studies in Oceanography, 2015.
[15] D. Nguyen-Tuong and J. Peters. Local Gaussian process regression for real time online model learning and control. In Advances in Neural Information Processing Systems 22 (NIPS), 2008.
[16] L. M. Oliveira and J. J. Rodrigues. Wireless sensor networks: a survey on environmental monitoring. Journal of Communications, 6:143-151, 2011.
[17] M. Opper. A Bayesian approach to on-line learning. In On-line Learning in Neural Networks, pages 363-378. Cambridge University Press, New York, NY, USA, 1998.
[18] R. Ouyang, K. H. Low, J. Chen, and P. Jaillet. Multi-robot active sensing of non-stationary Gaussian process-based environmental phenomena. In Proceedings of the 2014 International Conference on Autonomous Agents and Multi-agent Systems, pages 573-580, 2014.
[19] D. A. Paley, F. Zhang, D. M. Fratantoni, and N. E. Leonard. Glider control for ocean sampling: The glider coordinated control system. IEEE Transactions on Control Systems Technology, 16(4):735-744, 2008.
[20] W. H. Press, S. A. Teukolsky, W. T. Vetterling, and B. P. Flannery. Numerical Recipes in C, volume 2. Cambridge University Press, Cambridge, 1996.
[21] A. Ranganathan, M.-H. Yang, and J. Ho. Online sparse Gaussian process regression and its applications. IEEE Transactions on Image Processing, 20(2):391-404, Feb. 2011.
[22] C. E. Rasmussen and C. K. I. Williams. Gaussian Processes for Machine Learning. The MIT Press, 2005.
[23] A. F. Shchepetkin and J. C. McWilliams. The regional oceanic modeling system (ROMS): a split-explicit, free-surface, topography-following-coordinate oceanic model. Ocean Modelling, 9(4):347-404, 2005.
[24] A. Singh, A. Krause, C. Guestrin, W. Kaiser, and M. Batalin. Efficient planning of informative paths for multiple robots. In Proceedings of the 20th International Joint Conference on Artificial Intelligence (IJCAI), pages 2204-2211, 2007.
[25] D. E. Soltero, M. Schwager, and D. Rus. Generating informative paths for persistent sensing in unknown environments. In IROS, pages 2172-2179, 2012.
[26] M. Trincavelli, M. Reggente, S. Coradeschi, A. Loutfi, H. Ishida, and A. J. Lilienthal. Towards environmental monitoring with mobile robots. In IEEE/RSJ International Conference on Intelligent Robots and Systems, pages 2210-2215, 2008.
[27] A. C. Watts, V. G. Ambrosia, and E. A. Hinkley. Unmanned aircraft systems in remote sensing and scientific research: Classification and considerations of use. Remote Sensing, 4(6):1671-1692, June 2012.
[28] J. Yu, M. Schwager, and D. Rus. Correlated orienteering problem and its application to informative path planning for persistent monitoring tasks. In IEEE/RSJ International Conference on Intelligent Robots and Systems, 2014.
