ContraMap: Contrastive Uncertainty Mapping for Robot Environment Representation
Authors: Chi Cuong Le, Weiming Zhi
Abstract — Reliable robot perception requires not only predicting scene structure, but also identifying where predictions should be treated as unreliable due to sparse or missing observations. We present ContraMap, a contrastive continuous mapping method that augments kernel-based discriminative maps with an explicit uncertainty class trained using synthetic noise samples. This formulation treats unobserved regions as a contrastive class, enabling joint environment prediction and spatial uncertainty estimation in real time without Bayesian inference. Under a simple mixture-model view, we show that the probability assigned to the uncertainty class is a monotonic function of a distance-aware uncertainty surrogate. Experiments in 2D occupancy mapping, 3D semantic mapping, and tabletop scene reconstruction show that ContraMap preserves mapping quality, produces spatially coherent uncertainty estimates, and is substantially more efficient than Bayesian kernel-map baselines.

I. INTRODUCTION

Reliable robot operation depends on scene representations that are both continuous and uncertainty-aware. Classical grid-based methods, such as occupancy grids [1], discretise space into fixed cells, which limits spatial resolution and scalability. Modern continuous formulations instead model occupancy or semantics directly over continuous space using Gaussian Processes or kernel-feature methods [2], [3]. These representations support high-fidelity mapping and efficient spatial querying, but they typically either omit uncertainty or rely on Bayesian inference, whose computational cost can become prohibitive for large-scale or real-time deployment.
The core challenge is not only to predict occupancy or semantics continuously, but also to represent where the map should abstain from confidence in sparsely observed or unobserved space, without incurring the cost of posterior inference. This is particularly important in robotics, where overconfident predictions in occluded or unexplored regions can degrade downstream planning and decision-making. The quality of the map, along with estimates of its uncertainty, can be leveraged for further downstream motion generation [4], [5], [6]. Here, we introduce ContraMap (Contrastive Uncertainty Mapping), a continuous kernel-based mapping method that models unobserved space as an explicit uncertainty class. ContraMap augments a standard softmax mapping model with synthetic contrastive samples labelled as "uncertain", enabling the model to jointly predict environment structure and a spatial uncertainty score. In occupancy mapping, this captures the transition between free, occupied, and unobserved space; in semantic mapping, it highlights ambiguous or weakly supported regions around objects. The method retains the linear-time optimisation and inference efficiency of discriminative kernel maps while avoiding Bayesian posterior inference.

W. Zhi is with the School of Computer Science and the Australian Centre for Robotics, University of Sydney, Australia.

Fig. 1: For robust robot operation, scene representations should provide not only spatially consistent mapping but also a measure of uncertainty across the environment. ContraMap augments continuous classification-based mapping with an additional uncertainty output, enabling joint environment representation and direct uncertainty prediction at any queried location. The model reconstructs scene structure while assigning high uncertainty to occluded or weakly observed regions, such as the space behind the table.
The main contributions of this work are:
1) A contrastive formulation for continuous uncertainty-aware mapping that represents unobserved regions as an explicit uncertainty class within a kernel-based discriminative map.
2) A theoretical characterisation showing that, under a simple mixture-model assumption, the probability assigned to the uncertainty class is a monotonic function of a distance-aware uncertainty surrogate.
3) Experimental validation across 2D occupancy mapping, 3D semantic mapping, and tabletop scene reconstruction, showing competitive mapping quality and substantial efficiency gains over Bayesian kernel-map baselines.

The paper is organized as follows. Section II discusses related work. Section III reviews the Bayesian methods used to represent different kinds of environments, as well as noise contrastive estimation, a cornerstone of our approach. Section IV presents an overview of the proposed method, followed by theoretical and empirical insight into the approach. Experiments demonstrating the effectiveness of the proposed strategy are presented in Section VI. Finally, Section VII concludes the paper.

II. RELATED WORKS

Fig. 2: Predictive uncertainty for a Gaussian Process (GP) and neural-network baselines on three toy datasets. Orange/magenta points are in-distribution training samples, and red points are out-of-distribution samples. Background shading indicates relative uncertainty (brighter = higher, darker = lower). Our method best matches the GP "gold standard": it stays confident near observed data and becomes uncertain as inputs move away from the training distribution.

Continuous Environment Mapping: Continuous environment representations have evolved from Gaussian Process Occupancy Maps (GPOM) [2], which provide principled
Bayesian spatial reasoning but suffer from cubic computational complexity, limiting their scalability in large-scale or 3D environments. To address these inefficiencies, Hilbert Maps [3], [7] leverage kernel approximations with a simple softmax classifier to construct continuous maps in linear time, while Bayesian Hilbert Maps (BHMs) [8], [9] further enable uncertainty estimation by maintaining posterior distributions over model parameters through variational inference. Recently, V-PRISM [10] introduced a multiclass formulation for mapping tabletop scenes, jointly modeling semantic segmentation and predictive uncertainty. While Hilbert Maps-based methods achieve high accuracy in occupancy and semantic prediction, they tend to produce overconfident estimates in sparsely observed or entirely unobserved regions, which undermines reliability during autonomous operation. On the other hand, Bayesian formulations [8], [10], [11] mitigate this issue by modeling uncertainty through posterior variance, but reintroduce substantial inference and optimization costs, which can hinder real-time deployment in large-scale environments.

Uncertainty Estimation in Neural Networks: Quantifying predictive uncertainty is vital for the safe deployment of learning-based systems in real-world scenarios. Spectral-normalized Neural Gaussian Processes (SNGP) [12] encourage predictive uncertainty to increase as inputs move away from the training data by bounding hidden-layer sensitivity via spectral normalization. Monte Carlo Dropout (MC Dropout) [13] interprets stochastic dropout at inference time as an approximate Bayesian inference mechanism, while Deep Ensembles [14] provide a strong non-Bayesian alternative by aggregating multiple independently trained models to capture predictive ambiguity. However, as illustrated in Figure 2, many of these methods do not consistently produce reliable uncertainty estimates.
In particular, despite explicit regularization for distance awareness, SNGP can still exhibit high-confidence predictions in regions far from the observed data.

Out-of-Distribution Detection: Out-of-distribution (OOD) detection aims to identify inputs that differ from the training distribution, a critical safety concern for deep neural networks (DNNs). Because DNNs are typically trained under a closed-set assumption (test samples follow the same distribution as training data), exposure to distribution shifts in real-world environments can lead to unreliable and overconfident predictions [15], [16]. To address this issue, prior work has explored classification-with-rejection and open-set recognition frameworks, which explicitly model the presence of unknown or ambiguous inputs through a rejection rule [17], [18]. Other methods have explored diffusion models to detect OOD samples [19], [20]. In the context of environment mapping, we observe that unobserved spatial regions naturally play a role similar to OOD inputs. Our method explicitly samples noise to represent these unobserved regions and treats them as an additional "uncertain" class during training. This formulation aligns with classification-with-rejection principles, enabling the model to allocate probability mass to the out-of-distribution class instead of forcing overconfident assignments to known classes. Furthermore, building on this perspective, we theoretically show that the probability assigned to the "uncertain" class is a monotonic function of a distance-aware uncertainty measure.

III. PRELIMINARIES

Modern robotic mapping methods represent spatial structure as a continuous function rather than a fixed grid, allowing occupancy or semantic information to be queried at any location in space.
Given training samples X = {(x_i, y_i)}_{i=1}^N, where x_i ∈ R^d is a spatial coordinate and y_i denotes occupancy or class labels, these models predict the probability of each label using kernel features that encode local spatial correlations. Each point is projected into a feature vector

ϕ(x)^⊤ = [k(x, h_1), k(x, h_2), ..., k(x, h_H)],  (1)

where {h_i}_{i=1}^H are reference points (kernel centres) and k(·,·) is typically a Gaussian kernel

k(x, h) = exp(−γ ∥x − h∥²),  (2)

with bandwidth γ controlling the spatial smoothness. This mapping allows simple linear models on ϕ(x) to represent complex, non-linear structures, making kernel-based methods an efficient and interpretable foundation for continuous occupancy and semantic mapping.

Fig. 3: Overview of ContraMap. Our method employs a softmax classifier network with an additional output node to jointly represent the environment and estimate uncertainty. Observed data (e.g., a LiDAR scan or segmented point cloud) are augmented with negative samples and noise, where the noise is labeled as an additional (C+1)-th, or "uncertain", class. The augmented data are then projected into a feature vector space using a set of reference points, forming the training dataset D_train. The model is trained on this dataset using a first-order optimization method. Once trained, we can query the model at arbitrary locations in the environment to obtain either environment mapping results or uncertainty estimates via the additional "uncertain" node.

A. Bayesian Kernel-based Mapping

A Bayesian view treats the model weights as random variables, allowing both the mean prediction and its uncertainty to be inferred.
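Both the Bayesian model below and ContraMap operate on these kernel features. As a concrete reference point, the projection in (1)-(2) can be sketched in a few lines; the hinge grid and bandwidth here are arbitrary choices for illustration only:

```python
import numpy as np

def rbf_features(x, hinges, gamma=1.0):
    """Project points x (N, d) onto H reference points, Eqs. (1)-(2):
    phi_j(x) = k(x, h_j) = exp(-gamma * ||x - h_j||^2)."""
    d2 = ((x[:, None, :] - hinges[None, :, :]) ** 2).sum(-1)   # (N, H) squared distances
    return np.exp(-gamma * d2)

# A toy 2D example: hinge points on a regular 5 x 5 grid over [0, 1]^2.
g = np.linspace(0.0, 1.0, 5)
hinges = np.stack(np.meshgrid(g, g), -1).reshape(-1, 2)        # H = 25 centres
x = np.array([[0.5, 0.5], [3.0, 3.0]])                         # one query near, one far
phi = rbf_features(x, hinges, gamma=10.0)
# Features of the far-away query are all near zero, so any linear model on
# phi loses evidence away from the hinge grid -- the effect ContraMap's
# uncertainty class later exploits.
```
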
For binary occupancy, the likelihood is

P(y = 1 | x, w) = σ(w^⊤ ϕ(x)),  (3)

where w is given a Gaussian prior p(w) = N(µ_0, Σ_0). The predictive distribution marginalises over the posterior,

P(y = 1 | x*, X) = ∫ σ(w^⊤ ϕ(x*)) p(w | X) dw,  (4)

providing both occupancy probability and predictive variance. For multi-class segmentation, the formulation extends to a softmax likelihood with weight matrix W ∈ R^{C×H}, enabling simultaneous reasoning over multiple object categories. In both cases, the posterior covariance Σ captures spatial correlations and confidence, but requires repeated inversion of an H × H matrix. This operation scales as O(H³), making Bayesian inference computationally expensive for dense or large-scale maps.

B. Motivation

While Bayesian kernel-based mapping offers a principled measure of uncertainty, its cubic cost prevents real-time deployment in robotic settings where both scale and responsiveness are critical. Moreover, parameter uncertainty does not always correspond directly to spatial uncertainty, especially in partially observed or occluded regions. These challenges motivate ContraMap, which preserves the efficiency and spatial continuity of kernel-based models but learns uncertainty explicitly as a predictive quantity. By introducing an additional "uncertain" class and training with contrastive supervision between observed and synthetic noise samples, ContraMap enables joint prediction and uncertainty estimation without Bayesian inference, achieving real-time uncertainty-aware mapping.

IV. METHOD OVERVIEW

Bayesian extensions of Hilbert Maps (HMs) [8], [10] provide a principled means of modeling uncertainty, but their reliance on covariance matrix inversion leads to cubic scaling with the number of hinge points, making them unsuitable for real-time deployment.
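The cubic bottleneck is easy to see in isolation: each posterior update behind Eq. (4) manipulates an H × H covariance, and inverting such a matrix costs O(H³). A small timing sketch (illustrative only; absolute numbers depend on hardware and the BLAS backend):

```python
import numpy as np, time

# Invert an H x H matrix for growing H; this is the O(H^3) operation that
# dominates Bayesian kernel-map updates.  The matrix itself is a
# well-conditioned stand-in, not an actual posterior covariance.
for H in (500, 1000, 2000):
    A = np.eye(H) + 1e-3 * np.random.rand(H, H)
    t0 = time.perf_counter()
    A_inv = np.linalg.inv(A)
    dt = time.perf_counter() - t0
    print(f"H = {H:5d}: inversion took {dt:.4f} s")
# Doubling H multiplies the cost by roughly 8, which is what makes dense
# Bayesian kernel maps with many hinge points impractical in real time.
```
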
Our goal is to design a lightweight alternative that both preserves the efficiency of HMs and captures uncertainty.

A. Noise Contrastive Estimation for Uncertainty Modeling

Noise Contrastive Estimation (NCE) [21], [22] offers a useful perspective: model parameters can be estimated by training a classifier to distinguish data samples x ∈ X drawn from p_InD(·|θ) against noise samples x̃ ∈ X̃ drawn from p_noise(·). The objective is to maximize

J(θ) = (1/|X|) Σ_{x ∈ X} ln[h(x; θ)] + (1/|X̃|) Σ_{x̃ ∈ X̃} ln[1 − h(x̃; θ)],  (5)

where h(u; θ) is the posterior probability that u originates from the data distribution. Importantly, this objective is equivalent to the log-likelihood of a logistic regression classifier discriminating between real and noise samples. Thus, NCE establishes a direct connection between density estimation and supervised classification. We leverage this connection to reinterpret uncertainty modeling: rather than inferring full Bayesian posteriors over parameters, we treat ambiguous or unseen regions as "noise" and explicitly introduce them as a separate prediction class.

B. Efficient Hilbert Maps with an Uncertainty Class

Building on HMs, which use logistic regression to model occupancy, we extend the sigmoid formulation to a softmax classifier that includes an additional class for uncertainty. Here, we assume that we are given a segmented point cloud, assign each segmentation mask a label, and then learn a continuous representation encapsulating this. Concretely, the model is parameterized by a weight matrix W ∈ R^{(C+1)×H}, where C is the number of standard classes (C = 2 for 2D mapping, i.e., occupied/free).
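The objective in (5) maps directly to code. A minimal sketch, where the sigmoid form of h on classifier logits is an illustrative assumption (any model producing a posterior h works):

```python
import numpy as np

def nce_objective(scores_data, scores_noise):
    """J(theta) of Eq. (5): h(u; theta) is the posterior probability that u
    came from the data rather than the noise distribution; here h is taken
    to be a sigmoid of a classifier score."""
    h_data = 1.0 / (1.0 + np.exp(-scores_data))     # h(x; theta) on data
    h_noise = 1.0 / (1.0 + np.exp(-scores_noise))   # h(x~; theta) on noise
    return np.log(h_data).mean() + np.log1p(-h_noise).mean()

# A classifier that separates data from noise scores higher than one that
# cannot tell them apart (scores of 0 give h = 0.5 everywhere).
j_good = nce_objective(np.array([4.0, 5.0]), np.array([-4.0, -5.0]))
j_blind = nce_objective(np.zeros(2), np.zeros(2))
```

Maximizing J pushes h toward 1 on data and toward 0 on noise; ContraMap's (C+1)-way softmax generalizes this binary discrimination to multiple in-distribution classes plus one noise class.
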
To train W, we construct an augmented dataset

D = {(x_i, y_i)}_{i=1}^n ∪ {(x*_j, y*_j = C + 1)}_{j=1}^m,  (6)

where (x_i, y_i) are the original samples and {(x*_j, y*_j = C + 1)} are noise points randomly sampled from the environment, labeled as the (C+1)-th "uncertain" class. Training then reduces to optimizing the standard cross-entropy loss over D, which can be done efficiently using gradient descent. The resulting model outputs occupancy probabilities for known classes, while the (C+1)-th node provides an explicit estimate of uncertainty. This design retains the scalability and efficiency of HMs while avoiding the cubic computational burden of Bayesian variants.

V. FROM NOISE TO AN UNCERTAINTY INDICATOR

Our classifier is trained with an explicit extra label (C+1) representing noise or none-of-the-above inputs. At test time, this allows the model to either (i) explain a query using one of the C in-distribution classes, or (ii) allocate probability mass to the uncertainty class when the query lies away from the support of the observed data. In this sense, the additional output should be interpreted primarily as a learned distance-aware OOD score over continuous space. The goal of this section is therefore not to claim calibrated uncertainty in a strict probabilistic sense, but to justify why the softmax probability on the (C+1)-th node, p_{C+1}(x), provides a meaningful uncertainty ordering. In particular, we show that under a simple mixture-model view, p_{C+1}(x) is a monotonic function of a distance-aware uncertainty surrogate.

Key idea: If we train on a mixture of (a) in-distribution data and (b) broadly spread noise labeled as (C+1), then a well-trained softmax classifier approximates the Bayes posterior for the noise component. In this case, p_{C+1}(x) becomes (approximately) the probability that x came from the noise process rather than the in-distribution process.
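The construction of D in (6) and the cross-entropy training of Sec. IV-B can be sketched end to end. The geometry, class layout, and hyperparameters below are toy assumptions, not the paper's experimental settings:

```python
import numpy as np

rng = np.random.default_rng(0)

# --- Augmented dataset D of Eq. (6): observed points plus uniform noise
# labelled as the extra "uncertain" class.  Two Gaussian blobs stand in for
# free/occupied observations. ---
n = 200
free     = rng.normal([0.3, 0.3], 0.05, (n, 2))   # class 0 (free)
occupied = rng.normal([0.7, 0.7], 0.05, (n, 2))   # class 1 (occupied)
noise    = rng.uniform(0.0, 1.0, (n, 2))          # class 2 = C + 1 ("uncertain")
X = np.vstack([free, occupied, noise])
y = np.repeat([0, 1, 2], n)

# --- RBF features over a hinge grid, Eqs. (1)-(2). ---
g = np.linspace(0.0, 1.0, 8)
hinges = np.stack(np.meshgrid(g, g), -1).reshape(-1, 2)
def features(q):
    return np.exp(-30.0 * ((q[:, None] - hinges[None]) ** 2).sum(-1))
Phi = features(X)

# --- Plain softmax cross-entropy, first-order optimisation. ---
W = np.zeros((3, Phi.shape[1]))
Y = np.eye(3)[y]
for _ in range(500):
    logits = Phi @ W.T
    P = np.exp(logits - logits.max(1, keepdims=True))
    P /= P.sum(1, keepdims=True)
    W -= 0.5 * (P - Y).T @ Phi / len(X)            # gradient step on CE loss

def predict(q):
    f = features(q) @ W.T
    p = np.exp(f - f.max(1, keepdims=True))
    return p / p.sum(1, keepdims=True)

# Near the observed blobs the known classes dominate; far from all
# observations, the (C+1)-th "uncertain" output takes over.
p = predict(np.array([[0.7, 0.7], [0.05, 0.95]]))
```
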
Since in-distribution density typically decays as we move away from the data manifold, this posterior naturally increases with distance-to-data, which is also how many uncertainty measures behave.

Distance to the in-distribution: Let X_InD be the in-distribution training set. We quantify how far a test input x lies from the in-distribution support via the expected distance

d(x) = E_{x′ ∼ p_InD} ∥x − x′∥_X,  (7)

where ∥·∥_X is the distance metric induced by the input space.

A simple generative picture: To interpret the uncertainty node, we consider a stylised mixture model in which training examples are drawn from

p(x, y) = α_InD p_InD(x, y) + α_noise p_noise(x, y = C + 1),  (8)

and the noise inputs are approximately uniform in the input space, i.e., p_noise(x) ≈ n_0. We also assume the in-distribution marginal density decreases as we move away from the data, meaning p_InD(x) ≈ g(d(x)), with g positive and non-increasing.

What does the (C+1) softmax learn? For a sufficiently well-trained softmax classifier, the (C+1)-th output approximates the Bayes posterior probability [23] that x was generated by the noise component:

p_{C+1}(x) ≈ α_noise p_noise(x) / (α_InD p_InD(x) + α_noise p_noise(x)).  (9)

Plugging in p_noise(x) ≈ n_0 and p_InD(x) ≈ g(d(x)) gives

p_{C+1}(x) ≈ α_noise n_0 / (α_InD g(d(x)) + α_noise n_0).  (10)

Because g(·) is non-increasing, the denominator shrinks as d(x) grows, so p_{C+1}(x) is non-decreasing in the distance-to-data d(x). In other words, the farther we move from the training support, the higher the probability the classifier assigns to the noise class.

Connecting to an uncertainty measure: Many standard uncertainty measures are also distance-aware, i.e., there exists a non-decreasing function h such that

U(x) = h(d(x)).  (11)
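The monotone relationship in (10) is easy to check numerically. A sketch with illustrative constants: the Gaussian profile g, the mixture weights, and the uniform noise density n_0 are arbitrary choices consistent with the stated assumptions:

```python
import numpy as np

# Eq. (10) with illustrative constants: g is any positive, non-increasing
# density profile (a Gaussian here); alpha_* are the mixture weights and
# n0 the uniform noise density.
a_ind, a_noise, n0 = 0.8, 0.2, 1.0
g = lambda d: np.exp(-d ** 2)

d = np.linspace(0.0, 4.0, 50)                 # distance-to-data, Eq. (7)
p_unc = a_noise * n0 / (a_ind * g(d) + a_noise * n0)
# p_unc rises monotonically from 0.2 at d = 0 and saturates towards 1 as
# g(d) -> 0: exactly the ordering property claimed for p_{C+1}.
```
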
This is known to hold, for example, for Gaussian Processes with RBF kernels, where predictive variance increases with distance from the training data [12]. Combining (10) and (11), we obtain an (input-dependent) monotone relationship

p_{C+1}(x) ≈ φ(U(x)),  (12)

for some non-decreasing function φ (implicitly given by composing the distance-to-noise-posterior mapping with the inverse of h). Under these assumptions, the output probability of the (C+1)-th class is a monotonic transformation of an input-dependent uncertainty measure. This does not imply that p_{C+1}(x) is a calibrated uncertainty estimate in the strict probabilistic sense; rather, it shows that the score preserves the relative ordering of uncertainty and can therefore serve as a useful spatial uncertainty indicator.

VI. EXPERIMENTS

To demonstrate the effectiveness of the proposed method, we conduct a comprehensive set of experiments evaluating predictive uncertainty, mapping accuracy, and scalability across both 2D and 3D robotic environments. Specifically, we provide qualitative uncertainty analysis and quantitative comparisons against Hilbert Maps, Bayesian Hilbert Maps, V-PRISM, and 3D Hilbert Maps, as well as an evaluation of hinge point count and scalability. All experiments are run on an NVIDIA Tesla P100 GPU.

A. Qualitative evaluation of predictive uncertainty

To qualitatively evaluate uncertainty estimation behavior, we first conducted experiments on three standard toy classification datasets: Two Ovals, Two Moons, and Two Circles. These datasets feature increasingly complex decision boundaries and are commonly used to analyze uncertainty. Following the setup in prior work on uncertainty estimation [12], we compare our approach against a Gold Standard Gaussian Process (GP), MC Dropout [13], Deep Ensembles [14], DNN-GP, and SNGP [12].
As these methods are not designed for environment mapping, the comparison focuses exclusively on uncertainty estimation behavior rather than quantitative classification accuracy. Uncertainty in our method is derived directly from the (C+1)-th node in the output layer, while the compared methods infer uncertainty implicitly from their predictive distributions. As shown in Figure 2, the Gold Standard GP exhibits low uncertainty in the vicinity of training samples and increasing uncertainty as predictions move farther from the observed data. Among the compared methods, our approach most closely matches this behavior across all three datasets. In our results, uncertainty remains low near the data manifold and increases smoothly in regions that are sparsely sampled or entirely unobserved, including areas containing out-of-distribution points.

This behavior stems from the structure inherited from Hilbert Maps [3]. By projecting inputs into a high-dimensional space using hinge points, the model can effectively separate complex class geometries. In addition, we introduce uniformly sampled noise labeled as an explicit "uncertain" class during training. These samples act as negative evidence across the input space, enabling the model to identify regions that lack sufficient training evidence. Consequently, unlike methods that derive uncertainty implicitly from the predictive class distribution (e.g., entropy or variance), uncertainty in our approach is governed by distance from the training distribution rather than ambiguity between in-distribution classes, and no elevated uncertainty is observed along boundaries between well-separated classes.

B. Comparison to HMs and BHMs in occupancy mapping

This experiment empirically evaluates the proposed method in terms of accuracy, training time, and inference time, and compares it with Hilbert Maps (HMs) [3] and Bayesian Hilbert Maps (BHMs) [8] in occupancy mapping.
We also demonstrate the ability of our approach to capture uncertainty by extracting outputs for each class in the training dataset D, particularly the (C+1)-th class.

1) Settings: Experiments are conducted on multiple datasets from the Radish repository [24], each split into training and testing sets with a 9:1 ratio. HMs and BHMs are trained using the observed training data X derived directly from the Radish datasets, while our softmax model is trained on an augmented dataset D formed by combining X with randomly sampled noise from the environment. In this occupancy mapping task, points in X are labeled as c = 1 (occupied) or c = 0 (unoccupied). To construct D, we generate a noise set X̃ with |X̃| = |X| by uniformly sampling points across the landscape and labeling them as c = 2, representing an additional uncertainty class. Thus, D = X ∪ X̃ contains three classes, and the softmax model has three output nodes. For feature computation, the hinge points {h_i}_{i=1}^H are chosen evenly across the landscape, with H determined by the map size.

Fig. 4: Occupancy mapping results of each method on the Intel dataset: (a) Our, (b) HMs [3], (c) BHMs [8].

Fig. 5: Results extracted from each output of the softmax model: (a) Free Space, (b) Occupied Space, (c) Uncertainty in Space.

Performance is evaluated using the Area Under the Receiver Operating Characteristic Curve (AUC) [25], together with average training and inference times. The AUC is computed from predictions on the testing set. For inference, evenly spaced query points are generated to cover each map; these points are batch-processed by each model to produce occupancy maps, and inference times are recorded.

2) Results Analysis: Table I shows the accuracy and runtime of each method on different datasets from Radish [24], along with the number of features used for each dataset.
From Table I, both our method and HMs require substantially lower training and inference times than BHMs, while achieving comparable mapping accuracy in terms of AUC. This demonstrates the efficiency of parameter learning via gradient descent compared to Bayesian approaches. Since the proposed method and HMs employ simple softmax and logistic regression models, they can be trained effectively using first-order gradient-based optimization, and inference is fast because it only involves forward computation. In terms of runtime, HMs are marginally faster than our method, likely because logistic regression is slightly lighter than a softmax model; however, the difference is negligible.

The mapping quality is illustrated in Figure 4. Since our method uses a softmax model with three output nodes, the displayed occupancy map is obtained by dividing the output of class 1 (probability of being occupied) by that of class 0 (probability of being unoccupied). The results show that our method produces maps with precision comparable to HMs and BHMs.

To further analyze the model, we separately extract the outputs of each node. Figure 5 presents these results on the Intel dataset, showing that each node clearly captures its corresponding class. In Figure 5a, confidently unoccupied regions are highlighted, while Figure 5b outlines the occupied areas. Notably, Figure 5c shows the output of the uncertainty class learned from noise, demonstrating that the additional class effectively models ambiguity.

TABLE I: The performance of our method in comparison to Hilbert Maps [3] and Bayesian Hilbert Maps [8] on the Radish datasets [24].
Datasets                        Hilbert Maps [3]      Bayesian Hilbert Maps [8]   Our
Belgioioso (#features = 8400)
  AUC ↑                         0.9904 (±0.0003)      0.9901 (±0.0003)            0.9901 (±0.0003)
  Training time (s) ↓           2.2609 (±0.0279)      722.6140 (±2.5651)          3.2367 (±0.0231)
  Inference time (s) ↓          0.4258 (±0.0014)      231.3024 (±0.0746)          0.4681 (±0.0022)
Edmonton (#features = 8800)
  AUC ↑                         0.9754 (±0.0005)      0.9734 (±0.0005)            0.9767 (±0.0004)
  Training time (s) ↓           1.8345 (±0.0371)      789.5725 (±1.7593)          2.2733 (±0.0245)
  Inference time (s) ↓          0.6997 (±0.0016)      405.7368 (±0.0842)          0.7630 (±0.0011)
Fhw (#features = 5940)
  AUC ↑                         0.9608 (±0.0014)      0.9583 (±0.0014)            0.9603 (±0.0013)
  Training time (s) ↓           1.6850 (±0.0467)      326.7038 (±0.8631)          1.9292 (±0.0125)
  Inference time (s) ↓          0.2186 (±0.0127)      75.1121 (±0.0267)           0.2358 (±0.0008)
Intel (#features = 5600)
  AUC ↑                         0.9644 (±0.0009)      0.9688 (±0.0007)            0.9631 (±0.0008)
  Training time (s) ↓           1.9923 (±0.2506)      288.2842 (±1.7646)          2.5446 (±0.0206)
  Inference time (s) ↓          0.1977 (±0.0030)      65.4582 (±0.1118)           0.2164 (±0.0049)
Mexico (#features = 7500)
  AUC ↑                         0.9726 (±0.0006)      0.9709 (±0.0007)            0.9731 (±0.0006)
  Training time (s) ↓           1.7175 (±0.0415)      566.5886 (±1.2325)          2.0832 (±0.0271)
  Inference time (s) ↓          0.5305 (±0.0042)      251.3564 (±0.1505)          0.5786 (±0.0034)

TABLE II: Quantitative comparison against 3D Hilbert Maps (HMs) on the SemanticKITTI dataset. Experiments were conducted on the initial scan of 10 distinct sequences. Bold indicates the statistically better result (p < 0.05).

Metric                  HMs                   Our
mIoU ↑                  0.8096 ± 0.0515       0.8412 ± 0.0394
Training Time (s) ↓     32.6278 ± 1.5994      35.9025 ± 1.7746
Inference Time (s) ↓    203.6160 ± 8.2965     203.7221 ± 8.3205

TABLE III: Quantitative comparison against 3D Hilbert Maps (HMs) on the SceneNet dataset. The evaluation was performed across 60 distinct indoor scenes (rooms). Bold indicates the statistically better result (p < 0.05).
Metric                  HMs                   Our
mIoU ↑                  0.6622 ± 0.1234       0.6694 ± 0.1231
Training Time (s) ↓     32.0929 ± 22.6771     35.9777 ± 25.5202
Inference Time (s) ↓    32.2525 ± 2.6430      32.2989 ± 2.6393

As observed in Figure 5c, regions confidently classified as class 0 or 1 exhibit low uncertainty, whereas unobserved regions, from which noise samples are more likely drawn, show high uncertainty. Taken together, these experiments suggest that our method performs occupancy mapping accurately with a small runtime, while also being able to explicitly estimate uncertainty in the environment.

C. Comparison against 3D Hilbert Maps in 3D environment mapping

Following the previous experiments, we further compare our method against 3D Hilbert Maps (HMs) [7] on two benchmarks, SemanticKITTI [26] and SceneNet [27], to evaluate performance in both large-scale outdoor and indoor environments. Experiments on SemanticKITTI are conducted using the initial scan of 10 distinct sequences, while the SceneNet evaluation spans 60 independent indoor scenes (rooms). We report mIoU, training time, and inference time to assess not only semantic accuracy but also computational efficiency. In particular, this experiment examines whether extending the original HM formulation with an additional uncertain output node, introduced to enable uncertainty-aware semantic mapping, can preserve or improve semantic accuracy while maintaining practical efficiency. Qualitative comparisons of semantic mapping results with 3D HMs are illustrated in Figures 6 and 7.

Fig. 6: Reconstruction results on the SceneNet dataset (Ground Truth, HMs Results, Our Results).

As shown in Tables II and III, our method consistently achieves higher mIoU than 3D HMs on both datasets. This indicates that incorporating the "uncertain" class does not degrade semantic performance; instead, it leads to more accurate semantic predictions.
A plausible explanation is that augmenting the training data with the "uncertain" class encourages the classifier to better separate in-distribution semantic classes from ambiguous or out-of-distribution regions. This results in clearer decision boundaries between positive classes and non-semantic or noisy observations, which is particularly beneficial in sparse or noisy 3D sensing scenarios.

In terms of time efficiency, our method incurs a small training-time overhead, while inference time remains nearly identical to that of 3D HMs. This overhead is expected: both approaches employ a softmax classifier, but our model includes one additional output node to explicitly estimate uncertainty and must train on a larger augmented dataset including noise, which slightly increases the number of parameters and the optimization cost. Nevertheless, the resulting training and inference times remain highly efficient and well within practical limits. In return for this marginal cost, our method enables uncertainty-aware semantic mapping, which is valuable for downstream tasks such as exploration, planning, and decision-making, while preserving and even improving semantic accuracy.

D. Comparison to V-PRISM in representing tabletop scenes

In the next experiment, we investigate the ability of the proposed method to handle the multiclass mapping problem in tabletop scenes, and compare it to the recent V-PRISM [10]. Settings: Following [10], 100 scenes containing

Fig. 7: Reconstruction results on the SemanticKITTI dataset (Ground Truth, HMs Results, Our Results).

TABLE IV: The performance of our proposed method compared to V-PRISM [10] on the YCB dataset.
V -PRISM [10] Our IoU ↑ 0.4994 ( ± 0.0878) 0.4854 ( ± 0.0805) Chamfer (m) ↓ 0.0126 ( ± 0.026) 0.0124 ( ± 0.031) T raining time (s) ↓ 11.2826 ( ± 5.6768) 9.3533 ( ± 2.1653) Reconstructing time (s) ↓ 229.5563 ( ± 100.0576) 3.7560 ( ± 1.1183) up to 10 objects each are generated from the YCB dataset [28] to form the training and testing data. W e also adopt the hinge-point generation mechanism of V -PRISM [10], which has been shown to be ef fecti ve for tabletop mapping. When constructing the training set D for our method, we observed that sampling excessi ve noise in tabletop scenes degrades performance. Unlike occupancy mapping, the ob- served data are distributed across more classes; using the same amount of noise as in the previous setting would create a strong imbalance between sampled noise and samples in each class, hindering learning of the true data distribution. Therefore, we set the noise ratio to 2 . 5% of the size of the actual training data. Additionally , due to the smaller noise ratio, noise samples are preferentially drawn near objects to increase the likelihood of capturing uncertain regions, such as occluded areas. Similar to [10], the metrics used to assess performance are the intersection ov er union (IoU) and the Chamfer distance. W e used the same process to calculate the Chamfer distance as in V -PRISM. In addition, we also report the average training and mesh reconstructing times of each method to compare their processing speed. Results Analysis: The performance and runtime of our method and V -PRISM [10] are reported in T able IV, while the third and fifth columns of Figure 8 illustrate representa- tiv e examples of the reconstructed scenes. The results from T able IV together with examples sho wed in Figure 8 state that our method achie ves comparable accuracy to V -PRISM in representing the tabletop scene, with V -PRISM attaining a slightly higher av erage IoU score, whereas our method yields a marginally better Chamfer distance. 
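The near-object noise sampling described in the settings above can be sketched as follows. This is an assumed implementation, not the authors' code: the function name `sample_near_object_noise` and the Gaussian perturbation scale `sigma` are illustrative choices; only the 2.5% ratio and the near-object preference come from the text.

```python
# Sketch of the noise-sampling heuristic for tabletop scenes (assumptions
# noted in the text above): draw a small noise set, 2.5% of the training
# data, concentrated around object points to cover occluded regions.
import numpy as np

rng = np.random.default_rng(1)

def sample_near_object_noise(points, ratio=0.025, sigma=0.05):
    """Perturb randomly chosen object points with Gaussian offsets so the
    'uncertain' samples land near (e.g. behind) the observed objects."""
    n_noise = max(1, int(ratio * len(points)))
    idx = rng.integers(0, len(points), n_noise)
    return points[idx] + rng.normal(0.0, sigma, (n_noise, 3))

object_points = rng.uniform(0.0, 1.0, (4000, 3))  # stand-in point cloud
noise = sample_near_object_noise(object_points)   # 100 samples = 2.5% of 4000
```

Keeping the noise set small avoids the class imbalance noted above, while anchoring it to object points biases the "uncertain" class toward occluded areas rather than empty space far from the table.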
However, in terms of time efficiency, our method surpasses V-PRISM, especially in scene reconstruction. This once again demonstrates the substantial runtime advantage of our approach over a Bayesian-based method. With regard to uncertainty quantification, we show the output of the additional node, which corresponds to the "uncertain" class, in the fourth column of Figure 8. The uncertain areas corresponding to the occluded or partially observed sections of the scenes are highlighted correctly.

Fig. 8: Representative scene reconstructions. Col. 1: RGB. Col. 2: segmented point cloud. Col. 3: our reconstructed mesh. Col. 4: our uncertainty (softmax auxiliary head; lighter = higher). Col. 5: V-PRISM mesh. Col. 6: V-PRISM uncertainty heatmap (lighter = higher). Following [10], Cols. 4 and 6 show uncertainty on a 2D scene slice (bottom = nearer the camera). Object boundaries from the top-view projection are overlaid for spatial context.

E. Impact of hinge point count and scalability of the proposed method

In this part, we empirically study the impact of the hinge point count H, as well as the scalability of our method and the Bayesian-based method as H increases. We run occupancy mapping experiments on the Intel dataset with different numbers of hinge points and record accuracy, indicated by the AUC score, as well as the runtime of our method and BHMs.

Fig. 9: Effect of hinge point count on performance (Intel dataset). Panels show AUC, training time, and inference time versus the number of hinge points (224, 360, 648, 1400, 5600) for our method with 0, 1, and 2 hidden layers and for BHMs.
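To make the scaling argument concrete, the hinge point count H directly sets the feature dimension, and hence the per-query cost and parameter count, of the softmax model. The sketch below (illustrative names, not the paper's code) shows this linear relationship; a Bayesian treatment must additionally maintain and update a distribution over the same H-dimensional weight vector per class, which is where the runtime gap in Figure 9 arises.

```python
# Minimal sketch: the number of hinge points H equals the feature
# dimension of the resulting Hilbert-map-style representation.
import numpy as np

rng = np.random.default_rng(2)
X = rng.uniform(-10.0, 10.0, (200, 2))  # query points in a 2D map

def hinge_features(X, H, gamma=0.5):
    """RBF features against H randomly placed hinge points."""
    hinges = rng.uniform(-10.0, 10.0, (H, 2))
    d2 = ((X[:, None, :] - hinges[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

# Feature dimension grows linearly with the hinge point count.
dims = {H: hinge_features(X, H).shape[1] for H in (224, 648, 5600)}
```

For our method, training cost thus grows roughly linearly in H, consistent with the modest runtime growth reported for the non-Bayesian model in Figure 9.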
The results are illustrated in Figure 9. They suggest that if we do not use enough hinge points to compute features, or, in other words, if the number of features is insufficient, neither method can learn the environment landscape effectively. Increasing the number of hinge points leads to higher AUC scores. However, for BHMs this comes at the cost of a dramatic rise in runtime, whereas for our method the runtime grows only modestly with the hinge point count. This emphasizes that our method scales much better than the Bayesian method and is more practical to deploy in the real world, especially for representing vast environments. As a softmax classifier can be viewed as a neural network with no hidden layers, we also carried out experiments that add one and two hidden layers to our model, to investigate whether extending to a deeper architecture substantially improves our method. The results reported in Figure 9 indicate that adding hidden layers helps the model perform exceptionally well even with only a small number of initial features. This can be explained by the hidden layers providing the ability to learn intermediate features during training, thereby reducing the need to start with a sufficient number of features at the input layer. It is worth noting that adding one or two hidden layers incurs only a slight increase in training and inference time. This observation suggests that our method can be further enhanced for more accurate results, while still retaining its fast runtime and uncertainty estimation capability.

VII. CONCLUSION AND FUTURE WORK

This study proposes a straightforward yet effective approach for uncertainty estimation while maintaining accurate environment representation.
Our method is inspired by noise-contrastive estimation, utilizing noise samples as an additional "uncertain" class and learning it through a lightweight softmax classifier with an extra output node. Empirical results show that the proposed framework can produce accurate environment maps while simultaneously identifying uncertain regions, all with exceptionally low computational overhead. The uncertainty estimates produced by our model not only contribute to more reliable robot perception, but also enable active perception strategies, where robots prioritize exploration of poorly observed regions to improve mapping efficiency. In the future, we plan to investigate adaptive or scene-aware strategies for generating contrastive noise samples, as our experiments indicate that the choice of noise generation can significantly affect both the learned map and the resulting uncertainty estimates.

REFERENCES

[1] A. Elfes, "Sonar-based real-world mapping and navigation," IEEE Journal on Robotics and Automation, 1987.
[2] S. T. O'Callaghan and F. T. Ramos, "Gaussian process occupancy maps," International Journal of Robotics Research, 2012.
[3] F. Ramos and L. Ott, "Hilbert maps: Scalable continuous occupancy mapping with stochastic gradient descent," International Journal of Robotics Research, 2016.
[4] W. Zhi, T. Lai, L. Ott, and F. Ramos, "Diffeomorphic transforms for generalised imitation learning," in Learning for Dynamics and Control Conference (L4DC), 2022.
[5] W. Zhi, I. Akinola, K. van Wyk, N. Ratliff, and F. Ramos, "Global and reactive motion generation with geometric fabric command sequences," in IEEE International Conference on Robotics and Automation (ICRA), 2023.
[6] W. Zhi, T. Lai, L. Ott, E. V. Bonilla, and F. Ramos, "Learning efficient and robust ordinary differential equations via invertible neural networks," in International Conference on Machine Learning (ICML), 2022.
[7] V. Guizilini and F.
Ramos, "Towards real-time 3D continuous occupancy mapping using Hilbert maps," International Journal of Robotics Research, vol. 37, no. 6, pp. 566–584, 2018.
[8] R. Senanayake and F. Ramos, "Bayesian Hilbert maps for dynamic continuous occupancy mapping," in Conference on Robot Learning (CoRL), 2017.
[9] W. Zhi, L. Ott, R. Senanayake, and F. Ramos, "Continuous occupancy map fusion with fast Bayesian Hilbert maps," in IEEE International Conference on Robotics and Automation (ICRA), 2019.
[10] H. Wright, W. Zhi, M. Johnson-Roberson, and T. Hermans, "V-PRISM: Probabilistic mapping of unknown tabletop scenes," in IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2024.
[11] W. Zhi, R. Senanayake, L. Ott, and F. Ramos, "Spatiotemporal learning of directional uncertainty in urban environments with kernel recurrent mixture density networks," IEEE Robotics and Automation Letters, 2019.
[12] J. Z. Liu, Z. Lin, S. Padhy, D. Tran, T. Bedrax-Weiss, and B. Lakshminarayanan, "Simple and principled uncertainty estimation with deterministic deep learning via distance awareness," in Advances in Neural Information Processing Systems, 2020.
[13] Y. Gal and Z. Ghahramani, "Dropout as a Bayesian approximation: Representing model uncertainty in deep learning," in Proceedings of the 33rd International Conference on Machine Learning (ICML), vol. 48, 2016, pp. 1050–1059.
[14] B. Lakshminarayanan, A. Pritzel, and C. Blundell, "Simple and scalable predictive uncertainty estimation using deep ensembles," in Advances in Neural Information Processing Systems 30, 2017, pp. 6402–6413.
[15] S. Mohseni, M. Pitale, J. Yadawa, and Z. Wang, "Self-supervised learning for generalizable out-of-distribution detection," Proceedings of the AAAI Conference on Artificial Intelligence, 2020.
[16] S.
Vernekar, A. Gaurav, T. Denouden, B. Phan, V. Abdelzad, R. Salay, and K. Czarnecki, "Analysis of confident-classifiers for out-of-distribution detection," CoRR, 2019.
[17] Z. Cheng, X. Zhang, and C. Liu, "Unified classification and rejection: A one-versus-all framework," Machine Intelligence Research, 2024.
[18] W. Liu, X. Wang, J. D. Owens, and Y. Li, "Energy-based out-of-distribution detection," in Advances in Neural Information Processing Systems, 2020.
[19] M. S. Graham, W. H. Pinaya, P.-D. Tudosiu, P. Nachev, S. Ourselin, and J. Cardoso, "Denoising diffusion models for out-of-distribution detection," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2023.
[20] H. Cheng, T. Zheng, Z. Ma, T. Zhang, M. Johnson-Roberson, and W. Zhi, "DOSE3: Diffusion-based out-of-distribution detection on SE(3) trajectories," IEEE Robotics and Automation Letters, 2026.
[21] M. Gutmann and A. Hyvärinen, "Noise-contrastive estimation of unnormalized statistical models, with applications to natural image statistics," Journal of Machine Learning Research, 2012.
[22] M. Gutmann and A. Hyvärinen, "Noise-contrastive estimation: A new estimation principle for unnormalized statistical models," in Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics (AISTATS), PMLR, vol. 9, 2010, pp. 297–304. [Online]. Available: https://proceedings.mlr.press/v9/gutmann10a.html
[23] I. J. Goodfellow, Y. Bengio, and A. Courville, Deep Learning. Cambridge, MA, USA: MIT Press, 2016.
[24] A. Howard and N. Roy, "The robotics data set repository (Radish)," 2003. [Online]. Available: http://radish.sourceforge.net/
[25] A. P. Bradley, "The use of the area under the ROC curve in the evaluation of machine learning algorithms," Pattern Recognition, vol. 30, no.
7, pp. 1145–1159, 1997.
[26] J. Behley, M. Garbade, A. Milioto, J. Quenzel, S. Behnke, C. Stachniss, and J. Gall, "SemanticKITTI: A dataset for semantic scene understanding of LiDAR sequences," in IEEE/CVF International Conference on Computer Vision (ICCV), 2019, pp. 9296–9306.
[27] J. McCormac, A. Handa, S. Leutenegger, and A. J. Davison, "SceneNet RGB-D: Can 5M synthetic images beat generic ImageNet pre-training on indoor segmentation?" in IEEE International Conference on Computer Vision (ICCV), 2017, pp. 2697–2706.
[28] B. Calli, A. Singh, A. Walsman, S. Srinivasa, P. Abbeel, and A. M. Dollar, "The YCB object and model set: Towards common benchmarks for manipulation research," in International Conference on Advanced Robotics (ICAR), 2015, pp. 510–517.