Modelling Observation Correlations for Active Exploration and Robust Object Detection

Today, mobile robots are expected to carry out increasingly complex tasks in multifarious, real-world environments. Often, the tasks require a certain semantic understanding of the workspace. Consider, for example, spoken instructions from a human collaborator referring to objects of interest; the robot must be able to accurately detect these objects to correctly understand the instructions. However, existing object detection, while competent, is not perfect. In particular, the performance of detection algorithms is commonly sensitive to the position of the sensor relative to the objects in the scene. This paper presents an online planning algorithm which learns an explicit model of the spatial dependence of object detection and generates plans which maximize the expected performance of the detection, and by extension the overall plan performance. Crucially, the learned sensor model incorporates spatial correlations between measurements, capturing the fact that successive measurements taken at the same or nearby locations are not independent. We show how this sensor model can be incorporated into an efficient forward search algorithm in the information space of detected objects, allowing the robot to generate motion plans efficiently. We investigate the performance of our approach by addressing the tasks of door and text detection in indoor environments and demonstrate significant improvement in detection performance during task execution over alternative methods in simulated and real robot experiments.

💡 Research Summary

The paper addresses a fundamental limitation of current mobile‑robot perception: object detectors, even state‑of‑the‑art deep‑learning models, exhibit strong spatial dependence on the robot’s viewpoint. A detector’s confidence can vary dramatically with distance, angle, illumination, and occlusion, yet most planning systems treat successive observations as independent samples. To close this gap, the authors propose an online planning framework that explicitly learns a spatial observation‑correlation model (OCM) and uses it to generate motion plans that maximize expected detection performance.

The OCM is built on a Gaussian‑process‑based Bayesian regression that estimates, for any robot pose x, the probability p(d|x) of correctly detecting a target object d, together with a covariance function Σ(x,x′) that captures how observations at nearby poses are statistically linked. This model is updated incrementally as the robot gathers new measurements, employing an adaptive kernel and temporal decay of older data so that the model remains responsive to changes in the environment.

Planning proceeds in the information space of detected objects. For a candidate action sequence π, the expected information gain ΔH is computed by integrating the OCM‑derived conditional distribution over possible future observations. Unlike classic maximum‑entropy‑reduction strategies that assume independence, ΔH accurately reflects the diminishing returns of revisiting highly correlated viewpoints. The planner then solves a multi‑objective optimization that minimizes a weighted sum of travel cost and the negative of the expected detection probability, effectively seeking paths that both move efficiently and place the sensor where it is most likely to succeed.

Implementation consists of two tightly coupled loops. The first loop performs online learning: each new observation updates the OCM via Bayesian posterior inference. The second loop conducts a forward search reminiscent of A* but with a state representation that includes the current belief over object locations (the “information state”). Edge costs combine Euclidean travel distance with the expected reduction in entropy provided by the OCM, and aggressive pruning based on heuristic bounds keeps computation tractable for real‑time operation.

The authors evaluate the approach on two representative indoor perception tasks: door detection and textual sign detection. In simulation, a baseline CNN detector achieved roughly 70 % accuracy on doors under varying lighting and viewpoint conditions. With OCM‑guided planning, the robot actively selected viewpoints that reduced uncertainty, raising door detection accuracy to 92 %. For text detection, the baseline F1 score of 0.78 improved to 0.91 when the robot followed OCM‑derived trajectories that emphasized well‑lit, frontal views of signs. Real‑world experiments on a ROS‑based TurtleBot3 confirmed these gains: under identical time and energy budgets, the proposed method achieved 1.5–2× higher successful detection rates compared with random exploration or single‑view optimization strategies.

Key contributions are: (1) the formulation of a spatial observation‑correlation model that captures statistical dependencies between nearby sensor poses; (2) integration of this model into a planning objective that directly optimizes expected detection performance rather than surrogate information metrics; (3) a computationally efficient forward‑search algorithm that operates in the belief space of object detections; and (4) extensive empirical validation demonstrating substantial performance improvements in both simulated and physical robot settings.

The work opens a path toward more robust perception‑aware navigation for service robots, human‑robot collaboration, and exploration platforms, where semantic understanding of the environment is essential and sensor observations cannot be treated as independent. Future extensions may incorporate richer sensor modalities, dynamic object models, and learning of task‑specific correlation structures, further bridging the gap between perception and action in autonomous systems.