Integrating Deep RL and Bayesian Inference for ObjectNav in Mobile Robotics


Autonomous object search is challenging for mobile robots operating in indoor environments due to partial observability, perceptual uncertainty, and the need to trade off exploration and navigation efficiency. Classical probabilistic approaches explicitly represent uncertainty but typically rely on handcrafted action-selection heuristics, while deep reinforcement learning enables adaptive policies but often suffers from slow convergence and limited interpretability. This paper proposes a hybrid object-search framework that integrates Bayesian inference with deep reinforcement learning. The method maintains a spatial belief map over target locations, updated online through Bayesian inference from calibrated object detections, and trains a reinforcement learning policy to select navigation actions directly from this probabilistic representation. The approach is evaluated in realistic indoor simulation using Habitat 3.0 and compared against baseline strategies. Across two indoor environments, the proposed method improves success rate while reducing search effort. Overall, the results support the value of combining Bayesian belief estimation with learned action selection to achieve more efficient and reliable object-search behavior under partial observability.


💡 Research Summary

The paper tackles the challenging problem of indoor object navigation (ObjectNav) for mobile robots, where partial observability, sensor noise, and the trade‑off between exploration cost and success probability make classical approaches insufficient. Traditional probabilistic methods maintain an explicit belief over possible target locations using Bayesian filtering, but they rely on handcrafted utility functions for action selection, limiting adaptability. Conversely, deep reinforcement learning (DRL) learns policies end‑to‑end from interaction data, achieving strong empirical performance yet lacking explicit uncertainty modeling, interpretability, and often requiring large amounts of training data.

To bridge this gap, the authors propose a hybrid framework that combines Bayesian belief estimation with a deep Q‑network (DQN) policy. The system assumes a known 2‑D occupancy grid of the environment (free vs. occupied cells) but unknown semantic content of occupied cells. At each timestep the robot captures an RGB‑D observation, runs a calibrated YOLO‑v11 detector (softmax output with temperature scaling) to obtain class probabilities for the target object, and projects each detection into the world map using depth and known camera‑to‑robot pose. Detections are assigned to occupied cells (or the nearest occupied cell if projected onto free space), producing spatial evidence.
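The two perception steps above, calibrating detector confidences with temperature scaling and back-projecting a detection into a grid cell, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the temperature value, intrinsics, and grid parameters are placeholders.

```python
import numpy as np

def calibrated_probs(logits, T=1.5):
    """Temperature-scaled softmax: T > 1 softens over-confident
    detector outputs (T is a tuned calibration parameter)."""
    z = np.asarray(logits, dtype=float) / T
    z -= z.max()                      # numerical stability
    e = np.exp(z)
    return e / e.sum()

def project_to_cell(u, v, depth, K, T_wc, resolution, origin):
    """Back-project pixel (u, v) with measured depth into the world
    frame using intrinsics K (3x3) and the known camera-to-world pose
    T_wc (4x4), then snap the point to an occupancy-grid cell."""
    ray = np.linalg.inv(K) @ np.array([u, v, 1.0])
    p_cam = ray * depth                          # 3-D point, camera frame
    p_world = (T_wc @ np.append(p_cam, 1.0))[:3] # 3-D point, world frame
    i = int((p_world[0] - origin[0]) / resolution)
    j = int((p_world[1] - origin[1]) / resolution)
    return i, j
```

In the paper the resulting cell index would then be remapped to the nearest occupied cell whenever the projection lands on free space.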

The belief over each occupied cell is represented by a Dirichlet distribution with parameters βᵢⱼ for K object classes plus a background class. An initial uniform prior (β=1) is used. Observation evidence consists of (a) positive detections, where the class probability vector p is scaled to allocate a fixed background mass, and (b) negative evidence from visible occupied cells without detections, whose background mass decays with robot‑to‑cell distance to reflect detector false‑negative rates. Instead of naïve count accumulation, the authors adopt a conservative Bayesian fusion rule (Kaplan et al., 2020) that blends prior parameters with observation vectors and adds a minimum evidence term, preventing over‑confidence under noisy detections. The posterior mean of each Dirichlet yields a per‑cell probability map π̂⁽ᵏ⁾ᵢⱼ, which together with the per‑cell entropy forms a four‑channel tensor (target probability, entropy, occupancy, robot pose) fed to the DQN.
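A simplified sketch of this per-cell update is given below. The blending weight `w` and evidence floor `eps` are illustrative placeholders; the paper follows the fusion rule of Kaplan et al. (2020), whose exact form may differ from this damped accumulation.

```python
import numpy as np

def fuse_conservative(beta, obs, w=0.5, eps=0.1):
    """Blend prior Dirichlet parameters with an observation vector
    instead of naive count accumulation (beta + obs). The weight w
    damps the evidence and eps adds a minimum-evidence term, both
    chosen here for illustration only."""
    beta = np.asarray(beta, float)
    obs = np.asarray(obs, float)
    return beta + w * obs + eps

def cell_summary(beta):
    """Posterior mean of the Dirichlet and the entropy of the
    resulting categorical distribution, i.e. the two belief channels
    fed to the DQN."""
    beta = np.asarray(beta, float)
    mean = beta / beta.sum()
    entropy = -(mean * np.log(mean + 1e-12)).sum()
    return mean, entropy
```

Starting from the uniform prior β = 1 and fusing a detection vector concentrated on one class shifts the posterior mean toward that class while the evidence floor keeps the update conservative.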

To keep the action space tractable, free cells are partitioned into spatial clusters using a hierarchical clustering scheme. Each cluster’s centroid serves as a candidate navigation goal. The DQN outputs a dense Q‑value map over all free cells; a binary mask restricts selection to the current level’s centroids. An ε‑greedy policy chooses a centroid, after which a conventional planner computes a shortest‑path trajectory to the selected goal while the robot continues to collect observations and update the belief online.
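The masking step described above, restricting an otherwise dense Q-value map to cluster centroids under an ε-greedy policy, can be sketched as follows. Function and parameter names are assumptions for illustration.

```python
import numpy as np

def select_goal(q_map, centroid_mask, epsilon=0.1, rng=None):
    """Epsilon-greedy goal selection. The DQN outputs Q-values over
    all free cells (q_map); a binary mask restricts eligible actions
    to the current clustering level's centroids."""
    rng = rng or np.random.default_rng()
    candidates = np.argwhere(centroid_mask)          # (row, col) of centroids
    if rng.random() < epsilon:
        return tuple(candidates[rng.integers(len(candidates))])
    masked = np.where(centroid_mask, q_map, -np.inf) # non-centroids ineligible
    return tuple(np.unravel_index(np.argmax(masked), q_map.shape))
```

The selected centroid would then be handed to a conventional shortest-path planner, with belief updates continuing along the way.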

The reward function provides a large positive reward when the target is detected with confidence above a predefined threshold (e.g., 75 %) and penalizes each timestep proportionally to travel distance and elapsed time, encouraging efficient exploration.
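A minimal sketch of this reward shaping, assuming hypothetical weights for the distance and time penalties (the paper does not state the exact coefficients used here):

```python
def step_reward(detected, confidence, dist_travelled, dt,
                conf_threshold=0.75, success_reward=100.0,
                w_dist=1.0, w_time=0.1):
    """Large terminal bonus when the target is detected above the
    confidence threshold; otherwise a per-step penalty proportional
    to distance travelled and elapsed time. All magnitudes here are
    illustrative placeholders."""
    if detected and confidence >= conf_threshold:
        return success_reward
    return -(w_dist * dist_travelled + w_time * dt)
```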

Experiments are conducted in the Habitat 3.0 simulator on two distinct indoor layouts (a living‑room/kitchen composite and a corridor/meeting‑room composite). Three methods are compared: (1) a purely probabilistic baseline that updates the belief and selects actions via a handcrafted information‑gain heuristic, (2) a pure DRL baseline that learns directly from RGB‑D images and robot pose, and (3) the proposed hybrid approach. Evaluation metrics include success rate, average number of steps to locate the object, and cumulative reward. The hybrid method outperforms both baselines, achieving roughly a 12 percentage‑point increase in success rate, an 18 % reduction in average steps, and higher cumulative rewards. The analysis shows that the Bayesian belief map effectively guides the learned policy toward high‑uncertainty regions, reducing redundant exploration.

Key contributions are: (i) a calibrated detection‑to‑belief pipeline that yields reliable probabilistic evidence, (ii) the integration of a belief‑driven state representation with a DQN policy, (iii) a clustering abstraction that scales the action space while preserving spatial structure, and (iv) an empirical demonstration that combining explicit uncertainty modeling with learned action selection yields superior performance in realistic indoor navigation tasks. The authors suggest future work on dynamic environments, map uncertainty, and multi‑object search to bring the approach closer to real‑world deployment.

