xFLIE: Leveraging Actionable Hierarchical Scene Representations for Autonomous Semantic-Aware Inspection Missions
We present a novel architecture for the incremental construction and exploitation of a hierarchical 3D scene graph representation during semantic-aware inspection missions. Inspection planning, particularly of distributed targets in previously unseen environments, presents an opportunity to exploit the semantic structure of the scene during reasoning, navigation, and scene understanding. Motivated by this, we propose the 3D Layered Semantic Graph (3DLSG), a hierarchical inspection scene graph constructed incrementally and organized into abstraction layers that support planning demands in real time. To address the task of semantic-aware inspection, we propose a mission framework, termed Enhanced First-Look Inspect Explore (xFLIE), that tightly couples the 3DLSG with an inspection planner. We assess performance through simulations and experimental trials, evaluating target-selection, path-planning, and semantic-navigation tasks over the 3DLSG model. The scenarios are diverse, ranging from city-scale distributed infrastructure targets to solitary targets in simulated worlds, followed by outdoor and subterranean deployments onboard a quadrupedal robot. The proposed method successfully demonstrates incremental construction and planning over the 3DLSG representation to meet mission objectives, and further demonstrates semantic-navigation tasks over the structured interface at the end of the inspection missions. Finally, we report a reduction in path-planning time of multiple orders of magnitude compared to conventional volumetric-map-based methods across various environment scales, demonstrating the planning efficiency and scalability of the proposed approach.
💡 Research Summary
The paper introduces xFLIE, a novel framework that tightly integrates an incrementally built hierarchical 3‑D scene graph, called the 3D Layered Semantic Graph (3DLSG), with an existing inspection planner (FLIE) to enable autonomous, semantic‑aware inspection missions in previously unknown environments.
Key contributions: (1) Definition of 3DLSG, a four‑layer abstraction (Target → Level → Pose → Feature) that captures both metric and semantic information and is updated online from raw sensor observations (position, orientation, semantic label, segmentation score, mask area, RGB image). Each layer maintains its own node, edge, and attribute sets, while inter‑layer edges encode the hierarchical relationship required for inspection tasks. (2) Integration of 3DLSG with FLIE to form xFLIE, where the planner alternates between exploration (π_expl) and inspection (π_insp) modes, using the evolving graph for target prioritisation, hierarchical path planning, and human‑robot interaction. (3) Extensive quantitative evaluation showing orders‑of‑magnitude reduction in planning time compared with conventional voxel‑based maps across a range of map resolutions and environment scales. (4) Real‑world validation on a quadrupedal robot (Spot) in outdoor urban and subterranean settings, demonstrating real‑time graph construction, target selection, and semantic navigation based on high‑level operator queries.
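The four-layer abstraction described above (Target → Level → Pose → Feature, with per-layer node/edge/attribute sets and inter-layer edges encoding the hierarchy) can be sketched as a layered graph. This is an illustrative sketch only, not the paper's implementation; the class and attribute names (`LayeredSceneGraph`, `Node`, etc.) are assumptions:

```python
from dataclasses import dataclass, field

# The four 3DLSG abstraction layers, top to bottom.
LAYERS = ("target", "level", "pose", "feature")

@dataclass
class Node:
    node_id: str
    layer: str                                    # one of LAYERS
    attributes: dict = field(default_factory=dict)

@dataclass
class LayeredSceneGraph:
    nodes: dict = field(default_factory=dict)     # node_id -> Node
    intra_edges: set = field(default_factory=set) # edges within one layer
    inter_edges: set = field(default_factory=set) # parent -> child across layers

    def add_node(self, node_id, layer, **attrs):
        assert layer in LAYERS
        self.nodes[node_id] = Node(node_id, layer, attrs)

    def link(self, parent_id, child_id):
        p, c = self.nodes[parent_id], self.nodes[child_id]
        if p.layer == c.layer:
            self.intra_edges.add((parent_id, child_id))
        else:
            # Inter-layer edge: encodes the Target -> Level -> Pose -> Feature
            # hierarchy the planner descends during inspection.
            self.inter_edges.add((parent_id, child_id))

    def children(self, node_id):
        return [c for (p, c) in self.inter_edges if p == node_id]

# Tiny example: one target with an inspection level, a viewpoint, and a defect.
g = LayeredSceneGraph()
g.add_node("building_1", "target", label="building", score=0.92)
g.add_node("building_1/exterior", "level", stage="exterior inspection")
g.add_node("vp_0", "pose", position=(1.0, 2.0, 0.5), cost=3.2)
g.add_node("crack_0", "feature", kind="crack")
g.link("building_1", "building_1/exterior")
g.link("building_1/exterior", "vp_0")
g.link("vp_0", "crack_0")
```

Keeping inter-layer edges separate from intra-layer ones is what lets a planner query one abstraction level at a time (e.g. rank targets without touching pose nodes) rather than searching a flat graph.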
Methodology: At each time step k the robot receives an observation z_k = (p_k, q_k, l_k, s_k, a_k, i_k). The observation is fed to a semantic segmentation module; new detections spawn Target nodes, which are immediately linked to Level nodes (e.g., “exterior inspection”, “interior inspection”), Pose nodes (candidate viewpoints with associated costs), and Feature nodes (detected cracks, corrosion, etc.). The planner queries the Target layer to rank inspection candidates, uses the Level layer to decide the appropriate inspection stage, plans a sequence of feasible viewpoints on the Pose layer using a hierarchical A* (or D*‑Lite) search, and finally executes the motion while updating the Feature layer with newly observed defects. Human operators can issue high‑level commands such as “inspect the south façade of building X”; the system translates this into a graph query that returns the corresponding Target‑Pose path instantly, enabling semantic navigation without re‑building a map.
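A minimal sketch of the update-and-query loop above, assuming the observation tuple z_k = (p_k, q_k, l_k, s_k, a_k, i_k) from the summary; the thresholding rule, class name, and method names are assumptions for illustration, not the paper's code:

```python
from collections import namedtuple

# z_k = (p_k, q_k, l_k, s_k, a_k, i_k): position, orientation, semantic label,
# segmentation score, mask area, RGB image.
Observation = namedtuple("Observation", "position orientation label score mask_area image")

class InspectionGraph:
    """Toy stand-in for the target layer of an incrementally built scene graph."""

    def __init__(self, score_threshold=0.5):
        self.targets = {}                  # semantic label -> list of viewpoints
        self.score_threshold = score_threshold

    def update(self, z):
        """Fold one observation into the graph; reject low-confidence detections."""
        if z.score < self.score_threshold:
            return False
        viewpoints = self.targets.setdefault(z.label, [])
        viewpoints.append({"position": z.position, "orientation": z.orientation})
        return True

    def query(self, label):
        """Resolve a high-level operator query (e.g. 'building') to stored
        viewpoints, without rebuilding or searching a metric map."""
        return self.targets.get(label, [])

g = InspectionGraph()
g.update(Observation((1, 2, 0), (0, 0, 0, 1), "building", 0.9, 1500, None))
g.update(Observation((5, 0, 0), (0, 0, 0, 1), "tree", 0.3, 200, None))  # rejected
print(len(g.query("building")))   # -> 1
print(len(g.query("tree")))       # -> 0
```

The point of the sketch is the query path: because targets are indexed by semantic label, an operator command resolves to candidate viewpoints by dictionary lookup, which is why graph-based planning can be orders of magnitude faster than re-searching a volumetric map.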
Results: In simulation, city‑scale scenarios with thousands of distributed targets show planning latencies below 0.1 s for the 3DLSG‑based planner, whereas a voxel‑based planner requires 10–100 s depending on resolution. Memory consumption is also reduced to roughly one‑fifth of the voxel approach. Field trials confirm that the robot can maintain sub‑0.1 s planning cycles while traversing complex terrain, and that the hierarchical graph supports on‑the‑fly interpretation of operator queries. Energy consumption is reduced by about 30 % due to more efficient path selection.
Limitations and future work: The approach relies heavily on the quality of semantic segmentation; mis‑classifications propagate into the graph and can degrade planning. Dynamic objects are not explicitly handled; extending the graph with a time‑stamped dynamic layer is a promising direction. Current implementation is tuned for a quadrupedal platform; adapting to aerial or underwater robots will require sensor‑model adjustments. Finally, scaling to multi‑robot teams will need distributed graph synchronization and conflict resolution mechanisms.
Overall, xFLIE demonstrates that a purpose‑built, actionable hierarchical scene representation can be constructed online and exploited for fast, scalable inspection planning, bridging the gap between perception‑centric mapping and high‑level semantic decision making in autonomous robotics.