Where is My Stuff? An Interactive System for Spatial Relations


In this paper we present a system that detects and tracks objects and agents, computes spatial relations, and communicates those relations to the user using speech. Our system is able to detect multiple objects and agents at 30 frames per second using an RGBD camera. It is able to extract the spatial relations in, on, next to, near, and belongs to, and communicate these relations using natural language. The notion of belonging is particularly important for Human-Robot Interaction since it allows the robot to ground language and reason about the right objects. Although our system is currently static and targeted to a fixed location in a room, we are planning to port it to a mobile robot, thus allowing it to explore the environment and create a spatial knowledge base.


💡 Research Summary

This paper presents an integrated, interactive system designed to track objects and people in a defined workspace, compute spatial relationships between them, and communicate these relationships to users through natural language dialogue. Motivated by use cases in elder care and industrial workshops, the system aims to solve the common problem of forgetting or misplacing items.

The hardware setup consists of a ceiling-mounted Microsoft Kinect RGB-D camera focused on a target area and an array microphone. The system architecture is built around three core components: detection, spatial reasoning, and dialog.

The detection module processes point cloud data from the Kinect in real-time (approximately 30 fps). People are modeled as “stalagmites” rising from the floor with constraints on dimensions, while objects are modeled as “bumps” on the work surface. Both are tracked over time, with special handling for objects being held by merging the hand’s point cloud with the object’s. Detection data, including properties like bounding boxes and IDs, is streamed via a ZeroMQ pub-sub channel in JSON format.
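The paper states only that detections, including properties such as bounding boxes and IDs, are streamed as JSON over a ZeroMQ pub-sub channel; the exact message schema is not given. The sketch below assumes a plausible layout (the field names `frame`, `detections`, `id`, `kind`, and `bbox` are inventions for illustration) and uses only the standard-library `json` module; in the real system the serialized string would be sent on a ZeroMQ PUB socket.

```python
import json

# Hypothetical detection message. The paper says detections carry
# properties such as bounding boxes and IDs; the schema here is assumed.
def make_detection_message(frame, detections):
    """Serialize one frame of detections for a pub-sub channel."""
    return json.dumps({
        "frame": frame,
        "detections": [
            {
                "id": d["id"],
                "kind": d["kind"],   # "object" or "agent"
                "bbox": d["bbox"],   # e.g. [x, y, z, width, height, depth]
            }
            for d in detections
        ],
    })

msg = make_detection_message(
    42, [{"id": 7, "kind": "object", "bbox": [0.1, 0.0, 0.5, 0.2, 0.1, 0.2]}]
)
decoded = json.loads(msg)
```

A subscriber on the spatial-reasoning side would simply `json.loads` each received message and update its per-frame object table.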

The spatial relations module analyzes this geometric data to compute observer-independent relationships based on predefined geometric rules. Key object-object relations include in (80% volume containment), on (vertical support), near (proximity based on object size), and next to (nearness without intervening objects). A crucial object-agent relation, belongs to, is inferred when a previously unseen object first appears in conjunction with a specific person. This allows the system to ground natural language references like “my wallet.” These relations are computed per frame and maintained in a database.
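The geometric rules above can be sketched with axis-aligned bounding boxes. This is a minimal illustration, not the paper's implementation: the 80% containment threshold for in comes from the paper, while the exact size-dependent distance rule for near is an assumption (the paper says only that the threshold depends on object size).

```python
import math

# Boxes are (xmin, ymin, zmin, xmax, ymax, zmax) in metres.
def volume(b):
    return max(0.0, b[3] - b[0]) * max(0.0, b[4] - b[1]) * max(0.0, b[5] - b[2])

def intersection(a, b):
    # Overlapping box of a and b (may be degenerate if they are disjoint).
    return tuple(max(a[i], b[i]) for i in range(3)) + \
           tuple(min(a[i + 3], b[i + 3]) for i in range(3))

def is_in(a, b, threshold=0.8):
    """a is 'in' b if at least 80% of a's volume lies inside b."""
    va = volume(a)
    return va > 0 and volume(intersection(a, b)) / va >= threshold

def is_near(a, b, scale=1.5):
    """'near' if the centre distance is within a size-scaled threshold.
    The factor 1.5 is an assumed stand-in for the paper's size rule."""
    ca = [(a[i] + a[i + 3]) / 2 for i in range(3)]
    cb = [(b[i] + b[i + 3]) / 2 for i in range(3)]
    size = max(a[3] - a[0], a[4] - a[1], b[3] - b[0], b[4] - b[1])
    return math.dist(ca, cb) <= scale * size
```

next to would then be computed as is_near plus a check that no third object's box lies between the two, and on as vertical adjacency of the boxes with horizontal overlap.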

The dialog module handles voice interaction. The system is activated either by a wake word (“Celia”) or by the user looking directly at the camera. Upon activation, the user has two seconds to pose a question, such as “Where is my wallet?” The system then queries its spatial knowledge base, composes a natural language response synthesizing the relevant relations (e.g., “It is next to the vase, under the magazines”), and delivers it via speech synthesis.
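The grounding-and-response step can be sketched as a query over a small relation store. Everything concrete here is an assumption for illustration: the paper describes a relations database and responses like "It is next to the vase, under the magazines", but the tuple representation, label table, and function names below are invented.

```python
# Hypothetical spatial knowledge base: (subject, relation, object) tuples,
# as might be accumulated per frame by the spatial relations module.
relations = {
    ("wallet-1", "belongs to", "alice"),
    ("wallet-1", "next to", "vase-1"),
    ("wallet-1", "under", "magazines-1"),
}
labels = {"vase-1": "the vase", "magazines-1": "the magazines"}

def ground(noun, speaker):
    """Resolve 'my <noun>' to an object ID via the belongs-to relation."""
    for subj, rel, obj in relations:
        if rel == "belongs to" and obj == speaker and subj.startswith(noun):
            return subj
    return None

def answer_where(noun, speaker):
    """Compose a natural-language answer from the stored relations."""
    obj = ground(noun, speaker)
    if obj is None:
        return "I have not seen it."
    parts = [f"{rel} {labels[tgt]}"
             for subj, rel, tgt in sorted(relations)
             if subj == obj and rel != "belongs to"]
    return "It is " + ", ".join(parts) + "."
```

With this store, a query by the owner yields "It is next to the vase, under the magazines.", while an unknown item falls through to the "not seen" reply; the real system would feed this string to its speech synthesizer.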

The authors acknowledge that the current implementation is static, with a fixed camera view, which limits its field of view and raises privacy concerns. They outline future work focused on porting the system to a mobile robot platform. This mobility would allow the robot to explore the environment, reduce occlusions by viewing scenes from multiple angles, and proactively follow people to learn about object usage in a broader context. This evolution points towards a more general human-robot interaction (HRI) system capable of building a comprehensive spatial knowledge base of a dynamic environment. The paper’s contribution lies in its end-to-end integration of real-time perception, explicit spatial logic, and natural language interaction for a practical assistive task.

