NeMo-map: Neural Implicit Flow Fields for Spatio-Temporal Motion Mapping
Safe and efficient robot operation in complex human environments can benefit from good models of site-specific motion patterns. Maps of Dynamics (MoDs) provide such models by encoding statistical motion patterns in a map, but existing representations use discrete spatial sampling and typically require costly offline construction. We propose a continuous spatio-temporal MoD representation based on implicit neural functions that directly map coordinates to the parameters of a Semi-Wrapped Gaussian Mixture Model. This removes the need for discretization and imputation for unevenly sampled regions, enabling smooth generalization across both space and time. Evaluated on two public datasets with real-world people tracking data, our method achieves better accuracy of motion representation and smoother velocity distributions in sparse regions while still being computationally efficient, compared to available baselines. The proposed approach demonstrates a powerful and efficient way of modeling complex human motion patterns and high performance in the trajectory prediction downstream task. Project code is available at https://github.com/test-bai-cpu/nemo-map
💡 Research Summary
The paper “NeMo-map: Neural Implicit Flow Fields for Spatio-Temporal Motion Mapping” introduces a novel, continuous representation for modeling site-specific human motion patterns, crucial for safe and efficient robot operation in human-centric environments. It addresses key limitations of existing Maps of Dynamics (MoDs), which typically rely on discrete spatial sampling (grid cells) and require costly offline batch processing.
The core innovation of NeMo-map is the use of an implicit neural function to model motion dynamics continuously across space and time. Instead of fitting a separate statistical model per grid cell, NeMo-map learns a single neural network, Φ_θ, that takes spatio-temporal coordinates (x, y, t) as input and directly outputs the parameters of a Semi-Wrapped Gaussian Mixture Model (SWGMM). This SWGMM represents the joint probability distribution over movement velocity (speed ρ and orientation θ) at the queried location and time. Note that the orientation θ is a different quantity from the subscript in Φ_θ, which denotes the network parameters. The SWGMM suits this task because it models the correlation between the circular orientation variable and the linear speed variable, and its mixture components capture multimodal flow directions (e.g., at intersections).
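To make the SWGMM concrete, the following is a minimal sketch of how such a density could be evaluated at one (ρ, θ) pair. It assumes the usual semi-wrapped construction, in which each bivariate Gaussian over (θ, ρ) is summed over 2π shifts of the orientation, and it truncates that infinite sum to a few wraps. The function names, the (θ, ρ) ordering of means and covariances, and the `n_wraps` parameter are illustrative assumptions, not taken from the paper's code.

```python
import math


def gaussian_2d(dx, dy, cov):
    """Bivariate normal density at offset (dx, dy) from the mean."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    # Quadratic form [dx, dy] @ inv(cov) @ [dx, dy]^T for a 2x2 matrix.
    quad = (d * dx * dx - (b + c) * dx * dy + a * dy * dy) / det
    return math.exp(-0.5 * quad) / (2.0 * math.pi * math.sqrt(det))


def swgmm_pdf(rho, theta, weights, means, covs, n_wraps=1):
    """Semi-wrapped GMM density over speed rho and orientation theta.

    means: list of (mu_theta, mu_rho); covs: list of 2x2 matrices over
    (theta, rho). Only the angular dimension is wrapped. The wrap sum is
    truncated to k in [-n_wraps, n_wraps] (an assumption of this sketch).
    """
    total = 0.0
    for w, (mu_theta, mu_rho), cov in zip(weights, means, covs):
        for k in range(-n_wraps, n_wraps + 1):
            d_theta = theta + 2.0 * math.pi * k - mu_theta
            total += w * gaussian_2d(d_theta, rho - mu_rho, cov)
    return total
```

Because of the wrap sum, a query at θ and one at θ + 2π return (up to truncation error) the same density, which is what makes the model appropriate for a circular variable.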
The architecture combines two feature representations. Spatial coordinates are used to interpolate features from a learnable spatial feature grid, capturing local geometric context. The temporal variable is encoded by a SIREN network with periodic (sine) activations, which are well suited to cyclical patterns such as daily routines. These spatial and temporal features, concatenated with the raw coordinates, are fed into a Multi-Layer Perceptron (MLP) that outputs the SWGMM parameters (mixture weights, means, and covariances). The model is trained end-to-end by minimizing the negative log-likelihood of observed velocity data.
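The pipeline described above can be sketched in plain Python, under several assumptions. The summary does not specify the interpolation scheme, the layer sizes, or how raw MLP outputs are constrained, so this sketch assumes bilinear interpolation of the grid, a single SIREN-style sine layer for time, and a softmax / softplus / Cholesky mapping to obtain valid mixture weights and covariances. All function names and the six-values-per-component layout are hypothetical, and the MLP itself is omitted.

```python
import math


def bilinear_features(grid, x, y):
    """Interpolate a feature vector at continuous (x, y) in grid-index units.

    grid[i][j] is the feature list at row i (y) and column j (x);
    the grid must be at least 2x2. Queries are clamped to the grid.
    """
    rows, cols = len(grid), len(grid[0])
    x = min(max(x, 0.0), cols - 1.0)
    y = min(max(y, 0.0), rows - 1.0)
    j0 = min(int(x), cols - 2)
    i0 = min(int(y), rows - 2)
    fx, fy = x - j0, y - i0
    corners = (
        ((1 - fx) * (1 - fy), grid[i0][j0]),
        (fx * (1 - fy), grid[i0][j0 + 1]),
        ((1 - fx) * fy, grid[i0 + 1][j0]),
        (fx * fy, grid[i0 + 1][j0 + 1]),
    )
    dim = len(grid[0][0])
    return [sum(w * f[d] for w, f in corners) for d in range(dim)]


def sine_time_features(t, freqs, phases, omega0=30.0):
    """One SIREN-style layer: sin(omega0 * (w * t + b)) per output unit."""
    return [math.sin(omega0 * (w * t + b)) for w, b in zip(freqs, phases)]


def encode_query(grid, x, y, t, freqs, phases):
    """Concatenate spatial features, temporal features, and raw coordinates."""
    return (bilinear_features(grid, x, y)
            + sine_time_features(t, freqs, phases)
            + [x, y, t])


def softplus(v):
    """Numerically stable log(1 + exp(v)); always positive."""
    return math.log1p(math.exp(-abs(v))) + max(v, 0.0)


def to_swgmm_params(raw, n_components):
    """Map an unconstrained vector (6 values per component) to SWGMM parameters.

    Assumed layout per component: weight logit, mean angle, mean speed,
    and three Cholesky entries (l11, l21, l22) of the 2x2 covariance.
    """
    logits = [raw[6 * j] for j in range(n_components)]
    top = max(logits)
    exps = [math.exp(v - top) for v in logits]
    norm = sum(exps)
    weights = [e / norm for e in exps]  # softmax: positive, sums to 1
    means, covs = [], []
    for j in range(n_components):
        _, mt, mr, a, b, c = raw[6 * j: 6 * j + 6]
        means.append((mt % (2.0 * math.pi), softplus(mr)))
        l11, l21, l22 = softplus(a), b, softplus(c)
        # cov = L @ L^T with a positive diagonal in L is positive definite.
        covs.append([[l11 * l11, l11 * l21],
                     [l11 * l21, l21 * l21 + l22 * l22]])
    return weights, means, covs
```

In a full model, the output of `encode_query` would pass through the MLP, and the MLP's final layer would produce the `raw` vector consumed by `to_swgmm_params`. Building the covariance from a Cholesky factor is one common way to keep it positive definite during gradient-based training; the paper may use a different parameterization.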
This approach eliminates the need for manual discretization and spatial imputation. It enables querying smooth, detailed motion distributions at any arbitrary point and time, not just at pre-defined grid centers. The neural network inherently provides smooth interpolation between observed data points, leading to more physically plausible flow fields, especially in sparsely sampled regions.
The method is evaluated on two real-world pedestrian trajectory datasets: the ATC indoor shopping mall dataset and the ETH/UCY outdoor university datasets. It is compared against three established baselines: the original CLiFF-map (discrete grid-based SWGMMs), an Online CLiFF-map variant that supports incremental updates, and STeF-map, a frequency-based spatio-temporal model that discretizes motion orientation. Quantitative evaluation using Negative Log-Likelihood (NLL) on test data shows that NeMo-map achieves superior accuracy in representing the underlying motion distributions across both datasets.
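For reference, the NLL metric mentioned above can be computed as the average negative log-density of held-out observations. The sketch below is a generic version: `density_fn` stands in for any of the compared models, and the sample layout (x, y, t, ρ, θ) and the small probability floor that guards against log(0) are assumptions of this example rather than details from the paper.

```python
import math


def mean_nll(samples, density_fn, floor=1e-12):
    """Average negative log-likelihood over held-out observations.

    samples: iterable of (x, y, t, rho, theta) tuples.
    density_fn: callable returning p(rho, theta | x, y, t) for one sample.
    floor: lower bound on the density to avoid log(0) (an assumption here).
    """
    total = 0.0
    count = 0
    for x, y, t, rho, theta in samples:
        p = density_fn(x, y, t, rho, theta)
        total -= math.log(max(p, floor))
        count += 1
    return total / count


def uniform_density(x, y, t, rho, theta, max_speed=2.0):
    """Toy baseline: uniform over theta in [0, 2*pi) and rho in [0, max_speed]."""
    return 1.0 / (2.0 * math.pi * max_speed)
```

Lower values are better: a model that concentrates probability mass where test velocities actually occur scores below the uniform toy baseline, whose NLL is log(2π · max_speed) regardless of the data.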
Furthermore, NeMo-map produces smoother velocity distributions than the discrete baselines in areas with limited observations. It also offers practical advantages: it represents the entire spatio-temporal field with a single, compact neural network, which is more memory-efficient than storing a separate model per grid cell or per time slice. Once trained, each query costs one forward pass whose runtime does not depend on the query location or time, which facilitates real-time applications such as robot motion planning or long-term human trajectory prediction. The work connects advances in implicit neural representations with the problem of spatio-temporal dynamics modeling, offering an efficient way to build continuous “maps of how people move.”