SENSORIMOTOR GRAPH: Action-Conditioned Graph Neural Network for Learning Robotic Soft Hand Dynamics
Soft robotics is a thriving branch of robotics that takes inspiration from nature and uses affordable, flexible materials to design adaptable non-rigid robots. However, their flexible behavior makes these robots hard to model, and an accurate model is essential for precise actuation and optimal control. For system modelling, learning-based approaches have demonstrated good results, yet they fail to consider the physical structure underlying the system as an inductive prior. In this work, we take inspiration from sensorimotor learning and apply a Graph Neural Network to the problem of modelling a non-rigid kinematic chain (i.e. a robotic soft hand), taking advantage of two key properties: 1) the system is compositional, that is, it is composed of simple interacting parts connected by edges; 2) it is order invariant, i.e. only the structure of the system is relevant for predicting future trajectories. We denote our model the ‘Sensorimotor Graph’ since it learns the system connectivity from observation and uses it for dynamics prediction. We validate our model in different scenarios and show that it outperforms non-structured baselines in dynamics prediction while being more robust to configurational variations, tracking errors, or node failures.
💡 Research Summary
The paper introduces the “Sensorimotor Graph,” an action‑conditioned graph neural network (GNN) designed to model the dynamics of a robotic soft hand, which is essentially a non‑rigid kinematic chain composed of flexible fingers. Traditional physics‑based modeling struggles with the infinite degrees of freedom and highly nonlinear deformations inherent to soft materials. Recent learning‑based approaches have shown promise but typically ignore the underlying physical structure, limiting their generalization and robustness.
Inspired by infant sensorimotor babbling, the authors propose a two‑stage framework that first infers the latent connectivity graph of the system from observed key‑point trajectories and actuator commands, and then uses this graph as an explicit relational inductive bias for future‑state prediction. The graph inference component builds on the Neural Relational Inference (NRI) model: a fully‑connected encoder processes sequences of node positions and control signals through iterative node‑to‑edge and edge‑to‑node message‑passing layers, producing a probability distribution over possible edges. A Gumbel‑softmax with temperature τ yields a continuous relaxation of the discrete edge variables, enabling end‑to‑end training either with supervised edge labels or in an unsupervised manner.
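The encoder's alternating message-passing rounds can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the weight matrices, hidden size, and single-layer `mlp` stand in for learned MLPs, and plain temperature-scaled softmax replaces the stochastic Gumbel perturbation for clarity.

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp(x, w, b):
    # linear map + ReLU, a stand-in for a learned multi-layer MLP
    return np.maximum(x @ w + b, 0.0)

N, T, F, H = 4, 10, 3, 8          # nodes, timesteps, features, hidden dim
x = rng.normal(size=(N, T * F))   # flattened key-point/control trajectories

# hypothetical weights (learned in practice)
w_node = rng.normal(size=(T * F, H)); b_node = np.zeros(H)
w_edge = rng.normal(size=(2 * H, H)); b_edge = np.zeros(H)
w_out  = rng.normal(size=(H, 2));     b_out  = np.zeros(2)  # 2 edge types

h = mlp(x, w_node, b_node)                      # initial node embeddings

# node -> edge: concatenate sender/receiver embeddings for every ordered pair
senders, receivers = np.nonzero(~np.eye(N, dtype=bool))
e = mlp(np.concatenate([h[senders], h[receivers]], axis=1), w_edge, b_edge)

# edge -> node: sum incoming edge messages per receiver (order invariant)
h2 = np.zeros((N, H))
np.add.at(h2, receivers, e)

# second node -> edge round, then logits over edge types
e2 = mlp(np.concatenate([h2[senders], h2[receivers]], axis=1), w_edge, b_edge)
logits = e2 @ w_out + b_out

def softmax_tau(z, tau):
    # temperature-scaled softmax: low tau sharpens toward discrete edges
    z = (z - z.max(axis=-1, keepdims=True)) / tau
    p = np.exp(z)
    return p / p.sum(axis=-1, keepdims=True)

edge_probs = softmax_tau(logits, tau=0.5)       # (N*(N-1), 2) relaxed edges
```

Because the edge-to-node step is a sum, relabeling the nodes permutes the rows of `edge_probs` without changing their values, which is the order-invariance property the summary highlights.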
The decoder takes the inferred edge probabilities as a fixed structure and conditions the dynamics on both the graph and the action vector. Each edge is embedded with a function that incorporates the corresponding actuator signal; edge embeddings are aggregated at each node via an order‑invariant summation, then transformed by a node‑wise function to produce the mean (µ) and variance (σ²) of a Gaussian distribution for the next time step. Sampling from this distribution yields the predicted node positions, which are fed back for recursive multi‑step forecasting. Because actions are part of the node input, the model can either map specific actuators to specific fingers or learn a more flexible mapping where all actions are broadcast to all nodes and the network discovers the appropriate associations.
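The decoder step described above (action-conditioned edge embedding, order-invariant aggregation, Gaussian output head, recursive rollout) can be sketched as follows. Again this is a hedged illustration, not the paper's code: weight shapes, the broadcast of actions to all nodes, and the additive residual update are assumptions for the sketch.

```python
import numpy as np

rng = np.random.default_rng(1)

N, H = 4, 8
pos = rng.normal(size=(N, 3))            # current key-point positions
action = rng.normal(size=(N, 2))         # actuator signals broadcast to nodes
senders, receivers = np.nonzero(~np.eye(N, dtype=bool))
edge_probs = rng.random(len(senders))    # inferred edge weights (from encoder)

# hypothetical learned weights
w_e = rng.normal(size=(3 + 3 + 2, H)) * 0.1   # sender pos, receiver pos, action
w_n = rng.normal(size=(3 + H, 6)) * 0.1       # node pos + message -> mu, log var

def step(pos, action):
    # edge embedding conditioned on the receiver's actuator signal
    e_in = np.concatenate([pos[senders], pos[receivers], action[receivers]], axis=1)
    msg = np.tanh(e_in @ w_e) * edge_probs[:, None]   # weight by inferred edges
    agg = np.zeros((N, H))
    np.add.at(agg, receivers, msg)                    # order-invariant summation
    out = np.concatenate([pos, agg], axis=1) @ w_n
    mu, log_var = out[:, :3], out[:, 3:]              # Gaussian parameters
    # sample the next positions (residual update assumed for the sketch)
    return pos + mu + np.exp(0.5 * log_var) * rng.normal(size=mu.shape)

# recursive multi-step forecasting: feed predictions back as inputs
traj = [pos]
for _ in range(5):
    traj.append(step(traj[-1], action))
```

Feeding samples back in this way compounds the one-step Gaussian uncertainty over the horizon, which is why multi-step prediction error is a natural evaluation metric for such a model.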
Experiments are conducted in the SOFA physics simulator, generating datasets with varied material stiffness, friction, and finger configurations. The Sensorimotor Graph is benchmarked against several baselines: (i) non‑structured recurrent models such as LSTM and MLP, (ii) a static‑graph GNN that assumes a known connectivity, and (iii) a fully‑connected NRI without action conditioning. Across metrics—including mean positional error, energy preservation, and prediction horizon—the proposed model consistently outperforms the baselines. Crucially, the graph‑based architecture exhibits strong robustness to (a) changes in the number or arrangement of fingers, (b) loss of a subset of key‑points, and (c) permutations of node ordering, thanks to the order‑invariant message‑passing scheme.
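The headline metric, mean positional error, is straightforward: the Euclidean distance between predicted and ground-truth key-points, averaged over nodes and rollout steps. A small sketch (the array layout is an assumption, not taken from the paper):

```python
import numpy as np

def mean_positional_error(pred, true):
    # mean Euclidean distance between predicted and ground-truth key-points,
    # averaged over rollout steps and nodes
    return np.linalg.norm(pred - true, axis=-1).mean()

pred = np.zeros((10, 4, 3))   # (steps, nodes, xyz)
true = np.ones((10, 4, 3))
err = mean_positional_error(pred, true)   # sqrt(3) ~ 1.732 for this toy case
```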
The contributions are threefold: (1) formulation of a sensorimotor learning problem for soft‑hand dynamics as explicit graph inference, (2) integration of action conditioning into both the encoder and decoder, yielding a differentiable dynamics model suitable for model‑based optimal control, and (3) extensive empirical validation demonstrating superior accuracy and resilience to structural perturbations. Limitations include dependence on reliable key‑point extraction (e.g., from markers or vision pipelines) and the computational cost of graph inference for real‑time control on embedded hardware.
In conclusion, the Sensorimotor Graph bridges the gap between unstructured deep learning and physics‑informed modeling for soft robotics. By learning both the connectivity and the dynamics in a unified, differentiable framework, it opens the door to self‑calibrating, model‑based controllers that can adapt to wear, material variability, and unforeseen environmental interactions. Future work should target real‑world hardware deployment, online adaptation, and extensions to handle complex contact dynamics and multi‑object manipulation.