Conclusion

We introduced MagNet to discover multi-agent dynamics from sensory observations. We showed that the proposed model can identify the inherent dynamics and predict their evolution. A major advantage of MagNet over the state of the art is that it can be re-tuned online when the relation parameters or physical properties of agents are altered, or the number of agents changes, while the fundamental laws remain the same. This capability makes MagNet employable in real scenarios, where these relation parameters and physical properties often change and may not be directly observable.

One limitation of the current model is that it weights different interaction terms linearly with relational attributes or physical parameters. This assumption may not hold in many cases, and we would like to address this shortcoming in future work. Moreover, we plan to extend MagNet such that online tuning can reduce error even when the core dynamics change over time. Exploring MagNet to learn the dynamics of agents controlled by external inputs to achieve some goals will be an important extension as well.

Results

Visualization of evolution up to 200 timesteps. (a): Trajectory plot of the point-mass system with four (4) agents. Widths of the trajectories are proportional to the masses of the corresponding agents. Predictions are from a network trained from scratch. (b): Trajectory plot of the point-mass system with eight (8) agents. Predictions are from a re-tuned wrapper network preceded by a frozen core network trained with 4 agents. (c): Trajectory plot of the predator-swarm system with twenty (20) prey and one (1) predator. The wider red trajectory corresponds to the predator. Predictions are from a network trained from scratch.

Visualization of evolution up to 200 timesteps. Phases (0 to 2π) over timesteps for eight (8) oscillating agents that abide by the Kuramoto model. Predictions are from a network trained from scratch.

All results are generated as solutions to an initial value problem, i.e., the evolution of the system is predicted only from an initial observation; no intermediate observation is used. We use the mean squared error (MSE) between ground truth and prediction through timesteps as the evaluation metric. Fifty ($`50`$) test sequences are used to generate the MSE plots, with error bars showing $`95\%`$ confidence intervals. Visual evolution of ground truth and prediction is shown in Figure 1 and Figure 2.
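The evaluation metric described above can be sketched as follows. This is a minimal illustration, not the evaluation code used for the reported plots: the array layout `(sequences, timesteps, agents, state_dim)`, the function names, and the normal-approximation confidence interval (1.96 standard errors) are all assumptions made for the example.

```python
import numpy as np

def mse_per_timestep(truth, pred):
    """Per-sequence MSE at every timestep.

    truth, pred: arrays of shape (num_sequences, num_timesteps, num_agents, state_dim).
    Returns an array of shape (num_sequences, num_timesteps).
    """
    sq_err = (np.asarray(truth) - np.asarray(pred)) ** 2
    # Average over agents and state dimensions, keep sequences and timesteps.
    return sq_err.mean(axis=(2, 3))

def mean_and_ci95(per_seq_mse):
    """Mean over test sequences and half-width of a 95% confidence interval.

    Assumption: normal approximation, half-width = 1.96 * standard error.
    """
    n = per_seq_mse.shape[0]
    mean = per_seq_mse.mean(axis=0)
    sem = per_seq_mse.std(axis=0, ddof=1) / np.sqrt(n)
    return mean, 1.96 * sem

# Synthetic stand-in data: 50 sequences, 200 timesteps, 4 agents, 2-D state.
rng = np.random.default_rng(0)
truth = rng.normal(size=(50, 200, 4, 2))
pred = truth + 0.01 * rng.normal(size=truth.shape)
mean_mse, ci_half_width = mean_and_ci95(mse_per_timestep(truth, pred))
```

Plotting `mean_mse` against the timestep index with `ci_half_width` as error bars would give a curve of the same kind as the MSE plots, under the stated assumptions.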

(a): MSE between ground truth positions and predicted positions of agents of the considered point-mass system. ‘MLP’ denotes an MLP trained with the same amount of data as MagNet, whereas ‘MLP 10X data’ denotes an MLP trained with 10X more data and 10X more steps. (b): MSE between ground truth phases and predicted phases of oscillating agents in the Kuramoto model.

Comparison with IN, which takes physical and relational attributes as input. (a): MSE between ground truth positions and predicted positions of agents of the considered point-mass system. (b): MSE between ground truth phases and predicted phases of oscillating agents in the Kuramoto model. (c): MSE between ground truth positions and predicted positions of agents of the considered predator-swarm system.

Learning and prediction from direct and clean observations

We consider four ($`4`$) interacting objects with different masses and different pairwise spring constants for the point-mass system. MagNet can predict the evolution of state codes for a long period of time with negligible error if it is trained with perfect observations (no noise). Figure 3 shows the MSE between ground truth and prediction over timesteps for MagNet along with all baselines for the point-mass system and the Kuramoto model. As shown in Figure 3(a), even when the baseline MLP is trained with more data (we use 10X more data and 10X more steps than for MagNet), its MSE is higher than that of MagNet. Note that the baseline MLP is not scalable with the number of agents; hence, its data requirement would increase exponentially with the number of agents. Accordingly, training the MLP or LSTM baseline for predator-swarm dynamics with twenty-one (21) agents is intractable and is therefore not considered for comparison.

Comparison with interaction network

IN requires physical and relational attributes of the agents as input along with their observable states. Therefore, IN is trained and evaluated assuming the physical and relational attributes of agents are known. In contrast, our model is trained and evaluated using only the observable states. The size of the implemented IN is chosen to have a parameter count similar to that of our model. Figure 4 shows the performance comparison between our model and IN. Our model shows performance comparable to IN (better for the point-mass system), even though IN has access to the physical and relational attributes of agents.

Learning and prediction from noisy observations

While evaluating the model on test sequences, we use the initial 16 observations to denoise the derivatives (velocities) using total-variation regularization. Figure 5(a) shows the MSE over timesteps for the model trained with noisy observations. As expected, when dynamics are learned from noisy observations, the window of accurate prediction becomes shorter than with perfect observations. However, we observe that the MSE of the network trained with noisy observations remains within a 10X margin of the network trained with clean observations up to 100 timesteps.

(a): MSE with ground truth positions for MagNet trained with noisy observations of the point-mass system. (b): MSE before and after re-tuning in a scenario with a different number of agents from the training scenarios. Re-tuning loss is shown in the inset.

Performance of re-tuning

In this experiment, we increase the number of agents for the point-mass system to eight (8) and change the spring constants between agent pairs and the masses of the agents. We seek to predict the evolution of this eight-agent system using the MagNet trained with four (4) agents. Agent-wise wrapper weights are initialized with the average values of the pre-trained wrapper weights across all agents. Figure 5(b) shows that the prediction error increases with time and, once it crosses a threshold, re-tuning of the wrapper starts (the core is kept frozen). We observe that after re-tuning with 10000 observations, the prediction error for the eight-agent system is reduced (Figure 5(b)). This experiment demonstrates the generalization capability of the core network within MagNet.
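The two mechanical steps in this experiment, initializing the new agents' wrapper weights from the pre-trained average and detecting when the prediction error crosses the threshold, can be sketched as below. This is only an illustration: the weight layout `(agents, params)`, the function names, and all numeric values are hypothetical, and the gradient-based re-tuning of the wrapper itself is not shown.

```python
import numpy as np

def init_wrapper_weights(pretrained, num_new_agents):
    """Give every new agent the average of the pre-trained agent-wise weights.

    pretrained: array of shape (num_old_agents, num_params) (assumed layout).
    Returns an array of shape (num_new_agents, num_params).
    """
    avg = np.asarray(pretrained, dtype=float).mean(axis=0, keepdims=True)
    return np.repeat(avg, num_new_agents, axis=0)

def first_crossing(errors, threshold):
    """Index of the first timestep whose error exceeds the threshold, or None."""
    idx = np.flatnonzero(np.asarray(errors) > threshold)
    return int(idx[0]) if idx.size else None

# Hypothetical numbers, for illustration only.
pretrained = np.array([[1.0, 3.0], [3.0, 5.0], [2.0, 4.0], [2.0, 4.0]])
new_weights = init_wrapper_weights(pretrained, num_new_agents=8)
retune_start = first_crossing([0.1, 0.2, 0.6, 0.4], threshold=0.5)
```

In this sketch, re-tuning would begin at timestep `retune_start`, updating only `new_weights` while the core parameters stay fixed.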

Introduction

Multi-agent systems are prevalent in both the natural and the engineered world. Engineered distributed systems of mobile robots, multiple sensors, unmanned aerial vehicles, etc. often take inspiration from natural multi-agent systems like swarms, schools, flocks, and herds of social animals or birds. Understanding the behavior of such natural or engineered multi-agent systems from sensory observations is a key challenge in robotics from both the design and the adversarial perspective. Discovering the hidden dynamics of a multi-agent interaction from observations will enable machines to simulate and predict the evolution of complex systems.

Research in the field of data-driven dynamics learning can be divided into two main categories. The first assumes well-known equations of the physical system and estimates their parameters from observation data. However, many complex systems are difficult to represent solely by a fixed model. The alternative (and arguably more compelling) approach is to identify an approximate representation of the actual model using machine learning techniques like regression or neural networks. As an important step in this direction, Battaglia et al. presented interaction networks (INs) to learn multi-agent interaction by coupling machine learning with structured models. Watters et al. improved IN to learn multi-agent interactions from visual observations. However, IN requires the object relation graph as an explicit input, and relation graphs are often unknown in a real scenario. Moreover, the input state vector to IN can include physical properties, like an agent’s mass, which may not be directly observable. Chang et al. proposed a similar model to predict bouncing ball dynamics. Their model does not require the object relation graph as input and can predict the mass of the involved agents; however, they did not demonstrate its ability to predict the evolution of dynamics with pairwise interaction forces among agents. Finally, these models generalize to any number of agents only when the physical properties of agents and the pairwise interaction parameters remain uniform or are explicitly given as input, and they do not allow online learning or re-tuning with less data in similar scenarios with different physical properties and different interaction parameters.

(a): Multi-agent network with four agents. The state dynamics of each agent depend on itself and on the other agents. (b): Training, online re-tuning, and prediction modes of MagNet. Black arrows belong to all three modes. Red arrows are activated in training and re-tuning modes, whereas green arrows operate only in prediction mode.

In this paper, we introduce MagNet (Multi-agent interaction Network), which can discover interaction dynamics and predict the evolution of a complex multi-agent system with heterogeneous relational attributes and physical properties solely from observational data. The foundation of MagNet is the formulation of a multi-agent system as a coupled non-linear network where agents are assumed to be connected to each other through generic ordinary differential equation (ODE) based state evolution dynamics. The formulation is inspired by a wide range of multi-agent systems, ranging from objects interacting by virtue of fundamental laws of physics to swarm systems and opinion dynamics under social interaction. MagNet discovers the dynamics of a multi-agent system by learning the “customization” of the generic ODE that minimizes the error between prediction and sensory observation. MagNet does not require a relational graph or non-observable parameters as input; rather, it is inherently capable of learning the relationships among agents from observations and, owing to the preceding formulation, the agent-specific parameters of the “customization” can be learned online. The paper makes the following key contributions in discovering multi-agent dynamics from observations.

We develop a neural network based realization of the time-discretized model of the coupled non-linear network representing multi-agent dynamics that can be trained using stochastic gradient descent (SGD) based backpropagation. The model is trained for single time-step prediction; long term prediction is performed through iterative single-step prediction.
MagNet supports continuous learning to accurately predict state evolution even if the relational attributes (e.g. interaction coefficients among agents), the physical properties of agents (e.g. mass), or the number of agents change, as long as the fundamental interaction remains the same. This is enabled by structuring MagNet as two back-to-back networks: a core network to model/learn the fundamental multi-agent dynamics, and a reduced-complexity wrapper network to learn the agent-specific parameters. The entire network is first trained as a single entity. During operation, the core network is kept frozen, but the wrapper network is re-tuned once the prediction error crosses a threshold (Figure 6(b)).
We demonstrate the application of MagNet for learning/predicting dynamics from direct, as well as noisy, observations of states.
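The first contribution above, long-term prediction through iterated single-step prediction, can be sketched as follows. This is a generic rollout loop rather than the MagNet implementation: `step_fn` stands in for any trained single-step predictor, and the toy step function used in the example is a hypothetical placeholder.

```python
import numpy as np

def rollout(step_fn, x0, num_steps):
    """Predict a trajectory from a single initial observation.

    step_fn: maps the state at timestep t to the predicted state at t + 1
             (placeholder for a trained single-step model).
    x0: initial state, e.g. an array of shape (num_agents, state_dim).
    Returns an array with num_steps + 1 states, including x0.
    """
    states = [np.asarray(x0, dtype=float)]
    for _ in range(num_steps):
        # Each prediction is fed back as the next input; no intermediate
        # observation is used after x0.
        states.append(step_fn(states[-1]))
    return np.stack(states)

# Toy stand-in for a learned step: halve every state entry each step.
trajectory = rollout(lambda x: 0.5 * x, np.array([8.0]), num_steps=3)
```

Because each step consumes the previous prediction, single-step errors accumulate over the rollout, which is consistent with the MSE growing over timesteps in the reported results.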