Title: Dynamical modeling of nonlinear latent factors in multiscale neural activity with real-time inference
ArXiv ID: 2512.12462
Date: 2025-12-13
Authors: Eray Erturk, Maryam M. Shanechi
📝 Abstract
Real-time decoding of target variables from multiple simultaneously recorded neural time-series modalities, such as discrete spiking activity and continuous field potentials, is important across various neuroscience applications. However, a major challenge for doing so is that different neural modalities can have different timescales (i.e., sampling rates) and different probabilistic distributions, or can even be missing at some time-steps. Existing nonlinear models of multimodal neural activity do not address different timescales or missing samples across modalities. Further, some of these models do not allow for real-time decoding. Here, we develop a learning framework that can enable real-time recursive decoding while nonlinearly aggregating information across multiple modalities with different timescales and distributions and with missing samples. This framework consists of 1) a multiscale encoder that nonlinearly aggregates information after learning within-modality dynamics to handle different timescales and missing samples in real time, 2) a multiscale dynamical backbone that extracts multimodal temporal dynamics and enables real-time recursive decoding, and 3) modality-specific decoders to account for different probabilistic distributions across modalities. In both simulations and three distinct multiscale brain datasets, we show that our model can aggregate information across modalities with different timescales and distributions and missing samples to improve real-time target decoding. Further, our method outperforms various linear and nonlinear multimodal benchmarks in doing so.
📄 Full Content
Dynamical modeling of nonlinear latent factors in
multiscale neural activity with real-time inference
Eray Erturk
Ming Hsieh Department of Electrical and Computer Engineering
University of Southern California
Los Angeles, CA
eerturk@usc.edu
Maryam M. Shanechi (corresponding author)
Ming Hsieh Department of Electrical and Computer Engineering
Thomas Lord Department of Computer Science
Alfred E. Mann Department of Biomedical Engineering
Neuroscience Graduate Program
University of Southern California
Los Angeles, CA
shanechi@usc.edu
Abstract
Real-time decoding of target variables from multiple simultaneously recorded
neural time-series modalities, such as discrete spiking activity and continuous
field potentials, is important across various neuroscience applications. However, a
major challenge for doing so is that different neural modalities can have different
timescales (i.e., sampling rates) and different probabilistic distributions, or can
even be missing at some time-steps. Existing nonlinear models of multimodal
neural activity do not address different timescales or missing samples across modal-
ities. Further, some of these models do not allow for real-time decoding. Here,
we develop a learning framework that can enable real-time recursive decoding
while nonlinearly aggregating information across multiple modalities with different
timescales and distributions and with missing samples. This framework consists
of 1) a multiscale encoder that nonlinearly aggregates information after learning
within-modality dynamics to handle different timescales and missing samples in
real time, 2) a multiscale dynamical backbone that extracts multimodal temporal
dynamics and enables real-time recursive decoding, and 3) modality-specific de-
coders to account for different probabilistic distributions across modalities. In both
simulations and three distinct multiscale brain datasets, we show that our model can
aggregate information across modalities with different timescales and distributions
and missing samples to improve real-time target decoding. Further, our method
outperforms various linear and nonlinear multimodal benchmarks in doing so.
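As a rough mental model of the three components listed above, the sketch below wires them together in PyTorch. This is our own minimal illustration under stated assumptions: the class, method, and dimension names (`MultiscaleModel`, `step`, `hidden_dim`, etc.) are hypothetical, and the paper's actual layer types, fusion rule, and inference scheme may differ.

```python
import torch
import torch.nn as nn

class MultiscaleModel(nn.Module):
    """Hypothetical skeleton of the three components described in the abstract."""

    def __init__(self, spike_dim, lfp_dim, latent_dim=16, hidden_dim=64):
        super().__init__()
        # 1) Multiscale encoder: one recurrent cell per modality learns
        #    within-modality dynamics, so a slower or missing modality can be
        #    carried forward between its observed samples.
        self.spike_enc = nn.GRUCell(spike_dim, hidden_dim)
        self.lfp_enc = nn.GRUCell(lfp_dim, hidden_dim)
        self.fuse = nn.Sequential(nn.Linear(2 * hidden_dim, latent_dim), nn.Tanh())
        # 2) Multiscale dynamical backbone: recursive latent update that
        #    enables causal, one-step-at-a-time decoding.
        self.backbone = nn.GRUCell(latent_dim, latent_dim)
        # 3) Modality-specific decoders: map the latent to each modality's
        #    distribution parameters (e.g., log-rate for Poisson spikes,
        #    mean for Gaussian field features).
        self.spike_dec = nn.Linear(latent_dim, spike_dim)
        self.lfp_dec = nn.Linear(latent_dim, lfp_dim)

    def step(self, y_spk, y_lfp, h_spk, h_lfp, h):
        """One real-time update; y_lfp=None marks a missing LFP sample."""
        h_spk = self.spike_enc(y_spk, h_spk)
        if y_lfp is not None:                    # skip update when unobserved,
            h_lfp = self.lfp_enc(y_lfp, h_lfp)   # carrying h_lfp forward
        h = self.backbone(self.fuse(torch.cat([h_spk, h_lfp], dim=-1)), h)
        return h, h_spk, h_lfp, self.spike_dec(h), self.lfp_dec(h)
```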
1 Introduction
Real-time continuous decoding of target time-series, such as various brain or behavioral states, from neural time-series data is of interest across many neuroscience applications. A popular approach
for doing so is to develop dynamical latent factor models that describe neural dynamics in terms
of the temporal evolution of latent variables that can be used for downstream decoding. To date,
dynamical latent factor models of neural data have mostly focused on a single modality of neural
data, for example, either spiking activity or local field potentials (LFP) [1–4]. However, brain and
behavioral target states are encoded across multiple spatial and temporal scales of brain activity
that are measured with different neural modalities. Furthermore, some of these dynamical models
have a non-causal inference procedure, which hinders real-time decoding. Therefore, inference of
target variables could be improved by developing nonlinear dynamical models of multimodal neural
time-series that can, at each time-step, aggregate information across neural modalities and do so in
real time.
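Concretely, recursive (causal) inference means the latent state is updated one step at a time from past and current observations only, so a target estimate is available the moment each sample arrives. Below is a toy sketch of this loop in PyTorch, with all names and sizes purely illustrative:

```python
import torch
import torch.nn as nn

f = nn.GRUCell(input_size=8, hidden_size=4)  # latent update x_t = f(y_t, x_{t-1})
g = nn.Linear(4, 2)                          # read-out from latent to target

x = torch.zeros(1, 4)                        # initial latent state
for y_t in torch.randn(100, 1, 8):           # streaming neural observations
    x = f(y_t, x)                            # causal: uses data only up to time t
    z_t = g(x)                               # real-time target estimate at time t
```

A non-causal (smoothing) procedure would instead condition each estimate on the entire recorded sequence, which is why it cannot run in real time.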
A natural challenge in developing such multimodal models arises when modalities are not aligned because they are recorded at different timescales. This misalignment can stem from several factors: fundamental biological differences across modalities, with some modalities evolving more slowly than others [5]; differences in recording devices [6, 7]; or measurement failures and interruptions [8–10]. Further,
modalities could have different distributions. For example, spiking activity is a binary-valued time-series that indicates the presence of action potential events from neurons at each time-step. As such, it has a fast millisecond timescale and is often modeled as a count process, such as a Poisson process. In comparison,
LFP activity is a continuous-valued modality that measures network-level neural processes, has
a slower timescale, and is typically modeled with a Gaussian distribution [5, 7, 11]. We refer to
multimodal data with different timescales as multiscale data. Thus, to fuse information across
spiking and LFP modalities and improve downstream target decoding, their dynamics should
be modeled by incorporating their cross-modality probabilistic and timescale differences through a
careful encoder design.
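One way such a design could account for these differences is to score each modality under its own likelihood and mask out time-steps where a modality is unobserved. The sketch below is an assumption for illustration (PyTorch, with hypothetical function and argument names), not necessarily the paper's objective:

```python
import torch

def multiscale_nll(spike_lograte, spikes, mask_spk,
                   lfp_mean, lfp_logvar, lfp, mask_lfp):
    """Masked, modality-specific negative log-likelihood (hypothetical)."""
    # Poisson likelihood for count-valued spiking activity (fast timescale).
    pois = torch.distributions.Poisson(rate=spike_lograte.exp())
    nll_spk = -(pois.log_prob(spikes) * mask_spk.unsqueeze(-1)).sum()
    # Gaussian likelihood for continuous LFP features (slower timescale).
    gauss = torch.distributions.Normal(lfp_mean, (0.5 * lfp_logvar).exp())
    nll_lfp = -(gauss.log_prob(lfp) * mask_lfp.unsqueeze(-1)).sum()
    # Masks are 1 where a modality was actually observed at a time-step, so
    # slower or intermittently missing modalities contribute only when sampled.
    return nll_spk + nll_lfp
```

Under this masking scheme, a modality sampled at half the rate of another simply carries zeros in its mask at the intermediate time-steps, and dropped samples are handled the same way.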
Existing neural dynamical modeling approaches do not address the nonlinear modeling of multimodal
data with different timescales and/or with real-time decoding capability. Specifically, most dynamical
models do not capture multimodal neural dynamics and instead focus on a single modality of neural
activity either by using linear/switching-linear approaches [1, 12, 13] or by utilizing no