Cross-Session Decoding of Neural Spiking Data via Task-Conditioned Latent Alignment
Cross-session nonstationarity in neural activity recorded by implanted electrodes is a major challenge for invasive brain-computer interfaces (BCIs), as decoders trained on data from one session often fail to generalize to subsequent sessions. The problem is exacerbated in practice, where retraining or adapting decoders is particularly difficult when only limited data are available from a new session. To address this challenge, we propose a Task-Conditioned Latent Alignment (TCLA) framework for cross-session neural decoding. Building upon an autoencoder architecture, TCLA first learns a low-dimensional representation of neural dynamics from a source session with sufficient data. For target sessions with limited data, TCLA then aligns the target latent representations to the source in a task-conditioned manner, enabling effective transfer of the learned neural dynamics. We evaluate TCLA on macaque motor and oculomotor center-out datasets. Compared to baseline methods trained solely on target-session data, TCLA consistently improves decoding performance across datasets and decoding settings, with gains in the coefficient of determination of up to 0.386 for y-coordinate velocity decoding in a motor dataset. These results suggest that TCLA provides an effective strategy for transferring knowledge from source to target sessions, enabling more robust neural decoding under limited-data conditions.
💡 Research Summary
The paper tackles a fundamental obstacle in invasive brain‑computer interfaces (BCIs): the non‑stationarity of neural recordings across recording sessions. Traditional approaches either retrain decoders on new session data or employ adaptive algorithms, but both strategies falter when only a small amount of data is available from the new session. To overcome this limitation, the authors propose a Task‑Conditioned Latent Alignment (TCLA) framework that leverages a well‑sampled source session to learn a compact latent representation of neural dynamics and then aligns limited‑data target sessions to this representation in a task‑conditioned manner.
The architecture builds on the autoencoder component of the LDNS diffusion model. A shared encoder‑decoder pair learns a low‑dimensional latent trajectory (z) from binned spike counts (x). To accommodate sessions with different numbers of recorded channels, each session is equipped with a pair of 1×1 convolutional “read‑in” and “read‑out” layers that map raw spikes to a fixed‑size embedding before the shared encoder and back after the shared decoder. In the first training stage, the source session’s data are used to jointly optimize the shared autoencoder and its own read‑in/read‑out layers. The loss combines a Poisson negative log‑likelihood for spike reconstruction, an L2 penalty on latent magnitudes (weighted by β₁), and a temporal smoothness term that penalizes rapid changes across adjacent time bins (weighted by β₂).
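The stage‑1 objective described above can be sketched in a few lines of NumPy. This is an illustrative reconstruction under stated assumptions, not the authors' code: the function name, the β values, and the ε stabilizer are placeholders, and the Poisson term drops the constant log(x!) factor, which does not affect optimization.

```python
import numpy as np

def stage1_loss(spikes, rates, z, beta1=1e-3, beta2=1e-3):
    """Sketch of the stage-1 training loss: Poisson NLL on the
    reconstructed firing rates, an L2 penalty on latent magnitudes,
    and a temporal smoothness penalty across adjacent time bins.

    spikes: (T, C) binned spike counts x
    rates:  (T, C) reconstructed firing rates (positive)
    z:      (T, D) latent trajectory
    beta1, beta2: illustrative weights (the paper's values are not given here)
    """
    # Poisson negative log-likelihood, omitting the constant log(x!) term
    nll = np.mean(rates - spikes * np.log(rates + 1e-8))
    # L2 penalty on latent magnitudes
    latent_l2 = beta1 * np.mean(z ** 2)
    # Temporal smoothness: penalize rapid changes between adjacent bins
    smoothness = beta2 * np.mean(np.diff(z, axis=0) ** 2)
    return nll + latent_l2 + smoothness
```

Because the per-bin Poisson term r − x·log r is minimized at r = x, the loss is lowest when the reconstructed rates track the observed counts, while the two regularizers keep the latent trajectory small and slowly varying.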
In the second stage, the shared autoencoder is frozen, and only the target session’s read‑in/read‑out layers are trained. Crucially, the latent trajectories are grouped by task condition (e.g., movement direction), and for each condition a multi‑kernel Maximum Mean Discrepancy (MMD) loss is computed between the source and target latent distributions. The multi‑kernel MMD uses a set of Gaussian kernels with bandwidths spanning several scales, allowing the alignment to capture both fine‑grained and coarse‑grained distributional differences. The total loss for the target session is the sum of the Poisson reconstruction loss and the condition‑specific MMD loss weighted by β₃.
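A minimal NumPy sketch of the condition‑wise multi‑kernel MMD loss described above. The kernel bandwidths and function names are placeholders (the paper's exact kernel set is not reproduced here), and this uses the standard biased MMD² estimator:

```python
import numpy as np

def multi_kernel_mmd(src, tgt, bandwidths=(0.25, 0.5, 1.0, 2.0, 4.0)):
    """Biased multi-kernel MMD^2 between two sets of latent vectors,
    summing Gaussian kernels over several bandwidth scales so both
    fine- and coarse-grained distributional differences are captured."""
    def kernel_mean(a, b):
        # pairwise squared Euclidean distances
        d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
        return sum(np.exp(-d2 / (2.0 * s ** 2)).mean() for s in bandwidths)
    return kernel_mean(src, src) + kernel_mean(tgt, tgt) - 2.0 * kernel_mean(src, tgt)

def conditioned_mmd(src_z, src_cond, tgt_z, tgt_cond):
    """Task-conditioned alignment loss: average the MMD over latent
    distributions grouped by shared task condition (e.g. reach direction)."""
    conds = np.intersect1d(np.unique(src_cond), np.unique(tgt_cond))
    vals = [multi_kernel_mmd(src_z[src_cond == c], tgt_z[tgt_cond == c])
            for c in conds]
    return float(np.mean(vals))
```

In the paper's stage 2, a term like `conditioned_mmd(...)` weighted by β₃ would be added to the target session's Poisson reconstruction loss, with gradients flowing only into the target read‑in/read‑out layers since the shared autoencoder is frozen.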
The authors evaluate TCLA on three non‑human primate datasets: two motor center‑out reaching tasks (MOTORCO1 and MOTORCO2) recorded from 96‑channel Utah arrays in primary motor cortex, and one oculomotor center‑out task (OCULOCO) recorded from the frontal eye fields and middle temporal area. Each dataset contains multiple sessions; one session is designated as the source, while the remaining sessions serve as targets. To simulate limited‑data scenarios, each target session is split into 10 % training, 10 % validation, and 80 % testing. After latent alignment, a downstream Long Short‑Term Memory (LSTM) decoder is trained on the inferred firing rates to predict 2‑D kinematics (position and velocity). Decoding performance is quantified by the coefficient of determination (R²), and results are reported as bootstrap means with 95 % confidence intervals across target sessions.
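The reported metrics can be sketched as follows: a generic R² and a percentile‑bootstrap 95 % CI over per‑session scores. This is a standard formulation, not the authors' exact evaluation code, and the resampling count is an arbitrary choice.

```python
import numpy as np

def r2_score(y_true, y_pred):
    """Coefficient of determination for one kinematic dimension:
    1 - (residual sum of squares / total sum of squares)."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1.0 - ss_res / ss_tot

def bootstrap_mean_ci(values, n_boot=10000, seed=0):
    """Mean and percentile-bootstrap 95% CI across per-session R^2 values."""
    rng = np.random.default_rng(seed)
    values = np.asarray(values, dtype=float)
    # resample sessions with replacement and take the mean of each resample
    boot_means = rng.choice(values, size=(n_boot, len(values))).mean(axis=1)
    return values.mean(), np.percentile(boot_means, [2.5, 97.5])
```

R² equals 1 for perfect predictions and 0 for a predictor that always outputs the mean of the targets, which makes the cross‑method gains in the next paragraph directly comparable across kinematic dimensions.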
Visualization with t‑SNE shows that, without cross‑session alignment (LDNS‑within‑session), latent trajectories from different sessions form distinct clusters, reflecting substantial inter‑session mismatch. In contrast, TCLA projects all target sessions into a common cluster that overlaps with the source, indicating successful alignment of the latent manifolds. Quantitatively, TCLA consistently outperforms two baselines: AutoLF‑ADS (a latent‑variable decoder trained independently on each target session) and LDNS‑within‑session (same autoencoder architecture but no alignment). Across all datasets and both position and velocity decoding, TCLA yields statistically significant R² improvements ranging from 0.03 to 0.38 (p < 10⁻⁴–10⁻⁵, Wilcoxon signed‑rank test). Notably, the gains are larger for kinematic dimensions where baseline performance is weaker (e.g., the x‑coordinate in the motor tasks and the y‑coordinate in the oculomotor task), suggesting that TCLA effectively compensates for the most challenging decoding directions.
The study’s contributions are threefold: (1) introducing a task‑conditioned latent alignment mechanism that directly addresses cross‑session non‑stationarity; (2) demonstrating a flexible architecture that can handle sessions with varying channel counts via session‑specific read‑in/read‑out layers; and (3) showing that even with a small fraction of target‑session data, knowledge transferred from a well‑sampled source session can substantially boost decoding accuracy. The authors discuss future extensions such as online MMD‑based adaptation for real‑time BCI use, incorporation of more complex task variables, exploration of alternative kernel families or neural‑network‑based discrepancy measures, and validation on human clinical datasets.
In summary, TCLA provides a principled and effective solution for cross‑session neural decoding under limited‑data constraints, paving the way for more robust and data‑efficient invasive BCI systems.