Adaptive Temporal Dynamics for Personalized Emotion Recognition: A Liquid Neural Network Approach
Emotion recognition from physiological signals remains challenging due to their non-stationary, noisy, and subject-dependent characteristics. This work presents, to the best of our knowledge, the first comprehensive application of liquid neural networks for EEG-based emotion recognition. The proposed multimodal framework combines convolutional feature extraction, liquid neural networks with learnable time constants, and attention-guided fusion to model temporal EEG dynamics together with complementary peripheral physiological and personality features. Dedicated subnetworks are used to process EEG features and auxiliary modalities, and a shared autoencoder-based fusion module is used to learn discriminative latent representations before classification. Subject-dependent experiments conducted on the PhyMER dataset across seven emotional classes achieve an accuracy of 95.45%, surpassing previously reported results. Furthermore, temporal attention analysis provides interpretable insights into emotion-specific temporal relevance, and t-SNE visualizations demonstrate enhanced class separability, highlighting the effectiveness of the proposed approach. Finally, statistical analysis of temporal dynamics confirms that the network self-organizes into distinct functional groups of specialized fast and slow neurons, showing that it independently tunes its learnable time constants and memory dominance to capture complex emotion-related temporal patterns.
💡 Research Summary
This paper introduces a novel multimodal deep‑learning framework for physiological emotion recognition that explicitly addresses the heterogeneous temporal dynamics inherent in EEG and peripheral signals. The authors leverage Liquid Time‑Constant (LTC) networks—continuous‑time neural models whose neurons possess learnable time constants—to automatically adapt to both fast cortical fluctuations (EEG) and slower autonomic responses (EDA, BVP, skin temperature). Each modality is processed by a dedicated subnetwork: a 1‑D CNN extracts spatial‑temporal features from the 14‑channel EEG, while separate CNN‑based pipelines handle the peripheral signals and personality trait vectors.
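The core LTC idea described above can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the discretization (a single Euler step), the weight scales, and the choice of two fixed fast/slow subpopulations are illustrative assumptions. It simply shows how per-neuron time constants let one layer track both rapid (EEG-like) and slow (autonomic-like) dynamics.

```python
import numpy as np

def ltc_step(x, u, tau, W_in, W_rec, b, dt=0.05):
    """One Euler step of a simplified liquid time-constant neuron layer.

    Each neuron relaxes toward an input-driven target at a rate set by
    its own (in the paper, learnable) time constant tau: a small tau
    gives fast, EEG-like dynamics; a large tau gives slow, autonomic-like
    dynamics.
    """
    drive = np.tanh(W_in @ u + W_rec @ x + b)  # input/recurrent drive
    dxdt = (-x + drive) / tau                  # per-neuron time constant
    return x + dt * dxdt

rng = np.random.default_rng(0)
n, m = 8, 4
tau = np.array([0.05] * 4 + [2.0] * 4)  # fast and slow subpopulations (s)
W_in = rng.standard_normal((n, m)) * 0.5
W_rec = rng.standard_normal((n, n)) * 0.1
b = np.zeros(n)

x = np.zeros(n)
for _ in range(100):  # simulate 5 s of a constant input
    x = ltc_step(x, np.ones(m), tau, W_in, W_rec, b)
```

In a trained LTC network, `tau` is a parameter updated by gradient descent alongside the weights, which is what lets the model allocate fast and slow neurons to different signal modalities.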
The outputs of these subnetworks are fused through a shared autoencoder with a regularized bottleneck and an annealed reconstruction loss, which preserves complementary information and prevents modality collapse. A temporal attention mechanism is then applied to the latent representation, assigning higher weights to time steps that are most informative for each emotion class. The final classifier consists of fully‑connected layers followed by a softmax over seven discrete emotions.
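The temporal attention step can be sketched as score-then-softmax pooling over the latent time axis. This is a generic attention-pooling sketch under assumed shapes (a `T x d` latent sequence and a learned scoring vector `w`), not the paper's exact parameterization; the resulting per-step weights are the kind of values one would render as attention heatmaps.

```python
import numpy as np

def softmax(z):
    z = z - z.max()  # numerical stability
    e = np.exp(z)
    return e / e.sum()

def temporal_attention(H, w, b=0.0):
    """Score each time step of a latent sequence H (T x d) with a learned
    vector w, softmax-normalize the scores over time, and return the
    attention-pooled summary plus the per-step weights."""
    scores = H @ w + b       # (T,) one relevance score per time step
    alpha = softmax(scores)  # attention weights, sum to 1 over time
    context = alpha @ H      # (d,) weighted sum of time steps
    return context, alpha

rng = np.random.default_rng(1)
T, d = 20, 6
H = rng.standard_normal((T, d))  # stand-in for the fused latent sequence
w = rng.standard_normal(d)       # stand-in for the learned scoring vector
context, alpha = temporal_attention(H, w)
```

The pooled `context` vector is what a downstream fully-connected classifier would consume, while `alpha` makes the model's temporal focus inspectable per emotion class.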
Experiments are conducted on the PhyMER dataset, which contains synchronized EEG (256 Hz, 14 channels), EDA (4 Hz), BVP (64 Hz), skin temperature (4 Hz), and self‑reported personality traits for 30 participants watching 23 video clips designed to elicit seven basic emotions. A subject‑dependent 10‑fold cross‑validation protocol is used. The proposed model achieves an average accuracy of 95.45% and an average F1‑score of 0.93, surpassing the previous best reported results (≈88%) on the same dataset by a substantial margin.
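The subject-dependent protocol means folds are drawn from one subject's own trials, so train and test data come from the same person. A minimal sketch of such a split is below; the 23-trial count comes from the dataset description above, while the shuffling seed and fold construction are illustrative assumptions, not the authors' exact procedure.

```python
import numpy as np

def kfold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation
    over one subject's trials (subject-dependent protocol)."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_samples)
    folds = np.array_split(idx, k)  # handles uneven fold sizes
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train, test

n_trials = 23  # clips per subject in PhyMER
splits = list(kfold_indices(n_trials, k=10))
```

Cross-subject evaluation would instead hold out entire subjects per fold, which is exactly the untested generalization setting noted in the limitations.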
Ablation studies demonstrate that removing the LTC layer (replacing it with a conventional fixed‑time‑scale recurrent unit) drops accuracy by 3.2 percentage points, while omitting the autoencoder‑based fusion reduces performance by 2.7 points, confirming the contribution of each component. t‑SNE visualizations of the latent space reveal well‑separated clusters for the seven emotions, and temporal attention heatmaps show emotion‑specific patterns: high‑frequency EEG‑driven attention peaks for fear and surprise, and prolonged attention on slow autonomic features for sadness and happiness.
The model is lightweight, containing roughly 1.2 million parameters—about 60 % fewer than typical CNN‑RNN hybrids—and runs in under 4 ms per inference on a modern GPU and under 15 ms on a CPU, making it suitable for real‑time edge deployment.
Limitations include the reliance on subject‑dependent training, which leaves cross‑subject generalization untested, and the inherent subjectivity of emotion labels in PhyMER (some video clips have low inter‑rater agreement). The authors propose future work on domain‑adaptation and meta‑learning to enable subject‑independent transfer, as well as Bayesian treatment of label uncertainty.
In summary, this work demonstrates that liquid neural networks with learnable temporal dynamics, combined with attention‑guided autoencoder fusion, can effectively capture the multi‑scale physiological signatures of emotion, achieving state‑of‑the‑art performance on a challenging multimodal dataset while remaining computationally efficient and interpretable.