Copilot-Assisted Second-Thought Framework for Brain-to-Robot Hand Motion Decoding
Authors: Yizhe Li, Shixiao Wang, Jian K. Liu
Yizhe Li† (University of Birmingham, yxl1897@alumni.bham.ac.uk), Shixiao Wang† (University of Birmingham, sxw1238@student.bham.ac.uk), Jian K. Liu (University of Birmingham, j.liu.22@bham.ac.uk)

Abstract—Motor kinematics prediction (MKP) from electroencephalography (EEG) is a key research area for developing movement-related brain–computer interfaces (BCIs). While traditional methods often rely on CNNs or RNNs, Transformers have demonstrated superior capabilities for modeling long sequential EEG data. In this study, we propose a CNN–attention hybrid model for decoding hand kinematics from EEG during grasp-and-lift tasks, achieving superb performance in within-subject experiments. We further extend this approach to EEG–EMG multimodal decoding, yielding significantly improved performance. Within-subject tests achieve PCC values of 0.9854, 0.9946, and 0.9065 (X, Y, Z), respectively, computed on the midpoint trajectory between the thumb and index finger, while cross-subject tests result in 0.9643, 0.9795, and 0.5852. The decoded trajectories from both modalities were used to control a Franka Panda robotic arm in a MuJoCo simulation. To enhance trajectory fidelity, we introduce a copilot framework that filters low-confidence decoded points using a motion-state-aware critic within a finite-state machine. This post-processing step improves the overall within-subject PCC of EEG-only decoding to 0.93 while excluding fewer than 20% of the data points. The code is available at https://github.com/ssshiFang/EEGDecoding Robotarm.

Index Terms—BCI, EEG, EMG, copilot, hybrid modality, motor kinematics prediction, CNN, transformer, robot arm

I. INTRODUCTION

Electroencephalography (EEG) is a fundamental tool in neuroscience, offering the potential for controlling external devices directly through brain activity, bypassing the neuromuscular pathway [1].
This concept is realized through brain–computer interfaces (BCIs). The central nervous system (CNS) typically generates movement in response to external stimuli, acting on effectors such as muscles or glands [2], [3]. BCIs introduce an alternative pathway for this interaction by leveraging digital technologies. They can utilize signals from task-related brain regions to enhance, monitor, or even partially replace traditional CNS–effector communication [4]. While BCIs can theoretically use any brain-derived signal as input, practical limitations exist, such as random mental activity or external noise. Designing an effective BCI paradigm is therefore critical. By using various types of stimuli and tasks, a mapping can be established between specific inputs and corresponding brain activity patterns [5]. Current BCI systems are primarily built on paradigms like motor imagery, event-related potentials, and steady-state visually evoked potentials, all of which heavily rely on non-invasive EEG for data acquisition [6].

† These authors contributed equally to this work.

Decoding algorithms are another crucial component, translating measured brain signals into executable control commands. However, this step is particularly challenging due to the inherent properties of EEG. Scalp EEG represents a summation of currents from multiple local field potentials, reflecting broad brain activity [7], [8]. Furthermore, interference between current sources complicates spatial localization, and signal transmission through various tissue layers results in a very low signal-to-noise ratio. The advent of deep learning models, such as Shallow ConvNet and EEGNet, has demonstrated superiority over traditional handcrafted feature methods [9]–[11]. Transformers have further shown powerful capabilities in handling long sequences, overcoming limitations of CNNs and RNNs [12]–[14].
Consequently, hybrid models combining these architectures are increasingly applied to classification tasks [15], [16]. Despite these advances, most kinematic trajectory prediction tasks using movement-related EEG, beyond the core paradigm classifications, still rely on filter-based, linear, or CNN/RNN approaches [17] and continue to struggle with decoding accuracy [18].

This work proposes a CNN–attention hybrid model for EEG-based hand kinematics reconstruction. We further investigate EEG–EMG multimodal decoding, reconstruct motion trajectories in a MuJoCo robotic arm simulation, and design a copilot algorithm to refine decoding points through secondary evaluation.

II. RELATED WORK

EEG-based kinematics decoding aims to reconstruct continuous limb movement trajectories (e.g., position, velocity) from non-invasive EEG signals. Prior studies have targeted various body parts, including ankle plantar flexion [19], hand kinematics [20], finger movements [21], and even saccadic eye movements [22]. These tasks are closely linked to the temporal–spectral dynamics of EEG and have been explored in both motor imagery and motor execution paradigms [23], [24]. Among these, hand-movement decoding is the most extensively studied.

Early work often assumed EEG lacked sufficient information for complex hand movement representation, a notion later challenged by [25]. [26] further elucidated the role of contralateral motor cortical regions in upper-limb movement control. These studies laid the foundation for hand-kinematics decoding and advanced understanding beyond simple single-region activation models.

Deep learning models have demonstrated superior performance in capturing nonlinear relationships between EEG features and motor parameters, especially when combined with time–frequency or spatial–spectral preprocessing. A CNN–LSTM model decoded hand kinematics from cortical sources with a mean correlation value (CV) of 0.62±0.08 [20].
For other body parts, biceps curl trajectory estimation reached a PCC of 0.7 [27], and a multi-directional CNN-BiLSTM network for 3D arm tasks achieved a grand-averaged correlation coefficient (CC) of 0.47 [28]. A comparison in [29] showed that deep learning models (rEEGNet, rDeepConvNet, rShallowConvNet) significantly outperformed mLR, with mean PCCs reaching 0.78–0.79 for the x/y axes and 0.62–0.64 for the z-axis.

Despite this progress, emerging technologies offer new solutions. Hybrid CNN–transformer architectures have shown strong capabilities in EEG decoding tasks, such as EEGformer [30], EASM [31], EEG-TCNTransformer [32], and EEG-ConvTransformerNetwork [33]. Given that most kinematics decoding models remain CNN- or RNN-based, the success of the AI copilot framework in [34] motivated our exploration of hybrid modeling and trajectory optimization for non-invasive EEG-based hand decoding.

The contributions of this study are as follows: (1) We propose a CNN–attention hybrid model to predict 3D coordinates of the index finger and thumb from EEG. The model achieves an overall PCC of 0.8728 in within-subject experiments (0.8376, 0.9229, 0.7208 for the X, Y, Z axes), reaching up to 0.8916 with a larger input window. (2) We investigate EEG–EMG multimodal fusion for kinematics decoding. The best within-subject PCC is 0.9707 (0.9854, 0.9946, 0.9065 for X, Y, Z), and the best cross-subject total PCC reaches 0.9063 (0.9643, 0.9795, 0.5852 for X, Y, Z). (3) We reconstruct hand movement trajectories in a MuJoCo robotic arm simulator and design a copilot framework based on a state machine and knowledge graph to filter unreliable decoding points.

III. MATERIALS AND METHODS

A. Dataset Description

The WAY-EEG-GAL (Wearable interfaces for hand function recovery, EEG grasp and lift) dataset [35] contains scalp EEG recordings from a grasp-and-lift task designed to decode sensory, intentional, and motor-related signals.
Twelve participants each performed 328 trials, totaling 3,936 trials. Each trial required participants to: (1) reach for a small object upon cue, (2) grasp it with the thumb and index finger, (3) lift and hold it briefly, (4) place it back, and (5) return the hand to the start position. The dataset includes 32-channel EEG, electromyography (EMG) from five muscles, and contact force/torque and 3D position data for the hand (index finger, thumb, wrist) and the object.

Fig. 1: Structure of the decoding model.

B. Data Preprocessing

EEG preprocessing was performed using MNE-Python. Raw data were band-pass filtered between 0.1 and 40 Hz with an infinite impulse response (IIR) filter to remove low-frequency drifts and high-frequency noise. Common average referencing (CAR) was then applied across all electrodes to reduce spatially distributed artifacts and enhance localized neural activity.

EMG preprocessing involved two steps. First, a 4th-order Butterworth band-pass filter (20–450 Hz), implemented in second-order sections (SOS) for stability, removed high-frequency noise and baseline drift. Second, EMG data were downsampled from 4000 Hz to 500 Hz to match the EEG sampling rate.

Finally, kinematic data were scaled via min–max normalization, mapping the processed data to the range [0, 1]:

K[t] = (k[t] − k_min) / (k_max − k_min).

C. Algorithm Structure

1) Model Overview: The model architecture is illustrated in Fig. 1. It consists of five main components: a multi-convolution block, a Squeeze-and-Excitation (SE) block, an embedding block, a self-attention block, and a fully connected block. The input is a preprocessed EEG segment in the form of a 2D matrix (channels × time points).

2) Multi-Convolution Block: The multi-convolution block (Fig. 2) is partially inspired by EEGNet. First, a 1D convolution with a large temporal kernel captures long-range dependencies [36].
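The EMG filtering and kinematic normalization steps above can be sketched with SciPy (the paper uses MNE-Python for the EEG stage; the function names and synthetic data below are ours, for illustration only):

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, decimate

def preprocess_emg(emg, fs=4000, target_fs=500, band=(20.0, 450.0)):
    """4th-order Butterworth band-pass in SOS form (numerically stable),
    applied zero-phase, then downsampled to the EEG sampling rate."""
    sos = butter(4, band, btype="bandpass", fs=fs, output="sos")
    filtered = sosfiltfilt(sos, emg, axis=-1)
    factor = fs // target_fs                 # 4000 Hz -> 500 Hz gives 8
    return decimate(filtered, factor, axis=-1, zero_phase=True)

def minmax_normalize(k):
    """K[t] = (k[t] - k_min) / (k_max - k_min), mapping data into [0, 1]."""
    return (k - k.min()) / (k.max() - k.min())

rng = np.random.default_rng(0)
emg = rng.standard_normal((5, 4000))         # 5 muscles, 1 s at 4 kHz
emg_ds = preprocess_emg(emg)                 # -> (5, 500), matches EEG rate
kin = minmax_normalize(rng.standard_normal(500) * 0.1 + 0.3)
```

Zero-phase filtering (`sosfiltfilt`) is one reasonable choice here because it avoids shifting the EMG relative to the kinematics; the paper does not state which phase convention was used.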
Subsequent layers employ multiple smaller kernels of varying sizes to extract multi-scale temporal features. The 1D convolution is implemented as

y[t] = Σ_{k=0}^{K−1} w[k] · x[t + k],

where y[t] is the output for the t-th element. A pooling layer then simplifies features and reduces computational cost for the subsequent attention mechanism. The average pooling operation is defined as

y_i = (1/k) Σ_{j=0}^{k−1} x_{i·s + j}.

3) Squeeze-and-Excitation (SE) Block: The SE block highlights informative input channels. Following [37], for an input feature map X ∈ R^{B×C×F×T}, the block performs:

Squeeze: z = (1/(F·T)) Σ_{i=1}^{F} Σ_{j=1}^{T} X(i, j),   (1)
Excitation: s = σ(W_2 · δ(W_1 · z)),   (2)
Scale: X̃ = s ⊙ X,   (3)

where σ and δ are activation functions.

Fig. 2: Structure of the multi-convolution block (top). Structure of the self-attention block (bottom).

4) Embedding and Self-Attention Blocks: The embedding block (Fig. 2) abstracts the channel–feature dimension (C × F) into tokens. The attention block then establishes long-range dependencies between these tokens. The attention mechanism is defined as [38]:

Attention(Q, K, V) = softmax(QK^⊤ / √d_k) V.   (4)

A single attention block is used, but a multi-head strategy captures diverse channel relationships:

MultiHead(Q, K, V) = Concat(head_1, …, head_h) W^O,   (5)
head_i = Attention(QW_i^Q, KW_i^K, VW_i^V).   (6)

5) Fully Connected Layer: This module processes the attention output Z ∈ R^{T×D}. Global average pooling is applied over the T dimension, followed by layer normalization and two fully connected layers to produce the final output.

D. Modality Fusion

As shown in Fig. 1, EMG signals undergo the same multi-convolution and SE block processing as EEG. Fusion of EEG and EMG signals occurs within the embedding module.

Fig. 3: Structure of the copilot and decoding point filtering workflow.
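Equations (1)–(4) can be sketched in plain NumPy for a single sample. The paper only states that σ and δ are activation functions; choosing sigmoid and ReLU, and a channel-reduction ratio of 4, follows the original SE-Net paper [37] and is an assumption here:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def se_block(X, W1, W2):
    """Squeeze-and-Excitation (Eqs. 1-3) for one sample X of shape (C, F, T)."""
    z = X.mean(axis=(1, 2))                      # squeeze: per-channel descriptor (C,)
    s = sigmoid(W2 @ np.maximum(W1 @ z, 0.0))    # excitation, delta = ReLU, sigma = sigmoid
    return s[:, None, None] * X                  # scale: reweight channels

def attention(Q, K, V):
    """Scaled dot-product attention, Eq. (4): softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    w = softmax(Q @ K.T / np.sqrt(d_k))
    return w @ V

rng = np.random.default_rng(0)
C, F, T, d = 32, 4, 50, 16
X = rng.standard_normal((C, F, T))
W1 = rng.standard_normal((C // 4, C))    # bottleneck with reduction ratio 4 (our choice)
W2 = rng.standard_normal((C, C // 4))
X_se = se_block(X, W1, W2)               # same shape as X, channels reweighted

tokens = rng.standard_normal((T, d))     # tokens produced by the embedding block
Z = attention(tokens, tokens, tokens)    # self-attention over T tokens
```

Each row of the attention weight matrix sums to 1, so the output for each token is a convex combination of the value vectors.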
The grasp-and-lift movement is segmented into five states (SEARCHING, LIFTING, HOLDING, PUTTING, RETURNING), each with different confidence thresholds. UNRELY is a virtual state representing points misclassified as belonging to the current state, which are assigned a higher decoding threshold for filtering.

E. Copilot Framework

As depicted in Fig. 3, the copilot module integrates three components: (1) a decoding model that predicts spatial coordinates, (2) a label prediction model (identical structure) that classifies motion states, and (3) a critic model that estimates confidence scores for each decoded point. A knowledge graph interacts with external sensor information to drive state transitions within a finite state machine.

IV. EXPERIMENT DETAILS

A. Experiment Design

EEG and EMG signals from each grasp-and-lift trial served as decoding inputs. Considering MRCP latency and the transformer's sequence modeling capability, various window sizes (50–1000 samples) and delays (100–700 ms) were evaluated. Model input was a 3D tensor (batch × channel × time points), with outputs being min–max normalized 3D coordinates for the index finger and thumb (6 values total). The dataset was split following [29]: for each participant, we randomly selected 30 trials as the validation set and another 30 non-overlapping trials as the test set, with the rest used for training under MSELoss.

For evaluating window sizes and delays, data from participant 4 was used. All delays and slicing were applied post-preprocessing to avoid artifact spread from filtering boundaries; segments with mismatched EEG and kinematics data after the delay were discarded.

For within-subject model comparisons, the proposed model was tested against EEGNet, DeepConvNet, TCN, and a transformer baseline (EEG-only). The modified rEEGNet and rDeepConvNet from [29] were used. The multimodal version of our model used combined EEG–EMG input.
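The per-state filtering logic described above can be sketched as follows. The state names come from the paper; the specific threshold values and the function interface are hypothetical, since the paper does not publish its tuned thresholds:

```python
# Motion states from the paper's finite state machine.
STATES = ["SEARCHING", "LIFTING", "HOLDING", "PUTTING", "RETURNING"]

# Hypothetical per-state confidence thresholds (the actual values are tuned per state).
THRESHOLDS = {"SEARCHING": 0.6, "LIFTING": 0.7, "HOLDING": 0.5,
              "PUTTING": 0.7, "RETURNING": 0.6}
UNRELY_THRESHOLD = 0.85   # stricter bar for points flagged as misclassified

def filter_points(points, states, confidences, predicted_states):
    """Keep a decoded point only if its critic confidence clears the threshold
    of the current FSM state; points whose classifier-predicted state disagrees
    with the FSM state fall into the virtual UNRELY state and face a higher bar."""
    kept = []
    for p, st, conf, pred in zip(points, states, confidences, predicted_states):
        thr = THRESHOLDS[st] if pred == st else UNRELY_THRESHOLD
        if conf >= thr:
            kept.append(p)
    return kept

kept = filter_points(
    points=[(0.0, 0.0, 0.0), (1.0, 1.0, 1.0), (2.0, 2.0, 2.0)],
    states=["LIFTING"] * 3,
    confidences=[0.9, 0.6, 0.9],
    predicted_states=["LIFTING", "LIFTING", "HOLDING"],
)
# The second point is dropped (0.6 < 0.7); the third survives only because
# its confidence also clears the stricter UNRELY threshold.
```

This captures the filter-only behavior noted in the results: low-confidence points are discarded rather than corrected.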
Tests used a 250-sample window and a 200 ms delay. Within-subject tests involved participants 3, 4, 5, 7, and 9; cross-subject tests decoded participant 1's data using models trained on these five. Ablation studies validated the model components.

Fig. 4: (Left) Within-subject decoding performance for different models (EEGNet, DeepConvNet, Transformer, EEG-TCN, EEG-only, and EMG–EEG fusion). (Right) Within-subject and cross-subject decoding performance for EEG-only and EMG–EEG fusion models.

Decoded trajectories were simulated on a 7-DOF Franka Panda robotic arm in MuJoCo. For each trial, the midpoint of the decoded index finger/thumb trajectories was mapped to the arm workspace, converted to joint angles via inverse kinematics, and interpolated for continuous motion.

B. Copilot Implementation

EEG preprocessing for the copilot was identical to the previous experiments. The same trial-level train/validation/test split as before was reused for training the critic model, motion stage classifier, and state transition model.

C. Performance Metrics

1) Pearson Correlation Coefficient (PCC): PCC measures the linear correlation between decoded and actual 3D coordinate trajectories (thumb and index finger). For two sequences C_x and C_y:

PCC_xy = Σ_{i=1}^{n} (C_{x_i} − C̄_x)(C_{y_i} − C̄_y) / ( √(Σ_{i=1}^{n} (C_{x_i} − C̄_x)²) · √(Σ_{i=1}^{n} (C_{y_i} − C̄_y)²) ).

The overall PCC is computed by concatenating all x, y, z outputs into a single vector.

2) Root Mean Square Error (RMSE): RMSE quantifies the deviation between predicted and actual trajectories:

RMSE_xy = √( (1/N) Σ_{i=1}^{N} (C_{x_i} − C_{y_i})² ).

V. RESULTS AND DISCUSSION

A. Model Comparison

1) Within-Subject Decoding: As shown in Fig. 4 (Left), except for the EEG–EMG multimodal approach, all EEG-only models showed notable inter-subject variability.
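The two metrics, including the overall-PCC convention of concatenating the x, y, z outputs into one vector, translate directly into NumPy (the helper names are ours):

```python
import numpy as np

def pcc(x, y):
    """Pearson correlation coefficient between two 1D trajectories."""
    xc, yc = x - x.mean(), y - y.mean()
    return (xc @ yc) / (np.sqrt(xc @ xc) * np.sqrt(yc @ yc))

def rmse(x, y):
    """Root mean square error between predicted and actual trajectories."""
    return np.sqrt(np.mean((x - y) ** 2))

def overall_pcc(decoded, actual):
    """Overall PCC: concatenate all x, y, z outputs into a single vector."""
    return pcc(np.ravel(decoded), np.ravel(actual))
```

Note that PCC is invariant to affine rescaling of either trajectory, which is why RMSE is reported alongside it: a decoder can track the shape of the motion (high PCC) while still being offset or mis-scaled (high RMSE).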
The multimodal model remained stable across participants 3, 4, 5, 7, and 9, with PCC values of 0.94–0.97 and RMSE of 0.09–0.15. Participant 7 achieved the highest accuracy (PCC = 0.9707, RMSE = 0.0989), while participant 5 had the lowest (PCC = 0.9369, RMSE = 0.1447). In contrast, EEG-only models performed worse. Our proposed CNN–attention model performed best among them, achieving a PCC of 0.8728 and RMSE of 0.1908 for participant 4, only 0.08 PCC lower than the multimodal result. The transformer baseline followed (PCC = 0.8353, RMSE = 0.2208), while EEGNet, DeepConvNet, and TCN showed comparable results (PCC = 0.76–0.78, RMSE = 0.24–0.26). Participant 3 yielded the weakest EEG-only results, with PCC around 0.7 for both our model and the transformer, and RMSE near 0.3. TCN performed worst overall (PCC = 0.6036, RMSE = 0.3263).

Fig. 5: (Left) Model performance with varying input window sizes (50–1000 samples) for participant 4 with a 200 ms delay. The sliding-window step size was one-fifth of the window length. (Right) Model performance with varying kinematic data delays (50–350 samples) for participant 4 with a fixed input length of 250 samples.

These results indicate that our model provides a significant advantage over prior EEG-only models (EEGNet, DeepConvNet) for hand kinematics decoding, with an average PCC improvement of approximately 0.05, despite remaining sensitive to inter-subject variability.

2) Cross-Subject Decoding: As shown in Fig. 4 (Right), cross-subject decoding resulted in lower PCC and higher RMSE than within-subject decoding. For the EEG–EMG model, cross-subject decoding showed an average PCC decrease of 0.09 and an RMSE increase of 0.08. The model trained on participant 9 generalized best (PCC = 0.9063, RMSE = 0.1781), showing the smallest drop from its within-subject performance. The model trained on participant 4 performed worst (PCC = 0.8388, RMSE = 0.2278).
The impact was more severe for the EEG-only model, with an average PCC drop of 0.23 and an RMSE increase of 0.1. The best cross-subject performance came from the model trained on participant 4 (PCC = 0.6646, RMSE = 0.3139), still worse than the lowest within-subject result. The poorest performance was from the model trained on participant 7 (PCC = 0.4985, RMSE = 0.3894).

These findings suggest that EEG–EMG multimodal signals are more robust to inter-subject variability. Both models showed a parallel trend between within-subject and cross-subject performance, implying that datasets yielding strong within-subject decoding also tend to support better cross-subject generalization under our deep learning architecture.

3) Effect of Window Size and Delay: We first study two practical factors that directly shape decoding quality: the EEG input window length and the temporal delay between EEG and kinematics. All results in this ablation are obtained in the EEG-only setting using participant 4, where PCC/RMSE are computed on the 3D midpoint trajectory between the thumb and index finger.

Fig. 6: MuJoCo robotic arm simulation (top). Trials with relatively clear trajectories (high PCC) from participants 3, 4, 5, 7, and 9, reconstructing the grasp-and-lift movement. For participant 3, the trajectory was rotated 180° around the z-axis to resolve visual overlap (bottom).

Window size. Fig. 5 (Left) shows a clear trade-off between temporal context and noise accumulation. As the window length increases, PCC improves steadily and reaches a turning point at 200 samples, indicating that short windows provide insufficient context for stable trajectory estimation. Beyond 200 samples, performance gradually declines without sharp oscillations, suggesting that overly long windows may dilute informative cues and introduce non-stationary interference. The lowest accuracy occurs at 50 samples (PCC = 0.8004), while the best performance is achieved at 750 samples (PCC = 0.8916).
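The window/delay slicing described in this ablation can be sketched as follows. The window length, delay, and one-fifth step size come from the paper; the exact alignment convention (which kinematic sample a window predicts) is not specified, so the target indexing below is our assumption:

```python
import numpy as np

def make_windows(eeg, kin, window=250, delay=100, step=None):
    """Slice one trial into sliding EEG windows. Each window is paired with
    the kinematic sample `delay` samples after its end (200 ms at 500 Hz when
    delay=100); this alignment is an assumption. Step defaults to one-fifth
    of the window length, as in the Fig. 5 (Left) sweep."""
    step = step or window // 5
    X, y = [], []
    last_start = eeg.shape[-1] - window - delay
    for start in range(0, last_start + 1, step):
        X.append(eeg[:, start:start + window])
        y.append(kin[start + window + delay - 1])   # delayed target
    return np.stack(X), np.stack(y)

rng = np.random.default_rng(0)
eeg = rng.standard_normal((32, 1000))   # 32 channels, 2 s at 500 Hz
kin = rng.standard_normal((1000, 6))    # 3D coordinates of index finger and thumb
X, y = make_windows(eeg, kin)
```

Segments whose delayed target would fall past the end of the trial are simply not generated, mirroring the paper's discarding of mismatched EEG/kinematics segments.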
The axis-wise trends follow the same pattern, with the Y-axis consistently tracking the overall improvement most closely.

Delay. Fig. 5 (Right) evaluates the influence of kinematic delay. Performance peaks at a 200 ms delay and then degrades as the delay increases, consistent with the intuition that excessive lag weakens the time alignment between neural activity and motor execution. The best overall accuracy and error are obtained at 200 ms (PCC = 0.8728, RMSE = 0.1908), whereas the worst decoding is observed at 700 ms (PCC = 0.7993, RMSE = 0.2351). Across delays, axis-specific performance consistently follows Y > X > Z. The Y-axis achieves its highest PCC at 100 ms (0.9241), the X-axis peaks at 200 ms (0.8376), while the Z-axis is less stable and reaches its maximum at 300 ms (0.7554).

B. Hand Kinematics Reconstruction on a Robotic Arm

As shown in Fig. 6, EEG-only decoding occasionally produced trajectory reversals during all movement phases, particularly in the grasping/return (participants 3, 7) and lift-hold stages (participants 3, 4, 9). Fluctuations during holding indicate residual precision limitations. Incorporating EMG significantly reduced the magnitude of reversals, especially for participants 3 and 4.

Fig. 7: Performance of the test dataset for participant 4 after copilot filtering, showing the proportion of retained points (bottom-left graph). Peripheral plots show trajectory changes before/after filtering for a randomly selected trial, compared to the ground truth. The red box indicates the overall PCC for a single filtered trial.

While the trajectories in Fig. 6 appear relatively complete, this applies only to trials with very high PCC (> 0.9). For most trajectories (PCC 0.83–0.88), reversals are more common and pronounced. To address this, we introduced the copilot filtering module, whose effect is shown in Fig. 7. The graph shows that PCC continuously increased as the retention ratio decreased, up to a threshold.
When the retained points dropped to 27.88% (672/2410), performance metrics (including PCC) changed significantly and irregularly. This indicates that applying an appropriate confidence threshold in the copilot can enhance decoding quality, but excessive filtering is detrimental. The peripheral trajectory plots demonstrate that obvious reversals are filtered out, resulting in a clearer grasp-and-lift movement profile.

In summary, copilot-filtered, EEG-based decoding trajectories can roughly reconstruct the grasp-and-lift motion and generate a stable robotic arm trajectory. However, for low-accuracy trajectories, the copilot currently cannot correct points; it can only filter unreliable ones. Higher-precision trajectory reconstruction from non-invasive EEG still requires more robust decoding models and potentially improved hardware.

VI. CONCLUSION AND FUTURE WORK

This paper proposed a CNN–attention hybrid model for predicting hand kinematic trajectories from movement-related EEG or combined EEG–EMG signals. We also designed a copilot framework to filter decoding points, improving the quality of EEG-only trajectories. Future work will integrate additional sensors to provide assistance tailored to different motion patterns and explore more advanced methods for decoding EEG and controlling arms in a more robust fashion [39].

REFERENCES

[1] J. Wolpaw and E. W. Wolpaw, Brain–Computer Interfaces: Principles and Practice. Oxford University Press, 2012.
[2] P. Brodal, The Central Nervous System: Structure and Function, 4th ed., 2010.
[3] Z. Yin, J. K. Liu, and K. Kornysheva, "Distributed neural dynamics underlie the shift from movement preparation to execution," bioRxiv, Dec. 2025.
[4] J. R. Wolpaw, J. D. R. Millán, and N. F. Ramsey, "Brain-computer interfaces: definitions and principles," Handbook of Clinical Neurology, vol. 168, pp. 15–23, 2020.
[5] Z. Yu, J. K. Liu, S. Jia, Y. Zhang, Y. Zheng, Y. Tian, and T. Huang, "Toward the next generation of retinal neuroprosthesis: visual computation with spikes," Engineering, vol. 6, no. 4, pp. 449–461, Apr. 2020.
[6] M.-H. Lee, O.-Y. Kwon, Y.-J. Kim, H.-K. Kim, Y.-E. Lee, J. Williamson, S. Fazli, and S.-W. Lee, "EEG dataset and OpenBMI toolbox for three BCI paradigms: an investigation into BCI illiteracy," GigaScience, vol. 8, no. 5, 2019.
[7] B. Z. Allison, S. Dunne, R. Leeb, J. D. R. Millán, and A. Nijholt, Towards Practical Brain-Computer Interfaces: Bridging the Gap from Research to Real-World Applications. Springer, 2012.
[8] R. Portillo-Lara, B. Tahirbegi, C. Chapman, J. Goding, and R. Green, "Mind the gap: state-of-the-art technologies and applications for EEG-based brain–computer interfaces," APL Bioengineering, vol. 5, p. 031507, 2021.
[9] I. H. de Oliveira and A. C. Rodrigues, "Empirical comparison of deep learning methods for EEG decoding," Frontiers in Neuroscience, vol. 16, p. 1003984, 2023.
[10] R. T. Schirrmeister, J. T. Springenberg, L. D. J. Fiederer, M. Glasstetter, K. Eggensperger, M. Tangermann, F. Hutter, W. Burgard, and T. Ball, "Deep learning with convolutional neural networks for EEG decoding and visualization," Human Brain Mapping, vol. 38, no. 11, pp. 5391–5420, 2017.
[11] V. J. Lawhern, A. J. Solon, N. R. Waytowich, S. M. Gordon, C. P. Hung, and B. J. Lance, "EEGNet: a compact convolutional neural network for EEG-based brain-computer interfaces," Journal of Neural Engineering, vol. 15, no. 5, p. 056013, 2018.
[12] T. M. Ingolfsson, M. Hersche, X. Wang, N. Kobayashi, L. Cavigelli, and L. Benini, "EEG-TCNet: an accurate temporal convolutional network for embedded motor-imagery brain-machine interfaces," in Proc. IEEE International Conference on Systems, Man, and Cybernetics, 2020, pp. 2958–2965.
[13] Y. Song, X. Jia, L. Yang, and L. Xie, "Transformer-based spatial-temporal feature learning for EEG decoding," 2021.
[14] G. Cisotto, A. Zanga, J. Chlebus, I. Zoppis, S. Manzoni, and U. Markowska-Kaczmar, "Comparison of attention-based deep learning models for EEG classification," 2020.
[15] Y. Song, Q. Zheng, B. Liu, and X. Gao, "EEG Conformer: convolutional transformer for EEG decoding and visualization," IEEE Transactions on Neural Systems and Rehabilitation Engineering, 2022.
[16] D. Wang and Q. Wei, "SMANet: a model combining SincNet, multi-branch spatial-temporal CNN, and attention mechanism for motor imagery BCI," IEEE Transactions on Neural Systems and Rehabilitation Engineering, vol. 33, pp. 1497–1508, 2025.
[17] R. J. Kobler, A. I. Sburlea, V. Mondini, M. Hirata, and G. R. Müller-Putz, "Distance- and speed-informed kinematics decoding improves M/EEG based upper-limb movement decoder accuracy," Journal of Neural Engineering, vol. 17, no. 5, p. 056027, 2020.
[18] T. Trammel, N. Khodayari, S. J. Luck, M. J. Traxler, and T. Y. Swaab, "Decoding semantic relatedness and prediction from EEG: a classification method comparison," NeuroImage, vol. 277, p. 120268, 2023.
[19] M. Jochumsen, I. K. Niazi, N. Mrachacz-Kersting, D. Farina, and K. Dremstrup, "Detection and classification of movement-related cortical potentials associated with task force and speed," Journal of Neural Engineering, vol. 10, no. 5, p. 056015, 2013.
[20] A. Jain and L. Kumar, "EEG cortical source feature based hand kinematics decoding using residual CNN-LSTM neural network," in Proc. 45th Annual International Conference of the IEEE Engineering in Medicine & Biology Society (EMBC), Jul. 2023, pp. 1–4.
[21] A. Y. Paek, H. A. Agashe, and J. L. Contreras-Vidal, "Decoding repetitive finger movements with brain activity acquired via non-invasive electroencephalography," Frontiers in Neuroengineering, vol. 7, p. 3, 2014.
[22] Y. Jia and C. W. Tyler, "Measurement of saccadic eye movements by electrooculography for simultaneous EEG recording," Behavior Research Methods, vol. 51, no. 5, pp. 2139–2151, 2019.
[23] A. Chaddad, Y. Wu, R. Kateb, and A. Bouridane, "Electroencephalography signal processing: a comprehensive review and analysis of methods and techniques," Sensors, vol. 23, no. 14, p. 6434, 2023.
[24] K. Erat, E. B. Şahin, F. Doğan, N. Merdanoğlu, A. Akcakaya, and P. O. Durdu, "Emotion recognition with EEG-based brain-computer interfaces: a systematic literature review," Multimedia Tools and Applications, vol. 83, no. 33, pp. 79647–79694, 2024.
[25] T. J. Bradberry, R. J. Gentili, and J. L. Contreras-Vidal, "Reconstructing three-dimensional hand movements from noninvasive electroencephalographic signals," The Journal of Neuroscience, vol. 30, no. 9, pp. 3432–3437, 2010.
[26] P. Ofner, A. Schwarz, J. Pereira, and G. R. Müller-Putz, "Upper limb movements can be decoded from the time-domain of low-frequency EEG," PLoS ONE, vol. 12, no. 8, p. e0182578, 2017.
[27] M. Saini, A. Jain, S. P. Muthukrishnan, S. Bhasin, S. Roy, and L. Kumar, "BiCurNet: premovement EEG-based neural decoder for biceps curl trajectory estimation," IEEE Transactions on Instrumentation and Measurement, vol. 73, pp. 1–11, 2024.
[28] J.-H. Jeong, K.-H. Shim, D.-J. Kim, and S.-W. Lee, "Brain-controlled robotic arm system based on multi-directional CNN-BiLSTM network using EEG signals," IEEE Transactions on Neural Systems and Rehabilitation Engineering, vol. 28, no. 5, pp. 1226–1238, 2020.
[29] A. Jain and L. Kumar, "ESI-GAL: EEG source imaging-based kinematics parameter estimation for grasp and lift task," arXiv, 2024.
[30] Z. Wan, M. Li, S. Liu, J. Huang, H. Tan, and W. Duan, "EEGformer: a transformer-based brain activity classification method using EEG signal," Frontiers in Neuroscience, vol. 17, p. 1148855, 2023.
[31] M. Singh, S. Chauhan, A. K. Rajput, I. Verma, and A. K. Tiwari, "EASM: an efficient AttnSleep model for sleep apnea detection from EEG signals," Multimedia Tools and Applications, vol. 84, no. 4, pp. 1985–2003, 2025.
[32] A. H. P. Nguyen, O. Oyefisayo, M. A. Pfeffer, and S. H. Ling, "EEG-TCNTransformer: a temporal convolutional transformer for motor imagery brain–computer interfaces," Signals, vol. 5, no. 3, pp. 605–632, 2024.
[33] S. Bagchi and D. R. Bathula, "EEG-ConvTransformer for single-trial EEG based visual stimuli classification," 2021.
[34] J. Y. Lee, S. Lee, A. Mishra, X. Yan, B. McMahan, B. Gaisford, C. Kobashigawa, M. Qu, C. Xie, and J. C. Kao, "Brain–computer interface control with artificial intelligence copilots," Nature Machine Intelligence.
[35] M. David Luciw, E. Jarocka, and B. Edin, "WAY-EEG-GAL: multi-channel EEG recordings during 3,936 grasp and lift trials with varying weight and friction," Nov. 2014.
[36] S. Bai, J. Z. Kolter, and V. Koltun, "An empirical evaluation of generic convolutional and recurrent networks for sequence modeling," 2018.
[37] J. Hu, L. Shen, S. Albanie, G. Sun, and E. Wu, "Squeeze-and-excitation networks," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 42, no. 8, pp. 2011–2023, 2020.
[38] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, and I. Polosukhin, "Attention is all you need," arXiv, vol. abs/1706.03762, 2017.
[39] Z. Yang, S. Guo, Y. Fang, Z. Yu, and J. K. Liu, "Spiking variational policy gradient for brain inspired reinforcement learning," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 47, no. 3, pp. 1975–1990, Mar. 2025.