On the use of Pairwise Distance Learning for Brain Signal Classification with Limited Observations

David Calhas,1,3 Enrique Romero,2,4 Rui Henriques1,5
1 Instituto Superior Tecnico, Lisbon, Portugal
2 Universitat Politecnica de Catalunya, Barcelona, Spain
3 david.calhas@tecnico.ulisboa.pt, 4 eromero@cs.upc.edu, 5 rmch@tecnico.ulisboa.pt

Abstract

The increasing access to brain signal data using electroencephalography creates new opportunities to study electrophysiological brain activity and perform ambulatory diagnoses of neurological disorders. This work proposes a pairwise distance learning approach for Schizophrenia classification relying on the spectral properties of the signal. Given the limited number of observations (i.e. the case and/or control individuals) in clinical trials, we propose a Siamese neural network architecture to learn a discriminative feature space from pairwise combinations of observations per channel. In this way, the multivariate order of the signal is used as a form of data augmentation, further supporting the network generalization ability. Convolutional layers with parameters learned under a cosine contrastive loss are proposed to adequately explore spectral images derived from the brain signal. Results on a case-control population show that the features extracted using the proposed neural network lead to an improved Schizophrenia diagnosis (+10pp in accuracy and sensitivity) against baselines, suggesting the existence of non-trivial electrophysiological brain patterns able to capture discriminative neuroplasticity profiles among individuals.

1 Introduction

The recording of increasingly affordable and precise electroencephalographic (EEG) data is creating unprecedented opportunities to understand brain activity, aid personalized prognostics, and promote health through wearable biofeedback systems (Nan et al. 2012).
Electroencephalography is non-invasive, safe, inexpensive, and shows rich temporal content, in contrast with other brain imaging modalities, such as magnetic resonance imaging, which entail higher costs, risks and restrictions on the periodicity of recordings (Fowle and Binnie 2000). EEG monitoring is widely used to assess psychiatric disorders, and has been shown to be a valuable source to study Schizophrenia, a disorder affecting about 1% of the world population and largely susceptible to misdiagnosis (Owen, Sawa, and Mortensen 2016).

Despite the inherent advantages of monitoring electrophysiological brain activity, its use for diagnosing neuronal diseases is still capped by the limited size of case-control populations (Litjens et al. 2017), as well as by the intrinsic difficulties of mining brain signals. Brain signal data is high-dimensional, multivariate, susceptible to noise/artefacts, rich in temporal-spatial-spectral content, and highly variable between individuals (da Silva 2013).

Copyright © 2020, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.

This work proposes a dedicated class of neural networks to extract discriminative features of Schizophrenia from electrophysiological brain data. The proposed approach combines principles from pairwise distance learning and spectral imaging in order to address the aforementioned challenges, enabling superior diagnostics. Accordingly, the proposed approach offers six major contributions:

1. Ability to learn from small datasets by taking advantage of Siamese Network layering, inherently prepared to work in augmented data spaces mapped from a limited number of observations (Gorbachevskaya and Borisov 2002). The features produced by these networks have proven to be useful to perform classification as they rely on either the homologous or discriminative properties of observation-pairs in a pairwise distance domain (Koch, Zemel, and Salakhutdinov 2015);

2. Ability to deal with the rich and complex spectral and temporal content of EEG data by processing the signal into spectral images with a fine frequency and temporal resolution per electrode, and by subsequently reshaping the Siamese network architecture with adequate convolutional operations;

3. Robustness to noise and wave-instability by assessing distances on the spectral content under a cosine loss. Gathered evidence shows less susceptibility to artefacts and to the inherent variability of electrophysiological potentials associated with continuously changing overlapping electrical fields produced by localized neurons (da Silva 2013);

4. Ability to deal with the multivariate nature of the signal (rich spatial content) by capturing interdependencies between channels as their content is simultaneously used to shape the weights of shared connections in the network;

5. Ability to handle the extremely high-dimensional nature of the gathered spectral content from brain signals (high-resolution spectral image per electrode) under L1 regularization;

6. Applicability of the proposed EEG-based diagnostics to alternative populations or diseases, evidenced by: i) the Bayesian optimization step (Snoek, Larochelle, and Adams 2012) put in place for hyperparameter tuning and for fixing feature numerosity; ii) the fully-automated nature of the approach once signals are recorded; and iii) the generalization ability of the learning process on validation data.

In contrast with the traditional stance to neural information processing systems, this manuscript explores whether we can go deep on highly-dimensional spatiotemporal data in the presence of a very limited number of data observations. This stance is much needed in healthcare given the limited size of trials (cohort studies), often driven by disease rarity, capped size of control population, trial eligibility requirements, or the facultative nature of EEG assessments.
Results confirm this possibility: +10pp in the accuracy and sensitivity of Schizophrenia diagnostics.

The features extracted from the proposed spectral and pairwise distance space further suggest the presence of discriminative electrophysiological patterns linked to neuroplasticity aspects of the individuals. This observation is in accordance with findings from previous studies that established statistically significant relationships between variations in the frequency band spectrum and neuroplasticity conditions (Bhandari et al. 2016; Liu et al. 2015).

The manuscript is organized as follows. After formalizing the problem, Section 2 surveys existing contributions on the diagnosis of individuals from brain signal data. Section 3 describes the proposed solution. Section 4 shows extended evidence of its relevance for diagnosing Schizophrenia. Finally, concluding remarks are drawn in Section 5.

1.1 Problem Description

Problem. An EEG recording or brain signal observation is a multivariate time series $X = \{ x_t^j \mid j \in \{1..M\}, t \in \{1..T\} \}$, where $x_t^j$ is a measure of the electrophysiological activity in scalp channel $j$ at instant $t$, $T$ is the number of time points, and $M$ is the multivariate order (number of channels). Given a brain signal dataset $\{ (X_i, c_i) \mid i = 1..N \}$, with $N$ EEG recordings $X_i$ annotated with a label $c_i \in \Sigma$, our task is to identify a discriminative feature space to classify (unlabeled) observations. Specifically, we are interested in classifying Schizophrenia given case-control populations.

Background. The electrophysiological signal produced by a specific channel $k$ in the cerebral cortex is a univariate time series that can be decomposed into a frequency series using a discrete Fourier transform. The analysis of the frequency domain of a signal, generally referred to as spectral analysis, determines the predominant waves monitored at a certain location.
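As a rough illustration of the spectral analysis just described, the following Python sketch applies a discrete Fourier transform to a single synthetic channel and reads off its predominant wave. The synthetic signal, the helper name `dominant_frequency`, and the choice of a 2-second segment sampled at 128 Hz are assumptions made for illustration only; this is not the authors' implementation.

```python
import numpy as np

def dominant_frequency(x, fs):
    """Return (freqs, amplitude spectrum, dominant frequency) of a 1-D signal.

    Hypothetical helper: x is one channel, fs is the sampling rate in Hz.
    """
    x = np.asarray(x, dtype=float)
    x = x - x.mean()                             # remove the DC offset
    spectrum = np.abs(np.fft.rfft(x))            # magnitude of the one-sided DFT
    freqs = np.fft.rfftfreq(x.size, d=1.0 / fs)  # frequency (Hz) of each bin
    return freqs, spectrum, freqs[np.argmax(spectrum)]

fs = 128                                         # assumed sampling rate (Hz)
t = np.arange(0, 2.0, 1.0 / fs)                  # assumed 2-second segment, 256 samples
# Synthetic channel: a strong 10 Hz (alpha) wave plus a weaker 2 Hz (delta) wave.
signal = np.sin(2 * np.pi * 10 * t) + 0.3 * np.sin(2 * np.pi * 2 * t)

freqs, spectrum, peak = dominant_frequency(signal, fs)
print(freqs[1] - freqs[0])  # frequency resolution: 0.5
print(peak)                 # predominant wave: 10.0
```

With a 2-second segment the frequency bins are spaced 0.5 Hz apart, which is why slow waves fall on distinct bins in this toy example.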
A short-time discrete Fourier transform can alternatively be applied along a sliding window of the raw signal to capture potentially relevant changes in the spectral activity of the brain throughout the EEG recording. The spectral content produced by this time-varying form of spectral analysis is here informally referred to as a spectral image since it measures brain activity along two contiguous axes: time and frequency.

2 Related Work

2.1 EEG Classification

EEGNet (Lawhern et al. 2018), EEGNet-SSVEP (Lawhern et al. 2018), DeepConvNet (Schirrmeister et al. 2017) and ShallowConvNet (Schirrmeister et al. 2017) are considered state-of-the-art EEG classification models that apply convolutional operations directly to the raw EEG data. These convolutions are placed along time and channels. Such approaches rely on the properties of their models to extract discriminative features from EEG signals. These models are validated on a total of 4 datasets, all of which are based on task-based or stimulus-based EEG recording sessions. These networks therefore learn event-related potentials from the EEG signal, which makes the EEG recording session dependent on a task environment. In contrast, we aim at extracting neuroplasticity-related features from the EEG signal, as the dataset used is based on a resting state recording session. In Section 4, these models are shown to perform poorly on resting state EEG data.

2.2 EEG on Schizophrenia

Dvey-Aharon et al. (2015) claim that changes in functional connectivity are mostly seen in patients with Schizophrenia, as well as differences in theta-frequency activity. A classification approach was applied on 1-minute signals recorded by a single electrode.
The developed system consists of four stages: performing several preprocessing tasks and breaking the raw signals into relevant intervals; transformation of the EEG signal into a time/frequency representation via the Stockwell transformation; feature extraction from the time/frequency representation; and discrimination of specific time frames following a given set of stimuli between the time/frequency matrix representations of the healthy subjects and the schizophrenia patients. Despite promising results, the approach requires the individuals under assessment to perform cognitive tasks throughout the recording.

Dvey-Aharon et al. (2017) introduced another way of looking at the EEG signal using connectivity maps derived from the brain activity. In order to build these maps, a similarity function needs to be chosen, so one can check which nodes are more similar to which ones. Results showed that the degradation of connectivity is accelerated in individuals with schizophrenia, and that information relay changes in an abnormal manner, primarily in the prefrontal area. This gives good insight into how connectivity maps can be applied to discriminate schizophrenia and, most importantly, shows that a change in a certain region can influence other regions in the brain.

Sabeti, Katebi, and Boostani (2009) introduced another approach to classify Schizophrenia based on entropy and complexity measures of the EEG signal. The features extracted from the signal were: Shannon entropy, spectral entropy, approximate entropy, Lempel-Ziv complexity and Higuchi fractal dimension. Genetic programming was used for feature selection. With these features, Adaptive Boosting (AdaBoost) and Linear Discriminant Analysis (LDA) classifiers were validated, showing performance improvements against peer approaches. The recordings were done with eyes open, a setting easily biased by environmental effects.

Table 1: Schizophrenia EEG datasets.

Dataset                                                   Healthy Controls   Schizophrenic Individuals   Access
Dvey-Aharon et al. (2017) and Dvey-Aharon et al. (2015)   20                 20                          Private
Sabeti, Katebi, and Boostani (2009)                       25                 25                          Private
Gorbachevskaya and Borisov (2002)                         39                 45                          Public

Notable examples of connectionist and spectral approaches were introduced to discriminate and characterize Schizophrenia. Nevertheless, there is still a research gap on how to simultaneously explore the rich spectral, temporal and spatial nature of brain signals to perform classification. In spite of the indisputable role of neural network learning for the analysis of complex spatiotemporal signal data, its role for EEG-based diagnostics of psychiatric disorders remains largely unexplored due to the absence of large cohorts and the inherent stochastic complexities associated with electrophysiological data.

2.3 Siamese Neural Network

First introduced by Bromley et al. (1994) as a novel model for signature classification, whose aim was to distinguish signature forgeries from real ones, Siamese Neural Networks (SNN) are deep learning architectures with two sub-networks that are instances of the same network, hence the name "siamese networks". This architecture receives as input a pair of samples. Subsequently, the outputs that the two "siamese networks" produce for the pair are joined in a distance function. The distance function proposed between the outputs of the SNN is the cosine similarity (for signatures from the same person the output should be 1, and -1 for forged ones). This model had outstanding results at the time, detecting 80.0% of the forged signatures and 95.5% of the genuine signatures. More recently, Koch, Zemel, and Salakhutdinov (2015) successfully used an SNN architecture for One Shot Learning (meaning the model only sees each class once in an epoch). This approach reached 92.8% accuracy in the test set.
These results were achieved through a Siamese Convolutional Architecture. Once this kind of network is trained, the representations it learns via a supervised metric-based approach are useful to perform tasks like classification, relying on the discriminative properties of these features.

3 Our Approach

The proposed architecture is inspired by the architecture formerly introduced by Koch, Zemel, and Salakhutdinov (2015). An advantage of this type of architecture is the ability to augment the original dataset from an instance-based data space to a pair-based one. Our approach has two main steps: 1) feature extraction; and 2) classification. In step 1, the internal representations obtained from the SNN architecture model are extracted after training. In step 2, a classification task is performed using these extracted features. Prior to both steps, we perform hyperparameter optimization for every model using Bayesian Optimization (BO) (Snoek, Larochelle, and Adams 2012).

3.1 Dataset Description

Approaches based on induced stimuli or task performance, followed by the analysis of event-related potentials, are not considered in this work. Instead, a resting state setup is considered to monitor the underlying brain patterning at the brain cortex, independently of the surrounding environment/undertaken task. This avoids any additional interference on the recorded EEG signal. The findings of Howells et al. (2018) support the use of this setup, claiming that differences in the spectral activity (such as higher delta and lower alpha synchronization in psychotic disorders) can be optimally detected in resting state protocols with both open and closed eyes.

Table 1 shows the content of EEG datasets containing healthy control individuals and schizophrenic individuals. The works of Dvey-Aharon et al. (2017), Dvey-Aharon et al. (2015) and Sabeti, Katebi, and Boostani (2009) were introduced and discussed in Section 2.
Unfortunately, besides their low significance and variance, due to the lack of observations, the datasets used are not publicly available. Nonetheless, Gorbachevskaya and Borisov gathered a total of 84 individuals, of which 45 were schizophrenic and 39 were regarded as healthy controls.

The population in (Gorbachevskaya and Borisov 2002) consists of adolescents who had been screened by a psychiatrist and received either a positive or a negative diagnosis of schizophrenia. EEG recordings were sampled at 128 Hz with 1 minute duration. Individuals were set in a resting state with eyes closed. In accordance with the 10-20 system of electrode placement, the topographical positions of the placed EEG channels are: F7, F3, F4, F8, T3, C3, Cz, C4, T4, T5, P3, Pz, P4, T6, O1, O2.

3.2 Siamese Neural Network Architecture

The SNN architecture contains two sub-networks that correspond to the same instance (twin networks). Both of these twin networks are referred to as the Base Network (BN). The input and output of the BN are an example and a feature vector, respectively. The output feature vector corresponds to the features extracted in the aforementioned step 1.

Figure 1: Base network from the SNN.

In our case, the BN receives as input a Discrete Short-Time Fourier Transform (DSTFT) representation of the EEG signal, extracted from the 1-minute recording of one channel of an individual. The DSTFT is taken with 2-second windows in order to capture frequencies as low as 0.5 Hz, corresponding to the delta wave frequencies (Howells et al. (2018) point out that frequencies lower than 2 Hz are relevant to differentiate Schizophrenia). This image is processed through two convolutional layers, followed by a fully connected layer. The activation function used in the convolutional layers is the rectified linear function (Hahnloser et al.
2000), while the fully connected layer uses the softmax activation function, normalizing the domain of the feature representations, $\vec{f} \in \mathbb{R}^q$, $\forall i \in [1, q] : f_i \in [0, 1]$.

Once the BN (Fig. 1) is built, a replica of it is made, producing its twin, and the two share their weights. The SNN layout is achieved by joining these twins and computing a distance metric between their outputs, as shown in Fig. 2. In our case, the inputs to the SNN are pairs of DSTFT representations and the output is the computed distance between the representations obtained by the BN.

Figure 2: SNN architecture.

The SNN tries to solve what is known as a neighbor separation problem, which consists of separating the instances of a dataset that contains different classes. In our case we have two classes: schizophrenic and healthy control individuals. In this neighbor separation problem, pairs of individuals of the same class (schizophrenic with schizophrenic or healthy with healthy) are called neighbors and pairs of individuals of different classes (schizophrenic with healthy) are called non-neighbors. The network learns a transformation with the objective of assigning a small distance to neighbors and a large distance to non-neighbors.

With the previously described architecture, the neighbor separation problem can be posed as the minimization of a loss function that depends on such distance. In (Hadsell, Chopra, and LeCun 2006), the Contrastive Loss function is introduced to that end, defined as:

$$L(W, Y, X_1, X_2) = Y D_W^2 + (1 - Y) \max(0, m - D_W)^2 \qquad (1)$$

where $(X_1, X_2)$ is the input pair, $Y = 1$ if $X_1$ and $X_2$ are neighbors and 0 otherwise, $D_W$ is the distance between the predicted values of $X_1$ and $X_2$, and $m$ is the margin value of separation. Minimization of the Contrastive Loss function leads to a scenario where neighbors are pulled together and non-neighbors are pushed apart, according to a certain distance metric. The model is sensitive to the margin value.
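Equation (1) can be sketched in a few lines of Python. The sketch assumes that the distance $D_W$ is a cosine distance defined as one minus the cosine similarity, so that it lies in [0, 2]; this definition, the function names and the toy feature vectors are hypothetical choices for illustration and are not taken from the authors' code.

```python
import numpy as np

def cosine_distance(a, b, eps=1e-12):
    """Assumed definition: 1 - cosine similarity, a value in [0, 2]."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    sim = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + eps)
    return 1.0 - sim

def contrastive_loss(y, d, margin):
    """Eq. (1) for one pair: y = 1 for neighbors, y = 0 for non-neighbors."""
    return y * d ** 2 + (1 - y) * max(0.0, margin - d) ** 2

# Toy softmax-like outputs of the two twin networks (illustrative values only).
f_a = [0.7, 0.2, 0.1]
f_b = [0.0, 0.1, 0.9]
d = cosine_distance(f_a, f_b)

print(contrastive_loss(1, d, margin=1.5))  # penalty if the pair are neighbors
print(contrastive_loss(0, d, margin=1.5))  # penalty if the pair are non-neighbors
```

For a neighbor pair only the distance itself is penalized, whereas a non-neighbor pair stops contributing to the loss once its distance exceeds the margin.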
High values of $m$ increase the separation between non-neighbors (pairs of different classes), which benefits accuracy but makes training slower. In contrast, low values of $m$ may cause the model not to learn the desired behavior.

The distance metric considered in our case is the cosine distance. This metric was chosen in the belief that a pattern-based metric (cosine) would perform better than a magnitude-based one (Euclidean), in order to shed light on how the schizophrenia pathology expresses itself through the EEG.

Besides the type of layers and the distance metric, the following techniques are integrated in the model: L1 regularization and Dropout layers. L1 regularization is useful because it helps remove features that are not useful for the task. Dropout layers are introduced to improve generalization. Regularization is applied to the kernels of all layers. The Dropout probability used is 0.5, as suggested by Srivastava et al., and is applied after each convolutional layer. Adam (Kingma and Ba 2014) is used to optimize the network during training.

Hyperparameter Tuning. The number of layers, as well as their type, is fixed. The remaining hyperparameters (regularization factor, margin value, learning rate, kernel size and output dimension of the BN) are subject to optimization. As previously mentioned, we apply BO to that end. BO is set to run with a maximum of 50 acquisitions and starts with 5 iterations to perform an initial exploration. In each iteration and acquisition, a K-fold Cross Validation with K = 5 is done with the training set of a Leave-One-Out Cross Validation (LOOCV) partition. The combination of hyperparameters that has the best average validation accuracy across the 5 folds is chosen to perform the feature extraction. Each hyperparameter is assigned the following value domain to explore: regularization factor $\in [10^{-3}, 10^{-1}]$, margin value $\in [1.0, 2.0]$, learning rate $\in [10^{-6}, 10^{-3}]$, kernel size $t \times f$ with $t = f \in \{3, 4, 5, 6, 7, 8, 9, 10, 11, 12\}$ (the same kernel size is used for both convolutional layers) and final output dimension $\in \{2, 4, 6, 8, 10, 12, 14\}$. The BO surrogate model is a standard Gaussian Process. Expected Improvement is used as the acquisition function and the Limited-Memory Broyden-Fletcher-Goldfarb-Shanno algorithm as the acquisition optimizer.

The DSTFT magnitudes are normalized, under the hypothesis that there exists a threshold above which there is no additional information to identify the schizophrenia pathology. Accordingly, the values are normalized by an upper value, $U$. Magnitudes smaller than $U$ are divided by $U$ and magnitudes larger than $U$ are set to 1.0. This places every frequency magnitude within the interval $[0, 1]$ after the normalization is performed. We take advantage of the BO exploration to obtain $U$, by including it in the same optimization process used for the SNN hyperparameters. The domain assigned to be explored for $U$ is $[100.0, 500.0]$.

Pairwise Dataset Structure. We want the network to learn a valid transformation that generalizes to all channels. To that end, the pairs are set such that only equal channels are paired. Pairs of different channels are not considered, since we see different channels as correlated spaces with different properties. Fortunately, the SNN is capable of learning different spaces/classes, as shown in (Koch, Zemel, and Salakhutdinov 2015), where the proposed system is able to learn a similar setup. This pairwise schema can be seen as a data augmentation technique performed through the addition of noise to the dataset. This noise comes from mixing all the channels, under the aforementioned restrictions that keep the pairs coherent. No other data augmentation scheme, such as image transformations (scaling, rotations), is applied.
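The pairing scheme can be made concrete with a small Python sketch that enumerates same-channel pairs of distinct individuals and labels them as neighbors or non-neighbors. The function name `build_pairs`, the tuple layout and the ordering of the labels are assumptions for illustration; only the group sizes (45 cases, 39 controls) and the 16 channels come from the dataset description.

```python
from itertools import combinations

def build_pairs(labels, n_channels):
    """List (i, j, channel, y) tuples for same-channel pairs of individuals.

    y = 1 when individuals i and j share a class (neighbors), else 0.
    Hypothetical helper: it indexes individuals and channels, not raw data.
    """
    pairs = []
    for i, j in combinations(range(len(labels)), 2):  # unordered pairs of individuals
        y = 1 if labels[i] == labels[j] else 0
        for ch in range(n_channels):                  # only equal channels are paired
            pairs.append((i, j, ch, y))
    return pairs

labels = [1] * 45 + [0] * 39        # 45 schizophrenic (1) and 39 control (0) individuals
pairs = build_pairs(labels, n_channels=16)

print(len(pairs))                   # 16 * C(84, 2) = 55776
print(sum(p[3] for p in pairs))     # neighbor pairs: 16 * (C(45, 2) + C(39, 2)) = 27696
```

Because the channels of one pair of individuals are emitted contiguously, slicing this list in blocks of 16 tuples yields all channel pairs of a single pair of individuals.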
From our original EEG dataset, spectral images $X_1, ..., X_N$ are derived with $N = 84$ examples, and a pairwise dataset $P$ is built. Formally, $P = P_1, ..., P_O$ with $O = c \binom{N}{2} = M \binom{84}{2} = 55776$, where $c = M = 16$ is the number of EEG channels. The space complexity of the pair dataset is $O(c \binom{N}{2})$. The SNN training session is done with a batch size that is a multiple of the number of channels. In particular, we use $B = 16 \cdot c$. Therefore, there are 16 pairs of individuals in each batch and each pair of individuals has $c = 16$ channel pairs. This scheme can only be applied to small datasets, since the model does not scale well in terms of space complexity, but our goal is precisely to tackle small datasets by creating a whole new optimization space where the variability contained in the data can be exploited in a different way.

3.3 Validation

After the SNN has been tuned and trained (in a 20-epoch session), the outputs of the BN for every example are taken as the result of our feature extraction process. With these features, the following classifiers were trained to identify schizophrenia: Support Vector Machines (SVM), Random Forest (RF), XGBoost (XGB), Naive Bayes (NB) and k-Nearest Neighbors (kNN). This process was performed with a LOOCV, where each fold consists of one subject (16 channels/instances). For each of these classifiers, BO hyperparameter tuning is also performed, set up with a maximum of 10 acquisitions and 5 iterations for initial exploration. The hyperparameter domains for each classifier were:

• SVM: type of kernel (linear or radial-basis function kernel), cost $C \in [0.5, 5]$, and gamma coefficient $\gamma \in [0.00001, 1.0]$
• RF: number of estimators $N_e \in \{5, 10, 15, 20, 25\}$
• XGB: maximum depth $d \in \{3, 4, 5, 6, 7\}$, learning rate $\lambda \in [0.001, 0.1]$, and number of estimators $N_e \in \{10, 50, 100, 200\}$
• NB has no hyperparameters
• kNN: number of neighbors $k \in \{2, 3, 4, 5, 6, 7, 8\}$

The hyperparameter tuning for the classifiers is also performed in a K-Fold Cross Validation setup (K = 5), but instead of using the whole dataset (as was the case for the SNN) only the training set of the LOOCV partition was used. As in the BO for the SNN, the combination of hyperparameters with the best average validation accuracy is chosen for each classifier.

4 Results

The classification results obtained with the features extracted by the proposed SNN are compared with state-of-the-art classifiers developed by Schirrmeister et al. (2017), Charles (2013) and Lawhern et al. (2018). We further compare our approach against classifiers able to learn directly from spectral/FFT features extracted from each channel (Hindarto and Sumarno 2016). The EEG classifiers proposed in previous work are referred to as: (vi) EEGNet, (vii) EEGNet-SSVEP, (viii) Riemann, (ix) DeepConvNet, (x) ShallowConvNet. The FFT feature classifiers are referred to as: (i) FFT-kNN, (ii) FFT-NB, (iii) FFT-RF, (iv) FFT-SVM, (v) FFT-XGB. The proposed classifiers based on the SNN extracted features are referred to as: (xi) DSTFT-SNN-kNN, (xii) DSTFT-SNN-NB, (xiii) DSTFT-SNN-RF, (ixx) DSTFT-SNN-SVM, (xx) DSTFT-SNN-XGB.

According to Table 2, the SNN features outperform the baselines considered by an average of 20pp in accuracy, specificity and sensitivity. In fact, all of the collected differences are statistically significant under significance thresholds below 1E-5.

The results observed when considering FFT features underline the difficult nature of the problem at hand, showing that the use of spectral features is not sufficient to capture discriminative electrophysiological brain patterns.
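For reference, the three measures used in this comparison can be computed from predicted and true labels as in the short Python sketch below. It assumes that schizophrenia is the positive class (label 1) and healthy control the negative class (label 0); the function name and the toy label vectors are hypothetical and do not reproduce any result reported here.

```python
def diagnostic_metrics(y_true, y_pred):
    """Accuracy, sensitivity and specificity for binary labels (1 = case, 0 = control)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": tp / (tp + fn),   # fraction of cases correctly detected
        "specificity": tn / (tn + fp),   # fraction of controls correctly detected
    }

# Toy labels, purely illustrative: 4 cases followed by 4 controls.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 1, 1]
metrics = diagnostic_metrics(y_true, y_pred)
print(metrics)
```

In this toy case three of four cases and two of four controls are recovered, so sensitivity exceeds specificity even though accuracy summarizes both in a single number.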
As previously mentioned in Section 2, the previous work on EEG (referred to in Table 2 as (vi), (vii), (viii), (ix) and (x)) is unable to capture neuroplasticity differences between healthy and Schizophrenia individuals from resting state data. These approaches are mainly prepared to detect evoked potentials in response to specific stimuli, thus generally neglecting subtle, spontaneous electrophysiological variations in the brain of individuals.

Table 2: Comparison between Baseline Features and SNN Extracted Features (among all channels)

Classifier                   Accuracy      Sensitivity   Specificity
(i) FFT-kNN                  0.60 ± 0.31   0.56 ± 0.33   0.64 ± 0.30
(ii) FFT-NB                  0.57 ± 0.32   0.33 ± 0.38   0.85 ± 0.14
(iii) FFT-RF                 0.58 ± 0.32   0.58 ± 0.32   0.64 ± 0.29
(iv) FFT-SVM                 0.66 ± 0.28   0.69 ± 0.26   0.63 ± 0.29
(v) FFT-XGB                  0.65 ± 0.28   0.68 ± 0.26   0.61 ± 0.30
(vi) EEGNet (2018)           0.58 ± 0.32   0.58 ± 0.31   0.59 ± 0.32
(vii) EEGNet-SSVEP (2018)    0.54 ± 0.34   0.60 ± 0.31   0.46 ± 0.37
(viii) Riemann (2013)        0.41 ± 0.50   0.47 ± 0.54   0.44 ± 0.50
(ix) DeepConvNet (2017)      0.54 ± 0.12   0.64 ± 0.08   0.41 ± 0.14
(x) ShallowConvNet (2017)    0.57 ± 0.32   0.58 ± 0.31   0.56 ± 0.32
(xi) DSTFT-SNN-kNN           0.78 ± 0.20   0.78 ± 0.19   0.77 ± 0.20
(xii) DSTFT-SNN-NB           0.76 ± 0.21   0.78 ± 0.20   0.73 ± 0.23
(xiii) DSTFT-SNN-RF          0.79 ± 0.18   0.81 ± 0.17   0.77 ± 0.20
(ixx) DSTFT-SNN-SVM          0.78 ± 0.19   0.83 ± 0.16   0.72 ± 0.23
(xx) DSTFT-SNN-XGB           0.83 ± 0.16   0.84 ± 0.15   0.82 ± 0.16

In contrast, DSTFT representations followed by the proposed SNNs are better prepared to detect neuroplasticity characteristics in the EEG signal, as motivated by the rich spectral content input to the SNN, the properties of the entailed transformations, and the discriminative power of the features output by the SNN.
These observations are experimentally demonstrated by the results presented in Table 2, with a significant difference between our approach and the previous work on EEG.

Among the classifiers applied to the SNN features, XGBoost has the best performance, followed by RFs, SVMs with sparse kernel and kNNs. We hypothesize that this observation is primarily driven by the compositional value of the extracted features and the heterogeneity of individual profiles. Since only a part of the overall features has discriminative value for a given subject due to profile heterogeneity, NB and kNN understandably show a lower performance, given their inherent inability to discard non-relevant features. Similarly, when we compare the performance of the classifiers on FFT features, FFT-kNN and FFT-NB have a slightly inferior performance against FFT-XGB and FFT-SVM. All five classifiers performed slightly worse at discriminating healthy controls (specificity) than at discriminating schizophrenic individuals (sensitivity), due to an inherent ability to avoid false negatives.

The gathered results confirm the relevance of working in a pairwise distance space to guarantee a good generalization ability. In addition, the applied convolution transformations guarantee a sensitivity to the inherently rich spatial, temporal and spectral nature of the EEG signal. We hypothesize that these aspects, together with the use of regularization and the cosine loss function (able to favor variations over absolute differences in the spectral content), explain the ability to learn extremely discriminative features.

5 Conclusion

The rich nature of the electrophysiological data measured at the cerebral cortex makes deep learning a natural candidate to study disorders disrupting the normal brain activity.
Nev ertheless, the limited size of case-control populations, together with the inherent variability of the spectral content within and among individuals, had left the value of neu- ral netw ork approaches lar gely une xplored. This manuscript stresses the rele vance of revisiting this problem, showing that adequately reshaped neural networks with proper loss and regularization can increase the accuracy of Schizophre- nia diagnostics by 15-to-20 percentage points against peer alternativ es (without hampering sensiti vity or speciﬁcity). T wo master principles underlie these results: 1) the map- ping of the original data space into a pairwise distance space to support data augmentation while enhancing the discrimi- nativ e power of the output features; and 2) the e xploration of the rich nature of brain patterning through con volution op- erations on the spectral imaging of the signal, with weights learned under a cosine loss to improve robustness against the inherent noisy nature of electrophysiologic data. As future work, we aim to extend the experimental anal- ysis to wards alternati ve disorders, and dif ferent EEG instru- mentation or protocols; contrast the performance of the pro- posed EEG-based learners ag ainst state-of-the-art MRI- and PET -based learners on a population of indi viduals with (and without) neurodegenerativ e conditions being currently mon- itored at Instituto de Medicina Molecular; and to establish a method that is capable of performing a neurofeedback tech- nique to tackle Schizophrenia symptoms, similarly to what has been previously proposed by Nan et al. (2012). References [2016] Bhandari, A.; V oineskos, D.; Daskalakis, Z. J.; Ra- jji, T . K.; and Blumberger , D. M. 2016. A re vie w of impaired neuroplasticity in schizophrenia inv estigated with non-in v asiv e brain stimulation. F r ontiers in psychiatry 7:45. [1994] Bromley , J.; Guyon, I.; LeCun, Y .; S ¨ ackinger , E.; and Shah, R. 1994. 
Signature verification using a "siamese" time delay neural network. In Advances in Neural Information Processing Systems, 737–744.
[2013] Charles, P. 2013. pyriemann. https://github.com/alexandrebarachant/pyRiemann.
[2013] da Silva, F. L. 2013. EEG and MEG: relevance to neuroscience. Neuron 80(5):1112–1128.
[2015] Dvey-Aharon, Z.; Fogelson, N.; Peled, A.; and Intrator, N. 2015. Schizophrenia detection and classification by advanced analysis of EEG recordings using a single electrode approach. PLoS ONE 10(4):e0123033.
[2017] Dvey-Aharon, Z.; Fogelson, N.; Peled, A.; and Intrator, N. 2017. Connectivity maps based analysis of EEG for the advanced diagnosis of schizophrenia attributes. PLoS ONE 12(10):e0185852.
[2000] Fowle, A. J., and Binnie, C. D. 2000. Uses and abuses of the EEG in epilepsy. Epilepsia 41:S10–S18.
[2002] Gorbachevskaya, K., and Borisov, S. 2002. EEG of healthy adolescents and adolescents with symptoms of schizophrenia. http://brain.bio.msu.ru/eeg_schizophrenia.htm. Online; accessed 1st February 2019.
[2006] Hadsell, R.; Chopra, S.; and LeCun, Y. 2006. Dimensionality reduction by learning an invariant mapping. In 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06), volume 2, 1735–1742. IEEE.
[2000] Hahnloser, R. H.; Sarpeshkar, R.; Mahowald, M. A.; Douglas, R. J.; and Seung, H. S. 2000. Digital selection and analogue amplification coexist in a cortex-inspired silicon circuit. Nature 405(6789):947.
[2016] Hindarto, H., and Sumarno, S. 2016. Feature extraction of electroencephalography signals using fast Fourier transform. CommIT (Communication and Information Technology) Journal 10(2):49–52.
[2018] Howells, F. M.; Temmingh, H. S.; Hsieh, J. H.; van Dijen, A. V.; Baldwin, D. S.; and Stein, D. J. 2018.
Electroencephalographic delta/alpha frequency activity differentiates psychotic disorders: a study of schizophrenia, bipolar disorder and methamphetamine-induced psychotic disorder. Translational Psychiatry 8(1):75.
[2014] Kingma, D. P., and Ba, J. 2014. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980.
[2015] Koch, G.; Zemel, R.; and Salakhutdinov, R. 2015. Siamese neural networks for one-shot image recognition. In ICML Deep Learning Workshop, volume 2.
[2018] Lawhern, V. J.; Solon, A. J.; Waytowich, N. R.; Gordon, S. M.; Hung, C. P.; and Lance, B. J. 2018. EEGNet: a compact convolutional neural network for EEG-based brain–computer interfaces. Journal of Neural Engineering 15(5):056013.
[2017] Litjens, G.; Kooi, T.; Bejnordi, B. E.; Setio, A. A. A.; Ciompi, F.; Ghafoorian, M.; Van Der Laak, J. A.; Van Ginneken, B.; and Sánchez, C. I. 2017. A survey on deep learning in medical image analysis. Medical Image Analysis 42:60–88.
[2015] Liu, K. K.; Bartsch, R. P.; Lin, A.; Mantegna, R. N.; and Ivanov, P. C. 2015. Plasticity of brain wave network interactions and evolution across physiologic states. Frontiers in Neural Circuits 9:62.
[2012] Nan, W.; Rodrigues, J. P.; Ma, J.; Qu, X.; Wan, F.; Mak, P.-I.; Mak, P. U.; Vai, M. I.; and Rosa, A. 2012. Individual alpha neurofeedback training effect on short term memory. International Journal of Psychophysiology 86(1):83–87.
[2016] Owen, M. J.; Sawa, A.; and Mortensen, P. B. 2016. Schizophrenia. The Lancet 388(10039):86–97.
[2009] Sabeti, M.; Katebi, S.; and Boostani, R. 2009. Entropy and complexity measures for EEG signal classification of schizophrenic and control participants. Artificial Intelligence in Medicine 47(3):263–274.
[2017] Schirrmeister, R. T.; Springenberg, J. T.; Fiederer, L. D. J.; Glasstetter, M.; Eggensperger, K.; Tangermann, M.; Hutter, F.; Burgard, W.; and Ball, T. 2017.
Deep learning with convolutional neural networks for EEG decoding and visualization. Human Brain Mapping 38(11):5391–5420.
[2012] Snoek, J.; Larochelle, H.; and Adams, R. P. 2012. Practical Bayesian optimization of machine learning algorithms. In Advances in Neural Information Processing Systems, 2951–2959.
[2014] Srivastava, N.; Hinton, G.; Krizhevsky, A.; Sutskever, I.; and Salakhutdinov, R. 2014. Dropout: A simple way to prevent neural networks from overfitting. Journal of Machine Learning Research 15:1929–1958.
