Detection of Brain Stimuli Using Ramanujan Periodicity Transforms

1 Detecti on of Bra in Sti muli Using Ramanujan Periodicity T ransforms Pouria Saidi, Azadeh V osough i and Geor ge Atia Department of Electrical and Computer Eng ineering, Universit y of Central Florida, Orlando, Fl, 32816 USA pouria.saidi@knigh ts.ucf.edu; azadeh@uc f.edu; geor ge.atia@ucf.edu Abstract — Objective: The abi l ity to efﬁciently match the fre- quency of the brain’s response to repetitive visual stimuli in real time is t he b asis fo r reliable SSVEP-based Brain-Compu t er - Interfacing (BCI). Approach: The detection of di f f erent stimuli is posed as a composite hypothesis test, where SSVEPs are assumed to admit a sparse re presentation in a Rama nujan Periodicity T ransfor m (RP T) dictionary . For th e b inary case, we d ev elop and analyze the perf ormance of an RPT d etector based on a derive d generalized likelih ood ratio test. Our approach is extended to multi-hypothesis multi-electrode settings, where we capture th e spatial correlation between th e electrodes u sing pre- stimulus data. W e also in troduce a new metric for evaluating SSVEP detection schemes based on their achiev able efﬁciency and discrimination rate tradeoff fo r giv en system resourc es. Results: W e obtain exact distributions of the test statistic in terms of conﬂuent hypergeom etric functions. Results based on extensive simulations with both synth esized and real data indi cate that the RPT detector sub stantially outperforms sp ectral-based methods. Its p erfo rmance also surpasses the state-of-the-art Canonical Correlation An alysis (CCA) methods wi th respect to accuracy and sample complexity in short data len gths regimes crucial for real-time applications. The proposed approach i s asymptotically optimal as it closes the gap to a perfect measurement bound as the data length increases. In contrast to existing supervised methods which are highly data-depend ent, t h e RPT detector on l y u ses pre-stimulus d ata to estimate the per-subject sp atial correlation, thereby di spensing with consid erable ov erhead associated with data collection for a large nu mber of sub jects and stimuli. Signiﬁcance: Our work advances t he theory and practice of emerg ing real-time BCI and aff ords a n ew fra mework f or comparing SSVEP detection schemes acro ss a wider sp ectrum of operating reg imes. Index T erms —Ramanujan periodicity transf orm, Nested pe- riodic matrices, Steady-state visual ev oked potentials, Brain computer interface, error exponent-discrimination rate tradeoff. I . I N T RO D U CT I O N B RAIN computer in terfacing (BCI) is used to connect the hum an nervous system to external d evices. Pushing the fro ntiers of su c h techn ology holds promise to assist, and improve the quality of life of, patients with mo tor disabil- ity due to various neuro logical diso r ders throug h the u se of intentio n -contro lled prosthetics. Non -in vasi ve BCIs exploit features fr o m Electro e ncephalo gram (EEG) signals distinctiv e of different tasks. For examp le, motor imag e r y BCI refe rs to a class of BCI in which featur es and p atterns are extracted from EEG sensorimotor rhythms triggered by action imagination (e.g., imag in ed movement of th e limbs). Despite n otew orthy efforts to devise a h ost of tech niques to classify EEG trials associated with movement imagination [1], [2], mo tor imager y BCI remains at its in fancy to date. Evoked-potential-ba sed BCI is ano ther non -in vasi ve tech- nology th a t records EE G signals fr om electrical a cti vity in- duced on the co rtex in response to some repetitive visual stimulus using electr o des attached to the scalp. Th e brain’ s responses to suc h external stimuli, known as Steady State V isual Evoked Poten tial (SSVEPs), are known to exhibit peri- odicity matching that of the stimuli [3]. Th e (quasi) periodic patterns of SSVEPs have been the basis for mu ch p rogress in the theor y and p ractice of SSVEP-BCI. For example, spectral- based meth ods capturing the en ergy distrib ution across the frequen cy spectru m such as Power Sp ectral Density Analy sis (PSD A) ar e pop ular c hoices for classifying SSVEPs [3 ], [4]. Howe ver , the presenc e of h igh levels of backgr ound n oise d ue to brain chatter – ine vitably superimposed on the recorded brain’ s respo nse – contin u es to be a ma jo r challenge in face of the development of r eliable SSVEP detecto rs. I ts degrading effect o n performan c e is furthe r compou nded by the stringen t an d indispensab le laten cy req u irements o f real- time BCI, in wh ich high levels o f acc uracy are mandated with short delay . For instance, the fr equency r esolution o f spectral- based metho ds is severely diminished with the u se of sho rter data length s. An altern ati ve a n d state-of -the-art appro ach to SSVEP de- tection relies on Canonical Co rrelation Analy sis (CCA) – initially pro posed in [5] to ﬁnd relation s between sets o f variates. In particular, the authors in [ 6] lev eraged CCA in developing a method for SSVEP classiﬁcation, which we refer to in this work as standard CCA. The key idea u n derlying standard CCA is obtaining the m aximum corr elation of the data with a referen c e matrix deﬁned for each class consisting of perio dic signals with the frequ ency of the stimulus a nd its harmon ics to d ecide on a class label. Some o f the state-o f-the- art supervised algo rithms originated fr om standard CCA such as in [7]–[1 0] to further boost th e perform ance o f SSVEP classiﬁcation, albe it at the expense o f heavy reliance on post-stimulus trainin g trials [11]. On e d rawback o f excessi ve reliance on such data in the trainin g phase of sup ervised methods, suc h as In dividual T em plate CCA (IT CCA), lies in the overhead a nd co st associated with d ata collection. For example, the low-freque n cy ﬂashing lights in SSVEP-based BCIs used to ob ta in large amplitude respo n ses on the brain cortex [12] could become tiring and burden some fo r some subjects, may cause eye fatigue, and may even trigger seizu res in some patients [13], [1 4]. 2 Other tha n the use of Fourier transfo r m an d perio dogra m s to iden tify perio dic signals in trinsic to time serie s-d ata (as in PSD A), there exist per iodicity estimatio n me thods that search for pe r iodicities and regular ities in d ata directly in the time domain [15], [16]. For example, the autho rs in [ 17] studied periodicity estimatio n usin g rep resentations of discrete p eri- odic sequences in Nested Per iodic Matrices ( NPMs). T hey also introdu c ed the so- called Ramanu jan Period icity Transforms (RPT) as a n in stan ce of NPMs. The use o f RPT was shown to exhibit robustness to noise and p hase shifts. Ramanujan su m s deﬁning the bases for RPT were sh own usef u l in r epresenting periodic seq uences in various applicatio ns [18]. Howe ver , only few works have inves tigated th eir application with biomedical signals, such as in the analysis of T -wa ve altern ans [19] an d the detectio n of tan dem repeats in DNA [ 20]. W e le veraged an RPT -based model to detect SSVEPs for the ﬁrst time in [21], [22] provid ing prelimina r y results fo r the cu rrent work , an d demonstra te d its ability to capture the u n derlying period ic ity in SSVEPs and its robustness to latencies na turally present in the br ain’ s respo nse to external stimuli. Contributions: I n this pap er , we b uild on o u r preliminar y prior work on the RPT m odel and extensively an alyze SSVEP detection in a compo site hypoth esis testing fr amew ork. The following summarizes the main c o ntributions of this pap er . • W e develop an RPT detector o f th e brain’ s response associated with various stimuli based on a generalized likelihood ratio test ( GLR T) u nder the proposed RPT model. • W e provide an exact analy sis of the perfo rmance o f the RPT detector by de riving the distributions of the test statistic charac te r ized in terms of con ﬂuent hypergeo met- ric fu nctions for the b inary case. • W e d evise ﬂexible Gaussian app roximatio ns of the d e- riv ed distributions that av ail an efﬁcient framework for the design of stimu lus wa veforms for high -accuracy BCIs. • W e establish a symptotic o ptimality o f the prop osed ap - proach by an a lyzing the per forman ce gap with re spect to a d erived perfect measure m ent boun d fo r the compo site test. The gap is sho wn to vanish exponentially fast in the EEG system resou rces (data leng th L and sign a l to noise ratio ( SNR )). • W e introduc e and investigate the trad eoff betwe en the error expone n t, − log P e / ( L · SNR) , c apturing the rate a t which the classiﬁcation error decay s with L and SNR and the discrim ination rate log 2 M (the log arithm of the nu mber o f classes b eing discrimin ated) for various detection metho ds including th e RPT d etector for a given number of electrod es. Inspired by the concep t of d iv ersity and multiplexing from com munication th eory [23], this tradeoff is introdu ced here fo r the ﬁrst tim e in the context of SSVEP detection and is shown to be pa rticularly useful for compar ing various d e te c tion methodo logies across an entire spectru m of op e r ation regimes. • W e extend ou r approach and de riv e th e correspo nding test for mu lti-hypo thesis an d m ulti-chann el settings, where we capture the spa tial corre latio n between the recordin g electrodes. The paper is o rganized as follows. In section II, we pr ovid e a brief ba ckgrou nd about Ramanujan sum s and RPT m atrices and their prop erties. Then , we p resent the comp osite hyp othe- sis testing mo del, an d p rovide an analysis of the RPT d etector and the associated sufﬁcient statistic f or the binary c a se. The analysis is extended to mu lti-class and mu lti-electrode settings. W e pre sent o ur results o n syn thesized and real data in Section III. Section IV is d ev oted to a discussion an d our co ncludin g remarks are in Section V. I I . M E T H O D S Notation: W e use lowercase letters fo r scalars, b old lower - case letters for vector s an d bold upp ercase letters for ma tr ices. W e use d | T to ind icate that d is a divisor of T . The Eule r totient function o f p , tha t is the nu m ber of positive integers smaller than p that ar e co- p rime to p , is den oted by φ ( p ) . Giv en a set S , the set S c denotes its c o mplemen t. Th e operato r tr( . ) denotes th e tra c e of its matrix argumen t, and log d enotes the natur al logarith m to th e b ase e , unless the base is made explicit. W e use the notatio n x ∼ N ( µ, Σ) to indicate that a random vector x has a m ultiv ariate norma l (Gaussian) distribution with mean vector µ and covariance ma tr ix Σ . A. Rama nujan Sums and R PT Dictionary In [17], the author s intro duced NPMs that can capture periodicity in sequences. T hey extend ed th e n o tion of the se matrices to p eriodicity dictionaries. In [24], period icity dictio- naries of or der P max are deﬁne d as the set of sign a ls B that can represent all perio dic sequenc es with p eriod 1 ≤ p ≤ P max throug h linear combinations o f sign a ls in the set B . I t was shown th at a diction ary that span s all subsp aces of p eriodic signals of per io ds 1 to P max must con ta in at least φ ( p ) linear ly indepen d ent signals with period p for each p . Con sequently , a periodic dictio nary of ord er P max must h av e at least P P max p =1 φ ( p ) linearly indepen dent signals. RPT matrices are instances of the NPMs built from Ra- manujan sums [25]. A Ram anujan sum is deﬁned as c q ( n ) = q X k =1 ( k,q )=1 e j 2 π kn/q , (1) where ( k , q ) is th e greatest co mmon divisor ( gcd) of k and q . The sequ ence c q ( n ) is an all integer, symmetric and periodic sequence with per iod q . For in stance, c 1 ( n ) = { 1 } , c 2 ( n ) = { 1 , − 1 } , c 3 ( n ) = { 2 , − 1 , − 1 } , c 4 ( n ) = { 2 , 0 , − 2 , 0 } and c 5 ( n ) = { 4 , − 1 , − 1 , − 1 , − 1 } . These examples show o nly one pe riod o f th e sequ ences. Prop erties o f Ramanu jan sums are investigated in [26] a n d [2 7]. One important featu re of c q ( n ) high ly r elev ant to ou r work is the orthog onality proper ty . Speciﬁcally , the sequenc e s c q 1 ( n ) and c q 2 ( n ) are or th ogon a l over th e sequen ce length L = lcm( q 1 , q 2 ) for q 1 6 = q 2 , wher e lcm is the least co mmon multiplier of q 1 and q 2 , i.e., L − 1 X n =0 c q 1 ( n ) c q 2 ( n − k ) = 0 , q 1 6 = q 2 (2) for any integer 0 ≤ k ≤ L − 1 . 3 W e deﬁne th e sequence c q c q =  c q (0) c q (1) . . . c q ( q − 1 )  T . For each 1 ≤ q ≤ P max , a submatrix C q is con structed from columns that a re circularly downshifted version s of c q , i.e., C q = h c q c (1) q . . . c ( φ ( q ) − 1) q i , (3) where c (1) q is the circu larly downshifted version o f c q , i.e., c (1) q =  c q ( q − 1 ) c q (0) c q (1) . . . c q ( q − 2)  T . (4) The versions c ( i ) q are similarly d eﬁned with hig her order downshifts. For in stan ce, C 4 and C 5 are C 4 =     2 0 0 2 − 2 0 0 − 2     4 × φ (4) C 5 =       4 − 1 − 1 − 1 − 1 4 − 1 − 1 − 1 − 1 4 − 1 − 1 − 1 − 1 4 − 1 − 1 − 1 − 1       5 × φ (5) (5) W e extend the matrices C q periodically to leng th L yielding submatrices R q . W e can readily deﬁn e the RPT diction ary matrix K [2 4] con structed by concatenatin g all sub matrices R q , q = 1 , . . . , P max , wher e P max is the largest possible value that q assumes. Th u s, K =  R 1 R 2 R 3 . . . R P max  . (6) Such dictio naries are called Nested Periodic Dictionaries (NPD). W e note that NPDs have exactly φ ( p ) signals with period p for each p . In the next two sections, we develop the SSVEP detection problem in a co mposite h ypothesis testing framework u sing the signal re presentation s in an RPT dictio- nary and analyze the perfor m ance of the developed detector . B. Binary S SVEP Detection 1) Compo site bina ry hypoth e sis testing mod el: In this sec- tion, we assume that th e SSVEP is associated with one o f two stimuli with freq uencies f 0 and f 1 , which re present two possible hy potheses H 0 and H 1 , respectively . Equiv alently , the recorded sequen ce is of period T 0 or T 1 . W e c o nsider an observation model in wh ich we exp r ess the measuremen ts under each o f the h ypothe ses using the model, H 0 : y = Kx 0 + w H 1 : y = Kx 1 + w (7) where y is an L × 1 observation (measure m ent) vector, K is th e L × P P max p =1 φ ( p ) RPT diction ary matrix ( L is the len gth of the sequences), x 0 and x 1 are the P P max p =1 φ ( p ) × 1 rep resentations of the observations in the RPT subspace, mod eled as two deterministic but un known vectors. The L × 1 vector w is an additive white Gau ssian noise ( A WGN) vector with covariance matrix σ 2 I modeling the b ackgrou nd EEG noise, that is, w ∼ N  0 , σ 2 I  . Given x m we ca n w r ite th e d istribution of y as y ∼ N  Kx m , σ 2 I  , where m ∈ { 0 , 1 } . Refer e nce [1 7] considers a period estimation prob lem where the suppo rt set (locations of the non zero en tires) of the signal repr e sentation is unknown. By contrast, here we incor porate prior infor mation about the suppor t o f x 0 and x 1 under b o th hypo theses since the perio d of the SSVEP sho uld match that of th e stimulus. W e denote the known support sets supp( x m ) for x m by S m := { j : x m,j 6 = 0 } , whe r e x m,j is the j -th elem ent o f x m , m = 0 , 1 . The non- zero e lements correspo nd to the colum ns of the sub matrices of K a ssoc ia ted with p eriod T m and its divisors. W e deﬁne K S m whose column s are the column s of K indexed b y th e support set S m , and column vector x S m whose entries are equal to th e non- z ero entries of x m indexed by S m . W e deﬁne th e SNR correspo nding to hy pothesis H m as SNR m = x T S m K T S m K S m x S m σ 2 L . (8) W ithout loss of generality , we set σ 2 = 1 so th at different SNRs can be accoun ted fo r by varying the magnitud e of the signal part on the sup port. 2) Bina ry RPT detecto r: Let f ( y | H m ) denote the con d i- tional proba bility density fu nction ( pdf) of the measu rement vector y given hyp othesis H m . For the model in (7) we ca n write: f ( y | H m ) = 1 (2 π σ 2 ) L 2 exp  −k y − Kx m k 2 2 2 σ 2  . (9) The b inary RPT d etector is ob tained fro m the GLR T d eﬁned as [28] L ( y ) = max x 1 :supp( x 1 )= S 1 f ( y | H 1 ) max x 0 :supp( x 0 )= S 0 f ( y | H 0 ) H 1 ≷ H 0 η , (10) which req uires com puting the restricted Maximu m L ikelihood (ML) estimate of x m when the sup p ort set is restric ted to S m . From ( 9), the restricted ML estimate un der H m , m = 0 , 1 , is obtained as th e solution to min x m :supp( x m )= S m k y − Kx m k 2 2 . (11) According ly , we can rewrite (11) as min k y − K S m x S m k 2 2 . (12) The solution to (12) c an be expressed as ˆ x S m = ( K T S m K S m ) − 1 K T S m y . (13) Replacing in (1 0), the d ecision rule r e d uces to ℓ ( y ) = y T By − y T Ay H 1 ≷ H 0 2 σ 2 log η = γ , (14) where B = K S 1 ( K T S 1 K S 1 ) − 1 K T S 1 , A = K S 0 ( K T S 0 K S 0 ) − 1 K T S 0 and ℓ ( y ) is the sufﬁcient statistic. The two matrice s A and B in ( 14) ar e idem potent, i.e., A 2 = A , B 2 = B a nd their eig en values are eith er 0 o r 1. 3) P erformance ana lysis o f the b inary RPT detecto r: In this section, we analyz e the perform ance of th e d etector in (14). W e start off with the special case where the length of y (measure d in num b er of samp le ) is L = lcm( T 0 , T 1 ) . In this ca se, the orthog onality o f the RPT sub- matrices is preserved. Under th is assumption, w e are a b le to obtain the exact distributions of the test statistic ℓ ( y ) , an d in tu rn pr ovide an exact per forman ce analysis. Then, we consider the gener a l case in which y is 4 (a) (b) Fig. 1. (a) Schematic for an example of the RPT dictiona ry m atrix K and its submatric es, where T 0 = 10 and T 1 = 8 . (b) RPT matrices are restricted to support sets S 0 and S 1 . In this exa mple, R 1 and R 2 exi st in both K S 0 and K S 1 matrice s. of arbitrar y len gth L . I n this case, the o r thogon ality of th e RPT su b matrices is n ot n ecessarily preser ved . For the general case, we provide an appr o ximate analy sis based on Gaussian approx imations of the distributions of ℓ ( y ) . • Orthogona l case when L = lcm( T 0 , T 1 ) : If the period s T 0 and T 1 share any d ivisors, the suppo rt sets S 0 and S 1 are no n - disjoint, i.e., S 0 ∩ S 1 6 = ∅ . T herefor e, th e dec ision rule in (14) reduces to ℓ ( y ) = y T B ⊥ y − y T A ⊥ y H 1 ≷ H 0 2 σ 2 log η = γ , (15) where the m a tr ices A ⊥ and B ⊥ are given by A ⊥ = K S 0 \ S 1 ( K T S 0 \ S 1 K S 0 \ S 1 ) − 1 K T S 0 \ S 1 B ⊥ = K S 1 \ S 0 ( K T S 1 \ S 0 K S 1 \ S 0 ) − 1 K T S 1 \ S 0 , (16) and S i \ S j denotes th e difference set of S i and S j . Th e superscript ⊥ is reserved for th is orth ogona l case thr ougho ut this sub section. Fig. 1(a) shows a schematic for an examp le of the d ictionary ma tr ix K and its submatrice s in wh ich T 0 = 10 and T 1 = 8 . Fig. 1 (b) illustrate s the RPT matrices that are restricted to support sets S 0 and S 1 and also subma tr ices K S 0 \ S 1 and K S 1 \ S 0 correspo n ding to two difference sets S 0 \ S 1 and S 1 \ S 0 , respec ti vely . As shown, the subm atrices R 1 and R 2 correspo n ding to the divisors 1 an d 2, respectively , exist in both K S 0 and K S 1 , thus do n ot play a role in the sufﬁcient statistic. Per th e deﬁnitions of A ⊥ and B ⊥ in (16), the measurement vector y is projected o nto the colum n space of K S 0 \ S 1 and K S 1 \ S 0 , and the decision rule in (15) chooses the index of the sub sp ace that yields the larger projection. This insight is dem onstrated in Fig. 2. In Fig. 2(a), we have L = lcm( T 0 , T 1 ) and th us the or thogon ality of th e RPT subspa c e s is preser ved , and in Fig. 2(b) L is arbitrary and th erefore the subspaces are no t ortho gonal. W e have the following lemma. Lemma 1. F or L = lcm( T 0 , T 1 ) , we h ave A ⊥ B ⊥ = 0 . Pr oof. Lemma 1 follows from the orthog onality p roperty in (2), which for the choice of L = lcm( T 0 , T 1 ) en sures that the c o lumns of submatr ices K S 0 and K S 1 correspo n ding to distinct divisors are ortho gonal. Since th e submatr ices K S 0 \ S 1 and K S 1 \ S 0 of the RPT dictionar y K indexed by th e difference (a) (b) Fig. 2. (a) For L = lcm( T 0 , T 1 ) , the column spaces C ( K S 0 \ S 1 ) and C ( K S 1 \ S 0 ) of the orthogonal submatri ces effec ti ve in the decision rule are orthogona l. The GLR T chooses the index correspondin g to the subspace that yields the larger projection. Here, y T 1 A ⊥ y 1 > y T 1 B ⊥ y 1 , hence the RPT dete ctor maps y 1 to H 0 . (b) For arbi trary L , the submatr ices are not orthogona l. Since, y T 2 Ay 2 < y T 2 By 2 , the RPT detecto r maps y 2 to H 1 . sets d o n o t share any column s, we h av e K T S 0 \ S 1 K S 1 \ S 0 = 0 , hence, A ⊥ B ⊥ = 0 . T o analyze the p e r forman ce of the RPT detector, we n eed f ( ℓ ( y ) | H 0 ) and f ( ℓ ( y ) | H 1 ) , the pdf s of the test statistic ℓ ( y ) under each h ypothesis. From [2 9, Lemma 1.1 ] it follows that the two qu adratic term s y T A ⊥ y an d y T B ⊥ y in ( 15) h av e Chi-squared distributions χ 2 ( r ⊥ A , λ 2 , ⊥ m,A ) and χ 2 ( r ⊥ B , λ 2 , ⊥ m,B ) un- der hy pothesis H m , whe r e r ⊥ A and r ⊥ B are the d egre e s of free- dom, and λ 2 , ⊥ m,A and λ 2 , ⊥ m,B are the non -centrality p arameters. From [3 0], we can readily state th e fo llowing theor em p roved in Appen dix A, which p rovides exact exp ressions for the distributions of the sufﬁcient statistic ℓ ( y ) for the ortho g onal case when L = lcm( T 0 , T 1 ) . Theorem 2. ( D istribution of test statistic for orthogonal c a se) The distributions f ( ℓ ( y ) | H m ) , m = 0 , 1 , of the sufﬁcient statistic ℓ ( y ) = y T B ⊥ y − y T A ⊥ y in ( 1 5) ar e given b y f ( ℓ ( y ) | H 1 ) = ∞ X i =0 exp( − 1 2 λ 2 , ⊥ 1 ,B )( λ 2 , ⊥ 1 ,B / 2) i i ! p r ⊥ B +2 i,r ⊥ A ( t ) f ( ℓ ( y ) | H 0 ) = ∞ X j =0 exp( − 1 2 λ 2 , ⊥ 0 ,A )( λ 2 , ⊥ 0 ,A / 2) j j ! p r ⊥ B ,r ⊥ A +2 j ( t ) (17) wher e p a,b ( t ) =        2 − ( a + b ) 2 Γ( a/ 2) t a + b − 2 2 e − t 2 ψ ( b 2 , a + b 2 ; t ) if t ≥ 0 2 − ( a + b ) 2 Γ( b/ 2) ( − t ) a + b − 2 2 e t 2 ψ ( a 2 , a + b 2 ; − t ) if t ≤ 0 (18) and ψ ( a, b ; x ) is deﬁned as ψ ( a, b ; x ) = (Γ( a )) − 1 Z ∞ 0 e − xt t a − 1 (1 + t ) b − a − 1 dt , (19) and th e p a rameters of the non- central Chi-squ ar ed distribu- tions ar e λ 2 , ⊥ 0 ,A = v X i =1 d i / ∈T 1 x T S 0 ,i K T S 0 ,i K S 0 ,i x S 0 ,i , λ 2 , ⊥ 1 ,A = 0 λ 2 , ⊥ 1 ,B = u X j =1 , d j / ∈T 0 x T S 1 ,j K T S 1 ,j K S 1 ,j x S 1 ,j , λ 2 , ⊥ 0 ,B = 0 r ⊥ A = v X i =1 d i / ∈T 1 φ ( d i | T 0 ) r ⊥ B = u X j =1 d j / ∈T 0 φ ( d j | T 1 ) (20) 5 (b) (a) 0 0.2 0.4 0.6 0.8 1 Probability of false alarm 0 0.2 0.4 0.6 0.8 1 Probability of detection Exact Approximation Experiment Fig. 3. (a) The exact and Gaussia n approximated distribu tions of ℓ ( y ) deri ved in (17) and (21) are in agreemen t and match the histogram of ℓ ( y ) obtained from numerical experiments. (b) The R OC curve obtain ed from numerical inte gratio n of the exa ct pdfs is close to the ones obtained from the Gaussian approximat ed pdfs and from the numerical expe riment. wher e T 0 and T 1 ar e th e sets of divisors of T 0 and T 1 of car dinalities v and u , r espectively , K S 0 ,i , i = 1 , . . . , v , and K S 1 ,j , j = 1 , . . . , u , ar e th e corr espondin g submatrices, a nd x S m ,i and x S m ,j ar e r ows of x S m r estricted to the ind ex set of the i -th and j -th divisors of T 0 and T 1 . Pr oof. Per Lem ma 1 and Th eorem 2, the sufﬁcient statistic ℓ ( y ) in (1 5) is the difference of two indep endent non -central Chi-squared ra n dom variables (R V s) . Accordin gly , the p roof of Theorem 2 f o llows fro m Theo rems 2.1B, 3.1, 3.2 , 3.3 in [30] which charac te r ize the distribution of linear combin ations of indep endent n on-cen tral Chi-squar e d R Vs in terms of co n- ﬂuent hypergeo metric fun ctions. Fur ther d e tails are provided in App endix A. In designin g BCIs, it is desirable to select wa veforms that yield well-separable hyp otheses with respect to the employed signals rep resentation. Due to th e complexity o f th e exact pdf expressions of the sufﬁcient statistic ℓ ( y ) in (17), we app r oxi- mate th ese pd f s with Gaussian distributions by ﬁtting th e ﬁrst and second order statistics. Th ese Ga ussian approx im ations lead to a ﬂexible wa veform design . Recalling that a non - central Chi-squared distribution with r d egrees of f reedom and non-ce n trality param eter λ 2 has a m ean r + λ 2 and variance 2( r + 2 λ 2 ) , the ap proxim ate Gau ssian pdfs un der H 0 and H 1 are H 0 : ℓ ( y ) ∼N  r ⊥ B − r ⊥ A − λ 2 , ⊥ 0 ,A , 2( r ⊥ B + r ⊥ A ) + 4 λ 2 , ⊥ 0 ,A  H 1 : ℓ ( y ) ∼N  r ⊥ B − r ⊥ A + λ 2 , ⊥ 1 ,B , 2( r ⊥ B + r ⊥ A ) + 4 λ 2 , ⊥ 1 ,B  (21) T o validate these theor etical results, we consider (7 ) and assume that the period s co rrespon d ing to the two stimuli are T 0 = 32 , T 1 = 18 under hypotheses H 0 and H 1 , respectiv ely . W e ﬁx SNR = − 14 dB and gen erate 200 0 ob servation vectors y under each hy pothesis. Fig. 3(a) shows tha t the exact pdfs of ℓ ( y ) u nder H 0 and H 1 derived in (17) ag ree with the approx imated Gaussian pdfs in (21) and histograms of the sufﬁcient statistic ℓ ( y ) o b tained from n umerical experimen ts. Based on these Gaussian ap proxim ations, we can obtain th e probab ility o f detectio n P D := P 1 ( ℓ ( y ) > γ ) and th e probab ility of false alarm P F := P 0 ( ℓ ( y ) > γ ) , where P m denotes the p robability measure un der the m - th hypo thesis P D = Q   γ − r ⊥ B + r ⊥ A − λ 2 , ⊥ 1 ,B q (2( r ⊥ B + r ⊥ A ) + 4 λ 2 , ⊥ 1 ,B   P F = Q   γ − r ⊥ B + r ⊥ A + λ 2 , ⊥ 0 ,A q (2( r ⊥ B + r ⊥ A ) + 4 λ 2 , ⊥ 0 ,A   (22) where Q ( . ) denote s the Q-fun ction, wh ich is th e tail o f the standard Norm al distribution. Fig. 3(b ) illustrates the R OC curves, where P D and P F correspo n ding to the exact p dfs are obtained using nume r ical integration of the exact p dfs in (17), and P D and P F correspo n ding to the app roximate Gaussian pdfs in (21) are found u sin g (22). Clearly , the R OC c urves are in close ag reement. • General case when L is arbitrary: I n this section , we treat the general case with ar bitrary L that do e s not n ecessarily lead to o rthogo nality . The two quadratic terms y T Ay an d y T By hav e χ 2 distributions χ 2 ( r A , λ 2 m,A ) a n d χ 2 ( r B , λ 2 m,B ) under hypoth eses H m , m = 0 , 1 . The param eters of these distributions c a n be wr itten as λ 2 0 ,A = x T S 0 K T S 0 AK S 0 x S 0 = x T S 0 K T S 0 K S 0 x S 0 λ 2 0 ,B = x T S 0 K T S 0 BK S 0 x S 0 λ 2 1 ,A = x T S 1 K T S 1 AK S 1 x S 1 λ 2 1 ,B = x T S 1 K T S 1 BK S 1 x S 1 = x T S 1 K T S 1 K S 1 x S 1 r A = v X i =1 φ ( d i | T 0 ) = T 0 r B = u X j =1 φ ( d j | T 1 ) = T 1 (23) Since L is ar bitrary , the two random variables y T Ay an d y T By are not necessarily ortho gonal, wherefo re th ey are not in depend ent an d ﬁnd ing the exact pdf o f ℓ ( y ) requires obtaining th e joint p df of these two depe ndent/cor related R Vs. Akin to the or thogon al case, we ap proxim ate the pdf o f ℓ ( y ) with a Gau ssian distribution by ﬁtting the ﬁrst and second order statistics and consid ering the cov ariance between the two R Vs y T Ay an d y T By for the test statistic un der eac h hypoth esis. Ther efore, we consid e r the appr o ximate model H 0 : ℓ ( y ) ∼ N  r B − r A + λ 2 0 ,B − λ 2 0 ,A , σ 2 0  H 1 : ℓ ( y ) ∼ N  r B − r A + λ 2 1 ,B − λ 2 1 ,A , σ 2 1  (24) where σ 2 0 = 2( r B + 2 λ 2 0 ,B ) + 2( r A + 2 λ 2 0 ,A ) − 2 Cov ( y T By , y T Ay | H 0 ) σ 2 1 = 2( r B + 2 λ 2 1 ,B ) + 2( r A + 2 λ 2 1 ,A ) − 2 Cov ( y T By , y T Ay | H 1 ) (25) and Cov ( y T By , y T Ay | H m ) deno tes th e covariance between y T Ay and y T By u nder H m . In A p pendix B, we show th at Cov ( y T By , y T Ay | H 0 ) = 4 λ 2 0 ,B + 2 L X i =1 L X j =1 c ij Cov ( y T By , y T Ay | H 1 ) = 4 λ 2 1 ,A + 2 L X i =1 L X j =1 c ij (26) 6 where c ij is the ( i, j ) -th e n try of ma tr ix C = A ⊙ B , and ⊙ is an element-wise pro duct. W e can readily o btain general expressions for P D and P F , P D = Q   γ − r B − λ 2 1 ,B + r A + λ 2 1 ,A q (2( r B + r A ) + 4( λ 2 1 ,B − λ 2 1 ,A ) − 4 P L i =1 P L j =1 c ij   P F = Q   γ − r B − λ 2 0 ,B + r A + λ 2 0 ,A q (2( r B + r A ) + 4( λ 2 0 ,A − λ 0 ,B ) − 4 P L i =1 P L j =1 c ij   . (27) Under the N eyma n-Pearson criteria [28], the maximum value for P D for false alarm level α , i.e., P F ≤ α , is giv en in (28). 4) P erfect measur ement b ound fo r th e binary R PT detecto r: In th is sectio n, we derive an upper bou nd on the p erform ance of the pro posed binar y RPT detecto r . W e use a standar d approa c h from detectio n theo ry in w h ich an up per bound on the perfo r mance of all compo site tests is obtain e d by assuming that the signals x 0 and x 1 are kn own, h ence the appellation ‘ p erfect measuremen t b ound ’ (PMB) [28]. Un der this hypo thetical assump tion, the pr oblem r educes to on e of a simple binary hypo thesis testing p roblem. For o ur problem , under this a ssum ption the o ptimal test be c omes ℓ ( y ) = y T ( K S 1 x S 1 − K S 0 x S 0 ) H 1 ≷ H 0 ln( η ) + 1 2 ( x T S 1 K T S 1 K S 1 x S 1 − x T S 0 K T S 0 K S 0 x S 0 ) . (30) Giv en th e mode l in ( 7), un der th e Neyman-Pearson cr iteria, P D for false alarm level α is expressed in (2 9). W e note that, (2 9) provides an u pper boun d on the p erform ance of all composite tests correspo n ding to the mo del in (7), includ ing the pr oposed RPT detector in (14). T o compare the perfor m ance of the pro posed RPT detec- tor to th e PMB, w e let ga p( L, SNR) denote the dif ference between P D of the RPT d etector a n d the PMB. Lemma 3 provides an approxim a te chara cterization of the gap in the asymptotic regime o f large L . Th ere are two m ain sou rces of appro ximation, namely using the Gaussian approx imate distribution and using an appro ximation for the Q-fun ction. Lemma 3. Suppose the SNR s corr esponding to hypo theses H 0 and H 1 , deﬁned in (7), are eq ual. F or lar ge L and SNR , the differ ence between P D of th e RPT detector in (28) an d the hypothe tica l d e tector (which corr esponds to the PMB) in (2 9) is well-appr oximated by gap( L, SNR) = P D − P D PMB ≈ e − L. SNR 2  √ 2 − e − L. SNR 2  2 √ π √ L. SNR . (31) Accor dingly , lim L · SNR →∞ | log gap | L · SNR = O (1) , (32) wher e O (1) is a co nstant 1 . 1 This is the standa rd Big O notation, so that f ( x ) = O ( g ( x )) if f there exi sts positi ve numbers C and x 0 such that | f ( x ) | ≤ C g ( x ) , ∀ x ≥ x 0 . 60 70 80 90 100 110 L.SNR 0 5 10 15 20 25 30 |log gap| (a) (b) 0 500 1000 1500 2000 Data length L 0 0.2 0.4 0.6 0.8 1 Probability of detection PMB (theory) RPT detector (theory) PMB (experiment) RPT detector (experiment) Fig. 4. (a) P MB and P D of the binary RPT det ector from both theory and numerica l experime nt. (b) | log gap | scale s linearl y in L. SNR . Pr oof. See Append ix C Lemma 3 ind icates tha t, assymp totically the gap is d omi- nated by the exponentially decayin g fun c tio n, establishing the asymptotic op timality of the proposed RPT detector , in the sense that the p erform a nce gap with respect to the PMB – which provides an u pper boun d on th e perfor mance of any composite test – appro aches zero. For a num erical example, we consider (7) and assume th at the periods correspond ing to the two stimu li ar e T 0 = 32 and T 1 = 18 un der hy potheses H 0 and H 1 , re spectiv ely . W e ﬁx SNR = − 15 dB and generate 5000 ob servation vectors y und e r each hypo thesis. Fig. 4(a) dep icts th e difference between the PMB and P D for the RPT detector in (1 4) as we vary L from max { T 0 , T 1 } to 2000. The ﬁgure conﬁrms that the numerical experiments are in agreem ent with the analytical Gaussian approx imations and veriﬁes the asymptotic op timality of the RPT detecto r . The constant slo p e o f Fig. 4(b) corro borates the linear scaling of | lo g gap | with respect to the pr oduct L · SN R as we d erived in (3 2). C. Multi-class an d Multi-electr ode SSVEP Detectio n 1) Compo site M-ary hypo thesis testing mod el: Our model in (7) can be extended to disting uishing M > 2 hypothe - ses (multiple classes) H m correspo n ding to M stimuli and periodic brain respon ses with fre q uencies f m , m ∈ M = { 0 , 1 , · · · , M − 1 } . W e can also take in to ac c ount th e me a - surements collec te d (reco rded) from m u ltiple electrod e s. Since these elec trodes lie in clo se pro ximity , their signa ls are n ot in - depend ent. Hence, it is imp ortant that su c h an extended mod e l also captu res the sp atial c o rrelation between the mea su rements of different electrodes. W e extend the m odel in ( 7) to M hypoth eses, in which the measureme nts und er each hyp o thesis are mode le d as H 0 : Y = K X 0 + W H 1 : Y = K X 1 + W . . . H M − 1 : Y = K X M − 1 + W (33) where Y is an L × N c observation (measur ement) matrix, L is the length of th e recorded data, N c is the numbe r of elec tr odes (chann e ls), an d X m , m ∈ M a r e the signal representation s of the observations in the RPT subspace. Similar to the comp osite binary hypothesis testing model in Section II -B, the signal representatio ns X m , m ∈ M , ar e mode le d a s M determ inistic 7 P D = Q    Q − 1 ( α ) q 2( r B + r A ) + 4( λ 2 0 ,A − λ 2 0 ,B ) − 4 P L i =1 P L j =1 c ij + λ 2 0 ,B + λ 2 1 ,A − λ 2 0 ,A − λ 2 1 ,B q (2( r B + r A ) + 4( λ 2 1 ,B − λ 2 1 ,A ) − 4 P L i =1 P L j =1 c ij    (28) P D PMB = Q  Q − 1 ( α ) − q x T S 1 K T S 1 K S 1 x S 1 + x T S 0 K T S 0 K S 0 x S 0 − x T S 1 K T S 1 K S 0 x S 0 − x T S 0 K T S 0 K S 1 x S 1  (29) but unk nown m atrices and w e r estrict th e row suppo rt of the signal rep resentation X m under th e m -th hypothesis to the atoms o f the RPT diction ary matrix K that span the subspace of per io dic signals with period T m . W e d enote the known support sets supp( X m ) for X m by S m := { j : x m,j 6 = 0 } , where x m,j is the j -th r ow of X m , m = 0 , . . . , M − 1 . L e t y l , k l and w l denote the l -th r ow of Y , K and W , respectively . From the extended model in (3 3) we h av e y l = k l X m + w l , l = 1 , . . . , L , m ∈ M . (34) The n oise vector w l is assum e d to b e Gau ssian with co- variance ma tr ix Σ w , i.e., w l ∼ N ( 0 , Σ w ) , wher e Σ w captures the spatial cor relation between the measurem e n ts of different electrodes. Und er these assump tions, we have y l ∼ N ( k l X m , Σ w ) . Also, we assume the vectors w l , l = 1 , . . . , L , are indep e ndent and identically d istributed. 2) Multi-cla ss RPT detecto r: Le t f ( Y | H m ) denote the condition al p d f of the measurement matrix Y given hy pothesis H m . For the extended model in (33) we can write: f ( Y | H m ) = L Y l =1 1 (2 π ) N c 2 | Σ w | 1 2 exp − ( y l − k l X m ) Σ w − 1 ( y l − k l X m ) T 2 ! . (35) Similar to the binary RPT detector in Section II-B, the m ulti- class RPT detector is ob tained f rom the GLR T assumin g unifor m pr ior prob abilities P ( H m ) = 1 M and u niform cost assignment [ 28]. W e deﬁne the generalized likelihoo d ratios (GLRs) L m ( Y ) , max X m :supp( X m )= S m f ( Y | H m ) max X 0 :supp( X 0 )= S 0 f ( Y | H 0 ) , m = 1 , . . . , M − 1 , (36) which requir e co m puting the restricted ML estimate o f X m when the support set is restricted to S m for m ∈ M . Le t ˆ X m denote the ML estimate of X m when the row suppor t set is restricted to S m . From (3 5), we ﬁnd that the ML e stima te ˆ X m is the solution to the following min X m :supp( X m )= S m tr  ( Y − KX m ) Σ w − 1 ( Y − KX m ) T  (37) such that x m,j = 0 , ∀ j ∈ S c m . Accordin gly , (37) can be rewritten as min tr  ( Y − K S m X S m ) Σ w − 1 ( Y − K S m X S m ) T  (38) where K S m and X S m are the columns of K and ro ws of X m indexed by S m , respectiv ely . The solution to (38) can be written as (f or pro of see Appen dix D) ˆ X S m =  K T S m K S m  − 1 K T S m Y . (39) Thus, the d ecision ˆ m is o btained by r eplacing (35) in ( 36), which gives ˆ m ( Y ) = arg max m ∈M tr YΣ w − 1 ˆ X T S m K T S m − K S m ˆ X S m Σ w − 1 ˆ X T S m K T S m 2 ! . (40) Replacing with th e estimates in (39), we r each ˆ m ( Y ) = arg max m ∈M tr  YΣ w − 1 Y T A m  , (41) where A m is A m = K S m  K T S m K S m  − 1 K T S m . (42) D. Err or Exponen t - Discrimination Ra te T radeoff It is conceiv able that a reliable detection scheme should be a ble to lev erage on info rmation ob tained fr om ad d itional electrodes to boo st th e system ’ s efﬁciency , giv en that the detector is tasked with distinguishing between a ﬁxed n umber of classes. A lter natively , for th e same e fﬁciency , th e additional electrodes may be explo ited to increase the detecto r’ s ability to discriminate a larger number of classes. This insight is the basis for a n ew tradeoff, intr oduced here fo r the ﬁrst time in the context of SSVEP detectio n , be twe e n ‘er ror expon ent’ and the ‘d iscrimination ra te’ deﬁn ed in our prob lem setup as − log P e / ( L · SNR) and lo g 2 M , respectively . Inspired b y th e notion of diversity and mu ltiplexing tr adeoff in in f ormation theory [23], ou r goal is to p rovide a suitable m e tric asso- ciated with a bro ader spectr um of o perating regimes acr oss which different SSVEP detectio n meth ods co uld be comp ared. Speciﬁcally , giv en th e system resources L and SNR , a detec- tion metho d can leverage a mu lti- electrode EEG system (with N c electrodes) to trad e the system’ s efﬁciency (measu red by the error pr obability P e associated with the detector) for a higher discrim ination r ate ( th e detector ’ s ability to discriminate M classes, measured by log 2 M ) and vice-versa. Th e error probab ility P e in the bin a ry hyp o thesis testing setting with unifor m prio r prob abilities P ( H 0 ) = P ( H 1 ) = 1 / 2 is P e = 1 2 (1 − P D + P F ) , thu s it can be approx imated in the high SNR r egime as P e = e − L. SNR 8 √ 2 π p L. SNR / 4 , (43) 8 using the exp r essions of P D and P F derived in (27) and the Q-functio n approx imation. Since − log P e / ( L · SNR) ca p tures the rate at wh ich the error pr o bability decays with L and SNR , we use the logar ithm o f the erro r p robability as a measur e of the system’ s efﬁciency . For distinguishing between M classes, we deﬁn e the average error probab ility as P e = 1 M M X m =1 P ( H j | H m ) for j 6 = m. (44) As a result, we can characterize th e error probab ility for an entire spec trum o f regimes correspon ding to different discrimination rates fo r a given number of electro des N c and system resou rces (in term s o f data len gth L an d SNR ). I I I . N U M E R I C A L R E S U LT S In this sectio n, we start off by presentin g some numer ical results f rom experimen ts with synthesized d ata to verify and validate the theory developed for the binary RPT detec tor of Section II-B. Then , we provide results for the multi-class and multi-electro d e RPT detector u sin g bo th synthesized and real data, along with compa r isons to the existing SSVEP detection methods, in cluding PSD A, standa r d CCA an d IT CCA [9]. As rea l data, we em ploy a p ublicly av ailable SSVEP dataset [11], which co ntains mu ltip le go al freq u encies (frequ encies of stimuli) f m , m = 0 , ..., M − 1 , with a sampling frequency of f s = 2 56 Hz, and M is the numb er of c la sses to be discriminated. The data length L (m easured in samples) is related to the data length T (m easured in seconds) thr ough T = L/f s . W e study the classiﬁcation accu racy A = 1 − P e and the Information Transfer Rate (IT R) d eﬁned in [1 1] in terms of the n umber of classes to be discr iminated M , the classiﬁcation accu racy A , and the data leng th T in secon ds, ITR =  log 2 M + A log 2 A + (1 − A ) log 2  1 − A M − 1    60 T  . W e f urther discuss the ad vantage of NPMs as ﬁnite co m plete bases for estimating the period s of p eriodic sequences and study the newly introdu ced err o r exponen t-discriminatio n rate tradeoff in terms of th e nu m ber of electro d es N c . A. Theoretical V alida tion of Binary RPT Detector U sin g Syn- thesized Data Fig. 3 validated our theoretical results for the binary RPT detector f or the orth ogon al c a se when L = lcm( T 0 , T 1 ) . Here , we validate o ur theo retical resu lts for th e gener al case when L is arbitrary . W e con sider (7 ) a nd ﬁx SNR = − 15 d B and generate 25 0 0 observation vectors y under each hy pothesis. W e consider two cases: f or case (a) we assume that th e period s correspo n ding to th e two stimuli are T 0 = 15 and T 1 = 25 under hypo theses H 0 and H 1 , re spectiv ely , an d for case (b) we assume T 0 = 3 2 and T 1 = 1 8 . Fig. 5 illu strates the R OC curves f o r these two cases, where P D and P F are obtain ed using both the expressions d eriv ed in ( 27) and the numerica l experiment. Fig. 6(a) and Fig . 6( b) sho w the error probability P e , and the cla ssiﬁcation accuracy A in term s of the data length L for the setup in case (b). These two ﬁgures show that results from b oth theo ry and experiments strong ly ag ree. 0 0.2 0.4 0.6 0.8 1 Probability of false alarm 0 0.2 0.4 0.6 0.8 1 Probability of detection T 0 = 25 Vs. T 1 = 15 L = 250(theory) L = 250(experiment) L = 500(theory) L = 500(experiment) 0 0.2 0.4 0.6 0.8 1 Probability of false alarm 0 0.2 0.4 0.6 0.8 1 T 0 = 32 Vs. T 1 = 18 L = 250(theory) L = 250(experiment) L = 500(theory) L = 500(experiment) (a) (b) Fig. 5. R OC curves show P D versus P F for both theory (Gaussian approximat ed pdfs) and numerica l experiment . (a) T 0 = 25 , T 1 = 15 , (b) T 0 = 32 , T 1 = 18 . (a) (b) 0 500 1000 1500 2000 Data length L 0 0.1 0.2 0.3 0.4 0.5 Error probability P e Theory Experiment 0 500 1000 1500 2000 Data length L 0.5 0.6 0.7 0.8 0.9 1 Accuracy Theory Experiment Fig. 6. T 0 = 32 and T 1 = 18 (a) error Probabil ity P e is obtai ned using both theory (Gaussian approximat ed pdfs) and numerical experime nt. (b) Classiﬁca tion accurac y versus data length L (in number of samples). B. Real Da ta Description for V alid a tion of Multi-cla ss a nd Multi-electr ode RPT Detector The public SSVEP daatset [11] we use is ge nerated based on the sampling fr equency o f f s = 256 Hz. T en subjects have participated in the exper iment and 15 tr ials have bee n reco rded for each sub ject per goal frequ ency . Hence, th e dataset ha s a total numbe r of 1 50 tr ials fo r each go al freq uency . W e use 9 goal freq uencies (i.e. , the n u mber o f classes to be discrim- inated is M = 9 ) for perfo rmance validation of the RPT detector and perfo rmance co mparison amon g different SSVEP detection methods. The se 9 goa l fre q uencies are f 0 = 9 . 25 , f 1 = 9 . 75 , f 2 = 10 . 2 5 , f 3 = 10 . 7 5 , f 4 = 1 1 . 25 , f 5 = 11 . 7 5 , f 6 = 12 . 25 , f 7 = 12 . 7 5 and f 8 = 14 . 25 Hz. All the record e d data is ﬁltered u sing 4-3 0 Hz ban dpass ﬁlters. The trials are recorde d u sing 8 electro des positione d on the occip ital region of the subject’ s brain cortex as dep icted in Fig. 7. The period T m correspo n ding to each goal f requency f m is co mputed as T m = f s /f m , then is ro unded to its nearest integer value. C. V alidation of Mu lti-class and Mu lti-electr ode RPT Detector Using Synth e sized and Real Data In this section, we in vestigate th e perfo rmance of th e mu lti- class and multi-electro de RPT detecto r o btained in ( 41). Since all the electro des ar e located on the oc c ipital region of the brain co r tex, the collected d a ta is normally cor related. As described in Sectio n I I-C, our m o del captures the spatial correlation betwee n the data of different electrod es through the covariance matrix Σ w . W e use p r e-stimulus data to ﬁnd a sample covariance matrix and use it as an e stima te of the tr ue Σ w . An advantage of u sing only pre-stimu lus d ata segments in estimating Σ w is that one doe s n o t need to co llect data 9 Fig. 7. Eight electrode s are positio ned on the occipi tal region of the s ubject ’ s brain cortex. Fig. 8. Schemati c for an SSVEP observa tion. There is an unknown latenc y betwee n the stimulus onset and the brain’ s response. from eac h subject u nder each goa l f requen cy f m . This is importan t in practice as it sig n iﬁcantly red uces the time, cost, and overhead associated with d ata collection, especially with a large numbe r of classes and subjects. • Latency and its effect on the pe rformances o f SSVEP d e- tection methods: It is well reco gnized th at there exists a delay (latency) betwee n the onset o f an external stimulus and the beginning of the brain’ s response [10]. This delay widely varies among different subjects a n d is g enerally u nknown. The latency is an important factor that affects the perfor mance of all SSVEP d etection meth o ds. T o test the pe rforman ce o f the RPT detecto r with resp e ct to this uncon trolled latency , we deﬁne W ait time and Time wind ow (See Fig . 8). W ait time is the time b etween the stimulus on set an d the beginnin g of T ime window . Time window is the portion of the recorded data we are using for our perfor mance ev aluation with length 0 0.1 0.2 0.3 0      W                            ! " # $ % & ' ( ) * + , - . / 1 2 3 4 5 R 6 7 S 8 9 : ; < = > ? @ A IT CCA 0 0.1 0.2 0.3 B C D E F G H I J K L M N O P Q T U V X Y Z [ 0.1 \ ] ^ _ 0.2 ` a b c 0.3 d e f g h i j Accuracy k l m n o p q r s t u v w x IT CCA (a) (b) Fig. 9. Classiﬁca tion accurac y with (a) 0.5 s econd post-stimulus SSVEPs and (b) 1 second post-sti mulus SSVEPs. The wai t time represe nts the time betwee n the onset of the stimulus and the beginni ng of our time window intende d to capture the time it takes for the brain to respond to the exte rnal stimulus. T measured in second s. Fig. 9(a) and Fig. 9(b ) com pare th e perfor mance of the RPT detecto r to tho se o f stan dard CCA and IT CCA usin g 0.5 second an d 1 second of po st-stimulus data as the wait time v aries, respectively . Fig. 9 demo nstrates th at the perfor mance of all meth o ds improve as we increase the wait time due to an improved SNR . Noting that th e com putational complexity and perf ormance of all these m ethods depe nd on the d ata len gth un d erscores an imp ortant advantage of the RPT app roach. In particular, takin g the latency into acco unt one could a ttain a better perfor mance for a short da ta len gth which is crucial fo r real-time BCI. As shown, fo r th e sho rt data length s used, the RPT meth od achieves hig her accur acy . Based on th ese resu lts we co nsider a 0 .25 second wait time after the on set of th e stimulus. • Effect of spatial co rr elation knowledge on the RPT detector performance: Fig. 10(a) p r esents the classiﬁcation accuracy A a s f unction of the da ta length T for a multi-class multi- electrode RPT d e tector using synth esized d ata. In this experi- ment, we g enerate pe riodic sequence s with N c = 8 electr odes for M = 9 classes. Th e dashed b lue cu rve correspon ds to a genie-a id ed RPT detec tor that knows Σ w . The dotted re d curve is obtained wh en th e RPT de te c tor uses an e stima te o f Σ w obtained from p re-stimulus data (real recorded d ata from one of th e subjects). The d ashed green cu r ve is fo r a RPT detector that falsely assumes no spatial correlation betwee n the recorde d data fro m different electro des and classiﬁes the data thereof, h ence this case is den oted as ‘mo del m isma tch’. • P e rfo rmance co mparison o f differ ent SS VEP meth ods using r eal data: Figs. 1 0(b) and 10(c) illustrate the accur acy and IT R of an RPT detector with N c = 8 electrod e s and M = 9 c lasses. This result is o btained using leave-p-out cro ss validation for p = 12 , which indicates th at only 2 0% o f the data is used for training. Note that IT CC A outpe rforms other methods if we increase the nu mber o f training trials for each c la ss. Fig. 1 1 shows the accuracy and ITR of the aforem entioned methods for a multi-class multi-electrod e setting using leave- one-ou t cross validation (i.e., we use the maximum n u mber of training tr ials to construct th e ref erence matrix. The ref e rence matrix is d eﬁned later) . Wh ile IT CCA a chieves th e highe st perfor mance, it is im portant to note tha t it requir es p ost- stimulus trials to constru c t the reference m atrix. He n ce, IT CCA bo osts the perform ance of standard CCA at the expense of ad ditional train ing with post-stimu lus data per subject. By contrast, the RPT detector is unsup ervised, in the sense tha t it only u ses training to estimate the spatial co r relation m atrix Σ w per sub je c t from pre-stimulu s da ta. Our r esults also indica te that th e RPT de tector can ou tperform PSD A and stan d ard CCA methods for sho rt data lengths, i.e., less than 1.5 seconds. D. Differ ence between Sta n dard CCA and RPT Detection Methods F r om Basis Representation P erspective In this sectio n , we clar ify a fun damental difference between standard CCA and the pro posed RPT metho d and that is the completen e ss of the basis used for sig n al repr esentation. Stan- dard CCA ﬁrst co nstructs a ref e rence matrix Q m ∈ R 2 N h × L 10 (a) y z { 1 | } ~ 2    3                        0 0.2         ¡ 1 Accuracy ¢ £ ¤ ¥ ¦ § ¨ © ª « ¬  ® ¯ ° ± ² ³ ´ µ ¶ · ¸ ¹ º » ¼ ½ ¾ ¿ À Á Â Ã Ä Å Æ Ç È É Ê Ë Ì Í Î x Ï Ð Ñ Ò Ó Ô Õ Ö × Ø Ù Ú Û (b) (c) Ü Ý Þ 1 ß à á 2 â ã ä 3 å æ ç è é ê ë ì í î ï ð ñ ò óô õ ö ÷ ø ù ú 0.2 0.3 û ü ý þ ÿ 0             1 Accuracy R  IT CCA S          P      1 1   2 2 ! 3 3 " # D $ % & ' ( ) * + , - . / 4 5 6 7 8 9 0 10 20 30 : ; < = > ? @ A B C I E F G H J K L M N O Q T U V W IT CCA X Y Z [ \ ] ^ _ ` a b c d e f Fig. 10. Number of classes and electrodes are M = 9 and N c = 8 , (a) Classiﬁc ation accurac y A versus the data length T , synthesiz ed data is used to generat e the plots, the y compare A when the RPT detector kno ws the true Σ w (dashed blue), Σ w is unkno wn and the RPT det ector estimates Σ w using pre-stimulu s data (dotted red), the RPT detect or fal sely assumes there is no spatial correlati on betwee n reco rded data of diff erent electrodes (dashed green) (b) Classiﬁcation accurac y A versus the data length T , real data is used to generate these plots, the y compare the RPT detecto r , IT CCA, standar d CCA, and PSD A. For the PSDA m ethod, the data recorded by the best electrode (the one which yields the highest accura cy) is s elected. (c) ITR versus the data length T , real data is used to generate these plots. g h i 1 j k l 2 m n o 3 p q r Data length (Seconds) 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 Accuracy RPT IT CCA Standard CCA PSDA 0.5 1 1.5 2 2.5 3 3.5 Data length (Seconds) 0 10 20 30 40 50 60 70 80 90 ITR (bits/min) RPT IT CCA Standard CCA PSDA (a) (b) Fig. 11. (a) Classiﬁca tion accura cy and (b) ITR of the RPT detect or , IT CCA, standard CCA and PS DA methods using leave -one-out cross val idation . from N h harmon ics. Let Q m =  q m 1 . . . q m L  , where q m l is a 2 N h × 1 co lu mn vector , deﬁned as q m l = h sin ω m l f s cos ω m l f s . . . sin N h ω m l f s cos N h ω m l f s i T (45) where ω m = 2 π f m and l = 1 , 2 , . . . , L . For the measurem ent matrix Y and th e refer ence matrix Q m , standar d CCA ﬁnds weight vectors w y ( N c × 1 ) and w q m ( 2 N h × 1 ), which maximize the correlatio n betwee n linear comb in ations o f the signals recorded from the electrodes z = Yw y and signals in the referen ce matrix s m = Q T m w q m (both are L × 1 vectors) via solv in g the following optimization prob lem ρ m = max ρ ( z , s m ) = max w y , w q m E [ zs T m ] p E [ zz T ] E [ s m s T m ] , (46) where ρ m is known as th e max im um can o nical correlation . Solving the op timization pro blem in (46) M times, once for each goa l f requen cy f m , and ﬁnding ρ m ’ s, standard CCA obtains the d e cision using the following rule ˆ m = arg max m ∈M { ρ 0 , . . . , ρ M − 1 } . (47) Howe ver , suc h refer ence matrices do not form a ﬁnite com- plete b asis fo r rep resenting perio dic signals. Hence, to be able to rep resent fully period ic signal, Stan dard CCA req u ires an inﬁnite number of harmonics to construct Q m matrices. Therefo re, the perfor mance of Stan dard CCA d e pends on the number of har monics used in the refer ence m atrix. By contrast, in the RPT method we le verage a complete ﬁnite basis in the form of NPMs. T o d emonstrate this fact u sing s t u v w h x y z { | } ~ h        h      0 1 2 3  Data length (Seconds) 0 0.2 0.4 0.6 0.8 1 Accuracy RPT      h        h   1 2 3  Data length (Seconds) 0 0.2 0.4 0.6 0.8 1 Accuracy (a) (b) Fig. 12. (a) Classiﬁcatio n accurac y of standard CCA and the RPT detector on synthesiz ed data, (b) real data. synthesized data, w e genera te random period ic sequences with periods T 0 = 3 2 an d T 1 = 18 that corresp ond to the two stimuli u n der hy potheses H 0 and H 1 , resp ectiv ely . W e ﬁx SNR = − 12 dB and generate 2 000 observation vectors y under each hy pothesis. Fig. 12(a) depicts the classiﬁcation accuracy o f the binary RPT detector and standard CCA versus different data leng ths T using the gen erated d ata. For the standard CCA method we co nsider three cases: in the ﬁrst case w e let the nu mber of harmo nics N h = 1 to generate the Q m referenc e m atrices, in the second case N h = 2 , and in the third case N h = 3 . T o pro duce Fig. 12(b) we use the real data, and for standard CCA we con sider two cases of N h = 1 and N h = 2 . Both ﬁgures show that the accu racy of standard CCA metho d improves as we increase N h and enrich the referenc e matrices. This is in con tr ast to th e RPT d etection method which u ses a ﬁnite co mplete basis f or represen tin g the periodic signals. E. Err or Expo nent- Discrimination Rate T radeo ff Based on Real and Synthesized Data In this section, we show the erro r exponent- discrimination rate tradeoff using both synthesized and re a l data. • T radeoff based on synthesized d a ta: W e gen erate rando m periodic seq uences with period T m = T 0 + ( m − 1)∆ T , where T 0 = 10 and ∆ T = 1 u nder the m odel in (33) f o r M = 11 classes and N c = 8 electrodes. W e ﬁx the data length L = 50 samp le s and SNR = − 10 dB and we genera te 500 o bservations under each hypo thesis. W e let the ( i, j ) - th entry o f the spatial c orrelation matrix be Σ w i,j = ρ d ij 11 for some 0 < ρ < 1 , where d ij denotes the distan ce between elec trodes i and j . Fig. 13(a) and 13(b) plot the erro r exponent − log P e / ( L · SNR) versus the d iscrimination rate log 2 M using the gen erated synth esized data for N c = 4 and N c = 8 , respectively . These ﬁgu r es show that the RPT detecto r achieves a better tradeo ff comp ared to stand ard CCA. Fig. 13(c) plots the same using the synthesized data f or the RPT detector f or N c = 1 , 2 , 4 , 8 . This ﬁgure shows that a larger N c yields a be tter tradeo ff between th e er ror exponen t and the discrimination rate. • T radeoff based on r eal data: Fig. 14(a) illustra tes the trad eoff using real data f or L = 256 ( T = 1 second ) an d N c = 8 , indicating that the RPT detector exhibits a better tradeo ff compare d to standard CCA and IT CCA m ethods. Fig. 1 4(b) plots the same using real date for the RPT d etector fo r N c = 1 , 2 , 4 , 8 , verifyin g th at a larger N c provides a b etter tardeoff between the error exponent and the discrimination rate. In these exper im ents SNR is estimated f r om p re-stimulus and post-stimu lus data. I V . D I S C U S S I O N W e started with an in depth analysis o f a co mposite b i- nary hy p othesis testing fr amew ork to detect pe riodic SSVEP responses. The RPT detecto r distinguishes M classes (cor- respond in g to M hypoth eses H m ) using the Ramanujan sub- space, i.e., the RPT de tector chooses the class co rrespon d ing to the su bspace that yield s the largest p rojection. W e remar k that NPMs are designed for integer periodicity estimation. Hence, we hav e to round the periods correspon ding to the stimuli frequen cies to the n earest integer values. This is the ma in drawback of employing the RPT detector for SSVE P detec tio n since we may no t have exact integer p e r iods. Given the model in section II-C1, the Raman ujan ma tr ix K S m correspo n ding to T m contains the submatrices R m and R d | m . Theref ore, the submatrices with in, inhe r ently span the subspace of T m and its divisors. If T m is an ev en integer, its divisors correspo n d to e ven har monics of the goal frequen cy f m (i.e., nf m where n is even), w h ich can enh ance the per forman ce o f th e RPT detector . W e also compared the perfo rmance of the RPT detector to th a t of a ﬁctitious detector that knows x 0 and x 1 , i.e. , the signal re p resentations in the RPT dictionary . T his detector yields an upp e r bo und on the per forman ce for the compo site test. As shown in Fig. 4, the RPT detecto r based on th e GLR T is asymp totically optimal as it clo ses th e ga p to the perfect measuremen t bound as L in creases. The p roposed RPT d etection meth od is un supervised since it does no t u se any informa tio n from po st-stimulus data and uses only pre-stimulu s d ata to estimate the covariance m atrix capturing the spatial correlation between the record ed d ata of different electrod es. The per forman ce of sup ervised m ethods is h eavily depen dent on the nu m ber of training trials. For instance, for the IT CCA m e thod it is crucial to have enoug h training data to con struct th e referen ce matr ices an d it h as been shown th a t redu cing the number of train in g tr ials can deteriorate the perfor mance o f the me thod. In co ntrast to th e ref erence matr ix in stand ard CCA which does n ot provide a complete basis for periodic signals, the RPT detector le verages a com plete dictionary spanning the subspaces of all period ic signals. V . C O N C L U S I O N W e p ropo sed and an a lyzed a new approa c h to SSVEP detec- tion using linear r e presentation s in RPT d ictionaries known to be r obust to noise an d laten cy . The RPT detec to r outp e rforms the state-o f -the-art metho ds in the short data length regime crucial fo r real-time BCI. Further, it d oes not depend o n post-stimulus d ata, which red uces the overhead associated with data collectio n in supervised me thods. Furthe rmore, we introdu c ed a new tradeoff b etween the error expo n ent and the discrimination rate, which can serve as a basis for comp aring SSVEP d etection sche m es acro ss an entire range of operation regimes where efﬁciency is traded for rate and vice-versa. A P P E N D I X A P R O O F O F T H E O R E M 2 From [29, Lemma 1.1], the R Vs y T B ⊥ y and y T A ⊥ y in (15) have non-centra l Chi-squared d istributions with r ⊥ B and r ⊥ A degrees of freedom and non- centrality parameters λ 2 , ⊥ m,B and λ 2 , ⊥ m,A , respecti vely , under hypothesis H m , where r ⊥ B = tr( B ⊥ ) , r ⊥ A = tr( A ⊥ ) , λ 2 , ⊥ m,B = µ T m B ⊥ µ m , λ 2 , ⊥ m,A = µ T m A ⊥ µ m and µ m is th e m ean of the ob servation y under H m . Mo reover , we have shown in Lemm a 1 th a t A ⊥ B ⊥ = 0 . Hence, it fo llows from [31] that y T B ⊥ y and y T A ⊥ y are indepen d ent. From the deﬁnitions ab ove and the ortho g onality of the sub matrices associated with different divisors, we can readily show th a t λ 2 , ⊥ 1 ,A = λ 2 , ⊥ 0 ,B = 0 . Th erefore , the test statistic ℓ ( y ) in (15) is the difference b etween a no n-central an d a central Chi-squ a red R V . Per [32, Theorem 1.1 ], a non -central Chi-squared R V with n degrees o f freedom can be rep r esented as a sum o f a non-cen tral Ch i- squared R V with one d egree of freedom with the same n on-cen trality parameter and a central Chi-squared R V with n − 1 degrees of freedom . Moreover , we can wr ite a central Chi-squared R V with ( m + n ) degrees of freedom as the sum o f two in depend ent Chi-squar ed R Vs with m and n degrees of fre edom. Accord ingly , un der H 1 , ℓ ( y ) = ℓ 1 ( y ) + ℓ 2 ( y ) − ℓ 3 ( y ) − ℓ 4 ( y ) (48) where ℓ 1 ( y ) ∼ χ 2 (1 , λ 2 , ⊥ 1 ,B ) , ℓ 2 ( y ) ∼ χ 2 ( r ⊥ B − 1 , 0 ) , ℓ 3 ( y ) ∼ χ 2 (1 , 0) and ℓ 4 ( y ) ∼ χ 2 ( r ⊥ A − 1 , 0) . Using [ 3 0, The o rem 3.3] which char acterizes the d istribution o f the difference of linear combinatio ns of no n -central Chi-squared R Vs, we can express the distribution of ℓ ( y ) as h ( t ) = ∞ X i =0 ∞ X j =0 q i l j p r ⊥ B +2 i,r ⊥ A +2 j ( t ) , (49) where p a,b ( . ) is deﬁne d in (18), and following f rom [30, Theorem 2.1 A] and (48), we can read ily obtain th e coefﬁcients q i ’ s and l j ’ s as q 0 = exp − λ 2 , ⊥ 1 ,B 2 ! , l 0 = 1 , (50) q i = exp  − λ 2 , ⊥ 1 ,B 2   λ 2 , ⊥ 1 ,B 2  i i ! for i 6 = 0 , (51) 12 1    2    3 ¡ ¢ £ ¤ ¥ ¦ 2 M 0 0.1 0.2 0.3 § ¨ © ª « ¬  ® ¯ ° ± ² ³ ´ µ ¶ · e /(L.SNR) N c = 4 RPT Standard CCA 1 1.5 2 2.5 3 3.5 log 2 M 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 N c = 8 RPT Standard CCA 1 1.5 2 2.5 3 3.5 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 N c = 1 N c = 2 N c = 4 N c = 8 log 2 M (a) (b) (c) Fig. 13. (a) and (b) Error expo nent-disc riminatio n rate tradeo ff using synthesiz ed data for N c = 4 and N c = 8 respecti vel y , with M changing from 2 to 11 with ﬁxed L = 50 and SNR = − 10 dB. (c) Error exponent- discrimina tion rate tradeof f for the RPT detect or using synthesiz ed data for differe nt N c . 1 1.5 2 2.5 3 3.5 log 2 M 0.008 0.01 0.012 0.014 0.016 -log(P ¸ )/(L.SNR) RPT Standard CCA ¹ º » ¼ ½ 1 1.5 2 2.5 3 3.5 log 2 M 2 4 6 8 10 12 14 16 -log(P ¾ )/(L.SNR) × 10 -3 N c = 1 N c = 2 N c = 4 N c = 8 1 1.5 2 2.5 3 3.5 log 2 M 2 4 6 8 10 12 14 16 -log(P ¿ )/(L.SNR) × 10 -3 N c = 1 N c = 2 N c = 4 N c = 8 (a) (b) Fig. 14. (a) Error exponent- discrimina tion rate tradeof f using real data with M changing from 2 to 9 with ﬁxe d L = 256 and N c = 8 . (b) Error exp onent- discrimina tion rate tradeof f of the RPT detector for dif ferent N c . and l j = 0 for j > 0 . Similarly , we can de riv e the corre- sponding co efﬁcients under H 0 . Based on the afo remention ed deﬁnitions fo r the non -centrality parameters, we h av e λ 2 , ⊥ 0 ,A = x T S 0 K T S 0 A ⊥ K S 0 x S 0 λ 2 , ⊥ 1 ,B = x T S 1 K T S 1 B ⊥ K S 1 x S 1 (52) which subseq u ently leads to th e equation s in (20). A P P E N D I X B P R O O F O F ( 2 6 ) Here, we an alyze the covariance o f the two q uadratic terms in the expr ession of the test statistic ℓ ( y ) in (14) u nder H 0 . Cov ( y T By , y T Ay | H 0 ) = E 0 [ y T Byy T Ay ] − E 0 [ y T By ] E 0 [ y T Ay ] (53) where E m denotes the expectation under H m . Focusing on the ﬁrst term in (53) we expand it as below: E 0 [ y T Byy T Ay ] = x T S 0 K T S 0 BK S 0 x S 0 x T S 0 K T S 0 K S 0 x S 0 + E [ w T Bww T Aw ] + x T S 0 K T S 0 B E [ wx T S 0 K T S 0 w ] + x T S 0 K T S 0 B E [ ww T ] K S 0 x S 0 + x T S 0 K T S 0 B E [ ww T Aw ] + E [ w T BK S 0 x S 0 x T S 0 K T S 0 w ] + E [ w T BK S 0 x S 0 w T ] K S 0 x S 0 + E [ w T BK S 0 x S 0 w T Aw ] + E [ w T Bw ] x T S 0 K T S 0 K S 0 x S 0 + E [ w T Bwx T S 0 K T S 0 w ] + E [ w T Bww T ] K S 0 x S 0 + x T S 0 K T S 0 BK S 0 x S 0 E [ w T Aw ] (54) Using the deﬁnitions in (2 3), it simpliﬁes to E 0 [ y T Byy T Ay ] = λ 2 0 ,B λ 2 0 ,A + r B r A + 2 L X i =1 L X j =1 c ij + 4 λ 2 0 ,B + r B λ 2 0 ,A + r A λ 2 0 ,B (55) where c ij are th e en tries o f the matrix C = A ⊙ B , wh ere ⊙ denotes the elem e n t-wise pr oduct. Similar ly , we can show that E 0 [ y T By ] E 0 [ y T Ay ] = ( λ 2 0 ,B + r B )( λ 2 0 ,A + r A ) (56) Substituting (55) and (56) in to (53) we ﬁn d Cov ( y T By , y T Ay | H 0 ) = 2 L X i =1 L X j =1 c ij + 4 λ 2 0 ,B (57) The term Co v ( y T By , y T Ay | H 1 ) can be derived sim ilar ly . A P P E N D I X C P R O O F O F L E M M A 3 From (8) W e h av e L · SNR m = x T S m K T S m K S m x S m . (58) Based on the deﬁnition s of λ 2 0 ,A and λ 2 1 ,B in ( 23), we have L · SNR 0 = λ 2 0 ,A and L · SNR 1 = λ 2 1 ,B . Bor r owing the Q- function app roximatio n in [ 33] given below Q ( x ) ≈ e − x 2 / 2 √ 2 π √ 1 + x 2 for x > 0 , (59) we can ap proxim ate (28) a n d (29) for large values of L · SNR m as th e following P D ≈ 1 − e − L (SNR 0 + SN R 1 ) 2 8 SNR 1 √ 2 π q L (SNR 0 + SNR 1 ) 2 4 SNR 1 (60) and P D PMB ≈ 1 − e − L (SNR 0 + SNR 1 ) 2 √ 2 π p L (SNR 0 + SNR 1 ) . (61) W ithout loss of g enerality we a ssume SNR = SNR 0 = SNR 1 , and thus we have L · SNR = L · SNR 0 = L · SNR 1 . Using this assumptio n we can write: P D ≈ 1 − e − L · SNR 2 √ 2 π √ L · SNR (62) 13 and P D PMB ≈ 1 − e − L. SNR √ 2 π √ 2 L. SNR . (63) Therefo re, th e gap is appro ximated b y gap( L, SNR) ≈ e − L · SNR 2 √ 2 π √ L · SNR − e − L · SNR √ 2 π √ 2 L · SNR = e − L · SNR 2  √ 2 − e − L · SNR 2  2 √ π √ L · SNR (64) with the asy mptotic ord er in (32). A P P E N D I X D Let D den ote the ﬁrst de r iv ati ve o f the expression in (3 8) with respect to X S m . The estimate ˆ X S m can be o btained by setting D e qual to z ero. It is easy to verify the f ollowing [34], D = ∂ ∂ X S m tr  − 2 Y Σ w − 1 X T S m K T S m  + ∂ ∂ X S m tr  K S m X S m Σ w − 1 X T S m K T S m  = − 2 K T S m YΣ w − 1 + 2 K T S m K S m X S m Σ w − 1 Letting D eq ual to zero and so lving for X S m we ﬁnd ˆ X S m =  K T S m K S m  − 1 K T S m Y . A C K N OW L E D G M E N T S This work was suppor ted in p art by NSF Gr ants CCF- 15259 90, CCF-155 2497 and CCF-134196 6. R E F E R E N C E S [1] Y . Shin, S. Lee, J. Lee, and H.-N. Lee, “Sparse representa tion-ba sed classiﬁca tion scheme for motor imagery-based brain–compute r interf ace systems, ” J ournal of neural engineering , vol . 9, no. 5, p. 056002, 2012. [2] P . Saidi, G. K. Atia, A. Pari s, and A. V osoughi , “Motor imag ery classiﬁca tion using multiresolutio n analysis and sparse representat ion of EEG signals, ” in IEEE Global Confer enc e on Signal and Informati on Pr ocessing , 2015, pp. 815–819. [3] A. P . Liav as, G. V . Moustakides, G. Henning, E. Z. Psarakis, and P . Husar , “ A periodogr am-based method for the detecti on of steady- state visually evo ked potential s, ” IEEE T ransac tions on Biomedic al Engineerin g , vol. 45, no. 2, pp. 242–248, 1998. [4] W . Y ij un, W . Ruiping, G. Xiaorong, and G . Shangkai, “Brai n-computer interf ace based on the high-frequenc y steady-stat e visual ev oke d poten- tial, ” in Procee dings of the Fi rst Internation al Conf . on Neural Interface and Contr ol , May 2005, pp. 37–39. [5] H. Hotelling, “Relati ons bet ween two sets of variat es, ” Biometrika , vol. 28, no. 3/4, pp. 321–377, 1936. [6] Z. Lin, C. Zhang, W . W u, and X. Gao, “Frequenc y recognition based on canonical correlat ion analysis for SSVEP-based BCIs, ” IEEE T r ans- actions on Biomedical Engineering , vol. 53, no. 12, pp. 2610–2614, 2006. [7] Y . Zhang, G. Zhou, Q. Zhao, A. Onishi, J. Jin, X. W ang, and A. Cichocki, “Multiway canonical correlat ion analysis for frequency component s recogniti on in SSVEP-based BCIs, ” in Neural Information Pr ocessing . Springer , 2011, pp. 287–295. [8] Y . Zhang, G. Zhou, J . Jin, M. W ang, X. W ang, and A. Cichocki , “L1-reg ularize d multi way canonica l correlation analysi s for SSVEP- based BCI, ” IEEE T ransactions on Neural Systems and Rehabilitat ion Engineerin g , vol. 21, no. 6, pp. 887–896, 2013. [9] G. Bin, X. Gao, Y . W an g, Y . Li, B. Hong, and S. Gao, “ A high- speed BCI based on code modulation VEP, ” J ournal of neural engi neering , vol. 8, no. 2, p. 025015, 2011. [10] X. Chen, Y . W ang, S. Gao, T .-P . Jung, and X. Gao, “Filter bank canonica l correla tion analysis for implemen ting a high-speed ssvep-based brain– computer interfa ce, ” Journal of neural engine ering , vol. 12, no. 4, p. 046008, 2015. [11] M. Nakanishi, Y . W a ng, Y .-T . W ang, and T .-P . Jung, “ A comparison study of canonic al correlati on ana lysis based methods for detectin g steady-st ate visual e vok ed potenti als, ” PL O S ONE , vol. 10, no. 10, 2015. [12] M. H. Chang, H. J. Bae k, S. M. Lee, and K. S. Park, “ An amplit ude- modulate d visual stimulatio n for reducing eye fatigue in ss vep-based brain–c omputer interface s, ” Clinical Neur ophysio logy , vol. 125, no. 7, pp. 1380–1391, 2014. [13] R. S. F isher, G. Harding, G. Erba, G. L. Barkley , and A. Wil kins, “Photic -and pattern- induced seizures: a re vie w for the epile psy founda- tion of americ a working group, ” Epilepsia , vol. 46, no. 9, pp. 1426–1441 , 2005. [14] W . Y ij un, W . Ruiping, G. Xiaorong, and G. Shangka i, “Brain-co mputer interf ace based on the high-frequenc y s teady-state visual ev oke d poten- tial, ” in Neural Interface and Contr ol, 2005. Proc eedings. 2005 Fi rst Internati onal Confer ence on . IEEE , 2005, pp. 37–39. [15] D. D. Muresan and T . W . Parks, “Orthogonal, exactl y periodic subspace decomposit ion, ” IEE E Tr ansacti ons on Signal Proc essing , vol. 51, no. 9, pp. 2270–2279, 2003. [16] W . A. Sethares and T . W . Staley , “Periodici ty transforms, ” IEEE T rans- actions on Signal Pro cessing , vol . 47, no. 11, pp. 2953–2964, Nov 1999. [17] S. V . T enneti and P . V aidyanath an, “Nested periodic matrices and dicti onaries: new signal representat ions for period estimation, ” IEEE T r ansaction s on Signa l Pr ocessin g , vol. 63, no. 14, pp. 3736–3750, 2015. [18] M. Planat, M. Minaro vjech, and M. Saniga, “Ramanu jan sums analysis of long-period sequences and 1/f noise, ” EPL (Eur ophysics L etters) , vol. 85, no. 4, p. 40005, 2009. [19] L. Mainardi, M. Bertinel li, and R. Sassi, “ Anal ysis of t-wav e alternans using the ramanujan transform, ” in Computer s in Cardiolo gy , 2008 . IEEE, 2008, pp. 605–608. [20] S. V . T enneti and P . V aidyana than, “Detectin g tandem repeats in DNA using ramanujan ﬁlter bank, ” in Circ uits and Systems (ISCAS), 2016 IEEE Internation al Symposium on . IEEE, 2016, pp. 21–24. [21] P . Saidi, G. Atia, and A. V osoughi, “Dete ction of visual evok ed pote ntials using Ramanujan periodicit y transform for real time brain computer interf aces, ” in IE EE Internationa l Confer ence on Acoustics, Speech and Signal Proc essing (ICASSP) , 2017, pp. 959–963. [22] ——, “On robust detection of brain stimuli with Ramanujan periodicit y transforms,, ” in 51st IEEE Asilomar Confer ence on Signals, Systems and Computer s , 2017. [23] L. Z heng and D. N. C. T se, “Div ersity and multiple xing: A fundamental tradeof f in multiple-a ntenna channe ls, ” IEEE T ransact ions on informa- tion theory , vol. 49, no. 5, pp. 1073–1096, 2003. [24] S. V . T enneti and P . V aidyana than, “ A uniﬁed theory of union of subspaces represent ations for period estimation , ” IEE E T ransact ions on Signal Proc essing , vol. 64, no. 20, pp. 5217–5231, 2016. [25] S. Ramanujan, “On certai n trig onometric al sums and their applicati ons in the theory of numbers, ” Tr ans. Cambridg e Philosoph. Soc , vol. XXll, no. 13, pp. 259–276, 1918. [26] P . V aidyanat han, “Ramanu jan sums in the conte xt of signal processing Part I: Fundamental s, ” IEEE T ransact ions on Signal Proce ssing , vol. 62, no. 16, pp. 4145–4157, 2014. [27] ——, “Ramanuja n sums in the conte xt of signal processing Part II: For represen tation s and appl icatio ns, ” IEEE T ransaction s on Signal Pr ocessing , vol. 62, no. 16, pp. 4158–4172, 2014. [28] H. L. V an Trees, Detection , estimation, and modulation theo ry P art I . John Wi ley & Sons, 2004. [29] T . W . Anderson and G. P . Styan, “Cochran’ s theorem, rank additi vity , and tripotent m atrice s, ” Stanford Univ ersity , Dept. of Statistic s, T ech. Rep. 43, 1980. [30] S. J. Press, “Linear combinations of non-centra l chi-square v ariate s, ” The Annals of Mathemati cal Statistics , pp. 480–487, 1966. [31] W . G. Coc hran, “The distri butio n of quadrati c forms in a normal system, with application s to the analy sis of cov ariance, ” in Mathematical Pr oceedi ngs of the Cambridge Philosophic al Society , vol. 30, no. 2. Cambridge Uni versit y Press, 1934, pp. 178–191. [32] H. O. Lancaster and E. Seneta, Chi-squar e distribu tion . W ile y Online Library , 1969, ch. VII. [33] P . Borjesson and C.-E. Sundberg, “Simple approximatio ns of the error functio n q (x) for communicatio ns applica tions, ” IEEE T ransac tions on Communicat ions , vol. 27, no. 3, pp. 639–643, 1979. [34] K. B. Petersen, M. S. P edersen et al. , “The matrix cookbook, ” T echnical Univer sity of Denmark , vol. 7, no. 15, p. 510, 2008.

Detection of Brain Stimuli Using Ramanujan Periodicity Transforms

Original Paper

Comments & Academic Discussion

Leave a Comment

Original Paper

Related Papers

Comments & Academic Discussion

Leave a Comment