Supervised classification for a family of Gaussian functional models

Amparo Baíllo*, Juan Antonio Cuesta-Albertos† and Antonio Cuevas*

*Universidad Autónoma de Madrid and †Universidad de Cantabria

* These authors have been partially supported by Spanish grant MTM2007-66632.
† This author has been partially supported by Spanish grant MTM2008-0607-C02-02.
E-mail addresses: amparo.baillo@uam.es, cuestaj@unican.es, antonio.cuevas@uam.es

Abstract

In the framework of supervised classification (discrimination) for functional data, it is shown that the optimal classification rule can be explicitly obtained for a class of Gaussian processes with "triangular" covariance functions. This explicit knowledge has two practical consequences. First, the consistency of the well-known nearest neighbors classifier (which is not guaranteed in problems with functional data) is established for the indicated class of processes. Second, and more important, parametric and nonparametric plug-in classifiers can be obtained by estimating the unknown elements in the optimal rule. The performance of these new plug-in classifiers is checked, with positive results, through a simulation study and a real data example.

1 Introduction

Statement of the problem. Notation

Discrimination, also called "supervised classification" in modern terminology, is one of the oldest statistical problems in experimental science: the aim is to decide whether a random observation $X$ (taking values in a "feature space" $\mathcal{F}$ endowed with a distance $D$) belongs either to the population $P_0$ or to $P_1$. For example, in a medical problem $P_0$ and $P_1$ could correspond to the groups of "healthy" and "ill" individuals, respectively. The decision must be taken from the information provided by a "training sample" $\mathcal{X}_n = \{(X_i, Y_i),\ 1 \le i \le n\}$. Here the $X_i$, $i = 1, \ldots, n$, are independent replications of $X$, measured on $n$ randomly chosen individuals, and the $Y_i$ are the corresponding values of an indicator variable which takes the value 0 or 1 according to the membership of the $i$-th individual to $P_0$ or $P_1$. The term "supervised" refers to the fact that the individuals in the training sample are supposed to be correctly classified, typically using "external" non-statistical procedures, so that they provide a reliable basis for the assignation of the new observation. It is possible to consider the case where $K > 2$ populations $P_0, \ldots, P_{K-1}$ are involved but, in what follows, we will restrict ourselves to the binary case $K = 2$.

The mathematical problem is to find a "classifier" (or "classification rule") $g_n(x) = g_n(x; \mathcal{X}_n)$, with $g_n : \mathcal{F} \to \{0,1\}$, that minimizes the classification error $P\{g_n(X) \ne Y\}$. It is not difficult to prove (e.g., Devroye et al., 1996, p. 11) that the optimal classification rule (often called "Bayes rule") is

$$g^*(x) = I_{\{\eta(x) > 1/2\}}(x), \qquad (1)$$

where $\eta(x) = E(Y \mid X = x)$ and $I_A$ stands for the indicator function of a set $A \subset \mathcal{F}$. Of course, since $\eta$ is unknown, the exact expression of this rule is usually unknown as well, and thus different procedures have been proposed to approximate $g^*$ using the training data.

From now on we will use the following notation. Let $\mu_i$ be the distribution of $X$ conditional on $Y = i$, that is, $\mu_i(B) = P\{X \in B \mid Y = i\}$ for $B \in \mathcal{B}_{\mathcal{F}}$ (the Borel $\sigma$-algebra on $\mathcal{F}$) and $i = 0, 1$. We denote by $S_i \subset \mathcal{F}$ the support of $\mu_i$, for $i = 0, 1$, $S = S_0 \cap S_1$ and $p = P\{Y = 0\}$ (we assume $0 < p < 1$). Given two measures $\mu$ and $\nu$, the expression $\mu \ll \nu$ denotes that $\mu$ is absolutely continuous with respect to $\nu$ (i.e., $\nu(B) = 0$ implies $\mu(B) = 0$). The notation $C[0,1]$ stands for the space of real continuous functions on the interval $[0,1]$, endowed with the usual supremum norm, denoted by $\|\cdot\|$.
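As a toy numerical illustration of rule (1) (the univariate Gaussian populations and the prior below are hypothetical choices, not part of the paper's setting), $\eta$ can be computed explicitly from the Bayes formula and the optimal classifier simply thresholds it at $1/2$:

```python
import math

def gauss_density(x, mean, sd):
    # univariate normal density
    return math.exp(-((x - mean) ** 2) / (2 * sd * sd)) / (sd * math.sqrt(2 * math.pi))

def bayes_rule(x, p=0.5, mean0=0.0, mean1=2.0, sd=1.0):
    # eta(x) = P(Y = 1 | X = x) by the Bayes formula;
    # rule (1): assign x to P1 iff eta(x) > 1/2
    f0 = gauss_density(x, mean0, sd)
    f1 = gauss_density(x, mean1, sd)
    eta = (1 - p) * f1 / (p * f0 + (1 - p) * f1)
    return int(eta > 0.5)
```

With $p = 1/2$ and equal variances this reduces to classifying by the nearest mean, so for the values above the decision boundary sits at $x = 1$.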
The subspace of functions of class 2 (i.e., with two continuous derivatives) is denoted by $C^2[0,1]$.

Finite-dimensional spaces. Three classical discrimination procedures

The origin of the discrimination problem goes back to the classical work by Fisher (1936) where, in the $d$-variate framework $\mathcal{F} = \mathbb{R}^d$, a simple "linear classifier" of type $g_n(x) = I_{\{w'x + w_0 > 0\}}$ was introduced for the case that both populations $P_0$ and $P_1$ are homoscedastic, that is, have a common covariance matrix $\Sigma$. Intuitively, $w'x + w_0 = 0$ is chosen as the affine hyperplane which provides the "maximum separation" between both populations. It is well known (see, e.g., Duda et al., 2000, for details) that the expression of Fisher's rule turns out to depend on the inverse $\Sigma^{-1}$ of the covariance matrix. It is also known that Fisher's linear rule is in fact the optimal one (1) when the conditional distributions of $X \mid Y = 0$ and $X \mid Y = 1$ are homoscedastic normals and all the means and covariances are known. These conditions look quite restrictive but, as argued by Hand (2006) in a provocative paper, Fisher's rule (or rather its sampling approximation, obtained by estimating the unknown parameters) is hard to beat in practical examples. That is, while it is not difficult to construct examples where this rule fails outrageously, its performance is quite good in most cases found in real-life examples. For this reason, Fisher's linear rule is still the most popular classification tool among practitioners, in spite of the subsequent intensive research on this topic. Thus, in a way, Fisher's rule represents a sort of "gold standard" in the multivariate statistical discrimination problem.

The books by Devroye et al. (1996), Duda et al. (2000) and Hastie et al. (2001) offer different interesting perspectives on the work done in discrimination theory since Fisher's pioneering paper. All of them focus on the standard multivariate case $\mathcal{F} = \mathbb{R}^d$. Many classifiers have been proposed as alternatives to Fisher's linear rule in this finite-dimensional setup. One of the simplest and easiest to motivate is the so-called $k$-nearest neighbors method. Given a positive integer value (or smoothing parameter) $k = k_n$, this rule simply classifies an incoming observation $x$ in the population $P_1$ if the majority among the $k$ training observations closest to $x$ (with respect to the considered distance $D$) belong to $P_1$. More concretely, the $k$-NN rule can be defined by

$$g_n(x) = I_{\{\eta_n(x) > 1/2\}}, \qquad (2)$$

where

$$\eta_n(x) = \frac{1}{k} \sum_{i=1}^n I_{\{X_i \in k(x)\}}\, Y_i \qquad (3)$$

and "$X_i \in k(x)$" means that $X_i$ is one of the $k$ nearest neighbors of $x$. In fact, the definition of the $k$-NN rule is extremely simple and can be introduced (in terms of "majority vote among the neighbors") with no explicit reference to any regression estimator. However, the idea of replacing the unknown regression function $\eta(x)$ in the optimal classifier (1) with a regression estimator (given by (3) in the case of the $k$-NN rule) is very natural. It suggests a general methodology to construct a wide class of classifiers by just plugging different regression estimators $\eta_n$ into (1) instead of the true regression function $\eta(x)$. In the finite-dimensional case $\mathcal{F} = \mathbb{R}^d$ this is a particularly fruitful idea, as a wealth of different (parametric and nonparametric) estimators of $\eta(x)$ is available; see Audibert and Tsybakov (2007) for some reasons in favor of the plug-in methodology in classification. The main purpose of this work is to show that the plug-in methodology can also be successfully used for classification in some functional data models.

Discrimination of functional data. Differences with the finite-dimensional case

We are concerned here with the problem of (binary) supervised classification with functional data. That is, we assume throughout that the space $(\mathcal{F}, D)$ where the data $X_i$ live is a separable metric space (typically a space of functions). For some theoretical results, considered below, we will impose more specific assumptions on $\mathcal{F}$. The study of discrimination techniques for functional data is not as developed as the corresponding finite-dimensional theory, but it is clearly one of the most active research topics in the booming field of functional data analysis (FDA). Two well-known books including broad overviews of FDA with interesting examples are Ferraty and Vieu (2006) and Ramsay and Silverman (2005). A recent survey on supervised and unsupervised classification with functional data can be found in Baíllo et al. (2009). While the formal statement of the functional classification problem is very much the same as that indicated at the beginning of this section, there are some important differences with the classical finite-dimensional case.

(a) Lack of a simple functional version of Fisher's linear rule: As mentioned above, the idea behind Fisher's rule requires inverting the covariance operator. When $\mathcal{F} = \mathbb{R}^d$ this is increasingly difficult as the dimension $d$ increases, and it becomes impossible in the functional framework, where the operator is typically not invertible. Thus the applicability of Fisher's linear methodology to functional data is a non-trivial issue of current research interest. See, for instance, James and Hastie (2001) and Shin (2008) for interesting adaptations of linear discrimination ideas to a functional setting.
(b) Difficulty in implementing the plug-in idea: Unlike the finite-dimensional case, the plug-in methodology is not generally considered a standard procedure to construct functional classifiers. When $x$ is infinite-dimensional there are as yet few simple parametric models giving a good fit to the regression function, and the structure of nonparametric estimators of $\eta$ is relatively complicated.

(c) The $k$-NN functional classifier is not universally consistent: In the discrimination problem a sequence of classifiers $\{g_n\}$, based on samples of size $n$, is said to be "consistent" when the corresponding sequence of classification errors converges, as $n$ tends to infinity, to the "lowest possible error" attained by the Bayes classifier (1); see Section 3 below for more details. It turns out (see Stone, 1977) that, in the case of finite-dimensional data $X_i \in \mathbb{R}^d$, any sequence of $k$-NN classifiers is consistent provided that $k_n \to \infty$ and $k_n/n \to 0$. Since such consistency holds irrespective of the distribution of the data $(X, Y)$, this property is called "universal consistency".

The definition of the $k$-NN classifier can be easily translated to the functional setup (by replacing the usual Euclidean distance in $\mathbb{R}^d$ with an appropriate functional metric $D$). However, universal consistency is lost. Cérou and Guyader (2006, Th. 2) have obtained sufficient conditions for consistency of the $k$-NN classifier when $X$ takes values in a separable metric space. Nevertheless, the required assumptions are not always trivial to check. As the $k$-NN rule is a natural "default choice" in infinite-dimensional setups, an important issue is to ensure its consistency, at least for some functional models of practical interest.

The purpose and structure of this paper

This work aims to partially fill the gaps pointed out in points (b) and (c) of the above paragraph.
To this end, in Subsection 2.1 a simple expression is obtained for the Bayes (optimal) rule $g^*$ in the case that both distributions, $\mu_0$ and $\mu_1$, are equivalent. However, $g^*$ turns out to depend on the Radon–Nikodym derivative $d\mu_0/d\mu_1$, which is usually unknown, or has an extremely involved expression, even when $\mu_0$ and $\mu_1$ are completely known. An interesting exception is given by Gaussian processes with a specific type of covariance functions, called "triangular". For these processes the Radon–Nikodym derivative has been explicitly calculated by Varberg (1961) and Jørsboe (1968), whose results are collected and briefly commented in Subsection 2.2. In Subsection 2.3 parametric plug-in estimators for $g^*$ are obtained by assuming that $\mu_0$ and $\mu_1$ are either (parametric) Brownian motions or Ornstein–Uhlenbeck processes. Nonparametric plug-in estimators for $g^*$ are proposed and analyzed in Subsection 2.4, under the sole assumption that the covariance functions are triangular. Since the proofs of the results in this subsection are rather technical, they are deferred to a final appendix. This concludes our contributions regarding issue (b). Section 3 is devoted to the $k$-NN consistency problem introduced in (c): we use the above-mentioned result by Cérou and Guyader (2006) to show that the $k$-NN rule is consistent in functional classification problems where the data are generated by certain Gaussian triangular processes specified in Subsection 2.2. Finally, in Section 4 the practical performance of the plug-in rules proposed in Section 2 is checked, and compared with the $k$-NN rule, through a simulation study and the analysis of a real data example.
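For discretized trajectories, the $k$-NN rule (2)–(3) with the supremum distance takes only a few lines of code. The following sketch (grid size, sample construction and the drift value are illustrative assumptions) also includes a crude Euler simulator for a Brownian motion with drift, of the kind appearing in the models below:

```python
import math
import random

def sup_dist(x, y):
    # supremum (L-infinity) distance between two trajectories on a common grid
    return max(abs(a - b) for a, b in zip(x, y))

def knn_classify(x, training, k):
    # training: list of (trajectory, label) pairs;
    # rules (2)-(3): majority vote among the k nearest neighbors of x
    nearest = sorted(training, key=lambda pair: sup_dist(x, pair[0]))[:k]
    eta_n = sum(label for _, label in nearest) / k
    return int(eta_n > 0.5)

def brownian(n_grid, drift=0.0, sigma=1.0, rng=random):
    # Euler scheme for X(t) = drift * t + sigma * W(t) on [0, 1], started at 0
    dt = 1.0 / n_grid
    x, path = 0.0, [0.0]
    for _ in range(n_grid):
        x += drift * dt + sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        path.append(x)
    return path
```

The classifier works for any metric $D$ computable on the grid; only `sup_dist` would need to change.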
2 The optimal classifier for a Gaussian family

2.1 A general expression based on Radon–Nikodym derivatives

When the distributions $\mu_0$ and $\mu_1$ of $P_0$ and $P_1$ are both absolutely continuous with respect to some common $\sigma$-finite measure $\mu$, it is easy to see, as a consequence of the Bayes formula, that the optimal rule is

$$g^*(x) = I_{\{(1-p)\, f_1(x) > p\, f_0(x)\}}, \qquad (4)$$

where $p = P\{Y = 0\}$ and $f_0$, $f_1$ are the $\mu$-densities of $P_0$ and $P_1$, respectively. Expression (4) is particularly important in finite-dimensional problems with $\mathcal{F} = \mathbb{R}^d$, where the Lebesgue measure $\mu$ arises as the natural reference measure and the corresponding Lebesgue densities can be estimated in many ways. In infinite-dimensional spaces there is no such obvious dominant measure. However, if we assume that $\mu_0$ and $\mu_1$, with supports $S_0$ and $S_1$, are absolutely continuous with respect to each other on $S_0 \cap S_1$, the optimal rule can also be expressed in a simple way in terms of the Radon–Nikodym derivative $d\mu_0/d\mu_1$, as shown in the following result.

Theorem 1 Assume that $\mu_0 \ll \mu_1$ and $\mu_1 \ll \mu_0$ on $S = S_0 \cap S_1$. Then

$$\eta(x) = \begin{cases} 0 & \text{if } x \in S_0 \cap S_1^c, \\ 1 & \text{if } x \in S_1 \cap S_0^c, \\ \dfrac{1-p}{p\, \frac{d\mu_0}{d\mu_1}(x) + 1 - p} & \text{if } x \in S, \end{cases} \qquad (5)$$

which provides the expression for the optimal rule $g^*(x) = I_{\{\eta(x) > 1/2\}}$.

Proof: Define $\mu = \mu_0 + \mu_1$. Then $\mu_i \ll \mu$, for $i = 0, 1$, and we can define the Radon–Nikodym derivatives $f_i = d\mu_i/d\mu$, for $i = 0, 1$. From the definition of the conditional expectation we know that $\eta(x) = E(Y \mid X = x) = P(Y = 1 \mid X = x)$ can be expressed as

$$\eta(x) = \frac{f_1(x)(1-p)}{f_0(x)\,p + f_1(x)(1-p)}. \qquad (6)$$

Observe that $\mu|_{S^c \cap S_i} = \mu_i|_{S^c \cap S_i}$ and thus $f_i|_{S^c \cap S_i} = I_{S^c \cap S_i}$, for $i = 0, 1$. Since $\mu_0 \ll \mu_1$ and $\mu_1 \ll \mu_0$ on $S$, the Radon–Nikodym derivatives $d\mu_0/d\mu_1$ and $d\mu_1/d\mu_0$ exist on this set. In this case, it also holds that $\mu|_S \ll \mu_i|_S$ for both $i = 0, 1$, and

$$\frac{d\mu}{d\mu_i}(x) = 1 + \frac{d\mu_{1-i}}{d\mu_i}(x), \quad \text{for any } x \in S.$$

Then (see, e.g., Folland, 1999), for $i = 0, 1$ and for $P_X$-a.e. $x \in S$,

$$f_i(x) = \frac{d\mu_i}{d\mu}(x) = \left( \frac{d\mu}{d\mu_i}(x) \right)^{-1} = \frac{1}{1 + \frac{d\mu_{1-i}}{d\mu_i}(x)}. \qquad (7)$$

Substituting (7) into expression (6) we get (5). $\Box$

The mutual absolute continuity is not a very restrictive assumption if we deal with Gaussian measures. According to a well-known result by Feldman and Hájek (see Feldman, 1958), for any given pair of Gaussian processes there is a dichotomy, in such a way that they are either equivalent or mutually singular. In the first case both measures $\mu_0$ and $\mu_1$ have a common support $S$. As for the identification of the support, Vakhania (1975) has proved that if a Gaussian process, with trajectories in a separable Banach space $\mathcal{F}$, is not degenerate (i.e., the distribution of any non-trivial linear continuous functional is not degenerate), then the support of such a process is the whole space $\mathcal{F}$. In any case, expression (5) would be of no practical use unless some expressions, reasonably easy to estimate, can be found for the Radon–Nikodym derivative $d\mu_0/d\mu_1$. This issue is considered in the next subsection.

2.2 Explicit expression for a family of Gaussian distributions

The best known Gaussian process is perhaps the standard Brownian motion $\{W(t),\ t \ge 0\}$, for which $E(W(t)) = 0$ and the covariance function is $\mathrm{Cov}(W(s), W(t)) := \Gamma(s,t) = \min(s,t)$. A wide class of Brownian-type processes can be obtained by location and scale changes of type $m(t) + \sigma W(t)$, where $m(t)$ is a given mean function and $\sigma > 0$. In fact, the covariance structure $\Gamma(s,t) = \min(s,t)$ can be generalized to define a much broader class of processes with $\Gamma(s,t) = u(\min(s,t))\, v(\max(s,t))$, where $u$ and $v$ denote suitable real functions.
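For instance, the Brownian covariance $\min(s,t)$ corresponds to $u(t) = t$ and $v \equiv 1$; a quick numerical check of the factorization $\Gamma(s,t) = u(\min(s,t))\,v(\max(s,t))$ (the grid is arbitrary):

```python
def triangular_cov(s, t, u, v):
    # a triangular covariance: Gamma(s, t) = u(min(s, t)) * v(max(s, t))
    return u(min(s, t)) * v(max(s, t))

def u_brown(t):
    return t          # u(t) = t for standard Brownian motion

def v_brown(t):
    return 1.0        # v identically 1

grid = [i / 10 for i in range(11)]
brownian_ok = all(
    abs(triangular_cov(s, t, u_brown, v_brown) - min(s, t)) < 1e-12
    for s in grid for t in grid
)
```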
Covariance functions of this type are called triangular. They have received considerable attention in the literature. For example, Sacks and Ylvisaker (1966) use this condition in the study of optimal designs for regression problems where the errors are generated by a zero-mean process with covariance function $\Gamma(s,t)$. It turns out that the Hilbert space with reproducing kernel $\Gamma$ plays an important role in the results and, as these authors point out, the norm of this space is particularly easy to handle when $\Gamma$ is triangular. On the other hand, Varberg (1964) has given an interesting representation of the processes $X(t)$, $0 \le t < b$, with zero mean and triangular covariance function. This author proved that they can be expressed in the form $X(t) = \int_0^b W(u)\, d_u R(t,u)$, where $W$ is the standard Wiener process and $R = R(t,u)$ is a function, of bounded variation with respect to $u$, defined in terms of $\Gamma$. The so-called Ornstein–Uhlenbeck model, for which $\Gamma(s,t) = \sigma^2 \exp(-\beta|s-t|)$ ($\beta, \sigma > 0$), provides another important class of processes with triangular covariance functions. They are widely used in physics and finance.

The following theorem is due to Varberg (1961, Th. 1) and Jørsboe (1968, p. 61). It shows that the Radon–Nikodym derivative can be expressed in a closed, relatively simple way for these special classes of Gaussian processes. For more information concerning explicit expressions of Radon–Nikodym derivatives for Gaussian processes see Segall and Kailath (1975) and references therein. From now on let us denote $m_i(t) = E(X(t) \mid Y = i)$.

Theorem 2 Let $(\mathcal{F}, D) = (C[0,1], \|\cdot\|)$.
Assume that $X \mid Y = i$, for $i = 0, 1$, are Gaussian processes on $[0,1]$, with covariance functions $\Gamma_i(s,t) = u_i(\min(s,t))\, v_i(\max(s,t))$, for $s, t \in [0,1]$, where $u_i$, $v_i$, for $i = 0, 1$, are positive functions in $C^2[0,1]$. Assume also that $v_i$, for $i = 0, 1$, and $v_1 u_1' - u_1 v_1'$ are bounded away from zero on $[0,1]$, that $u_1 v_1' - u_1' v_1 = u_0 v_0' - u_0' v_0$, and that $u_1(0) = 0$ if and only if $u_0(0) = 0$.

a) Assume that $m_i \equiv 0$, for $i = 0, 1$. Then there exist some constants $C_1$, $C_2$, $C_3$ and a function $F$, whose expressions are given in the proof, such that

$$\frac{d\mu_0}{d\mu_1}(x) = C_1 \exp\left\{ \frac{1}{2} \left[ C_3\, x^2(0) + C_2\, x^2(1) - \int_0^1 \frac{x^2(t)}{v_0(t)\, v_1(t)}\, dF(t) \right] \right\}. \qquad (8)$$

b) Assume now that the covariance functions are identical, i.e. $u_i = u$ and $v_i = v$ for $i = 0, 1$, that $m_1 \equiv 0$, and that $m_0$ is a function $m \in C^2[0,1]$ such that $m(0) = 0$ whenever $u(0) = 0$. Then there exist some constants $D_1$, $D_2$ and a function $G$, whose expressions are given in the proof, such that

$$\frac{d\mu_0}{d\mu_1}(x) = \exp\left\{ D_1 + \left( D_2 - \frac{G(0)}{v(0)} \right) x(0) + \frac{G(1)}{v(1)}\, x(1) - \int_0^1 \frac{x(t)}{v(t)}\, dG(t) \right\}. \qquad (9)$$

Proof: a) Varberg (1961, Th. 1) shows that, under the assumptions of (a), $\mu_0$ and $\mu_1$ are equivalent measures. The Radon–Nikodym derivative of $\mu_0$ with respect to $\mu_1$ is

$$\frac{d\mu_0}{d\mu_1}(x) = C_1 \exp\left\{ \frac{1}{2} \left[ C_4\, x^2(0) + \int_0^1 F(t)\, d\!\left( \frac{x^2(t)}{v_0(t)\, v_1(t)} \right) \right] \right\}, \qquad (10)$$

where

$$C_1 = \begin{cases} \left( \dfrac{v_0(0)\, v_1(1)}{v_0(1)\, v_1(0)} \right)^{1/2} & \text{if } u_0(0) = 0, \\[2ex] \left( \dfrac{u_1(0)\, v_1(1)}{v_0(1)\, u_0(0)} \right)^{1/2} & \text{if } u_0(0) \ne 0, \end{cases} \qquad C_4 = \begin{cases} 0 & \text{if } u_0(0) = 0, \\[2ex] \dfrac{v_0(0)\, u_0(0) - u_1(0)\, v_1(0)}{v_1(0)\, v_0(0)\, u_0(0)\, u_1(0)} & \text{if } u_0(0) \ne 0, \end{cases}$$

and $F = (v_1 v_0' - v_0 v_1')/(v_1 u_1' - u_1 v_1')$. Observe that, by the assumptions of the theorem, $F$ is differentiable with bounded derivative. Thus $F$ is of bounded variation and it may be expressed as the difference of two bounded positive increasing functions.
Therefore the stochastic integral in (10) is well defined and it can be evaluated integrating by parts, leading to conclusion (8), with $C_3 = C_4 - F(0)/(v_0(0)\, v_1(0))$ and $C_2 = F(1)/(v_0(1)\, v_1(1))$.

b) In Jørsboe (1968), p. 61, it is proved that, under the indicated assumptions, $\mu_0$ and $\mu_1$ are equivalent measures with Radon–Nikodym derivative

$$\frac{d\mu_0}{d\mu_1}(x) = \exp\left\{ D_3 + D_2\, x(0) + \frac{1}{2} \int_0^1 G(t)\, d\!\left( \frac{2x(t) - m(t)}{v(t)} \right) \right\},$$

with $D_3 = -\frac{m^2(0)}{2\, u(0)\, v(0)}\, I_{\{u(0) > 0\}}$, $D_2 = \frac{m(0)}{u(0)\, v(0)}\, I_{\{u(0) > 0\}}$ and $G = (v m' - m v')/(v u' - u v')$. Again, integration by parts gives (9), where $D_1 = D_3 - \frac{1}{2} \int_0^1 G\, d(m/v)$. $\Box$

In the general case where $m_0 \ne m_1$ and $\Gamma_0 \ne \Gamma_1$, let us denote by $P_{m,\Gamma}$ the distribution of the Gaussian process with mean $m$ and covariance function $\Gamma$. Then, applying the chain rule for Radon–Nikodym derivatives (see, e.g., Folland, 1999), we get

$$\frac{d\mu_0}{d\mu_1}(x) = \frac{dP_{m_0,\Gamma_0}}{dP_{m_1,\Gamma_1}}(x) = \frac{dP_{m_0,\Gamma_0}}{dP_{0,\Gamma_0}}(x)\; \frac{dP_{0,\Gamma_0}}{dP_{0,\Gamma_1}}(x)\; \frac{dP_{0,\Gamma_1}}{dP_{m_1,\Gamma_1}}(x). \qquad (11)$$

Under the appropriate assumptions, the expressions of the Radon–Nikodym derivatives on the right-hand side of (11) are given in (8) and (9).

2.3 Parametric plug-in rules

The aim of this subsection is twofold. First and foremost, we show how the theoretical results of Subsections 2.1 and 2.2 become useful in practice. To this end, we consider examples of well-known Gaussian processes that fulfill the requirements of Theorems 1 and 2, namely Brownian motions with drift and Ornstein–Uhlenbeck processes. We derive the expressions of the Radon–Nikodym derivatives $d\mu_0/d\mu_1$ for these examples. Then it is straightforward to compute the Bayes rule $g^*$ for classification between two elements of one of these families.
In these particular examples the mean and covariance of the Gaussian process $X \mid Y = i$ have known parametric expressions (up to a finite number of parameters). Thus $g^*$ is completely specified as long as the parameters have known values. When this is not the case, we can substitute each unknown parameter in $g^*$ by some estimate. The resulting discrimination procedure is called the parametric plug-in rule. In particular, for the Bayes rules given in (12), (13), (14) and (15) below, the explicit expression of the parameter estimates is given in the appendix.

The second objective of Subsection 2.3 is to obtain the expressions of the Bayes rules for the models used in Section 4 and to derive the corresponding parametric plug-in versions.

Two Brownian motions

Let us denote $X(t; i) = (X(t) \mid Y = i)$. In the Brownian case, using the standard notation of stochastic differential equations, $X(t; i)$ is just the solution of $dX(t; i) = m_i'(t)\, dt + \sigma_i\, dW_i(t)$, for $i = 0, 1$ and $t \in [0,1]$. Here $m_1 \equiv 0$, $m_0(t) = ct$ with $0 < c < \infty$ a constant, $W_0$ and $W_1$ are two uncorrelated Brownian motions, and $X(0; i) \sim N(0, \theta_i^2)$. Then, if $\sigma_0 = \sigma_1 = \sigma$, the conditions of Theorem 2 are satisfied with $u_i(t) = \theta_i^2 + \sigma^2 t$ and $v_i \equiv 1$, for $i = 0, 1$. When $\theta_0 = \theta_1 = 0$, we have $X(0; i) \equiv 0$ and, for any $x \in S$,

$$\frac{d\mu_0}{d\mu_1}(x) = \exp\left\{ \frac{c}{2\sigma^2}\, (2x(1) - c) \right\}.$$

Thus, when $p = 1/2$, the Bayes rule is

$$g^*(x) = I_{\{x(1) < c/2\}}. \qquad (12)$$

Two Ornstein–Uhlenbeck processes

Assume now that $X(t; i)$, for $i = 0, 1$, is the solution of $dX(t; i) = -\beta_i (X(t; i) - \eta_i)\, dt + \sigma_i \sqrt{2\beta_i}\, dW_i(t)$, where $\beta_i > 0$, $\sigma_i > 0$ and the $\eta_i$ are constants. If $X(0; i)$ is equal to a constant $c_i$, we have that $m_i(t) = \eta_i + (c_i - \eta_i)\, e^{-\beta_i t}$ and $\Gamma_i(s,t) = \sigma_i^2 \left( e^{-\beta_i |s-t|} - e^{-\beta_i (s+t)} \right)$. Fixing $v_i(1) = 1$, we get $u_i(t) = \sigma_i^2\, e^{-\beta_i} (e^{\beta_i t} - e^{-\beta_i t})$ and $v_i(t) = e^{\beta_i (1-t)}$ for $i = 0, 1$. The condition $u_1 v_1' - u_1' v_1 = u_0 v_0' - u_0' v_0$ in Theorem 2 is fulfilled if and only if $\beta_0 \sigma_0^2 = \beta_1 \sigma_1^2$.
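In the Brownian example the optimal rule depends on the trajectory only through its endpoint $x(1)$. A sketch (parameter values are illustrative; the general-$p$ threshold in the code follows from taking logarithms in the Radon–Nikodym derivative above):

```python
import math

def bayes_brownian(x1, c, sigma=1.0, p=0.5):
    # assign to P1 iff x(1) < c/2 + (sigma^2 / c) * log((1 - p) / p);
    # for p = 1/2 this is exactly rule (12): x(1) < c/2
    threshold = c / 2 + (sigma ** 2 / c) * math.log((1 - p) / p)
    return int(x1 < threshold)
```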
Also, since $u_i(0) = 0$, then $m_i(0) = c_i$ has to be 0 for $i = 0, 1$. Then it is straightforward to check that the Bayes rule $g^*$ classifies $x$ in population $P_1$ if

$$0 > 2\left[ \beta_0^2 (\sigma_0^2 - \eta_0^2) - \beta_1^2 (\sigma_1^2 - \eta_1^2) \right] + 4\, x(1) (\eta_0 \beta_0 - \eta_1 \beta_1) + (\beta_1 - \beta_0)\, x^2(1) + 4 (\eta_0 \beta_0^2 - \eta_1 \beta_1^2) \int_0^1 x(t)\, dt + (\beta_1^2 - \beta_0^2) \int_0^1 x^2(t)\, dt. \qquad (14)$$

When $X(0; i)$ is random, it follows a normal distribution with mean $\eta_i$ and variance $\sigma_i^2$. Then $m_i(t) = \eta_i$, for all $t \in [0,1]$, and $\Gamma_i(s,t) = \sigma_i^2\, e^{-\beta_i |s-t|}$, $u_i(t) = \sigma_i^2\, e^{-\beta_i(1-t)}$ and $v_i(t) = e^{\beta_i(1-t)}$. Consequently, the Bayes rule assigns $x$ to population $P_1$ if

$$2 \beta_1 \sigma_1^2 \left( \log \beta_1 - \log \beta_0 \right) > 2\left[ \beta_0^2 \sigma_0^2 - \beta_1^2 \sigma_1^2 \right] + \beta_1 \eta_1^2 (1 + \beta_1) - \beta_0 \eta_0^2 (1 + \beta_0) + 4\, x(1) (\eta_0 \beta_0 - \eta_1 \beta_1) + 4 (\eta_0 \beta_0^2 - \eta_1 \beta_1^2) \int_0^1 x(t)\, dt + (\beta_1 - \beta_0) \left[ x^2(0) + x^2(1) + (\beta_1 + \beta_0) \int_0^1 x^2(t)\, dt \right]. \qquad (15)$$

The parametric plug-in classification rule is derived by substituting the unknown parameters $\beta_i$, $\eta_i$ and $\sigma_i$, $i = 0, 1$, in (14) and (15) with their corresponding estimators.

2.4 Nonparametric plug-in rules

In this subsection we analyze the situation in which the processes actually belong to the Gaussian family fulfilling the conditions of Theorem 2, but we do not place any parametric assumption on the mean and the covariance functions. However, let us note that, until we get to the estimation of the Radon–Nikodym derivatives, the Gaussianity assumption is not needed. Specifically, we only assume that the covariance functions of the involved processes are of type $\Gamma(s,t) = u(\min(s,t))\, v(\max(s,t))$, for some (unknown) real functions $u$, $v$, where $v$ is bounded away from 0 on the interval $[0,1]$.
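Rules (14) and (15) involve the trajectory only through $x(0)$, $x(1)$, $\int_0^1 x(t)\,dt$ and $\int_0^1 x^2(t)\,dt$. On a discretized trajectory the two integrals can be approximated by, e.g., the trapezoidal rule; a minimal sketch (the helper names and the grid are illustrative choices):

```python
def trapezoid(values, grid):
    # composite trapezoidal rule for a function sampled on an increasing grid
    return sum((grid[i + 1] - grid[i]) * (values[i] + values[i + 1]) / 2
               for i in range(len(grid) - 1))

def rule_functionals(x, grid):
    # the four trajectory functionals entering the Bayes rules (14) and (15)
    return (x[0], x[-1],
            trapezoid(x, grid),
            trapezoid([xi * xi for xi in x], grid))
```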
Observe that, in order to use a plug-in version of the optimal classification rule along the lines of Theorems 1 and 2, we need to estimate the functions $m$, $u$ and $v$, as well as their first and second derivatives. Since these estimation problems have some independent interest, in this subsection we consider them in a general setup, not necessarily linked to the classification problem. Thus we use the ordinary iid sampling model with a fixed sample size denoted, for simplicity, by $n$ in all cases.

Regarding $u$ and $v$, let us note that the condition $\Gamma(s,t) = u(\min(s,t))\, v(\max(s,t))$, for $s, t \in [0,1]$, entails $u(s) = \Gamma(s,1)/v(1)$ and $v(t) = \Gamma(0,t)/u(0)$ if $u(0) > 0$. However, it is clear that these conditions only determine $u$ and $v$ up to multiplicative constants, so that one can impose (without loss of generality) the additional assumption $v(1) = 1$. Thus, it turns out that $u$ and $v$ can be uniquely determined in terms of $\Gamma(0,t)$ and $\Gamma(s,1)$.

Our study will require three steps: first, the estimation of the mean function $m$ and its derivatives; then, the analogous study for $\Gamma(0,t)$, $\Gamma(s,1)$ and $\sigma^2(t) := \Gamma(t,t)$; and, finally, the analysis of more involved functions defined in terms of these. In Propositions 1 to 3 below we assume that the sample data are $X_1, \ldots, X_n$, iid trajectories of a process $X$ in the space $C[0,1]$, endowed with the supremum norm $\|\cdot\|$.

Estimation of the mean and covariance functions and their derivatives

To estimate the mean function $m(t) = E[X(t)]$ and its derivatives, we will only need to assume that $\{X_n\}$ satisfies $E\|X_1\|^2 < \infty$, which (see p. 172 in Araujo and Giné, 1980) implies that the distribution of $X_1$ satisfies the Central Limit Theorem (CLT) in $(C[0,1], \|\cdot\|)$. The natural estimator of $m$ is the sample mean, denoted by $\hat m_n(t) = \frac{1}{n} \sum_{i=1}^n X_i(t)$.
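The sample mean and its supremum-norm error are immediate to compute on a grid; a small simulation sketch (the mean function, noise level and sample size are illustrative assumptions):

```python
import random

def sample_mean(trajectories):
    # pointwise mean of n discretized trajectories, i.e. the estimator m_hat_n
    n = len(trajectories)
    return [sum(x[j] for x in trajectories) / n
            for j in range(len(trajectories[0]))]

rng = random.Random(0)
m = [j / 10 for j in range(11)]                        # hypothetical mean m(t) = t
data = [[mj + 0.1 * rng.gauss(0.0, 1.0) for mj in m] for _ in range(400)]
m_hat = sample_mean(data)
sup_error = max(abs(a - b) for a, b in zip(m_hat, m))  # O_P(n^{-1/2}), cf. (16)
```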
Since the derivatives of $m$ are also involved in the expressions of the Radon–Nikodym derivatives obtained in Theorem 2, we will also need to consider the estimation of $m'$ and $m''$. Our estimators will depend on a given sequence $h_n \downarrow 0$ of smoothing parameters. Given $t \in [h_n, 1 - h_n]$, define

$$\hat m_n'(t) := \frac{\hat m_n(t + h_n) - \hat m_n(t - h_n)}{2 h_n}, \qquad \hat m_n''(t) := \frac{\hat m_n(t + h_n) + \hat m_n(t - h_n) - 2 \hat m_n(t)}{h_n^2}.$$

For $t \in [0, h_n)$, we define

$$\hat m_n'(t) := \frac{\hat m_n(t + h_n) - \hat m_n(0)}{h_n + t}, \qquad \hat m_n''(t) := \frac{\hat m_n(t + h_n) + \hat m_n(0) - 2 \hat m_n(\gamma_n)}{\gamma_n^2},$$

where $\gamma_n = (t + h_n)/2$. The definition of $\hat m_n'$ and $\hat m_n''$ on $(1 - h_n, 1]$ is similar. These definitions allow us to handle the extreme points analogously to the inner ones; thus we will not pay special attention to the extreme points in the proofs. There is a slight notational abuse in these definitions since, for example, $\hat m_n'(t)$ is not the derivative of $\hat m_n(t)$ but an estimator of $m'(t)$. We keep this notation throughout the manuscript for simplicity.

As mentioned at the beginning of this section, due to the triangular structure of $\Gamma$, in principle we should only concentrate on the estimation of the functions $s \mapsto \Gamma(s,1)$ and $t \mapsto \Gamma(0,t)$ and their derivatives. However, for technical reasons we will also need to consider the function $\sigma^2(t) = \Gamma(t,t)$ and its derivatives. Natural nonparametric estimators of these functions can be given in terms of the empirical covariance

$$\hat\Gamma_n(s,t) := \frac{1}{n} \sum_{i=1}^n \left( X_i(s) - \hat m_n(s) \right) \left( X_i(t) - \hat m_n(t) \right), \qquad s, t \in [0,1].$$

The estimation of the required derivatives is carried out analogously to what we did for the mean function. Observe finally that, since $v(1) = 1$, we can estimate $u(t) = \Gamma(t,1)$ by $\hat u_n(t) := \hat\Gamma_n(t,1)$ for any $t \in [0,1]$, and similarly for its first two derivatives.
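The finite-difference estimators of $m'$ and $m''$ above (interior case $t \in [h_n, 1 - h_n]$) can be written directly; central differences are exact for quadratic functions, which gives a convenient check (the test function and step are illustrative):

```python
def diff1(f, t, h):
    # central first difference: (f(t + h) - f(t - h)) / (2h), cf. m_hat'_n
    return (f(t + h) - f(t - h)) / (2 * h)

def diff2(f, t, h):
    # central second difference: (f(t + h) + f(t - h) - 2 f(t)) / h^2, cf. m_hat''_n
    return (f(t + h) + f(t - h) - 2 * f(t)) / (h * h)
```

In practice `f` would be the sample mean $\hat m_n$ (or $\hat\Gamma_n(\cdot,1)$) interpolated on the observation grid.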
Regarding the function $\sigma^2$, we estimate $\sigma^2(t)$ by $\hat\sigma_n^2(t) := \hat\Gamma_n(t,t)$.

Proposition 1 Let $\{X_n\}$ be iid trajectories in $C[0,1]$ of a process such that $E\|X_1\|^2 < \infty$ and whose mean function $m : [0,1] \to \mathbb{R}$ has a Lipschitz second derivative.

a) For the mean estimation problem we have

$$\|m - \hat m_n\| = O_P(n^{-1/2}), \qquad (16)$$
$$\|m' - \hat m_n'\| = O_P\left( (n^{1/2} h_n)^{-1} \right) + O(h_n^2), \qquad (17)$$
$$\|m'' - \hat m_n''\| = O_P\left( (n^{1/2} h_n^2)^{-1} \right) + O(h_n). \qquad (18)$$

b) Assume that $E\|X_1\|^4 < \infty$ and that the functions $t \mapsto \Gamma(t,1)$, $t \mapsto \Gamma(0,t)$ and $\sigma^2$ admit Lipschitz second-order derivatives. Then we have

$$\|\hat\Gamma_n(\cdot,1) - \Gamma(\cdot,1)\| = \|\hat u_n - u\| = O_P(n^{-1/2}), \qquad (19)$$
$$\|\hat\Gamma_n'(\cdot,1) - \Gamma'(\cdot,1)\| = \|\hat u_n' - u'\| = O_P\left( (n^{1/2} h_n)^{-1} \right) + O(h_n^2), \qquad (20)$$
$$\|\hat\Gamma_n''(\cdot,1) - \Gamma''(\cdot,1)\| = \|\hat u_n'' - u''\| = O_P\left( (n^{1/2} h_n^2)^{-1} \right) + O(h_n). \qquad (21)$$

Similar results also hold for $\hat\Gamma_n(0,\cdot)$ and $\hat\sigma_n^2$.

From the proof of this proposition (see the Appendix) it can be checked that the assumption $E\|X_1\|^4 < \infty$ can be replaced with $E\|X_1\|^{2+\delta} < \infty$, for some $\delta > 0$, and $E(X^r(1)) < \infty$ for any $r > 0$.

Estimation of $v$

The estimation of $v$ is harder than that of $u$. It will be useful to distinguish two cases, where the estimators must be defined in different ways. In the case $u(0) > 0$ (corresponding to $\sigma^2(0) > 0$) we have $v(t) = \Gamma(0,t)/u(0)$, which is estimated by

$$\hat v_n(t) := \frac{\hat\Gamma_n(0,t)}{\hat u_n(0)}, \qquad t \in [0,1]. \qquad (22)$$

When $u(0) = 0$ (which implies that $\sigma^2(0) = 0$), the estimator proposed in (22) is, at best, highly unstable. This case is not unusual: see, for instance, the examples introduced in Subsection 2.3 when $X(0) \mid Y = i$ is constant. For the sake of simplicity, from now on assume that $\sigma^2(t) > 0$ for $t \in (0,1)$.
The first step is to define $\hat v_n(t) = \hat\sigma^2_n(t)/\hat u_n(t)$ for $t \in [\delta_n, 1]$, where $\delta_n$ is a sequence of positive numbers converging to zero (whose rate will be determined later). Then we define estimates for the first and second derivatives of $v$ on the same interval. The structure of $\hat v_n$ as a quotient suggests defining, on $[\delta_n, 1]$,
\[
\hat v_n' := \frac{1}{\hat u_n^2}\big( (\hat\sigma^2_n)' \hat u_n - \hat u_n' \hat\sigma^2_n \big), \qquad
\hat v_n'' := \frac{1}{\hat u_n^3}\Big( \hat u_n \big( (\hat\sigma^2_n)'' \hat u_n - \hat u_n'' \hat\sigma^2_n \big) - 2 \hat u_n' \big( (\hat\sigma^2_n)' \hat u_n - \hat u_n' \hat\sigma^2_n \big) \Big),
\]
where $(\hat\sigma^2_n)'(t) = \hat\Gamma_n'(t,t)$ and $(\hat\sigma^2_n)''(t) = \hat\Gamma_n''(t,t)$. Now we complete the definition of our estimator of $v$ on the whole interval by a Taylor-type expansion on $[0, \delta_n)$:
\[
\hat v_n(t) = \hat v_n(\delta_n) + (t - \delta_n)\hat v_n'(\delta_n) + \tfrac{1}{2}(t - \delta_n)^2 \hat v_n''(\delta_n), \qquad \text{if } t \in [0, \delta_n). \quad (23)
\]
Finally, take
\[
\hat v_n'(t) := \hat v_n'(\delta_n) + (t - \delta_n)\hat v_n''(\delta_n), \qquad
\hat v_n''(t) := \hat v_n''(\delta_n), \qquad \text{if } t \in [0, \delta_n).
\]

Proposition 2. Let the assumptions of Proposition 1 (b) hold.

a) If $u(0) > 0$, then the rates of convergence of $\|\hat v_n - v\|$, $\|\hat v_n' - v'\|$ and $\|\hat v_n'' - v''\|$ are the same as those of (19), (20) and (21), respectively.

b) If $u(0) = 0$, assume that $\inf_t u'(t) > 0$ and $\inf_{t \in [\delta,1]} \sigma^2(t) > 0$ for every $\delta > 0$. Let $\{\delta_n\} \downarrow 0$ be such that $\sup(n^{-1/2}, h_n) = o(\delta_n)$. Then
\[
\|\hat v_n - v\| = O_P\!\left(\frac{\delta_n}{h_n^2\sqrt n}\right) + O(h_n) + O(\delta_n^3),
\]
\[
\|\hat v_n' - v'\| = O_P\!\left(\frac{1}{h_n^2\sqrt n}\right) + O\!\left(\frac{h_n}{\delta_n}\right) + O(\delta_n^2),
\]
\[
\|\hat v_n'' - v''\| = O_P\!\left(\frac{1}{\delta_n h_n^2\sqrt n}\right) + O\!\left(\frac{h_n}{\delta_n^2}\right) + O(\delta_n).
\]

Estimation of the Radon--Nikodym derivatives

Here we plug the estimates of $m$, $u$, $v$ and their derivatives obtained above into the Radon--Nikodym derivative $f = d\mu_0/d\mu_1$ obtained in Theorem 2. Denote by $\hat f_n$ the resulting estimate.
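Returning to the construction of $\hat v_n$ above, the quotient-plus-Taylor scheme can be sketched as follows (illustrative code under our own naming; the callables stand for $\hat u_n$, $\hat\sigma^2_n$ and their difference-quotient derivatives):

```python
import numpy as np

def make_v_hat(u, u1, u2, s2, s21, s22, delta):
    """Build the estimator of v: on [delta, 1] use the quotient
    v = sigma^2/u, with derivatives obtained by the quotient rule;
    on [0, delta) extend by a second-order Taylor expansion at delta."""
    def v(t):  return s2(t) / u(t)
    def v1(t): return (s21(t) * u(t) - u1(t) * s2(t)) / u(t) ** 2
    def v2(t):
        num = (u(t) * (s22(t) * u(t) - u2(t) * s2(t))
               - 2 * u1(t) * (s21(t) * u(t) - u1(t) * s2(t)))
        return num / u(t) ** 3
    def v_ext(t):
        t = np.asarray(t, dtype=float)
        inner = np.where(t >= delta, t, delta)   # never evaluate the quotient near 0
        taylor = (v(delta) + (t - delta) * v1(delta)
                  + 0.5 * (t - delta) ** 2 * v2(delta))
        return np.where(t >= delta, v(inner), taylor)
    return v_ext

# Sanity check with the exact Brownian-motion functions u(t) = t,
# sigma^2(t) = t, for which v(t) = 1 identically on (0, 1].
v = make_v_hat(lambda t: t, lambda t: 1.0, lambda t: 0.0,
               lambda t: t, lambda t: 1.0, lambda t: 0.0, delta=0.1)
```

In practice the callables would be the nonparametric estimators of the previous paragraphs rather than the exact functions used in this sanity check.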
Then we compute the rate at which the error of the corresponding nonparametric plug-in classification procedure converges to the Bayes risk. According to Theorem 2, the Radon--Nikodym densities of interest are exponentials of integrals, ratios, products or square roots of functions estimated with the convergence orders appearing in Propositions 1 and 2. The final rate will be that of the worst estimate handled, which corresponds to the second-order derivatives. As with the estimation of $v$, the orders differ depending on whether $\sigma^2(0)$ is strictly positive or not. The main conclusions are summarized in the following result.

Theorem 3. Assume that the conditions of Proposition 1 (b) and Theorem 2 hold.

a) If $u_i(0) > 0$ for $i = 0, 1$, then for $h_n = O(n^{-1/6})$ we get
\[
\log \hat f_n(x) - \log \frac{d\mu_0}{d\mu_1}(x) = O_P(n^{-1/6}), \qquad x \in C[0,1].
\]

b) If $u_i(0) = 0$ for $i = 0, 1$, and $\inf_t u'(t) > 0$ and $\inf_{t \in [\delta,1]} \sigma^2(t) > 0$ for every $\delta > 0$, then for $h_n = O(n^{-9/50})$ we have
\[
E\left( \left| \log \hat f_n(X) - \log \frac{d\mu_0}{d\mu_1}(X) \right| \,\Big|\, X_1, \ldots, X_n \right) = O_P(n^{-1/10}).
\]

Let us note that, in any case, our nonparametric estimator $\hat f_n(x) = dP_{\hat m_0, \hat\Gamma_0}/dP_{\hat m_1, \hat\Gamma_1}$ is constructed, using (11), under the sole assumption that the covariance function has a triangular structure. So the estimator is formally the same in both cases a) and b) of Theorem 2. If we knew that $m_i = 0$ for $i = 0, 1$, then we could employ $\hat f_n(x) = dP_{\hat m_0, \hat\Gamma_0}/dP_{\hat m_0, \hat\Gamma_1}$, and the rates of Theorem 3 would improve, under the assumptions of Theorem 3 b), to $O_P(n^{-3/28})$.

Using higher-order derivatives

The proof of Theorem 3 was based on the use of Taylor expansions of order two. Next we show how the existence of higher-order derivatives improves the estimation process.
Proposition 3. Under the assumptions of Theorem 3, suppose further that the mean function $m\colon [0,1] \to \mathbb{R}$, as well as the functions $t \mapsto \Gamma(t,1)$, $t \mapsto \Gamma(0,t)$ and $\sigma^2$, admit Lipschitz third-order derivatives. Then the rates in Theorem 3 a) and b) are improved to $O_P(n^{-1/4})$ and $O_P(n^{-5/32})$, respectively.

A remark similar to that made after Theorem 3 applies here. If we incorporate the information $m_i = 0$ into the estimator, the convergence rate in Proposition 3 b) slightly improves to $O_P(n^{-1/6})$. The convergence orders may be further improved by assuming additional smoothness and taking advantage of numerical differentiation techniques (see, for instance, p. 146 in Gautschi, 1997). We will not develop this idea in the present work. However, let us observe that in the estimation of infinitely differentiable functions it is possible to obtain orders as close to $O_P(n^{-1/2})$ as desired by choosing $k$ large enough in the $k$-point rule (see, for instance, Herzeg and Cvetković, 1986).

Estimation of the probability of misclassification

We denote by $\hat L_n := L(\hat g_n) = P\{\hat g_n(X) \neq Y \,|\, \mathcal{X}_n\}$ the classification error associated with the nonparametric plug-in rule $\hat g_n(x) = I\{\hat\eta_n(x) > 1/2\}$. Here $\hat\eta_n$ is obtained by substituting, in (5), the Radon--Nikodym derivative $f = d\mu_0/d\mu_1$ with the estimator $\hat f_n$ obtained by replacing $m$, $u$, $v$ and their derivatives with the corresponding nonparametric estimators constructed along this subsection. The following result is an example of how the convergence rates for the difference between the logarithms of the Radon--Nikodym derivatives $\hat f_n(x)$ and $f(x)$ can be translated into convergence rates of $\hat L_n$ to the Bayes error $L^*$.

Theorem 4. Let the assumptions of Proposition 1 (b) and Theorem 2 hold.
If $u_i(0) > 0$ for $i = 0, 1$, then taking $h_n = O(n^{-1/6})$ we get
\[
\hat L_n - L^* = O_P(n^{-1/6}).
\]

In the case $u_i(0) = 0$ for $i = 0, 1$, we can prove that $\hat L_n - L^*$ is $O_P(n^{-1/10})$ under the assumptions that $\inf_t u'(t) > 0$ and $\inf_{t \in [\delta,1]} \sigma^2(t) > 0$ for every $\delta > 0$. The idea is to follow the same steps as in the proof of Theorem 4, but bounding the integrals in (38) and (42) as in the proof of Theorem 3.

3 Consistency of the $k$-NN functional rules

As stated in the introduction, the $k$-NN classifier is not universally consistent in the functional setting. However, Cérou and Guyader (2006) provide sufficient conditions for the consistency $L_n \to L^*$ in probability (or, equivalently, $E(L_n) \to L^*$), where $L_n$ is the conditional classification error of the $k$-NN rule. In this section we show that these conditions are fulfilled by the Gaussian processes introduced in Section 2.2 and, in consequence, that the $k$-NN rule is consistent in probability for them.

Throughout this section the feature space where the variable $X$ takes values is a separable metric space $(\mathcal{F}, D)$. As usual, we denote by $P_X$ the distribution of $X$, defined by $P_X(B) = P\{X \in B\}$ for $B \in \mathcal{B}_{\mathcal{F}}$, where $\mathcal{B}_{\mathcal{F}}$ are the Borel sets of $\mathcal{F}$. The key assumption is a regularity condition on the regression function $\eta(x) = E(Y \,|\, X = x)$, called the Besicovitch condition (BC). The function $\eta$ is said to fulfill (BC) if
\[
\lim_{\delta \to 0} \frac{1}{P_X(B_{X,\delta})} \int_{B_{X,\delta}} \eta(z)\, dP_X(z) = \eta(X) \quad \text{in probability},
\]
where $B_{x,\delta} := \{z \in \mathcal{F} : D(x,z) \le \delta\}$ is the closed ball with center $x$ and radius $\delta$. The Besicovitch condition plays, for instance, an important role in the consistency of kernel rules (see Abraham et al., 2006). Cérou and Guyader (2006, Th.
2) have proved that, if $(\mathcal{F}, D)$ is separable and condition (BC) is fulfilled, then the $k$-NN classifier defined by (2) and (3) is consistent in probability provided that $k_n \to \infty$ and $k_n/n \to 0$. In order to apply this result in our case, it suffices to observe that the continuity ($P_X$-a.e.) of $\eta(x)$ implies (BC). Consequently we can establish the following result, whose proof is immediate from Theorems 1 and 2.

Proposition 4. Under the assumptions of Theorem 1, suppose that $P_X(\partial S) = 0$. Then, for $P_X$-a.e. $x, z$ in the topological interior of $S$,
\[
|\eta(z) - \eta(x)| = \left| \frac{1-p}{p\frac{d\mu_0}{d\mu_1}(z) + 1-p} - \frac{1-p}{p\frac{d\mu_0}{d\mu_1}(x) + 1-p} \right|
\le \frac{p}{1-p} \left| \frac{d\mu_0}{d\mu_1}(x) - \frac{d\mu_0}{d\mu_1}(z) \right|. \quad (24)
\]
As a consequence, for both cases a) and b) considered in Theorem 2, the $k$-NN functional classifier is consistent in probability, provided that $k_n \to \infty$ and $k_n/n \to 0$.

Of course, the point is that the Radon--Nikodym derivatives given in Theorem 2 are continuous on $C[0,1]$. So (24) implies the continuity of $\eta(x)$, which in turn entails the Besicovitch condition (BC) and the consistency.

4 Empirical results

In this section we compare the performance of the $k$-NN classification procedure with the plug-in one for infinite-dimensional data. First (Subsection 4.1) we describe the results of a simulation study carried out with processes from the two Gaussian families specified in Subsection 2.3. Afterwards (Subsection 4.2) we focus on a real data set.

4.1 Monte Carlo study

The observations will be realizations of two Ornstein--Uhlenbeck processes and two Brownian motions, as described in Subsection 2.3. The parameters chosen for the pairs of processes are specified in Table 1 (Figure 1 depicts some trajectories of the processes used in the simulations).

Figure 1 here.
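For concreteness, the $k$-NN rule with the supremum metric used in the comparisons below amounts to majority voting among the $k$ training curves closest in $\|\cdot\|_\infty$. A minimal sketch on discretized trajectories (our own illustration, not the simulation code of the paper):

```python
import numpy as np

def knn_sup_classify(X_train, y_train, x_new, k):
    """k-NN classification of a discretized curve x_new using the
    supremum distance D(x, z) = max_t |x(t) - z(t)|; ties in the
    vote are broken in favor of class 0."""
    d = np.max(np.abs(X_train - x_new), axis=1)   # sup distances to training curves
    nearest = np.argsort(d)[:k]                   # indices of the k closest curves
    return int(y_train[nearest].mean() > 0.5)     # majority vote

# Toy example: class-1 curves are shifted upward by 2.
rng = np.random.default_rng(1)
X0 = rng.normal(0, 0.3, size=(30, 50))
X1 = 2 + rng.normal(0, 0.3, size=(30, 50))
X_train = np.vstack([X0, X1])
y_train = np.array([0] * 30 + [1] * 30)
```

A PLS-based semimetric would replace `d` by a distance between projections onto a few PLS directions; the voting step is unchanged.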
We assume that $p = P\{Y = 0\}$, the proportion of observations coming from $P_0$, is $1/2$ and is known in advance. For each $i = 0, 1$ we take a training sample of size $n_i = 100$ and a test sample of size 50 from $P_i$. The processes are observed at equidistant times in the interval $[0,1]$, $t_0 = 0, t_1, \ldots, t_N = 1$, with $N = 50$. We denote by $\Delta = t_j - t_{j-1}$ the internodal distance. The number of Monte Carlo runs is 1000.

In each run we use the training sample to construct four classifiers: $k$-NN with the supremum norm and with a PLS-based semimetric (see e.g. Ferraty and Vieu, 2006, p. 30), and the parametric and nonparametric plug-in rules introduced in Subsections 2.3 and 2.4, respectively. The performance of these classifiers is assessed by the proportion of correctly classified observations in the test samples. We also compute this proportion for the Bayes rule associated with each model. The number $k$ of neighbours and the number of PLS directions for projection are chosen via cross-validation from a maximum of 10 neighbours and 5 PLS directions, respectively.

When applying the nonparametric plug-in method to the data functions evaluated on the whole interval $[0,1]$, we observed a noticeable boundary effect near 0, especially in the estimation of $v$ and its derivatives, which made the nonparametric plug-in method perform poorly. To avoid this, the Radon--Nikodym derivative for the nonparametric plug-in rule has been evaluated on the trajectories restricted to the interval $[h_n, 1]$, where $h_n$ is the same (and unique) smoothing parameter used in the estimation of the derivatives of $u_i$ and $v_i$. The value of $h_n$ has been chosen among $\{2\Delta, 4\Delta, \ldots$
$, 20\Delta\}$ via cross-validation: for each $h_n = k\Delta$ we compute the corresponding estimated classification error with the usual leave-one-out device (every training observation is classified, as if it were a new incoming observation, using the remaining data as a training sample).

In Table 1 we display the mean and the standard deviation (in parentheses) of the proportion of correct classifications over the 1000 Monte Carlo samples. We see that the parametric plug-in procedure performs best: it is very near the optimum.

As could be expected, the nonparametric plug-in behaves worse than the parametric one. Its best performance corresponds to the random-start cases $u_i(0) > 0$ for $i = 0, 1$; in these situations it is the second-best classifier. When $u_i(0) = 0$, the parametric plug-in is still the winner, the $k$-NN with PLS is second, and the $k$-NN with the supremum metric and the nonparametric plug-in perform similarly.

It is interesting to note that the $k$-NN classification method is always reliable (even with the supremum metric, although the PLS semimetric yields better results). Thus one of the conclusions of the study is that, when classifying functional data, the $k$-NN procedure is generally a safe choice, free of model assumptions.

Table 1 here.

4.2 A real data set

We compare the performance of the $k$-NN classification procedure with the nonparametric plug-in one in the analysis of data from research in experimental cardiology. The experiment was conducted at the Vall d'Hebron Hospital (Barcelona, Spain). See Ruiz-Meana et al. (2003) for biochemical and medical details on the data, and Cuevas, Febrero and Fraiman (2004, 2006) for previous analyses of these observations.

The variable under study is the mitochondrial calcium overload (MCO), which measures the level of the mitochondrial calcium ion (Ca2+).
This variable was observed every 10 seconds during an hour in isolated mouse cardiac cells. The aim of the study was to assess whether a drug called Cariporide increased the MCO level. The data we analyze here consist of two samples of functions with sizes $n_0 = 45$ (control group) and $n_1 = 44$ (treatment group with Cariporide). In Figure 2 we display (a) all the data and (b) the group means.

Figure 2 here.

In many cases, during the first three minutes each curve shows oscillations which correspond to normal contractions of the cells. This first part of the curves has been eliminated (as in the original experiments with these data) because it has high variability and depends on uncontrolled factors.

To bring the distributions closer to normality, we have considered a transformation of the data, $X = \log(\mathrm{MCO} - 85)$. The performance of each of the classification procedures considered is described by the probability of correctly classifying one of the transformed observations, approximated via cross-validation.

Obviously, in this case we do not have enough information to consider using the parametric plug-in classifier. Consequently we only employ the $k$-NN (with uniform metric and PLS-based semimetric) and the nonparametric plug-in discrimination rules. The results appear in Table 2. It is interesting to notice that the results in this case are, in some sense, the opposite of those obtained in the simulations: the nonparametric plug-in clearly outperforms the other two, and the $k$-NN with the supremum metric does better than the $k$-NN with PLS.

Table 2 here.

Acknowledgement. The authors want to thank Javier Segura for bringing to our knowledge some numerical differentiation techniques (in particular, the $k$-point rule).
5 Appendix

A.1 Parameter estimation for the models of Subsection 2.3

Two Brownian motions

In the simulations of Section 4 the estimator of $c$ is $\hat c = \arg\min_c \sum_{j=1}^N (\hat m_0(t_j) - c\, t_j)^2$, where $\hat m_i$ is the sample mean of the observations coming from $P_i$. The parameters $\theta_i$ and $\sigma^2$ are respectively estimated by
\[
\hat\theta_i = \frac{1}{n_i - 1} \sum_{j=1}^{n_i} \big( X_j(0; i) - \hat m_i(0) \big)^2
\quad \text{and} \quad
\hat\sigma^2 = \frac{1}{n_0 + n_1 - 1} \sum_{i=0,1} \sum_{j=1}^{n_i} \big( X_j(1; i) - \hat m_i(1) - X_j(0; i) + \hat m_i(0) \big)^2.
\]

Two Ornstein--Uhlenbeck processes

The estimation of the unknown parameters ($\beta_i$, $\eta_i$ and $\sigma_i$, $i = 0, 1$) is carried out via linear least-squares regression between the realizations of the process at consecutive time points. The main idea is that, for $i = 0, 1$ and any $0 \le s < t \le 1$, we have
\[
X(t; i) = X(s; i)\, e^{-\beta_i(t-s)} + \eta_i \big(1 - e^{-\beta_i(t-s)}\big) + \sigma_i \sqrt{1 - e^{-2\beta_i(t-s)}}\; Z, \quad (25)
\]
where $Z$ is $N(0,1)$. The updating formula (25) is valid whether $X(0; i)$ is deterministic or random. In particular, for $i = 0, 1$, $k = 1, \ldots, n_i$ and $j = 0, \ldots, N-1$,
\[
X_k(t_{j+1}; i) = a_i X_k(t_j; i) + b_i + \sigma_i \sqrt{1 - e^{-2\beta_i \Delta}}\; Z_{kj}, \quad (26)
\]
where $a_i := e^{-\beta_i \Delta}$, $b_i := \eta_i(1 - e^{-\beta_i \Delta})$ and the $Z_{kj}$ are i.i.d. $N(0,1)$ variables. Observe that, by estimating the parameters of the simple linear regression equation (26), we can construct estimators of $\beta_i$, $\eta_i$ and $\sigma_i$. When $X(0; i)$ is deterministic, we compute the least-squares estimators of $a_i$ and $b_i$, that is, the values $\hat a_i$ and $\hat b_i$ minimizing $\sum_{k=1}^{n_i} \sum_{j=0}^{N-1} u_{kj}^2$, where $u_{kj} := X_k(t_{j+1}; i) - (\hat a_i X_k(t_j; i) + \hat b_i)$ are the residuals. Then
\[
\hat\beta_i = -\frac{\log \hat a_i}{\Delta}, \qquad
\hat\eta_i = \frac{\hat b_i}{1 - \hat a_i}, \qquad
\hat\sigma^2_i = \frac{1}{(1 - \hat a_i^2)(n_i N - 2)} \sum_{k=1}^{n_i} \sum_{j=0}^{N-1} u_{kj}^2. \quad (27)
\]
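The regression step (26) and the back-transformation (27) can be sketched as follows (a simplified single-sample illustration with our own variable names):

```python
import numpy as np

def fit_ou(X, dt):
    """Pooled least-squares fit of the AR(1)-type recursion
    X[t_{j+1}] = a X[t_j] + b + noise over an (n, N+1) array of
    trajectories, back-transformed to (beta, eta, sigma^2) as in (27)."""
    x, y = X[:, :-1].ravel(), X[:, 1:].ravel()
    a_hat, b_hat = np.polyfit(x, y, 1)            # slope and intercept
    resid = y - (a_hat * x + b_hat)
    beta = -np.log(a_hat) / dt
    eta = b_hat / (1 - a_hat)
    sigma2 = np.sum(resid**2) / ((1 - a_hat**2) * (resid.size - 2))
    return beta, eta, sigma2

# Simulate an OU process with beta = 1, eta = 0.5, sigma = 1 via (26), then refit.
rng = np.random.default_rng(2)
beta, eta, sigma, dt, n, N = 1.0, 0.5, 1.0, 0.02, 200, 500
a, b = np.exp(-beta * dt), eta * (1 - np.exp(-beta * dt))
X = np.empty((n, N + 1))
X[:, 0] = eta                                     # deterministic start
for j in range(N):
    X[:, j + 1] = a * X[:, j] + b + sigma * np.sqrt(1 - a**2) * rng.normal(size=n)
beta_hat, eta_hat, sigma2_hat = fit_ou(X, dt)
```

With $nN = 10^5$ transition pairs the fitted values land close to the true parameters.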
When $X(0; i)$ is random, we can compute $\hat\beta_i$ and $\hat\sigma^2_i$ as in (27), but $\eta_i$ is better estimated by $\hat\eta_i = \sum_{j=1}^{n_i} \sum_{k=0}^{N} X_{ij}(t_k)/(n_i(N+1))$.

A.2 Proofs of the results in 2.4

Proof of Proposition 1. (a) By the functional CLT in $(C[0,1], \|\cdot\|)$ (see p. 172 in Araujo and Giné, 1980), the sequence $\sqrt n(\hat m_n - m)$ converges weakly. This entails that the sequence $\|\sqrt n(\hat m_n - m)\|$ is bounded in probability, which in turn implies (16).

Concerning (17) and (18), denote $X_i^*(t) = X_i(t) - m(t)$, $t \in [0,1]$, $i = 1, 2, \ldots$. Note that, for $t \in [h_n, 1-h_n]$,
\[
|m'(t) - \hat m_n'(t)| \le \left| m'(t) - \frac{m(t+h_n) - m(t-h_n)}{2h_n} \right|
+ \left| \frac{1}{2h_n n} \sum_{i=1}^n X_i^*(t+h_n) \right|
+ \left| \frac{1}{2h_n n} \sum_{i=1}^n X_i^*(t-h_n) \right|
\le \left| m'(t) - \frac{m(t+h_n) - m(t-h_n)}{2h_n} \right| + \frac{1}{h_n n} \left\| \sum_{i=1}^n X_i^* \right\|. \quad (28)
\]
The CLT applied to the sequence $\{X_n^*\}$ allows us to conclude that the second term on the right-hand side of (28) is $O_P\big((n^{1/2}h_n)^{-1}\big)$. A second-order Taylor expansion of the first term implies that there exist $\psi_n^{(1)} \in (t-h_n, t)$ and $\psi_n^{(2)} \in (t, t+h_n)$ such that
\[
\left| m'(t) - \frac{m(t+h_n) - m(t-h_n)}{2h_n} \right| = \frac{h_n}{4} \left| m''(\psi_n^{(1)}) - m''(\psi_n^{(2)}) \right| \le \frac{Lh_n^2}{4} = O(h_n^2),
\]
where $L$ is the Lipschitz constant associated with $m''$. Applying a similar reasoning to (18), we obtain that, if $t \in [h_n, 1-h_n]$, then
\[
|m''(t) - \hat m_n''(t)| \le \left| m''(t) - \frac{m(t+h_n) + m(t-h_n) - 2m(t)}{h_n^2} \right| + \frac{4}{h_n^2 n} \left\| \sum_{i=1}^n X_i^* \right\|. \quad (29)
\]
The CLT implies that the order of the second term in (29) is $O_P\big((n^{1/2}h_n^2)^{-1}\big)$. A second-order Taylor expansion in $t$ again gives
\[
\left| m''(t) - \frac{m(t+h_n) + m(t-h_n) - 2m(t)}{h_n^2} \right| = \left| m''(t) - \tfrac{1}{2}\big( m''(\psi_n^{(1)}) + m''(\psi_n^{(2)}) \big) \right| \le Lh_n.
\]

(b) Since
\[
\hat\Gamma_n(t,1) - \Gamma(t,1) = \frac{1}{n} \sum_i \big( X_i^*(t) + m(t) - \hat m_n(t) \big)\big( X_i^*(1) + m(1) - \hat m_n(1) \big)
\]
\[
{} - \Gamma(t,1)
= \left( \frac{1}{n} \sum_i X_i^*(t) X_i^*(1) - \Gamma(t,1) \right)
+ \big( m(t) - \hat m_n(t) \big) \frac{1}{n} \sum_i X_i^*(1)
+ \big( m(1) - \hat m_n(1) \big) \frac{1}{n} \sum_i X_i^*(t)
+ \big( m(t) - \hat m_n(t) \big)\big( m(1) - \hat m_n(1) \big),
\]
then
\[
\|\hat\Gamma_n(\cdot,1) - \Gamma(\cdot,1)\|
\le \left\| \frac{1}{n} \sum_i \big( X_i^* X_i^*(1) - \Gamma(\cdot,1) \big) \right\|
+ \|m - \hat m_n\| \left| \frac{1}{n} \sum_i X_i^*(1) \right|
+ |m(1) - \hat m_n(1)| \left\| \frac{1}{n} \sum_i X_i^* \right\|
+ \|m - \hat m_n\| \, |m(1) - \hat m_n(1)|
=: T_n^{(1)} + T_n^{(2)} + T_n^{(3)} + T_n^{(4)}.
\]
The assumption $E\|X_1\|^4 < \infty$ implies $E\|X_i^* X_i^*(1)\|^2 < \infty$ and thus the sequence $\{X_i^* X_i^*(1)\}$ satisfies the CLT in the supremum norm. Then, since $E[X_i^* X_i^*(1)] = \Gamma(\cdot,1)$, we have $T_n^{(1)} = O_P(n^{-1/2})$. Also $T_n^{(2)} = O_P(n^{-1})$, because the CLT (real case) implies that $\sum_i X_i^*(1)/n = O_P(n^{-1/2})$ and, according to Proposition 1 (a), $\|m - \hat m_n\| = O_P(n^{-1/2})$. The CLT applied to $\{X_i^*\}$ and Proposition 1 (a) yield that $T_n^{(3)}$ and $T_n^{(4)}$ are $O_P(n^{-1})$. This allows us to conclude (19). The derivatives of $\Gamma(\cdot,1)$ are handled as those of $m$. The estimators of $\Gamma(0,\cdot)$ and $\sigma^2(\cdot)$ behave analogously to $\Gamma(\cdot,1)$. $\Box$

Proof of Proposition 2. a) According to expression (22) for $\hat v_n(t)$, this estimator is a quotient of two convergent sequences. Since the denominator, $\hat u_n(0)$, converges to $u(0) > 0$, an upper bound for the overall rate of the quotient is the slower of the rates of $\hat\Gamma_n(0,t)$ and $\hat u_n(0)$. Similar arguments apply for the first and second derivatives.

b) Let $t \in [\delta_n, 1]$. The hypothesis on $u'$ implies that $\inf_{t \ge \delta_n} u(t) \ge c\,\delta_n$ for some constant $c > 0$. Since $n^{-1/2} = o(\delta_n)$, from (19) we obtain that, with probability tending to one, $\hat u_n$ is also bounded below on $[\delta_n, 1]$ by a constant times $\delta_n$. Therefore a direct calculation based on the expression of $\hat v_n$, together with Proposition 1 b), leads to
\[
\sup_{t \in [\delta_n, 1]} |\hat v_n(t) - v(t)| = O_P\!\left( \frac{1}{\delta_n \sqrt n} \right).
\]
(30)

The same reasoning, taking into account the relative orders between $\delta_n$ and $h_n$, leads to
\[
\sup_{t \in [\delta_n, 1]} |\hat v_n'(t) - v'(t)| = O_P\!\left( \frac{1}{\delta_n h_n \sqrt n} \right) + O\!\left( \frac{h_n^2}{\delta_n} \right), \quad (31)
\]
\[
\sup_{t \in [\delta_n, 1]} |\hat v_n''(t) - v''(t)| = O_P\!\left( \frac{1}{\delta_n h_n^2 \sqrt n} \right) + O\!\left( \frac{h_n}{\delta_n^2} \right). \quad (32)
\]
Now let $t \in [0, \delta_n]$. Using the second-order Taylor expansion of $v$ at $\delta_n$, together with the definition (23) of $\hat v_n$, we obtain that there exists $\psi_n \in (t, \delta_n)$ such that
\[
|\hat v_n(t) - v(t)| \le |\hat v_n(\delta_n) - v(\delta_n)| + (\delta_n - t)|\hat v_n'(\delta_n) - v'(\delta_n)|
+ \tfrac{1}{2}(t - \delta_n)^2 |\hat v_n''(\delta_n) - v''(\delta_n)|
+ \tfrac{1}{2}(t - \delta_n)^2 |v''(\delta_n) - v''(\psi_n)|
\]
\[
\le O_P\!\left( \frac{1}{\delta_n \sqrt n} \right) + O_P\!\left( \frac{1}{h_n \sqrt n} \right) + O(h_n^2)
+ O_P\!\left( \frac{\delta_n}{h_n^2 \sqrt n} \right) + O(h_n) + O(\delta_n^3)
= O_P\!\left( \frac{\delta_n}{h_n^2 \sqrt n} \right) + O(h_n) + O(\delta_n^3),
\]
where we have applied (30), (31) and (32) and the fact that $v''$ is Lipschitz. The first statement of Proposition 2 b) follows from here and (30). The remaining two statements are proved similarly. $\Box$

Next we state a technical lemma which will be employed to prove Theorem 3.

Lemma 1. Let $\{Y(t), t \in [0,1]\}$ be a stochastic process whose mean function $m(t)$ and variance function $\sigma^2(t)$ satisfy $m(0) = \sigma(0) = 0$ and both have a bounded derivative. Let $\{\delta_n\}$ be positive numbers converging to zero. Then
\[
E \int_0^{\delta_n} |Y(t)|\, dt = O(\delta_n^{3/2}) \quad \text{and} \quad E \int_0^{\delta_n} Y^2(t)\, dt = O(\delta_n^2).
\]

Proof: Let $H$ be a common upper bound for the derivatives of $m^2$ and $\sigma^2$. Then
\[
\int_0^{\delta_n} E|Y(t)|\, dt \le \int_0^{\delta_n} E^{1/2}(Y^2(t))\, dt
= \int_0^{\delta_n} \big( m(t)^2 + \sigma^2(t) \big)^{1/2} dt
\le (2H)^{1/2} \int_0^{\delta_n} t^{1/2}\, dt = O(\delta_n^{3/2}).
\]
The second statement in the lemma follows analogously. $\Box$

Proof of Theorem 3: From expressions (8) and (9) we see that $f = d\mu_0/d\mu_1$ is a function of $m_i$, $u_i$, $v_i$ and their derivatives. Statement a) corresponds to the simplest case, in which $u_i(0) > 0$.
In this situation, the simple structure of the estimators shows that an upper bound for the convergence rate of $\log \hat f_n(x)$ is the worst rate among the estimators involved in its definition, namely that of the estimators of $v_0''$ and $v_1''$. Hence we concentrate on part b). For simplicity we omit the subindex in $v_i$ for the rest of the proof.

First notice that, in the expressions for $d\mu_0/d\mu_1$ obtained in Theorem 2, the second derivatives of $v$ appear only inside integrals. In other words, we only need to care about differences of the type
\[
\int_0^1 X^r(t)\big( \hat k_n(t)\hat v_n''(t) - k(t)v''(t) \big)\, dt
= O_P\!\left( \int_0^1 X^r(t)k(t)\big( \hat v_n''(t) - v''(t) \big)\, dt \right), \quad (33)
\]
for $r = 1, 2$. Here $k$ is a function depending on $u, v, u', v', m$ and $m'$, and $X$ is a mixture of the Brownian motions under consideration.

Let us analyze the case in Theorem 2 b) for which $r = 1$ and the function $k$ can be expressed as $k = k_1/\big( v(vu' - uv')^2 \big)$, where $k_1$ is a function which can be written in terms of $u, v, u', v', m$ and $m'$. The assumptions in Theorem 2 then imply that $k$ is bounded; let $K$ be an upper bound of $k$. We split the integral on the right-hand side of (33) in two, over the intervals $[0, \delta_n]$ and $[\delta_n, 1]$. From (32) in the proof of Proposition 2 we have
\[
E\left( \left| \int_{\delta_n}^1 X(t)k(t)\big( \hat v_n''(t) - v''(t) \big)\, dt \right| \,\Big|\, X_1, \ldots, X_n \right)
\le \left[ O_P\!\left( \frac{1}{\delta_n h_n^2 \sqrt n} \right) + O\!\left( \frac{h_n}{\delta_n^2} \right) \right]
\left( \int_{\delta_n}^1 E(X^2(t))\, dt \right)^{1/2}. \quad (34)
\]
With respect to the other integral, we have
\[
E\left( \left| \int_0^{\delta_n} X(t)k(t)\big( \hat v_n''(t) - v''(t) \big)\, dt \right| \,\Big|\, X_1, \ldots, X_n \right)
\le K \|\hat v_n'' - v''\|\, E \int_0^{\delta_n} |X(t)|\, dt
= O_P\!\left( \frac{\delta_n^{1/2}}{h_n^2 \sqrt n} \right) + O\!\left( \frac{h_n}{\delta_n^{1/2}} \right) + O(\delta_n^{5/2}), \quad (35)
\]
where the last equality comes from Lemma 1 and Proposition 2 b). Equations (34) and (35) give
\[
E\left( \left| \int_0^1 X(t)k(t)\big( \hat v_n''(t) - v''(t) \big)\, dt \right| \,\Big|\, X_1, \ldots
\]
\[
, X_n \right)
\le O_P\!\left( \frac{1}{\delta_n h_n^2 \sqrt n} \right) + O\!\left( \frac{h_n}{\delta_n^2} \right) + O(\delta_n^{5/2}).
\]
Taking $h_n = \delta_n^{9/2}$ and $\delta_n = n^{-1/25}$ equates the three terms and yields the result. $\Box$

Proof of Proposition 3: It follows the same steps as the proof of Proposition 1, the only difference being that, if we apply a third-order Taylor expansion in (29), we obtain
\[
\left| m''(t) - \frac{m(t+h_n) + m(t-h_n) - 2m(t)}{h_n^2} \right|
= \frac{h_n}{3!} \left| m'''(\psi_n^{(1)}) - m'''(\psi_n^{(2)}) \right| \le \frac{Lh_n^2}{3!},
\]
and the result follows. $\Box$

Proof of Theorem 4: We use the following inequality (see, e.g., Devroye et al., 1996, p. 93):
\[
\hat L_n - L^* \le 2E\big( |\eta(X) - \hat\eta_n(X)| \,\big|\, \mathcal{X}_n \big),
\]
where $\eta$ is given in (5) and $\hat\eta_n$ is obtained by substituting $f = d\mu_0/d\mu_1$ with $\hat f_n$ in (5). Without loss of generality, in this proof we take $p = P\{Y = 0\} = 1/2$.

Observe that $f$ and $\hat f_n$ are always positive, since they are Radon--Nikodym derivatives of one probability measure with respect to another. Thus, for any $x$, we have
\[
|\eta(x) - \hat\eta_n(x)| = \frac{|f(x) - \hat f_n(x)|}{(1 + \hat f_n(x))(1 + f(x))} \le |f(x) - \hat f_n(x)|,
\]
which implies that
\[
\hat L_n - L^* \le 2E\big( |f(X) - \hat f_n(X)| \,\big|\, \mathcal{X}_n \big). \quad (36)
\]
We obtain convergence rates (in probability) for the conditional expectation on the right of (36). Since all the cases are similar, let us consider the simple situation in which $m_0 \neq m_1$ and $\Gamma_0 = \Gamma_1 = \Gamma$ with $\Gamma(s,t) = u(\min(s,t))\, v(\max(s,t))$. Then
\[
f - \hat f_n = \frac{dP_{m_0,\Gamma}}{dP_{m_1,\Gamma}} - \frac{dP_{\hat m_0,\hat\Gamma_0}}{dP_{\hat m_1,\hat\Gamma_1}}
= \left( \frac{dP_{m_0,\Gamma}}{dP_{m_1,\Gamma}} - \frac{dP_{\hat m_0,\hat\Gamma_0}}{dP_{\hat m_1,\hat\Gamma_0}} \right)
+ \frac{dP_{\hat m_0,\hat\Gamma_0}}{dP_{\hat m_1,\hat\Gamma_0}}
\left( 1 - \frac{dP_{\hat m_1,\hat\Gamma_0}}{dP_{\hat m_1,\hat\Gamma_1}} \right).
\]
(37)

By Theorem 2 (b) and the mean value theorem we have that, for any $x$,
\[
\frac{dP_{m_0,\Gamma}}{dP_{m_1,\Gamma}}(x) - \frac{dP_{\hat m_0,\hat\Gamma_0}}{dP_{\hat m_1,\hat\Gamma_0}}(x) = e^z (z_1 - z_2),
\]
where (using the notation of Theorem 2)
\[
z_1 = D_1 + D_2 - \frac{2G(0)}{v(0)} x(0) + \frac{2G(1)}{v(1)} x(1) - 2 \int_0^1 \frac{x(t)}{v(t)} G'(t)\, dt,
\]
\[
z_2 = \hat D_{1;0} + \hat D_{2;0} - \frac{2\hat G_0(0)}{\hat v_0(0)} x(0) + \frac{2\hat G_0(1)}{\hat v_0(1)} x(1) - 2 \int_0^1 \frac{x(t)}{\hat v_0(t)} \hat G_0'(t)\, dt,
\]
and $z = \lambda z_1 + (1 - \lambda)z_2$ for some $\lambda \in [0,1]$. The subscript 0 in the expression of $z_2$ means that the estimation is carried out only with the sample from $P_0$. Consequently,
\[
E\left( \left| \frac{dP_{m_0,\Gamma}}{dP_{m_1,\Gamma}}(X) - \frac{dP_{\hat m_0,\hat\Gamma_0}}{dP_{\hat m_1,\hat\Gamma_0}}(X) \right| \,\Big|\, \mathcal{X}_n \right)
\le E\left( e^{|Z_1| + |Z_2|} \left[ |D_1 - \hat D_{1;0}| + |D_2 - \hat D_{2;0}|
+ 2\left| \frac{G(0)}{v(0)} - \frac{\hat G_0(0)}{\hat v_0(0)} \right| |X(0)|
+ 2\left| \frac{G(1)}{v(1)} - \frac{\hat G_0(1)}{\hat v_0(1)} \right| |X(1)|
+ 2 \int_0^1 |X(t)| \left| \frac{G'(t)}{v(t)} - \frac{\hat G_0'(t)}{\hat v_0(t)} \right| dt \right] \,\Big|\, \mathcal{X}_n \right) \quad (38)
\]
\[
\le \kappa \left\{ |D_1 - \hat D_{1;0}|\, E\big( e^{A\|X\|} \,\big|\, \mathcal{X}_n \big)
+ \left( |D_2 - \hat D_{2;0}| + 2 \max_{t=0,1} \left| \frac{G(t)}{v(t)} - \frac{\hat G_0(t)}{\hat v_0(t)} \right| \right. \right. \quad (39)
\]
\[
\left. \left. + 2 \int_0^1 \left| \frac{G'(t)}{v(t)} - \frac{\hat G_0'(t)}{\hat v_0(t)} \right| dt \right)
E\big( \|X\| e^{A\|X\|} \,\big|\, \mathcal{X}_n \big) \right\} \quad (40)
\]
where $\kappa = \exp(|D_1| + |\hat D_{1;0}|)$ and
\[
A = \max\left( |D_2| + |\hat D_{2;0}|,\; \left\| \frac{G}{v} \right\| + \left\| \frac{\hat G_0}{\hat v_0} \right\|,\; \left\| \frac{G'}{v} \right\| + \left\| \frac{\hat G_0'}{\hat v_0} \right\| \right).
\]
Using Propositions 1 and 2 we obtain that the conditional expectations appearing in (39) and (40) are bounded in probability. Then
\[
E\left( \left| \frac{dP_{m_0,\Gamma}}{dP_{m_1,\Gamma}}(X) - \frac{dP_{\hat m_0,\hat\Gamma_0}}{dP_{\hat m_1,\hat\Gamma_0}}(X) \right| \,\Big|\, \mathcal{X}_n \right)
= O_P\left( \max_{j=1,2} |D_j - \hat D_{j;0}| \right)
+ O_P\left( \max_{t=0,1} \left| \frac{G(t)}{v(t)} - \frac{\hat G_0(t)}{\hat v_0(t)} \right| \right)
+ O_P\left( \int_0^1 \left| \frac{G'(t)}{v(t)} - \frac{\hat G_0'(t)}{\hat v_0(t)} \right| dt \right).
\]
To find the convergence rates to 0 of these last three terms we use the expressions of $D_1$, $D_2$ and $G$ appearing in Theorem 2.
Some straightforward computations yield $|D_1 - \hat D_{1;0}| = O_P(\|\hat v_0' - v'\|)$, $|D_2 - \hat D_{2;0}| = O_P(\|\hat v_0 - v\|)$,
\[
\max_{t=0,1} \left| \frac{G(t)}{v(t)} - \frac{\hat G_0(t)}{\hat v_0(t)} \right| = O_P(\|\hat v_0' - v'\|)
\quad \text{and} \quad
\int_0^1 \left| \frac{G'(t)}{v(t)} - \frac{\hat G_0'(t)}{\hat v_0(t)} \right| dt = O_P(\|\hat v_0'' - v''\|).
\]
Thus we get
\[
E\left( \left| \frac{dP_{m_0,\Gamma}}{dP_{m_1,\Gamma}}(X) - \frac{dP_{\hat m_0,\hat\Gamma_0}}{dP_{\hat m_1,\hat\Gamma_0}}(X) \right| \,\Big|\, \mathcal{X}_n \right) = O_P(\|\hat v_0'' - v''\|). \quad (41)
\]
Let us now focus on the last term of (37). The analysis is similar to the one carried out above. On the one hand, for any $x$ it holds that
\[
\frac{dP_{\hat m_0,\hat\Gamma_0}}{dP_{\hat m_1,\hat\Gamma_0}}(x) \le \kappa\, e^{2B\|x\|},
\]
where $B = \max\big( |\hat D_{2;0}|, \|\hat G_0/\hat v_0\|, \|\hat G_0'/\hat v_0\| \big)$. On the other hand, for any $x$ it also holds that
\[
\left| 1 - \frac{dP_{\hat m_1,\hat\Gamma_0}}{dP_{\hat m_1,\hat\Gamma_1}}(x) \right|
\le |C_1 - \hat C_1| + \frac{1}{2} \hat C_1 e^{\Lambda\|x\|^2}
\left( |\hat C_3|\, x^2(0) + |\hat C_2|\, x^2(1) + \int_0^1 x^2(t) \frac{|\hat F'(t)|}{\hat v_0(t)\hat v_1(t)}\, dt \right) \quad (42)
\]
\[
\le |C_1 - \hat C_1| + \hat C_1 \Lambda\, e^{\Lambda\|x\|^2} \|x\|^2,
\]
where $\Lambda = \big( |\hat C_3| + |\hat C_2| + \int_0^1 |\hat F'|/(\hat v_0 \hat v_1) \big)/2$. Consequently,
\[
E\left( \frac{dP_{\hat m_0,\hat\Gamma_0}}{dP_{\hat m_1,\hat\Gamma_0}}(X)
\left| 1 - \frac{dP_{\hat m_1,\hat\Gamma_0}}{dP_{\hat m_1,\hat\Gamma_1}}(X) \right| \,\Big|\, \mathcal{X}_n \right)
\le \kappa \left\{ |C_1 - \hat C_1|\, E\big( e^{2B\|X\|} \,\big|\, \mathcal{X}_n \big)
+ \hat C_1 \Lambda\, E\big( \|X\|^2 e^{2B\|X\| + \Lambda\|X\|^2} \,\big|\, \mathcal{X}_n \big) \right\}. \quad (43)
\]
The conditional expectations in (43) and $\hat C_1$ are $O_P(1)$. The term $\Lambda$ is $O_P(\max_{j=0,1} \|\hat v_j'' - v''\|)$. The difference $|C_1 - \hat C_1|$ is $O_P(\max_{j=0,1} \|\hat v_j - v\|)$. Thus the term in (43) is $O_P(\max_{j=0,1} \|\hat v_j'' - v''\|)$. This, together with (41) and Proposition 2 (a), yields the desired result. $\Box$

References

Abraham, C., Biau, G. and Cadre, B. (2006). On the kernel rule for function classification. Ann. Inst. Stat. Math. 58, 619–633.

Araujo, A. and Giné, E. (1980). The Central Limit Theorem for Real and Banach Valued Random Variables. Wiley.

Audibert, J.Y. and Tsybakov, A.B. (2007). Fast learning rates for plug-in classifiers. Ann.
Statist. 35, 608–633.

Baíllo, A., Cuevas, A. and Fraiman, R. (2009). Classification methods for functional data. To appear in Oxford Handbook on Statistics and FDA, F. Ferraty and Y. Romain, eds. Oxford University Press.

Cérou, F. and Guyader, A. (2006). Nearest neighbor classification in infinite dimension. ESAIM Probab. Stat. 10, 340–355.

Cuevas, A., Febrero, M. and Fraiman, R. (2004). An anova test for functional data. Comput. Statist. Data Anal. 47, 111–122.

Cuevas, A., Febrero, M. and Fraiman, R. (2006). On the use of the bootstrap for estimating functions with functional data. Comput. Statist. Data Anal. 51, 1063–1074.

Devroye, L., Györfi, L. and Lugosi, G. (1996). A Probabilistic Theory of Pattern Recognition. Springer, New York.

Duda, R.O., Hart, P.E. and Stork, D.G. (2000). Pattern Classification, 2nd edition. Wiley.

Ferraty, F. and Vieu, P. (2006). Nonparametric Functional Data Analysis: Theory and Practice. Springer.

Feldman, J. (1958). Equivalence and perpendicularity of Gaussian processes. Pacific J. Math. 8, 699–708.

Fisher, R.A. (1936). The use of multiple measurements in taxonomic problems. Annals of Eugenics 7, 179–188.

Folland, G.B. (1999). Real Analysis: Modern Techniques and their Applications. Wiley, New York.

Gautschi, W. (1997). Numerical Analysis: An Introduction. Birkhäuser, Boston.

Hand, D. (2006). Classifier technology and the illusion of progress. Statist. Sci. 21, 1–34.

Hastie, T., Tibshirani, R. and Friedman, J. (2001). The Elements of Statistical Learning. Springer, New York.

Herzeg, D. and Cvetković, L. (1986). On a numerical differentiation. SIAM J. Numer. Anal. 23, 686–691.

James, G.M. and Hastie, T.J. (2001). Functional linear discriminant analysis for irregularly sampled curves. J. Roy. Statist. Soc. Ser. B 63, 533–550.

Jørsboe, O.G. (1968).
                                             k-NN     k-NN PLS   Nonpar. plug-in   Param. plug-in   Bayes rule
Two Brownian motions
  Deterministic at t = 0 (θ0 = θ1 = 0)
    c = 1.5, σ = 1                           0.68      0.73          0.71              0.77            0.77
                                            (0.07)    (0.07)        (0.16)            (0.06)          (0.06)
    c = 3, σ = 1                             0.90      0.91          0.86              0.93            0.93
                                            (0.05)    (0.05)        (0.16)            (0.04)          (0.03)
    c = 2, σ = 2                             0.60      0.64          0.64              0.69            0.69
                                            (0.08)    (0.08)        (0.16)            (0.07)          (0.06)
  Random at t = 0 (θ0, θ1 ≠ 0)
    c = 1.5, σ = 1, θ0 = θ1 = 1              0.67      0.66          0.71              0.77            0.77
                                            (0.07)    (0.08)        (0.08)            (0.07)          (0.06)
    c = 1.5, σ = 1, θ0 = θ1 = 0.5            0.67      0.70          0.72              0.77            0.77
                                            (0.07)    (0.08)        (0.08)            (0.06)          (0.06)
Two Ornstein-Uhlenbeck processes
  Deterministic at t = 0
    β0 = 1, η0 = 0, σ0 = 1;
    β1 = 1, η1 = 1                           0.54      0.58          0.60              0.63            0.62
                                            (0.08)    (0.08)        (0.14)            (0.07)          (0.07)
    β0 = 0.4, η0 = 0, σ0 = 0.4;
    β1 = 1, η1 = 1                           0.83      0.86          0.82              0.88            0.88
                                            (0.09)    (0.06)        (0.16)            (0.05)          (0.05)
  Random at t = 0
    β0 = 0.5, η0 = 0, σ0 = 1;
    β1 = 1, η1 = 0.5                         0.59      0.60          0.63              0.63            0.64
                                            (0.13)    (0.11)        (0.14)            (0.07)          (0.14)
    β0 = 0.5, η0 = 0, σ0 = 2;
    β1 = 1, η1 = 2                           0.69      0.72          0.74              0.74            0.74
                                            (0.11)    (0.10)        (0.11)            (0.07)          (0.09)

Table 1: Results of the Monte Carlo study.

    k-NN     k-NN PLS   Nonpar. plug-in
    0.79      0.66          0.85

Table 2: Proportion of correctly classified for the transformed cell data.

[Figure omitted. Panels: (a) c = 1.5, σ = 1; (b) c = 1.5, σ = 1, θ0 = θ1 = 1; (c) β0 = 0.4, η0 = 0, σ0 = 0.4, β1 = 1, η1 = 1; (d) β0 = 0.5, η0 = 0, σ0 = 2, β1 = 1, η1 = 2.]

Figure 1: Some trajectories (P0 in gray and P1 in dotted black) of the processes used in the Monte Carlo study. In (a) and (b) we have two Brownian motions and in (c) and (d) the processes are Ornstein-Uhlenbeck. In (a) and (c) X(0) | Y = i is 0 and in (b) and (d) it is random.

[Figure omitted. Both panels: x-axis "Time in seconds", y-axis "MCO".]

Figure 2: Cell data (control group in grey and treatment group in black): (a) all the original observations; (b) sample means.
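The simulation settings above can be illustrated with a minimal sketch of the k-NN functional classifier applied to discretized curves. This is not the paper's implementation: the two-class Brownian model below (class P1 obtained by adding a linear trend with slope c to a Brownian motion with dispersion σ) is an illustrative stand-in for the Monte Carlo designs of Table 1, and all function names (`brownian_paths`, `knn_classify`) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def brownian_paths(n, grid, sigma=1.0, trend=0.0):
    """Simulate n Brownian trajectories on 'grid', plus an optional
    linear trend t -> trend * t (illustrative mean difference)."""
    dt = np.diff(grid, prepend=0.0)               # dt[0] = 0 since grid[0] = 0
    incr = sigma * np.sqrt(dt) * rng.standard_normal((n, grid.size))
    return np.cumsum(incr, axis=1) + trend * grid

def knn_classify(train_X, train_y, test_X, k=5):
    """k-NN rule using the discretized L2 distance between curves."""
    preds = []
    for x in test_X:
        d = np.sqrt(np.mean((train_X - x) ** 2, axis=1))  # approximate L2 norm
        nearest = train_y[np.argsort(d)[:k]]              # labels of k nearest curves
        preds.append(int(nearest.mean() > 0.5))           # majority vote
    return np.array(preds)

grid = np.linspace(0, 1, 100)
n = 100
X0 = brownian_paths(n, grid, sigma=1.0, trend=0.0)   # class P0
X1 = brownian_paths(n, grid, sigma=1.0, trend=1.5)   # class P1 (trend c = 1.5, assumed role of c)
train_X = np.vstack([X0, X1])
train_y = np.repeat([0, 1], n)

T0 = brownian_paths(50, grid, sigma=1.0, trend=0.0)
T1 = brownian_paths(50, grid, sigma=1.0, trend=1.5)
test_X = np.vstack([T0, T1])
test_y = np.repeat([0, 1], 50)

acc = (knn_classify(train_X, train_y, test_X) == test_y).mean()
```

Under this toy design the k-NN rule performs clearly better than chance, in the same range as the k-NN column of Table 1; the plug-in rules studied in the paper would additionally exploit the explicit form of the optimal rule, which this sketch does not.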