Piecewise Convex Function Estimation and Model Selection

Kurt S. Riedel

Abstract. Given noisy data, function estimation is considered when the unknown function is known a priori to consist of a small number of regions where the function is either convex or concave. When the regions are known a priori, the estimate is reduced to a finite dimensional convex optimization in the dual space. When the number of regions is unknown, the model selection problem is to determine the number of convexity change points. We use a pilot estimator based on the expected number of false inflection points.

§1. Introduction

Our basic tenet is: "Most real world functions are piecewise $\ell$-convex with a small number of change points of convexity." Given $N$ measurements of the unknown function, $f(t)$, contaminated with random noise, we seek to estimate $f(t)$ while preserving the geometric fidelity of the estimate, $\hat f(t)$, with respect to the true function. In other words, the number and location of the change points of convexity of $\hat f(t)$ should approximate those of $f(t)$.

We say that $f(t)$ has $k$ change points of $\ell$-convexity with change points $x_1 \le x_2 \le \ldots \le x_k$ if $(-1)^{j-1} f^{(\ell)}(t) \ge 0$ for $x_j \le t \le x_{j+1}$. For $\ell = 0$, $f(t)$ is nonnegative and for $\ell = 1$, the function is nondecreasing. In regions where the constraint of $\ell$-convexity is active, $f^{(\ell)}(t) = 0$ and $f(t)$ is a polynomial of degree $\ell - 1$. For 1-convexity, $f(t)$ is constant in the active constraint regions and for 2-convexity, the function is linear. Our subjective belief is that most people prefer smoothly varying functions such as quadratic or cubic polynomials even in the active constraint regions. Thus, piecewise 3-convexity or 4-convexity are also reasonable hypotheses.
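The definition of a change point of $\ell$-convexity can be checked numerically on sampled, noise-free data. The sketch below is only an illustration and not part of the original method: it assumes a uniform grid, approximates $f^{(\ell)}$ by the $\ell$-th finite difference, and counts sign changes. The function name `count_change_points` and the tolerance `tol` are hypothetical choices.

```python
import numpy as np

def count_change_points(values, ell, tol=1e-12):
    """Count sign changes of the ell-th finite difference of sampled values.

    Assumes the samples lie on a uniform grid, so the ell-th difference
    is proportional to a finite-difference estimate of f^(ell).
    """
    d = np.diff(np.asarray(values, dtype=float), n=ell)  # n=0 returns the input
    # Drop (near-)zero entries: they correspond to active-constraint regions
    # where f^(ell) = 0 and carry no sign information.
    signs = np.sign(d[np.abs(d) > tol])
    return int(np.sum(signs[1:] != signs[:-1]))

# Hypothetical test functions on [0, 1].
t = np.linspace(0.0, 1.0, 201)
sine = np.sin(2 * np.pi * t)
cubic = (t - 0.3) ** 3

print(count_change_points(sine, ell=1))   # sign changes of the first difference
print(count_change_points(cubic, ell=2))  # sign changes of the second difference
```

With noisy data the raw differences change sign many times, which is why the paper works with smoothed derivative estimates instead.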
The idea of constraining the function fit to preserve $\ell$-convexity properties has been considered by a number of authors. The more difficult problems of determining the number and location of the $\ell$-convexity breakpoints will be a focus of this article. We refer to the estimation of the number of change points as the "model selection problem" because it resembles model selection in an infinite family of parametric models.

Approximation Theory VIII, Charles K. Chui and Larry L. Schumaker (eds.), pp. 0-1. Copyright 1995 by World Scientific Publishing Co., Inc. All rights of reproduction in any form reserved. ISBN 0-12-xxxxxx-x

§2. Convex Analysis

In this section, we assume that the change points $\{x_1, \ldots, x_k\}$ of $\ell$-convexity are given and that the function is in the Sobolev space $W_{m,p}[0,1]$ with $m \ge \ell$ and $1 < p < \infty$, where

$$W_{m,p} = \{ f \mid f^{(m)} \in L_p[0,1] \text{ and } f, f', \ldots, f^{(m-1)} \text{ absolutely continuous} \}.$$

We decompose $W_{m,p}$ into a direct sum of the space of polynomials of degree $m-1$, $P_{m-1}$, plus the set of functions whose first $m-1$ derivatives vanish at $t = 0$, which we denote by $W^0_{m,p}$ [10]. Given change points $\{x_1, x_2, \ldots, x_k\}$, we define the closed convex cone

$$V^{k,\ell}_{m,p}[x_1, \ldots, x_k] = \{ f \in W_{m,p} \mid (-1)^{j-1} f^{(\ell)}(t) \ge 0 \text{ for } x_{j-1} \le t < x_j \}.$$

Let $x$ denote the $k$ row vector $(x_1, x_2, \ldots, x_k)$. We define the class of functions with at most $k$ change points as

$$V^{k,\ell}_{m,p} \equiv \bigcup_{x_1 \le x_2 \le \ldots \le x_k} \left( V^{k,\ell}_{m,p}[x_1, \ldots, x_k] \cup \left( -V^{k,\ell}_{m,p}[x_1, \ldots, x_k] \right) \right).$$

By allowing $x_{k'} = x_{k'+1}$, we have embedded $V^{k,\ell}_{m,p}$ into $V^{k+1,\ell}_{m,p}$. $V^{k,\ell}_{m,p}$ is the union of convex cones, and is closed but not convex. For the case $p = \infty$, similar piecewise $\ell$-convex classes are defined in [2].
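A small worked example may make the non-convexity of $V^{k,\ell}_{m,p}$ concrete. It is not taken from the paper: it assumes $k = 1$ and $\ell = m = 2$ on $[0,1]$, and reads the cone definition with the endpoints $x_0 = 0$ and $x_2 = 1$. The functions $f_1$ and $f_2$ are hypothetical and are specified only through their second derivatives.

```latex
% Illustrative example (an assumption of this note, not from the source):
% k = 1, \ell = m = 2 on [0,1], with x_0 = 0 and x_2 = 1.
\[
f_1''(t) = \begin{cases} 3, & 0 \le t < 1/3, \\ -1, & 1/3 \le t \le 1, \end{cases}
\qquad
f_2''(t) = \begin{cases} -1, & 0 \le t < 2/3, \\ 3, & 2/3 \le t \le 1. \end{cases}
\]
% f_1 lies in V^{1,2}_{2,p}[1/3] and f_2 lies in -V^{1,2}_{2,p}[2/3],
% so both belong to the union V^{1,2}_{2,p}.
\[
\tfrac{1}{2}\bigl(f_1'' + f_2''\bigr)(t) =
\begin{cases} 1, & 0 \le t < 1/3, \\ -1, & 1/3 \le t < 2/3, \\ 1, & 2/3 \le t \le 1. \end{cases}
\]
% The midpoint has two sign changes in its second derivative, so it lies in
% V^{2,2}_{2,p} but not in V^{1,2}_{2,p}: the class is not convex.
```

Each cone with fixed change points is convex; convexity is lost only when the union over change point locations and signs is taken.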
To decompose $W_{m,p}$ in terms of $V^{k,\ell}_{m,p}$, we require that each function in $W_{m,p}$ has a piecewise continuous $\ell$-th derivative. By the Sobolev embedding theorem, this corresponds to the case $m \ge \ell + 1$. Let $\|f\|^p_{j,p} \equiv \int_0^1 |f^{(j)}(t)|^p \, dt$. We endow $W_{m,p}$ with the norm:

$$|\!|\!| f |\!|\!|^p_{m,p} = \sum_{j=0}^{m-1} |f^{(j)}(t = 0)|^p + \|f\|^p_{m,p}.$$

The dual space of $W_{m,p}$ is isomorphic to the direct sum of $P_{m-1}$ and $W^0_{m,q}$ with $q = p/(p-1)$ and the duality pairing:

$$\langle\langle g, f \rangle\rangle = \sum_{j=0}^{m-1} b_j a_j + \int_0^1 f^{(m)}(t) \, g^{(m)}(t) \, dt. \tag{2.1}$$

In (2.1), $f \in W_{m,p}$, $g \in W_{m,q}$, $a_j \equiv f^{(j)}(0)$ and $b_j \equiv g^{(j)}(0)$. We denote the duality pairing by $\langle\langle \cdot \rangle\rangle$ and the $L_2$ inner product by $\langle \cdot \rangle$. The space $W_{m,p}$ has a reproducing kernel, $R(t,s)$, such that for each $t$, $f(t) = \langle\langle R_t, f \rangle\rangle$ [10]. A linear operator, $L_i^* \in W^*_{m,p}$, has representations $L_i f = \langle\langle L_i R(\cdot, s), f(s) \rangle\rangle$ and $L_i f = \langle L_i \delta(s - \cdot), f(s) \rangle$. We are given $n$ measurements of $f(t)$:

$$y_i = L_i f + \epsilon_i = \langle h_i, f \rangle + \epsilon_i = \langle\langle m_i, f \rangle\rangle + \epsilon_i, \tag{2.2}$$

where $L_i R(\cdot, s)$ are linear operators in $W^{\perp}_{m,p}$, and the $\epsilon_i$ are independent, normally distributed random variables with variance $\sigma_i^2 > 0$. We represent $L_i$ as $m_i(s) \equiv L_i R(\cdot, s)$ and $h_i(s) \equiv L_i \delta(s - \cdot)$ and assume $h_i \in W^{\perp}_{\ell,1}$. In the standard case where $y_i = f(t_i) + \epsilon_i$, $m_i(s) = R(t_i, s)$ and $h_i(s) = \delta(s - t_i)$.

A robustified estimate of $f(t)$ given the measurements $\{y_i\}$ is $\hat f \equiv \operatorname{argmin} \mathrm{VP}[f \in V^{k,\ell}_{m,p}[x_1, \ldots, x_k]]$:

$$\mathrm{VP}[f] \equiv \frac{\lambda}{p} \int |f^{(m)}(s)|^p \, ds + \sum_{i=1}^{N} \psi_i(\langle h_i, f \rangle - y_i), \tag{2.3}$$

where the $\psi_i$ are strictly convex, continuous functions. The standard case is $p = 2$ and $\psi_i(y_i - \langle h_i, f \rangle) = |y_i - f(t_i)|^2 / n\sigma_i^2$. The set of $\{h_i, i = 1, \ldots, N\}$ separate polynomials of degree $m-1$ means that $\langle h_i, \sum_{k=0}^{m-1} c_k t^k \rangle = 0$, $\forall i$ implies $c_k \equiv 0$.

Theorem 1. Let $\{h_i\}$ separate polynomials of degree $m-1$, then the minimization problem (2.3) has a unique solution in $V^{k,\ell}_{m,p}[x]$ and the minimizing function is in $C^{2m-\ell-2}$ and satisfies the differential equation:

$$(-1)^m \frac{d^m}{dt^m} \left[ |\hat f^{(m)}|^{p-2} \hat f^{(m)}(t) \right] + \sum_{i=1}^{n} \psi_i'(\langle h_i, \hat f \rangle - y_i) \, h_i(t) = 0, \tag{2.4}$$

in those regions where $|f^{(\ell)}| > 0$ for $1 < p < \infty$.

Proof: The functional (2.3) is strictly convex, lower semicontinuous and coercive, so by Theorem 2.1.2 of Ekeland and Temam, it has a unique minimum, $f_0$, on any closed convex set. From the generalized calculus of convex analysis, the solution satisfies

$$0 \in (-1)^m \frac{d^m}{dt^m} \left[ |f^{(m)}|^{p-2} f^{(m)}(t) \right] + \sum_i \psi_i'(\langle h_i, \hat f \rangle - y_i) \, h_i(t) + \partial N_V(f) \tag{2.4}$$

where $N_V(f)$ is the normal cone of $V^{k,\ell}_{m,p}[x]$ at $f$ [1, p. 189]. From [9], each element of $N_V(f)$ is the limit of a discrete sum: $\sum_t a_t \delta^{(\ell)}(\cdot - t)$ where the $t$'s are in the active constraint region. Integrating (2.4) yields

$$|f^{(m)}|^{p-2} f^{(m)}(t) = \sum_{i=1}^{n} \psi_i'(\langle h_i, \hat f \rangle - y_i) \, \frac{\langle h_i(s), (s-t)_+^{m-1} \rangle}{(m-1)!} + \int \frac{(s-t)_+^{m-\ell-1} \, d\mu(s)}{(m-\ell-1)!}, \tag{2.5}$$

where $d\mu$ corresponds to a particular element of $N_V(f)$. Since $(s-t)_+^{m-\ell-1}$ is $m-\ell-2$ times differentiable, the right hand side of (2.5) is $m-\ell-2$ times differentiable. Integrating (2.5) yields $f \in C^{2m-\ell-2}$.

The intervals on which $f^{(\ell)}(t)$ vanishes are unknowns and need to be found as part of the optimization. Using the differential characterization (2.4) loses the convexity properties of the underlying functional. For this reason, extremizing the dual functional is now preferred.

Theorem 2. The dual variational problem is: Minimize over $\alpha \in \mathbb{R}^n$

$$\mathrm{VP}^*[\alpha; x] \equiv \frac{\lambda^{1-q}}{q} \int \left| [P_x^* M\alpha]^{(m)}(s) \right|^q ds + \sum_{i=1}^{n} \psi_i^*(\alpha_i) - \alpha_i y_i, \tag{2.6}$$

where $M\alpha(t) \equiv \sum_i m_i(t) \alpha_i$ and $\psi_i^*$ is the Fenchel/Legendre transform of $\psi_i$. The dual projection $P_x^*$ is defined as

$$\int \left| [P_x^* g]^{(m)}(s) \right|^q ds \equiv \inf_{\tilde g \in V^-} \int_0^1 \left| g^{(m)} - \tilde g^{(m)}(s) \right|^q, \tag{2.7}$$

where the minimization is over $\tilde g$ in the dual cone subject to $g^{(j)}(0) = \tilde g^{(j)}(0)$, $0 \le j < m$. The dual problem is strictly convex and its minimum is the negative of the infimum of (2.3).

Proof: Let $\psi_V$ be the indicator function of $V^{k,\ell}_{m,p}[x]$ and define

$$U(f) = \frac{\lambda}{p} \int_0^1 |f^{(m)}(s)|^p \, ds + \psi_V(f). \tag{2.8}$$

We claim that the Legendre transform of $U(f)$ is the first term in (2.6). Note that $\psi_V^*(g) = \psi_{V^-}(g)$, the indicator function of the dual cone $V^-$. The Legendre transform of the first term in (2.8) is $U_1^*(g) = \frac{\lambda^{1-q}}{q} \int_0^1 |g^{(m)}(s)|^q \, ds$ for $g \in W^0_{m,q}$, and $\infty$ otherwise. Our claim follows from $[U_1 + U_2]^*(g) = \inf_{g'} \{ U_1^*(g - g') + U_2^*(g') \}$. The remainder of the theorem follows from the general duality theorem of Aubin and Ekeland [1, p. 221].

For the case $\ell = m$, the minimization over the dual cone can be done explicitly. For $\ell < m$, Theorem 1 is proven in [9] and Theorem 2 is proven in [6] for the case $p = 2$ and $\psi(y) = y^2$. Equation (2.5) and the corresponding smoothness results appear in [9] for the case $\ell = 1$, $p = 2$ and $L_i = \delta(t - t_i)$.

§3. Change point estimation

When the number of change points is fixed, but the locations are unknown, we can estimate them by minimizing the functional in (2.3) with respect to the change point locations. We now show that there exists a set of minimizing change points.

Theorem 3. For each $k$, there exist change points $\{x_j, j = 1, \ldots, k\}$ that minimize the variational problem (2.3).

Proof: We use the dual variational problem (2.6) and maximize over $x \in [0,1]^k$ after minimizing over the $\alpha \in \mathbb{R}^N$. The functional (2.6) is jointly continuous in $\alpha, x$ and convex in $\alpha$. Theorem 3 follows from the min-max theorem [1, p. 296].

The change point locations need not be unique. The proof requires $\le$ instead of $<$ in the ordering $x_j \le x_{j+1}$ to make the change point space compact. When $x_j = x_{j+1}$, the number of effective change points is less than $k$. Finding the $x$ that minimizes $\mathrm{VP}^*$ is computationally intensive and requires the solution of a convex programming problem at each step.

Theorems 1-3 are valid when $\ell \le m$ including $\ell = m$. Restricting to $p = 2$, we have the following theorem from [9]:

Theorem 4. [Utreras] Let $f$ be in a closed convex cone, $V \subseteq W_{m,2}$, let $\hat f_u$ be the unconstrained minimizer of (2.3) given $y_i$ and $\hat f_c$ be the constrained minimizer (with $p = 2$ and $\psi_i(y) = |y|^2/\sigma_i^2$). Then $\|f - \hat f_c\|_V \le \|f - \hat f_u\|_V$ where $\|f\|_V^2 \equiv \frac{\lambda}{2} \int |f^{(m)}(s)|^2 \, ds + \sum_{i=1}^{N} \psi_i(L_i f)$.

Theorem 4 shows that if one is certain that $f$ is in a particular closed convex cone, the constrained estimate is always better than the unconstrained one. Unfortunately Theorem 4 does not generalize to unions of convex cones and thus does not apply to $V^{k,\ell}_{m,2}$.

§4. Number of false inflection points

We now consider unconstrained estimates of $f(t)$ and examine the number of false $\ell$-inflection points. We assume that the noisy measurements of $f$ occur at nearly regularly spaced locations, $t_i$ (with $h_i(t) = \delta(t - t_i)$).
Specifically, we assume that $d_n \equiv \sup_t \{F_n(t) - F(t)\}$ tends to zero as $n^{-b}$ with $b \ge 0$ where $F_n(t)$ is the empirical distribution of the $t_i$ and $F(t)$ is the limiting distribution. For regularly spaced points, $d_n \sim 1/n$. This nearly regularly spaced assumption allows us to approximate the discrete sums over the $t_i$ by integrals.

A smoothing kernel estimate of $f^{(\ell)}(t)$ is a weighted average of the $y_i$:

$$\widehat{f^{(\ell)}}(t) = \sum_i y_i \, \kappa\!\left( \frac{t - t_i}{h_n} \right) \left[ \frac{t_{i+1} - t_{i-1}}{2} \right], \tag{4.1}$$

where $h_n$ is the kernel halfwidth and $\kappa$ is the kernel. $\kappa$ is required to satisfy the moment conditions: $\int_{-1}^{1} s^j \kappa(s) \, ds = \ell! \, \delta_{j,\ell}$, $0 \le j < \ell + 2$, with $\kappa \in C^2[-1,1]$ and $\kappa(\pm 1) = \kappa'(\pm 1) = 0$. We call such functions $C^2$ extended kernels. When $f \in C^m$, the optimal halfwidth scales as $h_n \sim n^{-1/(2m+1)}$, and the optimal spline smoothing parameter scales as $\lambda_n \sim n^{-2m/(2m+1)}$.

In [5], Mammen et al. derive the number of false inflection points for kernel estimation of a probability density. We present the analogous result for regression function estimation. The proofs in our case are easier because we need only show that discrete sums converge to their limits. Our results are for arbitrary $\ell$ while [4, 5] considered $\ell = 1, 2$.

Theorem 5. (Analog of [4, 5]) Let $f(t) \in C^{\ell+1}[a,b]$ have $K$ $\ell$-inflection points $\{x_1, \ldots, x_K\}$ with $f^{(\ell+1)}(x_j) \ne 0$, $f^{(\ell)}(a) \ne 0$ and $f^{(\ell)}(b) \ne 0$. Consider a sequence of kernel smoother estimates with $C^2$ extended kernels. Let the sequence of kernel halfwidths, $h_n$, satisfy $0 < \liminf_n h_n n^{1/(2\ell+3)} \le \limsup_n h_n n^{1/(2\ell+3)} < \infty$, then the expected number of $\ell$-inflection points is

$$E[\hat K] - K = 2 \sum_{j=1}^{K} H\!\left( \sqrt{ \frac{ n h^{2\ell+3} \, |f^{(\ell+1)}(x_j)|^2 }{ \sigma^2 \, \|\kappa^{(\ell+1)}\|^2 \, F'(x_j) } } \right), \tag{4.2}$$

where $\sigma^2 = \mathrm{Var}[\epsilon_i]$, $H(z) \equiv \phi(z)/z + \Phi(z) - 1$ with $\phi$ and $\Phi$ being the Gaussian density and distribution, provided that $d_n < n^{-1/2}$.

Proof: The proof consists of applying the Cramér-Leadbetter zero-crossing formula to (4.1) and then taking the limit as $n \to \infty$.

Theorem (Cramér-Leadbetter) Let $N$ be the number of zero crossings of a differentiable Gaussian process, $Z(t)$, in the time interval $[0,T]$. Then

$$E[N] = \int_0^T \frac{\gamma(s) \rho(s)}{\sigma(s)} \, \phi\!\left( \frac{m(s)}{\sigma(s)} \right) G(\eta(s)) \, ds, \tag{4.3}$$

where $\sigma^2(s) = \mathrm{Var}[Z(s)]$, $\gamma^2(s) = \mathrm{Var}[Z'(s)]$, $\mu(s) = \mathrm{Corr}[Z(s) Z'(s)]$, $\rho(s)^2 = 1 - \mu(s)^2$, $m(s) = E[Z(s)]$, $\eta(s) = \dfrac{m'(s) - \gamma(s)\mu(s)m(s)/\sigma(s)}{\gamma(s)\rho(s)}$, and $G(\eta) = 2\phi(\eta) + \eta\,(2\Phi(\eta) - 1)$.

We claim that for (4.1), $\sigma_n^2(s) \to \sigma^2 \|\kappa^{(\ell)}\|^2 F'(s)/nh^{2\ell+1}$, $\gamma_n^2(s) \to \sigma^2 \|\kappa^{(\ell+1)}\|^2 F'(s)/nh^{2\ell+3}$, $\mu_n(s) \to O_R(h_n + 1/nh^{2\ell+1})$, $\rho_n(s)^2 \to 1$, $m_n(s) \to f^{(\ell)}(s) + O(h_n + 1/nh^{\ell+1})$. To show the convergence of the discrete sums to integrals, we use $\int g(s) \, ds = \sum_i g(t_i)[t_{i+1} - t_{i-1}]/2 + O_R(\sup_i [t_{i+1} - t_{i-1}]^2/h_n^2)$ where $O_R$ denotes a relative size of $O$. More detailed proofs of the convergence of the discrete sums to integrals can be found in [2]. Since the integrand in (4.3) is bounded and converging pointwise, the dominated convergence theorem shows that the sequence of integrals given by (4.3) converges. Equation (4.2) follows by evaluating the integral using the method of steepest descent.

Corollary. Let $f(t) \in C^{\ell+1}[a,b]$, $d_n/h_n^{\ell+1} \to 0$ and $nh_n^{2\ell+3} \to \infty$ with $\kappa$ a $C^2$ extended kernel. The probability that $\widehat{f^{(\ell+1)}}$ has a false inflection point outside of a width of $\delta$ from the actual $(\ell+1)$-inflection points is $O(\exp(-nh_n^{2\ell+3}))$.
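Equation (4.2) is simple to evaluate once its ingredients are known. The following is a minimal sketch, assuming the user supplies $f^{(\ell+1)}(x_j)$, $F'(x_j)$, $\sigma$ and the squared kernel norm $\|\kappa^{(\ell+1)}\|^2$ (the squared norm matches the limit of $\gamma_n^2$ in the proof). The function names and argument names are hypothetical; only `math.erf` from the standard library is used for the Gaussian distribution function.

```python
import math

def gaussian_pdf(z):
    """Standard normal density phi(z)."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def gaussian_cdf(z):
    """Standard normal distribution function Phi(z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def H(z):
    """H(z) = phi(z)/z + Phi(z) - 1 for z > 0, as defined after (4.2)."""
    return gaussian_pdf(z) / z + gaussian_cdf(z) - 1.0

def expected_false_inflections(n, h, ell, deriv_at_x, sigma,
                               kappa_norm_sq, design_density_at_x):
    """Right-hand side of (4.2): 2 * sum_j H(z_j).

    deriv_at_x          -- values f^(ell+1)(x_j) at the true inflection points
    design_density_at_x -- values F'(x_j) at the same points
    kappa_norm_sq       -- squared L2 norm of kappa^(ell+1) (assumed given)
    """
    total = 0.0
    for d, dens in zip(deriv_at_x, design_density_at_x):
        z = math.sqrt(n * h ** (2 * ell + 3) * d * d
                      / (sigma ** 2 * kappa_norm_sq * dens))
        total += H(z)
    return 2.0 * total

# Hypothetical inputs: one inflection point, unit design density.
print(expected_false_inflections(n=1000, h=0.2, ell=1, deriv_at_x=[5.0],
                                 sigma=1.0, kappa_norm_sq=10.0,
                                 design_density_at_x=[1.0]))
```

Because $H(z)$ decreases rapidly as $z$ grows, widening the halfwidth $h$ drives the expected number of false inflection points toward zero. For very large $z$ the expression $\Phi(z) - 1$ suffers from cancellation, so a production version would use a complementary error function instead.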
For the case $p = 2$, the smoothing spline estimate is a linear estimate of the form $\widehat{f^{(\ell)}}(t) = \sum_i y_i \, g_{n,\lambda}(t, t_i)$ where $g_{n,\lambda}(t, t_i)$ solves the equation: $(-1)^m \lambda_n g^{(2m)}_{n,\lambda}(t,s) + \sum_i^n g_{n,\lambda}(t_i, s) = \delta(t - s)$, with the boundary conditions, $\partial_t^j g_{n,\lambda}(0,s) = 0 = \partial_t^j g_{n,\lambda}(1,s)$ for $m \le j < 2m$.

Theorem 6. [Silverman] Let $\lambda_n^{-1/2m} d_n \to 0$, $|F''(t)| < \infty$ and $0 < c_1 < F'(t) < c_2$. The Green's function, $g_{n,\lambda}(t, t_i)$, of the smoothing spline converges to a kernel function with the halfwidth, $h(t) = [\lambda F'(t)]^{1/2m}$:

$$h(t) \, \partial_s^j g_{n,\lambda}(t + h(t)s, t) \to \partial_s^j \kappa(s)/F'(t), \quad 0 \le j < m,$$

where the equivalent kernel satisfies $(-1)^m \kappa^{(2m)}(t) + \kappa(t) = \delta(t)$ with decay at infinity boundary conditions. The convergence is uniform in any closed subdomain, $t \in [\delta, 1 - \delta]$ and $t + h(t)s \in [0,1]$.

Although [7] considers only $m = 2$, the proof easily extends to $m > 2$. Using this convergence result, Theorem 5 also holds for smoothing splines:

Theorem 7. For a sequence of smoothing spline estimates of $f$ as given by Thm. 6, Eq. (4.2) holds provided that the smoothing parameters satisfy $0 < \liminf_n \lambda_n^{1/2m} n^{1/(2\ell+3)} \le \limsup_n \lambda_n^{1/2m} n^{1/(2\ell+3)} < \infty$ and $\ell < 2m - 5/2$.

§5. Data-based Pilot Estimators with Geometric Fidelity

We consider two step estimators that begin by estimating $f^{(\ell)}$ and $f^{(\ell+1)}$ using an unconstrained estimate with $h_n \sim \log^2(n) \, n^{-1/(2\ell+3)}$. In the second step, we perform a constrained fit, at some locations requiring $\widehat{f^{(\ell)}}$ to be monotone and in other regions requiring $\widehat{f^{(\ell-1)}}$ to be monotone. From the pilot estimate, we determine the number, $\hat k$, and approximate locations of the inflection points.
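Reading off the number and approximate locations of the inflection points from a pilot estimate amounts to locating the sign changes of the estimated derivative on a grid. The sketch below assumes the pilot derivative estimate has already been evaluated on a grid by whatever smoother is in use; the function name `locate_sign_changes` and the linear interpolation between grid points are illustrative choices, not a procedure stated in the paper.

```python
import numpy as np

def locate_sign_changes(grid, deriv_estimate):
    """Return interpolated locations where deriv_estimate changes sign.

    Each crossing between neighbouring grid points is located by linear
    interpolation. Grid values exactly equal to zero are not treated as
    crossings in this simple sketch.
    """
    t = np.asarray(grid, dtype=float)
    g = np.asarray(deriv_estimate, dtype=float)
    idx = np.where(g[:-1] * g[1:] < 0)[0]            # strict sign change
    slope = (g[idx + 1] - g[idx]) / (t[idx + 1] - t[idx])
    return t[idx] - g[idx] / slope

# Hypothetical stand-in for a pilot estimate of f^(ell): a smooth curve
# with zeros at 0.303 and 0.707.
grid = np.linspace(0.0, 1.0, 101)
pilot = (grid - 0.303) * (grid - 0.707)

x_hat = locate_sign_changes(grid, pilot)
k_hat = len(x_hat)
print(k_hat, x_hat)
```

The resulting `k_hat` and `x_hat` play the role of $\hat k$ and the empirical inflection points $\hat x_j$ in the second stage.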
At each empirical inflection point, $\hat x_j$, we define the $\alpha$ uncertainty interval by $[\hat x_j - z_\alpha \hat\sigma(\hat x_j), \; \hat x_j + z_\alpha \hat\sigma(\hat x_j)]$, where

$$\hat\sigma^2(\hat x_j) = \frac{ \sigma^2 \, \|\kappa^{(\ell)}\|^2 \, F'(\hat x_j) }{ |\widehat{f^{(\ell+1)}}(\hat x_j)|^2 \, n h^{2\ell+1} }$$

and $z_\alpha$ is the two sided $\alpha$-quantile for a normal distribution. If an even number of uncertainty intervals overlap, we constrain the fit such that $\widehat{f^{(\ell)}}$ is positive/negative in each interval. If an odd number of uncertainty intervals overlap, we constrain the fit such that $\widehat{f^{(\ell+1)}}$ is positive/negative in a subregion of the uncertainty interval which contains an even number of inflection points of $\widehat{f^{(\ell+1)}}$. (The sign of $\widehat{f^{(\ell)}}$ or $\widehat{f^{(\ell+1)}}$ is chosen to match the outer region.) Asymptotically, the uncertainty intervals do not overlap and we constrain the fit such that $\widehat{f^{(\ell+1)}}$ is positive/negative in each uncertainty interval.

Theorem 8. Consider a two stage estimator that with probability $1 - O(p_n)$ correctly chooses a closed convex cone $V$, with $f \in V$, in the first stage and then performs a constrained regression as in (2.3) with $p = 2$. For $f \in W_{m,2}$, under the restrictions of Thm 4.4 of [9], the estimate, $\hat f$, converges as

$$E\|\hat f - f\|_j^2 \sim \alpha_j \lambda^{(m-j)/m} \|f\|_m^2 + \beta_j \sigma^2 / (n \lambda^{(2j+1)/2m}),$$

where $n$ is large enough that $\lambda_n \|f\|_m^2 > p_n (1/n\sigma^2) \sum_i |f(t_i)|^2$ and $p_n n \lambda^{1/2m} \to 0$.

Proof: If the constraints are correct, Theorem 4 yields the asymptotic error bound [9]. We need to show that misspecified models do not contribute significantly to the error. If the model is misspecified, then

$$\|\hat f - f\|_V \le \lambda (\|f\|_m + \|\hat f\|_m) + (1/n\sigma^2) \sum_i |\hat f(t_i) - f(t_i) - \epsilon_i|^2 + \epsilon_i^2 \le \lambda \|f\|_m + (1/n\sigma^2) \sum_i (y_i^2 + \epsilon_i^2) \le \|f\|_V + 1.1 \, \chi_n^2(p_n)/n.$$
The expected error is $E\|\hat f - f\|_j^2 \le E_{f \in V}\|\hat f - f\|_j^2 + p_n E_{f \notin V}\|\hat f - f\|_j^2$. Note Theorem 4.4 of [9] applies to both pieces. Asymptotically as $p_n \to 0$, $\chi_n(p_n) \le 1.5n + O(p_n)$, where $\chi_n^2(p_n)$ is defined by $\int_\chi^\infty dp_{\chi_n} = p_n$.

A similar result is given in [3] for the case of constrained least squares. The trick of Theorem 8 is to constrain $\widehat{f^{(\ell+1)}}$ to be positive (or negative) in the uncertainty interval of the estimated inflection points rather than constraining $\widehat{f^{(\ell)}}$ to have a single zero around $\hat x_j$.

We recommend choosing the first stage halfwidth, $h_n$, proportional to the halfwidth chosen by generalized crossvalidation (GCV): $h_n = \iota(n) h_{GCV}$ where $\iota(n) = \log^2(n) \, n^{1/(2m+1) - 1/(2\ell+3)}$. The second stage smoothing parameter, $\lambda_n$, is chosen to be the GCV value $\lambda_n = \lambda_{GCV}$. Other schemes [8] choose the final smoothing parameter to be the smallest value that yields only $k$ inflection points in an unconstrained fit. Since spurious inflection points asymptotically occur only in a neighborhood of an actual inflection point, these earlier schemes oversmooth away from the actual inflection points. In contrast, our second stage uses the asymptotically optimal amount of smoothing while preserving geometric fidelity.

Acknowledgments. Work funded by U.S. Dept. of Energy Grant DE-FG02-86ER53223.

References

1. Aubin, J.-P. and I. Ekeland, Applied Nonlinear Analysis, John Wiley, New York 1984.
2. Gasser, Th. and Müller, H., Estimating functions and their derivatives by the kernel method, Scand. J. of Stat. 11 (1984), 171-185.
3. Mammen, E., Nonparametric regression under qualitative smoothness assumptions, Ann. Statist. 19 (1991), 741-759.
4. Mammen, E., On qualitative smoothness of kernel density estimates, University of Heidelberg Report 614.
5. Mammen, E., J.S. Marron and N.J. Fisher, Some asymptotics for multimodal tests based on kernel density estimates, Prob. Th. Rel. Fields 91 (1992), 115-132.
6. Micchelli, C. A., and F. Utreras, Smoothing and interpolation in a convex set of Hilbert space, SIAM J. Stat. Sci. Comp. 9 (1985), 728-746.
7. Silverman, B. W., Spline smoothing: the equivalent variable kernel method, Ann. Stat. 12 (1984), 898-916.
8. Silverman, B. W., Some properties of a test for multimodality based on kernel density estimates, in "Probability, Statistics and Analysis", J. F. C. Kingman and G. E. H. Reuter eds., pp. 248-259. Cambridge University Press, 1983.
9. Utreras, F., Smoothing noisy data under monotonicity constraints - Existence, characterization and convergence rates, Numerische Math. 47 (1985), 611-625.
10. Wahba, G., Spline Models for Observational Data, SIAM, Philadelphia, PA 1991.

Kurt S. Riedel
New York University, Courant Institute
251 Mercer St., New York, NY 10012
riedel@cims.nyu.edu
