Inference for Multivariate Normal Mixtures
Inference for Multivariate Normal Mixtures

Jiahua Chen
Department of Statistics, University of British Columbia, Vancouver, BC, V6T 1Z2, Canada

Xianming Tan
LMPC and School of Mathematical Sciences, Nankai University, Tianjin, 300071, P.R. China

Abstract

Multivariate normal mixtures provide a flexible model for high-dimensional data. They are widely used in statistical genetics, statistical finance, and other disciplines. Due to the unboundedness of the likelihood function, classical likelihood-based methods, which may have nice practical properties, are inconsistent. In this paper, we recommend a penalized likelihood method for estimating the mixing distribution. We show that the maximum penalized likelihood estimator is strongly consistent when the number of components has a known upper bound. We also explore a convenient EM-algorithm for computing the maximum penalized likelihood estimator. Extensive simulations are conducted to explore the effectiveness and the practical limitations of both the new method and the ratified maximum likelihood estimators. Guidelines are provided based on the simulation results.

Key words: Multivariate normal mixture, Penalized maximum likelihood estimator, Strong consistency.

PACS: 02.50.-r

Email addresses: jhchen@stat.ubc.ca (Jiahua Chen), tanxm@nankai.edu.cn (Xianming Tan).

Preprint submitted to Elsevier, 13 November 2018.

1 Introduction

In the past few decades, there has been an exploding volume of literature on mixture models [22, 13, 15, 6]. Various mixture distributions, including normal mixtures, are used in a wide variety of situations. Schork et al. [19] reviewed the applications of mixture models in human genetics, and Tadesse et al. [20] used a normal mixture model for clustering analysis. Further application examples can be found in [5, 12, 16] and [1].
Finite mixtures of multivariate normals have also drawn substantial attention recently. Lindsay and Basak [14] devised a system of moment equations and a fast algorithm to estimate the parameters of multivariate normal mixture distributions under an equal-covariance-matrix assumption. However, the equality assumption is crucial, and failing this condition leads to a substantial loss in the accuracy of the fit [15]. Unequal-variance normal mixture models have an ill effect on the likelihood function [3]. Placing a positive lower bound on the component variances helps, but the resulting statistical procedure can be awkward because it is not continuous in the data. Placing a positive lower bound on the ratio of the component variances is better. In the univariate case the resulting constrained maximum likelihood estimator is consistent for both constant and shrinking lower bounds [8, 21]. Though consistency is yet to be proved, Ingrassia [9] applied the constrained method to multivariate observations. Ray and Lindsay [17] found that, in contrast to the univariate case, the multivariate normal mixture density can have more modes than the number of components. Inference on multivariate normal mixture models is hence more difficult.

In this paper, we investigate a penalized likelihood method for estimating the mixing distribution. Penalized likelihood estimators form a popular class of methods; see [7, 4]. When the number of components has a known upper bound, the penalized maximum likelihood estimator (PMLE) is found to be strongly consistent. An EM-algorithm is developed and extensive simulations are conducted.
Although, after some ratification, the usual maximum likelihood estimator and the PMLE work similarly in the univariate case once degenerate local maxima are removed [2], the PMLE is advantageous for multivariate normal mixture models.

The paper is organized as follows. In Section 2, the penalized likelihood method is introduced. Two theorems on strong consistency are presented, with the proofs deferred to the Appendix. The EM-algorithm for solving the maximization problem for the penalized likelihood function is given. Section 3 contains the simulation results.

2 Penalized likelihood method

2.1 Consistency of the PMLE

Let φ(x; μ, Σ) be the multivariate normal density with (d × 1) mean vector μ and d × d covariance matrix Σ, i.e.,

φ(x; μ, Σ) = (2π)^{−d/2} |Σ|^{−1/2} exp{−(1/2)(x − μ)^τ Σ^{−1} (x − μ)}.

A d-dimensional random vector X has a multivariate finite normal mixture distribution of order p if its density function is given by

f(x; G) = π_1 φ(x; μ_1, Σ_1) + π_2 φ(x; μ_2, Σ_2) + ··· + π_p φ(x; μ_p, Σ_p)   (1)

where G is the mixing distribution assigning probability π_j to the parameter set (μ_j, Σ_j) of the jth kernel density φ(x; μ_j, Σ_j). Let x_1, x_2, ..., x_n be a random sample from (1). Then

l_n(G) = Σ_{i=1}^n log f(x_i; G)

is the log-likelihood function. Even if |Σ_j| > 0 for all j, l_n(G) is unbounded at μ_1 = x_1 when |Σ_1| gets arbitrarily small. The penalized log-likelihood function is of the form

pl_n(G) = l_n(G) + p_n(G)

where p_n(G) is the penalty, depending on the mixing distribution G and the sample size n. Let Ĝ_n be the mixing distribution in the parameter space at which pl_n(G) attains its maximum. We call Ĝ_n the penalized maximum likelihood estimator (PMLE). We choose a penalty function such that:

C1. p_n(G) = Σ_{j=1}^p p̃_n(Σ_j).

C2.
At any fixed G such that |Σ_j| > 0 for all j = 1, 2, ..., p, we have p_n(G) = o(n) and sup_G max{0, p_n(G)} = o(n). In addition, p_n(G) is differentiable with respect to G and, as n → ∞, p′_n(G) = o(√n) at any fixed G such that |Σ_j| > 0 for all j = 1, 2, ..., p. Here we treat G as a vector of the parameters contained in the mixing distribution G.

C3. For large enough n, p̃_n(Σ) ≤ 4(log n)^2 log |Σ| when |Σ| is smaller than c n^{−2d} for some c > 0.

These conditions are quite flexible, and functions satisfying them can easily be constructed; a class of such functions will be given in the simulation section. Condition C1 simplifies the numerical computation. Condition C2 limits the effect of the penalty. The key condition is C3: it counters the damaging effect of a degenerate component covariance matrix. The order of the penalty size is well calibrated, as will be seen in the proof, yet the exact value of the constant 4 is not important. The penalty function can also be viewed as a prior density in a Bayesian analysis.

Theorem 1 Assume that the true density function

f(x; G_0) = Σ_{j=1}^{p_0} π_{0j} φ(x; μ_{0j}, Σ_{0j})

satisfies π_{0j} > 0 and |Σ_{0j}| > 0 for all j = 1, 2, ..., p_0, and (μ_{0j}, Σ_{0j}) ≠ (μ_{0k}, Σ_{0k}) whenever j ≠ k. Assume that the penalty function p_n(G) satisfies C1-C3 and that G̃_n is a mixing distribution of order p_0 satisfying

pl_n(G̃_n) − pl_n(G_0) ≥ c > −∞ for all n.

Then, as n → ∞, G̃_n → G_0 almost surely.

The proof is deferred to the Appendix. Since pl_n(Ĝ_n) − pl_n(G_0) ≥ 0, the PMLE Ĝ_n is strongly consistent. Because Ĝ_n and G_0 have the same order, all elements of Ĝ_n converge to those of G_0 almost surely.

Furthermore, let

S_n(G) = Σ_{i=1}^n ∂ log f(x_i; G)/∂G

be the vector score function at G.
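The degeneracy that condition C3 is designed to counter is easy to reproduce numerically. The sketch below is our own illustration, not part of the paper: the data, the fixed "catch-all" second component, and the penalty weight a_n = n^{−1/2} are hypothetical choices, with the trace-plus-log-determinant penalty of the type recommended later in the simulation section. Pinning μ_1 at x_1 and letting Σ_1 = εI shrink sends the mixture density at x_1 (and eventually the likelihood) to infinity, while the penalized likelihood is driven to −∞, so the PMLE cannot degenerate.

```python
import numpy as np
from scipy.stats import multivariate_normal as mvn

rng = np.random.default_rng(1)
# Hypothetical bivariate sample from a two-component normal mixture.
x = np.vstack([rng.normal([0.0, -3.0], 1.0, size=(60, 2)),
               rng.normal([0.0, 3.0], 1.0, size=(140, 2))])
n, d = x.shape
S_x = np.cov(x, rowvar=False)      # sample covariance matrix
a_n = n ** -0.5                    # penalty weight, the PMLE2 choice

def log_lik(eps):
    """l_n(G) with mu_1 pinned at x_1, Sigma_1 = eps*I, and a fixed
    catch-all second component at the sample mean and covariance."""
    comp1 = 0.3 * mvn.pdf(x, mean=x[0], cov=eps * np.eye(d))
    comp2 = 0.7 * mvn.pdf(x, mean=x.mean(axis=0), cov=S_x)
    return np.sum(np.log(comp1 + comp2))

def pen(eps):
    """Trace-plus-log-determinant penalty on the degenerating component
    only; the catch-all component's term is constant in eps and omitted."""
    return -a_n * (np.trace(S_x) / eps + d * np.log(eps))

for eps in (1e-2, 1e-4, 1e-8):
    dens_at_x1 = 0.3 / (2 * np.pi * eps)   # the phi(x_1; x_1, eps*I) term
    print(f"eps={eps:.0e}  f(x1)>{dens_at_x1:.3g}  pl_n={log_lik(eps) + pen(eps):.3g}")
```

The printout shows the density at x_1 growing like (2πε)^{−1} while the penalized log-likelihood plunges, exactly the behavior C3 exploits in the consistency proof.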
Let

S′_n(G) = Σ_{i=1}^n ∂S_n(G)/∂G

be the matrix of second derivatives of the log-likelihood function. At G = G_0 the normal mixture model is regular, and hence the Fisher information

I_n(G_0) = nI(G_0) = −E{S′_n(G_0)} = E{S_n(G_0) S_n(G_0)^τ}

is positive definite. Using classical asymptotic techniques as in [11], and under condition C2, which gives p′_n(G) = o_p(n^{1/2}), we have

Ĝ_n − G_0 = −{S′_n(G_0)}^{−1} S_n(G_0) + o_p(n^{−1/2}).

Therefore, Ĝ_n is an asymptotically normal and efficient estimator.

Theorem 2 Under the same conditions as in Theorem 1, as n → ∞,

√n {Ĝ_n − G_0} → N(0, I^{−1}(G_0))

in distribution.

The proof is straightforward and omitted. In practice, we may know only an upper bound for p_0 rather than its exact value. The following theorem deals with this situation.

Theorem 3 Assume the same conditions as in Theorem 1, except that the order p_0 of the finite normal mixture model is known only to be smaller than or equal to p. Let G̃_n be a mixing distribution of order p satisfying

pl_n(G̃_n) − pl_n(G_0) ≥ c > −∞ for all n.

Then, as n → ∞, G̃_n → G_0 weakly, almost surely.

The proof is deferred to the Appendix.

2.2 The EM-algorithm

We recommend the EM-algorithm because of its simplicity in coding and its guaranteed convergence to some local maximum under very general conditions [24, 18, 7]. In our simulations, we use a number of initial values to reduce the risk of poor local maxima. We also recommend some convenient and effective penalty functions for the EM-algorithm.

Let z_ij be the membership indicator variable that equals 1 when x_i is from the jth component of the normal mixture model, and equals 0 otherwise.
The complete-observation log-likelihood under a normal mixture model is then given by

l_c(G) = Σ_{i=1}^n Σ_{k=1}^p z_ik {log π_k − (1/2) log |Σ_k| − (1/2)(x_i − μ_k)^τ Σ_k^{−1} (x_i − μ_k)}.

Given the current mixing distribution

G^(m) = (π_1^(m), ..., π_p^(m), μ_1^(m), ..., μ_p^(m), Σ_1^(m), ..., Σ_p^(m)),

the EM-algorithm iterates as follows. In the E-step, we compute

π_ij^(m+1) = E{z_ij | x_1, ..., x_n, G^(m)} = π_j^(m) φ(x_i; μ_j^(m), Σ_j^(m)) / Σ_{k=1}^p π_k^(m) φ(x_i; μ_k^(m), Σ_k^(m)).

Replacing z_ij by π_ij^(m+1) in l_c(G), we get

Q(G; G^(m)) = E{l_c(G) + p_n(G) | x_1, ..., x_n, G^(m)}
= Σ_{j=1}^p (log π_j) Σ_{i=1}^n π_ij^(m+1) − (1/2) Σ_{j=1}^p (log |Σ_j|) Σ_{i=1}^n π_ij^(m+1) − (1/2) Σ_{j=1}^p Σ_{i=1}^n π_ij^(m+1) (x_i − μ_j)^τ Σ_j^{−1} (x_i − μ_j) + p_n(G).

This completes the E-step. In the M-step, we maximize Q(G; G^(m)) with respect to G to obtain G^(m+1). We suggest the following penalty function in practice:

p_n(G) = −a_n Σ_{j=1}^p {tr(S_x Σ_j^{−1}) + log |Σ_j|}   (2)

with S_x the sample covariance matrix and tr(·) the trace function. Using this penalty function, Q(G; G^(m)) is maximized at G = G^(m+1) with

π_j^(m+1) = (1/n) Σ_{i=1}^n π_ij^(m+1),
μ_j^(m+1) = Σ_{i=1}^n π_ij^(m+1) x_i / (n π_j^(m+1)),
Σ_j^(m+1) = (2 a_n S_x + S_j^(m+1)) / (2 a_n + n π_j^(m+1)),

where

S_j^(m+1) = Σ_{i=1}^n π_ij^(m+1) (x_i − μ_j^(m+1))(x_i − μ_j^(m+1))^τ.

From a Bayesian point of view, the penalty function (2) puts an inverse-Wishart prior on each Σ_j, with S_x the mode of the prior distribution. Increasing the value of a_n implies a stronger conviction that S_x is a plausible value of Σ_j. The EM-algorithm iterates between the E-step and the M-step, and the penalized likelihood increases after each iteration.
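The E-step and M-step above can be collected into a few lines of code. The sketch below is our own minimal NumPy rendering of the closed-form updates for penalty (2), not the authors' code; for brevity it uses a single crude data-based starting point rather than the ten initial values used in the paper's simulations.

```python
import numpy as np

def log_mvn_pdf(x, mu, Sigma):
    """Log of the multivariate normal density phi(x; mu, Sigma), row-wise."""
    d = x.shape[1]
    L = np.linalg.cholesky(Sigma)
    z = np.linalg.solve(L, (x - mu).T)
    return (-0.5 * (d * np.log(2 * np.pi) + np.sum(z * z, axis=0))
            - np.log(np.diag(L)).sum())

def penalized_em(x, p, a_n, n_iter=200, seed=0):
    """Penalized EM with Sigma_j update (2*a_n*S_x + S_j)/(2*a_n + n*pi_j)."""
    rng = np.random.default_rng(seed)
    n, d = x.shape
    S_x = np.cov(x, rowvar=False)
    # Crude data-based start: equal proportions, perturbed sample means,
    # sample covariance for every component.
    pi = np.full(p, 1.0 / p)
    mu = x.mean(axis=0) + rng.normal(scale=1.0, size=(p, d))
    Sigma = np.stack([S_x.copy() for _ in range(p)])
    for _ in range(n_iter):
        # E-step: posterior membership probabilities pi_ij, computed stably
        # on the log scale.
        logw = np.log(pi) + np.stack(
            [log_mvn_pdf(x, mu[j], Sigma[j]) for j in range(p)], axis=1)
        logw -= logw.max(axis=1, keepdims=True)
        w = np.exp(logw)
        w /= w.sum(axis=1, keepdims=True)          # n x p matrix of pi_ij
        # M-step: the closed-form updates given in the text.
        nj = w.sum(axis=0)
        pi = nj / n
        mu = (w.T @ x) / nj[:, None]
        for j in range(p):
            xc = x - mu[j]
            Sj = (w[:, j, None] * xc).T @ xc
            Sigma[j] = (2 * a_n * S_x + Sj) / (2 * a_n + nj[j])
    return pi, mu, Sigma
```

The 2·a_n·S_x term in the covariance update keeps every Σ_j positive definite at every iteration, which is exactly why this EM cannot degenerate.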
At the same time, the penalized likelihood is bounded over the parameter space. Hence, the EM-algorithm converges to a non-degenerate local maximum. This is the dividing line between the penalized likelihood and the ordinary likelihood. In both cases, the EM-algorithm may converge to an undesired local maximum when started from a poor initial value. In the simulations, we use ten initial values, including the true value, for each data set to control this potential problem.

3 Simulation study

When computing the MLE, the local maxima located by the EM-algorithm with degenerate covariance matrices are first removed. The one that attains the largest likelihood value among those remaining is then identified as the MLE, or the ratified MLE, of the mixing distribution. Although this approach lacks solid theoretical support, it works well for univariate normal mixture models [2].

The consistency result for the PMLE for multivariate normal mixture models does not guarantee its superiority in practice. Thus, we feel obliged to compare the performance of the PMLE with that of the ratified MLE. In addition, there is a general shortage of thorough simulation studies in the context of multivariate normal mixture models. This paper partially fills that knowledge gap.

We use bias and standard deviation to measure the accuracy of the ratified MLE and the PMLE. We also record the number of times that the EM-algorithm degenerates when the ratified MLE is attempted. For clarity, the simulation results are organized into two subsections.

3.1 Simulation models and settings

The size of the parameter space for the finite multivariate normal mixture model explodes with the dimension. It is difficult to use a few typical specific distributions to cover all aspects of this model. We struggled to come up with a few particularly important cases.
We considered four categories of mixture models: two-component bivariate normal mixture models (p = 2, d = 2); three-component bivariate normal mixture models (p = 3, d = 2); two-component trivariate normal mixture models (p = 2, d = 3); and three-component trivariate normal mixture models (p = 3, d = 3). In each category, we chose 3 × 6 models formed by component mean vector and covariance matrix configurations. These combinations mimic practical situations and make the comparison of the performance of the ratified MLE and the PMLE meaningful.

The covariance matrices in the simulation models have the following general form when d = 2:

Σ = R(θ) diag(λ_1, λ_2) R(θ)^τ,  with R(θ) = [cos θ, −sin θ; sin θ, cos θ].

By the choices of the eigenvalues λ_1, λ_2 and the orientation angle θ, we obtain various configurations of bivariate normal mixture models.

The covariance matrices in the simulation models have the following general form when d = 3:

Σ = P(α, β, γ) diag(λ_1, λ_2, λ_3) P^τ(α, β, γ)

with

P(α, β, γ) =
[ cos α cos γ − cos β sin α sin γ   −cos β cos γ sin α − cos α sin γ    sin α sin β
  cos γ sin α + cos α cos β sin γ    cos α cos β cos γ − sin α sin γ   −cos α sin β
  sin β sin γ                        cos γ sin β                        cos β ],

that is, a 3 × 3 rotation matrix. For each multivariate normal mixture model, we specify the mixing proportion, covariance matrix, and mean vector for each component.

Two-component bivariate normal mixture models. We set the component proportions (π_1, π_2) = (0.3, 0.7); no other cases are considered. Due to the invariance property of the multivariate normal distribution, the distance between the two mean vectors is the only mean configuration that can make a difference.
Thus, we simulated only three pairs of mean vectors, representing situations where the two component mean vectors are near, moderately separated, and distant, as in the following table:

              near     moderate   distant
Component 1   (0, -1)  (0, -3)    (0, -5)
Component 2   (0, 1)   (0, 3)     (0, 5)

There are many features in the pair of covariance matrices that may have an effect on the performance of the ratified MLE or the PMLE. The sizes of the eigenvalues matter most through their ratio λ_2/λ_1. The angle θ determines the relative orientation of the two component densities. Our choices based on these considerations are given in the following table:

     Component 1        Component 2
     λ_1  λ_2  θ        λ_1  λ_2  θ
1    1    1    0        1    1    0
2    1    5    0        1    1    0
3    1    5    π/4      1    1    0
4    1    5    π/2      1    1    0
5    1    5    π/4      1    5    0
6    1    5    π/2      1    5    0

Three-component bivariate normal mixture models. We set the component proportions (π_1, π_2, π_3) = (0.15, 0.35, 0.50). The three mean vectors may form a straight line, an acute triangle, or an obtuse triangle. We select three representative configurations as follows:

              straight  acute    obtuse
Component 1   (0, -2)   (0, -2)  (0, -2)
Component 2   (0, 0)    (3, 0)   (1, 0)
Component 3   (0, 2)    (0, 2)   (0, 2)

We select six triplets of covariance matrices as follows:

     Component 1      Component 2      Component 3
     λ_1  λ_2  θ      λ_1  λ_2  θ      λ_1  λ_2  θ
1    1    1    0      1    1    0      1    1    0
2    1    1    0      1    1    0      1    5    0
3    1    1    0      1    5    0      1    5    π/4
4    1    1    0      1    5    0      1    5    π/2
5    1    5    0      1    5    π/4    1    5    −π/4
6    1    5    0      1    5    π/4    1    5    −π/2

Two-component trivariate normal mixture models. We again let (π_1, π_2) = (0.3, 0.7). As before, only the distance between the two mean vectors matters.
The two mean vectors are chosen to be:

              near        moderate    distant
Component 1   (0, 0, -1)  (0, 0, -3)  (0, 0, -5)
Component 2   (0, 0, 1)   (0, 0, 3)   (0, 0, 5)

The covariance matrix pairs are chosen as follows:

     Component 1                 Component 2
     (λ_1, λ_2, λ_3)  (α, β, γ)  (λ_1, λ_2, λ_3)  (α, β, γ)
1    (1, 1, 1)        (0, 0, 0)  (1, 1, 1)        (0, 0, 0)
2    (1, 1, 1)        (0, 0, 0)  (1, 3, 10)       (0, 0, 0)
3    (1, 3, 10)       (0, 0, 0)  (1, 3, 10)       (0, 0, 0)
4    (1, 3, 10)       (0, 0, 0)  (1, 3, 10)       (−π, π, π)/3
5    (1, 3, 10)       (0, 0, 0)  (1, 3, 10)       (π, −π, π)/3
6    (1, 3, 10)       (0, 0, 0)  (1, 3, 10)       (π, π, −π)/3

Three-component trivariate normal mixture models. We let the component proportions (π_1, π_2, π_3) be (0.15, 0.35, 0.50). Recall that any three points fall in one plane. Thus, the invariance property of the normal distribution allows us to set the first entry of each mean vector to 0:

              straight    acute       obtuse
Component 1   (0, 0, -2)  (0, 0, -2)  (0, 0, -2)
Component 2   (0, 0, 0)   (0, 3, 0)   (0, 1, 0)
Component 3   (0, 0, 2)   (0, 0, 2)   (0, 0, 2)

The covariance matrix triplets are chosen as follows:

     Component 1                 Component 2                    Component 3
     (λ_1, λ_2, λ_3)  (α, β, γ)  (λ_1, λ_2, λ_3)  (α, β, γ)     (λ_1, λ_2, λ_3)  (α, β, γ)
1    (1, 1, 1)        (0, 0, 0)  (1, 1, 1)        (0, 0, 0)     (1, 1, 1)        (0, 0, 0)
2    (1, 1, 1)        (0, 0, 0)  (1, 1, 1)        (0, 0, 0)     (1, 3, 10)       (0, 0, 0)
3    (1, 1, 1)        (0, 0, 0)  (1, 3, 10)       (0, 0, 0)     (1, 3, 10)       (−π, π, π)/3
4    (1, 1, 1)        (0, 0, 0)  (1, 3, 10)       (0, 0, 0)     (1, 3, 10)       (π, −π, π)/3
5    (1, 3, 10)       (0, 0, 0)  (1, 3, 10)       (−π, π, π)/3  (1, 3, 10)       (π, −π, π)/3
6    (1, 3, 10)       (0, 0, 0)  (1, 3, 10)       (π, −π, π)/3  (1, 3, 10)       (π, π, −π)/3

We let n = 200 for the two-component bivariate mixtures and n = 300 for the other mixtures, to ensure reasonable estimation of the mixing distribution. We generate 1000 data sets for each model. We have presented four categories of finite normal mixture models.
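The covariance configurations above are easy to generate programmatically. The two helpers below are our own (the names are hypothetical); they simply assemble Σ from the eigenvalues and rotation angles exactly as in the two general forms given earlier in this subsection.

```python
import numpy as np

def cov2(lam1, lam2, theta):
    """Sigma = R(theta) diag(lam1, lam2) R(theta)^T for the d = 2 models."""
    c, s = np.cos(theta), np.sin(theta)
    R = np.array([[c, -s], [s, c]])
    return R @ np.diag([lam1, lam2]) @ R.T

def cov3(lams, a, b, g):
    """Sigma = P(alpha, beta, gamma) diag(lams) P^T for the d = 3 models,
    with P the 3 x 3 rotation matrix given in the text."""
    ca, sa = np.cos(a), np.sin(a)
    cb, sb = np.cos(b), np.sin(b)
    cg, sg = np.cos(g), np.sin(g)
    P = np.array([
        [ca * cg - cb * sa * sg, -cb * cg * sa - ca * sg,  sa * sb],
        [cg * sa + ca * cb * sg,  ca * cb * cg - sa * sg, -ca * sb],
        [sb * sg,                 cg * sb,                 cb]])
    return P @ np.diag(lams) @ P.T
```

For example, covariance configuration 3 of the two-component bivariate category uses cov2(1, 5, np.pi / 4) for component 1, and row 4 of the trivariate pair table uses cov3([1, 3, 10], -np.pi / 3, np.pi / 3, np.pi / 3) for component 2; in every case the prescribed eigenvalues are recovered exactly.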
For ease of reference we use, for example, I.1.2 to refer to the model from Category I with mean vector configuration 1 and covariance matrix configuration 2. Even though there are many more mixing distribution configurations for which simulation studies are needed, there is a limit to how much one paper can achieve. We do not consider the case where p is unknown; all estimators in that case are expected to be poor, although the consistency result for the PMLE remains true.

Penalty term and initial values. We compute the ratified MLE and two penalized MLEs corresponding to a_n = n^{−1} and a_n = n^{−1/2} in (2). We call these the MLE, PMLE1, and PMLE2, respectively.

The ten initial values are chosen from two groups. The first group of initial values includes the true mixing distribution and four others obtained by perturbing the component mean vectors of the true mixing distribution. The second group of initial values is data-based. We first calculate the sample mean vector and the sample covariance matrix. We then set the mixing proportions all equal to 1/p and the component covariance matrices all equal to the sample covariance matrix, and apply a similar perturbation to the sample mean vector to obtain another five sets of initial values.

3.2 Simulation results

Number of degeneracies. When the EM-algorithm converges to a mixing distribution with singular component covariance matrices, we say that it degenerates. The EM-algorithm for the PMLE does not degenerate, which is theoretically ensured: regardless of the quality of the initial value, the corresponding EM-algorithm always converges to some non-degenerate local maximum. The PMLE is a good estimator if the largest local maximum is a good estimator. When computing the ratified MLE, however, the EM-algorithm sometimes converges to a degenerate local maximum.
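The ratification rule used for the MLE can be stated compactly. In the sketch below, the representation of each fitted local maximum as a (log-likelihood, list of component covariance matrices) pair and the near-singularity tolerance det_tol are our own illustrative choices; the paper itself simply discards EM outputs with degenerate component covariance matrices and keeps the candidate with the largest remaining likelihood.

```python
import numpy as np

def ratify(fits, det_tol=1e-10):
    """fits: candidate local maxima from multiple EM starts, each a pair
    (log_likelihood, list_of_component_covariance_matrices).
    Drop candidates containing a (near-)singular component covariance
    matrix, then return the candidate with the largest likelihood left;
    None if every candidate is degenerate."""
    ok = [f for f in fits
          if min(np.linalg.det(S) for S in f[1]) > det_tol]
    return max(ok, key=lambda f: f[0]) if ok else None
```

With three hypothetical candidates, the degenerate one attains the largest likelihood (as the unbounded-likelihood discussion predicts) but is discarded, and the best non-degenerate fit is returned instead.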
We recorded the number of times that the EM-algorithm degenerated while computing the ratified MLE in our simulation. Since each data set had ten initial values, the number of degenerate outcomes is out of 10,000 for each entry.

For two-component bivariate normal mixture models, it is immediately clear that the number of degenerate outcomes increases when the mean vectors are more widely separated. The covariance structure is also important. For example, when the eigenvectors of one covariance matrix are rotated by an angle of π/2 (covariance configurations 4 and 6), so that the two clusters of observations become more mixed, the number of degenerate outcomes declines. This observation is somewhat counter-intuitive but can be explained as follows. The success of the EM-algorithm depends heavily on sensible initial values. When the two mean vectors are close and the components are well mixed, different initial values do not matter as much. However, when the two mean vectors are distant, the location of the initial mean vectors is crucial. Thus the degenerate outcomes were mostly due to the second group of initial values.

In the other three categories, the above phenomenon persists: the frequency of degeneracy increases when the components are more widely separated. In addition, for these categories we observe a higher frequency of degeneracies on average. We believe this is because the EM-algorithm is more sensitive to the quality of the initial values when the mixture models are more complicated.

Degeneracy of the EM-algorithm should not be a serious problem for the ratified MLE, as long as the non-degenerate outcomes of the algorithm provide good estimates. We hence proceed to examine the bias and variance properties of the PMLE and of the largest non-degenerate local maximum, regarded as the ratified MLE.

Bias and standard deviation.
We compute the element-wise mean bias and standard deviation based on 1000 simulated samples from each model. We present only a subset of representative outcomes from each category; the complete set is available upon request.

Two representative outcomes, for models I.1.1 and I.2.4 in Category I, are given in Table 2. There is about a 10% reduction in the standard deviation for PMLE2 compared to the ratified MLE or PMLE1 for the parameters in component 1 of Model I.1.1. The same is true for Models I.1.5 and I.1.6 (not presented). PMLE2 also has a relatively lower bias in these models. The results for the remaining models are comparable to those for I.2.4: there is little appreciable difference between the three estimation methods.

The biases of all three estimators for estimating μ_2 are high under I.1.1 and I.1.5, in which the two mean vectors are lined up in the μ_1 direction. Due to the orientation of the two component covariance matrices, it is hard to tell the two mean vectors apart. The biases and standard deviations for estimating σ_22 under I.1.1, I.1.2, ..., I.1.6 are also high or relatively high.

Table 2 about here.

We present outcomes for two models (II.1.1, II.2.4) in Category II in Tables 3 and 4. For both models, for the parameters in component 1, there is a 10% to 20% reduction in the standard deviation for PMLE2 compared to the other two estimators. The bias of PMLE2 is also lower. Some reductions in components 2 and 3 are also noticeable, but to varying degrees. In the other models, the performance of PMLE2 does not dominate that of the ratified MLE or PMLE1.

Under a straight-line configuration of the component mean vectors, the bias for estimating μ_2 is relatively high. For a triangle configuration, the roles of μ_1 and μ_2 are no longer different. This bias problem is not estimator dependent, although PMLE2 helps slightly.
The estimation of σ_22 again comes with both higher bias and higher standard deviation in general. For this category of models, the problem spreads into other parts of the covariance matrix.

Tables 3, 4 about here.

We report simulation results for three models (III.1.1, III.2.4, III.3.6) in Category III in Tables 5, 6, and 7. We again observe that PMLE2 has smaller bias and standard deviation for estimating the parameters in the first component, where the mixing proportion is small, and in model III.1.1, where the two mean vectors are close. The gain is as much as 30% for σ_33. The gains seem to disappear when the two component mean vectors are far from each other. Nevertheless, PMLE2 still appears to be the best estimator in terms of both bias and standard deviation.

Tables 5, 6, 7 about here.

We report simulation results for three models (IV.1.1, IV.2.4, IV.3.6) in Category IV in Tables 8, 9, and 10. Again, PMLE2 has the lowest standard deviations for estimating the parameters in the first component, where the mixing proportion is small. The comparison is sharpest in model IV.2.4 for σ_13. In contrast to the models of the other categories, here the superiority of PMLE2 is widespread. In fact, PMLE2 is superior for the parameters in component 2, and mixed for the parameters in component 3.

We caution that even the best estimator is not necessarily a good estimator for trivariate mixture models. Overall, none of the three estimators does a great job of estimating the mixing distribution, possibly because of the fundamental nature of the problem, e.g., the small Fisher information of high-dimensional multivariate normal mixture models. This problem is expected to diminish with increased sample size.

Tables 8, 9, 10 about here.

Summary of the simulation results. To conclude, the penalized likelihood estimators, both PMLE1 and PMLE2, are completely free of degeneracy problems.
Moreover, PMLE2 has the best general performance in terms of bias and standard deviation. This is most obvious when the components are not well separated. In applications, it is unnecessary to first judge whether it is safe to use the ratified MLE when a superior PMLE2 is available. Although we do not completely dismiss the use of the ratified MLE, it is clearly advantageous to use PMLE2 outright. We further caution against the use of high-dimensional multivariate normal mixture models in practice when the sample size is not large; in these situations, even the best performing estimator may not be a good estimator.

References

[1] R. Alexandridis, S. Lin, M. Irwin, Class discovery and classification of tumor samples using mixture modeling of gene expression data − a unified approach, Bioinformatics 20 (2004) 2545-2552.
[2] J. Chen, X. Tan, R. Zhang, Inference for normal mixtures in mean and variance, Statistica Sinica (2008), in press.
[3] N. E. Day, Estimating the components of a mixture of normal distributions, Biometrika 56 (1969) 463-474.
[4] P. B. Eggermont, V. N. LaRiccia, Maximum Penalized Likelihood Estimation, Volume I, Springer, New York, 2001.
[5] C. Fraley, A. E. Raftery, How many clusters? Which clustering method? Answers via model-based cluster analysis, The Computer Journal 41 (1998) 578-588.
[6] S. Fruhwirth-Schnatter, Finite Mixture and Markov Switching Models, Springer, 2006.
[7] P. J. Green, On use of the EM algorithm for penalized likelihood estimation, J. Roy. Statist. Soc. Ser. B 52 (1990) 443-452.
[8] R. J. Hathaway, A constrained formulation of maximum-likelihood estimation for normal mixture distributions, Ann. Statist. 13 (1985) 795-800.
[9] S. Ingrassia, A likelihood-based constrained algorithm for multivariate normal mixture models, Statistical Methods & Applications 13 (2004) 151-166.
[10] J. Kiefer, J.
Wolfowitz, Consistency of the maximum likelihood estimator in the presence of infinitely many incidental parameters, Ann. Math. Statist. 27 (1956) 887-906.
[11] E. L. Lehmann, Theory of Point Estimation, John Wiley & Sons, 1983.
[12] S. Lin, S. Biswas, On modelling locus heterogeneity using mixture distributions, BMC Genetics 5 (2004) 29.
[13] B. G. Lindsay, Mixture Models: Theory, Geometry and Applications, Institute of Mathematical Statistics, Hayward, 1995.
[14] B. G. Lindsay, P. Basak, Multivariate normal mixtures: a fast consistent method of moments, J. Amer. Statist. Assoc. 88 (1993) 468-476.
[15] G. J. McLachlan, D. Peel, Finite Mixture Models, Wiley, New York, 2000.
[16] A. E. Raftery, N. Dean, Variable selection for model-based clustering, J. Amer. Statist. Assoc. 101 (2006) 168-178.
[17] S. Ray, B. G. Lindsay, The topography of multivariate normal mixtures, Ann. Statist. 33 (2005) 2042-2065.
[18] R. A. Redner, H. F. Walker, Mixture densities, maximum likelihood and the EM algorithm, SIAM Rev. 26 (1984) 195-239.
[19] N. Schork, D. Allison, B. Thiel, Mixture distributions in human genetics research, Stat. Methods Med. Res. 5 (1996) 155-178.
[20] M. Tadesse, N. Sha, M. Vannucci, Bayesian variable selection in clustering high-dimensional data, J. Amer. Statist. Assoc. 100 (2005) 602-617.
[21] X. Tan, J. Chen, R. Zhang, Consistency of the constrained maximum likelihood estimator in finite normal mixture models, 2007 Proceedings of the American Statistical Association [CD-ROM], American Statistical Association, Alexandria, VA (2007) 2113-2119.
[22] D. M. Titterington, A. F. M. Smith, U. E. Makov, Statistical Analysis of Finite Mixture Distributions, Wiley, Chichester, 1985.
[23] A. Wald, Note on the consistency of the maximum likelihood estimate, Ann. Math. Statist. 20 (1949) 595-601.
[24] C.-F.
Wu, On the convergence properties of the EM algorithm, Ann. Statist. 11 (1983) 95-103.

Appendix

The ordinary likelihood function is unbounded because, when the covariance matrix of a kernel density becomes close to singular, the likelihood contribution of the observations near its mean vector goes to infinity. Thus, a key step in our proof is to assess the number of such observations. In the univariate case, Chen et al. [2] obtained the following result.

Lemma 1 Assume that x_1, x_2, ..., x_n is a random sample from a finite normal mixture distribution with density f(x), x ∈ R. Let F_n be the empirical distribution function, and define M = max{sup_x f(x), 8} and δ_n(σ) = −Mσ log σ + n^{−1}. Except for a zero-probability event not depending on σ, we have, for all large enough n:

(a) for σ between exp(−2) and 8/(nM),

sup_μ [F_n(μ − σ log σ) − F_n(μ)] ≤ 4 δ_n(σ);

(b) for σ between 0 and 8/(nM),

sup_μ [F_n(μ − σ log σ) − F_n(μ)] ≤ 2 n^{−1} (log n)^2.

The consistency result for the multivariate normal mixture model is built on a generalized result. More specifically, the following lemma gives a bound for the multivariate normal mixture model.

Lemma 2 Let x_1, x_2, ..., x_n be a random sample from a d-dimensional multivariate normal mixture model with p components whose density function is given by

f(x; G_0) = Σ_{j=1}^p π_{j0} φ(x; μ_{j0}, Σ_{j0}).

Assume that all Σ_{j0} are positive definite.
For any mean and covariance matrix pair $(\mu, \Sigma)$ such that $|\Sigma| < \exp(-4d)$, except for a zero-probability event not depending on $(\mu, \Sigma)$, we have, for $n$ large enough, that
$$H_n(\mu, \Sigma) = \sum_{i=1}^{n} I\{(x_i - \mu)^\tau \Sigma^{-1}(x_i - \mu) \le (\log|\Sigma|)^2\} \le 4(\log n)^2\, I(|\Sigma| \le \alpha_n) + 8n\,\delta_n(|\Sigma|)\, I(\alpha_n \le |\Sigma|),$$
where $\alpha_n = (4/(Md))^{2d} n^{-2d}$, $\delta_n(|\Sigma|) = -M|\Sigma|^{1/(2d)} \log|\Sigma| + n^{-1}$, and $M = \max\{8, \lambda_0^{-1/2}\}$ with $\lambda_0$ being the smallest eigenvalue among those of $\Sigma_{j0}$, $j = 1, 2, \ldots, p$.

Proof of Lemma 2: Let $0 < \lambda_1 \le \lambda_2 \le \cdots \le \lambda_d$ and $a_1, \ldots, a_d$ be the eigenvalues and corresponding unit-length eigenvectors of $\Sigma$. We have that
$$\{x : (x - \mu)^\tau \Sigma^{-1}(x - \mu) \le (\log|\Sigma|)^2\} = \Big\{x : \sum_{j=1}^{d} \lambda_j^{-1} |a_j^\tau(x - \mu)|^2 \le (\log|\Sigma|)^2\Big\} \subseteq \{x : |a_j^\tau(x - \mu)| \le -\sqrt{\lambda_j}\,\log|\Sigma|,\ j = 1, \ldots, d\} \subseteq \{x : |a_1^\tau(x - \mu)| \le -\sqrt{\lambda_1}\,\log|\Sigma|\}.$$
(Note that $\log|\Sigma| < 0$, so $-\sqrt{\lambda_j}\,\log|\Sigma| > 0$.) Furthermore, let $Q = \{b_i : i = 1, 2, \ldots\}$ be a sequence of unit vectors such that $Q$ forms a dense subset of the unit vectors in $R^d$. Hence, for any given $a_1$ and any bounded subset $B \subseteq R^d$, we can find a vector $b$ in $Q$ arbitrarily close to $a_1$ so that
$$\{x \in B : |a_1^\tau(x - \mu)| \le -\sqrt{\lambda_1}\,\log|\Sigma|\} \subseteq \{x \in B : |b^\tau(x - \mu)| \le -\sqrt{2\lambda_1}\,\log|\Sigma|\}.$$
Based on this observation, we get
$$\sup_\mu H_n(\mu, \Sigma) = \sup_\mu \sum_{i=1}^{n} I\{(x_i - \mu)^\tau \Sigma^{-1}(x_i - \mu) \le (\log|\Sigma|)^2\} \le \sup_{b \in Q}\,\sup_\mu \sum_{i=1}^{n} I\{|b^\tau(x_i - \mu)| \le \sqrt{2\lambda_1}\,|\log|\Sigma||\}.$$
On the other hand, given any non-random unit vector $b$, $b^\tau x_i$, $i = 1, 2, \ldots, n$, is a random sample from the univariate normal mixture model with density
$$f_b(x) = \sum_{j=1}^{p} \pi_{j0}\,\phi(x; b^\tau \mu_{j0}, b^\tau \Sigma_{j0} b).$$
We remark that since some pairs $(b^\tau \mu_{j0}, b^\tau \Sigma_{j0} b)$ can be equal, this univariate mixture distribution can have fewer than $p$ components. This does not affect the following derivation.
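The projection step above can be checked numerically: for a fixed unit vector $b$, the projections $b^\tau x_i$ of draws from a $d$-variate normal mixture are draws from the univariate mixture $f_b$ with component parameters $(b^\tau \mu_{j0}, b^\tau \Sigma_{j0} b)$. A minimal sketch, using an illustrative two-component bivariate mixture (not one of the paper's simulation models):

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative two-component bivariate normal mixture (assumed parameters).
pis = np.array([0.3, 0.7])
mus = np.array([[0.0, -1.0], [0.0, 1.0]])
Sigmas = np.array([[[1.0, 0.3], [0.3, 1.0]],
                   [[2.0, 0.0], [0.0, 0.5]]])

# Draw a large sample from the mixture.
n = 200_000
comp = rng.choice(2, size=n, p=pis)
x = np.empty((n, 2))
for j in range(2):
    m = comp == j
    x[m] = rng.multivariate_normal(mus[j], Sigmas[j], size=m.sum())

# Project through a fixed unit vector b: y_i = b' x_i.
b = np.array([0.6, 0.8])
y = x @ b

# Mean and variance of the induced univariate mixture f_b.
mj = mus @ b                                # b' mu_j for each component
vj = np.einsum('i,kij,j->k', b, Sigmas, b)  # b' Sigma_j b for each component
m_theory = float(pis @ mj)
v_theory = float(pis @ (vj + mj ** 2)) - m_theory ** 2

print(y.mean(), m_theory)  # sample vs. mixture mean
print(y.var(), v_theory)   # sample vs. mixture variance
```

The sample mean and variance of the projected data match the moments of $f_b$, which is what allows the univariate Lemma 1 to be applied along each direction $b$.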
Recall that $\lambda_0$ is the smallest eigenvalue among those of $\Sigma_{j0}$, $j = 1, \ldots, p$. Then
$$\sup_{b \in Q}\,\sup_x f_b(x) \le \sup_{b \in Q} \max\{(b^\tau \Sigma_{j0} b)^{-1/2},\ j = 1, \ldots, p\} = \lambda_0^{-1/2}.$$
Applying Lemma 1 to the univariate data $b^\tau x_i$, $i = 1, \ldots, n$, except for a zero-probability event not depending on $\Sigma$, as $n \to \infty$ we have
$$\sup_\mu \sum_{i=1}^{n} I\{|b^\tau(x_i - \mu)| \le \sqrt{2\lambda_1}\,|\log|\Sigma||\} \le 4(\log n)^2\, I(|\Sigma| \le \alpha_n) + 8n\,\delta_n(|\Sigma|)\, I(\alpha_n \le |\Sigma|).$$
The conclusion of the lemma simply claims that the above inequality holds over all $b \in Q$ with only a zero-probability-event exception. The zero-probability claim remains true because $Q$ is countable.

Proof of Theorem 1: We give a proof for the case $p = 2$; the proof for the general case is similar. Let $\Gamma$ be the parameter space for $G$ and define
$$\Gamma_1 = \{G \in \Gamma : |\Sigma_1| \le |\Sigma_2| \le \varepsilon_0\},\qquad \Gamma_2 = \{G \in \Gamma : |\Sigma_1| \le \tau_0,\ |\Sigma_2| \ge \varepsilon_0\},\qquad \Gamma_3 = \Gamma - (\Gamma_1 \cup \Gamma_2),$$
where $\varepsilon_0 > \tau_0 > 0$ are two small positive constants to be specified soon. The first subspace represents the case where both components have nearly singular covariance matrices; hence the observations inside the small ellipse centered at the mean parameter make a large contribution to the log-likelihood function. Let $K_0 = E\{\log f(X; G_0)\}$. The constants $\varepsilon_0, \tau_0$ must satisfy the following four conditions:

1. $0 < \varepsilon_0 < \exp\{-4d\}$;
2. $-\log\varepsilon_0 - (\log\varepsilon_0)^2 \le 4(K_0 - 2)$;
3. $16M\varepsilon_0^{1/(2d)}(\log\varepsilon_0)^2 \le 1$;
4. $16Md\,\tau_0(\log\tau_0)^2 \le \frac{2}{5}\delta_0$, for some $\delta_0 > 0$ to be specified.

The existence of such $\varepsilon_0, \tau_0$ is obvious. We proceed with the proof in three steps.

Step 1. For any $G \in \Gamma_1$, we show that almost surely,
$$\sup_{\Gamma_1} pl_n(G) - pl_n(G_0) \to -\infty.$$
Define two index sets
$$A = \{i : (x_i - \mu_1)^\tau \Sigma_1^{-1}(x_i - \mu_1) \le (\log|\Sigma_1|)^2\},\qquad B = \{i : (x_i - \mu_2)^\tau \Sigma_2^{-1}(x_i - \mu_2) \le (\log|\Sigma_2|)^2\},$$
and for any index set $S \subseteq \{1, 2, \ldots$
, n\}$, denote $l_n(G; S) = \sum_{i \in S} \log f(x_i; G)$. We can write $l_n(G) = l_n(G; A) + l_n(G; A^c B) + l_n(G; A^c B^c)$, where $A^c$ and $B^c$ are the complements of $A$ and $B$ respectively. For any index set $S$, denote by $n(S)$ its cardinality. It is easy to see that
$$l_n(G; A) \le n(A)\log|\Sigma_1|^{-1/2},\qquad l_n(G; B) \le n(B)\log|\Sigma_2|^{-1/2}.$$
Applying Lemma 2 to $n(A)$ and $n(B)$, noting that $|\Sigma_1| \le \varepsilon_0$ for $G \in \Gamma_1$, and using condition C3 on the penalty function, we find that
$$l_n(G; A) + \tilde p_n(\Sigma_1) \le 16d\log n + 8M\varepsilon_0^{1/(2d)}(\log\varepsilon_0)^2\, n,$$
$$l_n(G; A^c B) + \tilde p_n(\Sigma_2) \le 16d\log n + 8M\varepsilon_0^{1/(2d)}(\log\varepsilon_0)^2\, n.$$
The key point underlying the above two inequalities is that their right-hand sides are bounded by an arbitrarily small fraction of $n$. Further, for observations away from $\mu_1$ and $\mu_2$, we have
$$l_n(G; A^c B^c) \le \sum_{i \in A^c B^c} \log\left[\pi_1\exp\{\log|\Sigma_1|^{-1/2} - \tfrac12(\log|\Sigma_1|)^2\} + \pi_2\exp\{\log|\Sigma_2|^{-1/2} - \tfrac12(\log|\Sigma_2|)^2\}\right] \le \sum_{i \in A^c B^c}\{-\tfrac12\log\varepsilon_0 - \tfrac12(\log\varepsilon_0)^2\} \le n(K_0 - 2).$$
The last inequality is obtained by choosing a small enough $\varepsilon_0$ as specified earlier. Combining these inequalities, we get $pl_n(G) \le n(K_0 - 1)$, and hence almost surely
$$\sup_{\Gamma_1} pl_n(G) - pl_n(G_0) \le -n + 16d\log n.$$
That is, $\sup_{\Gamma_1} pl_n(G) - pl_n(G_0) \to -\infty$ almost surely, which completes the first step.

Step 2. For $G \in \Gamma_2$, we also show that almost surely
$$\sup_{\Gamma_2} pl_n(G) - pl_n(G_0) \to -\infty.$$
Recall that for each $i \in A$, $(x_i - \mu_1)^\tau \Sigma_1^{-1}(x_i - \mu_1)$ is bounded by $(\log|\Sigma_1|)^2$. Hence, it is easy to verify that for $i \in A$,
$$\varphi(x_i; \mu_1, \Sigma_1) \le |\Sigma_1|^{-1/2}\exp\{-\tfrac14(x_i - \mu_1)^\tau \Sigma_1^{-1}(x_i - \mu_1)\},$$
and for $i \notin A$,
$$\varphi(x_i; \mu_1, \Sigma_1) \le \exp\{-\tfrac14(x_i - \mu_1)^\tau \Sigma_1^{-1}(x_i - \mu_1)\}.$$
Therefore, letting (not a density itself)
$$g(x; G) = \pi_1\exp\{-\tfrac14(x - \mu_1)^\tau \Sigma_1^{-1}(x - \mu_1)\} + \pi_2\,\varphi(x; \mu_2, \Sigma_2),$$
we have $\log f(x_i; G) \le \log g(x_i; G) + I(i \in A)\log|\Sigma_1|^{-1/2}$. Hence, we get
$$l_n(G) \le n(A)\log|\Sigma_1|^{-1/2} + \sum_{i=1}^{n}\log g(x_i; G).$$
It is obvious that for any $G \in \Gamma_2$: (a) $E_0\{\log[g(X; G)/f(X; G_0)]\} < 0$, by Jensen's inequality and the fact that the integral of $g(x; G)$ is less than 1; (b) $g(x; G) \le \varepsilon_0^{-1}$, by the definition of $\Gamma_2$. Hence, for each given $G \in \Gamma_2$, by the law of large numbers,
$$\frac1n\sum_{i=1}^{n}\log\{g(X_i; G)/f(X_i; G_0)\} \to E[\log\{g(X; G)/f(X; G_0)\}] < 0.$$
For each fixed $x$, we can extend the definition of $g(x; G)$ in $G$ onto the compactified $\Gamma_2$ while maintaining properties (a) and (b) and its continuity in $G$. Thus, a classical technique as in [23] can readily be employed to show that, as $n \to \infty$,
$$\sup_{G \in \Gamma_2}\left\{\frac1n\sum_{i=1}^{n}\log\frac{g(X_i; G)}{f(X_i; G_0)}\right\} \to -\delta(\tau_0) < 0 \tag{3}$$
for some decreasing function $\delta(\tau_0)$. Hence, it is possible to choose a small enough $\tau_0 \le \varepsilon_0$ such that
$$\sup_{\Gamma_2} pl_n(G) - pl_n(G_0) \le \sup_{\Gamma_2}\{n(A)\log|\Sigma_1|^{-1/2} + p_n(G)\} + \sup_{\Gamma_2}\sum_{i=1}^{n}\log\frac{g(X_i; G)}{f(X_i; G_0)} \le 8M\tau_0(\log\tau_0)^2\, n - \tfrac{9}{10}\delta(\varepsilon_0)\, n \le -\tfrac12\delta(\varepsilon_0)\, n.$$
The first term in the middle expression is handled by the assessment of $n(A)$ and condition C3 on $p_n(G)$. Note also that $p_n(G_0) = o(n)$. Therefore, almost surely,
$$\sup_{\Gamma_2} pl_n(G) - pl_n(G_0) \to -\infty.$$

Step 3. From the above two steps, we know that $\tilde G_n \in \Gamma_3$ with probability 1. At the same time, when $G \in \Gamma_3$, we have $p_n(G) = o(1)$. By the definition of the maximum penalized likelihood estimator, we have
$$l_n(\tilde G_n) - l_n(G_0) \ge p_n(G_0) - p_n(\tilde G_n) = o(1).$$
(4)

Since the parameter space $\Gamma_3$ is now completely regular, an estimator with property (4) is easily shown to be consistent by the classical technique of [23], even with a penalty of size $o(n)$. ∎

Proof of Theorem 3: When $p_0 < p < \infty$, we cannot expect every part of $G$ to converge to the corresponding part of $G_0$. Instead, we measure their difference as two distributions. Let
$$H(G, G_0) = \int_{R^d \times \mathcal{A}} |G(\lambda) - G_0(\lambda)|\exp\{-|\lambda|\}\,d\lambda,$$
where $\lambda = (\mu_1, \mu_2, \ldots, \mu_d, \sigma_{11}, \sigma_{12}, \sigma_{22}, \ldots, \sigma_{dd}) \in R^d \times \mathcal{A}$,
$$|\lambda| = \sum_{j=1}^{d}|\mu_j| + \sum_{i=1}^{d}\sum_{j=1}^{i}|\sigma_{ij}|,$$
and $\mathcal{A}$ is the subset of $R^{d(d+1)/2}$ containing all eligible combinations of $d(d+1)/2$ real numbers that form a symmetric positive definite matrix. It is well known that $\mathcal{A}$ is an open connected subset of $R^{d(d+1)/2}$ and is regular enough, although its shape may not be easy to visualize. It can be shown that $H(G_n, G_0) \to 0$ implies $G_n \to G_0$ in distribution. An estimator $\tilde G_n$ is strongly consistent if $H(\tilde G_n, G_0) \to 0$ almost surely.

Again, for the sake of clarity, we consider only the special case with $p = 2$, $p_0 = 1$; that is, data from a single (non-mixture) multivariate normal distribution are fitted with a two-component multivariate normal mixture model. The extension of our proof to general situations is straightforward; the major hurdle is merely a more complicated presentation. Most intermediate conclusions in the proof of consistency of the PMLE when $p = p_0 = 2$ are still applicable; some need minor changes. We use many of those results and notations to establish a brief proof.

For an arbitrarily small positive number $\delta$, define $\mathcal{H}(\delta) = \{G \in \Gamma : H(G, G_0) \ge \delta\}$. That is, $\mathcal{H}(\delta)$ contains all mixing distributions with up to $p$ components that are at least distance $\delta > 0$ from the true mixing distribution $G_0$. Since $G_0 \notin \mathcal{H}(\delta)$, we have $E[\log\{g(X; G)/f(X; G_0)\}] < 0$ for any $G \in \mathcal{H}(\delta)$.
Thus, (3) remains valid after being slightly revised as follows:
$$\sup_{G \in \mathcal{H}(\delta) \cap \Gamma_2} n^{-1}\sum_{i=1}^{n}\log\{g(X_i; G)/f(X_i; G_0)\} \to -\eta(\tau)$$
for some positive $\eta(\tau)$ depending on $\Gamma_2$. Because of this, the derivations in the proof of Theorem 1 still apply after $\Gamma_k$ is replaced by $\mathcal{H}(\delta) \cap \Gamma_k$ ($k = 1, 2$). That is, with a proper choice of $\varepsilon_0$ and $\tau_0$, we similarly get
$$\sup_{G \in \mathcal{H}(\delta) \cap \Gamma_k} pl_n(G) - pl_n(G_0) \to -\infty$$
for $k = 1, 2$. With what we have proved, it is seen that the penalized maximum likelihood estimator $\tilde G_n$ must almost surely belong to $\mathcal{H}^c(\delta) \cup \Gamma_3$, where $\mathcal{H}^c(\delta)$ is the complement of $\mathcal{H}(\delta)$. Since $\delta$ is arbitrarily small, $\tilde G_n \in \mathcal{H}^c(\delta)$ implies $H(\tilde G_n, G_0) \to 0$. On the other hand, $\tilde G_n \in \Gamma_3$ is equivalent to putting a positive lower bound on the component variances, which also implies $H(\tilde G_n, G_0) \to 0$ by [10]. That is, consistency of the PMLE also holds when $p = 2$ but $p_0 = 1$. A generalization of the above derivation leads to the conclusion of Theorem 3. ∎

Table 1. Number of Degeneracies

Mean.Var.Config                 1     2     3     4     5     6
2-component bivariate normal mixture
  near                          0    11    19     5    40     8
  moderate                   1911  3256   441    62   523   157
  distant                    4997  4998  4966  4782  4998  4943
3-component bivariate normal mixture
  straight                   3049  5058  4947  1998  2306  2491
  acute                      2888  4505  4812  4052  4057  4561
  obtuse                     3253  4980  4983  2885  3022  3511
2-component trivariate normal mixture
  near                          1  4872  5003  4866  4961  1466
  moderate                   4011  5000  5001  5000  5000  4900
  distant                    5000  5000  5000  5000  5000  5000
3-component trivariate normal mixture
  straight                   5009  5010  5002  5002  5000  5000
  acute                      5006  5034  5000  5002  5000  5000
  obtuse                     5009  5038  5002  5004  5000  5001

Table 2. Bias (std) under 2-component bivariate normal mixture models.

                      MLE            PMLE1          PMLE2
Model I.1.1, component 1
  π1 = 0.3       -0.03 (0.11)   -0.02 (0.11)   -0.01 (0.10)
  μ1 = 0         -0.16 (0.53)   -0.16 (0.53)   -0.13 (0.50)
  μ2 = -1         0.72 (1.17)    0.72 (1.17)    0.71 (1.14)
  σ11 = 1        -0.14 (0.41)   -0.14 (0.40)   -0.13 (0.37)
  σ12 = 0        -0.01 (0.39)    0.00 (0.38)    0.00 (0.34)
  σ22 = 1        -0.03 (0.71)   -0.03 (0.70)   -0.01 (0.64)
Model I.1.1, component 2
  π2 = 0.7        0.03 (0.11)    0.02 (0.11)    0.01 (0.10)
  μ1 = 0          0.04 (0.19)    0.04 (0.19)    0.04 (0.19)
  μ2 = 1         -0.39 (0.47)   -0.39 (0.47)   -0.37 (0.48)
  σ11 = 1        -0.07 (0.18)   -0.07 (0.18)   -0.07 (0.18)
  σ12 = 0         0.00 (0.19)    0.00 (0.19)    0.00 (0.19)
  σ22 = 1         0.33 (0.44)    0.33 (0.44)    0.30 (0.43)
Model I.2.4, component 1
  π1 = 0.3        0.00 (0.03)    0.00 (0.03)    0.00 (0.03)
  μ1 = 0         -0.02 (0.28)   -0.02 (0.28)   -0.02 (0.28)
  μ2 = -3        -0.01 (0.13)   -0.01 (0.13)   -0.01 (0.13)
  σ11 = 5        -0.04 (0.93)   -0.04 (0.93)   -0.04 (0.93)
  σ12 = 0         0.00 (0.30)    0.00 (0.30)    0.00 (0.30)
  σ22 = 1        -0.02 (0.19)   -0.02 (0.19)    0.00 (0.19)
Model I.2.4, component 2
  π2 = 0.7        0.00 (0.03)    0.00 (0.03)    0.00 (0.03)
  μ1 = 0          0.00 (0.09)    0.00 (0.09)    0.00 (0.09)
  μ2 = 3          0.00 (0.09)    0.00 (0.09)    0.00 (0.09)
  σ11 = 1        -0.01 (0.12)   -0.01 (0.12)   -0.01 (0.12)
  σ12 = 0         0.00 (0.08)    0.00 (0.08)    0.00 (0.08)
  σ22 = 1         0.00 (0.12)    0.00 (0.12)    0.00 (0.12)

Table 3. Bias (std) under 3-component bivariate normal mixture models.

                      MLE            PMLE1          PMLE2
Model II.1.1, component 1
  π1 = 0.15      -0.10 (0.06)   -0.08 (0.07)   -0.04 (0.07)
  μ1 = 0          0.69 (1.15)    0.58 (1.28)    0.25 (1.01)
  μ2 = -2         1.17 (2.48)    1.15 (2.32)    1.24 (1.94)
  σ11 = 1        -0.33 (0.91)   -0.46 (0.60)   -0.33 (0.52)
  σ12 = 0        -0.04 (0.54)   -0.02 (0.46)    0.02 (0.48)
  σ22 = 1        -0.22 (1.16)   -0.22 (1.01)    0.12 (1.01)
Model II.1.1, component 2
  π2 = 0.35      -0.02 (0.10)   -0.02 (0.10)   -0.03 (0.08)
  μ1 = 0         -0.10 (0.39)   -0.08 (0.38)   -0.06 (0.39)
  μ2 = 0          0.61 (1.54)    0.63 (1.53)    0.56 (1.44)
  σ11 = 1        -0.13 (0.29)   -0.13 (0.30)   -0.14 (0.31)
  σ12 = 0         0.02 (0.32)    0.01 (0.33)    0.02 (0.34)
  σ22 = 1         0.24 (0.70)    0.20 (0.71)    0.22 (0.69)
Model II.1.1, component 3
  π3 = 0.5        0.11 (0.11)    0.10 (0.12)    0.06 (0.10)
  μ1 = 0          0.02 (0.20)    0.01 (0.21)    0.01 (0.24)
  μ2 = 2         -1.23 (0.90)   -1.16 (0.89)   -1.02 (0.89)
  σ11 = 1        -0.08 (0.16)   -0.08 (0.17)   -0.10 (0.19)
  σ12 = 0         0.03 (0.26)    0.03 (0.27)    0.00 (0.28)
  σ22 = 1         0.86 (0.68)    0.81 (0.70)    0.65 (0.67)

Table 4. Bias (std) under 3-component bivariate normal mixture models.

                      MLE            PMLE1          PMLE2
Model II.2.4, component 1
  π1 = 0.15       0.00 (0.04)    0.01 (0.04)    0.01 (0.03)
  μ1 = 0          0.23 (0.86)    0.18 (0.74)    0.19 (0.72)
  μ2 = -2         0.12 (0.83)    0.11 (0.63)    0.11 (0.54)
  σ11 = 1         0.07 (0.69)    0.06 (0.60)    0.10 (0.59)
  σ12 = 0        -0.05 (0.54)   -0.03 (0.40)   -0.04 (0.38)
  σ22 = 1         0.17 (0.99)    0.18 (0.95)    0.20 (0.90)
Model II.2.4, component 2
  π2 = 0.35      -0.01 (0.05)   -0.01 (0.05)   -0.01 (0.05)
  μ1 = 3         -0.43 (1.12)   -0.40 (1.09)   -0.38 (1.08)
  μ2 = 0          0.15 (0.82)    0.14 (0.80)    0.13 (0.79)
  σ11 = 1         0.37 (1.12)    0.33 (1.05)    0.31 (1.03)
  σ12 = 0        -0.01 (0.35)   -0.02 (0.34)   -0.03 (0.37)
  σ22 = 5        -0.69 (1.60)   -0.65 (1.57)   -0.62 (1.55)
Model II.2.4, component 3
  π3 = 0.5        0.00 (0.05)    0.00 (0.05)    0.00 (0.05)
  μ1 = 0          0.33 (0.88)    0.31 (0.88)    0.30 (0.87)
  μ2 = 2         -0.19 (0.57)   -0.17 (0.53)   -0.16 (0.51)
  σ11 = 5        -0.38 (1.31)   -0.36 (1.31)   -0.36 (1.30)
  σ12 = 0         0.00 (0.28)   -0.01 (0.26)   -0.01 (0.27)
  σ22 = 1         0.37 (1.15)    0.34 (1.11)    0.33 (1.08)

Table 5. Bias (std) under 2-component trivariate normal mixture models.

                      MLE            PMLE1          PMLE2
Model III.1.1, component 1
  π1 = 0.3       -0.09 (0.15)   -0.08 (0.15)   -0.05 (0.14)
  μ1 = 0         -0.28 (0.61)   -0.26 (0.58)   -0.17 (0.51)
  μ2 = 0         -0.15 (0.58)   -0.14 (0.57)   -0.09 (0.52)
  μ3 = -1         0.52 (0.09)    0.54 (0.11)    0.61 (0.09)
  σ11 = 1        -0.12 (0.47)   -0.11 (0.46)   -0.11 (0.36)
  σ12 = 0        -0.01 (0.38)    0.00 (0.35)    0.02 (0.27)
  σ13 = 0        -0.10 (0.48)   -0.10 (0.47)   -0.07 (0.37)
  σ22 = 1        -0.09 (0.56)   -0.11 (0.47)   -0.13 (0.36)
  σ23 = 0        -0.04 (0.49)   -0.02 (0.47)   -0.01 (0.37)
  σ33 = 1         0.22 (0.91)    0.18 (0.83)    0.12 (0.66)
Model III.1.1, component 2
  π2 = 0.7        0.09 (0.15)    0.08 (0.15)    0.05 (0.14)
  μ1 = 0          0.01 (0.15)    0.01 (0.15)    0.01 (0.16)
  μ2 = 0          0.02 (0.15)    0.02 (0.15)    0.02 (0.17)
  μ3 = 1         -0.45 (0.41)   -0.44 (0.41)   -0.42 (0.44)
  σ11 = 1        -0.05 (0.13)   -0.05 (0.13)   -0.05 (0.14)
  σ12 = 0         0.00 (0.10)    0.00 (0.10)    0.00 (0.10)
  σ13 = 0        -0.02 (0.13)   -0.02 (0.13)   -0.02 (0.14)
  σ22 = 1         0.03 (0.13)   -0.03 (0.13)   -0.04 (0.14)
  σ23 = 0         0.01 (0.14)    0.01 (0.14)    0.01 (0.15)
  σ33 = 1         0.44 (0.38)    0.43 (0.38)    0.39 (0.39)

Table 6. Bias (std) under 2-component trivariate normal mixture models.

                      MLE            PMLE1          PMLE2
Model III.2.4, component 1
  π1 = 0.3        0.00 (0.04)    0.00 (0.04)    0.00 (0.04)
  μ1 = 0          0.01 (0.13)    0.01 (0.13)    0.01 (0.13)
  μ2 = 0          0.01 (0.22)    0.01 (0.22)    0.01 (0.22)
  μ3 = -3        -0.03 (0.52)   -0.03 (0.52)   -0.04 (0.52)
  σ11 = 1        -0.01 (0.17)   -0.01 (0.17)   -0.01 (0.17)
  σ12 = 0        -0.01 (0.20)   -0.01 (0.20)   -0.01 (0.19)
  σ13 = 0         0.03 (0.45)    0.03 (0.45)    0.03 (0.45)
  σ22 = 3        -0.05 (0.49)   -0.05 (0.49)   -0.04 (0.49)
  σ23 = 0         0.00 (0.75)    0.00 (0.75)    0.01 (0.75)
  σ33 = 10       -0.36 (2.10)   -0.36 (2.11)   -0.38 (2.09)
Model III.2.4, component 2
  π2 = 0.7        0.00 (0.04)    0.00 (0.04)    0.00 (0.04)
  μ1 = 0          0.00 (0.15)    0.00 (0.15)    0.00 (0.15)
  μ2 = 0         -0.01 (0.19)   -0.01 (0.19)   -0.01 (0.19)
  μ3 = 3         -0.01 (0.11)   -0.01 (0.11)   -0.01 (0.11)
  σ11 = 4.87     -0.03 (0.47)   -0.03 (0.48)   -0.03 (0.47)
  σ12 = -3.23     0.03 (0.49)    0.03 (0.49)    0.03 (0.48)
  σ13 = -0.5      0.01 (0.23)    0.01 (0.23)    0.01 (0.23)
  σ22 = 7.2      -0.07 (0.71)   -0.07 (0.72)   -0.07 (0.71)
  σ23 = 2.16     -0.02 (0.30)   -0.02 (0.30)   -0.02 (0.30)
  σ33 = 1.94     -0.01 (0.22)   -0.01 (0.22)    0.00 (0.22)

Table 7. Bias (std) under 2-component trivariate normal mixture models.

                      MLE            PMLE1          PMLE2
Model III.3.6, component 1
  π1 = 0.3        0.00 (0.03)    0.00 (0.03)    0.00 (0.03)
  μ1 = 0          0.00 (0.10)    0.00 (0.10)    0.00 (0.10)
  μ2 = 0          0.01 (0.19)    0.01 (0.19)    0.00 (0.19)
  μ3 = -5         0.01 (0.37)    0.01 (0.37)    0.01 (0.37)
  σ11 = 1        -0.01 (0.15)   -0.01 (0.15)   -0.01 (0.15)
  σ12 = 0         0.01 (0.18)    0.01 (0.18)    0.01 (0.18)
  σ13 = 0         0.02 (0.36)    0.02 (0.36)    0.02 (0.36)
  σ22 = 3        -0.05 (0.45)   -0.05 (0.45)   -0.04 (0.45)
  σ23 = 0        -0.02 (0.64)   -0.02 (0.64)   -0.02 (0.64)
  σ33 = 10       -0.06 (1.81)   -0.06 (1.81)   -0.06 (1.80)
Model III.3.6, component 2
  π2 = 0.7        0.00 (0.03)    0.00 (0.03)    0.00 (0.03)
  μ1 = 0          0.00 (0.15)    0.00 (0.15)    0.00 (0.15)
  μ2 = 0          0.00 (0.19)    0.00 (0.19)    0.00 (0.19)
  μ3 = 5          0.00 (0.10)    0.00 (0.10)    0.00 (0.10)
  σ11 = 4.87     -0.05 (0.46)   -0.05 (0.46)   -0.05 (0.46)
  σ12 = 3.23     -0.03 (0.46)   -0.03 (0.46)   -0.03 (0.46)
  σ13 = -0.5      0.00 (0.22)    0.00 (0.22)    0.00 (0.22)
  σ22 = 7.2      -0.02 (0.70)   -0.02 (0.70)   -0.03 (0.70)
  σ23 = -2.16    -0.01 (0.29)   -0.01 (0.29)   -0.01 (0.29)
  σ33 = 1.94     -0.01 (0.20)   -0.01 (0.20)    0.00 (0.20)

Table 8. Bias (std) under 3-component trivariate normal mixture models.

                      MLE            PMLE1          PMLE2
Model IV.1.1, component 1
  π1 = 0.15      -0.05 (0.07)   -0.06 (0.07)   -0.01 (0.07)
  μ1 = 0          0.10 (0.64)    0.28 (0.97)    0.12 (0.69)
  μ2 = 0         -0.08 (0.64)    0.11 (0.97)   -0.04 (0.65)
  μ3 = -2         3.07 (2.16)    2.65 (2.17)    2.16 (1.89)
  σ11 = 1        -0.05 (0.73)   -0.25 (0.63)   -0.19 (0.47)
  σ12 = 0         0.07 (0.50)    0.05 (0.40)    0.04 (0.35)
  σ13 = 0        -0.01 (0.58)    0.00 (0.51)    0.00 (0.48)
  σ22 = 1        -0.04 (0.74)   -0.23 (0.63)   -0.16 (0.47)
  σ23 = 0         0.03 (0.51)    0.03 (0.47)    0.04 (0.43)
  σ33 = 1        -0.01 (1.16)    0.01 (1.19)    0.31 (1.05)
Model IV.1.1, component 2
  π2 = 0.35      -0.05 (0.09)   -0.07 (0.11)   -0.05 (0.09)
  μ1 = 0         -0.05 (0.33)   -0.10 (0.45)   -0.02 (0.37)
  μ2 = 0          0.04 (0.33)   -0.02 (0.43)    0.01 (0.34)
  μ3 = 0          0.00 (1.47)    0.02 (1.52)    0.26 (1.42)
  σ11 = 1        -0.09 (0.26)   -0.12 (0.32)   -0.11 (0.29)
  σ12 = 0         0.02 (0.20)    0.01 (0.23)    0.02 (0.21)
  σ13 = 0        -0.05 (0.32)   -0.05 (0.41)   -0.03 (0.35)
  σ22 = 1        -0.09 (0.28)   -0.11 (0.30)   -0.11 (0.28)
  σ23 = 0         0.02 (0.33)   -0.01 (0.37)    0.01 (0.33)
  σ33 = 1         0.46 (0.83)    0.48 (0.93)    0.46 (0.84)
Model IV.1.1, component 3
  π3 = 0.5        0.10 (0.12)    0.13 (0.15)    0.06 (0.12)
  μ1 = 0          0.01 (0.19)    0.00 (0.18)    0.00 (0.21)
  μ2 = 0         -0.01 (0.18)   -0.01 (0.17)    0.00 (0.21)
  μ3 = 2         -0.96 (0.81)   -1.00 (0.79)   -0.97 (0.86)
  σ11 = 1        -0.07 (0.17)   -0.07 (0.17)   -0.08 (0.19)
  σ12 = 0         0.01 (0.12)    0.00 (0.11)    0.01 (0.13)
  σ13 = 0        -0.04 (0.22)   -0.04 (0.22)   -0.04 (0.24)
  σ22 = 1        -0.06 (0.16)   -0.06 (0.16)   -0.07 (0.18)
  σ23 = 0         0.04 (0.22)    0.03 (0.22)    0.03 (0.25)
  σ33 = 1         0.76 (0.72)    0.88 (0.77)    0.75 (0.76)

Table 9. Bias (std) under 3-component trivariate normal mixture models.

                      MLE            PMLE1          PMLE2
Model IV.2.4, component 1
  π1 = 0.15       0.00 (0.05)    0.00 (0.04)    0.01 (0.04)
  μ1 = 0          0.04 (0.43)    0.04 (0.37)    0.02 (0.29)
  μ2 = 0          0.20 (0.96)    0.20 (0.90)    0.24 (0.88)
  μ3 = -2         0.19 (0.86)    0.17 (0.80)    0.20 (0.80)
  σ11 = 1         0.05 (0.63)    0.02 (0.52)    0.01 (0.38)
  σ12 = 0        -0.03 (0.54)   -0.01 (0.41)   -0.01 (0.34)
  σ13 = 0         0.04 (0.79)    0.01 (0.58)    0.01 (0.35)
  σ22 = 1         0.18 (1.06)    0.13 (0.81)    0.18 (0.73)
  σ23 = 0        -0.15 (1.09)   -0.10 (0.65)   -0.09 (0.62)
  σ33 = 1         0.65 (2.52)    0.53 (2.17)    0.68 (2.31)
Model IV.2.4, component 2
  π2 = 0.35      -0.01 (0.06)   -0.01 (0.06)   -0.02 (0.06)
  μ1 = 0          0.01 (0.19)    0.01 (0.19)    0.01 (0.18)
  μ2 = 3         -0.51 (1.25)   -0.46 (1.21)   -0.34 (1.13)
  μ3 = 0          0.24 (0.94)    0.21 (0.91)    0.13 (0.86)
  σ11 = 1         0.56 (1.54)    0.50 (1.47)    0.35 (1.27)
  σ12 = 0        -0.49 (1.32)   -0.44 (1.26)   -0.32 (1.10)
  σ13 = 0         0.09 (0.42)    0.08 (0.42)    0.05 (0.41)
  σ22 = 3         0.48 (1.78)    0.41 (1.71)    0.20 (1.53)
  σ23 = 0        -0.33 (0.98)   -0.30 (0.96)   -0.25 (0.88)
  σ33 = 10       -1.40 (3.55)   -1.26 (3.45)   -1.03 (3.31)
Model IV.2.4, component 3
  π3 = 0.5        0.01 (0.05)    0.01 (0.05)    0.00 (0.05)
  μ1 = 0         -0.02 (0.18)   -0.02 (0.18)   -0.01 (0.19)
  μ2 = 0          0.37 (0.87)    0.34 (0.86)    0.27 (0.79)
  μ3 = 2         -0.28 (0.72)   -0.25 (0.68)   -0.17 (0.58)
  σ11 = 4.87     -0.57 (1.42)   -0.51 (1.36)   -0.39 (1.22)
  σ12 = -3.23     0.45 (1.24)    0.41 (1.20)    0.30 (1.07)
  σ13 = 0.5      -0.07 (0.33)   -0.06 (0.33)   -0.04 (0.32)
  σ22 = 7.2      -0.46 (1.48)   -0.42 (1.46)   -0.33 (1.38)
  σ23 = -2.16     0.31 (0.95)    0.27 (0.89)    0.18 (0.77)
  σ33 = 1.94      0.88 (2.23)    0.79 (2.16)    0.58 (1.90)

Table 10. Bias (std) under 3-component trivariate normal mixture models.

                      MLE            PMLE1          PMLE2
Model IV.3.6, component 1
  π1 = 0.15       0.00 (0.05)    0.00 (0.05)    0.00 (0.05)
  μ1 = 0          0.05 (0.41)    0.05 (0.41)    0.05 (0.40)
  μ2 = 0         -0.01 (0.64)   -0.01 (0.64)   -0.01 (0.61)
  μ3 = -2        -0.21 (1.23)   -0.21 (1.23)   -0.23 (1.20)
  σ11 = 1         0.28 (1.24)    0.28 (1.24)    0.24 (1.12)
  σ12 = 0        -0.19 (1.16)   -0.19 (1.16)   -0.15 (1.05)
  σ13 = 0         0.14 (1.04)    0.14 (1.03)    0.13 (0.99)
  σ22 = 3         0.21 (1.48)    0.21 (1.48)    0.18 (1.40)
  σ23 = 0        -0.42 (1.54)   -0.42 (1.54)   -0.39 (1.50)
  σ33 = 10       -1.37 (3.73)   -1.37 (3.73)   -1.34 (3.64)
Model IV.3.6, component 2
  π2 = 0.35      -0.01 (0.06)   -0.01 (0.06)   -0.01 (0.06)
  μ1 = 0         -0.01 (0.33)   -0.01 (0.33)    0.00 (0.32)
  μ2 = 3         -0.20 (0.61)   -0.20 (0.61)   -0.19 (0.60)
  μ3 = 0          0.25 (0.96)    0.25 (0.96)    0.26 (0.94)
  σ11 = 4.87     -0.15 (1.18)   -0.15 (1.18)   -0.13 (1.14)
  σ12 = -3.2      1.23 (2.89)    1.23 (2.89)    1.20 (2.87)
  σ13 = 0.5      -0.16 (0.62)   -0.16 (0.62)   -0.15 (0.62)
  σ22 = 7.2      -0.24 (1.56)   -0.24 (1.56)   -0.21 (1.52)
  σ23 = -2.16     0.21 (0.77)    0.21 (0.77)    0.19 (0.73)
  σ33 = 1.94      0.21 (1.61)    0.21 (1.61)    0.18 (1.52)
Model IV.3.6, component 3
  π3 = 0.5        0.02 (0.07)    0.02 (0.07)    0.02 (0.07)
  μ1 = 0         -0.02 (0.22)   -0.02 (0.22)   -0.02 (0.22)
  μ2 = 0          0.16 (0.43)    0.17 (0.43)    0.16 (0.43)
  μ3 = 2         -0.33 (0.68)   -0.33 (0.68)   -0.32 (0.68)
  σ11 = 4.87     -0.18 (0.66)   -0.18 (0.66)   -0.17 (0.65)
  σ12 = 3.23     -1.06 (2.14)   -1.06 (2.15)   -1.04 (2.15)
  σ13 = -0.5      0.17 (0.47)    0.17 (0.47)    0.16 (0.47)
  σ22 = 7.2      -0.21 (0.97)   -0.21 (0.98)   -0.20 (0.98)
  σ23 = -2.16     0.03 (0.45)    0.03 (0.45)    0.03 (0.46)
  σ33 = 1.94      0.03 (0.39)    0.03 (0.38)    0.03 (0.38)
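The simulations above compare the plain MLE with penalized estimators computed via EM. As a rough illustration of how a covariance penalty stabilizes EM, the sketch below uses the penalty $p_n(G) = -a_n \sum_j \{\mathrm{tr}(S_x \Sigma_j^{-1}) + \log|\Sigma_j|\}$, with $S_x$ the sample covariance matrix, which keeps the covariance M-step in closed form; this penalty form, the rate $a_n = n^{-1/2}$, and all numerical settings are assumptions for illustration, not necessarily the exact choices behind PMLE1 and PMLE2.

```python
import numpy as np

def mvn_pdf(x, mu, Sigma):
    """Multivariate normal density evaluated at the rows of x."""
    d = x.shape[1]
    r = x - mu
    q = np.einsum('ij,ij->i', r @ np.linalg.inv(Sigma), r)
    return np.exp(-0.5 * q) / np.sqrt((2 * np.pi) ** d * np.linalg.det(Sigma))

def penalized_em(x, p=2, a_n=None, n_iter=50, seed=1):
    """EM for a p-component multivariate normal mixture with penalty
    -a_n * sum_j [tr(S_x inv(Sigma_j)) + log det Sigma_j], which keeps
    each Sigma_j away from singularity and the objective bounded."""
    rng = np.random.default_rng(seed)
    n, d = x.shape
    if a_n is None:
        a_n = n ** -0.5            # illustrative rate; an assumption here
    S = np.cov(x, rowvar=False)    # sample covariance S_x
    pis = np.full(p, 1.0 / p)
    mus = x[rng.choice(n, p, replace=False)].copy()
    Sigmas = np.stack([S.copy() for _ in range(p)])
    trace = []
    for _ in range(n_iter):
        # E-step: responsibilities w_ij.
        dens = np.stack([mvn_pdf(x, mus[j], Sigmas[j]) for j in range(p)], axis=1)
        num = pis * dens
        w = num / num.sum(axis=1, keepdims=True)
        # Record the penalized log-likelihood; EM makes it non-decreasing.
        pen = -a_n * sum(np.trace(np.linalg.solve(Sigmas[j], S))
                         + np.log(np.linalg.det(Sigmas[j])) for j in range(p))
        trace.append(np.log(num.sum(axis=1)).sum() + pen)
        # M-step: the penalty shrinks each Sigma_j toward S_x.
        nj = w.sum(axis=0)
        pis = nj / n
        for j in range(p):
            mus[j] = (w[:, j] @ x) / nj[j]
            r = x - mus[j]
            Sw = (w[:, j, None] * r).T @ r
            Sigmas[j] = (2 * a_n * S + Sw) / (2 * a_n + nj[j])
    return pis, mus, Sigmas, trace

# Usage sketch on synthetic two-cluster data.
data_rng = np.random.default_rng(7)
data = np.vstack([data_rng.normal(0.0, 1.0, size=(150, 2)),
                  data_rng.normal(3.0, 1.0, size=(150, 2))])
pis_hat, mus_hat, Sigmas_hat, trace = penalized_em(data)
```

The covariance update $\Sigma_j = (2a_n S_x + \sum_i w_{ij}(x_i - \mu_j)(x_i - \mu_j)^\tau)/(2a_n + \sum_i w_{ij})$ can never become singular, which is how a penalty of this kind prevents the degeneracies counted in Table 1.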