Inference for Multivariate Normal Mixtures
Inference for Multivariate Normal Mixtures

Jiahua Chen
Department of Statistics, University of British Columbia, Vancouver, BC, V6T 1Z2, Canada

Xianming Tan
LMPC and School of Mathematical Sciences, Nankai University, Tianjin, 300071, P.R. China

Abstract

Multivariate normal mixtures provide a flexible model for high-dimensional data. They are widely used in statistical genetics, statistical finance, and other disciplines. Due to the unboundedness of the likelihood function, classical likelihood-based methods, which may have nice practical properties, are inconsistent. In this paper, we recommend a penalized likelihood method for estimating the mixing distribution. We show that the maximum penalized likelihood estimator is strongly consistent when the number of components has a known upper bound. We also explore a convenient EM-algorithm for computing the maximum penalized likelihood estimator. Extensive simulations are conducted to explore the effectiveness and the practical limitations of both the new method and the ratified maximum likelihood estimators. Guidelines are provided based on the simulation results.

Key words: Multivariate normal mixture, Penalized maximum likelihood estimator, Strong consistency.

PACS: 02.50.-r

Email addresses: jhchen@stat.ubc.ca (Jiahua Chen), tanxm@nankai.edu.cn (Xianming Tan).

Preprint submitted to Elsevier, 13 November 2018.

1 Introduction

In the past few decades, there has been an exploding volume of literature on mixture models [22, 13, 15, 6]. Various mixture distributions, including normal mixtures, are used in a wide variety of situations. Schork et al. [19] reviewed the applications of mixture models in human genetics, and Tadesse et al. [20] used a normal mixture model for clustering analysis. Further application examples can be found in [5, 12, 16] and [1].
Finite mixtures of multivariate normals have also drawn substantial attention recently. Lindsay and Basak [14] devised a system of moment equations and a fast algorithm to estimate the parameters of multivariate normal mixture distributions under an equal-covariance-matrix assumption. However, the equality assumption is crucial, and failing this condition leads to a substantial loss in the accuracy of the fit [15]. Unequal-variance normal mixture models have an ill effect on the likelihood function [3]. Placing a positive lower bound on the component variances helps, but the resulting statistical procedure can be awkward because it is not continuous in the data. Placing a positive lower bound on the ratio of the component variances is better. In the univariate case the resulting constrained maximum likelihood estimator is consistent for both constant and shrinking lower bounds [8, 21]. Though consistency is yet to be proved, Ingrassia [9] applied the constrained method to multivariate observations. Ray and Lindsay [17] found that, in contrast to the univariate case, the multivariate normal mixture density can have more modes than the number of components. Inference on multivariate normal mixture models is hence more difficult.

In this paper, we investigate a penalized likelihood method for estimating the mixing distribution. Penalized likelihood estimators form a popular class of methods; see [7, 4]. When the number of components has a known upper bound, the penalized maximum likelihood estimator (PMLE) is found to be strongly consistent. An EM-algorithm is developed and extensive simulations are conducted.
Although, after some ratification, the usual maximum likelihood estimator and the PMLE work similarly in the univariate case once degenerate local maxima are removed [2], the PMLE is advantageous for multivariate normal mixture models.

The paper is organized as follows. In Section 2, the penalized likelihood method is introduced. Two theorems on strong consistency are presented, with the proofs deferred to the Appendix. The EM-algorithm for solving the maximization problem for the penalized likelihood function is given. Section 3 contains the simulation results.

2 Penalized likelihood method

2.1 Consistency of the PMLE

Let φ(x; μ, Σ) be the multivariate normal density with (d × 1) mean vector μ and d × d covariance matrix Σ, i.e.,

φ(x; μ, Σ) = (2π)^{−d/2} |Σ|^{−1/2} exp{−(1/2)(x − μ)^τ Σ^{−1} (x − μ)}.

A d-dimensional random vector X has a multivariate finite normal mixture distribution of order p if its density function is given by

f(x; G) = π_1 φ(x; μ_1, Σ_1) + π_2 φ(x; μ_2, Σ_2) + ··· + π_p φ(x; μ_p, Σ_p)   (1)

where G is the mixing distribution assigning probability π_j to the parameter set (μ_j, Σ_j) of the jth kernel density φ(x; μ_j, Σ_j). Let x_1, x_2, ..., x_n be a random sample from (1). Then

l_n(G) = Σ_{i=1}^n log f(x_i; G)

is the log-likelihood function. Even if |Σ_j| > 0 for all j, l_n(G) is unbounded at μ_1 = x_1 when |Σ_1| gets arbitrarily small. The penalized log-likelihood function is of the form

pl_n(G) = l_n(G) + p_n(G)

where p_n(G) is the penalty, depending on the mixing distribution G and the sample size n. Let Ĝ_n be the mixing distribution in the parameter space at which pl_n(G) attains its maximum. We call Ĝ_n the penalized maximum likelihood estimator (PMLE). We choose a penalty function such that:

C1. p_n(G) = Σ_{j=1}^p p̃_n(Σ_j).

C2.
At any fixed G such that |Σ_j| > 0 for all j = 1, 2, ..., p, we have p_n(G) = o(n) and sup_G max{0, p_n(G)} = o(n). In addition, p_n(G) is differentiable with respect to G and, as n → ∞, p′_n(G) = o(√n) at any fixed G such that |Σ_j| > 0 for all j = 1, 2, ..., p. Here we treat G as a vector of the parameters contained in the mixing distribution G.

C3. For large enough n, p̃_n(Σ) ≤ 4(log n)^2 log |Σ| when |Σ| is smaller than c n^{−2d} for some c > 0.

These conditions are quite flexible, and functions satisfying them can easily be constructed; a class of such functions will be given in the simulation section. Condition C1 simplifies the numerical computation. Condition C2 limits the effect of the penalty. The key condition is C3: it counters the damaging effect of a degenerate component covariance matrix. The order of the penalty size is well calibrated, as will be seen in the proof, yet the exact value of the constant 4 is not important. The penalty function can also be viewed as a prior density in a Bayesian analysis.

Theorem 1 Assume that the true density function

f(x; G_0) = Σ_{j=1}^{p_0} π_{0j} φ(x; μ_{0j}, Σ_{0j})

satisfies π_{0j} > 0 and |Σ_{0j}| > 0 for all j = 1, 2, ..., p_0, and (μ_{0j}, Σ_{0j}) ≠ (μ_{0k}, Σ_{0k}) whenever j ≠ k. Assume that the penalty function p_n(G) satisfies C1-C3 and that G̃_n is a mixing distribution of order p_0 satisfying

pl_n(G̃_n) − pl_n(G_0) ≥ c > −∞ for all n.

Then, as n → ∞, G̃_n → G_0 almost surely.

The proof is deferred to the Appendix. Since pl_n(Ĝ_n) − pl_n(G_0) ≥ 0, the PMLE Ĝ_n is strongly consistent. Because Ĝ_n and G_0 have the same order, all elements of Ĝ_n converge to those of G_0 almost surely.

Furthermore, let

S_n(G) = Σ_{i=1}^n ∂ log f(x_i; G)/∂G

be the vector score function at G.
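The degeneracy that condition C3 is designed to counter is easy to reproduce numerically. The sketch below is our own illustration, not part of the paper: the data, the fixed "catch-all" second component, and the penalty weight a_n = n^{−1/2} are hypothetical choices, with the trace-plus-log-determinant penalty of the type recommended later in the simulation section. Pinning μ_1 at x_1 and letting Σ_1 = εI shrink sends the mixture density at x_1 (and eventually the likelihood) to infinity, while the penalized likelihood is driven to −∞, so the PMLE cannot degenerate.

```python
import numpy as np
from scipy.stats import multivariate_normal as mvn

rng = np.random.default_rng(1)
# Hypothetical bivariate sample from a two-component normal mixture.
x = np.vstack([rng.normal([0.0, -3.0], 1.0, size=(60, 2)),
               rng.normal([0.0, 3.0], 1.0, size=(140, 2))])
n, d = x.shape
S_x = np.cov(x, rowvar=False)      # sample covariance matrix
a_n = n ** -0.5                    # penalty weight, the PMLE2 choice

def log_lik(eps):
    """l_n(G) with mu_1 pinned at x_1, Sigma_1 = eps*I, and a fixed
    catch-all second component at the sample mean and covariance."""
    comp1 = 0.3 * mvn.pdf(x, mean=x[0], cov=eps * np.eye(d))
    comp2 = 0.7 * mvn.pdf(x, mean=x.mean(axis=0), cov=S_x)
    return np.sum(np.log(comp1 + comp2))

def pen(eps):
    """Trace-plus-log-determinant penalty on the degenerating component
    only; the catch-all component's term is constant in eps and omitted."""
    return -a_n * (np.trace(S_x) / eps + d * np.log(eps))

for eps in (1e-2, 1e-4, 1e-8):
    dens_at_x1 = 0.3 / (2 * np.pi * eps)   # the phi(x_1; x_1, eps*I) term
    print(f"eps={eps:.0e}  f(x1)>{dens_at_x1:.3g}  pl_n={log_lik(eps) + pen(eps):.3g}")
```

The printout shows the density at x_1 growing like (2πε)^{−1} while the penalized log-likelihood plunges, exactly the behavior C3 exploits in the consistency proof.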
Let

S′_n(G) = Σ_{i=1}^n ∂S_n(G)/∂G

be the matrix of second derivatives of the log-likelihood function. At G = G_0 the normal mixture model is regular, and hence the Fisher information

I_n(G_0) = nI(G_0) = −E{S′_n(G_0)} = E{S_n(G_0) S_n(G_0)^τ}

is positive definite. Using classical asymptotic techniques as in [11], and under condition C2, which gives p′_n(G) = o_p(n^{1/2}), we have

Ĝ_n − G_0 = −{S′_n(G_0)}^{−1} S_n(G_0) + o_p(n^{−1/2}).

Therefore, Ĝ_n is an asymptotically normal and efficient estimator.

Theorem 2 Under the same conditions as in Theorem 1, as n → ∞,

√n {Ĝ_n − G_0} → N(0, I^{−1}(G_0))

in distribution.

The proof is straightforward and omitted. In practice, we may know only an upper bound for p_0 rather than its exact value. The following theorem deals with this situation.

Theorem 3 Assume the same conditions as in Theorem 1, except that the order p_0 of the finite normal mixture model is known only to be smaller than or equal to p. Let G̃_n be a mixing distribution of order p satisfying

pl_n(G̃_n) − pl_n(G_0) ≥ c > −∞ for all n.

Then, as n → ∞, G̃_n → G_0 weakly, almost surely.

The proof is deferred to the Appendix.

2.2 The EM-algorithm

We recommend the EM-algorithm because of its simplicity in coding and its guaranteed convergence to some local maximum under very general conditions [24, 18, 7]. In our simulations, we use a number of initial values to reduce the risk of poor local maxima. We also recommend some convenient and effective penalty functions for the EM-algorithm.

Let z_ij be the membership indicator variable that equals 1 when x_i is from the jth component of the normal mixture model, and equals 0 otherwise.
The complete-observation log-likelihood under a normal mixture model is then given by

l_c(G) = Σ_{i=1}^n Σ_{k=1}^p z_ik {log π_k − (1/2) log |Σ_k| − (1/2)(x_i − μ_k)^τ Σ_k^{−1} (x_i − μ_k)}.

Given the current mixing distribution

G^(m) = (π_1^(m), ..., π_p^(m), μ_1^(m), ..., μ_p^(m), Σ_1^(m), ..., Σ_p^(m)),

the EM-algorithm iterates as follows. In the E-step, we compute

π_ij^(m+1) = E{z_ij | x_1, ..., x_n, G^(m)} = π_j^(m) φ(x_i; μ_j^(m), Σ_j^(m)) / Σ_{k=1}^p π_k^(m) φ(x_i; μ_k^(m), Σ_k^(m)).

Replacing z_ij by π_ij^(m+1) in l_c(G), we get

Q(G; G^(m)) = E{l_c(G) + p_n(G) | x_1, ..., x_n, G^(m)}
= Σ_{j=1}^p (log π_j) Σ_{i=1}^n π_ij^(m+1) − (1/2) Σ_{j=1}^p (log |Σ_j|) Σ_{i=1}^n π_ij^(m+1) − (1/2) Σ_{j=1}^p Σ_{i=1}^n π_ij^(m+1) (x_i − μ_j)^τ Σ_j^{−1} (x_i − μ_j) + p_n(G).

This completes the E-step. In the M-step, we maximize Q(G; G^(m)) with respect to G to obtain G^(m+1). We suggest the following penalty function in practice:

p_n(G) = −a_n Σ_{j=1}^p {tr(S_x Σ_j^{−1}) + log |Σ_j|}   (2)

with S_x the sample covariance matrix and tr(·) the trace function. Using this penalty function, Q(G; G^(m)) is maximized at G = G^(m+1) with

π_j^(m+1) = (1/n) Σ_{i=1}^n π_ij^(m+1),
μ_j^(m+1) = Σ_{i=1}^n π_ij^(m+1) x_i / (n π_j^(m+1)),
Σ_j^(m+1) = (2 a_n S_x + S_j^(m+1)) / (2 a_n + n π_j^(m+1)),

where

S_j^(m+1) = Σ_{i=1}^n π_ij^(m+1) (x_i − μ_j^(m+1))(x_i − μ_j^(m+1))^τ.

From a Bayesian point of view, the penalty function (2) puts an inverse-Wishart prior on each Σ_j, with S_x the mode of the prior distribution. Increasing the value of a_n implies a stronger conviction that S_x is a plausible value of Σ_j. The EM-algorithm iterates between the E-step and the M-step, and the penalized likelihood increases after each iteration.
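The E-step and M-step above can be collected into a few lines of code. The sketch below is our own minimal NumPy rendering of the closed-form updates for penalty (2), not the authors' code; for brevity it uses a single crude data-based starting point rather than the ten initial values used in the paper's simulations.

```python
import numpy as np

def log_mvn_pdf(x, mu, Sigma):
    """Log of the multivariate normal density phi(x; mu, Sigma), row-wise."""
    d = x.shape[1]
    L = np.linalg.cholesky(Sigma)
    z = np.linalg.solve(L, (x - mu).T)
    return (-0.5 * (d * np.log(2 * np.pi) + np.sum(z * z, axis=0))
            - np.log(np.diag(L)).sum())

def penalized_em(x, p, a_n, n_iter=200, seed=0):
    """Penalized EM with Sigma_j update (2*a_n*S_x + S_j)/(2*a_n + n*pi_j)."""
    rng = np.random.default_rng(seed)
    n, d = x.shape
    S_x = np.cov(x, rowvar=False)
    # Crude data-based start: equal proportions, perturbed sample means,
    # sample covariance for every component.
    pi = np.full(p, 1.0 / p)
    mu = x.mean(axis=0) + rng.normal(scale=1.0, size=(p, d))
    Sigma = np.stack([S_x.copy() for _ in range(p)])
    for _ in range(n_iter):
        # E-step: posterior membership probabilities pi_ij, computed stably
        # on the log scale.
        logw = np.log(pi) + np.stack(
            [log_mvn_pdf(x, mu[j], Sigma[j]) for j in range(p)], axis=1)
        logw -= logw.max(axis=1, keepdims=True)
        w = np.exp(logw)
        w /= w.sum(axis=1, keepdims=True)          # n x p matrix of pi_ij
        # M-step: the closed-form updates given in the text.
        nj = w.sum(axis=0)
        pi = nj / n
        mu = (w.T @ x) / nj[:, None]
        for j in range(p):
            xc = x - mu[j]
            Sj = (w[:, j, None] * xc).T @ xc
            Sigma[j] = (2 * a_n * S_x + Sj) / (2 * a_n + nj[j])
    return pi, mu, Sigma
```

The 2·a_n·S_x term in the covariance update keeps every Σ_j positive definite at every iteration, which is exactly why this EM cannot degenerate.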
At the same time, the penalized likelihood is bounded over the parameter space. Hence, the EM-algorithm converges to a non-degenerate local maximum. This is the dividing line between the penalized likelihood and the ordinary likelihood. In both cases, the EM-algorithm may converge to an undesired local maximum when started from a poor initial value. In the simulations, we use ten initial values, including the true value, for each data set to control this potential problem.

3 Simulation study

When computing the MLE, the local maxima located by the EM-algorithm with degenerate covariance matrices are first removed. The one that attains the largest likelihood value among those remaining is then identified as the MLE, or the ratified MLE, of the mixing distribution. Although this approach lacks solid theoretical support, it works well for univariate normal mixture models [2].

The consistency result for the PMLE for multivariate normal mixture models does not guarantee its superiority in practice. Thus, we feel obliged to compare the performance of the PMLE with that of the ratified MLE. In addition, there is a general shortage of thorough simulation studies in the context of multivariate normal mixture models. This paper partially fills that knowledge gap.

We use bias and standard deviation to measure the accuracy of the ratified MLE and the PMLE. We also record the number of times that the EM-algorithm degenerates when the ratified MLE is attempted. For clarity, the simulation results are organized into two subsections.

3.1 Simulation models and settings

The size of the parameter space for the finite multivariate normal mixture model explodes with the dimension. It is difficult to use a few typical specific distributions to cover all aspects of this model. We struggled to come up with a few particularly important cases.
We considered four categories of mixture models: two-component bivariate normal mixture models (p = 2, d = 2); three-component bivariate normal mixture models (p = 3, d = 2); two-component trivariate normal mixture models (p = 2, d = 3); and three-component trivariate normal mixture models (p = 3, d = 3). In each category, we chose 3 × 6 models formed by component mean vector and covariance matrix configurations. These combinations mimic practical situations and make the comparison of the performance of the ratified MLE and the PMLE meaningful.

The covariance matrices in the simulation models have the following general form when d = 2:

Σ = R(θ) diag(λ_1, λ_2) R(θ)^τ,  with R(θ) = [cos θ, −sin θ; sin θ, cos θ].

By the choices of the eigenvalues λ_1, λ_2 and the orientation angle θ, we obtain various configurations of bivariate normal mixture models.

The covariance matrices in the simulation models have the following general form when d = 3:

Σ = P(α, β, γ) diag(λ_1, λ_2, λ_3) P^τ(α, β, γ)

with

P(α, β, γ) =
[ cos α cos γ − cos β sin α sin γ   −cos β cos γ sin α − cos α sin γ    sin α sin β
  cos γ sin α + cos α cos β sin γ    cos α cos β cos γ − sin α sin γ   −cos α sin β
  sin β sin γ                        cos γ sin β                        cos β ],

that is, a 3 × 3 rotation matrix. For each multivariate normal mixture model, we specify the mixing proportion, covariance matrix, and mean vector for each component.

Two-component bivariate normal mixture models. We set the component proportions (π_1, π_2) = (0.3, 0.7); no other cases are considered. Due to the invariance property of the multivariate normal distribution, the distance between the two mean vectors is the only mean configuration that can make a difference.
Thus, we simulated only three pairs of mean vectors, representing situations where the two component mean vectors are near, moderately separated, and distant, as in the following table:

              near     moderate   distant
Component 1   (0, -1)  (0, -3)    (0, -5)
Component 2   (0, 1)   (0, 3)     (0, 5)

There are many features in the pair of covariance matrices that may have an effect on the performance of the ratified MLE or the PMLE. The sizes of the eigenvalues matter most through their ratio λ_2/λ_1. The angle θ determines the relative orientation of the two component densities. Our choices based on these considerations are given in the following table:

     Component 1        Component 2
     λ_1  λ_2  θ        λ_1  λ_2  θ
1    1    1    0        1    1    0
2    1    5    0        1    1    0
3    1    5    π/4      1    1    0
4    1    5    π/2      1    1    0
5    1    5    π/4      1    5    0
6    1    5    π/2      1    5    0

Three-component bivariate normal mixture models. We set the component proportions (π_1, π_2, π_3) = (0.15, 0.35, 0.50). The three mean vectors may form a straight line, an acute triangle, or an obtuse triangle. We select three representative configurations as follows:

              straight  acute    obtuse
Component 1   (0, -2)   (0, -2)  (0, -2)
Component 2   (0, 0)    (3, 0)   (1, 0)
Component 3   (0, 2)    (0, 2)   (0, 2)

We select six triplets of covariance matrices as follows:

     Component 1      Component 2      Component 3
     λ_1  λ_2  θ      λ_1  λ_2  θ      λ_1  λ_2  θ
1    1    1    0      1    1    0      1    1    0
2    1    1    0      1    1    0      1    5    0
3    1    1    0      1    5    0      1    5    π/4
4    1    1    0      1    5    0      1    5    π/2
5    1    5    0      1    5    π/4    1    5    −π/4
6    1    5    0      1    5    π/4    1    5    −π/2

Two-component trivariate normal mixture models. We again let (π_1, π_2) = (0.3, 0.7). As before, only the distance between the two mean vectors matters.
The two mean vectors are chosen to be:

              near        moderate    distant
Component 1   (0, 0, -1)  (0, 0, -3)  (0, 0, -5)
Component 2   (0, 0, 1)   (0, 0, 3)   (0, 0, 5)

The covariance matrix pairs are chosen as follows:

     Component 1                 Component 2
     (λ_1, λ_2, λ_3)  (α, β, γ)  (λ_1, λ_2, λ_3)  (α, β, γ)
1    (1, 1, 1)        (0, 0, 0)  (1, 1, 1)        (0, 0, 0)
2    (1, 1, 1)        (0, 0, 0)  (1, 3, 10)       (0, 0, 0)
3    (1, 3, 10)       (0, 0, 0)  (1, 3, 10)       (0, 0, 0)
4    (1, 3, 10)       (0, 0, 0)  (1, 3, 10)       (−π, π, π)/3
5    (1, 3, 10)       (0, 0, 0)  (1, 3, 10)       (π, −π, π)/3
6    (1, 3, 10)       (0, 0, 0)  (1, 3, 10)       (π, π, −π)/3

Three-component trivariate normal mixture models. We let the component proportions (π_1, π_2, π_3) be (0.15, 0.35, 0.50). Recall that any three points fall in one plane. Thus, the invariance property of the normal distribution allows us to set the first entry of each mean vector to 0:

              straight    acute       obtuse
Component 1   (0, 0, -2)  (0, 0, -2)  (0, 0, -2)
Component 2   (0, 0, 0)   (0, 3, 0)   (0, 1, 0)
Component 3   (0, 0, 2)   (0, 0, 2)   (0, 0, 2)

The covariance matrix triplets are chosen as follows:

     Component 1                 Component 2                    Component 3
     (λ_1, λ_2, λ_3)  (α, β, γ)  (λ_1, λ_2, λ_3)  (α, β, γ)     (λ_1, λ_2, λ_3)  (α, β, γ)
1    (1, 1, 1)        (0, 0, 0)  (1, 1, 1)        (0, 0, 0)     (1, 1, 1)        (0, 0, 0)
2    (1, 1, 1)        (0, 0, 0)  (1, 1, 1)        (0, 0, 0)     (1, 3, 10)       (0, 0, 0)
3    (1, 1, 1)        (0, 0, 0)  (1, 3, 10)       (0, 0, 0)     (1, 3, 10)       (−π, π, π)/3
4    (1, 1, 1)        (0, 0, 0)  (1, 3, 10)       (0, 0, 0)     (1, 3, 10)       (π, −π, π)/3
5    (1, 3, 10)       (0, 0, 0)  (1, 3, 10)       (−π, π, π)/3  (1, 3, 10)       (π, −π, π)/3
6    (1, 3, 10)       (0, 0, 0)  (1, 3, 10)       (π, −π, π)/3  (1, 3, 10)       (π, π, −π)/3

We let n = 200 for the two-component bivariate mixtures and n = 300 for the other mixtures, to ensure reasonable estimation of the mixing distribution. We generate 1000 data sets for each model. We have presented four categories of finite normal mixture models.
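The covariance configurations above are easy to generate programmatically. The two helpers below are our own (the names are hypothetical); they simply assemble Σ from the eigenvalues and rotation angles exactly as in the two general forms given earlier in this subsection.

```python
import numpy as np

def cov2(lam1, lam2, theta):
    """Sigma = R(theta) diag(lam1, lam2) R(theta)^T for the d = 2 models."""
    c, s = np.cos(theta), np.sin(theta)
    R = np.array([[c, -s], [s, c]])
    return R @ np.diag([lam1, lam2]) @ R.T

def cov3(lams, a, b, g):
    """Sigma = P(alpha, beta, gamma) diag(lams) P^T for the d = 3 models,
    with P the 3 x 3 rotation matrix given in the text."""
    ca, sa = np.cos(a), np.sin(a)
    cb, sb = np.cos(b), np.sin(b)
    cg, sg = np.cos(g), np.sin(g)
    P = np.array([
        [ca * cg - cb * sa * sg, -cb * cg * sa - ca * sg,  sa * sb],
        [cg * sa + ca * cb * sg,  ca * cb * cg - sa * sg, -ca * sb],
        [sb * sg,                 cg * sb,                 cb]])
    return P @ np.diag(lams) @ P.T
```

For example, covariance configuration 3 of the two-component bivariate category uses cov2(1, 5, np.pi / 4) for component 1, and row 4 of the trivariate pair table uses cov3([1, 3, 10], -np.pi / 3, np.pi / 3, np.pi / 3) for component 2; in every case the prescribed eigenvalues are recovered exactly.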
For ease of reference we use, for example, I.1.2 to refer to the model from Category I with mean vector configuration 1 and covariance matrix configuration 2. Even though there are many more mixing distribution configurations for which simulation studies are needed, there is a limit to how much one paper can achieve. We do not consider the case where p is unknown; all estimators in that case are expected to be poor, although the consistency result for the PMLE remains true.

Penalty term and initial values. We compute the ratified MLE and two penalized MLEs corresponding to a_n = n^{−1} and a_n = n^{−1/2} in (2). We call these the MLE, PMLE1, and PMLE2, respectively.

The ten initial values are chosen from two groups. The first group of initial values includes the true mixing distribution and four others obtained by perturbing the component mean vectors of the true mixing distribution. The second group of initial values is data-based. We first calculate the sample mean vector and the sample covariance matrix. We then set the mixing proportions all equal to 1/p and the component covariance matrices all equal to the sample covariance matrix, and apply a similar perturbation to the sample mean vector to obtain another five sets of initial values.

3.2 Simulation results

Number of degeneracies. When the EM-algorithm converges to a mixing distribution with singular component covariance matrices, we say that it degenerates. The EM-algorithm for the PMLE does not degenerate, which is theoretically ensured: regardless of the quality of the initial value, the corresponding EM-algorithm always converges to some non-degenerate local maximum. The PMLE is a good estimator if the largest local maximum is a good estimator. When computing the ratified MLE, however, the EM-algorithm sometimes converges to a degenerate local maximum.
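The ratification rule used for the MLE can be stated compactly. In the sketch below, the representation of each fitted local maximum as a (log-likelihood, list of component covariance matrices) pair and the near-singularity tolerance det_tol are our own illustrative choices; the paper itself simply discards EM outputs with degenerate component covariance matrices and keeps the candidate with the largest remaining likelihood.

```python
import numpy as np

def ratify(fits, det_tol=1e-10):
    """fits: candidate local maxima from multiple EM starts, each a pair
    (log_likelihood, list_of_component_covariance_matrices).
    Drop candidates containing a (near-)singular component covariance
    matrix, then return the candidate with the largest likelihood left;
    None if every candidate is degenerate."""
    ok = [f for f in fits
          if min(np.linalg.det(S) for S in f[1]) > det_tol]
    return max(ok, key=lambda f: f[0]) if ok else None
```

With three hypothetical candidates, the degenerate one attains the largest likelihood (as the unbounded-likelihood discussion predicts) but is discarded, and the best non-degenerate fit is returned instead.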
We recorded the number of times that the EM-algorithm degenerated while computing the ratified MLE in our simulation. Since each data set had ten initial values, the number of degenerate outcomes is out of 10,000 for each entry.

For two-component bivariate normal mixture models, it is immediately clear that the number of degenerate outcomes increases when the mean vectors are more widely separated. The covariance structure is also important. For example, when the eigenvectors of one covariance matrix are rotated by an angle of π/2 (covariance configurations 4 and 6), so that the two clusters of observations become more mixed, the number of degenerate outcomes declines. This observation is somewhat counter-intuitive but can be explained as follows. The success of the EM-algorithm depends heavily on sensible initial values. When the two mean vectors are close and the components are well mixed, different initial values do not matter as much. However, when the two mean vectors are distant, the location of the initial mean vectors is crucial. Thus the degenerate outcomes were mostly due to the second group of initial values.

In the other three categories, the above phenomenon persists: the frequency of degeneracy increases when the components are more widely separated. In addition, for these categories we observe a higher frequency of degeneracies on average. We believe this is because the EM-algorithm is more sensitive to the quality of the initial values when the mixture models are more complicated.

Degeneracy of the EM-algorithm should not be a serious problem for the ratified MLE, as long as the non-degenerate outcomes of the algorithm provide good estimates. We hence proceed to examine the bias and variance properties of the PMLE and of the largest non-degenerate local maximum, regarded as the ratified MLE.

Bias and standard deviation.
We compute the element-wise mean bias and standard deviation based on 1000 simulated samples from each model. We present only a subset of representative outcomes from each category; the complete set is available upon request.

Two representative outcomes, for models I.1.1 and I.2.4 in Category I, are given in Table 2. There is about a 10% reduction in the standard deviation for PMLE2 compared to the ratified MLE or PMLE1 for the parameters in component 1 of Model I.1.1. The same is true for Models I.1.5 and I.1.6 (not presented). PMLE2 also has a relatively lower bias in these models. The results for the remaining models are comparable to those for I.2.4: there is little appreciable difference between the three estimation methods.

The biases of all three estimators for estimating μ_2 are high under I.1.1 and I.1.5, in which the two mean vectors are lined up in the μ_1 direction. Due to the orientation of the two component covariance matrices, it is hard to tell the two mean vectors apart. The biases and standard deviations for estimating σ_22 under I.1.1, I.1.2, ..., I.1.6 are also high or relatively high.

Table 2 about here.

We present outcomes for two models (II.1.1, II.2.4) in Category II in Tables 3 and 4. For both models, for the parameters in component 1, there is a 10% to 20% reduction in the standard deviation for PMLE2 compared to the other two estimators. The bias of PMLE2 is also lower. Some reductions in components 2 and 3 are also noticeable, but to varying degrees. In the other models, the performance of PMLE2 does not dominate that of the ratified MLE or PMLE1.

Under a straight-line configuration of the component mean vectors, the bias for estimating μ_2 is relatively high. For a triangle configuration, the roles of μ_1 and μ_2 are no longer different. This bias problem is not estimator dependent, although PMLE2 helps slightly.
The estimation of σ_22 again comes with both higher bias and higher standard deviation in general. For this category of models, the problem spreads into other parts of the covariance matrix.

Tables 3, 4 about here.

We report simulation results for three models (III.1.1, III.2.4, III.3.6) in Category III in Tables 5, 6, and 7. We again observe that PMLE2 has smaller bias and standard deviation for estimating the parameters in the first component, where the mixing proportion is small, and in model III.1.1, where the two mean vectors are close. The gain is as much as 30% for σ_33. The gains seem to disappear when the two component mean vectors are far from each other. Nevertheless, PMLE2 still appears to be the best estimator in terms of both bias and standard deviation.

Tables 5, 6, 7 about here.

We report simulation results for three models (IV.1.1, IV.2.4, IV.3.6) in Category IV in Tables 8, 9, and 10. Again, PMLE2 has the lowest standard deviations for estimating the parameters in the first component, where the mixing proportion is small. The comparison is sharpest in model IV.2.4 for σ_13. In contrast to the models of the other categories, here the superiority of PMLE2 is widespread. In fact, PMLE2 is superior for the parameters in component 2, and mixed for the parameters in component 3.

We caution that even the best estimator is not necessarily a good estimator for trivariate mixture models. Overall, none of the three estimators does a great job of estimating the mixing distribution, possibly because of the fundamental nature of the problem, e.g., the small Fisher information of high-dimensional multivariate normal mixture models. This problem is expected to diminish with increased sample size.

Tables 8, 9, 10 about here.

Summary of the simulation results. To conclude, the penalized likelihood estimators, both PMLE1 and PMLE2, are completely free of degeneracy problems.
Moreover, PMLE2 has the best general performance in terms of bias and standard deviation. This is most obvious when the components are not well separated. In applications, it is unnecessary to first judge whether it is safe to use the ratified MLE when a superior PMLE2 is available. Although we do not completely dismiss the use of the ratified MLE, it is clearly advantageous to use PMLE2 outright. We further caution against the use of high-dimensional multivariate normal mixture models in practice when the sample size is not large; in these situations, even the best performing estimator may not be a good estimator.

References

[1] R. Alexandridis, S. Lin, M. Irwin, Class discovery and classification of tumor samples using mixture modeling of gene expression data − a unified approach, Bioinformatics 20 (2004) 2545-2552.
[2] J. Chen, X. Tan, R. Zhang, Inference for normal mixtures in mean and variance, Statistica Sinica (2008), in press.
[3] N. E. Day, Estimating the components of a mixture of normal distributions, Biometrika 56 (1969) 463-474.
[4] P. B. Eggermont, V. N. LaRiccia, Maximum Penalized Likelihood Estimation, Volume I, Springer, New York, 2001.
[5] C. Fraley, A. E. Raftery, How many clusters? Which clustering method? Answers via model-based cluster analysis, The Computer Journal 41 (1998) 578-588.
[6] S. Fruhwirth-Schnatter, Finite Mixture and Markov Switching Models, Springer, 2006.
[7] P. J. Green, On use of the EM algorithm for penalized likelihood estimation, J. Roy. Statist. Soc. Ser. B 52 (1990) 443-452.
[8] R. J. Hathaway, A constrained formulation of maximum-likelihood estimation for normal mixture distributions, Ann. Statist. 13 (1985) 795-800.
[9] S. Ingrassia, A likelihood-based constrained algorithm for multivariate normal mixture models, Statistical Methods & Applications 13 (2004) 151-166.
[10] J. Kiefer, J.
Wolfowitz, Consistency of the maximum likelihood estimator in the presence of infinitely many incidental parameters, Ann. Math. Statist. 27 (1956) 887-906.
[11] E. L. Lehmann, Theory of Point Estimation, John Wiley & Sons, 1983.
[12] S. Lin, S. Biswas, On modelling locus heterogeneity using mixture distributions, BMC Genetics 5 (2004) 29.
[13] B. G. Lindsay, Mixture Models: Theory, Geometry and Applications, Institute of Mathematical Statistics, Hayward, 1995.
[14] B. G. Lindsay, P. Basak, Multivariate normal mixtures: a fast consistent method of moments, J. Amer. Statist. Assoc. 88 (1993) 468-476.
[15] G. J. McLachlan, D. Peel, Finite Mixture Models, Wiley, New York, 2000.
[16] A. E. Raftery, N. Dean, Variable selection for model-based clustering, J. Amer. Statist. Assoc. 101 (2006) 168-178.
[17] S. Ray, B. G. Lindsay, The topography of multivariate normal mixtures, Ann. Statist. 33 (2005) 2042-2065.
[18] R. A. Redner, H. F. Walker, Mixture densities, maximum likelihood and the EM algorithm, SIAM Rev. 26 (1984) 195-239.
[19] N. Schork, D. Allison, B. Thiel, Mixture distributions in human genetics research, Stat. Methods Med. Res. 5 (1996) 155-178.
[20] M. Tadesse, N. Sha, M. Vannucci, Bayesian variable selection in clustering high-dimensional data, J. Amer. Statist. Assoc. 100 (2005) 602-617.
[21] X. Tan, J. Chen, R. Zhang, Consistency of the constrained maximum likelihood estimator in finite normal mixture models, 2007 Proceedings of the American Statistical Association [CD-ROM], American Statistical Association, Alexandria, VA (2007) 2113-2119.
[22] D. M. Titterington, A. F. M. Smith, U. E. Makov, Statistical Analysis of Finite Mixture Distributions, Wiley, Chichester, 1985.
[23] A. Wald, Note on the consistency of the maximum likelihood estimate, Ann. Math. Statist. 20 (1949) 595-601.
[24] C.-F.
Wu, On the convergence properties of the EM algorithm, Ann. Statist. 11 (1983) 95-103.

Appendix

The ordinary likelihood function is unbounded because, when the covariance matrix of a kernel density becomes close to singular, the likelihood contribution of the observations near its mean vector goes to infinity. Thus, a key step in our proof is to assess the number of such observations. In the univariate case, Chen et al. [2] obtained the following result.

Lemma 1 Assume that x_1, x_2, ..., x_n is a random sample from a finite normal mixture distribution with density f(x), x ∈ R. Let F_n be the empirical distribution function, and define M = max{sup_x f(x), 8} and δ_n(σ) = −Mσ log σ + n^{−1}. Except for a zero-probability event not depending on σ, we have, for all large enough n:

(a) for σ between exp(−2) and 8/(nM),

sup_μ [F_n(μ − σ log σ) − F_n(μ)] ≤ 4 δ_n(σ);

(b) for σ between 0 and 8/(nM),

sup_μ [F_n(μ − σ log σ) − F_n(μ)] ≤ 2 n^{−1} (log n)^2.

The consistency result for the multivariate normal mixture model is built on a generalized result. More specifically, the following lemma gives a bound for the multivariate normal mixture model.

Lemma 2 Let x_1, x_2, ..., x_n be a random sample from a d-dimensional multivariate normal mixture model with p components whose density function is given by

f(x; G_0) = Σ_{j=1}^p π_{j0} φ(x; μ_{j0}, Σ_{j0}).

Assume that all Σ_{j0} are positive definite.
For any mean and covariance matrix pair $(\mu, \Sigma)$ such that $|\Sigma| < \exp(-4d)$, except for a zero-probability event not depending on $(\mu, \Sigma)$, we have, for $n$ large enough, that
$$H_n(\mu, \Sigma) = \sum_{i=1}^{n} I\{(x_i - \mu)^\tau \Sigma^{-1}(x_i - \mu) \le (\log|\Sigma|)^2\} \le 4(\log n)^2\, I(|\Sigma| \le \alpha_n) + 8n\,\delta_n(|\Sigma|)\, I(\alpha_n \le |\Sigma|),$$
where $\alpha_n = (4/(Md))^{2d} n^{-2d}$, $\delta_n(|\Sigma|) = -M|\Sigma|^{1/(2d)} \log|\Sigma| + n^{-1}$, and $M = \max\{8, \lambda_0^{-1/2}\}$ with $\lambda_0$ being the smallest eigenvalue among those of $\Sigma_{j0}$, $j = 1, 2, \ldots, p$.

Proof of Lemma 2: Let $0 < \lambda_1 \le \lambda_2 \le \cdots \le \lambda_d$ and $a_1, \ldots, a_d$ be the eigenvalues and corresponding unit-length eigenvectors of $\Sigma$. We have that
$$\{x : (x - \mu)^\tau \Sigma^{-1}(x - \mu) \le (\log|\Sigma|)^2\} = \Big\{x : \sum_{j=1}^{d} \lambda_j^{-1} |a_j^\tau(x - \mu)|^2 \le (\log|\Sigma|)^2\Big\} \subseteq \{x : |a_j^\tau(x - \mu)| \le -\sqrt{\lambda_j}\,\log|\Sigma|,\ j = 1, \ldots, d\} \subseteq \{x : |a_1^\tau(x - \mu)| \le -\sqrt{\lambda_1}\,\log|\Sigma|\}.$$
(Note that $\log|\Sigma| < 0$, so $-\sqrt{\lambda_j}\,\log|\Sigma| > 0$.) Furthermore, let $Q = \{b_i : i = 1, 2, \ldots\}$ be a sequence of unit vectors such that $Q$ forms a dense subset of the unit vectors in $R^d$. Hence, for any given $a_1$ and any bounded subset $B \subseteq R^d$, we can find a vector $b$ in $Q$ arbitrarily close to $a_1$ so that
$$\{x \in B : |a_1^\tau(x - \mu)| \le -\sqrt{\lambda_1}\,\log|\Sigma|\} \subseteq \{x \in B : |b^\tau(x - \mu)| \le -\sqrt{2\lambda_1}\,\log|\Sigma|\}.$$
Based on this observation, we get
$$\sup_\mu H_n(\mu, \Sigma) = \sup_\mu \sum_{i=1}^{n} I\{(x_i - \mu)^\tau \Sigma^{-1}(x_i - \mu) \le (\log|\Sigma|)^2\} \le \sup_{b \in Q}\,\sup_\mu \sum_{i=1}^{n} I\{|b^\tau(x_i - \mu)| \le \sqrt{2\lambda_1}\,|\log|\Sigma||\}.$$
On the other hand, given any non-random unit vector $b$, $b^\tau x_i$, $i = 1, 2, \ldots, n$, is a random sample from the univariate normal mixture model with density
$$f_b(x) = \sum_{j=1}^{p} \pi_{j0}\,\phi(x; b^\tau \mu_{j0}, b^\tau \Sigma_{j0} b).$$
We remark that since some pairs $(b^\tau \mu_{j0}, b^\tau \Sigma_{j0} b)$ can be equal, this univariate mixture distribution can have fewer than $p$ components. This does not affect the following derivation.
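The projection step above can be checked numerically: for a fixed unit vector $b$, the projections $b^\tau x_i$ of draws from a $d$-variate normal mixture are draws from the univariate mixture $f_b$ with component parameters $(b^\tau \mu_{j0}, b^\tau \Sigma_{j0} b)$. A minimal sketch, using an illustrative two-component bivariate mixture (not one of the paper's simulation models):

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative two-component bivariate normal mixture (assumed parameters).
pis = np.array([0.3, 0.7])
mus = np.array([[0.0, -1.0], [0.0, 1.0]])
Sigmas = np.array([[[1.0, 0.3], [0.3, 1.0]],
                   [[2.0, 0.0], [0.0, 0.5]]])

# Draw a large sample from the mixture.
n = 200_000
comp = rng.choice(2, size=n, p=pis)
x = np.empty((n, 2))
for j in range(2):
    m = comp == j
    x[m] = rng.multivariate_normal(mus[j], Sigmas[j], size=m.sum())

# Project through a fixed unit vector b: y_i = b' x_i.
b = np.array([0.6, 0.8])
y = x @ b

# Mean and variance of the induced univariate mixture f_b.
mj = mus @ b                                # b' mu_j for each component
vj = np.einsum('i,kij,j->k', b, Sigmas, b)  # b' Sigma_j b for each component
m_theory = float(pis @ mj)
v_theory = float(pis @ (vj + mj ** 2)) - m_theory ** 2

print(y.mean(), m_theory)  # sample vs. mixture mean
print(y.var(), v_theory)   # sample vs. mixture variance
```

The sample mean and variance of the projected data match the moments of $f_b$, which is what allows the univariate Lemma 1 to be applied along each direction $b$.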
Recall that $\lambda_0$ is the smallest eigenvalue among those of $\Sigma_{j0}$, $j = 1, \ldots, p$. Then
$$\sup_{b \in Q}\,\sup_x f_b(x) \le \sup_{b \in Q} \max\{(b^\tau \Sigma_{j0} b)^{-1/2},\ j = 1, \ldots, p\} = \lambda_0^{-1/2}.$$
Applying Lemma 1 to the univariate data $b^\tau x_i$, $i = 1, \ldots, n$, except for a zero-probability event not depending on $\Sigma$, as $n \to \infty$ we have
$$\sup_\mu \sum_{i=1}^{n} I\{|b^\tau(x_i - \mu)| \le \sqrt{2\lambda_1}\,|\log|\Sigma||\} \le 4(\log n)^2\, I(|\Sigma| \le \alpha_n) + 8n\,\delta_n(|\Sigma|)\, I(\alpha_n \le |\Sigma|).$$
The conclusion of the lemma simply claims that the above inequality holds over all $b \in Q$ with only a zero-probability-event exception. The zero-probability claim remains true because $Q$ is countable.

Proof of Theorem 1: We give a proof for the case $p = 2$; the proof for the general case is similar. Let $\Gamma$ be the parameter space for $G$ and define
$$\Gamma_1 = \{G \in \Gamma : |\Sigma_1| \le |\Sigma_2| \le \varepsilon_0\},\qquad \Gamma_2 = \{G \in \Gamma : |\Sigma_1| \le \tau_0,\ |\Sigma_2| \ge \varepsilon_0\},\qquad \Gamma_3 = \Gamma - (\Gamma_1 \cup \Gamma_2),$$
where $\varepsilon_0 > \tau_0 > 0$ are two small positive constants to be specified soon. The first subspace represents the case where both components have nearly singular covariance matrices; hence the observations inside the small ellipse centered at the mean parameter make a large contribution to the log-likelihood function. Let $K_0 = E\{\log f(X; G_0)\}$. The constants $\varepsilon_0, \tau_0$ must satisfy the following four conditions:

1. $0 < \varepsilon_0 < \exp\{-4d\}$;
2. $-\log\varepsilon_0 - (\log\varepsilon_0)^2 \le 4(K_0 - 2)$;
3. $16M\varepsilon_0^{1/(2d)}(\log\varepsilon_0)^2 \le 1$;
4. $16Md\,\tau_0(\log\tau_0)^2 \le \frac{2}{5}\delta_0$, for some $\delta_0 > 0$ to be specified.

The existence of such $\varepsilon_0, \tau_0$ is obvious. We proceed with the proof in three steps.

Step 1. For any $G \in \Gamma_1$, we show that almost surely,
$$\sup_{\Gamma_1} pl_n(G) - pl_n(G_0) \to -\infty.$$
Define two index sets
$$A = \{i : (x_i - \mu_1)^\tau \Sigma_1^{-1}(x_i - \mu_1) \le (\log|\Sigma_1|)^2\},\qquad B = \{i : (x_i - \mu_2)^\tau \Sigma_2^{-1}(x_i - \mu_2) \le (\log|\Sigma_2|)^2\},$$
and for any index set $S \subseteq \{1, 2, \ldots$
, n\}$, denote $l_n(G; S) = \sum_{i \in S} \log f(x_i; G)$. We can write $l_n(G) = l_n(G; A) + l_n(G; A^c B) + l_n(G; A^c B^c)$, where $A^c$ and $B^c$ are the complements of $A$ and $B$ respectively. For any index set $S$, denote by $n(S)$ its cardinality. It is easy to see that
$$l_n(G; A) \le n(A)\log|\Sigma_1|^{-1/2},\qquad l_n(G; B) \le n(B)\log|\Sigma_2|^{-1/2}.$$
Applying Lemma 2 to $n(A)$ and $n(B)$, noting that $|\Sigma_1| \le \varepsilon_0$ for $G \in \Gamma_1$, and using condition C3 on the penalty function, we find that
$$l_n(G; A) + \tilde p_n(\Sigma_1) \le 16d\log n + 8M\varepsilon_0^{1/(2d)}(\log\varepsilon_0)^2\, n,$$
$$l_n(G; A^c B) + \tilde p_n(\Sigma_2) \le 16d\log n + 8M\varepsilon_0^{1/(2d)}(\log\varepsilon_0)^2\, n.$$
The key point underlying the above two inequalities is that their right-hand sides are bounded by an arbitrarily small fraction of $n$. Further, for observations away from $\mu_1$ and $\mu_2$, we have
$$l_n(G; A^c B^c) \le \sum_{i \in A^c B^c} \log\left[\pi_1\exp\{\log|\Sigma_1|^{-1/2} - \tfrac12(\log|\Sigma_1|)^2\} + \pi_2\exp\{\log|\Sigma_2|^{-1/2} - \tfrac12(\log|\Sigma_2|)^2\}\right] \le \sum_{i \in A^c B^c}\{-\tfrac12\log\varepsilon_0 - \tfrac12(\log\varepsilon_0)^2\} \le n(K_0 - 2).$$
The last inequality is obtained by choosing a small enough $\varepsilon_0$ as specified earlier. Combining these inequalities, we get $pl_n(G) \le n(K_0 - 1)$, and hence almost surely
$$\sup_{\Gamma_1} pl_n(G) - pl_n(G_0) \le -n + 16d\log n.$$
That is, $\sup_{\Gamma_1} pl_n(G) - pl_n(G_0) \to -\infty$ almost surely, which completes the first step.

Step 2. For $G \in \Gamma_2$, we also show that almost surely
$$\sup_{\Gamma_2} pl_n(G) - pl_n(G_0) \to -\infty.$$
Recall that for each $i \in A$, $(x_i - \mu_1)^\tau \Sigma_1^{-1}(x_i - \mu_1)$ is bounded by $(\log|\Sigma_1|)^2$. Hence, it is easy to verify that for $i \in A$,
$$\varphi(x_i; \mu_1, \Sigma_1) \le |\Sigma_1|^{-1/2}\exp\{-\tfrac14(x_i - \mu_1)^\tau \Sigma_1^{-1}(x_i - \mu_1)\},$$
and for $i \notin A$,
$$\varphi(x_i; \mu_1, \Sigma_1) \le \exp\{-\tfrac14(x_i - \mu_1)^\tau \Sigma_1^{-1}(x_i - \mu_1)\}.$$
Therefore, letting (not a density itself)
$$g(x; G) = \pi_1\exp\{-\tfrac14(x - \mu_1)^\tau \Sigma_1^{-1}(x - \mu_1)\} + \pi_2\,\varphi(x; \mu_2, \Sigma_2),$$
we have $\log f(x_i; G) \le \log g(x_i; G) + I(i \in A)\log|\Sigma_1|^{-1/2}$. Hence, we get
$$l_n(G) \le n(A)\log|\Sigma_1|^{-1/2} + \sum_{i=1}^{n}\log g(x_i; G).$$
It is obvious that for any $G \in \Gamma_2$: (a) $E_0\{\log[g(X; G)/f(X; G_0)]\} < 0$, by Jensen's inequality and the fact that the integral of $g(x; G)$ is less than 1; (b) $g(x; G) \le \varepsilon_0^{-1}$, by the definition of $\Gamma_2$. Hence, for each given $G \in \Gamma_2$, by the law of large numbers,
$$\frac1n\sum_{i=1}^{n}\log\{g(X_i; G)/f(X_i; G_0)\} \to E[\log\{g(X; G)/f(X; G_0)\}] < 0.$$
For each fixed $x$, we can extend the definition of $g(x; G)$ in $G$ onto the compactified $\Gamma_2$ while maintaining properties (a) and (b) and its continuity in $G$. Thus, a classical technique as in [23] can readily be employed to show that, as $n \to \infty$,
$$\sup_{G \in \Gamma_2}\left\{\frac1n\sum_{i=1}^{n}\log\frac{g(X_i; G)}{f(X_i; G_0)}\right\} \to -\delta(\tau_0) < 0 \tag{3}$$
for some decreasing function $\delta(\tau_0)$. Hence, it is possible to choose a small enough $\tau_0 \le \varepsilon_0$ such that
$$\sup_{\Gamma_2} pl_n(G) - pl_n(G_0) \le \sup_{\Gamma_2}\{n(A)\log|\Sigma_1|^{-1/2} + p_n(G)\} + \sup_{\Gamma_2}\sum_{i=1}^{n}\log\frac{g(X_i; G)}{f(X_i; G_0)} \le 8M\tau_0(\log\tau_0)^2\, n - \tfrac{9}{10}\delta(\varepsilon_0)\, n \le -\tfrac12\delta(\varepsilon_0)\, n.$$
The first term in the middle expression is handled by the assessment of $n(A)$ and condition C3 on $p_n(G)$. Note also that $p_n(G_0) = o(n)$. Therefore, almost surely,
$$\sup_{\Gamma_2} pl_n(G) - pl_n(G_0) \to -\infty.$$

Step 3. From the above two steps, we know that $\tilde G_n \in \Gamma_3$ with probability 1. At the same time, when $G \in \Gamma_3$, we have $p_n(G) = o(1)$. By the definition of the maximum penalized likelihood estimator, we have
$$l_n(\tilde G_n) - l_n(G_0) \ge p_n(G_0) - p_n(\tilde G_n) = o(1).$$
(4)

Since the parameter space $\Gamma_3$ is now completely regular, an estimator with property (4) is easily shown to be consistent by the classical technique of [23], even with a penalty of size $o(n)$. ∎

Proof of Theorem 3: When $p_0 < p < \infty$, we cannot expect every part of $G$ to converge to the corresponding part of $G_0$. Instead, we measure their difference as two distributions. Let
$$H(G, G_0) = \int_{R^d \times \mathcal{A}} |G(\lambda) - G_0(\lambda)|\exp\{-|\lambda|\}\,d\lambda,$$
where $\lambda = (\mu_1, \mu_2, \ldots, \mu_d, \sigma_{11}, \sigma_{12}, \sigma_{22}, \ldots, \sigma_{dd}) \in R^d \times \mathcal{A}$,
$$|\lambda| = \sum_{j=1}^{d}|\mu_j| + \sum_{i=1}^{d}\sum_{j=1}^{i}|\sigma_{ij}|,$$
and $\mathcal{A}$ is the subset of $R^{d(d+1)/2}$ containing all eligible combinations of $d(d+1)/2$ real numbers that form a symmetric positive definite matrix. It is well known that $\mathcal{A}$ is an open connected subset of $R^{d(d+1)/2}$ and is regular enough, although its shape may not be easy to visualize. It can be shown that $H(G_n, G_0) \to 0$ implies $G_n \to G_0$ in distribution. An estimator $\tilde G_n$ is strongly consistent if $H(\tilde G_n, G_0) \to 0$ almost surely.

Again, for the sake of clarity, we consider only the special case with $p = 2$, $p_0 = 1$; that is, data from a single (non-mixture) multivariate normal distribution are fitted with a two-component multivariate normal mixture model. The extension of our proof to general situations is straightforward; the major hurdle is merely a more complicated presentation. Most intermediate conclusions in the proof of consistency of the PMLE when $p = p_0 = 2$ are still applicable; some need minor changes. We use many of those results and notations to establish a brief proof.

For an arbitrarily small positive number $\delta$, define $\mathcal{H}(\delta) = \{G \in \Gamma : H(G, G_0) \ge \delta\}$. That is, $\mathcal{H}(\delta)$ contains all mixing distributions with up to $p$ components that are at least distance $\delta > 0$ from the true mixing distribution $G_0$. Since $G_0 \notin \mathcal{H}(\delta)$, we have $E[\log\{g(X; G)/f(X; G_0)\}] < 0$ for any $G \in \mathcal{H}(\delta)$.
Thus, (3) remains valid after being slightly revised as follows:
$$\sup_{G \in \mathcal{H}(\delta) \cap \Gamma_2} n^{-1}\sum_{i=1}^{n}\log\{g(X_i; G)/f(X_i; G_0)\} \to -\eta(\tau)$$
for some positive $\eta(\tau)$ depending on $\Gamma_2$. Because of this, the derivations in the proof of Theorem 1 still apply after $\Gamma_k$ is replaced by $\mathcal{H}(\delta) \cap \Gamma_k$ ($k = 1, 2$). That is, with a proper choice of $\varepsilon_0$ and $\tau_0$, we similarly get
$$\sup_{G \in \mathcal{H}(\delta) \cap \Gamma_k} pl_n(G) - pl_n(G_0) \to -\infty$$
for $k = 1, 2$. With what we have proved, it is seen that the penalized maximum likelihood estimator $\tilde G_n$ must almost surely belong to $\mathcal{H}^c(\delta) \cup \Gamma_3$, where $\mathcal{H}^c(\delta)$ is the complement of $\mathcal{H}(\delta)$. Since $\delta$ is arbitrarily small, $\tilde G_n \in \mathcal{H}^c(\delta)$ implies $H(\tilde G_n, G_0) \to 0$. On the other hand, $\tilde G_n \in \Gamma_3$ is equivalent to putting a positive lower bound on the component variances, which also implies $H(\tilde G_n, G_0) \to 0$ by [10]. That is, consistency of the PMLE also holds when $p = 2$ but $p_0 = 1$. A generalization of the above derivation leads to the conclusion of Theorem 3. ∎

Table 1. Number of Degeneracies

Mean.Var.Config                 1     2     3     4     5     6
2-component bivariate normal mixture
  near                          0    11    19     5    40     8
  moderate                   1911  3256   441    62   523   157
  distant                    4997  4998  4966  4782  4998  4943
3-component bivariate normal mixture
  straight                   3049  5058  4947  1998  2306  2491
  acute                      2888  4505  4812  4052  4057  4561
  obtuse                     3253  4980  4983  2885  3022  3511
2-component trivariate normal mixture
  near                          1  4872  5003  4866  4961  1466
  moderate                   4011  5000  5001  5000  5000  4900
  distant                    5000  5000  5000  5000  5000  5000
3-component trivariate normal mixture
  straight                   5009  5010  5002  5002  5000  5000
  acute                      5006  5034  5000  5002  5000  5000
  obtuse                     5009  5038  5002  5004  5000  5001

Table 2. Bias (std) under 2-component bivariate normal mixture models.

                      MLE            PMLE1          PMLE2
Model I.1.1, component 1
  π1 = 0.3       -0.03 (0.11)   -0.02 (0.11)   -0.01 (0.10)
  μ1 = 0         -0.16 (0.53)   -0.16 (0.53)   -0.13 (0.50)
  μ2 = -1         0.72 (1.17)    0.72 (1.17)    0.71 (1.14)
  σ11 = 1        -0.14 (0.41)   -0.14 (0.40)   -0.13 (0.37)
  σ12 = 0        -0.01 (0.39)    0.00 (0.38)    0.00 (0.34)
  σ22 = 1        -0.03 (0.71)   -0.03 (0.70)   -0.01 (0.64)
Model I.1.1, component 2
  π2 = 0.7        0.03 (0.11)    0.02 (0.11)    0.01 (0.10)
  μ1 = 0          0.04 (0.19)    0.04 (0.19)    0.04 (0.19)
  μ2 = 1         -0.39 (0.47)   -0.39 (0.47)   -0.37 (0.48)
  σ11 = 1        -0.07 (0.18)   -0.07 (0.18)   -0.07 (0.18)
  σ12 = 0         0.00 (0.19)    0.00 (0.19)    0.00 (0.19)
  σ22 = 1         0.33 (0.44)    0.33 (0.44)    0.30 (0.43)
Model I.2.4, component 1
  π1 = 0.3        0.00 (0.03)    0.00 (0.03)    0.00 (0.03)
  μ1 = 0         -0.02 (0.28)   -0.02 (0.28)   -0.02 (0.28)
  μ2 = -3        -0.01 (0.13)   -0.01 (0.13)   -0.01 (0.13)
  σ11 = 5        -0.04 (0.93)   -0.04 (0.93)   -0.04 (0.93)
  σ12 = 0         0.00 (0.30)    0.00 (0.30)    0.00 (0.30)
  σ22 = 1        -0.02 (0.19)   -0.02 (0.19)    0.00 (0.19)
Model I.2.4, component 2
  π2 = 0.7        0.00 (0.03)    0.00 (0.03)    0.00 (0.03)
  μ1 = 0          0.00 (0.09)    0.00 (0.09)    0.00 (0.09)
  μ2 = 3          0.00 (0.09)    0.00 (0.09)    0.00 (0.09)
  σ11 = 1        -0.01 (0.12)   -0.01 (0.12)   -0.01 (0.12)
  σ12 = 0         0.00 (0.08)    0.00 (0.08)    0.00 (0.08)
  σ22 = 1         0.00 (0.12)    0.00 (0.12)    0.00 (0.12)

Table 3. Bias (std) under 3-component bivariate normal mixture models.

                      MLE            PMLE1          PMLE2
Model II.1.1, component 1
  π1 = 0.15      -0.10 (0.06)   -0.08 (0.07)   -0.04 (0.07)
  μ1 = 0          0.69 (1.15)    0.58 (1.28)    0.25 (1.01)
  μ2 = -2         1.17 (2.48)    1.15 (2.32)    1.24 (1.94)
  σ11 = 1        -0.33 (0.91)   -0.46 (0.60)   -0.33 (0.52)
  σ12 = 0        -0.04 (0.54)   -0.02 (0.46)    0.02 (0.48)
  σ22 = 1        -0.22 (1.16)   -0.22 (1.01)    0.12 (1.01)
Model II.1.1, component 2
  π2 = 0.35      -0.02 (0.10)   -0.02 (0.10)   -0.03 (0.08)
  μ1 = 0         -0.10 (0.39)   -0.08 (0.38)   -0.06 (0.39)
  μ2 = 0          0.61 (1.54)    0.63 (1.53)    0.56 (1.44)
  σ11 = 1        -0.13 (0.29)   -0.13 (0.30)   -0.14 (0.31)
  σ12 = 0         0.02 (0.32)    0.01 (0.33)    0.02 (0.34)
  σ22 = 1         0.24 (0.70)    0.20 (0.71)    0.22 (0.69)
Model II.1.1, component 3
  π3 = 0.5        0.11 (0.11)    0.10 (0.12)    0.06 (0.10)
  μ1 = 0          0.02 (0.20)    0.01 (0.21)    0.01 (0.24)
  μ2 = 2         -1.23 (0.90)   -1.16 (0.89)   -1.02 (0.89)
  σ11 = 1        -0.08 (0.16)   -0.08 (0.17)   -0.10 (0.19)
  σ12 = 0         0.03 (0.26)    0.03 (0.27)    0.00 (0.28)
  σ22 = 1         0.86 (0.68)    0.81 (0.70)    0.65 (0.67)

Table 4. Bias (std) under 3-component bivariate normal mixture models.

                      MLE            PMLE1          PMLE2
Model II.2.4, component 1
  π1 = 0.15       0.00 (0.04)    0.01 (0.04)    0.01 (0.03)
  μ1 = 0          0.23 (0.86)    0.18 (0.74)    0.19 (0.72)
  μ2 = -2         0.12 (0.83)    0.11 (0.63)    0.11 (0.54)
  σ11 = 1         0.07 (0.69)    0.06 (0.60)    0.10 (0.59)
  σ12 = 0        -0.05 (0.54)   -0.03 (0.40)   -0.04 (0.38)
  σ22 = 1         0.17 (0.99)    0.18 (0.95)    0.20 (0.90)
Model II.2.4, component 2
  π2 = 0.35      -0.01 (0.05)   -0.01 (0.05)   -0.01 (0.05)
  μ1 = 3         -0.43 (1.12)   -0.40 (1.09)   -0.38 (1.08)
  μ2 = 0          0.15 (0.82)    0.14 (0.80)    0.13 (0.79)
  σ11 = 1         0.37 (1.12)    0.33 (1.05)    0.31 (1.03)
  σ12 = 0        -0.01 (0.35)   -0.02 (0.34)   -0.03 (0.37)
  σ22 = 5        -0.69 (1.60)   -0.65 (1.57)   -0.62 (1.55)
Model II.2.4, component 3
  π3 = 0.5        0.00 (0.05)    0.00 (0.05)    0.00 (0.05)
  μ1 = 0          0.33 (0.88)    0.31 (0.88)    0.30 (0.87)
  μ2 = 2         -0.19 (0.57)   -0.17 (0.53)   -0.16 (0.51)
  σ11 = 5        -0.38 (1.31)   -0.36 (1.31)   -0.36 (1.30)
  σ12 = 0         0.00 (0.28)   -0.01 (0.26)   -0.01 (0.27)
  σ22 = 1         0.37 (1.15)    0.34 (1.11)    0.33 (1.08)

Table 5. Bias (std) under 2-component trivariate normal mixture models.

                      MLE            PMLE1          PMLE2
Model III.1.1, component 1
  π1 = 0.3       -0.09 (0.15)   -0.08 (0.15)   -0.05 (0.14)
  μ1 = 0         -0.28 (0.61)   -0.26 (0.58)   -0.17 (0.51)
  μ2 = 0         -0.15 (0.58)   -0.14 (0.57)   -0.09 (0.52)
  μ3 = -1         0.52 (0.09)    0.54 (0.11)    0.61 (0.09)
  σ11 = 1        -0.12 (0.47)   -0.11 (0.46)   -0.11 (0.36)
  σ12 = 0        -0.01 (0.38)    0.00 (0.35)    0.02 (0.27)
  σ13 = 0        -0.10 (0.48)   -0.10 (0.47)   -0.07 (0.37)
  σ22 = 1        -0.09 (0.56)   -0.11 (0.47)   -0.13 (0.36)
  σ23 = 0        -0.04 (0.49)   -0.02 (0.47)   -0.01 (0.37)
  σ33 = 1         0.22 (0.91)    0.18 (0.83)    0.12 (0.66)
Model III.1.1, component 2
  π2 = 0.7        0.09 (0.15)    0.08 (0.15)    0.05 (0.14)
  μ1 = 0          0.01 (0.15)    0.01 (0.15)    0.01 (0.16)
  μ2 = 0          0.02 (0.15)    0.02 (0.15)    0.02 (0.17)
  μ3 = 1         -0.45 (0.41)   -0.44 (0.41)   -0.42 (0.44)
  σ11 = 1        -0.05 (0.13)   -0.05 (0.13)   -0.05 (0.14)
  σ12 = 0         0.00 (0.10)    0.00 (0.10)    0.00 (0.10)
  σ13 = 0        -0.02 (0.13)   -0.02 (0.13)   -0.02 (0.14)
  σ22 = 1         0.03 (0.13)   -0.03 (0.13)   -0.04 (0.14)
  σ23 = 0         0.01 (0.14)    0.01 (0.14)    0.01 (0.15)
  σ33 = 1         0.44 (0.38)    0.43 (0.38)    0.39 (0.39)

Table 6. Bias (std) under 2-component trivariate normal mixture models.

                      MLE            PMLE1          PMLE2
Model III.2.4, component 1
  π1 = 0.3        0.00 (0.04)    0.00 (0.04)    0.00 (0.04)
  μ1 = 0          0.01 (0.13)    0.01 (0.13)    0.01 (0.13)
  μ2 = 0          0.01 (0.22)    0.01 (0.22)    0.01 (0.22)
  μ3 = -3        -0.03 (0.52)   -0.03 (0.52)   -0.04 (0.52)
  σ11 = 1        -0.01 (0.17)   -0.01 (0.17)   -0.01 (0.17)
  σ12 = 0        -0.01 (0.20)   -0.01 (0.20)   -0.01 (0.19)
  σ13 = 0         0.03 (0.45)    0.03 (0.45)    0.03 (0.45)
  σ22 = 3        -0.05 (0.49)   -0.05 (0.49)   -0.04 (0.49)
  σ23 = 0         0.00 (0.75)    0.00 (0.75)    0.01 (0.75)
  σ33 = 10       -0.36 (2.10)   -0.36 (2.11)   -0.38 (2.09)
Model III.2.4, component 2
  π2 = 0.7        0.00 (0.04)    0.00 (0.04)    0.00 (0.04)
  μ1 = 0          0.00 (0.15)    0.00 (0.15)    0.00 (0.15)
  μ2 = 0         -0.01 (0.19)   -0.01 (0.19)   -0.01 (0.19)
  μ3 = 3         -0.01 (0.11)   -0.01 (0.11)   -0.01 (0.11)
  σ11 = 4.87     -0.03 (0.47)   -0.03 (0.48)   -0.03 (0.47)
  σ12 = -3.23     0.03 (0.49)    0.03 (0.49)    0.03 (0.48)
  σ13 = -0.5      0.01 (0.23)    0.01 (0.23)    0.01 (0.23)
  σ22 = 7.2      -0.07 (0.71)   -0.07 (0.72)   -0.07 (0.71)
  σ23 = 2.16     -0.02 (0.30)   -0.02 (0.30)   -0.02 (0.30)
  σ33 = 1.94     -0.01 (0.22)   -0.01 (0.22)    0.00 (0.22)

Table 7. Bias (std) under 2-component trivariate normal mixture models.

                      MLE            PMLE1          PMLE2
Model III.3.6, component 1
  π1 = 0.3        0.00 (0.03)    0.00 (0.03)    0.00 (0.03)
  μ1 = 0          0.00 (0.10)    0.00 (0.10)    0.00 (0.10)
  μ2 = 0          0.01 (0.19)    0.01 (0.19)    0.00 (0.19)
  μ3 = -5         0.01 (0.37)    0.01 (0.37)    0.01 (0.37)
  σ11 = 1        -0.01 (0.15)   -0.01 (0.15)   -0.01 (0.15)
  σ12 = 0         0.01 (0.18)    0.01 (0.18)    0.01 (0.18)
  σ13 = 0         0.02 (0.36)    0.02 (0.36)    0.02 (0.36)
  σ22 = 3        -0.05 (0.45)   -0.05 (0.45)   -0.04 (0.45)
  σ23 = 0        -0.02 (0.64)   -0.02 (0.64)   -0.02 (0.64)
  σ33 = 10       -0.06 (1.81)   -0.06 (1.81)   -0.06 (1.80)
Model III.3.6, component 2
  π2 = 0.7        0.00 (0.03)    0.00 (0.03)    0.00 (0.03)
  μ1 = 0          0.00 (0.15)    0.00 (0.15)    0.00 (0.15)
  μ2 = 0          0.00 (0.19)    0.00 (0.19)    0.00 (0.19)
  μ3 = 5          0.00 (0.10)    0.00 (0.10)    0.00 (0.10)
  σ11 = 4.87     -0.05 (0.46)   -0.05 (0.46)   -0.05 (0.46)
  σ12 = 3.23     -0.03 (0.46)   -0.03 (0.46)   -0.03 (0.46)
  σ13 = -0.5      0.00 (0.22)    0.00 (0.22)    0.00 (0.22)
  σ22 = 7.2      -0.02 (0.70)   -0.02 (0.70)   -0.03 (0.70)
  σ23 = -2.16    -0.01 (0.29)   -0.01 (0.29)   -0.01 (0.29)
  σ33 = 1.94     -0.01 (0.20)   -0.01 (0.20)    0.00 (0.20)

Table 8. Bias (std) under 3-component trivariate normal mixture models.

                      MLE            PMLE1          PMLE2
Model IV.1.1, component 1
  π1 = 0.15      -0.05 (0.07)   -0.06 (0.07)   -0.01 (0.07)
  μ1 = 0          0.10 (0.64)    0.28 (0.97)    0.12 (0.69)
  μ2 = 0         -0.08 (0.64)    0.11 (0.97)   -0.04 (0.65)
  μ3 = -2         3.07 (2.16)    2.65 (2.17)    2.16 (1.89)
  σ11 = 1        -0.05 (0.73)   -0.25 (0.63)   -0.19 (0.47)
  σ12 = 0         0.07 (0.50)    0.05 (0.40)    0.04 (0.35)
  σ13 = 0        -0.01 (0.58)    0.00 (0.51)    0.00 (0.48)
  σ22 = 1        -0.04 (0.74)   -0.23 (0.63)   -0.16 (0.47)
  σ23 = 0         0.03 (0.51)    0.03 (0.47)    0.04 (0.43)
  σ33 = 1        -0.01 (1.16)    0.01 (1.19)    0.31 (1.05)
Model IV.1.1, component 2
  π2 = 0.35      -0.05 (0.09)   -0.07 (0.11)   -0.05 (0.09)
  μ1 = 0         -0.05 (0.33)   -0.10 (0.45)   -0.02 (0.37)
  μ2 = 0          0.04 (0.33)   -0.02 (0.43)    0.01 (0.34)
  μ3 = 0          0.00 (1.47)    0.02 (1.52)    0.26 (1.42)
  σ11 = 1        -0.09 (0.26)   -0.12 (0.32)   -0.11 (0.29)
  σ12 = 0         0.02 (0.20)    0.01 (0.23)    0.02 (0.21)
  σ13 = 0        -0.05 (0.32)   -0.05 (0.41)   -0.03 (0.35)
  σ22 = 1        -0.09 (0.28)   -0.11 (0.30)   -0.11 (0.28)
  σ23 = 0         0.02 (0.33)   -0.01 (0.37)    0.01 (0.33)
  σ33 = 1         0.46 (0.83)    0.48 (0.93)    0.46 (0.84)
Model IV.1.1, component 3
  π3 = 0.5        0.10 (0.12)    0.13 (0.15)    0.06 (0.12)
  μ1 = 0          0.01 (0.19)    0.00 (0.18)    0.00 (0.21)
  μ2 = 0         -0.01 (0.18)   -0.01 (0.17)    0.00 (0.21)
  μ3 = 2         -0.96 (0.81)   -1.00 (0.79)   -0.97 (0.86)
  σ11 = 1        -0.07 (0.17)   -0.07 (0.17)   -0.08 (0.19)
  σ12 = 0         0.01 (0.12)    0.00 (0.11)    0.01 (0.13)
  σ13 = 0        -0.04 (0.22)   -0.04 (0.22)   -0.04 (0.24)
  σ22 = 1        -0.06 (0.16)   -0.06 (0.16)   -0.07 (0.18)
  σ23 = 0         0.04 (0.22)    0.03 (0.22)    0.03 (0.25)
  σ33 = 1         0.76 (0.72)    0.88 (0.77)    0.75 (0.76)

Table 9. Bias (std) under 3-component trivariate normal mixture models.

                      MLE            PMLE1          PMLE2
Model IV.2.4, component 1
  π1 = 0.15       0.00 (0.05)    0.00 (0.04)    0.01 (0.04)
  μ1 = 0          0.04 (0.43)    0.04 (0.37)    0.02 (0.29)
  μ2 = 0          0.20 (0.96)    0.20 (0.90)    0.24 (0.88)
  μ3 = -2         0.19 (0.86)    0.17 (0.80)    0.20 (0.80)
  σ11 = 1         0.05 (0.63)    0.02 (0.52)    0.01 (0.38)
  σ12 = 0        -0.03 (0.54)   -0.01 (0.41)   -0.01 (0.34)
  σ13 = 0         0.04 (0.79)    0.01 (0.58)    0.01 (0.35)
  σ22 = 1         0.18 (1.06)    0.13 (0.81)    0.18 (0.73)
  σ23 = 0        -0.15 (1.09)   -0.10 (0.65)   -0.09 (0.62)
  σ33 = 1         0.65 (2.52)    0.53 (2.17)    0.68 (2.31)
Model IV.2.4, component 2
  π2 = 0.35      -0.01 (0.06)   -0.01 (0.06)   -0.02 (0.06)
  μ1 = 0          0.01 (0.19)    0.01 (0.19)    0.01 (0.18)
  μ2 = 3         -0.51 (1.25)   -0.46 (1.21)   -0.34 (1.13)
  μ3 = 0          0.24 (0.94)    0.21 (0.91)    0.13 (0.86)
  σ11 = 1         0.56 (1.54)    0.50 (1.47)    0.35 (1.27)
  σ12 = 0        -0.49 (1.32)   -0.44 (1.26)   -0.32 (1.10)
  σ13 = 0         0.09 (0.42)    0.08 (0.42)    0.05 (0.41)
  σ22 = 3         0.48 (1.78)    0.41 (1.71)    0.20 (1.53)
  σ23 = 0        -0.33 (0.98)   -0.30 (0.96)   -0.25 (0.88)
  σ33 = 10       -1.40 (3.55)   -1.26 (3.45)   -1.03 (3.31)
Model IV.2.4, component 3
  π3 = 0.5        0.01 (0.05)    0.01 (0.05)    0.00 (0.05)
  μ1 = 0         -0.02 (0.18)   -0.02 (0.18)   -0.01 (0.19)
  μ2 = 0          0.37 (0.87)    0.34 (0.86)    0.27 (0.79)
  μ3 = 2         -0.28 (0.72)   -0.25 (0.68)   -0.17 (0.58)
  σ11 = 4.87     -0.57 (1.42)   -0.51 (1.36)   -0.39 (1.22)
  σ12 = -3.23     0.45 (1.24)    0.41 (1.20)    0.30 (1.07)
  σ13 = 0.5      -0.07 (0.33)   -0.06 (0.33)   -0.04 (0.32)
  σ22 = 7.2      -0.46 (1.48)   -0.42 (1.46)   -0.33 (1.38)
  σ23 = -2.16     0.31 (0.95)    0.27 (0.89)    0.18 (0.77)
  σ33 = 1.94      0.88 (2.23)    0.79 (2.16)    0.58 (1.90)

Table 10. Bias (std) under 3-component trivariate normal mixture models.

                      MLE            PMLE1          PMLE2
Model IV.3.6, component 1
  π1 = 0.15       0.00 (0.05)    0.00 (0.05)    0.00 (0.05)
  μ1 = 0          0.05 (0.41)    0.05 (0.41)    0.05 (0.40)
  μ2 = 0         -0.01 (0.64)   -0.01 (0.64)   -0.01 (0.61)
  μ3 = -2        -0.21 (1.23)   -0.21 (1.23)   -0.23 (1.20)
  σ11 = 1         0.28 (1.24)    0.28 (1.24)    0.24 (1.12)
  σ12 = 0        -0.19 (1.16)   -0.19 (1.16)   -0.15 (1.05)
  σ13 = 0         0.14 (1.04)    0.14 (1.03)    0.13 (0.99)
  σ22 = 3         0.21 (1.48)    0.21 (1.48)    0.18 (1.40)
  σ23 = 0        -0.42 (1.54)   -0.42 (1.54)   -0.39 (1.50)
  σ33 = 10       -1.37 (3.73)   -1.37 (3.73)   -1.34 (3.64)
Model IV.3.6, component 2
  π2 = 0.35      -0.01 (0.06)   -0.01 (0.06)   -0.01 (0.06)
  μ1 = 0         -0.01 (0.33)   -0.01 (0.33)    0.00 (0.32)
  μ2 = 3         -0.20 (0.61)   -0.20 (0.61)   -0.19 (0.60)
  μ3 = 0          0.25 (0.96)    0.25 (0.96)    0.26 (0.94)
  σ11 = 4.87     -0.15 (1.18)   -0.15 (1.18)   -0.13 (1.14)
  σ12 = -3.2      1.23 (2.89)    1.23 (2.89)    1.20 (2.87)
  σ13 = 0.5      -0.16 (0.62)   -0.16 (0.62)   -0.15 (0.62)
  σ22 = 7.2      -0.24 (1.56)   -0.24 (1.56)   -0.21 (1.52)
  σ23 = -2.16     0.21 (0.77)    0.21 (0.77)    0.19 (0.73)
  σ33 = 1.94      0.21 (1.61)    0.21 (1.61)    0.18 (1.52)
Model IV.3.6, component 3
  π3 = 0.5        0.02 (0.07)    0.02 (0.07)    0.02 (0.07)
  μ1 = 0         -0.02 (0.22)   -0.02 (0.22)   -0.02 (0.22)
  μ2 = 0          0.16 (0.43)    0.17 (0.43)    0.16 (0.43)
  μ3 = 2         -0.33 (0.68)   -0.33 (0.68)   -0.32 (0.68)
  σ11 = 4.87     -0.18 (0.66)   -0.18 (0.66)   -0.17 (0.65)
  σ12 = 3.23     -1.06 (2.14)   -1.06 (2.15)   -1.04 (2.15)
  σ13 = -0.5      0.17 (0.47)    0.17 (0.47)    0.16 (0.47)
  σ22 = 7.2      -0.21 (0.97)   -0.21 (0.98)   -0.20 (0.98)
  σ23 = -2.16     0.03 (0.45)    0.03 (0.45)    0.03 (0.46)
  σ33 = 1.94      0.03 (0.39)    0.03 (0.38)    0.03 (0.38)
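The simulations above compare the plain MLE with penalized estimators computed via EM. As a rough illustration of how a covariance penalty stabilizes EM, the sketch below uses the penalty $p_n(G) = -a_n \sum_j \{\mathrm{tr}(S_x \Sigma_j^{-1}) + \log|\Sigma_j|\}$, with $S_x$ the sample covariance matrix, which keeps the covariance M-step in closed form; this penalty form, the rate $a_n = n^{-1/2}$, and all numerical settings are assumptions for illustration, not necessarily the exact choices behind PMLE1 and PMLE2.

```python
import numpy as np

def mvn_pdf(x, mu, Sigma):
    """Multivariate normal density evaluated at the rows of x."""
    d = x.shape[1]
    r = x - mu
    q = np.einsum('ij,ij->i', r @ np.linalg.inv(Sigma), r)
    return np.exp(-0.5 * q) / np.sqrt((2 * np.pi) ** d * np.linalg.det(Sigma))

def penalized_em(x, p=2, a_n=None, n_iter=50, seed=1):
    """EM for a p-component multivariate normal mixture with penalty
    -a_n * sum_j [tr(S_x inv(Sigma_j)) + log det Sigma_j], which keeps
    each Sigma_j away from singularity and the objective bounded."""
    rng = np.random.default_rng(seed)
    n, d = x.shape
    if a_n is None:
        a_n = n ** -0.5            # illustrative rate; an assumption here
    S = np.cov(x, rowvar=False)    # sample covariance S_x
    pis = np.full(p, 1.0 / p)
    mus = x[rng.choice(n, p, replace=False)].copy()
    Sigmas = np.stack([S.copy() for _ in range(p)])
    trace = []
    for _ in range(n_iter):
        # E-step: responsibilities w_ij.
        dens = np.stack([mvn_pdf(x, mus[j], Sigmas[j]) for j in range(p)], axis=1)
        num = pis * dens
        w = num / num.sum(axis=1, keepdims=True)
        # Record the penalized log-likelihood; EM makes it non-decreasing.
        pen = -a_n * sum(np.trace(np.linalg.solve(Sigmas[j], S))
                         + np.log(np.linalg.det(Sigmas[j])) for j in range(p))
        trace.append(np.log(num.sum(axis=1)).sum() + pen)
        # M-step: the penalty shrinks each Sigma_j toward S_x.
        nj = w.sum(axis=0)
        pis = nj / n
        for j in range(p):
            mus[j] = (w[:, j] @ x) / nj[j]
            r = x - mus[j]
            Sw = (w[:, j, None] * r).T @ r
            Sigmas[j] = (2 * a_n * S + Sw) / (2 * a_n + nj[j])
    return pis, mus, Sigmas, trace

# Usage sketch on synthetic two-cluster data.
data_rng = np.random.default_rng(7)
data = np.vstack([data_rng.normal(0.0, 1.0, size=(150, 2)),
                  data_rng.normal(3.0, 1.0, size=(150, 2))])
pis_hat, mus_hat, Sigmas_hat, trace = penalized_em(data)
```

The covariance update $\Sigma_j = (2a_n S_x + \sum_i w_{ij}(x_i - \mu_j)(x_i - \mu_j)^\tau)/(2a_n + \sum_i w_{ij})$ can never become singular, which is how a penalty of this kind prevents the degeneracies counted in Table 1.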