Statistical Speech Model Description with VMF Mixture Model

Zhanyu Ma and Arne Leijon

Abstract: Efficient quantization of the linear predictive coding (LPC) parameters plays a key role in parametric speech coding. The line spectral frequency (LSF) representation of the LPC parameters is widely used in speech model quantization. In practical implementations of vector quantization (VQ), probability density function (PDF)-optimized VQ has been shown to be more efficient than VQ based on training data. In this paper, we represent the LSF parameters in a unit-vector form, which has directional characteristics. The underlying distribution of this unit-vector variable is modeled by a von Mises-Fisher mixture model (VMM). Using high rate theory, we propose the optimal inter-component bit allocation strategy and derive the distortion-rate (D-R) relation for the VMM-based VQ (VVQ). Experimental results show that the VVQ outperforms our recently introduced DVQ and the conventional GVQ.

Index Terms: Speech coding, line spectral frequencies, vector quantization, von Mises-Fisher distribution, mixture modeling

I. INTRODUCTION

Quantization of the linear predictive coding (LPC) model is ubiquitously applied in speech coding [1]–[4]. The line spectral frequency (LSF) [5] representation of the LPC model is the one commonly used in quantization [1], [6] because of its relatively uniform spectral sensitivity [7]. Efficient quantization methods for the LSF parameters have been studied intensively in the literature (see, e.g., [6], [8]–[10]). Among these methods, the probability density function (PDF)-optimized vector quantization (VQ) scheme has been shown to be superior to those based on training data [8], [9]. In PDF-optimized VQ, the underlying distribution of the LSF parameters is described by a statistical parametric model, e.g., a Gaussian mixture model (GMM) [8].
Once this model is obtained, the codebook can be either trained on a sufficient amount of data (theoretically infinitely large) generated from the obtained model or calculated theoretically. Thus PDF-optimized VQ can prevent the codebook from overfitting to the training data, and hence the performance of VQ can be significantly improved [8], [9].

(Zhanyu Ma is with the Pattern Recognition and Intelligent System Lab., Beijing University of Posts and Telecommunications, Beijing 100876, China. Arne Leijon is with the School of Electrical Engineering, KTH - Royal Institute of Technology, SE-100 44 Stockholm, Sweden.)

Statistical modeling plays an important role in PDF-optimized VQ; hence, several studies in the literature have sought an effective model to explicitly capture the statistical properties of the LSF parameters or their corresponding transformations. A frequently used method is the GMM-based VQ (GVQ), which models the distribution of the LSF parameters with a GMM [8], [9]. By recognizing the bounded property (all the LSF parameters lie in the interval $(0, \pi)$), Lindblom and Samuelsson [11] proposed a bounded GVQ scheme by truncating and renormalizing the standard Gaussian distribution. In previous work, the LSF parameters were linearly scaled into the interval $(0, 1)$ and a beta mixture model (BMM)-based VQ scheme was introduced, which took into account the bounded-support nature of the LSF parameters. As the LSF parameters are also strictly ordered, a Dirichlet mixture model (DMM)-based VQ (DVQ) scheme was recently presented to explicitly utilize both the boundedness and the ordering properties [10]. In the DVQ scheme, the LSF parameters were transformed linearly to the ΔLSF parameters.
Modeling the underlying distribution of the ΔLSF parameters with a DMM yields a better distortion-rate (D-R) relation than those obtained by modeling the LSF parameters with a GMM [4], [9], [12] or a BMM. Hence, the practical quantization performance was also improved significantly [10].

Previous studies suggest that transforming the LSF parameters into some other form and applying a suitable statistical model to efficiently describe the distribution can potentially benefit practical quantization [10]. In this letter, we study the high rate D-R performance of the LSF parameters by using the recently proposed square-root ΔLSF (SRΔLSF) representation. This representation is obtained by taking the positive square root of the ΔLSF parameters. By concatenating a redundant element to the end of the SRΔLSF parameter vector, a unit vector that contains only positive elements is obtained. Geometrically, this unit vector has directional characteristics and is distributed on the hypersphere centered at the origin. For such a unit vector, the von Mises-Fisher (vMF) distribution is an ideal and widely used statistical model to describe the underlying distribution [13]. One application domain of the vMF distribution is information retrieval, where the cosine similarity is an effective measure of similarity for analyzing text documents [14]. Other application domains are bioinformatics (e.g., [14]) and collaborative filtering (e.g., [15]), in which the Pearson correlation coefficient serves as the similarity measure. More recently, Taghia et al. proposed a text-independent speaker identification system based on modeling the underlying distribution of the SRΔLSF parameters by a mixture of vMF distributions. Here, we model the underlying distribution of the SRΔLSF parameters by a VMM and propose a VMM-based VQ (VVQ) scheme.
According to high rate quantization theory [16], the D-R relation can be derived analytically for a single vMF distribution with constrained entropy. Based on the high rate theory, the optimal inter-component bit allocation strategy is proposed. Finally, the D-R performance for the overall VVQ is derived. Compared with the recently presented DVQ and the conventionally used GVQ, the VVQ shows a convincing improvement. Hence, it potentially permits better practical quantization performance.

The remaining parts are organized as follows. In section II, different representations of the LSF parameters are introduced. We briefly review the vMF distribution and the corresponding parameter estimation methods in section III. A PDF-optimized VQ based on VMM is proposed in section IV and the experimental results are shown in section V. Finally, we draw some conclusions and discuss future work in sections VI and VII.

II. LSF, ΔLSF, AND SRΔLSF

A. Representations

The LSF parameters are widely used in speech coding due to their advantages over some other forms of representation (such as LARs and ASRCs). The LSF parameters with dimensionality $K$ are defined as

$$\mathbf{s} = [s_1, s_2, \ldots, s_K]^T, \tag{1}$$

which are interleaved on the unit circle [5].

By recognizing that the LSF parameters are in the interval $(0, \pi)$ and are strictly ordered, we proposed a particular representation of the LSF parameters, called ΔLSF, for the purpose of LSF quantization [10]. The ΔLSF parameters $\mathbf{v}$ are represented as

$$\mathbf{v} = \varphi(\mathbf{s}) = [s_1, s_2 - s_1, \ldots, s_K - s_{K-1}]^T. \tag{2}$$

Another representation of the LSF parameters was introduced, which takes the square root of the ΔLSF parameters. Hence, the $K$-dimensional SRΔLSF parameters $\mathbf{x}$ can be obtained as

$$\mathbf{x} = \phi(\mathbf{v}) = \mathbf{v}^{\frac{1}{2}} = [\sqrt{v_1}, \sqrt{v_2}, \ldots, \sqrt{v_K}]^T. \tag{3}$$

In previous work, we modeled the underlying distribution of the SRΔLSF by a $(K+1)$-variate VMM and proposed a text-independent speaker identification system based on the SRΔLSF representation, which showed competitive performance compared to the benchmark approach.

B. Distortion Transformation

When the SRΔLSF parameters are obtained from the ΔLSF parameters, the parameter space is warped. Hence, we study the distortion transformation between the SRΔLSF and the ΔLSF spaces in this section. Denote the PDFs of $\mathbf{v}$ and $\mathbf{x}$ as $g(\mathbf{v})$ and $f(\mathbf{x})$, respectively. Assuming that the $K$-dimensional SRΔLSF space is divided into $J$ cells and that the optimal lattice quantizer is used, the overall quantization distortion (using the squared error as the criterion) for $\mathbf{x}$ can be written as [16]

$$D_{\mathbf{x}} = \sum_{j=1}^{J} \int_{V_{j,\mathbf{x}}} \|\mathbf{x} - \mathbf{x}_j\|^2 f(\mathbf{x})\, d\mathbf{x} \approx \frac{1}{V_{\mathbf{x}}} \int_{V_{\mathbf{x}}} \mathbf{e}^T \mathbf{e}\, d\mathbf{e}, \tag{4}$$

where $\mathbf{e} = \mathbf{x} - \hat{\mathbf{x}}$ denotes the quantization error and all the cells $V_{j,\mathbf{x}}$ are of identical shape according to Gersho's conjecture [16]. The mapping $\phi$ from the ΔLSF space to the SRΔLSF space changes the distortion per cell in the ΔLSF domain at $\mathbf{v}$ to $\frac{1}{V_{\mathbf{x}}} \int_{V_{\mathbf{x}}} \mathbf{e}^T J_\phi(\mathbf{v})^T J_\phi(\mathbf{v})\, \mathbf{e}\, d\mathbf{e}$, where $J_\phi(\mathbf{v})$ is the Jacobian matrix

$$[J_\phi(\mathbf{v})]_{i,j} = \begin{cases} \left.\dfrac{\partial \phi^{-1}(\mathbf{x})_i}{\partial x_j}\right|_{\mathbf{x} = \phi(\mathbf{v})} = 2\sqrt{v_i}, & i = j, \\ 0, & i \neq j. \end{cases} \tag{5}$$

Then the overall quantization distortion transformation between $\mathbf{x}$ and $\mathbf{v}$ can be written as

$$\begin{aligned}
D_{\mathbf{v}} &= \int_{V_{\mathbf{v}}} g(\mathbf{v})\, \frac{1}{V_{\mathbf{x}}} \int_{V_{\mathbf{x}}} \mathbf{e}^T J_\phi(\mathbf{v})^T J_\phi(\mathbf{v})\, \mathbf{e}\, d\mathbf{e}\, d\mathbf{v} \\
&= \int_{V_{\mathbf{v}}} g(\mathbf{v})\, \mathrm{tr}\!\left[ \frac{1}{V_{\mathbf{x}}} \int_{V_{\mathbf{x}}} \mathbf{e}\mathbf{e}^T d\mathbf{e}\; J_\phi(\mathbf{v})^T J_\phi(\mathbf{v}) \right] d\mathbf{v} \\
&= \int_{V_{\mathbf{v}}} g(\mathbf{v})\, \mathrm{tr}\!\left[ \frac{1}{K} D_{\mathbf{x}} \cdot \mathbf{I} \cdot J_\phi(\mathbf{v})^T J_\phi(\mathbf{v}) \right] d\mathbf{v} \\
&= \frac{1}{K} D_{\mathbf{x}} \cdot \int_{V_{\mathbf{v}}} g(\mathbf{v})\, \mathrm{tr}\!\left[ J_\phi(\mathbf{v})^T J_\phi(\mathbf{v}) \right] d\mathbf{v} \\
&= \frac{4}{K} D_{\mathbf{x}} \cdot \int_{V_{\mathbf{v}}} g(\mathbf{v}) \sum_{k=1}^{K} v_k\, d\mathbf{v} \\
&= \frac{4}{K} D_{\mathbf{x}} \cdot \sum_{k=1}^{K} \int_{V_{v_k}} \tilde{g}(v_k)\, v_k\, dv_k,
\end{aligned} \tag{6}$$

where $\mathbf{I}$ is the identity matrix, the quantization noise $\mathbf{e}$ is white in the optimal lattices, $\tilde{g}(v_k)$ is the marginal distribution of $v_k$, and we assumed that the quantization noise $\mathbf{e}$ is independent of $\mathbf{x}$ (and, therefore, independent of $\mathbf{v}$ as well) [16]. According to the neutrality [10] of the Dirichlet variable $\mathbf{v}$, the marginal distribution $\tilde{g}(v_k)$ is beta distributed. Therefore, the mean value of $v_k$ with respect to its marginal distribution can be calculated explicitly. In our previous work, the measurement transformation between the LSF space and the ΔLSF space was presented. Therefore, with these transformation methods, we can fairly compare the high rate D-R performance in all three spaces with consistent measurements.

III. STATISTICAL MODEL FOR SRΔLSF PARAMETERS

The vMF distribution and its corresponding VMM are widely used in modeling the underlying distribution of unit vectors [14]. Therefore, we apply the VMM as the statistical model for the SRΔLSF parameters.

A. Von Mises-Fisher Mixture Model

Let $\mathbf{x} = [x_1, x_2, \ldots, x_K]^T$ denote a $K$-dimensional vector satisfying $\sum_{k=1}^{K} x_k^2 < 1$. Then the $(K+1)$-dimensional unit random vector $\left[\mathbf{x}^T, \sqrt{1 - \sum_{k=1}^{K} x_k^2}\,\right]^T$ on the $K$-dimensional unit hypersphere $S^K$ is said to have a $(K+1)$-variate vMF distribution if its PDF is given by

$$F(\mathbf{x} \mid \boldsymbol{\mu}, \lambda) = c_{K+1}(\lambda)\, e^{\lambda \boldsymbol{\mu}^T \mathbf{x}}, \tag{7}$$

where $\|\boldsymbol{\mu}\| = 1$, $\lambda \geq 0$, and $K \geq 2$ [13].
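As a rough illustration of the representations in Section II and the unit-vector construction above, the chain LSF → ΔLSF → SRΔLSF → unit vector can be sketched as follows. This is a minimal sketch rather than the authors' implementation: the function name `lsf_to_unit_vector` is hypothetical, and it assumes the LSFs are first scaled by $1/\pi$ so that the ΔLSF elements sum to less than one, as the condition $\sum_k x_k^2 < 1$ requires.

```python
import math


def lsf_to_unit_vector(lsf):
    """Map ordered LSFs in (0, pi) to a (K+1)-dimensional unit vector."""
    # Assumption: scale by 1/pi so the Delta-LSF entries sum to less than 1.
    s = [w / math.pi for w in lsf]
    if any(b <= a for a, b in zip([0.0] + s, s)) or s[-1] >= 1.0:
        raise ValueError("LSFs must be strictly increasing inside (0, pi)")
    # Delta-LSF: first element, then successive differences, as in (2).
    v = [s[0]] + [b - a for a, b in zip(s, s[1:])]
    # SR-Delta-LSF: element-wise positive square root, as in (3).
    x = [math.sqrt(vk) for vk in v]
    # Redundant last element that makes the Euclidean norm equal to one.
    x.append(math.sqrt(1.0 - sum(v)))
    return x


# Hypothetical 5-dimensional LSF vector, in radians.
unit = lsf_to_unit_vector([0.3, 0.7, 1.2, 2.0, 2.9])
print(len(unit), sum(u * u for u in unit))
```

The squared elements of the returned vector sum to one up to floating-point rounding, so the vector lies on the unit hypersphere and contains only positive elements.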
The normalizing constant $c_{K+1}(\lambda)$ is given by

$$c_{K+1}(\lambda) = \frac{\lambda^{\frac{K-1}{2}}}{(2\pi)^{\frac{K+1}{2}}\, I_{\frac{K-1}{2}}(\lambda)}, \tag{8}$$

where $I_\nu(\cdot)$ represents the modified Bessel function of the first kind of order $\nu$ [17]. The density function $F(\mathbf{x} \mid \boldsymbol{\mu}, \lambda)$ is characterized by the mean direction $\boldsymbol{\mu}$ and the concentration parameter $\lambda$.

With $I$ mixture components, the likelihood function of the VMM with i.i.d. observations $\mathbf{X} = [\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_N]$ is

$$f(\mathbf{X} \mid \mathbf{M}, \boldsymbol{\lambda}, \boldsymbol{\pi}) = \prod_{n=1}^{N} \sum_{i=1}^{I} \pi_i F(\mathbf{x}_n \mid \boldsymbol{\mu}_i, \lambda_i), \tag{9}$$

where $\boldsymbol{\pi} = [\pi_1, \pi_2, \ldots, \pi_I]^T$ ($\pi_i > 0$, $\sum_{i=1}^{I} \pi_i = 1$) are the weights, $\mathbf{M} = [\boldsymbol{\mu}_1, \boldsymbol{\mu}_2, \ldots, \boldsymbol{\mu}_I]$ are the mean directions, and $\boldsymbol{\lambda} = [\lambda_1, \lambda_2, \ldots, \lambda_I]^T$ are the concentration parameters.

B. Parameter Estimation

Let $\mathbf{Z} = \{z_1, z_2, \ldots, z_N\}$ be the corresponding set of hidden random variables, where $z_n = i$ means $\mathbf{x}_n$ is sampled from the $i$th vMF component. Given $\mathbf{X}$, $\mathbf{Z}$, and the model parameters $(\mathbf{M}, \boldsymbol{\lambda}, \boldsymbol{\pi})$, the complete log-likelihood of $\mathbf{X}$ is

$$\ln p(\mathbf{X}, \mathbf{Z} \mid \mathbf{M}, \boldsymbol{\lambda}, \boldsymbol{\pi}) = \sum_{n=1}^{N} \ln \left[ \pi_{z_n} F(\mathbf{x}_n \mid \boldsymbol{\mu}_{z_n}, \lambda_{z_n}) \right]. \tag{10}$$

As obtaining the maximum-likelihood (ML) estimates from the complete log-likelihood is not tractable [13], an efficient expectation-maximization (EM) approach was developed, which provides the ML estimates of the model parameters [14], [18]. The E-step and the M-step are summarized as:

• E-step

$$p(i \mid \mathbf{x}_n) = \frac{\pi_i F(\mathbf{x}_n \mid \boldsymbol{\mu}_i, \lambda_i)}{\sum_{j=1}^{I} \pi_j F(\mathbf{x}_n \mid \boldsymbol{\mu}_j, \lambda_j)} \tag{11}$$

• M-step

$$\hat{\pi}_i = \frac{1}{N} \sum_{n=1}^{N} p(i \mid \mathbf{x}_n), \quad \hat{\boldsymbol{\mu}}_i = \frac{\sum_{n=1}^{N} \mathbf{x}_n\, p(i \mid \mathbf{x}_n)}{\left\| \sum_{n=1}^{N} \mathbf{x}_n\, p(i \mid \mathbf{x}_n) \right\|}, \tag{12}$$

$$\bar{r}_i = \frac{\left\| \sum_{n=1}^{N} \mathbf{x}_n\, p(i \mid \mathbf{x}_n) \right\|}{\sum_{n=1}^{N} p(i \mid \mathbf{x}_n)}, \quad \hat{\lambda}_i = \frac{\bar{r}_i K - \bar{r}_i^3}{1 - \bar{r}_i^2}. \tag{13}$$

IV. PDF-OPTIMIZED VECTOR QUANTIZATION

In designing practical quantizers, one challenging problem is that when the amount of training data is not sufficiently large, the obtained codebook may be overfitted to the training set and perform worse on the whole real data set. The PDF-optimized VQ can overcome this problem either by generating a sufficiently large amount of training data from the obtained PDF or by calculating the optimal codebook explicitly with the obtained PDF [8], [9]. Thus, with the trained VMM, we can design a PDF-optimized VQ.

A. Distortion-Rate Relation with Constrained Entropy

With the high rate assumption, the analysis of the quantization performance is analytically tractable [16]. Since coding at a finite rate is the motivation for using quantizers, a constraint must be imposed on the VQ design. Generally speaking, there are two commonly used cases, namely the constrained resolution (CR) and the constrained entropy (CE) cases. In the CR case, the number of index levels is fixed. It is widely applied in communication systems. The CE case, on the other hand, imposes the constraint on the average bit rate. It is less restrictive than the CR case and yields lower average bit rates. As the computational capabilities of hardware increase, it becomes more attractive to exploit the advantages inherent in the CE case [16].

Assuming that the PDF of the variable $\mathbf{x}$ is $f(\mathbf{x})$, the D-R relation in the CE case, on a per-dimension basis, is

$$D(R) = C(r, K) \cdot e^{-\frac{r}{K}\left(R - h(\mathbf{x})\right)}, \tag{14}$$

where $h(\mathbf{x})$ is the differential entropy of $\mathbf{x}$, $R$ is the average rate for quantization, and $C(r, K)$ is a constant that depends on the distortion type $r$ (e.g., $r = 2$ means the Euclidean distortion) and the variable's dimension (degrees of freedom) $K$.

B. Optimal Inter-component Bit Allocation

When applying a mixture-model-based quantizer, we model the PDF as a weighted sum of mixture components and design a quantizer for each component. The total rate $R$ is divided into two parts, one for identifying the indices of the mixture components and the other for quantizing the mixture components. Given $I$ mixture components, the rate spent on identifying the indices is $R_a = \ln I$. The remaining rate $R_q = R - R_a$ is used for quantizing the mixture components. Therefore, an optimal inter-component bit allocation strategy is required so that the designed quantizer achieves the smallest mean distortion at a given $R_q$.

In the CE case, the objective is to minimize the mean distortion

$$D(R) = \sum_{i=1}^{I} \pi_i D_i(R_i), \tag{15}$$

where $R_i$ is the rate assigned to component $i$ and satisfies $R_q = \sum_{i=1}^{I} \pi_i R_i$. To reach the optimal mean distortion, each component should have its best CE performance. This indicates that the distortion for each mixture component is

$$D_i(R_i) = C(r, K) \cdot e^{-\frac{r}{K}\left(R_i - h_i(\mathbf{x})\right)}. \tag{16}$$

The differential entropy for component $i$ in a VMM is

$$\begin{aligned}
h_i(\mathbf{x}) &= -\int \left[ \ln c_{K+1}(\lambda_i) + \lambda_i \boldsymbol{\mu}_i^T \mathbf{x} \right] \cdot c_{K+1}(\lambda_i)\, e^{\lambda_i \boldsymbol{\mu}_i^T \mathbf{x}}\, d\mathbf{x} \\
&= -\ln c_{K+1}(\lambda_i) - \lambda_i \boldsymbol{\mu}_i^T \boldsymbol{\mu}_i \\
&= -\ln c_{K+1}(\lambda_i) - \lambda_i,
\end{aligned}$$

where we used the fact that $\|\boldsymbol{\mu}_i\| = 1$. The constrained optimization problem in (15) can be solved by the method of Lagrange multipliers. After some algebra, the rate assigned to the $i$th mixture component is

$$R_i = R_q + h_i(\mathbf{x}) - \sum_{j=1}^{I} \pi_j h_j(\mathbf{x}). \tag{17}$$

C. Distortion-Rate Relation by VMM

In the CE case and with optimal inter-component bit allocation, the distortions contributed by all the mixture components are identical because $R_i - h_i(\mathbf{x})$ is a constant which depends only on the trained model [16].
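The allocation rule (17) and the per-component distortion (16) are simple enough to check numerically. The sketch below is illustrative only: the function names, the placeholder `c_const` standing in for $C(r, K)$, and the example weights and entropies are assumptions, not values from the experiments. Rates and entropies are in nats, consistent with $R_a = \ln I$.

```python
import math


def allocate_rates(total_rate, weights, entropies):
    """Per-component rates R_i from (17); rates and entropies are in nats."""
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("mixture weights must sum to one")
    # Rate spent on signalling the component index, R_a = ln I.
    rate_q = total_rate - math.log(len(weights))
    # Weighted mean of the component differential entropies.
    mean_entropy = sum(p * h for p, h in zip(weights, entropies))
    return [rate_q + h - mean_entropy for h in entropies]


def component_distortion(rate, entropy, dim, r=2.0, c_const=1.0):
    """D_i(R_i) from (16); c_const is a placeholder for C(r, K)."""
    return c_const * math.exp(-(r / dim) * (rate - entropy))


weights = [0.5, 0.3, 0.2]          # hypothetical mixture weights
entropies = [-40.0, -37.5, -35.0]  # hypothetical differential entropies
rates = allocate_rates(30.0, weights, entropies)
for rate_i, h_i in zip(rates, entropies):
    print(round(rate_i, 4), component_distortion(rate_i, h_i, dim=16))
```

With this allocation, $R_i - h_i(\mathbf{x})$ is the same for every component, so the printed distortions coincide, and the weighted average of the rates equals $R_q$.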
Then the D-R relation is

$$D(R) = \sum_{i=1}^{I} \pi_i D_i(R_i) = D_i(R_i), \quad \forall i \in \{1, 2, \ldots, I\}. \tag{18}$$

[Figure 1: three panels plotting distortion against R (in bit). (a) D-R performance of all VQs. (b) D-R relation for DVQ (zoomed in). (c) D-R relation for VVQ (zoomed in).]

Fig. 1. D-R performance comparisons of GVQ, DVQ, and VVQ. To distinguish the VQs, we use red, green, and blue lines to denote the performance obtained by GVQ, DVQ, and VVQ, respectively. For each VQ, the solid line, dash-circle line, and dot-diamond line represent the performance obtained with 16, 32, and 64 mixture components, respectively.

V. EXPERIMENTAL RESULTS AND DISCUSSION

The proposed inter-component bit allocation strategy optimizes the D-R relation of VVQ. To demonstrate the D-R performance, we compared it with our recently presented DVQ [10] and the widely used GVQ [9], [12]. The TIMIT [19] database with wideband speech (sampled at 16 kHz) was used. We extracted 16-dimensional LPC parameters and transformed them to LSF parameters, ΔLSF parameters, and SRΔLSF parameters, respectively. With a window length of 25 milliseconds and a step size of 20 milliseconds, approximately 706,000 LSF vectors (and the same number of ΔLSF and SRΔLSF vectors) were obtained from the training partition. GMM, DMM, and VMM were trained on the corresponding vectors and the D-R relations were calculated, respectively. The mean values of 20 rounds of simulations are reported. Figure 1 shows the D-R performance comparisons. It can be observed that VVQ leads to smaller distortion at different rates, compared to GVQ and DVQ.
We believe this is due to the efficient modeling of the SRΔLSF parameters. Furthermore, better D-R performance can be obtained with more mixture components. Therefore, VVQ potentially permits superior practical VQ performance.

VI. CONCLUSION

A novel PDF-optimized VQ for LSF parameter quantization was proposed. The LSF parameters were transformed to the square-root ΔLSF domain and we modeled the underlying distribution by a von Mises-Fisher mixture model (VMM). According to high rate quantization theory and in the constrained entropy case, the optimal inter-component bit allocation strategy was proposed based on the VMM. The mean distortion of the VMM-based vector quantizer (VVQ) was minimized at a given rate so that the D-R relation was obtained. Compared to our recently proposed Dirichlet mixture model-based VQ and the conventionally used Gaussian mixture model-based VQ, the proposed VVQ performs better over a wide range of bit rates.

VII. FUTURE WORK

For our future work, we need to implement a practical scheme to carry out the VQ. One possible solution is to propose an efficient quantizer for the von Mises-Fisher (vMF) source, e.g., similar to the method in [20]. Another possible solution is to decorrelate the vMF vector variable into a set of scalar variables, each of which has an explicit PDF representation. Then we can replace the VQ with a set of independent scalar quantizers. This approach is similar to the Dirichlet source decorrelation and the Dirichlet mixture model-based VQ introduced in [10].

REFERENCES

[1] K. K. Paliwal and W. B. Kleijn, Speech Coding and Synthesis. Amsterdam, The Netherlands: Elsevier, 1995, ch. Quantization of LPC parameters, pp. 433–466.
[2] W. B. Kleijn, T. Backstrom, and P. Alku, "On line spectral frequencies," IEEE Signal Processing Letters, vol. 10, no. 3, pp. 75–77, 2003.
[3] Z. Ma, "Bayesian estimation of the Dirichlet distribution with expectation propagation," in Proceedings of European Signal Processing Conference, 2012.
[4] Z. Ma, S. Chatterjee, W. B. Kleijn, and G. J., "Dirichlet mixture modeling to estimate an empirical lower bound for LSF quantization," Signal Processing, vol. 104, pp. 291–295, Nov. 2014.
[5] F. Itakura, "Line spectrum representation of linear predictive coefficients of speech signals," Journal of the Acoustical Society of America, vol. 57, p. 535, 1975.
[6] K. K. Paliwal and B. S. Atal, "Efficient vector quantization of LPC parameters at 24 bits/frame," IEEE Transactions on Speech and Audio Processing, vol. 1, no. 1, pp. 3–14, Jan. 1993.
[7] J. Li, N. Chaddha, and R. M. Gray, "Asymptotic performance of vector quantizers with a perceptual distortion measure," IEEE Transactions on Information Theory, vol. 45, pp. 1082–1091, May 1999.
[8] P. Hedelin and J. Skoglund, "Vector quantization based on Gaussian mixture models," IEEE Transactions on Speech and Audio Processing, vol. 8, no. 4, pp. 385–401, Jul. 2000.
[9] A. D. Subramaniam and B. D. Rao, "PDF optimized parametric vector quantization of speech line spectral frequencies," IEEE Transactions on Speech and Audio Processing, vol. 11, pp. 130–142, Mar. 2003.
[10] Z. Ma, A. Leijon, and W. B. Kleijn, "Vector quantization of LSF parameters with a mixture of Dirichlet distributions," IEEE Transactions on Audio, Speech, and Language Processing, vol. 21, no. 9, pp. 1777–1790, Sept. 2013.
[11] J. Lindblom and J. Samuelsson, "Bounded support Gaussian mixture modeling of speech spectra," IEEE Transactions on Speech and Audio Processing, vol. 11, no. 1, pp. 88–99, Jan. 2003.
[12] S. Chatterjee and T. V. Sreenivas, "Low complexity wideband LSF quantization using GMM of uncorrelated Gaussian mixtures," in 16th European Signal Processing Conference (EUSIPCO), 2008.
[13] K. V. Mardia and P. E. Jupp, Directional Statistics. John Wiley and Sons, 2000.
[14] A. Banerjee, I. S. Dhillon, J. Ghosh, and S. Sra, "Clustering on the unit hypersphere using von Mises-Fisher distributions," Journal of Machine Learning Research, vol. 6, pp. 1345–1382, 2005.
[15] B. Sarwar, G. Karypis, J. Konstan, and J. Riedl, "Item based collaborative filtering recommendation algorithms," in Proc. 10th International Conference on the World Wide Web, 2001, pp. 285–295.
[16] W. B. Kleijn, A basis for source coding, 2010, KTH lecture notes.
[17] M. Abramowitz and I. A. Stegun, Handbook of Mathematical Functions. New York: Dover Publications, 1965.
[18] S. Sra, "A short note on parameter approximation for von Mises-Fisher distributions: and a fast implementation of $I_s(x)$," Computational Statistics, vol. 27, no. 1, pp. 177–190, 2012.
[19] "DARPA-TIMIT," Acoustic-phonetic continuous speech corpus, NIST Speech Disc 1.1-1, 1990.
[20] J. Hamkins and K. Zeger, "Gaussian source coding with spherical codes," IEEE Transactions on Information Theory, vol. 48, no. 11, pp. 2980–2989, 2002.

APPENDIX A
DISCUSSION ABOUT THE INCONSISTENCY OF LIKELIHOOD COMPARISON AND D-R COMPARISON

This section is only for discussion and will not appear in the final submission.

As we observed before, the likelihood obtained by DMM is higher than the likelihood obtained by VMM. If we calculate the differential entropy of the trained PDF empirically as

$$h(\mathbf{x}) = -E[\ln f(\mathbf{x})] \approx -\frac{1}{N} \sum_{n=1}^{N} \ln f(\mathbf{x}_n), \tag{19}$$

a higher likelihood leads to a smaller differential entropy. According to (14), this indicates better D-R performance.
However, in our manuscript, VMM performs better than GMM when we apply the mixture quantizer strategy. Why does this happen?

In our manuscript, for the CE case, we calculate the D-R performance of the mixture model by (18). From (18), we have

\begin{align}
D(R) &= D_i(R_i) \tag{20} \\
&= C(r, K) \cdot e^{-\frac{r}{K}\left(R_i - h_i(\mathbf{x})\right)} \tag{21} \\
&= C(r, K) \cdot e^{-\frac{r}{K}\left[\sum_{i=1}^{I} \pi_i \left(R_i - h_i(\mathbf{x})\right)\right]} \tag{22} \\
&= C(r, K) \cdot e^{-\frac{r}{K}\left(R - \ln I - \sum_{i=1}^{I} \pi_i h_i(\mathbf{x})\right)}. \tag{23}
\end{align}

From (21) to (22), we used the fact that $R_i - h_i(\mathbf{x})$ is the same for all $i$. From (22) to (23), we used the fact that $R - \ln I = \sum_{i=1}^{I} \pi_i R_i$. Thereafter, with $f_i(\mathbf{x})$ denoting the PDF of the $i$th mixture component, we have

\begin{align}
& \ln I + \sum_{i=1}^{I} \pi_i h_i(\mathbf{x}) \tag{24} \\
&\geq -\sum_{i=1}^{I} \pi_i \ln \pi_i - \sum_{i=1}^{I} \pi_i \int \ln f_i(\mathbf{x}) \cdot f_i(\mathbf{x})\, d\mathbf{x} \tag{25} \\
&= -\sum_{i=1}^{I} \pi_i \int \ln \pi_i \cdot f_i(\mathbf{x})\, d\mathbf{x} - \sum_{i=1}^{I} \pi_i \int \ln f_i(\mathbf{x}) \cdot f_i(\mathbf{x})\, d\mathbf{x} \notag \\
&= -\sum_{i=1}^{I} \pi_i \int \ln \left[ \pi_i f_i(\mathbf{x}) \right] \cdot f_i(\mathbf{x})\, d\mathbf{x} \tag{26} \\
&\geq -\sum_{i=1}^{I} \pi_i \int \ln \left[ \sum_{j=1}^{I} \pi_j f_j(\mathbf{x}) \right] \cdot f_i(\mathbf{x})\, d\mathbf{x} \tag{27} \\
&= -\int \ln \left[ \sum_{j=1}^{I} \pi_j f_j(\mathbf{x}) \right] \cdot \sum_{i=1}^{I} \pi_i f_i(\mathbf{x})\, d\mathbf{x} \tag{28} \\
&= h(\mathbf{x}). \tag{29}
\end{align}

This inequality indicates that the D-R performance calculated in the CE case (with the mixture quantizer, (18)) is, in general, not identical to the D-R performance calculated with the whole PDF via (14). The inequality in (25) becomes an equality when all the components have the same weights. Equality in (27) holds if there is no overlap among the mixture components or if no mixture modeling is used ($I = 1$). The inequality above introduces a systematic gap (a loss in D-R performance). This gap depends on the training and the distribution assumption. Therefore, a smaller differential entropy for the whole PDF ($h(\mathbf{x})$) can only guarantee better D-R performance if we do not use the mixture quantizer strategy. With a mixture quantizer, it cannot guarantee better D-R performance.
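The chain (24)-(29) can be checked numerically on a toy example. The sketch below uses a one-dimensional Gaussian mixture instead of a VMM, purely because its component entropies have a closed form; the function names, the integration grid, and the mixture parameters are illustrative assumptions and are unrelated to the trained models in this letter.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of a 1-D Gaussian with the given mean and standard deviation."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def entropy_gap(weights, means, stds, lo=-30.0, hi=30.0, steps=60000):
    """Return (ln I + sum_i pi_i h_i, h(x)) for a 1-D Gaussian mixture, in nats."""
    # Closed-form differential entropy of each Gaussian component.
    h_comp = [0.5 * math.log(2.0 * math.pi * math.e * s * s) for s in stds]
    upper = math.log(len(weights)) + sum(p * h for p, h in zip(weights, h_comp))
    # Midpoint-rule integration of -f ln f for the full mixture density.
    dx = (hi - lo) / steps
    h_mix = 0.0
    for n in range(steps):
        x = lo + (n + 0.5) * dx
        f = sum(p * gaussian_pdf(x, m, s) for p, m, s in zip(weights, means, stds))
        if f > 0.0:
            h_mix -= f * math.log(f) * dx
    return upper, h_mix


# Hypothetical two-component mixture with unequal weights and overlap.
upper, h_mix = entropy_gap([0.6, 0.4], [-1.0, 1.5], [1.0, 0.7])
print(upper, h_mix, upper - h_mix)
```

For overlapping components with unequal weights, the first returned value exceeds the second, which is the systematic gap discussed above; with a single component the two values agree up to integration error.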
