Independent Low-Rank Matrix Analysis Based on Time-Variant Sub-Gaussian Source Model
Independent low-rank matrix analysis (ILRMA) is a fast and stable method for blind audio source separation. Conventional ILRMAs assume time-variant (super-)Gaussian source models, which can only represent signals that follow a super-Gaussian distribu…
Authors: Shinichi Mogami, Norihiro Takamune, Daichi Kitamura
Independent Lo w-Rank Matrix Analysis Based on T ime-V ariant Sub-Gaussian Source Model Shinichi Mogami ∗ , Norihiro T akamune ∗ , Daichi Kitamura † , Hiroshi Saruwatari ∗ , Y u T akahashi ‡ , Kazunobu K ondo ‡ , Hiroaki Nakajima ‡ , and Nobutaka Ono § ∗ The Univ ersity of T okyo, T okyo, Japan † National Institute of T echnology , Kagawa Colle ge, Kagaw a, Japan ‡ Y amaha Corporation, Shizuoka, Japan § T okyo Metropolitan Uni versity , T okyo, Japan Abstract —Independent low-rank matrix analysis (ILRMA) is a fast and stable method for blind audio sour ce separation. Con ventional ILRMAs assume time-variant (super -)Gaussian source models, which can only repr esent signals that follow a super -Gaussian distrib ution. In this paper , we f ocus on ILRMA based on a generalized Gaussian distribution (GGD-ILRMA) and propose a new type of GGD-ILRMA that adopts a time- variant sub-Gaussian distribution for the source model. By using a new update scheme called generalized iterative projection f or homogeneous source models, we obtain a con vergence-guaranteed update rule f or demixing spatial parameters. In the experimental evaluation, we show the versatility of the proposed method, i.e., the pr oposed time-variant sub-Gaussian source model can be applied to various types of source signal. I . I N T RO D U C T I O N Blind source separation (BSS) [1]–[8] is a technique for extracting specific sources from an observed multi- channel mixture signal without knowing a priori infor- mation about the mixing system. The most commonly used algorithm for BSS in the (over)determined case (number of microphones ≥ number of sources) is indepen- dent component analysis (ICA) [1]. As a state-of-the-art ICA- based BSS method, Kitamura et al. proposed independent low-rank matrix analysis (ILRMA) [9], [10], which is a uni- fication of independent vector analysis (IV A) [5], [6] and nonnegati ve matrix factorization (NMF) [11]. ILRMA assumes both statistical independence between sources and a lo w-rank time-frequency structure for each source, and the frequency- wise demixing systems are estimated without encountering the permutation problem [3], [4]. ILRMA is a faster and more stable algorithm than multichannel NMF (MNMF) [12]– [14], which is an algorithm for BSS that estimates the mixing system on the basis of spatial cov ariance matrices. The original ILRMA based on Itakura–Saito (IS) div ergence assumes a time-variant isotropic complex Gaussian distribu- tion for the source generati ve model. Hereafter, we refer to the original ILRMA as IS-ILRMA . Recently , various types of source generati ve model ha ve been proposed in ILRMA for ro- bust BSS. In particular , t -ILRMA [15] and GGD-ILRMA [16], [17] have been proposed as generalizations of IS-ILRMA with a complex Student’ s t distribution and a complex general- ized Gaussian distribution (GGD), respectiv ely . In t -ILRMA and GGD-ILRMA, the kurtosis of the generativ e models’ distributions can be parametrically changed along with the degree-of-freedom parameter in Student’ s t distribution and the shape parameter in the GGD. By changing the kurtosis of the distrib utions, we can control how often the source signal outputs outliers or its expected sparsity . In particular , in sub- Gaussian models, i.e., models that follo w distributions with a platykurtic shape, the source signal rarely outputs outliers. Therefore, the sub-Gaussian modeling of sources is expected to accurately estimate the source spectrogram without ignoring its important spectral peaks. Furthermore, many audio sources follow platykurtic distributions; it is kno wn that musical in- strument signals obey sub-Gaussian distributions [18]. Howe ver , neither con ventional t -ILRMA nor GGD-ILRMA assumes that the source model follows a sub-Gaussian distri- bution. Both t -ILRMA and GGD-ILRMA can adopt only a super-Gaussian (or Gaussian) model, i.e., models that follow distributions with a leptokurtic shape, for the source model. In t -ILRMA, this is because the complex Student’ s t distribu- tion becomes only super-Gaussian for any degree-of-freedom parameter . In GGD-ILRMA, on the other hand, it is because the estimation algorithm for the demixing matrix has not yet been deriv ed for a sub-Gaussian case, although the GGD itself can represent a sub-Gaussian distribution depending on its shape parameter . More specifically , the con ventional iterative pr ojection (IP) [7], which is an algorithm that updates the demixing matrix mainly used in IV A and ILRMA, cannot be applied to sub-Gaussian-based GGD-ILRMA owing to mathematical difficulties. In this paper , we propose a new type of ILRMA that assumes time-v ariant sub-Gaussian source models. This paper includes three novelties. First, we construct a new update scheme for the demixing matrix called generalized IP for homogeneous sour ce models (GIP-HSM) . Second, we derive a con ver gence-guaranteed update rule for the demixing matrix in GGD-ILRMA with a shape parameter of four . T o the best of our knowledge, this is the world’ s first attempt to model the source signal with a time-variant sub-Gaussian distribution, and we derive the update rule by applying the abov e-mentioned scheme. Third, we show the v alidity of the proposed sub-Gaussian GGD-ILRMA via BSS experiments on music and speech signals. W e confirm that the proposed method is a versatile approach to source modeling, i.e., the proposed time-v ariant sub-Gaussian model can represent super-Gaussian or Gaussian signals as well as sub-Gaussian signals owing to its time-variant nature, whereas the con ven- tional models can only represent super-Gaussian or Gaussian signals. I I . P RO B L E M F O R M U L A T I O N A. F ormulation of Demixing Model Let N and M be the numbers of sources and channels, respectiv ely . The short-time Fourier transforms (STFTs) of the multichannel source, observed, and estimated signals are defined as s ij = ( s ij 1 , . . . , s ij N ) > ∈ C N , (1) x ij = ( x ij 1 , . . . , x ij M ) > ∈ C M , (2) y ij = ( y ij 1 , . . . , y ij N ) > ∈ C N , (3) where i = 1 , . . . , I ; j = 1 , . . . , J ; n = 1 , . . . , N ; and m = 1 . . . , M are the integral indices of the frequency bins, time frames, sources, and channels, respectiv ely , and > denotes the transpose. W e assume the mixing system x ij = A i s ij , (4) where A i = ( a i 1 , . . . , a iN ) ∈ C M × N is a frequenc y-wise mixing matrix and a in is the steering v ector for the n th source. When M = N and A i is not a singular matrix, the estimated signal y ij can be expressed as y ij = W i x ij , (5) where W i = A − 1 i = ( w i 1 , . . . , w iN ) H is the demixing matrix, w in is the demixing filter for the n th source, and H denotes the Hermitian transpose. ILRMA estimates both W i and y ij from only the observation x ij assuming statistical independence between s ij n and s ij n 0 , where n 6 = n 0 . B. Generative Model and Cost Function in GGD-ILRMA GGD-ILRMA utilizes the isotropic complex GGD. The probability density function of the GGD is p ( z ) = β 2 π r 2 Γ(2 /β ) exp − | z | β r β , (6) where β is the shape parameter , r is the scale parameter, and Γ( · ) is the gamma function. Fig. 1 shows the shapes of the GGD with β = 2 and β = 4 . When β = 2 , (6) corresponds to the probability density function of the complex Gaussian distribution with a mesokurtic shape. In the case of 0 < β < 2 , the distribution becomes super-Gaussian with a leptokurtic shape. In the case of β > 2 , the distribution becomes sub- Gaussian with a platykurtic shape. In GGD-ILRMA, we assume the time-variant isotropic complex GGD as the source generative model, which is (a) GGD with β = 2 . (b) GGD with β = 4 . Fig. 1. Examples of shapes of complex GGD. (a) When β = 2 , shape of GGD corresponds to that of Gaussian distribution. (b) When β = 4 , GGD is platykurtic. independently defined in each time-frequency slot as follo ws: p ( Y n ) = Y i,j p ( y ij n ) = Y i,j β 2 π r ij n 2 Γ(2 /β ) exp − | y ij n | β r ij n β , (7) r ij n p = X k t ikn v kj n , (8) where the local distribution p ( y ij n ) is defined as a circularly symmetric complex Gaussian distribution, i.e., the probability of p ( y ij n ) only depends on the power of the complex value y ij n . r ij n is the time-frequency-v arying scale parameter and p is the domain parameter in NMF modeling. Moreov er , the variables t ikn and v kj n are the elements of the basis matrix T n ∈ R I × K ≥ 0 and the activ ation matrix V n ∈ R K × J ≥ 0 , respectiv ely , where R ≥ 0 denotes the set of nonneg ativ e real numbers. k = 1 , · · · , K is the integral index, and K is set to a much smaller v alue than I and J , which leads to the low-rank approximation. From (7), the negati ve log-likelihood function L GGD of the observed signal x ij can be obtained as follows by assuming independence between sources: L GGD = − 2 J X i log | det W i | + X i,j,n | y ij n | β r ij n β + 2 log r ij n ! , (9) where y ij n = w in x ij and we used the transformation of random variables from x ij to y ij . The cost function of GGD-ILRMA (9) coincides with that of IS-ILRMA when β = p = 2 . By minimizing (9) w .r .t. W i and r ij n under the limitation (8), we estimate the demixing system that maximizes the independence between sources. Fig. 2 shows a conceptual model of GGD-ILRMA. When each of the original sources has a low-rank spectrogram, the spectrogram of their mixture should be more complicated, where the rank of the mixture spectrogram will be greater than that of the source spectrogram. On the basis of this assumption, in GGD-ILRMA, the low-rank constraint for each estimated spectrogram is introduced by employing NMF . The demixing matrix W i is estimated so that the spectrogram of Fig. 2. Principle of source separation in GGD-ILRMA, where ˜ x m and ˜ y n are time-domain signals of x ij n and y ij n , respectively . the estimated signal becomes a low-rank matrix modeled by T n V n , whose rank is at most K . The estimation of W i , T n , and V n can consistently be carried out by minimizing (9) in a fully blind manner . I I I . C O N V E N T I O NA L M E T H O D A. Update Rule for Demixing Matrix In IS-ILRMA, the demixing matrix W i can be efficiently updated by IP , which can be applied only when the cost function is the sum of − log | det W i | and the quadratic form of w in (this corresponds to GGD-ILRMA with β = 2 ). In GGD-ILRMA, the update rule of W i is also deriv ed using the majorization-minimization (MM) algorithm [19]. When 0 < β ≤ 2 , we can use the following inequality of weighted arithmetic and geometric means to design the majorization function: | y ij n | β ≤ β 2 | y ij n | 2 α ij n 2 − β + 1 − β 2 α ij n β , (10) where α ij n is an auxiliary variable and the equality of (10) holds if and only if α ij n = | y ij n | . By applying (10) to (9), the majorization function of (9) can be designed as L GGD ≤ − 2 J X i log | det W i | + J X i,n w H in F in w in + const ., (11) F in = β 2 J X j 1 α ij n 2 − β ( P k t ikn v kj n ) β p x ij x H ij , (12) where the constant term is independent of w in . By applying IP to (11) and substituting the equality condition α ij n = | y ij n | into (12), the update rule for W i is deriv ed as F in = β 2 J X j 1 | y ij n | 2 − β ( P k t ikn v kj n ) β p x ij x H ij , (13) w in ← F − 1 in W − 1 i e n , (14) w in ← w in q 1 / ( w H in F in w in ) . (15) When β = p = 2 , these update rules (13)–(15) coincide with those in IS-ILRMA. Note that the update rules (13)–(15) are valid only when 0 < β ≤ 2 , which is equi valent to the condition that the inequality (10) holds. In fact, when β > 2 , it is thought to be impossible to design a majorization function to which we can apply IP because no quadratic function w .r .t. x can majorize x β . Con ventional GGD-ILRMA achie ves various types of source generativ e model: when β = 2 , the entry of the source spectrogram follows the complex Gaussian distrib ution (the same model as that of IS-ILRMA), and when β < 2 , the entry of the source spectrogram follows the complex leptokurtic distribution. Howe ver , a source generati ve model that follows a platykurtic complex GGD is yet to be achie ved. Since the marginal distribution of the time-variant super-Gaussian or Gaussian model w .r .t. the time frame becomes only super - Gaussian, any signals that follo w sub-Gaussian distributions, such as music signals, cannot be appropriately dealt with by the con ventional GGD-ILRMA. B. Update Rule for Low-Rank Sour ce Model The update rules for T n and V n in IS-ILRMA and GGD- ILRMA can be deriv ed by the MM algorithm, which is a popular approach for NMF . W e obtain the following update rules: t ikn ← t ikn β P j | y ij n | β ( P k 0 t ik 0 n v k 0 j n ) β p +1 v kj n 2 P j 1 P k 0 t ik 0 n v k 0 j n v kj n p β + p , (16) v kj n ← v kj n β P j | y ij n | β ( P k 0 t ik 0 n v k 0 j n ) β p +1 t ikn 2 P j 1 P k 0 t ik 0 n v k 0 j n t ikn p β + p . (17) See Appendix A for their detailed deriv ation. In GGD-ILRMA, the cost function (9) is minimized by alternately repeating the update of the demixing matrix W i using (13)–(15) and the update of the low-rank source models T n and V n using (16) and (17), respecti vely . A monotonic decrease in the cost is guaranteed ov er these update rules. I V . P R O PO S E D M E T H O D A. Motivation The con ventional methods [9], [16] hav e a limitation that the source signal cannot be appropriately represented when the signal follows a sub-Gaussian distrib ution. In this paper, we propose an MM-algorithm-based update rule for GGD- ILRMA to maximize the likelihood based on the sub-Gaussian source model. T o derive the update rule, we also extend the problem of demixing matrix estimation into a more general- ized form and propose its optimization scheme based on GIP- HSM. In contrast to the time-v ariant super-Gaussian or Gaussian model, the marginal distribution of the time-variant sub- Gaussian model w .r .t. the time frame can be sub-Gaussian as well as Gaussian or super-Gaussian, depending on its time variance of the scale parameter r ij n . For example, the time-variant sub-Gaussian model is platykurtic when r ij n is constant w .r .t. the time frame, whereas it becomes mesokurtic or leptokurtic when r ij n fluctuates appropriately . This sho ws that the proposed time-variant sub-Gaussian model covers distributions with a wider range between platykurtic and leptokurtic shapes than other con ventional source models. Therefore, the proposed GGD-ILRMA is expected to have a robust performance against the variation of the target signals. B. Derivation of GIP-HSM The cost functions of IS-ILRMA and GGD-ILRMA are generalized as L = I X i =1 " − 2 log | det W i | + N X n =1 f in ( w in ) # + const ., (18) where the constant term is independent of w in and f in : C N → R is a real-valued function that satisfies the following three conditions: 1) f in ( w ) is differentiable w .r .t. w at an arbitrary point. 2) ∀ c > 0 , w ∈ C N f in ( w ) ≤ c is con ve x (naturally satisfied when f in ( w ) is con vex). 3) ∀ η , f in ( η w ) = η d f in ( w ) , namely , f in is a homoge- neous function of degree d . The term f in ( w in ) is determined by the distribution of the source generativ e model, e.g., f in ( w in ) = (1 /J ) P j | w H in x ij | β /r ij n β in GGD-ILRMA. Here we show that the optimization of (18) w .r .t. w in is composed of “direction optimization” and “scale optimization” for each frequency bin. Let u in be an N -dimensional vector that satisfies f in ( u in ) = 1 . Then, w in can be uniquely repre- sented as w in = η in u in , where η in is a positiv e real value. By regarding f in ( u ) as the norm of u , we can interpret u in as a unit vector w .r .t. the f in -norm. Substituting w in = η in u in into (18), the cost function is represented as L = I X i =1 − 2 log det η 1 u 1 · · · η N u N H + N X n =1 f in ( η in u in ) = I X i =1 " − 2 log Y n η in · | det U i | ! + N X n =1 η in d f in ( u in ) # = I X i =1 " − 2 log | det U i | + N X n =1 − 2 log η in + η in d # , (19) where U i = u i 1 · · · u iN H . Therefore, the minimization of the cost function can be interpreted as the minimization of − log | det U i | for each frequenc y bin and the minimization of − 2 log η in + η in d for each source and frequency bin. These direction optimization and scale optimization problems are independent of each other . The optimal η in can be calculated by a closed form because the deri v ativ e of the cost function w .r .t. η in can be written as d dη in ( − 2 log η in + η in d ) = − 2 η in + dη in d − 1 . (20) Hence, letting the right side of (20) be zero, we can obtain the optimal η in as η in = d p 2 /d. (21) The actual difficulty in the optimization of the demixing matrix is the direction optimization, i.e., the minimization of − 2 log | det U i | . Since minimizing − log x 2 is equi valent to maximizing x 2 , we can reformulate this problem as maximize | det U i | 2 s.t. f in ( u in ) = 1 . (22) Since it is generally dif ficult to solv e this problem by a closed form, we apply an approach called vectorwise coordinate descent. In this algorithm, we focus on u in , namely , the Hermitian transpose of a particular row vector of U i . By cofactor expansion, we can deform the problem (22) as maximize b H in u in 2 s.t. f in ( u in ) = 1 , (23) where b in is a column vector of the adjugate matrix B i = b i 1 · · · b iN H of U i . Since b in only depends on u in 0 ( n 0 6 = n ) and is independent of u in , (23) can be regarded as a function of u in by fixing the other ro w vectors of U i . Using the method of Lagrange multipliers, the stationary condition is b in ( b H in u in ) + λ ∂ f in ∂ u H in ( u in ) = 0 , (24) where λ is a Lagrange multiplier . Since ( b H in u in ) is a scalar , the stationary condition can be rewritten as ∂ f in ∂ u H in ( u in ) k b in , (25) where the binary relation “ x k y ” means that x is parallel to y . In (25), b in is represented in terms of W in as b in = (det U i ) U − 1 i e n = (det U i )(diag( η − 1 i 1 , . . . , η − 1 iN ) W i ) − 1 e n = (det U i ) W − 1 i diag( η i 1 , . . . , η iN ) e n = ( η in det U i ) W − 1 i e n k W − 1 i e n , (26) where diag ( c 1 , . . . , c N ) denotes the N × N diagonal matrix whose ( n, n ) th element is c n , and e n is an N -dimensional vector whose n th element is one and whose other elements are zero. Since f in is con vex, the stationary point of the objecti ve function (23) must also be the optimal point. Therefore, the cost function (19) that includes (22) monotonically decreases with each update of the direction u in . In conclusion, to minimize the cost function (18), we update the vector w in by the follo wing two steps in GIP-HSM. (a) Find a vector w 0 in that satisfies ∂ f in ∂ w H in ( w 0 in ) k W − 1 i e n . (27) (b) Update w in as w in ← w 0 in d q 2 / ( d · f in ( w 0 in )) . (28) The first step (a) and second step (b) correspond to the direction and scale optimizations, respectively . Note that w 0 in calculated in the first step does not need to satisfy f in ( w 0 in ) = 1 because the scale is automatically adjusted in the second step. In fact, if w 0 in is represented as w 0 in = η 0 in u in , the second step results in w in ← η 0 in u in · d q 2 / ( d · η 0 in d ) = u in d p 2 /d, (29) at which point both the direction and the scale are optimized. C. Sub-Gaussian ILRMA Based on GIP-HSM Using GIP-HSM, we propose a new update rule in GGD- ILRMA whose shape parameter β is set to four (time-variant sub-Gaussian model). The cost function of GGD-ILRMA with β = 4 is written as J = − 2 J X i log | det W i | + X i,j,n | w H in x ij | 4 r ij n 4 + const ., (30) where the constant term is independent of w in . It seems possible to apply GIP-HSM by letting f in ( w in ) = (1 /J ) P j | w H in x ij | 4 /r ij n 4 . In this case, howe ver , it is diffi- cult to solve (27), which is reduced to a cubic vector equation w .r .t. w 0 in . Instead, we apply an MM algorithm to deriv e an update rule that does not contain any cubic vector equations. Hereafter , we prove the following theorem, and then design a new type of majorization function of (30) using the theorem. Theor em 1: Let f ( w ) = (1 /J ) P J j =1 ( | w H x j | 4 /r j 4 ) and g ( w ) = ( w H Gw ) 2 , where G is defined in terms of a vector ˜ w as H = 1 r 1 x 1 · · · 1 r J x J , (31) ˜ q = ˜ q 1 · · · ˜ q J > = H H ˜ w , (32) ˜ Q = k ˜ q k 2 − ˜ q 1 ˜ q ∗ 2 · · · − ˜ q 1 ˜ q ∗ J − ˜ q 2 ˜ q ∗ 1 k ˜ q k 2 · · · − ˜ q 2 ˜ q ∗ J . . . . . . . . . . . . − ˜ q J ˜ q ∗ 1 − ˜ q J ˜ q ∗ 2 · · · k ˜ q k 2 , (33) G = 1 q J P j | ˜ q j | 4 H ˜ QH H . (34) Then, g ( w ) satisfies f ( w ) ≤ g ( w ) for arbitrary w and the equality holds when w = ˜ w . Pr oof: Let q = q 1 · · · q J > = H H w . f ( w ) and g ( w ) can be written as f ( w ) = 1 J X j | q j | 4 , (35) g ( w ) = 1 J P j | ˜ q j | 4 ( q H ˜ Qq ) 2 . (36) Then, the objective inequality f ( w ) ≤ g ( w ) holds if and only if X j | q j | 4 X j | ˜ q j | 4 ≤ ( q H ˜ Qq ) 2 . (37) The quadratic form of the right side in (37) can be deformed as q H ˜ Qq = tr( ˜ Qq q H ) = X j | q j | 2 k ˜ q k − X i 6 = j q i q ∗ j ˜ q ∗ i ˜ q j = X j | q j | 2 X j | ˜ q j | 2 − X i 6 = j q i q ∗ j ˜ q ∗ i ˜ q j . (38) Hence, we prov e the following inequality hereafter: X j | q j | 4 X j | ˜ q j | 4 ≤ X j | q j | 2 X j | ˜ q j | 2 − X i 6 = j q i q ∗ j ˜ q ∗ i ˜ q j 2 . (39) Let x 1 = X j | q j | 2 , (40) x 2 = s X i 6 = j | q i | 2 | q j | 2 , (41) y 1 = X j | ˜ q j | 2 , (42) y 2 = s X i 6 = j | ˜ q i | 2 | ˜ q j | 2 . (43) Since x 1 2 − x 2 2 = X j | q j | 4 ≥ 0 , (44) y 1 2 − y 2 2 = X j | ˜ q j | 4 ≥ 0 , (45) and x 1 , x 2 , y 1 , y 2 ≥ 0 , it is obvious that x 1 y 1 − x 2 y 2 ≥ 0 . (46) Furthermore, we obtain the following inequality by applying the Cauchy–Schwarz inequality: x 2 y 2 = v u u t X i 6 = j | q i | 2 | q j | 2 X i 6 = j | ˜ q i | 2 | ˜ q j | 2 ≥ X i 6 = j q i q ∗ j ˜ q ∗ i ˜ q j ≥ X i 6 = j q i q ∗ j ˜ q ∗ i ˜ q j , (47) where we used the fact that X i 6 = j q i q ∗ j ˜ q ∗ i ˜ q j = X i = H H in ˜ w in , (53) ˜ Q in = k ˜ q in k 2 − ˜ q i 1 n ˜ q ∗ i 2 n · · · − ˜ q i 1 n ˜ q ∗ iJ n − ˜ q i 2 n ˜ q ∗ i 1 n k ˜ q in k 2 · · · − ˜ q i 2 n ˜ q ∗ iJ n . . . . . . . . . . . . − ˜ q iJ n ˜ q ∗ i 1 n − ˜ q iJ n ˜ q ∗ i 2 n · · · k ˜ q in k 2 , (54) G in = 1 q J P j | ˜ q ij n | 4 H in ˜ Q in H H in , (55) where ˜ w in is an auxiliary variable and the equality of (51) holds when w in = ˜ w in . Since g in ( w in ) = ( w H in G in w in ) 2 is a dif ferentiable, con ve x, and homogeneous function of w in , we can apply GIP-HSM to minimize J + . The optimal condition for the direction of w in is determined as ∂ g in ∂ w H in ( w 0 in ) = ( w 0 in H G in w 0 in ) G in w 0 in k W − 1 i e n . (56) Since ( w 0 in H G in w 0 in ) is a scalar, one of the solutions of (56) is w 0 in = G − 1 in W − 1 i e n . (57) Substituting ˜ w in = w in into (53)–(55), we obtain the follow- ing update rule for optimizing the direction of w in : H in = 1 r i 1 n x i 1 · · · 1 r iJ n x iJ , (58) q in = q i 1 n · · · q iJ n > = H H in w in , (59) Q in = k q in k 2 − q i 1 n q ∗ i 2 n · · · − q i 1 n q ∗ iJ n − q i 2 n q ∗ i 1 n k q in k 2 · · · − q i 2 n q ∗ iJ n . . . . . . . . . . . . − q iJ n q ∗ i 1 n − q iJ n q ∗ i 2 n · · · k q in k 2 , (60) G in = 1 q J P j | q ij n | 4 H in Q in H H in , (61) w in ← G − 1 in W − 1 i e n . (62) Finally , we operate the following scale optimization by apply- ing (28): q in = q i 1 n · · · q iJ n > = H H in w in , (63) w in ← w in 4 q J / (2 P j | q ij n | 4 ) , (64) which is the scale optimization w .r .t. the f in -norm. In GGD-ILRMA with β = 4 , the demixing matrix W i is updated by (58)–(64), and the low-rank models T n and V n are updated by (16) and (17), respectiv ely . These update rules are deri ved using the MM algorithm and GIP-HSM, thus guaranteeing a monotonic decrease of the cost function (30). V . E X P E R I M E N TA L E V A L U A T I O N A. BSS Experiment on Music Signals W e compared the separation performance of the proposed sub-Gaussian GGD-ILRMA ( β = 4 ) with those of con ven- tional IS-ILRMA [9] and GGD-ILRMA ( β < 2 ) [16]. W e ar- tificially produced monaural dry music sources of four melody parts (melody 1: main melody , melody 2: counter melody , midrange, and bass) using Microsoft GS W av etable Synth, where sev eral musical instruments were chosen to play these melody parts [20], [21]. Six combinations of sources, Music 1–Music 6, were constructed by selecting typical combinations of instruments with dif ferent melody parts. The combinations of dry sources used in this experiment are shown in T able I. T o simulate a reverberant mixture, the observed signals were produced by con voluting the impulse response E2A, which was obtained from the R WCP database [22], shown in Fig. 3. As the ev aluation score, we used the improvement of the signal-to-distortion ratio (SDR) [23], which indicates the ov er- all separation quality . An STFT was performed using a 128- ms-long Hamming window with a 64-ms-long shift. The shape parameter β of the GGD was set to 1, 1.99, 2 in con ventional GGD-ILRMA and 4 in the proposed method, where β = 1 . 99 is the best parameter according to [16]. The other conditions are shown in T able II. Fig. 4 shows the av erage SDR improvements for Music 1–Music 6. The proposed sub-Gaussian GGD-ILRMA on av erage outperforms the other conv entional ILRMAs. This T ABLE I C O MB I NATI O N S O F DRY S OU R C E S Index Source 1 Source 2 Music 1 Fg. (bass) Ob . (melody 1) Music 2 Fg. (bass) Tp. (melody 1) Music 3 Ob . (melody 1) Fl. (melody 2) Music 4 Pf. (midrange) Ob . (melody 1) Music 5 Pf. (midrange) Tp. (melody 1) Music 6 Tp. (melody 1) Fl. (melody 2) Fig. 3. Recording conditions of impulse response E2A ( T 60 = 300 ms ) obtained from R WCP database [22]. result confirms that the proposed sub-Gaussian source model is more appropriate for dealing with music signals than other con ventional (super-)Gaussian source models. B. BSS Experiment on Speech Signals W e also confirmed the separation performance of the pro- posed sub-Gaussian GGD-ILRMA for speech signals, which are less likely to follow a sub-Gaussian distribution than music signals. W e used the monaural dry speech sources from the source separation task in SiSEC2011 [24], Speech 1–Speech 4. An STFT was performed using a 256-ms-long Hamming window with a 128-ms-long shift. The other conditions were the same as those of the music source separation experiment. Fig. 5 shows the av erage SDR impro vements for Speech 1– Speech 4. The proposed GGD-ILRMA on a verage outperforms the other con ventional ILRMAs ev en for speech signals, which are expected to be sparse and follo w super -Gaussian distribu- tions. This sho ws that the proposed time-variant sub-Gaussian model can appropriately model super -Gaussian signals as well as sub-Gaussian signals owing to its time-v ariant property , as described in Sect. IV -A. V I . C O N C L U S I O N W e proposed a new type of ILRMA, which assumes that the source signal follows the time-v ariant isotropic complex sub-Gaussian GGD. By using a new update scheme called GIP-HSM, we obtained a con ver gence-guaranteed update rule for the demixing matrix. Furthermore, in the experimental ev aluation, we revealed the versatility of the proposed method, i.e., the proposed time-variant sub-Gaussian source model can deal with v arious types of source signal, ranging from sub- Gaussian music signals to super-Gaussian speech signals. T ABLE II E X PE R I M EN TA L C O ND I T I ON S FO R MU S I C A N D S P E EC H S OU R C E S E P A R A T I ON Sampling frequency 16 kHz Number of iterations 1000 Number of bases 20 Number of trials 10 Domain parameter p = 0 . 5 Initial demixing matrix W i identity matrix Entries of initial source model matrices T n and V n uniformly distributed random v alues Music 1 Music 2 Music 3 Music 4 Music 5 Music 6 0 2 4 6 8 10 SDR imp rovement [dB] GGD ( β = 1) IS ( β = 2) GGD ( β = 1.99) GGD ( β = 4) Average Fig. 4. A verage SDR improv ement of GGD-ILRMA ( β = 1 ), GGD-ILRMA ( β = 1 . 99 ), IS-ILRMA, and proposed GGD-ILRMA ( β = 4 ) for six music signals. “ A verage” bar graph on right side shows av erage SDR impro vement among six music signals for each method. Sp eech 1 Sp eech 2 Sp eech 3 Sp eech 4 − 2 0 2 4 6 8 10 12 14 SDR imp rovement [dB] GGD ( β = 1) IS ( β = 2) GGD ( β = 1.99) GGD ( β = 4) Average Fig. 5. A verage SDR improv ement of GGD-ILRMA ( β = 1 ), GGD-ILRMA ( β = 1 . 99 ), IS-ILRMA, and proposed GGD-ILRMA ( β = 4 ) for four speech signals. “ A verage” bar graph on right side shows av erage SDR impro vement among four speech signals for each method. A C K N O W L E D G M E N T This work was partly supported by SECOM Science and T echnology Foundation and JSPS KAKENHI Grant Numbers JP17H06572 and JP16H01735. R E F E R E N C E S [1] P . Comon, “Independent component analysis, a new concept?” Signal Pr ocess. , v ol. 36, no. 3, pp. 287–314, 1994. [2] P . Smaragdis, “Blind separation of con volved mixtures in the frequency domain, ” Neurocomputing , vol. 22, no. 1, pp. 21–34, 1998. [3] H. Sa wada, R. Mukai, S. Araki, and S. Makino, “ A rob ust and precise method for solving the permutation problem of frequency-domain blind source separation, ” IEEE T rans. ASLP , vol. 12, no. 5, pp. 530–538, 2004. [4] H. Saruwatari, T . Kawamura, T . Nishikawa, A. Lee, and K. Shikano, “Blind source separation based on a fast-conv ergence algorithm com- bining ICA and beamforming, ” IEEE T rans. ASLP , v ol. 14, no. 2, pp. 666–678, 2006. [5] A. Hiroe, “Solution of permutation problem in frequency domain ICA using multiv ariate probability density functions, ” in Proc. ICA , 2006, pp. 601–608. [6] T . Kim, H. T . Attias, S.-Y . Lee, and T .-W . Lee, “Blind source separation exploiting higher-order frequency dependencies, ” IEEE T rans. ASLP , vol. 15, no. 1, pp. 70–79, 2007. [7] N. Ono, “Stable and fast update rules for independent vector analysis based on auxiliary function technique, ” in Proc. W ASP AA , 2011, pp. 189–192. [8] K. Y atabe and D. Kitamura, “Determined blind source separation via proximal splitting algorithm, ” in Pr oc. ICASSP , 2018, pp. 776–780. [9] D. Kitamura, N. Ono, H. Sawada, H. Kameoka, and H. Saruwatari, “De- termined blind source separation unifying independent vector analysis and nonnegativ e matrix factorization, ” IEEE/A CM T rans. ASLP , vol. 24, no. 9, pp. 1626–1641, 2016. [10] D. Kitamura, N. Ono, H. Sawada, H. Kameoka, and H. Saruwatari, “Determined blind source separation with independent low-rank matrix analysis, ” in Audio Source Separation , S. Makino, Ed. Springer, Cham, March 2018, ch. 6, pp. 125–155. [11] D. D. Lee and H. S. Seung, “Learning the parts of objects by non- negati ve matrix factorization, ” Natur e , vol. 401, no. 6755, pp. 788–791, 1999. [12] A. Ozerov and C. F ´ evotte, “Multichannel nonneg ativ e matrix factoriza- tion in conv olutiv e mixtures for audio source separation, ” IEEE T rans. ASLP , vol. 18, no. 3, pp. 550–563, 2010. [13] A. Ozerov , C. F ´ evotte, R. Blouet, and J. L. Durrieu, “Multichannel nonnegati ve tensor factorization with structured constraints for user- guided audio source separation, ” in Pr oc. ICASSP , 2011, pp. 257–260. [14] H. Sawada, H. Kameoka, S. Araki, and N. Ueda, “Multichannel exten- sions of non-negati ve matrix factorization with complex-valued data, ” IEEE T rans. ASLP , vol. 21, no. 5, pp. 971–982, 2013. [15] S. Mogami, D. Kitamura, Y . Mitsui, N. T akamune, H. Saruwatari, and N. Ono, “Independent low-rank matrix analysis based on complex Student’ s t -distribution for blind audio source separation, ” in Proc. MLSP , 2017. [16] D. Kitamura, S. Mogami, Y . Mitsui, N. T akamune, H. Saruwatari, N. Ono, Y . T akahashi, and K. Kondo, “Generalized independent low- rank matrix analysis using heavy-tailed distributions for blind source separation, ” EURASIP J ournal on Advances in Signal Processing , vol. 2018, no. 28, pp. 1–25, 2018. [17] R. Ikeshita and Y . Kaw aguchi, “Independent lo w-rank matrix analysis based on multiv ariate comple x e xponential po wer distribution, ” in Pr oc. ICASSP , 2018, pp. 741–745. [18] G. R. Naik and W . W ang, “ Audio analysis of statistically instantaneous signals with mixed Gaussian probability distributions, ” International Journal of Electr onics , vol. 99, no. 10, pp. 1333–1350, 2012. [19] D. R. Hunter and K. Lange, “Quantile regression via an MM algorithm, ” J. Comput. Graph. Stat. , v ol. 9, no. 1, pp. 60–77, 2000. [20] D. Kitamura, H. Saruwatari, H. Kameoka, Y . T akahashi, K. Kondo, and S. Nakamura, “Multichannel signal separation combining direc- tional clustering and nonnegati ve matrix factorization with spectrogram restoration, ” IEEE T rans. ASLP , vol. 23, no. 4, pp. 654–669, 2015. [21] D. Kitamura, “Open dataset: songkitamura, ” http://d- kitamura.net/en/ dataset en.htm. Accessed 27 May 2018. [22] S. Nakamura, K. Hiyane, F . Asano, T . Nishiura, and T . Y amada, “ Acous- tical sound database in real en vironments for sound scene understanding and hands-free speech recognition, ” in Pr oc. LREC , 2000, pp. 965–968. [23] E. Vincent, R. Gribonv al, and C. F ´ evotte, “Performance measurement in blind audio source separation, ” IEEE T rans. ASLP , vol. 14, no. 4, pp. 1462–1469, 2006. [24] S. Araki, F . Nesta, E. V incent, Z. Koldo vsk ´ y, G. Nolte, A. Ziehe, and A. Benichoux, “The 2011 signal separation ev aluation campaign (SiSEC2011): - audio source separation -, ” in Pr oc. L V A/ICA , 2012, pp. 414–422. A P P E N D I X A. Derivation of Update Rule for Low-Rank Sour ce Model The update rules for T n and V n in GGD-ILRMA can be deriv ed by the MM algorithm. In the MM algorithm, we minimize the majorization function instead of the original cost function. T o deriv e the majorization function in GGD-ILRMA, we introduce Jensen’ s inequality X k t ikn v kj n ! − β p ≤ X k φ ij nk t ikn v kj n φ ij nk − β p (65) and the tangent-line inequality log X k t ikn v kj n ≤ 1 ψ ij n X k t ikn v kj n − 1 ! + log ψ ij n , (66) where φ ij nk > 0 and ψ ij n > 0 are auxiliary variables and φ ij nk satisfies P k φ ij nk = 1 . The equalities of (65) and (66) hold if and only if φ ij nk = t ikn v kj n P k 0 t ik 0 n v k 0 j n , (67) ψ ij n = X k t ikn v kj n , (68) respectiv ely . By substituting (8) into (9) and applying (65) and (66) to (9), the majorization function of (9) can be designed as L GGD ≤ X i,j,n,k φ β p +1 ij nk | y ij n | β ( t ikn v kj n ) β p + 2 t ikn v kj n pψ ij n + const ., (69) where the constant term is independent of t ikn and v kj n . By setting the partial deriv ativ es of (69) w .r .t. t ikn and v kj n to zero, we obtain the update rules (16) and (17), respectiv ely .
Original Paper
Loading high-quality paper...
Comments & Academic Discussion
Loading comments...
Leave a Comment