Directional emphasis in ambisonics

We describe an ambisonics enhancement method that increases the signal strength in specified directions at low computational cost. The method can be used in a static setup to emphasize the signal arriving from a particular direction or set of directi…

Authors: W. Bastiaan Kleijn

Abstract: We describe an ambisonics enhancement method that increases the signal strength in specified directions at low computational cost. The method can be used in a static setup to emphasize the signal arriving from a particular direction or set of directions. It can also be used in an adaptive arrangement where it sharpens directionality and reduces the distortion in timbre associated with low-degree ambisonics representations. The emphasis operator has very low computational complexity and can be applied to time-domain as well as time-frequency ambisonics representations. The operator upscales a low-degree ambisonics representation to a higher degree representation.

Index Terms: Ambisonics, emphasis, directionality.

I. INTRODUCTION

Ambisonics [1]–[5] is a representation for sound fields that can take the form of a series of countably infinite spatial basis functions, each multiplied by a temporal scalar audio signal. Signal acquisition and rate (storage) constraints lead to truncation of the series, limiting the degree of the ambisonics description. A low ambisonics degree bounds the frequency-dependent radius within which the sound field is described accurately and may make the sound field unnatural outside this region. This can lead to audible artefacts in a signal rendered from the low-degree description, e.g., [3]–[11]. We describe an approach that addresses these issues and additionally facilitates directional emphasis for other purposes.

The artefacts of low-degree ambisonics can be explained as follows [11]. Conventional rendering uses the Moore-Penrose inverse to map the ambisonics representation to loudspeaker signals. Hence it minimizes the energy produced by the virtual or physical loudspeakers subject to the known ambisonics coefficients being correct. The constraint means the sound field is accurate in a sweet zone around the origin.
The imposed energy efficiency requires the loudspeaker contributions to the sound field to add coherently in the sweet zone, but not outside. As the radius of the zone is frequency dependent, a low-pass timbre is heard by a listener at the origin. Moreover, energy minimization implies the distribution of the acoustic energy over many loudspeakers, reducing the directionality of the sound field outside the sweet zone.

We define our objective more carefully. Consider a sound field in a source-free region around the origin of our coordinate system. The sound field in this region can be generated by a continuous density of monopole sound sources located on a 2-sphere centered at the origin, e.g., [12]. The temporal source signals of the monopole sources form a two-dimensional (2D) scalar source field on the 2-sphere that forms an alternative specification of the sound field. The strengthening of the directionality of the sound field can then be defined as the emphasizing of the source field on the 2-sphere. The emphasis operator can be interpreted as an acoustic spotlight.

(Author note: W. B. Kleijn is with Google LLC, San Francisco and Victoria University of Wellington, email: see http://ecs.victoria.ac.nz/Main/BastiaanKleijn)

We aim to implement the emphasis operator directly in the ambisonics representation. Our objective is a low-complexity operator that applies to both time domain and time-frequency domain representations. In addition to the standard goal of adaptive emphasis of the sound field (strengthening existing directionality), a secondary goal is static emphasis of the sound field (time-invariant emphasis operator).

We are not aware of existing systems that provide static (time-invariant) emphasis of an ambisonics representation. Adaptive emphasis operators can be classified into two classes. The first class does not change the ambisonics degree.
The common max-$r_E$ emphasis operator [3], [4] minimizes sidelobes resulting from the truncation to a low-degree representation [6]. The second class upscales the low-degree ambisonics information into a high-degree ambisonics representation [8], [9], [11]. Only the second class facilitates idempotency: the analysis of the rendered sound field returns the original ambisonics description. In addition to the aforementioned classes, methods exist that are an integral component of rendering, e.g., [7], [13], usually restricted to mapping the ambisonics representation into one or two plane waves [5].

Our contribution is an emphasis operator that has two advantages compared to the state-of-the-art operators for idempotent rendering [8], [9], [11]. First, it can be used in both static and adaptive emphasis applications (existing methods are aimed at adaptive emphasis). Second, our operator, which is based on Clebsch-Gordan coefficients, has low computational complexity. It can be used in the time domain and it raises the ambisonics degree with a matrix multiplication requiring only a handful of multiplies per output sample. To ensure idempotent adaptive rendering, the operator can incorporate a projection [11] without added computational cost.

II. THEORY

This section first describes the source field in section II-A, then defines the emphasis operator in section II-B and methods to compute it in section II-C. The discussion is for complex spherical harmonics but extends to the real case. Similarly to, e.g., [14], we write the spherical harmonics as

$$Y_n^m(\theta, \phi) = (-1)^m \sqrt{\frac{(2n+1)}{4\pi} \frac{(n-|m|)!}{(n+|m|)!}}\; P_n^{|m|}(\cos(\theta))\, e^{im\phi}, \tag{1}$$

where the $P_n^m$ are the associated Legendre functions, $\theta \in [0, \pi]$ is elevation and $\phi \in [-\pi, \pi]$ is the azimuth. The $Y_n^m(\cdot, \cdot)$ are orthonormal on the unit 2-sphere and use the Condon-Shortley phase convention.
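Definition (1) can be evaluated numerically with a short routine. The following is a minimal sketch, not the author's implementation: the function names are hypothetical, and it assumes that the associated Legendre function $P_n^{|m|}$ carries no Condon-Shortley factor of its own, since (1) writes the factor $(-1)^m$ explicitly.

```python
import cmath
import math


def assoc_legendre(n, m, x):
    """Associated Legendre function P_n^m(x) for 0 <= m <= n and |x| <= 1.

    Assumption: no Condon-Shortley factor is included here, because
    Eq. (1) carries the (-1)^m factor explicitly.
    """
    # Seed value: P_m^m(x) = (2m - 1)!! * (1 - x^2)^(m/2)
    root = math.sqrt(max(0.0, 1.0 - x * x))
    pmm = 1.0
    for k in range(1, m + 1):
        pmm *= (2 * k - 1) * root
    if n == m:
        return pmm
    # Next degree: P_{m+1}^m(x) = x * (2m + 1) * P_m^m(x)
    prev, cur = pmm, x * (2 * m + 1) * pmm
    # Upward recurrence in the degree d:
    # (d - m) P_d^m = (2d - 1) x P_{d-1}^m - (d + m - 1) P_{d-2}^m
    for d in range(m + 2, n + 1):
        prev, cur = cur, ((2 * d - 1) * x * cur - (d + m - 1) * prev) / (d - m)
    return cur


def sph_harm_eq1(n, m, theta, phi):
    """Complex spherical harmonic Y_n^m(theta, phi) in the form of Eq. (1)."""
    am = abs(m)
    norm = math.sqrt((2 * n + 1) / (4 * math.pi)
                     * math.factorial(n - am) / math.factorial(n + am))
    sign = -1.0 if am % 2 else 1.0  # (-1)^m equals (-1)^|m|
    return sign * norm * assoc_legendre(n, am, math.cos(theta)) * cmath.exp(1j * m * phi)
```

Because the magnitude depends only on $|m|$ and the sign factor is the same for $m$ and $-m$, conjugating `sph_harm_eq1(n, m, theta, phi)` gives `sph_harm_eq1(n, -m, theta, phi)`, whichever convention is chosen for the Legendre function.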
The form of (1) implies that $Y_n^{m*}(\theta, \phi) = Y_n^{-m}(\theta, \phi)$, simplifying derivations.

A. Relating the 3D Sound Field and the 2D Source Field

Our aim in this subsection is to provide the background for deriving an emphasis operator in section II-B. While it is not obvious how to define an emphasis of the sound field directly, it is clear that such an emphasis corresponds to a sharpening of the source field on the 2-sphere defined in section I. We follow an approach used earlier in [12] and [15]. While the approach is illustrated in the frequency domain, the same reasoning holds in the time domain.

We consider an internal sound field expansion of the form

$$p(r, \theta, \phi, k) = \sum_{n=0}^{\infty} \sum_{m=-n}^{n} B_n^m(k)\, j_n(kr)\, Y_n^m(\theta, \phi), \tag{2}$$

where $p(\cdot)$ is pressure, $r$ is radius, $j_n(\cdot)$ is the spherical Bessel function, $B_n^m$ are the ambisonics coefficients and $k = \omega / c$ is the wavenumber ($\omega$ is angular frequency and $c$ is the speed of sound). Let us assume the sound field to be generated by the source field $\mu(\theta, \phi, k)$ on a sphere of radius $r'$:

$$p(r, \theta, \phi, k) = \int_{d\Omega} \mu(\theta', \phi', k)\, G(x, x', k)\, \sin(\theta')\, r'^2\, d\theta'\, d\phi', \tag{3}$$

where $G(x, x', k)$ is a Green's function and $x = (r, \theta, \phi)$. The Green's function $G(x, x', k)$ can be written as

$$G(x, x', k) = \frac{e^{-jk\|x - x'\|}}{4\pi \|x - x'\|} = \sum_{n=0}^{\infty} \sum_{m=-n}^{n} (-j)\, k\, h_n^{(2)}(kr')\, j_n(kr)\, Y_n^{-m}(\theta', \phi')\, Y_n^m(\theta, \phi) \quad \text{for } r' \ge r, \tag{4}$$

where $h_n^{(2)}$ is the spherical Hankel function of the second kind. Let us define the source field at radius $r'$ by a discrete sequence of spherical harmonics coefficients:

$$\mu(\theta', \phi', k) = \sum_{n=0}^{\infty} \sum_{m=-n}^{n} \gamma_n^m(k)\, Y_n^m(\theta', \phi'). \tag{5}$$

Integrating $\mu(\theta', \phi', k)\, G(x, x', k)$ over the 2-sphere of radius $r'$, using orthogonality of the spherical harmonics, we obtain an expression for $p(r, \theta, \phi, k)$ in terms of $\gamma_n^m(k)$ that facilitates mode matching.
This relates the sound field (2) with the source field on the 2-sphere (5):

$$\gamma_n^m(k) = B_n^m(k)\, r'\, j^{-n}\, e^{jkr'}, \quad kr' \to \infty, \tag{6}$$

where we used the asymptotic behavior of $h_n^{(2)}$ [14], [16]: $\lim_{kr \to \infty} h_n^{(2)}(kr) = j^{(n+1)} \frac{e^{-jkr}}{kr}$.

Equation (6) is the main result of this section. It shows that emphasizing the source field on the sphere does not correspond to a direct emphasis of the sound field $p(r, \theta, \phi, k)$.

The source field (5) in the frequency domain is

$$\mu(\theta, \phi, k) = r' e^{jkr'} \sum_{n=0}^{\infty} \sum_{m=-n}^{n} g_n\, B_n^m(k)\, Y_n^m(\theta, \phi), \quad kr' \to \infty, \tag{7}$$

where we defined, for later convenience, $g_n = (-j)^n$. Except for a radius-dependent scaling, the vector $g_n$ provides the mapping from the ambisonics coefficients to the spherical harmonics representation of the source field.

As we also aim to derive time-domain emphasis operators, we note that by applying the inverse Fourier transform $\frac{1}{2\pi} \int \cdot\; e^{j\omega t}\, d\omega$, (7) can also be written in the time domain.

B. Emphasizing the Angular Dependency of a Signal

Our objective in this section is to enable us to emphasize a particular direction with low computational complexity. That is, we aim to emphasize ("sharpen") the angular dependency of the source field $\mu$ associated with the sound field. To reduce computational requirements we aim to find expressions for the ambisonics coefficients $B_n^m(\cdot)$ of the sharpened sound field without explicitly calculating the source field $\mu$.

Our emphasis operator uses two time scales. The first time scale resolves the temporal behavior of the monaural source signals and is characterized by their bandwidth. The second time scale captures the rate of change of the parameters of the emphasis operator. For a time-invariant emphasis in a particular direction, these parameters do not change in time. More commonly, the second time scale resolves the changes in the spatial arrangement and loudness of the sound sources.
Frequency domain implementations in practice use time-frequency transforms. Hence both frequency and time domain implementations can accommodate time-dependencies on the second time scale.

We first define the emphasis operator for the source field $\mu(\theta, \phi, k)$. We start with a suitable function $v(\theta, \phi, k) : [0, \pi] \times [0, 2\pi] \to [0, \infty)$ that is real and, ideally, nonnegative and can be used to emphasize the source field $\mu(\cdot, \cdot, k)$ over the 2-sphere. The emphasis operation is then

$$\tilde{\mu}(\theta, \phi, k) = v(\theta, \phi, k)\, \mu(\theta, \phi, k). \tag{8}$$

For the pressure $p$, the emphasis operation is not a multiplication. We define $\nu$ as a general emphasis operator that also applies to pressure and is the multiplication with $v$ of (8) in the source-field domain.

We simplify our notation by introducing a single index for the spherical harmonics and omitting function arguments where that is not ambiguous. Let $\tilde{Q}$ be the degree of the ambisonics expansion for $\mu$. We define $Q = (\tilde{Q} + 1)^2$. Assuming that the source field $\mu$ is of finite degree we have

$$\mu = \sum_{q=0}^{Q-1} \gamma_q Y_q. \tag{9}$$

We choose $v$ to be of degree $\tilde{L}$ and define $L = (\tilde{L} + 1)^2$:

$$v = \sum_{l=0}^{L-1} V_l Y_l. \tag{10}$$

Note that the finite degree $\tilde{L}$ prevents strict nonnegativity.

Exploiting that the spherical harmonics form a basis for functions on the 2-sphere, we can write each product of a pair of spherical harmonics as a weighted sum of spherical harmonics. Let $Y^{(Q)}(\theta, \phi)$ be the $Q$-dimensional column vector $[Y_0(\theta, \phi), Y_1(\theta, \phi), \cdots, Y_{Q-1}(\theta, \phi)]^T$. Let us denote the Kronecker product with $\otimes$. We furthermore use that the product of two spherical harmonics of degree $\tilde{L}$ and $\tilde{Q}$ can be written as a weighted sum of spherical harmonics with degree less than or equal to $\tilde{Q} + \tilde{L}$. Thus, we can write

$$Y^{(Q)} \otimes Y^{(L)} = C\, Y^{(P)}, \tag{11}$$

where $C \in \mathbb{R}^{QL \times P}$ with $\tilde{P} = \tilde{Q} + \tilde{L}$ and $P = (\tilde{P} + 1)^2$ is a real, non-square matrix with Clebsch-Gordan coefficients as elements.
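As a worked instance of (11), consider a degree-1 signal and a degree-2 emphasis function. The sizes in the first line follow directly from the definitions above. The second line additionally assumes that the single index starts at the degree-0 harmonic, so that $Y_0 = Y_0^0 = 1/\sqrt{4\pi}$ by (1), and that the Kronecker product is ordered such that entry $qL + l$ equals $Y_q Y_l$; under those assumptions the first $L$ rows of $C$ (those with $q = 0$) each hold a single nonzero element.

```latex
\begin{align*}
\tilde{Q} = 1,\ \tilde{L} = 2 \;&\Rightarrow\; Q = 4,\ L = 9,\ \tilde{P} = 3,\ P = 16,\ C \in \mathbb{R}^{36 \times 16},\\
Y_0\, Y_l = \tfrac{1}{\sqrt{4\pi}}\, Y_l \;&\Rightarrow\; C_{l,i} = \tfrac{1}{\sqrt{4\pi}}\, \delta_{l,i}, \qquad 0 \le l < L,\ 0 \le i < P.
\end{align*}
```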
The matrix $C$ depends only on the degree of the ambisonics representation and on the degree of the emphasis operator $v$. Hence it can generally be computed off-line. The standard formula for the multiplication of spherical harmonics shows that the matrix $C$ is sparse and this can be exploited. However, as will be shown below, for a static or slowly varying emphasis operator, optimal computational efficiency can be obtained without consideration of the sparsity of $C$.

We can use standard formulas for the Clebsch-Gordan coefficients to compute $C$. However, given that relation (11) exists, we can also use it to compute the matrix of Clebsch-Gordan coefficients ($C$) by creating a set of linear equations corresponding to a set of random (or selected) angles.

The emphasized source field $\tilde{\mu}$ can be written in terms of the spherical harmonics expansions for $\mu$ and $v$:

$$\tilde{\mu} = \nu \mu = v \mu = \sum_{q=0}^{Q-1} \sum_{l=0}^{L-1} \gamma_q V_l Y_q Y_l \tag{12}$$

$$= \sum_{i=0}^{P-1} Y_i \left( C^T \left( \gamma^{(Q)} \otimes V^{(L)} \right) \right)_i, \tag{13}$$

where we used (11). We now have the ambisonics expansion of $\tilde{\mu}$ in terms of the ambisonics expansions for $\mu$ and $v$.

Let $\circ$ be the Hadamard (element-wise) product and $g^{(P)} = [g_{n(0)}, \cdots, g_{n(P-1)}]^T$, where, with some abuse of notation, $n(i) = \lfloor \sqrt{i} \rfloor$ is the degree $n$ in $Y_n^k = Y_i$. It then follows from (6) that (13) implies that the emphasis operator for the ambisonics coefficients of a pressure field satisfies

$$g^{(P)} \circ \tilde{B}^{(P)} = C^T \left( \left( g^{(Q)} \circ B^{(Q)} \right) \otimes V^{(L)} \right), \tag{14}$$

which specifies the degree-expanded ambisonics representation of the sound field (2) after emphasis.

Next, we discuss the efficient computation of (14). We will show that if the emphasis operator $V$ is time-invariant, then (14) can be computed with one $P \times Q$ matrix multiply per sample, requiring $PQ$ multiplies to compute all output channels.
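The single-index bookkeeping behind $g^{(P)}$ can be sketched as follows. The degree map $n(i) = \lfloor \sqrt{i} \rfloor$ is taken from the text; the order map $m = i - n^2 - n$ (orders running from $-n$ to $n$ inside each degree) and all function names are assumptions made for illustration only.

```python
import math


def num_channels(degree):
    """Number of spherical harmonics up to `degree`: (degree + 1)^2."""
    return (degree + 1) ** 2


def degree_of(i):
    """Degree n(i) = floor(sqrt(i)) of the single-index harmonic Y_i."""
    return math.isqrt(i)


def order_of(i):
    """Order m of Y_i, assuming m runs from -n to n within each degree."""
    n = degree_of(i)
    return i - n * n - n


def g_vector(size):
    """Vector [g_{n(0)}, ..., g_{n(size-1)}] with g_n = (-j)^n."""
    return [(-1j) ** degree_of(i) for i in range(size)]
```

For example, `g_vector(num_channels(1))` has one entry for degree 0 followed by three identical entries for degree 1, which is why the scaling by $g$ can be written as a Hadamard product with the coefficient vector.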
Thus, for a degree-1 ambisonics representation, $\tilde{Q} = 1$, and a degree-2 emphasis operator, $\tilde{L} = 2$, only four multiplies per output channel are required.

One approach to obtaining high computational efficiency for computing (14) is to exploit that $C$ and $V$ are fixed or slowly varying. Let $\mathbf{1}^{(Q)} = [1, \cdots, 1]^T$ be a $Q$-dimensional vector of ones. Some algebra leads to

$$g^{(P)} \circ \tilde{B}^{(P)} = \bar{C}^T \left( \left( g^{(Q)} \circ B^{(Q)} \right) \otimes \mathbf{1}^{(L)} \right), \tag{15}$$

where we wrote $\bar{C}^T = C^T \circ \left( \mathbf{1}^{(P)} \left( \mathbf{1}^{(Q)} \otimes V^{(L)} \right)^T \right)$, so that the column of $C^T$ with index $qL + l$ is scaled by $V_l$, in agreement with the ordering of the Kronecker product in (14). The matrix $\bar{C}^T$ retains the dimensionality of $C^T$. Finally we note that we can define a matrix $A^{(QL)} = I^{(Q)} \otimes \mathbf{1}^{(L)}$, where $I^{(Q)}$ is the identity matrix with $Q$ rows and columns, such that $B^{(Q)} \otimes \mathbf{1}^{(L)} = A^{(QL)} B^{(Q)}$. Thus, we can write

$$\tilde{B}^{(P)} = \left( \operatorname{diag}^{-1}(g^{(P)})\, \bar{C}^T A^{(QL)} \operatorname{diag}(g^{(Q)}) \right) B^{(Q)}. \tag{16}$$

As $\left( \operatorname{diag}^{-1}(g^{(P)})\, \bar{C}^T A^{(QL)} \operatorname{diag}(g^{(Q)}) \right) \in \mathbb{C}^{P \times Q}$ is a time-invariant matrix for a fixed emphasis operator and slowly time-varying for an adaptive emphasis operator, it can be computed off-line or at a slow update rate. Thus, we have proven that $PQ$ multiplies for each sample suffice for the emphasis operator.

The usage of (14) without further modification is also relevant, as it facilitates rapid adaptation of the emphasis operator $\nu$. In this second approach, the emphasis operator is composed of two components: i) a Kronecker product operation, which is an unrolled outer product of a $Q \times 1$ signal vector and an $L \times 1$ emphasis vector, followed by ii) a $P \times QL$ matrix multiply. While the size of the matrix $C^T$ is larger than that of $\bar{C}^T A^{(QL)}$ in (16), it is a sparse matrix. From the explicit formula for the Clebsch-Gordan series for the product of two spherical harmonics it follows that the number of multiplies is also $PQ$ in this case.
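The equivalence of the two approaches can be checked numerically. The sketch below is illustrative only: it fills $C$ with random placeholder numbers rather than actual Clebsch-Gordan coefficients (the algebraic identity between the direct form (14) and the folded single-matrix form holds for any $C$), it assumes the Kronecker ordering in which entry $qL + l$ of $a \otimes b$ is $a_q b_l$, and all function names are hypothetical.

```python
import math
import random


def kron(a, b):
    """Kronecker product of two vectors: entry q*len(b) + l is a[q] * b[l]."""
    return [x * y for x in a for y in b]


def g_vector(size):
    """g_{n(i)} = (-j)^n(i) with n(i) = floor(sqrt(i))."""
    return [(-1j) ** math.isqrt(i) for i in range(size)]


def emphasis_direct(C, B, V, P):
    """Direct form: Kronecker product, multiply by C^T, undo the g scaling."""
    g_P, g_Q = g_vector(P), g_vector(len(B))
    u = kron([g * b for g, b in zip(g_Q, B)], V)
    return [sum(C[j][i] * u[j] for j in range(len(u))) / g_P[i] for i in range(P)]


def emphasis_matrix(C, V, Q, P):
    """Fold V and the g factors into a single P x Q matrix."""
    L = len(V)
    g_P, g_Q = g_vector(P), g_vector(Q)
    # Row q*L + l of C belongs to the product Y_q * Y_l, so it is weighted
    # by V[l]; summing over l plays the role of A = I (x) 1.
    return [[g_Q[q] * sum(C[q * L + l][i] * V[l] for l in range(L)) / g_P[i]
             for q in range(Q)] for i in range(P)]


# Demo with placeholder numbers (NOT real Clebsch-Gordan coefficients).
random.seed(1)
Q, L, P = 4, 9, 16  # degrees 1, 2 and 3
C = [[random.uniform(-1, 1) for _ in range(P)] for _ in range(Q * L)]
V = [random.uniform(-1, 1) for _ in range(L)]
B = [complex(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(Q)]

M = emphasis_matrix(C, V, Q, P)  # computed off-line or at a slow rate
fast = [sum(m * b for m, b in zip(row, B)) for row in M]  # per-sample work
slow = emphasis_direct(C, B, V, P)
max_err = max(abs(x - y) for x, y in zip(fast, slow))
multiplies_per_sample = P * Q  # one P x Q matrix-vector product
```

In this sketch the per-sample work is the single matrix-vector product that produces `fast`; the matrix `M` only needs to be rebuilt when `V` changes.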
While the formulation (14) is less conveniently structured, the fact that it has no computational overhead may make it more attractive for scenarios that require rapid updates.

For both approaches discussed, the emphasis operation can be performed in the time domain or in the time-frequency domain. The domains result in different outcomes. The methods apply to real and complex spherical harmonics expansions.

C. An Adaptive Emphasis Operator

The emphasis operator can be used to place an acoustic spotlight on a particular direction, in the ambisonics domain. For a time-invariant and source-independent emphasis the tools defined in section II-B suffice. However, a natural application of the emphasis operator is to emphasize an existing source-field power distribution over directions. This section discusses how to find such an adaptive emphasis operator $\nu$. In most applications, the required adaptation rate is low, making both emphasis approaches of section II-B relevant.

Considering the pressure $p$ as a stochastic process, a design for $v(\theta, \phi, k)$ with the desired emphasis result is:

$$v(\theta, \phi, k) = \beta\, \mathrm{E}[\,|\mu(\theta, \phi, k)|^{\alpha}\,], \tag{17}$$

where $\mathrm{E}$ is ensemble expectation, $\beta$ is a normalization and $\alpha > 0$ is a real constant. A time-domain representation can also be used. As the time-domain representation averages over frequencies, the results are not the same.

Even integer values for $\alpha$ result in tractable expressions for (17). We illustrate the case $\alpha = 2$. Emphasis strengths can be varied by repeating the procedure and by using a lower degree ambisonics representation as basis.

To evaluate (17) for $\alpha = 2$, we first rewrite the complex conjugate of the source field as

$$\mu^{*}(\theta, \phi, k) = r' e^{-jkr'} \sum_{n=0}^{\infty} \sum_{m=-n}^{n} j^{n}\, \breve{B}_n^m(k)\, Y_n^m(\theta, \phi), \quad kr' \to \infty, \tag{18}$$

where we used $Y_n^{m*} = Y_n^{-m}$ and defined $\breve{B}_n^m(k) = B_n^{-m*}(k)$.
Next, we again simplify the notation and write (7) using a single index and without the function arguments:

$$\mu(\theta, \phi, k) = r' e^{jkr'} \sum_{q=0}^{\infty} g_{n(q)}\, B_q\, Y_q, \quad kr' \to \infty, \tag{19}$$

where, with some abuse of notation, we write $n$ as a function of $q$. Based on (18) and (19) we can rewrite (17) as

$$v(\theta, \phi, k) = \beta\, r'^2 \sum_{l=0}^{\infty} \sum_{q=0}^{\infty} g_{n(l)}\, g^{*}_{n(q)}\, \mathrm{E}[B_l \breve{B}_q]\, Y_l Y_q. \tag{20}$$

The same form is also obtained for the time-domain case.

We rewrite (20) as an expansion in spherical harmonics rather than products of spherical harmonics. We assume that the original source field $\mu$ is a degree-$\tilde{Q}$ source field and use $Q = (\tilde{Q} + 1)^2$. Selecting the normalization $\beta = \beta' / r'^2$:

$$\frac{v(\theta, \phi, k)}{\beta'} = \frac{1}{r'^2}\, \mathrm{E}[\,|\mu(\theta, \phi, k)|^2\,] = \sum_{i=0}^{P-1} Y_i \left( C^T\, \mathrm{E}\!\left[ \left( g^{(Q)} \circ B^{(Q)} \right) \otimes \left( g^{(Q)*} \circ \breve{B}^{(Q)} \right) \right] \right)_i, \tag{21}$$

where $P = (2\tilde{Q} + 1)^2$, and $C$ is of the form of (11). Equation (21) provides the adaptive emphasis operator in the desired form of an expansion in the spherical harmonics $Y_i$.

In a practical application the expectation must be approximated. It is natural to assume ergodicity for the signals and approximate the expectation operator with an averaging over time in each bin of a time-frequency representation and simply over time in a time-domain representation.

The expectation contributes most to the computational effort for (21). However, the averaging can be undersampled to satisfy any computational complexity constraint. The remaining computations in (21) are done at the update rate of the emphasis operator, which typically is low. Hence these remaining computations normally do not play a significant role in the computational complexity of finding $v$. The sparsity of $C$ can be exploited to minimize computational effort.

In general, the emphasis operator changes the sound field also in the sweet zone where the sound field computed from the unemphasized representation is accurate.
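One way to approximate the expectation in (21) is a recursive time average that is updated only on a subset of the samples. The sketch below is a possible arrangement under stated assumptions, not the procedure used by the author: the exponential forgetting factor `forget`, the undersampling parameter `stride`, the class and function names, and the index convention $i = n^2 + n + m$ are all hypothetical choices made for illustration.

```python
import math


def breve(B):
    """Vector with entries conj(B_n^{-m}); assumes single index i = n*n + n + m."""
    out = []
    for i in range(len(B)):
        n = math.isqrt(i)
        m = i - n * n - n
        out.append(B[n * n + n - m].conjugate())
    return out


class OuterProductAverager:
    """Recursive time average standing in for the expectation in Eq. (21)."""

    def __init__(self, size, forget=0.99, stride=1):
        self.mean = [0j] * (size * size)
        self.forget = forget  # hypothetical smoothing constant in [0, 1)
        self.stride = stride  # use every `stride`-th sample (undersampling)
        self.seen = 0

    def update(self, B):
        """Feed one coefficient vector B (one sample or one time-frequency frame)."""
        self.seen += 1
        if (self.seen - 1) % self.stride:
            return  # skipped sample: the average is undersampled
        g = [(-1j) ** math.isqrt(i) for i in range(len(B))]
        a = [gi * bi for gi, bi in zip(g, B)]                      # g o B
        b = [gi.conjugate() * bi for gi, bi in zip(g, breve(B))]   # g* o B_breve
        w = 1.0 - self.forget
        k = 0
        for x in a:  # unrolled outer product, same ordering as the Kronecker product
            for y in b:
                self.mean[k] = self.forget * self.mean[k] + w * x * y
                k += 1


def emphasis_coefficients(C, mean):
    """Coefficients of v / beta': entry i is sum_j C[j][i] * mean[j]."""
    return [sum(C[j][i] * mean[j] for j in range(len(mean)))
            for i in range(len(C[0]))]
```

Only `update` runs at (a fraction of) the sample rate; `emphasis_coefficients` would be called at the slower update rate of the emphasis operator, with a matrix `C` of size $Q^2 \times P$ obtained separately.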
In the adaptive case, emphasis in this region is usually undesirable. The problem can be removed by using a projection onto the nearest solution for which the sweet zone is unchanged [11]. Because of the orthogonality of the spherical harmonics, the projection can be implemented by overwriting the low-degree ambisonics coefficients with the corresponding original coefficients and requires no additional computational effort.

III. RESULTS

The aim of this letter is to show that static and adaptive directional emphasis can be implemented at negligible computational complexity in the ambisonics domain. For perceptual experiments that show the benefit of directional emphasis we refer to other work: [8], [9] and in particular [11], which implements adaptive emphasis in the source-field domain.

Fig. 1. Mean absolute source field on the 2-sphere for degree-2 signals (top), a degree-4 static emphasis operator (middle left) and an adaptive degree-8 emphasis operator (middle right), and the resulting emphasized signals (bottom). The color bars in the bottom figures show color linearly proportional to distance along the vertical axis, increasing upward.

In this section, we illustrate the operation of the static and adaptive emphasis operator. All computations were performed in the spherical harmonic domain with the methods of section II using complex spherical harmonics. For illustration only, the results were converted to the shown densities on the 2-sphere.

Fig. 1 shows mean source fields and the enhancement operator $v$ (the acoustic emphasis operator) on the 2-sphere for simulated sound fields. On the left is a degree-2 signal enhanced by a static degree-4 ambisonics acoustic emphasis operator. Only the signal highlighted by the emphasis operator is clearly audible. On the right we show the behavior of an adaptive acoustic emphasis operator, for a degree-2 signal enhanced by a degree-8 ambisonics adaptive emphasis operator ($\alpha = 4$).
As expected, negative values for $v$ in the source domain were small and away from the high-intensity areas. Their significance reduces further with increasing emphasis operator degree, and increasing emphasis operator smoothness.

IV. CONCLUSION

Practical implementations of ambisonics truncate its series representation of the sound field because of constraints on estimation and bit rate. For standard rendering, the consequence of the truncation is that the timbre and directionality of the acoustic scenario, as perceived by the listener, are distorted. A strengthening of the directionality of the ambisonics representation can address these problems [8], [9], [11].

We have shown that it is possible to define an emphasis operator that strengthens the directionality of the sound field at negligible computational cost by using Clebsch-Gordan coefficients. In contrast to existing idempotent methods [8], [9], [11], the procedure is attractive for real-time implementation and is particularly suitable for rendering over headsets. Moreover, it facilitates a static emphasis.

The new method can be applied to time domain or time-frequency domain ambisonics representations. It can be used for representations based on real and complex spherical harmonics (only the latter was illustrated).

REFERENCES

[1] M. Gerzon, "Periphony: With-height sound reproduction," J. Audio Eng. Soc., vol. 21, p. 210, Jan./Feb. 1973.
[2] M. A. Gerzon, "Ambisonics in multichannel broadcasting and video," J. Audio Eng. Soc., vol. 33, pp. 859–871, Nov. 1985.
[3] J. Daniel, J.-B. Rault, and J.-D. Polack, "Ambisonics encoding of other audio formats for multiple listening conditions," in Audio Engineering Society Convention 105. Audio Engineering Society, 1998.
[4] J. Daniel, S. Moreau, and R. Nicol, "Further investigations of high-order ambisonics and wavefield synthesis for holophonic sound imaging," in Audio Engineering Society Convention 114. Audio Engineering Society, 2003.
[5] M. Frank, F. Zotter, and A. Sontacchi, "Producing 3D audio in ambisonics," in Audio Engineering Society Conference: 57th International Conference: The Future of Audio Entertainment Technology Cinema, Television and the Internet, Mar. 2015. [Online]. Available: http://www.aes.org/e-lib/browse.cfm?elib=17605
[6] F. Zotter and M. Frank, "All-round ambisonic panning and decoding," Journal of the Audio Engineering Society, vol. 60, no. 10, pp. 807–820, 2012.
[7] S. Berge and N. Barrett, "High angular resolution planewave expansion," in Proc. of the 2nd International Symposium on Ambisonics and Spherical Acoustics, May 2010, pp. 6–7.
[8] A. Wabnitz, N. Epain, A. McEwan, and C. Jin, "Upscaling ambisonic sound scenes using compressed sensing techniques," in 2011 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), Oct. 2011, pp. 1–4.
[9] A. Wabnitz, N. Epain, and C. T. Jin, "A frequency-domain algorithm to upscale ambisonic sound scenes," in 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Mar. 2012, pp. 385–388.
[10] A. J. Heller, E. Benjamin, and R. Lee, "A toolkit for the design of ambisonic decoders," in Linux Audio Conference, 2012, pp. 1–12.
[11] W. B. Kleijn, A. Allen, J. Skoglund, and F. Lim, "Incoherent idempotent ambisonics rendering," in Applications of Signal Processing to Audio and Acoustics (WASPAA), 2017 IEEE Workshop on. IEEE, 2017, pp. 209–213.
[12] M. A. Poletti, "Three-dimensional surround sound systems based on spherical harmonics," Journal of the Audio Engineering Society, vol. 53, no. 11, pp. 1004–1025, 2005.
[13] V. Pulkki, "Spatial sound reproduction with directional audio coding," Journal of the Audio Engineering Society, vol. 55, no. 6, pp. 503–516, 2007.
[14] J. Ahrens, Analytic Methods of Sound Field Synthesis. Heidelberg: Springer, 2012.
[15] Y. J. Wu and T. D. Abhayapala, "Theory and design of soundfield reproduction using continuous loudspeaker concept," IEEE Transactions on Audio, Speech, and Language Processing, vol. 17, no. 1, pp. 107–116, 2009.
[16] E. G. Williams, Fourier Acoustics: Sound Radiation and Nearfield Acoustical Holography. Academic Press, 1999.
