Improved multipath time delay estimation using cepstrum subtraction

IMPR O VED MUL TIP A TH TIME DELA Y ESTIMA TION USING CEPSTR UM SUBTRA CTION Eric L. F er guson ∗ , Stefan B. W illiams Australian Centre for Field Robotics The Uni versity of Sydne y , Australia Craig T . Jin Computing and Audio Research Laboratory The Uni versity of Sydne y , Australia ABSTRA CT When a motor-po wered vessel travels past a ﬁxed hydrophone in a multipath en vironment, a Llo yd’ s mirror constructi ve/destructi ve in- terference pattern is observed in the output spectrogram. The power cepstrum detects the periodic struct ure of the Llo yd’ s mirror pattern by generating a sequence of pulses (rahmonics) located at the funda- mental quefrency (periodic time) and its multiples. This sequence is referred to here as the ‘rahmonic component’ of the po wer cepstrum. The fundamental quefrency , which is the reciprocal of the frequency difference between adjacent interference fringes, equates to the mul- tipath time delay . The other component of the power cepstrum is the non-rahmonic (extraneous) component, which combines with the rahmonic component to form the (total) po wer cepstrum. A data processing technique, termed ‘cepstrum subtraction’, is described. This technique suppresses the extraneous component of the po wer cepstrum, leaving the rahmonic component that contains the desired multipath time delay information. This technique is applied to real acoustic recordings of motor-vessel transits in a shallow water en vi- ronment, where the broadband noise radiated by the vessel arri ves at the hydrophone via a direct ray path and a time-delayed multipath. The results show that cepstrum subtraction improv es multipath time delay estimation by a factor of two for the at-sea e xperiment. Index T erms — time delay estimation, underwater acoustics, cepstrum, source localization, autocorrelation 1. INTR ODUCTION In underwater acoustics, the Lloyd’ s mirror ef fect [1, 2] is observed as an interference pattern in the spectrogram (variation with time of the spectrum) of a receiving hydrophone. It occurs when the sinusoidal signal components that compose a source of broadband radiated noise arri ve at the recei ver via the direct propagation path and combine with amplitude-scaled time-delayed replicas arriving via indirect propagation paths (multipaths). If the two signal arriv als are in phase , they reinforce each other through coherent addition re- sulting in constructive interference; when the y are in antiphase , they cancel and destructive interference occurs. As a result, the receiv ed intensities of a source’ s spectral lines are observ ed to undergo a pe- riodic, or c yclic, variation with range [3]. The resultant intensity is giv en by A 2 r = A 2 d + A 2 i + 2 A d A i cos( κδ l ) , (1) where A d and A i are the amplitudes of the respecti ve direct path and indirect path arri vals, κ = 2 π /λ ( λ denotes w av elength) is the spa- tial frequenc y (or w avenumber) and δ l is the path length dif ference ∗ The author gratefully acknowledges the contributions provided by the IEEE Oceanic Society Scholarships, the University of Sydney Electrical and Information Engineering PhD Scholarship, and the Defence Science and T echnology Group Postgraduate Research Scholarship. between the tw o paths; the last term in (1) is oscillatory due to inter - ference between the direct path and multipath arriv als, which gi ves rise to a Lloyd’ s mirror interference pattern [4]. Destructive interfer- ence fringes occur when κδl = (2 k − 1) π , k = 1 , 2 , 3 , ..., where k is the fringe order (number). For adjacent destructive interference fringes ( κ k +1 − κ k ) δ l = 2 π . (2) Hence from (2), at any giv en instant of time, the destructi ve interfer- ence fringes ha ve a uniform spacing in the direction of the frequency axis which is giv en by ( f k +1 − f k ) = c δ l = 1 τ i,d , (3) where c is the isospeed of sound trav el in the underwater medium and τ i,d is the time dif ference in the arriv al times of the signal prop- agating via the respectiv e indirect and direct paths. Note that the temporal frequency difference f k +1 − f k equates to the reciprocal of the multipath time delay . Fundamentally , the cepstrum is the spectrum of a logarithmic spectrum and it is used in practice to detect periodicity in a fr equency spectrum . The quefr ency is a measure of the periodicity of the spec- tral ripple and it has the units of (periodic) time. In speech analysis, periodicity in the voice spectrum is observed as a regular spacing in frequency of the fundamental voice frequency (pitch) and its har- monics [5, 6, 7]. If the spectrum of the logarithm of the speech power spectrum is computed, a peak will appear corresponding to the pitch period of the voiced-speech se gment being analysed [6]. For the present case of a transit of a broadband source past an acoustic sensor in an underwater acoustic multipath propagation en- vironment, periodic structure is observed in the output spectrogram as an ordered arrangement of Lloyd’ s mirror interference fringes which form a regular periodic pattern. The quefrency can be con- sidered as the periodic time associated with a series of interference fringes that are uniformly spaced in frequency . The po wer cepstrum is generated by taking the in verse F ourier transform of the logarithm of the power spectrum [8]. The ﬁrst rahmonic is the fundamental period, which is equal to the reciprocal of the frequency interv al be- tween any two adjacent destructive interference fringes (or , equally , between any two adjacent constructi ve interference fringes). Hence, cepstrum analysis is suited to detecting and quantifying periodicity in a spectrogram. It is also suited to multipath time delay estima- tion [9, 10]. The original contribution of this research is a novel data pro- cessing technique that suppresses cepstrum components extraneous to the rahmonics, thereby improving the estimation of the multipath time delay for broadband signals arri ving at a recei ving hydrophone. This improvement in time delay estimation is demonstrated using real data collected during an at-sea experiment. 2. UNDER W A TER A COUSTIC SENSOR OUTPUT MODEL 2.1. Dir ect path signal combined with indirect path signal For the present case, the signal of interest is continuous broadband acoustic noise radiated by a small motor-po wered boat underway . The output of the hydrophone, which is located above the seaﬂoor , is modelled as a combination of: (a) an underwater acoustic signal propagating directly from the source to the sensor s ( t ) , where t de- notes time, and (b) an amplitude-scaled time-delayed replica propa- gating via an indirect path that includes a seaﬂoor boundary reﬂec- tion αs ( t − τ β ) , where α is the attenuation (or boundary reﬂection) coefﬁcient ( 0 < α ≤ 1 ), and τ β is the time delay . The combination of the signal with a single echo can be represented as [7] x ( t ) = s ( t ) + αs ( t − τ β ) . (4) The time delay between the indirect propagation path and the direct propagation path results in a phase difference between the di- rect path and multipath signals equivalent to 2 π f τ β , where f is the signal frequenc y . Assuming that the transmission loss is the same for each propagation path, the instantaneous po wer of the resultant signal is giv en by [7]   X ( f )   2 =   S ( f )   2  1 + α 2 + 2 α cos(2 π f τ β )  , (5) where the power spectrum of the direct path signal   S ( f ) 2   is mod- ulated by a periodic function of the frequency . T aking the logarithm of the output po wer spectrum con verts the pr oduct in (5) to a sum of two components, i.e. log   X ( f )   2 = log   S ( f )   2 + log  1 + α 2 + 2 α cos(2 π f τ β )  , (6) where the additiv e periodic component has a period determined by the multipath delay τ β . 2.2. P ower cepstrum For the periodic structure of a Lloyd’ s mirror interference pattern, the quefrency (or periodic time) of the ﬁrst harmonic is the recipro- cal of the uniform frequency difference between adjacent destructi ve interference fringe minima. F or the present case of Lloyd’ s mirror interference pattern formation resulting from the combination of the direct and indirect propagation path signals at the sensor , the que- frency of the ﬁrst rahmonic is the multipath time delay . In other words, cepstrum analysis of the hydrophone’ s noisy signal output provides an estimate of the multipath time delay τ β , which is the difference in the arriv al times of the direct path signal and the in- direct path signal at the hydrophone. The deﬁnition of the power cepstrum C X X ( τ ) can be expressed mathematically as [8] C X X ( τ ) = F − 1  log   X ( f )   2  , (7) where τ is the quefrenc y (or delay time), F − 1 denotes the in v erse Fourier transform and   X ( f )   2 is equivalent the two-sided power spectrum of a receiv ed signal x ( t ) . In the present case, the power cepstrum is giv en by the in verse Fourier transform F − 1 of (6), which can be written as C ( τ ) = F − 1  log   S ( f )   2  + ∞ X n =1 a n δ ( τ − nτ β ) . (8) where a n is the strength of the n th rahmonic and it is giv en by a n = 2 n ( − 1) n +1 α n for n = 1 , 2 , 3 , ... (9) The rahmonic strengths are observed to alternate in sign and de- crease as the reciprocal of the rahmonic number n in the same way as the Mercator series. Hence, the power cepstrum of the recei ved signal consists of the power cepstrum of the direct path signal S ( f ) and a sequence of impulse functions (or rahmonics) of strength a n located at quefrency τ β and its multiples. Figure 1(a) shows the com- puted power cepstrum C ( τ ) when a broadband direct-path signal is combined with a time-delayed replica ( α = 1 , τ β = 224 µ s). Fig- ure 1(b) and Figure 1(c) sho w the variation with quefrency of the re- spectiv e po wer cepstrum components; the non-rahmonic component which is given by C 1 ( τ ) = F − 1  log   S ( f )   2  and the rahmonic impulse component which is given by C 2 ( τ ) = P ∞ n =1 a n δ ( τ − nτ β ) . Figure 1(d) sho ws the autocorrelation as a function of positiv e lag with the maximum v alue of the echo’ s sinc function correspond- ing to a lag (i.e. the echo time delay) of 224 µ s. In practice, additiv e noise biases the autocorrelation function and may obscure the peaks corresponding to propagation time delay difference [11]. Also, when there is more that one indirect propagation path, the identiﬁcation of individual propagation paths using autocorrelation analysis becomes more difﬁcult, and so cepstrum analysis is preferred because each propagation path has a series of more readily identiﬁable cepstrum peaks located at the rahmonic quefrencies that characterize a speciﬁc propagation path [11]. 2.3. Cepstrum subtraction A novel data processing technique is presented that suppresses cep- strum components extraneous to the rahmonics. From (8), the po wer cepstrum of the received signal C ( τ ) consists of the non-rahmonic components C 1 ( τ ) deriv ed from stationary process S ( f ) (which re- mains stationary after in verse Fourier transformation [12]), and a se- quence of rahmonic impulse functions C 2 ( τ ) for another (indepen- dent) stationary process [13]. Note that C 1 ( τ ) is independent of the time delay variable τ β , whereas C 2 ( τ ) is dependent with impulses located at quefrency τ β and its multiples. The ensemble average of the power cepstrum ov er M realiza- tions may be giv en by 1 M M X m =1 C m ( k ) = 1 M M X m =1 C 1 ,m ( k ) + 1 M M X m =1 C 2 ,m ( k ) , (10) where C 1 ,m ( k ) and C 2 ,m ( k ) are the m th realizations at discrete que- frency k for the non-rahmonic and rahmonic components, respec- tiv ely . For τ β 6 = 0 and assuming τ β is independent of realization m , then the expected v alue of C 2 ( k ) at any quefrenc y is zero, i.e. lim M →∞ 1 M M X m =1 C 2 ,m ( k ) = 0 . (11) Therefore, the non-rahmonic component C 1 ( k ) can be estimated by taking the average of the received power cepstrum ¯ C 1 ( k ) over a suf- ﬁciently long time (i.e. for a sufﬁciently lar ge M ), where ¯ C 1 ( k ) = 1 M M X l =1 C m ( k ) . (12) Note that the long time-av eraged rahmonic contrib ution at quefrenc y k is negligible for a suf ﬁciently lar ge M . This approach is consistent with other ensemble av eraging methods [14]. The rahmonic component for realization m can be estimated by subtracting the estimate of the non-rahmonic component from the 1 0 1 2 ( a ) 1 0 1 2 ( b ) 0 200 400 6 0 0 800 1000 1 0 1 2 ( c ) 0 200 400 600 800 1000 0 . 5 0 . 0 0 . 5 1 . 0 ( d ) Autocor r elation R xx ( τ ) Magnitu de C 2 ( τ ) Magnitu de C 1 ( τ ) Magnitu de C( τ ) 1200 1200 Fig. 1 . (a) V ariation with quefrency of the power cepstrum C ( τ ) of a direct-path signal combined with an indirect-path echo α = 1 , τ β = 224 µ s. (b) V ariation with quefrency of the power cepstrum component C 1 ( τ ) . (c) Similar to (b), but for the rahmonic impulse component C 2 ( τ ) . (d) V ariation with lag of the autocorrelation func- tion for the direct path signal combined with the multipath echo. receiv ed power cepstrum. The cepstrum subtraction process is de- scribed mathematically by , ˆ C 2 ,m ( k ) = C m ( k ) − a ( k ) ¯ C 1 ( k ) , (13) where ˆ C 2 ,m ( k ) is an estimate of the rahmonic component for the m th frame at discrete quefrency k , C m ( k ) is the receiv ed po wer cep- strum for the m th frame at quefrency k , and ¯ C 1 ( k ) is the long time- av eraged receiv ed cepstrum gi ven by (12). The parameter a ( k ) con- trols the amount of the non-rahmonic component that is subtracted from the power cepstrum. For full non-rahmonic component sub- traction a ( k ) = 1 and for over -subtraction a ( k ) > 1 , analogous to spectral subtraction [15]. The cepstrum subtraction process is vi- sualized in Figure 1, where the non-rahmonic contribution in (b) is subtracted from the received power cepstrum (a), leaving only the rahmonic component in (c). Note that the cepstrum subtraction method has the same formu- lation as cepstrum mean normalization, which is used in speech pro- cessing, but with the addition of the factor a ( k ) [5, 16, 17]. Rather than removing channel-effects (distortions) for Mel-frequency cep- strum coef ﬁcients, instead cepstrum components extraneous to the rahmonics are suppressed. Also the two methods are applied to the outputs of different signal models. Figure 2 shows various po wer cepstrograms for a simulated motor-boat transit using synthetic broadband radiated noise. The 0 50 100 150 200 250 300 Q u e f r e n c y ( s ) (a) 0 50 100 150 200 250 Time (s) 0 50 100 150 200 250 300 Q u e f r e n c y ( s ) (c) 0 50 100 150 200 250 300 Q u e f r e n c y ( s ) (b) Fig. 2 . V arious power cepstrograms for simulated acoustic data, with quefrencies up to 300 µ s. (a) V ariation with time of the power cepstrum, (b) Similar to (a), but with cepstrum subtraction where the non-rahmonic component is estimated from the entire transit ( M = 2700 ), and (c) Similar to (b) but the non-rahmonic com- ponent is estimated from the last 2 seconds ( M = 20 ) of received cepstra, which have signiﬁcant rahmonic contributions that bias the estimate. source/sensor geometry is the same for the at-sea experiment dis- cussed below in subsection 3.1. Figure 2(a) shows the power cepstrum without cepstrum subtraction. The horizontal banding associated with the non-rahmonic component is prominent, which can be seen to interfere with the rahmonics. Figure 2(b) shows that the ef fect of cepstrum subtraction is to suppress the non-rahmonic component, thus restoring the ﬁdelity of the rahmonic component. Here, M = 2700 which is sufﬁciently large so that the contribution of the rahmonic component to the time av eraged-power cepstrum is negligible. The strengths of the rahmonics relativ e to the non- rahmonics hav e increased. In contrast, Figure 2(c) shows that the cepstrum subtraction process has suppressed both the rahmonic and non-rahmonic components because M = 20 is small, i.e. the contribution of the rahmonic component to the time-average power cepstrum is signiﬁciant. Cepstrum subtraction requires then multi- path time delay τ β to be a rapidly changing function of m , otherwise the rhamonic strengths bias the estimate of the non-rahmonic com- ponent of the power cepstrum, violating (11). 3. EXPERIMENT AL RESUL TS The multipath time delay , or the time difference in the arriv als of the signal via the direct propagation path and the signal via the in- direct seaﬂoor-reﬂected multipath, is estimated using various cep- strum methods and the autocorrelation function. The performance of the time delay estimation methods are compared using acoustic data collected during an at-sea experiment. 0.0 0.5 1.0 1.5 2.0 2.5 3.0 3.5 4.0 Cepstrum Subtraction F actor 45 50 55 60 65 70 MAE ( s ) Fig. 3 . V ariation with the cepstrum subtraction factor a ( k ) of the mean absolute error for multipath time delay estimation. 5 0 5 10 15 20 25 30 SNR (dB) 0 50 100 150 200 250 MAE ( s ) Auto Correla tion R eceived Cepstr um Cepstr um Su btrac tion Fig. 4 . V ariation with signal-to-noise ratio of the mean absolute error for mulitpath time delay estimation. 3.1. Dataset An experiment was conducted where a propeller-dri ven boat, which radiated broadband underwater acoustic noise, transited past a hy- drophone located 1 m abov e the seaﬂoor in very shallo w water ( 20 m deep). Fourteen transits were recorded o ver a tw o day period, where the vessel approached the sensor from different directions. A spectrogram of the hydrophone’ s output was observed to display a Lloyd’ s mirror interference pattern over the entire frequency band ( ≤ 90 kHz) of the receiver , which was sampled at 250 kHz. Record- ing commenced when the vessel was inbound at a ground range of 200 m. The vessel transited past the sensor, and recording was ter- minated when the vessel was 200 m outbound. The vessel’ s position was logged at 0 . 1 s intervals using a RF tracking de vice. The multipath time delay is predicted as a function of the vessel’ s ground range, assuming an isovelocity ( 1520 ms − 1 ) sound propagation medium and specular reﬂection from a ﬂat seaﬂoor [18]. Acoustic recordings are di vided into 0 . 1 s duration sound clips. The power cepstrum, po wer cepstrum with cepstrum subtraction, and autocorrelation function are computed for each sound clip. The estimate of the multipath time delay is the quefrency (or lag) corre- sponding to the peak v alue in each of these functions. The estimated multipath time delay is compared with the predicted time delay . Six minutes of background recordings (with no source present) was also collected in or der to measure the ambient noise spectrum, which was used to produce various levels of additive noise for signal-to-noise ratio (SNR) experiments. 3.2. Comparison of time delay estimation methods Figure 3 sho ws the mean absolute error (MAE) for multipath time delay estimation with cepstrum subtraction, as a function of the sub- traction factor a ( k ) in (13). When a ( k ) = 0 , there is no cepstrum subtraction and the time delay is estimated from the received po wer cepstrum. When a ( k ) = 1 , full cepstrum subtraction occurs and when a ( k ) > 1 , there is over -subtraction. Non-rahmonic compo- nents are estimated by taking the mean of the recei ved power cepstra for the entire acoustic dataset. The MAE decreases with cepstrum subtraction, which demonstrates the beneﬁt of non-rahmonic com- ponent suppression in the receiv ed power cepstrum. The minimum MAE occurs when a ( k ) = 1 . 5 . This level of ov er-subtraction re- duces the MAE by 20 µ s when compared with nil subtraction i.e. a ( k ) = 0 . The multipath time delay is estimated from (a) the po wer cep- strum, (b) the power cepstrum with cepstrum (o ver) subtraction, and (c) the autocorrelation function. The estimate of the multipath time delay is the quefrency (or lag) corresponding to the peak value in each of these functions. Colored noise with the same power spectral density as background noise recordings (without the source) is added to recordings in varying amounts in order to change the SNR. This enables time delay estimation performance to be tested as a func- tion of SNR. Figure 4 shows that cepstrum subtraction provides the ov erall performance, conﬁrming that suppression of non-rahmonic components improv es time delay estimation performance. 4. CONCLUSION Cepstrum analysis enables estimation of the multipath time delay for broadband signals arriving via direct and indirect signal prop- agation paths at a receiving hydrophone. A cepstrum subtraction method suppresses the cepstrum components extraneous to the rha- monics. As a result, the magnitudes of the rahmonics relative to the non-rahmonics are increased. Also, there is an improv ement in multipath time delay estimation, which is demonstrated using real data collected during an at-sea experiment. The experimental results showed improv ed multipath time delay estimation performance in a very shallow w ater en vironment when compared with other time de- lay estimation methods, and robustness to decreasing signal-to-noise ratios. Multipath time delay estimation using cepstrum subtraction is superior to using the autocorrelation function when the SNR ≥ 0 . Howe ver , without cepstrum subtraction, the autocorrelation function is better than the po wer cepstrum for multipath time delay estima- tion in the present application. Cepstrum over -subtraction improves multipath time delay estimation. 5. REFERENCES [1] Robert J Urick, Principles of Underwater Sound , pp. 123–125, McGraw-Hill, Ne w Y ork, 2nd edition, 1975. [2] William M Care y , “Lloyd’ s mirror-image interference effects, ” Acoustics T oday , vol. 5, no. 2, pp. 14–20, 2009. [3] Daphne Kapolka, Jason K Wilson, Joseph A Rice, and Paul Hursky , “Equiv alence of the w av eguide in v ariant and two path ray theory methods for range prediction based on lloyd’ s mirror patterns, ” in Proceedings of Meetings on Acoustics 155 . ASA, 2008, vol. 4, p. 070002. [4] Adrian D Jones, David W Bartel, and Paul A Clarke, “ An ap- plication of range-frequency striations to seaﬂoor inv ersion in shallow oceans, ” in Pr oceedings of Acoustics 2012-F r emantle , 2012, pp. 1–7. [5] A Michael Noll, “Short-time spectrum and “cepstrum” tech- niques for v ocal-pitch detection, ” The Journal of the Acoustical Society of America , vol. 36, no. 2, pp. 296–302, 1964. [6] A Michael Noll, “Cepstrum pitch determination, ” The Journal of the Acoustical Society of America , vol. 41, no. 2, pp. 293– 309, 1967. [7] Alan V Oppenheim and Ronald W Schafer , “From frequency to quefrency: A history of the cepstrum, ” IEEE Signal Pro- cessing Magazine , vol. 21, no. 5, pp. 95–106, 2004. [8] Robert B Randall, “ A history of cepstrum analysis and its ap- plication to mechanical problems, ” Mechanical Systems and Signal Pr ocessing , vol. 97, pp. 3–19, 2017. [9] Eric L Ferguson, Rishi Ramakrishnan, Stefan B Williams, and Craig T Jin, “Con volutional neural networks for passive mon- itoring of a shallow water environment using a single sensor, ” in Acoustics, Speech and Signal Processing (ICASSP), 2017 IEEE International Conference on . IEEE, 2017, pp. 2657– 2661. [10] Eric L Ferguson, Stefan B W illiams, and Craig T Jin, “Sound source localization in a multipath environment using conv o- lutional neural networks, ” in Acoustics, Speech and Signal Pr ocessing (ICASSP), 2018 IEEE International Conference on . IEEE, 2018, pp. 2386–2390. [11] Julius S Bendat and Allan G Piersol, “Engineering applications of correlation and spectral analysis, ” J ohn W ile y & Sons , pp. Sect. 6.2, 145–153, 1993. [12] Julius S Bendat and Allan G Piersol, Random Data: Analy- sis and Measurement Pr ocedur es , chapter 3, pp. 56–98, John W iley & Sons, Ne w Y ork, 1st edition, 1971. [13] Julius S Bendat and Allan G Piersol, Random Data: Analysis and Measurement Pr ocedur es , chapter 1, pp. 1–36, John Wile y & Sons, New Y ork, 1st edition, 1971. [14] Julius S Bendat and Allan G Piersol, Random Data: Analysis and Measur ement Pr ocedur es , pp. 344–380, John Wile y & Sons, New Y ork, 1st edition, 1971. [15] S V V aseghi, Advanced Digital Signal Processing , chapter 12, pp. 321–338, John Wile y & Sons, W est Sussex, UK, 4th edi- tion, 2008. [16] Philip N Garner , “Cepstral normalisation and the signal to noise ratio spectrum in automatic speech recognition, ” Speech Communication , vol. 53, no. 8, pp. 991–1001, 2011. [17] Sadaoki Furui, “Cepstral analysis technique for automatic speaker veriﬁcation, ” IEEE T ransactions on Acoustics, Speech, and Signal Pr ocessing , vol. 29, no. 2, pp. 254–272, 1981. [18] Eric L Ferguson, Rishi Ramakrishnan, Stefan B Williams, and Craig T Jin, “Deep learning approach to passiv e monitoring of the underwater acoustic environment, ” The J ournal of the Acoustical Society of America , vol. 140, no. 4, pp. 3351–3351, 2016.

Improved multipath time delay estimation using cepstrum subtraction

Original Paper

Comments & Academic Discussion

Leave a Comment

Original Paper

Related Papers

Comments & Academic Discussion

Leave a Comment