Robust spatial audio control relies on accurate acoustic propagation models, yet environmental variations, especially changes in the speed of sound, cause systematic mismatches that degrade performance. Existing methods either assume known sound speed, require multiple microphones, or rely on separate calibration, making them impractical for systems with minimal sensing. We propose an online sound speed estimator that operates during general multichannel audio playback and requires only a single observation microphone. The method exploits the structured effect of sound speed on the reproduced signal and estimates it by minimizing the mismatch between the measured audio and a parametric acoustic model. Simulations show accurate tracking of sound speed for diverse input signals and improved spatial control performance when the estimates are used to compensate propagation errors in a sound zone control framework.
Spatial audio control techniques such as sound zone control (SZC), spatial active noise control (ANC), and immersive audio reproduction are increasingly deployed in practical systems including personal audio devices, cars, and smart environments. These methods shape acoustic sound fields using multiple loudspeakers to enhance desired audio in specific regions while suppressing it elsewhere.
In practice, robust spatial audio control remains challenging because most methods rely on control filters computed offline from pre-measured acoustic impulse responses (IRs) that are assumed fixed during deployment. However, environmental changes such as listener movement, transducer drift, or temperature variations alter the IRs and degrade performance [1-4]. Among these factors, variations in the speed of sound are particularly critical, as they introduce systematic delay and phase mismatches that severely affect spatial control [1,5,6]. Nevertheless, only a few approaches address sound speed variations [1,5,7-9]. Constraint-based IR reshaping [1], learned parametric propagation models [7], learned IR priors [9], and covariance-prior-based methods [8] have been proposed to improve robustness. However, these approaches typically require repeated calibration, multiple microphones during deployment, or substantial pre-measured training data. Adaptive filtering and secondary path modeling can track acoustic changes online [10,11], but require access to microphone signals at all control points, limiting applicability in minimally instrumented systems. Recently, [5] proposed interpolation-based IR modeling using a Sinc Interpolation-Compression/Expansion Resampling (SICER) framework, which enables recomputation of control filters at new sound speeds. However, this assumes that the sound speed is known or estimated separately, for example using temperature and humidity sensors. More broadly, classical sound speed estimation methods are often coupled with source localization and rely on multiple spatially distributed microphones [12-15], or require dedicated measurement procedures [16,17], both of which are poorly suited for online adaptation in systems with limited sensing infrastructure.
In this paper, we propose an online sound speed estimation method that operates during multichannel audio playback using only a single observation microphone, not necessarily placed at the control points. The method exploits the structured effect of sound speed variations on acoustic propagation, as described in [5], and estimates the sound speed directly from the reproduced audio signal without additional sensors. As an example application, the estimated sound speed is integrated into an SZC framework to compensate for propagation mismatches without modifying the underlying control architecture. The proposed approach enables practical, single-channel sound speed tracking for robust spatial audio control.
We consider a general frame-based multichannel audio system in which a set of L loudspeakers is used to generate a desired sound field at a set of M control-point microphones. For example, in SZC the loudspeakers are used to create different sound zones with different desired sound fields, cf. Section III-A [18,19]. We denote by h_{m,l} ∈ ℝ^K the IR from the l-th loudspeaker to the m-th microphone. Each loudspeaker is assumed to be equipped with a finite impulse response (FIR) filter, q_l ∈ ℝ^J, to control the reproduced sound field, e.g., an ANC or SZC filter. Then, for an input signal frame x[τ] ∈ ℝ^N,
the signal frame reproduced by all L loudspeakers at microphone m and frame index τ is

d_m[τ] = Σ_{l=1}^{L} h_{m,l} * y_l[τ],
where * denotes convolution, and y_l[τ] ∈ ℝ^N is the l-th loudspeaker output signal for frame τ. We use overlap-add and buffers for each convolution to avoid frame boundary errors [20], and let all signal frames have length N. We assume K-1 ≤ N and J-1 ≤ N such that the convolution tail can be stored in a single frame buffer [4]. For a vector v ∈ ℝ^M containing time-consecutive samples of a signal v[n], we define a buffering operator, Buff^N_{K-1}(v), which extracts the last K-1 elements of v and zero-pads to length N as

Buff^N_{K-1}(v) = [ v[M-K+2], ..., v[M], 0, ..., 0 ]^T ∈ ℝ^N,

where the elements of v are indexed v[1], ..., v[M].
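For concreteness, a minimal NumPy sketch of this buffering operator (the function name and the 0-indexed array convention are illustrative, not taken from the paper):

```python
import numpy as np

def buff(v, K, N):
    """Buffering operator Buff^N_{K-1}(v): take the last K-1 samples of v
    and zero-pad the result to length N (assumes K > 1 and K-1 <= N)."""
    v = np.asarray(v, dtype=float)
    tail = v[-(K - 1):]                         # convolution tail of length K-1
    return np.concatenate([tail, np.zeros(N - (K - 1))])
```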
The reproduced signal from loudspeaker l at microphone m in frame τ is then computed as

d_{m,l}[τ] = ( h_{m,l} * y_l[τ] )_{1:N} + Buff^N_{K-1}( h_{m,l} * y_l[τ-1] ),   (3)

where (·)_{1:N} denotes the first N samples of a vector. Similarly, we can express y_l[τ] from x[τ] and q_l.
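As an illustration (not the authors' implementation), the per-frame overlap-add reproduction of (3) can be sketched as follows, reusing the buff helper from the previous sketch:

```python
import numpy as np  # buff() is assumed to be defined as in the sketch above

def reproduce_frame(h_ml, y_cur, y_prev, N):
    """Reproduced frame from loudspeaker l at microphone m: the first N samples
    of h_{m,l} * y_l[tau] plus the buffered tail of the previous frame's
    convolution h_{m,l} * y_l[tau-1] (overlap-add)."""
    K = len(h_ml)
    cur = np.convolve(h_ml, y_cur)[:N]               # current-frame contribution
    carry = buff(np.convolve(h_ml, y_prev), K, N)    # tail carried over from frame tau-1
    return cur + carry
```

Summing reproduce_frame over all L loudspeakers then gives the total reproduced frame at microphone m.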
We first present our model for sound speed changes in the reproduced audio, based on the SICER model for the effect of sound speed changes on IRs [5].
Following the SICER model in [5], an IR measured at sound speed c_old, h ∈ ℝ^K, can be mapped to a new sound speed c_new with IR ĥ ∈ ℝ^I as

ĥ = S^(α) h,

where S^(α) ∈ ℝ^{I×K} is a sinc interpolation matrix whose i-th row is a sampled sinc function determined by the resampling factor α, given by the ratio between c_old and c_new. Using this model and assuming a uniform temperature in the environment, the reproduced signal under a sound speed change is obtained by replacing h_{m,l} in (3) with ĥ_{m,l} = S^(α) h_{m,l}.
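As a rough illustration of this mapping, the following sketch builds a sinc interpolation matrix and applies it to an IR; the direction of the resampling factor and the omission of any delay handling are simplifying assumptions here, and the exact formulation is given in [5]:

```python
import numpy as np

def sinc_resample_ir(h, c_old, c_new, out_len=None):
    """Map an IR measured at sound speed c_old to an approximation at c_new
    via sinc-interpolation resampling (simplified SICER-style mapping)."""
    h = np.asarray(h, dtype=float)
    K = len(h)
    I = K if out_len is None else out_len
    # Assumed convention: a higher sound speed shortens propagation delays,
    # so the old IR is read out at a faster rate (time-axis compression).
    alpha = c_new / c_old
    i = np.arange(I)[:, None]      # output sample indices
    k = np.arange(K)[None, :]      # input sample indices
    S = np.sinc(i * alpha - k)     # sinc interpolation matrix S^(alpha), I x K
    return S @ h
```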
With this model for audio reproduction under new sound speeds, we can now estimate the change in sound speed based on recordings of reproduced audio.
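To make the estimation step concrete, the following is a minimal sketch of one possible realization: a grid search over candidate sound speeds that minimizes the squared mismatch between measured microphone frames and the model prediction. The grid, names, and least-squares criterion are illustrative choices and not necessarily those used in the paper.

```python
import numpy as np  # sinc_resample_ir() as in the previous sketch

def estimate_sound_speed(mic_frames, spk_frames, irs_old, c_old,
                         candidates=np.linspace(330.0, 360.0, 61)):
    """Estimate the sound speed from single-microphone recordings of the
    reproduced audio by matching them to the resampled-IR model.

    mic_frames : list of measured frames at the observation microphone
    spk_frames : list of per-frame lists of loudspeaker output signals y_l[tau]
    irs_old    : IRs from each loudspeaker to the observation mic at c_old
    candidates : candidate sound speeds in m/s (illustrative grid)
    """
    best_c, best_err = None, np.inf
    for c_new in candidates:
        # Resample all IRs to the candidate sound speed.
        irs_new = [sinc_resample_ir(h, c_old, c_new) for h in irs_old]
        err = 0.0
        for d_meas, y_frame in zip(mic_frames, spk_frames):
            # Model prediction: sum of loudspeaker contributions
            # (frame-boundary buffering omitted for brevity).
            d_pred = sum(np.convolve(h, y)[:len(d_meas)]
                         for h, y in zip(irs_new, y_frame))
            err += float(np.sum((d_meas - d_pred) ** 2))
        if err < best_err:
            best_c, best_err = c_new, err
    return best_c
```

In practice, the grid estimate could be refined, e.g., by a finer local search or gradient-based optimization around the best candidate.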
We now present the proposed sound speed estimation method.