Highly-Reverberant Real Environment database: HRRE


Speech recognition in highly-reverberant real environments remains a major challenge, and an evaluation dataset for this task is needed. This report describes the generation of the Highly-Reverberant Real Environment database (HRRE). The database contains 13.4 hours of data recorded in real reverberant environments and consists of 20 different testing conditions that cover a wide range of reverberation times and speaker-to-microphone distances. These evaluation sets were generated by re-recording the clean test set of the Aurora-4 database at five loudspeaker-microphone distances in each of four reverberant conditions.

💡 Research Summary

The paper addresses a critical gap in the field of automatic speech recognition (ASR) for highly reverberant real‑world environments: the lack of a dedicated evaluation corpus that isolates reverberation effects from other confounding factors such as additive noise or microphone array geometry. Existing corpora—CHiME‑2, CHiME‑3/4, REVERB, ASpIRE—either rely on simulated room impulse responses, include substantial background noise, or use multi‑channel recordings that make it difficult to study the pure influence of reverberation time (RT) and speaker‑to‑microphone distance (d). To fill this void, the authors introduce the Highly‑Reverberant Real Environment (HRRE) database.

HRRE is built by re‑recording the clean test portion of the Aurora‑4 corpus (330 utterances originally captured with a Sennheiser microphone) inside a reverberation chamber which, according to the acknowledgments, was provided by the Institute of Acoustics at Universidad Austral de Chile. The chamber has an internal surface area of 100 m² and a volume of 63 m³, with a baseline RTmid of three seconds. By adding sound‑absorbing panels, the authors created four distinct reverberant conditions with RTmid values of 0.47 s, 0.84 s, 1.27 s, and 1.77 s, values commonly reported for meeting rooms and multipurpose halls. For each RT, five speaker‑to‑microphone distances were selected: 0.16 m, 0.32 m, 0.64 m, 1.28 m, and 2.56 m. The distances follow a geometric progression (each distance is twice the previous one), covering scenarios from close‑talk (human‑robot interaction) to far‑field (large conference rooms).
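The condition grid described above can be enumerated in a few lines. This is only an illustrative sketch: the RT values and the nearest distance come from the text, while the variable names and the tuple layout are assumptions and do not reflect how the database itself is organized.

```python
from itertools import product

# RTmid values (seconds) reported for the four reverberant conditions.
RT_MID_S = [0.47, 0.84, 1.27, 1.77]

# Distances double at each step, starting from the nearest position (metres).
NEAREST_DISTANCE_M = 0.16
NUM_DISTANCES = 5
distances_m = [round(NEAREST_DISTANCE_M * 2 ** k, 2) for k in range(NUM_DISTANCES)]

# Every (RT, distance) pair is one testing condition.
# The tuple layout is hypothetical, chosen only for this example.
conditions = list(product(RT_MID_S, distances_m))

for rt, d in conditions:
    print(f"RT = {rt:.2f} s, d = {d:.2f} m")
```

Pairing each RT with each distance gives the 4 × 5 = 20 testing conditions mentioned in the abstract.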

The recording chain consisted of an HP ProBook laptop, a Samson Servo 260 power amplifier, a Bose V201a loudspeaker, an Earthworks M30 measurement microphone, and a Focusrite Scarlett 2i2 audio interface. Prior to the main recordings, the loudspeaker output was calibrated in an anechoic chamber to produce an equivalent continuous sound pressure level (Leq) of 60 dBA measured one meter from the speaker, following IEC 60268:2003. Calibration used ten utterances from different speakers, measured with a Cesva SC‑310 sound level meter. During the actual recordings, the level was set to stay well above the measured background noise (maximum 37 dBA Leq) yet below the clipping threshold of the audio interface. The background noise was monitored at each distance to confirm that the environment remained quiet.

To align the re‑recorded utterances with the original Aurora‑4 files, the authors applied cross‑correlation to estimate and compensate for the system delay, so that each recording is synchronized with its source file. Spectrograms of a sample utterance are presented for the most reverberant condition (RT = 1.77 s) at the nearest (0.16 m) and farthest (2.56 m) distances, illustrating how the temporal structure is smeared more strongly at the larger distance.
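A minimal sketch of delay compensation by cross‑correlation is shown below. It is not the authors' implementation: the function names `estimate_delay` and `align` are hypothetical, and the signal is synthetic noise with an assumed delay of 123 samples rather than real speech. In a reverberant recording the correlation peak is broader, so the estimate would typically track the direct path only approximately.

```python
import numpy as np

def estimate_delay(reference, recorded):
    """Return the lag (in samples) by which `recorded` trails `reference`."""
    # Full cross-correlation; output index 0 corresponds to lag -(len(reference) - 1).
    corr = np.correlate(recorded, reference, mode="full")
    return int(np.argmax(corr)) - (len(reference) - 1)

def align(reference, recorded):
    """Shift `recorded` by the estimated lag and fit it to the reference length."""
    lag = estimate_delay(reference, recorded)
    if lag >= 0:
        shifted = recorded[lag:]                             # drop leading delay
    else:
        shifted = np.concatenate([np.zeros(-lag), recorded])  # recording started early
    out = np.zeros(len(reference))
    n = min(len(reference), len(shifted))
    out[:n] = shifted[:n]
    return out, lag

# Synthetic check: an attenuated copy of the reference delayed by 123 samples.
rng = np.random.default_rng(0)
reference = rng.standard_normal(2000)
recorded = np.concatenate([np.zeros(123), 0.5 * reference])

aligned, lag = align(reference, recorded)
print(lag)
```

`np.correlate` computes the correlation directly, which is slow for long utterances; an FFT‑based routine such as `scipy.signal.correlate` would be a reasonable substitute.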

The resulting corpus comprises 20 distinct test conditions (4 RT values × 5 distances), amounting to 13.4 hours of speech data. The recordings are available upon request via http://www.lptv.cl/en/hrre/, facilitating reproducible research. The authors emphasize that HRRE enables systematic investigation of how RT and d independently affect ASR performance, without the confounding influence of additive noise or multi‑channel processing. This makes HRRE especially valuable for evaluating dereverberation algorithms, distance‑adaptive acoustic models, and robust feature extraction techniques.
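As a rough consistency check on these figures, the arithmetic below derives the per‑condition duration and the mean recording length. It assumes that every one of the 20 conditions contains all 330 utterances, which the summary implies but does not state explicitly, and the variable names are illustrative.

```python
TOTAL_HOURS = 13.4               # total corpus duration, from the text
NUM_CONDITIONS = 4 * 5           # RT values x distances
UTTERANCES_PER_CONDITION = 330   # assumed: full clean test set in every condition

total_utterances = NUM_CONDITIONS * UTTERANCES_PER_CONDITION
minutes_per_condition = TOTAL_HOURS * 60 / NUM_CONDITIONS
mean_utterance_s = TOTAL_HOURS * 3600 / total_utterances

print(total_utterances)                  # 6600 recordings in total
print(round(minutes_per_condition, 1))   # about 40.2 minutes per condition
print(round(mean_utterance_s, 1))        # about 7.3 seconds per recording
```

Under that assumption, the mean recording length of roughly 7.3 seconds includes any leading or trailing silence captured with each utterance.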

Funding acknowledgments note support from CONICYT‑Fondecyt grant 1151306, ONRG grant N°62909‑17‑1‑2002, and a doctoral fellowship for José Novoa. The Institute of Acoustics at Universidad Austral de Chile is thanked for providing the reverberation chamber and recording equipment.

While HRRE fills an important niche, the authors acknowledge its limitations: the recordings are monophonic, noise‑free, and captured in a single, relatively regular‑shaped room. Real‑world deployments often involve background chatter, HVAC noise, and irregular geometries. Future work is suggested to extend HRRE with additive noise at controlled SNRs, multi‑channel microphone arrays, and a broader variety of room shapes and surface materials, thereby bridging the gap between controlled laboratory evaluation and operational deployment.
