Form and meaning co-determine the realization of tone in Taiwan Mandarin spontaneous speech: the case of T2-T3 and T3-T3 tone sandhi

Notice: This research summary and analysis were automatically generated using AI technology. For authoritative details, refer to the original arXiv paper.

In Standard Chinese, Tone 3 (the dipping tone) becomes Tone 2 (rising tone) when followed by another Tone 3. Previous studies have noted that this sandhi process may be incomplete, in the sense that the assimilated Tone 3 is still distinct from a true Tone 2. While Mandarin Tone 3 sandhi is widely studied using carefully controlled laboratory speech (Xu, 1997) and more formal registers of Beijing Mandarin (Yuan & Y. Chen, 2014), less is known about its realization in spontaneous speech, and about the effect of contextual factors on tonal realization. The present study investigates the pitch contours of two-character words with T2-T3 and T3-T3 tone patterns in spontaneous Taiwan Mandarin conversations. Our analysis makes use of the Generalized Additive Mixed Model (GAMM; Wood, 2017) to examine fundamental frequency (F0) contours as a function of normalized time. We consider various factors known to influence pitch contours, including gender, duration, word position, bigram probability, neighboring tones, and speaker, as well as two novel predictors, word and word sense (Chuang et al., 2025). Our analyses revealed that in spontaneous Taiwan Mandarin, T3-T3 words become indistinguishable from T2-T3 words, indicating complete sandhi, once the strong effect of word (or word sense) is taken into account.


💡 Research Summary

The paper investigates how Tone 2‑Tone 3 (T2‑T3) and Tone 3‑Tone 3 (T3‑T3) disyllabic words are realized in spontaneous Taiwan Mandarin speech. Previous work on Mandarin tone sandhi has largely relied on laboratory recordings or formal broadcast speech. Its best-known result is "incomplete neutralization": the first Tone 3 in a T3‑T3 sequence, which surfaces as a sandhi rising tone (SR), remains slightly lower in pitch than a lexical Tone 2, the lexical rising tone (LR). The present study moves to a natural conversational setting. Using the Taiwan Mandarin Spontaneous Speech Corpus (30 h of face‑to‑face dialogues, 31 female and 24 male speakers), the authors automatically extracted all T2‑T3 and T3‑T3 tokens, force‑aligned them at the word and character level with EasyAlign, and manually verified the alignments. Fundamental‑frequency (F0) contours were time‑normalized and modeled with a Generalized Additive Mixed Model (GAMM; Wood, 2017).
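Time normalization of an F0 contour can be illustrated with a minimal sketch. The function name `time_normalize`, the number of output points, and the use of linear interpolation are assumptions made for illustration; the summary does not state which resampling procedure the authors used.

```python
from bisect import bisect_right


def time_normalize(times, f0, n_points=20):
    """Resample an F0 track onto n_points equally spaced points in [0, 1].

    times: strictly increasing sample times for one token (any unit).
    f0:    F0 values in Hz, one per sample time.
    Returns a list of (normalized_time, interpolated_f0) pairs.
    Linear interpolation is an illustrative choice, not the paper's method.
    """
    if len(times) != len(f0) or len(times) < 2:
        raise ValueError("need at least two samples and equal-length inputs")
    if n_points < 2:
        raise ValueError("n_points must be at least 2")
    if any(b <= a for a, b in zip(times, times[1:])):
        raise ValueError("times must be strictly increasing")

    start, span = times[0], times[-1] - times[0]
    # Map the token's own time axis onto [0, 1].
    norm = [(t - start) / span for t in times]

    contour = []
    for k in range(n_points):
        x = k / (n_points - 1)
        # Index of the right-hand neighbor, clamped to a valid segment.
        j = min(max(bisect_right(norm, x), 1), len(norm) - 1)
        x0, x1 = norm[j - 1], norm[j]
        y0, y1 = f0[j - 1], f0[j]
        weight = (x - x0) / (x1 - x0)
        contour.append((x, y0 + weight * (y1 - y0)))
    return contour


if __name__ == "__main__":
    # Hypothetical token: three F0 samples over 100 ms.
    for x, hz in time_normalize([0, 50, 100], [200.0, 180.0, 220.0], n_points=5):
        print(f"{x:.2f}  {hz:.1f} Hz")
```

Because every token is mapped onto the same [0, 1] axis, contours from tokens of different durations can be compared point by point, which is why duration then has to enter the model as a separate predictor.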

Predictors in the model included traditional factors known to affect pitch: speaker gender, token duration, word position, bigram probability, neighboring tones, and speaker as a random effect. Crucially, the authors introduced two novel predictors: the lexical item ("word") and its semantic sense ("word sense"), the latter operationalized via contextualized embeddings as in Chuang et al. (2025).
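As a rough sketch of how predictors of this kind might enter a GAMM for F0, the model can be written as a sum of parametric terms, smooth functions, and random effects. The symbols below and the particular choice of smooths are illustrative assumptions, not the specification reported by the authors.

```latex
\[
\mathrm{F0}_{i}(t) \;=\; \beta_0
  \;+\; \beta_{\mathrm{gender}[i]}
  \;+\; \beta_{\mathrm{pattern}[i]}
  \;+\; f_{\mathrm{pattern}[i]}(t)
  \;+\; g(\mathrm{duration}_i)
  \;+\; h(\mathrm{bigram}_i)
  \;+\; b_{\mathrm{speaker}[i]}(t)
  \;+\; b_{\mathrm{word}[i]}(t)
  \;+\; \varepsilon_{i}(t)
\]
```

Here $t \in [0, 1]$ is normalized time, $f_{\mathrm{pattern}}$ is a separate smooth over time for each tone pattern (T2-T3 or T3-T3), $g$ and $h$ are smooths of continuous covariates, and $b_{\mathrm{speaker}}$ and $b_{\mathrm{word}}$ are by-speaker and by-word random smooths. Under this reading, the question of complete versus incomplete sandhi is whether the pattern terms still differ once $b_{\mathrm{word}}$ (or an analogous term for word sense) is in the model.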

When only the traditional predictors were entered, the model reproduced the classic finding: the SR tone in T3‑T3 tokens exhibited a modest but statistically reliable pitch deficit (≈5–8 Hz) relative to the LR tone in T2‑T3 tokens. However, once “word” and “word sense” were added, this difference vanished; the two patterns became statistically indistinguishable. In other words, after controlling for lexical identity and meaning, T3‑T3 sandhi in spontaneous Taiwan Mandarin is fully assimilated to T2‑T3, indicating complete sandhi.
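One way to build intuition for why an apparent pattern difference can disappear once word identity is controlled is a toy calculation. The data below are invented for illustration only (the word labels, token counts, and F0 values are hypothetical), and simple averaging stands in for the partial pooling that a GAMM with by-word random effects would perform.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical toy tokens: (tone pattern, word label, mean F0 in Hz).
# Each pattern has one frequent word and one rare word.
TOKENS = (
    [("T2-T3", "wordA", 210.0)] * 4    # frequent, relatively high F0
    + [("T2-T3", "wordB", 190.0)] * 1  # rare, relatively low F0
    + [("T3-T3", "wordC", 190.0)] * 4  # frequent, relatively low F0
    + [("T3-T3", "wordD", 210.0)] * 1  # rare, relatively high F0
)


def token_level_means(tokens):
    """Average over all tokens per pattern, ignoring word identity."""
    by_pattern = defaultdict(list)
    for pattern, _word, hz in tokens:
        by_pattern[pattern].append(hz)
    return {pattern: mean(values) for pattern, values in by_pattern.items()}


def word_level_means(tokens):
    """Average within each word first, then across words per pattern."""
    by_word = defaultdict(list)
    for pattern, word, hz in tokens:
        by_word[(pattern, word)].append(hz)
    by_pattern = defaultdict(list)
    for (pattern, _word), values in by_word.items():
        by_pattern[pattern].append(mean(values))
    return {pattern: mean(values) for pattern, values in by_pattern.items()}


if __name__ == "__main__":
    print("token-level:", token_level_means(TOKENS))
    print("word-level: ", word_level_means(TOKENS))
```

In this toy set the token-level averages differ by 12 Hz between patterns, while the word-level averages are identical: the apparent pattern effect comes entirely from which words happen to be frequent. This shows only the general mechanism and makes no claim about the actual corpus.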

Gender effects persisted: male speakers produced larger F0 rises for both SR and LR tones. Faster speech (shorter token duration) reduced pitch excursion, and utterance‑final positions introduced a downward bias due to sentence‑final intonation. Word frequency correlated with shorter durations, which in turn modestly reduced pitch range, but frequency had no direct effect on F0 once duration and speaking rate were accounted for.

The study therefore demonstrates that tonal realization is co‑determined by phonetic, sociophonetic, and lexical‑semantic factors. The strong influence of word and meaning aligns with the Discriminative Lexicon Model, suggesting that fine‑grained semantic information can predict pitch contours beyond purely phonological rules. The authors conclude that, contrary to findings from Beijing Mandarin, Taiwan Mandarin exhibits complete T3‑T3 sandhi in natural speech, and that future work should explore meaning‑driven tone modeling across dialects and integrate such models into speech‑recognition technology.

