MOSS Unifies Speech Transcribe, Diarize, Enhance

Reading time: 2 minutes

📝 Original Paper Info

- Title: MOSS Transcribe Diarize Technical Report
- ArXiv ID: 2601.01554
- Date: 2026-01-04
- Authors: MOSI.AI, Donghua Yu, Zhengyuan Lin, Chen Yang, Yiyang Zhang, Hanfu Chen, Jingqi Chen, Ke Chen, Liwei Fan, Yi Jiang, Jie Zhu, Muchen Li, Wenxuan Wang, Yang Wang, Zhe Xu, Yitian Gong, Yuqian Zhang, Wenbo Zhang, Songlin Wang, Zhiyu Wu, Zhaoye Fei, Qinyuan Cheng, Shimin Li, Xipeng Qiu

📝 Abstract

Speaker-Attributed, Time-Stamped Transcription (SATS) aims to transcribe what is said and to precisely determine the timing of each speaker, which is particularly valuable for meeting transcription. Existing SATS systems rarely adopt an end-to-end formulation and are further constrained by limited context windows, weak long-range speaker memory, and the inability to output timestamps. To address these limitations, we present MOSS Transcribe Diarize, a unified multimodal large language model that jointly performs Speaker-Attributed, Time-Stamped Transcription in an end-to-end paradigm. Trained on extensive real wild data and equipped with a 128k context window for up to 90-minute inputs, MOSS Transcribe Diarize scales well and generalizes robustly. Across comprehensive evaluations, it outperforms state-of-the-art commercial systems on multiple public and in-house benchmarks.
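
To make the task concrete, here is a minimal Python sketch of what a speaker-attributed, time-stamped transcript could look like as data, along with the rough per-second token budget implied by the figures in the abstract. The `Segment` schema, the speaker labels, and the output line format are hypothetical illustrations and are not taken from the paper. The budget calculation assumes "128k" means 128,000 tokens and that the whole window is available for a 90-minute input; in practice the window would presumably also have to hold the generated transcript.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One speaker-attributed, time-stamped utterance (hypothetical schema)."""
    speaker: str  # speaker label such as "S1"; the label format is an assumption
    start: float  # start time in seconds from the beginning of the recording
    end: float    # end time in seconds from the beginning of the recording
    text: str     # transcribed words for this utterance


def fmt_time(seconds: float) -> str:
    """Format a time in seconds as MM:SS.cc (centisecond resolution)."""
    total_cs = round(seconds * 100)          # work in whole centiseconds
    minutes, cs = divmod(total_cs, 6000)     # 6000 centiseconds per minute
    return f"{minutes:02d}:{cs // 100:02d}.{cs % 100:02d}"


def render(segments: list[Segment]) -> str:
    """Render segments in chronological order, one utterance per line."""
    ordered = sorted(segments, key=lambda s: s.start)
    return "\n".join(
        f"[{fmt_time(s.start)}-{fmt_time(s.end)}] {s.speaker}: {s.text}"
        for s in ordered
    )


def tokens_per_second(context_tokens: int, minutes: float) -> float:
    """Average number of context tokens available per second of audio."""
    return context_tokens / (minutes * 60)


if __name__ == "__main__":
    meeting = [
        Segment("S2", 65.5, 70.0, "Agreed."),
        Segment("S1", 0.0, 4.25, "Let's begin."),
    ]
    print(render(meeting))
    # Assumes 128k = 128,000 tokens spread evenly over 90 minutes of audio.
    print(f"{tokens_per_second(128_000, 90):.1f} tokens per second")
```

Under these assumptions the budget comes to roughly 24 tokens per second of audio, which gives a sense of how compact the audio representation and the transcript together would need to be to fit a 90-minute meeting in one window.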

💡 Summary & Analysis

1. First Contribution: MOSS Transcribe Diarize is a unified multimodal large language model that performs Speaker-Attributed, Time-Stamped Transcription (SATS) jointly in an end-to-end paradigm, whereas existing SATS systems rarely adopt an end-to-end formulation.
2. Second Contribution: The model is equipped with a 128k context window that supports inputs of up to 90 minutes, targeting the limited context windows and weak long-range speaker memory of existing systems, and it outputs timestamps.
3. Third Contribution: Trained on extensive real-world data, the model is reported to outperform state-of-the-art commercial systems on multiple public and in-house benchmarks.

📄 Full Paper Content (ArXiv Source)

📊 Paper Figures

Figure 1

Figure 2

Figure 3

A Note of Gratitude

The copyright of this content belongs to the respective researchers. We deeply appreciate their hard work and contribution to the advancement of human civilization.
