MM-Sonate: Multimodal Controllable Audio-Video Generation with Zero-Shot Voice Cloning
📝 Original Info
- Title: MM-Sonate: Multimodal Controllable Audio-Video Generation with Zero-Shot Voice Cloning
- ArXiv ID: 2601.01568
- Date: 2026-01-04
- Authors: Chunyu Qiang, Jun Wang, Xiaopeng Wang, Kang Yin, Yuxin Guo