MM-Sonate: Multimodal Controllable Audio-Video Generation with Zero-Shot Voice Cloning

📝 Original Info

  • Title: MM-Sonate: Multimodal Controllable Audio-Video Generation with Zero-Shot Voice Cloning
  • ArXiv ID: 2601.01568
  • Date: 2026-01-04
  • Authors: Chunyu Qiang, Jun Wang, Xiaopeng Wang, Kang Yin, Yuxin Guo

📝 Abstract

Figure: Framework Comparison

  • (a) Audio-Video Joint Generation with Multimodal Control
  • (b) Audio-Video Joint Generation with Timbre Control
  • (c) Audio-Video Joint Generation with First Frame Control

Example prompt (Audio-Video Joint Generation with Multimodal Control): A white goat @speaker 0 stands indoors and says, "I am a goat, very cute." And point the front hoof to the opposite side.

📄 Full Content

...(The full text is long and has been omitted here. Please see the site for the complete article.)
