Qwen3-TTS is a powerful speech generation model that supports voice cloning, voice design, ultra-high-quality human-like speech generation, and natural-language voice control. It provides developers and users with the most extensive set of speech generation features available. At MVSep, we use the largest variant of the model, which has 1.7 billion parameters.
Original model page: https://github.com/QwenLM/Qwen3-TTS
Qwen3-TTS (Voice Design) allows you to generate speech with a custom voice that can be described in detail in the "Voice description" field. You can specify the speaker's gender and age, and add emotions, such as "happy voice" or "sad voice". You can also choose the language for this model or leave it as "auto".
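As a rough illustration of how the fields above fit together, here is a minimal Python sketch that assembles a voice description from gender, age and emotion, and defaults the language to "auto". The function name `build_voice_design_request` and the dictionary keys (`text`, `voice_description`, `language`) are hypothetical and chosen only for this example; they are not taken from the MVSep or Qwen3-TTS interfaces.

```python
from typing import Dict, Optional


def build_voice_design_request(
    text: str,
    gender: Optional[str] = None,
    age: Optional[str] = None,
    emotion: Optional[str] = None,
    language: str = "auto",
) -> Dict[str, str]:
    """Collect Voice Design inputs into one dictionary.

    All key names here are assumptions made for illustration,
    not the actual field names used by MVSep or Qwen3-TTS.
    """
    # Keep only the traits that were actually provided, in a fixed order.
    traits = [trait for trait in (gender, age, emotion) if trait]
    # Join them into a single free-form description string.
    voice_description = ", ".join(traits)
    return {
        "text": text,
        "voice_description": voice_description,
        "language": language,
    }


request = build_voice_design_request(
    "Welcome to the show!",
    gender="female",
    age="young adult",
    emotion="happy voice",
)
print(request["voice_description"])
```

Because the description is free-form text, the traits could just as well be written as a full sentence; the comma-separated form is used here only to keep the sketch short.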