Qwen3-TTS is a speech generation model that supports voice cloning, voice design, high-quality human-like speech synthesis, and voice control through natural-language instructions. It aims to give developers and users a broad set of speech generation features in a single model. At MVSep, we use the largest variant, which has 1.7 billion parameters.
Original model page: https://github.com/QwenLM/Qwen3-TTS
Qwen3-TTS (Custom Voice) offers 9 pre-defined speakers. Optionally, you can add a "Voice description" to convey an emotion, such as "happy voice" or "sad voice". You can also select the language for this model or leave it set to "auto".
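To make the three Custom Voice options concrete, here is a minimal sketch that models them as a small Python dataclass. This is not MVSep's API or the Qwen3-TTS library interface. The class `CustomVoiceOptions`, its field names, and the `to_form` method are hypothetical. Because the speaker names are not listed here, speakers are referenced by an index from 1 to 9, which is also an assumption.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical: 9 preset speakers, referenced by index because their names are not listed.
NUM_PRESET_SPEAKERS = 9


@dataclass(frozen=True)
class CustomVoiceOptions:
    """Hypothetical container for the Qwen3-TTS (Custom Voice) options; not a real API."""

    speaker: int                             # index of one of the 9 preset speakers (1..9)
    voice_description: Optional[str] = None  # optional emotion hint, e.g. "happy voice"
    language: str = "auto"                   # a specific language, or "auto" to detect it

    def __post_init__(self) -> None:
        # Reject speaker indices outside the preset range.
        if not 1 <= self.speaker <= NUM_PRESET_SPEAKERS:
            raise ValueError(f"speaker must be in 1..{NUM_PRESET_SPEAKERS}, got {self.speaker}")
        # An explicitly supplied description must contain some text.
        if self.voice_description is not None and not self.voice_description.strip():
            raise ValueError("voice_description must be non-empty when given")

    def to_form(self) -> dict:
        """Flatten to a plain dict, omitting the optional description when it is unset."""
        form = {"speaker": self.speaker, "language": self.language}
        if self.voice_description:
            form["voice_description"] = self.voice_description.strip()
        return form


if __name__ == "__main__":
    # Speaker 3 with a "sad voice" description and automatic language detection.
    print(CustomVoiceOptions(speaker=3, voice_description="sad voice").to_form())
```

Leaving `voice_description` unset and keeping `language="auto"` mirrors the defaults described above. Only the speaker choice is required.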