Qwen3-TTS-Flash

Text to Audio

Qwen3-TTS-Flash is Tongyi's latest offline text-to-speech foundation model. It offers 17 expressive voices and provides low-latency, highly stable audio synthesis. It supports multilingual and dialect output, and each voice keeps consistent characteristics across languages. Because it was trained on large-scale datasets, the model automatically adjusts vocal tone to match the meaning of the text and synthesizes complex content robustly. This model is provided as a snapshot version and is functionally equivalent to snapshot qwen3-tts-flash-2025-11-27. Model page: https://bailian.console.alibabacloud.com/cn-beijing?tab=model#/model-market/detail/qwen3-tts-flash?serviceSite=asia-pacific-china

Input

- Text to Synthesize (required)
- Voice
- Language Type
- Audio Format
- Sample Rate
- Volume (min 0.1, max 2)
- Speed (min 0.5, max 2)
- Pitch (min 0.5, max 2)
- Return Timestamp
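The input fields above can be modeled as a request object that checks the documented numeric ranges before anything is sent. The sketch below is a minimal illustration under stated assumptions: the class name `TTSRequest`, its field names, and the neutral default of 1.0 for volume, speed, and pitch are hypothetical and are not the official API's parameter names or defaults. Only the min/max ranges come from the form.

```python
from dataclasses import dataclass, asdict

# Min/max ranges taken from the input form; everything else is assumed.
RANGES = {
    "volume": (0.1, 2.0),
    "speed": (0.5, 2.0),
    "pitch": (0.5, 2.0),
}


@dataclass
class TTSRequest:
    """Hypothetical container mirroring the form fields.

    Field names are illustrative only, not the real API's parameter names.
    """
    text: str              # Text to Synthesize (required)
    voice: str             # Voice
    language_type: str     # Language Type
    audio_format: str      # Audio Format
    sample_rate: int       # Sample Rate
    volume: float = 1.0    # assumed neutral default
    speed: float = 1.0     # assumed neutral default
    pitch: float = 1.0     # assumed neutral default
    return_timestamp: bool = False

    def validate(self) -> None:
        # The text field is marked required in the form.
        if not self.text.strip():
            raise ValueError("text to synthesize is required")
        # Reject values outside the documented inclusive ranges.
        for name, (lo, hi) in RANGES.items():
            value = getattr(self, name)
            if not lo <= value <= hi:
                raise ValueError(f"{name}={value} is outside [{lo}, {hi}]")

    def to_payload(self) -> dict:
        # Validate first, then return a plain dict of all fields.
        self.validate()
        return asdict(self)
```

For example, `TTSRequest(text="Hello", voice="voice-id-placeholder", language_type="English", audio_format="wav", sample_rate=24000, speed=2.5).to_payload()` raises `ValueError`, because 2.5 exceeds the documented speed maximum of 2. Validating on the client side catches out-of-range values early; the service itself remains the authority on which voices, formats, and sample rates it accepts.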
