We're thrilled to announce that Alibaba Cloud's third-generation CosyVoice models (V3) are now officially available on the UnifiedTTS platform! This major upgrade introduces 31 brand-new voices, faster synthesis, and more competitive pricing.
Two V3 Versions
🚀 CosyVoice V3-Flash: Speed Edition
Model ID: cosyvoice-v3-flash
Pricing: 100 credits/1K characters (half the price of V2!)
Voice Count: 31 high-quality Chinese voices
V3-Flash is designed for high-throughput scenarios, offering excellent audio quality at half the per-character price of V2. Whether for bulk content production or real-time applications, V3-Flash delivers exceptional value.
⭐ CosyVoice V3-Plus: Premium Edition
Model ID: cosyvoice-v3-plus
Pricing: 200 credits/1K characters (same price as V2)
Voice Count: 2 flagship voices (Long Anyang, Long Anhuan)
V3-Plus features two carefully selected premium voices, ideal for professional scenarios demanding the highest audio quality, such as brand voiceovers and premium content production.
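To compare the two editions on a concrete job, the per-1K-character rates listed above can be turned into a quick estimate. The following is a minimal sketch, not an official billing formula: it assumes credits are prorated linearly by character count, with no rounding or minimum charge, and `estimate_credits` is a hypothetical helper name rather than part of any UnifiedTTS SDK.

```python
# Credits per 1,000 characters, taken from the pricing listed above.
CREDITS_PER_1K = {
    "cosyvoice-v3-flash": 100,
    "cosyvoice-v3-plus": 200,
}


def estimate_credits(model_id: str, char_count: int) -> float:
    """Estimate the credits needed to synthesize `char_count` characters.

    Assumption: linear proration with no rounding or minimum charge.
    Actual billing on the platform may differ.
    """
    if char_count < 0:
        raise ValueError("char_count must be non-negative")
    return CREDITS_PER_1K[model_id] * char_count / 1000


if __name__ == "__main__":
    chapter_length = 50_000  # e.g. one audiobook chapter
    for model_id in CREDITS_PER_1K:
        print(model_id, estimate_credits(model_id, chapter_length))
```

Under these assumptions, a 50,000-character chapter comes to 5,000 credits on V3-Flash and 10,000 credits on V3-Plus.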
🎭 31 New Voices, Rich Expression
The V3 series introduces 31 meticulously crafted Chinese voices, covering different ages, genders, and personality traits:
Wide Age Range
- Child Voice: 6-10 years (Long Huhu - innocent girl)
- Youth Voices: 20-30 years (10 voices, full of vitality)
- Middle-aged Voices: 30-40 years (experienced and composed)
- Elder Voices: 60 years (Long Laobo, Long Laoyi - weathered wisdom)
Diverse Personality Traits
- Business Professional: Elegant intellectual woman, wise mature man, precise professional woman
- Warm & Friendly: Gentle best friend, homey warm man, friendly lively woman
- Energetic & Youthful: Sunny boy, spirited energetic woman, free-spirited man
- Classic Characters: Classic Monkey King, cute robot, delicate talented woman
Dialect Voices 🌏
V3 introduces dialect-specific voices for the first time, making your content more locally authentic:
- Long Anyue: Spirited Cantonese man (Cantonese support)
- Long Shangge: Authentic Shaanxi man (Shaanxi dialect support)
- Long Anmin: Pure Minnan woman (Minnan dialect support)
All voices support both Chinese (Mandarin) and English.
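If your application routes content by dialect, the three dialect voices above can be kept in a small lookup table. This is only an illustrative sketch: the keys are made-up labels, the values are the display names from the list above (the identifiers the API actually expects may differ), and `pick_dialect_voice` is a hypothetical helper.

```python
from typing import Optional

# Dialect label -> voice display name, as listed above.
# Assumption: the API's real voice identifiers may differ from these names.
DIALECT_VOICES = {
    "cantonese": "Long Anyue",
    "shaanxi": "Long Shangge",
    "minnan": "Long Anmin",
}


def pick_dialect_voice(dialect: str, default: Optional[str] = None) -> Optional[str]:
    """Return the dialect voice for `dialect`, or `default` if none is listed."""
    return DIALECT_VOICES.get(dialect.strip().lower(), default)


if __name__ == "__main__":
    print(pick_dialect_voice("Cantonese"))
    print(pick_dialect_voice("Sichuan", default="(no dialect voice listed)"))
```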
💡 Typical Use Cases
V3-Flash Ideal For
- 📚 Audiobooks: Large text volumes, cost-sensitive
- 🎓 Online Education: Course narration, exercise reading
- 📱 Content Creation: Short videos, media voiceovers
- 🤖 Smart Customer Service: High-concurrency conversation scenarios
- 📻 Podcast Production: Long-form content creation
V3-Plus Ideal For
- 🎬 Brand Advertising: Premium brand image building
- 🎮 Game Voiceover: Main character/core role voicing
- 📺 Documentaries: Professional narration
- 🏢 Corporate Communications: Company intros, product demos
- 🎨 Artistic Works: High-quality audio creation