Alibaba Cloud CosyVoice V3 Launch: 31 New Voices, Faster Synthesis, Better Pricing

We're thrilled to announce that Alibaba Cloud's third-generation CosyVoice models (V3) are now officially available on the UnifiedTTS platform! This major upgrade introduces 31 brand-new voices, faster synthesis, and more competitive pricing.

Two V3 Versions

🚀 CosyVoice V3-Flash: Speed Edition

Model ID: cosyvoice-v3-flash

Pricing: 100 credits per 1K characters (half the price of v2)

Voice Count: 31 high-quality Chinese voices

V3-Flash is designed for high-volume and latency-sensitive scenarios, offering excellent audio quality at half the per-character price of v2. Whether for bulk content production or real-time applications, V3-Flash delivers exceptional value.

⭐ CosyVoice V3-Plus: Premium Edition

Model ID: cosyvoice-v3-plus

Pricing: 200 credits per 1K characters (same price as v2)

Voice Count: 2 flagship voices (Long Anyang, Long Anhuan)

V3-Plus features two carefully selected premium voices, ideal for professional scenarios demanding the highest audio quality, such as brand voiceovers and premium content production.
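
To make the pricing difference concrete, here is a minimal Python sketch that estimates the credit cost of a job from the two rates listed above. The function name estimate_credits is hypothetical, and the sketch assumes credits are prorated linearly per character with no rounding or minimum charge; the platform's actual billing rules may differ.

```python
# Credits per 1,000 characters, taken from the pricing listed above.
CREDITS_PER_1K = {
    "cosyvoice-v3-flash": 100,
    "cosyvoice-v3-plus": 200,
}


def estimate_credits(model_id: str, char_count: int) -> float:
    """Estimate the credits needed to synthesize char_count characters.

    Assumption: cost scales linearly with character count, with no
    rounding, minimum charge, or volume discount.
    """
    if char_count < 0:
        raise ValueError("char_count must be non-negative")
    return CREDITS_PER_1K[model_id] * char_count / 1000


if __name__ == "__main__":
    # Compare both editions for a 50,000-character script.
    for model_id in CREDITS_PER_1K:
        print(model_id, estimate_credits(model_id, 50_000))
```

Under these assumptions, a 50,000-character script comes to 5,000 credits on V3-Flash and 10,000 credits on V3-Plus.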

🎭 31 New Voices, Rich Expression

The V3 series introduces 31 meticulously crafted Chinese voices, covering different ages, genders, and personality traits:

Wide Age Range

Child Voice: 6-10 years (Long Huhu - innocent girl)
Youth Voices: 20-30 years (10 voices, full of vitality)
Middle-aged Voices: 30-40 years (experienced and composed)
Elder Voices: 60 years (Long Laobo, Long Laoyi - weathered wisdom)

Diverse Personality Traits

Business Professional: Elegant intellectual woman, wise mature man, precise professional woman
Warm & Friendly: Gentle best friend, homey warm man, friendly lively woman
Energetic & Youthful: Sunny boy, spirited energetic woman, free-spirited man
Classic Characters: Classic Monkey King, cute robot, delicate talented woman

Dialect Voices 🌏

V3 introduces dialect-specific voices for the first time, so your content sounds authentic to local audiences:

Long Anyue: Spirited Cantonese man (Cantonese support)
Long Shangge: Authentic Shaanxi man (Shaanxi dialect support)
Long Anmin: Pure Minnan woman (Minnan dialect support)

All voices support both Chinese (Mandarin) and English.
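
If your application routes content by dialect, the three dialect voices above can be kept in a small lookup table. The Python sketch below is illustrative only: the dictionary keys and the voice_for_dialect helper are hypothetical, and the values are the display names from this post, which may not match the voice identifiers the API expects.

```python
from typing import Optional

# Display names from the dialect list above. Assumption: the actual
# voice identifiers used in API requests may differ from these names.
DIALECT_VOICES = {
    "cantonese": "Long Anyue",
    "shaanxi": "Long Shangge",
    "minnan": "Long Anmin",
}


def voice_for_dialect(dialect: str) -> Optional[str]:
    """Return the dialect voice's display name, or None if there is no match."""
    return DIALECT_VOICES.get(dialect.strip().lower())


if __name__ == "__main__":
    print(voice_for_dialect("Cantonese"))
    print(voice_for_dialect("Wu"))
```

Returning None for an unlisted dialect lets the caller fall back to a Mandarin voice instead of failing.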

💡 Typical Use Cases

V3-Flash Ideal For

📚 Audiobooks: Large text volumes, cost-sensitive
🎓 Online Education: Course narration, exercise reading
📱 Content Creation: Short videos, media voiceovers
🤖 Smart Customer Service: High-concurrency conversation scenarios
📻 Podcast Production: Long-form content creation

V3-Plus Ideal For

🎬 Brand Advertising: Premium brand image building
🎮 Game Voiceover: Main character/core role voicing
📺 Documentaries: Professional narration
🏢 Corporate Communications: Company intros, product demos
🎨 Artistic Works: High-quality audio creation