Alibaba Cloud CosyVoice V3 Launch: 31 New Voices, Faster Speed, Better Pricing

2025-12-16
UnifiedTTS Team

We're thrilled to announce that Alibaba Cloud's third-generation CosyVoice models (V3) are now officially available on the UnifiedTTS platform! This major upgrade introduces 31 brand-new voices, faster synthesis, and more competitive pricing.

Two V3 Versions

🚀 CosyVoice V3-Flash: Speed Edition

Model ID: cosyvoice-v3-flash

Pricing: 100 credits/1K characters (half the price of V2!)

Voice Count: 31 high-quality Chinese voices

V3-Flash is designed for high-throughput scenarios, offering excellent audio quality at half the per-character price of V2. Whether for bulk content production or real-time applications, V3-Flash delivers exceptional value.

⭐ CosyVoice V3-Plus: Premium Edition

Model ID: cosyvoice-v3-plus

Pricing: 200 credits/1K characters (same price as V2)

Voice Count: 2 flagship voices (Long Anyang, Long Anhuan)

V3-Plus features two carefully selected premium voices, ideal for professional scenarios demanding the highest audio quality, such as brand voiceovers and premium content production.
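
To get a feel for what the two rates mean for a real project, here is a minimal Python sketch that estimates credit usage from a character count. It uses only the model IDs and per-1K-character rates listed above. The helper name `estimate_credits` is hypothetical, and the sketch assumes billing is strictly proportional to character count, with no rounding, minimum charge, or special character-counting rules; check your account's billing details before relying on the numbers.

```python
from decimal import Decimal

# Credits per 1,000 characters, taken from the pricing listed in this post.
CREDITS_PER_1K = {
    "cosyvoice-v3-flash": Decimal(100),
    "cosyvoice-v3-plus": Decimal(200),
}


def estimate_credits(model_id: str, char_count: int) -> Decimal:
    """Estimate the credits needed to synthesize `char_count` characters.

    Assumption (not confirmed by this post): billing is strictly
    proportional to the character count, with no rounding or minimum.
    """
    if char_count < 0:
        raise ValueError("char_count must be non-negative")
    return CREDITS_PER_1K[model_id] * char_count / 1000


if __name__ == "__main__":
    # Compare both models for a hypothetical 250,000-character audiobook.
    for model_id in CREDITS_PER_1K:
        print(model_id, estimate_credits(model_id, 250_000))
```

Under these assumptions, the same text always costs twice as many credits on V3-Plus as on V3-Flash, so it can make sense to reserve V3-Plus for the parts of a project where the flagship voices matter most.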

🎭 31 New Voices, Rich Expression

The V3 series introduces 31 meticulously crafted Chinese voices, covering different ages, genders, and personality traits:

Wide Age Range

  • Child Voice: 6-10 years (Long Huhu - innocent girl)
  • Youth Voices: 20-30 years (10 voices, full of vitality)
  • Middle-aged Voices: 30-40 years (experienced and composed)
  • Elder Voices: 60 years (Long Laobo, Long Laoyi - weathered wisdom)

Diverse Personality Traits

  • Business Professional: Elegant intellectual woman, wise mature man, precise professional woman
  • Warm & Friendly: Gentle best friend, homey warm man, friendly lively woman
  • Energetic & Youthful: Sunny boy, spirited energetic woman, free-spirited man
  • Classic Characters: Classic Monkey King, cute robot, delicate talented woman

Dialect Voices 🌏

V3 introduces dialect-specific voices for the first time, so your content can sound authentically local:

  • Long Anyue: Spirited Cantonese man (Cantonese support)
  • Long Shangge: Authentic Shaanxi man (Shaanxi dialect support)
  • Long Anmin: Pure Minnan woman (Minnan dialect support)

All voices support both Chinese (Mandarin) and English.
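
If your application picks a voice based on the listener's region, the dialect list above can be kept as a small lookup table. The Python sketch below is illustrative only: it uses the display names from this post, not API voice identifiers (which are not given here), and the helper name `voice_for_dialect` is hypothetical.

```python
from typing import Optional

# Dialect -> voice display name, as listed in this post.
# These are display names, NOT confirmed API voice IDs.
DIALECT_VOICES = {
    "cantonese": "Long Anyue",
    "shaanxi": "Long Shangge",
    "minnan": "Long Anmin",
}


def voice_for_dialect(dialect: str) -> Optional[str]:
    """Return the display name of the voice for a dialect, or None if unlisted."""
    return DIALECT_VOICES.get(dialect.strip().lower())


if __name__ == "__main__":
    print(voice_for_dialect("Cantonese"))
    print(voice_for_dialect("Sichuan"))  # not in the list above, so None
```

Returning `None` for an unlisted dialect leaves the fallback decision to the caller, for example choosing one of the Mandarin voices instead.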

💡 Typical Use Cases

V3-Flash Ideal For

  • 📚 Audiobooks: Large text volumes, cost-sensitive
  • 🎓 Online Education: Course narration, exercise reading
  • 📱 Content Creation: Short videos, media voiceovers
  • 🤖 Smart Customer Service: High-concurrency conversation scenarios
  • 📻 Podcast Production: Long-form content creation

V3-Plus Ideal For

  • 🎬 Brand Advertising: Premium brand image building
  • 🎮 Game Voiceover: Main character/core role voicing
  • 📺 Documentaries: Professional narration
  • 🏢 Corporate Communications: Company intros, product demos
  • 🎨 Artistic Works: High-quality audio creation