Amazon Unveils BASE TTS: Breakthrough 980M-Parameter Text-to-Speech Model with Emergent Abilities

• Researchers at Amazon have created BASE TTS, a text-to-speech model with 980 million parameters, which they describe as the largest text-to-speech model to date.
• A medium-sized, 400M-parameter version showed emergent abilities: it handled complex sentences involving punctuation, emotional expression, foreign words, and similar challenges.
• Published audio examples show the model handling tricky cases such as whispering, questioning intonation, and foreign words quite well.
• The model is streamable: it generates speech incrementally rather than all at once, which reduces bandwidth requirements.
• The researchers did not release the full model, citing the risk of misuse, but further advances in text-to-speech are expected in 2024.