The Eponymous Pickle: Advanced Speech Simulation

Wednesday, January 11, 2023

Advanced Speech Simulation

Security is slipping everywhere ...

Microsoft’s new AI can simulate anyone’s voice with 3 seconds of audio

Text-to-speech model can preserve speaker's emotional tone and acoustic environment.

In Ars Technica, by Benj Edwards, 1/9/2023 ... '

On Thursday, Microsoft researchers announced a new text-to-speech AI model called VALL-E that can closely simulate a person's voice when given a three-second audio sample. Once it learns a specific voice, VALL-E can synthesize audio of that person saying anything—and do it in a way that attempts to preserve the speaker's emotional tone.

Its creators speculate that VALL-E could be used for high-quality text-to-speech applications, speech editing where a recording of a person could be edited and changed from a text transcript (making them say something they originally didn't), and audio content creation when combined with other generative AI models like GPT-3. ... '
