Assistants can now recognize individual voice patterns. I have trained my own that way, usually by repeating phrases, and that categorization is important for understanding context. But this is usually a forced, single-person situation. More broadly, the 'cocktail party' problem is harder: it goes beyond just a voiceprint to also use the context of the words spoken, taking it beyond controlled-environment dialog. A good challenge.
An AI has learned how to pick a single voice out of a crowd
By Richard Gray
Devices like Amazon’s Echo and Google Home can usually deal with requests from a lone person, but like us they struggle in situations such as a noisy cocktail party, where several people are speaking at once.
Now an AI that is able to separate the voices of multiple speakers in real time promises to give automatic speech recognition a big boost, and could soon find its way into an elevator near you.
The technology, developed by researchers at the Mitsubishi Electric Research Laboratory in Cambridge, Massachusetts, was demonstrated in public for the first time at this month’s Combined Exhibition of Advanced Technologies show in Tokyo.
It uses a machine learning technique the team calls "deep clustering" to identify unique features in the "voiceprint" of multiple speakers. It then groups the distinct features from each speaker's voice together, allowing it to disentangle multiple voices and then reconstruct what each person was saying. "It was trained using 100 English speakers, but it can separate voices even if a speaker is Japanese," says Niels Meinke, a spokesperson for Mitsubishi Electric. ...
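The article does not publish Mitsubishi's actual architecture, so below is only a minimal sketch of the general deep-clustering idea it describes: a neural network maps every time-frequency bin of the mixture spectrogram to an embedding, the embeddings are clustered so that each cluster corresponds to one speaker, and the resulting masks are used to reconstruct each voice. The `EmbeddingNet` and `separate` names, the BLSTM layout, and the use of k-means from scikit-learn are all assumptions for illustration; a real system would use a trained model, not the random weights here.

```python
# Sketch of deep-clustering speech separation (assumed architecture, untrained weights).
import torch
import torch.nn as nn
from sklearn.cluster import KMeans


class EmbeddingNet(nn.Module):
    """Maps each time-frequency bin of a spectrogram to a D-dimensional embedding."""

    def __init__(self, n_freq=257, hidden=300, embed_dim=20):
        super().__init__()
        self.blstm = nn.LSTM(n_freq, hidden, num_layers=2,
                             batch_first=True, bidirectional=True)
        self.proj = nn.Linear(2 * hidden, n_freq * embed_dim)
        self.embed_dim = embed_dim

    def forward(self, log_mag):                      # (batch, time, freq)
        h, _ = self.blstm(log_mag)
        v = self.proj(h)                             # (batch, time, freq * D)
        v = v.view(*log_mag.shape, self.embed_dim)   # (batch, time, freq, D)
        return nn.functional.normalize(v, dim=-1)    # unit-norm embeddings


def separate(mixture, n_speakers=2, n_fft=512, hop=128):
    """Split a 1-D mixture waveform into n_speakers estimated waveforms."""
    window = torch.hann_window(n_fft)
    spec = torch.stft(mixture, n_fft, hop, window=window, return_complex=True)
    mag = spec.abs().T.unsqueeze(0)                  # (1, time, freq)

    net = EmbeddingNet(n_freq=spec.shape[0])         # in practice, a trained model
    with torch.no_grad():
        emb = net(torch.log1p(mag))[0]               # (time, freq, D)

    # Cluster every time-frequency bin's embedding; each cluster ~ one speaker.
    flat = emb.reshape(-1, emb.shape[-1]).numpy()
    labels = KMeans(n_clusters=n_speakers, n_init=10).fit_predict(flat)
    labels = torch.from_numpy(labels).reshape(emb.shape[:2])   # (time, freq)

    # Build a binary mask per cluster and invert the masked spectrogram.
    sources = []
    for k in range(n_speakers):
        mask = (labels == k).float().T               # (freq, time)
        est = torch.istft(spec * mask, n_fft, hop, window=window,
                          length=mixture.shape[-1])
        sources.append(est)
    return sources
```

The key design point the article hints at is that the network learns speaker-discriminative embeddings rather than being tied to specific training voices, which is why clustering can still pull apart speakers (even Japanese ones) it never saw during training.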
Monday, November 13, 2017