Thursday, February 21, 2019

Google Makes More Speech Services Available

An impressive array of cognitive speech services, covering 120 languages and variants, is now broadly available, with demonstrations at the link.

Cloud Speech-to-Text

Speech-to-text conversion powered by machine learning and available for short-form or long-form audio.

Powerful speech recognition

Google Cloud Speech-to-Text enables developers to convert audio to text by applying powerful neural network models in an easy-to-use API. The API recognizes 120 languages and variants to support your global user base. You can enable voice command-and-control, transcribe audio from call centers, and more. It can process real-time streaming or prerecorded audio, using Google’s machine learning technology.

Some of the beta features in particular indicate the future direction of these capabilities:

Cloud Speech-to-Text features
Speech-to-text conversion powered by machine learning.
Automatic Speech Recognition
Automatic Speech Recognition (ASR) driven by deep-learning neural networks, for applications such as voice search or speech transcription.
Global Vocabulary
Recognizes 120 languages and variants with an extensive vocabulary.
Phrase Hints
Speech recognition can be customized to a specific context by providing a set of words and phrases that are likely to be spoken. This is especially useful for adding custom words and names to the vocabulary and in voice-control use cases.
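As a rough illustration of phrase hints, the sketch below adds a list of expected phrases to a recognition config expressed as a plain Python dictionary. The `speechContexts` and `phrases` field names follow the public REST reference as I understand it; the helper name `with_phrase_hints` and the sample phrases are my own, so treat this as an assumption-laden sketch rather than the official client usage.

```python
import json


def with_phrase_hints(config, phrases):
    """Return a copy of a recognition config with phrase hints attached.

    `speechContexts` / `phrases` are assumed REST field names;
    verify them against the current API reference before relying on them.
    """
    updated = dict(config)
    updated["speechContexts"] = [{"phrases": list(phrases)}]
    return updated


base_config = {"languageCode": "en-US"}
# Hypothetical custom vocabulary for a voice-control use case.
hinted_config = with_phrase_hints(base_config, ["dim the lights", "Kirkland"])
print(json.dumps(hinted_config, indent=2))
```

The helper copies the config instead of mutating it, so the same base settings can be reused with different hint lists per context.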
Real-time Streaming or Prerecorded Audio Support
Audio input can be streamed from an application’s microphone or sent from a prerecorded audio file (inline or through Google Cloud Storage). Multiple audio encodings are supported, including FLAC, AMR, PCMU, and Linear-16.
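To make the inline versus Cloud Storage distinction concrete, here is a minimal sketch that builds the JSON body for a prerecorded-audio request in either form. The field names (`config`, `audio`, `content`, `uri`, `encoding`, `sampleRateHertz`, `languageCode`) and the enum values are what I believe the REST API uses; the function name, the mapping table, and the bucket path are hypothetical. Nothing here calls the service.

```python
import base64
import json

# Encoding names from the feature list mapped to assumed REST enum values
# (PCMU is G.711 mu-law, which I believe the API calls MULAW).
ENCODINGS = {"FLAC": "FLAC", "AMR": "AMR", "PCMU": "MULAW", "Linear-16": "LINEAR16"}


def build_request(source, encoding, sample_rate_hertz, language_code="en-US"):
    """Build a request body from raw bytes (inline) or a gs:// URI."""
    if isinstance(source, str) and source.startswith("gs://"):
        audio = {"uri": source}
    elif isinstance(source, (bytes, bytearray)):
        # Inline audio travels as base64 text inside the JSON body.
        audio = {"content": base64.b64encode(bytes(source)).decode("ascii")}
    else:
        raise TypeError("source must be raw bytes or a gs:// URI")
    return {
        "config": {
            "encoding": ENCODINGS[encoding],
            "sampleRateHertz": sample_rate_hertz,
            "languageCode": language_code,
        },
        "audio": audio,
    }


inline_request = build_request(b"\x00\x01\x02\x03", "Linear-16", 16000)
# Hypothetical bucket and object name, for illustration only.
stored_request = build_request("gs://example-bucket/call.flac", "FLAC", 44100)
print(json.dumps(stored_request, indent=2))
```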
Auto-Detect Language BETA
When you need to support multilingual scenarios, you can now specify two to four language codes and Cloud Speech-to-Text will identify the correct language spoken and provide the transcript.
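The "two to four language codes" rule can be sketched as a small validation step. I am assuming the beta API takes one primary `languageCode` plus a list of `alternativeLanguageCodes`; the helper name is my own and the language codes are just examples.

```python
def language_config(language_codes):
    """Split two to four candidate codes into a primary and its alternatives.

    `languageCode` / `alternativeLanguageCodes` are assumed field names
    from the beta REST reference.
    """
    codes = list(language_codes)
    if not 2 <= len(codes) <= 4:
        raise ValueError("expected two to four language codes")
    return {"languageCode": codes[0], "alternativeLanguageCodes": codes[1:]}


detect_config = language_config(["en-US", "es-ES", "fr-FR"])
print(detect_config)
```

Listing the most likely language first seems the natural choice, since it occupies the primary slot.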
Noise Robustness
Handles noisy audio from many environments without requiring additional noise cancellation.
Inappropriate Content Filtering
Filter inappropriate content in text results for some languages.
Automatic Punctuation BETA
Accurately punctuates transcriptions (e.g., commas, question marks, and periods) with machine learning.
Model Selection BETA
Choose from a selection of four pre-built models: default, voice commands and search, phone calls, and video transcription.
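For model selection, a lookup table is probably all an application needs. The four identifiers below (`default`, `command_and_search`, `phone_call`, `video`) are the `model` values I believe the API accepts for the four models named above; the use-case keys and the helper are hypothetical.

```python
# Use-case labels (my own) mapped to assumed API model identifiers.
MODELS = {
    "general": "default",
    "voice commands and search": "command_and_search",
    "phone calls": "phone_call",
    "video transcription": "video",
}


def with_model(config, use_case):
    """Return a copy of the config with the model for a given use case."""
    updated = dict(config)
    # Fall back to the default model for unknown use cases.
    updated["model"] = MODELS.get(use_case, "default")
    return updated


call_center_config = with_model({"languageCode": "en-US"}, "phone calls")
print(call_center_config)
```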
Speaker Diarization BETA
Know who said what - you can now get automatic predictions about which of the speakers in a conversation spoke each utterance.
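To show what "who said what" might look like downstream, this sketch groups consecutive words by speaker into utterances. It assumes each recognized word arrives with a `word` string and an integer `speakerTag`, which is how I understand the beta response to be shaped. The sample words are invented for illustration and are not real service output.

```python
from itertools import groupby


def utterances_by_speaker(words):
    """Merge consecutive words that share a speaker tag into utterances."""
    return [
        (tag, " ".join(item["word"] for item in group))
        for tag, group in groupby(words, key=lambda item: item["speakerTag"])
    ]


# Made-up word list in the assumed response shape.
sample_words = [
    {"word": "hello", "speakerTag": 1},
    {"word": "there", "speakerTag": 1},
    {"word": "hi", "speakerTag": 2},
    {"word": "how", "speakerTag": 1},
    {"word": "are", "speakerTag": 1},
    {"word": "you", "speakerTag": 1},
]

for tag, text in utterances_by_speaker(sample_words):
    print(f"Speaker {tag}: {text}")
```

Because `groupby` only merges adjacent items, a speaker who talks twice yields two separate utterances, which preserves the conversational order.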
Multichannel Recognition BETA
In multiparticipant recordings where each participant is recorded in a separate channel (e.g., phone call with two channels or video conference with four channels), Cloud Speech-to-Text will recognize each channel separately and then annotate the transcripts so that they follow the same order as in real life. ....
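A multichannel request could be configured along these lines. The `audioChannelCount` and `enableSeparateRecognitionPerChannel` field names are my reading of the REST reference, and the helper itself is hypothetical, so check the current documentation before using it.

```python
def multichannel_config(config, channel_count):
    """Return a copy of the config set up for per-channel recognition."""
    if channel_count < 2:
        raise ValueError("multichannel recognition needs at least two channels")
    updated = dict(config)
    updated["audioChannelCount"] = channel_count
    updated["enableSeparateRecognitionPerChannel"] = True
    return updated


# Two-channel phone call, as in the example above.
phone_config = multichannel_config({"languageCode": "en-US"}, 2)
print(phone_config)
```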

