
Sunday, January 31, 2021

What's a Transformer?

A major advance in NLP capabilities? VentureBeat does a good job of explaining this and suggests it will be one of the major AI advances of the year. As I see it, this is essentially a library and a model that support better conversation between people and machines. The DeepSpeed library is open-sourced. Worth a deeper look.

Microsoft trains world’s largest Transformer language model, by Khari Johnson (@kharijohnson) in VentureBeat

Microsoft AI & Research today shared what it calls the largest Transformer-based language generation model ever and open-sourced a deep learning library named DeepSpeed to make distributed training of large models easier.

At 17 billion parameters, Turing NLG is twice the size of Nvidia’s Megatron, now the second biggest Transformer model, and includes 10 times as many parameters as OpenAI’s GPT-2. Turing NLG achieves state-of-the-art results on a range of NLP tasks.

Like Google’s Meena and initially with GPT-2, at first Turing NLG may only be shared in private demos.

Language generation models with the Transformer architecture predict the word that comes next. They can be used to write stories, generate answers in complete sentences, and summarize text.
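To make "predict the word that comes next" concrete, here is a minimal Python sketch of the generation loop. It is not a Transformer: a simple table of word-pair counts stands in for the model, and the corpus and function names (CORPUS, train_bigrams, predict_next, generate) are made up for illustration. What carries over is the loop itself, in which each predicted word is appended and then used to predict the following one.

```python
from collections import Counter, defaultdict

# Hypothetical toy corpus; a real language model is trained on far more text.
CORPUS = "the model predicts the next word and the next word follows the model"


def train_bigrams(text):
    """Count, for each word, which words follow it in the text."""
    follows = defaultdict(Counter)
    words = text.split()
    for prev, nxt in zip(words, words[1:]):
        follows[prev][nxt] += 1
    return follows


def predict_next(follows, word):
    """Return the most frequent follower of `word`, or None if it has none."""
    candidates = follows.get(word)
    if not candidates:
        return None
    # Highest count wins; ties are broken alphabetically to stay deterministic.
    return min(candidates, key=lambda w: (-candidates[w], w))


def generate(follows, start, max_words=5):
    """Greedy generation: repeatedly append the predicted next word."""
    out = [start]
    for _ in range(max_words):
        nxt = predict_next(follows, out[-1])
        if nxt is None:
            break
        out.append(nxt)
    return " ".join(out)


if __name__ == "__main__":
    table = train_bigrams(CORPUS)
    print(generate(table, "next", max_words=3))
```

A Transformer language model replaces the count table with a neural network that scores every word in its vocabulary given all the preceding text, but story writing, question answering, and summarization are all driven by this same append-and-predict loop.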

Experts from across the AI field told VentureBeat 2019 was a seminal year for NLP models using the Transformer architecture, an approach that led to advances in language generation and GLUE benchmark leaders like Facebook’s RoBERTa, Google’s XLNet, and Microsoft’s MT-DNN.

Also today: Microsoft open-sourced DeepSpeed, a deep learning library that’s optimized for developers to deliver low latency, high throughput inference.

DeepSpeed contains the Zero Redundancy Optimizer (ZeRO) for training models with 100 million parameters or more at scale, which Microsoft used to train Turing NLG.

“Beyond saving our users time by summarizing documents and emails, T-NLG can enhance experiences with the Microsoft Office suite by offering writing assistance to authors and answering questions that readers may ask about a document,” Microsoft AI Research applied scientist Corby Rosset wrote in a blog post today.  ... " 
