Skeptical here, but taking a look, in particular at how this is measured. If generally true, this would be a very big thing. But then why can I still so quickly confuse Alexa, Google Home, or Siri? Is it because understanding bare language is very different from participating in a contextually fluid conversation?
AI Models from Google and Microsoft Exceed Human Performance on Language Understanding Benchmark, by Anthony Alford in InfoQ
"Research teams from Google and Microsoft have recently developed natural language processing (NLP) AI models which have scored higher than the human baseline score on the SuperGLUE benchmark. SuperGLUE measures a model's score on several natural language understanding (NLU) tasks, including question answering and reading comprehension.
Both teams submitted their models to the SuperGLUE Leaderboard on January 5. Microsoft Research's model Decoding-enhanced BERT with disentangled attention (DeBERTa) scored a 90.3 on the benchmark, slightly beating Google Brain's model, based on the Text-to-Text Transfer Transformer (T5) and the Meena chatbot, which scored 90.2. Both exceeded the human baseline score of 89.8. Microsoft has open-sourced a smaller version of DeBERTa and announced plans to release the code and models for the latest model. Google has not published details of their latest model; while the T5 code is open-source, the Meena chatbot is not.
The General Language Understanding Evaluation (GLUE) benchmark was developed in 2019 as a method for evaluating the performance of NLP models such as BERT and GPT. GLUE is a collection of nine NLU tasks based on publicly-available datasets. Because of the rapid pace of improvement in NLP models, GLUE's evaluation "headroom" has diminished, and researchers introduced SuperGLUE, a more challenging benchmark. ... "