BigScience and Open Language Models
Inside BigScience, the quest to build a powerful open language model
Kyle Wiggers (@Kyle_L_Wiggers), VentureBeat
January 10, 2022 9:30 AM
Roughly a year ago, Hugging Face, a Brooklyn, New York-based natural language processing startup, launched BigScience, an international project with more than 900 researchers that is designed to better understand and improve the quality of large natural language models. Large language models (LLMs) — algorithms that can recognize, predict, and generate language on the basis of text-based datasets — have captured the attention of entrepreneurs and tech enthusiasts alike. But the costly hardware required to develop LLMs has kept them largely out of reach of researchers without the resources of companies like OpenAI and DeepMind behind them.
Taking inspiration from large-scale scientific collaborations like the European Organization for Nuclear Research (CERN) and its Large Hadron Collider, BigScience aims to create LLMs and large text datasets that will eventually be open-sourced to the broader AI community. The models will be trained on the Jean Zay supercomputer located near Paris, France, which ranks among the most powerful machines in the world.
While the implications for the enterprise might not be immediately clear, efforts like BigScience promise to make LLMs more accessible — and transparent — in the future. With the exception of several models created by EleutherAI, an open AI research group, few trained LLMs exist for research or deployment into production. OpenAI has declined to open-source its most powerful model, GPT-3, in favor of exclusively licensing the source code to Microsoft. Meanwhile, companies like Nvidia have released the code for capable LLMs, but left the training of those LLMs to users with sufficiently powerful hardware. ...
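To make the accessibility point concrete: EleutherAI's models are published on the Hugging Face Hub and can be loaded with a few lines of code. Below is a minimal sketch, assuming the transformers and torch packages are installed and using EleutherAI's publicly released GPT-Neo 1.3B checkpoint; the prompt and sampling settings are illustrative, not drawn from the article.

    # Minimal sketch: load an openly released LLM and sample a continuation.
    # Assumes: pip install transformers torch
    # "EleutherAI/gpt-neo-1.3B" is EleutherAI's public checkpoint on the Hub.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "EleutherAI/gpt-neo-1.3B"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)

    prompt = "Open language models matter because"  # illustrative prompt
    inputs = tokenizer(prompt, return_tensors="pt")

    # Sample a short continuation; decoding settings are illustrative.
    outputs = model.generate(
        **inputs,
        max_new_tokens=40,
        do_sample=True,
        top_p=0.9,
        temperature=0.8,
    )
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))

This is exactly the kind of workflow that closed models rule out: no license negotiation, no hosted API — the weights download directly and run on a single machine, albeit slowly without a GPU.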