/* ---- Google Analytics Code Below */

Wednesday, April 12, 2023

AI Explainer: Foundation Models ​and the Next Era of AI

Using Bing GPT among  others now.

AI Explainer: Foundation models ​and the next era of AI   Published March 23, 2023

By Ahmed H. Awadallah , Senior Principal Research Manager, Microsoft

The release of OpenAI’s GPT-4 is a significant advance that builds on several years of rapid innovation in foundation models. GPT-4, which was trained on the Microsoft Azure AI supercomputer, has exhibited significantly improved abilities across many dimensions—from summarizing lengthy documents, to answering complex questions about a wide range of topics and explaining the reasoning behind those answers, to telling jokes and writing code and poetry.

Microsoft Senior Principal Research Manager Ahmed H. Awadallah was among a group of researchers across the company who have worked in partnership with OpenAI over several months to evaluate this new model’s capabilities. In this video, recapped below, he tells the story of the technical innovations in recent years that have brought us to this moment: the surprising progress of GPT-4’s predecessor models, leading up to the capabilities demonstrated in ChatGPT, and the integration of the latest models into Bing.

In this article

Everyday impact: Integrating foundation models and products [19:09-27:20]

While watching this video, you can hover to see video chapter titles and jump directly to those you’re interested in.

Over the last decade, AI has made significant progress on perception tasks like image recognition and language processing. More recently, the field is witnessing new advances in the form of generative AI, underpinned by a class of large-scale models known as foundation models. Foundation models are trained on massive amounts of data and are capable of performing a wide range of tasks. With a simple natural language prompt like “describe a scene of the sun rising over the beach,” generative AI models can output a detailed description or produce an image based on the generated description, which can then be animated or even turned into video. Many recent language models are not only good at generating text but also generating, explaining, and debugging code.

Three components have been driving these advances:

The transformer architecture: A popular choice across modalities, the transformer architecture is efficient, easy to scale and parallelize, and can model interdependence between different components in input and output data.

Scale: Growing model size and the use of increasingly large amounts of data have resulted in what is being termed as “emerging capabilities.” When models reach a critical size, they begin displaying capabilities not previously present.

In-context learning: Showing potential on a range of applications, from text classification to translation and summarization, this new training paradigm provides pre-trained models with instructions for new tasks or just a few examples instead of training or fine-tuning models on labeled data. Because no additional data or training is needed and prompts are provided in natural language, models can be applied right out of the box and aren’t limited to those with developer experience.

From GPT-3 to ChatGPT – a jump in generative capabilities [11:02-19:07]

With the November 2022 release of ChatGPT, a language model optimized for dialogue, we saw exciting developments in text generation. Compared with GPT-3, an earlier language model in the GPT family, ChatGPT not only provides longer, more thorough, and more structured responses to questions and instructions but can also produce answers in different styles, or tones, and tailor explanations to different audiences, like a child, a first-year college student, or someone with a PhD.

Earlier language models such as GPT-3 were trained to predict the next word in a sentence using large amounts of text from the web with no direct human supervision. Several additional training approaches have helped fuel the improved performance of later models such as ChatGPT. These models are being trained on code in addition to text, which seems to be providing another opportunity to identify the relationship between different parts of speech. This is resulting in models that are better at following instructions and reasoning than models trained on text alone. Human-generated data is also contributing to better outputs. Instruction tuning adds the step of training models on prompts and responses created by a human, while model-generated responses ranked by a human are being employed to train a reward model that can be used to train the main model with reinforcement learning.

The fast-paced advancements demonstrated by these models have challenged one of the traditional methods used to measure progress: benchmarks. Improvements are happening so fast that benchmarks are becoming obsolete, with many solved or saturated as quickly as they come out.

Everyday impact: Integrating foundation models and products [19:09-27:20]

Foundation models are already appearing in products available today. For example, GitHub Copilot leverages OpenAI Codex to assist in writing code. The AI pair programmer has been shown to not only make developers feel more productive but to support them in actually getting more done. A GitHub study found participants using Copilot were 55 percent more productive than participants without access to Copilot.

Combining language models optimized for dialogue with external knowledge sources and tools is another avenue for improved experiences. The new Bing, for instance, brings together these models and search. Years of research have yielded insight into the web search experience; much of it involves reviewing and synthesizing information across a variety of resources identified via multiple queries, which is time-consuming. The new Bing can do the heavy lifting for the searcher, working behind the scenes to make the necessary queries, collect results, synthesize the information, and present a single complete answer.

Large language models and foundation models more broadly are not without their limitations, however. There are issues such as reliability, accuracy, staleness, and provenance that need to be explored. Additionally, each specific application of one of these models comes with its own challenges and opportunities. For example, in applying foundation models to web search, we need to rethink the overall user experience, including how people interact with search and how we improve, measure, and personalize the experience over time.  ....  

No comments: