/* ---- Google Analytics Code Below */

Sunday, March 19, 2023

OpenAI Launches GPT-4 (Announcement, Description and Further Details)

 (Link throughs will be updated) 

OpenAI launches GPT-4!

MLOps Newsletter <mlops@substack.com> Unsubscribe

2:02 PM (2 hours ago)  to me

Open in app or online

You’re on the free list for MLOps Newsletter. For the full experience, become a paying subscriber.

Upgrade to paid

OpenAI launches GPT-4!

OpenAI introduces Whisper and ChatGPT APIs for commercial use cases


OpenAI released GPT-4 and it is a very significant improvement over GPT-3 or ChatGPT.

It is significantly better than ChatGPT(GPT3.5) in a variety of tasks through GPT-4 research.

Its main capabilities are:

Academic success as you see above graph, through visual inputs, it can solve a variety of exams.

Steerability: through socratic method, you can guide/influence and teach the model in a direction and correct some of its answers afterwards.

Visual Inputs: it accepts and can process the visual inputs.

You can learn more about doing the following things about GPT-4:

Read paper, View system card

Try on ChatGPT Plus

Join API waitlist

Rewatch demo livestream

Contribute to OpenAI Evals

My experience going through some examples:

answers become much more concise(comparing to verbosity in ChatGPT)

it can actually construct the sentences clearer and it can produce better/more readable long paragraphs comparing to ChatGPT.

Its code debugging capability is much better than ChatGPT.

The runs might take longer than ChatGPT. This might improve in future, though.

OpenAI also introduced ChatGPT and Whisper APIs officially. Engineers can now integrate ChatGPT and Whisper models into their apps and products through our API.

Google published a post on Vid2Seq, their new framework for captioning videos. The Vid2Seq architecture augments a language model with special time tokens, allowing it to seamlessly predict event boundaries and textual descriptions in the same output sequence. In order to pre-train this unified model, we leverage unlabeled narrated videos by reformulating sentence boundaries of transcribed speech as pseudo-event boundaries, and using the transcribed speech sentences as pseudo-event captions. The Vid2Seq architecture includes a visual encoder and a text encoder, which encode the video frames and the transcribed speech input, respectively. The resulting encodings are then forwarded to a text decoder, which autoregressively predicts the output sequence of dense event captions together with their temporal localization in the video. The architecture is initialized with a powerful visual backbone and a strong language model.

Yi Tay wrote about release of a new Flan 20B parameter model with UL2. In “Scaling Instruction-Finetuned language models (Chung et al.)” (also referred to sometimes as the Flan2 paper), the key idea is to train a large language model on a collection of datasets. These datasets are phrased as instructions which enable generalization across diverse tasks. Flan has been primarily trained on academic tasks. In Flan2, we released a series of T5 models ranging from 200M to 11B parameters that have been instruction tuned with Flan. UL2 is a unified framework for pretraining models that are universally effective across datasets and setups.


Evals is a framework for evaluating OpenAI models and an open-source registry of benchmarks.

You can use Evals to create and run evaluations that:

use datasets to generate prompts,

measure the quality of completions provided by an OpenAI model, and

compare performance across different datasets and models.

To get started:

Read through this doc and follow the setup instructions below.

Learn how to run existing evals: run-evals.md.

Familiarize yourself with the existing eval templates: eval-templates.md.

Walk through the process for building an eval: build-eval.md

See an example of implementing custom eval logic: custom-eval.md.

minrev is a PyTorch reimplementation of Reversible Vision Transformer architecture that is prefers simplicity over tricks, hackability over tedious organization, and interpretability over generality.

It is meant to serve as an educational guide for newcomers that are not familiar with the reversible backpropagation algorithm and reversible vision transformer.

The entire Reversible Vision Transformer is implemented from scratch in under <300 lines of pytorch code, including the memory-efficient reversible backpropagation algorithm (<100 lines). Even the driver code is < 150 lines. The repo supports both memory-efficient training and testing on CIFAR-10.

Vid2Seq is a single-stage dense video captioning model, pre-trained on narrated videos introduced in "Vid2Seq: Large-Scale Pretraining of a Visual Language Model for Dense Video Captioning". The model takes frames and transcribed speech from an untrimmed minutes-long video as input, and outputs dense event captions together with their temporal localization in the video by predicting a single sequence of tokens. Pre-training is done with a generative and a denoising objective exploiting transcribed speech as pseudo dense event captioning supervision, using millions of narrated videos from YT-Temporal-1B. It is based on Jax.

evals has datasets written by language models, used in the paper on "Discovering Language Model Behaviors with Model-Written Evaluations."

These datasets are useful for:

Those who are interested in understanding the quality and properties of model-generated data

Those who wish to use our datasets to evaluate other models for the behaviors we examined in our work (e.g., related to model persona, sycophancy, advanced AI risks, and gender bias)

The evaluations were generated to be asked to dialogue agents (e.g., a model finetuned explicitly respond to a user's utterances, or a pretrained language model prompted to behave like a dialogue agent). However, it is possible to adapt the data to test other kinds of models as well.

CondaQA is a dataset to facilitate the future development of models that can process negation.

We collect paragraphs with diverse negation cues, and have crowdworkers ask questions about the implications of the negated statement in the passage. We have workers make three kinds of edits to the passage, before providing answers for the original passage and its edits:

📌 paraphrasing the negated statement

📌 changing the scope of the negation

📌 reversing the negation

FlexGen is a high-throughput generation engine for running large language models with limited GPU memory. FlexGen allows high-throughput generation by IO-efficient offloading, compression, and large effective batch sizes.

iBall, a basketball video-watching system that leverages gaze-moderated embedded visualizations to facilitate game understanding and engagement of casual fans. Video broadcasting and online video platforms make watching basketball games increasingly accessible. Yet, for new or casual fans, watching basketball videos is often confusing due to their limited basketball knowledge and the lack of accessible, on-demand information to resolve their confusion.

streamlit-jupyter is a Python package to preview and develop streamlit apps in jupyter notebooks.

UL2 uses Mixture-of-Denoisers (MoD), apre-training objective that combines diverse pre-training paradigms together. UL2 introduces a notion of mode switching, wherein downstream fine-tuning is associated with specific pre-training schemes.

On Twitter

Twitter avatar for @nearcyan

near    @nearcyan

"We offer no explanation as to why these architectures seem to work; we attribute their success, as all else, to divine benevolence" - Noam Shazeer (second author of the transformer paper, now CEO of Character AI) from the SwiGLU paper: arxiv.org/abs/2002.05202…


2:22 AM ∙ Mar 14, 2023


You're currently a free subscriber to MLOps Newsletter. For the full experience, upgrade your subscription.

Upgrade to paid

 Read MLOps Newsletter in the app

Listen to posts, join subscriber chats, and never miss an update from Bugra Akyildiz.

Get the iOS appGet the Android app

© 2023

548 Market Street PMB 72296, San Francisco, CA 94104

No comments: