
Saturday, July 25, 2020

GPT-3 and Crypto Assets

I have recently been discussing digital assets and how advances in crypto might influence them. Here is a piece from Coindesk that covers some of the issues. I am still thinking this over, so let me know if you have comments. The full opinion piece is at the link.

Crypto Needn’t Fear GPT-3. It Should Embrace It
Jul 22, 2020 at 17:47 UTC   By Jesus Rodriguez  in Coindesk

Jesus Rodriguez is the CEO of IntoTheBlock, a market intelligence platform for crypto assets. He has held leadership roles at major technology companies and hedge funds. He is an active investor, speaker, author and guest lecturer at Columbia University. 

During the last few days, there has been an explosion of commentary in the crypto community about OpenAI’s new GPT-3 language generator model. Some of the comments express useful curiosity about GPT-3, while others are a bit to the extreme, asserting that the crypto community should be terrified about it. 

The interest is somewhat surprising because the GPT models are not exactly new and they have been making headlines in the machine learning community for over a year now. The research behind the first GPT model was published in June 2018, followed by GPT-2 in February 2019 and most recently GPT-3 two months ago. 

See also: What Is GPT-3 and Should We Be Terrified?

I think it is unlikely that GPT-3 by itself can have a major impact in the crypto ecosystem. However, the techniques behind GPT-3 represent the biggest advancement in deep learning in the last few years and, consequently, can become incredibly relevant to the analysis of crypto-assets. In this article, I would like to take a few minutes to dive into some of the concepts behind GPT-3 and contextualize it to the crypto world.  .... 




What is GPT-3?
GPT-3 is a massively large natural language understanding (NLU) model that uses an astonishing 175 billion parameters to master several language tasks. The size makes GPT-3 the largest NLU model in the world, surpassing Microsoft’s Turing-NLG and its predecessor GPT-2. 

GPT-3 is able to perform several language tasks such as machine translation, question answering, language analysis and, of course, text generation. GPT-3 has captured the attention of the media for its ability to generate fake text that is indistinguishable from real. 

How is this relevant for crypto? Imagine having the ability to regularly generate fake press releases that move the price of the smaller crypto assets? Sounds like a scary threat, but it is not the most important part of GPT-3. 

GPT-3 is a language-based model and, consequently, operates using textual datasets. From the crypto market standpoint, that capability is cool but certainly not that interesting. What we should really be paying attention to are the techniques behind GPT-3. 

The magic behind GPT-3
GPT-3 is based on a new deep learning architecture known as transformers. The concept of transformers was originally outlined in the paper “Attention is all you need,” published in 2017 by members of the Google Brain team. 

The main innovation of the transformer architecture is the concept of “attention” (hence the title of the paper). Attention is typically used in a type of problem known as Seq2Seq, in which a model processes a sequence of items (words, letters, numbers) and outputs a different sequence. This type of problem is incredibly common in language intelligence scenarios such as text generation, machine translation, question answering and so on. 

Every time you see a Seq2Seq scenario, you should associate it with what’s called encoder-decoder architectures. Encoders capture the context of the input sequence and pass it to the decoder, which produces the output sequence. Attention mechanisms address the limitations of traditional neural network architectures by identifying the key aspects of the input that should be “paid attention to.” 


Think about a machine translation scenario from Spanish to English. Typically, the encoder will translate the Spanish text input into an intermediate representation known as the “imaginary language” that will be used by the decoder to translate it into English. More traditional deep learning architectures need constant feedback between encoders and decoders, which makes them highly inefficient. 

Conceptually, attention-mechanisms look at an input sequence and decide at each step what other parts of the sequence are important. For instance, in a machine translation scenario, the attention mechanism would highlight words the decoder “should pay attention to” to perform the translation. 
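
The idea of weighing each part of a sequence against every other part can be sketched in a few lines. What follows is a minimal illustration of scaled dot-product attention (the form described in "Attention is all you need") in plain Python, not the code behind GPT-3. The three-word input and its two-dimensional embeddings are made up for the example, and the learned projection matrices a real transformer would apply are left out.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def attention(queries, keys, values):
    """Scaled dot-product attention over lists of equal-length vectors.

    Returns the mixed output vectors and the attention weights per query.
    """
    scale = math.sqrt(len(keys[0]))
    outputs, all_weights = [], []
    for q in queries:
        # How strongly this position should attend to every position.
        weights = softmax([dot(q, k) / scale for k in keys])
        # Weighted average of the value vectors.
        mixed = [
            sum(w * v[i] for w, v in zip(weights, values))
            for i in range(len(values[0]))
        ]
        outputs.append(mixed)
        all_weights.append(weights)
    return outputs, all_weights


# Hypothetical embeddings for a three-word input (assumed values).
tokens = ["el", "gato", "negro"]
embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

# Self-attention: the sequence attends to itself.
outputs, weights = attention(embeddings, embeddings, embeddings)
```

Each row of `weights` sums to 1 and says how much one word draws on every word in the sequence. Because all rows can be computed independently, this step parallelizes well.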

The transformer architecture that powered models like GPT-3 is a traditional encoder-decoder architecture that inserts attention blocks to improve efficiency. The role of that block is to look at the entire input and current outputs and infer dependencies that will help to optimize the production of the final output.

The transformer architecture has produced models that can be trained on massively large datasets and can be parallelized efficiently. Not surprisingly, after the original Google paper, there has been a race to build super large models that master different language tasks. Google’s BERT, Facebook’s RoBERTa, Microsoft’s Turing-NLG and OpenAI’s GPT-3 are newer examples of these models. 

GPT-2 astonished the world by operating using 1.5 billion parameters. That record was smashed by Microsoft’s Turing-NLG, which used 17 billion parameters, only for GPT-3 to use a ridiculous 175 billion parameters. All that happened in a year. Plain and simple: when it comes to transformers, bigger is better.  .... 
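
To put those jumps in proportion, here is a quick back-of-the-envelope calculation using only the parameter counts quoted above. The variable names are arbitrary.

```python
# Parameter counts as quoted in the article.
params = {"GPT-2": 1.5e9, "Turing-NLG": 17e9, "GPT-3": 175e9}

names = list(params)

# Growth factor from each model to the next one in the list.
growth = {
    f"{a} -> {b}": params[b] / params[a]
    for a, b in zip(names, names[1:])
}

# Overall growth from GPT-2 to GPT-3.
overall = params["GPT-3"] / params["GPT-2"]

for step, factor in growth.items():
    print(f"{step}: about {factor:.1f}x")
print(f"GPT-2 -> GPT-3: about {overall:.0f}x")
```

Each step is roughly a tenfold increase (about 11.3x, then about 10.3x), for an overall growth of about 117x.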

The first generation of transformer architectures has focused on language tasks. But companies like Facebook and OpenAI have published recent research adapting transformer models to image classification. You might think that this is just an attempt to generate fake images. But the impact goes way beyond that.

Fake image generation is super important to streamline the training of image classification models in the absence of large labeled datasets. There have been attempts to adapt transformers to financial time series datasets, with the hope they can advance quantitative trading strategies.   ...

