CS25 | Stanford Seminar - Transformers in Language: The Development of GPT Models Including GPT-3

While the Transformer architecture is used in a variety of applications across a number of domains, it first found success in natural language. Today, Transformers remain the de facto model in language - they achieve state-of-the-art results on most natural language benchmarks, and can generate text coherent enough to deceive human readers. In this talk, we will review recent progress in neural language modeling, discuss the link between generating text and solving downstream tasks, and explore how this led to the development of GPT models at OpenAI. Next, we’ll see how the same approach can be used to produce generative models and strong representations in other domains like images, text-to-image, and code. Finally, we will dive into the recently released code generating model, Codex, and examine this particularly interesting domain of study.


Mark Chen is a research scientist at OpenAI, where he manages the Algorithms Team. His research interests include generative modeling and representation learning, especially in the image and multimodal domains. Prior to OpenAI, Mark worked in high frequency trading and graduated from MIT. Mark is also a coach for the USA Computing Olympiad team.


A full list of guest lectures can be found here: https://www.youtube.com/playlist?list=PLoROMvodv4r

0:00 Introduction
0:08 3-Gram Model (Shannon 1951)
0:27 Recurrent Neural Nets (Sutskever et al 2011)
1:12 Big LSTM (Jozefowicz et al 2016)
1:52 Transformer (Liu and Saleh et al 2018)
2:33 GPT-2: Big Transformer (Radford et al 2019)
3:38 GPT-3: Very Big Transformer (Brown et al 2020)
5:12 GPT-3: Can Humans Detect Generated News Articles?
9:09 Why Unsupervised Learning?
10:38 Is there a Big Trove of Unlabeled Data?
11:11 Why Use Autoregressive Generative Models for Unsupervised Learning?
13:00 Unsupervised Sentiment Neuron (Radford et al 2017)
14:11 GPT-1 (Radford et al 2018)
15:21 Zero-Shot Reading Comprehension
16:44 GPT-2: Zero-Shot Translation
18:15 Language Model Metalearning
19:23 GPT-3: Few Shot Arithmetic
20:14 GPT-3: Few Shot Word Unscrambling
20:36 GPT-3: General Few Shot Learning
23:42 iGPT (Chen et al 2020): Can we apply GPT to images?
25:31 iGPT: Completions
26:24 iGPT: Feature Learning
32:20 Isn't Code Just Another Modality?
33:33 The HumanEval Dataset
36:00 The Pass@k Metric
36:59 Codex: Training Details
38:03 An Easy HumanEval Problem (pass@1 ≈ 0.9)
38:36 A Medium HumanEval Problem (pass@1 ≈ 0.17)
39:00 A Hard HumanEval Problem (pass@1 ≈ 0.005)
41:26 Calibrating Sampling Temperature for Pass@k
42:19 The Unreasonable Effectiveness of Sampling
43:17 Can We Approximate Sampling Against an Oracle?
45:52 Main Figure
46:53 Limitations
47:38 Conclusion
48:19 Acknowledgements
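
For viewers jumping to the 36:00 chapter: a minimal sketch of the unbiased pass@k estimator described in the Codex paper (Chen et al 2021). The function name pass_at_k and the example numbers below are illustrative, not taken from the talk:

from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    # Unbiased estimate of pass@k for one problem:
    # n = total samples generated, c = samples passing the unit tests, k = budget.
    # Probability that at least one of k samples, drawn without replacement
    # from the n generated, is correct: 1 - C(n - c, k) / C(n, k).
    if n - c < k:
        return 1.0  # fewer than k incorrect samples, so any k-subset contains a pass
    return 1.0 - comb(n - c, k) / comb(n, k)

# Example: 200 samples with 30 passing gives pass@1 = 0.15 and pass@100 close to 1.0
print(pass_at_k(200, 30, 1), pass_at_k(200, 30, 100))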

#gpt3
