Lecture 12.3 Famous transformers (BERT, GPT-2, GPT-3)
ERRATA:
In the "original transformer" (slide 51), in the source attention, the key and value come from the encoder, and the query comes from the decoder.
In this lecture we look at the details of some famous transformer models: how they were trained, and what they could do once trained.
slides: https://dlvu.github.io/slides/dlvu.lecture12.pdf
course website: https://dlvu.github.io
Lecturer: Peter Bloem