Lecture 12.3 Famous transformers (BERT, GPT-2, GPT-3)
ERRATA:
In the "original transformer" (slide 51), in the source attention, the key and value come from the encoder, and the query comes from the decoder.
In this lecture we look at the details of some famous transformer models: how they were trained, and what they could do once trained.
slides: https://dlvu.github.io/slides/dlvu.lecture12.pdf
course website: https://dlvu.github.io
Lecturer: Peter Bloem