Fine-tuning GPT Neo 2.7B and 1.3B
Fine-tuning larger models can be tricky on consumer hardware. In this video I cover why it's better to fine-tune large models rather than smaller ones, the problem with the naive approach to fine-tuning, and finally how to use DeepSpeed to successfully fine-tune even the largest GPT Neo model.
Notebook Git repo: https://github.com/mallorbc/GP....T_Neo_fine-tuning_no
Fine-tuning repo: https://github.com/Xirider/finetune-gpt2xl
DeepSpeed repo: https://github.com/microsoft/DeepSpeed
Happy Transformer: https://happytransformer.com/
GPT article with images: https://towardsdatascience.com..../gpt-3-the-new-might
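As a rough starting point, here is a minimal sketch of the kind of DeepSpeed-backed fine-tuning run covered in the video, using the Hugging Face Trainer. The script name, data file (train.txt), DeepSpeed config (ds_config.json), and hyperparameters are illustrative placeholders, not the exact setup from the notebook or the finetune-gpt2xl repo.

```python
# finetune_gpt_neo.py -- hypothetical name; launch with the DeepSpeed launcher, e.g.:
#   deepspeed finetune_gpt_neo.py
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

model_name = "EleutherAI/gpt-neo-1.3B"   # or "EleutherAI/gpt-neo-2.7B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token  # GPT Neo has no pad token by default
model = AutoModelForCausalLM.from_pretrained(model_name)

# Placeholder dataset: one training example per line in train.txt
dataset = load_dataset("text", data_files={"train": "train.txt"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = dataset.map(tokenize, batched=True, remove_columns=["text"])

args = TrainingArguments(
    output_dir="gpt-neo-finetuned",
    num_train_epochs=1,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,
    fp16=True,
    deepspeed="ds_config.json",  # ZeRO/offload config; see the DeepSpeed and finetune-gpt2xl repos
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized["train"],
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```

The key point is that passing a DeepSpeed config to the Trainer lets ZeRO partition optimizer state (and optionally offload it to CPU), which is what makes the 2.7B model fit on a single consumer GPU where the naive approach runs out of memory.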
Timestamps
00:00 - Intro
00:36 - Background on fine-tuning
02:41 - Setting up Jupyter
05:17 - Incorrect naive fine-tuning method
10:02 - Correctly fine-tuning with DeepSpeed
15:35 - Fine-tuning the 2.7B model
16:35 - Fine-tuning the 1.3B model
17:55 - Looking at the README
20:08 - Outro and future work