Fine-tuning GPT Neo 2.7B and 1.3B
Fine-tuning larger models can be tricky on consumer hardware. In this video I cover why it's better to fine-tune large models rather than smaller ones, the problem with the naive approach to fine-tuning, and finally how to use DeepSpeed to successfully fine-tune even the largest GPT Neo model.
Notebook Git repo: https://github.com/mallorbc/GP....T_Neo_fine-tuning_no
Fine-tuning repo: https://github.com/Xirider/finetune-gpt2xl
DeepSpeed repo: https://github.com/microsoft/DeepSpeed
Happy Transformer: https://happytransformer.com/
GPT article with images: https://towardsdatascience.com..../gpt-3-the-new-might
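As a rough starting point, here is a minimal sketch of the kind of DeepSpeed-backed fine-tuning run covered in the video, using the Hugging Face Trainer. The script name, data file (train.txt), DeepSpeed config (ds_config.json), and hyperparameters are illustrative placeholders, not the exact setup from the notebook or the finetune-gpt2xl repo.

```python
# finetune_gpt_neo.py -- hypothetical name; launch with the DeepSpeed launcher, e.g.:
#   deepspeed finetune_gpt_neo.py
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

model_name = "EleutherAI/gpt-neo-1.3B"   # or "EleutherAI/gpt-neo-2.7B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token  # GPT Neo has no pad token by default
model = AutoModelForCausalLM.from_pretrained(model_name)

# Placeholder dataset: one training example per line in train.txt
dataset = load_dataset("text", data_files={"train": "train.txt"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = dataset.map(tokenize, batched=True, remove_columns=["text"])

args = TrainingArguments(
    output_dir="gpt-neo-finetuned",
    num_train_epochs=1,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,
    fp16=True,
    deepspeed="ds_config.json",  # ZeRO/offload config; see the DeepSpeed and finetune-gpt2xl repos
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized["train"],
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```

The key point is that passing a DeepSpeed config to the Trainer lets ZeRO partition optimizer state (and optionally offload it to CPU), which is what makes the 2.7B model fit on a single consumer GPU where the naive approach runs out of memory.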
Timestamps
00:00 - Intro
00:36 - Background on fine-tuning
02:41 - Setting up Jupyter
05:17 - Incorrect naive fine-tuning method
10:02 - Correctly fine-tuning with DeepSpeed
15:35 - Fine-tuning the 2.7B model
16:35 - Fine-tuning the 1.3B model
17:55 - Looking at the README
20:08 - Outro and future work