How to build custom Datasets for Text in PyTorch
In this video we take a more in-depth look at custom datasets and implement more advanced functions for dealing with text. Specifically, we look at an image captioning dataset (the Flickr8k dataset), where each image comes with a corresponding caption that describes what's going on in the image. I think the general principles from this video can be applied to any project that deals with text data, be it translation, question answering, sentiment analysis, etc. I also recommend taking a look at my Torchtext videos, which can also be quite helpful and simplify the data loading process.
✅ Support My Channel Through Patreon:
https://www.patreon.com/aladdinpersson
Flickr8k Dataset used in the video:
https://www.kaggle.com/dataset..../e1cd22253a9b23b0737
PyTorch Playlist:
https://www.youtube.com/playli....st?list=PLhhyoLH6Ijf
Github repository:
https://github.com/aladdinpers....son/Machine-Learning
OUTLINE:
0:00 - Introduction
2:05 - Overview of what we're going to do
4:05 - Imports
5:20 - Setup of PyTorch Dataset for loading Flickr
11:50 - Setup of Vocabulary and Numericalization
22:19 - Creating Collate for Padding of Batch
25:20 - Function for getting data loader
29:15 - Running code & fixing a couple of errors
33:09 - Ending
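
For anyone who wants a feel for the vocabulary, numericalization, and padding steps from the outline before watching, here is a minimal sketch in plain Python. It is not the code from the video: the class and function names, the special tokens and their index order, the whitespace tokenizer, and the frequency threshold are all assumptions made for illustration. It also pads plain lists instead of tensors so it runs without PyTorch installed; in a real collate function you would pad tensors instead.

```python
from collections import Counter

# Special tokens; these names and their index order are assumptions for illustration.
PAD, SOS, EOS, UNK = "<PAD>", "<SOS>", "<EOS>", "<UNK>"


class Vocabulary:
    """Maps tokens to integer ids, keeping only sufficiently frequent tokens."""

    def __init__(self, freq_threshold=2):
        self.freq_threshold = freq_threshold
        self.itos = [PAD, SOS, EOS, UNK]  # index -> token
        self.stoi = {tok: idx for idx, tok in enumerate(self.itos)}  # token -> index

    @staticmethod
    def tokenize(text):
        # Lowercase whitespace split, a stand-in for a real tokenizer.
        return text.lower().split()

    def build(self, sentences):
        # Count every token, then add those that meet the frequency threshold.
        counts = Counter(tok for s in sentences for tok in self.tokenize(s))
        for tok, n in counts.items():
            if n >= self.freq_threshold and tok not in self.stoi:
                self.stoi[tok] = len(self.itos)
                self.itos.append(tok)

    def numericalize(self, text):
        # Unknown tokens map to <UNK>; wrap the caption in <SOS> ... <EOS>.
        unk = self.stoi[UNK]
        ids = [self.stoi.get(tok, unk) for tok in self.tokenize(text)]
        return [self.stoi[SOS]] + ids + [self.stoi[EOS]]


def pad_batch(sequences, pad_idx):
    # Right-pad every sequence to the length of the longest one in the batch.
    max_len = max(len(seq) for seq in sequences)
    return [seq + [pad_idx] * (max_len - len(seq)) for seq in sequences]


# Made-up captions, not taken from Flickr8k.
captions = ["a dog runs", "a dog runs on the grass", "a cat sleeps"]
vocab = Vocabulary(freq_threshold=2)
vocab.build(captions)
batch = pad_batch([vocab.numericalize(c) for c in captions], vocab.stoi[PAD])
```

Padding per batch (rather than padding the whole dataset to one global length) keeps batches of short captions small, which is the reason for doing it inside a collate function.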