How to build custom Datasets for Text in Pytorch

2,822,958 Views

Generative AI

Published on 12/15/22 / In How-to & Learning

In this video we go a bit more in depth into custom datasets and implement more advanced functions for dealing with text. Specifically, we're looking at an image captioning dataset (the Flickr8k dataset), where each image has a corresponding caption describing what's going on in the image. I think the general principles from this video can be applied to any project that deals with text data, be it translation, question answering, sentiment analysis, etc. I also recommend taking a look at my Torchtext videos, which can also be quite helpful and simplify the data loading process.

✅ Support My Channel Through Patreon:
https://www.patreon.com/aladdinpersson

Flickr8k Dataset used in the video:
https://www.kaggle.com/dataset..../e1cd22253a9b23b0737

PyTorch Playlist:
https://www.youtube.com/playli....st?list=PLhhyoLH6Ijf

Github repository:
https://github.com/aladdinpers....son/Machine-Learning

OUTLINE:
0:00 - Introduction
2:05 - Overview of what we're going to do
4:05 - Imports
5:20 - Setup of Pytorch Dataset for loading Flickr
11:50 - Setup of Vocabulary and Numericalization
22:19 - Creating Collate for Padding of Batch
25:20 - Function for getting data loader
29:15 - Running code & fixing a couple of errors
33:09 - Ending
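
For readers who want a feel for the vocabulary, numericalization and padding steps listed in the outline, here is a minimal sketch in plain Python. It is not the code from the video: the `Vocabulary` class, the `pad_batch` helper, the special tokens, the whitespace tokenizer and the two example captions are all assumptions made for illustration.

```python
from collections import Counter

# Special tokens are an assumption of this sketch, not taken from the video.
PAD, SOS, EOS, UNK = "<PAD>", "<SOS>", "<EOS>", "<UNK>"


class Vocabulary:
    """Hypothetical word-level vocabulary mapping tokens to integer ids."""

    def __init__(self, freq_threshold=1):
        # Words seen fewer than freq_threshold times are mapped to <UNK>.
        self.freq_threshold = freq_threshold
        self.itos = [PAD, SOS, EOS, UNK]
        self.stoi = {tok: idx for idx, tok in enumerate(self.itos)}

    def __len__(self):
        return len(self.itos)

    @staticmethod
    def tokenize(text):
        # Simple lowercase whitespace tokenizer (a stand-in for a real one).
        return text.lower().split()

    def build(self, sentences):
        # Count every token, then keep those that meet the threshold.
        counts = Counter(tok for s in sentences for tok in self.tokenize(s))
        for tok, n in counts.items():
            if n >= self.freq_threshold and tok not in self.stoi:
                self.stoi[tok] = len(self.itos)
                self.itos.append(tok)

    def numericalize(self, text):
        # Convert a sentence to ids, wrapped in <SOS> ... <EOS>.
        unk = self.stoi[UNK]
        ids = [self.stoi.get(tok, unk) for tok in self.tokenize(text)]
        return [self.stoi[SOS]] + ids + [self.stoi[EOS]]


def pad_batch(sequences, pad_idx):
    # Right-pad every sequence to the length of the longest one in the batch.
    max_len = max(len(s) for s in sequences)
    return [s + [pad_idx] * (max_len - len(s)) for s in sequences]


# Made-up captions, used only to exercise the sketch.
captions = ["A dog runs", "A dog jumps over a log"]
vocab = Vocabulary(freq_threshold=1)
vocab.build(captions)
batch = pad_batch([vocab.numericalize(c) for c in captions], vocab.stoi[PAD])
```

In a PyTorch pipeline, padding like this usually lives in a function or callable passed to `DataLoader` through its `collate_fn` argument, and `torch.nn.utils.rnn.pad_sequence` can do the padding on tensors instead of lists.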
