PyTorch Image Captioning Tutorial
In this tutorial we walk through how an image captioning system works and implement one from scratch, using the Flickr8k caption dataset. There are several ways to improve the model: train a larger network (the one used here is relatively small), train for longer, or add attention as in this paper: https://arxiv.org/abs/1502.03044.
Video about the dataset (the download link is in that video's description):
https://youtu.be/9sHcLvVXsns
✅ Support My Channel Through Patreon:
https://www.patreon.com/aladdinpersson
PyTorch Playlist:
https://www.youtube.com/playli....st?list=PLhhyoLH6Ijf
GitHub Repository:
https://github.com/aladdinpers....son/Machine-Learning
I stole the thumbnail image from Yunjey's GitHub repository on image captioning, which I also used as a resource. The implementation in the video differs a bit, but it's definitely worth checking out:
https://github.com/yunjey/pytorch-tutorial
OUTLINE:
00:00 - Introduction
00:12 - Explanation of Image Captioning
05:15 - Overview of the code
06:07 - Implementation of CNN and RNN
20:03 - Setting up the training
30:36 - Fixing errors
32:18 - Small evaluation and ending
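
For readers who want a picture of the CNN and RNN parts before watching, here is a minimal sketch of an encoder-decoder captioning model. It is not the code from the video: the tiny convolutional encoder is a stand-in chosen so the example runs without downloading pretrained weights, and the class names, layer sizes, vocabulary size, and random inputs are all assumptions made for illustration.

```python
import torch
import torch.nn as nn


class EncoderCNN(nn.Module):
    """Small stand-in CNN (an assumption): maps an image to one feature vector."""

    def __init__(self, embed_size):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, stride=2, padding=1),
            nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, stride=2, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),  # (batch, 32, 1, 1)
        )
        self.fc = nn.Linear(32, embed_size)

    def forward(self, images):
        x = self.features(images).flatten(1)  # (batch, 32)
        return self.fc(x)                     # (batch, embed_size)


class DecoderRNN(nn.Module):
    """LSTM that receives the image feature as the first input step."""

    def __init__(self, embed_size, hidden_size, vocab_size):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_size)
        self.lstm = nn.LSTM(embed_size, hidden_size, batch_first=True)
        self.linear = nn.Linear(hidden_size, vocab_size)

    def forward(self, features, captions):
        embeddings = self.embed(captions)  # (batch, seq_len, embed_size)
        # Prepend the image feature so the sequence grows by one step.
        inputs = torch.cat([features.unsqueeze(1), embeddings], dim=1)
        hiddens, _ = self.lstm(inputs)     # (batch, seq_len + 1, hidden_size)
        return self.linear(hiddens)        # (batch, seq_len + 1, vocab_size)


class CNNtoRNN(nn.Module):
    def __init__(self, embed_size, hidden_size, vocab_size):
        super().__init__()
        self.encoder = EncoderCNN(embed_size)
        self.decoder = DecoderRNN(embed_size, hidden_size, vocab_size)

    def forward(self, images, captions):
        return self.decoder(self.encoder(images), captions)


# Hypothetical sizes and random data, just to check shapes and the loss.
torch.manual_seed(0)
vocab_size = 100
images = torch.randn(4, 3, 64, 64)
captions = torch.randint(0, vocab_size, (4, 12))  # token ids, batch first

model = CNNtoRNN(embed_size=32, hidden_size=64, vocab_size=vocab_size)
# Feed all tokens except the last; the prepended image feature restores the
# length, so each output step is scored against the caption token at that step.
outputs = model(images, captions[:, :-1])
loss = nn.CrossEntropyLoss()(outputs.reshape(-1, vocab_size), captions.reshape(-1))
```

In practice the encoder would usually be a pretrained image classifier with its final layer replaced, and the captions would come from a tokenized Flickr8k loader with padding ignored in the loss; both are left out here to keep the sketch short.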