Reinforcement Learning with Human Feedback (RLHF) - How to train and fine-tune Transformer Models


Reinforcement Learning with Human Feedback (RLHF) is a method used for training Large Language Models (LLMs). At the heart of RLHF lies a powerful reinforcement learning method called Proximal Policy Optimization (PPO). Learn about it in this simple video!
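For viewers who want a concrete anchor before watching: PPO updates the policy by maximizing a clipped surrogate objective, which keeps the new policy close to the old one. Below is a minimal NumPy sketch of that objective. It is not code from the video; the function name, arguments, and toy numbers are illustrative assumptions.

```python
# Minimal sketch of PPO's clipped surrogate objective (illustrative, not from the video).
import numpy as np

def ppo_clipped_objective(logp_new, logp_old, advantages, clip_eps=0.2):
    """Average clipped surrogate objective used by PPO.

    logp_new:   log-probabilities of the taken actions under the current policy
    logp_old:   log-probabilities of the same actions under the old (behavior) policy
    advantages: advantage estimates for those actions
    clip_eps:   clipping range epsilon (0.2 is a common default)
    """
    ratio = np.exp(logp_new - logp_old)                    # pi_new / pi_old
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps)
    # Element-wise minimum: the update never benefits from pushing the
    # probability ratio outside the trust region [1 - eps, 1 + eps].
    return np.mean(np.minimum(ratio * advantages, clipped * advantages))

# Toy usage with three actions and made-up advantages.
logp_old = np.log(np.array([0.2, 0.5, 0.3]))
logp_new = np.log(np.array([0.3, 0.4, 0.3]))
adv      = np.array([1.0, -0.5, 0.2])
print(ppo_clipped_objective(logp_new, logp_old, adv))
```

In RLHF, this same objective is applied to the LLM acting as the policy, with the advantages derived from a reward model trained on human preference data.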

This is the second in a series of three videos dedicated to the reinforcement learning methods used for training LLMs.

Full Playlist: https://www.youtube.com/playlist?list=PLs8w1Cdi-zv

Video 0 (Optional): Introduction to deep reinforcement learning https://www.youtube.com/watch?v=SgC6AZss478
Video 1: Proximal Policy Optimization https://www.youtube.com/watch?v=TjHH_--7l8g
Video 2 (This one): Reinforcement Learning with Human Feedback
Video 3 (Coming soon!): Direct Preference Optimization

00:00 Introduction
00:48 Intro to Reinforcement Learning (RL)
02:47 Intro to Proximal Policy Optimization (PPO)
04:17 Intro to Large Language Models (LLMs)
06:50 Reinforcement Learning with Human Feedback (RLHF)
13:08 Interpretation of the Neural Networks
14:36 Conclusion

Get the Grokking Machine Learning book!
https://manning.com/books/grokking-machine-learning
Discount code (40%): serranoyt
(Use the discount code at checkout)
