Posts, articles, and discussions
Community posts
![](https://huggingface.co/blog/assets/138_stackllama/thumbnail.png)
StackLLaMA: A hands-on guide to train LLaMA with RLHF
By April 5, 2023
![](https://huggingface.co/blog/assets/133_trl_peft/thumbnail.png)
Fine-tuning 20B LLMs with RLHF on a 24GB consumer GPU
By March 9, 2023
![](https://huggingface.co/blog/assets/128_aivsai/thumbnail.png)
Introducing ⚔️ AI vs. AI ⚔️ a deep reinforcement learning multi-agents competition system
By February 7, 2023
![](https://huggingface.co/blog/assets/120_rlhf/thumbnail.png)
Illustrating Reinforcement Learning from Human Feedback (RLHF)
By December 9, 2022
![](https://huggingface.co/blog/assets/101_train-decision-transformers/thumbnail.gif)
Train your first Decision Transformer
By September 8, 2022
![](https://huggingface.co/blog/assets/93_deep_rl_ppo/thumbnail.png)
Proximal Policy Optimization (PPO)
By August 5, 2022
![](https://huggingface.co/blog/assets/89_deep_rl_a2c/thumbnail.gif)
Advantage Actor Critic (A2C)
By July 22, 2022
![](https://huggingface.co/blog/assets/85_policy_gradient/thumbnail.gif)
Policy Gradient with PyTorch
By June 30, 2022
![](https://huggingface.co/blog/assets/78_deep_rl_dqn/thumbnail.gif)
Deep Q-Learning with Atari
By June 7, 2022
![](https://huggingface.co/blog/assets/73_deep_rl_q_part2/thumbnail.gif)
An Introduction to Q-Learning Part 2
By May 20, 2022
![](https://huggingface.co/blog/assets/70_deep_rl_q_part1/thumbnail.gif)
An Introduction to Q-Learning Part 1
By May 18, 2022
![](https://huggingface.co/blog/assets/63_deep_rl_intro/thumbnail.png)
An Introduction to Deep Reinforcement Learning
By May 4, 2022
![](https://huggingface.co/blog/assets/58_decision-transformers/thumbnail.jpg)
Introducing Decision Transformers on Hugging Face 🤗
By March 28, 2022
![](https://huggingface.co/blog/assets/47_sb3/thumbnail.png)
Welcome Stable-baselines3 to the Hugging Face Hub 🤗
By January 21, 2022