Reinforcement Learning

Overviews

Blogs and Tutorials
- OpenAI Intro to RL: a good intro to RL, with emphasis on deep learning methods
Books and Chapters
- Chapter 18 - Reinforcement Learning (UCSC only) from Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow 2nd Ed. Good, concise introduction.
Lectures and Slides
- Lecture 14: Reinforcement Learning
Overview papers
- Levine et al 2020 - Offline Reinforcement Learning: Tutorial, Review, and Perspectives on Open Problems

Papers

REINFORCE:
- Ronald J. Williams. A class of gradient-estimating algorithms for reinforcement learning in neural networks.
- Williams. Simple statistical gradient-following algorithms for connectionist reinforcement learning.
- See slide 10 here
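As a rough illustration of the REINFORCE papers above, here is a minimal sketch of the score-function gradient update on a toy Gaussian bandit. The bandit setup, the `reinforce_bandit` helper, the running-average baseline, and all hyperparameters are assumptions made for this example and are not taken from Williams' papers.

```python
import math
import random


def softmax(logits):
    """Convert a list of logits into action probabilities."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_bandit(arm_means, steps=2000, lr=0.1, seed=0):
    """Train a softmax policy on a toy Gaussian bandit with REINFORCE.

    arm_means is a hypothetical list of expected rewards, one per action.
    Returns the final action probabilities.
    """
    rng = random.Random(seed)
    logits = [0.0] * len(arm_means)
    baseline = 0.0  # running average of past rewards, used to reduce variance
    for t in range(1, steps + 1):
        probs = softmax(logits)
        action = rng.choices(range(len(probs)), weights=probs)[0]
        reward = rng.gauss(arm_means[action], 1.0)
        advantage = reward - baseline
        # For a softmax policy:
        # d log pi(action) / d logit_k = 1[k == action] - probs[k]
        for k in range(len(logits)):
            grad_log_pi = (1.0 if k == action else 0.0) - probs[k]
            logits[k] += lr * advantage * grad_log_pi
        baseline += (reward - baseline) / t
    return softmax(logits)


final_probs = reinforce_bandit([0.0, 1.0])
```

With the second arm paying more on average, the learned policy should end up preferring it. Subtracting a baseline leaves the gradient estimate unbiased while lowering its variance.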
DAGGER: Ross et al 2010 - A Reduction of Imitation Learning and Structured Prediction to No-Regret Online Learning
Lorberbom et al 2019 - Direct Policy Gradients: Direct Optimization of Policies in Discrete Action Spaces
PPO: Schulman et al 2017 - Proximal Policy Optimization Algorithms. Used in RLHF (Ziegler 2019) and InstructGPT (Ouyang 2022).
Kumar et al 2024 - Training Language Models to Self-Correct via Reinforcement Learning. Applied to math and code.
Applied to games
- Berner et al 2019 - Dota 2 with Large Scale Deep Reinforcement Learning (blog). Uses a policy-gradient method called Proximal Policy Optimization (PPO).
- Schrittwieser et al 2020 - Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model. Introduces MuZero, which learns a model that predicts the reward, action policy, and value function. Without knowledge of the game rules, MuZero matched the superhuman performance of AlphaZero on Go, chess, and shogi.
- Perolat et al 2022 - Mastering the Game of Stratego with Model-Free Multiagent Reinforcement Learning (blog)
Applied to Reasoning Chains

NLP RL Papers

(Some of the papers above should be moved to this section)

Applied to Text Games
- Narasimhan et al 2015 - Language Understanding for Text-based Games using Deep Reinforcement Learning

Reinforcement Learning with Verifiable Rewards

DeepSeek-R1-Zero-style reinforcement learning is sometimes called “reinforcement learning (RL) on verifiable rewards” (see for example Zhou 2025) or “RL with outcome supervision.”
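To make "verifiable reward" concrete, here is a minimal sketch of an outcome-only reward for math-style problems: the completion earns a reward of 1 only if its final number matches a known reference answer. The function names and the last-number extraction rule are hypothetical choices for illustration and are not the reward used by DeepSeek-R1-Zero or any other particular system.

```python
import re


def extract_final_answer(completion):
    """Return the last number in a model completion, or None if there is none."""
    matches = re.findall(r"-?\d+(?:\.\d+)?", completion)
    return matches[-1] if matches else None


def verifiable_reward(completion, reference_answer):
    """Binary outcome reward: 1.0 if the extracted answer equals the reference."""
    answer = extract_final_answer(completion)
    if answer is None:
        return 0.0
    return 1.0 if float(answer) == float(reference_answer) else 0.0


reward_correct = verifiable_reward("2 + 2 = 4, so the answer is 4", "4")
reward_wrong = verifiable_reward("I think it is 5", "4")
```

Because the reward depends only on the final outcome, it can be checked automatically, with no learned reward model and no step-by-step annotation.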

Datasets

NLE: Küttler et al 2020 - The NetHack Learning Environment

Theory

Kakade 2003 - On the Sample Complexity of Reinforcement Learning. PhD thesis.
Foster et al 2021 - The Statistical Complexity of Interactive Decision Making. Introduces the Decision-Estimation Coefficient (DEC), analogous to VC dimension but for interactive decision making. Proves upper and lower bounds under a realizability assumption.

Inverse Reinforcement Learning (IRL)

In inverse reinforcement learning (IRL), the agent infers the reward function from demonstrations, that is, example actions assumed to come from an optimal (or near-optimal) policy.

Overviews
- Arora & Doshi 2018 - A Survey of Inverse Reinforcement Learning: Challenges, Methods and Progress
- Blog post: Inverse Reinforcement Learning (nice diagrams)
Non-NLP papers
- Finn et al 2016 - Guided Cost Learning: Deep Inverse Optimal Control via Policy Optimization

Resources

Refer to this page for an up-to-date list of resources.

General
- awesome-rl by dbobrenko is a repository of RL related resources grouped by RL sub-domains.
- awesome-rl by aikorea is another repository of RL related resources grouped by resource type.
Books
- Reinforcement Learning: An Introduction by Richard Sutton and Andrew Barto is the classic reinforcement learning textbook.
Papers
- Key Papers in Deep RL by OpenAI is a list of must-read papers of classic RL algorithms selected by OpenAI researchers.
- Deep Reinforcement Learning by Yuxi Li is a comprehensive and up-to-date RL survey paper. It can also serve as a tutorial for people who want to have a general understanding of the field.
Courses
- CS285 Deep Reinforcement Learning at UC Berkeley by Professor Sergey Levine is a deep RL course that covers more recent topics and delves deeper into each of them, so it might be difficult for people who are new to RL. [Course website] [Playlist]
- Introduction to Reinforcement Learning with David Silver is an introductory RL course that is suitable for beginners in RL. [Course website] [Playlist]
Blogs
- A (Long) Peek into Reinforcement Learning by Lilian Weng is a good blog post for beginners in RL. For most of the algorithms, it can give you a high-level intuition to help you with further systematic study.
Tutorials
- pytorch-rl by bentrevett is a practical introduction to RL using PyTorch.
- OpenAI Spinning Up by OpenAI might be the best educational resource to start with in deep RL. It covers key concepts in RL, kinds of RL algorithms, and a tutorial on the policy gradient algorithm. It also provides a resource list and algorithm documentation.
Frameworks
- OpenAI Gym by OpenAI is a toolkit for benchmarking RL algorithms.
Miscellaneous
- Professors Working in Reinforcement Learning by Rupali Bhati is a list of professors who work in RL.

People

Timothy Lillicrap
