Reinforcement Learning
Overviews
- Blogs and Tutorials
- OpenAI Intro to RL. Good intro to RL, with emphasis on deep learning methods.
- Books and Chapters
- Chapter 18 - Reinforcement Learning (UCSC only) from Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow 2nd Ed. Good, concise introduction.
- Lectures and Slides
- Overview papers
Papers
- REINFORCE:
- Ronald J. Williams. A class of gradient-estimating algorithms for reinforcement learning in neural networks.
- Williams. Simple statistical gradient-following algorithms for connectionist reinforcement learning.
- See slide 10 here
- PPO: Schulman et al 2017 - Proximal Policy Optimization Algorithms. Used in RLHF (Ziegler 2019) and InstructGPT (Ouyang 2022).
- Kumar et al 2024 - Training Language Models to Self-Correct via Reinforcement Learning. Applied to math and code.
- Applied to games
- Berner et al 2019 - Dota 2 with Large Scale Deep Reinforcement Learning (blog). Uses a policy-gradient method called Proximal Policy Optimization (PPO).
- Schrittwieser et al 2020 - Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model. Introduces MuZero, which learns a model that predicts the reward, the action policy, and the value function. Without knowledge of the game rules, MuZero matched the superhuman performance of AlphaZero.
- Applied to Reasoning Chains
NLP RL Papers
(Some of the papers above should be moved to this section)
- Applied to Text Games
Reinforcement Learning with Verifiable Rewards
DeepSeek-R1-Zero-style reinforcement learning is sometimes called “reinforcement learning (RL) on verifiable rewards” (see for example Zhou 2025) or “RL with outcome supervision.”
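As a rough illustration of what a "verifiable reward" can look like, here is a minimal sketch of an outcome-only reward for math-style problems. It assumes the model is prompted to put its final answer in `\boxed{...}` and that exact string match against a reference answer is an acceptable check. Both are assumptions made for this example, as are the function names `extract_final_answer` and `verifiable_reward`; real systems typically use more robust answer normalization or unit tests for code.

```python
import re
from typing import Optional


def extract_final_answer(completion: str) -> Optional[str]:
    """Return the contents of the last \\boxed{...} in the completion, or None.

    Assumption for this sketch: answers contain no nested braces.
    """
    matches = re.findall(r"\\boxed\{([^{}]*)\}", completion)
    return matches[-1].strip() if matches else None


def verifiable_reward(completion: str, reference: str) -> float:
    """Outcome-only reward: 1.0 if the final answer matches the reference, else 0.0.

    No learned reward model is involved; the reward is computed by a
    deterministic check on the final answer only, not on the reasoning steps.
    """
    answer = extract_final_answer(completion)
    if answer is None:
        return 0.0
    return 1.0 if answer == reference.strip() else 0.0


if __name__ == "__main__":
    sample = "We add the two numbers, so the result is \\boxed{42}."
    print(verifiable_reward(sample, "42"))
    print(verifiable_reward("I am not sure.", "42"))
```

The scalar returned here would be the reward passed to a policy-gradient method for each sampled completion. Because only the outcome is checked, intermediate reasoning receives no direct supervision.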
See also Large Reasoning Models
Datasets
Theory
- Foster et al 2021 - The Statistical Complexity of Interactive Decision Making. Introduces the Decision-Estimation Coefficient (DEC), a complexity measure analogous to VC dimension but for interactive decision making. Proves upper and lower bounds under a realizability assumption.
Inverse Reinforcement Learning (IRL)
In inverse reinforcement learning (IRL), the agent infers the reward function from demonstrations (example state-action trajectories) that are assumed to come from an optimal or near-optimal policy.
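One common way to write this down, given as a sketch and not as the formulation of any particular paper on this page: with an expert policy $\pi_E$, a discount factor $\gamma$, and a candidate reward $R$ (all notation introduced here for illustration), IRL looks for a reward under which the expert does at least as well as every other policy.

```latex
\text{Find } R \text{ such that}\quad
\mathbb{E}_{\pi_E}\!\left[\sum_{t=0}^{\infty} \gamma^{t} R(s_t, a_t)\right]
\;\ge\;
\mathbb{E}_{\pi}\!\left[\sum_{t=0}^{\infty} \gamma^{t} R(s_t, a_t)\right]
\quad \text{for all policies } \pi .
```

This condition alone is ill-posed, since the constant reward $R = 0$ satisfies it trivially. Practical methods therefore add a further criterion, such as a margin between the expert and other policies or a maximum-entropy assumption on the demonstrations.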
- Overviews
- Blog post: Inverse Reinforcement Learning (nice diagrams)
- Non-NLP papers
Resources
Refer to this page for an up-to-date list of resources.
- General
- awesome-rl by dbobrenko is a repository of RL related resources grouped by RL sub-domains.
- awesome-rl by aikorea is another repository of RL related resources grouped by resource type.
- Books
- Reinforcement Learning: An Introduction by Richard Sutton and Andrew Barto is the classic reinforcement learning textbook.
- Papers
- Key Papers in Deep RL by OpenAI is a list of must-read papers on classic RL algorithms selected by OpenAI researchers.
- Deep Reinforcement Learning by Yuxi Li is a comprehensive and up-to-date RL survey paper. It can also serve as a tutorial for people who want to have a general understanding of the field.
- Courses
- CS285 Deep Reinforcement Learning at UC Berkeley by Professor Sergey Levine is a more recent deep RL course. It covers newer topics and goes deeper into each of them, so it may be difficult for people who are new to RL. [Course website] [Playlist]
- Introduction to Reinforcement Learning with David Silver is an introductory RL course that is suitable for beginners. [Course website] [Playlist]
- Blogs
- A (Long) Peek into Reinforcement Learning by Lilian Weng is a good blog post for beginners in RL. For most algorithms, it gives a high-level intuition that helps with further systematic study.
- Tutorials
- pytorch-rl by bentrevett is a practical introduction to RL using PyTorch.
- OpenAI Spinning Up by OpenAI might be the best educational resource to start with in deep RL. It covers key concepts in RL, the main kinds of RL algorithms, and a tutorial on the policy gradient algorithm. It also provides a resource list and algorithm documentation.
- Frameworks
- OpenAI Gym by OpenAI is a toolkit for benchmarking RL algorithms.
- Miscellaneous
- Professors Working in Reinforcement Learning by Rupali Bhati is a list of professors who work in RL.
People
Related Pages
ml/reinforcement_learning.txt · Last modified: 2025/07/14 05:40 by jmflanig