
Reinforcement Learning

Overviews

Papers

NLP RL Papers

(Some of the papers above should be moved to this section)

Reinforcement Learning with Verifiable Rewards

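DeepSeek-R1-Zero-style reinforcement learning is sometimes called "reinforcement learning (RL) with verifiable rewards" (RLVR; see, for example, Zhou 2025) or "RL with outcome supervision." In this setting, the reward comes from an automatic check of the final outcome (for example, whether a math answer matches the reference) rather than from a learned reward model.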
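A verifiable, outcome-only reward can be sketched as a plain function that checks the final answer and ignores the reasoning that led to it. This is a minimal illustration, not DeepSeek's implementation: the "Answer: <value>" convention and the names `extract_answer` and `verifiable_reward` are hypothetical, and the numeric normalization via `fractions.Fraction` is an assumed design choice.

```python
import re
from fractions import Fraction

# Hypothetical convention: the completion ends with "Answer: <value>".
ANSWER_RE = re.compile(r"Answer:\s*(\S+)\s*$")


def extract_answer(completion: str):
    """Return the final answer string, or None if the format is not followed."""
    match = ANSWER_RE.search(completion.strip())
    return match.group(1) if match else None


def _normalize(answer: str):
    """Compare numbers by value ("0.5" == "1/2"); fall back to lowercase text."""
    try:
        return Fraction(answer)
    except (ValueError, ZeroDivisionError):
        return answer.strip().lower()


def verifiable_reward(completion: str, reference: str) -> float:
    """Outcome supervision: 1.0 if the final answer matches the reference, else 0.0.

    The intermediate reasoning is never scored; only the checkable outcome is.
    """
    predicted = extract_answer(completion)
    if predicted is None:
        return 0.0  # unparseable output earns no reward
    return 1.0 if _normalize(predicted) == _normalize(reference) else 0.0


print(verifiable_reward("2 + 2 = 4.\nAnswer: 4", "4"))
```

Because the reward is computed by a rule rather than a learned model, it is cheap to evaluate and hard for the policy to exploit, but it only applies to tasks whose outcomes can be checked automatically.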

See also Large Reasoning Models

Datasets

Theory

Inverse Reinforcement Learning (IRL)

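In inverse reinforcement learning (IRL), the agent is not given a reward function; instead, it infers one from demonstrations, i.e., example actions or trajectories produced by an expert whose policy is assumed to be optimal (or near-optimal) for the unknown reward.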
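As a sketch of the idea, the single-step (bandit) special case of maximum-entropy IRL fits in a few lines. The assumptions are labeled here: the reward is linear in hand-chosen action features, r(a) = w · φ(a), and the expert is modeled as noisily optimal, choosing a with probability proportional to exp(r(a)). Maximizing the likelihood of the demonstrations then reduces to gradient ascent whose gradient is the expert's feature expectations minus the model's. All names, features, and demonstration counts below are hypothetical.

```python
import math


def softmax_policy(weights, features):
    """Boltzmann policy: pi(a) proportional to exp(w . phi(a))."""
    rewards = [sum(w * f for w, f in zip(weights, phi)) for phi in features]
    top = max(rewards)  # subtract the max for numerical stability
    exps = [math.exp(r - top) for r in rewards]
    total = sum(exps)
    return [e / total for e in exps]


def feature_expectation(probs, features):
    """Expected feature vector under a distribution over actions."""
    dim = len(features[0])
    return [sum(p * phi[k] for p, phi in zip(probs, features)) for k in range(dim)]


def maxent_irl_one_step(features, demo_actions, lr=1.0, steps=5000):
    """Fit linear reward weights so the Boltzmann policy matches expert features."""
    counts = [0] * len(features)
    for a in demo_actions:
        counts[a] += 1
    empirical = [c / len(demo_actions) for c in counts]
    expert_mu = feature_expectation(empirical, features)

    weights = [0.0] * len(features[0])
    for _ in range(steps):
        model_mu = feature_expectation(softmax_policy(weights, features), features)
        # Log-likelihood gradient: expert feature expectations minus the model's.
        weights = [w + lr * (e - m) for w, e, m in zip(weights, expert_mu, model_mu)]
    return weights, expert_mu


# Hypothetical setup: three actions with 2-D features; the expert mostly picks action 0.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
demos = [0] * 8 + [1] + [2]
weights, expert_mu = maxent_irl_one_step(features, demos)
learned = [sum(w * f for w, f in zip(weights, phi)) for phi in features]
print(learned)
```

After fitting, the model's feature expectations match the expert's and the expert's favored action receives the highest learned reward. Full maximum-entropy IRL replaces the single-step softmax with a distribution over trajectories in an MDP, which requires solving a forward RL or dynamic-programming problem inside each gradient step.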

Resources

Refer to this page for an up-to-date list of resources.

People