ml:reinforcement_learning

Differences

This shows you the differences between two versions of the page.

ml:reinforcement_learning [2025/06/17 16:45] – [Inverse Reinforcement Learning (IRL)] jmflanig
ml:reinforcement_learning [2025/07/14 05:40] (current) – [Reinforcement Learning with Verifiable Rewards] jmflanig
Line 28:
     * [[https://aclanthology.org/2024.findings-emnlp.429.pdf|Wang et al 2024 - Multi-step Problem Solving Through a Verifier: An Empirical Analysis on Model-induced Process Supervision]]
     * [[https://github.com/deepseek-ai/DeepSeek-R1/blob/main/DeepSeek_R1.pdf|DeepSeek 2025 - DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning]]
 +
 +===== NLP RL Papers =====
 +(Some of the papers above should be moved to this section)
 +
 +  * **Applied to Text Games**
 +    * [[https://arxiv.org/pdf/1506.08941|Narasimhan et al 2015 - Language Understanding for Text-based Games using Deep Reinforcement Learning]]
 +
  
 ===== Reinforcement Learning with Verifiable Rewards =====
 DeepSeek-R1-Zero-style reinforcement learning is sometimes called **"reinforcement learning (RL) on verifiable rewards"** (see, for example, [[https://arxiv.org/pdf/2505.21493|Zhou et al 2025]]) or **"RL with outcome supervision."**
 +
 +See also [[nlp:Large Reasoning Models]]
 +
     * [[https://github.com/deepseek-ai/DeepSeek-R1/blob/main/DeepSeek_R1.pdf|DeepSeek 2025 - DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning]]
     * [[https://arxiv.org/pdf/2505.21493|Zhou et al 2025 - Reinforcing General Reasoning without Verifiers]]
 +
  
 ===== Datasets =====
ml/reinforcement_learning.1750178741.txt.gz · Last modified: 2025/06/17 16:45 by jmflanig
